Machado, Helena; Silva, Susana
The ethical aspects of biobanks and forensic DNA databases are often treated as separate issues. As a reflection of this, public participation, or the involvement of citizens in genetic databases, has been approached differently in the fields of forensics and medicine. This paper aims to cross the boundaries between medicine and forensics by exploring the flows between the ethical issues presented in the two domains and the subsequent conceptualisation of public trust and legitimisation. We propose to introduce the concept of ‘solidarity’, traditionally applied only to medical and research biobanks, into a consideration of public engagement in medicine and forensics. Inclusion of a solidarity-based framework, in both medical biobanks and forensic DNA databases, raises new questions that should be included in the ethical debate, in relation to both health services/medical research and activities associated with the criminal justice system. PMID:26139851
Zieger, Martin; Utz, Silvia
During the last decade, DNA profiling and the use of DNA databases have become two of the most employed instruments of police investigations. This very rapid establishment of forensic genetics is yet far from being complete. In the last few years novel types of analyses have been presented to describe phenotypically a possible perpetrator. We conducted the present study among German speaking Swiss residents for two main reasons: firstly, we aimed at getting an impression of the public awareness and acceptance of the Swiss DNA database and the perception of a hypothetical DNA database containing all Swiss residents. Secondly, we wanted to get a broader picture of how people that are not working in the field of forensic genetics think about legal permission to establish phenotypic descriptions of alleged criminals by genetic means. Even though a significant number of study participants did not even know about the existence of the Swiss DNA database, its acceptance appears to be very high. Generally our results suggest that the current forensic use of DNA profiling is considered highly trustworthy. However, the acceptance of a hypothetical universal database would be only as low as about 30% among the 284 respondents to our study, mostly because people are concerned about the security of their genetic data, their privacy or a possible risk of abuse of such a database. Concerning the genetic analysis of externally visible characteristics and biogeographical ancestry, we discover a high degree of acceptance. The acceptance decreases slightly when precise characteristics are presented to the participants in detail. About half of the respondents would be in favor of the moderate use of physical traits analyses only for serious crimes threatening life, health or sexual integrity. The possible risk of discrimination and reinforcement of racism, as discussed by scholars from anthropology, bioethics, law, philosophy and sociology, is mentioned less frequently by the study
Shirokizawa, Yoshiko; Abe, Atsushi
Japan Information Center of Science and Technology (JICST) started an online DNA database service in October 1988. The database is composed of the EMBL Nucleotide Sequence Library and the Genetic Sequence Data Bank. The authors outline the database system, data items and search commands. Examples of retrieval sessions are presented.
Williams, Anthony J
The internet has rapidly become the first port of call for all information searches. The increasing array of chemistry-related resources that are now available provides chemists with a direct path to the information that was previously accessed via library services and was limited by commercial and costly resources. The diversity of the information that can be accessed online is expanding at a dramatic rate, and the support for publicly available resources offers significant opportunities in terms of the benefits to science and society. While the data online do not generally meet the quality standards of manually curated sources, there are efforts underway to gather scientists together and 'crowdsource' an improvement in the quality of the available data. This review discusses the types of public compound databases that are available online and provides a series of examples. Focus is also given to the benefits and disruptions associated with the increased availability of such data and the integration of technologies to data mine this information.
Walters, LeRoy B.
Final Report on Award No. DE-FG0201ER63171. Principal Investigator: LeRoy B. Walters. February 18, 2008. This project successfully completed its goal of surveying and reporting on the DNA patenting and licensing policies at 30 major U.S. academic institutions. The report of survey results was published in the January 2006 issue of Nature Biotechnology under the title “The Licensing of DNA Patents by US Academic Institutions: An Empirical Survey.” Lori Pressman was the lead author on this feature article. A PDF reprint of the article will be submitted to our Program Officer under separate cover. The project team has continued to update the DNA Patent Database on a weekly basis since the conclusion of the project. The database can be accessed at dnapatents.georgetown.edu. This database provides a valuable research tool for academic researchers, policymakers, and citizens. A report entitled Reaping the Benefits of Genomic and Proteomic Research: Intellectual Property Rights, Innovation, and Public Health was published in 2006 by the Committee on Intellectual Property Rights in Genomic and Protein Research and Innovation, Board on Science, Technology, and Economic Policy at the National Academies. The report was edited by Stephen A. Merrill and Anne-Marie Mazza. This report employed and then adapted the methodology developed by our research project and quoted our findings at several points. (The full report can be viewed online at the following URL: http://www.nap.edu/openbook.php?record_id=11487&page=R1). My colleagues and I are grateful for the research support of the ELSI program at the U.S. Department of Energy.
Teodorović, Smilja; Mijović, Dragan; Radovanović Nenadić, Una; Savić, Marina
Worldwide, the establishment of national forensic DNA databases has transformed personal identification in the criminal justice system over the past two decades. It has also stimulated much debate centering on ethical issues, human rights, individual privacy, lack of safeguards and other standards. Therefore, a balance between effectiveness and intrusiveness of a national DNA repository is an imperative and needs to be achieved through a suitable legal framework. On its path to the European Union (EU), the Republic of Serbia is required to harmonize its national policies and legislation with the EU. Specifically, Chapter 24 of the EU acquis communautaire (Justice, Freedom and Security) stipulates the compulsory creation of a forensic DNA registry and adoption of corresponding legislation. This process is expected to occur in 2016. Thus, in light of launching the national DNA database, the goal of this work is to instigate a consultation with the Serbian public regarding their views on various aspects of the forensic DNA databank. Importantly, this study specifically assessed the opinions of distinct categories of citizens, including the general public, the prosecutors' offices staff, prisoners, prison guards, and students majoring in criminalistics. Our findings set a baseline for Serbian attitudes towards DNA databank custody, DNA sample and profile inclusion and retention criteria, ethical issues and concerns. Furthermore, results clearly demonstrate a permissive outlook of the respondents who are professional "beneficiaries" of genetic profiling and a restrictive position taken by the respondents whose genetic material has been acquired by the government. We believe that this opinion poll will be essential in discussions regarding a national DNA database, as well as in motivating further research on the reasons behind the observed views and subsequent development of educational strategies. All of these are, in turn, expected to aid the creation of suitable
Fung, David C Y
Disease- and locus-specific variant databases have been a valuable resource to clinical and research geneticists. With the recent rapid developments in technologies, the number of DNA variants detected in a typical molecular genetics laboratory easily exceeds 1,000. To keep track of the growing inventory of DNA variants, many laboratories employ information technology to store the data as well as distributing the data and its associated information to clinicians and researchers via the Web. While it is a valuable resource, the hosting of a web-accessible database requires collaboration between bioinformaticians and biologists and careful planning to ensure its usability and availability. In this chapter, a series of tutorials on building a local DNA variant database out of a sample dataset will be provided. However, this tutorial will not include programming details on building a web interface and on constructing the web application necessary for web hosting. Instead, an introduction to the two commonly used methods for hosting web-accessible variant databases will be described. Apart from the tutorials, this chapter will also consider the resources and planning required for making a variant database project successful.
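As a rough illustration of what a local DNA variant database might look like, the following is a minimal sketch using Python's built-in sqlite3 module. The table layout, column names, and the gene and variant labels in the usage note are assumptions made for illustration; they are not the schema or dataset used in the chapter's tutorials.

```python
import sqlite3


def create_variant_db(path=":memory:"):
    """Open (or create) a small variant database with one table.

    The schema is a hypothetical minimal layout, not the chapter's.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS variant (
            id INTEGER PRIMARY KEY,
            gene TEXT NOT NULL,
            hgvs TEXT NOT NULL,
            classification TEXT,
            UNIQUE (gene, hgvs)
        )
        """
    )
    return conn


def add_variant(conn, gene, hgvs, classification=None):
    """Insert a variant; a repeated (gene, hgvs) pair is silently ignored."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO variant (gene, hgvs, classification) "
            "VALUES (?, ?, ?)",
            (gene, hgvs, classification),
        )


def variants_for_gene(conn, gene):
    """Return (hgvs, classification) rows for one gene, sorted by hgvs."""
    rows = conn.execute(
        "SELECT hgvs, classification FROM variant "
        "WHERE gene = ? ORDER BY hgvs",
        (gene,),
    )
    return rows.fetchall()
```

For example, after `add_variant(conn, "GENE1", "c.100A>G", "pathogenic")`, calling `variants_for_gene(conn, "GENE1")` lists that entry. The uniqueness constraint is one simple way to keep a growing inventory free of duplicate records; a production database would likely need more fields (reference sequence, evidence, submitter).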
White, W Timothy J; Hendy, Michael D
Background Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression – an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. Results We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression – the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental additions to a sequence database. Conclusion coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work. PMID:18489794
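To make the storage problem concrete, here is a minimal sketch of a much simpler baseline than coil: packing each nucleotide into 2 bits, alongside a helper that measures the gzip size of the same sequence stored as text. This is explicitly not the edit-tree coding described in the abstract; the function names are hypothetical and the sketch assumes sequences contain only A, C, G and T.

```python
import gzip

# Assumption: only the four unambiguous bases occur in the input.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq):
    """Pack a DNA string into bytes, four bases per byte.

    Returns (length, data); the length is needed to drop padding on unpack.
    """
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        # Left-align a short final chunk so unpacking reads from the top bits.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return len(seq), bytes(out)


def unpack(length, data):
    """Invert pack(): recover the original DNA string."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    return "".join(bases[:length])


def gzip_size(seq):
    """Size in bytes of the gzip-compressed ASCII text of the sequence."""
    return len(gzip.compress(seq.encode("ascii")))
```

Comparing `len(pack(seq)[1])` with `gzip_size(seq)` on real sequence data gives a feel for how a generic text compressor fares against even a trivial domain-specific encoding; tools such as coil aim well beyond this baseline by exploiting similarity between database entries.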
Malin, B.; Sweeney, L.
CleanGene is a software program that helps determine the identifiability of sequenced DNA, independent of any explicit demographics or identifiers maintained with the DNA. The program computes the likelihood that the release of DNA database entries could be related to specific individuals that are the subjects of the data. The engine within CleanGene relies on publicly available health care data and on knowledge of particular diseases to help relate identified individuals to DNA entries. Over 20 diseases, ranging over ataxias, blood diseases, and sex-linked mutations are accounted for, with 98-100% of individuals found identifiable. We assume the genetic material is released in a linear sequencing format from an individual's genome. CleanGene and its related experiments are useful tools for any institution seeking to provide anonymous genetic material for research purposes. PMID:11079941
Tucker, James Cory
This study examines the extent to which databases support student and faculty research in the area of public administration. A list of journals in public administration, public policy, political science, public budgeting and finance, and other related areas was compared to the journal content list of six business databases. These databases…
SRD 130 Short Tandem Repeat DNA Internet Database (Web, free access) Short Tandem Repeat DNA Internet Database is intended to benefit research and application of short tandem repeat DNA markers for human identity testing. Facts and sequence information on each STR system, population data, commonly used multiplex STR systems, PCR primers and conditions, and a review of various technologies for analysis of STR alleles have been included.
Panneerchelvam, S.; Norazmi, M.N.
The incredible power of DNA technology as an identification tool has brought a tremendous change in criminal justice. The DNA database is an information resource for the forensic DNA typing community with details on commonly used short tandem repeat (STR) DNA markers. This article discusses the essential steps in the compilation of the Combined DNA Index System (CODIS) on validated polymerase chain reaction (PCR)-amplified STRs and their use in crime detection. PMID:23386793
The Protein-DNA Interface database (PDIdb) is a repository containing relevant structural information of Protein-DNA complexes solved by X-ray crystallography and available at the Protein Data Bank. The database includes a simple functional classification of the protein-DNA complexes that consists of three hierarchical levels: Class, Type and Subtype. This classification has been defined and manually curated by humans based on the information gathered from several sources that include PDB, PubMed, CATH, SCOP and COPS. The current version of the database contains only structures with resolution of 2.5 Å or higher, accounting for a total of 922 entries. The major aim of this database is to contribute to the understanding of the main rules that underlie the molecular recognition process between DNA and proteins. To this end, the database is focused on each specific atomic interface rather than on the separated binding partners. Therefore, each entry in this database consists of a single and independent protein-DNA interface. We hope that PDIdb will be useful to many researchers working in fields such as the prediction of transcription factor binding sites in DNA, the study of specificity determinants that mediate enzyme recognition events, engineering and design of new DNA binding proteins with distinct binding specificity and affinity, among others. Finally, due to its friendly and easy-to-use web interface, we hope that PDIdb will also serve educational and teaching purposes. PMID:20482798
Wilson, Concepcion S.; Boell, Sebastian K.; Kennan, Mary Anne; Willard, Patricia
This paper examines aspects of journal articles published from 1967 to 2008, located in eight databases, and authored or co-authored by academics serving for at least two years in Australian LIS programs from 1959 to 2008. These aspects are: inclusion of publications in databases, publications in journals, authorship characteristics of…
Graham, E A M
The national DNA database in the United Kingdom has now been operational for over 10 years. This review looks at the history and development of this investigative resource, from the development of commercial DNA profiling kits to the current statistics for matches obtained in relation to criminal investigations in the United Kingdom. It then discusses potential future directions that national DNA databases might take, including international collaboration on a European and global scale.
Online database searching experiences of nine Illinois public libraries--Arlington Heights, Deerfield, Elk Grove Village, Evanston, Glenview, Northbrook, Schaumburg Township, Waukegan, Wilmette--are discussed, noting search costs, user charges, popular databases, library acquisition, interaction with users, and staff training. Three sources are…
This paper evaluates five polling resources: iPOLL, Polling the Nations, Gallup Brain, Public Opinion Poll Question Database, and Polls and Surveys. Content was evaluated on disclosure standards from major polling organizations, scope on a model for public opinion polls, and presentation on a flow chart discussing search limitations and usability.
Joh, Elizabeth E
In the United States, those groups of persons eligible for compulsory DNA sampling by law enforcement authorities continue to expand. The collection of DNA samples from felony arrestees will likely be adopted by many more states after the U.S. Supreme Court's 2013 decision in Maryland v. King, which upheld a state law permitting the compulsory and warrantless DNA sampling of those arrested for serious offenses. At the time of the decision, 28 states and the federal government already had arrestee DNA collection statutes in place. Nevada became the 29th state to collect DNA from arrestees in May 2013, and several others have bills under consideration. Should states collect DNA from misdemeanor arrestees as well? This article considers this as yet largely unrealized but nevertheless important potential expansion of arrestee DNA databases. The collection of DNA samples from those arrested for relatively minor offenses would increase the number of samples, and perhaps consequently the number of "hits." On balance, however, such an expansion of current DNA laws raises enough serious concerns, chiefly about police discretion, inequitable enforcement, and cost, that legislators should refrain from changing arrestee DNA laws in this way.
Guillen, M.; Lareu, M. V.; Pestoni, C.; Salas, A.; Carracedo, A.
Advances in DNA technology and the discovery of DNA polymorphisms have permitted the creation of DNA databases of individuals for the purpose of criminal investigation. Many ethical and legal problems arise in the preparation of a DNA database, and these problems are especially important when one analyses the legal regulations on the subject. In this paper three main groups of possibilities, three systems, are analysed in relation to databases. The first system is based on a general analysis of the population; the second one is based on the taking of samples for a particular list of crimes, and a third is based only on the specific analysis of each case. The advantages and disadvantages of each system are compared and controversial issues are then examined. We found the second system to be the best choice for Spain and other European countries with a similar tradition when we weighed the rights of an individual against the public's interest in the prosecution of a crime. Key Words: DNA databases • forensic genetics • ethics PMID:10951922
Background Thousands of plants and animals possess pharmacological properties and there is an increased interest in using these materials for therapy and health maintenance. The efficacy of such applications depends critically on the use of genuine materials. From time to time, life-threatening poisoning occurs because a toxic adulterant or substitute is administered. DNA barcoding provides a definitive means of authentication and for conducting molecular systematics studies. Owing to the reduced cost of DNA authentication, the volume of DNA barcodes produced for medicinal materials is on the rise and necessitates the development of an integrated DNA database. Description We have developed an integrated DNA barcode multimedia information platform, the Medicinal Materials DNA Barcode Database (MMDBD), for data retrieval and similarity search. MMDBD contains over 1000 species of medicinal materials listed in the Chinese Pharmacopoeia and American Herbal Pharmacopoeia. MMDBD also contains useful information on the medicinal materials, including resources, adulterant information, medical parts, photographs, primers used for obtaining the barcodes and key references. MMDBD can be accessed at http://www.cuhk.edu.hk/icm/mmdbd.htm. Conclusions This work provides a centralized medicinal materials DNA barcode database and bioinformatics tools for data storage, analysis and exchange for promoting the identification of medicinal materials. MMDBD has the largest collection of DNA barcodes of medicinal materials and is a useful resource for researchers in conservation, systematic study, forensic and herbal industry. PMID:20576098
Andrés-León, Eduardo; Cases, Ildefonso; Arcas, Aida; Rojas, Ana M.
The DNA Damage Response (DDR) signalling network is an essential system that protects the genome’s integrity. The DDRprot database presented here is a resource that integrates manually curated information on the human DDR network and its sub-pathways. For each particular DDR protein, we present detailed information about its function. For proteins involved in post-translational modifications (PTMs) of one another, we depict the position of the modified residue/s in the three-dimensional structures, when resolved structures are available. All this information is linked to the original publication from which it was obtained. Phylogenetic information is also shown, including time of emergence and conservation across 47 selected species, family trees and sequence alignments of homologues. The DDRprot database can be queried by different criteria: pathways, species, evolutionary age or involvement in PTMs. Sequence searches using hidden Markov models can also be used. Database URL: http://ddr.cbbio.es. PMID:27577567
... 24 Housing and Urban Development 1 2011-04-01 2011-04-01 false Public-use database and public... Public-use database and public information. (a) General. Except as provided in paragraph (c) of this section, the Secretary shall establish and make available for public use, a public-use database...
... 24 Housing and Urban Development 1 2013-04-01 2013-04-01 false Public-use database and public... Public-use database and public information. (a) General. Except as provided in paragraph (c) of this section, the Secretary shall establish and make available for public use, a public-use database...
... 24 Housing and Urban Development 1 2014-04-01 2014-04-01 false Public-use database and public... Public-use database and public information. (a) General. Except as provided in paragraph (c) of this section, the Secretary shall establish and make available for public use, a public-use database...
... 24 Housing and Urban Development 1 2012-04-01 2012-04-01 false Public-use database and public... Public-use database and public information. (a) General. Except as provided in paragraph (c) of this section, the Secretary shall establish and make available for public use, a public-use database...
... 24 Housing and Urban Development 1 2010-04-01 2010-04-01 false Public-use database and public... Public-use database and public information. (a) General. Except as provided in paragraph (c) of this section, the Secretary shall establish and make available for public use, a public-use database...
Grabowski, Marek; Langner, Karol M; Cymborowski, Marcin; Porebski, Przemyslaw J; Sroka, Piotr; Zheng, Heping; Cooper, David R; Zimmerman, Matthew D; Elsliger, Marc André; Burley, Stephen K; Minor, Wladek
The low reproducibility of published experimental results in many scientific disciplines has recently garnered negative attention in scientific journals and the general media. Public transparency, including the availability of 'raw' experimental data, will help to address growing concerns regarding scientific integrity. Macromolecular X-ray crystallography has led the way in requiring the public dissemination of atomic coordinates and a wealth of experimental data, making the field one of the most reproducible in the biological sciences. However, there remains no mandate for public disclosure of the original diffraction data. The Integrated Resource for Reproducibility in Macromolecular Crystallography (IRRMC) has been developed to archive raw data from diffraction experiments and, equally importantly, to provide related metadata. Currently, the database of our resource contains data from 2920 macromolecular diffraction experiments (5767 data sets), accounting for around 3% of all depositions in the Protein Data Bank (PDB), with their corresponding partially curated metadata. IRRMC utilizes distributed storage implemented using a federated architecture of many independent storage servers, which provides both scalability and sustainability. The resource, which is accessible via the web portal at http://www.proteindiffraction.org, can be searched using various criteria. All data are available for unrestricted access and download. The resource serves as a proof of concept and demonstrates the feasibility of archiving raw diffraction data and associated metadata from X-ray crystallographic studies of biological macromolecules. The goal is to expand this resource and include data sets that failed to yield X-ray structures in order to facilitate collaborative efforts that will improve protein structure-determination methods and to ensure the availability of 'orphan' data left behind for various reasons by individual investigators and/or extinct structural genomics
... From the Federal Register Online via the Government Publishing Office ] Part III Consumer Product Safety Commission 16 CFR Part 1102 Publicly Available Consumer Product Safety Information Database...; ] CONSUMER PRODUCT SAFETY COMMISSION 16 CFR Part 1102 Publicly Available Consumer Product Safety...
The current project aims to chemically index the content of public genomic databases to make these data accessible in relation to other publicly available, chemically-indexed toxicological information.
Gershaw, Cassandra J; Schweighardt, Andrew J; Rourke, Linda C; Wallace, Margaret M
DNA evidence is widely recognized as an invaluable tool in the process of investigation and identification, as well as one of the most sought after types of evidence for presentation to a jury. In the United States, the development of state and federal DNA databases has greatly impacted the forensic community by creating an efficient, searchable system that can be used to eliminate or include suspects in an investigation based on matching DNA profiles - the profile already in the database to the profile of the unknown sample in evidence. Recent changes in legislation have begun to allow for the possibility to expand the parameters of DNA database searches, taking into account the possibility of familial searches. This article discusses prospective positive outcomes of utilizing familial DNA searches and acknowledges potential negative outcomes, thereby presenting both sides of this very complicated, rapidly evolving situation.
Plazzer, John-Paul; Macrae, Finlay
In this chapter we aim to provide an overview of DNA variant databases, commonly known as Locus-Specific Databases (LSDBs), or Gene-Disease Specific Databases (GDSDBs), but the term variant database will be used for simplicity. We restrict this overview to germ-line variants, particularly as related to Mendelian diseases, which are diseases caused by a variant in a single gene. Common difficulties associated with variant databases and some proposed solutions are reviewed. Finally, systems where technical solutions have been implemented are discussed. This work will be useful for anyone wishing to establish their own variant database, or to learn about the global picture of variant databases, and the technical challenges to be overcome.
Plant natural products have been intensively investigated during the past decades with a considerable amount of generated data. Databases are subsequently developed to facilitate the management and analysis of accumulated information including plant species, chemical compounds, structures and bioactivities. With the support of databases, the screening of novel bioactivities for plant natural products can benefit from advanced computational methods to accelerate the progress of drug discovery. This overview describes the contents of publicly available databases useful for computational research of plant natural products. Based on the databases, quantitative structure-activity relationship models and protein-ligand docking methods can be developed and applied to analyze and screen bioactive compounds. More public and structured databases with unique contents, search functions and links to major databases are needed for efficiently exploring the chemical space of plant natural products.
Stojanov, Done; Koceski, Sašo; Mileva, Aleksandra; Koceska, Nataša; Bande, Cveta Martinovska
In order to facilitate and speed up the search of massive DNA databases, the database is indexed at the beginning, employing a mapping function. By searching through the indexed data structure, exact query hits can be identified. If the database is searched against an annotated DNA query, such as a known promoter consensus sequence, then the starting locations and the number of potential genes can be determined. This is particularly relevant if unannotated DNA sequences have to be functionally annotated. However, indexing a massive DNA database and searching an indexed data structure with millions of entries is a time-demanding process. In this paper, we propose a fast DNA database indexing and searching approach, identifying all query hits in the database, without having to examine all entries in the indexed data structure, limiting the maximum length of a query that can be searched against the database. By applying the proposed indexing equation, the whole human genome could be indexed in 10 hours on a personal computer, under the assumption that there is enough RAM to store the indexed data structure. Analysing the methodology proposed by Reneker, we observed that hits at starting positions [Formula: see text] are not reported, if the database is searched against a query shorter than [Formula: see text] nucleotides, such that [Formula: see text] is the length of the DNA database words being mapped and [Formula: see text] is the length of the query. A solution of this drawback is also presented.
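The general idea of indexing a DNA database with a mapping function and then locating exact query hits can be sketched as follows. This is a generic fixed-length word (k-mer) index that reads each word as a base-4 number; it is an assumption-laden illustration, not the indexing equation proposed in the paper nor Reneker's method, and all function names are hypothetical.

```python
from collections import defaultdict

BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(word):
    """Map a DNA word to an integer by reading it as a base-4 number."""
    value = 0
    for base in word:
        value = value * 4 + BASE_CODE[base]
    return value


def build_index(database, k):
    """Index every length-k word of the database by its integer code."""
    index = defaultdict(list)
    for start in range(len(database) - k + 1):
        word = database[start:start + k]
        # Skip words containing symbols outside A, C, G, T (e.g. N).
        if all(base in BASE_CODE for base in word):
            index[encode(word)].append(start)
    return index


def find_hits(database, index, k, query):
    """Return all start positions where the query occurs exactly.

    Only the positions listed under the query's first k-mer are checked,
    so the rest of the index is never examined.
    """
    if len(query) < k:
        raise ValueError("this sketch only supports queries of length >= k")
    candidates = index.get(encode(query[:k]), [])
    return [s for s in candidates if database.startswith(query, s)]
```

With `database = "ACGTACGTTACG"` and `k = 3`, searching for `"ACG"` checks only the three positions stored under that word. Note that this sketch rejects queries shorter than the indexed word length, which is the regime where the abstract reports missed hits in the earlier methodology; handling it correctly requires the fix described in the paper.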
Gardner, D; Abato, M; Knuth, K H; DeBellis, R; Erde, S M
We have implemented a pair of database projects, one serving cortical electrophysiology and the other invertebrate neurones and recordings. The design for each combines aspects of two proven schemes for information interchange. The journal article metaphor determined the type, scope, organization and quantity of data to comprise each submission. Sequence databases encouraged intuitive tools for data viewing, capture, and direct submission by authors. Neurophysiology required transcending these models with new datatypes. Time-series, histogram and bivariate datatypes, including illustration-like wrappers, were selected by their utility to the community of investigators. As interpretation of neurophysiological recordings depends on context supplied by metadata attributes, searches are via visual interfaces to sets of controlled-vocabulary metadata trees. Neurones, for example, can be specified by metadata describing functional and anatomical characteristics. Permanence is advanced by data model and data formats largely independent of contemporary technology or implementation, including Java and the XML standard. All user tools, including dynamic data viewers that serve as a virtual oscilloscope, are Java-based, free, multiplatform, and distributed by our application servers to any contemporary networked computer. Copyright is retained by submitters; viewer displays are dynamic and do not violate copyright of related journal figures. Panels of neurophysiologists view and test schemas and tools, enhancing community support.
Qadir, Hemin; Kozaitis, S. P.; Ali, Ehsan
We presented a system to display nighttime imagery with natural colors using a public database of images. We initially combined two spectral bands of images, thermal and visible, to enhance night vision imagery; however, the fused image gave an unnatural color appearance. Therefore, a color transfer based on a look-up table (LUT) was used to replace the false color appearance with a colormap derived from a daytime reference image obtained from a public database using the GPS coordinates of the vehicle. Because of the computational demand in deriving the colormap from the reference image, we created an additional local database of colormaps. Reference images from the public database were compared to a compact local database to retrieve one of a limited number of colormaps that represented several driving environments. Each colormap in the local database was stored with the image from which it was derived. To retrieve a colormap, we compared the histogram of the fused image with histograms of images in the local database. The colormap of the best match was then used for the fused image. Continuously selecting and applying colormaps using this approach offered a convenient way to color night vision imagery.
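The histogram-based retrieval step can be sketched roughly as below. The abstract does not say which similarity measure was used, so histogram intersection is an assumption here, as are the bin count, the dictionary layout of the local database, and the function names.

```python
def histogram(pixels, bins=8, max_value=255):
    """Normalised intensity histogram of a non-empty flat list of pixel values."""
    counts = [0] * bins
    for p in pixels:
        counts[min(p * bins // (max_value + 1), bins - 1)] += 1
    total = len(pixels)
    return [c / total for c in counts]


def intersection(h1, h2):
    """Histogram intersection; equals 1.0 for identical normalised histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def select_colormap(fused_pixels, local_db):
    """Return the colormap whose stored image histogram best matches the fused image."""
    h = histogram(fused_pixels)
    best = max(local_db, key=lambda entry: intersection(h, entry["histogram"]))
    return best["colormap"]


# Hypothetical local database: each colormap is stored with the histogram
# of the image it was derived from.
local_db = [
    {"colormap": "urban", "histogram": histogram([5, 15, 25, 35])},
    {"colormap": "rural", "histogram": histogram([220, 230, 240, 250])},
]
print(select_colormap([10, 20, 30, 200], local_db))
```

Precomputing and storing the histograms keeps the per-frame cost to one histogram plus a handful of comparisons, which fits the stated goal of avoiding colormap derivation at run time.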
Zamir, Ashira; Dell'Ariccia-Carmon, Aviva; Zaken, Neomi; Oz, Carla
The Israel Police DNA database, also known as IPDIS (Israel Police DNA Index System), has been operating since February 2007. During that time more than 135,000 reference samples have been uploaded and more than 2000 hits reported. We have developed an effective semi-automated system that includes two automated punchers, three liquid handler robots and four genetic analyzers. An in-house LIMS program enables full tracking of every sample through the entire process of registration, pre-PCR handling, analysis of profiles, uploading to the database, hit reports and ultimately storage. The LIMS is also responsible for the future tracking of samples and their profiles to be expunged from the database according to the Israeli DNA legislation. The database is administered by an in-house developed software program, where reference and evidentiary profiles are uploaded, stored, searched and matched. The DNA database has proven to be an effective investigative tool which has gained the confidence of the Israeli public and on which the Israel National Police force has grown to rely.
Savige, Judy; Dalgleish, Raymond; Cotton, Richard Gh; den Dunnen, Johan T; Macrae, Finlay; Povey, Sue
A recent review identified 60 common inherited renal diseases caused by DNA variants in 132 different genes. These diseases can be diagnosed with DNA sequencing, but each gene probably also has a thousand normal variants. Many more normal variants have been characterised by individual laboratories than are reported in the literature or found in publicly accessible collections. At present, testing laboratories must assess each novel change they identify for pathogenicity, even when this has been done elsewhere previously, and the distinction between normal and disease-associated variants is particularly an issue with the recent surge in exomic sequencing and gene discovery projects. The Human Variome Project recommends the establishment of gene-specific DNA variant databases to facilitate the sharing of DNA variants and decisions about likely disease causation. Databases improve diagnostic accuracy and testing efficiency, and reduce costs. They also help with genotype-phenotype correlations and predictive algorithms. The Human Variome Project advocates databases that use standardised descriptions, are up-to-date, include clinical information and are freely available. Currently, the genes affected in the most common inherited renal diseases correspond to 350 different variant databases, many of which are incomplete or have insufficient clinical details for genotype-phenotype correlations. Assistance is needed from nephrologists to maximise the usefulness of these databases for the diagnosis and management of inherited renal disease.
Steele, Christopher D; Balding, David J
When evaluating the weight of evidence (WoE) for an individual to be a contributor to a DNA sample, an allele frequency database is required. The allele frequencies are needed to inform about genotype probabilities for unknown contributors of DNA to the sample. Typically databases are available from several populations, and a common practice is to evaluate the WoE using each available database for each unknown contributor. Often the most conservative WoE (most favourable to the defence) is the one reported to the court. However the number of human populations that could be considered is essentially unlimited and the number of contributors to a sample can be large, making it impractical to perform every possible WoE calculation, particularly for complex crime scene profiles. We propose instead the use of only the database that best matches the ancestry of the queried contributor, together with a substantial FST adjustment. To investigate the degree of conservativeness of this approach, we performed extensive simulations of one- and two-contributor crime scene profiles, in the latter case with, and without, the profile of the second contributor available for the analysis. The genotypes were simulated using five population databases, which were also available for the analysis, and evaluations of WoE using our heuristic rule were compared with several alternative calculations using different databases. Using FST=0.03, we found that our heuristic gave WoE more favourable to the defence than alternative calculations in well over 99% of the comparisons we considered; on average the difference in WoE was just under 0.2 bans (orders of magnitude) per locus. The degree of conservativeness of the heuristic rule can be adjusted through the FST value. We propose the use of this heuristic for DNA profile WoE calculations, due to its ease of implementation, and efficient use of the evidence while allowing a flexible degree of conservativeness.
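To make the role of the FST adjustment concrete, the sketch below computes a single-contributor WoE in bans using the standard Balding-Nichols genotype probabilities. The abstract does not state the formula, so its use here is an assumption; the sketch also ignores dropout, mixtures and everything else the authors' simulations cover, and the allele frequencies are invented for illustration.

```python
import math


def match_probability(genotype, freqs, fst):
    """FST-adjusted probability that an unknown person has this genotype,
    given that the queried contributor has it (Balding-Nichols form)."""
    a, b = genotype
    denom = (1 + fst) * (1 + 2 * fst)
    if a == b:  # homozygote
        p = freqs[a]
        return (2 * fst + (1 - fst) * p) * (3 * fst + (1 - fst) * p) / denom
    # heterozygote
    return 2 * (fst + (1 - fst) * freqs[a]) * (fst + (1 - fst) * freqs[b]) / denom


def woe_bans(profile, freqs_by_locus, fst):
    """Weight of evidence in bans (log10 likelihood ratio), summed over loci."""
    return sum(
        -math.log10(match_probability(genotype, freqs_by_locus[locus], fst))
        for locus, genotype in profile.items()
    )


# Hypothetical two-locus profile and allele frequency database.
profile = {"L1": ("12", "14"), "L2": ("9", "9")}
freqs = {"L1": {"12": 0.1, "14": 0.2}, "L2": {"9": 0.15}}
for fst in (0.0, 0.03):
    print(fst, round(woe_bans(profile, freqs, fst), 3))
```

With FST set to 0 the expressions reduce to the familiar p² and 2pq; raising FST increases the match probability and therefore lowers the WoE, which is the sense in which a larger FST is more favourable to the defence.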
Mochizuki, Takako; Tanizawa, Yasuhiro; Fujisawa, Takatomo; Ohta, Tazro; Nikoh, Naruo; Shimizu, Tokurou; Toyoda, Atsushi; Fujiyama, Asao; Kurata, Nori; Nagasaki, Hideki; Kaminuma, Eli; Nakamura, Yasukazu
With the rapid advances in next-generation sequencing (NGS), datasets for DNA polymorphisms among various species and strains have been produced, stored, and distributed. However, reliability varies among these datasets because the experimental and analytical conditions used differ among assays. Furthermore, such datasets have been frequently distributed from the websites of individual sequencing projects. It is desirable to integrate DNA polymorphism data into one database featuring uniform quality control that is distributed from a single platform at a single place. DNA polymorphism annotation database (DNApod; http://tga.nig.ac.jp/dnapod/) is an integrated database that stores genome-wide DNA polymorphism datasets acquired under uniform analytical conditions, and this includes uniformity in the quality of the raw data, the reference genome version, and evaluation algorithms. DNApod genotypic data are re-analyzed whole-genome shotgun datasets extracted from sequence read archives, and DNApod distributes genome-wide DNA polymorphism datasets and known-gene annotations for each DNA polymorphism. This new database was developed for storing genome-wide DNA polymorphism datasets of plants, with crops being the first priority. Here, we describe our analyzed data for 679, 404, and 66 strains of rice, maize, and sorghum, respectively. The analytical methods are available as a DNApod workflow in an NGS annotation system of the DNA Data Bank of Japan and a virtual machine image. Furthermore, DNApod provides tables of links of identifiers between DNApod genotypic data and public phenotypic data. To advance the sharing of organism knowledge, DNApod offers basic and ubiquitous functions for multiple alignment and phylogenetic tree construction by using orthologous gene information. PMID:28234924
Adams, Michael Q.
Acquaints information professionals with Digital Equipment Corporation's compact optical disk read-only-memory (CDROM) search and retrieval software and growing library of CDROM database publications (COMPENDEX, Chemical Abstracts Services). Highlights include MicroBASIS, boolean operators, range operators, word and phrase searching, proximity…
The Prototype Food and Nutrient Database for Dietary Studies (Prototype FNDDS) Branded Food Products Database for Public Health is a proof of concept database. The database contains a small selection of food products which is being used to exhibit the approach for incorporation of the Branded Food ...
Malbet, Fabien; Mella, Guillaume; Lawson, Peter; Taillifet, Esther; Lafrasse, Sylvain
Optical long baseline interferometry is a technique that has generated almost 850 refereed papers to date. The targets span a large variety of objects from planetary systems to extragalactic studies and all branches of stellar physics. We have created a database hosted by the JMMC and connected to the Optical Long Baseline Interferometry Newsletter (OLBIN) web site using MySQL and a collection of XML or PHP scripts in order to store and classify these publications. Each entry is defined by its ADS bibcode and includes basic ADS information and metadata. The metadata are specified by tags sorted in categories: interferometric facilities, instrumentation, wavelength of operation, spectral resolution, type of measurement, target type, and paper category, for example. The whole OLBIN publication list has been processed and we present how the database is organized and can be accessed. We use this tool to generate statistical plots of interest for the community in optical long baseline interferometry.
A 20-year-old database of scientific publications by NCI at Frederick, FNLCR, and affiliated employees has gotten a significant facelift. Maintained by the Scientific Library, the redesigned database—which is linked from each of the Scientific Library’s web pages—offers features that were not available in previous versions, such as additional search limits and non-traditional metrics for scholarly and scientific publishing known as altmetrics.
... COMMISSION Publicly Available Consumer Product Safety Information Database: Notice of Public Web Conferences... Commission (``Commission,'' ``CPSC,'' or ``we'') is announcing two Web conferences to demonstrate to...''). The Web conferences will be webcast live from the Commission's headquarters in Bethesda, MD via...
Johnson, Paul; Williams, Robin; Martin, Paul
This paper is based on a current study of the growing police use of the epistemic authority of molecular biology for the identification of criminal suspects in support of crime investigation. It discusses the development of DNA profiling and the establishment and development of the UK National DNA Database (NDNAD) as an instance of the ‘scientification of police work’ (Ericson and Shearing 1986) in which the police uses of science and technology have a recursive effect on their future development. The NDNAD, owned by the Association of Chief Police Officers of England and Wales, is the first of its kind in the world and currently contains the genetic profiles of more than 2 million people. The paper provides a framework for the examination of this socio-technical innovation, begins to tease out the dense and compact history of the database and accounts for the way in which changes and developments across disparate scientific, governmental and policing contexts, have all contributed to the range of uses to which it is put. PMID:16467921
Price, Dana C; Bhattacharya, Debashish
Dinoflagellates are dominant members of the plankton and play key roles in ocean ecosystems as primary producers, predators, parasites, coral photobionts, and causative agents of algal blooms that produce toxins harmful to humans and commercial fisheries. These unicellular protists exhibit remarkable trophic and morphological diversity and include species with some of the largest reported nuclear genomes. Despite their high ecological and economic importance, comprehensive genome (or transcriptome) based dinoflagellate trees of life are few in number. To address this issue, we used recently generated public sequencing data, including from the Moore Microbial Eukaryote Transcriptome Sequencing Project, to identify dinoflagellate-specific ortholog groups. These orthologs were combined to create a broadly sampled and highly resolved phylogeny of dinoflagellates. Our results emphasize the scope and utility of public sequencing databases in creating broad and robust phylogenies for large and complex taxonomic lineages, while also providing unique insights into the evolution of thecate dinoflagellates.
Osypov, Alexander A; Krutinin, Gleb G; Kamzolova, Svetlana G
The electrostatic properties of genome DNA influence its interactions with different proteins, in particular, the regulation of transcription by RNA-polymerases. DEPPDB--DNA Electrostatic Potential Properties Database--was developed to hold and provide all available information on the electrostatic properties of genome DNA combined with its sequence and annotation of biological and structural properties of genome elements and whole genomes. Genomes in DEPPDB are organized on a taxonomical basis. Currently, the database contains all the completely sequenced bacterial and viral genomes according to NCBI RefSeq. General properties of the genome DNA electrostatic potential profile and principles of its formation are revealed. This potential correlates with the GC content but does not correspond to it exactly and strongly depends on both the sequence arrangement and its context (flanking regions). Analysis of the promoter regions for bacterial and viral RNA polymerases revealed a correspondence between the scale of these proteins' physical properties and electrostatic profile patterns. We also discovered a direct correlation between the potential value and the binding frequency of RNA polymerase to DNA, supporting the idea of the role of electrostatics in these interactions. This matches a pronounced tendency of the promoter regions to possess higher values of the electrostatic potential.
Osypov, Alexander A; Krutinin, Gleb G; Krutinina, Eugenia A; Kamzolova, Svetlana G
Electrostatic properties of genome DNA are important to its interactions with different proteins, in particular, related to transcription. DEPPDB - DNA Electrostatic Potential (and other Physical) Properties Database - provides information on the electrostatic and other physical properties of genome DNA combined with its sequence and annotation of biological and structural properties of genomes and their elements. Genomes are organized on a taxonomical basis, supporting comparative and evolutionary studies. Currently, DEPPDB contains all completely sequenced bacterial, viral, mitochondrial, and plastid genomes according to the NCBI RefSeq, and some model eukaryotic genomes. Data for promoters, regulation sites, binding proteins, etc., are incorporated from established DBs and literature. The database is complemented by analytical tools. Calculations on user-supplied sequences are available. Case studies discovered electrostatics complementing DNA bending in E. coli plasmid BNT2 promoter functioning, possibly affecting host-environment metabolic switch. Transcription factor binding sites gravitate to high potential regions, confirming the universal importance of electrostatics in protein-DNA interactions beyond the classical promoter-RNA polymerase recognition and regulation. Other genome elements, such as terminators, also show electrostatic peculiarities. Most intriguing are gene starts, exhibiting taxonomic correlations. The need for studies of genome electrostatic properties is discussed.
Harmsen, Dag; Dostal, Stefan; Roth, Andreas; Niemann, Stefan; Rothgänger, Jörg; Sammeth, Michael; Albert, Jürgen; Frosch, Matthias; Richter, Elvira
Background Molecular identification of Mycobacterium species has two primary advantages when compared to phenotypic identification: rapid turn-around time and improved accuracy. The information content of the 5' end of the 16S ribosomal RNA gene (16S rDNA) is sufficient for identification of most bacterial species. However, reliable sequence-based identification is hampered by many faulty and some missing sequence entries in publicly accessible databases. Methods In order to establish an improved 16S rDNA sequence database for the identification of clinical and environmental isolates, we sequenced both strands of the 5' end of 16S rDNA (Escherichia coli positions 54 to 510) from 199 mycobacterial culture collection isolates. All validly described species (n = 89; up to March 21, 2000) and nearly all published sequevar variants were included. If the 16S rDNA sequences were not discriminatory, the internal transcribed spacer (ITS) region sequences (n = 84) were also determined. Results Using 5'-16S rDNA sequencing a total of 64 different mycobacterial species (71.9%) could be identified. With the additional input of the ITS sequence, a further 16 species or subspecies could be differentiated. Only Mycobacterium tuberculosis complex species, M. marinum / M. ulcerans and the M. avium subspecies could not be differentiated using 5'-16S rDNA or ITS sequencing. A total of 77 culture collection strain sequences, exhibiting an overlap of at least 80% and identical by strain number to the isolates used in this study, were found in the GenBank. Comparing these with our sequences revealed that an average of 4.31 nucleotide differences (SD ± 0.57) were present. Conclusions The data from this analysis show that it is possible to differentiate most mycobacterial species by sequence analysis of partial 16S rDNA. The high-quality sequences reported here, together with ancillary information (e.g., taxonomic, medical), are available in a public database, which is currently being
Gaitan, Santiago; ten Veldhuis, Marie-claire; van de Giesen, Nick
Cities worldwide are challenged by increasing urban flood risks. Precise and realistic measures are required to decide upon investment to reduce their impacts. Obvious flooding factors affecting flood risk include sewer systems performance and urban topography. However, currently implemented sewer and topographic models do not provide realistic predictions of local flooding occurrence during heavy rain events. Assessing other factors such as spatially distributed rainfall and socioeconomic characteristics may help to explain probability and impacts of urban flooding. Several public databases were analyzed: complaints about flooding made by citizens; rainfall depths (15 min and 100 ha spatio-temporal resolution); grids describing number of inhabitants, income, and housing price (1 ha and 25 ha resolution); and building age. Data analysis was done using Python and GIS programming, and included spatial indexing of data, cluster analysis, and multivariate regression on the complaints. Complaints were used as a proxy to characterize flooding impacts. The cluster analysis, run for all the variables except the complaints, grouped part of the grid-cells of central Amsterdam into a highly differentiated group, covering 10% of the analyzed area, and accounting for 25% of registered complaints. The configuration of the analyzed variables in central Amsterdam coincides with a high complaint count. Remaining complaints were evenly dispersed along other groups. An adjusted R² of 0.38 in the multivariate regression suggests that explanatory power can improve if additional variables are considered. While rainfall intensity explained 4% of the incidence of complaints, population density and building age significantly explained around 20% each. Data mining of public databases proved to be a valuable tool to identify factors explaining variability in occurrence of urban pluvial flooding, though additional variables must be considered to fully explain flood risk variability.
Sim, Jeong Eun; Park, Su Jeong; Lee, Han Chul; Kim, Se-Yong; Kim, Jong Yeol; Lee, Seung Hwan
Since the Korean criminal DNA database was launched in 2010, we have focused on establishing an automated DNA database profiling system that analyzes short tandem repeat loci in a high-throughput and cost-effective manner. We established a DNA database profiling system without DNA purification using a direct PCR buffer system. The quality of direct PCR procedures was compared with that of conventional PCR system under their respective optimized conditions. The results revealed not only perfect concordance but also an excellent PCR success rate, good electropherogram quality, and an optimal intra/inter-loci peak height ratio. In particular, the proportion of DNA extraction required due to direct PCR failure could be minimized to <3%. In conclusion, the newly developed direct PCR system can be adopted for automated DNA database profiling systems to replace or supplement conventional PCR system in a time- and cost-saving manner.
Mayer-Hasselwander, H. A.; Bennett, K.; Bignami, G. F.; Bloemen, J. B. G. M.; Buccheri, R.; Caraveo, P. A.; Hermsen, W.; Kanbach, G.; Lebrun, F.; Paul, J. A.
The data obtained by the gamma ray satellite COS-B was processed, condensed and integrated together with the relevant mission and experiment parameters into the Final COS-B Database. The database contents and the access programs available with the database are outlined. The final sky coverage and a presentation of the large scale distribution of the observed Milky Way emission are given. The database is announced to be available through the European Space Agency.
Yu, Jingyin; Dossa, Komivi; Wang, Linhai; Zhang, Yanxin; Wei, Xin; Liao, Boshou; Zhang, Xiurong
Microsatellite DNAs (or SSRs) are important genomic components involved in many important biological functions. SSRs have been extensively exploited as molecular markers for diverse applications including genetic diversity, linkage/association mapping of gene/QTL, marker-assisted selection, variety identification and evolution analysis. However, a comprehensive database or web service for studying microsatellite DNAs and marker development in plants is lacking. Here, we developed a database, PMDBase, which integrates large amounts of microsatellite DNAs from genome sequenced plant species and includes a web service for microsatellite DNAs identification. In PMDBase, 26 230 099 microsatellite DNAs were identified spanning 110 plant species. Up to three pairs of primers were supplied for every microsatellite DNA. For 81 species, genomic features of the microsatellite DNAs (genic or non-genic) were supplied with the corresponding genes or transcripts from public databases. Microsatellite DNAs can be explored through browsing and searching modules with a user-friendly web interface and customized software. Furthermore, we developed MISAweb and embedded Primer3web to help users to identify microsatellite DNAs and design corresponding primers in their own genomic sequences online. All datasets of microsatellite DNAs can be downloaded conveniently. PMDBase will be updated regularly with new available genome data and can be accessed freely via the address http://www.sesame-bioinfo.org/PMDBase. PMID:27733507
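Microsatellite identification of the kind described above can be illustrated with a short regular-expression scan. This is a simplified sketch, not the MISAweb implementation: the single `min_repeats` threshold for all motif lengths and the function name `find_ssrs` are assumptions made for brevity, whereas real tools usually set a separate minimum repeat count per motif length.

```python
import re


def find_ssrs(seq, min_repeats=4, max_motif=6):
    """Find perfect tandem repeats of motifs 1..max_motif bases long.

    Returns a list of (start, motif, repeat_count) tuples. The lazy
    quantifier makes the shortest repeating unit win, so "ATATATAT"
    is reported as (AT)4 rather than (ATAT)2.
    """
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        hits.append((m.start(), motif, len(m.group(0)) // len(motif)))
    return hits


print(find_ssrs("GGCATATATATCCGT"))
```

The reported start positions and repeat lengths are what a primer-design step would then use to pick flanking sequence on either side of each repeat.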
Milanowska, Kaja; Rother, Kristian; Bujnicki, Janusz M.
DNA is continuously exposed to many different damaging agents such as environmental chemicals, UV light, ionizing radiation, and reactive cellular metabolites. DNA lesions can result in different phenotypical consequences ranging from a number of diseases, including cancer, to cellular malfunction, cell death, or aging. To counteract the deleterious effects of DNA damage, cells have developed various repair systems, including biochemical pathways responsible for the removal of single-strand lesions such as base excision repair (BER) and nucleotide excision repair (NER) or specialized polymerases temporarily taking over lesion-arrested DNA polymerases during the S phase in translesion synthesis (TLS). There are also other mechanisms of DNA repair such as homologous recombination repair (HRR), nonhomologous end-joining repair (NHEJ), or DNA damage response system (DDR). This paper reviews bioinformatics resources specialized in disseminating information about DNA repair pathways, proteins involved in repair mechanisms, damaging agents, and DNA lesions. PMID:22091405
Marjanović, Damir; Konjhodžić, Rijad; Butorac, Sara Sanela; Drobnič, Katja; Merkaš, Siniša; Lauc, Gordan; Primorac, Damir; Anđelinović, Šimun; Milosavljević, Mladen; Karan, Željko; Vidović, Stojko; Stojković, Oliver; Panić, Bojana; Vučetić Dragović, Anđelka; Kovačević, Sandra; Jakovski, Zlatko; Asplen, Chris; Primorac, Dragan
The European Network of Forensic Science Institutes (ENFSI) recommended the establishment of forensic DNA databases and specific implementation and management legislations for all EU/ENFSI members. Therefore, forensic institutions from Bosnia and Herzegovina, Serbia, Montenegro, and Macedonia launched a wide set of activities to support these recommendations. To assess the current state, a regional expert team completed detailed screening and investigation of the existing forensic DNA data repositories and associated legislation in these countries. The scope also included relevant concurrent projects and a wide spectrum of different activities in relation to forensics DNA use. The state of forensic DNA analysis was also determined in the neighboring Slovenia and Croatia, which already have functional national DNA databases. There is a need for a ‘regional supplement’ to the current documentation and standards pertaining to forensic application of DNA databases, which should include regional-specific preliminary aims and recommendations. PMID:21674821
Racz, Rebecca; He, Yongqun
A DNA vaccine is a vaccine that uses a mammalian expression vector to express one or more protein antigens and is administered in vivo to induce an adaptive immune response. Since the 1990s, a significant amount of research has been performed on DNA vaccines and the mechanisms behind them. To meet the needs of the DNA vaccine research community, we created DNAVaxDB ( http://www.violinet.org/dnavaxdb ), the first Web-based database and analysis resource of experimentally verified DNA vaccines. All the data in DNAVaxDB, which include plasmids, antigens, vaccines, and sources, are manually curated and experimentally verified. This chapter describes the DNAVaxDB system in detail and shows how the DNA vaccine database, combined with the Vaxign vaccine design tool, can be used for rational design of a DNA vaccine against a pathogen, such as Mycobacterium bovis.
Baeta, Miriam; Martínez-Jarreta, Begoña
One of the most contentious issues regarding the use of deoxyribonucleic acid (DNA) in the legal sphere concerns the creation of DNA databases. Until relatively recently, Spain did not have a law to support the establishment of a national DNA profile bank for forensic purposes and to preserve the fundamental rights of subjects whose data are archived therein. The regulatory law of police databases regarding identifiers obtained from DNA, approved in 2007, fills this void in the Spanish legislation and responds to the continuing need to adapt the laws to scientific and technological progress.
Background Although bivalves are among the most-studied marine organisms because of their ecological role and economic importance, very little information is available on the genome sequences of oyster species. This report documents three large-scale cDNA sequencing projects for the Pacific oyster Crassostrea gigas initiated to provide a large number of expressed sequence tags that were subsequently compiled in a publicly accessible database. This resource allowed for the identification of a large number of transcripts and provides valuable information for ongoing investigations of tissue-specific and stimulus-dependent gene expression patterns. These data are crucial for constructing comprehensive DNA microarrays, identifying single nucleotide polymorphisms and microsatellites in coding regions, and for identifying genes when the entire genome sequence of C. gigas becomes available. Description In the present paper, we report the production of 40,845 high-quality ESTs that identify 29,745 unique transcribed sequences consisting of 7,940 contigs and 21,805 singletons. All of these new sequences, together with existing public sequence data, have been compiled into a publicly-available Website http://public-contigbrowser.sigenae.org:9090/Crassostrea_gigas/index.html. Approximately 43% of the unique ESTs had significant matches against the SwissProt database and 27% were annotated using Gene Ontology terms. In addition, we identified a total of 208 in silico microsatellites from the ESTs, with 173 having sufficient flanking sequence for primer design. We also identified a total of 7,530 putative in silico, single-nucleotide polymorphisms using existing and newly-generated EST resources for the Pacific oyster. Conclusion A publicly-available database has been populated with 29,745 unique sequences for the Pacific oyster Crassostrea gigas. The database provides many tools to search cleaned and assembled ESTs. The user may input and submit several filters, such as
Charoute, Hicham; Nahili, Halima; Abidi, Omar; Gabi, Khalid; Rouba, Hassan; Fakiri, Malika; Barakat, Abdelhamid
National and ethnic mutation databases provide comprehensive information about genetic variations reported in a population or an ethnic group. In this paper, we present the Moroccan Genetic Disease Database (MGDD), a catalogue of genetic data related to diseases identified in the Moroccan population. We used the PubMed, Web of Science and Google Scholar databases to identify available articles published until April 2013. The database is designed and implemented on a three-tier model using the MySQL relational database and the PHP programming language. To date, the database contains 425 mutations and 208 polymorphisms found in 301 genes and 259 diseases. Most Mendelian diseases in the Moroccan population follow an autosomal recessive mode of inheritance (74.17%) and affect endocrine, nutritional and metabolic physiology. The MGDD database provides reference information for researchers, clinicians and health professionals through a user-friendly Web interface. Its content should be useful for improving research in human molecular genetics, disease diagnosis and the design of association studies. MGDD can be publicly accessed at http://mgdd.pasteur.ma. PMID:23860041
Tvedebrink, Torben; Bright, Jo-Anne; Buckleton, John S; Curran, James M; Morling, Niels
Forensic DNA databases are powerful tools used for the identification of persons of interest in criminal investigations. Typically, they consist of two parts: (1) a database containing DNA profiles of known individuals and (2) a database of DNA profiles associated with crime scenes. The risk of adventitious or chance matches between crimes and innocent people increases as the number of profiles within a database grows and more data are shared between various forensic DNA databases, e.g. from different jurisdictions. The DNA profiles obtained from crime scenes are often partial because crime samples may be compromised in quantity or quality. When an individual's profile cannot be resolved from a DNA mixture, ambiguity is introduced. A wild card, F, may be used in place of an allele that has dropped out or when an ambiguous profile is resolved from a DNA mixture. Variant alleles that do not correspond to any marker in the allelic ladder or appear above or below the extent of the allelic ladder range are assigned the allele designation R for rare allele. R alleles are position specific with respect to the observed/unambiguous allele. The F and R designations are made when the exact genotype has not been determined. The F and R designations are treated as wild cards for searching, which results in an increased chance of adventitious matches. We investigated the probability of adventitious matches given these two types of wild cards.
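To make the effect of wild cards concrete, here is a minimal sketch of a locus-by-locus profile comparison in which F matches any allele. The profile layout (a dict mapping a locus name to a pair of allele designations), the example genotypes, and the function names are assumptions for illustration only. The position-specific handling of R alleles is left out, and this is not the search rule of any particular database.

```python
from itertools import permutations

WILD = "F"  # wild card standing in for an undetermined allele


def alleles_match(a, b):
    """Two allele designations match if they are equal or either is the wild card."""
    return a == b or a == WILD or b == WILD


def locus_matches(g1, g2):
    """Genotypes are unordered pairs, so try both orderings of g2 against g1."""
    return any(
        alleles_match(g1[0], x) and alleles_match(g1[1], y)
        for x, y in permutations(g2, 2)
    )


def profiles_match(p1, p2):
    """Profiles match if they share at least one locus and every shared locus matches."""
    shared = set(p1) & set(p2)
    return bool(shared) and all(locus_matches(p1[loc], p2[loc]) for loc in shared)


# Hypothetical crime-scene profile with one dropped-out allele recorded as F.
crime = {"D3S1358": ("15", "F"), "vWA": ("17", "18")}
person_a = {"D3S1358": ("15", "16"), "vWA": ("17", "18")}
person_b = {"D3S1358": ("15", "17"), "vWA": ("17", "18")}
person_c = {"D3S1358": ("14", "16"), "vWA": ("17", "18")}

# The fully resolved version of the same profile, for comparison.
resolved = {"D3S1358": ("15", "16"), "vWA": ("17", "18")}
```

In this toy data the wild-card profile is compatible with both `person_a` and `person_b`, whereas the fully resolved profile is compatible only with `person_a`. That widening of the candidate set is the mechanism behind the increased chance of adventitious matches.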
Chiu, Tsu-Pei; Yang, Lin; Zhou, Tianyin; Main, Bradley J; Parker, Stephen C J; Nuzhdin, Sergey V; Tullius, Thomas D; Rohs, Remo
Many regulatory mechanisms require a high degree of specificity in protein-DNA binding. Nucleotide sequence does not provide an answer to the question of why a protein binds only to a small subset of the many putative binding sites in the genome that share the same core motif. Whereas higher-order effects, such as chromatin accessibility, cooperativity and cofactors, have been described, DNA shape recently gained attention as another feature that fine-tunes the DNA binding specificities of some transcription factor families. Our Genome Browser for DNA shape annotations (GBshape; freely available at http://rohslab.cmb.usc.edu/GBshape/) provides minor groove width, propeller twist, roll, helix twist and hydroxyl radical cleavage predictions for the entire genomes of 94 organisms. Additional genomes can easily be added using the GBshape framework. GBshape can be used to visualize DNA shape annotations qualitatively in a genome browser track format, and to download quantitative values of DNA shape features as a function of genomic position at nucleotide resolution. As biological applications, we illustrate the periodicity of DNA shape features that are present in nucleosome-occupied sequences from human, fly and worm, and we demonstrate structural similarities between transcription start sites in the genomes of four Drosophila species.
The objective of this study was to create DNA fingerprints for the Razzle Dazzle® crape myrtle series using simple sequence repeat (SSR) markers, and compare them with the DNA fingerprints of a database made up of over 50 popular crape myrtle cultivars currently available in the trade. Data consiste...
...-counter drugs and dietary supplements should not be included in the Database because food and drugs are regulated and monitored by the U.S. Food and Drug Administration ("FDA"). The commenter notes that the..., including food and drugs. This information will include links to the appropriate government agencies that...
Background There are several reports describing thousands of SSR markers in the peanut (Arachis hypogaea L.) genome. There is a need to integrate various research reports of peanut DNA polymorphism into a single platform. Further, because of a lack of uniformity in the labeling of these markers across publications, there is some confusion about the identities of many markers. We describe below an effort to develop a central comprehensive database of polymorphic SSR markers in peanut. Findings We compiled 1,343 SSR markers as detecting polymorphism (14.5%) within a total of 9,274 markers. Amongst all polymorphic SSRs examined, we found that the AG motif (36.5%) was the most abundant, followed by AAG (12.1%), AAT (10.9%), and AT (10.3%). The mean length of SSR repeats in dinucleotide SSRs was significantly longer than that in trinucleotide SSRs. Dinucleotide SSRs showed higher polymorphism frequency for genomic SSRs when compared to trinucleotide SSRs, while for EST-SSRs, the frequency of polymorphic SSRs was higher in trinucleotide SSRs than in dinucleotide SSRs. The correlation of the length of SSR and the frequency of polymorphism revealed that the frequency of polymorphism decreased as motif repeat number increased. Conclusions The assembled polymorphic SSRs would enhance the density of the existing genetic maps of peanut, which could also be a useful source of DNA markers suitable for high-throughput QTL mapping and marker-assisted selection in peanut improvement and thus would be of value to breeders. PMID:22818284
Thai, Quan Ke; Bös, Fabian; Pleiss, Jürgen
Background TEM β-lactamases are the main cause of resistance against β-lactam antibiotics. Sequence information about TEM β-lactamases is mainly found in the NCBI peptide database and TEM mutation table at . While the TEM mutation table is manually curated by experts in the lactamase field, who guarantee reliable and consistent information, the rapidly growing sequence and annotation information from the NCBI peptide database is sometimes inconsistent. Therefore, the Lactamase Engineering Database has been developed to collect the TEM β-lactamase sequences from the NCBI peptide database and the TEM mutation table, systematically compare sequence information and naming, identify inconsistencies, and thus provide a versatile tool for reconciliation of data and for an investigation of the sequence-function relationship. Description The LacED currently provides 2399 sequence entries and 37 structure entries. Sequence information on 150 different TEM β-lactamases was derived from the TEM mutation table, which provides a unique number to each protein classified as TEM β-lactamase. 293 TEM-like proteins were found in the NCBI protein database, but only 113 TEM β-lactamases were common to both data sets. The 180 TEM β-lactamases from the NCBI protein database which have not yet been assigned to a TEM number fall in three classes: (1) 89 proteins from microbial organisms and 35 proteins from cloning or expression vectors had a new mutation profile; (2) 55 proteins had inconsistent annotation in terms of TEM assignment or reported mutation profile; (3) 39 proteins are fragments. The LacED is web accessible at and contains multisequence alignments, structure information and reconciled annotation of TEM β-lactamases. The LacED is updated weekly and supplies all data for download. Conclusion The Lactamase Engineering Database enables a systematic analysis of TEM β-lactamase sequence and annotation data from different data sources, and thus provides a valuable tool to
Since the 1980s, when DNA markers for identifying biological samples were first developed, the use of DNA evidence to convict defendants and to exonerate the wrongfully accused and wrongfully imprisoned has greatly increased. But the increase in databanks for storing DNA information on individuals convicted of certain crimes raises important legal and ethical issues on the use, collection and storage of DNA evidence. These issues have been the subject of a recent US National Commission, which will, hopefully, broaden public discourse about the future uses of DNA forensic technology.
Lopes, Tássia do Vale Cardoso; Cyrillo, Denise Cavallini; Giuntini, Eliana Bistriche; Lajolo, Franco Maria; Menezes, Elizabete Wenzel De
The article traces the evolution of the Brazilian Food Composition Database (TBCA-USP) from its creation to its next update. It characterizes the TBCA-USP database as a public good and highlights the importance of food composition data compilation as a highly cost-effective activity. It reports the social relevance of information about food composition and the importance of this database in the national context. It also indicates strategies for extending and updating the TBCA-USP.
I weigh the arguments for and against the patenting of functional DNA sequences including genes, and find the objections to be compelling. Is an outright ban on DNA patenting the right policy response? Not necessarily. Governments may wish to consider options ranging from patent law reforms to the creation of new rights. There are alternative ways to protect DNA sequences that industry may choose if DNA patenting is restricted or banned. Some of these alternatives may be more harmful than patents. Such unintended consequences of patent bans mean that we should think hard before concluding that prohibition is the only response to legitimate concerns about the appropriateness of patents in the field of human genomics. PMID:16710549
Hoffman, Sharona; Podgurski, Andy
The accelerating adoption of electronic health record (EHR) systems will have far-reaching implications for public health research and surveillance, which in turn could lead to changes in public policy, statutes, and regulations. The public health benefits of EHR use can be significant. However, researchers and analysts who rely on EHR data must proceed with caution and understand the potential limitations of EHRs. Because of clinicians' workloads, poor user-interface design, and other factors, EHR data can be erroneous, miscoded, fragmented, and incomplete. In addition, public health findings can be tainted by the problems of selection bias, confounding bias, and measurement bias. These flaws may become all the more troubling and important in an era of electronic "big data," in which a massive amount of information is processed automatically, without human checks. Thus, we conclude the paper by outlining several regulatory and other interventions to address data analysis difficulties that could result in invalid conclusions and unsound public health policies.
WASHINGTON - Today, the U.S. Environmental Protection Agency (EPA) released updated environmental and public health indicators in an online database, making information about the current and historical condition of the nation's environment and human
DSSTox Website Launch: Improving Public Access to Databases for Building Structure-Toxicity Prediction Models
Ann M. Richard
US Environmental Protection Agency, Research Triangle Park, NC, USA
Distributed: Decentralized set of standardized, field-delimited databases,...
Walsh, S J; Moss, D S; Kliem, C; Vintiner, G M
The primary aim of any DNA Database is to link individuals to unsolved offenses and unsolved offenses to each other via DNA profiling. This aim has been successfully realised during the operation of the New Zealand (NZ) DNA Databank over the past five years. The DNA Intelligence Project (DIP), a collaborative project involving NZ forensic and law enforcement agencies, interrogated the forensic case data held on the NZ DNA databank and collated it into a functional intelligence database. This database has been used to identify significant trends which direct Police and forensic personnel towards the most appropriate use of DNA technology. Intelligence is being provided in areas such as the level of usage of DNA techniques in criminal investigation, the relative success of crime scene samples and the geographical distribution of crimes. The DIP has broadened the dimensions of the information offered through the NZ DNA Databank and has furthered the understanding and investigative capability of both Police and forensic scientists. The outcomes of this research fit soundly with the current policies of 'intelligence led policing', which are being adopted by Police jurisdictions locally and overseas.
... HUMAN SERVICES Food and Drug Administration FDA's Public Database of Products With Orphan-Drug... its public database of products that have received orphan-drug designation. The Orphan Drug Act... received orphan designation were published on our public database with non-informative code names....
Bobillo, Maria Cecilia; Zimmermann, Bettina; Sala, Andrea; Huber, Gabriela; Röck, Alexander; Bandelt, Hans-Jürgen; Corach, Daniel; Parson, Walther
The study presents South American mitochondrial DNA (mtDNA) data from selected north (N = 98), central (N = 193) and south (N = 47) Argentinean populations. Sequence analysis of the complete mtDNA control region (CR, 16024-576) resulted in 288 unique haplotypes ignoring C-insertions around positions 16193, 309, and 573; the additional analysis of coding region single nucleotide polymorphisms enabled a fine classification of the described lineages. The Amerindian haplogroups were most frequent in the north and south representing more than 60% of the sequences. A slightly different situation was observed in central Argentina where the Amerindian haplogroups represented less than 50%, and the European contribution was more relevant. Particular clades of the Amerindian subhaplogroups turned out to be nearly region-specific. A minor contribution of African lineages was observed throughout the country. This comprehensive admixture of worldwide mtDNA lineages and the regional specificity of certain clades in the Argentinean population underscore the necessity of carefully selecting regional samples in order to develop a nationwide mtDNA database for forensic and anthropological purposes. The mtDNA sequencing and analysis were performed under EMPOP guidelines in order to attain high quality for the mtDNA database.
Martin, P D; Schmitter, H; Schneider, P M
The introduction of DNA analysis to forensic science brought with it a number of choices for analysis, not all of which were compatible. As laboratories throughout Europe were eager to use the new technology, different systems became routine in different laboratories and, consequently, there was no basis for the exchange of results. A period of co-operation then started in which a nucleus of forensic scientists agreed on a uniform system. This collaboration spread to incorporate most of the established forensic science laboratories in Europe and continued through two major changes in the technology. At each step agreement was reached on which systems to use. From the beginning it was realised that DNA databases would provide the criminal justice systems with an efficient way of solving crime, and consequently some local databases were created. It was not until the introduction of the amplification technology linked to the analysis of short tandem repeats that a sufficiently sensitive and robust system was available for the formation of efficient and effective DNA databases. Comprehensive legislation enacted in the UK in 1995 enabled forensic scientists to set up the first national DNA database, which would hold personal DNA profiles together with results obtained from crime scenes. Other countries quickly followed, but in some the legislation has severely restricted the amount and type of data which can be retained and, therefore, the effectiveness of the databases is limited. The widespread use of commercially produced multiplex kits has produced a situation in which nearly all European laboratories are using compatible systems and there is, therefore, the potential for the introduction of a pan-European DNA database. However, the exchange of results between countries is hampered by the differing legislation that currently exists.
Blanford, William; Chambers, Jay G.
The Institute for Research on Educational Finance and Governance (IFG) has designed and implemented a major survey of public and private schools in the six-county San Francisco Bay Area which focuses on organizational dimensions in elementary and secondary schools. Private schools include Catholic parochial and Catholic private schools,…
Das, Raima; Ghosh, Sankar Kumar
The DNA repair pathway is a primary defense system that eliminates a wide variety of DNA damage. Any deficiency in it is likely to cause the chromosomal instability that leads to cell malfunction and tumorigenesis. Genetic polymorphisms in DNA repair genes have demonstrated a significant association with cancer risk. Our study attempts to give an overview of germline polymorphisms in the DNA repair genes by using the Exome Aggregation Consortium (ExAC) database as well as the Human Gene Mutation Database (HGMD) to evaluate the disease link, particularly in cancer. We found that the ExAC DNA repair dataset (which consists of 228 DNA repair genes) comprises 30.4% missense, 12.5% dbSNP-reported and 3.2% ClinVar-significant variants. Of all the missense variants, 27% have the deleterious SIFT score of 0.00 and 6% carry the most damaging PolyPhen-2 score of 1.00, thus affecting protein structure and function. However, as per HGMD, only a fraction (1.2%) of ExAC DNA repair variants was found to be cancer-related, indicating that the remaining variants reported in both databases need further analysis. This, in turn, may broaden the spectrum of reported cancer-linked variants in the DNA repair genes present in the ExAC database. Moreover, further in silico functional assays of the identified cancer-associated variants, which are essential to establish their actual biological significance, may shed some light on targeted drug development in the near future.
FastStats is a site that provides quick and easy access to public health statistics. The freely available website is maintained by the Centers for Disease Control and Prevention's National Center for Health Statistics. Users can browse alphabetically by topic and state/territory or search across the National Center for Health Statistics site. A description of the browsing capabilities and sample searches are presented.
Several problems in forensic genetics require a representative model of a forensic DNA database. Obtaining an accurate representation of the offender database can be difficult, since databases typically contain groups of persons with unregistered ethnic origins in unknown proportions. We propose to estimate the allele frequencies of the subpopulations comprising the offender database and their proportions from the database itself using a latent variable approach. We present a model for which parameters can be estimated using the expectation maximization (EM) algorithm. This approach does not rely on relatively small and possibly unrepresentative population surveys, but is driven by the actual genetic composition of the database only. We fit the model to a snapshot of the Dutch offender database (2014), which contains close to 180,000 profiles, and find that three subpopulations suffice to describe a large fraction of the heterogeneity in the database. We demonstrate the utility and reliability of the approach with three applications. First, we use the model to predict the number of false leads obtained in database searches. We assess how well the model predicts the number of false leads obtained in mock searches in the Dutch offender database, both for the case of familial searching for first degree relatives of a donor and searching for contributors to three-person mixtures. Second, we study the degree of partial matching between all pairs of profiles in the Dutch database and compare this to what is predicted using the latent variable approach. Third, we use the model to provide evidence to support that the Dutch practice of estimating match probabilities using the Balding-Nichols formula with a native Dutch reference database and θ=0.03 is conservative.
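As a small worked illustration of the Balding-Nichols formula mentioned above, the sketch below computes the conditional match probability for a single-locus genotype in its standard textbook form, with the coancestry coefficient θ defaulting to 0.03. The function name and the example allele frequencies are assumptions for illustration. The latent-variable EM model of the abstract is not reproduced here.

```python
def match_probability(p_a, p_b=None, theta=0.03):
    """Balding-Nichols conditional genotype match probability at one locus.

    p_a, p_b: allele frequencies; omit p_b for a homozygous genotype.
    theta:    coancestry coefficient (0.03 is the value quoted in the abstract).
    """
    denom = (1 + theta) * (1 + 2 * theta)
    if p_b is None:
        # Homozygote aa
        return (2 * theta + (1 - theta) * p_a) * (3 * theta + (1 - theta) * p_a) / denom
    # Heterozygote ab
    return 2 * (theta + (1 - theta) * p_a) * (theta + (1 - theta) * p_b) / denom


# Hypothetical allele frequencies, chosen only to show the effect of theta.
homozygote_plain = match_probability(0.1, theta=0.0)        # reduces to p_a ** 2
homozygote_bn = match_probability(0.1)                      # theta = 0.03
heterozygote_plain = match_probability(0.1, 0.2, theta=0.0) # reduces to 2 * p_a * p_b
heterozygote_bn = match_probability(0.1, 0.2)               # theta = 0.03
```

With θ = 0 the formula reduces to the Hardy-Weinberg proportions. For these example frequencies, θ = 0.03 yields larger match probabilities, which is the sense in which the correction favours the defendant.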
Paez-Espino, David; Chen, I.-Min A.; Palaniappan, Krishna; Ratner, Anna; Chu, Ken; Szeto, Ernest; Pillay, Manoj; Huang, Jinghua; Markowitz, Victor M.; Nielsen, Torben; Huntemann, Marcel; K. Reddy, T. B.; Pavlopoulos, Georgios A.; Sullivan, Matthew B.; Campbell, Barbara J.; Chen, Feng; McMahon, Katherine; Hallam, Steve J.; Denef, Vincent; Cavicchioli, Ricardo; Caffrey, Sean M.; Streit, Wolfgang R.; Webster, John; Handley, Kim M.; Salekdeh, Ghasem H.; Tsesmetzis, Nicolas; Setubal, Joao C.; Pope, Phillip B.; Liu, Wen-Tso; Rivers, Adam R.; Ivanova, Natalia N.; Kyrpides, Nikos C.
Viruses represent the most abundant life forms on the planet. Recent experimental and computational improvements have led to a dramatic increase in the number of viral genome sequences identified primarily from metagenomic samples. As a result of the expanding catalog of metagenomic viral sequences, there exists a need for a comprehensive computational platform integrating all these sequences with associated metadata and analytical tools. Here we present IMG/VR (https://img.jgi.doe.gov/vr/), the largest publicly available database of 3908 isolate reference DNA viruses with 264 413 computationally identified viral contigs from >6000 ecologically diverse metagenomic samples. Approximately half of the viral contigs are grouped into genetically distinct quasi-species clusters. Microbial hosts are predicted for 20 000 viral sequences, revealing nine microbial phyla previously unreported to be infected by viruses. Viral sequences can be queried using a variety of associated metadata, including habitat type and geographic location of the samples, or taxonomic classification according to hallmark viral genes. IMG/VR has a user-friendly interface that allows users to interrogate all integrated data and interact by comparing with external sequences, thus serving as an essential resource in the viral genomics community. PMID:27799466
Because less than one-third of clinically relevant fusaria can be accurately identified to species level using phenotypic data (i.e., morphological species recognition), we constructed a three-locus DNA sequence database to facilitate molecular identification of the 69 Fusarium species associated wi...
Federal and state law enforcement authorities have amassed large collections of DNA samples and the identifying profiles derived from them. These databases help to identify the guilty and to exonerate the innocent, but as the databanks grow, so do fears about civil liberties. The research reported here discusses three legal and social policy issues that have been raised in regard to these biobanks—the choice of loci to type for identifying individuals, the indefinite retention of DNA samples, and the use of the DNA samples or the identifying profiles for research purposes. It also considers the possible value of the databases for research into the genetics of human behavior and the ethics of using them for this purpose. It rejects the broad claim that such research is inherently unethical but proposes procedures for ensuring that the value of the proposed research justifies any psychosocial or other risks to the subjects of the research.
Parson, Walther; Roewer, Lutz
This manuscript extends earlier recommendations of the editor of the International Journal of Legal Medicine on short tandem repeat population data and provides details on specific criteria relevant for the analysis and publication of population studies on haploid DNA markers, i.e. Y-chromosomal polymorphisms and mitochondrial DNA. The proposed concept is based on review experience with the two forensic haploid marker databases YHRD and EMPOP, which are both endorsed by the International Society for Forensic Genetics. The intention is to provide guidance on the preparation of population studies and their results to improve the reviewing process and the quality of published data. We also suggest a minimal set of required information to be presented in the publication to increase understanding and use of the data. The outlined procedure has in part been elaborated with the editors of the journal Forensic Science International Genetics.
Reilly, P.R.; McEwen, J.E.; Lawyer, J.D.; Small, D.
The purpose of this research was to provide support to enable the authors to: (1) perform legal and empirical research and critically analyze DNA banking and DNA databanking as those activities are conducted by state forensic laboratories, the military, academic researchers, and commercial enterprises; and (2) develop a broadcast quality educational videotape for viewing by the general public about DNA technology and the privacy and related issues that it raises. The grant thus had both a research and analysis component and a public education component. This report outlines the work completed under the project.
Voultsos, Polychronis; Njau, Samuel; Tairis, Nikolaos; Psaroulis, Dimitrios; Kovatsi, Leda
Since the creation of the first national DNA database in Europe in 1995, many European countries have enacted laws for initiating and regulating their own databases. The Greek government enacted a law in 2008 by which the National DNA Database of Greece was founded and regulated. According to this law, only DNA profiles from convicted criminals were recorded. Nevertheless, a year later, in 2009, the law was amended to permit the creation of an expanded database including innocent people and children. Unfortunately, the new law is very vague in many respects and does not respect the principle of proportionality. Therefore, in our opinion, it will soon need to be amended again. Furthermore, before the new law was enacted, there was no public debate to clarify what system would best suit Greece and what citizens would be willing to accept. We present the current legal framework in Greece, highlight issues that need to be clarified and discuss possible ethical issues that may arise.
Schnoes, Alexandra M; Brown, Shoshana D; Dodevski, Igor; Babbitt, Patricia C
Due to the rapid release of new data from genome sequencing projects, the majority of protein sequences in public databases have not been experimentally characterized; rather, sequences are annotated using computational analysis. The level of misannotation and the types of misannotation in large public databases are currently unknown and have not been analyzed in depth. We have investigated the misannotation levels for molecular function in four public protein sequence databases (UniProtKB/Swiss-Prot, GenBank NR, UniProtKB/TrEMBL, and KEGG) for a model set of 37 enzyme families for which extensive experimental information is available. The manually curated database Swiss-Prot shows the lowest annotation error levels (close to 0% for most families); the two other protein sequence databases (GenBank NR and TrEMBL) and the protein sequences in the KEGG pathways database exhibit similar and surprisingly high levels of misannotation that average 5%-63% across the six superfamilies studied. For 10 of the 37 families examined, the level of misannotation in one or more of these databases is >80%. Examination of the NR database over time shows that misannotation has increased from 1993 to 2005. The types of misannotation that were found fall into several categories, most associated with "overprediction" of molecular function. These results suggest that misannotation in enzyme superfamilies containing multiple families that catalyze different reactions is a larger problem than has been recognized. Strategies are suggested for addressing some of the systematic problems contributing to these high levels of misannotation.
Röck, Alexander; Irwin, Jodi; Dür, Arne; Parsons, Thomas; Parson, Walther
The analysis of the haploid mitochondrial (mt) genome has numerous applications in forensic and population genetics, as well as in disease studies. Although mtDNA haplotypes are usually determined by sequencing, they are rarely reported as a nucleotide string. Traditionally they are presented in a difference-coded position-based format relative to the corrected version of the first sequenced mtDNA. This convention requires recommendations for standardized sequence alignment that is known to vary between scientific disciplines, even between laboratories. As a consequence, database searches that are vital for the interpretation of mtDNA data can suffer from biased results when query and database haplotypes are annotated differently. In the forensic context that would usually lead to underestimation of the absolute and relative frequencies. To address this issue we introduce SAM, a string-based search algorithm that converts query and database sequences to position-free nucleotide strings and thus eliminates the possibility that identical sequences will be missed in a database query. The mere application of a BLAST algorithm would not be a sufficient remedy as it uses a heuristic approach and does not address properties specific to mtDNA, such as phylogenetically stable but also rapidly evolving insertion and deletion events. The software presented here provides additional flexibility to incorporate phylogenetic data, site-specific mutation rates, and other biologically relevant information that would refine the interpretation of mitochondrial DNA data. The manuscript is accompanied by freeware and example data sets that can be used to evaluate the new software (http://stringvalidation.org). PMID:21056022
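The annotation problem described above can be illustrated with a toy sketch. This is not the SAM algorithm; it only shows why comparing position-free nucleotide strings avoids missed matches. The reference sequence, the function name `apply_differences`, and the simplified notation (`73G` for a substitution, `315.1C` for an insertion, `249del` for a deletion) are assumptions made for illustration.

```python
import re

def apply_differences(reference, differences):
    """Rebuild a nucleotide string from a difference-coded haplotype.

    Positions are 1-based and refer to the reference. Hypothetical, simplified
    notation: '73G' (substitution), '315.1C' (insertion after position 315),
    '249del' (deletion).
    """
    # One slot per reference position; each slot holds the base at that
    # position followed by any bases inserted after it.
    slots = [[base] for base in reference]
    for diff in differences:
        match = re.fullmatch(r"(\d+)(?:\.(\d+))?([ACGT]|del)", diff)
        if match is None:
            raise ValueError(f"unrecognised difference: {diff}")
        position = int(match.group(1))
        is_insertion = match.group(2) is not None
        change = match.group(3)
        slot = slots[position - 1]
        if is_insertion:
            if change == "del":
                raise ValueError(f"an insertion cannot be a deletion: {diff}")
            slot.append(change)  # assumes insertions are listed in order (.1, .2, ...)
        elif change == "del":
            slot[0] = ""
        else:
            slot[0] = change
    return "".join("".join(slot) for slot in slots)

# Toy reference with a homopolymer stretch (not a real mtDNA sequence).
toy_reference = "ACCCCT"

# Two laboratories annotate the same extra C at different positions.
lab_a = ["5.1C"]
lab_b = ["2.1C"]

print(lab_a == lab_b)  # the difference-coded annotations disagree
print(apply_differences(toy_reference, lab_a) == apply_differences(toy_reference, lab_b))
```

In this toy case the two annotations differ as lists, yet both rebuild the same string, so a string-based comparison finds the match that a position-based comparison would miss.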
Jiang, Xianhua; Guo, Fei; Jia, Fei; Jin, Ping; Sun, Zhu
The multiplex system allows the detection of 19 autosomal short tandem repeat (STR) loci [including all Combined DNA Index System (CODIS) STR loci as well as D2S1338, D6S1043, D12S391, D19S433, Penta D and Penta E] plus the sex-determining locus Amelogenin in a single reaction, comprising all STR loci in the various commercial kits used for the China national DNA database (NDNAD). Primers are designed so that the amplicons range from 90 base pairs (bp) to 450 bp within a five-dye fluorescent design, with the fifth dye reserved for the internal size standard. With 30 cycles, 125 pg to 2 ng of DNA template gave optimal profiling results, while robust profiles could also be achieved by adjusting the cycle number for DNA template amounts outside that optimal input range. Mixture studies showed that 83% and 87% of minor alleles were detected at 9:1 and 1:9 ratios, respectively. Complete profiles were still observed when 4 ng of DNA was degraded by a 2-min DNase digestion and when 1 ng of undegraded DNA was added to 400 μM haematin. Polymerase chain reaction (PCR)-based procedures were examined and optimized, including the concentrations of the primer set, magnesium and Taq polymerase as well as volume, cycle number and annealing temperature. In addition, the system has been validated with 3000 bloodstain samples and 35 common case samples in line with the Chinese National Standards and Scientific Working Group on DNA Analysis Methods (SWGDAM) guidelines. The total probability of identity (TPI) can reach 8×10⁻²⁴, so the system can support a DNA database of 10 million profiles or more, because the expected number of matches (4×10⁻¹⁰) is far below one person and is negligible. Further, the system performs well on case samples and should be a suitable tool for forensic DNA typing and databasing.
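The abstract does not say how the expected number of matches was obtained. One reading that reproduces the quoted figure is to multiply the number of distinct profile pairs in the database by the total probability of identity. The sketch below works under that assumption; the function name is hypothetical.

```python
def expected_pairwise_matches(n_profiles, probability_of_identity):
    """Expected number of coincidentally matching pairs in a database.

    Assumes every pair of profiles matches independently with the given
    probability of identity (an assumption, not stated in the abstract).
    """
    n_pairs = n_profiles * (n_profiles - 1) / 2
    return n_pairs * probability_of_identity

# Figures quoted in the abstract: 10 million profiles, TPI of 8e-24.
print(expected_pairwise_matches(10_000_000, 8e-24))  # approximately 4e-10
```

Under this reading, about 5×10¹³ profile pairs multiplied by 8×10⁻²⁴ gives roughly 4×10⁻¹⁰, which agrees with the value reported above.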
AlQuraishi, Mohammed; Tang, Shengdong; Xia, Xide
Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. We have developed an integrated affinity-structure database in which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis. Lastly, this database will facilitate the analysis of protein-DNA interactions and the
Jamal, Qazi Mohammad Sajid; Siddiqui, Mughees Uddin; Alzohairy, Mohammad Abdulrahman; Al Karaawi, Mohammed Abdullah
The combination of public health education and information technology has made patient care safer and more reliable than before. Nurses and doctors use handheld computers to record a patient's medical history and check that they are administering the correct treatment. Public Health Informatics (PHI) is the point where technology and public health intersect. Therefore, including online medical and epidemiology databases in the curriculum of budding medical professionals and postgraduate students would help enhance the quality of health care, extensive epidemiological research, health education, health policy, health planning and consumer satisfaction. The purpose of this article is to introduce and discuss various databases that hold large amounts of information and could be used to enhance public health education. PMID:26392847
Turchi, Chiara; Stanciu, Florin; Paselli, Giorgia; Buscemi, Loredana; Parson, Walther; Tagliabracci, Adriano
To evaluate the pattern of the Romanian population from a mitochondrial perspective and to establish an appropriate mtDNA forensic database, we generated a high-quality mtDNA control region dataset from 407 Romanian subjects belonging to four major historical regions: Moldavia, Transylvania, Wallachia and Dobruja. The entire control region (CR) was analyzed by Sanger-type sequencing assays and the resulting 306 different haplotypes were classified into haplogroups according to the most updated mtDNA phylogeny. The Romanian gene pool is mainly composed of West Eurasian lineages H (31.7%), U (12.8%), J (10.8%), R (10.1%), T (9.1%), N (8.1%), HV (5.4%), K (3.7%), HV0 (4.2%), with the exceptions of East Asian haplogroup M (3.4%) and African haplogroup L (0.7%). The pattern of mtDNA variation observed in this study indicates that the mitochondrial DNA pool is geographically homogeneous across Romania and that the haplogroup composition reveals signals of admixture of populations of different origin. The PCA scatterplot supported this scenario, with Romania located in the southeastern European area, close to Bulgaria and Hungary, and as a borderland with respect to east Mediterranean and other eastern European countries. High haplotype diversity (0.993) and nucleotide diversity (0.00838±0.00426), together with a low random match probability (0.0087), suggest the usefulness of this control region dataset as a forensic database in routine forensic mtDNA analysis and in the investigation of maternal genetic lineages in the Romanian population.
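The abstract does not state which estimators were used for the summary statistics. A minimal sketch, assuming the commonly used definitions (random match probability as the sum of squared haplotype frequencies, and haplotype diversity with the n/(n-1) sample-size correction), is given below. The function names and the toy sample are hypothetical.

```python
from collections import Counter

def random_match_probability(haplotypes):
    """Sum of squared haplotype frequencies in the sample."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    return sum((count / n) ** 2 for count in counts.values())

def haplotype_diversity(haplotypes):
    """Haplotype diversity with the n / (n - 1) sample-size correction."""
    n = len(haplotypes)
    return n / (n - 1) * (1 - random_match_probability(haplotypes))

# Toy sample of four individuals carrying three distinct haplotypes.
sample = ["hapA", "hapA", "hapB", "hapC"]
print(random_match_probability(sample))  # 0.375
print(haplotype_diversity(sample))
```

With these definitions, a dataset in which most haplotypes are observed only once gives a low random match probability and a diversity close to one, which is the pattern reported for the Romanian sample.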
Yongye, Austin B; Waddell, Jacob; Medina-Franco, José L
Natural products represent important sources of bioactive compounds in drug discovery efforts. In this work, we compiled five natural products databases available in the public domain and performed a comprehensive chemoinformatic analysis focused on the content and diversity of the scaffolds with an overview of the diversity based on molecular fingerprints. The natural products databases were compared with each other and with a set of molecules obtained from in-house combinatorial libraries, and with a general screening commercial library. It was found that publicly available natural products databases have different scaffold diversity. In contrast to the common concept that larger libraries have the largest scaffold diversity, the largest natural products collection analyzed in this work was not the most diverse. The general screening library showed, overall, the highest scaffold diversity. However, considering the most frequent scaffolds, the general reference library was the least diverse. In general, natural products databases in the public domain showed low molecule overlap. In addition to benzene and acyclic compounds, flavones, coumarins, and flavanones were identified as the most frequent molecular scaffolds across the different natural products collections. The results of this work have direct implications in the computational and experimental screening of natural product databases for drug discovery.
Chelala, Claude; Khan, Arshad; Lemoine, Nicholas R
Motivation: Design a new computational tool allowing scientists to functionally annotate newly discovered and public domain single nucleotide polymorphisms in order to help in prioritizing targets in further disease studies and large-scale genotyping projects. Summary: SNPnexus database provides functional annotation for both novel and public SNPs. Possible effects on the transcriptome and proteome levels are characterized and reported from five major annotation systems providing the most extensive information on alternative splicing. Additional information on HapMap genotype and allele frequency, overlaps with potential regulatory elements or structural variations as well as related genetic diseases can also be retrieved. The SNPnexus database has a user-friendly web interface, providing single or batch query options using SNP identifiers from dbSNP as well as genomic location on clones, contigs or chromosomes. Therefore, SNPnexus is the only database currently providing a complete set of functional annotations of SNPs in public databases and newly detected from sequencing projects. Hence, we describe SNPnexus, provide details of the query options, the annotation categories as well as biological examples of use. Availability: The SNPnexus database is freely available at http://www.snp-nexus.org. PMID:19098027
This article examines the emergence of "digital governance" in public education in England. Drawing on and combining concepts from software studies, policy and political studies, it identifies some specific approaches to digital governance facilitated by network-based communications and database-driven information processing software…
Human Exposure Database System (HEDS) is an Internet-based system developed to provide public access to human-exposure-related data from studies conducted by EPA's National Exposure Research Laboratory (NERL). HEDS was designed to work with the EPA Office of Research and Devel...
Dumont, B.; Fuks, B.; Kraml, S.; Bein, S.; Chalons, G.; Conte, E.; Kulkarni, S.; Sengupta, D.; Wymant, C.
We present the implementation, in the MadAnalysis 5 framework, of several ATLAS and CMS searches for supersymmetry in data recorded during the first run of the LHC. We provide extensive details on the validation of our implementations and propose to create a public analysis database within this framework.
Standardization and structural annotation of public toxicity databases: Improving SAR capabilities and linkage to 'omics data
Ann M. Richard¹, ClarLynda Williams¹, Jamie Burch²
¹Nat Health & Environ Res Lab, US EPA, RTP, NC 27711; ²EPA/NC Central Univ Student COOP Trainee<...
Hicks, T; Taroni, F; Curran, J; Buckleton, J; Ribaux, O; Castella, V
In traditional criminal investigation, uncertainties are often dealt with using a combination of common sense, practical considerations and experience, but rarely with tailored statistical models. For example, in some countries, in order to search for a given profile in the national DNA database, it must have allelic information for six or more of the ten SGM Plus loci for a simple trace. If the profile does not have this amount of information then it cannot be searched in the national DNA database (NDNAD). This requirement (of a result at six or more loci) is not based on a statistical approach, but rather on the feeling that six or more would be sufficient. A statistical approach, however, could be more rigorous and objective and would take into consideration factors such as the probability of adventitious matches relative to the actual database size and/or investigator's requirements in a sensible way. Therefore, this research was undertaken to establish scientific foundations pertaining to the use of partial SGM Plus loci profiles (or similar) for investigation.
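The kind of calculation a statistical approach might rest on can be sketched as follows. This is not the model developed in the study; it assumes independent loci and unrelated database members, and all function names and numbers are illustrative.

```python
import math

def profile_match_probability(per_locus_probabilities):
    """Match probability of a partial profile, assuming independent loci."""
    return math.prod(per_locus_probabilities)

def expected_adventitious_matches(database_size, match_probability):
    """Expected number of chance matches when searching the whole database."""
    return database_size * match_probability

def probability_of_any_adventitious_match(database_size, match_probability):
    """Probability of at least one chance match, i.e. 1 - (1 - p)^N."""
    # log1p and expm1 keep precision when the match probability is very small.
    return -math.expm1(database_size * math.log1p(-match_probability))

# Illustrative only: six typed loci, each with genotype probability 0.1,
# searched against a database of one million profiles.
p = profile_match_probability([0.1] * 6)
print(expected_adventitious_matches(1_000_000, p))          # about 1
print(probability_of_any_adventitious_match(1_000_000, p))  # about 0.63
```

The example shows why a fixed locus count is a blunt rule: whether a partial profile is worth searching depends on its actual match probability and on the size of the database, not only on the number of loci typed.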
O'Donnell, Kerry; Sutton, Deanna A.; Rinaldi, Michael G.; Sarver, Brice A. J.; Balajee, S. Arunmozhi; Schroers, Hans-Josef; Summerbell, Richard C.; Robert, Vincent A. R. G.; Crous, Pedro W.; Zhang, Ning; Aoki, Takayuki; Jung, Kyongyong; Park, Jongsun; Lee, Yong-Hwan; Kang, Seogchan; Park, Bongsoo; Geiser, David M.
Because less than one-third of clinically relevant fusaria can be accurately identified to species level using phenotypic data (i.e., morphological species recognition), we constructed a three-locus DNA sequence database to facilitate molecular identification of the 69 Fusarium species associated with human or animal mycoses encountered in clinical microbiology laboratories. The database comprises partial sequences from three nuclear genes: translation elongation factor 1α (EF-1α), the largest subunit of RNA polymerase (RPB1), and the second largest subunit of RNA polymerase (RPB2). These three gene fragments can be amplified by PCR and sequenced using primers that are conserved across the phylogenetic breadth of Fusarium. Phylogenetic analyses of the combined data set reveal that, with the exception of two monotypic lineages, all clinically relevant fusaria are nested in one of eight variously sized and strongly supported species complexes. The monophyletic lineages have been named informally to facilitate communication of an isolate's clade membership and genetic diversity. To identify isolates to the species included within the database, partial DNA sequence data from one or more of the three genes can be used as a BLAST query against the database which is Web accessible at FUSARIUM-ID (http://isolate.fusariumdb.org) and the Centraalbureau voor Schimmelcultures (CBS-KNAW) Fungal Biodiversity Center (http://www.cbs.knaw.nl/fusarium). Alternatively, isolates can be identified via phylogenetic analysis by adding sequences of unknowns to the DNA sequence alignment, which can be downloaded from the two aforementioned websites. The utility of this database should increase significantly as members of the clinical microbiology community deposit in internationally accessible culture collections (e.g., CBS-KNAW or the Fusarium Research Center) cultures of novel mycosis-associated fusaria, along with associated, corrected sequence chromatograms and data, so that the
Song, Yun S; Patil, Anand; Murphy, Erin E; Slatkin, Montgomery
We consider a hypothetical series of cases in which the DNA profile of a crime-scene sample is found to match a known profile in a DNA database (i.e., a "cold hit"), resulting in the identification of a suspect based only on genetic evidence. We show that the average probability that there is another person in the population whose profile matches the crime-scene sample but who is not in the database is approximately 2(N - d)p(A), where N is the number of individuals in the population, d is the number of profiles in the database, and p(A) is the average match probability (AMP) for the population. The AMP is estimated by computing the average of the probabilities that two individuals in the population have the same profile. We show further that if a priori each individual in the population is equally likely to have left the crime-scene sample, then the average probability that the database search attributes the crime-scene sample to a wrong person is (N - d)p(A).
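The two approximations stated above are simple to evaluate. The sketch below codes them directly; the function names and the input values are hypothetical and chosen only to show the arithmetic.

```python
def prob_unsampled_match(population_size, database_size, average_match_probability):
    """Approximate average probability that someone outside the database
    also matches the crime-scene profile: 2(N - d)p(A)."""
    return 2 * (population_size - database_size) * average_match_probability

def prob_wrong_attribution(population_size, database_size, average_match_probability):
    """Approximate average probability that the database search attributes
    the sample to the wrong person, under equal prior probabilities: (N - d)p(A)."""
    return (population_size - database_size) * average_match_probability

# Hypothetical inputs: population of 1,000,000, database of 100,000, AMP of 1e-9.
print(prob_unsampled_match(1_000_000, 100_000, 1e-9))    # about 0.0018
print(prob_wrong_attribution(1_000_000, 100_000, 1e-9))  # about 0.0009
```

Both quantities shrink as the database covers more of the population (d approaches N) and grow with the average match probability.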
This paper explores the introduction of professional systems engineers and information management practices into the first centralized DNA sequence database, developed at the European Molecular Biology Laboratory (EMBL) during the 1980s. In so doing, it complements the literature on the emergence of an information discourse after World War II and its subsequent influence in biological research. By analyzing the careers of the database creators and the computer algorithms they designed, I show that, from the mid-1960s onwards, information in biology gradually shifted from a pervasive metaphor to being embodied in practices and professionals such as those incorporated at the EMBL. I then investigate the reception of these database professionals by the EMBL biological staff, which evolved from initial disregard to necessary collaboration as the relationship between DNA, genes, and proteins turned out to be more complex than expected. The trajectories of the database professionals at the EMBL suggest that the initial subject matter of the historiography of genomics should be the long-standing practices that emerged after World War II and to a large extent originated outside biomedicine and academia. Only after addressing these practices may historians turn to their further disciplinary assemblage in fields such as bioinformatics or biotechnology.
Oh, Kyuseok; Sarzi, M.; Schawinski, K.; Yi, S. K.
We present a new database of absorption and emission line measurements based on the Sloan Digital Sky Survey 7th data release for the galaxies within a redshift of 0.2. Our work makes use of the publicly available penalized pixel-fitting (pPXF) and GANDALF codes, aiming to improve the existing measurements for stellar kinematics, the strength of various absorption-line features, and the flux and width of the emissions from different species of ionized gas. The absorption line strengths measured by the SDSS pipeline are seriously contaminated by emission fill-in. We effectively separate emission lines from absorption lines. For instance, this work successfully extracts the [NI] doublet from Mgb, which leads to a more realistic estimate of alpha enhancement in late-type galaxies compared to the previous database. Besides accurately measuring line strengths, the database provides new parameters that are indicative of line strength measurement quality. Users can build a subset of the database optimal for their studies using specific cuts in the fitting quality parameters as well as empirical signal-to-noise. Applying these parameters, we found 'hidden' broad-line-region galaxies, which turned out to be Seyfert I nuclei that were not picked up as AGN by SDSS. The database is publicly available at http://gem.yonsei.ac.kr/ossy
Weirick, Tyler; John, David; Uchida, Shizuka
Maintaining the consistency of genomic annotations is an increasingly complex task because of the iterative and dynamic nature of assembly and annotation, growing numbers of biological databases and insufficient integration of annotations across databases. As information exchange among databases is poor, a 'novel' sequence from one reference annotation could be annotated in another. Furthermore, relationships to nearby or overlapping annotated transcripts are even more complicated when using different genome assemblies. To better understand these problems, we surveyed current and previous versions of genomic assemblies and annotations across a number of public databases containing long noncoding RNA. We identified numerous discrepancies of transcripts regarding their genomic locations, transcript lengths and identifiers. Further investigation showed that the positional differences between reference annotations of essentially the same transcript could lead to differences in its measured expression at the RNA level. To aid in resolving these problems, we present the algorithm 'Universal Genomic Accession Hash (UGAHash)' and created an open source web tool to encourage the usage of the UGAHash algorithm. The UGAHash web tool (http://ugahash.uni-frankfurt.de) can be accessed freely without registration. The web tool allows researchers to generate Universal Genomic Accessions for genomic features or to explore annotations deposited in the public databases of the past and present versions. We anticipate that the UGAHash web tool will be a valuable tool to check for the existence of transcripts before judging the newly discovered transcripts as novel.
Machado, Helena; Silva, Susana
The creation and expansion of forensic DNA databases might involve potential threats to the protection of a range of human rights. At the same time, such databases have social benefits. Based on data collected through an online questionnaire applied to 628 individuals in Portugal, this paper aims to analyze the citizens' willingness to voluntarily donate a sample for profiling and inclusion in the National Forensic DNA Database and the views underpinning such a decision. Nearly one-quarter of the respondents would indicate 'no', and this negative response increased significantly with age and education. The overriding willingness to accept the inclusion of the individual genetic profile indicates an acknowledgement of the investigative potential of forensic DNA technologies and a relegation of civil liberties and human rights to the background, owing to the perceived benefits of protecting both society and the individual from crime. This rationale is mostly expressed by the idea that all citizens should contribute to the expansion of the National Forensic DNA Database for reasons that range from the more abstract assumption that donating a sample for profiling would be helpful in fighting crime to the more concrete suggestion that everyone (criminals and non-criminals) should be in the database. The concerns with the risks of accepting the donation of a sample for genetic profiling and inclusion in the National Forensic DNA Database are mostly related to lack of control and insufficient or unclear regulations concerning safeguarding individuals' data and supervising the access and uses of genetic data. By providing an empirically grounded understanding of the attitudes regarding willingness to voluntarily donate a sample for profiling and inclusion in a National Forensic DNA Database, this study also considers the citizens' perceived benefits and risks of operating forensic DNA databases. These collective views might be useful for the formation of international common
Livingston, Kara A.; Chung, Mei; Sawicki, Caleigh M.; Lyle, Barbara J.; Wang, Ding Ding; Roberts, Susan B.; McKeown, Nicola M.
Background Dietary fiber is a broad category of compounds historically defined as partially or completely indigestible plant-based carbohydrates and lignin with, more recently, the additional criteria that fibers incorporated into foods as additives should demonstrate functional human health outcomes to receive a fiber classification. Thousands of research studies have been published examining fibers and health outcomes. Objectives (1) Develop a database listing studies testing fiber and physiological health outcomes identified by experts at the Ninth Vahouny Conference; (2) Use evidence mapping methodology to summarize this body of literature. This paper summarizes the rationale, methodology, and resulting database. The database will help both scientists and policy-makers to evaluate evidence linking specific fibers with physiological health outcomes, and identify missing information. Methods To build this database, we conducted a systematic literature search for human intervention studies published in English from 1946 to May 2015. Our search strategy included a broad definition of fiber search terms, as well as search terms for nine physiological health outcomes identified at the Ninth Vahouny Fiber Symposium. Abstracts were screened using a priori defined eligibility criteria and a low threshold for inclusion to minimize the likelihood of rejecting articles of interest. Publications then were reviewed in full text, applying additional a priori defined exclusion criteria. The database was built and published on the Systematic Review Data Repository (SRDR™), a web-based, publicly available application. Conclusions A fiber database was created. This resource will reduce the unnecessary replication of effort in conducting systematic reviews by serving as both a central database archiving PICO (population, intervention, comparator, outcome) data on published studies and as a searchable tool through which this data can be extracted and updated. PMID:27348733
Bodner, Martin; Irwin, Jodi A.; Coble, Michael D.; Parson, Walther
Reliable data are crucial for all research fields applying mitochondrial DNA (mtDNA) as a genetic marker. Quality control measures have been introduced to ensure the highest standards in sequence data generation, validation and a posteriori inspection. A phylogenetic alignment strategy has been widely accepted as a prerequisite for data comparability and database searches, for forensic applications, for reconstructions of human migrations and for correct interpretation of mtDNA mutations in medical genetics. There is continuing effort to enhance the number of worldwide population samples in order to contribute to a better understanding of human mtDNA variation. This has often led to the analysis of convenience samples collected for other purposes, which might not meet the quality requirement of random sampling for mtDNA data sets. Here, we introduce an additional quality control measure that deals with one aspect of this limitation: by combining autosomal short tandem repeat (STR) marker data with mtDNA information, it helps to avoid the bias introduced by related individuals included in the same (small) sample. By STR analysis of individuals sharing their mitochondrial haplotype, pedigree construction and subsequent software-assisted calculation of likelihood ratios based on the allele frequencies found in the population, closely maternally related individuals can be identified and excluded. We also discuss scenarios that allow related individuals in the same set. An ideal population sample would be representative of its population: this new approach represents another contribution towards this goal. PMID:21067986
Adams, Carolyn; Allen, Judy
Access to datasets of personal health information held by government agencies is essential to support public health research and to promote evidence-based public health policy development. Privacy legislation in Australia allows the use and disclosure of such information for public health research. However, access is not always forthcoming in a timely manner and the decision-making process undertaken by government data custodians is not always transparent. Given the public benefit in research using these health information datasets, this article suggests that it is time to recognise a right of access for approved research and that the decisions, and decision-making processes, of government data custodians should be subject to increased scrutiny. The article concludes that researchers should have an avenue of external review where access to information has been denied or unduly delayed.
Cousineau, J; Girard, N; Monardes, C; Leroux, T; Jean, M Stanton
Because many diseases are multifactorial disorders, the scientific progress in genomics and genetics should be taken into consideration in public health research. In this context, genomic databases will constitute an important source of information. Consequently, it is important to identify and characterize the State’s role and authority on matters related to public health, in order to verify whether it has access to such databases while engaging in public health genomic research. We first consider the evolution of the concept of public health, as well as its core functions, using a comparative approach (e.g. WHO, PAHO, CDC and the Canadian province of Quebec). Following an analysis of relevant Quebec legislation, the precautionary principle is examined as a possible avenue to justify State access to and use of genomic databases for research purposes. Finally, we consider the influenza pandemic plans developed by WHO, Canada, and Quebec as examples of key tools framing public health decision-making processes. We observed that State powers in public health are not, in Quebec, well adapted to the expansion of genomics research. We propose that the scope of the concept of research in public health should be clear and include the following characteristics: a commitment to the health and well-being of the population and to their determinants; the inclusion of both applied research and basic research; and an appropriate model of governance (authorization, follow-up, consent, etc.). We also suggest that the strategic approach version of the precautionary principle could guide collective choices in these matters. PMID:23113174
Background The Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II) database is a free, public resource for intensive care research. The database was officially released in 2006, and has attracted a growing number of researchers in academia and industry. We present the two major software tools that facilitate accessing the relational database: the web-based QueryBuilder and a downloadable virtual machine (VM) image. Results QueryBuilder and the MIMIC-II VM have been developed successfully and are freely available to MIMIC-II users. Simple example SQL queries and the resulting data are presented. Clinical studies pertaining to acute kidney injury and prediction of fluid requirements in the intensive care unit are shown as typical examples of research performed with MIMIC-II. In addition, MIMIC-II has also provided data for annual PhysioNet/Computing in Cardiology Challenges, including the 2012 Challenge “Predicting mortality of ICU Patients”. Conclusions QueryBuilder is a web-based tool that provides easy access to MIMIC-II. For more computationally intensive queries, one can locally install a complete copy of MIMIC-II in a VM. Both publicly available tools provide the MIMIC-II research community with convenient querying interfaces and complement the value of the MIMIC-II relational database. PMID:23302652
Nilsson, R Henrik; Kristiansson, Erik; Ryberg, Martin; Larsson, Karl-Henrik
Background During the last few years, DNA sequence analysis has become one of the primary means of taxonomic identification of species, particularly so for species that are minute or otherwise lack distinct, readily obtainable morphological characters. Although the number of sequences available for comparison in public databases such as GenBank increases exponentially, only a minuscule fraction of all organisms have been sequenced, leaving taxon sampling a momentous problem for sequence-based taxonomic identification. When querying GenBank with a set of unidentified sequences, a considerable proportion typically lack fully identified matches, forming an ever-mounting pile of sequences that the researcher will have to monitor manually in the hope that new, clarifying sequences have been submitted by other researchers. To alleviate these concerns, a project to automatically monitor select unidentified sequences in GenBank for taxonomic progress through repeated local BLAST searches was initiated. Mycorrhizal fungi – a field where species identification often is prohibitively complex – and the much used ITS locus were chosen as test bed. Results A Perl script package called emerencia is presented. On a regular basis, it downloads select sequences from GenBank, separates the identified sequences from those insufficiently identified, and performs BLAST searches between these two datasets, storing all results in an SQL database. On the accompanying web-service, users can monitor the taxonomic progress of insufficiently identified sequences over time, either through active searches or by signing up for e-mail notification upon disclosure of better matches. Other search categories, such as listing all insufficiently identified sequences (and their present best fully identified matches) publication-wise, are also available. Discussion The ever-increasing use of DNA sequences for identification purposes largely falls back on the assumption that public sequence databases
Grahn, R. A.; Kurushima, J. D.; Billings, N. C.; Grahn, J.C.; Halverson, J. L.; Hammer, E.; Ho, C.K.; Kun, T. J.; Levy, J.K.; Lipinski, M. J.; Mwenda, J.M.; Ozpinar, H.; Schuster, R.K; Shoorijeh, S.J.; Tarditi, C. R.; Waly, N.E.; Wictum, E. J.; Lyons, L. A.
The domestic cat is one of the most popular pets throughout the world. A by-product of owning, interacting with, or being in a household with a cat is the transfer of shed fur to clothing or personal objects. As trace evidence, transferred cat fur is a relatively untapped resource for forensic scientists. Both phenotypic and genotypic characteristics can be obtained from cat fur, but databases for neither aspect exist. Because cats incessantly groom, cat fur may have nucleated cells, not only in the hair bulb, but also as epithelial cells on the hair shaft deposited during the grooming process, thereby generally providing material for DNA profiling. To effectively exploit cat hair as a resource, representative databases must be established. This study evaluates 402 bp of the mtDNA control region (CR) from 1,394 cats, including cats from 25 distinct worldwide populations and 26 breeds. Eighty-three percent of the cats are represented by 12 major mitotypes. An additional 8.0% are clearly derived from the major mitotypes. Unique sequences were found in 7.5% of the cats. The overall genetic diversity for this data set was 0.8813 ± 0.0046 with a random match probability of 11.8%. This region of the cat mtDNA has discriminatory power suitable for forensic application worldwide. PMID:20457082
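The two summary statistics quoted above can be illustrated with a short sketch. The abstract does not state which estimators were used, so the formulas here are assumptions: random match probability as the sum of squared haplotype frequencies, and genetic diversity as Nei's unbiased estimator n/(n-1) * (1 - sum of squared frequencies). The mitotype counts are hypothetical and are not taken from the study.

```python
def random_match_probability(counts):
    """Probability that two randomly drawn samples share a haplotype:
    the sum of squared haplotype frequencies (assumed estimator)."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)


def genetic_diversity(counts):
    """Nei's unbiased haplotype diversity, n/(n-1) * (1 - sum p_i^2)
    (assumed estimator; the abstract does not name one)."""
    n = sum(counts)
    return n / (n - 1) * (1 - random_match_probability(counts))


# Hypothetical mitotype counts for 100 cats (illustration only).
counts = [40, 30, 20, 10]
rmp = random_match_probability(counts)
diversity = genetic_diversity(counts)
print(f"random match probability = {rmp:.4f}")
print(f"genetic diversity        = {diversity:.4f}")
```

A lower random match probability means a mitotype match carries more evidential weight, which is why the two figures are usually reported together.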
In attempting to apply Knowledge Discovery in Databases (KDD) to generate a predictive model from a health care dataset that is currently available to the public, the first step is to pre-process the data to overcome the challenges of missing data, redundant observations, and records containing inaccurate data. This study will demonstrate how to use simple pre-processing methods to improve the quality of input data. PMID:14728545
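A minimal sketch of the three pre-processing concerns named above (missing data, redundant observations, inaccurate records) might look as follows. The abstract does not describe the actual methods, so simply dropping offending records is an assumption, as are the field names and the plausibility range; a real study might impute values instead.

```python
def preprocess(records, required, valid_ranges):
    """Return records with incomplete, implausible, and duplicate rows dropped.

    records      -- list of dicts (hypothetical layout)
    required     -- field names that must be present and not None
    valid_ranges -- {field: (low, high)} plausibility limits (assumed values)
    """
    seen = set()
    cleaned = []
    for rec in records:
        # Missing data: skip records lacking any required field.
        if any(rec.get(field) is None for field in required):
            continue
        # Inaccurate data: skip records with values outside the plausible range.
        out_of_range = False
        for field, (low, high) in valid_ranges.items():
            value = rec.get(field)
            if value is not None and not (low <= value <= high):
                out_of_range = True
                break
        if out_of_range:
            continue
        # Redundant observations: keep only the first copy of identical records.
        key = tuple(sorted(rec.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned


raw = [
    {"id": 1, "age": 34},
    {"id": 1, "age": 34},    # exact duplicate
    {"id": 2, "age": None},  # missing value
    {"id": 3, "age": 212},   # implausible age
]
print(preprocess(raw, required=["id", "age"], valid_ranges={"age": (0, 120)}))
```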
Demay, Christophe; Liens, Benjamin; Burguière, Thomas; Hill, Véronique; Couvin, David; Millet, Julie; Mokrousov, Igor; Sola, Christophe; Zozio, Thierry; Rastogi, Nalin
Among various genotyping methods to study Mycobacterium tuberculosis complex (MTC) genotypic polymorphism, spoligotyping and mycobacterial interspersed repetitive units-variable number of DNA tandem repeats (MIRU-VNTRs) have recently gained international approval as robust, fast, and reproducible typing methods generating data in a portable format. Spoligotyping constituted the backbone of a publicly available database SpolDB4 released in 2006; nonetheless this method possesses a low discriminatory power when used alone and should be ideally used in conjunction with a second typing method such as MIRU-VNTRs for high-resolution epidemiological studies. We hereby describe a publicly available international database named SITVITWEB which incorporates such multimarker data, providing a global view of MTC genetic diversity worldwide based on 62,582 clinical isolates corresponding to 153 countries of patient origin (105 countries of isolation). We report a total of 7105 spoligotype patterns (corresponding to 58,180 clinical isolates) - grouped into 2740 shared-types or spoligotype international types (SIT) containing 53,816 clinical isolates and 4364 orphan patterns. Interestingly, only 7% of the MTC isolates worldwide were orphans whereas more than half of SITed isolates (n=27,059) were restricted to only 24 most prevalent SITs. The database also contains a total of 2379 MIRU patterns (from 8161 clinical isolates) from 87 countries of patient origin (35 countries of isolation); these were grouped in 847 shared-types or MIRU international types (MIT) containing 6626 isolates and 1533 orphan patterns. Lastly, data on 5-locus exact tandem repeats (ETRs) were available on 4626 isolates from 59 countries of patient origin (22 countries of isolation); a total of 458 different VNTR patterns were observed - split into 245 shared-types or VNTR International Types (VIT) containing 4413 isolates and 213 orphan patterns. Datamining of SITVITWEB further allowed to update
Halverson, Joy; Basten, Christopher
Animal-derived trace evidence is a common finding at crime scenes and may provide an important link between victim(s) and suspect(s). A database of 558 dogs of pure and mixed breeds is described and analyzed with two PCR multiplexes of 17 microsatellites. Summary statistics (number of alleles, expected and observed heterozygosity and power of exclusion) are compared between breeds. Marked population substructure in dog breeds indicates significant inbreeding, and the use of a conservative theta value is recommended in likelihood calculations for determining the significance of a DNA match. Evidence is presented that the informativeness of the canine microsatellites, despite inbreeding, is comparable to the human CODIS loci. Two cases utilizing canine DNA typing, State of Washington v. Kenneth Leuluaialii and George Tuilefano and Crown v. Daniel McGowan, illustrate the potential of canine microsatellite markers for forensic investigations.
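To show how a theta value enters a match calculation, the sketch below uses the widely cited Balding-Nichols subpopulation correction for single-locus genotype match probabilities. The abstract does not say which formula the authors applied, so treating it as this correction is an assumption, and the allele frequencies and theta values are hypothetical.

```python
def match_probability(p_i, p_j=None, theta=0.0):
    """Single-locus match probability with a theta (coancestry) correction.

    p_i, p_j -- allele frequencies; leave p_j as None for a homozygote
    theta    -- coancestry coefficient; 0 gives Hardy-Weinberg proportions
    """
    denom = (1 + theta) * (1 + 2 * theta)
    if p_j is None:
        # Homozygous genotype (allele i twice).
        return (2 * theta + (1 - theta) * p_i) * (3 * theta + (1 - theta) * p_i) / denom
    # Heterozygous genotype (alleles i and j).
    return 2 * (theta + (1 - theta) * p_i) * (theta + (1 - theta) * p_j) / denom


# Hypothetical allele frequencies at one microsatellite locus.
for theta in (0.0, 0.05, 0.10):
    hom = match_probability(0.1, theta=theta)
    het = match_probability(0.1, 0.2, theta=theta)
    print(f"theta={theta:.2f}  homozygote={hom:.4f}  heterozygote={het:.4f}")
```

Larger theta values raise the match probability, so choosing a conservative (higher) theta makes a reported match less likely to overstate the evidence in an inbred population.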
Pullman, Daryl; Perrot-Daley, Astrid; Hodgkinson, Kathy; Street, Catherine; Rahman, Proton
Objective To provide a legal and ethical analysis of some of the implementation challenges faced by the Population Therapeutics Research Group (PTRG) at Memorial University (Canada), in using genealogical information offered by individuals for its genetics research database. Materials and methods This paper describes the unique historical and genetic characteristics of the Newfoundland and Labrador founder population, which gave rise to the opportunity for PTRG to build the Newfoundland Genealogy Database containing digitized records of all pre-confederation (1949) census records of the Newfoundland founder population. In addition to building the database, PTRG has developed the Heritability Analytics Infrastructure, a data management structure that stores genotype, phenotype, and pedigree information in a single database, and custom linkage software (KINNECT) to perform pedigree linkages on the genealogy database. Discussion A newly adopted legal regimen in Newfoundland and Labrador is discussed. It incorporates health privacy legislation with a unique research ethics statute governing the composition and activities of research ethics boards and, for the first time in Canada, elevating the status of national research ethics guidelines into law. The discussion looks at this integration of legal and ethical principles which provides a flexible and seamless framework for balancing the privacy rights and welfare interests of individuals, families, and larger societies in the creation and use of research data infrastructures as public goods. Conclusion The complementary legal and ethical frameworks that now coexist in Newfoundland and Labrador provide the legislative authority, ethical legitimacy, and practical flexibility needed to find a workable balance between privacy interests and public goods. Such an approach may also be instructive for other jurisdictions as they seek to construct and use biobanks and related research platforms for genetic research. PMID
Turchi, Chiara; Buscemi, Loredana; Previderè, Carlo; Grignani, Pierangela; Brandstätter, Anita; Achilli, Alessandro; Parson, Walther; Tagliabracci, Adriano
This work is a review of a collaborative exercise on mtDNA analysis undertaken by the Italian working group (Ge.F.I.). A total of 593 samples from 11 forensic genetic laboratories were subjected to hypervariable region (HVS-I/HVS-II) sequence analysis. The raw lane data were sent to MtDNA Population Database (EMPOP) for an independent evaluation. For the inclusion of data for the Italian database, quality assurance procedures were applied to the control region profiles. Only eight laboratories with a final population sample of 395 subjects passed the quality conformance test. Control region haplogroup (hg) assignments were confirmed by restriction fragment length polymorphism (RFLP) typing of the most common European hg-diagnostic sites. A total of 306 unique haplotypes derived from the combined analysis of control and coding region polymorphisms were found; the most common haplotype--CRS, 263, 309.1C, 315.1C/ not7025 AluI--was shared by 20 subjects. The majority of mtDNAs detected in the Italian population fell into the most common west Eurasian hgs: R0a (0.76%), HV (4.81%), H (38.99%), HV0 (3.55%), J (7.85%), T (13.42%), U (11.65%), K (10.13%), I (1.52%), X (2.78%), and W (1.01%).
Shlyapnikov, A.; Bondar', N.; Gorbunov, M.
We describe the main principles of formation of databases (DBs) with information about astronomical objects and their physical characteristics derived from observations obtained at the Crimean Astrophysical Observatory (CrAO) and published in the "Izvestiya of the CrAO" and elsewhere. Emphasis is placed on the DBs missing from the most complete global library of catalogs and data tables, VizieR (supported by the Center of Astronomical Data, Strasbourg). We specially consider the problem of forming a digital archive of observational data obtained at the CrAO as an interactive DB related to database objects and publications. We present examples of all our DBs as elements integrated into the Crimean Astronomical Virtual Observatory. We illustrate the work with the CrAO DBs using tools of the International Virtual Observatory: Aladin, VOPlot, VOSpec, in conjunction with the VizieR and Simbad DBs.
Hodge, Steven M.; Gao, Yong; Frazier, Jean A.; Haselgrove, Christian
Every month, numerous publications appear that include neuroanatomic volumetric observations. The current and past literature that includes volumetric measurements is vast, but variable with respect to specific species, structures, and subject characteristics (such as gender, age, pathology, etc.). In this report we introduce the Internet Brain Volume Database (IBVD), www.nitrc.org/projects/ibvd, a site devoted to facilitating access to and utilization of neuroanatomic volumetric observations as published in the literature. We review the design and functionality of the site. The IBVD is the first database dedicated to integrating, exposing and sharing brain volumetric observations across species and disease. It offers valuable functionality for quality assurance assessment of results as well as support for meta-analysis across large segments of the published literature that are obscured from traditional text-based search engines. PMID:21931990
DeWitt, William S; Lindau, Paul; Snyder, Thomas M; Sherwood, Anna M; Vignali, Marissa; Carlson, Christopher S; Greenberg, Philip D; Duerkopp, Natalie; Emerson, Ryan O; Robins, Harlan S
The vast diversity of B-cell receptors (BCR) and secreted antibodies enables the recognition of, and response to, a wide range of epitopes, but this diversity has also limited our understanding of humoral immunity. We present a public database of more than 37 million unique BCR sequences from three healthy adult donors that is many fold deeper than any existing resource, together with a set of online tools designed to facilitate the visualization and analysis of the annotated data. We estimate the clonal diversity of the naive and memory B-cell repertoires of healthy individuals, and provide a set of examples that illustrate the utility of the database, including several views of the basic properties of immunoglobulin heavy chain sequences, such as rearrangement length, subunit usage, and somatic hypermutation positions and dynamics. PMID:27513338
Bodner, Martin; Bastisch, Ingo; Butler, John M; Fimmers, Rolf; Gill, Peter; Gusmão, Leonor; Morling, Niels; Phillips, Christopher; Prinz, Mechthild; Schneider, Peter M; Parson, Walther
The statistical evaluation of autosomal Short Tandem Repeat (STR) genotypes is based on allele frequencies. These are empirically determined from sets of randomly selected human samples, compiled into STR databases that have been established in the course of population genetic studies. There is currently no agreed procedure of performing quality control of STR allele frequency databases, and the reliability and accuracy of the data are largely based on the responsibility of the individual contributing research groups. It has been demonstrated with databases of haploid markers (EMPOP for mitochondrial mtDNA, and YHRD for Y-chromosomal loci) that centralized quality control and data curation is essential to minimize error. The concepts employed for quality control involve software-aided likelihood-of-genotype, phylogenetic, and population genetic checks that allow the researchers to compare novel data to established datasets and, thus, maintain the high quality required in forensic genetics. Here, we present STRidER (http://strider.online), a publicly available, centrally curated online allele frequency database and quality control platform for autosomal STRs. STRidER expands on the previously established ENFSI DNA WG STRbASE and applies standard concepts established for haploid and autosomal markers as well as novel tools to reduce error and increase the quality of autosomal STR data. The platform constitutes a significant improvement and innovation for the scientific community, offering autosomal STR data quality control and reliable STR genotype estimates.
Pierson, Kawika; Hand, Michael L.; Thompson, Fred
Quantitative public financial management research focused on local governments is limited by the absence of a common database for empirical analysis. While the U.S. Census Bureau distributes government finance data that some scholars have utilized, the arduous process of collecting, interpreting, and organizing the data has made its adoption prohibitive and inconsistent. In this article we offer a single, coherent resource that contains all of the government financial data from 1967-2012, uses easy-to-understand natural-language variable names, and will be extended when new data is available. PMID:26107821
THIS DATA ASSET NO LONGER ACTIVE: This is metadata documentation for the National Priorities List (NPL) Publication Assistance Database (PAD), a Lotus Notes application that holds Region 7's universe of NPL site information such as site description, threats and contaminants, cleanup approach, environmental process, community involvement, site repository, and regional contacts. This database used to be updated annually, at different times for different NPLs, but it is currently no longer being used. This work fell under objectives for EPA's 2003-2008 Strategic Plan (Goal 3) for Land Preservation & Restoration, which are to clean up and reuse contaminated land.
Jacobs, Jeffrey P
Three basic principles provide the rationale for the Society of Thoracic Surgeons (STS) Congenital Heart Surgery Database (CHSD) public reporting initiative: (1) Variation in congenital and pediatric cardiac surgical outcomes exists. (2) Patients and their families have the right to know the outcomes of the treatments that they will receive. (3) It is our professional responsibility to share this information with them in a format they can understand. The STS CHSD public reporting initiative facilitates the voluntary transparent public reporting of congenital and pediatric cardiac surgical outcomes using the STS CHSD Mortality Risk Model. The STS CHSD Mortality Risk Model is used to calculate risk-adjusted operative mortality and adjusts for the following variables: age, primary procedure, weight (neonates and infants), prior cardiothoracic operations, non-cardiac congenital anatomic abnormalities, chromosomal abnormalities or syndromes, prematurity (neonates and infants), and preoperative factors (including preoperative/preprocedural mechanical circulatory support [intraaortic balloon pump, ventricular assist device, extracorporeal membrane oxygenation, or cardiopulmonary support], shock [persistent at time of surgery], mechanical ventilation to treat cardiorespiratory failure, renal failure requiring dialysis and/or renal dysfunction, preoperative neurological deficit, and other preoperative factors). Operative mortality is defined in all STS databases as (1) all deaths, regardless of cause, occurring during the hospitalization in which the operation was performed, even if after 30 days (including patients transferred to other acute care facilities); and (2) all deaths, regardless of cause, occurring after discharge from the hospital, but before the end of the 30th postoperative day. The STS CHSD Mortality Risk Model has good model fit and discrimination with overall C statistics of 0.875 and 0.858 in the development sample and the validation sample
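The two-part definition of operative mortality given above can be expressed as a small predicate. This is only a sketch of the wording in the abstract, not STS software: the function name and arguments are hypothetical, the day of surgery is assumed to be postoperative day 0, and `discharge` is assumed to mean final discharge from acute care (so transfers between acute care facilities do not count as discharge).

```python
from datetime import date, timedelta


def is_operative_mortality(surgery, death=None, discharge=None):
    """Classify a death as operative mortality per the two-part definition.

    surgery   -- date of the operation
    death     -- date of death, or None if the patient is alive
    discharge -- date of final discharge from acute care, or None if the
                 patient was never discharged (assumed convention)
    """
    if death is None:
        return False
    # (1) Death during the index hospitalization, even if after 30 days.
    if discharge is None or death <= discharge:
        return True
    # (2) Death after discharge but before the end of postoperative day 30.
    return death <= surgery + timedelta(days=30)


surgery = date(2020, 1, 1)
print(is_operative_mortality(surgery, date(2020, 2, 20)))                     # in hospital, day 50
print(is_operative_mortality(surgery, date(2020, 1, 25), date(2020, 1, 10)))  # after discharge, day 24
print(is_operative_mortality(surgery, date(2020, 2, 15), date(2020, 1, 10)))  # after discharge, day 45
```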
Oh, Kyuseok; Sarzi, Marc; Schawinski, Kevin; Yi, Sukyoung K.
We present a new database of absorption and emission-line measurements based on the Sloan Digital Sky Survey (SDSS) 7th data release of galaxies within a redshift of 0.2. Using the publicly available penalized pixel-fitting (pPXF) and gas and absorption line fitting (gandalf) codes, our work improves the existing measurements for stellar kinematics, the strength of various absorption line features, and the flux and width of the emissions from different species of ionised gas. Most notably, we provide the quality of the fit to assess the reliability of the measurements. The quality assessment can be highly effective for finding new classes of objects. For example, based on the quality assessment around the Hα and [NII] nebular lines, we found that approximately 1% of the SDSS spectra classified as galaxies by the SDSS pipeline are in fact type I Seyfert AGN. This paper presents a summary of the recent paper, Oh et al. (2011). The database is publicly available at http://gem.yonsei.ac.kr/ossy/.
Monzon, Alexander Miguel; Rohr, Cristian Oscar; Fornasari, María Silvina; Parisi, Gustavo
CoDNaS (conformational diversity of the native state) is a protein conformational diversity database. Conformational diversity describes structural differences between conformers that define the native state of proteins. It is a key concept to understand protein function and biological processes related to protein functions. CoDNaS offers a well curated database that is experimentally driven, thoroughly linked, and annotated. CoDNaS facilitates the extraction of key information on small structural differences based on protein movements. CoDNaS enables users to easily relate the degree of conformational diversity with physical, chemical and biological properties derived from experiments on protein structure and biological characteristics. The new version of CoDNaS includes ∼70% of all available protein structures, and new tools have been added that run sequence searches, display structural flexibility profiles and allow users to browse the database for different structural classes. These tools facilitate the exploration of protein conformational diversity and its role in protein function. Database URL: http://ufq.unq.edu.ar/codnas PMID:27022160
Dogget, N.; Myers, G.; Wills, C.J.
This is the final report of a three-year, Laboratory Directed Research and Development (LDRD) project at the Los Alamos National Laboratory (LANL). The authors have used computer simulations and examination of a variety of databases to address a wide range of evolutionary questions. The authors have found that there is a clear distinction in the evolution of HIV-1 and HIV-2, with the former and more virulent virus evolving more rapidly at a functional level. The authors have discovered highly non-random patterns in the evolution of HIV-1 that can be attributed to a variety of selective pressures. In the course of examination of microsatellite DNA (short repeat regions) in microorganisms, the authors have found clear differences between prokaryotes and eukaryotes in their distribution, differences that can be tied to different selective pressures. They have developed a new method (topiary pruning) for enhancing the phylogenetic information contained in DNA sequences. Most recently, the authors have discovered effects in complex rainforest ecosystems that indicate strong frequency-dependent interactions between host species and their parasites, leading to the maintenance of ecosystem variability.
Benschop, Corina C G; van der Beek, Cornelis P; Meiland, Hugo C; van Gorp, Ankie G M; Westen, Antoinette A; Sijen, Titia
To analyze DNA samples with very low DNA concentrations, various methods have been developed that sensitize short tandem repeat (STR) typing. Sensitized DNA typing is accompanied by stochastic amplification effects, such as allele drop-outs and drop-ins. Therefore low template (LT) DNA profiles are interpreted with care. One can either try to infer the genotype by a consensus method that uses alleles confirmed in replicate analyses, or one can use a statistical model to evaluate the strength of the evidence in a direct comparison with a known DNA profile. In this study we focused on the first strategy and we show that the procedure by which the consensus profile is assembled will affect genotyping reliability. In order to gain insight into the roles of replicate number and requested level of reproducibility, we generated six independent amplifications of samples of known donors. The LT methods included both increased cycling and enhanced capillary electrophoresis (CE) injection. Consensus profiles were assembled from two to six of the replications using four methods: composite (include all alleles), n-1 (include alleles detected in all but one replicate), n/2 (include alleles detected in at least half of the replicates) and 2× (include alleles detected twice). We compared the consensus DNA profiles with the DNA profile of the known donor, studied the stochastic amplification effects and examined the effect of the consensus procedure on DNA database search results. From all these analyses we conclude that the accuracy of LT DNA typing and the efficiency of database searching improve when the number of replicates is increased and the consensus method is n/2. The most functional number of replicates within this n/2 method is four (although a replicate number of three suffices for samples showing >25% of the alleles in standard STR typing). This approach was also the optimal strategy for the analysis of 2-person mixtures, although modified search strategies may be
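The four consensus rules described above can be sketched for a single locus as follows. This is an illustration rather than the authors' implementation: "n-1" and "2×" are read here as "at least n-1 replicates" and "at least two replicates", which is an assumption, and the allele calls are hypothetical. The key `"2x"` stands for the 2× method.

```python
from collections import Counter


def consensus(replicates, method):
    """Build a consensus allele set for one locus from replicate allele sets.

    replicates -- list of sets of allele labels, one set per amplification
    method     -- "composite", "n-1", "n/2", or "2x"
    """
    n = len(replicates)
    # Count in how many replicates each allele was detected.
    counts = Counter(allele for rep in replicates for allele in set(rep))
    thresholds = {
        "composite": 1,  # include every allele seen at least once
        "n-1": n - 1,    # seen in all but one replicate (or more)
        "n/2": n / 2,    # seen in at least half of the replicates
        "2x": 2,         # seen at least twice
    }
    minimum = thresholds[method]
    return {allele for allele, seen in counts.items() if seen >= minimum}


# Four hypothetical replicate calls at one STR locus.
replicates = [{"12", "14"}, {"12", "15"}, {"12", "14", "15"}, {"12", "14"}]
for method in ("composite", "n-1", "n/2", "2x"):
    print(method, sorted(consensus(replicates, method)))
```

With four replicates the n/2 and 2× rules coincide, since both require an allele to appear in at least two amplifications; they diverge for other replicate numbers.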
We constructed a two-locus database, comprising partial translation elongation factor (EF-1alpha) gene sequences and nearly full-length sequences of the nuclear ribosomal intergenic spacer region (IGS rDNA) for 850 isolates spanning the phylogenetic breadth of the Fusarium oxysporum species complex ...
A database of Louisiana sugarcane molecular identity has been constructed and is being updated annually using FAM or HEX or NED fluorescence- and capillary electrophoresis (CE)-based microsatellite (SSR) fingerprinting information. The fingerprints are PCR-amplified from leaf DNA samples of current ...
Steinlechner, M; Parson, W
Automation and high through-put production of DNA profiles has become a necessity in every DNA database unit. In our laboratory we developed a Laboratory Information Management System (LIMS) controlled workflow architecture, which comprises a robotic DNA extraction- and pipetting-system and a capillary electrophoresis unit. This allows a through-put of 4,000 samples per person per year. Improved sample handling and data management, full sample- and batch-histories, and software-aided supervision of result data, with a consequent average turn-around time of 8 days, are the main features of our new system.
High dynamic range (HDR) displays and cameras are making their way into the consumer market at a rapid rate. Thanks to TV and camera manufacturers, HDR systems are now becoming available commercially to end users. This is taking place only a few years after the blooming of 3D video technologies. MPEG/ITU are also actively working towards the standardization of these technologies. However, preliminary research efforts in these video technologies are hampered by the lack of sufficient experimental data. In this paper, we introduce a Stereoscopic 3D HDR database of videos that is made publicly available to the research community. We explain the procedure taken to capture, calibrate, and post-process the videos. In addition, we provide insights on potential use-cases, challenges, and research opportunities, implied by the combination of higher dynamic range of the HDR aspect, and depth impression of the 3D aspect.
Toren, Dmitri; Barzilay, Thomer; Tacutu, Robi; Lehmann, Gilad; Muradian, Khachik K; Fraifeld, Vadim E
Mitochondria are the only organelles in animal cells that have their own genome. Due to their key role in energy production, generation of damaging factors (ROS, heat), and apoptosis, mitochondria, and mtDNA in particular, have long been considered major players in the mechanisms of aging, longevity and age-related diseases. The rapidly increasing number of species with fully sequenced mtDNA, together with accumulated data on longevity records, provides a new basis for comparative analysis of the links between mtDNA features and animal longevity. To facilitate such analyses and to support the scientific community in carrying them out, we developed the MitoAge database, containing calculated mtDNA compositional features of the entire mitochondrial genome, mtDNA coding (tRNA, rRNA, protein-coding genes) and non-coding (D-loop) regions, and codon usage/amino acid frequency for each protein-coding gene. MitoAge includes 922 species with fully sequenced mtDNA and maximum lifespan records. The database is available through the MitoAge website (www.mitoage.org or www.mitoage.info), which provides the necessary tools for searching, browsing, comparing and downloading the data sets of interest for selected taxonomic groups across the Kingdom Animalia. The MitoAge website assists in statistical analysis of different features of the mtDNA and their correlative links to longevity.
Price, Curtis V.; Maupin, Molly A.
The purpose of this report is to document the PSDB and explain the methods used to populate and update the data from the SDWIS, State datasets, and map and geospatial imagery. This report describes 3 data tables and 11 domain tables, including field contents, data sources, and relations between tables. Although the PSDB database is not available to the general public, this information should be useful for others who are developing other database systems to store and analyze public-supply system and facility data.
Ye, Pohao; Luan, Yizhao; Chen, Kaining; Liu, Yizhi; Xiao, Chuanle; Xie, Zhi
DNA methylation is an important type of epigenetic modification, of which 5-methylcytosine (5mC), 6-methyladenine (6mA) and 4-methylcytosine (4mC) are the most common forms. Previous efforts have largely focused on 5mC, providing invaluable insights into epigenetic regulation through DNA methylation. Recently developed single-molecule real-time (SMRT) sequencing technology provides a unique opportunity to detect the less studied DNA 6mA and 4mC modifications at single-nucleotide resolution. With the rapidly increasing amount of SMRT sequencing data being generated, there is an emerging demand to systematically explore DNA 6mA and 4mC modifications from these data sets. MethSMRT is the first resource hosting DNA 6mA and 4mC methylomes. All the data sets were processed using the same analysis pipeline with the same quality control. The current version of the database provides a platform to store, browse, search and download epigenome-wide methylation profiles of 156 species, including seven eukaryotes such as Arabidopsis, C. elegans, Drosophila, mouse and yeast, as well as 149 prokaryotes. It also offers a genome browser to visualize the methylation sites and related information such as single nucleotide polymorphisms (SNP) and genomic annotation. Furthermore, the database provides a quick statistical summary of the 6mA and 4mC methylome and predicted methylation motifs for each species. MethSMRT is publicly available at http://sysbio.sysu.edu.cn/methsmrt/ without use restriction. PMID:27924023
The forensic use of Deoxyribonucleic Acid (DNA) is demonstrating significant success as a crime-solving tool. However, numerous concerns have been raised regarding the potential for DNA use to contravene cultural, ethical, and legal codes. In this article the expectations and level of knowledge of the New Zealand public of the DNA data-bank and…
Reilly, P.R.; McEwen, J.E.; Small, D.
The purpose of the grant was to provide support to enable us to: (1) perform legal and empirical research and critically analyze DNA banking and DNA databanking as those activities are conducted by state forensic laboratories, the military, academic researchers, and commercial enterprises; and (2) develop a broadcast quality educational videotape for viewing by the general public about DNA technology and the privacy and related issues that it raises. The grant thus has both a research and analysis component and a public education component. This report outlines the work completed since the inception of the project and describes the activities still in progress.
Iquebal, Mir Asif; Jaiswal, Sarika; Angadi, U.B.; Sablok, Gaurav; Arora, Vasu; Kumar, Sunil; Rai, Anil; Kumar, Dinesh
DNA markers are valuable tools for increasing crop productivity: they help explain genetic variation and link Quantitative Trait Loci (QTL) to beneficial traits. Earlier approaches to developing Short Tandem Repeat (STR) markers were time-consuming and inefficient. Recent methods that develop STR markers from whole-genome or transcriptome data have gained wide importance, with immense potential for breeding and cultivar improvement. The availability of whole genome sequences and in silico approaches has revolutionized bulk marker discovery. We report the world’s first whole-genome marker discovery for sugarbeet, comprising 145 K markers along with 5 K functional domain markers, unified in a common platform, SBMDb, built with MySQL, Apache and PHP. Embedded markers and the corresponding location information can be selected by chromosome and location/interval, and primers can be generated using the Primer3 core integrated at the backend. Our analyses revealed an abundance of ‘mono’ repeats (76.82%) over ‘di’ repeats (13.68%). The highest density (671.05 markers/Mb) was found in chromosome 1 and the lowest density (341.27 markers/Mb) in chromosome 6. The current investigation of sugarbeet genome marker density has direct implications for increasing mapping marker density. This will improve on the present linkage map, which has a marker distance of ∼2 cM, i.e. from 200 to 2.6 Kb, thus facilitating QTL/gene mapping. We also report e-PCR-based detection of 2027 polymorphic markers in a panel of five genotypes. These markers can be used for the DUS test in variety identification and for MAS/GAS in variety improvement programs. The present database offers a wide source of potential markers for developing and implementing new approaches to molecular breeding, required to accelerate industrial use of this crop, especially for sugar, health care products, medicines and color dye. Identified markers will also help in improvement of bioenergy trait
Fróes, Adriana M; da Mota, Fábio F; Cuadrat, Rafael R C; Dávila, Alberto M R
β-lactams are the most widely used class of antibiotics in clinical practice; they block bacterial cell wall synthesis, causing cell death. However, some bacteria have evolved resistance to these antibiotics, mainly due to the production of enzymes known as β-lactamases. Hospital sewage is an important source of dispersion of multidrug-resistant bacteria in rivers and oceans. In this work, we used next-generation DNA sequencing to explore the diversity and dissemination of serine β-lactamases in two hospital sewage samples from Rio de Janeiro, Brazil (South Zone, SZ and North Zone, NZ), which presented different profiles, and to compare them with publicly available environmental data. We also propose a Hidden Markov Model approach to screen for potential serine β-lactamase genes (in public environmental samples and the generated hospital sewage data) and to explore their evolutionary relationships. Due to the high variability in β-lactamases, we used a position-specific scoring matrix search method (RPS-BLAST) against conserved domain database profiles (CDD, Pfam, and COG), followed by visual inspection to detect conserved motifs, to increase the reliability of the results and remove possible false positives. We were able to identify novel β-lactamases from Brazilian hospital sewage and to estimate the relative abundance of their types. The highest relative abundance found in SZ was Class A (50%), while Class D was predominant in NZ (55%). CfxA (65%) and ACC (47%) types were the most abundant genes detected in SZ, while in NZ the most frequent were OXA-10 (32%), CfxA (28%), ACC (21%), CEPA (20%), and FOX (19%). Phylogenetic analysis revealed that β-lactamases from Brazilian hospital sewage grouped in the same clade and close to sequences belonging to the Firmicutes and Bacteroidetes groups, but distant from potential β-lactamases screened from public environmental data, which grouped closer to β-lactamases of Proteobacteria. Our results demonstrated that HMM-based approach identified homologs of
Discovery of academic literature through Web search engines challenges the traditional role of specialized research databases. Creation of literature outside academic presses and peer-reviewed publications expands the content for scholarly research within a particular field. The resulting body of literature raises the question of whether scholars…
Gilson, Michael K.; Liu, Tiqing; Baitaluk, Michael; Nicola, George; Hwang, Linda; Chong, Jenny
BindingDB, www.bindingdb.org, is a publicly accessible database of experimental protein-small molecule interaction data. Its collection of over a million data entries derives primarily from scientific articles and, increasingly, US patents. BindingDB provides many ways to browse and search for data of interest, including an advanced search tool, which can cross searches of multiple query types, including text, chemical structure, protein sequence and numerical affinities. The PDB and PubMed provide links to data in BindingDB, and vice versa; and BindingDB provides links to pathway information, the ZINC catalog of available compounds, and other resources. The BindingDB website offers specialized tools that take advantage of its large data collection, including ones to generate hypotheses for the protein targets bound by a bioactive compound, and for the compounds bound by a new protein of known sequence; and virtual compound screening by maximal chemical similarity, binary kernel discrimination, and support vector machine methods. Specialized data sets are also available, such as binding data for hundreds of congeneric series of ligands, drawn from BindingDB and organized for use in validating drug design methods. BindingDB offers several forms of programmatic access, and comes with extensive background material and documentation. Here, we provide the first update of BindingDB since 2007, focusing on new and unique features and highlighting directions of importance to the field as a whole. PMID:26481362
Yu, Huidan; Meneveau, Charles
We study the Lagrangian time evolution of velocity gradient dynamics near the Vieillefosse tail. The data are obtained from fluid particle tracking through the 1024^4 space-time DNS of forced isotropic turbulence at Reλ=433, using a web-based public database (http://turbulence.pha.jhu.edu). Examination of individual time series of the velocity gradient invariants R and Q shows that they are punctuated by strong peaks of negative Q and positive R. Most of these occur very close to the Vieillefosse tail along Q = -(3/2^(2/3)) R^(2/3). It is found there that the magnitude of the pressure Hessian has a positive Lagrangian time derivative, meaning that it increases in order to resist the rapid growth. We also observe a "phase delay" of the pressure Hessian signals compared to those of R and Q, indicative of an "overshoot" of the controlling mechanism. We also examine the trajectories in the recently proposed 3-D extension of the R-Q plane (see Lüthi B, Holzner M, Tsinober A. 2009, J. Fluid Mech. 641, 497-507). Finally, Lagrangian models of the velocity gradient tensor are examined in the same light to identify similarities and differences with the observed dynamics. Such comparisons supply informative guidance for model improvements.
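The invariants R and Q and the Vieillefosse tail mentioned in this abstract can be illustrated with a short sketch. It assumes the usual definitions for a trace-free (incompressible) velocity gradient tensor A, namely Q = -tr(A^2)/2 and R = -tr(A^3)/3, which the abstract itself does not spell out. The function names are hypothetical, and nothing here queries the turbulence database.

```python
def matmul3(X, Y):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def trace3(X):
    """Trace of a 3x3 matrix."""
    return X[0][0] + X[1][1] + X[2][2]

def invariants(A):
    """Return (Q, R) for a 3x3 velocity gradient tensor A, assumed trace-free."""
    A2 = matmul3(A, A)
    A3 = matmul3(A2, A)
    Q = -0.5 * trace3(A2)
    R = -trace3(A3) / 3.0
    return Q, R

def vieillefosse_q(R):
    """Q on the Vieillefosse tail for R >= 0: Q = -(3 / 2^(2/3)) * R^(2/3)."""
    return -(3.0 / 2.0 ** (2.0 / 3.0)) * R ** (2.0 / 3.0)

def discriminant(Q, R):
    """(27/4) R^2 + Q^3, which vanishes on the Vieillefosse line."""
    return 6.75 * R ** 2 + Q ** 3

# Axisymmetric example (assumed, not from the database): two equal eigenvalues
# place the state exactly on the tail.
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, -2.0]]
Q, R = invariants(A)
```

For this diagonal example the state has negative Q and positive R, and `discriminant(Q, R)` is zero up to rounding, which is the condition that defines the tail.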
Maguire, C N; McCallum, L A; Storey, C; Whitaker, J P
The National DNA Database (NDNAD) of England and Wales was established on April 10th 1995. The NDNAD is governed by a variety of legislative instruments that mean that DNA samples can be taken if an individual is arrested and detained in a police station. The biological samples and the DNA profiles derived from them can be used for purposes related to the prevention and detection of crime, the investigation of an offence and for the conduct of a prosecution. Following the South East Asian Tsunami of December 2004, the legislation was amended to allow the use of the NDNAD to assist in the identification of a deceased person or of a body part where death has occurred from natural causes or from a natural disaster. The UK NDNAD now contains the DNA profiles of approximately 6 million individuals representing 9.6% of the UK population. As the science of DNA profiling advanced, the National DNA Database provided a potential resource for increased intelligence beyond the direct matching for which it was originally created. The familial searching service offered to the police by several UK forensic science providers exploits the size and geographic coverage of the NDNAD and the fact that close relatives of an offender may share a significant proportion of that offender's DNA profile and will often reside in close geographic proximity to him or her. Between 2002 and 2011 Forensic Science Service Ltd. (FSS) provided familial search services to support 188 police investigations, 70 of which are still active cases. This technique, which may be used in serious crime cases or in 'cold case' reviews when there are few or no investigative leads, has led to the identification of 41 perpetrators or suspects. In this paper we discuss the processes, utility, and governance of the familial search service in which the NDNAD is searched for close genetic relatives of an offender who has left DNA evidence at a crime scene, but whose DNA profile is not represented within the NDNAD. We
Trijau, Sophie; de Lamotte, Gaëlle; Pradel, Vincent; Natali, François; Allaria-Lapierre, Véronique; Coudert, Hervé; Pham, Thao; Sciortino, Vincent; Lafforgue, Pierre
Introduction: Long-term glucocorticoid therapy is the leading cause of secondary osteoporosis. The management of glucocorticoid-induced osteoporosis (GIOP) seems to be inadequate in many European countries. Objective: To evaluate the rate of screening and treatment of GIOP. Design: Information was collected from a national public health-insurance database in our geographic area of Provence-Alpes-Côte-d'Azur and in Corsica, from September 2009 through August 2011. Patients: We identified participants aged 15 years and over starting glucocorticoid therapy (≥7.5 mg of prednisone equivalent per day for at least 90 consecutive days). This cohort was compared with an age-matched and sex-matched population that did not receive glucocorticoids. Main outcome measures: Bone mass, prescription of bone antiresorptive medication and use of calcium and/or vitamin D treatment. Results: We identified 32 812 patients who were prescribed glucocorticoid therapy, yielding 1% prevalence. Incidence of glucocorticoid therapy was 2.8/1000 inhabitants/year. Males represented 44%, and the mean age was 58 years. The median prednisone-equivalent dose was 11 mg/day (IQR 9–18 mg/day). 8% underwent bone mass measurement. Calcium and/or vitamin D, and bisphosphonates were prescribed in 18% and 12%, respectively. Results were lower for the control population: 3% underwent bone mass measurement and 3% received bisphosphonate therapy. The rates of osteodensitometry and treatments were higher in women over 55 years of age than in men and in women 55 years of age and younger, and also when glucocorticoid therapy was initiated by a rheumatologist rather than another physician specialty. Conclusions: The management of GIOP remains very inadequate, despite the availability of a statutory health insurance system. Targeted interventions are needed to improve the management of GIOP. PMID:27486526
Fostel, Jennifer M.
Integration, re-use and meta-analysis of high content study data, typical of DNA microarray studies, can increase its scientific utility. Access to study data and design parameters would enhance the mining of data integrated across studies. However, without standards for which data to include in exchange, and common exchange formats, publication of high content data is time-consuming and often prohibitive. The MGED Society (www.mged.org) was formed in response to the widespread publication of microarray data, and the recognition of the utility of data re-use for meta-analysis. The NIEHS has developed the Chemical Effects in Biological Systems (CEBS) database, which can manage and integrate study data and design from biological and biomedical studies. As community standards are developed for study data and metadata, it will become increasingly straightforward to publish high content data in CEBS, where they will be available for meta-analysis. Different exchange formats for study data are being developed: Standard for Exchange of Nonclinical Data (SEND; www.cdisc.org); Tox-ML (www.Leadscope.com) and Simple Investigation Formatted Text (SIFT) from the NIEHS. Data integration can be done at the level of conclusions about responsive genes and phenotypes, and this workflow is supported by CEBS. CEBS also integrates raw and preprocessed data within a given platform. The utility of, and a method for, integrating data within and across DNA microarray studies are shown in an example analysis using DrugMatrix data deposited in CEBS by Iconix Pharmaceuticals.
Oppikofer, Thierry; Nordahl, Bobo; Bunkholt, Halvor; Nicolaisen, Magnus; Jarna, Alexandra; Iversen, Sverre; Hermanns, Reginald L.; Böhme, Martina; Yugsi Molina, Freddy X.
The unstable rock slope database is developed and maintained by the Geological Survey of Norway as part of the systematic mapping of unstable rock slopes in Norway. This mapping aims to detect catastrophic rock slope failures before they occur. More than 250 unstable slopes with post-glacial deformation have been detected to date. The main aims of the unstable rock slope database are (1) to serve as a national archive for unstable rock slopes in Norway; (2) to support data collection and storage during field mapping; (3) to provide decision-makers with hazard zones and other necessary information on unstable rock slopes for land-use planning and mitigation; and (4) to inform the public through an online map service. The database is organized hierarchically, with a main point for each unstable rock slope to which several feature classes and tables are linked. This main point feature class includes several general attributes of the unstable rock slopes, such as site name, general and geological descriptions, executed works, recommendations, technical parameters (volume, lithology, mechanism and others), displacement rates, possible consequences, as well as hazard and risk classification. Feature classes and tables linked to the main feature class include different scenarios of an unstable rock slope, field observation points, sampling points for dating, displacement measurement stations, lineaments, unstable areas, run-out areas, areas affected by secondary effects, along with tables for hazard and risk classification and URL links to further documentation and references. The database on unstable rock slopes in Norway will be publicly consultable through an online map service. Factsheets with key information on unstable rock slopes can be automatically generated and downloaded for each site. Areas of possible rock avalanche run-out and their secondary effects displayed in the online map service, along with hazard and risk assessments, will become important tools for
Samson, F; Brunaud, V; Balzergue, S; Dubreucq, B; Lepiniec, L; Pelletier, G; Caboche, M; Lecharny, A
A large collection of T-DNA insertion transformants of Arabidopsis thaliana has been generated at the Institute of Agronomic Research, Versailles, France. The molecular characterisation of the insertion sites is currently performed by sequencing genomic regions flanking the inserted T-DNA (FST). The almost complete sequence of the nuclear genome of A.thaliana provides the framework for organising FSTs in a genome oriented database, FLAGdb/FST (http://genoplante-info.infobiogen.fr). The main scope of FLAGdb/FST is to help biologists to find the FSTs that interrupt the genes in which they are interested. FSTs are anchored to the genome sequences of A.thaliana and positions of both predicted genes and FSTs are shown graphically on sequences. Requests to locate the genomic position of a query sequence are made using BLAST programs. The response delivered by FLAGdb/FST is a graphical representation of the putative FSTs and of predicted genes in a 20 kb region.
An electronically portable two-locus DNA sequence database, comprising partial sequences of the translation elongation factor gene (EF-1a, 634 bp alignment) and nearly complete sequences of the nuclear ribosomal intergenic spacer region (IGS rDNA, 2220 bp alignment) for 850 isolates spanning the phy...
Jenjaroenpun, Piroon; Chew, Chee Siang; Yong, Tai Pang; Choowongkomon, Kiattawee; Thammasorn, Wimada; Kuznetsov, Vladimir A
A triplex target DNA site (TTS), a stretch of DNA that is composed of polypurines, is able to form a triple-helix (triplex) structure with triplex-forming oligonucleotides (TFOs) and is able to influence the site-specific modulation of gene expression and/or the modification of genomic DNA. The co-localization of a genomic TTS with gene regulatory signals and functional genome structures suggests that TFOs could potentially be exploited in antigene strategies for the therapy of cancers and other genetic diseases. Here, we present the TTS Mapping and Integration (TTSMI; http://ttsmi.bii.a-star.edu.sg) database, which provides a catalog of unique TTS locations in the human genome and tools for analyzing the co-localization of TTSs with genomic regulatory sequences and signals that were identified using next-generation sequencing techniques and/or predicted by computational models. TTSMI was designed as a user-friendly tool that facilitates (i) fast searching/filtering of TTSs using several search terms and criteria associated with sequence stability and specificity, (ii) interactive filtering of TTSs that co-localize with gene regulatory signals and non-B DNA structures, (iii) exploration of dynamic combinations of the biological signals of specific TTSs and (iv) visualization of a TTS simultaneously with diverse annotation tracks via the UCSC genome browser.
Dean, M.; Allikmets, R.
Partially sequenced cDNAs, or expressed sequence tags (ESTs), are claimed to represent an efficient strategy for characterizing an organism's genes. By necessity, these sequences are incompletely characterized, and examples of contamination of cDNA libraries with sequences from other species have been described. It has been suggested that a human T-cell cDNA library (Clontech HL1963g) is contaminated by sequences from yeast (Saccharomyces cerevisiae) and an unknown bacterium. We are characterizing human ESTs that represent new members of the ATP-binding cassette transporter superfamily. In examining human ESTs generated from the T-cell library, we have encountered one gene that was in fact a yeast sequence (GenBank Z15214 = SSH2 locus) and several genes that do not hybridize to human DNA or RNA. PCR primers from these sequences failed to amplify a product from human, yeast, or Escherichia coli DNA but did produce a product from a Clontech kidney cDNA library (HL1123a). To determine the source of the contamination, we amplified a conserved segment of the 16S rDNA (following a suggestion from Dr. C. Savakis) from the kidney library. The sequence of this product was nearly identical to that of the bacterium Leuconostoc lactis (300 of 304 bp). Leuconostoc species are commonly found in dairy products, fruits, vegetables, and wine and are nonpathogenic to humans. 6 refs., 1 fig.
Lapointe, Martine; Rogic, Anita; Bourgoin, Sarah; Jolicoeur, Christine; Séguin, Diane
In recent years, sophisticated technology has significantly increased the sensitivity and analytical power of genetic analyses, so that very little starting material may now produce viable genetic profiles. This sensitivity, however, has also increased the risk of detecting unknown genetic profiles that are assumed to be those of the perpetrator but in fact originate from extraneous sources such as crime scene workers. These contaminants may mislead investigations, keeping criminal cases active and unresolved for long spans of time. Voluntary submission of DNA samples from crime scene workers is fairly low; we have therefore created a promotional method for our staff elimination database that has resulted in a significant increase in voluntary samples since 2011. Our database enforces privacy safeguards and allows for optional anonymity for all staff members. We also offer information sessions at various police precincts to advise crime scene workers of the importance and success of our staff elimination database. This study, a pioneer in its field, has obtained 327 voluntary submissions from crime scene workers to date, of which 46 individual profiles (14%) have been matched to 58 criminal cases. By implementing our methods and respect for individual privacy, forensic laboratories everywhere may see similar growth and success in explaining unidentified genetic profiles in stagnant criminal cases.
Hicks, T; Taroni, F; Curran, J; Buckleton, J; Castella, V; Ribaux, O
Familial searching consists of searching for a full profile left at a crime scene in a National DNA Database (NDNAD). In this paper we are interested in the circumstance where no full match is returned, but a partial match is found between a database member's profile and the crime stain. Because close relatives share more of their DNA than unrelated persons, this partial match may indicate that the crime stain was left by a close relative of the person with whom the partial match was found. This approach has successfully solved important crimes in the UK and the USA. In a previous paper, a model, which takes into account substructure and siblings, was used to simulate a NDNAD. In this paper, we have used this model to test the usefulness of familial searching and offer guidelines for pre-assessment of the cases based on the likelihood ratio. Siblings of "persons" present in the simulated Swiss NDNAD were created. These profiles (N=10,000) were used as traces and were then compared to the whole database (N=100,000). The statistical results obtained show that the technique has great potential confirming the findings of previous studies. However, effectiveness of the technique is only one part of the story. Familial searching has juridical and ethical aspects that should not be ignored. In Switzerland for example, there are no specific guidelines to the legality or otherwise of familial searching. This article both presents statistical results, and addresses criminological and civil liberties aspects to take into account risks and benefits of familial searching.
Myers, Steven P; Timken, Mark D; Piucci, Matthew L; Sims, Gary A; Greenwald, Michael A; Weigand, James J; Konzak, Kenneth C; Buoncristiani, Martin R
A validation study was performed to measure the effectiveness of using a likelihood ratio-based approach to search for possible first-degree familial relationships (full-sibling and parent-child) by comparing an evidence autosomal short tandem repeat (STR) profile to California's ∼1,000,000-profile State DNA Index System (SDIS) database. Test searches used autosomal STR and Y-STR profiles generated for 100 artificial test families. When the test sample and the first-degree relative in the database were characterized at the 15 Identifiler(®) (Applied Biosystems(®), Foster City, CA) STR loci, the search procedure included 96% of the fathers and 72% of the full-siblings. When the relative profile was limited to the 13 Combined DNA Index System (CODIS) core loci, the search procedure included 93% of the fathers and 61% of the full-siblings. These results, combined with those of functional tests using three real families, support the effectiveness of this tool. Based upon these results, the validated approach was implemented as a key, pragmatic and demonstrably practical component of the California Department of Justice's Familial Search Program. An investigative lead created through this process recently led to an arrest in the Los Angeles Grim Sleeper serial murders.
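To make the idea of a likelihood ratio for a first-degree relationship concrete, here is a minimal single-locus sketch. It is not the procedure validated in this study, whose details the abstract does not give. It assumes Hardy-Weinberg proportions, independent loci, no mutation and no population substructure correction, and uses the standard identity-by-descent weights for parent-child (0, 1, 0) and full siblings (1/4, 1/2, 1/4). All function names and allele frequencies are hypothetical.

```python
# Probabilities of sharing 0, 1 or 2 alleles identical by descent (IBD).
IBD = {"parent-child": (0.0, 1.0, 0.0), "full-sibling": (0.25, 0.5, 0.25)}

def genotype_prob(g, freq):
    """Hardy-Weinberg probability of the unordered genotype g = (c, d)."""
    c, d = g
    return freq[c] ** 2 if c == d else 2 * freq[c] * freq[d]

def prob_given_shared(g, shared, freq):
    """P(genotype g | one allele is `shared` by descent, the other is random)."""
    c, d = g
    if c == d:
        return freq[c] if shared == c else 0.0
    p = 0.0
    if shared == c:
        p += freq[d]
    if shared == d:
        p += freq[c]
    return p

def locus_lr(evidence, candidate, freq, relationship):
    """LR at one locus: P(evidence | relative of candidate) / P(evidence | unrelated)."""
    k0, k1, k2 = IBD[relationship]
    a, b = candidate
    p_unrelated = genotype_prob(evidence, freq)
    p_one = 0.5 * (prob_given_shared(evidence, a, freq)
                   + prob_given_shared(evidence, b, freq))
    p_two = 1.0 if sorted(evidence) == sorted(candidate) else 0.0
    return (k0 * p_unrelated + k1 * p_one + k2 * p_two) / p_unrelated

def profile_lr(evidence, candidate, freqs, relationship):
    """Multiply per-locus LRs, assuming the loci are independent."""
    lr = 1.0
    for locus in evidence:
        lr *= locus_lr(evidence[locus], candidate[locus], freqs[locus], relationship)
    return lr

# Made-up allele frequencies for one locus, for illustration only.
freq = {"12": 0.1, "13": 0.2, "14": 0.7}
lr_parent = locus_lr(("12", "13"), ("12", "13"), freq, "parent-child")
lr_sibling = locus_lr(("12", "13"), ("12", "13"), freq, "full-sibling")
```

In a database search, a score of this kind would be computed against every stored profile and the candidates ranked by it. A parent-child LR drops to zero as soon as one locus shares no allele, whereas a sibling LR does not, which is one intuition for why siblings are harder to recover than fathers.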
Eduardoff, Mayra; Huber, Gabriela; Bayer, Birgit; Schmid, Dagmar; Anslinger, Katja; Göbel, Tanja; Zimmermann, Bettina; Schneider, Peter M; Röck, Alexander W; Parson, Walther
In forensic genetics mitochondrial DNA (mtDNA) is usually analyzed by direct Sanger-type sequencing (STS). This method is known to be laborious and sometimes prone to human error. Alternative methods have been proposed that lead to faster results. Among these are methods that involve mass-spectrometry resulting in base composition profiles that are, by definition, less informative than the full nucleotide sequence. Here, we applied a highly automated electrospray ionization mass spectrometry (ESI-MS) system (PLEX-ID) to an mtDNA population study to compare its performance with respect to throughput and concordance to STS. We found that the loss of information power was relatively low compared to the gain in speed and analytical standardization. The detection of point and length heteroplasmy turned out to be roughly comparable between the technologies with some individual differences related to the processes. We confirm that ESI-MS provides a valuable platform for analyzing mtDNA variation that can also be applied in the forensic context.
Astrin, Jonas J.; Höfer, Hubert; Spelda, Jörg; Holstein, Joachim; Bayer, Steffen; Hendrich, Lars; Huber, Bernhard A.; Kielhorn, Karl-Hinrich; Krammer, Hans-Joachim; Lemke, Martin; Monje, Juan Carlos; Morinière, Jérôme; Rulik, Björn; Petersen, Malte; Janssen, Hannah; Muster, Christoph
As part of the German Barcode of Life campaign, over 3500 arachnid specimens have been collected and analyzed: ca. 3300 Araneae and 200 Opiliones, belonging to almost 600 species (median: 4 individuals/species). This covers about 60% of the spider fauna and more than 70% of the harvestmen fauna recorded for Germany. The overwhelming majority of species could be readily identified through DNA barcoding: median distances between closest species lay around 9% in spiders and 13% in harvestmen, while in 95% of the cases, intraspecific distances were below 2.5% and 8% respectively, with intraspecific medians at 0.3% and 0.2%. However, almost 20 spider species, most notably in the family Lycosidae, could not be separated through DNA barcoding (although many of them present discrete morphological differences). Conspicuously high interspecific distances were found in even more cases, hinting at cryptic species in some instances. A new program is presented: DiStats calculates the statistics needed to meet DNA barcode release criteria. Furthermore, new generic COI primers useful for a wide range of taxa (also other than arachnids) are introduced. PMID:27681175
Dürrschmid, Karin; Marzban, Gorji; Dürrschmid, Eberhard; Striedner, Gerald; Clementschitsch, Franz; Cserjan-Puschmann, Monika; Bayer, Karl
The expression of human superoxide dismutase in fed-batch fermentation of E. coli HMS174(DE3)(pET3ahSOD) was studied as a model system. Due to the frequently used strong T7 promoter system a high metabolic load is exerted, which triggers stress response mechanisms and finally leads to the differentiation of the host cell. As a consequence, host cell metabolism is partly shifted from growth to survival, accompanied by significant alterations of the protein pattern. In terms of process optimization, two-dimensional electrophoresis serves as a powerful tool to monitor these changes at the protein level. For the analysis of samples derived from different states of recombinant protein production, wide-range Immobiline Dry Strips pH 3-10 were used. In order to establish an efficient procedure for accelerated process optimization and to avoid costly and time-consuming analyses such as mass spectrometry (MS), a database approach for the identification of significant changes of the protein pattern was evaluated. On average, 935 spots per gel were detected, of which 50 are presumably stress-relevant. Out of these, 24 proteins could be identified by using the SWISS-2DPAGE database (www.expasy.ch/ch2d/). The identified proteins are involved in regulatory networks, energy metabolism, purine and pyrimidine nucleotide synthesis and translation. By this database approach, significant fluctuations of individual proteins in relation to recombinant protein production could be identified. Seven proteins show strong alterations (>100%) directly after induction and can therefore be regarded as reliable marker proteins for the assessment of stress response. For distinctive interpretation of this highly specific information, a bioinformatic and statistical tool would be essential in order to understand the role and contribution of individual proteins in stress response.
Lung cancer is one of the main public health issues in developed countries. Lung cancer typically manifests itself as non-calcified pulmonary nodules that can be detected by reading lung Computed Tomography (CT) images. To assist radiologists in reading images, researchers started, a decade ago, the development of Computer Aided Detection (CAD) methods capable of detecting lung nodules. In this work, a CAD composed of two CAD subprocedures is presented: , devoted to the identification of parenchymal nodules, and , devoted to the identification of the nodules attached to the pleura surface. Both CADs are an upgrade of two methods previously presented as Voxel Based Neural Approach CAD. The novelty of this paper consists in the massive training using the public research Lung Image Database Consortium (LIDC) database and in the implementation of new features for classification with respect to the original VBNA method. Finally, the proposed CAD is blindly validated on the ANODE09 dataset. The result of the validation is a score of 0.393, which corresponds to the average sensitivity of the CAD computed at seven predefined false positive rates: 1/8, 1/4, 1/2, 1, 2, 4, and 8 FP/CT.
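The score described in the abstract above is an average of sensitivities read off a FROC curve at seven fixed false positive rates. The sketch below shows that arithmetic only. The linear interpolation between measured operating points, the clamping outside the measured range, and the function names are assumptions made for illustration, not the official evaluation procedure.

```python
# The seven predefined operating points, in false positives per CT scan.
FP_RATES = (0.125, 0.25, 0.5, 1, 2, 4, 8)


def sensitivity_at(froc_points, fp_rate):
    """Sensitivity at fp_rate from (fp_per_scan, sensitivity) pairs.

    Assumption: linear interpolation between neighbouring points, clamped to the
    first/last measured sensitivity outside the measured range.
    """
    pts = sorted(froc_points)
    if fp_rate <= pts[0][0]:
        return pts[0][1]
    if fp_rate >= pts[-1][0]:
        return pts[-1][1]
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= fp_rate <= x1:
            if x1 == x0:
                return max(y0, y1)
            return y0 + (y1 - y0) * (fp_rate - x0) / (x1 - x0)


def average_sensitivity(froc_points, fp_rates=FP_RATES):
    """Mean sensitivity over the predefined false positive rates."""
    return sum(sensitivity_at(froc_points, r) for r in fp_rates) / len(fp_rates)
```

A score of 0.393 therefore means that, averaged over the seven operating points, about 39% of nodules were detected.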
Hatch, Scott A.
For more than 300 years, the peer-reviewed journal article has been the principal medium for packaging and delivering scientific data. With new tools for managing digital data, a new paradigm is emerging—one that demands open and direct access to data and that enables and rewards a broad-based approach to scientific questions. Ground-breaking papers in the future will increasingly be those that creatively mine and synthesize vast stores of data available on the Internet. This is especially true for conservation science, in which essential data can be readily captured in standard record formats. For seabird professionals, a number of globally shared databases are in the offing, or should be. These databases will capture the salient results of inventories and monitoring, pelagic surveys, diet studies, and telemetry. A number of real or perceived barriers to data sharing exist, but none is insurmountable. Our discipline should take an important stride now by adopting a specially designed markup language for annotating and sharing seabird data.
Brandt, Bernd W; Heringa, Jaap
Profile-profile methods are well suited to detect remote evolutionary relationships between protein families. Profile Comparer (PRC) is an existing stand-alone program for scoring and aligning hidden Markov models (HMMs), which are based on multiple sequence alignments. Since PRC compares profile HMMs instead of sequences, it can be used to find distant homologues. For this purpose, PRC is used by, for example, the CATH and Pfam-domain databases. As PRC is a profile comparer, it only reports profile HMM alignments and does not produce multiple sequence alignments. We have developed the webPRC server, which makes it straightforward to search for distant homologues or similar alignments in a number of domain databases. In addition, it provides the results both as multiple sequence alignments and aligned HMMs. Furthermore, the user can view the domain annotation, evaluate the PRC hits with the Jalview multiple alignment editor and generate logos from the aligned HMMs or the aligned multiple alignments. Thus, this server assists in detecting distant homologues with PRC as well as in evaluating and using the results. The webPRC interface is available at http://www.ibi.vu.nl/programs/prcwww/.
Maldonado, Carla; Molina, Carlos I.; Zizka, Alexander; Persson, Claes; Taylor, Charlotte M.; Albán, Joaquina; Chilquillo, Eder; Antonelli, Alexandre
Abstract Aim Massive digitalization of natural history collections is now leading to a steep accumulation of publicly available species distribution data. However, taxonomic errors and geographical uncertainty of species occurrence records are now acknowledged by the scientific community, putting into question to what extent such data can be used to unveil correct patterns of biodiversity and distribution. We explore this question through quantitative and qualitative analyses of uncleaned versus manually verified datasets of species distribution records across different spatial scales. Location The American tropics. Methods As a test case we used the plant tribe Cinchoneae (Rubiaceae). We compiled four datasets of species occurrences: one created manually and verified through classical taxonomic work, and the rest derived from GBIF under different cleaning and filling schemes. We used new bioinformatic tools to code species into grids, ecoregions, and biomes following WWF's classification. We analysed species richness and altitudinal ranges of the species. Results Altitudinal ranges for species and genera were correctly inferred even without manual data cleaning and filling. However, erroneous records affected spatial patterns of species richness. They led to an overestimation of species richness in certain areas outside the centres of diversity in the clade. The location of many of these areas comprised the geographical midpoint of countries and political subdivisions, assigned long after the specimens had been collected. Main conclusion Open databases and integrative bioinformatic tools allow a rapid approximation of large‐scale patterns of biodiversity across space and altitudinal ranges. We found that geographic inaccuracy affects diversity patterns more than taxonomic uncertainties, often leading to false positives, i.e. overestimating species richness in relatively species-poor regions. Public databases for species distribution are valuable and should be
Eklund, Peter W.
This paper surveys public domain supervised learning algorithms and performs accuracy (error rate) analysis of their classification performance on unseen instances for twenty-nine of the University of California at Irvine machine learning datasets. The learning algorithms represent three types of classifiers: decision trees, neural networks and rule-based classifiers. The study performs data analysis and examines the effect of irrelevant attributes to explain the performance characteristics of the learning algorithms. The survey concludes with some general recommendations about the selection of public domain machine-learning algorithms relative to the properties of the data examined.
Allele frequency distributions for thirteen STR loci (D3S1358, vWA, FGA, D8S1179, D21S11, D18S51, D5S818, D13S317, D16S539, TH01, TPOX, CSF1PO, and D7S820) have been determined by multiplex amplification and subsequent automatic fluorescent detection. The results of the statistical analysis show that the 13 loci satisfy Hardy-Weinberg expectations. The data obtained from this study will be used as reference data for forensic DNA analysis in China.
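The abstract above does not say which test was used to check Hardy-Weinberg expectations. As one possible illustration, the sketch below computes a Pearson chi-square statistic for a single locus from observed genotype counts. The function name and the example counts are hypothetical, and for multi-allelic STR loci with sparse genotype classes an exact test is often preferred in practice.

```python
from collections import Counter
from itertools import combinations_with_replacement


def hwe_chi_square(genotype_counts):
    """Pearson chi-square statistic comparing observed genotype counts at one locus
    with Hardy-Weinberg expectations.

    genotype_counts maps (allele, allele) tuples to observed counts.
    """
    n = sum(genotype_counts.values())
    allele_counts = Counter()
    observed = Counter()
    for (a, b), count in genotype_counts.items():
        allele_counts[a] += count
        allele_counts[b] += count
        observed[tuple(sorted((a, b)))] += count  # treat (a, b) and (b, a) as one class
    freq = {allele: c / (2 * n) for allele, c in allele_counts.items()}

    chi2 = 0.0
    for a, b in combinations_with_replacement(sorted(freq), 2):
        # Expected count: n * p^2 for homozygotes, n * 2pq for heterozygotes.
        expected = n * (freq[a] ** 2 if a == b else 2 * freq[a] * freq[b])
        chi2 += (observed[(a, b)] - expected) ** 2 / expected
    return chi2
```

The statistic would then be compared with a chi-square distribution whose degrees of freedom equal the number of genotype classes minus the number of alleles.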
Wu, Tsung-Jung; Shamsaddini, Amirhossein; Pan, Yang; Smith, Krista; Crichton, Daniel J; Simonyan, Vahan; Mazumder, Raja
Years of sequence feature curation by UniProtKB/Swiss-Prot, PIR-PSD, NCBI-CDD, RefSeq and other database biocurators have led to a rich repository of information on functional sites of genes and proteins. This information, along with variation-related annotation, can be used to scan human short sequence reads from next-generation sequencing (NGS) pipelines for the presence of non-synonymous single-nucleotide variations (nsSNVs) that affect functional sites. This and similar workflows are becoming more important because thousands of NGS data sets are being made available through projects such as The Cancer Genome Atlas (TCGA), and researchers want to evaluate their biomarkers in genomic data. BioMuta, an integrated sequence feature database, provides a framework for automated and manual curation and integration of cancer-related sequence features so that they can be used in NGS analysis pipelines. Sequence feature information in BioMuta is collected from the Catalogue of Somatic Mutations in Cancer (COSMIC), ClinVar, UniProtKB and through biocuration of information available from publications. Additionally, nsSNVs identified through automated analysis of NGS data from TCGA are included in the database. Because of the petabytes of data and information present in NGS primary repositories, a platform, HIVE (High-performance Integrated Virtual Environment), for storing, analyzing, computing and curating NGS data and associated metadata has been developed. Using HIVE, 31 979 nsSNVs were identified in TCGA-derived NGS data from breast cancer patients. All variations identified through this process are stored in a Curated Short Read archive, and the nsSNVs from the tumor samples are included in BioMuta. Currently, BioMuta has 26 cancer types with 13 896 small-scale and 308 986 large-scale study-derived variations. Integration of variation data allows identification of novel or common nsSNVs that can be prioritized in validation studies. Database URL: BioMuta: http
Kalupa, Frank B.; Trotter, Edgar P.
To develop research capabilities among public relations professionals, a research teaching program was begun as part of an advanced undergraduate course using an online computer terminal system to analyze up-to-date field survey data. Students use an interactive computer program designed to provide a linking system of live research data and active…
Distributed Structure-Searchable Toxicity (DSSTox) Database Network: Making Public Toxicity Data Resources More Accessible and Usable for Data Exploration and SAR Development
Many sources of public toxicity data are not currently linked to chemical structure, are not ...
Fuerst, Paul A
Species of Acanthamoeba have traditionally been described using morphology (primarily cyst structure), or cytology of nuclear division (used by Pussard and Pons, 1977). Twenty-plus putative species were proposed based on such criteria. Morphology, however, is often plastic, dependent upon culture conditions. DNA sequences of the nuclear small subunit (18S) rRNA that can be used for the study of the phylogeny of Acanthamoeba have increased from a single sequence in 1986 to more than 1800 in 2013. Some of the patterns in the sequence data for Acanthamoeba are reviewed, and some of the insights that these data provide are illustrated. In particular, the data suggest the existence of 20 or more genotypic types, a number not dissimilar to the number of named species of Acanthamoeba. However, molecular studies make clear that the relationship between phylogenetic relatedness and species names as we know them for Acanthamoeba is tenuous at best.
Guz, A. N.; Rushchitsky, J. J.
The paper analyzes the level of coverage and citation of publications by mechanicians of the National Academy of Sciences of Ukraine (NASU) in the Scopus database. Two groups of mechanicians are considered. One group includes 66 doctors of sciences of the S. P. Timoshenko Institute of Mechanics as representatives of the oldest institute of the NASU. The other group includes 34 members (academicians and corresponding members) of the Division of Mechanics of the NASU as representatives of the authoritative community of mechanicians in Ukraine. The results are presented for each scientist in the form of two indices: the total number of publications accessible in the database, as the level of coverage of the scientist's publications in this database, and the h-index, as the citation level of these publications. This paper may be considered a continuation of the papers [6-12] published in Prikladnaya Mekhanika (International Applied Mechanics) in 2005-2009.
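For readers unfamiliar with the second index, the h-index is the largest h such that a scientist has h publications with at least h citations each. The sketch below computes it from a list of per-publication citation counts; the function name and the example counts are illustrative and are not taken from the paper.

```python
def h_index(citations):
    """Largest h such that at least h publications have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h
```

For example, citation counts of 10, 8, 5, 4 and 3 give an h-index of 4, since four papers have at least four citations each but there are not five with at least five.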
Limburg, Petra A; Weider, Lawrence J
Recent work on the diapausing egg banks of zooplankton, such as Daphnia (Crustacea: Anomopoda), indicates that these eggs can remain viable for decades while, theoretically, DNA can remain intact for even longer periods (i.e. centuries or millennia). We isolated diapausing eggs of Daphnia from a 30 m long sediment core taken from a hypereutrophic, northern German lake (Belauer See), with some eggs found in dated core material as old as 4500 years. Using microsatellite markers, we analysed the genetic structure of the resting eggs dated as old as ca. 200 years, and found that, although levels of heterozygosity remained remarkably stable, significant genetic differentiation (Nei's D = 0.36; F(ST) = 0.15) between recent and 'ancient' resting eggs (including allele frequency shifts and private alleles) was detected. These shifts represent either species-level changes in this complex (i.e. species-specific characters of ephippia are not always robust), or intraspecific shifts in genetic variation, or a combination of both. This study demonstrates that the egg banks of aquatic zooplankton can serve as repositories of both genetic (intrapopulational) and ecological (interspecific) information. The use of molecular markers, such as microsatellites, on diapausing egg/seed banks may open new avenues of enquiry related to tracking the long-term genetic (and/or species) shifts that are associated with long-term environmental changes.
Basic Law 10/2007 of 8 October, which regulates the police database of identifiers obtained from DNA, has recently entered into effect. In this article, the author describes the process by which the law was approved and examines certain of its aspects from a genetic perspective.
Skripcak, Tomas; Belka, Claus; Bosch, Walter; Brink, Carsten; Brunner, Thomas; Budach, Volker; Büttner, Daniel; Debus, Jürgen; Dekker, Andre; Grau, Cai; Gulliford, Sarah; Hurkmans, Coen; Just, Uwe; Krause, Mechthild; Lambin, Philippe; Langendijk, Johannes A; Lewensohn, Rolf; Lühr, Armin; Maingon, Philippe; Masucci, Michele; Niyazi, Maximilian; Poortmans, Philip; Simon, Monique; Schmidberger, Heinz; Spezi, Emiliano; Stuschke, Martin; Valentini, Vincenzo; Verheij, Marcel; Whitfield, Gillian; Zackrisson, Björn; Zips, Daniel; Baumann, Michael
Disconnected cancer research data management and lack of information exchange about planned and ongoing research are complicating the utilisation of internationally collected medical information for improving cancer patient care. Rapidly collecting/pooling data can accelerate translational research in radiation therapy and oncology. The exchange of study data is one of the fundamental principles behind data aggregation and data mining. The possibilities of reproducing the original study results, performing further analyses on existing research data to generate new hypotheses or developing computational models to support medical decisions (e.g. risk/benefit analysis of treatment options) represent just a fraction of the potential benefits of medical data-pooling. Distributed machine learning and knowledge exchange from federated databases can be considered one among other attractive approaches for knowledge generation within "Big Data". Data interoperability between research institutions should be the major concern behind a wider collaboration. Information captured in electronic patient records (EPRs) and study case report forms (eCRFs), linked together with medical imaging and treatment planning data, is deemed to be a fundamental element for large multi-centre studies in the field of radiation therapy and oncology. To fully utilise the captured medical information, the study data have to be more than just an electronic version of a traditional (un-modifiable) paper CRF. Challenges that have to be addressed are data interoperability, utilisation of standards, data quality and privacy concerns, data ownership, rights to publish, data pooling architecture and storage. This paper discusses a framework for conceptual packages of ideas focused on a strategic development for international research data exchange in the field of radiation therapy and oncology.
Miri, Mohammad Saleh; Ghayoor, Ali; Johnson, Hans J.; Sonka, Milan
This work reports on a comparative study between five manual and automated methods for intra-subject pair-wise registration of images from different modalities. The study includes a variety of inter-modal image registrations (MR-CT, PET-CT, PET-MR) utilizing different methods including two manual point-based techniques using rigid and similarity transformations, one automated point-based approach based on Iterative Closest Point (ICP) algorithm, and two automated intensity-based methods using mutual information (MI) and normalized mutual information (NMI). These techniques were employed for inter-modal registration of brain images of 9 subjects from a publicly available dataset, and the results were evaluated qualitatively via checkerboard images and quantitatively using root mean square error and MI criteria. In addition, for each inter-modal registration, a paired t-test was performed on the quantitative results in order to find any significant difference between the results of the studied registration techniques.
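The two intensity-based similarity measures named in the abstract above can be sketched from a joint intensity histogram. This is a minimal illustration on already-discretised intensity lists, not the registration pipeline used in the study. It assumes natural-log entropies and the common definition NMI = (H(A) + H(B)) / H(A, B); the abstract does not state which NMI variant the authors used, and the function names are hypothetical.

```python
from collections import Counter
from math import log


def entropy(counts, total):
    """Shannon entropy (natural log) of a histogram given as raw counts."""
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


def _entropies(img_a, img_b):
    """Marginal and joint entropies of two equally sized, discretised images."""
    if len(img_a) != len(img_b):
        raise ValueError("images must contain the same number of voxels")
    n = len(img_a)
    h_a = entropy(Counter(img_a).values(), n)
    h_b = entropy(Counter(img_b).values(), n)
    h_ab = entropy(Counter(zip(img_a, img_b)).values(), n)  # joint histogram
    return h_a, h_b, h_ab


def mutual_information(img_a, img_b):
    """MI = H(A) + H(B) - H(A, B)."""
    h_a, h_b, h_ab = _entropies(img_a, img_b)
    return h_a + h_b - h_ab


def normalized_mutual_information(img_a, img_b):
    """NMI = (H(A) + H(B)) / H(A, B); undefined when both images are constant."""
    h_a, h_b, h_ab = _entropies(img_a, img_b)
    if h_ab == 0:
        raise ValueError("NMI is undefined for two constant images")
    return (h_a + h_b) / h_ab
```

An intensity-based registration would evaluate one of these measures on the overlapping voxels for each candidate transformation and keep the transformation that maximises it.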
Gill, Peter; Haned, Hinda; Bleka, Oyvind; Hansson, Oskar; Dørum, Guro; Egeland, Thore
The introduction of Short Tandem Repeat (STR) DNA was a revolution within a revolution that transformed forensic DNA profiling into a tool that could be used, for the first time, to create National DNA databases. This transformation would not have been possible without the concurrent development of fluorescent automated sequencers, combined with the ability to multiplex several loci together. Use of the polymerase chain reaction (PCR) increased the sensitivity of the method to enable the analysis of a handful of cells. The first multiplexes were simple: 'the quad', introduced by the defunct UK Forensic Science Service (FSS) in 1994, rapidly followed by a more discriminating 'six-plex' (Second Generation Multiplex) in 1995 that was used to create the world's first national DNA database. The success of the database rapidly outgrew the functionality of the original system - by the year 2000 a new multiplex of ten loci was introduced to reduce the chance of adventitious matches. The technology was adopted world-wide, albeit with different loci. The political requirement to introduce pan-European databases encouraged standardisation - the development of the European Standard Set (ESS) of markers comprising twelve loci is the latest iteration. Although development has been impressive, the methods used to interpret evidence have lagged behind. For example, the theory to interpret complex DNA profiles (low-level mixtures) was developed fifteen years ago, but only in the past year or so have the concepts started to be widely adopted. A plethora of different models (some commercial and others non-commercial) have appeared. This has led to a confusing 'debate' about the 'best' to use. The different models available are described along with their advantages and disadvantages. A section discusses the development of national DNA databases, along with details of an associated controversy to estimate the strength of evidence of matches. Current methodology is limited to
Kershenbaum, Anne D.; Langston, Michael A.; Levine, Robert S.; Saxton, Arnold M.; Oyana, Tonny J.; Kilbourne, Barbara J.; Rogers, Gary L.; Gittner, Lisaann S.; Baktash, Suzanne H.; Matthews-Juarez, Patricia; Juarez, Paul D.
Recent advances in informatics technology have made it possible to integrate, manipulate, and analyze variables from a wide range of scientific disciplines, allowing for the examination of complex social problems such as health disparities. This study used 589 county-level variables to identify and compare geographical variation of high and low preterm birth rates. Data were collected from a number of publicly available sources, bringing together natality outcomes with attributes of the natural, built, social, and policy environments. The singleton early premature county birth rate, in counties with a population of over 100,000 persons, provided the dependent variable. Graph theoretical techniques were used to identify a wide range of predictor variables from various domains, including black proportion, obesity and diabetes, sexually transmitted infection rates, mother’s age, income, marriage rates, pollution and temperature among others. Dense subgraphs (paracliques) representing groups of highly correlated variables were resolved into latent factors, which were then used to build a regression model explaining prematurity (R-squared = 76.7%). Two lists of counties with large positive and large negative residuals, indicating unusual prematurity rates given their circumstances, may serve as a starting point for ways to intervene and reduce health disparities for preterm births. PMID:25464130
Abstract Background The Canadian Institute for Health Information (CIHI) collects hospital discharge abstract data (DAD) from Canadian provinces and territories. There are many demands for the disclosure of this data for research and analysis to inform policy making. To expedite the disclosure of data for some of these purposes, the construction of a DAD public use microdata file (PUMF) was considered. Such purposes include: confirming some published results, providing broader feedback to CIHI to improve data quality, training students and fellows, providing an easily accessible data set for researchers to prepare for analyses on the full DAD data set, and serving as a large health data set for computer scientists and statisticians to evaluate analysis and data mining techniques. The objective of this study was to measure the probability of re-identification for records in a PUMF, and to de-identify a national DAD PUMF consisting of 10% of records. Methods Plausible attacks on a PUMF were evaluated. Based on these attacks, the 2008-2009 national DAD was de-identified. A new algorithm was developed to minimize the amount of suppression while maximizing the precision of the data. The acceptable threshold for the probability of correct re-identification of a record was set at between 0.04 and 0.05. Information loss was measured in terms of the extent of suppression and entropy. Results Two different PUMF files were produced, one with geographic information, and one with no geographic information but more clinical information. At a threshold of 0.05, the maximum proportion of records with the diagnosis code suppressed was 20%, but these suppressions represented only 8-9% of all values in the DAD. Our suppression algorithm has less information loss than a more traditional approach to suppression. Smaller regions, patients with longer stays, and age groups that are infrequently admitted to hospitals tend to be the ones with the highest rates of suppression. Conclusions The
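The abstract above does not define how the probability of re-identification was computed. One simple and widely used proxy, shown in the sketch below, takes a record's risk to be one divided by the number of records sharing its combination of quasi-identifiers. That choice, the field names, and the function names are assumptions made for illustration. The sketch also ignores the 10% sampling fraction and the specific attacks the authors evaluated.

```python
from collections import Counter


def reidentification_risks(records, quasi_identifiers):
    """Per-record risk, assumed here to be 1 / (size of the record's equivalence class).

    An equivalence class is the set of records sharing the same values on all
    quasi-identifiers.
    """
    keys = [tuple(record[q] for q in quasi_identifiers) for record in records]
    class_sizes = Counter(keys)
    return [1 / class_sizes[key] for key in keys]


def records_over_threshold(records, quasi_identifiers, threshold=0.05):
    """Indices of records whose risk exceeds the threshold; these are the candidates
    for suppression or generalisation."""
    risks = reidentification_risks(records, quasi_identifiers)
    return [i for i, risk in enumerate(risks) if risk > threshold]
```

Under this proxy, a threshold of 0.05 corresponds to requiring at least 20 records in every equivalence class, and a threshold of 0.04 to at least 25.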
Baussano, Iacopo; Brzoska, Patrick; Fedeli, Ugo; Larouche, Claudia; Razum, Oliver; Fung, Isaac C-H
Epidemiology and public health are usually context-specific. Journals published in different languages and countries play a role both as sources of data and as channels through which evidence is incorporated into local public health practice. Databases in these languages facilitate access to relevant journals, and professional education in these languages facilitates the growth of native expertise in epidemiology and public health. However, as English has become the lingua franca of scientific communication in the era of globalisation, many journals published in non-English languages face the difficult dilemma of either switching to English and competing internationally, or sticking to the native tongue and having a restricted circulation among a local readership. This paper discusses the historical development of epidemiology and the current scene of epidemiological and public health journals, databases and professional education in three Western European languages: French, German and Italian, and examines the dynamics and struggles they have today. PMID:18826570
Turchi, Chiara; Buscemi, Loredana; Giacchino, Erika; Onofri, Valerio; Fendt, Liane; Parson, Walther; Tagliabracci, Adriano
Current forensic mitochondrial (mt)DNA databases are limited in representative population data of African origin. We investigated HVS-I/HVS-II sequences of 120 Tunisian and Moroccan healthy male donors applying stringent quality criteria to assure high quality of the data and phylogenetic alignment and notation of the sequences. Among 64 Tunisians, 56 different haplotypes were observed and the most common haplotype (16187T 16189C 16223T 16264T 16270T 16278T 16293G 16311C 73G 152C 182T 185T 195C 247A 263G 309.1C 315.1C; haplogroup (hg) L1b) was shared by four individuals. 56 Moroccans could be assigned to 52 different haplotypes where the most common haplotype was of West Eurasian origin with the hg H sequence motif 263G 315.1C and variations in the HVS-II polyC-stretch (309.1C 309.2C) shared by six samples. The majority of the observed haplotypes belong to the west Eurasian phylogeny (50% in Tunisians and 62.5% in Moroccans). Our data are consistent with the current phylogeographic knowledge displaying the occurrence of sub-Saharan haplogroup L sequences, found in 48.4% of Tunisians and 25% of Moroccans as well as the presence of the two re-migrated haplogroups U6 (7.8% and 1.8% in Tunisians and Moroccans, respectively) and M1 (1.6% in Tunisians and 8.9% in Moroccans).
Hume, Maxwell A.; Barrera, Luis A.; Gisselbrecht, Stephen S.; Bulyk, Martha L.
The Universal PBM Resource for Oligonucleotide Binding Evaluation (UniPROBE) serves as a convenient source of information on published data generated using universal protein-binding microarray (PBM) technology, which provides in vitro data about the relative DNA-binding preferences of transcription factors for all possible sequence variants of a length k (‘k-mers’). The database displays important information about the proteins and displays their DNA-binding specificity data in terms of k-mers, position weight matrices and graphical sequence logos. This update to the database documents the growth of UniPROBE since the last update 4 years ago, and introduces a variety of new features and tools, including a new streamlined pipeline that facilitates data deposition by universal PBM data generators in the research community, a tool that generates putative nonbinding (i.e. negative control) DNA sequences for one or more proteins and novel motifs obtained by analyzing the PBM data using the BEEML-PBM algorithm for motif inference. The UniPROBE database is available at http://uniprobe.org. PMID:25378322
Researchers studying genes and their protein products need an easily available source for that gene. The I.M.A.G.E. Consortium at Lawrence Livermore National Laboratory is an important source of such genes in the form of arrayed cDNA libraries. The arrayed clones and associated data are available to the public, free of restriction. Libraries are transformed and titered into 384-well master plates, from which 2-8 copies are made. One copy plate is stored by LLNL while others are sent to sequencing groups, plate distributors, and to the group which contributed the library. Clones found to be unique and/or full-length are rearrayed and also made publicly available. Bioinformatics tools supporting the use of I.M.A.G.E. clones are accessible via the World Wide Web.
Sihtmäe, Mariliis; Blinova, Irina; Aruoja, Villem; Dubourguier, Henri-Charles; Legrand, Nicolas; Kahru, Anne
A new open-access online database, E-SovTox, is presented. E-SovTox provides toxicological data for substances relevant to the EU Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH) system, from publicly-available Russian language data sources. The database contains information selected mainly from scientific journals published during the Soviet Union era. The main information source for this database - the journal, Gigiena Truda i Professional'nye Zabolevania [Industrial Hygiene and Occupational Diseases], published between 1957 and 1992 - features acute, but also chronic, toxicity data for numerous industrial chemicals, e.g. for rats, mice, guinea-pigs and rabbits. The main goal of the abovementioned toxicity studies was to derive the maximum allowable concentration limits for industrial chemicals in the occupational health settings of the former Soviet Union. Thus, articles featured in the database include mostly data on LD50 values, skin and eye irritation, skin sensitisation and cumulative properties. Currently, the E-SovTox database contains toxicity data selected from more than 500 papers covering more than 600 chemicals. The user is provided with the main toxicity information, as well as abstracts of these papers in Russian and in English (given as provided in the original publication). The search engine allows cross-searching of the database by the name or CAS number of the compound, and the author of the paper. The E-SovTox database can be used as a decision-support tool by researchers and regulators for the hazard assessment of chemical substances.
Since the Genome Project began several years ago, a plethora of databases have been developed or are in the works. They range from the massive Genome Data Base at Johns Hopkins University, the central repository of all gene mapping information, to small databases focusing on single chromosomes or organisms. Some are publicly available, others are essentially private electronic lab notebooks. Still others limit access to a consortium of researchers working on, say, a single human chromosome. An increasing number incorporate sophisticated search and analytical software, while others operate as little more than data lists. In consultation with numerous experts in the field, a list has been compiled of some key genome-related databases. The list was not limited to map and sequence databases but also included the tools investigators use to interpret and elucidate genetic data, such as protein sequence and protein structure databases. Because a major goal of the Genome Project is to map and sequence the genomes of several experimental animals, including E. coli, yeast, fruit fly, nematode, and mouse, the available databases for those organisms are listed as well. The author also includes several databases that are still under development - including some ambitious efforts that go beyond data compilation to create what are being called electronic research communities, enabling many users, rather than just one or a few curators, to add or edit the data and tag it as raw or confirmed.
El Bali, Latifa; Diman, Aurélie; Bernard, Alfred; Roosens, Nancy H. C.; De Keersmaecker, Sigrid C. J.
Human genomic DNA extracted from urine could be an interesting tool for large-scale public health studies involving characterization of genetic variations or DNA biomarkers as a result of the simple and noninvasive collection method. These studies, involving many samples, require a rapid, easy, and standardized extraction protocol. Moreover, for practicability, there is a necessity to collect urine at a moment different from the first void and to store it appropriately until analysis. The present study compared seven commercial kits to select the most appropriate urinary human DNA extraction procedure for epidemiological studies. DNA yield has been determined using different quantification methods: two classical, i.e., NanoDrop and PicoGreen, and two species-specific real-time quantitative (q)PCR assays, as DNA extracted from urine contains, besides human, microbial DNA also, which largely contributes to the total DNA yield. In addition, the kits giving a good yield were also tested for the presence of PCR inhibitors. Further comparisons were performed regarding the sampling time and the storage conditions. Finally, as a proof-of-concept, an important gene related to smoking has been genotyped using the developed tools. We could select one well-performing kit for the human DNA extraction from urine suitable for molecular diagnostic real-time qPCR-based assays targeting genetic variations, applicable to large-scale studies. In addition, successful genotyping was possible using DNA extracted from urine stored at −20°C for several months, and an acceptable yield could also be obtained from urine collected at different moments during the day, which is particularly important for public health studies. PMID:25365790
Brebi-Mieville, Priscilla; Ili-Gangas, Carmen; Leal-Rojas, Pamela; Noordhuis, Maartje; Soudry, Ethan; Perez, Jimena; Roa, Juan Carlos; Sidransky, David; Guerrero-Preston, Rafael
The methylated DNA immunoprecipitation method (MeDIP) is a genome-wide, high-resolution approach that detects DNA methylation with oligonucleotide tiling arrays or high throughput sequencing platforms. A simplified high-throughput MeDIP assay will enable translational research studies in clinics and populations, which will greatly enhance our understanding of the human methylome. We compared three commercial kits, MagMeDIP Kit™ (Diagenode), Methylated-DNA IP Kit (Zymo Research) and Methylamp™ Methylated DNA Capture Kit (Epigentek), in order to identify which one has better reliability and sensitivity for genomic DNA enrichment. Each kit was used to enrich two samples, one from fresh tissue and one from a cell line, with two different DNA amounts. The enrichment efficiency of each kit was evaluated by agarose gel band intensity after Nco I digestion and by reaction yield of methylated DNA. A successful enrichment is expected to have a 1:4 to 10:1 conversion ratio and a yield of 80% or higher. We also evaluated the hybridization efficiency to genome-wide methylation arrays in a separate cohort of tissue samples. We observed that the MagMeDIP kit had the highest yield for the two DNA amounts and for both the tissue and cell line samples, as well as for the positive control. In addition, the DNA was successfully enriched from a 1:4 to 10:1 ratio. Therefore, the MagMeDIP kit is a useful research tool that will enable clinical and public health genome-wide DNA methylation studies. PMID:22207357
High density genotyping techniques are needed for investigating antimicrobial resistance especially in the case of multi-drug resistant (MDR) isolates. To achieve this all antimicrobial resistance genes in the NCBI Genbank database were identified by key word searches of sequence annotations and the...
Albertazzi, Federico J
The spread of knowledge depends on access to information and its immediate use. Models are useful for explaining specific phenomena. The scientific community accepts some models in biology only after a period of time, once there is evidence to support them. The model of the structure and function of DNA proposed by Watson & Crick (1953) was no exception: it was finally accepted only a few years later. In Costa Rica, DNA function was first mentioned in 1970, in the magazine Biologia Tropical (Tropical Biology Magazine), more than 15 years after its first publication in a scientific journal. The opposite occurs with technical innovations. If the efficiency of a new scientific technique is proved in a compelling way, acceptance by the community comes swiftly. This was the case for the polymerase chain reaction (PCR). The first PCR machine in Costa Rica arrived in 1991, only three years after its publication.
The functionality of standard zoological DNA barcoding practice (the identification of unknown specimens by comparison of COI sequences) is contingent on working barcode databases with sufficient taxonomic coverage. It has already been established that the main barcoding repositories, NCBI and BOLD, are devoid of data for many animal groups but the specific taxonomic coverage of the repositories across animal biodiversity remains unexplored. Here, I shed light on this mystery by contrasting the number of unique taxon labels in the two databases with the number of currently recognized species for each animal phylum. The numbers reveal an overall paucity of COI sequence data in the repositories (15.13% total coverage across the recognized biodiversity on Earth, and 20.76% average taxonomic coverage for each phylum) and, more importantly, bear witness to the idleness towards numerous phyla, rendering current barcoding efforts either ineffective or inaccurate. The importance of further integrating taxonomic expertise into barcoding practice is briefly discussed and some guidelines, previously mentioned in the barcoding literature, are suggested anew. Finally, the asserted values concerning the taxonomic coverage in barcoding databases for Animalia are contrasted with those of Plantae and Fungi.
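The two summary figures quoted above (total coverage and average per-phylum coverage) are different statistics, and a small Python sketch may make the distinction concrete. The phylum names and counts below are invented placeholders, not data from the study, and the function name is hypothetical.

```python
def coverage_summary(counts: dict[str, tuple[int, int]]) -> tuple[float, float, dict[str, float]]:
    """Compute barcode coverage statistics.

    `counts` maps a phylum name to (unique taxon labels in the repositories,
    currently recognized species). Returns:
      - pooled coverage: all labels divided by all species, in percent
      - mean coverage: unweighted average of the per-phylum percentages
      - the per-phylum percentages themselves
    """
    total_labels = sum(labels for labels, _ in counts.values())
    total_species = sum(species for _, species in counts.values())
    pooled = 100.0 * total_labels / total_species
    per_phylum = {
        phylum: 100.0 * labels / species
        for phylum, (labels, species) in counts.items()
    }
    mean = sum(per_phylum.values()) / len(per_phylum)
    return pooled, mean, per_phylum


if __name__ == "__main__":
    # Placeholder numbers chosen only to show how the two figures diverge.
    example = {"PhylumA": (150, 1000), "PhylumB": (5, 10)}
    pooled, mean, _ = coverage_summary(example)
    print(f"pooled: {pooled:.2f}%  mean per phylum: {mean:.2f}%")
```

The pooled figure is dominated by species-rich phyla, whereas the per-phylum mean weights every phylum equally, which is why the two percentages in an analysis of this kind need not agree.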
Al-Zahrani, Rashed S.
Since its establishment in 1960, the Institute of Public Administration (IPA) in Riyadh, Saudi Arabia has had responsibility for documenting Saudi administrative literature, the official publications of Saudi Arabia, and the literature of regional and international organizations through establishment of the Document Center in 1961. This paper…
Zheleznyakova, Galina Y; Cao, Hao; Schiöth, Helgi B
Brain-derived neurotrophic factor (BDNF) plays an important role in nervous system development and function and it is well established that BDNF is involved in the pathogenesis of a wide range of psychiatric disorders. Recently, numerous studies have associated the DNA methylation level of BDNF promoters with certain psychiatric phenotypes. In this review, we summarize data from current literature as well as from our own analysis with respect to the correlation of BDNF methylation changes with psychiatric disorders and address questions about whether DNA methylation related to the BDNF can be useful as biomarker for specific neuropsychiatric disorders.
Morello, Samuel A.; Ricks, Wendell R.
The aviation safety issues database was instrumental in the refinement and substantiation of the National Aviation Safety Strategic Plan (NASSP). The issues database is a comprehensive set of issues from an extremely broad base of aviation functions, personnel, and vehicle categories, both nationally and internationally. Several aviation safety stakeholders such as the Commercial Aviation Safety Team (CAST) have already used the database. This broader interest was the genesis of making the database publicly accessible and writing this report.
Wallerstein, Robert; Jelks, Andrea; Garabedian, Matthew J.
Objective. Cell-free DNA (cfDNA) offers highly accurate noninvasive screening for Down syndrome. Incorporating it into routine care is complicated. We present our experience implementing a novel program for cfDNA screening, emphasizing patient education, genetic counseling, and resource management. Study Design. Beginning in January 2013, we initiated a new patient care model in which high-risk patients for aneuploidy received genetic counseling at 12 weeks of gestation. Patients were presented with four pathways for aneuploidy risk assessment and diagnosis: (1) cfDNA; (2) integrated screening; (3) direct-to-invasive testing (chorionic villus sampling or amniocentesis); or (4) no first trimester diagnostic testing/screening. Patients underwent follow-up genetic counseling and detailed ultrasound at 18–20 weeks to review first trimester testing and finalize decision for amniocentesis. Results. Counseling and second trimester detailed ultrasound were provided to 163 women. Most selected cfDNA screening (69%) over integrated screening (0.6%), direct-to-invasive testing (14.1%), or no screening (16.6%). Amniocentesis rates decreased following implementation of cfDNA screening (19.0% versus 13.0%, P < 0.05). Conclusion. When counseled about screening options, women often chose cfDNA over integrated screening. This program is a model for patient-directed, efficient delivery of a newly available high-level technology in a public health setting. Genetic counseling is an integral part of patient education and determination of plan of care. PMID:25101177
Chen, Joseph J.; Saenz, Naomi J.; Siegel, Eliot L.
In order to validate CT imaging as a biomarker, it is important to ascertain the variability and artifacts associated with various forms of advanced visualization and quantification software. The purpose of the paper is to describe the rationale behind the creation of a free, public resource that contains phantom datasets for CT designed to facilitate testing, development and standardization of advanced visualization and quantification software. For our research, three phantoms were scanned at multiple kVp and mAs settings utilizing a 64-channel MDCT scanner at a collimation of 0.75 mm. Images were reconstructed at a slice thickness of 0.75 mm and archived in DICOM format. The phantoms consisted of precision spheres, balls of different materials and sizes, and slabs of Last-A-Foam® at varying densities. The database of scans is stored in an archive utilizing software developed for the National Cancer Imaging Archive and is publicly available. The scans were completed successfully and the datasets are available for free and unrestricted download. The CT images can be accessed in DICOM format via HTTP or FTP or utilizing caGRID. A DICOM database of phantom data was successfully created and made available to the public. We anticipate that this database will be useful as a reference for physicists for quality control purposes, for developers of advanced visualization and quantification software, and for others who need to test the performance of their systems against a known "gold" standard. We plan to add more phantom images in the future and expand to other imaging modalities.
da Silva Rosa, Teresa; Carneiro, Maria José
Access to scientific knowledge is a valuable resource that can inform and validate positions taken in formulating public policy. But access to this knowledge can be challenging, given the diversity and breadth of available scholarship. Communication between the fields of science and of politics requires the dissemination of scholarship and access to it. We conducted a study using an open-access search tool in order to map existing knowledge on a specific topic: agricultural contributions to the preservation of biodiversity. The present article offers a critical view of access to the information available through the Capes database on Brazilian theses and dissertations.
Cer, Regina Z.; Donohue, Duncan E.; Mudunuri, Uma S.; Temiz, Nuri A.; Loss, Michael A.; Starner, Nathan J.; Halusa, Goran N.; Volfovsky, Natalia; Yi, Ming; Luke, Brian T.; Bacolla, Albino; Collins, Jack R.; Stephens, Robert M.
The non-B DB, available at http://nonb.abcc.ncifcrf.gov, catalogs predicted non-B DNA-forming sequence motifs, including Z-DNA, G-quadruplex, A-phased repeats, inverted repeats, mirror repeats, direct repeats and their corresponding subsets: cruciforms, triplexes and slipped structures, in several genomes. Version 2.0 of the database revises and re-implements the motif discovery algorithms to better align with accepted definitions and thresholds for motifs, expands the non-B DNA-forming motifs coverage by including short tandem repeats and adds key visualization tools to compare motif locations relative to other genomic annotations. Non-B DB v2.0 extends the ability for comparative genomics by including re-annotation of the five organisms reported in non-B DB v1.0, human, chimpanzee, dog, macaque and mouse, and adds seven additional organisms: orangutan, rat, cow, pig, horse, platypus and Arabidopsis thaliana. Additionally, the non-B DB v2.0 provides an overall improved graphical user interface and faster query performance. PMID:23125372
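To give a sense of what a sequence-motif search of this kind involves, here is a minimal Python sketch that scans one strand for putative G-quadruplex-forming sequences. The pattern used (four runs of at least three guanines separated by loops of 1 to 7 bases) is a widely used textbook rule and is an assumption here; it is not claimed to match the definitions or thresholds implemented in non-B DB.

```python
import re

# Assumed pattern: four G-runs (>= 3 Gs each) separated by loops of 1-7 bases.
G4_PATTERN = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")


def find_g4_motifs(seq: str) -> list[tuple[int, int, str]]:
    """Return (start, end, matched sequence) for non-overlapping matches
    on the given strand; coordinates are 0-based, end-exclusive."""
    seq = seq.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(seq)]


if __name__ == "__main__":
    for hit in find_g4_motifs("TTGGGAGGGTGGGCGGGTT"):
        print(hit)
```

A fuller scan would also search the reverse complement (or an equivalent C-run pattern) so that motifs on the opposite strand are reported.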
This essay analyzes how academic institutions, government agencies, and the nascent biotech industry contested the legal ownership of recombinant DNA technology in the name of the public interest. It reconstructs the way a small but influential group of government officials and university research administrators introduced a new framework for the commercialization of academic research in the context of a national debate over scientific research's contributions to American economic prosperity and public health. They claimed that private ownership of inventions arising from public support would provide a powerful means to liberate biomedical discoveries for public benefit. This articulation of the causal link between private ownership and the public interest, it is argued, justified a new set of expectations about the use of research results arising from government or public support, in which commercialization became a new public obligation for academic researchers. By highlighting the broader economic and legal shifts that prompted the reconfiguration of the ownership of public knowledge in late twentieth-century American capitalism, the essay examines the threads of policy-informed legal ideas that came together to affirm private ownership of biomedical knowledge as germane to the public interest in the coming of age of biotechnology and genetic medicine.
van den Berge, M; Ozcanhan, G; Zijlstra, S; Lindenbergh, A; Sijen, T
Especially when minute evidentiary traces are analysed, background cell material unrelated to the crime may contribute to detectable levels in the genetic analyses. To gain understanding on the composition of human cell material residing on surfaces contributing to background traces, we performed DNA and mRNA profiling on samplings of various items. Samples were selected by considering events contributing to cell material deposits in exemplary activities (e.g. dragging a person by the trouser ankles), and can be grouped as public objects, private samples, transfer-related samples and washing machine experiments. Results show that high DNA yields do not necessarily relate to an increased number of contributors or to the detection of other cell types than skin. Background cellular material may be found on any type of public or private item. When a major contributor can be deduced in DNA profiles from private items, this can be a different person than the owner of the item. Also when a specific activity is performed and the areas of physical contact are analysed, the "perpetrator" does not necessarily represent the major contributor in the STR profile. Washing machine experiments show that transfer and persistence during laundry is limited for DNA and cell type dependent for RNA. Skin conditions such as the presence of sebum or sweat can promote DNA transfer. Results of this study, which encompasses 549 samples, increase our understanding regarding the prevalence of human cell material in background and activity scenarios.
Childs, Kevin L; Hamilton, John P; Zhu, Wei; Ly, Eugene; Cheung, Foo; Wu, Hank; Rabinowicz, Pablo D; Town, Chris D; Buell, C Robin; Chan, Agnes P
The TIGR Plant Transcript Assemblies (TA) database (http://plantta.tigr.org) uses expressed sequences collected from the NCBI GenBank Nucleotide database for the construction of transcript assemblies. The sequences collected include expressed sequence tags (ESTs) and full-length and partial cDNAs, but exclude computationally predicted gene sequences. The TA database includes all plant species for which more than 1000 EST or cDNA sequences are publicly available. The EST and cDNA sequences are first clustered based on an all-versus-all pairwise sequence comparison, followed by the generation of consensus sequences (TAs) from individual clusters. The clustering and assembly procedures use the TGICL tool, Megablast and the CAP3 assembler. The UniProt Reference Clusters (UniRef100) protein database is used as the reference database for the functional annotation of the assemblies. The transcription orientation of each TA is determined based on the orientation of the alignment with the best protein hit. The TA sequences and annotation are available via web interfaces and FTP downloads. Assemblies can be retrieved by a text-based keyword search or a sequence-based BLAST search. The current version of the TA database is Release 2 (July 17, 2006) and includes a total of 215 plant species.
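The clustering step described above (grouping sequences from all-versus-all pairwise comparisons before building a consensus per cluster) can be sketched as transitive grouping with a union-find structure. This is a simplified illustration under the assumption that any reported pairwise hit links two sequences into the same cluster; it does not reproduce TGICL, Megablast or CAP3, and the sequence identifiers are hypothetical.

```python
def cluster_by_hits(ids: list[str], hits: list[tuple[str, str]]) -> list[list[str]]:
    """Group sequence IDs into clusters, where `hits` lists pairs of IDs
    judged similar by some pairwise comparison (assumed to be precomputed)."""
    parent = {seq_id: seq_id for seq_id in ids}

    def find(x: str) -> str:
        # Follow parent links to the cluster representative,
        # shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in hits:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters: dict[str, list[str]] = {}
    for seq_id in ids:
        clusters.setdefault(find(seq_id), []).append(seq_id)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    ids = ["est1", "est2", "est3", "cdna1", "est4"]
    hits = [("est1", "est2"), ("est2", "cdna1")]
    print(cluster_by_hits(ids, hits))
```

Each resulting cluster would then be passed to an assembler to produce one or more consensus sequences; sequences with no hits remain as singletons.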
Welcome to the Morchella MLST database. This dedicated database was set up at the CBS-KNAW Biodiversity Center by Vincent Robert in February 2012, using BioloMICS software (Robert et al., 2011), to facilitate DNA sequence-based identifications of Morchella species via the Internet. The current datab...
Information about NCI publications including PDQ cancer information for patients and health professionals, patient-education publications, fact sheets, dictionaries, NCI blogs and newsletters and major reports.
Background: Urogenital schistosomiasis caused by Schistosoma haematobium is widely distributed across Africa and is increasingly being targeted for control. Genome sequences and population genetic parameters can give insight into the potential for population- or species-level drug resistance. Microsatellite DNA loci are genetic markers in wide use by Schistosoma researchers, but there are few primers available for S. haematobium. Methods: We sequenced 1,058,114 random DNA fragments from clonal cercariae collected from a snail infected with a single Schistosoma haematobium miracidium. We assembled and aligned the S. haematobium sequences to the genomes of S. mansoni and S. japonicum, identifying microsatellite DNA loci across all three species and designing primers to amplify the loci in S. haematobium. To validate our primers, we screened 32 randomly selected primer pairs with population samples of S. haematobium. Results: We designed >13,790 primer pairs to amplify unique microsatellite loci in S. haematobium (available at http://www.cebio.org/projetos/schistosoma-haematobium-genome). The three Schistosoma genomes contained similar overall frequencies of microsatellites, but the frequency and length distributions of specific motifs differed among species. We identified 15 primer pairs that amplified consistently and were easily scored. We genotyped these 15 loci in S. haematobium individuals from six locations: Zanzibar had the highest levels of diversity; Malawi, Mauritius, Nigeria, and Senegal were nearly as diverse; but the sample from South Africa was much less diverse. Conclusions: About half of the primers in the database of Schistosoma haematobium microsatellite DNA loci should yield amplifiable and easily scored polymorphic markers, thus providing thousands of potential markers. Sequence conservation among S. haematobium, S. japonicum, and S. mansoni is relatively high, thus it should now be possible to identify markers that are universal among Schistosoma
... that implements the changes required by HERA by revising the single-family matrix in FHFA's Public Use... and multifamily data matrices of the PUDB, effective for 2010 and beyond, to conform the data fields... Order with accompanying Appendix containing the revised single-family and multifamily matrices,...
Gilbert, Scott F; Howes-Mischel, Rebecca
Embryology is an intensely visual field, and it has provided the public with images of human embryos and fetuses. The responses to these images can be extremely powerful and personal, and the images (as well as our reactions to them) are conditioned by social and political agendas. The image of the 'autonomous fetus' abstracts the fetus from the mother, the womb, and from all social contexts, thereby emphasizing 'individuality'. The image of 'sacred DNA' emphasizes DNA as the unmoved mover, the eidos, the soul of the human being. Since fertilization involves the forming of a new constellation of DNA in the zygote, the act of fertilization is being perceived as the secular and technical equivalent of ensoulment. This privileges fertilization above the other possible scientifically valued times when 'human life' begins.
Describes the structural form, bonding scheme, and chromatin structure of deoxyribonucleic acid (DNA), as well as gene-modification experiments with it. Indicates that DNA's double helix is variable and also flexible as it interacts with regulatory and other molecules to transfer hereditary messages. (DH)
Lesjak, Žiga; Pernuš, Franjo; Likar, Boštjan; Špiclin, Žiga
Changes of white-matter lesions (WMLs) are good predictors of the progression of neurodegenerative diseases like multiple sclerosis (MS). Based on longitudinal magnetic resonance (MR) imaging the changes can be monitored, while the need for their accurate and reliable quantification led to the development of several automated MR image analysis methods. However, an objective comparison of the methods is difficult, because publicly unavailable validation datasets with ground truth and different sets of performance metrics were used. In this study, we acquired longitudinal MR datasets of 20 MS patients, in which brain regions were extracted, spatially aligned and intensity normalized. Two expert raters then delineated and jointly revised the WML changes on subtracted baseline and follow-up MR images to obtain ground truth WML segmentations. The main contribution of this paper is an objective, quantitative and systematic evaluation of two unsupervised and one supervised intensity based change detection method on the publicly available datasets with ground truth segmentations, using common pre- and post-processing steps and common evaluation metrics. Besides, different combinations of the two main steps of the studied change detection methods, i.e. dissimilarity map construction and its segmentation, were tested to identify the best performing combination.
The Chemical and Product Categories database (CPCat) catalogs the use of over 40,000 chemicals and their presence in different consumer products. The chemical use information is compiled from multiple sources while product information is gathered from publicly available Material Safety Data Sheets (MSDS). EPA researchers are evaluating the possibility of expanding the database with additional product and use information.
SRD 147 Ionic Liquids Database (ILThermo) (Web, free access). The IUPAC Ionic Liquids Database, ILThermo, is a free web research tool that allows users worldwide to access an up-to-date data collection from publications on experimental investigations of the thermodynamic and transport properties of ionic liquids, as well as binary and ternary mixtures containing ionic liquids.
Guz, A. N.; Rushchitsky, J. J.
The paper performs a citation analysis of publications of mechanicians of the National Academy of Sciences of Ukraine (NASU) based on information tools developed by the Thomson Reuters Institute for Scientific Information. Two groups of mechanicians are considered: representatives of the S. P. Timoshenko Institute of Mechanics of the NASU (NASU members, heads of departments) and members (academicians) of the NASU Division of Mechanics. Three elements of the Citation Report (Results Found, Citation Index (Sum of the Times Cited), h-index) are presented for each scientist. This paper may be considered as a follow-up on the papers [6-11] published by Prikladnaya Mekhanika (International Applied Mechanics) in 2005-2009.
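For readers unfamiliar with the three Citation Report elements named above, the following Python sketch computes them from a list of per-paper citation counts. The h-index follows its standard definition (the largest h such that h papers have at least h citations each); the function names and the sample counts are illustrative and are not data from the paper.

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def citation_report(citations: list[int]) -> dict[str, int]:
    """Summarize a publication list as: number of results,
    total times cited, and h-index."""
    return {
        "results_found": len(citations),
        "sum_times_cited": sum(citations),
        "h_index": h_index(citations),
    }


if __name__ == "__main__":
    # Illustrative counts only.
    print(citation_report([10, 8, 5, 4, 3]))
```

Because the h-index ignores how far the top papers exceed the threshold, two authors with the same h-index can have very different citation sums, which is why the report lists both.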
Stent, Gunther S.
This history of molecular genetics and its explanation of DNA begins with an analysis of the Golden Jubilee essay papers, 1955. The paper ends stating that the higher nervous system is the one major frontier of biological inquiry which still offers some romance of research. (Author/VW)
Albayrak, Ozgür; Föcker, Manuel; Wibker, Katrin; Hebebrand, Johannes
We aimed to determine the quantitative scientific publication output of child and adolescent psychiatric/psychological affiliations during 2005-2010 by country based on both "PubMed" and "Scopus", and performed a bibliometric qualitative evaluation for 2009 using "PubMed". We performed our search by affiliation related to child and adolescent psychiatric/psychological institutions using "PubMed". For the quantitative analysis for 2005-2010, we counted the number of abstracts. For the qualitative analysis for 2009, we derived the impact factor of each abstract's journal from "Journal Citation Reports". We related total impact factor scores to the gross domestic product (GDP) and population size of each country. Additionally, we used "Scopus" to determine the number of abstracts for each country that was identified via "PubMed" for 2009 and compared the ranking of countries between the two databases. 61% of the publications between 2005 and 2010 originated from European countries and 26% from the USA. After adjustment for GDP and population size, the ranking positions changed in favor of smaller European countries with a population size of less than 20 million inhabitants. The ranking of countries for the count of articles in 2009 as derived from "Scopus" was similar to that identified via the "PubMed" search. The performed search revealed only minor differences between "Scopus" and "PubMed" related to the ranking of countries. Our data indicate a sharp difference between countries with a high versus low GDP with regard to scientific publication output in child and adolescent psychiatry/psychology.
Wallace, James C.; Port, Jesse A.; Smith, Marissa N.; Faustman, Elaine M.
Antibiotic resistance (AR) is a major global public health threat but few resources exist that catalog AR genes outside of a clinical context. Current AR sequence databases are assembled almost exclusively from genomic sequences derived from clinical bacterial isolates and thus do not include many microbial sequences derived from environmental samples that confer resistance in functional metagenomic studies. These environmental metagenomic sequences often show little or no similarity to AR sequences from clinical isolates using standard classification criteria. In addition, existing AR databases provide no information about flanking sequences containing regulatory or mobile genetic elements. To help address this issue, we created an annotated database of DNA and protein sequences derived exclusively from environmental metagenomic sequences showing AR in laboratory experiments. Our Functional Antibiotic Resistant Metagenomic Element (FARME) database is a compilation of publicly available DNA sequences and predicted protein sequences conferring AR as well as regulatory elements, mobile genetic elements and predicted proteins flanking antibiotic resistant genes. FARME is the first database to focus on functional metagenomic AR gene elements and provides a resource to better understand AR in the 99% of bacteria which cannot be cultured and the relationship between environmental AR sequences and antibiotic resistant genes derived from cultured isolates. Database URL: http://staff.washington.edu/jwallace/farme PMID:28077567
Gelfand, Yevgeniy; Rodriguez, Alfredo; Benson, Gary
Tandem repeats in DNA have been under intensive study for many years, first, as a consequence of their usefulness as genomic markers and DNA fingerprints and more recently as their role in human disease and regulatory processes has become apparent. The Tandem Repeats Database (TRDB) is a public repository of information on tandem repeats in genomic DNA. It contains a variety of tools for repeat analysis, including the Tandem Repeats Finder program, query and filtering capabilities, repeat clustering, polymorphism prediction, PCR primer selection, data visualization and data download in a variety of formats. In addition, TRDB serves as a centralized research workbench. It provides user storage space and permits collaborators to privately share their data and analysis. TRDB is available at . PMID:17175540
The EPA DSSTox website (http://www/epa.gov/nheerl/dsstox) publishes standardized, structure-annotated toxicity databases, covering a broad range of toxicity disciplines. Each DSSTox database features documentation written in collaboration with the source authors and toxicity expe...
Prakash, Ashwin; Bechtel, Jason; Fedorov, Alexei
Non-coding genomic regions in complex eukaryotes, including intergenic areas, introns, and untranslated segments of exons, are profoundly non-random in their nucleotide composition and consist of a complex mosaic of sequence patterns. These patterns include so-called Mid-Range Inhomogeneity (MRI) regions -- sequences 30-10000 nucleotides in length that are enriched by a particular base or combination of bases (e.g. (G+T)-rich, purine-rich, etc.). MRI regions are associated with unusual (non-B-form) DNA structures that are often involved in regulation of gene expression, recombination, and other genetic processes (Fedorova & Fedorov 2010). The existence of a strong fixation bias within MRI regions against mutations that tend to reduce their sequence inhomogeneity additionally supports the functionality and importance of these genomic sequences (Prakash et al. 2009). Here we demonstrate a freely available Internet resource -- the Genomic MRI program package -- designed for computational analysis of genomic sequences in order to find and characterize various MRI patterns within them (Bechtel et al. 2008). This package also allows generation of randomized sequences with various properties and level of correspondence to the natural input DNA sequences. The main goal of this resource is to facilitate examination of vast regions of non-coding DNA that are still scarcely investigated and await thorough exploration and recognition. PMID:21610667
Peach, Megan L; Zakharov, Alexey V; Liu, Ruifeng; Pugliese, Angelo; Tawa, Gregory; Wallqvist, Anders; Nicklaus, Marc C
Metabolism has been identified as a defining factor in drug development success or failure because of its impact on many aspects of drug pharmacology, including bioavailability, half-life and toxicity. In this article, we provide an outline and descriptions of the resources for metabolism-related property predictions that are currently either freely or commercially available to the public. These resources include databases with data on, and software for prediction of, several end points: metabolite formation, sites of metabolic transformation, binding to metabolizing enzymes and metabolic stability. We attempt to place each tool in historical context and describe, wherever possible, the data it was based on. For predictions of interactions with metabolizing enzymes, we show a typical set of results for a small test set of compounds. Our aim is to give a clear overview of the areas and aspects of metabolism prediction in which the currently available resources are useful and accurate, and the areas in which they are inadequate or missing entirely. PMID:23088273
Marucci-Wellman, Helen R; Lehto, Mark R; Corns, Helen L
Public health surveillance programs in the U.S. are undergoing landmark changes with the availability of electronic health records and advancements in information technology. Injury narratives gathered from hospital records, workers compensation claims or national surveys can be very useful for identifying antecedents to injury or emerging risks. However, classifying narratives manually can become prohibitive for large datasets. The purpose of this study was to develop a human-machine system that could be relatively easily tailored to routinely and accurately classify injury narratives from large administrative databases such as workers compensation. We used a semi-automated approach based on two Naïve Bayesian algorithms to classify 15,000 workers compensation narratives into two-digit Bureau of Labor Statistics (BLS) event (leading to injury) codes. Narratives were filtered out for manual review if the algorithms disagreed or made weak predictions. This approach resulted in an overall accuracy of 87%, with consistently high positive predictive values across all two-digit BLS event categories including the very small categories (e.g., exposure to noise, needle sticks). The Naïve Bayes algorithms were able to identify and accurately machine code most narratives leaving only 32% (4853) for manual review. This strategy substantially reduces the need for resources compared with manual review alone.
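The filtering rule described in this abstract (accept a machine code only when both classifiers agree and neither prediction is weak) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Prediction` type, the `route` function, and the 0.9 probability cutoff are hypothetical, and the abstract does not say how "weak" was defined.

```python
from typing import NamedTuple, Optional


class Prediction(NamedTuple):
    code: str           # two-digit BLS event code predicted by one classifier
    probability: float  # that classifier's confidence in the code


def route(pred_a: Prediction, pred_b: Prediction,
          min_prob: float = 0.9) -> Optional[str]:
    """Return the machine-assigned code, or None to flag for manual review.

    Assumption: a prediction counts as "weak" when its probability falls
    below min_prob; the cutoff value here is purely illustrative.
    """
    if pred_a.code != pred_b.code:
        return None  # the two algorithms disagree
    if min(pred_a.probability, pred_b.probability) < min_prob:
        return None  # at least one prediction is weak
    return pred_a.code


def manual_review_share(routed: list) -> float:
    """Fraction of narratives left for human coders."""
    return sum(code is None for code in routed) / len(routed)
```

With a rule of this shape, the share left for manual review reported above (4853 of 15,000, about 32%) corresponds to the narratives for which `route` returns `None`.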
Zhu, Dianhui; Vaishampayan, Parag A; Venkateswaran, Kasthuri; Fox, George E
A comparison of variable regions within the 16S rRNA gene is widely used to characterize relationships between bacteria and to identify phylogenetic affiliation of unknown bacteria. In environmental studies, polymerase chain reaction amplification of 16S rRNA followed by cloning and sequencing of numerous individual clones is an extensively used molecular method for elucidating microbial diversity. The sequencing process typically utilizes a forward and reverse primer pair to produce two partial reads (~700 to 800 base pairs each) that overlap and in total cover a large region of the full 16S rRNA sequence (~1.5 k base). In a typical application, this approach rapidly generates very large numbers of 16S rRNA datasets that can overwhelm manual processing efforts leading to both delays and errors. In particular, the approach presents two computational challenges: (1) the assembly of a composite sequence from the two partial reads and (2) the subsequent appropriate identification of the organism represented by the newly sequenced clones. Herein, we describe a software package, search, trim, identify, track, and capture the uniqueness of 16S rRNAs using public and in-house database (STITCH), which offers automated sequence pair splicing and genetic identification, thus simplifying the computationally intensive analysis of large sequencing libraries. The STITCH software is freely accessible over the Internet at: http://prion.bchs.uh.edu/stitch/.
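The first computational challenge named above, assembling one composite sequence from a forward and a reverse partial read, can be illustrated with a simple exact-overlap merge. This is only a sketch under stated assumptions: the abstract does not describe how STITCH performs the splicing, and the function names, the exact-match requirement, and the `min_overlap` parameter are all hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence given in upper-case A/C/G/T."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def splice(forward: str, reverse: str, min_overlap: int = 10):
    """Merge two partial reads that overlap in the middle of the gene.

    The reverse read is reverse-complemented first so that both reads are
    on the same strand. The longest exact suffix/prefix overlap of at least
    min_overlap bases is used; real reads contain sequencing errors, so a
    production tool would need to tolerate mismatches.
    Returns the composite sequence, or None if no overlap is found.
    """
    rc = reverse_complement(reverse)
    for k in range(min(len(forward), len(rc)), min_overlap - 1, -1):
        if forward[-k:] == rc[:k]:
            return forward + rc[k:]
    return None
```

Returning `None` when no overlap is found leaves it to the caller to decide whether to flag such a read pair for manual inspection.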
Strait, Robert S.; Pearson, Peter K.; Sengupta, Sailes K.
A password system comprises a set of codewords spaced apart from one another by a Hamming distance (HD) that exceeds twice the variability that can be projected for a series of biometric measurements for a particular individual and that is less than the HD that can be encountered between two individuals. To enroll an individual, a biometric measurement is taken and exclusive-ORed with a random codeword to produce a "reference value." To verify the individual later, a biometric measurement is taken and exclusive-ORed with the reference value to reproduce the original random codeword or its approximation. If the reproduced value is not a codeword, the nearest codeword to it is found, and the bits that were corrected to produce the codeword are also toggled in the biometric measurement taken and the codeword generated during enrollment. The correction scheme can be implemented by any conventional error correction code such as Reed-Muller code R(m,n). In the implementation using a hand geometry device an R(2,5) code has been used in this invention. Such codeword and biometric measurement can then be used to see if the individual is an authorized user. Conventional Diffie-Hellman public key encryption schemes and hashing procedures can then be used to secure the communications lines carrying the biometric information and to secure the database of authorized users.
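The enroll/verify cycle described above can be sketched in a few lines. To keep the example short it swaps in a 5-fold repetition code for the Reed-Muller code named in the abstract, so it is an illustration of the XOR-and-correct idea only; the function names and the `REP` constant are hypothetical.

```python
import secrets

REP = 5  # each message bit is repeated 5 times; up to 2 flips per group are corrected


def encode(message):
    """Repetition-code codeword for a list of message bits."""
    return [bit for bit in message for _ in range(REP)]


def nearest_codeword(word):
    """Majority vote per group of REP bits, then re-encode."""
    message = [int(sum(word[i:i + REP]) > REP // 2)
               for i in range(0, len(word), REP)]
    return encode(message)


def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]


def enroll(biometric):
    """XOR the measurement with a random codeword to get the reference value."""
    message = [secrets.randbits(1) for _ in range(len(biometric) // REP)]
    codeword = encode(message)
    return xor(biometric, codeword), codeword


def verify(measurement, reference):
    """Reproduce the enrollment codeword from a fresh, slightly noisy measurement."""
    return nearest_codeword(xor(measurement, reference))
```

A usage example: enroll with a 20-bit measurement, flip two bits to simulate measurement variability, and check that verification still reproduces the enrollment codeword.

```python
biometric = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1]
reference, codeword = enroll(biometric)
noisy = biometric.copy()
noisy[0] ^= 1
noisy[7] ^= 1
print(verify(noisy, reference) == codeword)
```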
Wang, Shi; Zhang, Lingling; Matz, Mikhail
Mining for microsatellites (also called simple sequence repeats [SSRs]) in public sequence databases of a common Indo-Pacific coral Acropora millepora identified 191 SSRs from 10 258 expressed sequence tag (EST) and 618 SSRs from 14 625 whole-genome shotgun (WGS) sequences. In contrast to other animals, trinucleotide repeats, rather than dinucleotide repeats, are dominant in the WGS-SSRs, and AAT is the most frequent trinucleotide motif in EST-SSRs. We successfully developed 40 polymorphic markers from EST-SSRs and WGS-SSRs. Both EST- and WGS-SSRs show high levels of polymorphism within corals from the same reef patch. Interestingly, markers WGS079 and WGS227 revealed SSR duplications in a few individuals, suggesting recent duplication events. Genotypic linkage disequilibrium was identified in 5 pairs of SSR markers, which will be invaluable for high-resolution studies of genetic admixture in natural populations of A. millepora. Transferability analysis showed that 25 of these markers can be successfully amplified in one of the most ubiquitous Indo-Pacific corals Acropora hyacinthus. The marker collection reported here is the largest ever developed for any reef-building coral. It holds great potential for addressing coral reef connectivity across the Indo-Pacific with an unprecedented precision, especially taking into account the cross-species transferability of a substantial number of markers.
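Mining sequences for SSRs of the kind counted above can be sketched with a back-referencing regular expression. This is a minimal illustration, not the tool used in the study: the function name, the motif length, and the minimum repeat count are assumptions, since the abstract does not state the search criteria.

```python
import re


def find_ssrs(sequence: str, motif_len: int = 3, min_repeats: int = 5):
    """Find perfect tandem repeats of a motif of fixed length.

    Returns a list of (start, motif, repeat_count) tuples. Runs of a single
    base (e.g. AAAAAA) are skipped, since they are not di- or trinucleotide
    repeats. Thresholds are illustrative defaults.
    """
    pattern = re.compile(
        r"([ACGT]{%d})\1{%d,}" % (motif_len, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # homopolymer run, not a true SSR of this motif length
        hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return hits
```

For example, `find_ssrs("GGC" + "AAT" * 6 + "CGTA")` reports a single AAT repeat starting at position 3 with 6 copies.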
A major goal of the emerging field of computational toxicology is the development of screening-level models that predict potential toxicity of chemicals from a combination of mechanistic in vitro assay data and chemical structure descriptors. In order to build these models, resea...
Picoult-Newberg, L; Ideker, T E; Pohl, M G; Taylor, S L; Donaldson, M A; Nickerson, D A; Boyce-Jacino, M
There is considerable interest in the discovery and characterization of single nucleotide polymorphisms (SNPs) to enable the analysis of the potential relationships between human genotype and phenotype. Here we present a strategy that permits the rapid discovery of SNPs from publicly available expressed sequence tag (EST) databases. From a set of ESTs derived from 19 different cDNA libraries, we assembled 300,000 distinct sequences and identified 850 mismatches from contiguous EST data sets (candidate SNP sites), without de novo sequencing. Through a polymerase-mediated, single-base, primer extension technique, Genetic Bit Analysis (GBA), we confirmed the presence of a subset of these candidate SNP sites and have estimated the allele frequencies in three human populations with different ethnic origins. Altogether, our approach provides a basis for rapid and efficient regional and genome-wide SNP discovery using data assembled from sequences from different libraries of cDNAs.
Jack, Elkin Terry
This paper examines the development of the concept of the "public" and the "people" in U.S. society. Community problem-solving is an art, and like the art of dance or the game of soccer, the dispositions and skills of communal life are learned by doing and reflecting on what has been done. This essay discusses the "arts" of democracy, including:…
Biofuel Database (Web, free access) This database brings together structural, biological, and thermodynamic data for enzymes that are either in current use or are being considered for use in the production of biofuels.
Bao, Wenjun; Schmid, Judith E; Goetz, Amber K; Ren, Hongzu; Dix, David J
Reproductive toxicogenomic studies generate large amounts of toxicological and genomic data. On the toxicology side, a substantial quantity of data accumulates from conventional endpoints such as histology, reproductive physiology and biochemistry. The largest source of genomics data is DNA microarrays, which generate enormous amounts of information in the course of profiling gene expression. Thus, data storage and management become essential and require a more sophisticated system than lab notebooks and electronic spreadsheets. We developed a database for tracking toxicogenomic samples and procedures (TSP 1.0) for our reproductive studies based on the MIAME-Tox guidelines and relational database theory. This database stores the various types of data from both toxicological and genomic assays in a hierarchical fashion. The user-friendly interface provides easy procedures for researchers to add, edit, save, delete, and navigate different records. Finally, TSP facilitates exporting microarray data into public databases.
Wright, Thomas L.; Takahashi, Taeko Jane
The Hawaii bibliographic database has been created to contain all of the literature, from 1779 to the present, pertinent to the volcanological history of the Hawaiian-Emperor volcanic chain. References are entered in a PC- and Macintosh-compatible EndNote Plus bibliographic database with keywords and abstracts or (if no abstract) with annotations as to content. Keywords emphasize location, discipline, process, identification of new chemical data or age determinations, and type of publication. The database is updated approximately three times a year and is available to upload from an ftp site. The bibliography contained 8460 references at the time this paper was submitted for publication. Use of the database greatly enhances the power and completeness of library searches for anyone interested in Hawaiian volcanism.
Song, Qiang; Decato, Benjamin; Hong, Elizabeth E.; Zhou, Meng; Fang, Fang; Qu, Jianghan; Garvin, Tyler; Kessler, Michael; Zhou, Jun; Smith, Andrew D.
DNA methylation is implicated in a surprising diversity of regulatory, evolutionary processes and diseases in eukaryotes. The introduction of whole-genome bisulfite sequencing has enabled the study of DNA methylation at a single-base resolution, revealing many new aspects of DNA methylation and highlighting the usefulness of methylome data in understanding a variety of genomic phenomena. As the number of publicly available whole-genome bisulfite sequencing studies reaches into the hundreds, reliable and convenient tools for comparing and analyzing methylomes become increasingly important. We present MethPipe, a pipeline for both low and high-level methylome analysis, and MethBase, an accompanying database of annotated methylomes from the public domain. Together these resources enable researchers to extract interesting features from methylomes and compare them with those identified in public methylomes in our database. PMID:24324667
The Internet and electronic commerce (e-commerce) generate lots of data. Data must be stored, organized, and managed. Database administrators, or DBAs, work with database software to find ways to do this. They identify user needs, set up computer databases, and test systems. They ensure that systems perform as they should and add people to the…
Mashima, Jun; Kodama, Yuichi; Fujisawa, Takatomo; Katayama, Toshiaki; Okuda, Yoshihiro; Kaminuma, Eli; Ogasawara, Osamu; Okubo, Kousaku; Nakamura, Yasukazu; Takagi, Toshihisa
The DNA Data Bank of Japan (DDBJ) (http://www.ddbj.nig.ac.jp) has been providing public data services for thirty years (since 1987). We are collecting nucleotide sequence data from researchers as a member of the International Nucleotide Sequence Database Collaboration (INSDC, http://www.insdc.org), in collaboration with the US National Center for Biotechnology Information (NCBI) and European Bioinformatics Institute (EBI). The DDBJ Center also services Japanese Genotype-phenotype Archive (JGA), with the National Bioscience Database Center to collect human-subjected data from Japanese researchers. Here, we report our database activities for INSDC and JGA over the past year, and introduce retrieval and analytical services running on our supercomputer system and their recent modifications. Furthermore, with the Database Center for Life Science, the DDBJ Center improves semantic web technologies to integrate and to share biological data, for providing the RDF version of the sequence data. PMID:27924010
Lee, Tae-Ho; Kim, Junah; Robertson, Jon S; Paterson, Andrew H
Genome duplication, widespread in flowering plants, is a driving force in evolution. Genome alignments between/within genomes facilitate identification of homologous regions and individual genes to investigate evolutionary consequences of genome duplication. PGDD (the Plant Genome Duplication Database), a public web service database, provides intra- or interplant genome alignment information. At present, PGDD contains information for 47 plants whose genome sequences have been released. Here, we describe methods for identification and estimation of dates of genome duplication and speciation by functions of PGDD. The database is freely available at http://chibba.agtec.uga.edu/duplication/.
Mehlhorn, Hendrik; Lange, Matthias; Scholz, Uwe; Schreiber, Falk
Knowledge found in biomedical databases, in particular in Web information systems, is a major bioinformatics resource. In general, this biological knowledge is represented worldwide in a network of databases. These data are spread among thousands of databases, which overlap in content, but differ substantially with respect to content detail, interface, formats and data structure. To support a functional annotation of lab data, such as protein sequences, metabolites or DNA sequences, as well as a semi-automated data exploration in information retrieval environments, an integrated view of databases is essential. Search engines have the potential of assisting in data retrieval from these structured sources, but fall short of providing a comprehensive knowledge excerpt out of the interlinked databases. A prerequisite of supporting the concept of an integrated data view is to acquire insights into cross-references among database entities. This is hampered by the fact that only a fraction of all possible cross-references are explicitly tagged in the particular biomedical information systems. In this work, we investigate to what extent an automated construction of an integrated data network is possible. We propose a method that predicts and extracts cross-references from multiple life science databases and possible referenced data targets. We study the retrieval quality of our method and report on first, promising results. The method is implemented as the tool IDPredictor, which is published under the DOI 10.5447/IPK/2012/4 and is freely available using the URL: http://dx.doi.org/10.5447/IPK/2012/4.
Jaber, Khalid Mohammad; Abdullah, Rosni; Rashid, Nur'Aini Abdul
In recent times, the size of biological databases has increased significantly, along with continuous growth in the number of users and the rate of queries, such that some databases have reached terabyte size. There is therefore an increasing need to access databases at the fastest rates possible. In this paper, the decision tree indexing model (PDTIM) was parallelised using a hybrid of distributed and shared memory on a resident database, with horizontal and vertical growth through Message Passing Interface (MPI) and POSIX Thread (PThread), to reduce the index building time. The PDTIM was implemented using 1, 2, 4 and 5 processors on 1, 2, 3 and 4 threads respectively. The results show that the hybrid technique improved the speedup compared to a sequential version. It can be concluded from the results that the proposed PDTIM is appropriate for large data sets in terms of index building time.
This paper outlines the JICST factual database (JOIS-F), which JICST started in January 1988, and its online service. First, the author describes the circumstances from 1973, when planning started, to the present, and the database's relation to the "Project by Special Coordination Funds for Promoting Science and Technology". Secondly, the databases now under development, with services planned to start in fiscal 1988 or fiscal 1989, are described; these cover DNA, metallic material intensity, crystal structure, chemical substance regulations, and so forth. Lastly, the online service is briefly explained.
Mutagenicity and carcinogenicity databases are crucial resources for toxicologists and regulators involved in chemicals risk assessment. Until recently, existing public toxicity databases have been constructed primarily as
Different kinds of pictorial databases are described with respect to aims, user groups, search possibilities, storage, and distribution. Some specific examples are given for databases used for the following purposes: (1) labor markets for artists; (2) document management; (3) telling a story; (4) preservation (archives and museums); (5) research;…
This chapter is a succinct overview of maize data held in the species-specific database MaizeGDB (the Maize Genomics and Genetics Database), and selected multi-species data repositories, such as Gramene/Ensembl Plants, Phytozome, UniProt and the National Center for Biotechnology Information (NCBI), ...
Dziuban, J.; Sears, R.
The U.S. Environmental Protection Agency (EPA) recently developed a searchable database and website for the Environmental Radiation Ambient Monitoring System (ERAMS) data. This site contains nationwide radiation monitoring data for air particulates, precipitation, drinking water, surface water and pasteurized milk. This site provides location-specific as well as national information on environmental radioactivity across several media. It provides high quality data for assessing public exposure and environmental impacts resulting from nuclear emergencies and provides baseline data during routine conditions. The database and website are accessible at www.epa.gov/enviro/. This site contains (1) a query for the general public, which is easy to use and limits the amount of information provided but includes the ability to graph the data with risk benchmarks; (2) a query for a more technical user, which allows access to all of the data in the database; and (3) background information on ERAMS.
A GIS compiled locational database in Microsoft Access of ~15,000 mines with uranium occurrence or production, primarily in the western United States. The metadata was cooperatively compiled from Federal and State agency data sets and enables the user to conduct geographic and analytical studies on mine impacts on the public and environment.
Irinyi, Laszlo; Serena, Carolina; Garcia-Hermoso, Dea; Arabatzis, Michael; Desnos-Ollivier, Marie; Vu, Duong; Cardinali, Gianluigi; Arthur, Ian; Normand, Anne-Cécile; Giraldo, Alejandra; da Cunha, Keith Cassia; Sandoval-Denis, Marcelo; Hendrickx, Marijke; Nishikaku, Angela Satie; de Azevedo Melo, Analy Salles; Merseguel, Karina Bellinghausen; Khan, Aziza; Parente Rocha, Juliana Alves; Sampaio, Paula; da Silva Briones, Marcelo Ribeiro; e Ferreira, Renata Carmona; de Medeiros Muniz, Mauro; Castañón-Olivares, Laura Rosio; Estrada-Barcenas, Daniel; Cassagne, Carole; Mary, Charles; Duan, Shu Yao; Kong, Fanrong; Sun, Annie Ying; Zeng, Xianyu; Zhao, Zuotao; Gantois, Nausicaa; Botterel, Françoise; Robbertse, Barbara; Schoch, Conrad; Gams, Walter; Ellis, David; Halliday, Catriona; Chen, Sharon; Sorrell, Tania C; Piarroux, Renaud; Colombo, Arnaldo L; Pais, Célia; de Hoog, Sybren; Zancopé-Oliveira, Rosely Maria; Taylor, Maria Lucia; Toriello, Conchita; de Almeida Soares, Célia Maria; Delhaes, Laurence; Stubbe, Dirk; Dromer, Françoise; Ranque, Stéphane; Guarro, Josep; Cano-Lira, Jose F; Robert, Vincent; Velegraki, Aristea; Meyer, Wieland
Human and animal fungal pathogens are a growing threat worldwide leading to emerging infections and creating new risks for established ones. There is a growing need for a rapid and accurate identification of pathogens to enable early diagnosis and targeted antifungal therapy. Morphological and biochemical identification methods are time-consuming and require trained experts. Alternatively, molecular methods, such as DNA barcoding, a powerful and easy tool for rapid monophasic identification, offer a practical approach for species identification and are less demanding in terms of taxonomical expertise. However, its widespread use is still limited by a lack of quality-controlled reference databases and the evolving recognition and definition of new fungal species/complexes. An international consortium of medical mycology laboratories was formed aiming to establish a quality controlled ITS database under the umbrella of the ISHAM working group on "DNA barcoding of human and animal pathogenic fungi." A new database, containing 2800 ITS sequences representing 421 fungal species, providing the medical community with a freely accessible tool at http://www.isham.org/ and http://its.mycologylab.org/ to rapidly and reliably identify most agents of mycoses, was established. The generated sequences included in the new database were used to evaluate the variation and overall utility of the ITS region for the identification of pathogenic fungi at intra- and interspecies level. The average intraspecies variation ranged from 0 to 2.25%. This highlighted selected pathogenic fungal species, such as the dermatophytes and emerging yeast, for which additional molecular methods/genetic markers are required for their reliable identification from clinical and veterinary specimens.
Hsueh, Aaron J.; Rauch, Rami
Ovarian Kaleidoscope database (OKdb) is an online, searchable, public database containing text-based and DNA microarray data to facilitate research by ovarian researchers. Using key words and predetermined categories, users can search ovarian gene information based on gene function, cell type of expression, cellular localization, hormonal regulation, mutant phenotypes, chromosomal location, ligand-receptor relationship, and other criteria, either alone or in combination. For individual genes, users can access more than 10 extensive DNA microarray datasets to interrogate gene expression patterns in a development-specific and cell type-specific manner. All ligand and receptor genes expressed in the ovary are matched to facilitate investigation of paracrine/autocrine signaling. More than 3500 ovarian genes in the database are matched to 185 gene pathways in the Kyoto Encyclopedia of Genes and Genomes to allow for elucidation of gene interactions and relationships. In addition to >400 genes with infertility or subfertility phenotypes when mutated in mice or humans, the OKdb also lists ∼50 and ∼40 genes associated with polycystic ovarian syndrome and primary ovarian insufficiency, respectively. The expanding OKdb is updated weekly and allows submission of new genes by ovarian researchers to allow instant access to DNA microarray datasets for newly submitted genes. The present database is a virtual community for ovarian researchers and allows users to instantaneously provide their comments for individual gene pages based on an automated Web-discussion system. In the coming years, we will continue to add new features to serve the ovarian research community. PMID:22441797
Gabriel, Matthew; Boland, Cherisse; Holt, Cydne
Over the past decade, the Combined DNA Index System (CODIS) has increased solvability of violent crimes by linking evidence DNA profiles to known offenders. To date, no in-depth analysis of the United States National DNA Data Bank effort has assessed the success of this national public safety endeavor. Critics of this effort often cite the inability of laboratory and police investigators to provide timely investigative support as a root cause of CODIS' failure to increase public safety. By studying a group of nearly 200 DNA cold hits obtained in SFPD criminal investigations from 2001-2006, three key performance metrics (Significance of Cold Hits, Case Progression & Judicial Resolution, and Potential Reduction of Future Criminal Activity) provide a proper context in which to define the impact of CODIS at the City and County level. Further, the analysis of a recidivist group of cold hit offenders and their past interaction with law enforcement established five noteworthy criminal case resolution trends; these trends signify challenges to CODIS in achieving meaningful case resolutions. CODIS' effectiveness and critical activities to support case resolutions are the responsibility of all criminal justice partners in order to achieve long-lasting public safety within the United States.
Baycin Hizal, Deniz; Wolozny, Daniel; Colao, Joseph; Jacobson, Elena; Tian, Yuan; Krag, Sharon S; Betenbaugh, Michael J; Zhang, Hui
Protein glycosylation serves critical roles in the cellular and biological processes of many organisms. Aberrant glycosylation has been associated with many illnesses, including hereditary and chronic diseases such as cancer, cardiovascular diseases, neurological disorders, and immunological disorders. Emerging mass spectrometry (MS) technologies that enable the high-throughput identification of glycoproteins and glycans have accelerated the analysis and made possible the creation of dynamic and expanding databases. Although glycosylation-related databases have been established by many laboratories and institutions, they are not yet widely known in the community. Our study reviews 15 different publicly available databases and identifies their key elements so that users can identify the most applicable platform for their analytical needs. These databases include biological information on the experimentally identified glycans and glycopeptides from various cells and organisms such as human, rat, mouse, fly and zebrafish. The features of these databases (7 for glycoproteomic data, 6 for glycomic data, and 2 for glycan binding proteins) are summarized, including the enrichment techniques that are used for glycoproteome and glycan identification. Furthermore, databases such as Unipep, GlycoFly and GlycoFish, recently established by our group, are introduced. The unique features of each database, such as the analytical methods used and the bioinformatics tools available, are summarized. This information will be a valuable resource for the glycobiology community as it presents the analytical methods and glycosylation-related databases together in one compendium. It will also represent a step towards the desired long-term goal of integrating the different glycosylation databases in order to better characterize and categorize glycoproteins and glycans for biomedical research.
Provides reviews of 10 online databases: Consumer Reports; Public Opinion Online; Encyclopedia of Associations; Official Airline Guide Adventure Atlas and Events Calendar; CENDATA; Hollywood Hotline; Fearless Taster; Soap Opera Summaries; and Human Sexuality. (LRW)
Access to database information in libraries will increase as licenses for tape loading of data onto public access catalogs become more widespread. Institutions with adequate storage capacity will have full text databases, and the adoption of the Z39.50 standard, which allows differing computer systems to interface with each other, will increase…
Craig, N.; Chen, T.; Hawkins, I.; Fruscione, A.
The EUVE survey database contains fundamental science data for 9000 potential source locations (pigeonholes) in the sky. The first release of the Bright Source List is now available to the public through an interface with the NASA Astrophysical Data System. We describe the database schema design and the EUVE source categorization algorithm that compares sources to the ROSAT Wide Field Camera source list.
Robinson, James; Marsh, Steven G E
The Immuno Polymorphism Database (IPD) (http://www.ebi.ac.uk/ipd/) is a set of specialist databases related to the study of polymorphic genes in the immune system. IPD currently consists of four databases: IPD-KIR, contains the allelic sequences of killer cell immunoglobulin-like receptors (KIRs); IPD-MHC, a database of sequences of the major histocompatibility complex (MHC) of different species; IPD-HPA, alloantigens expressed only on platelets; and IPD-ESTAB, which provides access to the European Searchable Tumour Cell Line Database, a cell bank of immunologically characterized melanoma cell lines. The IPD project works with specialist groups or nomenclature committees who provide and curate individual sections before they are submitted to IPD for online publication. The IPD project stores all the data in a set of related databases. Those sections with similar data, such as IPD-KIR and IPD-MHC, share the same database structure.
Bellmore, J. Ryan; Vittum, Katherine; Duda, Jeff J.; Greene, Samantha L.
This database is the result of an extensive literature search aimed at identifying documents relevant to the emerging field of dam removal science. In total, the database contains 179 citations with empirical monitoring information associated with 130 different dam removals across the United States and abroad. Data include publications through 2014 and were supplemented with the U.S. Army Corps of Engineers National Inventory of Dams database, the U.S. Geological Survey National Water Information System, and aerial photos to estimate locations when coordinates were not provided. Publications were located using the Web of Science, Google Scholar, and the Clearinghouse for Dam Removal Information.
Vanschoren, Joaquin; Blockeel, Hendrik
Next to running machine learning algorithms based on inductive queries, much can be learned by immediately querying the combined results of many prior studies. Indeed, all around the globe, thousands of machine learning experiments are being executed on a daily basis, generating a constant stream of empirical information on machine learning techniques. While the information contained in these experiments might have many uses beyond their original intent, results are typically described very concisely in papers and discarded afterwards. If we properly store and organize these results in central databases, they can be immediately reused for further analysis, thus boosting future research. In this chapter, we propose the use of experiment databases: databases designed to collect all the necessary details of these experiments, and to intelligently organize them in online repositories to enable fast and thorough analysis of a myriad of collected results. They constitute an additional, queriable source of empirical meta-data based on principled descriptions of algorithm executions, without reimplementing the algorithms in an inductive database. As such, they engender a very dynamic, collaborative approach to experimentation, in which experiments can be freely shared, linked together, and immediately reused by researchers all over the world. They can be set up for personal use, to share results within a lab or to create open, community-wide repositories. Here, we provide a high-level overview of their design, and use an existing experiment database to answer various interesting research questions about machine learning algorithms and to verify a number of recent studies.
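To make the idea of querying stored experiment results concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The single-table schema, the column names, and the four rows of toy values are all hypothetical and made up for illustration; they are not the design or the contents of the experiment database described in the chapter.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding a few made-up experiment records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE experiment ("
        "algorithm TEXT, dataset TEXT, parameters TEXT, accuracy REAL)"
    )
    # Toy values for illustration only; not results from any real study.
    rows = [
        ("decision_tree", "iris", "max_depth=3", 0.90),
        ("decision_tree", "wine", "max_depth=3", 0.80),
        ("knn", "iris", "k=5", 0.95),
        ("knn", "wine", "k=5", 0.75),
    ]
    conn.executemany("INSERT INTO experiment VALUES (?, ?, ?, ?)", rows)
    return conn


def mean_accuracy_by_algorithm(conn: sqlite3.Connection) -> dict[str, float]:
    """Answer a question across stored runs without re-executing any algorithm."""
    cur = conn.execute(
        "SELECT algorithm, AVG(accuracy) FROM experiment "
        "GROUP BY algorithm ORDER BY algorithm"
    )
    return dict(cur.fetchall())
```

The point of the sketch is that once runs are stored with their algorithm, dataset, and parameter descriptions, a new research question becomes a query rather than a new batch of experiments.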
Reverón Palenzuela, Benito
This work studies the new "Ley Orgánica" 10/2007, which regulates the police database of identifiers obtained from DNA, and in particular the procedural aspects it contains. With this new regulation, the Spanish legislator completes the provisions on criminal identification by DNA techniques that were already contained in the Spanish Law of Criminal Procedure. Spain thus joins the other countries in its region that already had this type of database, which is very useful and effective for investigating certain especially serious crimes.
Evans, Kate; Jung, Sook; Lee, Taein; Brutcher, Lisa; Cho, Ilhyung; Peace, Cameron; Main, Dorrie
Breeding programs produce large datasets that require efficient management systems to keep track of performance, pedigree, geographical and image-based data. With the development of DNA-based screening technologies, more breeding programs perform genotyping in addition to phenotyping for performance evaluation. The integration of breeding data with other genomic and genetic data is instrumental for the refinement of marker-assisted breeding tools, enhances genetic understanding of important crop traits and maximizes access and utility by crop breeders and allied scientists. New infrastructure in the Genome Database for Rosaceae (GDR) was designed and implemented to enable secure and efficient storage, management and analysis of large datasets from the Washington State University apple breeding program and was subsequently expanded to fit datasets from other Rosaceae breeders. The infrastructure was built using the software Chado and Drupal, making use of the Natural Diversity module to accommodate large-scale phenotypic and genotypic data. Breeders can search accessions within the GDR to identify individuals with specific trait combinations. Results from Search by Parentage list individuals with parents in common, and results from Individual Variety pages link to all data available on each chosen individual, including pedigree, phenotypic and genotypic information. Genotypic data are searchable by markers and alleles; results are linked to other pages in the GDR to enable the user to access tools such as GBrowse and CMap. This breeding database provides users with the opportunity to search datasets in a fully targeted manner and retrieve and compare performance data from multiple selections, years and sites, and to output the data needed for variety release publications and patent applications. The breeding database facilitates efficient program management. Storing publicly available breeding data in a database together with genomic and genetic data will
Parker, Meaghan; Stones-Havas, Steven; Starger, Craig; Meyer, Christopher
In the field of molecular biology, laboratory information management systems (LIMSs) have been created to track workflows through a process pipeline. For the purposes of DNA barcoding, this workflow involves tracking tissues through extraction, PCR, cycle sequencing, and consensus assembly. Importantly, a LIMS that serves the DNA barcoding community must link required elements for public submissions (e.g., primers, trace files) that are generated in the molecular lab with specimen metadata. Here, we demonstrate an example workflow of a specimen's entry into the LIMS database to the publishing of the specimen's genetic data to a public database using Geneious bioinformatics software. Throughout the process, the connections between steps in the workflow are maintained to facilitate post-processing annotation, structured reporting, and fully transparent edits to reduce subjectivity and increase repeatability.
The Refrigerant Database consolidates and facilitates access to information to assist industry in developing equipment using alternative refrigerants. The underlying purpose is to accelerate phase out of chemical compounds of environmental concern. The database provides bibliographic citations and abstracts for publications that may be useful in research and design of air-conditioning and refrigeration equipment. The complete documents are not included, though some may be added at a later date. The database identifies sources of specific information on R-32, R-123, R-124, R-125, R-134a, R-141b, R-142b, R-143a, R-152a, R-290 (propane), R-717 (ammonia), ethers, and others as well as azeotropic and zeotropic blends of these fluids. It addresses polyalkylene glycol (PAG), ester, and other lubricants. It also references documents addressing compatibility of refrigerants and lubricants with metals, plastics, elastomers, motor insulation, and other materials used in refrigerant circuits.
Splendiani, Andrea; Brandizi, Marco; Even, Gael; Beretta, Ottavio; Pavelka, Norman; Pelizzola, Mattia; Mayhaus, Manuel; Foti, Maria; Mauri, Giancarlo; Ricciardi-Castagnoli, Paola
Background Gene expression databases are key resources for microarray data management and analysis and the importance of a proper annotation of their content is well understood. Public repositories as well as microarray database systems that can be implemented by single laboratories exist. However, there is not yet a tool that can easily support a collaborative environment where different users with different rights of access to data can interact to define a common highly coherent content. The scope of the Genopolis database is to provide a resource that allows different groups performing microarray experiments related to a common subject to create a common coherent knowledge base and to analyse it. The Genopolis database has been implemented as a dedicated system for the scientific community studying dendritic and macrophage cells functions and host-parasite interactions. Results The Genopolis Database system allows the community to build an object based MIAME compliant annotation of their experiments and to store images, raw and processed data from the Affymetrix GeneChip® platform. It supports dynamical definition of controlled vocabularies and provides automated and supervised steps to control the coherence of data and annotations. It allows a precise control of the visibility of the database content to different sub groups in the community and facilitates exports of its content to public repositories. It provides an interactive users interface for data analysis: this allows users to visualize data matrices based on functional lists and sample characterization, and to navigate to other data matrices defined by similarity of expression values as well as functional characterizations of genes involved. A collaborative environment is also provided for the definition and sharing of functional annotation by users. Conclusion The Genopolis Database supports a community in building a common coherent knowledge base and analyse it. This fills a gap between a local
SRD 106 IUPAC-NIST Solubility Database (Web, free access) These solubilities are compiled from 18 volumes of the International Union of Pure and Applied Chemistry (IUPAC)-NIST Solubility Data Series. The database includes liquid-liquid, solid-liquid, and gas-liquid systems. Typical solvents and solutes include water, seawater, heavy water, inorganic compounds, and a variety of organic compounds such as hydrocarbons, halogenated hydrocarbons, alcohols, acids, esters and nitrogen compounds. There are over 67,500 solubility measurements and over 1800 references.
Jäger, Anne C; Alvarez, Michelle L; Davis, Carey P; Guzmán, Ernesto; Han, Yonmee; Way, Lisa; Walichiewicz, Paulina; Silva, David; Pham, Nguyen; Caves, Glorianna; Bruand, Jocelyne; Schlesinger, Felix; Pond, Stephanie J K; Varlaro, Joe; Stephens, Kathryn M; Holt, Cydne L
Human DNA profiling using PCR at polymorphic short tandem repeat (STR) loci followed by capillary electrophoresis (CE) size separation and length-based allele typing has been the standard in the forensic community for over 20 years. Over the last decade, Next-Generation Sequencing (NGS) matured rapidly, bringing modern advantages to forensic DNA analysis. The MiSeq FGx™ Forensic Genomics System, comprised of the ForenSeq™ DNA Signature Prep Kit, MiSeq FGx™ Reagent Kit, MiSeq FGx™ instrument and ForenSeq™ Universal Analysis Software, uses PCR to simultaneously amplify up to 231 forensic loci in a single multiplex reaction. Targeted loci include Amelogenin, 27 common, forensic autosomal STRs, 24 Y-STRs, 7 X-STRs and three classes of single nucleotide polymorphisms (SNPs). The ForenSeq™ kit includes two primer sets: Amelogenin, 58 STRs and 94 identity informative SNPs (iiSNPs) are amplified using DNA Primer Set A (DPMA; 153 loci); if a laboratory chooses to generate investigative leads using DNA Primer Set B, amplification is targeted to the 153 loci in DPMA plus 22 phenotypic informative (piSNPs) and 56 biogeographical ancestry SNPs (aiSNPs). High-resolution genotypes, including detection of intra-STR sequence variants, are semi-automatically generated with the ForenSeq™ software. This system was subjected to developmental validation studies according to the 2012 Revised SWGDAM Validation Guidelines. A two-step PCR first amplifies the target forensic STR and SNP loci (PCR1); unique, sample-specific indexed adapters or "barcodes" are attached in PCR2. Approximately 1736 ForenSeq™ reactions were analyzed. Studies include DNA substrate testing (cotton swabs, FTA cards, filter paper), species studies from a range of nonhuman organisms, DNA input sensitivity studies from 1ng down to 7.8pg, two-person human DNA mixture testing with three genotype combinations, stability analysis of partially degraded DNA, and effects of five commonly encountered PCR
The ECOTOXicology database (ECOTOX) is a comprehensive, publicly available knowledgebase developed and maintained by ORD/NHEERL. It is used for environmental toxicity data on aquatic life, terrestrial plants and wildlife. Publications are identified for potential applicability af...
Rots, A. H.; Winkelman, S. L.; Paltani, S.; Blecksmith, S. E.; Bright, J. D.
Early in the mission, the Chandra Data Archive started the development of a bibliography database, tracking publications in refereed journals and on-line conference proceedings that are based on Chandra observations, allowing our users to link directly to articles in the ADS from our archive, and to link to the relevant data in the archive from the ADS entries. Subsequently, we have been working closely with the ADS and other data centers, in the context of the ADEC-ITWG, on standardizing the literature-data linking. We have also extended our bibliography database to include all Chandra-related articles, and we are keeping track of the number of citations of each paper. In addition to providing valuable services to our users, this database allows us to extract a wide variety of statistical information. The project comprises five components: the bibliography database proper, a maintenance database, an interactive maintenance tool, a user browsing interface, and a web services component for exchanging information with the ADS. All of these elements are nearly mission-independent, and we intend to make the package as a whole available for use by other data centers. The capabilities thus provided represent support for an essential component of the Virtual Observatory.
Angermeier, Paul L.; Frimpong, Emmanuel A.
The need for integrated and widely accessible sources of species traits data to facilitate studies of ecology, conservation, and management has motivated development of traits databases for various taxa. In spite of the increasing number of traits-based analyses of freshwater fishes in the United States, no consolidated database of traits of this group exists publicly, and much useful information on these species is documented only in obscure sources. The largely inaccessible and unconsolidated traits information makes large-scale analysis involving many fishes and/or traits particularly challenging. FishTraits is a database of >100 traits for 809 (731 native and 78 exotic) fish species found in freshwaters of the conterminous United States, including 37 native families and 145 native genera. The database contains information on four major categories of traits: (1) trophic ecology, (2) body size and reproductive ecology (life history), (3) habitat associations, and (4) salinity and temperature tolerances. Information on geographic distribution and conservation status is also included. Together, we refer to the traits, distribution, and conservation status information as attributes. Descriptions of attributes are available here. Many sources were consulted to compile attributes, including state and regional species accounts and other databases.
Mariño-Ramírez, Leonardo; Levine, Kevin M; Morales, Mario; Zhang, Suiyuan; Moreland, R Travis; Baxevanis, Andreas D; Landsman, David
Eukaryotic chromatin is composed of DNA and protein components (core histones) that act to compactly pack the DNA into nucleosomes, the fundamental building blocks of chromatin. These nucleosomes are connected to adjacent nucleosomes by linker histones. Nucleosomes are highly dynamic and, through various core histone post-translational modifications and incorporation of diverse histone variants, can serve as epigenetic marks to control processes such as gene expression and recombination. The Histone Sequence Database is a curated collection of sequences and structures of histones and non-histone proteins containing histone folds, assembled from major public databases. Here, we report a substantial increase in the number of sequences and taxonomic coverage for histone and histone fold-containing proteins available in the database. Additionally, the database now contains an expanded dataset that includes archaeal histone sequences. The database also provides comprehensive multiple sequence alignments for each of the four core histones (H2A, H2B, H3 and H4), the linker histones (H1/H5) and the archaeal histones. The database also includes current information on solved histone fold-containing structures. The Histone Sequence Database is an inclusive resource for the analysis of chromatin structure and function focused on histones and histone fold-containing proteins.
John, L. M.; Drake, J.
The publicly accessible databases for the Extreme Ultraviolet Explorer include: the EUVE Archive mailserver; the CEA ftp site; the EUVE Guest Observer Mailserver; and the Astronomical Data System node. The EUVE Performance Assurance team is responsible for verifying that these public EUVE databases are working properly, and that the public availability of EUVE data contained therein does not infringe any data rights which may have been assigned. In this poster, we describe the Quality Assurance (QA) procedures we have developed from the approach of QA as a service organization, thus reflecting the overall EUVE philosophy of Quality Assurance integrated into normal operating procedures, rather than imposed as an external, post facto, control mechanism.
Library of Congress, Washington, DC. Copyright Office.
This description of the copyright protection available for automated databases provides a definition of an automated database; discusses the extent of copyright protection, i.e., the compilation of facts; explains copyright registration and what constitutes publication of a database; and describes the procedures for registering both published and…
Review of developments in databases highlights a new emphasis on accessibility. Topics discussed include the internationalization of databases; databases that deal with finance, drugs, and toxic waste; access to public records, both personal and corporate; media online; reducing large files of data to smaller, more manageable files; and…
Heller, Stephen R.
Describes the different approaches taken by the Chemical Abstracts Service database, which abstracts and indexes chemical publications, and the Beilstein Handbook of Organic Chemistry database, which produces a collection of critical reviews. The resulting content of the databases and their ability to meet the needs of different users are…
A few problems in the generic nomenclature of insects and amphibians, with recommendations for the publication of new generic nomina in zootaxonomy and comments on taxonomic and nomenclatural databases and websites.
Dahanukar et al. (2016a) proposed the nomen Walkerana for a new genus of amphibians, but shortly after (2016b) they replaced it with the new nomen Sallywalkerana, believing that their nomen Walkerana was preoccupied by a generic nomen of orthopterans. This was unjustified because the orthopteran nomen 'Walkerella' Otte & Perez-Gelabert, 2009a and its new replacement nomen 'Walkerana' Otte & Perez-Gelabert, 2009b were both nomina nuda. These recent examples of nomenclatural errors in generic nomenclature are just a few among many in recent zootaxonomic publications. This opportunity is taken to make some general methodological recommendations, in several domains (availability, homonymy, synonymy, neonymy, length and palatability of nomina), for the publication of new generic nomina in zootaxonomy. However, the absence of a comprehensive database and website providing all the relevant information necessary to establish the nomenclatural status of all zoological generic and subgeneric nomina is a brake on the efforts that can be made to avoid nomenclatural errors in zoological generic nomenclature. The international community of taxonomists should seek to establish such a database and website.
Jorquera, Roddy; Ortiz, Rodrigo; Ossandon, F.; Cárdenas, Juan Pablo; Sepúlveda, Rene; González, Carolina; Holmes, David S.
Eukaryotic genes are typically interrupted by intragenic, noncoding sequences termed introns. However, some genes lack introns in their coding sequence (CDS) and are generally known as ‘single exon genes’ (SEGs). In this work, a SEG is defined as a nuclear, protein-coding gene that lacks introns in its CDS. Whereas many public databases of eukaryotic multi-exon genes are available, there are only two specialized databases for SEGs. The present work addresses the need for a more extensive and diverse database by creating SinEx DB, a publicly available, searchable database of predicted SEGs from 10 completely sequenced mammalian genomes including human. SinEx DB houses the DNA and protein sequence information of these SEGs and includes their functional predictions (KOG) and the relative distribution of these functions within species. The information is stored in a relational database built with MySQL Server 5.1.33, and the complete dataset of SEG sequences and their functional predictions is available for downloading. SinEx DB can be interrogated by: (i) a browsable phylogenetic schema, (ii) BLAST searches against the in-house SinEx DB of SEGs and (iii) an advanced search mode in which the database can be searched by key words and any combination of searches by species and predicted functions. SinEx DB provides a rich source of information for advancing our understanding of the evolution and function of SEGs. Database URL: www.sinex.cl PMID:27278816
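The working definition of a SEG given above (nuclear, protein-coding, no introns in the CDS) can be expressed as a simple predicate. The sketch below is only an illustration of that definition: the `Gene` record, its field names, and the example gene names are hypothetical and do not reflect the SinEx DB schema or its prediction pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical minimal gene record used only for this illustration."""
    name: str
    is_nuclear: bool
    is_protein_coding: bool
    cds_segments: int  # contiguous CDS blocks; 1 means no intron interrupts the CDS


def is_single_exon_gene(gene: Gene) -> bool:
    """Apply the abstract's definition: nuclear, protein-coding, intronless CDS."""
    return gene.is_nuclear and gene.is_protein_coding and gene.cds_segments == 1


def select_segs(genes: list[Gene]) -> list[str]:
    """Return the names of genes that satisfy the SEG definition."""
    return [g.name for g in genes if is_single_exon_gene(g)]
```

Note that the definition concerns the coding sequence only, so in this sketch a gene with an intron in an untranslated region would still count as a SEG as long as `cds_segments` is 1.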
Bolser, Dan M.; Chibon, Pierre-Yves; Palopoli, Nicolas; Gong, Sungsam; Jacob, Daniel; Angel, Victoria Dominguez Del; Swan, Dan; Bassi, Sebastian; González, Virginia; Suravajhala, Prashanth; Hwang, Seungwoo; Romano, Paolo; Edwards, Rob; Bishop, Bryan; Eargle, John; Shtatland, Timur; Provart, Nicholas J.; Clements, Dave; Renfro, Daniel P.; Bhak, Daeui; Bhak, Jong
Biology is generating more data than ever. As a result, there is an ever increasing number of publicly available databases that analyse, integrate and summarize the available data, providing an invaluable resource for the biological community. As this trend continues, there is a pressing need to organize, catalogue and rate these resources, so that the information they contain can be most effectively exploited. MetaBase (MB) (http://MetaDatabase.Org) is a community-curated database containing more than 2000 commonly used biological databases. Each entry is structured using templates and can carry various user comments and annotations. Entries can be searched, listed, browsed or queried. The database was created using the same MediaWiki technology that powers Wikipedia, allowing users to contribute on many different levels. The initial release of MB was derived from the content of the 2007 Nucleic Acids Research (NAR) Database Issue. Since then, approximately 100 databases have been manually collected from the literature, and users have added information for over 240 databases. MB is synchronized annually with the static Molecular Biology Database Collection provided by NAR. To date, there have been 19 significant contributors to the project; each one is listed as an author here to highlight the community aspect of the project. PMID:22139927
Turner, Erick H.; Knoepflmacher, Daniel; Shapley, Lee
Background Publication bias compromises the validity of evidence-based medicine, yet a growing body of research shows that this problem is widespread. Efficacy data from drug regulatory agencies, e.g., the US Food and Drug Administration (FDA), can serve as a benchmark or control against which data in journal articles can be checked. Thus one may determine whether publication bias is present and quantify the extent to which it inflates apparent drug efficacy. Methods and Findings FDA Drug Approval Packages for eight second-generation antipsychotics—aripiprazole, iloperidone, olanzapine, paliperidone, quetiapine, risperidone, risperidone long-acting injection (risperidone LAI), and ziprasidone—were used to identify a cohort of 24 FDA-registered premarketing trials. The results of these trials according to the FDA were compared with the results conveyed in corresponding journal articles. The relationship between study outcome and publication status was examined, and effect sizes derived from the two data sources were compared. Among the 24 FDA-registered trials, four (17%) were unpublished. Of these, three failed to show that the study drug had a statistical advantage over placebo, and one showed the study drug was statistically inferior to the active comparator. Among the 20 published trials, the five that were not positive, according to the FDA, showed some evidence of outcome reporting bias. However, the association between trial outcome and publication status did not reach statistical significance. Further, the apparent increase in the effect size point estimate due to publication bias was modest (8%) and not statistically significant. On the other hand, the effect size for unpublished trials (0.23, 95% confidence interval 0.07 to 0.39) was less than half that for the published trials (0.47, 95% confidence interval 0.40 to 0.54), a difference that was significant. Conclusions The magnitude of publication bias found for antipsychotics was less than that found
Rosenbloom, Kate R; Armstrong, Joel; Barber, Galt P; Casper, Jonathan; Clawson, Hiram; Diekhans, Mark; Dreszer, Timothy R; Fujita, Pauline A; Guruvadoo, Luvina; Haeussler, Maximilian; Harte, Rachel A; Heitner, Steve; Hickey, Glenn; Hinrichs, Angie S; Hubley, Robert; Karolchik, Donna; Learned, Katrina; Lee, Brian T; Li, Chin H; Miga, Karen H; Nguyen, Ngan; Paten, Benedict; Raney, Brian J; Smit, Arian F A; Speir, Matthew L; Zweig, Ann S; Haussler, David; Kuhn, Robert M; Kent, W James
Launched in 2001 to showcase the draft human genome assembly, the UCSC Genome Browser database (http://genome.ucsc.edu) and associated tools continue to grow, providing a comprehensive resource of genome assemblies and annotations to scientists and students worldwide. Highlights of the past year include the release of a browser for the first new human genome reference assembly in 4 years in December 2013 (GRCh38, UCSC hg38), a watershed comparative genomics annotation (100-species multiple alignment and conservation) and a novel distribution mechanism for the browser (GBiB: Genome Browser in a Box). We created browsers for new species (Chinese hamster, elephant shark, minke whale), 'mined the web' for DNA sequences and expanded the browser display with stacked color graphs and region highlighting. As our user community increasingly adopts the UCSC track hub and assembly hub representations for sharing large-scale genomic annotation data sets and genome sequencing projects, our menu of public data hubs has tripled.
Mashima, Jun; Kodama, Yuichi; Kosuge, Takehide; Fujisawa, Takatomo; Katayama, Toshiaki; Nagasaki, Hideki; Okuda, Yoshihiro; Kaminuma, Eli; Ogasawara, Osamu; Okubo, Kousaku; Nakamura, Yasukazu; Takagi, Toshihisa
The DNA Data Bank of Japan Center (DDBJ Center; http://www.ddbj.nig.ac.jp) maintains and provides public archival, retrieval and analytical services for biological information. The contents of the DDBJ databases are shared with the US National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI) within the framework of the International Nucleotide Sequence Database Collaboration (INSDC). Since 2013, the DDBJ Center has been operating the Japanese Genotype-phenotype Archive (JGA) in collaboration with the National Bioscience Database Center (NBDC) in Japan. In addition, the DDBJ Center develops semantic web technologies for data integration and sharing in collaboration with the Database Center for Life Science (DBCLS) in Japan. This paper briefly reports on the activities of the DDBJ Center over the past year including submissions to databases and improvements in our services for data retrieval, analysis, and integration. PMID:26578571
Wing, Louise; Massoud, Tarik F
Quantitative, qualitative, and innovative application of bibliometric research performance indicators to anatomy and radiology research and education can enhance cross-fertilization between the two disciplines. We aim to use these indicators to identify long-term trends in dissemination of publications in neuroimaging anatomy (including both productivity and citation rates), which has subjectively waned in prestige during recent years. We examined publications over the last 40 years in two neuroradiological journals, AJNR and Neuroradiology, and selected and categorized all neuroimaging anatomy research articles according to theme and type. We studied trends in their citation activity over time, and mathematically analyzed these trends for 1977, 1987, and 1997 publications. We created a novel metric, "citation half-life at 10 years postpublication" (CHL-10), and used this to examine trends in the skew of citation numbers for anatomy articles each year. We identified 367 anatomy articles amongst a total of 18,110 in these journals: 74.2% were original articles, with study of normal anatomy being the commonest theme (46.7%). We recorded a mean of 18.03 citations for each anatomy article, 35% higher than for general neuroradiology articles. Graphs summarizing the rise (upslope) in citation rates after publication revealed similar trends spanning two decades. CHL-10 trends demonstrated that more recently published anatomy articles were likely to take longer to reach peak citation rate. Bibliometric analysis suggests that anatomical research in neuroradiology is not languishing. This novel analytical approach can be applied to other aspects of neuroimaging research, and within other subspecialties in radiology and anatomy, and also to foster anatomical education.
Mogul, Rakesh; Keagy, Laura; Nava, Argelia; Zerehi, Farah
The microbial inventories within the assembly facilities for spacecraft represent the primary pool of forward contaminants that may compromise life-detection missions. Accordingly, we are constructing a meta-database of these microorganisms for the purpose of building a bioinformatic resource for planetary protection and astrobiology-related endeavors. Using student-led efforts, the meta-database is being constructed from literature reports and is inclusive of both isolated microorganisms and those solely detected through DNA-based techniques. The Spacecraft-Associated Microbial Meta-database (SAMM) currently includes over 800 entries that are organized using 32 meta-tags involving taxonomy, location of isolation (facility and component), category of characterization (culture and/or genetic), types of characterizations (e.g., culture, 16S rDNA, phylochip, FAME, and DNA hybridization), growth conditions, Gram stain, and general physiological traits (e.g., sporulation, extremotolerance, and respiration properties). Interrogations of the database show that the cleanrooms at Kennedy Space Center (KSC) have ~2-fold greater diversity in bacterial genera than those at the Jet Propulsion Laboratory (JPL), and that bacteria related to water, plant, and human environments are more often associated with the KSC-specific genera. These results parallel those reported in the literature, and hence serve as benchmarks demonstrating the bioinformatic potential of this meta-database. The ultimate plans for SAMM include public availability, expansion through crowdsourcing efforts, and potential use as a companion resource to the culture collections assembled by DSMZ and JPL.
Sørensen, Sarah Mejer; Bjørn, Signe Frahm; Jochumsen, Kirsten Marie; Jensen, Pernille Tine; Thranov, Ingrid Regitze; Hare-Bruun, Helle; Seibæk, Lene; Høgdall, Claus
Aim of database The Danish Gynecological Cancer Database (DGCD) is a nationwide clinical cancer database and its aim is to monitor the treatment quality of Danish gynecological cancer patients, and to generate data for scientific purposes. DGCD also records detailed data on the diagnostic measures for gynecological cancer. Study population DGCD was initiated January 1, 2005, and includes all patients treated at Danish hospitals for cancer of the ovaries, peritoneum, fallopian tubes, cervix, vulva, vagina, and uterus, including rare histological types. Main variables DGCD data are organized within separate data forms as follows: clinical data, surgery, pathology, pre- and postoperative care, complications, follow-up visits, and final quality check. DGCD is linked with additional data from the Danish “Pathology Registry”, the “National Patient Registry”, and the “Cause of Death Registry” using the unique Danish personal identification number (CPR number). Descriptive data Data from DGCD and registers are available online in the Statistical Analysis Software portal. The DGCD forms cover almost all possible clinical variables used to describe gynecological cancer courses. The only limitation is the registration of oncological treatment data, which is incomplete for a large number of patients. Conclusion The very complete collection of available data from several registries forms one of the unique strengths of DGCD compared to many other clinical databases, and provides unique possibilities for validation and completeness of data. The success of the DGCD is illustrated through annual reports, high coverage, and several peer-reviewed DGCD-based publications. PMID:27822089
Wright, J. T.; Fakhouri, O.; Marcy, G. W.; Han, E.; Feng, Y.; Johnson, John Asher; Howard, A. W.; Fischer, D. A.; Valenti, J. A.; Anderson, J.; Piskunov, N.
We present a database of well-determined orbital parameters of exoplanets, and their host stars’ properties. This database comprises spectroscopic orbital elements measured for 427 planets orbiting 363 stars from radial velocity and transit measurements as reported in the literature. We have also compiled fundamental transit parameters, stellar parameters, and the method used for the planets’ discovery. This Exoplanet Orbit Database includes all planets with robust, well-measured orbital parameters reported in peer-reviewed articles. The database is available in a searchable, filterable, and sortable form online through the Exoplanet Data Explorer table, and the data can be plotted and explored through the Exoplanet Data Explorer plotter. We use the Data Explorer to generate publication-ready plots, giving three examples of the signatures of exoplanet migration and dynamical evolution: we illustrate the character of the apparent correlation between mass and period in exoplanet orbits, the different selection biases between radial velocity and transit surveys, and that the multiplanet systems show a distinct semimajor-axis distribution from apparently singleton systems.
The establishment of a database has been suggested in order to collect, organize, and distribute genetic information about esophageal cancer. The World Organization for Specialized Studies on Diseases of the Esophagus and the Human Variome Project will be in charge of a central database of information about esophageal cancer-related variations from publications, databases, and laboratories; in addition to genetic details, clinical parameters will also be included. The aim will be to get all the central players in research, clinical, and commercial laboratories to contribute. The database will follow established recommendations and guidelines. The database will require a team of dedicated curators with different backgrounds. Numerous layers of systematics will be applied to facilitate computational analyses. The data items will be extensively integrated with other information sources. The database will be distributed as open access to ensure exchange of the data with other databases. Variations will be reported in relation to reference sequences on three levels (DNA, RNA, and protein) whenever applicable. In the first phase, the database will concentrate on genetic variations including both somatic and germline variations for susceptibility genes. Additional types of information can be integrated at a later stage.
Szeverenyi, Ildiko; Cassidy, Andrew J; Chung, Cheuk Wang; Lee, Bernett T K; Common, John E A; Ogg, Stephen C; Chen, Huijia; Sim, Shu Yin; Goh, Walter L P; Ng, Kee Woei; Simpson, John A; Chee, Li Lian; Eng, Goi Hui; Li, Bin; Lunny, Declan P; Chuon, Danny; Venkatesh, Aparna; Khoo, Kian Hoe; McLean, W H Irwin; Lim, Yun Ping; Lane, E Birgitte
We describe a revised and expanded database on human intermediate filament proteins, a major component of the eukaryotic cytoskeleton. The family of 70 intermediate filament genes (including those encoding keratins, desmins, and lamins) is now known to be associated with a wide range of diverse diseases, at least 72 distinct human pathologies, including skin blistering, muscular dystrophy, cardiomyopathy, premature aging syndromes, neurodegenerative disorders, and cataract. To date, the database catalogs 1,274 manually-curated pathogenic sequence variants and 170 allelic variants in intermediate filament genes from over 459 peer-reviewed research articles. Unrelated cases were collected from all of the six sequence homology groups and the sequence variations were described at cDNA and protein levels with links to the related diseases and reference articles. The mutations and polymorphisms are presented in parallel with data on protein structure, gene, and chromosomal location and basic information on associated diseases. Detailed statistics relating to the variants records in the database are displayed by homology group, mutation type, affected domain, associated diseases, and nucleic and amino acid substitutions. Multiple sequence alignment algorithms can be run from queries to determine DNA or protein sequence conservation. Literature sources can be interrogated within the database and external links are provided to public databases. The database is freely and publicly accessible online at www.interfil.org (last accessed 13 September 2007). Users can query the database by various keywords and the search results can be downloaded. It is anticipated that the Human Intermediate Filament Database (HIFD) will provide a useful resource to study human genome variations for basic scientists, clinicians, and students alike.
Peters, Robert; Jaffe, Bruce E.
This report describes a database of sedimentary characteristics of tsunami deposits derived from published accounts of tsunami deposit investigations conducted shortly after the occurrence of a tsunami. The database contains 228 entries, each entry containing data from up to 71 categories. It includes data from 51 publications covering 15 tsunamis distributed between 16 countries. The database encompasses a wide range of depositional settings including tropical islands, beaches, coastal plains, river banks, agricultural fields, and urban environments. It includes data from both local tsunamis and teletsunamis. The data are valuable for interpreting prehistorical, historical, and modern tsunami deposits, and for the development of criteria to identify tsunami deposits in the geologic record.
Atomic and molecular data are required in a variety of fields ranging from the traditional astronomy, atmospherics and fusion research to fast growing technologies such as lasers, lighting, low-temperature plasmas, plasma assisted etching and radiotherapy. In this context, there are some research groups, both theoretical and experimental, scattered round the world that attend to most of this data demand, but the implementation of atomic databases has grown independently out of sheer necessity. In some cases the latter has been associated with the data production process or with data centers involved in data collection and evaluation; but sometimes it has been the result of individual initiatives that have been quite successful. In any case, the development and maintenance of atomic databases call for a number of skills and an entrepreneurial spirit that are not usually associated with most physics researchers. In the present report we present some of the highlights in this area in the past five years and discuss what we think are some of the main issues that have to be addressed.
Next-generation sequencing (NGS) technologies have revolutionized modern biological and biomedical research. The engines responsible for this innovation are DNA polymerases; they catalyze the biochemical reaction for deriving template sequence information. In fact, DNA polymerase has been a cornerstone of DNA sequencing from the very beginning. Escherichia coli DNA polymerase I proteolytic (Klenow) fragment was originally utilized in Sanger's dideoxy chain-terminating DNA sequencing chemistry. From these humble beginnings followed an explosion of organism-specific genome sequence information accessible via public databases. Family A/B DNA polymerases from mesophilic/thermophilic bacteria/archaea were modified and tested in today's standard capillary electrophoresis (CE) and NGS sequencing platforms. These enzymes were selected for their efficient incorporation of bulky dye-terminator and reversible dye-terminator nucleotides, respectively. Third-generation, real-time single-molecule sequencing platforms require slightly different enzyme properties. Enterobacterial phage ϕ29 DNA polymerase copies long stretches of DNA and possesses a unique capability to efficiently incorporate terminal phosphate-labeled nucleoside polyphosphates. Furthermore, the ϕ29 enzyme has also been utilized in emerging DNA sequencing technologies including nanopore- and protein-transistor-based sequencing. DNA polymerase is, and will continue to be, a crucial component of sequencing technologies.
Fidelis, K; Adzhubej, A; Kryshtafovych, A; Daniluk, P
The phenomenal success of the genome sequencing projects reveals the power of completeness in revolutionizing biological science. Currently it is possible to sequence entire organisms at a time, allowing for a systemic rather than fractional view of their organization and the various genome-encoded functions. There is an international plan to move towards a similar goal in the area of protein structure. This will not be achieved by experiment alone, but rather by a combination of efforts in crystallography, NMR spectroscopy, and computational modeling. Only a small fraction of structures are expected to be identified experimentally, the remainder to be modeled. Presently there is no organized infrastructure to critically evaluate and present these data to the biological community. The goal of the Protein Model Database project is to create such infrastructure, including (1) public database of theoretically derived protein structures; (2) reliable annotation of protein model quality, (3) novel structure analysis tools, and (4) access to the highest quality modeling techniques available.
The database provides bibliographic citations and abstracts for publications that may be useful in research and design of air-conditioning and refrigeration equipment. The database identifies sources of specific information on R-32, R-123, R-124, R-125, R-134, R-134a, R-141b, R-142b, R-143a, R-152a, R-245ca, R-290 (propane), R-717 (ammonia), ethers, and others as well as azeotropic and zeotropic blends of these fluids. It addresses lubricants including alkylbenzene, polyalkylene glycol, ester, and other synthetics as well as mineral oils. It also references documents on compatibility of refrigerants and lubricants with metals, plastics, elastomers, motor insulation, and other materials used in refrigerant circuits. A computerized version is available that includes retrieval software.
Nowacki, P M; Byck, S; Prevost, L; Scriver, C R
PAHdb (http://www.mcgill.ca/pahdb) is a curated relational database (Fig. 1) of nucleotide variation in the human PAH cDNA (GenBank U49897). Among 328 different mutations by state (Fig. 2), the majority are rare mutations causing hyperphenylalaninemia (HPA) (OMIM 261600); the remainder are polymorphic variants without apparent effect on phenotype. PAHdb modules contain mutations, polymorphic haplotypes, genotype-phenotype correlations, expression analysis, sources of information and the reference sequence; the database also contains pages of clinical information and data on three ENU mouse orthologues of human HPA. Only six different mutations account for 60% of human HPA chromosomes worldwide; mutations stratify by population and geographic region, and the Oriental and Caucasian mutation sets are different (Fig. 3). PAHdb provides curated electronic publication and one third of its incoming reports are direct submissions. Each different mutation receives a systematic (nucleotide) name and a unique identifier (UID). Data are accessed both by a Newsletter and a search engine on the website; integrity of the database is ensured by keeping the curated template offline. There have been >6500 online interrogations of the website. PMID:9399840
Schwartz-Marín, Ernesto; Wade, Peter; Cruz-Santiago, Arely; Cárdenas, Roosbelinda
This article examines the role that vernacular notions of racialized-regional difference play in the constitution and stabilization of DNA populations in Colombian forensic science, in what we frame as a process of public science. In public science, the imaginations of the scientific world and common-sense public knowledge are integral to the production and circulation of science itself. We explore the origins and circulation of a scientific object – ‘La Tabla’, published in Paredes et al. and used in genetic forensic identification procedures – among genetic research institutes, forensic genetics laboratories and courtrooms in Bogotá. We unveil the double life of this central object of forensic genetics. On the one hand, La Tabla enjoys an indisputable public place in the processing of forensic genetic evidence in Colombia (paternity cases, identification of bodies, etc.). On the other hand, the relations it establishes between ‘race’, geography and genetics are questioned among population geneticists in Colombia. Although forensic technicians are aware of the disputes among population geneticists, they use and endorse the relations established between genetics, ‘race’ and geography because these fit with common-sense notions of visible bodily difference and the regionalization of race in the Colombian nation. PMID:27480000
deVarvalho, Robert; Desai, Shailen D.; Haines, Bruce J.; Kruizinga, Gerhard L.; Gilmer, Christopher
This software provides storage, retrieval, and analysis functionality for managing satellite altimetry data. It improves the efficiency and analysis capabilities of existing database software with improved flexibility and documentation. It offers flexibility in the type of data that can be stored. There is efficient retrieval either across the spatial domain or the time domain. Built-in analysis tools are provided for frequently performed altimetry tasks. This software package is used for storing and manipulating satellite measurement data. It was developed with a focus on handling the requirements of repeat-track altimetry missions such as Topex and Jason. It was, however, designed to work with a wide variety of satellite measurement data (e.g., Gravity Recovery And Climate Experiment, GRACE). The software consists of several command-line tools for importing, retrieving, and analyzing satellite measurement data.
Currently there is an enormous number of geoscience databases. Unfortunately, the only users of most of these databases are their developers. There are several reasons for this: incompatibility, specificity of tasks and objects, and so on. However, the main obstacles to wide usage of geoscience databases are complexity for developers and complication for users. The complexity of the architecture leads to high costs that block public access. The complication prevents users from understanding when and how to use the database. Only databases associated with GoogleMaps lack these drawbacks, but they can hardly be called "geoscience" databases. Nevertheless, an open and simple geoscience database is necessary, at least for educational purposes (see our abstract for ESSI20/EOS12). We developed a database and a web interface to work with it, and it is now accessible at maps.sch192.ru. In this database a result is a value of a parameter (no matter which) at a station with a certain position, associated with metadata: the date when the result was obtained; the type of station (lake, soil, etc.); and the contributor who sent the result. Each contributor has their own profile, which allows the reliability of the data to be estimated. The results can be represented on a GoogleMaps space image as a point at a certain position, coloured according to the value of the parameter. There are default colour scales, and each registered user can create their own scale. The results can also be extracted as a *.csv file. For both types of representation one can select the data by date, object type, parameter type, area and contributor. The data are uploaded in *.csv format: Name of the station; Latitude(dd.dddddd); Longitude(ddd.dddddd); Station type; Parameter type; Parameter value; Date(yyyy-mm-dd). The contributor is recognised upon entering. This is the minimal set of features that is required to connect a value of a parameter with a position and see the results. All the complicated data
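The seven-field upload format listed in this abstract (station name, latitude, longitude, station type, parameter type, parameter value, date) can be illustrated with a small parser. This is only a sketch under stated assumptions, not the project's actual code: it assumes a semicolon delimiter (mirroring how the fields are listed), no header row, and a numeric parameter value. The names `StationResult` and `parse_results`, and the sample row, are hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date, datetime

@dataclass
class StationResult:
    """One uploaded result: a parameter value tied to a station position."""
    station: str
    latitude: float
    longitude: float
    station_type: str
    parameter_type: str
    value: float
    observed: date

def parse_results(text: str, delimiter: str = ";") -> list[StationResult]:
    """Parse upload rows; the delimiter is an assumption, not a documented fact."""
    results = []
    for fields in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not any(f.strip() for f in fields):
            continue  # skip blank lines
        # Unpacking raises ValueError if a row does not have exactly 7 fields.
        name, lat, lon, st_type, par_type, value, day = (f.strip() for f in fields)
        lat_f, lon_f = float(lat), float(lon)
        if not (-90.0 <= lat_f <= 90.0 and -180.0 <= lon_f <= 180.0):
            raise ValueError(f"coordinates out of range for station {name!r}")
        results.append(StationResult(
            station=name,
            latitude=lat_f,
            longitude=lon_f,
            station_type=st_type,
            parameter_type=par_type,
            value=float(value),  # assumes numeric parameter values
            observed=datetime.strptime(day, "%Y-%m-%d").date(),
        ))
    return results

# Made-up sample row, for illustration only.
sample = "Lake Example; 59.950000; 30.316700; lake; pH; 7.2; 2012-04-15\n"
results = parse_results(sample)
print(results[0])
```

Validating the coordinate ranges and the date format at upload time keeps the minimal record (value, position, metadata) usable for the map display and the filters by date, station type, and parameter type described in the abstract.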
Tosar, Juan Pablo; Rovira, Carlos; Naya, Hugo; Cayota, Alfonso
The report that exogenous plant miRNAs are able to cross the mammalian gastrointestinal tract and exert gene-regulation mechanisms in mammalian tissues has generated considerable controversy, both in the public press and the scientific literature. Despite the initial enthusiasm, reproducibility of these results was recently questioned by several authors. To analyze the causes of this unease, we searched for diet-derived miRNAs in deep-sequencing libraries prepared by ourselves and others. We found variable amounts of plant miRNAs in publicly available small RNA-seq data sets of human tissues. In human spermatozoa, exogenous RNAs reached extreme, biologically meaningless levels. In contrast, plant miRNAs were not detected in our sequencing of human sperm cells, which was performed in the absence of any known sources of plant contamination. We designed an experiment to show that cross-contamination during library preparation is a source of exogenous RNAs. These contamination-derived exogenous sequences even resisted oxidation with sodium periodate. To test the assumption that diet-derived miRNAs were actually contamination-derived, we searched the literature for previous sequencing reports by the same group that reported the initial finding. We analyzed the spectra of plant miRNAs in a small RNA sequencing study performed in amphioxus by this group in 2009 and we found a very strong correlation with the plant miRNAs which they later reported in human sera. Even though contamination with exogenous sequences may be easy to detect, cross-contamination between samples from the same organism can go completely unnoticed, possibly affecting conclusions derived from NGS transcriptomics. PMID:24729469
Colon cancer is the second leading cause of cancer death in the United States. A key issue in treating colon cancer patients is the inability to accurately predict which tumors have metastatic potential and require adjuvant chemotherapy. This project will test the model that tumor metastases arise from intra-tumor heterogeneity generated by DNA methylation events, and that detecting these events can provide a predictive signature of tumors with poor outcome.
Robinson, James; Halliwell, Jason A.; Hayhurst, James D.; Flicek, Paul; Parham, Peter; Marsh, Steven G. E.
The Immuno Polymorphism Database (IPD) was developed to provide a centralized system for the study of polymorphism in genes of the immune system. Through the IPD project we have established a central platform for the curation and publication of locus-specific databases involved either directly or related to the function of the Major Histocompatibility Complex in a number of different species. We have collaborated with specialist groups or nomenclature committees that curate the individual sections before they are submitted to IPD for online publication. IPD consists of five core databases, with the IMGT/HLA Database as the primary database. Through the work of the various nomenclature committees, the HLA Informatics Group and in collaboration with the European Bioinformatics Institute we are able to provide public access to this data through the website http://www.ebi.ac.uk/ipd/. The IPD project continues to develop with new tools being added to address scientific developments, such as Next Generation Sequencing, and to address user feedback and requests. Regular updates to the website ensure that new and confirmatory sequences are dispersed to the immunogenetics community, and the wider research and clinical communities. PMID:25414341
Picardi, Ernesto; Regina, Teresa Maria Rosaria; Brennicke, Axel; Quagliariello, Carla
The RNA Editing Database (REDIdb) is an interactive, web-based database created and designed with the aim to allocate RNA editing events such as substitutions, insertions and deletions occurring in a wide range of organisms. The database contains both fully and partially sequenced DNA molecules for which editing information is available either by experimental inspection (in vitro) or by computational detection (in silico). Each record of REDIdb is organized in a specific flat-file containing a description of the main characteristics of the entry, a feature table with the editing events and related details and a sequence zone with both the genomic sequence and the corresponding edited transcript. REDIdb is a relational database in which the browsing and identification of editing sites has been simplified by means of two facilities to either graphically display genomic or cDNA sequences or to show the corresponding alignment. In both cases, all editing sites are highlighted in colour and their relative positions are detailed by mousing over. New editing positions can be directly submitted to REDIdb after a user-specific registration to obtain authorized secure access. This first version of REDIdb database stores 9964 editing events and can be freely queried at http://biologia.unical.it/py_script/search.html.
Tagmount, Abderrahmane; Wang, Mei; Lindquist, Erika; Tanaka, Yoshihiro; Teranishi, Kristen S.; Sunagawa, Shinichi; Wong, Mike; Stillman, Jonathon H.
Background: With the emergence of a completed genome sequence of the freshwater crustacean Daphnia pulex, construction of genomic-scale sequence databases for additional crustacean sequences is important for comparative genomics and annotation. Porcelain crabs, genus Petrolisthes, have been powerful crustacean models for environmental and evolutionary physiology with respect to thermal adaptation and understanding responses of marine organisms to climate change. Here, we present a large-scale EST sequencing and cDNA microarray database project for the porcelain crab Petrolisthes cinctipes. Methodology/Principal Findings: A set of ~30K unique sequences (UniSeqs) representing ~19K clusters were generated from ~98K high quality ESTs from a set of tissue specific non-normalized and mixed-tissue normalized cDNA libraries from the porcelain crab Petrolisthes cinctipes. Homology for each UniSeq was assessed using BLAST, InterProScan, GO and KEGG database searches. Approximately 66% of the UniSeqs had homology in at least one of the databases. All EST and UniSeq sequences along with annotation results and coordinated cDNA microarray datasets have been made publicly accessible at the Porcelain Crab Array Database (PCAD), a feature-enriched version of the Stanford and Longhorn Array Databases. Conclusions/Significance: The EST project presented here represents the third largest sequencing effort for any crustacean, and the largest effort for any crab species. Our assembly and clustering results suggest that our porcelain crab EST data set is as diverse as the much larger EST set generated in the Daphnia pulex genome sequencing project, and thus will be an important resource to the Daphnia research community. Our homology results support the pancrustacea hypothesis and suggest that Malacostraca may be ancestral to Branchiopoda and Hexapoda. Our results also suggest that our cDNA microarrays cover as much of the transcriptome as can reasonably be captured in
The Refrigerant Database is an information system on alternative refrigerants, associated lubricants, and their use in air conditioning and refrigeration. It consolidates and facilitates access to property, compatibility, environmental, safety, application and other information. It provides corresponding information on older refrigerants, to assist manufacturers and those using alternative refrigerants, to make comparisons and determine differences. The underlying purpose is to accelerate phase out of chemical compounds of environmental concern. The database provides bibliographic citations and abstracts for publications that may be useful in research and design of air-conditioning and refrigeration equipment. The complete documents are not included, though some may be added at a later date. The database identifies sources of specific information on various refrigerants. It addresses lubricants including alkylbenzene, polyalkylene glycol, polyolester, and other synthetics as well as mineral oils. It also references documents addressing compatibility of refrigerants and lubricants with metals, plastics, elastomers, motor insulation, and other materials used in refrigerant circuits. Incomplete citations or abstracts are provided for some documents. They are included to accelerate availability of the information and will be completed or replaced in future updates.
The Refrigerant Database is an information system on alternative refrigerants, associated lubricants, and their use in air conditioning and refrigeration. It consolidates and facilitates access to property, compatibility, environmental, safety, application and other information. It provides corresponding information on older refrigerants, to assist manufacturers and those using alternative refrigerants, to make comparisons and determine differences. The underlying purpose is to accelerate phase out of chemical compounds of environmental concern. The database provides bibliographic citations and abstracts for publications that may be useful in research and design of air-conditioning and refrigeration equipment. The complete documents are not included, though some may be added at a later date. The database identifies sources of specific information on many refrigerants including propane, ammonia, water, carbon dioxide, propylene, ethers, and others as well as azeotropic and zeotropic blends of these fluids. It addresses lubricants including alkylbenzene, polyalkylene glycol, polyolester, and other synthetics as well as mineral oils. It also references documents addressing compatibility of refrigerants and lubricants with metals, plastics, elastomers, motor insulation, and other materials used in refrigerant circuits. Incomplete citations or abstracts are provided for some documents. They are included to accelerate availability of the information and will be completed or replaced in future updates.
The Refrigerant Database consolidates and facilitates access to information to assist industry in developing equipment using alternative refrigerants. The underlying purpose is to accelerate phase out of chemical compounds of environmental concern. The database provides bibliographic citations and abstracts for publications that may be useful in research and design of air-conditioning and refrigeration equipment. The complete documents are not included. The database identifies sources of specific information on R-32, R-123, R-124, R-125, R-134, R-134a, R-141b, R-142b, R-143a, R-152a, R-245ca, R-290 (propane), R-717 (ammonia), ethers, and others as well as azeotropic and zeotropic blends of these fluids. It addresses lubricants including alkylbenzene, polyalkylene glycol, ester, and other synthetics as well as mineral oils. It also references documents addressing compatibility of refrigerants and lubricants with metals, plastics, elastomers, motor insulation, and other materials used in refrigerant circuits. Incomplete citations or abstracts are provided for some documents to accelerate availability of the information and will be completed or replaced in future updates.
Groom, Colin R.; Bruno, Ian J.; Lightfoot, Matthew P.; Ward, Suzanna C.
The Cambridge Structural Database (CSD) contains a complete record of all published organic and metal–organic small-molecule crystal structures. The database has been in operation for over 50 years and continues to be the primary means of sharing structural chemistry data and knowledge across disciplines. As well as structures that are made public to support scientific articles, it includes many structures published directly as CSD Communications. All structures are processed both computationally and by expert structural chemistry editors prior to entering the database. A key component of this processing is the reliable association of the chemical identity of the structure studied with the experimental data. This important step helps ensure that data is widely discoverable and readily reusable. Content is further enriched through selective inclusion of additional experimental data. Entries are available to anyone through free CSD community web services. Linking services developed and maintained by the CCDC, combined with the use of standard identifiers, facilitate discovery from other resources. Data can also be accessed through CCDC and third party software applications and through an application programming interface. PMID:27048719
Chan, Agnes P; Pertea, Geo; Cheung, Foo; Lee, Dan; Zheng, Li; Whitelaw, Cathy; Pontaroli, Ana C; SanMiguel, Phillip; Yuan, Yinan; Bennetzen, Jeffrey; Barbazuk, William Brad; Quackenbush, John; Rabinowicz, Pablo D
Maize is a staple crop of the grass family and also an excellent model for plant genetics. Owing to the large size and repetitiveness of its genome, we previously investigated two approaches to accelerate gene discovery and genome analysis in maize: methylation filtration and high C(0)t selection. These techniques allow the construction of gene-enriched genomic libraries by minimizing repeat sequences due to either their methylation status or their copy number, yielding a 7-fold enrichment in genic sequences relative to a random genomic library. Approximately 900,000 gene-enriched reads from maize were generated and clustered into Assembled Zea mays (AZM) sequences. Here we report the current AZM release, which consists of approximately 298 Mb representing 243,807 sequence assemblies and singletons. In order to provide a repository of publicly available maize genomic sequences, we have created the TIGR Maize Database (http://maize.tigr.org). In this resource, we have assembled and annotated the AZMs and used available sequenced markers to anchor AZMs to maize chromosomes. We have constructed a maize repeat database and generated draft sequence assemblies of 287 maize bacterial artificial chromosome (BAC) clone sequences, which we annotated along with 172 additional publicly available BAC clones. All sequences, assemblies and annotations are available at the project website via web interfaces and FTP downloads.
Shirasawa, Kenta; Isobe, Sachiko; Tabata, Satoshi; Hirakawa, Hideki
In order to provide useful genomic information for agronomical plants, we have established a database, the Kazusa Marker DataBase (http://marker.kazusa.or.jp). This database includes information on DNA markers, e.g., SSR and SNP markers, genetic linkage maps, and physical maps, that were developed at the Kazusa DNA Research Institute. Keyword searches for the markers, sequence data used for marker development, and experimental conditions are also available through this database. Currently, 10 plant species have been targeted: tomato (Solanum lycopersicum), pepper (Capsicum annuum), strawberry (Fragaria × ananassa), radish (Raphanus sativus), Lotus japonicus, soybean (Glycine max), peanut (Arachis hypogaea), red clover (Trifolium pratense), white clover (Trifolium repens), and eucalyptus (Eucalyptus camaldulensis). In addition, the number of plant species registered in this database will be increased as our research progresses. The Kazusa Marker DataBase will be a useful tool for both basic and applied sciences, such as genomics, genetics, and molecular breeding in crops. PMID:25320561
Yang, Xu; Lazar, Iulia M
To enable the identification of mutated peptide sequences in complex biological samples, in this work, two novel cancer- and disease-related protein databases with mutation information collected from several public resources such as COSMIC, IARC P53, OMIM, and UniProtKB were developed. In-house developed Perl scripts were used to search and process the data and to translate each gene-level mutation into a mutated peptide sequence. The cancer and disease mutation databases comprise a total of 872,125 and 27,148 peptide entries from 25,642 and 2,913 proteins, respectively. A description line for each entry provides the parent protein ID and name, the cDNA- and protein-level mutation site and type, the originating database, and the disease or cancer tissue type and corresponding hits. The two databases are FASTA-formatted to enable data retrieval by commonly used tandem MS search engines. While the largest number of mutations were encountered for the amino acids A/D/E/G/L/P/R/S, the global mutation profiles replicate closely the outcome of the 1000 Genomes Project aimed at cataloguing natural mutations in the human population. The affected proteins were primarily involved in transcription regulation, splicing, protein synthesis/folding/binding, redox/energy production, adhesion/motility, and to some extent in DNA damage repair and signaling. The applicability of the database to identifying the presence of mutated peptides was investigated with MCF-7 breast cancer cell extracts.
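The step of turning a mutation record into a FASTA-formatted mutated sequence can be illustrated with a minimal Python sketch. It is not the authors' Perl pipeline: it assumes the mutation is already given at the protein level as a single substitution such as R6H, and the function names, the header layout, the toy sequence, and the identifiers P00000 and TOY are all hypothetical.

```python
import re

def apply_substitution(protein_seq, mutation):
    """Apply a single-residue substitution such as 'R6H' (1-based position)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unsupported mutation notation: {mutation}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    # Refuse to edit if the stated reference residue is not at that position.
    if pos < 1 or pos > len(protein_seq) or protein_seq[pos - 1] != ref:
        raise ValueError(f"reference residue mismatch for {mutation}")
    return protein_seq[:pos - 1] + alt + protein_seq[pos:]

def fasta_entry(protein_id, mutation, source, mutated_seq, width=60):
    """Build one FASTA record; the description-line layout is an assumption."""
    header = f">{protein_id}|{mutation}|{source}"
    lines = [mutated_seq[i:i + width] for i in range(0, len(mutated_seq), width)]
    return "\n".join([header] + lines)

toy_sequence = "MKTAYRAKQR"  # made-up sequence, not a real protein
mutated = apply_substitution(toy_sequence, "R6H")
record = fasta_entry("P00000", "R6H", "TOY", mutated)
print(record)
```

Checking the reference residue before substituting is a cheap guard against off-by-one errors when mutation coordinates come from several source databases with different conventions.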
Mutagenicity and carcinogenicity databases are crucial resources for toxicologists and regulators involved in chemicals risk assessment. Until recently, existing public toxicity databases have been constructed primarily as "look-up-tables" of existing data, and most often did no...
McArthur, A G; Morrison, H G; Nixon, J E; Passamaneck, N Q; Kim, U; Hinkle, G; Crocker, M K; Holder, M E; Farr, R; Reich, C I; Olsen, G E; Aley, S B; Adam, R D; Gillin, F D; Sogin, M L
The Giardia genome project database provides an online resource for Giardia lamblia (WB strain, clone C6) genome sequence information. The database includes edited single-pass reads, the results of BLASTX searches, and details of progress towards sequencing the entire 12 million-bp Giardia genome. Pre-sorted BLASTX results can be retrieved based on keyword searches and BLAST searches of the high throughput Giardia data can be initiated from the web site or through NCBI. Descriptions of the genomic DNA libraries, project protocols and summary statistics are also available. Although the Giardia genome project is ongoing, new sequences are made available on a bi-monthly basis to ensure that researchers have access to information that may assist them in the search for genes and their biological function. The current URL of the Giardia genome project database is www.mbl.edu/Giardia.
Mineta, Katsuhiko; Gojobori, Takashi
Metagenomic data obtained from marine environments are highly useful for understanding marine microbial communities. Compared with the conventional amplicon-based approach to metagenomics, the more recent shotgun sequencing-based approach has become a powerful tool that efficiently captures the diversity of the entire microbial community at a sampling point in the sea. However, this approach accelerates the accumulation of metagenome data and increases data complexity. Moreover, when a metagenomic approach is used to monitor changes over time in marine environments at multiple locations, metagenomic data will accumulate in tremendous volumes and at enormous speed. Because this situation is becoming a reality at many marine research institutions and stations all over the world, data management and analysis will clearly be confronted by so-called Big Data issues, such as how a database can be constructed efficiently and how useful knowledge can be extracted from a vast amount of data. In this review, we outline all the major marine metagenome databases that are currently publicly available, noting that no database is devoted exclusively to marine metagenomes and that only six metagenome databases include marine metagenome data, an unexpectedly small number. We also extend our explanation to what we call reference databases, which will be useful for constructing a marine metagenome database and for complementing it with important information. Finally, we point out a number of challenges to be overcome in constructing a marine metagenome database.
The history of NICMOS platescale, focus, and coronagraphic hole data has been stored in a database that is accessible via tools on the World Wide Web. The history tool, which allows queries on the data, is available to the public.
Reviews recent trends in databases and online systems. Topics discussed include new access points for established databases; acquisitions, consolidations, and competition between vendors; European coverage; international services; online reference materials, including telephone directories; political and legal materials and public records;…
Reviews changes in online database searching in academic libraries. Topics include librarians conducting all searches; the advent of end-user searching and the need for user instruction; compact disk technology; online public catalogs; the Internet; full text databases; electronic information literacy; user education and the remote library user;…
... 24 Housing and Urban Development 4 2014-04-01 2014-04-01 false Database adjustment. 902.24 Section 902.24 Housing and Urban Development REGULATIONS RELATING TO HOUSING AND URBAN DEVELOPMENT (CONTINUED... PUBLIC HOUSING ASSESSMENT SYSTEM Physical Condition Indicator § 902.24 Database adjustment....
... 24 Housing and Urban Development 4 2012-04-01 2012-04-01 false Database adjustment. 902.24 Section 902.24 Housing and Urban Development REGULATIONS RELATING TO HOUSING AND URBAN DEVELOPMENT (CONTINUED... PUBLIC HOUSING ASSESSMENT SYSTEM Physical Condition Indicator § 902.24 Database adjustment....
... 24 Housing and Urban Development 4 2011-04-01 2011-04-01 false Database adjustment. 902.24 Section 902.24 Housing and Urban Development REGULATIONS RELATING TO HOUSING AND URBAN DEVELOPMENT (CONTINUED... PUBLIC HOUSING ASSESSMENT SYSTEM Physical Condition Indicator § 902.24 Database adjustment....
Gottfried, John C.
This study examines potential correlates of business research database access through academic libraries serving top business programs in the United States. Results indicate that greater access to research databases is related to enrollment in graduate business programs, but not to overall enrollment or status as a public or private institution.…
... 24 Housing and Urban Development 4 2013-04-01 2013-04-01 false Database adjustment. 902.24 Section 902.24 Housing and Urban Development REGULATIONS RELATING TO HOUSING AND URBAN DEVELOPMENT (CONTINUED... PUBLIC HOUSING ASSESSMENT SYSTEM Physical Condition Indicator § 902.24 Database adjustment....
Cremer, Miriam L.; Maza, Mauricio; Alfaro, Karla M.; Kim, Jane J.; Ditzian, Lauren R.; Villalta, Sofia; Alonzo, Todd A.; Felix, Juan C.; Castle, Philip E.; Gage, Julia C.
Objective In a primary human papillomavirus (HPV) screening program, we compared the 6-month follow-up among colposcopy and noncolposcopy-based management strategies for screen-positive women. Materials and Methods Women aged 30 to 49 years were screened with HPV DNA tests using both self-collection and provider collection of samples. Women testing positive received either (1) colposcopy management (CM) consisting of colposcopy and management per local guidelines or (2) screen-and-treat (ST) management using visual inspection with acetic acid to determine cryotherapy eligibility, with eligible women undergoing immediate cryotherapy. One thousand women were recruited in each cohort. Of these, 368 (18.4%) of 2000 women were recruited using a more intensive outreach strategy. Demographics, HPV positivity, and treatment compliance were compared across recruitment and management strategies. Results More women in the ST cohort received treatment within 6 months compared with those in the CM cohort (117/119 [98.3%] vs 64/93 [68.8%]; p < .001). Women recruited through more intensive outreach were more likely to be HPV positive, lived in urban areas, were more educated, and had higher numbers of lifetime sexual partners and fewer children. Conclusions Women in the CM arm were less likely to complete care than women in the ST arm. Targeted outreach to underscreened women successfully identified women with higher prevalence of HPV and possibly higher disease burden. PMID:26890683
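The proportions reported above can be re-derived from the counts in the abstract. The short Python sketch below recomputes the percentages and, as an assumption, applies a pooled two-proportion z-test; the abstract does not say which test the authors used, so the resulting p-value is only a plausibility check against the reported p < .001.

```python
import math

# Counts taken from the abstract.
st_treated, st_total = 117, 119   # screen-and-treat cohort
cm_treated, cm_total = 64, 93     # colposcopy management cohort

st_pct = round(100 * st_treated / st_total, 1)
cm_pct = round(100 * cm_treated / cm_total, 1)
outreach_pct = round(100 * 368 / 2000, 1)

# Pooled two-proportion z-test (assumed here; not necessarily the authors' test).
p1, p2 = st_treated / st_total, cm_treated / cm_total
pooled = (st_treated + cm_treated) / (st_total + cm_total)
std_err = math.sqrt(pooled * (1 - pooled) * (1 / st_total + 1 / cm_total))
z = (p1 - p2) / std_err
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability

print(st_pct, cm_pct, outreach_pct)
print(f"z = {z:.2f}, two-sided p = {p_value:.2e}")
```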
CD-ROM has rapidly evolved as a new information medium with large capacity. In the U.S. it is predicted to become a two hundred billion yen market in three years, and thus CD-ROM is a strategic target of the database industry. Here in Japan the movement toward its commercialization has been active since this year. Will the CD-ROM business ever conquer the information market as an on-disk database or electronic publication? Referring to some cases of applications in the U.S., the author views the marketability and future trend of this new optical disk medium.
Pan, Y.; Lin, W.
Magnetotactic bacteria (MTB) are of interest in biogeomagnetism, rock magnetism, microbiology, biomineralization, and advanced magnetic materials because of their ability to synthesize highly ordered intracellular nano-sized magnetic minerals, magnetite or greigite. Great strides in MTB studies have been made in the past few decades. More than 600 articles concerning MTB have been published. These rapidly growing data are stimulating cross-disciplinary studies in fields such as biogeomagnetism. We have compiled the first online database for MTB, i.e., Database of Magnetotactic Bacteria (DMTB, http://database.biomnsl.com). It contains useful information on 16S rRNA gene sequences, oligonucleotides, and magnetic properties of MTB, and corresponding ecological metadata of sampling sites. The 16S rRNA gene sequences are collected from the GenBank database, while all other data are collected from the scientific literature. Rock magnetic properties for both uncultivated and cultivated MTB species are also included. In the DMTB database, data are accessible through four main interfaces: Site Sort, Phylo Sort, Oligonucleotides, and Magnetic Properties. References in each entry serve as links to specific pages within public databases. The online comprehensive DMTB will provide a very useful data resource for researchers from various disciplines, e.g., microbiology, rock magnetism and paleomagnetism, biogeomagnetism, magnetic material sciences and others.
Alshamali, Farida; Brandstätter, Anita; Zimmermann, Bettina; Parson, Walther
249 entire mtDNA control region sequences were generated and analyzed in a population sample from Dubai, one of the seven United Arab Emirates. The control region was amplified in one piece and sequenced with different sequencing primers. Sequence evaluation was performed twice and validated by a third senior mtDNA scientist. Phylogenetic analyses were used for quality assurance purposes and for the determination of the haplogroup affiliation of the samples. Upon publication, the population data are going to be available in the EMPOP database (www.empop.org).
National Land Cover Database 2011 (NLCD 2011) is the most recent national land cover product created by the Multi-Resolution Land Characteristics (MRLC) Consortium. NLCD 2011 provides - for the first time - the capability to assess wall-to-wall, spatially explicit, national land cover changes and trends across the United States from 2001 to 2011. As with two previous NLCD land cover products, NLCD 2011 keeps the same 16-class land cover classification scheme that has been applied consistently across the United States at a spatial resolution of 30 meters. NLCD 2011 is based primarily on a decision-tree classification of circa 2011 Landsat satellite data. This dataset is associated with the following publication: Homer, C., J. Dewitz, L. Yang, S. Jin, P. Danielson, G. Xian, J. Coulston, N. Herold, J. Wickham, and K. Megown. Completion of the 2011 National Land Cover Database for the Conterminous United States – Representing a Decade of Land Cover Change Information. PHOTOGRAMMETRIC ENGINEERING AND REMOTE SENSING. American Society for Photogrammetry and Remote Sensing, Bethesda, MD, USA, 81(0): 345-354, (2015).
Eidietis, N. W.; Gerhardt, S. P.; Granetz, R. S.; Kawano, Y.; Lehnen, M.; Lister, J. B.; Pautasso, G.; Riccardo, V.; Tanna, R. L.; Thornton, A. J.; ITPA Disruption Database Participants, The
A multi-device database of disruption characteristics has been developed under the auspices of the International Tokamak Physics Activity magneto-hydrodynamics topical group. The purpose of this ITPA disruption database (IDDB) is to find the commonalities between the disruption and disruption mitigation characteristics in a wide variety of tokamaks in order to elucidate the physics underlying tokamak disruptions and to extrapolate toward much larger devices, such as ITER and future burning plasma devices. In contrast to previous smaller disruption data collation efforts, the IDDB aims to provide significant context for each shot provided, allowing exploration of a wide array of relationships between pre-disruption and disruption parameters. The IDDB presently includes contributions from nine tokamaks, including both conventional aspect ratio and spherical tokamaks. An initial parametric analysis of the available data is presented. This analysis includes current quench rates, halo current fraction and peaking, and the effectiveness of massive impurity injection. The IDDB is publicly available, with instruction for access provided herein.
Thodima, Venkata; Pirooznia, Mehdi; Deng, Youping
Background Catalytic RNA molecules are called ribozymes. The aptamers are DNA or RNA molecules that have been selected from vast populations of random sequences, through a combinatorial approach known as SELEX. The selected oligo-nucleotide sequences (~200 bp in length) have the ability to recognize a broad range of specific ligands by forming binding pockets. These novel aptamer sequences can bind to nucleic acids, proteins or small organic and inorganic chemical compounds and have many potential uses in medicine and technology. Results The comprehensive sequence information on aptamers and ribozymes that have been generated by in vitro selection methods are included in this RiboaptDB database. Such types of unnatural data generated by in vitro methods are not available in the public 'natural' sequence databases such as GenBank and EMBL. The amount of sequence data generated by in vitro selection experiments has been accumulating exponentially. There are 370 artificial ribozyme sequences and 3842 aptamer sequences in the total 4212 sequences from 423 citations in this RiboaptDB. We included general search feature, and individual feature wise search, user submission form for new data through online and also local BLAST search. Conclusion This database, besides serving as a storehouse of sequences that may have diagnostic or therapeutic utility in medicine, provides valuable information for computational and theoretical biologists. The RiboaptDB is extremely useful for garnering information about in vitro selection experiments as a whole and for better understanding the distribution of functional nucleic acids in sequence space. The database is updated regularly and is publicly available at . PMID:17118149
Šubelj, Lovro; Bajec, Marko; Mileva Boshkoska, Biljana; Kastrin, Andrej; Levnajić, Zoran
Science is a social process with far-reaching impact on our modern society. In recent years, for the first time we are able to scientifically study the science itself. This is enabled by massive amounts of data on scientific publications that is increasingly becoming available. The data is contained in several databases such as Web of Science or PubMed, maintained by various public and private entities. Unfortunately, these databases are not always consistent, which considerably hinders this study. Relying on the powerful framework of complex networks, we conduct a systematic analysis of the consistency among six major scientific databases. We found that identifying a single "best" database is far from easy. Nevertheless, our results indicate appreciable differences in mutual consistency of different databases, which we interpret as recipes for future bibliometric studies. PMID:25984946
Gibney, Gretchen; Baxevanis, Andreas D
One of the most widely used interfaces for the retrieval of information from biological databases is the NCBI Entrez system. Entrez capitalizes on the fact that there are pre-existing, logical relationships between the individual entries found in numerous public databases. The existence of such natural connections, mostly biological in nature, argued for the development of a method through which all the information about a particular biological entity could be found without having to sequentially visit and query disparate databases. Two basic protocols describe simple, text-based searches, illustrating the types of information that can be retrieved through the Entrez system. An alternate protocol builds upon the first basic protocol, using additional, built-in features of the Entrez system, and providing alternative ways to issue the initial query. The support protocol reviews how to save frequently issued queries. Finally, Cn3D, a structure visualization tool, is also discussed.
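A text-based Entrez search of the kind these protocols describe can also be expressed as a URL. The Python sketch below builds a query URL for NCBI's E-utilities `esearch` endpoint, which is a programmatic interface used here as a stand-in for the web interface covered by the protocols; the helper name `esearch_url` and the example query are illustrative assumptions, and the code only constructs the URL without contacting NCBI.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities; esearch.fcgi runs a text query
# against one Entrez database and returns matching record identifiers.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def esearch_url(db, term, retmax=20):
    """Return an esearch URL for the given Entrez database and query term."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"

# Example: a field-restricted query, written as in the Entrez search box.
url = esearch_url("protein", "Giardia lamblia[Organism]", retmax=5)
print(url)
```

Building the query string with `urlencode` keeps field tags such as `[Organism]` correctly percent-encoded, which is easy to get wrong when concatenating strings by hand.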
Baxevanis, Andreas D
One of the most widely used interfaces for the retrieval of information from biological databases is the NCBI Entrez system. Entrez capitalizes on the fact that there are pre-existing, logical relationships between the individual entries found in numerous public databases. The existence of such natural connections, mostly biological in nature, argued for the development of a method through which all the information about a particular biological entity could be found without having to sequentially visit and query disparate databases. Two Basic Protocols describe simple, text-based searches, illustrating the types of information that can be retrieved through the Entrez system. An Alternate Protocol builds upon the first Basic Protocol, using additional, built-in features of the Entrez system, and providing alternative ways to issue the initial query. The Support Protocol reviews how to save frequently issued queries. Finally, Cn3D, a structure visualization tool, is also discussed.
The National Residential Efficiency Measures Database is a publicly available, centralized resource of residential building retrofit measures and costs for the U.S. building industry. With support from the U.S. Department of Energy, NREL developed this tool to help users determine the most cost-effective retrofit measures for improving energy efficiency of existing homes. Software developers who require residential retrofit performance and cost data for applications that evaluate residential efficiency measures are the primary audience for this database. In addition, home performance contractors and manufacturers of residential materials and equipment may find this information useful. The database offers the following types of retrofit measures: 1) Appliances, 2) Domestic Hot Water, 3) Enclosure, 4) Heating, Ventilating, and Air Conditioning (HVAC), 5) Lighting, 6) Miscellaneous.
Olson, Lars E.
"Reflective Database Access Control" (RDBAC) is a model in which a database privilege is expressed as a database query itself, rather than as a static privilege contained in an access control list. RDBAC aids the management of database access controls by improving the expressiveness of policies. However, such policies introduce new interactions…
This paper offers an elementary description of database characteristics and then provides a survey of databases that may be useful to the teacher and researcher in Slavic and East European languages and literatures. The survey focuses on commercial databases that are available, usable, and needed. Individual databases discussed include:…
Arko, R. A.; Chayes, D. N.
The rapidly increasing volume and complexity of MG&G data, and the growing demand from funding agencies and the user community that it be easily accessible, demand that we improve our approach to data management in order to reach a broader user-base and operate more efficiently and effectively. We have chosen an approach based on industry-standard relational database management systems (RDBMS) that use community-wide data specifications, where there is a clear and well-documented external interface that allows use of general purpose as well as customized clients. Rapid prototypes assembled with this approach show significant advantages over the traditional, custom-built data management systems that often use "in-house" legacy file formats, data specifications, and access tools. We have developed an effective database prototype based on a public domain RDBMS (PostgreSQL) and metadata standard (FGDC), and used it as a template for several ongoing MG&G database management projects - including ADGRAV (Antarctic Digital Gravity Synthesis), MARGINS, the Community Review system of the Digital Library for Earth Science Education, multibeam swath bathymetry metadata, and the R/V Maurice Ewing onboard acquisition system. By using standard formats and specifications, and working from a common prototype, we are able to reuse code and deploy rapidly. Rather than spend time on low-level details such as storage and indexing (which are built into the RDBMS), we can focus on high-level details such as documentation and quality control. In addition, because many commercial off-the-shelf (COTS) and public domain data browsers and visualization tools have built-in RDBMS support, we can focus on backend development and leave the choice of a frontend client(s) up to the end user. While our prototype is running under an open source RDBMS on a single processor host, the choice of standard components allows this implementation to scale to commercial RDBMS products and multiprocessor servers as
SRD 131 Human Mitochondrial Protein Database (Web, free access) The Human Mitochondrial Protein Database (HMPDb) provides comprehensive data on mitochondrial and human nuclear encoded proteins involved in mitochondrial biogenesis and function. This database consolidates information from SwissProt, LocusLink, Protein Data Bank (PDB), GenBank, Genome Database (GDB), Online Mendelian Inheritance in Man (OMIM), Human Mitochondrial Genome Database (mtDB), MITOMAP, Neuromuscular Disease Center and Human 2-D PAGE Databases. This database is intended as a tool not only to aid in studying the mitochondrion but in studying the associated diseases.
Caivano, Jose L.
The paper describes the methodology and results of a project under development, aimed at the elaboration of an interactive bibliographical database on color in all fields of application: philosophy, psychology, semiotics, education, anthropology, physical and natural sciences, biology, medicine, technology, industry, architecture and design, arts, linguistics, geography, history. The project is initially based upon an already developed bibliography, published in different journals, updated on various occasions, and now available on the Internet, with more than 2,000 entries. The interactive database will amplify that bibliography, incorporating hyperlinks and contents (indexes, abstracts, keywords, introductions, or eventually the complete document), and devising mechanisms for information retrieval. The sources to be included are: books, doctoral dissertations, multimedia publications, reference works. The main arrangement will be chronological, but the design of the database will allow rearrangements or selections by different fields: subject, Decimal Classification System, author, language, country, publisher, etc. A further project is to develop another database, including color-specialized journals or newsletters, and articles on color published in international journals, arranged in this case by journal name and date of publication, but allowing also rearrangements or selections by author, subject and keywords.
Robinson, James; Mistry, Kavita; McWilliam, Hamish; Lopez, Rodrigo; Marsh, Steven G E
The Immuno Polymorphism Database (IPD) (http://www.ebi.ac.uk/ipd/) is a set of specialist databases related to the study of polymorphic genes in the immune system. The IPD project works with specialist groups or nomenclature committees who provide and curate individual sections before they are submitted to IPD for online publication. The IPD project stores all the data in a set of related databases. IPD currently consists of four databases: IPD-KIR, which contains the allelic sequences of Killer-cell Immunoglobulin-like Receptors; IPD-MHC, a database of sequences of the Major Histocompatibility Complex of different species; IPD-human platelet antigens, covering alloantigens expressed only on platelets; and IPD-ESTDAB, which provides access to the European Searchable Tumour cell-line database, a cell bank of immunologically characterised melanoma cell lines. The data is currently available online from the website and ftp directory.
Robinson, James; Halliwell, Jason A; McWilliam, Hamish; Lopez, Rodrigo; Marsh, Steven G E
The Immuno Polymorphism Database (IPD), http://www.ebi.ac.uk/ipd/, is a set of specialist databases related to the study of polymorphic genes in the immune system. The IPD project works with specialist groups or nomenclature committees who provide and curate individual sections before they are submitted to IPD for online publication. The IPD project stores all the data in a set of related databases. IPD currently consists of four databases: IPD-KIR, which contains the allelic sequences of killer-cell immunoglobulin-like receptors; IPD-MHC, a database of sequences of the major histocompatibility complex of different species; IPD-HPA, covering alloantigens expressed only on platelets; and IPD-ESTDAB, which provides access to the European Searchable Tumour Cell-Line Database, a cell bank of immunologically characterized melanoma cell lines. The data is currently available online from the website and FTP directory. This article describes the latest updates and additional tools added to the IPD project.
Robinson, James; Mistry, Kavita; McWilliam, Hamish; Lopez, Rodrigo; Marsh, Steven G. E.
The Immuno Polymorphism Database (IPD) (http://www.ebi.ac.uk/ipd/) is a set of specialist databases related to the study of polymorphic genes in the immune system. The IPD project works with specialist groups or nomenclature committees who provide and curate individual sections before they are submitted to IPD for online publication. The IPD project stores all the data in a set of related databases. IPD currently consists of four databases: IPD-KIR, which contains the allelic sequences of Killer-cell Immunoglobulin-like Receptors; IPD-MHC, a database of sequences of the Major Histocompatibility Complex of different species; IPD-human platelet antigens, covering alloantigens expressed only on platelets; and IPD-ESTDAB, which provides access to the European Searchable Tumour cell-line database, a cell bank of immunologically characterised melanoma cell lines. The data is currently available online from the website and ftp directory. PMID:19875415
Park, Yu Rang; Kim, Jae-Jung; Yoon, Young Jo; Yoon, Young-Kwang; Koo, Ha Yeong; Hong, Young Mi; Jang, Gi Young; Shin, Soo-Yong; Lee, Jong-Keuk
Kawasaki disease (KD) is a rare disease that occurs predominantly in infants and young children. To identify KD susceptibility genes and to develop a diagnostic test, a specific therapy, or prevention method, collecting KD patients’ clinical and genomic data is one of the major issues. For this purpose, the Kawasaki Disease Database (KDD) was developed based on the efforts of the Korean Kawasaki Disease Genetics Consortium (KKDGC). KDD is a collection of 1292 clinical data and genomic samples of 1283 patients from 13 KKDGC-participating hospitals. Each sample contains the relevant clinical data, genomic DNA and plasma samples isolated from patients’ blood, omics data and KD-associated genotype data. Clinical data was collected and saved using the common data elements based on the ISO/IEC 11179 metadata standard. Data from two genome-wide association studies totalling 482 samples and whole exome sequencing data of 12 samples were also collected. In addition, KDD includes the rare cases of KD (16 cases with family history, 46 cases with recurrence, 119 cases with intravenous immunoglobulin non-responsiveness, and 52 cases with coronary artery aneurysm). As the first public database for KD, KDD can significantly facilitate KD studies. All data in KDD are searchable and downloadable. KDD was implemented in PHP, MySQL and Apache, with all major browsers supported. Database URL: http://www.kawasakidisease.kr PMID:27630202
The purpose of this web-accessible database is for the public to be able to view instantaneous readings from a solar-powered air monitoring station located in a public location (prototype pilot test is outside of a library in Durham County, NC). The data are wirelessly transmitte...
Use of online bibliographic databases in Mexico is provided through Servicio de Consulta a Bancos de Informacion, a public service that provides information retrieval, document delivery, translation, technical support, and training services. Technical infrastructure is based on a public packet-switching network and institutional users may receive…
Malin, B.; Sweeney, L.
This work demonstrates how seemingly anonymous DNA database entries can be related to publicly available health information to uniquely and specifically identify the persons who are the subjects of the information even though the DNA information contains no accompanying explicit identifiers such as name, address, or Social Security number and contains no additional fields of personal information. The software program, REID (Re-Identification of DNA), iteratively uncovers unique occurrences in visit-disease patterns across data collections that reveal inferences about the identities of the patients who are the subject of the DNA. Using real-world data, REID established identifiable linkages in 33-100% of the 10,886 cases explicitly surveyed over 8 gene-based diseases. PMID:11825223
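To make the linkage idea concrete, here is a deliberately simplified sketch. It is not the published REID algorithm: it assumes each DNA record and each identified health record carries the set of hospitals visited, and it links a DNA record to a person only when that exact visit pattern occurs once on each side. All names and data are hypothetical.

```python
from collections import defaultdict


def link_unique_trails(dna_trails, identified_trails):
    """Link DNA records to identities whose hospital-visit pattern is unique.

    Both arguments map a record key to the set of hospitals visited.
    Simplified illustration only; patterns must match exactly.
    """
    names_by_pattern = defaultdict(list)
    for name, hospitals in identified_trails.items():
        names_by_pattern[frozenset(hospitals)].append(name)

    records_by_pattern = defaultdict(list)
    for record, hospitals in dna_trails.items():
        records_by_pattern[frozenset(hospitals)].append(record)

    links = {}
    for pattern, records in records_by_pattern.items():
        names = names_by_pattern.get(pattern, [])
        # Link only when the pattern is unambiguous in both collections.
        if len(records) == 1 and len(names) == 1:
            links[records[0]] = names[0]
    return links


# Hypothetical toy data: hospital codes H1..H3, made-up record keys.
dna = {"dna1": {"H1", "H2"}, "dna2": {"H1"}, "dna3": {"H3"}}
identified = {"alice": {"H1", "H2"}, "bob": {"H1"}, "carol": {"H1"}}
links = link_unique_trails(dna, identified)
print(links)
```

In this toy data only one record is linked: the pattern shared by two identified people stays ambiguous, and the pattern with no identified counterpart stays unmatched.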
Neves, Susana S; Forrest, Laura L
DNA sequences are important sources of data for phylogenetic analysis. Nowadays, DNA sequencing is a routine technique in molecular biology laboratories. However, there are specific questions associated with project design and sequencing of plant samples for phylogenetic analysis, which may not be familiar to researchers starting in the field. This chapter gives an overview of methods and protocols involved in the sequencing of plant samples, including general recommendations on the selection of species/taxa and DNA regions to be sequenced, and field collection of plant samples. Protocols of plant sample preparation, DNA extraction, PCR and cloning, which are critical to the success of molecular phylogenetic projects, are described in detail. Common problems of sequencing (using the Sanger method) are also addressed. Possible applications of second-generation sequencing techniques in plant phylogenetics are briefly discussed. Finally, orientation on the preparation of sequence data for phylogenetic analyses and submission to public databases is also given.
Shukla, Ankita; Singh, Tiratha Raj
DNA repair mechanisms act as a warrior combating various damaging processes that lead to critical malignancies. DREMECELS was designed considering the malignancies with frequent alterations in DNA repair pathways, that is, colorectal and endometrial cancers, associated with Lynch syndrome (also known as HNPCC). Since Lynch syndrome carries a high risk (~40–60%) for both cancers, we decided to cover all three diseases in this portal. Although a large population is presently affected by these malignancies and many resources are available for various cancer types, no database archives information on the genes specifically for these cancers and disorders. The database contains 156 genes and two repair mechanisms, base excision repair (BER) and mismatch repair (MMR). Other parameters include some of the regulatory processes that have roles in the progression of these diseases due to incompetent repair mechanisms, specifically BER and MMR. Our database mainly provides qualitative and quantitative information on these cancer types along with methylation, drug sensitivity, miRNAs, copy number variation (CNV) and somatic mutations data. This database would serve the scientific community by providing integrated information on these disease types, thus sustaining diagnostic and therapeutic processes. This repository would serve as an excellent accompaniment for researchers and biomedical professionals and facilitate the understanding of such critical diseases. DREMECELS is publicly available at http://www.bioinfoindia.org/dremecels. PMID:27276067
Justie, Kevin M.
The WEBrary(R) databases at the Morton Grove Public Library (Illinois) provide patron-accessible searchable databases, easily available over the library's Web site. Database offerings include the locally maintained Song Collection Index, Obituary Index, Continuations Listings, On-Order files, topical and personalized New Acquisitions files, and…
... 42 Public Health 4 2014-10-01 2014-10-01 false Federal database checks. 455.436 Section 455.436....436 Federal database checks. The State Medicaid agency must do all of the following: (a) Confirm the... databases. (b) Check the Social Security Administration's Death Master File, the National Plan and...
... 42 Public Health 4 2012-10-01 2012-10-01 false Federal database checks. 455.436 Section 455.436....436 Federal database checks. The State Medicaid agency must do all of the following: (a) Confirm the... databases. (b) Check the Social Security Administration's Death Master File, the National Plan and...
... 42 Public Health 4 2013-10-01 2013-10-01 false Federal database checks. 455.436 Section 455.436....436 Federal database checks. The State Medicaid agency must do all of the following: (a) Confirm the... databases. (b) Check the Social Security Administration's Death Master File, the National Plan and...
Rosenbloom, Kate R.; Armstrong, Joel; Barber, Galt P.; Casper, Jonathan; Clawson, Hiram; Diekhans, Mark; Dreszer, Timothy R.; Fujita, Pauline A.; Guruvadoo, Luvina; Haeussler, Maximilian; Harte, Rachel A.; Heitner, Steve; Hickey, Glenn; Hinrichs, Angie S.; Hubley, Robert; Karolchik, Donna; Learned, Katrina; Lee, Brian T.; Li, Chin H.; Miga, Karen H.; Nguyen, Ngan; Paten, Benedict; Raney, Brian J.; Smit, Arian F. A.; Speir, Matthew L.; Zweig, Ann S.; Haussler, David; Kuhn, Robert M.; Kent, W. James
Launched in 2001 to showcase the draft human genome assembly, the UCSC Genome Browser database (http://genome.ucsc.edu) and associated tools continue to grow, providing a comprehensive resource of genome assemblies and annotations to scientists and students worldwide. Highlights of the past year include the release of a browser for the first new human genome reference assembly in 4 years in December 2013 (GRCh38, UCSC hg38), a watershed comparative genomics annotation (100-species multiple alignment and conservation) and a novel distribution mechanism for the browser (GBiB: Genome Browser in a Box). We created browsers for new species (Chinese hamster, elephant shark, minke whale), ‘mined the web’ for DNA sequences and expanded the browser display with stacked color graphs and region highlighting. As our user community increasingly adopts the UCSC track hub and assembly hub representations for sharing large-scale genomic annotation data sets and genome sequencing projects, our menu of public data hubs has tripled. PMID:25428374
EPA has developed a physiological information database (created using Microsoft ACCESS) intended to be used in PBPK modeling. The database contains physiological parameter values for humans from early childhood through senescence as well as similar data for laboratory animal spec...
The database provides chemical-specific toxicity information for aquatic life, terrestrial plants, and terrestrial wildlife. ECOTOX is a comprehensive ecotoxicology database and is therefore essential for providing and supporting high quality models needed to estimate population...
Jewison, Timothy; Knox, Craig; Neveu, Vanessa; Djoumbou, Yannick; Guo, An Chi; Lee, Jacqueline; Liu, Philip; Mandal, Rupasri; Krishnamurthy, Ram; Sinelnikov, Igor; Wilson, Michael; Wishart, David S.
The Yeast Metabolome Database (YMDB, http://www.ymdb.ca) is a richly annotated ‘metabolomic’ database containing detailed information about the metabolome of Saccharomyces cerevisiae. Modeled closely after the Human Metabolome Database, the YMDB contains >2000 metabolites with links to 995 different genes/proteins, including enzymes and transporters. The information in YMDB has been gathered from hundreds of books, journal articles and electronic databases. In addition to its comprehensive literature-derived data, the YMDB also contains an extensive collection of experimental intracellular and extracellular metabolite concentration data compiled from detailed Mass Spectrometry (MS) and Nuclear Magnetic Resonance (NMR) metabolomic analyses performed in our lab. This is further supplemented with thousands of NMR and MS spectra collected on pure, reference yeast metabolites. Each metabolite entry in the YMDB contains an average of 80 separate data fields including comprehensive compound description, names and synonyms, structural information, physico-chemical data, reference NMR and MS spectra, intracellular/extracellular concentrations, growth conditions and substrates, pathway information, enzyme data, gene/protein sequence data, as well as numerous hyperlinks to images, references and other public databases. Extensive searching, relational querying and data browsing tools are also provided that support text, chemical structure, spectral, molecular weight and gene/protein sequence queries. Because of S. cerevisiae's importance as a model organism for biologists and as a biofactory for industry, we believe this kind of database could have considerable appeal not only to metabolomics researchers, but also to yeast biologists, systems biologists, the industrial fermentation industry, as well as the beer, wine and spirit industry. PMID:22064855
Many primary biological databases are dedicated to providing annotation for a specific type of biological molecule such as a clone, transcript, gene or protein, but often with limited cross-references. Therefore, enhanced mapping is required between these databases to facilitate the correlation of independent experimental datasets. For example, molecular biology experiments conducted on samples (DNA, mRNA or protein) often yield more than one type of 'omics' dataset as an object for analysis (e.g., a sample can have a genomics as well as a proteomics expression dataset available for analysis). Thus, in order to map the two datasets, the identifier type from one dataset must be linked to that of the other, thereby preventing loss of critical information in downstream analysis. This identifier mapping can be performed using identifier converter software relevant to the query and target identifier databases. This review presents the publicly available web-based biological database identifier converters, with comparison of their usage, input and output formats, and the types of available query and target database identifier types. PMID:22155608
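The core operation of an identifier converter can be sketched as a lookup from query identifiers to target identifiers, with unmapped identifiers reported instead of silently dropped. The example below is a minimal sketch under that assumption; the function name, the mapping table, and all identifiers are invented for illustration and do not come from any real database or from the tools covered in the review.

```python
def convert_ids(query_ids, mapping):
    """Map query identifiers to target identifiers.

    Returns (converted, unmapped): a dict from each mapped query ID to a sorted
    list of target IDs, plus a list of query IDs with no known mapping.
    """
    converted, unmapped = {}, []
    for qid in query_ids:
        targets = mapping.get(qid)
        if targets:
            converted[qid] = sorted(targets)
        else:
            # Keep track of losses so downstream analysis can account for them.
            unmapped.append(qid)
    return converted, unmapped


# Hypothetical gene-to-protein mapping table (one-to-many is allowed).
gene_to_protein = {
    "GENE_A": {"PROT_2", "PROT_1"},
    "GENE_B": {"PROT_3"},
}
converted, unmapped = convert_ids(["GENE_A", "GENE_B", "GENE_C"], gene_to_protein)
print(converted)
print(unmapped)
```

Returning the unmapped identifiers explicitly reflects the concern raised above about losing information when two datasets are linked.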
Burnham, Judy F
The Scopus database provides access to STM journal articles and the references included in those articles, allowing the searcher to search both forward and backward in time. The database can be used for collection development as well as for research. This review provides information on the key points of the database and compares it to Web of Science. Neither database is inclusive, but the two complement each other. If a library can only afford one, the choice must be based on institutional needs.
Hayase, Shuichi; Okano, Keiko
The Japan Information Center of Science and Technology (JICST) started the online service of the JICST Crystal Structure Database (JICST CR) in January 1990. This database provides information on atomic positions in a crystal and related information about the crystal. The database system and the crystal data in JICST CR are outlined in this manuscript.
Alonso, Antonio; Martin, Pablo; Albarrán, Cristina; Garcia, Pilar; Fernandez de Simon, Lourdes; Jesús Iturralde, Maria; Fernández-Rodriguez, Amparo; Atienza, Inmaculada; Capilla, Javier; García-Hirschfeld, Julia; Martinez, Pilar; Vallejo, Gloria; García, Oscar; García, Emilio; Real, Pilar; Alvarez, David; León, Antonio; Sancho, Manuel
In cases of mass disaster, there is often a need for managing, analyzing, and comparing large numbers of biological samples and DNA profiles. This requires the use of laboratory information management systems for large-scale sample logging and tracking, coupled with bioinformatic tools for DNA database searching according to different matching algorithms, and for the evaluation of the significance of each match by likelihood ratio calculations. There are many different interrelated factors and circumstances involved in each specific mass disaster scenario that may challenge the final DNA identification goal, such as: the number of victims, the mechanisms of body destruction, the extent of body fragmentation, the rate of DNA degradation, the body accessibility for sample collection, or the type of DNA reference samples available. In this paper, we examine the different steps of the DNA identification analysis (DNA sampling, DNA analysis and technology, DNA database searching, and concordance and kinship analysis), reviewing the "lessons learned" and the scientific progress made in some mass disaster cases described in the scientific literature. We will put special emphasis on the valuable scientific feedback that the forensic genetics community has received from the collaborative efforts of several public and private USA forensic laboratories in assisting with the more critical areas of the World Trade Center (WTC) mass fatality of September 11, 2001. The main challenges in identifying the victims of the recent South Asian Tsunami disaster, which has produced the steepest death count rise in history, will also be considered. We also present data from two recent mass fatality cases that involved Spanish victims: the Madrid terrorist attack of March 11, 2004, and the Yakovlev-42 aircraft accident in Trabzon, Turkey, of May 26, 2003.
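The likelihood ratio mentioned above can be illustrated for the simplest case, a direct match between a victim sample and a personal reference sample. The sketch assumes Hardy-Weinberg proportions, independence between loci, and no allele drop-out or relatedness, so the ratio reduces to the reciprocal of the random match probability. Kinship comparisons require different formulas and are not covered. The locus names and allele frequencies are hypothetical.

```python
from math import prod


def genotype_frequency(a, b, freqs):
    """Expected genotype frequency under Hardy-Weinberg: p^2 or 2pq."""
    p, q = freqs[a], freqs[b]
    return p * p if a == b else 2 * p * q


def direct_match_lr(profile, allele_freqs):
    """Likelihood ratio for a full direct match: 1 / random match probability.

    profile maps locus -> (allele1, allele2);
    allele_freqs maps locus -> {allele: population frequency}.
    """
    rmp = prod(
        genotype_frequency(a, b, allele_freqs[locus])
        for locus, (a, b) in profile.items()
    )
    return 1.0 / rmp


# Hypothetical two-locus profile and allele frequencies.
profile = {"LOCUS_1": (12, 14), "LOCUS_2": (9, 9)}
allele_freqs = {
    "LOCUS_1": {12: 0.1, 14: 0.2},
    "LOCUS_2": {9: 0.5},
}
lr = direct_match_lr(profile, allele_freqs)
print(round(lr, 6))
```

With these made-up numbers the genotype frequencies are 0.04 and 0.25, so the matching profile is about 100 times more probable if the samples share a source than if they come from unrelated individuals.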
Nakamura, Yasukazu; Cochrane, Guy; Karsch-Mizrachi, Ilene
The International Nucleotide Sequence Database Collaboration (INSDC; http://www.insdc.org), one of the longest-standing global alliances of biological data archives, captures, preserves and provides comprehensive public domain nucleotide sequence information. Three partners of the INSDC work in cooperation to establish formats for data and metadata and protocols that facilitate reliable data submission to their databases and support continual data exchange around the world. In this article, the current status of the INSDC and updates for 2012 are presented. Among the items discussed at the 2012 international collaboration meeting, the BioSample database and changes in submission are described.
Kryukov, Kirill; Imanishi, Tadashi
Contamination in genome assembly can lead to wrong or confusing results when using such a genome as a reference in sequence comparison. Although bacterial contamination is well known, the problem of human-originated contamination has received little attention. In this study we surveyed 45,735 available genome assemblies for evidence of human contamination. We used lineage specificity to distinguish between contamination and conservation. We found that 154 genome assemblies contain fragments that with high confidence originate as contamination from human DNA. The majority of contaminating human sequences were present in the reference human genome assembly for over a decade. We recommend that existing contaminated genomes should be revised to remove contaminated sequence, and that new assemblies should be thoroughly checked for the presence of human DNA before submitting them to public databases. PMID:27611326
The NCBI Taxonomy database (http://www.ncbi.nlm.nih.gov/taxonomy) is the standard nomenclature and classification repository for the International Nucleotide Sequence Database Collaboration (INSDC), comprising the GenBank, ENA (EMBL) and DDBJ databases. It includes organism names and taxonomic lineages for each of the sequences represented in the INSDC's nucleotide and protein sequence databases. The taxonomy database is manually curated by a small group of scientists at the NCBI who use the current taxonomic literature to maintain a phylogenetic taxonomy for the source organisms represented in the sequence databases. The taxonomy database is a central organizing hub for many of the resources at the NCBI, and provides a means for clustering elements within other domains of the NCBI web site, for internal linking between domains of the Entrez system and for linking out to taxon-specific external resources on the web. Our primary purpose is to index the domain of sequences as conveniently as possible for our user community.
Discusses highlights in the development of genetic engineering, examining techniques with recombinant DNA, legal and ethical issues, GenBank (a national database of nucleic acid sequences), and other topics. (JN)
Geary, Janis; Camicioli, Emma; Bubela, Tania
Paul Hebert and colleagues first described DNA barcoding in 2003, which led to international efforts to promote and coordinate its use. Since its inception, DNA barcoding has generated considerable media coverage. We analysed whether this coverage reflected both the scientific and social mandates of international barcoding organizations. We searched newspaper databases to identify 900 English-language articles from 2003 to 2013. Coverage of the science of DNA barcoding was highly positive but lacked context for key topics. Coverage omissions pose challenges for public understanding of the science and applications of DNA barcoding; these included coverage of governance structures and issues related to the sharing of genetic resources across national borders. Our analysis provided insight into how barcoding communication efforts have translated into media coverage; more targeted communication efforts may focus media attention on previously omitted, but important topics. Our analysis is timely as the DNA barcoding community works to establish the International Society for the Barcode of Life.
Joly, Simon; Davies, T Jonathan; Archambault, Annie; Bruneau, Anne; Derry, Alison; Kembel, Steven W; Peres-Neto, Pedro; Vamosi, Jana; Wheeler, Terry A
Ten years after DNA barcoding was initially suggested as a tool to identify species, millions of barcode sequences from more than 1100 species are available in public databases. While several studies have reviewed the methods and potential applications of DNA barcoding, most have focused on species identification and discovery, and relatively few have addressed applications of DNA barcoding data to ecology. These data, and the associated information on the evolutionary histories of taxa that they can provide, offer great opportunities for ecologists to investigate questions that were previously difficult or impossible to address. We present an overview of potential uses of DNA barcoding relevant in the age of ecoinformatics, including applications in community ecology, species invasion, macroevolution, trait evolution, food webs and trophic interactions, metacommunities, and spatial ecology. We also outline some of the challenges and potential advances in DNA barcoding that lie ahead.
Homer, Collin H.; Fry, Joyce A.; Barnes, Christopher A.
The National Land Cover Database (NLCD) serves as the definitive Landsat-based, 30-meter resolution, land cover database for the Nation. NLCD provides spatial reference and descriptive data for characteristics of the land surface such as thematic class (for example, urban, agriculture, and forest), percent impervious surface, and percent tree canopy cover. NLCD supports a wide variety of Federal, State, local, and nongovernmental applications that seek to assess ecosystem status and health, understand the spatial patterns of biodiversity, predict effects of climate change, and develop land management policy. NLCD products are created by the Multi-Resolution Land Characteristics (MRLC) Consortium, a partnership of Federal agencies led by the U.S. Geological Survey. All NLCD data products are available for download at no charge to the public from the MRLC Web site: http://www.mrlc.gov.
Wang, Kun; Deng, Jiao; Damaris, Rebecca Njeri; Yang, Mei; Xu, Liming; Yang, Pingfang
Besides its importance in plant taxonomy and phylogeny, sacred lotus (Nelumbo nucifera Gaertn.) might also hold the key to the secrets of aging, which attracts growing attention from researchers all over the world. Genetic and molecular studies of this species depend on its genome information. In 2013, two publications reported the sequencing of its full genome, based on which we constructed a database named LOTUS-DB. It provides comprehensive information on the annotation, gene function and expression for the sacred lotus. The information enables users to efficiently query and browse genes, graphically visualize the genome and download a variety of data on genomic DNA, coding sequences (CDS), transcripts or peptide sequences, promoters and markers. It will accelerate research on gene cloning and functional identification in sacred lotus, and hence promote studies on this species and plant genomics as well. Database URL: http://lotus-db.wbgcas.cn. PMID:25819075
Enumerates principal management objectives of database management systems (data independence, quality, security, multiuser access, central control) and criteria for comparison (response time, size, flexibility, other features). Conventional database management systems, relational databases, and database machines used for backend processing are…
Kyrpides, Nikos; Liolios, Dinos; Chen, Amy; Tavernarakis, Nektarios; Hugenholtz, Philip; Markowitz, Victor; Bernal, Alex
Since its inception in 1997, GOLD has continuously monitored genome sequencing projects worldwide and has provided the community with a unique centralized resource that integrates diverse information related to Archaea, Bacteria, Eukaryotic and more recently Metagenomic sequencing projects. As of September 2007, GOLD recorded 639 completed genome projects. These projects have their complete sequence deposited into the public archival sequence databases such as GenBank, EMBL, and DDBJ. From the total of 639 complete and published genome projects as of 9/2007, 527 were bacterial, 47 were archaeal and 65 were eukaryotic. In addition to the complete projects, there were 2158 ongoing sequencing projects. 1328 of those were bacterial, 59 archaeal and 771 eukaryotic projects. Two types of metadata are provided by GOLD: (i) project metadata and (ii) organism/environment metadata. GOLD CARD pages for every project are available from the link of every GOLD_STAMP ID. The information in every one of these pages is organized into three tables: (a) Organism information, (b) Genome project information and (c) External links. [The Genomes On Line Database (GOLD) in 2007: Status of genomic and metagenomic projects and their associated metadata, Konstantinos Liolios, Konstantinos Mavromatis, Nektarios Tavernarakis and Nikos C. Kyrpides, Nucleic Acids Research Advance Access published online on November 2, 2007, Nucleic Acids Research, doi:10.1093/nar/gkm884]
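The project counts quoted in the GOLD entry above can be cross-checked with a few lines of arithmetic. The following is only an illustrative sketch: the dictionary names and the `shares` helper are hypothetical and are not part of GOLD or any GOLD interface; the numbers are the ones stated in the abstract.

```python
# Counts as stated in the GOLD abstract (September 2007).
completed = {"bacterial": 527, "archaeal": 47, "eukaryotic": 65}
ongoing = {"bacterial": 1328, "archaeal": 59, "eukaryotic": 771}


def shares(counts):
    """Return each category's share of the total, in percent (one decimal)."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


completed_total = sum(completed.values())
ongoing_total = sum(ongoing.values())

# The per-domain counts should add up to the totals quoted in the text.
print(completed_total)  # 639
print(ongoing_total)    # 2158
print(shares(completed))
```

Both sums agree with the totals given in the abstract (639 completed and 2158 ongoing projects).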
The basic tables in the GOLD database that can be browsed or searched include the following information:
Berry, Michael W.; Dongarra, Jack J.; Larose, Brian H.; ...
The process of gathering, archiving, and distributing computer benchmark data is a cumbersome task usually performed by computer users and vendors with little coordination. Most important, there is no publicly available central depository of performance data for all ranges of machines from personal computers to supercomputers. We present an Internet-accessible performance database server (PDS) that can be used to extract current benchmark data and literature. As an extension to the X-Windows-based user interface (Xnetlib) to the Netlib archival system, PDS provides an on-line catalog of public domain computer benchmarks such as the LINPACK benchmark, Perfect benchmarks, and the NAS parallel benchmarks. PDS does not reformat or present the benchmark data in any way that conflicts with the original methodology of any particular benchmark; it is thereby devoid of any subjective interpretations of machine performance. We believe that all branches (research laboratories, academia, and industry) of the general computing community can use this facility to archive performance metrics and make them readily available to the public. PDS can provide a more manageable approach to the development and support of a large dynamic database of published performance metrics.
Rushton, Paul J; Bokowiec, Marta T; Laudeman, Thomas W; Brannock, Jennifer F; Chen, Xianfeng; Timko, Michael P
Background Regulation of gene expression at the level of transcription is a major control point in many biological processes. Transcription factors (TFs) can activate and/or repress the transcriptional rate of target genes and vascular plant genomes devote approximately 7% of their coding capacity to TFs. Global analysis of TFs has only been performed for three complete higher plant genomes – Arabidopsis (Arabidopsis thaliana), poplar (Populus trichocarpa) and rice (Oryza sativa). Presently, no large-scale analysis of TFs has been made from a member of the Solanaceae, one of the most important families of vascular plants. To fill this void, we have analysed tobacco (Nicotiana tabacum) TFs using a dataset of 1,159,022 gene-space sequence reads (GSRs) obtained by methylation filtering of the tobacco genome. An analytical pipeline was developed to isolate TF sequences from the GSR data set. This involved multiple (typically 10–15) independent searches with different versions of the TF family-defining domain(s) (normally the DNA-binding domain) followed by assembly into contigs and verification. Our analysis revealed that tobacco contains a minimum of 2,513 TFs representing all of the 64 well-characterised plant TF families. The number of TFs in tobacco is higher than previously reported for Arabidopsis and rice. Results TOBFAC: the database of tobacco transcription factors, is an integrative database that provides a portal to sequence and phylogeny data for the identified TFs, together with a large quantity of other data concerning TFs in tobacco. The database contains an individual page dedicated to each of the 64 TF families. These contain background information, domain architecture via Pfam links, a list of all sequences and an assessment of the minimum number of TFs in this family in tobacco. Downloadable phylogenetic trees of the major families are provided along with detailed information on the bioinformatic pipeline that was used to find all family members
Robinson, James; Waller, Matthew J; Stoehr, Peter; Marsh, Steven G E
The Immuno Polymorphism Database (IPD) (http://www.ebi.ac.uk/ipd/) is a set of specialist databases related to the study of polymorphic genes in the immune system. IPD currently consists of four databases: IPD-KIR, contains the allelic sequences of Killer-cell Immunoglobulin-like Receptors; IPD-MHC, a database of sequences of the Major Histocompatibility Complex of different species; IPD-HPA, alloantigens expressed only on platelets; and IPD-ESTAB, which provides access to the European Searchable Tumour Cell-Line Database, a cell bank of immunologically characterized melanoma cell lines. The IPD project works with specialist groups or nomenclature committees who provide and curate individual sections before they are submitted to IPD for online publication. The IPD project stores all the data in a set of related databases. Those sections with similar data, such as IPD-KIR and IPD-MHC share the same database structure. The sharing of a common database structure makes it easier to implement common tools for data submission and retrieval. The data are currently available online from the website and ftp directory; files will also be made available in different formats to download from the website and ftp server. The data will also be included in SRS, BLAST and FASTA search engines at the European Bioinformatics Institute.
Eldredge, Jonathan D; Waitzkin, Howard; Buchanan, Holly S; Teal, Janis; Iriart, Celia; Wiley, Kevin; Tregear, Jonathan
Background Public health practitioners and researchers for many years have been attempting to understand more clearly the links between social conditions and the health of populations. Until recently, most public health professionals in English-speaking countries were unaware that their colleagues in Latin America had developed an entire field of inquiry and practice devoted to making these links more clearly understood. The Latin American Social Medicine (LASM) database finally bridges this previous gap. Description This public health informatics case study describes the key features of a unique information resource intended to improve access to LASM literature and to augment understanding about the social determinants of health. This case study includes both quantitative and qualitative evaluation data. Currently the LASM database at The University of New Mexico brings important information, originally known mostly within professional networks located in Latin American countries to public health professionals worldwide via the Internet. The LASM database uses Spanish, Portuguese, and English language trilingual, structured abstracts to summarize classic and contemporary works. Conclusion This database provides helpful information for public health professionals on the social determinants of health and expands access to LASM. PMID:15627401
... REGULATIONS PUBLICLY AVAILABLE CONSUMER PRODUCT SAFETY INFORMATION DATABASE (Eff. Jan. 10, 2011) Procedural..., the Commission will publish reports of harm that meet the requirements for publication in the Database...(d) in the Database beyond the 10-business-day time frame set forth in paragraph (a) of this...
Bell, Karen L.; Loeffler, Virginia M.; Brosi, Berry J.
Premise of the study: DNA metabarcoding has broad-ranging applications in ecology, aerobiology, biosecurity, and forensics. A bioinformatics pipeline has recently been published for identification using a comprehensive database of ITS2, one of the common plant DNA barcoding markers. There is, however, no corresponding database for rbcL, the other primary marker used in plants. Methods: Using publicly available data, we compiled a reference library of rbcL sequences and trained databases for use with UTAX and RDP classifier algorithms. We used this reference library, along with the existing bioinformatics pipeline and ITS2 reference library, to identify species in an artificial mixture of nine species of pollen. We have made this database publicly available in multiple formats, to allow use with multiple bioinformatics pipelines, now and in the future. Results: Using the rbcL database, in addition to the ITS2 database, we succeeded in making species-level identifications for eight species and a family-level identification of the ninth species. This is an improvement on ITS2 sequence alone. Discussion: The reference library described here will assist with identification of plant species using rbcL. By making another gene region available for standard barcoding, this will increase the resolution and accuracy of identifications. PMID:28337390
Tabor, Stanley; Richardson, Charles C.
An automated DNA sequencing apparatus having a reactor for providing at least two series of DNA products formed from a single primer and a DNA strand, each DNA product of a series differing in molecular weight and having a chain terminating agent at one end; separating means for separating the DNA products to form a series of bands, the intensity of substantially all nearby bands in a different series being different, band reading means for determining the position an This invention was made with government support including a grant from the U.S. Public Health Service, contract number AI-06045. The U.S. government has certain rights in the invention.
Ingeholm, Peter; Gögenur, Ismail; Iversen, Lene H
Aim of database The aim of the database, which has existed for registration of all patients with colorectal cancer in Denmark since 2001, is to improve the prognosis for this patient group. Study population All Danish patients with newly diagnosed colorectal cancer who are either diagnosed or treated in a surgical department of a public Danish hospital. Main variables The database comprises an array of surgical, radiological, oncological, and pathological variables. The surgeons record data such as diagnostics performed, including type and results of radiological examinations, lifestyle factors, comorbidity and performance, treatment including the surgical procedure, urgency of surgery, and intra- and postoperative complications within 30 days after surgery. The pathologists record data such as tumor type, number of lymph nodes and metastatic lymph nodes, surgical margin status, and other pathological risk factors. Descriptive data The database has had >95% completeness in including patients with colorectal adenocarcinoma with >54,000 patients registered so far with approximately one-third rectal cancers and two-thirds colon cancers and an overrepresentation of men among rectal cancer patients. The stage distribution has been more or less constant until 2014 with a tendency toward a lower rate of stage IV and higher rate of stage I after introduction of the national screening program in 2014. The 30-day mortality rate after elective surgery has been reduced from >7% in 2001–2003 to <2% since 2013. Conclusion The database is a national population-based clinical database with high patient and data completeness for the perioperative period. The resolution of data is high for description of the patient at the time of diagnosis, including comorbidities, and for characterizing diagnosis, surgical interventions, and short-term outcomes. The database does not have high-resolution oncological data and does not register recurrences after primary surgery. The Danish
The 2010 Worldwide Gasification Database describes the current world gasification industry and identifies near-term planned capacity additions. The database lists gasification projects and includes information (e.g., plant location, number and type of gasifiers, syngas capacity, feedstock, and products). The database reveals that the worldwide gasification capacity has continued to grow for the past several decades and is now at 70,817 megawatts thermal (MWth) of syngas output at 144 operating plants with a total of 412 gasifiers.
SRD 60 NIST ITS-90 Thermocouple Database (Web, free access) Web version of Standard Reference Database 60 and NIST Monograph 175. The database gives temperature -- electromotive force (emf) reference functions and tables for the letter-designated thermocouple types B, E, J, K, N, R, S and T. These reference functions have been adopted as standards by the American Society for Testing and Materials (ASTM) and the International Electrotechnical Commission (IEC).
The Veterans Administration Information Resource Center provides database and informatics experts, customer service, expert advice, information products, and web technology to VA researchers and others.
NIST Mugshot Identification Database (MID) (PC database for purchase) NIST Special Database 18 is being distributed for use in development and testing of automated mugshot identification systems. The database consists of three CD-ROMs, containing a total of 3248 images of variable size using lossless compression. A newer version of the compression/decompression software on the CDROM can be found at the website http://www.nist.gov/itl/iad/ig/nigos.cfm as part of the NBIS package.
Zhulin, Igor B.
Databases play an increasingly important role in biology. They archive, store, maintain, and share information on genes, genomes, expression data, protein sequences and structures, metabolites and reactions, interactions, and pathways. All these data are critically important to microbiologists. Furthermore, microbiology has its own databases that deal with model microorganisms, microbial diversity, physiology, and pathogenesis. Thousands of biological databases are currently available, and it becomes increasingly difficult to keep up with their development. The purpose of this minireview is to provide a brief survey of current databases that are of interest to microbiologists. PMID:26013493
Kuiken, Carla; Korber, Bette; Shafer, Robert W.
Two important databases are often used in HIV genetic research, the HIV Sequence Database in Los Alamos, which collects all sequences and focuses on annotation and data analysis, and the HIV RT/Protease Sequence Database in Stanford, which collects sequences associated with the development of viral resistance against anti-retroviral drugs and focuses on analysis of those sequences. The types of data and services these two databases offer, the tools they provide, and the way they are set up and operated are described in detail. PMID:12875108
Yang, In Seok; Ryu, Chunsun; Cho, Ki Joon; Kim, Jin Kwang; Ong, Swee Hoe; Mitchell, Wayne P; Kim, Bong Su; Oh, Hee-Bok; Kim, Kyung Hyun
Biomarkers enable early diagnosis, guide molecularly targeted therapy and monitor the activity and therapeutic responses across a variety of diseases. Despite intensified interest and research, however, the overall rate of development of novel biomarkers has been falling. Moreover, no solution is yet available that efficiently retrieves and processes biomarker information pertaining to infectious diseases. Infectious Disease Biomarker Database (IDBD) is one of the first efforts to build an easily accessible and comprehensive literature-derived database covering known infectious disease biomarkers. IDBD is a community annotation database, utilizing collaborative Web 2.0 features, providing a convenient user interface to input and revise data online. It allows users to link infectious diseases or pathogens to protein, gene or carbohydrate biomarkers through the use of search tools. It supports various types of data searches and application tools to analyze sequence and structure features of potential and validated biomarkers. Currently, IDBD integrates 611 biomarkers for 66 infectious diseases and 70 pathogens. It is publicly accessible at http://biomarker.cdc.go.kr and http://biomarker.korea.ac.kr.
Chen, Calvin Yu-Chian
Rapidly advancing computational technologies have greatly sped up the development of computer-aided drug design (CADD). Recently, pharmaceutical companies have increasingly shifted their attention toward traditional Chinese medicine (TCM) for novel lead compounds. Despite the growing number of studies on TCM, there is no free 3D small molecular structure database of TCM available for virtual screening or molecular simulation. To address this shortcoming, we have constructed TCM Database@Taiwan (http://tcm.cmu.edu.tw/) based on information collected from Chinese medical texts and scientific publications. TCM Database@Taiwan is currently the world's largest non-commercial TCM database. This web-based database contains more than 20,000 pure compounds isolated from 453 TCM ingredients. Both cdx (2D) and Tripos mol2 (3D) formats of each pure compound in the database are available for download and virtual screening. The TCM database includes both simple and advanced web-based query options that can specify search clauses, such as molecular properties, substructures, TCM ingredients, and TCM classification, based on intended drug actions. The TCM database can be easily accessed by all researchers conducting CADD. Over the last eight years, numerous volunteers have devoted their time to analyzing TCM ingredients from Chinese medical texts as well as to constructing structure files for each isolated compound. We believe that TCM Database@Taiwan will be a milestone on the path towards modernizing traditional Chinese medicine. PMID:21253603
David Nix, Lisa Simirenko
The BioImaging Database (BID) is a relational database developed to store the data and meta-data for the 3D gene expression in early Drosophila embryo development on a cellular level. The schema was written to be used with the MySQL DBMS but with minor modifications can be used on any SQL compliant relational DBMS.
SRD 21 Biological Macromolecule Crystallization Database (Web, free access) The Biological Macromolecule Crystallization Database and NASA Archive for Protein Crystal Growth Data (BMCD) contains the conditions reported for the crystallization of proteins and nucleic acids used in X-ray structure determinations and archives the results of microgravity macromolecule crystallization studies.
Littlejohn, Alice C.; Parker, Joan M.
Designed primarily for use by first-time searchers, this workbook provides an overview of online searching. Following a brief introduction which defines online searching, databases, and database producers, five steps in carrying out a successful search are described: (1) identifying the main concepts of the search statement; (2) selecting a…
SRD 102 HIV Structural Database (Web, free access) The HIV Protease Structural Database is an archive of experimentally determined 3-D structures of Human Immunodeficiency Virus 1 (HIV-1), Human Immunodeficiency Virus 2 (HIV-2) and Simian Immunodeficiency Virus (SIV) Proteases and their complexes with inhibitors or products of substrate cleavage.
SRD 78 NIST Atomic Spectra Database (ASD) (Web, free access) This database provides access and search capability for NIST critically evaluated data on atomic energy levels, wavelengths, and transition probabilities that are reasonably up-to-date. The NIST Atomic Spectroscopy Data Center has carried out these critical compilations.
SRD 30 NIST Structural Ceramics Database (Web, free access) The NIST Structural Ceramics Database (WebSCD) provides evaluated materials property data for a wide range of advanced ceramics known variously as structural ceramics, engineering ceramics, and fine ceramics.
Snell, William H.; Turner, Anne M.; Gifford, Luther; Stites, William
A quality system database (QSD), and software to administer the database, were developed to support recording of administrative nonconformance activities that involve requirements for documentation of corrective and/or preventive actions, which can include ISO 9000 internal quality audits and customer complaints.
Norton, M. Jay
Knowledge discovery in databases (KDD) revolves around the investigation and creation of knowledge, processes, algorithms, and mechanisms for retrieving knowledge from data collections. The article is an introductory overview of KDD. The rationale and environment of its development and applications are discussed. Issues related to database design…
Detailed reviews of two legal information databases--"Laborlaw I" and "Legal Resource Index"--are presented in this paper. Each database review begins with a bibliographic entry listing the title; producer; vendor; cost per hour contact time; offline print cost per citation; time period covered; frequency of updates; and size…
Barrett, Tanya; Suzek, Tugba O; Troup, Dennis B; Wilhite, Stephen E; Ngau, Wing-Chi; Ledoux, Pierre; Rudnev, Dmitry; Lash, Alex E; Fujibuchi, Wataru; Edgar, Ron
The Gene Expression Omnibus (GEO) at the National Center for Biotechnology Information (NCBI) is the largest fully public repository for high-throughput molecular abundance data, primarily gene expression data. The database has a flexible and open design that allows the submission, storage and retrieval of many data types. These data include microarray-based experiments measuring the abundance of mRNA, genomic DNA and protein molecules, as well as non-array-based technologies such as serial analysis of gene expression (SAGE) and mass spectrometry proteomic technology. GEO currently holds over 30,000 submissions representing approximately half a billion individual molecular abundance measurements, for over 100 organisms. Here, we describe recent database developments that facilitate effective mining and visualization of these data. Features are provided to examine data from both experiment- and gene-centric perspectives using user-friendly Web-based interfaces accessible to those without computational or microarray-related analytical expertise. The GEO database is publicly accessible through the World Wide Web at http://www.ncbi.nlm.nih.gov/geo.
D'Souza, M.; Romine, M. F.; Maltsev, N.; Mathematics and Computer Science; PNNL
SENTRA, available via URL http://wit.mcs.anl.gov/WIT2/Sentra/, is a database of proteins associated with microbial signal transduction. The database currently includes the classical two-component signal transduction pathway proteins and methyl-accepting chemotaxis proteins, but will be expanded to also include other classes of signal transduction systems that are modulated by phosphorylation or methylation reactions. Although the majority of database entries are from prokaryotic systems, eukaryotic proteins with bacterial-like signal transduction domains are also included. Currently SENTRA contains signal transduction proteins in 34 complete and almost completely sequenced prokaryotic genomes, as well as sequences from 243 organisms available in public databases (SWISS-PROT and EMBL). The analysis was carried out within the framework of the WIT2 system, which is designed and implemented to support genetic sequence analysis and comparative analysis of sequenced genomes.
Gibbons, Susan M C; Kaye, Jane
This paper provides an introduction to a collection of five papers, published as a special symposium journal issue, under the title: "Governing Genetic Databases: Collection, Storage and Use". It begins by setting the scene, to provide a backdrop and context for the papers. It describes the evolving scientific landscape around genetic databases and genomic research, particularly within the biomedical and criminal forensic investigation fields. It notes the lack of any clear, coherent or coordinated legal governance regime, either at the national or international level. It then identifies and reflects on key cross-cutting issues and themes that emerge from the five papers, in particular: terminology and definitions; consent; special concerns around population genetic databases (biobanks) and forensic databases; international harmonisation; data protection; data access; boundary-setting; governance; and issues around balancing individual interests against public good values.
Rothman, Laurence S.; Gordon, Iouli E.; Barbe, Alain; Benner, D. Chris; Bernath, Peter F.; Birk, Manfred; Boudon, V.; Brown, Linda R.; Campargue, Alain; Champion, J.-P.; Chance, Kelly V.; Coudert, L. H.; Sung, K.; Toth, R. A.
This paper describes the status of the 2008 edition of the HITRAN molecular spectroscopic database. The new edition is the first official public release since the 2004 edition, although a number of crucial updates had been made available online since 2004. The HITRAN compilation consists of several components that serve as input for radiative-transfer calculation codes: individual line parameters for the microwave through visible spectra of molecules in the gas phase; absorption cross-sections for molecules having dense spectral features, i.e., spectra in which the individual lines are not resolved; individual line parameters and absorption cross sections for bands in the ultra-violet; refractive indices of aerosols, tables and files of general properties associated with the database; and database management software. The line-by-line portion of the database contains spectroscopic parameters for forty-two molecules including many of their isotopologues.
Background Most information on genomic variations and their associations with phenotypes is covered exclusively in scientific publications rather than in structured databases. These texts commonly describe variations using natural language; database identifiers are seldom mentioned. This complicates the retrieval of variations and associated articles, as well as information extraction, e.g. the search for biological implications. To overcome these challenges, procedures to map textual mentions of variations to database identifiers need to be developed. Results This article describes a workflow for normalization of variation mentions, i.e. the association of them to unique database identifiers. Common pitfalls in the interpretation of single nucleotide polymorphism (SNP) mentions are highlighted and discussed. The developed normalization procedure achieves a precision of 98.1% and a recall of 67.5% for unambiguous association of variation mentions with dbSNP identifiers on a text corpus based on 296 MEDLINE abstracts containing 527 mentions of SNPs. The annotated corpus is freely available at http://www.scai.fraunhofer.de/snp-normalization-corpus.html. Conclusions Comparable approaches usually focus on variations mentioned on the protein sequence and neglect problems for other SNP mentions. The results presented here indicate that normalizing SNPs described on DNA level is more difficult than the normalization of SNPs described on protein level. The challenges associated with normalization are exemplified with ambiguities and errors, which occur in this corpus. PMID:21992066
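The precision and recall reported in the abstract above can be combined into a single F1 score, the harmonic mean of the two. The abstract itself does not state an F1 value, so the figure computed here is a derived illustration and not a result from the paper; the function name `f1_score` is a hypothetical helper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # convention: avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract (98.1% precision, 67.5% recall).
reported_precision = 0.981
reported_recall = 0.675

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.3f}")  # F1 = 0.800
```

The gap between the two inputs shows in the result: the harmonic mean is pulled toward the lower value, so the high precision only partly offsets the lower recall.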
The Drinking Water Treatability Database (TDB) assembles referenced data on the control of contaminants in drinking water, housed on an interactive, publicly-available, USEPA web site (www.epa.gov/tdb). The TDB is of use to drinking water utilities, treatment process design engin...
Benbasat, Izak; And Others
Describes a computer-assisted testing system which produces multiple-choice examinations for a college course in business administration. The system uses SPIRES (Stanford Public Information REtrieval System) to manage a database of questions and related data, mark-sense cards for machine grading tests, and ACL (6) (Audit Command Language) to…
It is anticipated that the coming years will see the generation of large datasets including diagnostic markers in several plant species with emphasis on crop plants. To use these datasets effectively in any plant breeding program, it is essential to have the information available via public database...
Revoir, A; Ballard, D J; Syndercombe Court, D
Upon re-testing of a DNA extract as part of a defence examination, a discordant result was observed at D16S539. Further STR testing and DNA sequencing of the sample identified the cause as a primer binding site mutation which was shown to be a previously unreported SNP. The testing results obtained in this case are considered in light of the current ongoing Multiplex Upgrade Project in the UK and the likely increase in discordant results that may be observed once different next generation kits are introduced.
McDonald, Jessica; Lehman, Donald C
Before the routine use of DNA profiling, blood typing was an important forensic tool. However, blood typing was not very discriminating. For example, roughly 30% of the United States population has type A-positive blood. Therefore, if A-positive blood were found at a crime scene, it could have come from 30% of the population. DNA profiling has a much better ability for discrimination. Forensic laboratories no longer routinely determine blood type. If blood is found at a crime scene, DNA profiling is performed. From Jeffreys' discovery of DNA fingerprinting to the development of PCR of STRs to the formation of DNA databases, our knowledge of DNA and DNA profiling has expanded greatly. Also, the applications for which we use DNA profiling have increased. DNA profiling is not just used for criminal case work, but it has expanded to encompass paternity testing, disaster victim identification, monitoring bone marrow transplants, detecting fetal cells in a mother's blood, tracing human history, and a multitude of other areas. The future of DNA profiling looks expansive with the development of newer instrumentation and techniques.
Brosens, Dimitri; Vankerkhoven, François; Ignace, David; Wegnez, Philippe; Noé, Nicolas; Heughebaert, André; Bortels, Jeannine; Dekoninck, Wouter
Abstract FORMIDABEL is a database of Belgian ants containing more than 27,000 occurrence records. These records originate from collections, field sampling and literature. The database gives information on 76 native and 9 introduced ant species found in Belgium. The collection records originated mainly from the ant collection of the Royal Belgian Institute of Natural Sciences (RBINS), the ‘Gaspar’ ant collection in Gembloux and the zoological collection of the University of Liège (ULG). The oldest occurrences date back to May 1866; the most recent refer to August 2012. FORMIDABEL is a work in progress and the database is updated twice a year. The latest version of the dataset is publicly and freely accessible through this URL: http://ipt.biodiversity.be/resource.do?r=formidabel. The dataset is also retrievable via the GBIF data portal through this link: http://data.gbif.org/datasets/resource/14697 A dedicated geo-portal, developed by the Belgian Biodiversity Platform, is accessible at: http://www.formicidae-atlas.be Purpose: FORMIDABEL is a joint cooperation of the Flemish ants working group “Polyergus” (http://formicidae.be) and the Wallonian ants working group “FourmisWalBru” (http://fourmiswalbru.be). The original database was created in 2002 in the context of the preliminary red data book of Flemish ants (Dekoninck et al. 2003). Later, in 2005, data from the southern part of Belgium (Wallonia and Brussels) were added. In 2012 this dataset was again updated for the creation of the first Belgian Ants Atlas (Figure 1) (Dekoninck et al. 2012). The main purpose of this atlas was to generate maps for all outdoor-living ant species in Belgium using an overlay of the standard Belgian ecoregions. By using this overlay for most species, we can discern a clear and often restricted distribution pattern in Belgium, mainly based on vegetation and soil types. PMID:23794918
Henry, Todd J.; Jao, Wei-Chun; Pewett, Tiffany; Riedel, Adric R.; Silverstein, Michele L.; Slatten, Kenneth J.; Winters, Jennifer G.; Recons Team
The REsearch Consortium On Nearby Stars (RECONS, www.recons.org) Team has been mapping the solar neighborhood since 1994. Nearby stars provide the fundamental framework upon which all of stellar astronomy is based, both for individual stars and stellar populations. The nearest stars are also the primary targets for extrasolar planet searches, and will undoubtedly play key roles in understanding the prevalence and structure of solar systems, and ultimately, in our search for life elsewhere. We have built the RECONS 25 Parsec Database to encourage and enable exploration of the Sun's nearest neighbors. The Database, slated for public release in 2015, contains 3088 stars, brown dwarfs, and exoplanets in 2184 systems as of October 1, 2014. All of these systems have accurate trigonometric parallaxes in the refereed literature placing them closer than 25.0 parsecs, i.e., parallaxes greater than 40 mas with errors less than 10 mas. Carefully vetted astrometric, photometric, and spectroscopic data are incorporated into the Database from reliable sources, including significant original data collected by members of the RECONS Team. Current exploration of the solar neighborhood by RECONS, enabled by the Database, focuses on the ubiquitous red dwarfs, including: assessing the stellar companion population of ~1200 red dwarfs (Winters), investigating the astrophysical causes that spread red dwarfs of similar temperatures by a factor of 16 in luminosity (Pewett), and canvassing ~3000 red dwarfs for excess emission due to unseen companions and dust (Silverstein). In addition, a decade-long astrometric survey of ~500 red dwarfs in the southern sky has begun, in an effort to understand the stellar, brown dwarf, and planetary companion populations for the stars that make up at least 75% of all stars in the Universe. This effort has been supported by the NSF through grants AST-0908402, AST-1109445, and AST-1412026, and via observations made possible by the SMARTS Consortium.
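The 25 parsec cut quoted in the abstract follows from the standard relation between trigonometric parallax and distance, d [pc] = 1000 / parallax [mas]. A minimal sketch of that arithmetic, assuming hypothetical function names that do not come from the RECONS Database itself:

```python
def parallax_mas_to_parsecs(parallax_mas: float) -> float:
    """Convert a trigonometric parallax in milliarcseconds to a distance in parsecs."""
    if parallax_mas <= 0:
        raise ValueError("parallax must be positive")
    return 1000.0 / parallax_mas


def meets_25_pc_criteria(parallax_mas: float, error_mas: float) -> bool:
    """Apply the cut stated in the abstract: parallax > 40 mas and error < 10 mas.

    The function name and signature are illustrative assumptions.
    """
    return parallax_mas > 40.0 and error_mas < 10.0


# A parallax of exactly 40 mas corresponds to the 25 pc boundary.
boundary_distance_pc = parallax_mas_to_parsecs(40.0)
```

A parallax of 50 mas with an error of 1 mas would pass the cut (20 pc), while 35 mas would not (about 28.6 pc).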
Mohadjer, Solmaz; Ehlers, Todd A.; Kakar, Najibullah
The ongoing collision of the Indian subcontinent with Asia controls active tectonics and seismicity in Central Asia. This motion is accommodated by faults that have historically caused devastating earthquakes and continue to pose serious threats to the population at risk. Despite international and regional efforts to assess seismic hazards in Central Asia, little attention has been given to development of a comprehensive database for active faults in the region. To address this issue and to better understand the distribution and level of seismic hazard in Central Asia, we are developing a publicly available database for active faults of Central Asia (including but not limited to Afghanistan, Tajikistan, Kyrgyzstan, northern Pakistan and western China) using ArcGIS. The database is designed to allow users to store, map and query important fault parameters such as fault location, displacement history, rate of movement, and other data relevant to seismic hazard studies including fault trench locations, geochronology constraints, and seismic studies. Data sources integrated into the database include previously published maps and scientific investigations as well as strain rate measurements and historic and recent seismicity. In addition, high resolution Quickbird, Spot, and Aster imagery are used for selected features to locate and measure offset of landforms associated with Quaternary faulting. These features are individually digitized and linked to attribute tables that provide a description for each feature. Preliminary observations include inconsistent and sometimes inaccurate information for faults documented in different studies. For example, the Darvaz-Karakul fault, which roughly defines the western margin of the Pamir, has been mapped with differences in location of up to 12 kilometers. The sense of motion for this fault ranges from unknown to thrust and strike-slip in three different studies despite documented left-lateral displacements of Holocene and late
Qi, Yunfeng; Wang, Dadong; Wang, Daying; Jin, Taicheng; Yang, Liping; Wu, Hui; Li, Yaoyao; Zhao, Jing; Du, Fengping; Song, Mingxia; Wang, Renjun
Epigenetic drugs are chemical compounds that target disordered post-translational modification of histone proteins and DNA through enzymes, and the recognition of these changes by adaptor proteins. Epigenetic drug-related experimental data such as gene expression probed by high-throughput sequencing, co-crystal structure probed by X-ray diffraction and binding constants probed by bio-assay have become widely available. The mining and integration of multiple kinds of data can be beneficial to drug discovery and drug repurposing. HEMD and other epigenetic databases store comprehensive epigenetic data where users can acquire segmental information on epigenetic drugs. However, some data types such as high-throughput datasets are not provided by these databases, and they do not support flexible queries for epigenetic drug-related experimental data. Therefore, in reference to HEMD and other epigenetic databases, we developed a relatively comprehensive database for human epigenetic drugs. The human epigenetic drug database (HEDD) focuses on the storage and integration of epigenetic drug datasets obtained from laboratory experiments and manually curated information. The latest release of HEDD incorporates five kinds of datasets: (i) drug, (ii) target, (iii) disease, (iv) high-throughput and (v) complex. In order to facilitate data extraction, flexible search options were built into HEDD, which allow an unlimited condition query for specific kinds of datasets using drug names, diseases and experiment types. Database URL: http://hedds.org/ PMID:28025347
Landi, Monica; Dimech, Mark; Arculeo, Marco; Biondo, Girolama; Martins, Rogelia; Carneiro, Miguel; Carvalho, Gary Robert; Brutto, Sabrina Lo; Costa, Filipe O.
Background DNA barcoding enhances the prospects for species-level identifications globally using a standardized and authenticated DNA-based approach. Reference libraries comprising validated DNA barcodes (COI) constitute robust datasets for testing query sequences, providing considerable utility to identify marine fish and other organisms. Here we test the feasibility of using DNA barcoding to assign species to tissue samples from fish collected in the central Mediterranean Sea, a major contributor to the European marine ichthyofaunal diversity. Methodology/Principal Findings A dataset of 1278 DNA barcodes, representing 218 marine fish species, was used to test the utility of DNA barcodes to assign species from query sequences. We tested query sequences against 1) a reference library of ranked DNA barcodes from the neighbouring North East Atlantic, and 2) the public databases BOLD and GenBank. In the first case, a reference library comprising DNA barcodes with reliability grades for 146 fish species was used as diagnostic dataset to screen 486 query DNA sequences from fish specimens collected in the central basin of the Mediterranean Sea. Of all query sequences suitable for comparisons 98% were unambiguously confirmed through complete match with reference DNA barcodes. In the second case, it was possible to assign species to 83% (BOLD-IDS) and 72% (GenBank) of the sequences from the Mediterranean. Relatively high intraspecific genetic distances were found in 7 species (2.2%–18.74%), most of them of high commercial relevance, suggesting possible cryptic species. Conclusion/Significance We emphasize the discriminatory power of COI barcodes and their application to cases requiring species level resolution starting from query sequences. Results highlight the value of public reference libraries of reliability grade-annotated DNA barcodes, to identify species from different geographical origins. The ability to assign species with high precision from DNA samples of
Tatusova, Tatiana; Ciufo, Stacy; Fedorov, Boris; O’Neill, Kathleen; Tolstoy, Igor
The source of the microbial genomic sequences in the RefSeq collection is the set of primary sequence records submitted to the International Nucleotide Sequence Database public archives. These can be accessed through the Entrez search and retrieval system at http://www.ncbi.nlm.nih.gov/genome. Next-generation sequencing has enabled researchers to perform genomic sequencing at rates that were unimaginable in the past. Microbial genomes can now be sequenced in a matter of hours, which has led to a significant increase in the number of assembled genomes deposited in the public archives. This huge increase in DNA sequence data presents new challenges for the annotation, analysis and visualization bioinformatics tools. New strategies have been developed for the annotation and representation of reference genomes and sequence variations derived from population studies and clinical outbreaks. PMID:24316578
Fosha, Charles E.
This paper addresses approaches to using publicity and public relations to meet the goals of the NASA Space Grant College. Methods universities and colleges can use to publicize space activities are presented.
Kannegaard, Pia Nimann; Vinding, Kirsten L; Hare-Bruun, Helle
Aim of database The aim of the National Database of Geriatrics is to monitor the quality of interdisciplinary diagnostics and treatment of patients admitted to a geriatric hospital unit. Study population The database population consists of patients who were admitted to a geriatric hospital unit. Geriatric patients cannot be defined by specific diagnoses. A geriatric patient is typically a frail multimorbid elderly patient with decreasing functional ability and social challenges. The database includes 14–15,000 admissions per year, and the database completeness has been stable at 90% during the past 5 years. Main variables An important part of the geriatric approach is the interdisciplinary collaboration. Indicators, therefore, reflect the combined efforts directed toward the geriatric patient. The indicators include Barthel index, body mass index, de Morton Mobility Index, Chair Stand, percentage of discharges with a rehabilitation plan, and the part of cases where an interdisciplinary conference has taken place. Data are recorded by doctors, nurses, and therapists in a database and linked to the Danish National Patient Register. Descriptive data Descriptive patient-related data include information about home, mobility aid, need of fall and/or cognitive diagnosing, and categorization of cause (general geriatric, orthogeriatric, or neurogeriatric). Conclusion The National Database of Geriatrics covers ∼90% of geriatric admissions in Danish hospitals and provides valuable information about a large and increasing patient population in the health care system. PMID:27822120
The Hazard Analysis Database was developed in conjunction with the hazard analysis activities conducted in accordance with DOE-STD-3009-94, Preparation Guide for U.S. Department of Energy Nonreactor Nuclear Facility Safety Analysis Reports, for HNF-SD-WM-SAR-067, Tank Farms Final Safety Analysis Report (FSAR). The FSAR is part of the approved Authorization Basis (AB) for the River Protection Project (RPP). This document describes, identifies, and defines the contents and structure of the Tank Farms FSAR Hazard Analysis Database and documents the configuration control changes made to the database. The Hazard Analysis Database contains the collection of information generated during the initial hazard evaluations and the subsequent hazard and accident analysis activities. The Hazard Analysis Database supports the preparation of Chapters 3, 4, and 5 of the Tank Farms FSAR and the Unreviewed Safety Question (USQ) process and consists of two major, interrelated data sets: (1) Hazard Analysis Database: Data from the results of the hazard evaluations, and (2) Hazard Topography Database: Data from the system familiarization and hazard identification.
This catalog lists 783 citations of all NASA Special Publications, NASA Reference Publications, NASA Conference Publications, and NASA Technical Papers that were entered into the NASA Scientific and Technical Information Database during the years 1987 through 1990. The entries are grouped by subject category. Indexes of subject terms, personal authors, and NASA report numbers are provided.
This catalog lists 239 citations of all NASA Special Publications, NASA Reference Publications, NASA Conference Publications, and NASA Technical Papers that were entered in the NASA scientific and technical information database during accession year 1987. The entries are grouped by subject category. Indexes of subject terms, personal authors, and NASA report numbers are provided.
Johnson, Karl E.
Conclusions of surveys (63 libraries, OCLC database, University of Rhode Island users) assessing handling of Institute of Electrical and Electronics Engineers (IEEE) conference publications indicate that most libraries fully catalog these publications using LC cataloging, and library patrons frequently require series access to publications. Eight…
This catalog lists 458 citations of all NASA Special Publications, NASA Reference Publications, NASA Conference Publications, and NASA Technical Papers that were entered into the NASA Scientific and Technical Information database during accession years 1991 through 1992. The entries are grouped by subject category. Indexes of subject terms, personal authors, and NASA report numbers are provided.
This catalog lists 190 citations of all NASA Special Publications, NASA Reference Publications, NASA Conference Publications, and NASA Technical Papers that were entered into the NASA scientific and technical information database during accession year 1989. The entries are grouped by subject category. Indexes of subject terms, personal authors, and NASA report numbers are provided.
SRD 31 NIST/ACerS Phase Equilibria Diagrams Database (PC database for purchase) The Phase Equilibria Diagrams Database contains commentaries and more than 21,000 diagrams for non-organic systems, including those published in all 21 hard-copy volumes produced as part of the ACerS-NIST Phase Equilibria Diagrams Program (formerly titled Phase Diagrams for Ceramists): Volumes I through XIV (blue books); Annuals 91, 92, 93; High Tc Superconductors I & II; Zirconium & Zirconia Systems; and Electronic Ceramics I. Materials covered include oxides as well as non-oxide systems such as chalcogenides and pnictides, phosphates, salt systems, and mixed systems of these classes.
Suzuki, Kazuaki; Shimura, Kazuki; Monma, Yoshio; Sakamoto, Masao; Morishita, Hiroshi; Kanazawa, Kenji
The Japan Information Center of Science and Technology (JICST) started the on-line service of the JICST/NRIM Materials Strength Database for Engineering Steels and Alloys (JICST ME) in March 1990. This database has been developed through joint research between JICST and the National Research Institute for Metals (NRIM). It provides material strength data (creep, fatigue, etc.) for engineering steels and alloys. Users can search and display data on-line, analyze the retrieved data statistically, and plot the results on a graphic display. The database system and the data in JICST ME are described.
Lamberg, Anna Lei; Sølvsten, Henrik; Lei, Ulrikke; Vinding, Gabrielle Randskov; Stender, Ida Marie; Jemec, Gregor Borut Ernst; Vestergaard, Tine; Thormann, Henrik; Hædersdal, Merete; Dam, Tomas Norman; Olesen, Anne Braae
Aim of database The Danish Nonmelanoma Skin Cancer Dermatology Database was established in 2008. The aim of this database was to collect data on nonmelanoma skin cancer (NMSC) treatment and improve its treatment in Denmark. NMSC is the most common malignancy in the western countries and represents a significant challenge in terms of public health management and health care costs. However, high-quality epidemiological and treatment data on NMSC are sparse. Study population The NMSC database includes patients with the following skin tumors: basal cell carcinoma (BCC), squamous cell carcinoma, Bowen’s disease, and keratoacanthoma diagnosed by the participating office-based dermatologists in Denmark. Main variables Clinical and histological diagnoses, BCC subtype, localization, size, skin cancer history, skin phototype, and evidence of metastases and treatment modality are the main variables in the NMSC database. Information on recurrence, cosmetic results, and complications is registered at two follow-up visits at 3 months (between 0 and 6 months) and 12 months (between 6 and 15 months) after treatment. Descriptive data In 2014, 11,522 patients with 17,575 tumors were registered in the database. Of tumors with a histological diagnosis, 13,571 were BCCs, 840 squamous cell carcinomas, 504 Bowen’s disease, and 173 keratoacanthomas. Conclusion The NMSC database encompasses detailed information on the type of tumor, a variety of prognostic factors, treatment modalities, and outcomes after treatment. The database has revealed that overall, the quality of care of NMSC in Danish dermatological clinics is high, and the database provides the necessary data for continuous quality assurance. PMID:27822110
Meschel, S. V.
Provides exploration into types of numeric databases available (also known as source databases, nonbibliographic databases, data-files, data-banks, fact banks); examines differences and similarities between bibliographic and numeric databases; identifies disciplines that utilize numeric databases; and surveys representative examples in the…
Mishchenko, Michael I.; Zakharova, Nadezhda T.; Khlebtsov, Nikolai G.; Wriedt, Thomas; Videen, Gorden
This paper is the sixth update to the comprehensive thematic database of peer-reviewed T-matrix publications initiated by us in 2004 and includes relevant publications that have appeared since 2013. It also lists several earlier publications not incorporated in the original database and previous updates.
New York State Library, Albany. Database Services.
This brochure describes the online information services at the New York State Library, which has online access to over 250 databases covering a broad range of subject areas, including current events, law, science, medicine, public affairs, grants, business, computer technology, education, social welfare, and humanities. Many of these databases are…
Flores-Buils, Raquel; Gil-Beltran, Jose Manuel; Caballer-Miedes, Antonio; Martinez-Martinez, Miguel Angel
The scientometric study of scientific output through publications in specialized journals cannot be undertaken exclusively with the databases available today. For this reason, the objective of this article is to introduce the "Base de Datos de Investigacion en Orientacion Vocacional" [Vocational Guidance Research Database], based on the…
... 42 Public Health 4 2011-10-01 2011-10-01 false Federal database checks. 455.436 Section 455.436....436 Federal database checks. The State Medicaid agency must do all of the following: (a) Confirm the... interest or who is an agent or managing employee of the provider through routine checks of...
Connell, Tschera Harkness; Prabha, Chandra
Examines the characteristics of Web resources in Online Computer Library Center's (OCLC) Cooperative Online Resource Catalog (CORC) in terms of subject matter, source of content, publication patterns, and units of information chosen for representation in the database. Suggests that the ability to successfully use a database depends on…
Selama, Okba; James, Phillip; Nateche, Farida; Wellington, Elizabeth M H; Hacène, Hocine
Databases are an essential tool and resource within the field of bioinformatics. The primary aim of this study was to generate an overview of global bacterial biodiversity and biogeography using available data from the two largest public online databases, NCBI Nucleotide and GBIF. The secondary aim was to highlight the contribution each geographic area has to each database. The basis for data analysis of this study was the metadata provided by both databases, mainly, the taxonomy and the geographical area origin of isolation of the microorganism (record). These were directly obtained from GBIF through the online interface, while E-utilities and Python were used in combination with a programmatic web service access to obtain data from the NCBI Nucleotide Database. Results indicate that the American continent, and more specifically the USA, is the top contributor, while Africa and Antarctica are less well represented. This highlights the imbalance of exploration within these areas rather than any reduction in biodiversity. This study describes a novel approach to generating global scale patterns of bacterial biodiversity and biogeography and indicates that the Proteobacteria are the most abundant and widely distributed phylum within both databases.
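The abstract mentions using E-utilities with Python to query the NCBI Nucleotide database but gives no code. As a rough illustration only, the sketch below builds an ESearch request URL with the standard library; the helper name, the example search term, and the default parameter values are assumptions, and no network request is made here.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities web service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term: str, db: str = "nuccore", retmax: int = 20) -> str:
    """Return an ESearch URL for the given query term (hypothetical helper).

    "nuccore" is the E-utilities name of the Nucleotide database.
    """
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Illustrative query only; the study's actual search terms are not given in the abstract.
example_url = build_esearch_url("Proteobacteria[Organism]")
```

The resulting URL could then be fetched with any HTTP client, subject to NCBI's usage guidelines on request rates.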
The CTEPP (Children's Total Exposure to Persistent Pesticides and Other Persistent Organic Pollutants) database contains a wealth of data on children's aggregate exposures to pollutants in their everyday surroundings. Chemical analysis data for the environmental media and ques...
SRD 17 NIST Chemical Kinetics Database (Web, free access) The NIST Chemical Kinetics Database includes essentially all reported kinetics results for thermal gas-phase chemical reactions. The database is designed to be searched for kinetics data based on the specific reactants involved, for reactions resulting in specified products, for all the reactions of a particular species, or for various combinations of these. In addition, the bibliography can be searched by author name or combination of names. The database contains in excess of 38,000 separate reaction records for over 11,700 distinct reactant pairs. These data have been abstracted from over 12,000 papers with literature coverage through early 2000.
Pangalos, G; Khair, M; Bozios, L
A methodology for the enhancement of database security in a hospital environment is presented in this paper which is based on both the discretionary and the mandatory database security policies. In this way the advantages of both approaches are combined to enhance medical database security. An appropriate classification of the different types of users according to their different needs and roles and a User Role Definition Hierarchy has been used. The experience obtained from the experimental implementation of the proposed methodology in a major general hospital is briefly discussed. The implementation has shown that the combined discretionary and mandatory security enforcement effectively limits the unauthorized access to the medical database, without severely restricting the capabilities of the system.
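The abstract does not describe how the two policies are combined in practice. One common reading is that access is granted only when both a mandatory check (clearance level against data classification) and a discretionary check (role-based permission) succeed. The sketch below illustrates that idea under that assumption; all class names, levels, and roles are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field

# Hypothetical ordered sensitivity levels for the mandatory policy.
LEVELS = {"public": 0, "confidential": 1, "secret": 2}


@dataclass
class User:
    name: str
    role: str        # e.g. a node in a user role hierarchy
    clearance: str   # key into LEVELS


@dataclass
class Record:
    name: str
    classification: str                      # key into LEVELS
    allowed_roles: set = field(default_factory=set)


def can_read(user: User, record: Record) -> bool:
    """Grant read access only if both policy checks pass."""
    # Mandatory check: the user's clearance must dominate the record's classification.
    mandatory_ok = LEVELS[user.clearance] >= LEVELS[record.classification]
    # Discretionary check: the user's role must have been granted access to the record.
    discretionary_ok = user.role in record.allowed_roles
    return mandatory_ok and discretionary_ok


doctor = User("dr_a", role="physician", clearance="secret")
clerk = User("clerk_b", role="admin_clerk", clearance="public")
chart = Record("patient_chart", classification="confidential", allowed_roles={"physician", "nurse"})
```

Here `can_read(doctor, chart)` succeeds, while `can_read(clerk, chart)` fails on both checks.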
EPA has compiled mine location information from federal, state, and Tribal agencies into a single database as part of its investigation into the potential environmental hazards of wastes from abandoned uranium mines in the western United States.
The Anaerobic Digester Database provides basic information about anaerobic digesters on livestock farms in the United States, organized in Excel spreadsheets. It includes projects that are under construction, operating, or shut down.
Pritychenko, B.; Běták, E.; Singh, B.; Totans, J.
The Nuclear Science References (NSR) database together with its associated Web interface, is the world's only comprehensive source of easily accessible low- and intermediate-energy nuclear physics bibliographic information for more than 210,000 articles since the beginning of nuclear science. The weekly-updated NSR database provides essential support for nuclear data evaluation, compilation and research activities. The principles of the database and Web application development and maintenance are described. Examples of nuclear structure, reaction and decay applications are specifically included. The complete NSR database is freely available at the websites of the National Nuclear Data Center (http://www.nndc.bnl.gov/nsr) and the International Atomic Energy Agency (http://www-nds.iaea.org/nsr)
The Refrigerant Database consolidates and facilitates access to information to assist industry in developing equipment using alternative refrigerants. The underlying purpose is to accelerate phase out of chemical compounds of environmental concern.
Baehr, A.; Hagstrom, R.; Joerg, D.; Overbeek, R.
A natural-language interface has been developed that retrieves genomic information by using a simple subset of English. The interface spares the biologist from the task of learning database-specific query languages and computer programming. Currently, the interface deals with the E. coli genome. It can, however, be readily extended and shows promise as a means of easy access to other sequenced genomic databases as well.
Day, C.T.; Loken, S.; MacFarlane, J.F.; May, E.; Lifka, D.; Lusk, E.; Price, L.E.; Baden, A.; Grossman, R.; Qin, X.; Cormell, L.; Leibold, P.; Liu, D
The major SSC experiments are expected to produce up to 1 Petabyte of data per year each. Once the primary reconstruction is completed by farms of inexpensive processors, I/O becomes a major factor in further analysis of the data. We believe that the application of database techniques can significantly reduce the I/O performed in these analyses. We present examples of such I/O reductions in prototypes based on relational and object-oriented databases of CDF data samples.
Talbot, C; Cuticchia, A J
This unit concentrates on the data contained within two human genome databases, GDB (Genome Database) and OMIM (Online Mendelian Inheritance in Man), and includes discussion of different methods for submitting and accessing data. An understanding of electronic mail, FTP, and the use of a World Wide Web (WWW) navigational tool such as Netscape or Internet Explorer is a prerequisite for utilizing the information in this unit.
SRD 10 NIST/ASME Steam Properties Database (PC database for purchase) Based upon the International Association for the Properties of Water and Steam (IAPWS) 1995 formulation for the thermodynamic properties of water and the most recent IAPWS formulations for transport and other properties, this updated version provides water properties over a wide range of conditions according to the accepted international standards.
Vita, Randi; Zarebski, Laura; Greenbaum, Jason A.; Emami, Hussein; Hoof, Ilka; Salimi, Nima; Damle, Rohini; Sette, Alessandro; Peters, Bjoern
The Immune Epitope Database (IEDB, www.iedb.org) provides a catalog of experimentally characterized B and T cell epitopes, as well as data on Major Histocompatibility Complex (MHC) binding and MHC ligand elution experiments. The database represents the molecular structures recognized by adaptive immune receptors and the experimental contexts in which these molecules were determined to be immune epitopes. Epitopes recognized in humans, nonhuman primates, rodents, pigs, cats and all other tested species are included. Both positive and negative experimental results are captured. Over the course of 4 years, the data from 180 978 experiments were curated manually from the literature, which covers ∼99% of all publicly available information on peptide epitopes mapped in infectious agents (excluding HIV) and 93% of those mapped in allergens. In addition, data that would otherwise be unavailable to the public from 129 186 experiments were submitted directly by investigators. The curation of epitopes related to autoimmunity is expected to be completed by the end of 2010. The database can be queried by epitope structure, source organism, MHC restriction, assay type or host organism, among other criteria. The database structure, as well as its querying, browsing and reporting interfaces, was completely redesigned for the IEDB 2.0 release, which became publicly available in early 2009. PMID:19906713
Stoppacher, Norbert; Neumann, Nora K N; Burgstaller, Lukas; Zeilinger, Susanne; Degenkolb, Thomas; Brückner, Hans; Schuhmacher, Rainer
Peptaibiotics are nonribosomally biosynthesized peptides which, by definition, contain the marker amino acid α-aminoisobutyric acid (Aib) and possess antibiotic properties. Since they first became known in 1958, a constantly increasing number of peptaibiotics have been described and investigated, with a particular emphasis on hypocrealean fungi. Starting from the existing online 'Peptaibol Database', first published in 1997, an exhaustive literature survey of all known peptaibiotics was carried out and resulted in a list of 1043 peptaibiotics. The gathered information was compiled and used to create the new 'The Comprehensive Peptaibiotics Database', which is presented here. The database was devised as a software tool based on Microsoft (MS) Access. It is freely available from the internet at http://peptaibiotics-database.boku.ac.at and can easily be installed and operated on any computer offering a Windows XP/7 environment. It provides useful information on characteristic properties of the peptaibiotics included, such as peptide category, group name of the microheterogeneous mixture to which the peptide belongs, amino acid sequence, sequence length, producing fungus, peptide subfamily, molecular formula, and monoisotopic mass. All these characteristics can be used and combined for automated search within the database, which makes The Comprehensive Peptaibiotics Database a versatile tool for the retrieval of valuable information about peptaibiotics. Sequence data have been considered up to December 14, 2012.
Murray, ShaTerea R.
This summer I had the opportunity to work in the Environmental Management Office (EMO) under the Chemical Sampling and Analysis Team, or CS&AT. This team's mission is to support Glenn Research Center (GRC) and EMO by providing chemical sampling and analysis services and expert consulting. Services include sampling and chemical analysis of water, soil, fuels, oils, paint, insulation materials, etc. One of this team's major projects is the Drinking Water Project. This is a project that is done on Glenn's water coolers and ten percent of its sinks every two years. For the past two summers an intern had been putting together a database for this team to record the tests they had performed. She had successfully created a database but hadn't worked out all the quirks. So this summer William Wilder (an intern from Cleveland State University) and I worked together to perfect her database. We began by finding out exactly what every member of the team thought about the database and what they would change, if anything. After collecting this data we both had to take some courses in Microsoft Access in order to fix the problems. Next we began looking at exactly how the database worked from the outside inward. Then we began trying to change the database but we quickly found out that this would be virtually impossible.
Saier, Milton H.; Reddy, Vamsee S.; Tamang, Dorjee G.; Västermark, Åke
The Transporter Classification Database (TCDB; http://www.tcdb.org) serves as a common reference point for transport protein research. The database contains more than 10 000 non-redundant proteins that represent all currently recognized families of transmembrane molecular transport systems. Proteins in TCDB are organized in a five level hierarchical system, where the first two levels are the class and subclass, the second two are the family and subfamily, and the last one is the transport system. Superfamilies that contain multiple families are included as hyperlinks to the five tier TC hierarchy. TCDB includes proteins from all types of living organisms and is the only transporter classification system that is both universal and recognized by the International Union of Biochemistry and Molecular Biology. It has been expanded by manual curation, contains extensive text descriptions providing structural, functional, mechanistic and evolutionary information, is supported by unique software and is interconnected to many other relevant databases. TCDB is of increasing usefulness to the international scientific community and can serve as a model for the expansion of database technologies. This manuscript describes an update of the database descriptions previously featured in NAR database issues. PMID:24225317
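The five-level hierarchy described above can be illustrated with a small sketch. It assumes, for illustration only, that an identifier is written as five dot-separated components in the order class, subclass, family, subfamily, transport system; the function names and the sample identifier are hypothetical and are not taken from TCDB's own software.

```python
from typing import NamedTuple


class TCNumber(NamedTuple):
    """One entry in a five-level hierarchy (field names are illustrative)."""
    tc_class: str
    subclass: str
    family: str
    subfamily: str
    system: str


def parse_tc_number(text: str) -> TCNumber:
    """Split a dotted identifier into its five levels (assumed format)."""
    parts = text.strip().split(".")
    if len(parts) != 5 or not all(parts):
        raise ValueError(f"expected five dot-separated levels, got {text!r}")
    return TCNumber(*parts)


def family_key(tc: TCNumber) -> str:
    """Return the first three levels, which together identify the family."""
    return ".".join(tc[:3])


if __name__ == "__main__":
    entry = parse_tc_number("1.A.1.2.3")  # made-up identifier
    print(entry.subclass, family_key(entry))
```

Grouping entries by `family_key` is one simple way to collect all transport systems that share a family, which is the kind of browsing a hierarchical classification makes possible.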
Gasparyan, Armen Yuri; Yessirkepov, Marlen; Voronov, Alexander A; Trukhachev, Vladimir I; Kostyukova, Elena I; Gerasimov, Alexey N; Kitas, George D
Specialist bibliographic databases offer essential online tools for researchers and authors who work on specific subjects and perform comprehensive and systematic syntheses of evidence. This article presents examples of the established specialist databases, which may be of interest to those engaged in multidisciplinary science communication. Access to most specialist databases is through subscription schemes and membership in professional associations. Several aggregators of information and database vendors, such as EBSCOhost and ProQuest, facilitate advanced searches supported by specialist keyword thesauri. Searches of items through specialist databases are complementary to those through multidisciplinary research platforms, such as PubMed, Web of Science, and Google Scholar. Familiarizing with the functional characteristics of biomedical and nonbiomedical bibliographic search tools is mandatory for researchers, authors, editors, and publishers. The database users are offered updates of the indexed journal lists, abstracts, author profiles, and links to other metadata. Editors and publishers may find particularly useful source selection criteria and apply for coverage of their peer-reviewed journals and grey literature sources. These criteria are aimed at accepting relevant sources with established editorial policies and quality controls. PMID:27134485
Shay, Johanna Y.
The composition and physical properties of crude oil vary widely from one reservoir to another within an oil field, as well as from one field or region to another. Although all oils consist of hydrocarbons and their derivatives, the proportions of various types of compounds differ greatly. This makes some oils more suitable than others for specific refining processes and uses. To take advantage of this diversity, one needs access to information in a large database of crude oil analyses. The Crude Oil Analysis Database (COADB) currently satisfies this need by offering 9,056 crude oil analyses. Of these, 8,500 are United States domestic oils. The database contains results of analysis of the general properties and chemical composition, as well as the field, formation, and geographic location of the crude oil sample. [Taken from the Introduction to COAMDATA_DESC.pdf, part of the zipped software and database file at http://www.netl.doe.gov/technologies/oil-gas/Software/database.html] Save the zipped file to your PC. When opened, it will contain PDF documents and a large Excel spreadsheet. It will also contain the database in Microsoft Access 2002.
Bayefsky, Michelle J
Since the human genome was decoded, great emphasis has been placed on the unique, personal nature of the genome, along with the benefits that personalized medicine can bring to individuals and the importance of safeguarding genetic privacy. As a result, an equally important aspect of the human genome - its common nature - has been underappreciated and underrepresented in the ethics literature and policy dialogue surrounding genetics and genomics. This article will argue that, just as the personal nature of the genome has been used to reinforce individual rights and justify important privacy protections, so too the common nature of the genome can be employed to support protections of the genome at a population level and policies designed to promote the public's wellbeing. In order for public health officials to have the authority to develop genetics policies for the sake of the public good, the genome must have not only a common, but also a public, dimension. This article contends that DNA carries a public dimension through the use of two conceptual frameworks: the common heritage (CH) framework and the common resource (CR) framework. Both frameworks establish a public interest in the human genome, but the CH framework can be used to justify policies aimed at preserving and protecting the genome, while the CR framework can be employed to justify policies for utilizing the genome for the public benefit. A variety of possible policy implications are discussed, with special attention paid to the use of large-scale genomics databases for public health research.
Neumann, Nora K N; Stoppacher, Norbert; Zeilinger, Susanne; Degenkolb, Thomas; Brückner, Hans; Schuhmacher, Rainer
In this work, we present the 'Peptaibiotics Database' (PDB), a comprehensive online resource, which intends to cover all Aib-containing non-ribosomal fungal peptides currently described in scientific literature. This database shall extend and update the recently published 'Comprehensive Peptaibiotics Database' and currently consists of 1,297 peptaibiotic sequences. In a literature survey, a total of 235 peptaibiotic sequences published between January 2013 and June 2014 have been compiled, and added to the list of 1,062 peptides in the recently published 'Comprehensive Peptaibiotics Database'. The presented database is intended as a public resource freely accessible to the scientific community at peptaibiotics-database.boku.ac.at. The search options of the previously published repository and the presentation of sequence motif searches have been extended significantly. All of the available search options can be combined to create complex database queries. As a public repository, the presented database enables the easy upload of new peptaibiotic sequences or the correction of existing information. In addition, an administrative interface for maintenance of the content of the database has been implemented, and the design of the database can be easily extended to store additional information to accommodate future needs of the 'peptaibiomics community'.
Ross, B.M.; Bonaldo, M.F.; Vitale, E.
A YAC contig has been constructed across the spinal muscular atrophy (SMA) region of chromosome 5 (5q11-13). Further definition by pedigree analysis has yielded a minimal genetic region of 400 kb. For isolation of candidate genes in this region, the following cDNA selection method was hybridized to directionally cloned normalized (Cot 1 DNA-preannealed) cDNA libraries in the form of single-stranded circles. The libraries used were constructed from SMA infant brain and normal fetal liver+spleen. Hybridizing circles were eluted off the filter, partially converted into duplexes and electroporated into bacteria. The selected clones were then sequentially hybridized with a human Cot 1 DNA probe (BRL), and a probe made from the corresponding YAC DNA. Clones that hybridized only to the YAC DNA probe were verified to map to the critical region by genomic Southern analyses. Ten different cDNA clones have been isolated by this procedure so far. Three of them have been definitively mapped back to the region. Four of the ten clones are now completely sequenced. One clone shows sequence homology to a transcriptional initiation factor; another has homology to a prokaryotic attachment site sequence for the lipid moiety of membrane lipoproteins. Two clones show no homology to sequences represented in the public databases. We are continuing the full characterization of the cDNA clones as candidates for the SMA gene.
Reviews the best and worst in databases on disk, CD-ROM, and online, and offers judgments and observations on database characteristics. Two databases are praised and three are criticized. (Author/JMV)
Baek, Su-Jin; Yang, Sungjin; Kang, Tae-Wook; Park, Seong-Min; Kim, Yong Sung; Kim, Seon-Young
Integrated analysis of DNA methylation and gene expression can reveal specific epigenetic patterns that are important during carcinogenesis. We built an integrated database of DNA methylation and gene expression termed MENT (Methylation and Expression database of Normal and Tumor tissues) to provide researchers information on both DNA methylation and gene expression in diverse cancers. It contains integrated data of DNA methylation, gene expression, correlation of DNA methylation and gene expression in paired samples, and clinicopathological conditions gathered from the GEO (Gene Expression Omnibus) and TCGA (The Cancer Genome Atlas). A user-friendly interface allows users to search for differential DNA methylation by either 'gene search' or 'dataset search'. The 'gene search' returns which conditions are differentially methylated in a gene of interest, while 'dataset search' returns which genes are differentially methylated in a condition of interest based on filtering options such as direction, DM (differential methylation value), and p-value. MENT is the first database which provides both DNA methylation and gene expression information in diverse normal and tumor tissues. Its user-friendly interface allows users to easily search and view both DNA methylation and gene expression patterns. MENT is freely available at http://mgrc.kribb.re.kr:8080/MENT/.
Vianello, Dario; Sevini, Federica; Castellani, Gastone; Lomartire, Laura; Capri, Miriam; Franceschi, Claudio
Deep sequencing technologies are completely revolutionizing the approach to DNA analysis. Mitochondrial DNA (mtDNA) studies have entered the "postgenomic era": the burst in sequenced samples observed in nuclear genomics is also expected in mitochondria, a trend that can already be detected by checking the submission rate of complete mtDNA sequences to databases. Tools for the analysis of these data are available, but they fall short in throughput or in ease of use. We present here a new pipeline based on previous algorithms, inherited from the "nuclear genomic toolbox," combined with a newly developed algorithm capable of efficiently and easily classifying new mtDNA sequences according to PhyloTree nomenclature. Detected mutations are also annotated using data collected from publicly available databases. Thanks to the analysis of all freely available sequences with known haplogroup obtained from GenBank, we were able to produce a PhyloTree-based weighted tree, taking into account each haplogroup pattern conservation. The combination of a highly efficient aligner, coupled with our algorithm and massive usage of asynchronous parallel processing, allowed us to build a high-throughput pipeline for the analysis of mtDNA sequences that can be quickly updated to follow the ever-changing nomenclature. HaploFind is freely accessible at the following Web address: https://haplofind.unibo.it.
Outlined here is the subject scope of the NASA Aerospace Database, a publicly available subset of the NASA Scientific and Technical (STI) Database. Topics of interest to NASA are outlined and placed within the framework of the following broad aerospace subject categories: aeronautics, astronautics, chemistry and materials, engineering, geosciences, life sciences, mathematical and computer sciences, physics, social sciences, space sciences, and general. A brief discussion of the subject scope is given for each broad area, followed by a similar explanation of each of the narrower subject fields that follow. The subject category code is listed for each entry.
Brown, D A; Vogt, R
The authors propose to develop a high-energy heavy-ion experimental database and make it accessible to the scientific community through an on-line interface. This database will be searchable and cross-indexed with relevant publications, including published detector descriptions. Since this database will be a community resource, it requires the high-energy nuclear physics community's financial and manpower support. This database should eventually contain all published data from Bevalac, AGS and SPS to RHIC and CERN-LHC energies, proton-proton to nucleus-nucleus collisions as well as other relevant systems, and all measured observables. Such a database would have tremendous scientific payoff as it makes systematic studies easier and allows simpler benchmarking of theoretical models to a broad range of old and new experiments. Furthermore, there is a growing need for compilations of high-energy nuclear data for applications including stockpile stewardship, technology development for inertial confinement fusion and target and source development for upcoming facilities such as the Next Linear Collider. To enhance the utility of this database, they propose periodically performing evaluations of the data and summarizing the results in topical reviews.
This report describes the approach followed to develop a database for mechanical properties of textile composites. The data in this database is assembled from NASA Advanced Composites Technology (ACT) programs and from data in the public domain. This database meets the data documentation requirements of MIL-HDBK-17, Section 8.1.2, which describes in detail the type and amount of information needed to completely document composite material properties. The database focuses on mechanical properties of textile composite. Properties are available for a range of parameters such as direction, fiber architecture, materials, environmental condition, and failure mode. The composite materials in the database contain innovative textile architectures such as the braided, woven, and knitted materials evaluated under the NASA ACT programs. In summary, the database contains results for approximately 3500 coupon level tests, for ten different fiber/resin combinations, and seven different textile architectures. It also includes a limited amount of prepreg tape composites data from ACT programs where side-by-side comparisons were made.
Robinson, James; Halliwell, Jason A.; McWilliam, Hamish; Lopez, Rodrigo; Marsh, Steven G. E.
The Immuno Polymorphism Database (IPD), http://www.ebi.ac.uk/ipd/, is a set of specialist databases related to the study of polymorphic genes in the immune system. The IPD project works with specialist groups or nomenclature committees who provide and curate individual sections before they are submitted to IPD for online publication. The IPD project stores all the data in a set of related databases. IPD currently consists of four databases: IPD-KIR, which contains the allelic sequences of killer-cell immunoglobulin-like receptors; IPD-MHC, a database of sequences of the major histocompatibility complex of different species; IPD-HPA, alloantigens expressed only on platelets; and IPD-ESTDAB, which provides access to the European Searchable Tumour Cell-Line Database, a cell bank of immunologically characterized melanoma cell lines. The data is currently available online from the website and FTP directory. This article describes the latest updates and additional tools added to the IPD project. PMID:23180793
Solberg, J. L.; Pleasant, L. G.
A list of publications supported by the Space Medicine Program, Office of Space Science and Applications is given. Included are publications entered into the Life Sciences Bibliographic Database by The George Washington University as of October 1, 1984.
Zhang, N.; Blodgett, R.B.; Hofstra, A.H.
The U.S. Geological Survey has constructed a paleontological database for the Great Basin physiographic province that can be served over the World Wide Web for data entry, queries, displays, and retrievals. It is similar to the web-database solution that we constructed for Alaskan paleontological data (www.alaskafossil.org). The first phase of this effort was to compile a paleontological bibliography for Nevada and portions of adjacent states in the Great Basin that has recently been completed. In addition, we are also compiling paleontological reports (known as E&R reports) of the U.S. Geological Survey, which are another extensive source of legacy data for this region. Initial population of the database benefited from a recently published conodont data set and is otherwise focused on Devonian and Mississippian localities because strata of this age host important sedimentary exhalative (sedex) Au, Zn, and barite resources and enormous Carlin-type Au deposits. In addition, these strata are the most important petroleum source rocks in the region, and record the transition from extension to contraction associated with the Antler orogeny, the Alamo meteorite impact, and biotic crises associated with global oceanic anoxic events. The finished product will provide an invaluable tool for future geologic mapping, paleontological research, and mineral resource investigations in the Great Basin, making paleontological data acquired over nearly the past 150 yr readily available over the World Wide Web. A description of the structure of the database and the web interface developed for this effort are provided herein. This database is being used as a model for a National Paleontological Database (which we are currently developing for the U.S. Geological Survey) as well as for other paleontological databases now being developed in other parts of the globe. © 2008 Geological Society of America.
This technical highlight describes NREL research to develop a publicly available database of energy retrofit measures containing performance characteristics and cost estimates for nearly 3,000 measures.
Asher, Robert J
Background: Recent publications concerning the interordinal phylogeny of placental mammals have converged on a common signal, consisting of four major radiations with some ambiguity regarding the placental root. The DNA data with which these relationships have been reconstructed are easily accessible from public databases; access to morphological characters is much more difficult. Here, I present a graphical web-database of morphological characters focusing on placental mammals, in tandem with a combined-data phylogenetic analysis of placental mammal phylogeny. Results: The results reinforce the growing consensus regarding the extant placental mammal clades of Afrotheria, Xenarthra, Euarchontoglires, and Laurasiatheria. Unweighted parsimony applied to all DNA sequences and insertion-deletion (indel) characters of extant taxa alone supports a placental root at murid rodents; combined with morphology this shifts to Afrotheria. Bayesian analyses of morphology, indels, and DNA support both a basal position for Afrotheria and the position of Cretaceous eutherians outside of crown Placentalia. Depending on treatment of third codon positions, the affinity of several fossils (Leptictis, Paleoparadoxia, Plesiorycteropus and Zalambdalestes) varies, highlighting the potential effect of sequence data on fossils for which such data are missing. Conclusion: The combined dataset supports the location of the placental mammal root at Afrotheria or Xenarthra, not at Erinaceus or rodents. Even a small morphological dataset can have a marked influence on the location of the root in a combined-data analysis. Additional morphological data are desirable to better reconstruct the position of several fossil taxa; and the graphic-rich, web-based morphology data matrix presented here will make it easier to incorporate more taxa into a larger data matrix. PMID:17608930
Hervold, Kieran; Martin, Andrew; Kirkpatrick, Roger A; Mc Kenna, Paul F; Ramirez-Weber, F A
The Hedgehog Signaling Pathway Database is a curated repository of information pertaining to the Hedgehog developmental pathway. It was designed to provide centralized access to a wide range of relevant information in an organism-agnostic manner. Data are provided for all genes and gene targets known to be involved in the Hh pathway across various organisms. The data provided include DNA and protein sequences as well as domain structure motifs. All known human diseases associated with the Hh pathway are indexed including experimental data on therapeutic agents and their molecular targets. Hh researchers will find useful information on relevant protocols, tissue cell lines and reagents used in current Hh research projects. Curated content is also provided for publications, grants and patents relating to the Hh pathway. The database can be accessed at http://www.hedgehog.sfsu.edu.
Kitakami, Hajime; Tateno, Yoshio; Gojobori, Takashi
All the taxonomy databases constructed with the DNA databases of the international DNA data banks are powerful electronic dictionaries which aid in biological research by computer. The taxonomy databases are, however, not consistently unified with a relational format. If we can achieve consistent unification of the taxonomy databases, it will be useful in comparing many research results, and investigating future research directions from existent research results. In particular, it will be useful in comparing relationships between phylogenetic trees inferred from molecular data and those constructed from morphological data. The goal of the present study is to unify the existent taxonomy databases and eliminate inconsistencies (errors) that are present in them. Inconsistencies occur particularly in the restructuring of the existent taxonomy databases, since classification rules for constructing the taxonomy have rapidly changed with biological advancements. A repair system is needed to remove inconsistencies in each data bank and mismatches among data banks. This paper describes a new methodology for removing both inconsistencies and mismatches from the databases on a distributed computer environment. The methodology is implemented in a relational database management system, SYBASE.
Paula, Débora P; Linard, Benjamin; Crampton-Platt, Alex; Srivathsan, Amrita; Timmermans, Martijn J T N; Sujii, Edison R; Pires, Carmen S S; Souza, Lucas M; Andow, David A; Vogler, Alfried P
Characterizing trophic networks is fundamental to many questions in ecology, but this typically requires painstaking efforts, especially to identify the diet of small generalist predators. Several attempts have been devoted to develop suitable molecular tools to determine predatory trophic interactions through gut content analysis, and the challenge has been to achieve simultaneously high taxonomic breadth and resolution. General and practical methods are still needed, preferably independent of PCR amplification of barcodes, to recover a broader range of interactions. Here we applied shotgun-sequencing of the DNA from arthropod predator gut contents, extracted from four common coccinellid and dermapteran predators co-occurring in an agroecosystem in Brazil. By matching unassembled reads against six DNA reference databases obtained from public databases and newly assembled mitogenomes, and filtering for high overlap length and identity, we identified prey and other foreign DNA in the predator guts. Good taxonomic breadth and resolution was achieved (93% of prey identified to species or genus), but with low recovery of matching reads. Two to nine trophic interactions were found for these predators, some of which were only inferred by the presence of parasitoids and components of the microbiome known to be associated with aphid prey. Intraguild predation was also found, including among closely related ladybird species. Uncertainty arises from the lack of comprehensive reference databases and reliance on low numbers of matching reads accentuating the risk of false positives. We discuss caveats and some future prospects that could improve the use of direct DNA shotgun-sequencing to characterize arthropod trophic networks.
Yao, Jianbo; Coussens, Paul M; Saama, Peter; Suchyta, Steven; Ernst, Catherine W
Recent developments in microarray technologies permit scientists to analyze expression of thousands of genes simultaneously in diverse biological systems. In an effort to provide integrated resources for application of microarray technologies to studies of skeletal muscle growth and development in swine, we have constructed a normalized cDNA library from porcine skeletal muscle. The effectiveness of normalization was evaluated by DNA sequencing of clones randomly picked from the library before and after normalization, and also by Southern blot hybridization using probes representing abundant transcripts. Our data suggest that the normalization procedure successfully reduced the highly abundant cDNA species in the normalized library. To date, a total of 782 EST (expressed sequence tag) sequences have been generated from this normalized library (687 ESTs) and the original library (95 ESTs). The sequence information of these ESTs plus their BLAST results has been made available through a web accessible database (http://nbfgc.msu.edu). Cluster analysis of the data indicates that a total of 742 unique sequences are present in this collection. BLASTN search of the 742 EST sequences against the public database (dbEST) revealed that 139 had no significant matches (E-value > 10^-15) to porcine ESTs already entered in the database, suggesting the possibility of their specific expression in porcine skeletal muscle. Generation of non-redundant ESTs from this library will allow us to construct cDNA microarrays for identification of gene expression changes that regulate muscle growth and affect meat quality in swine.
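The E-value criterion in the abstract above lends itself to a short illustration. The sketch below assumes that each EST has already been reduced to its best E-value against existing porcine ESTs (or `None` when BLASTN returned no hit at all); the function names, the input layout, and the sample values are hypothetical and do not reproduce the authors' pipeline.

```python
from typing import Dict, List, Optional

# Cutoff taken from the abstract: an E-value above 1e-15 is "no significant match".
E_VALUE_CUTOFF = 1e-15


def has_significant_match(best_e_value: Optional[float]) -> bool:
    """True when the best hit is at or below the cutoff; None means no hit."""
    return best_e_value is not None and best_e_value <= E_VALUE_CUTOFF


def ests_without_match(best_hits: Dict[str, Optional[float]]) -> List[str]:
    """Return, in sorted order, the EST ids that lack a significant match."""
    return sorted(
        est_id
        for est_id, e_value in best_hits.items()
        if not has_significant_match(e_value)
    )


if __name__ == "__main__":
    # Made-up values for three ESTs, purely for illustration.
    hits = {"est_001": 1e-40, "est_002": 1e-3, "est_003": None}
    print(ests_without_match(hits))
```

Treating a missing hit the same as a weak hit is a design choice of this sketch; a stricter analysis might report the two cases separately.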
Zeng, Lingyao; Sun, Jiehuan; Li, Wei; Sun, Han; He, Ying; Li, Jing; Zhang, Guoqing; Wang, Chuan; Li, Yixue; Xie, Lu
Elucidation of the mechanisms of stem cell differentiation is of great scientific interest. Increasing evidence suggests that stem cell differentiation involves changes at multiple levels of biological regulation, which together orchestrate the complex differentiation process; many related studies have been performed to investigate the various levels of regulation. The resulting valuable data, however, remain scattered. Most of the current stem cell-relevant databases focus on a single level of regulation (mRNA expression) from limited stem cell types; thus, a unifying resource would be of great value to compile the multiple levels of research data available. Here we present a database for this purpose, SyStemCell, deposited with multi-level experimental data from stem cell research. The database currently covers seven levels of stem cell differentiation-associated regulatory mechanisms, including DNA CpG 5-hydroxymethylcytosine/methylation, histone modification, transcript products, microRNA-based regulation, protein products, phosphorylation proteins and transcription factor regulation, all of which have been curated from 285 peer-reviewed publications selected from PubMed. The database contains 43,434 genes, recorded as 942,221 gene entries, for four organisms (Homo sapiens, Mus musculus, Rattus norvegicus, and Macaca mulatta) and various stem cell sources (e.g., embryonic stem cells, neural stem cells and induced pluripotent stem cells). Data in SyStemCell can be queried by Entrez gene ID, symbol, alias, or browsed by specific stem cell type at each level of genetic regulation. An online analysis tool is integrated to assist researchers to mine potential relationships among different regulations, and the potential usage of the database is demonstrated by three case studies. SyStemCell is the first database to bridge multi-level experimental information of stem cell studies, which can become an important reference resource for stem cell researchers. The database
Callac, Christopher; Lunsford, Michelle
The NASA Records Database, comprising a Web-based application program and a database, is used to administer an archive of paper records at Stennis Space Center. The system begins with an electronic form, into which a user enters information about records that the user is sending to the archive. The form is smart: it provides instructions for entering information correctly and prompts the user to enter all required information. Once complete, the form is digitally signed and submitted to the database. The system determines which storage locations are not in use, assigns the user's boxes of records to some of them, and enters these assignments in the database. Thereafter, the software tracks the boxes and can be used to locate them. By use of search capabilities of the software, specific records can be sought by box storage locations, accession numbers, record dates, submitting organizations, or details of the records themselves. Boxes can be marked with such statuses as checked out, lost, transferred, and destroyed. The system can generate reports showing boxes awaiting destruction or transfer. When boxes are transferred to the National Archives and Records Administration (NARA), the system can automatically fill out NARA records-transfer forms. Currently, several other NASA Centers are considering deploying the NASA Records Database to help automate their records archives.
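The location-assignment step described above (find unused storage locations, assign the incoming boxes to some of them, record the assignments) might be sketched as follows. This is only an illustration under stated assumptions: the function name, the location labels, and the dictionary standing in for the database are all hypothetical and are not taken from the NASA system.

```python
def assign_boxes(all_locations, assignments, new_boxes):
    """Assign each new box to a free location and record the assignment.

    all_locations: ordered list of location labels (hypothetical).
    assignments:   dict mapping box id -> location, standing in for the database.
    new_boxes:     list of box ids being sent to the archive.
    """
    in_use = set(assignments.values())
    # Locations not referenced by any existing assignment are free.
    free = [loc for loc in all_locations if loc not in in_use]
    if len(free) < len(new_boxes):
        raise ValueError("not enough free storage locations")
    new = dict(zip(new_boxes, free))
    assignments.update(new)  # "enter these assignments in the database"
    return new


# Hypothetical data for illustration only.
locations = ["A1", "A2", "A3", "A4"]
database = {"box-001": "A2"}
added = assign_boxes(locations, database, ["box-002", "box-003"])
```

After the call, `database` also serves as the lookup used to locate a box later, which mirrors the tracking role the abstract attributes to the software.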
The purpose of the Air Mobility Command (AMC) Deployment Analysis System (ADANS) Database Specification (DS) is to describe the database organization and storage allocation and to provide the detailed data model of the physical design and information necessary for the construction of the parts of the database (e.g., tables, indexes, rules, defaults). The DS includes entity relationship diagrams, table and field definitions, reports on other database objects, and a description of the ADANS data dictionary. ADANS is the automated system used by Headquarters AMC and the Tanker Airlift Control Center (TACC) for airlift planning and scheduling of peacetime and contingency operations as well as for deliberate planning. ADANS also supports planning and scheduling of Air Refueling Events by the TACC and the unit-level tanker schedulers. ADANS receives input in the form of movement requirements and air refueling requests. It provides a suite of tools for planners to manipulate these requirements/requests against mobility assets and to develop, analyze, and distribute schedules. Analysis tools are provided for assessing the products of the scheduling subsystems, and editing capabilities support the refinement of schedules. A reporting capability provides formatted screen, print, and/or file outputs of various standard reports. An interface subsystem handles message traffic to and from external systems. The database is an integral part of the functionality summarized above.
Hyde, James L.; Christiansen, Eric L.; Lear, Dana M.
With three missions outstanding, the Shuttle Hypervelocity Impact Database has nearly 3000 entries. The data is divided into tables for crew module windows, payload bay door radiators and thermal protection system regions, with window impacts comprising just over half the records. In general, the database provides dimensions of hypervelocity impact damage, a component level location (i.e., window number or radiator panel number) and the orbiter mission when the impact occurred. Additional detail on the type of particle that produced the damage site is provided when sampling data and definitive analysis results are available. Details and insights on the contents of the database including examples of descriptive statistics will be provided. Post flight impact damage inspection and sampling techniques that were employed during the different observation campaigns will also be discussed. Potential enhancements to the database structure and availability of the data for other researchers will be addressed in the Future Work section. A related database of returned surfaces from the International Space Station will also be introduced.
Hyde, James L.; Christiansen, Eric L.; Lear, Dana M.
With three flights remaining on the manifest, the shuttle impact hypervelocity database has over 2800 entries. The data is currently divided into tables for crew module windows, payload bay door radiators and thermal protection system regions, with window impacts comprising just over half the records. In general, the database provides dimensions of hypervelocity impact damage, a component level location (i.e., window number or radiator panel number) and the orbiter mission when the impact occurred. Additional detail on the type of particle that produced the damage site is provided when sampling data and definitive analysis results are available. The paper will provide details and insights on the contents of the database including examples of descriptive statistics using the impact data. A discussion of post flight impact damage inspection and sampling techniques that were employed during the different observation campaigns will be presented. Future work to be discussed will be possible enhancements to the database structure and availability of the data for other researchers. A related database of ISS returned surfaces that is under development will also be introduced.
Sundaramurthi, Jagadish Chandrabose; Ramanandan, Prabhakaran; Brindha, Sridharan; Subhasree, Chelladurai Ramarathnam; Prasad, Abhimanyu; Kumaraswami, Vasanthapuram; Hanna, Luke Elizabeth
Emergence of drug resistance is a major threat to public health. Many pathogens have developed resistance to most of the existing antibiotics, and multidrug-resistant and extensively drug-resistant strains are extremely difficult to treat. This has resulted in an urgent need for novel drugs. We describe a database called ‘Database of Drug Targets for Resistant Pathogens’ (DDTRP). The database contains information on drugs with reported resistance, their respective targets, metabolic pathways involving these targets, and a list of potential alternate targets for seven pathogens. The database can be accessed freely at http://bmi.icmr.org.in/DDTRP. PMID:21938213
Scott, D J; Manos, S; Coveney, P V; Rossiny, J C H; Fearn, S; Kilner, J A; Pullar, R C; Alford, N Mc N; Axelsson, A-K; Zhang, Y; Chen, L; Yang, S; Evans, J R G; Sebastian, M T
We present work on the creation of a ceramic materials database which contains data gleaned from literature data sets as well as new data obtained from combinatorial experiments on the London University Search Instrument. At the time of this writing, the database contains data related to two main groups of materials, mainly in the perovskite family. Permittivity measurements of electroceramic materials are the first area of interest, while ion diffusion measurements of oxygen ion conductors are the second. The nature of the database design does not restrict the type of measurements which can be stored; as the available data increase, the database may become a generic, publicly available ceramic materials resource.
A few years ago, the Department of Applied Science perceived a need to automate activities related to publications by using some computer-based system. Among the objectives were that: (1) it should be easy for a secretary or someone without extensive computer skills to use the system; (2) it should run on PCs (at that time DOS based), Macintosh, and Unix systems, so that different groups or individual investigators could use it on their platform of choice; (3) it should be flexible enough to track evolving views of what information was needed; (4) it should be able to generate output in different formats for different purposes; (5) the information should be able to be selected from and sorted by a wide variety of keys; and (6) individual items should be able to be updated with new information or deleted. This document gives an overview of the PUBLIST database for handling bibliographic data.
This catalog provides information about the many reports and materials made available by the US Department of Energy's (DOE's) Global Change Research Program (GCRP) and the Carbon Dioxide Information Analysis Center (CDIAC). The catalog is divided into nine sections plus the author and title indexes: Section A--US Department of Energy Global Change Research Program Research Plans and Summaries; Section B--US Department of Energy Global Change Research Program Technical Reports; Section C--US Department of Energy Atmospheric Radiation Measurement (ARM) Program Reports; Section D--Other US Department of Energy Reports; Section E--CDIAC Reports; Section F--CDIAC Numeric Data and Computer Model Distribution; Section G--Other Databases Distributed by CDIAC; Section H--US Department of Agriculture Reports on Response of Vegetation to Carbon Dioxide; and Section I--Other Publications.
The Sandia Wind Turbine Loads Database is divided into six files, each corresponding to approximately 16 years of simulation. The files are text files with data in columnar format. The 424MB zipped file containing six data files can be downloaded by the public. The files simulate 10-minute maximum loads for the NREL 5MW wind turbine. The details of the loads simulations can be found in the paper: “Decades of Wind Turbine Loads Simulations”, M. Barone, J. Paquette, B. Resor, and L. Manuel, AIAA2012-1288 (3.69MB PDF). Note that the site-average wind speed is 10 m/s (class I-B), not the 8.5 m/s reported in the paper.
Selkov, E., Jr.; Grechkin, Y.; Mikhailova, N.; Selkov, E.; Mathematics and Computer Science; Russian Academy of Sciences
The Metabolic Pathways Database (MPW) (www.biobase.com/emphome.html/homepage. html.pags/pathways.html), a derivative of EMP (www.biobase.com/EMP), plays a fundamental role in the technology of metabolic reconstructions from sequenced genomes under the PUMA (www.mcs.anl.gov/home/compbio/PUMA/Production/ ReconstructedMetabolism/reconstruction.html), WIT (www.mcs.anl.gov/home/compbio/WIT/wit.html) and WIT2 (beauty.isdn.msc.anl.gov/WIT2.pub/CGI/user.cgi) systems. In October 1997, it included some 2800 pathway diagrams covering primary and secondary metabolism, membrane transport, signal transduction pathways, intracellular traffic, translation and transcription. In the current public release of MPW (beauty.isdn.mcs.anl.gov/MPW), the encoding is based on the logical structure of the pathways and is represented by the objects commonly used in electronic circuit design. This facilitates drawing and editing the diagrams and makes possible automation of the basic simulation operations such as deriving stoichiometric matrices, rate laws, and, ultimately, dynamic models of metabolic pathways. Individual pathway diagrams, automatically derived from the original ASCII records, are stored as SGML instances supplemented by relational indices. An auxiliary database of compound names and structures, encoded in the SMILES format, is maintained to unambiguously connect the pathways to the chemical structures of their intermediates.
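To make the phrase "deriving stoichiometric matrices" concrete, the following minimal sketch builds such a matrix from a structured description of reactions. It does not use the MPW encoding, which is not specified in the abstract; the two-reaction toy pathway, the dictionary layout, and the function name are assumptions chosen for illustration.

```python
# Hypothetical toy pathway (not taken from MPW): each reaction maps
# compound name -> stoichiometric coefficient (negative = consumed,
# positive = produced).
reactions = {
    "hexokinase": {"glucose": -1, "ATP": -1, "G6P": 1, "ADP": 1},
    "isomerase": {"G6P": -1, "F6P": 1},
}


def stoichiometric_matrix(reactions):
    """Return (compounds, reaction_names, matrix).

    Rows of the matrix correspond to compounds and columns to reactions;
    compounds are listed in order of first appearance.
    """
    compounds = []
    for coeffs in reactions.values():
        for name in coeffs:
            if name not in compounds:
                compounds.append(name)
    names = list(reactions)
    # A compound that does not take part in a reaction gets coefficient 0.
    matrix = [[reactions[r].get(c, 0) for r in names] for c in compounds]
    return compounds, names, matrix


compounds, names, S = stoichiometric_matrix(reactions)
```

Here the row for `G6P` is `[1, -1]`: it is produced by the first reaction and consumed by the second, which is the kind of bookkeeping a dynamic model of the pathway would start from.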
Clough, Emily; Barrett, Tanya
The Gene Expression Omnibus (GEO) database is an international public repository that archives and freely distributes high-throughput gene expression and other functional genomics data sets. Created in 2000 as a worldwide resource for gene expression studies, GEO has evolved with rapidly changing technologies and now accepts high-throughput data for many other data applications, including those that examine genome methylation, chromatin structure, and genome-protein interactions. GEO supports community-derived reporting standards that specify provision of several critical study elements including raw data, processed data, and descriptive metadata. The database not only provides access to data for tens of thousands of studies, but also offers various Web-based tools and strategies that enable users to locate data relevant to their specific interests, as well as to visualize and analyze the data. This chapter includes detailed descriptions of methods to query and download GEO data and use the analysis and visualization tools. The GEO homepage is at http://www.ncbi.nlm.nih.gov/geo/.
Vilaplana, Jordi; Solsona, Francesc; Teixido, Ivan; Usié, Anabel; Karathia, Hiren; Alves, Rui; Mateo, Jordi
Our group developed two biological applications, Biblio-MetReS and Homol-MetReS, accessing the same database of organisms with annotated genes. Biblio-MetReS is a data-mining application that facilitates the reconstruction of molecular networks based on automated text-mining analysis of published scientific literature. Homol-MetReS allows functional (re)annotation of proteomes, to properly identify both the individual proteins involved in the process(es) of interest and their function. It also enables the sets of proteins involved in the process(es) in different organisms to be compared directly. The efficiency of these biological applications is directly related to the design of the shared database. We classified and analyzed the different kinds of access to the database. Based on this study, we tried to adjust and tune the configurable parameters of the database server to reach the best performance of the communication data link to/from the database system. Different database technologies were analyzed. We started the study with a public relational SQL database, MySQL. Then, the same database was implemented by a MapReduce-based database named HBase. The results indicated that the standard configuration of MySQL gives an acceptable performance for low or medium size databases. Nevertheless, tuning database parameters can greatly improve the performance and lead to very competitive runtimes. PMID:25202745
Roussopoulos, Nick; Sellis, Timoleon
The objective is to illustrate the concept of incremental access to distributed databases. An experimental database management system, ADMS, which has been developed at the University of Maryland, in College Park, uses VIEWCACHE, a database access method based on incremental search. VIEWCACHE is a pointer-based access method that provides a uniform interface for accessing distributed databases and catalogues. The compactness of the pointer structures formed during database browsing and the incremental access method allow the user to search and do inter-database cross-referencing with no actual data movement between database sites. Once the search is complete, the set of collected pointers pointing to the desired data is dereferenced.
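The general idea of searching over pointers and dereferencing only at the end can be illustrated with a small sketch. This is not the VIEWCACHE implementation, whose data structures are not given in the abstract; the in-memory `sites` dictionary, the `(site, key)` pointer tuples, and both function names are hypothetical stand-ins.

```python
# Hypothetical in-memory stand-ins for two database sites.
sites = {
    "site_a": {
        1: {"name": "alpha", "year": 1988},
        2: {"name": "beta", "year": 1990},
    },
    "site_b": {
        7: {"name": "gamma", "year": 1990},
    },
}


def search(pointers, predicate):
    """Narrow a set of (site, key) pointers.

    The predicate is evaluated against the record where it lives; only
    the compact pointers are carried from one search step to the next.
    """
    return {(s, k) for (s, k) in pointers if predicate(sites[s][k])}


def dereference(pointers):
    """Fetch the actual records, once the search is complete."""
    return [sites[s][k] for (s, k) in sorted(pointers)]


all_pointers = {(s, k) for s, rows in sites.items() for k in rows}
step1 = search(all_pointers, lambda r: r["year"] >= 1990)  # first refinement
step2 = search(step1, lambda r: r["name"] != "beta")       # incremental refinement
results = dereference(step2)
```

Each refinement starts from the previous pointer set rather than from the full data, which is the incremental aspect the abstract emphasizes.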
Hulo, Nicolas; Bairoch, Amos; Bulliard, Virginie; Cerutti, Lorenzo; De Castro, Edouard; Langendijk-Genevaux, Petra S.; Pagni, Marco; Sigrist, Christian J. A.
The PROSITE database consists of a large collection of biologically meaningful signatures that are described as patterns or profiles. Each signature is linked to documentation that provides useful biological information on the protein family, domain or functional site identified by the signature. The PROSITE database is now complemented by a series of rules that can give more precise information about specific residues. During the last 2 years, the documentation and the ScanProsite web pages were redesigned to add more functionalities. The latest version of PROSITE (release 19.11 of September 27, 2005) contains 1329 patterns and 552 profile entries. Over the past 2 years more than 200 domains have been added, and now 52% of UniProtKB/Swiss-Prot entries (release 48.1 of September 27, 2005) have a cross-reference to a PROSITE entry. The database is accessible at . PMID:16381852
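As an illustration of how a signature written as a pattern can be applied to a sequence, the sketch below translates PROSITE pattern syntax into a Python regular expression. It is a simplified converter written for this example, not ScanProsite's own code: it handles `x`, `[...]`, `{...}`, `(n)` and `(n,m)` repeats, and `<`/`>` anchors at the ends of the pattern, and it assumes anchors never appear inside brackets. The example pattern and test sequence are illustrative.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-syntax pattern into a Python regex string."""
    pattern = pattern.rstrip(".")  # patterns are conventionally ended by a period
    parts = []
    for element in pattern.split("-"):
        anchored_start = element.startswith("<")
        anchored_end = element.endswith(">")
        element = element.strip("<>")
        # Split off an optional repetition suffix such as (3) or (2,4).
        match = re.fullmatch(r"([^()]+)(?:\((\d+)(?:,(\d+))?\))?", element)
        if match is None:
            raise ValueError(f"unsupported element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."  # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        # [ABC] alternatives and single residues are already valid regex.
        if low and high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        parts.append(("^" if anchored_start else "") + core + ("$" if anchored_end else ""))
    return "".join(parts)


regex = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.")
sequence = "GG" + "C" + "AA" + "C" + "AAA" + "L" + "A" * 8 + "H" + "AAA" + "H" + "GG"
hit = re.search(regex, sequence)
```

Profiles, the other signature type mentioned in the abstract, are weight matrices and cannot be expressed this way.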
Pangalos, G J
Users of medical information systems need confidence in the security of the system they are using. They also need a method to evaluate and compare its security capabilities. Every system has its own requirements for maintaining confidentiality, integrity and availability. In order to meet these requirements a number of security functions must be specified covering areas such as access control, auditing, error recovery, etc. Appropriate confidence in these functions is also required. The 'trust' in trusted computer systems rests on their ability to prove that their secure mechanisms work as advertised and cannot be disabled or diverted. The general framework and requirements for medical database security and a number of parameters of the evaluation problem are presented and discussed. The problem of database security evaluation is then discussed, and a number of specific proposals are presented, based on a number of existing medical database security systems.
Bult, Carol J.; Eppig, Janan T.; Blake, Judith A.; Kadin, James A.; Richardson, Joel E.
The Mouse Genome Database (MGD; http://www.informatics.jax.org) is the primary community model organism database for the laboratory mouse and serves as the source for key biological reference data related to mouse genes, gene functions, phenotypes and disease models with a strong emphasis on the relationship of these data to human biology and disease. As the cost of genome-scale sequencing continues to decrease and new technologies for genome editing become widely adopted, the laboratory mouse is more important than ever as a model system for understanding the biological significance of human genetic variation and for advancing the basic research needed to support the emergence of genome-guided precision medicine. Recent enhancements to MGD include new graphical summaries of biological annotations for mouse genes, support for mobile access to the database, tools to support the annotation and analysis of sets of genes, and expanded support for comparative biology through the expansion of homology data. PMID:26578600
Leão, B. de F.; Pavan, A.
Medical databases deal with dynamic, heterogeneous and fuzzy data. Modeling such a complex domain demands powerful semantic data modeling methodologies. This paper describes GSM-Explorer, a CASE tool that allows for the creation of relational databases using semantic data modeling techniques. GSM-Explorer fully incorporates the Generic Semantic Data Model (GSM), enabling knowledge engineers to model the application domain with the abstraction mechanisms of generalization/specialization, association and aggregation. The tool generates a structure that implements persistent database objects through the automatic generation of customized ANSI SQL scripts that sustain the semantics defined at the higher level. This paper emphasizes the system architecture and the mapping of the semantic model into relational tables. The present status of the project and its further developments are discussed in the Conclusions. PMID:8563288
Kelley, Wayne P.
Discusses the trend toward the transfer of federal government information from the public domain to the private sector. Topics include free access, privatization, information-policy revision, accountability, copyright issues, costs, pricing, and market needs versus public needs. (LRW)
van der Kamp, Marc W.; Schaeffer, Richard D.; Jonsson, Amanda L.; Scouras, Alexander D.; Simms, Andrew; Toofanny, Rudesh D.; Benson, Noah C.; Anderson, Peter C.; Merkley, Eric D.; Rysavy, Steve; Bromley, Denny; Beck, David A. C.; Daggett, Valerie
The dynamic behavior of proteins is important for an understanding of their function and folding. We have performed molecular dynamics simulations of the native state and unfolding pathways of over 1000 proteins, representing the majority of folds in globular proteins. These data are stored and organized using an innovative database approach, which can be mined to obtain both general and specific information about the dynamics and folding/unfolding of proteins, relevant subsets thereof, and individual proteins. Here we describe the project in general terms and the type of information contained in the database. Then we provide examples of mining the database for information relevant to protein folding, structure building, the effect of single-nucleotide polymorphisms, and drug design. The native state simulation data and corresponding analyses for the 100 most populated metafolds, together with related resources, are publicly accessible through www.dynameomics.org. PMID:20399180
Baxevanis, Andreas D
One of the most widely used interfaces for the retrieval of information from biological databases is the NCBI Entrez system. Entrez capitalizes on the fact that there are pre-existing, logical relationships between the individual entries found in numerous public databases. The existence of such natural connections, mostly biological in nature, argued for the development of a method through which all the information about a particular biological entity could be found without having to sequentially visit and query disparate databases. Two Basic Protocols describe simple, text-based searches, illustrating the types of information that can be retrieved through the Entrez system. An Alternate Protocol builds upon the first Basic Protocol, using additional, built-in features of the Entrez system, and providing alternative ways to issue the initial query. The Support Protocol reviews how to save frequently issued queries. Finally, Cn3D, a structure visualization tool, is also discussed.
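The text-based searches described above are issued through the Entrez web pages. For readers who want to script a comparable query, a minimal sketch follows that only constructs a search URL for NCBI's E-utilities `esearch` endpoint; E-utilities is not covered in the abstract and is brought in here as an assumption about how such a query might be automated. The helper name and the example search term are illustrative, and no network request is made.

```python
from urllib.parse import urlencode

# Base URL of the E-utilities ESearch service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db, term, retmax=20):
    """Build an ESearch URL for a text query against one Entrez database."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    # urlencode percent-encodes the brackets and turns spaces into '+'.
    return EUTILS_BASE + "?" + urlencode(params)


url = build_esearch_url("gene", "BRCA1[Gene Name] AND Homo sapiens[Organism]")
```

Fetching the URL with any HTTP client would return the matching record identifiers; NCBI's usage guidelines on request rates apply to scripted access.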
Mertzimekis, T. J.; Stamou, K.; Psaltis, A.
Measurements of nuclear magnetic dipole and electric quadrupole moments are considered quite important for the understanding of nuclear structure both near and far from the valley of stability. The recent advent of radioactive beams has resulted in a plethora of new, continuously flowing experimental data on nuclear structure, including nuclear moments, which hinders information management. A new, dedicated, public and user-friendly online database (http://magneticmoments.info) has been created comprising experimental data of nuclear electromagnetic moments. The present database supersedes existing printed compilations, also including non-evaluated series of data and relevant meta-data, while putting strong emphasis on bimonthly updates. The scope, features and extensions of the database are reported.
Grimm, E. C.; Ashworth, A. C.; Barnosky, A. D.; Betancourt, J. L.; Bills, B.; Booth, R.; Blois, J.; Charles, D. F.; Graham, R. W.; Goring, S. J.; Hausmann, S.; Smith, A. J.; Williams, J. W.; Buckland, P.
The Neotoma Paleoecology Database (www.neotomadb.org) is a multiproxy, open-access, relational database that includes fossil data for the past 5 million years (the late Neogene and Quaternary Periods). Modern distributional data for various organisms are also being made available for calibration and paleoecological analyses. The project is a collaborative effort among individuals from more than 20 institutions worldwide, including domain scientists representing a spectrum of Pliocene-Quaternary fossil data types, as well as experts in information technology. Working groups are active for diatoms, insects, ostracodes, pollen and plant macroscopic remains, testate amoebae, rodent middens, vertebrates, age models, geochemistry and taphonomy. Groups are also active in developing online tools for data analyses and for developing modules for teaching at different levels. A key design concept of NeotomaDB is that stewards for various data types are able to remotely upload and manage data. Cooperatives for different kinds of paleo data, or from different regions, can appoint their own stewards. Over the past year, much progress has been made on development of the steward software-interface that will enable this capability. The steward interface uses web services that provide access to the database. More generally, these web services enable remote programmatic access to the database, which both desktop and web applications can use and which provide real-time access to the most current data. Use of these services can alleviate the need to download the entire database, which can be out-of-date as soon as new data are entered. In general, the Neotoma web services deliver data either from an entire table or from the results of a view. Upon request, new web services can be quickly generated. Future developments will likely expand the spatial and temporal dimensions of the database. NeotomaDB is open to receiving new datasets and stewards from the global Quaternary community
In 1981 Wayne Erickson founded Microrim, Inc, a company originally focused on marketing a microcomputer version of RIM (Relational Information Manager). Dennis Comfort joined the firm and is now vice president, development. The team developed an advanced spinoff from the NASA system they had originally created, a microcomputer database management system known as R:BASE 4000. Microrim added many enhancements and developed a series of R:BASE products for various environments. R:BASE is now the second largest selling line of microcomputer database management software in the world.
THIS DATA ASSET NO LONGER ACTIVE: This is metadata documentation for the Region 7 Drycleaner Database (R7DryClnDB) which tracks all Region 7 drycleaners who notify Region 7 subject to Maximum Achievable Control Technology (MACT) standards. The Air and Waste Management Division is the primary managing entity for this database. This work falls under objectives for EPA's 2003-2008 Strategic Plan (Goal 4) for Healthy Communities & Ecosystems, which are to reduce chemical and/or pesticide risks at facilities.