DOE Office of Scientific and Technical Information (OSTI.GOV)
Gold, Lois Swirsky; Manley, Neela B.; Slone, Thomas H.
2005-04-08
The Carcinogenic Potency Database (CPDB) is a systematic and unifying resource that standardizes the results of chronic, long-term animal cancer tests which have been conducted since the 1950s. The analyses include sufficient information on each experiment to permit research into many areas of carcinogenesis. Both qualitative and quantitative information is reported on positive and negative experiments that meet a set of inclusion criteria. A measure of carcinogenic potency, TD50 (daily dose rate in mg/kg body weight/day to induce tumors in half of test animals that would have remained tumor-free at zero dose), is estimated for each tissue-tumor combination reported. Thismore » article is the ninth publication of a chronological plot of the CPDB; it presents results on 560 experiments of 188 chemicals in mice, rats, and hamsters from 185 publications in the general literature updated through 1997, and from 15 Reports of the National Toxicology Program in 1997-1998. The test agents cover a wide variety of uses and chemical classes. The CPDB Web Site(http://potency.berkeley.edu/) presents the combined database of all published plots in a variety of formats as well as summary tables by chemical and by target organ, supplemental materials on dosing and survival, a detailed guide to using the plot formats, and documentation of methods and publications. The overall CPDB, including the results in this article, presents easily accessible results of 6153 experiments on 1485 chemicals from 1426 papers and 429 NCI/NTP (National Cancer Institute/National Toxicology program) Technical Reports. A tab-separated format of the full CPDB for reading the data into spreadsheets or database applications is available on the Web Site.« less
Gold, L S; Manley, N B; Slone, T H; Garfinkel, G B; Ames, B N; Rohrbach, L; Stern, B R; Chow, K
1995-01-01
This paper presents two types of information from the Carcinogenic Potency Database (CPDB): (a) the sixth chronological plot of analyses of long-term carcinogenesis bioassays, and (b) an index to chemicals in all six plots, including a summary compendium of positivity and potency for each chemical (Appendix 14). The five earlier plots of the CPDB have appeared in this journal, beginning in 1984 (1-5). Including the plot in this paper, the CPDB reports results of 5002 experiments on 1230 chemicals. This paper includes bioassay results published in the general literature between January 1989 and December 1990, and in Technical Reports of the National Toxicology Program between January 1990 and June 1993. Analyses are included on 17 chemicals tested in nonhuman primates by the Laboratory of Chemical Pharmacology, National Cancer Institute. This plot presents results of 531 long-term, chronic experiments of 182 test compounds and includes the same information about each experiment in the same plot format as the earlier papers: the species and strain of test animal, the route and duration of compound administration, dose level and other aspects of experimental protocol, histopathology and tumor incidence, TD50 (carcinogenic potency) and its statistical significance, dose response, author's opinion about carcinogenicity, and literature citation. We refer the reader to the 1984 publications (1,6,7) for a detailed guide to the plot of the database, a complete description of the numerical index of carcinogenic potency, and a discussion of the sources of data, the rationale for the inclusion of particular experiments and particular target sites, and the conventions adopted in summarizing the literature. The six plots of the CPDB are to be used together since results of individual experiments that were published earlier are not repeated. Appendix 14 is designed to facilitate access to results on all chemicals. References to the published papers that are the source of experimental data are reported in each of the published plots. For readers using the CPDB extensively, a combined plot is available of all results from the six separate plot papers, ordered alphabetically by chemical; the combined plot in printed form or on computer tape or diskette is available from the first author. A SAS database is also available. PMID:8741772
Gold, L S; Manley, N B; Slone, T H; Rohrbach, L
1999-01-01
The Carcinogenic Potency Database (CPDB) is a systematic and unifying analysis of results of chronic, long-term cancer tests. This paper presents a supplemental plot of the CPDB, including 513 experiments on 157 test compounds published in the general literature in 1993 and 1994 and in Technical Reports of the National Toxicology Program in 1995 and 1996. The plot standardizes the experimental results (whether positive or negative for carcinogenicity), including qualitative data on strain, sex, route of compound administration, target organ, histopathology, and author's opinion and reference to the published paper, as well as quantitative data on carcinogenic potency, statistical significance, tumor incidence, dose-response curve shape, length of experiment, duration of dosing, and dose rate. A numerical description of carcinogenic potency, the TD(subscript)50(/subscript), is estimated for each set of tumor incidence data reported. When added to the data published earlier, the CPDB now includes results of 5,620 experiments on 1,372 chemicals that have been reported in 1,250 published papers and 414 National Cancer Institute/National Toxicology Program Technical Reports. The plot presented here includes detailed analyses of 25 chemicals tested in monkeys for up to 32 years by the National Cancer Institute. Half the rodent carcinogens that were tested in monkeys were not carcinogenic, despite usually strong evidence of carcinogenicity in rodents and/or humans. Our analysis of possible explanatory factors indicates that this result is due in part to the fact that the monkey studies lacked power to detect an effect compared to standard rodent bioassays. Factors that contributed to the lack of power are the small number of animals on test; a stop-exposure protocol for model rodent carcinogens; in a few cases, toxic doses that resulted in stoppage of dosing or termination of the experiment; and in a few cases, low doses administered to monkeys or early termination of the experiment even though the doses were not toxic. Among chemicals carcinogenic in both monkeys and rodents, there is some support for target site concordance, but it is primarily restricted to liver tumors. Potency values are highly correlated between rodents and monkeys. The plot in this paper can be used in conjunction with the earlier results published in the CRC Handbook of Carcinogenic Potency and Genotoxicity Databases [Gold LS, Zeiger E, eds. Boca Raton FL:CRC Press, 1997] and with our web site (http://potency.berkeley.edu), which includes a guide to the plot of the database, a complete description of the numerical index of carcinogenic potency (TD50), and a discussion of the sources of data, the rationale for the inclusion of particular experiments and particular target sites, and the conventions adopted in summarizing the literature. Two summary tables permit easy access to the literature of animal cancer tests by target organ and by chemical. For readers using the CPDB extensively, a combined plot on diskette or other format is available from the first author. It includes all results published earlier and in this paper, ordered alphabetically by chemical. A SAS database is also available. PMID:10421768
Physiology and pathogenicity of cpdB deleted mutant of avian pathogenic Escherichia coli.
Liu, Huifang; Chen, Liping; Si, Wei; Wang, Chunlai; Zhu, Fangna; Li, Guangxing; Liu, Siguo
2017-04-01
Avian colibacillosis is one of the most common infectious diseases caused partially or entirely by avian pathogenic Escherichia coli (APEC) in birds. In addition to spontaneous infection, APEC can also cause secondary infections that result in greater severity of illness and greater losses to the poultry industry. In order to assess the role of 2', 3'-cyclic phosphodiesterase (cpdB) in APEC on disease physiology and pathogenicity, an avian pathogenic Escherichia coli-34 (APEC-34) cpdB mutant was obtained using the Red system. The cpdB mutant grew at a slower rate than the natural strain APEC-34. Scanning electron microscopy (SEM) indicated that the bacteria of the cpdB mutant were significantly longer than the bacteria observed in the natural strain (P<0.01), and that the width of the cpdB mutant was significantly smaller than its natural counterpart (P<0.01). In order to evaluate the role of cpdB in APEC in the colonization of internal organs (lung, liver and spleen) in poultry, seven-day-old SPF chicks were infected with 10 9 CFU/chick of the cpdB mutant or the natural strain. No colonizations of cpdB mutants were observed in the internal organs 10days following the infection, though numerous natural strains were observed at 20days following infection. Additionally, the relative expression of division protein ftsZ, outer membrane protein A ompA, ferric uptake regulator fur and tryptophanase tnaA genes in the mutant strain were all significantly lower than in the natural strain (P<0.05 or P<0.01). These results suggested that cpdB is involved in the long-term colonization of APEC in the internal organs of the test subjects. The deletion of the cpdB gene also significantly affected the APEC growth and morphology. Copyright © 2016. Published by Elsevier Ltd.
Carcinogenicity and Mutagenicity Data: New Initiatives to ...
Currents models for prediction of chemical carcinogenicity and mutagenicity rely upon a relatively small number of publicly available data resources, where the data being modeled are highly summarized and aggregated representations of the actual experimental results. A number of new initiatives are underway to improve access to existing public carcinogenicity and mutagenicity data for use in modeling, as well as to encourage new approaches to the use of data in modeling. Rodent bioassay results from the NIEHS National Toxicology Program (NTP) and the Berkeley Carcinogenic Potency Database (CPDB) have provided the largest public data resources for building carcinogenicity prediction models to date. However, relatively few and limited representations of these data have actually informed existing models. Initiatives, such as EPA's DSSTox Database Network, offer elaborated and quality reviewed presentations of the CPDB and expanded data linkages and coverage of chemical space for carcinogenicity and mutagenicity. In particular the latest published DSSTox CPDBAS structure-data file includes a number of species-specific and summary activity fields, including a species-specific normalized score for carcinogenic potency (TD50) and various weighted summary activities. These data are being incorporated into PubChem to provide broad
Gold, L S; Manley, N B; Slone, T H; Garfinkel, G B; Rohrbach, L; Ames, B N
1993-01-01
This paper is the fifth plot of the Carcinogenic Potency Database (CPDB) that first appeared in this journal in 1984 (1-5). We report here results of carcinogenesis bioassays published in the general literature between January 1987 and December 1988, and in technical reports of the National Toxicology Program between July 1987 and December 1989. This supplement includes results of 412 long-term, chronic experiments of 147 test compounds and reports the same information about each experiment in the same plot format as the earlier papers: the species and strain of test animal, the route and duration of compound administration, dose level and other aspects of experimental protocol, histopathology and tumor incidence, TD50 (carcinogenic potency) and its statistical significance, dose response, author's opinion about carcinogenicity, and literature citation. We refer the reader to the 1984 publications (1,5,6) for a guide to the plot of the database, a complete description of the numerical index of carcinogenic potency, and a discussion of the sources of data, the rationale for the inclusion of particular experiments and particular target sites, and the conventions adopted in summarizing the literature. The five plots of the database are to be used together, as results of individual experiments that were published earlier are not repeated. In all, the five plots include results of 4487 experiments on 1136 chemicals. Several analyses based on the CPDB that were published earlier are described briefly, and updated results based on all five plots are given for the following earlier analyses: the most potent TD50 value by species, reproducibility of bioassay results, positivity rates, and prediction between species.(ABSTRACT TRUNCATED AT 250 WORDS) PMID:8354183
Target organs in chronic bioassays of 533 chemical carcinogens
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gold, L.S.; Slone, T.H.; Manley, N.B.
1991-06-01
A compendium of carcinogenesis bioassay results organized by target organ is presented for 533 chemicals that are carcinogenic in at least one species. This compendium is based primarily on experiments in rats or mice; results in hamsters, nonhuman primates, and dogs are also reported. The compendium can be used to identify chemicals that induce tumors at particular sites, and to determine whether target sites are the same for chemicals positive in more than one species. The Carcinogenic Potency Database (CPDB), which includes results of 3969 experiments, is used in the analysis. The published CPDB includes details on each test, andmore » literature references. Chemical carcinogens are reported for 35 different target organs in rats or mice. More than 80% of the carcinogens in each of these species are positive in at least one of the 8 most frequent target sites; liver, lung, mammary gland, stomach, vascular system, kidney, hematopoietic system, and urinary bladder. An analysis is presented of how well one can predict the carcinogenic response in mice from results in rats, or vice versa. Among chemicals tested in both species, 76% of rat carcinogens are positive in mice, and 71% of mouse carcinogens are positive in rats. Prediction is less accurate to the same target site: 52% of rat carcinogens are positive in the same site in mice, and 48% of mouse carcinogens are positive in the same site in rats. The liver is the most frequent site in common between rats and mice.« less
Cupriavidus pampae sp. nov., a novel herbicide-degrading bacterium isolated from agricultural soil.
Cuadrado, Virginia; Gomila, Margarita; Merini, Luciano; Giulietti, Ana M; Moore, Edward R B
2010-11-01
A bacterial consortium able to degrade the herbicide 4-(2,4-dichlorophenoxy) butyric acid (2,4-DB) was obtained from an agricultural soil of the Argentinean Humid Pampa region which has a history of long-term herbicide use. Four bacterial strains were isolated from the consortium and identified as members of the genera Cupriavidus, Labrys and Pseudomonas. A polyphasic systematic analysis was carried out on strain CPDB6(T), the member of the 2,4-DB-degrading consortium able to degrade 2,4-DB as a sole carbon and energy source. The Gram-negative, rod-shaped, motile, non-sporulating, non-fermenting bacterium was shown to belong to the genus Cupriavidus on the basis of 16S rRNA gene sequence analyses. Strain CPDB6(T) did not reduce nitrate, which differentiated it from the type species of the genus, Cupriavidus necator; it did not grow in 0.5-4.5 % NaCl, although most species of Cupriavidus are able to grow at NaCl concentrations as high as 1.5 %; and it was able to deamidate acetamide, which differentiated it from all other species of Cupriavidus. DNA-DNA hybridization data revealed low levels of genomic DNA similarity (less than 30 %) between strain CPDB6(T) and the type strains of Cupriavidus species with validly published names. The major cellular fatty acids detected were cis-9-hexadecenoic (16 : 1ω7c) and hexadecanoic (16 : 0) acids. On the basis of phenotypic and genotypic characterizations, strain CPDB6(T) was recognized as a representative of a novel species within the genus Cupriavidus. The name Cupriavidus pampae sp. nov. is proposed, with strain CPDB6(T) (=CCUG 55948(T)=CCM-A-29:1289(T)) as the type strain.
Submarine seep of carbon dioxide in Norton Sound, Alaska
Kvenvolden, K.A.; Weliky, K.; Nelson, H.; Des Marais, D.J.
1979-01-01
Earlier workers have described a submarine gas seep in Norton Sound having an unusual mixture of petroleum-like, low-molecular-weight hydrocarbons. Actually, only about 0.04 percent of the seeping gas is hydrocarbons and 98 percent is carbon dioxide. The isotopic compositions of carbon dioxide (??13CPDB = -2.7 per mil) and methane (??13CPDB = -36 per mil, where PDB is the Peedee belemnite standard) indicate that geothermal processes are active here. Copyright ?? 1979 AAAS.
Carbon isotopic composition of individual Precambrian microfossils
NASA Technical Reports Server (NTRS)
House, C. H.; Schopf, J. W.; McKeegan, K. D.; Coath, C. D.; Harrison, T. M.; Stetter, K. O.
2000-01-01
Ion microprobe measurements of carbon isotope ratios were made in 30 specimens representing six fossil genera of microorganisms petrified in stromatolitic chert from the approximately 850 Ma Bitter Springs Formation, Australia, and the approximately 2100 Ma Gunflint Formation, Canada. The delta 13C(PDB) values from individual microfossils of the Bitter Springs Formation ranged from -21.3 +/- 1.7% to -31.9 +/- 1.2% and the delta 13C(PDB) values from microfossils of the Gunflint Formation ranged from -32.4 +/- 0.7% to -45.4 +/- 1.2%. With the exception of two highly 13C-depleted Gunflint microfossils, the results generally yield values consistent with carbon fixation via either the Calvin cycle or the acetyl-CoA pathway. However, the isotopic results are not consistent with the degree of fractionation expected from either the 3-hydroxypropionate cycle or the reductive tricarboxylic acid cycle, suggesting that the microfossils studied did not use either of these pathways for carbon fixation. The morphologies of the microfossils suggest an affinity to the cyanobacteria, and our carbon isotopic data are consistent with this assignment.
Faure, G.; Botoman, G.
1984-01-01
Isotopic compositions of oxygen, carbon and strontium of calcite cleats in coal seams of southern Victoria Land, Antarctica, and Tuscarawas County, Ohio, contain a record of the conditions a the time of their formation. The Antarctic calcites (?? 18O(SMOW) = +9.14 to +11.82%0) were deposited from waters enriched in 16O whose isotopic composition was consistent with that of meteoric precipitation at low temperature and high latitude. The carbon of the calcite cleats (?? 13C(PDB) = -15.6 to -16.9%0) was derived in part from the coal (?? 13C(PDB) = -23.5 to -26.7%0) as carbon dioxide and by oxidation of methane or other hydrocarbon gases. The strontium ( 87Sr 86Sr = 0.71318-0.72392) originated primarily from altered feldspar grains in the sandstones of the Beacon Supergroup. Calcite cleats in the Kittaning No. 6 coal seam of Ohio (?? 18O(SMOW) = +26.04 to +27.79%0) were deposited from waters that had previously exchanged oxygen, possibly with marine carbonate at depth. The carbon (?? 13C(PDB) = 0.9 to +2.4%0) is enriched in 13C even though that cleats were deposited in coal that is highly enriched in 12C and apparently originated from marine carbonates. Strontium in the cleats ( Sr 87 0.71182-0.71260) is not of marine origin but contains varying amounts of radiogenic 87Sr presumably derived from detrital Rb-bearing minerals in the adjacent sedimentary rocks. The results of this study suggest that calcite cleats in coal of southern Victoria Land, Antarctica, were deposited after the start of glaciation in Cenozoic time and that those in Ohio precipitated from formation waters derived from the underlying marine carbonate rocks, probably in the recent geologic past. ?? 1984.
Carbon and its isotopes in mid-oceanic basaltic glasses
Des Marais, D.J.; Moore, J.G.
1984-01-01
Three carbon components are evident in eleven analyzed mid-oceanic basalts: carbon on sample surfaces (resembling adsorbed gases, organic matter, or other non-magmatic carbon species acquired by the glasses subsequent to their eruption), mantle carbon dioxide in vesicles, and mantle carbon dissolved in the glasses. The combustion technique employed recovered only reduced sulfur, all of which appears to be indigenous to the glasses. The dissolved carbon concentration (measured in vesicle-free glass) increases with the eruption depth of the spreading ridge, and is consistent with earlier data which show that magma carbon solubility increases with pressure. The total glass carbon content (dissolved plus vesicular carbon) may be controlled by the depth of the shallowest ridge magma chamber. Carbon isotopic fractionation accompanies magma degassing; vesicle CO2 is about 3.8??? enriched in 13C, relative to dissolved carbon. Despite this fractionation, ??13CPDB values for all spreading ridge glasses lie within the range -5.6 and -7.5, and the ??13CPDB of mantle carbon likely lies between -5 and -7. The carbon abundances and ??13CPDB values of Kilauea East Rift glasses apparently are influenced by the differentiation and movement of magma within that Hawaiian volcano. Using 3He and carbon data for submarine hydrothermal fluids, the present-day mid-oceanic ridge mantle carbon flux is estimated very roughly to be about 1.0 ?? 1013 g C/yr. Such a flux requires 8 Gyr to accumulate the earth's present crustal carbon inventory. ?? 1984.
Integrated web visualizations for protein-protein interaction databases.
Jeanquartier, Fleur; Jean-Quartier, Claire; Holzinger, Andreas
2015-06-16
Understanding living systems is crucial for curing diseases. To achieve this task we have to understand biological networks based on protein-protein interactions. Bioinformatics has come up with a great amount of databases and tools that support analysts in exploring protein-protein interactions on an integrated level for knowledge discovery. They provide predictions and correlations, indicate possibilities for future experimental research and fill the gaps to complete the picture of biochemical processes. There are numerous and huge databases of protein-protein interactions used to gain insights into answering some of the many questions of systems biology. Many computational resources integrate interaction data with additional information on molecular background. However, the vast number of diverse Bioinformatics resources poses an obstacle to the goal of understanding. We present a survey of databases that enable the visual analysis of protein networks. We selected M=10 out of N=53 resources supporting visualization, and we tested against the following set of criteria: interoperability, data integration, quantity of possible interactions, data visualization quality and data coverage. The study reveals differences in usability, visualization features and quality as well as the quantity of interactions. StringDB is the recommended first choice. CPDB presents a comprehensive dataset and IntAct lets the user change the network layout. A comprehensive comparison table is available via web. The supplementary table can be accessed on http://tinyurl.com/PPI-DB-Comparison-2015. Only some web resources featuring graph visualization can be successfully applied to interactive visual analysis of protein-protein interaction. Study results underline the necessity for further enhancements of visualization integration in biochemical analysis tools. Identified challenges are data comprehensiveness, confidence, interactive feature and visualization maturing.
Si, Wei; Wang, Xiumei; Liu, Huifang; Yu, Shenye; Li, Zhaoli; Chen, Liping; Zhang, Wanjiang; Liu, Siguo
2015-01-01
To construct a novel live, attenuated Salmonella vaccine, the lon, cpxR and cpdB genes were deleted from a wild-type Salmonella enterica serovar Enteritidis-6 (SM-6) strain using the phage λ Red homologous recombination system, resulting in SM-△CpxR, SM-△C/Lon and SM-△C/L/CpdB. The growth curves of strain SM-△C/Lon grew more rapidly than the other strains and had OD 600 values higher than the other strains starting at the 4 h time point. The growth curves of strain SM-△C/L/CpdB were relatively flat. The colonization time of SM-△C/L/CpdB is about 8-10 days. Deleting the lon/cpxR/cpdB (SM-6) genes resulted in an approximate 10(3)-fold attenuation in virulence assessed by the analysis of the LD50 of specific pathogen-free (SPF) chicks. This result indicated that the deletion of the lon, cpxR and cpdB genes induced significant virulence attenuation. The protective effects of SM-△C/L/CpdB vaccination in SPF chicks against 5 × 10(9) colony forming units (CFU) of S. Enteritidis were resulted from the induction of an effective immune response. These findings demonstrate the potential of mutant SM-△C/L/CpdB to be used as an effective vaccine. Copyright © 2015 Elsevier Ltd. All rights reserved.
In situ Analysis of North American Diamond: Implications for Diamond Growth Modeling
NASA Astrophysics Data System (ADS)
Schulze, D. J.; Van Rythoven, A. D.; Hauri, E.; Wang, J.
2014-12-01
Diamond crystals from three North American kimberlite occurrences were investigated with cathodoluminescence (CL) and secondary ion mass spectrometry (SIMS) to determine their growth history, carbon isotope composition and nitrogen content. Samples analyzed include sixteen from Lynx (Quebec), twelve from Kelsey Lake (Colorado) and eighteen from A154 South (Diavik mine, Northwest Territories). Growth histories for the samples vary from simple to highly complex based on their CL images and depending on the individual stone. Deformation lamellae are evident in CL images of the Lynx crystals which typically are brownish in color. Two to five points per diamond were analyzed by SIMS for carbon isotope composition (δ13CPDB) and three to seven points for nitrogen content. The results for the A154 South (δ13CPDB = -6.76 to -1.68 ‰) and Kelsey Lake (δ13CPDB = -11.81 to -2.43 ‰) stones (mixed peridotitic and eclogitic suites) are similar to earlier reported values. The Lynx kimberlite stones have anomalously high carbon isotope ratios and range from -3.58 to +1.74 ‰. The Lynx diamond suite is almost entirely peridotitic. The unusually high (i.e. >-5‰) δ13C values of the Lynx diamonds, as well as those from Wawa, Ontario and Renard, Quebec, may indicate an anomalous carbon reservoir for the Superior cratonic mantle relative to other cratons. In addition to the heavier carbon isotope values, the Lynx samples have very low nitrogen contents (<100 ppm). Nitrogen contents for Kelsey Lake and Diavik samples are more typical and range to ~1100 ppm. Comparison of observed core to rim variations in nitrogen content and carbon isotopes with modeled Rayleigh fractionation trends for published diamond growth mechanisms allows for evaluation of carbon speciation and other parent fluid conditions. Observed trends that closely follow modeled data are rare, but appear to suggest diamond growth from carbonate-bearing fluids at Lynx and Diavik, and growth from a methane-bearing fluid at Kelsey Lake. However the majority of crystals appear to have very complex growth histories that are clearly the result of multiple growth and resorption events. Trends observed in most of the samples from this study are chaotic and no consistent patterns are seen.
Chen, Lei; Chu, Chen; Lu, Jing; Kong, Xiangyin; Huang, Tao; Cai, Yu-Dong
2015-09-01
Cancer is one of the leading causes of human death. Based on current knowledge, one of the causes of cancer is exposure to toxic chemical compounds, including radioactive compounds, dioxin, and arsenic. The identification of new carcinogenic chemicals may warn us of potential danger and help to identify new ways to prevent cancer. In this study, a computational method was proposed to identify potential carcinogenic chemicals, as well as non-carcinogenic chemicals. According to the current validated carcinogenic and non-carcinogenic chemicals from the CPDB (Carcinogenic Potency Database), the candidate chemicals were searched in a weighted chemical network constructed according to chemical-chemical interactions. Then, the obtained candidate chemicals were further selected by a randomization test and information on chemical interactions and structures. The analyses identified several candidate carcinogenic chemicals, while those candidates identified as non-carcinogenic were supported by a literature search. In addition, several candidate carcinogenic/non-carcinogenic chemicals exhibit structural dissimilarity with validated carcinogenic/non-carcinogenic chemicals.
Advances in Toxico-Cheminformatics: Supporting a New ...
EPA’s National Center for Computational Toxicology is building capabilities to support a new paradigm for toxicity screening and prediction through the harnessing of legacy toxicity data, creation of data linkages, and generation of new high-throughput screening (HTS) data. The DSSTox project is working to improve public access to quality structure-annotated chemical toxicity information in less summarized forms than traditionally employed in SAR modeling, and in ways that facilitate both data-mining and read-across. Both DSSTox Structure-Files and the dedicated on-line DSSTox Structure-Browser are enabling seamless structure-based searching and linkages to and from previously isolated, chemically indexed public toxicity data resources (e.g., NTP, EPA IRIS, CPDB). Most recently, structure-enabled search capabilities have been extended to chemical exposure-related microarray experiments in the public EBI Array Express database, additionally linking this resource to the NIEHS CEBS toxicogenomics database. The public DSSTox chemical and bioassay inventory has been recently integrated into PubChem, allowing a user to take full advantage of PubChem structure-activity and bioassay clustering features. The DSSTox project is providing cheminformatics support for EPA’s ToxCastTM project, as well as supporting collaborations with the National Toxicology Program (NTP) HTS and the NIH Chemical Genomics Center (NCGC). Phase I of the ToxCastTM project is generating HT
Ordovician reef-hosted Jiaodingshan Mn-Co deposit and Dawashan Mn deposit, Sichuan Province, China
Fan, Delian; Hein, James R.; Ye, Jie
1999-01-01
The Jiaodingshan Mn-Co and Dawashan Mn deposits are located in the approximately 2-m thick Daduhe unit of the Wufengian strata of Late Ordovician (Ashgill) age. Paleogeographic reconstruction places the deposits at the time of their formation in a gulf between Chengdu submarine rise and the Kangdian continent. The Jiaodingshan and Dawashan deposits occur in algal-reef facies, the former in an atoll-like structure and the latter in a pinnacle reef. Ores are mainly composed of rhodochrosite, kutnahorite, hausmannite, braunite, manganosite, and bementite. Dark red, yellowish-pink, brown, green-gray, and black ores are massive, banded, laminated, spheroidal, and cryptalgal (oncolite, stromatolite, algal filaments) boundstones. Blue, green, and red algal fossils show in situ growth positions. Samples of high-grade Jiaodingshan and Dawashan ores assay as much as 66.7% MnO. Jiaodingshan Mn carbonate ores have mean contents of Ba, Co, and Pb somewhat higher than in Dawashan ores. Cobalt is widely distributed and strongly enriched in all rock types as compared to its crustal mean content. Cobalt is correlated with Cu, Ni, and MgO in both deposits and additionally with Ba and Zn in the Dawashan deposit. The δ13C(PDB) values of Mn carbonate ores (-7.8 to -16.3‰) indicate contributions of carbon from both seawater bicarbonate and the bacterial degradation of organic matter, the latter being 33% to 68%, assuming about -24‰ for the δ13C(PDB) of the organic matter. Host limestones derived carbon predominantly from seawater bicarbonate δ1313C(PDB) of +0.2 to -7‰). NW-trending fault zones controlled development of lithofacies, whereas NE-trending fault zones provided pathways for movement of fluids. The source of Co, Ni, and Cu was mainly from weathering of mafic and ultramafic rocks on the Kangdian continent, whereas contemporaneous volcanic eruptions were of secondary importance. The reefs were likely mineralized during early diagenesis under shallow burial. The reefs were highly porous and acted as the locus for metasomatic replacement by Mn that combined with CO2 produced during oxidation of organic matter in the zone of sulfate reduction and seawater bicarbonate. That metasomatic replacement formed the rhodochrosite ores.
McDonough, EmilyKate; Kamp, Heather
2015-01-01
Summary Phosphate is essential for life, being used in many core processes such as signal transduction and synthesis of nucleic acids. The waterborne agent of cholera, V ibrio cholerae, encounters phosphate limitation in both the aquatic environment and human intestinal tract. This bacterium can utilize extracellular DNA (eDNA) as a phosphate source, a phenotype dependent on secreted endo‐ and exonucleases. However, no transporter of nucleotides has been identified in V . cholerae, suggesting that in order for the organism to utilize the DNA as a phosphate source, it must first separate the phosphate and nucleoside groups before transporting phosphate into the cell. In this study, we investigated the factors required for assimilation of phosphate from eDNA. We identified PhoX, and the previously unknown proteins UshA and CpdB as the major phosphatases that allow phosphate acquisition from eDNA and nucleotides. We demonstrated separable but partially overlapping roles for the three phosphatases and showed that the activity of PhoX and CpdB is induced by phosphate limitation. Thus, this study provides mechanistic insight into how V . cholerae can acquire phosphate from extracellular DNA, which is likely to be an important phosphate source in the environment and during infection. PMID:26175126
McDonough, EmilyKate; Kamp, Heather; Camilli, Andrew
2016-02-01
Phosphate is essential for life, being used in many core processes such as signal transduction and synthesis of nucleic acids. The waterborne agent of cholera, Vibrio cholerae, encounters phosphate limitation in both the aquatic environment and human intestinal tract. This bacterium can utilize extracellular DNA (eDNA) as a phosphate source, a phenotype dependent on secreted endo- and exonucleases. However, no transporter of nucleotides has been identified in V. cholerae, suggesting that in order for the organism to utilize the DNA as a phosphate source, it must first separate the phosphate and nucleoside groups before transporting phosphate into the cell. In this study, we investigated the factors required for assimilation of phosphate from eDNA. We identified PhoX, and the previously unknown proteins UshA and CpdB as the major phosphatases that allow phosphate acquisition from eDNA and nucleotides. We demonstrated separable but partially overlapping roles for the three phosphatases and showed that the activity of PhoX and CpdB is induced by phosphate limitation. Thus, this study provides mechanistic insight into how V. cholerae can acquire phosphate from extracellular DNA, which is likely to be an important phosphate source in the environment and during infection. © 2015 The Authors. Molecular Microbiology published by John Wiley & Sons Ltd.
Mya, Khine Y; Lin, Esther M J; Gudipati, Chakravarthy S; Gose, Halima B A S; He, Chaobin
2010-07-22
Poly(hexafluorobutyl methacrylate) (PHFBMA) homopolymer was synthesized by reversible addition-fragmentation chain transfer (RAFT)-mediated living radical polymerization in the presence of cyano-2-propyl dithiobenzoate (CPDB) RAFT agent. A block copolymer of PHFBMA-poly(propylene glycol acrylate) (PHFBMA-b-PPGA) with dangling poly(propylene glycol) (PPG) side chains was then synthesized by using CPDB-terminated PHFBMA as a macro-RAFT agent. The amphiphilic properties and self-assembly of PHFBMA-b-PPGA block copolymer in aqueous solution were investigated by dynamic and static light scattering (DLS and SLS) studies, in combination with fluorescence spectroscopy and transmission electron microscopy (TEM). Although PPG shows moderately hydrophilic character, the formation of nanosize polymeric micelles was confirmed by fluorescence and TEM studies. The low value of the critical aggregation concentration exhibited that the tendency for the formation of copolymer aggregates in aqueous solution was very high due to the strong hydrophobicity of the PHFBMA(145)-b-PPGA(33) block copolymer. The combination of DLS and SLS measurements revealed the existence of micellar aggregates in aqueous solution with an association number of approximately 40 +/- 7 for block copolymer micelles. It was also found in TEM observation that there are 40-50 micelles accumulated into one aggregate and these micelles are loosely packed inside the aggregate.
NASA Astrophysics Data System (ADS)
Sturrock, Colin P.; Catlos, Elizabeth J.; Miller, Nathan R.; Akgun, Aykut; Fall, András; Gabitov, Rinat I.; Yilmaz, Ismail Omer; Larson, Toti; Black, Karen N.
2017-08-01
Six limestone assemblages along the North Anatolian Fault (NAF) Niksar pull-apart basin in northern Turkey were analyzed for δ18OPDB and δ13CPDB using bulk isotope ratio mass spectrometry (IRMS). Matrix-vein differences in δ18OPDB (-2.1 to 6.3‰) and δ13CPDB (-0.9 to 4.6‰) suggest a closed fluid system and rock buffering. Veins in one travertine and two limestone assemblages were further subjected to cathodoluminescence, trace element (Laser Ablation Inductively Coupled Plasma Mass Spectrometry) and δ18OPDB (Secondary Ion Mass Spectrometry, SIMS) analyses. Fluid inclusions in one limestone sample yield Th of 83.8 ± 7.3 °C (±1σ, mean average). SIMS δ18OPDB values across veins show fine-scale variations interpreted as evolving thermal conditions during growth and limited rock buffering seen at a higher-resolution than IRMS. Rare earth element data suggest calcite veins precipitated from seawater, whereas the travertine has a hydrothermal source. The δ18OSMOW-fluid for the mineralizing fluid that reproduces Th is +2‰, in range of Cretaceous brines, as opposed to negative δ18OSMOW-fluid from meteoric, groundwater, and geothermal sites in the region and highly positive δ18OSMOW-fluid expected for mantle-derived fluids. Calcite veins at this location do not record evidence for deeply-sourced metamorphic and magmatic fluids, an observation that differs from what is reported for the NAF elsewhere along strike.
Kvenvolden, K.A.; Hostettler, F.D.; Carlson, P.R.; Rapp, J.B.; Threlkeld, C.N.; Warden, A.
1995-01-01
Although the shorelines of Prince William Sound still bear traces of the 1989 Exxon Valdez oil spill, most of the flattened tar balls that can be found today on these shorelines are not residues of Exxon Valdez oil. Instead, the carbon-isotopic and hydrocarbon-biomarker signatures of 61 tar ball samples, collected from shorelines throughout the northern and western parts of the sound, are all remarkably similar and have characteristics consistent with those of oil products that originated from the Monterey Formation source rocks of California. The carbon-isotopic compositions of the tar balls are all closely grouped (??13CPDB = -23.7 ?? 0.2???), within the range found in crude oils from those rocks, but are distinct from isotopic compositions of 28 samples of residues from the Exxon Valdez oil spill (??13CPDB = -29.4 ?? 0.1???). Likewise, values for selected biomarker ratios in the tar balls are all similar but distinct from values of residues from the 1989 oil spill. Carbon-isotopic and biomarker signatures generally relate the tar balls to oil products used in Alaska before ???1970 for construction and pavements. How these tar balls with such similar geochemical characteristics became so widely dispersed throughout the northern and western parts of the sound is not known with certainty, but the great 1964 Alaska earthquake was undoubtedly an important trigger, causing spills from ruptured storage facilities of California-sourced asphalt and fuel oil into Prince William Sound.
NASA Astrophysics Data System (ADS)
Noël, J.; Godard, M.; Martinez, I.; Oliot, E.; Williams, M. J.; Rodriguez, O.; Chaduteau, C.; Gouze, P.
2017-12-01
Carbon trapping in ophiolitic peridotites contributes to the global carbon cycle between solid Earth and its outer envelopes (through subduction and/or modern alteration). To investigate this process, we performed petro-structural (microtomography, EBSD, EPMA) and geochemical studies (LA-ICP-MS, carbon and oxygen isotopes on bulk and minerals using SHRIMP) of harzburgites cored in the Oman Ophiolite. Studied harzburgites are highly serpentinized (> 90 %) and crosscut by 3 generations of carbonates (> 20 Vol%) with compositions from calcite to dolomite (Mg/Ca = 0-0.85). Type 1 carbonates are fine penetrative veinlets and mesh core after olivine. They have low REE (e.g., Yb = 0.08-0.23 x CI-chondrite) and negative Ce anomalies. They have δ13CPDB = -15.2 to 1.10‰ and δ18OSMOW = 17.5 to 33.7‰, suggesting precipitation temperatures up to 110°C. Type 2 carbonates are pluri-mm veins bounded by cm-thick serpentinized vein selvages, oriented dominantly parallel to mantle foliation. Dynamic recrystallization is observed, indicating polygenetic formation: well crystallized calcite with REE abundances similar to Type 1 carbonates are locally replaced by small dolomite and calcite grains with higher REE (e.g., Yb = 0.35-1.0 x CI-chondrite) and positive Gd anomaly. Type 2 carbonates have δ13CPDB = -12.6 to -4.1‰ and δ18OSMOW = 25.0 to 32.7‰, suggesting precipitation temperatures from 10 to 60°C. Type 3 carbonates are late pluri-mm to cm veins reactivating Type 2 veins. They consist of small grains of dolomite and calcite with REE abundances similar to recrystallized Type 2 carbonates. Type 3 carbonates have δ13CPDB = -8.3 to -5.8‰ and δ18OSMOW = 28.8 to 32.7‰, suggesting precipitation temperatures <35°C. δ13C data indicate an evolution of fluid composition precipitating carbonates from seawater- and sediment-derived fluids to meteoric water. Carbonate formation starts during oceanic lithospheric cooling and occurs as a penetrative process at the expense of olivine (Type 1, at T > 100°C). Formation of carbonate veins (Type 2) indicates localization of fluid flux, while serpentinization remains the dominant alteration process. Low T carbonate veins (Type 3) remain the main flow path through ophiolitic peridotites. Our study suggests that their orientation is controlled by the later stages of oceanic mantle deformation.
EUV lithographic radiation grafting of thermo-responsive hydrogel nanostructures
NASA Astrophysics Data System (ADS)
Farquet, Patrick; Padeste, Celestino; Solak, Harun H.; Gürsel, Selmiye Alkan; Scherer, Günther G.; Wokaun, Alexander
2007-12-01
Nanostructures of the thermoresponsive poly( N-isopropyl acrylamide) (PNIPAAm) and of PNIPAAm-block-poly(acrylic acid) copolymers were produced on poly(tetrafluoroethylene-co-ethyelene) (ETFE) films using extreme ultraviolet (EUV) lithographic exposure with subsequent graft-polymerization. The phase transition of PNIPAAm nanostructures at the low critical solution temperature (LCST) at 32 °C was imaged by atomic force microscopy (AFM) phase contrast measurements in pure water. Results show a higher phase contrast for samples measured below the LCST temperature than for samples above the LCST, proving that the soft PNIPAAm hydrogel transforms into a much more compact conformation above the LCST. EUV lithographic exposures were combined with the reversible addition-fragment chain transfer (RAFT)-mediated polymerization using cyanoisopropyl dithiobenzoate (CPDB) as chain transfer agent to synthesize PNIPAAm block-copolymer nanostructures.
Contrasting fault fluids along high-angle faults: a case study from Southern Apennines (Italy)
NASA Astrophysics Data System (ADS)
Sinisi, Rosa; Petrullo, Angela Vita; Agosta, Fabrizio; Paternoster, Michele; Belviso, Claudia; Grassa, Fausto
2016-10-01
This work focuses on two fault-controlled deposits, the Atella and Rapolla travertines, which are associated with high-angle extensional faults of the Bradano Trough, southern Apennines (Italy). The Atella travertine is along a NW-SE striking, deep-seated extensional fault, already described in literature, which crosscuts both Apulian carbonates and the overlying foredeep basin infill. The Rapolla travertine is on top of a NE-SW striking, shallow-seated fault, here described for the first time, which is interpreted as a tear fault associated with a shallow thrust displacing only the foredeep basin infill. The results of structural, sedimentological, mineralogical, and C and O isotope analyses are here reported and discussed to assess the provenance of mineralizing fluids, and to evaluate the control exerted by the aforementioned extensional faults on deep, mantle-derived and shallow, meteoric fluids. Sedimentological analysis is consistent with five lithofacies in the studied travertines, which likely formed in a typical lacustrine depositional environment. Mineralogical analysis show that travertines mainly consist of calcite, and minor quartz, feldspar and clay minerals, indicative of a terrigenous supply during travertine precipitation. The isotope signature of the two studied travertines shows different provenance for the mineralizing fluids. At the Atella site, the δ13CPDB values range between + 5.2 and + 5.7‰ and the δ18OPDB values between - 9.0 and - 7.3‰, which are consistent with a mantle-derived CO2 component in the fluid. In contrast, at the Rapolla site the δ13CPDB values vary from - 2.7 to + 1.5‰ and the δ18OPDB values from - 6.8 to - 5.4‰, suggesting a mixed CO2 source with both biogenic-derived and mantle-derived fluids. The results of structural analyses conducted along the footwall damage zone of the fault exposed at the Rapolla site, show that the whole damage zone, in which fractures and joints likely channeled the mixed fluids, acted as a distributed conduit for both fault-parallel and cross-fault fluid migration.
Endolithic Boring Enhance the Deep-sea Carbonate Lithification on the Southwest Indian Ridge
NASA Astrophysics Data System (ADS)
Peng, X.; Xu, H.
2017-12-01
Deep-sea carbonates represent an important type of sedimentary rock due to their effect on the composition of upper oceanic crust and their contribution to deep-sea geochemical cycles. However, the lithification of deep-sea carbonates at the seafloor has remained a mystery for many years. A large lithified carbonate area, characterized by thriving benthic faunas and tremendous amount of endolithic borings, was discovered in 2008, blanketed on the seafloor of ultraslow spreading Southwest Indian Ridge (SWIR). Macrofaunal inhabitants including echinoids, polychaetes, gastropods as well as crustaceans, are abundant in the sample. The most readily apparent feature of the sample is the localized enhancement of density around the borings. The boring features of these carbonate rocks and factors that may enhance deep-sea carbonate lithification are reported. The δ13CPDB values of 46 bulk samples are -0.37 to 1.86‰, while these samples have a relatively narrow δ18OPDB range of 1.35 to 3.79‰. The bulk δ13CPDB values of chalk and gray excrements are positively correlated with bulk δ18OPDB values (r = 0.91) (Fig. 8), which reflects that endolithic boring is possibly a critical factor influence the lithification. We suggest that active boring may trigger the dissolution of the original calcite and thus accelerate deep-sea carbonate lithification on mid-ocean ridges. Our study reports an unfamiliar phenomenon of non-burial carbonate lithification and interested by the observation that it is often associated with boring feature. These carbonate rocks may provide a novel mechanism for deep-sea carbonate lithification at the deep-sea seafloor and also illuminate the geological and biological importance of deep-sea carbonate rocks on mid-ocean ridges.
Garvie, Laurence A J; Knauth, L Paul; Bungartz, Frank; Klonowski, Stan; Nash, Thomas H
2008-08-01
Verrucaria rubrocincta Breuss is an endolithic lichen that inhabits caliche plates exposed on the surface of the Sonoran Desert. Caliche surface temperatures are regularly in excess of 60 degrees C during the summer and approach 0 degrees C in the winter. Incident light intensities are high, with photosynthetically active radiation levels typically to 2,600 micromol/m(2) s(-1) during the summer. A cross-section of rock inhabited by V. rubrocincta shows an anatomical zonation comprising an upper micrite layer, a photobiont layer containing clusters of algal cells, and a pseudomedulla embedded in the caliche. Hyphae of the pseudomedulla become less numerous with depth below the rock surface. Stable carbon and oxygen isotopic data for the caliche and micrite fall into two sloping, well-separated arrays on a delta(13)C-delta(18)O plot. The delta(13)C(PDB) of the micrite ranges from 2.1 to 8.1 and delta(18)O(SMOW) from 25.4 to 28.9, whereas delta(13)C(PDB) of the caliche ranges from -4.7 to 0.7 and delta(18)O(SMOW) from 23.7 to 29.2. The isotopic data of the micrite can be explained by preferential fixing of (12)C into the alga, leaving local (13)C enrichment and evaporative enrichment of (18)O in the water. The (14)C dates of the micrite range from recent to 884 years b.p., indicating that "dead" carbon from the caliche is not a significant source for the lichen-precipitated micrite. The endolithic growth is an adaptation to the environmental extremes of exposed rock surfaces in the hot desert. The micrite layer is highly reflective and reduces light intensity to the algae below and acts as an efficient sunscreen that blocks harmful UV radiation. The micrite also acts as a cap to the lichen and helps trap moisture. The lichen survives by the combined effects of biodeterioration and biomineralization. Biodeterioration of the caliche concomitant with biomineralization of a protective surface coating of micrite results in the distinctive anatomy of V. rubrocincta.
NASA Astrophysics Data System (ADS)
Reis, A.; McGlue, M. M.; Waite, L.; Erhardt, A. M.
2017-12-01
Diagenetic processes influenced by changing climate, eustatic fluctuations, and porewater evolution led to the formation and alteration of carbonate layers in the Pennsylvanian Wolfcamp D Formation of the Midland Basin. Preliminary evidence from bulk geochemistry, oxygen and carbon stable isotopes, and petrographic analysis of the carbonates recovered from two drill cores indicate multiple generations of diagenesis. High Mg calcite and dolomite layers predominantly occur in the fine grained intervals of both cores. Whereas there are less carbonate layers in the central basin core, more of the layers underwent diagenesis compared to the carbonates in the southern core. δ13CPDB values ranging from -6‰ to -4‰ and the presence of framboidal pyrite indicate initial dolomite precipitation occurring in the zone of bacterial sulfate reduction. Later stages alteration occurred following the burial diagenesis of clay, releasing Mg2+ and Fe2+ into the pore waters allowing ferroan dolomite rims to precipitate on the precursor iron-poor dolomite rhombs. δ13CPDB and δ18OPDBvalues from altered beds in the southern core show a positive 4-6‰ offset from the central basin beds. Petrographic analysis of the carbonate intervals shows a larger allochem size, and lower pyrite abundance in the southern core. These differences can be associated with a shorter source-to-sink distance and less frequent bottom water anoxia, leading to reduced rates of sulfate reduction. One possibility we will explore is if increased circulation due to the proximity of the southern core to the Sheffield Channel could stabilize the bottom water conditions in this region of the basin. In addition to dolomite precipitation and replacement, scanning electron microscopy reveals the replacement of silica cements by calcite, suggesting an increase in porewater pH during or following sulfate reduction coinciding with pyrite formation. Changing bottom water chemistry tied to fluctuations in sea-level through time led to porewater conditions favorable to several generations of post-depositional diagenesis.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Singh, Kunwar P., E-mail: kpsingh_52@yahoo.com; Environmental Chemistry Division, CSIR-Indian Institute of Toxicology Research, Post Box 80, Mahatma Gandhi Marg, Lucknow 226 001; Gupta, Shikha
Robust global models capable of discriminating positive and non-positive carcinogens; and predicting carcinogenic potency of chemicals in rodents were developed. The dataset of 834 structurally diverse chemicals extracted from Carcinogenic Potency Database (CPDB) was used which contained 466 positive and 368 non-positive carcinogens. Twelve non-quantum mechanical molecular descriptors were derived. Structural diversity of the chemicals and nonlinearity in the data were evaluated using Tanimoto similarity index and Brock–Dechert–Scheinkman statistics. Probabilistic neural network (PNN) and generalized regression neural network (GRNN) models were constructed for classification and function optimization problems using the carcinogenicity end point in rat. Validation of the models wasmore » performed using the internal and external procedures employing a wide series of statistical checks. PNN constructed using five descriptors rendered classification accuracy of 92.09% in complete rat data. The PNN model rendered classification accuracies of 91.77%, 80.70% and 92.08% in mouse, hamster and pesticide data, respectively. The GRNN constructed with nine descriptors yielded correlation coefficient of 0.896 between the measured and predicted carcinogenic potency with mean squared error (MSE) of 0.44 in complete rat data. The rat carcinogenicity model (GRNN) applied to the mouse and hamster data yielded correlation coefficient and MSE of 0.758, 0.71 and 0.760, 0.46, respectively. The results suggest for wide applicability of the inter-species models in predicting carcinogenic potency of chemicals. Both the PNN and GRNN (inter-species) models constructed here can be useful tools in predicting the carcinogenicity of new chemicals for regulatory purposes. - Graphical abstract: Figure (a) shows classification accuracies (positive and non-positive carcinogens) in rat, mouse, hamster, and pesticide data yielded by optimal PNN model. Figure (b) shows generalization and predictive abilities of the interspecies GRNN model to predict the carcinogenic potency of diverse chemicals. - Highlights: • Global robust models constructed for carcinogenicity prediction of diverse chemicals. • Tanimoto/BDS test revealed structural diversity of chemicals and nonlinearity in data. • PNN/GRNN successfully predicted carcinogenicity/carcinogenic potency of chemicals. • Developed interspecies PNN/GRNN models for carcinogenicity prediction. • Proposed models can be used as tool to predict carcinogenicity of new chemicals.« less
Diploptene: an indicator of terrigenous organic carbon in Washington coastal sediments
NASA Technical Reports Server (NTRS)
Prahl, F. G.; Hayes, J. M.
1992-01-01
The pentacyclic triterpene 17 beta(H),21 beta(H)-hop-22(29)-ene (diploptene) occurs in sediments throughout the Columbia River drainage basin and off the southern coast of Washington state in concentrations comparable to long-chain plantwax n-alkanes. The same relationship is evident for diploptene and long-chain n-alkanes in soils from the Willamette Valley. Microorganisms indigenous to soils and soil erosion are indicated as the biological source and physical process, respectively, for diploptene in coastal sediments. Similarity between the stable carbon isotopic composition (delta 13CPDB) of diploptene isolated from soil in the Willamette Valley (-31.2 +/- 0.3%) and from sediments deposited throughout the Washington coastal environment (-31.2 +/- 0.5%) supports this argument. Values of delta for diploptene in river sediments are variable and 8-17% lighter, indicating that an additional biological source such as methane-oxidizing bacteria makes a significant contribution to the diploptene record in river sediments. Selective biodegradation resulting from a difference in the physicochemical association within eroded particles can explain the absence of the more-13C-depleted form of diploptene in Washington coastal sediments, but this mechanism remains unproven.
Seewald, Jeffrey S.; Seyfried, W.E.; Shanks, Wayne C.
1994-01-01
Organic-rich diatomaceous ooze was reacted with seawater and a Na-Ca-K-Cl fluid of seawater chlorinity at 325-400??C, 400-500 bars, and fluid/sediment mass ratios of 1.56-2.35 to constrain factors regulating the abundance and stable isotope composition of C and S species during hydrothermal alteration of sediment from Guaymas Basin, Gulf of California. Alteration of inorganic and organic sedimentary components resulted in extensive exchange reactions, the release of abundant H2S, CO2, CH4, and Corganic, to solution, and recrystallization of the sediment to an assemblage containing albitic plagioclase, quartz, pyrrhotite, and calcite. The ??34Scdt values of dissolved H2S varied from -10.9 to +4.3??? during seawater-sediment interaction at 325 and 400??C and from -16.5 to -9.0??? during Na-Ca-K-Cl fluid-sediment interaction at 325 and 375??C. In the absence of seawater SO4, H2S is derived from both the transformation of pyrite to pyrrhotite and S released during the degradation of organic matter. In the presence of seawater SO4, reduction of SO4 contributes directly to H2S production. Sedimentary organic matter acts as the reducing agent during pyrite and SO4 reduction. Requisite acidity for the reduction of SO4 is provided by Mg fixation during early-stage sediment alteration and by albite and calcite formation in Mg-free solutions. Organically derived CH4 was characterized by ??13Cpdb values ranging between -20.8 and -23.1???, whereas ??13Cpdb values for dissolved Corganic ranged between -14.8 and -17.7%. Mass balance calculations indicate that ??13C values for organically derived CO2 were ??? - 14.8%. Residual solid sedimentary organic C showed small (??? 0.7???) depletions in 13C relative to the starting sediment. The experimental results are consistent with the isotopic and chemical composition of natural hydrothermal fluids and minerals at Guaymas Basin and permit us to better constrain sources and sinks for C and S species in subseafloor hydrothermal systems at sediment-covered spreading centers. Our data show that the sulfur isotope composition of hydrothermal Sulfide minerals in Guaymas Basin can be explained by derivation of S from diagenetic sulfide and seawater sulfate. Basaltic S may also contribute to hydrothermal sulfide precipitates but is not required to explain their isotopic composition. Estimates of seawater/ sediment mass ratios based on sulfur isotopic composition of sulfide minerals and the abundance of dissolved NH3 in vent fluids range from 3-29 during hydrothermal circulation. Sources of C in Guaymas Basin hydrothermal fluids include thermal degradation of organic matter, bacteriogenic methane production, and dissolution of diagenetic carbonate. ?? 1994.
Song, Min-Ae; Marian, Catalin; Brasky, Theodore M; Reisinger, Sarah; Djordjevic, Mirjana; Shields, Peter G
2016-03-14
Use of smokeless tobacco products (STPs) is associated with oral cavity cancer and other health risks. Comprehensive analysis for chemical composition and toxicity is needed to compare conventional and newer STPs with lower tobacco-specific nitrosamines (TSNAs) yields. Seven conventional and 12 low-TSNA moist snuff products purchased in the U.S., Sweden, and South Africa were analyzed for 18 chemical constituents (International Agency for Research on Cancer classified carcinogens), pH, nicotine, and free nicotine. Chemicals were compared in each product using Wilcoxon rank-sum test and principle component analysis (PCA). Conventional compared to low-TSNA moist snuff products had higher ammonia, benzo[a]pyrene, cadmium, nickel, nicotine, nitrate, and TSNAs and had lower arsenic in dry weight content and per mg nicotine. Lead and chromium were significantly higher in low-TSNA moist snuff products. PCA showed a clear difference for constituents between conventional and low-TSNA moist snuff products. Differences among products were reduced when considered on a per mg nicotine basis. As one way to contextualize differences in constituent levels, probabilistic lifetime cancer risk was estimated for chemicals included in The University of California's carcinogenic potency database (CPDB). Estimated probabilistic cancer risks were 3.77-fold or 3-fold higher in conventional compared to low-TSNA moist snuff products under dry weight or under per mg nicotine content, respectively. In vitro testing for the STPs indicated low level toxicity and no substantial differences. The comprehensive chemical characterization of both conventional and low-TSNA moist snuff products from this study provides a broader assessment of understanding differences in carcinogenic potential of the products. In addition, the high levels and probabilistic cancer risk estimates for certain chemical constituents of smokeless tobacco products will further inform regulatory decision makers and aid them in their efforts to reduce carcinogen exposure in smokeless tobacco products. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Potentially biogenic carbon preserved in a 4.1 billion-year-old zircon.
Bell, Elizabeth A; Boehnke, Patrick; Harrison, T Mark; Mao, Wendy L
2015-11-24
Evidence of life on Earth is manifestly preserved in the rock record. However, the microfossil record only extends to ∼ 3.5 billion years (Ga), the chemofossil record arguably to ∼ 3.8 Ga, and the rock record to 4.0 Ga. Detrital zircons from Jack Hills, Western Australia range in age up to nearly 4.4 Ga. From a population of over 10,000 Jack Hills zircons, we identified one >3.8-Ga zircon that contains primary graphite inclusions. Here, we report carbon isotopic measurements on these inclusions in a concordant, 4.10 ± 0.01-Ga zircon. We interpret these inclusions as primary due to their enclosure in a crack-free host as shown by transmission X-ray microscopy and their crystal habit. Their δ(13)CPDB of -24 ± 5‰ is consistent with a biogenic origin and may be evidence that a terrestrial biosphere had emerged by 4.1 Ga, or ∼ 300 My earlier than has been previously proposed.
Potentially biogenic carbon preserved in a 4.1 billion-year-old zircon
Bell, Elizabeth A.; Harrison, T. Mark; Mao, Wendy L.
2015-01-01
Evidence of life on Earth is manifestly preserved in the rock record. However, the microfossil record only extends to ∼3.5 billion years (Ga), the chemofossil record arguably to ∼3.8 Ga, and the rock record to 4.0 Ga. Detrital zircons from Jack Hills, Western Australia range in age up to nearly 4.4 Ga. From a population of over 10,000 Jack Hills zircons, we identified one >3.8-Ga zircon that contains primary graphite inclusions. Here, we report carbon isotopic measurements on these inclusions in a concordant, 4.10 ± 0.01-Ga zircon. We interpret these inclusions as primary due to their enclosure in a crack-free host as shown by transmission X-ray microscopy and their crystal habit. Their δ13CPDB of −24 ± 5‰ is consistent with a biogenic origin and may be evidence that a terrestrial biosphere had emerged by 4.1 Ga, or ∼300 My earlier than has been previously proposed. PMID:26483481
Growth and development of spring towers at Shiqiang, Yunnan Province, China
NASA Astrophysics Data System (ADS)
Jones, Brian; Peng, Xiaotong
2017-01-01
Throughout the world, high artesian pressures in hydrothermal areas have led to the growth of tall spring towers that have their vents at their summits. The factors that control their development and formative precipitates are poorly understood because these springs, irrespective of location, are mostly inactive. Spring towers found at Shiqiang (Yunnan Province, China), which are up to 4 m high and 3 m in diameter, are formed largely of calcite and aragonite crystal bushes, euhedral calcite crystals and coated grains with alternating Fe-poor and Fe-rich zones, calcite rafts, and cements formed of various combinations of calcite, aragonite, strontianite, Mg-Si reticulate, needle fiber calcite, calcified and non-calcified microbes, diatoms, and insects. Collectively, the limestones that form the towers can be divided into (1) Group A that are friable, porous and form the cores of the towers and have δ18OSMOW values of + 15.7 to + 19.7‰ (average 17.8‰) and δ13CPDB values of + 5.1 to + 6.9‰ (average 5.9‰), and (2) Group B that are hard and well lithified and found largely around the vents and the tower sides, and have δ18OSMOW values of + 13.0 to + 22.0‰ (average 17.6‰) and δ13CPDB values of + 1.4 to + 3.6‰ (average 2.6‰). The precipitates and the isotopic values indicate that these were thermogene springs. Growth of the Shiqiang spring towers involved (1) Phase IA when precipitation of calcite and aragonite bushes formed the core of the tower and Phase IB when calcite, commonly Fe-rich, was precipitated locally, (2) Phase II that involved the precipitation of white cements, formed of calcite, aragonite, strontianite, and Mg-Si reticulate coatings in cavities amid the Phase I precipitates, and (3) Phase III, which formed probably after spring activity ceased, when needle-fiber calcite was precipitated and the mounds were invaded by microbes (some now calcified), diatoms, and insects. At various times during this complex history, pore waters mediated dissolution of the calcite and aragonite and sometimes partial alteration of the aragonite. The diverse array of precipitates, depositional fabrics and diagenetic changes clearly indicate that the composition of the spring water changed frequently. Growth of the spring towers at Shiqiang continued until there was insufficient artesian pressure to lift the water above the top of the tower vent.
NASA Astrophysics Data System (ADS)
Zhu, Shifa; Qin, Yi; Liu, Xin; Wei, Chengjie; Zhu, Xiaomin; Zhang, Wei
2017-04-01
Although dolomitization of calcite minerals and carbonatization of volcanic rocks have been studied widely, the extensive dolomitic rocks that originated from altered volcanic and volcaniclastic rocks have not been reported. The dolomitic rocks of the Fengcheng Formation in the Junggar Basin of China appear to be formed under unusual geologic conditions. The petrological and geochemical characteristics indicate that the dolomitizing host rock is devitrified volcanic tuff. After low-temperature alteration and calcitization, these tuffaceous rocks are replaced by Mg-rich brine to form massive dolomitic tuffs. We propose that the briny (with -2 ‰ 6 ‰ of δ13CPDB and -5 ‰ 4 ‰ of δ18OPDB) and Mg-rich marine formation water (with 0.7060 0.7087 of 87Sr/86Sr ratio), the thick and intermediate-mafic volcanic ashes, and the tectonically compressional movement may have favored the formation of the unusual dolomitic rocks. We conclude that the proposed origin of the dolomitic rocks can be extrapolated to other similar terranes with volcaniclastic rocks, seabed tuffaceous sediment, and fracture filling of sill.
NASA Astrophysics Data System (ADS)
Matson, Ernest A.
1989-01-01
Stable C isotope ratios (δ13C-PDB), percentages of organic matter, and HCl insoluble ash and soluble carbonates, extractable Fe, Al, Si and P were used to determine the distribution and accumulation of terrestrial material in reef-flat moats and lagoons of two high islands (Guam and Saipan) in the western tropical Pacific. Carbonate sediments of a reef-flat moat infiltrated by seepage of aquifer waters (but without surface runoff) were depleted in both P (by 38%) and 13C (by 41%) and enriched in Si (by 100%) relative to offshore lagoon sediments. Iron and ash accumulated in depositional regimes regardless of the occurrence of runoff but was depleted from coarse-grained carbonates in turbulent regimes. Aluminum (>ca. 10 to 20 μmol g-1), Fe (>ca. 1 to 3 μmol g-1) and ash (>0.5%) indicated terrigenous influence which was corroborated by depletions in both 13C and P. Low-salinity geochemical segregation, natural biochemical accumulation, as well as long-shore currents and eddies help sequester these materials nearshore.
Affinities and architecture of Devonian trunks of Prototaxites loganii.
Retallack, G J; Landing, Ed
2014-01-01
Devonian fossil logs of Prototaxites loganii have been considered kelp-like aquatic algae, rolled up carpets of liverworts, enormous saprophytic fungal fruiting bodies or giant lichens. Algae and rolled liverwort models cannot explain the proportions and branching described here of a complete fossil of Prototaxites loganii from the Middle Devonian (386 Ma) Bellvale Sandstone on Schunnemunk Mountain, eastern New York. The "Schunnemunk tree" was 8.83 m long and had six branches, each about 1 m long and 9 cm diam, on the upper 1.2 m of the main axis. The coalified outermost layer of the Schunnemunk trunk and branches have isotopic compositions (δ(13)CPDB) of -25.03 ± 0.13‰ and -26.17 ± 0.69‰, respectively. The outermost part of the trunk has poorly preserved invaginations above cortical nests of coccoid cells embraced by much-branched tubular cells. This histology is unlike algae, liverworts or vascular plants and most like lichen with coccoid chlorophyte phycobionts. Prototaxites has been placed within Basidiomycota but lacks clear dikaryan features. Prototaxites and its extinct order Nematophytales may belong within Mucoromycotina or Glomeromycota. © 2014 by The Mycological Society of America.
Carbon and nitrogen isotopic compositions of alkyl porphyrins from the Triassic Serpiano oil shale
NASA Technical Reports Server (NTRS)
Chicarelli, M. I.; Hayes, J. M.; Popp, B. N.; Eckardt, C. B.; Maxwell, J. R.
1993-01-01
The carbon and nitrogen isotopic compositions of seven of the most abundant alkylporphyrins from the Serpiano oil shale (marine, Triassic) were determined. For the C31 and C32 butanoporphyrins, values of delta 13CPDB and delta 15NAIR averaged -24.0% and -3.1%. In contrast, the C31 and C32 methylpropanoporphyrins, DPEP, and a C30 13-nor etioporphyrin had delta 13C and delta 15N values averaging -27.5 and -3.3%, respectively. Carbon and nitrogen isotopic values for kerogen averaged -30.8 and -0.9, whereas those for total extract averaged -31.6, and -4.0%. The butanoporphyrins apparently derive from a biological source different from that giving rise to the other porphyrins, their 13C enrichment not being related to carbon isotopic fractionation accompanying diagenetic reactions. The delta 15N values for all the porphyrins indicate that the depletion of 15N observed in the kerogen is of primary origin. Consistent with the very high abundance of hopanoids and methyl hopanoids in the aliphatic hydrocarbon fraction, it is suggested that cyanobacterial fixation of N2 may have been the main cause of 15N depletion.
Carbon isotopic comparisons of oil products used in the developmental history of Alaska
Kvenvolden, K.A.; Carlson, P.R.; Warden, A.; Threlkeld, C.N.
1998-01-01
Studies of the fate of oil released into Prince William Sound, AK, as a result of the 1989 Exxon Valdez oil spill, have led to an unexpected discovery. In addition to oil-like residues attributed to the spill, the ubiquitous presence of flattened tar balls, the carbon isotopic compositions of which fall within a surprisingly narrow range [??13C(PDB) = -23.7 ?? 0.3??? (n = 65)], were observed on the shorelines of the northern and western parts of the sound. These compositions are similar to those of some oil products [-23.7 ?? 0.7??? (n = 35)] that were shipped from California and used in Alaska for fuel, lubrication, construction, and paving before ~ 1970. These products include fuel oil, asphalt, and lubricants [-23.8 ?? 0.5??? (n = 11)], caulking, sealants, and roofing tar [-23.7 ?? 0.7??? (n = 16)], and road pavements and airport runways [-23.5 ?? 0.9??? (n = 8)]. Fuel oil and asphalt [-23.5 ?? 0.1??? (n = 3)], stored at the old Valdez town site and spilled during the 1964 Alaskan earthquake, appear to be the source of most of the beached tar balls. Oil products with lighter carbon isotopic compositions, between -25 and -30??? (n = 18), appear to have been used more recently in Alaska, that is, after ~ 1970. The source of some of the products used for modern pavement and runways [-29.3 ?? 0.2??? (n = 6)] is likely Alaskan North Slope crude oil, an example of which was spilled in the 1989 oil spill [-29.2??? (n = 1)].
NASA Astrophysics Data System (ADS)
Martín-Méndez, Iván; Boixereu, Ester; Villaseca, Carlos
2016-06-01
Graphite is found dispersed in high-grade metapelitic rocks of the Anatectic Complex of Toledo (ACT) and was mined during the mid twentieth century in places where it has been concentrated (Guadamur and la Puebla de Montalbán mines). Some samples from these mines show variable but significant alteration intensity, reaching very low-T hydrothermal (supergene) conditions for some samples from the waste heap of the Guadamur site (<100 °C and 1 kbar). Micro-Raman and XRD data indicate that all the studied ACT graphite is of high crystallinity irrespective of the degree of hydrothermal alteration. Chemical differences were obtained for graphite δ13C composition. ACT granulitic graphite shows δ13CPDB values in the range of -20.5 to -27.8 ‰, indicating a biogenic origin. Interaction of graphite with hydrothermal fluids does not modify isotopic compositions even in the most transformed samples from mining sites. The different isotopic signatures of graphite from the mining sites reflect its contrasted primary carbon source. The high crystallinity of studied graphite makes this area of central Spain suitable for graphitic exploration and its potential exploitation, due to the low carbon content required for its viability and its strategic applications in advanced technologies, such as graphene synthesis.
Neri, V; Margiotta, M; de Francesco, V; Ambrosi, A; Valle, N Della; Fersini, A; Tartaglia, N; Minenna, M F; Ricciardelli, C; Giorgio, F; Panella, C; Ierardi, E
2005-10-15
Although Helicobacter pylori DNA sequences have been detected in cholecystic bile and tissue of patients with gallstones, controversial results are reported from different geographic areas. To detect H. pylori in cholecystic bile and tissue of patients with gallstones from a previously uninvestigated geographic area, southern Italy. Detection included both the bacterial DNA and the specific antigen (H. pylori stool antigen) identified in the stools of infected patients for diagnostic purposes. The study enclosed 33 consecutive patients undergoing laparoscopic cholecystectomy for gallstones. DNA sequences of H. pylori were detected by polymerase chain reaction in both cholecystic bile and tissue homogenate. Moreover, we assayed H.pylori stool antigen on gall-bladder cytosolic and biliary proteins after their extraction. Bacterial presence in the stomach was assessed by urea breath test in all patients and Deltadelta13CPDB value assumed as marker of intragastric load. Fisher's exact probability and Student's t-tests were used for statistical analysis. DNA sequences of H. pylori in bile were found in 51.5% and significantly correlated with its presence in cholecystic tissue homogenate (P<0.005), H. pylori stool antigen in gall-bladder (P=0.0013) and bile (P=0.04) proteins, gastric infection (P<0.01) and intragastric bacterial load (P<0.001). No correlation was found, however, with sex and age of the patients. Our prevalence value of bacterial DNA in bile and gall-bladder of patients with gallstones agreed with that of the only other Italian study. The simultaneous presence of both bacterial DNA and proteic antigen suggests that the same prototype of bacterium could be located at both intestinal and cholecystic level and, therefore, the intestine represents the source of biliary contagion.
Diagenetic evaluation of Pannonian lacustrine deposits in the Makó Trough, southeastern Hungary
NASA Astrophysics Data System (ADS)
Szőcs, Emese; Milovský, Rastislav; Gier, Susanne; Hips, Kinga; Sztanó, Orsolya
2017-04-01
The Makó Trough is the deepest sub-basin of the Pannonian Basin. As a possible shale gas and tight gas accumulation the area was explored by several hydrocarbon companies. In this study, we present the preliminary results on the diagenetic history and the porosity evolution of sandstones and shales. Petrographic (optical microscopy, CL, blue light microscopy) and geochemical methods (SEM-EDX, WDX, O and C stable isotopes) were applied on core samples of Makó-7 well (3408- 5479 m). Processes which influenced the porosity evolution of the sandstones were compaction, cementation, mineral replacement and dissolution. The most common diagenetic minerals are carbonates (non-ferroan and Fe-bearing calcite, dolomite and ankerite), clay minerals (kaolinite, mixed layer illite-smectite and chlorite) and other silicates (quartz and feldspar). Initial clay mineral and ductile grain content also influences reservoir quality. The volumetrically most significant diagenetic minerals are calcite and clay minerals. The petrography of calcite is variable (bright orange to dull red luminescence color, pore-filling cement, replacive phases which are occasionally scattered in the matrix). The δ13 C-PDB values of calcite range from 1.7 ‰ to -5.5 ‰, while δ18 O-PDB values range from 0.5 ‰ to -9.1 ‰, no depth related trend was observed. These data suggest that calcite occurs in more generations, i.e. eogenetic pre-compactional and mesogenetic post-compactional. Kaolinite is present in mottles in size similar to detrital grains, where remnants of feldspars can be seen. This indicates feldspar alteration via influx of water rich in organic derived carbon dioxide. Secondary porosity can be observed in carbonates and feldspars at some levels, causing the improvement of the reservoir quality.
Liquid carbon dioxide of magmatic origin and its role in volcanic eruptions
Chivas, A.R.; Barnes, I.; Evans, William C.; Lupton, J.E.; Stone, J.O.
1987-01-01
Natural liquid carbon dioxide is produced commercially from a 2.5-km-deep well near the 4,500-yr-old maar volcano, Mount Gambier, South Australia. The carbon dioxide has accumulated in a dome that is located on the extension of a linear chain of volcanic activity. A magmatic origin for the fluid is suggested by the geological setting, ??13CPDB of -4.0???, for the CO2 (where PDB represents the carbon-isotope standard), and a relatively high 3He component of the contained helium and high 3He/C ratio (6.4 x 10-10). The 3He/ 4He and He/Ne ratios are 3.0 and > 1,370 times those of air, respectively. The CO2, as collected at the Earth's surface at 29.5 ??C and 75 bar, expands more than 300-fold to form a gas at 1 atm and 22 ??C. We suggest that liquid CO2 or high-density CO2 fluid (the critical point is 31.1 ??C, 73.9 bar) of volcanic origin that expands explosively from shallow levels in the Earth's crust may be a major contributor to 'phreatic' volcanic eruptions and maar formation. Less violent release of magmatic CO2 into crater lakes may cause gas bursts with equally disastrous consequences such as occurred at Lake Nyos, Cameroon, in August 1986. ?? 1987 Nature Publishing Group.
NASA Astrophysics Data System (ADS)
Zhu, Shifa; Yue, Hui; Zhu, Xiaomin; Sun, Shuyang; Wei, Wei; Liu, Xin; Jia, Ye
2017-05-01
Dolomitization of fine-grained volcaniclastic rocks is common in the Lower Cretaceous of the A'nan Sag in the Er'lian Basin of China. Analysis of core samples shows that the organic-rich volcaniclastic rocks are mainly composed of reworked felsic volcanic materials and terrigenous clay minerals. The fine-grained volcaniclastic rocks can be divided into four types: volcaniclastic rocks without carbonatization, volcaniclastic rocks with ferroan dolomites, dolomitized and calcified volcaniclastic rocks, and calcified volcaniclastic rocks. The parent rocks of the volcaniclastic rocks have high silicon and potassium contents and low iron and magnesium contents, and are probably felsic magma of the calc-alkaline series. The average values of δ13CPDB of the carbonate minerals are about 3.13‰; the average values of δ18OPDB are about - 16.74‰. The compositions of C and O isotopes are probably influenced by bacterial methanogenesis. Iron, magnesium, and calcium are probably derived from illitization of terrigenous smectite. A model for dolomitization of felsic volcaniclastic rock is proposed, including three stages: 1) mixed sedimentation and bacterial methanogenesis (< 75 °C); 2) transformation of clay minerals (> 70 °C) and dolomitization (75 to 97 °C); and 3) dissolution. Late dissolution of authigenic carbonate minerals, creating abundant secondary pores, is significant for hydrocarbon accumulation.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cao, Bin; Shi, Liang; Brown, Roslyn N.
This study characterizes the composition of extracellular polymeric substances (EPS) from Shewanella sp. HRCR-1 biofilms to provide insight into potential interactions of EPS with redox-active metals and radionuclides. Both bound and loosely associated EPS were extracted from Shewanella sp. HRCR-1 biofilms prepared using a hollow-fiber membrane biofilm reactor (HfMBR). FTIR spectra revealed the presence of proteins, polysaccharides, nucleic acids, membrane lipids, and fatty acids in both bound and loosely associated EPS. Using a global proteomic approach, a total of 58 extracellular and outer membrane proteins were identified in the EPS. These included homologues of multiple S. oneidensis MR-1 proteins thatmore » potentially contribute to key physiological biofilm processes, such as biofilm-promoting protein BpfA, surface-associated serine protease, nucleotidases (CpdB and UshA), an extracellular lipase, and oligopeptidases (PtrB and a M13 family oligopeptidase lipoprotein). In addition, 20 redox proteins were found in extracted EPS. Among the detected redox proteins were the homologues of two S. oneidensis MR-1 c-type cytochromes, MtrC and OmcA, which have been implicated in extracellular electron transfer. Given their detection in the EPS of Shewanella sp. HRCR 1 biofilms, c-type cytochromes may contribute to the possible redox activity of the biofilm matrix and play important roles in extracellular electron transfer reactions.« less
NASA Astrophysics Data System (ADS)
Karmakar, Shyamal; Ghergut, Julia; Sauter, Martin
2015-04-01
Artificial-fracture design, and fracture characterization during or following stimulation treatment is a central aspect of many EGS ('enhanced' or 'engineered' geothermal system) projects. During the creation or stimulation of an EGS, the injection of fluids, followed by flowback and production stages offers the opportunity for conducting various tracer tests in a single-well (SW) configuration, and given the typical operational and time limitations associated with such tests, along with the need to assess treatment success in real time, investigators mostly favour using short-time tracer-test data, rather than awaiting long-term 'tailings' of tracer signals. Late-time tracer signals from SW injection-flowback and production tests have mainly been used for the purpose of multiple-fracture inflow profiling in multi-layer reservoirs [1]. However, the potential of using SW short-term tracer signals for fracture characterization [2, 3] remained little explored as yet. Dealing with short-term flowback signals, we face a certain degree of parameter interplay, leading to ambiguity in fracture parameter inversion from the measured signal of a single tracer. This ambiguity can, to a certain extent, be overcome by - combining different sources of information (lithostratigraphy, and hydraulic monitoring) in order to constrain the variation range of hydrogeologic parameters (matrix and fracture permeability and porosity, fracture size), - using different types of tracers, such as conservative tracer pairs with contrasting diffusivity, or tracers pairs with contrasting sorptivity onto target surfaces. Fracture height is likely to be constrained by lithostratigraphy, while fracture length is supposed to be determinable from hydraulic monitoring (pressure recordings); the flowback rate can be assumed as a known (measurable) quantity during individual-fracture flowback. This leaves us with one or two unknown parameters to be determined from tracer signals: - the transport-effective aperture, in a water fracture (WF), or - fracture thickness and porosity, for a gel-proppant fracture (GPF). We find that parameter determination from SW early signals can significantly be improved by concomitantly using a number of solute tracers with different transport and retardation behaviour. We considered tracers of different sorptivity to proppant coatings, and to matrix rock surfaces, for GPF, as well as contrasting-diffusivity or -sorptivity tracers, for WF. An advantage of this SW approach is that it requires only small chaser volumes (few times the fracture volume), not relying on advective penetration into the rock matrix. Thus, selected tracer species are to be injected during the very last stage of the fracturing process, when fracture sizes and thus target parameters are supposed to attain more or less stable values. We illustrate the application of these tracer test design principles using hydro- and lithostratigraphy data from the Geothermal Research Platform at Groß Schönebeck [4], targeting a multi-layer reservoir (sedimentary and crystalline formations in 4-5 km depth) in the NE-German Sedimentary Basin. Acknowledgments: This work benefited from long-term support from Baker Hughes (Celle) and from the Lower-Saxonian Science and Culture Ministry (MWK Niedersachsen) within the applied research project gebo (Geothermal Energy and High-Performance Drilling, 2009-2014). The first author gratefully acknowledges continued financial support from the DAAD (German Academic Exchange Service) to pursuing Ph. D. work. References: [1] http://www.sciencedirect.com/science/article/pii/S1876610214017391 [2] http://www.geothermal-energy.org/cpdb/record_detail.php?id=7215 [3] http://www.geothermal-energy.org/cpdb/record_detail.php?id=19034 [4] http://www.gfz-potsdam.de/en/scientific-services/laboratories/gross-schoenebeck/
NASA Astrophysics Data System (ADS)
Burnham, A. D.; Bulanova, G. P.; Smith, C. B.; Whitehead, S. C.; Kohn, S. C.; Gobbo, L.; Walter, M. J.
2016-11-01
Diamonds from the Machado River alluvial deposit have been characterised on the basis of external morphology, internal textures, carbon isotopic composition, nitrogen concentration and aggregation state and mineral inclusion chemistry. Variations in morphology and features of abrasion suggest some diamonds have been derived directly from local kimberlites, whereas others have been through extensive sedimentary recycling. On the basis of mineral inclusion compositions, both lithospheric and sublithospheric diamonds are present at the deposit. The lithospheric diamonds have clear layer-by-layer octahedral and/or cuboid internal growth zonation, contain measurable nitrogen and indicate a heterogeneous lithospheric mantle beneath the region. The sublithospheric diamonds show a lack of regular sharp zonation, do not contain detectable nitrogen, are isotopically heavy (δ13CPDB predominantly - 0.7 to - 5.5) and contain inclusions of ferropericlase, former bridgmanite, majoritic garnet and former CaSiO3-perovskite. This suggests source lithologies that are Mg- and Ca-rich, probably including carbonates and serpentinites, subducted to lower mantle depths. The studied suite of sublithospheric diamonds has many similarities to the alluvial diamonds from Kankan, Guinea, but has more extreme variations in mineral inclusion chemistry. Of all superdeep diamond suites yet discovered, Machado River represents an end-member in terms of either the compositional range of materials being subducted to Transition Zone and lower mantle or the process by which materials are transferred from the subducted slab to the diamond-forming region.
NASA Astrophysics Data System (ADS)
Sakakibara, M.; Sugawara, H.; Tsuji, T.; Ikehara, M.
2014-05-01
The past two decades have seen the reporting of microbial fossils within ancient oceanic basalts that could be identical to microbes within modern basalts. Here, we present new petrographic, mineralogical, and stable isotopic data for metabasalts containing filamentous structures in a Jurassic accretionary complex within the northern Chichibu Belt of the Yanadani area of central Shikoku, Japan. Mineralized filaments within these rocks are present in interstitial domains filled with calcite, pumpellyite, or quartz, and consist of iron oxide, phengite, and pumpellyite. δ13CPDB values for filament-bearing calcite within these metabasalts vary from -2.49‰ to 0.67‰. A biogenic origin for these filamentous structures is indicated by (1) the geological context of the Yanadani metabasalt, (2) the morphology of the filaments, (3) the carbon isotope composition of carbonates that host the filaments, and (4) the timing of formation of these filaments relative to the timing of low-grade metamorphism in a subduction zone. The putative microorganisms that formed these filaments thrived between eruption (Late Paleozoic) and accretion (Early Jurassic) of the basalt. The data presented here indicate that cryptoendolithic life was present within water-filled vesicles in pre-Jurassic intraplate basalts. The mineralogy of the filaments reflects the low-grade metamorphic recrystallization of authigenic microbial clays similar to those formed by the encrustation of prokaryotes in modern iron-rich environments. These findings suggest that a previously unusual niche for life is present within intraplate volcanic rocks in accretionary complexes.
NASA Astrophysics Data System (ADS)
Lustrino, Michele; Prelević, Dejan; Agostini, Samuele; Gaeta, Mario; Di Rocco, Tommaso; Stagno, Vincenzo; Capizzi, Luca Samuele
2016-07-01
The volcanic products of the late Miocene Morron de Villamayor volcano (Calatrava Volcanic Field, central Spain) are known for being one of the few outcrops of leucitites in the entire circum-Mediterranean area. These rocks are important because aragonite of mantle origin has been reported as inclusion in olivine macrocrysts. We use petrographic observations, mineral compositions, as well as oxygen and carbon isotope ratios coupled with experimental petrology to understand the origin of carbonate phase in these olivine-phyric rocks. Groundmass and macrocryst olivines range from δ18OVSMOW of +4.8‰, typical of mantle olivine values, to +7.4‰, indicating contamination by sedimentary carbonate. Carbonates are characterized by heavy oxygen isotope compositions (δ18OVSMOW >+24‰), and relatively light carbon isotopes (δ13CPDB <-11‰), resembling skarn values, and distinct from typical mantle carbonatite compositions. Petrography, mineral compositions such as low Mg# of clinopyroxene and biotite, low Ca# and low incompatible element abundance of the carbonate, and isotopic ratios of O and C, do not support a mantle origin for the carbonate. Rather, the carbonate inclusions found in the olivine macrocrysts are interpreted as basement limestone fragments entrapped by the rising crystallizing magma. Comparison with experimental carbonatitic and silicate-carbonatitic melts indicates that low-degree partial melts of a carbonated peridotite must have a dolomitic rather than the aragonitic/calcitic composition as those found trapped in the Morron de Villamayor olivine macrocrysts.
Maintaining Multimedia Data in a Geospatial Database
2012-09-01
at PostgreSQL and MySQL as spatial databases was offered. Given their results, as each database produced result sets from zero to 100,000, it was...excelled given multiple conditions. A different look at PostgreSQL and MySQL as spatial databases was offered. Given their results, as each database... MySQL ................................................................................................14 B. BENCHMARKING DATA RETRIEVED FROM TABLE
NASA Astrophysics Data System (ADS)
Li, Y.; Yang, J.; Nida, K.; Yamamoto, S.; Lin, Y.; Li, Q.; Tian, M.; Kon, Y.; Komiya, T.; Maruyama, S.
2017-12-01
The Horoman peridotite complex is an Alpine-type orogenic lherzolite massif of upper-mantle in the Hidaka metamorphic belt, Hokkaido, Japan. The peridotite complex is composed of dunite, harzburgite, spinel lherzolite and plagioclase lherzolite, exhibits a conspicuous layered structure, which is a product of a Cretaceous to early Paleogene arc-trench system formed by westward subduction of an oceanic plate between the paleo-Eurasian and paleo-North American Plates. Various combinations of diamond, corundum, moissanite, zircon, monazite, rutile, and kyanite have been separated from spinel harzburgite (700 kg) and lherzolite (500 kg), respectively. The carbon isotopes analyses of diamond grains by Nano-SIMS yielded significant light carbon isotopes feature as δ13 CPDB values ranging from -29.2 ‰ to -17.2 ‰, with an average of -22.8±0.32 ‰. Zircon grains occur as sub-angular to round in morphological characteristics, similar to zircons of crustal sedimentary rocks. Many zircons contain small inclusions, comprise of quartz, apatite, rutile and muscovite. The U-Pb age of zircon grains analyzed using LA-ICP-MS and SIMS gave a wide age range, from the Jurassic to Archean (ca 159 - 3131 Ma). In the zircon age histogram, four age groups were identified; the age peaks are 2385 Ma, 1890 Ma, 1618 Ma and 1212 Ma, respectively. On the other hand, U-Pb ages of rutile grains analyzed using SIMS gave a peak of 370 Ma in age histogram. The mineralogical and chronological evidences of numerous crustal minerals in peridotite of Horoman suggest that the ancient continent material was subducted in deep mantle and recycled through the upper mantle by multicycle subduction processes.
Nelson, Stephen T.; Karlsson, Haraldur R.; Paces, James B.; Tingey, David G.; Ward, Stephen; Peters, Mark T.
2001-01-01
Tufa (spring) deposits in the Tecopa basin, California, reflect the response of arid groundwater regimes to wet climate episodes. Two types of tufa are represented, informally defined as (1) an easily disaggregated, fine-grained mixture of calcite and quartz (friable tufa) in the southwest Tecopa Valley, and (2) hard, vuggy micrite, laminated carbonate, and carbonate-cemented sands and gravels (indurated tufa) along the eastern margin of Lake Tecopa. High δ18OVSMOW (Vienna standard mean ocean water) water values, field relations, and the texture of friable tufa suggest rapid nucleation of calcite as subaqueous, fault- controlled groundwater discharge mixed with high-pH, hypersaline lake water. Variations between δ18OVSMOW and δ13CPDB (Peedee belemnite) values relative to other closed basin lakes such as the Great Salt Lake and Lake Lahontan suggest similarities in climatic and hydrologic settings. Indurated tufa, also fault controlled, formed mounds and associated feeder systems as well as stratabound carbonate-cemented ledges. Both deposits represent discharge of deeply circulated, high total dissolved solids, and high pCO2 regional groundwater with kinetic enrichments of as much as several per mil for δ18OVSMOW values. Field relations show that indurated tufa represents episodic discharge, and U-series ages imply that discharge was correlated with cold, wet climate episodes. In response to both the breaching of the Tecopa basin and a modern arid climate, most discharge has changed from fault-controlled locations near basin margins to topographic lows of the Amargosa River drainage at elevations 30–130 m lower. Because of episodic climate change, spring flows may have relocated from basin margin to basin center multiple times.
Peter, J.M.; Shanks, Wayne C.
1992-01-01
Sulfur, carbon, and oxygen isotope values were measured in sulfide, sulfate, and carbonate from hydrothermal chimney, spire, and mound samples in the southern trough of Guaymas Basin, Gulf of California, USA. ??34S values of sulfides range from -3.7 to 4.5%. and indicate that sulfur originated from several sources: 1. (1) dissolution of 0??? sulfide contained within basaltic rocks, 2. (2) thermal reduction of seawater sulfate during sediment alteration reactions in feeder zones to give sulfide with positive ??34S, and 3. (3) entrainment or leaching of isotopically light (negative-??34S) bacteriogenic sulfide from sediments underlying the deposits. ??34S of barite and anhydrite indicate sulfur derivation mainly from unfractionated seawater sulfate, although some samples show evidence of sulfate reduction and sulfide oxidation reactions during mixing within chimneys. Oxygen isotope temperatures calculated for chimney calcites are in reasonable agreement with measured vent fluid temperatures and fluid inclusion trapping temperatures. Hydrothermal fluids that formed calcite-rich chimneys in the southern trough of Guaymas Basin were enriched in 18O with respect to seawater by about 2.4??? due to isotopic exchange with sedimentary and/or basaltic rocks. Carbon isotope values of calcite range from -9.6 to -14.0??? ??34CpDB, indicating that carbon was derived in approximately equal quantities from the dissolution of marine carbonate minerals and the oxidation of organic matter during migration of hydrothermal fluid through the underlying sediment column. Statistically significant positive, linear correlations of ??34S, ??34C, and ??18O of sulfides and calcites with geographic location within the southern trough of Guaymas Basin are best explained by variations in water/rock ( w r) ratios or sediment reactivity within subsurface alteration zones. Low w r ratios and the leaching of detrital carbonates and bacteriogenic sulfides at the southern vent sites result in relatively high ??13C and low ??34S in chimney carbonates and sulfides, respectively. In the north, where the depletion of alkalis in vent fluids indicates higher w r ratios, positive ??34S and more negative ??13c are due to increased contributions from organic matter oxidation and sulfate reduction reactions. ?? 1992.
NLTE4 Plasma Population Kinetics Database
National Institute of Standards and Technology Data Gateway
SRD 159 NLTE4 Plasma Population Kinetics Database (Web database for purchase) This database contains benchmark results for simulation of plasma population kinetics and emission spectra. The data were contributed by the participants of the 4th Non-LTE Code Comparison Workshop who have unrestricted access to the database. The only limitation for other users is in hidden labeling of the output results. Guest users can proceed to the database entry page without entering userid and password.
Prototype of web-based database of surface wave investigation results for site classification
NASA Astrophysics Data System (ADS)
Hayashi, K.; Cakir, R.; Martin, A. J.; Craig, M. S.; Lorenzo, J. M.
2016-12-01
As active and passive surface wave methods are getting popular for evaluating site response of earthquake ground motion, demand on the development of database for investigation results is also increasing. Seismic ground motion not only depends on 1D velocity structure but also on 2D and 3D structures so that spatial information of S-wave velocity must be considered in ground motion prediction. The database can support to construct 2D and 3D underground models. Inversion of surface wave processing is essentially non-unique so that other information must be combined into the processing. The database of existed geophysical, geological and geotechnical investigation results can provide indispensable information to improve the accuracy and reliability of investigations. Most investigations, however, are carried out by individual organizations and investigation results are rarely stored in the unified and organized database. To study and discuss appropriate database and digital standard format for the surface wave investigations, we developed a prototype of web-based database to store observed data and processing results of surface wave investigations that we have performed at more than 400 sites in U.S. and Japan. The database was constructed on a web server using MySQL and PHP so that users can access to the database through the internet from anywhere with any device. All data is registered in the database with location and users can search geophysical data through Google Map. The database stores dispersion curves, horizontal to vertical spectral ratio and S-wave velocity profiles at each site that was saved in XML files as digital data so that user can review and reuse them. The database also stores a published 3D deep basin and crustal structure and user can refer it during the processing of surface wave data.
Kamali, Parisa; Zettervall, Sara L; Wu, Winona; Ibrahim, Ahmed M S; Medin, Caroline; Rakhorst, Hinne A; Schermerhorn, Marc L; Lee, Bernard T; Lin, Samuel J
2017-04-01
Research derived from large-volume databases plays an increasing role in the development of clinical guidelines and health policy. In breast cancer research, the Surveillance, Epidemiology and End Results, National Surgical Quality Improvement Program, and Nationwide Inpatient Sample databases are widely used. This study aims to compare the trends in immediate breast reconstruction and identify the drawbacks and benefits of each database. Patients with invasive breast cancer and ductal carcinoma in situ were identified from each database (2005-2012). Trends of immediate breast reconstruction over time were evaluated. Patient demographics and comorbidities were compared. Subgroup analysis of immediate breast reconstruction use per race was conducted. Within the three databases, 1.2 million patients were studied. Immediate breast reconstruction in invasive breast cancer patients increased significantly over time in all databases. A similar significant upward trend was seen in ductal carcinoma in situ patients. Significant differences in immediate breast reconstruction rates were seen among races; and the disparity differed among the three databases. Rates of comorbidities were similar among the three databases. There has been a significant increase in immediate breast reconstruction; however, the extent of the reporting of overall immediate breast reconstruction rates and of racial disparities differs significantly among databases. The Nationwide Inpatient Sample and the National Surgical Quality Improvement Program report similar findings, with the Surveillance, Epidemiology and End Results database reporting results significantly lower in several categories. These findings suggest that use of the Surveillance, Epidemiology and End Results database may not be universally generalizable to the entire U.S.
Alternative Databases for Anthropology Searching.
ERIC Educational Resources Information Center
Brody, Fern; Lambert, Maureen
1984-01-01
Examines online search results of sample questions in several databases covering linguistics, cultural anthropology, and physical anthropology in order to determine if and where any overlap in results might occur, and which files have greatest number of relevant hits. Search results by database are given for each subject area. (EJS)
Hartung, Daniel M; Zarin, Deborah A; Guise, Jeanne-Marie; McDonagh, Marian; Paynter, Robin; Helfand, Mark
2014-04-01
ClinicalTrials.gov requires reporting of result summaries for many drug and device trials. To evaluate the consistency of reporting of trials that are registered in the ClinicalTrials.gov results database and published in the literature. ClinicalTrials.gov results database and matched publications identified through ClinicalTrials.gov and a manual search of 2 electronic databases. 10% random sample of phase 3 or 4 trials with results in the ClinicalTrials.gov results database, completed before 1 January 2009, with 2 or more groups. One reviewer extracted data about trial design and results from the results database and matching publications. A subsample was independently verified. Of 110 trials with results, most were industry-sponsored, parallel-design drug studies. The most common inconsistency was the number of secondary outcome measures reported (80%). Sixteen trials (15%) reported the primary outcome description inconsistently, and 22 (20%) reported the primary outcome value inconsistently. Thirty-eight trials inconsistently reported the number of individuals with a serious adverse event (SAE); of these, 33 (87%) reported more SAEs in ClinicalTrials.gov. Among the 84 trials that reported SAEs in ClinicalTrials.gov, 11 publications did not mention SAEs, 5 reported them as zero or not occurring, and 21 reported a different number of SAEs. Among 29 trials that reported deaths in ClinicalTrials.gov, 28% differed from the matched publication. Small sample that included earliest results posted to the database. Reporting discrepancies between the ClinicalTrials.gov results database and matching publications are common. Which source contains the more accurate account of results is unclear, although ClinicalTrials.gov may provide a more comprehensive description of adverse events than the publication. Agency for Healthcare Research and Quality.
Databases in the Central Government : State-of-the-art and the Future
NASA Astrophysics Data System (ADS)
Ohashi, Tomohiro
Management and Coordination Agency, Prime Minister’s Office, conducted a survey by questionnaire against all Japanese Ministries and Agencies, in November 1985, on a subject of the present status of databases produced or planned to be produced by the central government. According to the results, the number of the produced databases has been 132 in 19 Ministries and Agencies. Many of such databases have been possessed by Defence Agency, Ministry of Construction, Ministry of Agriculture, Forestry & Fisheries, and Ministry of International Trade & Industries and have been in the fields of architecture & civil engineering, science & technology, R & D, agriculture, forestry and fishery. However the ratio of the databases available for other Ministries and Agencies has amounted to only 39 percent of all produced databases and the ratio of the databases unavailable for them has amounted to 60 percent of all of such databases, because of in-house databases and so forth. The outline of such results of the survey is reported and the databases produced by the central government are introduced under the items of (1) databases commonly used by all Ministries and Agencies, (2) integrated databases, (3) statistical databases and (4) bibliographic databases. The future problems are also described from the viewpoints of technology developments and mutual uses of databases.
NASA Technical Reports Server (NTRS)
Bohnhoff-Hlavacek, Gail
1992-01-01
One of the objectives of the team supporting the LDEF Systems and Materials Special Investigative Groups is to develop databases of experimental findings. These databases identify the hardware flown, summarize results and conclusions, and provide a system for acknowledging investigators, tracing sources of data, and future design suggestions. To date, databases covering the optical experiments, and thermal control materials (chromic acid anodized aluminum, silverized Teflon blankets, and paints) have been developed at Boeing. We used the Filemaker Pro software, the database manager for the Macintosh computer produced by the Claris Corporation. It is a flat, text-retrievable database that provides access to the data via an intuitive user interface, without tedious programming. Though this software is available only for the Macintosh computer at this time, copies of the databases can be saved to a format that is readable on a personal computer as well. Further, the data can be exported to more powerful relational databases, capabilities, and use of the LDEF databases and describe how to get copies of the database for your own research.
Sana, Theodore R; Roark, Joseph C; Li, Xiangdong; Waddell, Keith; Fischer, Steven M
2008-09-01
In an effort to simplify and streamline compound identification from metabolomics data generated by liquid chromatography time-of-flight mass spectrometry, we have created software for constructing Personalized Metabolite Databases with content from over 15,000 compounds pulled from the public METLIN database (http://metlin.scripps.edu/). Moreover, we have added extra functionalities to the database that (a) permit the addition of user-defined retention times as an orthogonal searchable parameter to complement accurate mass data; and (b) allow interfacing to separate software, a Molecular Formula Generator (MFG), that facilitates reliable interpretation of any database matches from the accurate mass spectral data. To test the utility of this identification strategy, we added retention times to a subset of masses in this database, representing a mixture of 78 synthetic urine standards. The synthetic mixture was analyzed and screened against this METLIN urine database, resulting in 46 accurate mass and retention time matches. Human urine samples were subsequently analyzed under the same analytical conditions and screened against this database. A total of 1387 ions were detected in human urine; 16 of these ions matched both accurate mass and retention time parameters for the 78 urine standards in the database. Another 374 had only an accurate mass match to the database, with 163 of those masses also having the highest MFG score. Furthermore, MFG calculated a formula for a further 849 ions that had no match to the database. Taken together, these results suggest that the METLIN Personal Metabolite database and MFG software offer a robust strategy for confirming the formula of database matches. In the event of no database match, it also suggests possible formulas that may be helpful in interpreting the experimental results.
Using SQL Databases for Sequence Similarity Searching and Analysis.
Pearson, William R; Mackey, Aaron J
2017-09-13
Relational databases can integrate diverse types of information and manage large sets of similarity search results, greatly simplifying genome-scale analyses. By focusing on taxonomic subsets of sequences, relational databases can reduce the size and redundancy of sequence libraries and improve the statistical significance of homologs. In addition, by loading similarity search results into a relational database, it becomes possible to explore and summarize the relationships between all of the proteins in an organism and those in other biological kingdoms. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It also describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. The unit also introduces search_demo, a database that stores sequence similarity search results. The search_demo database is then used to explore the evolutionary relationships between E. coli proteins and proteins in other organisms in a large-scale comparative genomic analysis. © 2017 by John Wiley & Sons, Inc. Copyright © 2017 John Wiley & Sons, Inc.
Use of a German longitudinal prescription database (LRx) in pharmacoepidemiology.
Richter, Hartmut; Dombrowski, Silvia; Hamer, Hajo; Hadji, Peyman; Kostev, Karel
2015-01-01
Large epidemiological databases are often used to examine matters pertaining to drug utilization, health services, and drug safety. The major strength of such databases is that they include large sample sizes, which allow precise estimates to be made. The IMS® LRx database has in recent years been used as a data source for epidemiological research. The aim of this paper is to review a number of recent studies published with the aid of this database and compare these with the results of similar studies using independent data published in the literature. In spite of being somewhat limited to studies for which comparative independent results were available, it was possible to include a wide range of possible uses of the LRx database in a variety of therapeutic fields: prevalence/incidence rate determination (diabetes, epilepsy), persistence analyses (diabetes, osteoporosis), use of comedication (diabetes), drug utilization (G-CSF market) and treatment costs (diabetes, G-CSF market). In general, the results of the LRx studies were found to be clearly in line with previously published reports. In some cases, noticeable discrepancies between the LRx results and the literature data were found (e.g. prevalence in epilepsy, persistence in osteoporosis) and these were discussed and possible reasons presented. Overall, it was concluded that the IMS® LRx database forms a suitable database for pharmacoepidemiological studies.
Content based information retrieval in forensic image databases.
Geradts, Zeno; Bijhold, Jurrien
2002-03-01
This paper gives an overview of the various available image databases and ways of searching these databases on image contents. The developments in research groups of searching in image databases is evaluated and compared with the forensic databases that exist. Forensic image databases of fingerprints, faces, shoeprints, handwriting, cartridge cases, drugs tablets, and tool marks are described. The developments in these fields appear to be valuable for forensic databases, especially that of the framework in MPEG-7, where the searching in image databases is standardized. In the future, the combination of the databases (also DNA-databases) and possibilities to combine these can result in stronger forensic evidence.
Petherick, Emily S; Pickett, Kate E; Cullum, Nicky A
2015-08-01
Primary care databases from the UK have been widely used to produce evidence on the epidemiology and health service usage of a wide range of conditions. To date there have been few evaluations of the comparability of estimates between different sources of these data. To estimate the comparability of two widely used primary care databases, the Health Improvement Network Database (THIN) and the General Practice Research Database (GPRD) using venous leg ulceration as an exemplar condition. Cross prospective cohort comparison. GPRD and the THIN databases using data from 1998 to 2006. A data set was extracted from both databases containing all cases of persons aged 20 years or greater with a database diagnosis of venous leg ulceration recorded in the databases for the period 1998-2006. Annual rates of incidence and prevalence of venous leg ulceration were calculated within each database and standardized to the European standard population and compared using standardized rate ratios. Comparable estimates of venous leg ulcer incidence from the GPRD and THIN databases could be obtained using data from 2000 to 2006 and of prevalence using data from 2001 to 2006. Recent data collected by these two databases are more likely to produce comparable results of the burden venous leg ulceration. These results require confirmation in other disease areas to enable researchers to have confidence in the comparability of findings from these two widely used primary care research resources. © The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Databases as policy instruments. About extending networks as evidence-based policy.
de Bont, Antoinette; Stoevelaar, Herman; Bal, Roland
2007-12-07
This article seeks to identify the role of databases in health policy. Access to information and communication technologies has changed traditional relationships between the state and professionals, creating new systems of surveillance and control. As a result, databases may have a profound effect on controlling clinical practice. We conducted three case studies to reconstruct the development and use of databases as policy instruments. Each database was intended to be employed to control the use of one particular pharmaceutical in the Netherlands (growth hormone, antiretroviral drugs for HIV and Taxol, respectively). We studied the archives of the Dutch Health Insurance Board, conducted in-depth interviews with key informants and organized two focus groups, all focused on the use of databases both in policy circles and in clinical practice. Our results demonstrate that policy makers hardly used the databases, neither for cost control nor for quality assurance. Further analysis revealed that these databases facilitated self-regulation and quality assurance by (national) bodies of professionals, resulting in restrictive prescription behavior amongst physicians. The databases fulfill control functions that were formerly located within the policy realm. The databases facilitate collaboration between policy makers and physicians, since they enable quality assurance by professionals. Delegating regulatory authority downwards into a network of physicians who control the use of pharmaceuticals seems to be a good alternative for centralized control on the basis of monitoring data.
Difficulties in diagnosing Marfan syndrome using current FBN1 databases.
Groth, Kristian A; Gaustadnes, Mette; Thorsen, Kasper; Østergaard, John R; Jensen, Uffe Birk; Gravholt, Claus H; Andersen, Niels H
2016-01-01
The diagnostic criteria of Marfan syndrome (MFS) highlight the importance of a FBN1 mutation test in diagnosing MFS. As genetic sequencing becomes better, cheaper, and more accessible, the expected increase in the number of genetic tests will become evident, resulting in numerous genetic variants that need to be evaluated for disease-causing effects based on database information. The aim of this study was to evaluate genetic variants in four databases and review the relevant literature. We assessed background data on 23 common variants registered in ESP6500 and classified as causing MFS in the Human Gene Mutation Database (HGMD). We evaluated data in four variant databases (HGMD, UMD-FBN1, ClinVar, and UniProt) according to the diagnostic criteria for MFS and compared the results with the classification of each variant in the four databases. None of the 23 variants was clearly associated with MFS, even though all classifications in the databases stated otherwise. A genetic diagnosis of MFS cannot reliably be based on current variant databases because they contain incorrectly interpreted conclusions on variants. Variants must be evaluated by time-consuming review of the background material in the databases and by combining these data with expert knowledge on MFS. This is a major problem because we expect even more genetic test results in the near future as a result of the reduced cost and process time for next-generation sequencing.Genet Med 18 1, 98-102.
Improved Information Retrieval Performance on SQL Database Using Data Adapter
NASA Astrophysics Data System (ADS)
Husni, M.; Djanali, S.; Ciptaningtyas, H. T.; Wicaksana, I. G. N. A.
2018-02-01
The NoSQL databases, short for Not Only SQL, are increasingly being used as the number of big data applications increases. Most systems still use relational databases (RDBs), but as the number of data increases each year, the system handles big data with NoSQL databases to analyze and access data more quickly. NoSQL emerged as a result of the exponential growth of the internet and the development of web applications. The query syntax in the NoSQL database differs from the SQL database, therefore requiring code changes in the application. Data adapter allow applications to not change their SQL query syntax. Data adapters provide methods that can synchronize SQL databases with NotSQL databases. In addition, the data adapter provides an interface which is application can access to run SQL queries. Hence, this research applied data adapter system to synchronize data between MySQL database and Apache HBase using direct access query approach, where system allows application to accept query while synchronization process in progress. From the test performed using data adapter, the results obtained that the data adapter can synchronize between SQL databases, MySQL, and NoSQL database, Apache HBase. This system spends the percentage of memory resources in the range of 40% to 60%, and the percentage of processor moving from 10% to 90%. In addition, from this system also obtained the performance of database NoSQL better than SQL database.
ERIC Educational Resources Information Center
Fitzgibbons, Megan; Meert, Deborah
2010-01-01
The use of bibliographic management software and its internal search interfaces is now pervasive among researchers. This study compares the results between searches conducted in academic databases' search interfaces versus the EndNote search interface. The results show mixed search reliability, depending on the database and type of search…
Building An Integrated Neurodegenerative Disease Database At An Academic Health Center
Xie, Sharon X.; Baek, Young; Grossman, Murray; Arnold, Steven E.; Karlawish, Jason; Siderowf, Andrew; Hurtig, Howard; Elman, Lauren; McCluskey, Leo; Van Deerlin, Vivianna; Lee, Virginia M.-Y.; Trojanowski, John Q.
2010-01-01
Background It is becoming increasingly important to study common and distinct etiologies, clinical and pathological features, and mechanisms related to neurodegenerative diseases such as Alzheimer’s disease (AD), Parkinson’s disease (PD), amyotrophic lateral sclerosis (ALS), and frontotemporal lobar degeneration (FTLD). These comparative studies rely on powerful database tools to quickly generate data sets which match diverse and complementary criteria set by the studies. Methods In this paper, we present a novel Integrated NeuroDegenerative Disease (INDD) database developed at the University of Pennsylvania (Penn) through a consortium of Penn investigators. Since these investigators work on AD, PD, ALS and FTLD, this allowed us to achieve the goal of developing an INDD database for these major neurodegenerative disorders. We used Microsoft SQL Server as the platform with built-in “backwards” functionality to provide Access as a front-end client to interface with the database. We used PHP hypertext Preprocessor to create the “front end” web interface and then integrated individual neurodegenerative disease databases using a master lookup table. We also present methods of data entry, database security, database backups, and database audit trails for this INDD database. Results We compare the results of a biomarker study using the INDD database to those using an alternative approach by querying individual database separately. Conclusions We have demonstrated that the Penn INDD database has the ability to query multiple database tables from a single console with high accuracy and reliability. The INDD database provides a powerful tool for generating data sets in comparative studies across several neurodegenerative diseases. PMID:21784346
Hegedűs, Tamás; Chaubey, Pururawa Mayank; Várady, György; Szabó, Edit; Sarankó, Hajnalka; Hofstetter, Lia; Roschitzki, Bernd; Sarkadi, Balázs
2015-01-01
Based on recent results, the determination of the easily accessible red blood cell (RBC) membrane proteins may provide new diagnostic possibilities for assessing mutations, polymorphisms or regulatory alterations in diseases. However, the analysis of the current mass spectrometry-based proteomics datasets and other major databases indicates inconsistencies—the results show large scattering and only a limited overlap for the identified RBC membrane proteins. Here, we applied membrane-specific proteomics studies in human RBC, compared these results with the data in the literature, and generated a comprehensive and expandable database using all available data sources. The integrated web database now refers to proteomic, genetic and medical databases as well, and contains an unexpected large number of validated membrane proteins previously thought to be specific for other tissues and/or related to major human diseases. Since the determination of protein expression in RBC provides a method to indicate pathological alterations, our database should facilitate the development of RBC membrane biomarker platforms and provide a unique resource to aid related further research and diagnostics. Database URL: http://rbcc.hegelab.org PMID:26078478
Toward unification of taxonomy databases in a distributed computer environment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kitakami, Hajime; Tateno, Yoshio; Gojobori, Takashi
1994-12-31
All the taxonomy databases constructed with the DNA databases of the international DNA data banks are powerful electronic dictionaries which aid in biological research by computer. The taxonomy databases are, however not consistently unified with a relational format. If we can achieve consistent unification of the taxonomy databases, it will be useful in comparing many research results, and investigating future research directions from existent research results. In particular, it will be useful in comparing relationships between phylogenetic trees inferred from molecular data and those constructed from morphological data. The goal of the present study is to unify the existent taxonomymore » databases and eliminate inconsistencies (errors) that are present in them. Inconsistencies occur particularly in the restructuring of the existent taxonomy databases, since classification rules for constructing the taxonomy have rapidly changed with biological advancements. A repair system is needed to remove inconsistencies in each data bank and mismatches among data banks. This paper describes a new methodology for removing both inconsistencies and mismatches from the databases on a distributed computer environment. The methodology is implemented in a relational database management system, SYBASE.« less
[Validation of interaction databases in psychopharmacotherapy].
Hahn, M; Roll, S C
2018-03-01
Drug-drug interaction databases are an important tool to increase drug safety in polypharmacy. There are several drug interaction databases available but it is unclear which one shows the best results and therefore increases safety for the user of the databases and the patients. So far, there has been no validation of German drug interaction databases. Validation of German drug interaction databases regarding the number of hits, mechanisms of drug interaction, references, clinical advice, and severity of the interaction. A total of 36 drug interactions which were published in the last 3-5 years were checked in 5 different databases. Besides the number of hits, it was also documented if the mechanism was correct, clinical advice was given, primary literature was cited, and the severity level of the drug-drug interaction was given. All databases showed weaknesses regarding the hit rate of the tested drug interactions, with a maximum of 67.7% hits. The highest score in this validation was achieved by MediQ with 104 out of 180 points. PsiacOnline achieved 83 points, arznei-telegramm® 58, ifap index® 54 and the ABDA-database 49 points. Based on this validation MediQ seems to be the most suitable databank for the field of psychopharmacotherapy. The best results in this comparison were achieved by MediQ but this database also needs improvement with respect to the hit rate so that the users can rely on the results and therefore increase drug therapy safety.
Evaluating the Impact of Database Heterogeneity on Observational Study Results
Madigan, David; Ryan, Patrick B.; Schuemie, Martijn; Stang, Paul E.; Overhage, J. Marc; Hartzema, Abraham G.; Suchard, Marc A.; DuMouchel, William; Berlin, Jesse A.
2013-01-01
Clinical studies that use observational databases to evaluate the effects of medical products have become commonplace. Such studies begin by selecting a particular database, a decision that published papers invariably report but do not discuss. Studies of the same issue in different databases, however, can and do generate different results, sometimes with strikingly different clinical implications. In this paper, we systematically study heterogeneity among databases, holding other study methods constant, by exploring relative risk estimates for 53 drug-outcome pairs and 2 widely used study designs (cohort studies and self-controlled case series) across 10 observational databases. When holding the study design constant, our analysis shows that estimated relative risks range from a statistically significant decreased risk to a statistically significant increased risk in 11 of 53 (21%) of drug-outcome pairs that use a cohort design and 19 of 53 (36%) of drug-outcome pairs that use a self-controlled case series design. This exceeds the proportion of pairs that were consistent across databases in both direction and statistical significance, which was 9 of 53 (17%) for cohort studies and 5 of 53 (9%) for self-controlled case series. Our findings show that clinical studies that use observational databases can be sensitive to the choice of database. More attention is needed to consider how the choice of data source may be affecting results. PMID:23648805
Jagtap, Pratik; Goslinga, Jill; Kooren, Joel A; McGowan, Thomas; Wroblewski, Matthew S; Seymour, Sean L; Griffin, Timothy J
2013-04-01
Large databases (>10(6) sequences) used in metaproteomic and proteogenomic studies present challenges in matching peptide sequences to MS/MS data using database-search programs. Most notably, strict filtering to avoid false-positive matches leads to more false negatives, thus constraining the number of peptide matches. To address this challenge, we developed a two-step method wherein matches derived from a primary search against a large database were used to create a smaller subset database. The second search was performed against a target-decoy version of this subset database merged with a host database. High confidence peptide sequence matches were then used to infer protein identities. Applying our two-step method for both metaproteomic and proteogenomic analysis resulted in twice the number of high confidence peptide sequence matches in each case, as compared to the conventional one-step method. The two-step method captured almost all of the same peptides matched by the one-step method, with a majority of the additional matches being false negatives from the one-step method. Furthermore, the two-step method improved results regardless of the database search program used. Our results show that our two-step method maximizes the peptide matching sensitivity for applications requiring large databases, especially valuable for proteogenomics and metaproteomics studies. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Li, Honglan; Joh, Yoon Sung; Kim, Hyunwoo; Paek, Eunok; Lee, Sang-Won; Hwang, Kyu-Baek
2016-12-22
Proteogenomics is a promising approach for various tasks ranging from gene annotation to cancer research. Databases for proteogenomic searches are often constructed by adding peptide sequences inferred from genomic or transcriptomic evidence to reference protein sequences. Such inflation of databases has potential of identifying novel peptides. However, it also raises concerns on sensitive and reliable peptide identification. Spurious peptides included in target databases may result in underestimated false discovery rate (FDR). On the other hand, inflation of decoy databases could decrease the sensitivity of peptide identification due to the increased number of high-scoring random hits. Although several studies have addressed these issues, widely applicable guidelines for sensitive and reliable proteogenomic search have hardly been available. To systematically evaluate the effect of database inflation in proteogenomic searches, we constructed a variety of real and simulated proteogenomic databases for yeast and human tandem mass spectrometry (MS/MS) data, respectively. Against these databases, we tested two popular database search tools with various approaches to search result validation: the target-decoy search strategy (with and without a refined scoring-metric) and a mixture model-based method. The effect of separate filtering of known and novel peptides was also examined. The results from real and simulated proteogenomic searches confirmed that separate filtering increases the sensitivity and reliability in proteogenomic search. However, no one method consistently identified the largest (or the smallest) number of novel peptides from real proteogenomic searches. We propose to use a set of search result validation methods with separate filtering, for sensitive and reliable identification of peptides in proteogenomic search.
National Institute of Standards and Technology Data Gateway
SRD 17 NIST Chemical Kinetics Database (Web, free access) The NIST Chemical Kinetics Database includes essentially all reported kinetics results for thermal gas-phase chemical reactions. The database is designed to be searched for kinetics data based on the specific reactants involved, for reactions resulting in specified products, for all the reactions of a particular species, or for various combinations of these. In addition, the bibliography can be searched by author name or combination of names. The database contains in excess of 38,000 separate reaction records for over 11,700 distinct reactant pairs. These data have been abstracted from over 12,000 papers with literature coverage through early 2000.
Flexible Decision Support in Device-Saturated Environments
2003-10-01
also output tuples to a remote MySQL or Postgres database. 3.3 GUI The GUI allows the user to pose queries using SQL and to display query...DatabaseConnection.java – handles connections to an external database (such as MySQL or Postgres ). • Debug.java – contains the code for printing out Debug messages...also provided. It is possible to output the results of queries to a MySQL or Postgres database for archival and the GUI can query those results
Hydrogen Leak Detection Sensor Database
NASA Technical Reports Server (NTRS)
Baker, Barton D.
2010-01-01
This slide presentation reviews the characteristics of the Hydrogen Sensor database. The database is the result of NASA's continuing interest in and improvement of its ability to detect and assess gas leaks in space applications. The database specifics and a snapshot of an entry in the database are reviewed. Attempts were made to determine the applicability of each of the 65 sensors for ground and/or vehicle use.
Simple Logic for Big Problems: An Inside Look at Relational Databases.
ERIC Educational Resources Information Center
Seba, Douglas B.; Smith, Pat
1982-01-01
Discusses database design concept termed "normalization" (process replacing associations between data with associations in two-dimensional tabular form) which results in formation of relational databases (they are to computers what dictionaries are to spoken languages). Applications of the database in serials control and complex systems…
NASA Technical Reports Server (NTRS)
Baldwin, John; Zendejas, Silvino; Gutheinz, Sandy; Borden, Chester; Wang, Yeou-Fang
2009-01-01
Mission and Assets Database (MADB) Version 1.0 is an SQL database system with a Web user interface to centralize information. The database stores flight project support resource requirements, view periods, antenna information, schedule, and forecast results for use in mid-range and long-term planning of Deep Space Network (DSN) assets.
NASA Astrophysics Data System (ADS)
Yin, Lucy; Andrews, Jennifer; Heaton, Thomas
2018-05-01
Earthquake parameter estimations using nearest neighbor searching among a large database of observations can lead to reliable prediction results. However, in the real-time application of Earthquake Early Warning (EEW) systems, the accurate prediction using a large database is penalized by a significant delay in the processing time. We propose to use a multidimensional binary search tree (KD tree) data structure to organize large seismic databases to reduce the processing time in nearest neighbor search for predictions. We evaluated the performance of KD tree on the Gutenberg Algorithm, a database-searching algorithm for EEW. We constructed an offline test to predict peak ground motions using a database with feature sets of waveform filter-bank characteristics, and compare the results with the observed seismic parameters. We concluded that large database provides more accurate predictions of the ground motion information, such as peak ground acceleration, velocity, and displacement (PGA, PGV, PGD), than source parameters, such as hypocenter distance. Application of the KD tree search to organize the database reduced the average searching process by 85% time cost of the exhaustive method, allowing the method to be feasible for real-time implementation. The algorithm is straightforward and the results will reduce the overall time of warning delivery for EEW.
Tiered Human Integrated Sequence Search Databases for Shotgun Proteomics.
Deutsch, Eric W; Sun, Zhi; Campbell, David S; Binz, Pierre-Alain; Farrah, Terry; Shteynberg, David; Mendoza, Luis; Omenn, Gilbert S; Moritz, Robert L
2016-11-04
The results of analysis of shotgun proteomics mass spectrometry data can be greatly affected by the selection of the reference protein sequence database against which the spectra are matched. For many species there are multiple sources from which somewhat different sequence sets can be obtained. This can lead to confusion about which database is best in which circumstances-a problem especially acute in human sample analysis. All sequence databases are genome-based, with sequences for the predicted gene and their protein translation products compiled. Our goal is to create a set of primary sequence databases that comprise the union of sequences from many of the different available sources and make the result easily available to the community. We have compiled a set of four sequence databases of varying sizes, from a small database consisting of only the ∼20,000 primary isoforms plus contaminants to a very large database that includes almost all nonredundant protein sequences from several sources. This set of tiered, increasingly complete human protein sequence databases suitable for mass spectrometry proteomics sequence database searching is called the Tiered Human Integrated Search Proteome set. In order to evaluate the utility of these databases, we have analyzed two different data sets, one from the HeLa cell line and the other from normal human liver tissue, with each of the four tiers of database complexity. The result is that approximately 0.8%, 1.1%, and 1.5% additional peptides can be identified for Tiers 2, 3, and 4, respectively, as compared with the Tier 1 database, at substantially increasing computational cost. This increase in computational cost may be worth bearing if the identification of sequence variants or the discovery of sequences that are not present in the reviewed knowledge base entries is an important goal of the study. We find that it is useful to search a data set against a simpler database, and then check the uniqueness of the discovered peptides against a more complex database. We have set up an automated system that downloads all the source databases on the first of each month and automatically generates a new set of search databases and makes them available for download at http://www.peptideatlas.org/thisp/ .
Tiered Human Integrated Sequence Search Databases for Shotgun Proteomics
Deutsch, Eric W.; Sun, Zhi; Campbell, David S.; Binz, Pierre-Alain; Farrah, Terry; Shteynberg, David; Mendoza, Luis; Omenn, Gilbert S.; Moritz, Robert L.
2016-01-01
The results of analysis of shotgun proteomics mass spectrometry data can be greatly affected by the selection of the reference protein sequence database against which the spectra are matched. For many species there are multiple sources from which somewhat different sequence sets can be obtained. This can lead to confusion about which database is best in which circumstances – a problem especially acute in human sample analysis. All sequence databases are genome-based, with sequences for the predicted gene and their protein translation products compiled. Our goal is to create a set of primary sequence databases that comprise the union of sequences from many of the different available sources and make the result easily available to the community. We have compiled a set of four sequence databases of varying sizes, from a small database consisting of only the ~20,000 primary isoforms plus contaminants to a very large database that includes almost all non-redundant protein sequences from several sources. This set of tiered, increasingly complete human protein sequence databases suitable for mass spectrometry proteomics sequence database searching is called the Tiered Human Integrated Search Proteome set. In order to evaluate the utility of these databases, we have analyzed two different data sets, one from the HeLa cell line and the other from normal human liver tissue, with each of the four tiers of database complexity. The result is that approximately 0.8%, 1.1%, and 1.5% additional peptides can be identified for Tiers 2, 3, and 4, respectively, as compared with the Tier 1 database, at substantially increasing computational cost. This increase in computational cost may be worth bearing if the identification of sequence variants or the discovery of sequences that are not present in the reviewed knowledge base entries is an important goal of the study. We find that it is useful to search a data set against a simpler database, and then check the uniqueness of the discovered peptides against a more complex database. We have set up an automated system that downloads all the source databases on the first of each month and automatically generates a new set of search databases and makes them available for download at http://www.peptideatlas.org/thisp/. PMID:27577934
Mackey, Aaron J; Pearson, William R
2004-10-01
Relational databases are designed to integrate diverse types of information and manage large sets of search results, greatly simplifying genome-scale analyses. Relational databases are essential for management and analysis of large-scale sequence analyses, and can also be used to improve the statistical significance of similarity searches by focusing on subsets of sequence libraries most likely to contain homologs. This unit describes using relational databases to improve the efficiency of sequence similarity searching and to demonstrate various large-scale genomic analyses of homology-related data. This unit describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. These include basic use of the database to generate a novel sequence library subset, how to extend and use seqdb_demo for the storage of sequence similarity search results and making use of various kinds of stored search results to address aspects of comparative genomic analysis.
Quantification of the Uncertainties for the Ares I A106 Ascent Aerodynamic Database
NASA Technical Reports Server (NTRS)
Houlden, Heather P.; Favaregh, Amber L.
2010-01-01
A detailed description of the quantification of uncertainties for the Ares I ascent aero 6-DOF wind tunnel database is presented. The database was constructed from wind tunnel test data and CFD results. The experimental data came from tests conducted in the Boeing Polysonic Wind Tunnel in St. Louis and the Unitary Plan Wind Tunnel at NASA Langley Research Center. The major sources of error for this database were: experimental error (repeatability), database modeling errors, and database interpolation errors.
NASA Astrophysics Data System (ADS)
Cao, Hua-Wen; Pei, Qiu-Ming; Zhang, Shou-Ting; Zhang, Lin-Kui; Tang, Li; Lin, Jin-Zhan; Zheng, Luo
2017-04-01
The Lailishan deposit is an important tin deposit that is genetically associated with an Early Eocene biotite granite in the western Yunnan metallogenic belt in the Sanjiang region, SW China. This study reports new zircon U-Pb ages and Hf isotopic data, whole-rock elements, mica Ar-Ar age and C-H-O-S-Pb isotope for the Lailishan Sn deposit. The mineralization-related biotite granite crystallized during the Early Eocene (50.5 Ma), with its zircon εHf(t) values ranging from -11.5 to -7.6 and two-stage Hf model ages (TDM2) ranging from 1.60 to 1.85 Ga. The rocks are peraluminous with A/CNK values of 0.99-1.08. The granites display high Si, Al and K contents but low Mg, Fe and Ca contents. The rocks show flat chondrite-normalized REE patterns with strong Eu negative anomalies. These characteristics indicate that the magma originated from a continental crustal source. The hydrothermal muscovite exhibits an Ar-Ar plateau age of 50.4 ± 0.2 Ma. The δ18O and δD values of hydrothermal quartz from the deposit range from -7.32‰ to 4.01‰ and from -124.9‰ to -87.1‰, respectively. The δ13CPDB and δ18OSMOW values of calcite range from -11.3‰ to -3.7‰ and from +2.2‰ to +12.7‰, respectively. The sulfur isotopic compositions (δ34SV-CDT) range from +3.3‰ to +8.6‰ for sulfide separates, and the lead isotopic ratios 206Pb/204Pb, 207Pb/204Pb and 208Pb/204Pb range from 18.668 to 18.746, from 15.710 to 15.743 and from 39.202 to 39.295, respectively. These isotopic compositions are similar to those of magma-derived fluids, indicating that the ore-forming fluids and materials mainly originated from magmatic rocks with some input from meteoric water. This evidence suggests that the tin mineralization is closely linked to the Lailishan I-type granites. In combination with previous data, it is proposed in this study that widespread early Eocene magmatism resulted from the slab breakoff of the subducting Neo-Tethyan slab at ca. 55 Ma.
A data model and database for high-resolution pathology analytical image informatics.
Wang, Fusheng; Kong, Jun; Cooper, Lee; Pan, Tony; Kurc, Tahsin; Chen, Wenjin; Sharma, Ashish; Niedermayr, Cristobal; Oh, Tae W; Brat, Daniel; Farris, Alton B; Foran, David J; Saltz, Joel
2011-01-01
The systematic analysis of imaged pathology specimens often results in a vast amount of morphological information at both the cellular and sub-cellular scales. While microscopy scanners and computerized analysis are capable of capturing and analyzing data rapidly, microscopy image data remain underutilized in research and clinical settings. One major obstacle which tends to reduce wider adoption of these new technologies throughout the clinical and scientific communities is the challenge of managing, querying, and integrating the vast amounts of data resulting from the analysis of large digital pathology datasets. This paper presents a data model, which addresses these challenges, and demonstrates its implementation in a relational database system. This paper describes a data model, referred to as Pathology Analytic Imaging Standards (PAIS), and a database implementation, which are designed to support the data management and query requirements of detailed characterization of micro-anatomic morphology through many interrelated analysis pipelines on whole-slide images and tissue microarrays (TMAs). (1) Development of a data model capable of efficiently representing and storing virtual slide related image, annotation, markup, and feature information. (2) Development of a database, based on the data model, capable of supporting queries for data retrieval based on analysis and image metadata, queries for comparison of results from different analyses, and spatial queries on segmented regions, features, and classified objects. The work described in this paper is motivated by the challenges associated with characterization of micro-scale features for comparative and correlative analyses involving whole-slides tissue images and TMAs. Technologies for digitizing tissues have advanced significantly in the past decade. Slide scanners are capable of producing high-magnification, high-resolution images from whole slides and TMAs within several minutes. Hence, it is becoming increasingly feasible for basic, clinical, and translational research studies to produce thousands of whole-slide images. Systematic analysis of these large datasets requires efficient data management support for representing and indexing results from hundreds of interrelated analyses generating very large volumes of quantifications such as shape and texture and of classifications of the quantified features. We have designed a data model and a database to address the data management requirements of detailed characterization of micro-anatomic morphology through many interrelated analysis pipelines. The data model represents virtual slide related image, annotation, markup and feature information. The database supports a wide range of metadata and spatial queries on images, annotations, markups, and features. We currently have three databases running on a Dell PowerEdge T410 server with CentOS 5.5 Linux operating system. The database server is IBM DB2 Enterprise Edition 9.7.2. The set of databases consists of 1) a TMA database containing image analysis results from 4740 cases of breast cancer, with 641 MB storage size; 2) an algorithm validation database, which stores markups and annotations from two segmentation algorithms and two parameter sets on 18 selected slides, with 66 GB storage size; and 3) an in silico brain tumor study database comprising results from 307 TCGA slides, with 365 GB storage size. The latter two databases also contain human-generated annotations and markups for regions and nuclei. Modeling and managing pathology image analysis results in a database provide immediate benefits on the value and usability of data in a research study. The database provides powerful query capabilities, which are otherwise difficult or cumbersome to support by other approaches such as programming languages. Standardized, semantic annotated data representation and interfaces also make it possible to more efficiently share image data and analysis results.
Information Retrieval in Telemedicine: a Comparative Study on Bibliographic Databases
Ahmadi, Maryam; Sarabi, Roghayeh Ershad; Orak, Roohangiz Jamshidi; Bahaadinbeigy, Kambiz
2015-01-01
Background and Aims: The first step in each systematic review is selection of the most valid database that can provide the highest number of relevant references. This study was carried out to determine the most suitable database for information retrieval in telemedicine field. Methods: Cinhal, PubMed, Web of Science and Scopus databases were searched for telemedicine matched with Education, cost benefit and patient satisfaction. After analysis of the obtained results, the accuracy coefficient, sensitivity, uniqueness and overlap of databases were calculated. Results: The studied databases differed in the number of retrieved articles. PubMed was identified as the most suitable database for retrieving information on the selected topics with the accuracy and sensitivity ratios of 50.7% and 61.4% respectively. The uniqueness percent of retrieved articles ranged from 38% for Pubmed to 3.0% for Cinhal. The highest overlap rate (18.6%) was found between PubMed and Web of Science. Less than 1% of articles have been indexed in all searched databases. Conclusion: PubMed is suggested as the most suitable database for starting search in telemedicine and after PubMed, Scopus and Web of Science can retrieve about 90% of the relevant articles. PMID:26236086
ARACHNID: A prototype object-oriented database tool for distributed systems
NASA Technical Reports Server (NTRS)
Younger, Herbert; Oreilly, John; Frogner, Bjorn
1994-01-01
This paper discusses the results of a Phase 2 SBIR project sponsored by NASA and performed by MIMD Systems, Inc. A major objective of this project was to develop specific concepts for improved performance in accessing large databases. An object-oriented and distributed approach was used for the general design, while a geographical decomposition was used as a specific solution. The resulting software framework is called ARACHNID. The Faint Source Catalog developed by NASA was the initial database testbed. This is a database of many giga-bytes, where an order of magnitude improvement in query speed is being sought. This database contains faint infrared point sources obtained from telescope measurements of the sky. A geographical decomposition of this database is an attractive approach to dividing it into pieces. Each piece can then be searched on individual processors with only a weak data linkage between the processors being required. As a further demonstration of the concepts implemented in ARACHNID, a tourist information system is discussed. This version of ARACHNID is the commercial result of the project. It is a distributed, networked, database application where speed, maintenance, and reliability are important considerations. This paper focuses on the design concepts and technologies that form the basis for ARACHNID.
NASA Astrophysics Data System (ADS)
Bashev, A.
2012-04-01
Currently there is an enormous amount of various geoscience databases. Unfortunately the only users of the majority of the databases are their elaborators. There are several reasons for that: incompaitability, specificity of tasks and objects and so on. However the main obstacles for wide usage of geoscience databases are complexity for elaborators and complication for users. The complexity of architecture leads to high costs that block the public access. The complication prevents users from understanding when and how to use the database. Only databases, associated with GoogleMaps don't have these drawbacks, but they could be hardly named "geoscience" Nevertheless, open and simple geoscience database is necessary at least for educational purposes (see our abstract for ESSI20/EOS12). We developed a database and web interface to work with them and now it is accessible at maps.sch192.ru. In this database a result is a value of a parameter (no matter which) in a station with a certain position, associated with metadata: the date when the result was obtained; the type of a station (lake, soil etc); the contributor that sent the result. Each contributor has its own profile, that allows to estimate the reliability of the data. The results can be represented on GoogleMaps space image as a point in a certain position, coloured according to the value of the parameter. There are default colour scales and each registered user can create the own scale. The results can be also extracted in *.csv file. For both types of representation one could select the data by date, object type, parameter type, area and contributor. The data are uploaded in *.csv format: Name of the station; Lattitude(dd.dddddd); Longitude(ddd.dddddd); Station type; Parameter type; Parameter value; Date(yyyy-mm-dd). The contributor is recognised while entering. This is the minimal set of features that is required to connect a value of a parameter with a position and see the results. All the complicated data treatment could be conducted in other programs after extraction the filtered data into *.csv file. It makes the database understandable for non-experts. The database employs open data format (*.csv) and wide spread tools: PHP as the program language, MySQL as database management system, JavaScript for interaction with GoogleMaps and JQueryUI for create user interface. The database is multilingual: there are association tables, which connect with elements of the database. In total the development required about 150 hours. The database still has several problems. The main problem is the reliability of the data. Actually it needs an expert system for estimation the reliability, but the elaboration of such a system would take more resources than the database itself. The second problem is the problem of stream selection - how to select the stations that are connected with each other (for example, belong to one water stream) and indicate their sequence. Currently the interface is English and Russian. However it can be easily translated to your language. But some problems we decided. For example problem "the problem of the same station" (sometimes the distance between stations is smaller, than the error of position): when you adding new station to the database our application automatically find station near this place. Also we decided problem of object and parameter type (how to regard "EC" and "electrical conductivity" as the same parameter). This problem has been solved using "associative tables". If you would like to see the interface on your language, just contact us. We should send you the list of terms and phrases for translation on your language. The main advantage of the database is that it is totally open: everybody can see, extract the data from the database and use them for non-commercial purposes with no charge. Registered users can contribute to the database without getting paid. We hope, that it will be widely used first of all for education purposes, but professional scientists could use it also.
Scholarly Online Database Use in Higher Education: A Faculty Survey.
ERIC Educational Resources Information Center
Piotrowski, Chris; Perdue, Bob; Armstrong, Terry
2005-01-01
The present study reports the results of a survey conducted at the University of West Florida concerning faculty usage and views toward online databases. Most respondents (N=46) felt quite satisfied with scholarly database availability through the university library. However, some faculty suggested that databases such as Current Contents and…
University Faculty Use of Computerized Databases: An Assessment of Needs and Resources.
ERIC Educational Resources Information Center
Borgman, Christine L.; And Others
1985-01-01
Results of survey indicate that: academic faculty are unaware of range of databases available; few recognize need for databases in research; most delegate searching to librarian or assistant, rather than perform searching themselves; and 39 database guides identified tended to be descriptive rather than evaluative. A comparison of the guides is…
Analysis of Landslide Hazard Impact Using the Landslide Database for Germany
NASA Astrophysics Data System (ADS)
Klose, M.; Damm, B.
2014-12-01
The Federal Republic of Germany has long been among the few European countries that lack a national landslide database. Systematic collection and inventory of landslide data still shows a comprehensive research history in Germany, but only one focused on development of databases with local or regional coverage. This has changed in recent years with the launch of a database initiative aimed at closing the data gap existing at national level. The present contribution reports on this project that is based on a landslide database which evolved over the last 15 years to a database covering large parts of Germany. A strategy of systematic retrieval, extraction, and fusion of landslide data is at the heart of the methodology, providing the basis for a database with a broad potential of application. The database offers a data pool of more than 4,200 landslide data sets with over 13,000 single data files and dates back to 12th century. All types of landslides are covered by the database, which stores not only core attributes, but also various complementary data, including data on landslide causes, impacts, and mitigation. The current database migration to PostgreSQL/PostGIS is focused on unlocking the full scientific potential of the database, while enabling data sharing and knowledge transfer via a web GIS platform. In this contribution, the goals and the research strategy of the database project are highlighted at first, with a summary of best practices in database development providing perspective. Next, the focus is on key aspects of the methodology, which is followed by the results of different case studies in the German Central Uplands. The case study results exemplify database application in analysis of vulnerability to landslides, impact statistics, and hazard or cost modeling.
Normative Databases for Imaging Instrumentation
Realini, Tony; Zangwill, Linda; Flanagan, John; Garway-Heath, David; Patella, Vincent Michael; Johnson, Chris; Artes, Paul; Ben Gaddie, I.; Fingeret, Murray
2015-01-01
Purpose To describe the process by which imaging devices undergo reference database development and regulatory clearance. The limitations and potential improvements of reference (normative) data sets for ophthalmic imaging devices will be discussed. Methods A symposium was held in July 2013 in which a series of speakers discussed issues related to the development of reference databases for imaging devices. Results Automated imaging has become widely accepted and used in glaucoma management. The ability of such instruments to discriminate healthy from glaucomatous optic nerves, and to detect glaucomatous progression over time is limited by the quality of reference databases associated with the available commercial devices. In the absence of standardized rules governing the development of reference databases, each manufacturer’s database differs in size, eligibility criteria, and ethnic make-up, among other key features. Conclusions The process for development of imaging reference databases may be improved by standardizing eligibility requirements and data collection protocols. Such standardization may also improve the degree to which results may be compared between commercial instruments. PMID:25265003
Organization and dissemination of multimedia medical databases on the WWW.
Todorovski, L; Ribaric, S; Dimec, J; Hudomalj, E; Lunder, T
1999-01-01
In the paper, we focus on the problem of building and disseminating multimedia medical databases on the World Wide Web (WWW). The current results of the ongoing project of building a prototype dermatology images database and its WWW presentation are presented. The dermatology database is part of an ambitious plan concerning an organization of a network of medical institutions building distributed and federated multimedia databases of a much wider scale.
Partitioning medical image databases for content-based queries on a Grid.
Montagnat, J; Breton, V; E Magnin, I
2005-01-01
In this paper we study the impact of executing a medical image database query application on the grid. For lowering the total computation time, the image database is partitioned into subsets to be processed on different grid nodes. A theoretical model of the application complexity and estimates of the grid execution overhead are used to efficiently partition the database. We show results demonstrating that smart partitioning of the database can lead to significant improvements in terms of total computation time. Grids are promising for content-based image retrieval in medical databases.
The landslide database for Germany: Closing the gap at national level
NASA Astrophysics Data System (ADS)
Damm, Bodo; Klose, Martin
2015-11-01
The Federal Republic of Germany has long been among the few European countries that lack a national landslide database. Systematic collection and inventory of landslide data still has a long research history in Germany, but one focussed on the development of databases with local or regional coverage. This has changed in recent years with the launch of a database initiative aimed at closing the data gap existing at national level. The present paper reports on this project that is based on a landslide database which evolved over the last 15 years to a database covering large parts of Germany. A strategy of systematic retrieval, extraction, and fusion of landslide data is at the heart of the methodology, providing the basis for a database with a broad potential of application. The database offers a data pool of more than 4,200 landslide data sets with over 13,000 single data files and dates back to the 12th century. All types of landslides are covered by the database, which stores not only core attributes, but also various complementary data, including data on landslide causes, impacts, and mitigation. The current database migration to PostgreSQL/PostGIS is focused on unlocking the full scientific potential of the database, while enabling data sharing and knowledge transfer via a web GIS platform. In this paper, the goals and the research strategy of the database project are highlighted at first, with a summary of best practices in database development providing perspective. Next, the focus is on key aspects of the methodology, which is followed by the results of three case studies in the German Central Uplands. The case study results exemplify database application in the analysis of landslide frequency and causes, impact statistics, and landslide susceptibility modeling. Using the example of these case studies, strengths and weaknesses of the database are discussed in detail. The paper concludes with a summary of the database project with regard to previous achievements and the strategic roadmap.
Lupiañez-Barbero, Ascension; González Blanco, Cintia; de Leiva Hidalgo, Alberto
2018-05-23
Food composition tables and databases (FCTs or FCDBs) provide the necessary information to estimate intake of nutrients and other food components. In Spain, the lack of a reference database has resulted in use of different FCTs/FCDBs in nutritional surveys and research studies, as well as for development of dietetic for diet analysis. As a result, biased, non-comparable results are obtained, and healthcare professionals are rarely aware of these limitations. AECOSAN and the BEDCA association developed a FCDB following European standards, the Spanish Food Composition Database Network (RedBEDCA).The current database has a limited number of foods and food components and barely contains processed foods, which limits its use in epidemiological studies and in the daily practice of healthcare professionals. Copyright © 2018 SEEN y SED. Publicado por Elsevier España, S.L.U. All rights reserved.
Surgical research using national databases
Leland, Hyuma; Heckmann, Nathanael
2016-01-01
Recent changes in healthcare and advances in technology have increased the use of large-volume national databases in surgical research. These databases have been used to develop perioperative risk stratification tools, assess postoperative complications, calculate costs, and investigate numerous other topics across multiple surgical specialties. The results of these studies contain variable information but are subject to unique limitations. The use of large-volume national databases is increasing in popularity, and thorough understanding of these databases will allow for a more sophisticated and better educated interpretation of studies that utilize such databases. This review will highlight the composition, strengths, and weaknesses of commonly used national databases in surgical research. PMID:27867945
Surgical research using national databases.
Alluri, Ram K; Leland, Hyuma; Heckmann, Nathanael
2016-10-01
Recent changes in healthcare and advances in technology have increased the use of large-volume national databases in surgical research. These databases have been used to develop perioperative risk stratification tools, assess postoperative complications, calculate costs, and investigate numerous other topics across multiple surgical specialties. The results of these studies contain variable information but are subject to unique limitations. The use of large-volume national databases is increasing in popularity, and thorough understanding of these databases will allow for a more sophisticated and better educated interpretation of studies that utilize such databases. This review will highlight the composition, strengths, and weaknesses of commonly used national databases in surgical research.
EQUIP: A European Survey of Quality Criteria for the Evaluation of Databases.
ERIC Educational Resources Information Center
Wilson, T. D.
1998-01-01
Reports on two stages of an investigation into the perceived quality of online databases. Presents data from 989 questionnaires from 600 database users in 12 European and Scandinavian countries and results of a test of the SERVQUAL methodology for identifying user expectations about database services. Lists statements used in the SERVQUAL survey.…
Quadcopter Control Using Speech Recognition
NASA Astrophysics Data System (ADS)
Malik, H.; Darma, S.; Soekirno, S.
2018-04-01
This research reported a comparison from a success rate of speech recognition systems that used two types of databases they were existing databases and new databases, that were implemented into quadcopter as motion control. Speech recognition system was using Mel frequency cepstral coefficient method (MFCC) as feature extraction that was trained using recursive neural network method (RNN). MFCC method was one of the feature extraction methods that most used for speech recognition. This method has a success rate of 80% - 95%. Existing database was used to measure the success rate of RNN method. The new database was created using Indonesian language and then the success rate was compared with results from an existing database. Sound input from the microphone was processed on a DSP module with MFCC method to get the characteristic values. Then, the characteristic values were trained using the RNN which result was a command. The command became a control input to the single board computer (SBC) which result was the movement of the quadcopter. On SBC, we used robot operating system (ROS) as the kernel (Operating System).
EasyKSORD: A Platform of Keyword Search Over Relational Databases
NASA Astrophysics Data System (ADS)
Peng, Zhaohui; Li, Jing; Wang, Shan
Keyword Search Over Relational Databases (KSORD) enables casual users to use keyword queries (a set of keywords) to search relational databases just like searching the Web, without any knowledge of the database schema or any need of writing SQL queries. Based on our previous work, we design and implement a novel KSORD platform named EasyKSORD for users and system administrators to use and manage different KSORD systems in a novel and simple manner. EasyKSORD supports advanced queries, efficient data-graph-based search engines, multiform result presentations, and system logging and analysis. Through EasyKSORD, users can search relational databases easily and read search results conveniently, and system administrators can easily monitor and analyze the operations of KSORD and manage KSORD systems much better.
Planas, M; Rodríguez, T; Lecha, M
2004-01-01
Decisions have to be made about what data on patient characteristics and processes and outcome need to be collected, and standard definitions of these data items need to be developed to identify data quality concerns as promptly as possible and to establish ways to improve data quality. The usefulness of any clinical database depends strongly on the quality of the collected data. If the data quality is poor, the results of studies using the database might be biased and unreliable. Furthermore, if the quality of the database has not been verified, the results might be given little credence, especially if they are unwelcome or unexpected. To assure the quality of clinical database is essential the clear definition of the uses to which the database is going to be put; the database should to be developed that is comprehensive in terms of its usefulness but limited in its size.
AgdbNet – antigen sequence database software for bacterial typing
Jolley, Keith A; Maiden, Martin CJ
2006-01-01
Background Bacterial typing schemes based on the sequences of genes encoding surface antigens require databases that provide a uniform, curated, and widely accepted nomenclature of the variants identified. Due to the differences in typing schemes, imposed by the diversity of genes targeted, creating these databases has typically required the writing of one-off code to link the database to a web interface. Here we describe agdbNet, widely applicable web database software that facilitates simultaneous BLAST querying of multiple loci using either nucleotide or peptide sequences. Results Databases are described by XML files that are parsed by a Perl CGI script. Each database can have any number of loci, which may be defined by nucleotide and/or peptide sequences. The software is currently in use on at least five public databases for the typing of Neisseria meningitidis, Campylobacter jejuni and Streptococcus equi and can be set up to query internal isolate tables or suitably-configured external isolate databases, such as those used for multilocus sequence typing. The style of the resulting website can be fully configured by modifying stylesheets and through the use of customised header and footer files that surround the output of the script. Conclusion The software provides a rapid means of setting up customised Internet antigen sequence databases. The flexible configuration options enable typing schemes with differing requirements to be accommodated. PMID:16790057
GlycomeDB – integration of open-access carbohydrate structure databases
Ranzinger, René; Herget, Stephan; Wetter, Thomas; von der Lieth, Claus-Wilhelm
2008-01-01
Background Although carbohydrates are the third major class of biological macromolecules, after proteins and DNA, there is neither a comprehensive database for carbohydrate structures nor an established universal structure encoding scheme for computational purposes. Funding for further development of the Complex Carbohydrate Structure Database (CCSD or CarbBank) ceased in 1997, and since then several initiatives have developed independent databases with partially overlapping foci. For each database, different encoding schemes for residues and sequence topology were designed. Therefore, it is virtually impossible to obtain an overview of all deposited structures or to compare the contents of the various databases. Results We have implemented procedures which download the structures contained in the seven major databases, e.g. GLYCOSCIENCES.de, the Consortium for Functional Glycomics (CFG), the Kyoto Encyclopedia of Genes and Genomes (KEGG) and the Bacterial Carbohydrate Structure Database (BCSDB). We have created a new database called GlycomeDB, containing all structures, their taxonomic annotations and references (IDs) for the original databases. More than 100000 datasets were imported, resulting in more than 33000 unique sequences now encoded in GlycomeDB using the universal format GlycoCT. Inconsistencies were found in all public databases, which were discussed and corrected in multiple feedback rounds with the responsible curators. Conclusion GlycomeDB is a new, publicly available database for carbohydrate sequences with a unified, all-encompassing structure encoding format and NCBI taxonomic referencing. The database is updated weekly and can be downloaded free of charge. The JAVA application GlycoUpdateDB is also available for establishing and updating a local installation of GlycomeDB. With the advent of GlycomeDB, the distributed islands of knowledge in glycomics are now bridged to form a single resource. PMID:18803830
Domain fusion analysis by applying relational algebra to protein sequence and domain databases
Truong, Kevin; Ikura, Mitsuhiko
2003-01-01
Background Domain fusion analysis is a useful method to predict functionally linked proteins that may be involved in direct protein-protein interactions or in the same metabolic or signaling pathway. As separate domain databases like BLOCKS, PROSITE, Pfam, SMART, PRINTS-S, ProDom, TIGRFAMs, and amalgamated domain databases like InterPro continue to grow in size and quality, a computational method to perform domain fusion analysis that leverages on these efforts will become increasingly powerful. Results This paper proposes a computational method employing relational algebra to find domain fusions in protein sequence databases. The feasibility of this method was illustrated on the SWISS-PROT+TrEMBL sequence database using domain predictions from the Pfam HMM (hidden Markov model) database. We identified 235 and 189 putative functionally linked protein partners in H. sapiens and S. cerevisiae, respectively. From scientific literature, we were able to confirm many of these functional linkages, while the remainder offer testable experimental hypothesis. Results can be viewed at . Conclusion As the analysis can be computed quickly on any relational database that supports standard SQL (structured query language), it can be dynamically updated along with the sequence and domain databases, thereby improving the quality of predictions over time. PMID:12734020
Construction and validation of a population-based bone densitometry database.
Leslie, William D; Caetano, Patricia A; Macwilliam, Leonard R; Finlayson, Gregory S
2005-01-01
Utilization of dual-energy X-ray absorptiometry (DXA) for the initial diagnostic assessment of osteoporosis and in monitoring treatment has risen dramatically in recent years. Population-based studies of the impact of DXA and osteoporosis remain challenging because of incomplete and fragmented test data that exist in most regions. Our aim was to create and assess completeness of a database of all clinical DXA services and test results for the province of Manitoba, Canada and to present descriptive data resulting from testing. A regionally based bone density program for the province of Manitoba, Canada was established in 1997. Subsequent DXA services were prospectively captured in a program database. This database was retrospectively populated with earlier DXA results dating back to 1990 (the year that the first DXA scanner was installed) by integrating multiple data sources. A random chart audit was performed to assess completeness and accuracy of this dataset. For comparison, testing rates determined from the DXA database were compared with physician administrative claims data. There was a high level of completeness of this database (>99%) and accurate personal identifier information sufficient for linkage with other health care administrative data (>99%). This contrasted with physician billing data that were found to be markedly incomplete. Descriptive data provide a profile of individuals receiving DXA and their test results. In conclusion, the Manitoba bone density database has great potential as a resource for clinical and health policy research because it is population based with a high level of completeness and accuracy.
Interactive Database of Pulsar Flux Density Measurements
NASA Astrophysics Data System (ADS)
Koralewska, O.; Krzeszowski, K.; Kijak, J.; Lewandowski, W.
2012-12-01
The number of astronomical observations is steadily growing, giving rise to the need of cataloguing the obtained results. There are a lot of databases, created to store different types of data and serve a variety of purposes, e. g. databases providing basic data for astronomical objects (SIMBAD Astronomical Database), databases devoted to one type of astronomical object (ATNF Pulsar Database) or to a set of values of the specific parameter (Lorimer 1995 - database of flux density measurements for 280 pulsars on the frequencies up to 1606 MHz), etc. We found that creating an online database of pulsar flux measurements, provided with facilities for plotting diagrams and histograms, calculating mean values for a chosen set of data, filtering parameter values and adding new measurements by the registered users, could be useful in further studies on pulsar spectra.
Sönksen, Ute Wolff; Christensen, Jens Jørgen; Nielsen, Lisbeth; Hesselbjerg, Annemarie; Hansen, Dennis Schrøder; Bruun, Brita
2010-12-31
Taxonomy and identification of fastidious Gram negatives are evolving and challenging. We compared identifications achieved with the Vitek 2 Neisseria-Haemophilus (NH) card and partial 16S rRNA gene sequence (526 bp stretch) analysis with identifications obtained with extensive phenotypic characterization using 100 fastidious Gram negative bacteria. Seventy-five strains represented 21 of the 26 taxa included in the Vitek 2 NH database and 25 strains represented related species not included in the database. Of the 100 strains, 31 were the type strains of the species. Vitek 2 NH identification results: 48 of 75 database strains were correctly identified, 11 strains gave `low discrimination´, seven strains were unidentified, and nine strains were misidentified. Identification of 25 non-database strains resulted in 14 strains incorrectly identified as belonging to species in the database. Partial 16S rRNA gene sequence analysis results: For 76 strains phenotypic and sequencing identifications were identical, for 23 strains the sequencing identifications were either probable or possible, and for one strain only the genus was confirmed. Thus, the Vitek 2 NH system identifies most of the commonly occurring species included in the database. Some strains of rarely occurring species and strains of non-database species closely related to database species cause problems. Partial 16S rRNA gene sequence analysis performs well, but does not always suffice, additional phenotypical characterization being useful for final identification.
Sönksen, Ute Wolff; Christensen, Jens Jørgen; Nielsen, Lisbeth; Hesselbjerg, Annemarie; Hansen, Dennis Schrøder; Bruun, Brita
2010-01-01
Taxonomy and identification of fastidious Gram negatives are evolving and challenging. We compared identifications achieved with the Vitek 2 Neisseria-Haemophilus (NH) card and partial 16S rRNA gene sequence (526 bp stretch) analysis with identifications obtained with extensive phenotypic characterization using 100 fastidious Gram negative bacteria. Seventy-five strains represented 21 of the 26 taxa included in the Vitek 2 NH database and 25 strains represented related species not included in the database. Of the 100 strains, 31 were the type strains of the species. Vitek 2 NH identification results: 48 of 75 database strains were correctly identified, 11 strains gave `low discrimination´, seven strains were unidentified, and nine strains were misidentified. Identification of 25 non-database strains resulted in 14 strains incorrectly identified as belonging to species in the database. Partial 16S rRNA gene sequence analysis results: For 76 strains phenotypic and sequencing identifications were identical, for 23 strains the sequencing identifications were either probable or possible, and for one strain only the genus was confirmed. Thus, the Vitek 2 NH system identifies most of the commonly occurring species included in the database. Some strains of rarely occurring species and strains of non-database species closely related to database species cause problems. Partial 16S rRNA gene sequence analysis performs well, but does not always suffice, additional phenotypical characterization being useful for final identification. PMID:21347215
Khan, Aihab; Husain, Syed Afaq
2013-01-01
We put forward a fragile zero watermarking scheme to detect and characterize malicious modifications made to a database relation. Most of the existing watermarking schemes for relational databases introduce intentional errors or permanent distortions as marks into the database original content. These distortions inevitably degrade the data quality and data usability as the integrity of a relational database is violated. Moreover, these fragile schemes can detect malicious data modifications but do not characterize the tempering attack, that is, the nature of tempering. The proposed fragile scheme is based on zero watermarking approach to detect malicious modifications made to a database relation. In zero watermarking, the watermark is generated (constructed) from the contents of the original data rather than introduction of permanent distortions as marks into the data. As a result, the proposed scheme is distortion-free; thus, it also resolves the inherent conflict between security and imperceptibility. The proposed scheme also characterizes the malicious data modifications to quantify the nature of tempering attacks. Experimental results show that even minor malicious modifications made to a database relation can be detected and characterized successfully.
Cross-Border Use of Food Databases: Equivalence of US and Australian Databases for Macronutrients
Summer, Suzanne S.; Ollberding, Nicholas J.; Guy, Trish; Setchell, Kenneth D. R.; Brown, Nadine; Kalkwarf, Heidi J.
2013-01-01
When estimating dietary intake across multiple countries, the lack of a single comprehensive dietary database may lead researchers to modify one database to analyze intakes for all participants. This approach may yield results different from those using the country-specific database and introduce measurement error. We examined whether nutrient intakes of Australians calculated with a modified US database would be similar to those calculated with an Australian database. We analyzed 3-day food records of 68 Australian adults using the US-based Nutrition Data System for Research, modified to reflect food items consumed in Australia. Modification entailed identifying a substitute food whose energy and macronutrient content were within 10% of the Australian food or by adding a new food to the database. Paired Wilcoxon signed rank tests were used to compare differences in nutrient intakes estimated by both databases, and Pearson and intraclass correlation coefficients measured degree of association and agreement between intake estimates for individuals. Median intakes of energy, carbohydrate, protein, and fiber differed by <5% at the group level. Larger discrepancies were seen for fat (11%; P<0.0001) and most micronutrients. Despite strong correlations, nutrient intakes differed by >10% for an appreciable percentage of participants (35% for energy to 69% for total fat). Adding country-specific food items to an existing database resulted in similar overall macronutrient intake estimates but was insufficient for estimating individual intakes. When analyzing nutrient intakes in multinational studies, greater standardization and modification of databases may be required to more accurately estimate intake of individuals. PMID:23871108
Planned and ongoing projects (pop) database: development and results.
Wild, Claudia; Erdös, Judit; Warmuth, Marisa; Hinterreiter, Gerda; Krämer, Peter; Chalon, Patrice
2014-11-01
The aim of this study was to present the development, structure and results of a database on planned and ongoing health technology assessment (HTA) projects (POP Database) in Europe. The POP Database (POP DB) was set up in an iterative process from a basic Excel sheet to a multifunctional electronic online database. The functionalities, such as the search terminology, the procedures to fill and update the database, the access rules to enter the database, as well as the maintenance roles, were defined in a multistep participatory feedback loop with EUnetHTA Partners. The POP Database has become an online database that hosts not only the titles and MeSH categorizations, but also some basic information on status and contact details about the listed projects of EUnetHTA Partners. Currently, it stores more than 1,200 planned, ongoing or recently published projects of forty-three EUnetHTA Partners from twenty-four countries. Because the POP Database aims to facilitate collaboration, it also provides a matching system to assist in identifying similar projects. Overall, more than 10 percent of the projects in the database are identical both in terms of pathology (indication or disease) and technology (drug, medical device, intervention). In addition, approximately 30 percent of the projects are similar, meaning that they have at least some overlap in content. Although the POP DB is successful concerning regular updates of most national HTA agencies within EUnetHTA, little is known about its actual effects on collaborations in Europe. Moreover, many non-nationally nominated HTA producing agencies neither have access to the POP DB nor can share their projects.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yu, Haoyu S.; Zhang, Wenjing; Verma, Pragya
2015-01-01
The goal of this work is to develop a gradient approximation to the exchange–correlation functional of Kohn–Sham density functional theory for treating molecular problems with a special emphasis on the prediction of quantities important for homogeneous catalysis and other molecular energetics. Our training and validation of exchange–correlation functionals is organized in terms of databases and subdatabases. The key properties required for homogeneous catalysis are main group bond energies (database MGBE137), transition metal bond energies (database TMBE32), reaction barrier heights (database BH76), and molecular structures (database MS10). We also consider 26 other databases, most of which are subdatabases of a newlymore » extended broad database called Database 2015, which is presented in the present article and in its ESI. Based on the mathematical form of a nonseparable gradient approximation (NGA), as first employed in the N12 functional, we design a new functional by using Database 2015 and by adding smoothness constraints to the optimization of the functional. The resulting functional is called the gradient approximation for molecules, or GAM. The GAM functional gives better results for MGBE137, TMBE32, and BH76 than any available generalized gradient approximation (GGA) or than N12. The GAM functional also gives reasonable results for MS10 with an MUE of 0.018 Å. The GAM functional provides good results both within the training sets and outside the training sets. The convergence tests and the smooth curves of exchange–correlation enhancement factor as a function of the reduced density gradient show that the GAM functional is a smooth functional that should not lead to extra expense or instability in optimizations. NGAs, like GGAs, have the advantage over meta-GGAs and hybrid GGAs of respectively smaller grid-size requirements for integrations and lower costs for extended systems. These computational advantages combined with the relatively high accuracy for all the key properties needed for molecular catalysis make the GAM functional very promising for future applications.« less
ProteinWorldDB: querying radical pairwise alignments among protein sets from complete genomes.
Otto, Thomas Dan; Catanho, Marcos; Tristão, Cristian; Bezerra, Márcia; Fernandes, Renan Mathias; Elias, Guilherme Steinberger; Scaglia, Alexandre Capeletto; Bovermann, Bill; Berstis, Viktors; Lifschitz, Sergio; de Miranda, Antonio Basílio; Degrave, Wim
2010-03-01
Many analyses in modern biological research are based on comparisons between biological sequences, resulting in functional, evolutionary and structural inferences. When large numbers of sequences are compared, heuristics are often used resulting in a certain lack of accuracy. In order to improve and validate results of such comparisons, we have performed radical all-against-all comparisons of 4 million protein sequences belonging to the RefSeq database, using an implementation of the Smith-Waterman algorithm. This extremely intensive computational approach was made possible with the help of World Community Grid, through the Genome Comparison Project. The resulting database, ProteinWorldDB, which contains coordinates of pairwise protein alignments and their respective scores, is now made available. Users can download, compare and analyze the results, filtered by genomes, protein functions or clusters. ProteinWorldDB is integrated with annotations derived from Swiss-Prot, Pfam, KEGG, NCBI Taxonomy database and gene ontology. The database is a unique and valuable asset, representing a major effort to create a reliable and consistent dataset of cross-comparisons of the whole protein content encoded in hundreds of completely sequenced genomes using a rigorous dynamic programming approach. The database can be accessed through http://proteinworlddb.org
Mathematical Notation in Bibliographic Databases.
ERIC Educational Resources Information Center
Pasterczyk, Catherine E.
1990-01-01
Discusses ways in which using mathematical symbols to search online bibliographic databases in scientific and technical areas can improve search results. The representations used for Greek letters, relations, binary operators, arrows, and miscellaneous special symbols in the MathSci, Inspec, Compendex, and Chemical Abstracts databases are…
Online Database Coverage of Pharmaceutical Journals.
ERIC Educational Resources Information Center
Snow, Bonnie
1984-01-01
Describes compilation of data concerning pharmaceutical journal coverage in online databases which aid information providers in collection development and database selection. Methodology, results (a core collection, overlap, timeliness, geographic scope), and implications are discussed. Eight references and a list of 337 journals indexed online in…
Atmospheric Science Data Center
2016-06-24
... data granules using a high resolution spatial metadata database and directly accessing the archived data granules. Subset results are ... data granules using a high resolution spatial metadata database and directly accessing the archived data granules. Subset results are ...
Problem of Mistakes in Databases, Processing and Interpretation of Observations of the Sun. I.
NASA Astrophysics Data System (ADS)
Lozitska, N. I.
In databases of observations unnoticed mistakes and misprints could occur at any stage of observation, preparation and processing of databases. The current detection of errors is complicated by the fact that the works of observer, databases compiler and researcher were divided. Data acquisition from a spacecraft requires the greater amount of researchers than for ground-based observations. As a result, the probability of errors is increasing. Keeping track of the errors on each stage is very difficult, so we use of cross-comparison of data from different sources. We revealed some misprints in the typographic and digital results of sunspot group area measurements.
A database for the analysis of immunity genes in Drosophila: PADMA database.
Lee, Mark J; Mondal, Ariful; Small, Chiyedza; Paddibhatla, Indira; Kawaguchi, Akira; Govind, Shubha
2011-01-01
While microarray experiments generate voluminous data, discerning trends that support an existing or alternative paradigm is challenging. To synergize hypothesis building and testing, we designed the Pathogen Associated Drosophila MicroArray (PADMA) database for easy retrieval and comparison of microarray results from immunity-related experiments (www.padmadatabase.org). PADMA also allows biologists to upload their microarray-results and compare it with datasets housed within PADMA. We tested PADMA using a preliminary dataset from Ganaspis xanthopoda-infected fly larvae, and uncovered unexpected trends in gene expression, reshaping our hypothesis. Thus, the PADMA database will be a useful resource to fly researchers to evaluate, revise, and refine hypotheses.
Identifying relevant data for a biological database: handcrafted rules versus machine learning.
Sehgal, Aditya Kumar; Das, Sanmay; Noto, Keith; Saier, Milton H; Elkan, Charles
2011-01-01
With well over 1,000 specialized biological databases in use today, the task of automatically identifying novel, relevant data for such databases is increasingly important. In this paper, we describe practical machine learning approaches for identifying MEDLINE documents and Swiss-Prot/TrEMBL protein records, for incorporation into a specialized biological database of transport proteins named TCDB. We show that both learning approaches outperform rules created by hand by a human expert. As one of the first case studies involving two different approaches to updating a deployed database, both the methods compared and the results will be of interest to curators of many specialized databases.
The Status of Statewide Subscription Databases
ERIC Educational Resources Information Center
Krueger, Karla S.
2012-01-01
This qualitative content analysis presents subscription databases available to school libraries through statewide purchases. The results may help school librarians evaluate grade and subject-area coverage, make comparisons to recommended databases, and note potential suggestions for their states to include in future contracts or for local…
Bigger Is (Maybe) Better: Librarians' Views of Interdisciplinary Databases
ERIC Educational Resources Information Center
Gilbert, Julie K.
2010-01-01
This study investigates librarians' satisfaction with general interdisciplinary databases for undergraduate research and explores possibilities for improving these databases. Results from a national survey suggest that librarians at a variety of institutions are relatively satisfied overall with the content and usability of general,…
Data Mining the Ogle-II I-band Database for Eclipsing Binary Stars
NASA Astrophysics Data System (ADS)
Ciocca, M.
2013-08-01
The OGLE I-band database is a searchable database of quality photometric data available to the public. During Phase 2 of the experiment, known as "OGLE-II", I-band observations were made over a period of approximately 1,000 days, resulting in over 1010 measurements of more than 40 million stars. This was accomplished by using a filter with a passband near the standard Cousins Ic. The database of these observations is fully searchable using the mysql database engine, and provides the magnitude measurements and their uncertainties. In this work, a program of data mining the OGLE I-band database was performed, resulting in the discovery of 42 previously unreported eclipsing binaries. Using the software package Peranso (Vanmuster 2011) to analyze the light curves obtained from OGLE-II, the eclipsing types, the epochs and the periods of these eclipsing variables were determined, to one part in 106. A preliminary attempt to model the physical parameters of these binaries was also performed, using the Binary Maker 3 software (Bradstreet and Steelman 2004).
A database of new zeolite-like materials.
Pophale, Ramdas; Cheeseman, Phillip A; Deem, Michael W
2011-07-21
We here describe a database of computationally predicted zeolite-like materials. These crystals were discovered by a Monte Carlo search for zeolite-like materials. Positions of Si atoms as well as unit cell, space group, density, and number of crystallographically unique atoms were explored in the construction of this database. The database contains over 2.6 M unique structures. Roughly 15% of these are within +30 kJ mol(-1) Si of α-quartz, the band in which most of the known zeolites lie. These structures have topological, geometrical, and diffraction characteristics that are similar to those of known zeolites. The database is the result of refinement by two interatomic potentials that both satisfy the Pauli exclusion principle. The database has been deposited in the publicly available PCOD database and in www.hypotheticalzeolites.net/database/deem/. This journal is © the Owner Societies 2011
Folks, Russell D; Savir-Baruch, Bital; Garcia, Ernest V; Verdes, Liudmila; Taylor, Andrew T
2012-12-01
Our objective was to design and implement a clinical history database capable of linking to our database of quantitative results from (99m)Tc-mercaptoacetyltriglycine (MAG3) renal scans and export a data summary for physicians or our software decision support system. For database development, we used a commercial program. Additional software was developed in Interactive Data Language. MAG3 studies were processed using an in-house enhancement of a commercial program. The relational database has 3 parts: a list of all renal scans (the RENAL database), a set of patients with quantitative processing results (the Q2 database), and a subset of patients from Q2 containing clinical data manually transcribed from the hospital information system (the CLINICAL database). To test interobserver variability, a second physician transcriber reviewed 50 randomly selected patients in the hospital information system and tabulated 2 clinical data items: hydronephrosis and presence of a current stent. The CLINICAL database was developed in stages and contains 342 fields comprising demographic information, clinical history, and findings from up to 11 radiologic procedures. A scripted algorithm is used to reliably match records present in both Q2 and CLINICAL. An Interactive Data Language program then combines data from the 2 databases into an XML (extensible markup language) file for use by the decision support system. A text file is constructed and saved for review by physicians. RENAL contains 2,222 records, Q2 contains 456 records, and CLINICAL contains 152 records. The interobserver variability testing found a 95% match between the 2 observers for presence or absence of ureteral stent (κ = 0.52), a 75% match for hydronephrosis based on narrative summaries of hospitalizations and clinical visits (κ = 0.41), and a 92% match for hydronephrosis based on the imaging report (κ = 0.84). We have developed a relational database system to integrate the quantitative results of MAG3 image processing with clinical records obtained from the hospital information system. We also have developed a methodology for formatting clinical history for review by physicians and export to a decision support system. We identified several pitfalls, including the fact that important textual information extracted from the hospital information system by knowledgeable transcribers can show substantial interobserver variation, particularly when record retrieval is based on the narrative clinical records.
Reliability database development for use with an object-oriented fault tree evaluation program
NASA Technical Reports Server (NTRS)
Heger, A. Sharif; Harringtton, Robert J.; Koen, Billy V.; Patterson-Hine, F. Ann
1989-01-01
A description is given of the development of a fault-tree analysis method using object-oriented programming. In addition, the authors discuss the programs that have been developed or are under development to connect a fault-tree analysis routine to a reliability database. To assess the performance of the routines, a relational database simulating one of the nuclear power industry databases has been constructed. For a realistic assessment of the results of this project, the use of one of existing nuclear power reliability databases is planned.
NASA Technical Reports Server (NTRS)
Moroh, Marsha
1988-01-01
A methodology for building interfaces of resident database management systems to a heterogeneous distributed database management system under development at NASA, the DAVID system, was developed. The feasibility of that methodology was demonstrated by construction of the software necessary to perform the interface task. The interface terminology developed in the course of this research is presented. The work performed and the results are summarized.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Park, Yubin; Shankar, Mallikarjun; Park, Byung H.
Designing a database system for both efficient data management and data services has been one of the enduring challenges in the healthcare domain. In many healthcare systems, data services and data management are often viewed as two orthogonal tasks; data services refer to retrieval and analytic queries such as search, joins, statistical data extraction, and simple data mining algorithms, while data management refers to building error-tolerant and non-redundant database systems. The gap between service and management has resulted in rigid database systems and schemas that do not support effective analytics. We compose a rich graph structure from an abstracted healthcaremore » RDBMS to illustrate how we can fill this gap in practice. We show how a healthcare graph can be automatically constructed from a normalized relational database using the proposed 3NF Equivalent Graph (3EG) transformation.We discuss a set of real world graph queries such as finding self-referrals, shared providers, and collaborative filtering, and evaluate their performance over a relational database and its 3EG-transformed graph. Experimental results show that the graph representation serves as multiple de-normalized tables, thus reducing complexity in a database and enhancing data accessibility of users. Based on this finding, we propose an ensemble framework of databases for healthcare applications.« less
NASA Astrophysics Data System (ADS)
Kim, Duk-hyun; Lee, Hyoung-Jin
2018-04-01
A study of efficient aerodynamic database modeling method was conducted. A creation of database using periodicity and symmetry characteristic of missile aerodynamic coefficient was investigated to minimize the number of wind tunnel test cases. In addition, studies of how to generate the aerodynamic database when the periodicity changes due to installation of protuberance and how to conduct a zero calibration were carried out. Depending on missile configurations, the required number of test cases changes and there exist tests that can be omitted. A database of aerodynamic on deflection angle of control surface can be constituted using phase shift. A validity of modeling method was demonstrated by confirming that the result which the aerodynamic coefficient calculated by using the modeling method was in agreement with wind tunnel test results.
Duchrow, Timo; Shtatland, Timur; Guettler, Daniel; Pivovarov, Misha; Kramer, Stefan; Weissleder, Ralph
2009-01-01
Background The breadth of biological databases and their information content continues to increase exponentially. Unfortunately, our ability to query such sources is still often suboptimal. Here, we introduce and apply community voting, database-driven text classification, and visual aids as a means to incorporate distributed expert knowledge, to automatically classify database entries and to efficiently retrieve them. Results Using a previously developed peptide database as an example, we compared several machine learning algorithms in their ability to classify abstracts of published literature results into categories relevant to peptide research, such as related or not related to cancer, angiogenesis, molecular imaging, etc. Ensembles of bagged decision trees met the requirements of our application best. No other algorithm consistently performed better in comparative testing. Moreover, we show that the algorithm produces meaningful class probability estimates, which can be used to visualize the confidence of automatic classification during the retrieval process. To allow viewing long lists of search results enriched by automatic classifications, we added a dynamic heat map to the web interface. We take advantage of community knowledge by enabling users to cast votes in Web 2.0 style in order to correct automated classification errors, which triggers reclassification of all entries. We used a novel framework in which the database "drives" the entire vote aggregation and reclassification process to increase speed while conserving computational resources and keeping the method scalable. In our experiments, we simulate community voting by adding various levels of noise to nearly perfectly labelled instances, and show that, under such conditions, classification can be improved significantly. Conclusion Using PepBank as a model database, we show how to build a classification-aided retrieval system that gathers training data from the community, is completely controlled by the database, scales well with concurrent change events, and can be adapted to add text classification capability to other biomedical databases. The system can be accessed at . PMID:19799796
Foerster, Hartmut; Bombarely, Aureliano; Battey, James N D; Sierro, Nicolas; Ivanov, Nikolai V; Mueller, Lukas A
2018-01-01
Abstract SolCyc is the entry portal to pathway/genome databases (PGDBs) for major species of the Solanaceae family hosted at the Sol Genomics Network. Currently, SolCyc comprises six organism-specific PGDBs for tomato, potato, pepper, petunia, tobacco and one Rubiaceae, coffee. The metabolic networks of those PGDBs have been computationally predicted by the pathologic component of the pathway tools software using the manually curated multi-domain database MetaCyc (http://www.metacyc.org/) as reference. SolCyc has been recently extended by taxon-specific databases, i.e. the family-specific SolanaCyc database, containing only curated data pertinent to species of the nightshade family, and NicotianaCyc, a genus-specific database that stores all relevant metabolic data of the Nicotiana genus. Through manual curation of the published literature, new metabolic pathways have been created in those databases, which are complemented by the continuously updated, relevant species-specific pathways from MetaCyc. At present, SolanaCyc comprises 199 pathways and 29 superpathways and NicotianaCyc accounts for 72 pathways and 13 superpathways. Curator-maintained, taxon-specific databases such as SolanaCyc and NicotianaCyc are characterized by an enrichment of data specific to these taxa and free of falsely predicted pathways. Both databases have been used to update recently created Nicotiana-specific databases for Nicotiana tabacum, Nicotiana benthamiana, Nicotiana sylvestris and Nicotiana tomentosiformis by propagating verifiable data into those PGDBs. In addition, in-depth curation of the pathways in N.tabacum has been carried out which resulted in the elimination of 156 pathways from the 569 pathways predicted by pathway tools. Together, in-depth curation of the predicted pathway network and the supplementation with curated data from taxon-specific databases has substantially improved the curation status of the species–specific N.tabacum PGDB. The implementation of this strategy will significantly advance the curation status of all organism-specific databases in SolCyc resulting in the improvement on database accuracy, data analysis and visualization of biochemical networks in those species. Database URL https://solgenomics.net/tools/solcyc/ PMID:29762652
NORTHWEST ENVIRONMENTAL DATABASE (NED) FOR WA, OR, AND ID
This database results from a massive data gathering program initiated by BPA/NPPC in the mid-1980s. Each state now manages the portion of the database within its borders. Data & evaluations were gathered by wildlife/game/fish biologists, and other state, federal, and tribal res...
USDA Branded Food Products Database, Release 2
USDA-ARS?s Scientific Manuscript database
The USDA Branded Food Products Database is the ongoing result of a Public-Private Partnership (PPP), whose goal is to enhance public health and the sharing of open data by complementing the USDA National Nutrient Database for Standard Reference (SR) with nutrient composition of branded foods and pri...
Vail, Paris J; Morris, Brian; van Kan, Aric; Burdett, Brianna C; Moyes, Kelsey; Theisen, Aaron; Kerr, Iain D; Wenstrup, Richard J; Eggington, Julie M
2015-10-01
Genetic variants of uncertain clinical significance (VUSs) are a common outcome of clinical genetic testing. Locus-specific variant databases (LSDBs) have been established for numerous disease-associated genes as a research tool for the interpretation of genetic sequence variants to facilitate variant interpretation via aggregated data. If LSDBs are to be used for clinical practice, consistent and transparent criteria regarding the deposition and interpretation of variants are vital, as variant classifications are often used to make important and irreversible clinical decisions. In this study, we performed a retrospective analysis of 2017 consecutive BRCA1 and BRCA2 genetic variants identified from 24,650 consecutive patient samples referred to our laboratory to establish an unbiased dataset representative of the types of variants seen in the US patient population, submitted by clinicians and researchers for BRCA1 and BRCA2 testing. We compared the clinical classifications of these variants among five publicly accessible BRCA1 and BRCA2 variant databases: BIC, ClinVar, HGMD (paid version), LOVD, and the UMD databases. Our results show substantial disparity of variant classifications among publicly accessible databases. Furthermore, it appears that discrepant classifications are not the result of a single outlier but widespread disagreement among databases. This study also shows that databases sometimes favor a clinical classification when current best practice guidelines (ACMG/AMP/CAP) would suggest an uncertain classification. Although LSDBs have been well established for research applications, our results suggest several challenges preclude their wider use in clinical practice.
QSAR Modeling Using Large-Scale Databases: Case Study for HIV-1 Reverse Transcriptase Inhibitors.
Tarasova, Olga A; Urusova, Aleksandra F; Filimonov, Dmitry A; Nicklaus, Marc C; Zakharov, Alexey V; Poroikov, Vladimir V
2015-07-27
Large-scale databases are important sources of training sets for various QSAR modeling approaches. Generally, these databases contain information extracted from different sources. This variety of sources can produce inconsistency in the data, defined as sometimes widely diverging activity results for the same compound against the same target. Because such inconsistency can reduce the accuracy of predictive models built from these data, we are addressing the question of how best to use data from publicly and commercially accessible databases to create accurate and predictive QSAR models. We investigate the suitability of commercially and publicly available databases to QSAR modeling of antiviral activity (HIV-1 reverse transcriptase (RT) inhibition). We present several methods for the creation of modeling (i.e., training and test) sets from two, either commercially or freely available, databases: Thomson Reuters Integrity and ChEMBL. We found that the typical predictivities of QSAR models obtained using these different modeling set compilation methods differ significantly from each other. The best results were obtained using training sets compiled for compounds tested using only one method and material (i.e., a specific type of biological assay). Compound sets aggregated by target only typically yielded poorly predictive models. We discuss the possibility of "mix-and-matching" assay data across aggregating databases such as ChEMBL and Integrity and their current severe limitations for this purpose. One of them is the general lack of complete and semantic/computer-parsable descriptions of assay methodology carried by these databases that would allow one to determine mix-and-matchability of result sets at the assay level.
Makadia, Rupa; Matcho, Amy; Ma, Qianli; Knoll, Chris; Schuemie, Martijn; DeFalco, Frank J; Londhe, Ajit; Zhu, Vivienne; Ryan, Patrick B
2015-01-01
Objectives To evaluate the utility of applying the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) across multiple observational databases within an organization and to apply standardized analytics tools for conducting observational research. Materials and methods Six deidentified patient-level datasets were transformed to the OMOP CDM. We evaluated the extent of information loss that occurred through the standardization process. We developed a standardized analytic tool to replicate the cohort construction process from a published epidemiology protocol and applied the analysis to all 6 databases to assess time-to-execution and comparability of results. Results Transformation to the CDM resulted in minimal information loss across all 6 databases. Patients and observations excluded were due to identified data quality issues in the source system, 96% to 99% of condition records and 90% to 99% of drug records were successfully mapped into the CDM using the standard vocabulary. The full cohort replication and descriptive baseline summary was executed for 2 cohorts in 6 databases in less than 1 hour. Discussion The standardization process improved data quality, increased efficiency, and facilitated cross-database comparisons to support a more systematic approach to observational research. Comparisons across data sources showed consistency in the impact of inclusion criteria, using the protocol and identified differences in patient characteristics and coding practices across databases. Conclusion Standardizing data structure (through a CDM), content (through a standard vocabulary with source code mappings), and analytics can enable an institution to apply a network-based approach to observational research across multiple, disparate observational health databases. PMID:25670757
Simple re-instantiation of small databases using cloud computing
2013-01-01
Background Small bioinformatics databases, unlike institutionally funded large databases, are vulnerable to discontinuation and many reported in publications are no longer accessible. This leads to irreproducible scientific work and redundant effort, impeding the pace of scientific progress. Results We describe a Web-accessible system, available online at http://biodb100.apbionet.org, for archival and future on demand re-instantiation of small databases within minutes. Depositors can rebuild their databases by downloading a Linux live operating system (http://www.bioslax.com), preinstalled with bioinformatics and UNIX tools. The database and its dependencies can be compressed into an ".lzm" file for deposition. End-users can search for archived databases and activate them on dynamically re-instantiated BioSlax instances, run as virtual machines over the two popular full virtualization standard cloud-computing platforms, Xen Hypervisor or vSphere. The system is adaptable to increasing demand for disk storage or computational load and allows database developers to use the re-instantiated databases for integration and development of new databases. Conclusions Herein, we demonstrate that a relatively inexpensive solution can be implemented for archival of bioinformatics databases and their rapid re-instantiation should the live databases disappear. PMID:24564380
2013-01-01
commercial NoSQL database system. The results show that In-dexedHBase provides a data loading speed that is 6 times faster than Riak, and is...compare it with Riak, a widely adopted commercial NoSQL database system. The results show that In- dexedHBase provides a data loading speed that is 6...events. This chapter describes our research towards building an efficient and scalable storage platform for Truthy. Many existing NoSQL databases
CardioTF, a database of deconstructing transcriptional circuits in the heart system
2016-01-01
Background: Information on cardiovascular gene transcription is fragmented and far behind the present requirements of the systems biology field. To create a comprehensive source of data for cardiovascular gene regulation and to facilitate a deeper understanding of genomic data, the CardioTF database was constructed. The purpose of this database is to collate information on cardiovascular transcription factors (TFs), position weight matrices (PWMs), and enhancer sequences discovered using the ChIP-seq method. Methods: The Naïve-Bayes algorithm was used to classify literature and identify all PubMed abstracts on cardiovascular development. The natural language learning tool GNAT was then used to identify corresponding gene names embedded within these abstracts. Local Perl scripts were used to integrate and dump data from public databases into the MariaDB management system (MySQL). In-house R scripts were written to analyze and visualize the results. Results: Known cardiovascular TFs from humans and human homologs from fly, Ciona, zebrafish, frog, chicken, and mouse were identified and deposited in the database. PWMs from Jaspar, hPDI, and UniPROBE databases were deposited in the database and can be retrieved using their corresponding TF names. Gene enhancer regions from various sources of ChIP-seq data were deposited into the database and were able to be visualized by graphical output. Besides biocuration, mouse homologs of the 81 core cardiac TFs were selected using a Naïve-Bayes approach and then by intersecting four independent data sources: RNA profiling, expert annotation, PubMed abstracts and phenotype. Discussion: The CardioTF database can be used as a portal to construct transcriptional network of cardiac development. Availability and Implementation: Database URL: http://www.cardiosignal.org/database/cardiotf.html. PMID:27635320
Database for propagation models
NASA Astrophysics Data System (ADS)
Kantak, Anil V.
1991-07-01
A propagation researcher or a systems engineer who intends to use the results of a propagation experiment is generally faced with various database tasks such as the selection of the computer software, the hardware, and the writing of the programs to pass the data through the models of interest. This task is repeated every time a new experiment is conducted or the same experiment is carried out at a different location generating different data. Thus the users of this data have to spend a considerable portion of their time learning how to implement the computer hardware and the software towards the desired end. This situation may be facilitated considerably if an easily accessible propagation database is created that has all the accepted (standardized) propagation phenomena models approved by the propagation research community. Also, the handling of data will become easier for the user. Such a database construction can only stimulate the growth of the propagation research it if is available to all the researchers, so that the results of the experiment conducted by one researcher can be examined independently by another, without different hardware and software being used. The database may be made flexible so that the researchers need not be confined only to the contents of the database. Another way in which the database may help the researchers is by the fact that they will not have to document the software and hardware tools used in their research since the propagation research community will know the database already. The following sections show a possible database construction, as well as properties of the database for the propagation research.
Fullerene data mining using bibliometrics and database tomography
Kostoff; Braun; Schubert; Toothman; Humenik
2000-01-01
Database tomography (DT) is a textual database analysis system consisting of two major components: (1) algorithms for extracting multiword phrase frequencies and phrase proximities (physical closeness of the multiword technical phrases) from any type of large textual database, to augment (2) interpretative capabilities of the expert human analyst. DT was used to derive technical intelligence from a fullerenes database derived from the Science Citation Index and the Engineering Compendex. Phrase frequency analysis by the technical domain experts provided the pervasive technical themes of the fullerenes database, and phrase proximity analysis provided the relationships among the pervasive technical themes. Bibliometric analysis of the fullerenes literature supplemented the DT results with author/journal/institution publication and citation data. Comparisons of fullerenes results with past analyses of similarly structured near-earth space, chemistry, hypersonic/supersonic flow, aircraft, and ship hydrodynamics databases are made. One important finding is that many of the normalized bibliometric distribution functions are extremely consistent across these diverse technical domains and could reasonably be expected to apply to broader chemical topics than fullerenes that span multiple structural classes. Finally, lessons learned about integrating the technical domain experts with the data mining tools are presented.
Development of an electronic database for Acute Pain Service outcomes
Love, Brandy L; Jensen, Louise A; Schopflocher, Donald; Tsui, Ban CH
2012-01-01
BACKGROUND: Quality assurance is increasingly important in the current health care climate. An electronic database can be used for tracking patient information and as a research tool to provide quality assurance for patient care. OBJECTIVE: An electronic database was developed for the Acute Pain Service, University of Alberta Hospital (Edmonton, Alberta) to record patient characteristics, identify at-risk populations, compare treatment efficacies and guide practice decisions. METHOD: Steps in the database development involved identifying the goals for use, relevant variables to include, and a plan for data collection, entry and analysis. Protocols were also created for data cleaning quality control. The database was evaluated with a pilot test using existing data to assess data collection burden, accuracy and functionality of the database. RESULTS: A literature review resulted in an evidence-based list of demographic, clinical and pain management outcome variables to include. Time to assess patients and collect the data was 20 min to 30 min per patient. Limitations were primarily software related, although initial data collection completion was only 65% and accuracy of data entry was 96%. CONCLUSIONS: The electronic database was found to be relevant and functional for the identified goals of data storage and research. PMID:22518364
Practice databases and their uses in clinical research.
Tierney, W M; McDonald, C J
1991-04-01
A few large clinical information databases have been established within larger medical information systems. Although they are smaller than claims databases, these clinical databases offer several advantages: accurate and timely data, rich clinical detail, and continuous parameters (for example, vital signs and laboratory results). However, the nature of the data vary considerably, which affects the kinds of secondary analyses that can be performed. These databases have been used to investigate clinical epidemiology, risk assessment, post-marketing surveillance of drugs, practice variation, resource use, quality assurance, and decision analysis. In addition, practice databases can be used to identify subjects for prospective studies. Further methodologic developments are necessary to deal with the prevalent problems of missing data and various forms of bias if such databases are to grow and contribute valuable clinical information.
Brimhall, Bradley B; Hall, Timothy E; Walczak, Steven
2006-01-01
A hospital laboratory relational database, developed over eight years, has demonstrated significant cost savings and a substantial financial return on investment (ROI). In addition, the database has been used to measurably improve laboratory operations and the quality of patient care.
Sequencing artifacts in the type A influenza database and attempts to correct them
USDA-ARS?s Scientific Manuscript database
Currently over 300,000 Type A influenza gene sequences representing over 50,000 strains are available in publicly available databases. However, the quality of the sequences submitted are determined by the contributor and many sequence errors are present in the databases, which can affect the result...
Evaluation of Database Coverage: A Comparison of Two Methodologies.
ERIC Educational Resources Information Center
Tenopir, Carol
1982-01-01
Describes experiment which compared two techniques used for evaluating and comparing database coverage of a subject area, e.g., "bibliography" and "subject profile." Differences in time, cost, and results achieved are compared by applying techniques to field of volcanology using two databases, Geological Reference File and GeoArchive. Twenty…
Correlates of Access to Business Research Databases
ERIC Educational Resources Information Center
Gottfried, John C.
2010-01-01
This study examines potential correlates of business research database access through academic libraries serving top business programs in the United States. Results indicate that greater access to research databases is related to enrollment in graduate business programs, but not to overall enrollment or status as a public or private institution.…
ERIC Educational Resources Information Center
Young, Terrence E., Jr.
2004-01-01
Today's elementary school students have been exposed to computers since birth, so it is not surprising that they are so proficient at using them. As a result, they are ready to search databases that include topics and information appropriate for their age level. Subscription databases are digital copies of magazines, newspapers, journals,…
Go Figure: Computer Database Adds the Personal Touch.
ERIC Educational Resources Information Center
Gaffney, Jean; Crawford, Pat
1992-01-01
A database for recordkeeping for a summer reading club was developed for a public library system using an IBM PC and Microsoft Works. Use of the database resulted in more efficient program management, giving librarians more time to spend with patrons and enabling timely awarding of incentives. (LAE)
Influencing Database Use in Public Libraries.
ERIC Educational Resources Information Center
Tenopir, Carol
1999-01-01
Discusses results of a survey of factors influencing database use in public libraries. Highlights the importance of content; ease of use; and importance of instruction. Tabulates importance indications for number and location of workstations, library hours, availability of remote login, usefulness and quality of content, lack of other databases,…
TabSQL: a MySQL tool to facilitate mapping user data to public databases
2010-01-01
Background With advances in high-throughput genomics and proteomics, it is challenging for biologists to deal with large data files and to map their data to annotations in public databases. Results We developed TabSQL, a MySQL-based application tool, for viewing, filtering and querying data files with large numbers of rows. TabSQL provides functions for downloading and installing table files from public databases including the Gene Ontology database (GO), the Ensembl databases, and genome databases from the UCSC genome bioinformatics site. Any other database that provides tab-delimited flat files can also be imported. The downloaded gene annotation tables can be queried together with users' data in TabSQL using either a graphic interface or command line. Conclusions TabSQL allows queries across the user's data and public databases without programming. It is a convenient tool for biologists to annotate and enrich their data. PMID:20573251
A World Wide Web (WWW) server database engine for an organelle database, MitoDat.
Lemkin, P F; Chipperfield, M; Merril, C; Zullo, S
1996-03-01
We describe a simple database search engine "dbEngine" which may be used to quickly create a searchable database on a World Wide Web (WWW) server. Data may be prepared from spreadsheet programs (such as Excel, etc.) or from tables exported from relationship database systems. This Common Gateway Interface (CGI-BIN) program is used with a WWW server such as available commercially, or from National Center for Supercomputer Algorithms (NCSA) or CERN. Its capabilities include: (i) searching records by combinations of terms connected with ANDs or ORs; (ii) returning search results as hypertext links to other WWW database servers; (iii) mapping lists of literature reference identifiers to the full references; (iv) creating bidirectional hypertext links between pictures and the database. DbEngine has been used to support the MitoDat database (Mendelian and non-Mendelian inheritance associated with the Mitochondrion) on the WWW.
A high-performance spatial database based approach for pathology imaging algorithm evaluation
Wang, Fusheng; Kong, Jun; Gao, Jingjing; Cooper, Lee A.D.; Kurc, Tahsin; Zhou, Zhengwen; Adler, David; Vergara-Niedermayr, Cristobal; Katigbak, Bryan; Brat, Daniel J.; Saltz, Joel H.
2013-01-01
Background: Algorithm evaluation provides a means to characterize variability across image analysis algorithms, validate algorithms by comparison with human annotations, combine results from multiple algorithms for performance improvement, and facilitate algorithm sensitivity studies. The sizes of images and image analysis results in pathology image analysis pose significant challenges in algorithm evaluation. We present an efficient parallel spatial database approach to model, normalize, manage, and query large volumes of analytical image result data. This provides an efficient platform for algorithm evaluation. Our experiments with a set of brain tumor images demonstrate the application, scalability, and effectiveness of the platform. Context: The paper describes an approach and platform for evaluation of pathology image analysis algorithms. The platform facilitates algorithm evaluation through a high-performance database built on the Pathology Analytic Imaging Standards (PAIS) data model. Aims: (1) Develop a framework to support algorithm evaluation by modeling and managing analytical results and human annotations from pathology images; (2) Create a robust data normalization tool for converting, validating, and fixing spatial data from algorithm or human annotations; (3) Develop a set of queries to support data sampling and result comparisons; (4) Achieve high performance computation capacity via a parallel data management infrastructure, parallel data loading and spatial indexing optimizations in this infrastructure. Materials and Methods: We have considered two scenarios for algorithm evaluation: (1) algorithm comparison where multiple result sets from different methods are compared and consolidated; and (2) algorithm validation where algorithm results are compared with human annotations. We have developed a spatial normalization toolkit to validate and normalize spatial boundaries produced by image analysis algorithms or human annotations. The validated data were formatted based on the PAIS data model and loaded into a spatial database. To support efficient data loading, we have implemented a parallel data loading tool that takes advantage of multi-core CPUs to accelerate data injection. The spatial database manages both geometric shapes and image features or classifications, and enables spatial sampling, result comparison, and result aggregation through expressive structured query language (SQL) queries with spatial extensions. To provide scalable and efficient query support, we have employed a shared nothing parallel database architecture, which distributes data homogenously across multiple database partitions to take advantage of parallel computation power and implements spatial indexing to achieve high I/O throughput. Results: Our work proposes a high performance, parallel spatial database platform for algorithm validation and comparison. This platform was evaluated by storing, managing, and comparing analysis results from a set of brain tumor whole slide images. The tools we develop are open source and available to download. Conclusions: Pathology image algorithm validation and comparison are essential to iterative algorithm development and refinement. One critical component is the support for queries involving spatial predicates and comparisons. In our work, we develop an efficient data model and parallel database approach to model, normalize, manage and query large volumes of analytical image result data. Our experiments demonstrate that the data partitioning strategy and the grid-based indexing result in good data distribution across database nodes and reduce I/O overhead in spatial join queries through parallel retrieval of relevant data and quick subsetting of datasets. The set of tools in the framework provide a full pipeline to normalize, load, manage and query analytical results for algorithm evaluation. PMID:23599905
Databases toward Disseminated Use - Nikkei News Telecom -
NASA Astrophysics Data System (ADS)
Kasiwagi, Akira
The need for “searchers” - adept hands in the art of information retrieval - is increasing nowadays. Searchers have become necessary as the result of the upbeat online database market. The number of database users is rising steeply. There is the urgent need to develop potential users of general information, such as newspaper articles. Simple commands, easy operation, and low prices hold the key to general popularization of databases, and the issue lies in how the industry will get about achieving this task. Nihon Keizai Shimbun has been undertaking a wide range of possibilities with Nikkei News Telecom. Although only two years have passed since its start, results of Nikkei’s efforts are summarized below.
Updated folate data in the Dutch Food Composition Database and implications for intake estimates
Westenbrink, Susanne; Jansen-van der Vliet, Martine; van Rossum, Caroline
2012-01-01
Background and objective Nutrient values are influenced by the analytical method used. Food folate measured by high performance liquid chromatography (HPLC) or by microbiological assay (MA) yield different results, with in general higher results from MA than from HPLC. This leads to the question of how to deal with different analytical methods in compiling standardised and internationally comparable food composition databases? A recent inventory on folate in European food composition databases indicated that currently MA is more widely used than HPCL. Since older Dutch values are produced by HPLC and newer values by MA, analytical methods and procedures for compiling folate data in the Dutch Food Composition Database (NEVO) were reconsidered and folate values were updated. This article describes the impact of this revision of folate values in the NEVO database as well as the expected impact on the folate intake assessment in the Dutch National Food Consumption Survey (DNFCS). Design The folate values were revised by replacing HPLC with MA values from recent Dutch analyses. Previously MA folate values taken from foreign food composition tables had been recalculated to the HPLC level, assuming a 27% lower value from HPLC analyses. These recalculated values were replaced by the original MA values. Dutch HPLC and MA values were compared to each other. Folate intake was assessed for a subgroup within the DNFCS to estimate the impact of the update. Results In the updated NEVO database nearly all folate values were produced by MA or derived from MA values which resulted in an average increase of 24%. The median habitual folate intake in young children was increased by 11–15% using the updated folate values. Conclusion The current approach for folate in NEVO resulted in more transparency in data production and documentation and higher comparability among European databases. Results of food consumption surveys are expected to show higher folate intakes when using the updated values. PMID:22481900
Heterogeneous database integration in biomedicine.
Sujansky, W
2001-08-01
The rapid expansion of biomedical knowledge, reduction in computing costs, and spread of internet access have created an ocean of electronic data. The decentralized nature of our scientific community and healthcare system, however, has resulted in a patchwork of diverse, or heterogeneous, database implementations, making access to and aggregation of data across databases very difficult. The database heterogeneity problem applies equally to clinical data describing individual patients and biological data characterizing our genome. Specifically, databases are highly heterogeneous with respect to the data models they employ, the data schemas they specify, the query languages they support, and the terminologies they recognize. Heterogeneous database systems attempt to unify disparate databases by providing uniform conceptual schemas that resolve representational heterogeneities, and by providing querying capabilities that aggregate and integrate distributed data. Research in this area has applied a variety of database and knowledge-based techniques, including semantic data modeling, ontology definition, query translation, query optimization, and terminology mapping. Existing systems have addressed heterogeneous database integration in the realms of molecular biology, hospital information systems, and application portability.
Ackerman, Katherine V.; Mixon, David M.; Sundquist, Eric T.; Stallard, Robert F.; Schwarz, Gregory E.; Stewart, David W.
2009-01-01
The Reservoir Sedimentation Survey Information System (RESIS) database, originally compiled by the Soil Conservation Service (now the Natural Resources Conservation Service) in collaboration with the Texas Agricultural Experiment Station, is the most comprehensive compilation of data from reservoir sedimentation surveys throughout the conterminous United States (U.S.). The database is a cumulative historical archive that includes data from as early as 1755 and as late as 1993. The 1,823 reservoirs included in the database range in size from farm ponds to the largest U.S. reservoirs (such as Lake Mead). Results from 6,617 bathymetric surveys are available in the database. This Data Series provides an improved version of the original RESIS database, termed RESIS-II, and a report describing RESIS-II. The RESIS-II relational database is stored in Microsoft Access and includes more precise location coordinates for most of the reservoirs than the original database but excludes information on reservoir ownership. RESIS-II is anticipated to be a template for further improvements in the database.
ProteinWorldDB: querying radical pairwise alignments among protein sets from complete genomes
Otto, Thomas Dan; Catanho, Marcos; Tristão, Cristian; Bezerra, Márcia; Fernandes, Renan Mathias; Elias, Guilherme Steinberger; Scaglia, Alexandre Capeletto; Bovermann, Bill; Berstis, Viktors; Lifschitz, Sergio; de Miranda, Antonio Basílio; Degrave, Wim
2010-01-01
Motivation: Many analyses in modern biological research are based on comparisons between biological sequences, resulting in functional, evolutionary and structural inferences. When large numbers of sequences are compared, heuristics are often used resulting in a certain lack of accuracy. In order to improve and validate results of such comparisons, we have performed radical all-against-all comparisons of 4 million protein sequences belonging to the RefSeq database, using an implementation of the Smith–Waterman algorithm. This extremely intensive computational approach was made possible with the help of World Community Grid™, through the Genome Comparison Project. The resulting database, ProteinWorldDB, which contains coordinates of pairwise protein alignments and their respective scores, is now made available. Users can download, compare and analyze the results, filtered by genomes, protein functions or clusters. ProteinWorldDB is integrated with annotations derived from Swiss-Prot, Pfam, KEGG, NCBI Taxonomy database and gene ontology. The database is a unique and valuable asset, representing a major effort to create a reliable and consistent dataset of cross-comparisons of the whole protein content encoded in hundreds of completely sequenced genomes using a rigorous dynamic programming approach. Availability: The database can be accessed through http://proteinworlddb.org Contact: otto@fiocruz.br PMID:20089515
Integrating a local database into the StarView distributed user interface
NASA Technical Reports Server (NTRS)
Silberberg, D. P.
1992-01-01
A distributed user interface to the Space Telescope Data Archive and Distribution Service (DADS) known as StarView is being developed. The DADS architecture consists of the data archive as well as a relational database catalog describing the archive. StarView is a client/server system in which the user interface is the front-end client to the DADS catalog and archive servers. Users query the DADS catalog from the StarView interface. Query commands are transmitted via a network and evaluated by the database. The results are returned via the network and are displayed on StarView forms. Based on the results, users decide which data sets to retrieve from the DADS archive. Archive requests are packaged by StarView and sent to DADS, which returns the requested data sets to the users. The advantages of distributed client/server user interfaces over traditional one-machine systems are well known. Since users run software on machines separate from the database, the overall client response time is much faster. Also, since the server is free to process only database requests, the database response time is much faster. Disadvantages inherent in this architecture are slow overall database access time due to the network delays, lack of a 'get previous row' command, and that refinements of a previously issued query must be submitted to the database server, even though the domain of values have already been returned by the previous query. This architecture also does not allow users to cross correlate DADS catalog data with other catalogs. Clearly, a distributed user interface would be more powerful if it overcame these disadvantages. A local database is being integrated into StarView to overcome these disadvantages. When a query is made through a StarView form, which is often composed of fields from multiple tables, it is translated to an SQL query and issued to the DADS catalog. At the same time, a local database table is created to contain the resulting rows of the query. The returned rows are displayed on the form as well as inserted into the local database table. Identical results are produced by reissuing the query to either the DADS catalog or to the local table. Relational databases do not provide a 'get previous row' function because of the inherent complexity of retrieving previous rows of multiple-table joins. However, since this function is easily implemented on a single table, StarView uses the local table to retrieve the previous row. Also, StarView issues subsequent query refinements to the local table instead of the DADS catalog, eliminating the network transmission overhead. Finally, other catalogs can be imported into the local database for cross correlation with local tables. Overall, it is believe that this is a more powerful architecture for distributed, database user interfaces.
The Biological Macromolecule Crystallization Database and NASA Protein Crystal Growth Archive
Gilliland, Gary L.; Tung, Michael; Ladner, Jane
1996-01-01
The NIST/NASA/CARB Biological Macromolecule Crystallization Database (BMCD), NIST Standard Reference Database 21, contains crystal data and crystallization conditions for biological macromolecules. The database entries include data abstracted from published crystallographic reports. Each entry consists of information describing the biological macromolecule crystallized and crystal data and the crystallization conditions for each crystal form. The BMCD serves as the NASA Protein Crystal Growth Archive in that it contains protocols and results of crystallization experiments undertaken in microgravity (space). These database entries report the results, whether successful or not, from NASA-sponsored protein crystal growth experiments in microgravity and from microgravity crystallization studies sponsored by other international organizations. The BMCD was designed as a tool to assist x-ray crystallographers in the development of protocols to crystallize biological macromolecules, those that have previously been crystallized, and those that have not been crystallized. PMID:11542472
An Open-source Toolbox for Analysing and Processing PhysioNet Databases in MATLAB and Octave.
Silva, Ikaro; Moody, George B
The WaveForm DataBase (WFDB) Toolbox for MATLAB/Octave enables integrated access to PhysioNet's software and databases. Using the WFDB Toolbox for MATLAB/Octave, users have access to over 50 physiological databases in PhysioNet. The toolbox provides access over 4 TB of biomedical signals including ECG, EEG, EMG, and PLETH. Additionally, most signals are accompanied by metadata such as medical annotations of clinical events: arrhythmias, sleep stages, seizures, hypotensive episodes, etc. Users of this toolbox should easily be able to reproduce, validate, and compare results published based on PhysioNet's software and databases.
Lee, Ken Ka-Yin; Tang, Wai-Choi; Choi, Kup-Sze
2013-04-01
Clinical data are dynamic in nature, often arranged hierarchically and stored as free text and numbers. Effective management of clinical data and the transformation of the data into structured format for data analysis are therefore challenging issues in electronic health records development. Despite the popularity of relational databases, the scalability of the NoSQL database model and the document-centric data structure of XML databases appear to be promising features for effective clinical data management. In this paper, three database approaches--NoSQL, XML-enabled and native XML--are investigated to evaluate their suitability for structured clinical data. The database query performance is reported, together with our experience in the databases development. The results show that NoSQL database is the best choice for query speed, whereas XML databases are advantageous in terms of scalability, flexibility and extensibility, which are essential to cope with the characteristics of clinical data. While NoSQL and XML technologies are relatively new compared to the conventional relational database, both of them demonstrate potential to become a key database technology for clinical data management as the technology further advances. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
Performance assessment of EMR systems based on post-relational database.
Yu, Hai-Yan; Li, Jing-Song; Zhang, Xiao-Guang; Tian, Yu; Suzuki, Muneou; Araki, Kenji
2012-08-01
Post-relational databases provide high performance and are currently widely used in American hospitals. As few hospital information systems (HIS) in either China or Japan are based on post-relational databases, here we introduce a new-generation electronic medical records (EMR) system called Hygeia, which was developed with the post-relational database Caché and the latest platform Ensemble. Utilizing the benefits of a post-relational database, Hygeia is equipped with an "integration" feature that allows all the system users to access data-with a fast response time-anywhere and at anytime. Performance tests of databases in EMR systems were implemented in both China and Japan. First, a comparison test was conducted between a post-relational database, Caché, and a relational database, Oracle, embedded in the EMR systems of a medium-sized first-class hospital in China. Second, a user terminal test was done on the EMR system Izanami, which is based on the identical database Caché and operates efficiently at the Miyazaki University Hospital in Japan. The results proved that the post-relational database Caché works faster than the relational database Oracle and showed perfect performance in the real-time EMR system.
Arrhythmia Evaluation in Wearable ECG Devices
Sadrawi, Muammar; Lin, Chien-Hung; Hsieh, Yita; Kuo, Chia-Chun; Chien, Jen Chien; Haraikawa, Koichi; Abbod, Maysam F.; Shieh, Jiann-Shing
2017-01-01
This study evaluates four databases from PhysioNet: The American Heart Association database (AHADB), Creighton University Ventricular Tachyarrhythmia database (CUDB), MIT-BIH Arrhythmia database (MITDB), and MIT-BIH Noise Stress Test database (NSTDB). The ANSI/AAMI EC57:2012 is used for the evaluation of the algorithms for the supraventricular ectopic beat (SVEB), ventricular ectopic beat (VEB), atrial fibrillation (AF), and ventricular fibrillation (VF) via the evaluation of the sensitivity, positive predictivity and false positive rate. Sample entropy, fast Fourier transform (FFT), and multilayer perceptron neural network with backpropagation training algorithm are selected for the integrated detection algorithms. For this study, the result for SVEB has some improvements compared to a previous study that also utilized ANSI/AAMI EC57. In further, VEB sensitivity and positive predictivity gross evaluations have greater than 80%, except for the positive predictivity of the NSTDB database. For AF gross evaluation of MITDB database, the results show very good classification, excluding the episode sensitivity. In advanced, for VF gross evaluation, the episode sensitivity and positive predictivity for the AHADB, MITDB, and CUDB, have greater than 80%, except for MITDB episode positive predictivity, which is 75%. The achieved results show that the proposed integrated SVEB, VEB, AF, and VF detection algorithm has an accurate classification according to ANSI/AAMI EC57:2012. In conclusion, the proposed integrated detection algorithm can achieve good accuracy in comparison with other previous studies. Furthermore, more advanced algorithms and hardware devices should be performed in future for arrhythmia detection and evaluation. PMID:29068369
MIPS: a database for protein sequences, homology data and yeast genome information.
Mewes, H W; Albermann, K; Heumann, K; Liebl, S; Pfeiffer, F
1997-01-01
The MIPS group (Martinsried Institute for Protein Sequences) at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, collects, processes and distributes protein sequence data within the framework of the tripartite association of the PIR-International Protein Sequence Database (,). MIPS contributes nearly 50% of the data input to the PIR-International Protein Sequence Database. The database is distributed on CD-ROM together with PATCHX, an exhaustive supplement of unique, unverified protein sequences from external sources compiled by MIPS. Through its WWW server (http://www.mips.biochem.mpg.de/ ) MIPS permits internet access to sequence databases, homology data and to yeast genome information. (i) Sequence similarity results from the FASTA program () are stored in the FASTA database for all proteins from PIR-International and PATCHX. The database is dynamically maintained and permits instant access to FASTA results. (ii) Starting with FASTA database queries, proteins have been classified into families and superfamilies (PROT-FAM). (iii) The HPT (hashed position tree) data structure () developed at MIPS is a new approach for rapid sequence and pattern searching. (iv) MIPS provides access to the sequence and annotation of the complete yeast genome (), the functional classification of yeast genes (FunCat) and its graphical display, the 'Genome Browser' (). A CD-ROM based on the JAVA programming language providing dynamic interactive access to the yeast genome and the related protein sequences has been compiled and is available on request. PMID:9016498
Matsuda, Fumio; Shinbo, Yoko; Oikawa, Akira; Hirai, Masami Yokota; Fiehn, Oliver; Kanaya, Shigehiko; Saito, Kazuki
2009-01-01
Background In metabolomics researches using mass spectrometry (MS), systematic searching of high-resolution mass data against compound databases is often the first step of metabolite annotation to determine elemental compositions possessing similar theoretical mass numbers. However, incorrect hits derived from errors in mass analyses will be included in the results of elemental composition searches. To assess the quality of peak annotation information, a novel methodology for false discovery rates (FDR) evaluation is presented in this study. Based on the FDR analyses, several aspects of an elemental composition search, including setting a threshold, estimating FDR, and the types of elemental composition databases most reliable for searching are discussed. Methodology/Principal Findings The FDR can be determined from one measured value (i.e., the hit rate for search queries) and four parameters determined by Monte Carlo simulation. The results indicate that relatively high FDR values (30–50%) were obtained when searching time-of-flight (TOF)/MS data using the KNApSAcK and KEGG databases. In addition, searches against large all-in-one databases (e.g., PubChem) always produced unacceptable results (FDR >70%). The estimated FDRs suggest that the quality of search results can be improved not only by performing more accurate mass analysis but also by modifying the properties of the compound database. A theoretical analysis indicates that FDR could be improved by using compound database with smaller but higher completeness entries. Conclusions/Significance High accuracy mass analysis, such as Fourier transform (FT)-MS, is needed for reliable annotation (FDR <10%). In addition, a small, customized compound database is preferable for high-quality annotation of metabolome data. PMID:19847304
The reference ballistic imaging database revisited.
De Ceuster, Jan; Dujardin, Sylvain
2015-03-01
A reference ballistic image database (RBID) contains images of cartridge cases fired in firearms that are in circulation: a ballistic fingerprint database. The performance of an RBID was investigated a decade ago by De Kinder et al. using IBIS(®) Heritage™ technology. The results of that study were published in this journal, issue 214. Since then, technologies have evolved quite significantly and novel apparatus have become available on the market. The current research article investigates the efficiency of another automated ballistic imaging system, Evofinder(®) using the same database as used by De Kinder et al. The results demonstrate a significant increase in correlation efficiency: 38% of all matches were on first position of the Evofinder correlation list in comparison to IBIS(®) Heritage™ where only 19% were on the first position. Average correlation times are comparable to the IBIS(®) Heritage™ system. While Evofinder(®) demonstrates specific improvement for mutually correlating different ammunition brands, ammunition dependence of the markings is still strongly influencing the correlation result because the markings may vary considerably. As a consequence a great deal of potential hits (36%) was still far down in the correlation lists (positions 31 and lower). The large database was used to examine the probability of finding a match as a function of correlation list verification. As an example, the RBID study on Evofinder(®) demonstrates that to find at least 90% of all potential matches, at least 43% of the items in the database need to be compared on screen and this for breech face markings and firing pin impression separately. These results, although a clear improvement to the original RBID study, indicate that the implementation of such a database should still not be considered nowadays. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Sánchez-de-Madariaga, Ricardo; Muñoz, Adolfo; Castro, Antonio L; Moreno, Oscar; Pascual, Mario
2018-01-01
This research shows a protocol to assess the computational complexity of querying relational and non-relational (NoSQL (not only Structured Query Language)) standardized electronic health record (EHR) medical information database systems (DBMS). It uses a set of three doubling-sized databases, i.e. databases storing 5000, 10,000 and 20,000 realistic standardized EHR extracts, in three different database management systems (DBMS): relational MySQL object-relational mapping (ORM), document-based NoSQL MongoDB, and native extensible markup language (XML) NoSQL eXist. The average response times to six complexity-increasing queries were computed, and the results showed a linear behavior in the NoSQL cases. In the NoSQL field, MongoDB presents a much flatter linear slope than eXist. NoSQL systems may also be more appropriate to maintain standardized medical information systems due to the special nature of the updating policies of medical information, which should not affect the consistency and efficiency of the data stored in NoSQL databases. One limitation of this protocol is the lack of direct results of improved relational systems such as archetype relational mapping (ARM) with the same data. However, the interpolation of doubling-size database results to those presented in the literature and other published results suggests that NoSQL systems might be more appropriate in many specific scenarios and problems to be solved. For example, NoSQL may be appropriate for document-based tasks such as EHR extracts used in clinical practice, or edition and visualization, or situations where the aim is not only to query medical information, but also to restore the EHR in exactly its original form. PMID:29608174
Sánchez-de-Madariaga, Ricardo; Muñoz, Adolfo; Castro, Antonio L; Moreno, Oscar; Pascual, Mario
2018-03-19
This research shows a protocol to assess the computational complexity of querying relational and non-relational (NoSQL (not only Structured Query Language)) standardized electronic health record (EHR) medical information database systems (DBMS). It uses a set of three doubling-sized databases, i.e. databases storing 5000, 10,000 and 20,000 realistic standardized EHR extracts, in three different database management systems (DBMS): relational MySQL object-relational mapping (ORM), document-based NoSQL MongoDB, and native extensible markup language (XML) NoSQL eXist. The average response times to six complexity-increasing queries were computed, and the results showed a linear behavior in the NoSQL cases. In the NoSQL field, MongoDB presents a much flatter linear slope than eXist. NoSQL systems may also be more appropriate to maintain standardized medical information systems due to the special nature of the updating policies of medical information, which should not affect the consistency and efficiency of the data stored in NoSQL databases. One limitation of this protocol is the lack of direct results of improved relational systems such as archetype relational mapping (ARM) with the same data. However, the interpolation of doubling-size database results to those presented in the literature and other published results suggests that NoSQL systems might be more appropriate in many specific scenarios and problems to be solved. For example, NoSQL may be appropriate for document-based tasks such as EHR extracts used in clinical practice, or edition and visualization, or situations where the aim is not only to query medical information, but also to restore the EHR in exactly its original form.
The LAILAPS search engine: a feature model for relevance ranking in life science databases.
Lange, Matthias; Spies, Karl; Colmsee, Christian; Flemming, Steffen; Klapperstück, Matthias; Scholz, Uwe
2010-03-25
Efficient and effective information retrieval in life sciences is one of the most pressing challenge in bioinformatics. The incredible growth of life science databases to a vast network of interconnected information systems is to the same extent a big challenge and a great chance for life science research. The knowledge found in the Web, in particular in life-science databases, are a valuable major resource. In order to bring it to the scientist desktop, it is essential to have well performing search engines. Thereby, not the response time nor the number of results is important. The most crucial factor for millions of query results is the relevance ranking. In this paper, we present a feature model for relevance ranking in life science databases and its implementation in the LAILAPS search engine. Motivated by the observation of user behavior during their inspection of search engine result, we condensed a set of 9 relevance discriminating features. These features are intuitively used by scientists, who briefly screen database entries for potential relevance. The features are both sufficient to estimate the potential relevance, and efficiently quantifiable. The derivation of a relevance prediction function that computes the relevance from this features constitutes a regression problem. To solve this problem, we used artificial neural networks that have been trained with a reference set of relevant database entries for 19 protein queries. Supporting a flexible text index and a simple data import format, this concepts are implemented in the LAILAPS search engine. It can easily be used both as search engine for comprehensive integrated life science databases and for small in-house project databases. LAILAPS is publicly available for SWISSPROT data at http://lailaps.ipk-gatersleben.de.
Database constraints applied to metabolic pathway reconstruction tools.
Vilaplana, Jordi; Solsona, Francesc; Teixido, Ivan; Usié, Anabel; Karathia, Hiren; Alves, Rui; Mateo, Jordi
2014-01-01
Our group developed two biological applications, Biblio-MetReS and Homol-MetReS, accessing the same database of organisms with annotated genes. Biblio-MetReS is a data-mining application that facilitates the reconstruction of molecular networks based on automated text-mining analysis of published scientific literature. Homol-MetReS allows functional (re)annotation of proteomes, to properly identify both the individual proteins involved in the process(es) of interest and their function. It also enables the sets of proteins involved in the process(es) in different organisms to be compared directly. The efficiency of these biological applications is directly related to the design of the shared database. We classified and analyzed the different kinds of access to the database. Based on this study, we tried to adjust and tune the configurable parameters of the database server to reach the best performance of the communication data link to/from the database system. Different database technologies were analyzed. We started the study with a public relational SQL database, MySQL. Then, the same database was implemented by a MapReduce-based database named HBase. The results indicated that the standard configuration of MySQL gives an acceptable performance for low or medium size databases. Nevertheless, tuning database parameters can greatly improve the performance and lead to very competitive runtimes.
Appendix A. Borderlands Site Database
A.C. MacWilliams
2006-01-01
The database includes modified components of the Arizona State Museum Site Recording System (Arizona State Museum 1993) and the New Mexico NMCRIS User?s Guide (State of New Mexico 1993). When sites contain more than one recorded component, these instances were entered separately with the result that many sites have multiple entries. Information for this database...
A prototypic small molecule database for bronchoalveolar lavage-based metabolomics
NASA Astrophysics Data System (ADS)
Walmsley, Scott; Cruickshank-Quinn, Charmion; Quinn, Kevin; Zhang, Xing; Petrache, Irina; Bowler, Russell P.; Reisdorph, Richard; Reisdorph, Nichole
2018-04-01
The analysis of bronchoalveolar lavage fluid (BALF) using mass spectrometry-based metabolomics can provide insight into lung diseases, such as asthma. However, the important step of compound identification is hindered by the lack of a small molecule database that is specific for BALF. Here we describe prototypic, small molecule databases derived from human BALF samples (n=117). Human BALF was extracted into lipid and aqueous fractions and analyzed using liquid chromatography mass spectrometry. Following filtering to reduce contaminants and artifacts, the resulting BALF databases (BALF-DBs) contain 11,736 lipid and 658 aqueous compounds. Over 10% of these were found in 100% of samples. Testing the BALF-DBs using nested test sets produced a 99% match rate for lipids and 47% match rate for aqueous molecules. Searching an independent dataset resulted in 45% matching to the lipid BALF-DB compared to<25% when general databases are searched. The BALF-DBs are available for download from MetaboLights. Overall, the BALF-DBs can reduce false positives and improve confidence in compound identification compared to when general databases are used.
A kinetics database and scripts for PHREEQC
NASA Astrophysics Data System (ADS)
Hu, B.; Zhang, Y.; Teng, Y.; Zhu, C.
2017-12-01
Kinetics of geochemical reactions has been increasingly used in numerical models to simulate coupled flow, mass transport, and chemical reactions. However, the kinetic data are scattered in the literature. To assemble a kinetic dataset for a modeling project is an intimidating task for most. In order to facilitate the application of kinetics in geochemical modeling, we assembled kinetics parameters into a database for the geochemical simulation program, PHREEQC (version 3.0). Kinetics data were collected from the literature. Our database includes kinetic data for over 70 minerals. The rate equations are also programmed into scripts with the Basic language. Using the new kinetic database, we simulated reaction path during the albite dissolution process using various rate equations in the literature. The simulation results with three different rate equations gave difference reaction paths at different time scale. Another application involves a coupled reactive transport model simulating the advancement of an acid plume in an acid mine drainage site associated with Bear Creek Uranium tailings pond. Geochemical reactions including calcite, gypsum, and illite were simulated with PHREEQC using the new kinetic database. The simulation results successfully demonstrated the utility of new kinetic database.
A Taxonomic Search Engine: Federating taxonomic databases using web services
Page, Roderic DM
2005-01-01
Background The taxonomic name of an organism is a key link between different databases that store information on that organism. However, in the absence of a single, comprehensive database of organism names, individual databases lack an easy means of checking the correctness of a name. Furthermore, the same organism may have more than one name, and the same name may apply to more than one organism. Results The Taxonomic Search Engine (TSE) is a web application written in PHP that queries multiple taxonomic databases (ITIS, Index Fungorum, IPNI, NCBI, and uBIO) and summarises the results in a consistent format. It supports "drill-down" queries to retrieve a specific record. The TSE can optionally suggest alternative spellings the user can try. It also acts as a Life Science Identifier (LSID) authority for the source taxonomic databases, providing globally unique identifiers (and associated metadata) for each name. Conclusion The Taxonomic Search Engine is available at and provides a simple demonstration of the potential of the federated approach to providing access to taxonomic names. PMID:15757517
Generation of an Aerothermal Data Base for the X33 Spacecraft
NASA Technical Reports Server (NTRS)
Roberts, Cathy; Huynh, Loc
1998-01-01
The X-33 experimental program is a cooperative program between industry and NASA, managed by Lockheed-Martin Skunk Works to develop an experimental vehicle to demonstrate new technologies for a single-stage-to-orbit, fully reusable launch vehicle (RLV). One of the new technologies to be demonstrated is an advanced Thermal Protection System (TPS) being designed by BF Goodrich (formerly Rohr, Inc.) with support from NASA. The calculation of an aerothermal database is crucial to identifying the critical design environment data for the TPS. The NASA Ames X-33 team has generated such a database using Computational Fluid Dynamics (CFD) analyses, engineering analysis methods and various programs to compare and interpolate the results from the CFD and the engineering analyses. This database, along with a program used to query the database, is used extensively by several X-33 team members to help them in designing the X-33. This paper will describe the methods used to generate this database, the program used to query the database, and will show some of the aerothermal analysis results for the X-33 aircraft.
Using a Relational Database to Index Infectious Disease Information
Brown, Jay A.
2010-01-01
Mapping medical knowledge into a relational database became possible with the availability of personal computers and user-friendly database software in the early 1990s. To create a database of medical knowledge, the domain expert works like a mapmaker to first outline the domain and then add the details, starting with the most prominent features. The resulting “intelligent database” can support the decisions of healthcare professionals. The intelligent database described in this article contains profiles of 275 infectious diseases. Users can query the database for all diseases matching one or more specific criteria (symptom, endemic region of the world, or epidemiological factor). Epidemiological factors include sources (patients, water, soil, or animals), routes of entry, and insect vectors. Medical and public health professionals could use such a database as a decision-support software tool. PMID:20623018
Sánchez-de-Madariaga, Ricardo; Muñoz, Adolfo; Lozano-Rubí, Raimundo; Serrano-Balazote, Pablo; Castro, Antonio L; Moreno, Oscar; Pascual, Mario
2017-08-18
The objective of this research is to compare the relational and non-relational (NoSQL) database systems approaches in order to store, recover, query and persist standardized medical information in the form of ISO/EN 13606 normalized Electronic Health Record XML extracts, both in isolation and concurrently. NoSQL database systems have recently attracted much attention, but few studies in the literature address their direct comparison with relational databases when applied to build the persistence layer of a standardized medical information system. One relational and two NoSQL databases (one document-based and one native XML database) of three different sizes have been created in order to evaluate and compare the response times (algorithmic complexity) of six different complexity growing queries, which have been performed on them. Similar appropriate results available in the literature have also been considered. Relational and non-relational NoSQL database systems show almost linear algorithmic complexity query execution. However, they show very different linear slopes, the former being much steeper than the two latter. Document-based NoSQL databases perform better in concurrency than in isolation, and also better than relational databases in concurrency. Non-relational NoSQL databases seem to be more appropriate than standard relational SQL databases when database size is extremely high (secondary use, research applications). Document-based NoSQL databases perform in general better than native XML NoSQL databases. EHR extracts visualization and edition are also document-based tasks more appropriate to NoSQL database systems. However, the appropriate database solution much depends on each particular situation and specific problem.
Hirano, Yoko; Asami, Yuko; Kuribayashi, Kazuhiko; Kitazaki, Shigeru; Yamamoto, Yuji; Fujimoto, Yoko
2018-05-01
Many pharmacoepidemiologic studies using large-scale databases have recently been utilized to evaluate the safety and effectiveness of drugs in Western countries. In Japan, however, conventional methodology has been applied to postmarketing surveillance (PMS) to collect safety and effectiveness information on new drugs to meet regulatory requirements. Conventional PMS entails enormous costs and resources despite being an uncontrolled observational study method. This study is aimed at examining the possibility of database research as a more efficient pharmacovigilance approach by comparing a health care claims database and PMS with regard to the characteristics and safety profiles of sertraline-prescribed patients. The characteristics of sertraline-prescribed patients recorded in a large-scale Japanese health insurance claims database developed by MinaCare Co. Ltd. were scanned and compared with the PMS results. We also explored the possibility of detecting signals indicative of adverse reactions based on the claims database by using sequence symmetry analysis. Diabetes mellitus, hyperlipidemia, and hyperthyroidism served as exploratory events, and their detection criteria for the claims database were reported by the Pharmaceuticals and Medical Devices Agency in Japan. Most of the characteristics of sertraline-prescribed patients in the claims database did not differ markedly from those in the PMS. There was no tendency for higher risks of the exploratory events after exposure to sertraline, and this was consistent with sertraline's known safety profile. Our results support the concept of using database research as a cost-effective pharmacovigilance tool that is free of selection bias . Further investigation using database research is required to confirm our preliminary observations. Copyright © 2018. Published by Elsevier Inc.
Côté, Richard G; Jones, Philip; Martens, Lennart; Kerrien, Samuel; Reisinger, Florian; Lin, Quan; Leinonen, Rasko; Apweiler, Rolf; Hermjakob, Henning
2007-01-01
Background Each major protein database uses its own conventions when assigning protein identifiers. Resolving the various, potentially unstable, identifiers that refer to identical proteins is a major challenge. This is a common problem when attempting to unify datasets that have been annotated with proteins from multiple data sources or querying data providers with one flavour of protein identifiers when the source database uses another. Partial solutions for protein identifier mapping exist but they are limited to specific species or techniques and to a very small number of databases. As a result, we have not found a solution that is generic enough and broad enough in mapping scope to suit our needs. Results We have created the Protein Identifier Cross-Reference (PICR) service, a web application that provides interactive and programmatic (SOAP and REST) access to a mapping algorithm that uses the UniProt Archive (UniParc) as a data warehouse to offer protein cross-references based on 100% sequence identity to proteins from over 70 distinct source databases loaded into UniParc. Mappings can be limited by source database, taxonomic ID and activity status in the source database. Users can copy/paste or upload files containing protein identifiers or sequences in FASTA format to obtain mappings using the interactive interface. Search results can be viewed in simple or detailed HTML tables or downloaded as comma-separated values (CSV) or Microsoft Excel (XLS) files suitable for use in a local database or a spreadsheet. Alternatively, a SOAP interface is available to integrate PICR functionality in other applications, as is a lightweight REST interface. Conclusion We offer a publicly available service that can interactively map protein identifiers and protein sequences to the majority of commonly used protein databases. Programmatic access is available through a standards-compliant SOAP interface or a lightweight REST interface. The PICR interface, documentation and code examples are available at . PMID:17945017
Using the Proteomics Identifications Database (PRIDE).
Martens, Lennart; Jones, Phil; Côté, Richard
2008-03-01
The Proteomics Identifications Database (PRIDE) is a public data repository designed to store, disseminate, and analyze mass spectrometry based proteomics datasets. The PRIDE database can accommodate any level of detailed metadata about the submitted results, which can be queried, explored, viewed, or downloaded via the PRIDE Web interface. The PRIDE database also provides a simple, yet powerful, access control mechanism that fully supports confidential peer-reviewing of data related to a manuscript, ensuring that these results remain invisible to the general public while allowing referees and journal editors anonymized access to the data. This unit describes in detail the functionality that PRIDE provides with regards to searching, viewing, and comparing the available data, as well as different options for submitting data to PRIDE.
Compressing DNA sequence databases with coil
White, W Timothy J; Hendy, Michael D
2008-01-01
Background Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression – an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. Results We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression – the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental additions to a sequence database. Conclusion coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work. PMID:18489794
Transport and Environment Database System (TRENDS): Maritime air pollutant emission modelling
NASA Astrophysics Data System (ADS)
Georgakaki, Aliki; Coffey, Robert A.; Lock, Graham; Sorenson, Spencer C.
This paper reports the development of the maritime module within the framework of the Transport and Environment Database System (TRENDS) project. A detailed database has been constructed for the calculation of energy consumption and air pollutant emissions. Based on an in-house database of commercial vessels kept at the Technical University of Denmark, relationships between the fuel consumption and size of different vessels have been developed, taking into account the fleet's age and service speed. The technical assumptions and factors incorporated in the database are presented, including changes from findings reported in Methodologies for Estimating air pollutant Emissions from Transport (MEET). The database operates on statistical data provided by Eurostat, which describe vessel and freight movements from and towards EU 15 major ports. Data are at port to Maritime Coastal Area (MCA) level, so a bottom-up approach is used. A port to MCA distance database has also been constructed for the purpose of the study. This was the first attempt to use Eurostat maritime statistics for emission modelling; and the problems encountered, since the statistical data collection was not undertaken with a view to this purpose, are mentioned. Examples of the results obtained by the database are presented. These include detailed air pollutant emission calculations for bulk carriers entering the port of Helsinki, as an example of the database operation, and aggregate results for different types of movements for France. Overall estimates of SO x and NO x emission caused by shipping traffic between the EU 15 countries are in the area of 1 and 1.5 million tonnes, respectively.
Privacy-preserving search for chemical compound databases
2015-01-01
Background Searching for similar compounds in a database is the most important process for in-silico drug screening. Since a query compound is an important starting point for the new drug, a query holder, who is afraid of the query being monitored by the database server, usually downloads all the records in the database and uses them in a closed network. However, a serious dilemma arises when the database holder also wants to output no information except for the search results, and such a dilemma prevents the use of many important data resources. Results In order to overcome this dilemma, we developed a novel cryptographic protocol that enables database searching while keeping both the query holder's privacy and database holder's privacy. Generally, the application of cryptographic techniques to practical problems is difficult because versatile techniques are computationally expensive while computationally inexpensive techniques can perform only trivial computation tasks. In this study, our protocol is successfully built only from an additive-homomorphic cryptosystem, which allows only addition performed on encrypted values but is computationally efficient compared with versatile techniques such as general purpose multi-party computation. In an experiment searching ChEMBL, which consists of more than 1,200,000 compounds, the proposed method was 36,900 times faster in CPU time and 12,000 times as efficient in communication size compared with general purpose multi-party computation. Conclusion We proposed a novel privacy-preserving protocol for searching chemical compound databases. The proposed method, easily scaling for large-scale databases, may help to accelerate drug discovery research by making full use of unused but valuable data that includes sensitive information. PMID:26678650
THGS: a web-based database of Transmembrane Helices in Genome Sequences
Fernando, S. A.; Selvarani, P.; Das, Soma; Kumar, Ch. Kiran; Mondal, Sukanta; Ramakumar, S.; Sekar, K.
2004-01-01
Transmembrane Helices in Genome Sequences (THGS) is an interactive web-based database, developed to search the transmembrane helices in the user-interested gene sequences available in the Genome Database (GDB). The proposed database has provision to search sequence motifs in transmembrane and globular proteins. In addition, the motif can be searched in the other sequence databases (Swiss-Prot and PIR) or in the macromolecular structure database, Protein Data Bank (PDB). Further, the 3D structure of the corresponding queried motif, if it is available in the solved protein structures deposited in the Protein Data Bank, can also be visualized using the widely used graphics package RASMOL. All the sequence databases used in the present work are updated frequently and hence the results produced are up to date. The database THGS is freely available via the world wide web and can be accessed at http://pranag.physics.iisc.ernet.in/thgs/ or http://144.16.71.10/thgs/. PMID:14681375
Using Web Ontology Language to Integrate Heterogeneous Databases in the Neurosciences
Lam, Hugo Y.K.; Marenco, Luis; Shepherd, Gordon M.; Miller, Perry L.; Cheung, Kei-Hoi
2006-01-01
Integrative neuroscience involves the integration and analysis of diverse types of neuroscience data involving many different experimental techniques. This data will increasingly be distributed across many heterogeneous databases that are web-accessible. Currently, these databases do not expose their schemas (database structures) and their contents to web applications/agents in a standardized, machine-friendly way. This limits database interoperation. To address this problem, we describe a pilot project that illustrates how neuroscience databases can be expressed using the Web Ontology Language, which is a semantically-rich ontological language, as a common data representation language to facilitate complex cross-database queries. In this pilot project, an existing tool called “D2RQ” was used to translate two neuroscience databases (NeuronDB and CoCoDat) into OWL, and the resulting OWL ontologies were then merged. An OWL-based reasoner (Racer) was then used to provide a sophisticated query language (nRQL) to perform integrated queries across the two databases based on the merged ontology. This pilot project is one step toward exploring the use of semantic web technologies in the neurosciences. PMID:17238384
[Establishement for regional pelvic trauma database in Hunan Province].
Cheng, Liang; Zhu, Yong; Long, Haitao; Yang, Junxiao; Sun, Buhua; Li, Kanghua
2017-04-28
To establish a database for pelvic trauma in Hunan Province, and to start the work of multicenter pelvic trauma registry. Methods: To establish the database, literatures relevant to pelvic trauma were screened, the experiences from the established trauma database in China and abroad were learned, and the actual situations for pelvic trauma rescue in Hunan Province were considered. The database for pelvic trauma was established based on the PostgreSQL and the advanced programming language Java 1.6. Results: The complex procedure for pelvic trauma rescue was described structurally. The contents for the database included general patient information, injurious condition, prehospital rescue, conditions in admission, treatment in hospital, status on discharge, diagnosis, classification, complication, trauma scoring and therapeutic effect. The database can be accessed through the internet by browser/servicer. The functions for the database include patient information management, data export, history query, progress report, video-image management and personal information management. Conclusion: The database with whole life cycle pelvic trauma is successfully established for the first time in China. It is scientific, functional, practical, and user-friendly.
[Status of libraries and databases for natural products at abroad].
Zhao, Li-Mei; Tan, Ning-Hua
2015-01-01
For natural products are one of the important sources for drug discovery, libraries and databases of natural products are significant for the development and research of natural products. At present, most of compound libraries at abroad are synthetic or combinatorial synthetic molecules, resulting to access natural products difficult; for information of natural products are scattered with different standards, it is difficult to construct convenient, comprehensive and large-scale databases for natural products. This paper reviewed the status of current accessing libraries and databases for natural products at abroad and provided some important information for the development of libraries and database for natural products.
Competitive Intelligence: Finding the Clues Online.
ERIC Educational Resources Information Center
Combs, Richard; Moorhead, John
1990-01-01
Defines and discusses competitive intelligence for business decision making and suggests the use of online databases to start looking for relevant information. The best databases to use are described, designing the search strategy is explained, reviewing and editing results are discussed, and the presentation of results is considered. (LRW)
Development of Online Database Services in Japan and Perspectives on Asia.
ERIC Educational Resources Information Center
Miyakawa, Takayasu
This paper outlines the market developments, governmental promotion policies, and efforts by private industries for online database services in Japan since the late 1970s. The combination of these efforts over the years has resulted in an online database service market of US$20 billion annually, of which approximately one third is Western online…
ERIC Educational Resources Information Center
Bharti, Neelam; Leonard, Michelle; Singh, Shailendra
2016-01-01
Online chemical databases are the largest source of chemical information and, therefore, the main resource for retrieving results from published journals, books, patents, conference abstracts, and other relevant sources. Various commercial, as well as free, chemical databases are available. SciFinder, Reaxys, and Web of Science are three major…
Checkpointing and Recovery in Distributed and Database Systems
ERIC Educational Resources Information Center
Wu, Jiang
2011-01-01
A transaction-consistent global checkpoint of a database records a state of the database which reflects the effect of only completed transactions and not the results of any partially executed transactions. This thesis establishes the necessary and sufficient conditions for a checkpoint of a data item (or the checkpoints of a set of data items) to…
Factors Influencing Error Recovery in Collections Databases: A Museum Case Study
ERIC Educational Resources Information Center
Marty, Paul F.
2005-01-01
This article offers an analysis of the process of error recovery as observed in the development and use of collections databases in a university museum. It presents results from a longitudinal case study of the development of collaborative systems and practices designed to reduce the number of errors found in the museum's databases as museum…
Application of cloud database in the management of clinical data of patients with skin diseases.
Mao, Xiao-fei; Liu, Rui; DU, Wei; Fan, Xue; Chen, Dian; Zuo, Ya-gang; Sun, Qiu-ning
2015-04-01
To evaluate the needs and applications of using cloud database in the daily practice of dermatology department. The cloud database was established for systemic scleroderma and localized scleroderma. Paper forms were used to record the original data including personal information, pictures, specimens, blood biochemical indicators, skin lesions,and scores of self-rating scales. The results were input into the cloud database. The applications of the cloud database in the dermatology department were summarized and analyzed. The personal and clinical information of 215 systemic scleroderma patients and 522 localized scleroderma patients were included and analyzed using the cloud database. The disease status,quality of life, and prognosis were obtained by statistical calculations. The cloud database can efficiently and rapidly store and manage the data of patients with skin diseases. As a simple, prompt, safe, and convenient tool, it can be used in patients information management, clinical decision-making, and scientific research.
A Framework for Cloudy Model Optimization and Database Storage
NASA Astrophysics Data System (ADS)
Calvén, Emilia; Helton, Andrew; Sankrit, Ravi
2018-01-01
We present a framework for producing Cloudy photoionization models of the nebular emission from novae ejecta and storing a subset of the results in SQL database format for later usage. The database can be searched for models best fitting observed spectral line ratios. Additionally, the framework includes an optimization feature that can be used in tandem with the database to search for and improve on models by creating new Cloudy models while, varying the parameters. The database search and optimization can be used to explore the structures of nebulae by deriving their properties from the best-fit models. The goal is to provide the community with a large database of Cloudy photoionization models, generated from parameters reflecting conditions within novae ejecta, that can be easily fitted to observed spectral lines; either by directly accessing the database using the framework code or by usage of a website specifically made for this purpose.
NASA Astrophysics Data System (ADS)
Thakore, Arun K.; Sauer, Frank
1994-05-01
The organization of modern medical care environments into disease-related clusters, such as a cancer center, a diabetes clinic, etc., has the side-effect of introducing multiple heterogeneous databases, often containing similar information, within the same organization. This heterogeneity fosters incompatibility and prevents the effective sharing of data amongst applications at different sites. Although integration of heterogeneous databases is now feasible, in the medical arena this is often an ad hoc process, not founded on proven database technology or formal methods. In this paper we illustrate the use of a high-level object- oriented semantic association method to model information found in different databases into an integrated conceptual global model that integrates the databases. We provide examples from the medical domain to illustrate an integration approach resulting in a consistent global view, without attacking the autonomy of the underlying databases.
Data exploration systems for databases
NASA Technical Reports Server (NTRS)
Greene, Richard J.; Hield, Christopher
1992-01-01
Data exploration systems apply machine learning techniques, multivariate statistical methods, information theory, and database theory to databases to identify significant relationships among the data and summarize information. The result of applying data exploration systems should be a better understanding of the structure of the data and a perspective of the data enabling an analyst to form hypotheses for interpreting the data. This paper argues that data exploration systems need a minimum amount of domain knowledge to guide both the statistical strategy and the interpretation of the resulting patterns discovered by these systems.
1998-01-01
sand and gravel outcrops - led to a database of hydraulic conductivities, porosities and kinetic parameters for each lithologjcal fades present in...sedimentological methods. The resulting 2D high-resolution data sets represent a veiy detailed database of excellent quality. On the basis of one example...from an outcrop in southwest Germany the process of building up the database is explained and the results of modelling of transport kinetics in such
Data Model and Relational Database Design for Highway Runoff Water-Quality Metadata
Granato, Gregory E.; Tessler, Steven
2001-01-01
A National highway and urban runoff waterquality metadatabase was developed by the U.S. Geological Survey in cooperation with the Federal Highway Administration as part of the National Highway Runoff Water-Quality Data and Methodology Synthesis (NDAMS). The database was designed to catalog available literature and to document results of the synthesis in a format that would facilitate current and future research on highway and urban runoff. This report documents the design and implementation of the NDAMS relational database, which was designed to provide a catalog of available information and the results of an assessment of the available data. All the citations and the metadata collected during the review process are presented in a stratified metadatabase that contains citations for relevant publications, abstracts (or previa), and reportreview metadata for a sample of selected reports that document results of runoff quality investigations. The database is referred to as a metadatabase because it contains information about available data sets rather than a record of the original data. The database contains the metadata needed to evaluate and characterize how valid, current, complete, comparable, and technically defensible published and available information may be when evaluated for application to the different dataquality objectives as defined by decision makers. This database is a relational database, in that all information is ultimately linked to a given citation in the catalog of available reports. The main database file contains 86 tables consisting of 29 data tables, 11 association tables, and 46 domain tables. The data tables all link to a particular citation, and each data table is focused on one aspect of the information collected in the literature search and the evaluation of available information. This database is implemented in the Microsoft (MS) Access database software because it is widely used within and outside of government and is familiar to many existing and potential customers. The stratified metadatabase design for the NDAMS program is presented in the MS Access file DBDESIGN.mdb and documented with a data dictionary in the NDAMS_DD.mdb file recorded on the CD-ROM. The data dictionary file includes complete documentation of the table names, table descriptions, and information about each of the 419 fields in the database.
Colangelo, Christopher M.; Shifman, Mark; Cheung, Kei-Hoi; Stone, Kathryn L.; Carriero, Nicholas J.; Gulcicek, Erol E.; Lam, TuKiet T.; Wu, Terence; Bjornson, Robert D.; Bruce, Can; Nairn, Angus C.; Rinehart, Jesse; Miller, Perry L.; Williams, Kenneth R.
2015-01-01
We report a significantly-enhanced bioinformatics suite and database for proteomics research called Yale Protein Expression Database (YPED) that is used by investigators at more than 300 institutions worldwide. YPED meets the data management, archival, and analysis needs of a high-throughput mass spectrometry-based proteomics research ranging from a single laboratory, group of laboratories within and beyond an institution, to the entire proteomics community. The current version is a significant improvement over the first version in that it contains new modules for liquid chromatography–tandem mass spectrometry (LC–MS/MS) database search results, label and label-free quantitative proteomic analysis, and several scoring outputs for phosphopeptide site localization. In addition, we have added both peptide and protein comparative analysis tools to enable pairwise analysis of distinct peptides/proteins in each sample and of overlapping peptides/proteins between all samples in multiple datasets. We have also implemented a targeted proteomics module for automated multiple reaction monitoring (MRM)/selective reaction monitoring (SRM) assay development. We have linked YPED’s database search results and both label-based and label-free fold-change analysis to the Skyline Panorama repository for online spectra visualization. In addition, we have built enhanced functionality to curate peptide identifications into an MS/MS peptide spectral library for all of our protein database search identification results. PMID:25712262
Colangelo, Christopher M; Shifman, Mark; Cheung, Kei-Hoi; Stone, Kathryn L; Carriero, Nicholas J; Gulcicek, Erol E; Lam, TuKiet T; Wu, Terence; Bjornson, Robert D; Bruce, Can; Nairn, Angus C; Rinehart, Jesse; Miller, Perry L; Williams, Kenneth R
2015-02-01
We report a significantly-enhanced bioinformatics suite and database for proteomics research called Yale Protein Expression Database (YPED) that is used by investigators at more than 300 institutions worldwide. YPED meets the data management, archival, and analysis needs of a high-throughput mass spectrometry-based proteomics research ranging from a single laboratory, group of laboratories within and beyond an institution, to the entire proteomics community. The current version is a significant improvement over the first version in that it contains new modules for liquid chromatography-tandem mass spectrometry (LC-MS/MS) database search results, label and label-free quantitative proteomic analysis, and several scoring outputs for phosphopeptide site localization. In addition, we have added both peptide and protein comparative analysis tools to enable pairwise analysis of distinct peptides/proteins in each sample and of overlapping peptides/proteins between all samples in multiple datasets. We have also implemented a targeted proteomics module for automated multiple reaction monitoring (MRM)/selective reaction monitoring (SRM) assay development. We have linked YPED's database search results and both label-based and label-free fold-change analysis to the Skyline Panorama repository for online spectra visualization. In addition, we have built enhanced functionality to curate peptide identifications into an MS/MS peptide spectral library for all of our protein database search identification results. Copyright © 2015 The Authors. Production and hosting by Elsevier Ltd.. All rights reserved.
Database Constraints Applied to Metabolic Pathway Reconstruction Tools
Vilaplana, Jordi; Solsona, Francesc; Teixido, Ivan; Usié, Anabel; Karathia, Hiren; Alves, Rui; Mateo, Jordi
2014-01-01
Our group developed two biological applications, Biblio-MetReS and Homol-MetReS, accessing the same database of organisms with annotated genes. Biblio-MetReS is a data-mining application that facilitates the reconstruction of molecular networks based on automated text-mining analysis of published scientific literature. Homol-MetReS allows functional (re)annotation of proteomes, to properly identify both the individual proteins involved in the process(es) of interest and their function. It also enables the sets of proteins involved in the process(es) in different organisms to be compared directly. The efficiency of these biological applications is directly related to the design of the shared database. We classified and analyzed the different kinds of access to the database. Based on this study, we tried to adjust and tune the configurable parameters of the database server to reach the best performance of the communication data link to/from the database system. Different database technologies were analyzed. We started the study with a public relational SQL database, MySQL. Then, the same database was implemented by a MapReduce-based database named HBase. The results indicated that the standard configuration of MySQL gives an acceptable performance for low or medium size databases. Nevertheless, tuning database parameters can greatly improve the performance and lead to very competitive runtimes. PMID:25202745
ReprDB and panDB: minimalist databases with maximal microbial representation.
Zhou, Wei; Gay, Nicole; Oh, Julia
2018-01-18
Profiling of shotgun metagenomic samples is hindered by a lack of unified microbial reference genome databases that (i) assemble genomic information from all open access microbial genomes, (ii) have relatively small sizes, and (iii) are compatible to various metagenomic read mapping tools. Moreover, computational tools to rapidly compile and update such databases to accommodate the rapid increase in new reference genomes do not exist. As a result, database-guided analyses often fail to profile a substantial fraction of metagenomic shotgun sequencing reads from complex microbiomes. We report pipelines that efficiently traverse all open access microbial genomes and assemble non-redundant genomic information. The pipelines result in two species-resolution microbial reference databases of relatively small sizes: reprDB, which assembles microbial representative or reference genomes, and panDB, for which we developed a novel iterative alignment algorithm to identify and assemble non-redundant genomic regions in multiple sequenced strains. With the databases, we managed to assign taxonomic labels and genome positions to the majority of metagenomic reads from human skin and gut microbiomes, demonstrating a significant improvement over a previous database-guided analysis on the same datasets. reprDB and panDB leverage the rapid increases in the number of open access microbial genomes to more fully profile metagenomic samples. Additionally, the databases exclude redundant sequence information to avoid inflated storage or memory space and indexing or analyzing time. Finally, the novel iterative alignment algorithm significantly increases efficiency in pan-genome identification and can be useful in comparative genomic analyses.
MEPD: a Medaka gene expression pattern database
Henrich, Thorsten; Ramialison, Mirana; Quiring, Rebecca; Wittbrodt, Beate; Furutani-Seiki, Makoto; Wittbrodt, Joachim; Kondoh, Hisato
2003-01-01
The Medaka Expression Pattern Database (MEPD) stores and integrates information of gene expression during embryonic development of the small freshwater fish Medaka (Oryzias latipes). Expression patterns of genes identified by ESTs are documented by images and by descriptions through parameters such as staining intensity, category and comments and through a comprehensive, hierarchically organized dictionary of anatomical terms. Sequences of the ESTs are available and searchable through BLAST. ESTs in the database are clustered upon entry and have been blasted against public data-bases. The BLAST results are updated regularly, stored within the database and searchable. The MEPD is a project within the Medaka Genome Initiative (MGI) and entries will be interconnected to integrated genomic map databases. MEPD is accessible through the WWW at http://medaka.dsp.jst.go.jp/MEPD. PMID:12519950
DOE technology information management system database study report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Widing, M.A.; Blodgett, D.W.; Braun, M.D.
1994-11-01
To support the missions of the US Department of Energy (DOE) Special Technologies Program, Argonne National Laboratory is defining the requirements for an automated software system that will search electronic databases on technology. This report examines the work done and results to date. Argonne studied existing commercial and government sources of technology databases in five general areas: on-line services, patent database sources, government sources, aerospace technology sources, and general technology sources. First, it conducted a preliminary investigation of these sources to obtain information on the content, cost, frequency of updates, and other aspects of their databases. The Laboratory then performedmore » detailed examinations of at least one source in each area. On this basis, Argonne recommended which databases should be incorporated in DOE`s Technology Information Management System.« less
Just-in-time Database-Driven Web Applications
2003-01-01
"Just-in-time" database-driven Web applications are inexpensive, quickly-developed software that can be put to many uses within a health care organization. Database-driven Web applications garnered 73873 hits on our system-wide intranet in 2002. They enabled collaboration and communication via user-friendly Web browser-based interfaces for both mission-critical and patient-care-critical functions. Nineteen database-driven Web applications were developed. The application categories that comprised 80% of the hits were results reporting (27%), graduate medical education (26%), research (20%), and bed availability (8%). The mean number of hits per application was 3888 (SD = 5598; range, 14-19879). A model is described for just-in-time database-driven Web application development and an example given with a popular HTML editor and database program. PMID:14517109
Published toxicity results are reviewed for oils, dispersants and dispersed oils and aquatic plants. The historical phytotoxicity database consists largely of results from a patchwork of research conducted after oil spills to marine waters. Toxicity information is available for ...
Literature searches on Ayurveda: An update
Aggithaya, Madhur G.; Narahari, Saravu R.
2015-01-01
Introduction: The journals that publish on Ayurveda are increasingly indexed by popular medical databases in recent years. However, many Eastern journals are not indexed biomedical journal databases such as PubMed. Literature searches for Ayurveda continue to be challenging due to the nonavailability of active, unbiased dedicated databases for Ayurvedic literature. In 2010, authors identified 46 databases that can be used for systematic search of Ayurvedic papers and theses. This update reviewed our previous recommendation and identified current and relevant databases. Aims: To update on Ayurveda literature search and strategy to retrieve maximum publications. Methods: Author used psoriasis as an example to search previously listed databases and identify new. The population, intervention, control, and outcome table included keywords related to psoriasis and Ayurvedic terminologies for skin diseases. Current citation update status, search results, and search options of previous databases were assessed. Eight search strategies were developed. Hundred and five journals, both biomedical and Ayurveda, which publish on Ayurveda, were identified. Variability in databases was explored to identify bias in journal citation. Results: Five among 46 databases are now relevant – AYUSH research portal, Annotated Bibliography of Indian Medicine, Digital Helpline for Ayurveda Research Articles (DHARA), PubMed, and Directory of Open Access Journals. Search options in these databases are not uniform, and only PubMed allows complex search strategy. “The Researches in Ayurveda” and “Ayurvedic Research Database” (ARD) are important grey resources for hand searching. About 44/105 (41.5%) journals publishing Ayurvedic studies are not indexed in any database. Only 11/105 (10.4%) exclusive Ayurveda journals are indexed in PubMed. Conclusion: AYUSH research portal and DHARA are two major portals after 2010. It is mandatory to search PubMed and four other databases because all five carry citations from different groups of journals. The hand searching is important to identify Ayurveda publications that are not indexed elsewhere. Availability information of citations in Ayurveda libraries from National Union Catalogue of Scientific Serials in India if regularly updated will improve the efficacy of hand searching. A grey database (ARD) contains unpublished PG/Ph.D. theses. The AYUSH portal, DHARA (funded by Ministry of AYUSH), and ARD should be merged to form single larger database to limit Ayurveda literature searches. PMID:27313409
Aguilera-Mendoza, Longendri; Marrero-Ponce, Yovani; Tellez-Ibarra, Roberto; Llorente-Quesada, Monica T; Salgado, Jesús; Barigye, Stephen J; Liu, Jun
2015-08-01
The large variety of antimicrobial peptide (AMP) databases developed to date are characterized by a substantial overlap of data and similarity of sequences. Our goals are to analyze the levels of redundancy for all available AMP databases and use this information to build a new non-redundant sequence database. For this purpose, a new software tool is introduced. A comparative study of 25 AMP databases reveals the overlap and diversity among them and the internal diversity within each database. The overlap analysis shows that only one database (Peptaibol) contains exclusive data, not present in any other, whereas all sequences in the LAMP_Patent database are included in CAMP_Patent. However, the majority of databases have their own set of unique sequences, as well as some overlap with other databases. The complete set of non-duplicate sequences comprises 16 990 cases, which is almost half of the total number of reported peptides. On the other hand, the diversity analysis identifies the most and least diverse databases and proves that all databases exhibit some level of redundancy. Finally, we present a new parallel-free software, named Dover Analyzer, developed to compute the overlap and diversity between any number of databases and compile a set of non-redundant sequences. These results are useful for selecting or building a suitable representative set of AMPs, according to specific needs. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Bitsch, A; Jacobi, S; Melber, C; Wahnschaffe, U; Simetska, N; Mangelsdorf, I
2006-12-01
A database for repeated dose toxicity data has been developed. Studies were selected by data quality. Review documents or risk assessments were used to get a pre-screened selection of available valid data. The structure of the chemicals should be rather simple for well defined chemical categories. The database consists of three core data sets for each chemical: (1) structural features and physico-chemical data, (2) data on study design, (3) study results. To allow consistent queries, a high degree of standardization categories and glossaries were developed for relevant parameters. At present, the database consists of 364 chemicals investigated in 1018 studies which resulted in a total of 6002 specific effects. Standard queries have been developed, which allow analyzing the influence of structural features or PC data on LOELs, target organs and effects. Furthermore, it can be used as an expert system. First queries have shown that the database is a very valuable tool.
The UMIST database for astrochemistry 2006
NASA Astrophysics Data System (ADS)
Woodall, J.; Agúndez, M.; Markwick-Kemper, A. J.; Millar, T. J.
2007-05-01
Aims:We present a new version of the UMIST Database for Astrochemistry, the fourth such version to be released to the public. The current version contains some 4573 binary gas-phase reactions, an increase of 10% from the previous (1999) version, among 420 species, of which 23 are new to the database. Methods: Major updates have been made to ion-neutral reactions, neutral-neutral reactions, particularly at low temperature, and dissociative recombination reactions. We have included for the first time the interstellar chemistry of fluorine. In addition to the usual database, we have also released a reaction set in which the effects of dipole-enhanced ion-neutral rate coefficients are included. Results: These two reactions sets have been used in a dark cloud model and the results of these models are presented and discussed briefly. The database and associated software are available on the World Wide Web at www.udfa.net. Tables 1, 2, 4 and 9 are only available in electronic form at http://www.aanda.org
Asiimwe, Innocent Gerald; Rumona, Dickson
2016-01-01
To limit selective and incomplete publication of the results of clinical trials, registries including ClinicalTrials.gov were introduced. The ClinicalTrials.gov registry added a results database in 2008 to enable researchers to post the results of their trials as stipulated by the Food and Drug Administration Amendment Act of 2007. This study aimed to determine the direction and magnitude of any change in publication proportions of registered breast cancer trials that occurred since the inception of the ClinicalTrials.gov results database. A cross-sectional study design was employed using ClinicalTrials.gov, a publicly available registry/results database as the primary data source. Registry contents under the subcategories 'Breast Neoplasms' and 'Breast Neoplasms, Male' were downloaded on 1 August 2015. A literature search for included trials was afterwards conducted using MEDLINE and DISCOVER databases to determine publication status of the registered breast cancer trials. Nearly half (168/340) of the listed trials had been published, with a median time to publication of 24 months (Q1 = 14 months, Q3 = 42 months). Only 86 trials were published within 24 months of completion. There was no significant increase in publication proportions of trials that were completed before the introduction of the results database compared to those completed after (OR = 1.00, 95 % CI = .61 to 1.63; adjusted OR = 0.84, 95 % CI = .51 to 1.39). Characteristics associated with publication included trial type (observational versus interventional adjusted OR = .28, 95 % CI = .10 to .74) and completion/termination status (terminated versus completed adjusted OR = .22, 95 % CI = .09 to .51). Less than a half of breast cancer trials registered in ClinicalTrials.gov are published in peer-reviewed journals.
Sridhar, Vishnu B; Tian, Peifang; Dale, Anders M; Devor, Anna; Saisan, Payam A
2014-01-01
We present a database client software-Neurovascular Network Explorer 1.0 (NNE 1.0)-that uses MATLAB(®) based Graphical User Interface (GUI) for interaction with a database of 2-photon single-vessel diameter measurements from our previous publication (Tian et al., 2010). These data are of particular interest for modeling the hemodynamic response. NNE 1.0 is downloaded by the user and then runs either as a MATLAB script or as a standalone program on a Windows platform. The GUI allows browsing the database according to parameters specified by the user, simple manipulation and visualization of the retrieved records (such as averaging and peak-normalization), and export of the results. Further, we provide NNE 1.0 source code. With this source code, the user can database their own experimental results, given the appropriate data structure and naming conventions, and thus share their data in a user-friendly format with other investigators. NNE 1.0 provides an example of seamless and low-cost solution for sharing of experimental data by a regular size neuroscience laboratory and may serve as a general template, facilitating dissemination of biological results and accelerating data-driven modeling approaches.
Inequality of obesity and socioeconomic factors in Iran: a systematic review and meta- analyses
Djalalinia, Shirin; Peykari, Niloofar; Qorbani, Mostafa; Larijani, Bagher; Farzadfar, Farshad
2015-01-01
Background: Socioeconomic status and demographic factors, such as education, occupation, place of residence, gender, age, and marital status have been reported to be associated with obesity. We conducted a systematic review to summarize evidences on associations between socioeconomic factors and obesity/overweight in Iranian population. Methods: We systematically searched international databases; ISI, PubMed/Medline, Scopus, and national databases Iran-medex, Irandoc, and Scientific Information Database (SID). We refined data for associations between socioeconomic factors and obesity/overweight by sex, age, province, and year. There were no limitations for time and languages. Results: Based on our search strategy we found 151 records; of them 139 were from international databases and the remaining 12 were obtained from national databases. After removing duplicates, via the refining steps, only 119 articles were found related to our study domains. Extracted results were attributed to 146596 person/data from included studies. Increased ages, low educational levels, being married, residence in urban area, as well as female sex were clearly associated with obesity. Conclusion: Results could be useful for better health policy and more planned studies in this field. These also could be used for future complementary analyses. PMID:26793632
Vieira, A.
2010-01-01
Background: In relation to pharmacognosy, an objective of many ethnobotanical studies is to identify plant species to be further investigated, for example, tested in disease models related to the ethnomedicinal application. To further warrant such testing, research evidence for medicinal applications of these plants (or of their major phytochemical constituents and metabolic derivatives) is typically analyzed in biomedical databases. Methods: As a model of this process, the current report presents novel information regarding traditional anti-inflammation and anti-infection medicinal plant use. This information was obtained from an interview-based ethnobotanical study; and was compared with current biomedical evidence using the Medline® database. Results: Of the 8 anti-infection plant species identified in the ethnobotanical study, 7 have related activities reported in the database; and of the 6 anti-inflammation plants, 4 have related activities in the database. Conclusion: Based on novel and complimentary results from the ethnobotanical and biomedical database analyses, it is suggested that some of these plants warrant additional investigation of potential anti-inflammatory or anti-infection activities in related disease models, and also additional studies in other population groups. PMID:21589754
Parallel database search and prime factorization with magnonic holographic memory devices
DOE Office of Scientific and Technical Information (OSTI.GOV)
Khitun, Alexander
In this work, we describe the capabilities of Magnonic Holographic Memory (MHM) for parallel database search and prime factorization. MHM is a type of holographic device, which utilizes spin waves for data transfer and processing. Its operation is based on the correlation between the phases and the amplitudes of the input spin waves and the output inductive voltage. The input of MHM is provided by the phased array of spin wave generating elements allowing the producing of phase patterns of an arbitrary form. The latter makes it possible to code logic states into the phases of propagating waves and exploitmore » wave superposition for parallel data processing. We present the results of numerical modeling illustrating parallel database search and prime factorization. The results of numerical simulations on the database search are in agreement with the available experimental data. The use of classical wave interference may results in a significant speedup over the conventional digital logic circuits in special task data processing (e.g., √n in database search). Potentially, magnonic holographic devices can be implemented as complementary logic units to digital processors. Physical limitations and technological constrains of the spin wave approach are also discussed.« less
Parallel database search and prime factorization with magnonic holographic memory devices
NASA Astrophysics Data System (ADS)
Khitun, Alexander
2015-12-01
In this work, we describe the capabilities of Magnonic Holographic Memory (MHM) for parallel database search and prime factorization. MHM is a type of holographic device, which utilizes spin waves for data transfer and processing. Its operation is based on the correlation between the phases and the amplitudes of the input spin waves and the output inductive voltage. The input of MHM is provided by the phased array of spin wave generating elements allowing the producing of phase patterns of an arbitrary form. The latter makes it possible to code logic states into the phases of propagating waves and exploit wave superposition for parallel data processing. We present the results of numerical modeling illustrating parallel database search and prime factorization. The results of numerical simulations on the database search are in agreement with the available experimental data. The use of classical wave interference may results in a significant speedup over the conventional digital logic circuits in special task data processing (e.g., √n in database search). Potentially, magnonic holographic devices can be implemented as complementary logic units to digital processors. Physical limitations and technological constrains of the spin wave approach are also discussed.
Okuma, E
1994-01-01
With the introduction of the Cumulative Index to Nursing and Allied Health Literature (CINAHL) on CD-ROM, research was initiated to compare coverage of nursing journals by CINAHL and MEDLINE in this format, expanding on previous comparison of these databases in print and online. The study assessed search results for eight topics in 1989 and 1990 citations in both databases, each produced by SilverPlatter. Results were tallied and analyzed for number of records retrieved, unique and overlapping records, relevance, and appropriateness. An overall precision score was developed. The goal of the research was to develop quantifiable tools to help determine which database to purchase for an academic library serving an undergraduate nursing program.
Okuma, E
1994-01-01
With the introduction of the Cumulative Index to Nursing and Allied Health Literature (CINAHL) on CD-ROM, research was initiated to compare coverage of nursing journals by CINAHL and MEDLINE in this format, expanding on previous comparison of these databases in print and online. The study assessed search results for eight topics in 1989 and 1990 citations in both databases, each produced by SilverPlatter. Results were tallied and analyzed for number of records retrieved, unique and overlapping records, relevance, and appropriateness. An overall precision score was developed. The goal of the research was to develop quantifiable tools to help determine which database to purchase for an academic library serving an undergraduate nursing program. PMID:8136757
Identifying work-related motor vehicle crashes in multiple databases.
Thomas, Andrea M; Thygerson, Steven M; Merrill, Ray M; Cook, Lawrence J
2012-01-01
To compare and estimate the magnitude of work-related motor vehicle crashes in Utah using 2 probabilistically linked statewide databases. Data from 2006 and 2007 motor vehicle crash and hospital databases were joined through probabilistic linkage. Summary statistics and capture-recapture were used to describe occupants injured in work-related motor vehicle crashes and estimate the size of this population. There were 1597 occupants in the motor vehicle crash database and 1673 patients in the hospital database identified as being in a work-related motor vehicle crash. We identified 1443 occupants with at least one record from either the motor vehicle crash or hospital database indicating work-relatedness that linked to any record in the opposing database. We found that 38.7 percent of occupants injured in work-related motor vehicle crashes identified in the motor vehicle crash database did not have a primary payer code of workers' compensation in the hospital database and 40.0 percent of patients injured in work-related motor vehicle crashes identified in the hospital database did not meet our definition of a work-related motor vehicle crash in the motor vehicle crash database. Depending on how occupants injured in work-related motor crashes are identified, we estimate the population to be between 1852 and 8492 in Utah for the years 2006 and 2007. Research on single databases may lead to biased interpretations of work-related motor vehicle crashes. Combining 2 population based databases may still result in an underestimate of the magnitude of work-related motor vehicle crashes. Improved coding of work-related incidents is needed in current databases.
Saokaew, Surasak; Sugimoto, Takashi; Kamae, Isao; Pratoomsoot, Chayanin; Chaiyakunapruk, Nathorn
2015-01-01
Background Health technology assessment (HTA) has been continuously used for value-based healthcare decisions over the last decade. Healthcare databases represent an important source of information for HTA, which has seen a surge in use in Western countries. Although HTA agencies have been established in Asia-Pacific region, application and understanding of healthcare databases for HTA is rather limited. Thus, we reviewed existing databases to assess their potential for HTA in Thailand where HTA has been used officially and Japan where HTA is going to be officially introduced. Method Existing healthcare databases in Thailand and Japan were compiled and reviewed. Databases’ characteristics e.g. name of database, host, scope/objective, time/sample size, design, data collection method, population/sample, and variables were described. Databases were assessed for its potential HTA use in terms of safety/efficacy/effectiveness, social/ethical, organization/professional, economic, and epidemiological domains. Request route for each database was also provided. Results Forty databases– 20 from Thailand and 20 from Japan—were included. These comprised of national censuses, surveys, registries, administrative data, and claimed databases. All databases were potentially used for epidemiological studies. In addition, data on mortality, morbidity, disability, adverse events, quality of life, service/technology utilization, length of stay, and economics were also found in some databases. However, access to patient-level data was limited since information about the databases was not available on public sources. Conclusion Our findings have shown that existing databases provided valuable information for HTA research with limitation on accessibility. Mutual dialogue on healthcare database development and usage for HTA among Asia-Pacific region is needed. PMID:26560127
ERIC Educational Resources Information Center
CEDEFOP Flash, 1993
1993-01-01
During 1992, CEDEFOP (the European Centre for the Development of Vocational Training) commissioned two projects to investigate the current situation with regard to databases on vocational qualifications in Member States of the European Community (EC) and possibilities for networking such databases. Results of these two studies were presented and…
NCBI2RDF: enabling full RDF-based access to NCBI databases.
Anguita, Alberto; García-Remesal, Miguel; de la Iglesia, Diana; Maojo, Victor
2013-01-01
RDF has become the standard technology for enabling interoperability among heterogeneous biomedical databases. The NCBI provides access to a large set of life sciences databases through a common interface called Entrez. However, the latter does not provide RDF-based access to such databases, and, therefore, they cannot be integrated with other RDF-compliant databases and accessed via SPARQL query interfaces. This paper presents the NCBI2RDF system, aimed at providing RDF-based access to the complete NCBI data repository. This API creates a virtual endpoint for servicing SPARQL queries over different NCBI repositories and presenting to users the query results in SPARQL results format, thus enabling this data to be integrated and/or stored with other RDF-compliant repositories. SPARQL queries are dynamically resolved, decomposed, and forwarded to the NCBI-provided E-utilities programmatic interface to access the NCBI data. Furthermore, we show how our approach increases the expressiveness of the native NCBI querying system, allowing several databases to be accessed simultaneously. This feature significantly boosts productivity when working with complex queries and saves time and effort to biomedical researchers. Our approach has been validated with a large number of SPARQL queries, thus proving its reliability and enhanced capabilities in biomedical environments.
Bare, Jane; Gloria, Thomas; Norris, Gregory
2006-08-15
Normalization is an optional step within Life Cycle Impact Assessment (LCIA) that may be used to assist in the interpretation of life cycle inventory data as well as life cycle impact assessment results. Normalization transforms the magnitude of LCI and LCIA results into relative contribution by substance and life cycle impact category. Normalization thus can significantly influence LCA-based decisions when tradeoffs exist. The U. S. Environmental Protection Agency (EPA) has developed a normalization database based on the spatial scale of the 48 continental U.S. states, Hawaii, Alaska, the District of Columbia, and Puerto Rico with a one-year reference time frame. Data within the normalization database were compiled based on the impact methodologies and lists of stressors used in TRACI-the EPA's Tool for the Reduction and Assessment of Chemical and other environmental Impacts. The new normalization database published within this article may be used for LCIA case studies within the United States, and can be used to assist in the further development of a global normalization database. The underlying data analyzed for the development of this database are included to allow the development of normalization data consistent with other impact assessment methodologies as well.
Menditto, Enrica; Bolufer De Gea, Angela; Cahir, Caitriona; Marengoni, Alessandra; Riegler, Salvatore; Fico, Giuseppe; Costa, Elisio; Monaco, Alessandro; Pecorelli, Sergio; Pani, Luca; Prados-Torres, Alexandra
2016-01-01
Computerized health care databases have been widely described as an excellent opportunity for research. The availability of "big data" has brought about a wave of innovation in projects when conducting health services research. Most of the available secondary data sources are restricted to the geographical scope of a given country and present heterogeneous structure and content. Under the umbrella of the European Innovation Partnership on Active and Healthy Ageing, collaborative work conducted by the partners of the group on "adherence to prescription and medical plans" identified the use of observational and large-population databases to monitor medication-taking behavior in the elderly. This article describes the methodology used to gather the information from available databases among the Adherence Action Group partners with the aim of improving data sharing on a European level. A total of six databases belonging to three different European countries (Spain, Republic of Ireland, and Italy) were included in the analysis. Preliminary results suggest that there are some similarities. However, these results should be applied in different contexts and European countries, supporting the idea that large European studies should be designed in order to get the most of already available databases.
NASA Astrophysics Data System (ADS)
Bell, E. A.; Boehnke, P.; Harrison, M.; Mao, W. L.
2015-12-01
Because the terrestrial rock record extends only to ~4 Ga and older materials thus far identified are limited to detrital zircons, information about volatile abundances and cycles on early Earth is limited. Carbon, for instance, plays an important role not only in the modern biosphere but also in deep recycling of materials between the crust and mantle. We are investigating the record of carbon abundance and origin in Hadean zircons from Jack Hills (W. Australia) using two main approaches. First, carbon may partition into the zircon structure at trace levels during crystallization from a magma, and better understanding of this partitioning behavior will allow for zircon's use as a monitor of magmatic carbon contents. We have measured carbon abundances in zircon from a variety of igneous rocks (gabbro; I-, A-, and S-type granitoids) via SIMS and found that although abundances are typically low (average raw 12C/30Si ~ 1x10-6), S-type granite zircons can reach a factor of 1000 over this background. Around 10% of Hadean zircons investigated show similar enrichments, consistent with other evidence for the derivation of many Jack Hills zircons from S-type granitoids and with the establishment of modern-level carbon abundances in the crust by ca. 4.2 Ga. Diamond and graphite inclusions reported in the Jack Hills zircons by previous studies proved to be contamination by polishing debris, leaving the true abundance of these materials in the population uncertain. On a second front, we have identified and investigated primary carbonaceous inclusions in these zircons. From a population of over 10,000 Jack Hills zircons, we identified one concordant 4.10±0.01 Ga zircon that contains primary graphite inclusions (so interpreted due to their enclosure in a crack-free zircon host as shown by transmission X-ray microscopy and their crystal habit). Their δ13CPDB of -24±5‰ is consistent with a biogenic origin and, in the absence of a likely inorganic mechanism to produce such a signal in a felsic igneous setting, may be evidence that a terrestrial biosphere had emerged by 4.1 Ga, or ~300 Ma earlier than has been previously proposed.
NASA Astrophysics Data System (ADS)
Zhang, Huiting; Liu, Yongsheng; Hu, Zhaochu; Zong, Keqing; Chen, Haihong; Chen, Chunfei
2017-08-01
Three types of carbonates have been found in the Miocene basalt in the Dongbahao area (Inner Mongolia), including wide veins and veinlets of carbonate in basalt and carbonates in peridotite xenoliths. Except for the dolomitic zonation in the basalt, all of the carbonates are calcite. Despite their different appearances, they share almost identical geochemical characteristics of low LILE (low large ion lithophile element), HFSE (high field strength element), and REE (rare earth elements) contents (ΣREE = 0.51-137 ppm); negative Ce anomalies; and low Ce/Pb ratios (0.51-74.5). Moreover, they show high δ18OSMOW values (20.95-22.61‰) and 87Sr/86Sr ratios (0.7087 ± 0.0003 (1σ, n = 17)). These characteristics indicate a sedimentary precursor for these carbonates. However, the occurrence and petrographic characteristics imply an igneous origin for the carbonates rather than a hypergene process. Further, the trace element compositions of the silicate melt and carbonate melt in the calcite-dolomite-silicate zonations fall on the same variation lines in the plots of Y-Ho, La-Yb, Li-Pb and Ba-Cu. It is suggested that these melts could have evolved from one magma system or could have been equilibrated. Given the partition coefficients of REEs and alkali elements (Cs, Rb, and K) between the carbonate melt and silicate melt, it can be inferred that these melts could have been formed from a primary H2O-Si-bearing Mg-Ca-carbonate melt by an immiscibility process at 1-3 GPa. Considering the southward subduction of the Paleo-Asian ocean along the northern margin of the North China Craton (NCC), these carbonate melts could have been derived from the melting of subducted sedimentary carbonate rocks. Interestingly, these carbonates have quite depleted carbon isotopic compositions (δ13CPDB = -8.23‰ to -11.76‰) but moderate δ18OSMOW values, implying coupled H2O-CO2 degassing during subduction and/or recycling to the Earth's surface. Low-δ13CPBD carbonates appearing at the global scale may suggest an underestimated path of CO2 emission back to the atmosphere.
NASA Astrophysics Data System (ADS)
da Silva Nogueira de Matos, José Henrique; Saraiva dos Santos, Ticiano José; Virgínia Soares Monteiro, Lena
2017-12-01
The Pedra Verde Copper Mine is located in the Viçosa do Ceará municipality, State of Ceará, NE Brazil. The copper mineralization is hosted by the Pedra Verde Phyllite, which is a carbonaceous chlorite-calcite phyllite with subordinate biotite. It belongs to the Neoproterozoic Martinópole Group of the Médio Coreaú Domain, Borborema Province. The Pedra Verde deposit is stratabound and its ore zoning is conspicuous, according to the following sequence, from bottom to top: marcasite/pyrite, native silver, chalcopyrite, bornite, chalcocite, native copper and hematite. Barite and carbonaceous material are reported in ore zones. Zoning reflects the ore formation within a redox boundary developed due to the interaction between oxidized copper- and sulfate-bearing fluids and the reduced phyllite. Structural control on mineralization is evidenced by the association of the ore minerals with veins, hinge folds, shadow pressures, and mylonitic foliation. It was mainly exercised by a dextral transcurrent shear zone developed during the third deformational stage identified in the Médio Coreaú Domain between 590 Ma and 570 Ma. This points to the importance of epigenetic, post-metamorphic deformational events for ore formation. Oxygen isotopic composition (δ18OH2O = 8.94 to 11.28‰, at 250 to 300 °C) estimated for the hydrothermal fluids in equilibrium with calcite indicates metamorphic or evolved meteoric isotopic signatures. The δ13CPDB values (-2.60 to -9.25‰) obtained for hydrothermal calcite indicate mixing of carbon sources derived from marine carbonate rocks and carbonaceous material. The δ34SCDT values (14.88 to 36.91‰) of sulfides suggest evaporites as sulfate sources or a closed system in relation to SO42- availability to form H2S. Carbonaceous matter had a key role in thermochemical sulfate processes and sulfide precipitation. The Pedra Verde Copper Mine is considered the first stratabound meta-sedimentary rock-hosted copper deposit described in Brazil and shares similarities with the syn-orogenic copper deposits of the Congo-Zambian Copperbelt formed during the Gondwana amalgamation.
NASA Astrophysics Data System (ADS)
Bristow, Thomas F.; Kennedy, Martin J.; Morrison, Keith D.; Mrofka, David D.
2012-08-01
The mineralogical, compositional and stable isotopic variability of lacustrine carbonates are frequently used as proxies for ancient paleoenvironmental change in continental settings, under the assumption that precipitated carbonates reflect conditions and chemistry of ancient lake waters. In some saline and alkaline lake systems, however, authigenic clay minerals, forming at or near the sediment water interface, are a major sedimentary component. Often these clays are rich in Mg, influencing the geochemical budget of lake waters, and are therefore expected to influence the properties of contemporaneous authigenic carbonate precipitates (which may also contain Mg). This paper documents evidence for a systematic feedback between clay mineral and carbonate authigenesis through multiple precessionally driven, m-scale sedimentary cycles in lacustrine oil-shale deposits of the Eocene Green River Formation from the Uinta Basin (NE Utah). In the studied section, authigenic, Mg-rich, trioctahedral smectite content varies cyclically between 9 and 39 wt.%. The highest concentrations occur in oil-shales and calcareous mudstones deposited during high lake level intervals that favored sedimentary condensation, lengthening the time available for clay diagenesis and reducing dilution by other siliciclastic phases. An inverse relation between dolomite percentage of carbonate and trioctahedral smectite abundance suggests the Mg uptake during clay authigenesis provides a first order control on carbonate mineralogy that better explains carbonate mineralogical trends than the possible alternative controls of (1) variable Mg/Ca ratios in lake water and (2) degree of microbial activity in sediments. We also observe that cyclical change in carbonate mineralogy, believed to be induced by clay authigenesis, also causes isotopic covariation between δ13CPDB and δ18OPDB of bulk sediments because of differences in the equilibrium fractionation factors of dolomite and calcite (˜2‰ and ˜2.6%, respectively). This provides an alternative mechanism for the common pattern of isotopic covariation, which is typically attributed to the effect of simultaneous changes in water balance and biological activity on the carbon and oxygen isotopic composition of lake waters. These findings may help improve paleoenvironmental reconstructions based on lacustrine carbonate records by adding to the factors known to influence the mineralogical, compositional and stable isotopic signals recorded by lacustrine carbonates.
Informatics in radiology: use of CouchDB for document-based storage of DICOM objects.
Rascovsky, Simón J; Delgado, Jorge A; Sanz, Alexander; Calvo, Víctor D; Castrillón, Gabriel
2012-01-01
Picture archiving and communication systems traditionally have depended on schema-based Structured Query Language (SQL) databases for imaging data management. To optimize database size and performance, many such systems store a reduced set of Digital Imaging and Communications in Medicine (DICOM) metadata, discarding informational content that might be needed in the future. As an alternative to traditional database systems, document-based key-value stores recently have gained popularity. These systems store documents containing key-value pairs that facilitate data searches without predefined schemas. Document-based key-value stores are especially suited to archive DICOM objects because DICOM metadata are highly heterogeneous collections of tag-value pairs conveying specific information about imaging modalities, acquisition protocols, and vendor-supported postprocessing options. The authors used an open-source document-based database management system (Apache CouchDB) to create and test two such databases; CouchDB was selected for its overall ease of use, capability for managing attachments, and reliance on HTTP and Representational State Transfer standards for accessing and retrieving data. A large database was created first in which the DICOM metadata from 5880 anonymized magnetic resonance imaging studies (1,949,753 images) were loaded by using a Ruby script. To provide the usual DICOM query functionality, several predefined "views" (standard queries) were created by using JavaScript. For performance comparison, the same queries were executed in both the CouchDB database and a SQL-based DICOM archive. The capabilities of CouchDB for attachment management and database replication were separately assessed in tests of a similar, smaller database. Results showed that CouchDB allowed efficient storage and interrogation of all DICOM objects; with the use of information retrieval algorithms such as map-reduce, all the DICOM metadata stored in the large database were searchable with only a minimal increase in retrieval time over that with the traditional database management system. Results also indicated possible uses for document-based databases in data mining applications such as dose monitoring, quality assurance, and protocol optimization. RSNA, 2012
The Danish Testicular Cancer database.
Daugaard, Gedske; Kier, Maria Gry Gundgaard; Bandak, Mikkel; Mortensen, Mette Saksø; Larsson, Heidi; Søgaard, Mette; Toft, Birgitte Groenkaer; Engvad, Birte; Agerbæk, Mads; Holm, Niels Vilstrup; Lauritsen, Jakob
2016-01-01
The nationwide Danish Testicular Cancer database consists of a retrospective research database (DaTeCa database) and a prospective clinical database (Danish Multidisciplinary Cancer Group [DMCG] DaTeCa database). The aim is to improve the quality of care for patients with testicular cancer (TC) in Denmark, that is, by identifying risk factors for relapse, toxicity related to treatment, and focusing on late effects. All Danish male patients with a histologically verified germ cell cancer diagnosis in the Danish Pathology Registry are included in the DaTeCa databases. Data collection has been performed from 1984 to 2007 and from 2013 onward, respectively. The retrospective DaTeCa database contains detailed information with more than 300 variables related to histology, stage, treatment, relapses, pathology, tumor markers, kidney function, lung function, etc. A questionnaire related to late effects has been conducted, which includes questions regarding social relationships, life situation, general health status, family background, diseases, symptoms, use of medication, marital status, psychosocial issues, fertility, and sexuality. TC survivors alive on October 2014 were invited to fill in this questionnaire including 160 validated questions. Collection of questionnaires is still ongoing. A biobank including blood/sputum samples for future genetic analyses has been established. Both samples related to DaTeCa and DMCG DaTeCa database are included. The prospective DMCG DaTeCa database includes variables regarding histology, stage, prognostic group, and treatment. The DMCG DaTeCa database has existed since 2013 and is a young clinical database. It is necessary to extend the data collection in the prospective database in order to answer quality-related questions. Data from the retrospective database will be added to the prospective data. This will result in a large and very comprehensive database for future studies on TC patients.
Building an integrated neurodegenerative disease database at an academic health center.
Xie, Sharon X; Baek, Young; Grossman, Murray; Arnold, Steven E; Karlawish, Jason; Siderowf, Andrew; Hurtig, Howard; Elman, Lauren; McCluskey, Leo; Van Deerlin, Vivianna; Lee, Virginia M-Y; Trojanowski, John Q
2011-07-01
It is becoming increasingly important to study common and distinct etiologies, clinical and pathological features, and mechanisms related to neurodegenerative diseases such as Alzheimer's disease, Parkinson's disease, amyotrophic lateral sclerosis, and frontotemporal lobar degeneration. These comparative studies rely on powerful database tools to quickly generate data sets that match diverse and complementary criteria set by them. In this article, we present a novel integrated neurodegenerative disease (INDD) database, which was developed at the University of Pennsylvania (Penn) with the help of a consortium of Penn investigators. Because the work of these investigators are based on Alzheimer's disease, Parkinson's disease, amyotrophic lateral sclerosis, and frontotemporal lobar degeneration, it allowed us to achieve the goal of developing an INDD database for these major neurodegenerative disorders. We used the Microsoft SQL server as a platform, with built-in "backwards" functionality to provide Access as a frontend client to interface with the database. We used PHP Hypertext Preprocessor to create the "frontend" web interface and then used a master lookup table to integrate individual neurodegenerative disease databases. We also present methods of data entry, database security, database backups, and database audit trails for this INDD database. Using the INDD database, we compared the results of a biomarker study with those using an alternative approach by querying individual databases separately. We have demonstrated that the Penn INDD database has the ability to query multiple database tables from a single console with high accuracy and reliability. The INDD database provides a powerful tool for generating data sets in comparative studies on several neurodegenerative diseases. Copyright © 2011 The Alzheimer's Association. Published by Elsevier Inc. All rights reserved.
Normand, A C; Packeu, A; Cassagne, C; Hendrickx, M; Ranque, S; Piarroux, R
2018-05-01
Conventional dermatophyte identification is based on morphological features. However, recent studies have proposed to use the nucleotide sequences of the rRNA internal transcribed spacer (ITS) region as an identification barcode of all fungi, including dermatophytes. Several nucleotide databases are available to compare sequences and thus identify isolates; however, these databases often contain mislabeled sequences that impair sequence-based identification. We evaluated five of these databases on a clinical isolate panel. We selected 292 clinical dermatophyte strains that were prospectively subjected to an ITS2 nucleotide sequence analysis. Sequences were analyzed against the databases, and the results were compared to clusters obtained via DNA alignment of sequence segments. The DNA tree served as the identification standard throughout the study. According to the ITS2 sequence identification, the majority of strains (255/292) belonged to the genus Trichophyton , mainly T. rubrum complex ( n = 184), T. interdigitale ( n = 40), T. tonsurans ( n = 26), and T. benhamiae ( n = 5). Other genera included Microsporum (e.g., M. canis [ n = 21], M. audouinii [ n = 10], Nannizzia gypsea [ n = 3], and Epidermophyton [ n = 3]). Species-level identification of T. rubrum complex isolates was an issue. Overall, ITS DNA sequencing is a reliable tool to identify dermatophyte species given that a comprehensive and correctly labeled database is consulted. Since many inaccurate identification results exist in the DNA databases used for this study, reference databases must be verified frequently and amended in line with the current revisions of fungal taxonomy. Before describing a new species or adding a new DNA reference to the available databases, its position in the phylogenetic tree must be verified. Copyright © 2018 American Society for Microbiology.
Performance analysis of different database in new internet mapping system
NASA Astrophysics Data System (ADS)
Yao, Xing; Su, Wei; Gao, Shuai
2017-03-01
In the Mapping System of New Internet, Massive mapping entries between AID and RID need to be stored, added, updated, and deleted. In order to better deal with the problem when facing a large number of mapping entries update and query request, the Mapping System of New Internet must use high-performance database. In this paper, we focus on the performance of Redis, SQLite, and MySQL these three typical databases, and the results show that the Mapping System based on different databases can adapt to different needs according to the actual situation.
Data Mining on Distributed Medical Databases: Recent Trends and Future Directions
NASA Astrophysics Data System (ADS)
Atilgan, Yasemin; Dogan, Firat
As computerization in healthcare services increase, the amount of available digital data is growing at an unprecedented rate and as a result healthcare organizations are much more able to store data than to extract knowledge from it. Today the major challenge is to transform these data into useful information and knowledge. It is important for healthcare organizations to use stored data to improve quality while reducing cost. This paper first investigates the data mining applications on centralized medical databases, and how they are used for diagnostic and population health, then introduces distributed databases. The integration needs and issues of distributed medical databases are described. Finally the paper focuses on data mining studies on distributed medical databases.
Description of 'REQUEST-KYUSHYU' for KYUKEICHO regional data base
NASA Astrophysics Data System (ADS)
Takimoto, Shin'ichi
Kyushu Economic Research Association (a foundational juridical person) initiated the regional database services, ' REQUEST-Kyushu ' recently. It is the full scale databases compiled based on the information and know-hows which the Association has accumulated over forty years. It covers the regional information database for journal and newspaper articles, and statistical information database for economic statistics. As to the former database it is searched on a personal computer and then a search result (original text) is sent through a facsimile. As to the latter, it is also searched on a personal computer where the data is processed, edited or downloaded. This paper describes characteristics, content and the system outline of 'REQUEST-Kyushu'.
Database of Novel and Emerging Adsorbent Materials
National Institute of Standards and Technology Data Gateway
SRD 205 NIST/ARPA-E Database of Novel and Emerging Adsorbent Materials (Web, free access) The NIST/ARPA-E Database of Novel and Emerging Adsorbent Materials is a free, web-based catalog of adsorbent materials and measured adsorption properties of numerous materials obtained from article entries from the scientific literature. Search fields for the database include adsorbent material, adsorbate gas, experimental conditions (pressure, temperature), and bibliographic information (author, title, journal), and results from queries are provided as a list of articles matching the search parameters. The database also contains adsorption isotherms digitized from the cataloged articles, which can be compared visually online in the web application or exported for offline analysis.
Glycemic Index Diet: What's Behind the Claims
... choices for people with diabetes. An international GI database is maintained by Sydney University Glycemic Index Research Services in Sydney, Australia. The database contains the results of studies conducted there and ...
Wright, Judy M; Cottrell, David J; Mir, Ghazala
2014-07-01
To determine the optimal databases to search for studies of faith-sensitive interventions for treating depression. We examined 23 health, social science, religious, and grey literature databases searched for an evidence synthesis. Databases were prioritized by yield of (1) search results, (2) potentially relevant references identified during screening, (3) included references contained in the synthesis, and (4) included references that were available in the database. We assessed the impact of databases beyond MEDLINE, EMBASE, and PsycINFO by their ability to supply studies identifying new themes and issues. We identified pragmatic workload factors that influence database selection. PsycINFO was the best performing database within all priority lists. ArabPsyNet, CINAHL, Dissertations and Theses, EMBASE, Global Health, Health Management Information Consortium, MEDLINE, PsycINFO, and Sociological Abstracts were essential for our searches to retrieve the included references. Citation tracking activities and the personal library of one of the research teams made significant contributions of unique, relevant references. Religion studies databases (Am Theo Lib Assoc, FRANCIS) did not provide unique, relevant references. Literature searches for reviews and evidence syntheses of religion and health studies should include social science, grey literature, non-Western databases, personal libraries, and citation tracking activities. Copyright © 2014 Elsevier Inc. All rights reserved.
The STEP database through the end-users eyes--USABILITY STUDY.
Salunke, Smita; Tuleu, Catherine
2015-08-15
The user-designed database of Safety and Toxicity of Excipients for Paediatrics ("STEP") is created to address the shared need of drug development community to access the relevant information of excipients effortlessly. Usability testing was performed to validate if the database satisfies the need of the end-users. Evaluation framework was developed to assess the usability. The participants performed scenario based tasks and provided feedback and post-session usability ratings. Failure Mode Effect Analysis (FMEA) was performed to prioritize the problems and improvements to the STEP database design and functionalities. The study revealed several design vulnerabilities. Tasks such as limiting the results, running complex queries, location of data and registering to access the database were challenging. The three critical attributes identified to have impact on the usability of the STEP database included (1) content and presentation (2) the navigation and search features (3) potential end-users. Evaluation framework proved to be an effective method for evaluating database effectiveness and user satisfaction. This study provides strong initial support for the usability of the STEP database. Recommendations would be incorporated into the refinement of the database to improve its usability and increase user participation towards the advancement of the database. Copyright © 2015 Elsevier B.V. All rights reserved.
Evaluation of Federated Searching Options for the School Library
ERIC Educational Resources Information Center
Abercrombie, Sarah E.
2008-01-01
Three hosted federated search tools, Follett One Search, Gale PowerSearch Plus, and WebFeat Express, were configured and implemented in a school library. Databases from five vendors and the OPAC were systematically searched. Federated search results were compared with each other and to the results of the same searches in the database's native…
Human Variome Project Quality Assessment Criteria for Variation Databases.
Vihinen, Mauno; Hancock, John M; Maglott, Donna R; Landrum, Melissa J; Schaafsma, Gerard C P; Taschner, Peter
2016-06-01
Numerous databases containing information about DNA, RNA, and protein variations are available. Gene-specific variant databases (locus-specific variation databases, LSDBs) are typically curated and maintained for single genes or groups of genes for a certain disease(s). These databases are widely considered as the most reliable information source for a particular gene/protein/disease, but it should also be made clear they may have widely varying contents, infrastructure, and quality. Quality is very important to evaluate because these databases may affect health decision-making, research, and clinical practice. The Human Variome Project (HVP) established a Working Group for Variant Database Quality Assessment. The basic principle was to develop a simple system that nevertheless provides a good overview of the quality of a database. The HVP quality evaluation criteria that resulted are divided into four main components: data quality, technical quality, accessibility, and timeliness. This report elaborates on the developed quality criteria and how implementation of the quality scheme can be achieved. Examples are provided for the current status of the quality items in two different databases, BTKbase, an LSDB, and ClinVar, a central archive of submissions about variants and their clinical significance. © 2016 WILEY PERIODICALS, INC.
On patterns and re-use in bioinformatics databases
Bell, Michael J.; Lord, Phillip
2017-01-01
Abstract Motivation: As the quantity of data being depositing into biological databases continues to increase, it becomes ever more vital to develop methods that enable us to understand this data and ensure that the knowledge is correct. It is widely-held that data percolates between different databases, which causes particular concerns for data correctness; if this percolation occurs, incorrect data in one database may eventually affect many others while, conversely, corrections in one database may fail to percolate to others. In this paper, we test this widely-held belief by directly looking for sentence reuse both within and between databases. Further, we investigate patterns of how sentences are reused over time. Finally, we consider the limitations of this form of analysis and the implications that this may have for bioinformatics database design. Results: We show that reuse of annotation is common within many different databases, and that also there is a detectable level of reuse between databases. In addition, we show that there are patterns of reuse that have previously been shown to be associated with percolation errors. Availability and implementation: Analytical software is available on request. Contact: phillip.lord@newcastle.ac.uk PMID:28525546
Information Retrieval in Telemedicine: a Comparative Study on Bibliographic Databases.
Ahmadi, Maryam; Sarabi, Roghayeh Ershad; Orak, Roohangiz Jamshidi; Bahaadinbeigy, Kambiz
2015-06-01
The first step in each systematic review is selection of the most valid database that can provide the highest number of relevant references. This study was carried out to determine the most suitable database for information retrieval in telemedicine field. Cinhal, PubMed, Web of Science and Scopus databases were searched for telemedicine matched with Education, cost benefit and patient satisfaction. After analysis of the obtained results, the accuracy coefficient, sensitivity, uniqueness and overlap of databases were calculated. The studied databases differed in the number of retrieved articles. PubMed was identified as the most suitable database for retrieving information on the selected topics with the accuracy and sensitivity ratios of 50.7% and 61.4% respectively. The uniqueness percent of retrieved articles ranged from 38% for Pubmed to 3.0% for Cinhal. The highest overlap rate (18.6%) was found between PubMed and Web of Science. Less than 1% of articles have been indexed in all searched databases. PubMed is suggested as the most suitable database for starting search in telemedicine and after PubMed, Scopus and Web of Science can retrieve about 90% of the relevant articles.
Evaluating the Cassandra NoSQL Database Approach for Genomic Data Persistency.
Aniceto, Rodrigo; Xavier, Rene; Guimarães, Valeria; Hondo, Fernanda; Holanda, Maristela; Walter, Maria Emilia; Lifschitz, Sérgio
2015-01-01
Rapid advances in high-throughput sequencing techniques have created interesting computational challenges in bioinformatics. One of them refers to management of massive amounts of data generated by automatic sequencers. We need to deal with the persistency of genomic data, particularly storing and analyzing these large-scale processed data. To find an alternative to the frequently considered relational database model becomes a compelling task. Other data models may be more effective when dealing with a very large amount of nonconventional data, especially for writing and retrieving operations. In this paper, we discuss the Cassandra NoSQL database approach for storing genomic data. We perform an analysis of persistency and I/O operations with real data, using the Cassandra database system. We also compare the results obtained with a classical relational database system and another NoSQL database approach, MongoDB.
Sagace: A web-based search engine for biomedical databases in Japan
2012-01-01
Background In the big data era, biomedical research continues to generate a large amount of data, and the generated information is often stored in a database and made publicly available. Although combining data from multiple databases should accelerate further studies, the current number of life sciences databases is too large to grasp features and contents of each database. Findings We have developed Sagace, a web-based search engine that enables users to retrieve information from a range of biological databases (such as gene expression profiles and proteomics data) and biological resource banks (such as mouse models of disease and cell lines). With Sagace, users can search more than 300 databases in Japan. Sagace offers features tailored to biomedical research, including manually tuned ranking, a faceted navigation to refine search results, and rich snippets constructed with retrieved metadata for each database entry. Conclusions Sagace will be valuable for experts who are involved in biomedical research and drug development in both academia and industry. Sagace is freely available at http://sagace.nibio.go.jp/en/. PMID:23110816
Ecker, David J; Sampath, Rangarajan; Willett, Paul; Wyatt, Jacqueline R; Samant, Vivek; Massire, Christian; Hall, Thomas A; Hari, Kumar; McNeil, John A; Büchen-Osmond, Cornelia; Budowle, Bruce
2005-01-01
Background Thousands of different microorganisms affect the health, safety, and economic stability of populations. Many different medical and governmental organizations have created lists of the pathogenic microorganisms relevant to their missions; however, the nomenclature for biological agents on these lists and pathogens described in the literature is inexact. This ambiguity can be a significant block to effective communication among the diverse communities that must deal with epidemics or bioterrorist attacks. Results We have developed a database known as the Microbial Rosetta Stone. The database relates microorganism names, taxonomic classifications, diseases, specific detection and treatment protocols, and relevant literature. The database structure facilitates linkage to public genomic databases. This paper focuses on the information in the database for pathogens that impact global public health, emerging infectious organisms, and bioterrorist threat agents. Conclusion The Microbial Rosetta Stone is available at . The database provides public access to up-to-date taxonomic classifications of organisms that cause human diseases, improves the consistency of nomenclature in disease reporting, and provides useful links between different public genomic and public health databases. PMID:15850481
Marsh, Erin; Anderson, Eric D.
2015-01-01
Three ore deposits databases from previous studies were evaluated and combined with new known mineral occurrences into one database, which can now be used to manage information about the known mineral occurrences of Mauritania. The Microsoft Access 2010 database opens with the list of tables and forms held within the database and a Switchboard control panel from which to easily navigate through the existing mineral deposit data and to enter data for new deposit locations. The database is a helpful tool for the organization of the basic information about the mineral occurrences of Mauritania. It is suggested the database be administered by a single operator in order to avoid data overlap and override that can result from shared real time data entry. It is proposed that the mineral occurrence database be used in concert with the geologic maps, geophysics and geochemistry datasets, as a publically advertised interface for the abundant geospatial information that the Mauritanian government can provide to interested parties.
Establishment and Assessment of Plasma Disruption and Warning Databases from EAST
NASA Astrophysics Data System (ADS)
Wang, Bo; Robert, Granetz; Xiao, Bingjia; Li, Jiangang; Yang, Fei; Li, Junjun; Chen, Dalong
2016-12-01
Disruption database and disruption warning database of the EAST tokamak had been established by a disruption research group. The disruption database, based on Structured Query Language (SQL), comprises 41 disruption parameters, which include current quench characteristics, EFIT equilibrium characteristics, kinetic parameters, halo currents, and vertical motion. Presently most disruption databases are based on plasma experiments of non-superconducting tokamak devices. The purposes of the EAST database are to find disruption characteristics and disruption statistics to the fully superconducting tokamak EAST, to elucidate the physics underlying tokamak disruptions, to explore the influence of disruption on superconducting magnets and to extrapolate toward future burning plasma devices. In order to quantitatively assess the usefulness of various plasma parameters for predicting disruptions, a similar SQL database to Alcator C-Mod for EAST has been created by compiling values for a number of proposed disruption-relevant parameters sampled from all plasma discharges in the 2015 campaign. The detailed statistic results and analysis of two databases on the EAST tokamak are presented. supported by the National Magnetic Confinement Fusion Science Program of China (No. 2014GB103000)
The role of insurance claims databases in drug therapy outcomes research.
Lewis, N J; Patwell, J T; Briesacher, B A
1993-11-01
The use of insurance claims databases in drug therapy outcomes research holds great promise as a cost-effective alternative to post-marketing clinical trials. Claims databases uniquely capture information about episodes of care across healthcare services and settings. They also facilitate the examination of drug therapy effects on cohorts of patients and specific patient subpopulations. However, there are limitations to the use of insurance claims databases including incomplete diagnostic and provider identification data. The characteristics of the population included in the insurance plan, the plan benefit design, and the variables of the database itself can influence the research results. Given the current concerns regarding the completeness of insurance claims databases, and the validity of their data, outcomes research usually requires original data to validate claims data or to obtain additional information. Improvements to claims databases such as standardisation of claims information reporting, addition of pertinent clinical and economic variables, and inclusion of information relative to patient severity of illness, quality of life, and satisfaction with provided care will enhance the benefit of such databases for outcomes research.
Extension of the COG and arCOG databases by amino acid and nucleotide sequences
Meereis, Florian; Kaufmann, Michael
2008-01-01
Background The current versions of the COG and arCOG databases, both excellent frameworks for studies in comparative and functional genomics, do not contain the nucleotide sequences corresponding to their protein or protein domain entries. Results Using sequence information obtained from GenBank flat files covering the completely sequenced genomes of the COG and arCOG databases, we constructed NUCOCOG (nucleotide sequences containing COG databases) as an extended version including all nucleotide sequences and in addition the amino acid sequences originally utilized to construct the current COG and arCOG databases. We make available three comprehensive single XML files containing the complete databases including all sequence information. In addition, we provide a web interface as a utility suitable to browse the NUCOCOG database for sequence retrieval. The database is accessible at . Conclusion NUCOCOG offers the possibility to analyze any sequence related property in the context of the COG and arCOG framework simply by using script languages such as PERL applied to a large but single XML document. PMID:19014535
... Terms and Conditions Disclaimer ClinicalTrials.gov is a database of privately and publicly funded clinical studies conducted ... world. ClinicalTrials.gov is a registry and results database of publicly and privately supported clinical studies of ...
Vivar, Juan C; Pemu, Priscilla; McPherson, Ruth; Ghosh, Sujoy
2013-08-01
Abstract Unparalleled technological advances have fueled an explosive growth in the scope and scale of biological data and have propelled life sciences into the realm of "Big Data" that cannot be managed or analyzed by conventional approaches. Big Data in the life sciences are driven primarily via a diverse collection of 'omics'-based technologies, including genomics, proteomics, metabolomics, transcriptomics, metagenomics, and lipidomics. Gene-set enrichment analysis is a powerful approach for interrogating large 'omics' datasets, leading to the identification of biological mechanisms associated with observed outcomes. While several factors influence the results from such analysis, the impact from the contents of pathway databases is often under-appreciated. Pathway databases often contain variously named pathways that overlap with one another to varying degrees. Ignoring such redundancies during pathway analysis can lead to the designation of several pathways as being significant due to high content-similarity, rather than truly independent biological mechanisms. Statistically, such dependencies also result in correlated p values and overdispersion, leading to biased results. We investigated the level of redundancies in multiple pathway databases and observed large discrepancies in the nature and extent of pathway overlap. This prompted us to develop the application, ReCiPa (Redundancy Control in Pathway Databases), to control redundancies in pathway databases based on user-defined thresholds. Analysis of genomic and genetic datasets, using ReCiPa-generated overlap-controlled versions of KEGG and Reactome pathways, led to a reduction in redundancy among the top-scoring gene-sets and allowed for the inclusion of additional gene-sets representing possibly novel biological mechanisms. Using obesity as an example, bioinformatic analysis further demonstrated that gene-sets identified from overlap-controlled pathway databases show stronger evidence of prior association to obesity compared to pathways identified from the original databases.
Techniques of Photometry and Astrometry with APASS, Gaia, and Pan-STARRs Results (Abstract)
NASA Astrophysics Data System (ADS)
Green, W.
2017-12-01
(Abstract only) The databases with the APASS DR9, Gaia DR1, and the Pan-STARRs 3pi DR1 data releases are publicly available for use. There is a bit of data-mining involved to download and manage these reference stars. This paper discusses the use of these databases to acquire accurate photometric references as well as techniques for improving results. Images are prepared in the usual way: zero, dark, flat-fields, and WCS solutions with Astrometry.net. Images are then processed with Sextractor to produce an ASCII table of identifying photometric features. The database manages photometics catalogs and images converted to ASCII tables. Scripts convert the files into SQL and assimilate them into database tables. Using SQL techniques, each image star is merged with reference data to produce publishable results. The VYSOS has over 13,000 images of the ONC5 field to process with roughly 100 total fields in the campaign. This paper provides the overview for this daunting task.
Administrative database research has unique characteristics that can risk biased results.
van Walraven, Carl; Austin, Peter
2012-02-01
The provision of health care frequently creates digitized data--such as physician service claims, medication prescription records, and hospitalization abstracts--that can be used to conduct studies termed "administrative database research." While most guidelines for assessing the validity of observational studies apply to administrative database research, the unique data source and analytical opportunities for these studies create risks that can make them uninterpretable or bias their results. Nonsystematic review. The risks of uninterpretable or biased results can be minimized by; providing a robust description of the data tables used, focusing on both why and how they were created; measuring and reporting the accuracy of diagnostic and procedural codes used; distinguishing between clinical significance and statistical significance; properly accounting for any time-dependent nature of variables; and analyzing clustered data properly to explore its influence on study outcomes. This article reviewed these five issues as they pertain to administrative database research to help maximize the utility of these studies for both readers and writers. Copyright © 2012 Elsevier Inc. All rights reserved.
An X-Ray Analysis Database of Photoionization Cross Sections Including Variable Ionization
NASA Technical Reports Server (NTRS)
Wang, Ping; Cohen, David H.; MacFarlane, Joseph J.; Cassinelli, Joseph P.
1997-01-01
Results of research efforts in the following areas are discussed: review of the major theoretical and experimental data of subshell photoionization cross sections and ionization edges of atomic ions to assess the accuracy of the data, and to compile the most reliable of these data in our own database; detailed atomic physics calculations to complement the database for all ions of 17 cosmically abundant elements; reconciling the data from various sources and our own calculations; and fitting cross sections with functional approximations and incorporating these functions into a compact computer code.Also, efforts included adapting an ionization equilibrium code, tabulating results, and incorporating them into the overall program and testing the code (both ionization equilibrium and opacity codes) with existing observational data. The background and scientific applications of this work are discussed. Atomic physics cross section models and calculations are described. Calculation results are compared with available experimental data and other theoretical data. The functional approximations used for fitting cross sections are outlined and applications of the database are discussed.
Exploring Antarctic Land Surface Temperature Extremes Using Condensed Anomaly Databases
NASA Astrophysics Data System (ADS)
Grant, Glenn Edwin
Satellite observations have revolutionized the Earth Sciences and climate studies. However, data and imagery continue to accumulate at an accelerating rate, and efficient tools for data discovery, analysis, and quality checking lag behind. In particular, studies of long-term, continental-scale processes at high spatiotemporal resolutions are especially problematic. The traditional technique of downloading an entire dataset and using customized analysis code is often impractical or consumes too many resources. The Condensate Database Project was envisioned as an alternative method for data exploration and quality checking. The project's premise was that much of the data in any satellite dataset is unneeded and can be eliminated, compacting massive datasets into more manageable sizes. Dataset sizes are further reduced by retaining only anomalous data of high interest. Hosting the resulting "condensed" datasets in high-speed databases enables immediate availability for queries and exploration. Proof of the project's success relied on demonstrating that the anomaly database methods can enhance and accelerate scientific investigations. The hypothesis of this dissertation is that the condensed datasets are effective tools for exploring many scientific questions, spurring further investigations and revealing important information that might otherwise remain undetected. This dissertation uses condensed databases containing 17 years of Antarctic land surface temperature anomalies as its primary data. The study demonstrates the utility of the condensate database methods by discovering new information. In particular, the process revealed critical quality problems in the source satellite data. The results are used as the starting point for four case studies, investigating Antarctic temperature extremes, cloud detection errors, and the teleconnections between Antarctic temperature anomalies and climate indices. The results confirm the hypothesis that the condensate databases are a highly useful tool for Earth Science analyses. Moreover, the quality checking capabilities provide an important method for independent evaluation of dataset veracity.
Palm-Vein Classification Based on Principal Orientation Features
Zhou, Yujia; Liu, Yaqin; Feng, Qianjin; Yang, Feng; Huang, Jing; Nie, Yixiao
2014-01-01
Personal recognition using palm–vein patterns has emerged as a promising alternative for human recognition because of its uniqueness, stability, live body identification, flexibility, and difficulty to cheat. With the expanding application of palm–vein pattern recognition, the corresponding growth of the database has resulted in a long response time. To shorten the response time of identification, this paper proposes a simple and useful classification for palm–vein identification based on principal direction features. In the registration process, the Gaussian-Radon transform is adopted to extract the orientation matrix and then compute the principal direction of a palm–vein image based on the orientation matrix. The database can be classified into six bins based on the value of the principal direction. In the identification process, the principal direction of the test sample is first extracted to ascertain the corresponding bin. One-by-one matching with the training samples is then performed in the bin. To improve recognition efficiency while maintaining better recognition accuracy, two neighborhood bins of the corresponding bin are continuously searched to identify the input palm–vein image. Evaluation experiments are conducted on three different databases, namely, PolyU, CASIA, and the database of this study. Experimental results show that the searching range of one test sample in PolyU, CASIA and our database by the proposed method for palm–vein identification can be reduced to 14.29%, 14.50%, and 14.28%, with retrieval accuracy of 96.67%, 96.00%, and 97.71%, respectively. With 10,000 training samples in the database, the execution time of the identification process by the traditional method is 18.56 s, while that by the proposed approach is 3.16 s. The experimental results confirm that the proposed approach is more efficient than the traditional method, especially for a large database. PMID:25383715
The Porcelain Crab Transcriptome and PCAD, the Porcelain Crab Microarray and Sequence Database
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tagmount, Abderrahmane; Wang, Mei; Lindquist, Erika
2010-01-27
Background: With the emergence of a completed genome sequence of the freshwater crustacean Daphnia pulex, construction of genomic-scale sequence databases for additional crustacean sequences are important for comparative genomics and annotation. Porcelain crabs, genus Petrolisthes, have been powerful crustacean models for environmental and evolutionary physiology with respect to thermal adaptation and understanding responses of marine organisms to climate change. Here, we present a large-scale EST sequencing and cDNA microarray database project for the porcelain crab Petrolisthes cinctipes. Methodology/Principal Findings: A set of ~;;30K unique sequences (UniSeqs) representing ~;;19K clusters were generated from ~;;98K high quality ESTs from a set ofmore » tissue specific non-normalized and mixed-tissue normalized cDNA libraries from the porcelain crab Petrolisthes cinctipes. Homology for each UniSeq was assessed using BLAST, InterProScan, GO and KEGG database searches. Approximately 66percent of the UniSeqs had homology in at least one of the databases. All EST and UniSeq sequences along with annotation results and coordinated cDNA microarray datasets have been made publicly accessible at the Porcelain Crab Array Database (PCAD), a feature-enriched version of the Stanford and Longhorn Array Databases.Conclusions/Significance: The EST project presented here represents the third largest sequencing effort for any crustacean, and the largest effort for any crab species. Our assembly and clustering results suggest that our porcelain crab EST data set is equally diverse to the much larger EST set generated in the Daphnia pulex genome sequencing project, and thus will be an important resource to the Daphnia research community. Our homology results support the pancrustacea hypothesis and suggest that Malacostraca may be ancestral to Branchiopoda and Hexapoda. Our results also suggest that our cDNA microarrays cover as much of the transcriptome as can reasonably be captured in EST library sequencing approaches, and thus represent a rich resource for studies of environmental genomics.« less
CBS Genome Atlas Database: a dynamic storage for bioinformatic results and sequence data.
Hallin, Peter F; Ussery, David W
2004-12-12
Currently, new bacterial genomes are being published on a monthly basis. With the growing amount of genome sequence data, there is a demand for a flexible and easy-to-maintain structure for storing sequence data and results from bioinformatic analysis. More than 150 sequenced bacterial genomes are now available, and comparisons of properties for taxonomically similar organisms are not readily available to many biologists. In addition to the most basic information, such as AT content, chromosome length, tRNA count and rRNA count, a large number of more complex calculations are needed to perform detailed comparative genomics. DNA structural calculations like curvature and stacking energy, DNA compositions like base skews, oligo skews and repeats at the local and global level are just a few of the analysis that are presented on the CBS Genome Atlas Web page. Complex analysis, changing methods and frequent addition of new models are factors that require a dynamic database layout. Using basic tools like the GNU Make system, csh, Perl and MySQL, we have created a flexible database environment for storing and maintaining such results for a collection of complete microbial genomes. Currently, these results counts to more than 220 pieces of information. The backbone of this solution consists of a program package written in Perl, which enables administrators to synchronize and update the database content. The MySQL database has been connected to the CBS web-server via PHP4, to present a dynamic web content for users outside the center. This solution is tightly fitted to existing server infrastructure and the solutions proposed here can perhaps serve as a template for other research groups to solve database issues. A web based user interface which is dynamically linked to the Genome Atlas Database can be accessed via www.cbs.dtu.dk/services/GenomeAtlas/. This paper has a supplemental information page which links to the examples presented: www.cbs.dtu.dk/services/GenomeAtlas/suppl/bioinfdatabase.
Rimland, Joseph M; Abraha, Iosief; Luchetta, Maria Laura; Cozzolino, Francesco; Orso, Massimiliano; Cherubini, Antonio; Dell'Aquila, Giuseppina; Chiatti, Carlos; Ambrosio, Giuseppe; Montedori, Alessandro
2016-06-01
Healthcare databases are useful sources to investigate the epidemiology of chronic obstructive pulmonary disease (COPD), to assess longitudinal outcomes in patients with COPD, and to develop disease management strategies. However, in order to constitute a reliable source for research, healthcare databases need to be validated. The aim of this protocol is to perform the first systematic review of studies reporting the validation of codes related to COPD diagnoses in healthcare databases. MEDLINE, EMBASE, Web of Science and the Cochrane Library databases will be searched using appropriate search strategies. Studies that evaluated the validity of COPD codes (such as the International Classification of Diseases 9th Revision and 10th Revision system; the Real codes system or the International Classification of Primary Care) in healthcare databases will be included. Inclusion criteria will be: (1) the presence of a reference standard case definition for COPD; (2) the presence of at least one test measure (eg, sensitivity, positive predictive values, etc); and (3) the use of a healthcare database (including administrative claims databases, electronic healthcare databases or COPD registries) as a data source. Pairs of reviewers will independently abstract data using standardised forms and will assess quality using a checklist based on the Standards for Reporting of Diagnostic accuracy (STARD) criteria. This systematic review protocol has been produced in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Protocol (PRISMA-P) 2015 statement. Ethics approval is not required. Results of this study will be submitted to a peer-reviewed journal for publication. The results from this systematic review will be used for outcome research on COPD and will serve as a guide to identify appropriate case definitions of COPD, and reference standards, for researchers involved in validating healthcare databases. CRD42015029204. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/
NASA Technical Reports Server (NTRS)
1993-01-01
All the options in the NASA VEGetation Workbench (VEG) make use of a database of historical cover types. This database contains results from experiments by scientists on a wide variety of different cover types. The learning system uses the database to provide positive and negative training examples of classes that enable it to learn distinguishing features between classes of vegetation. All the other VEG options use the database to estimate the error bounds involved in the results obtained when various analysis techniques are applied to the sample of cover type data that is being studied. In the previous version of VEG, the historical cover type database was stored as part of the VEG knowledge base. This database was removed from the knowledge base. It is now stored as a series of flat files that are external to VEG. An interface between VEG and these files was provided. The interface allows the user to select which files of historical data to use. The files are then read, and the data are stored in Knowledge Engineering Environment (KEE) units using the same organization of units as in the previous version of VEG. The interface also allows the user to delete some or all of the historical database units from VEG and load new historical data from a file. This report summarizes the use of the historical cover type database in VEG. It then describes the new interface to the files containing the historical data. It describes minor changes that were made to VEG to enable the externally stored database to be used. Test runs to test the operation of the new interface and also to test the operation of VEG using historical data loaded from external files are described. Task F was completed. A Sun cartridge tape containing the KEE and Common Lisp code for the new interface and the modified version of the VEG knowledge base was delivered to the NASA GSFC technical representative.
Liew, H B; Rosli, M A; Wan Azman, W A; Robaayah, Z; Sim, K H
2008-09-01
The National Cardiovascular Database for Percutaneous Coronary Intervention (NCVD PCI) Registry is the first multicentre interventional cardiology project, involving the main cardiac centres in the country. The ultimate goal of NCVD PCI is to provide a contemporary appraisal of PCI in Malaysia. This article introduces the foundation, the aims, methodology, database collection and preliminary results of the first six-month database.
Zhang, Jiyang; Ma, Jie; Dou, Lei; Wu, Songfeng; Qian, Xiaohong; Xie, Hongwei; Zhu, Yunping; He, Fuchu
2009-02-01
The hybrid linear trap quadrupole Fourier-transform (LTQ-FT) ion cyclotron resonance mass spectrometer, an instrument with high accuracy and resolution, is widely used in the identification and quantification of peptides and proteins. However, time-dependent errors in the system may lead to deterioration of the accuracy of these instruments, negatively influencing the determination of the mass error tolerance (MET) in database searches. Here, a comprehensive discussion of LTQ/FT precursor ion mass error is provided. On the basis of an investigation of the mass error distribution, we propose an improved recalibration formula and introduce a new tool, FTDR (Fourier-transform data recalibration), that employs a graphic user interface (GUI) for automatic calibration. It was found that the calibration could adjust the mass error distribution to more closely approximate a normal distribution and reduce the standard deviation (SD). Consequently, we present a new strategy, LDSF (Large MET database search and small MET filtration), for database search MET specification and validation of database search results. As the name implies, a large-MET database search is conducted and the search results are then filtered using the statistical MET estimated from high-confidence results. By applying this strategy to a standard protein data set and a complex data set, we demonstrate the LDSF can significantly improve the sensitivity of the result validation procedure.
Relational databases for rare disease study: application to vascular anomalies.
Perkins, Jonathan A; Coltrera, Marc D
2008-01-01
To design a relational database integrating clinical and basic science data needed for multidisciplinary treatment and research in the field of vascular anomalies. Based on data points agreed on by the American Society of Pediatric Otolaryngology (ASPO) Vascular Anomalies Task Force. The database design enables sharing of data subsets in a Health Insurance Portability and Accountability Act (HIPAA)-compliant manner for multisite collaborative trials. Vascular anomalies pose diagnostic and therapeutic challenges. Our understanding of these lesions and treatment improvement is limited by nonstandard terminology, severity assessment, and measures of treatment efficacy. The rarity of these lesions places a premium on coordinated studies among multiple participant sites. The relational database design is conceptually centered on subjects having 1 or more lesions. Each anomaly can be tracked individually along with their treatment outcomes. This design allows for differentiation between treatment responses and untreated lesions' natural course. The relational database design eliminates data entry redundancy and results in extremely flexible search and data export functionality. Vascular anomaly programs in the United States. A relational database correlating clinical findings and photographic, radiologic, histologic, and treatment data for vascular anomalies was created for stand-alone and multiuser networked systems. Proof of concept for independent site data gathering and HIPAA-compliant sharing of data subsets was demonstrated. The collaborative effort by the ASPO Vascular Anomalies Task Force to create the database helped define a common vascular anomaly data set. The resulting relational database software is a powerful tool to further the study of vascular anomalies and the development of evidence-based treatment innovation.
BNDB - the Biochemical Network Database.
Küntzer, Jan; Backes, Christina; Blum, Torsten; Gerasch, Andreas; Kaufmann, Michael; Kohlbacher, Oliver; Lenhof, Hans-Peter
2007-10-02
Technological advances in high-throughput techniques and efficient data acquisition methods have resulted in a massive amount of life science data. The data is stored in numerous databases that have been established over the last decades and are essential resources for scientists nowadays. However, the diversity of the databases and the underlying data models make it difficult to combine this information for solving complex problems in systems biology. Currently, researchers typically have to browse several, often highly focused, databases to obtain the required information. Hence, there is a pressing need for more efficient systems for integrating, analyzing, and interpreting these data. The standardization and virtual consolidation of the databases is a major challenge resulting in a unified access to a variety of data sources. We present the Biochemical Network Database (BNDB), a powerful relational database platform, allowing a complete semantic integration of an extensive collection of external databases. BNDB is built upon a comprehensive and extensible object model called BioCore, which is powerful enough to model most known biochemical processes and at the same time easily extensible to be adapted to new biological concepts. Besides a web interface for the search and curation of the data, a Java-based viewer (BiNA) provides a powerful platform-independent visualization and navigation of the data. BiNA uses sophisticated graph layout algorithms for an interactive visualization and navigation of BNDB. BNDB allows a simple, unified access to a variety of external data sources. Its tight integration with the biochemical network library BN++ offers the possibility for import, integration, analysis, and visualization of the data. BNDB is freely accessible at http://www.bndb.org.
BioWarehouse: a bioinformatics database warehouse toolkit
Lee, Thomas J; Pouliot, Yannick; Wagner, Valerie; Gupta, Priyanka; Stringer-Calvert, David WJ; Tenenbaum, Jessica D; Karp, Peter D
2006-01-01
Background This article addresses the problem of interoperation of heterogeneous bioinformatics databases. Results We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and JAVA languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. Conclusion BioWarehouse embodies significant progress on the database integration problem for bioinformatics. PMID:16556315
Al-Nasheri, Ahmed; Muhammad, Ghulam; Alsulaiman, Mansour; Ali, Zulfiqar; Mesallam, Tamer A; Farahat, Mohamed; Malki, Khalid H; Bencherif, Mohamed A
2017-01-01
Automatic voice-pathology detection and classification systems may help clinicians to detect the existence of any voice pathologies and the type of pathology from which patients suffer in the early stages. The main aim of this paper is to investigate Multidimensional Voice Program (MDVP) parameters to automatically detect and classify the voice pathologies in multiple databases, and then to find out which parameters performed well in these two processes. Samples of the sustained vowel /a/ of normal and pathological voices were extracted from three different databases, which have three voice pathologies in common. The selected databases in this study represent three distinct languages: (1) the Arabic voice pathology database; (2) the Massachusetts Eye and Ear Infirmary database (English database); and (3) the Saarbruecken Voice Database (German database). A computerized speech lab program was used to extract MDVP parameters as features, and an acoustical analysis was performed. The Fisher discrimination ratio was applied to rank the parameters. A t test was performed to highlight any significant differences in the means of the normal and pathological samples. The experimental results demonstrate a clear difference in the performance of the MDVP parameters using these databases. The highly ranked parameters also differed from one database to another. The best accuracies were obtained by using the three highest ranked MDVP parameters arranged according to the Fisher discrimination ratio: these accuracies were 99.68%, 88.21%, and 72.53% for the Saarbruecken Voice Database, the Massachusetts Eye and Ear Infirmary database, and the Arabic voice pathology database, respectively. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
Virginia Bridge Information Systems Laboratory.
DOT National Transportation Integrated Search
2014-06-01
This report presents the results of applied data mining of legacy bridge databases, focusing on the Pontis and : National Bridge Inventory databases maintained by the Virginia Department of Transportation (VDOT). Data : analysis was performed using a...
Viereck, Christopher; Boudes, Pol
2009-07-01
We compared the clinical trial transparency practices of US/European pharma by analyzing the publicly-accessible clinical trial results databases of major drugs (doripenem, varenicline, lapatinib, zoledronic acid, adalimumab, insulin glargine, raltegravir, gefitinib). We evaluated their accessibility and utility from the perspective of the lay public. We included databases on company websites, http://www.clinicalstudyresults.org, http://www.clinicaltrials.gov and http://clinicaltrials.ifpma.org. Only 2 of 8 company homepages provide a direct link to the results. While the use of common terms on company search engines led to results for 5 of the 8 drugs following 2-4 clicks, no logical pathway was identified. The number of clinical trials in the databases was inconsistent: 0 for doripenem to 45 for insulin glargine. Results from all phases of clinical development were provided for 2 (insulin glargine and gefitinib) of the 8 drugs. Analyses of phase III reports revealed that most critical elements of the International Conference of Harmonization E3 Structure and Content of Synopses for Clinical Trial Reports were provided for 2 (varenicline, lapatinib) of the 8 drugs. For adalimumab and zoledronic acid, only citations were provided, which the lay public would be unable to access. None of the clinical trial reports was written in lay language. User-friendly support, when provided, was of marginal benefit. Only 1 of the databases (gefitinib) permitted the user to find the most recently updated reports. None of the glossaries included explanations for adverse events or statistical methodology. In conclusion, our study indicates that the public faces significant hurdles in finding and understanding clinical trial results databases.
Yerkes, R.F.; Campbell, Russell H.
1995-01-01
This database, identified as "Preliminary Geologic Map of the Oat Mountain 7.5' Quadrangle, southern California: A Digital Database," has been approved for release and publication by the Director of the USGS. Although this database has been reviewed and is substantially complete, the USGS reserves the right to revise the data pursuant to further analysis and review. This database is released on condition that neither the USGS nor the U. S. Government may be held liable for any damages resulting from its use. This digital map database is compiled from previously published sources combined with some new mapping and modifications in nomenclature. The geologic map database delineates map units that are identified by general age and lithology following the stratigraphic nomenclature of the U. S. Geological Survey. For detailed descriptions of the units, their stratigraphic relations and sources of geologic mapping consult Yerkes and Campbell (1993). More specific information about the units may be available in the original sources.
Database of Mechanical Properties of Textile Composites
NASA Technical Reports Server (NTRS)
Delbrey, Jerry
1996-01-01
This report describes the approach followed to develop a database for mechanical properties of textile composites. The data in this database is assembled from NASA Advanced Composites Technology (ACT) programs and from data in the public domain. This database meets the data documentation requirements of MIL-HDBK-17, Section 8.1.2, which describes in detail the type and amount of information needed to completely document composite material properties. The database focuses on mechanical properties of textile composite. Properties are available for a range of parameters such as direction, fiber architecture, materials, environmental condition, and failure mode. The composite materials in the database contain innovative textile architectures such as the braided, woven, and knitted materials evaluated under the NASA ACT programs. In summary, the database contains results for approximately 3500 coupon level tests, for ten different fiber/resin combinations, and seven different textile architectures. It also includes a limited amount of prepreg tape composites data from ACT programs where side-by-side comparisons were made.
Developing a Nursing Database System in Kenya
Riley, Patricia L; Vindigni, Stephen M; Arudo, John; Waudo, Agnes N; Kamenju, Andrew; Ngoya, Japheth; Oywer, Elizabeth O; Rakuom, Chris P; Salmon, Marla E; Kelley, Maureen; Rogers, Martha; St Louis, Michael E; Marum, Lawrence H
2007-01-01
Objective To describe the development, initial findings, and implications of a national nursing workforce database system in Kenya. Principal Findings Creating a national electronic nursing workforce database provides more reliable information on nurse demographics, migration patterns, and workforce capacity. Data analyses are most useful for human resources for health (HRH) planning when workforce capacity data can be linked to worksite staffing requirements. As a result of establishing this database, the Kenya Ministry of Health has improved capability to assess its nursing workforce and document important workforce trends, such as out-migration. Current data identify the United States as the leading recipient country of Kenyan nurses. The overwhelming majority of Kenyan nurses who elect to out-migrate are among Kenya's most qualified. Conclusions The Kenya nursing database is a first step toward facilitating evidence-based decision making in HRH. This database is unique to developing countries in sub-Saharan Africa. Establishing an electronic workforce database requires long-term investment and sustained support by national and global stakeholders. PMID:17489921
Scale out databases for CERN use cases
NASA Astrophysics Data System (ADS)
Baranowski, Zbigniew; Grzybek, Maciej; Canali, Luca; Lanza Garcia, Daniel; Surdy, Kacper
2015-12-01
Data generation rates are expected to grow very fast for some database workloads going into LHC run 2 and beyond. In particular this is expected for data coming from controls, logging and monitoring systems. Storing, administering and accessing big data sets in a relational database system can quickly become a very hard technical challenge, as the size of the active data set and the number of concurrent users increase. Scale-out database technologies are a rapidly developing set of solutions for deploying and managing very large data warehouses on commodity hardware and with open source software. In this paper we will describe the architecture and tests on database systems based on Hadoop and the Cloudera Impala engine. We will discuss the results of our tests, including tests of data loading and integration with existing data sources and in particular with relational databases. We will report on query performance tests done with various data sets of interest at CERN, notably data from the accelerator log database.
Normative Databases for Imaging Instrumentation.
Realini, Tony; Zangwill, Linda M; Flanagan, John G; Garway-Heath, David; Patella, Vincent M; Johnson, Chris A; Artes, Paul H; Gaddie, Ian B; Fingeret, Murray
2015-08-01
To describe the process by which imaging devices undergo reference database development and regulatory clearance. The limitations and potential improvements of reference (normative) data sets for ophthalmic imaging devices will be discussed. A symposium was held in July 2013 in which a series of speakers discussed issues related to the development of reference databases for imaging devices. Automated imaging has become widely accepted and used in glaucoma management. The ability of such instruments to discriminate healthy from glaucomatous optic nerves, and to detect glaucomatous progression over time is limited by the quality of reference databases associated with the available commercial devices. In the absence of standardized rules governing the development of reference databases, each manufacturer's database differs in size, eligibility criteria, and ethnic make-up, among other key features. The process for development of imaging reference databases may be improved by standardizing eligibility requirements and data collection protocols. Such standardization may also improve the degree to which results may be compared between commercial instruments.
The Brain Database: A Multimedia Neuroscience Database for Research and Teaching
Wertheim, Steven L.
1989-01-01
The Brain Database is an information tool designed to aid in the integration of clinical and research results in neuroanatomy and regional biochemistry. It can handle a wide range of data types including natural images, 2 and 3-dimensional graphics, video, numeric data and text. It is organized around three main entities: structures, substances and processes. The database will support a wide variety of graphical interfaces. Two sample interfaces have been made. This tool is intended to serve as one component of a system that would allow neuroscientists and clinicians 1) to represent clinical and experimental data within a common framework 2) to compare results precisely between experiments and among laboratories, 3) to use computing tools as an aid in collaborative work and 4) to contribute to a shared and accessible body of knowledge about the nervous system.
The Giardia genome project database.
McArthur, A G; Morrison, H G; Nixon, J E; Passamaneck, N Q; Kim, U; Hinkle, G; Crocker, M K; Holder, M E; Farr, R; Reich, C I; Olsen, G E; Aley, S B; Adam, R D; Gillin, F D; Sogin, M L
2000-08-15
The Giardia genome project database provides an online resource for Giardia lamblia (WB strain, clone C6) genome sequence information. The database includes edited single-pass reads, the results of BLASTX searches, and details of progress towards sequencing the entire 12 million-bp Giardia genome. Pre-sorted BLASTX results can be retrieved based on keyword searches and BLAST searches of the high throughput Giardia data can be initiated from the web site or through NCBI. Descriptions of the genomic DNA libraries, project protocols and summary statistics are also available. Although the Giardia genome project is ongoing, new sequences are made available on a bi-monthly basis to ensure that researchers have access to information that may assist them in the search for genes and their biological function. The current URL of the Giardia genome project database is www.mbl.edu/Giardia.
Jeong, Sohyun; Han, Nayoung; Choi, Boyoon; Sohn, Minji; Song, Yun-Kyoung; Chung, Myeon-Woo; Na, Han-Sung; Ji, Eunhee; Kim, Hyunah; Rhew, Ki Yon; Kim, Therasa; Kim, In-Wha; Oh, Jung Mi
2016-06-01
To construct a database of published clinical drug trials suitable for use 1) as a research tool in accessing clinical trial information and 2) in evidence-based decision-making by regulatory professionals, clinical research investigators, and medical practitioners. Comprehensive information obtained from a search of design elements and results of clinical trials in peer reviewed journals using PubMed (http://www.ncbi.nlm.ih.gov/pubmed). The methodology to develop a structured database was devised by a panel composed of experts in medical, pharmaceutical, information technology, and members of Ministry of Food and Drug Safety (MFDS) using a step by step approach. A double-sided system consisting of user mode and manager mode served as the framework for the database; elements of interest from each trial were entered via secure manager mode enabling the input information to be accessed in a user-friendly manner (user mode). Information regarding methodology used and results of drug treatment were extracted as detail elements of each data set and then inputted into the web-based database system. Comprehensive information comprising 2,326 clinical trial records, 90 disease states, and 939 drugs entities and concerning study objectives, background, methods used, results, and conclusion could be extracted from published information on phase II/III drug intervention clinical trials appearing in SCI journals within the last 10 years. The extracted data was successfully assembled into a clinical drug trial database with easy access suitable for use as a research tool. The clinically most important therapeutic categories, i.e., cancer, cardiovascular, respiratory, neurological, metabolic, urogenital, gastrointestinal, psychological, and infectious diseases were covered by the database. Names of test and control drugs, details on primary and secondary outcomes and indexed keywords could also be retrieved and built into the database. The construction used in the database enables the user to sort and download targeted information as a Microsoft Excel spreadsheet. Because of the comprehensive and standardized nature of the clinical drug trial database and its ease of access it should serve as valuable information repository and research tool for accessing clinical trial information and making evidence-based decisions by regulatory professionals, clinical research investigators, and medical practitioners.
Carey, George B; Kazantsev, Stephanie; Surati, Mosmi; Rolle, Cleo E; Kanteti, Archana; Sadiq, Ahad; Bahroos, Neil; Raumann, Brigitte; Madduri, Ravi; Dave, Paul; Starkey, Adam; Hensing, Thomas; Husain, Aliya N; Vokes, Everett E; Vigneswaran, Wickii; Armato, Samuel G; Kindler, Hedy L; Salgia, Ravi
2012-01-01
Objective An area of need in cancer informatics is the ability to store images in a comprehensive database as part of translational cancer research. To meet this need, we have implemented a novel tandem database infrastructure that facilitates image storage and utilisation. Background We had previously implemented the Thoracic Oncology Program Database Project (TOPDP) database for our translational cancer research needs. While useful for many research endeavours, it is unable to store images, hence our need to implement an imaging database which could communicate easily with the TOPDP database. Methods The Thoracic Oncology Research Program (TORP) imaging database was designed using the Research Electronic Data Capture (REDCap) platform, which was developed by Vanderbilt University. To demonstrate proof of principle and evaluate utility, we performed a retrospective investigation into tumour response for malignant pleural mesothelioma (MPM) patients treated at the University of Chicago Medical Center with either of two analogous chemotherapy regimens and consented to at least one of two UCMC IRB protocols, 9571 and 13473A. Results A cohort of 22 MPM patients was identified using clinical data in the TOPDP database. After measurements were acquired, two representative CT images and 0–35 histological images per patient were successfully stored in the TORP database, along with clinical and demographic data. Discussion We implemented the TORP imaging database to be used in conjunction with our comprehensive TOPDP database. While it requires an additional effort to use two databases, our database infrastructure facilitates more comprehensive translational research. Conclusions The investigation described herein demonstrates the successful implementation of this novel tandem imaging database infrastructure, as well as the potential utility of investigations enabled by it. The data model presented here can be utilised as the basis for further development of other larger, more streamlined databases in the future. PMID:23103606
Collaborative WiFi Fingerprinting Using Sensor-Based Navigation on Smartphones.
Zhang, Peng; Zhao, Qile; Li, You; Niu, Xiaoji; Zhuang, Yuan; Liu, Jingnan
2015-07-20
This paper presents a method that trains the WiFi fingerprint database using sensor-based navigation solutions. Since micro-electromechanical systems (MEMS) sensors provide only a short-term accuracy but suffer from the accuracy degradation with time, we restrict the time length of available indoor navigation trajectories, and conduct post-processing to improve the sensor-based navigation solution. Different middle-term navigation trajectories that move in and out of an indoor area are combined to make up the database. Furthermore, we evaluate the effect of WiFi database shifts on WiFi fingerprinting using the database generated by the proposed method. Results show that the fingerprinting errors will not increase linearly according to database (DB) errors in smartphone-based WiFi fingerprinting applications.
NSWC Crane Aerospace Cell Test History Database
NASA Technical Reports Server (NTRS)
Brown, Harry; Moore, Bruce
1994-01-01
The Aerospace Cell Test History Database was developed to provide project engineers and scientists ready access to the data obtained from testing of aerospace cell designs at Naval Surface Warfare Center, Crane Division. The database is intended for use by all aerospace engineers and scientists involved in the design of power systems for satellites. Specifically, the database will provide a tool for project engineers to review the progress of their test at Crane and to have ready access to data for evaluation. Additionally, the database will provide a history of test results that designers can draw upon to answer questions about cell performance under certain test conditions and aid in selection of a cell for a satellite battery. Viewgraphs are included.
Collaborative WiFi Fingerprinting Using Sensor-Based Navigation on Smartphones
Zhang, Peng; Zhao, Qile; Li, You; Niu, Xiaoji; Zhuang, Yuan; Liu, Jingnan
2015-01-01
This paper presents a method that trains the WiFi fingerprint database using sensor-based navigation solutions. Since micro-electromechanical systems (MEMS) sensors provide only a short-term accuracy but suffer from the accuracy degradation with time, we restrict the time length of available indoor navigation trajectories, and conduct post-processing to improve the sensor-based navigation solution. Different middle-term navigation trajectories that move in and out of an indoor area are combined to make up the database. Furthermore, we evaluate the effect of WiFi database shifts on WiFi fingerprinting using the database generated by the proposed method. Results show that the fingerprinting errors will not increase linearly according to database (DB) errors in smartphone-based WiFi fingerprinting applications. PMID:26205269
PDB_TM: selection and membrane localization of transmembrane proteins in the protein data bank.
Tusnády, Gábor E; Dosztányi, Zsuzsanna; Simon, István
2005-01-01
PDB_TM is a database for transmembrane proteins with known structures. It aims to collect all transmembrane proteins that are deposited in the protein structure database (PDB) and to determine their membrane-spanning regions. These assignments are based on the TMDET algorithm, which uses only structural information to locate the most likely position of the lipid bilayer and to distinguish between transmembrane and globular proteins. This algorithm was applied to all PDB entries and the results were collected in the PDB_TM database. By using TMDET algorithm, the PDB_TM database can be automatically updated every week, keeping it synchronized with the latest PDB updates. The PDB_TM database is available at http://www.enzim.hu/PDB_TM.
NASA Astrophysics Data System (ADS)
Watanabe, S.; Utsumi, N.; Take, M.; Iida, A.
2016-12-01
This study aims to develop a new approach to assess the impact of climate change on the small oceanic islands in the Pacific. In the new approach, the change of the probabilities of various situations was projected with considering the spread of projection derived from ensemble simulations, instead of projecting the most probable situation. The database for Policy Decision making for Future climate change (d4PDF) is a database of long-term high-resolution climate ensemble experiments, which has the results of 100 ensemble simulations. We utilized the database for Policy Decision making for Future climate change (d4PDF), which was (a long-term and high-resolution database) composed of results of 100 ensemble experiments. A new methodology, Multi Threshold Ensemble Assessment (MTEA), was developed using the d4PDF in order to assess the impact of climate change. We focused on the impact of climate change on tourism because it has played an important role in the economy of the Pacific Islands. The Yaeyama Region, one of the tourist destinations in Okinawa, Japan, was selected as the case study site. Two kinds of impact were assessed: change in probability of extreme climate phenomena and tourist satisfaction associated with weather. The database of long-term high-resolution climate ensemble experiments and the questionnaire survey conducted by a local government were used for the assessment. The result indicated that the strength of extreme events would be increased, whereas the probability of occurrence would be decreased. This change should result in increase of the number of clear days and it could contribute to improve the tourist satisfaction.
Bohl, Daniel D; Russo, Glenn S; Basques, Bryce A; Golinvaux, Nicholas S; Fu, Michael C; Long, William D; Grauer, Jonathan N
2014-12-03
There has been an increasing use of national databases to conduct orthopaedic research. Questions regarding the validity and consistency of these studies have not been fully addressed. The purpose of this study was to test for similarity in reported measures between two national databases commonly used for orthopaedic research. A retrospective cohort study of patients undergoing lumbar spinal fusion procedures during 2009 to 2011 was performed in two national databases: the Nationwide Inpatient Sample and the National Surgical Quality Improvement Program. Demographic characteristics, comorbidities, and inpatient adverse events were directly compared between databases. The total numbers of patients included were 144,098 from the Nationwide Inpatient Sample and 8434 from the National Surgical Quality Improvement Program. There were only small differences in demographic characteristics between the two databases. There were large differences between databases in the rates at which specific comorbidities were documented. Non-morbid obesity was documented at rates of 9.33% in the Nationwide Inpatient Sample and 36.93% in the National Surgical Quality Improvement Program (relative risk, 0.25; p < 0.05). Peripheral vascular disease was documented at rates of 2.35% in the Nationwide Inpatient Sample and 0.60% in the National Surgical Quality Improvement Program (relative risk, 3.89; p < 0.05). Similarly, there were large differences between databases in the rates at which specific inpatient adverse events were documented. Sepsis was documented at rates of 0.38% in the Nationwide Inpatient Sample and 0.81% in the National Surgical Quality Improvement Program (relative risk, 0.47; p < 0.05). Acute kidney injury was documented at rates of 1.79% in the Nationwide Inpatient Sample and 0.21% in the National Surgical Quality Improvement Program (relative risk, 8.54; p < 0.05). As database studies become more prevalent in orthopaedic surgery, authors, reviewers, and readers should view these studies with caution. This study shows that two commonly used databases can identify demographically similar patients undergoing a common orthopaedic procedure; however, the databases document markedly different rates of comorbidities and inpatient adverse events. The differences are likely the result of the very different mechanisms through which the databases collect their comorbidity and adverse event data. Findings highlight concerns regarding the validity of orthopaedic database research. Copyright © 2014 by The Journal of Bone and Joint Surgery, Incorporated.
A review of accessibility of administrative healthcare databases in the Asia-Pacific region
Milea, Dominique; Azmi, Soraya; Reginald, Praveen; Verpillat, Patrice; Francois, Clement
2015-01-01
Objective We describe and compare the availability and accessibility of administrative healthcare databases (AHDB) in several Asia-Pacific countries: Australia, Japan, South Korea, Taiwan, Singapore, China, Thailand, and Malaysia. Methods The study included hospital records, reimbursement databases, prescription databases, and data linkages. Databases were first identified through PubMed, Google Scholar, and the ISPOR database register. Database custodians were contacted. Six criteria were used to assess the databases and provided the basis for a tool to categorise databases into seven levels ranging from least accessible (Level 1) to most accessible (Level 7). We also categorised overall data accessibility for each country as high, medium, or low based on accessibility of databases as well as the number of academic articles published using the databases. Results Fifty-four administrative databases were identified. Only a limited number of databases allowed access to raw data and were at Level 7 [Medical Data Vision EBM Provider, Japan Medical Data Centre (JMDC) Claims database and Nihon-Chouzai Pharmacy Claims database in Japan, and Medicare, Pharmaceutical Benefits Scheme (PBS), Centre for Health Record Linkage (CHeReL), HealthLinQ, Victorian Data Linkages (VDL), SA-NT DataLink in Australia]. At Levels 3–6 were several databases from Japan [Hamamatsu Medical University Database, Medi-Trend, Nihon University School of Medicine Clinical Data Warehouse (NUSM)], Australia [Western Australia Data Linkage (WADL)], Taiwan [National Health Insurance Research Database (NHIRD)], South Korea [Health Insurance Review and Assessment Service (HIRA)], and Malaysia [United Nations University (UNU)-Casemix]. Countries were categorised as having a high level of data accessibility (Australia, Taiwan, and Japan), medium level of accessibility (South Korea), or a low level of accessibility (Thailand, China, Malaysia, and Singapore). In some countries, data may be available but accessibility was restricted based on requirements by data custodians. Conclusions Compared with previous research, this study describes the landscape of databases in the selected countries with more granularity using an assessment tool developed for this purpose. A high number of databases were identified but most had restricted access, preventing their potential use to support research. We hope that this study helps to improve the understanding of the AHDB landscape, increase data sharing and database research in Asia-Pacific countries. PMID:27123180
Databases applicable to quantitative hazard/risk assessment-Towards a predictive systems toxicology
DOE Office of Scientific and Technical Information (OSTI.GOV)
Waters, Michael; Jackson, Marcus
2008-11-15
The Workshop on The Power of Aggregated Toxicity Data addressed the requirement for distributed databases to support quantitative hazard and risk assessment. The authors have conceived and constructed with federal support several databases that have been used in hazard identification and risk assessment. The first of these databases, the EPA Gene-Tox Database was developed for the EPA Office of Toxic Substances by the Oak Ridge National Laboratory, and is currently hosted by the National Library of Medicine. This public resource is based on the collaborative evaluation, by government, academia, and industry, of short-term tests for the detection of mutagens andmore » presumptive carcinogens. The two-phased evaluation process resulted in more than 50 peer-reviewed publications on test system performance and a qualitative database on thousands of chemicals. Subsequently, the graphic and quantitative EPA/IARC Genetic Activity Profile (GAP) Database was developed in collaboration with the International Agency for Research on Cancer (IARC). A chemical database driven by consideration of the lowest effective dose, GAP has served IARC for many years in support of hazard classification of potential human carcinogens. The Toxicological Activity Profile (TAP) prototype database was patterned after GAP and utilized acute, subchronic, and chronic data from the Office of Air Quality Planning and Standards. TAP demonstrated the flexibility of the GAP format for air toxics, water pollutants and other environmental agents. The GAP format was also applied to developmental toxicants and was modified to represent quantitative results from the rodent carcinogen bioassay. More recently, the authors have constructed: 1) the NIEHS Genetic Alterations in Cancer (GAC) Database which quantifies specific mutations found in cancers induced by environmental agents, and 2) the NIEHS Chemical Effects in Biological Systems (CEBS) Knowledgebase that integrates genomic and other biological data including dose-response studies in toxicology and pathology. Each of the public databases has been discussed in prior publications. They will be briefly described in the present report from the perspective of aggregating datasets to augment the data and information contained within them.« less
ERIC Educational Resources Information Center
Shaw, W. M., Jr.
1993-01-01
Describes a study conducted on the cystic fibrosis (CF) database, a subset of MEDLINE, that investigated clustering structure and the effectiveness of cluster-based retrieval as a function of the exhaustivity of the uncontrolled subject descriptions. Results are compared to calculations for controlled descriptions based on Medical Subject Headings…
Environmental and Molecular Science Laboratory Arrow
DOE Office of Scientific and Technical Information (OSTI.GOV)
2016-06-24
Arrows is a software package that combines NWChem, SQL and NOSQL databases, email, and social networks (e.g. Twitter, Tumblr) that simplifies molecular and materials modeling and makes these modeling capabilities accessible to all scientists and engineers. EMSL Arrows is very simple to use. The user just emails chemical reactions to arrows@emsl.pnnl.gov and then an email is sent back with thermodynamic, reaction pathway (kinetic), spectroscopy, and other results. EMSL Arrows parses the email and then searches the database for the compounds in the reactions. If a compound isn't there, an NWChem calculation is setup and submitted to calculate it. Once themore » calculation is finished the results are entered into the database and then results are emailed back.« less
Evaluating the Cassandra NoSQL Database Approach for Genomic Data Persistency
Aniceto, Rodrigo; Xavier, Rene; Guimarães, Valeria; Hondo, Fernanda; Holanda, Maristela; Walter, Maria Emilia; Lifschitz, Sérgio
2015-01-01
Rapid advances in high-throughput sequencing techniques have created interesting computational challenges in bioinformatics. One of them refers to management of massive amounts of data generated by automatic sequencers. We need to deal with the persistency of genomic data, particularly storing and analyzing these large-scale processed data. To find an alternative to the frequently considered relational database model becomes a compelling task. Other data models may be more effective when dealing with a very large amount of nonconventional data, especially for writing and retrieving operations. In this paper, we discuss the Cassandra NoSQL database approach for storing genomic data. We perform an analysis of persistency and I/O operations with real data, using the Cassandra database system. We also compare the results obtained with a classical relational database system and another NoSQL database approach, MongoDB. PMID:26558254
NASA Aeronautics and Space Database for bibliometric analysis
NASA Technical Reports Server (NTRS)
Powers, R.; Rudman, R.
2004-01-01
The authors use the NASA Aeronautics and Space Database to perform bibliometric analysis of citations. This paper explains their research methodology and gives some sample results showing collaboration trends between NASA Centers and other institutions.
AGRICULTURAL BEST MANAGEMENT PRACTICE EFFECTIVENESS DATABASE
Resource Purpose:The Agricultural Best Management Practice Effectiveness Database contains the results of research projects which have collected water quality data for the purpose of determining the effectiveness of agricultural management practices in reducing pollutants ...
Landscape features, standards, and semantics in U.S. national topographic mapping databases
Varanka, Dalia
2009-01-01
The objective of this paper is to examine the contrast between local, field-surveyed topographical representation and feature representation in digital, centralized databases and to clarify their ontological implications. The semantics of these two approaches are contrasted by examining the categorization of features by subject domains inherent to national topographic mapping. When comparing five USGS topographic mapping domain and feature lists, results indicate that multiple semantic meanings and ontology rules were applied to the initial digital database, but were lost as databases became more centralized at national scales, and common semantics were replaced by technological terms.
NASA Astrophysics Data System (ADS)
Belov, G. V.; Dyachkov, S. A.; Levashov, P. R.; Lomonosov, I. V.; Minakov, D. V.; Morozov, I. V.; Sineva, M. A.; Smirnov, V. N.
2018-01-01
The database structure, main features and user interface of an IVTANTHERMO-Online system are reviewed. This system continues the series of the IVTANTHERMO packages developed in JIHT RAS. It includes the database for thermodynamic properties of individual substances and related software for analysis of experimental results, data fitting, calculation and estimation of thermodynamical functions and thermochemistry quantities. In contrast to the previous IVTANTHERMO versions it has a new extensible database design, the client-server architecture, a user-friendly web interface with a number of new features for online and offline data processing.
Comparison of hospital databases on antibiotic consumption in France, for a single management tool.
Henard, S; Boussat, S; Demoré, B; Clément, S; Lecompte, T; May, T; Rabaud, C
2014-07-01
The surveillance of antibiotic use in hospitals and of data on resistance is an essential measure for antibiotic stewardship. There are 3 national systems in France to collect data on antibiotic use: DREES, ICATB, and ATB RAISIN. We compared these databases and drafted recommendations for the creation of an optimized database of information on antibiotic use, available to all concerned personnel: healthcare authorities, healthcare facilities, and healthcare professionals. We processed and analyzed the 3 databases (2008 data), and surveyed users. The qualitative analysis demonstrated major discrepancies in terms of objectives, healthcare facilities, participation rate, units of consumption, conditions for collection, consolidation, and control of data, and delay before availability of results. The quantitative analysis revealed that the consumption data for a given healthcare facility differed from one database to another, challenging the reliability of data collection. We specified user expectations: to compare consumption and resistance data, to carry out benchmarking, to obtain data on the prescribing habits in healthcare units, or to help understand results. The study results demonstrated the need for a reliable, single, and automated tool to manage data on antibiotic consumption compared with resistance data on several levels (national, regional, healthcare facility, healthcare units), providing rapid local feedback and educational benchmarking. Copyright © 2014 Elsevier Masson SAS. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rupcich, Franco; Badal, Andreu; Kyprianou, Iacovos
Purpose: The purpose of this study was to develop a database for estimating organ dose in a voxelized patient model for coronary angiography and brain perfusion CT acquisitions with any spectra and angular tube current modulation setting. The database enables organ dose estimation for existing and novel acquisition techniques without requiring Monte Carlo simulations. Methods: The study simulated transport of monoenergetic photons between 5 and 150 keV for 1000 projections over 360 Degree-Sign through anthropomorphic voxelized female chest and head (0 Degree-Sign and 30 Degree-Sign tilt) phantoms and standard head and body CTDI dosimetry cylinders. The simulations resulted in tablesmore » of normalized dose deposition for several radiosensitive organs quantifying the organ dose per emitted photon for each incident photon energy and projection angle for coronary angiography and brain perfusion acquisitions. The values in a table can be multiplied by an incident spectrum and number of photons at each projection angle and then summed across all energies and angles to estimate total organ dose. Scanner-specific organ dose may be approximated by normalizing the database-estimated organ dose by the database-estimated CTDI{sub vol} and multiplying by a physical CTDI{sub vol} measurement. Two examples are provided demonstrating how to use the tables to estimate relative organ dose. In the first, the change in breast and lung dose during coronary angiography CT scans is calculated for reduced kVp, angular tube current modulation, and partial angle scanning protocols relative to a reference protocol. In the second example, the change in dose to the eye lens is calculated for a brain perfusion CT acquisition in which the gantry is tilted 30 Degree-Sign relative to a nontilted scan. Results: Our database provides tables of normalized dose deposition for several radiosensitive organs irradiated during coronary angiography and brain perfusion CT scans. Validation results indicate total organ doses calculated using our database are within 1% of those calculated using Monte Carlo simulations with the same geometry and scan parameters for all organs except red bone marrow (within 6%), and within 23% of published estimates for different voxelized phantoms. Results from the example of using the database to estimate organ dose for coronary angiography CT acquisitions show 2.1%, 1.1%, and -32% change in breast dose and 2.1%, -0.74%, and 4.7% change in lung dose for reduced kVp, tube current modulated, and partial angle protocols, respectively, relative to the reference protocol. Results show -19.2% difference in dose to eye lens for a tilted scan relative to a nontilted scan. The reported relative changes in organ doses are presented without quantification of image quality and are for the sole purpose of demonstrating the use of the proposed database. Conclusions: The proposed database and calculation method enable the estimation of organ dose for coronary angiography and brain perfusion CT scans utilizing any spectral shape and angular tube current modulation scheme by taking advantage of the precalculated Monte Carlo simulation results. The database can be used in conjunction with image quality studies to develop optimized acquisition techniques and may be particularly beneficial for optimizing dual kVp acquisitions for which numerous kV, mA, and filtration combinations may be investigated.« less
ERIC Educational Resources Information Center
Lamothe, Alain R.
2011-01-01
The purpose of this paper is to report the results of a quantitative analysis exploring the interaction and relationship between the online database and electronic journal collections at the J. N. Desmarais Library of Laurentian University. A very strong relationship exists between the number of searches and the size of the online database…
Detecting errors and anomalies in computerized materials control and accountability databases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Whiteson, R.; Hench, K.; Yarbro, T.
The Automated MC and A Database Assessment project is aimed at improving anomaly and error detection in materials control and accountability (MC and A) databases and increasing confidence in the data that they contain. Anomalous data resulting in poor categorization of nuclear material inventories greatly reduces the value of the database information to users. Therefore it is essential that MC and A data be assessed periodically for anomalies or errors. Anomaly detection can identify errors in databases and thus provide assurance of the integrity of data. An expert system has been developed at Los Alamos National Laboratory that examines thesemore » large databases for anomalous or erroneous data. For several years, MC and A subject matter experts at Los Alamos have been using this automated system to examine the large amounts of accountability data that the Los Alamos Plutonium Facility generates. These data are collected and managed by the Material Accountability and Safeguards System, a near-real-time computerized nuclear material accountability and safeguards system. This year they have expanded the user base, customizing the anomaly detector for the varying requirements of different groups of users. This paper describes the progress in customizing the expert systems to the needs of the users of the data and reports on their results.« less
Holm, Sven; Russell, Greg; Nourrit, Vincent; McLoughlin, Niall
2017-01-01
A database of retinal fundus images, the DR HAGIS database, is presented. This database consists of 39 high-resolution color fundus images obtained from a diabetic retinopathy screening program in the UK. The NHS screening program uses service providers that employ different fundus and digital cameras. This results in a range of different image sizes and resolutions. Furthermore, patients enrolled in such programs often display other comorbidities in addition to diabetes. Therefore, in an effort to replicate the normal range of images examined by grading experts during screening, the DR HAGIS database consists of images of varying image sizes and resolutions and four comorbidity subgroups: collectively defined as the diabetic retinopathy, hypertension, age-related macular degeneration, and Glaucoma image set (DR HAGIS). For each image, the vasculature has been manually segmented to provide a realistic set of images on which to test automatic vessel extraction algorithms. Modified versions of two previously published vessel extraction algorithms were applied to this database to provide some baseline measurements. A method based purely on the intensity of images pixels resulted in a mean segmentation accuracy of 95.83% ([Formula: see text]), whereas an algorithm based on Gabor filters generated an accuracy of 95.71% ([Formula: see text]).
NCBI2RDF: Enabling Full RDF-Based Access to NCBI Databases
Anguita, Alberto; García-Remesal, Miguel; de la Iglesia, Diana; Maojo, Victor
2013-01-01
RDF has become the standard technology for enabling interoperability among heterogeneous biomedical databases. The NCBI provides access to a large set of life sciences databases through a common interface called Entrez. However, the latter does not provide RDF-based access to such databases, and, therefore, they cannot be integrated with other RDF-compliant databases and accessed via SPARQL query interfaces. This paper presents the NCBI2RDF system, aimed at providing RDF-based access to the complete NCBI data repository. This API creates a virtual endpoint for servicing SPARQL queries over different NCBI repositories and presenting to users the query results in SPARQL results format, thus enabling this data to be integrated and/or stored with other RDF-compliant repositories. SPARQL queries are dynamically resolved, decomposed, and forwarded to the NCBI-provided E-utilities programmatic interface to access the NCBI data. Furthermore, we show how our approach increases the expressiveness of the native NCBI querying system, allowing several databases to be accessed simultaneously. This feature significantly boosts productivity when working with complex queries and saves time and effort to biomedical researchers. Our approach has been validated with a large number of SPARQL queries, thus proving its reliability and enhanced capabilities in biomedical environments. PMID:23984425
Semi-automatic feedback using concurrence between mixture vectors for general databases
NASA Astrophysics Data System (ADS)
Larabi, Mohamed-Chaker; Richard, Noel; Colot, Olivier; Fernandez-Maloigne, Christine
2001-12-01
This paper describes how a query system can exploit the basic knowledge by employing semi-automatic relevance feedback to refine queries and runtimes. For general databases, it is often useless to call complex attributes, because we have not sufficient information about images in the database. Moreover, these images can be topologically very different from one to each other and an attribute that is powerful for a database category may be very powerless for the other categories. The idea is to use very simple features, such as color histogram, correlograms, Color Coherence Vectors (CCV), to fill out the signature vector. Then, a number of mixture vectors is prepared depending on the number of very distinctive categories in the database. Knowing that a mixture vector is a vector containing the weight of each attribute that will be used to compute a similarity distance. We post a query in the database using successively all the mixture vectors defined previously. We retain then the N first images for each vector in order to make a mapping using the following information: Is image I present in several mixture vectors results? What is its rank in the results? These informations allow us to switch the system on an unsupervised relevance feedback or user's feedback (supervised feedback).
Yoo, Do Hyeon; Shin, Wook-Geun; Lee, Jaekook; Yeom, Yeon Soo; Kim, Chan Hyeong; Chang, Byung-Uck; Min, Chul Hee
2017-11-01
After the Fukushima accident in Japan, the Korean Government implemented the "Act on Protective Action Guidelines Against Radiation in the Natural Environment" to regulate unnecessary radiation exposure to the public. However, despite the law which came into effect in July 2012, an appropriate method to evaluate the equivalent and effective doses from naturally occurring radioactive material (NORM) in consumer products is not available. The aim of the present study is to develop and validate an effective dose coefficient database enabling the simple and correct evaluation of the effective dose due to the usage of NORM-added consumer products. To construct the database, we used a skin source method with a computational human phantom and Monte Carlo (MC) simulation. For the validation, the effective dose was compared between the database using interpolation method and the original MC method. Our result showed a similar equivalent dose across the 26 organs and a corresponding average dose between the database and the MC calculations of < 5% difference. The differences in the effective doses were even less, and the result generally show that equivalent and effective doses can be quickly calculated with the database with sufficient accuracy. Copyright © 2017 Elsevier Ltd. All rights reserved.
MICA: desktop software for comprehensive searching of DNA databases
Stokes, William A; Glick, Benjamin S
2006-01-01
Background Molecular biologists work with DNA databases that often include entire genomes. A common requirement is to search a DNA database to find exact matches for a nondegenerate or partially degenerate query. The software programs available for such purposes are normally designed to run on remote servers, but an appealing alternative is to work with DNA databases stored on local computers. We describe a desktop software program termed MICA (K-Mer Indexing with Compact Arrays) that allows large DNA databases to be searched efficiently using very little memory. Results MICA rapidly indexes a DNA database. On a Macintosh G5 computer, the complete human genome could be indexed in about 5 minutes. The indexing algorithm recognizes all 15 characters of the DNA alphabet and fully captures the information in any DNA sequence, yet for a typical sequence of length L, the index occupies only about 2L bytes. The index can be searched to return a complete list of exact matches for a nondegenerate or partially degenerate query of any length. A typical search of a long DNA sequence involves reading only a small fraction of the index into memory. As a result, searches are fast even when the available RAM is limited. Conclusion MICA is suitable as a search engine for desktop DNA analysis software. PMID:17018144
Menditto, Enrica; Bolufer De Gea, Angela; Cahir, Caitriona; Marengoni, Alessandra; Riegler, Salvatore; Fico, Giuseppe; Costa, Elisio; Monaco, Alessandro; Pecorelli, Sergio; Pani, Luca; Prados-Torres, Alexandra
2016-01-01
Computerized health care databases have been widely described as an excellent opportunity for research. The availability of “big data” has brought about a wave of innovation in projects when conducting health services research. Most of the available secondary data sources are restricted to the geographical scope of a given country and present heterogeneous structure and content. Under the umbrella of the European Innovation Partnership on Active and Healthy Ageing, collaborative work conducted by the partners of the group on “adherence to prescription and medical plans” identified the use of observational and large-population databases to monitor medication-taking behavior in the elderly. This article describes the methodology used to gather the information from available databases among the Adherence Action Group partners with the aim of improving data sharing on a European level. A total of six databases belonging to three different European countries (Spain, Republic of Ireland, and Italy) were included in the analysis. Preliminary results suggest that there are some similarities. However, these results should be applied in different contexts and European countries, supporting the idea that large European studies should be designed in order to get the most of already available databases. PMID:27358570
Average probability that a "cold hit" in a DNA database search results in an erroneous attribution.
Song, Yun S; Patil, Anand; Murphy, Erin E; Slatkin, Montgomery
2009-01-01
We consider a hypothetical series of cases in which the DNA profile of a crime-scene sample is found to match a known profile in a DNA database (i.e., a "cold hit"), resulting in the identification of a suspect based only on genetic evidence. We show that the average probability that there is another person in the population whose profile matches the crime-scene sample but who is not in the database is approximately 2(N - d)p(A), where N is the number of individuals in the population, d is the number of profiles in the database, and p(A) is the average match probability (AMP) for the population. The AMP is estimated by computing the average of the probabilities that two individuals in the population have the same profile. We show further that if a priori each individual in the population is equally likely to have left the crime-scene sample, then the average probability that the database search attributes the crime-scene sample to a wrong person is (N - d)p(A).
Total choline and choline-containing moieties of commercially available pulses.
Lewis, Erin D; Kosik, Sarah J; Zhao, Yuan-Yuan; Jacobs, René L; Curtis, Jonathan M; Field, Catherine J
2014-06-01
Estimating dietary choline intake can be challenging due to missing foods in the current United States Department of Agriculture (USDA) database. The objectives of the study were to quantify the choline-containing moieties and the total choline content of a variety of pulses available in North America and use the expanded compositional database to determine the potential contribution of pulses to dietary choline intake. Commonly consumed pulses (n = 32) were analyzed by hydrophilic interaction liquid chromatography-tandem mass spectrometry (HILIC LC-MS/MS) and compared to the current USDA database. Cooking was found to reduce the relative percent from free choline and increased the contribution of phosphatidylcholine to total choline for most pulses (P < 0.05). Using the expanded database to estimate choline content of recipes using pulses as meat alternatives, resulted in a different estimation of choline content per serving (±30%), compared to the USDA database. These results suggest that when pulses are a large part of a meal or diet, the use of accurate food composition data should be used.
Moo-Young, Tricia A; Panergo, Jessel; Wang, Chih E; Patel, Subhash; Duh, Hong Yan; Winchester, David J; Prinz, Richard A; Fogelfeld, Leon
2013-11-01
Clinicopathologic variables influence the treatment and prognosis of patients with thyroid cancer. A retrospective analysis of public hospital thyroid cancer database and the Surveillance, Epidemiology and End Results 17 database was conducted. Demographic, clinical, and pathologic data were compared across ethnic groups. Within the public hospital database, Hispanics versus non-Hispanic whites were younger and had more lymph node involvement (34% vs 17%, P < .001). Median tumor size was not statistically different across ethnic groups. Similar findings were demonstrated within the Surveillance, Epidemiology and End Results database. African Americans aged <45 years had the largest tumors but were least likely to have lymph node involvement. Asians had the most stage IV disease despite having no differences in tumor size, lymph node involvement, and capsular invasion. There is considerable variability in the clinical presentation of thyroid cancer across ethnic groups. Such disparities persist within an equal-access health care system. These findings suggest that factors beyond socioeconomics may contribute to such differences. Copyright © 2013 Elsevier Inc. All rights reserved.
Handwritten word preprocessing for database adaptation
NASA Astrophysics Data System (ADS)
Oprean, Cristina; Likforman-Sulem, Laurence; Mokbel, Chafic
2013-01-01
Handwriting recognition systems are typically trained using publicly available databases, where data have been collected in controlled conditions (image resolution, paper background, noise level,...). Since this is not often the case in real-world scenarios, classification performance can be affected when novel data is presented to the word recognition system. To overcome this problem, we present in this paper a new approach called database adaptation. It consists of processing one set (training or test) in order to adapt it to the other set (test or training, respectively). Specifically, two kinds of preprocessing, namely stroke thickness normalization and pixel intensity normalization are considered. The advantage of such approach is that we can re-use the existing recognition system trained on controlled data. We conduct several experiments with the Rimes 2011 word database and with a real-world database. We adapt either the test set or the training set. Results show that training set adaptation achieves better results than test set adaptation, at the cost of a second training stage on the adapted data. Accuracy of data set adaptation is increased by 2% to 3% in absolute value over no adaptation.
A literature search tool for intelligent extraction of disease-associated genes.
Jung, Jae-Yoon; DeLuca, Todd F; Nelson, Tristan H; Wall, Dennis P
2014-01-01
To extract disorder-associated genes from the scientific literature in PubMed with greater sensitivity for literature-based support than existing methods. We developed a PubMed query to retrieve disorder-related, original research articles. Then we applied a rule-based text-mining algorithm with keyword matching to extract target disorders, genes with significant results, and the type of study described by the article. We compared our resulting candidate disorder genes and supporting references with existing databases. We demonstrated that our candidate gene set covers nearly all genes in manually curated databases, and that the references supporting the disorder-gene link are more extensive and accurate than other general purpose gene-to-disorder association databases. We implemented a novel publication search tool to find target articles, specifically focused on links between disorders and genotypes. Through comparison against gold-standard manually updated gene-disorder databases and comparison with automated databases of similar functionality we show that our tool can search through the entirety of PubMed to extract the main gene findings for human diseases rapidly and accurately.
Age estimation using cortical surface pattern combining thickness with curvatures
Wang, Jieqiong; Li, Wenjing; Miao, Wen; Dai, Dai; Hua, Jing; He, Huiguang
2014-01-01
Brain development and healthy aging have been proved to follow a specific pattern, which, in turn, can be applied to help doctors diagnose mental diseases. In this paper, we design a cortical surface pattern (CSP) combining the cortical thickness with curvatures, which constructs an accurate human age estimation model with relevance vector regression. We test our model with two public databases. One is the IXI database (360 healthy subjects aging from 20 to 82 years old were selected), and the other is the INDI database (303 subjects aging from 7 to 22 years old were selected). The results show that our model can achieve as small as 4.57 years deviation in the IXI database and 1.38 years deviation in the INDI database. Furthermore, we employ this surface pattern to age groups classification, and get a remarkably high accuracy (97.77%) and a significantly high sensitivity/specificity (97.30%/98.10%). These results suggest that our designed CSP combining thickness with curvatures is stable and sensitive to brain development, and it is much more powerful than voxel-based morphometry used in previous methods for age estimation. PMID:24395657
Person identification in irregular cardiac conditions using electrocardiogram signals.
Sidek, Khairul Azami; Khalil, Ibrahim
2011-01-01
This paper presents a person identification mechanism in irregular cardiac conditions using ECG signals. A total of 30 subjects were used in the study from three different public ECG databases containing various abnormal heart conditions from the Paroxysmal Atrial Fibrillation Predicition Challenge database (AFPDB), MIT-BIH Supraventricular Arrthymia database (SVDB) and T-Wave Alternans Challenge database (TWADB). Cross correlation (CC) was used as the biometric matching algorithm with defined threshold values to evaluate the performance. In order to measure the efficiency of this simple yet effective matching algorithm, two biometric performance metrics were used which are false acceptance rate (FAR) and false reject rate (FRR). Our experimentation results suggest that ECG based biometric identification with irregular cardiac condition gives a higher recognition rate of different ECG signals when tested for three different abnormal cardiac databases yielding false acceptance rate (FAR) of 2%, 3% and 2% and false reject rate (FRR) of 1%, 2% and 0% for AFPDB, SVDB and TWADB respectively. These results also indicate the existence of salient biometric characteristics in the ECG morphology within the QRS complex that tends to differentiate individuals.
From ClinicalTrials.gov trial registry to an analysis-ready database of clinical trial results.
Cepeda, M Soledad; Lobanov, Victor; Berlin, Jesse A
2013-04-01
The ClinicalTrials.gov web site provides a convenient interface to look up study results, but it does not allow downloading data in a format that can be readily used for quantitative analyses. To develop a system that automatically downloads study results from ClinicalTrials.gov and provides an interface to retrieve study results in a spreadsheet format ready for analysis. Sherlock(®) identifies studies by intervention, population, or outcome of interest and in seconds creates an analytic database of study results ready for analyses. The outcome classification algorithms used in Sherlock were validated against a classification by an expert. Having a database ready for analysis that can be updated automatically, dramatically extends the utility of the ClinicalTrials.gov trial registry. It increases the speed of comparative research, reduces the need for manual extraction of data, and permits answering a vast array of questions.
National Databases for Neurosurgical Outcomes Research: Options, Strengths, and Limitations.
Karhade, Aditya V; Larsen, Alexandra M G; Cote, David J; Dubois, Heloise M; Smith, Timothy R
2017-08-05
Quality improvement, value-based care delivery, and personalized patient care depend on robust clinical, financial, and demographic data streams of neurosurgical outcomes. The neurosurgical literature lacks a comprehensive review of large national databases. To assess the strengths and limitations of various resources for outcomes research in neurosurgery. A review of the literature was conducted to identify surgical outcomes studies using national data sets. The databases were assessed for the availability of patient demographics and clinical variables, longitudinal follow-up of patients, strengths, and limitations. The number of unique patients contained within each data set ranged from thousands (Quality Outcomes Database [QOD]) to hundreds of millions (MarketScan). Databases with both clinical and financial data included PearlDiver, Premier Healthcare Database, Vizient Clinical Data Base and Resource Manager, and the National Inpatient Sample. Outcomes collected by databases included patient-reported outcomes (QOD); 30-day morbidity, readmissions, and reoperations (National Surgical Quality Improvement Program); and disease incidence and disease-specific survival (Surveillance, Epidemiology, and End Results-Medicare). The strengths of large databases included large numbers of rare pathologies and multi-institutional nationally representative sampling; the limitations of these databases included variable data veracity, variable data completeness, and missing disease-specific variables. The improvement of existing large national databases and the establishment of new registries will be crucial to the future of neurosurgical outcomes research. Copyright © 2017 by the Congress of Neurological Surgeons
Bekkers, Stijn; Bot, Arjan G J; Makarawung, Dennis; Neuhaus, Valentin; Ring, David
2014-11-01
The National Hospital Discharge Survey (NHDS) and the Nationwide Inpatient Sample (NIS) collect sample data and publish annual estimates of inpatient care in the United States, and both are commonly used in orthopaedic research. However, there are important differences between the databases, and because of these differences, asking these two databases the same question may result in different answers. The degree to which this is true for arthroplasty-related research has, to our knowledge, not been characterized. We tested the following null hypotheses: (1) there are no differences between the NHDS and NIS in patient characteristics, comorbidities, and adverse events in patients with hip osteoarthritis treated with THA, and (2) there are no differences between databases in factors associated with inpatient mortality, adverse events, and length of hospital stay after THA. The NHDS and NIS databases use different methods of data collection and weighting to provide data representative of all nonfederal hospital discharges in the United States. In 2006 the NHDS database contained 203,149 patients with hip arthritis treated with hip arthroplasty, and the NIS database included 193,879 patients. Multivariable analyses for factors associated with inpatient mortality, adverse events, and days of care were constructed for each database. We found that 26 of 42 of the factors in demographics, comorbidities, and adverse events after THA in the NIS and NHDS databases differed more than 10%. Age and days of care were associated with inpatient mortality with the NHDS and the NIS although the effect rates differ more than 10%. The NIS identified seven other factors not identified by the NHDS: wound complications, congestive heart failure, new mental disorder, chronic pulmonary disease, dementia, geographic region Northeast, acute postoperative anemia, and sex, that were associated with inpatient mortality even after controlling for potentially confounding variables. For inpatient adverse events, atrial fibrillation, osteoporosis, and female sex were associated with the NHDS and the NIS although the effect rates differ more than 10%. There were different directions for sources of payment, dementia, congestive heart failure, and geographic region. For longer length of stay, common factors differing more than 10% in effect rate included chronic pulmonary disease, atrial fibrillation, complication not elsewhere classified, congestive heart failure, transfusion, discharge nonroutine compared with routine, acute postoperative anemia, hypertension, wound adverse events, and diabetes mellitus, whereas discrepant factors included geographic region, payment method, dementia, sex, and iatrogenic hypotension. Studies that use large databases intended to be representative of the entire United States population can produce different results, likely related to differences in the databases, such as the number of comorbidities and procedures that can be entered in the database. In other words, analyses of large databases can have limited reliability and should be interpreted with caution. Level II, prognostic study. See the Instructions for Authors for a complete description of levels of evidence.
Animal Detection in Natural Images: Effects of Color and Image Database
Zhu, Weina; Drewes, Jan; Gegenfurtner, Karl R.
2013-01-01
The visual system has a remarkable ability to extract categorical information from complex natural scenes. In order to elucidate the role of low-level image features for the recognition of objects in natural scenes, we recorded saccadic eye movements and event-related potentials (ERPs) in two experiments, in which human subjects had to detect animals in previously unseen natural images. We used a new natural image database (ANID) that is free of some of the potential artifacts that have plagued the widely used COREL images. Color and grayscale images picked from the ANID and COREL databases were used. In all experiments, color images induced a greater N1 EEG component at earlier time points than grayscale images. We suggest that this influence of color in animal detection may be masked by later processes when measuring reation times. The ERP results of go/nogo and forced choice tasks were similar to those reported earlier. The non-animal stimuli induced bigger N1 than animal stimuli both in the COREL and ANID databases. This result indicates ultra-fast processing of animal images is possible irrespective of the particular database. With the ANID images, the difference between color and grayscale images is more pronounced than with the COREL images. The earlier use of the COREL images might have led to an underestimation of the contribution of color. Therefore, we conclude that the ANID image database is better suited for the investigation of the processing of natural scenes than other databases commonly used. PMID:24130744
CyanOmics: an integrated database of omics for the model cyanobacterium Synechococcus sp. PCC 7002.
Yang, Yaohua; Feng, Jie; Li, Tao; Ge, Feng; Zhao, Jindong
2015-01-01
Cyanobacteria are an important group of organisms that carry out oxygenic photosynthesis and play vital roles in both the carbon and nitrogen cycles of the Earth. The annotated genome of Synechococcus sp. PCC 7002, as an ideal model cyanobacterium, is available. A series of transcriptomic and proteomic studies of Synechococcus sp. PCC 7002 cells grown under different conditions have been reported. However, no database of such integrated omics studies has been constructed. Here we present CyanOmics, a database based on the results of Synechococcus sp. PCC 7002 omics studies. CyanOmics comprises one genomic dataset, 29 transcriptomic datasets and one proteomic dataset and should prove useful for systematic and comprehensive analysis of all those data. Powerful browsing and searching tools are integrated to help users directly access information of interest with enhanced visualization of the analytical results. Furthermore, Blast is included for sequence-based similarity searching and Cluster 3.0, as well as the R hclust function is provided for cluster analyses, to increase CyanOmics's usefulness. To the best of our knowledge, it is the first integrated omics analysis database for cyanobacteria. This database should further understanding of the transcriptional patterns, and proteomic profiling of Synechococcus sp. PCC 7002 and other cyanobacteria. Additionally, the entire database framework is applicable to any sequenced prokaryotic genome and could be applied to other integrated omics analysis projects. Database URL: http://lag.ihb.ac.cn/cyanomics. © The Author(s) 2015. Published by Oxford University Press.
The Protein Disease Database of human body fluids: II. Computer methods and data issues.
Lemkin, P F; Orr, G A; Goldstein, M P; Creed, G J; Myrick, J E; Merril, C R
1995-01-01
The Protein Disease Database (PDD) is a relational database of proteins and diseases. With this database it is possible to screen for quantitative protein abnormalities associated with disease states. These quantitative relationships use data drawn from the peer-reviewed biomedical literature. Assays may also include those observed in high-resolution electrophoretic gels that offer the potential to quantitate many proteins in a single test as well as data gathered by enzymatic or immunologic assays. We are using the Internet World Wide Web (WWW) and the Web browser paradigm as an access method for wide distribution and querying of the Protein Disease Database. The WWW hypertext transfer protocol and its Common Gateway Interface make it possible to build powerful graphical user interfaces that can support easy-to-use data retrieval using query specification forms or images. The details of these interactions are totally transparent to the users of these forms. Using a client-server SQL relational database, user query access, initial data entry and database maintenance are all performed over the Internet with a Web browser. We discuss the underlying design issues, mapping mechanisms and assumptions that we used in constructing the system, data entry, access to the database server, security, and synthesis of derived two-dimensional gel image maps and hypertext documents resulting from SQL database searches.
Design of Integrated Database on Mobile Information System: A Study of Yogyakarta Smart City App
NASA Astrophysics Data System (ADS)
Nurnawati, E. K.; Ermawati, E.
2018-02-01
An integration database is a database which acts as the data store for multiple applications and thus integrates data across these applications (in contrast to an Application Database). An integration database needs a schema that takes all its client applications into account. The benefit of the schema that sharing data among applications does not require an extra layer of integration services on the applications. Any changes to data made in a single application are made available to all applications at the time of database commit - thus keeping the applications’ data use better synchronized. This study aims to design and build an integrated database that can be used by various applications in a mobile device based system platforms with the based on smart city system. The built-in database can be used by various applications, whether used together or separately. The design and development of the database are emphasized on the flexibility, security, and completeness of attributes that can be used together by various applications to be built. The method used in this study is to choice of the appropriate database logical structure (patterns of data) and to build the relational-database models (Design Databases). Test the resulting design with some prototype apps and analyze system performance with test data. The integrated database can be utilized both of the admin and the user in an integral and comprehensive platform. This system can help admin, manager, and operator in managing the application easily and efficiently. This Android-based app is built based on a dynamic clientserver where data is extracted from an external database MySQL. So if there is a change of data in the database, then the data on Android applications will also change. This Android app assists users in searching of Yogyakarta (as smart city) related information, especially in culture, government, hotels, and transportation.
Gadelha, Luiz; Ribeiro-Alves, Marcelo; Porto, Fábio
2017-01-01
There are many steps in analyzing transcriptome data, from the acquisition of raw data to the selection of a subset of representative genes that explain a scientific hypothesis. The data produced can be represented as networks of interactions among genes and these may additionally be integrated with other biological databases, such as Protein-Protein Interactions, transcription factors and gene annotation. However, the results of these analyses remain fragmented, imposing difficulties, either for posterior inspection of results, or for meta-analysis by the incorporation of new related data. Integrating databases and tools into scientific workflows, orchestrating their execution, and managing the resulting data and its respective metadata are challenging tasks. Additionally, a great amount of effort is equally required to run in-silico experiments to structure and compose the information as needed for analysis. Different programs may need to be applied and different files are produced during the experiment cycle. In this context, the availability of a platform supporting experiment execution is paramount. We present GeNNet, an integrated transcriptome analysis platform that unifies scientific workflows with graph databases for selecting relevant genes according to the evaluated biological systems. It includes GeNNet-Wf, a scientific workflow that pre-loads biological data, pre-processes raw microarray data and conducts a series of analyses including normalization, differential expression inference, clusterization and gene set enrichment analysis. A user-friendly web interface, GeNNet-Web, allows for setting parameters, executing, and visualizing the results of GeNNet-Wf executions. To demonstrate the features of GeNNet, we performed case studies with data retrieved from GEO, particularly using a single-factor experiment in different analysis scenarios. As a result, we obtained differentially expressed genes for which biological functions were analyzed. The results are integrated into GeNNet-DB, a database about genes, clusters, experiments and their properties and relationships. The resulting graph database is explored with queries that demonstrate the expressiveness of this data model for reasoning about gene interaction networks. GeNNet is the first platform to integrate the analytical process of transcriptome data with graph databases. It provides a comprehensive set of tools that would otherwise be challenging for non-expert users to install and use. Developers can add new functionality to components of GeNNet. The derived data allows for testing previous hypotheses about an experiment and exploring new ones through the interactive graph database environment. It enables the analysis of different data on humans, rhesus, mice and rat coming from Affymetrix platforms. GeNNet is available as an open source platform at https://github.com/raquele/GeNNet and can be retrieved as a software container with the command docker pull quelopes/gennet. PMID:28695067
Costa, Raquel L; Gadelha, Luiz; Ribeiro-Alves, Marcelo; Porto, Fábio
2017-01-01
There are many steps in analyzing transcriptome data, from the acquisition of raw data to the selection of a subset of representative genes that explain a scientific hypothesis. The data produced can be represented as networks of interactions among genes and these may additionally be integrated with other biological databases, such as Protein-Protein Interactions, transcription factors and gene annotation. However, the results of these analyses remain fragmented, imposing difficulties, either for posterior inspection of results, or for meta-analysis by the incorporation of new related data. Integrating databases and tools into scientific workflows, orchestrating their execution, and managing the resulting data and its respective metadata are challenging tasks. Additionally, a great amount of effort is equally required to run in-silico experiments to structure and compose the information as needed for analysis. Different programs may need to be applied and different files are produced during the experiment cycle. In this context, the availability of a platform supporting experiment execution is paramount. We present GeNNet, an integrated transcriptome analysis platform that unifies scientific workflows with graph databases for selecting relevant genes according to the evaluated biological systems. It includes GeNNet-Wf, a scientific workflow that pre-loads biological data, pre-processes raw microarray data and conducts a series of analyses including normalization, differential expression inference, clusterization and gene set enrichment analysis. A user-friendly web interface, GeNNet-Web, allows for setting parameters, executing, and visualizing the results of GeNNet-Wf executions. To demonstrate the features of GeNNet, we performed case studies with data retrieved from GEO, particularly using a single-factor experiment in different analysis scenarios. As a result, we obtained differentially expressed genes for which biological functions were analyzed. The results are integrated into GeNNet-DB, a database about genes, clusters, experiments and their properties and relationships. The resulting graph database is explored with queries that demonstrate the expressiveness of this data model for reasoning about gene interaction networks. GeNNet is the first platform to integrate the analytical process of transcriptome data with graph databases. It provides a comprehensive set of tools that would otherwise be challenging for non-expert users to install and use. Developers can add new functionality to components of GeNNet. The derived data allows for testing previous hypotheses about an experiment and exploring new ones through the interactive graph database environment. It enables the analysis of different data on humans, rhesus, mice and rat coming from Affymetrix platforms. GeNNet is available as an open source platform at https://github.com/raquele/GeNNet and can be retrieved as a software container with the command docker pull quelopes/gennet.
The ClinicalTrials.gov results database--update and key issues.
Zarin, Deborah A; Tse, Tony; Williams, Rebecca J; Califf, Robert M; Ide, Nicholas C
2011-03-03
The ClinicalTrials.gov trial registry was expanded in 2008 to include a database for reporting summary results. We summarize the structure and contents of the results database, provide an update of relevant policies, and show how the data can be used to gain insight into the state of clinical research. We analyzed ClinicalTrials.gov data that were publicly available between September 2009 and September 2010. As of September 27, 2010, ClinicalTrials.gov received approximately 330 new and 2000 revised registrations each week, along with 30 new and 80 revised results submissions. We characterized the 79,413 registry and 2178 results of trial records available as of September 2010. From a sample cohort of results records, 78 of 150 (52%) had associated publications within 2 years after posting. Of results records available publicly, 20% reported more than two primary outcome measures and 5% reported more than five. Of a sample of 100 registry record outcome measures, 61% lacked specificity in describing the metric used in the planned analysis. In a sample of 700 results records, the mean number of different analysis populations per study group was 2.5 (median, 1; range, 1 to 25). Of these trials, 24% reported results for 90% or less of their participants. ClinicalTrials.gov provides access to study results not otherwise available to the public. Although the database allows examination of various aspects of ongoing and completed clinical trials, its ultimate usefulness depends on the research community to submit accurate, informative data.
Teaching Children to Use Databases through Direct Instruction.
ERIC Educational Resources Information Center
Rooze, Gene E.
1988-01-01
Provides a direct instruction strategy for teaching skills and concepts required for database use. Creates an interactive environment which motivates, provides a model, imparts information, allows active student participation, gives knowledge of results, and presents guidance. (LS)
Effects of distributed database modeling on evaluation of transaction rollbacks
NASA Technical Reports Server (NTRS)
Mukkamala, Ravi
1991-01-01
Data distribution, degree of data replication, and transaction access patterns are key factors in determining the performance of distributed database systems. In order to simplify the evaluation of performance measures, database designers and researchers tend to make simplistic assumptions about the system. The effect is studied of modeling assumptions on the evaluation of one such measure, the number of transaction rollbacks, in a partitioned distributed database system. Six probabilistic models and expressions are developed for the numbers of rollbacks under each of these models. Essentially, the models differ in terms of the available system information. The analytical results so obtained are compared to results from simulation. From here, it is concluded that most of the probabilistic models yield overly conservative estimates of the number of rollbacks. The effect of transaction commutativity on system throughout is also grossly undermined when such models are employed.
Effects of distributed database modeling on evaluation of transaction rollbacks
NASA Technical Reports Server (NTRS)
Mukkamala, Ravi
1991-01-01
Data distribution, degree of data replication, and transaction access patterns are key factors in determining the performance of distributed database systems. In order to simplify the evaluation of performance measures, database designers and researchers tend to make simplistic assumptions about the system. Here, researchers investigate the effect of modeling assumptions on the evaluation of one such measure, the number of transaction rollbacks in a partitioned distributed database system. The researchers developed six probabilistic models and expressions for the number of rollbacks under each of these models. Essentially, the models differ in terms of the available system information. The analytical results obtained are compared to results from simulation. It was concluded that most of the probabilistic models yield overly conservative estimates of the number of rollbacks. The effect of transaction commutativity on system throughput is also grossly undermined when such models are employed.
Drozdovitch, Vladimir; Zhukova, Olga; Germenchuk, Maria; Khrutchinsky, Arkady; Kukhta, Tatiana; Luckyanov, Nickolas; Minenko, Victor; Podgaiskaya, Marina; Savkin, Mikhail; Vakulovsky, Sergey; Voillequé, Paul; Bouville, André
2012-01-01
Results of all available meteorological and radiation measurements that were performed in Belarus during the first three months after the Chernobyl accident were collected from various sources and incorporated into a single database. Meteorological information such as precipitation, wind speed and direction, and temperature in localities were obtained from meteorological station facilities. Radiation measurements include gamma-exposure rate in air, daily fallout, concentration of different radionuclides in soil, grass, cow’s milk and water as well as total beta-activity in cow’s milk. Considerable efforts were made to evaluate the reliability of the measurements that were collected. The electronic database can be searched according to type of measurement, date, and location. The main purpose of the database is to provide reliable data that can be used in the reconstruction of thyroid doses resulting from the Chernobyl accident. PMID:23103580
Review of Methods for Buildings Energy Performance Modelling
NASA Astrophysics Data System (ADS)
Krstić, Hrvoje; Teni, Mihaela
2017-10-01
Research presented in this paper gives a brief review of methods used for buildings energy performance modelling. This paper gives also a comprehensive review of the advantages and disadvantages of available methods as well as the input parameters used for modelling buildings energy performance. European Directive EPBD obliges the implementation of energy certification procedure which gives an insight on buildings energy performance via exiting energy certificate databases. Some of the methods for buildings energy performance modelling mentioned in this paper are developed by employing data sets of buildings which have already undergone an energy certification procedure. Such database is used in this paper where the majority of buildings in the database have already gone under some form of partial retrofitting - replacement of windows or installation of thermal insulation but still have poor energy performance. The case study presented in this paper utilizes energy certificates database obtained from residential units in Croatia (over 400 buildings) in order to determine the dependence between buildings energy performance and variables from database by using statistical dependencies tests. Building energy performance in database is presented with building energy efficiency rate (from A+ to G) which is based on specific annual energy needs for heating for referential climatic data [kWh/(m2a)]. Independent variables in database are surfaces and volume of the conditioned part of the building, building shape factor, energy used for heating, CO2 emission, building age and year of reconstruction. Research results presented in this paper give an insight in possibilities of methods used for buildings energy performance modelling. Further on it gives an analysis of dependencies between buildings energy performance as a dependent variable and independent variables from the database. Presented results could be used for development of new building energy performance predictive model.
Validated MicroRNA Target Databases: An Evaluation.
Lee, Yun Ji Diana; Kim, Veronica; Muth, Dillon C; Witwer, Kenneth W
2015-11-01
Preclinical Research Positive findings from preclinical and clinical studies involving depletion or supplementation of microRNA (miRNA) engender optimism about miRNA-based therapeutics. However, off-target effects must be considered. Predicting these effects is complicated. Each miRNA may target many gene transcripts, and the rules governing imperfectly complementary miRNA: target interactions are incompletely understood. Several databases provide lists of the relatively small number of experimentally confirmed miRNA: target pairs. Although incomplete, this information might allow assessment of at least some of the off-target effects. We evaluated the performance of four databases of experimentally validated miRNA: target interactions (miRWalk 2.0, miRTarBase, miRecords, and TarBase 7.0) using a list of 50 alphabetically consecutive genes. We examined the provided citations to determine the degree to which each interaction was experimentally supported. To assess stability, we tested at the beginning and end of a five-month period. Results varied widely by database. Two of the databases changed significantly over the course of 5 months. Most reported evidence for miRNA: target interactions were indirect or otherwise weak, and relatively few interactions were supported by more than one publication. Some returned results appear to arise from simplistic text searches that offer no insight into the relationship of the search terms, may not even include the reported gene or miRNA, and may thus, be invalid. We conclude that validation databases provide important information, but not all information in all extant databases is up-to-date or accurate. Nevertheless, the more comprehensive validation databases may provide useful starting points for investigation of off-target effects of proposed small RNA therapies. © 2015 Wiley Periodicals, Inc.
Voss, Erica A; Makadia, Rupa; Matcho, Amy; Ma, Qianli; Knoll, Chris; Schuemie, Martijn; DeFalco, Frank J; Londhe, Ajit; Zhu, Vivienne; Ryan, Patrick B
2015-05-01
To evaluate the utility of applying the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) across multiple observational databases within an organization and to apply standardized analytics tools for conducting observational research. Six deidentified patient-level datasets were transformed to the OMOP CDM. We evaluated the extent of information loss that occurred through the standardization process. We developed a standardized analytic tool to replicate the cohort construction process from a published epidemiology protocol and applied the analysis to all 6 databases to assess time-to-execution and comparability of results. Transformation to the CDM resulted in minimal information loss across all 6 databases. Patients and observations excluded were due to identified data quality issues in the source system, 96% to 99% of condition records and 90% to 99% of drug records were successfully mapped into the CDM using the standard vocabulary. The full cohort replication and descriptive baseline summary was executed for 2 cohorts in 6 databases in less than 1 hour. The standardization process improved data quality, increased efficiency, and facilitated cross-database comparisons to support a more systematic approach to observational research. Comparisons across data sources showed consistency in the impact of inclusion criteria, using the protocol and identified differences in patient characteristics and coding practices across databases. Standardizing data structure (through a CDM), content (through a standard vocabulary with source code mappings), and analytics can enable an institution to apply a network-based approach to observational research across multiple, disparate observational health databases. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association.
Development, deployment and operations of ATLAS databases
NASA Astrophysics Data System (ADS)
Vaniachine, A. V.; Schmitt, J. G. v. d.
2008-07-01
In preparation for ATLAS data taking, a coordinated shift from development towards operations has occurred in ATLAS database activities. In addition to development and commissioning activities in databases, ATLAS is active in the development and deployment (in collaboration with the WLCG 3D project) of the tools that allow the worldwide distribution and installation of databases and related datasets, as well as the actual operation of this system on ATLAS multi-grid infrastructure. We describe development and commissioning of major ATLAS database applications for online and offline. We present the first scalability test results and ramp-up schedule over the initial LHC years of operations towards the nominal year of ATLAS running, when the database storage volumes are expected to reach 6.1 TB for the Tag DB and 1.0 TB for the Conditions DB. ATLAS database applications require robust operational infrastructure for data replication between online and offline at Tier-0, and for the distribution of the offline data to Tier-1 and Tier-2 computing centers. We describe ATLAS experience with Oracle Streams and other technologies for coordinated replication of databases in the framework of the WLCG 3D services.
Database tomography for commercial application
NASA Technical Reports Server (NTRS)
Kostoff, Ronald N.; Eberhart, Henry J.
1994-01-01
Database tomography is a method for extracting themes and their relationships from text. The algorithms, employed begin with word frequency and word proximity analysis and build upon these results. When the word 'database' is used, think of medical or police records, patents, journals, or papers, etc. (any text information that can be computer stored). Database tomography features a full text, user interactive technique enabling the user to identify areas of interest, establish relationships, and map trends for a deeper understanding of an area of interest. Database tomography concepts and applications have been reported in journals and presented at conferences. One important feature of the database tomography algorithm is that it can be used on a database of any size, and will facilitate the users ability to understand the volume of content therein. While employing the process to identify research opportunities it became obvious that this promising technology has potential applications for business, science, engineering, law, and academe. Examples include evaluating marketing trends, strategies, relationships and associations. Also, the database tomography process would be a powerful component in the area of competitive intelligence, national security intelligence and patent analysis. User interests and involvement cannot be overemphasized.
Technical Aspects of Interfacing MUMPS to an External SQL Relational Database Management System
Kuzmak, Peter M.; Walters, Richard F.; Penrod, Gail
1988-01-01
This paper describes an interface connecting InterSystems MUMPS (M/VX) to an external relational DBMS, the SYBASE Database Management System. The interface enables MUMPS to operate in a relational environment and gives the MUMPS language full access to a complete set of SQL commands. MUMPS generates SQL statements as ASCII text and sends them to the RDBMS. The RDBMS executes the statements and returns ASCII results to MUMPS. The interface suggests that the language features of MUMPS make it an attractive tool for use in the relational database environment. The approach described in this paper separates MUMPS from the relational database. Positioning the relational database outside of MUMPS promotes data sharing and permits a number of different options to be used for working with the data. Other languages like C, FORTRAN, and COBOL can access the RDBMS database. Advanced tools provided by the relational database vendor can also be used. SYBASE is an advanced high-performance transaction-oriented relational database management system for the VAX/VMS and UNIX operating systems. SYBASE is designed using a distributed open-systems architecture, and is relatively easy to interface with MUMPS.
Gold, L S; Slone, T H; Backman, G M; Magaw, R; Da Costa, M; Lopipero, P; Blumenthal, M; Ames, B N
1987-01-01
This paper is the second chronological supplement to the Carcinogenic Potency Database, published earlier in this journal (1,2,4). We report here results of carcinogenesis bioassays published in the general literature between January 1983 and December 1984, and in Technical Reports of the National Cancer Institute/National Toxicology Program between January 1983 and May 1986. This supplement includes results of 525 long-term, chronic experiments of 199 test compounds, and reports the same information about each experiment in the same plot format as the earlier papers: e.g., the species and strain of test animal, the route and duration of compound administration, dose level and other aspects of experimental protocol, histopathology and tumor incidence, TD50 (carcinogenic potency) and its statistical significance, dose response, author's opinion about carcinogenicity, and literature citation. We refer the reader to the 1984 publications for a description of the numerical index of carcinogenic potency (TD50), a guide to the plot of the database, and a discussion of the sources of data, the rationale for the inclusion of particular experiments and particular target sites, and the conventions adopted in summarizing the literature. The three plots of the database are to be used together, since results of experiments published in earlier plots are not repeated. Taken together, the three plots include results for more than 3500 experiments on 975 chemicals. Appendix 14 is an index to all chemicals in the database and indicates which plot(s) each chemical appears in. PMID:3691431
Gold, L S; Slone, T H; Backman, G M; Eisenberg, S; Da Costa, M; Wong, M; Manley, N B; Rohrbach, L; Ames, B N
1990-01-01
This paper is the third chronological supplement to the Carcinogenic Potency Database that first appeared in this journal in 1984. We report here results of carcinogenesis bioassays published in the general literature between January 1985 and December 1986, and in Technical Reports of the National Toxicology Program between June 1986 and June 1987. This supplement includes results of 337 long-term, chronic experiments of 121 compounds, and reports the same information about each experiment in the same plot format as the earlier papers, e.g., the species and strain of animal, the route and duration of compound administration, dose level, and other aspects of experimental protocol, histopathology, and tumor incidence, TD50 (carcinogenic potency) and its statistical significance, dose response, opinion of the author about carcinogenicity, and literature citation. The reader needs to refer to the 1984 publication for a guide to the plot of the database, a complete description of the numerical index of carcinogenic potency, and a discussion of the sources of data, the rationale for the inclusion of particular experiments and particular target sites, and the conventions adopted in summarizing the literature. The four plots of the database are to be used together as results published earlier are not repeated. In all, the four plots include results for approximately 4000 experiments on 1050 chemicals. Appendix 14 of this paper is an alphabetical index to all chemicals in the database and indicates which plot(s) each chemical appears in. A combined plot of all results from the four separate papers, that is ordered alphabetically by chemical, is available from the first author, in printed form or on computer tape or diskette. PMID:2351123
Peptide reranking with protein-peptide correspondence and precursor peak intensity information.
Yang, Chao; He, Zengyou; Yang, Can; Yu, Weichuan
2012-01-01
Searching tandem mass spectra against a protein database has been a mainstream method for peptide identification. Improving peptide identification results by ranking true Peptide-Spectrum Matches (PSMs) over their false counterparts leads to the development of various reranking algorithms. In peptide reranking, discriminative information is essential to distinguish true PSMs from false PSMs. Generally, most peptide reranking methods obtain discriminative information directly from database search scores or by training machine learning models. Information in the protein database and MS1 spectra (i.e., single stage MS spectra) is ignored. In this paper, we propose to use information in the protein database and MS1 spectra to rerank peptide identification results. To quantitatively analyze their effects to peptide reranking results, three peptide reranking methods are proposed: PPMRanker, PPIRanker, and MIRanker. PPMRanker only uses Protein-Peptide Map (PPM) information from the protein database, PPIRanker only uses Precursor Peak Intensity (PPI) information, and MIRanker employs both PPM information and PPI information. According to our experiments on a standard protein mixture data set, a human data set and a mouse data set, PPMRanker and MIRanker achieve better peptide reranking results than PetideProphet, PeptideProphet+NSP (number of sibling peptides) and a score regularization method SRPI. The source codes of PPMRanker, PPIRanker, and MIRanker, and all supplementary documents are available at our website: http://bioinformatics.ust.hk/pepreranking/. Alternatively, these documents can also be downloaded from: http://sourceforge.net/projects/pepreranking/.
Subject searching of monographs online in the medical literature.
Brahmi, F A
1988-01-01
Searching by subject for monographic information online in the medical literature is a challenging task. The NLM database of choice is CATLINE. Other NLM databases of interest are BIOTHICSLINE, CANCERLIT, HEALTH, POPLINE, and TOXLINE. Ten BRS databases are also discussed. Of these, Books in Print, Bookinfo, and OCLC are explored further. The databases are compared as to number of total records and number and percentage of monographs. Three topics were searched on CROSS to compare hits on BBIP, BOOK, and OCLC. The same searches were run on CATLINE. The parameters of time coverage and language were equalized and the resulting citations were compared and analyzed for duplication and uniqueness. With the input of CATLINE tapes into OCLC, OCLC has become the database of choice for searching by subject for medical monographs.
Conceptual and logical level of database modeling
NASA Astrophysics Data System (ADS)
Hunka, Frantisek; Matula, Jiri
2016-06-01
Conceptual and logical levels form the top most levels of database modeling. Usually, ORM (Object Role Modeling) and ER diagrams are utilized to capture the corresponding schema. The final aim of business process modeling is to store its results in the form of database solution. For this reason, value oriented business process modeling which utilizes ER diagram to express the modeling entities and relationships between them are used. However, ER diagrams form the logical level of database schema. To extend possibilities of different business process modeling methodologies, the conceptual level of database modeling is needed. The paper deals with the REA value modeling approach to business process modeling using ER-diagrams, and derives conceptual model utilizing ORM modeling approach. Conceptual model extends possibilities for value modeling to other business modeling approaches.
A RESEARCH DATABASE FOR IMPROVED DATA MANAGEMENT AND ANALYSIS IN LONGITUDINAL STUDIES
BIELEFELD, ROGER A.; YAMASHITA, TOYOKO S.; KEREKES, EDWARD F.; ERCANLI, EHAT; SINGER, LYNN T.
2014-01-01
We developed a research database for a five-year prospective investigation of the medical, social, and developmental correlates of chronic lung disease during the first three years of life. We used the Ingres database management system and the Statit statistical software package. The database includes records containing 1300 variables each, the results of 35 psychological tests, each repeated five times (providing longitudinal data on the child, the parents, and behavioral interactions), both raw and calculated variables, and both missing and deferred values. The four-layer menu-driven user interface incorporates automatic activation of complex functions to handle data verification, missing and deferred values, static and dynamic backup, determination of calculated values, display of database status, reports, bulk data extraction, and statistical analysis. PMID:7596250
A high performance, ad-hoc, fuzzy query processing system for relational databases
NASA Technical Reports Server (NTRS)
Mansfield, William H., Jr.; Fleischman, Robert M.
1992-01-01
Database queries involving imprecise or fuzzy predicates are currently an evolving area of academic and industrial research. Such queries place severe stress on the indexing and I/O subsystems of conventional database environments since they involve the search of large numbers of records. The Datacycle architecture and research prototype is a database environment that uses filtering technology to perform an efficient, exhaustive search of an entire database. It has recently been modified to include fuzzy predicates in its query processing. The approach obviates the need for complex index structures, provides unlimited query throughput, permits the use of ad-hoc fuzzy membership functions, and provides a deterministic response time largely independent of query complexity and load. This paper describes the Datacycle prototype implementation of fuzzy queries and some recent performance results.
Functionally Graded Materials Database
NASA Astrophysics Data System (ADS)
Kisara, Katsuto; Konno, Tomomi; Niino, Masayuki
2008-02-01
Functionally Graded Materials Database (hereinafter referred to as FGMs Database) was open to the society via Internet in October 2002, and since then it has been managed by the Japan Aerospace Exploration Agency (JAXA). As of October 2006, the database includes 1,703 research information entries with 2,429 researchers data, 509 institution data and so on. Reading materials such as "Applicability of FGMs Technology to Space Plane" and "FGMs Application to Space Solar Power System (SSPS)" were prepared in FY 2004 and 2005, respectively. The English version of "FGMs Application to Space Solar Power System (SSPS)" is now under preparation. This present paper explains the FGMs Database, describing the research information data, the sitemap and how to use it. From the access analysis, user access results and users' interests are discussed.
SSME environment database development
NASA Technical Reports Server (NTRS)
Reardon, John
1987-01-01
The internal environment of the Space Shuttle Main Engine (SSME) is being determined from hot firings of the prototype engines and from model tests using either air or water as the test fluid. The objectives are to develop a database system to facilitate management and analysis of test measurements and results, to enter available data into the the database, and to analyze available data to establish conventions and procedures to provide consistency in data normalization and configuration geometry references.
Access to DNA and protein databases on the Internet.
Harper, R
1994-02-01
During the past year, the number of biological databases that can be queried via Internet has dramatically increased. This increase has resulted from the introduction of networking tools, such as Gopher and WAIS, that make it easy for research workers to index databases and make them available for on-line browsing. Biocomputing in the nineties will see the advent of more client/server options for the solution of problems in bioinformatics.
A reference system for animal biometrics: application to the northern leopard frog
Petrovska-Delacretaz, D.; Edwards, A.; Chiasson, J.; Chollet, G.; Pilliod, D.S.
2014-01-01
Reference systems and public databases are available for human biometrics, but to our knowledge nothing is available for animal biometrics. This is surprising because animals are not required to give their agreement to be in a database. This paper proposes a reference system and database for the northern leopard frog (Lithobates pipiens). Both are available for reproducible experiments. Results of both open set and closed set experiments are given.
WebBee: A Platform for Secure Coordination and Communication in Crisis Scenarios
2008-04-16
implemented through database triggers. The Webbee Database Server contains an Information Server, which is a Postgres database with PostGIS [5] extension...sends it to the target user. The heavy lifting for this mechanism is done through an extension of Postgres triggers (Figures 6.1 and 6.2), resulting...in fewer queries and better performance. Trigger support in Postgres is table-based and comparatively primitive: with n table triggers, an update
Measurement tools for the diagnosis of nasal septal deviation: a systematic review
2014-01-01
Objective To perform a systematic review of measurement tools utilized for the diagnosis of nasal septal deviation (NSD). Methods Electronic database searches were performed using MEDLINE (from 1966 to second week of August 2013), EMBASE (from 1966 to second week of August 2013), Web of Science (from 1945 to second week of August 2013) and all Evidence Based Medicine Reviews Files (EBMR); Cochrane Database of Systematic Review (CDSR), Cochrane Central Register of Controlled Trials (CCTR), Cochrane Methodology Register (CMR), Database of Abstracts of Reviews of Effects (DARE), American College of Physicians Journal Club (ACP Journal Club), Health Technology Assessments (HTA), NHS Economic Evaluation Database (NHSEED) till the second quarter of 2013. The search terms used in database searches were ‘nasal septum’, ‘deviation’, ‘diagnosis’, ‘nose deformities’ and ‘nose malformation’. The studies were reviewed using the updated Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool. Results Online searches resulted in 23 abstracts after removal of duplicates that resulted from overlap of studies between the electronic databases. An additional 15 abstracts were excluded due to lack of relevance. A total of 8 studies were systematically reviewed. Conclusions Diagnostic modalities such as acoustic rhinometry, rhinomanometry and nasal spectral sound analysis may be useful in identifying NSD in anterior region of the nasal cavity, but these tests in isolation are of limited utility. Compared to anterior rhinoscopy, nasal endoscopy, and imaging the above mentioned index tests lack sensitivity and specificity in identifying the presence, location, and severity of NSD. PMID:24762010
Developing an A Priori Database for Passive Microwave Snow Water Retrievals Over Ocean
NASA Astrophysics Data System (ADS)
Yin, Mengtao; Liu, Guosheng
2017-12-01
A physically optimized a priori database is developed for Global Precipitation Measurement Microwave Imager (GMI) snow water retrievals over ocean. The initial snow water content profiles are derived from CloudSat Cloud Profiling Radar (CPR) measurements. A radiative transfer model in which the single-scattering properties of nonspherical snowflakes are based on the discrete dipole approximate results is employed to simulate brightness temperatures and their gradients. Snow water content profiles are then optimized through a one-dimensional variational (1D-Var) method. The standard deviations of the difference between observed and simulated brightness temperatures are in a similar magnitude to the observation errors defined for observation error covariance matrix after the 1D-Var optimization, indicating that this variational method is successful. This optimized database is applied in a Bayesian retrieval snow water algorithm. The retrieval results indicated that the 1D-Var approach has a positive impact on the GMI retrieved snow water content profiles by improving the physical consistency between snow water content profiles and observed brightness temperatures. Global distribution of snow water contents retrieved from the a priori database is compared with CloudSat CPR estimates. Results showed that the two estimates have a similar pattern of global distribution, and the difference of their global means is small. In addition, we investigate the impact of using physical parameters to subset the database on snow water retrievals. It is shown that using total precipitable water to subset the database with 1D-Var optimization is beneficial for snow water retrievals.
Achieving high confidence protein annotations in a sea of unknowns
NASA Astrophysics Data System (ADS)
Timmins-Schiffman, E.; May, D. H.; Noble, W. S.; Nunn, B. L.; Mikan, M.; Harvey, H. R.
2016-02-01
Increased sensitivity of mass spectrometry (MS) technology allows deep and broad insight into community functional analyses. Metaproteomics holds the promise to reveal functional responses of natural microbial communities, whereas metagenomics alone can only hint at potential functions. The complex datasets resulting from ocean MS have the potential to inform diverse realms of the biological, chemical, and physical ocean sciences, yet the extent of bacterial functional diversity and redundancy has not been fully explored. To take advantage of these impressive datasets, we need a clear bioinformatics pipeline for metaproteomics peptide identification and annotation with a database that can provide confident identifications. Researchers must consider whether it is sufficient to leverage the vast quantities of available ocean sequence data or if they must invest in site-specific metagenomic sequencing. We have sequenced, to our knowledge, the first western arctic metagenomes from the Bering Strait and the Chukchi Sea. We have addressed the long standing question: Is a metagenome required to accurately complete metaproteomics and assess the biological distribution of metabolic functions controlling nutrient acquisition in the ocean? Two different protein databases were constructed from 1) a site-specific metagenome and 2) subarctic/arctic groups available in NCBI's non-redundant database. Multiple proteomic search strategies were employed, against each individual database and against both databases combined, to determine the algorithm and approach that yielded the balance of high sensitivity and confident identification. Results yielded over 8200 confidently identified proteins. Our comparison of these results allows us to quantify the utility of investing resources in a metagenome versus using the constantly expanding and immediately available public databases for metaproteomic studies.
Fine-grained policy control in U.S. Army Research Laboratory (ARL) multimodal signatures database
NASA Astrophysics Data System (ADS)
Bennett, Kelly; Grueneberg, Keith; Wood, David; Calo, Seraphin
2014-06-01
The U.S. Army Research Laboratory (ARL) Multimodal Signatures Database (MMSDB) consists of a number of colocated relational databases representing a collection of data from various sensors. Role-based access to this data is granted to external organizations such as DoD contractors and other government agencies through a client Web portal. In the current MMSDB system, access control is only at the database and firewall level. In order to offer finer grained security, changes to existing user profile schemas and authentication mechanisms are usually needed. In this paper, we describe a software middleware architecture and implementation that allows fine-grained access control to the MMSDB at a dataset, table, and row level. Result sets from MMSDB queries issued in the client portal are filtered with the use of a policy enforcement proxy, with minimal changes to the existing client software and database. Before resulting data is returned to the client, policies are evaluated to determine if the user or role is authorized to access the data. Policies can be authored to filter data at the row, table or column level of a result set. The system uses various technologies developed in the International Technology Alliance in Network and Information Science (ITA) for policy-controlled information sharing and dissemination1. Use of the Policy Management Library provides a mechanism for the management and evaluation of policies to support finer grained access to the data in the MMSDB system. The GaianDB is a policy-enabled, federated database that acts as a proxy between the client application and the MMSDB system.
dbMDEGA: a database for meta-analysis of differentially expressed genes in autism spectrum disorder.
Zhang, Shuyun; Deng, Libin; Jia, Qiyue; Huang, Shaoting; Gu, Junwang; Zhou, Fankun; Gao, Meng; Sun, Xinyi; Feng, Chang; Fan, Guangqin
2017-11-16
Autism spectrum disorders (ASD) are hereditary, heterogeneous and biologically complex neurodevelopmental disorders. Individual studies on gene expression in ASD cannot provide clear consensus conclusions. Therefore, a systematic review to synthesize the current findings from brain tissues and a search tool to share the meta-analysis results are urgently needed. Here, we conducted a meta-analysis of brain gene expression profiles in the current reported human ASD expression datasets (with 84 frozen male cortex samples, 17 female cortex samples, 32 cerebellum samples and 4 formalin fixed samples) and knock-out mouse ASD model expression datasets (with 80 collective brain samples). Then, we applied R language software and developed an interactive shared and updated database (dbMDEGA) displaying the results of meta-analysis of data from ASD studies regarding differentially expressed genes (DEGs) in the brain. This database, dbMDEGA ( https://dbmdega.shinyapps.io/dbMDEGA/ ), is a publicly available web-portal for manual annotation and visualization of DEGs in the brain from data from ASD studies. This database uniquely presents meta-analysis values and homologous forest plots of DEGs in brain tissues. Gene entries are annotated with meta-values, statistical values and forest plots of DEGs in brain samples. This database aims to provide searchable meta-analysis results based on the current reported brain gene expression datasets of ASD to help detect candidate genes underlying this disorder. This new analytical tool may provide valuable assistance in the discovery of DEGs and the elucidation of the molecular pathogenicity of ASD. This database model may be replicated to study other disorders.
Côté, Richard G; Jones, Philip; Martens, Lennart; Kerrien, Samuel; Reisinger, Florian; Lin, Quan; Leinonen, Rasko; Apweiler, Rolf; Hermjakob, Henning
2007-10-18
Each major protein database uses its own conventions when assigning protein identifiers. Resolving the various, potentially unstable, identifiers that refer to identical proteins is a major challenge. This is a common problem when attempting to unify datasets that have been annotated with proteins from multiple data sources or querying data providers with one flavour of protein identifiers when the source database uses another. Partial solutions for protein identifier mapping exist but they are limited to specific species or techniques and to a very small number of databases. As a result, we have not found a solution that is generic enough and broad enough in mapping scope to suit our needs. We have created the Protein Identifier Cross-Reference (PICR) service, a web application that provides interactive and programmatic (SOAP and REST) access to a mapping algorithm that uses the UniProt Archive (UniParc) as a data warehouse to offer protein cross-references based on 100% sequence identity to proteins from over 70 distinct source databases loaded into UniParc. Mappings can be limited by source database, taxonomic ID and activity status in the source database. Users can copy/paste or upload files containing protein identifiers or sequences in FASTA format to obtain mappings using the interactive interface. Search results can be viewed in simple or detailed HTML tables or downloaded as comma-separated values (CSV) or Microsoft Excel (XLS) files suitable for use in a local database or a spreadsheet. Alternatively, a SOAP interface is available to integrate PICR functionality in other applications, as is a lightweight REST interface. We offer a publicly available service that can interactively map protein identifiers and protein sequences to the majority of commonly used protein databases. Programmatic access is available through a standards-compliant SOAP interface or a lightweight REST interface. The PICR interface, documentation and code examples are available at http://www.ebi.ac.uk/Tools/picr.
Critical assessment of human metabolic pathway databases: a stepping stone for future integration
2011-01-01
Background Multiple pathway databases are available that describe the human metabolic network and have proven their usefulness in many applications, ranging from the analysis and interpretation of high-throughput data to their use as a reference repository. However, so far the various human metabolic networks described by these databases have not been systematically compared and contrasted, nor has the extent to which they differ been quantified. For a researcher using these databases for particular analyses of human metabolism, it is crucial to know the extent of the differences in content and their underlying causes. Moreover, the outcomes of such a comparison are important for ongoing integration efforts. Results We compared the genes, EC numbers and reactions of five frequently used human metabolic pathway databases. The overlap is surprisingly low, especially on reaction level, where the databases agree on 3% of the 6968 reactions they have combined. Even for the well-established tricarboxylic acid cycle the databases agree on only 5 out of the 30 reactions in total. We identified the main causes for the lack of overlap. Importantly, the databases are partly complementary. Other explanations include the number of steps a conversion is described in and the number of possible alternative substrates listed. Missing metabolite identifiers and ambiguous names for metabolites also affect the comparison. Conclusions Our results show that each of the five networks compared provides us with a valuable piece of the puzzle of the complete reconstruction of the human metabolic network. To enable integration of the networks, next to a need for standardizing the metabolite names and identifiers, the conceptual differences between the databases should be resolved. Considerable manual intervention is required to reach the ultimate goal of a unified and biologically accurate model for studying the systems biology of human metabolism. Our comparison provides a stepping stone for such an endeavor. PMID:21999653
SPECIES DATABASES AND THE BIOINFORMATICS REVOLUTION.
Biological databases are having a growth spurt. Much of this results from research in genetics and biodiversity, coupled with fast-paced developments in information technology. The revolution in bioinformatics, defined by Sugden and Pennisi (2000) as the "tools and techniques for...
Content Based Image Matching for Planetary Science
NASA Astrophysics Data System (ADS)
Deans, M. C.; Meyer, C.
2006-12-01
Planetary missions generate large volumes of data. With the MER rovers still functioning on Mars, PDS contains over 7200 released images from the Microscopic Imagers alone. These data products are only searchable by keys such as the Sol, spacecraft clock, or rover motion counter index, with little connection to the semantic content of the images. We have developed a method for matching images based on the visual textures in images. For every image in a database, a series of filters compute the image response to localized frequencies and orientations. Filter responses are turned into a low dimensional descriptor vector, generating a 37 dimensional fingerprint. For images such as the MER MI, this represents a compression ratio of 99.9965% (the fingerprint is approximately 0.0035% the size of the original image). At query time, fingerprints are quickly matched to find images with similar appearance. Image databases containing several thousand images are preprocessed offline in a matter of hours. Image matches from the database are found in a matter of seconds. We have demonstrated this image matching technique using three sources of data. The first database consists of 7200 images from the MER Microscopic Imager. The second database consists of 3500 images from the Narrow Angle Mars Orbital Camera (MOC-NA), which were cropped into 1024×1024 sub-images for consistency. The third database consists of 7500 scanned archival photos from the Apollo Metric Camera. Example query results from all three data sources are shown. We have also carried out user tests to evaluate matching performance by hand labeling results. User tests verify approximately 20% false positive rate for the top 14 results for MOC NA and MER MI data. This means typically 10 to 12 results out of 14 match the query image sufficiently. This represents a powerful search tool for databases of thousands of images where the a priori match probability for an image might be less than 1%. Qualitatively, correct matches can also be confirmed by verifying MI images taken in the same z-stack, or MOC image tiles taken from the same image strip. False negatives are difficult to quantify as it would mean finding matches in the database of thousands of images that the algorithm did not detect.
Analysis of human serum phosphopeptidome by a focused database searching strategy.
Zhu, Jun; Wang, Fangjun; Cheng, Kai; Song, Chunxia; Qin, Hongqiang; Hu, Lianghai; Figeys, Daniel; Ye, Mingliang; Zou, Hanfa
2013-01-14
As human serum is an important source for early diagnosis of many serious diseases, analysis of serum proteome and peptidome has been extensively performed. However, the serum phosphopeptidome was less explored probably because the effective method for database searching is lacking. Conventional database searching strategy always uses the whole proteome database, which is very time-consuming for phosphopeptidome search due to the huge searching space resulted from the high redundancy of the database and the setting of dynamic modifications during searching. In this work, a focused database searching strategy using an in-house collected human serum pro-peptidome target/decoy database (HuSPep) was established. It was found that the searching time was significantly decreased without compromising the identification sensitivity. By combining size-selective Ti (IV)-MCM-41 enrichment, RP-RP off-line separation, and complementary CID and ETD fragmentation with the new searching strategy, 143 unique endogenous phosphopeptides and 133 phosphorylation sites (109 novel sites) were identified from human serum with high reliability. Copyright © 2012 Elsevier B.V. All rights reserved.
A database of georeferenced nutrient chemistry data for mountain lakes of the Western United States
Williams, Jason; Labou, Stephanie G.
2017-01-01
Human activities have increased atmospheric nitrogen and phosphorus deposition rates relative to pre-industrial background. In the Western U.S., anthropogenic nutrient deposition has increased nutrient concentrations and stimulated algal growth in at least some remote mountain lakes. The Georeferenced Lake Nutrient Chemistry (GLNC) Database was constructed to create a spatially-extensive lake chemistry database needed to assess atmospheric nutrient deposition effects on Western U.S. mountain lakes. The database includes nitrogen and phosphorus water chemistry data spanning 1964–2015, with 148,336 chemistry results from 51,048 samples collected across 3,602 lakes in the Western U.S. Data were obtained from public databases, government agencies, scientific literature, and researchers, and were formatted into a consistent table structure. All data are georeferenced to a modified version of the National Hydrography Dataset Plus version 2. The database is transparent and reproducible; R code and input files used to format data are provided in an appendix. The database will likely be useful to those assessing spatial patterns of lake nutrient chemistry associated with atmospheric deposition or other environmental stressors. PMID:28509907
The Halophile protein database.
Sharma, Naveen; Farooqi, Mohammad Samir; Chaturvedi, Krishna Kumar; Lal, Shashi Bhushan; Grover, Monendra; Rai, Anil; Pandey, Pankaj
2014-01-01
Halophilic archaea/bacteria adapt to different salt concentration, namely extreme, moderate and low. These type of adaptations may occur as a result of modification of protein structure and other changes in different cell organelles. Thus proteins may play an important role in the adaptation of halophilic archaea/bacteria to saline conditions. The Halophile protein database (HProtDB) is a systematic attempt to document the biochemical and biophysical properties of proteins from halophilic archaea/bacteria which may be involved in adaptation of these organisms to saline conditions. In this database, various physicochemical properties such as molecular weight, theoretical pI, amino acid composition, atomic composition, estimated half-life, instability index, aliphatic index and grand average of hydropathicity (Gravy) have been listed. These physicochemical properties play an important role in identifying the protein structure, bonding pattern and function of the specific proteins. This database is comprehensive, manually curated, non-redundant catalogue of proteins. The database currently contains 59 897 proteins properties extracted from 21 different strains of halophilic archaea/bacteria. The database can be accessed through link. Database URL: http://webapp.cabgrid.res.in/protein/ © The Author(s) 2014. Published by Oxford University Press.
Domain Regeneration for Cross-Database Micro-Expression Recognition
NASA Astrophysics Data System (ADS)
Zong, Yuan; Zheng, Wenming; Huang, Xiaohua; Shi, Jingang; Cui, Zhen; Zhao, Guoying
2018-05-01
In this paper, we investigate the cross-database micro-expression recognition problem, where the training and testing samples are from two different micro-expression databases. Under this setting, the training and testing samples would have different feature distributions and hence the performance of most existing micro-expression recognition methods may decrease greatly. To solve this problem, we propose a simple yet effective method called Target Sample Re-Generator (TSRG) in this paper. By using TSRG, we are able to re-generate the samples from target micro-expression database and the re-generated target samples would share same or similar feature distributions with the original source samples. For this reason, we can then use the classifier learned based on the labeled source samples to accurately predict the micro-expression categories of the unlabeled target samples. To evaluate the performance of the proposed TSRG method, extensive cross-database micro-expression recognition experiments designed based on SMIC and CASME II databases are conducted. Compared with recent state-of-the-art cross-database emotion recognition methods, the proposed TSRG achieves more promising results.
Moreo, Michael T.; Justet, Leigh
2008-01-01
Ground-water withdrawal estimates from 1913 through 2003 for the Death Valley regional ground-water flow system are compiled in an electronic database to support a regional, three-dimensional, transient ground-water flow model. This database updates a previously published database that compiled estimates of ground-water withdrawals for 1913-1998. The same methodology is used to construct each database. Primary differences between the 2 databases are an additional 5 years of ground-water withdrawal data, well locations in the updated database are restricted to Death Valley regional ground-water flow system model boundary, and application rates are from 0 to 1.5 feet per year lower than original estimates. The lower application rates result from revised estimates of crop consumptive use, which are based on updated estimates of potential evapotranspiration. In 2003, about 55,700 acre-feet of ground water was pumped in the DVRFS, of which 69 percent was used for irrigation, 13 percent for domestic, and 18 percent for public supply, commercial, and mining activities.
A storage scheme for the real-time database supporting the on-line commitment
NASA Astrophysics Data System (ADS)
Dai, Hong-bin; Jing, Yu-jian; Wang, Hui
2013-07-01
The modern SCADA (Supervisory Control and Data acquisition) systems have been applied to various aspects of everyday life. As the time goes on, the requirements of the applications of the systems vary. Thus the data structure of the real-time database, which is the core of a SCADA system, often needs modification. As a result, the commitment consisting of a sequence of configuration operations modifying the data structure of the real-time database is performed from time to time. Though it is simple to perform the off-line commitment by first stopping and then restarting the system, during which all the data in the real-time database are reconstructed. It is much more preferred or in some cases even necessary to perform the on-line commitment, during which the real-time database can still provide real-time service and the system continues working normally. In this paper, a storage scheme of the data in the real-time database is proposed. It helps the real-time database support its on-line commitment, during which real-time service is still available.
Development of a land-cover characteristics database for the conterminous U.S.
Loveland, Thomas R.; Merchant, J.W.; Ohlen, D.O.; Brown, Jesslyn F.
1991-01-01
Information regarding the characteristics and spatial distribution of the Earth's land cover is critical to global environmental research. A prototype land-cover database for the conterminous United States designed for use in a variety of global modelling, monitoring, mapping, and analytical endeavors has been created. The resultant database contains multiple layers, including the source AVHRR data, the ancillary data layers, the land-cover regions defined by the research, and translation tables linking the regions to other land classification schema (for example, UNESCO, USGS Anderson System). The land-cover characteristics database can be analyzed, transformed, or aggregated by users to meet a broad spectrum of requirements. -from Authors
The methodology of database design in organization management systems
NASA Astrophysics Data System (ADS)
Chudinov, I. L.; Osipova, V. V.; Bobrova, Y. V.
2017-01-01
The paper describes the unified methodology of database design for management information systems. Designing the conceptual information model for the domain area is the most important and labor-intensive stage in database design. Basing on the proposed integrated approach to design, the conceptual information model, the main principles of developing the relation databases are provided and user’s information needs are considered. According to the methodology, the process of designing the conceptual information model includes three basic stages, which are defined in detail. Finally, the article describes the process of performing the results of analyzing user’s information needs and the rationale for use of classifiers.
Construction of a Linux based chemical and biological information system.
Molnár, László; Vágó, István; Fehér, András
2003-01-01
A chemical and biological information system with a Web-based easy-to-use interface and corresponding databases has been developed. The constructed system incorporates all chemical, numerical and textual data related to the chemical compounds, including numerical biological screen results. Users can search the database by traditional textual/numerical and/or substructure or similarity queries through the web interface. To build our chemical database management system, we utilized existing IT components such as ORACLE or Tripos SYBYL for database management and Zope application server for the web interface. We chose Linux as the main platform, however, almost every component can be used under various operating systems.
LSE-Sign: A lexical database for Spanish Sign Language.
Gutierrez-Sigut, Eva; Costello, Brendan; Baus, Cristina; Carreiras, Manuel
2016-03-01
The LSE-Sign database is a free online tool for selecting Spanish Sign Language stimulus materials to be used in experiments. It contains 2,400 individual signs taken from a recent standardized LSE dictionary, and a further 2,700 related nonsigns. Each entry is coded for a wide range of grammatical, phonological, and articulatory information, including handshape, location, movement, and non-manual elements. The database is accessible via a graphically based search facility which is highly flexible both in terms of the search options available and the way the results are displayed. LSE-Sign is available at the following website: http://www.bcbl.eu/databases/lse/.
Brown, Sandra [University of Illinois, Urbana, Illinois (USA); Iverson, Louis R. [University of Illinois, Urbana, Illinois (USA); Prasad, Anantha [University of Illinois, Urbana, Illinois (USA); Beaty, Tammy W. [CDIAC, Oak Ridge National Laboratory, Oak Ridge, TN (USA); Olsen, Lisa M. [CDIAC, Oak Ridge National Laboratory, Oak Ridge, TN (USA); Cushman, Robert M. [CDIAC, Oak Ridge National Laboratory, Oak Ridge, TN (USA); Brenkert, Antoinette L. [CDIAC, Oak Ridge National Laboratory, Oak Ridge, TN (USA)
2001-03-01
A database was generated of estimates of geographically referenced carbon densities of forest vegetation in tropical Southeast Asia for 1980. A geographic information system (GIS) was used to incorporate spatial databases of climatic, edaphic, and geomorphological indices and vegetation to estimate potential (i.e., in the absence of human intervention and natural disturbance) carbon densities of forests. The resulting map was then modified to estimate actual 1980 carbon density as a function of population density and climatic zone. The database covers the following 13 countries: Bangladesh, Brunei, Cambodia (Campuchea), India, Indonesia, Laos, Malaysia, Myanmar (Burma), Nepal, the Philippines, Sri Lanka, Thailand, and Vietnam.
Centralized database for interconnection system design. [for spacecraft
NASA Technical Reports Server (NTRS)
Billitti, Joseph W.
1989-01-01
A database application called DFACS (Database, Forms and Applications for Cabling and Systems) is described. The objective of DFACS is to improve the speed and accuracy of interconnection system information flow during the design and fabrication stages of a project, while simultaneously supporting both the horizontal (end-to-end wiring) and the vertical (wiring by connector) design stratagems used by the Jet Propulsion Laboratory (JPL) project engineering community. The DFACS architecture is centered around a centralized database and program methodology which emulates the manual design process hitherto used at JPL. DFACS has been tested and successfully applied to existing JPL hardware tasks with a resulting reduction in schedule time and costs.
Chan, Derek K P; Tsui, Henry C L; Kot, Brian C W
2017-11-21
Databases are systematic tools to archive and manage information related to marine mammal stranding and mortality events. Stranding response networks, governmental authorities and non-governmental organizations have established regional or national stranding networks and have developed unique standard stranding response and necropsy protocols to document and track stranded marine mammal demographics, signalment and health data. The objectives of this study were to (1) describe and review the current status of marine mammal stranding and mortality databases worldwide, including the year established, types of database and their goals; and (2) summarize the geographic range included in the database, the number of cases recorded, accessibility, filter and display methods. Peer-reviewed literature was searched, focussing on published databases of live and dead marine mammal strandings and mortality and information released from stranding response organizations (i.e. online updates, journal articles and annual stranding reports). Databases that were not published in the primary literature or recognized by government agencies were excluded. Based on these criteria, 10 marine mammal stranding and mortality databases were identified, and strandings and necropsy data found in these databases were evaluated. We discuss the results, limitations and future prospects of database development. Future prospects include the development and application of virtopsy, a new necropsy investigation tool. A centralized web-accessed database of all available postmortem multimedia from stranded marine mammals may eventually support marine conservation and policy decisions, which will allow the use of marine animals as sentinels of ecosystem health, working towards a 'One Ocean-One Health' ideal.
Does filler database size influence identification accuracy?
Bergold, Amanda N; Heaton, Paul
2018-06-01
Police departments increasingly use large photo databases to select lineup fillers using facial recognition software, but this technological shift's implications have been largely unexplored in eyewitness research. Database use, particularly if coupled with facial matching software, could enable lineup constructors to increase filler-suspect similarity and thus enhance eyewitness accuracy (Fitzgerald, Oriet, Price, & Charman, 2013). However, with a large pool of potential fillers, such technologies might theoretically produce lineup fillers too similar to the suspect (Fitzgerald, Oriet, & Price, 2015; Luus & Wells, 1991; Wells, Rydell, & Seelau, 1993). This research proposes a new factor-filler database size-as a lineup feature affecting eyewitness accuracy. In a facial recognition experiment, we select lineup fillers in a legally realistic manner using facial matching software applied to filler databases of 5,000, 25,000, and 125,000 photos, and find that larger databases are associated with a higher objective similarity rating between suspects and fillers and lower overall identification accuracy. In target present lineups, witnesses viewing lineups created from the larger databases were less likely to make correct identifications and more likely to select known innocent fillers. When the target was absent, database size was associated with a lower rate of correct rejections and a higher rate of filler identifications. Higher algorithmic similarity ratings were also associated with decreases in eyewitness identification accuracy. The results suggest that using facial matching software to select fillers from large photograph databases may reduce identification accuracy, and provides support for filler database size as a meaningful system variable. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
DOE Office of Scientific and Technical Information (OSTI.GOV)
NONE
2011-02-15
Purpose: The development of computer-aided diagnostic (CAD) methods for lung nodule detection, classification, and quantitative assessment can be facilitated through a well-characterized repository of computed tomography (CT) scans. The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI) completed such a database, establishing a publicly available reference for the medical imaging research community. Initiated by the National Cancer Institute (NCI), further advanced by the Foundation for the National Institutes of Health (FNIH), and accompanied by the Food and Drug Administration (FDA) through active participation, this public-private partnership demonstrates the success of a consortium founded on a consensus-based process.more » Methods: Seven academic centers and eight medical imaging companies collaborated to identify, address, and resolve challenging organizational, technical, and clinical issues to provide a solid foundation for a robust database. The LIDC/IDRI Database contains 1018 cases, each of which includes images from a clinical thoracic CT scan and an associated XML file that records the results of a two-phase image annotation process performed by four experienced thoracic radiologists. In the initial blinded-read phase, each radiologist independently reviewed each CT scan and marked lesions belonging to one of three categories (''nodule{>=}3 mm,''''nodule<3 mm,'' and ''non-nodule{>=}3 mm''). In the subsequent unblinded-read phase, each radiologist independently reviewed their own marks along with the anonymized marks of the three other radiologists to render a final opinion. The goal of this process was to identify as completely as possible all lung nodules in each CT scan without requiring forced consensus. Results: The Database contains 7371 lesions marked ''nodule'' by at least one radiologist. 2669 of these lesions were marked ''nodule{>=}3 mm'' by at least one radiologist, of which 928 (34.7%) received such marks from all four radiologists. These 2669 lesions include nodule outlines and subjective nodule characteristic ratings. Conclusions: The LIDC/IDRI Database is expected to provide an essential medical imaging research resource to spur CAD development, validation, and dissemination in clinical practice.« less
Zhang, Liming; Yu, Dongsheng; Shi, Xuezheng; Xu, Shengxiang; Xing, Shihe; Zhao, Yongcong
2014-01-01
Soil organic carbon (SOC) models were often applied to regions with high heterogeneity, but limited spatially differentiated soil information and simulation unit resolution. This study, carried out in the Tai-Lake region of China, defined the uncertainty derived from application of the DeNitrification-DeComposition (DNDC) biogeochemical model in an area with heterogeneous soil properties and different simulation units. Three different resolution soil attribute databases, a polygonal capture of mapping units at 1∶50,000 (P5), a county-based database of 1∶50,000 (C5) and county-based database of 1∶14,000,000 (C14), were used as inputs for regional DNDC simulation. The P5 and C5 databases were combined with the 1∶50,000 digital soil map, which is the most detailed soil database for the Tai-Lake region. The C14 database was combined with 1∶14,000,000 digital soil map, which is a coarse database and is often used for modeling at a national or regional scale in China. The soil polygons of P5 database and county boundaries of C5 and C14 databases were used as basic simulation units. Results project that from 1982 to 2000, total SOC change in the top layer (0–30 cm) of the 2.3 M ha of paddy soil in the Tai-Lake region was +1.48 Tg C, −3.99 Tg C and −15.38 Tg C based on P5, C5 and C14 databases, respectively. With the total SOC change as modeled with P5 inputs as the baseline, which is the advantages of using detailed, polygon-based soil dataset, the relative deviation of C5 and C14 were 368% and 1126%, respectively. The comparison illustrates that DNDC simulation is strongly influenced by choice of fundamental geographic resolution as well as input soil attribute detail. The results also indicate that improving the framework of DNDC is essential in creating accurate models of the soil carbon cycle. PMID:24523922
Goodacre, Norman; Aljanahi, Aisha; Nandakumar, Subhiksha; Mikailov, Mike
2018-01-01
ABSTRACT Detection of distantly related viruses by high-throughput sequencing (HTS) is bioinformatically challenging because of the lack of a public database containing all viral sequences, without abundant nonviral sequences, which can extend runtime and obscure viral hits. Our reference viral database (RVDB) includes all viral, virus-related, and virus-like nucleotide sequences (excluding bacterial viruses), regardless of length, and with overall reduced cellular sequences. Semantic selection criteria (SEM-I) were used to select viral sequences from GenBank, resulting in a first-generation viral database (VDB). This database was manually and computationally reviewed, resulting in refined, semantic selection criteria (SEM-R), which were applied to a new download of updated GenBank sequences to create a second-generation VDB. Viral entries in the latter were clustered at 98% by CD-HIT-EST to reduce redundancy while retaining high viral sequence diversity. The viral identity of the clustered representative sequences (creps) was confirmed by BLAST searches in NCBI databases and HMMER searches in PFAM and DFAM databases. The resulting RVDB contained a broad representation of viral families, sequence diversity, and a reduced cellular content; it includes full-length and partial sequences and endogenous nonretroviral elements, endogenous retroviruses, and retrotransposons. Testing of RVDBv10.2, with an in-house HTS transcriptomic data set indicated a significantly faster run for virus detection than interrogating the entirety of the NCBI nonredundant nucleotide database, which contains all viral sequences but also nonviral sequences. RVDB is publically available for facilitating HTS analysis, particularly for novel virus detection. It is meant to be updated on a regular basis to include new viral sequences added to GenBank. IMPORTANCE To facilitate bioinformatics analysis of high-throughput sequencing (HTS) data for the detection of both known and novel viruses, we have developed a new reference viral database (RVDB) that provides a broad representation of different virus species from eukaryotes by including all viral, virus-like, and virus-related sequences (excluding bacteriophages), regardless of their size. In particular, RVDB contains endogenous nonretroviral elements, endogenous retroviruses, and retrotransposons. Sequences were clustered to reduce redundancy while retaining high viral sequence diversity. A particularly useful feature of RVDB is the reduction of cellular sequences, which can enhance the run efficiency of large transcriptomic and genomic data analysis and increase the specificity of virus detection. PMID:29564396
Goodacre, Norman; Aljanahi, Aisha; Nandakumar, Subhiksha; Mikailov, Mike; Khan, Arifa S
2018-01-01
Detection of distantly related viruses by high-throughput sequencing (HTS) is bioinformatically challenging because of the lack of a public database containing all viral sequences, without abundant nonviral sequences, which can extend runtime and obscure viral hits. Our reference viral database (RVDB) includes all viral, virus-related, and virus-like nucleotide sequences (excluding bacterial viruses), regardless of length, and with overall reduced cellular sequences. Semantic selection criteria (SEM-I) were used to select viral sequences from GenBank, resulting in a first-generation viral database (VDB). This database was manually and computationally reviewed, resulting in refined, semantic selection criteria (SEM-R), which were applied to a new download of updated GenBank sequences to create a second-generation VDB. Viral entries in the latter were clustered at 98% by CD-HIT-EST to reduce redundancy while retaining high viral sequence diversity. The viral identity of the clustered representative sequences (creps) was confirmed by BLAST searches in NCBI databases and HMMER searches in PFAM and DFAM databases. The resulting RVDB contained a broad representation of viral families, sequence diversity, and a reduced cellular content; it includes full-length and partial sequences and endogenous nonretroviral elements, endogenous retroviruses, and retrotransposons. Testing of RVDBv10.2, with an in-house HTS transcriptomic data set indicated a significantly faster run for virus detection than interrogating the entirety of the NCBI nonredundant nucleotide database, which contains all viral sequences but also nonviral sequences. RVDB is publically available for facilitating HTS analysis, particularly for novel virus detection. It is meant to be updated on a regular basis to include new viral sequences added to GenBank. IMPORTANCE To facilitate bioinformatics analysis of high-throughput sequencing (HTS) data for the detection of both known and novel viruses, we have developed a new reference viral database (RVDB) that provides a broad representation of different virus species from eukaryotes by including all viral, virus-like, and virus-related sequences (excluding bacteriophages), regardless of their size. In particular, RVDB contains endogenous nonretroviral elements, endogenous retroviruses, and retrotransposons. Sequences were clustered to reduce redundancy while retaining high viral sequence diversity. A particularly useful feature of RVDB is the reduction of cellular sequences, which can enhance the run efficiency of large transcriptomic and genomic data analysis and increase the specificity of virus detection.
DOE Office of Scientific and Technical Information (OSTI.GOV)
D. D. Blackwell; K. W. Wisian; M. C. Richards
2000-04-01
Several activities related to geothermal resources in the western United States are described in this report. A database of geothermal site-specific thermal gradient and heat flow results from individual exploration wells in the western US has been assembled. Extensive temperature gradient and heat flow exploration data from the active exploration of the 1970's and 1980's were collected, compiled, and synthesized, emphasizing previously unavailable company data. Examples of the use and applications of the database are described. The database and results are available on the world wide web. In this report numerical models are used to establish basic qualitative relationships betweenmore » structure, heat input, and permeability distribution, and the resulting geothermal system. A series of steady state, two-dimensional numerical models evaluate the effect of permeability and structural variations on an idealized, generic Basin and Range geothermal system and the results are described.« less
The ClinicalTrials.gov Results Database — Update and Key Issues
Zarin, Deborah A.; Tse, Tony; Williams, Rebecca J.; Califf, Robert M.; Ide, Nicholas C.
2011-01-01
BACKGROUND The ClinicalTrials.gov trial registry was expanded in 2008 to include a database for reporting summary results. We summarize the structure and contents of the results database, provide an update of relevant policies, and show how the data can be used to gain insight into the state of clinical research. METHODS We analyzed ClinicalTrials.gov data that were publicly available between September 2009 and September 2010. RESULTS As of September 27, 2010, ClinicalTrials.gov received approximately 330 new and 2000 revised registrations each week, along with 30 new and 80 revised results submissions. We characterized the 79,413 registry and 2178 results of trial records available as of September 2010. From a sample cohort of results records, 78 of 150 (52%) had associated publications within 2 years after posting. Of results records available publicly, 20% reported more than two primary outcome measures and 5% reported more than five. Of a sample of 100 registry record outcome measures, 61% lacked specificity in describing the metric used in the planned analysis. In a sample of 700 results records, the mean number of different analysis populations per study group was 2.5 (median, 1; range, 1 to 25). Of these trials, 24% reported results for 90% or less of their participants. CONCLUSIONS ClinicalTrials.gov provides access to study results not otherwise available to the public. Although the database allows examination of various aspects of ongoing and completed clinical trials, its ultimate usefulness depends on the research community to submit accurate, informative data. PMID:21366476
Separation and confirmation of showers
NASA Astrophysics Data System (ADS)
Neslušan, L.; Hajduková, M.
2017-02-01
Aims: Using IAU MDC photographic, IAU MDC CAMS video, SonotaCo video, and EDMOND video databases, we aim to separate all provable annual meteor showers from each of these databases. We intend to reveal the problems inherent in this procedure and answer the question whether the databases are complete and the methods of separation used are reliable. We aim to evaluate the statistical significance of each separated shower. In this respect, we intend to give a list of reliably separated showers rather than a list of the maximum possible number of showers. Methods: To separate the showers, we simultaneously used two methods. The use of two methods enables us to compare their results, and this can indicate the reliability of the methods. To evaluate the statistical significance, we suggest a new method based on the ideas of the break-point method. Results: We give a compilation of the showers from all four databases using both methods. Using the first (second) method, we separated 107 (133) showers, which are in at least one of the databases used. These relatively low numbers are a consequence of discarding any candidate shower with a poor statistical significance. Most of the separated showers were identified as meteor showers from the IAU MDC list of all showers. Many of them were identified as several of the showers in the list. This proves that many showers have been named multiple times with different names. Conclusions: At present, a prevailing share of existing annual showers can be found in the data and confirmed when we use a combination of results from large databases. However, to gain a complete list of showers, we need more-complete meteor databases than the most extensive databases currently are. We also still need a more sophisticated method to separate showers and evaluate their statistical significance. Tables A.1 and A.2 are also available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/598/A40
Gradishar, William; Johnson, KariAnne; Brown, Krystal; Mundt, Erin; Manley, Susan
2017-07-01
There is a growing move to consult public databases following receipt of a genetic test result from a clinical laboratory; however, the well-documented limitations of these databases call into question how often clinicians will encounter discordant variant classifications that may introduce uncertainty into patient management. Here, we evaluate discordance in BRCA1 and BRCA2 variant classifications between a single commercial testing laboratory and a public database commonly consulted in clinical practice. BRCA1 and BRCA2 variant classifications were obtained from ClinVar and compared with the classifications from a reference laboratory. Full concordance and discordance were determined for variants whose ClinVar entries were of the same pathogenicity (pathogenic, benign, or uncertain). Variants with conflicting ClinVar classifications were considered partially concordant if ≥1 of the listed classifications agreed with the reference laboratory classification. Four thousand two hundred and fifty unique BRCA1 and BRCA2 variants were available for analysis. Overall, 73.2% of classifications were fully concordant and 12.3% were partially concordant. The remaining 14.5% of variants had discordant classifications, most of which had a definitive classification (pathogenic or benign) from the reference laboratory compared with an uncertain classification in ClinVar (14.0%). Here, we show that discrepant classifications between a public database and single reference laboratory potentially account for 26.7% of variants in BRCA1 and BRCA2 . The time and expertise required of clinicians to research these discordant classifications call into question the practicality of checking all test results against a database and suggest that discordant classifications should be interpreted with these limitations in mind. With the increasing use of clinical genetic testing for hereditary cancer risk, accurate variant classification is vital to ensuring appropriate medical management. There is a growing move to consult public databases following receipt of a genetic test result from a clinical laboratory; however, we show that up to 26.7% of variants in BRCA1 and BRCA2 have discordant classifications between ClinVar and a reference laboratory. The findings presented in this paper serve as a note of caution regarding the utility of database consultation. © AlphaMed Press 2017.
1981-05-01
factors that cause damage are discussed below. a. Architectural elements. Damage to architectural elements can result in both significant dollar losses...hazard priority- ranking procedure are: 1. To produce meaningful results which are as simple as possible, con- sidering the existing databases. 2. To...minimize the amount of data required for meaningful results , i.e., the database should contain only the most fundamental building characteris- tics. 3. To
Vlek, Anneloes; Kolecka, Anna; Khayhan, Kantarawee; Theelen, Bart; Groenewald, Marizeth; Boel, Edwin
2014-01-01
An interlaboratory study using matrix-assisted laser desorption ionization–time of flight mass spectrometry (MALDI-TOF MS) to determine the identification of clinically important yeasts (n = 35) was performed at 11 clinical centers, one company, and one reference center using the Bruker Daltonics MALDI Biotyper system. The optimal cutoff for the MALDI-TOF MS score was investigated using receiver operating characteristic (ROC) curve analyses. The percentages of correct identifications were compared for different sample preparation methods and different databases. Logistic regression analysis was performed to analyze the association between the number of spectra in the database and the percentage of strains that were correctly identified. A total of 5,460 MALDI-TOF MS results were obtained. Using all results, the area under the ROC curve was 0.95 (95% confidence interval [CI], 0.94 to 0.96). With a sensitivity of 0.84 and a specificity of 0.97, a cutoff value of 1.7 was considered optimal. The overall percentage of correct identifications (formic acid-ethanol extraction method, score ≥ 1.7) was 61.5% when the commercial Bruker Daltonics database (BDAL) was used, and it increased to 86.8% by using an extended BDAL supplemented with a Centraalbureau voor Schimmelcultures (CBS)-KNAW Fungal Biodiversity Centre in-house database (BDAL+CBS in-house). A greater number of main spectra (MSP) in the database was associated with a higher percentage of correct identifications (odds ratio [OR], 1.10; 95% CI, 1.05 to 1.15; P < 0.01). The results from the direct transfer method ranged from 0% to 82.9% correct identifications, with the results of the top four centers ranging from 71.4% to 82.9% correct identifications. This study supports the use of a cutoff value of 1.7 for the identification of yeasts using MALDI-TOF MS. The inclusion of enough isolates of the same species in the database can enhance the proportion of correctly identified strains. Further optimization of the preparation methods, especially of the direct transfer method, may contribute to improved diagnosis of yeast-related infections. PMID:24920782
Younger, Paula; Boddy, Kate
2009-06-01
The researchers involved in this study work at Exeter Health library and at the Complementary Medicine Unit, Peninsula School of Medicine and Dentistry (PCMD). Within this collaborative environment it is possible to access the electronic resources of three institutions. This includes access to AMED and other databases using different interfaces. The aim of this study was to investigate whether searching different interfaces to the AMED allied health and complementary medicine database produced the same results when using identical search terms. The following Internet-based AMED interfaces were searched: DIALOG DataStar; EBSCOhost and OVID SP_UI01.00.02. Search results from all three databases were saved in an endnote database to facilitate analysis. A checklist was also compiled comparing interface features. In our initial search, DIALOG returned 29 hits, OVID 14 and Ebsco 8. If we assume that DIALOG returned 100% of potential hits, OVID initially returned only 48% of hits and EBSCOhost only 28%. In our search, a researcher using the Ebsco interface to carry out a simple search on AMED would miss over 70% of possible search hits. Subsequent EBSCOhost searches on different subjects failed to find between 21 and 86% of the hits retrieved using the same keywords via DIALOG DataStar. In two cases, the simple EBSCOhost search failed to find any of the results found via DIALOG DataStar. Depending on the interface, the number of hits retrieved from the same database with the same simple search can vary dramatically. Some simple searches fail to retrieve a substantial percentage of citations. This may result in an uninformed literature review, research funding application or treatment intervention. In addition to ensuring that keywords, spelling and medical subject headings (MeSH) accurately reflect the nature of the search, database users should include wildcards and truncation and adapt their search strategy substantially to retrieve the maximum number of appropriate citations possible. Librarians should be aware of these differences when making purchasing decisions, carrying out literature searches and planning user education.
Lhermitte, L; Mejstrikova, E; van der Sluijs-Gelling, A J; Grigore, G E; Sedek, L; Bras, A E; Gaipa, G; Sobral da Costa, E; Novakova, M; Sonneveld, E; Buracchi, C; de Sá Bacelar, T; te Marvelde, J G; Trinquand, A; Asnafi, V; Szczepanski, T; Matarraz, S; Lopez, A; Vidriales, B; Bulsa, J; Hrusak, O; Kalina, T; Lecrevisse, Q; Martin Ayuso, M; Brüggemann, M; Verde, J; Fernandez, P; Burgos, L; Paiva, B; Pedreira, C E; van Dongen, J J M; Orfao, A; van der Velden, V H J
2018-01-01
Precise classification of acute leukemia (AL) is crucial for adequate treatment. EuroFlow has previously designed an AL orientation tube (ALOT) to guide towards the relevant classification panel (T-cell acute lymphoblastic leukemia (T-ALL), B-cell precursor (BCP)-ALL and/or acute myeloid leukemia (AML)) and final diagnosis. Now we built a reference database with 656 typical AL samples (145 T-ALL, 377 BCP-ALL, 134 AML), processed and analyzed via standardized protocols. Using principal component analysis (PCA)-based plots and automated classification algorithms for direct comparison of single-cells from individual patients against the database, another 783 cases were subsequently evaluated. Depending on the database-guided results, patients were categorized as: (i) typical T, B or Myeloid without or; (ii) with a transitional component to another lineage; (iii) atypical; or (iv) mixed-lineage. Using this automated algorithm, in 781/783 cases (99.7%) the right panel was selected, and data comparable to the final WHO-diagnosis was already provided in >93% of cases (85% T-ALL, 97% BCP-ALL, 95% AML and 87% mixed-phenotype AL patients), even without data on the full-characterization panels. Our results show that database-guided analysis facilitates standardized interpretation of ALOT results and allows accurate selection of the relevant classification panels, hence providing a solid basis for designing future WHO AL classifications. PMID:29089646
2002-06-01
Student memo for personnel MCLLS . . . . . . . . . . . . . . 75 i. Migrate data to SQL Server...The Web Server is on the same server as the SWORD database in the current version. 4: results set 5: dynamic HTML page 6: dynamic HTML page 3: SQL ...still be supported by Access. SQL Server would be a more viable tool for a fully developed application based on the number of potential users and
Handwriting Identification, Matching, and Indexing in Noisy Document Images
2006-01-01
algorithm to detect all parallel lines simultaneously. Our method can detect 96.8% of the severely broken rule lines in the Arabic database we collected...in the database to guide later processing. It is widely used in banks, post offices, and tax offices where the types of forms are most often pre...be used for different fields), and output the recognition results to a database . Although special anchors may be avail- able to facilitate form
Communication Lower Bounds and Optimal Algorithms for Programs that Reference Arrays - Part 1
2013-05-14
include tensor contractions, the direct N-body algorithm, and database join. 1This indicates that this is the first of 5 times that matrix multiplication...and database join. Section 8 summarizes our results, and outlines the contents of Part 2 of this paper. Part 2 will discuss how to compute lower...contractions, the direct N–body algo- rithm, database join, and computing matrix powers Ak. 2 Geometric Model We begin by reviewing the geometric
Rapid Prototyping-Unmanned Combat Air Vehicle (UCAV)/Sensorcraft
2008-01-01
model. RP may prove to be the fastest means to create a bridge between these CFD and experimental ground testing databases . In the past, it took...UCAV X-45A wind tunnel model within the /RB) ment FD results provide a database of global surface and off-body measurements. It is imperative t...extend the knowledge database for a given aircraft configuration beyond the ground test envelope and into the fligh regime. Working in tandem, in an
NASA Technical Reports Server (NTRS)
Wassil-Grimm, Andrew D.
1997-01-01
More effective electronic communication processes are needed to transfer contractor and international partner data into NASA and prime contractor baseline database systems. It is estimated that the International Space Station Alpha (ISSA) parts database will contain up to one million parts each of which may require database capabilities for approximately one thousand bytes of data for each part. The resulting gigabyte database must provide easy access to users who will be preparing multiple analyses and reports in order to verify as-designed, as-built, launch, on-orbit, and return configurations for up to 45 missions associated with the construction of the ISSA. Additionally, Internet access to this data base is strongly indicated to allow multiple user access from clients located in many foreign countries. This summer's project involved familiarization and evaluation of the ISSA Electrical, Electronic, and Electromechanical (EEE) Parts data and the process of electronically managing these data. Particular attention was devoted to improving the interfaces among the many elements of the ISSA information system and its global customers and suppliers. Additionally, prototype queries were developed to facilitate the identification of data changes in the data base, verifications that the designs used only approved parts, and certifications that the flight hardware containing EEE parts was ready for flight. This project also resulted in specific recommendations to NASA for further development in the area of EEE parts database development and usage.
The Development of a Korean Drug Dosing Database
Kim, Sun Ah; Kim, Jung Hoon; Jang, Yoo Jin; Jeon, Man Ho; Hwang, Joong Un; Jeong, Young Mi; Choi, Kyung Suk; Lee, Iyn Hyang; Jeon, Jin Ok; Lee, Eun Sook; Lee, Eun Kyung; Kim, Hong Bin; Chin, Ho Jun; Ha, Ji Hye; Kim, Young Hoon
2011-01-01
Objectives This report describes the development process of a drug dosing database for ethical drugs approved by the Korea Food & Drug Administration (KFDA). The goal of this study was to develop a computerized system that supports physicians' prescribing decisions, particularly in regards to medication dosing. Methods The advisory committee, comprised of doctors, pharmacists, and nurses from the Seoul National University Bundang Hospital, pharmacists familiar with drug databases, KFDA officials, and software developers from the BIT Computer Co. Ltd. analyzed approved KFDA drug dosing information, defined the fields and properties of the information structure, and designed a management program used to enter dosing information. The management program was developed using a web based system that allows multiple researchers to input drug dosing information in an organized manner. The whole process was improved by adding additional input fields and eliminating the unnecessary existing fields used when the dosing information was entered, resulting in an improved field structure. Results A total of 16,994 drugs sold in the Korean market in July 2009, excluding the exclusion criteria (e.g., radioactivity drugs, X-ray contrast medium), usage and dosing information were made into a database. Conclusions The drug dosing database was successfully developed and the dosing information for new drugs can be continually maintained through the management mode. This database will be used to develop the drug utilization review standards and to provide appropriate dosing information. PMID:22259729
SInCRe—structural interactome computational resource for Mycobacterium tuberculosis
Metri, Rahul; Hariharaputran, Sridhar; Ramakrishnan, Gayatri; Anand, Praveen; Raghavender, Upadhyayula S.; Ochoa-Montaño, Bernardo; Higueruelo, Alicia P.; Sowdhamini, Ramanathan; Chandra, Nagasuma R.; Blundell, Tom L.; Srinivasan, Narayanaswamy
2015-01-01
We have developed an integrated database for Mycobacterium tuberculosis H37Rv (Mtb) that collates information on protein sequences, domain assignments, functional annotation and 3D structural information along with protein–protein and protein–small molecule interactions. SInCRe (Structural Interactome Computational Resource) is developed out of CamBan (Cambridge and Bangalore) collaboration. The motivation for development of this database is to provide an integrated platform to allow easily access and interpretation of data and results obtained by all the groups in CamBan in the field of Mtb informatics. In-house algorithms and databases developed independently by various academic groups in CamBan are used to generate Mtb-specific datasets and are integrated in this database to provide a structural dimension to studies on tuberculosis. The SInCRe database readily provides information on identification of functional domains, genome-scale modelling of structures of Mtb proteins and characterization of the small-molecule binding sites within Mtb. The resource also provides structure-based function annotation, information on small-molecule binders including FDA (Food and Drug Administration)-approved drugs, protein–protein interactions (PPIs) and natural compounds that bind to pathogen proteins potentially and result in weakening or elimination of host–pathogen protein–protein interactions. Together they provide prerequisites for identification of off-target binding. Database URL: http://proline.biochem.iisc.ernet.in/sincre PMID:26130660
Variability sensitivity of dynamic texture based recognition in clinical CT data
NASA Astrophysics Data System (ADS)
Kwitt, Roland; Razzaque, Sharif; Lowell, Jeffrey; Aylward, Stephen
2014-03-01
Dynamic texture recognition using a database of template models has recently shown promising results for the task of localizing anatomical structures in Ultrasound video. In order to understand its clinical value, it is imperative to study the sensitivity with respect to inter-patient variability as well as sensitivity to acquisition parameters such as Ultrasound probe angle. Fully addressing patient and acquisition variability issues, however, would require a large database of clinical Ultrasound from many patients, acquired in a multitude of controlled conditions, e.g., using a tracked transducer. Since such data is not readily attainable, we advocate an alternative evaluation strategy using abdominal CT data as a surrogate. In this paper, we describe how to replicate Ultrasound variabilities by extracting subvolumes from CT and interpreting the image material as an ordered sequence of video frames. Utilizing this technique, and based on a database of abdominal CT from 45 patients, we report recognition results on an organ (kidney) recognition task, where we try to discriminate kidney subvolumes/videos from a collection of randomly sampled negative instances. We demonstrate that (1) dynamic texture recognition is relatively insensitive to inter-patient variation while (2) viewing angle variability needs to be accounted for in the template database. Since naively extending the template database to counteract variability issues can lead to impractical database sizes, we propose an alternative strategy based on automated identification of a small set of representative models.
Non-B DB: a database of predicted non-B DNA-forming motifs in mammalian genomes.
Cer, Regina Z; Bruce, Kevin H; Mudunuri, Uma S; Yi, Ming; Volfovsky, Natalia; Luke, Brian T; Bacolla, Albino; Collins, Jack R; Stephens, Robert M
2011-01-01
Although the capability of DNA to form a variety of non-canonical (non-B) structures has long been recognized, the overall significance of these alternate conformations in biology has only recently become accepted en masse. In order to provide access to genome-wide locations of these classes of predicted structures, we have developed non-B DB, a database integrating annotations and analysis of non-B DNA-forming sequence motifs. The database provides the most complete list of alternative DNA structure predictions available, including Z-DNA motifs, quadruplex-forming motifs, inverted repeats, mirror repeats and direct repeats and their associated subsets of cruciforms, triplex and slipped structures, respectively. The database also contains motifs predicted to form static DNA bends, short tandem repeats and homo(purine•pyrimidine) tracts that have been associated with disease. The database has been built using the latest releases of the human, chimp, dog, macaque and mouse genomes, so that the results can be compared directly with other data sources. In order to make the data interpretable in a genomic context, features such as genes, single-nucleotide polymorphisms and repetitive elements (SINE, LINE, etc.) have also been incorporated. The database is accessed through query pages that produce results with links to the UCSC browser and a GBrowse-based genomic viewer. It is freely accessible at http://nonb.abcc.ncifcrf.gov.
Dufour, Jean-Charles; Fieschi, Dominique; Fieschi, Marius
2004-01-01
Background Clinical Practice Guidelines (CPGs) available today are not extensively used due to lack of proper integration into clinical settings, knowledge-related information resources, and lack of decision support at the point of care in a particular clinical context. Objective The PRESGUID project (PREScription and GUIDelines) aims to improve the assistance provided by guidelines. The project proposes an online service enabling physicians to consult computerized CPGs linked to drug databases for easier integration into the healthcare process. Methods Computable CPGs are structured as decision trees and coded in XML format. Recommendations related to drug classes are tagged with ATC codes. We use a mapping module to enhance computerized guidelines coupling with a drug database, which contains detailed information about each usable specific medication. In this way, therapeutic recommendations are backed up with current and up-to-date information from the database. Results Two authoritative CPGs, originally diffused as static textual documents, have been implemented to validate the computerization process and to illustrate the usefulness of the resulting automated CPGs and their coupling with a drug database. We discuss the advantages of this approach for practitioners and the implications for both guideline developers and drug database providers. Other CPGs will be implemented and evaluated in real conditions by clinicians working in different health institutions. PMID:15053828
Peng, Jinye; Babaguchi, Noboru; Luo, Hangzai; Gao, Yuli; Fan, Jianping
2010-07-01
Digital video now plays an important role in supporting more profitable online patient training and counseling, and integration of patient training videos from multiple competitive organizations in the health care network will result in better offerings for patients. However, privacy concerns often prevent multiple competitive organizations from sharing and integrating their patient training videos. In addition, patients with infectious or chronic diseases may not want the online patient training organizations to identify who they are or even which video clips they are interested in. Thus, there is an urgent need to develop more effective techniques to protect both video content privacy and access privacy . In this paper, we have developed a new approach to construct a distributed Hippocratic video database system for supporting more profitable online patient training and counseling. First, a new database modeling approach is developed to support concept-oriented video database organization and assign a degree of privacy of the video content for each database level automatically. Second, a new algorithm is developed to protect the video content privacy at the level of individual video clip by filtering out the privacy-sensitive human objects automatically. In order to integrate the patient training videos from multiple competitive organizations for constructing a centralized video database indexing structure, a privacy-preserving video sharing scheme is developed to support privacy-preserving distributed classifier training and prevent the statistical inferences from the videos that are shared for cross-validation of video classifiers. Our experiments on large-scale video databases have also provided very convincing results.
A Bayesian network approach to the database search problem in criminal proceedings
2012-01-01
Background The ‘database search problem’, that is, the strengthening of a case - in terms of probative value - against an individual who is found as a result of a database search, has been approached during the last two decades with substantial mathematical analyses, accompanied by lively debate and centrally opposing conclusions. This represents a challenging obstacle in teaching but also hinders a balanced and coherent discussion of the topic within the wider scientific and legal community. This paper revisits and tracks the associated mathematical analyses in terms of Bayesian networks. Their derivation and discussion for capturing probabilistic arguments that explain the database search problem are outlined in detail. The resulting Bayesian networks offer a distinct view on the main debated issues, along with further clarity. Methods As a general framework for representing and analyzing formal arguments in probabilistic reasoning about uncertain target propositions (that is, whether or not a given individual is the source of a crime stain), this paper relies on graphical probability models, in particular, Bayesian networks. This graphical probability modeling approach is used to capture, within a single model, a series of key variables, such as the number of individuals in a database, the size of the population of potential crime stain sources, and the rarity of the corresponding analytical characteristics in a relevant population. Results This paper demonstrates the feasibility of deriving Bayesian network structures for analyzing, representing, and tracking the database search problem. The output of the proposed models can be shown to agree with existing but exclusively formulaic approaches. Conclusions The proposed Bayesian networks allow one to capture and analyze the currently most well-supported but reputedly counter-intuitive and difficult solution to the database search problem in a way that goes beyond the traditional, purely formulaic expressions. The method’s graphical environment, along with its computational and probabilistic architectures, represents a rich package that offers analysts and discussants with additional modes of interaction, concise representation, and coherent communication. PMID:22849390
Addition of a breeding database in the Genome Database for Rosaceae
Evans, Kate; Jung, Sook; Lee, Taein; Brutcher, Lisa; Cho, Ilhyung; Peace, Cameron; Main, Dorrie
2013-01-01
Breeding programs produce large datasets that require efficient management systems to keep track of performance, pedigree, geographical and image-based data. With the development of DNA-based screening technologies, more breeding programs perform genotyping in addition to phenotyping for performance evaluation. The integration of breeding data with other genomic and genetic data is instrumental for the refinement of marker-assisted breeding tools, enhances genetic understanding of important crop traits and maximizes access and utility by crop breeders and allied scientists. Development of new infrastructure in the Genome Database for Rosaceae (GDR) was designed and implemented to enable secure and efficient storage, management and analysis of large datasets from the Washington State University apple breeding program and subsequently expanded to fit datasets from other Rosaceae breeders. The infrastructure was built using the software Chado and Drupal, making use of the Natural Diversity module to accommodate large-scale phenotypic and genotypic data. Breeders can search accessions within the GDR to identify individuals with specific trait combinations. Results from Search by Parentage lists individuals with parents in common and results from Individual Variety pages link to all data available on each chosen individual including pedigree, phenotypic and genotypic information. Genotypic data are searchable by markers and alleles; results are linked to other pages in the GDR to enable the user to access tools such as GBrowse and CMap. This breeding database provides users with the opportunity to search datasets in a fully targeted manner and retrieve and compare performance data from multiple selections, years and sites, and to output the data needed for variety release publications and patent applications. The breeding database facilitates efficient program management. Storing publicly available breeding data in a database together with genomic and genetic data will further accelerate the cross-utilization of diverse data types by researchers from various disciplines. Database URL: http://www.rosaceae.org/breeders_toolbox PMID:24247530
Addition of a breeding database in the Genome Database for Rosaceae.
Evans, Kate; Jung, Sook; Lee, Taein; Brutcher, Lisa; Cho, Ilhyung; Peace, Cameron; Main, Dorrie
2013-01-01
Breeding programs produce large datasets that require efficient management systems to keep track of performance, pedigree, geographical and image-based data. With the development of DNA-based screening technologies, more breeding programs perform genotyping in addition to phenotyping for performance evaluation. The integration of breeding data with other genomic and genetic data is instrumental for the refinement of marker-assisted breeding tools, enhances genetic understanding of important crop traits and maximizes access and utility by crop breeders and allied scientists. Development of new infrastructure in the Genome Database for Rosaceae (GDR) was designed and implemented to enable secure and efficient storage, management and analysis of large datasets from the Washington State University apple breeding program and subsequently expanded to fit datasets from other Rosaceae breeders. The infrastructure was built using the software Chado and Drupal, making use of the Natural Diversity module to accommodate large-scale phenotypic and genotypic data. Breeders can search accessions within the GDR to identify individuals with specific trait combinations. Results from Search by Parentage lists individuals with parents in common and results from Individual Variety pages link to all data available on each chosen individual including pedigree, phenotypic and genotypic information. Genotypic data are searchable by markers and alleles; results are linked to other pages in the GDR to enable the user to access tools such as GBrowse and CMap. This breeding database provides users with the opportunity to search datasets in a fully targeted manner and retrieve and compare performance data from multiple selections, years and sites, and to output the data needed for variety release publications and patent applications. The breeding database facilitates efficient program management. Storing publicly available breeding data in a database together with genomic and genetic data will further accelerate the cross-utilization of diverse data types by researchers from various disciplines. Database URL: http://www.rosaceae.org/breeders_toolbox.
Space Station Freedom environmental database system (FEDS) for MSFC testing
NASA Technical Reports Server (NTRS)
Story, Gail S.; Williams, Wendy; Chiu, Charles
1991-01-01
The Water Recovery Test (WRT) at Marshall Space Flight Center (MSFC) is the first demonstration of integrated water recovery systems for potable and hygiene water reuse as envisioned for Space Station Freedom (SSF). In order to satisfy the safety and health requirements placed on the SSF program and facilitate test data assessment, an extensive laboratory analysis database was established to provide a central archive and data retrieval function. The database is required to store analysis results for physical, chemical, and microbial parameters measured from water, air and surface samples collected at various locations throughout the test facility. The Oracle Relational Database Management System (RDBMS) was utilized to implement a secured on-line information system with the ECLSS WRT program as the foundation for this system. The database is supported on a VAX/VMS 8810 series mainframe and is accessible from the Marshall Information Network System (MINS). This paper summarizes the database requirements, system design, interfaces, and future enhancements.
A Database as a Service for the Healthcare System to Store Physiological Signal Data.
Chang, Hsien-Tsung; Lin, Tsai-Huei
2016-01-01
Wearable devices that measure physiological signals to help develop self-health management habits have become increasingly popular in recent years. These records are conducive for follow-up health and medical care. In this study, based on the characteristics of the observed physiological signal records- 1) a large number of users, 2) a large amount of data, 3) low information variability, 4) data privacy authorization, and 5) data access by designated users-we wish to resolve physiological signal record-relevant issues utilizing the advantages of the Database as a Service (DaaS) model. Storing a large amount of data using file patterns can reduce database load, allowing users to access data efficiently; the privacy control settings allow users to store data securely. The results of the experiment show that the proposed system has better database access performance than a traditional relational database, with a small difference in database volume, thus proving that the proposed system can improve data storage performance.
A Database as a Service for the Healthcare System to Store Physiological Signal Data
Lin, Tsai-Huei
2016-01-01
Wearable devices that measure physiological signals to help develop self-health management habits have become increasingly popular in recent years. These records are conducive for follow-up health and medical care. In this study, based on the characteristics of the observed physiological signal records– 1) a large number of users, 2) a large amount of data, 3) low information variability, 4) data privacy authorization, and 5) data access by designated users—we wish to resolve physiological signal record-relevant issues utilizing the advantages of the Database as a Service (DaaS) model. Storing a large amount of data using file patterns can reduce database load, allowing users to access data efficiently; the privacy control settings allow users to store data securely. The results of the experiment show that the proposed system has better database access performance than a traditional relational database, with a small difference in database volume, thus proving that the proposed system can improve data storage performance. PMID:28033415
Shuttle Hypervelocity Impact Database
NASA Technical Reports Server (NTRS)
Hyde, James L.; Christiansen, Eric L.; Lear, Dana M.
2011-01-01
With three missions outstanding, the Shuttle Hypervelocity Impact Database has nearly 3000 entries. The data is divided into tables for crew module windows, payload bay door radiators and thermal protection system regions, with window impacts compromising just over half the records. In general, the database provides dimensions of hypervelocity impact damage, a component level location (i.e., window number or radiator panel number) and the orbiter mission when the impact occurred. Additional detail on the type of particle that produced the damage site is provided when sampling data and definitive analysis results are available. Details and insights on the contents of the database including examples of descriptive statistics will be provided. Post flight impact damage inspection and sampling techniques that were employed during the different observation campaigns will also be discussed. Potential enhancements to the database structure and availability of the data for other researchers will be addressed in the Future Work section. A related database of returned surfaces from the International Space Station will also be introduced.
Virus Database and Online Inquiry System Based on Natural Vectors.
Dong, Rui; Zheng, Hui; Tian, Kun; Yau, Shek-Chung; Mao, Weiguang; Yu, Wenping; Yin, Changchuan; Yu, Chenglong; He, Rong Lucy; Yang, Jie; Yau, Stephen St
2017-01-01
We construct a virus database called VirusDB (http://yaulab.math.tsinghua.edu.cn/VirusDB/) and an online inquiry system to serve people who are interested in viral classification and prediction. The database stores all viral genomes, their corresponding natural vectors, and the classification information of the single/multiple-segmented viral reference sequences downloaded from National Center for Biotechnology Information. The online inquiry system serves the purpose of computing natural vectors and their distances based on submitted genomes, providing an online interface for accessing and using the database for viral classification and prediction, and back-end processes for automatic and manual updating of database content to synchronize with GenBank. Submitted genomes data in FASTA format will be carried out and the prediction results with 5 closest neighbors and their classifications will be returned by email. Considering the one-to-one correspondence between sequence and natural vector, time efficiency, and high accuracy, natural vector is a significant advance compared with alignment methods, which makes VirusDB a useful database in further research.
Fujimura, Tomomi; Umemura, Hiroyuki
2018-01-15
The present study describes the development and validation of a facial expression database comprising five different horizontal face angles in dynamic and static presentations. The database includes twelve expression types portrayed by eight Japanese models. This database was inspired by the dimensional and categorical model of emotions: surprise, fear, sadness, anger with open mouth, anger with closed mouth, disgust with open mouth, disgust with closed mouth, excitement, happiness, relaxation, sleepiness, and neutral (static only). The expressions were validated using emotion classification and Affect Grid rating tasks [Russell, Weiss, & Mendelsohn, 1989. Affect Grid: A single-item scale of pleasure and arousal. Journal of Personality and Social Psychology, 57(3), 493-502]. The results indicate that most of the expressions were recognised as the intended emotions and could systematically represent affective valence and arousal. Furthermore, face angle and facial motion information influenced emotion classification and valence and arousal ratings. Our database will be available online at the following URL. https://www.dh.aist.go.jp/database/face2017/ .
Owens, John
2009-01-01
Technological advances in the acquisition of DNA and protein sequence information and the resulting onrush of data can quickly overwhelm the scientist unprepared for the volume of information that must be evaluated and carefully dissected to discover its significance. Few laboratories have the luxury of dedicated personnel to organize, analyze, or consistently record a mix of arriving sequence data. A methodology based on a modern relational-database manager is presented that is both a natural storage vessel for antibody sequence information and a conduit for organizing and exploring sequence data and accompanying annotation text. The expertise necessary to implement such a plan is equal to that required by electronic word processors or spreadsheet applications. Antibody sequence projects maintained as independent databases are selectively unified by the relational-database manager into larger database families that contribute to local analyses, reports, interactive HTML pages, or exported to facilities dedicated to sophisticated sequence analysis techniques. Database files are transposable among current versions of Microsoft, Macintosh, and UNIX operating systems.
Martin, Tiphaine; Sherman, David J; Durrens, Pascal
2011-01-01
The Génolevures online database (URL: http://www.genolevures.org) stores and provides the data and results obtained by the Génolevures Consortium through several campaigns of genome annotation of the yeasts in the Saccharomycotina subphylum (hemiascomycetes). This database is dedicated to large-scale comparison of these genomes, storing not only the different chromosomal elements detected in the sequences, but also the logical relations between them. The database is divided into a public part, accessible to anyone through Internet, and a private part where the Consortium members make genome annotations with our Magus annotation system; this system is used to annotate several related genomes in parallel. The public database is widely consulted and offers structured data, organized using a REST web site architecture that allows for automated requests. The implementation of the database, as well as its associated tools and methods, is evolving to cope with the influx of genome sequences produced by Next Generation Sequencing (NGS). Copyright © 2011 Académie des sciences. Published by Elsevier SAS. All rights reserved.
Schell, Scott R
2006-02-01
Enforcement of the Health Insurance Portability and Accountability Act (HIPAA) began in April, 2003. Designed as a law mandating health insurance availability when coverage was lost, HIPAA imposed sweeping and broad-reaching protections of patient privacy. These changes dramatically altered clinical research by placing sizeable regulatory burdens upon investigators with threat of severe and costly federal and civil penalties. This report describes development of an algorithmic approach to clinical research database design based upon a central key-shared data (CK-SD) model allowing researchers to easily analyze, distribute, and publish clinical research without disclosure of HIPAA Protected Health Information (PHI). Three clinical database formats (small clinical trial, operating room performance, and genetic microchip array datasets) were modeled using standard structured query language (SQL)-compliant databases. The CK database was created to contain PHI data, whereas a shareable SD database was generated in real-time containing relevant clinical outcome information while protecting PHI items. Small (< 100 records), medium (< 50,000 records), and large (> 10(8) records) model databases were created, and the resultant data models were evaluated in consultation with an HIPAA compliance officer. The SD database models complied fully with HIPAA regulations, and resulting "shared" data could be distributed freely. Unique patient identifiers were not required for treatment or outcome analysis. Age data were resolved to single-integer years, grouping patients aged > 89 years. Admission, discharge, treatment, and follow-up dates were replaced with enrollment year, and follow-up/outcome intervals calculated eliminating original data. Two additional data fields identified as PHI (treating physician and facility) were replaced with integer values, and the original data corresponding to these values were stored in the CK database. Use of the algorithm at the time of database design did not increase cost or design effort. The CK-SD model for clinical database design provides an algorithm for investigators to create, maintain, and share clinical research data compliant with HIPAA regulations. This model is applicable to new projects and large institutional datasets, and should decrease regulatory efforts required for conduct of clinical research. Application of the design algorithm early in the clinical research enterprise does not increase cost or the effort of data collection.
Inequality of obesity and socioeconomic factors in Iran: a systematic review and meta- analyses.
Djalalinia, Shirin; Peykari, Niloofar; Qorbani, Mostafa; Larijani, Bagher; Farzadfar, Farshad
2015-01-01
Socioeconomic status and demographic factors, such as education, occupation, place of residence, gender, age, and marital status have been reported to be associated with obesity. We conducted a systematic review to summarize evidences on associations between socioeconomic factors and obesity/overweight in Iranian population. We systematically searched international databases; ISI, PubMed/Medline, Scopus, and national databases Iran-medex, Irandoc, and Scientific Information Database (SID). We refined data for associations between socioeconomic factors and obesity/overweight by sex, age, province, and year. There were no limitations for time and languages. Based on our search strategy we found 151 records; of them 139 were from international databases and the remaining 12 were obtained from national databases. After removing duplicates, via the refining steps, only 119 articles were found related to our study domains. Extracted results were attributed to 146596 person/data from included studies. Increased ages, low educational levels, being married, residence in urban area, as well as female sex were clearly associated with obesity. RESULTS could be useful for better health policy and more planned studies in this field. These also could be used for future complementary analyses.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Quock, D. E. R.; Cianciarulo, M. B.; APS Engineering Support Division
2007-01-01
The Integrated Relational Model of Installed Systems (IRMIS) is a relational database tool that has been implemented at the Advanced Photon Source to maintain an updated account of approximately 600 control system software applications, 400,000 process variables, and 30,000 control system hardware components. To effectively display this large amount of control system information to operators and engineers, IRMIS was initially built with nine Web-based viewers: Applications Organizing Index, IOC, PLC, Component Type, Installed Components, Network, Controls Spares, Process Variables, and Cables. However, since each viewer is designed to provide details from only one major category of the control system, themore » necessity for a one-stop global search tool for the entire database became apparent. The user requirements for extremely fast database search time and ease of navigation through search results led to the choice of Asynchronous JavaScript and XML (AJAX) technology in the implementation of the IRMIS global search tool. Unique features of the global search tool include a two-tier level of displayed search results, and a database data integrity validation and reporting mechanism.« less
Analyzing a multimodal biometric system using real and virtual users
NASA Astrophysics Data System (ADS)
Scheidat, Tobias; Vielhauer, Claus
2007-02-01
Three main topics of recent research on multimodal biometric systems are addressed in this article: The lack of sufficiently large multimodal test data sets, the influence of cultural aspects and data protection issues of multimodal biometric data. In this contribution, different possibilities are presented to extend multimodal databases by generating so-called virtual users, which are created by combining single biometric modality data of different users. Comparative tests on databases containing real and virtual users based on a multimodal system using handwriting and speech are presented, to study to which degree the use of virtual multimodal databases allows conclusions with respect to recognition accuracy in comparison to real multimodal data. All tests have been carried out on databases created from donations from three different nationality groups. This allows to review the experimental results both in general and in context of cultural origin. The results show that in most cases the usage of virtual persons leads to lower accuracy than the usage of real users in terms of the measurement applied: the Equal Error Rate. Finally, this article will address the general question how the concept of virtual users may influence the data protection requirements for multimodal evaluation databases in the future.
Fourment, Mathieu; Gibbs, Mark J
2008-01-01
Background Viruses of the Bunyaviridae have segmented negative-stranded RNA genomes and several of them cause significant disease. Many partial sequences have been obtained from the segments so that GenBank searches give complex results. Sequence databases usually use HTML pages to mediate remote sorting, but this approach can be limiting and may discourage a user from exploring a database. Results The VirusBanker database contains Bunyaviridae sequences and alignments and is presented as two spreadsheets generated by a Java program that interacts with a MySQL database on a server. Sequences are displayed in rows and may be sorted using information that is displayed in columns and includes data relating to the segment, gene, protein, species, strain, sequence length, terminal sequence and date and country of isolation. Bunyaviridae sequences and alignments may be downloaded from the second spreadsheet with titles defined by the user from the columns, or viewed when passed directly to the sequence editor, Jalview. Conclusion VirusBanker allows large datasets of aligned nucleotide and protein sequences from the Bunyaviridae to be compiled and winnowed rapidly using criteria that are formulated heuristically. PMID:18251994
Domain fusion analysis by applying relational algebra to protein sequence and domain databases.
Truong, Kevin; Ikura, Mitsuhiko
2003-05-06
Domain fusion analysis is a useful method to predict functionally linked proteins that may be involved in direct protein-protein interactions or in the same metabolic or signaling pathway. As separate domain databases like BLOCKS, PROSITE, Pfam, SMART, PRINTS-S, ProDom, TIGRFAMs, and amalgamated domain databases like InterPro continue to grow in size and quality, a computational method to perform domain fusion analysis that leverages on these efforts will become increasingly powerful. This paper proposes a computational method employing relational algebra to find domain fusions in protein sequence databases. The feasibility of this method was illustrated on the SWISS-PROT+TrEMBL sequence database using domain predictions from the Pfam HMM (hidden Markov model) database. We identified 235 and 189 putative functionally linked protein partners in H. sapiens and S. cerevisiae, respectively. From scientific literature, we were able to confirm many of these functional linkages, while the remainder offer testable experimental hypothesis. Results can be viewed at http://calcium.uhnres.utoronto.ca/pi. As the analysis can be computed quickly on any relational database that supports standard SQL (structured query language), it can be dynamically updated along with the sequence and domain databases, thereby improving the quality of predictions over time.
Search extension transforms Wiki into a relational system: A case for flavonoid metabolite database
Arita, Masanori; Suwa, Kazuhiro
2008-01-01
Background In computer science, database systems are based on the relational model founded by Edgar Codd in 1970. On the other hand, in the area of biology the word 'database' often refers to loosely formatted, very large text files. Although such bio-databases may describe conflicts or ambiguities (e.g. a protein pair do and do not interact, or unknown parameters) in a positive sense, the flexibility of the data format sacrifices a systematic query mechanism equivalent to the widely used SQL. Results To overcome this disadvantage, we propose embeddable string-search commands on a Wiki-based system and designed a half-formatted database. As proof of principle, a database of flavonoid with 6902 molecular structures from over 1687 plant species was implemented on MediaWiki, the background system of Wikipedia. Registered users can describe any information in an arbitrary format. Structured part is subject to text-string searches to realize relational operations. The system was written in PHP language as the extension of MediaWiki. All modifications are open-source and publicly available. Conclusion This scheme benefits from both the free-formatted Wiki style and the concise and structured relational-database style. MediaWiki supports multi-user environments for document management, and the cost for database maintenance is alleviated. PMID:18822113
A comparative study of six European databases of medically oriented Web resources.
Abad García, Francisca; González Teruel, Aurora; Bayo Calduch, Patricia; de Ramón Frias, Rosa; Castillo Blasco, Lourdes
2005-10-01
The paper describes six European medically oriented databases of Web resources, pertaining to five quality-controlled subject gateways, and compares their performance. The characteristics, coverage, procedure for selecting Web resources, record structure, searching possibilities, and existence of user assistance were described for each database. Performance indicators for each database were obtained by means of searches carried out using the key words, "myocardial infarction." Most of the databases originated in the 1990s in an academic or library context and include all types of Web resources of an international nature. Five databases use Medical Subject Headings. The number of fields per record varies between three and nineteen. The language of the search interfaces is mostly English, and some of them allow searches in other languages. In some databases, the search can be extended to Pubmed. Organizing Medical Networked Information, Catalogue et Index des Sites Médicaux Francophones, and Diseases, Disorders and Related Topics produced the best results. The usefulness of these databases as quick reference resources is clear. In addition, their lack of content overlap means that, for the user, they complement each other. Their continued survival faces three challenges: the instability of the Internet, maintenance costs, and lack of use in spite of their potential usefulness.
Adverse events resulting from lasers used in urology.
Althunayan, Abdulaziz M; Elkoushy, Mohamed A; Elhilali, Mostafa M; Andonian, Sero
2014-02-01
To collate world reports of adverse events (AEs) resulting from lasers used in urology. The Manufacturer and User Facility Device Experience (MAUDE) database of the United States Food and Drug Administration (FDA) was searched using the term "Laser for gastro-urology use." In addition, the Rockwell Laser Industries (RLI) Laser Accident Database was searched for the following types of lasers: neodymium-doped yttrium aluminum garnet (Nd:YAG), holmium:yttrium aluminum garnet (Ho:YAG), potassium titanyl phosphate (KTP), diode and thulium:YAG (Tm:YAG). Both databases were last accessed on October 1, 2012. Overall, there were 433 AEs; 166 in MAUDE database (1992-2012) and 267 in RLI database (1964-2005). Most of the AEs (198/433 or 46%) resulted from generator failure or fiber tip breaking. Whereas there were 20 (4.6%) AEs harming medical operators, there were 159 (37%) AEs harming nonmedical operators using Nd:YAG, KTP, and diode lasers. Eye injuries ranging from mild corneal abrasions to total vision loss were reported in 164 AEs with the use of Nd:YAG, KTP, and diode lasers. Overall, there were 36 (8.3%) AEs resulting in patient harm, including 7 (1.6%) mortalities, 3 deaths from ureteral perforation using the Ho:YAG laser, and 4 deaths from air emboli using the Nd:YAG laser. Other reported patient injuries included bladder perforation resulting in urinary diversion in a patient, in addition to minor skin burns, internal burns, and bleeding in others. There were no AEs reported with the use of Tm:YAG laser. Most of the AEs reported relate to equipment failure. There were no eye injuries reported with the use of Ho:YAG lasers. Caution must be exercised when using lasers in urology, including wearing appropriate eye protection when using Nd:YAG, KTP, and diode lasers.
Shepshelovich, D; Goldvaser, H; Wang, L; Abdul Razak, A R
2017-12-13
Introduction The role of phase I cancer trials is constantly evolving and they are increasingly being used in 'go/no' decisions in drug development. As a result, there is a growing need to ensure trials are published when completed. There are limited data on the publication rate and the factors associated with publication in phase I trials. Methods The ClinicalTrials.gov database was searched for completed adult phase I cancer trials with reported results. PubMed was searched for matching publications published prior to April 1, 2017. Logistic regression was used to identify factors associated with unpublished trials. Linear regression was used to explore factors associated with time lag from study database lock to publication for published trials. Results The study cohort included 319 trials. 95 (30%) trials had no matching publication. Thirty (9%) trials were not published in abstract form as well. On multivariable analysis, the most significant factor associated with unpublished trials was industry funding (odds ratio 3.3, 95% confidence interval 1.7-6.6, p=0.019). For published trials, time lag between database lock and publication was longer by 10.9 months (standard error 3.6, p<0.001) for industry funded trials compared with medical center funded trials. Conclusions Timely publishing of early cancer clinical trials results remains unsatisfactory. Industry funded phase I cancer trials were more likely to remain unpublished, and were associated with a longer time lag from database lock to publication. Policies that promote transparency and data sharing in clinical trial research might improve accountability among industry and investigators and improve timely results publication.
Minefields Associated with Mining Data from Peer-reviewed Literature
The USEPA’s ECOTOX database is the largest compilation of ecotoxicity study results, providing information on the adverse effects of single chemical stressors to ecologically relevant aquatic and terrestrial species. The primary source of data included in the ECOTOX database is t...
ERIC Educational Resources Information Center
Klein, Regina; And Others
1988-01-01
The first of three articles describes the results of a survey that examined characteristics and responsibilities of help-desk personnel at major database and online services. The second provides guidelines to using such customer services, and the third lists help-desk numbers for online databases and systems. (CLB)
Literature searches on Ayurveda: An update.
Aggithaya, Madhur G; Narahari, Saravu R
2015-01-01
The journals that publish on Ayurveda are increasingly indexed by popular medical databases in recent years. However, many Eastern journals are not indexed biomedical journal databases such as PubMed. Literature searches for Ayurveda continue to be challenging due to the nonavailability of active, unbiased dedicated databases for Ayurvedic literature. In 2010, authors identified 46 databases that can be used for systematic search of Ayurvedic papers and theses. This update reviewed our previous recommendation and identified current and relevant databases. To update on Ayurveda literature search and strategy to retrieve maximum publications. Author used psoriasis as an example to search previously listed databases and identify new. The population, intervention, control, and outcome table included keywords related to psoriasis and Ayurvedic terminologies for skin diseases. Current citation update status, search results, and search options of previous databases were assessed. Eight search strategies were developed. Hundred and five journals, both biomedical and Ayurveda, which publish on Ayurveda, were identified. Variability in databases was explored to identify bias in journal citation. Five among 46 databases are now relevant - AYUSH research portal, Annotated Bibliography of Indian Medicine, Digital Helpline for Ayurveda Research Articles (DHARA), PubMed, and Directory of Open Access Journals. Search options in these databases are not uniform, and only PubMed allows complex search strategy. "The Researches in Ayurveda" and "Ayurvedic Research Database" (ARD) are important grey resources for hand searching. About 44/105 (41.5%) journals publishing Ayurvedic studies are not indexed in any database. Only 11/105 (10.4%) exclusive Ayurveda journals are indexed in PubMed. AYUSH research portal and DHARA are two major portals after 2010. It is mandatory to search PubMed and four other databases because all five carry citations from different groups of journals. The hand searching is important to identify Ayurveda publications that are not indexed elsewhere. Availability information of citations in Ayurveda libraries from National Union Catalogue of Scientific Serials in India if regularly updated will improve the efficacy of hand searching. A grey database (ARD) contains unpublished PG/Ph.D. theses. The AYUSH portal, DHARA (funded by Ministry of AYUSH), and ARD should be merged to form single larger database to limit Ayurveda literature searches.
dBBQs: dataBase of Bacterial Quality scores.
Wanchai, Visanu; Patumcharoenpol, Preecha; Nookaew, Intawat; Ussery, David
2017-12-28
It is well-known that genome sequencing technologies are becoming significantly cheaper and faster. As a result of this, the exponential growth in sequencing data in public databases allows us to explore ever growing large collections of genome sequences. However, it is less known that the majority of available sequenced genome sequences in public databases are not complete, drafts of varying qualities. We have calculated quality scores for around 100,000 bacterial genomes from all major genome repositories and put them in a fast and easy-to-use database. Prokaryotic genomic data from all sources were collected and combined to make a non-redundant set of bacterial genomes. The genome quality score for each was calculated by four different measurements: assembly quality, number of rRNA and tRNA genes, and the occurrence of conserved functional domains. The dataBase of Bacterial Quality scores (dBBQs) was designed to store and retrieve quality scores. It offers fast searching and download features which the result can be used for further analysis. In addition, the search results are shown in interactive JavaScript chart framework using DC.js. The analysis of quality scores across major public genome databases find that around 68% of the genomes are of acceptable quality for many uses. dBBQs (available at http://arc-gem.uams.edu/dbbqs ) provides genome quality scores for all available prokaryotic genome sequences with a user-friendly Web-interface. These scores can be used as cut-offs to get a high-quality set of genomes for testing bioinformatics tools or improving the analysis. Moreover, all data of the four measurements that were combined to make the quality score for each genome, which can potentially be used for further analysis. dBBQs will be updated regularly and is freely use for non-commercial purpose.
2010-01-01
Background A plant-based diet protects against chronic oxidative stress-related diseases. Dietary plants contain variable chemical families and amounts of antioxidants. It has been hypothesized that plant antioxidants may contribute to the beneficial health effects of dietary plants. Our objective was to develop a comprehensive food database consisting of the total antioxidant content of typical foods as well as other dietary items such as traditional medicine plants, herbs and spices and dietary supplements. This database is intended for use in a wide range of nutritional research, from in vitro and cell and animal studies, to clinical trials and nutritional epidemiological studies. Methods We procured samples from countries worldwide and assayed the samples for their total antioxidant content using a modified version of the FRAP assay. Results and sample information (such as country of origin, product and/or brand name) were registered for each individual food sample and constitute the Antioxidant Food Table. Results The results demonstrate that there are several thousand-fold differences in antioxidant content of foods. Spices, herbs and supplements include the most antioxidant rich products in our study, some exceptionally high. Berries, fruits, nuts, chocolate, vegetables and products thereof constitute common foods and beverages with high antioxidant values. Conclusions This database is to our best knowledge the most comprehensive Antioxidant Food Database published and it shows that plant-based foods introduce significantly more antioxidants into human diet than non-plant foods. Because of the large variations observed between otherwise comparable food samples the study emphasizes the importance of using a comprehensive database combined with a detailed system for food registration in clinical and epidemiological studies. The present antioxidant database is therefore an essential research tool to further elucidate the potential health effects of phytochemical antioxidants in diet. PMID:20096093
Mining the Galaxy Zoo Database: Machine Learning Applications
NASA Astrophysics Data System (ADS)
Borne, Kirk D.; Wallin, J.; Vedachalam, A.; Baehr, S.; Lintott, C.; Darg, D.; Smith, A.; Fortson, L.
2010-01-01
The new Zooniverse initiative is addressing the data flood in the sciences through a transformative partnership between professional scientists, volunteer citizen scientists, and machines. As part of this project, we are exploring the application of machine learning techniques to data mining problems associated with the large and growing database of volunteer science results gathered by the Galaxy Zoo citizen science project. We will describe the basic challenge, some machine learning approaches, and early results. One of the motivators for this study is the acquisition (through the Galaxy Zoo results database) of approximately 100 million classification labels for roughly one million galaxies, yielding a tremendously large and rich set of training examples for improving automated galaxy morphological classification algorithms. In our first case study, the goal is to learn which morphological and photometric features in the Sloan Digital Sky Survey (SDSS) database correlate most strongly with user-selected galaxy morphological class. As a corollary to this study, we are also aiming to identify which galaxy parameters in the SDSS database correspond to galaxies that have been the most difficult to classify (based upon large dispersion in their volunter-provided classifications). Our second case study will focus on similar data mining analyses and machine leaning algorithms applied to the Galaxy Zoo catalog of merging and interacting galaxies. The outcomes of this project will have applications in future large sky surveys, such as the LSST (Large Synoptic Survey Telescope) project, which will generate a catalog of 20 billion galaxies and will produce an additional astronomical alert database of approximately 100 thousand events each night for 10 years -- the capabilities and algorithms that we are exploring will assist in the rapid characterization and classification of such massive data streams. This research has been supported in part through NSF award #0941610.
Validation and extraction of molecular-geometry information from small-molecule databases.
Long, Fei; Nicholls, Robert A; Emsley, Paul; Graǽulis, Saulius; Merkys, Andrius; Vaitkus, Antanas; Murshudov, Garib N
2017-02-01
A freely available small-molecule structure database, the Crystallography Open Database (COD), is used for the extraction of molecular-geometry information on small-molecule compounds. The results are used for the generation of new ligand descriptions, which are subsequently used by macromolecular model-building and structure-refinement software. To increase the reliability of the derived data, and therefore the new ligand descriptions, the entries from this database were subjected to very strict validation. The selection criteria made sure that the crystal structures used to derive atom types, bond and angle classes are of sufficiently high quality. Any suspicious entries at a crystal or molecular level were removed from further consideration. The selection criteria included (i) the resolution of the data used for refinement (entries solved at 0.84 Å resolution or higher) and (ii) the structure-solution method (structures must be from a single-crystal experiment and all atoms of generated molecules must have full occupancies), as well as basic sanity checks such as (iii) consistency between the valences and the number of connections between atoms, (iv) acceptable bond-length deviations from the expected values and (v) detection of atomic collisions. The derived atom types and bond classes were then validated using high-order moment-based statistical techniques. The results of the statistical analyses were fed back to fine-tune the atom typing. The developed procedure was repeated four times, resulting in fine-grained atom typing, bond and angle classes. The procedure will be repeated in the future as and when new entries are deposited in the COD. The whole procedure can also be applied to any source of small-molecule structures, including the Cambridge Structural Database and the ZINC database.
NASA Astrophysics Data System (ADS)
Modolo, R.; Hess, S.; Génot, V.; Leclercq, L.; Leblanc, F.; Chaufray, J.-Y.; Weill, P.; Gangloff, M.; Fedorov, A.; Budnik, E.; Bouchemit, M.; Steckiewicz, M.; André, N.; Beigbeder, L.; Popescu, D.; Toniutti, J.-P.; Al-Ubaidi, T.; Khodachenko, M.; Brain, D.; Curry, S.; Jakosky, B.; Holmström, M.
2018-01-01
We present the Latmos Hybrid Simulation (LatHyS) database, which is dedicated to the investigations of planetary plasma environment. Simulation results of several planetary objects (Mars, Mercury, Ganymede) are available in an online catalogue. The full description of the simulations and their results is compliant with a data model developped in the framework of the FP7 IMPEx project. The catalogue is interfaced with VO-visualization tools such AMDA, 3DView, TOPCAT, CLweb or the IMPEx portal. Web services ensure the possibilities of accessing and extracting simulated quantities/data. We illustrate the interoperability between the simulation database and VO-tools using a detailed science case that focuses on a three-dimensional representation of the solar wind interaction with the Martian upper atmosphere, combining MAVEN and Mars Express observations and simulation results.
CardioTF, a database of deconstructing transcriptional circuits in the heart system.
Zhen, Yisong
2016-01-01
Information on cardiovascular gene transcription is fragmented and far behind the present requirements of the systems biology field. To create a comprehensive source of data for cardiovascular gene regulation and to facilitate a deeper understanding of genomic data, the CardioTF database was constructed. The purpose of this database is to collate information on cardiovascular transcription factors (TFs), position weight matrices (PWMs), and enhancer sequences discovered using the ChIP-seq method. The Naïve-Bayes algorithm was used to classify literature and identify all PubMed abstracts on cardiovascular development. The natural language learning tool GNAT was then used to identify corresponding gene names embedded within these abstracts. Local Perl scripts were used to integrate and dump data from public databases into the MariaDB management system (MySQL). In-house R scripts were written to analyze and visualize the results. Known cardiovascular TFs from humans and human homologs from fly, Ciona, zebrafish, frog, chicken, and mouse were identified and deposited in the database. PWMs from Jaspar, hPDI, and UniPROBE databases were deposited in the database and can be retrieved using their corresponding TF names. Gene enhancer regions from various sources of ChIP-seq data were deposited into the database and were able to be visualized by graphical output. Besides biocuration, mouse homologs of the 81 core cardiac TFs were selected using a Naïve-Bayes approach and then by intersecting four independent data sources: RNA profiling, expert annotation, PubMed abstracts and phenotype. The CardioTF database can be used as a portal to construct transcriptional network of cardiac development. Database URL: http://www.cardiosignal.org/database/cardiotf.html.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Roberts, D
Purpose: A unified database system was developed to allow accumulation, review and analysis of quality assurance (QA) data for measurement, treatment, imaging and simulation equipment in our department. Recording these data in a database allows a unified and structured approach to review and analysis of data gathered using commercial database tools. Methods: A clinical database was developed to track records of quality assurance operations on linear accelerators, a computed tomography (CT) scanner, high dose rate (HDR) afterloader and imaging systems such as on-board imaging (OBI) and Calypso in our department. The database was developed using Microsoft Access database and visualmore » basic for applications (VBA) programming interface. Separate modules were written for accumulation, review and analysis of daily, monthly and annual QA data. All modules were designed to use structured query language (SQL) as the basis of data accumulation and review. The SQL strings are dynamically re-written at run time. The database also features embedded documentation, storage of documents produced during QA activities and the ability to annotate all data within the database. Tests are defined in a set of tables that define test type, specific value, and schedule. Results: Daily, Monthly and Annual QA data has been taken in parallel with established procedures to test MQA. The database has been used to aggregate data across machines to examine the consistency of machine parameters and operations within the clinic for several months. Conclusion: The MQA application has been developed as an interface to a commercially available SQL engine (JET 5.0) and a standard database back-end. The MQA system has been used for several months for routine data collection.. The system is robust, relatively simple to extend and can be migrated to a commercial SQL server.« less
A survey of commercial object-oriented database management systems
NASA Technical Reports Server (NTRS)
Atkins, John
1992-01-01
The object-oriented data model is the culmination of over thirty years of database research. Initially, database research focused on the need to provide information in a consistent and efficient manner to the business community. Early data models such as the hierarchical model and the network model met the goal of consistent and efficient access to data and were substantial improvements over simple file mechanisms for storing and accessing data. However, these models required highly skilled programmers to provide access to the data. Consequently, in the early 70's E.F. Codd, an IBM research computer scientists, proposed a new data model based on the simple mathematical notion of the relation. This model is known as the Relational Model. In the relational model, data is represented in flat tables (or relations) which have no physical or internal links between them. The simplicity of this model fostered the development of powerful but relatively simple query languages that now made data directly accessible to the general database user. Except for large, multi-user database systems, a database professional was in general no longer necessary. Database professionals found that traditional data in the form of character data, dates, and numeric data were easily represented and managed via the relational model. Commercial relational database management systems proliferated and performance of relational databases improved dramatically. However, there was a growing community of potential database users whose needs were not met by the relational model. These users needed to store data with data types not available in the relational model and who required a far richer modelling environment than that provided by the relational model. Indeed, the complexity of the objects to be represented in the model mandated a new approach to database technology. The Object-Oriented Model was the result.
2014-06-01
central location. Each of the SQLite databases are converted and stored in one MySQL database and the pcap files are parsed to extract call information...from the specific communications applications used during the experiment. This extracted data is then stored in the same MySQL database. With all...rhythm of the event. Figure 3 demonstrates the application usage over the course of the experiment for the EXDIR. As seen, the EXDIR spent the majority
Recent NASA Wake-Vortex Flight Tests, Flow-Physics Database and Wake-Development Analysis
NASA Technical Reports Server (NTRS)
Vicroy, Dan D.; Vijgen, Paul M.; Reimer, Heidi M.; Gallegos, Joey L.; Spalart, Philippe R.
1998-01-01
A series of flight tests over the ocean of a four engine turboprop airplane in the cruise configuration have provided a data set for improved understanding of wake vortex physics and atmospheric interaction. An integrated database has been compiled for wake characterization and validation of wake-vortex computational models. This paper describes the wake-vortex flight tests, the data processing, the database development and access, and results obtained from preliminary wake-characterization analysis using the data sets.
An Online Resource for Flight Test Safety Planning
NASA Technical Reports Server (NTRS)
Lewis, Greg
2007-01-01
A viewgraph presentation describing an online database for flight test safety techniques is shown. The topics include: 1) Goal; 2) Test Hazard Analyses; 3) Online Database Background; 4) Data Gathering; 5) NTPS Role; 6) Organizations; 7) Hazard Titles; 8) FAR Paragraphs; 9) Maneuver Name; 10) Identified Hazard; 11) Matured Hazard Titles; 12) Loss of Control Causes; 13) Mitigations; 14) Database Now Open to the Public; 15) FAR Reference Search; 16) Record Field Search; 17) Keyword Search; and 18) Results of FAR Reference Search.
1993-06-09
within the framework of an update for the computer database "DiaNIK" which has been developed at the Vernadsky Institute of Geochemistry and Analytical...chemical thermodynamic data for minerals and mineral-forming substances. The structure of thermodynamic database "DiaNIK" is based on the principles...in the database . A substantial portion of the thermodynamic values recommended by "DiaNIK" experts for the substances in User Version 3.1 resulted from
Niu, Heng; Yang, Jingyu; Yang, Kunxian; Huang, Yingze
2017-11-01
DNA promoter methylation can suppresses gene expression and shows an important role in the biological functions of Ras association domain family 1A (RASSF1A). Many studies have performed to elucidate the role of RASSF1A promoter methylation in thyroid carcinoma, while the results were conflicting and heterogeneous. Here, we analyzed the data of databases to determine the relationship between RASSF1A promoter methylation and thyroid carcinoma. We used the data from 14 cancer-normal studies and Gene Expression Omnibus (GEO) database to analyze RASSF1A promoter methylation in thyroid carcinoma susceptibility. The data from the Cancer Genome Atlas project (TCGA) database was used to analyze the relationship between RASSF1A promoter methylation and thyroid carcinoma susceptibility, clinical characteristics, prognosis. Odds ratios were estimated for thyroid carcinoma susceptibility and hazard ratios were estimated for thyroid carcinoma prognosis. The heterogeneity between studies of meta-analysis was explored using H, I values, and meta-regression. We adopted quality criteria to classify the studies of meta-analysis. Subgroup analyses were done for thyroid carcinoma susceptibility according to ethnicity, methods, and primers. Result of meta-analysis indicated that RASSF1A promoter methylation is associated with higher susceptibility to thyroid carcinoma with small heterogeneity. Similarly, the result from GEO database also showed that a significant association between RASSF1A gene promoter methylation and thyroid carcinoma susceptibility. For the results of TCGA database, we found that RASSF1A promoter methylation is associated with susceptibility and poor disease-free survival (DFS) of thyroid carcinoma. In addition, we also found a close association between RASSF1A promoter methylation and patient tumor stage and age, but not in patients of different genders. The methylation status of RASSF1A promoter is strongly associated with thyroid carcinoma susceptibility and DFS. The RASSF1A promoter methylation test can be applied in the clinical diagnosis of thyroid carcinoma.
Wang, Jingjing; Sun, Tao; Gao, Ni; Menon, Desmond Dev; Luo, Yanxia; Gao, Qi; Li, Xia; Wang, Wei; Zhu, Huiping; Lv, Pingxin; Liang, Zhigang; Tao, Lixin; Liu, Xiangtong; Guo, Xiuhua
2014-01-01
To determine the value of contourlet textural features obtained from solitary pulmonary nodules in two dimensional CT images used in diagnoses of lung cancer. A total of 6,299 CT images were acquired from 336 patients, with 1,454 benign pulmonary nodule images from 84 patients (50 male, 34 female) and 4,845 malignant from 252 patients (150 male, 102 female). Further to this, nineteen patient information categories, which included seven demographic parameters and twelve morphological features, were also collected. A contourlet was used to extract fourteen types of textural features. These were then used to establish three support vector machine models. One comprised a database constructed of nineteen collected patient information categories, another included contourlet textural features and the third one contained both sets of information. Ten-fold cross-validation was used to evaluate the diagnosis results for the three databases, with sensitivity, specificity, accuracy, the area under the curve (AUC), precision, Youden index, and F-measure were used as the assessment criteria. In addition, the synthetic minority over-sampling technique (SMOTE) was used to preprocess the unbalanced data. Using a database containing textural features and patient information, sensitivity, specificity, accuracy, AUC, precision, Youden index, and F-measure were: 0.95, 0.71, 0.89, 0.89, 0.92, 0.66, and 0.93 respectively. These results were higher than results derived using the database without textural features (0.82, 0.47, 0.74, 0.67, 0.84, 0.29, and 0.83 respectively) as well as the database comprising only textural features (0.81, 0.64, 0.67, 0.72, 0.88, 0.44, and 0.85 respectively). Using the SMOTE as a pre-processing procedure, new balanced database generated, including observations of 5,816 benign ROIs and 5,815 malignant ROIs, and accuracy was 0.93. Our results indicate that the combined contourlet textural features of solitary pulmonary nodules in CT images with patient profile information could potentially improve the diagnosis of lung cancer.
Construction of In-house Databases in a Corporation
NASA Astrophysics Data System (ADS)
Dezaki, Kyoko; Saeki, Makoto
Rapid progress in advanced informationalization has increased need to enforce documentation activities in industries. Responding to it Tokin Corporation has been engaged in database construction for patent information, technical reports and so on accumulated inside the Company. Two results are obtained; One is TOPICS, inhouse patent information management system, the other is TOMATIS, management and technical information system by use of personal computers and all-purposed relational database software. These systems aim at compiling databases of patent and technological management information generated internally and externally by low labor efforts as well as low cost, and providing for comprehensive information company-wide. This paper introduces the outline of these systems and how they are actually used.
An online database of nuclear electromagnetic moments
NASA Astrophysics Data System (ADS)
Mertzimekis, T. J.; Stamou, K.; Psaltis, A.
2016-01-01
Measurements of nuclear magnetic dipole and electric quadrupole moments are considered quite important for the understanding of nuclear structure both near and far from the valley of stability. The recent advent of radioactive beams has resulted in a plethora of new, continuously flowing, experimental data on nuclear structure - including nuclear moments - which hinders the information management. A new, dedicated, public and user friendly online database (http://magneticmoments.info) has been created comprising experimental data of nuclear electromagnetic moments. The present database supersedes existing printed compilations, including also non-evaluated series of data and relevant meta-data, while putting strong emphasis on bimonthly updates. The scope, features and extensions of the database are reported.
Compartmental and Data-Based Modeling of Cerebral Hemodynamics: Linear Analysis.
Henley, B C; Shin, D C; Zhang, R; Marmarelis, V Z
Compartmental and data-based modeling of cerebral hemodynamics are alternative approaches that utilize distinct model forms and have been employed in the quantitative study of cerebral hemodynamics. This paper examines the relation between a compartmental equivalent-circuit and a data-based input-output model of dynamic cerebral autoregulation (DCA) and CO2-vasomotor reactivity (DVR). The compartmental model is constructed as an equivalent-circuit utilizing putative first principles and previously proposed hypothesis-based models. The linear input-output dynamics of this compartmental model are compared with data-based estimates of the DCA-DVR process. This comparative study indicates that there are some qualitative similarities between the two-input compartmental model and experimental results.
Fusion of Dependent and Independent Biometric Information Sources
2005-03-01
palmprint , DNA, ECG, signature, etc. The comparison of various biometric techniques is given in [13] and is presented in Table 1. Since, each...theory. Experimental studies on the M2VTS database [32] showed that a reduction in error rates is up to about 40%. Four combination strategies are...taken from the CEDAR benchmark database . The word recognition results were the highest (91%) among published results for handwritten words (before 2001
Code of Federal Regulations, 2010 CFR
2010-01-01
... database resulting from the transformation of the ENC by ECDIS for appropriate use, updates to the ENC by... of the 1974 SOLAS Convention. Electronic Navigational Chart (ENC) means a database, standardized as to content, structure, and format, issued for use with ECDIS on the authority of government...
Results from a new die-to-database reticle inspection platform
NASA Astrophysics Data System (ADS)
Broadbent, William; Xiong, Yalin; Giusti, Michael; Walsh, Robert; Dayal, Aditya
2007-03-01
A new die-to-database high-resolution reticle defect inspection system has been developed for the 45nm logic node and extendable to the 32nm node (also the comparable memory nodes). These nodes will use predominantly 193nm immersion lithography although EUV may also be used. According to recent surveys, the predominant reticle types for the 45nm node are 6% simple tri-tone and COG. Other advanced reticle types may also be used for these nodes including: dark field alternating, Mask Enhancer, complex tri-tone, high transmission, CPL, EUV, etc. Finally, aggressive model based OPC will typically be used which will include many small structures such as jogs, serifs, and SRAF (sub-resolution assist features) with accompanying very small gaps between adjacent structures. The current generation of inspection systems is inadequate to meet these requirements. The architecture and performance of a new die-to-database inspection system is described. This new system is designed to inspect the aforementioned reticle types in die-to-database and die-to-die modes. Recent results from internal testing of the prototype systems are shown. The results include standard programmed defect test reticles and advanced 45nm and 32nm node reticles from industry sources. The results show high sensitivity and low false detections being achieved.
Extracting patterns of database and software usage from the bioinformatics literature
Duck, Geraint; Nenadic, Goran; Brass, Andy; Robertson, David L.; Stevens, Robert
2014-01-01
Motivation: As a natural consequence of being a computer-based discipline, bioinformatics has a strong focus on database and software development, but the volume and variety of resources are growing at unprecedented rates. An audit of database and software usage patterns could help provide an overview of developments in bioinformatics and community common practice, and comparing the links between resources through time could demonstrate both the persistence of existing software and the emergence of new tools. Results: We study the connections between bioinformatics resources and construct networks of database and software usage patterns, based on resource co-occurrence, that correspond to snapshots of common practice in the bioinformatics community. We apply our approach to pairings of phylogenetics software reported in the literature and argue that these could provide a stepping stone into the identification of scientific best practice. Availability and implementation: The extracted resource data, the scripts used for network generation and the resulting networks are available at http://bionerds.sourceforge.net/networks/ Contact: robert.stevens@manchester.ac.uk PMID:25161253
Computerized database management system for breast cancer patients.
Sim, Kok Swee; Chong, Sze Siang; Tso, Chih Ping; Nia, Mohsen Esmaeili; Chong, Aun Kee; Abbas, Siti Fathimah
2014-01-01
Data analysis based on breast cancer risk factors such as age, race, breastfeeding, hormone replacement therapy, family history, and obesity was conducted on breast cancer patients using a new enhanced computerized database management system. My Structural Query Language (MySQL) is selected as the application for database management system to store the patient data collected from hospitals in Malaysia. An automatic calculation tool is embedded in this system to assist the data analysis. The results are plotted automatically and a user-friendly graphical user interface is developed that can control the MySQL database. Case studies show breast cancer incidence rate is highest among Malay women, followed by Chinese and Indian. The peak age for breast cancer incidence is from 50 to 59 years old. Results suggest that the chance of developing breast cancer is increased in older women, and reduced with breastfeeding practice. The weight status might affect the breast cancer risk differently. Additional studies are needed to confirm these findings.
Effective spatial database support for acquiring spatial information from remote sensing images
NASA Astrophysics Data System (ADS)
Jin, Peiquan; Wan, Shouhong; Yue, Lihua
2009-12-01
In this paper, a new approach to maintain spatial information acquiring from remote-sensing images is presented, which is based on Object-Relational DBMS. According to this approach, the detected and recognized results of targets are stored and able to be further accessed in an ORDBMS-based spatial database system, and users can access the spatial information using the standard SQL interface. This approach is different from the traditional ArcSDE-based method, because the spatial information management module is totally integrated into the DBMS and becomes one of the core modules in the DBMS. We focus on three issues, namely the general framework for the ORDBMS-based spatial database system, the definitions of the add-in spatial data types and operators, and the process to develop a spatial Datablade on Informix. The results show that the ORDBMS-based spatial database support for image-based target detecting and recognition is easy and practical to be implemented.
Final Results of Shuttle MMOD Impact Database
NASA Technical Reports Server (NTRS)
Hyde, J. L.; Christiansen, E. L.; Lear, D. M.
2015-01-01
The Shuttle Hypervelocity Impact Database documents damage features on each Orbiter thought to be from micrometeoroids (MM) or orbital debris (OD). Data is divided into tables for crew module windows, payload bay door radiators and thermal protection systems along with other miscellaneous regions. The combined number of records in the database is nearly 3000. Each database record provides impact feature dimensions, location on the vehicle and relevant mission information. Additional detail on the type and size of particle that produced the damage site is provided when sampling data and definitive spectroscopic analysis results are available. Guidelines are described which were used in determining whether impact damage is from micrometeoroid or orbital debris impact based on the findings from scanning electron microscopy chemical analysis. Relationships assumed when converting from observed feature sizes in different shuttle materials to particle sizes will be presented. A small number of significant impacts on the windows, radiators and wing leading edge will be highlighted and discussed in detail, including the hypervelocity impact testing performed to estimate particle sizes that produced the damage.
NASA Astrophysics Data System (ADS)
Protsyuk, Yu.; Pinigin, G.; Shulga, A.
2005-06-01
Results of the development and organization of the digital database of the Nikolaev Astronomical Observatory (NAO) are presented. At present, three telescopes are connected to the local area network of NAO. All the data obtained, and results of data processing are entered into the common database of NAO. The daily average volume of new astronomical information obtained from the CCD instruments ranges from 300 MB up to 2 GB, depending on the purposes and conditions of observations. The overwhelming majority of the data are stored in the FITS format. Development and further improvement of storage standards, procedures of data handling and data processing are being carried out. It is planned to create an astronomical web portal with the possibility to have interactive access to databases and telescopes. In the future, this resource may become a part of an international virtual observatory. There are the prototypes of search tools with the use of PHP and MySQL. Efforts for getting more links to the Internet are being made.
Search Filter Precision Can Be Improved By NOTing Out Irrelevant Content
Wilczynski, Nancy L.; McKibbon, K. Ann; Haynes, R. Brian
2011-01-01
Background: Most methodologic search filters developed for use in large electronic databases such as MEDLINE have low precision. One method that has been proposed but not tested for improving precision is NOTing out irrelevant content. Objective: To determine if search filter precision can be improved by NOTing out the text words and index terms assigned to those articles that are retrieved but are off-target. Design: Analytic survey. Methods: NOTing out unique terms in off-target articles and testing search filter performance in the Clinical Hedges Database. Main Outcome Measures: Sensitivity, specificity, precision and number needed to read (NNR). Results: For all purpose categories (diagnosis, prognosis and etiology) except treatment and for all databases (MEDLINE, EMBASE, CINAHL and PsycINFO), constructing search filters that NOTed out irrelevant content resulted in substantive improvements in NNR (over four-fold for some purpose categories and databases). Conclusion: Search filter precision can be improved by NOTing out irrelevant content. PMID:22195215
Olejniczak, Marta; Galka-Marciniak, Paulina; Polak, Katarzyna; Fligier, Andrzej; Krzyzosiak, Wlodzimierz J.
2012-01-01
The RNAimmuno database was created to provide easy access to information regarding the nonspecific effects generated in cells by RNA interference triggers and microRNA regulators. Various RNAi and microRNA reagents, which differ in length and structure, often cause non-sequence-specific immune responses, in addition to triggering the intended sequence-specific effects. The activation of the cellular sensors of foreign RNA or DNA may lead to the induction of type I interferon and proinflammatory cytokine release. Subsequent changes in the cellular transcriptome and proteome may result in adverse effects, including cell death during therapeutic treatments or the misinterpretation of experimental results in research applications. The manually curated RNAimmuno database gathers the majority of the published data regarding the immunological side effects that are caused in investigated cell lines, tissues, and model organisms by different reagents. The database is accessible at http://rnaimmuno.ibch.poznan.pl and may be helpful in the further application and development of RNAi- and microRNA-based technologies. PMID:22411954
Olejniczak, Marta; Galka-Marciniak, Paulina; Polak, Katarzyna; Fligier, Andrzej; Krzyzosiak, Wlodzimierz J
2012-05-01
The RNAimmuno database was created to provide easy access to information regarding the nonspecific effects generated in cells by RNA interference triggers and microRNA regulators. Various RNAi and microRNA reagents, which differ in length and structure, often cause non-sequence-specific immune responses, in addition to triggering the intended sequence-specific effects. The activation of the cellular sensors of foreign RNA or DNA may lead to the induction of type I interferon and proinflammatory cytokine release. Subsequent changes in the cellular transcriptome and proteome may result in adverse effects, including cell death during therapeutic treatments or the misinterpretation of experimental results in research applications. The manually curated RNAimmuno database gathers the majority of the published data regarding the immunological side effects that are caused in investigated cell lines, tissues, and model organisms by different reagents. The database is accessible at http://rnaimmuno.ibch.poznan.pl and may be helpful in the further application and development of RNAi- and microRNA-based technologies.
A psycholinguistic database for traditional Chinese character naming.
Chang, Ya-Ning; Hsu, Chun-Hsien; Tsai, Jie-Li; Chen, Chien-Liang; Lee, Chia-Ying
2016-03-01
In this study, we aimed to provide a large-scale set of psycholinguistic norms for 3,314 traditional Chinese characters, along with their naming reaction times (RTs), collected from 140 Chinese speakers. The lexical and semantic variables in the database include frequency, regularity, familiarity, consistency, number of strokes, homophone density, semantic ambiguity rating, phonetic combinability, semantic combinability, and the number of disyllabic compound words formed by a character. Multiple regression analyses were conducted to examine the predictive powers of these variables for the naming RTs. The results demonstrated that these variables could account for a significant portion of variance (55.8%) in the naming RTs. An additional multiple regression analysis was conducted to demonstrate the effects of consistency and character frequency. Overall, the regression results were consistent with the findings of previous studies on Chinese character naming. This database should be useful for research into Chinese language processing, Chinese education, or cross-linguistic comparisons. The database can be accessed via an online inquiry system (http://ball.ling.sinica.edu.tw/namingdatabase/index.html).
Benigni, Romualdo; Battistelli, Chiara Laura; Bossa, Cecilia; Tcheremenskaia, Olga; Crettaz, Pierre
2013-07-01
Currently, the public has access to a variety of databases containing mutagenicity and carcinogenicity data. These resources are crucial for the toxicologists and regulators involved in the risk assessment of chemicals, which necessitates access to all the relevant literature, and the capability to search across toxicity databases using both biological and chemical criteria. Towards the larger goal of screening chemicals for a wide range of toxicity end points of potential interest, publicly available resources across a large spectrum of biological and chemical data space must be effectively harnessed with current and evolving information technologies (i.e. systematised, integrated and mined), if long-term screening and prediction objectives are to be achieved. A key to rapid progress in the field of chemical toxicity databases is that of combining information technology with the chemical structure as identifier of the molecules. This permits an enormous range of operations (e.g. retrieving chemicals or chemical classes, describing the content of databases, finding similar chemicals, crossing biological and chemical interrogations, etc.) that other more classical databases cannot allow. This article describes the progress in the technology of toxicity databases, including the concepts of Chemical Relational Database and Toxicological Standardized Controlled Vocabularies (Ontology). Then it describes the ISSTOX cluster of toxicological databases at the Istituto Superiore di Sanitá. It consists of freely available databases characterised by the use of modern information technologies and by curation of the quality of the biological data. Finally, this article provides examples of analyses and results made possible by ISSTOX.
Performance of Stratified and Subgrouped Disproportionality Analyses in Spontaneous Databases.
Seabroke, Suzie; Candore, Gianmario; Juhlin, Kristina; Quarcoo, Naashika; Wisniewski, Antoni; Arani, Ramin; Painter, Jeffery; Tregunno, Philip; Norén, G Niklas; Slattery, Jim
2016-04-01
Disproportionality analyses are used in many organisations to identify adverse drug reactions (ADRs) from spontaneous report data. Reporting patterns vary over time, with patient demographics, and between different geographical regions, and therefore subgroup analyses or adjustment by stratification may be beneficial. The objective of this study was to evaluate the performance of subgroup and stratified disproportionality analyses for a number of key covariates within spontaneous report databases of differing sizes and characteristics. Using a reference set of established ADRs, signal detection performance (sensitivity and precision) was compared for stratified, subgroup and crude (unadjusted) analyses within five spontaneous report databases (two company, one national and two international databases). Analyses were repeated for a range of covariates: age, sex, country/region of origin, calendar time period, event seriousness, vaccine/non-vaccine, reporter qualification and report source. Subgroup analyses consistently performed better than stratified analyses in all databases. Subgroup analyses also showed benefits in both sensitivity and precision over crude analyses for the larger international databases, whilst for the smaller databases a gain in precision tended to result in some loss of sensitivity. Additionally, stratified analyses did not increase sensitivity or precision beyond that associated with analytical artefacts of the analysis. The most promising subgroup covariates were age and region/country of origin, although this varied between databases. Subgroup analyses perform better than stratified analyses and should be considered over the latter in routine first-pass signal detection. Subgroup analyses are also clearly beneficial over crude analyses for larger databases, but further validation is required for smaller databases.
Selby, Luke V; Sjoberg, Daniel D; Cassella, Danielle; Sovel, Mindy; Weiser, Martin R; Sepkowitz, Kent; Jones, David R; Strong, Vivian E
2015-06-15
Surgical quality improvement requires accurate tracking and benchmarking of postoperative adverse events. We track surgical site infections (SSIs) with two systems; our in-house surgical secondary events (SSE) database and the National Surgical Quality Improvement Project (NSQIP). The SSE database, a modification of the Clavien-Dindo classification, categorizes SSIs by their anatomic site, whereas NSQIP categorizes by their level. Our aim was to directly compare these different definitions. NSQIP and the SSE database entries for all surgeries performed in 2011 and 2012 were compared. To match NSQIP definitions, and while blinded to NSQIP results, entries in the SSE database were categorized as either incisional (superficial or deep) or organ space infections. These categorizations were compared with NSQIP records; agreement was assessed with Cohen kappa. The 5028 patients in our cohort had a 6.5% SSI in the SSE database and a 4% rate in NSQIP, with an overall agreement of 95% (kappa = 0.48, P < 0.0001). The rates of categorized infections were similarly well matched; incisional rates of 4.1% and 2.7% for the SSE database and NSQIP and organ space rates of 2.6% and 1.5%. Overall agreements were 96% (kappa = 0.36, P < 0.0001) and 98% (kappa = 0.55, P < 0.0001), respectively. Over 80% of cases recorded by the SSE database but not NSQIP did not meet NSQIP criteria. The SSE database is an accurate, real-time record of postoperative SSIs. Institutional databases that capture all surgical cases can be used in conjunction with NSQIP with excellent concordance. Copyright © 2015 Elsevier Inc. All rights reserved.
Martin, Stanton L; Blackmon, Barbara P; Rajagopalan, Ravi; Houfek, Thomas D; Sceeles, Robert G; Denn, Sheila O; Mitchell, Thomas K; Brown, Douglas E; Wing, Rod A; Dean, Ralph A
2002-01-01
We have created a federated database for genome studies of Magnaporthe grisea, the causal agent of rice blast disease, by integrating end sequence data from BAC clones, genetic marker data and BAC contig assembly data. A library of 9216 BAC clones providing >25-fold coverage of the entire genome was end sequenced and fingerprinted by HindIII digestion. The Image/FPC software package was then used to generate an assembly of 188 contigs covering >95% of the genome. The database contains the results of this assembly integrated with hybridization data of genetic markers to the BAC library. AceDB was used for the core database engine and a MySQL relational database, populated with numerical representations of BAC clones within FPC contigs, was used to create appropriately scaled images. The database is being used to facilitate sequencing efforts. The database also allows researchers mapping known genes or other sequences of interest, rapid and easy access to the fundamental organization of the M.grisea genome. This database, MagnaportheDB, can be accessed on the web at http://www.cals.ncsu.edu/fungal_genomics/mgdatabase/int.htm.
New tools and methods for direct programmatic access to the dbSNP relational database.
Saccone, Scott F; Quan, Jiaxi; Mehta, Gaurang; Bolze, Raphael; Thomas, Prasanth; Deelman, Ewa; Tischfield, Jay A; Rice, John P
2011-01-01
Genome-wide association studies often incorporate information from public biological databases in order to provide a biological reference for interpreting the results. The dbSNP database is an extensive source of information on single nucleotide polymorphisms (SNPs) for many different organisms, including humans. We have developed free software that will download and install a local MySQL implementation of the dbSNP relational database for a specified organism. We have also designed a system for classifying dbSNP tables in terms of common tasks we wish to accomplish using the database. For each task we have designed a small set of custom tables that facilitate task-related queries and provide entity-relationship diagrams for each task composed from the relevant dbSNP tables. In order to expose these concepts and methods to a wider audience we have developed web tools for querying the database and browsing documentation on the tables and columns to clarify the relevant relational structure. All web tools and software are freely available to the public at http://cgsmd.isi.edu/dbsnpq. Resources such as these for programmatically querying biological databases are essential for viably integrating biological information into genetic association experiments on a genome-wide scale.
PoMaMo--a comprehensive database for potato genome data.
Meyer, Svenja; Nagel, Axel; Gebhardt, Christiane
2005-01-01
A database for potato genome data (PoMaMo, Potato Maps and More) was established. The database contains molecular maps of all twelve potato chromosomes with about 1000 mapped elements, sequence data, putative gene functions, results from BLAST analysis, SNP and InDel information from different diploid and tetraploid potato genotypes, publication references, links to other public databases like GenBank (http://www.ncbi.nlm.nih.gov/) or SGN (Solanaceae Genomics Network, http://www.sgn.cornell.edu/), etc. Flexible search and data visualization interfaces enable easy access to the data via internet (https://gabi.rzpd.de/PoMaMo.html). The Java servlet tool YAMB (Yet Another Map Browser) was designed to interactively display chromosomal maps. Maps can be zoomed in and out, and detailed information about mapped elements can be obtained by clicking on an element of interest. The GreenCards interface allows a text-based data search by marker-, sequence- or genotype name, by sequence accession number, gene function, BLAST Hit or publication reference. The PoMaMo database is a comprehensive database for different potato genome data, and to date the only database containing SNP and InDel data from diploid and tetraploid potato genotypes.
PoMaMo—a comprehensive database for potato genome data
Meyer, Svenja; Nagel, Axel; Gebhardt, Christiane
2005-01-01
A database for potato genome data (PoMaMo, Potato Maps and More) was established. The database contains molecular maps of all twelve potato chromosomes with about 1000 mapped elements, sequence data, putative gene functions, results from BLAST analysis, SNP and InDel information from different diploid and tetraploid potato genotypes, publication references, links to other public databases like GenBank (http://www.ncbi.nlm.nih.gov/) or SGN (Solanaceae Genomics Network, http://www.sgn.cornell.edu/), etc. Flexible search and data visualization interfaces enable easy access to the data via internet (https://gabi.rzpd.de/PoMaMo.html). The Java servlet tool YAMB (Yet Another Map Browser) was designed to interactively display chromosomal maps. Maps can be zoomed in and out, and detailed information about mapped elements can be obtained by clicking on an element of interest. The GreenCards interface allows a text-based data search by marker-, sequence- or genotype name, by sequence accession number, gene function, BLAST Hit or publication reference. The PoMaMo database is a comprehensive database for different potato genome data, and to date the only database containing SNP and InDel data from diploid and tetraploid potato genotypes. PMID:15608284
Osteoporosis therapies: evidence from health-care databases and observational population studies.
Silverman, Stuart L
2010-11-01
Osteoporosis is a well-recognized disease with severe consequences if left untreated. Randomized controlled trials are the most rigorous method for determining the efficacy and safety of therapies. Nevertheless, randomized controlled trials underrepresent the real-world patient population and are costly in both time and money. Modern technology has enabled researchers to use information gathered from large health-care or medical-claims databases to assess the practical utilization of available therapies in appropriate patients. Observational database studies lack randomization but, if carefully designed and successfully completed, can provide valuable information that complements results obtained from randomized controlled trials and extends our knowledge to real-world clinical patients. Randomized controlled trials comparing fracture outcomes among osteoporosis therapies are difficult to perform. In this regard, large observational database studies could be useful in identifying clinically important differences among therapeutic options. Database studies can also provide important information with regard to osteoporosis prevalence, health economics, and compliance and persistence with treatment. This article describes the strengths and limitations of both randomized controlled trials and observational database studies, discusses considerations for observational study design, and reviews a wealth of information generated by database studies in the field of osteoporosis.
Searching fee and non-fee toxicology information resources: an overview of selected databases.
Wright, L L
2001-01-12
Toxicology profiles organize information by broad subjects, the first of which affirms identity of the agent studied. Studies here show two non-fee databases (ChemFinder and ChemIDplus) verify the identity of compounds with high efficiency (63% and 73% respectively) with the fee-based Chemical Abstracts Registry file serving well to fill data gaps (100%). Continued searching proceeds using knowledge of structure, scope and content to select databases. Valuable sources for information are factual databases that collect data and facts in special subject areas organized in formats available for analysis or use. Some sources representative of factual files are RTECS, CCRIS, HSDB, GENE-TOX and IRIS. Numerous factual databases offer a wealth of reliable information; however, exhaustive searches probe information published in journal articles and/or technical reports with records residing in bibliographic databases such as BIOSIS, EMBASE, MEDLINE, TOXLINE and Web of Science. Listed with descriptions are numerous factual and bibliographic databases supplied by 11 producers. Given the multitude of options and resources, it is often necessary to seek service desk assistance. Questions were posed by telephone and e-mail to service desks at DIALOG, ISI, MEDLARS, Micromedex and STN International. Results of the survey are reported.
Distribution Characteristics of Air-Bone Gaps – Evidence of Bias in Manual Audiometry
Margolis, Robert H.; Wilson, Richard H.; Popelka, Gerald R.; Eikelboom, Robert H.; Swanepoel, De Wet; Saly, George L.
2015-01-01
Objective Five databases were mined to examine distributions of air-bone gaps obtained by automated and manual audiometry. Differences in distribution characteristics were examined for evidence of influences unrelated to the audibility of test signals. Design The databases provided air- and bone-conduction thresholds that permitted examination of air-bone gap distributions that were free of ceiling and floor effects. Cases with conductive hearing loss were eliminated based on air-bone gaps, tympanometry, and otoscopy, when available. The analysis is based on 2,378,921 threshold determinations from 721,831 subjects from five databases. Results Automated audiometry produced air-bone gaps that were normally distributed suggesting that air- and bone-conduction thresholds are normally distributed. Manual audiometry produced air-bone gaps that were not normally distributed and show evidence of biasing effects of assumptions of expected results. In one database, the form of the distributions showed evidence of inclusion of conductive hearing losses. Conclusions Thresholds obtained by manual audiometry show tester bias effects from assumptions of the patient’s hearing loss characteristics. Tester bias artificially reduces the variance of bone-conduction thresholds and the resulting air-bone gaps. Because the automated method is free of bias from assumptions of expected results, these distributions are hypothesized to reflect the true variability of air- and bone-conduction thresholds and the resulting air-bone gaps. PMID:26627469
Validating crash locations for quantitative spatial analysis: a GIS-based approach.
Loo, Becky P Y
2006-09-01
In this paper, the spatial variables of the crash database in Hong Kong from 1993 to 2004 are validated. The proposed spatial data validation system makes use of three databases (the crash, road network and district board databases) and relies on GIS to carry out most of the validation steps so that the human resource required for manually checking the accuracy of the spatial data can be enormously reduced. With the GIS-based spatial data validation system, it was found that about 65-80% of the police crash records from 1993 to 2004 had correct road names and district board information. In 2004, the police crash database contained about 12.7% mistakes for road names and 9.7% mistakes for district boards. The situation was broadly comparable to the United Kingdom. However, the results also suggest that safety researchers should carefully validate spatial data in the crash database before scientific analysis.
Nørgaard, M; Johnsen, S P
2016-02-01
In Denmark, the need for monitoring of clinical quality and patient safety with feedback to the clinical, administrative and political systems has resulted in the establishment of a network of more than 60 publicly financed nationwide clinical quality databases. Although primarily devoted to monitoring and improving quality of care, the potential of these databases as data sources in clinical research is increasingly being recognized. In this review, we describe these databases focusing on their use as data sources for clinical research, including their strengths and weaknesses as well as future concerns and opportunities. The research potential of the clinical quality databases is substantial but has so far only been explored to a limited extent. Efforts related to technical, legal and financial challenges are needed in order to take full advantage of this potential. © 2016 The Association for the Publication of the Journal of Internal Medicine.
Report on Wind Turbine Subsystem Reliability - A Survey of Various Databases (Presentation)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sheng, S.
2013-07-01
Wind industry has been challenged by premature subsystem/component failures. Various reliability data collection efforts have demonstrated their values in supporting wind turbine reliability and availability research & development and industrial activities. However, most information on these data collection efforts are scattered and not in a centralized place. With the objective of getting updated reliability statistics of wind turbines and/or subsystems so as to benefit future wind reliability and availability activities, this report is put together based on a survey of various reliability databases that are accessible directly or indirectly by NREL. For each database, whenever feasible, a brief description summarizingmore » database population, life span, and data collected is given along with its features & status. Then selective results deemed beneficial to the industry and generated based on the database are highlighted. This report concludes with several observations obtained throughout the survey and several reliability data collection opportunities in the future.« less
A new feature constituting approach to detection of vocal fold pathology
NASA Astrophysics Data System (ADS)
Hariharan, M.; Polat, Kemal; Yaacob, Sazali
2014-08-01
In the last two decades, non-invasive methods through acoustic analysis of voice signal have been proved to be excellent and reliable tool to diagnose vocal fold pathologies. This paper proposes a new feature vector based on the wavelet packet transform and singular value decomposition for the detection of vocal fold pathology. k-means clustering based feature weighting is proposed to increase the distinguishing performance of the proposed features. In this work, two databases Massachusetts Eye and Ear Infirmary (MEEI) voice disorders database and MAPACI speech pathology database are used. Four different supervised classifiers such as k-nearest neighbour (k-NN), least-square support vector machine, probabilistic neural network and general regression neural network are employed for testing the proposed features. The experimental results uncover that the proposed features give very promising classification accuracy of 100% for both MEEI database and MAPACI speech pathology database.
Designing a Multi-Petabyte Database for LSST
DOE Office of Scientific and Technical Information (OSTI.GOV)
Becla, Jacek; Hanushevsky, Andrew; Nikolaev, Sergei
2007-01-10
The 3.2 giga-pixel LSST camera will produce approximately half a petabyte of archive images every month. These data need to be reduced in under a minute to produce real-time transient alerts, and then added to the cumulative catalog for further analysis. The catalog is expected to grow about three hundred terabytes per year. The data volume, the real-time transient alerting requirements of the LSST, and its spatio-temporal aspects require innovative techniques to build an efficient data access system at reasonable cost. As currently envisioned, the system will rely on a database for catalogs and metadata. Several database systems are beingmore » evaluated to understand how they perform at these data rates, data volumes, and access patterns. This paper describes the LSST requirements, the challenges they impose, the data access philosophy, results to date from evaluating available database technologies against LSST requirements, and the proposed database architecture to meet the data challenges.« less
Exploring Discretization Error in Simulation-Based Aerodynamic Databases
NASA Technical Reports Server (NTRS)
Aftosmis, Michael J.; Nemec, Marian
2010-01-01
This work examines the level of discretization error in simulation-based aerodynamic databases and introduces strategies for error control. Simulations are performed using a parallel, multi-level Euler solver on embedded-boundary Cartesian meshes. Discretization errors in user-selected outputs are estimated using the method of adjoint-weighted residuals and we use adaptive mesh refinement to reduce these errors to specified tolerances. Using this framework, we examine the behavior of discretization error throughout a token database computed for a NACA 0012 airfoil consisting of 120 cases. We compare the cost and accuracy of two approaches for aerodynamic database generation. In the first approach, mesh adaptation is used to compute all cases in the database to a prescribed level of accuracy. The second approach conducts all simulations using the same computational mesh without adaptation. We quantitatively assess the error landscape and computational costs in both databases. This investigation highlights sensitivities of the database under a variety of conditions. The presence of transonic shocks or the stiffness in the governing equations near the incompressible limit are shown to dramatically increase discretization error requiring additional mesh resolution to control. Results show that such pathologies lead to error levels that vary by over factor of 40 when using a fixed mesh throughout the database. Alternatively, controlling this sensitivity through mesh adaptation leads to mesh sizes which span two orders of magnitude. We propose strategies to minimize simulation cost in sensitive regions and discuss the role of error-estimation in database quality.
Krystkowiak, Izabella; Lenart, Jakub; Debski, Konrad; Kuterba, Piotr; Petas, Michal; Kaminska, Bozena; Dabrowski, Michal
2013-01-01
We present the Nencki Genomics Database, which extends the functionality of Ensembl Regulatory Build (funcgen) for the three species: human, mouse and rat. The key enhancements over Ensembl funcgen include the following: (i) a user can add private data, analyze them alongside the public data and manage access rights; (ii) inside the database, we provide efficient procedures for computing intersections between regulatory features and for mapping them to the genes. To Ensembl funcgen-derived data, which include data from ENCODE, we add information on conserved non-coding (putative regulatory) sequences, and on genome-wide occurrence of transcription factor binding site motifs from the current versions of two major motif libraries, namely, Jaspar and Transfac. The intersections and mapping to the genes are pre-computed for the public data, and the result of any procedure run on the data added by the users is stored back into the database, thus incrementally increasing the body of pre-computed data. As the Ensembl funcgen schema for the rat is currently not populated, our database is the first database of regulatory features for this frequently used laboratory animal. The database is accessible without registration using the mysql client: mysql -h database.nencki-genomics.org -u public. Registration is required only to add or access private data. A WSDL webservice provides access to the database from any SOAP client, including the Taverna Workbench with a graphical user interface.
Use of Graph Database for the Integration of Heterogeneous Biological Data.
Yoon, Byoung-Ha; Kim, Seon-Kyu; Kim, Seon-Young
2017-03-01
Understanding complex relationships among heterogeneous biological data is one of the fundamental goals in biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and then infer relationships by multiple-join statements. Recently, a new type of database, called the graph-based database, was developed to natively represent various kinds of complex relationships, and it is widely used among computer science communities and IT industries. Here, we demonstrate the feasibility of using a graph-based database for complex biological relationships by comparing the performance between MySQL and Neo4j, one of the most widely used graph databases. We collected various biological data (protein-protein interaction, drug-target, gene-disease, etc.) from several existing sources, removed duplicate and redundant data, and finally constructed a graph database containing 114,550 nodes and 82,674,321 relationships. When we tested the query execution performance of MySQL versus Neo4j, we found that Neo4j outperformed MySQL in all cases. While Neo4j exhibited a very fast response for various queries, MySQL exhibited latent or unfinished responses for complex queries with multiple-join statements. These results show that using graph-based databases, such as Neo4j, is an efficient way to store complex biological relationships. Moreover, querying a graph database in diverse ways has the potential to reveal novel relationships among heterogeneous biological data.
Use of Graph Database for the Integration of Heterogeneous Biological Data
Yoon, Byoung-Ha; Kim, Seon-Kyu
2017-01-01
Understanding complex relationships among heterogeneous biological data is one of the fundamental goals in biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and then infer relationships by multiple-join statements. Recently, a new type of database, called the graph-based database, was developed to natively represent various kinds of complex relationships, and it is widely used among computer science communities and IT industries. Here, we demonstrate the feasibility of using a graph-based database for complex biological relationships by comparing the performance between MySQL and Neo4j, one of the most widely used graph databases. We collected various biological data (protein-protein interaction, drug-target, gene-disease, etc.) from several existing sources, removed duplicate and redundant data, and finally constructed a graph database containing 114,550 nodes and 82,674,321 relationships. When we tested the query execution performance of MySQL versus Neo4j, we found that Neo4j outperformed MySQL in all cases. While Neo4j exhibited a very fast response for various queries, MySQL exhibited latent or unfinished responses for complex queries with multiple-join statements. These results show that using graph-based databases, such as Neo4j, is an efficient way to store complex biological relationships. Moreover, querying a graph database in diverse ways has the potential to reveal novel relationships among heterogeneous biological data. PMID:28416946
NoSQL technologies for the CMS Conditions Database
NASA Astrophysics Data System (ADS)
Sipos, Roland
2015-12-01
With the restart of the LHC in 2015, the growth of the CMS Conditions dataset will continue, therefore the need of consistent and highly available access to the Conditions makes a great cause to revisit different aspects of the current data storage solutions. We present a study of alternative data storage backends for the Conditions Databases, by evaluating some of the most popular NoSQL databases to support a key-value representation of the CMS Conditions. The definition of the database infrastructure is based on the need of storing the conditions as BLOBs. Because of this, each condition can reach the size that may require special treatment (splitting) in these NoSQL databases. As big binary objects may be problematic in several database systems, and also to give an accurate baseline, a testing framework extension was implemented to measure the characteristics of the handling of arbitrary binary data in these databases. Based on the evaluation, prototypes of a document store, using a column-oriented and plain key-value store, are deployed. An adaption layer to access the backends in the CMS Offline software was developed to provide transparent support for these NoSQL databases in the CMS context. Additional data modelling approaches and considerations in the software layer, deployment and automatization of the databases are also covered in the research. In this paper we present the results of the evaluation as well as a performance comparison of the prototypes studied.
Krystkowiak, Izabella; Lenart, Jakub; Debski, Konrad; Kuterba, Piotr; Petas, Michal; Kaminska, Bozena; Dabrowski, Michal
2013-01-01
We present the Nencki Genomics Database, which extends the functionality of Ensembl Regulatory Build (funcgen) for the three species: human, mouse and rat. The key enhancements over Ensembl funcgen include the following: (i) a user can add private data, analyze them alongside the public data and manage access rights; (ii) inside the database, we provide efficient procedures for computing intersections between regulatory features and for mapping them to the genes. To Ensembl funcgen-derived data, which include data from ENCODE, we add information on conserved non-coding (putative regulatory) sequences, and on genome-wide occurrence of transcription factor binding site motifs from the current versions of two major motif libraries, namely, Jaspar and Transfac. The intersections and mapping to the genes are pre-computed for the public data, and the result of any procedure run on the data added by the users is stored back into the database, thus incrementally increasing the body of pre-computed data. As the Ensembl funcgen schema for the rat is currently not populated, our database is the first database of regulatory features for this frequently used laboratory animal. The database is accessible without registration using the mysql client: mysql –h database.nencki-genomics.org –u public. Registration is required only to add or access private data. A WSDL webservice provides access to the database from any SOAP client, including the Taverna Workbench with a graphical user interface. Database URL: http://www.nencki-genomics.org. PMID:24089456
The Danish Nonmelanoma Skin Cancer Dermatology Database.
Lamberg, Anna Lei; Sølvsten, Henrik; Lei, Ulrikke; Vinding, Gabrielle Randskov; Stender, Ida Marie; Jemec, Gregor Borut Ernst; Vestergaard, Tine; Thormann, Henrik; Hædersdal, Merete; Dam, Tomas Norman; Olesen, Anne Braae
2016-01-01
The Danish Nonmelanoma Skin Cancer Dermatology Database was established in 2008. The aim of this database was to collect data on nonmelanoma skin cancer (NMSC) treatment and improve its treatment in Denmark. NMSC is the most common malignancy in the western countries and represents a significant challenge in terms of public health management and health care costs. However, high-quality epidemiological and treatment data on NMSC are sparse. The NMSC database includes patients with the following skin tumors: basal cell carcinoma (BCC), squamous cell carcinoma, Bowen's disease, and keratoacanthoma diagnosed by the participating office-based dermatologists in Denmark. Clinical and histological diagnoses, BCC subtype, localization, size, skin cancer history, skin phototype, and evidence of metastases and treatment modality are the main variables in the NMSC database. Information on recurrence, cosmetic results, and complications are registered at two follow-up visits at 3 months (between 0 and 6 months) and 12 months (between 6 and 15 months) after treatment. In 2014, 11,522 patients with 17,575 tumors were registered in the database. Of tumors with a histological diagnosis, 13,571 were BCCs, 840 squamous cell carcinomas, 504 Bowen's disease, and 173 keratoakanthomas. The NMSC database encompasses detailed information on the type of tumor, a variety of prognostic factors, treatment modalities, and outcomes after treatment. The database has revealed that overall, the quality of care of NMSC in Danish dermatological clinics is high, and the database provides the necessary data for continuous quality assurance.
Patel, Vanash M.; Ashrafian, Hutan; Almoudaris, Alex; Makanjuola, Jonathan; Bucciarelli-Ducci, Chiara; Darzi, Ara; Athanasiou, Thanos
2013-01-01
Objectives To compare H index scores for healthcare researchers returned by Google Scholar, Web of Science and Scopus databases, and to assess whether a researcher's age, country of institutional affiliation and physician status influences calculations. Subjects and Methods One hundred and ninety-five Nobel laureates in Physiology and Medicine from 1901 to 2009 were considered. Year of first and last publications, total publications and citation counts, and the H index for each laureate were calculated from each database. Cronbach's alpha statistics was used to measure the reliability of H index scores between the databases. Laureate characteristic influence on the H index was analysed using linear regression. Results There was no concordance between the databases when considering the number of publications and citations count per laureate. The H index was the most reliably calculated bibliometric across the three databases (Cronbach's alpha = 0.900). All databases returned significantly higher H index scores for younger laureates (p < 0.0001). Google Scholar and Web of Science returned significantly higher H index for physician laureates (p = 0.025 and p = 0.029, respectively). Country of institutional affiliation did not influence the H index in any database. Conclusion The H index appeared to be the most consistently calculated bibliometric between the databases for Nobel laureates in Physiology and Medicine. Researcher-specific characteristics constituted an important component of objective research assessment. The findings of this study call to question the choice of current and future academic performance databases. PMID:22964880
A dedicated database system for handling multi-level data in systems biology.
Pornputtapong, Natapol; Wanichthanarak, Kwanjeera; Nilsson, Avlant; Nookaew, Intawat; Nielsen, Jens
2014-01-01
Advances in high-throughput technologies have enabled extensive generation of multi-level omics data. These data are crucial for systems biology research, though they are complex, heterogeneous, highly dynamic, incomplete and distributed among public databases. This leads to difficulties in data accessibility and often results in errors when data are merged and integrated from varied resources. Therefore, integration and management of systems biological data remain very challenging. To overcome this, we designed and developed a dedicated database system that can serve and solve the vital issues in data management and hereby facilitate data integration, modeling and analysis in systems biology within a sole database. In addition, a yeast data repository was implemented as an integrated database environment which is operated by the database system. Two applications were implemented to demonstrate extensibility and utilization of the system. Both illustrate how the user can access the database via the web query function and implemented scripts. These scripts are specific for two sample cases: 1) Detecting the pheromone pathway in protein interaction networks; and 2) Finding metabolic reactions regulated by Snf1 kinase. In this study we present the design of database system which offers an extensible environment to efficiently capture the majority of biological entities and relations encountered in systems biology. Critical functions and control processes were designed and implemented to ensure consistent, efficient, secure and reliable transactions. The two sample cases on the yeast integrated data clearly demonstrate the value of a sole database environment for systems biology research.
Private and Efficient Query Processing on Outsourced Genomic Databases.
Ghasemi, Reza; Al Aziz, Md Momin; Mohammed, Noman; Dehkordi, Massoud Hadian; Jiang, Xiaoqian
2017-09-01
Applications of genomic studies are spreading rapidly in many domains of science and technology such as healthcare, biomedical research, direct-to-consumer services, and legal and forensic. However, there are a number of obstacles that make it hard to access and process a big genomic database for these applications. First, sequencing genomic sequence is a time consuming and expensive process. Second, it requires large-scale computation and storage systems to process genomic sequences. Third, genomic databases are often owned by different organizations, and thus, not available for public usage. Cloud computing paradigm can be leveraged to facilitate the creation and sharing of big genomic databases for these applications. Genomic data owners can outsource their databases in a centralized cloud server to ease the access of their databases. However, data owners are reluctant to adopt this model, as it requires outsourcing the data to an untrusted cloud service provider that may cause data breaches. In this paper, we propose a privacy-preserving model for outsourcing genomic data to a cloud. The proposed model enables query processing while providing privacy protection of genomic databases. Privacy of the individuals is guaranteed by permuting and adding fake genomic records in the database. These techniques allow cloud to evaluate count and top-k queries securely and efficiently. Experimental results demonstrate that a count and a top-k query over 40 Single Nucleotide Polymorphisms (SNPs) in a database of 20 000 records takes around 100 and 150 s, respectively.
Private and Efficient Query Processing on Outsourced Genomic Databases
Ghasemi, Reza; Al Aziz, Momin; Mohammed, Noman; Dehkordi, Massoud Hadian; Jiang, Xiaoqian
2017-01-01
Applications of genomic studies are spreading rapidly in many domains of science and technology such as healthcare, biomedical research, direct-to-consumer services, and legal and forensic. However, there are a number of obstacles that make it hard to access and process a big genomic database for these applications. First, sequencing genomic sequence is a time-consuming and expensive process. Second, it requires large-scale computation and storage systems to processes genomic sequences. Third, genomic databases are often owned by different organizations and thus not available for public usage. Cloud computing paradigm can be leveraged to facilitate the creation and sharing of big genomic databases for these applications. Genomic data owners can outsource their databases in a centralized cloud server to ease the access of their databases. However, data owners are reluctant to adopt this model, as it requires outsourcing the data to an untrusted cloud service provider that may cause data breaches. In this paper, we propose a privacy-preserving model for outsourcing genomic data to a cloud. The proposed model enables query processing while providing privacy protection of genomic databases. Privacy of the individuals is guaranteed by permuting and adding fake genomic records in the database. These techniques allow cloud to evaluate count and top-k queries securely and efficiently. Experimental results demonstrate that a count and a top-k query over 40 SNPs in a database of 20,000 records takes around 100 and 150 seconds, respectively. PMID:27834660
MAGIC database and interfaces: an integrated package for gene discovery and expression.
Cordonnier-Pratt, Marie-Michèle; Liang, Chun; Wang, Haiming; Kolychev, Dmitri S; Sun, Feng; Freeman, Robert; Sullivan, Robert; Pratt, Lee H
2004-01-01
The rapidly increasing rate at which biological data is being produced requires a corresponding growth in relational databases and associated tools that can help laboratories contend with that data. With this need in mind, we describe here a Modular Approach to a Genomic, Integrated and Comprehensive (MAGIC) Database. This Oracle 9i database derives from an initial focus in our laboratory on gene discovery via production and analysis of expressed sequence tags (ESTs), and subsequently on gene expression as assessed by both EST clustering and microarrays. The MAGIC Gene Discovery portion of the database focuses on information derived from DNA sequences and on its biological relevance. In addition to MAGIC SEQ-LIMS, which is designed to support activities in the laboratory, it contains several additional subschemas. The latter include MAGIC Admin for database administration, MAGIC Sequence for sequence processing as well as sequence and clone attributes, MAGIC Cluster for the results of EST clustering, MAGIC Polymorphism in support of microsatellite and single-nucleotide-polymorphism discovery, and MAGIC Annotation for electronic annotation by BLAST and BLAT. The MAGIC Microarray portion is a MIAME-compliant database with two components at present. These are MAGIC Array-LIMS, which makes possible remote entry of all information into the database, and MAGIC Array Analysis, which provides data mining and visualization. Because all aspects of interaction with the MAGIC Database are via a web browser, it is ideally suited not only for individual research laboratories but also for core facilities that serve clients at any distance.
Michaleff, Zoe A; Costa, Leonardo O P; Moseley, Anne M; Maher, Christopher G; Elkins, Mark R; Herbert, Robert D; Sherrington, Catherine
2011-02-01
Many bibliographic databases index research studies evaluating the effects of health care interventions. One study has concluded that the Physiotherapy Evidence Database (PEDro) has the most complete indexing of reports of randomized controlled trials of physical therapy interventions, but the design of that study may have exaggerated estimates of the completeness of indexing by PEDro. The purpose of this study was to compare the completeness of indexing of reports of randomized controlled trials of physical therapy interventions by 8 bibliographic databases. This study was an audit of bibliographic databases. Prespecified criteria were used to identify 400 reports of randomized controlled trials from the reference lists of systematic reviews published in 2008 that evaluated physical therapy interventions. Eight databases (AMED, CENTRAL, CINAHL, EMBASE, Hooked on Evidence, PEDro, PsycINFO, and PubMed) were searched for each trial report. The proportion of the 400 trial reports indexed by each database was calculated. The proportions of the 400 trial reports indexed by the databases were as follows: CENTRAL, 95%; PEDro, 92%; PubMed, 89%; EMBASE, 88%; CINAHL, 53%; AMED, 50%; Hooked on Evidence, 45%; and PsycINFO, 6%. Almost all of the trial reports (99%) were found in at least 1 database, and 88% were indexed by 4 or more databases. Four trial reports were uniquely indexed by a single database only (2 in CENTRAL and 1 each in PEDro and PubMed). The results are only applicable to searching for English-language published reports of randomized controlled trials evaluating physical therapy interventions. The 4 most comprehensive databases of trial reports evaluating physical therapy interventions were CENTRAL, PEDro, PubMed, and EMBASE. Clinicians seeking quick answers to clinical questions could search any of these databases knowing that all are reasonably comprehensive. PEDro, unlike the other 3 most complete databases, is specific to physical therapy, so studies not relevant to physical therapy are less likely to be retrieved. Researchers could use CENTRAL, PEDro, PubMed, and EMBASE in combination to conduct exhaustive searches for randomized trials in physical therapy.
NASA Astrophysics Data System (ADS)
Nakagawa, Y.; Kawahara, S.; Araki, F.; Matsuoka, D.; Ishikawa, Y.; Fujita, M.; Sugimoto, S.; Okada, Y.; Kawazoe, S.; Watanabe, S.; Ishii, M.; Mizuta, R.; Murata, A.; Kawase, H.
2017-12-01
Analyses of large ensemble data are quite useful in order to produce probabilistic effect projection of climate change. Ensemble data of "+2K future climate simulations" are currently produced by Japanese national project "Social Implementation Program on Climate Change Adaptation Technology (SI-CAT)" as a part of a database for Policy Decision making for Future climate change (d4PDF; Mizuta et al. 2016) produced by Program for Risk Information on Climate Change. Those data consist of global warming simulations and regional downscaling simulations. Considering that those data volumes are too large (a few petabyte) to download to a local computer of users, a user-friendly system is required to search and download data which satisfy requests of the users. We develop "a database system for near-future climate change projections" for providing functions to find necessary data for the users under SI-CAT. The database system for near-future climate change projections mainly consists of a relational database, a data download function and user interface. The relational database using PostgreSQL is a key function among them. Temporally and spatially compressed data are registered on the relational database. As a first step, we develop the relational database for precipitation, temperature and track data of typhoon according to requests by SI-CAT members. The data download function using Open-source Project for a Network Data Access Protocol (OPeNDAP) provides a function to download temporally and spatially extracted data based on search results obtained by the relational database. We also develop the web-based user interface for using the relational database and the data download function. A prototype of the database system for near-future climate change projections are currently in operational test on our local server. The database system for near-future climate change projections will be released on Data Integration and Analysis System Program (DIAS) in fiscal year 2017. Techniques of the database system for near-future climate change projections might be quite useful for simulation and observational data in other research fields. We report current status of development and some case studies of the database system for near-future climate change projections.
Exploring consumer pathways and patterns of use for ...
Background: Humans may be exposed to thousands of chemicals through contact in the workplace, home, and via air, water, food, and soil. A major challenge is estimating exposures to these chemicals, which requires understanding potential exposure routes directly related to how chemicals are used. Objectives: We aimed to assign “use categories” to a database of chemicals, including ingredients in consumer products, to help prioritize which chemicals will be given more scrutiny relative to human exposure potential and target populations. The goal was to identify (a) human activities that result in increased chemical exposure while (b) simplifying the dimensionality of hazard assessment for risk characterization. Methods: Major data sources on consumer- and industrial-process based chemical uses were compiled from multiple countries, including from regulatory agencies, manufacturers, and retailers. The resulting categorical chemical use and functional information are presented through the Chemical/Product Categories Database (CPCat). Results: CPCat contains information on 43,596 unique chemicals mapped to 833 terms categorizing their usage or function. Examples presented demonstrate potential applications of the CPCat database, including the identification of chemicals to which children may be exposed (including those that are not identified on product ingredient labels), and prioritization of chemicals for toxicity screening. The CPCat database is availabl
Glycan fragment database: a database of PDB-based glycan 3D structures.
Jo, Sunhwan; Im, Wonpil
2013-01-01
The glycan fragment database (GFDB), freely available at http://www.glycanstructure.org, is a database of the glycosidic torsion angles derived from the glycan structures in the Protein Data Bank (PDB). Analogous to protein structure, the structure of an oligosaccharide chain in a glycoprotein, referred to as a glycan, can be characterized by the torsion angles of glycosidic linkages between relatively rigid carbohydrate monomeric units. Knowledge of accessible conformations of biologically relevant glycans is essential in understanding their biological roles. The GFDB provides an intuitive glycan sequence search tool that allows the user to search complex glycan structures. After a glycan search is complete, each glycosidic torsion angle distribution is displayed in terms of the exact match and the fragment match. The exact match results are from the PDB entries that contain the glycan sequence identical to the query sequence. The fragment match results are from the entries with the glycan sequence whose substructure (fragment) or entire sequence is matched to the query sequence, such that the fragment results implicitly include the influences from the nearby carbohydrate residues. In addition, clustering analysis based on the torsion angle distribution can be performed to obtain the representative structures among the searched glycan structures.
Projections for fast protein structure retrieval
Bhattacharya, Sourangshu; Bhattacharyya, Chiranjib; Chandra, Nagasuma R
2006-01-01
Background In recent times, there has been an exponential rise in the number of protein structures in databases e.g. PDB. So, design of fast algorithms capable of querying such databases is becoming an increasingly important research issue. This paper reports an algorithm, motivated from spectral graph matching techniques, for retrieving protein structures similar to a query structure from a large protein structure database. Each protein structure is specified by the 3D coordinates of residues of the protein. The algorithm is based on a novel characterization of the residues, called projections, leading to a similarity measure between the residues of the two proteins. This measure is exploited to efficiently compute the optimal equivalences. Results Experimental results show that, the current algorithm outperforms the state of the art on benchmark datasets in terms of speed without losing accuracy. Search results on SCOP 95% nonredundant database, for fold similarity with 5 proteins from different SCOP classes show that the current method performs competitively with the standard algorithm CE. The algorithm is also capable of detecting non-topological similarities between two proteins which is not possible with most of the state of the art tools like Dali. PMID:17254310
A Circular Dichroism Reference Database for Membrane Proteins
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wallace,B.; Wien, F.; Stone, T.
2006-01-01
Membrane proteins are a major product of most genomes and the target of a large number of current pharmaceuticals, yet little information exists on their structures because of the difficulty of crystallising them; hence for the most part they have been excluded from structural genomics programme targets. Furthermore, even methods such as circular dichroism (CD) spectroscopy which seek to define secondary structure have not been fully exploited because of technical limitations to their interpretation for membrane embedded proteins. Empirical analyses of circular dichroism (CD) spectra are valuable for providing information on secondary structures of proteins. However, the accuracy of themore » results depends on the appropriateness of the reference databases used in the analyses. Membrane proteins have different spectral characteristics than do soluble proteins as a result of the low dielectric constants of membrane bilayers relative to those of aqueous solutions (Chen & Wallace (1997) Biophys. Chem. 65:65-74). To date, no CD reference database exists exclusively for the analysis of membrane proteins, and hence empirical analyses based on current reference databases derived from soluble proteins are not adequate for accurate analyses of membrane protein secondary structures (Wallace et al (2003) Prot. Sci. 12:875-884). We have therefore created a new reference database of CD spectra of integral membrane proteins whose crystal structures have been determined. To date it contains more than 20 proteins, and spans the range of secondary structures from mostly helical to mostly sheet proteins. This reference database should enable more accurate secondary structure determinations of membrane embedded proteins and will become one of the reference database options in the CD calculation server DICHROWEB (Whitmore & Wallace (2004) NAR 32:W668-673).« less
Algorithms for database-dependent search of MS/MS data.
Matthiesen, Rune
2013-01-01
The frequent used bottom-up strategy for identification of proteins and their associated modifications generate nowadays typically thousands of MS/MS spectra that normally are matched automatically against a protein sequence database. Search engines that take as input MS/MS spectra and a protein sequence database are referred as database-dependent search engines. Many programs both commercial and freely available exist for database-dependent search of MS/MS spectra and most of the programs have excellent user documentation. The aim here is therefore to outline the algorithm strategy behind different search engines rather than providing software user manuals. The process of database-dependent search can be divided into search strategy, peptide scoring, protein scoring, and finally protein inference. Most efforts in the literature have been put in to comparing results from different software rather than discussing the underlining algorithms. Such practical comparisons can be cluttered by suboptimal implementation and the observed differences are frequently caused by software parameters settings which have not been set proper to allow even comparison. In other words an algorithmic idea can still be worth considering even if the software implementation has been demonstrated to be suboptimal. The aim in this chapter is therefore to split the algorithms for database-dependent searching of MS/MS data into the above steps so that the different algorithmic ideas become more transparent and comparable. Most search engines provide good implementations of the first three data analysis steps mentioned above, whereas the final step of protein inference are much less developed for most search engines and is in many cases performed by an external software. The final part of this chapter illustrates how protein inference is built into the VEMS search engine and discusses a stand-alone program SIR for protein inference that can import a Mascot search result.
Ali, Zulfiqar; Alsulaiman, Mansour; Muhammad, Ghulam; Elamvazuthi, Irraivan; Al-Nasheri, Ahmed; Mesallam, Tamer A; Farahat, Mohamed; Malki, Khalid H
2017-05-01
A large population around the world has voice complications. Various approaches for subjective and objective evaluations have been suggested in the literature. The subjective approach strongly depends on the experience and area of expertise of a clinician, and human error cannot be neglected. On the other hand, the objective or automatic approach is noninvasive. Automatic developed systems can provide complementary information that may be helpful for a clinician in the early screening of a voice disorder. At the same time, automatic systems can be deployed in remote areas where a general practitioner can use them and may refer the patient to a specialist to avoid complications that may be life threatening. Many automatic systems for disorder detection have been developed by applying different types of conventional speech features such as the linear prediction coefficients, linear prediction cepstral coefficients, and Mel-frequency cepstral coefficients (MFCCs). This study aims to ascertain whether conventional speech features detect voice pathology reliably, and whether they can be correlated with voice quality. To investigate this, an automatic detection system based on MFCC was developed, and three different voice disorder databases were used in this study. The experimental results suggest that the accuracy of the MFCC-based system varies from database to database. The detection rate for the intra-database ranges from 72% to 95%, and that for the inter-database is from 47% to 82%. The results conclude that conventional speech features are not correlated with voice, and hence are not reliable in pathology detection. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
The use and misuse of biomedical data: is bigger really better?
Hoffman, Sharona; Podgurski, Andy
2013-01-01
Very large biomedical research databases, containing electronic health records (EHR) and genomic data from millions of patients, have been heralded recently for their potential to accelerate scientific discovery and produce dramatic improvements in medical treatments. Research enabled by these databases may also lead to profound changes in law, regulation, social policy, and even litigation strategies. Yet, is "big data" necessarily better data? This paper makes an original contribution to the legal literature by focusing on what can go wrong in the process of biomedical database research and what precautions are necessary to avoid critical mistakes. We address three main reasons for approaching such research with care and being cautious in relying on its outcomes for purposes of public policy or litigation. First, the data contained in biomedical databases is surprisingly likely to be incorrect or incomplete. Second, systematic biases, arising from both the nature of the data and the preconceptions of investigators, are serious threats to the validity of research results, especially in answering causal questions. Third, data mining of biomedical databases makes it easier for individuals with political, social, or economic agendas to generate ostensibly scientific but misleading research findings for the purpose of manipulating public opinion and swaying policymakers. In short, this paper sheds much-needed light on the problems of credulous and uninformed acceptance of research results derived from biomedical databases. An understanding of the pitfalls of big data analysis is of critical importance to anyone who will rely on or dispute its outcomes, including lawyers, policymakers, and the public at large. The Article also recommends technical, methodological, and educational interventions to combat the dangers of database errors and abuses.
The MPI Emotional Body Expressions Database for Narrative Scenarios
Volkova, Ekaterina; de la Rosa, Stephan; Bülthoff, Heinrich H.; Mohler, Betty
2014-01-01
Emotion expression in human-human interaction takes place via various types of information, including body motion. Research on the perceptual-cognitive mechanisms underlying the processing of natural emotional body language can benefit greatly from datasets of natural emotional body expressions that facilitate stimulus manipulation and analysis. The existing databases have so far focused on few emotion categories which display predominantly prototypical, exaggerated emotion expressions. Moreover, many of these databases consist of video recordings which limit the ability to manipulate and analyse the physical properties of these stimuli. We present a new database consisting of a large set (over 1400) of natural emotional body expressions typical of monologues. To achieve close-to-natural emotional body expressions, amateur actors were narrating coherent stories while their body movements were recorded with motion capture technology. The resulting 3-dimensional motion data recorded at a high frame rate (120 frames per second) provides fine-grained information about body movements and allows the manipulation of movement on a body joint basis. For each expression it gives the positions and orientations in space of 23 body joints for every frame. We report the results of physical motion properties analysis and of an emotion categorisation study. The reactions of observers from the emotion categorisation study are included in the database. Moreover, we recorded the intended emotion expression for each motion sequence from the actor to allow for investigations regarding the link between intended and perceived emotions. The motion sequences along with the accompanying information are made available in a searchable MPI Emotional Body Expression Database. We hope that this database will enable researchers to study expression and perception of naturally occurring emotional body expressions in greater depth. PMID:25461382
The Marshall Islands Data Management Program
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stoker, A.C.; Conrado, C.L.
1995-09-01
This report is a resource document of the methods and procedures used currently in the Data Management Program of the Marshall Islands Dose Assessment and Radioecology Project. Since 1973, over 60,000 environmental samples have been collected. Our program includes relational database design, programming and maintenance; sample and information management; sample tracking; quality control; and data entry, evaluation and reduction. The usefulness of scientific databases involves careful planning in order to fulfill the requirements of any large research program. Compilation of scientific results requires consolidation of information from several databases, and incorporation of new information as it is generated. The successmore » in combining and organizing all radionuclide analysis, sample information and statistical results into a readily accessible form, is critical to our project.« less
Hyaline membrane disease is underreported in a linked birth-infant death certificate database.
Hamvas, A; Kwong, P; DeBaun, M; Schramm, W; Cole, F S
1998-01-01
OBJECTIVE: This study compared the Missouri State Department of Health linked birth-infant death certificate database and medical records with respect to recording hyaline membrane disease in very low-birth-weight infants. METHODS: We reviewed the records for all 976 infants weighing 500 to 1500 g who were born to St. Louis, Mo, residents in 1989, 1991, and 1992. RESULTS: Eighteen percent of the birth certificates and 54% of the medical records documented hyaline membrane disease, resulting in 34% sensitivity and 99% specificity. CONCLUSIONS: The Missouri State Department of Health birth-infant death certificate database underestimates the incidence of hyaline membrane disease, which suggest that national statistics for the disease are also underestimated. PMID:9736884
Content Based Image Retrieval based on Wavelet Transform coefficients distribution
Lamard, Mathieu; Cazuguel, Guy; Quellec, Gwénolé; Bekri, Lynda; Roux, Christian; Cochener, Béatrice
2007-01-01
In this paper we propose a content based image retrieval method for diagnosis aid in medical fields. We characterize images without extracting significant features by using distribution of coefficients obtained by building signatures from the distribution of wavelet transform. The research is carried out by computing signature distances between the query and database images. Several signatures are proposed; they use a model of wavelet coefficient distribution. To enhance results, a weighted distance between signatures is used and an adapted wavelet base is proposed. Retrieval efficiency is given for different databases including a diabetic retinopathy, a mammography and a face database. Results are promising: the retrieval efficiency is higher than 95% for some cases using an optimization process. PMID:18003013
Kaulard, Kathrin; Cunningham, Douglas W.; Bülthoff, Heinrich H.; Wallraven, Christian
2012-01-01
The ability to communicate is one of the core aspects of human life. For this, we use not only verbal but also nonverbal signals of remarkable complexity. Among the latter, facial expressions belong to the most important information channels. Despite the large variety of facial expressions we use in daily life, research on facial expressions has so far mostly focused on the emotional aspect. Consequently, most databases of facial expressions available to the research community also include only emotional expressions, neglecting the largely unexplored aspect of conversational expressions. To fill this gap, we present the MPI facial expression database, which contains a large variety of natural emotional and conversational expressions. The database contains 55 different facial expressions performed by 19 German participants. Expressions were elicited with the help of a method-acting protocol, which guarantees both well-defined and natural facial expressions. The method-acting protocol was based on every-day scenarios, which are used to define the necessary context information for each expression. All facial expressions are available in three repetitions, in two intensities, as well as from three different camera angles. A detailed frame annotation is provided, from which a dynamic and a static version of the database have been created. In addition to describing the database in detail, we also present the results of an experiment with two conditions that serve to validate the context scenarios as well as the naturalness and recognizability of the video sequences. Our results provide clear evidence that conversational expressions can be recognized surprisingly well from visual information alone. The MPI facial expression database will enable researchers from different research fields (including the perceptual and cognitive sciences, but also affective computing, as well as computer vision) to investigate the processing of a wider range of natural facial expressions. PMID:22438875
Kim, Chang-Gon; Mun, Su-Jeong; Kim, Ka-Na; Shin, Byung-Cheul; Kim, Nam-Kwen; Lee, Dong-Hyo; Lee, Jung-Han
2016-05-13
Manual therapy is the non-surgical conservative management of musculoskeletal disorders using the practitioner's hands on the patient's body for diagnosing and treating disease. The aim of this study is to systematically review trial-based economic evaluations of manual therapy relative to other interventions used for the management of musculoskeletal diseases. Randomised clinical trials (RCTs) on the economic evaluation of manual therapy for musculoskeletal diseases will be included in the review. The following databases will be searched from their inception: Medline, Embase, Cochrane Central Register of Controlled Trials (CENTRAL), Cumulative Index to Nursing and Allied Health Literature (CINAHL), Econlit, Mantis, Index to Chiropractic Literature, Science Citation Index, Social Science Citation Index, Allied and Complementary Medicine Database (AMED), Cochrane Database of Systematic Reviews (CDSR), National Health Service Database of Abstracts of Reviews of Effects (NHS DARE), National Health Service Health Technology Assessment Database (NHS HTA), National Health Service Economic Evaluation Database (NHS EED), CENTRAL, five Korean medical databases (Oriental Medicine Advanced Searching Integrated System (OASIS), Research Information Service System (RISS), DBPIA, Korean Traditional Knowledge Portal (KTKP) and KoreaMed) and three Chinese databases (China National Knowledge Infrastructure (CNKI), VIP and Wanfang). The evidence for the cost-effectiveness, cost-utility and cost-benefit of manual therapy for musculoskeletal diseases will be assessed as the primary outcome. Health-related quality of life and adverse effects will be assessed as secondary outcomes. We will critically appraise the included studies using the Cochrane risk of bias tool and the Drummond checklist. Results will be summarised using Slavin's qualitative best-evidence synthesis approach. The results of the study will be disseminated via a peer-reviewed journal and/or conference presentations. PROSPERO CRD42015026757. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/
Abraha, Iosief; Giovannini, Gianni; Serraino, Diego; Fusco, Mario; Montedori, Alessandro
2016-03-18
Breast, lung and colorectal cancers constitute the most common cancers worldwide and their epidemiology, related health outcomes and quality indicators can be studied using administrative healthcare databases. To constitute a reliable source for research, administrative healthcare databases need to be validated. The aim of this protocol is to perform the first systematic review of studies reporting the validation of International Classification of Diseases 9th and 10th revision codes to identify breast, lung and colorectal cancer diagnoses in administrative healthcare databases. This review protocol has been developed according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Protocol (PRISMA-P) 2015 statement. We will search the following databases: MEDLINE, EMBASE, Web of Science and the Cochrane Library, using appropriate search strategies. We will include validation studies that used administrative data to identify breast, lung and colorectal cancer diagnoses or studies that evaluated the validity of breast, lung and colorectal cancer codes in administrative data. The following inclusion criteria will be used: (1) the presence of a reference standard case definition for the disease of interest; (2) the presence of at least one test measure (eg, sensitivity, positive predictive values, etc) and (3) the use of data source from an administrative database. Pairs of reviewers will independently abstract data using standardised forms and will assess quality using a checklist based on the Standards for Reporting of Diagnostic accuracy (STARD) criteria. Ethics approval is not required. We will submit results of this study to a peer-reviewed journal for publication. The results will serve as a guide to identify appropriate case definitions and algorithms of breast, lung and colorectal cancers for researchers involved in validating administrative healthcare databases as well as for outcome research on these conditions that used administrative healthcare databases. CRD42015026881. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/
Wang, Jingjing; Sun, Tao; Gao, Ni; Menon, Desmond Dev; Luo, Yanxia; Gao, Qi; Li, Xia; Wang, Wei; Zhu, Huiping; Lv, Pingxin; Liang, Zhigang; Tao, Lixin; Liu, Xiangtong; Guo, Xiuhua
2014-01-01
Objective To determine the value of contourlet textural features obtained from solitary pulmonary nodules in two dimensional CT images used in diagnoses of lung cancer. Materials and Methods A total of 6,299 CT images were acquired from 336 patients, with 1,454 benign pulmonary nodule images from 84 patients (50 male, 34 female) and 4,845 malignant from 252 patients (150 male, 102 female). Further to this, nineteen patient information categories, which included seven demographic parameters and twelve morphological features, were also collected. A contourlet was used to extract fourteen types of textural features. These were then used to establish three support vector machine models. One comprised a database constructed of nineteen collected patient information categories, another included contourlet textural features and the third one contained both sets of information. Ten-fold cross-validation was used to evaluate the diagnosis results for the three databases, with sensitivity, specificity, accuracy, the area under the curve (AUC), precision, Youden index, and F-measure were used as the assessment criteria. In addition, the synthetic minority over-sampling technique (SMOTE) was used to preprocess the unbalanced data. Results Using a database containing textural features and patient information, sensitivity, specificity, accuracy, AUC, precision, Youden index, and F-measure were: 0.95, 0.71, 0.89, 0.89, 0.92, 0.66, and 0.93 respectively. These results were higher than results derived using the database without textural features (0.82, 0.47, 0.74, 0.67, 0.84, 0.29, and 0.83 respectively) as well as the database comprising only textural features (0.81, 0.64, 0.67, 0.72, 0.88, 0.44, and 0.85 respectively). Using the SMOTE as a pre-processing procedure, new balanced database generated, including observations of 5,816 benign ROIs and 5,815 malignant ROIs, and accuracy was 0.93. Conclusion Our results indicate that the combined contourlet textural features of solitary pulmonary nodules in CT images with patient profile information could potentially improve the diagnosis of lung cancer. PMID:25250576
EPA's Integrated Risk Information System (IRIS) database was developed and is maintained by EPA's Office of Research and Developement, National Center for Environmental Assessment. IRIS is a database of human health effects that may result from exposure to various substances fou...
ERIC Educational Resources Information Center
Antonucci, Yvonne Lederer; Wozny, Lucy Anne
1996-01-01
Identifies and describes sublevels of novices using a database management package, clustering those whose interaction is effective, partially effective, and totally ineffective. Among assistance documentation, functional tree diagrams (FTDs) were more beneficial to partially effective users than traditional reference material. The results have…
THE HUMAN EXPOSURE DATABASE SYSTEM (HEDS)-PUTTING THE NHEXAS DATA ON-LINE
The EPA's National Exposure Research Laboratory (NERL) has developed an Internet accessible Human Exposure Database System (HEDS) to provide the results of NERL human exposure studies to both the EPA and the external scientific communities. The first data sets that will be ava...
2012-01-01
Background The COG database is the most popular collection of orthologous proteins from many different completely sequenced microbial genomes. Per definition, a cluster of orthologous groups (COG) within this database exclusively contains proteins that most likely achieve the same cellular function. Recently, the COG database was extended by assigning to every protein both the corresponding amino acid and its encoding nucleotide sequence resulting in the NUCOCOG database. This extended version of the COG database is a valuable resource connecting sequence features with the functionality of the respective proteins. Results Here we present ANCAC, a web tool and MySQL database for the analysis of amino acid, nucleotide, and codon frequencies in COGs on the basis of freely definable phylogenetic patterns. We demonstrate the usefulness of ANCAC by analyzing amino acid frequencies, codon usage, and GC-content in a species- or function-specific context. With respect to amino acids we, at least in part, confirm the cognate bias hypothesis by using ANCAC’s NUCOCOG dataset as the largest one available for that purpose thus far. Conclusions Using the NUCOCOG datasets, ANCAC connects taxonomic, amino acid, and nucleotide sequence information with the functional classification via COGs and provides a GUI for flexible mining for sequence-bias. Thereby, to our knowledge, it is the only tool for the analysis of sequence composition in the light of physiological roles and phylogenetic context without requirement of substantial programming-skills. PMID:22958836
Mapping the literature of nursing: 1996–2000
Allen, Margaret (Peg); Jacobs, Susan Kaplan; Levy, June R.
2006-01-01
Introduction: This project is a collaborative effort of the Task Force on Mapping the Nursing Literature of the Nursing and Allied Health Resources Section of the Medical Library Association. This overview summarizes eighteen studies covering general nursing and sixteen specialties. Method: Following a common protocol, citations from source journals were analyzed for a three-year period within the years 1996 to 2000. Analysis included cited formats, age, and ranking of the frequency of cited journal titles. Highly cited journals were analyzed for coverage in twelve health sciences and academic databases. Results: Journals were the most frequently cited format, followed by books. More than 60% of the cited resources were published in the previous seven years. Bradford's law was validated, with a small core of cited journals accounting for a third of the citations. Medical and science databases provided the most comprehensive access for biomedical titles, while CINAHL and PubMed provided the best access for nursing journals. Discussion: Beyond a heavily cited core, nursing journal citations are widely dispersed among a variety of sources and disciplines, with corresponding access via a variety of bibliographic tools. Results underscore the interdisciplinary nature of the nursing profession. Conclusion: For comprehensive searches, nurses need to search multiple databases. Libraries need to provide access to databases beyond PubMed, including CINAHL and academic databases. Database vendors should improve their coverage of nursing, biomedical, and psychosocial titles identified in these studies. Additional research is needed to update these studies and analyze nursing specialties not covered. PMID:16636714
Mock jurors' use of error rates in DNA database trawls.
Scurich, Nicholas; John, Richard S
2013-12-01
Forensic science is not infallible, as data collected by the Innocence Project have revealed. The rate at which errors occur in forensic DNA testing-the so-called "gold standard" of forensic science-is not currently known. This article presents a Bayesian analysis to demonstrate the profound impact that error rates have on the probative value of a DNA match. Empirical evidence on whether jurors are sensitive to this effect is equivocal: Studies have typically found they are not, while a recent, methodologically rigorous study found that they can be. This article presents the results of an experiment that examined this issue within the context of a database trawl case in which one DNA profile was tested against a multitude of profiles. The description of the database was manipulated (i.e., "medical" or "offender" database, or not specified) as was the rate of error (i.e., one-in-10 or one-in-1,000). Jury-eligible participants were nearly twice as likely to convict in the offender database condition compared to the condition not specified. The error rates did not affect verdicts. Both factors, however, affected the perception of the defendant's guilt, in the expected direction, although the size of the effect was meager compared to Bayesian prescriptions. The results suggest that the disclosure of an offender database to jurors might constitute prejudicial evidence, and calls for proficiency testing in forensic science as well as training of jurors are echoed. (c) 2013 APA, all rights reserved
In search of the emotional face: anger versus happiness superiority in visual search.
Savage, Ruth A; Lipp, Ottmar V; Craig, Belinda M; Becker, Stefanie I; Horstmann, Gernot
2013-08-01
Previous research has provided inconsistent results regarding visual search for emotional faces, yielding evidence for either anger superiority (i.e., more efficient search for angry faces) or happiness superiority effects (i.e., more efficient search for happy faces), suggesting that these results do not reflect on emotional expression, but on emotion (un-)related low-level perceptual features. The present study investigated possible factors mediating anger/happiness superiority effects; specifically search strategy (fixed vs. variable target search; Experiment 1), stimulus choice (Nimstim database vs. Ekman & Friesen database; Experiments 1 and 2), and emotional intensity (Experiment 3 and 3a). Angry faces were found faster than happy faces regardless of search strategy using faces from the Nimstim database (Experiment 1). By contrast, a happiness superiority effect was evident in Experiment 2 when using faces from the Ekman and Friesen database. Experiment 3 employed angry, happy, and exuberant expressions (Nimstim database) and yielded anger and happiness superiority effects, respectively, highlighting the importance of the choice of stimulus materials. Ratings of the stimulus materials collected in Experiment 3a indicate that differences in perceived emotional intensity, pleasantness, or arousal do not account for differences in search efficiency. Across three studies, the current investigation indicates that prior reports of anger or happiness superiority effects in visual search are likely to reflect on low-level visual features associated with the stimulus materials used, rather than on emotion. PsycINFO Database Record (c) 2013 APA, all rights reserved.
Interactive Exploration for Continuously Expanding Neuron Databases.
Li, Zhongyu; Metaxas, Dimitris N; Lu, Aidong; Zhang, Shaoting
2017-02-15
This paper proposes a novel framework to help biologists explore and analyze neurons based on retrieval of data from neuron morphological databases. In recent years, the continuously expanding neuron databases provide a rich source of information to associate neuronal morphologies with their functional properties. We design a coarse-to-fine framework for efficient and effective data retrieval from large-scale neuron databases. In the coarse-level, for efficiency in large-scale, we employ a binary coding method to compress morphological features into binary codes of tens of bits. Short binary codes allow for real-time similarity searching in Hamming space. Because the neuron databases are continuously expanding, it is inefficient to re-train the binary coding model from scratch when adding new neurons. To solve this problem, we extend binary coding with online updating schemes, which only considers the newly added neurons and update the model on-the-fly, without accessing the whole neuron databases. In the fine-grained level, we introduce domain experts/users in the framework, which can give relevance feedback for the binary coding based retrieval results. This interactive strategy can improve the retrieval performance through re-ranking the above coarse results, where we design a new similarity measure and take the feedback into account. Our framework is validated on more than 17,000 neuron cells, showing promising retrieval accuracy and efficiency. Moreover, we demonstrate its use case in assisting biologists to identify and explore unknown neurons. Copyright © 2017 Elsevier Inc. All rights reserved.
Evaluation of consumer drug information databases.
Choi, J A; Sullivan, J; Pankaskie, M; Brufsky, J
1999-01-01
To evaluate prescription drug information contained in six consumer drug information databases available on CD-ROM, and to make health care professionals aware of the information provided, so that they may appropriately recommend these databases for use by their patients. Observational study of six consumer drug information databases: The Corner Drug Store, Home Medical Advisor, Mayo Clinic Family Pharmacist, Medical Drug Reference, Mosby's Medical Encyclopedia, and PharmAssist. Not applicable. Not applicable. Information on 20 frequently prescribed drugs was evaluated in each database. The databases were ranked using a point-scale system based on primary and secondary assessment criteria. For the primary assessment, 20 categories of information based on those included in the 1998 edition of the USP DI Volume II, Advice for the Patient: Drug Information in Lay Language were evaluated for each of the 20 drugs, and each database could earn up to 400 points (for example, 1 point was awarded if the database mentioned a drug's mechanism of action). For the secondary assessment, the inclusion of 8 additional features that could enhance the utility of the databases was evaluated (for example, 1 point was awarded if the database contained a picture of the drug), and each database could earn up to 8 points. The results of the primary and secondary assessments, listed in order of highest to lowest number of points earned, are as follows: Primary assessment--Mayo Clinic Family Pharmacist (379), Medical Drug Reference (251), PharmAssist (176), Home Medical Advisor (113.5), The Corner Drug Store (98), and Mosby's Medical Encyclopedia (18.5); secondary assessment--The Mayo Clinic Family Pharmacist (8), The Corner Drug Store (5), Mosby's Medical Encyclopedia (5), Home Medical Advisor (4), Medical Drug Reference (4), and PharmAssist (3). The Mayo Clinic Family Pharmacist was the most accurate and complete source of prescription drug information based on the USP DI Volume II and would be an appropriate database for health care professionals to recommend to patients.