structure database icsd: Topics by Science.gov

Sample records for structure database icsd

Inorganic Crystal Structure Database (ICSD)

National Institute of Standards and Technology Data Gateway

SRD 84 FIZ/NIST Inorganic Crystal Structure Database (ICSD) (PC database for purchase). The Inorganic Crystal Structure Database (ICSD) is produced cooperatively by the Fachinformationszentrum Karlsruhe (FIZ) and the National Institute of Standards and Technology (NIST). The ICSD is a comprehensive collection of crystal structure data of inorganic compounds containing more than 140,000 entries and covering the literature from 1915 to the present.
Inorganic Crystal Structure Database (ICSD) and Standardized Data and Crystal Chemical Characterization of Inorganic Structure Types (TYPIX)—Two Tools for Inorganic Chemists and Crystallographers

PubMed Central

Fluck, Ekkehard

1996-01-01

The two databases ICSD and TYPIX are described. ICSD is a comprehensive compilation of crystal structure data of inorganic compounds (about 39 000 entries). TYPIX contains 3600 critically evaluated data sets representative of structure types formed by inorganic compounds. PMID:27805158
Metastable structure of Li13Si4

NASA Astrophysics Data System (ADS)

Gruber, Thomas; Bahmann, Silvia; Kortus, Jens

2016-04-01

The Li13Si4 phase, one of several crystalline lithium silicide phases, is a potential electrode material for lithium-ion batteries with a high theoretical specific capacity. By means of ab initio methods like density functional theory (DFT) many properties such as heat capacity or heat of formation can be calculated. These properties are based on the calculation of phonon frequencies, which contain information about the thermodynamic stability. The current unit cell of "Li13Si4" given in the ICSD database is unstable with respect to DFT calculations. We propose a modified unit cell that is stable in the calculations. The evolutionary algorithm EVO found a structure very similar to the ICSD one with both of them containing metastable lithium positions. Molecular dynamics simulations show a phase transition between both structures where these metastable lithium atoms move. This phase transition is achieved by a very fast one-dimensional lithium diffusion and stabilizes this phase.
A nanocomposite of Au-AgI core/shell dimer as a dual-modality contrast agent for x-ray computed tomography and photoacoustic imaging

DOE Office of Scientific and Technical Information (OSTI.GOV)

Orza, Anamaria; Wu, Hui; Li, Yuancheng

Purpose: To develop a core/shell nanodimer of gold (core) and silver iodide (shell) as a dual-modal contrast-enhancing agent for biomarker targeted x-ray computed tomography (CT) and photoacoustic imaging (PAI) applications. Methods: The gold and silver iodide core/shell nanodimer (Au/AgICSD) was prepared by fusing together components of gold, silver, and iodine. The physicochemical properties of Au/AgICSD were then characterized using different optical and imaging techniques (e.g., HR-transmission electron microscope, scanning transmission electron microscope, x-ray photoelectron spectroscopy, energy-dispersive x-ray spectroscopy, Z-potential, and UV-vis). The CT and PAI contrast-enhancing effects were tested and then compared with a clinically used CT contrast agent and Au nanoparticles. To confer biocompatibility and the capability for efficient biomarker targeting, the surface of the Au/AgICSD nanodimer was modified with the amphiphilic diblock polymer and then functionalized with transferrin for targeting transferrin receptor that is overexpressed in various cancer cells. Cytotoxicity of the prepared Au/AgICSD nanodimer was also tested with both normal and cancer cell lines. Results: The characterizations of prepared Au/AgI core/shell nanostructure confirmed the formation of Au/AgICSD nanodimers. Au/AgICSD nanodimer is stable in physiological conditions for in vivo applications. Au/AgICSD nanodimer exhibited higher contrast enhancement in both CT and PAI for dual-modality imaging. Moreover, transferrin functionalized Au/AgICSD nanodimer showed specific binding to the tumor cells that have a high level of expression of the transferrin receptor. Conclusions: The developed Au/AgICSD nanodimer can be used as a potential biomarker targeted dual-modal contrast agent for both or combined CT and PAI molecular imaging.
Psychometric properties of the Sleep Condition Indicator and Insomnia Severity Index in the evaluation of insomnia disorder.

PubMed

Wong, Mark Lawrence; Lau, Kristy Nga Ting; Espie, Colin A; Luik, Annemarie I; Kyle, Simon D; Lau, Esther Yuet Ying

2017-05-01

The Sleep Condition Indicator (SCI) and Insomnia Severity Index (ISI) are commonly used instruments to assess insomnia. We evaluated their psychometric properties, particularly their discriminant validity against structured clinical interview (according to DSM-5 and ICSD-3), and their concurrent validity with measures of sleep and daytime functioning. A total of 158 young adults, 16% of whom were diagnosed with DSM-5 insomnia disorder and 13% with ICSD-3 Chronic Insomnia by structured interview, completed the ISI and SCI twice in 7-14 days, in addition to measures of sleep and daytime function. The Chinese version of the SCI was validated with good psychometric properties (ICC = 0.882). A cutoff of ≥8 on the ISI, ≤5 on the SCI short form, and ≤21 on the SCI achieved high discriminant validity (AUC > 0.85) in identifying individuals with insomnia based on both DSM-5 and ICSD-3 criteria. The SCI and ISI had comparable associations with subjective (0.18 < r < 0.51) and actigraphic sleep (0.31 < r < 0.43) and daytime functioning (0.34 < r < 0.53). The SCI, SCI short form, and ISI were found to correctly identify individuals with DSM-5- and ICSD-3-defined insomnia disorder. Moreover, they showed good concordance with measures of daytime dysfunction, as well as subjective and objective sleep. The SCI and ISI are recommended for use in clinical and research settings. Copyright © 2016 Elsevier B.V. All rights reserved.
Remarkable features in lattice-parameter ratios of crystals. II. Monoclinic and triclinic crystals.

PubMed

de Gelder, R; Janner, A

2005-06-01

The frequency distributions of monoclinic crystals as a function of the lattice-parameter ratios resemble the corresponding ones of orthorhombic crystals: an exponential component, with more or less pronounced sharp peaks, with in general the most important peak at the ratio value 1. In addition, the distribution as a function of the monoclinic angle beta has a sharp peak at 90 degrees and decreases sensibly at larger angles. Similar behavior is observed for the three triclinic angular parameters alpha, beta and gamma, with characteristic differences between the organic and metal-organic, bio-macromolecular and inorganic crystals, respectively. The general behavior observed for the hexagonal, tetragonal, orthorhombic, monoclinic and triclinic crystals {in the first part of this series [de Gelder & Janner (2005). Acta Cryst. B61, 287-295] and in the present case} is summarized and commented. The data involved represent 366 800 crystals, with lattice parameters taken from the Cambridge Structural Database, CSD (294 400 entries), the Protein Data Bank, PDB (18 800 entries), and the Inorganic Crystal Structure Database, ICSD (53 600 entries). A new general structural principle is suggested.
Reactivity of 12-tungstophosphoric acid and its inhibitor potency toward Na+/K+-ATPase: A combined 31P NMR study, ab initio calculations and crystallographic analysis.

PubMed

Bošnjaković-Pavlović, Nada; Bajuk-Bogdanović, Danica; Zakrzewska, Joanna; Yan, Zeyin; Holclajtner-Antunović, Ivanka; Gillet, Jean-Michel; Spasojević-de Biré, Anne

2017-11-01

Influence of 12-tungstophosphoric acid (WPA) on conversion of adenosine triphosphate (ATP) to adenosine diphosphate (ADP) in the presence of Na+/K+-ATPase was monitored by 31P NMR spectroscopy. It was shown that WPA exhibits an inhibitory effect on Na+/K+-ATPase activity. In order to study WPA reactivity and intermolecular interactions between WPA oxygen atoms and different proton donor types (D=O, N, C), we have considered data for WPA based compounds from the Cambridge Structural Database (CSD), the Crystallography Open Database (COD) and the Inorganic Crystal Structure Database (ICSD). Binding properties of Keggin's anion in biological systems are illustrated using the Protein Data Bank (PDB). This work constitutes the first determination of theoretical Bader charges on a polyoxotungstate compound via the Atoms in Molecules theory. An analysis of electrostatic potential maps at the molecular surface and charge of WPA, resulting from DFT calculations, suggests that the preferred protonation site corresponds to WPA bridging oxygen. These results shed light on WPA chemical reactivity and its potential biological applications such as the inhibition of the ATPase activity. Copyright © 2017 Elsevier Inc. All rights reserved.
High-throughput screening for thermoelectric sulphides by using crystal structure features as descriptors

NASA Astrophysics Data System (ADS)

Zhang, Ruizhi; Du, Baoli; Chen, Kan; Reece, Mike; Materials Research Institute Team

With the increasing computational power and reliable databases, high-throughput screening is playing a more and more important role in the search for new thermoelectric materials. Rather than the well-established density functional theory (DFT) calculation based methods, we propose an alternative approach to screen for new TE materials: using crystal structural features as 'descriptors'. We show that a non-distorted transition metal sulphide polyhedral network can be a good descriptor for high power factor according to crystal field theory. By using Cu/S containing compounds as an example, 1600+ Cu/S containing entries in the Inorganic Crystal Structure Database (ICSD) were screened, and of those 84 phases are identified as promising thermoelectric materials. The screening results are validated by both electronic structure calculations and experimental results from the literature. We also fabricated some new compounds to test our screening results. Another advantage of using crystal structure features as descriptors is that we can easily establish structural relationships between the identified phases. Based on this, two material design approaches are discussed: 1) High-pressure synthesis of metastable phase; 2) In-situ 2-phase composites with coherent interface. This work was supported by a Marie Curie International Incoming Fellowship of the European Community Human Potential Program.
On-the-fly machine-learning for high-throughput experiments: search for rare-earth-free permanent magnets

PubMed Central

Kusne, Aaron Gilad; Gao, Tieren; Mehta, Apurva; Ke, Liqin; Nguyen, Manh Cuong; Ho, Kai-Ming; Antropov, Vladimir; Wang, Cai-Zhuang; Kramer, Matthew J.; Long, Christian; Takeuchi, Ichiro

2014-01-01

Advanced materials characterization techniques with ever-growing data acquisition speed and storage capabilities represent a challenge in modern materials science, and new procedures to quickly assess and analyze the data are needed. Machine learning approaches are effective in reducing the complexity of data and rapidly homing in on the underlying trend in multi-dimensional data. Here, we show that by employing an algorithm called the mean shift theory to a large amount of diffraction data in high-throughput experimentation, one can streamline the process of delineating the structural evolution across compositional variations mapped on combinatorial libraries with minimal computational cost. Data collected at a synchrotron beamline are analyzed on the fly, and by integrating experimental data with the inorganic crystal structure database (ICSD), we can substantially enhance the accuracy in classifying the structural phases across ternary phase spaces. We have used this approach to identify a novel magnetic phase with enhanced magnetic anisotropy which is a candidate for rare-earth free permanent magnet. PMID:25220062
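The abstract above names mean shift as the clustering step but gives no implementation detail. As a rough illustration only, the following sketch applies a flat-kernel mean shift to one-dimensional data. The peak positions, the bandwidth, and the function name `mean_shift_1d` are all hypothetical, and this is not the authors' pipeline, which operates on full diffraction patterns.

```python
def mean_shift_1d(points, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift on 1-D data.

    Each point is moved repeatedly to the mean of all data points lying
    within `bandwidth` of it, until it stops moving. Points that converge
    to (nearly) the same location are given the same cluster label.
    """
    modes = []
    for x in points:
        for _ in range(max_iter):
            neighbours = [p for p in points if abs(p - x) <= bandwidth]
            new_x = sum(neighbours) / len(neighbours)
            if abs(new_x - x) < tol:
                break
            x = new_x
        modes.append(x)

    # Merge converged positions that are closer than half a bandwidth.
    centers, labels = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if abs(m - c) < bandwidth / 2:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return centers, labels


# Hypothetical peak positions (e.g. degrees 2-theta), for illustration only.
peaks = [30.1, 30.2, 29.9, 44.8, 45.0, 45.1]
centers, labels = mean_shift_1d(peaks, bandwidth=1.0)
print(centers, labels)
```

The number of clusters is not specified in advance; it follows from the bandwidth, which is one reason mean shift suits data where the number of phases is unknown.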
AFLOW-SYM: platform for the complete, automatic and self-consistent symmetry analysis of crystals.

PubMed

Hicks, David; Oses, Corey; Gossett, Eric; Gomez, Geena; Taylor, Richard H; Toher, Cormac; Mehl, Michael J; Levy, Ohad; Curtarolo, Stefano

2018-05-01

Determination of the symmetry profile of structures is a persistent challenge in materials science. Results often vary amongst standard packages, hindering autonomous materials development by requiring continuous user attention and educated guesses. This article presents a robust procedure for evaluating the complete suite of symmetry properties, featuring various representations for the point, factor and space groups, site symmetries and Wyckoff positions. The protocol determines a system-specific mapping tolerance that yields symmetry operations entirely commensurate with fundamental crystallographic principles. The self-consistent tolerance characterizes the effective spatial resolution of the reported atomic positions. The approach is compared with the most used programs and is successfully validated against the space-group information provided for over 54 000 entries in the Inorganic Crystal Structure Database (ICSD). Subsequently, a complete symmetry analysis is applied to all 1.7+ million entries of the AFLOW data repository. The AFLOW-SYM package has been implemented in, and made available for, public use through the automated ab initio framework AFLOW.
Nanodosimetry of (125)I Auger electrons.

PubMed

Bantsar, Aliaksandr; Pszona, Stanislaw

2012-12-01

The nanodosimetric description of the radiation action of Auger electrons on nitrogen targets of nanometric size is presented. Experimental microdosimetry at nanometer scale for Auger electrons has been accomplished with the set-up called Jet Counter. This consists of a pulse-operated valve which injects an expanding nitrogen jet into an interaction chamber where a gaseous sensitive volume of cylindrical shape is created. The ionization cluster size distributions (ICSD) created by Auger electrons emitted by (125)I while crossing a nanometer-sized volume have been measured. The ICSD for the sensitive volumes corresponding to 3 and 12 nm in diameter (in unit density 1 g/cm(3)) irradiated by electrons emitted by a (125)I source were collected and compared with the corresponding Monte Carlo (MC) simulation. The preliminary results of the experiments with Auger electrons of (125)I interacting with a nitrogen jet having nanometric size comparable to a deoxyribonucleic acid (DNA) and nucleosome, showing the discrete spectrum of ICSD with extended cluster size, are described. The presented paper describes for the first time the nanodosimetric experiments with Auger electrons emitted by (125)I. A set of the new descriptors of the radiation quality describing the radiation effect at nanometer level is proposed. The ICSD were determined for the first time for an Auger emitter of (125)I.
Cross-cultural and comparative epidemiology of insomnia: the Diagnostic and statistical manual (DSM), International classification of diseases (ICD) and International classification of sleep disorders (ICSD).

PubMed

Chung, Ka-Fai; Yeung, Wing-Fai; Ho, Fiona Yan-Yee; Yung, Kam-Ping; Yu, Yee-Man; Kwok, Chi-Wa

2015-04-01

To compare the prevalence of insomnia according to symptoms, quantitative criteria, and Diagnostic and Statistical Manual of Mental Disorders, 4th and 5th Edition (DSM-IV and DSM-5), International Classification of Diseases, 10th Revision (ICD-10), and International Classification of Sleep Disorders, 2nd Edition (ICSD-2), and to compare the prevalence of insomnia disorder between Hong Kong and the United States by adopting a similar methodology used by the America Insomnia Survey (AIS). Population-based epidemiological survey respondents (n = 2011) completed the Brief Insomnia Questionnaire (BIQ), a validated scale generating DSM-IV, DSM-5, ICD-10, and ICSD-2 insomnia disorder. The weighted prevalence of difficulty falling asleep, difficulty staying asleep, waking up too early, and non-restorative sleep that occurred ≥3 days per week was 14.0%, 28.3%, 32.1%, and 39.9%, respectively. When quantitative criteria were included, the prevalence dropped the most from 39.9% to 8.4% for non-restorative sleep, and the least from 14.0% to 12.9% for difficulty falling asleep. The weighted prevalence of DSM-IV, ICD-10, ICSD-2, and any of the three insomnia disorders was 22.1%, 4.7%, 15.1%, and 22.1%, respectively; for DSM-5 insomnia disorder, it was 10.8%. Compared with 22.1%, 3.9%, and 14.7% for DSM-IV, ICD-10, and ICSD-2 in the AIS, cross-cultural difference in the prevalence of insomnia disorder is less than what is expected. The prevalence is reduced by half from DSM-IV to DSM-5. ICD-10 insomnia disorder has the lowest prevalence, perhaps because excessive concern and preoccupation, one of its diagnostic criteria, is not always present in people with insomnia. Copyright © 2014 Elsevier B.V. All rights reserved.
Epidemiological and clinical relevance of insomnia diagnosis algorithms according to the DSM-IV and the International Classification of Sleep Disorders (ICSD).

PubMed

Ohayon, Maurice M; Reynolds, Charles F

2009-10-01

Although the epidemiology of insomnia in the general population has received considerable attention in the past 20 years, few studies have investigated the prevalence of insomnia using operational definitions such as those set forth in the ICSD and DSM-IV, specifying what proportion of respondents satisfied the criteria to reach a diagnosis of insomnia disorder. This is a cross-sectional study involving 25,579 individuals aged 15 years and over representative of the general population of France, the United Kingdom, Germany, Italy, Portugal, Spain and Finland. The participants were interviewed on sleep habits and disorders managed by the Sleep-EVAL expert system using DSM-IV and ICSD classifications. At the complaint level, too short sleep (20.2%), light sleep (16.6%), and global sleep dissatisfaction (8.2%) were reported by 37% of the subjects. At the symptom level (difficulty initiating or maintaining sleep and non-restorative sleep at least 3 nights per week), 34.5% of the sample reported at least one of them. At the criterion level, (symptoms+daytime consequences), 9.8% of the total sample reported having them. At the diagnostic level, 6.6% satisfied the DSM-IV requirement for positive and differential diagnosis. However, many respondents failed to meet diagnostic criteria for duration, frequency and severity in the two classifications, suggesting that multidimensional measures are needed. A significant proportion of the population with sleep complaints do not fit into DSM-IV and ICSD classifications. Further efforts are needed to identify diagnostic criteria and dimensional measures that will lead to insomnia diagnoses and thus provide a more reliable, valid and clinically relevant classification.
Narcolepsy with and without cataplexy, idiopathic hypersomnia with and without long sleep time: a cluster analysis.

PubMed

Šonka, Karel; Šusta, Marek; Billiard, Michel

2015-02-01

The successive editions of the International Classification of Sleep Disorders (ICSD) reflect the evolution of the concepts of various sleep disorders. This is particularly the case for central disorders of hypersomnolence, with continuous changes in terminology and divisions of narcolepsy, idiopathic hypersomnia, and recurrent hypersomnia. According to the ICSD 2nd Edition (ICSD-2), narcolepsy with cataplexy (NwithC), narcolepsy without cataplexy (Nw/oC), idiopathic hypersomnia with long sleep time (IHwithLST), and idiopathic hypersomnia without long sleep time (IHw/oLST) are four, well-defined hypersomnias of central origin. However, in the absence of biological markers, doubts have been raised as to the relevance of a division of idiopathic hypersomnia into two forms, and it is not yet clear whether Nw/oC and IHw/oLST are two distinct entities. With this in mind, it was decided to empirically review the ICSD-2 classification by using a hierarchical cluster analysis to see whether this division has some relevance, even though the terms "with long sleep time" and "without long sleep time" are inappropriate. The cluster analysis differentiated three main clusters: Cluster 1, "combined monosymptomatic hypersomnia/narcolepsy type 2" (people initially diagnosed with IHw/oLST and Nw/oC); Cluster 2 "polysymptomatic hypersomnia" (people initially diagnosed with IHwithLST); and Cluster 3, narcolepsy type 1 (people initially diagnosed with NwithC). Cluster analysis confirmed that narcolepsy type 1 and polysymptomatic hypersomnia are independent sleep disorders. People who were initially diagnosed with Nw/oC and IHw/oLST formed a single cluster, referred to as "combined monosymptomatic hypersomnia/narcolepsy type 2." Copyright © 2014 Elsevier B.V. All rights reserved.
Effects of diagnosis on treatment recommendations in chronic insomnia--a report from the APA/NIMH DSM-IV field trial.

PubMed

Buysse, D J; Reynolds, C F; Kupfer, D J; Thorpy, M J; Bixler, E; Kales, A; Manfredi, R; Vgontzas, A; Stepanski, E; Roth, T; Hauri, P; Stapf, D

1997-07-01

The objective of this study was to determine whether sleep specialists and nonspecialists recommend different treatments for different insomnia diagnoses according to two different diagnostic classifications. Two hundred sixteen patients with chronic insomnia at five sites were each interviewed by two clinicians: one sleep specialist and one nonsleep specialist. All interviewers indicated diagnoses using the Diagnostic and Statistical Manual of Mental Disorders, fourth edition (DSM-IV); sleep specialists also indicated diagnoses according to the International Classification for Sleep Disorders (ICSD). Interviewers then indicated how strongly they would recommend each item in a standard list of treatment and diagnostic interventions for each patient. We examined differences in treatment recommendations among the six most common DSM-IV diagnoses assigned by sleep specialists at different sites (n = 192), among the six most common ICSD diagnoses assigned by sleep specialists at different sites (n = 153), and among the six most common DSM-IV diagnoses assigned by nonspecialists at different sites (n = 186). In each analysis, specific treatment and polysomnography recommendations differed significantly for different diagnoses, using either DSM-IV or ICSD criteria. Conversely, different diagnoses were associated with different rank orderings of specific treatment and diagnostic recommendations. Sleep specialist and nonspecialist interviewers each distinguished treatment recommendations among different diagnoses, but in general, nonspecialists more strongly recommended medications and relaxation treatments. Significant site-related differences in treatment recommendations also emerged. Differences in treatment recommendations support the distinction between different DSM-IV and ICSD diagnoses, although they do not provide formal validation. Site-related differences suggest a lack of consensus in how these disorders are conceptualized and treated.
The EPOS Vision for the Open Science Cloud

NASA Astrophysics Data System (ADS)

Jeffery, Keith; Harrison, Matt; Cocco, Massimo

2016-04-01

Cloud computing offers dynamic elastic scalability for data processing on demand. For much research activity, demand for computing is uneven over time and so CLOUD computing offers both cost-effectiveness and capacity advantages. However, as reported repeatedly by the EC Cloud Expert Group, there are barriers to the uptake of Cloud Computing: (1) security and privacy; (2) interoperability (avoidance of lock-in); (3) lack of appropriate systems development environments for application programmers to characterise their applications to allow CLOUD middleware to optimize their deployment and execution. From CERN, the Helix-Nebula group has proposed the architecture for the European Open Science Cloud. They are discussing with other e-Infrastructure groups such as EGI (GRIDs), EUDAT (data curation), AARC (network authentication and authorisation) and also with the EIROFORUM group of 'international treaty' RIs (Research Infrastructures) and the ESFRI (European Strategic Forum for Research Infrastructures) RIs including EPOS. Many of these RIs are either e-RIs (electronic-RIs) or have an e-RI interface for access and use. The EPOS architecture is centred on a portal: ICS (Integrated Core Services). The architectural design already allows for access to e-RIs (which may include any or all of data, software, users and resources such as computers or instruments). Those within any one domain (subject area) of EPOS are considered within the TCS (Thematic Core Services). Those outside, or available across multiple domains of EPOS, are ICS-d (Integrated Core Services-Distributed) since the intention is that they will be used by any or all of the TCS via the ICS. Another such service type is CES (Computational Earth Science); effectively an ICS-d specializing in high performance computation, analytics, simulation or visualization offered by a TCS for others to use. 
Discussions are already underway between EPOS and EGI, EUDAT, AARC and Helix-Nebula for those offerings to be considered as ICS-ds by EPOS. Provision of access to ICS-ds from ICS-C concerns several aspects: (a) technical: it may be more or less difficult to connect and pass from ICS-C to the ICS-d/CES the 'package' (probably a virtual machine) of data and software; (b) security/privacy: including passing personal information, e.g. related to AAAI (Authentication, Authorization, Accounting Infrastructure); (c) financial and legal: such as payment and licence conditions. Appropriate interfaces from ICS-C to ICS-d are being designed to accommodate these aspects. The Open Science Cloud is timely because it provides a framework to discuss governance and sustainability for computational resource provision as well as an effective interpretation of a federated approach to HPC (High Performance Computing) and HTC (High Throughput Computing). It will be a unique opportunity to share and adopt procurement policies to provide access to computational resources for RIs. The current state of discussions and expected roadmap for the EPOS-Open Science Cloud relationship are presented.
The ICSD-3 and DSM-5 guidelines for diagnosing narcolepsy: clinical relevance and practicality.

PubMed

Ruoff, Chad; Rye, David

2016-07-20

Narcolepsy is a chronic neurological disease manifesting as difficulty with maintaining continuous wake and sleep. Clinical presentation varies but requires excessive daytime sleepiness (EDS) occurring alone or together with features of rapid-eye movement (REM) sleep dissociation (e.g., cataplexy, hypnagogic/hypnopompic hallucinations, sleep paralysis), and disrupted nighttime sleep. Narcolepsy with cataplexy is associated with reductions of cerebrospinal fluid (CSF) hypocretin due to destruction of hypocretin peptide-producing neurons in the hypothalamus in individuals with a specific genetic predisposition. Updated diagnostic criteria include the Diagnostic and Statistical Manual of Mental Disorders Fifth Edition (DSM-5) and International Classification of Sleep Disorders Third Edition (ICSD-3). DSM-5 criteria require EDS in association with any one of the following: (1) cataplexy; (2) CSF hypocretin deficiency; (3) REM sleep latency ≤15 minutes on nocturnal polysomnography (PSG); or (4) mean sleep latency ≤8 minutes on multiple sleep latency testing (MSLT) with ≥2 sleep-onset REM-sleep periods (SOREMPs). ICSD-3 relies more upon objective data in addition to EDS, somewhat complicating the diagnostic criteria: (1) cataplexy and either positive MSLT/PSG findings or CSF hypocretin deficiency; (2) MSLT criteria similar to DSM-5 except that a SOREMP on PSG may count as one of the SOREMPs required on MSLT; and (3) distinct division of narcolepsy into type 1, which requires the presence of cataplexy or documented CSF hypocretin deficiency, and type 2, where cataplexy is absent, and CSF hypocretin levels are either normal or undocumented. We discuss limitations of these criteria such as variability in clinical presentation of cataplexy, particularly when cataplexy may be ambiguous, as well as by age; multiple and/or invasive CSF diagnostic test requirements; and lack of normative diagnostic test data (e.g., MSLT) in certain populations.
While ICSD-3 criteria reflect narcolepsy pathophysiology, DSM-5 criteria have greater clinical practicality, suggesting that valid and reliable biomarkers to help standardize narcolepsy diagnosis would be welcomed.
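The DSM-5 rule as summarized in this abstract (EDS plus any one of four findings) can be written as a simple boolean check. The sketch below encodes only that summary; the class and field names are hypothetical, and it is an illustration of the logic, not a clinical tool or a full statement of the DSM-5 text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Findings:
    """Hypothetical container for the findings named in the abstract."""
    eds: bool                                      # excessive daytime sleepiness
    cataplexy: bool = False
    csf_hypocretin_deficient: bool = False
    psg_rem_latency_min: Optional[float] = None    # nocturnal PSG REM latency
    mslt_mean_latency_min: Optional[float] = None  # MSLT mean sleep latency
    mslt_soremps: int = 0                          # sleep-onset REM periods on MSLT


def meets_dsm5_summary(f: Findings) -> bool:
    """EDS plus at least one of the four findings listed in the abstract."""
    if not f.eds:
        return False
    short_rem_latency = (
        f.psg_rem_latency_min is not None and f.psg_rem_latency_min <= 15
    )
    positive_mslt = (
        f.mslt_mean_latency_min is not None
        and f.mslt_mean_latency_min <= 8
        and f.mslt_soremps >= 2
    )
    return (
        f.cataplexy
        or f.csf_hypocretin_deficient
        or short_rem_latency
        or positive_mslt
    )


print(meets_dsm5_summary(Findings(eds=True, mslt_mean_latency_min=7.5, mslt_soremps=2)))
```

Missing test results are modeled as `None` so that an undocumented value is never treated as a positive finding.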
Reliability and Validity of the Brief Insomnia Questionnaire in the America Insomnia Survey

PubMed Central

Kessler, Ronald C.; Coulouvrat, Catherine; Hajak, Goeran; Lakoma, Matthew D.; Roth, Thomas; Sampson, Nancy; Shahly, Victoria; Shillington, Alicia; Stephenson, Judith J.; Walsh, James K.; Zammit, Gary K.

2010-01-01

Study Objectives: To evaluate the reliability and validity of the Brief Insomnia Questionnaire (BIQ), a fully structured questionnaire developed to diagnose insomnia according to hierarchy-free Diagnostic and Statistical Manual, Fourth Edition, Text Revision (DSM-IV-TR), International Classification of Diseases-10 (ICD-10), and research diagnostic criteria/International Classification of Sleep Disorders-2 (RDC/ICSD-2) general criteria without organic exclusions in the America Insomnia Survey (AIS). Design: Probability subsamples of AIS respondents, oversampling BIQ positives, completed short-term test-retest interviews (n = 59) or clinical reappraisal interviews (n = 203) to assess BIQ reliability and validity. Setting: The AIS is a large (n = 10,094) epidemiologic survey of the prevalence and correlates of insomnia. Participants: Adult subscribers to a national managed healthcare plan. Intervention: None Measurements and Results: BIQ test-retest correlations were 0.47-0.94 for nature of the sleep problems (initiation, maintenance, nonrestorative sleep [NRS]), 0.72-0.95 for problem frequency, 0.66-0.88 for daytime impairment/distress, and 0.62 for duration of sleep. Good individual-level concordance was found between BIQ diagnoses and diagnoses based on expert interviews for meeting hierarchy-free inclusion criteria for diagnoses in any of the diagnostic systems, with area under the receiver operating characteristic curve (AUC, a measure of classification accuracy insensitive to disorder prevalence) of 0.86 for dichotomous classifications. The AUC increased to 0.94 when symptom-level data were added to generate continuous predicted-probability of diagnosis measures. 
The AUC was lower for dichotomous classifications based on RDC/ICSD-2 (0.68) and ICD-10 (0.70) than for DSM-IV-TR (0.83) criteria but increased consistently when symptom-level data were added to generate continuous predicted-probability measures of RDC/ICSD-2, ICD-10, and DSM-IV-TR diagnoses (0.92-0.95). Conclusions: These results show that the BIQ generates accurate estimates of the prevalence and correlates of hierarchy-free insomnia in the America Insomnia Survey. Citation: Kessler RC; Coulouvrat C; Hajak G; Lakoma MD; Roth T; Sampson N; Shahly V; Shillington A; Stephenson JJ; Walsh JK; Zammit GK. Reliability and validity of the brief insomnia questionnaire in the america insomnia survey. SLEEP 2010;33(11):1539-1549. PMID:21102996
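The abstract describes AUC as a measure of classification accuracy that is insensitive to disorder prevalence. One way to see this is the pairwise definition: AUC is the probability that a randomly chosen case scores higher than a randomly chosen non-case. The sketch below computes it that way on made-up scores; the data and the function name `pairwise_auc` are hypothetical and unrelated to the BIQ data.

```python
def pairwise_auc(case_scores, control_scores):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    wins = 0.0
    for c in case_scores:
        for n in control_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical predicted probabilities, for illustration only.
cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.2, 0.1]

auc_original = pairwise_auc(cases, controls)
# Tripling the number of controls lowers prevalence but leaves AUC unchanged.
auc_low_prevalence = pairwise_auc(cases, controls * 3)
print(auc_original, auc_low_prevalence)
```

Because the statistic averages over case/control pairs, changing the ratio of cases to controls does not move it as long as the score distributions within each group stay the same.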
Machine Learning of ABO3 Crystalline Compounds

NASA Astrophysics Data System (ADS)

Gubernatis, J. E.; Balachandran, P. V.; Lookman, T.

We apply two advanced machine learning methods to a database of experimentally known ABO3 materials to predict the existence of possible new perovskite materials and possible new cubic perovskites. Constructing a list of 625 possible new materials from charge conserving combinations of A and B atoms in known stable ABO3 materials, we predict about 440 new perovskites. These new perovskites are predicted most likely to occur when the A and B atoms are a lanthanide or actinide, when the A atom is an alkali, alkaline earth, or late transition metal, and when the B atom is a p-block atom. These results are in basic agreement with the recent materials discovery by substitution analysis of Hautier et al. who data-mined the entire ICSD database to develop the probability that in any crystal structure atom X could be substituted for by atom Y. The results of our analysis have several points of disagreement with a recent high throughput DFT study of ABO3 crystalline compounds by Emery et al. who predict few, if any, new perovskites whose A and B atoms are both a lanthanide. They also predict far more new cubic perovskites than we do: We predict few, if any, with a high degree of probability. This work was supported by the LDRD DR program of the Los Alamos National Laboratory.
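The abstract does not spell out how the charge-conserving A/B combinations were enumerated. One plausible reading, assumed here, is that with oxygen at -2 the A and B oxidation states in ABO3 must sum to +6. The sketch below enumerates such pairs from a tiny oxidation-state table; the tables, the `ALREADY_LISTED` set, and the function name are hypothetical stand-ins and do not reflect the authors' data or the real literature.

```python
from itertools import product

# Hypothetical, heavily abbreviated oxidation-state tables (illustration only).
A_STATES = {"Ca": [2], "Sr": [2], "La": [3], "K": [1]}
B_STATES = {"Ti": [4], "Al": [3], "Nb": [5], "Fe": [3]}

# Toy stand-in for compositions already present in a database; not real data.
ALREADY_LISTED = {("Ca", "Ti"), ("Sr", "Ti"), ("La", "Al")}


def charge_neutral_pairs(a_states, b_states, oxygen_charge=-2, n_oxygen=3):
    """Return (A, B) pairs with some oxidation states balancing the oxygens."""
    target = -oxygen_charge * n_oxygen  # +6 for ABO3
    pairs = []
    for (a, a_charges), (b, b_charges) in product(a_states.items(), b_states.items()):
        if any(qa + qb == target for qa in a_charges for qb in b_charges):
            pairs.append((a, b))
    return pairs


all_pairs = charge_neutral_pairs(A_STATES, B_STATES)
candidates = [p for p in all_pairs if p not in ALREADY_LISTED]
print(candidates)
```

A list built this way only guarantees charge balance; whether a candidate actually forms a perovskite is the question the machine learning step in the abstract is meant to address.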
Characterization of REM sleep without atonia in patients with narcolepsy and idiopathic hypersomnia using AASM scoring manual criteria.

PubMed

DelRosso, Lourdes M; Chesson, Andrew L; Hoque, Romy

2013-07-15

The AASM Manual for the Scoring of Sleep and Associated Events (Manual) has provided standardized definitions for tonic and phasic REM sleep without atonia (RSWA). This study used Manual criteria to characterize REM sleep in patients with narcolepsy and idiopathic hypersomnia (IH). A retrospective review of PSG data from ICSD-2 defined patients with narcolepsy or IH was performed by two board-certified sleep medicine physicians. Data compiled included REM sleep epochs and the presence in REM sleep of epochs scored as sustained muscle activity (tonic) and excessive transient muscle activity (phasic) as defined by Manual criteria. PSG data from 8 narcolepsy patients (mean age: 27.5 years; age range: 11-55) showed mean ± standard deviation values for: total REM sleep epochs 205 ± 46.1; RSWA/phasic epochs 56.1 ± 25.4; and RSWA/tonic epochs 15.0 ± 10.7. PSG data from 8 IH patients (mean age: 33.1 years; age range: 20-57) showed mean ± standard deviation values of total REM sleep epochs 163.8 ± 67.9; RSWA/phasic epochs 6.2 ± 3.5; and RSWA/tonic epochs 0.2 ± 0.4. Intergroup comparison revealed that phasic REM sleep activity (p < 0.01) and tonic REM sleep activity (p < 0.01) were significantly increased in narcoleptics compared to IH patients. Our retrospective analysis showed that RSWA phasic activity and RSWA tonic activity are significantly increased in patients meeting ICSD-2 criteria for narcolepsy compared to patients meeting ICSD-2 criteria for IH. This robust difference, with further validation, could be useful as an electrophysiological criterion for differentiating the two disorders and for understanding their physiological differences.

Discovery of a Red-Emitting Li3RbGe8O18:Mn4+ Phosphor in the Alkali-Germanate System: Structural Determination and Electronic Calculations.

PubMed

Singh, Satendra Pal; Kim, Minseuk; Park, Woon Bae; Lee, Jin-Woong; Sohn, Kee-Sun

2016-10-17

A solid-state combinatorial chemistry approach, which used the A-Ge-O (A = Li, K, Rb) system doped with a small amount of Mn4+ as an activator, was adopted in a search for novel red-emitting phosphors. The A site may have been composed of either a single alkali metal ion or of a combination of them. This approach led to the discovery of a novel phosphor in the above system with the chemical formula Li3RbGe8O18:Mn4+. The crystal structure of this novel phosphor was solved via direct methods, and subsequent Rietveld refinement revealed a trigonal structure in the P3̅1m space group. The discovered phosphor is believed to be novel in the sense that neither the crystal structure nor the chemical formula matches any of the prototype structures available in the crystallographic information databases (ICDD or ICSD). The measured photoluminescence intensity, which peaked at a wavelength of 667 nm, was found to be much higher than the best intensity obtained among all the existing A2Ge4O9 (A = Li, K, Rb) compounds in the alkali-germanate system. An ab initio calculation based on density functional theory (DFT) was conducted to verify the crystal structure model and compare the calculated value of the optical band gap with the experimental results. The optical band gap obtained from diffuse reflectance measurement (5.26 eV) and DFT calculation (4.64 eV) results were in very good agreement. The emission wavelength of this phosphor, which lies in the deep red region of the electromagnetic spectrum, may be very useful for increasing the color gamut of LED-based display devices such as ultrahigh-definition television (UHDTV) as per the ITU-R BT.2020-2 recommendations and also for down-converter phosphors that are used in solar-cell applications.
Critical evaluation of the effect of valerian extract on sleep structure and sleep quality.

PubMed

Donath, F; Quispe, S; Diefenbach, K; Maurer, A; Fietze, I; Roots, I

2000-03-01

A carefully designed study assessed the short-term (single dose) and long-term (14 days with multiple dosage) effects of a valerian extract on both objective and subjective sleep parameters. The investigation was performed as a randomised, double-blind, placebo-controlled, cross-over study. Sixteen patients (4 male, 12 female) with previously established psychophysiological insomnia (ICSD-code 1.A.1.), and with a median age of 49 (range: 22 to 55), were included in the study. The main inclusion criteria were reported primary insomnia according to ICSD criteria, which was confirmed by polysomnographic recording, and the absence of acute diseases. During the study, the patients underwent 8 polysomnographic recordings: i.e., 2 recordings (baseline and study night) at each time point at which the short and long-term effects of placebo and valerian were tested. The target variable of the study was sleep efficiency. Other parameters describing objective sleep structure were the usual features of sleep-stage analysis, based on the rules of Rechtschaffen and Kales (1968), and the arousal index (scored according to ASDA criteria, 1992) as a sleep microstructure parameter. Subjective parameters such as sleep quality, morning feeling, daytime performance, subjectively perceived duration of sleep latency, and sleep period time were assessed by means of questionnaires. After a single dose of valerian, no effects on sleep structure and subjective sleep assessment were observed. After multiple-dose treatment, sleep efficiency showed a significant increase for both the placebo and the valerian condition in comparison with baseline polysomnography. We confirmed significant differences between valerian and placebo for parameters describing slow-wave sleep. In comparison with the placebo, slow-wave sleep latency was reduced after administration of valerian (21.3 vs. 13.5 min respectively, p<0.05). 
The SWS percentage of time in bed (TIB) was increased after long-term valerian treatment, in comparison to baseline (9.8 vs. 8.1% respectively, p<0.05). At the same time point, a tendency for shorter subjective sleep latency, as well as a higher correlation coefficient between subjective and objective sleep latencies, were observed under valerian treatment. Other improvements in sleep structure - such as an increase in REM percentage and a decrease in NREM1 percentage - took place simultaneously under placebo and valerian treatment. A remarkable finding of the study was the extremely low number of adverse events during the valerian treatment periods (3 vs. 18 in the placebo period). In conclusion, treatment with a herbal extract of radix valerianae demonstrated positive effects on sleep structure and sleep perception of insomnia patients, and can therefore be recommended for the treatment of patients with mild psychophysiological insomnia.
Inverse current source density method in two dimensions: inferring neural activation from multielectrode recordings.

PubMed

Łęski, Szymon; Pettersen, Klas H; Tunstall, Beth; Einevoll, Gaute T; Gigg, John; Wójcik, Daniel K

2011-12-01

The recent development of large multielectrode recording arrays has made it affordable for an increasing number of laboratories to record from multiple brain regions simultaneously. The development of analytical tools for array data, however, lags behind these technological advances in hardware. In this paper, we present a method based on forward modeling for estimating current source density from electrophysiological signals recorded on a two-dimensional grid using multi-electrode rectangular arrays. This new method, which we call two-dimensional inverse Current Source Density (iCSD 2D), is based upon and extends our previous one- and three-dimensional techniques. We test several variants of our method, both on surrogate data generated from a collection of Gaussian sources, and on model data from a population of layer 5 neocortical pyramidal neurons. We also apply the method to experimental data from the rat subiculum. The main advantages of the proposed method are the explicit specification of its assumptions, the possibility to include system-specific information as it becomes available, the ability to estimate CSD at the grid boundaries, and lower reconstruction errors when compared to the traditional approach. These features make iCSD 2D a substantial improvement over the approaches used so far and a powerful new tool for the analysis of multielectrode array data. We also provide a free GUI-based MATLAB toolbox to analyze and visualize our test data as well as user datasets.
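The forward-modeling idea behind an inverse CSD estimate can be sketched as follows: assume a source model, build the matrix F that maps source strengths at the grid points to electrode potentials, then invert it. This is only a toy illustration. The point-source model, the conductivity `sigma`, the offset `h` (which avoids the singularity at zero distance), and all function names are assumptions for this example and do not reproduce the source distributions or the MATLAB toolbox described in the abstract.

```python
import math


def forward_matrix(points, sigma=0.3, h=0.05):
    """F[i][j]: potential at electrode i from a unit point source at grid point j."""
    return [
        [1.0 / (4.0 * math.pi * sigma * math.sqrt(math.dist(p, q) ** 2 + h ** 2))
         for q in points]
        for p in points
    ]


def matvec(matrix, vec):
    """Matrix-vector product for nested lists."""
    return [sum(a * v for a, v in zip(row, vec)) for row in matrix]


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


# Surrogate data: a 4 x 4 electrode grid with one source and one sink.
grid = [(0.2 * ix, 0.2 * iy) for iy in range(4) for ix in range(4)]
true_csd = [0.0] * len(grid)
true_csd[5], true_csd[10] = 1.0, -1.0

F = forward_matrix(grid)
potentials = matvec(F, true_csd)        # forward model: CSD -> potentials
estimated_csd = solve(F, potentials)    # inverse step: potentials -> CSD
print([round(v, 6) for v in estimated_csd])
```

Because the model assumptions enter only through `forward_matrix`, swapping in a different source geometry changes one function and leaves the inversion untouched, which is the kind of explicit, replaceable assumption the abstract lists as an advantage.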
Insomnia in people with epilepsy: A review of insomnia prevalence, risk factors and associations with epilepsy-related factors.

PubMed

Macêdo, Philippe Joaquim Oliveira Menezes; Oliveira, Pedro Sudbrack de; Foldvary-Schaefer, Nancy; Gomes, Marleide da Mota

2017-09-01

Insomnia is a common sleep complaint in the general population, and sleep loss may be a trigger for epileptic seizures. The objective was to conduct a comprehensive review of the literature on insomnia symptoms and insomnia disorder, their prevalence and epilepsy-related risk factors in people with epilepsy (PWE). A PUBMED search was performed for articles indexed to June 2016 involving human subjects, excluding case reports and papers in languages other than English, Spanish and Portuguese. Eligible studies were those using a clear definition of insomnia and reporting quantitative data on prevalence rates and risk factors. The search included the following terms: insomnia, sleep disorder(s), sleep disturbance(s) and sleep-wake in the title and abstract; and epilep* in the title. 425 papers were reviewed and 31 were selected for the final analysis (21 adult and 10 paediatric). Twenty-one studies used a control group. Two reviewer authors independently extracted all data and a third author resolved disagreements. Most studies were hospital-based, cross-sectional and evaluated convenience samples representing highly select populations. Various insomnia inventories were used. Fourteen assessed insomnia (10 in adults, four in children), but only five as primary outcome (none in children). Four evaluated insomnia disorder based on international classification criteria (International Classification of Sleep Disorders, ICSD-2, in three and DSM-IV-TR in one). In adults, insomnia prevalence was 28.9-51% based on the Insomnia Severity Index ≥15 and 36-74.4% based on DSM-IV-TR or ICSD-2. The prevalence of insomnia in children was 13.1-31.5% using the Sleep Disturbance Scale for Children and 11% based on ICSD-2 diagnostic criteria. Compared to control groups, PWE usually had higher frequencies of insomnia symptoms and disorder.
Insomnia was associated with greater impairment in quality of life and higher degree of depressive symptoms in several studies, and was inconsistently related to female gender, poor seizure control and antiepileptic drug polytherapy. In children, insomnia was associated with developmental delay, focal epilepsies and poor seizure control. Insomnia symptoms and insomnia disorder are highly prevalent among PWE based on a limited number of studies with variable inclusion criteria and methodology. Excessive daytime sleepiness (EDS) was not found to be related to insomnia disorder or symptoms, and the exclusion of individuals with EDS may explain the higher frequencies of insomnia found in some studies. Additional investigations are needed given the potential impact of insomnia on seizure control, mood and QOL in PWE. Copyright © 2017. Published by Elsevier B.V.
[A contemporary conception of insomnia syndrome and its treatments in view of International classification of sleep disorders].

PubMed

Poluektov, M G; Tsenteradze, S L

2014-01-01

Insomnia is one of the most common and widespread sleep disorders. It includes difficulties initiating and maintaining sleep, together with daytime impairment. A condition of cerebral hyperarousal plays the most important role in the genesis of insomnia. Cognitive, electrophysiological and metabolic parameters are correlated with the hyperarousal state. According to the International Classification of Sleep Disorders (ICSD-3), insomnia is divided into acute, chronic and unclassified. Treatment of insomnia includes specific and nonspecific approaches. Regardless of the origin of insomnia, sleep hygiene and behavioral therapy remain the methods of choice for the treatment.
Normal Morning Melanin-Concentrating Hormone Levels and No Association with Rapid Eye Movement or Non-Rapid Eye Movement Sleep Parameters in Narcolepsy Type 1 and Type 2.

PubMed

Schrölkamp, Maren; Jennum, Poul J; Gammeltoft, Steen; Holm, Anja; Kornum, Birgitte R; Knudsen, Stine

2017-02-15

Other than hypocretin-1 (HCRT-1) deficiency in narcolepsy type 1 (NT1), the neurochemical imbalance of NT1 and narcolepsy type 2 (NT2) with normal HCRT-1 levels is largely unknown. The neuropeptide melanin-concentrating hormone (MCH) is mainly secreted during sleep and is involved in rapid eye movement (REM) and non-rapid eye movement (NREM) sleep regulation. Hypocretin neurons reciprocally interact with MCH neurons. We hypothesized that altered MCH secretion contributes to the symptoms and sleep abnormalities of narcolepsy and that this is reflected in morning cerebrospinal fluid (CSF) MCH levels, in contrast to previously reported normal evening/afternoon levels. Lumbar CSF and plasma were collected from 07:00 to 10:00 from 57 patients with narcolepsy (subtypes: 47 NT1; 10 NT2) diagnosed according to International Classification of Sleep Disorders, Third Edition (ICSD-3) and 20 healthy controls. HCRT-1 and MCH levels were quantified by radioimmunoassay and correlated with clinical symptoms, polysomnography (PSG), and Multiple Sleep Latency Test (MSLT) parameters. CSF and plasma MCH levels were not significantly different between narcolepsy patients regardless of ICSD-3 subtype, HCRT-1 levels, or compared to controls. CSF MCH and HCRT-1 levels were not significantly correlated. Multivariate regression models of CSF MCH levels, age, sex, and body mass index predicting clinical, PSG, and MSLT parameters did not reveal any significant associations to CSF MCH levels. Our study shows that MCH levels in CSF collected in the morning are normal in narcolepsy and not associated with the clinical symptoms, REM sleep abnormalities, nor number of muscle movements during REM or NREM sleep of the patients. We conclude that morning lumbar CSF MCH measurement is not an informative diagnostic marker for narcolepsy. © 2017 American Academy of Sleep Medicine
[Insomnia disorder].

PubMed

Ozone, Motohiro; Kuroda, Ayako

2015-06-01

The proportion of people who have sleep problems increases with age. In Japan, a super-aging society, insomnia is a common disease. The reported proportion of people with insomnia among those over sixty years old is 29.5%. Sleep disturbance in the elderly is caused by multiple factors, such as physiological, physical, psychosociological, psychiatric, and pharmacological factors. According to the latest diagnostic criteria for sleep disorders, ICSD-3, the concept of primary or secondary insomnia was abolished. Instead, insomnia is categorized by the duration of the disease, and general doctors can diagnose sleep disorders more easily than in the past. However, because it is no longer necessary to consider the pathophysiological mechanism, there is a concern that the quality of clinical insomnia treatment might decline.
The EPOS ICT Architecture

NASA Astrophysics Data System (ADS)

Jeffery, Keith; Harrison, Matt; Bailo, Daniele

2016-04-01

The EPOS-PP Project 2010-2014 proposed an architecture and demonstrated feasibility with a prototype. Requirements based on use cases were collected and an inventory of assets (e.g. datasets, software, users, computing resources, equipment/detectors, laboratory services) (RIDE) was developed. The architecture evolved through three stages of refinement with much consultation both with the EPOS community representing EPOS users and participants in geoscience and with the overall ICT community especially those working on research such as the RDA (Research Data Alliance) community. The architecture consists of a central ICS (Integrated Core Services) consisting of a portal and catalog, the latter providing to end-users a 'map' of all EPOS resources (datasets, software, users, computing, equipment/detectors etc.). ICS is extended to ICS-d (distributed ICS) for certain services (such as visualisation software services or Cloud computing resources) and CES (Computational Earth Science) for specific simulation or analytical processing. ICS also communicates with TCS (Thematic Core Services) which represent European-wide portals to national and local assets, resources and services in the various specific domains (e.g. seismology, volcanology, geodesy) of EPOS. The EPOS-IP project 2015-2019 started October 2015. Two work-packages cover the ICT aspects; WP6 involves interaction with the TCS while WP7 concentrates on ICS including interoperation with ICS-d and CES offerings: in short the ICT architecture. Based on the experience and results of EPOS-PP the ICT team held a pre-meeting in July 2015 and set out a project plan. The first major activity involved requirements (re-)collection with use cases and also updating the inventory of assets held by the various TCS in EPOS. The RIDE database of assets is currently being converted to CERIF (Common European Research Information Format - an EU Recommendation to Member States) to provide the basis for the EPOS-IP ICS Catalog. 
In parallel the ICT team is tracking developments in ICT for relevance to EPOS-IP. In particular, the potential utilisation of e-Is (e-Infrastructures) such as GEANT (network), AARC (security), EGI (GRID computing), EUDAT (data curation), PRACE (High Performance Computing) and HELIX-Nebula / Open Science Cloud (Cloud computing) is being assessed. Similarly, relationships to other e-RIs (e-Research Infrastructures) such as ENVRI+, EXCELERATE and other ESFRI (European Strategic Forum for Research Infrastructures) projects are being developed to share experience and technology and to promote interoperability. EPOS ICT team members are also involved in VRE4EIC, a project developing a reference architecture and component software services for a Virtual Research Environment to be superimposed on EPOS-ICS. The challenge being tackled now is therefore to keep consistency and interoperability among the different modules, initiatives and actors that participate in the process of running the EPOS platform. It implies both a continuous update on the IT aspects of the mentioned initiatives and a refinement of the e-architecture designed so far. One major aspect of EPOS-IP is the ICT support for legalistic, financial and governance aspects of the EPOS ERIC to be initiated during EPOS-IP. This implies a sophisticated AAAI (Authentication, Authorization, Accounting Infrastructure) with consistency throughout the software, communications and data stack.
Behavioural and Cognitive-Behavioural Treatments of Parasomnias

PubMed Central

Galbiati, Andrea; Rinaldi, Fabrizio; Giora, Enrico; Ferini-Strambi, Luigi; Marelli, Sara

2015-01-01

Parasomnias are unpleasant or undesirable behaviours or experiences that occur predominantly during or within close proximity to sleep. Pharmacological treatments of parasomnias are available, but their efficacy is established only for few disorders. Furthermore, most of these disorders tend spontaneously to remit with development. Nonpharmacological treatments therefore represent valid therapeutic choices. This paper reviews behavioural and cognitive-behavioural managements employed for parasomnias. Referring to the ICSD-3 nosology we consider, respectively, NREM parasomnias, REM parasomnias, and other parasomnias. Although the efficacy of some of these treatments is proved, in other cases their clinical evidence cannot be provided because of the small size of the samples. Due to the rarity of some parasomnias, further multicentric researches are needed in order to offer a more complete account of behavioural and cognitive-behavioural treatments efficacy. PMID:26101458
Normal Morning Melanin-Concentrating Hormone Levels and No Association with Rapid Eye Movement or Non-Rapid Eye Movement Sleep Parameters in Narcolepsy Type 1 and Type 2

PubMed Central

Schrölkamp, Maren; Jennum, Poul J.; Gammeltoft, Steen; Holm, Anja; Kornum, Birgitte R.; Knudsen, Stine

2017-01-01

Study Objectives: Other than hypocretin-1 (HCRT-1) deficiency in narcolepsy type 1 (NT1), the neurochemical imbalance of NT1 and narcolepsy type 2 (NT2) with normal HCRT-1 levels is largely unknown. The neuropeptide melanin-concentrating hormone (MCH) is mainly secreted during sleep and is involved in rapid eye movement (REM) and non-rapid eye movement (NREM) sleep regulation. Hypocretin neurons reciprocally interact with MCH neurons. We hypothesized that altered MCH secretion contributes to the symptoms and sleep abnormalities of narcolepsy and that this is reflected in morning cerebrospinal fluid (CSF) MCH levels, in contrast to previously reported normal evening/afternoon levels. Methods: Lumbar CSF and plasma were collected from 07:00 to 10:00 from 57 patients with narcolepsy (subtypes: 47 NT1; 10 NT2) diagnosed according to International Classification of Sleep Disorders, Third Edition (ICSD-3) and 20 healthy controls. HCRT-1 and MCH levels were quantified by radioimmunoassay and correlated with clinical symptoms, polysomnography (PSG), and Multiple Sleep Latency Test (MSLT) parameters. Results: CSF and plasma MCH levels were not significantly different between narcolepsy patients regardless of ICSD-3 subtype, HCRT-1 levels, or compared to controls. CSF MCH and HCRT-1 levels were not significantly correlated. Multivariate regression models of CSF MCH levels, age, sex, and body mass index predicting clinical, PSG, and MSLT parameters did not reveal any significant associations to CSF MCH levels. Conclusions: Our study shows that MCH levels in CSF collected in the morning are normal in narcolepsy and not associated with the clinical symptoms, REM sleep abnormalities, nor number of muscle movements during REM or NREM sleep of the patients. We conclude that morning lumbar CSF MCH measurement is not an informative diagnostic marker for narcolepsy. Citation: Schrölkamp M, Jennum PJ, Gammeltoft S, Holm A, Kornum BR, Knudsen S. 
Normal morning melanin-concentrating hormone levels and no association with rapid eye movement or non-rapid eye movement sleep parameters in narcolepsy type 1 and type 2. J Clin Sleep Med. 2017;13(2):235–243. PMID:27855741
[Mini-KiSS--a multimodal group therapy intervention for parents of young children with sleep disorders: a pilot study].

PubMed

Schlarb, Angelika Anita; Brandhorst, Isabel; Hautzinger, Martin

2011-05-01

Sleep disorders in early childhood tend to be chronic and almost always a burden for the parents. This study developed and evaluated a multimodal parent training program for children 0.5 to 4 years of age suffering from sleep disorders (Mini-KiSS). We hypothesized that there would be specific improvements following the structured group training (reduction of sleep problems, improvement of parental well-being). The pilot study consisted of a pre-post test design without control group. Participants were n = 17 parents of children 0.5 to 4 years of age with sleep disorders determined according to the ICSD-II. Each of the six sessions was evaluated, and changes were assessed by sleep diary and CBCL. Behavioral and emotional problems of the child were assessed by CBCL, and parental well-being by SCL-90-R. The results showed high acceptance of Mini-KiSS and satisfactory feasibility. Children showed significant improvements in sleep disturbances such as nightly awakenings as well as sleeping in the parents' bed. Furthermore, improvements were found for children's emotional and behavioral problems and for parental well-being, in particular for the depression scale of the mother. This pilot study shows a high acceptance and good feasibility of the multimodal short-time parent-training program Mini-KiSS. Sleep problems were significantly reduced.
Sleepwalking episodes are preceded by arousal-related activation in the cingulate motor area: EEG current density imaging.

PubMed

Januszko, Piotr; Niemcewicz, Szymon; Gajda, Tomasz; Wołyńczyk-Gmaj, Dorota; Piotrowska, Anna Justyna; Gmaj, Bartłomiej; Piotrowski, Tadeusz; Szelenberger, Waldemar

2016-01-01

To investigate local arousal fluctuations in adults who received ICSD-2 diagnosis of somnambulism. EEG neuroimaging (eLORETA) was utilized to compare current density distribution for 4s epochs immediately preceding sleepwalking episode (from -4.0 s to 0 s) to the distribution during earlier 4s epochs (from -8.0 s to -4.0 s) in 20 EEG segments from 15 patients. Comparisons between eLORETA images revealed significant (t>4.52; p<0.05) brain activations before onset of sleepwalking, with greater current density within beta 3 frequency range (24-30 Hz) in Brodmann areas 33 and 24. Sleepwalking motor events are associated with arousal-related activation of cingulate motor area. These results support the notion of blurred boundaries between wakefulness and NREM sleep in sleepwalking. Copyright © 2015 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.
Challenges in integrating multidisciplinary data into a single e-infrastructure

NASA Astrophysics Data System (ADS)

Atakan, Kuvvet; Jeffery, Keith G.; Bailo, Daniele; Harrison, Matthew

2015-04-01

The European Plate Observing System (EPOS) aims to create a pan-European infrastructure for solid Earth science to support a safe and sustainable society. The mission of EPOS is to monitor and understand the dynamic and complex Earth system by relying on new e-science opportunities and integrating diverse and advanced Research Infrastructures in Europe for solid Earth Science. EPOS will enable innovative multidisciplinary research for a better understanding of the Earth's physical and chemical processes that control earthquakes, volcanic eruptions, ground instability and tsunami as well as the processes driving tectonics and Earth's surface dynamics. EPOS will improve our ability to better manage the use of the subsurface of the Earth. Through integration of data, models and facilities EPOS will allow the Earth Science community to make a step change in developing new concepts and tools for key answers to scientific and socio-economic questions concerning geo-hazards and geo-resources as well as Earth sciences applications to the environment and to human welfare. EPOS is now getting into its Implementation Phase (EPOS-IP). One of the main challenges during the implementation phase is the integration of multidisciplinary data into a single e-infrastructure. Multidisciplinary data are organized and governed by the Thematic Core Services (TCS) and are driven by various scientific communities encompassing a wide spectrum of Earth science disciplines. TCS data, data products and services will be integrated into a platform "the ICS system" that will ensure their interoperability and access to these services by the scientific community as well as other users within the society. This requires dedicated tasks for interactions with the various TCS-WPs, as well as the various distributed ICS (ICS-Ds), such as High Performance Computing (HPC) facilities, large scale data storage facilities, complex processing and visualization tools etc. 
Computational Earth Science (CES) services are identified as a transversal activity and as such need to be harmonized and provided within the ICS. In order to develop a metadata catalogue and the ICS system, the content from the entire spectrum of services included in TCS, ICS-Ds as well as CES activities, need to be organized in a systematic manner taking into account global and European IT-standards, while complying with the user needs and data provider requirements.
GlycomeDB – integration of open-access carbohydrate structure databases

PubMed Central

Ranzinger, René; Herget, Stephan; Wetter, Thomas; von der Lieth, Claus-Wilhelm

2008-01-01

Background: Although carbohydrates are the third major class of biological macromolecules, after proteins and DNA, there is neither a comprehensive database for carbohydrate structures nor an established universal structure encoding scheme for computational purposes. Funding for further development of the Complex Carbohydrate Structure Database (CCSD or CarbBank) ceased in 1997, and since then several initiatives have developed independent databases with partially overlapping foci. For each database, different encoding schemes for residues and sequence topology were designed. Therefore, it is virtually impossible to obtain an overview of all deposited structures or to compare the contents of the various databases. Results: We have implemented procedures which download the structures contained in the seven major databases, e.g. GLYCOSCIENCES.de, the Consortium for Functional Glycomics (CFG), the Kyoto Encyclopedia of Genes and Genomes (KEGG) and the Bacterial Carbohydrate Structure Database (BCSDB). We have created a new database called GlycomeDB, containing all structures, their taxonomic annotations and references (IDs) for the original databases. More than 100000 datasets were imported, resulting in more than 33000 unique sequences now encoded in GlycomeDB using the universal format GlycoCT. Inconsistencies were found in all public databases, which were discussed and corrected in multiple feedback rounds with the responsible curators. Conclusion: GlycomeDB is a new, publicly available database for carbohydrate sequences with a unified, all-encompassing structure encoding format and NCBI taxonomic referencing. The database is updated weekly and can be downloaded free of charge. The JAVA application GlycoUpdateDB is also available for establishing and updating a local installation of GlycomeDB. With the advent of GlycomeDB, the distributed islands of knowledge in glycomics are now bridged to form a single resource. PMID:18803830
ProBiS-database: precalculated binding site similarities and local pairwise alignments of PDB structures.

PubMed

Konc, Janez; Cesnik, Tomo; Konc, Joanna Trykowska; Penca, Matej; Janežič, Dušanka

2012-02-27

ProBiS-Database is a searchable repository of precalculated local structural alignments in proteins detected by the ProBiS algorithm in the Protein Data Bank. Identification of functionally important binding regions of the protein is facilitated by structural similarity scores mapped to the query protein structure. PDB structures that have been aligned with a query protein may be rapidly retrieved from the ProBiS-Database, which is thus able to generate hypotheses concerning the roles of uncharacterized proteins. Presented with uncharacterized protein structure, ProBiS-Database can discern relationships between such a query protein and other better known proteins in the PDB. Fast access and a user-friendly graphical interface promote easy exploration of this database of over 420 million local structural alignments. The ProBiS-Database is updated weekly and is freely available online at http://probis.cmm.ki.si/database.
Teaching Three-Dimensional Structural Chemistry Using Crystal Structure Databases. 4. Examples of Discovery-Based Learning Using the Complete Cambridge Structural Database

ERIC Educational Resources Information Center

Battle, Gary M.; Allen, Frank H.; Ferrence, Gregory M.

2011-01-01

Parts 1 and 2 of this series described the educational value of experimental three-dimensional (3D) chemical structures determined by X-ray crystallography and retrieved from the crystallographic databases. In part 1, we described the information content of the Cambridge Structural Database (CSD) and discussed a representative teaching subset of…
Teaching Three-Dimensional Structural Chemistry Using Crystal Structure Databases. 3. The Cambridge Structural Database System: Information Content and Access Software in Educational Applications

ERIC Educational Resources Information Center

Battle, Gary M.; Allen, Frank H.; Ferrence, Gregory M.

2011-01-01

Parts 1 and 2 of this series described the educational value of experimental three-dimensional (3D) chemical structures determined by X-ray crystallography and retrieved from the crystallographic databases. In part 1, we described the information content of the Cambridge Structural Database (CSD) and discussed a representative teaching subset of…
Construction of crystal structure prototype database: methods and applications.

PubMed

Su, Chuanxun; Lv, Jian; Li, Quan; Wang, Hui; Zhang, Lijun; Wang, Yanchao; Ma, Yanming

2017-04-26

Crystal structure prototype data have become a useful source of information for materials discovery in the fields of crystallography, chemistry, physics, and materials science. This work reports the development of a robust and efficient method for assessing the similarity of structures on the basis of their interatomic distances. Using this method, we proposed a simple and unambiguous definition of crystal structure prototype based on hierarchical clustering theory, and constructed the crystal structure prototype database (CSPD) by filtering the known crystallographic structures in a database. With a similar method, a program, the structure prototype analysis package (SPAP), was developed to remove similar structures in CALYPSO prediction results and extract predicted low energy structures for a separate theoretical structure database. A series of statistics describing the distribution of crystal structure prototypes in the CSPD was compiled to provide an important insight for structure prediction and high-throughput calculations. Illustrative examples of the application of the proposed database are given, including the generation of initial structures for structure prediction and determination of the prototype structure in databases. These examples demonstrate the CSPD to be a generally applicable and useful tool for materials discovery.
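
The idea of grouping structures by interatomic-distance similarity and hierarchical clustering can be illustrated with a small sketch. This is not the CSPD or SPAP implementation: the fingerprint (a sorted list of pairwise distances in a non-periodic toy model), the root-mean-square dissimilarity, the single-linkage rule, and all function names are assumptions made for illustration only.

```python
import math
from itertools import combinations


def distance_fingerprint(coords):
    """Sorted pairwise interatomic distances (toy model, no periodic images)."""
    return sorted(math.dist(a, b) for a, b in combinations(coords, 2))


def dissimilarity(fp_a, fp_b):
    """Root-mean-square difference between two fingerprints (assumed metric)."""
    if len(fp_a) != len(fp_b):
        return math.inf  # different atom counts: treat as unrelated
    if not fp_a:
        return 0.0
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fp_a, fp_b)) / len(fp_a))


def cluster(structures, threshold):
    """Single-linkage agglomerative clustering.

    Repeatedly merges the two closest clusters until the smallest
    inter-cluster dissimilarity exceeds `threshold`.
    Returns lists of structure indices, one list per cluster.
    """
    fps = [distance_fingerprint(s) for s in structures]
    clusters = [[i] for i in range(len(structures))]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = min(dissimilarity(fps[i], fps[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        if best[0] > threshold:
            break
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]  # safe because a < b
    return [sorted(c) for c in clusters]


# Hypothetical toy "structures": two congruent squares and one linear chain.
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
shifted_square = [(5, 5, 5), (6, 5, 5), (6, 6, 5), (5, 6, 5)]
chain = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
prototypes = cluster([square, shifted_square, chain], threshold=0.05)
```

In this toy run the two squares share a fingerprint and fall into one cluster, while the chain stays on its own. A real crystal-structure comparison would also need periodic boundary conditions, element types, and cell normalization, none of which this sketch attempts.
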
Construction of crystal structure prototype database: methods and applications

NASA Astrophysics Data System (ADS)

Su, Chuanxun; Lv, Jian; Li, Quan; Wang, Hui; Zhang, Lijun; Wang, Yanchao; Ma, Yanming

2017-04-01

Crystal structure prototype data have become a useful source of information for materials discovery in the fields of crystallography, chemistry, physics, and materials science. This work reports the development of a robust and efficient method for assessing the similarity of structures on the basis of their interatomic distances. Using this method, we proposed a simple and unambiguous definition of crystal structure prototype based on hierarchical clustering theory, and constructed the crystal structure prototype database (CSPD) by filtering the known crystallographic structures in a database. With a similar method, a program, the structure prototype analysis package (SPAP), was developed to remove similar structures in CALYPSO prediction results and extract predicted low energy structures for a separate theoretical structure database. A series of statistics describing the distribution of crystal structure prototypes in the CSPD was compiled to provide an important insight for structure prediction and high-throughput calculations. Illustrative examples of the application of the proposed database are given, including the generation of initial structures for structure prediction and determination of the prototype structure in databases. These examples demonstrate the CSPD to be a generally applicable and useful tool for materials discovery.
Structural Ceramics Database

National Institute of Standards and Technology Data Gateway

SRD 30 NIST Structural Ceramics Database (Web, free access) The NIST Structural Ceramics Database (WebSCD) provides evaluated materials property data for a wide range of advanced ceramics known variously as structural ceramics, engineering ceramics, and fine ceramics.

mTM-align: a server for fast protein structure database search and multiple protein structure alignment.

PubMed

Dong, Runze; Pan, Shuo; Peng, Zhenling; Zhang, Yang; Yang, Jianyi

2018-05-21

With the rapid increase of the number of protein structures in the Protein Data Bank, it becomes urgent to develop algorithms for efficient protein structure comparisons. In this article, we present the mTM-align server, which consists of two closely related modules: one for structure database search and the other for multiple structure alignment. The database search is speeded up based on a heuristic algorithm and a hierarchical organization of the structures in the database. The multiple structure alignment is performed using the recently developed algorithm mTM-align. Benchmark tests demonstrate that our algorithms outperform other peering methods for both modules, in terms of speed and accuracy. One of the unique features for the server is the interplay between database search and multiple structure alignment. The server provides service not only for performing fast database search, but also for making accurate multiple structure alignment with the structures found by the search. For the database search, it takes about 2-5 min for a structure of a medium size (∼300 residues). For the multiple structure alignment, it takes a few seconds for ∼10 structures of medium sizes. The server is freely available at: http://yanglab.nankai.edu.cn/mTM-align/.
Comprehensive analysis of the N-glycan biosynthetic pathway using bioinformatics to generate UniCorn: A theoretical N-glycan structure database.

PubMed

Akune, Yukie; Lin, Chi-Hung; Abrahams, Jodie L; Zhang, Jingyu; Packer, Nicolle H; Aoki-Kinoshita, Kiyoko F; Campbell, Matthew P

2016-08-05

Glycan structures attached to proteins are comprised of diverse monosaccharide sequences and linkages that are produced from precursor nucleotide-sugars by a series of glycosyltransferases. Databases of these structures are an essential resource for the interpretation of analytical data and the development of bioinformatics tools. However, with no template to predict what structures are possible the human glycan structure databases are incomplete and rely heavily on the curation of published, experimentally determined, glycan structure data. In this work, a library of 45 human glycosyltransferases was used to generate a theoretical database of N-glycan structures comprised of 15 or less monosaccharide residues. Enzyme specificities were sourced from major online databases including Kyoto Encyclopedia of Genes and Genomes (KEGG) Glycan, Consortium for Functional Glycomics (CFG), Carbohydrate-Active enZymes (CAZy), GlycoGene DataBase (GGDB) and BRENDA. Based on the known activities, more than 1.1 million theoretical structures and 4.7 million synthetic reactions were generated and stored in our database called UniCorn. Furthermore, we analyzed the differences between the predicted glycan structures in UniCorn and those contained in UniCarbKB (www.unicarbkb.org), a database which stores experimentally described glycan structures reported in the literature, and demonstrate that UniCorn can be used to aid in the assignment of ambiguous structures whilst also serving as a discovery database. Copyright © 2016 Elsevier Ltd. All rights reserved.
URS DataBase: universe of RNA structures and their motifs.

PubMed

Baulin, Eugene; Yacovlev, Victor; Khachko, Denis; Spirin, Sergei; Roytberg, Mikhail

2016-01-01

The Universe of RNA Structures DataBase (URSDB) stores information obtained from all RNA-containing PDB entries (2935 entries in October 2015). The content of the database is updated regularly. The database consists of 51 tables containing indexed data on various elements of the RNA structures. The database provides a web interface allowing user to select a subset of structures with desired features and to obtain various statistical data for a selected subset of structures or for all structures. In particular, one can easily obtain statistics on geometric parameters of base pairs, on structural motifs (stems, loops, etc.) or on different types of pseudoknots. The user can also view and get information on an individual structure or its selected parts, e.g. RNA-protein hydrogen bonds. URSDB employs a new original definition of loops in RNA structures. That definition fits both pseudoknot-free and pseudoknotted secondary structures and coincides with the classical definition in case of pseudoknot-free structures. To our knowledge, URSDB is the first database supporting searches based on topological classification of pseudoknots and on extended loop classification. Database URL: http://server3.lpm.org.ru/urs/. © The Author(s) 2016. Published by Oxford University Press.
URS DataBase: universe of RNA structures and their motifs

PubMed Central

Baulin, Eugene; Yacovlev, Victor; Khachko, Denis; Spirin, Sergei; Roytberg, Mikhail

2016-01-01

The Universe of RNA Structures DataBase (URSDB) stores information obtained from all RNA-containing PDB entries (2935 entries in October 2015). The content of the database is updated regularly. The database consists of 51 tables containing indexed data on various elements of the RNA structures. The database provides a web interface allowing user to select a subset of structures with desired features and to obtain various statistical data for a selected subset of structures or for all structures. In particular, one can easily obtain statistics on geometric parameters of base pairs, on structural motifs (stems, loops, etc.) or on different types of pseudoknots. The user can also view and get information on an individual structure or its selected parts, e.g. RNA–protein hydrogen bonds. URSDB employs a new original definition of loops in RNA structures. That definition fits both pseudoknot-free and pseudoknotted secondary structures and coincides with the classical definition in case of pseudoknot-free structures. To our knowledge, URSDB is the first database supporting searches based on topological classification of pseudoknots and on extended loop classification. Database URL: http://server3.lpm.org.ru/urs/ PMID:27242032
A case study for a digital seabed database: Bohai Sea engineering geology database

NASA Astrophysics Data System (ADS)

Tianyun, Su; Shikui, Zhai; Baohua, Liu; Ruicai, Liang; Yanpeng, Zheng; Yong, Wang

2006-07-01

This paper discusses the design of an ORACLE-based database structure for Bohai Sea engineering geology, covering requirements analysis, conceptual structure analysis, logical structure analysis, physical structure analysis and security design. In the study, we used the object-oriented Unified Modeling Language (UML) to model the conceptual structure of the database and used the data management functions of the object-relational database ORACLE to organize and manage the storage space and improve its security performance. By this means, the database can provide rapid and highly effective performance in data storage, maintenance and query to satisfy the application requirements of the Bohai Sea Oilfield Paradigm Area Information System.
Comparison of approximations in density functional theory calculations: Energetics and structure of binary oxides

NASA Astrophysics Data System (ADS)

Hinuma, Yoyo; Hayashi, Hiroyuki; Kumagai, Yu; Tanaka, Isao; Oba, Fumiyasu

2017-09-01

High-throughput first-principles calculations based on density functional theory (DFT) are a powerful tool in data-oriented materials research. The choice of approximation to the exchange-correlation functional is crucial as it strongly affects the accuracy of DFT calculations. This study compares performance of seven approximations, six of which are based on Perdew-Burke-Ernzerhof (PBE) generalized gradient approximation (GGA) with and without Hubbard U and van der Waals corrections (PBE, PBE+U, PBED3, PBED3+U, PBEsol, and PBEsol+U), and the strongly constrained and appropriately normed (SCAN) meta-GGA on the energetics and crystal structure of elementary substances and binary oxides. For the latter, only those with closed-shell electronic structures are considered, examples of which include Cu2O, Ag2O, MgO, ZnO, CdO, SnO, PbO, Al2O3, Ga2O3, In2O3, La2O3, Bi2O3, SiO2, SnO2, PbO2, TiO2, ZrO2, HfO2, V2O5, Nb2O5, Ta2O5, MoO3, and WO3. Prototype crystal structures are selected from the Inorganic Crystal Structure Database (ICSD) and cation substitution is used to make a set of existing and hypothetical oxides. Two indices are proposed to quantify the extent of lattice and internal coordinate relaxation during a calculation. The former is based on the second invariant and determinant of the transformation matrix of basis vectors from before relaxation to after relaxation, and the latter is derived from shifts of internal coordinates of atoms in the unit cell. PBED3, PBEsol, and SCAN reproduce experimental lattice parameters of elementary substances and oxides well with few outliers. Notably, PBEsol and SCAN predict the lattice parameters of low dimensional structures comparably well with PBED3, even though these two functionals do not explicitly treat van der Waals interactions.
SCAN gives formation enthalpies and Gibbs free energies closest to experimental data, with mean errors (MEs) of 0.01 and -0.04 eV, respectively, and root-mean-square errors (RMSEs) are both 0.07 eV. In contrast, all GGAs including those with Hubbard U and van der Waals corrections give 0.1 to 0.2 eV MEs and at least 0.11 eV RMSEs. Phonon contributions of solid phases to the formation enthalpies and Gibbs free energies are estimated to be small at less than ~0.1 eV/atom within the quasiharmonic approximation. The same crystal structure appears as the lowest energy polymorph with different approximations in most of the investigated binary oxides. However, there are some systems where the choice of approximation significantly affects energy differences between polymorphs, or even the order of stability between phases. SCAN is the most reasonable regarding relative energies between polymorphs. The calculated transition pressure between polymorphs of ZnO and SnO2 is closest to experimental values when PBED3, PBEsol (also PBED3+U and PBEsol+U for ZnO), and SCAN are employed. In summary, SCAN appears to be the best choice among the seven approximations based on the analysis of the energetics and crystal structure of binary oxides, while PBEsol is the best among the GGAs considered and shows a comparably good performance with SCAN for many cases. The use of PBEsol+U alongside PBEsol is also a reasonable choice, given that U corrections are required for several materials to qualitatively reproduce their electronic structures.
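
The two error statistics quoted in this abstract, mean error (ME) and root-mean-square error (RMSE), can be computed as in the following minimal sketch. The sign convention (calculated minus experimental) is an assumption, and the numbers are made-up placeholders, not data from the study.

```python
import math


def mean_error(calculated, experimental):
    """Signed mean of (calculated - experimental); the sign convention is assumed."""
    errors = [c - e for c, e in zip(calculated, experimental)]
    return sum(errors) / len(errors)


def root_mean_square_error(calculated, experimental):
    """Square root of the mean squared deviation between the two series."""
    squared = [(c - e) ** 2 for c, e in zip(calculated, experimental)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical formation enthalpies in eV, for illustration only.
calculated = [-1.60, -3.10, -5.70]
experimental = [-1.70, -3.00, -5.70]

me = mean_error(calculated, experimental)
rmse = root_mean_square_error(calculated, experimental)
```

A small ME combined with a larger RMSE, as in this toy data, indicates errors of both signs that partly cancel in the mean, which is why the two statistics are usually reported together.
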
A novel approach: chemical relational databases, and the role of the ISSCAN database on assessing chemical carcinogenicity.

PubMed

Benigni, Romualdo; Bossa, Cecilia; Richard, Ann M; Yang, Chihae

2008-01-01

Mutagenicity and carcinogenicity databases are crucial resources for toxicologists and regulators involved in chemicals risk assessment. Until recently, existing public toxicity databases have been constructed primarily as "look-up-tables" of existing data, and most often did not contain chemical structures. Concepts and technologies originated from the structure-activity relationships science have provided powerful tools to create new types of databases, where the effective linkage of chemical toxicity with chemical structure can facilitate and greatly enhance data gathering and hypothesis generation, by permitting: a) exploration across both chemical and biological domains; and b) structure-searchability through the data. This paper reviews the main public databases, together with the progress in the field of chemical relational databases, and presents the ISSCAN database on experimental chemical carcinogens.
Nightmare Themes: An Online Study of Most Recent Nightmares and Childhood Nightmares.

PubMed

Schredl, Michael; Göritz, Anja S

2018-03-15

Even though the common diagnostic criteria (ICSD-3, DSM-5) acknowledge that nightmares do not only contain anxiety/fear (definition of the ICD-10) but also other emotions such as grief, disgust, and anger, the definition of a nightmare still focuses on threats to survival, security, or physical integrity. However, empirical studies on nightmare content in larger samples are scarce. The current study elicited 1,216 of the most recent nightmares including childhood nightmares of a population-based sample. The findings show that nightmares encompass a diversity of topics, including being chased, physical aggression, and death/injury of close persons. Infrequent themes like being the aggressor and suicide are of special interest as they might be related to waking-life psychopathology. The variety of nightmare topics clearly indicates that current definitions of nightmare content are too narrow. Future studies should look into nightmare content of persons in whom nightmare disorder has been diagnosed. © 2018 American Academy of Sleep Medicine.
Arousals and aircraft noise - environmental disorders of sleep and health in terms of sleep medicine.

PubMed

Raschke, F

2004-01-01

Worldwide rules for sleep staging date back to 1967. Since then, many investigations have aimed to quantify the degree of sleep disturbance due to air traffic noise. But the variables used, such as the relative amounts of sleep stages, total sleep time, or sleep efficiency, could not sufficiently explain impairments in health and performance. The early 1980s brought new insight into the restorative functions of sleep, in connection with sleep fragmentation by micro-arousals. These originate in autonomous dysfunctions during sleep and lead to non-restorative sleep. Environmentally related sleep disturbances and EEG and vegetative (micro-)arousals are described, and current knowledge in sleep medicine is presented in terms of the International Classification of Sleep Disorders (ICSD). The effects on health and on impaired daytime performance capacity are shown by self-ratings of 160 patients. An elevated metabolic rate caused by micro-arousals and/or insomnia may play an additional role in health impairment.
Materials Screening for the Discovery of New Half-Heuslers: Machine Learning versus ab Initio Methods.

PubMed

Legrain, Fleur; Carrete, Jesús; van Roekeghem, Ambroise; Madsen, Georg K H; Mingo, Natalio

2018-01-18

Machine learning (ML) is increasingly becoming a helpful tool in the search for novel functional compounds. Here we use classification via random forests to predict the stability of half-Heusler (HH) compounds, using only experimentally reported compounds as a training set. Cross-validation yields an excellent agreement between the fraction of compounds classified as stable and the actual fraction of truly stable compounds in the ICSD. The ML model is then employed to screen 71 178 different 1:1:1 compositions, yielding 481 likely stable candidates. The predicted stability of HH compounds from three previous high-throughput ab initio studies is critically analyzed from the perspective of the alternative ML approach. The incomplete consistency among the three separate ab initio studies and between them and the ML predictions suggests that additional factors beyond those considered by ab initio phase stability calculations might be determinant to the stability of the compounds. Such factors can include configurational entropies and quasiharmonic contributions.
An international study on sleep disorders in the general population: methodological aspects of the use of the Sleep-EVAL system.

PubMed

Ohayon, M M; Guilleminault, C; Paiva, T; Priest, R G; Rapoport, D M; Sagales, T; Smirne, S; Zulley, J

1997-12-01

The comparability among epidemiological surveys of sleep disorders has been encumbered because of the array of methodologies used from study to study. The present international initiative addresses this limitation. Many such studies using the exact same methodology are being completed in six European countries (France, the United Kingdom, Germany, Italy, Portugal, and Spain), two Canadian cities (metropolitan areas of Montreal and Toronto), New York State, and the city of San Francisco. These surveys have been undertaken with the aim of documenting the prevalence of sleep disorders in the general population according to criteria of the Diagnostic and Statistical Manual of Mental Disorders, 4th edition (DSM-IV) and the International Classification of Sleep Disorders (ICSD-90). Data are gathered over the telephone by lay interviewers using the Sleep-EVAL expert system. This paper describes the methodology involved in the realization of these studies. Sample design and selection procedures are discussed.
The 2014 Nucleic Acids Research Database Issue and an updated NAR online Molecular Biology Database Collection.

PubMed

Fernández-Suárez, Xosé M; Rigden, Daniel J; Galperin, Michael Y

2014-01-01

The 2014 Nucleic Acids Research Database Issue includes descriptions of 58 new molecular biology databases and recent updates to 123 databases previously featured in NAR or other journals. For convenience, the issue is now divided into eight sections that reflect major subject categories. Among the highlights of this issue are six databases of the transcription factor binding sites in various organisms and updates on such popular databases as CAZy, Database of Genomic Variants (DGV), dbGaP, DrugBank, KEGG, miRBase, Pfam, Reactome, SEED, TCDB and UniProt. There is a strong block of structural databases, which includes, among others, the new RNA Bricks database, updates on PDBe, PDBsum, ArchDB, Gene3D, ModBase, Nucleic Acid Database and the recently revived iPfam database. An update on the NCBI's MMDB describes VAST+, an improved tool for protein structure comparison. Two articles highlight the development of the Structural Classification of Proteins (SCOP) database: one describes SCOPe, which automates assignment of new structures to the existing SCOP hierarchy; the other one describes the first version of SCOP2, with its more flexible approach to classifying protein structures. This issue also includes a collection of articles on bacterial taxonomy and metagenomics, which includes updates on the List of Prokaryotic Names with Standing in Nomenclature (LPSN), Ribosomal Database Project (RDP), the Silva/LTP project and several new metagenomics resources. The NAR online Molecular Biology Database Collection, http://www.oxfordjournals.org/nar/database/c/, has been expanded to 1552 databases. The entire Database Issue is freely available online on the Nucleic Acids Research website (http://nar.oxfordjournals.org/).
MMDB: Entrez’s 3D-structure database

PubMed Central

Wang, Yanli; Anderson, John B.; Chen, Jie; Geer, Lewis Y.; He, Siqian; Hurwitz, David I.; Liebert, Cynthia A.; Madej, Thomas; Marchler, Gabriele H.; Marchler-Bauer, Aron; Panchenko, Anna R.; Shoemaker, Benjamin A.; Song, James S.; Thiessen, Paul A.; Yamashita, Roxanne A.; Bryant, Stephen H.

2002-01-01

Three-dimensional structures are now known within many protein families and it is quite likely, in searching a sequence database, that one will encounter a homolog with known structure. The goal of Entrez’s 3D-structure database is to make this information, and the functional annotation it can provide, easily accessible to molecular biologists. To this end Entrez’s search engine provides three powerful features. (i) Sequence and structure neighbors; one may select all sequences similar to one of interest, for example, and link to any known 3D structures. (ii) Links between databases; one may search by term matching in MEDLINE, for example, and link to 3D structures reported in these articles. (iii) Sequence and structure visualization; identifying a homolog with known structure, one may view molecular-graphic and alignment displays, to infer approximate 3D structure. In this article we focus on two features of Entrez’s Molecular Modeling Database (MMDB) not described previously: links from individual biopolymer chains within 3D structures to a systematic taxonomy of organisms represented in molecular databases, and links from individual chains (and compact 3D domains within them) to structure neighbors, other chains (and 3D domains) with similar 3D structure. MMDB may be accessed at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Structure. PMID:11752307
HIV Structural Database

National Institute of Standards and Technology Data Gateway

SRD 102 HIV Structural Database (Web, free access) The HIV Protease Structural Database is an archive of experimentally determined 3-D structures of Human Immunodeficiency Virus 1 (HIV-1), Human Immunodeficiency Virus 2 (HIV-2) and Simian Immunodeficiency Virus (SIV) Proteases and their complexes with inhibitors or products of substrate cleavage.
An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system.

PubMed

AlQuraishi, Mohammed; Tang, Shengdong; Xia, Xide

2015-11-19

Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. We have developed an integrated affinity-structure database in which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis. 
This database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.
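
The abstract mentions summarizing binding affinities with position weight matrices. As a rough illustration of that general technique, and not of the database's own pipeline, the sketch below builds a log-odds matrix from aligned binding sites. The pseudocount, the uniform background frequency, the function names, and the example sites are all assumptions.

```python
import math
from collections import Counter

BASES = "ACGT"


def position_weight_matrix(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds position weight matrix from equal-length aligned sites.

    Returns one dict per position, mapping each base to
    log2(smoothed frequency / background frequency).
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        pwm.append({
            base: math.log2(((counts[base] + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm


def score(pwm, sequence):
    """Sum of per-position log-odds scores for a sequence of the matrix length."""
    return sum(column[base] for column, base in zip(pwm, sequence))


# Hypothetical aligned binding sites, for illustration only.
sites = ["TTGACA", "TTGACA", "TTTACA", "TTGATA"]
pwm = position_weight_matrix(sites)
consensus_score = score(pwm, "TTGACA")
unrelated_score = score(pwm, "CCCCCC")
```

Sequences close to the consensus of the input sites receive higher scores than unrelated ones. The pseudocount keeps unseen bases from producing a log of zero.
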
Tautomerism in chemical information management systems

NASA Astrophysics Data System (ADS)

Warr, Wendy A.

2010-06-01

Tautomerism has an impact on many of the processes in chemical information management systems including novelty checking during registration into chemical structure databases; storage of structures; exact and substructure searching in chemical structure databases; and depiction of structures retrieved by a search. The approaches taken by 27 different software vendors and database producers are compared. It is hoped that this comparison will act as a discussion document that could ultimately improve databases and software for researchers in the future.
Columba: an integrated database of proteins, structures, and annotations.

PubMed

Trissl, Silke; Rother, Kristian; Müller, Heiko; Steinke, Thomas; Koch, Ina; Preissner, Robert; Frömmel, Cornelius; Leser, Ulf

2005-03-31

Structural and functional research often requires the computation of sets of protein structures based on certain properties of the proteins, such as sequence features, fold classification, or functional annotation. Compiling such sets using current web resources is tedious because the necessary data are spread over many different databases. To facilitate this task, we have created COLUMBA, an integrated database of annotations of protein structures. COLUMBA currently integrates twelve different databases, including PDB, KEGG, Swiss-Prot, CATH, SCOP, the Gene Ontology, and ENZYME. The database can be searched using either keyword search or data source-specific web forms. Users can thus quickly select and download PDB entries that, for instance, participate in a particular pathway, are classified as containing a certain CATH architecture, are annotated as having a certain molecular function in the Gene Ontology, and whose structures have a resolution under a defined threshold. The results of queries are provided in both machine-readable extensible markup language and human-readable format. The structures themselves can be viewed interactively on the web. The COLUMBA database facilitates the creation of protein structure data sets for many structure-based studies. It allows to combine queries on a number of structure-related databases not covered by other projects at present. Thus, information on both many and few protein structures can be used efficiently. The web interface for COLUMBA is available at http://www.columba-db.de.
Teaching Three-Dimensional Structural Chemistry Using Crystal Structure Databases. 2. Teaching Units that Utilize an Interactive Web-Accessible Subset of the Cambridge Structural Database

ERIC Educational Resources Information Center

Battle, Gary M.; Allen, Frank H.; Ferrence, Gregory M.

2010-01-01

A series of online interactive teaching units have been developed that illustrate the use of experimentally measured three-dimensional (3D) structures to teach fundamental chemistry concepts. The units integrate a 500-structure subset of the Cambridge Structural Database specially chosen for their pedagogical value. The units span a number of key…
E-MSD: an integrated data resource for bioinformatics.

PubMed

Velankar, S; McNeil, P; Mittard-Runte, V; Suarez, A; Barrell, D; Apweiler, R; Henrick, K

2005-01-01

The Macromolecular Structure Database (MSD) group (http://www.ebi.ac.uk/msd/) continues to enhance the quality and consistency of macromolecular structure data in the worldwide Protein Data Bank (wwPDB) and to work towards the integration of various bioinformatics data resources. One of the major obstacles to the improved integration of structural databases such as MSD and sequence databases like UniProt is the absence of up to date and well-maintained mapping between corresponding entries. We have worked closely with the UniProt group at the EBI to clean up the taxonomy and sequence cross-reference information in the MSD and UniProt databases. This information is vital for the reliable integration of the sequence family databases such as Pfam and Interpro with the structure-oriented databases of SCOP and CATH. This information has been made available to the eFamily group (http://www.efamily.org.uk/) and now forms the basis of the regular interchange of information between the member databases (MSD, UniProt, Pfam, Interpro, SCOP and CATH). This exchange of annotation information has enriched the structural information in the MSD database with annotation from wider sequence-oriented resources. This work was carried out under the 'Structure Integration with Function, Taxonomy and Sequences (SIFTS)' initiative (http://www.ebi.ac.uk/msd-srv/docs/sifts) in the MSD group.
Using the structure-function linkage database to characterize functional domains in enzymes.

PubMed

Brown, Shoshana; Babbitt, Patricia

2014-12-12

The Structure-Function Linkage Database (SFLD; http://sfld.rbvi.ucsf.edu/) is a Web-accessible database designed to link enzyme sequence, structure, and functional information. This unit describes the protocols by which a user may query the database to predict the function of uncharacterized enzymes and to correct misannotated functional assignments. The information in this unit is especially useful in helping a user discriminate functional capabilities of a sequence that is only distantly related to characterized sequences in publicly available databases. Copyright © 2014 John Wiley & Sons, Inc.

LMSD: LIPID MAPS structure database

PubMed Central

Sud, Manish; Fahy, Eoin; Cotter, Dawn; Brown, Alex; Dennis, Edward A.; Glass, Christopher K.; Merrill, Alfred H.; Murphy, Robert C.; Raetz, Christian R. H.; Russell, David W.; Subramaniam, Shankar

2007-01-01

The LIPID MAPS Structure Database (LMSD) is a relational database encompassing structures and annotations of biologically relevant lipids. Structures of lipids in the database come from four sources: (i) LIPID MAPS Consortium's core laboratories and partners; (ii) lipids identified by LIPID MAPS experiments; (iii) computationally generated structures for appropriate lipid classes; (iv) biologically relevant lipids manually curated from LIPID BANK, LIPIDAT and other public sources. All the lipid structures in LMSD are drawn in a consistent fashion. In addition to a classification-based retrieval of lipids, users can search LMSD using either text-based or structure-based search options. The text-based search implementation supports data retrieval by any combination of these data fields: LIPID MAPS ID, systematic or common name, mass, formula, category, main class, and subclass data fields. The structure-based search, in conjunction with optional data fields, provides the capability to perform a substructure search or exact match for the structure drawn by the user. Search results, in addition to structure and annotations, also include relevant links to external databases. The LMSD is publicly available at PMID:17098933
Ambiguity of non-systematic chemical identifiers within and between small-molecule databases.

PubMed

Akhondi, Saber A; Muresan, Sorel; Williams, Antony J; Kors, Jan A

2015-01-01

A wide range of chemical compound databases are currently available for pharmaceutical research. To retrieve compound information, including structures, researchers can query these chemical databases using non-systematic identifiers. These are source-dependent identifiers (e.g., brand names, generic names), which are usually assigned to the compound at the point of registration. The correctness of non-systematic identifiers (i.e., whether an identifier matches the associated structure) can only be assessed manually, which is cumbersome, but it is possible to automatically check their ambiguity (i.e., whether an identifier matches more than one structure). In this study we have quantified the ambiguity of non-systematic identifiers within and between eight widely used chemical databases. We also studied the effect of chemical structure standardization on reducing the ambiguity of non-systematic identifiers. The ambiguity of non-systematic identifiers within databases varied from 0.1 to 15.2 % (median 2.5 %). Standardization reduced the ambiguity only to a small extent for most databases. A wide range of ambiguity existed for non-systematic identifiers that are shared between databases (17.7-60.2 %, median of 40.3 %). Removing stereochemistry information provided the largest reduction in ambiguity across databases (median reduction 13.7 percentage points). Ambiguity of non-systematic identifiers within chemical databases is generally low, but ambiguity of non-systematic identifiers that are shared between databases, is high. Chemical structure standardization reduces the ambiguity to a limited extent. Our findings can help to improve database integration, curation, and maintenance.
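The ambiguity measure defined in this abstract (the share of identifiers that match more than one structure) can be illustrated with a minimal sketch. The function name, the lowercase normalization of identifiers, and the toy records with placeholder structure keys are assumptions made for illustration; they are not taken from the study.

```python
from collections import defaultdict


def ambiguity_percent(records):
    """Return the percentage of distinct identifiers linked to >1 structure.

    `records` is an iterable of (identifier, structure_key) pairs.
    Identifiers are compared case-insensitively (an assumption of this sketch).
    """
    structures_by_identifier = defaultdict(set)
    for identifier, structure_key in records:
        structures_by_identifier[identifier.strip().lower()].add(structure_key)
    if not structures_by_identifier:
        return 0.0
    ambiguous = sum(
        1 for keys in structures_by_identifier.values() if len(keys) > 1
    )
    return 100.0 * ambiguous / len(structures_by_identifier)


# Hypothetical toy data: the structure keys are placeholders, not real InChIKeys.
toy_records = [
    ("aspirin", "STRUCT-A"),
    ("Aspirin", "STRUCT-A"),
    ("aspirin", "STRUCT-B"),
    ("caffeine", "STRUCT-C"),
]
print(ambiguity_percent(toy_records))  # prints 50.0
```

Here "aspirin" maps to two structure keys and "caffeine" to one, so one of two identifiers is ambiguous. Standardizing the structure keys before the comparison (for example, by removing stereochemistry) would merge some keys and could lower the figure, which is the effect the abstract reports.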
Molecule database framework: a framework for creating database applications with chemical structure search capability

PubMed Central

2013-01-01

Background: Research in organic chemistry generates samples of novel chemicals together with their properties and other related data. The involved scientists must be able to store this data and search it by chemical structure. There are commercial solutions for common needs like chemical registration systems or electronic lab notebooks. However, for specific requirements of in-house databases and processes no such solutions exist. Another issue is that commercial solutions have the risk of vendor lock-in and may require an expensive license of a proprietary relational database management system. To speed up and simplify the development of applications that require chemical structure search capabilities, I have developed Molecule Database Framework. The framework abstracts the storing and searching of chemical structures into method calls. Therefore, software developers do not require extensive knowledge about chemistry and the underlying database cartridge. This decreases application development time. Results: Molecule Database Framework is written in Java and I created it by integrating existing free and open-source tools and frameworks. The core functionality includes support for multi-component compounds (mixtures), import and export of SD-files, and optional security (authorization). For chemical structure searching, Molecule Database Framework leverages the capabilities of the Bingo Cartridge for PostgreSQL and provides type-safe searching, caching, transactions and optional method-level security. Molecule Database Framework supports multi-component chemical compounds (mixtures). Furthermore, the design of entity classes and the reasoning behind it are explained. By means of a simple web application I describe how the framework could be used. I then benchmarked this example application to create some basic performance expectations for chemical structure searches and import and export of SD-files.
Conclusions: By using a simple web application it was shown that Molecule Database Framework successfully abstracts chemical structure searches and SD-file import and export to simple method calls. The framework offers good search performance on a standard laptop without any database tuning. This is also due to the fact that chemical structure searches are paged and cached. Molecule Database Framework is available for download on the project's web page on Bitbucket: https://bitbucket.org/kienerj/moleculedatabaseframework. PMID:24325762
Molecule database framework: a framework for creating database applications with chemical structure search capability.

PubMed

Kiener, Joos

2013-12-11

Research in organic chemistry generates samples of novel chemicals together with their properties and other related data. The involved scientists must be able to store this data and search it by chemical structure. There are commercial solutions for common needs like chemical registration systems or electronic lab notebooks. However, for specific requirements of in-house databases and processes no such solutions exist. Another issue is that commercial solutions have the risk of vendor lock-in and may require an expensive license of a proprietary relational database management system. To speed up and simplify the development of applications that require chemical structure search capabilities, I have developed Molecule Database Framework. The framework abstracts the storing and searching of chemical structures into method calls. Therefore, software developers do not require extensive knowledge about chemistry and the underlying database cartridge. This decreases application development time. Molecule Database Framework is written in Java and I created it by integrating existing free and open-source tools and frameworks. The core functionality includes support for multi-component compounds (mixtures), import and export of SD-files, and optional security (authorization). For chemical structure searching, Molecule Database Framework leverages the capabilities of the Bingo Cartridge for PostgreSQL and provides type-safe searching, caching, transactions and optional method-level security. Molecule Database Framework supports multi-component chemical compounds (mixtures). Furthermore, the design of entity classes and the reasoning behind it are explained. By means of a simple web application I describe how the framework could be used. I then benchmarked this example application to create some basic performance expectations for chemical structure searches and import and export of SD-files.
By using a simple web application it was shown that Molecule Database Framework successfully abstracts chemical structure searches and SD-file import and export to simple method calls. The framework offers good search performance on a standard laptop without any database tuning. This is also due to the fact that chemical structure searches are paged and cached. Molecule Database Framework is available for download on the project's web page on Bitbucket: https://bitbucket.org/kienerj/moleculedatabaseframework.
An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system

DOE Office of Scientific and Technical Information (OSTI.GOV)

AlQuraishi, Mohammed; Tang, Shengdong; Xia, Xide

Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. We have developed an integrated affinity-structure database in which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis. Lastly, this database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.
An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system

DOE PAGES

AlQuraishi, Mohammed; Tang, Shengdong; Xia, Xide

2015-11-19

Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. We have developed an integrated affinity-structure database in which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis. Lastly, this database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.
A dynamic clinical dental relational database.

PubMed

Taylor, D; Naguib, R N G; Boulton, S

2004-09-01

The traditional approach to relational database design is based on the logical organization of data into a number of related normalized tables. One assumption is that the nature and structure of the data is known at the design stage. In the case of designing a relational database to store historical dental epidemiological data from individual clinical surveys, the structure of the data is not known until the data is presented for inclusion into the database. This paper addresses the issues concerned with the theoretical design of a clinical dynamic database capable of adapting the internal table structure to accommodate clinical survey data, and presents a prototype database application capable of processing, displaying, and querying the dental data.
TRANSFORMATION OF DEVELOPMENTAL NEUROTOXICITY DATA INTO STRUCTURE-SEARCHABLE TOXML DATABASE IN SUPPORT OF STRUCTURE-ACTIVITY RELATIONSHIP (SAR) WORKFLOW.

EPA Science Inventory

Early hazard identification of new chemicals is often difficult due to lack of data on the novel material for toxicity endpoints, including neurotoxicity. At present, there are no structure searchable neurotoxicity databases. A working group was formed to construct a database to...
Databases and Associated Tools for Glycomics and Glycoproteomics.

PubMed

Lisacek, Frederique; Mariethoz, Julien; Alocci, Davide; Rudd, Pauline M; Abrahams, Jodie L; Campbell, Matthew P; Packer, Nicolle H; Ståhle, Jonas; Widmalm, Göran; Mullen, Elaine; Adamczyk, Barbara; Rojas-Macias, Miguel A; Jin, Chunsheng; Karlsson, Niclas G

2017-01-01

The access to biodatabases for glycomics and glycoproteomics has proven to be essential for current glycobiological research. This chapter presents available databases that are devoted to different aspects of glycobioinformatics. This includes oligosaccharide sequence databases, experimental databases, 3D structure databases (of both glycans and glycorelated proteins) and association of glycans with tissue, disease, and proteins. Specific search protocols are also provided using tools associated with experimental databases for converting primary glycoanalytical data to glycan structural information. In particular, researchers using glycoanalysis methods by U/HPLC (GlycoBase), MS (GlycoWorkbench, UniCarb-DB, GlycoDigest), and NMR (CASPER) will benefit from this chapter. In addition we also include information on how to utilize glycan structural information to query databases that associate glycans with proteins (UniCarbKB) and with interactions with pathogens (SugarBind).
CREDO: a structural interactomics database for drug discovery

PubMed Central

Schreyer, Adrian M.; Blundell, Tom L.

2013-01-01

CREDO is a unique relational database storing all pairwise atomic interactions of inter- as well as intra-molecular contacts between small molecules and macromolecules found in experimentally determined structures from the Protein Data Bank. These interactions are integrated with further chemical and biological data. The database implements useful data structures and algorithms such as cheminformatics routines to create a comprehensive analysis platform for drug discovery. The database can be accessed through a web-based interface, downloads of data sets and web services at http://www-cryst.bioc.cam.ac.uk/credo. Database URL: http://www-cryst.bioc.cam.ac.uk/credo PMID:23868908
E-MSD: an integrated data resource for bioinformatics

PubMed Central

Velankar, S.; McNeil, P.; Mittard-Runte, V.; Suarez, A.; Barrell, D.; Apweiler, R.; Henrick, K.

2005-01-01

The Macromolecular Structure Database (MSD) group (http://www.ebi.ac.uk/msd/) continues to enhance the quality and consistency of macromolecular structure data in the worldwide Protein Data Bank (wwPDB) and to work towards the integration of various bioinformatics data resources. One of the major obstacles to the improved integration of structural databases such as MSD and sequence databases like UniProt is the absence of up to date and well-maintained mapping between corresponding entries. We have worked closely with the UniProt group at the EBI to clean up the taxonomy and sequence cross-reference information in the MSD and UniProt databases. This information is vital for the reliable integration of the sequence family databases such as Pfam and Interpro with the structure-oriented databases of SCOP and CATH. This information has been made available to the eFamily group (http://www.efamily.org.uk/) and now forms the basis of the regular interchange of information between the member databases (MSD, UniProt, Pfam, Interpro, SCOP and CATH). This exchange of annotation information has enriched the structural information in the MSD database with annotation from wider sequence-oriented resources. This work was carried out under the ‘Structure Integration with Function, Taxonomy and Sequences (SIFTS)’ initiative (http://www.ebi.ac.uk/msd-srv/docs/sifts) in the MSD group. PMID:15608192
SM-TF: A structural database of small molecule-transcription factor complexes.

PubMed

Xu, Xianjin; Ma, Zhiwei; Sun, Hongmin; Zou, Xiaoqin

2016-06-30

Transcription factors (TFs) are the proteins involved in the transcription process, ensuring the correct expression of specific genes. Numerous diseases arise from the dysfunction of specific TFs. In fact, over 30 TFs have been identified as therapeutic targets of about 9% of the approved drugs. In this study, we created a structural database of small molecule-transcription factor (SM-TF) complexes, available online at http://zoulab.dalton.missouri.edu/SM-TF. The 3D structures of the co-bound small molecule and the corresponding binding sites on TFs are provided in the database, serving as a valuable resource to assist structure-based drug design related to TFs. Currently, the SM-TF database contains 934 entries covering 176 TFs from a variety of species. The database is further classified into several subsets by species and organisms. The entries in the SM-TF database are linked to the UniProt database and other sequence-based TF databases. Furthermore, the druggable TFs from human and the corresponding approved drugs are linked to the DrugBank. © 2016 Wiley Periodicals, Inc.
StraPep: a structure database of bioactive peptides

PubMed Central

Wang, Jian; Yin, Tailang; Xiao, Xuwen; He, Dan; Xue, Zhidong; Jiang, Xinnong; Wang, Yan

2018-01-01

Bioactive peptides, with a variety of biological activities and wide distribution in nature, have attracted great research interest in biological and medical fields, especially in the pharmaceutical industry. Structural information on bioactive peptides is important for the development of peptide-based drugs. Many databases have been developed cataloguing bioactive peptides. However, to our knowledge, no database dedicated to collecting all the bioactive peptides with known structures is available yet. Thus, we developed StraPep, a structure database of bioactive peptides. StraPep holds 3791 bioactive peptide structures, which belong to 1312 unique bioactive peptide sequences. About 905 out of 1312 (68%) bioactive peptides in StraPep contain disulfide bonds, which is significantly higher than the proportion (21%) in the PDB. Interestingly, 150 out of 616 (24%) bioactive peptides with three or more disulfide bonds form a structural motif known as a cystine knot, which confers considerable structural stability on proteins and is an attractive scaffold for drug design. Detailed information on each peptide, including the experimental structure, the location of disulfide bonds, secondary structure, classification, post-translational modification and so on, has been provided. A wide range of user-friendly tools, such as browsing, sequence and structure-based searching and so on, has been incorporated into StraPep. We hope that this database will be helpful for the research community. Database URL: http://isyslab.info/StraPep PMID:29688386
PROFESS: a PROtein Function, Evolution, Structure and Sequence database

PubMed Central

Triplet, Thomas; Shortridge, Matthew D.; Griep, Mark A.; Stark, Jaime L.; Powers, Robert; Revesz, Peter

2010-01-01

The proliferation of biological databases and the easy access enabled by the Internet is having a beneficial impact on biological sciences and transforming the way research is conducted. There are ∼1100 molecular biology databases dispersed throughout the Internet. To assist in the functional, structural and evolutionary analysis of the abundant number of novel proteins continually identified from whole-genome sequencing, we introduce the PROFESS (PROtein Function, Evolution, Structure and Sequence) database. Our database is designed to be versatile and expandable and will not confine analysis to a pre-existing set of data relationships. A fundamental component of this approach is the development of an intuitive query system that incorporates a variety of similarity functions capable of generating data relationships not conceived during the creation of the database. The utility of PROFESS is demonstrated by the analysis of the structural drift of homologous proteins and the identification of potential pancreatic cancer therapeutic targets based on the observation of protein–protein interaction networks. Database URL: http://cse.unl.edu/∼profess/ PMID:20624718
SNAPPI-DB: a database and API of Structures, iNterfaces and Alignments for Protein–Protein Interactions

PubMed Central

Jefferson, Emily R.; Walsh, Thomas P.; Roberts, Timothy J.; Barton, Geoffrey J.

2007-01-01

SNAPPI-DB, a high performance database of Structures, iNterfaces and Alignments of Protein–Protein Interactions, and its associated Java Application Programming Interface (API) is described. SNAPPI-DB contains structural data, down to the level of atom co-ordinates, for each structure in the Protein Data Bank (PDB) together with associated data including SCOP, CATH, Pfam, SWISSPROT, InterPro, GO terms, Protein Quaternary Structures (PQS) and secondary structure information. Domain–domain interactions are stored for multiple domain definitions and are classified by their Superfamily/Family pair and interaction interface. Each set of classified domain–domain interactions has an associated multiple structure alignment for each partner. The API facilitates data access via PDB entries, domains and domain–domain interactions. Rapid development, fast database access and the ability to perform advanced queries without the requirement for complex SQL statements are provided via an object oriented database and the Java Data Objects (JDO) API. SNAPPI-DB contains many features which are not available in other databases of structural protein–protein interactions. It has been applied in three studies on the properties of protein–protein interactions and is currently being employed to train a protein–protein interaction predictor and a functional residue predictor. The database, API and manual are available for download at: . PMID:17202171
Migration from relational to NoSQL database

NASA Astrophysics Data System (ADS)

Ghotiya, Sunita; Mandal, Juhi; Kandasamy, Saravanakumar

2017-11-01

Data generated by real-time applications, social networking sites and sensor devices is huge in volume and unstructured, which makes it difficult for relational database management systems to handle. Data is a precious component of any application and needs to be analysed after being arranged in some structure. Relational databases can only deal with structured data, so there is a need for NoSQL database management systems, which can also deal with semi-structured data. Relational databases provide the easiest way to manage data, but as the use of NoSQL increases it is becoming necessary to migrate data from relational to NoSQL databases. Various frameworks have been proposed previously that provide mechanisms for migrating data stored in SQL warehouses, as well as middle-layer solutions that allow data to be stored in NoSQL databases in order to handle data that is not structured. This paper provides a literature review of some of the recent approaches proposed by various researchers to migrate data from relational to NoSQL databases. Some researchers proposed mechanisms for the co-existence of NoSQL and relational databases. This paper provides a summary of mechanisms which can be used for mapping data stored in relational databases to NoSQL databases. Various techniques for data transformation and middle-layer solutions are summarised in the paper.
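One common way to map relational data onto a document model is to embed the rows of a child table inside the document for their parent row. The sketch below illustrates that general idea only; it is not one of the approaches surveyed in the paper. The `customer` and `purchase` tables, their columns, and the function name are hypothetical, and an in-memory SQLite database stands in for the relational source.

```python
import json
import sqlite3


def rows_to_documents(conn):
    """Embed each customer's purchases in a single JSON-like document."""
    conn.row_factory = sqlite3.Row  # allows access to columns by name
    documents = {}
    for row in conn.execute("SELECT id, name FROM customer ORDER BY id"):
        documents[row["id"]] = {"_id": row["id"], "name": row["name"], "orders": []}
    for row in conn.execute(
        "SELECT customer_id, item, quantity FROM purchase ORDER BY id"
    ):
        # The foreign key decides which parent document receives the child row.
        documents[row["customer_id"]]["orders"].append(
            {"item": row["item"], "quantity": row["quantity"]}
        )
    return list(documents.values())


# Hypothetical schema and data, made up for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchase (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id),
        item TEXT,
        quantity INTEGER
    );
    INSERT INTO customer VALUES (1, 'Ada'), (2, 'Bo');
    INSERT INTO purchase VALUES (1, 1, 'beaker', 2), (2, 1, 'flask', 1);
    """
)
docs = rows_to_documents(conn)
print(json.dumps(docs, indent=2))
```

Embedding suits one-to-many relations that are usually read together; relations that are shared or frequently updated are often kept as references between documents instead.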
NALDB: nucleic acid ligand database for small molecules targeting nucleic acid

PubMed Central

Kumar Mishra, Subodh; Kumar, Amit

2016-01-01

Nucleic acid ligand database (NALDB) is a unique database that provides detailed information about the experimental data of small molecules that were reported to target several types of nucleic acid structures. NALDB is the first ligand database that contains ligand information for all types of nucleic acid. NALDB contains more than 3500 ligand entries with detailed pharmacokinetic and pharmacodynamic information such as target name, target sequence, ligand 2D/3D structure, SMILES, molecular formula, molecular weight, net formal charge, AlogP, number of rings, number of hydrogen bond donors and acceptors, and potential energy, along with their Ki, Kd and IC50 values. Having all these details on a single platform would be helpful for the development and improvement of novel ligands targeting nucleic acids that could serve as potential targets in different diseases including cancers and neurological disorders. With a maximum of 255 conformers for each ligand entry, our database is a multi-conformer database and can facilitate the virtual screening process. NALDB provides powerful web-based search tools that make database searching efficient and simple, with options for text as well as structure queries. NALDB also provides a multi-dimensional advanced search tool which can screen the database molecules on the basis of molecular properties of ligands provided by database users. A 3D structure visualization tool has also been included for 3D structure representation of ligands. NALDB offers inclusive pharmacological information and a structurally flexible set of small molecules with their three-dimensional conformers that can accelerate virtual screening and other modeling processes and eventually complement nucleic acid-based drug discovery research. NALDB can be routinely updated and is freely available at bsbe.iiti.ac.in/bsbe/naldb/HOME.php. Database URL: http://bsbe.iiti.ac.in/bsbe/naldb/HOME.php PMID:26896846
Structured Forms Reference Set of Binary Images (SFRS)

National Institute of Standards and Technology Data Gateway

NIST Structured Forms Reference Set of Binary Images (SFRS) (Web, free access) The NIST Structured Forms Database (Special Database 2) consists of 5,590 pages of binary, black-and-white images of synthesized documents. The documents in this database are 12 different tax forms from the IRS 1040 Package X for the year 1988.
SAbDab: the structural antibody database

PubMed Central

Dunbar, James; Krawczyk, Konrad; Leem, Jinwoo; Baker, Terry; Fuchs, Angelika; Georges, Guy; Shi, Jiye; Deane, Charlotte M.

2014-01-01

Structural antibody database (SAbDab; http://opig.stats.ox.ac.uk/webapps/sabdab) is an online resource containing all the publicly available antibody structures annotated and presented in a consistent fashion. The data are annotated with several properties including experimental information, gene details, correct heavy and light chain pairings, antigen details and, where available, antibody–antigen binding affinity. The user can select structures, according to these attributes as well as structural properties such as complementarity determining region loop conformation and variable domain orientation. Individual structures, datasets and the complete database can be downloaded. PMID:24214988
Searching the Cambridge Structural Database for polymorphs.

PubMed

van de Streek, Jacco; Motherwell, Sam

2005-10-01

In order to identify all pairs of polymorphs in the Cambridge Structural Database (CSD), a method was devised to automatically compare two crystal structures. The comparison is based on simulated powder diffraction patterns, but with special provisions to deal with differences in unit-cell volumes caused by temperature or pressure. Among the 325,000 crystal structures in the Cambridge Structural Database, 35,000 pairs of crystal structures of the same chemical compound were identified and compared. A total of 7300 pairs of polymorphs were identified, of which 154 previously were unknown.
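The idea of comparing two crystal structures through their simulated powder patterns can be sketched with a plain cosine similarity between two intensity profiles sampled on the same 2-theta grid. This is a deliberately simplified stand-in, not the method of the paper: it omits the special provisions for unit-cell volume differences mentioned in the abstract, and the function name and example intensities are hypothetical.

```python
import math


def pattern_similarity(intensities_a, intensities_b):
    """Cosine similarity of two powder patterns on an identical 2-theta grid.

    Returns 1.0 for patterns that differ only by an overall scale factor
    and 0.0 for patterns with no overlapping intensity.
    """
    if len(intensities_a) != len(intensities_b):
        raise ValueError("patterns must be sampled on the same 2-theta grid")
    dot = sum(a * b for a, b in zip(intensities_a, intensities_b))
    norm_a = math.sqrt(sum(a * a for a in intensities_a))
    norm_b = math.sqrt(sum(b * b for b in intensities_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up intensity profiles: the second is the first scaled by two,
# the third has its single peak at a different position.
print(pattern_similarity([0.0, 1.0, 0.0], [0.0, 2.0, 0.0]))
print(pattern_similarity([0.0, 1.0, 0.0], [1.0, 0.0, 0.0]))
```

A point-by-point measure like this is sensitive to small peak shifts, which is one reason a practical comparison needs some tolerance for unit-cell changes caused by temperature or pressure.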

PACSY, a relational database management system for protein structure and chemical shift analysis.

PubMed

Lee, Woonghee; Yu, Wookyung; Kim, Suhkmann; Chang, Iksoo; Lee, Weontae; Markley, John L

2012-10-01

PACSY (Protein structure And Chemical Shift NMR spectroscopY) is a relational database management system that integrates information from the Protein Data Bank, the Biological Magnetic Resonance Data Bank, and the Structural Classification of Proteins database. PACSY provides three-dimensional coordinates and chemical shifts of atoms along with derived information such as torsion angles, solvent accessible surface areas, and hydrophobicity scales. PACSY consists of six relational table types linked to one another for coherence by key identification numbers. Database queries are enabled by advanced search functions supported by an RDBMS server such as MySQL or PostgreSQL. PACSY enables users to search for combinations of information from different database sources in support of their research. Two software packages, PACSY Maker for database creation and PACSY Analyzer for database analysis, are available from http://pacsy.nmrfam.wisc.edu.
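Linking tables by key identification numbers means that information from different sources can be combined with a join on the shared key. The sketch below shows this pattern with an in-memory SQLite database as a stand-in for the MySQL or PostgreSQL server named in the abstract. The table names, column names, and values are hypothetical and do not reproduce the actual PACSY schema.

```python
import sqlite3

# Hypothetical two-table layout sharing a key identification number.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE coordinate (
        key_id INTEGER PRIMARY KEY,
        atom_name TEXT,
        x REAL, y REAL, z REAL
    );
    CREATE TABLE chemical_shift (
        key_id INTEGER REFERENCES coordinate(key_id),
        shift_ppm REAL
    );
    INSERT INTO coordinate VALUES (1, 'CA', 1.0, 2.0, 3.0), (2, 'CB', 1.5, 2.5, 3.5);
    INSERT INTO chemical_shift VALUES (1, 56.2);
    """
)


def atoms_with_shifts(connection):
    """Return coordinates joined to chemical shifts through the shared key."""
    query = """
        SELECT c.atom_name, c.x, c.y, c.z, s.shift_ppm
        FROM coordinate AS c
        JOIN chemical_shift AS s ON s.key_id = c.key_id
        ORDER BY c.key_id
    """
    return connection.execute(query).fetchall()


rows = atoms_with_shifts(conn)
print(rows)
```

Only atoms present in both tables are returned by the inner join; a `LEFT JOIN` would also keep atoms that have coordinates but no assigned shift.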
Crystallography Open Database (COD): an open-access collection of crystal structures and platform for world-wide collaboration

PubMed Central

Gražulis, Saulius; Daškevič, Adriana; Merkys, Andrius; Chateigner, Daniel; Lutterotti, Luca; Quirós, Miguel; Serebryanaya, Nadezhda R.; Moeck, Peter; Downs, Robert T.; Le Bail, Armel

2012-01-01

Using an open-access distribution model, the Crystallography Open Database (COD, http://www.crystallography.net) collects all known ‘small molecule / small to medium sized unit cell’ crystal structures and makes them available freely on the Internet. As of today, the COD has aggregated ∼150 000 structures, offering basic search capabilities and the possibility to download the whole database, or parts thereof using a variety of standard open communication protocols. A newly developed website provides capabilities for all registered users to deposit published and so far unpublished structures as personal communications or pre-publication depositions. Such a setup enables extension of the COD database by many users simultaneously. This increases the possibilities for growth of the COD database, and is the first step towards establishing a world wide Internet-based collaborative platform dedicated to the collection and curation of structural knowledge. PMID:22070882
A comparison of complex sleep behaviors with two short-acting Z-hypnosedative drugs in nonpsychotic patients

PubMed Central

Chen, Li-Fen; Lin, Ching-En; Chou, Yu-Ching; Mao, Wei-Chung; Chen, Yi-Chyan; Tzeng, Nian-Sheng

2013-01-01

Objective: Complex sleep behaviors (CSBs) are classified as “parasomnias” in the International Classification of Sleep Disorders, Second Edition (ICSD-2). To realize the potential danger after taking two short-acting Z-hypnosedative drugs, we estimated the incidence of CSBs in nonpsychotic patients in Taiwan. Methods: Subjects (N = 1,220) using zolpidem or zopiclone were enrolled from the psychiatric outpatient clinics of a medical center in Taiwan over a 16-month period in 2006–2007. Subjects with zolpidem (N = 1,132) and subjects with zopiclone (N = 88) were analyzed. All subjects completed a questionnaire that included demographic data and complex sleep behaviors after taking hypnotics. Results: Among zolpidem and zopiclone users, 3.28% of patients reported incidents of somnambulism or amnesic sleep-related behavior problems. The incidences of CSBs with zolpidem and zopiclone were 3.27% and 3.41%, respectively, which were significantly lower than those reported in other studies in Taiwan. Conclusion: These results serve as a reminder for clinicians to make inquiries regarding any unusual performance of parasomnic activities when prescribing zolpidem or zopiclone. PMID:23976857
PubMed

Monaca, C; Franco, P; Philip, P; Dauvilliers, Y

In the new international classification of sleep disorders (ICSD-3), narcolepsy is differentiated into two distinct pathologies: type 1 narcolepsy (NT1) and type 2 narcolepsy (NT2). NT1 is characterised by periods of an irrepressible need to sleep, cataplexy (a sudden loss of muscle tone triggered by emotion) and in some cases the presence of symptoms such as hypnagogic hallucinations, sleep paralysis and disturbed night-time sleep. Its physiopathology is based on the loss of hypocretin neurons in the hypothalamus, seemingly connected to an auto-immune process. By definition, cataplexy is absent and the hypocretin levels in the CSF are normal in NT2. Confirming the diagnosis requires polysomnography and multiple sleep latency tests. The choice of further investigations is based on the presence or absence of typical cataplexy. Further investigations include HLA typing, lumbar puncture to measure the hypocretin level in the CSF, or even brain imagery in the case of narcolepsy suspected to be secondary to an underlying pathology. In this consensus we propose recommendations for the work-up to be carried out during diagnosis and follow-up for patients suffering from narcolepsy. Copyright © 2016 Elsevier Masson SAS. All rights reserved.
Applications of the Cambridge Structural Database in organic chemistry and crystal chemistry.

PubMed

Allen, Frank H; Motherwell, W D Samuel

2002-06-01

The Cambridge Structural Database (CSD) and its associated software systems have formed the basis for more than 800 research applications in structural chemistry, crystallography and the life sciences. Relevant references, dating from the mid-1970s, and brief synopses of these papers are collected in a database, DBUse, which is freely available via the CCDC website. This database has been used to review research applications of the CSD in organic chemistry, including supramolecular applications, and in organic crystal chemistry. The review concentrates on applications that have been published since 1990 and covers a wide range of topics, including structure correlation, conformational analysis, hydrogen bonding and other intermolecular interactions, studies of crystal packing, extended structural motifs, crystal engineering and polymorphism, and crystal structure prediction. Applications of CSD information in studies of crystal structure precision, the determination of crystal structures from powder diffraction data, together with applications in chemical informatics, are also discussed.
HIV Structural Database using Chem BLAST for all classes of AIDS inhibitors

National Institute of Standards and Technology Data Gateway

SRD 155 HIV Structural Database using Chem BLAST for all classes of AIDS inhibitors (Web, free access) The HIV structural database (HIVSDB) is a comprehensive collection of the structures of HIV protease, both of unliganded enzyme and of its inhibitor complexes. It contains abstracts and crystallographic data such as inhibitor and protein coordinates for 248 data sets, of which only 141 are from the Protein Data Bank (PDB).
DB Dehydrogenase: an online integrated structural database on enzyme dehydrogenase.

PubMed

Nandy, Suman Kumar; Bhuyan, Rajabrata; Seal, Alpana

2012-01-01

Dehydrogenase enzymes are almost indispensable for metabolic processes. Shortage or malfunctioning of dehydrogenases often leads to several acute diseases such as cancers, retinal diseases, diabetes mellitus, Alzheimer's disease, and hepatitis B and C. With the advancement of modern-day research, huge amounts of sequence, structural and functional data are generated every day, widening the gap between structural attributes and their functional understanding. DB Dehydrogenase is an effort to relate the functionalities of dehydrogenases to their structures. It is a completely web-based structural database, covering almost all dehydrogenases [~150 enzyme classes, ~1200 entries from ~160 organisms] whose structures are known. It was created by extracting and integrating data from various online resources to provide reliable data, and is implemented as a MySQL relational database with user-friendly web interfaces written in CGI Perl. Flexible search options are provided for data extraction and exploration. In summary, by bringing the sequence, structure and function of all dehydrogenases together in one place, along with cross-referencing options, this database will be useful to researchers carrying out further work in this field. The database is available for free at http://www.bifku.in/DBD/
SATPdb: a database of structurally annotated therapeutic peptides

PubMed Central

Singh, Sandeep; Chaudhary, Kumardeep; Dhanda, Sandeep Kumar; Bhalla, Sherry; Usmani, Salman Sadullah; Gautam, Ankur; Tuknait, Abhishek; Agrawal, Piyush; Mathur, Deepika; Raghava, Gajendra P.S.

2016-01-01

SATPdb (http://crdd.osdd.net/raghava/satpdb/) is a database of structurally annotated therapeutic peptides, curated from 22 public domain peptide databases/datasets including 9 of our own. The current version holds 19192 unique experimentally validated therapeutic peptide sequences having length between 2 and 50 amino acids. It covers peptides having natural, non-natural and modified residues. These peptides were systematically grouped into 10 categories based on their major function or therapeutic property like 1099 anticancer, 10585 antimicrobial, 1642 drug delivery and 1698 antihypertensive peptides. We assigned or annotated structure of these therapeutic peptides using structural databases (Protein Data Bank) and state-of-the-art structure prediction methods like I-TASSER, HHsearch and PEPstrMOD. In addition, SATPdb facilitates users in performing various tasks that include: (i) structure and sequence similarity search, (ii) peptide browsing based on their function and properties, (iii) identification of moonlighting peptides and (iv) searching of peptides having desired structure and therapeutic activities. We hope this database will be useful for researchers working in the field of peptide-based therapeutics. PMID:26527728
Alternatives to relational database: comparison of NoSQL and XML approaches for clinical data storage.

PubMed

Lee, Ken Ka-Yin; Tang, Wai-Choi; Choi, Kup-Sze

2013-04-01

Clinical data are dynamic in nature, often arranged hierarchically and stored as free text and numbers. Effective management of clinical data and the transformation of the data into structured format for data analysis are therefore challenging issues in electronic health records development. Despite the popularity of relational databases, the scalability of the NoSQL database model and the document-centric data structure of XML databases appear to be promising features for effective clinical data management. In this paper, three database approaches--NoSQL, XML-enabled and native XML--are investigated to evaluate their suitability for structured clinical data. The database query performance is reported, together with our experience in the databases development. The results show that NoSQL database is the best choice for query speed, whereas XML databases are advantageous in terms of scalability, flexibility and extensibility, which are essential to cope with the characteristics of clinical data. While NoSQL and XML technologies are relatively new compared to the conventional relational database, both of them demonstrate potential to become a key database technology for clinical data management as the technology further advances. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
TIPdb-3D: the three-dimensional structure database of phytochemicals from Taiwan indigenous plants.

PubMed

Tung, Chun-Wei; Lin, Ying-Chi; Chang, Hsun-Shuo; Wang, Chia-Chi; Chen, Ih-Sheng; Jheng, Jhao-Liang; Li, Jih-Heng

2014-01-01

The rich indigenous and endemic plants in Taiwan serve as a resourceful bank for biologically active phytochemicals. Based on our TIPdb database curating bioactive phytochemicals from Taiwan indigenous plants, this study presents a three-dimensional (3D) chemical structure database named TIPdb-3D to support the discovery of novel pharmacologically active compounds. The Merck Molecular Force Field (MMFF94) was used to generate 3D structures of phytochemicals in TIPdb. The 3D structures could facilitate the analysis of 3D quantitative structure-activity relationship, the exploration of chemical space and the identification of potential pharmacologically active compounds using protein-ligand docking. Database URL: http://cwtung.kmu.edu.tw/tipdb. © The Author(s) 2014. Published by Oxford University Press.
PACSY, a relational database management system for protein structure and chemical shift analysis

PubMed Central

Lee, Woonghee; Yu, Wookyung; Kim, Suhkmann; Chang, Iksoo

2012-01-01

PACSY (Protein structure And Chemical Shift NMR spectroscopY) is a relational database management system that integrates information from the Protein Data Bank, the Biological Magnetic Resonance Data Bank, and the Structural Classification of Proteins database. PACSY provides three-dimensional coordinates and chemical shifts of atoms along with derived information such as torsion angles, solvent accessible surface areas, and hydrophobicity scales. PACSY consists of six relational table types linked to one another for coherence by key identification numbers. Database queries are enabled by advanced search functions supported by an RDBMS server such as MySQL or PostgreSQL. PACSY enables users to search for combinations of information from different database sources in support of their research. Two software packages, PACSY Maker for database creation and PACSY Analyzer for database analysis, are available from http://pacsy.nmrfam.wisc.edu. PMID:22903636
LigandBox: A database for 3D structures of chemical compounds

PubMed Central

Kawabata, Takeshi; Sugihara, Yusuke; Fukunishi, Yoshifumi; Nakamura, Haruki

2013-01-01

A database for the 3D structures of available compounds is essential for the virtual screening by molecular docking. We have developed the LigandBox database (http://ligandbox.protein.osaka-u.ac.jp/ligandbox/) containing four million available compounds, collected from the catalogues of 37 commercial suppliers, and approved drugs and biochemical compounds taken from KEGG_DRUG, KEGG_COMPOUND and PDB databases. Each chemical compound in the database has several 3D conformers with hydrogen atoms and atomic charges, which are ready to be docked into receptors using docking programs. The 3D conformations were generated using our molecular simulation program package, myPresto. Various physical properties, such as aqueous solubility (LogS) and carcinogenicity have also been calculated to characterize the ADME-Tox properties of the compounds. The Web database provides two services for compound searches: a property/chemical ID search and a chemical structure search. The chemical structure search is performed by a descriptor search and a maximum common substructure (MCS) search combination, using our program kcombu. By specifying a query chemical structure, users can find similar compounds among the millions of compounds in the database within a few minutes. Our database is expected to assist a wide range of researchers, in the fields of medical science, chemical biology, and biochemistry, who are seeking to discover active chemical compounds by the virtual screening. PMID:27493549
LigandBox: A database for 3D structures of chemical compounds.

PubMed

Kawabata, Takeshi; Sugihara, Yusuke; Fukunishi, Yoshifumi; Nakamura, Haruki

2013-01-01

A database for the 3D structures of available compounds is essential for the virtual screening by molecular docking. We have developed the LigandBox database (http://ligandbox.protein.osaka-u.ac.jp/ligandbox/) containing four million available compounds, collected from the catalogues of 37 commercial suppliers, and approved drugs and biochemical compounds taken from KEGG_DRUG, KEGG_COMPOUND and PDB databases. Each chemical compound in the database has several 3D conformers with hydrogen atoms and atomic charges, which are ready to be docked into receptors using docking programs. The 3D conformations were generated using our molecular simulation program package, myPresto. Various physical properties, such as aqueous solubility (LogS) and carcinogenicity have also been calculated to characterize the ADME-Tox properties of the compounds. The Web database provides two services for compound searches: a property/chemical ID search and a chemical structure search. The chemical structure search is performed by a descriptor search and a maximum common substructure (MCS) search combination, using our program kcombu. By specifying a query chemical structure, users can find similar compounds among the millions of compounds in the database within a few minutes. Our database is expected to assist a wide range of researchers, in the fields of medical science, chemical biology, and biochemistry, who are seeking to discover active chemical compounds by the virtual screening.
THGS: a web-based database of Transmembrane Helices in Genome Sequences

PubMed Central

Fernando, S. A.; Selvarani, P.; Das, Soma; Kumar, Ch. Kiran; Mondal, Sukanta; Ramakumar, S.; Sekar, K.

2004-01-01

Transmembrane Helices in Genome Sequences (THGS) is an interactive web-based database, developed to search the transmembrane helices in the user-interested gene sequences available in the Genome Database (GDB). The proposed database has provision to search sequence motifs in transmembrane and globular proteins. In addition, the motif can be searched in the other sequence databases (Swiss-Prot and PIR) or in the macromolecular structure database, Protein Data Bank (PDB). Further, the 3D structure of the corresponding queried motif, if it is available in the solved protein structures deposited in the Protein Data Bank, can also be visualized using the widely used graphics package RASMOL. All the sequence databases used in the present work are updated frequently and hence the results produced are up to date. The database THGS is freely available via the world wide web and can be accessed at http://pranag.physics.iisc.ernet.in/thgs/ or http://144.16.71.10/thgs/. PMID:14681375
Comparison of the NCI open database with seven large chemical structural databases.

PubMed

Voigt, J H; Bienfait, B; Wang, S; Nicklaus, M C

2001-01-01

Eight large chemical databases have been analyzed and compared to each other. Central to this comparison is the open National Cancer Institute (NCI) database, consisting of approximately 250 000 structures. The other databases analyzed are the Available Chemicals Directory ("ACD," from MDL, release 1.99, 3D-version); the ChemACX ("ACX," from CamSoft, Version 4.5); the Maybridge Catalog and the Asinex database (both as distributed by CamSoft as part of ChemInfo 4.5); the Sigma-Aldrich Catalog (CD-ROM, 1999 Version); the World Drug Index ("WDI," Derwent, version 1999.03); and the organic part of the Cambridge Crystallographic Database ("CSD," from Cambridge Crystallographic Data Center, 1999 Version 5.18). The database properties analyzed are internal duplication rates; compounds unique to each database; cumulative occurrence of compounds in an increasing number of databases; overlap of identical compounds between two databases; similarity overlap; diversity; and others. The crystallographic database CSD and the WDI show somewhat less overlap with the other databases than those with each other. In particular the collections of commercial compounds and compilations of vendor catalogs have a substantial degree of overlap among each other. Still, no database is completely a subset of any other, and each appears to have its own niche and thus "raison d'être". The NCI database has by far the highest number of compounds that are unique to it. Approximately 200 000 of the NCI structures were not found in any of the other analyzed databases.
LIPS database with LIPService: a microscopic image database of intracellular structures in Arabidopsis guard cells.

PubMed

Higaki, Takumi; Kutsuna, Natsumaro; Hasezawa, Seiichiro

2013-05-16

Intracellular configuration is an important feature of cell status. Recent advances in microscopic imaging techniques allow us to easily obtain a large number of microscopic images of intracellular structures. In this circumstance, automated microscopic image recognition techniques are of extreme importance to future phenomics/visible screening approaches. However, there was no benchmark microscopic image dataset for intracellular organelles in a specified plant cell type. We previously established the Live Images of Plant Stomata (LIPS) database, a publicly available collection of optical-section images of various intracellular structures of plant guard cells, as a model system of environmental signal perception and transduction. Here we report recent updates to the LIPS database and the establishment of a database table, LIPService. We updated the LIPS dataset and established a new interface named LIPService to promote efficient inspection of intracellular structure configurations. Cell nuclei, microtubules, actin microfilaments, mitochondria, chloroplasts, endoplasmic reticulum, peroxisomes, endosomes, Golgi bodies, and vacuoles can be filtered using probe names or morphometric parameters such as stomatal aperture. In addition to the serial optical sectional images of the original LIPS database, new volume-rendering data for easy web browsing of three-dimensional intracellular structures have been released to allow easy inspection of their configurations or relationships with cell status/morphology. We also demonstrated the utility of the new LIPS image database for automated organelle recognition of images from another plant cell image database with image clustering analyses. The updated LIPS database provides a benchmark image dataset for representative intracellular structures in Arabidopsis guard cells. The newly released LIPService allows users to inspect the relationship between organellar three-dimensional configurations and morphometrical parameters.
Structured Forms Reference Set of Binary Images II (SFRS2)

National Institute of Standards and Technology Data Gateway

NIST Structured Forms Reference Set of Binary Images II (SFRS2) (Web, free access) The second NIST database of structured forms (Special Database 6) consists of 5,595 pages of binary, black-and-white images of synthesized documents containing hand-print. The documents in this database are 12 different tax forms with the IRS 1040 Package X for the year 1988.
DSSTOX WEBSITE LAUNCH: IMPROVING PUBLIC ACCESS TO DATABASES FOR BUILDING STRUCTURE-TOXICITY PREDICTION MODELS

EPA Science Inventory

DSSTox Website Launch: Improving Public Access to Databases for Building Structure-Toxicity Prediction Models
Ann M. Richard
US Environmental Protection Agency, Research Triangle Park, NC, USA

Distributed: Decentralized set of standardized, field-delimited databases,...
FDA toxicity databases and real-time data entry.

PubMed

Arvidson, Kirk B

2008-11-15

Structure-searchable electronic databases are valuable new tools that are assisting the FDA in its mission to promptly and efficiently review incoming submissions for regulatory approval of new food additives and food contact substances. The Center for Food Safety and Applied Nutrition's Office of Food Additive Safety (CFSAN/OFAS), in collaboration with Leadscope, Inc., is consolidating genetic toxicity data submitted in food additive petitions from the 1960s to the present day. The Center for Drug Evaluation and Research, Office of Pharmaceutical Science's Informatics and Computational Safety Analysis Staff (CDER/OPS/ICSAS) is separately gathering similar information from their submissions. Presently, these data are distributed in various locations such as paper files, microfiche, and non-standardized toxicology memoranda. The organization of the data into a consistent, searchable format will reduce paperwork, expedite the toxicology review process, and provide valuable information to industry that is currently available only to the FDA. Furthermore, by combining chemical structures with genetic toxicity information, biologically active moieties can be identified and used to develop quantitative structure-activity relationship (QSAR) modeling and testing guidelines. Additionally, chemicals devoid of toxicity data can be compared to known structures, allowing for improved safety review through the identification and analysis of structural analogs. Four database frameworks have been created: bacterial mutagenesis, in vitro chromosome aberration, in vitro mammalian mutagenesis, and in vivo micronucleus. Controlled vocabularies for these databases have been established. The four separate genetic toxicity databases are compiled into a single, structurally-searchable database for easy accessibility of the toxicity information. Beyond the genetic toxicity databases described here, additional databases for subchronic, chronic, and teratogenicity studies have been prepared.
Navigating through the Jungle of Allergens: Features and Applications of Allergen Databases.

PubMed

Radauer, Christian

2017-01-01

The increasing number of available data on allergenic proteins demanded the establishment of structured, freely accessible allergen databases. In this review article, features and applications of 6 of the most widely used allergen databases are discussed. The WHO/IUIS Allergen Nomenclature Database is the official resource of allergen designations. Allergome is the most comprehensive collection of data on allergens and allergen sources. AllergenOnline is aimed at providing a peer-reviewed database of allergen sequences for prediction of allergenicity of proteins, such as those planned to be inserted into genetically modified crops. The Structural Database of Allergenic Proteins (SDAP) provides a database of allergen sequences, structures, and epitopes linked to bioinformatics tools for sequence analysis and comparison. The Immune Epitope Database (IEDB) is the largest repository of T-cell, B-cell, and major histocompatibility complex protein epitopes including epitopes of allergens. AllFam classifies allergens into families of evolutionarily related proteins using definitions from the Pfam protein family database. These databases contain mostly overlapping data, but also show differences in terms of their targeted users, the criteria for including allergens, data shown for each allergen, and the availability of bioinformatics tools. © 2017 S. Karger AG, Basel.

TIPdb-3D: the three-dimensional structure database of phytochemicals from Taiwan indigenous plants

PubMed Central

Tung, Chun-Wei; Lin, Ying-Chi; Chang, Hsun-Shuo; Wang, Chia-Chi; Chen, Ih-Sheng; Jheng, Jhao-Liang; Li, Jih-Heng

2014-01-01

The rich indigenous and endemic plants in Taiwan serve as a resourceful bank for biologically active phytochemicals. Based on our TIPdb database curating bioactive phytochemicals from Taiwan indigenous plants, this study presents a three-dimensional (3D) chemical structure database named TIPdb-3D to support the discovery of novel pharmacologically active compounds. The Merck Molecular Force Field (MMFF94) was used to generate 3D structures of phytochemicals in TIPdb. The 3D structures could facilitate the analysis of 3D quantitative structure–activity relationship, the exploration of chemical space and the identification of potential pharmacologically active compounds using protein–ligand docking. Database URL: http://cwtung.kmu.edu.tw/tipdb. PMID:24930145
An Introduction to Database Structure and Database Machines.

ERIC Educational Resources Information Center

Detweiler, Karen

1984-01-01

Enumerates principal management objectives of database management systems (data independence, quality, security, multiuser access, central control) and criteria for comparison (response time, size, flexibility, other features). Conventional database management systems, relational databases, and database machines used for backend processing are…
NALDB: nucleic acid ligand database for small molecules targeting nucleic acid.

PubMed

Kumar Mishra, Subodh; Kumar, Amit

2016-01-01

Nucleic acid ligand database (NALDB) is a unique database that provides detailed information about the experimental data of small molecules that were reported to target several types of nucleic acid structures. NALDB is the first ligand database that contains ligand information for all types of nucleic acids. NALDB contains more than 3500 ligand entries with detailed pharmacokinetic and pharmacodynamic information such as target name, target sequence, ligand 2D/3D structure, SMILES, molecular formula, molecular weight, net formal charge, AlogP, number of rings, number of hydrogen bond donors and acceptors, and potential energy, along with their Ki, Kd and IC50 values. Having all these details on a single platform would be helpful for the development and improvement of novel ligands targeting nucleic acids that could serve as potential targets in different diseases including cancers and neurological disorders. With a maximum of 255 conformers for each ligand entry, our database is a multi-conformer database and can facilitate the virtual screening process. NALDB provides powerful web-based search tools that make database searching efficient and simple, with options for both text and structure queries. NALDB also provides a multi-dimensional advanced search tool that can screen the database molecules on the basis of ligand molecular properties provided by database users. A 3D structure visualization tool has also been included for 3D structure representation of ligands. NALDB offers inclusive pharmacological information and a structurally flexible set of small molecules with their three-dimensional conformers that can accelerate virtual screening and other modeling processes and eventually complement nucleic acid-based drug discovery research. NALDB can be routinely updated and is freely available at bsbe.iiti.ac.in/bsbe/naldb/HOME.php. Database URL: http://bsbe.iiti.ac.in/bsbe/naldb/HOME.php. © The Author(s) 2016. Published by Oxford University Press.
Visualization and manipulating the image of a formal data structure (FDS)-based database

NASA Astrophysics Data System (ADS)

Verdiesen, Franc; de Hoop, Sylvia; Molenaar, Martien

1994-08-01

A vector map is a terrain representation with a vector-structured geometry. Molenaar formulated an object-oriented formal data structure for 3D single valued vector maps. This FDS is implemented in a database (Oracle). In this study we describe a methodology for visualizing a FDS-based database and manipulating the image. A data set retrieved by querying the database is converted into an import file for a drawing application. An objective of this study is that an end-user can alter and add terrain objects in the image. The drawing application creates an export file, that is compared with the import file. Differences between these files result in updating the database which involves checks on consistency. In this study Autocad is used for visualizing and manipulating the image of the data set. A computer program has been written for the data exchange and conversion between Oracle and Autocad. The data structure of the FDS is compared to the data structure of Autocad and the data of the FDS is converted into the structure of Autocad equal to the FDS.
Towards computational improvement of DNA database indexing and short DNA query searching.

PubMed

Stojanov, Done; Koceski, Sašo; Mileva, Aleksandra; Koceska, Nataša; Bande, Cveta Martinovska

2014-09-03

In order to facilitate and speed up the search of massive DNA databases, the database is indexed at the beginning, employing a mapping function. By searching through the indexed data structure, exact query hits can be identified. If the database is searched against an annotated DNA query, such as a known promoter consensus sequence, then the starting locations and the number of potential genes can be determined. This is particularly relevant if unannotated DNA sequences have to be functionally annotated. However, indexing a massive DNA database and searching an indexed data structure with millions of entries is a time-demanding process. In this paper, we propose a fast DNA database indexing and searching approach, identifying all query hits in the database, without having to examine all entries in the indexed data structure, limiting the maximum length of a query that can be searched against the database. By applying the proposed indexing equation, the whole human genome could be indexed in 10 hours on a personal computer, under the assumption that there is enough RAM to store the indexed data structure. Analysing the methodology proposed by Reneker, we observed that hits at starting positions [Formula: see text] are not reported, if the database is searched against a query shorter than [Formula: see text] nucleotides, such that [Formula: see text] is the length of the DNA database words being mapped and [Formula: see text] is the length of the query. A solution of this drawback is also presented.
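The general idea of indexing a DNA database with a mapping function can be sketched as follows. This is a minimal illustration only, not the indexing equation proposed in the paper: it assumes a plain base-4 encoding of fixed-length words, and the function names (`encode`, `build_index`, `find_hits`) are hypothetical. It also simply rejects queries shorter than the word length instead of addressing the drawback discussed above.

```python
from collections import defaultdict

# Assumed 2-bit style code for each nucleotide (an illustrative choice).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(word):
    """Map a DNA word to an integer by reading it as a base-4 number."""
    value = 0
    for base in word:
        value = value * 4 + CODE[base]
    return value


def build_index(db, k):
    """Record the start positions of every length-k word in the database."""
    index = defaultdict(list)
    for i in range(len(db) - k + 1):
        index[encode(db[i:i + k])].append(i)
    return index


def find_hits(db, index, k, query):
    """Return start positions of exact query hits, using the index as a filter."""
    if len(query) < k:
        raise ValueError("query shorter than the indexed word length")
    # Only positions whose first k bases match the query are examined.
    candidates = index.get(encode(query[:k]), [])
    return [i for i in candidates if db[i:i + len(query)] == query]
```

For example, after `idx = build_index("ACGTACGT", 3)`, the call `find_hits("ACGTACGT", idx, 3, "ACGT")` checks only the two positions where the word `ACG` starts instead of scanning the whole string. Characters outside A, C, G and T are not handled in this sketch.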
Structator: fast index-based search for RNA sequence-structure patterns

PubMed Central

2011-01-01

Background: The secondary structure of RNA molecules is intimately related to their function and often more conserved than the sequence. Hence, the important task of searching databases for RNAs requires matching sequence-structure patterns. Unfortunately, current tools for this task have, in the best case, a running time that is only linear in the size of sequence databases. Furthermore, established index data structures for fast sequence matching, like suffix trees or arrays, cannot benefit from the complementarity constraints introduced by the secondary structure of RNAs. Results: We present a novel method and readily applicable software for time-efficient matching of RNA sequence-structure patterns in sequence databases. Our approach is based on affix arrays, a recently introduced index data structure, preprocessed from the target database. Affix arrays support bidirectional pattern search, which is required for efficiently handling the structural constraints of the pattern. Structural patterns like stem-loops can be matched inside out, such that the loop region is matched first and then the pairing bases on the boundaries are matched consecutively. This makes it possible to exploit base pairing information for search space reduction and leads to an expected running time that is sublinear in the size of the sequence database. The incorporation of a new chaining approach in the search of RNA sequence-structure patterns enables the description of molecules folding into complex secondary structures with multiple ordered patterns. The chaining approach removes spurious matches from the set of intermediate results, in particular of patterns with little specificity. In benchmark experiments on the Rfam database, our method runs up to two orders of magnitude faster than previous methods. Conclusions: The presented method's sublinear expected running time makes it well suited for RNA sequence-structure pattern matching in large sequence databases. 
RNA molecules containing several stem-loop substructures can be described by multiple sequence-structure patterns and their matches are efficiently handled by a novel chaining method. Beyond our algorithmic contributions, we provide with Structator a complete and robust open-source software solution for index-based search of RNA sequence-structure patterns. The Structator software is available at http://www.zbh.uni-hamburg.de/Structator. PMID:21619640
A Circular Dichroism Reference Database for Membrane Proteins

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wallace, B.; Wien, F.; Stone, T.

2006-01-01

Membrane proteins are a major product of most genomes and the target of a large number of current pharmaceuticals, yet little information exists on their structures because of the difficulty of crystallising them; hence for the most part they have been excluded from structural genomics programme targets. Furthermore, even methods such as circular dichroism (CD) spectroscopy which seek to define secondary structure have not been fully exploited because of technical limitations to their interpretation for membrane embedded proteins. Empirical analyses of circular dichroism (CD) spectra are valuable for providing information on secondary structures of proteins. However, the accuracy of the results depends on the appropriateness of the reference databases used in the analyses. Membrane proteins have different spectral characteristics than do soluble proteins as a result of the low dielectric constants of membrane bilayers relative to those of aqueous solutions (Chen & Wallace (1997) Biophys. Chem. 65:65-74). To date, no CD reference database exists exclusively for the analysis of membrane proteins, and hence empirical analyses based on current reference databases derived from soluble proteins are not adequate for accurate analyses of membrane protein secondary structures (Wallace et al (2003) Prot. Sci. 12:875-884). We have therefore created a new reference database of CD spectra of integral membrane proteins whose crystal structures have been determined. To date it contains more than 20 proteins, and spans the range of secondary structures from mostly helical to mostly sheet proteins. This reference database should enable more accurate secondary structure determinations of membrane embedded proteins and will become one of the reference database options in the CD calculation server DICHROWEB (Whitmore & Wallace (2004) NAR 32:W668-673).
DSSTOX STRUCTURE-SEARCHABLE PUBLIC TOXICITY DATABASE NETWORK: CURRENT PROGRESS AND NEW INITIATIVES TO IMPROVE CHEMO-BIOINFORMATICS CAPABILITIES

EPA Science Inventory

The EPA DSSTox website (http://www.epa.gov/nheerl/dsstox) publishes standardized, structure-annotated toxicity databases, covering a broad range of toxicity disciplines. Each DSSTox database features documentation written in collaboration with the source authors and toxicity expe...
MPID-T2: a database for sequence-structure-function analyses of pMHC and TR/pMHC structures.

PubMed

Khan, Javed Mohammed; Cheruku, Harish Reddy; Tong, Joo Chuan; Ranganathan, Shoba

2011-04-15

Sequence-structure-function information is critical in understanding the mechanism of pMHC and TR/pMHC binding and recognition. A database for sequence-structure-function information on pMHC and TR/pMHC interactions, MHC-Peptide Interaction Database-TR version 2 (MPID-T2), is now available augmented with the latest PDB and IMGT/3Dstructure-DB data, advanced features and new parameters for the analysis of pMHC and TR/pMHC structures. Availability: http://biolinfo.org/mpid-t2. Contact: shoba.ranganathan@mq.edu.au. Supplementary data are available at Bioinformatics online.
PDB_TM: selection and membrane localization of transmembrane proteins in the protein data bank.

PubMed

Tusnády, Gábor E; Dosztányi, Zsuzsanna; Simon, István

2005-01-01

PDB_TM is a database for transmembrane proteins with known structures. It aims to collect all transmembrane proteins that are deposited in the protein structure database (PDB) and to determine their membrane-spanning regions. These assignments are based on the TMDET algorithm, which uses only structural information to locate the most likely position of the lipid bilayer and to distinguish between transmembrane and globular proteins. This algorithm was applied to all PDB entries and the results were collected in the PDB_TM database. By using the TMDET algorithm, the PDB_TM database can be automatically updated every week, keeping it synchronized with the latest PDB updates. The PDB_TM database is available at http://www.enzim.hu/PDB_TM.
[Construction of chemical information database based on optical structure recognition technique].

PubMed

Lv, C Y; Li, M N; Zhang, L R; Liu, Z M

2018-04-18

The aim was to create a protocol that could be used to construct a chemical information database from scientific literature quickly and automatically. Scientific literature, patents and technical reports from different chemical disciplines were collected and stored in PDF format as fundamental datasets. Chemical structures were transformed from published documents and images to machine-readable data by using name conversion technology and the optical structure recognition tool CLiDE. In the process of molecular structure information extraction, Markush structures were enumerated into well-defined monomer molecules by means of QueryTools in the molecule editor ChemDraw. The document management software EndNote X8 was applied to acquire bibliographical references including title, author, journal and year of publication. The text mining toolkit ChemDataExtractor was adopted to retrieve information that could be used to populate a structured chemical database from figures, tables, and textual paragraphs. After this step, detailed manual revision and annotation were conducted in order to ensure the accuracy and completeness of the data. In addition to the literature data, the computing simulation platform Pipeline Pilot 7.5 was utilized to calculate physical and chemical properties and predict molecular attributes. Furthermore, the open database ChEMBL was linked to fetch known bioactivities, such as indications and targets. After information extraction and data expansion, five separate metadata files were generated: molecular structure data file, molecular information, bibliographical references, predicted attributes and known bioactivities. With the canonical simplified molecular input line entry specification (SMILES) as primary key, the metadata files were associated through common key nodes, including molecular number and PDF number, to construct an integrated chemical information database. A reasonable construction protocol for a chemical information database was created successfully. 
A total of 174 research articles and 25 reviews published in Marine Drugs from January 2015 to June 2016 were collected as the essential data source, and an elementary marine natural product database named PKU-MNPD was built in accordance with this protocol, which contained 3 262 molecules and 19 821 records. This data aggregation protocol is of great help for constructing chemical information databases from original documents accurately, comprehensively and efficiently. The structured chemical information database can facilitate access to medical intelligence and accelerate the transformation of scientific research achievements.
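The linking step described above (SMILES as primary key, with molecular number and PDF number as shared keys) can be sketched with an in-memory SQLite database. The table names, column names and the single toy row are assumptions made for illustration and do not reflect the actual PKU-MNPD schema; the SMILES strings are assumed to be already canonicalized by a cheminformatics tool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical schema: one row per molecule, keyed by canonical SMILES.
    CREATE TABLE molecule (smiles TEXT PRIMARY KEY, mol_no INTEGER UNIQUE);
    CREATE TABLE reference (pdf_no INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE record (
        mol_no INTEGER REFERENCES molecule(mol_no),
        pdf_no INTEGER REFERENCES reference(pdf_no),
        activity TEXT
    );
    """
)

# Toy rows standing in for extracted data.
conn.execute("INSERT INTO molecule VALUES (?, ?)", ("CCO", 1))
conn.execute("INSERT INTO reference VALUES (?, ?)", (101, "example article"))
conn.execute("INSERT INTO record VALUES (?, ?, ?)", (1, 101, "placeholder activity"))

# Join the separate metadata tables through the shared keys.
rows = conn.execute(
    """
    SELECT m.smiles, r.title, rec.activity
    FROM record AS rec
    JOIN molecule AS m ON m.mol_no = rec.mol_no
    JOIN reference AS r ON r.pdf_no = rec.pdf_no
    """
).fetchall()
print(rows)
```

Because the SMILES column is the primary key, inserting the same canonical structure twice raises `sqlite3.IntegrityError`, which keeps each molecule to one row.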
Computer systems and methods for the query and visualization of multidimensional databases

DOEpatents

Stolte, Chris; Tang, Diane L.; Hanrahan, Patrick

2006-08-08

A method and system for producing graphics. A hierarchical structure of a database is determined. A visual table, comprising a plurality of panes, is constructed by providing a specification that is in a language based on the hierarchical structure of the database. In some cases, this language can include fields that are in the database schema. The database is queried to retrieve a set of tuples in accordance with the specification. A subset of the set of tuples is associated with a pane in the plurality of panes.
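The association of tuple subsets with panes can be pictured with a short sketch. It is a generic group-by over two fields, assumed here purely for illustration; it is not the specification language or the method claimed in the patent, and the field names and sample tuples are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_visual_table(
    tuples: List[dict], row_field: str, col_field: str
) -> Dict[Tuple[str, str], List[dict]]:
    """Assign each retrieved tuple to the pane identified by its row and column values."""
    panes: Dict[Tuple[str, str], List[dict]] = defaultdict(list)
    for t in tuples:
        panes[(t[row_field], t[col_field])].append(t)
    return dict(panes)


# Hypothetical query result: sales tuples with two categorical fields.
result = [
    {"region": "East", "year": "2005", "sales": 10},
    {"region": "East", "year": "2006", "sales": 12},
    {"region": "West", "year": "2005", "sales": 7},
    {"region": "East", "year": "2005", "sales": 3},
]
panes = build_visual_table(result, "region", "year")
print(sorted(panes))
```

Each key of `panes` corresponds to one cell of the visual table, and its value is the subset of tuples that a chart in that cell would draw.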
Computer systems and methods for the query and visualization of multidimensional database

DOEpatents

Stolte, Chris; Tang, Diane L.; Hanrahan, Patrick

2010-05-11

A method and system for producing graphics. A hierarchical structure of a database is determined. A visual table, comprising a plurality of panes, is constructed by providing a specification that is in a language based on the hierarchical structure of the database. In some cases, this language can include fields that are in the database schema. The database is queried to retrieve a set of tuples in accordance with the specification. A subset of the set of tuples is associated with a pane in the plurality of panes.
MODBASE, a database of annotated comparative protein structure models

PubMed Central

Pieper, Ursula; Eswar, Narayanan; Stuart, Ashley C.; Ilyin, Valentin A.; Sali, Andrej

2002-01-01

MODBASE (http://guitar.rockefeller.edu/modbase) is a relational database of annotated comparative protein structure models for all available protein sequences matched to at least one known protein structure. The models are calculated by MODPIPE, an automated modeling pipeline that relies on PSI-BLAST, IMPALA and MODELLER. MODBASE uses the MySQL relational database management system for flexible and efficient querying, and the MODVIEW Netscape plugin for viewing and manipulating multiple sequences and structures. It is updated regularly to reflect the growth of the protein sequence and structure databases, as well as improvements in the software for calculating the models. For ease of access, MODBASE is organized into different datasets. The largest dataset contains models for domains in 304 517 out of 539 171 unique protein sequences in the complete TrEMBL database (23 March 2001); only models based on significant alignments (PSI-BLAST E-value < 10^-4) and models assessed to have the correct fold are included. Other datasets include models for target selection and structure-based annotation by the New York Structural Genomics Research Consortium, models for prediction of genes in the Drosophila melanogaster genome, models for structure determination of several ribosomal particles and models calculated by the MODWEB comparative modeling web server. PMID:11752309
DSSTox and Chemical Information Technologies in Support of Predictive Toxicology

EPA Science Inventory

The EPA NCCT Distributed Structure-Searchable Toxicity (DSSTox) Database project initially focused on the curation and publication of high-quality, standardized, chemical structure-annotated toxicity databases for use in structure-activity relationship (SAR) modeling. In recent y...
Protein structure database search and evolutionary classification.

PubMed

Yang, Jinn-Moon; Tung, Chi-Hua

2006-01-01

As more protein structures become available and structural genomics efforts provide structural models in a genome-wide strategy, there is a growing need for fast and accurate methods for discovering homologous proteins and evolutionary classifications of newly determined structures. We have developed 3D-BLAST, in part, to address these issues. 3D-BLAST is as fast as BLAST and calculates the statistical significance (E-value) of an alignment to indicate the reliability of the prediction. Using this method, we first identified 23 states of the structural alphabet that represent pattern profiles of the backbone fragments and then used them to represent protein structure databases as structural alphabet sequence databases (SADB). Our method enhanced BLAST as a search method, using a new structural alphabet substitution matrix (SASM) to find the longest common substructures with high-scoring structured segment pairs from an SADB database. Using personal computers with Intel Pentium4 (2.8 GHz) processors, our method searched more than 10 000 protein structures in 1.3 s and achieved a good agreement with search results from detailed structure alignment methods. [3D-BLAST is available at http://3d-blast.life.nctu.edu.tw].
FDA toxicity databases and real-time data entry

DOE Office of Scientific and Technical Information (OSTI.GOV)

Arvidson, Kirk B.

Structure-searchable electronic databases are valuable new tools that are assisting the FDA in its mission to promptly and efficiently review incoming submissions for regulatory approval of new food additives and food contact substances. The Center for Food Safety and Applied Nutrition's Office of Food Additive Safety (CFSAN/OFAS), in collaboration with Leadscope, Inc., is consolidating genetic toxicity data submitted in food additive petitions from the 1960s to the present day. The Center for Drug Evaluation and Research, Office of Pharmaceutical Science's Informatics and Computational Safety Analysis Staff (CDER/OPS/ICSAS) is separately gathering similar information from their submissions. Presently, these data are distributed in various locations such as paper files, microfiche, and non-standardized toxicology memoranda. The organization of the data into a consistent, searchable format will reduce paperwork, expedite the toxicology review process, and provide valuable information to industry that is currently available only to the FDA. Furthermore, by combining chemical structures with genetic toxicity information, biologically active moieties can be identified and used to develop quantitative structure-activity relationship (QSAR) modeling and testing guidelines. Additionally, chemicals devoid of toxicity data can be compared to known structures, allowing for improved safety review through the identification and analysis of structural analogs. Four database frameworks have been created: bacterial mutagenesis, in vitro chromosome aberration, in vitro mammalian mutagenesis, and in vivo micronucleus. Controlled vocabularies for these databases have been established. The four separate genetic toxicity databases are compiled into a single, structurally-searchable database for easy accessibility of the toxicity information. 
Beyond the genetic toxicity databases described here, additional databases for subchronic, chronic, and teratogenicity studies have been prepared.
Crystallography Open Database – an open-access collection of crystal structures

PubMed Central

Gražulis, Saulius; Chateigner, Daniel; Downs, Robert T.; Yokochi, A. F. T.; Quirós, Miguel; Lutterotti, Luca; Manakova, Elena; Butkus, Justas; Moeck, Peter; Le Bail, Armel

2009-01-01

The Crystallography Open Database (COD), which is a project that aims to gather all available inorganic, metal–organic and small organic molecule structural data in one database, is described. The database adopts an open-access model. The COD currently contains ∼80 000 entries in crystallographic information file format, with nearly full coverage of the International Union of Crystallography publications, and is growing in size and quality. PMID:22477773
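Entries stored in crystallographic information file (CIF) format are plain text made of tagged data items, which makes simple fields easy to read programmatically. The sketch below handles only single-line `_tag value` items and ignores `loop_` tables and multi-line values, so it is not a full CIF parser; the function name and the sample entry with its values are illustrative assumptions, not a real COD record.

```python
from typing import Dict


def parse_cif_items(text: str) -> Dict[str, str]:
    """Collect single-line '_tag value' data items from CIF text."""
    items: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("_"):
            continue  # skip data_ headers, comments, loop rows, blank lines
        parts = line.split(None, 1)
        if len(parts) == 2:  # tags inside loop_ have no value on the same line
            items[parts[0]] = parts[1].strip("'\"")
    return items


# Illustrative entry; the values are placeholders.
cif_text = """data_example
_cell_length_a 5.6402
_cell_length_b 5.6402
_symmetry_space_group_name_H-M 'F m -3 m'
loop_
_atom_site_label
Na1
"""
items = parse_cif_items(cif_text)
print(items)
```

For real entries a dedicated CIF library is the safer choice, because loops, quoting rules and numeric uncertainties such as `5.6402(2)` need proper handling.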
A rudimentary database for three-dimensional objects using structural representation

NASA Technical Reports Server (NTRS)

Sowers, James P.

1987-01-01

A database which enables users to store and share the description of three-dimensional objects in a research environment is presented. The main objective of the design is to make it a compact structure that holds sufficient information to reconstruct the object. The database design is based on an object representation scheme which is information preserving, reasonably efficient, and yet economical in terms of the storage requirement. The determination of the needed data for the reconstruction process is guided by the belief that it is faster to do simple computations to generate needed data/information for construction than to retrieve everything from memory. Some recent techniques of three-dimensional representation that influenced the design of the database are discussed. The schema for the database and the structural definition used to define an object are given. The user manual for the software developed to create and maintain the contents of the database is included.
The Cambridge Structural Database

PubMed Central

Groom, Colin R.; Bruno, Ian J.; Lightfoot, Matthew P.; Ward, Suzanna C.

2016-01-01

The Cambridge Structural Database (CSD) contains a complete record of all published organic and metal–organic small-molecule crystal structures. The database has been in operation for over 50 years and continues to be the primary means of sharing structural chemistry data and knowledge across disciplines. As well as structures that are made public to support scientific articles, it includes many structures published directly as CSD Communications. All structures are processed both computationally and by expert structural chemistry editors prior to entering the database. A key component of this processing is the reliable association of the chemical identity of the structure studied with the experimental data. This important step helps ensure that data is widely discoverable and readily reusable. Content is further enriched through selective inclusion of additional experimental data. Entries are available to anyone through free CSD community web services. Linking services developed and maintained by the CCDC, combined with the use of standard identifiers, facilitate discovery from other resources. Data can also be accessed through CCDC and third party software applications and through an application programming interface. PMID:27048719

The Cambridge Structural Database.

PubMed

Groom, Colin R; Bruno, Ian J; Lightfoot, Matthew P; Ward, Suzanna C

2016-04-01

The Cambridge Structural Database (CSD) contains a complete record of all published organic and metal-organic small-molecule crystal structures. The database has been in operation for over 50 years and continues to be the primary means of sharing structural chemistry data and knowledge across disciplines. As well as structures that are made public to support scientific articles, it includes many structures published directly as CSD Communications. All structures are processed both computationally and by expert structural chemistry editors prior to entering the database. A key component of this processing is the reliable association of the chemical identity of the structure studied with the experimental data. This important step helps ensure that data is widely discoverable and readily reusable. Content is further enriched through selective inclusion of additional experimental data. Entries are available to anyone through free CSD community web services. Linking services developed and maintained by the CCDC, combined with the use of standard identifiers, facilitate discovery from other resources. Data can also be accessed through CCDC and third party software applications and through an application programming interface.
The Forest Inventory and Analysis Database Version 4.0: Database Description and Users Manual for Phase 3

Treesearch

Christopher W. Woodall; Barbara L. Conkling; Michael C. Amacher; John W. Coulston; Sarah Jovan; Charles H. Perry; Beth Schulz; Gretchen C. Smith; Susan Will Wolf

2010-01-01

Describes the structure of the Forest Inventory and Analysis Database (FIADB) 4.0 for phase 3 indicators. The FIADB structure provides a consistent framework for storing forest health monitoring data across all ownerships for the entire United States. These data are available to the public.
ACToR Chemical Structure processing using Open Source ...

EPA Pesticide Factsheets

ACToR (Aggregated Computational Toxicology Resource) is a centralized database repository developed by the National Center for Computational Toxicology (NCCT) at the U.S. Environmental Protection Agency (EPA). Free and open source tools were used to compile toxicity data from over 1,950 public sources. ACToR contains chemical structure information and toxicological data for over 558,000 unique chemicals. The database primarily includes data from NCCT research programs, in vivo toxicity data from ToxRef, human exposure data from ExpoCast, high-throughput screening data from ToxCast and high quality chemical structure information from the EPA DSSTox program. The DSSTox database is a chemical structure inventory for the NCCT programs and currently has about 16,000 unique structures. Included are also data from PubChem, ChemSpider, USDA, FDA, NIH and several other public data sources. ACToR has been a resource to various international and national research groups. Most of our recent efforts on ACToR are focused on improving the structural identifiers and Physico-Chemical properties of the chemicals in the database. Organizing this huge collection of data and improving the chemical structure quality of the database has posed some major challenges. Workflows have been developed to process structures, calculate chemical properties and identify relationships between CAS numbers. The Structure processing workflow integrates web services (PubChem and NIH NCI Cactus) to d
The living publication

DOE Office of Scientific and Technical Information (OSTI.GOV)

Terwilliger, Thomas C.

2012-06-04

Within the ICSTI Insights Series we offer three articles on the 'living publication' that is already available to practitioners in the important field of crystal structure determination and analysis. While the specific examples are drawn from this particular field, we invite readers to draw parallels in their own fields of interest. The first article describes the present state of the crystallographic living publication, already recognized by an ALPSP (Association of Learned and Professional Society Publishers) Award for Publishing Innovation in 2006. The second article describes the potential impact on the record of science as greater post-publication analysis becomes more common within currently accepted data deposition practices, using processed diffraction data as the starting point. The third article outlines a vision for the further improvement of crystallographic structure reports within potentially achievable enhanced data deposition practices, based upon raw (unprocessed) diffraction data. The IUCr in its Commissions and Journals has for many years emphasized the importance of publications being accompanied by data and the interpretation of the data in terms of atomic models. This has been followed as policy by numerous other journals in the field and its cognate disciplines. This practice has been well served by databases and archiving institutions such as the Protein Data Bank (PDB), the Cambridge Crystallographic Data Centre (CCDC), and the Inorganic Crystal Structure Database (ICSD). Normally the models that are archived are interpretations of the data, consisting of atomic coordinates with their displacement parameters, along with processed diffraction data from X-ray, neutron or electron diffraction studies. In our current online age, a reader can not only consult the printed word, but can display and explore the results with molecular graphics software of exceptional quality. 
Furthermore, the routine availability of processed diffraction data allows readers to perform direct calculations of the electron density (using X-rays and electrons as probes) or the nuclear density (using neutrons as probe) on which the molecular models are directly based. This current community practice is described in our first article. There are various ways that these data and tools can be used to further analyze the molecules that have been crystallized. Notably, once a set of results is announced via the publication, the research community can start to interact directly with the data and models. This gives the community the opportunity not only to read about the structure, but to examine it in detail, and even generate subsequent improved models. These improved models could, in principle, be archived along with the original interpretation of the data and can represent a continuously improving set of interpretations of a set of diffraction data. The models could improve both by correction of errors in the original interpretation and by the use of new representations of molecules in crystal structures that more accurately represent the contents of a crystal. These possible developments are described in our second article. A current, significant, thrust for the IUCr is whether it would be advantageous for the crystallographic community to require, rather than only encourage, the archiving of the raw (unprocessed) diffraction data images measured from a crystal, a fibre or a solution. This issue is being evaluated in detail by an IUCr Working Group (see http://forums.iucr.org). Such archived raw data would be linked to and from any associated publications. The archiving of raw diffraction data could allow as yet undeveloped processing methods to have access to the originally measured data. The debate within the community about this much larger proposed archiving effort revolves around the issue of 'cost versus benefit'. 
Costs can be minimized by preserving the raw data in local repositories, either at centralized synchrotron and neutron research institutes, or at research universities. Archiving raw data is also perceived as being more effective than just archiving processed data in countering scientific fraud, which exists in our field, albeit at a tiny level of occurrences. In parallel developments, sensitivities to avoiding research malpractice are encouraging Universities to establish their own data repositories for research and academic staff. These various 'raw data archives', would complement the existing processed data archives. These archives could however have gaps in their coverage arising from a lack of resources. Nevertheless we believe that a sufficiently large raw data archive, with reasonable global coverage, could be encouraged and have major benefits. These possible developments, costs and benefits, are described in our third and final article on 'The living publication'.
MMpI: A Wide Range of Available Compounds of Matrix Metalloproteinase Inhibitors

PubMed Central

Muvva, Charuvaka; Patra, Sanjukta; Venkatesan, Subramanian

2016-01-01

Matrix metalloproteinases (MMPs) are a family of zinc-dependent proteinases involved in the regulation of the extracellular signaling and structural matrix environment of cells and tissues. MMPs are considered promising targets for the treatment of many diseases. A database of MMP inhibitors would therefore accelerate research in this area, given the involvement of MMPs in these diseases and the limitations of the first- and second-generation inhibitors. In this communication, we report the development of a new MMpI database, which provides information for researchers working in this field. It is a web-accessible resource that contains detailed information on the inhibitors of MMP, including small molecules, peptides and MMP Drug Leads. The database contains entries for ~3000 inhibitors, including ~72 MMP Drug Leads and ~73 peptide-based inhibitors. It provides the molecular and structural details that are necessary for drug discovery and development. The MMpI database contains physical properties and 2D and 3D structures (mol2 and pdb format files) of MMP inhibitors. Other data fields are hyperlinked to PubChem, ChEMBL, BindingDB, DrugBank, PDB, MEROPS and PubMed. The database can be searched by MMpI ID, IUPAC name, chemical structure and research article title. The MMP inhibitors provided in the MMpI database are optimized using the Python-based Hierarchical Environment for Integrated Xtallography (Phenix) software. The MMpI database is unique in being the only public database that provides complete information on the inhibitors of MMP. Database URL: http://clri.res.in/subramanian/databases/mmpi/index.php. PMID:27509041
DOMMINO 2.0: integrating structurally resolved protein-, RNA-, and DNA-mediated macromolecular interactions

PubMed Central

Kuang, Xingyan; Dhroso, Andi; Han, Jing Ginger; Shyu, Chi-Ren; Korkin, Dmitry

2016-01-01

Macromolecular interactions are formed between proteins, DNA and RNA molecules. Being a principal building block in macromolecular assemblies and pathways, the interactions underlie most cellular functions. Malfunctioning of macromolecular interactions is also linked to a number of diseases. Structural knowledge of the macromolecular interaction allows one to understand the interaction’s mechanism, determine its functional implications and characterize the effects of genetic variations, such as single nucleotide polymorphisms, on the interaction. Unfortunately, until now the interactions mediated by different types of macromolecules, e.g. protein–protein interactions or protein–DNA interactions, have been collected into individual and unrelated structural databases. This presents a significant obstacle in the analysis of macromolecular interactions. For instance, the homogeneous structural interaction databases prevent scientists from studying structural interactions of different types that occur in the same macromolecular complex. Here, we introduce DOMMINO 2.0, a structural Database Of Macro-Molecular INteractiOns. Compared to DOMMINO 1.0, a comprehensive database on protein-protein interactions, DOMMINO 2.0 includes the interactions between all three basic types of macromolecules extracted from PDB files. DOMMINO 2.0 is automatically updated on a weekly basis. It currently includes ∼1 040 000 interactions between two polypeptide subunits (e.g. domains, peptides, termini and interdomain linkers), ∼43 000 RNA-mediated interactions, and ∼12 000 DNA-mediated interactions. All protein structures in the database are annotated using SCOP and SUPERFAMILY family annotation. As a result, protein-mediated interactions involving protein domains, interdomain linkers, C- and N- termini, and peptides are identified. 
Our database provides an intuitive web interface, allowing one to investigate interactions at three different resolution levels: whole subunit network, binary interaction and interaction interface. Database URL: http://dommino.org PMID:26827237
SInCRe—structural interactome computational resource for Mycobacterium tuberculosis

PubMed Central

Metri, Rahul; Hariharaputran, Sridhar; Ramakrishnan, Gayatri; Anand, Praveen; Raghavender, Upadhyayula S.; Ochoa-Montaño, Bernardo; Higueruelo, Alicia P.; Sowdhamini, Ramanathan; Chandra, Nagasuma R.; Blundell, Tom L.; Srinivasan, Narayanaswamy

2015-01-01

We have developed an integrated database for Mycobacterium tuberculosis H37Rv (Mtb) that collates information on protein sequences, domain assignments, functional annotation and 3D structural information along with protein–protein and protein–small molecule interactions. SInCRe (Structural Interactome Computational Resource) was developed out of the CamBan (Cambridge and Bangalore) collaboration. The motivation for developing this database is to provide an integrated platform that allows easy access to, and interpretation of, data and results obtained by all the groups in CamBan in the field of Mtb informatics. In-house algorithms and databases developed independently by various academic groups in CamBan are used to generate Mtb-specific datasets and are integrated in this database to provide a structural dimension to studies on tuberculosis. The SInCRe database readily provides information on identification of functional domains, genome-scale modelling of structures of Mtb proteins and characterization of the small-molecule binding sites within Mtb. The resource also provides structure-based function annotation, information on small-molecule binders including FDA (Food and Drug Administration)-approved drugs, protein–protein interactions (PPIs) and natural compounds that potentially bind to pathogen proteins and thereby weaken or eliminate host–pathogen protein–protein interactions. Together they provide prerequisites for identification of off-target binding. Database URL: http://proline.biochem.iisc.ernet.in/sincre PMID:26130660
Database systems for knowledge-based discovery.

PubMed

Jagarlapudi, Sarma A R P; Kishan, K V Radha

2009-01-01

Several database systems have been developed to provide valuable information in a structured format to users ranging from the bench chemist to the biologist and from the medical practitioner to the pharmaceutical scientist. The advent of information technology and computational power has enhanced the ability to access large volumes of data in the form of a database in which one can perform compilation, searching, archiving, analysis, and finally knowledge derivation. Although data are of variable types, the tools used for database creation, searching and retrieval are similar. GVK BIO has been developing databases from publicly available scientific literature in specific areas such as medicinal chemistry, clinical research, and mechanism-based toxicity so that the structured databases containing vast data can be used in several areas of research. These databases are classified as reference centric or compound centric depending on the way the database systems were designed. Integration of these databases with knowledge derivation tools would enhance the value of these systems toward better drug design and discovery.
Extraction, integration and analysis of alternative splicing and protein structure distributed information

PubMed Central

D'Antonio, Matteo; Masseroli, Marco

2009-01-01

Background: Alternative splicing has been demonstrated to affect most human genes; different isoforms from the same gene encode proteins that differ by a limited number of residues, thus yielding similar structures. This suggests possible correlations between alternative splicing and protein structure. In order to support the investigation of such relationships, we have developed the Alternative Splicing and Protein Structure Scrutinizer (PASS), a Web application to automatically extract, integrate and analyze human alternative splicing and protein structure data sparsely available in the Alternative Splicing Database, Ensembl databank and Protein Data Bank. Primary data from these databases have been integrated and analyzed using the Protein Identifier Cross-Reference, BLAST, CLUSTALW and FeatureMap3D software tools. Results: A database has been developed to store the considered primary data and the results from their analysis; a system of Perl scripts has been implemented to automatically create and update the database and analyze the integrated data; a Web interface has been implemented to make the analyses easily accessible; a database has been created to manage user accesses to the PASS Web application and store users' data and searches. Conclusion: PASS automatically integrates data from the Alternative Splicing Database with protein structure data from the Protein Data Bank. Additionally, it comprehensively analyzes the integrated data with publicly available well-known bioinformatics tools in order to generate structural information on isoform pairs. Further analysis of such valuable information might reveal interesting relationships between alternative splicing and protein structure differences, which may be significantly associated with different functions. PMID:19828075
Systematic analysis of snake neurotoxins' functional classification using a data warehousing approach.

PubMed

Siew, Joyce Phui Yee; Khan, Asif M; Tan, Paul T J; Koh, Judice L Y; Seah, Seng Hong; Koo, Chuay Yeng; Chai, Siaw Ching; Armugam, Arunmozhiarasi; Brusic, Vladimir; Jeyaseelan, Kandiah

2004-12-12

Sequence annotations, functional and structural data on snake venom neurotoxins (svNTXs) are scattered across multiple databases and literature sources. Sequence annotations and structural data are available in the public molecular databases, while functional data are almost exclusively available in the published articles. There is a need for a specialized svNTXs database that contains NTX entries, which are organized, well annotated and classified in a systematic manner. We have systematically analyzed svNTXs and classified them using structure-function groups based on their structural, functional and phylogenetic properties. Using conserved motifs in each phylogenetic group, we built an intelligent module for the prediction of structural and functional properties of unknown NTXs. We also developed an annotation tool to aid the functional prediction of newly identified NTXs as an additional resource for the venom research community. We created a searchable online database of NTX proteins sequences (http://research.i2r.a-star.edu.sg/Templar/DB/snake_neurotoxin). This database can also be found under Swiss-Prot Toxin Annotation Project website (http://www.expasy.org/sprot/).
A protein relational database and protein family knowledge bases to facilitate structure-based design analyses.

PubMed

Mobilio, Dominick; Walker, Gary; Brooijmans, Natasja; Nilakantan, Ramaswamy; Denny, R Aldrin; Dejoannis, Jason; Feyfant, Eric; Kowticwar, Rupesh K; Mankala, Jyoti; Palli, Satish; Punyamantula, Sairam; Tatipally, Maneesh; John, Reji K; Humblet, Christine

2010-08-01

The Protein Data Bank is the most comprehensive source of experimental macromolecular structures. It can, however, be difficult at times to locate relevant structures with the Protein Data Bank search interface. This is particularly true when searching for complexes containing specific interactions between protein and ligand atoms. Moreover, searching within a family of proteins can be tedious. For example, one cannot search for some conserved residue as residue numbers vary across structures. We describe herein three databases, Protein Relational Database, Kinase Knowledge Base, and Matrix Metalloproteinase Knowledge Base, containing protein structures from the Protein Data Bank. In Protein Relational Database, atom-atom distances between protein and ligand have been precalculated allowing for millisecond retrieval based on atom identity and distance constraints. Ring centroids, centroid-centroid and centroid-atom distances and angles have also been included permitting queries for pi-stacking interactions and other structural motifs involving rings. Other geometric features can be searched through the inclusion of residue pair and triplet distances. In Kinase Knowledge Base and Matrix Metalloproteinase Knowledge Base, the catalytic domains have been aligned into common residue numbering schemes. Thus, by searching across Protein Relational Database and Kinase Knowledge Base, one can easily retrieve structures wherein, for example, a ligand of interest is making contact with the gatekeeper residue.
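The precalculated-distance idea in the abstract above can be pictured with a minimal sketch. It assumes an in-memory SQLite table; the table name `contact`, its columns, and the sample rows are hypothetical and are not taken from the Protein Relational Database schema, which the abstract does not describe.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table of precomputed protein-ligand atom distances.

    Table layout and rows are illustrative placeholders, not real PDB data.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE contact (
               structure_id TEXT,
               protein_atom TEXT,
               ligand_atom  TEXT,
               distance     REAL
           )"""
    )
    # An index on identity plus distance lets the lookup avoid a full scan.
    conn.execute(
        "CREATE INDEX idx_contact ON contact (protein_atom, ligand_atom, distance)"
    )
    rows = [
        ("S1", "N", "O", 2.9),
        ("S1", "O", "N", 3.4),
        ("S2", "N", "O", 4.8),
        ("S3", "N", "O", 3.1),
    ]
    conn.executemany("INSERT INTO contact VALUES (?, ?, ?, ?)", rows)
    return conn


def find_contacts(conn, protein_atom, ligand_atom, max_distance):
    """Return structure ids with a matching atom pair within max_distance."""
    cur = conn.execute(
        "SELECT DISTINCT structure_id FROM contact "
        "WHERE protein_atom = ? AND ligand_atom = ? AND distance <= ? "
        "ORDER BY structure_id",
        (protein_atom, ligand_atom, max_distance),
    )
    return [row[0] for row in cur]
```

Because the distances are stored rather than computed at query time, the query reduces to an indexed range lookup, which is one plausible reading of how fast retrieval under distance constraints could be achieved.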
A database of new zeolite-like materials.

PubMed

Pophale, Ramdas; Cheeseman, Phillip A; Deem, Michael W

2011-07-21

We here describe a database of computationally predicted zeolite-like materials. These crystals were discovered by a Monte Carlo search for zeolite-like materials. Positions of Si atoms as well as unit cell, space group, density, and number of crystallographically unique atoms were explored in the construction of this database. The database contains over 2.6 M unique structures. Roughly 15% of these are within +30 kJ mol⁻¹ Si of α-quartz, the band in which most of the known zeolites lie. These structures have topological, geometrical, and diffraction characteristics that are similar to those of known zeolites. The database is the result of refinement by two interatomic potentials that both satisfy the Pauli exclusion principle. The database has been deposited in the publicly available PCOD database and in www.hypotheticalzeolites.net/database/deem/. This journal is © the Owner Societies 2011
Genetic Testing Registry

MedlinePlus

... Splign Vector Alignment Search Tool (VAST) All Data & Software Resources... Domains & Structures BioSystems Cn3D Conserved Domain Database (CDD) Conserved Domain Search Service (CD Search) Structure (Molecular Modeling Database) Vector Alignment ...
3DSDSCAR--a three dimensional structural database for sialic acid-containing carbohydrates through molecular dynamics simulation.

PubMed

Veluraja, Kasinadar; Selvin, Jeyasigamani F A; Venkateshwari, Selvakumar; Priyadarzini, Thanu R K

2010-09-23

The inherent flexibility and lack of strong intramolecular interactions of oligosaccharides demand the use of theoretical methods for their structural elucidation. In spite of the development of theoretical methods, not much research on glycoinformatics has been done so far compared with bioinformatics research on proteins and nucleic acids. We have developed a three-dimensional structural database for sialic acid-containing carbohydrates (3DSDSCAR). This is an open-access database that provides 3D structural models of a given sialic acid-containing carbohydrate. At present, 3DSDSCAR contains 60 conformational models, belonging to 14 different sialic acid-containing carbohydrates, deduced through 10 ns molecular dynamics (MD) simulations. The database is available at the URL: http://www.3dsdscar.org. Copyright 2010 Elsevier Ltd. All rights reserved.
TIPdb: a database of anticancer, antiplatelet, and antituberculosis phytochemicals from indigenous plants in Taiwan.

PubMed

Lin, Ying-Chi; Wang, Chia-Chi; Chen, Ih-Sheng; Jheng, Jhao-Liang; Li, Jih-Heng; Tung, Chun-Wei

2013-01-01

The rich indigenous and endemic plant species of Taiwan are attributed to the island's unique geographic features. These plants serve as a resourceful bank of biologically active phytochemicals. Given that these plant-derived chemicals are prototypes of potential drugs for diseases, databases connecting the chemical structures and pharmacological activities may facilitate drug development. To enhance the utility of the data, it is desirable to develop a database of chemical compounds and corresponding activities from indigenous plants in Taiwan. A database of anticancer, antiplatelet, and antituberculosis phytochemicals from indigenous plants in Taiwan was constructed. The database, TIPdb, is composed of a standardized format of published anticancer, antiplatelet, and antituberculosis phytochemicals from indigenous plants in Taiwan. A browse function was implemented for users to browse the database in a taxonomy-based manner. Search functions can be utilized to filter records of interest by botanical name, part, chemical class, or compound name. The structured and searchable database TIPdb was constructed to serve as a comprehensive and standardized resource for searching anticancer, antiplatelet, and antituberculosis compounds. The manually curated chemical structures and activities provide a great opportunity to develop quantitative structure-activity relationship models for the high-throughput screening of potential anticancer, antiplatelet, and antituberculosis drugs.
TIPdb: A Database of Anticancer, Antiplatelet, and Antituberculosis Phytochemicals from Indigenous Plants in Taiwan

PubMed Central

Lin, Ying-Chi; Wang, Chia-Chi; Chen, Ih-Sheng; Jheng, Jhao-Liang; Li, Jih-Heng; Tung, Chun-Wei

2013-01-01

The rich indigenous and endemic plant species of Taiwan are attributed to the island's unique geographic features. These plants serve as a resourceful bank of biologically active phytochemicals. Given that these plant-derived chemicals are prototypes of potential drugs for diseases, databases connecting the chemical structures and pharmacological activities may facilitate drug development. To enhance the utility of the data, it is desirable to develop a database of chemical compounds and corresponding activities from indigenous plants in Taiwan. A database of anticancer, antiplatelet, and antituberculosis phytochemicals from indigenous plants in Taiwan was constructed. The database, TIPdb, is composed of a standardized format of published anticancer, antiplatelet, and antituberculosis phytochemicals from indigenous plants in Taiwan. A browse function was implemented for users to browse the database in a taxonomy-based manner. Search functions can be utilized to filter records of interest by botanical name, part, chemical class, or compound name. The structured and searchable database TIPdb was constructed to serve as a comprehensive and standardized resource for searching anticancer, antiplatelet, and antituberculosis compounds. The manually curated chemical structures and activities provide a great opportunity to develop quantitative structure-activity relationship models for the high-throughput screening of potential anticancer, antiplatelet, and antituberculosis drugs. PMID:23766708
SCRIPDB: a portal for easy access to syntheses, chemicals and reactions in patents

PubMed Central

Heifets, Abraham; Jurisica, Igor

2012-01-01

The patent literature is a rich catalog of biologically relevant chemicals; many public and commercial molecular databases contain the structures disclosed in patent claims. However, patents are an equally rich source of metadata about bioactive molecules, including mechanism of action, disease class, homologous experimental series, structural alternatives, or the synthetic pathways used to produce molecules of interest. Unfortunately, this metadata is discarded when chemical structures are deposited separately in databases. SCRIPDB is a chemical structure database designed to make this metadata accessible. SCRIPDB provides the full original patent text, reactions and relationships described within any individual patent, in addition to the molecular files common to structural databases. We discuss how such information is valuable in medical text mining, chemical image analysis, reaction extraction and in silico pharmaceutical lead optimization. SCRIPDB may be searched by exact chemical structure, substructure or molecular similarity and the results may be restricted to patents describing synthetic routes. SCRIPDB is available at http://dcv.uhnres.utoronto.ca/SCRIPDB. PMID:22067445
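The abstract above mentions searching by molecular similarity but does not say which measure SCRIPDB uses. As a hedged illustration only, the sketch below uses the Tanimoto coefficient, a common choice for fingerprint comparison, over fingerprints represented as sets of integer feature ids. The function names, the fingerprint representation, and the threshold are assumptions made for this example.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of feature ids."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        # Two empty fingerprints share nothing measurable; report 0.0 by convention.
        return 0.0
    return len(a & b) / len(union)


def similarity_search(query_fp, library, threshold=0.5):
    """Return (name, score) pairs scoring at or above threshold, best first.

    `library` maps a compound name to its fingerprint; both are hypothetical
    stand-ins for whatever a real structure database would store.
    """
    hits = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    hits = [hit for hit in hits if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))
```

Exact-structure and substructure search, also listed in the abstract, need graph matching and are not covered by this set-overlap sketch.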
Utilizing semantic networks to database and retrieve generalized stochastic colored Petri nets

NASA Technical Reports Server (NTRS)

Farah, Jeffrey J.; Kelley, Robert B.

1992-01-01

Previous work has introduced the Planning Coordinator (PCOORD), a coordinator functioning within the hierarchy of the Intelligent Machine Mode. Within the structure of the Planning Coordinator resides the Primitive Structure Database (PSDB) functioning to provide the primitive structures utilized by the Planning Coordinator in the establishing of error recovery or on-line path plans. This report further explores the Primitive Structure Database and establishes the potential of utilizing semantic networks as a means of efficiently storing and retrieving the Generalized Stochastic Colored Petri Nets from which the error recovery plans are derived.
An approach in building a chemical compound search engine in oracle database.

PubMed

Wang, H; Volarath, P; Harrison, R

2005-01-01

Searching for or identifying chemical compounds is an important process in drug design and in chemistry research. An efficient search engine involves a close coupling of the search algorithm and the database implementation. The database must process chemical structures, which demands approaches to represent, store, and retrieve structures in a database system. In this paper, a general database framework for working as a chemical compound search engine in an Oracle database is described. The framework is designed to eliminate data type constraints for potential search algorithms, which is a crucial step toward building a domain-specific query language on top of SQL. A search engine implementation based on the database framework is also demonstrated. The convenience of the implementation emphasizes the efficiency and simplicity of the framework.
Database citation in full text biomedical articles.

PubMed

Kafkas, Şenay; Kim, Jee-Hyub; McEntyre, Johanna R

2013-01-01

Molecular biology and literature databases represent essential infrastructure for life science research. Effective integration of these data resources requires that there are structured cross-references at the level of individual articles and biological records. Here, we describe the current patterns of how database entries are cited in research articles, based on analysis of the full text Open Access articles available from Europe PMC. Focusing on citation of entries in the European Nucleotide Archive (ENA), UniProt and Protein Data Bank, Europe (PDBe), we demonstrate that text mining doubles the number of structured annotations of database record citations supplied in journal articles by publishers. Many thousands of new literature-database relationships are found by text mining, since these relationships are also not present in the set of articles cited by database records. We recommend that structured annotation of database records in articles is extended to other databases, such as ArrayExpress and Pfam, entries from which are also cited widely in the literature. The very high precision and high-throughput of this text-mining pipeline makes this activity possible both accurately and at low cost, which will allow the development of new integrated data services.

Database Citation in Full Text Biomedical Articles

PubMed Central

Kafkas, Şenay; Kim, Jee-Hyub; McEntyre, Johanna R.

2013-01-01

Molecular biology and literature databases represent essential infrastructure for life science research. Effective integration of these data resources requires that there are structured cross-references at the level of individual articles and biological records. Here, we describe the current patterns of how database entries are cited in research articles, based on analysis of the full text Open Access articles available from Europe PMC. Focusing on citation of entries in the European Nucleotide Archive (ENA), UniProt and Protein Data Bank, Europe (PDBe), we demonstrate that text mining doubles the number of structured annotations of database record citations supplied in journal articles by publishers. Many thousands of new literature-database relationships are found by text mining, since these relationships are also not present in the set of articles cited by database records. We recommend that structured annotation of database records in articles is extended to other databases, such as ArrayExpress and Pfam, entries from which are also cited widely in the literature. The very high precision and high-throughput of this text-mining pipeline makes this activity possible both accurately and at low cost, which will allow the development of new integrated data services. PMID:23734176
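To make the idea of text-mining database citations concrete, here is a deliberately simplified sketch that is not the authors' pipeline. It finds PDB identifiers only when a "PDB" cue word precedes them, and UniProt accessions using the accession pattern published by UniProt. The cue-word heuristic, the function name, and the sample sentence in the usage note are assumptions for illustration.

```python
import re

# A PDB id is four characters starting with a digit 1-9; requiring a nearby
# "PDB" cue word is an assumed heuristic to reduce false positives.
PDB_RE = re.compile(
    r"\bPDB(?:\s+(?:ID|code|entry))?[:\s]+([1-9][A-Za-z0-9]{3})\b",
    re.IGNORECASE,
)

# UniProt accession format (6 or 10 characters).
UNIPROT_RE = re.compile(
    r"\b(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})\b"
)


def find_citations(text):
    """Return candidate database accessions mentioned in free text."""
    return {
        "pdb": sorted({match.upper() for match in PDB_RE.findall(text)}),
        "uniprot": sorted(set(UNIPROT_RE.findall(text))),
    }
```

For example, `find_citations("The structure (PDB ID: 1abc) of the kinase (UniProt P12345) was solved.")` yields one PDB candidate and one UniProt candidate. A production pipeline would also need to validate candidates against the databases themselves, since pattern matches alone can be ordinary tokens.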
National Center for Biotechnology Information

MedlinePlus

... Splign Vector Alignment Search Tool (VAST) All Data & Software Resources... Domains & Structures BioSystems Cn3D Conserved Domain Database (CDD) Conserved Domain Search Service (CD Search) Structure (Molecular Modeling Database) Vector Alignment ...
New Powder Diffraction File (PDF-4) in relational database format: advantages and data-mining capabilities.

PubMed

Kabekkodu, Soorya N; Faber, John; Fawcett, Tim

2002-06-01

The International Centre for Diffraction Data (ICDD) is responding to the changing needs in powder diffraction and materials analysis by developing the Powder Diffraction File (PDF) in a very flexible relational database (RDB) format. The PDF now contains 136,895 powder diffraction patterns. In this paper, an attempt is made to give an overview of the PDF-4, search/match methods and the advantages of having the PDF-4 in RDB format. Some case studies have been carried out to search for crystallization trends, properties, frequencies of space groups and prototype structures. These studies give a good understanding of the basic structural aspects of classes of compounds present in the database. The present paper also reports data-mining techniques and demonstrates the power of a relational database over the traditional (flat-file) database structures.
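A small sketch can show why a relational layout suits the kind of data mining mentioned above, such as counting how often each space group occurs. The table name `pattern`, its columns, and any rows passed in are hypothetical; they do not reflect the actual PDF-4 schema or its contents.

```python
import sqlite3


def space_group_frequencies(rows):
    """Count entries per space group with a single GROUP BY query.

    `rows` is an iterable of (phase_name, space_group) pairs; the values
    used by callers are toy placeholders, not real diffraction data.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pattern ("
        "entry_id INTEGER PRIMARY KEY, phase TEXT, space_group TEXT)"
    )
    conn.executemany(
        "INSERT INTO pattern (phase, space_group) VALUES (?, ?)", rows
    )
    cur = conn.execute(
        "SELECT space_group, COUNT(*) AS n FROM pattern "
        "GROUP BY space_group ORDER BY n DESC, space_group ASC"
    )
    result = cur.fetchall()
    conn.close()
    return result
```

With a flat file, the same tally would require parsing every record in application code; in a relational store it is one declarative query that the engine can optimize.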
An Extensible Schema-less Database Framework for Managing High-throughput Semi-Structured Documents

NASA Technical Reports Server (NTRS)

Maluf, David A.; Tran, Peter B.; La, Tracy; Clancy, Daniel (Technical Monitor)

2002-01-01

An object-relational database management system is an integrated hybrid cooperative approach that combines the best practices of both the relational model utilizing SQL queries and the object-oriented, semantic paradigm for supporting complex data creation. In this paper, a highly scalable, information-on-demand database framework called NETMARK is introduced. NETMARK takes advantage of the Oracle 8i object-relational database, using physical address data types for very efficient keyword searches of records for both context and content. NETMARK was originally developed in early 2000 as a research and development prototype to manage the vast amounts of unstructured and semi-structured documents existing within NASA enterprises. Today, NETMARK is a flexible, high-throughput open database framework for managing, storing, and searching unstructured or semi-structured arbitrary hierarchical models, XML and HTML.
NETMARK: A Schema-less Extension for Relational Databases for Managing Semi-structured Data Dynamically

NASA Technical Reports Server (NTRS)

Maluf, David A.; Tran, Peter B.

2003-01-01

An object-relational database management system is an integrated hybrid cooperative approach that combines the best practices of both the relational model utilizing SQL queries and the object-oriented, semantic paradigm for supporting complex data creation. In this paper, a highly scalable, information-on-demand database framework called NETMARK is introduced. NETMARK takes advantage of the Oracle 8i object-relational database, using physical address data types for very efficient keyword search of records spanning both context and content. NETMARK was originally developed in early 2000 as a research and development prototype to manage the vast amounts of unstructured and semi-structured documents existing within NASA enterprises. Today, NETMARK is a flexible, high-throughput open database framework for managing, storing, and searching unstructured or semi-structured arbitrary hierarchical models, such as XML and HTML.
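One way to picture a schema-less store layered on a relational engine is a single generic node table that holds every element of every document, with the element name as "context" and its text as "content". The sketch below uses SQLite and Python's standard XML parser; it is an assumed illustration of the general idea and not NETMARK's actual Oracle-based design, and all names in it are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def open_store():
    """Create one generic table that can hold nodes from any XML document."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE node ("
        "node_id INTEGER PRIMARY KEY, doc_id TEXT, parent_id INTEGER, "
        "context TEXT, content TEXT)"
    )
    return conn


def load_document(conn, doc_id, xml_text):
    """Flatten an XML tree into rows; no per-document schema is required."""
    def walk(elem, parent_id):
        cur = conn.execute(
            "INSERT INTO node (doc_id, parent_id, context, content) "
            "VALUES (?, ?, ?, ?)",
            (doc_id, parent_id, elem.tag, (elem.text or "").strip()),
        )
        node_id = cur.lastrowid
        for child in elem:
            walk(child, node_id)

    walk(ET.fromstring(xml_text), None)


def search(conn, context, keyword):
    """Keyword search restricted to nodes with the given context (tag name)."""
    cur = conn.execute(
        "SELECT doc_id, content FROM node "
        "WHERE context = ? AND content LIKE ? ORDER BY node_id",
        (context, f"%{keyword}%"),
    )
    return cur.fetchall()
```

The parent pointer keeps the hierarchy recoverable, so documents with entirely different element names can share the same table and still be queried by context and content.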
The Histone Database: an integrated resource for histones and histone fold-containing proteins

PubMed Central

Mariño-Ramírez, Leonardo; Levine, Kevin M.; Morales, Mario; Zhang, Suiyuan; Moreland, R. Travis; Baxevanis, Andreas D.; Landsman, David

2011-01-01

Eukaryotic chromatin is composed of DNA and protein components—core histones—that act to compactly pack the DNA into nucleosomes, the fundamental building blocks of chromatin. These nucleosomes are connected to adjacent nucleosomes by linker histones. Nucleosomes are highly dynamic and, through various core histone post-translational modifications and incorporation of diverse histone variants, can serve as epigenetic marks to control processes such as gene expression and recombination. The Histone Sequence Database is a curated collection of sequences and structures of histones and non-histone proteins containing histone folds, assembled from major public databases. Here, we report a substantial increase in the number of sequences and taxonomic coverage for histone and histone fold-containing proteins available in the database. Additionally, the database now contains an expanded dataset that includes archaeal histone sequences. The database also provides comprehensive multiple sequence alignments for each of the four core histones (H2A, H2B, H3 and H4), the linker histones (H1/H5) and the archaeal histones. The database also includes current information on solved histone fold-containing structures. The Histone Sequence Database is an inclusive resource for the analysis of chromatin structure and function focused on histones and histone fold-containing proteins. Database URL: The Histone Sequence Database is freely available and can be accessed at http://research.nhgri.nih.gov/histones/. PMID:22025671
SCOPe: Manual Curation and Artifact Removal in the Structural Classification of Proteins - extended Database.

PubMed

Chandonia, John-Marc; Fox, Naomi K; Brenner, Steven E

2017-02-03

SCOPe (Structural Classification of Proteins-extended, http://scop.berkeley.edu) is a database of relationships between protein structures that extends the Structural Classification of Proteins (SCOP) database. SCOP is an expert-curated ordering of domains from the majority of proteins of known structure in a hierarchy according to structural and evolutionary relationships. SCOPe classifies the majority of protein structures released since SCOP development concluded in 2009, using a combination of manual curation and highly precise automated tools, aiming to have the same accuracy as fully hand-curated SCOP releases. SCOPe also incorporates and updates the ASTRAL compendium, which provides several databases and tools to aid in the analysis of the sequences and structures of proteins classified in SCOPe. SCOPe continues high-quality manual classification of new superfamilies, a key feature of SCOP. Artifacts such as expression tags are now separated into their own class, in order to distinguish them from the homology-based annotations in the remainder of the SCOPe hierarchy. SCOPe 2.06 contains 77,439 Protein Data Bank entries, double the 38,221 structures classified in SCOP. Copyright © 2016 The Author(s). Published by Elsevier Ltd. All rights reserved.
3DNALandscapes: a database for exploring the conformational features of DNA.

PubMed

Zheng, Guohui; Colasanti, Andrew V; Lu, Xiang-Jun; Olson, Wilma K

2010-01-01

3DNALandscapes, located at: http://3DNAscapes.rutgers.edu, is a new database for exploring the conformational features of DNA. In contrast to most structural databases, which archive the Cartesian coordinates and/or derived parameters and images for individual structures, 3DNALandscapes enables searches of conformational information across multiple structures. The database contains a wide variety of structural parameters and molecular images, computed with the 3DNA software package and known to be useful for characterizing and understanding the sequence-dependent spatial arrangements of the DNA sugar-phosphate backbone, sugar-base side groups, base pairs, base-pair steps, groove structure, etc. The data comprise all DNA-containing structures--both free and bound to proteins, drugs and other ligands--currently available in the Protein Data Bank. The web interface allows the user to link, report, plot and analyze this information from numerous perspectives and thereby gain insight into DNA conformation, deformability and interactions in different sequence and structural contexts. The data accumulated from known, well-resolved DNA structures can serve as useful benchmarks for the analysis and simulation of new structures. The collective data can also help to understand how DNA deforms in response to proteins and other molecules and undergoes conformational rearrangements.
GALT protein database: querying structural and functional features of GALT enzyme.

PubMed

d'Acierno, Antonio; Facchiano, Angelo; Marabotti, Anna

2014-09-01

Knowledge of the impact of variations on protein structure can enhance the comprehension of the mechanisms of genetic diseases related to that protein. Here, we present a new version of GALT Protein Database, a Web-accessible data repository for the storage and interrogation of structural effects of variations of the enzyme galactose-1-phosphate uridylyltransferase (GALT), the impairment of which leads to classic Galactosemia, a rare genetic disease. This new version of this database now contains the models of 201 missense variants of GALT enzyme, including heterozygous variants, and it allows users not only to retrieve information about the missense variations affecting this protein, but also to investigate their impact on substrate binding, intersubunit interactions, stability, and other structural features. In addition, it allows the interactive visualization of the models of variants collected into the database. We have developed additional tools to improve the use of the database by nonspecialized users. This Web-accessible database (http://bioinformatica.isa.cnr.it/GALT/GALT2.0) represents a model of tools potentially suitable for application to other proteins that are involved in human pathologies and that are subjected to genetic variations. © 2014 WILEY PERIODICALS, INC.
STCRDab: the structural T-cell receptor database

PubMed Central

de Oliveira, Saulo H P; Krawczyk, Konrad

2018-01-01

The Structural T-cell Receptor Database (STCRDab; http://opig.stats.ox.ac.uk/webapps/stcrdab) is an online resource that automatically collects and curates TCR structural data from the Protein Data Bank. For each entry, the database provides annotations, such as the α/β or γ/δ chain pairings, major histocompatibility complex details, and where available, antigen binding affinities. In addition, the orientation between the variable domains and the canonical forms of the complementarity-determining region loops are also provided. Users can select, view, and download individual or bulk sets of structures based on these criteria. Where available, STCRDab also finds antibody structures that are similar to TCRs, helping users explore the relationship between TCRs and antibodies. PMID:29087479
Environmental modeling and recognition for an autonomous land vehicle

NASA Technical Reports Server (NTRS)

Lawton, D. T.; Levitt, T. S.; Mcconnell, C. C.; Nelson, P. C.

1987-01-01

An architecture for object modeling and recognition for an autonomous land vehicle is presented. Examples of objects of interest include terrain features, fields, roads, horizon features, trees, etc. The architecture is organized around a set of data bases for generic object models and perceptual structures, temporary memory for the instantiation of object and relational hypotheses, and a long term memory for storing stable hypotheses that are affixed to the terrain representation. Multiple inference processes operate over these databases. Researchers describe these particular components: the perceptual structure database, the grouping processes that operate over this, schemas, and the long term terrain database. A processing example that matches predictions from the long term terrain model to imagery, extracts significant perceptual structures for consideration as potential landmarks, and extracts a relational structure to update the long term terrain database is given.
Caregiving-Related Sleep Problems and Their Relationship to Mental Health and Daytime Function in Female Veterans.

PubMed

Song, Yeonsu; Washington, Donna L; Yano, Elizabeth M; McCurry, Susan M; Fung, Constance H; Dzierzewski, Joseph M; Rodriguez, Juan Carlos; Jouldjian, Stella; Mitchell, Michael N; Alessi, Cathy A; Martin, Jennifer L

2018-01-01

To identify caregiving-related sleep problems and their relationship to mental health and daytime function in female Veterans. Female Veterans (N = 1,477) from cross-sectional, nationwide, postal survey data. The survey respondent characteristics included demographics, comorbidity, physical activity, health, use of sleep medications, and history of sleep apnea. They self-identified caregiving-related sleep problems (i.e., those who had trouble sleeping because of caring for a sick adult, an infant/child, or other respondents). Patient Health Questionnaire (PHQ-4) was used to assess mental health, and daytime function was measured using 11 items of International Classification of Sleep Disorders-2 (ICSD-2). Female Veterans with self-identified sleep problems due to caring for a sick adult (n = 59) experienced significantly more symptoms of depression and anxiety (p < 0.001) and impairment in daytime function (e.g., fatigue, daytime sleepiness, loss of concentration, p < 0.001) than those with self-identified sleep problems due to caring for an infant or child (n = 95) or all other respondents (n = 1,323) after controlling for the respondent characteristics. Healthcare providers should pay attention to assessing sleep characteristics of female Veterans with caregiving responsibilities, particularly those caregiving for a sick adult.
PASS2: an automated database of protein alignments organised as structural superfamilies.

PubMed

Bhaduri, Anirban; Pugalenthi, Ganesan; Sowdhamini, Ramanathan

2004-04-02

The functional selection and three-dimensional structural constraints of proteins in nature often relate to the retention of significant sequence similarity between proteins of similar fold and function despite poor sequence identity. Organization of structure-based sequence alignments for distantly related proteins provides a map of the conserved and critical regions of the protein universe that is useful for the analysis of folding principles, for the evolutionary unification of protein families and for maximizing the information return from experimental structure determination. The Protein Alignment organised as Structural Superfamily (PASS2) database represents continuously updated, structural alignments for evolutionarily related, sequentially distant proteins. An automated and updated version of PASS2 is in direct correspondence with SCOP 1.63 and consists of sequences having identity below 40% among themselves. Protein domains have been grouped into 628 multi-member superfamilies and 566 single-member superfamilies. Structure-based sequence alignments for the superfamilies have been obtained using COMPARER, while initial equivalencies have been derived from a preliminary superposition using LSQMAN or STAMP 4.0. The final sequence alignments have been annotated for structural features using JOY4.0. The database is supplemented with sequence relatives belonging to different genomes, conserved spatially interacting and structural motifs, probabilistic hidden Markov models of superfamilies based on the alignments and useful links to other databases. Probabilistic models and sensitive position-specific profiles obtained from reliable superfamily alignments aid annotation of remote homologues and are useful tools in structural and functional genomics. PASS2 presents the phylogeny of its members based on both sequence and structural dissimilarities. Clustering of members allows us to understand diversification of the family members.
The search engine has been improved for simpler browsing of the database. The database resolves alignments among the structural domains consisting of evolutionarily diverged set of sequences. Availability of reliable sequence alignments of distantly related proteins despite poor sequence identity and single-member superfamilies permit better sampling of structures in libraries for fold recognition of new sequences and for the understanding of protein structure-function relationships of individual superfamilies. PASS2 is accessible at http://www.ncbs.res.in/~faculty/mini/campass/pass2.html
Structure elucidation of organic compounds aided by the computer program system SCANNET

NASA Astrophysics Data System (ADS)

Guzowska-Swider, B.; Hippe, Z. S.

1992-12-01

Recognition of chemical structure is a very important problem currently solved by molecular spectroscopy, particularly IR, UV, NMR and Raman spectroscopy, and mass spectrometry. Nowadays, solution of the problem is frequently aided by the computer. SCANNET is a computer program system for structure elucidation of organic compounds, developed by our group. The structure recognition of an unknown substance is made by comparing its spectrum with successive reference spectra of standard compounds, i.e. chemical compounds of known chemical structure, stored in a spectral database. The computer program system SCANNET consists of six different spectral databases for the following analytical methods: IR, UV, 13C-NMR, 1H-NMR and Raman spectroscopy, and mass spectrometry. A chemist, to elucidate a structure, can use one of these spectral methods or a combination of them and search the appropriate databases. As the result of searching each spectral database, the user obtains a list of chemical substances whose spectra are identical and/or similar to the spectrum input into the computer. The final information obtained from searching the spectral databases is in the form of a list of chemical substances having all the examined spectra, for each type of spectroscopy, identical or similar to those of the unknown compound.
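The search-and-combine workflow described in this abstract can be illustrated with a minimal sketch. It is not SCANNET's algorithm: the abstract does not say how spectra are compared, so cosine similarity, the 0.9 threshold, and the function names below are all assumptions chosen for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two intensity vectors sampled on the same grid."""
    if len(a) != len(b):
        raise ValueError("spectra must be sampled on the same grid")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_library(unknown, library, threshold=0.9):
    """Return (compound, score) pairs scoring at least `threshold`, best first.

    `library` maps a compound name to its reference spectrum for ONE method
    (e.g. IR). The threshold value is an assumption, not taken from the source.
    """
    hits = [(name, cosine_similarity(unknown, ref)) for name, ref in library.items()]
    hits = [(name, score) for name, score in hits if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


def combine_hits(hit_lists):
    """Keep only compounds that appear in the hit list of every method searched."""
    name_sets = [{name for name, _ in hits} for hits in hit_lists]
    return set.intersection(*name_sets) if name_sets else set()
```

Searching one library per spectral method and intersecting the hit lists mirrors the final step in the abstract, where the reported substances match the unknown in every examined type of spectrum.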
GALT protein database, a bioinformatics resource for the management and analysis of structural features of a galactosemia-related protein and its mutants.

PubMed

d'Acierno, Antonio; Facchiano, Angelo; Marabotti, Anna

2009-06-01

We describe the GALT-Prot database and its related web-based application that have been developed to collect information about the structural and functional effects of mutations on the human enzyme galactose-1-phosphate uridyltransferase (GALT) involved in the genetic disease named galactosemia type I. Besides a list of missense mutations at gene and protein sequence levels, GALT-Prot reports the analysis results of mutant GALT structures. In addition to the structural information about the wild-type enzyme, the database also includes structures of over 100 single point mutants simulated by means of a computational procedure, and each mutant was analysed with several bioinformatics programs in order to investigate the effect of the mutations. The web-based interface allows querying of the database, and several links are also provided in order to guarantee a high integration with other resources already present on the web. Moreover, the architecture of the database and the web application is flexible and can be easily adapted to store data related to other proteins with point mutations. GALT-Prot is freely available at http://bioinformatica.isa.cnr.it/GALT/.
IMGT/3Dstructure-DB and IMGT/StructuralQuery, a database and a tool for immunoglobulin, T cell receptor and MHC structural data

PubMed Central

Kaas, Quentin; Ruiz, Manuel; Lefranc, Marie-Paule

2004-01-01

IMGT/3Dstructure-DB and IMGT/Structural-Query are a novel 3D structure database and a new tool for immunological proteins. They are part of IMGT, the international ImMunoGenetics information system®, a high-quality integrated knowledge resource specializing in immunoglobulins (IG), T cell receptors (TR), major histocompatibility complex (MHC) and related proteins of the immune system (RPI) of human and other vertebrate species, which consists of databases, Web resources and interactive on-line tools. IMGT/3Dstructure-DB data are described according to the IMGT Scientific chart rules based on the IMGT-ONTOLOGY concepts. IMGT/3Dstructure-DB provides IMGT gene and allele identification of IG, TR and MHC proteins with known 3D structures, domain delimitations, amino acid positions according to the IMGT unique numbering and renumbered coordinate flat files. Moreover IMGT/3Dstructure-DB provides 2D graphical representations (or Collier de Perles) and results of contact analysis. The IMGT/StructuralQuery tool allows search of this database based on specific structural characteristics. IMGT/3Dstructure-DB and IMGT/StructuralQuery are freely available at http://imgt.cines.fr. PMID:14681396
bpRNA: large-scale automated annotation and analysis of RNA secondary structure.

PubMed

Danaee, Padideh; Rouches, Mason; Wiley, Michelle; Deng, Dezhong; Huang, Liang; Hendrix, David

2018-05-09

While RNA secondary structure prediction from sequence data has made remarkable progress, there is a need for improved strategies for annotating the features of RNA secondary structures. Here, we present bpRNA, a novel annotation tool capable of parsing RNA structures, including complex pseudoknot-containing RNAs, to yield an objective, precise, compact, unambiguous, easily-interpretable description of all loops, stems, and pseudoknots, along with the positions, sequence, and flanking base pairs of each such structural feature. We also introduce several new informative representations of RNA structure types to improve structure visualization and interpretation. We have further used bpRNA to generate a web-accessible meta-database, 'bpRNA-1m', of over 100 000 single-molecule, known secondary structures; this is both more fully and accurately annotated and over 20-times larger than existing databases. We use a subset of the database with highly similar (≥90% identical) sequences filtered out to report on statistical trends in sequence, flanking base pairs, and length. Both the bpRNA method and the bpRNA-1m database will be valuable resources both for specific analysis of individual RNA molecules and large-scale analyses such as are useful for updating RNA energy parameters for computational thermodynamic predictions, improving machine learning models for structure prediction, and for benchmarking structure-prediction algorithms.
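As a rough illustration of what parsing a secondary structure into stems involves, the sketch below reads a dot-bracket string and groups stacked base pairs. This is not the bpRNA implementation: the dot-bracket input format, the bracket alphabet used for pseudoknots, 0-based positions, and the function names are assumptions made for this example.

```python
def parse_pairs(dot_bracket):
    """Return sorted 0-based (i, j) base pairs from a dot-bracket string.

    Each bracket type has its own stack, so crossing (pseudoknotted) pairs
    written with a different bracket type, e.g. '[' and ']', are handled.
    """
    openers = {"(": ")", "[": "]", "{": "}", "<": ">"}
    closers = {close: open_ for open_, close in openers.items()}
    stacks = {open_: [] for open_ in openers}
    pairs = []
    for pos, ch in enumerate(dot_bracket):
        if ch in openers:
            stacks[ch].append(pos)
        elif ch in closers:
            stack = stacks[closers[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    for open_, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched {open_!r} at position {stack[-1]}")
    return sorted(pairs)


def group_stems(pairs):
    """Group directly stacked pairs, (i, j) followed by (i + 1, j - 1), into stems."""
    stems = []
    for i, j in sorted(pairs):
        if stems and stems[-1][-1] == (i - 1, j + 1):
            stems[-1].append((i, j))
        else:
            stems.append([(i, j)])
    return stems
```

Loops and flanking base pairs, which the abstract also lists, would be derived from the unpaired positions between these stems; that step is omitted here.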
YAdumper: extracting and translating large information volumes from relational databases to structured flat files.

PubMed

Fernández, José M; Valencia, Alfonso

2004-10-12

Downloading the information stored in relational databases into XML and other flat formats is a common task in bioinformatics. This periodical dumping of information requires considerable CPU time, disk and memory resources. YAdumper has been developed as a purpose-specific tool to deal with the integral structured information download of relational databases. YAdumper is a Java application that organizes database extraction following an XML template based on an external Document Type Declaration. Compared with other non-native alternatives, YAdumper substantially reduces memory requirements and considerably improves writing performance.
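The memory saving claimed here generally comes from streaming rows to the output instead of building the whole document in memory. The sketch below shows that idea only. YAdumper itself is a Java application driven by an XML template and a DTD, so the use of Python, sqlite3, the flat row/column layout, and the function name `dump_table_to_xml` are all assumptions for illustration.

```python
import io
import sqlite3
from xml.sax.saxutils import XMLGenerator


def dump_table_to_xml(conn, table, columns, out, batch_size=1000):
    """Stream one table to `out` as XML, holding at most `batch_size` rows in memory."""
    # Identifiers cannot be bound as SQL parameters, so accept only simple names.
    for name in [table, *columns]:
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")

    xml = XMLGenerator(out, encoding="utf-8")
    xml.startDocument()
    xml.startElement(table + "_dump", {})

    cursor = conn.execute(f"SELECT {', '.join(columns)} FROM {table}")
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        for row in rows:
            xml.startElement("row", {})
            for column, value in zip(columns, row):
                xml.startElement(column, {})
                # characters() escapes &, < and > in the text content.
                xml.characters("" if value is None else str(value))
                xml.endElement(column)
            xml.endElement("row")

    xml.endElement(table + "_dump")
    xml.endDocument()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE protein (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO protein VALUES (?, ?)", [(1, "GALT"), (2, "a<b")])
    buffer = io.StringIO()
    dump_table_to_xml(conn, "protein", ["id", "name"], buffer)
    print(buffer.getvalue())
```

Writing each element as soon as its row is fetched keeps memory use tied to the batch size, not to the size of the table.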
Integrative interactive visualization of crystal structure, band structure, and Brillouin zone

NASA Astrophysics Data System (ADS)

Hanson, Robert; Hinke, Ben; van Koevering, Matthew; Oses, Corey; Toher, Cormac; Hicks, David; Gossett, Eric; Plata Ramos, Jose; Curtarolo, Stefano; Aflow Collaboration

The AFLOW library is an open-access database for high throughput ab-initio calculations that serves as a resource for the dissemination of computational results in the area of materials science. Our project aims to create an interactive web-based visualization of any structure in the AFLOW database that has associated band structure data in a way that allows novel simultaneous exploration of the crystal structure, band structure, and Brillouin zone. Interactivity is obtained using two synchronized JSmol implementations, one for the crystal structure and one for the Brillouin zone, along with a D3-based band-structure diagram produced on the fly from data obtained from the AFLOW database. The current website portal (http://aflowlib.mems.duke.edu/users/jmolers/matt/website) allows interactive access and visualization of crystal structure, Brillouin zone and band structure for more than 55,000 inorganic crystal structures. This work was supported by the US Navy Office of Naval Research through a Broad Area Announcement administered by Duke University.
Search extension transforms Wiki into a relational system: a case for flavonoid metabolite database.

PubMed

Arita, Masanori; Suwa, Kazuhiro

2008-09-17

In computer science, database systems are based on the relational model founded by Edgar Codd in 1970. On the other hand, in the area of biology the word 'database' often refers to loosely formatted, very large text files. Although such bio-databases may describe conflicts or ambiguities (e.g. a protein pair do and do not interact, or unknown parameters) in a positive sense, the flexibility of the data format sacrifices a systematic query mechanism equivalent to the widely used SQL. To overcome this disadvantage, we propose embeddable string-search commands on a Wiki-based system and designed a half-formatted database. As proof of principle, a database of flavonoid with 6902 molecular structures from over 1687 plant species was implemented on MediaWiki, the background system of Wikipedia. Registered users can describe any information in an arbitrary format. Structured part is subject to text-string searches to realize relational operations. The system was written in PHP language as the extension of MediaWiki. All modifications are open-source and publicly available. This scheme benefits from both the free-formatted Wiki style and the concise and structured relational-database style. MediaWiki supports multi-user environments for document management, and the cost for database maintenance is alleviated.

Search extension transforms Wiki into a relational system: A case for flavonoid metabolite database

PubMed Central

Arita, Masanori; Suwa, Kazuhiro

2008-01-01

Background In computer science, database systems are based on the relational model founded by Edgar Codd in 1970. On the other hand, in the area of biology the word 'database' often refers to loosely formatted, very large text files. Although such bio-databases may describe conflicts or ambiguities (e.g. a protein pair do and do not interact, or unknown parameters) in a positive sense, the flexibility of the data format sacrifices a systematic query mechanism equivalent to the widely used SQL. Results To overcome this disadvantage, we propose embeddable string-search commands on a Wiki-based system and designed a half-formatted database. As proof of principle, a database of flavonoid with 6902 molecular structures from over 1687 plant species was implemented on MediaWiki, the background system of Wikipedia. Registered users can describe any information in an arbitrary format. Structured part is subject to text-string searches to realize relational operations. The system was written in PHP language as the extension of MediaWiki. All modifications are open-source and publicly available. Conclusion This scheme benefits from both the free-formatted Wiki style and the concise and structured relational-database style. MediaWiki supports multi-user environments for document management, and the cost for database maintenance is alleviated. PMID:18822113
Kentucky geotechnical database.

DOT National Transportation Integrated Search

2005-03-01

Development of a comprehensive dynamic, geotechnical database is described. Computer software selected to program the client/server application in windows environment, components and structure of the geotechnical database, and primary factors cons...
DISTRIBUTED STRUCTURE-SEARCHABLE TOXICITY (DSSTOX) DATABASE NETWORK: MAKING PUBLIC TOXICITY DATA RESOURCES MORE ACCESSIBLE AND USABLE FOR DATA EXPLORATION AND SAR DEVELOPMENT

EPA Science Inventory

Distributed Structure-Searchable Toxicity (DSSTox) Database Network: Making Public Toxicity Data Resources More Accessible and Usable for Data Exploration and SAR Development

Many sources of public toxicity data are not currently linked to chemical structure, are not ...
Social media based NPL system to find and retrieve ARM data: Concept paper

DOE Office of Scientific and Technical Information (OSTI.GOV)

Devarakonda, Ranjeet; Giansiracusa, Michael T.; Kumar, Jitendra

Information connectivity and retrieval has a role in our daily lives. The most pervasive source of online information is databases. The amount of data is growing at a rapid rate and database technology is improving and having a profound effect. Almost all online applications are storing and retrieving information from databases. One challenge in supplying the public with wider access to informational databases is the need for knowledge of database languages like Structured Query Language (SQL). Although the SQL language has been published in many forms, not everybody is able to write SQL queries. Another challenge is that it may not be practical to make the public aware of the structure of the database. There is a need for novice users to query relational databases using their natural language. To solve this problem, many natural language interfaces to structured databases have been developed. The goal is to provide a more intuitive method for generating database queries and delivering responses. Social media makes it possible to interact with a wide section of the population. Through this medium, and with the help of Natural Language Processing (NLP) we can make the data of the Atmospheric Radiation Measurement Data Center (ADC) more accessible to the public. We propose an architecture for using Apache Lucene/Solr [1], OpenML [2,3], and Kafka [4] to generate an automated query/response system with inputs from Twitter [5], our Cassandra DB, and our log database. Using the Twitter API and NLP we can give the public the ability to ask questions of our database and get automated responses.
DBAASP v.2: an enhanced database of structure and antimicrobial/cytotoxic activity of natural and synthetic peptides

PubMed Central

Pirtskhalava, Malak; Gabrielian, Andrei; Cruz, Phillip; Griggs, Hannah L.; Squires, R. Burke; Hurt, Darrell E.; Grigolava, Maia; Chubinidze, Mindia; Gogoladze, George; Vishnepolsky, Boris; Alekseev, Vsevolod; Rosenthal, Alex; Tartakovsky, Michael

2016-01-01

Antimicrobial peptides (AMPs) are anti-infectives that may represent a novel and untapped class of biotherapeutics. Increasing interest in AMPs means that new peptides (natural and synthetic) are discovered faster than ever before. We describe herein a new version of the Database of Antimicrobial Activity and Structure of Peptides (DBAASPv.2, which is freely accessible at http://dbaasp.org). This iteration of the database reports chemical structures and empirically-determined activities (MICs, IC50, etc.) against more than 4200 specific target microbes for more than 2000 ribosomal, 80 non-ribosomal and 5700 synthetic peptides. Of these, the vast majority are monomeric, but nearly 200 of these peptides are found as homo- or heterodimers. More than 6100 of the peptides are linear, but about 515 are cyclic and more than 1300 have other intra-chain covalent bonds. More than half of the entries in the database were added after the resource was initially described, which reflects the recent sharp uptick of interest in AMPs. New features of DBAASPv.2 include: (i) user-friendly utilities and reporting functions, (ii) a ‘Ranking Search’ function to query the database by target species and return a ranked list of peptides with activity against that target and (iii) structural descriptions of the peptides derived from empirical data or calculated by molecular dynamics (MD) simulations. The three-dimensional structural data are critical components for understanding structure–activity relationships and for design of new antimicrobial drugs. We created more than 300 high-throughput MD simulations specifically for inclusion in DBAASP. The resulting structures are described in the database by novel trajectory analysis plots and movies. Another 200+ DBAASP entries have links to the Protein DataBank. All of the structures are easily visualized directly in the web browser. PMID:26578581
Validation and extraction of molecular-geometry information from small-molecule databases.

PubMed

Long, Fei; Nicholls, Robert A; Emsley, Paul; Gražulis, Saulius; Merkys, Andrius; Vaitkus, Antanas; Murshudov, Garib N

2017-02-01

A freely available small-molecule structure database, the Crystallography Open Database (COD), is used for the extraction of molecular-geometry information on small-molecule compounds. The results are used for the generation of new ligand descriptions, which are subsequently used by macromolecular model-building and structure-refinement software. To increase the reliability of the derived data, and therefore the new ligand descriptions, the entries from this database were subjected to very strict validation. The selection criteria made sure that the crystal structures used to derive atom types, bond and angle classes are of sufficiently high quality. Any suspicious entries at a crystal or molecular level were removed from further consideration. The selection criteria included (i) the resolution of the data used for refinement (entries solved at 0.84 Å resolution or higher) and (ii) the structure-solution method (structures must be from a single-crystal experiment and all atoms of generated molecules must have full occupancies), as well as basic sanity checks such as (iii) consistency between the valences and the number of connections between atoms, (iv) acceptable bond-length deviations from the expected values and (v) detection of atomic collisions. The derived atom types and bond classes were then validated using high-order moment-based statistical techniques. The results of the statistical analyses were fed back to fine-tune the atom typing. The developed procedure was repeated four times, resulting in fine-grained atom typing, bond and angle classes. The procedure will be repeated in the future as and when new entries are deposited in the COD. The whole procedure can also be applied to any source of small-molecule structures, including the Cambridge Structural Database and the ZINC database.
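The five selection criteria listed in this abstract can be read as a sequence of per-entry checks. The sketch below encodes them that way. Only the 0.84 Å resolution cutoff comes from the abstract. The `Entry` fields, the bond-deviation limit of 4.0, and the assumption that each check has already been reduced to a single number or flag are hypothetical and do not reflect the authors' code or the COD data model.

```python
from dataclasses import dataclass

MAX_RESOLUTION_A = 0.84   # from the abstract: 0.84 Å resolution or higher (better)
MAX_BOND_DEVIATION = 4.0  # hypothetical limit, in standard deviations


@dataclass
class Entry:
    """Hypothetical pre-computed summary of one database entry."""
    entry_id: str
    resolution: float          # in Å; smaller numbers mean higher resolution
    single_crystal: bool       # structure came from a single-crystal experiment
    min_occupancy: float       # lowest atomic occupancy in the generated molecules
    valence_consistent: bool   # valences agree with the number of connections
    max_bond_deviation: float  # worst bond-length deviation from expected value
    has_collision: bool        # any pair of atoms implausibly close


def failed_checks(entry):
    """Return the names of the criteria (i)-(v) that the entry fails, in order."""
    failures = []
    if entry.resolution > MAX_RESOLUTION_A:
        failures.append("resolution")
    if not entry.single_crystal:
        failures.append("single_crystal")
    if entry.min_occupancy < 1.0:
        failures.append("occupancy")
    if not entry.valence_consistent:
        failures.append("valence")
    if entry.max_bond_deviation > MAX_BOND_DEVIATION:
        failures.append("bond_length")
    if entry.has_collision:
        failures.append("collision")
    return failures


def select_entries(entries):
    """Keep only entries that pass every check."""
    return [entry for entry in entries if not failed_checks(entry)]
```

Returning the list of failed criteria, instead of a bare pass/fail flag, makes it easy to report why a suspicious entry was removed from further consideration.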
James Webb Space Telescope XML Database: From the Beginning to Today

NASA Technical Reports Server (NTRS)

Gal-Edd, Jonathan; Fatig, Curtis C.

2005-01-01

The James Webb Space Telescope (JWST) Project has been defining, developing, and exercising the use of a common eXtensible Markup Language (XML) for the command and telemetry (C&T) database structure. JWST is the first large NASA space mission to use XML for databases. The JWST project started developing the concepts for the C&T database in 2002. The database will need to last at least 20 years since it will be used beginning with flight software development, continuing through Observatory integration and test (I&T) and through operations. Also, a database tool kit has been provided to the 18 various flight software development laboratories located in the United States, Europe, and Canada that allows the local users to create their own databases. Recently the JWST Project has been working with the Jet Propulsion Laboratory (JPL) and Object Management Group (OMG) XML Telemetry and Command Exchange (XTCE) personnel to provide all the information needed by JWST and JPL for exchanging database information using an XML standard structure. The lack of standardization requires custom ingest scripts for each ground system segment, increasing the cost of the total system. Providing a non-proprietary standard of the telemetry and command database definition formation will allow dissimilar systems to communicate without the need for expensive mission-specific database tools and testing of the systems after the database translation. The various ground system components that would benefit from a standardized database are the telemetry and command systems, archives, simulators, and trending tools. JWST has successfully exchanged the XML database with the Eclipse, EPOCH, and ASIST ground systems, the Portable Spacecraft Simulator (PSS), a front-end system, and the Integrated Trending and Plotting System (ITPS).
This paper will discuss how JWST decided to use XML, the barriers to a new concept, experiences utilizing the XML structure, exchanging databases with other users, and issues that have been experienced in creating databases for the C&T system.
[Historical overview of REM sleep behavior disorder in relation to its pathophysiology].

PubMed

Tachibana, Naoko

2009-05-01

Rapid eye movement (REM) sleep behavior disorder (RBD), which is characterized by dream-enacted, sometimes violent and aggressive, behaviors was first reported by Schenck and his colleagues in 1986; thereafter, it was incorporated as parasomnia in the International Classification of Sleep Disorders 1st edition (ICSD-1). The polysomnographical hallmarks of RBD include intermittent/sustained loss of the skeletal muscle atonia of REM sleep (REM sleep without atonia [RWA]); further, this finding has been mandatory in the diagnostic criterion (requiring polysomnographic [PSG] monitoring) in the ICSD-2 in 2005. The animal equivalent of RBD was previously described by Jouvet's and Morrison's groups, dating back to 1965, when Jouvet's group first created experimentally lesioned cats (in the bilateral pontine tegmentum areas) presenting with "oneiric behaviors". In the 1970s Hishikawa's group had also described a peculiar sleep state in alcoholics and other subjects of drug withdrawal with rapid eye movements and tonically increased chin muscle activity (referred to as "Stage 1-REM with tonic EMG" [Stage 1-REM]). It was difficult to determine from the polysomnographical features whether Stage 1-REM was REM sleep or not, as this state did not preserve proper cyclic appearance of REM sleep. They also reported Stage 1-REM in patients with Shy-Drager syndrome in 1981. The latter finding of Hishikawa's group, together with RBD observed in multiple system atrophy (MSA) reported by other groups, could be best explained by the experimental cat model because of its presumed extensive brainstem pathology. However, neurophysiology of withdrawal states has not been well understood; therefore, Stage 1-REM should be reappraised from new perspectives.
After 1990, more extensive studies on RBD revealed that about half of RBD cases were associated with neurological disorders, especially neurodegenerative diseases pathologically known as synucleinopathies (Parkinson disease [PD], dementia with Lewy bodies, and MSA). In addition, it has been shown that a substantial number of idiopathic RBD (iRBD) patients eventually developed Parkinsonian diseases. In accordance with accumulative data indicating that various non-parkinsonian features can precede the onset of motor symptoms of PD (or pathologically Lewy body diseases), a search of early PD markers in patients with iRBD has been performed. The results of the studies support the hypothesis of RBD as an early sign of a neurodegenerative disorder. More recently, it was reported that RBD is frequently symptomatic of narcolepsy, although the pathophysiological mechanism of this state was still unknown. Reports of RBD in stroke patients have been anecdotal; however, under such conditions, specific lesion studies can be possible, as data in the experimental RBD rats have been accumulated during these few years. In conclusion, RBD is observed in a wide range of neurological disorders, and the causative mechanism of RWA and behavioral manifestations may not only be attributable to brainstem lesions. RBD is not a homogeneous clinical entity, and further refinement of its diagnostic classification is warranted to avoid diagnostic confusion.
XML: James Webb Space Telescope Database Issues, Lessons, and Status

NASA Technical Reports Server (NTRS)

Detter, Ryan; Mooney, Michael; Fatig, Curtis

2003-01-01

This paper will present the current concept using eXtensible Markup Language (XML) as the underlying structure for the James Webb Space Telescope (JWST) database. The purpose of using XML is to provide a JWST database, independent of any portion of the ground system, yet still compatible with the various systems using a variety of different structures. The testing of the JWST Flight Software (FSW) started in 2002, yet the launch is scheduled for 2011 with a planned 5-year mission and a 5-year follow on option. The initial database and ground system elements, including the commands, telemetry, and ground system tools will be used for 19 years, plus post mission activities. During the Integration and Test (I&T) phases of the JWST development, 24 distinct laboratories, each geographically dispersed, will have local database tools with an XML database. Each of these laboratories' database tools will be used for the exporting and importing of data both locally and to a central database system, inputting data to the database certification process, and providing various reports. A centralized certified database repository will be maintained by the Space Telescope Science Institute (STScI), in Baltimore, Maryland, USA. One of the challenges for the database is to be flexible enough to allow for the upgrade, addition or changing of individual items without affecting the entire ground system. Also, using XML should allow for the altering of the import and export formats needed by the various elements, tracking the verification/validation of each database item, allowing many organizations to provide database inputs, and the merging of the many existing database processes into one central database structure throughout the JWST program. Many National Aeronautics and Space Administration (NASA) projects have attempted to take advantage of open source and commercial technology.
Often this causes a greater reliance on the use of Commercial-Off-The-Shelf (COTS), which is often limiting. In our review of the database requirements and the COTS software available, only very expensive COTS software will meet 90% of requirements. Even with the high projected initial cost of COTS, the development and support for custom code over the 19-year mission period was forecasted to be higher than the total licensing costs. A group did look at reusing existing database tools and formats. If the JWST database was already in a mature state, the reuse made sense, but with the database still needing to handle the addition of different types of command and telemetry structures, define new spacecraft systems, and accept input from and export to systems which have not been defined yet, XML provided the flexibility desired. It remains to be determined whether the XML database will reduce the overall cost for the JWST mission.
Prototype of web-based database of surface wave investigation results for site classification

NASA Astrophysics Data System (ADS)

Hayashi, K.; Cakir, R.; Martin, A. J.; Craig, M. S.; Lorenzo, J. M.

2016-12-01

As active and passive surface wave methods become popular for evaluating the site response of earthquake ground motion, demand for a database of investigation results is also increasing. Seismic ground motion depends not only on 1D velocity structure but also on 2D and 3D structures, so spatial information on S-wave velocity must be considered in ground motion prediction. The database can support the construction of 2D and 3D underground models. Inversion in surface wave processing is essentially non-unique, so other information must be combined into the processing. A database of existing geophysical, geological and geotechnical investigation results can provide indispensable information to improve the accuracy and reliability of investigations. Most investigations, however, are carried out by individual organizations, and the results are rarely stored in a unified and organized database. To study and discuss an appropriate database and digital standard format for surface wave investigations, we developed a prototype web-based database to store observed data and processing results of surface wave investigations that we have performed at more than 400 sites in the U.S. and Japan. The database was constructed on a web server using MySQL and PHP so that users can access it through the internet from anywhere with any device. All data are registered in the database with location, and users can search geophysical data through Google Maps. The database stores dispersion curves, horizontal-to-vertical spectral ratios and S-wave velocity profiles at each site, saved in XML files as digital data so that users can review and reuse them. The database also stores a published 3D deep basin and crustal structure, which users can refer to during the processing of surface wave data.
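The location-based lookup described in this abstract can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for the MySQL server named in the abstract; the table name, column names, site identifiers, coordinates and velocity values are all hypothetical and do not come from the prototype.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with a hypothetical `sites` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sites ("
        "site_id TEXT PRIMARY KEY, lat REAL, lon REAL, vs30 REAL)"
    )
    # Made-up rows for illustration only (not real survey results).
    conn.executemany(
        "INSERT INTO sites VALUES (?, ?, ?, ?)",
        [
            ("A001", 47.60, -122.33, 310.0),
            ("A002", 47.25, -122.44, 450.0),
            ("B001", 35.68, 139.69, 220.0),
        ],
    )
    return conn

def sites_in_box(conn, lat_min, lat_max, lon_min, lon_max):
    """Return the IDs of sites whose coordinates fall inside a bounding box."""
    rows = conn.execute(
        "SELECT site_id FROM sites "
        "WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ? "
        "ORDER BY site_id",
        (lat_min, lat_max, lon_min, lon_max),
    )
    return [row[0] for row in rows]

conn = build_demo_db()
print(sites_in_box(conn, 47.0, 48.0, -123.0, -122.0))
```

A bounding-box query like this is only one plausible way to back a map search; the abstract does not say how the prototype implements it.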
EDCs DataBank: 3D-Structure database of endocrine disrupting chemicals.

PubMed

Montes-Grajales, Diana; Olivero-Verbel, Jesus

2015-01-02

Endocrine disrupting chemicals (EDCs) are a group of compounds that affect the endocrine system, frequently found in everyday products and epidemiologically associated with several diseases. The purpose of this work was to develop EDCs DataBank, the only database of EDCs with three-dimensional structures. This database was built on MySQL using the EU list of potential endocrine disruptors and TEDX list. It contains the three-dimensional structures available on PubChem, as well as a wide variety of information from different databases and text mining tools, useful for almost any kind of research regarding EDCs. The web platform was developed employing HTML, CSS and PHP languages, with dynamic contents in a graphic environment, facilitating information analysis. Currently EDCs DataBank has 615 molecules, including pesticides, natural and industrial products, cosmetics, drugs and food additives, among other low molecular weight xenobiotics. Therefore, this database can be used to study the toxicological effects of these molecules, or to develop pharmaceuticals targeting hormone receptors, through docking studies, high-throughput virtual screening and ligand-protein interaction analysis. EDCs DataBank is totally user-friendly and the 3D-structures of the molecules can be downloaded in several formats. This database is freely available at http://edcs.unicartagena.edu.co. Copyright © 2014. Published by Elsevier Ireland Ltd.
Biofuel Database

National Institute of Standards and Technology Data Gateway

Biofuel Database (Web, free access) This database brings together structural, biological, and thermodynamic data for enzymes that are either in current use or are being considered for use in the production of biofuels.
Glycan fragment database: a database of PDB-based glycan 3D structures.

PubMed

Jo, Sunhwan; Im, Wonpil

2013-01-01

The glycan fragment database (GFDB), freely available at http://www.glycanstructure.org, is a database of the glycosidic torsion angles derived from the glycan structures in the Protein Data Bank (PDB). Analogous to protein structure, the structure of an oligosaccharide chain in a glycoprotein, referred to as a glycan, can be characterized by the torsion angles of glycosidic linkages between relatively rigid carbohydrate monomeric units. Knowledge of accessible conformations of biologically relevant glycans is essential in understanding their biological roles. The GFDB provides an intuitive glycan sequence search tool that allows the user to search complex glycan structures. After a glycan search is complete, each glycosidic torsion angle distribution is displayed in terms of the exact match and the fragment match. The exact match results are from the PDB entries that contain the glycan sequence identical to the query sequence. The fragment match results are from the entries with the glycan sequence whose substructure (fragment) or entire sequence is matched to the query sequence, such that the fragment results implicitly include the influences from the nearby carbohydrate residues. In addition, clustering analysis based on the torsion angle distribution can be performed to obtain the representative structures among the searched glycan structures.
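As a rough illustration of the quantity this database is built around, the sketch below computes a torsion (dihedral) angle from four atomic positions. It is a generic geometric calculation, not GFDB's code; which four atoms define each glycosidic torsion depends on the convention used, and the coordinates here are invented for the example.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond.
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# Invented coordinates: four points forming a right-angle twist.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Collecting such angles over many structures gives the kind of torsion angle distribution the abstract describes.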
The Design of Lexical Database for Indonesian Language

NASA Astrophysics Data System (ADS)

Gunawan, D.; Amalia, A.

2017-03-01

Kamus Besar Bahasa Indonesia (KBBI), an official dictionary of the Indonesian language, provides lists of words with their meanings. The online version can be accessed via the Internet. Another online dictionary is Kateglo. KBBI online and Kateglo provide only an interface for humans; a machine cannot easily retrieve data from these dictionaries without advanced techniques. Lexical information about words, however, is required in research and application development related to natural language processing, text mining, information retrieval or sentiment analysis. To address this requirement, we need to build a lexical database which provides well-defined structured information about words. A well-known lexical database is WordNet, which provides the relations among words in English. This paper proposes the design of a lexical database for the Indonesian language based on a combination of KBBI 4th edition, Kateglo and the WordNet structure. Knowledge representation using semantic networks depicts the relations among words and provides a new structure for a lexical database of the Indonesian language. The result of this design can be used as the foundation for building a lexical database for the Indonesian language.
A series of PDB related databases for everyday needs.

PubMed

Joosten, Robbie P; te Beek, Tim A H; Krieger, Elmar; Hekkelman, Maarten L; Hooft, Rob W W; Schneider, Reinhard; Sander, Chris; Vriend, Gert

2011-01-01

The Protein Data Bank (PDB) is the world-wide repository of macromolecular structure information. We present a series of databases that run parallel to the PDB. Each database holds one entry, if possible, for each PDB entry. DSSP holds the secondary structure of the proteins. PDBREPORT holds reports on the structure quality and lists errors. HSSP holds a multiple sequence alignment for all proteins. The PDBFINDER holds easy to parse summaries of the PDB file content, augmented with essentials from the other systems. PDB_REDO holds re-refined, and often improved, copies of all structures solved by X-ray. WHY_NOT summarizes why certain files could not be produced. All these systems are updated weekly. The data sets can be used for the analysis of properties of protein structures in areas ranging from structural genomics, to cancer biology and protein design.
Compilation of small ribosomal subunit RNA structures.

PubMed Central

Neefs, J M; Van de Peer, Y; De Rijk, P; Chapelle, S; De Wachter, R

1993-01-01

The database on small ribosomal subunit RNA structure contained 1804 nucleotide sequences on April 23, 1993. This number comprises 365 eukaryotic, 65 archaeal, 1260 bacterial, 30 plastidial, and 84 mitochondrial sequences. These are stored in the form of an alignment in order to facilitate the use of the database as input for comparative studies on higher-order structure and for reconstruction of phylogenetic trees. The elements of the postulated secondary structure for each molecule are indicated by special symbols. The database is available on-line directly from the authors by ftp and can also be obtained from the EMBL nucleotide sequence library by electronic mail, ftp, and on CD ROM disk. PMID:8332525
Ultra-Structure database design methodology for managing systems biology data and analyses

PubMed Central

Maier, Christopher W; Long, Jeffrey G; Hemminger, Bradley M; Giddings, Morgan C

2009-01-01

Background Modern, high-throughput biological experiments generate copious, heterogeneous, interconnected data sets. Research is dynamic, with frequently changing protocols, techniques, instruments, and file formats. Because of these factors, systems designed to manage and integrate modern biological data sets often end up as large, unwieldy databases that become difficult to maintain or evolve. The novel rule-based approach of the Ultra-Structure design methodology presents a potential solution to this problem. By representing both data and processes as formal rules within a database, an Ultra-Structure system constitutes a flexible framework that enables users to explicitly store domain knowledge in both a machine- and human-readable form. End users themselves can change the system's capabilities without programmer intervention, simply by altering database contents; no computer code or schemas need be modified. This provides flexibility in adapting to change, and allows integration of disparate, heterogenous data sets within a small core set of database tables, facilitating joint analysis and visualization without becoming unwieldy. Here, we examine the application of Ultra-Structure to our ongoing research program for the integration of large proteomic and genomic data sets (proteogenomic mapping). Results We transitioned our proteogenomic mapping information system from a traditional entity-relationship design to one based on Ultra-Structure. Our system integrates tandem mass spectrum data, genomic annotation sets, and spectrum/peptide mappings, all within a small, general framework implemented within a standard relational database system. General software procedures driven by user-modifiable rules can perform tasks such as logical deduction and location-based computations. The system is not tied specifically to proteogenomic research, but is rather designed to accommodate virtually any kind of biological research. 
Conclusion We find Ultra-Structure offers substantial benefits for biological information systems, the largest being the integration of diverse information sources into a common framework. This facilitates systems biology research by integrating data from disparate high-throughput techniques. It also enables us to readily incorporate new data types, sources, and domain knowledge with no change to the database structure or associated computer code. Ultra-Structure may be a significant step towards solving the hard problem of data management and integration in the systems biology era. PMID:19691849
Creating and Using a Consumer Chemical Molecular Graphics Database: The "Molecule of the Day" - A Great Way To Begin Your Lecture

NASA Astrophysics Data System (ADS)

Scharberg, Maureen A.; Cox, Oran E.; Barelli, Carl A.

1997-07-01

"The Molecule of the Day" consumer chemical database has been created to allow introductory chemistry students to explore molecular structures of chemicals in household products, and to provide opportunities in molecular modeling for undergraduate chemistry students. Before class begins, an overhead transparency is displayed which shows a three-dimensional molecular structure of a household chemical, and lists relevant features and uses of this chemical. Within answers to questionnaires, students have commented that this molecular graphics database has helped them to visually connect the microscopic structure of a molecule with its physical and chemical properties, as well as its uses in consumer products. It is anticipated that this database will be incorporated into a navigational software package such as Netscape.
Mass Spectra-Based Framework for Automated Structural Elucidation of Metabolome Data to Explore Phytochemical Diversity

PubMed Central

Matsuda, Fumio; Nakabayashi, Ryo; Sawada, Yuji; Suzuki, Makoto; Hirai, Masami Y.; Kanaya, Shigehiko; Saito, Kazuki

2011-01-01

A novel framework for automated elucidation of metabolite structures in liquid chromatography–mass spectrometer metabolome data was constructed by integrating databases. High-resolution tandem mass spectra data automatically acquired from each metabolite signal were used for database searches. Three distinct databases, KNApSAcK, ReSpect, and the PRIMe standard compound database, were employed for the structural elucidation. The outputs were retrieved using the CAS metabolite identifier for identification and putative annotation. A simple metabolite ontology system was also introduced to attain putative characterization of the metabolite signals. The automated method was applied for the metabolome data sets obtained from the rosette leaves of 20 Arabidopsis accessions. Phenotypic variations in novel Arabidopsis metabolites among these accessions could be investigated using this method. PMID:22645535

An interactive mutation database for human coagulation factor IX provides novel insights into the phenotypes and genetics of hemophilia B.

PubMed

Rallapalli, P M; Kemball-Cook, G; Tuddenham, E G; Gomez, K; Perkins, S J

2013-07-01

Factor IX (FIX) is important in the coagulation cascade, being activated to FIXa on cleavage. Defects in the human F9 gene frequently lead to hemophilia B. The objective was to assess 1113 unique F9 mutations corresponding to 3721 patient entries in a new and up-to-date interactive web database alongside the FIXa protein structure. The mutations database was built using MySQL and structural analyses were based on a homology model for the human FIXa structure based on closely-related crystal structures. Mutations have been found in 336 (73%) out of 461 residues in FIX. There were 812 unique point mutations, 182 deletions, 54 polymorphisms, 39 insertions and 26 others that together comprise a total of 1113 unique variants. The 64 unique mild severity mutations in the mature protein with known circulating protein phenotypes include 15 (23%) quantitative type I mutations and 41 (64%) predominantly qualitative type II mutations. Inhibitors were described in 59 reports (1.6%) corresponding to 25 unique mutations. The interactive database provides insights into mechanisms of hemophilia B. Type II mutations are deduced to disrupt predominantly those structural regions involved with functional interactions. The interactive features of the database will assist in making judgments about patient management. © 2013 International Society on Thrombosis and Haemostasis.
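The counts and percentages quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the abstract; the variable and function names are ours, and it assumes the 1.6% figure for inhibitor reports is taken relative to the 3721 patient entries.

```python
# Variant counts as quoted in the abstract.
variant_counts = {
    "point mutations": 812,
    "deletions": 182,
    "polymorphisms": 54,
    "insertions": 39,
    "others": 26,
}
total_variants = sum(variant_counts.values())

def percent(part, whole):
    """Share of `part` in `whole`, expressed as a percentage."""
    return 100.0 * part / whole

mutated_residues_pct = percent(336, 461)  # residues with at least one mutation
type_i_pct = percent(15, 64)              # quantitative type I, mild mutations
type_ii_pct = percent(41, 64)             # qualitative type II, mild mutations
inhibitor_pct = percent(59, 3721)         # assumed denominator: patient entries

print(total_variants)
print(round(mutated_residues_pct), round(type_i_pct), round(type_ii_pct))
print(round(inhibitor_pct, 1))
```

The five variant classes sum to the stated total of 1113, and each rounded percentage agrees with the value given in the abstract.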
DBAASP v.2: an enhanced database of structure and antimicrobial/cytotoxic activity of natural and synthetic peptides.

PubMed

Pirtskhalava, Malak; Gabrielian, Andrei; Cruz, Phillip; Griggs, Hannah L; Squires, R Burke; Hurt, Darrell E; Grigolava, Maia; Chubinidze, Mindia; Gogoladze, George; Vishnepolsky, Boris; Alekseyev, Vsevolod; Rosenthal, Alex; Tartakovsky, Michael

2016-01-04

Antimicrobial peptides (AMPs) are anti-infectives that may represent a novel and untapped class of biotherapeutics. Increasing interest in AMPs means that new peptides (natural and synthetic) are discovered faster than ever before. We describe herein a new version of the Database of Antimicrobial Activity and Structure of Peptides (DBAASPv.2, which is freely accessible at http://dbaasp.org). This iteration of the database reports chemical structures and empirically-determined activities (MICs, IC50, etc.) against more than 4200 specific target microbes for more than 2000 ribosomal, 80 non-ribosomal and 5700 synthetic peptides. Of these, the vast majority are monomeric, but nearly 200 of these peptides are found as homo- or heterodimers. More than 6100 of the peptides are linear, but about 515 are cyclic and more than 1300 have other intra-chain covalent bonds. More than half of the entries in the database were added after the resource was initially described, which reflects the recent sharp uptick of interest in AMPs. New features of DBAASPv.2 include: (i) user-friendly utilities and reporting functions, (ii) a 'Ranking Search' function to query the database by target species and return a ranked list of peptides with activity against that target and (iii) structural descriptions of the peptides derived from empirical data or calculated by molecular dynamics (MD) simulations. The three-dimensional structural data are critical components for understanding structure-activity relationships and for design of new antimicrobial drugs. We created more than 300 high-throughput MD simulations specifically for inclusion in DBAASP. The resulting structures are described in the database by novel trajectory analysis plots and movies. Another 200+ DBAASP entries have links to the Protein DataBank. All of the structures are easily visualized directly in the web browser. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Why Save Your Course as a Relational Database?

ERIC Educational Resources Information Center

Hamilton, Gregory C.; Katz, David L.; Davis, James E.

2000-01-01

Describes a system that stores course materials for computer-based training programs in a relational database called Of Course! Outlines the basic structure of the databases; explains distinctions between Of Course! and other authoring languages; and describes how data is retrieved from the database and presented to the student. (Author/LRW)
Emission Database for Global Atmospheric Research (EDGAR).

ERIC Educational Resources Information Center

Olivier, J. G. J.; And Others

1994-01-01

Presents the objective and methodology chosen for the construction of a global emissions source database called EDGAR and the structural design of the database system. The database estimates, on a regional and grid basis, 1990 annual emissions of greenhouse gases and of ozone-depleting compounds from all known sources. (LZ)
On-Line Database of Vibration-Based Damage Detection Experiments

NASA Technical Reports Server (NTRS)

Pappa, Richard S.; Doebling, Scott W.; Kholwad, Tina D.

2000-01-01

This paper describes a new, on-line bibliographic database of vibration-based damage detection experiments. Publications in the database discuss experiments conducted on actual structures as well as those conducted with simulated data. The database can be searched and sorted in many ways, and it provides photographs of test structures when available. It currently contains 100 publications, which is estimated to be about 5-10% of the number of papers written to date on this subject. Additional entries are forthcoming. This database is available for public use on the Internet at the following address: http://sdbpappa-mac.larc.nasa.gov. Click on the link named "dd_experiments.fp3" and then type "guest" as the password. No user name is required.
REPDOSE: A database on repeated dose toxicity studies of commercial chemicals--A multifunctional tool.

PubMed

Bitsch, A; Jacobi, S; Melber, C; Wahnschaffe, U; Simetska, N; Mangelsdorf, I

2006-12-01

A database for repeated dose toxicity data has been developed. Studies were selected by data quality. Review documents or risk assessments were used to get a pre-screened selection of available valid data. The structure of the chemicals should be rather simple for well defined chemical categories. The database consists of three core data sets for each chemical: (1) structural features and physico-chemical data, (2) data on study design, (3) study results. To allow consistent queries, a high degree of standardization categories and glossaries were developed for relevant parameters. At present, the database consists of 364 chemicals investigated in 1018 studies which resulted in a total of 6002 specific effects. Standard queries have been developed, which allow analyzing the influence of structural features or PC data on LOELs, target organs and effects. Furthermore, it can be used as an expert system. First queries have shown that the database is a very valuable tool.
The BioImage Database Project: organizing multidimensional biological images in an object-relational database.

PubMed

Carazo, J M; Stelzer, E H

1999-01-01

The BioImage Database Project collects and structures multidimensional data sets recorded by various microscopic techniques relevant to modern life sciences. It provides, as precisely as possible, the circumstances in which the sample was prepared and the data were recorded. It grants access to the actual data and maintains links between related data sets. In order to promote the interdisciplinary approach of modern science, it offers a large set of key words, which covers essentially all aspects of microscopy. Nonspecialists can, therefore, access and retrieve significant information recorded and submitted by specialists in other areas. A key issue of the undertaking is to exploit the available technology and to provide a well-defined yet flexible structure for dealing with data. Its pivotal element is, therefore, a modern object relational database that structures the metadata and ameliorates the provision of a complete service. The BioImage database can be accessed through the Internet. Copyright 1999 Academic Press.
TFBSshape: a motif database for DNA shape features of transcription factor binding sites.

PubMed

Yang, Lin; Zhou, Tianyin; Dror, Iris; Mathelier, Anthony; Wasserman, Wyeth W; Gordân, Raluca; Rohs, Remo

2014-01-01

Transcription factor binding sites (TFBSs) are most commonly characterized by the nucleotide preferences at each position of the DNA target. Whereas these sequence motifs are quite accurate descriptions of DNA binding specificities of transcription factors (TFs), proteins recognize DNA as a three-dimensional object. DNA structural features refine the description of TF binding specificities and provide mechanistic insights into protein-DNA recognition. Existing motif databases contain extensive nucleotide sequences identified in binding experiments based on their selection by a TF. To utilize DNA shape information when analysing the DNA binding specificities of TFs, we developed a new tool, the TFBSshape database (available at http://rohslab.cmb.usc.edu/TFBSshape/), for calculating DNA structural features from nucleotide sequences provided by motif databases. The TFBSshape database can be used to generate heat maps and quantitative data for DNA structural features (i.e., minor groove width, roll, propeller twist and helix twist) for 739 TF datasets from 23 different species derived from the motif databases JASPAR and UniPROBE. As demonstrated for the basic helix-loop-helix and homeodomain TF families, our TFBSshape database can be used to compare, qualitatively and quantitatively, the DNA binding specificities of closely related TFs and, thus, uncover differential DNA binding specificities that are not apparent from nucleotide sequence alone.
TFBSshape: a motif database for DNA shape features of transcription factor binding sites

PubMed Central

Yang, Lin; Zhou, Tianyin; Dror, Iris; Mathelier, Anthony; Wasserman, Wyeth W.; Gordân, Raluca; Rohs, Remo

2014-01-01

Transcription factor binding sites (TFBSs) are most commonly characterized by the nucleotide preferences at each position of the DNA target. Whereas these sequence motifs are quite accurate descriptions of DNA binding specificities of transcription factors (TFs), proteins recognize DNA as a three-dimensional object. DNA structural features refine the description of TF binding specificities and provide mechanistic insights into protein–DNA recognition. Existing motif databases contain extensive nucleotide sequences identified in binding experiments based on their selection by a TF. To utilize DNA shape information when analysing the DNA binding specificities of TFs, we developed a new tool, the TFBSshape database (available at http://rohslab.cmb.usc.edu/TFBSshape/), for calculating DNA structural features from nucleotide sequences provided by motif databases. The TFBSshape database can be used to generate heat maps and quantitative data for DNA structural features (i.e., minor groove width, roll, propeller twist and helix twist) for 739 TF datasets from 23 different species derived from the motif databases JASPAR and UniPROBE. As demonstrated for the basic helix-loop-helix and homeodomain TF families, our TFBSshape database can be used to compare, qualitatively and quantitatively, the DNA binding specificities of closely related TFs and, thus, uncover differential DNA binding specificities that are not apparent from nucleotide sequence alone. PMID:24214955
VaProS: a database-integration approach for protein/genome information retrieval.

PubMed

Gojobori, Takashi; Ikeo, Kazuho; Katayama, Yukie; Kawabata, Takeshi; Kinjo, Akira R; Kinoshita, Kengo; Kwon, Yeondae; Migita, Ohsuke; Mizutani, Hisashi; Muraoka, Masafumi; Nagata, Koji; Omori, Satoshi; Sugawara, Hideaki; Yamada, Daichi; Yura, Kei

2016-12-01

Life science research now relies heavily on all sorts of databases for genome sequences, transcription, protein three-dimensional (3D) structures, protein-protein interactions, phenotypes and so forth. The knowledge accumulated by all the omics research is so vast that a computer-aided search of data is now a prerequisite for starting a new study. In addition, a combinatory search throughout these databases has a chance to extract new ideas and new hypotheses that can be examined by wet-lab experiments. By virtually integrating the related databases on the Internet, we have built a new web application that helps life science researchers retrieve experts' knowledge stored in the databases and build new hypotheses about the research target. This web application, named VaProS, puts stress on the interconnection between the functional information of genome sequences and protein 3D structures, such as the structural effect of a gene mutation. In this manuscript, we present the notion of VaProS, the databases and tools that can be accessed without any knowledge of database locations and data formats, and the power of search exemplified in quest of the molecular mechanisms of lysosomal storage disease. VaProS can be freely accessed at http://p4d-info.nig.ac.jp/vapros/.
First year progress report on the development of the Texas flexible pavement database.

DOT National Transportation Integrated Search

2008-01-01

Comprehensive and reliable databases are essential for the development, validation, and calibration of any pavement design and rehabilitation system. These databases should include material properties, pavement structural characteristics, highway...
Strategies for Introducing Databasing into Science.

ERIC Educational Resources Information Center

Anderson, Christopher L.

1990-01-01

Outlines techniques used in the context of a sixth grade science class to teach database structure and search strategies for science using the AppleWorks program. Provides templates and questions for class and element databases. (Author/YP)
Projections for fast protein structure retrieval

PubMed Central

Bhattacharya, Sourangshu; Bhattacharyya, Chiranjib; Chandra, Nagasuma R

2006-01-01

Background In recent times, there has been an exponential rise in the number of protein structures in databases such as the PDB, so the design of fast algorithms capable of querying such databases is becoming an increasingly important research issue. This paper reports an algorithm, motivated by spectral graph matching techniques, for retrieving protein structures similar to a query structure from a large protein structure database. Each protein structure is specified by the 3D coordinates of residues of the protein. The algorithm is based on a novel characterization of the residues, called projections, leading to a similarity measure between the residues of the two proteins. This measure is exploited to efficiently compute the optimal equivalences. Results Experimental results show that the current algorithm outperforms the state of the art on benchmark datasets in terms of speed without losing accuracy. Search results on the SCOP 95% nonredundant database, for fold similarity with 5 proteins from different SCOP classes, show that the current method performs competitively with the standard algorithm CE. The algorithm is also capable of detecting non-topological similarities between two proteins, which is not possible with most state-of-the-art tools such as Dali. PMID:17254310
Building structural similarity database for metric learning

NASA Astrophysics Data System (ADS)

Jin, Guoxin; Pappas, Thrasyvoulos N.

2015-03-01

We propose a new approach for constructing databases for training and testing similarity metrics for structurally lossless image compression. Our focus is on structural texture similarity (STSIM) metrics and the matched-texture compression (MTC) approach. We first discuss the metric requirements for structurally lossless compression, which differ from those of other applications such as image retrieval, classification, and understanding. We identify "interchangeability" as the key requirement for metric performance, and partition the domain of "identical" textures into three regions, of "highest," "high," and "good" similarity. We design two subjective tests for data collection: the first relies on ViSiProG to build a database of "identical" clusters, and the second builds a database of image pairs with the "highest," "high," "good," and "bad" similarity labels. The data for the subjective tests are generated during the MTC encoding process, and consist of pairs of candidate and target image blocks. The context of the surrounding image is critical for training the metrics to detect lighting discontinuities, spatial misalignments, and other border artifacts that have a noticeable effect on perceptual quality. The identical texture clusters are then used for training and testing two STSIM metrics. The labelled image pair database will be used in future research.
3D visualization of molecular structures in the MOGADOC database

NASA Astrophysics Data System (ADS)

Vogt, Natalja; Popov, Evgeny; Rudert, Rainer; Kramer, Rüdiger; Vogt, Jürgen

2010-08-01

The MOGADOC database (Molecular Gas-Phase Documentation) is a powerful tool to retrieve information about compounds which have been studied in the gas-phase by electron diffraction, microwave spectroscopy and molecular radio astronomy. Presently the database contains over 34,500 bibliographic references (from the beginning of each method) for about 10,000 inorganic, organic and organometallic compounds and structural data (bond lengths, bond angles, dihedral angles, etc.) for about 7800 compounds. Most of the implemented molecular structures are given in a three-dimensional (3D) presentation. To create or edit and visualize the 3D images of molecules, new tools (special editor and Java-based 3D applet) were developed. Molecular structures in internal coordinates were converted to those in Cartesian coordinates.
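The conversion from internal to Cartesian coordinates mentioned in this abstract can be sketched as follows. The abstract does not describe the method used in MOGADOC; the code below shows one standard textbook construction, which places a new atom from a bond length, a bond angle and a torsion angle relative to three already placed atoms. The function name, argument names and sign convention are assumptions made for this example.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def _unit(a):
    length = math.sqrt(sum(x * x for x in a))
    return tuple(x / length for x in a)

def place_atom(a, b, c, bond, angle_deg, torsion_deg):
    """Cartesian position of atom d given internal coordinates.

    bond is the c-d distance, angle_deg the b-c-d angle and torsion_deg
    the a-b-c-d torsion. Atoms a, b and c must not be collinear.
    """
    theta = math.radians(angle_deg)
    phi = math.radians(torsion_deg)
    bc = _unit(_sub(c, b))                 # axis along the b-c bond
    n = _unit(_cross(_sub(b, a), bc))      # normal to the a-b-c plane
    m = _cross(n, bc)                      # completes the local frame
    local = (
        -bond * math.cos(theta),
        bond * math.sin(theta) * math.cos(phi),
        bond * math.sin(theta) * math.sin(phi),
    )
    return tuple(
        c[i] + local[0] * bc[i] + local[1] * m[i] + local[2] * n[i]
        for i in range(3)
    )

# Hypothetical input: three reference atoms and one set of internal coordinates.
d = place_atom((1, 0, 0), (0, 0, 0), (0, 0, 1), 1.5, 109.5, 60.0)
print(d)
```

Applying this placement atom by atom turns a full table of internal coordinates into Cartesian positions suitable for a 3D viewer.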
Searching molecular structure databases with tandem mass spectra using CSI:FingerID

PubMed Central

Dührkop, Kai; Shen, Huibin; Meusel, Marvin; Rousu, Juho; Böcker, Sebastian

2015-01-01

Metabolites provide a direct functional signature of cellular state. Untargeted metabolomics experiments usually rely on tandem MS to identify the thousands of compounds in a biological sample. Today, the vast majority of metabolites remain unknown. We present a method for searching molecular structure databases using tandem MS data of small molecules. Our method computes a fragmentation tree that best explains the fragmentation spectrum of an unknown molecule. We use the fragmentation tree to predict the molecular structure fingerprint of the unknown compound using machine learning. This fingerprint is then used to search a molecular structure database such as PubChem. Our method is shown to improve on the competing methods for computational metabolite identification by a considerable margin. PMID:26392543
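The final step described in this abstract, using a predicted fingerprint to search a structure database, can be illustrated with a toy ranking. The abstract does not state how CSI:FingerID scores candidates, so the sketch substitutes plain Tanimoto similarity, a common generic measure for comparing fingerprints. The candidate names and bit sets are invented.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)

def rank_candidates(predicted_fp, database):
    """Return (score, name) pairs sorted from most to least similar."""
    scored = [(tanimoto(predicted_fp, fp), name) for name, fp in database.items()]
    # Sort by descending score, breaking ties alphabetically by name.
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))

# Toy database: each fingerprint is the set of bit positions that are set.
toy_db = {
    "candidate_A": {1, 4, 7, 9},
    "candidate_B": {1, 4, 8},
    "candidate_C": {2, 3, 5},
}
predicted = {1, 4, 7}  # hypothetical fingerprint predicted from a spectrum

ranking = rank_candidates(predicted, toy_db)
for score, name in ranking:
    print(name, score)
```

In a real search the candidates would come from a structure database such as PubChem, as the abstract notes, and the scoring would follow the published method.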
Aero/fluids database system

NASA Technical Reports Server (NTRS)

Reardon, John E.; Violett, Duane L., Jr.

1991-01-01

The AFAS Database System was developed to provide the basic structure of a comprehensive database system for the Marshall Space Flight Center (MSFC) Structures and Dynamics Laboratory Aerophysics Division. The system is intended to handle all of the Aerophysics Division Test Facilities as well as data from other sources. The system was written for the DEC VAX family of computers in FORTRAN-77 and utilizes the VMS indexed file system and screen management routines. Various aspects of the system are covered, including a description of the user interface, lists of all code structure elements, descriptions of the file structures, a description of the security system operation, a detailed description of the data retrieval tasks, a description of the session log, and a description of the archival system.
Non-B DB: a database of predicted non-B DNA-forming motifs in mammalian genomes.

PubMed

Cer, Regina Z; Bruce, Kevin H; Mudunuri, Uma S; Yi, Ming; Volfovsky, Natalia; Luke, Brian T; Bacolla, Albino; Collins, Jack R; Stephens, Robert M

2011-01-01

Although the capability of DNA to form a variety of non-canonical (non-B) structures has long been recognized, the overall significance of these alternate conformations in biology has only recently become accepted en masse. In order to provide access to genome-wide locations of these classes of predicted structures, we have developed non-B DB, a database integrating annotations and analysis of non-B DNA-forming sequence motifs. The database provides the most complete list of alternative DNA structure predictions available, including Z-DNA motifs, quadruplex-forming motifs, inverted repeats, mirror repeats and direct repeats and their associated subsets of cruciforms, triplex and slipped structures, respectively. The database also contains motifs predicted to form static DNA bends, short tandem repeats and homo(purine•pyrimidine) tracts that have been associated with disease. The database has been built using the latest releases of the human, chimp, dog, macaque and mouse genomes, so that the results can be compared directly with other data sources. In order to make the data interpretable in a genomic context, features such as genes, single-nucleotide polymorphisms and repetitive elements (SINE, LINE, etc.) have also been incorporated. The database is accessed through query pages that produce results with links to the UCSC browser and a GBrowse-based genomic viewer. It is freely accessible at http://nonb.abcc.ncifcrf.gov.
BtoxDB: a comprehensive database of protein structural data on toxin-antitoxin systems.

PubMed

Barbosa, Luiz Carlos Bertucci; Garrido, Saulo Santesso; Marchetto, Reinaldo

2015-03-01

Toxin-antitoxin (TA) systems are diverse and abundant genetic modules in prokaryotic cells that are typically formed by two genes encoding a stable toxin and a labile antitoxin. Because TA systems are able to repress growth or kill cells and are considered to be important actors in cell persistence (multidrug resistance without genetic change), these modules are considered potential targets for alternative drug design. In this scenario, structural information for the proteins in these systems is highly valuable. In this report, we describe the development of a web-based system, named BtoxDB, that stores all protein structural data on TA systems. The BtoxDB database was implemented as a MySQL relational database using PHP scripting language. Web interfaces were developed using HTML, CSS and JavaScript. The data were collected from the PDB, UniProt and Entrez databases. These data were appropriately filtered using specialized literature and our previous knowledge about toxin-antitoxin systems. The database provides three modules ("Search", "Browse" and "Statistics") that enable searches, acquisition of contents and access to statistical data. Direct links to matching external databases are also available. The compilation of all protein structural data on TA systems in one platform is highly useful for researchers interested in this content. BtoxDB is publicly available at http://www.gurupi.uft.edu.br/btoxdb. Copyright © 2015 Elsevier Ltd. All rights reserved.
Biological knowledge bases using Wikis: combining the flexibility of Wikis with the structure of databases.

PubMed

Brohée, Sylvain; Barriot, Roland; Moreau, Yves

2010-09-01

In recent years, the number of knowledge bases developed using Wiki technology has exploded. Unfortunately, next to their numerous advantages, classical Wikis present a critical limitation: the invaluable knowledge they gather is represented as free text, which hinders their computational exploitation. This is in sharp contrast with the current practice for biological databases, where the data are made available in a structured way. Here, we present WikiOpener, an extension for the classical MediaWiki engine that augments Wiki pages by allowing on-the-fly querying and formatting of resources external to the Wiki. Those resources may provide data extracted from databases or DAS tracks, or even results returned by local or remote bioinformatics analysis tools. This also implies that structured data can be edited via dedicated forms. Hence, this generic resource combines the structure of biological databases with the flexibility of collaborative Wikis. The source code and its documentation are freely available on the MediaWiki website: http://www.mediawiki.org/wiki/Extension:WikiOpener.

pE-DB: a database of structural ensembles of intrinsically disordered and of unfolded proteins.

PubMed

Varadi, Mihaly; Kosol, Simone; Lebrun, Pierre; Valentini, Erica; Blackledge, Martin; Dunker, A Keith; Felli, Isabella C; Forman-Kay, Julie D; Kriwacki, Richard W; Pierattelli, Roberta; Sussman, Joel; Svergun, Dmitri I; Uversky, Vladimir N; Vendruscolo, Michele; Wishart, David; Wright, Peter E; Tompa, Peter

2014-01-01

The goal of pE-DB (http://pedb.vib.be) is to serve as an openly accessible database for the deposition of structural ensembles of intrinsically disordered proteins (IDPs) and of denatured proteins based on nuclear magnetic resonance spectroscopy, small-angle X-ray scattering and other data measured in solution. Owing to the inherent flexibility of IDPs, solution techniques are particularly appropriate for characterizing their biophysical properties, and structural ensembles in agreement with these data provide a convenient tool for describing the underlying conformational sampling. Database entries consist of (i) primary experimental data with descriptions of the acquisition methods and algorithms used for the ensemble calculations, and (ii) the structural ensembles consistent with these data, provided as a set of models in a Protein Data Bank format. pE-DB is open for submissions from the community, and is intended as a forum for disseminating the structural ensembles and the methodologies used to generate them. While the need to represent the IDP structures is clear, methods for determining and evaluating the structural ensembles are still evolving. The availability of the pE-DB database is expected to promote the development of new modeling methods and lead to a better understanding of how function arises from disordered states.
Correcting ligands, metabolites, and pathways

PubMed Central

Ott, Martin A; Vriend, Gert

2006-01-01

Background A wide range of research areas in bioinformatics, molecular biology and medicinal chemistry require precise chemical structure information about molecules and reactions, e.g. drug design, ligand docking, metabolic network reconstruction, and systems biology. Most available databases, however, treat chemical structures more as illustrations than as a datafield in its own right. Lack of chemical accuracy impedes progress in the areas mentioned above. We present a database of metabolites called BioMeta that augments the existing pathway databases by explicitly assessing the validity, correctness, and completeness of chemical structure and reaction information. Description The main bulk of the data in BioMeta were obtained from the KEGG Ligand database. We developed a tool for chemical structure validation which assesses the chemical validity and stereochemical completeness of a molecule description. The validation tool was used to examine the compounds in BioMeta, showing that a relatively small number of compounds had an incorrect constitution (connectivity only, not considering stereochemistry) and that a considerable number (about one third) had incomplete or even incorrect stereochemistry. We made a large effort to correct the errors and to complete the structural descriptions. A total of 1468 structures were corrected and/or completed. We also established the reaction balance of the reactions in BioMeta and corrected 55% of the unbalanced (stoichiometrically incorrect) reactions in an automatic procedure. The BioMeta database was implemented in PostgreSQL and provided with a web-based interface. Conclusion We demonstrate that the validation of metabolite structures and reactions is a feasible and worthwhile undertaking, and that the validation results can be used to trigger corrections and improvements to BioMeta, our metabolite database. BioMeta provides some tools for rational drug design, reaction searches, and visualization. 
It is freely available at provided that the copyright notice of all original data is cited. The database will be useful for querying and browsing biochemical pathways, and to obtain reference information for identifying compounds. However, these applications require that the underlying data be correct, and that is the focus of BioMeta. PMID:17132165
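The reaction-balance check described in this abstract can be illustrated with a small sketch. It is not the BioMeta procedure: it only counts atoms in simple formulas without parentheses or charges, and the function names are hypothetical.

```python
import re
from collections import Counter

def parse_formula(formula):
    """Count atoms in a simple formula such as 'C6H12O6' (no parentheses, no charges)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts

def side_total(side):
    """Sum atom counts over (coefficient, formula) pairs on one side of a reaction."""
    total = Counter()
    for coefficient, formula in side:
        for element, n in parse_formula(formula).items():
            total[element] += coefficient * n
    return total

def is_balanced(reactants, products):
    """True if every element occurs equally often on both sides."""
    return side_total(reactants) == side_total(products)

# Glucose oxidation: C6H12O6 + 6 O2 -> 6 CO2 + 6 H2O
balanced = is_balanced([(1, "C6H12O6"), (6, "O2")], [(6, "CO2"), (6, "H2O")])
```

A check of this kind only flags unbalanced reactions; correcting them automatically, as the abstract reports, requires additional logic such as adding missing water or protons.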
MEGADOCK-Web: an integrated database of high-throughput structure-based protein-protein interaction predictions.

PubMed

Hayashi, Takanori; Matsuzaki, Yuri; Yanagisawa, Keisuke; Ohue, Masahito; Akiyama, Yutaka

2018-05-08

Protein-protein interactions (PPIs) play several roles in living cells, and computational PPI prediction is a major focus of many researchers. The three-dimensional (3D) structure and binding surface are important for the design of PPI inhibitors. Therefore, rigid body protein-protein docking calculations for two protein structures are expected to allow elucidation of PPIs different from known complexes in terms of 3D structures because known PPI information is not explicitly required. We have developed rapid PPI prediction software based on protein-protein docking, called MEGADOCK. In order to fully utilize the benefits of computational PPI predictions, it is necessary to construct a comprehensive database to gather prediction results and their predicted 3D complex structures and to make them easily accessible. Although several databases exist that provide predicted PPIs, the previous databases do not contain a sufficient number of entries for the purpose of discovering novel PPIs. In this study, we constructed an integrated database of MEGADOCK PPI predictions, named MEGADOCK-Web. MEGADOCK-Web provides more than 10 times the number of PPI predictions than previous databases and enables users to conduct PPI predictions that cannot be found in conventional PPI prediction databases. In MEGADOCK-Web, there are 7528 protein chains and 28,331,628 predicted PPIs from all possible combinations of those proteins. Each protein structure is annotated with PDB ID, chain ID, UniProt AC, related KEGG pathway IDs, and known PPI pairs. Additionally, MEGADOCK-Web provides four powerful functions: 1) searching precalculated PPI predictions, 2) providing annotations for each predicted protein pair with an experimentally known PPI, 3) visualizing candidates that may interact with the query protein on biochemical pathways, and 4) visualizing predicted complex structures through a 3D molecular viewer. 
MEGADOCK-Web provides a huge amount of comprehensive PPI predictions based on docking calculations with biochemical pathways and enables users to easily and quickly assess PPI feasibilities by archiving PPI predictions. MEGADOCK-Web also promotes the discovery of new PPIs and protein functions and is freely available for use at http://www.bi.cs.titech.ac.jp/megadock-web/ .
PURY: a database of geometric restraints of hetero compounds for refinement in complexes with macromolecular structures.

PubMed

Andrejasic, Miha; Pražnikar, Jure; Turk, Dusan

2008-11-01

The number and variety of macromolecular structures in complex with 'hetero' ligands is growing. The need for rapid delivery of correct geometric parameters for their refinement, which is often crucial for understanding the biological relevance of the structure, is growing correspondingly. The current standard for describing protein structures is the Engh-Huber parameter set. It is an expert data set resulting from selection and analysis of the crystal structures gathered in the Cambridge Structural Database (CSD). Clearly, such a manual approach cannot be applied to the vast and ever-growing number of chemical compounds. Therefore, a database, named PURY, of geometric parameters of chemical compounds has been developed, together with a server that accesses it. PURY is a compilation of the whole CSD. It contains lists of atom classes and bonds connecting them, as well as angle, chirality, planarity and conformation parameters. The current compilation is based on CSD 5.28 and contains 1978 atom classes and 32,702 bonding, 237,068 angle, 201,860 dihedral and 64,193 improper geometric restraints. Analysis has confirmed that the restraints from the PURY database are suitable for use in macromolecular crystal structure refinement and should be of value to the crystallographic community. The database can be accessed through the web server http://pury.ijs.si/, which creates topology and parameter files from deposited coordinates in suitable forms for the refinement programs MAIN, CNS and REFMAC. In the near future, the server will move to the CSD website http://pury.ccdc.cam.ac.uk/.
Computer Science Research in Europe.

DTIC Science & Technology

1984-08-29

most attention, multi-database and its structure, and (3) the dependencies between databases Distributed Systems and multi-databases. Having...completed a multi-database Newcastle University, UK system for distributed data management, At the University of Newcastle the INRIA is now working on a real...communications re- INRIA quirements of distributed database A project called SIRIUS was estab- systems, protocols for checking the lished in 1977 at the
WLN's Database: New Directions.

ERIC Educational Resources Information Center

Ziegman, Bruce N.

1988-01-01

Describes features of the Western Library Network's database, including the database structure, authority control, contents, quality control, and distribution methods. The discussion covers changes in distribution necessitated by increasing telecommunications costs and the development of optical data disk products. (CLB)
Food Composition Database Format and Structure: A User Focused Approach

PubMed Central

Clancy, Annabel K.; Woods, Kaitlyn; McMahon, Anne; Probst, Yasmine

2015-01-01

This study aimed to investigate the needs of Australian food composition database users regarding database format and relate this to the format of databases available globally. Three semi-structured synchronous online focus groups (M = 3, F = 11) and n = 6 female key informant interviews were recorded. Beliefs surrounding the use, training, understanding, benefits and limitations of food composition data and databases were explored. Verbatim transcriptions underwent preliminary coding followed by thematic analysis with NVivo qualitative analysis software to extract the final themes. Schematic analysis was applied to the final themes related to database format. Desktop analysis also examined the format of six key globally available databases. 24 dominant themes were established, of which five related to format: database use, food classification, framework, accessibility and availability, and data derivation. Desktop analysis revealed that food classification systems varied considerably between databases. Microsoft Excel was a common file format used in all databases, and available software varied between countries. Users also recognised that food composition database format should ideally be designed specifically for the intended use, have a user-friendly food classification system, incorporate accurate data with clear explanation of data derivation and feature user input. However, such databases are limited by data availability and resources. Further exploration of data sharing options should be considered. Furthermore, users' understanding of the limitations of food composition data and databases is inherent to the correct application of non-specific databases. Therefore, further exploration of user FCDB training should also be considered. PMID:26554836
A database paradigm for the management of DICOM-RT structure sets using a geographic information system

NASA Astrophysics Data System (ADS)

Shao, Weber; Kupelian, Patrick A.; Wang, Jason; Low, Daniel A.; Ruan, Dan

2014-03-01

We devise a paradigm for representing DICOM-RT structure sets in a database management system, in such a way that secondary calculations of geometric information can be performed quickly from the existing contour definitions. The implementation of this paradigm is achieved using the PostgreSQL database system and the PostGIS extension, a geographic information system commonly used for encoding geographical map data. The proposed paradigm eliminates the overhead of retrieving large data records from the database, as well as the need to implement various numerical and data parsing routines, when additional information related to the geometry of the anatomy is desired.
An Examination of Selected Software Testing Tools: 1992

DTIC Science & Technology

1992-12-01

Figure 27-17. Metrics Manager Database Full Report...historical test database, the test management and problem reporting tools were examined using the sample test database provided by each supplier...track the impact of new methods, organizational structures, and technologies. Metrics Manager is supported by an industry database that allows
The XSD-Builder Specification Language—Toward a Semantic View of XML Schema Definition

NASA Astrophysics Data System (ADS)

Fong, Joseph; Cheung, San Kuen

In the present database market, the XML database model is a main structure for forthcoming database systems in the Internet environment. As a conceptual schema for an XML database, the XML Model is limited in how well it presents data semantics, and system analysts have no toolset for modeling and analyzing an XML system. We apply the XML Tree Model (shown in Figure 2) as a conceptual schema of an XML database to model and analyze the structure of an XML database. It is important not only for visualizing, specifying, and documenting structural models, but also for constructing executable systems. The tree model represents the inter-relationships among elements inside different logical schemas such as XML Schema Definition (XSD), DTD, Schematron, XDR, SOX, and DSD (shown in Figure 1; the terms in the figure are explained in Table 1). The XSD-Builder consists of the XML Tree Model, a source language, a translator, and the XSD. The source language, called XSD-Source, is mainly intended to provide a user-friendly environment for writing an XSD. The source language is then translated by XSD-Translator. The output of XSD-Translator is an XSD, which is our target and is called the object language.
Database for the geologic map of the Mount Baker 30- by 60-minute quadrangle, Washington (I-2660)

USGS Publications Warehouse

Tabor, R.W.; Haugerud, R.A.; Hildreth, Wes; Brown, E.H.

2006-01-01

This digital map database has been prepared by R.W. Tabor from the published Geologic map of the Mount Baker 30- by 60-Minute Quadrangle, Washington. Together with the accompanying text files as PDF, it provides information on the geologic structure and stratigraphy of the area covered. The database delineates map units that are identified by general age and lithology following the stratigraphic nomenclature of the U.S. Geological Survey. The authors mapped most of the geology at 1:100,000. The Quaternary contacts and structural data have been much simplified for the 1:100,000-scale map and database. The spatial resolution (scale) of the database is 1:100,000 or smaller. This database depicts the distribution of geologic materials and structures at a regional (1:100,000) scale. The report is intended to provide geologic information for the regional study of materials properties, earthquake shaking, landslide potential, mineral hazards, seismic velocity, and earthquake faults. In addition, the report contains information and interpretations about the regional geologic history and framework. However, the regional scale of this report does not provide sufficient detail for site development purposes.
NVST Data Archiving System Based On FastBit NoSQL Database

NASA Astrophysics Data System (ADS)

Liu, Ying-bo; Wang, Feng; Ji, Kai-fan; Deng, Hui; Dai, Wei; Liang, Bo

2014-06-01

The New Vacuum Solar Telescope (NVST) is a 1-meter vacuum solar telescope that aims to observe the fine structures of active regions on the Sun. The main tasks of the NVST are high resolution imaging and spectral observations, including the measurements of the solar magnetic field. The NVST has collected more than 20 million FITS files since it began routine observations in 2012 and produces up to 120 thousand observational records (files) in a day. Given the large number of files, their effective archiving and retrieval becomes a critical and urgent problem. In this study, we implement a new data archiving system for the NVST based on the Fastbit Not Only Structured Query Language (NoSQL) database. Compared with the relational database (i.e., MySQL; My Structured Query Language), the Fastbit database shows distinct advantages in indexing and querying performance. In a large scale database of 40 million records, the multi-field combined query of the Fastbit database is about 15 times faster and fully meets the requirements of the NVST. Our study brings a new idea for massive astronomical data archiving and would contribute to the design of data management systems for other astronomical telescopes.
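FastBit is built around bitmap indexes, which is one reason multi-field queries can be answered quickly. The sketch below illustrates the general bitmap-index idea only; it does not use the FastBit API, it omits compression, and the field names and values are invented for illustration.

```python
from collections import defaultdict

class BitmapIndex:
    """Toy uncompressed bitmap index: one bitmask per (field, value) pair."""

    def __init__(self):
        # field -> value -> integer used as a bitset (bit i set means row i matches)
        self._bitmaps = defaultdict(lambda: defaultdict(int))
        self._count = 0

    def add(self, record):
        """Index one record (a dict of field -> value) and return its row number."""
        row = self._count
        for field, value in record.items():
            self._bitmaps[field][value] |= 1 << row
        self._count += 1
        return row

    def query(self, **conditions):
        """Return row numbers matching all field=value conditions (bitwise AND)."""
        mask = (1 << self._count) - 1
        for field, value in conditions.items():
            mask &= self._bitmaps.get(field, {}).get(value, 0)
        return [row for row in range(self._count) if (mask >> row) & 1]

# Hypothetical header fields of archived observation files.
index = BitmapIndex()
index.add({"channel": "Ha", "date": "2013-05-01"})
index.add({"channel": "TiO", "date": "2013-05-01"})
index.add({"channel": "Ha", "date": "2013-05-01"})
index.add({"channel": "Ha", "date": "2013-05-02"})
rows = index.query(channel="Ha", date="2013-05-01")
```

A combined condition costs one bitwise AND per field regardless of how the records are ordered, which is the property that makes this kind of index attractive for read-mostly archives.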
Database for the geologic map of the Chelan 30-minute by 60-minute quadrangle, Washington (I-1661)

USGS Publications Warehouse

Tabor, R.W.; Frizzell, V.A.; Whetten, J.T.; Waitt, R.B.; Swanson, D.A.; Byerly, G.R.; Booth, D.B.; Hetherington, M.J.; Zartman, R.E.

2006-01-01

This digital map database has been prepared by R. W. Tabor from the published Geologic map of the Chelan 30-Minute Quadrangle, Washington. Together with the accompanying text files as PDF, it provides information on the geologic structure and stratigraphy of the area covered. The database delineates map units that are identified by general age and lithology following the stratigraphic nomenclature of the U.S. Geological Survey. The authors mapped most of the bedrock geology at 1:100,000 scale, but compiled Quaternary units at 1:24,000 scale. The Quaternary contacts and structural data have been much simplified for the 1:100,000-scale map and database. The spatial resolution (scale) of the database is 1:100,000 or smaller. This database depicts the distribution of geologic materials and structures at a regional (1:100,000) scale. The report is intended to provide geologic information for the regional study of materials properties, earthquake shaking, landslide potential, mineral hazards, seismic velocity, and earthquake faults. In addition, the report contains information and interpretations about the regional geologic history and framework. However, the regional scale of this report does not provide sufficient detail for site development purposes.
Database for the geologic map of the Snoqualmie Pass 30-minute by 60-minute quadrangle, Washington (I-2538)

USGS Publications Warehouse

Tabor, R.W.; Frizzell, V.A.; Booth, D.B.; Waitt, R.B.

2006-01-01

This digital map database has been prepared by R.W. Tabor from the published Geologic map of the Snoqualmie Pass 30' X 60' Quadrangle, Washington. Together with the accompanying text files as PDF, it provides information on the geologic structure and stratigraphy of the area covered. The database delineates map units that are identified by general age and lithology following the stratigraphic nomenclature of the U.S. Geological Survey. The authors mapped most of the bedrock geology at 1:100,000 scale, but compiled Quaternary units at 1:24,000 scale. The Quaternary contacts and structural data have been much simplified for the 1:100,000-scale map and database. The spatial resolution (scale) of the database is 1:100,000 or smaller. This database depicts the distribution of geologic materials and structures at a regional (1:100,000) scale. The report is intended to provide geologic information for the regional study of materials properties, earthquake shaking, landslide potential, mineral hazards, seismic velocity, and earthquake faults. In addition, the report contains information and interpretations about the regional geologic history and framework. However, the regional scale of this report does not provide sufficient detail for site development purposes.
Geologic Map of the Wenatchee 1:100,000 Quadrangle, Central Washington: A Digital Database

USGS Publications Warehouse

Tabor, R.W.; Waitt, R.B.; Frizzell, V.A.; Swanson, D.A.; Byerly, G.R.; Bentley, R.D.

2005-01-01

This digital map database has been prepared by R.W. Tabor from the published Geologic map of the Wenatchee 1:100,000 Quadrangle, Central Washington. Together with the accompanying text files as PDF, it provides information on the geologic structure and stratigraphy of the area covered. The database delineates map units that are identified by general age and lithology following the stratigraphic nomenclature of the U.S. Geological Survey. The authors mapped most of the bedrock geology at 1:100,000 scale, but compiled Quaternary units at 1:24,000 scale. The Quaternary contacts and structural data have been much simplified for the 1:100,000-scale map and database. The spatial resolution (scale) of the database is 1:100,000 or smaller. This database depicts the distribution of geologic materials and structures at a regional (1:100,000) scale. The report is intended to provide geologic information for the regional study of materials properties, earthquake shaking, landslide potential, mineral hazards, seismic velocity, and earthquake faults. In addition, the report contains information and interpretations about the regional geologic history and framework. However, the regional scale of this report does not provide sufficient detail for site development purposes.
Shuttle-Data-Tape XML Translator

NASA Technical Reports Server (NTRS)

Barry, Matthew R.; Osborne, Richard N.

2005-01-01

JSDTImport is a computer program for translating native Shuttle Data Tape (SDT) files from American Standard Code for Information Interchange (ASCII) format into databases in other formats. JSDTImport solves the problem of organizing the SDT content, affording flexibility to enable users to choose how to store the information in a database to better support client and server applications. JSDTImport can be dynamically configured by use of a simple Extensible Markup Language (XML) file. JSDTImport uses this XML file to define how each record and field will be parsed, its layout and definition, and how the resulting database will be structured. JSDTImport also includes a client application programming interface (API) layer that provides abstraction for the data-querying process. The API enables a user to specify the search criteria to apply in gathering all the data relevant to a query. The API can be used to organize the SDT content and translate into a native XML database. The XML format is structured into efficient sections, enabling excellent query performance by use of the XPath query language. Optionally, the content can be translated into a Structured Query Language (SQL) database for fast, reliable SQL queries on standard database server computers.
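The XPath-based querying mentioned above can be illustrated with Python's standard library. The XML layout, element names and attribute values below are invented for illustration and are not the actual SDT or JSDTImport format.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML database with records grouped into sections.
SAMPLE = """<sdt>
  <section name="measurements">
    <record id="V01" type="analog"><units>psia</units></record>
    <record id="V02" type="discrete"><units>event</units></record>
    <record id="V03" type="analog"><units>degF</units></record>
  </section>
  <section name="commands">
    <record id="K01" type="discrete"><units>event</units></record>
  </section>
</sdt>"""

def find_record_ids(root, section, record_type):
    """Return ids of records of a given type inside a named section."""
    # ElementTree supports a subset of XPath, including attribute predicates.
    path = f".//section[@name='{section}']/record[@type='{record_type}']"
    return [element.get("id") for element in root.findall(path)]

root = ET.fromstring(SAMPLE)
analog_ids = find_record_ids(root, "measurements", "analog")
```

Grouping records into sections, as the abstract describes, lets a path expression narrow the search to one subtree before testing individual records.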
Nighttime Insomnia Symptoms and Perceived Health in the America Insomnia Survey (AIS)

PubMed Central

Walsh, James K.; Coulouvrat, Catherine; Hajak, Goeran; Lakoma, Matthew D.; Petukhova, Maria; Roth, Thomas; Sampson, Nancy A.; Shahly, Victoria; Shillington, Alicia; Stephenson, Judith J.; Kessler, Ronald C.

2011-01-01

Study Objectives: To explore the distribution of the 4 cardinal nighttime symptoms of insomnia—difficulty initiating sleep (DIS), difficulty maintaining sleep (DMS), early morning awakening (EMA), and nonrestorative sleep (NRS)—in a national sample of health plan members and the associations of these nighttime symptoms with sociodemographics, comorbidity, and perceived health. Design/Setting/Participants: Cross-sectional telephone survey of 6,791 adult respondents. Intervention: None. Measurements/Results: Current insomnia was assessed using the Brief Insomnia Questionnaire (BIQ)—a fully structured validated scale generating diagnoses of insomnia using DSM-IV-TR, ICD-10, and RDC/ICSD-2 inclusion criteria. DMS (61.0%) and EMA (52.2%) were more prevalent than DIS (37.7%) and NRS (25.2%) among respondents with insomnia. Sociodemographic correlates varied significantly across the 4 symptoms. All 4 nighttime symptoms were significantly related to a wide range of comorbid physical and mental conditions. All 4 also significantly predicted decrements in perceived health both in the total sample and among respondents with insomnia after adjusting for comorbid physical and mental conditions. Joint associations of the 4 symptoms predicting perceived health were additive and related to daytime distress/impairment. Individual-level associations were strongest for NRS. At the societal level, though, where both prevalence and strength of individual-level associations were taken into consideration, DMS had the strongest associations. Conclusions: The extent to which nighttime insomnia symptoms are stable over time requires future long-term longitudinal study. Within the context of this limitation, the results suggest that core nighttime symptoms are associated with different patterns of risk and perceived health and that symptom-based subtyping might have value. 
Citation: Walsh JK; Coulouvrat C; Hajak G; Lakoma MD; Petukhova M; Roth T; Sampson NA; Shahly V; Shillington A; Stephenson JJ; Kessler RC. Nighttime insomnia symptoms and perceived health in the America Insomnia Survey (AIS). SLEEP 2011;34(8):997-1011. PMID:21804662
SU-E-T-544: A Radiation Oncology-Specific Multi-Institutional Federated Database: Initial Implementation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hendrickson, K; Phillips, M; Fishburn, M

Purpose: To implement a common database structure and user-friendly web-browser based data collection tools across several medical institutions to better support evidence-based clinical decision making and comparative effectiveness research through shared outcomes data. Methods: A consortium of four academic medical centers agreed to implement a federated database, known as Oncospace. Initial implementation has addressed issues of differences between institutions in workflow and types and breadth of structured information captured. This requires coordination of data collection from departmental oncology information systems (OIS), treatment planning systems, and hospital electronic medical records in order to include as much as possible of the multi-disciplinary clinical data associated with a patient's care. Results: The original database schema was well-designed and required only minor changes to meet institution-specific data requirements. Mobile browser interfaces for data entry and review for both the OIS and the Oncospace database were tailored for the workflow of individual institutions. Federation of database queries--the ultimate goal of the project--was tested using artificial patient data. The tests serve as proof-of-principle that the system as a whole--from data collection and entry to providing responses to research queries of the federated database--was viable. The resolution of inter-institutional use of patient data for research is still not completed. Conclusions: The migration from unstructured data mainly in the form of notes and documents to searchable, structured data is difficult. Making the transition requires cooperation of many groups within the department and can be greatly facilitated by using the structured data to improve clinical processes and workflow. The original database schema design is critical to providing enough flexibility for multi-institutional use to improve each institution's ability to study outcomes, determine best practices, and support research. The project has demonstrated the feasibility of deploying a federated database environment for research purposes to multiple institutions.
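The idea of federating one research query across institutions that share a schema can be sketched as follows. This is only an illustration under stated assumptions: the `patients` table, its columns, and the choice to return aggregate counts rather than patient rows are all hypothetical and are not taken from the Oncospace schema or implementation.

```python
import sqlite3
from typing import Dict, Iterable, Tuple

# Hypothetical shared schema; the real Oncospace schema is not described here.
SCHEMA = "CREATE TABLE patients (id INTEGER PRIMARY KEY, diagnosis TEXT, dose_gy REAL)"


def make_site(rows: Iterable[Tuple[str, float]]) -> sqlite3.Connection:
    """Create an in-memory database standing in for one institution."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO patients (diagnosis, dose_gy) VALUES (?, ?)", rows)
    return conn


def federated_count(sites: Dict[str, sqlite3.Connection],
                    diagnosis: str, min_dose: float) -> Dict[str, int]:
    """Run the same query at every site and combine the per-site counts."""
    query = "SELECT COUNT(*) FROM patients WHERE diagnosis = ? AND dose_gy >= ?"
    result = {
        name: conn.execute(query, (diagnosis, min_dose)).fetchone()[0]
        for name, conn in sites.items()
    }
    # Only aggregate counts leave each site in this sketch (an assumption).
    result["total"] = sum(result.values())
    return result
```

Because every site exposes the same table layout, the coordinator needs a single query string; institution-specific differences would have to be absorbed when data are loaded into the common schema.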
DEVELOPMENT OF A STRUCTURE-SEARCHABLE DATABASE FOR PESTICIDE METABOLITES AND ENVIRONMENTAL DEGRADATES

EPA Science Inventory

USEPA is modifying and enhancing existing software for the depiction of metabolic maps to provide access via structures to metabolism information and associated data in EPA's Office of Pesticide Programs (OPP). The database includes information submitted to EPA in support of pest...
Application of kernel functions for accurate similarity search in large chemical databases.

PubMed

Wang, Xiaohong; Huan, Jun; Smalter, Aaron; Lushington, Gerald H

2010-04-29

Similarity search in chemical structure databases is an important problem with many applications in chemical genomics, drug design, and efficient chemical probe screening among others. It is widely believed that structure based methods provide an efficient way to do the query. Recently various graph kernel functions have been designed to capture the intrinsic similarity of graphs. Though successful in constructing accurate predictive and classification models, graph kernel functions cannot be applied to large chemical compound databases due to the high computational complexity and the difficulties in indexing similarity search for large databases. To bridge graph kernel function and similarity search in chemical databases, we applied a novel kernel-based similarity measurement, developed in our team, to measure similarity of graph represented chemicals. In our method, we utilize a hash table to support new graph kernel function definition, efficient storage and fast search. We have applied our method, named G-hash, to large chemical databases. Our results show that the G-hash method achieves state-of-the-art performance for k-nearest neighbor (k-NN) classification. Moreover, the similarity measurement and the index structure are scalable to large chemical databases with smaller indexing size, and faster query processing time as compared to state-of-the-art indexing methods such as Daylight fingerprints, C-tree and GraphGrep. Efficient similarity query processing method for large chemical databases is challenging since we need to balance running time efficiency and similarity search accuracy. Our previous similarity search method, G-hash, provides a new way to perform similarity search in chemical databases. Experimental study validates the utility of G-hash in chemical databases.
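How a hash table can support both a kernel-style similarity and fast k-NN lookup can be illustrated with a much simplified sketch. It is not the G-hash algorithm: the node descriptor (atom label plus sorted neighbour labels) and the similarity (a sum of products of matching descriptor counts) are assumptions chosen only to keep the example short.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# A graph is (node labels, undirected edges); a simplified stand-in for a molecule.
Graph = Tuple[Dict[int, str], List[Tuple[int, int]]]


def node_keys(graph: Graph) -> Counter:
    """Count hashable node descriptors: (label, sorted neighbour labels)."""
    labels, edges = graph
    neighbours = defaultdict(list)
    for u, v in edges:
        neighbours[u].append(labels[v])
        neighbours[v].append(labels[u])
    return Counter((labels[n], tuple(sorted(neighbours[n]))) for n in labels)


class HashIndex:
    """Hash table from node descriptor to {graph id: occurrence count}."""

    def __init__(self) -> None:
        self.table: Dict[tuple, Dict[str, int]] = defaultdict(dict)

    def add(self, graph_id: str, graph: Graph) -> None:
        for key, count in node_keys(graph).items():
            self.table[key][graph_id] = count

    def query(self, graph: Graph, k: int) -> List[Tuple[str, int]]:
        """Return the k indexed graphs with the highest similarity score."""
        scores: Counter = Counter()
        for key, q_count in node_keys(graph).items():
            # Only graphs sharing this descriptor are touched, so no full scan.
            for graph_id, count in self.table.get(key, {}).items():
                scores[graph_id] += q_count * count
        return scores.most_common(k)
```

A query visits only the hash buckets of its own descriptors, which is the property that lets an index of this kind avoid comparing the query with every graph in the database.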

Non-Price Competition and the Structure of the Online Information Industry: Q-Analysis of Medical Databases and Hosts.

ERIC Educational Resources Information Center

Davies, Roy

1987-01-01

Discussion of the online information industry emphasizes the effects of non-price competition on its structure and the firms involved. Q-analysis is applied to data on medical databases and hosts, changes over a three-year period are identified, and an optimum structure for the industry based on economic theory is considered. (Author/LRW)
SPLICE: A program to assemble partial query solutions from three-dimensional database searches into novel ligands

NASA Astrophysics Data System (ADS)

Ho, Chris M. W.; Marshall, Garland R.

1993-12-01

SPLICE is a program that processes partial query solutions retrieved from 3D structural databases to generate novel, aggregate ligands. It is designed to interface with the database searching program FOUNDATION, which retrieves fragments containing any combination of a user-specified minimum number of matching query elements. SPLICE eliminates aspects of structures that are physically incapable of binding within the active site. Then, a systematic rule-based procedure is performed upon the remaining fragments to ensure receptor complementarity. All modifications are automated and remain transparent to the user. Ligands are then assembled by linking components into composite structures through overlapping bonds. As a control experiment, FOUNDATION and SPLICE were used to reconstruct a known HIV-1 protease inhibitor after it had been fragmented, reoriented, and added to a sham database of fifty different small molecules. To illustrate the capabilities of this program, a 3D search query containing the pharmacophoric elements of an aspartic proteinase-inhibitor crystal complex was searched using FOUNDATION against a subset of the Cambridge Structural Database. One hundred thirty-one compounds were retrieved, each containing any combination of at least four query elements. Compounds were automatically screened and edited for receptor complementarity. Numerous combinations of fragments were discovered that could be linked to form novel structures, containing a greater number of pharmacophoric elements than any single retrieved fragment.
UbSRD: The Ubiquitin Structural Relational Database.

PubMed

Harrison, Joseph S; Jacobs, Tim M; Houlihan, Kevin; Van Doorslaer, Koenraad; Kuhlman, Brian

2016-02-22

The structurally defined ubiquitin-like homology fold (UBL) can engage in several unique protein-protein interactions and many of these complexes have been characterized with high-resolution techniques. Using Rosetta's structural classification tools, we have created the Ubiquitin Structural Relational Database (UbSRD), an SQL database of features for all 509 UBL-containing structures in the PDB, allowing users to browse these structures by protein-protein interaction and providing a platform for quantitative analysis of structural features. We used UbSRD to define the recognition features of ubiquitin (UBQ) and SUMO observed in the PDB and the orientation of the UBQ tail while interacting with certain types of proteins. While some of the interaction surfaces on UBQ and SUMO overlap, each molecule has distinct features that aid in molecular discrimination. Additionally, we find that the UBQ tail is malleable and can adopt a variety of conformations upon binding. UbSRD is accessible as an online resource at rosettadesign.med.unc.edu/ubsrd. Copyright © 2015 Elsevier Ltd. All rights reserved.
CHEMICAL STRUCTURE INDEXING OF TOXICITY DATA ON ...

EPA Pesticide Factsheets

Standardized chemical structure annotation of public toxicity databases and information resources is playing an increasingly important role in the 'flattening' and integration of diverse sets of biological activity data on the Internet. This review discusses public initiatives that are accelerating the pace of this transformation, with particular reference to toxicology-related chemical information. Chemical content annotators, structure locator services, large structure/data aggregator web sites, structure browsers, International Union of Pure and Applied Chemistry (IUPAC) International Chemical Identifier (InChI) codes, toxicity data models and public chemical/biological activity profiling initiatives are all playing a role in overcoming barriers to the integration of toxicity data, and are bringing researchers closer to the reality of a mineable chemical Semantic Web. An example of this integration of data is provided by the collaboration among researchers involved with the Distributed Structure-Searchable Toxicity (DSSTox) project, the Carcinogenic Potency Project, projects at the National Cancer Institute and the PubChem database. Standardizing chemical structure annotation of public toxicity databases
Database for Rapid Dereplication of Known Natural Products Using Data from MS and Fast NMR Experiments.

PubMed

Zani, Carlos L; Carroll, Anthony R

2017-06-23

The discovery of novel and/or new bioactive natural products from biota sources is often confounded by the reisolation of known natural products. Dereplication strategies that involve the analysis of NMR and MS spectroscopic data to infer structural features present in purified natural products in combination with database searches of these substructures provide an efficient method to rapidly identify known natural products. Unfortunately this strategy has been hampered by the lack of publicly available and comprehensive natural product databases and open source cheminformatics tools. A new platform, DEREP-NP, has been developed to help solve this problem. DEREP-NP uses the open source cheminformatics program DataWarrior to generate a database containing counts of 65 structural fragments present in 229 358 natural product structures derived from plants, animals, and microorganisms, published before 2013 and freely available in the nonproprietary Universal Natural Products Database (UNPD). By counting the number of times one or more of these structural features occurs in an unknown compound, as deduced from the analysis of its NMR (1H, HSQC, and/or HMBC) and/or MS data, matching structures carrying the same numeric combination of searched structural features can be retrieved from the database. Confirmation that the matching structure is the same compound can then be verified through literature comparison of spectroscopic data. This methodology can be applied to both purified natural products and fractions containing a small number of individual compounds that are often generated as screening libraries. The utility of DEREP-NP has been verified through the analysis of spectra derived from compounds (and fractions containing two or three compounds) isolated from plant, marine invertebrate, and fungal sources. DEREP-NP is freely available at https://github.com/clzani/DEREP-NP and will help to streamline the natural product discovery process.
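Retrieval by "the same numeric combination of searched structural features" amounts to an exact-match filter on fragment counts. The sketch below shows that filter on a toy table; the compound names, fragment names, and counts are invented for illustration and do not come from DEREP-NP or the UNPD.

```python
from typing import Dict, List

# Hypothetical miniature database: compound -> counts of structural fragments.
FRAGMENT_DB: Dict[str, Dict[str, int]] = {
    "compound_A": {"methyl_singlet": 3, "methoxy": 1, "aromatic_CH": 4},
    "compound_B": {"methyl_singlet": 3, "methoxy": 2, "aromatic_CH": 4},
    "compound_C": {"methyl_singlet": 1, "methoxy": 1, "aromatic_CH": 5},
}


def find_matches(query: Dict[str, int], db: Dict[str, Dict[str, int]]) -> List[str]:
    """Return compounds whose counts equal the query for every searched fragment.

    Fragments absent from the query are ignored, so a partial set of
    observations (for example, only what the 1H spectrum reveals) still works.
    """
    return sorted(
        name
        for name, counts in db.items()
        if all(counts.get(fragment, 0) == n for fragment, n in query.items())
    )
```

Each additional fragment count deduced from the spectra narrows the candidate list, after which the remaining hits would still need the literature comparison described above.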
Object-oriented structures supporting remote sensing databases

NASA Technical Reports Server (NTRS)

Wichmann, Keith; Cromp, Robert F.

1995-01-01

Object-oriented databases show promise for modeling the complex interrelationships pervasive in scientific domains. To examine the utility of this approach, we have developed an Intelligent Information Fusion System based on this technology, and applied it to the problem of managing an active repository of remotely-sensed satellite scenes. The design and implementation of the system is compared and contrasted with conventional relational database techniques, followed by a presentation of the underlying object-oriented data structures used to enable fast indexing into the data holdings.
Knowledge Based Engineering for Spatial Database Management and Use

NASA Technical Reports Server (NTRS)

Peuquet, D. (Principal Investigator)

1984-01-01

The use of artificial intelligence techniques applicable to Geographic Information Systems (GIS) is examined. Questions involving the performance and modification of the database structure, the definition of spectra in quadtree structures and their use in search heuristics, extension of the knowledge base, and learning algorithm concepts are investigated.
PIECE 2.0: an update for the plant gene structure comparison and evolution database

USDA-ARS's Scientific Manuscript database

PIECE (Plant Intron Exon Comparison and Evolution) is a web-accessible database that houses intron and exon information of plant genes. PIECE serves as a resource for biologists interested in comparing intron-exon organization and provides valuable insights into the evolution of gene structure in ...
Discovering More Chemical Concepts from 3D Chemical Information Searches of Crystal Structure Databases

ERIC Educational Resources Information Center

Rzepa, Henry S.

2016-01-01

Three new examples are presented illustrating three-dimensional chemical information searches of the Cambridge Structural Database (CSD) from which basic core concepts in organic and inorganic chemistry emerge. These include connecting the regiochemistry of aromatic electrophilic substitution with the geometrical properties of hydrogen bonding…
[A Terahertz Spectral Database Based on Browser/Server Technique].

PubMed

Zhang, Zhuo-yong; Song, Yue

2015-09-01

With the solution of key scientific and technical problems and the development of instrumentation, the application of terahertz technology in various fields has received increasing attention. Owing to its unique advantages, terahertz technology shows broad promise for fast, non-damaging detection as well as many other fields. Terahertz technology combined with other complementary methods can be used to cope with many difficult practical problems that could not be solved before. A critical requirement for the further development of practical terahertz detection methods is a good and reliable terahertz spectral database. We recently developed a browser/server (B/S)-based terahertz spectral database. We designed the main structure and main functions to fulfill practical requirements. The terahertz spectral database now includes more than 240 items, and the spectral information was collected from three sources: (1) collection and citation from other terahertz spectral databases abroad; (2) the published literature; and (3) spectral data measured in our laboratory. The present paper introduces the basic structure and fundamental functions of the terahertz spectral database developed in our laboratory. One of the key functions of this THz database is the calculation of optical parameters. Optical parameters including the absorption coefficient, refractive index, etc. can be calculated from the input THz time-domain spectra. The other main functions and search methods of the browser/server-based terahertz spectral database are also discussed. The database search system provides users with convenient functions including user registration, inquiry, display of spectral figures and molecular structures, spectral matching, etc. The THz database system provides an on-line search function for registered users. Registered users can compare an input THz spectrum with the spectra in the database; ranking by the resulting correlation coefficient makes the search fast and convenient. Our terahertz spectral database can be accessed at http://www.teralibrary.com. The database is so far based on spectral information and will be improved in the future. We hope this terahertz spectral database can provide users with powerful, convenient, and highly efficient functions and promote broader applications of terahertz technology.
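Spectral matching by correlation coefficient can be sketched as below. The abstract does not say which correlation coefficient the database uses, so the Pearson coefficient here is an assumption, as are the requirement that all spectra are sampled on the same frequency grid and the function names.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two spectra sampled at the same frequencies."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("spectra must have the same length (at least 2 points)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # a flat spectrum carries no shape information
    return cov / (norm_x * norm_y)


def rank_matches(query: Sequence[float],
                 library: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Rank library spectra by correlation with the query, best match first."""
    scores = [(name, pearson(query, spectrum)) for name, spectrum in library.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

Because the coefficient is insensitive to overall scale and offset, a query measured at a different sample thickness or concentration can still rank the correct substance first, provided the band shapes agree.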
A Computational Approach From Gene to Structure Analysis of the Human ABCA4 Transporter Involved in Genetic Retinal Diseases.

PubMed

Trezza, Alfonso; Bernini, Andrea; Langella, Andrea; Ascher, David B; Pires, Douglas E V; Sodi, Andrea; Passerini, Ilaria; Pelo, Elisabetta; Rizzo, Stanislao; Niccolai, Neri; Spiga, Ottavia

2017-10-01

The aim of this article is to report the investigation of the structural features of ABCA4, a protein associated with a genetic retinal disease. A new database collecting knowledge of ABCA4 structure may facilitate predictions about the possible functional consequences of gene mutations observed in clinical practice. In order to correlate structural and functional effects of the observed mutations, the structure of mouse P-glycoprotein was used as a template for homology modeling. The obtained structural information and genetic data are the basis of our relational database (ABCA4Database). Sequence variability among all ABCA4-deposited entries was calculated and reported as Shannon entropy score at the residue level. The three-dimensional model of ABCA4 structure was used to locate the spatial distribution of the observed variable regions. Our predictions from structural in silico tools were able to accurately link the functional effects of mutations to phenotype. The development of the ABCA4Database gathers all the available genetic and structural information, yielding a global view of the molecular basis of some retinal diseases. ABCA4 modeled structure provides a molecular basis on which to analyze protein sequence mutations related to genetic retinal disease in order to predict the risk of retinal disease across all possible ABCA4 mutations. Additionally, our ABCA4 predicted structure is a good starting point for the creation of a new data analysis model, appropriate for precision medicine, in order to develop a deeper knowledge network of the disease and to improve the management of patients.
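The per-residue Shannon entropy score mentioned above is H = -Σ p_i log2(p_i), taken over the residue frequencies p_i observed in one alignment column. A minimal sketch follows; it assumes the sequences are already aligned to equal length and treats a gap character as an ordinary symbol, which may differ from the authors' procedure.

```python
import math
from collections import Counter
from typing import List


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of the residues observed in one alignment column."""
    total = len(column)
    entropy = 0.0
    for count in Counter(column).values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


def entropy_profile(aligned: List[str]) -> List[float]:
    """Entropy at every position of a set of equal-length aligned sequences."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    # zip(*aligned) walks the alignment column by column.
    return [column_entropy("".join(col)) for col in zip(*aligned)]
```

A fully conserved position scores 0 bits, while a position where four sequences show four different residues scores 2 bits; high-entropy positions are the variable regions that would then be located on the three-dimensional model.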
Data Structures in Natural Computing: Databases as Weak or Strong Anticipatory Systems

NASA Astrophysics Data System (ADS)

Rossiter, B. N.; Heather, M. A.

2004-08-01

Information systems anticipate the real world. Classical databases store, organise and search collections of data of that real world but only as weak anticipatory information systems. This is because of the reductionism and normalisation needed to map the structuralism of natural data on to idealised machines with von Neumann architectures consisting of fixed instructions. Category theory developed as a formalism to explore the theoretical concept of naturality shows that methods like sketches arising from graph theory as only non-natural models of naturality cannot capture real-world structures for strong anticipatory information systems. Databases need a schema of the natural world. Natural computing databases need the schema itself to be also natural. Natural computing methods including neural computers, evolutionary automata, molecular and nanocomputing and quantum computation have the potential to be strong. At present they are mainly at the stage of weak anticipatory systems.
Intelligent Access to Sequence and Structure Databases (IASSD) - an interface for accessing information from major web databases.

PubMed

Ganguli, Sayak; Gupta, Manoj Kumar; Basu, Protip; Banik, Rahul; Singh, Pankaj Kumar; Vishal, Vineet; Bera, Abhisek Ranjan; Chakraborty, Hirak Jyoti; Das, Sasti Gopal

2014-01-01

With the advent of the age of big data and advances in high-throughput technology, accessing data has become one of the most important steps in the entire knowledge discovery process. Most users are not able to decipher the query result that is obtained when non-specific keywords or a combination of keywords are used. Intelligent Access to Sequence and Structure Databases (IASSD) is a desktop application for the Windows operating system. It is written in Java and utilizes the Web Services Description Language (WSDL) files and Jar files of E-utilities of various databases such as the National Centre for Biotechnology Information (NCBI) and the Protein Data Bank (PDB). In addition, IASSD allows the user to view protein structures using a JMOL application which supports conditional editing. The Jar file is freely available through e-mail from the corresponding author.
A Support Database System for Integrated System Health Management (ISHM)

NASA Technical Reports Server (NTRS)

Schmalzel, John; Figueroa, Jorge F.; Turowski, Mark; Morris, John

2007-01-01

The development, deployment, operation and maintenance of Integrated Systems Health Management (ISHM) applications require the storage and processing of tremendous amounts of low-level data. This data must be shared in a secure and cost-effective manner between developers, and processed within several heterogeneous architectures. Modern database technology allows this data to be organized efficiently, while ensuring the integrity and security of the data. The extensibility and interoperability of the current database technologies also allows for the creation of an associated support database system. A support database system provides additional capabilities by building applications on top of the database structure. These applications can then be used to support the various technologies in an ISHM architecture. This presentation and paper propose a detailed structure and application description for a support database system, called the Health Assessment Database System (HADS). The HADS provides a shared context for organizing and distributing data as well as a definition of the applications that provide the required data-driven support to ISHM. This approach provides another powerful tool for ISHM developers, while also enabling novel functionality. This functionality includes: automated firmware updating and deployment, algorithm development assistance and electronic datasheet generation. The architecture for the HADS has been developed as part of the ISHM toolset at Stennis Space Center for rocket engine testing. A detailed implementation has begun for the Methane Thruster Testbed Project (MTTP) in order to assist in developing health assessment and anomaly detection algorithms for ISHM. The structure of this implementation is shown in Figure 1. The database structure consists of three primary components: the system hierarchy model, the historical data archive and the firmware codebase. 
The system hierarchy model replicates the physical relationships between system elements to provide the logical context for the database. The historical data archive provides a common repository for sensor data that can be shared between developers and applications. The firmware codebase is used by the developer to organize the intelligent element firmware into atomic units which can be assembled into complete firmware for specific elements.
Dictionary as Database.

ERIC Educational Resources Information Center

Painter, Derrick

1996-01-01

Discussion of dictionaries as databases focuses on the digitizing of The Oxford English Dictionary (OED) and the use of Standard Generalized Mark-Up Language (SGML). Topics include the creation of a consortium to digitize the OED, document structure, relational databases, text forms, sequence, and discourse. (LRW)
Translation from the collaborative OSM database to cartography

NASA Astrophysics Data System (ADS)

Hayat, Flora

2018-05-01

The OpenStreetMap (OSM) database includes original items very useful for geographical analysis and for creating thematic maps. Contributors record in the open database various themes regarding amenities, leisure, transports, buildings and boundaries. The Michelin mapping department develops map prototypes to test the feasibility of mapping based on OSM. A research project is in development to translate the OSM database structure into a database structure fitted to Michelin graphic guidelines. It aims at defining the right structure for Michelin's uses. The research project relies on the analysis of semantic and geometric heterogeneities in OSM data. To that end, Michelin implements methods to transform the input geographical database into a cartographic image dedicated to specific uses (routing and tourist maps). The paper focuses on the mapping tools available to produce a personalised spatial database. Based on processed data, paper and Web maps can be displayed. Two prototypes are described in this article: a vector tile web map and a mapping method to produce paper maps on a regional scale. The vector tile mapping method offers easy navigation within the map and within graphic and thematic guidelines. Paper maps can be partly automatically drawn. The drawing automation and data management are part of the mapping creation as well as the final hand-drawing phase. Both prototypes have been set up using the OSM technical ecosystem.
Querying Semi-Structured Data

NASA Technical Reports Server (NTRS)

Abiteboul, Serge

1997-01-01

The amount of data of all kinds available electronically has increased dramatically in recent years. The data resides in different forms, ranging from unstructured data in the systems to highly structured in relational database systems. Data is accessible through a variety of interfaces including Web browsers, database query languages, application-specific interfaces, or data exchange formats. Some of this data is raw data, e.g., images or sound. Some of it has structure even if the structure is often implicit, and not as rigid or regular as that found in standard database systems. Sometimes the structure exists but has to be extracted from the data. Sometimes also it exists but we prefer to ignore it for certain purposes such as browsing. We call here semi-structured data this data that is (from a particular viewpoint) neither raw data nor strictly typed, i.e., not table-oriented as in a relational model or sorted-graph as in object databases. As will be seen later when the notion of semi-structured data is more precisely defined, the need for semi-structured data arises naturally in the context of data integration, even when the data sources are themselves well-structured. Although data integration is an old topic, the need to integrate a wider variety of data formats (e.g., SGML or ASN.1 data) and data found on the Web has brought the topic of semi-structured data to the forefront of research. The main purpose of the paper is to isolate the essential aspects of semi-structured data. We also survey some proposals of models and query languages for semi-structured data. In particular, we consider recent works at Stanford U. and U. Penn on semi-structured data. In both cases, the motivation is found in the integration of heterogeneous data.
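Data that is "neither raw nor strictly typed" is often pictured as an edge-labeled graph in which records of the same kind need not share the same fields. The sketch below uses nested Python dicts and lists as a stand-in for such a graph and evaluates a simple label path with a `*` wildcard. It illustrates the general idea only; it is not the data model or query language of any of the systems surveyed in the paper, and the sample data is invented.

```python
from typing import Any, Iterator, List

# Two "restaurant" records with different, irregular structure.
GUIDE = {
    "restaurant": [
        {"name": "Chef Chu", "zipcode": 92310},
        {"name": "Saigon",
         "address": {"street": "El Camino", "city": "Mountain View"},
         "price": "low"},
    ]
}


def follow(data: Any, path: List[str]) -> Iterator[Any]:
    """Yield every value reached by following the labels in `path`.

    A label of "*" matches any edge. Missing labels simply yield nothing,
    so the query tolerates records that lack a field.
    """
    if not path:
        yield data
        return
    if isinstance(data, list):
        # A list models several edges carrying the same label.
        for item in data:
            yield from follow(item, path)
    elif isinstance(data, dict):
        head, rest = path[0], path[1:]
        for label, child in data.items():
            if head == "*" or head == label:
                yield from follow(child, rest)
```

For example, `list(follow(GUIDE, ["restaurant", "address", "city"]))` returns only the city of the second record and silently skips the first, which has no address; a relational table would instead force a schema decision about the missing column.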
In silico analysis of fragile histidine triad involved in regression of carcinoma.

PubMed

Rasheed, Muhammad Asif; Tariq, Fatima; Afzal, Sara; Mannanv, Shazia

2017-04-01

Hepatocellular carcinoma (HCCa) is a primary malignancy of the liver. Many different proteins are involved in HCCa including insulin growth factor (IGF) II, signal transducers and activators of transcription (STAT) 3, STAT4, mothers against decapentaplegic homolog 4 (SMAD 4), fragile histidine triad (FHIT) and selective internal radiation therapy (SIRT) etc. The present study is based on the bioinformatics analysis of FHIT protein in order to understand the proteomics aspect and improvement of the diagnosis of the disease based on the protein. Information related to the protein was gathered from several databases, including the National Centre for Biotechnology Information (NCBI) Gene, Protein and Online Mendelian Inheritance in Man (OMIM) databases, the Uniprot database, the String database and the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Moreover, the structure of the protein and an evaluation of the quality of the structure were obtained from the Easy modeler programme. Hence, this analysis not only helped to gather information related to the protein in one place, but also analysed the structure and quality of the protein to conclude that the protein has a role in carcinoma.
DSSTOX WEBSITE LAUNCH: IMPROVING PUBLIC ACCESS ...

EPA Pesticide Factsheets

DSSTox Website Launch: Improving Public Access to Databases for Building Structure-Toxicity Prediction Models. Ann M. Richard, US Environmental Protection Agency, Research Triangle Park, NC, USA. Distributed: Decentralized set of standardized, field-delimited databases, each separately authored and maintained, that are able to accommodate diverse toxicity data content; Structure-Searchable: Standard format (SDF) structure-data files that can be readily imported into available chemical relational databases and structure-searched; Tox: Toxicity data as it exists in widely disparate forms in current public databases, spanning diverse toxicity endpoints, test systems, levels of biological content, degrees of summarization, and information content. INTRODUCTION: The economic and social pressures to reduce the need for animal testing and to better anticipate the potential for human and eco-toxicity of environmental, industrial, or pharmaceutical chemicals are as pressing today as at any time prior. However, the goal of predicting chemical toxicity in its many manifestations, the 'T' in 'ADMET' (absorption, distribution, metabolism, elimination, toxicity), remains one of the most difficult and largely unmet challenges in a chemical screening paradigm [1]. It is widely acknowledged that the single greatest hurdle to improving structure-activity relationship (SAR) toxicity prediction capabilities, in both the pharmaceutical and environmental regulation arenas, is the lack of suffici
The ASTRAL Compendium in 2004

DOE R&D Accomplishments Database

Chandonia, John-Marc; Hon, Gary; Walker, Nigel S.; Lo Conte, Loredana; Koehl, Patrice; Levitt, Michael; Brenner, Steven E.

2003-09-15

The ASTRAL compendium provides several databases and tools to aid in the analysis of protein structures, particularly through the use of their sequences. Partially derived from the SCOP database of protein structure domains, it includes sequences for each domain and other resources useful for studying these sequences and domain structures. The current release of ASTRAL contains 54,745 domains, more than three times as many as the initial release four years ago. ASTRAL has undergone major transformations in the past two years. In addition to several complete updates each year, ASTRAL is now updated on a weekly basis with preliminary classifications of domains from newly released PDB structures. These classifications are available as a stand-alone database, as well as available integrated into other ASTRAL databases such as representative subsets. To enhance the utility of ASTRAL to structural biologists, all SCOP domains are now made available as PDB-style coordinate files as well as sequences. In addition to sequences and representative subsets based on SCOP domains, sequences and subsets based on PDB chains are newly included in ASTRAL. Several search tools have been added to ASTRAL to facilitate retrieval of data by individual users and automated methods.

Structure-Based Characterization of Multiprotein Complexes

PubMed Central

Wiederstein, Markus; Gruber, Markus; Frank, Karl; Melo, Francisco; Sippl, Manfred J.

2014-01-01

Summary: Multiprotein complexes govern virtually all cellular processes. Their 3D structures provide important clues to their biological roles, especially through structural correlations among protein molecules and complexes. The detection of such correlations generally requires comprehensive searches in databases of known protein structures by means of appropriate structure-matching techniques. Here, we present a high-speed structure search engine capable of instantly matching large protein oligomers against the complete and up-to-date database of biologically functional assemblies of protein molecules. We use this tool to reveal unseen structural correlations on the level of protein quaternary structure and demonstrate its general usefulness for efficiently exploring complex structural relationships among known protein assemblies. PMID:24954616
A Novel Concept for the Search and Retrieval of the Derwent Markush Resource Database.

PubMed

Barth, Andreas; Stengel, Thomas; Litterst, Edwin; Kraut, Hans; Matuszczyk, Henry; Ailer, Franz; Hajkowski, Steve

2016-05-23

The representation of and search for generic chemical structures (Markush) remains a continuing challenge. Several research groups have addressed this problem, and over time a limited number of practical solutions have been proposed. Today there are two large commercial providers of Markush databases: Chemical Abstracts Service (CAS) and Thomson Reuters. The Thomson Reuters "Derwent" Markush database is currently offered via the online services Questel and STN and as a data feed for in-house use. The aim of this paper is to briefly review the existing Markush systems (databases plus search engines) and to describe our new approach for the implementation of the Derwent Markush Resource on STN. Our new approach demonstrates the integration of the Derwent Markush Resource database into the existing chemistry-focused STN platform without loss of detail. This provides compatibility with other structure and Markush databases on STN and at the same time makes it possible to deploy the specific features and functions of the Derwent approach. It is shown that the different Markush languages developed by CAS and Derwent can be combined into a single general Markush description. In this concept the generic nodes are grouped together in a unique hierarchy where all chemical elements and fragments can be integrated. As a consequence, both systems are searchable using a single structure query. Moreover, the presented concept could serve as a promising starting point for a common generalized description of Markush structures.
Public census data on CD-ROM at Lawrence Berkeley Laboratory

DOE Office of Scientific and Technical Information (OSTI.GOV)

Merrill, D.W.

The Comprehensive Epidemiologic Data Resource (CEDR) and Populations at Risk to Environmental Pollution (PAREP) projects, of the Information and Computing Sciences Division (ICSD) at Lawrence Berkeley Laboratory (LBL), are using public socio-economic and geographic data files which are available to CEDR and PAREP collaborators via LBL's computing network. At this time 70 CD-ROM diskettes (approximately 36 gigabytes) are on line via the Unix file server cedrcd.lbl.gov. Most of the files are from the US Bureau of the Census, and most pertain to the 1990 Census of Population and Housing. All the CD-ROM diskettes contain documentation in the form of ASCII text files. Printed documentation for most files is available for inspection at University of California Data and Technical Assistance (UC DATA), or the UC Documents Library. Many of the CD-ROM diskettes distributed by the Census Bureau contain software for PC compatible computers, for easily accessing the data. Shared access to the data is maintained through a collaboration among the CEDR and PAREP projects at LBL, and UC DATA, and the UC Documents Library. Via the Sun Network File System (NFS), these data can be exported to Internet computers for direct access by the user's application program(s).
3D-SURFER 2.0: web platform for real-time search and characterization of protein surfaces.

PubMed

Xiong, Yi; Esquivel-Rodriguez, Juan; Sael, Lee; Kihara, Daisuke

2014-01-01

The increasing number of uncharacterized protein structures necessitates the development of computational approaches for function annotation using the protein tertiary structures. Protein structure database search is the basis of any structure-based functional elucidation of proteins. 3D-SURFER is a web platform for real-time protein surface comparison of a given protein structure against the entire PDB using 3D Zernike descriptors. It can smoothly navigate the protein structure space in real-time from one query structure to another. A major new feature of Release 2.0 is the ability to compare the protein surface of a single chain, a single domain, or a single complex against databases of protein chains, domains, complexes, or a combination of all three in the latest PDB. Additionally, two types of protein structures can now be compared: all-atom-surface and backbone-atom-surface. The server can also accept a batch job for a large number of database searches. Pockets in protein surfaces can be identified by VisGrid and LIGSITE (csc). The server is available at http://kiharalab.org/3d-surfer/.
An online database of nuclear electromagnetic moments

NASA Astrophysics Data System (ADS)

Mertzimekis, T. J.; Stamou, K.; Psaltis, A.

2016-01-01

Measurements of nuclear magnetic dipole and electric quadrupole moments are considered quite important for the understanding of nuclear structure both near and far from the valley of stability. The recent advent of radioactive beams has resulted in a plethora of new, continuously flowing, experimental data on nuclear structure - including nuclear moments - which hinders the information management. A new, dedicated, public and user friendly online database (http://magneticmoments.info) has been created comprising experimental data of nuclear electromagnetic moments. The present database supersedes existing printed compilations, including also non-evaluated series of data and relevant meta-data, while putting strong emphasis on bimonthly updates. The scope, features and extensions of the database are reported.
Querying databases of trajectories of differential equations: Data structures for trajectories

NASA Technical Reports Server (NTRS)

Grossman, Robert

1989-01-01

One approach to qualitative reasoning about dynamical systems is to extract qualitative information by searching or making queries on databases containing very large numbers of trajectories. The efficiency of such queries depends crucially upon finding an appropriate data structure for trajectories of dynamical systems. Suppose that a large number of parameterized trajectories γ of a dynamical system evolving in R^N are stored in a database. Let η ⊂ R^N denote a parameterized path in Euclidean space, and let ‖·‖ denote a norm on the space of paths. A data structure is defined to represent trajectories of dynamical systems, and an algorithm is sketched which answers queries.
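As a rough illustration of the kind of query this record describes, the sketch below stores sampled trajectories and returns the one closest to a query path. The shared sampling grid, the discrete L2 norm, the linear scan, and all names (`path_distance`, `nearest_trajectory`, the parameter labels) are assumptions made for illustration; the abstract does not specify the query form, the norm, or the data structure used in the report.

```python
import math

def path_distance(eta, gamma):
    """Discrete L2 distance between two paths sampled at the same parameter values."""
    if len(eta) != len(gamma):
        raise ValueError("paths must be sampled on the same grid")
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(eta, gamma)))

def nearest_trajectory(database, eta):
    """Linear scan: return (key, distance) of the stored trajectory closest to eta."""
    best_key = min(database, key=lambda k: path_distance(eta, database[k]))
    return best_key, path_distance(eta, database[best_key])

# Hypothetical database: trajectories in R^2 keyed by a parameter label.
database = {
    "a=0.5": [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)],
    "a=1.0": [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],
    "a=2.0": [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0)],
}
query = [(0.0, 0.0), (1.0, 0.9), (2.0, 2.1)]
key, dist = nearest_trajectory(database, query)
```

A linear scan costs one distance evaluation per stored trajectory, which is presumably the cost a purpose-built data structure would aim to reduce.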
Software Engineering Laboratory (SEL) database organization and user's guide, revision 2

NASA Technical Reports Server (NTRS)

Morusiewicz, Linda; Bristow, John

1992-01-01

The organization of the Software Engineering Laboratory (SEL) database is presented. Included are definitions and detailed descriptions of the database tables and views, the SEL data, and system support data. The mapping from the SEL and system support data to the base table is described. In addition, techniques for accessing the database through the Database Access Manager for the SEL (DAMSEL) system and via the ORACLE structured query language (SQL) are discussed.
Software Engineering Laboratory (SEL) database organization and user's guide

NASA Technical Reports Server (NTRS)

So, Maria; Heller, Gerard; Steinberg, Sandra; Spiegel, Douglas

1989-01-01

The organization of the Software Engineering Laboratory (SEL) database is presented. Included are definitions and detailed descriptions of the database tables and views, the SEL data, and system support data. The mapping from the SEL and system support data to the base tables is described. In addition, techniques for accessing the database, through the Database Access Manager for the SEL (DAMSEL) system and via the ORACLE structured query language (SQL), are discussed.
Component, Context and Manufacturing Model Library (C2M2L)

DTIC Science & Technology

2013-03-01

Penn State team were stored in a relational database for easy access, storage and maintainability. The relational database consisted of a PostGres ...file into a format that can be imported into the PostGres database. This same custom application was used to generate Microsoft Excel templates...Press Break Forming Equipment 4.14 Manufacturing Model Library Database Structure The data storage mechanism for the ARL PSU MML was a PostGres database
A natural language interface plug-in for cooperative query answering in biological databases.

PubMed

Jamil, Hasan M

2012-06-11

One of the many unique features of biological databases is that the mere existence of a ground data item is not always a precondition for a query response. It may be argued that from a biologist's standpoint, queries are not always best posed using a structured language. By this we mean that approximate and flexible responses to natural language like queries are well suited for this domain. This is partly due to biologists' tendency to seek simpler interfaces and partly due to the fact that questions in biology involve high level concepts that are open to interpretations computed using sophisticated tools. In such highly interpretive environments, rigidly structured databases do not always perform well. In this paper, our goal is to propose a semantic correspondence plug-in to aid natural language query processing over arbitrary biological database schema with an aim to providing cooperative responses to queries tailored to users' interpretations. Natural language interfaces for databases are generally effective when they are tuned to the underlying database schema and its semantics. Therefore, changes in database schema become impossible to support, or a substantial reorganization cost must be absorbed to reflect any change. We leverage developments in natural language parsing, rule languages and ontologies, and data integration technologies to assemble a prototype query processor that is able to transform a natural language query into a semantically equivalent structured query over the database. We allow knowledge rules and their frequent modifications as part of the underlying database schema. The approach we adopt in our plug-in overcomes some of the serious limitations of many contemporary natural language interfaces, including support for schema modifications and independence from underlying database schema. 
The plug-in introduced in this paper is generic and facilitates connecting user selected natural language interfaces to arbitrary databases using a semantic description of the intended application. We demonstrate the feasibility of our approach with a practical example.
RExPrimer: an integrated primer designing tool increases PCR effectiveness by avoiding 3' SNP-in-primer and mis-priming from structural variation

PubMed Central

2009-01-01

Background Polymerase chain reaction (PCR) is very useful in many areas of molecular biology research. It is commonly observed that PCR success is critically dependent on design of an effective primer pair. Current tools for primer design do not adequately address the problem of PCR failure due to mis-priming on target-related sequences and structural variations in the genome. Methods We have developed an integrated graphical web-based application for primer design, called RExPrimer, which was written in Python language. The software uses Primer3 as the primer designing core algorithm. Locally stored sequence information and genomic variant information were hosted on MySQLv5.0 and were incorporated into RExPrimer. Results RExPrimer provides many functionalities for improved PCR primer design. Several databases, namely annotated human SNP databases, insertion/deletion (indel) polymorphisms database, pseudogene database, and structural genomic variation databases were integrated into RExPrimer, enabling an effective without-leaving-the-website validation of the resulting primers. By incorporating these databases, the primers reported by RExPrimer avoid mis-priming to related sequences (e.g. pseudogene, segmental duplication) as well as possible PCR failure because of structural polymorphisms (SNP, indel, and copy number variation (CNV)). To prevent mismatching caused by unexpected SNPs in the designed primers, in particular the 3' end (SNP-in-Primer), several SNP databases covering the broad range of population-specific SNP information are utilized to report SNPs present in the primer sequences. Population-specific SNP information also helps customize primer design for a specific population. Furthermore, RExPrimer offers a graphical user-friendly interface through the use of scalable vector graphic image that intuitively presents resulting primers along with the corresponding gene structure. 
In this study, we demonstrated the program effectiveness in successfully generating primers for strong homologous sequences. Conclusion The improvements for primer design incorporated into RExPrimer were demonstrated to be effective in designing primers for challenging PCR experiments. Integration of SNP and structural variation databases allows for robust primer design for a variety of PCR applications, irrespective of the sequence complexity in the region of interest. This software is freely available at http://www4a.biotec.or.th/rexprimer. PMID:19958502
Object-oriented parsing of biological databases with Python.

PubMed

Ramu, C; Gemünd, C; Gibson, T J

2000-07-01

While database activities in the biological area are increasing rapidly, rather little is done in the area of parsing them in a simple and object-oriented way. We present here an elegant, simple yet powerful way of parsing biological flat-file databases. We have taken EMBL, SWISS-PROT and GENBANK as examples. EMBL and SWISS-PROT do not differ much in the format structure. GENBANK has a very different format structure from EMBL and SWISS-PROT. Extracting the desired fields in an entry (for example a sub-sequence with an associated feature) for later analysis is a constant need in the biological sequence-analysis community: this is illustrated with tools to make new splice-site databases. The interface to the parser is abstract in the sense that the access to all the databases is independent from their different formats, since parsing instructions are hidden.
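To make the idea of object-oriented flat-file parsing concrete, here is a minimal sketch for the EMBL/SWISS-PROT style, in which each line starts with a two-letter line code and each record ends with `//`. The class and function names (`FlatFileEntry`, `parse_entries`) and the sample records are hypothetical; this is not the interface of the parser described in the paper, and it does not handle the GENBANK layout.

```python
from collections import defaultdict

class FlatFileEntry:
    """One record of an EMBL/SWISS-PROT-style flat file, keyed by two-letter line code."""

    def __init__(self, lines):
        self.fields = defaultdict(list)
        for line in lines:
            code, text = line[:2], line[2:].strip()
            if code.strip():  # skip sequence-data lines, which begin with blanks
                self.fields[code].append(text)

    def get(self, code):
        """Return all text for a line code joined into one string."""
        return " ".join(self.fields.get(code, []))

def parse_entries(text):
    """Yield one FlatFileEntry per record; records are terminated by '//'."""
    current = []
    for line in text.splitlines():
        if line.startswith("//"):
            if current:
                yield FlatFileEntry(current)
            current = []
        elif line.strip():
            current.append(line)

# Made-up sample data in the general style of these formats.
sample = """ID   TEST1_HUMAN
AC   P00001;
DE   Hypothetical example protein
DE   (illustration only).
//
ID   TEST2_MOUSE
AC   P00002;
DE   Second made-up record.
//
"""
entries = list(parse_entries(sample))
```

Callers ask for a field by its code (`entries[0].get("DE")`) without touching the line-splitting logic, which is the kind of format hiding the abstract refers to.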
An Extensible "SCHEMA-LESS" Database Framework for Managing High-Throughput Semi-Structured Documents

NASA Technical Reports Server (NTRS)

Maluf, David A.; Tran, Peter B.

2003-01-01

An object-relational database management system is an integrated hybrid cooperative approach to combine the best practices of both the relational model utilizing SQL queries and the object-oriented, semantic paradigm for supporting complex data creation. In this paper, a highly scalable, information on demand database framework, called NETMARK, is introduced. NETMARK takes advantage of the Oracle 8i object-relational database using physical addresses data types for very efficient keyword search of records spanning across both context and content. NETMARK was originally developed in early 2000 as a research and development prototype to handle the vast amounts of unstructured and semistructured documents existing within NASA enterprises. Today, NETMARK is a flexible, high-throughput open database framework for managing, storing, and searching unstructured or semi-structured arbitrary hierarchical models, such as XML and HTML.
The structure and dipole moment of globular proteins in solution and crystalline states: use of NMR and X-ray databases for the numerical calculation of dipole moment.

PubMed

Takashima, S

2001-04-05

The large dipole moment of globular proteins has been well known because of the detailed studies using dielectric relaxation and electro-optical methods. The search for the origin of these dipole moments, however, must be based on the detailed knowledge of protein structure with atomic resolutions. At present, we have two sources of information on the structure of protein molecules: (1) x-ray databases obtained in crystalline state; (2) NMR databases obtained in solution state. While x-ray databases consist of only one model, NMR databases, because of the fluctuation of the protein folding in solution, consist of a number of models, thus enabling the computation of dipole moment repeated for all these models. The aim of this work, using these databases, is the detailed investigation on the interdependence between the structure and dipole moment of protein molecules. The dipole moment of protein molecules has roughly two components: one dipole moment is due to surface charges and the other, core dipole moment, is due to polar groups such as N-H and C=O bonds. The computation of surface charge dipole moment consists of two steps: (A) calculation of the pK shifts of charged groups for electrostatic interactions and (B) calculation of the dipole moment using the pK corrected for electrostatic shifts. The dipole moments of several proteins were computed using both NMR and x-ray databases. The dipole moments of these two sets of calculations are, with a few exceptions, in good agreement with one another and also with measured dipole moments.
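The abstract does not give its formulas, so the following is only a sketch of the standard point-charge definition of a dipole moment, the sum of each charge times its position vector, applied to atomic coordinates. The function name `dipole_moment` and the two-charge toy input are assumptions; the pK-shift correction of step (A) is not modeled here.

```python
import math

E_ANGSTROM_TO_DEBYE = 4.80320  # 1 elementary charge x 1 angstrom, in debye

def dipole_moment(charges, coords, origin=(0.0, 0.0, 0.0)):
    """Return (vector, magnitude) in debye for point charges (in e) at coords (in angstrom)."""
    vec = [0.0, 0.0, 0.0]
    for q, r in zip(charges, coords):
        for axis in range(3):
            vec[axis] += q * (r[axis] - origin[axis])
    vec = [component * E_ANGSTROM_TO_DEBYE for component in vec]
    return vec, math.sqrt(sum(c * c for c in vec))

# Toy input: partial charges of +0.4 e and -0.4 e separated by 1.2 angstrom along x.
charges = [0.4, -0.4]
coords = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0)]
vector, magnitude = dipole_moment(charges, coords)
```

For a molecule with zero net charge the result does not depend on `origin`; for a charged protein it does, so a reference point such as the center of mass has to be chosen and reported. With an NMR ensemble, the same function would simply be called once per model.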
microRNAs Databases: Developmental Methodologies, Structural and Functional Annotations.

PubMed

Singh, Nagendra Kumar

2017-09-01

microRNA (miRNA) is an endogenous and evolutionary conserved non-coding RNA, involved in post-transcriptional process as gene repressor and mRNA cleavage through RNA-induced silencing complex (RISC) formation. In RISC, miRNA binds in complementary base pair with targeted mRNA along with Argonaut proteins complex, causes gene repression or endonucleolytic cleavage of mRNAs and results in many diseases and syndromes. After the discovery of miRNA lin-4 and let-7, subsequently large numbers of miRNAs were discovered by low-throughput and high-throughput experimental techniques along with computational process in various biological and metabolic processes. The miRNAs are important non-coding RNA for understanding the complex biological phenomena of organism because it controls the gene regulation. This paper reviews miRNA databases with structural and functional annotations developed by various researchers. These databases contain structural and functional information of animal, plant and virus miRNAs including miRNAs-associated diseases, stress resistance in plant, miRNAs take part in various biological processes, effect of miRNAs interaction on drugs and environment, effect of variance on miRNAs, miRNAs gene expression analysis, sequence of miRNAs, structure of miRNAs. This review focuses on the developmental methodology of miRNA databases such as computational tools and methods used for extraction of miRNAs annotation from different resources or through experiment. This study also discusses the efficiency of user interface design of every database along with current entry and annotations of miRNA (pathways, gene ontology, disease ontology, etc.). Here, an integrated schematic diagram of construction process for databases is also drawn along with tabular and graphical comparison of various types of entries in different databases. Aim of this paper is to present the importance of miRNAs-related resources at a single place.
Interactive and Versatile Navigation of Structural Databases.

PubMed

Korb, Oliver; Kuhn, Bernd; Hert, Jérôme; Taylor, Neil; Cole, Jason; Groom, Colin; Stahl, Martin

2016-05-12

We present CSD-CrossMiner, a novel tool for pharmacophore-based searches in crystal structure databases. Intuitive pharmacophore queries describing, among others, protein-ligand interaction patterns, ligand scaffolds, or protein environments can be built and modified interactively. Matching crystal structures are overlaid onto the query and visualized as soon as they are available, enabling the researcher to quickly modify a hypothesis on the fly. We exemplify the utility of the approach by showing applications relevant to real-world drug discovery projects, including the identification of novel fragments for a specific protein environment or scaffold hopping. The ability to concurrently search protein-ligand binding sites extracted from the Protein Data Bank (PDB) and small organic molecules from the Cambridge Structural Database (CSD) using the same pharmacophore query further emphasizes the flexibility of CSD-CrossMiner. We believe that CSD-CrossMiner closes an important gap in mining structural data and will allow users to extract more value from the growing number of available crystal structures.
SORTEZ: a relational translator for NCBI's ASN.1 database.

PubMed

Hart, K W; Searls, D B; Overton, G C

1994-07-01

The National Center for Biotechnology Information (NCBI) has created a database collection that includes several protein and nucleic acid sequence databases, a biosequence-specific subset of MEDLINE, as well as value-added information such as links between similar sequences. Information in the NCBI database is modeled in Abstract Syntax Notation 1 (ASN.1) an Open Systems Interconnection protocol designed for the purpose of exchanging structured data between software applications rather than as a data model for database systems. While the NCBI database is distributed with an easy-to-use information retrieval system, ENTREZ, the ASN.1 data model currently lacks an ad hoc query language for general-purpose data access. For that reason, we have developed a software package, SORTEZ, that transforms the ASN.1 database (or other databases with nested data structures) to a relational data model and subsequently to a relational database management system (Sybase) where information can be accessed through the relational query language, SQL. Because the need to transform data from one data model and schema to another arises naturally in several important contexts, including efficient execution of specific applications, access to multiple databases and adaptation to database evolution this work also serves as a practical study of the issues involved in the various stages of database transformation. We show that transformation from the ASN.1 data model to a relational data model can be largely automated, but that schema transformation and data conversion require considerable domain expertise and would greatly benefit from additional support tools.
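The core transformation described here, turning nested records into linked relational tables that SQL can query, can be sketched as follows. This is a toy under stated assumptions: SQLite stands in for the Sybase system named in the abstract, plain Python dictionaries stand in for ASN.1 values, and the table names, column names, and records are invented for the example.

```python
import sqlite3

def load_nested(records, conn):
    """Flatten nested records into a parent table and a child table linked by a key."""
    conn.execute("CREATE TABLE entry (entry_id INTEGER PRIMARY KEY, accession TEXT)")
    conn.execute(
        "CREATE TABLE feature ("
        "entry_id INTEGER REFERENCES entry(entry_id), "
        "kind TEXT, seq_start INTEGER, seq_end INTEGER)"
    )
    for entry_id, record in enumerate(records, start=1):
        conn.execute("INSERT INTO entry VALUES (?, ?)", (entry_id, record["accession"]))
        # Each nested list element becomes one row in the child table.
        for feat in record.get("features", []):
            conn.execute(
                "INSERT INTO feature VALUES (?, ?, ?, ?)",
                (entry_id, feat["kind"], feat["seq_start"], feat["seq_end"]),
            )
    conn.commit()

# Invented nested records, loosely resembling sequence entries with features.
records = [
    {"accession": "X00001", "features": [
        {"kind": "exon", "seq_start": 1, "seq_end": 120},
        {"kind": "exon", "seq_start": 300, "seq_end": 450},
    ]},
    {"accession": "X00002", "features": [
        {"kind": "exon", "seq_start": 10, "seq_end": 90},
    ]},
]
conn = sqlite3.connect(":memory:")
load_nested(records, conn)

# An ad hoc query that the nested form does not support directly.
rows = conn.execute(
    "SELECT e.accession, COUNT(*) FROM entry e JOIN feature f USING (entry_id) "
    "GROUP BY e.accession ORDER BY e.accession"
).fetchall()
```

Even in this small case the schema (which nesting levels become tables, which keys link them) was chosen by hand, in line with the abstract's remark that schema transformation needs domain expertise.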
LAND-deFeND - An innovative database structure for landslides and floods and their consequences.

PubMed

Napolitano, Elisabetta; Marchesini, Ivan; Salvati, Paola; Donnini, Marco; Bianchi, Cinzia; Guzzetti, Fausto

2018-02-01

Information on historical landslides and floods - collectively called "geo-hydrological hazards" - is key to understanding the complex dynamics of the events, to estimating the temporal and spatial frequency of damaging events, and to quantifying their impact. A number of databases on geo-hydrological hazards and their consequences have been developed worldwide at different geographical and temporal scales. Of the few available database structures that can handle information on both landslides and floods some are outdated and others were not designed to store, organize, and manage information on single phenomena or on the type and monetary value of the damages and the remediation actions. Here, we present the LANDslides and Floods National Database (LAND-deFeND), a new database structure able to store, organize, and manage in a single digital structure spatial information collected from various sources with different accuracy. In designing LAND-deFeND, we defined four groups of entities, namely: nature-related, human-related, geospatial-related, and information-source-related entities that collectively can describe fully the geo-hydrological hazards and their consequences. In LAND-deFeND, the main entities are the nature-related entities, encompassing: (i) the "phenomenon", a single landslide or local inundation, (ii) the "event", which represents the ensemble of the inundations and/or landslides that occurred in a conventional geographical area in a limited period, and (iii) the "trigger", which is the meteo-climatic or seismic cause of the geo-hydrological hazards. LAND-deFeND maintains the relations between the nature-related entities and the human-related entities even where the information is partially missing. 
The physical model of the LAND-deFeND contains 32 tables, including nine input tables, 21 dictionary tables, and two association tables, and ten views, including specific views that make the database structure compliant with the EC INSPIRE and the Floods Directives. The LAND-deFeND database structure is open, and freely available from http://geomorphology.irpi.cnr.it/tools. Copyright © 2017 The Authors. Published by Elsevier Ltd. All rights reserved.
Smiles2Monomers: a link between chemical and biological structures for polymers.

PubMed

Dufresne, Yoann; Noé, Laurent; Leclère, Valérie; Pupin, Maude

2015-01-01

The monomeric composition of polymers is powerful for structure comparison and synthetic biology, among others. Many databases give access to the atomic structure of compounds but the monomeric structure of polymers is often lacking. We have designed a smart algorithm, implemented in the tool Smiles2Monomers (s2m), to infer efficiently and accurately the monomeric structure of a polymer from its chemical structure. Our strategy is divided into two steps: first, monomers are mapped on the atomic structure by an efficient subgraph-isomorphism algorithm; second, the best tiling is computed so that non-overlapping monomers cover all the structure of the target polymer. The mapping is based on a Markovian index built by a dynamic programming algorithm. The index enables s2m to search quickly all the given monomers on a target polymer. After, a greedy algorithm combines the mapped monomers into a consistent monomeric structure. Finally, a local branch and cut algorithm refines the structure. We tested this method on two manually annotated databases of polymers and reconstructed the structures de novo with a sensitivity over 90 %. The average computation time per polymer is 2 s. s2m automatically creates de novo monomeric annotations for polymers, efficiently in terms of time computation and sensitivity. s2m allowed us to detect annotation errors in the tested databases and to easily find the accurate structures. So, s2m could be integrated into the curation process of databases of small compounds to verify the current entries and accelerate the annotation of new polymers. The full method can be downloaded or accessed via a website for peptide-like polymers at http://bioinfo.lifl.fr/norine/smiles2monomers.jsp.
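The tiling step can be pictured with a small greedy sketch: given candidate monomer mappings as sets of atom indices, keep mappings that do not overlap anything already chosen. The largest-first ordering, the function name `greedy_tiling`, and the toy atom sets are assumptions for illustration; the abstract does not state the ordering s2m uses, and the subgraph-isomorphism mapping and the branch and cut refinement are not shown.

```python
def greedy_tiling(mappings, n_atoms):
    """Pick non-overlapping monomer mappings, largest first.

    Returns (chosen mappings, set of atom indices left uncovered).
    """
    covered = set()
    chosen = []
    # Sort by descending size, then by name, so the result is deterministic.
    for name, atoms in sorted(mappings, key=lambda m: (-len(m[1]), m[0])):
        if covered.isdisjoint(atoms):
            chosen.append((name, atoms))
            covered |= atoms
    return chosen, set(range(n_atoms)) - covered

# Toy candidate mappings on a 9-atom "polymer"; real monomers have many more atoms.
mappings = [
    ("Ala", frozenset({0, 1, 2})),
    ("Gly", frozenset({2, 3})),
    ("Ser", frozenset({3, 4, 5, 6})),
    ("Gly", frozenset({7, 8})),
]
chosen, uncovered = greedy_tiling(mappings, n_atoms=9)
```

A greedy pass like this can leave atoms uncovered when an early choice blocks a better combination, which is one plausible reason for following it with a refinement step.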

Piece2.0: an update for the plant gene structure comparison and evolution database

USDA-ARS's Scientific Manuscript database

PIECE (Plant Intron Exon Comparison and Evolution) is a web-accessible database that houses intron and exon information of plant genes. PIECE serves as a resource for biologists interested in comparing intron–exon organization and provides valuable insights into the evolution of gene structure in pl...
STANDARDIZATION AND STRUCTURAL ANNOTATION OF PUBLIC TOXICITY DATABASES: IMPROVING SAR CAPABILITIES AND LINKAGE TO 'OMICS DATA

EPA Science Inventory

Standardization and structural annotation of public toxicity databases: Improving SAR capabilities and linkage to 'omics data
Ann M. Richard (1), ClarLynda Williams (1), Jamie Burch (2)
(1) Nat Health & Environ Res Lab, US EPA, RTP, NC 27711; (2) EPA/NC Central Univ Student COOP Trainee<...
An open workflow to generate “MS Ready” structures and improve non-targeted mass spectrometry (ACS Fall 1 of 3)

EPA Science Inventory

High-throughput non-targeted analyses (NTA) rely on chemical reference databases for tentative identification of observed chemical features. Many of these databases and online resources incorporate chemical structure data not in a form that is readily observed by mass spectromet...
The forest inventory and analysis database description and users manual version 1.0

Treesearch

Patrick D. Miles; Gary J. Brand; Carol L. Alerich; Larry F. Bednar; Sharon W. Woudenberg; Joseph F. Glover; Edward N. Ezell

2001-01-01

Describes the structure of the Forest Inventory and Analysis Database (FIADB) and provides information on generating estimates of forest statistics from these data. The FIADB structure provides a consistent framework for storing forest inventory data across all ownerships across the entire United States. These data are available to the public.
Extending the Online Public Access Catalog into the Microcomputer Environment.

ERIC Educational Resources Information Center

Sutton, Brett

1990-01-01

Describes PCBIS, a database program for MS-DOS microcomputers that features a utility for automatically converting online public access catalog search results stored as text files into structured database files that can be searched, sorted, edited, and printed. Topics covered include the general features of the program, record structure, record…
A structured vocabulary for indexing dietary supplements in databases in the United States

PubMed Central

Saldanha, Leila G; Dwyer, Johanna T; Holden, Joanne M; Ireland, Jayne D.; Andrews, Karen W; Bailey, Regan L; Gahche, Jaime J.; Hardy, Constance J; Møller, Anders; Pilch, Susan M.; Roseland, Janet M

2011-01-01

Food composition databases are critical to assess and plan dietary intakes. Dietary supplement databases are also needed because dietary supplements make significant contributions to total nutrient intakes. However, no uniform system exists for classifying dietary supplement products and indexing their ingredients in such databases. Differing approaches to classifying these products make it difficult to retrieve or link information effectively. A consistent approach to classifying information within food composition databases led to the development of LanguaL™, a structured vocabulary. LanguaL™ is being adapted as an interface tool for classifying and retrieving product information in dietary supplement databases. This paper outlines proposed changes to the LanguaL™ thesaurus for indexing dietary supplement products and ingredients in databases. The choice of 12 of the original 14 LanguaL™ facets pertinent to dietary supplements, modifications to their scopes, and applications are described. The 12 chosen facets are: Product Type; Source; Part of Source; Physical State, Shape or Form; Ingredients; Preservation Method, Packing Medium, Container or Wrapping; Contact Surface; Consumer Group/Dietary Use/Label Claim; Geographic Places and Regions; and Adjunct Characteristics of food. PMID:22611303
The Halophile protein database.

PubMed

Sharma, Naveen; Farooqi, Mohammad Samir; Chaturvedi, Krishna Kumar; Lal, Shashi Bhushan; Grover, Monendra; Rai, Anil; Pandey, Pankaj

2014-01-01

Halophilic archaea/bacteria adapt to different salt concentrations, namely extreme, moderate and low. These types of adaptation may occur as a result of modification of protein structure and other changes in different cell organelles. Thus proteins may play an important role in the adaptation of halophilic archaea/bacteria to saline conditions. The Halophile protein database (HProtDB) is a systematic attempt to document the biochemical and biophysical properties of proteins from halophilic archaea/bacteria which may be involved in adaptation of these organisms to saline conditions. In this database, various physicochemical properties such as molecular weight, theoretical pI, amino acid composition, atomic composition, estimated half-life, instability index, aliphatic index and grand average of hydropathicity (Gravy) have been listed. These physicochemical properties play an important role in identifying the protein structure, bonding pattern and function of the specific proteins. This database is a comprehensive, manually curated, non-redundant catalogue of proteins. The database currently contains 59 897 proteins properties extracted from 21 different strains of halophilic archaea/bacteria. The database can be accessed through link. Database URL: http://webapp.cabgrid.res.in/protein/ © The Author(s) 2014. Published by Oxford University Press.
A new relational database structure and online interface for the HITRAN database

NASA Astrophysics Data System (ADS)

Hill, Christian; Gordon, Iouli E.; Rothman, Laurence S.; Tennyson, Jonathan

2013-11-01

A new format for the HITRAN database is proposed. By storing the line-transition data in a number of linked tables described by a relational database schema, it is possible to overcome the limitations of the existing format, which have become increasingly apparent over the last few years as new and more varied data are being used by radiative-transfer models. Although the database in the new format can be searched using the well-established Structured Query Language (SQL), a web service, HITRANonline, has been deployed to allow users to make most common queries of the database using a graphical user interface in a web page. The advantages of the relational form of the database to ensuring data integrity and consistency are explored, and the compatibility of the online interface with the emerging standards of the Virtual Atomic and Molecular Data Centre (VAMDC) project is discussed. In particular, the ability to access HITRAN data using a standard query language from other websites, command line tools and from within computer programs is described.
A storage scheme for the real-time database supporting the on-line commitment

NASA Astrophysics Data System (ADS)

Dai, Hong-bin; Jing, Yu-jian; Wang, Hui

2013-07-01

Modern SCADA (Supervisory Control and Data Acquisition) systems have been applied to various aspects of everyday life. As time goes on, the requirements of the applications of these systems change. Thus the data structure of the real-time database, which is the core of a SCADA system, often needs modification. As a result, a commitment, consisting of a sequence of configuration operations that modify the data structure of the real-time database, is performed from time to time. It is simple to perform an off-line commitment by first stopping and then restarting the system, during which all the data in the real-time database are reconstructed. However, it is much preferred, and in some cases even necessary, to perform an on-line commitment, during which the real-time database can still provide real-time service and the system continues working normally. In this paper, a storage scheme for the data in the real-time database is proposed. It helps the real-time database support on-line commitment, during which real-time service is still available.
System for Performing Single Query Searches of Heterogeneous and Dispersed Databases

NASA Technical Reports Server (NTRS)

Maluf, David A. (Inventor); Okimura, Takeshi (Inventor); Gurram, Mohana M. (Inventor); Tran, Vu Hoang (Inventor); Knight, Christopher D. (Inventor); Trinh, Anh Ngoc (Inventor)

2017-01-01

The present invention is a distributed computer system of heterogeneous databases joined in an information grid and configured with an Application Programming Interface hardware which includes a search engine component for performing user-structured queries on multiple heterogeneous databases in real time. This invention reduces overhead associated with the impedance mismatch that commonly occurs in heterogeneous database queries.
The Forest Inventory and Analysis Database: Database description and users manual version 4.0 for Phase 2

Treesearch

Sharon W. Woudenberg; Barbara L. Conkling; Barbara M. O' Connell; Elizabeth B. LaPoint; Jeffery A. Turner; Karen L. Waddell

2010-01-01

This document is based on previous documentation of the nationally standardized Forest Inventory and Analysis database (Hansen and others 1992; Woudenberg and Farrenkopf 1995; Miles and others 2001). Documentation of the structure of the Forest Inventory and Analysis database (FIADB) for Phase 2 data, as well as codes and definitions, is provided. Examples for...
Database for the geologic map of the Sauk River 30-minute by 60-minute quadrangle, Washington (I-2592)

USGS Publications Warehouse

Tabor, R.W.; Booth, D.B.; Vance, J.A.; Ford, A.B.

2006-01-01

This digital map database has been prepared by R.W. Tabor from the published Geologic map of the Sauk River 30- by 60 Minute Quadrangle, Washington. Together with the accompanying text files as PDF, it provides information on the geologic structure and stratigraphy of the area covered. The database delineates map units that are identified by general age and lithology following the stratigraphic nomenclature of the U.S. Geological Survey. The authors mapped most of the bedrock geology at 1:100,000 scale, but compiled most Quaternary units at 1:24,000 scale. The Quaternary contacts and structural data have been much simplified for the 1:100,000-scale map and database. The spatial resolution (scale) of the database is 1:100,000 or smaller. This database depicts the distribution of geologic materials and structures at a regional (1:100,000) scale. The report is intended to provide geologic information for the regional study of materials properties, earthquake shaking, landslide potential, mineral hazards, seismic velocity, and earthquake faults. In addition, the report contains information and interpretations about the regional geologic history and framework. However, the regional scale of this report does not provide sufficient detail for site development purposes.
Big Data Mining and Adverse Event Pattern Analysis in Clinical Drug Trials

PubMed Central

Federer, Callie; Yoo, Minjae

2016-01-01

Drug adverse events (AEs) are a major health threat to patients seeking medical treatment and a significant barrier in drug discovery and development. AEs are now required to be submitted during clinical trials and can be extracted from ClinicalTrials.gov (https://clinicaltrials.gov/), a database of clinical studies around the world. By extracting drug and AE information from ClinicalTrials.gov and structuring it into a database, drug-AEs could be established for future drug development and repositioning. To our knowledge, current AE databases contain mainly U.S. Food and Drug Administration (FDA)-approved drugs. However, our database contains both FDA-approved and experimental compounds extracted from ClinicalTrials.gov. Our database contains 8,161 clinical trials of 3,102,675 patients and 713,103 reported AEs. We extracted the information from ClinicalTrials.gov using a set of python scripts, and then used regular expressions and a drug dictionary to process and structure relevant information into a relational database. We performed data mining and pattern analysis of drug-AEs in our database. Our database can serve as a tool to assist researchers to discover drug-AE relationships for developing, repositioning, and repurposing drugs. PMID:27631620
Big Data Mining and Adverse Event Pattern Analysis in Clinical Drug Trials.

PubMed

Federer, Callie; Yoo, Minjae; Tan, Aik Choon

2016-12-01

Drug adverse events (AEs) are a major health threat to patients seeking medical treatment and a significant barrier in drug discovery and development. AEs are now required to be submitted during clinical trials and can be extracted from ClinicalTrials.gov ( https://clinicaltrials.gov/ ), a database of clinical studies around the world. By extracting drug and AE information from ClinicalTrials.gov and structuring it into a database, drug-AEs could be established for future drug development and repositioning. To our knowledge, current AE databases contain mainly U.S. Food and Drug Administration (FDA)-approved drugs. However, our database contains both FDA-approved and experimental compounds extracted from ClinicalTrials.gov . Our database contains 8,161 clinical trials of 3,102,675 patients and 713,103 reported AEs. We extracted the information from ClinicalTrials.gov using a set of python scripts, and then used regular expressions and a drug dictionary to process and structure relevant information into a relational database. We performed data mining and pattern analysis of drug-AEs in our database. Our database can serve as a tool to assist researchers to discover drug-AE relationships for developing, repositioning, and repurposing drugs.
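The regular-expression and drug-dictionary step described above might look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the dictionary entries, the one-line `event: affected/at risk` layout, the `drug_ae` table and the trial identifier are hypothetical, and the real ClinicalTrials.gov records and the authors' scripts are certainly more involved.

```python
import re
import sqlite3

# Hypothetical drug dictionary for this sketch: lowercase synonym -> canonical name.
DRUG_DICTIONARY = {
    "gleevec": "imatinib",
    "imatinib": "imatinib",
    "aspirin": "aspirin",
}

# Assumed line layout (NOT the real ClinicalTrials.gov format):
#   "<event name>: <participants affected>/<participants at risk>"
AE_LINE = re.compile(
    r"^\s*(?P<event>[A-Za-z][A-Za-z -]*?)\s*:\s*(?P<affected>\d+)\s*/\s*(?P<at_risk>\d+)\s*$"
)

def find_drugs(text):
    """Return the sorted canonical names of dictionary drugs mentioned in free text."""
    words = re.findall(r"[a-z]+", text.lower())
    return sorted({DRUG_DICTIONARY[w] for w in words if w in DRUG_DICTIONARY})

def parse_adverse_events(lines):
    """Keep only lines matching the assumed layout; return (event, affected, at_risk)."""
    events = []
    for line in lines:
        match = AE_LINE.match(line)
        if match:
            events.append(
                (match["event"].lower(), int(match["affected"]), int(match["at_risk"]))
            )
    return events

def store(conn, trial_id, intervention_text, ae_lines):
    """Insert one row per (drug, adverse event) pair found for a trial."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS drug_ae ("
        "trial_id TEXT, drug TEXT, event TEXT, affected INTEGER, at_risk INTEGER)"
    )
    rows = [
        (trial_id, drug, event, affected, at_risk)
        for drug in find_drugs(intervention_text)
        for (event, affected, at_risk) in parse_adverse_events(ae_lines)
    ]
    conn.executemany("INSERT INTO drug_ae VALUES (?, ?, ?, ?, ?)", rows)
    return len(rows)
```

Mapping synonyms to one canonical name before insertion keeps brand and generic names from being counted as different drugs in later pattern analysis.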
Generation And Understanding Of Natural Language Using Information In A Frame Structure

NASA Astrophysics Data System (ADS)

Perkins, Walton A.

1989-03-01

Many expert systems and relational database systems store factual information in the form of attribute values of objects. Problems arise in transforming from that attribute (frame) database representation into English surface structure and in transforming the English surface structure into a representation that references information in the frame database. In this paper we consider mainly the generation process, as it is this area in which we have made the most significant progress. In its interaction with the user, the expert system must generate questions, declarations, and uncertain declarations. Attributes such as COLOR, LENGTH, and ILLUMINATION can be referenced using the template: "<attribute> of <object>" for both questions and declarations. However, many other attributes, such as RATTLES, in "What is RATTLES of the light bulb?", and HAS_STREP_THROAT in, "HAS_STREP_THROAT of Dan is true." do not fit this template. We examined over 300 attributes from several knowledge bases and have grouped them into 16 classes. For each class there is one "question" template, one "declaration" template, and one "uncertain declaration" template for generating English surface structure. The internal database identifiers (e.g., HAS_STREP_THROAT and DISEASE_35) must also be replaced by output synonyms. Classifying each attribute in combination with synonym translation remarkably improved the English surface structure that the system generated. In the area of understanding, synonym translation and knowledge of the attribute properties, such as legal values, has resulted in a robust database query capability.
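A minimal sketch of class-based template generation with synonym translation follows. The three attribute classes, their template wording and the synonym table are invented for this example; the paper's 16 classes and actual templates are not given in the abstract.

```python
# Hypothetical attribute classes, each with one question, one declaration and
# one uncertain-declaration template (the paper's real templates are not shown).
TEMPLATES = {
    "value_of": {
        "question": "what is the {phrase} of {obj}?",
        "declaration": "the {phrase} of {obj} is {value}.",
        "uncertain": "the {phrase} of {obj} is probably {value}.",
    },
    "boolean_verb": {
        "question": "does {obj} {phrase}?",
        "declaration": "{obj} {value} {phrase}.",
        "uncertain": "{obj} probably {value} {phrase}.",
    },
    "boolean_has": {
        "question": "does {obj} have {phrase}?",
        "declaration": "{obj} {value} {phrase}.",
        "uncertain": "{obj} probably {value} {phrase}.",
    },
}

# Hypothetical synonym table: internal identifier -> class and output phrase.
ATTRIBUTES = {
    "COLOR": {"class": "value_of", "phrase": "color"},
    "RATTLES": {"class": "boolean_verb", "phrase": "rattle"},
    "HAS_STREP_THROAT": {"class": "boolean_has", "phrase": "strep throat"},
}

# How truth values are worded for the boolean classes.
VALUE_WORDS = {
    "boolean_verb": {True: "does", False: "does not"},
    "boolean_has": {True: "has", False: "does not have"},
}

def generate(kind, attribute, obj, value=None):
    """Render a question, declaration or uncertain declaration for one attribute."""
    info = ATTRIBUTES[attribute]
    cls = info["class"]
    words = VALUE_WORDS.get(cls)
    if words is not None and value is not None:
        value = words[value]  # translate True/False into output wording
    text = TEMPLATES[cls][kind].format(phrase=info["phrase"], obj=obj, value=value)
    return text[0].upper() + text[1:]
```

For instance, `generate("question", "RATTLES", "the light bulb")` yields "Does the light bulb rattle?" rather than the ill-fitting "What is RATTLES of the light bulb?" quoted in the abstract.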
MitBASE : a comprehensive and integrated mitochondrial DNA database. The present status

PubMed Central

Attimonelli, M.; Altamura, N.; Benne, R.; Brennicke, A.; Cooper, J. M.; D’Elia, D.; Montalvo, A. de; Pinto, B. de; De Robertis, M.; Golik, P.; Knoop, V.; Lanave, C.; Lazowska, J.; Licciulli, F.; Malladi, B. S.; Memeo, F.; Monnerot, M.; Pasimeni, R.; Pilbout, S.; Schapira, A. H. V.; Sloof, P.; Saccone, C.

2000-01-01

MitBASE is an integrated and comprehensive database of mitochondrial DNA data which collects, under a single interface, databases for Plant, Vertebrate, Invertebrate, Human, Protist and Fungal mtDNA and a Pilot database on nuclear genes involved in mitochondrial biogenesis in Saccharomyces cerevisiae. MitBASE reports all available information from different organisms and from intraspecies variants and mutants. Data have been drawn from the primary databases and from the literature; value-added information has been structured, e.g., editing information on protist mtDNA genomes, pathological information for human mtDNA variants, etc. The different databases, some of which are structured using commercial packages (Microsoft Access, File Maker Pro) while others use a flat-file format, have been integrated under ORACLE. Ad hoc retrieval systems have been devised for some of the above-listed databases, taking into account their peculiarities. The database is resident at the EBI and is available at the following site: http://www3.ebi.ac.uk/Research/Mitbase/mitbase.pl . The project is intended to benefit both basic and applied research. The study of mitochondrial genetic diseases and mitochondrial DNA intraspecies diversity are key topics in several biotechnological fields. The database has been funded within the EU Biotechnology programme. PMID:10592207
Rapid development of entity-based data models for bioinformatics with persistence object-oriented design and structured interfaces.

PubMed

Ezra Tsur, Elishai

2017-01-01

Databases are imperative for research in bioinformatics and computational biology. Current challenges in database design include data heterogeneity and context-dependent interconnections between data entities. These challenges drove the development of unified data interfaces and specialized databases. The curation of specialized databases is an ever-growing challenge due to the introduction of new data sources and the emergence of new relational connections between established datasets. Here, an open-source framework for the curation of specialized databases is proposed. The framework supports user-designed models of data encapsulation, object persistence and structured interfaces to local and external data sources such as MalaCards, Biomodels and the National Centre for Biotechnology Information (NCBI) databases. The proposed framework was implemented using Java as the development environment, EclipseLink as the data persistence agent and Apache Derby as the database manager. Syntactic analysis was based on the J3D, jsoup, Apache Commons and w3c.dom open libraries. Finally, the construction of a specialized database for aneurysm-associated vascular diseases is demonstrated. This database contains 3-dimensional geometries of aneurysms, patients' clinical information, articles, biological models, related diseases and our recently published model of aneurysms' risk of rupture. The framework is available at: http://nbel-lab.com.
MINEs: open access databases of computationally predicted enzyme promiscuity products for untargeted metabolomics.

PubMed

Jeffryes, James G; Colastani, Ricardo L; Elbadawi-Sidhu, Mona; Kind, Tobias; Niehaus, Thomas D; Broadbelt, Linda J; Hanson, Andrew D; Fiehn, Oliver; Tyo, Keith E J; Henry, Christopher S

2015-01-01

In spite of its great promise, metabolomics has proven difficult to execute in an untargeted and generalizable manner. Liquid chromatography-mass spectrometry (LC-MS) has made it possible to gather data on thousands of cellular metabolites. However, matching metabolites to their spectral features continues to be a bottleneck, meaning that much of the collected information remains uninterpreted and that new metabolites are seldom discovered in untargeted studies. These challenges require new approaches that consider compounds beyond those available in curated biochemistry databases. Here we present Metabolic In silico Network Expansions (MINEs), an extension of known metabolite databases to include molecules that have not been observed, but are likely to occur based on known metabolites and common biochemical reactions. We utilize an algorithm called the Biochemical Network Integrated Computational Explorer (BNICE) and expert-curated reaction rules based on the Enzyme Commission classification system to propose the novel chemical structures and reactions that comprise MINE databases. Starting from the Kyoto Encyclopedia of Genes and Genomes (KEGG) COMPOUND database, the MINE contains over 571,000 compounds, of which 93% are not present in the PubChem database. However, these MINE compounds have on average higher structural similarity to natural products than compounds from KEGG or PubChem. MINE databases were able to propose annotations for 98.6% of a set of 667 MassBank spectra, 14% more than KEGG alone and equivalent to PubChem while returning far fewer candidates per spectra than PubChem (46 vs. 1715 median candidates). Application of MINEs to LC-MS accurate mass data enabled the identity of an unknown peak to be confidently predicted. MINE databases are freely accessible for non-commercial use via user-friendly web-tools at http://minedatabase.mcs.anl.gov and developer-friendly APIs. 
MINEs improve metabolomics peak identification as compared to general chemical databases whose results include irrelevant synthetic compounds. Furthermore, MINEs complement and expand on previous in silico generated compound databases that focus on human metabolism. We are actively developing the database; future versions of this resource will incorporate transformation rules for spontaneous chemical reactions and more advanced filtering and prioritization of candidate structures. Graphical abstract: MINE database construction and access methods. The process of constructing a MINE database from the curated source databases is depicted on the left. The methods for accessing the database are shown on the right.
PROGRESS REPORT ON THE DSSTOX DATABASE NETWORK: NEWLY LAUNCHED WEBSITE, APPLICATIONS, FUTURE PLANS

EPA Science Inventory

Progress Report on the DSSTox Database Network: Newly Launched Website, Applications, Future Plans

Progress will be reported on development of the Distributed Structure-Searchable Toxicity (DSSTox) Database Network and the newly launched public website that coordinates and...
A structured vocabulary for indexing dietary supplements in databases in the United States

USDA-ARS's Scientific Manuscript database

Food composition databases are critical to assess and plan dietary intakes. Dietary supplement databases are also needed because dietary supplements make significant contributions to total nutrient intakes. However, no uniform system exists for classifying dietary supplement products and indexing ...

The North Central Forest Inventory and Analysis timber product output database--a regional composite approach.

Treesearch

Dennis M. May

1998-01-01

Discusses a regional composite approach to managing timber product output data in a relational database. Describes the development and structure of the regional composite database and demonstrates its use in addressing everyday timber product output information needs.
WebCSD: the online portal to the Cambridge Structural Database

PubMed Central

Thomas, Ian R.; Bruno, Ian J.; Cole, Jason C.; Macrae, Clare F.; Pidcock, Elna; Wood, Peter A.

2010-01-01

WebCSD, a new web-based application developed by the Cambridge Crystallographic Data Centre, offers fast searching of the Cambridge Structural Database using only a standard internet browser. Search facilities include two-dimensional substructure, molecular similarity, text/numeric and reduced cell searching. Text, chemical diagrams and three-dimensional structural information can all be studied in the results browser using the efficient entry summaries and embedded three-dimensional viewer. PMID:22477776
A structural informatics approach to mine kinase knowledge bases.

PubMed

Brooijmans, Natasja; Mobilio, Dominick; Walker, Gary; Nilakantan, Ramaswamy; Denny, Rajiah A; Feyfant, Eric; Diller, David; Bikker, Jack; Humblet, Christine

2010-03-01

In this paper, we describe a combination of structural informatics approaches developed to mine data extracted from existing structure knowledge bases (Protein Data Bank and the GVK database) with a focus on kinase ATP-binding site data. In contrast to existing systems that retrieve and analyze protein structures, our techniques are centered on a database of ligand-bound geometries in relation to residues lining the binding site and transparent access to ligand-based SAR data. We illustrate the systems in the context of the Abelson kinase and related inhibitor structures. 2009 Elsevier Ltd. All rights reserved.
Data to knowledge: how to get meaning from your result.

PubMed

Berman, Helen M; Gabanyi, Margaret J; Groom, Colin R; Johnson, John E; Murshudov, Garib N; Nicholls, Robert A; Reddy, Vijay; Schwede, Torsten; Zimmerman, Matthew D; Westbrook, John; Minor, Wladek

2015-01-01

Structural and functional studies require the development of sophisticated 'Big Data' technologies and software to increase the knowledge derived and ensure reproducibility of the data. This paper presents summaries of the Structural Biology Knowledge Base, the VIPERdb Virus Structure Database, evaluation of homology modeling by the Protein Model Portal, the ProSMART tool for conformation-independent structure comparison, the LabDB 'super' laboratory information management system and the Cambridge Structural Database. These techniques and technologies represent important tools for the transformation of crystallographic data into knowledge and information, in an effort to address the problem of non-reproducibility of experimental results.
Structure-based characterization of multiprotein complexes.

PubMed

Wiederstein, Markus; Gruber, Markus; Frank, Karl; Melo, Francisco; Sippl, Manfred J

2014-07-08

Multiprotein complexes govern virtually all cellular processes. Their 3D structures provide important clues to their biological roles, especially through structural correlations among protein molecules and complexes. The detection of such correlations generally requires comprehensive searches in databases of known protein structures by means of appropriate structure-matching techniques. Here, we present a high-speed structure search engine capable of instantly matching large protein oligomers against the complete and up-to-date database of biologically functional assemblies of protein molecules. We use this tool to reveal unseen structural correlations on the level of protein quaternary structure and demonstrate its general usefulness for efficiently exploring complex structural relationships among known protein assemblies. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.
New generic indexing technology

NASA Technical Reports Server (NTRS)

Freeston, Michael

1996-01-01

There has been no fundamental change in the dynamic indexing methods supporting database systems since the invention of the B-tree twenty-five years ago. And yet the whole classical approach to dynamic database indexing has long since become inappropriate and increasingly inadequate. We are moving rapidly from the conventional one-dimensional world of fixed-structure text and numbers to a multi-dimensional world of variable structures, objects and images, in space and time. But, even before leaving the confines of conventional database indexing, the situation is highly unsatisfactory. In fact, our research has led us to question the basic assumptions of conventional database indexing. We have spent the past ten years studying the properties of multi-dimensional indexing methods, and in this paper we draw the strands of a number of developments together - some quite old, some very new, to show how we now have the basis for a new generic indexing technology for the next generation of database systems.
Collection, processing, and reporting of damage tolerant design data for non-aerospace structural materials

NASA Technical Reports Server (NTRS)

Huber, P. D.; Gallagher, J. P.

1994-01-01

This report describes the organization, format and content of the NASA Johnson damage tolerant database which was created to store damage tolerant property data for non-aerospace structural materials. The database is designed to store fracture toughness data (K_IC, K_c, J_IC and CTOD_IC), resistance curve data (K_R vs. Δa_eff and J_R vs. Δa_eff), as well as subcritical crack growth data (a vs. N and da/dN vs. ΔK). The database contains complementary material property data for both stainless and alloy steels, as well as for aluminum, nickel, and titanium alloys which were not incorporated into the Damage Tolerant Design Handbook database.
High Performance Semantic Factoring of Giga-Scale Semantic Graph Databases

DOE Office of Scientific and Technical Information (OSTI.GOV)

Joslyn, Cliff A.; Adolf, Robert D.; Al-Saffar, Sinan

2010-10-04

As semantic graph database technology grows to address components ranging from extant large triple stores to SPARQL endpoints over SQL-structured relational databases, it will become increasingly important to be able to bring high performance computational resources to bear on their analysis, interpretation, and visualization, especially with respect to their innate semantic structure. Our research group built a novel high performance hybrid system comprising computational capability for semantic graph database processing utilizing the large multithreaded architecture of the Cray XMT platform, conventional clusters, and large data stores. In this paper we describe that architecture, and present the results of deploying it for the analysis of the Billion Triple dataset with respect to its semantic factors.
Alkamid database: Chemistry, occurrence and functionality of plant N-alkylamides.

PubMed

Boonen, Jente; Bronselaer, Antoon; Nielandt, Joachim; Veryser, Lieselotte; De Tré, Guy; De Spiegeleer, Bart

2012-08-01

N-Alkylamides (NAAs) are a promising group of bioactive compounds, which are anticipated to act as important lead compounds for plant protection and biocidal products, functional food, cosmeceuticals and drugs in the next decennia. These molecules, currently found in more than 25 plant families and with a wide structural diversity, exert a variety of biological-pharmacological effects and are of high ethnopharmacological importance. However, information is scattered in literature, with different, often unstandardized, pharmacological methodologies being used. Therefore, a comprehensive NAA database (acronym: Alkamid) was constructed to collect the available structural and functional NAA data, linked to their occurrence in plants (family, tribe, species, genus). For loading information in the database, literature data was gathered over the period 1950-2010, by using several search engines. In order to represent the collected information about NAAs, the plants in which they occur and the functionalities for which they have been examined, a relational database is constructed and implemented on a MySQL back-end. The database is supported by describing the NAA plant-, functional- and chemical-space. The chemical space includes a NAA classification, according to their fatty acid and amine structures. The Alkamid database (publicly available on the website http://alkamid.ugent.be/) is not only a central information point, but can also function as a useful tool to prioritize the NAA choice in the evaluation of their functionality, to perform data mining leading to quantitative structure-property relationships (QSPRs), functionality comparisons, clustering, plant biochemistry and taxonomic evaluations. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
RNA FRABASE 2.0: an advanced web-accessible database with the capacity to search the three-dimensional fragments within RNA structures

PubMed Central

2010-01-01

Background: Recent discoveries concerning novel functions of RNA, such as RNA interference, have contributed towards the growing importance of the field. In this respect, a deeper knowledge of complex three-dimensional RNA structures is essential to understand their new biological functions. A number of bioinformatic tools have been proposed to explore two major structural databases (PDB, NDB) in order to analyze various aspects of RNA tertiary structures. One of these tools is RNA FRABASE 1.0, the first web-accessible database with an engine for automatic search of 3D fragments within PDB-derived RNA structures. This search is based upon the user-defined RNA secondary structure pattern. In this paper, we present and discuss RNA FRABASE 2.0. This second version of the system represents a major extension of this tool in terms of providing new data and a wide spectrum of novel functionalities. An intuitively operated web server platform enables very fast user-tailored search of three-dimensional RNA fragments, their multi-parameter conformational analysis and visualization. Description: RNA FRABASE 2.0 has stored information on 1565 PDB-deposited RNA structures, including all NMR models. The RNA FRABASE 2.0 search engine algorithms operate on the database of the RNA sequences and the new library of RNA secondary structures, coded in the dot-bracket format extended to hold multi-stranded structures and to cover residues whose coordinates are missing in the PDB files. The library of RNA secondary structures (and their graphics) is made available. A high level of efficiency of the 3D search has been achieved by introducing novel tools to formulate advanced searching patterns and to screen highly populated tertiary structure elements. RNA FRABASE 2.0 also stores data and conformational parameters in order to provide "on the spot" structural filters to explore the three-dimensional RNA structures. An instant visualization of the 3D RNA structures is provided.
RNA FRABASE 2.0 is freely available at http://rnafrabase.cs.put.poznan.pl. Conclusions: RNA FRABASE 2.0 provides a novel database and powerful search engine which is equipped with new data and functionalities that are unavailable elsewhere. Our intention is that this advanced version of the RNA FRABASE will be of interest to all researchers working in the RNA field. PMID:20459631
RNA FRABASE 2.0: an advanced web-accessible database with the capacity to search the three-dimensional fragments within RNA structures.

PubMed

Popenda, Mariusz; Szachniuk, Marta; Blazewicz, Marek; Wasik, Szymon; Burke, Edmund K; Blazewicz, Jacek; Adamiak, Ryszard W

2010-05-06

Recent discoveries concerning novel functions of RNA, such as RNA interference, have contributed towards the growing importance of the field. In this respect, a deeper knowledge of complex three-dimensional RNA structures is essential to understand their new biological functions. A number of bioinformatic tools have been proposed to explore two major structural databases (PDB, NDB) in order to analyze various aspects of RNA tertiary structures. One of these tools is RNA FRABASE 1.0, the first web-accessible database with an engine for automatic search of 3D fragments within PDB-derived RNA structures. This search is based upon the user-defined RNA secondary structure pattern. In this paper, we present and discuss RNA FRABASE 2.0. This second version of the system represents a major extension of this tool in terms of providing new data and a wide spectrum of novel functionalities. An intuitively operated web server platform enables very fast user-tailored search of three-dimensional RNA fragments, their multi-parameter conformational analysis and visualization. RNA FRABASE 2.0 has stored information on 1565 PDB-deposited RNA structures, including all NMR models. The RNA FRABASE 2.0 search engine algorithms operate on the database of the RNA sequences and the new library of RNA secondary structures, coded in the dot-bracket format extended to hold multi-stranded structures and to cover residues whose coordinates are missing in the PDB files. The library of RNA secondary structures (and their graphics) is made available. A high level of efficiency of the 3D search has been achieved by introducing novel tools to formulate advanced searching patterns and to screen highly populated tertiary structure elements. RNA FRABASE 2.0 also stores data and conformational parameters in order to provide "on the spot" structural filters to explore the three-dimensional RNA structures. An instant visualization of the 3D RNA structures is provided.
RNA FRABASE 2.0 is freely available at http://rnafrabase.cs.put.poznan.pl. RNA FRABASE 2.0 provides a novel database and powerful search engine which is equipped with new data and functionalities that are unavailable elsewhere. Our intention is that this advanced version of the RNA FRABASE will be of interest to all researchers working in the RNA field.
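For readers unfamiliar with the dot-bracket notation mentioned above, the sketch below parses the plain form of the notation (only `(`, `)` and `.`) into base-pair indices and locates a user-defined secondary-structure pattern by simple substring search. It is a minimal illustration only: the RNA FRABASE extensions for multi-stranded structures and missing residues are not modelled, the function names are made up here, and the real search engine is not described in enough detail in the abstract to reproduce.

```python
def parse_dot_bracket(structure):
    """Return sorted (i, j) index pairs of bases paired in a plain dot-bracket string."""
    stack = []   # positions of '(' still waiting for a partner
    pairs = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)

def find_pattern(structure, pattern):
    """Return every start index (overlaps included) where the pattern occurs."""
    hits = []
    start = structure.find(pattern)
    while start != -1:
        hits.append(start)
        start = structure.find(pattern, start + 1)
    return hits
```

For example, searching `"(((...)))..((...))"` for the hairpin-loop pattern `"(...)"` reports the two positions where a three-residue loop is closed by a base pair.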
Comprehensive identification and structural characterization of target components from Gelsemium elegans by high-performance liquid chromatography coupled with quadrupole time-of-flight mass spectrometry based on accurate mass databases combined with MS/MS spectra.

PubMed

Liu, Yan-Chun; Xiao, Sa; Yang, Kun; Ling, Li; Sun, Zhi-Liang; Liu, Zhao-Ying

2017-06-01

This study reports an analytical strategy for the comprehensive identification and structural characterization of target components from Gelsemium elegans by using high-performance liquid chromatography quadrupole time-of-flight mass spectrometry (LC-QqTOF MS) based on the use of accurate mass databases combined with MS/MS spectra. The databases created included accurate masses and elemental compositions of 204 components from Gelsemium and their structural data. The accurate MS and MS/MS spectra were acquired through data-dependent auto MS/MS mode followed by an extraction of the potential compounds from the LC-QqTOF MS raw data of the sample. These were then matched against the databases to search for targeted components in the sample. The structures of the detected components were tentatively characterized by manually interpreting the accurate MS/MS spectra for the first time. A total of 57 components were successfully detected and structurally characterized from the crude extracts of G. elegans, although the method failed to differentiate some isomers. This analytical strategy is generic and efficient, avoids isolation and purification procedures, enables a comprehensive structural characterization of target components of Gelsemium and would be widely applicable to complicated mixtures that are derived from Gelsemium preparations. Copyright © 2017 John Wiley & Sons, Ltd.
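Matching an observed peak against an accurate mass database can be sketched as follows. The two database entries are hypothetical placeholders (only their elemental compositions matter here), the 5 ppm tolerance and the choice of the protonated ion [M+H]+ are assumptions for this example, and the authors' actual workflow and software are not specified in the abstract.

```python
# Monoisotopic masses (u) of the most abundant isotopes, and the proton mass.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
PROTON = 1.00727647

def neutral_mass(composition):
    """Monoisotopic mass of a neutral molecule from its elemental composition."""
    return sum(MONOISOTOPIC[element] * count for element, count in composition.items())

def mz_protonated(composition):
    """Theoretical m/z of the singly charged [M+H]+ ion (an assumed ion type)."""
    return neutral_mass(composition) + PROTON

def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6

def match(observed_mz, database, tol_ppm=5.0):
    """Return sorted names of entries whose [M+H]+ lies within tol_ppm of the peak."""
    return sorted(
        name
        for name, composition in database.items()
        if abs(ppm_error(observed_mz, mz_protonated(composition))) <= tol_ppm
    )

# Hypothetical database entries for illustration only.
DEMO_DATABASE = {
    "example_compound_1": {"C": 20, "H": 22, "N": 2, "O": 2},
    "example_compound_2": {"C": 20, "H": 24, "N": 2, "O": 2},
}
```

Because entries with the same elemental composition have the same theoretical m/z, a mass match alone cannot separate isomers, which is why the MS/MS spectra are needed as a second line of evidence.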
SMALL-SCALE AND GLOBAL DYNAMOS AND THE AREA AND FLUX DISTRIBUTIONS OF ACTIVE REGIONS, SUNSPOT GROUPS, AND SUNSPOTS: A MULTI-DATABASE STUDY

DOE Office of Scientific and Technical Information (OSTI.GOV)

Muñoz-Jaramillo, Andrés; Windmueller, John C.; Amouzou, Ernest C.

2015-02-10

In this work, we take advantage of 11 different sunspot group, sunspot, and active region databases to characterize the area and flux distributions of photospheric magnetic structures. We find that, when taken separately, different databases are better fitted by different distributions (as has been reported previously in the literature). However, we find that all our databases can be reconciled by the simple application of a proportionality constant, and that, in reality, different databases are sampling different parts of a composite distribution. This composite distribution is made up by linear combination of Weibull and log-normal distributions, where a pure Weibull (log-normal) characterizes the distribution of structures with fluxes below (above) 10^21 Mx (10^22 Mx). Additionally, we demonstrate that the Weibull distribution shows the expected linear behavior of a power-law distribution (when extended to smaller fluxes), making our results compatible with the results of Parnell et al. We propose that this is evidence of two separate mechanisms giving rise to visible structures on the photosphere: one directly connected to the global component of the dynamo (and the generation of bipolar active regions), and the other with the small-scale component of the dynamo (and the fragmentation of magnetic structures due to their interaction with turbulent convection).
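A linear combination of a Weibull and a log-normal density can be written down in a few lines. The sketch below uses the standard textbook forms of both densities; the mixing weight and all parameter names are assumptions for illustration, since the abstract gives no fitted values, and normalising the weights to sum to one is also an assumption.

```python
# Sketch of a composite (Weibull + log-normal) probability density.
# All parameters are hypothetical; the abstract reports no fitted values.
import math


def weibull_pdf(x: float, k: float, lam: float) -> float:
    """Weibull density with shape k and scale lam (zero for x <= 0)."""
    if x <= 0:
        return 0.0
    return (k / lam) * (x / lam) ** (k - 1) * math.exp(-((x / lam) ** k))


def lognormal_pdf(x: float, mu: float, sigma: float) -> float:
    """Log-normal density with log-mean mu and log-standard-deviation sigma."""
    if x <= 0:
        return 0.0
    norm = x * sigma * math.sqrt(2.0 * math.pi)
    return math.exp(-((math.log(x) - mu) ** 2) / (2.0 * sigma ** 2)) / norm


def composite_pdf(x: float, weight: float, k: float, lam: float,
                  mu: float, sigma: float) -> float:
    """Linear combination; weight in [0, 1] is the assumed Weibull fraction."""
    return weight * weibull_pdf(x, k, lam) + (1.0 - weight) * lognormal_pdf(x, mu, sigma)
```

With `weight` close to 1 the density is dominated by the Weibull term, and with `weight` close to 0 by the log-normal term, mirroring the two flux regimes the abstract describes.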
Observational database for studies of nearby universe

NASA Astrophysics Data System (ADS)

Kaisina, E. I.; Makarov, D. I.; Karachentsev, I. D.; Kaisin, S. S.

2012-01-01

We present the description of a database of galaxies of the Local Volume (LVG), located within 10 Mpc around the Milky Way. It contains more than 800 objects. Based on an analysis of functional capabilities, we used the PostgreSQL DBMS as a management system for our LVG database. Applying semantic modelling methods, we developed a physical ER-model of the database. We describe the developed architecture of the database table structure, and the implemented web-access, available at http://www.sao.ru/lv/lvgdb.
Market Pressure and Government Intervention in the Administration and Development of Molecular Databases.

ERIC Educational Resources Information Center

Sillince, J. A. A.; Sillince, M.

1993-01-01

Discusses molecular databases and the role that government and private companies play in their administration and development. Highlights include copyright and patent issues relating to public databases and the information contained in them; data quality; data structures and technological questions; the international organization of molecular…
Thematic video indexing to support video database retrieval and query processing

NASA Astrophysics Data System (ADS)

Khoja, Shakeel A.; Hall, Wendy

1999-08-01

This paper presents a novel video database system, which caters for complex and long videos, such as documentaries, educational videos, etc. As compared to relatively structured format videos like CNN news or commercial advertisements, this database system has the capacity to work with long and unstructured videos.
Teaching Database Management System Use in a Library School Curriculum.

ERIC Educational Resources Information Center

Cooper, Michael D.

1985-01-01

Description of database management systems course being taught to students at School of Library and Information Studies, University of California, Berkeley, notes course structure, assignments, and course evaluation. Approaches to teaching concepts of three types of database systems are discussed and systems used by students in the course are…
Distributed structure-searchable toxicity (DSSTox) public database network: a proposal.

PubMed

Richard, Ann M; Williams, ClarLynda R

2002-01-29

The ability to assess the potential genotoxicity, carcinogenicity, or other toxicity of pharmaceutical or industrial chemicals based on chemical structure information is a highly coveted and shared goal of varied academic, commercial, and government regulatory groups. These diverse interests often employ different approaches and have different criteria and use for toxicity assessments, but they share a need for unrestricted access to existing public toxicity data linked with chemical structure information. Currently, there exists no central repository of toxicity information, commercial or public, that adequately meets the data requirements for flexible analogue searching, Structure-Activity Relationship (SAR) model development, or building of chemical relational databases (CRD). The distributed structure-searchable toxicity (DSSTox) public database network is being proposed as a community-supported, web-based effort to address these shared needs of the SAR and toxicology communities. The DSSTox project has the following major elements: (1) to adopt and encourage the use of a common standard file format (structure data file (SDF)) for public toxicity databases that includes chemical structure, text and property information, and that can easily be imported into available CRD applications; (2) to implement a distributed source approach, managed by a DSSTox Central Website, that will enable decentralized, free public access to structure-toxicity data files, and that will effectively link knowledgeable toxicity data sources with potential users of these data from other disciplines (such as chemistry, modeling, and computer science); and (3) to engage public/commercial/academic/industry groups in contributing to and expanding this community-wide, public data sharing and distribution effort. 
The DSSTox project's overall aims are to effect the closer association of chemical structure information with existing toxicity data, and to promote and facilitate structure-based exploration of these data within a common chemistry-based framework that spans toxicological disciplines.
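To make the role of the structure data file (SDF) format concrete, here is a minimal sketch of reading the data fields that follow each structure block. It relies only on two general SDF conventions: data items are introduced by a `> <FIELD>` header line, and records end with a `$$$$` line. The field names and the sample record are hypothetical and are not taken from any DSSTox file.

```python
# Minimal sketch: extract the "> <FIELD>" data items from SDF text.
# The sample record and its field names are hypothetical placeholders.
from typing import Dict, List


def parse_sdf_fields(text: str) -> List[Dict[str, str]]:
    """Return one dict of data fields per SDF record (structure block is skipped)."""
    records = []
    for chunk in text.split("$$$$"):  # "$$$$" terminates each record
        if not chunk.strip():
            continue
        fields: Dict[str, str] = {}
        lines = chunk.splitlines()
        i = 0
        while i < len(lines):
            line = lines[i]
            if line.startswith(">") and "<" in line and ">" in line[1:]:
                name = line[line.index("<") + 1 : line.rindex(">")]
                i += 1
                value_lines = []
                # A data value runs until the next blank line.
                while i < len(lines) and lines[i].strip():
                    value_lines.append(lines[i])
                    i += 1
                fields[name] = "\n".join(value_lines)
            else:
                i += 1
        records.append(fields)
    return records


# Placeholder record with an empty structure block, for demonstration only.
SAMPLE = """example-1
  hypothetical

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <NAME>
example compound

> <TOXICITY_FLAG>
active

$$$$
"""

if __name__ == "__main__":
    print(parse_sdf_fields(SAMPLE))
```

Because structure, text and property data travel in one file, a reader like this is all that is needed to load the non-structural fields into a relational table.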
Damming the genomic data flood using a comprehensive analysis and storage data structure

PubMed Central

Bouffard, Marc; Phillips, Michael S.; Brown, Andrew M.K.; Marsh, Sharon; Tardif, Jean-Claude; van Rooij, Tibor

2010-01-01

Data generation, driven by rapid advances in genomic technologies, is fast outpacing our analysis capabilities. Faced with this flood of data, more hardware and software resources are added to accommodate data sets whose structure has not specifically been designed for analysis. This leads to unnecessarily lengthy processing times and excessive data handling and storage costs. Current efforts to address this have centered on developing new indexing schemas and analysis algorithms, whereas the root of the problem lies in the format of the data itself. We have developed a new data structure for storing and analyzing genotype and phenotype data. By leveraging data normalization techniques, database management system capabilities and the use of a novel multi-table, multidimensional database structure we have eliminated the following: (i) unnecessarily large data set size due to high levels of redundancy, (ii) sequential access to these data sets and (iii) common bottlenecks in analysis times. The resulting novel data structure horizontally divides the data to circumvent traditional problems associated with the use of databases for very large genomic data sets. The resulting data set required 86% less disk space and performed analytical calculations 6248 times faster compared to a standard approach without any loss of information. Database URL: http://castor.pharmacogenomics.ca PMID:21159730
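The normalization idea can be sketched with Python's built-in `sqlite3` module. The three-table layout below (samples, markers, genotype calls) is an assumed, simplified schema chosen for illustration; it is not the authors' actual data structure, and the table and column names are hypothetical.

```python
# Sketch of a normalized genotype store (hypothetical schema, not the authors' design).
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database where each sample and marker is stored once."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sample (sample_id INTEGER PRIMARY KEY, label TEXT UNIQUE);
        CREATE TABLE marker (marker_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE genotype (
            sample_id INTEGER REFERENCES sample(sample_id),
            marker_id INTEGER REFERENCES marker(marker_id),
            allele_call TEXT,
            PRIMARY KEY (sample_id, marker_id)
        );
        """
    )
    conn.executemany("INSERT INTO sample VALUES (?, ?)", [(1, "s1"), (2, "s2")])
    conn.executemany("INSERT INTO marker VALUES (?, ?)", [(1, "m1"), (2, "m2")])
    conn.executemany(
        "INSERT INTO genotype VALUES (?, ?, ?)",
        [(1, 1, "AA"), (2, 1, "AG"), (1, 2, "CC")],
    )
    return conn


def calls_per_marker(conn: sqlite3.Connection):
    """Count genotype calls per marker using an indexed join instead of a file scan."""
    return conn.execute(
        "SELECT m.name, COUNT(*) FROM genotype g "
        "JOIN marker m ON m.marker_id = g.marker_id "
        "GROUP BY m.name ORDER BY m.name"
    ).fetchall()


if __name__ == "__main__":
    print(calls_per_marker(build_demo_db()))
```

Storing sample and marker descriptions once and referring to them by integer key is the kind of redundancy removal the abstract credits for the reduced disk footprint.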
G6PDdb, an integrated database of glucose-6-phosphate dehydrogenase (G6PD) mutations.

PubMed

Kwok, Colin J; Martin, Andrew C R; Au, Shannon W N; Lam, Veronica M S

2002-03-01

G6PDdb (http://www.rubic.rdg.ac.uk/g6pd/ or http://www.bioinf.org.uk/g6pd/) is a newly created web-accessible locus-specific mutation database for the human Glucose-6-phosphate dehydrogenase (G6PD) gene. The relational database integrates up-to-date mutational and structural data from various databanks (GenBank, Protein Data Bank, etc.) with biochemically characterized variants and their associated phenotypes obtained from published literature and the Favism website. An automated analysis of the mutations likely to have a significant impact on the structure of the protein has been performed using a recently developed procedure. The database may be queried online and the full results of the analysis of the structural impact of mutations are available. The web page provides a form for submitting additional mutation data and is linked to resources such as the Favism website, OMIM, HGMD, HGVBASE, and the PDB. This database provides insights into the molecular aspects and clinical significance of G6PD deficiency for researchers and clinicians and the web page functions as a knowledge base relevant to the understanding of G6PD deficiency and its management. Copyright 2002 Wiley-Liss, Inc.

Analysing and Rationalising Molecular and Materials Databases Using Machine-Learning

NASA Astrophysics Data System (ADS)

De, Sandip; Ceriotti, Michele

Computational materials design promises to greatly accelerate the process of discovering new or more performant materials. Several collaborative efforts are contributing to this goal by building databases of structures, containing between thousands and millions of distinct hypothetical compounds, whose properties are computed by high-throughput electronic-structure calculations. The complexity and sheer amount of information has made manual exploration, interpretation and maintenance of these databases a formidable challenge, making it necessary to resort to automatic analysis tools. Here we will demonstrate how, starting from a measure of (dis)similarity between database items built from a combination of local environment descriptors, it is possible to apply hierarchical clustering algorithms, as well as dimensionality reduction methods such as sketchmap, to analyse, classify and interpret trends in molecular and materials databases, as well as to detect inconsistencies and errors. Thanks to the agnostic and flexible nature of the underlying metric, we will show how our framework can be applied transparently to different kinds of systems ranging from organic molecules and oligopeptides to inorganic crystal structures as well as molecular crystals. Funded by National Center for Computational Design and Discovery of Novel Materials (MARVEL) and Swiss National Science Foundation.
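As a rough illustration of clustering database items from a precomputed dissimilarity matrix, the sketch below implements plain single-linkage agglomerative clustering with a distance cutoff. This is a generic textbook procedure chosen for brevity; it is not the authors' descriptor-based metric or their sketchmap analysis, and the example matrix is invented.

```python
# Sketch: single-linkage agglomerative clustering on a dissimilarity matrix.
# Generic illustration only; the matrix below is made up.
from typing import List, Sequence


def single_linkage(dist: Sequence[Sequence[float]], threshold: float) -> List[List[int]]:
    """Merge the two closest clusters until the closest pair exceeds threshold."""
    clusters = [{i} for i in range(len(dist))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: cluster distance is the smallest item distance.
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > threshold:
            break
        _, a, b = best
        clusters[a] |= clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


# Hypothetical dissimilarities between four database entries.
DEMO_DIST = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.2],
    [0.9, 0.9, 0.2, 0.0],
]

if __name__ == "__main__":
    print(single_linkage(DEMO_DIST, threshold=0.5))
```

Entries that end up alone in a cluster at a loose threshold are natural candidates to inspect when hunting for inconsistencies or errors in a database.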
Adding Hierarchical Objects to Relational Database General-Purpose XML-Based Information Managements

NASA Technical Reports Server (NTRS)

Lin, Shu-Chun; Knight, Chris; La, Tracy; Maluf, David; Bell, David; Tran, Khai Peter; Gawdiak, Yuri

2006-01-01

NETMARK is a flexible, high-throughput software system for managing, storing, and rapid searching of unstructured and semi-structured documents. NETMARK transforms such documents from their original highly complex, constantly changing, heterogeneous data formats into well-structured, common data formats using Hypertext Markup Language (HTML) and/or Extensible Markup Language (XML). The software implements an object-relational database system that combines the best practices of the relational model utilizing Structured Query Language (SQL) with those of the object-oriented, semantic database model for creating complex data. In particular, NETMARK takes advantage of the Oracle 8i object-relational database model using physical-address data types for very efficient keyword searches of records across both context and content. NETMARK also supports multiple international standards such as WEBDAV for drag-and-drop file management and SOAP for integrated information management using Web services. The document-organization and -searching capabilities afforded by NETMARK are likely to make this software attractive for use in disciplines as diverse as science, auditing, and law enforcement.
The Design and Product of National 1:1000000 Cartographic Data of Topographic Map

NASA Astrophysics Data System (ADS)

Wang, Guizhi

2016-06-01

The National Administration of Surveying, Mapping and Geoinformation launched a project for the dynamic update of the national fundamental geographic information database in 2012. Within this project, the 1:50000 database was updated once a year, and the 1:250000 database was downsized and linkage-updated on that basis. In 2014, the 1:1000000 digital line graph database was comprehensively updated using the latest results of the 1:250000 database. At the same time, cartographic data of the topographic map and digital elevation model data were generated. This article mainly introduces the national 1:1000000 cartographic data of the topographic map, including feature content, database structure, database-driven mapping technology, workflow and so on.
e23D: database and visualization of A-to-I RNA editing sites mapped to 3D protein structures.

PubMed

Solomon, Oz; Eyal, Eran; Amariglio, Ninette; Unger, Ron; Rechavi, Gidi

2016-07-15

e23D, a database of A-to-I RNA editing sites from human, mouse and fly mapped to evolutionary related protein 3D structures, is presented. Genomic coordinates of A-to-I RNA editing sites are converted to protein coordinates and mapped onto 3D structures from PDB or theoretical models from ModBase. e23D allows visualization of the protein structure, modeling of recoding events and orientation of the editing with respect to nearby genomic functional sites from databases of disease causing mutations and genomic polymorphism. e23D is available at http://www.sheba-cancer.org.il/e23D. Contact: oz.solomon@live.biu.ac.il or Eran.Eyal@sheba.health.gov.il. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Data on publications, structural analyses, and queries used to build and utilize the AlloRep database.

PubMed

Sousa, Filipa L; Parente, Daniel J; Hessman, Jacob A; Chazelle, Allen; Teichmann, Sarah A; Swint-Kruse, Liskin

2016-09-01

The AlloRep database (www.AlloRep.org) (Sousa et al., 2016) [1] compiles extensive sequence, mutagenesis, and structural information for the LacI/GalR family of transcription regulators. Sequence alignments are presented for >3000 proteins in 45 paralog subfamilies and as a subsampled alignment of the whole family. Phenotypic and biochemical data on almost 6000 mutants have been compiled from an exhaustive search of the literature; citations for these data are included herein. These data include information about oligomerization state, stability, DNA binding and allosteric regulation. Protein structural data for 65 proteins are presented as easily-accessible, residue-contact networks. Finally, this article includes example queries to enable the use of the AlloRep database. See the related article, "AlloRep: a repository of sequence, structural and mutagenesis data for the LacI/GalR transcription regulators" (Sousa et al., 2016) [1].
A complete database for the Einstein imaging proportional counter

NASA Technical Reports Server (NTRS)

Helfand, David J.

1991-01-01

A complete database for the Einstein Imaging Proportional Counter (IPC) has been completed. The original data that make up the archive are described, as well as the structure of the database, the Op-Ed analysis system, the technical advances achieved in the analysis of IPC data, the data products produced, and some uses to which the database has been put by scientists outside Columbia University over the past year.
Bioinformatics Approaches to Classifying Allergens and Predicting Cross-Reactivity

PubMed Central

Schein, Catherine H.; Ivanciuc, Ovidiu; Braun, Werner

2007-01-01

The major advances in understanding why patients respond to several seemingly different stimuli have been through the isolation, sequencing and structural analysis of proteins that induce an IgE response. The most significant finding is that allergenic proteins from very different sources can have nearly identical sequences and structures, and that this similarity can account for clinically observed cross-reactivity. The increasing amount of information on the sequence, structure and IgE epitopes of allergens is now available in several databases and powerful bioinformatics search tools allow user access to relevant information. Here, we provide an overview of these databases and describe state-of-the art bioinformatics tools to identify the common proteins that may be at the root of multiple allergy syndromes. Progress has also been made in quantitatively defining characteristics that discriminate allergens from non-allergens. Search and software tools for this purpose have been developed and implemented in the Structural Database of Allergenic Proteins (SDAP, http://fermi.utmb.edu/SDAP/). SDAP contains information for over 800 allergens and extensive bibliographic references in a relational database with links to other publicly available databases. SDAP is freely available on the Web to clinicians and patients, and can be used to find structural and functional relations among known allergens and to identify potentially cross-reacting antigens. Here we illustrate how these bioinformatics tools can be used to group allergens, and to detect areas that may account for common patterns of IgE binding and cross-reactivity. Such results can be used to guide treatment regimens for allergy sufferers. PMID:17276876
Mechanism and Catalytic Site Atlas (M-CSA): a database of enzyme reaction mechanisms and active sites.

PubMed

Ribeiro, António J M; Holliday, Gemma L; Furnham, Nicholas; Tyzack, Jonathan D; Ferris, Katherine; Thornton, Janet M

2018-01-04

M-CSA (Mechanism and Catalytic Site Atlas) is a database of enzyme active sites and reaction mechanisms that can be accessed at www.ebi.ac.uk/thornton-srv/m-csa. Our objectives with M-CSA are to provide an open data resource for the community to browse known enzyme reaction mechanisms and catalytic sites, and to use the dataset to understand enzyme function and evolution. M-CSA results from the merging of two existing databases, MACiE (Mechanism, Annotation and Classification in Enzymes), a database of enzyme mechanisms, and CSA (Catalytic Site Atlas), a database of catalytic sites of enzymes. We are releasing M-CSA as a new website and underlying database architecture. At the moment, M-CSA contains 961 entries, 423 of these with detailed mechanism information, and 538 with information on the catalytic site residues only. In total, these cover 81% (195/241) of third level EC numbers with a PDB structure, and 30% (840/2793) of fourth level EC numbers with a PDB structure, out of 6028 in total. By searching for close homologues, we are able to extend M-CSA coverage of PDB and UniProtKB to 51 993 structures and to over five million sequences, respectively, of which about 40% and 30% have a conserved active site. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
An Investigation of the Fine Spatial Structure of Meteor Streams Using the Relational Database ``Meteor''

NASA Astrophysics Data System (ADS)

Karpov, A. V.; Yumagulov, E. Z.

2003-05-01

We have restored and ordered the archive of meteor observations carried out with a meteor radar complex ``KGU-M5'' since 1986. A relational database has been formed under the control of the Database Management System (DBMS) Oracle 8. We also improved and tested a statistical method for studying the fine spatial structure of meteor streams with allowance for the specific features of application of the DBMS. Statistical analysis of the results of observations made it possible to obtain information about the substance distribution in the Quadrantid, Geminid, and Perseid meteor streams.
Crystallography Open Databases and Preservation: a World-wide Initiative

NASA Astrophysics Data System (ADS)

Chateigner, Daniel

In 2003, an international team of crystallographers proposed the Crystallography Open Database (COD), a fully free collection of crystal structure data, with the aim of ensuring their preservation. With nearly 250000 entries, this database represents a large open set of data for crystallographers, academics and industry, located at five different places world-wide, and included in Thomson-Reuters’ ISI. As a large step towards data preservation, raw data can now be uploaded along with «digested» structure files, and COD can be queried by most crystallography-related industrial software. The COD initiative work deserves several other open developments.
Fullerene data mining using bibliometrics and database tomography

PubMed

Kostoff; Braun; Schubert; Toothman; Humenik

2000-01-01

Database tomography (DT) is a textual database analysis system consisting of two major components: (1) algorithms for extracting multiword phrase frequencies and phrase proximities (physical closeness of the multiword technical phrases) from any type of large textual database, to augment (2) interpretative capabilities of the expert human analyst. DT was used to derive technical intelligence from a fullerenes database derived from the Science Citation Index and the Engineering Compendex. Phrase frequency analysis by the technical domain experts provided the pervasive technical themes of the fullerenes database, and phrase proximity analysis provided the relationships among the pervasive technical themes. Bibliometric analysis of the fullerenes literature supplemented the DT results with author/journal/institution publication and citation data. Comparisons of fullerenes results with past analyses of similarly structured near-earth space, chemistry, hypersonic/supersonic flow, aircraft, and ship hydrodynamics databases are made. One important finding is that many of the normalized bibliometric distribution functions are extremely consistent across these diverse technical domains and could reasonably be expected to apply to broader chemical topics than fullerenes that span multiple structural classes. Finally, lessons learned about integrating the technical domain experts with the data mining tools are presented.
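The two algorithmic ingredients named above, multiword phrase frequencies and phrase proximities, can be approximated in a short sketch. The tokenisation, the window size and the function names are assumptions made for illustration and do not reproduce the DT system itself.

```python
# Sketch of phrase-frequency and phrase-proximity counting (illustrative only;
# tokenisation rules and window size are assumptions, not the DT algorithms).
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_frequencies(text: str, n: int = 2) -> Counter:
    """Count every n-word phrase in the text."""
    tokens = tokenize(text)
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def phrase_proximity(text: str, theme: str, n: int = 2, window: int = 5) -> Counter:
    """Count n-word phrases lying within `window` tokens of each theme occurrence."""
    tokens = tokenize(text)
    theme_tokens = theme.lower().split()
    n_theme = len(theme_tokens)
    nearby: Counter = Counter()
    for i in range(len(tokens) - n_theme + 1):
        if tokens[i:i + n_theme] != theme_tokens:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + n_theme + window)
        for j in range(lo, hi - n + 1):
            # Keep only phrases that do not overlap the theme phrase itself.
            if j + n <= i or j >= i + n_theme:
                nearby[" ".join(tokens[j:j + n])] += 1
    return nearby
```

Frequencies give candidate themes, and running `phrase_proximity` for each high-frequency theme gives the phrases that co-occur with it, which is the raw material a domain expert would then interpret.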
Organizations in America: Analyzing Their Structures and Human Resource Practices Based on the National Organizations Study.

ERIC Educational Resources Information Center

Kalleberg, Arne L.; Knoke, David; Marsden, Peter V.; Spaeth, Joe L.

In 1991 the National Organizations Study (NOS) surveyed a number of U.S. businesses about their structure, context, and personnel practices to produce a database for answering questions about social behavior in work organizations. This book presents the results of that survey. The study aimed to create a national database on organizations--based…
DSSTOX: NEW ON-LINE RESOURCE FOR PUBLISHING AND INTEGRATING STANDARDIZED STRUCTURE-INCLUSIVE TOXICITY DATABASES

EPA Science Inventory

DSSTox: New On-line Resource for Publishing Structure-Standardized Toxicity Databases

Ann M Richard1, Jamie Burch2, ClarLynda Williams3
1Nat. Health and Environ. Effects Res. Lab, US EPA, Res. Triangle Park, NC 27711; 2EPA-NC
Central Univ Student COOP, US EPA, Res. Tri...
System for Configuring Modular Telemetry Transponders

NASA Technical Reports Server (NTRS)

Varnavas, Kosta A. (Inventor); Sims, William Herbert, III (Inventor)

2014-01-01

A system for configuring telemetry transponder cards uses a database of error checking protocol data structures, each containing data to implement at least one CCSDS protocol algorithm. Using a user interface, a user selects at least one telemetry specific error checking protocol from the database. A compiler configures an FPGA with the data from the data structures to implement the error checking protocol.
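As an illustration of what one entry in such a database of error checking protocols might compute, the sketch below implements a bitwise CRC-16 with generator polynomial x^16 + x^12 + x^5 + 1 (0x1021) and an all-ones preset, parameters commonly associated with CCSDS frame error control. Treating this as one of the patent's protocols is an assumption, and the `PROTOCOLS` registry is a hypothetical software stand-in for the FPGA configuration data.

```python
# Sketch: a CRC-16 (polynomial 0x1021, preset 0xFFFF) selected from a small
# registry. The registry and its key are hypothetical; the patent configures
# an FPGA rather than running Python.
from typing import Callable, Dict


def crc16_ccitt(data: bytes, init: int = 0xFFFF) -> int:
    """Bitwise CRC-16 over `data`, most significant bit first, no final XOR."""
    crc = init
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


# Hypothetical "database" mapping a protocol name to its checking routine.
PROTOCOLS: Dict[str, Callable[[bytes], int]] = {"crc16": crc16_ccitt}


def checksum(protocol: str, frame: bytes) -> int:
    """Look up the selected protocol and apply it to a telemetry frame."""
    return PROTOCOLS[protocol](frame)


if __name__ == "__main__":
    print(hex(checksum("crc16", b"123456789")))
```

Keeping each algorithm behind a common lookup is a software analogue of selecting a protocol from the database before the compiler configures the hardware.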
Correcting the record of structural publications requires joint effort of the community and journal editors.

PubMed

Rupp, Bernhard; Wlodawer, Alexander; Minor, Wladek; Helliwell, John R; Jaskolski, Mariusz

2016-12-01

Seriously flawed and even fictional models of biomolecular crystal structures, although rare, still persist in the record of structural repositories and databases. The ensuing problems of database contamination and persistence of publications based on incorrect structure models must be effectively addressed. The burden cannot be simply left to the critical voices who take the effort to contribute dissenting comments that are mostly ignored. The entire structural biology community, and particularly the journal editors who exercise significant power in this respect, must engage in a constructive dialog lest structural biology lose its credibility as an evidence-based empirical science. © 2016 Federation of European Biochemical Societies.
Data to knowledge: how to get meaning from your result

PubMed Central

Berman, Helen M.; Gabanyi, Margaret J.; Groom, Colin R.; Johnson, John E.; Murshudov, Garib N.; Nicholls, Robert A.; Reddy, Vijay; Schwede, Torsten; Zimmerman, Matthew D.; Westbrook, John; Minor, Wladek

2015-01-01

Structural and functional studies require the development of sophisticated ‘Big Data’ technologies and software to increase the knowledge derived and ensure reproducibility of the data. This paper presents summaries of the Structural Biology Knowledge Base, the VIPERdb Virus Structure Database, evaluation of homology modeling by the Protein Model Portal, the ProSMART tool for conformation-independent structure comparison, the LabDB ‘super’ laboratory information management system and the Cambridge Structural Database. These techniques and technologies represent important tools for the transformation of crystallographic data into knowledge and information, in an effort to address the problem of non-reproducibility of experimental results. PMID:25610627
TOPSAN: a dynamic web database for structural genomics.

PubMed

Ellrott, Kyle; Zmasek, Christian M; Weekes, Dana; Sri Krishna, S; Bakolitsa, Constantina; Godzik, Adam; Wooley, John

2011-01-01

The Open Protein Structure Annotation Network (TOPSAN) is a web-based collaboration platform for exploring and annotating structures determined by structural genomics efforts. Characterization of those structures presents a challenge since the majority of the proteins themselves have not yet been characterized. Responding to this challenge, the TOPSAN platform facilitates collaborative annotation and investigation via a user-friendly web-based interface pre-populated with automatically generated information. Semantic web technologies expand and enrich TOPSAN's content through links to larger sets of related databases, and thus, enable data integration from disparate sources and data mining via conventional query languages. TOPSAN can be found at http://www.topsan.org.
MINEs: Open access databases of computationally predicted enzyme promiscuity products for untargeted metabolomics

DOE PAGES

Jeffryes, James G.; Colastani, Ricardo L.; Elbadawi-Sidhu, Mona; ...

2015-08-28

Metabolomics has proven difficult to execute in an untargeted and generalizable manner. Liquid chromatography–mass spectrometry (LC–MS) has made it possible to gather data on thousands of cellular metabolites. However, matching metabolites to their spectral features continues to be a bottleneck, meaning that much of the collected information remains uninterpreted and that new metabolites are seldom discovered in untargeted studies. These challenges require new approaches that consider compounds beyond those available in curated biochemistry databases. Here we present Metabolic In silico Network Expansions (MINEs), an extension of known metabolite databases to include molecules that have not been observed, but are likely to occur based on known metabolites and common biochemical reactions. We utilize an algorithm called the Biochemical Network Integrated Computational Explorer (BNICE) and expert-curated reaction rules based on the Enzyme Commission classification system to propose the novel chemical structures and reactions that comprise MINE databases. Starting from the Kyoto Encyclopedia of Genes and Genomes (KEGG) COMPOUND database, the MINE contains over 571,000 compounds, of which 93% are not present in the PubChem database. However, these MINE compounds have on average higher structural similarity to natural products than compounds from KEGG or PubChem. MINE databases were able to propose annotations for 98.6% of a set of 667 MassBank spectra, 14% more than KEGG alone and equivalent to PubChem while returning far fewer candidates per spectrum than PubChem (46 vs. 1715 median candidates). Application of MINEs to LC–MS accurate mass data enabled the identity of an unknown peak to be confidently predicted. MINE databases are freely accessible for non-commercial use via user-friendly web-tools at http://minedatabase.mcs.anl.gov and developer-friendly APIs.
MINEs improve metabolomics peak identification as compared to general chemical databases whose results include irrelevant synthetic compounds. MINEs complement and expand on previous in silico generated compound databases that focus on human metabolism. We are actively developing the database; future versions of this resource will incorporate transformation rules for spontaneous chemical reactions and more advanced filtering and prioritization of candidate structures.
PROXiMATE: a database of mutant protein-protein complex thermodynamics and kinetics.

PubMed

Jemimah, Sherlyn; Yugandhar, K; Michael Gromiha, M

2017-09-01

We have developed PROXiMATE, a database of thermodynamic data for more than 6000 missense mutations in 174 heterodimeric protein-protein complexes, supplemented with interaction network data from STRING database, solvent accessibility, sequence, structural and functional information, experimental conditions and literature information. Additional features include complex structure visualization, search and display options, download options and a provision for users to upload their data. The database is freely available at http://www.iitm.ac.in/bioinfo/PROXiMATE/ . The website is implemented in Python, and supports recent versions of major browsers such as IE10, Firefox, Chrome and Opera. Contact: gromiha@iitm.ac.in. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
High performance semantic factoring of giga-scale semantic graph databases.

DOE Office of Scientific and Technical Information (OSTI.GOV)

al-Saffar, Sinan; Adolf, Bob; Haglin, David

2010-10-01

As semantic graph database technology grows to address components ranging from extant large triple stores to SPARQL endpoints over SQL-structured relational databases, it will become increasingly important to be able to bring high performance computational resources to bear on their analysis, interpretation, and visualization, especially with respect to their innate semantic structure. Our research group built a novel high performance hybrid system comprising computational capability for semantic graph database processing utilizing the large multithreaded architecture of the Cray XMT platform, conventional clusters, and large data stores. In this paper we describe that architecture, and present the results of our deploying that for the analysis of the Billion Triple dataset with respect to its semantic factors, including basic properties, connected components, namespace interaction, and typed paths.
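One of the semantic factors listed above, connected components, can be computed over subject-predicate-object triples with a union-find structure. The sketch below is a small single-threaded illustration with invented triples; it says nothing about how the authors' Cray XMT implementation works.

```python
# Sketch: connected components of a triple set via union-find.
# Single-threaded illustration with made-up triples, not the authors' system.
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def connected_components(triples: Iterable[Triple]) -> List[List[str]]:
    """Group nodes linked by any predicate, ignoring edge direction."""
    parent: Dict[str, str] = {}

    def find(node: str) -> str:
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for subject, _predicate, obj in triples:
        root_s, root_o = find(subject), find(obj)
        if root_s != root_o:
            parent[root_s] = root_o

    groups: Dict[str, set] = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


if __name__ == "__main__":
    demo = [("a", "knows", "b"), ("b", "knows", "c"), ("x", "likes", "y")]
    print(connected_components(demo))
```

Union-find touches each triple once, so the same idea scales to large triple sets, although at billion-triple scale the memory layout and parallelism become the real engineering problem.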

MicroUse: The Database on Microcomputer Applications in Libraries and Information Centers.

ERIC Educational Resources Information Center

Chen, Ching-chih; Wang, Xiaochu

1984-01-01

Describes MicroUse, a microcomputer-based database on microcomputer applications in libraries and information centers which was developed using relational database manager dBASE II. The description includes its system configuration, software utilized, the in-house-developed dBASE programs, multifile structure, basic functions, MicroUse records,…
Blending Education and Polymer Science: Semiautomated Creation of a Thermodynamic Property Database

ERIC Educational Resources Information Center

Tchoua, Roselyne B.; Qin, Jian; Audus, Debra J.; Chard, Kyle; Foster, Ian T.; de Pablo, Juan

2016-01-01

Structured databases of chemical and physical properties play a central role in the everyday research activities of scientists and engineers. In materials science, researchers and engineers turn to these databases to quickly query, compare, and aggregate various properties, thereby allowing for the development or application of new materials. The…
A Tutorial in Creating Web-Enabled Databases with Inmagic DB/TextWorks through ODBC.

ERIC Educational Resources Information Center

Breeding, Marshall

2000-01-01

Explains how to create Web-enabled databases. Highlights include Inmagic's DB/Text WebPublisher product called DB/TextWorks; ODBC (Open Database Connectivity) drivers; Perl programming language; HTML coding; Structured Query Language (SQL); Common Gateway Interface (CGI) programming; and examples of HTML pages and Perl scripts. (LRW)
Report on Approaches to Database Translation. Final Report.

ERIC Educational Resources Information Center

Gallagher, Leonard; Salazar, Sandra

This report describes approaches to database translation (i.e., transferring data and data definitions from a source, either a database management system (DBMS) or a batch file, to a target DBMS), and recommends a method for representing the data structures of newly-proposed network and relational data models in a form suitable for database…
An Experimental Investigation of Complexity in Database Query Formulation Tasks

ERIC Educational Resources Information Center

Casterella, Gretchen Irwin; Vijayasarathy, Leo

2013-01-01

Information Technology professionals and other knowledge workers rely on their ability to extract data from organizational databases to respond to business questions and support decision making. Structured query language (SQL) is the standard programming language for querying data in relational databases, and SQL skills are in high demand and are…
The FP4026 Research Database on the fundamental period of RC infilled frame structures.

PubMed

Asteris, Panagiotis G

2016-12-01

The fundamental period of vibration appears to be one of the most critical parameters for the seismic design of buildings because it strongly affects the destructive impact of the seismic forces. This article presents research data, entitled the FP4026 Research Database (Fundamental Period, 4026 cases of infilled frames), based on detailed and in-depth analytical research on the fundamental period of reinforced concrete structures. In particular, the values of the fundamental period which have been analytically determined are presented, taking into account the majority of the involved parameters. This database can be extremely valuable for the development of new code proposals for the estimation of the fundamental period of reinforced concrete structures fully or partially infilled with masonry walls.
Geologic map and map database of parts of Marin, San Francisco, Alameda, Contra Costa, and Sonoma counties, California

USGS Publications Warehouse

Blake, M.C.; Jones, D.L.; Graymer, R.W.; digital database by Soule, Adam

2000-01-01

This digital map database, compiled from previously published and unpublished data, and new mapping by the authors, represents the general distribution of bedrock and surficial deposits in the mapped area. Together with the accompanying text file (mageo.txt, mageo.pdf, or mageo.ps), it provides current information on the geologic structure and stratigraphy of the area covered. The database delineates map units that are identified by general age and lithology following the stratigraphic nomenclature of the U.S. Geological Survey. The scale of the source maps limits the spatial resolution (scale) of the database to 1:62,500 or smaller.
Mass-storage management for distributed image/video archives

NASA Astrophysics Data System (ADS)

Franchi, Santina; Guarda, Roberto; Prampolini, Franco

1993-04-01

The realization of an image/video database requires a specific design for both database structures and mass storage management. This issue was addressed in the project of the digital image/video database system designed at the IBM SEMEA Scientific & Technical Solution Center. Proper database structures have been defined to catalog image/video coding techniques with their related parameters, and the description of image/video contents. User workstations and servers are distributed along a local area network. Image/video files are not managed directly by the DBMS server. Because of their large size, they are stored outside the database on network devices. The database contains the pointers to the image/video files and the description of the storage devices. The system can use different kinds of storage media, organized in a hierarchical structure. Three levels of functions are available to manage the storage resources. The functions of the lower level provide media management: they catalog devices and modify device status and device network location. The middle level manages image/video files on a physical basis, handling file migration between high capacity media and low access time media. The functions of the upper level work on image/video files on a logical basis: they archive, move and copy image/video data selected by user-defined queries. These functions are used to support the implementation of a storage management strategy. The database information about the characteristics of both storage devices and coding techniques is used by the third-level functions to fit delivery/visualization requirements and to reduce archiving costs.
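The three function levels described in this abstract can be sketched as a small catalog that stores pointers to files rather than the files themselves. This is an illustrative sketch only; the class names, method names, device names and file names are all hypothetical and do not come from the IBM system.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    name: str
    tier: str            # "fast" (low access time) or "bulk" (high capacity)
    online: bool = True


@dataclass
class Catalog:
    devices: dict = field(default_factory=dict)    # device name -> Device
    locations: dict = field(default_factory=dict)  # file id -> device name

    # Lower level: media management (catalog devices).
    def register_device(self, device):
        self.devices[device.name] = device

    # Middle level: physical file management (migration between media).
    def migrate(self, file_id, target_device):
        if not self.devices[target_device].online:
            raise ValueError(f"{target_device} is offline")
        self.locations[file_id] = target_device

    # Upper level: logical operations on files selected by a user-defined query.
    def archive_where(self, predicate, bulk_device):
        moved = [f for f in self.locations if predicate(f)]
        for file_id in moved:
            self.migrate(file_id, bulk_device)
        return moved


catalog = Catalog()
catalog.register_device(Device("disk0", "fast"))
catalog.register_device(Device("tape0", "bulk"))
catalog.locations.update({"clip-001.mpg": "disk0", "scan-002.tif": "disk0"})
archived = catalog.archive_where(lambda f: f.endswith(".mpg"), "tape0")
```

The catalog only updates the pointer for each file; actually copying the bytes between media is outside the scope of this sketch.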
PDB-Ligand: a ligand database based on PDB for the automated and customized classification of ligand-binding structures.

PubMed

Shin, Jae-Min; Cho, Doo-Ho

2005-01-01

PDB-Ligand (http://www.idrtech.com/PDB-Ligand/) is a three-dimensional structure database of small molecular ligands that are bound to larger biomolecules deposited in the Protein Data Bank (PDB). It is also a database tool that allows one to browse, classify, superimpose and visualize these structures. As of May 2004, there are about 4870 types of small molecular ligands, experimentally determined as a complex with protein or DNA in the PDB. The proteins that a given ligand binds are often homologous and present the same binding structure to the ligand. However, there are also many instances wherein a given ligand binds to two or more unrelated proteins, or to the same or homologous protein in different binding environments. PDB-Ligand serves as an interactive structural analysis and clustering tool for all the ligand-binding structures in the PDB. PDB-Ligand also provides an easier way to obtain a number of different structure alignments of many related ligand-binding structures based on a simple and flexible ligand clustering method. PDB-Ligand will be a good resource for both a better interpretation of ligand-binding structures and the development of better scoring functions to be used in many drug discovery applications.
Rebelling for a Reason: Protein Structural “Outliers”

PubMed Central

Arumugam, Gandhimathi; Nair, Anu G.; Hariharaputran, Sridhar; Ramanathan, Sowdhamini

2013-01-01

Analysis of structural variation in domain superfamilies can reveal constraints in protein evolution which aids protein structure prediction and classification. Structure-based sequence alignment of distantly related proteins, organized in PASS2 database, provides clues about structurally conserved regions among different functional families. Some superfamily members show large structural differences which are functionally relevant. This paper analyses the impact of structural divergence on function for multi-member superfamilies, selected from the PASS2 superfamily alignment database. Functional annotations within superfamilies, with structural outliers or ‘rebels’, are discussed in the context of structural variations. Overall, these data reinforce the idea that functional similarities cannot be extrapolated from mere structural conservation. The implication for fold-function prediction is that the functional annotations can only be inherited with very careful consideration, especially at low sequence identities. PMID:24073209
The Plant Structure Ontology, a Unified Vocabulary of Anatomy and Morphology of a Flowering Plant

PubMed Central

Ilic, Katica; Kellogg, Elizabeth A.; Jaiswal, Pankaj; Zapata, Felipe; Stevens, Peter F.; Vincent, Leszek P.; Avraham, Shulamit; Reiser, Leonore; Pujar, Anuradha; Sachs, Martin M.; Whitman, Noah T.; McCouch, Susan R.; Schaeffer, Mary L.; Ware, Doreen H.; Stein, Lincoln D.; Rhee, Seung Y.

2007-01-01

Formal description of plant phenotypes and standardized annotation of gene expression and protein localization data require uniform terminology that accurately describes plant anatomy and morphology. This facilitates cross species comparative studies and quantitative comparison of phenotypes and expression patterns. A major drawback is variable terminology that is used to describe plant anatomy and morphology in publications and genomic databases for different species. The same terms are sometimes applied to different plant structures in different taxonomic groups. Conversely, similar structures are named by their species-specific terms. To address this problem, we created the Plant Structure Ontology (PSO), the first generic ontological representation of anatomy and morphology of a flowering plant. The PSO is intended for a broad plant research community, including bench scientists, curators in genomic databases, and bioinformaticians. The initial releases of the PSO integrated existing ontologies for Arabidopsis (Arabidopsis thaliana), maize (Zea mays), and rice (Oryza sativa); more recent versions of the ontology encompass terms relevant to Fabaceae, Solanaceae, additional cereal crops, and poplar (Populus spp.). Databases such as The Arabidopsis Information Resource, Nottingham Arabidopsis Stock Centre, Gramene, MaizeGDB, and SOL Genomics Network are using the PSO to describe expression patterns of genes and phenotypes of mutants and natural variants and are regularly contributing new annotations to the Plant Ontology database. The PSO is also used in specialized public databases, such as BRENDA, GENEVESTIGATOR, NASCArrays, and others. Over 10,000 gene annotations and phenotype descriptions from participating databases can be queried and retrieved using the Plant Ontology browser. The PSO, as well as contributed gene associations, can be obtained at www.plantontology.org. PMID:17142475
Fast 3D shape screening of large chemical databases through alignment-recycling

PubMed Central

Fontaine, Fabien; Bolton, Evan; Borodina, Yulia; Bryant, Stephen H

2007-01-01

Background: Large chemical databases require fast, efficient, and simple ways of looking for similar structures. Although such tasks are now fairly well resolved for graph-based similarity queries, they remain an issue for 3D approaches, particularly for those based on 3D shape overlays. Inspired by a recent technique developed to compare molecular shapes, we designed a hybrid methodology, alignment-recycling, that enables efficient retrieval and alignment of structures with similar 3D shapes. Results: Using a dataset of more than one million PubChem compounds of limited size (< 28 heavy atoms) and flexibility (< 6 rotatable bonds), we obtained a set of a few thousand diverse structures covering entirely the 3D shape space of the conformers of the dataset. Transformation matrices gathered from the overlays between these diverse structures and the 3D conformer dataset allowed us to drastically (100-fold) reduce the CPU time required for shape overlay. The alignment-recycling heuristic produces results consistent with de novo alignment calculation, with better than 80% hit list overlap on average. Conclusion: Overlay-based 3D methods are computationally demanding when searching large databases. Alignment-recycling reduces the CPU time to perform shape similarity searches by breaking the alignment problem into three steps: selection of diverse shapes to describe the database shape-space; overlay of the database conformers to the diverse shapes; and non-optimized overlay of query and database conformers using common reference shapes. The precomputation, required by the first two steps, is a significant cost of the method; however, once performed, querying is two orders of magnitude faster. Extensions and variations of this methodology, for example, to handle larger and more flexible small molecules, are discussed. PMID:17880744
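The third step, reusing stored overlays onto a common reference shape instead of optimizing a new alignment, can be illustrated by composing transformation matrices. The sketch below is one reading of that idea under simplifying assumptions: it handles rotations only (no translation, no shape scoring), and all function names and angles are hypothetical rather than taken from the paper.

```python
import math


def rot_z(degrees):
    """3x3 rotation matrix about the z axis."""
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def recycled_alignment(query_to_ref, db_to_ref):
    """Approximate query->database rotation from two stored overlays onto
    the same reference shape (the inverse of a rotation is its transpose)."""
    return matmul(transpose(db_to_ref), query_to_ref)


# Hypothetical precomputed overlays onto one shared reference shape.
query_to_ref = rot_z(90)
db_to_ref = rot_z(30)
query_to_db = recycled_alignment(query_to_ref, db_to_ref)
```

No optimization is run at query time: the query-to-database transform is obtained by one matrix product, which is where the speed-up in such a scheme would come from.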
Distributed Structure-Searchable Toxicity (DSSTox) Database

EPA Pesticide Factsheets

The Distributed Structure-Searchable Toxicity network provides a public forum for publishing downloadable, structure-searchable, standardized chemical structure files associated with chemical inventories or toxicity data sets of environmental relevance.
Databases for Microbiologists

DOE PAGES

Zhulin, Igor B.

2015-05-26

Databases play an increasingly important role in biology. They archive, store, maintain, and share information on genes, genomes, expression data, protein sequences and structures, metabolites and reactions, interactions, and pathways. All these data are critically important to microbiologists. Furthermore, microbiology has its own databases that deal with model microorganisms, microbial diversity, physiology, and pathogenesis. Thousands of biological databases are currently available, and it becomes increasingly difficult to keep up with their development. Finally, the purpose of this minireview is to provide a brief survey of current databases that are of interest to microbiologists.
Databases for Microbiologists

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhulin, Igor B.

Databases play an increasingly important role in biology. They archive, store, maintain, and share information on genes, genomes, expression data, protein sequences and structures, metabolites and reactions, interactions, and pathways. All these data are critically important to microbiologists. Furthermore, microbiology has its own databases that deal with model microorganisms, microbial diversity, physiology, and pathogenesis. Thousands of biological databases are currently available, and it becomes increasingly difficult to keep up with their development. Finally, the purpose of this minireview is to provide a brief survey of current databases that are of interest to microbiologists.
Data structures and organisation: Special problems in scientific applications

NASA Astrophysics Data System (ADS)

Read, Brian J.

1989-12-01

In this paper we discuss and offer answers to the following questions: What, really, are the benefits of databases in physics? Are scientific databases essentially different from conventional ones? What are the drawbacks of a commercial database management system for use with scientific data? Do they outweigh the advantages? Do database systems have adequate graphics facilities, or is a separate graphics package necessary? SQL as a standard language has deficiencies, but what are they for scientific data in particular? Indeed, is the relational model appropriate anyway? Or, should we turn to object oriented databases?
Databases for Microbiologists

PubMed Central

2015-01-01

Databases play an increasingly important role in biology. They archive, store, maintain, and share information on genes, genomes, expression data, protein sequences and structures, metabolites and reactions, interactions, and pathways. All these data are critically important to microbiologists. Furthermore, microbiology has its own databases that deal with model microorganisms, microbial diversity, physiology, and pathogenesis. Thousands of biological databases are currently available, and it becomes increasingly difficult to keep up with their development. The purpose of this minireview is to provide a brief survey of current databases that are of interest to microbiologists. PMID:26013493
MetalS(3), a database-mining tool for the identification of structurally similar metal sites.

PubMed

Valasatava, Yana; Rosato, Antonio; Cavallaro, Gabriele; Andreini, Claudia

2014-08-01

We have developed a database search tool to identify metal sites having structural similarity to a query metal site structure within the MetalPDB database of minimal functional sites (MFSs) contained in metal-binding biological macromolecules. MFSs describe the local environment around the metal(s) independently of the larger context of the macromolecular structure. Such a local environment has a determinant role in tuning the chemical reactivity of the metal, ultimately contributing to the functional properties of the whole system. The database search tool, which we called MetalS(3) (Metal Sites Similarity Search), can be accessed through a Web interface at http://metalweb.cerm.unifi.it/tools/metals3/ . MetalS(3) uses a suitably adapted version of an algorithm that we previously developed to systematically compare the structure of the query metal site with each MFS in MetalPDB. For each MFS, the best superposition is kept. All these superpositions are then ranked according to the MetalS(3) scoring function and are presented to the user in tabular form. The user can interact with the output Web page to visualize the structural alignment or the sequence alignment derived from it. Options to filter the results are available. Test calculations show that the MetalS(3) output correlates well with expectations from protein homology considerations. Furthermore, we describe some usage scenarios that highlight the usefulness of MetalS(3) to obtain mechanistic and functional hints regardless of homology.
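The "keep the best superposition per MFS, then rank" step can be sketched as follows. This is not the MetalS(3) scoring function: the scores are placeholders, the site identifiers are hypothetical, and whether a lower or higher score is better is an assumption exposed as a parameter.

```python
def rank_sites(superpositions, lower_is_better=True):
    """superpositions: iterable of (mfs_id, score) pairs, several per MFS.

    Keeps the best-scoring superposition for each MFS, then ranks the sites.
    """
    pick = min if lower_is_better else max
    best = {}
    for mfs_id, score in superpositions:
        best[mfs_id] = pick(best.get(mfs_id, score), score)
    return sorted(best.items(), key=lambda item: item[1],
                  reverse=not lower_is_better)


# Hypothetical candidate superpositions of a query site onto three MFSs.
candidates = [
    ("site_A", 2.4), ("site_B", 0.9), ("site_A", 1.1),
    ("site_C", 1.7), ("site_B", 1.3),
]
ranking = rank_sites(candidates)
```

The resulting list corresponds to the tabular output the abstract mentions, with one row per MFS.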
Operating System Support for Shared Hardware Data Structures

DTIC Science & Technology

2013-01-31

Carbon [73] uses hardware queues to improve fine-grained multitasking for Recognition, Mining, and Synthesis. Compared to software approaches...web transaction processing, data mining, and multimedia. Early work in database processors [114, 96, 79, 111] reduce the costs of relational database...assignment can be solved statically or dynamically. Static assignment determines offline which data structures are assigned to use HWDS resources and at
Managing expectations: assessment of chemistry databases generated by automated extraction of chemical structures from patents.

PubMed

Senger, Stefan; Bartek, Luca; Papadatos, George; Gaulton, Anna

2015-12-01

First public disclosure of new chemical entities often takes place in patents, which makes them an important source of information. However, with an ever increasing number of patent applications, manual processing and curation on such a large scale becomes even more challenging. An alternative approach better suited for this large corpus of documents is the automated extraction of chemical structures. A number of patent chemistry databases generated by using the latter approach are now available but little is known that can help to manage expectations when using them. This study aims to address this by comparing two such freely available sources, SureChEMBL and IBM SIIP (IBM Strategic Intellectual Property Insight Platform), with manually curated commercial databases. When looking at the percentage of chemical structures successfully extracted from a set of patents, using SciFinder as our reference, 59 and 51 % were also found in our comparison in SureChEMBL and IBM SIIP, respectively. When performing this comparison with compounds as starting point, i.e. establishing if for a list of compounds the databases provide the links between chemical structures and patents they appear in, we obtained similar results. SureChEMBL and IBM SIIP found 62 and 59 %, respectively, of the compound-patent pairs obtained from Reaxys. In our comparison of automatically generated vs. manually curated patent chemistry databases, the former successfully provided approximately 60 % of links between chemical structure and patents. It needs to be stressed that only a very limited number of patents and compound-patent pairs were used for our comparison. Nevertheless, our results will hopefully help to manage expectations of users of patent chemistry databases of this type and provide a useful framework for more studies like ours as well as guide future developments of the workflows used for the automated extraction of chemical structures from patents. 
The challenges we have encountered whilst performing this study highlight that more needs to be done to make such assessments easier. Above all, more adequate, preferably open access to relevant 'gold standards' is required.

Innovative Strategies to Develop Chemical Categories Using a Combination of Structural and Toxicological Properties.

PubMed

Batke, Monika; Gütlein, Martin; Partosch, Falko; Gundert-Remy, Ursula; Helma, Christoph; Kramer, Stefan; Maunz, Andreas; Seeland, Madeleine; Bitsch, Annette

2016-01-01

Interest is increasing in the development of non-animal methods for toxicological evaluations. These methods are, however, particularly challenging for complex toxicological endpoints such as repeated dose toxicity. European legislation, e.g., the European Union's Cosmetic Directive and REACH, demands the use of alternative methods. Frameworks, such as the Read-across Assessment Framework or the Adverse Outcome Pathway Knowledge Base, support the development of these methods. The aim of the project presented in this publication was to develop substance categories for a read-across with complex endpoints of toxicity based on existing databases. The basic conceptual approach was to combine structural similarity with shared mechanisms of action. Substances with similar chemical structure and toxicological profile form candidate categories suitable for read-across. We combined two databases on repeated dose toxicity, the RepDose database and the ELINCS database, to form a common database for the identification of categories. The resulting database contained physicochemical, structural, and toxicological data, which were refined and curated for cluster analyses. We applied the Predictive Clustering Tree (PCT) approach for clustering chemicals based on structural and on toxicological information to detect groups of chemicals with similar toxic profiles and pathways/mechanisms of toxicity. As many of the experimental toxicity values were not available, the missing values were imputed by predicting them with a multi-label classification method, prior to clustering. The clustering results were evaluated by assessing chemical and toxicological similarities with the aim of identifying clusters with a concordance between structural information and toxicity profiles/mechanisms. From these chosen clusters, seven were selected for a quantitative read-across, based on a small ratio (< 5) between the highest and the lowest NOAEL among the members of the cluster.
We discuss the limitations of the approach. Based on this analysis we propose improvements for a follow-up approach, such as incorporation of metabolic information and more detailed mechanistic information. The software enables the user to allocate a substance in a cluster and to use this information for a possible read-across. The clustering tool is provided as a free web service, accessible at http://mlc-reach.informatik.uni-mainz.de.
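The selection criterion for quantitative read-across, a ratio below 5 between the highest and the lowest NOAEL in a cluster, is simple enough to show as a worked example. The cluster names and NOAEL values below are invented for illustration, and the function names are hypothetical.

```python
def noael_ratio(noaels):
    """Ratio of the highest to the lowest NOAEL within one cluster."""
    return max(noaels) / min(noaels)


def select_clusters(clusters, max_ratio=5.0):
    """Return the names of clusters whose NOAEL ratio is below max_ratio."""
    return [name for name, values in clusters.items()
            if noael_ratio(values) < max_ratio]


# Hypothetical NOAEL values, all in the same unit within a cluster.
clusters = {
    "cluster_1": [10.0, 25.0, 40.0],   # ratio 4.0
    "cluster_2": [1.0, 50.0],          # ratio 50.0
    "cluster_3": [30.0, 100.0],        # ratio about 3.3
}
selected = select_clusters(clusters)
```

With these made-up numbers, the first and third clusters pass the threshold and the second does not.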
Motif discovery with data mining in 3D protein structure databases: discovery, validation and prediction of the U-shape zinc binding ("Huf-Zinc") motif.

PubMed

Maurer-Stroh, Sebastian; Gao, He; Han, Hao; Baeten, Lies; Schymkowitz, Joost; Rousseau, Frederic; Zhang, Louxin; Eisenhaber, Frank

2013-02-01

Data mining in protein databases, derivatives from more fundamental protein 3D structure and sequence databases, has considerable unearthed potential for the discovery of sequence motif--structural motif--function relationships as the finding of the U-shape (Huf-Zinc) motif, originally a small student's project, exemplifies. The metal ion zinc is critically involved in universal biological processes, ranging from protein-DNA complexes and transcription regulation to enzymatic catalysis and metabolic pathways. Proteins have evolved a series of motifs to specifically recognize and bind zinc ions. Many of these, so called zinc fingers, are structurally independent globular domains with discontinuous binding motifs made up of residues mostly far apart in sequence. Through a systematic approach starting from the BRIX structure fragment database, we discovered that there exists another predictable subset of zinc-binding motifs that not only have a conserved continuous sequence pattern but also share a characteristic local conformation, despite being included in totally different overall folds. While this does not allow general prediction of all Zn binding motifs, a HMM-based web server, Huf-Zinc, is available for prediction of these novel, as well as conventional, zinc finger motifs in protein sequences. The Huf-Zinc webserver can be freely accessed through this URL (http://mendel.bii.a-star.edu.sg/METHODS/hufzinc/).
Toward public volume database management: a case study of NOVA, the National Online Volumetric Archive

NASA Astrophysics Data System (ADS)

Fletcher, Alex; Yoo, Terry S.

2004-04-01

Public databases today can be constructed with a wide variety of authoring and management structures. The widespread appeal of Internet search engines suggests that public information be made open and available to common search strategies, making accessible information that would otherwise be hidden by the infrastructure and software interfaces of a traditional database management system. We present the construction and organizational details for managing NOVA, the National Online Volumetric Archive. As an archival effort of the Visible Human Project for supporting medical visualization research, archiving 3D multimodal radiological teaching files, and enhancing medical education with volumetric data, our overall database structure is simplified; archives grow by accruing information, but seldom have to modify, delete, or overwrite stored records. NOVA is being constructed and populated so that it is transparent to the Internet; that is, much of its internal structure is mirrored in HTML allowing internet search engines to investigate, catalog, and link directly to the deep relational structure of the collection index. The key organizational concept for NOVA is the Image Content Group (ICG), an indexing strategy for cataloging incoming data as a set structure rather than by keyword management. These groups are managed through a series of XML files and authoring scripts. We cover the motivation for Image Content Groups, their overall construction, authorship, and management in XML, and the pilot results for creating public data repositories using this strategy.
De-identifying an EHR database - anonymity, correctness and readability of the medical record.

PubMed

Pantazos, Kostas; Lauesen, Soren; Lippert, Soren

2011-01-01

Electronic health records (EHR) contain a large amount of structured data and free text. Exploring and sharing clinical data can improve healthcare and facilitate the development of medical software. However, revealing confidential information is against ethical principles and laws. We de-identified a Danish EHR database with 437,164 patients. The goal was to generate a version with real medical records, but related to artificial persons. We developed a de-identification algorithm that uses lists of named entities, simple language analysis, and special rules. Our algorithm consists of 3 steps: collect lists of identifiers from the database and external resources, define a replacement for each identifier, and replace identifiers in structured data and free text. Some patient records could not be safely de-identified, so the de-identified database has 323,122 patient records with an acceptable degree of anonymity, readability and correctness (F-measure of 95%). The algorithm has to be adjusted for each culture, language and database.
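The three steps of the algorithm (collect identifiers, define a replacement for each, replace them in structured data and free text) can be sketched in a few lines. This is a toy illustration under strong assumptions: it uses only names found in a structured field, whereas the paper also draws on external resources, simple language analysis and special rules; the record layout, function names and sample records are hypothetical.

```python
import re


def collect_identifiers(records):
    """Step 1: gather identifiers from the structured fields."""
    return {r["name"] for r in records}


def define_replacements(identifiers):
    """Step 2: map each identifier to an artificial person."""
    return {ident: f"Person{idx:03d}"
            for idx, ident in enumerate(sorted(identifiers), start=1)}


def deidentify(records, replacements):
    """Step 3: replace identifiers in structured data and in free text."""
    # Longest identifiers first, so that longer names win over their prefixes.
    ordered = sorted(replacements, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(i) for i in ordered))

    def swap(match):
        return replacements[match.group(0)]

    return [{"name": replacements[r["name"]],
             "note": pattern.sub(swap, r["note"])}
            for r in records]


# Hypothetical records with one structured field and one free-text field.
records = [
    {"name": "Anna Jensen", "note": "Anna Jensen reports mild headache."},
    {"name": "Ole Hansen", "note": "Referred by Anna Jensen; Ole Hansen has asthma."},
]
replacements = define_replacements(collect_identifiers(records))
clean = deidentify(records, replacements)
```

Exact string matching like this misses misspellings, inflected forms and identifiers that appear only in free text, which is part of why the abstract notes that some records could not be safely de-identified.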
Database on Performance of Neutron Irradiated FeCrAl Alloys

DOE Office of Scientific and Technical Information (OSTI.GOV)

Field, Kevin G.; Briggs, Samuel A.; Littrell, Ken

The present report summarizes and discusses the database on radiation tolerance for Generation I, Generation II, and commercial FeCrAl alloys. This database has been built upon mechanical testing and microstructural characterization on selected alloys irradiated within the High Flux Isotope Reactor (HFIR) at Oak Ridge National Laboratory (ORNL) up to doses of 13.8 dpa at temperatures ranging from 200°C to 550°C. The structure and performance of these irradiated alloys were characterized using advanced microstructural characterization techniques and mechanical testing. The primary objective of developing this database is to enhance the rapid development of a mechanistic understanding on the radiation tolerance of FeCrAl alloys, thereby enabling informed decisions on the optimization of composition and microstructure of FeCrAl alloys for application as an accident tolerant fuel (ATF) cladding. This report is structured to provide a brief summary of critical results related to the database on radiation tolerance of FeCrAl alloys.
The 2002 RPA Plot Summary database users manual

Treesearch

Patrick D. Miles; John S. Vissage; W. Brad Smith

2004-01-01

Describes the structure of the RPA 2002 Plot Summary database and provides information on generating estimates of forest statistics from these data. The RPA 2002 Plot Summary database provides a consistent framework for storing forest inventory data across all ownerships across the entire United States. The data represents the best available data as of October 2001....
The Research Potential of the Electronic OED Database at the University of Waterloo: A Case Study.

ERIC Educational Resources Information Center

Berg, Donna Lee

1991-01-01

Discusses the history and structure of the online database of the second edition of the Oxford English Dictionary (OED) and the software tools developed at the University of Waterloo to manipulate the unusually complex database. Four sample searches that indicate some types of problems that might be encountered are appended. (DB)
An integrated photogrammetric and spatial database management system for producing fully structured data using aerial and remote sensing images.

PubMed

Ahmadi, Farshid Farnood; Ebadi, Hamid

2009-01-01

3D spatial data acquired from aerial and remote sensing images by photogrammetric techniques is one of the most accurate and economic data sources for GIS, map production, and spatial data updating. However, there are still many problems concerning storage, structuring and appropriate management of spatial data obtained using these techniques. Given the capabilities of spatial database management systems (SDBMSs), direct integration of photogrammetric and spatial database management systems can save time and cost of producing and updating digital maps. This integration is accomplished by replacing digital maps with a single spatial database. Applying spatial databases overcomes the problem of managing spatial and attribute data in a coupled approach. This management approach is one of the main problems in GISs for using map products of photogrammetric workstations. Also, by means of these integrated systems, structured spatial data, based on OGC (Open GIS Consortium) standards and topological relations between different feature classes, can be provided at the time of the feature digitizing process. In this paper, the integration of photogrammetric systems and SDBMSs is evaluated. Then, different levels of integration are described. Finally, the design, implementation and testing of a software package called Integrated Photogrammetric and Oracle Spatial Systems (IPOSS) are presented.
novPTMenzy: a database for enzymes involved in novel post-translational modifications

PubMed Central

Khater, Shradha; Mohanty, Debasisa

2015-01-01

With the recent discoveries of novel post-translational modifications (PTMs) which play important roles in signaling and biosynthetic pathways, identification of such PTM catalyzing enzymes by genome mining has been an area of major interest. Unlike well-known PTMs like phosphorylation, glycosylation, SUMOylation, no bioinformatics resources are available for enzymes associated with novel and unusual PTMs. Therefore, we have developed the novPTMenzy database which catalogs information on the sequence, structure, active site and genomic neighborhood of experimentally characterized enzymes involved in five novel PTMs, namely AMPylation, Eliminylation, Sulfation, Hydroxylation and Deamidation. Based on a comprehensive analysis of the sequence and structural features of these known PTM catalyzing enzymes, we have created Hidden Markov Model profiles for the identification of similar PTM catalyzing enzymatic domains in genomic sequences. We have also created predictive rules for grouping them into functional subfamilies and deciphering their mechanistic details by structure-based analysis of their active site pockets. These analytical modules have been made available as user friendly search interfaces of novPTMenzy database. It also has a specialized analysis interface for some PTMs like AMPylation and Eliminylation. The novPTMenzy database is a unique resource that can aid in discovery of unusual PTM catalyzing enzymes in newly sequenced genomes. Database URL: http://www.nii.ac.in/novptmenzy.html PMID:25931459
RPG: the Ribosomal Protein Gene database.

PubMed

Nakao, Akihiro; Yoshihama, Maki; Kenmochi, Naoya

2004-01-01

RPG (http://ribosome.miyazaki-med.ac.jp/) is a new database that provides detailed information about ribosomal protein (RP) genes. It contains data from humans and other organisms, including Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Methanococcus jannaschii and Escherichia coli. Users can search the database by gene name and organism. Each record includes sequences (genomic, cDNA and amino acid sequences), intron/exon structures, genomic locations and information about orthologs. In addition, users can view and compare the gene structures of the above organisms and make multiple amino acid sequence alignments. RPG also provides information on small nucleolar RNAs (snoRNAs) that are encoded in the introns of RP genes.
RPG: the Ribosomal Protein Gene database

PubMed Central

Nakao, Akihiro; Yoshihama, Maki; Kenmochi, Naoya

2004-01-01

RPG (http://ribosome.miyazaki-med.ac.jp/) is a new database that provides detailed information about ribosomal protein (RP) genes. It contains data from humans and other organisms, including Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Methanococcus jannaschii and Escherichia coli. Users can search the database by gene name and organism. Each record includes sequences (genomic, cDNA and amino acid sequences), intron/exon structures, genomic locations and information about orthologs. In addition, users can view and compare the gene structures of the above organisms and make multiple amino acid sequence alignments. RPG also provides information on small nucleolar RNAs (snoRNAs) that are encoded in the introns of RP genes. PMID:14681386
HoPaCI-DB: host-Pseudomonas and Coxiella interaction database

PubMed Central

Bleves, Sophie; Dunger, Irmtraud; Walter, Mathias C.; Frangoulidis, Dimitrios; Kastenmüller, Gabi; Voulhoux, Romé; Ruepp, Andreas

2014-01-01

Bacterial infectious diseases are the result of multifactorial processes affected by the interplay between virulence factors and host targets. The host-Pseudomonas and Coxiella interaction database (HoPaCI-DB) is a publicly available manually curated integrative database (http://mips.helmholtz-muenchen.de/HoPaCI/) of host–pathogen interaction data from Pseudomonas aeruginosa and Coxiella burnetii. The resource provides structured information on 3585 experimentally validated interactions between molecules, bioprocesses and cellular structures extracted from the scientific literature. Systematic annotation and interactive graphical representation of disease networks make HoPaCI-DB a versatile knowledge base for biologists and network biology approaches. PMID:24137008
A database of natural products and chemical entities from marine habitat

PubMed Central

Babu, Padavala Ajay; Puppala, Suma Sree; Aswini, Satyavarapu Lakshmi; Vani, Metta Ramya; Kumar, Chinta Narasimha; Prasanna, Tallapragada

2008-01-01

Marine compound database consists of marine natural products and chemical entities, collected from various literature sources, which are known to possess bioactivity against human diseases. The database is constructed using html code. The 12 categories of 182 compounds are provided with the source, compound name, 2-dimensional structure, bioactivity and clinical trial information. The database is freely available online and can be accessed at http://www.progenebio.in/mcdb/index.htm PMID:19238254
Benzodiazepines and related drugs for insomnia in palliative care.

PubMed

Hirst, A; Sloan, R

2002-01-01

Insomnia, a subjective complaint of poor sleep and associated impairment in daytime function, is a common problem. Currently, benzodiazepines are the most used pharmacological treatment for this complaint. They are considered helpful for occasional short-term use up to four weeks but longer term use is not advised due to potential problems regarding tolerance, dosing escalation, psychological addiction and physical dependence. There is no consensus on their utility in patients with progressive incurable conditions who may require assistance with sleep for many weeks as their condition deteriorates. To assess the effectiveness and safety of benzodiazepines or benzodiazepine receptor agonists such as Zolpidem, Zopiclone and Zaleplon for insomnia in palliative care. Several electronic databases were searched including Cochrane PaPaS Group specialized register, Cochrane Library Issue 4, 2001, MEDLINE, EMBASE, BNI plus, CINAHL, BIOLOGICAL ABSTRACTS, PSYCINFO, CANCERLIT, HEALTHSTAR, WEB OF SCIENCE, SIGLE, Dissertation Abstracts, ZETOC and the MetaRegister of ongoing trials. These were searched from 1960 to 2001 or as much of this range as possible. Additional articles were sought by handsearching reference lists in standard textbooks and reviews in the field and by contacting academic centres in palliative care and pharmaceutical companies. There were no language restrictions. Studies considered for inclusion were randomized controlled trials of adult patients in any setting, receiving palliative care or suffering an incurable progressive medical condition. (For example, cancers, AIDS, Motor Neurone Disease, Multiple Sclerosis, Parkinson's Disease, Chronic Obstructive Pulmonary Disease). There had to be an explicit complaint of insomnia in study participants, diagnosed by any of the three main classification systems (DSM-IV (APA 1994), ICSD (AASD 1990) or ICD (WHO 1992)), or as described in the study if it involved a subjective complaint of poor sleep. 
Studies had to compare a benzodiazepine or Zolpidem or Zopiclone or Zaleplon with placebo or active control for the treatment of insomnia. Any duration of therapy was considered. Abstracts were independently inspected by both reviewers; full papers were obtained where necessary. Where there was uncertainty, advice was sought from a third (PW). Data extraction and quality assessments were undertaken independently by both reviewers. No randomized controlled trials were identified meeting the a priori inclusion criteria. Thirty-seven studies were considered but all were excluded from the review. Despite a comprehensive search no evidence from randomized controlled trials was identified. It was not possible to draw any conclusions regarding the use of benzodiazepines in palliative care.
An ab initio electronic transport database for inorganic materials.

PubMed

Ricci, Francesco; Chen, Wei; Aydemir, Umut; Snyder, G Jeffrey; Rignanese, Gian-Marco; Jain, Anubhav; Hautier, Geoffroy

2017-07-04

Electronic transport in materials is governed by a series of tensorial properties such as conductivity, Seebeck coefficient, and effective mass. These quantities are paramount to the understanding of materials in many fields from thermoelectrics to electronics and photovoltaics. Transport properties can be calculated from a material's band structure using the Boltzmann transport theory framework. We present here the largest computational database of electronic transport properties based on a large set of 48,000 materials originating from the Materials Project database. Our results were obtained through the interpolation approach developed in the BoltzTraP software, assuming a constant relaxation time. We present the workflow to generate the data, the data validation procedure, and the database structure. Our aim is to target the large community of scientists developing materials selection strategies and performing studies involving transport properties.
Open Access Internet Resources for Nano-Materials Physics Education

NASA Astrophysics Data System (ADS)

Moeck, Peter; Seipel, Bjoern; Upreti, Girish; Harvey, Morgan; Garrick, Will

2006-05-01

Because a great deal of nano-material science and engineering relies on crystalline materials, materials physicists have to provide their own specific contributions to the National Nanotechnology Initiative. Here we briefly review two freely accessible internet-based crystallographic databases, the Nano-Crystallography Database (http://nanocrystallography.research.pdx.edu) and the Crystallography Open Database (http://crystallography.net). Information on over 34,000 full structure determinations is stored in these two databases in the Crystallographic Information File format. The availability of such crystallographic data on the internet in a standardized format allows for all kinds of web-based crystallographic calculations and visualizations. Two examples dealt with in this paper are interactive crystal structure visualizations in three dimensions and calculations of lattice-fringe fingerprints for the identification of unknown nanocrystals from their atomic-resolution transmission electron microscopy images.
Large scale database scrubbing using object oriented software components.

PubMed

Herting, R L; Barnes, M R

1998-01-01

Now that case managers, quality improvement teams, and researchers use medical databases extensively, the ability to share and disseminate such databases while maintaining patient confidentiality is paramount. A process called scrubbing addresses this problem by removing personally identifying information while keeping the integrity of the medical information intact. Scrubbing entire databases, containing multiple tables, requires that the implicit relationships between data elements in different tables of the database be maintained. To address this issue we developed DBScrub, a Java program that interfaces with any JDBC compliant database and scrubs the database while maintaining the implicit relationships within it. DBScrub uses a small number of highly configurable object-oriented software components to carry out the scrubbing. We describe the structure of these software components and how they maintain the implicit relationships within the database.
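The scrubbing idea in this abstract can be sketched in a few lines. The sketch below is not DBScrub (which is a Java/JDBC program whose components are not described here); it only illustrates, under assumptions, how an identifying key can be replaced by a stable pseudonym so that rows in different tables still join after scrubbing. The table names, column names and salt are hypothetical.

```python
import hashlib

def pseudonym(value, salt="demo-salt"):
    """Map an identifier to a stable token; the same input always gives the same token."""
    digest = hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()
    return "P" + digest[:12]

def scrub(tables, key="patient_id", drop=("name", "address")):
    """Return scrubbed copies of all tables.

    Columns listed in `drop` are removed; the shared key column is replaced
    by a pseudonym, so the implicit relationship between tables survives.
    """
    scrubbed = {}
    for table_name, rows in tables.items():
        cleaned_rows = []
        for row in rows:
            clean = {k: v for k, v in row.items() if k not in drop}
            if key in clean:
                clean[key] = pseudonym(clean[key])
            cleaned_rows.append(clean)
        scrubbed[table_name] = cleaned_rows
    return scrubbed

# Hypothetical two-table database linked only by patient_id.
tables = {
    "patients": [{"patient_id": 17, "name": "Jane Doe",
                  "address": "1 Main St", "birth_year": 1950}],
    "visits": [{"patient_id": 17, "diagnosis": "J45"},
               {"patient_id": 17, "diagnosis": "I10"}],
}
result = scrub(tables)
```

Because the pseudonym is a deterministic function of the original key, a join between the scrubbed `patients` and `visits` tables returns the same row pairs as before. A production tool would also need a secret salt and handling for quasi-identifiers such as dates, which this sketch leaves out.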
MoonProt: a database for proteins that are known to moonlight

PubMed Central

Mani, Mathew; Chen, Chang; Amblee, Vaishak; Liu, Haipeng; Mathur, Tanu; Zwicke, Grant; Zabad, Shadi; Patel, Bansi; Thakkar, Jagravi; Jeffery, Constance J.

2015-01-01

Moonlighting proteins comprise a class of multifunctional proteins in which a single polypeptide chain performs multiple biochemical functions that are not due to gene fusions, multiple RNA splice variants or pleiotropic effects. The known moonlighting proteins perform a variety of diverse functions in many different cell types and species, and information about their structures and functions is scattered in many publications. We have constructed the manually curated, searchable, internet-based MoonProt Database (http://www.moonlightingproteins.org) with information about the over 200 proteins that have been experimentally verified to be moonlighting proteins. The availability of this organized information provides a more complete picture of what is currently known about moonlighting proteins. The database will also aid researchers in other fields, including determining the functions of genes identified in genome sequencing projects, interpreting data from proteomics projects and annotating protein sequence and structural databases. In addition, information about the structures and functions of moonlighting proteins can be helpful in understanding how novel protein functional sites evolved on an ancient protein scaffold, which can also help in the design of proteins with novel functions. PMID:25324305
SU-E-T-255: Development of a Michigan Quality Assurance (MQA) Database for Clinical Machine Operations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Roberts, D

Purpose: A unified database system was developed to allow accumulation, review and analysis of quality assurance (QA) data for measurement, treatment, imaging and simulation equipment in our department. Recording these data in a database allows a unified and structured approach to review and analysis of data gathered using commercial database tools. Methods: A clinical database was developed to track records of quality assurance operations on linear accelerators, a computed tomography (CT) scanner, high dose rate (HDR) afterloader and imaging systems such as on-board imaging (OBI) and Calypso in our department. The database was developed using the Microsoft Access database and the Visual Basic for Applications (VBA) programming interface. Separate modules were written for accumulation, review and analysis of daily, monthly and annual QA data. All modules were designed to use structured query language (SQL) as the basis of data accumulation and review. The SQL strings are dynamically re-written at run time. The database also features embedded documentation, storage of documents produced during QA activities and the ability to annotate all data within the database. Tests are defined in a set of tables that define test type, specific value, and schedule. Results: Daily, monthly and annual QA data have been taken in parallel with established procedures to test MQA. The database has been used to aggregate data across machines to examine the consistency of machine parameters and operations within the clinic for several months. Conclusion: The MQA application has been developed as an interface to a commercially available SQL engine (JET 5.0) and a standard database back-end. The MQA system has been used for several months for routine data collection. The system is robust, relatively simple to extend and can be migrated to a commercial SQL server.
Fragger: a protein fragment picker for structural queries.

PubMed

Berenger, Francois; Simoncini, David; Voet, Arnout; Shrestha, Rojan; Zhang, Kam Y J

2017-01-01

Protein modeling and design activities often require querying the Protein Data Bank (PDB) with a structural fragment, possibly containing gaps. For some applications, it is preferable to work on a specific subset of the PDB or with unpublished structures. These requirements, along with specific user needs, motivated the creation of a new software to manage and query 3D protein fragments. Fragger is a protein fragment picker that allows protein fragment databases to be created and queried. All fragment lengths are supported and any set of PDB files can be used to create a database. Fragger can efficiently search a fragment database with a query fragment and a distance threshold. Matching fragments are ranked by distance to the query. The query fragment can have structural gaps and the allowed amino acid sequences matching a query can be constrained via a regular expression of one-letter amino acid codes. Fragger also incorporates a tool to compute the backbone RMSD of one versus many fragments in high throughput. Fragger should be useful for protein design, loop grafting and related structural bioinformatics tasks.
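The query model described in this abstract (distance threshold, ranking by distance, sequence constraint as a regular expression) can be illustrated with a small sketch. This is not Fragger's code or command-line interface. As a simplifying assumption, fragments are taken to be already superposed on the query, so a plain coordinate RMSD is used instead of an RMSD after optimal superposition. All fragment names and coordinates are made up.

```python
import math
import re

def rmsd(a, b):
    """Plain RMSD between two equal-length lists of (x, y, z) points (no superposition)."""
    if len(a) != len(b):
        raise ValueError("fragments must have the same number of atoms")
    total = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(total / len(a))

def search(query, fragments, threshold, seq_pattern=None):
    """Return (distance, name) pairs within `threshold`, closest first.

    `seq_pattern` is a regular expression over one-letter amino acid codes
    that the whole fragment sequence must match.
    """
    pattern = re.compile(seq_pattern) if seq_pattern else None
    hits = []
    for name, seq, coords in fragments:
        if pattern and not pattern.fullmatch(seq):
            continue  # sequence constraint not satisfied
        if len(coords) != len(query):
            continue  # only same-length fragments are comparable here
        d = rmsd(query, coords)
        if d <= threshold:
            hits.append((d, name))
    return sorted(hits)

query = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
fragments = [
    ("frag_a", "GAS", [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]),
    ("frag_b", "GPS", [(0.0, 0.1, 0.0), (1.5, 0.1, 0.0), (3.0, 0.1, 0.0)]),
    ("frag_c", "LAS", [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]),
    ("frag_d", "GAS", [(0.0, 5.0, 0.0), (1.5, 5.0, 0.0), (3.0, 5.0, 0.0)]),
]
hits = search(query, fragments, threshold=1.0, seq_pattern=r"G[AP]S")
```

Here `frag_c` is rejected by the sequence pattern and `frag_d` by the distance threshold, leaving `frag_a` and `frag_b` ranked by distance. Support for gapped queries, as mentioned in the abstract, is not modelled.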

CyBy(2): a structure-based data management tool for chemical and biological data.

PubMed

Höck, Stefan; Riedl, Rainer

2012-01-01

We report the development of a powerful data management tool for chemical and biological data: CyBy(2). CyBy(2) is a structure-based information management tool used to store and visualize structural data alongside additional information such as project assignment, physical information, spectroscopic data, biological activity, functional data and synthetic procedures. The application consists of a database, an application server, used to query and update the database, and a client application with a rich graphical user interface (GUI) used to interact with the server.
Knowledge representation in metabolic pathway databases.

PubMed

Stobbe, Miranda D; Jansen, Gerbert A; Moerland, Perry D; van Kampen, Antoine H C

2014-05-01

The accurate representation of all aspects of a metabolic network in a structured format, such that it can be used for a wide variety of computational analyses, is a challenge faced by a growing number of researchers. Analysis of five major metabolic pathway databases reveals that each database has made widely different choices to address this challenge, including how to deal with knowledge that is uncertain or missing. In concise overviews, we show how concepts such as compartments, enzymatic complexes and the direction of reactions are represented in each database. Importantly, also concepts which a database does not represent are described. Which aspects of the metabolic network need to be available in a structured format and to what detail differs per application. For example, for in silico phenotype prediction, a detailed representation of gene-protein-reaction relations and the compartmentalization of the network is essential. Our analysis also shows that current databases are still limited in capturing all details of the biology of the metabolic network, further illustrated with a detailed analysis of three metabolic processes. Finally, we conclude that the conceptual differences between the databases, which make knowledge exchange and integration a challenge, have not been resolved, so far, by the exchange formats in which knowledge representation is standardized.
In-Memory Graph Databases for Web-Scale Data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Castellana, Vito G.; Morari, Alessandro; Weaver, Jesse R.

RDF databases have emerged as one of the most relevant ways of organizing, integrating, and managing exponentially growing, often heterogeneous, and not rigidly structured data for a variety of scientific and commercial fields. In this paper we discuss the solutions integrated in GEMS (Graph database Engine for Multithreaded Systems), a software framework for implementing RDF databases on commodity, distributed-memory high-performance clusters. Unlike the majority of current RDF databases, GEMS has been designed from the ground up to primarily employ graph-based methods. This is reflected in all the layers of its stack. The GEMS framework is composed of: a SPARQL-to-C++ compiler, a library of data structures and related methods to access and modify them, and a custom runtime providing lightweight software multithreading, network message aggregation and a partitioned global address space. We provide an overview of the framework, detailing its components and how they have been closely designed and customized to address issues of graph methods applied to large-scale datasets on clusters. We discuss in detail the principles that enable automatic translation of the queries (expressed in SPARQL, the query language of choice for RDF databases) to graph methods, and identify differences with respect to other RDF databases.
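To make the phrase "translation of the queries to graph methods" concrete, here is a minimal single-machine sketch of how a two-pattern SPARQL query can be answered by walking edges rather than joining tables. It is not GEMS code (GEMS generates C++ for distributed-memory clusters). The index layout, predicate names and data are hypothetical.

```python
class Graph:
    """Triple store indexed for traversal; a deliberately tiny illustration."""

    def __init__(self, triples):
        self.incoming = {}  # (predicate, object) -> list of subjects
        for s, p, o in triples:
            self.incoming.setdefault((p, o), []).append(s)

    def subjects(self, predicate, obj):
        """All subjects s such that the triple (s, predicate, obj) exists."""
        return self.incoming.get((predicate, obj), [])

def people_in_city(graph, city):
    """Graph-walk equivalent of the SPARQL basic graph pattern
    SELECT ?x ?org WHERE { ?x :worksAt ?org . ?org :locatedIn <city> }
    """
    results = []
    for org in graph.subjects("locatedIn", city):      # bind ?org first
        for person in graph.subjects("worksAt", org):  # then follow edges to ?x
            results.append((person, org))
    return sorted(results)

triples = [
    ("alice", "worksAt", "lab1"),
    ("bob", "worksAt", "lab2"),
    ("carol", "worksAt", "lab1"),
    ("lab1", "locatedIn", "Richland"),
    ("lab2", "locatedIn", "Seattle"),
]
graph = Graph(triples)
matches = people_in_city(graph, "Richland")
```

The nested loops start from the most selective pattern (the fixed city) and expand outward along edges, which is the general flavour of a graph-based evaluation strategy. How GEMS orders patterns or parallelizes the walk is not specified in the abstract.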
PAMDB: a comprehensive Pseudomonas aeruginosa metabolome database.

PubMed

Huang, Weiliang; Brewer, Luke K; Jones, Jace W; Nguyen, Angela T; Marcu, Ana; Wishart, David S; Oglesby-Sherrouse, Amanda G; Kane, Maureen A; Wilks, Angela

2018-01-04

The Pseudomonas aeruginosa Metabolome Database (PAMDB, http://pseudomonas.umaryland.edu) is a searchable, richly annotated metabolite database specific to P. aeruginosa. P. aeruginosa is a soil organism and significant opportunistic pathogen that adapts to its environment through a versatile energy metabolism network. Furthermore, P. aeruginosa is a model organism for the study of biofilm formation, quorum sensing, and bioremediation processes, each of which is dependent on unique pathways and metabolites. The PAMDB is modelled on the Escherichia coli (ECMDB), yeast (YMDB) and human (HMDB) metabolome databases and contains >4370 metabolites and 938 pathways with links to over 1260 genes and proteins. The database information was compiled from electronic databases, journal articles and mass spectrometry (MS) metabolomic data obtained in our laboratories. For each metabolite entered, we provide detailed compound descriptions, names and synonyms, structural and physicochemical information, nuclear magnetic resonance (NMR) and MS spectra, enzymes and pathway information, as well as gene and protein sequences. The database allows extensive searching via chemical names, structure and molecular weight, together with gene, protein and pathway relationships. The PAMDB and its future iterations will provide a valuable resource to biologists, natural product chemists and clinicians in identifying active compounds, potential biomarkers and clinical diagnostics. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Troublesome Crystal Structures: Prevention, Detection, and Resolution

PubMed Central

Harlow, Richard L.

1996-01-01

A large number of incorrect crystal structures is being published today. These structures are proving to be a particular problem to those of us who are interested in comparing structural moieties found in the databases in order to develop structure-property relationships. Problems can reside in the input data, e.g., wrong unit cell or low quality intensity data, or in the structural model, e.g., wrong space group or atom types. Many of the common mistakes are, however, relatively easy to detect and thus should be preventable; at the very least, suspicious structures can be flagged, if not by the authors then by the referees and, ultimately, the crystallographic databases. This article describes some of the more common mistakes and their effects on the resulting structures, lists a series of tests that can be used to detect incorrect structures, and makes a strong plea for the publication of higher quality structures. PMID:27805169
Novel Hybrid Virtual Screening Protocol Based on Molecular Docking and Structure-Based Pharmacophore for Discovery of Methionyl-tRNA Synthetase Inhibitors as Antibacterial Agents

PubMed Central

Liu, Chi; He, Gu; Jiang, Qinglin; Han, Bo; Peng, Cheng

2013-01-01

Methionyl-tRNA synthetase (MetRS) is an essential enzyme involved in protein biosynthesis in all living organisms and is a potential antibacterial target. In the current study, the structure-based pharmacophore (SBP)-guided method has been suggested to generate a comprehensive pharmacophore of MetRS based on fourteen crystal structures of MetRS-inhibitor complexes. In this investigation, a hybrid protocol of a virtual screening method, comprised of pharmacophore model-based virtual screening (PBVS), rigid and flexible docking-based virtual screenings (DBVS), is used for retrieving new MetRS inhibitors from commercially available chemical databases. This hybrid virtual screening approach was then applied to screen the Specs (202,408 compounds) database, a structurally diverse chemical database. Fifteen hit compounds were selected from the final hits and shifted to experimental studies. These results may provide important information for further research of novel MetRS inhibitors as antibacterial agents. PMID:23839093
Remote online monitoring and measuring system for civil engineering structures

NASA Astrophysics Data System (ADS)

Kujawińska, Malgorzata; Sitnik, Robert; Dymny, Grzegorz; Karaszewski, Maciej; Michoński, Kuba; Krzesłowski, Jakub; Mularczyk, Krzysztof; Bolewicki, Paweł

2009-06-01

In this paper a distributed intelligent system for on-line measurement, remote monitoring, and data archiving of civil engineering structures is presented. The system consists of a set of optical, full-field displacement sensors connected to a controlling server. The server conducts measurements according to a list of scheduled tasks and stores the primary data or initial results in a remote centralized database. Simultaneously the server performs checks, ordered by the operator, which may in turn result in an alert or a specific action. The structure of the whole system is analyzed, along with a discussion of possible fields of application and ways to provide adequate security during data transport. Finally, a working implementation consisting of fringe projection, geometrical moiré, digital image correlation and grating interferometry sensors and an Oracle XE database is presented. The results from the database, used for on-line monitoring of a threshold value of strain for an exemplary area of interest on the engineering structure, are presented and discussed.
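The operator-ordered check on a strain threshold mentioned in this abstract might look, in its simplest form, like the sketch below. It is an assumption-laden illustration, not the authors' implementation: the reading format, units (microstrain) and threshold value are all hypothetical, and the real system reads its data from an Oracle XE database rather than an in-memory list.

```python
def check_strain(readings, threshold):
    """Return one alert record per reading whose absolute strain exceeds the threshold.

    `readings` is a list of (timestamp, strain) pairs for one area of interest.
    """
    alerts = []
    for timestamp, strain in readings:
        if abs(strain) > threshold:
            alerts.append({"time": timestamp, "strain": strain})
    return alerts

# Hypothetical readings in microstrain for a single area of interest.
readings = [
    ("2009-06-01T10:00", 120.0),
    ("2009-06-01T10:05", 480.0),
    ("2009-06-01T10:10", -510.0),
]
alerts = check_strain(readings, threshold=450.0)
```

Comparing the absolute value treats tension and compression alike. A scheduler on the controlling server would call such a check after each measurement task and trigger the configured alert or action when the returned list is not empty.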
amamutdb.no: A relational database for MAN2B1 allelic variants that compiles genotypes, clinical phenotypes, and biochemical and structural data of mutant MAN2B1 in α-mannosidosis.

PubMed

Riise Stensland, Hilde Monica Frostad; Frantzen, Gabrio; Kuokkanen, Elina; Buvang, Elisabeth Kjeldsen; Klenow, Helle Bagterp; Heikinheimo, Pirkko; Malm, Dag; Nilssen, Øivind

2015-06-01

α-Mannosidosis is an autosomal recessive lysosomal storage disorder caused by mutations in the MAN2B1 gene, encoding lysosomal α-mannosidase. The disorder is characterized by a range of clinical phenotypes of which the major manifestations are mental impairment, hearing impairment, skeletal changes, and immunodeficiency. Here, we report an α-mannosidosis mutation database, amamutdb.no, which has been constructed as a publicly accessible online resource for recording and analyzing MAN2B1 variants (http://amamutdb.no). Our aim has been to offer structured and relational information on MAN2B1 mutations and genotypes along with associated clinical phenotypes. Classifying missense mutations, as pathogenic or benign, is a challenge. Therefore, they have been given special attention as we have compiled all available data that relate to their biochemical, functional, and structural properties. The α-mannosidosis mutation database is comprehensive and relational in the sense that information can be retrieved and compiled across datasets; hence, it will facilitate diagnostics and increase our understanding of the clinical and molecular aspects of α-mannosidosis. We believe that the amamutdb.no structure and architecture will be applicable for the development of databases for any monogenic disorder. © 2015 WILEY PERIODICALS, INC.
Technology for organization of the onboard system for processing and storage of ERS data for ultrasmall spacecraft

NASA Astrophysics Data System (ADS)

Strotov, Valery V.; Taganov, Alexander I.; Konkin, Yuriy V.; Kolesenkov, Aleksandr N.

2017-10-01

Processing and analysing Earth remote sensing (ERS) data on board ultra-small spacecraft is a pressing task, given the significant energy cost of data transfer and the low performance of onboard computers. This raises the problem of effective and reliable storage of the overall information flow from onboard data collection systems, including Earth remote sensing data, in a specialized database. The paper considers the specifics of operating a database management system with a multilevel memory structure. For data storage, a format has been developed that describes the physical structure of the database and contains the parameters required for loading information. This structure reduces the memory occupied by the database because key values need not be stored separately. The paper presents the architecture of a relational database management system designed to be embedded in the onboard software of an ultra-small spacecraft. With this database management system, a database can be built to store various kinds of information, including Earth remote sensing data, for subsequent processing. The proposed architecture places low demands on the computing power and memory resources available on board an ultra-small spacecraft. Data integrity is ensured when structured information is entered and changed.
Misdiagnosis of narcolepsy.

PubMed

Dunne, Laura; Patel, Pallavi; Maschauer, Emily L; Morrison, Ian; Riha, Renata L

2016-12-01

Narcolepsy is a chronic primary sleep disorder, characterized by excessive daytime sleepiness and sleep dysfunction with or without cataplexy. Narcolepsy is uncommon, with a low prevalence rate which makes it difficult to diagnose definitively without a complex series of tests and a detailed history. The aim of this study was to review patients referred to a tertiary sleep centre who had been labelled with a diagnosis of narcolepsy prior to referral in order to assess if the diagnosis was accurate, and if not, to determine the cause of diagnostic misattribution. All patients seen at a sleep centre from 2007-2013 (n = 551) who underwent detailed objective testing including an MSLT PSG, as well as wearing an actigraphy watch and completing a sleep diary for 2 weeks, were assessed for a pre-referral and final diagnosis of narcolepsy. Of the 41 directly referred patients with a diagnostic label of narcolepsy, 19 (46 %) were subsequently confirmed to have narcolepsy on objective testing and assessment by a sleep physician using ICSD-2 criteria. The diagnosis of narcolepsy was incorrectly attributed to almost 50 % of patients labelled with a diagnosis of narcolepsy who were referred for further opinion by a variety of specialists and generalists. Accurate diagnosis of narcolepsy is critical for many reasons, such as the impact it has on quality of life, driving, employment, insurance and pregnancy in women as well as medication management.
TESS: a geometric hashing algorithm for deriving 3D coordinate templates for searching structural databases. Application to enzyme active sites.

PubMed Central

Wallace, A. C.; Borkakoti, N.; Thornton, J. M.

1997-01-01

It is well established that sequence templates such as those in the PROSITE and PRINTS databases are powerful tools for predicting the biological function and tertiary structure for newly derived protein sequences. The number of X-ray and NMR protein structures is increasing rapidly and it is apparent that a 3D equivalent of the sequence templates is needed. Here, we describe an algorithm called TESS that automatically derives 3D templates from structures deposited in the Brookhaven Protein Data Bank. While a new sequence can be searched for sequence patterns, a new structure can be scanned against these 3D templates to identify functional sites. As examples, 3D templates are derived for enzymes with an O-His-O "catalytic triad" and for the ribonucleases and lysozymes. When these 3D templates are applied to a large data set of nonidentical proteins, several interesting hits are located. This suggests that the development of a 3D template database may help to identify the function of new protein structures, if unknown, as well as to design proteins with specific functions. PMID:9385633
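The core trick of geometric hashing, on which this abstract relies, is to describe neighbouring atoms in a coordinate frame built from a few reference atoms, then quantize those coordinates into a hash key that does not change when the whole structure is rotated or translated. The sketch below shows only that step. It is a simplified illustration rather than the TESS algorithm itself; the choice of three reference atoms, the 1 Å cell size and all coordinates are assumptions.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)

def frame(p0, p1, p2):
    """Right-handed orthonormal frame with origin p0, defined by three non-collinear atoms."""
    x = unit(sub(p1, p0))
    z = unit(cross(x, sub(p2, p0)))
    y = cross(z, x)
    return p0, (x, y, z)

def hash_key(point, origin, axes, cell=1.0):
    """Grid cell of `point` expressed in the local frame; usable as a hash-table key."""
    rel = sub(point, origin)
    return tuple(math.floor(dot(rel, axis) / cell) for axis in axes)

# Reference atoms and one neighbouring atom in an arbitrary orientation.
origin, axes = frame((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
key_a = hash_key((0.5, 1.5, 2.5), origin, axes)

# The same four atoms after a 90-degree rotation about z and a translation.
origin_b, axes_b = frame((10.0, 20.0, 30.0), (10.0, 21.0, 30.0), (9.0, 20.0, 30.0))
key_b = hash_key((8.5, 20.5, 32.5), origin_b, axes_b)
```

Both keys come out as the same grid cell, so a template stored under that key would be found regardless of how the query structure is oriented. Points close to a cell boundary can fall into neighbouring cells under small coordinate errors, which practical implementations handle by also probing adjacent cells.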
DISTRIBUTED STRUCTURE-SEARCHABLE TOXICITY ...

EPA Pesticide Factsheets

The ability to assess the potential genotoxicity, carcinogenicity, or other toxicity of pharmaceutical or industrial chemicals based on chemical structure information is a highly coveted and shared goal of varied academic, commercial, and government regulatory groups. These diverse interests often employ different approaches and have different criteria and use for toxicity assessments, but they share a need for unrestricted access to existing public toxicity data linked with chemical structure information. Currently, there exists no central repository of toxicity information, commercial or public, that adequately meets the data requirements for flexible analogue searching, SAR model development, or building of chemical relational databases (CRD). The Distributed Structure-Searchable Toxicity (DSSTox) Public Database Network is being proposed as a community-supported, web-based effort to address these shared needs of the SAR and toxicology communities. The DSSTox project has the following major elements: 1) to adopt and encourage the use of a common standard file format (SDF) for public toxicity databases that includes chemical structure, text and property information, and that can easily be imported into available CRD applications; 2) to implement a distributed source approach, managed by a DSSTox Central Website, that will enable decentralized, free public access to structure-toxicity data files, and that will effectively link knowledgeable toxicity data s
Assessment of imputation methods using varying ecological information to fill the gaps in a tree functional trait database

NASA Astrophysics Data System (ADS)

Poyatos, Rafael; Sus, Oliver; Vilà-Cabrera, Albert; Vayreda, Jordi; Badiella, Llorenç; Mencuccini, Maurizio; Martínez-Vilalta, Jordi

2016-04-01

Plant functional traits are increasingly being used in ecosystem ecology thanks to the growing availability of large ecological databases. However, these databases usually contain a large fraction of missing data because measuring plant functional traits systematically is labour-intensive and because most databases are compilations of datasets with different sampling designs. As a result, within a given database, there is an inevitable variability in the number of traits available for each data entry and/or the species coverage in a given geographical area. The presence of missing data may severely bias trait-based analyses, such as the quantification of trait covariation or trait-environment relationships and may hamper efforts towards trait-based modelling of ecosystem biogeochemical cycles. Several data imputation (i.e. gap-filling) methods have been recently tested on compiled functional trait databases, but the performance of imputation methods applied to a functional trait database with a regular spatial sampling has not been thoroughly studied. Here, we assess the effects of data imputation on five tree functional traits (leaf biomass to sapwood area ratio, foliar nitrogen, maximum height, specific leaf area and wood density) in the Ecological and Forest Inventory of Catalonia, an extensive spatial database (covering 31,900 km²). We tested the performance of species mean imputation, single imputation by the k-nearest neighbors algorithm (kNN) and a multiple imputation method, Multivariate Imputation with Chained Equations (MICE) at different levels of missing data (10%, 30%, 50%, and 80%). We also assessed the changes in imputation performance when additional predictors (species identity, climate, forest structure, spatial structure) were added in kNN and MICE imputations.
We evaluated the imputed datasets using a battery of indexes describing departure from the complete dataset in trait distribution, in the mean prediction error, in the correlation matrix and in selected bivariate trait relationships. MICE yielded imputations which better preserved the variability and covariance structure of the data and provided an estimate of between-imputation uncertainty. We found that adding species identity as a predictor in MICE and kNN improved imputation for all traits, but adding climate did not lead to any appreciable improvement. However, forest structure and spatial structure did reduce imputation errors in maximum height and in leaf biomass to sapwood area ratios, respectively. Although species mean imputations showed the lowest error for 3 out of the 5 studied traits, dataset-averaged errors were lowest for MICE imputations with all additional predictors, when missing data levels were 50% or lower. Species mean imputations always resulted in larger errors in the correlation matrix and appreciably altered the studied bivariate trait relationships. In conclusion, MICE imputations using species identity, climate, forest structure and spatial structure as predictors emerged as the most suitable method of the ones tested here, but it was also evident that imputation performance deteriorates at high levels of missing data (80%).
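To make the two simpler gap-filling strategies named in this abstract concrete, the following is a minimal sketch in Python, not the authors' code. The toy records, the function names and the use of a single predictor (wood density) for the kNN distance are all assumptions made for illustration; MICE is not shown because it needs an iterative regression loop that does not fit a short example.

```python
from statistics import mean

# Toy records: (species, wood_density, max_height); None marks a missing trait.
# All values are invented for illustration only.
records = [
    ("Pinus", 0.50, 20.0),
    ("Pinus", 0.52, None),
    ("Pinus", 0.48, 22.0),
    ("Quercus", 0.80, 12.0),
    ("Quercus", 0.82, 13.0),
]


def species_mean_impute(rows):
    """Fill a missing height with the mean observed height of the same species.

    Assumes every species has at least one observed height.
    """
    observed = {}
    for species, _, height in rows:
        if height is not None:
            observed.setdefault(species, []).append(height)
    return [
        (sp, wd, h if h is not None else mean(observed[sp]))
        for sp, wd, h in rows
    ]


def knn_impute(rows, k=2):
    """Fill a missing height with the mean height of the k complete rows
    whose wood density is closest (one predictor, absolute distance)."""
    complete = [r for r in rows if r[2] is not None]
    filled = []
    for sp, wd, h in rows:
        if h is None:
            nearest = sorted(complete, key=lambda r: abs(r[1] - wd))[:k]
            h = mean(r[2] for r in nearest)
        filled.append((sp, wd, h))
    return filled


by_species = species_mean_impute(records)
by_knn = knn_impute(records, k=2)
```

Adding more predictors, as the study does with species identity, climate and forest structure, would amount to replacing the one-dimensional distance in `knn_impute` with a distance over several standardized columns.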
Biological Databases for Behavioral Neurobiology

PubMed Central

Baker, Erich J.

2014-01-01

Databases are, at their core, abstractions of data and their intentionally derived relationships. They serve as a central organizing metaphor and repository, supporting or augmenting nearly all bioinformatics. Behavioral domains provide a unique stage for contemporary databases, as research in this area spans diverse data types, locations, and data relationships. This chapter provides foundational information on the diversity and prevalence of databases, how data structures support the various needs of behavioral neuroscience analysis and interpretation. The focus is on the classes of databases, data curation, and advanced applications in bioinformatics using examples largely drawn from research efforts in behavioral neuroscience. PMID:23195119
SITEX 2.0: Projections of protein functional sites on eukaryotic genes. Extension with orthologous genes.

PubMed

Medvedeva, Irina V; Demenkov, Pavel S; Ivanisenko, Vladimir A

2017-04-01

Functional sites define the diversity of protein functions and are the central object of research into the structural and functional organization of proteins. The mechanisms underlying the emergence of protein functional sites and their variability during evolution are characterized by duplication, shuffling, insertion and deletion of exons in genes. The study of the correlation between site structure and exon structure serves as the basis for an in-depth understanding of site organization. In this regard, the development of software resources that allow the mutual projection of the exon structure of genes and the primary and tertiary structures of the encoded proteins remains a pressing problem. Previously, we developed the SitEx system, which provides information about protein and gene sequences with mapped exon borders and the amino acid positions of protein functional sites. The database included information on proteins with known 3D structure. However, data on orthologs were not available. Therefore, in SitEx 2.0 we added the projection of site positions onto the exon structures of orthologs. We implemented a database search based on site conservation variability and on site discontinuity across the exon structure. Including the information on orthologs expanded the possibilities of using SitEx to solve problems in the analysis of the structural and functional organization of proteins. Database URL: http://www-bionet.sscc.ru/sitex/ .
WWW database of optical constants for astronomy

NASA Astrophysics Data System (ADS)

Henning, Th.; Il'In, V. B.; Krivova, N. A.; Michel, B.; Voshchinnikov, N. V.

1999-04-01

The database we announce contains references to papers, data files and links to Internet resources related to measurements and calculations of the optical constants of materials of astronomical interest: different silicates, ices, oxides, sulfides, carbides, carbonaceous species from amorphous carbon to graphite and diamonds, etc. We describe the general structure and content of the database, which is now freely accessible via the Internet: http://www.astro.spbu.ru/JPDOC/entry.html or http://www.astro.uni-jena.de/Users/database/entry.html
Evolution of the use of relational and NoSQL databases in the ATLAS experiment

NASA Astrophysics Data System (ADS)

Barberis, D.

2016-09-01

The ATLAS experiment used for many years a large database infrastructure based on Oracle to store several different types of non-event data: time-dependent detector configuration and conditions data, calibrations and alignments, configurations of Grid sites, catalogues for data management tools, job records for distributed workload management tools, run and event metadata. The rapid development of "NoSQL" databases (structured storage services) in the last five years allowed an extended and complementary usage of traditional relational databases and new structured storage tools in order to improve the performance of existing applications and to extend their functionalities using the possibilities offered by the modern storage systems. The trend is towards using the best tool for each kind of data, separating for example the intrinsically relational metadata from payload storage, and records that are frequently updated and benefit from transactions from archived information. Access to all components has to be orchestrated by specialised services that run on front-end machines and shield the user from the complexity of data storage infrastructure. This paper describes this technology evolution in the ATLAS database infrastructure and presents a few examples of large database applications that benefit from it.
Uses and limitations of registry and academic databases.

PubMed

Williams, William G

2010-01-01

A database is simply a structured collection of information. A clinical database may be a Registry (a limited amount of data for every patient undergoing heart surgery) or Academic (an organized and extensive dataset of an inception cohort of carefully selected subset of patients). A registry and an academic database have different purposes and cost. The data to be collected for a database is defined by its purpose and the output reports required for achieving that purpose. A Registry's purpose is to ensure quality care, an Academic Database, to discover new knowledge through research. A database is only as good as the data it contains. Database personnel must be exceptionally committed and supported by clinical faculty. A system to routinely validate and verify data integrity is essential to ensure database utility. Frequent use of the database improves its accuracy. For congenital heart surgeons, routine use of a Registry Database is an essential component of clinical practice. Copyright (c) 2010 Elsevier Inc. All rights reserved.
AbDb: antibody structure database—a database of PDB-derived antibody structures

PubMed Central

Ferdous, Saba

2018-01-01

In order to analyse structures of proteins of a particular class, these need to be extracted from Protein Data Bank (PDB) files. In the case of antibodies, there are a number of special considerations: (i) identifying antibodies in the PDB is not trivial, (ii) they may be crystallized with or without antigen, (iii) for analysis purposes, one is normally only interested in the Fv region of the antibody, (iv) structural analysis of epitopes, in particular, requires individual antibody–antigen complexes from a PDB file which may contain multiple copies of the same, or different, antibodies and (v) standard numbering schemes should be applied. Consequently, there is a need for a specialist resource containing pre-numbered non-redundant antibody Fv structures with their cognate antigens. We have created an automatically updated resource, AbDb, which collects the Fv regions from antibody structures using information from our SACS database which summarizes antibody structures from the PDB. PDB files containing multiple structures are split and numbered and each antibody structure is associated with its antigen where available. Antibody structures with only light or heavy chains have also been processed and sequences of antibodies are compared to identify multiple structures of the same antibody. The data may be queried on the basis of PDB code, or the name or species of the antibody or antigen, and the complete datasets may be downloaded. Database URL: www.bioinf.org.uk/abs/abdb/ PMID:29718130
ExDom: an integrated database for comparative analysis of the exon–intron structures of protein domains in eukaryotes

PubMed Central

Bhasi, Ashwini; Philip, Philge; Manikandan, Vinu; Senapathy, Periannan

2009-01-01

We have developed ExDom, a unique database for the comparative analysis of the exon–intron structures of 96 680 protein domains from seven eukaryotic organisms (Homo sapiens, Mus musculus, Bos taurus, Rattus norvegicus, Danio rerio, Gallus gallus and Arabidopsis thaliana). ExDom provides integrated access to exon-domain data through a sophisticated web interface which has the following analytical capabilities: (i) intergenomic and intragenomic comparative analysis of exon–intron structure of domains; (ii) color-coded graphical display of the domain architecture of proteins correlated with their corresponding exon-intron structures; (iii) graphical analysis of multiple sequence alignments of amino acid and coding nucleotide sequences of homologous protein domains from seven organisms; (iv) comparative graphical display of exon distributions within the tertiary structures of protein domains; and (v) visualization of exon–intron structures of alternative transcripts of a gene correlated to variations in the domain architecture of corresponding protein isoforms. These novel analytical features are highly suited for detailed investigations on the exon–intron structure of domains and make ExDom a powerful tool for exploring several key questions concerning the function, origin and evolution of genes and proteins. ExDom database is freely accessible at: http://66.170.16.154/ExDom/. PMID:18984624

SeqHound: biological sequence and structure database as a platform for bioinformatics research

PubMed Central

2002-01-01

Background: SeqHound has been developed as an integrated biological sequence, taxonomy, annotation and 3-D structure database system. It provides a high-performance server platform for bioinformatics research in a locally-hosted environment. Results: SeqHound is based on the National Center for Biotechnology Information data model and programming tools. It offers daily updated contents of all Entrez sequence databases in addition to 3-D structural data and information about sequence redundancies, sequence neighbours, taxonomy, complete genomes, functional annotation including Gene Ontology terms and literature links to PubMed. SeqHound is accessible via a web server through a Perl, C or C++ remote API or an optimized local API. It provides functionality necessary to retrieve specialized subsets of sequences, structures and structural domains. Sequences may be retrieved in FASTA, GenBank, ASN.1 and XML formats. Structures are available in ASN.1, XML and PDB formats. Emphasis has been placed on complete genomes, taxonomy, domain and functional annotation as well as 3-D structural functionality in the API, while fielded text indexing functionality remains under development. SeqHound also offers a streamlined WWW interface for simple web-user queries. Conclusions: The system has proven useful in several published bioinformatics projects such as the BIND database and offers a cost-effective infrastructure for research. SeqHound will continue to develop and be provided as a service of the Blueprint Initiative at the Samuel Lunenfeld Research Institute. The source code and examples are available under the terms of the GNU public license at the Sourceforge site http://sourceforge.net/projects/slritools/ in the SLRI Toolkit. PMID:12401134
IDAAPM: integrated database of ADMET and adverse effects of predictive modeling based on FDA approved drug data.

PubMed

Legehar, Ashenafi; Xhaard, Henri; Ghemtio, Leo

2016-01-01

The disposition of a pharmaceutical compound within an organism, i.e. its Absorption, Distribution, Metabolism, Excretion, Toxicity (ADMET) properties and adverse effects, critically affects late stage failure of drug candidates and has led to the withdrawal of approved drugs. Computational methods are effective approaches to reduce the number of safety issues by analyzing possible links between chemical structures and ADMET or adverse effects, but this is limited by the size, quality, and heterogeneity of the data available from individual sources. Thus, large, clean and integrated databases of approved drug data, associated with fast and efficient predictive tools, are desirable early in the drug discovery process. We have built a relational database (IDAAPM) to integrate available approved drug data such as drug approval information, ADMET and adverse effects, chemical structures and molecular descriptors, targets, bioactivity and related references. The database has been coupled with a searchable web interface and a modern data analytics platform (KNIME) to allow data access, data transformation, initial analysis and further predictive modeling. Data were extracted from FDA resources and supplemented from other publicly available databases. Currently, the database contains information on about 19,226 FDA approval applications for 31,815 products (small molecules and biologics) with their approval history, 2505 active ingredients, together with as many ADMET properties, 1629 molecular structures, 2.5 million adverse effects and 36,963 experimental drug-target bioactivity data. IDAAPM is a unique resource that, in a single relational database, provides detailed information on FDA approved drugs including their ADMET properties and adverse effects, the corresponding targets with bioactivity data, coupled with a data analytics platform. It can be used to perform basic to complex drug-target ADMET or adverse effects analysis and predictive modeling.
IDAAPM is freely accessible at http://idaapm.helsinki.fi and can be exploited through a KNIME workflow connected to the database. Graphical abstract: FDA approved drug data integration for predictive modeling.
Optimization of the efficiency of search operations in the relational database of radio electronic systems

NASA Astrophysics Data System (ADS)

Wajszczyk, Bronisław; Biernacki, Konrad

2018-04-01

The increasing interoperability of radio electronic systems used in the Armed Forces requires the processing of very large amounts of data. Requirements for integrating information from many systems and sensors, including radar recognition and electronic and optical recognition, make it necessary to look for more efficient methods of supporting information retrieval in ever-larger database resources. This paper presents the results of research on methods of improving the efficiency of databases using various types of indexes. Data structure indexing is a technique used in relational database management systems (RDBMS). However, analyses of index performance, descriptions of potential applications, and in particular quantification of the performance gain for individual index types are limited to a few studies in this field. This paper analyses methods affecting the efficiency of a relational database management system. As a result of the research, a significant increase in the efficiency of data operations was achieved through the strategy of indexing data structures. The research mainly consists of testing the operation of various indexes against different queries and data structures. The conclusions from the experiments make it possible to assess the effectiveness of the solutions proposed and applied in the research. The results indicate a real increase in the performance of data operations when data structures are indexed. In addition, the level of this increase is presented, broken down by index type.
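The effect of indexing on a search operation can be illustrated with a small, hedged sketch using Python's built-in sqlite3 module. The table name `emissions`, its columns and the generated rows are hypothetical and are not taken from the paper, which does not say which RDBMS or schema was used; the sketch only shows how a query plan changes from a full table scan to an index search once an index exists.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's query plan for a statement as one readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable description is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emissions (id INTEGER PRIMARY KEY, freq_mhz REAL, source TEXT)"
)
# Hypothetical sensor records, generated only to populate the table.
conn.executemany(
    "INSERT INTO emissions (freq_mhz, source) VALUES (?, ?)",
    [(1000.0 + i, f"radar-{i % 10}") for i in range(1000)],
)

sql = "SELECT id FROM emissions WHERE freq_mhz = ?"

# Without an index, the engine has to scan every row.
plan_before = query_plan(conn, sql, (1500.0,))

# With an index on the filtered column, the engine can search the index.
conn.execute("CREATE INDEX idx_freq ON emissions (freq_mhz)")
plan_after = query_plan(conn, sql, (1500.0,))

print(plan_before)
print(plan_after)
```

Measuring the actual speed-up, as the paper does for different index types, would require timing the same query on realistically sized tables; the plan text only confirms that the index is being used.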
A Novel Method for Sampling Alpha-Helical Protein Backbones

DOE R&D Accomplishments Database

Fain, Boris; Levitt, Michael

2001-01-01

We present a novel technique of sampling the configurations of helical proteins. Assuming knowledge of native secondary structure, we employ assembly rules gathered from a database of existing structures to enumerate the geometrically possible 3-D arrangements of the constituent helices. We produce a library of possible folds for 25 helical protein cores. In each case the method finds significant numbers of conformations close to the native structure. In addition we assign coordinates to all atoms for 4 of the 25 proteins. In the context of database driven exhaustive enumeration our method performs extremely well, yielding significant percentages of structures (0.02%–82%) within 6 Å of the native structure. The method's speed and efficiency make it a valuable contribution towards the goal of predicting protein structure.
ERIC: Sphinx or Golden Griffin?

ERIC Educational Resources Information Center

Lopez, Manuel D.

1989-01-01

Evaluates the Educational Resources Information Center (ERIC) database. Summarizes ERIC's history and organization, and discusses criticisms concerning access, currency, and database content. Reviews role of component clearinghouses, indexing practices, thesaurus structure, international coverage, and comparative studies. Finds ERIC a valuable…
An ab initio electronic transport database for inorganic materials

DOE PAGES

Ricci, Francesco; Chen, Wei; Aydemir, Umut; ...

2017-07-04

Electronic transport in materials is governed by a series of tensorial properties such as conductivity, Seebeck coefficient, and effective mass. These quantities are paramount to the understanding of materials in many fields from thermoelectrics to electronics and photovoltaics. Transport properties can be calculated from a material’s band structure using the Boltzmann transport theory framework. We present here the largest computational database of electronic transport properties based on a large set of 48,000 materials originating from the Materials Project database. Our results were obtained through the interpolation approach developed in the BoltzTraP software, assuming a constant relaxation time. We present the workflow to generate the data, the data validation procedure, and the database structure. In conclusion, our aim is to target the large community of scientists developing materials selection strategies and performing studies involving transport properties.
An ab initio electronic transport database for inorganic materials

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ricci, Francesco; Chen, Wei; Aydemir, Umut

Electronic transport in materials is governed by a series of tensorial properties such as conductivity, Seebeck coefficient, and effective mass. These quantities are paramount to the understanding of materials in many fields from thermoelectrics to electronics and photovoltaics. Transport properties can be calculated from a material’s band structure using the Boltzmann transport theory framework. We present here the largest computational database of electronic transport properties based on a large set of 48,000 materials originating from the Materials Project database. Our results were obtained through the interpolation approach developed in the BoltzTraP software, assuming a constant relaxation time. We present the workflow to generate the data, the data validation procedure, and the database structure. In conclusion, our aim is to target the large community of scientists developing materials selection strategies and performing studies involving transport properties.
Local concurrent error detection and correction in data structures using virtual backpointers

NASA Technical Reports Server (NTRS)

Li, C. C.; Chen, P. P.; Fuchs, W. K.

1987-01-01

A new technique, based on virtual backpointers, for local concurrent error detection and correction in linked data structures is presented. Two new data structures, the Virtual Double Linked List, and the B-tree with Virtual Backpointers, are described. For these structures, double errors can be detected in O(1) time and errors detected during forward moves can be corrected in O(1) time. The application of a concurrent auditor process to data structure error detection and correction is analyzed, and an implementation is described, to determine the effect on mean time to failure of a multi-user shared database system. The implementation utilizes a Sequent shared memory multiprocessor system operating on a shared database of Virtual Double Linked Lists.
Catalytic site identification—a web server to identify catalytic site structural matches throughout PDB

PubMed Central

Kirshner, Daniel A.; Nilmeier, Jerome P.; Lightstone, Felice C.

2013-01-01

The catalytic site identification web server provides the innovative capability to find structural matches to a user-specified catalytic site among all Protein Data Bank proteins rapidly (in less than a minute). The server also can examine a user-specified protein structure or model to identify structural matches to a library of catalytic sites. Finally, the server provides a database of pre-calculated matches between all Protein Data Bank proteins and the library of catalytic sites. The database has been used to derive a set of hypothesized novel enzymatic function annotations. In all cases, matches and putative binding sites (protein structure and surfaces) can be visualized interactively online. The website can be accessed at http://catsid.llnl.gov. PMID:23680785
Catalytic site identification--a web server to identify catalytic site structural matches throughout PDB.

PubMed

Kirshner, Daniel A; Nilmeier, Jerome P; Lightstone, Felice C

2013-07-01

The catalytic site identification web server provides the innovative capability to find structural matches to a user-specified catalytic site among all Protein Data Bank proteins rapidly (in less than a minute). The server also can examine a user-specified protein structure or model to identify structural matches to a library of catalytic sites. Finally, the server provides a database of pre-calculated matches between all Protein Data Bank proteins and the library of catalytic sites. The database has been used to derive a set of hypothesized novel enzymatic function annotations. In all cases, matches and putative binding sites (protein structure and surfaces) can be visualized interactively online. The website can be accessed at http://catsid.llnl.gov.
Automated extraction of knowledge for model-based diagnostics

NASA Technical Reports Server (NTRS)

Gonzalez, Avelino J.; Myler, Harley R.; Towhidnejad, Massood; Mckenzie, Frederic D.; Kladke, Robin R.

1990-01-01

The concept of accessing computer-aided design (CAD) databases and extracting a process model automatically is investigated as a possible source for the generation of knowledge bases for model-based reasoning systems. The resulting system, referred to as automated knowledge generation (AKG), uses an object-oriented programming structure and constraint techniques as well as an internal database of component descriptions to generate a frame-based structure that describes the model. The procedure has been designed to be general enough to be easily coupled to CAD systems that feature a database capable of providing label and connectivity data from the drawn system. The AKG system is capable of defining knowledge bases in formats required by various model-based reasoning tools.
Structure-Based Design of Molecules to Reactivate Tumor-Derived p53 Mutations

DTIC Science & Technology

2007-06-01

cluster in conserved regions or “hot spots” (Hainaut and Hollstein, 2000). Missense mutations leading to amino acid changes are the most common p53...domain stabilization compounds. Analysis of the residue-specific temperature factors of the high resolution core domain structure, coupled with a...second scoring results, 13 compounds (10 from the SPECS database and 3 from the TimTec database) were selected for further analysis using solution
The BIRN Project: Imaging the Nervous System

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ellisman, Mark

The grand goal in neuroscience research is to understand how the interplay of structural, chemical and electrical signals in nervous tissue gives rise to behavior. Experimental advances of the past decades have given the individual neuroscientist an increasingly powerful arsenal for obtaining data, from the level of molecules to nervous systems. Scientists have begun the arduous and challenging process of adapting and assembling neuroscience data at all scales of resolution and across disciplines into computerized databases and other easily accessed sources. These databases will complement the vast structural and sequence databases created to catalogue, organize and analyze gene sequences and protein products. The general premise of the neuroscience goal is simple; namely that with "complete" knowledge of the genome and protein structures accruing rapidly we next need to assemble an infrastructure that will facilitate acquisition of an understanding for how functional complexes operate in their cell and tissue contexts.
E-MSD: improving data deposition and structure quality.

PubMed

Tagari, M; Tate, J; Swaminathan, G J; Newman, R; Naim, A; Vranken, W; Kapopoulou, A; Hussain, A; Fillon, J; Henrick, K; Velankar, S

2006-01-01

The Macromolecular Structure Database (MSD) (http://www.ebi.ac.uk/msd/) [H. Boutselakis, D. Dimitropoulos, J. Fillon, A. Golovin, K. Henrick, A. Hussain, J. Ionides, M. John, P. A. Keller, E. Krissinel et al. (2003) E-MSD: the European Bioinformatics Institute Macromolecular Structure Database. Nucleic Acids Res., 31, 458-462.] group is one of the three partners in the worldwide Protein Data Bank (wwPDB), the consortium entrusted with the collation, maintenance and distribution of the global repository of macromolecular structure data [H. Berman, K. Henrick and H. Nakamura (2003) Announcing the worldwide Protein Data Bank. Nature Struct. Biol., 10, 980.]. Since its inception, the MSD group has worked with partners around the world to improve the quality of PDB data, through a clean-up programme that addresses inconsistencies and inaccuracies in the legacy archive. The improvements in data quality in the legacy archive have been achieved largely through the creation of a unified data archive, in the form of a relational database that stores all of the data in the wwPDB. The three partners are working towards improving the tools and methods for the deposition of new data by the community at large. The implementation of the MSD database, together with the parallel development of improved tools and methodologies for data harvesting, validation and archival, has led to significant improvements in the quality of data that enters the archive. Through this and related projects in the NMR and EM realms the MSD continues to improve the quality of publicly available structural data.
The ATLAS conditions database architecture for the Muon spectrometer

NASA Astrophysics Data System (ADS)

Verducci, Monica; ATLAS Muon Collaboration

2010-04-01

The Muon System, facing the challenging requirements of conditions data storage, has started to make extensive use of the conditions database project 'COOL' as the basis for all its conditions data storage, both at CERN and throughout the worldwide collaboration, as decided by the ATLAS Collaboration. The management of the Muon COOL conditions database will be one of the most challenging applications for the Muon System, in terms of data volumes and rates as well as the variety of data stored. The Muon conditions database is responsible for storing almost all of the 'non-event' data and detector quality flags needed for debugging detector operations and for performing reconstruction and analysis. The COOL database allows database applications to be written independently of the underlying database technology and ensures long-term compatibility with the entire ATLAS software. COOL implements an interval-of-validity database, i.e. objects stored or referenced in COOL have an associated start and end time between which they are valid. The data are stored in folders, which are themselves arranged in a hierarchical structure of folder sets. The structure is simple and mainly optimized to store and retrieve object(s) associated with a particular time. In this work, an overview of the entire Muon conditions database architecture is given, including the different sources of the data and the storage model used. In addition, the software interfaces used to access the conditions data are described, with more emphasis given to the offline reconstruction framework ATHENA and the services developed to provide the conditions data to the reconstruction.
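The interval-of-validity idea described in this abstract can be illustrated with a minimal sketch. This is not the COOL API: the class name `IovFolder`, the half-open interval convention `[since, until)`, the integer time stamps and the folder path are all assumptions made for illustration only.

```python
from bisect import bisect_right
from typing import Any, Dict, List, Optional


class IovFolder:
    """Hypothetical folder: payloads valid on half-open intervals [since, until)."""

    def __init__(self) -> None:
        # Parallel lists kept sorted by interval start time.
        self._since: List[int] = []
        self._until: List[int] = []
        self._payload: List[Any] = []

    def store(self, since: int, until: int, payload: Any) -> None:
        """Store a payload; overlapping intervals are rejected."""
        if since >= until:
            raise ValueError("interval must be non-empty")
        i = bisect_right(self._since, since)
        if i > 0 and self._until[i - 1] > since:
            raise ValueError("overlaps the preceding interval")
        if i < len(self._since) and self._since[i] < until:
            raise ValueError("overlaps the following interval")
        self._since.insert(i, since)
        self._until.insert(i, until)
        self._payload.insert(i, payload)

    def find(self, t: int) -> Optional[Any]:
        """Return the payload valid at time t, or None if no interval covers t."""
        i = bisect_right(self._since, t) - 1
        if i >= 0 and t < self._until[i]:
            return self._payload[i]
        return None


# A folder set is modelled here as a plain mapping from a path to a folder.
# The path below is invented for the example.
folder_set: Dict[str, IovFolder] = {"/EXAMPLE/HV": IovFolder()}
folder_set["/EXAMPLE/HV"].store(0, 100, {"voltage": 3080})
folder_set["/EXAMPLE/HV"].store(100, 200, {"voltage": 3100})
print(folder_set["/EXAMPLE/HV"].find(150))
```

Keeping the intervals sorted by start time makes a lookup for a given time a binary search, which matches the abstract's remark that the structure is optimized to retrieve the object associated with a particular time.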
BIOSPIDA: A Relational Database Translator for NCBI.

PubMed

Hagen, Matthew S; Lee, Eva K

2010-11-13

As the volume and availability of biological databases continue widespread growth, it has become increasingly difficult for research scientists to identify all relevant information for biological entities of interest. Details of nucleotide sequences, gene expression, molecular interactions, and three-dimensional structures are maintained across many different databases. To retrieve all necessary information requires an integrated system that can query multiple databases with minimized overhead. This paper introduces a universal parser and relational schema translator that can be utilized for all NCBI databases in Abstract Syntax Notation (ASN.1). The data models for OMIM, Entrez-Gene, Pubmed, MMDB and GenBank have been successfully converted into relational databases and all are easily linkable helping to answer complex biological questions. These tools facilitate research scientists to locally integrate databases from NCBI without significant workload or development time.
A PATO-compliant zebrafish screening database (MODB): management of morpholino knockdown screen information.

PubMed

Knowlton, Michelle N; Li, Tongbin; Ren, Yongliang; Bill, Brent R; Ellis, Lynda Bm; Ekker, Stephen C

2008-01-07

The zebrafish is a powerful model vertebrate amenable to high throughput in vivo genetic analyses. Examples include reverse genetic screens using morpholino knockdown, expression-based screening using enhancer trapping and forward genetic screening using transposon insertional mutagenesis. We have created a database to facilitate web-based distribution of data from such genetic studies. The MOrpholino DataBase is a MySQL relational database with an online, PHP interface. Multiple quality control levels allow differential access to data in raw and finished formats. MODBv1 includes sequence information relating to almost 800 morpholinos and their targets and phenotypic data regarding the dose effect of each morpholino (mortality, toxicity and defects). To improve the searchability of this database, we have incorporated a fixed-vocabulary defect ontology that allows for the organization of morpholino effects based on anatomical structure affected and defect produced. This also allows comparison between species utilizing Phenotypic Attribute Trait Ontology (PATO) designated terminology. MODB is also cross-linked with ZFIN, allowing full searches between the two databases. MODB offers users the ability to retrieve morpholino data by sequence of morpholino or target, name of target, anatomical structure affected and defect produced. MODB data can be used for functional genomic analysis of morpholino design to maximize efficacy and minimize toxicity. MODB also serves as a template for future sequence-based functional genetic screen databases, and it is currently being used as a model for the creation of a mutagenic insertional transposon database.
DPTEdb, an integrative database of transposable elements in dioecious plants.

PubMed

Li, Shu-Fen; Zhang, Guo-Jun; Zhang, Xue-Jin; Yuan, Jin-Hong; Deng, Chuan-Liang; Gu, Lian-Feng; Gao, Wu-Jun

2016-01-01

Dioecious plants usually harbor 'young' sex chromosomes, providing an opportunity to study the early stages of sex chromosome evolution. Transposable elements (TEs) are mobile DNA elements frequently found in plants and are suggested to play important roles in plant sex chromosome evolution. The genomes of several dioecious plants have been sequenced, offering an opportunity to annotate and mine the TE data. However, comprehensive and unified annotation of TEs in these dioecious plants is still lacking. In this study, we constructed a dioecious plant transposable element database (DPTEdb). DPTEdb is a specific, comprehensive and unified relational database and web interface. We used a combination of de novo, structure-based and homology-based approaches to identify TEs from the genome assemblies of previously published data, as well as our own. The database currently integrates eight dioecious plant species and a total of 31 340 TEs along with classification information. DPTEdb provides user-friendly web interfaces to browse, search and download the TE sequences in the database. Users can also use tools, including BLAST, GetORF, HMMER, Cut sequence and JBrowse, to analyze TE data. Given the role of TEs in plant sex chromosome evolution, the database will contribute to the investigation of TEs in structural, functional and evolutionary dynamics of the genome of dioecious plants. In addition, the database will supplement the research of sex diversification and sex chromosome evolution of dioecious plants. Database URL: http://genedenovoweb.ticp.net:81/DPTEdb/index.php. © The Author(s) 2016. Published by Oxford University Press.
Construction typification as the tool for optimizing the functioning of a robotized manufacturing system

NASA Astrophysics Data System (ADS)

Gwiazda, A.; Banas, W.; Sekala, A.; Foit, K.; Hryniewicz, P.; Kost, G.

2015-11-01

The process of workcell design is limited by different constructional requirements. They are related to the technological parameters of the manufactured element, to the specifications of purchased elements of a workcell and to the technical characteristics of a workcell scene. This shows the complexity of the design-constructional process itself. The result of such an approach is an individually designed workcell suited to a specific location and a specific production cycle. When these parameters change, the whole configuration of the workcell must be rebuilt. Taking this into consideration, it is important to elaborate a base of typical elements of a robot kinematic chain that could be used as the tool for building. Virtual modelling of kinematic chains of industrial robots requires several preparatory phases. First, it is important to create a database of elements, which will be models of industrial robot arms. These models could be described as functional primitives that represent elements between components of the kinematic pairs and structural members of industrial robots. A database with the following elements is created: the base of kinematic pairs, the base of robot structural elements, and the base of robot work scenes. The first of these databases includes kinematic pairs, which are the key component of the manipulator actuator modules. Accordingly, as mentioned previously, it includes the first stage rotary pair of fifth stage. This type of kinematic pair was chosen because it occurs most frequently in the structures of industrial robots. The second base consists of structural robot elements; it therefore allows schematic structures of kinematic chains to be converted into the structural elements of the arms of industrial robots. It contains, inter alia, structural elements such as the base and stiff members (simple or angular units). They allow recorded schematic elements to be converted into three-dimensional elements. The last database is a database of scenes. It includes both simple and complex elements: simple models of technological equipment, conveyor models, models of obstacles and the like. Using these elements, various production spaces (robotized workcells) can be formed, in which it is possible to virtually track the operation of an industrial robot arm modelled in the system.

Trends in maar crater size and shape using the global Maar Volcano Location and Shape (MaarVLS) database

NASA Astrophysics Data System (ADS)

Graettinger, A. H.

2018-05-01

A maar crater is the top of a much larger subsurface diatreme structure produced by phreatomagmatic explosions and the size and shape of the crater reflects the growth history of that structure during an eruption. Recent experimental and geophysical research has shown that crater complexity can reflect subsurface complexity. Morphometry provides a means of characterizing a global population of maar craters in order to establish the typical size and shape of features. A global database of Quaternary maar crater planform morphometry indicates that maar craters are typically not circular and frequently have compound shapes resembling overlapping circles. Maar craters occur in volcanic fields that contain both small volume and complex volcanoes. The global perspective provided by the database shows that maars are common in many volcanic and tectonic settings producing a similar diversity of size and shape within and between volcanic fields. A few exceptional populations of maars were revealed by the database, highlighting directions of future research to improve our understanding on the geometry and spacing of subsurface explosions that produce maars. These outlying populations, such as anomalously large craters (>3000 m), chains of maars, and volcanic fields composed of mostly maar craters each represent a small portion of the database, but provide opportunities to reinvestigate fundamental questions on maar formation. Maar crater morphometry can be integrated with structural, hydrological studies to investigate lateral migration of phreatomagmatic explosion location in the subsurface. A comprehensive database of intact maar morphometry is also beneficial for the hunt for maar-diatremes on other planets.
[National Database of Genotypes--ethical and legal issues].

PubMed

Franková, Vera; Tesínová, Jolana; Brdicka, Radim

2011-01-01

The aim of the project National Database of Genotypes is to outline the structure and rules for the operation of a database collecting information about the genotypes of individual persons. The database should be used entirely for health care. Its purpose is to enable physicians to gain quick and easy access to information about persons requiring specialized care due to their genetic constitution. In the future, further introduction of new genetic tests into clinical practice can be expected; the database of genotypes thus facilitates substantial financial savings by excluding duplication of expensive genetic testing. Ethical questions connected with the creation and functioning of such a database concern mainly privacy protection, confidentiality of sensitive personal data, protection of the database from misuse, consent to participation and public interests. Because correct interpretation by a qualified professional (= clinical geneticist) is necessary, a particular categorization of genetic data within the database is discussed. The function of the proposed database has to be governed in accordance with Czech legislation, together with solving the ethical problems.
OrChem - An open source chemistry search engine for Oracle(R).

PubMed

Rijnbeek, Mark; Steinbeck, Christoph

2009-10-22

Registration, indexing and searching of chemical structures in relational databases is one of the core areas of cheminformatics. However, little detail has been published on the inner workings of search engines and their development has been mostly closed-source. We decided to develop an open source chemistry extension for Oracle, the de facto database platform in the commercial world. Here we present OrChem, an extension for the Oracle 11G database that adds registration and indexing of chemical structures to support fast substructure and similarity searching. The cheminformatics functionality is provided by the Chemistry Development Kit. OrChem provides similarity searching with response times in the order of seconds for databases with millions of compounds, depending on a given similarity cut-off. For substructure searching, it can make use of multiple processor cores on today's powerful database servers to provide fast response times in equally large data sets. OrChem is free software and can be redistributed and/or modified under the terms of the GNU Lesser General Public License as published by the Free Software Foundation. All software is available via http://orchem.sourceforge.net.
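As a rough illustration of similarity searching with a cut-off, the sketch below ranks a small library of bit-string fingerprints against a query. The abstract does not name the similarity metric; the Tanimoto coefficient is assumed here because it is widely used for fingerprint comparison. The fingerprints are plain Python integers invented for the example, not output of the Chemistry Development Kit, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient of two fingerprints stored as integer bit sets."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return bin(a & b).count("1") / union


def similarity_search(
    query: int, library: Dict[str, int], cutoff: float
) -> List[Tuple[str, float]]:
    """Return (name, score) pairs with score >= cutoff, best match first."""
    scored = ((name, tanimoto(query, fp)) for name, fp in library.items())
    hits = [(name, score) for name, score in scored if score >= cutoff]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy library with made-up 8-bit fingerprints.
library = {
    "compound_a": 0b10110010,
    "compound_b": 0b10110000,
    "compound_c": 0b01001101,
}
for name, score in similarity_search(0b10110010, library, cutoff=0.5):
    print(name, round(score, 2))
```

A higher cut-off shrinks the set of candidates that must be scored and returned, which is consistent with the abstract's note that response times depend on the chosen similarity cut-off.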
sc-PDB: a 3D-database of ligandable binding sites—10 years on

PubMed Central

Desaphy, Jérémy; Bret, Guillaume; Rognan, Didier; Kellenberger, Esther

2015-01-01

The sc-PDB database (available at http://bioinfo-pharma.u-strasbg.fr/scPDB/) is a comprehensive and up-to-date selection of ligandable binding sites of the Protein Data Bank. Sites are defined from complexes between a protein and a pharmacological ligand. The database provides the all-atom description of the protein, its ligand, their binding site and their binding mode. Currently, the sc-PDB archive registers 9283 binding sites from 3678 unique proteins and 5608 unique ligands. The sc-PDB database was publicly launched in 2004 with the aim of providing structure files suitable for computational approaches to drug design, such as docking. During the last 10 years we have improved and standardized the processes for (i) identifying binding sites, (ii) correcting structures, (iii) annotating protein function and ligand properties and (iv) characterizing their binding mode. This paper presents the latest enhancements in the database, specifically pertaining to the representation of molecular interaction and to the similarity between ligand/protein binding patterns. The new website puts emphasis in pictorial analysis of data. PMID:25300483
RStrucFam: a web server to associate structure and cognate RNA for RNA-binding proteins from sequence information.

PubMed

Ghosh, Pritha; Mathew, Oommen K; Sowdhamini, Ramanathan

2016-10-07

RNA-binding proteins (RBPs) interact with their cognate RNA(s) to form large biomolecular assemblies. They are versatile in their functionality and are involved in a myriad of processes inside the cell. RBPs with similar structural features and common biological functions are grouped together into families and superfamilies. It will be useful to obtain an early understanding and association of RNA-binding property of sequences of gene products. Here, we report a web server, RStrucFam, to predict the structure, type of cognate RNA(s) and function(s) of proteins, where possible, from mere sequence information. The web server employs Hidden Markov Model scan (hmmscan) to enable association to a back-end database of structural and sequence families. The database (HMMRBP) comprises 437 HMMs of RBP families of known structure that have been generated using structure-based sequence alignments and 746 sequence-centric RBP family HMMs. The input protein sequence is associated with structural or sequence domain families, if structure or sequence signatures exist. In case of association of the protein with a family of known structures, output features such as a multiple structure-based sequence alignment (MSSA) of the query with all other members of that family are provided. Further, cognate RNA partner(s) for that protein, Gene Ontology (GO) annotations, if any, and a homology model of the protein can be obtained. The users can also browse through the database for details pertaining to each family, protein or RNA and their related information based on keyword search or RNA motif search. RStrucFam is a web server that exploits structurally conserved features of RBPs, derived from known family members and imprinted in mathematical profiles, to predict putative RBPs from sequence information. Proteins that fail to associate with such structure-centric families are further queried against the sequence-centric RBP family HMMs in the HMMRBP database. Further, all other essential information pertaining to an RBP, such as overall function annotations, is provided. The web server can be accessed at the following link: http://caps.ncbs.res.in/rstrucfam .
The carbohydrate sequence markup language (CabosML): an XML description of carbohydrate structures.

PubMed

Kikuchi, Norihiro; Kameyama, Akihiko; Nakaya, Shuuichi; Ito, Hiromi; Sato, Takashi; Shikanai, Toshihide; Takahashi, Yoriko; Narimatsu, Hisashi

2005-04-15

Bioinformatics resources for glycomics are very poor as compared with those for genomics and proteomics. The complexity of carbohydrate sequences makes it difficult to define a common language to represent them, and the development of bioinformatics tools for glycomics has not progressed. In this study, we developed a carbohydrate sequence markup language (CabosML), an XML description of carbohydrate structures. The language definition (XML Schema) and an experimental database of carbohydrate structures using an XML database management system are available at http://www.phoenix.hydra.mki.co.jp/CabosDemo.html kikuchi@hydra.mki.co.jp.
The history of the CATH structural classification of protein domains.

PubMed

Sillitoe, Ian; Dawson, Natalie; Thornton, Janet; Orengo, Christine

2015-12-01

This article presents a historical review of the protein structure classification database CATH. Together with the SCOP database, CATH remains comprehensive and reasonably up-to-date with the now more than 100,000 protein structures in the PDB. We review the expansion of the CATH and SCOP resources to capture predicted domain structures in the genome sequence data and to provide information on the likely functions of proteins mediated by their constituent domains. The establishment of comprehensive function annotation resources has also meant that domain families can be functionally annotated allowing insights into functional divergence and evolution within protein families. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.
The Protein-DNA Interface database

PubMed Central

2010-01-01

The Protein-DNA Interface database (PDIdb) is a repository containing relevant structural information of Protein-DNA complexes solved by X-ray crystallography and available at the Protein Data Bank. The database includes a simple functional classification of the protein-DNA complexes that consists of three hierarchical levels: Class, Type and Subtype. This classification has been defined and manually curated by humans based on the information gathered from several sources that include PDB, PubMed, CATH, SCOP and COPS. The current version of the database contains only structures with resolution of 2.5 Å or higher, accounting for a total of 922 entries. The major aim of this database is to contribute to the understanding of the main rules that underlie the molecular recognition process between DNA and proteins. To this end, the database is focused on each specific atomic interface rather than on the separated binding partners. Therefore, each entry in this database consists of a single and independent protein-DNA interface. We hope that PDIdb will be useful to many researchers working in fields such as the prediction of transcription factor binding sites in DNA, the study of specificity determinants that mediate enzyme recognition events, engineering and design of new DNA binding proteins with distinct binding specificity and affinity, among others. Finally, due to its friendly and easy-to-use web interface, we hope that PDIdb will also serve educational and teaching purposes. PMID:20482798
The Protein-DNA Interface database.

PubMed

Norambuena, Tomás; Melo, Francisco

2010-05-18

The Protein-DNA Interface database (PDIdb) is a repository containing relevant structural information of Protein-DNA complexes solved by X-ray crystallography and available at the Protein Data Bank. The database includes a simple functional classification of the protein-DNA complexes that consists of three hierarchical levels: Class, Type and Subtype. This classification has been defined and manually curated by humans based on the information gathered from several sources that include PDB, PubMed, CATH, SCOP and COPS. The current version of the database contains only structures with resolution of 2.5 Å or higher, accounting for a total of 922 entries. The major aim of this database is to contribute to the understanding of the main rules that underlie the molecular recognition process between DNA and proteins. To this end, the database is focused on each specific atomic interface rather than on the separated binding partners. Therefore, each entry in this database consists of a single and independent protein-DNA interface. We hope that PDIdb will be useful to many researchers working in fields such as the prediction of transcription factor binding sites in DNA, the study of specificity determinants that mediate enzyme recognition events, engineering and design of new DNA binding proteins with distinct binding specificity and affinity, among others. Finally, due to its friendly and easy-to-use web interface, we hope that PDIdb will also serve educational and teaching purposes.
[Construction and application of special analysis database of geoherbs based on 3S technology].

PubMed

Guo, Lan-ping; Huang, Lu-qi; Lv, Dong-mei; Shao, Ai-juan; Wang, Jian

2007-09-01

In this paper, the structures, data sources and data codes of "the spatial analysis database of geoherbs" based on 3S technology are introduced, and the essential functions of the database, such as data management, remote sensing, spatial interpolation, spatial statistics, spatial analysis and development, are described. Finally, two examples of database usage are given: one is the classification and calculation of the NDVI index of a remote sensing image in the geoherbal area of Atractylodes lancea, and the other is an adaptation analysis of A. lancea. These indicate that "the spatial analysis database of geoherbs" has bright prospects in the spatial analysis of geoherbs.
Overview of Nuclear Physics Data: Databases, Web Applications and Teaching Tools

NASA Astrophysics Data System (ADS)

McCutchan, Elizabeth

2017-01-01

The mission of the United States Nuclear Data Program (USNDP) is to provide current, accurate, and authoritative data for use in pure and applied areas of nuclear science and engineering. This is accomplished by compiling, evaluating, and disseminating extensive datasets. Our main products include the Evaluated Nuclear Structure File (ENSDF) containing information on nuclear structure and decay properties and the Evaluated Nuclear Data File (ENDF) containing information on neutron-induced reactions. The National Nuclear Data Center (NNDC), through the website www.nndc.bnl.gov, provides web-based retrieval systems for these and many other databases. In addition, the NNDC hosts several on-line physics tools, useful for calculating various quantities relating to basic nuclear physics. In this talk, I will first introduce the quantities which are evaluated and recommended in our databases. I will then outline the searching capabilities which allow one to quickly and efficiently retrieve data. Finally, I will demonstrate how the database searches and web applications can provide effective teaching tools concerning the structure of nuclei and how they interact. Work supported by the Office of Nuclear Physics, Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-98CH10886.
The Tropical Biominer Project: mining old sources for new drugs.

PubMed

Artiguenave, François; Lins, André; Maciel, Wesley Dias; Junior, Antonio Celso Caldeira; Nacif-Coelho, Carla; de Souza Linhares, Maria Margarida Ribeiro; de Oliveira, Guilherme Correa; Barbosa, Luis Humberto Rezende; Lopes, Júlio César Dias; Junior, Claudionor Nunes Coelho

2005-01-01

The Tropical Biominer Project is a recent initiative from the Federal University of Minas Gerais (UFMG) and the Oswaldo Cruz foundation, with the participation of the Biominas Foundation (Belo Horizonte, Minas Gerais, Brazil) and the start-up Homologix. The main objective of the project is to build a new resource for the chemogenomics research, on chemical compounds, with a strong emphasis on natural molecules. Adopted technologies include the search of information from structured, semi-structured, and non-structured documents (the last two from the web) and datamining tools in order to gather information from different sources. The database is the support for developing applications to find new potential treatments for parasitic infections by using virtual screening tools. We present here the midpoint of the project: the conception and implementation of the Tropical Biominer Database. This is a Federated Database designed to store data from different resources. Connected to the database, a web crawler is able to gather information from distinct, patented web sites and store them after automatic classification using datamining tools. Finally, we demonstrate the interest of the approach, by formulating new hypotheses on specific targets of a natural compound, violacein, using inferences from a Virtual Screening procedure.
Flight Deck Interval Management Display. [Elements, Information and Annunciations Database User Guide]

NASA Technical Reports Server (NTRS)

Lancaster, Jeff; Dillard, Michael; Alves, Erin; Olofinboba, Olu

2014-01-01

The User Guide details the Access Database provided with the Flight Deck Interval Management (FIM) Display Elements, Information, & Annunciations program. The goal of this User Guide is to support ease of use and the ability to quickly retrieve and select items of interest from the Database. The Database includes FIM Concepts identified in a literature review preceding the publication of this document. Only items that are directly related to FIM (e.g., spacing indicators), which change or enable FIM (e.g., menu with control buttons), or which are affected by FIM (e.g., altitude reading) are included in the database. The guide has been expanded from previous versions to cover database structure, content, and search features with voiced explanations.
Time and Space Efficient Algorithms for Two-Party Authenticated Data Structures

NASA Astrophysics Data System (ADS)

Papamanthou, Charalampos; Tamassia, Roberto

Authentication is increasingly relevant to data management. Data is being outsourced to untrusted servers and clients want to securely update and query their data. For example, in database outsourcing, a client's database is stored and maintained by an untrusted server. Also, in simple storage systems, clients can store very large amounts of data but at the same time, they want to assure their integrity when they retrieve them. In this paper, we present a model and protocol for two-party authentication of data structures. Namely, a client outsources its data structure and verifies that the answers to the queries have not been tampered with. We provide efficient algorithms to securely outsource a skip list with logarithmic time overhead at the server and client and logarithmic communication cost, thus providing an efficient authentication primitive for outsourced data, both structured (e.g., relational databases) and semi-structured (e.g., XML documents). In our technique, the client stores only a constant amount of space, which is optimal. Our two-party authentication framework can be deployed on top of existing storage applications, thus providing an efficient authentication service. Finally, we present experimental results that demonstrate the practical efficiency and scalability of our scheme.
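The two-party setting in this abstract, where the client keeps only constant-size state and checks logarithmic-size proofs from an untrusted server, can be illustrated with a simplified sketch. Note the substitution: the paper authenticates a skip list, whereas the code below uses a static Merkle hash tree, a different and simpler structure chosen only to show the verification pattern. It does not cover updates, and all function names are hypothetical.

```python
import hashlib
from typing import List, Tuple


def _h(*parts: bytes) -> bytes:
    """SHA-256 over length-prefixed parts, so part boundaries are unambiguous."""
    m = hashlib.sha256()
    for p in parts:
        m.update(len(p).to_bytes(4, "big"))
        m.update(p)
    return m.digest()


def build_levels(items: List[bytes]) -> List[List[bytes]]:
    """Server side: hash levels from leaves (first) up to the root (last)."""
    if not items:
        raise ValueError("need at least one item")
    level = [_h(b"leaf", x) for x in items]
    levels = []
    while True:
        if len(level) > 1 and len(level) % 2:
            level = level + [level[-1]]  # pad odd levels by repeating the last hash
        levels.append(level)
        if len(level) == 1:
            return levels
        level = [_h(b"node", level[i], level[i + 1]) for i in range(0, len(level), 2)]


def prove(levels: List[List[bytes]], index: int) -> List[Tuple[bytes, bool]]:
    """Server side: sibling hashes along the path from leaf `index` to the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))  # (hash, sibling is on the left)
        index //= 2
    return proof


def verify(root: bytes, item: bytes, proof: List[Tuple[bytes, bool]]) -> bool:
    """Client side: recompute the root from the item and the proof."""
    acc = _h(b"leaf", item)
    for sibling, is_left in proof:
        acc = _h(b"node", sibling, acc) if is_left else _h(b"node", acc, sibling)
    return acc == root


items = [b"alice", b"bob", b"carol"]
levels = build_levels(items)
root = levels[-1][0]  # the only value the client needs to keep
print(verify(root, b"bob", prove(levels, 1)))
```

The client stores one hash regardless of how many items are outsourced, and each proof carries one sibling hash per tree level, which mirrors the constant client space and logarithmic communication cost claimed in the abstract. Padding by repeating the last hash is a shortcut that keeps the sketch short; a production design would handle odd levels more carefully.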
RNA Bricks—a database of RNA 3D motifs and their interactions

PubMed Central

Chojnowski, Grzegorz; Waleń, Tomasz; Bujnicki, Janusz M.

2014-01-01

The RNA Bricks database (http://iimcb.genesilico.pl/rnabricks), stores information about recurrent RNA 3D motifs and their interactions, found in experimentally determined RNA structures and in RNA–protein complexes. In contrast to other similar tools (RNA 3D Motif Atlas, RNA Frabase, Rloom) RNA motifs, i.e. ‘RNA bricks’ are presented in the molecular environment, in which they were determined, including RNA, protein, metal ions, water molecules and ligands. All nucleotide residues in RNA bricks are annotated with structural quality scores that describe real-space correlation coefficients with the electron density data (if available), backbone geometry and possible steric conflicts, which can be used to identify poorly modeled residues. The database is also equipped with an algorithm for 3D motif search and comparison. The algorithm compares spatial positions of backbone atoms of the user-provided query structure and of stored RNA motifs, without relying on sequence or secondary structure information. This enables the identification of local structural similarities among evolutionarily related and unrelated RNA molecules. Besides, the search utility enables searching ‘RNA bricks’ according to sequence similarity, and makes it possible to identify motifs with modified ribonucleotide residues at specific positions. PMID:24220091
[Locus-controlling regions: description in the LCR-TRRD data base].

PubMed

Podkolodnaia, O A; Levitskiĭ, V G; Podkolodnyĭ, N L

2001-01-01

The structural and functional organization of locus control regions (LCR) was analyzed using data of the LCR-TRR Database. The role of several transcription factors in the LCR function was considered. A study was made of the possible nucleosomal packing of enhancer regions in LCR. The structure and the format of LCR-TRRD are described. The database has been constructed for SRS and is available at http://wwwmgs.bionet.nsc.ru/mgs/dbase/LCR/.
Group updates Gravity Database for central Andes

NASA Astrophysics Data System (ADS)

MIGRA Group; Götze, H.-J.

Between 1993 and 1995 a group of scientists from Chile, Argentina, and Germany incorporated some 2000 new gravity observations into a database that covers a remote region of the Central Andes in northern Chile and northwestern Argentina (between 64°-71°W and 20°-29°S). The database can be used to study the structure and evolution of the Andes. About 14,000 gravity values are included in the database, including older, reprocessed data. Researchers at universities or governmental agencies are welcome to use the data for noncommercial purposes.
Online Patent Searching: The Realities.

ERIC Educational Resources Information Center

Kaback, Stuart M.

1983-01-01

Considers patent subject searching capabilities of major online databases, noting patent claims, "deep-indexed" files, test searches, retrieval of related references, multi-database searching, improvements needed in indexing of chemical structures, full text searching, improvements needed in handling numerical data, and augmenting a…
Overview of open resources to support automated structure verification and elucidation

EPA Science Inventory

Cheminformatics methods form an essential basis for providing analytical scientists with access to data, algorithms and workflows. There are an increasing number of free online databases (compound databases, spectral libraries, data repositories) and a rich collection of software...
SSEP: secondary structural elements of proteins

PubMed Central

Shanthi, V.; Selvarani, P.; Kiran Kumar, Ch.; Mohire, C. S.; Sekar, K.

2003-01-01

SSEP is a comprehensive resource for accessing information related to the secondary structural elements present in the 25 and 90% non-redundant protein chains. The database contains 1771 protein chains from 1670 protein structures and 6182 protein chains from 5425 protein structures in 25 and 90% non-redundant protein chains, respectively. The current version provides information about the α-helical segments and β-strand fragments of varying lengths. In addition, it also contains the information about 3₁₀-helix, β- and ν-turns and hairpin loops. The free graphics program RASMOL has been interfaced with the search engine to visualize the three-dimensional structures of the user-queried secondary structural fragment. The database is updated regularly and is available through Bioinformatics web server at http://cluster.physics.iisc.ernet.in/ssep/ or http://144.16.71.148/ssep/. PMID:12824336

DWARF – a data warehouse system for analyzing protein families

PubMed Central

Fischer, Markus; Thai, Quan K; Grieb, Melanie; Pleiss, Jürgen

2006-01-01

Background: The emerging field of integrative bioinformatics provides the tools to organize and systematically analyze vast amounts of highly diverse biological data and thus makes it possible to gain a novel understanding of complex biological systems. The data warehouse DWARF applies integrative bioinformatics approaches to the analysis of large protein families. Description: The data warehouse system DWARF integrates data on sequence, structure, and functional annotation for protein fold families. The underlying relational data model consists of three major sections representing entities related to the protein (biochemical function, source organism, classification to homologous families and superfamilies), the protein sequence (position-specific annotation, mutant information), and the protein structure (secondary structure information, superimposed tertiary structure). Tools for extracting, transforming and loading data from publicly available resources (ExPDB, GenBank, DSSP) are provided to populate the database. The data can be accessed by an interface for searching and browsing, and by analysis tools that operate on annotation, sequence, or structure. We applied DWARF to the family of α/β-hydrolases to host the Lipase Engineering database. Release 2.3 contains 6138 sequences and 167 experimentally determined protein structures, which are assigned to 37 superfamilies and 103 homologous families. Conclusion: DWARF has been designed for constructing databases of large structurally related protein families and for evaluating their sequence-structure-function relationships by a systematic analysis of sequence, structure and functional annotation. It has been applied to predict biochemical properties from sequence, and serves as a valuable tool for protein engineering. PMID:17094801
Indel PDB: a database of structural insertions and deletions derived from sequence alignments of closely related proteins.

PubMed

Hsing, Michael; Cherkasov, Artem

2008-06-25

Insertions and deletions (indels) represent a common type of sequence variations, which are less studied and pose many important biological questions. Recent research has shown that the presence of sizable indels in protein sequences may be indicative of protein essentiality and their role in protein interaction networks. Examples of utilization of indels for structure-based drug design have also been recently demonstrated. Nonetheless many structural and functional characteristics of indels remain less researched or unknown. We have created a web-based resource, Indel PDB, representing a structural database of insertions/deletions identified from the sequence alignments of highly similar proteins found in the Protein Data Bank (PDB). Indel PDB utilized large amounts of available structural information to characterize 1-, 2- and 3-dimensional features of indel sites. Indel PDB contains 117,266 non-redundant indel sites extracted from 11,294 indel-containing proteins. Unlike loop databases, Indel PDB features more indel sequences with secondary structures including alpha-helices and beta-sheets in addition to loops. The insertion fragments have been characterized by their sequences, lengths, locations, secondary structure composition, solvent accessibility, protein domain association and three dimensional structures. By utilizing the data available in Indel PDB, we have studied and presented here several sequence and structural features of indels. We anticipate that Indel PDB will not only enable future functional studies of indels, but will also assist protein modeling efforts and identification of indel-directed drug binding sites.
FRASS: the web-server for RNA structural comparison

PubMed Central

2010-01-01

Background: The impressive increase in novel RNA structures during the past few years demands automated methods for structure comparison. While many algorithms handle only small motifs, a few techniques developed in recent years (ARTS, DIAL, SARA, SARSA, and LaJolla) are available for the structural comparison of large and intact RNA molecules. Results: The FRASS web-server represents an RNA chain with its Gauss integrals and allows one to compare structures of RNA chains and to find similar entries in a database derived from the Protein Data Bank. We observed that FRASS scores correlate well with the ARTS and LaJolla similarity scores. Moreover, the web-server can also satisfactorily reproduce the DARTS classification of RNA 3D structures and the classification of the SCOR functions that was obtained by the SARA method. Conclusions: The FRASS web-server can be easily used to detect relationships among RNA molecules and to scan efficiently the rapidly enlarging structural databases. PMID:20553602
The Cambridge Structural Database: a quarter of a million crystal structures and rising.

PubMed

Allen, Frank H

2002-06-01

The Cambridge Structural Database (CSD) now contains data for more than a quarter of a million small-molecule crystal structures. The information content of the CSD, together with methods for data acquisition, processing and validation, are summarized, with particular emphasis on the chemical information added by CSD editors. Nearly 80% of new structural data arrives electronically, mostly in CIF format, and the CCDC acts as the official crystal structure data depository for 51 major journals. The CCDC now maintains both a CIF archive (more than 73,000 CIFs dating from 1996), as well as the distributed binary CSD archive; the availability of data in both archives is discussed. A statistical survey of the CSD is also presented and projections concerning future accession rates indicate that the CSD will contain at least 500,000 crystal structures by the year 2010.
Applications of GIS and database technologies to manage a Karst Feature Database

USGS Publications Warehouse

Gao, Y.; Tipping, R.G.; Alexander, E.C.

2006-01-01

This paper describes the management of a Karst Feature Database (KFD) in Minnesota. Two sets of applications in both GIS and Database Management System (DBMS) have been developed for the KFD of Minnesota. These applications were used to manage and to enhance the usability of the KFD. Structured Query Language (SQL) was used to manipulate transactions of the database and to facilitate the functionality of the user interfaces. The Database Administrator (DBA) authorized users with different access permissions to enhance the security of the database. Database consistency and recovery are accomplished by creating data logs and maintaining backups on a regular basis. The working database provides guidelines and management tools for future studies of karst features in Minnesota. The methodology of designing this DBMS is applicable to develop GIS-based databases to analyze and manage geomorphic and hydrologic datasets at both regional and local scales. The short-term goal of this research is to develop a regional KFD for the Upper Mississippi Valley Karst and the long-term goal is to expand this database to manage and study karst features at national and global scales.
Gas Chromatography-Tandem Mass Spectrometry of Lignin Pyrolyzates with Dopant-Assisted Atmospheric Pressure Chemical Ionization and Molecular Structure Search with CSI:FingerID

NASA Astrophysics Data System (ADS)

Larson, Evan A.; Hutchinson, Carolyn P.; Lee, Young Jin

2018-06-01

Dopant-assisted atmospheric pressure chemical ionization (dAPCI) is a soft ionization method rarely used for gas chromatography-mass spectrometry (GC-MS). The current study combines GC-dAPCI with tandem mass spectrometry (MS/MS) for analysis of a complex mixture such as lignin pyrolysis analysis. To identify the structures of volatile lignin pyrolysis products, collision-induced dissociation (CID) MS/MS using a quadrupole time-of-flight mass spectrometer (QTOFMS) and pseudo MS/MS through in-source collision-induced dissociation (ISCID) using a single stage TOFMS are utilized. To overcome the lack of MS/MS database, Compound Structure Identification (CSI):FingerID is used to interpret CID spectra and predict best matched structures from PubChem library. With this approach, a total of 59 compounds were positively identified in comparison to only 22 in NIST database search of GC-EI-MS dataset. This study demonstrates the effectiveness of GC-dAPCI-MS/MS to overcome the limitations of traditional GC-EI-MS analysis when EI-MS database is not sufficient.
BIOSPIDA: A Relational Database Translator for NCBI

PubMed Central

Hagen, Matthew S.; Lee, Eva K.

2010-01-01

As the volume and availability of biological databases continue to grow, it has become increasingly difficult for research scientists to identify all relevant information for biological entities of interest. Details of nucleotide sequences, gene expression, molecular interactions, and three-dimensional structures are maintained across many different databases. Retrieving all necessary information requires an integrated system that can query multiple databases with minimal overhead. This paper introduces a universal parser and relational schema translator that can be utilized for all NCBI databases in Abstract Syntax Notation (ASN.1). The data models for OMIM, Entrez-Gene, Pubmed, MMDB and GenBank have been successfully converted into relational databases, and all are easily linkable, helping to answer complex biological questions. These tools enable research scientists to integrate NCBI databases locally without significant workload or development time. PMID:21347013
Object-Oriented Approach to Integrating Database Semantics. Volume 4.

DTIC Science & Technology

1987-12-01

schemata for: 1. Object Classification Schema -- Entities 2. Object Structure and Relationship Schema -- Relations 3. Operation Classification and... relationships are represented in a database is non-intuitive for naive users. It is difficult to access and combine information in multiple databases. In this...from the CURRENT-CLASSES table. Choosing a selected item de-selects it. Choose 0 to exit. 1. STUDENTS 2. CURRENT-CLASSES 3. MANAGMNT-CLASS
Analysis and Development of a Web-Enabled Planning and Scheduling Database Application

DTIC Science & Technology

2013-09-01

establishes an entity-relationship diagram for the desired process, constructs an operable database using MySQL, and provides a web-enabled interface for...development, develop, design, process, re-engineering, reengineering, MySQL, structured query language, SQL, myPHPadmin. 15. NUMBER OF PAGES 107 16...relationship diagram for the desired process, constructs an operable database using MySQL, and provides a web-enabled interface for the population of
Assembly: a resource for assembled genomes at NCBI

PubMed Central

Kitts, Paul A.; Church, Deanna M.; Thibaud-Nissen, Françoise; Choi, Jinna; Hem, Vichet; Sapojnikov, Victor; Smith, Robert G.; Tatusova, Tatiana; Xiang, Charlie; Zherikov, Andrey; DiCuccio, Michael; Murphy, Terence D.; Pruitt, Kim D.; Kimchi, Avi

2016-01-01

The NCBI Assembly database (www.ncbi.nlm.nih.gov/assembly/) provides stable accessioning and data tracking for genome assembly data. The model underlying the database can accommodate a range of assembly structures, including sets of unordered contig or scaffold sequences, bacterial genomes consisting of a single complete chromosome, or complex structures such as a human genome with modeled allelic variation. The database provides an assembly accession and version to unambiguously identify the set of sequences that make up a particular version of an assembly, and tracks changes to updated genome assemblies. The Assembly database reports metadata such as assembly names, simple statistical reports of the assembly (number of contigs and scaffolds, contiguity metrics such as contig N50, total sequence length and total gap length) as well as the assembly update history. The Assembly database also tracks the relationship between an assembly submitted to the International Nucleotide Sequence Database Consortium (INSDC) and the assembly represented in the NCBI RefSeq project. Users can find assemblies of interest by querying the Assembly Resource directly or by browsing available assemblies for a particular organism. Links in the Assembly Resource allow users to easily download sequence and annotations for current versions of genome assemblies from the NCBI genomes FTP site. PMID:26578580
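The contig N50 mentioned among the contiguity metrics can be illustrated with a minimal sketch. It assumes the usual definition (the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length); the function name `contig_n50` and the example lengths are hypothetical and are not taken from the Assembly database itself.

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Assumed definition: sort contigs from longest to shortest and report
    the length of the contig at which the running total first reaches
    half of the summed length of all contigs.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical assembly: total length 290, so half is 145.
# 80 + 70 = 150 reaches the halfway point at the 70-long contig.
example_lengths = [80, 70, 50, 40, 30, 20]
example_n50 = contig_n50(example_lengths)
```

A larger N50 for the same total length indicates that more of the assembly sits in long contigs, which is why it is reported next to the contig and scaffold counts.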
Database-Guided Discovery of Potent Peptides to Combat HIV-1 or Superbugs

PubMed Central

Wang, Guangshun

2013-01-01

Antimicrobial peptides (AMPs), small host defense proteins, are indispensable for the protection of multicellular organisms such as plants and animals from infection. The number of AMPs discovered per year increased steadily since the 1980s. Over 2,000 natural AMPs from bacteria, protozoa, fungi, plants, and animals have been registered into the antimicrobial peptide database (APD). The majority of these AMPs (>86%) possess 11–50 amino acids with a net charge from 0 to +7 and hydrophobic percentages between 31–70%. This article summarizes peptide discovery on the basis of the APD. The major methods are the linguistic model, database screening, de novo design, and template-based design. Using these methods, we identified various potent peptides against human immunodeficiency virus type 1 (HIV-1) or methicillin-resistant Staphylococcus aureus (MRSA). While the stepwise designed anti-HIV peptide is disulfide-linked and rich in arginines, the ab initio designed anti-MRSA peptide is linear and rich in leucines. Thus, there are different requirements for antiviral and antibacterial peptides, which could kill pathogens via different molecular targets. The biased amino acid composition in the database-designed peptides, or natural peptides such as θ-defensins, requires the use of the improved two-dimensional NMR method for structural determination to avoid the publication of misleading structure and dynamics. In the case of human cathelicidin LL-37, structural determination requires 3D NMR techniques. The high-quality structure of LL-37 provides a solid basis for understanding its interactions with membranes of bacteria and other pathogens. In conclusion, the APD database is a comprehensive platform for storing, classifying, searching, predicting, and designing potent peptides against pathogenic bacteria, viruses, fungi, parasites, and cancer cells. PMID:24276259
PropBase Query Layer: a single portal to UK subsurface physical property databases

NASA Astrophysics Data System (ADS)

Kingdon, Andrew; Nayembil, Martin L.; Richardson, Anne E.; Smith, A. Graham

2013-04-01

Until recently, the delivery of geological information for industry and public was achieved by geological mapping. Now pervasively available computers mean that 3D geological models can deliver realistic representations of the geometric location of geological units, represented as shells or volumes. The next phase of this process is to populate these with physical properties data that describe subsurface heterogeneity and its associated uncertainty. Achieving this requires capture and serving of physical, hydrological and other property information from diverse sources to populate these models. The British Geological Survey (BGS) holds large volumes of subsurface property data, derived both from their own research data collection and also other, often commercially derived data sources. This can be voxelated to incorporate this data into the models to demonstrate property variation within the subsurface geometry. All property data held by BGS has for many years been stored in relational databases to ensure their long-term continuity. However these have, by necessity, complex structures; each database contains positional reference data and model information, and also metadata such as sample identification information and attributes that define the source and processing. Whilst this is critical to assessing these analyses, it also hugely complicates the understanding of variability of the property under assessment and requires multiple queries to study related datasets making extracting physical properties from these databases difficult. Therefore the PropBase Query Layer has been created to allow simplified aggregation and extraction of all related data and its presentation of complex data in simple, mostly denormalized, tables which combine information from multiple databases into a single system. 
The structure from each relational database is denormalized in a generalised structure, so that each dataset can be viewed together in a common format using a simple interface. Data are re-engineered to facilitate easy loading. The query layer structure comprises tables, procedures, functions, triggers, views and materialised views. The structure contains a main table PRB_DATA which contains all of the data with the following attribution:
• a unique identifier
• the data source
• the unique identifier from the parent database for traceability
• the 3D location
• the property type
• the property value
• the units
• necessary qualifiers
• precision information and an audit trail
Data sources, property type and units are constrained by dictionaries, a key component of the structure which defines what properties and inheritance hierarchies are to be coded and also guides the process as to what and how these are extracted from the structure. Data types served by the Query Layer include site investigation derived geotechnical data, hydrogeology datasets, regional geochemistry, geophysical logs as well as lithological and borehole metadata. The size and complexity of the data sets with multiple parent structures requires a technically robust approach to keep the layer synchronised. This is achieved through Oracle procedures written in PL/SQL containing the logic required to carry out the data manipulation (inserts, updates, deletes) to keep the layer synchronised with the underlying databases either as regular scheduled jobs (weekly, monthly etc) or invoked on demand. The PropBase Query Layer's implementation has enabled rapid data discovery, visualisation and interpretation of geological data with greater ease, simplifying the parametrisation of 3D model volumes and facilitating the study of intra-unit heterogeneity.
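The idea of a single denormalized table whose data source, property type and units are constrained by dictionaries can be sketched as follows. This is only an illustration under stated assumptions: it uses SQLite through Python's standard `sqlite3` module as a stand-in for the Oracle system described in the abstract, and every table and column name other than PRB_DATA (for example `dic_source`, `parent_id`, `precision_note`) is hypothetical.

```python
import sqlite3


def build_query_layer(conn):
    """Create a toy PRB_DATA table plus the dictionaries that constrain it.

    Column names are assumptions chosen to mirror the attributes listed in
    the abstract; the real BGS schema is not given in the source.
    """
    # Foreign keys are off by default in SQLite and must be enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE dic_source   (code TEXT PRIMARY KEY);
        CREATE TABLE dic_property (code TEXT PRIMARY KEY);
        CREATE TABLE dic_unit     (code TEXT PRIMARY KEY);

        CREATE TABLE prb_data (
            id             INTEGER PRIMARY KEY,                         -- unique identifier
            source         TEXT NOT NULL REFERENCES dic_source(code),   -- data source
            parent_id      TEXT NOT NULL,                               -- id in the parent database
            x REAL NOT NULL, y REAL NOT NULL, z REAL NOT NULL,          -- 3D location
            property_type  TEXT NOT NULL REFERENCES dic_property(code),
            property_value REAL NOT NULL,
            units          TEXT NOT NULL REFERENCES dic_unit(code),
            qualifier      TEXT,                                        -- optional qualifier
            precision_note TEXT                                         -- precision information
        );
        """
    )


# Usage with made-up dictionary entries and one made-up measurement.
demo = sqlite3.connect(":memory:")
build_query_layer(demo)
demo.execute("INSERT INTO dic_source VALUES ('GEOTECH')")
demo.execute("INSERT INTO dic_property VALUES ('POROSITY')")
demo.execute("INSERT INTO dic_unit VALUES ('FRACTION')")
demo.execute(
    "INSERT INTO prb_data (source, parent_id, x, y, z, property_type, property_value, units) "
    "VALUES ('GEOTECH', 'BH-1', 1.0, 2.0, -3.0, 'POROSITY', 0.2, 'FRACTION')"
)
```

With the dictionaries in place, a row that names a unit, source or property type missing from its dictionary is rejected by the database rather than silently loaded, which is one plausible reading of "constrained by dictionaries".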
The value of protein structure classification information-Surveying the scientific literature

DOE PAGES

Fox, Naomi K.; Brenner, Steven E.; Chandonia, John -Marc

2015-08-27

The Structural Classification of Proteins (SCOP) and Class, Architecture, Topology, Homology (CATH) databases have been valuable resources for protein structure classification for over 20 years. Development of SCOP (version 1) concluded in June 2009 with SCOP 1.75. The SCOPe (SCOP-extended) database offers continued development of the classic SCOP hierarchy, adding over 33,000 structures. We have attempted to assess the impact of these two decade old resources and guide future development. To this end, we surveyed recent articles to learn how structure classification data are used. Of 571 articles published in 2012-2013 that cite SCOP, 439 actually use data from the resource. We found that the type of use was fairly evenly distributed among four top categories: A) study protein structure or evolution (27% of articles), B) train and/or benchmark algorithms (28% of articles), C) augment non-SCOP datasets with SCOP classification (21% of articles), and D) examine the classification of one protein/a small set of proteins (22% of articles). Most articles described computational research, although 11% described purely experimental research, and a further 9% included both. We examined how CATH and SCOP were used in 158 articles that cited both databases: while some studies used only one dataset, the majority used data from both resources. Protein structure classification remains highly relevant for a diverse range of problems and settings.
The value of protein structure classification information-Surveying the scientific literature

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fox, Naomi K.; Brenner, Steven E.; Chandonia, John -Marc

The Structural Classification of Proteins (SCOP) and Class, Architecture, Topology, Homology (CATH) databases have been valuable resources for protein structure classification for over 20 years. Development of SCOP (version 1) concluded in June 2009 with SCOP 1.75. The SCOPe (SCOP-extended) database offers continued development of the classic SCOP hierarchy, adding over 33,000 structures. We have attempted to assess the impact of these two decade old resources and guide future development. To this end, we surveyed recent articles to learn how structure classification data are used. Of 571 articles published in 2012-2013 that cite SCOP, 439 actually use data from the resource. We found that the type of use was fairly evenly distributed among four top categories: A) study protein structure or evolution (27% of articles), B) train and/or benchmark algorithms (28% of articles), C) augment non-SCOP datasets with SCOP classification (21% of articles), and D) examine the classification of one protein/a small set of proteins (22% of articles). Most articles described computational research, although 11% described purely experimental research, and a further 9% included both. We examined how CATH and SCOP were used in 158 articles that cited both databases: while some studies used only one dataset, the majority used data from both resources. Protein structure classification remains highly relevant for a diverse range of problems and settings.
Using databases in medical education research: AMEE Guide No. 77.

PubMed

Cleland, Jennifer; Scott, Neil; Harrild, Kirsten; Moffat, Mandy

2013-05-01

This AMEE Guide offers an introduction to the use of databases in medical education research. It is intended for those who are contemplating conducting research in medical education but are new to the field. The Guide is structured around the process of planning your research so that data collection, management and analysis are appropriate for the research question. Throughout we consider contextual possibilities and constraints to educational research using databases, such as the resources available, and provide concrete examples of medical education research to illustrate many points. The first section of the Guide explains the difference between different types of data and classifying data, and addresses the rationale for research using databases in medical education. We explain the difference between qualitative research and qualitative data, the difference between categorical and quantitative data, and the different types of data which fall into these categories. The Guide reviews the strengths and weaknesses of qualitative and quantitative research. The next section is structured around how to work with quantitative and qualitative databases and provides guidance on the many practicalities of setting up a database. This includes how to organise your database, including anonymising data and coding, as well as preparing and describing your data so it is ready for analysis. The critical matter of the ethics of using databases in medical educational research, including using routinely collected data versus data collected for research purposes, and issues of confidentiality, is discussed. Core to the Guide is drawing out the similarities and differences in working with different types of data and different types of databases. Future AMEE Guides in the research series will address statistical analysis of data in more detail.
Transformation of Developmental Neurotoxicity Data into a Structure-Searchable Relational Database

EPA Science Inventory

A database of neurotoxicants is critical to support the development and validation of animal alternatives for neurotoxicity. Validation of in vitro test methods can only be done using known animal and human neurotoxicants producing defined responses for neurochemical, neuropatho...
Bibliographic Databases Outside of the United States.

ERIC Educational Resources Information Center

McGinn, Thomas P.; And Others

1988-01-01

Eight articles describe the development, content, and structure of databases outside of the United States. Features discussed include library involvement, authority control, shared cataloging services, union catalogs, thesauri, abstracts, and distribution methods. Countries and areas represented are Latin America, Australia, the United Kingdom,…
Implementing a Microcomputer Database Management System.

ERIC Educational Resources Information Center

Manock, John J.; Crater, K. Lynne

1985-01-01

Current issues in selecting, structuring, and implementing microcomputer database management systems in research administration offices are discussed, and their capabilities are illustrated with the system used by the University of North Carolina at Wilmington. Trends in microcomputer technology and their likely impact on research administration…
Real-Time Ligand Binding Pocket Database Search Using Local Surface Descriptors

PubMed Central

Chikhi, Rayan; Sael, Lee; Kihara, Daisuke

2010-01-01

Due to the increasing number of structures of unknown function accumulated by ongoing structural genomics projects, there is an urgent need for computational methods for characterizing protein tertiary structures. As functions of many of these proteins are not easily predicted by conventional sequence database searches, a legitimate strategy is to utilize structure information in function characterization. Of particular interest is prediction of ligand binding to a protein, as ligand molecule recognition is a major part of molecular function of proteins. Predicting whether a ligand molecule binds a protein is a complex problem due to the physical nature of protein-ligand interactions and the flexibility of both binding sites and ligand molecules. However, geometric and physicochemical complementarity is observed between the ligand and its binding site in many cases. Therefore, ligand molecules which bind to a local surface site in a protein can be predicted by finding similar local pockets of known binding ligands in the structure database. Here, we present two representations of ligand binding pockets and utilize them for ligand binding prediction by pocket shape comparison. These representations are based on mapping of surface properties of binding pockets, which are compactly described either by the two-dimensional pseudo-Zernike moments or the 3D Zernike descriptors. These compact representations allow a fast real-time pocket searching against a database. Thorough benchmark studies employing two different datasets show that our representations are competitive with the other existing methods. Limitations and potentials of the shape-based methods as well as possible improvements are discussed. PMID:20455259
Real-time ligand binding pocket database search using local surface descriptors.

PubMed

Chikhi, Rayan; Sael, Lee; Kihara, Daisuke

2010-07-01

Because of the increasing number of structures of unknown function accumulated by ongoing structural genomics projects, there is an urgent need for computational methods for characterizing protein tertiary structures. As functions of many of these proteins are not easily predicted by conventional sequence database searches, a legitimate strategy is to utilize structure information in function characterization. Of particular interest is prediction of ligand binding to a protein, as ligand molecule recognition is a major part of molecular function of proteins. Predicting whether a ligand molecule binds a protein is a complex problem due to the physical nature of protein-ligand interactions and the flexibility of both binding sites and ligand molecules. However, geometric and physicochemical complementarity is observed between the ligand and its binding site in many cases. Therefore, ligand molecules which bind to a local surface site in a protein can be predicted by finding similar local pockets of known binding ligands in the structure database. Here, we present two representations of ligand binding pockets and utilize them for ligand binding prediction by pocket shape comparison. These representations are based on mapping of surface properties of binding pockets, which are compactly described either by the two-dimensional pseudo-Zernike moments or the three-dimensional Zernike descriptors. These compact representations allow a fast real-time pocket searching against a database. Thorough benchmark studies employing two different datasets show that our representations are competitive with the other existing methods. Limitations and potentials of the shape-based methods as well as possible improvements are discussed.
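Why a compact descriptor makes real-time searching feasible can be seen in a small sketch: once each pocket is reduced to a fixed-length numeric vector, a database search is a nearest-neighbour ranking over those vectors. The sketch below assumes Euclidean distance as the similarity measure and uses made-up two-component vectors; the function `rank_pockets`, the pocket identifiers and the distance choice are illustrative assumptions, not the authors' published scoring scheme, and computing real Zernike descriptors is not shown.

```python
import math


def rank_pockets(query, database, top_k=3):
    """Rank stored pockets by Euclidean distance to a query descriptor.

    `query` is a sequence of floats; `database` maps a pocket identifier
    to a descriptor of the same length. Smaller distance means a more
    similar pocket under this (assumed) measure.
    """
    scored = []
    for pocket_id, descriptor in database.items():
        if len(descriptor) != len(query):
            raise ValueError(f"descriptor length mismatch for {pocket_id}")
        scored.append((math.dist(query, descriptor), pocket_id))
    scored.sort()  # ascending distance, ties broken by identifier
    return [(pocket_id, distance) for distance, pocket_id in scored[:top_k]]


# Toy database with hypothetical pocket names and descriptors.
toy_database = {
    "pocket_a": [0.0, 0.0],
    "pocket_b": [3.0, 4.0],
    "pocket_c": [1.0, 0.0],
}
closest = rank_pockets([0.0, 0.0], toy_database, top_k=2)
```

Each comparison costs time proportional to the descriptor length rather than to the number of surface points in the pocket, which is the property the abstract relies on when it describes the search as fast.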

A new Volcanic managEment Risk Database desIgn (VERDI): Application to El Hierro Island (Canary Islands)

NASA Astrophysics Data System (ADS)

Bartolini, S.; Becerril, L.; Martí, J.

2014-11-01

One of the most important issues in modern volcanology is the assessment of volcanic risk, which depends, among other factors, on both the quantity and quality of the available data and on an optimum storage mechanism. This requires the design of purpose-built databases that take into account data format and availability, afford easy data storage and sharing, and provide for a more complete risk assessment that combines different analyses but avoids any duplication of information. Data contained in any such database should facilitate spatial and temporal analysis that will (1) produce probabilistic hazard models for future vent opening, (2) simulate volcanic hazards and (3) assess their socio-economic impact. We describe the design of a new spatial database structure, VERDI (Volcanic managEment Risk Database desIgn), which allows different types of data, including geological, volcanological, meteorological, monitoring and socio-economic information, to be manipulated, organized and managed. The central aim is for VERDI to serve as a tool for connecting different kinds of data sources, GIS platforms and modeling applications. We present an overview of the database design, its components and the attributes that play an important role in the database model. The potential of the VERDI structure and the possibilities it offers in regard to data organization are shown here through its application to El Hierro (Canary Islands). The VERDI database will provide scientists and decision makers with a useful tool to assist in conducting volcanic risk assessment and management.
ProCarDB: a database of bacterial carotenoids.

PubMed

Nupur, L N U; Vats, Asheema; Dhanda, Sandeep Kumar; Raghava, Gajendra P S; Pinnaka, Anil Kumar; Kumar, Ashwani

2016-05-26

Carotenoids have important functions in bacteria, ranging from harvesting light energy to neutralizing oxidants and acting as virulence factors. However, information pertaining to the carotenoids is scattered throughout the literature. Furthermore, information about the genes/proteins involved in the biosynthesis of carotenoids has tremendously increased in the post-genomic era. A web server providing the information about microbial carotenoids in a structured manner is required and will be a valuable resource for the scientific community working with microbial carotenoids. Here, we have created a manually curated, open access, comprehensive compilation of bacterial carotenoids named ProCarDB (Prokaryotic Carotenoid Database). ProCarDB includes 304 unique carotenoids arising from 50 biosynthetic pathways distributed among 611 prokaryotes. ProCarDB provides important information on carotenoids, such as 2D and 3D structures, molecular weight, molecular formula, SMILES, InChI, InChIKey, IUPAC name, KEGG Id, PubChem Id, and ChEBI Id. The database also provides NMR data, UV-vis absorption data, IR data, MS data and HPLC data that play key roles in the identification of carotenoids. An important feature of this database is the extension of biosynthetic pathways from the literature and through the presence of the genes/enzymes in different organisms. The information contained in the database was mined from published literature and databases such as KEGG, PubChem, ChEBI, LipidBank, LPSN, and Uniprot. The database integrates user-friendly browsing and searching with carotenoid analysis tools to help the user. We believe that this database will serve as a major information centre for researchers working on bacterial carotenoids.
The Structural Ceramics Database: Technical Foundations

PubMed Central

Munro, R. G.; Hwang, F. Y.; Hubbard, C. R.

1989-01-01

The development of a computerized database on advanced structural ceramics can play a critical role in fostering the widespread use of ceramics in industry and in advanced technologies. A computerized database may be the most effective means of accelerating technology development by enabling new materials to be incorporated into designs far more rapidly than would have been possible with traditional information transfer processes. Faster, more efficient access to critical data is the basis for creating this technological advantage. Further, a computerized database provides the means for a more consistent treatment of data, greater quality control and product reliability, and improved continuity of research and development programs. A preliminary system has been completed as phase one of an ongoing program to establish the Structural Ceramics Database system. The system is designed to be used on personal computers. Developed in a modular design, the preliminary system is focused on the thermal properties of monolithic ceramics. The initial modules consist of materials specification, thermal expansion, thermal conductivity, thermal diffusivity, specific heat, thermal shock resistance, and a bibliography of data references. Query and output programs also have been developed for use with these modules. The latter program elements, along with the database modules, will be subjected to several stages of testing and refinement in the second phase of this effort. The goal of the refinement process will be the establishment of this system as a user-friendly prototype. Three primary considerations provide the guidelines to the system’s development: (1) The user’s needs; (2) The nature of materials properties; and (3) The requirements of the programming language. The present report discusses the manner and rationale by which each of these considerations leads to specific features in the design of the system. PMID:28053397
Structure and needs of global loss databases about natural disaster

NASA Astrophysics Data System (ADS)

Steuer, Markus

2010-05-01

Global loss databases are used for trend analyses and statistics in scientific projects, in studies for governmental and nongovernmental organizations, and in the insurance and finance industry. At the moment three global data sets are established: EM-DAT (CRED), Sigma (Swiss Re) and NatCatSERVICE (Munich Re). Together with the Asian Disaster Reduction Center (ADRC) and the United Nations Development Program (UNDP), the providers of these data sets started a collaborative initiative in 2007 with the aim of agreeing on and implementing a common "Disaster Category Classification and Peril Terminology for Operational Databases". This common classification has been established through several technical meetings and working groups and represents a first and important step in the development of a standardized international classification of disasters and terminology of perils. In concrete terms, this means setting up a common hierarchy and terminology for all global and regional databases on natural disasters and establishing a common and agreed definition of disaster groups, main types and sub-types of events. Georeferencing, temporal aspects, methodology and sourcing are further issues that have been identified and will be discussed. The new structure defined for global loss databases has already been implemented in Munich Re NatCatSERVICE. In the following oral session we will show the structure of the global databases as defined and, in addition, give more transparency about the data sets behind published statistics and analyses. The special focus will be on the catastrophe classification, from a moderate loss event up to a great natural catastrophe; we will also show the quality of sources and give inside information about the assessment of overall and insured losses. Keywords: disaster category classification, peril terminology, overall and insured losses, definition
Protein Folding and Structure Prediction from the Ground Up: The Atomistic Associative Memory, Water Mediated, Structure and Energy Model.

PubMed

Chen, Mingchen; Lin, Xingcheng; Zheng, Weihua; Onuchic, José N; Wolynes, Peter G

2016-08-25

The associative memory, water mediated, structure and energy model (AWSEM) is a coarse-grained force field with transferable tertiary interactions that incorporates local in sequence energetic biases using bioinformatically derived structural information about peptide fragments with locally similar sequences that we call memories. The memory information from the protein data bank (PDB) database guides proper protein folding. The structural information about available sequences in the database varies in quality and can sometimes lead to frustrated free energy landscapes locally. One way out of this difficulty is to construct the input fragment memory information from all-atom simulations of portions of the complete polypeptide chain. In this paper, we investigate this approach first put forward by Kwac and Wolynes in a more complete way by studying the structure prediction capabilities of this approach for six α-helical proteins. This scheme which we call the atomistic associative memory, water mediated, structure and energy model (AAWSEM) amounts to an ab initio protein structure prediction method that starts from the ground up without using bioinformatic input. The free energy profiles from AAWSEM show that atomistic fragment memories are sufficient to guide the correct folding when tertiary forces are included. AAWSEM combines the efficiency of coarse-grained simulations on the full protein level with the local structural accuracy achievable from all-atom simulations of only parts of a large protein. The results suggest that a hybrid use of atomistic fragment memory and database memory in structural predictions may well be optimal for many practical applications.
Local concurrent error detection and correction in data structures using virtual backpointers

NASA Technical Reports Server (NTRS)

Li, Chung-Chi Jim; Chen, Paul Peichuan; Fuchs, W. Kent

1989-01-01

A new technique, based on virtual backpointers, for local concurrent error detection and correction in linked data structures is presented. Two new data structures, the Virtual Double Linked List, and the B-tree with Virtual Backpointers, are described. For these structures, double errors can be detected in O(1) time and errors detected during forward moves can be corrected in O(1) time. The application of a concurrent auditor process to data structure error detection and correction is analyzed, and an implementation is described, to determine the effect on mean time to failure of a multi-user shared database system. The implementation utilizes a Sequent shared memory multiprocessor system operating on a shared database of Virtual Double Linked Lists.
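The abstract does not say how a virtual backpointer is encoded. The sketch below is an illustration only, under the assumption that each node stores its forward pointer plus one extra field holding the XOR of its predecessor's and successor's identifiers, so that the backpointer can be recomputed rather than stored. The names `Node`, `build_ring`, `recover_prev` and `forward_move_ok` are hypothetical, and the encoding may differ from the authors' scheme.

```python
from dataclasses import dataclass


@dataclass
class Node:
    ident: int  # node identifier, standing in for a memory address
    next: int   # forward pointer
    virt: int   # assumed encoding: predecessor id XOR successor id


def build_ring(ids):
    """Build a circular list in which every node carries a virtual backpointer."""
    n = len(ids)
    return {
        ident: Node(ident, ids[(i + 1) % n], ids[(i - 1) % n] ^ ids[(i + 1) % n])
        for i, ident in enumerate(ids)
    }


def recover_prev(node):
    """Recompute the predecessor identifier from the two stored fields."""
    return node.virt ^ node.next


def forward_move_ok(nodes, cur_id):
    """Constant-time consistency check made while stepping to the successor."""
    succ = nodes.get(nodes[cur_id].next)
    return succ is not None and recover_prev(succ) == cur_id


nodes = build_ring([10, 20, 30, 40])
healthy = all(forward_move_ok(nodes, i) for i in nodes)

nodes[30].next = 10  # simulate a corrupted forward pointer in node 30
detected = not forward_move_ok(nodes, 20)
```

Each check touches only the current node and its successor, which is the sense in which detection during a forward move stays local and takes constant time in this sketch.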
New software for statistical analysis of Cambridge Structural Database data

PubMed Central

Sykes, Richard A.; McCabe, Patrick; Allen, Frank H.; Battle, Gary M.; Bruno, Ian J.; Wood, Peter A.

2011-01-01

A collection of new software tools is presented for the analysis of geometrical, chemical and crystallographic data from the Cambridge Structural Database (CSD). This software supersedes the program Vista. The new functionality is integrated into the program Mercury in order to provide statistical, charting and plotting options alongside three-dimensional structural visualization and analysis. The integration also permits immediate access to other information about specific CSD entries through the Mercury framework, a common requirement in CSD data analyses. In addition, the new software includes a range of more advanced features focused towards structural analysis such as principal components analysis, cone-angle correction in hydrogen-bond analyses and the ability to deal with topological symmetry that may be exhibited in molecular search fragments. PMID:22477784
Advanced instrumentation: Technology database enhancement, volume 4, appendix G

NASA Technical Reports Server (NTRS)

1991-01-01

The purpose of this task was to add to the McDonnell Douglas Space Systems Company's Sensors Database, including providing additional information on the instruments and sensors applicable to physical/chemical Environmental Control and Life Support System (P/C ECLSS) or Closed Ecological Life Support System (CELSS) which were not previously included. The Sensors Database was reviewed in order to determine the types of data required, define the data categories, and develop an understanding of the data record structure. An assessment of the MDSSC Sensors Database identified limitations and problems in the database. Guidelines and solutions were developed to address these limitations and problems in order that the requirements of the task could be fulfilled.
[Design and establishment of modern literature database about acupuncture Deqi].

PubMed

Guo, Zheng-rong; Qian, Gui-feng; Pan, Qiu-yin; Wang, Yang; Xin, Si-yuan; Li, Jing; Hao, Jie; Hu, Ni-juan; Zhu, Jiang; Ma, Liang-xiao

2015-02-01

A search on acupuncture Deqi was conducted using four Chinese-language biomedical databases (CNKI, Wan-Fang, VIP and CBM) and the PubMed database, with the keywords "Deqi", "needle sensation", "needling feeling", "needle feel", "obtaining qi", etc. Then, a "Modern Literature Database for Acupuncture Deqi" was established by employing Microsoft SQL Server 2005 Express Edition, introducing the contents, data types, information structure and logic constraint of the system table fields. From this Database, detailed inquiries about general information of clinical trials, acupuncturists' experience, ancient medical works, comprehensive literature, etc. can be obtained. The present databank lays a foundation for subsequent evaluation of literature quality about Deqi and data mining of undetected Deqi knowledge.
Nuclear Science References Database

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pritychenko, B.; Běták, E.; Singh, B.

2014-06-15

The Nuclear Science References (NSR) database, together with its associated Web interface, is the world's only comprehensive source of easily accessible low- and intermediate-energy nuclear physics bibliographic information for more than 210,000 articles since the beginning of nuclear science. The weekly-updated NSR database provides essential support for nuclear data evaluation, compilation and research activities. The principles of the database and Web application development and maintenance are described. Examples of nuclear structure, reaction and decay applications are specifically included. The complete NSR database is freely available at the websites of the National Nuclear Data Center (http://www.nndc.bnl.gov/nsr) and the International Atomic Energy Agency (http://www-nds.iaea.org/nsr).
The value of protein structure classification information—Surveying the scientific literature

PubMed Central

Fox, Naomi K.; Brenner, Steven E.

2015-01-01

The Structural Classification of Proteins (SCOP) and Class, Architecture, Topology, Homology (CATH) databases have been valuable resources for protein structure classification for over 20 years. Development of SCOP (version 1) concluded in June 2009 with SCOP 1.75. The SCOPe (SCOP–extended) database offers continued development of the classic SCOP hierarchy, adding over 33,000 structures. We have attempted to assess the impact of these two-decade-old resources and guide future development. To this end, we surveyed recent articles to learn how structure classification data are used. Of 571 articles published in 2012–2013 that cite SCOP, 439 actually use data from the resource. We found that the type of use was fairly evenly distributed among four top categories: A) study protein structure or evolution (27% of articles), B) train and/or benchmark algorithms (28% of articles), C) augment non‐SCOP datasets with SCOP classification (21% of articles), and D) examine the classification of one protein/a small set of proteins (22% of articles). Most articles described computational research, although 11% described purely experimental research, and a further 9% included both. We examined how CATH and SCOP were used in 158 articles that cited both databases: while some studies used only one dataset, the majority used data from both resources. Protein structure classification remains highly relevant for a diverse range of problems and settings. Proteins 2015; 83:2025–2038. © 2015 The Authors. Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc. PMID:26313554
Breast Imaging in the Era of Big Data: Structured Reporting and Data Mining.

PubMed

Margolies, Laurie R; Pandey, Gaurav; Horowitz, Eliot R; Mendelson, David S

2016-02-01

The purpose of this article is to describe structured reporting and the development of large databases for use in data mining in breast imaging. The results of millions of breast imaging examinations are reported with structured tools based on the BI-RADS lexicon. Much of these data are stored in accessible media. Robust computing power creates great opportunity for data scientists and breast imagers to collaborate to improve breast cancer detection and optimize screening algorithms. Data mining can create knowledge, but the questions asked and their complexity require extremely powerful and agile databases. New data technologies can facilitate outcomes research and precision medicine.
SQLGEN: a framework for rapid client-server database application development.

PubMed

Nadkarni, P M; Cheung, K H

1995-12-01

SQLGEN is a framework for rapid client-server relational database application development. It relies on an active data dictionary on the client machine that stores metadata on one or more database servers to which the client may be connected. The dictionary generates dynamic Structured Query Language (SQL) to perform common database operations; it also stores information about the access rights of the user at log-in time, which is used to partially self-configure the behavior of the client to disable inappropriate user actions. SQLGEN uses a microcomputer database as the client to store metadata in relational form, to transiently capture server data in tables, and to allow rapid application prototyping followed by porting to client-server mode with modest effort. SQLGEN is currently used in several production biomedical databases.
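The idea of a data dictionary that generates SQL and enforces access rights can be sketched in a few lines. This is not SQLGEN's actual design or API: the dictionary layout, the `build_select` helper and the `patient` table are hypothetical, and an in-memory SQLite database stands in for a remote server.

```python
import sqlite3

# Hypothetical client-side data dictionary: table metadata plus the
# operations the logged-in user is allowed to perform.
DICTIONARY = {
    "patient": {"columns": ["id", "name", "age"], "rights": {"select"}},
}


def build_select(table, filters):
    """Generate a parameterized SELECT statement from dictionary metadata."""
    meta = DICTIONARY[table]
    if "select" not in meta["rights"]:
        raise PermissionError(f"select not permitted on {table}")
    unknown = set(filters) - set(meta["columns"])
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    sql = f"SELECT {', '.join(meta['columns'])} FROM {table}"
    if filters:
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in filters)
    return sql, list(filters.values())


conn = sqlite3.connect(":memory:")  # stands in for a database server
conn.execute("CREATE TABLE patient (id INTEGER, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO patient VALUES (?, ?, ?)", [(1, "Ada", 40), (2, "Bo", 55)]
)

sql, params = build_select("patient", {"age": 55})
rows = conn.execute(sql, params).fetchall()
```

Table and column names come only from the dictionary, while values are passed as bound parameters, so user input never reaches the SQL text directly.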
The unified database for the fixed target experiment BM@N

NASA Astrophysics Data System (ADS)

Gertsenberger, K. V.

2016-09-01

The article describes a database developed as comprehensive data storage for the fixed target experiment BM@N [1] at the Joint Institute for Nuclear Research (JINR) in Dubna. The structure and purposes of the BM@N facility will be briefly presented. The scheme of the unified database and its parameters will be described in detail. The BM@N database, implemented on the PostgreSQL database management system (DBMS), provides user access to the current information of the experiment. The interfaces developed for access to the database will also be presented. One was implemented as a set of C++ classes to access the data without SQL statements; the other is a Web interface available on the Web page of the BM@N experiment.
Advanced transportation system studies. Alternate propulsion subsystem concepts: Propulsion database

NASA Technical Reports Server (NTRS)

Levack, Daniel

1993-01-01

The Advanced Transportation System Studies alternate propulsion subsystem concepts propulsion database interim report is presented. The objective of the database development task is to produce a propulsion database which is easy to use and modify while also being comprehensive in the level of detail available. The database is to be available on the Macintosh computer system. The task is to extend across all three years of the contract. Consequently, a significant fraction of the effort in this first year of the task was devoted to the development of the database structure to ensure a robust base for the following years' efforts. Nonetheless, significant point design propulsion system descriptions and parametric models were also produced. Each of the two propulsion databases, parametric propulsion database and propulsion system database, are described. The descriptions include a user's guide to each code, write-ups for models used, and sample output. The parametric database has models for LOX/H2 and LOX/RP liquid engines, solid rocket boosters using three different propellants, a hybrid rocket booster, and a NERVA derived nuclear thermal rocket engine.
The Primate Life History Database: A unique shared ecological data resource

PubMed Central

Strier, Karen B.; Altmann, Jeanne; Brockman, Diane K.; Bronikowski, Anne M.; Cords, Marina; Fedigan, Linda M.; Lapp, Hilmar; Liu, Xianhua; Morris, William F.; Pusey, Anne E.; Stoinski, Tara S.; Alberts, Susan C.

2011-01-01

The importance of data archiving, data sharing, and public access to data has received considerable attention. Awareness is growing among scientists that collaborative databases can facilitate these activities. We provide a detailed description of the collaborative life history database developed by our Working Group at the National Evolutionary Synthesis Center (NESCent) to address questions about life history patterns and the evolution of mortality and demographic variability in wild primates. Examples from each of the seven primate species included in our database illustrate the range of data incorporated and the challenges, decision-making processes, and criteria applied to standardize data across diverse field studies. In addition to the descriptive and structural metadata associated with our database, we also describe the process metadata (how the database was designed and delivered) and the technical specifications of the database. Our database provides a useful model for other researchers interested in developing similar types of databases for other organisms, while our process metadata may be helpful to other groups of researchers interested in developing databases for other types of collaborative analyses. PMID:21698066
Design and Implementation of an Intelligence Database.

DTIC Science & Technology

1984-12-01

In designing SDM, many database applications were analyzed in order to determine the structures that occur and recur in them...automatically, nor is it even known which relations can be converted to DK/NF. In spite of this, DK/NF can be exceedingly useful for practical database...goal of any design process is to produce an output design, Sout, to accurately represent Sin. Further, all the relations in Sout must satisfy
15 CFR 995.4 - Definitions.

Code of Federal Regulations, 2010 CFR

2010-01-01

... database resulting from the transformation of the ENC by ECDIS for appropriate use, updates to the ENC by... of the 1974 SOLAS Convention. Electronic Navigational Chart (ENC) means a database, standardized as to content, structure, and format, issued for use with ECDIS on the authority of government...
High Performance Descriptive Semantic Analysis of Semantic Graph Databases

DOE Office of Scientific and Technical Information (OSTI.GOV)

Joslyn, Cliff A.; Adolf, Robert D.; al-Saffar, Sinan

As semantic graph database technology grows to address components ranging from extant large triple stores to SPARQL endpoints over SQL-structured relational databases, it will become increasingly important to be able to understand their inherent semantic structure, whether codified in explicit ontologies or not. Our group is researching novel methods for what we call descriptive semantic analysis of RDF triplestores, to serve purposes of analysis, interpretation, visualization, and optimization. But data size and computational complexity make it increasingly necessary to bring high performance computational resources to bear on this task. Our research group built a novel high performance hybrid system comprising computational capability for semantic graph database processing utilizing the large multi-threaded architecture of the Cray XMT platform, conventional servers, and large data stores. In this paper we describe that architecture and our methods, and present the results of our analyses of basic properties, connected components, namespace interaction, and typed paths for the Billion Triple Challenge 2010 dataset.
The European Radiobiology Archives (ERA)--content, structure and use illustrated by an example.

PubMed

Gerber, G B; Wick, R R; Kellerer, A M; Hopewell, J W; Di Majo, V; Dudoignon, N; Gössner, W; Stather, J

2006-01-01

The European Radiobiology Archives (ERA), supported by the European Commission and the European Late Effect Project Group (EULEP), together with the US National Radiobiology Archives (NRA) and the Japanese Radiobiology Archives (JRA) have collected all information still available on long-term animal experiments, including some selected human studies. The archives consist of a database in Microsoft Access, a website, databases of references and information on the use of the database. At present, the archives contain a description of the exposure conditions, animal strains, etc. from approximately 350,000 individuals; data on survival and pathology are available from approximately 200,000 individuals. Care has been taken to render pathological diagnoses compatible among different studies and to allow the lumping of pathological diagnoses into more general classes. 'Forms' in Access with an underlying computer code facilitate the use of the database. This paper describes the structure and content of the archives and illustrates an example for a possible analysis of such data.

DDRprot: a database of DNA damage response-related proteins.

PubMed

Andrés-León, Eduardo; Cases, Ildefonso; Arcas, Aida; Rojas, Ana M

2016-01-01

The DNA Damage Response (DDR) signalling network is an essential system that protects the genome's integrity. The DDRprot database presented here is a resource that integrates manually curated information on the human DDR network and its sub-pathways. For each particular DDR protein, we present detailed information about its function. Where DDR proteins are involved in post-translational modifications (PTMs) with each other, we depict the position of the modified residue/s in the three-dimensional structures, when resolved structures are available for the proteins. All this information is linked to the original publication from which it was obtained. Phylogenetic information is also shown, including time of emergence and conservation across 47 selected species, family trees and sequence alignments of homologues. The DDRprot database can be queried by different criteria: pathways, species, evolutionary age or involvement in PTMs. Sequence searches using hidden Markov models can also be used. Database URL: http://ddr.cbbio.es. © The Author(s) 2016. Published by Oxford University Press.
KnotProt: a database of proteins with knots and slipknots

PubMed Central

Jamroz, Michal; Niemyska, Wanda; Rawdon, Eric J.; Stasiak, Andrzej; Millett, Kenneth C.; Sułkowski, Piotr; Sulkowska, Joanna I.

2015-01-01

The protein topology database KnotProt, http://knotprot.cent.uw.edu.pl/, collects information about protein structures with open polypeptide chains forming knots or slipknots. The knotting complexity of the cataloged proteins is presented in the form of a matrix diagram that shows users the knot type of the entire polypeptide chain and of each of its subchains. The pattern visible in the matrix gives the knotting fingerprint of a given protein and permits users to determine, for example, the minimal length of the knotted regions (knot's core size) or the depth of a knot, i.e. how many amino acids can be removed from either end of the cataloged protein structure before converting it from a knot to a different type of knot. In addition, the database presents extensive information about the biological functions, families and fold types of proteins with non-trivial knotting. As an additional feature, the KnotProt database enables users to submit protein or polymer chains and generate their knotting fingerprints. PMID:25361973
EDULISS: a small-molecule database with data-mining and pharmacophore searching capabilities

PubMed Central

Hsin, Kun-Yi; Morgan, Hugh P.; Shave, Steven R.; Hinton, Andrew C.; Taylor, Paul; Walkinshaw, Malcolm D.

2011-01-01

We present the relational database EDULISS (EDinburgh University Ligand Selection System), which stores structural, physicochemical and pharmacophoric properties of small molecules. The database comprises a collection of over 4 million commercially available compounds from 28 different suppliers. A user-friendly web-based interface for EDULISS (available at http://eduliss.bch.ed.ac.uk/) has been established providing a number of data-mining possibilities. For each compound a single 3D conformer is stored along with over 1600 calculated descriptor values (molecular properties). A very efficient method for unique compound recognition, especially for a large scale database, is demonstrated by making use of small subgroups of the descriptors. Many of the shape and distance descriptors are held as pre-calculated bit strings permitting fast and efficient similarity and pharmacophore searches which can be used to identify families of related compounds for biological testing. Two ligand searching applications are given to demonstrate how EDULISS can be used to extract families of molecules with selected structural and biophysical features. PMID:21051336
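Pre-calculated bit strings make similarity searching cheap because two compounds can be compared with a handful of bitwise operations. The abstract does not name the similarity measure EDULISS uses; the sketch below assumes the Tanimoto coefficient, a common choice for bit-string fingerprints, and uses made-up 8-bit fingerprints and compound names purely for illustration.

```python
def popcount(bits):
    """Number of set bits in a non-negative integer."""
    return bin(bits).count("1")


def tanimoto(a, b):
    """Shared set bits divided by the bits set in either fingerprint."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 1.0


def search(query, library, threshold):
    """Return (name, similarity) pairs at or above threshold, best first."""
    scored = ((name, tanimoto(query, fp)) for name, fp in library.items())
    return sorted((s for s in scored if s[1] >= threshold), key=lambda s: -s[1])


# Made-up fingerprints; real descriptor bit strings would be far longer.
library = {
    "cmpd_a": 0b11110000,
    "cmpd_b": 0b11100001,
    "cmpd_c": 0b00001111,
}
hits = search(0b11110000, library, threshold=0.5)
```

With these placeholder values, `cmpd_a` matches the query exactly, `cmpd_b` shares three of five set bits (similarity 0.6), and `cmpd_c` shares none and falls below the threshold.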
EPOS Data and Service Provision

NASA Astrophysics Data System (ADS)

Bailo, Daniele; Jeffery, Keith G.; Atakan, Kuvvet; Harrison, Matt

2017-04-01

EPOS is now in IP (implementation phase) after a successful PP (preparatory phase). EPOS consists of essentially two components, one ICS (Integrated Core Services) representing the integrating ICT (Information and Communication Technology) and many TCS (Thematic Core Services) representing the scientific domains. The architecture developed, demonstrated and agreed within the project during the PP is now being developed utilising co-design with the TCS teams and agile, spiral methods within the ICS team. The 'heart' of EPOS is the metadata catalog. This provides for the ICS a digital representation of the TCS assets (services, data, software, equipment, expertise…) thus facilitating access, interoperation and (re-)use. A major part of the work has been interactions with the TCS. The original intention to harvest information from the TCS required (and still requires) discussions to understand fully the TCS organisational structures linked with rights, security and privacy; their (meta)data syntax (structure) and semantics (meaning); their workflows and methods of working and the services offered. To complicate matters further, the TCS are each at varying stages of development and the ICS design has to accommodate pre-existing, developing and expected future standards for metadata, data, software and processes. Through information documents, questionnaires and interviews/meetings the EPOS ICS team has collected DDSS (Data, Data Products, Software and Services) information from the TCS. The ICS team developed a simplified metadata model for presentation to the TCS and the ICS team will perform the mapping and conversion from this model to the internal detailed technical metadata model using CERIF (an EU recommendation to Member States maintained, developed and promoted by euroCRIS, www.eurocris.org).
At the time of writing the final modifications of the EPOS metadata model are being made, and the mappings to CERIF designed, prior to the main phase of (meta)data collection into the EPOS metadata catalog. In parallel work proceeds on the user interface software, the APIs (Application Programming Interfaces) to the TCS services, the harvesting method and software, the AAAI (Authentication, Authorisation, Accounting Infrastructure) and the system manager. The next steps will involve interfaces to ICS-D (Distributed ICS i.e. facilities and services for computing, data storage, detectors and instruments for data collection etc.) to which requests, software and data will be deployed and from which data will be generated. Associated with this will be the development of the workflow system which will assist the end-user in building a workflow to achieve the scientific objectives.
Clinical Databases for Chest Physicians.

PubMed

Courtwright, Andrew M; Gabriel, Peter E

2018-04-01

A clinical database is a repository of patient medical and sociodemographic information focused on one or more specific health condition or exposure. Although clinical databases may be used for research purposes, their primary goal is to collect and track patient data for quality improvement, quality assurance, and/or actual clinical management. This article aims to provide an introduction and practical advice on the development of small-scale clinical databases for chest physicians and practice groups. Through example projects, we discuss the pros and cons of available technical platforms, including Microsoft Excel and Access, relational database management systems such as Oracle and PostgreSQL, and Research Electronic Data Capture. We consider approaches to deciding the base unit of data collection, creating consensus around variable definitions, and structuring routine clinical care to complement database aims. We conclude with an overview of regulatory and security considerations for clinical databases. Copyright © 2018 American College of Chest Physicians. Published by Elsevier Inc. All rights reserved.
The Xeno-glycomics database (XDB): a relational database of qualitative and quantitative pig glycome repertoire.

PubMed

Park, Hae-Min; Park, Ju-Hyeong; Kim, Yoon-Woo; Kim, Kyoung-Jin; Jeong, Hee-Jin; Jang, Kyoung-Soon; Kim, Byung-Gee; Kim, Yun-Gon

2013-11-15

In recent years, the improvement of mass spectrometry-based glycomics techniques (i.e. highly sensitive, quantitative and high-throughput analytical tools) has enabled us to obtain a large dataset of glycans. Here we present a database named Xeno-glycomics database (XDB) that contains cell- or tissue-specific pig glycomes analyzed with mass spectrometry-based techniques, including a comprehensive pig glycan information on chemical structures, mass values, types and relative quantities. It was designed as a user-friendly web-based interface that allows users to query the database according to pig tissue/cell types or glycan masses. This database will contribute in providing qualitative and quantitative information on glycomes characterized from various pig cells/organs in xenotransplantation and might eventually provide new targets in the α1,3-galactosyltransferase gene-knock out pigs era. The database can be accessed on the web at http://bioinformatics.snu.ac.kr/xdb.
A Framework for Cloudy Model Optimization and Database Storage

NASA Astrophysics Data System (ADS)

Calvén, Emilia; Helton, Andrew; Sankrit, Ravi

2018-01-01

We present a framework for producing Cloudy photoionization models of the nebular emission from novae ejecta and storing a subset of the results in SQL database format for later usage. The database can be searched for models best fitting observed spectral line ratios. Additionally, the framework includes an optimization feature that can be used in tandem with the database to search for and improve on models by creating new Cloudy models while varying the parameters. The database search and optimization can be used to explore the structures of nebulae by deriving their properties from the best-fit models. The goal is to provide the community with a large database of Cloudy photoionization models, generated from parameters reflecting conditions within novae ejecta, that can be easily fitted to observed spectral lines, either by directly accessing the database using the framework code or by usage of a website specifically made for this purpose.
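To make the idea of searching a model database for the best fit to observed line ratios concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, the two ratio columns, and the stored numbers are invented for illustration; the abstract does not describe the framework's actual schema or fitting metric, and a simple sum of squared deviations is assumed here.

```python
import sqlite3


def demo_connection():
    """Build an in-memory database with a hypothetical 'models' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE models (id INTEGER PRIMARY KEY, ratio_a REAL, ratio_b REAL)"
    )
    # Made-up line ratios standing in for stored model results.
    conn.executemany(
        "INSERT INTO models VALUES (?, ?, ?)",
        [(1, 1.0, 2.0), (2, 3.0, 0.5), (3, 2.9, 0.6)],
    )
    return conn


def best_fit_models(conn, observed, limit=3):
    """Rank stored models by squared deviation from the observed line ratios."""
    a, b = observed
    query = """
        SELECT id,
               (ratio_a - ?) * (ratio_a - ?) + (ratio_b - ?) * (ratio_b - ?) AS misfit
        FROM models
        ORDER BY misfit ASC
        LIMIT ?
    """
    return conn.execute(query, (a, a, b, b, limit)).fetchall()
```

Calling best_fit_models(demo_connection(), (3.0, 0.5)) returns the stored models ordered from smallest to largest misfit; an optimization step of the kind the abstract mentions could then start from the top-ranked rows.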
SPARQLGraph: a web-based platform for graphically querying biological Semantic Web databases.

PubMed

Schweiger, Dominik; Trajanoski, Zlatko; Pabinger, Stephan

2014-08-15

Semantic Web has established itself as a framework for using and sharing data across applications and database boundaries. Here, we present a web-based platform for querying biological Semantic Web databases in a graphical way. SPARQLGraph offers an intuitive drag & drop query builder, which converts the visual graph into a query and executes it on a public endpoint. The tool integrates several publicly available Semantic Web databases, including the databases of the just recently released EBI RDF platform. Furthermore, it provides several predefined template queries for answering biological questions. Users can easily create and save new query graphs, which can also be shared with other researchers. This new graphical way of creating queries for biological Semantic Web databases considerably facilitates usability as it removes the requirement of knowing specific query languages and database structures. The system is freely available at http://sparqlgraph.i-med.ac.at.
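The conversion of a visual graph into a query can be pictured with a small sketch. It assumes the drawn graph is available as a list of (subject, predicate, object) edges; the function name, the edge representation, and the example prefix are hypothetical and are not taken from SPARQLGraph itself.

```python
def graph_to_sparql(edges, select_vars, prefixes=None):
    """Turn (subject, predicate, object) edges into a SPARQL SELECT query string."""
    # One PREFIX declaration per namespace used in the edges.
    lines = [f"PREFIX {name}: <{iri}>" for name, iri in (prefixes or {}).items()]
    lines.append("SELECT " + " ".join(select_vars))
    lines.append("WHERE {")
    # Each edge of the visual graph becomes one triple pattern.
    lines.extend(f"  {s} {p} {o} ." for s, p, o in edges)
    lines.append("}")
    return "\n".join(lines)
```

For example, a single edge from ?protein to ?gene labelled ex:encodedBy yields a query with one triple pattern; the resulting string could then be sent to a public endpoint by whatever client the application uses.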
Development of a replicated database of DHCP data for evaluation of drug use.

PubMed Central

Graber, S E; Seneker, J A; Stahl, A A; Franklin, K O; Neel, T E; Miller, R A

1996-01-01

This case report describes development and testing of a method to extract clinical information stored in the Veterans Affairs (VA) Decentralized Hospital Computer System (DHCP) for the purpose of analyzing data about groups of patients. The authors used a microcomputer-based, structured query language (SQL)-compatible, relational database system to replicate a subset of the Nashville VA Hospital's DHCP patient database. This replicated database contained the complete current Nashville DHCP prescription, provider, patient, and drug data sets, and a subset of the laboratory data. A pilot project employed this replicated database to answer questions that might arise in drug-use evaluation, such as identification of cases of polypharmacy, suboptimal drug regimens, and inadequate laboratory monitoring of drug therapy. These database queries included as candidates for review all prescriptions for all outpatients. The queries demonstrated that specific drug-use events could be identified for any time interval represented in the replicated database. PMID:8653451
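A query of the kind described, flagging possible polypharmacy within a chosen time interval, might look like the following sketch. The prescriptions table, its columns, the threshold of distinct drugs, and all rows are assumptions made for illustration; the report does not give the replicated database's schema or its criteria.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory stand-in for a replicated prescription table (invented rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE prescriptions (patient_id INTEGER, drug_id INTEGER, fill_date TEXT)"
    )
    rows = [
        (1, 10, "1996-01-03"), (1, 11, "1996-01-10"), (1, 12, "1996-01-20"),
        (1, 13, "1996-03-01"),
        (2, 10, "1996-01-05"), (2, 10, "1996-01-25"), (2, 11, "1996-01-12"),
    ]
    conn.executemany("INSERT INTO prescriptions VALUES (?, ?, ?)", rows)
    return conn


def polypharmacy_cases(conn, start, end, min_drugs=5):
    """List patients with at least min_drugs distinct drugs filled in [start, end]."""
    query = """
        SELECT patient_id, COUNT(DISTINCT drug_id) AS n_drugs
        FROM prescriptions
        WHERE fill_date BETWEEN ? AND ?
        GROUP BY patient_id
        HAVING COUNT(DISTINCT drug_id) >= ?
        ORDER BY patient_id
    """
    return conn.execute(query, (start, end, min_drugs)).fetchall()
```

Dates are stored as ISO 8601 strings so that BETWEEN compares them correctly as text; changing the two date arguments restricts the review to any interval represented in the table.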
G-Hash: Towards Fast Kernel-based Similarity Search in Large Graph Databases.

PubMed

Wang, Xiaohong; Smalter, Aaron; Huan, Jun; Lushington, Gerald H

2009-01-01

Structured data including sets, sequences, trees and graphs, pose significant challenges to fundamental aspects of data management such as efficient storage, indexing, and similarity search. With the fast accumulation of graph databases, similarity search in graph databases has emerged as an important research topic. Graph similarity search has applications in a wide range of domains including cheminformatics, bioinformatics, sensor network management, social network management, and XML documents, among others. Most of the current graph indexing methods focus on subgraph query processing, i.e. determining the set of database graphs that contains the query graph, and hence do not directly support similarity search. In data mining and machine learning, various graph kernel functions have been designed to capture the intrinsic similarity of graphs. Though successful in constructing accurate predictive and classification models for supervised learning, graph kernel functions have (i) high computational complexity and (ii) non-trivial difficulty to be indexed in a graph database. Our objective is to bridge graph kernel function and similarity search in graph databases by proposing (i) a novel kernel-based similarity measurement and (ii) an efficient indexing structure for graph data management. Our method of similarity measurement builds upon local features extracted from each node and their neighboring nodes in graphs. A hash table is utilized to support efficient storage and fast search of the extracted local features. Using the hash table, a graph kernel function is defined to capture the intrinsic similarity of graphs and for fast similarity query processing. We have implemented our method, which we have named G-hash, and have demonstrated its utility on large chemical graph databases. Our results show that the G-hash method achieves state-of-the-art performance for k-nearest neighbor (k-NN) classification. Most importantly, the new similarity measurement and the index structure are scalable to large databases with smaller indexing size, faster indexing construction time, and faster query processing time as compared to state-of-the-art indexing methods such as C-tree, gIndex, and GraphGrep.
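The combination of local node features, a hash table, and a kernel can be illustrated with a deliberately simplified sketch. It is not the authors' G-hash implementation: here each node is described only by its own label and the sorted labels of its neighbors, a Python Counter plays the role of the hash table, and the kernel simply counts pairs of nodes with identical features. All function names are hypothetical.

```python
from collections import Counter


def node_features(labels, adjacency):
    """Hash each node by its label plus the sorted labels of its neighbors."""
    feats = []
    for node, label in labels.items():
        neighbor_labels = sorted(labels[n] for n in adjacency.get(node, ()))
        feats.append((label, tuple(neighbor_labels)))
    # Counter is a hash table from feature key to number of nodes with that feature.
    return Counter(feats)


def hash_kernel(features_a, features_b):
    """Count node pairs, one from each graph, that share an identical local feature."""
    return sum(
        count * features_b[key]
        for key, count in features_a.items()
        if key in features_b
    )
```

Because the features of every database graph can be computed once and stored, a query only needs hash lookups on its own feature keys, which is the intuition behind using such a table for fast similarity search.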
Structure for Storing Properties of Particles (PoP)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Patel, N. R.; Mattoon, C. M.; Beck, B. R.

2014-06-01

Evaluated nuclear databases are critical for applications such as nuclear energy, nuclear medicine, homeland security, and stockpile stewardship. Particle masses, nuclear excitation levels, and other “Properties of Particles” are essential for making evaluated nuclear databases. Currently, these properties are obtained from various databases that are stored in outdated formats. A “Properties of Particles” (PoP) structure is being designed that will allow storing all information for one or more particles in a single place, so that each evaluation, simulation, model calculation, etc. can link to the same data. Information provided in PoP will include properties of nuclei, gammas and electrons (along with other particles such as pions, as evaluations extend to higher energies). Presently, PoP includes masses from the Atomic Mass Evaluation version 2003 (AME2003), and level schemes and gamma decays from the Reference Input Parameter Library (RIPL-3). The data are stored in a hierarchical structure. An example of how PoP stores nuclear masses and energy levels will be presented here.
When a domain isn’t a domain, and why it’s important to properly filter proteins in databases

PubMed Central

Towse, Clare-Louise; Daggett, Valerie

2013-01-01

Summary Membership in a protein domain database does not a domain make; a feature we realized when generating a consensus view of protein fold space with our Consensus Domain Dictionary (CDD). This dictionary was used to select representative structures for characterization of the protein dynameome: the Dynameomics initiative. Through this endeavor we rejected a surprising 40% of the 1695 folds in the CDD as being non-autonomous folding units. Although some of this was due to the challenges of grouping similar fold topologies, the dissonance between the cataloguing and structural qualification of protein domains remains surprising. Another potential factor is previously overlooked intrinsic disorder; predicted estimates suggest 40% of proteins to have either local or global disorder. One thing is clear, filtering a structural database and ensuring a consistent definition for protein domains is crucial, and caution is prescribed when generalizations of globular domains are drawn from unfiltered protein domain datasets. PMID:23108912
Knowledge Discovery in Variant Databases Using Inductive Logic Programming

PubMed Central

Nguyen, Hoan; Luu, Tien-Dao; Poch, Olivier; Thompson, Julie D.

2013-01-01

Understanding the effects of genetic variation on the phenotype of an individual is a major goal of biomedical research, especially for the development of diagnostics and effective therapeutic solutions. In this work, we describe the use of a recent knowledge discovery from database (KDD) approach using inductive logic programming (ILP) to automatically extract knowledge about human monogenic diseases. We extracted background knowledge from MSV3d, a database of all human missense variants mapped to 3D protein structure. In this study, we identified 8,117 mutations in 805 proteins with known three-dimensional structures that were known to be involved in human monogenic disease. Our results help to improve our understanding of the relationships between structural, functional or evolutionary features and deleterious mutations. Our inferred rules can also be applied to predict the impact of any single amino acid replacement on the function of a protein. The interpretable rules are available at http://decrypthon.igbmc.fr/kd4v/. PMID:23589683
OnTheFly: a database of Drosophila melanogaster transcription factors and their binding sites.

PubMed

Shazman, Shula; Lee, Hunjoong; Socol, Yakov; Mann, Richard S; Honig, Barry

2014-01-01

We present OnTheFly (http://bhapp.c2b2.columbia.edu/OnTheFly/index.php), a database comprising a systematic collection of transcription factors (TFs) of Drosophila melanogaster and their DNA-binding sites. TFs predicted in the Drosophila melanogaster genome are annotated and classified and their structures, obtained via experiment or homology models, are provided. All known preferred TF DNA-binding sites obtained from the B1H, DNase I and SELEX methodologies are presented. DNA shape parameters predicted for these sites are obtained from a high throughput server or from crystal structures of protein-DNA complexes where available. An important feature of the database is that all DNA-binding domains and their binding sites are fully annotated in a eukaryote using structural criteria and evolutionary homology. OnTheFly thus provides a comprehensive view of TFs and their binding sites that will be a valuable resource for deciphering non-coding regulatory DNA.
Structure for Storing Properties of Particles (PoP)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Patel, N.R., E-mail: infinidhi@llnl.gov; Mattoon, C.M.; Beck, B.R.

2014-06-15

Evaluated nuclear databases are critical for applications such as nuclear energy, nuclear medicine, homeland security, and stockpile stewardship. Particle masses, nuclear excitation levels, and other “Properties of Particles” are essential for making evaluated nuclear databases. Currently, these properties are obtained from various databases that are stored in outdated formats. A “Properties of Particles” (PoP) structure is being designed that will allow storing all information for one or more particles in a single place, so that each evaluation, simulation, model calculation, etc. can link to the same data. Information provided in PoP will include properties of nuclei, gammas and electrons (along with other particles such as pions, as evaluations extend to higher energies). Presently, PoP includes masses from the Atomic Mass Evaluation version 2003 (AME2003), and level schemes and gamma decays from the Reference Input Parameter Library (RIPL-3). The data are stored in a hierarchical structure. An example of how PoP stores nuclear masses and energy levels will be presented here.
Knowledge Representation: A Brief Review.

ERIC Educational Resources Information Center

Vickery, B. C.

1986-01-01

Reviews different structures and techniques of knowledge representation: structure of database records and files, data structures in computer programming, syntactic and semantic structure of natural language, knowledge representation in artificial intelligence, and models of human memory. A prototype expert system that makes use of some of these…
Flexible network reconstruction from relational databases with Cytoscape and CytoSQL

PubMed Central

2010-01-01

Background Molecular interaction networks can be efficiently studied using network visualization software such as Cytoscape. The relevant nodes, edges and their attributes can be imported in Cytoscape in various file formats, or directly from external databases through specialized third party plugins. However, molecular data are often stored in relational databases with their own specific structure, for which dedicated plugins do not exist. Therefore, a more generic solution is presented. Results A new Cytoscape plugin 'CytoSQL' is developed to connect Cytoscape to any relational database. It allows to launch SQL ('Structured Query Language') queries from within Cytoscape, with the option to inject node or edge features of an existing network as SQL arguments, and to convert the retrieved data to Cytoscape network components. Supported by a set of case studies we demonstrate the flexibility and the power of the CytoSQL plugin in converting specific data subsets into meaningful network representations. Conclusions CytoSQL offers a unified approach to let Cytoscape interact with relational databases. Thanks to the power of the SQL syntax, this tool can rapidly generate and enrich networks according to very complex criteria. The plugin is available at http://www.ptools.ua.ac.be/CytoSQL. PMID:20594316
Flexible network reconstruction from relational databases with Cytoscape and CytoSQL.

PubMed

Laukens, Kris; Hollunder, Jens; Dang, Thanh Hai; De Jaeger, Geert; Kuiper, Martin; Witters, Erwin; Verschoren, Alain; Van Leemput, Koenraad

2010-07-01

Molecular interaction networks can be efficiently studied using network visualization software such as Cytoscape. The relevant nodes, edges and their attributes can be imported in Cytoscape in various file formats, or directly from external databases through specialized third party plugins. However, molecular data are often stored in relational databases with their own specific structure, for which dedicated plugins do not exist. Therefore, a more generic solution is presented. A new Cytoscape plugin 'CytoSQL' is developed to connect Cytoscape to any relational database. It allows to launch SQL ('Structured Query Language') queries from within Cytoscape, with the option to inject node or edge features of an existing network as SQL arguments, and to convert the retrieved data to Cytoscape network components. Supported by a set of case studies we demonstrate the flexibility and the power of the CytoSQL plugin in converting specific data subsets into meaningful network representations. CytoSQL offers a unified approach to let Cytoscape interact with relational databases. Thanks to the power of the SQL syntax, this tool can rapidly generate and enrich networks according to very complex criteria. The plugin is available at http://www.ptools.ua.ac.be/CytoSQL.

OrChem - An open source chemistry search engine for Oracle®

PubMed Central

2009-01-01

Background Registration, indexing and searching of chemical structures in relational databases is one of the core areas of cheminformatics. However, little detail has been published on the inner workings of search engines and their development has been mostly closed-source. We decided to develop an open source chemistry extension for Oracle, the de facto database platform in the commercial world. Results Here we present OrChem, an extension for the Oracle 11G database that adds registration and indexing of chemical structures to support fast substructure and similarity searching. The cheminformatics functionality is provided by the Chemistry Development Kit. OrChem provides similarity searching with response times in the order of seconds for databases with millions of compounds, depending on a given similarity cut-off. For substructure searching, it can make use of multiple processor cores on today's powerful database servers to provide fast response times in equally large data sets. Availability OrChem is free software and can be redistributed and/or modified under the terms of the GNU Lesser General Public License as published by the Free Software Foundation. All software is available via http://orchem.sourceforge.net. PMID:20298521
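Similarity searching against a cut-off can be sketched in a few lines. The abstract does not say which similarity measure is used, so the Tanimoto coefficient on fingerprint bit sets is assumed here because it is a common choice in cheminformatics; the function names and the toy fingerprints are hypothetical and unrelated to OrChem's actual code.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def similarity_search(query_fp, database, cutoff):
    """Return (name, score) pairs scoring at or above the cut-off, best first."""
    hits = [(name, tanimoto(query_fp, fp)) for name, fp in database.items()]
    return sorted((h for h in hits if h[1] >= cutoff), key=lambda h: -h[1])
```

Raising the cut-off shrinks the hit list, which is one reason response time in a real engine can depend on the chosen threshold: fewer candidates need to be scored in full.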
sc-PDB: a 3D-database of ligandable binding sites--10 years on.

PubMed

Desaphy, Jérémy; Bret, Guillaume; Rognan, Didier; Kellenberger, Esther

2015-01-01

The sc-PDB database (available at http://bioinfo-pharma.u-strasbg.fr/scPDB/) is a comprehensive and up-to-date selection of ligandable binding sites of the Protein Data Bank. Sites are defined from complexes between a protein and a pharmacological ligand. The database provides the all-atom description of the protein, its ligand, their binding site and their binding mode. Currently, the sc-PDB archive registers 9283 binding sites from 3678 unique proteins and 5608 unique ligands. The sc-PDB database was publicly launched in 2004 with the aim of providing structure files suitable for computational approaches to drug design, such as docking. During the last 10 years we have improved and standardized the processes for (i) identifying binding sites, (ii) correcting structures, (iii) annotating protein function and ligand properties and (iv) characterizing their binding mode. This paper presents the latest enhancements in the database, specifically pertaining to the representation of molecular interaction and to the similarity between ligand/protein binding patterns. The new website puts emphasis in pictorial analysis of data. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
CamMedNP: building the Cameroonian 3D structural natural products database for virtual screening.

PubMed

Ntie-Kang, Fidele; Mbah, James A; Mbaze, Luc Meva'a; Lifongo, Lydia L; Scharfe, Michael; Hanna, Joelle Ngo; Cho-Ngwa, Fidelis; Onguéné, Pascal Amoa; Owono Owono, Luc C; Megnassan, Eugene; Sippl, Wolfgang; Efange, Simon M N

2013-04-16

Computer-aided drug design (CADD) often involves virtual screening (VS) of large compound datasets and the availability of such is vital for drug discovery protocols. We present CamMedNP - a new database beginning with more than 2,500 compounds of natural origin, along with some of their derivatives which were obtained through hemisynthesis. These are pure compounds which have been previously isolated and characterized using modern spectroscopic methods and published by several research teams spread across Cameroon. In the present study, 224 distinct medicinal plant species belonging to 55 plant families from the Cameroonian flora have been considered. About 80 % of these have been previously published and/or referenced in internationally recognized journals. For each compound, the optimized 3D structure, drug-like properties, plant source, collection site and currently known biological activities are given, as well as literature references. We have evaluated the "drug-likeness" of this database using Lipinski's "Rule of Five". A diversity analysis has been carried out in comparison with the ChemBridge diverse database. CamMedNP could be highly useful for database screening and natural product lead generation programs.
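Lipinski's "Rule of Five" check mentioned above is simple enough to sketch directly. The thresholds are the standard ones (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors); the function name is hypothetical, and the descriptor values are assumed to have been computed beforehand by some cheminformatics tool not shown here.

```python
def lipinski_violations(mol_weight, logp, h_donors, h_acceptors):
    """Count how many of Lipinski's four criteria a compound violates."""
    checks = [
        mol_weight > 500,   # molecular weight above 500 Da
        logp > 5,           # octanol-water partition coefficient above 5
        h_donors > 5,       # more than 5 hydrogen-bond donors
        h_acceptors > 10,   # more than 10 hydrogen-bond acceptors
    ]
    return sum(checks)
```

A compound is commonly treated as drug-like when it has at most one violation, so a database can be profiled by tallying this count over all entries.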
Deducing chemical structure from crystallographically determined atomic coordinates

PubMed Central

Bruno, Ian J.; Shields, Gregory P.; Taylor, Robin

2011-01-01

An improved algorithm has been developed for assigning chemical structures to incoming entries to the Cambridge Structural Database, using only the information available in the deposited CIF. Steps in the algorithm include detection of bonds, selection of polymer unit, resolution of disorder, and assignment of bond types and formal charges. The chief difficulty is posed by the large number of metallo-organic crystal structures that must be processed, given our aspiration that assigned chemical structures should accurately reflect properties such as the oxidation states of metals and redox-active ligands, metal coordination numbers and hapticities, and the aromaticity or otherwise of metal ligands. Other complications arise from disorder, especially when it is symmetry imposed or modelled with the SQUEEZE algorithm. Each assigned structure is accompanied by an estimate of reliability and, where necessary, diagnostic information indicating probable points of error. Although the algorithm was written to aid building of the Cambridge Structural Database, it has the potential to develop into a general-purpose tool for adding chemical information to newly determined crystal structures. PMID:21775812
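The first step listed, detection of bonds from atomic coordinates, is often approached with a distance criterion. The sketch below uses that common heuristic: two atoms are taken as bonded when their separation does not exceed the sum of their covalent radii plus a tolerance. It is not the algorithm used for the Cambridge Structural Database; the radii are approximate values for a few elements only, and the tolerance of 0.4 angstroms is an assumption.

```python
import itertools
import math

# Approximate single-bond covalent radii in angstroms (illustrative subset only).
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}


def detect_bonds(atoms, tolerance=0.4):
    """Return index pairs whose separation is within the summed radii plus a tolerance.

    atoms is a list of (element_symbol, (x, y, z)) with Cartesian coordinates in angstroms.
    """
    bonds = []
    for (i, (el_i, pos_i)), (j, (el_j, pos_j)) in itertools.combinations(
        enumerate(atoms), 2
    ):
        limit = COVALENT_RADII[el_i] + COVALENT_RADII[el_j] + tolerance
        if math.dist(pos_i, pos_j) <= limit:
            bonds.append((i, j))
    return bonds
```

A purely geometric test like this says nothing about bond types, formal charges, or disorder, which is why the later steps of the published algorithm are needed.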
DOE Office of Scientific and Technical Information (OSTI.GOV)

Bower, J.C.; Burford, M.J.; Downing, T.R.

The Integrated Baseline System (IBS) is an emergency management planning and analysis tool that is being developed under the direction of the US Army Nuclear and Chemical Agency (USANCA). The IBS Data Management Guide provides the background, as well as the operations and procedures needed to generate and maintain a site-specific map database. Data and system managers use this guide to manage the data files and database that support the administrative, user-environment, database management, and operational capabilities of the IBS. This document provides a description of the data files and structures necessary for running the IBS software and using the site map database.
GOBASE—a database of mitochondrial and chloroplast information

PubMed Central

O'Brien, Emmet A.; Badidi, Elarbi; Barbasiewicz, Ania; deSousa, Cristina; Lang, B. Franz; Burger, Gertraud

2003-01-01

GOBASE is a relational database containing integrated sequence, RNA secondary structure and biochemical and taxonomic information about organelles. GOBASE release 6 (summer 2002) contains over 130 000 mitochondrial sequences, an increase of 37% over the previous release, and more than 30 000 chloroplast sequences in a new auxiliary database. To handle this flood of new data, we have designed and implemented GOpop, a Java system for population and verification of the database. We have also implemented a more powerful and flexible user interface using the PHP programming language. http://megasun.bch.umontreal.ca/gobase/gobase.html. PMID:12519975
The IVTANTHERMO-Online database for thermodynamic properties of individual substances with web interface

NASA Astrophysics Data System (ADS)

Belov, G. V.; Dyachkov, S. A.; Levashov, P. R.; Lomonosov, I. V.; Minakov, D. V.; Morozov, I. V.; Sineva, M. A.; Smirnov, V. N.

2018-01-01

The database structure, main features and user interface of an IVTANTHERMO-Online system are reviewed. This system continues the series of the IVTANTHERMO packages developed in JIHT RAS. It includes the database for thermodynamic properties of individual substances and related software for analysis of experimental results, data fitting, calculation and estimation of thermodynamical functions and thermochemistry quantities. In contrast to the previous IVTANTHERMO versions it has a new extensible database design, the client-server architecture, a user-friendly web interface with a number of new features for online and offline data processing.
Topology-Scaling Identification of Layered Solids and Stable Exfoliated 2D Materials.

PubMed

Ashton, Michael; Paul, Joshua; Sinnott, Susan B; Hennig, Richard G

2017-03-10

The Materials Project crystal structure database has been searched for materials possessing layered motifs in their crystal structures using a topology-scaling algorithm. The algorithm identifies and measures the sizes of bonded atomic clusters in a structure's unit cell, and determines their scaling with cell size. The search yielded 826 stable layered materials that are considered as candidates for the formation of two-dimensional monolayers via exfoliation. Density-functional theory was used to calculate the exfoliation energy of each material and 680 monolayers emerge with exfoliation energies below those of already-existent two-dimensional materials. The crystal structures of these two-dimensional materials provide templates for future theoretical searches of stable two-dimensional materials. The optimized structures and other calculated data for all 826 monolayers are provided at our database (https://materialsweb.org).
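The scaling idea can be illustrated with a toy sketch: find the largest bonded cluster in the unit cell and in an n x n x n supercell, then see how its size grows with n. Growth by a factor of n^2 suggests a layered (two-dimensional) motif, n^3 a three-dimensional network, n a chain, and no growth an isolated molecule. The code below is not the authors' implementation; it assumes an orthorhombic cell, a single distance cutoff for bonding, and hypothetical function names.

```python
import itertools
import math


def largest_cluster(frac_coords, cell, n, cutoff):
    """Size of the largest bonded cluster in an n x n x n periodic supercell."""
    lengths = [n * edge for edge in cell]
    # Replicate the unit-cell atoms into Cartesian positions in the supercell.
    atoms = [
        [(f[k] + shift[k]) * cell[k] for k in range(3)]
        for shift in itertools.product(range(n), repeat=3)
        for f in frac_coords
    ]
    parent = list(range(len(atoms)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in itertools.combinations(range(len(atoms)), 2):
        d2 = 0.0
        for k in range(3):
            d = abs(atoms[i][k] - atoms[j][k])
            d = min(d, lengths[k] - d)  # minimum-image convention
            d2 += d * d
        if d2 <= cutoff ** 2:
            parent[find(i)] = find(j)

    sizes = {}
    for i in range(len(atoms)):
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values())


def scaling_dimension(frac_coords, cell, cutoff, n=2):
    """Estimate the dimensionality of the bonded network from cluster-size scaling."""
    ratio = largest_cluster(frac_coords, cell, n, cutoff) / largest_cluster(
        frac_coords, cell, 1, cutoff
    )
    return round(math.log(ratio, n))
```

With one atom per cell, edge lengths of (2.0, 2.0, 6.0) and a cutoff of 2.5, atoms bond only within the xy plane, so scaling_dimension returns 2, the signature of a layered solid in this toy model.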
Technologies and standards in the information systems of the soil-geographic database of Russia

NASA Astrophysics Data System (ADS)

Golozubov, O. M.; Rozhkov, V. A.; Alyabina, I. O.; Ivanov, A. V.; Kolesnikova, V. M.; Shoba, S. A.

2015-01-01

The achievements, problems, and challenges of the modern stage of the development of the Soil-Geographic Database of Russia (SGDBR) and the history of this project are outlined. The structure of the information system of the SGDBR as an internet-based resource to collect data on soil profiles and to integrate the geographic and attribute databases on the same platform is described. The pilot project in Rostov oblast illustrates the inclusion of regional information in the SGDBR and its application for solving practical problems. For the first time in Russia, the GeoRSS standard based on the structured hypertext representation of the geographic and attribute information has been applied in the state system for the agromonitoring of agricultural lands in Rostov oblast and information exchange through the internet.
A study of the Immune Epitope Database for some fungi species using network topological indices.

PubMed

Vázquez-Prieto, Severo; Paniagua, Esperanza; Solana, Hugo; Ubeira, Florencio M; González-Díaz, Humberto

2017-08-01

In the last years, the encryption of system structure information with different network topological indices has been a very active field of research. In the present study, we assembled for the first time a complex network using data obtained from the Immune Epitope Database for fungi species, and we then considered the general topology, the node degree distribution, and the local structure of this network. We also calculated eight node centrality measures for the observed network and compared it with three theoretical models. In view of the results obtained, we may expect that the present approach can become a valuable tool to explore the complexity of this database, as well as for the storage, manipulation, comparison, and retrieval of information contained therein.
CD-ROM End-User Instruction: A Planning Model.

ERIC Educational Resources Information Center

Johnson, Mary E.; Rosen, Barbara S.

1990-01-01

Discusses methods and content of library instruction for CD-ROM searching in terms of the needs of end-users. Instructional methods explored include staff instruction, structured instruction, database documentation, tutorials and help screens, and floaters. Suggestions for effective instruction in transfer of skills, database content, database…
EPAs DSSTox Chemical Database: A Resource for the Non-Targeted Testing Community (EPA NTA workshop)

EPA Science Inventory

EPA’s DSSTox database project, which includes coverage of the ToxCast and Tox21 high-throughput testing inventories, provides high-quality chemical-structure files for inventories of toxicological and environmental relevance. A feature of the DSSTox project, which differentiates ...
DSSTox EPA Integrated Risk Information System Structure-Index Locator File: SDF File and Documentation

EPA Science Inventory

EPA's Integrated Risk Information System (IRIS) database was developed and is maintained by EPA's Office of Research and Development, National Center for Environmental Assessment. IRIS is a database of human health effects that may result from exposure to various substances fou...
A Magnetic Petrology Database for Satellite Magnetic Anomaly Interpretations

NASA Astrophysics Data System (ADS)

Nazarova, K.; Wasilewski, P.; Didenko, A.; Genshaft, Y.; Pashkevich, I.

2002-05-01

A Magnetic Petrology Database (MPDB) is now being compiled at NASA/Goddard Space Flight Center in cooperation with Russian and Ukrainian institutions. The purpose of this database is to provide the geomagnetic community with a comprehensive and user-friendly method of accessing magnetic petrology data via the Internet for more realistic interpretation of satellite magnetic anomalies. Magnetic petrology data have been accumulated at NASA/Goddard Space Flight Center, the United Institute of Physics of the Earth (Russia) and the Institute of Geophysics (Ukraine) over several decades, and the archives now hold many thousands of records. The MPDB has been, and continues to be, in high demand, especially since the recent launch into near-Earth orbit of a mini-constellation of three satellites, Oersted (in 1999), Champ (in 2000), and SAC-C (in 2000), which will provide lithospheric magnetic maps with better spatial and amplitude resolution (about 1 nT). The MPDB is focused on lower crustal and upper mantle rocks and will include data on mantle xenoliths, serpentinized ultramafic rocks, granulites, iron quartzites and rocks from Archean-Proterozoic metamorphic sequences from all around the world. A substantial amount of data comes from the unique Kursk Magnetic Anomaly area and the Kola Deep Borehole (which recovered 12 km of continental crust). A prototype MPDB can be found on the Geodynamics Branch web server of Goddard Space Flight Center at http://core2.gsfc.nasa.gov/terr_mag/magnpetr.html. The MPDB employs a searchable relational design and consists of 7 interrelated tables. The database schema is shown at http://core2.gsfc.nasa.gov/terr_mag/doc.html. The MPDB is implemented on a MySQL database server and is queried with SQL (Structured Query Language). PHP and CGI scripts are used for web programming and for presenting query results on the web. The prototype MPDB is designed to search the database by major satellite magnetic anomaly, tectonic structure, geographical location, rock type, magnetic properties, chemistry and reference; see http://core2.gsfc.nasa.gov/terr_mag/query1.html. Output is available as an HTML table, a text file, or a downloadable file. This database will be very useful for studies of lithospheric satellite magnetic anomalies on the Earth and other terrestrial planets.
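The kind of relational query this abstract describes (searching by rock type and returning magnetic properties) might look like the sketch below. It makes several assumptions: Python's built-in sqlite3 stands in for the MySQL server, the two tables and their columns are hypothetical and are not the MPDB schema, and the rows are invented illustrative values.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the MySQL server.
conn = sqlite3.connect(":memory:")

# Hypothetical two-table schema; the real MPDB has 7 interrelated tables.
conn.executescript("""
CREATE TABLE rock_sample (
    sample_id INTEGER PRIMARY KEY,
    rock_type TEXT NOT NULL,
    location  TEXT NOT NULL
);
CREATE TABLE magnetic_property (
    sample_id      INTEGER REFERENCES rock_sample(sample_id),
    susceptibility REAL  -- invented values, arbitrary units
);
""")
conn.executemany(
    "INSERT INTO rock_sample VALUES (?, ?, ?)",
    [(1, "granulite", "Kola"), (2, "serpentinite", "Kursk"), (3, "granulite", "Kursk")],
)
conn.executemany(
    "INSERT INTO magnetic_property VALUES (?, ?)",
    [(1, 0.02), (2, 0.05), (3, 0.01)],
)

def samples_by_rock_type(rock_type):
    """Join the two tables and return (sample_id, location, susceptibility) rows."""
    cursor = conn.execute(
        """
        SELECT s.sample_id, s.location, m.susceptibility
        FROM rock_sample AS s
        JOIN magnetic_property AS m ON m.sample_id = s.sample_id
        WHERE s.rock_type = ?
        ORDER BY s.sample_id
        """,
        (rock_type,),  # parameter binding avoids building SQL from user input
    )
    return cursor.fetchall()

print(samples_by_rock_type("granulite"))
```

A web front end of the kind described would run a parameterized query like this one and render the returned rows as an HTML table or a text file.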
Toxicity of ionic liquids: database and prediction via quantitative structure-activity relationship method.

PubMed

Zhao, Yongsheng; Zhao, Jihong; Huang, Ying; Zhou, Qing; Zhang, Xiangping; Zhang, Suojiang

2014-08-15

A comprehensive database on the toxicity of ionic liquids (ILs) is established. The database includes over 4000 pieces of data. Based on the database, the relationship between an IL's structure and its toxicity has been analyzed qualitatively. Furthermore, a quantitative structure-activity relationship (QSAR) model is developed to predict the toxicities (EC50 values) of various ILs toward the leukemia rat cell line IPC-81. Four parameters selected by the heuristic method (HM) are used in multiple linear regression (MLR) and support vector machine (SVM) studies. For the training sets, the squared correlation coefficients (R²) of the two QSAR models are 0.918 and 0.959, and the root mean square errors (RMSE) are 0.258 and 0.179, respectively. For the test sets, the prediction R² and RMSE are 0.892 and 0.329 for the MLR model and 0.958 and 0.234 for the SVM model. The nonlinear SVM model substantially outperforms the MLR model, which indicates that the SVM model is more reliable for predicting the toxicity of ILs. This study shows that increasing the relative number of O atoms in the molecules leads to a decrease in the toxicity of ILs. Copyright © 2014 Elsevier B.V. All rights reserved.
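The two fit statistics reported in this abstract can be computed as in the sketch below. It assumes that the squared correlation coefficient means the squared Pearson correlation between observed and predicted values; the abstract does not give the formula, and the sample numbers are invented rather than taken from the study.

```python
import math

def squared_correlation(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)

def rmse(observed, predicted):
    """Root mean square error of the predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

# Invented log EC50 values, for illustration only.
observed = [2.1, 3.0, 3.8, 4.4, 5.2]
predicted = [2.3, 2.8, 3.9, 4.1, 5.5]
print(squared_correlation(observed, predicted), rmse(observed, predicted))
```

The two statistics answer different questions: the squared correlation measures how well predictions track the trend in the observations, while RMSE measures the typical size of the prediction error in the units of the response.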
Stability assessment of structures under earthquake hazard through GRID technology

NASA Astrophysics Data System (ADS)

Prieto Castrillo, F.; Boton Fernandez, M.

2009-04-01

This work presents a GRID framework to estimate the vulnerability of structures under earthquake hazard. The tool has been designed to cover the needs of a typical earthquake engineering stability analysis: preparation of input data (pre-processing), response computation and stability analysis (post-processing). In order to validate the application over GRID, a simplified model of a structure under artificially generated earthquake records has been implemented. To achieve this goal, the proposed scheme exploits the GRID technology and its main advantages (parallel intensive computing, huge storage capacity and collaboration analysis among institutions) through intensive interaction among the GRID elements (Computing Element, Storage Element, LHC File Catalogue, federated database, etc.). The dynamical model is described by a set of ordinary differential equations (ODEs) and by a set of parameters. Both elements, along with the integration engine, are encapsulated into Java classes. With this high-level design, subsequent improvements/changes of the model can be addressed with little effort. In the procedure, an earthquake record database is prepared and stored (pre-processing) in the GRID Storage Element (SE). The metadata of these records is also stored in the GRID federated database. This metadata contains both relevant information about the earthquake (as is usual in a seismic repository) and the Logical File Name (LFN) of the record for its later retrieval. Then, from the available set of accelerograms in the SE, the user can specify a range of earthquake parameters to carry out a dynamic analysis. This way, a GRID job is created for each selected accelerogram in the database. At the GRID Computing Element (CE), displacements are then obtained by numerical integration of the ODEs over time. The resulting response for that configuration is stored in the GRID Storage Element (SE) and the maximum structure displacement is computed. Then, the corresponding metadata containing the response LFN, earthquake magnitude and maximum structure displacement is also stored. Finally, the displacements are post-processed through a statistically based algorithm from the available metadata to obtain the probability of collapse of the structure for different earthquake magnitudes. From this study, it is possible to build a vulnerability report for the structure type and seismic data. The proposed methodology can be combined with the ongoing initiatives to build a European earthquake record database. In this context, Grid enables collaboration analysis over shared seismic data and results among different institutions.
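The per-accelerogram job described above (integrate the ODEs, record the maximum displacement, then post-process the maxima statistically) can be sketched as follows. Everything specific here is an assumption: the abstract does not give the dynamical model, so a damped single-degree-of-freedom oscillator is used; the authors' implementation is in Java, whereas this sketch is in Python; and the "probability of collapse" is approximated by the fraction of records whose peak displacement exceeds a hypothetical limit.

```python
import math

def max_displacement(accel, dt, omega=2 * math.pi, zeta=0.05):
    """Integrate x'' + 2*zeta*omega*x' + omega**2 * x = -a_g(t) with
    semi-implicit Euler steps and return the peak absolute displacement."""
    x = 0.0
    v = 0.0
    peak = 0.0
    for a_g in accel:
        a = -a_g - 2 * zeta * omega * v - omega ** 2 * x
        v += a * dt   # update velocity first (semi-implicit Euler)
        x += v * dt   # then update displacement with the new velocity
        peak = max(peak, abs(x))
    return peak

def collapse_fraction(peaks, limit):
    """Fraction of records whose peak displacement exceeds the limit."""
    return sum(p > limit for p in peaks) / len(peaks)

# Synthetic accelerograms (sine bursts of growing amplitude), illustration only.
dt = 0.01
records = [[amp * math.sin(6.0 * i * dt) for i in range(500)] for amp in (0.5, 1.0, 2.0)]
peaks = [max_displacement(rec, dt) for rec in records]
print(peaks, collapse_fraction(peaks, limit=0.05))
```

Because each record is processed independently, one call to `max_displacement` corresponds naturally to one job, and only the small per-record result needs to be gathered for the statistical step.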
LoopX: A Graphical User Interface-Based Database for Comprehensive Analysis and Comparative Evaluation of Loops from Protein Structures.

PubMed

Kadumuri, Rajashekar Varma; Vadrevu, Ramakrishna

2017-10-01

Due to their crucial role in function, folding, and stability, protein loops are being targeted for grafting/designing to create novel or alter existing functionality and improve stability and foldability. With a view to facilitating thorough analysis and effective search options for extracting and comparing loops for sequence and structural compatibility, we developed LoopX, a comprehensively compiled library of sequence and conformational features of ∼700,000 loops from protein structures. The database, equipped with a graphical user interface, offers diverse query tools and search algorithms, with various rendering options to visualize sequence- and structural-level information along with hydrogen bonding patterns and backbone φ, ψ dihedral angles of both the target and candidate loops. Two new features, (i) conservation of the polar/nonpolar environment and (ii) conservation of sequence and conformation of specific residues within the loops, have also been incorporated in the search and retrieval of compatible loops for a chosen target loop. Thus, the LoopX server not only serves as a database and visualization tool for sequence and structural analysis of protein loops but also aids in extracting and comparing candidate loops for a given target loop based on user-defined search options.
Virtual screening applications: a study of ligand-based methods and different structure representations in four different scenarios.

PubMed

Hristozov, Dimitar P; Oprea, Tudor I; Gasteiger, Johann

2007-01-01

Four different ligand-based virtual screening scenarios are studied: (1) prioritizing compounds for subsequent high-throughput screening (HTS); (2) selecting a predefined (small) number of potentially active compounds from a large chemical database; (3) assessing the probability that a given structure will exhibit a given activity; (4) selecting the most active structure(s) for a biological assay. Each of the four scenarios is exemplified by performing retrospective ligand-based virtual screening for eight different biological targets using two large databases, MDDR and WOMBAT. A comparison between the chemical spaces covered by these two databases is presented. The performance of two techniques for ligand-based virtual screening, similarity search with subsequent data fusion (SSDF) and novelty detection with Self-Organizing Maps (ndSOM), is investigated. Three different structure representations are compared: 2,048-dimensional Daylight fingerprints, topological autocorrelation weighted by atomic physicochemical properties (sigma electronegativity, polarizability, partial charge, and identity), and radial distribution functions weighted by the same atomic physicochemical properties. Both methods were found applicable in scenario one. The similarity search was found to perform slightly better in scenario two, while the SOM novelty detection is preferred in scenario three. No method/descriptor combination achieved significant success in scenario four.
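A similarity search with subsequent data fusion can be sketched in a few lines. The sketch assumes Tanimoto similarity on binary fingerprints and MAX fusion over several reference actives; the abstract does not state which similarity coefficient or fusion rule was used, and the fingerprints below are tiny invented sets of on-bit indices rather than 2,048-bit Daylight fingerprints.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def rank_by_max_fusion(references, library):
    """Score each library compound by its highest similarity to any reference
    active (MAX fusion), then sort by descending score, ties broken by name."""
    scores = {
        name: max(tanimoto(fp, ref) for ref in references)
        for name, fp in library.items()
    }
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical reference actives and screening library.
references = [{1, 2, 3, 4}, {10, 11, 12}]
library = {
    "cmpd_a": {1, 2, 3},
    "cmpd_b": {10, 11, 12, 13},
    "cmpd_c": {20, 21},
}
print(rank_by_max_fusion(references, library))
```

The top of the ranked list would then be the set of compounds prioritized for screening, which corresponds to scenarios one and two in the abstract.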
Automated compound classification using a chemical ontology.

PubMed

Bobach, Claudia; Böhme, Timo; Laube, Ulf; Püschel, Anett; Weber, Lutz

2012-12-29

Classification of chemical compounds into compound classes by using structure-derived descriptors is a well-established method to aid the evaluation and abstraction of compound properties in chemical compound databases. MeSH and recently ChEBI are examples of chemical ontologies that provide a hierarchical classification of compounds into general compound classes of biological interest based on their structural as well as property or use features. In these ontologies, compounds have been assigned manually to their respective classes. However, with the ever-increasing possibilities to extract new compounds from text documents using name-to-structure tools and considering the large number of compounds deposited in databases, automated and comprehensive chemical classification methods are needed to avoid the error-prone and time-consuming manual classification of compounds. In the present work we implement principles and methods to construct a chemical ontology of classes that shall support the automated, high-quality compound classification in chemical databases or text documents. While SMARTS expressions have already been used to define chemical structure class concepts, in the present work we have extended the expressive power of such class definitions by expanding their structure-based reasoning logic. Thus, to achieve the required precision and granularity of chemical class definitions, sets of SMARTS class definitions are connected by OR and NOT logical operators. In addition, AND logic has been implemented to allow the concomitant use of flexible atom lists and stereochemistry definitions. The resulting chemical ontology is a multi-hierarchical taxonomy of concept nodes connected by directed, transitive relationships. A proposal for a rule-based definition of chemical classes has been made that allows chemical compound classes to be defined more precisely than before. The proposed structure-based reasoning logic allows chemistry expert knowledge to be translated into a computer-interpretable form, preventing erroneous compound assignments and allowing automatic compound classification. The automated assignment of compounds in databases, compound structure files or text documents to their related ontology classes is possible through the integration with a chemical structure search engine. As an application example, the annotation of chemical structure files with a prototypic ontology is demonstrated.
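The idea of connecting pattern-based class definitions with OR, NOT and AND logic can be illustrated with a small sketch. It is not the authors' implementation: real SMARTS matching needs a cheminformatics toolkit, so each molecule is represented here only by the set of hypothetical pattern labels it is assumed to match, and the class definition is invented for illustration.

```python
def matches(label):
    """Rule that holds when the molecule matches the named pattern.
    A real system would run a SMARTS substructure search here."""
    return lambda mol: label in mol

def any_of(*rules):
    """OR: at least one rule holds."""
    return lambda mol: any(rule(mol) for rule in rules)

def all_of(*rules):
    """AND: every rule holds."""
    return lambda mol: all(rule(mol) for rule in rules)

def none_of(*rules):
    """NOT: no rule holds."""
    return lambda mol: not any(rule(mol) for rule in rules)

# Hypothetical class: carboxylic acids or esters, excluding anything with an amide.
acid_or_ester_not_amide = all_of(
    any_of(matches("carboxylic_acid"), matches("ester")),
    none_of(matches("amide")),
)

def classify(mol, class_rules):
    """Return the sorted names of all classes whose rule the molecule satisfies."""
    return sorted(name for name, rule in class_rules.items() if rule(mol))

class_rules = {
    "acid_or_ester_not_amide": acid_or_ester_not_amide,
    "amide": matches("amide"),
}
print(classify({"ester"}, class_rules))
print(classify({"ester", "amide"}, class_rules))
```

Because each class is a composable rule rather than a single pattern, exclusions such as the NOT clause above can prevent a compound from being assigned to a class it only superficially resembles.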
Automated compound classification using a chemical ontology

PubMed Central

2012-01-01

Background: Classification of chemical compounds into compound classes by using structure-derived descriptors is a well-established method to aid the evaluation and abstraction of compound properties in chemical compound databases. MeSH and recently ChEBI are examples of chemical ontologies that provide a hierarchical classification of compounds into general compound classes of biological interest based on their structural as well as property or use features. In these ontologies, compounds have been assigned manually to their respective classes. However, with the ever-increasing possibilities to extract new compounds from text documents using name-to-structure tools and considering the large number of compounds deposited in databases, automated and comprehensive chemical classification methods are needed to avoid the error-prone and time-consuming manual classification of compounds. Results: In the present work we implement principles and methods to construct a chemical ontology of classes that shall support the automated, high-quality compound classification in chemical databases or text documents. While SMARTS expressions have already been used to define chemical structure class concepts, in the present work we have extended the expressive power of such class definitions by expanding their structure-based reasoning logic. Thus, to achieve the required precision and granularity of chemical class definitions, sets of SMARTS class definitions are connected by OR and NOT logical operators. In addition, AND logic has been implemented to allow the concomitant use of flexible atom lists and stereochemistry definitions. The resulting chemical ontology is a multi-hierarchical taxonomy of concept nodes connected by directed, transitive relationships. Conclusions: A proposal for a rule-based definition of chemical classes has been made that allows chemical compound classes to be defined more precisely than before. The proposed structure-based reasoning logic allows chemistry expert knowledge to be translated into a computer-interpretable form, preventing erroneous compound assignments and allowing automatic compound classification. The automated assignment of compounds in databases, compound structure files or text documents to their related ontology classes is possible through the integration with a chemical structure search engine. As an application example, the annotation of chemical structure files with a prototypic ontology is demonstrated. PMID:23273256
