Sample records for structure-based computational database

  1. LMSD: LIPID MAPS structure database

    PubMed Central

    Sud, Manish; Fahy, Eoin; Cotter, Dawn; Brown, Alex; Dennis, Edward A.; Glass, Christopher K.; Merrill, Alfred H.; Murphy, Robert C.; Raetz, Christian R. H.; Russell, David W.; Subramaniam, Shankar

    2007-01-01

    The LIPID MAPS Structure Database (LMSD) is a relational database encompassing structures and annotations of biologically relevant lipids. Structures of lipids in the database come from four sources: (i) the LIPID MAPS Consortium's core laboratories and partners; (ii) lipids identified by LIPID MAPS experiments; (iii) computationally generated structures for appropriate lipid classes; and (iv) biologically relevant lipids manually curated from LIPID BANK, LIPIDAT and other public sources. All the lipid structures in LMSD are drawn in a consistent fashion. In addition to a classification-based retrieval of lipids, users can search LMSD using either text-based or structure-based search options. The text-based search implementation supports data retrieval by any combination of the following data fields: LIPID MAPS ID, systematic or common name, mass, formula, category, main class, and subclass. The structure-based search, in conjunction with optional data fields, provides the capability to perform a substructure search or exact match for the structure drawn by the user. Search results, in addition to structure and annotations, also include relevant links to external databases. The LMSD is publicly available online. PMID:17098933
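
    As a minimal sketch of the combined field search described above, the following Python snippet queries a hypothetical local SQLite mirror of such a table; the table name, columns, and example row are invented for illustration and are not the actual LMSD schema.

      # Hedged sketch: an LMSD-style text search combining data fields.
      # Schema and row below are illustrative, not the real LMSD layout.
      import sqlite3

      conn = sqlite3.connect(":memory:")
      conn.execute("""CREATE TABLE lmsd (
          lm_id TEXT, common_name TEXT, systematic_name TEXT,
          exact_mass REAL, formula TEXT,
          category TEXT, main_class TEXT, sub_class TEXT)""")
      conn.execute("INSERT INTO lmsd VALUES (?,?,?,?,?,?,?,?)",
                   ("LMGP01010005", "PC(16:0/18:1(9Z))", "(placeholder)",
                    759.58, "C42H82NO8P", "Glycerophospholipids",
                    "Glycerophosphocholines", "Diacylglycerophosphocholines"))

      # Any combination of fields narrows the result set; mass is matched
      # within a tolerance window, mirroring the text-based search above.
      query = """SELECT lm_id, common_name FROM lmsd
                 WHERE category = ? AND ABS(exact_mass - ?) <= ?"""
      for row in conn.execute(query, ("Glycerophospholipids", 759.6, 0.1)):
          print(row)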

  2. Why Save Your Course as a Relational Database?

    ERIC Educational Resources Information Center

    Hamilton, Gregory C.; Katz, David L.; Davis, James E.

    2000-01-01

    Describes a system that stores course materials for computer-based training programs in a relational database called Of Course! Outlines the basic structure of the databases; explains distinctions between Of Course! and other authoring languages; and describes how data is retrieved from the database and presented to the student. (Author/LRW)

  3. Preliminary Design of a Consultation Knowledge-Based System for the Minimization of Distortion in Welded Structures

    DTIC Science & Technology

    1989-02-01

    which capture the knowledge of such experts. These Expert Systems, or Knowledge-Based Systems, differ from the usual computer programming techniques ... their applications in the fields of structural design and welding are reviewed. 5.1 Introduction: Expert Systems, or KBES, are computer programs using AI ... procedurally constructed as conventional computer programs usually are; * The knowledge base of such systems is executable, unlike databases.

  4. [Computational chemistry in structure-based drug design].

    PubMed

    Cao, Ran; Li, Wei; Sun, Han-Zi; Zhou, Yu; Huang, Niu

    2013-07-01

    Today, the understanding of the sequence and structure of biologically relevant targets is growing rapidly, and researchers from many disciplines, physics and computational science in particular, are making significant contributions to modern biology and drug discovery. However, it remains challenging to rationally design small-molecule ligands with desired biological characteristics based on the structural information of the drug targets, which demands more accurate calculation of ligand binding free energy. With the rapid advances in computer power and extensive efforts in algorithm development, physics-based computational chemistry approaches have come to play increasingly important roles in structure-based drug design. Here we review newly developed computational chemistry methods in structure-based drug design as well as their elegant applications, including binding-site druggability assessment, large-scale virtual screening of chemical databases, and lead compound optimization. Importantly, we address the current bottlenecks and propose practical solutions.

  5. Technology for organization of the onboard system for processing and storage of ERS data for ultrasmall spacecraft

    NASA Astrophysics Data System (ADS)

    Strotov, Valery V.; Taganov, Alexander I.; Konkin, Yuriy V.; Kolesenkov, Aleksandr N.

    2017-10-01

    Processing and analyzing Earth remote sensing (ERS) data on board an ultra-small spacecraft is a pressing task, given the significant energy cost of data transfer and the low performance of onboard computers. This raises the issue of effective and reliable storage of the general information flow obtained from onboard data-collection systems, including ERS data, in a specialized database. The paper considers the peculiarities of database management system operation with a multilevel memory structure. For data storage, a format has been developed that describes the physical structure of the database and contains the parameters required for loading information. This structure reduces the memory footprint of the database because key values need not be stored separately. The paper presents the architecture of a relational database management system designed for embedding into onboard ultra-small spacecraft software. Such a database management system can be used to build databases for storing different kinds of information, including ERS data, for subsequent processing. The suggested architecture places low demands on the computing power and memory resources available on board an ultra-small spacecraft. Data integrity is ensured during input and modification of the structured information.
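
    The key-free physical layout described above can be illustrated with a short sketch: under the assumption of fixed-size records laid out contiguously, a record's key is implied by its offset and never has to be stored with the data. The record format below is invented for illustration.

      # Sketch of implicit-key storage: fixed-size records in a contiguous
      # buffer, so key <-> offset is a pure computation (no stored keys).
      import struct

      RECORD_FMT = "<Hif"          # sensor id, timestamp, value (illustrative)
      RECORD_SIZE = struct.calcsize(RECORD_FMT)

      def write_record(buf: bytearray, key: int, sensor: int, ts: int, val: float):
          off = key * RECORD_SIZE                 # key -> physical offset
          buf[off:off + RECORD_SIZE] = struct.pack(RECORD_FMT, sensor, ts, val)

      def read_record(buf: bytes, key: int):
          off = key * RECORD_SIZE                 # no key stored in the record
          return struct.unpack_from(RECORD_FMT, buf, off)

      store = bytearray(RECORD_SIZE * 1024)       # preallocated memory image
      write_record(store, 7, sensor=3, ts=1700000000, val=21.5)
      print(read_record(store, 7))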

  6. Fast online and index-based algorithms for approximate search of RNA sequence-structure patterns

    PubMed Central

    2013-01-01

    Background It is well known that the search for homologous RNAs is more effective if both sequence and structure information is incorporated into the search. However, current tools for searching with RNA sequence-structure patterns cannot fully handle mutations occurring on both these levels or are simply not fast enough for searching large sequence databases because of the high computational costs of the underlying sequence-structure alignment problem. Results We present new fast index-based and online algorithms for approximate matching of RNA sequence-structure patterns supporting a full set of edit operations on single bases and base pairs. Our methods efficiently compute semi-global alignments of structural RNA patterns and substrings of the target sequence whose costs satisfy a user-defined sequence-structure edit distance threshold. For this purpose, we introduce a new computing scheme to optimally reuse the entries of the required dynamic programming matrices for all substrings and combine it with a technique for avoiding the alignment computation of non-matching substrings. Our new index-based methods exploit suffix arrays preprocessed from the target database and achieve running times that are sublinear in the size of the searched sequences. To support the description of RNA molecules that fold into complex secondary structures with multiple ordered sequence-structure patterns, we use fast algorithms for the local or global chaining of approximate sequence-structure pattern matches. The chaining step removes spurious matches from the set of intermediate results, in particular of patterns with little specificity. In benchmark experiments on the Rfam database, our improved online algorithm is faster than the best previous method by up to a factor of 45. Our best new index-based algorithm achieves a speedup factor of 560. Conclusions The presented methods achieve considerable speedups compared to the best previous method. This, together with the expected sublinear running time of the presented index-based algorithms, allows for the first time approximate matching of RNA sequence-structure patterns in large sequence databases. Beyond the algorithmic contributions, we provide, with RaligNAtor, a robust and well-documented open-source software package implementing the algorithms presented in this manuscript. The RaligNAtor software is available at http://www.zbh.uni-hamburg.de/ralignator. PMID:23865810
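
    To make the underlying alignment problem concrete, here is a heavily simplified sketch of semi-global sequence-structure matching in Python: each pattern position carries a base (with N as a wildcard) and a dot-bracket structure symbol, and only single-base edit operations are scored. The full RaligNAtor algorithms additionally support base-pair edit operations, suffix-array indexing, and chaining, all of which this toy version omits.

      # Toy semi-global DP: find end positions of target substrings whose
      # sequence-structure edit distance to the pattern is within a threshold.
      def semi_global(pattern, pat_struct, target, tgt_struct, threshold):
          m, n = len(pattern), len(target)
          # Row 0 is all zeros so a match may start anywhere (semi-global).
          dp = [[0] * (n + 1) for _ in range(m + 1)]
          for i in range(1, m + 1):
              dp[i][0] = i
              for j in range(1, n + 1):
                  match = (pattern[i-1] in ("N", target[j-1]) and
                           pat_struct[i-1] == tgt_struct[j-1])
                  dp[i][j] = min(dp[i-1][j-1] + (0 if match else 1),  # (mis)match
                                 dp[i-1][j] + 1,                      # deletion
                                 dp[i][j-1] + 1)                      # insertion
          return [j for j in range(1, n + 1) if dp[m][j] <= threshold]

      ends = semi_global("GAG", "(.)", "CCGAGAGGUU", "((......))", threshold=1)
      print(ends)   # 1-based end positions of substrings within the threshold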

  7. Ultra-Structure database design methodology for managing systems biology data and analyses

    PubMed Central

    Maier, Christopher W; Long, Jeffrey G; Hemminger, Bradley M; Giddings, Morgan C

    2009-01-01

    Background Modern, high-throughput biological experiments generate copious, heterogeneous, interconnected data sets. Research is dynamic, with frequently changing protocols, techniques, instruments, and file formats. Because of these factors, systems designed to manage and integrate modern biological data sets often end up as large, unwieldy databases that become difficult to maintain or evolve. The novel rule-based approach of the Ultra-Structure design methodology presents a potential solution to this problem. By representing both data and processes as formal rules within a database, an Ultra-Structure system constitutes a flexible framework that enables users to explicitly store domain knowledge in both a machine- and human-readable form. End users themselves can change the system's capabilities without programmer intervention, simply by altering database contents; no computer code or schemas need be modified. This provides flexibility in adapting to change, and allows integration of disparate, heterogeneous data sets within a small core set of database tables, facilitating joint analysis and visualization without becoming unwieldy. Here, we examine the application of Ultra-Structure to our ongoing research program for the integration of large proteomic and genomic data sets (proteogenomic mapping). Results We transitioned our proteogenomic mapping information system from a traditional entity-relationship design to one based on Ultra-Structure. Our system integrates tandem mass spectrum data, genomic annotation sets, and spectrum/peptide mappings, all within a small, general framework implemented within a standard relational database system. General software procedures driven by user-modifiable rules can perform tasks such as logical deduction and location-based computations. The system is not tied specifically to proteogenomic research, but is rather designed to accommodate virtually any kind of biological research. Conclusion We find Ultra-Structure offers substantial benefits for biological information systems, the largest being the integration of diverse information sources into a common framework. This facilitates systems biology research by integrating data from disparate high-throughput techniques. It also enables us to readily incorporate new data types, sources, and domain knowledge with no change to the database structure or associated computer code. Ultra-Structure may be a significant step towards solving the hard problem of data management and integration in the systems biology era. PMID:19691849
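
    As a toy illustration of the rule-based idea described above (data and process both living in tables), the following sketch drives behavior entirely from table rows, so capabilities change when rows change rather than when code changes. The "ruleform" rows are invented for illustration and are not the ruleforms of the authors' proteogenomic system.

      # Invented ruleform: (record_type, field, predicate, value, action).
      RULEFORM = [
          ("spectrum", "score", ">=", 0.95, "accept_mapping"),
          ("spectrum", "score", "<",  0.95, "queue_for_review"),
      ]

      PREDICATES = {">=": lambda a, b: a >= b, "<": lambda a, b: a < b}

      def apply_rules(record_type, record):
          """Behavior changes when table rows change -- no code edits needed."""
          return [action for rtype, field, pred, value, action in RULEFORM
                  if rtype == record_type and PREDICATES[pred](record[field], value)]

      print(apply_rules("spectrum", {"score": 0.97}))   # ['accept_mapping']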

  8. SPIRE Data-Base Management System

    NASA Technical Reports Server (NTRS)

    Fuechsel, C. F.

    1984-01-01

    Spacelab Payload Integration and Rocket Experiment (SPIRE) data-base management system (DBMS) based on relational model of data bases. Data bases typically used for engineering and mission analysis tasks and, unlike most commercially available systems, allow data items and data structures stored in forms suitable for direct analytical computation. SPIRE DBMS designed to support data requests from interactive users as well as applications programs.

  9. Computer systems and methods for the query and visualization of multidimensional databases

    DOEpatents

    Stolte, Chris; Tang, Diane L.; Hanrahan, Patrick

    2006-08-08

    A method and system for producing graphics. A hierarchical structure of a database is determined. A visual table, comprising a plurality of panes, is constructed by providing a specification that is in a language based on the hierarchical structure of the database. In some cases, this language can include fields that are in the database schema. The database is queried to retrieve a set of tuples in accordance with the specification. A subset of the set of tuples is associated with a pane in the plurality of panes.
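
    A rough sketch of the pane mechanism claimed above, in Python: a specification names a row field and a column field from the schema, tuples are retrieved, and each tuple is associated with the pane whose row/column values it carries. The field names and data are hypothetical, not from the patent.

      # Partition retrieved tuples into panes keyed by (row value, col value).
      from collections import defaultdict

      tuples = [
          {"region": "East", "year": 2005, "sales": 120},
          {"region": "East", "year": 2006, "sales": 140},
          {"region": "West", "year": 2005, "sales": 90},
      ]

      spec = {"rows": "region", "cols": "year"}     # the visual-table spec

      panes = defaultdict(list)
      for t in tuples:
          panes[(t[spec["rows"]], t[spec["cols"]])].append(t)   # tuple -> pane

      for key in sorted(panes):
          print(key, panes[key])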

  10. Computer systems and methods for the query and visualization of multidimensional database

    DOEpatents

    Stolte, Chris; Tang, Diane L.; Hanrahan, Patrick

    2010-05-11

    A method and system for producing graphics. A hierarchical structure of a database is determined. A visual table, comprising a plurality of panes, is constructed by providing a specification that is in a language based on the hierarchical structure of the database. In some cases, this language can include fields that are in the database schema. The database is queried to retrieve a set of tuples in accordance with the specification. A subset of the set of tuples is associated with a pane in the plurality of panes.

  11. Automated extraction of knowledge for model-based diagnostics

    NASA Technical Reports Server (NTRS)

    Gonzalez, Avelino J.; Myler, Harley R.; Towhidnejad, Massood; Mckenzie, Frederic D.; Kladke, Robin R.

    1990-01-01

    The concept of accessing computer-aided design (CAD) databases and extracting a process model automatically is investigated as a possible source for the generation of knowledge bases for model-based reasoning systems. The resulting system, referred to as automated knowledge generation (AKG), uses an object-oriented programming structure and constraint techniques, as well as an internal database of component descriptions, to generate a frame-based structure that describes the model. The procedure has been designed to be general enough to be easily coupled to CAD systems that feature a database capable of providing label and connectivity data from the drawn system. The AKG system is capable of defining knowledge bases in formats required by various model-based reasoning tools.
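
    A hypothetical sketch of the kind of transformation AKG performs: label and connectivity data drawn from a CAD database are folded into frame-based component descriptions. The netlist, component library, and frame slots below are all invented for illustration.

      # Build frames (slot/value structures) from CAD label + connectivity data.
      netlist = [("R1", "pin2", "U1", "pin1"), ("U1", "pin3", "C1", "pin1")]
      library = {"R": "resistor", "U": "ic", "C": "capacitor"}

      frames = {}
      for a, ap, b, bp in netlist:
          for name in (a, b):
              frames.setdefault(name, {
                  "type": library[name[0]],   # component class from its label
                  "connections": [],
              })
          frames[a]["connections"].append((ap, b, bp))
          frames[b]["connections"].append((bp, a, ap))

      for name, frame in frames.items():
          print(name, frame)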

  12. Cloud-Based NoSQL Open Database of Pulmonary Nodules for Computer-Aided Lung Cancer Diagnosis and Reproducible Research.

    PubMed

    Ferreira Junior, José Raniery; Oliveira, Marcelo Costa; de Azevedo-Marques, Paulo Mazzoncini

    2016-12-01

    Lung cancer is the leading cause of cancer-related deaths in the world, and its main manifestation is pulmonary nodules. Detection and classification of pulmonary nodules are challenging tasks that must be done by qualified specialists, but image interpretation errors make those tasks difficult. In order to aid radiologists in those hard tasks, it is important to integrate computer-based tools with the lesion detection, pathology diagnosis, and image interpretation processes. However, computer-aided diagnosis research faces the problem of not having enough shared medical reference data for the development, testing, and evaluation of computational methods for diagnosis. In order to minimize this problem, this paper presents a public nonrelational, document-oriented, cloud-based database of pulmonary nodules characterized by 3D texture attributes, identified by experienced radiologists and classified according to nine different subjective characteristics by the same specialists. Our goal with the development of this database is to improve computer-aided lung cancer diagnosis and pulmonary nodule detection and classification research through the deployment of this database in a cloud Database-as-a-Service framework. Pulmonary nodule data were provided by the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI), image descriptors were acquired by a volumetric texture analysis, and the database schema was developed using a document-oriented Not only Structured Query Language (NoSQL) approach. The proposed database now contains 379 exams, 838 nodules, and 8237 images, of which 4029 are CT scans and 4208 are manually segmented nodules, and it is hosted in a MongoDB instance on a cloud infrastructure.
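
    The document-oriented design lends itself to a short sketch. Below is a hypothetical nodule document of the general kind the abstract describes, with a comment showing how it could be stored and queried through pymongo against a real MongoDB instance; every field name is a guess for illustration, not the published schema.

      # Invented document-oriented nodule record (field names are guesses).
      nodule_doc = {
          "exam_id": "LIDC-IDRI-0001",
          "nodule_id": 1,
          "subjective_characteristics": {"malignancy": 4, "spiculation": 2},
          "texture_attributes_3d": [0.41, 1.73, 0.08],   # volumetric descriptors
          "images": [{"segmented": True, "modality": "CT"}],
      }

      # Against a real MongoDB instance the same record could be handled with
      # pymongo, e.g.:
      #   collection.insert_one(nodule_doc)
      #   collection.find({"subjective_characteristics.malignancy": {"$gte": 4}})
      import json
      print(json.dumps(nodule_doc, indent=2))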

  13. Database systems for knowledge-based discovery.

    PubMed

    Jagarlapudi, Sarma A R P; Kishan, K V Radha

    2009-01-01

    Several database systems have been developed to provide valuable information in a structured format to users ranging from the bench chemist to the biologist, and from the medical practitioner to the pharmaceutical scientist. The advent of information technology and computational power has enhanced the ability to access large volumes of data in the form of a database, where one can do compilation, searching, archiving, analysis, and finally knowledge derivation. Although data are of variable types, the tools used for database creation, searching and retrieval are similar. GVK BIO has been developing databases from publicly available scientific literature in specific areas like medicinal chemistry, clinical research, and mechanism-based toxicity, so that the structured databases containing vast data can be used in several areas of research. These databases are classified as reference-centric or compound-centric depending on the way the database systems were designed. Integration of these databases with knowledge derivation tools would enhance the value of these systems toward better drug design and discovery.

  14. Visualization and manipulating the image of a formal data structure (FDS)-based database

    NASA Astrophysics Data System (ADS)

    Verdiesen, Franc; de Hoop, Sylvia; Molenaar, Martien

    1994-08-01

    A vector map is a terrain representation with a vector-structured geometry. Molenaar formulated an object-oriented formal data structure (FDS) for 3D single-valued vector maps. This FDS has been implemented in a database (Oracle). In this study we describe a methodology for visualizing an FDS-based database and manipulating the image. A data set retrieved by querying the database is converted into an import file for a drawing application. An objective of this study is to let an end-user alter and add terrain objects in the image. The drawing application creates an export file, which is compared with the import file. Differences between these files result in updates to the database, which involve consistency checks. In this study AutoCAD is used for visualizing and manipulating the image of the data set. A computer program has been written for the data exchange and conversion between Oracle and AutoCAD. The data structure of the FDS is compared with the data structure of AutoCAD, and the FDS data are converted into AutoCAD structures equivalent to the FDS.

  15. 3DNALandscapes: a database for exploring the conformational features of DNA.

    PubMed

    Zheng, Guohui; Colasanti, Andrew V; Lu, Xiang-Jun; Olson, Wilma K

    2010-01-01

    3DNALandscapes, located at: http://3DNAscapes.rutgers.edu, is a new database for exploring the conformational features of DNA. In contrast to most structural databases, which archive the Cartesian coordinates and/or derived parameters and images for individual structures, 3DNALandscapes enables searches of conformational information across multiple structures. The database contains a wide variety of structural parameters and molecular images, computed with the 3DNA software package and known to be useful for characterizing and understanding the sequence-dependent spatial arrangements of the DNA sugar-phosphate backbone, sugar-base side groups, base pairs, base-pair steps, groove structure, etc. The data comprise all DNA-containing structures--both free and bound to proteins, drugs and other ligands--currently available in the Protein Data Bank. The web interface allows the user to link, report, plot and analyze this information from numerous perspectives and thereby gain insight into DNA conformation, deformability and interactions in different sequence and structural contexts. The data accumulated from known, well-resolved DNA structures can serve as useful benchmarks for the analysis and simulation of new structures. The collective data can also help to understand how DNA deforms in response to proteins and other molecules and undergoes conformational rearrangements.

  16. Columba: an integrated database of proteins, structures, and annotations.

    PubMed

    Trissl, Silke; Rother, Kristian; Müller, Heiko; Steinke, Thomas; Koch, Ina; Preissner, Robert; Frömmel, Cornelius; Leser, Ulf

    2005-03-31

    Structural and functional research often requires the computation of sets of protein structures based on certain properties of the proteins, such as sequence features, fold classification, or functional annotation. Compiling such sets using current web resources is tedious because the necessary data are spread over many different databases. To facilitate this task, we have created COLUMBA, an integrated database of annotations of protein structures. COLUMBA currently integrates twelve different databases, including PDB, KEGG, Swiss-Prot, CATH, SCOP, the Gene Ontology, and ENZYME. The database can be searched using either keyword search or data-source-specific web forms. Users can thus quickly select and download PDB entries that, for instance, participate in a particular pathway, are classified as containing a certain CATH architecture, are annotated as having a certain molecular function in the Gene Ontology, and whose structures have a resolution under a defined threshold. The results of queries are provided in both machine-readable extensible markup language and human-readable format. The structures themselves can be viewed interactively on the web. The COLUMBA database facilitates the creation of protein structure data sets for many structure-based studies. It allows queries to be combined across a number of structure-related databases not covered by other projects at present. Thus, information on both large and small sets of protein structures can be used efficiently. The web interface for COLUMBA is available at http://www.columba-db.de.
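
    The kind of combined selection COLUMBA supports can be sketched as a single SQL join over an invented miniature schema; the real COLUMBA schema integrates twelve sources and is not reproduced here.

      # Hedged sketch: select PDB entries by CATH architecture, GO molecular
      # function, and resolution in one query (toy schema and data).
      import sqlite3

      db = sqlite3.connect(":memory:")
      db.executescript("""
      CREATE TABLE pdb_entry (pdb_id TEXT PRIMARY KEY, resolution REAL);
      CREATE TABLE cath      (pdb_id TEXT, architecture TEXT);
      CREATE TABLE go_annot  (pdb_id TEXT, molecular_function TEXT);
      INSERT INTO pdb_entry VALUES ('1abc', 1.8);
      INSERT INTO cath      VALUES ('1abc', 'Alpha-Beta Barrel');
      INSERT INTO go_annot  VALUES ('1abc', 'catalytic activity');
      """)

      rows = db.execute("""
      SELECT e.pdb_id FROM pdb_entry e
      JOIN cath c     ON c.pdb_id = e.pdb_id
      JOIN go_annot g ON g.pdb_id = e.pdb_id
      WHERE c.architecture = 'Alpha-Beta Barrel'
        AND g.molecular_function = 'catalytic activity'
        AND e.resolution < 2.0
      """).fetchall()
      print(rows)   # entries satisfying all three criteria at once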

  17. SInCRe—structural interactome computational resource for Mycobacterium tuberculosis

    PubMed Central

    Metri, Rahul; Hariharaputran, Sridhar; Ramakrishnan, Gayatri; Anand, Praveen; Raghavender, Upadhyayula S.; Ochoa-Montaño, Bernardo; Higueruelo, Alicia P.; Sowdhamini, Ramanathan; Chandra, Nagasuma R.; Blundell, Tom L.; Srinivasan, Narayanaswamy

    2015-01-01

    We have developed an integrated database for Mycobacterium tuberculosis H37Rv (Mtb) that collates information on protein sequences, domain assignments, functional annotation and 3D structural information along with protein–protein and protein–small molecule interactions. SInCRe (Structural Interactome Computational Resource) was developed out of the CamBan (Cambridge and Bangalore) collaboration. The motivation for developing this database is to provide an integrated platform that allows easy access to, and interpretation of, the data and results obtained by all the groups in CamBan in the field of Mtb informatics. In-house algorithms and databases developed independently by various academic groups in CamBan are used to generate Mtb-specific datasets and are integrated in this database to provide a structural dimension to studies on tuberculosis. The SInCRe database readily provides information on the identification of functional domains, genome-scale modelling of structures of Mtb proteins and characterization of the small-molecule binding sites within Mtb. The resource also provides structure-based function annotation, information on small-molecule binders including FDA (Food and Drug Administration)-approved drugs, protein–protein interactions (PPIs) and natural compounds that potentially bind to pathogen proteins and weaken or eliminate host–pathogen protein–protein interactions. Together these provide prerequisites for the identification of off-target binding. Database URL: http://proline.biochem.iisc.ernet.in/sincre PMID:26130660

  18. Software and resources for computational medicinal chemistry

    PubMed Central

    Liao, Chenzhong; Sitzmann, Markus; Pugliese, Angelo; Nicklaus, Marc C

    2011-01-01

    Computer-aided drug design plays a vital role in drug discovery and development and has become an indispensable tool in the pharmaceutical industry. Computational medicinal chemists can take advantage of all kinds of software and resources in the computer-aided drug design field for the purposes of discovering and optimizing biologically active compounds. This article reviews software and other resources related to computer-aided drug design approaches, putting particular emphasis on structure-based drug design, ligand-based drug design, chemical databases and chemoinformatics tools. PMID:21707404

  19. Integrative interactive visualization of crystal structure, band structure, and Brillouin zone

    NASA Astrophysics Data System (ADS)

    Hanson, Robert; Hinke, Ben; van Koevering, Matthew; Oses, Corey; Toher, Cormac; Hicks, David; Gossett, Eric; Plata Ramos, Jose; Curtarolo, Stefano; Aflow Collaboration

    The AFLOW library is an open-access database of high-throughput ab-initio calculations that serves as a resource for the dissemination of computational results in the area of materials science. Our project aims to create an interactive web-based visualization of any structure in the AFLOW database that has associated band-structure data, in a way that allows novel simultaneous exploration of the crystal structure, band structure, and Brillouin zone. Interactivity is obtained using two synchronized JSmol implementations, one for the crystal structure and one for the Brillouin zone, along with a D3-based band-structure diagram produced on the fly from data obtained from the AFLOW database. The current website portal (http://aflowlib.mems.duke.edu/users/jmolers/matt/website) allows interactive access and visualization of the crystal structure, Brillouin zone and band structure for more than 55,000 inorganic crystal structures. This work was supported by the US Navy Office of Naval Research through a Broad Area Announcement administered by Duke University.

  20. Distributed computing for macromolecular crystallography

    PubMed Central

    Krissinel, Evgeny; Uski, Ville; Lebedev, Andrey; Ballard, Charles

    2018-01-01

    Modern crystallographic computing is characterized by the growing role of automated structure-solution pipelines, which represent complex expert systems utilizing a number of program components, decision makers and databases. They also require considerable computational resources and regular database maintenance, which is increasingly more difficult to provide at the level of individual desktop-based CCP4 setups. On the other hand, there is a significant growth in data processed in the field, which brings up the issue of centralized facilities for keeping both the data collected and structure-solution projects. The paradigm of distributed computing and data management offers a convenient approach to tackling these problems, which has become more attractive in recent years owing to the popularity of mobile devices such as tablets and ultra-portable laptops. In this article, an overview is given of developments by CCP4 aimed at bringing distributed crystallographic computations to a wide crystallographic community. PMID:29533240

  21. Distributed computing for macromolecular crystallography.

    PubMed

    Krissinel, Evgeny; Uski, Ville; Lebedev, Andrey; Winn, Martyn; Ballard, Charles

    2018-02-01

    Modern crystallographic computing is characterized by the growing role of automated structure-solution pipelines, which represent complex expert systems utilizing a number of program components, decision makers and databases. They also require considerable computational resources and regular database maintenance, which is increasingly more difficult to provide at the level of individual desktop-based CCP4 setups. On the other hand, there is a significant growth in data processed in the field, which brings up the issue of centralized facilities for keeping both the data collected and structure-solution projects. The paradigm of distributed computing and data management offers a convenient approach to tackling these problems, which has become more attractive in recent years owing to the popularity of mobile devices such as tablets and ultra-portable laptops. In this article, an overview is given of developments by CCP4 aimed at bringing distributed crystallographic computations to a wide crystallographic community.

  22. An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system.

    PubMed

    AlQuraishi, Mohammed; Tang, Shengdong; Xia, Xide

    2015-11-19

    Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. We have developed an integrated affinity-structure database in which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis. This database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.
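
    The "quantitative summaries of protein-DNA binding affinities using position weight matrices" mentioned above can be made concrete with a short worked sketch: building a log-odds position weight matrix (PWM) from aligned binding sites. The sites and pseudocount scheme below are invented for illustration; the database itself stores experimentally measured affinities.

      # Build a log-odds PWM from aligned binding sites and score a sequence.
      from collections import Counter
      import math

      sites = ["TGACA", "TGATA", "TGACA", "TAACA"]   # aligned sites (toy data)
      n, length = len(sites), len(sites[0])
      background = 0.25                               # uniform background model

      pwm = []
      for pos in range(length):
          counts = Counter(site[pos] for site in sites)
          # Log-odds with a +1 pseudocount per base to avoid log(0).
          col = {b: math.log2((counts[b] + 1) / (n + 4) / background)
                 for b in "ACGT"}
          pwm.append(col)

      score = sum(col[b] for col, b in zip(pwm, "TGACA"))
      print(f"log-odds score of TGACA: {score:.2f}")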

  23. MEGADOCK-Web: an integrated database of high-throughput structure-based protein-protein interaction predictions.

    PubMed

    Hayashi, Takanori; Matsuzaki, Yuri; Yanagisawa, Keisuke; Ohue, Masahito; Akiyama, Yutaka

    2018-05-08

    Protein-protein interactions (PPIs) play several roles in living cells, and computational PPI prediction is a major focus of many researchers. The three-dimensional (3D) structure and binding surface are important for the design of PPI inhibitors. Therefore, rigid-body protein-protein docking calculations for two protein structures are expected to allow elucidation of PPIs different from known complexes in terms of 3D structures, because known PPI information is not explicitly required. We have developed rapid PPI prediction software based on protein-protein docking, called MEGADOCK. In order to fully utilize the benefits of computational PPI predictions, it is necessary to construct a comprehensive database to gather prediction results and their predicted 3D complex structures and to make them easily accessible. Although several databases exist that provide predicted PPIs, the previous databases do not contain a sufficient number of entries for the purpose of discovering novel PPIs. In this study, we constructed an integrated database of MEGADOCK PPI predictions, named MEGADOCK-Web. MEGADOCK-Web provides more than 10 times as many PPI predictions as previous databases and enables users to conduct PPI predictions that cannot be found in conventional PPI prediction databases. In MEGADOCK-Web, there are 7528 protein chains and 28,331,628 predicted PPIs from all possible combinations of those proteins. Each protein structure is annotated with PDB ID, chain ID, UniProt AC, related KEGG pathway IDs, and known PPI pairs. Additionally, MEGADOCK-Web provides four powerful functions: 1) searching precalculated PPI predictions, 2) providing annotations for each predicted protein pair with an experimentally known PPI, 3) visualizing candidates that may interact with the query protein on biochemical pathways, and 4) visualizing predicted complex structures through a 3D molecular viewer. MEGADOCK-Web provides a huge amount of comprehensive PPI predictions based on docking calculations with biochemical pathways and enables users to easily and quickly assess PPI feasibilities by archiving PPI predictions. MEGADOCK-Web also promotes the discovery of new PPIs and protein functions and is freely available at http://www.bi.cs.titech.ac.jp/megadock-web/.

  24. The structure and dipole moment of globular proteins in solution and crystalline states: use of NMR and X-ray databases for the numerical calculation of dipole moment.

    PubMed

    Takashima, S

    2001-04-05

    The large dipole moment of globular proteins has been well known from detailed studies using dielectric relaxation and electro-optical methods. The search for the origin of these dipole moments, however, must be based on detailed knowledge of protein structure at atomic resolution. At present, we have two sources of information on the structure of protein molecules: (1) X-ray databases obtained in the crystalline state; and (2) NMR databases obtained in the solution state. While X-ray databases consist of only one model, NMR databases, because of the fluctuation of protein folding in solution, consist of a number of models, enabling the computation of the dipole moment to be repeated for all these models. The aim of this work, using these databases, is a detailed investigation of the interdependence between the structure and dipole moment of protein molecules. The dipole moment of protein molecules has roughly two components: one due to surface charges, and the other, the core dipole moment, due to polar groups such as N--H and C==O bonds. The computation of the surface-charge dipole moment consists of two steps: (A) calculation of the pK shifts of charged groups due to electrostatic interactions, and (B) calculation of the dipole moment using the pK corrected for electrostatic shifts. The dipole moments of several proteins were computed using both NMR and X-ray databases. The two sets of calculations are, with a few exceptions, in good agreement with one another and also with measured dipole moments.
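
    The two-step computation described above reduces to a compact sketch: fractional charges follow from the (shifted) pK values via the Henderson-Hasselbalch relation, and the dipole moment is the charge-weighted sum of position vectors. All coordinates and pK values below are invented for illustration.

      # Surface-charge dipole moment: mu = sum(q_i * r_i) about the centroid.
      import math

      def fractional_charge(pk, ph, acidic):
          """Mean charge of a titratable group at a given pH (Henderson-Hasselbalch)."""
          f_prot = 1.0 / (1.0 + 10.0 ** (ph - pk))    # fraction protonated
          return f_prot - 1.0 if acidic else f_prot   # acid: 0/-1, base: +1/0

      # (x, y, z) in Angstrom, pK, acidic? -- values invented for illustration
      groups = [
          (10.0, 0.0, 0.0, 4.0, True),    # e.g. a surface carboxylate
          (-8.0, 2.0, 0.0, 10.5, False),  # e.g. a surface lysine amine
      ]

      ph = 7.0
      charges = [fractional_charge(pk, ph, acidic) for *_xyz, pk, acidic in groups]
      center = [sum(g[i] for g in groups) / len(groups) for i in range(3)]
      mu = [sum(q * (g[i] - center[i]) for q, g in zip(charges, groups))
            for i in range(3)]
      norm = math.dist(mu, (0.0, 0.0, 0.0))           # |mu| in electron-Angstrom
      print(f"|mu| = {norm:.2f} e*A = {norm * 4.803:.1f} D")   # 1 e*A = 4.803 D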

  25. Smiles2Monomers: a link between chemical and biological structures for polymers.

    PubMed

    Dufresne, Yoann; Noé, Laurent; Leclère, Valérie; Pupin, Maude

    2015-01-01

    The monomeric composition of polymers is powerful for structure comparison and synthetic biology, among other applications. Many databases give access to the atomic structure of compounds, but the monomeric structure of polymers is often lacking. We have designed a smart algorithm, implemented in the tool Smiles2Monomers (s2m), to infer efficiently and accurately the monomeric structure of a polymer from its chemical structure. Our strategy is divided into two steps: first, monomers are mapped onto the atomic structure by an efficient subgraph-isomorphism algorithm; second, the best tiling is computed so that non-overlapping monomers cover the whole structure of the target polymer. The mapping is based on a Markovian index built by a dynamic programming algorithm. The index enables s2m to quickly search for all the given monomers on a target polymer. Then, a greedy algorithm combines the mapped monomers into a consistent monomeric structure. Finally, a local branch-and-cut algorithm refines the structure. We tested this method on two manually annotated databases of polymers and reconstructed the structures de novo with a sensitivity over 90%. The average computation time per polymer is 2 s. s2m automatically creates de novo monomeric annotations for polymers, and is efficient in terms of both computation time and sensitivity. s2m allowed us to detect annotation errors in the tested databases and to easily find the accurate structures. s2m could therefore be integrated into the curation process of databases of small compounds to verify the current entries and accelerate the annotation of new polymers. The full method can be downloaded or accessed via a website for peptide-like polymers at http://bioinfo.lifl.fr/norine/smiles2monomers.jsp.
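
    The two-step strategy can be sketched with RDKit's generic substructure matcher standing in for the paper's custom Markovian index; the greedy tiling step is shown, while the branch-and-cut refinement is omitted. The dipeptide and monomer patterns are toy examples (requires the rdkit package).

      # Step 1: map monomers onto the polymer; step 2: greedy non-overlapping tiling.
      from rdkit import Chem

      polymer = Chem.MolFromSmiles("NCC(=O)NC(C)C(=O)O")   # toy dipeptide Gly-Ala
      monomers = {
          "Gly": Chem.MolFromSmarts("NCC(=O)"),
          "Ala": Chem.MolFromSmarts("NC(C)C(=O)O"),
      }

      # Map every monomer onto the polymer's atom graph.
      matches = []
      for name, patt in monomers.items():
          for atom_ids in polymer.GetSubstructMatches(patt):
              matches.append((name, set(atom_ids)))

      # Greedy tiling: prefer larger matches, forbid atom overlap.
      matches.sort(key=lambda m: -len(m[1]))
      covered, tiling = set(), []
      for name, atoms in matches:
          if not atoms & covered:
              covered |= atoms
              tiling.append(name)

      print(tiling, f"({len(covered)}/{polymer.GetNumAtoms()} atoms covered)")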

  26. Biological knowledge bases using Wikis: combining the flexibility of Wikis with the structure of databases.

    PubMed

    Brohée, Sylvain; Barriot, Roland; Moreau, Yves

    2010-09-01

    In recent years, the number of knowledge bases developed using Wiki technology has exploded. Unfortunately, next to their numerous advantages, classical Wikis present a critical limitation: the invaluable knowledge they gather is represented as free text, which hinders its computational exploitation. This is in sharp contrast with the current practice for biological databases, where the data are made available in a structured way. Here, we present WikiOpener, an extension for the classical MediaWiki engine that augments Wiki pages by allowing on-the-fly querying and formatting of resources external to the Wiki. Those resources may provide data extracted from databases or DAS tracks, or even results returned by local or remote bioinformatics analysis tools. This also implies that structured data can be edited via dedicated forms. Hence, this generic resource combines the structure of biological databases with the flexibility of collaborative Wikis. The source code and its documentation are freely available on the MediaWiki website: http://www.mediawiki.org/wiki/Extension:WikiOpener.

  27. A rudimentary database for three-dimensional objects using structural representation

    NASA Technical Reports Server (NTRS)

    Sowers, James P.

    1987-01-01

    A database which enables users to store and share the description of three-dimensional objects in a research environment is presented. The main objective of the design is to make it a compact structure that holds sufficient information to reconstruct the object. The database design is based on an object representation scheme which is information preserving, reasonably efficient, and yet economical in terms of the storage requirement. The determination of the needed data for the reconstruction process is guided by the belief that it is faster to do simple computations to generate needed data/information for construction than to retrieve everything from memory. Some recent techniques of three-dimensional representation that influenced the design of the database are discussed. The schema for the database and the structural definition used to define an object are given. The user manual for the software developed to create and maintain the contents of the database is included.

  28. Structure elucidation of organic compounds aided by the computer program system SCANNET

    NASA Astrophysics Data System (ADS)

    Guzowska-Swider, B.; Hippe, Z. S.

    1992-12-01

    Recognition of chemical structure is a very important problem currently solved by molecular spectroscopy, particularly IR, UV, NMR and Raman spectroscopy, and mass spectrometry. Nowadays, solution of the problem is frequently aided by the computer. SCANNET is a computer program system for the structure elucidation of organic compounds, developed by our group. The structure recognition of an unknown substance is made by comparing its spectrum with successive reference spectra of standard compounds, i.e. chemical compounds of known chemical structure, stored in a spectral database. The computer program system SCANNET consists of six different spectral databases for the following analytical methods: IR, UV, 13C-NMR, 1H-NMR and Raman spectroscopy, and mass spectrometry. To elucidate a structure, a chemist can use one of these spectral methods or a combination of them and search the appropriate databases. As the result of searching each spectral database, the user obtains a list of chemical substances whose spectra are identical and/or similar to the spectrum input into the computer. The final information obtained from searching the spectral databases is in the form of a list of chemical substances having all the examined spectra, for each type of spectroscopy, identical or similar to those of the unknown compound.

  29. Data Structures in Natural Computing: Databases as Weak or Strong Anticipatory Systems

    NASA Astrophysics Data System (ADS)

    Rossiter, B. N.; Heather, M. A.

    2004-08-01

    Information systems anticipate the real world. Classical databases store, organise and search collections of data of that real world, but only as weak anticipatory information systems. This is because of the reductionism and normalisation needed to map the structuralism of natural data onto idealised machines with von Neumann architectures consisting of fixed instructions. Category theory, developed as a formalism to explore the theoretical concept of naturality, shows that methods like sketches, which arise from graph theory as only non-natural models of naturality, cannot capture real-world structures for strong anticipatory information systems. Databases need a schema of the natural world. Natural computing databases need the schema itself to be also natural. Natural computing methods, including neural computers, evolutionary automata, molecular and nanocomputing and quantum computation, have the potential to be strong. At present they are mainly at the stage of weak anticipatory systems.

  30. GALT protein database, a bioinformatics resource for the management and analysis of structural features of a galactosemia-related protein and its mutants.

    PubMed

    d'Acierno, Antonio; Facchiano, Angelo; Marabotti, Anna

    2009-06-01

    We describe the GALT-Prot database and its related web-based application, which have been developed to collect information about the structural and functional effects of mutations on the human enzyme galactose-1-phosphate uridyltransferase (GALT), involved in the genetic disease named galactosemia type I. Besides a list of missense mutations at the gene and protein sequence levels, GALT-Prot reports the results of analyses of mutant GALT structures. In addition to the structural information about the wild-type enzyme, the database also includes structures of over 100 single-point mutants simulated by means of a computational procedure, and each mutant was analyzed with several bioinformatics programs in order to investigate the effect of the mutation. The web-based interface allows querying of the database, and several links are also provided in order to guarantee a high integration with other resources already present on the web. Moreover, the architecture of the database and the web application is flexible and can be easily adapted to store data related to other proteins with point mutations. GALT-Prot is freely available at http://bioinformatica.isa.cnr.it/GALT/.

  31. Multidisciplinary analysis and design of printed wiring boards

    NASA Astrophysics Data System (ADS)

    Fulton, Robert E.; Hughes, Joseph L.; Scott, Waymond R., Jr.; Umeagukwu, Charles; Yeh, Chao-Pin

    1991-04-01

    Modern printed wiring board design depends on electronic prototyping using computer-based simulation and design tools. Existing electrical computer-aided design (ECAD) tools emphasize circuit connectivity with only rudimentary analysis capabilities. This paper describes a prototype integrated PWB design environment, denoted Thermal Structural Electromagnetic Testability (TSET), being developed at Georgia Tech in collaboration with companies in the electronics industry. TSET provides design guidance based on enhanced electrical and mechanical CAD capabilities, including electromagnetic modeling, testability analysis, thermal management, and solid mechanics analysis. TSET development is based on a strong analytical and theoretical science base and incorporates an integrated information framework and a common database design based on a systematic structured methodology.

  32. An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    AlQuraishi, Mohammed; Tang, Shengdong; Xia, Xide

    Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. We have developed an integrated affinity-structure database in which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis. Lastly, this database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.

  33. An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system

    DOE PAGES

    AlQuraishi, Mohammed; Tang, Shengdong; Xia, Xide

    2015-11-19

    Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. We have developed an integrated affinity-structure database in which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis. Lastly, this database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.

  34. Computationally Efficient Characterization of Potential Energy Surfaces Based on Fingerprint Distances

    NASA Astrophysics Data System (ADS)

    Schaefer, Bastian; Goedecker, Stefan; Goedecker Group Team

    Based on Lennard-Jones, silicon, sodium-chloride and gold clusters, it was found that the uphill barrier energies of transition states between directly connected minima tend to increase with increasing structural difference between the two minima. Building on this insight, it also turned out that post-processing minima-hopping data at a negligible computational cost makes it possible to obtain qualitative topological information on potential energy surfaces, which can be stored in so-called qualitative connectivity databases. These qualitative connectivity databases are used for generating fingerprint disconnectivity graphs that give a first qualitative idea of the thermodynamic and kinetic properties of a system of interest. This research was supported by the NCCR MARVEL, funded by the Swiss National Science Foundation. Computer time was provided by the Swiss National Supercomputing Centre (CSCS) under Project ID No. s499.

  35. Mass-Spectrometry Based Structure Identification of "Known-Unknowns" Using the EPA's CompTox Dashboard (ACS Spring National Meeting) 4 of 7

    EPA Science Inventory

    The CompTox Dashboard is a publicly accessible database provided by the National Center for Computational Toxicology at the US-EPA. The dashboard provides access to a database containing ~720,000 chemicals and integrates a number of our public-facing projects (e.g. ToxCast and Ex...

  36. Combined use of computational chemistry and chemoinformatics methods for chemical discovery

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sugimoto, Manabu, E-mail: sugimoto@kumamoto-u.ac.jp; Institute for Molecular Science, 38 Nishigo-Naka, Myodaiji, Okazaki 444-8585; CREST, Japan Science and Technology Agency, 4-1-8 Honcho, Kawaguchi, Saitama 332-0012

    2015-12-31

    Data analysis is carried out on numerical data from computational chemistry calculations to obtain knowledge information about molecules. A molecular database has been developed to systematically store chemical, electronic-structure, and knowledge-based information. The database is used to find molecules related to the keyword “cancer”. Electronic-structure calculations are then performed to quantitatively evaluate the quantum chemical similarity of the molecules. Among the 377 compounds registered in the database, 24 molecules are found to be “cancer”-related. This set of molecules includes both carcinogens and anticancer drugs. The quantum chemical similarity analysis, which is carried out using numerical results of the density-functional theory calculations, shows that, when some energy spectra are referred to, carcinogens are reasonably distinguished from the anticancer drugs. These spectral properties are therefore considered important measures for classification.

  37. bpRNA: large-scale automated annotation and analysis of RNA secondary structure.

    PubMed

    Danaee, Padideh; Rouches, Mason; Wiley, Michelle; Deng, Dezhong; Huang, Liang; Hendrix, David

    2018-05-09

    While RNA secondary structure prediction from sequence data has made remarkable progress, there is a need for improved strategies for annotating the features of RNA secondary structures. Here, we present bpRNA, a novel annotation tool capable of parsing RNA structures, including complex pseudoknot-containing RNAs, to yield an objective, precise, compact, unambiguous, easily-interpretable description of all loops, stems, and pseudoknots, along with the positions, sequence, and flanking base pairs of each such structural feature. We also introduce several new informative representations of RNA structure types to improve structure visualization and interpretation. We have further used bpRNA to generate a web-accessible meta-database, 'bpRNA-1m', of over 100 000 single-molecule, known secondary structures; this database is both more fully and accurately annotated and more than 20 times larger than existing databases. We use a subset of the database with highly similar (≥90% identical) sequences filtered out to report on statistical trends in sequence, flanking base pairs, and length. Both the bpRNA method and the bpRNA-1m database will be valuable resources, for the specific analysis of individual RNA molecules as well as for large-scale analyses such as updating RNA energy parameters for computational thermodynamic predictions, improving machine learning models for structure prediction, and benchmarking structure-prediction algorithms.
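
    A toy version of the annotation task bpRNA automates: parse a dot-bracket string into a pair table with a stack, then report stacked (stem) pairs and hairpin loops. bpRNA itself also annotates bulges, internal loops, multiloops, and pseudoknots, which this sketch does not attempt.

      # Parse dot-bracket notation and annotate stems and hairpin loops.
      def pair_table(db):
          pairs, stack = {}, []
          for i, c in enumerate(db):
              if c == "(":
                  stack.append(i)
              elif c == ")":
                  j = stack.pop()
                  pairs[i], pairs[j] = j, i
          return pairs

      def annotate(db):
          pairs, feats = pair_table(db), []
          for i in sorted(p for p in pairs if p < pairs[p]):
              j = pairs[i]
              # Stem: the pair directly inside is stacked; hairpin: no pairs inside.
              if pairs.get(i + 1) == j - 1:
                  feats.append(("stem pair", i, j))
              elif all(k not in pairs for k in range(i + 1, j)):
                  feats.append(("hairpin loop", i + 1, j - 1))
          return feats

      print(annotate("((((...))))"))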

  38. Introduction to the computational structural mechanics testbed

    NASA Technical Reports Server (NTRS)

    Lotts, C. G.; Greene, W. H.; Mccleary, S. L.; Knight, N. F., Jr.; Paulson, S. S.; Gillian, R. E.

    1987-01-01

    The Computational Structural Mechanics (CSM) testbed software system, based on the SPAR finite element code and the NICE system, is described. This software is denoted NICE/SPAR. NICE was developed at the Lockheed Palo Alto Research Laboratory and contains data management utilities, a command language interpreter, and a command language definition for integrating engineering computational modules. SPAR is a system of programs for finite element structural analysis developed for NASA by Lockheed and Engineering Information Systems, Inc. It includes many complementary structural analysis, thermal analysis, and utility functions, which communicate through a common database. The work on NICE/SPAR was motivated by requirements for a highly modular and flexible structural analysis system to use as a tool in carrying out research in computational methods and in exploring computer hardware. Analysis examples are presented which demonstrate the benefits gained from combining the NICE command language with SPAR computational modules.

  39. Search extension transforms Wiki into a relational system: a case for flavonoid metabolite database.

    PubMed

    Arita, Masanori; Suwa, Kazuhiro

    2008-09-17

    In computer science, database systems are based on the relational model founded by Edgar Codd in 1970. On the other hand, in the area of biology the word 'database' often refers to loosely formatted, very large text files. Although such bio-databases may describe conflicts or ambiguities (e.g. a protein pair that does and does not interact, or unknown parameters) in a positive sense, the flexibility of the data format sacrifices a systematic query mechanism equivalent to the widely used SQL. To overcome this disadvantage, we propose embeddable string-search commands on a Wiki-based system and designed a half-formatted database. As proof of principle, a database of flavonoids with 6902 molecular structures from over 1687 plant species was implemented on MediaWiki, the background system of Wikipedia. Registered users can describe any information in an arbitrary format. The structured part is subject to text-string searches to realize relational operations. The system was written in the PHP language as an extension of MediaWiki. All modifications are open-source and publicly available. This scheme benefits from both the free-formatted Wiki style and the concise and structured relational-database style. MediaWiki supports multi-user environments for document management, and the cost of database maintenance is alleviated.
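
    The half-formatted idea can be illustrated in miniature: pages mix free text with structured lines, and a string-search command realizes a relational SELECT over the structured part. The page markup and command syntax below are invented stand-ins for the actual MediaWiki extension.

      # Relational SELECT over half-formatted wiki pages via text-string search.
      import re

      pages = {
          "Quercetin": "A flavonol found in many plants.\n"
                       "species:: Allium cepa\nmass:: 302.24\n",
          "Naringenin": "Free-text notes can sit anywhere.\n"
                        "species:: Citrus paradisi\nmass:: 272.26\n",
      }

      def select(field, pattern):
          """SELECT page WHERE field matches pattern -- a text-string search."""
          hits = []
          for title, text in pages.items():
              m = re.search(rf"^{field}::\s*(.+)$", text, re.MULTILINE)
              if m and re.search(pattern, m.group(1)):
                  hits.append((title, m.group(1)))
          return hits

      print(select("species", "Citrus"))   # [('Naringenin', 'Citrus paradisi')]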

  20. Search extension transforms Wiki into a relational system: A case for flavonoid metabolite database

    PubMed Central

    Arita, Masanori; Suwa, Kazuhiro

    2008-01-01

    Background In computer science, database systems are based on the relational model founded by Edgar Codd in 1970. On the other hand, in the area of biology the word 'database' often refers to loosely formatted, very large text files. Although such bio-databases may describe conflicts or ambiguities (e.g. that a protein pair does and does not interact, or unknown parameters) in a positive sense, the flexibility of the data format sacrifices a systematic query mechanism equivalent to the widely used SQL. Results To overcome this disadvantage, we propose embeddable string-search commands on a Wiki-based system and designed a half-formatted database. As proof of principle, a database of flavonoids, with 6902 molecular structures from over 1687 plant species, was implemented on MediaWiki, the background system of Wikipedia. Registered users can describe any information in an arbitrary format. The structured part is subject to text-string searches that realize relational operations. The system was written in PHP as an extension of MediaWiki. All modifications are open-source and publicly available. Conclusion This scheme benefits from both the free-formatted Wiki style and the concise, structured relational-database style. MediaWiki supports multi-user environments for document management, and the cost of database maintenance is alleviated. PMID:18822113

  1. Exploring Natural Products from the Biodiversity of Pakistan for Computational Drug Discovery Studies: Collection, Optimization, Design and Development of A Chemical Database (ChemDP).

    PubMed

    Mirza, Shaher Bano; Bokhari, Habib; Fatmi, Muhammad Qaiser

    2015-01-01

    Pakistan possesses a rich and vast source of natural products (NPs). Some of these secondary metabolites have been identified as potent therapeutic agents. However, the medicinal usage of most of these compounds has not yet been fully explored. The discovery of new scaffolds of NPs as inhibitors of certain enzymes or receptors using advanced computational drug discovery approaches is also limited by the unavailability of accurate 3D structures of NPs. An organized database incorporating all relevant information can therefore facilitate exploration of the medicinal importance of metabolites from Pakistani biodiversity. The Chemical Database of Pakistan (ChemDP; release 01) is a fully referenced, evolving, web-based, virtual database designed and developed to introduce natural products (NPs) and their derivatives from the biodiversity of Pakistan to the global scientific community. The prime aim is to provide quality structures of compounds, with relevant information, for computer-aided drug discovery studies. For this purpose, over 1000 NPs have been identified from more than 400 published articles, for which 2D and 3D molecular structures have been generated with a special focus on their stereochemistry, where applicable. The PM7 semiempirical quantum chemistry method has been used to energy-optimize the 3D structures of NPs. The 2D and 3D structures can be downloaded as .sdf, .mol, .sybyl, .mol2, and .pdb files, formats readable by many chemoinformatics/bioinformatics software packages. Each entry in ChemDP contains over 100 data fields representing various molecular, biological, physico-chemical, and pharmacological properties, which have been properly documented in the database for end users. These pieces of information have been either manually extracted from the literature or calculated using various computational tools. Cross-referencing to a major data repository, ChemSpider, has been made available for overlapping compounds. An Android application of ChemDP is available at its website. ChemDP is freely accessible at www.chemdp.com.

  2. Radioactivity and Environmental Security in the Oceans: New Research and Policy Priorities in the Arctic and North Atlantic

    DTIC Science & Technology

    1993-06-09

    within the framework of an update for the computer database "DiaNIK" which has been developed at the Vernadsky Institute of Geochemistry and Analytical...chemical thermodynamic data for minerals and mineral-forming substances. The structure of thermodynamic database "DiaNIK" is based on the principles...in the database . A substantial portion of the thermodynamic values recommended by "DiaNIK" experts for the substances in User Version 3.1 resulted from

  3. Stability assessment of structures under earthquake hazard through GRID technology

    NASA Astrophysics Data System (ADS)

    Prieto Castrillo, F.; Boton Fernandez, M.

    2009-04-01

    This work presents a GRID framework to estimate the vulnerability of structures under earthquake hazard. The tool has been designed to cover the needs of a typical earthquake engineering stability analysis: preparation of input data (pre-processing), response computation, and stability analysis (post-processing). In order to validate the application over GRID, a simplified model of a structure under artificially generated earthquake records has been implemented. To achieve this goal, the proposed scheme exploits the GRID technology and its main advantages (parallel intensive computing, huge storage capacity, and collaborative analysis among institutions) through intensive interaction among the GRID elements (Computing Element, Storage Element, LHC File Catalogue, federated database, etc.). The dynamical model is described by a set of ordinary differential equations (ODEs) and by a set of parameters. Both elements, along with the integration engine, are encapsulated into Java classes. With this high-level design, subsequent improvements/changes of the model can be addressed with little effort. In the procedure, an earthquake record database is prepared and stored (pre-processing) in the GRID Storage Element (SE). The metadata of these records is also stored in the GRID federated database. This metadata contains both relevant information about the earthquake (as is usual in a seismic repository) and the Logical File Name (LFN) of the record for its later retrieval. Then, from the available set of accelerograms in the SE, the user can specify a range of earthquake parameters to carry out a dynamic analysis. This way, a GRID job is created for each selected accelerogram in the database. At the GRID Computing Element (CE), displacements are then obtained by numerical integration of the ODEs over time. The resulting response for that configuration is stored in the GRID Storage Element (SE) and the maximum structure displacement is computed. Then, the corresponding metadata containing the response LFN, earthquake magnitude, and maximum structure displacement is also stored. Finally, the displacements are post-processed through a statistically based algorithm from the available metadata to obtain the probability of collapse of the structure for different earthquake magnitudes. From this study, it is possible to build a vulnerability report for the structure type and seismic data. The proposed methodology can be combined with ongoing initiatives to build a European earthquake record database. In this context, GRID enables collaborative analysis of shared seismic data and results among different institutions.
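
    The per-accelerogram job boils down to a small numerical computation. Here is a minimal sketch of one such job, assuming a single-degree-of-freedom structure x'' + 2ζωₙx' + ωₙ²x = -a_g(t) (the actual Java model classes, the record format, and the parameter values below are not from the paper):

```python
import numpy as np

def max_displacement(ag, dt, wn=2 * np.pi, zeta=0.05):
    """Integrate the SDOF ODEs with a simple semi-implicit Euler scheme
    and return the peak absolute displacement."""
    x, v, peak = 0.0, 0.0, 0.0
    for a in ag:
        v += dt * (-2 * zeta * wn * v - wn**2 * x - a)
        x += dt * v
        peak = max(peak, abs(x))
    return peak

# synthetic stand-in for an accelerogram retrieved from the Storage Element
dt = 0.01
t = np.arange(0.0, 20.0, dt)
ag = 0.3 * 9.81 * np.sin(2 * np.pi * 1.5 * t) * np.exp(-0.2 * t)
print(f"peak displacement: {max_displacement(ag, dt):.4f} m")
```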

  4. A neotropical Miocene pollen database employing image-based search and semantic modeling.

    PubMed

    Han, Jing Ginger; Cao, Hongfei; Barb, Adrian; Punyasena, Surangi W; Jaramillo, Carlos; Shyu, Chi-Ren

    2014-08-01

    Digital microscopic pollen images are being generated with increasing speed and volume, producing opportunities to develop new computational methods that increase the consistency and efficiency of pollen analysis and provide the palynological community with a computational framework for information sharing and knowledge transfer. • Mathematical methods were used to assign trait semantics (abstract morphological representations) to the images of neotropical Miocene pollen and spores. Advanced database-indexing structures were built to compare and retrieve similar images based on their visual content. A Web-based system was developed to provide novel tools for automatic trait semantic annotation and image retrieval by trait semantics and visual content. • Mathematical models that map visual features to trait semantics can be used to annotate images with morphology semantics and to search image databases with improved reliability and productivity. Images can also be searched by visual content, providing users with customized emphases on traits such as color, shape, and texture. • Content- and semantic-based image searches provide a powerful computational platform for pollen and spore identification. The infrastructure outlined provides a framework for building a community-wide palynological resource, streamlining the process of manual identification, analysis, and species discovery.

  5. Advances in computational metabolomics and databases deepen the understanding of metabolisms.

    PubMed

    Tsugawa, Hiroshi

    2018-01-29

    Mass spectrometry (MS)-based metabolomics is a popular platform for metabolome analyses. Computational techniques for the processing of MS raw data, for example, feature detection, peak alignment, and the exclusion of false-positive peaks, have been established. The next stage of untargeted metabolomics would be to decipher the mass fragmentation of small molecules for the global identification of human, animal, plant, and microbiota metabolomes, resulting in a deeper understanding of metabolism. This review is an update on the latest computational metabolomics, including known/expected structure databases, chemical ontology classifications, and mass spectrometry cheminformatics for the interpretation of mass fragmentations and the elucidation of unknown metabolites. The importance of metabolome 'databases' and 'repositories' is also discussed, because novel biological discoveries are often attributable to the accumulation of data, to relational databases, and to their statistics. Lastly, a practical guide for metabolite annotation is presented as the summary of this review.

  6. Gloss in Sanskrit Wordnet

    NASA Astrophysics Data System (ADS)

    Kulkarni, Malhar; Kulkarni, Irawati; Dangarikar, Chaitali; Bhattacharyya, Pushpak

    Glosses and examples are essential components of computational lexical databases like Wordnet. These two components of the lexical database can be used in building domain ontologies, semantic relations, phrase structure rules, etc., and can help automatic or manual word sense disambiguation (WSD) tasks. The present paper aims to highlight the importance of the gloss in the process of WSD, based on experiences from building the Sanskrit Wordnet. The paper presents a survey of Sanskrit synonymy lexica, the use of Navya-Nyāya terminology in developing a gloss, and the kinds of patterns that evolved and are useful for the computational purpose of WSD, with special reference to Sanskrit.
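
    As a toy illustration of why glosses matter for WSD, the following sketch implements the simplified Lesk heuristic: the chosen sense is the one whose gloss shares the most words with the sentence context. The senses and glosses below are invented for the example and are not drawn from the Sanskrit Wordnet.

```python
def lesk(context_words, senses):
    """senses: mapping sense_id -> gloss string; return the best-overlap sense."""
    ctx = set(w.lower() for w in context_words)
    def overlap(gloss):
        return len(ctx & set(gloss.lower().split()))
    return max(senses, key=lambda s: overlap(senses[s]))

senses = {
    "hari_1": "a name of the god Vishnu",
    "hari_2": "a tawny or reddish-brown colour",
}
print(lesk("the devotee worships the god".split(), senses))  # hari_1
```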

  7. Databases of Conformations and NMR Structures of Glycan Determinants.

    PubMed

    Sarkar, Anita; Drouillard, Sophie; Rivet, Alain; Perez, Serge

    2015-12-01

    The present study reports a comprehensive nuclear magnetic resonance (NMR) characterization and a systematic sampling of the conformational preferences of 170 glycan moieties of glycosphingolipids, as produced in large-scale quantities by bacterial fermentation. These glycans span a variety of families, including the blood group antigens (A, B and O), core structures (Types 1, 2 and 4), fucosylated oligosaccharides (core and lacto-series), sialylated oligosaccharides (Types 1 and 2), Lewis antigens, GPI-anchors and globosides. A complementary set of about 100 glycan determinants occurring in glycoproteins and glycosaminoglycans has also been structurally characterized using molecular mechanics-based computation. The experimental and computational data generated are organized in two relational databases that can be queried by the user through a user-friendly search engine. The NMR (¹H and ¹³C; COSY, TOCSY, HMQC and HMBC correlation) spectra and 3D structures are available for visualization and download in commonly used structure formats. Emphasis has been given to the use of a common nomenclature for the structural encoding of the carbohydrates, and each glycan molecule is described by four different types of representation in order to cope with the different usages in chemistry and biology. These web-based databases were developed with non-proprietary software and are open access for the scientific community at http://glyco3d.cermav.cnrs.fr.

  8. A computer-based information system for epilepsy and electroencephalography.

    PubMed

    Finnerup, N B; Fuglsang-Frederiksen, A; Røssel, P; Jennum, P

    1999-08-01

    This paper describes a standardised computer-based information system for electroencephalography (EEG), focusing on epilepsy. The system was developed using a prototyping approach. It is based on international recommendations for EEG examination, interpretation and terminology; international guidelines for epidemiological studies on epilepsy; the classification of epileptic seizures and syndromes; and the international classification of diseases. It is divided into: (1) clinical information and epilepsy-relevant data; and (2) EEG data, which are hierarchically structured, including description and interpretation of the EEG. Data are coded but supplemented with unrestricted text. The resulting patient database can be integrated with other clinical databases and with the patient record system, and may facilitate clinical and epidemiological research and the development of standards and guidelines for EEG description and interpretation. The system is currently used for teleconsultation between Gentofte and Lisbon.

  9. Computer based interpretation of infrared spectra-structure of the knowledge-base, automatic rule generation and interpretation

    NASA Astrophysics Data System (ADS)

    Ehrentreich, F.; Dietze, U.; Meyer, U.; Abbas, S.; Schulz, H.

    1995-04-01

    A main task within the SpecInfo project is to develop interpretation tools that can handle many more of the complicated, more specific spectrum-structure correlations. In the first step, the empirical knowledge about the assignment of structural groups to their characteristic IR bands was collected from the literature and represented in a computer-readable, well-structured form. Vague verbal rules are managed by introducing linguistic variables. The next step was the development of automatic rule-generating procedures: we combined and extended the IDIOTS algorithm with Blaffert's set-theory-based algorithm. The procedures were successfully applied to the SpecInfo database. The realization of the preceding items is a prerequisite for improving the computerized structure elucidation procedure.
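
    One way to make a vague, verbal rule ("a strong band near 1715 cm⁻¹ suggests a ketone C=O") computable is a fuzzy membership function over the band position. The sketch below illustrates that idea with a trapezoidal linguistic variable; the rule boundaries are textbook approximations, not SpecInfo rules.

```python
def trapezoid(x, a, b, c, d):
    """Fuzzy membership: 0 outside [a, d], 1 on [b, c], linear in between."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

RULES = [
    ("O-H stretch (alcohol)", (3100, 3200, 3550, 3650)),   # wavenumbers, cm^-1
    ("C=O stretch (ketone)",  (1650, 1705, 1725, 1780)),
]

def interpret(peaks_cm1):
    for group, (a, b, c, d) in RULES:
        score = max(trapezoid(p, a, b, c, d) for p in peaks_cm1)
        if score > 0:
            print(f"{group}: membership {score:.2f}")

interpret([3350, 1715, 1460])
# O-H stretch (alcohol): membership 1.00
# C=O stretch (ketone): membership 1.00
```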

  10. Breast Imaging in the Era of Big Data: Structured Reporting and Data Mining.

    PubMed

    Margolies, Laurie R; Pandey, Gaurav; Horowitz, Eliot R; Mendelson, David S

    2016-02-01

    The purpose of this article is to describe structured reporting and the development of large databases for use in data mining in breast imaging. The results of millions of breast imaging examinations are reported with structured tools based on the BI-RADS lexicon, and much of this data is stored in accessible media. Robust computing power creates great opportunity for data scientists and breast imagers to collaborate to improve breast cancer detection and optimize screening algorithms. Data mining can create knowledge, but the questions asked and their complexity require extremely powerful and agile databases. New data technologies can facilitate outcomes research and precision medicine.

  11. MINEs: Open access databases of computationally predicted enzyme promiscuity products for untargeted metabolomics

    DOE PAGES

    Jeffryes, James G.; Colastani, Ricardo L.; Elbadawi-Sidhu, Mona; ...

    2015-08-28

    Metabolomics has proven difficult to execute in an untargeted and generalizable manner. Liquid chromatography-mass spectrometry (LC-MS) has made it possible to gather data on thousands of cellular metabolites. However, matching metabolites to their spectral features continues to be a bottleneck, meaning that much of the collected information remains uninterpreted and that new metabolites are seldom discovered in untargeted studies. These challenges require new approaches that consider compounds beyond those available in curated biochemistry databases. Here we present Metabolic In silico Network Expansions (MINEs), an extension of known metabolite databases to include molecules that have not been observed, but are likely to occur based on known metabolites and common biochemical reactions. We utilize an algorithm called the Biochemical Network Integrated Computational Explorer (BNICE) and expert-curated reaction rules based on the Enzyme Commission classification system to propose the novel chemical structures and reactions that comprise MINE databases. Starting from the Kyoto Encyclopedia of Genes and Genomes (KEGG) COMPOUND database, the MINE contains over 571,000 compounds, of which 93% are not present in the PubChem database. However, these MINE compounds have on average higher structural similarity to natural products than compounds from KEGG or PubChem. MINE databases were able to propose annotations for 98.6% of a set of 667 MassBank spectra, 14% more than KEGG alone and equivalent to PubChem, while returning far fewer candidates per spectrum than PubChem (46 vs. 1715 median candidates). Application of MINEs to LC-MS accurate mass data enabled the identity of an unknown peak to be confidently predicted. MINE databases are freely accessible for non-commercial use via user-friendly web tools at http://minedatabase.mcs.anl.gov and developer-friendly APIs. MINEs improve metabolomics peak identification as compared to general chemical databases, whose results include irrelevant synthetic compounds. MINEs complement and expand on previous in silico generated compound databases that focus on human metabolism. We are actively developing the database; future versions of this resource will incorporate transformation rules for spontaneous chemical reactions and more advanced filtering and prioritization of candidate structures.
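
    The core expansion step, applying a generalized reaction rule to a known metabolite to propose an unobserved product, can be sketched with RDKit reaction SMARTS. The rule below (hydroxylation of a benzylic methyl group) is an invented toy rule for illustration, not part of the BNICE rule set, and assumes RDKit is installed.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# toy transformation rule: terminal CH3 on an aromatic ring -> CH2OH
rule = AllChem.ReactionFromSmarts("[CH3:1][c:2]>>O[CH2:1][c:2]")
toluene = Chem.MolFromSmiles("Cc1ccccc1")   # stand-in "known metabolite"

for products in rule.RunReactants((toluene,)):
    for p in products:
        Chem.SanitizeMol(p)
        print(Chem.MolToSmiles(p))          # OCc1ccccc1 (benzyl alcohol)
```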

  12. microRNAs Databases: Developmental Methodologies, Structural and Functional Annotations.

    PubMed

    Singh, Nagendra Kumar

    2017-09-01

    microRNA (miRNA) is an endogenous and evolutionarily conserved non-coding RNA, involved in post-transcriptional processes as a gene repressor and in mRNA cleavage through RNA-induced silencing complex (RISC) formation. In the RISC, miRNA binds in complementary base pairs with the targeted mRNA, together with the Argonaute protein complex, causing gene repression or endonucleolytic cleavage of mRNAs; this is implicated in many diseases and syndromes. After the discovery of the miRNAs lin-4 and let-7, large numbers of miRNAs were subsequently discovered by low-throughput and high-throughput experimental techniques, along with computational approaches, in various biological and metabolic processes. miRNAs are important non-coding RNAs for understanding the complex biological phenomena of organisms because they control gene regulation. This paper reviews miRNA databases with structural and functional annotations developed by various researchers. These databases contain structural and functional information on animal, plant and virus miRNAs, including miRNA-associated diseases, stress resistance in plants, the biological processes miRNAs take part in, the effects of miRNA interactions on drugs and the environment, the effects of variants on miRNAs, miRNA gene expression analysis, miRNA sequences, and miRNA structures. This review focuses on the developmental methodology of miRNA databases, such as the computational tools and methods used for extracting miRNA annotations from different resources or through experiment. This study also discusses the efficiency of the user interface design of each database, along with current entries and annotations of miRNAs (pathways, gene ontology, disease ontology, etc.). An integrated schematic diagram of the construction process for the databases is also drawn, along with tabular and graphical comparisons of the various types of entries in different databases. The aim of this paper is to present the importance of miRNA-related resources in a single place.

  13. Rapid development of entity-based data models for bioinformatics with persistence object-oriented design and structured interfaces.

    PubMed

    Ezra Tsur, Elishai

    2017-01-01

    Databases are imperative for research in bioinformatics and computational biology. Current challenges in database design include data heterogeneity and context-dependent interconnections between data entities. These challenges drove the development of unified data interfaces and specialized databases. The curation of specialized databases is an ever-growing challenge due to the introduction of new data sources and the emergence of new relational connections between established datasets. Here, an open-source framework for the curation of specialized databases is proposed. The framework supports user-designed models of data encapsulation, object persistency and structured interfaces to local and external data sources such as MalaCards, Biomodels and the National Center for Biotechnology Information (NCBI) databases. The proposed framework was implemented using Java as the development environment, EclipseLink as the data persistency agent and Apache Derby as the database manager. Syntactic analysis was based on the J3D, jsoup, Apache Commons and w3c.dom open libraries. Finally, the construction of a specialized database for aneurysm-associated vascular diseases is demonstrated. This database contains 3-dimensional geometries of aneurysms, patients' clinical information, articles, biological models, related diseases and our recently published model of aneurysms' risk of rupture. The framework is available at: http://nbel-lab.com.
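
    The entity-based pattern described (user-designed entities, transparent persistence, relational connections) is what object-relational mappers provide. The framework itself is Java/EclipseLink/Derby; as a language-neutral illustration, here is a minimal analogue in Python with SQLAlchemy, with entity and field names invented for the example:

```python
from sqlalchemy import create_engine, Column, Integer, String, ForeignKey
from sqlalchemy.orm import declarative_base, relationship, Session

Base = declarative_base()

class Disease(Base):
    __tablename__ = "disease"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    aneurysms = relationship("Aneurysm", back_populates="disease")

class Aneurysm(Base):
    __tablename__ = "aneurysm"
    id = Column(Integer, primary_key=True)
    geometry_file = Column(String)                      # path to a 3D geometry
    disease_id = Column(Integer, ForeignKey("disease.id"))
    disease = relationship("Disease", back_populates="aneurysms")

engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
with Session(engine) as session:
    d = Disease(name="vascular disease X",
                aneurysms=[Aneurysm(geometry_file="case01.stl")])
    session.add(d)      # the object graph persists as related rows
    session.commit()
```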

  14. Digital hand atlas and computer-aided bone age assessment via the Web

    NASA Astrophysics Data System (ADS)

    Cao, Fei; Huang, H. K.; Pietka, Ewa; Gilsanz, Vicente

    1999-07-01

    A frequently used method of bone age assessment is atlas matching, in which a radiological hand image is examined against a reference set of atlas patterns of normal standards. We are in the process of developing a digital hand atlas with a large standard set of normal hand and wrist images that reflect skeletal maturity, race and sex differences, and current child development. The digital hand atlas will be used for computer-aided bone age assessment via the Web. We have designed and partially implemented a computer-aided diagnostic (CAD) system for Web-based bone age assessment. The system consists of a digital hand atlas, a relational image database and a Web-based user interface. The digital atlas is based on a large standard set of normal hand and wrist images with extracted bone objects and quantitative features. The image database uses content-based indexing to organize the hand images and their attributes and presents them to users in a structured way. The Web-based user interface allows users to interact with the hand image database from browsers. Users can use a Web browser to push a clinical hand image to the CAD server for a bone age assessment. Quantitative features of the examined image, which reflect skeletal maturity, are extracted and compared with patterns from the atlas database to assess the bone age. The relevant reference images and the final assessment report are then sent back to the user's browser via the Web. The digital atlas will remove the disadvantages of the current, out-of-date one and allow bone age assessment to be computerized and done conveniently via the Web. In this paper, we present the system design, the Web-based client-server model for computer-assisted bone age assessment, and our initial implementation of the digital atlas database.

  15. MINEs: open access databases of computationally predicted enzyme promiscuity products for untargeted metabolomics.

    PubMed

    Jeffryes, James G; Colastani, Ricardo L; Elbadawi-Sidhu, Mona; Kind, Tobias; Niehaus, Thomas D; Broadbelt, Linda J; Hanson, Andrew D; Fiehn, Oliver; Tyo, Keith E J; Henry, Christopher S

    2015-01-01

    In spite of its great promise, metabolomics has proven difficult to execute in an untargeted and generalizable manner. Liquid chromatography-mass spectrometry (LC-MS) has made it possible to gather data on thousands of cellular metabolites. However, matching metabolites to their spectral features continues to be a bottleneck, meaning that much of the collected information remains uninterpreted and that new metabolites are seldom discovered in untargeted studies. These challenges require new approaches that consider compounds beyond those available in curated biochemistry databases. Here we present Metabolic In silico Network Expansions (MINEs), an extension of known metabolite databases to include molecules that have not been observed, but are likely to occur based on known metabolites and common biochemical reactions. We utilize an algorithm called the Biochemical Network Integrated Computational Explorer (BNICE) and expert-curated reaction rules based on the Enzyme Commission classification system to propose the novel chemical structures and reactions that comprise MINE databases. Starting from the Kyoto Encyclopedia of Genes and Genomes (KEGG) COMPOUND database, the MINE contains over 571,000 compounds, of which 93% are not present in the PubChem database. However, these MINE compounds have on average higher structural similarity to natural products than compounds from KEGG or PubChem. MINE databases were able to propose annotations for 98.6% of a set of 667 MassBank spectra, 14% more than KEGG alone and equivalent to PubChem, while returning far fewer candidates per spectrum than PubChem (46 vs. 1715 median candidates). Application of MINEs to LC-MS accurate mass data enabled the identity of an unknown peak to be confidently predicted. MINE databases are freely accessible for non-commercial use via user-friendly web tools at http://minedatabase.mcs.anl.gov and developer-friendly APIs. MINEs improve metabolomics peak identification as compared to general chemical databases, whose results include irrelevant synthetic compounds. Furthermore, MINEs complement and expand on previous in silico generated compound databases that focus on human metabolism. We are actively developing the database; future versions of this resource will incorporate transformation rules for spontaneous chemical reactions and more advanced filtering and prioritization of candidate structures. Graphical abstract: MINE database construction and access methods. The process of constructing a MINE database from the curated source databases is depicted on the left. The methods for accessing the database are shown on the right.

  16. Distributed structure-searchable toxicity (DSSTox) public database network: a proposal.

    PubMed

    Richard, Ann M; Williams, ClarLynda R

    2002-01-29

    The ability to assess the potential genotoxicity, carcinogenicity, or other toxicity of pharmaceutical or industrial chemicals based on chemical structure information is a highly coveted and shared goal of varied academic, commercial, and government regulatory groups. These diverse interests often employ different approaches and have different criteria and uses for toxicity assessments, but they share a need for unrestricted access to existing public toxicity data linked with chemical structure information. Currently, there exists no central repository of toxicity information, commercial or public, that adequately meets the data requirements for flexible analogue searching, structure-activity relationship (SAR) model development, or the building of chemical relational databases (CRD). The distributed structure-searchable toxicity (DSSTox) public database network is proposed as a community-supported, web-based effort to address these shared needs of the SAR and toxicology communities. The DSSTox project has the following major elements: (1) to adopt and encourage the use of a common standard file format (structure data file (SDF)) for public toxicity databases that includes chemical structure, text, and property information, and that can easily be imported into available CRD applications; (2) to implement a distributed-source approach, managed by a DSSTox central website, that will enable decentralized, free public access to structure-toxicity data files and will effectively link knowledgeable toxicity data sources with potential users of these data from other disciplines (such as chemistry, modeling, and computer science); and (3) to engage public/commercial/academic/industry groups in contributing to and expanding this community-wide public data sharing and distribution effort. The DSSTox project's overall aims are to effect a closer association of chemical structure information with existing toxicity data, and to promote and facilitate structure-based exploration of these data within a common chemistry-based framework that spans toxicological disciplines.
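
    The appeal of the SDF format is that each record carries a structure plus arbitrary named data fields, so structure and toxicity data travel in one importable file. A minimal sketch of reading such a file with RDKit follows; the file name and field names (CASRN, Carcinogenicity) are illustrative assumptions, not actual DSSTox field names.

```python
from rdkit import Chem

supplier = Chem.SDMolSupplier("toxicity_sample.sdf")
for mol in supplier:
    if mol is None:                   # skip unparsable records
        continue
    props = mol.GetPropsAsDict()      # the SDF's text/property data fields
    print(Chem.MolToSmiles(mol),
          props.get("CASRN"),
          props.get("Carcinogenicity"))
```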

  17. How to benchmark methods for structure-based virtual screening of large compound libraries.

    PubMed

    Christofferson, Andrew J; Huang, Niu

    2012-01-01

    Structure-based virtual screening is a useful computational technique for ligand discovery. To systematically evaluate different docking approaches, it is important to have a consistent benchmarking protocol that is both relevant and unbiased. Here, we describe the design of a benchmarking data set for docking screen assessment, a standard docking screening process, and the analysis and presentation of the enrichment of annotated ligands among a background decoy database.
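
    A standard enrichment statistic for such benchmarks is the enrichment factor: the hit rate among the top-ranked fraction of the database divided by the overall hit rate. A minimal sketch (scores and counts invented; lower docking score is assumed better):

```python
def enrichment_factor(ligand_scores, decoy_scores, fraction=0.01):
    """EF(fraction): ligand hit rate in the top-ranked slice vs. overall."""
    ranked = sorted([(s, 1) for s in ligand_scores] +
                    [(s, 0) for s in decoy_scores])      # ascending score
    n_top = max(1, int(len(ranked) * fraction))
    hits = sum(is_ligand for _, is_ligand in ranked[:n_top])
    hit_rate_top = hits / n_top
    hit_rate_all = len(ligand_scores) / len(ranked)
    return hit_rate_top / hit_rate_all

ligands = [-12.1, -11.4, -9.8, -7.0]                     # annotated ligands
decoys = [-10.0 + 0.01 * i for i in range(396)]          # background decoys
print(f"EF(1%) = {enrichment_factor(ligands, decoys):.1f}")
```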

  18. Covariant Evolutionary Event Analysis for Base Interaction Prediction Using a Relational Database Management System for RNA.

    PubMed

    Xu, Weijia; Ozer, Stuart; Gutell, Robin R

    2009-01-01

    With an increasingly large number of properly aligned sequences, comparative sequence analysis can accurately identify not only common structures formed by standard base pairing but also new types of structural elements and constraints. However, traditional methods are too computationally expensive to perform well on large-scale alignments and are less effective with sequences from diverse phylogenetic classifications. We propose a new approach that utilizes coevolution rates among pairs of nucleotide positions, using the phylogenetic and evolutionary relationships of the organisms of the aligned sequences. With a novel data schema to manage the relevant information within a relational database, our method, implemented with Microsoft SQL Server 2005, showed 90% sensitivity in identifying base pair interactions among 16S ribosomal RNA sequences from Bacteria, at a scale 40 times larger, and with 50% better sensitivity, than a previous study. The results also indicated covariation signals for a few sets of cross-strand base stacking pairs in secondary structure helices, and other subtle constraints in the RNA structure.
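
    The covariation signal underlying this kind of analysis can be illustrated with mutual information between two alignment columns, which is high when compensating mutations preserve a base pair. A minimal sketch (a toy statistic, not the paper's phylogenetically weighted method):

```python
from collections import Counter
from math import log2

def mutual_information(col_i, col_j):
    """MI (bits) between two alignment columns of equal length."""
    n = len(col_i)
    pi, pj = Counter(col_i), Counter(col_j)
    pij = Counter(zip(col_i, col_j))
    return sum((c / n) * log2((c / n) / ((pi[a] / n) * (pj[b] / n)))
               for (a, b), c in pij.items())

# Toy alignment: columns 1 and 2 always form a Watson-Crick pair.
col1 = list("GGCCAAUU")
col2 = list("CCGGUUAA")
col3 = list("AAAAAAAA")
print(mutual_information(col1, col2))  # 2.0 bits: perfect covariation
print(mutual_information(col1, col3))  # 0.0: a conserved column carries no signal
```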

  19. Covariant Evolutionary Event Analysis for Base Interaction Prediction Using a Relational Database Management System for RNA

    PubMed Central

    Xu, Weijia; Ozer, Stuart; Gutell, Robin R.

    2010-01-01

    With an increasingly large number of properly aligned sequences, comparative sequence analysis can accurately identify not only common structures formed by standard base pairing but also new types of structural elements and constraints. However, traditional methods are too computationally expensive to perform well on large-scale alignments and are less effective with sequences from diverse phylogenetic classifications. We propose a new approach that utilizes coevolution rates among pairs of nucleotide positions, using the phylogenetic and evolutionary relationships of the organisms of the aligned sequences. With a novel data schema to manage the relevant information within a relational database, our method, implemented with Microsoft SQL Server 2005, showed 90% sensitivity in identifying base pair interactions among 16S ribosomal RNA sequences from Bacteria, at a scale 40 times larger, and with 50% better sensitivity, than a previous study. The results also indicated covariation signals for a few sets of cross-strand base stacking pairs in secondary structure helices, and other subtle constraints in the RNA structure. PMID:20502534

  20. A neotropical Miocene pollen database employing image-based search and semantic modeling1

    PubMed Central

    Han, Jing Ginger; Cao, Hongfei; Barb, Adrian; Punyasena, Surangi W.; Jaramillo, Carlos; Shyu, Chi-Ren

    2014-01-01

    • Premise of the study: Digital microscopic pollen images are being generated with increasing speed and volume, producing opportunities to develop new computational methods that increase the consistency and efficiency of pollen analysis and provide the palynological community with a computational framework for information sharing and knowledge transfer. • Methods: Mathematical methods were used to assign trait semantics (abstract morphological representations) to the images of neotropical Miocene pollen and spores. Advanced database-indexing structures were built to compare and retrieve similar images based on their visual content. A Web-based system was developed to provide novel tools for automatic trait semantic annotation and image retrieval by trait semantics and visual content. • Results: Mathematical models that map visual features to trait semantics can be used to annotate images with morphology semantics and to search image databases with improved reliability and productivity. Images can also be searched by visual content, providing users with customized emphases on traits such as color, shape, and texture. • Discussion: Content- and semantic-based image searches provide a powerful computational platform for pollen and spore identification. The infrastructure outlined provides a framework for building a community-wide palynological resource, streamlining the process of manual identification, analysis, and species discovery. PMID:25202648

  1. An ab initio electronic transport database for inorganic materials.

    PubMed

    Ricci, Francesco; Chen, Wei; Aydemir, Umut; Snyder, G Jeffrey; Rignanese, Gian-Marco; Jain, Anubhav; Hautier, Geoffroy

    2017-07-04

    Electronic transport in materials is governed by a series of tensorial properties such as conductivity, Seebeck coefficient, and effective mass. These quantities are paramount to the understanding of materials in many fields from thermoelectrics to electronics and photovoltaics. Transport properties can be calculated from a material's band structure using the Boltzmann transport theory framework. We present here the largest computational database of electronic transport properties based on a large set of 48,000 materials originating from the Materials Project database. Our results were obtained through the interpolation approach developed in the BoltzTraP software, assuming a constant relaxation time. We present the workflow to generate the data, the data validation procedure, and the database structure. Our aim is to target the large community of scientists developing materials selection strategies and performing studies involving transport properties.
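
    To make the constant relaxation-time step concrete, here is a minimal sketch (an illustration, not the BoltzTraP implementation): a 1D tight-binding band stands in for an interpolated band structure, and the Boltzmann transport integrals yield a conductivity (up to a factor of e²τ) and a Seebeck coefficient as functions of the chemical potential. The hopping parameter and grid are arbitrary toy choices.

```python
import numpy as np

kB = 8.617333e-5  # Boltzmann constant, eV/K

def transport(mu, T=300.0, t_hop=1.0, nk=20001):
    """Constant relaxation-time transport integrals for a 1D cosine band."""
    k = np.linspace(-np.pi, np.pi, nk)
    dk = k[1] - k[0]
    eps = -2.0 * t_hop * np.cos(k)        # band energy eps(k), in eV
    v = 2.0 * t_hop * np.sin(k)           # group velocity d(eps)/dk
    x = (eps - mu) / (kB * T)
    mdf = 0.25 / (kB * T * np.cosh(x / 2.0) ** 2)   # -df/d(eps), overflow-safe
    sigma = np.sum(v**2 * mdf) * dk                 # conductivity ~ e^2 tau * sigma
    l1 = np.sum(v**2 * (eps - mu) * mdf) * dk
    seebeck = -l1 / (T * sigma)                     # V/K (charge -e, eps in eV)
    return sigma, seebeck

for mu in (-1.9, 0.0, 1.9):   # near band bottom, mid-band, near band top
    sigma, S = transport(mu)
    print(f"mu = {mu:+.1f} eV: sigma ~ {sigma:.3e}, S = {S * 1e6:+8.1f} uV/K")
```

    The sign change of the Seebeck coefficient between the band edges (electron-like vs. hole-like) falls out of the integrals directly, which is the kind of trend such a database exposes at scale.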

  2. Searching LOGIN, the Local Government Information Network.

    ERIC Educational Resources Information Center

    Jack, Robert F.

    1984-01-01

    Describes a computer-based information retrieval and electronic messaging system produced by Control Data Corporation, now used by government agencies and other organizations. Covers the background of the Local Government Information Network (LOGIN), database structure, types of LOGIN units, searching LOGIN (intersect, display, and list commands), and how…

  3. A Practical Engineering Approach to Predicting Fatigue Crack Growth in Riveted Lap Joints

    NASA Technical Reports Server (NTRS)

    Harris, Charles E.; Piascik, Robert S.; Newman, James C., Jr.

    1999-01-01

    An extensive experimental database has been assembled from very detailed teardown examinations of fatigue cracks found in rivet holes of fuselage structural components. Based on this experimental database, a comprehensive analysis methodology was developed to predict the onset of widespread fatigue damage in lap joints of fuselage structure. Several computer codes were developed with specialized capabilities to conduct the various analyses that make up the comprehensive methodology. Over the past several years, the authors have interrogated various aspects of the analysis methods to determine the degree of computational rigor required to produce numerical predictions with acceptable engineering accuracy. This study led to the formulation of a practical engineering approach to predicting fatigue crack growth in riveted lap joints. This paper describes the practical engineering approach and compares predictions with the results from several experimental studies.

  4. A Practical Engineering Approach to Predicting Fatigue Crack Growth in Riveted Lap Joints

    NASA Technical Reports Server (NTRS)

    Harris, C. E.; Piascik, R. S.; Newman, J. C., Jr.

    2000-01-01

    An extensive experimental database has been assembled from very detailed teardown examinations of fatigue cracks found in rivet holes of fuselage structural components. Based on this experimental database, a comprehensive analysis methodology was developed to predict the onset of widespread fatigue damage in lap joints of fuselage structure. Several computer codes were developed with specialized capabilities to conduct the various analyses that make up the comprehensive methodology. Over the past several years, the authors have interrogated various aspects of the analysis methods to determine the degree of computational rigor required to produce numerical predictions with acceptable engineering accuracy. This study led to the formulation of a practical engineering approach to predicting fatigue crack growth in riveted lap joints. This paper describes the practical engineering approach and compares predictions with the results from several experimental studies.

  5. Predicting Ligand Binding Sites on Protein Surfaces by 3-Dimensional Probability Density Distributions of Interacting Atoms

    PubMed Central

    Jian, Jhih-Wei; Elumalai, Pavadai; Pitti, Thejkiran; Wu, Chih Yuan; Tsai, Keng-Chang; Chang, Jeng-Yih; Peng, Hung-Pin; Yang, An-Suei

    2016-01-01

    Predicting ligand binding sites (LBSs) on protein structures, which are obtained either from experimental or computational methods, is a useful first step in functional annotation or structure-based drug design for the protein structures. In this work, the structure-based machine learning algorithm ISMBLab-LIG was developed to predict LBSs on protein surfaces with input attributes derived from the three-dimensional probability density maps of interacting atoms, which were reconstructed on the query protein surfaces and were relatively insensitive to local conformational variations of the tentative ligand binding sites. The prediction accuracy of the ISMBLab-LIG predictors is comparable to that of the best LBS predictors benchmarked on several well-established testing datasets. More importantly, the ISMBLab-LIG algorithm has substantial tolerance to the prediction uncertainties of computationally derived protein structure models. As such, the method is particularly useful for predicting LBSs not only on experimental protein structures without known LBS templates in the database but also on computationally predicted model protein structures with structural uncertainties in the tentative ligand binding sites. PMID:27513851

  6. Kentucky geotechnical database.

    DOT National Transportation Integrated Search

    2005-03-01

    Development of a comprehensive, dynamic geotechnical database is described. Computer software selected to program the client/server application in a Windows environment, components and structure of the geotechnical database, and primary factors cons...

  7. Network-based statistical comparison of citation topology of bibliographic databases

    PubMed Central

    Šubelj, Lovro; Fiala, Dalibor; Bajec, Marko

    2014-01-01

    Modern bibliographic databases provide the basis for scientific research and its evaluation. While their content and structure differ substantially, there exist only informal notions of their reliability. Here we compare the topological consistency of citation networks extracted from six popular bibliographic databases, including Web of Science, CiteSeer and arXiv.org. The networks are assessed through a rich set of local and global graph statistics. We first reveal statistically significant inconsistencies between some of the databases with respect to individual statistics. For example, the introduced field bow-tie decomposition of the DBLP Computer Science Bibliography differs substantially from the rest due to the coverage of the database, while the citation information within arXiv.org is the most exhaustive. Finally, we compare the databases over multiple graph statistics using the critical difference diagram. The citation topology of the DBLP Computer Science Bibliography is the least consistent with the rest, while, not surprisingly, Web of Science is significantly more reliable from the perspective of consistency. This work can serve either as a reference for scholars in bibliometrics and scientometrics or as a scientific evaluation guideline for governments and research agencies. PMID:25263231

  8. A theoretical-electron-density databank using a model of real and virtual spherical atoms.

    PubMed

    Nassour, Ayoub; Domagala, Slawomir; Guillot, Benoit; Leduc, Theo; Lecomte, Claude; Jelsch, Christian

    2017-08-01

    A database describing the electron density of common chemical groups using combinations of real and virtual spherical atoms is proposed as an alternative to multipolar-atom modelling of the molecular charge density. Theoretical structure factors were computed from periodic density functional theory calculations on 38 crystal structures of small molecules, and the charge density was subsequently refined using a density model based on real spherical atoms and additional dummy charges on the covalent bonds and on electron lone-pair sites. The electron-density parameters of real and dummy atoms present in a similar chemical environment were averaged over all the molecules studied to build a database of transferable spherical atoms. Compared with the now-popular databases of transferable multipolar parameters, the spherical charge modelling needs fewer parameters to describe the molecular electron density and can be more easily incorporated in molecular modelling software for the computation of electrostatic properties. The construction method of the database is described. In order to analyse to what extent this modelling method can be used to derive meaningful molecular properties, it has been applied to the urea molecule and to biotin/streptavidin, a protein/ligand complex.
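
    Part of why a spherical-atom / point-charge model is convenient for software is that, once a molecule is described by charges qᵢ at positions rᵢ (real atoms plus virtual bond and lone-pair sites), the electrostatic potential is a direct sum, V(r) = Σᵢ qᵢ/|r − rᵢ|. A minimal sketch, with charges and positions invented for illustration rather than taken from the databank:

```python
import numpy as np

def potential(r, sites):
    """V(r) = sum_i q_i / |r - r_i| in atomic units (Hartree, bohr)."""
    return sum(q / np.linalg.norm(r - ri) for q, ri in sites)

# toy C=O fragment: partial charges on atoms plus a virtual site on the bond
sites = [(+0.4, np.array([0.00, 0.0, 0.0])),   # C
         (-0.5, np.array([2.30, 0.0, 0.0])),   # O
         (+0.1, np.array([1.15, 0.0, 0.0]))]   # virtual bond charge
print(potential(np.array([0.0, 4.0, 0.0]), sites))
```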

  9. Cybermaterials: materials by design and accelerated insertion of materials

    NASA Astrophysics Data System (ADS)

    Xiong, Wei; Olson, Gregory B.

    2016-02-01

    Cybermaterials innovation entails an integration of Materials by Design and accelerated insertion of materials (AIM), which transfers studio ideation into industrial manufacturing. By assembling a hierarchical architecture of integrated computational materials design (ICMD) based on materials genomic fundamental databases, the ICMD mechanistic design models accelerate innovation. We here review progress in the development of linkage models of the process-structure-property-performance paradigm, as well as related design accelerating tools. Extending the materials development capability based on phase-level structural control requires more fundamental investment at the level of the Materials Genome, with focus on improving applicable parametric design models and constructing high-quality databases. Future opportunities in materials genomic research serving both Materials by Design and AIM are addressed.

  10. Cyberinfrastructure for the Unified Study of Earth Structure and Earthquake Sources in Complex Geologic Environments

    NASA Astrophysics Data System (ADS)

    Zhao, L.; Chen, P.; Jordan, T. H.; Olsen, K. B.; Maechling, P.; Faerman, M.

    2004-12-01

    The Southern California Earthquake Center (SCEC) is developing a Community Modeling Environment (CME) to facilitate the computational pathways of physics-based seismic hazard analysis (Maechling et al., this meeting). Major goals are to facilitate the forward modeling of seismic wavefields in complex geologic environments, including the strong ground motions that cause earthquake damage, and the inversion of observed waveform data for improved models of Earth structure and fault rupture. Here we report on a unified approach to these coupled inverse problems that is based on the ability to generate and manipulate wavefields in densely gridded 3D Earth models. A main element of this approach is a database of receiver Green tensors (RGT) for the seismic stations, which comprises all of the spatial-temporal displacement fields produced by the three orthogonal unit impulsive point forces acting at each of the station locations. Once the RGT database is established, synthetic seismograms for any earthquake can be simply calculated by extracting a small, source-centered volume of the RGT from the database and applying the reciprocity principle. The partial derivatives needed for point- and finite-source inversions can be generated in the same way. Moreover, the RGT database can be employed in full-wave tomographic inversions launched from a 3D starting model, because the sensitivity (Fréchet) kernels for travel-time and amplitude anomalies observed at seismic stations in the database can be computed by convolving the earthquake-induced displacement field with the station RGTs. We illustrate all elements of this unified analysis with an RGT database for 33 stations of the California Integrated Seismic Network in and around the Los Angeles Basin, which we computed for the 3D SCEC Community Velocity Model (SCEC CVM3.0) using a fourth-order staggered-grid finite-difference code. For a spatial grid spacing of 200 m and a time resolution of 10 ms, the calculations took ~19,000 node-hours on the Linux cluster at USC's High-Performance Computing Center. The 33-station database with a volume of ~23.5 TB was archived in the SCEC digital library at the San Diego Supercomputer Center using the Storage Resource Broker (SRB). From a laptop, anyone with access to this SRB collection can compute synthetic seismograms for an arbitrary source in the CVM in a matter of minutes. Efficient approaches have been implemented to use this RGT database in the inversions of waveforms for centroid and finite moment tensors and tomographic inversions to improve the CVM. Our experience with these large problems suggests areas where the cyberinfrastructure currently available for geoscience computation needs to be improved.
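
    As an illustration of the reciprocity step, the following minimal sketch (not the SCEC code; the array layout, grid spacing, and moment tensor are assumptions) contracts a moment tensor with finite-difference spatial derivatives of a stored RGT volume around the source point to produce a three-component synthetic seismogram.

```python
import numpy as np

def synthetic(rgt, M, h=200.0):
    """
    rgt: array (3, 3, 3, 3, 3, nt) = G[n, i, x, y, z, t] on a 3x3x3 grid
         centred on the source (n: receiver component, i: force direction).
    M:   3x3 moment tensor. Returns a (3, nt) displacement at the receiver,
         u_n(t) = sum_ij M_ij * dG_ni/dx_j evaluated at the source point.
    """
    nt = rgt.shape[-1]
    u = np.zeros((3, nt))
    for n in range(3):
        for i in range(3):
            # central finite differences of G w.r.t. the source coordinates
            dG = [(rgt[n, i, 2, 1, 1] - rgt[n, i, 0, 1, 1]) / (2 * h),
                  (rgt[n, i, 1, 2, 1] - rgt[n, i, 1, 0, 1]) / (2 * h),
                  (rgt[n, i, 1, 1, 2] - rgt[n, i, 1, 1, 0]) / (2 * h)]
            for j in range(3):
                u[n] += M[i, j] * dG[j]
    return u

rgt = np.random.rand(3, 3, 3, 3, 3, 1000)   # placeholder for an SRB download
M = np.diag([1e17, 1e17, 1e17])             # explosion-like source (N*m)
print(synthetic(rgt, M).shape)              # (3, 1000)
```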

  11. Inverse Band Structure Design via Materials Database Screening: Application to Square Planar Thermoelectrics

    DOE PAGES

    Isaacs, Eric B.; Wolverton, Chris

    2018-02-26

    Electronic band structure contains a wealth of information on the electronic properties of a solid and is routinely computed. However, the more difficult problem of designing a solid with a desired band structure is an outstanding challenge. In order to address this inverse band structure design problem, we devise an approach using materials database screening with materials attributes based on the constituent elements, nominal electron count, crystal structure, and thermodynamics. Our strategy is tested in the context of thermoelectric materials, for which a targeted band structure containing both flat and dispersive components with respect to crystal momentum is highly desirable. We screen for thermodynamically stable or metastable compounds containing d8 transition metals coordinated by anions in a square planar geometry in order to mimic the properties of recently identified oxide thermoelectrics with such a band structure. In doing so, we identify 157 compounds out of a total of over half a million candidates. After further screening based on electronic band gap and structural anisotropy, we explicitly compute the band structures for several of the candidates in order to validate the approach. We successfully find two new oxide systems that achieve the targeted band structure. Electronic transport calculations on these two compounds, Ba2PdO3 and La4PdO7, confirm promising thermoelectric power factor behavior for the compounds. This methodology is easily adapted to other targeted band structures and should be widely applicable to a variety of design problems.

  12. Inverse Band Structure Design via Materials Database Screening: Application to Square Planar Thermoelectrics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Isaacs, Eric B.; Wolverton, Chris

    Electronic band structure contains a wealth of information on the electronic properties of a solid and is routinely computed. However, the more difficult problem of designing a solid with a desired band structure is an outstanding challenge. In order to address this inverse band structure design problem, we devise an approach using materials database screening with materials attributes based on the constituent elements, nominal electron count, crystal structure, and thermodynamics. Our strategy is tested in the context of thermoelectric materials, for which a targeted band structure containing both flat and dispersive components with respect to crystal momentum is highly desirable. We screen for thermodynamically stable or metastable compounds containing d8 transition metals coordinated by anions in a square planar geometry in order to mimic the properties of recently identified oxide thermoelectrics with such a band structure. In doing so, we identify 157 compounds out of a total of over half a million candidates. After further screening based on electronic band gap and structural anisotropy, we explicitly compute the band structures for several of the candidates in order to validate the approach. We successfully find two new oxide systems that achieve the targeted band structure. Electronic transport calculations on these two compounds, Ba2PdO3 and La4PdO7, confirm promising thermoelectric power factor behavior for the compounds. This methodology is easily adapted to other targeted band structures and should be widely applicable to a variety of design problems.

  13. 3D-SURFER 2.0: web platform for real-time search and characterization of protein surfaces.

    PubMed

    Xiong, Yi; Esquivel-Rodriguez, Juan; Sael, Lee; Kihara, Daisuke

    2014-01-01

    The increasing number of uncharacterized protein structures necessitates the development of computational approaches for function annotation using protein tertiary structures. Protein structure database search is the basis of any structure-based functional elucidation of proteins. 3D-SURFER is a web platform for real-time protein surface comparison of a given protein structure against the entire PDB using 3D Zernike descriptors. It can smoothly navigate the protein structure space in real time from one query structure to another. A major new feature of Release 2.0 is the ability to compare the protein surface of a single chain, a single domain, or a single complex against databases of protein chains, domains, complexes, or a combination of all three in the latest PDB. Additionally, two types of protein surfaces can now be compared: all-atom surfaces and backbone-atom surfaces. The server can also accept a batch job for a large number of database searches. Pockets in protein surfaces can be identified by VisGrid and LIGSITEcsc. The server is available at http://kiharalab.org/3d-surfer/.
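
    The reason such comparisons can run in real time is that each surface is reduced to a fixed-length rotation-invariant descriptor vector, so search is a nearest-neighbour scan. A minimal sketch with random placeholders standing in for precomputed descriptors (3D-SURFER's actual invariants come from the surface voxelization):

```python
import numpy as np

rng = np.random.default_rng(0)
database = rng.random((10000, 121))   # 121-dim descriptors for 10,000 surfaces
query = rng.random(121)               # descriptor of the query structure

d = np.linalg.norm(database - query, axis=1)   # Euclidean distances
top5 = np.argsort(d)[:5]                       # best-matching entries
print(list(zip(top5.tolist(), d[top5].round(3).tolist())))
```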

  14. Projections for fast protein structure retrieval

    PubMed Central

    Bhattacharya, Sourangshu; Bhattacharyya, Chiranjib; Chandra, Nagasuma R

    2006-01-01

    Background In recent times, there has been an exponential rise in the number of protein structures in databases, e.g. the PDB. The design of fast algorithms capable of querying such databases is therefore becoming an increasingly important research issue. This paper reports an algorithm, motivated by spectral graph matching techniques, for retrieving protein structures similar to a query structure from a large protein structure database. Each protein structure is specified by the 3D coordinates of the residues of the protein. The algorithm is based on a novel characterization of the residues, called projections, leading to a similarity measure between the residues of two proteins. This measure is exploited to efficiently compute the optimal equivalences. Results Experimental results show that the current algorithm outperforms the state of the art on benchmark datasets in terms of speed without losing accuracy. Search results on the SCOP 95% non-redundant database, for fold similarity with 5 proteins from different SCOP classes, show that the current method performs competitively with the standard algorithm CE. The algorithm is also capable of detecting non-topological similarities between two proteins, which is not possible with most state-of-the-art tools such as Dali. PMID:17254310

  15. Definitions of database files and fields of the Personal Computer-Based Water Data Sources Directory

    USGS Publications Warehouse

    Green, J. Wayne

    1991-01-01

    This report describes the database files and fields of the personal computer-based Water Data Sources Directory (WDSD). The personal computer-based WDSD was derived from the U.S. Geological Survey (USGS) mainframe computer version. The mainframe version of the WDSD is a hierarchical database design; the personal computer-based WDSD is a relational database design. This report describes the database files and fields of the relational database design in dBASE IV (the use of brand names in this abstract is for identification purposes only and does not constitute endorsement by the U.S. Geological Survey) for the personal computer. The WDSD contains information on (1) the type of organization, (2) the major orientation of water-data activities conducted by each organization, (3) the names, addresses, and telephone numbers of offices within each organization from which water data may be obtained, (4) the types of data held by each organization and the geographic locations within which these data have been collected, (5) alternative sources of an organization's data, (6) the designation of liaison personnel in matters related to water-data acquisition and indexing, (7) the volume of water data indexed for the organization, and (8) information about other types of data and services available from the organization that are pertinent to water-resources activities.

  16. Pharmacophore modeling, docking, and principal component analysis based clustering: combined computer-assisted approaches to identify new inhibitors of the human rhinovirus coat protein.

    PubMed

    Steindl, Theodora M; Crump, Carolyn E; Hayden, Frederick G; Langer, Thierry

    2005-10-06

    The development and application of a sophisticated virtual screening and selection protocol to identify potential, novel inhibitors of the human rhinovirus coat protein employing various computer-assisted strategies are described. A large commercially available database of compounds was screened using a highly selective, structure-based pharmacophore model generated with the program Catalyst. A docking study and a principal component analysis were carried out within the software package Cerius and served to validate and further refine the obtained results. These combined efforts led to the selection of six candidate structures, for which in vitro anti-rhinoviral activity could be shown in a biological assay.

  17. Searching molecular structure databases with tandem mass spectra using CSI:FingerID

    PubMed Central

    Dührkop, Kai; Shen, Huibin; Meusel, Marvin; Rousu, Juho; Böcker, Sebastian

    2015-01-01

    Metabolites provide a direct functional signature of cellular state. Untargeted metabolomics experiments usually rely on tandem MS to identify the thousands of compounds in a biological sample. Today, the vast majority of metabolites remain unknown. We present a method for searching molecular structure databases using tandem MS data of small molecules. Our method computes a fragmentation tree that best explains the fragmentation spectrum of an unknown molecule. We use the fragmentation tree to predict the molecular structure fingerprint of the unknown compound using machine learning. This fingerprint is then used to search a molecular structure database such as PubChem. Our method is shown to improve on the competing methods for computational metabolite identification by a considerable margin. PMID:26392543
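
    The final ranking step, comparing the predicted fingerprint against precomputed fingerprints of database structures, reduces to a bit-vector similarity search. Below is a minimal sketch using Tanimoto similarity; the fingerprints, IDs and scoring are illustrative assumptions, not CSI:FingerID's actual scoring function.

      def tanimoto(fp1, fp2):
          """Tanimoto similarity of two fingerprints given as sets of on-bits."""
          inter = len(fp1 & fp2)
          union = len(fp1 | fp2)
          return inter / union if union else 0.0

      def search(predicted_fp, database):
          """Rank database entries by similarity to the predicted fingerprint.
          `database` maps structure IDs to fingerprints (sets of bit indices)."""
          return sorted(database.items(),
                        key=lambda item: tanimoto(predicted_fp, item[1]),
                        reverse=True)

      # toy usage with hypothetical bit patterns
      db = {"CID001": {1, 4, 7, 9}, "CID002": {1, 2, 3}, "CID003": {4, 7}}
      for cid, fp in search({1, 4, 7}, db):
          print(cid, round(tanimoto({1, 4, 7}, fp), 3))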

  18. An ab initio electronic transport database for inorganic materials

    DOE PAGES

    Ricci, Francesco; Chen, Wei; Aydemir, Umut; ...

    2017-07-04

    Electronic transport in materials is governed by a series of tensorial properties such as conductivity, Seebeck coefficient, and effective mass. These quantities are paramount to the understanding of materials in many fields from thermoelectrics to electronics and photovoltaics. Transport properties can be calculated from a material’s band structure using the Boltzmann transport theory framework. We present here the largest computational database of electronic transport properties based on a large set of 48,000 materials originating from the Materials Project database. Our results were obtained through the interpolation approach developed in the BoltzTraP software, assuming a constant relaxation time. We present the workflow to generate the data, the data validation procedure, and the database structure. In conclusion, our aim is to target the large community of scientists developing materials selection strategies and performing studies involving transport properties.
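
    For orientation, a sketch of the standard constant-relaxation-time expressions behind such BoltzTraP-style databases (notation is ours, not quoted from the paper): a transport distribution is built from band velocities, and the conductivity and Seebeck tensors follow by energy integration against the Fermi-window factor.

      \sigma_{\alpha\beta}(\varepsilon) = \frac{e^{2}\tau}{\Omega}
        \sum_{i,\mathbf{k}} v_{\alpha}(i,\mathbf{k})\, v_{\beta}(i,\mathbf{k})\,
        \delta(\varepsilon - \varepsilon_{i,\mathbf{k}}), \qquad
      \sigma_{\alpha\beta}(T;\mu) = \int \sigma_{\alpha\beta}(\varepsilon)
        \left[-\frac{\partial f_{\mu}(T;\varepsilon)}{\partial \varepsilon}\right]
        \mathrm{d}\varepsilon,

      S = \sigma^{-1}\,\nu, \qquad
      \nu_{\alpha\beta}(T;\mu) = \frac{1}{eT}\int \sigma_{\alpha\beta}(\varepsilon)\,
        (\varepsilon - \mu)
        \left[-\frac{\partial f_{\mu}(T;\varepsilon)}{\partial \varepsilon}\right]
        \mathrm{d}\varepsilon .

    Because the relaxation time tau enters only as an overall factor under this approximation, such a database can tabulate sigma/tau and related quantities without committing to a material-specific relaxation time.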

  19. An ab initio electronic transport database for inorganic materials

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ricci, Francesco; Chen, Wei; Aydemir, Umut

    Electronic transport in materials is governed by a series of tensorial properties such as conductivity, Seebeck coefficient, and effective mass. These quantities are paramount to the understanding of materials in many fields from thermoelectrics to electronics and photovoltaics. Transport properties can be calculated from a material’s band structure using the Boltzmann transport theory framework. We present here the largest computational database of electronic transport properties based on a large set of 48,000 materials originating from the Materials Project database. Our results were obtained through the interpolation approach developed in the BoltzTraP software, assuming a constant relaxation time. We present the workflow to generate the data, the data validation procedure, and the database structure. In conclusion, our aim is to target the large community of scientists developing materials selection strategies and performing studies involving transport properties.

  20. CFD in the 1980's from one point of view

    NASA Technical Reports Server (NTRS)

    Lomax, Harvard

    1991-01-01

    The present interpretive treatment of the development history of CFD in the 1980s gives attention to advancements in such algorithmic techniques as flux Jacobian-based upwind differencing, total variation-diminishing and essentially nonoscillatory schemes, multigrid methods, unstructured grids, and nonrectangular structured grids. At the same time, computational turbulence research gave attention to turbulence modeling on the bases of increasingly powerful supercomputers and meticulously constructed databases. The major future developments in CFD will encompass such capabilities as structured and unstructured three-dimensional grids.

  1. Evaluation of the performance of MP4-based procedures for a wide range of thermochemical and kinetic properties

    NASA Astrophysics Data System (ADS)

    Yu, Li-Juan; Wan, Wenchao; Karton, Amir

    2016-11-01

    We evaluate the performance of standard and modified MPn procedures for a wide set of thermochemical and kinetic properties, including atomization energies, structural isomerization energies, conformational energies, and reaction barrier heights. The reference data are obtained at the CCSD(T)/CBS level by means of the Wn thermochemical protocols. We find that none of the MPn-based procedures show acceptable performance for the challenging W4-11 and BH76 databases. For the other thermochemical/kinetic databases, the MP2.5 and MP3.5 procedures provide the most attractive accuracy-to-computational cost ratios. The MP2.5 procedure results in a weighted-total-root-mean-square deviation (WTRMSD) of 3.4 kJ/mol, whilst the computationally more expensive MP3.5 procedure results in a WTRMSD of 1.9 kJ/mol (the same WTRMSD obtained for the CCSD(T) method in conjunction with a triple-zeta basis set). We also assess the performance of the computationally economical CCSD(T)/CBS(MP2) method, which provides the best overall performance for all the considered databases, including W4-11 and BH76.

  2. [Construction of chemical information database based on optical structure recognition technique].

    PubMed

    Lv, C Y; Li, M N; Zhang, L R; Liu, Z M

    2018-04-18

    To create a protocol for constructing a chemical information database from the scientific literature quickly and automatically. Scientific literature, patents and technical reports from different chemical disciplines were collected and stored in PDF format as the fundamental dataset. Chemical structures were transformed from published documents and images into machine-readable data by using name-conversion technology and the optical structure recognition tool CLiDE. During the extraction of molecular structure information, Markush structures were enumerated into well-defined monomer molecules by means of the QueryTools of the molecule editor ChemDraw. The document management software EndNote X8 was applied to acquire bibliographic references, including title, author, journal and year of publication. The text mining toolkit ChemDataExtractor was adopted to retrieve information for populating the structured chemical database from figures, tables and textual paragraphs. After this step, detailed manual revision and annotation were conducted to ensure the accuracy and completeness of the data. In addition to the literature data, the computing simulation platform Pipeline Pilot 7.5 was utilized to calculate physical and chemical properties and to predict molecular attributes. Furthermore, the open database ChEMBL was linked to fetch known bioactivities, such as indications and targets. After information extraction and data expansion, five separate metadata files were generated: the molecular structure data file, molecular information, bibliographic references, predicted attributes and known bioactivities. With the canonical simplified molecular-input line-entry specification (SMILES) as the primary key, the metadata files were associated through common key nodes, including molecule number and PDF number, to construct an integrated chemical information database. A workable construction protocol for a chemical information database was thus established. A total of 174 research articles and 25 reviews published in Marine Drugs from January 2015 to June 2016 were collected as the primary data source, and an elementary marine natural product database named PKU-MNPD was built in accordance with this protocol, containing 3,262 molecules and 19,821 records. This data aggregation protocol greatly aids the accurate, comprehensive and efficient construction of chemical information databases from original documents. Such a structured database can facilitate access to medical intelligence and accelerate the translation of scientific research achievements.
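
    The keying scheme described above, canonical SMILES as the primary key with metadata tables joined through shared identifiers, can be sketched in a few lines. RDKit for canonicalization and SQLite for storage are assumptions of this sketch; the authors used ChemDraw, Pipeline Pilot and their own schema.

      import sqlite3

      from rdkit import Chem  # assumed available; used only to canonicalize

      def canonical_smiles(smiles):
          """Normalize an input SMILES to its canonical form (primary key)."""
          mol = Chem.MolFromSmiles(smiles)
          return Chem.MolToSmiles(mol) if mol else None

      conn = sqlite3.connect(":memory:")
      conn.executescript("""
      CREATE TABLE molecule  (can_smiles TEXT PRIMARY KEY, mol_number TEXT);
      CREATE TABLE reference (mol_number TEXT, pdf_number TEXT, title TEXT);
      """)

      # one hypothetical record; both tables share mol_number as a key node
      can = canonical_smiles("C1=CC=CC=C1O")  # phenol, canonicalized
      conn.execute("INSERT INTO molecule VALUES (?, ?)", (can, "MNPD-0001"))
      conn.execute("INSERT INTO reference VALUES (?, ?, ?)",
                   ("MNPD-0001", "PDF-042", "Example article"))

      # join the metadata files through the common key node
      for row in conn.execute("""SELECT m.can_smiles, r.title
                                 FROM molecule m JOIN reference r
                                 ON m.mol_number = r.mol_number"""):
          print(row)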

  3. BESIII Physical Analysis on Hadoop Platform

    NASA Astrophysics Data System (ADS)

    Huo, Jing; Zang, Dongsong; Lei, Xiaofeng; Li, Qiang; Sun, Gongxing

    2014-06-01

    In the past 20 years, computing clusters have been widely used for High Energy Physics data processing. Jobs running on a traditional cluster with a data-to-computing structure have to read large volumes of data over the network to the computing nodes for analysis, making I/O latency a bottleneck for the whole system. The new distributed computing technology based on the MapReduce programming model has many advantages, such as high concurrency, high scalability and high fault tolerance, and it can benefit us in dealing with Big Data. This paper brings the idea of using the MapReduce model to BESIII physics analysis and presents a new data analysis system structure based on the Hadoop platform, which not only greatly improves the efficiency of data analysis but also reduces the cost of building the system. Moreover, this paper establishes an event pre-selection system based on an event-level metadata (TAG) database to optimize the data analysis procedure.
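
    The inversion the paper describes, running the selection next to the data and combining partial results, is the essence of MapReduce. A toy sketch of TAG-based event pre-selection in that style; the event fields and cuts are hypothetical.

      from functools import reduce

      # hypothetical TAG records: event-level metadata kept in a small database
      tags = [
          {"event_id": 1, "n_tracks": 4, "total_energy": 3.1},
          {"event_id": 2, "n_tracks": 2, "total_energy": 1.0},
          {"event_id": 3, "n_tracks": 5, "total_energy": 3.6},
      ]

      def mapper(tag):
          """Emit (selected_ids, count); the cut runs next to the data."""
          selected = tag["n_tracks"] >= 4 and tag["total_energy"] > 3.0
          return ([tag["event_id"]], 1) if selected else ([], 0)

      def reducer(acc, item):
          """Merge partial results coming back from all mappers."""
          ids, count = acc
          return (ids + item[0], count + item[1])

      selected_ids, n_selected = reduce(reducer, map(mapper, tags), ([], 0))
      print(selected_ids, n_selected)  # only events passing the pre-selection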

  4. A flexible computer aid for conceptual design based on constraint propagation and component-modeling. [of aircraft in three dimensions

    NASA Technical Reports Server (NTRS)

    Kolb, Mark A.

    1988-01-01

    The Rubber Airplane program, which combines two symbolic processing techniques with a component-based database of design knowledge, is proposed as a computer aid for conceptual design. With object-oriented programming, programs are organized around the objects and behaviors to be simulated; with constraint propagation, declarative statements designate mathematical relationships among all the equation variables. It is found that the additional level of organizational structure resulting from the arrangement of the design information in terms of design components provides greater flexibility and convenience.
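
    Constraint propagation of the kind described, declarative relations among design variables with values flowing in whichever direction is determined, can be sketched in a few lines; the aircraft-sizing relation used here is hypothetical, not Rubber Airplane's actual model.

      class Constraint:
          """A relation among named variables; solves for the one unknown."""
          def __init__(self, vars_, solvers):
              self.vars, self.solvers = vars_, solvers  # solvers: var -> fn

          def propagate(self, env):
              unknown = [v for v in self.vars if v not in env]
              if len(unknown) == 1:        # exactly one unknown: solve for it
                  v = unknown[0]
                  env[v] = self.solvers[v](env)
                  return True
              return False

      # declarative statement: weight = wing_area * loading (hypothetical)
      c = Constraint(
          ("weight", "wing_area", "loading"),
          {"weight":    lambda e: e["wing_area"] * e["loading"],
           "wing_area": lambda e: e["weight"] / e["loading"],
           "loading":   lambda e: e["weight"] / e["wing_area"]})

      env = {"weight": 12000.0, "loading": 300.0}
      while c.propagate(env):              # run to fixpoint
          pass
      print(env["wing_area"])              # 40.0, derived from the other two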

  5. FlavonoidSearch: A system for comprehensive flavonoid annotation by mass spectrometry.

    PubMed

    Akimoto, Nayumi; Ara, Takeshi; Nakajima, Daisuke; Suda, Kunihiro; Ikeda, Chiaki; Takahashi, Shingo; Muneto, Reiko; Yamada, Manabu; Suzuki, Hideyuki; Shibata, Daisuke; Sakurai, Nozomu

    2017-04-28

    Currently, in mass spectrometry-based metabolomics, limited reference mass spectra are available for flavonoid identification. In the present study, a database of probable mass fragments for 6,867 known flavonoids (FsDatabase) was manually constructed based on new structure- and fragmentation-related rules using new heuristics to overcome flavonoid complexity. We developed the FlavonoidSearch system for flavonoid annotation, which consists of the FsDatabase and a computational tool (FsTool) to automatically search the FsDatabase using the mass spectra of metabolite peaks as queries. This system showed the highest identification accuracy for the flavonoid aglycone when compared to existing tools and revealed accurate discrimination between the flavonoid aglycone and other compounds. Sixteen new flavonoids were found from parsley, and the diversity of the flavonoid aglycone among different fruits and vegetables was investigated.

  6. Fast 3D shape screening of large chemical databases through alignment-recycling

    PubMed Central

    Fontaine, Fabien; Bolton, Evan; Borodina, Yulia; Bryant, Stephen H

    2007-01-01

    Background Large chemical databases require fast, efficient, and simple ways of looking for similar structures. Although such tasks are now fairly well resolved for graph-based similarity queries, they remain an issue for 3D approaches, particularly for those based on 3D shape overlays. Inspired by a recent technique developed to compare molecular shapes, we designed a hybrid methodology, alignment-recycling, that enables efficient retrieval and alignment of structures with similar 3D shapes. Results Using a dataset of more than one million PubChem compounds of limited size (< 28 heavy atoms) and flexibility (< 6 rotatable bonds), we obtained a set of a few thousand diverse structures covering entirely the 3D shape space of the conformers of the dataset. Transformation matrices gathered from the overlays between these diverse structures and the 3D conformer dataset allowed us to drastically (100-fold) reduce the CPU time required for shape overlay. The alignment-recycling heuristic produces results consistent with de novo alignment calculation, with better than 80% hit list overlap on average. Conclusion Overlay-based 3D methods are computationally demanding when searching large databases. Alignment-recycling reduces the CPU time to perform shape similarity searches by breaking the alignment problem into three steps: selection of diverse shapes to describe the database shape-space; overlay of the database conformers to the diverse shapes; and non-optimized overlay of query and database conformers using common reference shapes. The precomputation, required by the first two steps, is a significant cost of the method; however, once performed, querying is two orders of magnitude faster. Extensions and variations of this methodology, for example, to handle more flexible and larger small-molecules are discussed. PMID:17880744
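
    The recycling step exploits the fact that rigid-body alignments compose: if A aligns the query conformer onto a reference shape and B aligns a database conformer onto the same reference, then inv(B) A overlays the query onto the database conformer with no new optimization. A sketch with 4x4 homogeneous matrices; NumPy and the placeholder transforms are assumptions of this sketch.

      import numpy as np

      def homogeneous(R, t):
          """Build a 4x4 rigid transform from rotation R (3x3), translation t."""
          T = np.eye(4)
          T[:3, :3], T[:3, 3] = R, t
          return T

      # A: query -> reference shape; B: database conformer -> same reference.
      # Both would be computed (and stored) during the precomputation phase.
      A = homogeneous(np.eye(3), np.array([1.0, 0.0, 0.0]))
      B = homogeneous(np.eye(3), np.array([0.0, 2.0, 0.0]))

      # Recycled overlay of query onto the database conformer: no re-optimization.
      query_to_db = np.linalg.inv(B) @ A

      pt = np.array([0.0, 0.0, 0.0, 1.0])   # a query atom, homogeneous coords
      print(query_to_db @ pt)               # its position in the database frame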

  7. A database perspective of the transition from single-use (ancillary-based) systems to integrated models supporting clinical care and research in a MUMPS-based system.

    PubMed

    Siegel, J; Kirkland, D

    1991-01-01

    The Composite Health Care System (CHCS), a MUMPS-based hospital information system (HIS), has evolved from the Decentralized Hospital Computer Program (DHCP) installed within VA Hospitals. The authors explore the evolution of an ancillary-based system toward an integrated model with a look at its current state and possible future. The history and relationships between orders of different types tie specific patient-related data into a logical and temporal model. Diagrams demonstrate how the database structure has evolved to support clinical needs for integration. It is suggested that a fully integrated model is capable of meeting traditional HIS needs.

  8. GenoMycDB: a database for comparative analysis of mycobacterial genes and genomes.

    PubMed

    Catanho, Marcos; Mascarenhas, Daniel; Degrave, Wim; Miranda, Antonio Basílio de

    2006-03-31

    Several databases and computational tools have been created with the aim of organizing, integrating and analyzing the wealth of information generated by large-scale sequencing projects of mycobacterial genomes and those of other organisms. However, with very few exceptions, these databases and tools do not allow for massive and/or dynamic comparison of these data. GenoMycDB (http://www.dbbm.fiocruz.br/GenoMycDB) is a relational database built for large-scale comparative analyses of completely sequenced mycobacterial genomes, based on their predicted protein content. Its central structure is composed of the results obtained after pair-wise sequence alignments among all the predicted proteins coded by the genomes of six mycobacteria: Mycobacterium tuberculosis (strains H37Rv and CDC1551), M. bovis AF2122/97, M. avium subsp. paratuberculosis K10, M. leprae TN, and M. smegmatis MC2 155. The database stores the computed similarity parameters of every aligned pair, providing for each protein sequence the predicted subcellular localization, the assigned cluster of orthologous groups, the features of the corresponding gene, and links to several important databases. Tables containing pairs or groups of potential homologs between selected species/strains can be produced dynamically by user-defined criteria, based on one or multiple sequence similarity parameters. In addition, searches can be restricted according to the predicted subcellular localization of the protein, the DNA strand of the corresponding gene and/or the description of the protein. Massive data search and/or retrieval are available, and different ways of exporting the result are offered. GenoMycDB provides an on-line resource for the functional classification of mycobacterial proteins as well as for the analysis of genome structure, organization, and evolution.

  9. [Adverse Effect Predictions Based on Computational Toxicology Techniques and Large-scale Databases].

    PubMed

    Uesawa, Yoshihiro

    2018-01-01

    Understanding the features of chemical structures related to the adverse effects of drugs is useful for identifying potential adverse effects of new drugs. This can be based on the limited information available from post-marketing surveillance, assessment of the potential toxicities of metabolites and illegal drugs with unclear characteristics, screening of lead compounds at the drug discovery stage, and identification of leads for the discovery of new pharmacological mechanisms. The present paper describes techniques used in computational toxicology to investigate the content of large-scale spontaneous-report databases of adverse effects, illustrated with examples. Furthermore, volcano plotting, a new visualization method for clarifying the relationships between drugs and adverse effects via comprehensive analyses, is introduced. These analyses may produce a great amount of data that can be applied to drug repositioning.

  10. Application of kernel functions for accurate similarity search in large chemical databases.

    PubMed

    Wang, Xiaohong; Huan, Jun; Smalter, Aaron; Lushington, Gerald H

    2010-04-29

    Similarity search in chemical structure databases is an important problem with many applications in chemical genomics, drug design, and efficient chemical probe screening, among others. It is widely believed that structure-based methods provide an efficient way to perform such queries. Recently, various graph kernel functions have been designed to capture the intrinsic similarity of graphs. Though successful in constructing accurate predictive and classification models, graph kernel functions cannot be applied to large chemical compound databases, owing to their high computational complexity and the difficulty of indexing similarity search for large databases. To bridge graph kernel functions and similarity search in chemical databases, we applied a novel kernel-based similarity measurement, developed in our team, to measure the similarity of graph-represented chemicals. In our method, we utilize a hash table to support the new graph kernel function definition, efficient storage and fast search. We have applied our method, named G-hash, to large chemical databases. Our results show that the G-hash method achieves state-of-the-art performance for k-nearest-neighbor (k-NN) classification. Moreover, the similarity measurement and the index structure are scalable to large chemical databases, with a smaller index size and faster query processing time compared to state-of-the-art indexing methods such as Daylight fingerprints, C-tree and GraphGrep. Efficient similarity query processing for large chemical databases is challenging, since running-time efficiency must be balanced against similarity search accuracy. Our previous similarity search method, G-hash, provides a new way to perform similarity search in chemical databases. Experimental study validates the utility of G-hash in chemical databases.
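
    The core trick, replacing an expensive all-pairs kernel evaluation with hash-table lookups over node descriptors, can be sketched as follows; the node descriptor and kernel below are simplified assumptions, not G-hash's actual feature set.

      from collections import Counter

      def node_features(graph):
          """Hashable descriptor per node: (label, degree, neighbor labels).
          `graph` maps node -> (label, list of neighbor nodes)."""
          feats = Counter()
          for node, (label, nbrs) in graph.items():
              nbr_labels = tuple(sorted(graph[n][0] for n in nbrs))
              feats[(label, len(nbrs), nbr_labels)] += 1
          return feats

      def hash_kernel(g1, g2):
          """Kernel value = matched feature counts via hash lookups, not
          all-pairs comparison; this is what makes indexing scalable."""
          f1, f2 = node_features(g1), node_features(g2)
          return sum(min(c, f2[k]) for k, c in f1.items())

      # toy molecules as adjacency structures (labels are element symbols)
      ethanol = {0: ("C", [1]), 1: ("C", [0, 2]), 2: ("O", [1])}
      methanol = {0: ("C", [1]), 1: ("O", [0])}
      print(hash_kernel(ethanol, methanol))  # shared (O, degree 1, (C,)) node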

  11. Chemical graphs, molecular matrices and topological indices in chemoinformatics and quantitative structure-activity relationships.

    PubMed

    Ivanciuc, Ovidiu

    2013-06-01

    Chemical and molecular graphs have fundamental applications in chemoinformatics, quantitative structure-property relationships (QSPR), quantitative structure-activity relationships (QSAR), virtual screening of chemical libraries, and computational drug design. Chemoinformatics applications of graphs include chemical structure representation and coding, database search and retrieval, and physicochemical property prediction. QSPR, QSAR and virtual screening are based on the structure-property principle, which states that the physicochemical and biological properties of chemical compounds can be predicted from their chemical structure. Such structure-property correlations are usually developed from topological indices and fingerprints computed from the molecular graph and from molecular descriptors computed from the three-dimensional chemical structure. We present here a selection of the most important graph descriptors and topological indices, including molecular matrices, graph spectra, spectral moments, graph polynomials, and vertex topological indices. These graph descriptors are used to define several topological indices based on molecular connectivity, graph distance, reciprocal distance, distance-degree, distance-valency, spectra, polynomials, and information theory concepts. The molecular descriptors and topological indices can be developed with a more general approach, based on molecular graph operators, which define a family of graph indices related by a common formula. Graph descriptors and topological indices for molecules containing heteroatoms and multiple bonds are computed with weighting schemes based on atomic properties, such as the atomic number, covalent radius, or electronegativity. The correlation in QSPR and QSAR models can be improved by optimizing some parameters in the formula of topological indices, as demonstrated for structural descriptors based on atomic connectivity and graph distance.
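
    As a concrete instance of a distance-based topological index, the Wiener index is the sum of shortest-path distances over all vertex pairs of the hydrogen-depleted molecular graph. A small sketch in pure Python using Floyd-Warshall:

      def wiener_index(adj):
          """Wiener index: sum of shortest-path distances over vertex pairs.
          `adj` is an adjacency list of the hydrogen-depleted molecular graph."""
          n = len(adj)
          INF = float("inf")
          d = [[0 if i == j else INF for j in range(n)] for i in range(n)]
          for i, nbrs in enumerate(adj):
              for j in nbrs:
                  d[i][j] = 1
          for k in range(n):                     # Floyd-Warshall relaxation
              for i in range(n):
                  for j in range(n):
                      if d[i][k] + d[k][j] < d[i][j]:
                          d[i][j] = d[i][k] + d[k][j]
          return sum(d[i][j] for i in range(n) for j in range(i + 1, n))

      # n-butane carbon skeleton: a path on 4 vertices -> Wiener index 10
      print(wiener_index([[1], [0, 2], [1, 3], [2]]))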

  12. Big data and high-performance analytics in structural health monitoring for bridge management

    NASA Astrophysics Data System (ADS)

    Alampalli, Sharada; Alampalli, Sandeep; Ettouney, Mohammed

    2016-04-01

    Structural Health Monitoring (SHM) can be a vital tool for effective bridge management. Combining large data sets from multiple sources to create a data-driven decision-making framework is crucial for the success of SHM. This paper presents a big data analytics framework that combines multiple data sets correlated with functional relatedness to convert data into actionable information that empowers risk-based decision-making. The integrated data environment incorporates near real-time streams of semi-structured data from remote sensors, historical visual inspection data, and observations from structural analysis models to monitor, assess, and manage risks associated with aging bridge inventories. Accelerated processing of the datasets is made possible by four technologies: cloud computing, relational database processing, NoSQL database support, and in-memory analytics. The framework is being validated on a railroad corridor that can be subjected to multiple hazards. It enables the computation of reliability indices for critical bridge components and individual bridge spans. In addition, the framework includes a risk-based decision-making process that enumerates the costs and consequences of poor bridge performance at span and network levels when rail networks are exposed to natural hazard events such as floods and earthquakes. Big data and high-performance analytics enable insights that assist bridge owners in addressing problems faster.

  13. Analysing and Rationalising Molecular and Materials Databases Using Machine-Learning

    NASA Astrophysics Data System (ADS)

    de, Sandip; Ceriotti, Michele

    Computational materials design promises to greatly accelerate the process of discovering new or more performant materials. Several collaborative efforts are contributing to this goal by building databases of structures, containing between thousands and millions of distinct hypothetical compounds, whose properties are computed by high-throughput electronic-structure calculations. The complexity and sheer amount of information has made manual exploration, interpretation and maintenance of these databases a formidable challenge, making it necessary to resort to automatic analysis tools. Here we will demonstrate how, starting from a measure of (dis)similarity between database items built from a combination of local environment descriptors, it is possible to apply hierarchical clustering algorithms, as well as dimensionality reduction methods such as sketchmap, to analyse, classify and interpret trends in molecular and materials databases, as well as to detect inconsistencies and errors. Thanks to the agnostic and flexible nature of the underlying metric, we will show how our framework can be applied transparently to different kinds of systems ranging from organic molecules and oligopeptides to inorganic crystal structures as well as molecular crystals. Funded by National Center for Computational Design and Discovery of Novel Materials (MARVEL) and Swiss National Science Foundation.
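
    Given any such (dis)similarity measure, the clustering step is standard: condense the pairwise distance matrix and run agglomerative linkage. A minimal sketch with SciPy; the descriptor vectors are faked by random data here, where real ones would come from local-environment fingerprints of the structures.

      import numpy as np
      from scipy.spatial.distance import pdist
      from scipy.cluster.hierarchy import linkage, fcluster

      # stand-in for descriptor vectors of database entries
      rng = np.random.default_rng(1)
      descriptors = rng.normal(size=(30, 8))

      D = pdist(descriptors, metric="euclidean")    # condensed distance matrix
      Z = linkage(D, method="average")              # hierarchical clustering
      labels = fcluster(Z, t=4, criterion="maxclust")   # cut tree: 4 classes

      for cluster_id in np.unique(labels):
          members = np.where(labels == cluster_id)[0]
          print(f"cluster {cluster_id}: {len(members)} structures")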

  14. Fingerprint-Based Structure Retrieval Using Electron Density

    PubMed Central

    Yin, Shuangye; Dokholyan, Nikolay V.

    2010-01-01

    We present a computational approach that can quickly search a large protein structural database to identify structures that fit a given electron density, such as determined by cryo-electron microscopy. We use geometric invariants (fingerprints) constructed using 3D Zernike moments to describe the electron density, and reduce the problem of fitting of the structure to the electron density to simple fingerprint comparison. Using this approach, we are able to screen the entire Protein Data Bank and identify structures that fit two experimental electron densities determined by cryo-electron microscopy. PMID:21287628

  15. Fingerprint-based structure retrieval using electron density.

    PubMed

    Yin, Shuangye; Dokholyan, Nikolay V

    2011-03-01

    We present a computational approach that can quickly search a large protein structural database to identify structures that fit a given electron density, such as determined by cryo-electron microscopy. We use geometric invariants (fingerprints) constructed using 3D Zernike moments to describe the electron density, and reduce the problem of fitting of the structure to the electron density to simple fingerprint comparison. Using this approach, we are able to screen the entire Protein Data Bank and identify structures that fit two experimental electron densities determined by cryo-electron microscopy.

  16. Innovative computer-aided methods for the discovery of new kinase ligands.

    PubMed

    Abuhammad, Areej; Taha, Mutasem

    2016-04-01

    Recent evidence points to significant roles played by protein kinases in cell signaling and cellular proliferation. Faulty protein kinases are involved in cancer, diabetes and chronic inflammation. Efforts are continuously carried out to discover new inhibitors for selected protein kinases. In this review, we discuss two new computer-aided methodologies we developed to mine virtual databases for new bioactive compounds. One method is ligand-based exploration of the pharmacophoric space of inhibitors of any particular biotarget, followed by quantitative structure-activity relationship-based selection of the best pharmacophore(s). The second approach is structure-based, assuming that potent ligands come into contact with binding-site spots distinct from those contacted by weakly potent ligands. Both approaches yield pharmacophores useful as 3D search queries for the discovery of new bioactive (kinase) inhibitors.

  17. The application of SSADM to modelling the logical structure of proteins.

    PubMed

    Saldanha, J; Eccles, J

    1991-10-01

    A logical design that describes the overall structure of proteins, together with a more detailed design describing secondary and some supersecondary structures, has been constructed using the computer-aided software engineering (CASE) tool, Auto-mate. Auto-mate embodies the philosophy of the Structured Systems Analysis and Design Method (SSADM) which enables the logical design of computer systems. Our design will facilitate the building of large information systems, such as databases and knowledgebases in the field of protein structure, by the derivation of system requirements from our logical model prior to producing the final physical system. In addition, the study has highlighted the ease of employing SSADM as a formalism in which to conduct the transferral of concepts from an expert into a design for a knowledge-based system that can be implemented on a computer (the knowledge-engineering exercise). It has been demonstrated how SSADM techniques may be extended for the purpose of modelling the constituent Prolog rules. This facilitates the integration of the logical system design model with the derived knowledge-based system.

  18. UniGene Tabulator: a full parser for the UniGene format.

    PubMed

    Lenzi, Luca; Frabetti, Flavia; Facchin, Federica; Casadei, Raffaella; Vitale, Lorenza; Canaider, Silvia; Carinci, Paolo; Zannotti, Maria; Strippoli, Pierluigi

    2006-10-15

    UniGene Tabulator 1.0 provides a solution for full parsing of the UniGene flat file format; it implements a structured graphical representation of each data field present in UniGene following import into a common database management system usable on a personal computer. This database includes related tables for sequence, protein similarity, sequence-tagged site (STS) and transcript map interval (TXMAP) data, plus a summary table where each record represents a UniGene cluster. UniGene Tabulator enables full local management of UniGene data, allowing parsing, querying, indexing, retrieving, exporting and analysis of UniGene data in relational database form on computers running Mac OS X (10.3.9 or later) or Windows (2000 with Service Pack 4, or XP with Service Pack 2 or later). The current release, including both FileMaker runtime applications, is freely available at http://apollo11.isto.unibo.it/software/

  19. EKPD: a hierarchical database of eukaryotic protein kinases and protein phosphatases.

    PubMed

    Wang, Yongbo; Liu, Zexian; Cheng, Han; Gao, Tianshun; Pan, Zhicheng; Yang, Qing; Guo, Anyuan; Xue, Yu

    2014-01-01

    We present here EKPD (http://ekpd.biocuckoo.org), a hierarchical database of eukaryotic protein kinases (PKs) and protein phosphatases (PPs), the key molecules responsible for the reversible phosphorylation of proteins that are involved in almost all aspects of biological processes. As extensive experimental and computational efforts have been carried out to identify PKs and PPs, an integrative resource with detailed classification and annotation information would be of great value for both experimentalists and computational biologists. In this work, we first collected 1855 PKs and 347 PPs from the scientific literature and various public databases. Based on previously established rationales, we classified all of the known PKs and PPs into a hierarchical structure with three levels, i.e. group, family and individual PK/PP. There are 10 groups with 149 families for the PKs and 10 groups with 33 families for the PPs. We constructed 139 and 27 Hidden Markov Model profiles for PK and PP families, respectively. Then we systematically characterized ∼50,000 PKs and >10,000 PPs in eukaryotes. In addition, >500 PKs and >400 PPs were computationally identified by ortholog search. Finally, the online service of the EKPD database was implemented in PHP + MySQL + JavaScript.

  20. Partitioning medical image databases for content-based queries on a Grid.

    PubMed

    Montagnat, J; Breton, V; E Magnin, I

    2005-01-01

    In this paper we study the impact of executing a medical image database query application on the grid. For lowering the total computation time, the image database is partitioned into subsets to be processed on different grid nodes. A theoretical model of the application complexity and estimates of the grid execution overhead are used to efficiently partition the database. We show results demonstrating that smart partitioning of the database can lead to significant improvements in terms of total computation time. Grids are promising for content-based image retrieval in medical databases.

  1. COMET-AR User's Manual: COmputational MEchanics Testbed with Adaptive Refinement

    NASA Technical Reports Server (NTRS)

    Moas, E. (Editor)

    1997-01-01

    The COMET-AR User's Manual provides a reference manual for the Computational Structural Mechanics Testbed with Adaptive Refinement (COMET-AR), a software system developed jointly by Lockheed Palo Alto Research Laboratory and NASA Langley Research Center under contract NAS1-18444. The COMET-AR system is an extended version of an earlier finite element based structural analysis system called COMET, also developed by Lockheed and NASA. The primary extensions are the adaptive mesh refinement capabilities and a new "object-like" database interface that makes COMET-AR easier to extend further. This User's Manual provides a detailed description of the user interface to COMET-AR from the viewpoint of a structural analyst.

  2. Development of a replicated database of DHCP data for evaluation of drug use.

    PubMed Central

    Graber, S E; Seneker, J A; Stahl, A A; Franklin, K O; Neel, T E; Miller, R A

    1996-01-01

    This case report describes development and testing of a method to extract clinical information stored in the Veterans Affairs (VA) Decentralized Hospital Computer System (DHCP) for the purpose of analyzing data about groups of patients. The authors used a microcomputer-based, structured query language (SQL)-compatible, relational database system to replicate a subset of the Nashville VA Hospital's DHCP patient database. This replicated database contained the complete current Nashville DHCP prescription, provider, patient, and drug data sets, and a subset of the laboratory data. A pilot project employed this replicated database to answer questions that might arise in drug-use evaluation, such as identification of cases of polypharmacy, suboptimal drug regimens, and inadequate laboratory monitoring of drug therapy. These database queries included as candidates for review all prescriptions for all outpatients. The queries demonstrated that specific drug-use events could be identified for any time interval represented in the replicated database. PMID:8653451

  3. Development of a replicated database of DHCP data for evaluation of drug use.

    PubMed

    Graber, S E; Seneker, J A; Stahl, A A; Franklin, K O; Neel, T E; Miller, R A

    1996-01-01

    This case report describes development and testing of a method to extract clinical information stored in the Veterans Affairs (VA) Decentralized Hospital Computer System (DHCP) for the purpose of analyzing data about groups of patients. The authors used a microcomputer-based, structured query language (SQL)-compatible, relational database system to replicate a subset of the Nashville VA Hospital's DHCP patient database. This replicated database contained the complete current Nashville DHCP prescription, provider, patient, and drug data sets, and a subset of the laboratory data. A pilot project employed this replicated database to answer questions that might arise in drug-use evaluation, such as identification of cases of polypharmacy, suboptimal drug regimens, and inadequate laboratory monitoring of drug therapy. These database queries included as candidates for review all prescriptions for all outpatients. The queries demonstrated that specific drug-use events could be identified for any time interval represented in the replicated database.

  4. Randomized Approaches for Nearest Neighbor Search in Metric Space When Computing the Pairwise Distance Is Extremely Expensive

    NASA Astrophysics Data System (ADS)

    Wang, Lusheng; Yang, Yong; Lin, Guohui

    Finding the closest object for a query in a database is a classical problem in computer science. For some modern biological applications, computing the similarity between two objects can be very time consuming: for example, it takes a long time to compute the edit distance between two whole chromosomes or the alignment cost of two 3D protein structures. In this paper, we study the nearest neighbor search problem in metric space, where the pairwise distances between objects in the database are known and we want to minimize the number of distances computed on-line between the query and database objects in order to find the closest object. We have designed two randomized approaches for indexing metric-space databases, in which objects are described purely by their distances to each other. Analysis and experiments show that our approaches need to compute the distances to only O(log n) objects in order to find the closest object, where n is the total number of objects in the database.
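
    The standard device for saving on-line distance computations in a metric space is the triangle inequality: a precomputed distance d(p, x) to a pivot p gives the lower bound |d(q, p) - d(p, x)| <= d(q, x), so many candidates can be discarded without ever computing d(q, x). A sketch of pivot-based pruning; the paper's randomized index construction is more involved.

      def nearest(query, objects, dist, pivot_dists, pivot):
          """Nearest-neighbor search that skips candidates ruled out by the
          triangle inequality. `pivot_dists[x]` holds precomputed d(pivot, x);
          only unpruned candidates cost an on-line distance computation."""
          dq_pivot = dist(query, pivot)        # one on-line computation
          best, best_d, computed = None, float("inf"), 1
          # visit candidates whose lower bound is smallest first
          order = sorted(objects, key=lambda x: abs(dq_pivot - pivot_dists[x]))
          for x in order:
              if abs(dq_pivot - pivot_dists[x]) >= best_d:
                  break                        # this and all later ones pruned
              d = dist(query, x)
              computed += 1
              if d < best_d:
                  best, best_d = x, d
          return best, best_d, computed

      # toy metric space: integers with absolute difference as the distance
      objs = list(range(0, 1000, 7))
      pivot = 0
      pd = {x: abs(pivot - x) for x in objs}
      print(nearest(500, objs, lambda a, b: abs(a - b), pd, pivot))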

  5. Research in Structures and Dynamics, 1984

    NASA Technical Reports Server (NTRS)

    Hayduk, R. J. (Compiler); Noor, A. K. (Compiler)

    1984-01-01

    A symposium on advances and trends in structures and dynamics was held to communicate new insights into physical behavior and to identify trends in the solution procedures for structures and dynamics problems. Pertinent areas of concern were (1) multiprocessors, parallel computation, and database management systems, (2) advances in finite element technology, (3) interactive computing and optimization, (4) mechanics of materials, (5) structural stability, (6) dynamic response of structures, and (7) advanced computer applications.

  6. Fast vessel segmentation in retinal images using multi-scale enhancement and second-order local entropy

    NASA Astrophysics Data System (ADS)

    Yu, H.; Barriga, S.; Agurto, C.; Zamora, G.; Bauman, W.; Soliz, P.

    2012-03-01

    Retinal vasculature is one of the most important anatomical structures in digital retinal photographs, and accurate segmentation of retinal blood vessels is an essential task in automated analysis of retinopathy. This paper presents a new and effective vessel segmentation algorithm that features computational simplicity and fast implementation. The method uses morphological pre-processing to decrease the disturbance of bright structures and lesions before vessel extraction. Next, a vessel probability map is generated by computing the eigenvalues of the second derivatives of the Gaussian-filtered image at multiple scales. Then, second-order local entropy thresholding is applied to segment the vessel map. Lastly, a rule-based decision step, which measures the geometric shape difference between vessels and lesions, is applied to reduce false positives. The algorithm is evaluated on the low-resolution DRIVE and STARE databases and on the publicly available high-resolution image database from Friedrich-Alexander University Erlangen-Nuremberg (Germany). The proposed method achieves performance comparable to state-of-the-art unsupervised vessel segmentation methods, at a faster speed, on the DRIVE and STARE databases. On the high-resolution fundus image database, the proposed algorithm outperforms an existing approach in both performance and speed. This efficiency and robustness make the blood vessel segmentation method described here suitable for broad application in automated analysis of retinal images.
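
    The vessel-probability step rests on the eigenvalues of the scale-space Hessian: along a tube-like structure one eigenvalue is near zero and the other large in magnitude. A single-scale sketch with SciPy, assuming dark vessels on a bright background as in green-channel fundus images; the paper combines multiple scales and a different response function.

      import numpy as np
      from scipy.ndimage import gaussian_filter

      def vessel_response(image, sigma):
          """Single-scale vessel map from 2D Hessian eigenvalues: strong
          response where |lambda2| is large and lambda1 is small. Dark
          vessels on a bright field give large positive lambda2."""
          Hxx = gaussian_filter(image, sigma, order=(0, 2))
          Hyy = gaussian_filter(image, sigma, order=(2, 0))
          Hxy = gaussian_filter(image, sigma, order=(1, 1))
          # eigenvalues of [[Hxx, Hxy], [Hxy, Hyy]] per pixel, |l1| <= |l2|
          tmp = np.sqrt(((Hxx - Hyy) / 2.0) ** 2 + Hxy ** 2)
          mean = (Hxx + Hyy) / 2.0
          l1, l2 = mean - tmp, mean + tmp
          swap = np.abs(l1) > np.abs(l2)
          l1, l2 = np.where(swap, l2, l1), np.where(swap, l1, l2)
          return np.where(l2 > 0, l2 * sigma ** 2, 0.0)  # scale-normalized

      # toy image: one dark horizontal line (a "vessel") on a bright field
      img = np.ones((64, 64))
      img[32, 8:56] = 0.0
      resp = vessel_response(img, sigma=2.0)
      print(resp.argmax() // 64)   # strongest response expected on row 32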

  7. A New Adaptive Structural Signature for Symbol Recognition by Using a Galois Lattice as a Classifier.

    PubMed

    Coustaty, M; Bertet, K; Visani, M; Ogier, J

    2011-08-01

    In this paper, we propose a new approach for symbol recognition using structural signatures and a Galois lattice as a classifier. The structural signatures are based on topological graphs computed from segments which are extracted from the symbol images by using an adapted Hough transform. These structural signatures, which can be seen as dynamic paths carrying high-level information, are robust toward various transformations. They are classified by using a Galois lattice as a classifier. The performance of the proposed approach is evaluated on the GREC'03 symbol database, and the experimental results we obtain are encouraging.

  8. Large-scale extraction of accurate drug-disease treatment pairs from biomedical literature for drug repurposing

    PubMed Central

    2013-01-01

    Background A large-scale, highly accurate, machine-understandable drug-disease treatment relationship knowledge base is important for computational approaches to drug repurposing. The large body of published biomedical research articles and clinical case reports available on MEDLINE is a rich source of FDA-approved drug-disease indication as well as drug-repurposing knowledge that is crucial for applying FDA-approved drugs for new diseases. However, much of this information is buried in free text and not captured in any existing databases. The goal of this study is to extract a large number of accurate drug-disease treatment pairs from published literature. Results In this study, we developed a simple but highly accurate pattern-learning approach to extract treatment-specific drug-disease pairs from 20 million biomedical abstracts available on MEDLINE. We extracted a total of 34,305 unique drug-disease treatment pairs, the majority of which are not included in existing structured databases. Our algorithm achieved a precision of 0.904 and a recall of 0.131 in extracting all pairs, and a precision of 0.904 and a recall of 0.842 in extracting frequent pairs. In addition, we have shown that the extracted pairs strongly correlate with both drug target genes and therapeutic classes, therefore may have high potential in drug discovery. Conclusions We demonstrated that our simple pattern-learning relationship extraction algorithm is able to accurately extract many drug-disease pairs from the free text of biomedical literature that are not captured in structured databases. The large-scale, accurate, machine-understandable drug-disease treatment knowledge base that is resultant of our study, in combination with pairs from structured databases, will have high potential in computational drug repurposing tasks. PMID:23742147
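
    The pattern-learning idea can be illustrated with a single seed pattern: anchor on a phrase like "DRUG in the treatment of DISEASE" and harvest the pair. A toy sketch follows; the study learns many such patterns and constrains the slots with drug and disease dictionaries, so the regex below is a hypothetical simplification.

      import re

      # one treatment-specific textual pattern, in the spirit of the paper
      PATTERN = re.compile(
          r"\b([A-Z][a-z]+)\s+in\s+the\s+treatment\s+of\s+([a-z][a-z ]+?)[.,]")

      abstracts = [
          "We evaluated Methotrexate in the treatment of rheumatoid arthritis.",
          "Aspirin in the treatment of fever, revisited.",
          "This sentence mentions no therapy at all.",
      ]

      pairs = set()
      for text in abstracts:
          for drug, disease in PATTERN.findall(text):
              pairs.add((drug, disease.strip()))

      print(sorted(pairs))
      # [('Aspirin', 'fever'), ('Methotrexate', 'rheumatoid arthritis')]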

  9. High Performance Semantic Factoring of Giga-Scale Semantic Graph Databases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Joslyn, Cliff A.; Adolf, Robert D.; Al-Saffar, Sinan

    2010-10-04

    As semantic graph database technology grows to address components ranging from extant large triple stores to SPARQL endpoints over SQL-structured relational databases, it will become increasingly important to be able to bring high performance computational resources to bear on their analysis, interpretation, and visualization, especially with respect to their innate semantic structure. Our research group built a novel high performance hybrid system comprising computational capability for semantic graph database processing utilizing the large multithreaded architecture of the Cray XMT platform, conventional clusters, and large data stores. In this paper we describe that architecture and present the results of deploying it for the analysis of the Billion Triple dataset with respect to its semantic factors.

  10. Design sensitivity analysis using EAL. Part 1: Conventional design parameters

    NASA Technical Reports Server (NTRS)

    Dopker, B.; Choi, Kyung K.; Lee, J.

    1986-01-01

    A numerical implementation of design sensitivity analysis of built-up structures is presented, using the versatility and convenience of an existing finite element structural analysis code and its database management system. The finite element code used in the implementation presented is the Engineering Analysis Language (EAL), which is based on a hybrid method of analysis. It was shown that design sensitivity computations can be carried out using the database management system of EAL, without writing a separate program and a separate database. Conventional (sizing) design parameters such as cross-sectional area of beams or thickness of plates and plane elastic solid components are considered. Compliance, displacement, and stress functionals are considered as performance criteria. The method presented is being extended to implement shape design sensitivity analysis using a domain method and a design component method.

  11. A resource for benchmarking the usefulness of protein structure models.

    PubMed

    Carbajo, Daniel; Tramontano, Anna

    2012-08-02

    Increasingly, biologists and biochemists use computational tools to design experiments to probe the function of proteins and/or to engineer them for a variety of different purposes. The most effective strategies rely on knowledge of the three-dimensional structure of the protein of interest. However, it is often the case that an experimental structure is not available and that models of different quality are used instead. On the other hand, the relationship between the quality of a model and its appropriate use is not easy to derive in general, and so far it has been analyzed in detail only for specific applications. This paper describes a database and related software tools that allow testing of a given structure-based method on models of a protein representing different levels of accuracy. Comparing the results of a computational experiment on the experimental structure and on a set of its decoy models allows developers and users to assess the specific threshold of accuracy required to perform the task effectively. The ModelDB server automatically builds decoy models of different accuracy for a given protein of known structure and provides a set of useful tools for their analysis. Pre-computed data for a non-redundant set of deposited protein structures are available for analysis and download in the ModelDB database. Implementation, availability and requirements: Project name: A resource for benchmarking the usefulness of protein structure models. Project home page: http://bl210.caspur.it/MODEL-DB/MODEL-DB_web/MODindex.php. Operating system(s): platform independent. Programming language: Perl-BioPerl (program); MySQL, Perl DBI and DBD modules (database); PHP, JavaScript, Jmol scripting (web server). Other requirements: Java Runtime Environment v1.4 or later, Perl, BioPerl, CPAN modules, HHsearch, Modeller, LGA, NCBI BLAST package, DSSP, Speedfill (Surfnet) and PSAIA. License: free. Any restrictions to use by non-academics: no.

  12. Gradual cut detection using low-level vision for digital video

    NASA Astrophysics Data System (ADS)

    Lee, Jae-Hyun; Choi, Yeun-Sung; Jang, Ok-bae

    1996-09-01

    Digital video computing and organization is one of the important issues in multimedia systems, signal compression, and databases. Video should be segmented into shots for identification and indexing. This requires a suitable method to automatically locate cut points in order to separate the shots in a video. Automatic cut detection for isolating shots has received considerable attention due to many practical applications: video databases, browsing, authoring systems, retrieval, and film. Previous studies are based on a set of difference mechanisms that measure the content changes between video frames, but they could not detect gradual special effects such as dissolves, wipes, fade-ins, fade-outs, and structured flashing. In this paper, a new cut detection method for gradual transitions based on computer vision techniques is proposed, and experimental results on commercial video are presented and evaluated.

  13. Towards computational improvement of DNA database indexing and short DNA query searching.

    PubMed

    Stojanov, Done; Koceski, Sašo; Mileva, Aleksandra; Koceska, Nataša; Bande, Cveta Martinovska

    2014-09-03

    In order to facilitate and speed up searches of massive DNA databases, the database is first indexed using a mapping function. By searching through the indexed data structure, exact query hits can be identified. If the database is searched against an annotated DNA query, such as a known promoter consensus sequence, then the starting locations and the number of potential genes can be determined. This is particularly relevant when unannotated DNA sequences have to be functionally annotated. However, indexing a massive DNA database and searching an indexed data structure with millions of entries is a time-demanding process. In this paper, we propose a fast DNA database indexing and searching approach that identifies all query hits in the database without having to examine all entries in the indexed data structure, at the cost of limiting the maximum length of a query that can be searched against the database. By applying the proposed indexing equation, the whole human genome could be indexed in 10 hours on a personal computer, under the assumption that there is enough RAM to store the indexed data structure. Analysing the methodology proposed by Reneker, we observed that hits at starting positions [Formula: see text] are not reported if the database is searched against a query shorter than [Formula: see text] nucleotides, where [Formula: see text] is the length of the DNA database words being mapped and [Formula: see text] is the length of the query. A solution to this drawback is also presented.
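
    Indexing of this kind typically rests on a mapping function that encodes each length-w DNA word as a base-4 integer, giving a direct-address table from word code to positions. A minimal sketch; the paper's actual indexing equation and data structure differ in detail.

      CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

      def encode(word):
          """Map a DNA word to a base-4 integer: the indexing function."""
          value = 0
          for ch in word:
              value = value * 4 + CODE[ch]
          return value

      def build_index(db, w):
          """Index every overlapping length-w word of the database by code."""
          index = {}
          for pos in range(len(db) - w + 1):
              index.setdefault(encode(db[pos:pos + w]), []).append(pos)
          return index

      def search(index, query):
          """Exact hits of a length-w query: one hash lookup, no scan."""
          return index.get(encode(query), [])

      db = "ACGTACGTTTACGT"
      idx = build_index(db, w=4)
      print(search(idx, "ACGT"))   # [0, 4, 10]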

  14. AUTOCASK (AUTOmatic Generation of 3-D CASK models). A microcomputer based system for shipping cask design review analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gerhard, M.A.; Sommer, S.C.

    1995-04-01

    AUTOCASK (AUTOmatic Generation of 3-D CASK models) is a microcomputer-based system of computer programs and databases developed at the Lawrence Livermore National Laboratory (LLNL) for the structural analysis of shipping casks for radioactive material. Model specification is performed on the microcomputer, and the analyses are performed on an engineering workstation or mainframe computer. AUTOCASK is based on 80386/80486 compatible microcomputers. The system is composed of a series of menus, input programs, display programs, a mesh generation program, and archive programs. All data is entered through fill-in-the-blank input screens that contain descriptive data requests.

  15. ACToR Chemical Structure processing using Open Source ...

    EPA Pesticide Factsheets

    ACToR (Aggregated Computational Toxicology Resource) is a centralized database repository developed by the National Center for Computational Toxicology (NCCT) at the U.S. Environmental Protection Agency (EPA). Free and open source tools were used to compile toxicity data from over 1,950 public sources. ACToR contains chemical structure information and toxicological data for over 558,000 unique chemicals. The database primarily includes data from NCCT research programs, in vivo toxicity data from ToxRef, human exposure data from ExpoCast, high-throughput screening data from ToxCast and high quality chemical structure information from the EPA DSSTox program. The DSSTox database is a chemical structure inventory for the NCCT programs and currently has about 16,000 unique structures. Included are also data from PubChem, ChemSpider, USDA, FDA, NIH and several other public data sources. ACToR has been a resource to various international and national research groups. Most of our recent efforts on ACToR are focused on improving the structural identifiers and Physico-Chemical properties of the chemicals in the database. Organizing this huge collection of data and improving the chemical structure quality of the database has posed some major challenges. Workflows have been developed to process structures, calculate chemical properties and identify relationships between CAS numbers. The Structure processing workflow integrates web services (PubChem and NIH NCI Cactus) to d

  16. Development of computer informational system of diagnostics integrated optical materials, elements, and devices

    NASA Astrophysics Data System (ADS)

    Volosovitch, Anatoly E.; Konopaltseva, Lyudmila I.

    1995-11-01

    Well-known methods of optical diagnostics, databases for their storage, and expert systems (ES) for their development are analyzed. A computer informational system is developed, based on a hybrid ES built on a modern DBMS. As an example, the structural and constructive circuits of hybrid integrated-optical devices based on laser diodes, diffusion waveguides, geodetic lenses, package-free linear photodiode arrays, etc., are presented. The features of the methods and test results, as well as promising directions of work related to hybrid integrated-optical devices in the field of metrology, are discussed.

  17. An Incremental High-Utility Mining Algorithm with Transaction Insertion

    PubMed Central

    Gan, Wensheng; Zhang, Binbin

    2015-01-01

    Association-rule mining is commonly used to discover useful and meaningful patterns from a very large database. It considers only the occurrence frequencies of items to reveal the relationships among itemsets. Traditional association-rule mining is, however, not suitable in real-world applications, since the items purchased by a customer may carry other factors, such as profit or quantity. High-utility mining was designed to overcome the limitations of association-rule mining by considering both the quantity and profit measures. Most high-utility mining algorithms are designed to handle a static database. Less research handles dynamic high-utility mining with transaction insertion, which otherwise requires database rescans and suffers the combinatorial explosion of the pattern-growth mechanism. In this paper, an efficient incremental algorithm with transaction insertion is designed to reduce computation, without candidate generation, based on utility-list structures. The enumeration tree and the relationships between 2-itemsets are also adopted in the proposed algorithm to speed up the computation. Several experiments are conducted to show the performance of the proposed algorithm in terms of runtime, memory consumption, and number of generated patterns. PMID:25811038

  18. Understanding the Effects of Databases as Cognitive Tools in a Problem-Based Multimedia Learning Environment

    ERIC Educational Resources Information Center

    Li, Rui; Liu, Min

    2007-01-01

    The purpose of this study is to examine the potential of using computer databases as cognitive tools to share learners' cognitive load and facilitate learning in a multimedia problem-based learning (PBL) environment designed for sixth graders. Two research questions were: (a) can the computer database tool share sixth-graders' cognitive load? and…

  19. High performance semantic factoring of giga-scale semantic graph databases.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    al-Saffar, Sinan; Adolf, Bob; Haglin, David

    2010-10-01

    As semantic graph database technology grows to address components ranging from extant large triple stores to SPARQL endpoints over SQL-structured relational databases, it will become increasingly important to be able to bring high performance computational resources to bear on their analysis, interpretation, and visualization, especially with respect to their innate semantic structure. Our research group built a novel high performance hybrid system comprising computational capability for semantic graph database processing utilizing the large multithreaded architecture of the Cray XMT platform, conventional clusters, and large data stores. In this paper we describe that architecture and present the results of deploying it for the analysis of the Billion Triple dataset with respect to its semantic factors, including basic properties, connected components, namespace interaction, and typed paths.

  20. High Performance Descriptive Semantic Analysis of Semantic Graph Databases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Joslyn, Cliff A.; Adolf, Robert D.; al-Saffar, Sinan

    As semantic graph database technology grows to address components ranging from extant large triple stores to SPARQL endpoints over SQL-structured relational databases, it will become increasingly important to be able to understand their inherent semantic structure, whether codified in explicit ontologies or not. Our group is researching novel methods for what we call descriptive semantic analysis of RDF triplestores, to serve purposes of analysis, interpretation, visualization, and optimization. But data size and computational complexity make it increasingly necessary to bring high performance computational resources to bear on this task. Our research group built a novel high performance hybrid system comprising computational capability for semantic graph database processing utilizing the large multi-threaded architecture of the Cray XMT platform, conventional servers, and large data stores. In this paper we describe that architecture and our methods, and present the results of our analyses of basic properties, connected components, namespace interaction, and typed paths for the Billion Triple Challenge 2010 dataset.

  1. DIMA 3.0: Domain Interaction Map.

    PubMed

    Luo, Qibin; Pagel, Philipp; Vilne, Baiba; Frishman, Dmitrij

    2011-01-01

    Domain Interaction MAp (DIMA, available at http://webclu.bio.wzw.tum.de/dima) is a database of predicted and known interactions between protein domains. It integrates 5807 structurally known interactions imported from the iPfam and 3did databases and 46,900 domain interactions predicted by four computational methods: domain phylogenetic profiling, the domain pair exclusion algorithm, correlated mutations, and domain interaction prediction in a discriminative way. Additionally, predictions are filtered to exclude those domain pairs that are reported as non-interacting by the Negatome database. The DIMA web site allows users to calculate domain interaction networks either for a domain of interest or for entire organisms, and to explore them interactively using the Flash-based Cytoscape Web software.
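
    Among the four prediction methods listed, domain phylogenetic profiling is the easiest to illustrate: domains with similar presence/absence patterns across genomes are predicted to interact. Below is a minimal sketch; the bit-vector profiles and similarity measure are invented for illustration.

      # Domain phylogenetic profiling: similar presence/absence patterns
      # across genomes suggest interaction. Profiles are invented bit vectors.
      profiles = {
          'PF00001': [1, 0, 1, 1, 0, 1],   # 1 = domain present in genome i
          'PF00002': [1, 0, 1, 1, 0, 0],
          'PF00003': [0, 1, 0, 0, 1, 0],
      }

      def profile_similarity(p, q):
          matches = sum(a == b for a, b in zip(p, q))
          return matches / len(p)

      for a, b in [('PF00001', 'PF00002'), ('PF00001', 'PF00003')]:
          print(a, b, round(profile_similarity(profiles[a], profiles[b]), 2))
      # PF00001-PF00002 scores 0.83 (candidate interactors); PF00001-PF00003, 0.0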

  2. Identification of Transthyretin Fibril Formation Inhibitors Using Structure-Based Virtual Screening.

    PubMed

    Ortore, Gabriella; Martinelli, Adriano

    2017-08-22

    Transthyretin (TTR) is the primary carrier for thyroxine (T4) in cerebrospinal fluid and a secondary carrier in blood. TTR is a stable homotetramer, but certain factors, genetic or environmental, can promote its degradation to form amyloid fibrils. A docking study using crystal structures of wild-type TTR was planned; our aim was to design new ligands that are able to inhibit TTR fibril formation. The computational protocol was designed to overcome the multiple binding modes of the ligands induced by the peculiarity of the TTR binding site and by the pseudosymmetry of the site pockets, which generally weaken such structure-based studies. Two docking steps, a very fast one and a subsequent, more accurate one, were used to screen the Aldrich Market Select database. Five compounds were selected, and their activity toward inhibiting TTR fibril formation was assessed. Three compounds were found to be active: two have the same potency as the positive control, and the third is a promising lead compound. These results validate a computational protocol that is able to archive information on the key interactions between database compounds and TTR, which is valuable for supporting further studies. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.

  3. Policy Gradient Adaptive Dynamic Programming for Data-Based Optimal Control.

    PubMed

    Luo, Biao; Liu, Derong; Wu, Huai-Ning; Wang, Ding; Lewis, Frank L

    2017-10-01

    The model-free optimal control problem of general discrete-time nonlinear systems is considered in this paper, and a data-based policy gradient adaptive dynamic programming (PGADP) algorithm is developed to design an adaptive optimal controller. By using offline and online data rather than a mathematical system model, the PGADP algorithm improves the control policy with a gradient descent scheme. The convergence of the PGADP algorithm is proved by demonstrating that the constructed Q-function sequence converges to the optimal Q-function. Based on the PGADP algorithm, the adaptive control method is developed with an actor-critic structure and the method of weighted residuals. Its convergence properties are analyzed, where the approximate Q-function converges to its optimum. Computer simulation results demonstrate the effectiveness of the PGADP-based adaptive control method.
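
    A minimal sketch of the underlying idea, a Q-function improved by gradient descent on observed data tuples with a linear approximator, is given below; the feature map, data, and learning parameters are invented, and this is not the paper's exact PGADP algorithm.

      # Gradient-descent Q-function update from (s, a, r, s') data with a
      # linear approximator Q(s, a) = w . phi(s, a). All values are toy.
      import numpy as np

      def phi(s, a):                                # hypothetical feature map
          return np.array([s, a, s * a, 1.0])

      w = np.zeros(4)
      gamma, lr = 0.95, 0.05
      actions = [-1.0, 0.0, 1.0]                    # discrete action set
      data = [(0.0, 1.0, 1.0, 0.5), (0.5, -1.0, 0.0, 0.2)]

      for _ in range(100):
          for s, a, r, s_next in data:
              target = r + gamma * max(phi(s_next, b) @ w for b in actions)
              td_error = phi(s, a) @ w - target
              w -= lr * td_error * phi(s, a)        # descend the TD error
      print(w)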

  4. Shuttle-Data-Tape XML Translator

    NASA Technical Reports Server (NTRS)

    Barry, Matthew R.; Osborne, Richard N.

    2005-01-01

    JSDTImport is a computer program for translating native Shuttle Data Tape (SDT) files from American Standard Code for Information Interchange (ASCII) format into databases in other formats. JSDTImport solves the problem of organizing the SDT content, affording flexibility to enable users to choose how to store the information in a database to better support client and server applications. JSDTImport can be dynamically configured by use of a simple Extensible Markup Language (XML) file. JSDTImport uses this XML file to define how each record and field will be parsed, its layout and definition, and how the resulting database will be structured. JSDTImport also includes a client application programming interface (API) layer that provides abstraction for the data-querying process. The API enables a user to specify the search criteria to apply in gathering all the data relevant to a query. The API can be used to organize the SDT content and translate into a native XML database. The XML format is structured into efficient sections, enabling excellent query performance by use of the XPath query language. Optionally, the content can be translated into a Structured Query Language (SQL) database for fast, reliable SQL queries on standard database server computers.
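
    The configuration idea described above, an XML file that defines how each field of a record is parsed, can be sketched as follows; the XML layout, field names, and fixed-width record format are invented for illustration and are not the actual SDT format.

      # XML-configured parsing of fixed-width records; the layout, field
      # names, and record format are invented, not the actual SDT format.
      import xml.etree.ElementTree as ET

      config = ET.fromstring("""
      <record>
        <field name="id"    start="0" length="4"/>
        <field name="value" start="4" length="6"/>
      </record>""")

      def parse(line):
          row = {}
          for f in config.findall('field'):
              start = int(f.get('start'))
              end = start + int(f.get('length'))
              row[f.get('name')] = line[start:end].strip()
          return row

      print(parse('0042 3.141'))   # {'id': '0042', 'value': '3.141'}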

  5. An investigation of constraint-based component-modeling for knowledge representation in computer-aided conceptual design

    NASA Technical Reports Server (NTRS)

    Kolb, Mark A.

    1990-01-01

    Originally, computer programs for engineering design focused on detailed geometric design. Later, computer programs for algorithmically performing the preliminary design of specific well-defined classes of objects became commonplace. However, due to the need for extreme flexibility, it appears unlikely that conventional programming techniques will prove fruitful in developing computer aids for engineering conceptual design. The use of symbolic processing techniques, such as object-oriented programming and constraint propagation, facilitates such flexibility. Object-oriented programming allows programs to be organized around the objects and behavior to be simulated, rather than around fixed sequences of function- and subroutine-calls. Constraint propagation allows declarative statements to be understood as designating multi-directional mathematical relationships among all the variables of an equation, rather than as unidirectional assignments to the variable on the left-hand side of the equation, as in conventional computer programs. The research has concentrated on applying these two techniques to the development of a general-purpose computer aid for engineering conceptual design. Object-oriented programming techniques are utilized to implement a user-extensible database of design components. The mathematical relationships which model both geometry and physics of these components are managed via constraint propagation. In addition to this component-based hierarchy, special-purpose data structures are provided for describing component interactions and supporting state-dependent parameters. In order to investigate the utility of this approach, a number of sample design problems from the field of aerospace engineering were implemented using the prototype design tool, Rubber Airplane. The additional level of organizational structure obtained by representing design knowledge in terms of components is observed to provide greater convenience to the program user, and to result in a database of engineering information which is easier both to maintain and to extend.
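
    The multi-directional reading of declarative constraints can be sketched in a few lines of Python: a single relation such as a = b * c is solved for whichever variable is unknown. This toy propagator illustrates the principle only, and is not Rubber Airplane's actual mechanism.

      # A multi-directional constraint for the relation a = b * c: given any
      # two known values (the third being None), the unknown is computed.
      def propagate(a, b, c):
          if a is None and None not in (b, c):
              a = b * c                        # forward direction
          elif b is None and a is not None and c not in (None, 0):
              b = a / c                        # solve for b
          elif c is None and a is not None and b not in (None, 0):
              c = a / b                        # solve for c
          return a, b, c

      print(propagate(None, 4.0, 2.5))         # (10.0, 4.0, 2.5)
      print(propagate(10.0, None, 2.5))        # (10.0, 4.0, 2.5)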

  6. Advanced transportation system studies. Alternate propulsion subsystem concepts: Propulsion database

    NASA Technical Reports Server (NTRS)

    Levack, Daniel

    1993-01-01

    The Advanced Transportation System Studies alternate propulsion subsystem concepts propulsion database interim report is presented. The objective of the database development task is to produce a propulsion database which is easy to use and modify while also being comprehensive in the level of detail available. The database is to be available on the Macintosh computer system. The task is to extend across all three years of the contract. Consequently, a significant fraction of the effort in this first year of the task was devoted to the development of the database structure to ensure a robust base for the following years' efforts. Nonetheless, significant point design propulsion system descriptions and parametric models were also produced. Each of the two propulsion databases, parametric propulsion database and propulsion system database, are described. The descriptions include a user's guide to each code, write-ups for models used, and sample output. The parametric database has models for LOX/H2 and LOX/RP liquid engines, solid rocket boosters using three different propellants, a hybrid rocket booster, and a NERVA derived nuclear thermal rocket engine.

  7. Protein binding hot spots prediction from sequence only by a new ensemble learning method.

    PubMed

    Hu, Shan-Shan; Chen, Peng; Wang, Bing; Li, Jinyan

    2017-10-01

    Hot spots are interfacial core areas of binding proteins, which have been applied as targets in drug design. Experimental methods are costly in both time and expense to locate hot spot areas. Recently, in-silico computational methods have been widely used for hot spot prediction through sequence or structure characterization. As the structural information of proteins is not always available, hot spot identification from amino acid sequences alone is more useful for real-life applications. This work proposes a new sequence-based model that combines physicochemical features with the relative accessible surface area of amino acid sequences for hot spot prediction. The model consists of 83 classifiers involving the IBk (instance-based k-nearest neighbour) algorithm, where instances are encoded by important properties extracted from a total of 544 properties in the AAindex1 (Amino Acid Index) database. Top-performing classifiers are then selected to form an ensemble by a majority voting technique. The ensemble classifier outperforms the state-of-the-art computational methods, yielding an F1 score of 0.80 on the benchmark binding interface database (BID) test set. http://www2.ahu.edu.cn/pchen/web/HotspotEC.htm.
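
    A minimal sketch of the ensemble-by-majority-voting idea follows, with k-nearest-neighbour classifiers trained on different feature subsets; the random toy data and subset choices stand in for the AAindex1 properties and are purely illustrative.

      # Majority voting over k-NN classifiers, each trained on a different
      # (hypothetical) feature subset; random toy data replaces real
      # physicochemical and surface-area features.
      import numpy as np
      from sklearn.neighbors import KNeighborsClassifier

      rng = np.random.default_rng(0)
      X = rng.normal(size=(60, 6))                  # 60 residues, 6 features
      y = (X[:, 0] + X[:, 3] > 0).astype(int)       # 1 = hot spot (toy label)

      subsets = [[0, 1], [2, 3], [4, 5], [0, 3]]
      models = [KNeighborsClassifier(n_neighbors=3).fit(X[:, s], y)
                for s in subsets]

      def ensemble_predict(x):
          votes = [int(m.predict(x[s].reshape(1, -1))[0])
                   for m, s in zip(models, subsets)]
          return int(sum(votes) > len(votes) / 2)   # majority vote

      print(ensemble_predict(rng.normal(size=6)))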

  8. Computer Science Research in Europe.

    DTIC Science & Technology

    1984-08-29

    ... most attention: multi-databases and their structure, and (3) the dependencies between distributed systems and multi-databases. Having completed a multi-database system for distributed data management, INRIA is now working on the communications requirements of distributed database systems and protocols for checking ... A project called SIRIUS was established in 1977 at INRIA ... Newcastle University, UK ...

  9. Prototype and Evaluation of AutoHelp: A Case-based, Web-accessible Help Desk System for EOSDIS

    NASA Technical Reports Server (NTRS)

    Mitchell, Christine M.; Thurman, David A.

    1999-01-01

    AutoHelp is a case-based, Web-accessible help desk for users of the EOSDIS. It uses a combination of advanced computer and Web technologies, knowledge-based systems tools, and cognitive engineering to offload the current person-intensive help desk facilities at the DAACs. As a case-based system, AutoHelp starts with an organized database of previous help requests (questions and answers) indexed by a hierarchical category structure that facilitates recognition by persons seeking assistance. As an initial proof-of-concept demonstration, a month of email help requests to the Goddard DAAC were analyzed and partially organized into help request cases. These cases were then categorized to create a preliminary case indexing system, or category structure. This category structure allows potential users to identify or recognize categories of questions, responses, and sample cases similar to their needs. Year one of this research project focused on the development of a technology demonstration. User assistance 'cases' are stored in an Oracle database in a combination of tables linking prototypical questions with responses and detailed examples from the email help requests analyzed to date. When a potential user accesses the AutoHelp system, a Web server provides a Java applet that displays the category structure of the help case base organized by the needs of previous users. When the user identifies or requests a particular type of assistance, the applet uses Java database connectivity (JDBC) software to access the database and extract the relevant cases. The demonstration will include an on-line presentation of how AutoHelp is currently structured. We will show how a user might request assistance via the Web interface and how the AutoHelp case base provides assistance. The presentation will describe the DAAC data collection, case definition, and organization to date, as well as the AutoHelp architecture. It will conclude with the year 2 proposal to more fully develop the case base, the user interface (including the category structure), the interface with the current DAAC Help System, the development of tools to add new cases, and user testing and evaluation at (perhaps) the Goddard DAAC.

  10. SiC: An Agent Based Architecture for Preventing and Detecting Attacks to Ubiquitous Databases

    NASA Astrophysics Data System (ADS)

    Pinzón, Cristian; de Paz, Yanira; Bajo, Javier; Abraham, Ajith; Corchado, Juan M.

    One of the main attacks to ubiquitous databases is the structured query language (SQL) injection attack, which causes severe damage both in the commercial aspect and in the user’s confidence. This chapter proposes the SiC architecture as a solution to the SQL injection attack problem. This is a hierarchical distributed multiagent architecture, which involves an entirely new approach with respect to existing architectures for the prevention and detection of SQL injections. SiC incorporates a kind of intelligent agent, which integrates a case-based reasoning system. This agent, which is the core of the architecture, allows the application of detection techniques based on anomalies as well as those based on patterns, providing a great degree of autonomy, flexibility, robustness and dynamic scalability. The characteristics of the multiagent system allow an architecture to detect attacks from different types of devices, regardless of the physical location. The architecture has been tested on a medical database, guaranteeing safe access from various devices such as PDAs and notebook computers.
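
    As a toy illustration of the pattern-based side of such detection (the case-based reasoning and anomaly detection are beyond a few lines), the sketch below flags inputs matching common injection signatures; the patterns are illustrative and are not SiC's rule set.

      # Pattern-based screening signal for SQL injection; a real system
      # would combine this with case-based and anomaly-based evidence.
      import re

      SUSPICIOUS = [
          r"(?i)\bunion\b.+\bselect\b",      # UNION-based injection
          r"(?i)\bor\b\s+'?1'?\s*=\s*'?1",   # tautologies such as OR 1=1
          r"--|;",                           # comment or statement chaining
      ]

      def looks_injected(user_input):
          return any(re.search(p, user_input) for p in SUSPICIOUS)

      print(looks_injected("alice"))              # False
      print(looks_injected("x' OR '1'='1' --"))   # True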

  11. The Innate Immune Database (IIDB)

    PubMed Central

    Korb, Martin; Rust, Aistair G; Thorsson, Vesteinn; Battail, Christophe; Li, Bin; Hwang, Daehee; Kennedy, Kathleen A; Roach, Jared C; Rosenberger, Carrie M; Gilchrist, Mark; Zak, Daniel; Johnson, Carrie; Marzolf, Bruz; Aderem, Alan; Shmulevich, Ilya; Bolouri, Hamid

    2008-01-01

    Background As part of a National Institute of Allergy and Infectious Diseases funded collaborative project, we have performed over 150 microarray experiments measuring the response of C57/BL6 mouse bone marrow macrophages to toll-like receptor stimuli. These microarray expression profiles are available freely from our project web site. Here, we report the development of a database of computationally predicted transcription factor binding sites and related genomic features for a set of over 2000 murine immune genes of interest. Our database, which includes microarray co-expression clusters and a host of web-based query, analysis and visualization facilities, is available freely via the internet. It provides a broad resource to the research community, and a stepping stone towards the delineation of the network of transcriptional regulatory interactions underlying the integrated response of macrophages to pathogens. Description We constructed a database indexed on genes and annotations of the immediate surrounding genomic regions. To facilitate both gene-specific and systems biology oriented research, our database provides the means to analyze individual genes or an entire genomic locus. Although our focus to date has been on mammalian toll-like receptor signaling pathways, our database structure is not limited to this subject, and is intended to be broadly applicable to immunology. By focusing on selected immune-active genes, we were able to perform computationally intensive expression and sequence analyses that would currently be prohibitive if applied to the entire genome. Using six complementary computational algorithms and methodologies, we identified transcription factor binding sites based on the Position Weight Matrices available in TRANSFAC. For one example transcription factor (ATF3) for which experimental data is available, over 50% of our predicted binding sites coincide with genome-wide chromatin immunoprecipitation (ChIP-chip) results. Our database can be interrogated via a web interface. Genomic annotations and binding site predictions can be automatically viewed with a customized version of the Argo genome browser. Conclusion We present the Innate Immune Database (IIDB) as a community resource for immunologists interested in gene regulatory systems underlying innate responses to pathogens. The database website can be freely accessed at . PMID:18321385
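
    The binding-site prediction step rests on scoring sequence windows against position weight matrices such as those derived from TRANSFAC. A minimal sketch of such a scan follows, with an invented three-position log-odds matrix and threshold.

      # Scan a DNA sequence with a position weight matrix: one dict of
      # log-odds scores per motif position. Matrix and threshold are toy.
      PWM = [
          {'A': 1.2, 'C': -0.5, 'G': -0.5, 'T': -1.0},
          {'A': -1.0, 'C': 1.0, 'G': -0.2, 'T': -0.8},
          {'A': -0.9, 'C': -0.9, 'G': 1.3, 'T': -0.7},
      ]

      def scan(seq, threshold=2.0):
          hits = []
          for i in range(len(seq) - len(PWM) + 1):
              score = sum(col[base] for col, base in zip(PWM, seq[i:]))
              if score >= threshold:
                  hits.append((i, round(score, 2)))  # (offset, score)
          return hits

      print(scan('TTACGGACG'))   # windows 'ACG' at offsets 2 and 6 score 3.5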

  12. Map-Based Querying for Multimedia Database

    DTIC Science & Technology

    2014-09-01

    ... existing assets in a custom multimedia database based on an area of interest. It also describes the augmentation of an Android Tactical Assault Kit (ATAK) ... Somiya Metu, Computational and Information Sciences Directorate, ARL

  13. Real-Time Ligand Binding Pocket Database Search Using Local Surface Descriptors

    PubMed Central

    Chikhi, Rayan; Sael, Lee; Kihara, Daisuke

    2010-01-01

    Due to the increasing number of structures of unknown function accumulated by ongoing structural genomics projects, there is an urgent need for computational methods for characterizing protein tertiary structures. As functions of many of these proteins are not easily predicted by conventional sequence database searches, a legitimate strategy is to utilize structure information in function characterization. Of a particular interest is prediction of ligand binding to a protein, as ligand molecule recognition is a major part of molecular function of proteins. Predicting whether a ligand molecule binds a protein is a complex problem due to the physical nature of protein-ligand interactions and the flexibility of both binding sites and ligand molecules. However, geometric and physicochemical complementarity is observed between the ligand and its binding site in many cases. Therefore, ligand molecules which bind to a local surface site in a protein can be predicted by finding similar local pockets of known binding ligands in the structure database. Here, we present two representations of ligand binding pockets and utilize them for ligand binding prediction by pocket shape comparison. These representations are based on mapping of surface properties of binding pockets, which are compactly described either by the two-dimensional pseudo-Zernike moments or the 3D Zernike descriptors. These compact representations allow a fast real-time pocket searching against a database. Thorough benchmark studies employing two different datasets show that our representations are competitive with the other existing methods. Limitations and potentials of the shape-based methods as well as possible improvements are discussed. PMID:20455259

  14. Real-time ligand binding pocket database search using local surface descriptors.

    PubMed

    Chikhi, Rayan; Sael, Lee; Kihara, Daisuke

    2010-07-01

    Because of the increasing number of structures of unknown function accumulated by ongoing structural genomics projects, there is an urgent need for computational methods for characterizing protein tertiary structures. As functions of many of these proteins are not easily predicted by conventional sequence database searches, a legitimate strategy is to utilize structure information in function characterization. Of particular interest is prediction of ligand binding to a protein, as ligand molecule recognition is a major part of molecular function of proteins. Predicting whether a ligand molecule binds a protein is a complex problem due to the physical nature of protein-ligand interactions and the flexibility of both binding sites and ligand molecules. However, geometric and physicochemical complementarity is observed between the ligand and its binding site in many cases. Therefore, ligand molecules which bind to a local surface site in a protein can be predicted by finding similar local pockets of known binding ligands in the structure database. Here, we present two representations of ligand binding pockets and utilize them for ligand binding prediction by pocket shape comparison. These representations are based on mapping of surface properties of binding pockets, which are compactly described either by the two-dimensional pseudo-Zernike moments or the three-dimensional Zernike descriptors. These compact representations allow a fast real-time pocket searching against a database. Thorough benchmark studies employing two different datasets show that our representations are competitive with the other existing methods. Limitations and potentials of the shape-based methods as well as possible improvements are discussed.
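
    Once each pocket is encoded as a fixed-length descriptor vector, database retrieval reduces to nearest-neighbour search under a vector distance. The sketch below uses toy four-dimensional vectors in place of real pseudo-Zernike or 3D Zernike descriptors.

      # Pocket retrieval as nearest-neighbour search over descriptor
      # vectors; toy 4-D vectors stand in for real Zernike descriptors.
      import numpy as np

      database = {
          'pocket_A': np.array([0.10, 0.40, 0.05, 0.30]),
          'pocket_B': np.array([0.90, 0.10, 0.60, 0.20]),
          'pocket_C': np.array([0.12, 0.38, 0.07, 0.28]),
      }

      def nearest(query, k=2):
          ranked = sorted((float(np.linalg.norm(query - v)), name)
                          for name, v in database.items())
          return ranked[:k]

      query = np.array([0.11, 0.39, 0.06, 0.29])
      print(nearest(query))   # pocket_C and pocket_A are the closest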

  15. Designing Semiconductor Heterostructures Using Digitally Accessible Electronic-Structure Data

    NASA Astrophysics Data System (ADS)

    Shapera, Ethan; Schleife, Andre

    Semiconductor sandwich structures, so-called heterojunctions, are at the heart of modern applications with tremendous societal impact: Light-emitting diodes shape the future of lighting and solar cells are promising for renewable energy. However, their computer-based design is hampered by the high cost of electronic structure techniques used to select materials based on alignment of valence and conduction bands and to evaluate excited state properties. We describe, validate, and demonstrate an open source Python framework which rapidly screens existing online databases and user-provided data to find combinations of suitable, previously fabricated materials for optoelectronic applications. The branch point energy aligns valence and conduction bands of different materials, requiring only the bulk density functional theory band structure. We train machine learning algorithms to predict the dielectric constant, electron mobility, and hole mobility with material descriptors available in online databases. Using CdSe and InP as emitting layers for LEDs and CH3NH3PbI3 and nanoparticle PbS as absorbers for solar cells, we demonstrate our broadly applicable, automated method.
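
    The screening step that such a framework automates, aligning the bands of candidate materials to a common reference and comparing offsets, can be sketched as follows; the energies and the type-I/type-II labelling rule are illustrative, not values from the databases mentioned.

      # Screen heterojunction pairs by band offsets after alignment to a
      # common reference; all energies (eV) are illustrative only.
      materials = {          # name: (VBM, CBM) on the common scale
          'CdSe': (-1.1, 0.6),
          'InP':  (-0.9, 0.5),
          'ZnO':  (-2.0, 1.3),
      }

      def offsets(m1, m2):
          (v1, c1), (v2, c2) = materials[m1], materials[m2]
          return v2 - v1, c2 - c1       # valence / conduction band offsets

      for a, b in [('CdSe', 'InP'), ('CdSe', 'ZnO')]:
          dv, dc = offsets(a, b)
          kind = 'type-I' if (dv > 0 > dc) or (dv < 0 < dc) else 'type-II'
          print(a, b, round(dv, 2), round(dc, 2), kind)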

  16. The database management system: A topic and a tool

    NASA Technical Reports Server (NTRS)

    Plummer, O. R.

    1984-01-01

    Data structures and data base management systems are common tools employed to deal with the administrative information of a university. An understanding of these topics is needed by a much wider audience, ranging from those interested in computer aided design and manufacturing to those using microcomputers. These tools are becoming increasingly valuable to academic programs as they develop comprehensive computer support systems. The wide use of these tools relies upon the relational data model as a foundation. Experience with the use of the IPAD RIM5.0 program is described.

  17. Automated compound classification using a chemical ontology.

    PubMed

    Bobach, Claudia; Böhme, Timo; Laube, Ulf; Püschel, Anett; Weber, Lutz

    2012-12-29

    Classification of chemical compounds into compound classes by using structure derived descriptors is a well-established method to aid the evaluation and abstraction of compound properties in chemical compound databases. MeSH and recently ChEBI are examples of chemical ontologies that provide a hierarchical classification of compounds into general compound classes of biological interest based on their structural as well as property or use features. In these ontologies, compounds have been assigned manually to their respective classes. However, with the ever-increasing possibilities to extract new compounds from text documents using name-to-structure tools and considering the large number of compounds deposited in databases, automated and comprehensive chemical classification methods are needed to avoid the error-prone and time-consuming manual classification of compounds. In the present work we implement principles and methods to construct a chemical ontology of classes that shall support the automated, high-quality compound classification in chemical databases or text documents. While SMARTS expressions have already been used to define chemical structure class concepts, in the present work we have extended the expressive power of such class definitions by expanding their structure-based reasoning logic. Thus, to achieve the required precision and granularity of chemical class definitions, sets of SMARTS class definitions are connected by OR and NOT logical operators. In addition, AND logic has been implemented to allow the concomitant use of flexible atom lists and stereochemistry definitions. The resulting chemical ontology is a multi-hierarchical taxonomy of concept nodes connected by directed, transitive relationships. A proposal for a rule-based definition of chemical classes has been made that allows chemical compound classes to be defined more precisely than before. The proposed structure-based reasoning logic allows chemistry expert knowledge to be translated into a computer-interpretable form, preventing erroneous compound assignments and allowing automatic compound classification. The automated assignment of compounds in databases, compound structure files or text documents to their related ontology classes is possible through the integration with a chemical structure search engine. As an application example, the annotation of chemical structure files with a prototypic ontology is demonstrated.
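
    A minimal sketch of combining SMARTS matches with boolean logic, using RDKit; the two patterns and the 'amino-acid-like' class rule are invented for illustration and are not part of the ontology described in the paper.

      # Boolean combination of SMARTS matches with RDKit; the class rule
      # below ("amino-acid-like") is invented for illustration.
      from rdkit import Chem

      CARBOXYLIC_ACID = Chem.MolFromSmarts('C(=O)[OH]')
      AMINE = Chem.MolFromSmarts('[NX3;H2,H1;!$(NC=O)]')   # amine, not amide

      def is_amino_acid_like(smiles):
          mol = Chem.MolFromSmiles(smiles)
          if mol is None:
              return False
          # AND logic: both groups must be present; a NOT branch would
          # exclude molecules matching a forbidden pattern.
          return (mol.HasSubstructMatch(CARBOXYLIC_ACID)
                  and mol.HasSubstructMatch(AMINE))

      print(is_amino_acid_like('NCC(=O)O'))   # glycine  -> True
      print(is_amino_acid_like('CCO'))        # ethanol  -> False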

  18. Automated compound classification using a chemical ontology

    PubMed Central

    2012-01-01

    Background Classification of chemical compounds into compound classes by using structure derived descriptors is a well-established method to aid the evaluation and abstraction of compound properties in chemical compound databases. MeSH and recently ChEBI are examples of chemical ontologies that provide a hierarchical classification of compounds into general compound classes of biological interest based on their structural as well as property or use features. In these ontologies, compounds have been assigned manually to their respective classes. However, with the ever-increasing possibilities to extract new compounds from text documents using name-to-structure tools and considering the large number of compounds deposited in databases, automated and comprehensive chemical classification methods are needed to avoid the error-prone and time-consuming manual classification of compounds. Results In the present work we implement principles and methods to construct a chemical ontology of classes that shall support the automated, high-quality compound classification in chemical databases or text documents. While SMARTS expressions have already been used to define chemical structure class concepts, in the present work we have extended the expressive power of such class definitions by expanding their structure-based reasoning logic. Thus, to achieve the required precision and granularity of chemical class definitions, sets of SMARTS class definitions are connected by OR and NOT logical operators. In addition, AND logic has been implemented to allow the concomitant use of flexible atom lists and stereochemistry definitions. The resulting chemical ontology is a multi-hierarchical taxonomy of concept nodes connected by directed, transitive relationships. Conclusions A proposal for a rule-based definition of chemical classes has been made that allows chemical compound classes to be defined more precisely than before. The proposed structure-based reasoning logic allows chemistry expert knowledge to be translated into a computer-interpretable form, preventing erroneous compound assignments and allowing automatic compound classification. The automated assignment of compounds in databases, compound structure files or text documents to their related ontology classes is possible through the integration with a chemical structure search engine. As an application example, the annotation of chemical structure files with a prototypic ontology is demonstrated. PMID:23273256

  19. Computer-based creativity enhanced conceptual design model for non-routine design of mechanical systems

    NASA Astrophysics Data System (ADS)

    Li, Yutong; Wang, Yuxin; Duffy, Alex H. B.

    2014-11-01

    Computer-based conceptual design for routine design has made great strides, yet non-routine design has not been given due attention and is still poorly automated. Considering that the function-behavior-structure (FBS) model is widely used for modeling the conceptual design process, a computer-based creativity-enhanced conceptual design model (CECD) for non-routine design of mechanical systems is presented. In the model, the leaf functions in the FBS model are decomposed into and represented with fine-grain basic operation actions (BOA), and the corresponding BOA set in the function domain is then constructed. Choosing building blocks from the database and expressing their multiple functions with BOAs, the BOA set in the structure domain is formed. Through rule-based dynamic partition of the BOA set in the function domain, many variants of regenerated functional schemes are generated. To enhance the capability to introduce new design variables into the conceptual design process and to uncover more innovative physical structure schemes, an indirect function-structure matching strategy based on reconstructing combined structure schemes is adopted. By adjusting the tightness of the partition rules and the granularity of the divided BOA subsets, and by making full use of the main and secondary functions of each basic structure when reconstructing the physical structures, new design variables and variants are introduced into the reconstruction process, and a large number of simpler physical structure schemes that organically accomplish the overall function are obtained. The proposed creativity-enhanced conceptual design model has a strong capability for introducing new design variables in the function domain and finding simpler physical structures that accomplish the overall function, and can therefore be used to solve non-routine conceptual design problems.

  20. Computational Discovery of Materials Using the Firefly Algorithm

    NASA Astrophysics Data System (ADS)

    Avendaño-Franco, Guillermo; Romero, Aldo

    Our current ability to model physical phenomena accurately, the increase in computational power, and better algorithms are the driving forces behind the computational discovery and design of novel materials, allowing for virtual characterization before their realization in the laboratory. We present the implementation of a novel firefly algorithm, a population-based algorithm for global optimization, for searching the structure/composition space. This computation-intensive approach naturally takes advantage of concurrency and targeted exploration while still keeping enough diversity. We apply the new method to both periodic and non-periodic structures, and we present the implementation challenges and the solutions adopted to improve efficiency. The implementation makes use of computational materials databases and network analysis to optimize the search and to gain insight into the geometric structure of local minima on the energy landscape. The method has been implemented in our software PyChemia, an open-source package for materials discovery. We acknowledge the support of DMREF-NSF 1434897 and the Donors of the American Chemical Society Petroleum Research Fund for partial support of this research under Contract 54075-ND10.
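
    A minimal firefly algorithm, here minimizing a toy two-dimensional function; the parameter choices are illustrative and this is not the PyChemia implementation.

      # Minimal firefly algorithm minimizing a toy objective; parameters
      # are illustrative, not the PyChemia implementation.
      import numpy as np

      def f(x):                                   # sphere function
          return float(np.sum(x ** 2))

      rng = np.random.default_rng(1)
      X = rng.uniform(-5, 5, size=(15, 2))        # 15 fireflies in 2-D
      beta0, gamma, alpha = 1.0, 0.5, 0.2

      for _ in range(50):
          for i in range(len(X)):
              for j in range(len(X)):
                  if f(X[j]) < f(X[i]):           # j is brighter: i moves
                      r2 = np.sum((X[i] - X[j]) ** 2)
                      beta = beta0 * np.exp(-gamma * r2)
                      X[i] += beta * (X[j] - X[i]) + alpha * rng.normal(size=2)
          alpha *= 0.97                           # cool the random walk
      best = min(X, key=f)
      print(best, f(best))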

  1. BIOPEP database and other programs for processing bioactive peptide sequences.

    PubMed

    Minkiewicz, Piotr; Dziuba, Jerzy; Iwaniak, Anna; Dziuba, Marta; Darewicz, Małgorzata

    2008-01-01

    This review presents the potential for applying computational tools in peptide science, using the BIOPEP database and program as an example alongside other programs and databases available via the World Wide Web. The BIOPEP application contains a database of biologically active peptide sequences and a program enabling the construction of profiles of the potential biological activity of protein fragments, the calculation of quantitative descriptors as measures of the value of proteins as potential precursors of bioactive peptides, and the prediction of bonds susceptible to hydrolysis by endopeptidases in a protein chain. Other bioactive and allergenic peptide sequence databases are also presented. Programs enabling the construction of binary and multiple alignments between peptide sequences, the construction of sequence motifs attributed to a given type of bioactivity, searching for potential precursors of bioactive peptides, and the prediction of sites susceptible to proteolytic cleavage in protein chains are available via the Internet, as are other approaches concerning secondary structure prediction and the calculation of physicochemical features from amino acid sequence. Programs for the prediction of allergenic and toxic properties have also been developed. This review explores the possibilities of cooperation between the various programs.
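
    The profile construction described above amounts to locating known bioactive fragments within a protein sequence and reporting their positions and activities. A minimal sketch follows; the peptide table is invented toy data, not BIOPEP content.

      # Profile a protein for known bioactive fragments: report each
      # occurrence with its 1-based position. The peptide table is toy data.
      bioactive = {'VPP': 'ACE inhibitor',
                   'IPP': 'ACE inhibitor',
                   'YG':  'antioxidative'}

      def profile(protein):
          hits = []
          for pep, activity in bioactive.items():
              start = protein.find(pep)
              while start != -1:
                  hits.append((start + 1, pep, activity))
                  start = protein.find(pep, start + 1)
          return sorted(hits)

      print(profile('MKVPPLGYGIPP'))
      # [(3, 'VPP', 'ACE inhibitor'), (8, 'YG', 'antioxidative'),
      #  (10, 'IPP', 'ACE inhibitor')]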

  2. System for Performing Single Query Searches of Heterogeneous and Dispersed Databases

    NASA Technical Reports Server (NTRS)

    Maluf, David A. (Inventor); Okimura, Takeshi (Inventor); Gurram, Mohana M. (Inventor); Tran, Vu Hoang (Inventor); Knight, Christopher D. (Inventor); Trinh, Anh Ngoc (Inventor)

    2017-01-01

    The present invention is a distributed computer system of heterogeneous databases joined in an information grid and configured with an Application Programming Interface hardware which includes a search engine component for performing user-structured queries on multiple heterogeneous databases in real time. This invention reduces overhead associated with the impedance mismatch that commonly occurs in heterogeneous database queries.

  3. Adaptive Constrained Optimal Control Design for Data-Based Nonlinear Discrete-Time Systems With Critic-Only Structure.

    PubMed

    Luo, Biao; Liu, Derong; Wu, Huai-Ning

    2018-06-01

    Reinforcement learning has proved to be a powerful tool for solving optimal control problems over the past few years. However, the data-based constrained optimal control problem of nonaffine nonlinear discrete-time systems has rarely been studied. To solve this problem, an adaptive optimal control approach is developed by using value iteration-based Q-learning (VIQL) with a critic-only structure. Most of the existing constrained control methods require the use of a certain performance index and suit only linear or affine nonlinear systems, which is unreasonable in practice. To overcome this problem, the system transformation is first introduced with the general performance index. Then, the constrained optimal control problem is converted to an unconstrained optimal control problem. By introducing the action-state value function, i.e., the Q-function, the VIQL algorithm is proposed to learn the optimal Q-function of the data-based unconstrained optimal control problem. The convergence results of the VIQL algorithm are established with an easy-to-realize initial condition. To implement the VIQL algorithm, the critic-only structure is developed, where only one neural network is required to approximate the Q-function. The converged Q-function obtained from the critic-only VIQL method is employed to design the adaptive constrained optimal controller based on the gradient descent scheme. Finally, the effectiveness of the developed adaptive control method is tested on three examples with computer simulation.

  4. Full-reference quality assessment of stereoscopic images by learning binocular receptive field properties.

    PubMed

    Shao, Feng; Li, Kemeng; Lin, Weisi; Jiang, Gangyi; Yu, Mei; Dai, Qionghai

    2015-10-01

    Quality assessment of 3D images encounters more challenges than its 2D counterparts. Directly applying 2D image quality metrics is not the solution. In this paper, we propose a new full-reference quality assessment for stereoscopic images by learning binocular receptive field properties to be more in line with human visual perception. To be more specific, in the training phase, we learn a multiscale dictionary from the training database, so that the latent structure of images can be represented as a set of basis vectors. In the quality estimation phase, we compute sparse feature similarity index based on the estimated sparse coefficient vectors by considering their phase difference and amplitude difference, and compute global luminance similarity index by considering luminance changes. The final quality score is obtained by incorporating binocular combination based on sparse energy and sparse complexity. Experimental results on five public 3D image quality assessment databases demonstrate that in comparison with the most related existing methods, the devised algorithm achieves high consistency with subjective assessment.

  5. Braided Composites for Aerospace Applications. (Latest citations from the Aerospace Database)

    NASA Technical Reports Server (NTRS)

    1996-01-01

    The bibliography contains citations concerning the design, fabrication, and testing of structural composites formed by braiding machines. Topics include computer aided design and associated computer aided manufacture of braided tubular and flat forms. Applications include aircraft and spacecraft structures, where high shear strength and stiffness are required.

  6. Membrane proteins structures: A review on computational modeling tools.

    PubMed

    Almeida, Jose G; Preto, Antonio J; Koukos, Panagiotis I; Bonvin, Alexandre M J J; Moreira, Irina S

    2017-10-01

    Membrane proteins (MPs) play diverse and important functions in living organisms. They constitute 20% to 30% of the known bacterial, archaean and eukaryotic organisms' genomes. In humans, their importance is emphasized as they represent 50% of all known drug targets. Nevertheless, experimental determination of their three-dimensional (3D) structure has proven to be both time consuming and rather expensive, which has led to the development of computational algorithms to complement the available experimental methods and provide valuable insights. This review highlights the importance of membrane proteins and how computational methods are capable of overcoming challenges associated with their experimental characterization. It covers various MP structural aspects, such as lipid interactions, allostery, and structure prediction, based on methods such as Molecular Dynamics (MD) and Machine-Learning (ML). Recent developments in algorithms, tools and hybrid approaches, together with the increase in both computational resources and the amount of available data have resulted in increasingly powerful and trustworthy approaches to model MPs. Even though MPs are elementary and important in nature, the determination of their 3D structure has proven to be a challenging endeavor. Computational methods provide a reliable alternative to experimental methods. In this review, we focus on computational techniques to determine the 3D structure of MP and characterize their binding interfaces. We also summarize the most relevant databases and software programs available for the study of MPs. Copyright © 2017 Elsevier B.V. All rights reserved.

  7. Executing Complexity-Increasing Queries in Relational (MySQL) and NoSQL (MongoDB and EXist) Size-Growing ISO/EN 13606 Standardized EHR Databases

    PubMed Central

    Sánchez-de-Madariaga, Ricardo; Muñoz, Adolfo; Castro, Antonio L; Moreno, Oscar; Pascual, Mario

    2018-01-01

    This research shows a protocol to assess the computational complexity of querying relational and non-relational (NoSQL (not only Structured Query Language)) standardized electronic health record (EHR) medical information database systems (DBMS). It uses a set of three doubling-sized databases, i.e. databases storing 5000, 10,000 and 20,000 realistic standardized EHR extracts, in three different database management systems (DBMS): relational MySQL object-relational mapping (ORM), document-based NoSQL MongoDB, and native extensible markup language (XML) NoSQL eXist. The average response times to six complexity-increasing queries were computed, and the results showed a linear behavior in the NoSQL cases. In the NoSQL field, MongoDB presents a much flatter linear slope than eXist. NoSQL systems may also be more appropriate to maintain standardized medical information systems due to the special nature of the updating policies of medical information, which should not affect the consistency and efficiency of the data stored in NoSQL databases. One limitation of this protocol is the lack of direct results of improved relational systems such as archetype relational mapping (ARM) with the same data. However, the interpolation of doubling-size database results to those presented in the literature and other published results suggests that NoSQL systems might be more appropriate in many specific scenarios and problems to be solved. For example, NoSQL may be appropriate for document-based tasks such as EHR extracts used in clinical practice, or edition and visualization, or situations where the aim is not only to query medical information, but also to restore the EHR in exactly its original form. PMID:29608174

  8. Executing Complexity-Increasing Queries in Relational (MySQL) and NoSQL (MongoDB and EXist) Size-Growing ISO/EN 13606 Standardized EHR Databases.

    PubMed

    Sánchez-de-Madariaga, Ricardo; Muñoz, Adolfo; Castro, Antonio L; Moreno, Oscar; Pascual, Mario

    2018-03-19

    This research shows a protocol to assess the computational complexity of querying relational and non-relational (NoSQL (not only Structured Query Language)) standardized electronic health record (EHR) medical information database systems (DBMS). It uses a set of three doubling-sized databases, i.e. databases storing 5000, 10,000 and 20,000 realistic standardized EHR extracts, in three different database management systems (DBMS): relational MySQL object-relational mapping (ORM), document-based NoSQL MongoDB, and native extensible markup language (XML) NoSQL eXist. The average response times to six complexity-increasing queries were computed, and the results showed a linear behavior in the NoSQL cases. In the NoSQL field, MongoDB presents a much flatter linear slope than eXist. NoSQL systems may also be more appropriate to maintain standardized medical information systems due to the special nature of the updating policies of medical information, which should not affect the consistency and efficiency of the data stored in NoSQL databases. One limitation of this protocol is the lack of direct results of improved relational systems such as archetype relational mapping (ARM) with the same data. However, the interpolation of doubling-size database results to those presented in the literature and other published results suggests that NoSQL systems might be more appropriate in many specific scenarios and problems to be solved. For example, NoSQL may be appropriate for document-based tasks such as EHR extracts used in clinical practice, or edition and visualization, or situations where the aim is not only to query medical information, but also to restore the EHR in exactly its original form.

  9. Forecasting database for the tsunami warning regional center for the western Mediterranean Sea

    NASA Astrophysics Data System (ADS)

    Gailler, A.; Hebert, H.; Loevenbruck, A.; Hernandez, B.

    2010-12-01

    Improvements in the availability of sea-level observations and advances in numerical modeling techniques are increasing the potential for tsunami warnings to be based on numerical model forecasts. Numerical tsunami propagation and inundation models are well developed, but they remain challenging to run in real time, partly due to computational limitations and partly due to a lack of detailed knowledge of the earthquake rupture parameters. Through the establishment of the tsunami warning regional center for the NE Atlantic and western Mediterranean Sea, the CEA is in particular in charge of rapidly providing a map, with uncertainties, showing the zones in the main axis of energy at the Mediterranean scale. The strategy is based initially on a database of pre-computed tsunami scenarios, as the source parameters available shortly after an earthquake occurs are preliminary and may be somewhat inaccurate. Existing numerical models are good enough to provide useful guidance for warning structures to be disseminated quickly. When an event occurs, an appropriate variety of offshore tsunami propagation scenarios may be recalled through an automatic interface by combining pre-computed propagation solutions (single or multiple sources). This approach provides quick estimates of offshore tsunami propagation and aids hazard assessment and evacuation decision-making. As numerical model accuracy is inherently limited by errors in bathymetry and topography, and as inundation map calculation is more complex and computationally expensive, only offshore tsunami propagation modeling is included in the forecasting database, using a single sparse bathymetric computation grid for the numerical modeling. Because of the great variability in the mechanisms of tsunamigenic earthquakes, not all possible magnitudes can be represented in the scenarios database; in principle, however, an infinite number of tsunami propagation scenarios can be constructed as linear combinations of a finite number of pre-computed unit scenarios. The whole notion of a pre-computed forecasting database also requires a historical earthquake and tsunami database, as well as an up-to-date seismotectonic database including fault geometries and a zonation based on a seismotectonic synthesis of source zones and tsunamigenic faults. Our forecast strategy is thus based on a unit source function methodology, whereby model runs are combined and scaled linearly to produce any composite tsunami propagation solution. Each unit source function is equivalent to a tsunami generated by a Mo 1.75E+19 N.m earthquake (Mw ~6.8) with a rectangular fault 25 km by 20 km in size and 1 m of slip. The faults of the unit functions are placed adjacent to each other, following the discretization of the main seismogenic faults bounding the western Mediterranean basin. The number of unit functions involved varies with the magnitude of the desired composite solution, and the combined waveheights are multiplied by a given scaling factor to produce the new arbitrary scenario. Some test-case examples are presented (e.g., Boumerdès 2003 [Algeria, Mw 6.8], Djijel 1856 [Algeria, Mw 7.2], Ligure 1887 [Italia, Mw 6.5-6.7]).
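
    Since the unit source combination is linear, a composite scenario is simply a scaled sum of pre-computed waveheight grids. A minimal sketch with toy 2 x 2 grids follows; the linear slip scaling is the stated assumption of the methodology, and the grid values are invented.

      # Composite scenario = scaled linear sum of pre-computed unit-source
      # waveheight grids (toy 2 x 2 grids; real grids cover the basin).
      import numpy as np

      # Each unit source: Mo = 1.75E+19 N.m, 25 km x 20 km fault, 1 m slip.
      unit1 = np.array([[0.02, 0.10], [0.05, 0.30]])
      unit2 = np.array([[0.01, 0.08], [0.04, 0.25]])

      def composite(units, scale):
          # Linearity assumption: waveheights scale with the source slip.
          return scale * sum(units)

      print(composite([unit1, unit2], scale=2.5))   # hypothetical event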

  10. URS DataBase: universe of RNA structures and their motifs.

    PubMed

    Baulin, Eugene; Yacovlev, Victor; Khachko, Denis; Spirin, Sergei; Roytberg, Mikhail

    2016-01-01

    The Universe of RNA Structures DataBase (URSDB) stores information obtained from all RNA-containing PDB entries (2935 entries in October 2015). The content of the database is updated regularly. The database consists of 51 tables containing indexed data on various elements of the RNA structures. The database provides a web interface allowing the user to select a subset of structures with desired features and to obtain various statistical data for a selected subset of structures or for all structures. In particular, one can easily obtain statistics on geometric parameters of base pairs, on structural motifs (stems, loops, etc.) or on different types of pseudoknots. The user can also view and get information on an individual structure or its selected parts, e.g. RNA-protein hydrogen bonds. URSDB employs a new, original definition of loops in RNA structures. That definition fits both pseudoknot-free and pseudoknotted secondary structures and coincides with the classical definition in the case of pseudoknot-free structures. To our knowledge, URSDB is the first database supporting searches based on topological classification of pseudoknots and on extended loop classification. Database URL: http://server3.lpm.org.ru/urs/. © The Author(s) 2016. Published by Oxford University Press.

  11. URS DataBase: universe of RNA structures and their motifs

    PubMed Central

    Baulin, Eugene; Yacovlev, Victor; Khachko, Denis; Spirin, Sergei; Roytberg, Mikhail

    2016-01-01

    The Universe of RNA Structures DataBase (URSDB) stores information obtained from all RNA-containing PDB entries (2935 entries in October 2015). The content of the database is updated regularly. The database consists of 51 tables containing indexed data on various elements of the RNA structures. The database provides a web interface allowing the user to select a subset of structures with desired features and to obtain various statistical data for a selected subset of structures or for all structures. In particular, one can easily obtain statistics on geometric parameters of base pairs, on structural motifs (stems, loops, etc.) or on different types of pseudoknots. The user can also view and get information on an individual structure or its selected parts, e.g. RNA–protein hydrogen bonds. URSDB employs a new, original definition of loops in RNA structures. That definition fits both pseudoknot-free and pseudoknotted secondary structures and coincides with the classical definition in the case of pseudoknot-free structures. To our knowledge, URSDB is the first database supporting searches based on topological classification of pseudoknots and on extended loop classification. Database URL: http://server3.lpm.org.ru/urs/ PMID:27242032

  12. GIRAF: a method for fast search and flexible alignment of ligand binding interfaces in proteins at atomic resolution

    PubMed Central

    Kinjo, Akira R.; Nakamura, Haruki

    2012-01-01

    Comparison and classification of protein structures are fundamental means to understand protein functions. Due to the computational difficulty and the ever-increasing amount of structural data, however, it is in general not feasible to perform exhaustive all-against-all structure comparisons necessary for comprehensive classifications. To efficiently handle such situations, we have previously proposed a method, now called GIRAF. We herein describe further improvements in the GIRAF protein structure search and alignment method. The GIRAF method achieves extremely efficient search of similar structures of ligand binding sites of proteins by exploiting database indexing of structural features of local coordinate frames. In addition, it produces refined atom-wise alignments by iterative applications of the Hungarian method to the bipartite graph defined for a pair of superimposed structures. By combining the refined alignments based on different local coordinate frames, it is made possible to align structures involving domain movements. We provide detailed accounts for the database design, the search and alignment algorithms as well as some benchmark results. PMID:27493524
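
    The atom-wise refinement step, assigning atoms of one superimposed site to atoms of the other so that the total distance is minimal, is an assignment problem that the Hungarian method solves. The sketch below applies SciPy's linear_sum_assignment to toy coordinates; it illustrates one Hungarian pass, not GIRAF's full iterative procedure.

      # Atom-wise alignment of two superimposed sites: the Hungarian method
      # on the inter-atomic distance matrix; coordinates are toy data.
      import numpy as np
      from scipy.optimize import linear_sum_assignment

      site_a = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 1.5, 0.0]])
      site_b = np.array([[1.4, 0.1, 0.0], [0.1, 0.0, 0.1], [0.1, 1.6, 0.0]])

      cost = np.linalg.norm(site_a[:, None, :] - site_b[None, :, :], axis=2)
      rows, cols = linear_sum_assignment(cost)      # minimal total distance
      for i, j in zip(rows, cols):
          print(f'atom {i} of A <-> atom {j} of B (d = {cost[i, j]:.2f})')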

  13. Implementing and analyzing the multi-threaded LP-inference

    NASA Astrophysics Data System (ADS)

    Bolotova, S. Yu; Trofimenko, E. V.; Leschinskaya, M. V.

    2018-03-01

    Logical production equations provide new possibilities for optimizing backward inference in intelligent production-type systems. The strategy of relevant backward inference aims to minimize the number of queries to an external information source (either a database or an interactive user). The idea of the method is based on computing the set of initial preimages and searching for the true preimage. Each stage can be executed independently and in parallel, and the actual work at a given stage can also be distributed between parallel computers. This paper is devoted to parallel algorithms for relevant inference based on an advanced "pipeline" scheme of parallel computation, which increases the degree of parallelism. The authors also provide some details of the LP-structures implementation.

  14. Indexing and retrieving motions of characters in close contact.

    PubMed

    Ho, Edmond S L; Komura, Taku

    2009-01-01

    Human motion indexing and retrieval are important for animators due to the need to search for motions in a database that can be blended and concatenated. Most previous research on human motion indexing and retrieval computes the Euclidean distance of joint angles or joint positions. Such approaches are difficult to apply in cases in which multiple characters are closely interacting with each other, as the relationships between the characters are not encoded in the representation. In this research, we propose a topology-based approach to index the motions of two human characters in close contact. We compute and encode how the two bodies are tangled based on the concept of rational tangles. The encoded relationships, which we define as TangleList, are used to determine the similarity of pairs of postures. Using our method, we can index and retrieve motions such as one person piggy-backing another, one person assisting another in walking, and two persons dancing or wrestling. Our method is useful for managing a motion database of multiple characters. We can also produce motion graph structures of two characters closely interacting with each other by interpolating and concatenating topologically similar postures and motion clips, which are applicable to 3D computer games and computer animation.

  15. The Use of a Relational Database in Qualitative Research on Educational Computing.

    ERIC Educational Resources Information Center

    Winer, Laura R.; Carriere, Mario

    1990-01-01

    Discusses the use of a relational database as a data management and analysis tool for nonexperimental qualitative research, and describes the use of the Reflex Plus database in the Vitrine 2001 project in Quebec to study computer-based learning environments. Information systems are also discussed, and the use of a conceptual model is explained.…

  16. Coupling computer-interpretable guidelines with a drug-database through a web-based system – The PRESGUID project

    PubMed Central

    Dufour, Jean-Charles; Fieschi, Dominique; Fieschi, Marius

    2004-01-01

    Background Clinical Practice Guidelines (CPGs) available today are not extensively used due to lack of proper integration into clinical settings, knowledge-related information resources, and lack of decision support at the point of care in a particular clinical context. Objective The PRESGUID project (PREScription and GUIDelines) aims to improve the assistance provided by guidelines. The project proposes an online service enabling physicians to consult computerized CPGs linked to drug databases for easier integration into the healthcare process. Methods Computable CPGs are structured as decision trees and coded in XML format. Recommendations related to drug classes are tagged with ATC codes. We use a mapping module to enhance computerized guidelines coupling with a drug database, which contains detailed information about each usable specific medication. In this way, therapeutic recommendations are backed up with current and up-to-date information from the database. Results Two authoritative CPGs, originally diffused as static textual documents, have been implemented to validate the computerization process and to illustrate the usefulness of the resulting automated CPGs and their coupling with a drug database. We discuss the advantages of this approach for practitioners and the implications for both guideline developers and drug database providers. Other CPGs will be implemented and evaluated in real conditions by clinicians working in different health institutions. PMID:15053828

  17. Construction of image database for newspaper articles using CTS

    NASA Astrophysics Data System (ADS)

    Kamio, Tatsuo

    Nihon Keizai Shimbun, Inc. developed a system that builds an image database of articles automatically by use of CTS (Computer Typesetting System). In addition to the articles and headlines input into CTS, it reproduces image elements such as photographs and graphs for each article, in accordance with their position on the page. In effect, the computer itself clips the articles out of the newspaper. The image database is accumulated in magnetic and optical files and is output to users' facsimile machines. With the diffusion of CTS, the number of newspaper companies starting to build article databases is increasing rapidly; this system is the first attempt to build such a database automatically. This paper describes the CTS facilities that support this system and gives an outline of it.

  18. Description of 'REQUEST-KYUSHYU' for KYUKEICHO regional data base

    NASA Astrophysics Data System (ADS)

    Takimoto, Shin'ichi

    Kyushu Economic Research Association (an incorporated foundation) recently initiated the regional database service 'REQUEST-Kyushu'. It is a full-scale database compiled from the information and know-how that the Association has accumulated over forty years. It comprises a regional information database of journal and newspaper articles and a statistical information database of economic statistics. The former is searched from a personal computer, and the search result (original text) is sent by facsimile; the latter is also searched from a personal computer, where the data can be processed, edited, or downloaded. This paper describes the characteristics, content, and system outline of 'REQUEST-Kyushu'.

  19. Peer Education Versus Computer-Based Education: Improve Utilization of Library Databases Among Direct Care Nurses.

    PubMed

    Gonzalez, Roxana; O'Brien-Barry, Patricia; Ancheta, Reginaldo; Razal, Rennuel; Clyne, Mary Ellen

    A quasi-experimental study was conducted to determine which teaching modality, peer education or computer-based education, better improves the utilization of library electronic databases and thereby evidence-based knowledge at the point of care. No significant differences were found between the teaching modalities. However, the study identified the need to explore professional development teaching modalities outside the traditional classroom to support an evidence-based practice healthcare environment.

  20. The portable UNIX programming system (PUPS) and CANTOR: a computational environment for dynamical representation and analysis of complex neurobiological data.

    PubMed

    O'Neill, M A; Hilgetag, C C

    2001-08-29

    Many problems in analytical biology, such as the classification of organisms, the modelling of macromolecules, or the structural analysis of metabolic or neural networks, involve complex relational data. Here, we describe a software environment, the portable UNIX programming system (PUPS), which has been developed to allow efficient computational representation and analysis of such data. The system can also be used as a general development tool for database and classification applications. As the complexity of analytical biology problems may lead to computation times of several days or weeks even on powerful computer hardware, the PUPS environment gives support for persistent computations by providing mechanisms for dynamic interaction and homeostatic protection of processes. Biological objects and their interrelations are also represented in a homeostatic way in PUPS. Object relationships are maintained and updated by the objects themselves, thus providing a flexible, scalable and current data representation. Based on the PUPS environment, we have developed an optimization package, CANTOR, which can be applied to a wide range of relational data and which has been employed in different analyses of neuroanatomical connectivity. The CANTOR package makes use of the PUPS system features by modifying candidate arrangements of objects within the system's database. This restructuring is carried out via optimization algorithms that are based on user-defined cost functions, thus providing flexible and powerful tools for the structural analysis of the database content. The use of stochastic optimization also enables the CANTOR system to deal effectively with incomplete and inconsistent data. Prototypical forms of PUPS and CANTOR have been coded and used successfully in the analysis of anatomical and functional mammalian brain connectivity, involving complex and inconsistent experimental data. In addition, PUPS has been used for solving multivariate engineering optimization problems and to implement the digital identification system (DAISY), a system for the automated classification of biological objects. PUPS is implemented in ANSI-C under the POSIX.1 standard and is to a great extent architecture- and operating-system independent. The software is supported by systems libraries that allow multi-threading (the concurrent processing of several database operations), as well as the distribution of the dynamic data objects and library operations over clusters of computers. These attributes make the system easily scalable, and in principle allow the representation and analysis of arbitrarily large sets of relational data. PUPS and CANTOR are freely distributed (http://www.pups.org.uk) as open-source software under the GNU license agreement.
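
    CANTOR itself is ANSI-C code built on PUPS, but the optimization it performs can be sketched generically in Python: rearrange objects so as to minimize a user-defined cost function, here by simulated annealing over a fabricated wiring-cost example:

        import math
        import random

        def anneal(objects, cost, steps=20000, t0=1.0, t1=1e-3, seed=0):
            """Minimize cost(ordering) by simulated annealing over pairwise swaps."""
            rng = random.Random(seed)
            state = objects[:]
            c = cost(state)
            best, best_c = state[:], c
            for k in range(steps):
                t = t0 * (t1 / t0) ** (k / steps)          # geometric cooling
                i, j = rng.randrange(len(state)), rng.randrange(len(state))
                state[i], state[j] = state[j], state[i]    # propose a swap
                c_new = cost(state)
                if c_new <= c or rng.random() < math.exp((c - c_new) / t):
                    c = c_new
                    if c < best_c:
                        best, best_c = state[:], c
                else:
                    state[i], state[j] = state[j], state[i]  # reject: undo swap
            return best, best_c

        # Toy cost: place strongly connected brain areas close together.
        areas = ["V1", "V2", "MT", "FEF", "LIP"]
        weight = {("V1", "V2"): 3, ("V2", "MT"): 2, ("MT", "LIP"): 1}

        def wiring_cost(order):
            pos = {a: i for i, a in enumerate(order)}
            return sum(w * abs(pos[a] - pos[b]) for (a, b), w in weight.items())

        print(anneal(areas, wiring_cost))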

  1. The portable UNIX programming system (PUPS) and CANTOR: a computational environment for dynamical representation and analysis of complex neurobiological data.

    PubMed Central

    O'Neill, M A; Hilgetag, C C

    2001-01-01

    Many problems in analytical biology, such as the classification of organisms, the modelling of macromolecules, or the structural analysis of metabolic or neural networks, involve complex relational data. Here, we describe a software environment, the portable UNIX programming system (PUPS), which has been developed to allow efficient computational representation and analysis of such data. The system can also be used as a general development tool for database and classification applications. As the complexity of analytical biology problems may lead to computation times of several days or weeks even on powerful computer hardware, the PUPS environment gives support for persistent computations by providing mechanisms for dynamic interaction and homeostatic protection of processes. Biological objects and their interrelations are also represented in a homeostatic way in PUPS. Object relationships are maintained and updated by the objects themselves, thus providing a flexible, scalable and current data representation. Based on the PUPS environment, we have developed an optimization package, CANTOR, which can be applied to a wide range of relational data and which has been employed in different analyses of neuroanatomical connectivity. The CANTOR package makes use of the PUPS system features by modifying candidate arrangements of objects within the system's database. This restructuring is carried out via optimization algorithms that are based on user-defined cost functions, thus providing flexible and powerful tools for the structural analysis of the database content. The use of stochastic optimization also enables the CANTOR system to deal effectively with incomplete and inconsistent data. Prototypical forms of PUPS and CANTOR have been coded and used successfully in the analysis of anatomical and functional mammalian brain connectivity, involving complex and inconsistent experimental data. In addition, PUPS has been used for solving multivariate engineering optimization problems and to implement the digital identification system (DAISY), a system for the automated classification of biological objects. PUPS is implemented in ANSI-C under the POSIX.1 standard and is to a great extent architecture- and operating-system independent. The software is supported by systems libraries that allow multi-threading (the concurrent processing of several database operations), as well as the distribution of the dynamic data objects and library operations over clusters of computers. These attributes make the system easily scalable, and in principle allow the representation and analysis of arbitrarily large sets of relational data. PUPS and CANTOR are freely distributed (http://www.pups.org.uk) as open-source software under the GNU license agreement. PMID:11545702

  2. Real-world clinical applicability of pathogenicity predictors assessed on SERPINA1 mutations in alpha-1-antitrypsin deficiency.

    PubMed

    Giacopuzzi, Edoardo; Laffranchi, Mattia; Berardelli, Romina; Ravasio, Viola; Ferrarotti, Ilaria; Gooptu, Bibek; Borsani, Giuseppe; Fra, Annamaria

    2018-06-07

    The growth of publicly available data informing upon genetic variations, mechanisms of disease and disease sub-phenotypes offers great potential for personalised medicine. Computational approaches are likely required to assess large numbers of novel genetic variants. However, the integration of genetic, structural and pathophysiological data still represents a challenge for computational predictions and their clinical use. We addressed these issues for alpha-1-antitrypsin deficiency, a disease mediated by mutations in the SERPINA1 gene encoding alpha-1-antitrypsin. We compiled a comprehensive database of SERPINA1 coding mutations and assigned them apparent pathological relevance based upon available data. 'Benign' and 'Pathogenic' mutations were used to assess performance of 31 pathogenicity predictors. Well-performing algorithms clustered the subset of variants known to be severely pathogenic with high scores. Eight new mutations identified in the ExAC database and achieving high scores were selected for characterisation in cell models and showed secretory deficiency and polymer formation, supporting the predictive power of our computational approach. The behaviour of the pathogenic new variants and consistent outliers were rationalised by considering the protein structural context and residue conservation. These findings highlight the potential of computational methods to provide meaningful predictions of the pathogenic significance of novel mutations and identify areas for further investigation.

  3. Graph Learning in Knowledge Bases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Goldberg, Sean; Wang, Daisy Zhe

    The amount of text data has been growing exponentially in recent years, giving rise to automatic information extraction methods that store text annotations in a database. The current state-of-the-art structured prediction methods, however, are likely to contain errors, and it is important to be able to manage the overall uncertainty of the database. On the other hand, the advent of crowdsourcing has enabled humans to aid machine algorithms at scale. As part of this project we introduced pi-CASTLE, a system that optimizes and integrates human and machine computing as applied to a complex structured prediction problem involving conditional random fields (CRFs). We proposed strategies grounded in information theory to select a token subset, formulate questions for the crowd to label, and integrate these labelings back into the database using a method of constrained inference. On both a text segmentation task over academic citations and a named entity recognition task over tweets we showed an order of magnitude improvement in accuracy gain over baseline methods.

  4. Hand-held computer operating system program for collection of resident experience data.

    PubMed

    Malan, T K; Haffner, W H; Armstrong, A Y; Satin, A J

    2000-11-01

    To describe a system for recording resident experience involving hand-held computers with the Palm Operating System (3Com, Inc., Santa Clara, CA). Hand-held personal computers (PCs) are popular, easy to use, inexpensive, portable, and can share data with other operating systems. Residents in our program carry individual hand-held database computers to record Residency Review Committee (RRC) reportable patient encounters. Each resident's data is transferred to a single central relational database compatible with Microsoft Access (Microsoft Corporation, Redmond, WA). Patient data entry and subsequent transfer to a central database are accomplished with commercially available software that requires minimal computer expertise to implement and maintain. The central database can then be used for statistical analysis or to create required RRC resident experience reports. As a result, the data collection and transfer process takes less time for residents and program director alike than paper-based or central computer-based systems. The system of collecting resident encounter data using hand-held computers with the Palm Operating System is easy to use, relatively inexpensive, accurate, and secure. The user-friendly system provides prompt, complete, and accurate data, enhancing the education of residents while facilitating the job of the program director.

  5. GlycomeDB – integration of open-access carbohydrate structure databases

    PubMed Central

    Ranzinger, René; Herget, Stephan; Wetter, Thomas; von der Lieth, Claus-Wilhelm

    2008-01-01

    Background Although carbohydrates are the third major class of biological macromolecules, after proteins and DNA, there is neither a comprehensive database for carbohydrate structures nor an established universal structure encoding scheme for computational purposes. Funding for further development of the Complex Carbohydrate Structure Database (CCSD or CarbBank) ceased in 1997, and since then several initiatives have developed independent databases with partially overlapping foci. For each database, different encoding schemes for residues and sequence topology were designed. Therefore, it is virtually impossible to obtain an overview of all deposited structures or to compare the contents of the various databases. Results We have implemented procedures which download the structures contained in the seven major databases, e.g. GLYCOSCIENCES.de, the Consortium for Functional Glycomics (CFG), the Kyoto Encyclopedia of Genes and Genomes (KEGG) and the Bacterial Carbohydrate Structure Database (BCSDB). We have created a new database called GlycomeDB, containing all structures, their taxonomic annotations and references (IDs) for the original databases. More than 100000 datasets were imported, resulting in more than 33000 unique sequences now encoded in GlycomeDB using the universal format GlycoCT. Inconsistencies were found in all public databases, which were discussed and corrected in multiple feedback rounds with the responsible curators. Conclusion GlycomeDB is a new, publicly available database for carbohydrate sequences with a unified, all-encompassing structure encoding format and NCBI taxonomic referencing. The database is updated weekly and can be downloaded free of charge. The JAVA application GlycoUpdateDB is also available for establishing and updating a local installation of GlycomeDB. With the advent of GlycomeDB, the distributed islands of knowledge in glycomics are now bridged to form a single resource. PMID:18803830

  6. Large scale validation of the M5L lung CAD on heterogeneous CT datasets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lopez Torres, E., E-mail: Ernesto.Lopez.Torres@cern.ch, E-mail: cerello@to.infn.it; Fiorina, E.; Pennazio, F.

    Purpose: M5L, a fully automated computer-aided detection (CAD) system for the detection and segmentation of lung nodules in thoracic computed tomography (CT), is presented and validated on several image datasets. Methods: M5L is the combination of two independent subsystems, based on the Channeler Ant Model as a segmentation tool [lung channeler ant model (lungCAM)] and on the voxel-based neural approach. The lungCAM was upgraded with a scan equalization module and a new procedure to recover the nodules connected to other lung structures; its classification module, which makes use of a feed-forward neural network, is based on a small number of features (13), so as to minimize the risk of poor generalization given the large difference in size between the training and testing datasets, which contain 94 and 1019 CTs, respectively. The lungCAM (standalone) and M5L (combined) performance was extensively tested on 1043 CT scans from three independent datasets, including a detailed analysis of the full Lung Image Database Consortium/Image Database Resource Initiative database, an analysis not previously reported in the literature. Results: The lungCAM and M5L performance is consistent across the databases, with a sensitivity of about 70% and 80%, respectively, at eight false positive findings per scan, despite the variable annotation criteria and acquisition and reconstruction conditions. A reduced sensitivity is found for subtle nodules and ground glass opacity (GGO) structures. A comparison with other CAD systems is also presented. Conclusions: The M5L performance on a large and heterogeneous dataset is stable and satisfactory, although the development of a dedicated module for GGO detection could further improve it, as would an iterative optimization of the training procedure. The main aim of the present study was accomplished: M5L results do not deteriorate when increasing the dataset size, making it a candidate for supporting radiologists on large scale screenings and clinical programs.

  7. Large scale validation of the M5L lung CAD on heterogeneous CT datasets.

    PubMed

    Torres, E Lopez; Fiorina, E; Pennazio, F; Peroni, C; Saletta, M; Camarlinghi, N; Fantacci, M E; Cerello, P

    2015-04-01

    M5L, a fully automated computer-aided detection (CAD) system for the detection and segmentation of lung nodules in thoracic computed tomography (CT), is presented and validated on several image datasets. M5L is the combination of two independent subsystems, based on the Channeler Ant Model as a segmentation tool [lung channeler ant model (lungCAM)] and on the voxel-based neural approach. The lungCAM was upgraded with a scan equalization module and a new procedure to recover the nodules connected to other lung structures; its classification module, which makes use of a feed-forward neural network, is based on a small number of features (13), so as to minimize the risk of poor generalization given the large difference in size between the training and testing datasets, which contain 94 and 1019 CTs, respectively. The lungCAM (standalone) and M5L (combined) performance was extensively tested on 1043 CT scans from three independent datasets, including a detailed analysis of the full Lung Image Database Consortium/Image Database Resource Initiative database, an analysis not previously reported in the literature. The lungCAM and M5L performance is consistent across the databases, with a sensitivity of about 70% and 80%, respectively, at eight false positive findings per scan, despite the variable annotation criteria and acquisition and reconstruction conditions. A reduced sensitivity is found for subtle nodules and ground glass opacity (GGO) structures. A comparison with other CAD systems is also presented. The M5L performance on a large and heterogeneous dataset is stable and satisfactory, although the development of a dedicated module for GGO detection could further improve it, as would an iterative optimization of the training procedure. The main aim of the present study was accomplished: M5L results do not deteriorate when increasing the dataset size, making it a candidate for supporting radiologists on large scale screenings and clinical programs.

  8. Migration of the CERN IT Data Centre Support System to ServiceNow

    NASA Astrophysics Data System (ADS)

    Alvarez Alonso, R.; Arneodo, G.; Barring, O.; Bonfillou, E.; Coelho dos Santos, M.; Dore, V.; Lefebure, V.; Fedorko, I.; Grossir, A.; Hefferman, J.; Mendez Lorenzo, P.; Moller, M.; Pera Mira, O.; Salter, W.; Trevisani, F.; Toteva, Z.

    2014-06-01

    The large potential and flexibility of the ServiceNow infrastructure, based on "best practices" methods, is allowing the migration of some of the ticketing systems traditionally used for monitoring the servers and services available at the CERN IT Computer Centre. This migration enables the standardization and globalization of the ticketing and control systems, implementing a generic system extensible to other departments and users. One of the activities of the Service Management project, together with the Computing Facilities group, has been the migration of the ITCM structure based on Remedy to ServiceNow within the context of the ITIL process called Event Management. The experience gained during the first months of operation has been instrumental towards the migration to ServiceNow of other service monitoring systems and databases. The usage of this structure has also been extended to service tracking at the Wigner Centre in Budapest.

  9. Baseline estimation in flame's spectra by using neural networks and robust statistics

    NASA Astrophysics Data System (ADS)

    Garces, Hugo; Arias, Luis; Rojas, Alejandro

    2014-09-01

    This work presents a baseline estimation method for flame spectra based on a neural network combined with robust statistics and multivariate analysis, which automatically identifies the measured wavelengths belonging to the continuous (baseline) feature for model adaptation, removing the restriction of having to measure the target baseline for training. The main contributions of this paper are: to analyze a flame spectra database by computing Jolliffe statistics from Principal Component Analysis, detecting wavelengths not correlated with most of the measured data and therefore corresponding to baseline; to systematically determine the optimal number of neurons in hidden layers based on Akaike's Final Prediction Error; to estimate the baseline over the full wavelength range of the sampled spectra; and to train a neural network that generalizes the relation between measured and baseline spectra. The main application of our research is to compute total radiation with baseline information, allowing diagnosis of the combustion process state for optimization in early stages.
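
    The first contribution, separating baseline wavelengths from emission features by how much of their variance the leading principal components explain, can be imitated on synthetic spectra. The NumPy statistic below is a simplified stand-in for the paper's Jolliffe statistics:

        import numpy as np

        rng = np.random.default_rng(0)
        n_scans, n_wl = 200, 512
        wl = np.linspace(300.0, 800.0, n_wl)             # wavelengths in nm

        # Synthetic spectra: nearly constant continuous baseline plus two
        # variable emission lines (Na ~589 nm, K ~767 nm) and noise.
        base = np.exp(-(wl - 650.0) ** 2 / 9e4)
        spectra = np.outer(rng.uniform(0.95, 1.05, n_scans), base)
        spectra += np.outer(rng.uniform(0.0, 1.0, n_scans), np.exp(-(wl - 589.0) ** 2 / 20))
        spectra += np.outer(rng.uniform(0.0, 1.0, n_scans), np.exp(-(wl - 767.0) ** 2 / 20))
        spectra += 0.01 * rng.standard_normal((n_scans, n_wl))

        # PCA via SVD of the mean-centred data.
        X = spectra - spectra.mean(axis=0)
        _, _, vt = np.linalg.svd(X, full_matrices=False)

        # Fraction of each wavelength's variance explained by the two leading
        # (emission-driven) components; weakly explained wavelengths are
        # treated as baseline candidates.
        k = 2
        proj = (X @ vt[:k].T) @ vt[:k]
        explained = (proj ** 2).sum(axis=0) / np.maximum((X ** 2).sum(axis=0), 1e-12)
        is_baseline = explained < 0.5
        print(f"{is_baseline.sum()} of {n_wl} wavelengths flagged as baseline")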

  10. A unified approach to the design of clinical reporting systems.

    PubMed

    Gouveia-Oliveira, A; Salgado, N C; Azevedo, A P; Lopes, L; Raposo, V D; Almeida, I; de Melo, F G

    1994-12-01

    Computer-based Clinical Reporting Systems (CRS) for diagnostic departments that use structured data entry have a number of functional and structural affinities suggesting that a common software architecture for CRS may be defined. Such an architecture should allow easy expandability and reusability of a CRS. We report the development methodology and the architecture of SISCOPE, a CRS originally designed for gastrointestinal endoscopy that is expandable and reusable. Its main components are a patient database, a knowledge base, a reports base, and screen and reporting engines. The knowledge base contains the description of the controlled vocabulary and all the information necessary to control the menu system, and is easily accessed and modified with a conventional text editor. The structure of the controlled vocabulary is formally presented as an entity-relationship diagram. The screen engine drives a dynamic user interface and the reporting engine automatically creates a medical report; both engines operate by following a set of rules and the information contained in the knowledge base. Clinical experience has shown this architecture to be highly flexible and to allow frequent modifications of both the vocabulary and the menu system. This structure provided increased collaboration among development teams, insulating the domain expert from the details of the database, and enabling him to modify the system as necessary and to test the changes immediately. The system has also been reused in several different domains.

  11. Learning-based stochastic object models for use in optimizing imaging systems

    NASA Astrophysics Data System (ADS)

    Dolly, Steven R.; Anastasio, Mark A.; Yu, Lifeng; Li, Hua

    2017-03-01

    It is widely known that the optimization of imaging systems based on objective, or task-based, measures of image quality via computer simulation requires the use of a stochastic object model (SOM). However, the development of computationally tractable SOMs that can accurately model the statistical variations in anatomy within a specified ensemble of patients remains a challenging task. Because they are established by use of image data corresponding to a single patient, previously reported numerical anatomical models lack the ability to accurately model inter-patient variations in anatomy. In certain applications, however, databases of high-quality volumetric images are available that can facilitate this task. In this work, a novel and tractable methodology for learning a SOM from a set of volumetric training images is developed. The proposed method is based upon geometric attribute distribution (GAD) models, which characterize the inter-structural centroid variations and the intra-structural shape variations of each individual anatomical structure. The GAD models are scalable and deformable, and constrained by their respective principal attribute variations learned from training data. By use of the GAD models, random organ shapes and positions can be generated and integrated to form an anatomical phantom. The randomness in organ shape and position reflects the variability of anatomy present in the training data. To demonstrate the methodology, a SOM corresponding to the pelvis of an adult male was computed and a corresponding ensemble of phantoms was created. Additionally, computer-simulated X-ray projection images corresponding to the phantoms were computed, from which tomographic images were reconstructed.
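
    A schematic sketch of the sampling step: organ centroids and shape-mode coefficients are drawn from distributions fitted to training data. The Gaussian model and synthetic training set below are simplified stand-ins for the paper's GAD models:

        import numpy as np

        rng = np.random.default_rng(1)

        # Hypothetical training data: 50 phantoms, one organ described by a
        # 3-vector centroid and a 60-dimensional boundary-shape vector.
        centroids = rng.normal([100.0, 120.0, 80.0], [4.0, 3.0, 5.0], (50, 3))
        shapes = rng.normal(0.0, 1.0, (50, 60))

        # Learn principal shape modes from the training shapes.
        mu = shapes.mean(axis=0)
        _, s, vt = np.linalg.svd(shapes - mu, full_matrices=False)
        modes, stds = vt[:5], s[:5] / np.sqrt(len(shapes) - 1)

        def sample_organ():
            """Draw a random centroid and a shape constrained to learned variation."""
            c = centroids.mean(axis=0) + rng.standard_normal(3) * centroids.std(axis=0)
            b = rng.standard_normal(5) * stds        # shape-mode coefficients
            return c, mu + b @ modes                 # position + deformed shape

        centroid, organ_shape = sample_organ()
        print(centroid.round(1), organ_shape.shape)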

  12. Clinical nursing informatics. Developing tools for knowledge workers.

    PubMed

    Ozbolt, J G; Graves, J R

    1993-06-01

    Current research in clinical nursing informatics is proceeding along three important dimensions: (1) identifying and defining nursing's language and structuring its data; (2) understanding clinical judgment and how computer-based systems can facilitate and not replace it; and (3) discovering how well-designed systems can transform nursing practice. A number of efforts are underway to find and use language that accurately represents nursing and that can be incorporated into computer-based information systems. These efforts add to understanding nursing problems, interventions, and outcomes, and provide the elements for databases from which nursing's costs and effectiveness can be studied. Research on clinical judgment focuses on how nurses (perhaps with different levels of expertise) assess patient needs, set goals, and plan and deliver care, as well as how computer-based systems can be developed to aid these cognitive processes. Finally, investigators are studying not only how computers can help nurses with the mechanics and logistics of processing information but also and more importantly how access to informatics tools changes nursing care.

  13. A database of new zeolite-like materials.

    PubMed

    Pophale, Ramdas; Cheeseman, Phillip A; Deem, Michael W

    2011-07-21

    We here describe a database of computationally predicted zeolite-like materials. These crystals were discovered by a Monte Carlo search for zeolite-like materials. Positions of Si atoms as well as unit cell, space group, density, and number of crystallographically unique atoms were explored in the construction of this database. The database contains over 2.6 million unique structures. Roughly 15% of these are within +30 kJ mol⁻¹ Si of α-quartz, the band in which most of the known zeolites lie. These structures have topological, geometrical, and diffraction characteristics that are similar to those of known zeolites. The database is the result of refinement by two interatomic potentials that both satisfy the Pauli exclusion principle. The database has been deposited in the publicly available PCOD database and at www.hypotheticalzeolites.net/database/deem/.

  14. Domain fusion analysis by applying relational algebra to protein sequence and domain databases

    PubMed Central

    Truong, Kevin; Ikura, Mitsuhiko

    2003-01-01

    Background Domain fusion analysis is a useful method to predict functionally linked proteins that may be involved in direct protein-protein interactions or in the same metabolic or signaling pathway. As separate domain databases like BLOCKS, PROSITE, Pfam, SMART, PRINTS-S, ProDom, TIGRFAMs, and amalgamated domain databases like InterPro continue to grow in size and quality, a computational method to perform domain fusion analysis that leverages on these efforts will become increasingly powerful. Results This paper proposes a computational method employing relational algebra to find domain fusions in protein sequence databases. The feasibility of this method was illustrated on the SWISS-PROT+TrEMBL sequence database using domain predictions from the Pfam HMM (hidden Markov model) database. We identified 235 and 189 putative functionally linked protein partners in H. sapiens and S. cerevisiae, respectively. From scientific literature, we were able to confirm many of these functional linkages, while the remainder offer testable experimental hypothesis. Results can be viewed at . Conclusion As the analysis can be computed quickly on any relational database that supports standard SQL (structured query language), it can be dynamically updated along with the sequence and domain databases, thereby improving the quality of predictions over time. PMID:12734020
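
    Since the method is plain relational algebra, the core query can be sketched directly. A runnable Python/SQLite toy, with domain assignments fabricated for illustration (tryptophan synthase is the classic fusion example: a single fused protein in yeast, two separate proteins in E. coli):

        import sqlite3

        con = sqlite3.connect(":memory:")
        con.execute("CREATE TABLE dom (protein TEXT, organism TEXT, domain TEXT)")
        con.executemany("INSERT INTO dom VALUES (?, ?, ?)", [
            ("Trp5", "S. cerevisiae", "TrpA_dom"),   # composite (fused) protein
            ("Trp5", "S. cerevisiae", "TrpB_dom"),
            ("TrpA", "E. coli", "TrpA_dom"),         # the two separate proteins
            ("TrpB", "E. coli", "TrpB_dom"),
        ])

        # Rosetta-stone query: two distinct proteins of one organism whose
        # domains co-occur on a single composite protein in another organism.
        rows = con.execute("""
            SELECT DISTINCT a.protein, b.protein, f1.protein AS composite
            FROM dom a
            JOIN dom b  ON a.organism = b.organism AND a.protein < b.protein
            JOIN dom f1 ON f1.domain = a.domain AND f1.organism <> a.organism
            JOIN dom f2 ON f2.domain = b.domain AND f2.protein = f1.protein
        """).fetchall()
        print(rows)  # [('TrpA', 'TrpB', 'Trp5')]: predicted functional link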

  15. High-throughput screening for thermoelectric sulphides by using crystal structure features as descriptors

    NASA Astrophysics Data System (ADS)

    Zhang, Ruizhi; Du, Baoli; Chen, Kan; Reece, Mike; Materials Research Institute Team

    With increasing computational power and reliable databases, high-throughput screening is playing a more and more important role in the search for new thermoelectric materials. Rather than the well-established methods based on density functional theory (DFT) calculations, we propose an alternative approach to screening for new TE materials: using crystal structure features as 'descriptors'. We show that a non-distorted transition metal sulphide polyhedral network can be a good descriptor for high power factor according to crystal field theory. Using Cu/S containing compounds as an example, 1600+ Cu/S containing entries in the Inorganic Crystal Structure Database (ICSD) were screened, and of those 84 phases were identified as promising thermoelectric materials. The screening results are validated by both electronic structure calculations and experimental results from the literature. We also fabricated some new compounds to test our screening results. Another advantage of using crystal structure features as descriptors is that we can easily establish structural relationships between the identified phases. Based on this, two material design approaches are discussed: 1) high-pressure synthesis of metastable phases; 2) in-situ two-phase composites with coherent interfaces. This work was supported by a Marie Curie International Incoming Fellowship of the European Community Human Potential Program.

  16. Integrated Primary Care Information Database (IPCI)

    Cancer.gov

    The Integrated Primary Care Information Database is a longitudinal observational database that was created specifically for pharmacoepidemiological and pharmacoeconomic studies, including data from computer-based patient records supplied voluntarily by general practitioners.

  17. Exploring Discretization Error in Simulation-Based Aerodynamic Databases

    NASA Technical Reports Server (NTRS)

    Aftosmis, Michael J.; Nemec, Marian

    2010-01-01

    This work examines the level of discretization error in simulation-based aerodynamic databases and introduces strategies for error control. Simulations are performed using a parallel, multi-level Euler solver on embedded-boundary Cartesian meshes. Discretization errors in user-selected outputs are estimated using the method of adjoint-weighted residuals, and adaptive mesh refinement is used to reduce these errors to specified tolerances. Using this framework, we examine the behavior of discretization error throughout a token database computed for a NACA 0012 airfoil consisting of 120 cases. We compare the cost and accuracy of two approaches for aerodynamic database generation. In the first approach, mesh adaptation is used to compute all cases in the database to a prescribed level of accuracy. The second approach conducts all simulations using the same computational mesh without adaptation. We quantitatively assess the error landscape and computational costs in both databases. This investigation highlights sensitivities of the database under a variety of conditions. The presence of transonic shocks or stiffness in the governing equations near the incompressible limit is shown to dramatically increase discretization error, requiring additional mesh resolution to control. Results show that such pathologies lead to error levels that vary by over a factor of 40 when using a fixed mesh throughout the database. Alternatively, controlling this sensitivity through mesh adaptation leads to mesh sizes which span two orders of magnitude. We propose strategies to minimize simulation cost in sensitive regions and discuss the role of error estimation in database quality.

  18. Standardized description of scientific evidence using the Evidence Ontology (ECO)

    PubMed Central

    Chibucos, Marcus C.; Mungall, Christopher J.; Balakrishnan, Rama; Christie, Karen R.; Huntley, Rachael P.; White, Owen; Blake, Judith A.; Lewis, Suzanna E.; Giglio, Michelle

    2014-01-01

    The Evidence Ontology (ECO) is a structured, controlled vocabulary for capturing evidence in biological research. ECO includes diverse terms for categorizing evidence that supports annotation assertions including experimental types, computational methods, author statements and curator inferences. Using ECO, annotation assertions can be distinguished according to the evidence they are based on such as those made by curators versus those automatically computed or those made via high-throughput data review versus single test experiments. Originally created for capturing evidence associated with Gene Ontology annotations, ECO is now used in other capacities by many additional annotation resources including UniProt, Mouse Genome Informatics, Saccharomyces Genome Database, PomBase, the Protein Information Resource and others. Information on the development and use of ECO can be found at http://evidenceontology.org. The ontology is freely available under Creative Commons license (CC BY-SA 3.0), and can be downloaded in both Open Biological Ontologies and Web Ontology Language formats at http://code.google.com/p/evidenceontology. Also at this site is a tracker for user submission of term requests and questions. ECO remains under active development in response to user-requested terms and in collaborations with other ontologies and database resources. Database URL: Evidence Ontology Web site: http://evidenceontology.org PMID:25052702

  19. Integration of a neuroimaging processing pipeline into a pan-canadian computing grid

    NASA Astrophysics Data System (ADS)

    Lavoie-Courchesne, S.; Rioux, P.; Chouinard-Decorte, F.; Sherif, T.; Rousseau, M.-E.; Das, S.; Adalat, R.; Doyon, J.; Craddock, C.; Margulies, D.; Chu, C.; Lyttelton, O.; Evans, A. C.; Bellec, P.

    2012-02-01

    The ethos of the neuroimaging field is quickly moving towards the open sharing of resources, including both imaging databases and processing tools. As a neuroimaging database represents a large volume of datasets and as neuroimaging processing pipelines are composed of heterogeneous, computationally intensive tools, such open sharing raises specific computational challenges. This motivates the design of novel dedicated computing infrastructures. This paper describes an interface between PSOM, a code-oriented pipeline development framework, and CBRAIN, a web-oriented platform for grid computing. This interface was used to integrate a PSOM-compliant pipeline for preprocessing of structural and functional magnetic resonance imaging into CBRAIN. We further tested the capacity of our infrastructure to handle a real large-scale project. A neuroimaging database including close to 1000 subjects was preprocessed using our interface and publicly released to help the participants of the ADHD-200 international competition. This successful experiment demonstrated that our integrated grid-computing platform is a powerful solution for high-throughput pipeline analysis in the field of neuroimaging.

  20. Phynx: an open source software solution supporting data management and web-based patient-level data review for drug safety studies in the general practice research database and other health care databases.

    PubMed

    Egbring, Marco; Kullak-Ublick, Gerd A; Russmann, Stefan

    2010-01-01

    To develop a software solution that supports management and clinical review of patient data from electronic medical records databases or claims databases for pharmacoepidemiological drug safety studies. We used open source software to build a data management system and an internet application with a Flex client on a Java application server with a MySQL database backend. The application is hosted on Amazon Elastic Compute Cloud. This solution, named Phynx, supports data management, Web-based display of electronic patient information, and interactive review of patient-level information in the individual clinical context. This system was applied to a dataset from the UK General Practice Research Database (GPRD). Our solution can be set up and customized with limited programming resources, and there is almost no extra cost for software. Access times are short, the displayed information is structured in chronological order and visually attractive, and selected information such as drug exposure can be blinded. External experts can review patient profiles and save evaluations and comments via a common Web browser. Phynx provides a flexible and economical solution for patient-level review of electronic medical information from databases considering the individual clinical context. It can therefore make an important contribution to an efficient validation of outcome assessment in drug safety database studies.

  1. Coarse-grained modeling of RNA 3D structure.

    PubMed

    Dawson, Wayne K; Maciejczyk, Maciej; Jankowska, Elzbieta J; Bujnicki, Janusz M

    2016-07-01

    Functional RNA molecules depend on three-dimensional (3D) structures to carry out their tasks within the cell. Understanding how these molecules interact to carry out their biological roles requires a detailed knowledge of RNA 3D structure and dynamics as well as thermodynamics, which strongly governs the folding of RNA and RNA-RNA interactions as well as a host of other interactions within the cellular environment. Experimental determination of these properties is difficult, and various computational methods have been developed to model the folding of RNA 3D structures and their interactions with other molecules. However, computational methods also have their limitations, especially when the biological effects demand computation of the dynamics beyond a few hundred nanoseconds. For the researcher confronted with such challenges, a more amenable approach is to resort to coarse-grained modeling to reduce the number of data points and computational demand to a more tractable size, while sacrificing as little critical information as possible. This review presents an introduction to the topic of coarse-grained modeling of RNA 3D structures and dynamics, covering both high- and low-resolution strategies. We discuss how physics-based approaches compare with knowledge-based methods that rely on databases of information. In the course of this review, we discuss important aspects of the reasoning process behind building different models and the goals and pitfalls that can result.

  2. Reaction Mechanism Generator: Automatic construction of chemical kinetic mechanisms

    NASA Astrophysics Data System (ADS)

    Gao, Connie W.; Allen, Joshua W.; Green, William H.; West, Richard H.

    2016-06-01

    Reaction Mechanism Generator (RMG) constructs kinetic models composed of elementary chemical reaction steps using a general understanding of how molecules react. Species thermochemistry is estimated through Benson group additivity and reaction rate coefficients are estimated using a database of known rate rules and reaction templates. At its core, RMG relies on two fundamental data structures: graphs and trees. Graphs are used to represent chemical structures, and trees are used to represent thermodynamic and kinetic data. Models are generated using a rate-based algorithm which excludes species from the model based on reaction fluxes. RMG can generate reaction mechanisms for species involving carbon, hydrogen, oxygen, sulfur, and nitrogen. It also has capabilities for estimating transport and solvation properties, and it automatically computes pressure-dependent rate coefficients and identifies chemically-activated reaction paths. RMG is an object-oriented program written in Python, which provides a stable, robust programming architecture for developing an extensible and modular code base with a large suite of unit tests. Computationally intensive functions are cythonized for speed improvements.
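
    The group-additivity step lends itself to a compact illustration. A toy Python sketch, assuming approximate textbook Benson group values (kcal/mol) and reducing RMG's graph-based group matching to simple counting for linear alkanes:

        # Benson group additivity: enthalpy of formation as a sum of
        # contributions from heavy-atom-centered groups. Values are
        # approximate textbook numbers; RMG matches groups on the molecular
        # graph against a tree-structured database instead of counting.
        GROUP_DHF = {
            "C-(C)(H)3": -10.2,    # primary carbon group
            "C-(C)2(H)2": -4.93,   # secondary carbon group
        }

        def dhf_linear_alkane(n_carbons):
            """Estimate the 298 K enthalpy of formation of a linear alkane."""
            if n_carbons < 2:
                raise ValueError("group values above only cover C2+")
            groups = {"C-(C)(H)3": 2, "C-(C)2(H)2": n_carbons - 2}
            return sum(GROUP_DHF[g] * n for g, n in groups.items())

        for n, name in [(2, "ethane"), (3, "propane"), (4, "n-butane")]:
            print(f"{name}: {dhf_linear_alkane(n):+.1f} kcal/mol")
        # -20.4, -25.3, -30.3: close to the experimental values of roughly
        # -20.0, -25.0, and -30.0 kcal/mol.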

  3. Selection of examples in case-based computer-aided decision systems

    PubMed Central

    Mazurowski, Maciej A.; Zurada, Jacek M.; Tourassi, Georgia D.

    2013-01-01

    Case-based computer-aided decision (CB-CAD) systems rely on a database of previously stored, known examples when classifying new, incoming queries. Such systems can be particularly useful since they do not need retraining every time a new example is deposited in the case base. The adaptive nature of case-based systems is well suited to the current trend of continuously expanding digital databases in the medical domain. To maintain efficiency, however, such systems need sophisticated strategies to effectively manage the available evidence database. In this paper, we discuss the general problem of building an evidence database by selecting the most useful examples to store while satisfying existing storage requirements. We evaluate three intelligent techniques for this purpose: genetic algorithm-based selection, greedy selection and random mutation hill climbing. These techniques are compared to a random selection strategy used as the baseline. The study is performed with a previously presented CB-CAD system applied for false positive reduction in screening mammograms. The experimental evaluation shows that when the development goal is to maximize the system’s diagnostic performance, the intelligent techniques are able to reduce the size of the evidence database to 37% of the original database by eliminating superfluous and/or detrimental examples while at the same time significantly improving the CAD system’s performance. Furthermore, if the case-base size is a main concern, the total number of examples stored in the system can be reduced to only 2–4% of the original database without a decrease in the diagnostic performance. Comparison of the techniques shows that random mutation hill climbing provides the best balance between the diagnostic performance and computational efficiency when building the evidence database of the CB-CAD system. PMID:18854606
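
    Of the compared techniques, random mutation hill climbing is simple enough to sketch directly. A toy Python version selecting a fixed-size case subset; the fitness function is a fabricated stand-in for the CAD system's diagnostic performance:

        import random

        def rmhc_select(n_cases, subset_size, fitness, iters=5000, seed=0):
            """Random mutation hill climbing over fixed-size subsets."""
            rng = random.Random(seed)
            subset = set(rng.sample(range(n_cases), subset_size))
            best = fitness(subset)
            for _ in range(iters):
                out = rng.choice(sorted(subset))        # mutate: swap one case
                cand = rng.choice([i for i in range(n_cases) if i not in subset])
                trial = (subset - {out}) | {cand}
                f = fitness(trial)
                if f >= best:                           # keep if no worse
                    subset, best = trial, f
            return subset, best

        # Toy fitness: each case has a hidden usefulness; near-duplicate
        # pairs contribute only once, so redundancy is penalized.
        rng = random.Random(1)
        useful = [rng.random() for _ in range(200)]
        dup_of = {i: i - 1 for i in range(1, 200, 7)}   # every 7th is a duplicate

        def fitness(subset):
            seen = {dup_of.get(i, i) for i in subset}
            return sum(useful[i] for i in seen)

        sel, score = rmhc_select(200, 20, fitness)
        print(f"selected {len(sel)} cases, score {score:.2f}")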

  4. GPU-based cloud service for Smith-Waterman algorithm using frequency distance filtration scheme.

    PubMed

    Lee, Sheng-Ta; Lin, Chun-Yuan; Hung, Che Lun

    2013-01-01

    As the conventional means of analyzing the similarity between a query sequence and database sequences, the Smith-Waterman algorithm is feasible for a database search owing to its high sensitivity. However, this algorithm is still quite time consuming. CUDA programming can improve computational efficiency by using the power of massive computing hardware such as graphics processing units (GPUs). This work presents a novel Smith-Waterman algorithm with a frequency-based filtration method on GPUs, which avoids expending computational resources on unnecessary comparisons rather than merely accelerating them. A user-friendly interface is also designed for potential cloud server applications with GPUs. Additionally, two data sets, H1N1 protein sequences (query sequence set) and a human protein database (database set), are selected, followed by a comparison of CUDA-SW and CUDA-SW with the filtration method, referred to herein as CUDA-SWf. Experimental results indicate that eliminating unnecessary sequence alignments can improve the computational time by up to 41%. Importantly, by using CUDA-SWf as a cloud service, this application can be accessed from any computing environment of a device with an Internet connection, without time constraints.
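
    The filtration idea itself is independent of CUDA and can be sketched on the CPU: a cheap residue-frequency distance rules out database sequences that cannot align well, and only the survivors reach the costly Smith-Waterman step. The distance and threshold below are illustrative choices, not the paper's exact ones:

        from collections import Counter

        AMINO = "ACDEFGHIKLMNPQRSTVWY"

        def freq_vector(seq):
            c = Counter(seq)
            return [c.get(a, 0) for a in AMINO]

        def freq_distance(q, s):
            """L1 distance between residue-count vectors: a cheap proxy for
            how dissimilar two sequence compositions are."""
            return sum(abs(x - y) for x, y in zip(freq_vector(q), freq_vector(s)))

        def filtered_search(query, database, threshold, align):
            hits = []
            for name, seq in database:
                if freq_distance(query, seq) > threshold:
                    continue                  # skip the expensive alignment
                hits.append((name, align(query, seq)))
            return hits

        # Usage with a trivial stand-in for the Smith-Waterman scorer:
        db = [("seq1", "MKTAYIAKQR"), ("seq2", "GGGGGGGGGG")]
        print(filtered_search("MKTAYIAKQL", db, threshold=4,
                              align=lambda q, s: sum(a == b for a, b in zip(q, s))))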

  5. Knowledge Representation: A Brief Review.

    ERIC Educational Resources Information Center

    Vickery, B. C.

    1986-01-01

    Reviews different structures and techniques of knowledge representation: structure of database records and files, data structures in computer programming, syntactic and semantic structure of natural language, knowledge representation in artificial intelligence, and models of human memory. A prototype expert system that makes use of some of these…

  6. Accelerating Smith-Waterman Alignment for Protein Database Search Using Frequency Distance Filtration Scheme Based on CPU-GPU Collaborative System.

    PubMed

    Liu, Yu; Hong, Yang; Lin, Chun-Yuan; Hung, Che-Lun

    2015-01-01

    The Smith-Waterman (SW) algorithm has been widely utilized for searching biological sequence databases in bioinformatics. Recently, several works have adopted graphics cards with Graphics Processing Units (GPUs) and the associated CUDA model to enhance the performance of SW computations. However, these works mainly focused on the protein database search by using the intertask parallelization technique, using the GPU only to perform the SW computations one by one. Hence, in this paper, we propose an efficient SW alignment method, called CUDA-SWfr, for the protein database search that uses the intratask parallelization technique based on a CPU-GPU collaborative system. Before doing the SW computations on the GPU, a procedure is applied on the CPU using the frequency distance filtration scheme (FDFS) to eliminate unnecessary alignments. The experimental results indicate that CUDA-SWfr runs 9.6 times and 96 times faster than the CPU-based SW method without and with FDFS, respectively.

  7. Classification of Chemicals Based On Structured Toxicity ...

    EPA Pesticide Factsheets

    Thirty years and millions of dollars' worth of pesticide registration toxicity studies, historically stored as hardcopy and scanned documents, have been digitized into highly standardized and structured toxicity data within the Toxicity Reference Database (ToxRefDB). Toxicity-based classifications of chemicals were performed as a model application of ToxRefDB. These endpoints will ultimately provide the anchoring toxicity information for the development of predictive models and biological signatures utilizing in vitro assay data. Utilizing query and structured data-mining approaches, toxicity profiles were uniformly generated for more than 300 chemicals. Based on observation rate, species concordance, and regulatory relevance, individual and aggregated effects were selected to classify the chemicals, providing a set of predictable endpoints. ToxRefDB demonstrates the utility of transforming unstructured toxicity data into structured data and, furthermore, into computable outputs, and serves as a model for applying such data to address modern toxicological problems.

  8. Database Design Methodology and Database Management System for Computer-Aided Structural Design Optimization.

    DTIC Science & Technology

    1984-12-01

    Prepared for the Air Force Office of Scientific Research under Grant No. AFOSR 82-0322, December 1984.

  9. Virtual fragment preparation for computational fragment-based drug design.

    PubMed

    Ludington, Jennifer L

    2015-01-01

    Fragment-based drug design (FBDD) has become an important component of the drug discovery process. The use of fragments can accelerate both the search for a hit molecule and the development of that hit into a lead molecule for clinical testing. In addition to experimental methodologies for FBDD such as NMR and X-ray crystallography screens, computational techniques are playing an increasingly important role. The success of the computational simulations is due in large part to how the database of virtual fragments is prepared. In order to prepare the fragments appropriately, it is necessary to understand how FBDD differs from other approaches and the issues inherent in building up molecules from smaller fragment pieces. The ultimate goal of these calculations is to link two or more simulated fragments into a molecule that has an experimental binding affinity consistent with the additive predicted binding affinities of the virtual fragments. Computationally predicting binding affinities is a complex process, with many opportunities for introducing error; therefore, care should be taken with the fragment preparation procedure to avoid introducing additional inaccuracies. This chapter is focused on the preparation process used to create a virtual fragment database. Several key issues of fragment preparation that affect the accuracy of binding affinity predictions are discussed. The first issue is the selection of the two-dimensional atomic structure of the virtual fragment. Although the particular usage of the fragment can affect this choice (i.e., whether the fragment will be used for calibration, binding site characterization, hit identification, or lead optimization), general factors such as synthetic accessibility, size, and flexibility are major considerations in selecting the 2D structure. Other aspects of preparing the virtual fragments for simulation are the generation of three-dimensional conformations and the assignment of the associated atomic point charges.
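
    As one possible realization of the final two steps (the chapter does not prescribe a toolkit), here is a short sketch using the open-source RDKit package to embed several conformers of a fragment and assign Gasteiger atomic point charges:

        from rdkit import Chem
        from rdkit.Chem import AllChem

        fragment = Chem.MolFromSmiles("c1ccncc1")    # pyridine, a typical fragment
        fragment = Chem.AddHs(fragment)              # explicit hydrogens matter in 3D

        # Multiple low-energy conformers approximate fragment flexibility.
        cids = AllChem.EmbedMultipleConfs(fragment, numConfs=10, randomSeed=42)
        for cid in cids:
            AllChem.MMFFOptimizeMolecule(fragment, confId=cid)

        # Simple atomic point charges; production work might use QM-derived ones.
        AllChem.ComputeGasteigerCharges(fragment)
        for atom in fragment.GetAtoms():
            print(atom.GetSymbol(), round(atom.GetDoubleProp("_GasteigerCharge"), 3))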

  10. Discovery and study of novel protein tyrosine phosphatase 1B inhibitors

    NASA Astrophysics Data System (ADS)

    Zhang, Qian; Chen, Xi; Feng, Changgen

    2017-10-01

    Protein tyrosine phosphatase 1B (PTP1B) is considered a target for therapy of type II diabetes and obesity, so a computer-aided drug design protocol involving structure-based virtual screening with docking simulations is of great value for rapidly searching for small-molecule PTP1B inhibitors. Based on the optimized complex structure of PTP1B bound with the specific inhibitor IX1, structure-based virtual screening was conducted against a library of 35,308 natural products, constructed from the Traditional Chinese Medicine Database@Taiwan (TCM Database@Taiwan), to identify PTP1B inhibitors using the LibDock and CDOCKER modules of the Discovery Studio 3.1 software package. The results were further filtered by predictive ADME and toxicity simulations. As a result, two good drug-like molecules, the para-benzoquinone compound 1 and the Clavepictine analogue 2, were ultimately identified, using the docking score of the original inhibitor (IX1) with the receptor as a threshold. Binding-mode analyses revealed that these two candidate compounds have good interactions with PTP1B. The PTP1B inhibitory activity of compound 2 has not been reported before. The optimized compound 2 has higher scores and deserves further study.

  11. sc-PDB-Frag: a database of protein-ligand interaction patterns for Bioisosteric replacements.

    PubMed

    Desaphy, Jérémy; Rognan, Didier

    2014-07-28

    Bioisosteric replacement plays an important role in medicinal chemistry by keeping the biological activity of a molecule while changing either its core scaffold or its substituents, thereby facilitating lead optimization and patenting. Bioisosteres are classically chosen so as to keep the main pharmacophoric moieties of the substructure to be replaced. However, notably when changing a scaffold, no attention is usually paid to whether all atoms of the reference scaffold are equally important for binding to the desired target. We herewith propose a novel database for bioisosteric replacement (sc-PDB-Frag), capitalizing on our recently published structure-based approach to scaffold hopping focusing on interaction pattern graphs. Protein-bound ligands are first fragmented and the interactions of the corresponding fragments with their protein environment computed on the fly. Using an in-house developed graph alignment tool, interaction pattern graphs can be compared, aligned, and sorted by decreasing similarity to any reference. In the herein presented sc-PDB-Frag database (http://bioinfo-pharma.u-strasbg.fr/scPDBFrag), fragments, interaction patterns, alignments, and pairwise similarity scores have been extracted from the sc-PDB database of 8077 druggable protein-ligand complexes and further stored in a relational database. We herewith present the database, its Web implementation, and procedures for identifying true bioisosteric replacements based on conserved interaction patterns.

  12. Information-theoretic CAD system in mammography: Entropy-based indexing for computational efficiency and robust performance

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tourassi, Georgia D.; Harrawood, Brian; Singh, Swatee

    2007-08-15

    We have previously presented a knowledge-based computer-assisted detection (KB-CADe) system for the detection of mammographic masses. The system is designed to compare a query mammographic region with mammographic templates of known ground truth. The templates are stored in an adaptive knowledge database. Image similarity is assessed with information-theoretic measures (e.g., mutual information) derived directly from the image histograms. A previous study suggested that the diagnostic performance of the system steadily improves as the knowledge database is initially enriched with more templates. However, as the database increases in size, an exhaustive comparison of the query case with each stored template becomes computationally burdensome. Furthermore, blind storing of new templates may result in redundancies that do not necessarily improve diagnostic performance. To address these concerns we investigated an entropy-based indexing scheme for improving the speed of analysis and for satisfying database storage restrictions without compromising the overall diagnostic performance of our KB-CADe system. The indexing scheme was evaluated on two different datasets as (i) a search mechanism to sort through the knowledge database, and (ii) a selection mechanism to build a smaller, concise knowledge database that is easier to maintain but still effective. There were two important findings in the study. First, entropy-based indexing is an effective strategy for quickly identifying the subset of templates most relevant to a given query; only this subset need be analyzed in more detail using mutual information for optimized decision making regarding the query. Second, a selective entropy-based deposit strategy may be preferable, where only high-entropy cases are maintained in the knowledge database. Overall, the proposed entropy-based indexing scheme was shown to reduce the computational cost of our KB-CADe system by 55% to 80% while maintaining the system's diagnostic performance.
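
    The indexing scheme can be approximated in a few lines of NumPy: each template's histogram entropy is precomputed offline, and at query time only templates with entropy close to the query's are scored with the costlier mutual-information measure. Bin counts and the entropy window are illustrative choices:

        import numpy as np

        BINS = 64

        def entropy(img):
            counts, _ = np.histogram(img, bins=BINS, range=(0, 1))
            p = counts / counts.sum()
            p = p[p > 0]
            return -(p * np.log2(p)).sum()

        def mutual_information(a, b):
            joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=BINS,
                                         range=[[0, 1], [0, 1]])
            pxy = joint / joint.sum()
            px, py = pxy.sum(axis=1), pxy.sum(axis=0)
            nz = pxy > 0
            return (pxy[nz] * np.log2(pxy[nz] / np.outer(px, py)[nz])).sum()

        rng = np.random.default_rng(0)
        templates = [rng.random((64, 64)) ** rng.uniform(0.5, 2) for _ in range(500)]
        index = np.array([entropy(t) for t in templates])   # built once, offline

        query = rng.random((64, 64)) ** 1.3
        near = np.flatnonzero(np.abs(index - entropy(query)) < 0.25)  # pre-filter
        scores = [(i, mutual_information(query, templates[i])) for i in near]
        print(f"scored {len(near)} of {len(templates)} templates in detail")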

  13. Drug search for leishmaniasis: a virtual screening approach by grid computing

    NASA Astrophysics Data System (ADS)

    Ochoa, Rodrigo; Watowich, Stanley J.; Flórez, Andrés; Mesa, Carol V.; Robledo, Sara M.; Muskus, Carlos

    2016-07-01

    The trypanosomatid protozoa Leishmania is endemic in 100 countries, with infections causing 2 million new cases of leishmaniasis annually. Disease symptoms can include severe skin and mucosal ulcers, fever, anemia, splenomegaly, and death. Unfortunately, therapeutics approved to treat leishmaniasis are associated with potentially severe side effects, including death. Furthermore, drug-resistant Leishmania parasites have developed in most endemic countries. To address an urgent need for new, safe and inexpensive anti-leishmanial drugs, we utilized the IBM World Community Grid to complete computer-based drug discovery screens (Drug Search for Leishmaniasis) using unique leishmanial proteins and a database of 600,000 drug-like small molecules. Protein structures from different Leishmania species were selected for molecular dynamics (MD) simulations, and a series of conformational "snapshots" were chosen from each MD trajectory to simulate the protein's flexibility. A Relaxed Complex Scheme methodology was used to screen 2000 MD conformations against the small molecule database, producing >1 billion protein-ligand structures. For each protein target, a binding spectrum was calculated to identify compounds predicted to bind with highest average affinity to all protein conformations. Significantly, four different Leishmania protein targets were predicted to strongly bind small molecules, with the strongest binding interactions predicted to occur for dihydroorotate dehydrogenase (LmDHODH; PDB:3MJY). A number of predicted tight-binding LmDHODH inhibitors were tested in vitro and potent selective inhibitors of Leishmania panamensis were identified. These promising small molecules are suitable for further development using iterative structure-based optimization and in vitro/in vivo validation assays.

  14. Drug search for leishmaniasis: a virtual screening approach by grid computing.

    PubMed

    Ochoa, Rodrigo; Watowich, Stanley J; Flórez, Andrés; Mesa, Carol V; Robledo, Sara M; Muskus, Carlos

    2016-07-01

    The trypanosomatid protozoan Leishmania is endemic in ~100 countries, with infections causing ~2 million new cases of leishmaniasis annually. Disease symptoms can include severe skin and mucosal ulcers, fever, anemia, splenomegaly, and death. Unfortunately, therapeutics approved to treat leishmaniasis are associated with potentially severe side effects, including death. Furthermore, drug-resistant Leishmania parasites have developed in most endemic countries. To address an urgent need for new, safe and inexpensive anti-leishmanial drugs, we utilized the IBM World Community Grid to complete computer-based drug discovery screens (Drug Search for Leishmaniasis) using unique leishmanial proteins and a database of 600,000 drug-like small molecules. Protein structures from different Leishmania species were selected for molecular dynamics (MD) simulations, and a series of conformational "snapshots" were chosen from each MD trajectory to simulate the protein's flexibility. A Relaxed Complex Scheme methodology was used to screen ~2000 MD conformations against the small molecule database, producing >1 billion protein-ligand structures. For each protein target, a binding spectrum was calculated to identify compounds predicted to bind with highest average affinity to all protein conformations. Significantly, four different Leishmania protein targets were predicted to strongly bind small molecules, with the strongest binding interactions predicted to occur for dihydroorotate dehydrogenase (LmDHODH; PDB:3MJY). A number of predicted tight-binding LmDHODH inhibitors were tested in vitro and potent selective inhibitors of Leishmania panamensis were identified. These promising small molecules are suitable for further development using iterative structure-based optimization and in vitro/in vivo validation assays.

  15. Computer systems and methods for the query and visualization of multidimensional databases

    DOEpatents

    Stolte, Chris; Tang, Diane L.; Hanrahan, Patrick

    2017-04-25

    A method of generating a data visualization is performed at a computer having a display, one or more processors, and memory. The memory stores one or more programs for execution by the one or more processors. The process receives user specification of a plurality of characteristics of a data visualization. The data visualization is based on data from a multidimensional database. The characteristics specify at least x-position and y-position of data marks corresponding to tuples of data retrieved from the database. The process generates a data visualization according to the specified plurality of characteristics. The data visualization has an x-axis defined based on data for one or more first fields from the database that specify x-position of the data marks and the data visualization has a y-axis defined based on data for one or more second fields from the database that specify y-position of the data marks.

  16. Interdisciplinary analysis procedures in the modeling and control of large space-based structures

    NASA Technical Reports Server (NTRS)

    Cooper, Paul A.; Stockwell, Alan E.; Kim, Zeen C.

    1987-01-01

    The paper describes a computer software system called the Integrated Multidisciplinary Analysis Tool, IMAT, that has been developed at NASA Langley Research Center. IMAT provides researchers and analysts with an efficient capability to analyze satellite control systems influenced by structural dynamics. Using a menu-driven interactive executive program, IMAT links a relational database to commercial structural and controls analysis codes. The paper describes the procedures followed to analyze a complex satellite structure and control system. The codes used to accomplish the analysis are described, and an example is provided of an application of IMAT to the analysis of a reference space station subject to a rectangular pulse loading at its docking port.

  17. SABRE: ligand/structure-based virtual screening approach using consensus molecular-shape pattern recognition.

    PubMed

    Wei, Ning-Ning; Hamza, Adel

    2014-01-27

    We present an efficient and rational ligand/structure shape-based virtual screening approach combining our previous ligand shape-based similarity SABRE (shape-approach-based routines enhanced) and the 3D shape of the receptor binding site. Our approach exploits the pharmacological preferences of a number of known active ligands to take advantage of their structural diversities and chemical similarities, using a linear combination of weighted molecular shape density. Furthermore, the algorithm generates a consensus molecular-shape pattern recognition that is used to filter and place the candidate structure into the binding pocket. The descriptor pool used to construct the consensus molecular-shape pattern consists of four-dimensional (4D) fingerprints generated from the distribution of conformer states available to a molecule and the 3D shapes of a set of active ligands computed using the SABRE software. The virtual screening efficiency of SABRE was validated using the Directory of Useful Decoys (DUD) and the filtered (WOMBAT) version of 10 DUD targets. The ligand/structure shape-based similarity SABRE algorithm outperforms several other widely used virtual screening methods that use data fusion of multiple screening tools (2D and 3D fingerprints) and demonstrates a superior early retrieval rate of active compounds (EF(0.1%) = 69.0% and EF(1%) = 98.7%) from a large ligand database (∼95,000 structures). Therefore, our similarity approach can be of particular use for identifying active compounds that are similar to reference molecules and for predicting activity against other targets (chemogenomics). An academic license of the SABRE program is available on request.
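
    The weighted-combination step can be sketched compactly. The code below is a much simplified stand-in for SABRE's consensus shape construction, assuming each active ligand's shape has already been rasterized onto a common 3D grid; the function names and the overlap score are assumptions, not the published scoring function.

        import numpy as np

        def consensus_shape(densities, weights):
            """Weighted linear combination of per-ligand shape densities."""
            w = np.asarray(weights, dtype=float)
            w = w / w.sum()
            return np.tensordot(w, np.stack(densities), axes=1)

        def shape_overlap(candidate, consensus):
            """Normalized volume overlap used to filter screening candidates."""
            return float(np.minimum(candidate, consensus).sum()
                         / np.maximum(candidate, consensus).sum())

    Scoring each candidate against one consensus density, rather than against every active individually, keeps the screen linear in the size of the candidate database.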

  18. The Development of a Korean Drug Dosing Database

    PubMed Central

    Kim, Sun Ah; Kim, Jung Hoon; Jang, Yoo Jin; Jeon, Man Ho; Hwang, Joong Un; Jeong, Young Mi; Choi, Kyung Suk; Lee, Iyn Hyang; Jeon, Jin Ok; Lee, Eun Sook; Lee, Eun Kyung; Kim, Hong Bin; Chin, Ho Jun; Ha, Ji Hye; Kim, Young Hoon

    2011-01-01

    Objectives This report describes the development process of a drug dosing database for ethical drugs approved by the Korea Food & Drug Administration (KFDA). The goal of this study was to develop a computerized system that supports physicians' prescribing decisions, particularly in regards to medication dosing. Methods The advisory committee, composed of doctors, pharmacists, and nurses from the Seoul National University Bundang Hospital, pharmacists familiar with drug databases, KFDA officials, and software developers from BIT Computer Co. Ltd., analyzed the approved KFDA drug dosing information, defined the fields and properties of the information structure, and designed a management program used to enter dosing information. The management program was developed as a web-based system that allows multiple researchers to enter drug dosing information in an organized manner. The process was refined by adding needed input fields and eliminating unnecessary ones as the dosing information was entered, resulting in an improved field structure. Results Usage and dosing information for a total of 16,994 drugs sold on the Korean market as of July 2009, excluding drugs that met the exclusion criteria (e.g., radioactive drugs, X-ray contrast media), were compiled into the database. Conclusions The drug dosing database was successfully developed, and the dosing information for new drugs can be continually maintained through the management mode. This database will be used to develop drug utilization review standards and to provide appropriate dosing information. PMID:22259729

  19. Rigid-Docking Approaches to Explore Protein-Protein Interaction Space.

    PubMed

    Matsuzaki, Yuri; Uchikoga, Nobuyuki; Ohue, Masahito; Akiyama, Yutaka

    Protein-protein interactions play core roles in living cells, especially in regulatory systems. As information on proteins has rapidly accumulated in publicly available databases, much effort has been made to obtain a better picture of protein-protein interaction networks using protein tertiary structure data. Predicting relevant interacting partners from their tertiary structure is a challenging task, and computer science methods have the potential to assist with this. Protein-protein rigid docking has been utilized in several projects. Docking-based approaches have two advantages: they can suggest binding poses of predicted binding partners, which helps in understanding the interaction mechanisms, and comparing the docking results of binders and non-binders can lead to an understanding of the specificity of protein-protein interactions from a structural viewpoint. In this review we focus on explaining current computational methods for predicting pairwise direct protein-protein interactions that form protein complexes.

  20. FPGA accelerator for protein secondary structure prediction based on the GOR algorithm

    PubMed Central

    2011-01-01

    Background Protein is an important molecule that performs a wide range of functions in biological systems. Protein folding has recently attracted much attention, since the function of a protein can generally be derived from its molecular structure. The GOR algorithm is one of the most successful computational methods and has been widely used as an efficient analysis tool to predict secondary structure from protein sequence. However, its execution time is becoming intolerable with the steep growth of protein databases. Recently, FPGA chips have emerged as a promising application accelerator for bioinformatics algorithms by exploiting fine-grained custom design. Results In this paper, we propose a complete fine-grained parallel hardware implementation on FPGA to accelerate the GOR-IV package for 2D protein structure prediction. To improve computing efficiency, we partition the parameter table into small segments and access them in parallel. We aggressively exploit data reuse schemes to minimize the need for loading data from external memory. The whole computation structure is carefully pipelined to overlap the sequence loading, computing and back-writing operations as much as possible. We implemented a complete GOR desktop system based on an FPGA chip XC5VLX330. Conclusions The experimental results show a speedup factor of more than 430x over the original GOR-IV version and a 110x speedup over an optimized multi-threaded SIMD implementation running on a PC platform with an AMD Phenom 9650 quad-core CPU for 2D protein structure prediction. Moreover, the power consumption is only about 30% of that of current general-purpose CPUs. PMID:21342582
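
    For orientation, the serial recurrence that the FPGA design parallelizes can be sketched as follows. The half-width of 8 matches the classic 17-residue GOR window; INFO is a hypothetical parameter table indexed by state, window offset, and amino acid, standing in for the real GOR-IV parameters.

        # Minimal serial GOR-style scorer: for each residue, sum information
        # values over a 17-residue window and emit the best-scoring state.
        STATES = ("H", "E", "C")   # helix, strand, coil
        HALF = 8                   # window spans offsets -8..+8

        def gor_predict(seq, INFO):
            prediction = []
            for i in range(len(seq)):
                scores = {}
                for state in STATES:
                    total = 0.0
                    for offset in range(-HALF, HALF + 1):
                        j = i + offset
                        if 0 <= j < len(seq):
                            total += INFO[state][offset][seq[j]]
                    scores[state] = total
                prediction.append(max(scores, key=scores.get))
            return "".join(prediction)

    The FPGA design partitions INFO into small segments precisely so that the inner sums over all window offsets (and all states) can proceed in parallel instead of serially as above.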

  1. IMPPAT: A curated database of Indian Medicinal Plants, Phytochemistry And Therapeutics.

    PubMed

    Mohanraj, Karthikeyan; Karthikeyan, Bagavathy Shanmugam; Vivek-Ananth, R P; Chand, R P Bharath; Aparna, S R; Mangalapandi, Pattulingam; Samal, Areejit

    2018-03-12

    Phytochemicals of medicinal plants encompass a diverse chemical space for drug discovery. India is rich with a flora of indigenous medicinal plants that have been used for centuries in traditional Indian medicine to treat human maladies. A comprehensive online database on the phytochemistry of Indian medicinal plants will enable computational approaches towards natural product based drug discovery. In this direction, we present IMPPAT, a manually curated database of 1742 Indian Medicinal Plants, 9596 Phytochemicals, And 1124 Therapeutic uses spanning 27074 plant-phytochemical associations and 11514 plant-therapeutic associations. Notably, the curation effort led to a non-redundant in silico library of 9596 phytochemicals with standard chemical identifiers and structure information. Using cheminformatic approaches, we have computed the physicochemical, ADMET (absorption, distribution, metabolism, excretion, toxicity) and drug-likeness properties of the IMPPAT phytochemicals. We show that the stereochemical complexity and shape complexity of IMPPAT phytochemicals differ from libraries of commercial compounds or diversity-oriented synthesis compounds while being similar to other libraries of natural products. Within IMPPAT, we have filtered a subset of 960 potentially druggable phytochemicals, the majority of which have no significant similarity to existing FDA-approved drugs, rendering them good candidates for prospective drugs. The IMPPAT database is openly accessible at: https://cb.imsc.res.in/imppat .
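
    A druggability filter of this kind reduces to a few comparisons once the descriptors are computed. The snippet below applies the common Lipinski rule-of-five cutoffs to precomputed descriptors; the thresholds and the records shown are illustrative and are not necessarily the exact criteria behind the 960-compound IMPPAT subset.

        def rule_of_five(d):
            """Common drug-likeness cutoffs on precomputed descriptors."""
            return (d["mol_weight"] <= 500 and d["logp"] <= 5
                    and d["h_donors"] <= 5 and d["h_acceptors"] <= 10)

        phytochemicals = [
            {"id": "PHYTO-0001", "mol_weight": 354.4, "logp": 2.1,
             "h_donors": 2, "h_acceptors": 5},
            {"id": "PHYTO-0002", "mol_weight": 612.8, "logp": 6.3,
             "h_donors": 7, "h_acceptors": 12},
        ]
        druggable = [p["id"] for p in phytochemicals if rule_of_five(p)]
        print(druggable)  # ['PHYTO-0001']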

  2. Network-based drug discovery by integrating systems biology and computational technologies

    PubMed Central

    Leung, Elaine L.; Cao, Zhi-Wei; Jiang, Zhi-Hong; Zhou, Hua

    2013-01-01

    Network-based intervention has been a trend in treating systemic diseases, but it relies on regimen optimization and valid multi-target actions of the drugs. The complex multi-component nature of medicinal herbs may serve as a valuable resource for network-based multi-target drug discovery due to its potential for synergistic treatment effects. Recently, multiple systems biology platforms have proven powerful for uncovering molecular mechanisms and connections between drugs and their targeted dynamic networks. However, methods for optimizing drug combinations remain insufficient, owing to the lack of tight integration across multiple '-omics' databases. Newly developed algorithm- or network-based computational models can tightly integrate '-omics' databases and optimize combinational regimens in drug development, encouraging the use of medicinal herbs to develop a new wave of network-based multi-target drugs. Challenges to further integration of medicinal herb databases with multiple systems biology platforms for multi-target drug optimization remain: the uncertain reliability of individual data sets, and the breadth, depth, and degree of standardization of herbal medicine. Standardizing the methodology and terminology of systems biology and herbal databases would facilitate this integration, as would enhancing publicly accessible databases and increasing the number of studies applying systems biology platforms to herbal medicine. Further integration across various '-omics' platforms and computational tools would accelerate the development of network-based drug discovery and network medicine. PMID:22877768

  3. Aero/fluids database system

    NASA Technical Reports Server (NTRS)

    Reardon, John E.; Violett, Duane L., Jr.

    1991-01-01

    The AFAS Database System was developed to provide the basic structure of a comprehensive database system for the Marshall Space Flight Center (MSFC) Structures and Dynamics Laboratory Aerophysics Division. The system is intended to handle all of the Aerophysics Division Test Facilities as well as data from other sources. The system was written for the DEC VAX family of computers in FORTRAN-77 and utilizes the VMS indexed file system and screen management routines. Various aspects of the system are covered, including a description of the user interface, lists of all code structure elements, descriptions of the file structures, a description of the security system operation, a detailed description of the data retrieval tasks, a description of the session log, and a description of the archival system.

  4. Design document for the Surface Currents Data Base (SCDB) Management System (SCDBMS), version 1.0

    NASA Technical Reports Server (NTRS)

    Krisnnamagaru, Ramesh; Cesario, Cheryl; Foster, M. S.; Das, Vishnumohan

    1994-01-01

    The Surface Currents Database Management System (SCDBMS) provides access to the Surface Currents Data Base (SCDB) which is maintained by the Naval Oceanographic Office (NAVOCEANO). The SCDBMS incorporates database technology in providing seamless access to surface current data. The SCDBMS is an interactive software application with a graphical user interface (GUI) that supports user control of SCDBMS functional capabilities. The purpose of this document is to define and describe the structural framework and logistical design of the software components/units which are integrated into the major computer software configuration item (CSCI) identified as the SCDBMS, Version 1.0. The preliminary design is based on functional specifications and requirements identified in the governing Statement of Work prepared by the Naval Oceanographic Office (NAVOCEANO) and distributed as a request for proposal by the National Aeronautics and Space Administration (NASA).

  5. Public Databases Supporting Computational Toxicology

    EPA Science Inventory

    A major goal of the emerging field of computational toxicology is the development of screening-level models that predict potential toxicity of chemicals from a combination of mechanistic in vitro assay data and chemical structure descriptors. In order to build these models, resea...

  6. Identifying essential proteins based on sub-network partition and prioritization by integrating subcellular localization information.

    PubMed

    Li, Min; Li, Wenkai; Wu, Fang-Xiang; Pan, Yi; Wang, Jianxin

    2018-06-14

    Essential proteins are important participants in various life activities and play a vital role in the survival and reproduction of living organisms. Identification of essential proteins from protein-protein interaction (PPI) networks has great significance for facilitating the study of human complex diseases, the design of drugs, and the development of bioinformatics and computational science. Studies have shown that highly connected proteins in a PPI network tend to be essential. A series of computational methods have been proposed to identify essential proteins by analyzing the topological structures of PPI networks. However, the high noise in PPI data can degrade the accuracy of essential protein prediction. Moreover, proteins must be located in the appropriate subcellular compartment to perform their functions, and only when proteins share a subcellular localization is it possible for them to interact with each other. In this paper, we propose a new network-based essential protein discovery method based on sub-network partition and prioritization that integrates subcellular localization information, named SPP. The proposed method SPP was tested on two different yeast PPI networks obtained from the DIP database and the BioGRID database. The experimental results show that SPP can effectively reduce the effect of false positives in PPI networks and predict essential proteins more accurately than other existing computational methods (DC, BC, CC, SC, EC, IC, and NC).
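
    As a toy illustration of the partition step (the actual SPP prioritization is more elaborate), the sketch below splits a PPI edge list into sub-networks by shared subcellular localization and then scores each protein by its degree within those sub-networks; all data and names are invented.

        from collections import defaultdict

        def partition_and_score(edges, localization):
            """Keep only edges whose endpoints share a compartment, then
            score each protein by its degree in the filtered sub-networks."""
            score = defaultdict(int)
            for u, v in edges:
                shared = localization.get(u, set()) & localization.get(v, set())
                if shared:                      # edge survives the partition
                    score[u] += 1
                    score[v] += 1
            return sorted(score.items(), key=lambda kv: -kv[1])

        ranking = partition_and_score(
            edges=[("YAL001C", "YBR123W"), ("YAL001C", "YCR042C")],
            localization={"YAL001C": {"nucleus"}, "YBR123W": {"nucleus"},
                          "YCR042C": {"cytoplasm"}},
        )
        print(ranking)  # [('YAL001C', 1), ('YBR123W', 1)]

    Discarding edges between proteins that never share a compartment is what suppresses many of the false positive interactions mentioned above.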

  7. Automatic segmentation of solitary pulmonary nodules based on local intensity structure analysis and 3D neighborhood features in 3D chest CT images

    NASA Astrophysics Data System (ADS)

    Chen, Bin; Kitasaka, Takayuki; Honma, Hirotoshi; Takabatake, Hirotsugu; Mori, Masaki; Natori, Hiroshi; Mori, Kensaku

    2012-03-01

    This paper presents a solitary pulmonary nodule (SPN) segmentation method based on local intensity structure analysis and neighborhood feature analysis in chest CT images. Automated segmentation of SPNs is desirable for a chest computer-aided detection/diagnosis (CAD) system since an SPN may indicate an early stage of lung cancer. Due to the similar intensities of SPNs and other chest structures such as blood vessels, many false positives (FPs) are generated by nodule detection methods. To reduce such FPs, we introduce two features that analyze the relation between each segmented nodule candidate and its neighborhood region. The proposed method utilizes a blob-like structure enhancement (BSE) filter based on Hessian analysis to augment the blob-like structures as initial nodule candidates. Then a fine segmentation is performed to segment a much more accurate region for each nodule candidate. FP reduction is mainly addressed by investigating two neighborhood features, based on the volume ratio and the eigenvector of the Hessian, that are calculated from the neighborhood region of each nodule candidate. We evaluated the proposed method using 40 chest CT images, including 20 standard-dose CT images randomly chosen from a local database and 20 low-dose CT images randomly chosen from a public database (LIDC). The experimental results revealed that the average TP rate of the proposed method was 93.6% with 12.3 FPs/case.
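
    The Hessian-based enhancement step can be prototyped directly. The sketch below is a simplified blob-likeness measure, not the paper's exact BSE filter: it computes Gaussian second derivatives per voxel, takes the Hessian eigenvalues, and keeps voxels where all three are negative (bright blob-like structures); the scale parameter sigma is an assumption.

        import numpy as np
        from scipy import ndimage

        def blob_enhance(volume, sigma=1.0):
            """Toy blob-likeness map from per-voxel Hessian eigenvalues."""
            ndim = volume.ndim                       # expects a 3D CT volume
            H = np.empty(volume.shape + (ndim, ndim))
            for i in range(ndim):
                for j in range(ndim):
                    order = [0] * ndim
                    order[i] += 1
                    order[j] += 1                    # d^2 / (dx_i dx_j)
                    H[..., i, j] = ndimage.gaussian_filter(volume, sigma,
                                                           order=order)
            eig = np.linalg.eigvalsh(H)              # ascending eigenvalues
            bright_blob = np.all(eig < 0, axis=-1)   # vessels/sheets fail this
            return np.where(bright_blob, np.abs(eig[..., -1]), 0.0)

    Tubular structures such as vessels have one near-zero eigenvalue along their axis, so requiring all three eigenvalues to be negative is what separates nodules from vessel cross-sections.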

  8. The Cambridge Structural Database

    PubMed Central

    Groom, Colin R.; Bruno, Ian J.; Lightfoot, Matthew P.; Ward, Suzanna C.

    2016-01-01

    The Cambridge Structural Database (CSD) contains a complete record of all published organic and metal–organic small-molecule crystal structures. The database has been in operation for over 50 years and continues to be the primary means of sharing structural chemistry data and knowledge across disciplines. As well as structures that are made public to support scientific articles, it includes many structures published directly as CSD Communications. All structures are processed both computationally and by expert structural chemistry editors prior to entering the database. A key component of this processing is the reliable association of the chemical identity of the structure studied with the experimental data. This important step helps ensure that data is widely discoverable and readily reusable. Content is further enriched through selective inclusion of additional experimental data. Entries are available to anyone through free CSD community web services. Linking services developed and maintained by the CCDC, combined with the use of standard identifiers, facilitate discovery from other resources. Data can also be accessed through CCDC and third party software applications and through an application programming interface. PMID:27048719

  9. The Cambridge Structural Database.

    PubMed

    Groom, Colin R; Bruno, Ian J; Lightfoot, Matthew P; Ward, Suzanna C

    2016-04-01

    The Cambridge Structural Database (CSD) contains a complete record of all published organic and metal-organic small-molecule crystal structures. The database has been in operation for over 50 years and continues to be the primary means of sharing structural chemistry data and knowledge across disciplines. As well as structures that are made public to support scientific articles, it includes many structures published directly as CSD Communications. All structures are processed both computationally and by expert structural chemistry editors prior to entering the database. A key component of this processing is the reliable association of the chemical identity of the structure studied with the experimental data. This important step helps ensure that data is widely discoverable and readily reusable. Content is further enriched through selective inclusion of additional experimental data. Entries are available to anyone through free CSD community web services. Linking services developed and maintained by the CCDC, combined with the use of standard identifiers, facilitate discovery from other resources. Data can also be accessed through CCDC and third party software applications and through an application programming interface.

  10. Domain fusion analysis by applying relational algebra to protein sequence and domain databases.

    PubMed

    Truong, Kevin; Ikura, Mitsuhiko

    2003-05-06

    Domain fusion analysis is a useful method to predict functionally linked proteins that may be involved in direct protein-protein interactions or in the same metabolic or signaling pathway. As separate domain databases like BLOCKS, PROSITE, Pfam, SMART, PRINTS-S, ProDom, TIGRFAMs, and amalgamated domain databases like InterPro continue to grow in size and quality, a computational method to perform domain fusion analysis that leverages these efforts will become increasingly powerful. This paper proposes a computational method employing relational algebra to find domain fusions in protein sequence databases. The feasibility of this method was illustrated on the SWISS-PROT+TrEMBL sequence database using domain predictions from the Pfam HMM (hidden Markov model) database. We identified 235 and 189 putative functionally linked protein partners in H. sapiens and S. cerevisiae, respectively. From the scientific literature, we were able to confirm many of these functional linkages, while the remainder offer testable experimental hypotheses. Results can be viewed at http://calcium.uhnres.utoronto.ca/pi. As the analysis can be computed quickly on any relational database that supports standard SQL (structured query language), it can be dynamically updated along with the sequence and domain databases, thereby improving the quality of predictions over time.
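
    The relational formulation is easy to mimic with an off-the-shelf SQL engine. The toy query below links two proteins in one organism whenever a single protein elsewhere carries both of their domains; the table layout, data, and organism names are invented for illustration, not the paper's schema.

        import sqlite3

        con = sqlite3.connect(":memory:")
        con.execute("CREATE TABLE pfam (protein TEXT, organism TEXT, domain TEXT)")
        con.executemany("INSERT INTO pfam VALUES (?, ?, ?)", [
            ("P1", "E. coli", "PF001"), ("P1", "E. coli", "PF002"),
            ("Q1", "S. cerevisiae", "PF001"), ("Q2", "S. cerevisiae", "PF002"),
        ])
        linked = con.execute("""
            SELECT DISTINCT a.protein, b.protein
            FROM pfam a
            JOIN pfam b  ON b.organism = a.organism AND a.protein < b.protein
            JOIN pfam f1 ON f1.domain = a.domain AND f1.organism <> a.organism
            JOIN pfam f2 ON f2.domain = b.domain AND f2.protein = f1.protein
        """).fetchall()
        print(linked)  # [('Q1', 'Q2')]: fused in P1, separate in yeast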

  11. The Effect of Relational Database Technology on Administrative Computing at Carnegie Mellon University.

    ERIC Educational Resources Information Center

    Golden, Cynthia; Eisenberger, Dorit

    1990-01-01

    Carnegie Mellon University's decision to standardize its administrative system development efforts on relational database technology and structured query language is discussed and its impact is examined in one of its larger, more widely used applications, the university information system. Advantages, new responsibilities, and challenges of the…

  12. SPIRES Tailored to a Special Library: A Mainframe Answer for a Small Online Catalog.

    ERIC Educational Resources Information Center

    Newton, Mary

    1989-01-01

    Describes the design and functions of a technical library database maintained on a mainframe computer and supported by the SPIRES database management system. The topics covered include record structures, vocabulary control, input procedures, searching features, time considerations, and cost effectiveness. (three references) (CLB)

  13. Ontology based heterogeneous materials database integration and semantic query

    NASA Astrophysics Data System (ADS)

    Zhao, Shuai; Qian, Quan

    2017-10-01

    Materials digital data, high-throughput experiments and high-throughput computations are regarded as the three key pillars of materials genome initiatives. With the fast growth of materials data, data integration and sharing have become urgent needs and a central topic of materials informatics. Due to the lack of semantic description, it is difficult to integrate data deeply at the semantic level when adopting conventional heterogeneous database integration approaches such as federated databases or data warehouses. In this paper, a semantic integration method is proposed that creates a semantic ontology by extracting the database schema semi-automatically. Other heterogeneous databases are integrated into the ontology by means of relational algebra and the rooted graph. Based on the integrated ontology, semantic queries can be issued using SPARQL. In our experiments, two well-known first-principles computation databases, OQMD and the Materials Project, are used as integration targets, demonstrating the feasibility and effectiveness of our method.

  14. Analogs of methyllycaconitine as novel noncompetitive inhibitors of nicotinic receptors: pharmacological characterization, computational modeling, and pharmacophore development.

    PubMed

    McKay, Dennis B; Chang, Cheng; González-Cestari, Tatiana F; McKay, Susan B; El-Hajj, Raed A; Bryant, Darrell L; Zhu, Michael X; Swaan, Peter W; Arason, Kristjan M; Pulipaka, Aravinda B; Orac, Crina M; Bergmeier, Stephen C

    2007-05-01

    As a novel approach to drug discovery involving neuronal nicotinic acetylcholine receptors (nAChRs), our laboratory targeted nonagonist binding sites (i.e., noncompetitive binding sites, negative allosteric binding sites) located on nAChRs. Cultured bovine adrenal cells were used as neuronal models to investigate interactions of 67 analogs of methyllycaconitine (MLA) on native alpha3beta4* nAChRs. The availability of large numbers of structurally related molecules presents a unique opportunity for the development of pharmacophore models for noncompetitive binding sites. Our MLA analogs inhibited nicotine-mediated functional activation of both native and recombinant alpha3beta4* nAChRs with a wide range of IC50 values (0.9-115 μM). These analogs had little or no inhibitory effects on agonist binding to native or recombinant nAChRs, supporting noncompetitive inhibitory activity. Based on these data, two highly predictive 3D quantitative structure-activity relationship (comparative molecular field analysis and comparative molecular similarity index analysis) models were generated. These computational models were successfully validated and provided insights into the molecular interactions of MLA analogs with nAChRs. In addition, a pharmacophore model was constructed to analyze and visualize the binding requirements to the analog binding site. The pharmacophore model was subsequently applied to search structurally diverse molecular databases to prospectively identify novel inhibitors. The rapid identification of eight molecules from database mining and our successful demonstration of in vitro inhibitory activity support the utility of these computational models as novel tools for the efficient retrieval of inhibitors. These results demonstrate the effectiveness of computational modeling and pharmacophore development, which may lead to the identification of new therapeutic drugs that target novel sites on nAChRs.

  15. A protein relational database and protein family knowledge bases to facilitate structure-based design analyses.

    PubMed

    Mobilio, Dominick; Walker, Gary; Brooijmans, Natasja; Nilakantan, Ramaswamy; Denny, R Aldrin; Dejoannis, Jason; Feyfant, Eric; Kowticwar, Rupesh K; Mankala, Jyoti; Palli, Satish; Punyamantula, Sairam; Tatipally, Maneesh; John, Reji K; Humblet, Christine

    2010-08-01

    The Protein Data Bank is the most comprehensive source of experimental macromolecular structures. It can, however, be difficult at times to locate relevant structures with the Protein Data Bank search interface. This is particularly true when searching for complexes containing specific interactions between protein and ligand atoms. Moreover, searching within a family of proteins can be tedious. For example, one cannot search for some conserved residue as residue numbers vary across structures. We describe herein three databases, Protein Relational Database, Kinase Knowledge Base, and Matrix Metalloproteinase Knowledge Base, containing protein structures from the Protein Data Bank. In Protein Relational Database, atom-atom distances between protein and ligand have been precalculated allowing for millisecond retrieval based on atom identity and distance constraints. Ring centroids, centroid-centroid and centroid-atom distances and angles have also been included permitting queries for pi-stacking interactions and other structural motifs involving rings. Other geometric features can be searched through the inclusion of residue pair and triplet distances. In Kinase Knowledge Base and Matrix Metalloproteinase Knowledge Base, the catalytic domains have been aligned into common residue numbering schemes. Thus, by searching across Protein Relational Database and Kinase Knowledge Base, one can easily retrieve structures wherein, for example, a ligand of interest is making contact with the gatekeeper residue.
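
    A sketch of the kind of query such precalculated tables enable is shown below. The miniature schema (a contacts table with atom identities, residue, and distance columns) is a guess at the spirit of the design, not the published schema, and all rows are invented.

        import sqlite3

        con = sqlite3.connect(":memory:")
        con.execute("""CREATE TABLE contacts
                       (pdb_id TEXT, ligand_atom TEXT, protein_atom TEXT,
                        residue TEXT, distance REAL)""")
        con.executemany("INSERT INTO contacts VALUES (?, ?, ?, ?, ?)", [
            ("1ABC", "N1", "OD1", "ASP86", 2.9),
            ("2XYZ", "N1", "O",   "MET793", 3.1),
            ("2XYZ", "C7", "SD",  "MET790", 4.2),
        ])
        rows = con.execute("""
            SELECT pdb_id, residue, distance FROM contacts
            WHERE ligand_atom = 'N1'            -- atom identity constraint
              AND protein_atom LIKE 'O%'        -- oxygen partners only
              AND distance BETWEEN 2.5 AND 3.5  -- H-bond range (angstroms)
            ORDER BY distance
        """).fetchall()
        print(rows)  # [('1ABC', 'ASP86', 2.9), ('2XYZ', 'MET793', 3.1)]

    Because the geometry is precomputed, a query like this is a plain indexed lookup rather than a structure-by-structure geometric calculation, which is what permits millisecond retrieval.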

  16. The Research of Computer Aided Farm Machinery Designing Method Based on Ergonomics

    NASA Astrophysics Data System (ADS)

    Gao, Xiyin; Li, Xinling; Song, Qiang; Zheng, Ying

    Along with agricultural economy development, the farm machinery product type Increases gradually, the ergonomics question is also getting more and more prominent. The widespread application of computer aided machinery design makes it possible that farm machinery design is intuitive, flexible and convenient. At present, because the developed computer aided ergonomics software has not suitable human body database, which is needed in view of farm machinery design in China, the farm machinery design have deviation in ergonomics analysis. This article puts forward that using the open database interface procedure in CATIA to establish human body database which aims at the farm machinery design, and reading the human body data to ergonomics module of CATIA can product practical application virtual body, using human posture analysis and human activity analysis module to analysis the ergonomics in farm machinery, thus computer aided farm machinery designing method based on engineering can be realized.

  17. Word aligned bitmap compression method, data structure, and apparatus

    DOEpatents

    Wu, Kesheng; Shoshani, Arie; Otoo, Ekow

    2004-12-14

    The Word-Aligned Hybrid (WAH) bitmap compression method and data structure is a relatively efficient method for searching and performing logical, counting, and pattern location operations upon large datasets. The technique is comprised of a data structure and methods that are optimized for computational efficiency by using the WAH compression method, which typically takes advantage of the target computing system's native word length. WAH is particularly apropos to infrequently varying databases, including those found in the on-line analytical processing (OLAP) industry, due to the increased computational efficiency of the WAH compressed bitmap index. Some commercial database products already include some version of a bitmap index, which could possibly be replaced by the WAH bitmap compression techniques for potentially increased operation speed, as well as increased efficiencies in constructing compressed bitmaps. Combined together, this technique may be particularly useful for real-time business intelligence. Additional WAH applications may include scientific modeling, such as climate and combustion simulations, to minimize search time for analysis and subsequent data visualization.
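
    A toy encoder conveys the core of the data structure. The sketch below assumes 32-bit words: a literal word (most significant bit 0) carries 31 raw bits, while a fill word (MSB 1, next bit the fill value, low 30 bits a count) represents a run of identical all-zero or all-one 31-bit groups. This illustrates the general WAH idea, not the patented implementation.

        def wah_encode(bits):
            """Encode a 0/1 list into 32-bit WAH words (last group zero-padded)."""
            words = []
            for i in range(0, len(bits), 31):
                group = bits[i:i + 31]
                group += [0] * (31 - len(group))          # pad the final group
                value = 0
                for b in group:
                    value = (value << 1) | b
                if value in (0, (1 << 31) - 1):           # all-0 or all-1 group
                    fill = 1 if value else 0
                    same = words and (words[-1] >> 30) == (2 | fill)
                    if same and (words[-1] & ((1 << 30) - 1)) < (1 << 30) - 1:
                        words[-1] += 1                    # extend current run
                    else:
                        words.append((1 << 31) | (fill << 30) | 1)
                    continue
                words.append(value)                       # literal word (MSB 0)
            return words

        print([hex(w) for w in wah_encode([1] * 62)])  # ['0xc0000002']

    Because runs collapse into single words, logical operations such as AND and OR can proceed word by word over the compressed form, which is the source of the speedups the patent describes.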

  18. Data-driven discovery of new Dirac semimetal materials

    NASA Astrophysics Data System (ADS)

    Yan, Qimin; Chen, Ru; Neaton, Jeffrey

    In recent years, a significant amount of materials property data from high-throughput computations based on density functional theory (DFT), together with the application of database technologies, has enabled the rise of data-driven materials discovery. In this work, we initiate the extension of the data-driven materials discovery framework to the realm of topological semimetal materials in order to accelerate the discovery of novel Dirac semimetals. We implement currently available workflows and develop new ones to data-mine the Materials Project database for novel Dirac semimetals with desirable band structures and symmetry-protected topological properties. This data-driven effort relies on the successful development of several automatic data generation and analysis tools, including a workflow for the automatic identification of topological invariants and pattern recognition techniques to find specific features in a massive number of computed band structures. Utilizing this approach, we successfully identified more than 15 novel Dirac point and Dirac nodal line systems that have not been theoretically predicted or experimentally identified. This work is supported by the Materials Project Predictive Modeling Center through the U.S. Department of Energy, Office of Basic Energy Sciences, Materials Sciences and Engineering Division, under Contract No. DE-AC02-05CH11231.

  19. Bioisostere Identification by Determining the Amino Acid Binding Preferences of Common Chemical Fragments.

    PubMed

    Sato, Tomohiro; Hashimoto, Noriaki; Honma, Teruki

    2017-12-26

    To assist in the structural optimization of hit/lead compounds during drug discovery, various computational approaches to identify potentially useful bioisosteric conversions have been reported. Here, the preference of chemical fragments to form hydrogen bonds with specific amino acid residues was used to identify potential bioisosteric conversions. We first compiled a data set of chemical fragments frequently occurring in complex structures contained in the Protein Data Bank. We then used a computational approach to determine the amino acids to which these chemical fragments most frequently hydrogen bonded. The results of the frequency analysis were used to hierarchically cluster chemical fragments according to their amino acid preferences. The Euclidean distance between the amino acid preferences of chemical fragments for hydrogen bonding was then compared to matched molecular pair (MMP) information in the ChEMBL database. To demonstrate the applicability of the approach for compound optimization, the similarity of amino acid preferences was used to identify known bioisosteric conversions of the epidermal growth factor receptor inhibitor gefitinib. The amino acid preference distance successfully detected bioisosteric fragments corresponding to the morpholine ring in gefitinib with a higher ROC score compared to those based on topological similarity of substituents and frequency of MMPs in the ChEMBL database.
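
    The distance comparison at the heart of the method is a small computation once the preference vectors exist. In the sketch below, each fragment carries a vector of hydrogen-bond frequencies over a handful of residue types (a real analysis would span all 20 amino acids); the vectors and fragment names are invented illustrations, not values from the paper.

        import numpy as np

        prefs = {
            "morpholine":      np.array([0.02, 0.11, 0.31, 0.08]),
            "tetrahydropyran": np.array([0.03, 0.10, 0.29, 0.09]),
            "phenyl":          np.array([0.00, 0.01, 0.02, 0.01]),
        }

        def bioisostere_candidates(query, prefs, top=5):
            """Rank fragments by Euclidean distance of amino acid preferences."""
            d = {f: float(np.linalg.norm(v - prefs[query]))
                 for f, v in prefs.items() if f != query}
            return sorted(d.items(), key=lambda kv: kv[1])[:top]

        print(bioisostere_candidates("morpholine", prefs))
        # tetrahydropyran ranks first: similar H-bonding profile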

  20. G-Hash: Towards Fast Kernel-based Similarity Search in Large Graph Databases.

    PubMed

    Wang, Xiaohong; Smalter, Aaron; Huan, Jun; Lushington, Gerald H

    2009-01-01

    Structured data including sets, sequences, trees and graphs pose significant challenges to fundamental aspects of data management such as efficient storage, indexing, and similarity search. With the fast accumulation of graph databases, similarity search in graph databases has emerged as an important research topic. Graph similarity search has applications in a wide range of domains including cheminformatics, bioinformatics, sensor network management, social network management, and XML documents, among others. Most of the current graph indexing methods focus on subgraph query processing, i.e. determining the set of database graphs that contains the query graph, and hence do not directly support similarity search. In data mining and machine learning, various graph kernel functions have been designed to capture the intrinsic similarity of graphs. Though successful in constructing accurate predictive and classification models for supervised learning, graph kernel functions have (i) high computational complexity and (ii) non-trivial difficulty being indexed in a graph database. Our objective is to bridge graph kernel functions and similarity search in graph databases by proposing (i) a novel kernel-based similarity measurement and (ii) an efficient indexing structure for graph data management. Our method of similarity measurement builds upon local features extracted from each node and their neighboring nodes in graphs. A hash table is utilized to support efficient storage and fast search of the extracted local features. Using the hash table, a graph kernel function is defined to capture the intrinsic similarity of graphs and to support fast similarity query processing. We have implemented our method, which we have named G-hash, and have demonstrated its utility on large chemical graph databases. Our results show that the G-hash method achieves state-of-the-art performance for k-nearest neighbor (k-NN) classification. Most importantly, the new similarity measurement and the index structure are scalable to large databases, with smaller indexing size, faster index construction time, and faster query processing time as compared to state-of-the-art indexing methods such as C-tree, gIndex, and GraphGrep.
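
    The flavor of such a kernel can be captured in a few lines. Below is a simplified stand-in for the G-hash approach (the published method uses richer local features and an explicit hash-table index): each node is described by its label plus its neighbors' sorted labels, features are counted, and graph similarity is the overlap of two feature-count tables. All names and data are illustrative.

        from collections import Counter

        def local_features(nodes, edges):
            """Hashable per-node features: (label, sorted neighbor labels)."""
            adj = {n: [] for n in nodes}
            for u, v in edges:
                adj[u].append(nodes[v])
                adj[v].append(nodes[u])
            return Counter((nodes[n], tuple(sorted(adj[n]))) for n in nodes)

        def graph_kernel(g1, g2):
            """Overlap of the two graphs' local-feature counts."""
            f1, f2 = local_features(*g1), local_features(*g2)
            return sum(min(f1[k], f2[k]) for k in f1.keys() & f2.keys())

        fragment_a = ({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)])
        fragment_b = ({0: "C", 1: "O", 2: "O"}, [(0, 1), (0, 2)])
        print(graph_kernel(fragment_a, fragment_b))  # 1 shared local feature

    Because the features are hashable, they can be stored once in a hash table and reused across queries, which is what makes the kernel indexable rather than recomputed pairwise.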

  1. Discovering H-bonding rules in crystals with inductive logic programming.

    PubMed

    Ando, Howard Y; Dehaspe, Luc; Luyten, Walter; Van Craenenbroeck, Elke; Vandecasteele, Henk; Van Meervelt, Luc

    2006-01-01

    In the domain of crystal engineering, various schemes have been proposed for the classification of hydrogen bonding (H-bonding) patterns observed in 3D crystal structures. In this study, the aim is to complement these schemes with rules that predict H-bonding in crystals from 2D structural information only. Modern computational power and the advances in inductive logic programming (ILP) can now provide computational chemistry with the opportunity for extracting structure-specific rules from large databases that can be incorporated into expert systems. ILP technology is here applied to H-bonding in crystals to develop a self-extracting expert system utilizing data in the Cambridge Structural Database of small molecule crystal structures. A clear increase in performance was observed when the ILP system DMax was allowed to refer to the local structural environment of the possible H-bond donor/acceptor pairs. This ability distinguishes ILP from more traditional approaches that build rules on the basis of global molecular properties.

  2. Definition and maintenance of a telemetry database dictionary

    NASA Technical Reports Server (NTRS)

    Knopf, William P. (Inventor)

    2007-01-01

    A telemetry dictionary database includes a component for receiving spreadsheet workbooks of telemetry data over a web-based interface from other computer devices. Another component routes the spreadsheet workbooks to a specified directory on the host processing device. A process then checks the received spreadsheet workbooks for errors, and if no errors are detected the spreadsheet workbooks are routed to another directory to await initiation of a remote database loading process. The loading process first converts the spreadsheet workbooks to comma separated value (CSV) files. Next, a network connection is established with the computer system that hosts the telemetry dictionary database, and the CSV files are ported to that system. This is followed by a remote initiation of a database loading program. Upon completion of loading, a flatfile generation program is manually initiated to generate a flatfile to be used in a mission operations environment by the core ground system.

  3. Secondary Structure Predictions for Long RNA Sequences Based on Inversion Excursions and MapReduce.

    PubMed

    Yehdego, Daniel T; Zhang, Boyu; Kodimala, Vikram K R; Johnson, Kyle L; Taufer, Michela; Leung, Ming-Ying

    2013-05-01

    Secondary structures of ribonucleic acid (RNA) molecules play important roles in many biological processes including gene expression and regulation. Experimental observations and computing limitations suggest that we can approach the secondary structure prediction problem for long RNA sequences by segmenting them into shorter chunks, predicting the secondary structures of each chunk individually using existing prediction programs, and then assembling the results to give the structure of the original sequence. The selection of cutting points is a crucial component of the segmenting step. Noting that stem-loops and pseudoknots always contain an inversion, i.e., a stretch of nucleotides followed closely by its inverse complementary sequence, we developed two cutting methods for segmenting long RNA sequences based on inversion excursions: the centered and the optimized methods. Each step of searching for inversions, chunking, and prediction can be performed in parallel. In this paper we use a MapReduce framework, i.e., Hadoop, to extensively explore meaningful inversion stem lengths and gap sizes for the segmentation and to identify correlations between chunking methods and prediction accuracy. We show that for a set of long RNA sequences in the RFAM database, whose secondary structures are known to contain pseudoknots, our approach predicts secondary structures more accurately than methods that do not segment the sequence, when the latter predictions are computationally possible. We also show that, as sequences exceed certain lengths, some programs cannot computationally predict pseudoknots while our chunking methods can. Overall, our predicted structures still retain the accuracy level of the original prediction programs when compared with known experimental secondary structures.
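
    The cut-point search is essentially a hunt for inverted repeats. The sketch below scans for a short stem whose reverse complement reappears within a bounded gap downstream; the stem length and gap bound are illustrative defaults, not the parameter ranges explored in the paper.

        # Reverse-complement table for RNA.
        COMP = str.maketrans("ACGU", "UGCA")

        def find_inversions(seq, stem=6, max_gap=30):
            """Candidate (start, end) spans of stem-loop-like inversions."""
            hits = []
            for i in range(len(seq) - stem):
                rc = seq[i:i + stem].translate(COMP)[::-1]
                j = seq.find(rc, i + stem, i + stem + max_gap)
                if j != -1:
                    hits.append((i, j + stem))
            return hits

        print(find_inversions("GGGAAACCC", stem=3, max_gap=10))  # [(0, 9)]

    Cutting between, rather than inside, such spans keeps stems and pseudoknot arms intact within a single chunk, which is why the segmentation preserves prediction accuracy.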

  4. Database for CO2 Separation Performances of MOFs Based on Computational Materials Screening.

    PubMed

    Altintas, Cigdem; Avci, Gokay; Daglar, Hilal; Nemati Vesali Azar, Ayda; Velioglu, Sadiye; Erucar, Ilknur; Keskin, Seda

    2018-05-23

    Metal-organic frameworks (MOFs) are potential adsorbents for CO2 capture. Because thousands of MOFs exist, computational studies become very useful in identifying the top performing materials for target applications in a time-effective manner. In this study, molecular simulations were performed to screen the MOF database to identify the best materials for CO2 separation from flue gas (CO2/N2) and landfill gas (CO2/CH4) under realistic operating conditions. We validated the accuracy of our computational approach by comparing the simulation results for the CO2 uptakes, CO2/N2 and CO2/CH4 selectivities of various types of MOFs with the available experimental data. Binary CO2/N2 and CO2/CH4 mixture adsorption data were then calculated for the entire MOF database. These data were then used to predict selectivity, working capacity, regenerability, and separation potential of MOFs. The top performing MOF adsorbents that can separate CO2/N2 and CO2/CH4 with high performance were identified. Molecular simulations for the adsorption of a ternary CO2/N2/CH4 mixture were performed for these top materials to provide a more realistic performance assessment of MOF adsorbents. The structure-performance analysis showed that MOFs with ΔQst0 > 30 kJ/mol, 3.8 Å < pore-limiting diameter < 5 Å, 5 Å < largest cavity diameter < 7.5 Å, 0.5 < ϕ < 0.75, surface area < 1000 m2/g, and ρ > 1 g/cm3 are the best candidates for selective separation of CO2 from flue gas and landfill gas. This information will be very useful to design novel MOFs exhibiting high CO2 separation potentials. Finally, an online, freely accessible database, https://cosmoserc.ku.edu.tr, was established, for the first time in the literature, which reports all of the computed adsorbent metrics of 3816 MOFs for CO2/N2, CO2/CH4, and CO2/N2/CH4 separations in addition to various structural properties of MOFs.
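
    The adsorbent metrics named above follow standard definitions in the MOF screening literature; the sketch below encodes them under the usual conventions (component 1 = CO2; N values are simulated uptakes in mol/kg at adsorption and desorption conditions; y values are bulk gas mole fractions). The numbers in the example are invented, not taken from the database.

        def selectivity(n1, n2, y1, y2):
            """Adsorption selectivity of component 1 over component 2."""
            return (n1 / n2) / (y1 / y2)

        def working_capacity(n_ads, n_des):
            """Uptake difference between adsorption and desorption conditions."""
            return n_ads - n_des

        def regenerability(n_ads, n_des):
            """Percent of adsorbed gas recovered on regeneration."""
            return 100.0 * (n_ads - n_des) / n_ads

        # Flue-gas example: CO2/N2 at 15:85 bulk composition.
        print(selectivity(n1=3.2, n2=0.4, y1=0.15, y2=0.85))   # ~45.3
        print(working_capacity(n_ads=3.2, n_des=0.9))          # 2.3 mol/kg
        print(regenerability(n_ads=3.2, n_des=0.9))            # ~71.9 %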

  5. Building Parts Inventory Files Using the AppleWorks Data Base Subprogram and Apple IIe or GS Computers.

    ERIC Educational Resources Information Center

    Schlenker, Richard M.

    This manual is a "how to" training device for building database files using the AppleWorks program with an Apple IIe or Apple IIGS Computer with Duodisk or two disk drives and an 80-column card. The manual provides step-by-step directions, and includes 25 figures depicting the computer screen at the various stages of the database file…

  6. Bias-Free Chemically Diverse Test Sets from Machine Learning.

    PubMed

    Swann, Ellen T; Fernandez, Michael; Coote, Michelle L; Barnard, Amanda S

    2017-08-14

    Current benchmarking methods in quantum chemistry rely on databases that are built using a chemist's intuition. It is not fully understood how diverse or representative these databases truly are. Multivariate statistical techniques like archetypal analysis and K-means clustering have previously been used to summarize large sets of nanoparticles; however, molecules are more diverse and not as easily characterized by descriptors. In this work, we compare three sets of descriptors based on the one-, two-, and three-dimensional structure of a molecule. Using data from the NIST Computational Chemistry Comparison and Benchmark Database and machine learning techniques, we demonstrate the functional relationship between these structural descriptors and the electronic energy of molecules. Archetypes and prototypes found with topological or Coulomb matrix descriptors can be used to identify smaller, statistically significant test sets that better capture the diversity of chemical space. We apply this same method to find a diverse subset of organic molecules to demonstrate how the methods can easily be reapplied to individual research projects. Finally, we use our bias-free test sets to assess the performance of density functional theory and quantum Monte Carlo methods.

  7. CUDASW++ 3.0: accelerating Smith-Waterman protein database search by coupling CPU and GPU SIMD instructions.

    PubMed

    Liu, Yongchao; Wirawan, Adrianto; Schmidt, Bertil

    2013-04-04

    The maximal sensitivity for local alignments makes the Smith-Waterman algorithm a popular choice for protein sequence database search based on pairwise alignment. However, the algorithm is compute-intensive due to a quadratic time complexity. Corresponding runtimes are further compounded by the rapid growth of sequence databases. We present CUDASW++ 3.0, a fast Smith-Waterman protein database search algorithm, which couples CPU and GPU SIMD instructions and carries out concurrent CPU and GPU computations. For the CPU computation, this algorithm employs SSE-based vector execution units as accelerators. For the GPU computation, we have investigated for the first time a GPU SIMD parallelization, which employs CUDA PTX SIMD video instructions to gain more data parallelism beyond the SIMT execution model. Moreover, sequence alignment workloads are automatically distributed over CPUs and GPUs based on their respective compute capabilities. Evaluation on the Swiss-Prot database shows that CUDASW++ 3.0 gains a performance improvement over CUDASW++ 2.0 of up to 2.9x and 3.2x, with a maximum performance of 119.0 and 185.6 GCUPS, on a single-GPU GeForce GTX 680 and a dual-GPU GeForce GTX 690 graphics card, respectively. In addition, our algorithm has demonstrated significant speedups over other top-performing tools: SWIPE and BLAST+. CUDASW++ 3.0 is written in CUDA C++ and PTX assembly languages, targeting GPUs based on the Kepler architecture. This algorithm obtains significant speedups over its predecessor, CUDASW++ 2.0, by benefiting from the use of CPU and GPU SIMD instructions as well as the concurrent execution on CPUs and GPUs. The source code and the simulated data are available at http://cudasw.sourceforge.net.
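
    For orientation, the recurrence that all of these SIMD schemes accelerate is the basic Smith-Waterman dynamic program. The reference version below uses a simple match/mismatch score and a linear gap penalty, whereas production tools such as CUDASW++ use substitution matrices and affine gap penalties.

        def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
            """Best local alignment score between sequences a and b."""
            rows, cols = len(a) + 1, len(b) + 1
            H = [[0] * cols for _ in range(rows)]
            best = 0
            for i in range(1, rows):
                for j in range(1, cols):
                    s = match if a[i - 1] == b[j - 1] else mismatch
                    H[i][j] = max(0,                      # local: floor at 0
                                  H[i - 1][j - 1] + s,    # (mis)match
                                  H[i - 1][j] + gap,      # gap in b
                                  H[i][j - 1] + gap)      # gap in a
                    best = max(best, H[i][j])
            return best

        print(smith_waterman("HEAGAWGHEE", "PAWHEAE"))

    The quadratic cell count is exactly why vectorizing the inner loop (SSE on CPUs, PTX video instructions on GPUs) pays off so dramatically.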

  8. IMAT (Integrated Multidisciplinary Analysis Tool) user's guide for the VAX/VMS computer

    NASA Technical Reports Server (NTRS)

    Meissner, Frances T. (Editor)

    1988-01-01

    The Integrated Multidisciplinary Analysis Tool (IMAT) is a computer software system for the VAX/VMS computer developed at the Langley Research Center. IMAT provides researchers and analysts with an efficient capability to analyze satellite control systems influenced by structural dynamics. Using a menu-driven executive system, IMAT leads the user through the program options. IMAT links a relational database manager to commercial and in-house structural and controls analysis codes. This paper describes the IMAT software system and how to use it.

  9. Combining computational models, semantic annotations and simulation experiments in a graph database

    PubMed Central

    Henkel, Ron; Wolkenhauer, Olaf; Waltemath, Dagmar

    2015-01-01

    Model repositories such as the BioModels Database, the CellML Model Repository or JWS Online are frequently accessed to retrieve computational models of biological systems. However, their storage concepts support only restricted types of queries and not all data inside the repositories can be retrieved. In this article we present a storage concept that meets this challenge. It is grounded in a graph database, reflects the models' structure, incorporates semantic annotations and simulation descriptions, and ultimately connects different types of model-related data. The connections between heterogeneous model-related data and bio-ontologies enable efficient search via biological facts and grant access to new model features. The introduced concept notably improves the access of computational models and associated simulations in a model repository. This has positive effects on tasks such as model search, retrieval, ranking, matching and filtering. Furthermore, our work for the first time enables CellML- and Systems Biology Markup Language-encoded models to be effectively maintained in one database. We show how these models can be linked via annotations and queried. Database URL: https://sems.uni-rostock.de/projects/masymos/ PMID:25754863

  10. Protein structure database search and evolutionary classification.

    PubMed

    Yang, Jinn-Moon; Tung, Chi-Hua

    2006-01-01

    As more protein structures become available and structural genomics efforts provide structural models in a genome-wide strategy, there is a growing need for fast and accurate methods for discovering homologous proteins and evolutionary classifications of newly determined structures. We have developed 3D-BLAST, in part, to address these issues. 3D-BLAST is as fast as BLAST and calculates the statistical significance (E-value) of an alignment to indicate the reliability of the prediction. Using this method, we first identified 23 states of the structural alphabet that represent pattern profiles of the backbone fragments and then used them to represent protein structure databases as structural alphabet sequence databases (SADB). Our method enhanced BLAST as a search method, using a new structural alphabet substitution matrix (SASM) to find the longest common substructures with high-scoring structured segment pairs from an SADB database. Using personal computers with Intel Pentium4 (2.8 GHz) processors, our method searched more than 10 000 protein structures in 1.3 s and achieved a good agreement with search results from detailed structure alignment methods. [3D-BLAST is available at http://3d-blast.life.nctu.edu.tw].

  11. Chemical Space: Big Data Challenge for Molecular Diversity.

    PubMed

    Awale, Mahendra; Visini, Ricardo; Probst, Daniel; Arús-Pous, Josep; Reymond, Jean-Louis

    2017-10-25

    Chemical space describes all possible molecules as well as multi-dimensional conceptual spaces representing the structural diversity of these molecules. Part of this chemical space is available in public databases ranging from thousands to billions of compounds. Exploiting these databases for drug discovery represents a typical big data problem limited by computational power, data storage and data access capacity. Here we review recent developments of our laboratory, including progress in the chemical universe databases (GDB) and the fragment subset FDB-17, tools for ligand-based virtual screening by nearest neighbor searches, such as our multi-fingerprint browser for the ZINC database to select purchasable screening compounds, and their application to discover potent and selective inhibitors for calcium channel TRPV6 and Aurora A kinase, the polypharmacology browser (PPB) for predicting off-target effects, and finally interactive 3D-chemical space visualization using our online tools WebDrugCS and WebMolCS. All resources described in this paper are available for public use at www.gdb.unibe.ch.

  12. Teaching Three-Dimensional Structural Chemistry Using Crystal Structure Databases. 4. Examples of Discovery-Based Learning Using the Complete Cambridge Structural Database

    ERIC Educational Resources Information Center

    Battle, Gary M.; Allen, Frank H.; Ferrence, Gregory M.

    2011-01-01

    Parts 1 and 2 of this series described the educational value of experimental three-dimensional (3D) chemical structures determined by X-ray crystallography and retrieved from the crystallographic databases. In part 1, we described the information content of the Cambridge Structural Database (CSD) and discussed a representative teaching subset of…

  13. Method of locating related items in a geometric space for data mining

    DOEpatents

    Hendrickson, B.A.

    1999-07-27

    A method for locating related items in a geometric space transforms relationships among items to geometric locations. The method locates items in the geometric space so that the distance between items corresponds to the degree of relatedness. The method facilitates communication of the structure of the relationships among the items. The method is especially beneficial for communicating databases with many items, and with non-regular relationship patterns. Examples of such databases include databases containing items such as scientific papers or patents, related by citations or keywords. A computer system adapted for practice of the present invention can include a processor, a storage subsystem, a display device, and computer software to direct the location and display of the entities. The method comprises assigning numeric values as a measure of similarity between each pairing of items. A matrix is constructed, based on the numeric values. The eigenvectors and eigenvalues of the matrix are determined. Each item is located in the geometric space at coordinates determined from the eigenvectors and eigenvalues. Proper construction of the matrix and proper determination of coordinates from eigenvectors can ensure that distance between items in the geometric space is representative of the numeric value measure of the items' similarity. 12 figs.
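
    A minimal numeric sketch of this eigenvector placement is shown below; it is a generic spectral layout under the patent's general idea, not the patented construction itself. It builds a symmetric similarity matrix, takes the two leading eigenvectors, and scales them by the square roots of the eigenvalues to obtain 2D coordinates in which similar items sit close together.

        import numpy as np

        def spectral_layout(similarity):
            """2D coordinates from the two leading eigenpairs of a
            symmetric item-item similarity matrix."""
            S = np.asarray(similarity, dtype=float)
            vals, vecs = np.linalg.eigh(S)          # ascending eigenvalues
            idx = np.argsort(vals)[::-1][:2]        # two largest
            return vecs[:, idx] * np.sqrt(np.abs(vals[idx]))

        coords = spectral_layout([[1.0, 0.8, 0.1],
                                  [0.8, 1.0, 0.2],
                                  [0.1, 0.2, 1.0]])
        print(coords)  # one (x, y) row per item; items 0 and 1 land together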

  14. Method of locating related items in a geometric space for data mining

    DOEpatents

    Hendrickson, Bruce A.

    1999-01-01

    A method for locating related items in a geometric space transforms relationships among items to geometric locations. The method locates items in the geometric space so that the distance between items corresponds to the degree of relatedness. The method facilitates communication of the structure of the relationships among the items. The method is especially beneficial for communicating databases with many items, and with non-regular relationship patterns. Examples of such databases include databases containing items such as scientific papers or patents, related by citations or keywords. A computer system adapted for practice of the present invention can include a processor, a storage subsystem, a display device, and computer software to direct the location and display of the entities. The method comprises assigning numeric values as a measure of similarity between each pairing of items. A matrix is constructed, based on the numeric values. The eigenvectors and eigenvalues of the matrix are determined. Each item is located in the geometric space at coordinates determined from the eigenvectors and eigenvalues. Proper construction of the matrix and proper determination of coordinates from eigenvectors can ensure that distance between items in the geometric space is representative of the numeric value measure of the items' similarity.
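
    The eigenvector construction at the heart of both records can be illustrated in a few lines of NumPy: a made-up similarity matrix is converted to a graph Laplacian, and its low-order eigenvectors supply 2-D coordinates. This is a standard spectral-embedding construction; the patent's exact matrix may differ.

```python
# Minimal sketch: turn pairwise similarities into geometric coordinates via
# eigenvectors of a similarity-derived matrix. Values are illustrative.
import numpy as np

# Symmetric similarity matrix for 4 items (e.g., papers related by citations).
S = np.array([[0, 3, 1, 0],
              [3, 0, 2, 0],
              [1, 2, 0, 4],
              [0, 0, 4, 0]], dtype=float)

# Graph Laplacian L = D - S; its low eigenvectors place strongly related
# items close together.
L = np.diag(S.sum(axis=1)) - S
eigvals, eigvecs = np.linalg.eigh(L)  # ascending eigenvalues

# Skip the trivial constant eigenvector; use the next two as 2-D coordinates.
coords = eigvecs[:, 1:3]
for i, (x, y) in enumerate(coords):
    print(f"item {i}: ({x:+.3f}, {y:+.3f})")
```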

  15. Heterogeneous Biomedical Database Integration Using a Hybrid Strategy: A p53 Cancer Research Database

    PubMed Central

    Bichutskiy, Vadim Y.; Colman, Richard; Brachmann, Rainer K.; Lathrop, Richard H.

    2006-01-01

    Complex problems in life science research give rise to multidisciplinary collaboration, and hence, to the need for heterogeneous database integration. The tumor suppressor p53 is mutated in close to 50% of human cancers, and a small drug-like molecule with the ability to restore native function to cancerous p53 mutants is a long-held medical goal of cancer treatment. The Cancer Research DataBase (CRDB) was designed in support of a project to find such small molecules. As a cancer informatics project, the CRDB involved small molecule data, computational docking results, functional assays, and protein structure data. As an example of the hybrid strategy for data integration, it combined the mediation and data warehousing approaches. This paper uses the CRDB to illustrate the hybrid strategy as a viable approach to heterogeneous data integration in biomedicine, and provides a design method for those considering similar systems. More efficient data sharing implies increased productivity, and, hopefully, improved chances of success in cancer research. (Code and database schemas are freely downloadable, http://www.igb.uci.edu/research/research.html.) PMID:19458771

  16. Development of damage probability matrices based on Greek earthquake damage data

    NASA Astrophysics Data System (ADS)

    Eleftheriadou, Anastasia K.; Karabinis, Athanasios I.

    2011-03-01

    A comprehensive study is presented for the empirical seismic vulnerability assessment of typical structural types, representative of the building stock of Southern Europe, based on a large set of damage statistics. The observational database was obtained from post-earthquake surveys carried out in the area struck by the September 7, 1999 Athens earthquake. After analysis of the collected observational data, a unified damage database was created comprising 180,945 damaged buildings from the near-field area of the earthquake. The damaged buildings are classified into specific structural types according to the materials, seismic codes and construction techniques of Southern Europe. The seismic demand is described in terms of both the regional macroseismic intensity and the ratio α_g/a_o, where α_g is the maximum peak ground acceleration (PGA) of the earthquake event and a_o is the PGA value that characterizes each municipality on the Greek hazard map. The relative and cumulative frequencies of the different damage states for each structural type and each intensity level are computed in terms of damage ratio. Damage probability matrices (DPMs) and vulnerability curves are obtained for specific structural types, and a comparative analysis is carried out between the produced and existing vulnerability models.
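
    A minimal sketch of the frequency computation behind a DPM, with invented survey counts standing in for the observational data: each intensity row is normalized to relative frequencies, and a reversed cumulative sum gives the probability of reaching or exceeding each damage state.

```python
# Minimal sketch of building a damage probability matrix (DPM): rows are
# intensity levels, columns are damage states; all counts are made up.
import numpy as np

damage_states = ["none", "slight", "moderate", "heavy", "collapse"]
intensities = ["VI", "VII", "VIII"]

# counts[i, j] = buildings of one structural type observed at intensity i
# with damage state j.
counts = np.array([[500, 300, 150,  40, 10],
                   [200, 250, 200,  90, 30],
                   [ 50, 120, 180, 120, 60]], dtype=float)

dpm = counts / counts.sum(axis=1, keepdims=True)       # relative frequencies
cumulative = np.cumsum(dpm[:, ::-1], axis=1)[:, ::-1]  # P(damage >= state)

for i, level in enumerate(intensities):
    row = ", ".join(f"{p:.2f}" for p in dpm[i])
    print(f"I = {level}: [{row}]")
```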

  17. Holistic computational structure screening of more than 12,000 candidates for solid lithium-ion conductor materials

    NASA Astrophysics Data System (ADS)

    Sendek, Austin D.; Yang, Qian; Cubuk, Ekin D.; Duerloo, Karel-Alexander N.; Cui, Yi; Reed, Evan J.

    We present a new type of large-scale computational screening approach for identifying promising candidate materials for solid-state electrolytes for lithium-ion batteries that is capable of screening all known lithium-containing solids. To predict the likelihood of a candidate material exhibiting high lithium ion conductivity, we leverage machine learning techniques to train an ionic conductivity classification model using logistic regression based on experimental measurements reported in the literature. This model, which is built on easily calculable atomistic descriptors, provides new insight into the structure-property relationship for superionic behavior in solids and is approximately one million times faster to evaluate than DFT-based approaches to calculating diffusion coefficients or migration barriers. We couple this model with several other technologically motivated heuristics to reduce the list of candidate materials from the more than 12,000 known lithium-containing solids to 21 structures that show promise as electrolytes, few of which have been examined experimentally. Our screening utilizes structures and electronic information contained in the Materials Project database. This work is supported by an Office of Technology Licensing Fellowship through the Stanford Graduate Fellowship Program and a seed grant from the TomKat Center for Sustainable Energy at Stanford.
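
    A minimal sketch of such a classifier, assuming synthetic descriptors and labels in place of the literature-derived training data described above; the descriptor names are hypothetical.

```python
# Minimal sketch: logistic-regression classifier over simple atomistic
# descriptors predicting fast lithium-ion conduction. Features and labels
# are synthetic stand-ins for the experimental training set.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Hypothetical descriptors: [Li-Li distance (A), anion coordination, packing fraction]
X = rng.normal(loc=[3.0, 4.0, 0.6], scale=[0.5, 1.0, 0.1], size=(200, 3))
y = (X[:, 0] < 3.0).astype(int)  # toy rule standing in for measured labels

clf = LogisticRegression().fit(X, y)

candidates = np.array([[2.6, 3.5, 0.55],
                       [3.8, 5.0, 0.70]])
print(clf.predict_proba(candidates)[:, 1])  # P(superionic) per candidate
```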

  18. Design and Implementation of an Interface Editor for the Amadeus Multi- Relational Database Front-end System

    DTIC Science & Technology

    1993-03-25

    This work applies Object-Oriented Programming (OOP) and Human-Computer Interface (HCI) design principles; knowledge gained from each topic is incorporated into the design of a form-based interface for database data.

  19. Pan Air Geometry Management System (PAGMS): A data-base management system for PAN AIR geometry data

    NASA Technical Reports Server (NTRS)

    Hall, J. F.

    1981-01-01

    A data-base management system called PAGMS was developed to facilitate the data transfer in applications computer programs that create, modify, plot or otherwise manipulate PAN AIR type geometry data in preparation for input to the PAN AIR system of computer programs. PAGMS is composed of a series of FORTRAN callable subroutines which can be accessed directly from applications programs. Currently only a NOS version of PAGMS has been developed.

  20. High-Throughput Thermodynamic Modeling and Uncertainty Quantification for ICME

    NASA Astrophysics Data System (ADS)

    Otis, Richard A.; Liu, Zi-Kui

    2017-05-01

    One foundational component of integrated computational materials engineering (ICME) and the Materials Genome Initiative is computational thermodynamics based on the calculation of phase diagrams (CALPHAD) method. The CALPHAD method pioneered by Kaufman has enabled the development of thermodynamic, atomic mobility, and molar volume databases of individual phases in the full space of temperature, composition, and sometimes pressure for technologically important multicomponent engineering materials, along with sophisticated computational tools for using the databases. In this article, we present our recent efforts in developing new computational tools for high-throughput modeling and uncertainty quantification based on high-throughput first-principles calculations and the CALPHAD method, along with their potential propagation to downstream ICME modeling and simulations.

  1. 3D Partition-Based Clustering for Supply Chain Data Management

    NASA Astrophysics Data System (ADS)

    Suhaibah, A.; Uznir, U.; Anton, F.; Mioc, D.; Rahman, A. A.

    2015-10-01

    Supply Chain Management (SCM) is the management of the flow of products and goods from their point of origin to the point of consumption. The information and datasets gathered for this application are massive and complex, owing to processes such as procurement, product development and commercialization, physical distribution, outsourcing and partnerships. For practical application, SCM datasets need to be managed and maintained to serve the three main categories of users: distributors, customers and suppliers. To manage these datasets, a data constellation structure is used to accommodate the data in a spatial database. However, this situation creates problems in the geospatial database; for example, database performance deteriorates during query operations. We believe that a more practical hierarchical tree structure is required for efficient SCM processing. Moreover, a three-dimensional approach is required for the management of SCM datasets, since they involve multi-level locations such as shop lots and residential apartments. The 3D R-Tree has been increasingly used for 3D geospatial database management due to its simplicity and extendibility; however, it suffers from serious overlaps between nodes. In this paper, we propose a partition-based clustering for the construction of a hierarchical tree structure. Several datasets were tested using the proposed method, and the percentage of overlapping nodes and the volume coverage were computed and compared with the original 3D R-Tree and other practical approaches. The experiments demonstrate that the hierarchical structure built by the proposed partition-based clustering preserves minimal overlap and coverage. Query performance was tested using 300,000 points of an SCM dataset, and the results are presented in this paper, together with an outlook on the structure for future reference.
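
    A minimal sketch of the partitioning idea, assuming k-means as the clustering step (the paper's algorithm may differ): 3-D points are partitioned, each cluster becomes a candidate tree node with a bounding box, and the pairwise overlap volume that the method seeks to minimize is measured.

```python
# Minimal sketch: partition 3-D points with k-means and treat cluster
# bounding boxes as tree nodes; data and k are illustrative.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
points = rng.uniform(0, 100, size=(1000, 3))  # stand-in SCM locations (x, y, floor)

k = 8
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(points)

def bbox(pts):
    return pts.min(axis=0), pts.max(axis=0)

def overlap_volume(b1, b2):
    # Volume of the intersection of two axis-aligned boxes (0 if disjoint).
    lo = np.maximum(b1[0], b2[0])
    hi = np.minimum(b1[1], b2[1])
    return float(np.prod(np.clip(hi - lo, 0, None)))

boxes = [bbox(points[labels == c]) for c in range(k)]
total = sum(overlap_volume(boxes[i], boxes[j])
            for i in range(k) for j in range(i + 1, k))
print(f"total pairwise overlap volume across {k} nodes: {total:.1f}")
```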

  2. High throughput profile-profile based fold recognition for the entire human proteome.

    PubMed

    McGuffin, Liam J; Smith, Richard T; Bryson, Kevin; Sørensen, Søren-Aksel; Jones, David T

    2006-06-07

    In order to maintain the most comprehensive structural annotation databases we must carry out regular updates for each proteome using the latest profile-profile fold recognition methods. The ability to carry out these updates on demand is necessary to keep pace with the regular updates of sequence and structure databases. Providing the highest quality structural models requires the most intensive profile-profile fold recognition methods running with the very latest available sequence databases and fold libraries. However, running these methods on such a regular basis for every sequenced proteome requires large amounts of processing power. In this paper we describe and benchmark the JYDE (Job Yield Distribution Environment) system, which is a meta-scheduler designed to work above cluster schedulers, such as Sun Grid Engine (SGE) or Condor. We demonstrate the ability of JYDE to distribute the load of genomic-scale fold recognition across multiple independent Grid domains. We use the most recent profile-profile version of our mGenTHREADER software in order to annotate the latest version of the Human proteome against the latest sequence and structure databases in as short a time as possible. We show that our JYDE system is able to scale to large numbers of intensive fold recognition jobs running across several independent computer clusters. Using our JYDE system we have been able to annotate 99.9% of the protein sequences within the Human proteome in less than 24 hours, by harnessing over 500 CPUs from 3 independent Grid domains. This study clearly demonstrates the feasibility of carrying out on demand high quality structural annotations for the proteomes of major eukaryotic organisms. Specifically, we have shown that it is now possible to provide complete regular updates of profile-profile based fold recognition models for entire eukaryotic proteomes, through the use of Grid middleware such as JYDE.

  3. Efficient method for high-throughput virtual screening based on flexible docking: discovery of novel acetylcholinesterase inhibitors.

    PubMed

    Mizutani, Miho Yamada; Itai, Akiko

    2004-09-23

    A method of easily finding ligands, with a variety of core structures, for a given target macromolecule would greatly contribute to the rapid identification of novel lead compounds for drug development. We have developed an efficient method for discovering ligand candidates from a number of flexible compounds included in databases, when the three-dimensional (3D) structure of the drug target is available. The method, named ADAM&EVE, makes use of our automated docking method ADAM, which has already been reported. Like ADAM, ADAM&EVE takes account of the flexibility of each molecule in databases, by exploring the conformational space fully and continuously. Database screening has been made much faster than with ADAM through the tuning of parameters, so that computational screening of several hundred thousand compounds is possible in a practical time. Promising ligand candidates can be selected according to various criteria based on the docking results and characteristics of compounds. Furthermore, we have developed a new tool, EVE-MAKE, for automatically preparing the additional compound data necessary for flexible docking calculation, prior to 3D database screening. Among several successful cases of lead discovery by ADAM&EVE, the finding of novel acetylcholinesterase (AChE) inhibitors is presented here. We performed a virtual screening of about 160 000 commercially available compounds against the X-ray crystallographic structure of AChE. Among 114 compounds that could be purchased and assayed, 35 molecules with various core structures showed inhibitory activities with IC50 values less than 100 μM. Thirteen compounds had IC50 values between 0.5 and 10 μM, and almost all their core structures are very different from those of known inhibitors. The results demonstrate the effectiveness and validity of the ADAM&EVE approach and provide a starting point for development of novel drugs to treat Alzheimer's disease.

  4. Information literacy for evidence-based practice in perianesthesia nurses: readiness for evidence-based practice.

    PubMed

    Ross, Jacqueline

    2010-04-01

    Information literacy, the recognition of information required, and the development of skills for locating, evaluating, and effectively using relevant evidence is needed for evidence-based practice (EBP). The purpose of this study was to examine perianesthesia nurses' perception of searching skills and access to evidence sources. The design was a descriptive, exploratory survey. The sample consisted of ASPAN members (n = 64) and nonmembers (n = 64). The Information Literacy for Evidence-Based Nursing Practice instrument was used. Findings were that ASPAN members read more journal articles, were more proficient with computers, and used Cumulative Index to Nursing and Allied Health Literature (CINAHL) more frequently than nonmembers. The three top barriers to use of research were: lack of understanding of organization or structure of electronic databases, lack of skills to critique and/or synthesize the literature, and difficulty in accessing research materials. In conclusion, education is needed for critiquing literature and understanding electronic databases and research articles to promote EBP in perianesthesia areas.

  5. Software for Building Models of 3D Objects via the Internet

    NASA Technical Reports Server (NTRS)

    Schramer, Tim; Jensen, Jeff

    2003-01-01

    The Virtual EDF Builder (where EDF signifies Electronic Development Fixture) is a computer program that facilitates the use of the Internet for building and displaying digital models of three-dimensional (3D) objects that ordinarily comprise assemblies of solid models created previously by use of computer-aided-design (CAD) programs. The Virtual EDF Builder resides on a Unix-based server computer. It is used in conjunction with a commercially available Web-based plug-in viewer program that runs on a client computer. The Virtual EDF Builder acts as a translator between the viewer program and a database stored on the server. The translation function includes the provision of uniform resource locator (URL) links to other Web-based computer systems and databases. The Virtual EDF builder can be used in two ways: (1) If the client computer is Unix-based, then it can assemble a model locally; the computational load is transferred from the server to the client computer. (2) Alternatively, the server can be made to build the model, in which case the server bears the computational load and the results are downloaded to the client computer or workstation upon completion.

  6. Top 10 Uses for ClarisWorks in the One-Computer Classroom.

    ERIC Educational Resources Information Center

    Robinette, Michelle

    1996-01-01

    Suggests ways to use ClarisWorks to motivate students when only one computer is accessible: (1) class database; (2) grade book; (3) classroom journal; (4) ongoing story center; (5) skill-and-draw review station; (6) monthly class magazine/newspaper; (7) research base/project planner; (8) lecture and presentation enhancement; (9) database of ideas…

  7. ACToR Chemical Structure processing using Open Source ChemInformatics Libraries (FutureToxII)

    EPA Science Inventory

    ACToR (Aggregated Computational Toxicology Resource) is a centralized database repository developed by the National Center for Computational Toxicology (NCCT) at the U.S. Environmental Protection Agency (EPA). Free and open source tools were used to compile toxicity data from ove...

  8. Predictive framework for shape-selective separations in three-dimensional zeolites and metal-organic frameworks.

    PubMed

    First, Eric L; Gounaris, Chrysanthos E; Floudas, Christodoulos A

    2013-05-07

    With the growing number of zeolites and metal-organic frameworks (MOFs) available, computational methods are needed to screen databases of structures to identify those most suitable for applications of interest. We have developed novel methods based on mathematical optimization to predict the shape selectivity of zeolites and MOFs in three dimensions by considering the energy costs of transport through possible pathways. Our approach is applied to databases of over 1800 microporous materials including zeolites, MOFs, zeolitic imidazolate frameworks, and hypothetical MOFs. New materials are identified for applications in gas separations (CO2/N2, CO2/CH4, and CO2/H2), air separation (O2/N2), and chemicals (propane/propylene, ethane/ethylene, styrene/ethylbenzene, and xylenes).

  9. Automated Gene Ontology annotation for anonymous sequence data.

    PubMed

    Hennig, Steffen; Groth, Detlef; Lehrach, Hans

    2003-07-01

    Gene Ontology (GO) is the most widely accepted attempt to construct a unified and structured vocabulary for the description of genes and their products in any organism. Annotation by GO terms is performed in most of the current genome projects, which, besides generality, has the advantage of being very convenient for computer-based classification methods. However, direct use of GO in small sequencing projects is not easy, especially for species not commonly represented in public databases. We present a software package (GOblet), which performs annotation based on GO terms for anonymous cDNA or protein sequences. It uses the species-independent GO structure and vocabulary, together with a series of protein databases collected from various sites, to perform a detailed GO annotation by sequence similarity searches. The sensitivity and the reference protein sets can be selected by the user. GOblet runs automatically and is available as a public service on our web server. The paper also addresses the reliability of automated GO annotations by using a reference set of more than 6000 human proteins. The GOblet server is accessible at http://goblet.molgen.mpg.de.

  10. Reaction Mechanism Generator: Automatic construction of chemical kinetic mechanisms

    DOE PAGES

    Gao, Connie W.; Allen, Joshua W.; Green, William H.; ...

    2016-02-24

    Reaction Mechanism Generator (RMG) constructs kinetic models composed of elementary chemical reaction steps using a general understanding of how molecules react. Species thermochemistry is estimated through Benson group additivity and reaction rate coefficients are estimated using a database of known rate rules and reaction templates. At its core, RMG relies on two fundamental data structures: graphs and trees. Graphs are used to represent chemical structures, and trees are used to represent thermodynamic and kinetic data. Models are generated using a rate-based algorithm which excludes species from the model based on reaction fluxes. RMG can generate reaction mechanisms for species involving carbon, hydrogen, oxygen, sulfur, and nitrogen. It also has capabilities for estimating transport and solvation properties, and it automatically computes pressure-dependent rate coefficients and identifies chemically-activated reaction paths. RMG is an object-oriented program written in Python, which provides a stable, robust programming architecture for developing an extensible and modular code base with a large suite of unit tests. Computationally intensive functions are cythonized for speed improvements.
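
    The two core data structures can be sketched as plain Python containers; the atom numbering, group names and rate values below are invented illustrations, not RMG's actual templates.

```python
# Minimal sketch of the two data structures named above: a graph for a
# chemical structure and a tree for hierarchical rate rules.

# Ethanol as an adjacency-list graph: atom id -> (element, bonded atom ids)
ethanol = {
    0: ("C", [1, 3, 4, 5]),
    1: ("C", [0, 2, 6, 7]),
    2: ("O", [1, 8]),
    3: ("H", [0]), 4: ("H", [0]), 5: ("H", [0]),
    6: ("H", [1]), 7: ("H", [1]), 8: ("H", [2]),
}

# Rate-rule tree: each node refines its parent; estimation walks to the most
# specific matching node and falls back to ancestors when data is missing.
tree = {
    "R_H": {"rate": 1.0e8, "children": ["C_H", "O_H"]},
    "C_H": {"rate": 5.0e8, "children": ["C_H_primary"]},
    "C_H_primary": {"rate": None, "children": []},  # no data -> use ancestor
    "O_H": {"rate": 2.0e9, "children": []},
}

def estimate(node: str) -> float:
    """Return the rate at `node`, falling back to the nearest ancestor with data."""
    parent = {c: p for p, d in tree.items() for c in d["children"]}
    while tree[node]["rate"] is None:
        node = parent[node]
    return tree[node]["rate"]

print(estimate("C_H_primary"))  # falls back to C_H -> 5.0e8
```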

  11. Protein structure determination by exhaustive search of Protein Data Bank derived databases.

    PubMed

    Stokes-Rees, Ian; Sliz, Piotr

    2010-12-14

    Parallel sequence and structure alignment tools have become ubiquitous and invaluable at all levels in the study of biological systems. We demonstrate the application and utility of this same parallel search paradigm to the process of protein structure determination, benefitting from the large and growing corpus of known structures. Such searches were previously computationally intractable. Through the method of Wide Search Molecular Replacement, developed here, they can be completed in a few hours with the aid of national-scale federated cyberinfrastructure. By dramatically expanding the range of models considered for structure determination, we show that small (less than 12% structural coverage) and low sequence identity (less than 20% identity) template structures can be identified through multidimensional template scoring metrics and used for structure determination. Many new macromolecular complexes can benefit significantly from such a technique due to the lack of known homologous protein folds or sequences. We demonstrate the effectiveness of the method by determining the structure of a full-length p97 homologue from Trichoplusia ni. Example cases with the MHC/T-cell receptor complex and the EmoB protein provide systematic estimates of minimum sequence identity, structure coverage, and structural similarity required for this method to succeed. We describe how this structure-search approach and other novel computationally intensive workflows are made tractable through integration with the US national computational cyberinfrastructure, allowing, for example, rapid processing of the entire Structural Classification of Proteins protein fragment database.

  12. Constructing a Geology Ontology Using a Relational Database

    NASA Astrophysics Data System (ADS)

    Hou, W.; Yang, L.; Yin, S.; Ye, J.; Clarke, K.

    2013-12-01

    In the geology community, the creation of a common geology ontology has become a useful means of solving problems of data integration, knowledge transformation and the interoperation of multi-source, heterogeneous and multi-scale geological data. Currently, human-computer interaction methods and relational database-based methods are the primary ontology construction methods. Some human-computer interaction methods, such as the Geo-rule based method, the ontology life cycle method and the module design method, have been proposed for applied geological ontologies. Essentially, the relational database-based method is a reverse engineering of abstracted semantic information from an existing database; the key is to construct rules for the transformation of database entities into the ontology. Relative to human-computer interaction methods, relational database-based methods can use existing resources and the stated semantic relationships among geological entities. However, two problems challenge their development and application. One is the transformation of multiple inheritances and nested relationships and their representation in an ontology. The other is that most of these methods do not measure the semantic retention of the transformation process. In this study, we focused on constructing a rule set to convert the semantics in a geological database into a geological ontology. According to the relational schema of a geological database, a conversion approach is presented to convert a geological spatial database to an OWL-based geological ontology, based on identifying semantics such as entities, relationships, inheritance relationships, nested relationships and cluster relationships. The semantic integrity of the transformation was verified using an inverse mapping process. In the geological ontology, inheritance and union operations between superclass and subclass were used to represent the nested relationships in a geochronology and the multiple inheritance relationships. Based on a Quaternary database of the downtown area of Foshan City, Guangdong Province, in Southern China, a geological ontology was constructed using the proposed method. To measure the retention of semantics in the conversion process and its results, an inverse mapping from the ontology to a relational database was tested based on the proposed conversion rules. The comparison of schemas and entities, and the reduction of tables, between the inverse database and the original database illustrated that the proposed method retains the semantic information well during the conversion process. An application for abstracting sandstone information showed that semantic relationships among concepts in the geological database were successfully reorganized in the constructed ontology. Key words: geological ontology; geological spatial database; multiple inheritance; OWL. Acknowledgement: This research is jointly funded by the Specialized Research Fund for the Doctoral Program of Higher Education of China (RFDP) (20100171120001), NSFC (41102207) and the Fundamental Research Funds for the Central Universities (12lgpy19).
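
    One such conversion rule can be sketched with rdflib: rows of a relational "is-a" table become OWL classes and subclass axioms. The table rows and namespace are illustrative assumptions, not the paper's actual rule set.

```python
# Minimal sketch of one relational-to-OWL conversion rule using rdflib.
from rdflib import Graph, Namespace, RDF, RDFS, OWL

GEO = Namespace("http://example.org/geology#")  # hypothetical ontology namespace

# Stand-in for rows of a relational table (child_unit, parent_unit).
isa_rows = [
    ("Sandstone", "SedimentaryRock"),
    ("Mudstone", "SedimentaryRock"),
    ("SedimentaryRock", "Rock"),
]

g = Graph()
g.bind("geo", GEO)
for child, parent in isa_rows:
    for name in (child, parent):
        g.add((GEO[name], RDF.type, OWL.Class))        # entity -> OWL class
    g.add((GEO[child], RDFS.subClassOf, GEO[parent]))  # inheritance -> subclass

print(g.serialize(format="turtle"))
```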

  13. Computer-Based Mapping for Curriculum Development.

    ERIC Educational Resources Information Center

    Allen, Brockenbrough S.; And Others

    This article describes the results of a three-month experiment in the use of computer-based semantic networks for curriculum development. A team of doctoral and master's degree students developed a 1200-item computer database representing a tentative "domain of competency" for a proposed MA degree in Workforce Education and Lifelong…

  14. Applications of Technology to CAS Data-Base Production.

    ERIC Educational Resources Information Center

    Weisgerber, David W.

    1984-01-01

    Reviews the economic importance of applying computer technology to Chemical Abstracts Service database production from 1973 to 1983. Database building, technological applications for editorial processing (online editing, Author Index Manufacturing System), and benefits (increased staff productivity, reduced rate of increase of cost of services,…

  15. Software for Managing Inventory of Flight Hardware

    NASA Technical Reports Server (NTRS)

    Salisbury, John; Savage, Scott; Thomas, Shirman

    2003-01-01

    The Flight Hardware Support Request System (FHSRS) is a computer program that relieves engineers at Marshall Space Flight Center (MSFC) of most of the non-engineering administrative burden of managing an inventory of flight hardware. The FHSRS can also be adapted to perform similar functions for other organizations. The FHSRS affords a combination of capabilities, including those formerly provided by three separate programs in purchasing, inventorying, and inspecting hardware. The FHSRS provides a Web-based interface with a server computer that supports a relational database of inventory; electronic routing of requests and approvals; and electronic documentation from initial request through implementation of quality criteria, acquisition, receipt, inspection, storage, and final issue of flight materials and components. The database lists both hardware acquired for current projects and residual hardware from previous projects. The increased visibility of residual flight components provided by the FHSRS has dramatically improved the re-utilization of materials in lieu of new procurements, resulting in a cost savings of over $1.7 million. The FHSRS includes subprograms for manipulating the data in the database, informing of the status of a request or an item of hardware, and searching the database on any physical or other technical characteristic of a component or material. The software structure forces normalization of the data to facilitate inquiries and searches for which users have entered mixed or inconsistent values.

  16. Compilation of the data-base of the star catalogue by ADABAS.

    NASA Astrophysics Data System (ADS)

    Ishikawa, T.

    A data-base of the FK4 Star Catalogue is compiled using the HITAC M-280H in the Computer Center of Tokyo University and a commercial data-base management system (DBMS), ADABAS. The purpose of this attempt is to examine whether ADABAS, which can be regarded as representative of the currently available DBMSs developed mainly for business and information-retrieval purposes, proves useful for handling mass numerical data like star catalogue data. It is concluded that the data-base could indeed be a convenient way of storing and utilizing the star catalogue data.

  17. Hidden relationships between metalloproteins unveiled by structural comparison of their metal sites

    NASA Astrophysics Data System (ADS)

    Valasatava, Yana; Andreini, Claudia; Rosato, Antonio

    2015-03-01

    Metalloproteins account for a substantial fraction of all proteins. They incorporate metal atoms, which are required for their structure and/or function. Here we describe a new computational protocol to systematically compare and classify metal-binding sites on the basis of their structural similarity. These sites are extracted from the MetalPDB database of minimal functional sites (MFSs) in metal-binding biological macromolecules. Structural similarity is measured by the scoring function of the available MetalS2 program. Hierarchical clustering was used to organize MFSs into clusters, for each of which a representative MFS was identified. The comparison of all representative MFSs provided a thorough structure-based classification of the sites analyzed. As examples, the application of the proposed computational protocol to all heme-binding proteins and zinc-binding proteins of known structure highlighted the existence of structural subtypes, validated known evolutionary links and shed new light on the occurrence of similar sites in systems at different evolutionary distances. The present approach thus makes available an innovative viewpoint on metalloproteins, where the functionally crucial metal sites effectively lead the discovery of structural and functional relationships in a largely protein-independent manner.
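
    The clustering step can be sketched with SciPy: pairwise similarity scores (stand-ins for MetalS2 output over MetalPDB sites) are converted to distances, clustered hierarchically, and one representative is picked per cluster. Scores, threshold and linkage choice are illustrative assumptions.

```python
# Minimal sketch of hierarchical clustering of metal sites from a pairwise
# similarity matrix; all values are made up.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

sites = ["site_A", "site_B", "site_C", "site_D"]
similarity = np.array([[1.0, 0.9, 0.2, 0.1],
                       [0.9, 1.0, 0.3, 0.2],
                       [0.2, 0.3, 1.0, 0.8],
                       [0.1, 0.2, 0.8, 1.0]])

distance = 1.0 - similarity                          # similarity -> distance
Z = linkage(squareform(distance, checks=False), method="average")
labels = fcluster(Z, t=0.5, criterion="distance")    # cut the dendrogram

for cluster in sorted(set(labels)):
    members = [s for s, l in zip(sites, labels) if l == cluster]
    print(f"cluster {cluster}: representative = {members[0]}, members = {members}")
```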

  18. IDAAPM: integrated database of ADMET and adverse effects of predictive modeling based on FDA approved drug data.

    PubMed

    Legehar, Ashenafi; Xhaard, Henri; Ghemtio, Leo

    2016-01-01

    The disposition of a pharmaceutical compound within an organism, i.e. its Absorption, Distribution, Metabolism, Excretion, Toxicity (ADMET) properties and adverse effects, critically affects late stage failure of drug candidates and has led to the withdrawal of approved drugs. Computational methods are effective approaches to reduce the number of safety issues by analyzing possible links between chemical structures and ADMET or adverse effects, but this is limited by the size, quality, and heterogeneity of the data available from individual sources. Thus, large, clean and integrated databases of approved drug data, associated with fast and efficient predictive tools are desirable early in the drug discovery process. We have built a relational database (IDAAPM) to integrate available approved drug data such as drug approval information, ADMET and adverse effects, chemical structures and molecular descriptors, targets, bioactivity and related references. The database has been coupled with a searchable web interface and modern data analytics platform (KNIME) to allow data access, data transformation, initial analysis and further predictive modeling. Data were extracted from FDA resources and supplemented from other publicly available databases. Currently, the database contains information regarding about 19,226 FDA approval applications for 31,815 products (small molecules and biologics) with their approval history, 2505 active ingredients, together with as many ADMET properties, 1629 molecular structures, 2.5 million adverse effects and 36,963 experimental drug-target bioactivity data. IDAAPM is a unique resource that, in a single relational database, provides detailed information on FDA approved drugs including their ADMET properties and adverse effects, the corresponding targets with bioactivity data, coupled with a data analytics platform. It can be used to perform basic to complex drug-target ADMET or adverse effects analysis and predictive modeling. IDAAPM is freely accessible at http://idaapm.helsinki.fi and can be exploited through a KNIME workflow connected to the database. Graphical abstract: FDA-approved drug data integration for predictive modeling.

  19. High-Level Data-Abstraction System

    NASA Technical Reports Server (NTRS)

    Fishwick, P. A.

    1986-01-01

    Communication with data-base processor flexible and efficient. High Level Data Abstraction (HILDA) system is three-layer system supporting data-abstraction features of Intel data-base processor (DBP). Purpose of HILDA establishment of flexible method of efficiently communicating with DBP. Power of HILDA lies in its extensibility with regard to syntax and semantic changes. HILDA's high-level query language readily modified. Offers powerful potential to computer sites where DBP attached to DEC VAX-series computer. HILDA system written in Pascal and FORTRAN 77 for interactive execution.

  20. Comprehensive analysis of the N-glycan biosynthetic pathway using bioinformatics to generate UniCorn: A theoretical N-glycan structure database.

    PubMed

    Akune, Yukie; Lin, Chi-Hung; Abrahams, Jodie L; Zhang, Jingyu; Packer, Nicolle H; Aoki-Kinoshita, Kiyoko F; Campbell, Matthew P

    2016-08-05

    Glycan structures attached to proteins are comprised of diverse monosaccharide sequences and linkages that are produced from precursor nucleotide-sugars by a series of glycosyltransferases. Databases of these structures are an essential resource for the interpretation of analytical data and the development of bioinformatics tools. However, with no template to predict what structures are possible, the human glycan structure databases are incomplete and rely heavily on the curation of published, experimentally determined, glycan structure data. In this work, a library of 45 human glycosyltransferases was used to generate a theoretical database of N-glycan structures comprised of 15 or fewer monosaccharide residues. Enzyme specificities were sourced from major online databases including Kyoto Encyclopedia of Genes and Genomes (KEGG) Glycan, Consortium for Functional Glycomics (CFG), Carbohydrate-Active enZymes (CAZy), GlycoGene DataBase (GGDB) and BRENDA. Based on the known activities, more than 1.1 million theoretical structures and 4.7 million synthetic reactions were generated and stored in our database, called UniCorn. Furthermore, we analyzed the differences between the predicted glycan structures in UniCorn and those contained in UniCarbKB (www.unicarbkb.org), a database which stores experimentally described glycan structures reported in the literature, and demonstrate that UniCorn can be used to aid in the assignment of ambiguous structures whilst also serving as a discovery database.
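
    The generation strategy can be sketched as a breadth-first application of enzyme rules capped at a maximum residue count; here glycans are reduced to linear strings, and the two "enzymes" are invented placeholders for the 45 curated glycosyltransferase specificities.

```python
# Minimal sketch of theoretical-structure generation as breadth-first
# application of enzyme rules; the rules and cap are illustrative.
from collections import deque

MAX_RESIDUES = 5

# Each rule: (required terminal residue, residue to append) -- a crude
# stand-in for an acceptor specificity and transferred sugar.
rules = [
    ("Man", "GlcNAc"),   # "enzyme 1": extends a terminal mannose
    ("GlcNAc", "Gal"),   # "enzyme 2": extends a terminal GlcNAc
]

seed = ("Man",)  # simplified precursor
seen = {seed}
queue = deque([seed])
while queue:
    glycan = queue.popleft()
    if len(glycan) >= MAX_RESIDUES:
        continue
    for acceptor, sugar in rules:
        if glycan[-1] == acceptor:          # enzyme recognizes the terminus
            product = glycan + (sugar,)
            if product not in seen:
                seen.add(product)
                queue.append(product)

print(f"{len(seen)} theoretical structures generated")
for g in sorted(seen):
    print("-".join(g))
```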

  1. Proteins of unknown function in the Protein Data Bank (PDB): an inventory of true uncharacterized proteins and computational tools for their analysis.

    PubMed

    Nadzirin, Nurul; Firdaus-Raih, Mohd

    2012-10-08

    Proteins of uncharacterized function form a large part of many of the currently available biological databases, and this situation exists even in the Protein Data Bank (PDB). Our analysis of recent PDB data revealed that only 42.53% of the PDB entries categorized under "unknown function" (1084 coordinate files) are true examples of proteins of unknown function at this point in time. The remaining 1465 entries carrying this annotation appear eligible for re-assessment, based on the availability of direct functional characterization experiments for the protein itself, or for homologous sequences or structures, thus enabling computational function inference.

  2. Interactive flutter analysis and parametric study for conceptual wing design

    NASA Technical Reports Server (NTRS)

    Mukhopadhyay, Vivek

    1995-01-01

    An interactive computer program was developed for wing flutter analysis in the conceptual design stage. The objective was to estimate the flutter instability boundary of a flexible cantilever wing, when well defined structural and aerodynamic data are not available, and then study the effect of change in Mach number, dynamic pressure, torsional frequency, sweep, mass ratio, aspect ratio, taper ratio, center of gravity, and pitch inertia, to guide the development of the concept. The software was developed on MathCad (trademark) platform for Macintosh, with integrated documentation, graphics, database and symbolic mathematics. The analysis method was based on nondimensional parametric plots of two primary flutter parameters, namely Regier number and Flutter number, with normalization factors based on torsional stiffness, sweep, mass ratio, aspect ratio, center of gravity location and pitch inertia radius of gyration. The plots were compiled in a Vaught Corporation report from a vast database of past experiments and wind tunnel tests. The computer program was utilized for flutter analysis of the outer wing of a Blended Wing Body concept, proposed by McDonnell Douglas Corporation. Using a set of assumed data, preliminary flutter boundary and flutter dynamic pressure variation with altitude, Mach number and torsional stiffness were determined.

  3. Representation and alignment of sung queries for music information retrieval

    NASA Astrophysics Data System (ADS)

    Adams, Norman H.; Wakefield, Gregory H.

    2005-09-01

    The pursuit of robust and rapid query-by-humming systems, which search melodic databases using sung queries, is a common theme in music information retrieval. The retrieval aspect of this database problem has received considerable attention, whereas the front-end processing of sung queries and the data structure to represent melodies has been based on musical intuition and historical momentum. The present work explores three time series representations for sung queries: a sequence of notes, a ``smooth'' pitch contour, and a sequence of pitch histograms. The performance of the three representations is compared using a collection of naturally sung queries. It is found that the most robust performance is achieved by the representation with highest dimension, the smooth pitch contour, but that this representation presents a formidable computational burden. For all three representations, it is necessary to align the query and target in order to achieve robust performance. The computational cost of the alignment is quadratic, hence it is necessary to keep the dimension small for rapid retrieval. Accordingly, iterative deepening is employed to achieve both robust performance and rapid retrieval. Finally, the conventional iterative framework is expanded to adapt the alignment constraints based on previous iterations, further expediting retrieval without degrading performance.
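
    The alignment referred to above is commonly done with dynamic time warping (DTW), whose quadratic table is exactly the computational burden discussed; the sketch below is a generic DTW on made-up pitch contours, not the paper's specific alignment.

```python
# Minimal sketch of aligning a sung-query pitch contour to a database melody
# with dynamic time warping; contours are made-up MIDI pitch values.
import numpy as np

def dtw(query: np.ndarray, target: np.ndarray) -> float:
    """Classic O(n*m) DTW distance between two 1-D pitch contours."""
    n, m = len(query), len(target)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(query[i - 1] - target[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

query = np.array([60, 60, 62, 64, 64, 65])       # sung contour (MIDI)
target = np.array([60, 62, 62, 64, 65, 65, 67])  # database melody
print(f"DTW distance: {dtw(query, target):.1f}")
```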

  4. Toward intelligent information system

    NASA Astrophysics Data System (ADS)

    Onodera, Natsuo

    "Hypertext" means a concept of a novel computer-assisted tool for storage and retrieval of text information based on human association. Structure of knowledge in our idea processing is generally complicated and networked, but traditional paper documents merely express it in essentially linear and sequential forms. However, recent advances in work-station technology have allowed us to process easily electronic documents containing non-linear structure such as references or hierarchies. This paper describes concept, history and basic organization of hypertext, and shows the outline and features of existing main hypertext systems. Particularly, use of the hypertext database is illustrated by an example of Intermedia developed by Brown University.

  5. ACToR – Aggregated Computational Toxicology Resource ...

    EPA Pesticide Factsheets

    ACToR (Aggregated Computational Toxicology Resource) is a collection of databases collated or developed by the US EPA National Center for Computational Toxicology (NCCT). More than 200 sources of publicly available data on environmental chemicals have been brought together and made searchable by chemical name and other identifiers, and by chemical structure. Data includes chemical structure, physico-chemical values, in vitro assay data and in vivo toxicology data. Chemicals include, but are not limited to, high and medium production volume industrial chemicals, pesticides (active and inert ingredients), and potential ground and drinking water contaminants.

  6. Brain CT image similarity retrieval method based on uncertain location graph.

    PubMed

    Pan, Haiwei; Li, Pengyuan; Li, Qing; Han, Qilong; Feng, Xiaoning; Gao, Linlin

    2014-03-01

    Brain computed tomography (CT) images stored in hospitals contain valuable information that should be shared to support computer-aided diagnosis systems. Finding similar brain CT images in a brain CT image database can effectively help doctors diagnose based on earlier cases. However, similarity retrieval for brain CT images requires much higher accuracy than for general images. In this paper, a new model of uncertain location graph (ULG) is presented for brain CT image modeling and similarity retrieval. According to the characteristics of brain CT images, we propose a novel method to model a brain CT image as a ULG based on image texture. Then, a scheme for ULG similarity retrieval is introduced. Furthermore, an effective index structure is applied to reduce the search time. Experimental results reveal that our method performs well on brain CT image similarity retrieval, with higher accuracy and efficiency.

  7. The semantic web and computer vision: old AI meets new AI

    NASA Astrophysics Data System (ADS)

    Mundy, J. L.; Dong, Y.; Gilliam, A.; Wagner, R.

    2018-04-01

    There has been vast progress in linking semantic information across the billions of web pages through the use of ontologies encoded in the Web Ontology Language (OWL) based on the Resource Description Framework (RDF). A prime example is Wikipedia, where the knowledge contained in its more than four million pages is encoded in an ontological database called DBPedia (http://wiki.dbpedia.org/). Web-based query tools can retrieve semantic information from DBPedia encoded in interlinked ontologies that can be accessed using natural language. This paper shows how this vast context can be used to automate the process of querying images and other geospatial data in support of reporting changes in structures and activities. Computer vision algorithms are selected and provided with context based on natural language requests for monitoring and analysis. The resulting reports provide semantically linked observations from images and 3D surface models.
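
    As a minimal illustration of this kind of ontological query (the endpoint is real; the query itself is an invented example), DBpedia can be queried with SPARQL from Python:

```python
# Minimal sketch of pulling semantic context from DBpedia with SPARQL.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
# The public endpoint predefines the dbo: and dbr: prefixes.
sparql.setQuery("""
    SELECT ?bridge WHERE {
        ?bridge a dbo:Bridge ;
                dbo:locatedInArea dbr:California .
    } LIMIT 5
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["bridge"]["value"])
```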

  8. SATPdb: a database of structurally annotated therapeutic peptides

    PubMed Central

    Singh, Sandeep; Chaudhary, Kumardeep; Dhanda, Sandeep Kumar; Bhalla, Sherry; Usmani, Salman Sadullah; Gautam, Ankur; Tuknait, Abhishek; Agrawal, Piyush; Mathur, Deepika; Raghava, Gajendra P.S.

    2016-01-01

    SATPdb (http://crdd.osdd.net/raghava/satpdb/) is a database of structurally annotated therapeutic peptides, curated from 22 public domain peptide databases/datasets including 9 of our own. The current version holds 19192 unique experimentally validated therapeutic peptide sequences having length between 2 and 50 amino acids. It covers peptides having natural, non-natural and modified residues. These peptides were systematically grouped into 10 categories based on their major function or therapeutic property like 1099 anticancer, 10585 antimicrobial, 1642 drug delivery and 1698 antihypertensive peptides. We assigned or annotated structure of these therapeutic peptides using structural databases (Protein Data Bank) and state-of-the-art structure prediction methods like I-TASSER, HHsearch and PEPstrMOD. In addition, SATPdb facilitates users in performing various tasks that include: (i) structure and sequence similarity search, (ii) peptide browsing based on their function and properties, (iii) identification of moonlighting peptides and (iv) searching of peptides having desired structure and therapeutic activities. We hope this database will be useful for researchers working in the field of peptide-based therapeutics. PMID:26527728

  9. ChemProt-2.0: visual navigation in a disease chemical biology database

    PubMed Central

    Kim Kjærulff, Sonny; Wich, Louis; Kringelum, Jens; Jacobsen, Ulrik P.; Kouskoumvekaki, Irene; Audouze, Karine; Lund, Ole; Brunak, Søren; Oprea, Tudor I.; Taboureau, Olivier

    2013-01-01

    ChemProt-2.0 (http://www.cbs.dtu.dk/services/ChemProt-2.0) is a publicly available compilation of multiple chemical–protein annotation resources integrated with disease and clinical outcome information. The database has been updated to >1.15 million compounds with 5.32 million bioactivity measurements for 15 290 proteins. Each protein is linked to quality-scored human protein–protein interaction data based on more than half a million interactions, for studying diseases and biological outcomes (diseases, pathways and GO terms) through protein complexes. In ChemProt-2.0, therapeutic effects as well as adverse drug reactions have been integrated, allowing proteins associated with clinical outcomes to be suggested. New chemical structure fingerprints were computed based on the similarity ensemble approach. Protein sequence similarity search was also integrated to evaluate the promiscuity of proteins, which can help in the prediction of off-target effects. Finally, the database was integrated into a visual interface that enables navigation of the pharmacological space for small molecules. Filtering options were included in order to facilitate and guide dynamic searches of specific queries. PMID:23185041

  10. Maintaining the Database for Information Object Analysis, Intent, Dissemination and Enhancement (IOAIDE) and the US Army Research Laboratory Campus Sensor Network (ARL CSN)

    DTIC Science & Technology

    2017-01-01

    Report ARL-TR-7921 (US Army Research Laboratory, Adelphi, MD). Keywords: server database, structured query language, information objects, instructions, maintenance, cursor on target events, unattended ground sensors. Contents cover computer and software development tool requirements and database maintenance.

  11. Computer assisted data analysis in intensive care: the ICDEV project--development of a scientific database system for intensive care (Intensive Care Data Evaluation Project).

    PubMed

    Metnitz, P G; Laback, P; Popow, C; Laback, O; Lenz, K; Hiesmayr, M

    1995-01-01

    Patient Data Management Systems (PDMS) for ICUs collect, present and store clinical data. Various intentions, such as quality control or scientific purposes, make analysis of these digitally stored data desirable. The aim of the Intensive Care Data Evaluation project (ICDEV) was to provide a database tool for the analysis of data recorded at various ICUs of the University Clinics of Vienna, General Hospital of Vienna, where two different PDMSs are used: CareVue 9000 (Hewlett Packard, Andover, USA) at two ICUs (one medical and one neonatal) and PICIS Chart+ (PICIS, Paris, France) at one cardiothoracic ICU. Clinically oriented analysis of the data collected in a PDMS at an ICU was the starting point of the development. After defining the database structure, we established a client-server based database system under Microsoft Windows NT and developed a user-friendly data querying application using Microsoft Visual C++ and Visual Basic. ICDEV was successfully installed at three different ICUs; adjustments to the different PDMS configurations were done within a few days. The database structure developed by us enables a powerful query concept, representing an 'EXPERT QUESTION COMPILER' which may help to answer almost any clinical question. Several program modules facilitate queries at the patient, group and unit level. Results from ICDEV queries are automatically transferred to Microsoft Excel for display (in the form of configurable tables and graphs) and further processing. The ICDEV concept is configurable for adjustment to different intensive care information systems and can be used to support computerized quality control. However, as long as there exists no sufficient artifact recognition or data validation software for automatically recorded patient data, the reliability of these data and their usage for computer-assisted quality control remain unclear and should be further studied.

  12. Blind test of physics-based prediction of protein structures.

    PubMed

    Shell, M Scott; Ozkan, S Banu; Voelz, Vincent; Wu, Guohong Albert; Dill, Ken A

    2009-02-01

    We report here a multiprotein blind test of a computer method to predict native protein structures based solely on an all-atom physics-based force field. We use the AMBER 96 potential function with an implicit (GB/SA) model of solvation, combined with replica-exchange molecular-dynamics simulations. Coarse conformational sampling is performed using the zipping and assembly method (ZAM), an approach that is designed to mimic the putative physical routes of protein folding. ZAM was applied to the folding of six proteins, from 76 to 112 monomers in length, in CASP7, a community-wide blind test of protein structure prediction. Because these predictions have about the same level of accuracy as typical bioinformatics methods, and do not utilize information from databases of known native structures, this work opens up the possibility of predicting the structures of membrane proteins, synthetic peptides, or other foldable polymers, for which there is little prior knowledge of native structures. This approach may also be useful for predicting physical protein folding routes, non-native conformations, and other physical properties from amino acid sequences.

  13. Blind Test of Physics-Based Prediction of Protein Structures

    PubMed Central

    Shell, M. Scott; Ozkan, S. Banu; Voelz, Vincent; Wu, Guohong Albert; Dill, Ken A.

    2009-01-01

    We report here a multiprotein blind test of a computer method to predict native protein structures based solely on an all-atom physics-based force field. We use the AMBER 96 potential function with an implicit (GB/SA) model of solvation, combined with replica-exchange molecular-dynamics simulations. Coarse conformational sampling is performed using the zipping and assembly method (ZAM), an approach that is designed to mimic the putative physical routes of protein folding. ZAM was applied to the folding of six proteins, from 76 to 112 monomers in length, in CASP7, a community-wide blind test of protein structure prediction. Because these predictions have about the same level of accuracy as typical bioinformatics methods, and do not utilize information from databases of known native structures, this work opens up the possibility of predicting the structures of membrane proteins, synthetic peptides, or other foldable polymers, for which there is little prior knowledge of native structures. This approach may also be useful for predicting physical protein folding routes, non-native conformations, and other physical properties from amino acid sequences. PMID:19186130

  14. Seeing is believing: on the use of image databases for visually exploring plant organelle dynamics.

    PubMed

    Mano, Shoji; Miwa, Tomoki; Nishikawa, Shuh-ichi; Mimura, Tetsuro; Nishimura, Mikio

    2009-12-01

    Organelle dynamics vary dramatically depending on cell type, developmental stage and environmental stimuli, so that various parameters, such as size, number and behavior, are required for the description of the dynamics of each organelle. Imaging techniques are superior to other techniques for describing organelle dynamics because these parameters are visually exhibited. Therefore, as the results can be seen immediately, investigators can more easily grasp organelle dynamics. At present, imaging techniques are emerging as fundamental tools in plant organelle research, and the development of new methodologies to visualize organelles and the improvement of analytical tools and equipment have allowed the large-scale generation of image and movie data. Accordingly, image databases that accumulate information on organelle dynamics are an increasingly indispensable part of modern plant organelle research. In addition, image databases are potentially rich data sources for computational analyses, as image and movie data reposited in the databases contain valuable and significant information, such as size, number, length and velocity. Computational analytical tools support image-based data mining, such as segmentation, quantification and statistical analyses, to extract biologically meaningful information from each database and combine them to construct models. In this review, we outline the image databases that are dedicated to plant organelle research and present their potential as resources for image-based computational analyses.

  15. Data, knowledge and method bases in chemical sciences. Part IV. Current status in databases.

    PubMed

    Braibanti, Antonio; Rao, Rupenaguntla Sambasiva; Rao, Gollapalli Nagesvara; Ramam, Veluri Anantha; Rao, Sattiraju Veera Venkata Satyanarayana

    2002-01-01

    Computer-readable databases have become an integral part of chemical research, from the planning of data acquisition to the interpretation of the information generated. The databases available today are numerical, spectral and bibliographic. Data representation by different schemes--relational, hierarchical and object-based--is demonstrated. A quality index (QI) throws light on the quality of data. The objective, prospects and impact of database activity on expert systems are discussed. The number and size of corporate databases available on international networks have grown beyond a manageable number, leading to databases about their contents. Subsets of corporate or small databases have been developed by groups of chemists. The features and role of knowledge-based or intelligent databases are described.

  16. Advances and trends in structural and solid mechanics; Proceedings of the Symposium, Washington, DC, October 4-7, 1982

    NASA Technical Reports Server (NTRS)

    Noor, A. K. (Editor); Housner, J. M.

    1983-01-01

    The mechanics of materials and material characterization are considered, taking into account micromechanics, the behavior of steel structures at elevated temperatures, and an anisotropic plasticity model for inelastic multiaxial cyclic deformation. Other topics explored are related to advances and trends in finite element technology, classical analytical techniques and their computer implementation, interactive computing and computational strategies for nonlinear problems, advances and trends in numerical analysis, database management systems and CAD/CAM, space structures and vehicle crashworthiness, beams, plates and fibrous composite structures, design-oriented analysis, artificial intelligence and optimization, contact problems, random waves, and lifetime prediction. Earthquake-resistant structures and other advanced structural applications are also discussed, giving attention to cumulative damage in steel structures subjected to earthquake ground motions, and a mixed domain analysis of nuclear containment structures using impulse functions.

  17. Efficiently computing and deriving topological relation matrices between complex regions with broad boundaries

    NASA Astrophysics Data System (ADS)

    Du, Shihong; Guo, Luo; Wang, Qiao; Qin, Qimin

    The extended 9-intersection matrix is used to formalize topological relations between uncertain regions, but it is designed to satisfy requirements at the concept level and treats complex regions with broad boundaries (CBBRs) as a whole, without considering their hierarchical structures. In contrast to simple regions with broad boundaries, CBBRs have complex hierarchical structures. Therefore, it is necessary to take the complex hierarchical structure into account and to represent the topological relations between all regions in CBBRs as a relation matrix, rather than using the extended 9-intersection matrix to determine topological relations. In this study, a tree model is first used to represent the intrinsic configuration of CBBRs hierarchically. Then, reasoning tables are presented for deriving topological relations between child, parent and sibling regions from the relations between two given regions in CBBRs. Finally, based on this reasoning, efficient methods are proposed to compute and derive the topological relation matrix. The proposed methods can be incorporated into spatial databases to facilitate geometry-oriented applications.
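
    For crisp regions, the classical ancestor of the extended 9-intersection is the DE-9IM, which the shapely library exposes directly; the minimal sketch below shows the 9-entry matrix for two overlapping polygons. Broad boundaries and the hierarchical CBBR structure discussed above are beyond what shapely models.

      from shapely.geometry import Polygon

      a = Polygon([(0, 0), (4, 0), (4, 4), (0, 4)])
      b = Polygon([(2, 2), (6, 2), (6, 6), (2, 6)])

      # relate() returns the DE-9IM string: dimensions of the pairwise
      # intersections of interior/boundary/exterior of a and b, row-major.
      print(a.relate(b))  # '212101212' for this overlapping configuration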

  18. Program for Generating Graphs and Charts

    NASA Technical Reports Server (NTRS)

    Ackerson, C. T.

    1986-01-01

    The Office Automation Pilot (OAP) Graphics Database system offers the IBM personal computer user assistance in producing a wide variety of graphs and charts, together with a convenient database system, called the chartbase, for creating and maintaining the data associated with them. Thirteen different graphics packages are available, and their graphics capabilities are accessed in a similar manner: the user chooses creation, revision, or chartbase-maintenance options from an initial menu, then enters or modifies the data displayed on a graphic chart. The OAP Graphics Database system is written in Microsoft PASCAL.

  19. The Proteome Folding Project: Proteome-scale prediction of structure and function

    PubMed Central

    Drew, Kevin; Winters, Patrick; Butterfoss, Glenn L.; Berstis, Viktors; Uplinger, Keith; Armstrong, Jonathan; Riffle, Michael; Schweighofer, Erik; Bovermann, Bill; Goodlett, David R.; Davis, Trisha N.; Shasha, Dennis; Malmström, Lars; Bonneau, Richard

    2011-01-01

    The incompleteness of proteome structure and function annotation is a critical problem for biologists and, in particular, severely limits interpretation of high-throughput and next-generation experiments. We have developed a proteome annotation pipeline based on structure prediction, where function and structure annotations are generated using an integration of sequence comparison, fold recognition, and grid-computing-enabled de novo structure prediction. We predict protein domain boundaries and three-dimensional (3D) structures for protein domains from 94 genomes (including human, Arabidopsis, rice, mouse, fly, yeast, Escherichia coli, and worm). De novo structure predictions were distributed on a grid of more than 1.5 million CPUs worldwide (World Community Grid). We generated significant numbers of new confident fold annotations (9% of domains that are otherwise unannotated in these genomes). We demonstrate that predicted structures can be combined with annotations from the Gene Ontology database to predict new and more specific molecular functions. PMID:21824995

  20. 20170917 - A Cross-platform Format to Associate NMR-extracted data (NMReDATA) to Chemical Structures (SMASHNMR)

    EPA Science Inventory

    An open initiative involving cross-disciplinary contributors of computer-assisted structure elucidation (CASE), including methodology specialists, software and database developers and the editorial board of Magnetic Resonance in Chemistry, is addressing the old problem of reporti...

  1. FDA toxicity databases and real-time data entry.

    PubMed

    Arvidson, Kirk B

    2008-11-15

    Structure-searchable electronic databases are valuable new tools that are assisting the FDA in its mission to promptly and efficiently review incoming submissions for regulatory approval of new food additives and food contact substances. The Center for Food Safety and Applied Nutrition's Office of Food Additive Safety (CFSAN/OFAS), in collaboration with Leadscope, Inc., is consolidating genetic toxicity data submitted in food additive petitions from the 1960s to the present day. The Center for Drug Evaluation and Research, Office of Pharmaceutical Science's Informatics and Computational Safety Analysis Staff (CDER/OPS/ICSAS) is separately gathering similar information from their submissions. Presently, these data are distributed in various locations such as paper files, microfiche, and non-standardized toxicology memoranda. The organization of the data into a consistent, searchable format will reduce paperwork, expedite the toxicology review process, and provide valuable information to industry that is currently available only to the FDA. Furthermore, by combining chemical structures with genetic toxicity information, biologically active moieties can be identified and used to develop quantitative structure-activity relationship (QSAR) modeling and testing guidelines. Additionally, chemicals devoid of toxicity data can be compared to known structures, allowing for improved safety review through the identification and analysis of structural analogs. Four database frameworks have been created: bacterial mutagenesis, in vitro chromosome aberration, in vitro mammalian mutagenesis, and in vivo micronucleus. Controlled vocabularies for these databases have been established. The four separate genetic toxicity databases are compiled into a single, structurally-searchable database for easy accessibility of the toxicity information. Beyond the genetic toxicity databases described here, additional databases for subchronic, chronic, and teratogenicity studies have been prepared.

  2. Extending (Q)SARs to incorporate proprietary knowledge for regulatory purposes: A case study using aromatic amine mutagenicity.

    PubMed

    Ahlberg, Ernst; Amberg, Alexander; Beilke, Lisa D; Bower, David; Cross, Kevin P; Custer, Laura; Ford, Kevin A; Van Gompel, Jacky; Harvey, James; Honma, Masamitsu; Jolly, Robert; Joossens, Elisabeth; Kemper, Raymond A; Kenyon, Michelle; Kruhlak, Naomi; Kuhnke, Lara; Leavitt, Penny; Naven, Russell; Neilan, Claire; Quigley, Donald P; Shuey, Dana; Spirkl, Hans-Peter; Stavitskaya, Lidiya; Teasdale, Andrew; White, Angela; Wichard, Joerg; Zwickl, Craig; Myatt, Glenn J

    2016-06-01

    Statistical-based and expert rule-based models built using public domain mutagenicity knowledge and data are routinely used for computational (Q)SAR assessments of pharmaceutical impurities in line with the approach recommended in the ICH M7 guideline. Knowledge from proprietary corporate mutagenicity databases could be used to increase the predictive performance for selected chemical classes as well as expand the applicability domain of these (Q)SAR models. This paper outlines a mechanism for sharing knowledge without the release of proprietary data. Primary aromatic amine mutagenicity was selected as a case study because this chemical class is often encountered in pharmaceutical impurity analysis and mutagenicity of aromatic amines is currently difficult to predict. As part of this analysis, a series of aromatic amine substructures were defined and the number of mutagenic and non-mutagenic examples for each chemical substructure calculated across a series of public and proprietary mutagenicity databases. This information was pooled across all sources to identify structural classes that activate or deactivate aromatic amine mutagenicity. This structure activity knowledge, in combination with newly released primary aromatic amine data, was incorporated into Leadscope's expert rule-based and statistical-based (Q)SAR models where increased predictive performance was demonstrated. Copyright © 2016 Elsevier Inc. All rights reserved.

  3. A dictionary based informational genome analysis

    PubMed Central

    2012-01-01

    Background In the post-genomic era several methods of computational genomics are emerging to understand how the whole information is structured within genomes. The literature of the last five years accounts for several alignment-free methods, which have arisen as alternative metrics for the dissimilarity of biological sequences. Among others, recent approaches are based on empirical frequencies of DNA k-mers in whole genomes. Results Any set of words (factors) occurring in a genome provides a genomic dictionary. About sixty genomes were analyzed by means of informational indexes based on genomic dictionaries, where a systemic view replaces local sequence analysis. A software prototype applying the methodology outlined here carried out computations on genomic data. We computed informational indexes and built genomic dictionaries of different sizes, along with frequency distributions. The software performed three main tasks: computation of informational indexes, storage of these in a database, and index analysis and visualization. The validation was done by investigating genomes of various organisms. A systematic analysis of genomic repeats of several lengths, which is of keen interest in biology (for example, to detect over-represented functional sequences, such as promoters), was discussed, and suggested a method to define synthetic genetic networks. Conclusions We introduced a methodology based on dictionaries, and an efficient motif-finding software application for comparative genomics. This approach could be extended along many investigation lines, namely exported to other contexts of computational genomics, as a basis for discriminating genomic pathologies. PMID:22985068
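
    A minimal sketch of the dictionary idea: collect the k-mers (factors) occurring in a sequence and compute one simple informational index, the empirical k-mer entropy. The index choice and the toy sequence are illustrative, not the paper's exact indexes.

      from collections import Counter
      from math import log2

      def kmer_dictionary(seq, k):
          """All k-mers of length k occurring in seq, with their counts."""
          return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

      def kmer_entropy(seq, k):
          """Empirical entropy (bits) of the k-mer frequency distribution."""
          counts = kmer_dictionary(seq, k)
          n = sum(counts.values())
          return -sum(c / n * log2(c / n) for c in counts.values())

      genome = "ATGCGATATAGCGCGATATATGCGC"
      print(len(kmer_dictionary(genome, 4)), round(kmer_entropy(genome, 4), 3))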

  4. Structator: fast index-based search for RNA sequence-structure patterns

    PubMed Central

    2011-01-01

    Background The secondary structure of RNA molecules is intimately related to their function and often more conserved than the sequence. Hence, the important task of searching databases for RNAs requires matching sequence-structure patterns. Unfortunately, current tools for this task have, in the best case, a running time that is only linear in the size of sequence databases. Furthermore, established index data structures for fast sequence matching, like suffix trees or arrays, cannot benefit from the complementarity constraints introduced by the secondary structure of RNAs. Results We present a novel method and readily applicable software for time-efficient matching of RNA sequence-structure patterns in sequence databases. Our approach is based on affix arrays, a recently introduced index data structure, preprocessed from the target database. Affix arrays support bidirectional pattern search, which is required for efficiently handling the structural constraints of the pattern. Structural patterns like stem-loops can be matched inside out, such that the loop region is matched first and then the pairing bases on the boundaries are matched consecutively. This makes it possible to exploit base-pairing information for search-space reduction and leads to an expected running time that is sublinear in the size of the sequence database. The incorporation of a new chaining approach in the search of RNA sequence-structure patterns enables the description of molecules folding into complex secondary structures with multiple ordered patterns. The chaining approach removes spurious matches from the set of intermediate results, in particular matches of patterns with little specificity. In benchmark experiments on the Rfam database, our method runs up to two orders of magnitude faster than previous methods. Conclusions The presented method's sublinear expected running time makes it well suited for RNA sequence-structure pattern matching in large sequence databases. RNA molecules containing several stem-loop substructures can be described by multiple sequence-structure patterns, and their matches are efficiently handled by a novel chaining method. Beyond our algorithmic contributions, we provide with Structator a complete and robust open-source software solution for index-based search of RNA sequence-structure patterns. The Structator software is available at http://www.zbh.uni-hamburg.de/Structator. PMID:21619640

  5. Creating the New from the Old: Combinatorial Libraries Generation with Machine-Learning-Based Compound Structure Optimization.

    PubMed

    Podlewska, Sabina; Czarnecki, Wojciech M; Kafel, Rafał; Bojarski, Andrzej J

    2017-02-27

    The growing computational abilities of various tools that are applied in the broadly understood field of computer-aided drug design have led to the extreme popularity of virtual screening in the search for new biologically active compounds. Most often, the source of such molecules consists of commercially available compound databases, but they can also be searched for within libraries of structures generated in silico from existing ligands. Various computational combinatorial approaches are based solely on the chemical structure of compounds, using different types of substitutions to form new molecules. In this study, the starting point for combinatorial library generation was the fingerprint referring to the optimal substructural composition in terms of activity toward a considered target, which was obtained using a machine-learning-based optimization procedure. The systematic enumeration of all possible connections between preferred substructures resulted in the formation of target-focused libraries of new potential ligands. The compounds were initially assessed by machine learning methods using a hashed fingerprint to represent molecules; the distribution of their physicochemical properties was also investigated, as well as their synthetic accessibility. The examination of various fingerprints and machine learning algorithms indicated that the Klekota-Roth fingerprint and support vector machine were an optimal combination for such experiments. This study was performed for 8 protein targets, and the obtained compound sets and their characterization are publicly available at http://skandal.if-pan.krakow.pl/comb_lib/ .
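
    A minimal sketch of the screening step described above, using scikit-learn. Klekota-Roth fingerprints are produced by toolkits such as CDK or PaDEL rather than scikit-learn, so random binary vectors of the Klekota-Roth length (4860 bits) and random activity labels stand in for real data here.

      import numpy as np
      from sklearn.svm import SVC
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(0)
      X = rng.integers(0, 2, size=(200, 4860))  # stand-in fingerprints
      y = rng.integers(0, 2, size=200)          # 1 = active, 0 = inactive

      clf = SVC(kernel="rbf", C=1.0)            # SVM, as selected in the study
      print(cross_val_score(clf, X, y, cv=5).mean())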

  6. Implementation of Automatic Focusing Algorithms for a Computer Vision System with Camera Control.

    DTIC Science & Technology

    1983-08-15

    obtainable from real data, rather than relying on a stock database. Often, computer vision and image processing algorithms become subconsciously tuned to...two coils on the same mount structure. Since it was not possible to reprogram the binary system, we turned to the POPEYE system for both its grey

  7. Implementation of a parallel protein structure alignment service on cloud.

    PubMed

    Hung, Che-Lun; Lin, Yaw-Ling

    2013-01-01

    Protein structure alignment has become an important strategy by which to identify evolutionary relationships between protein sequences. Several alignment tools are currently available for online comparison of protein structures. In this paper, we propose a parallel protein structure alignment service based on the Hadoop distribution framework. This service includes a protein structure alignment algorithm, a refinement algorithm, and a MapReduce programming model. The refinement algorithm refines the result of alignment. To process vast numbers of protein structures in parallel, the alignment and refinement algorithms are implemented using MapReduce. We analyzed and compared the structure alignments produced by different methods using a dataset randomly selected from the PDB database. The experimental results verify that the proposed algorithm refines the resulting alignments more accurately than existing algorithms. Meanwhile, the computational performance of the proposed service is proportional to the number of processors used in our cloud platform.

  8. Implementation of a Parallel Protein Structure Alignment Service on Cloud

    PubMed Central

    Hung, Che-Lun; Lin, Yaw-Ling

    2013-01-01

    Protein structure alignment has become an important strategy by which to identify evolutionary relationships between protein sequences. Several alignment tools are currently available for online comparison of protein structures. In this paper, we propose a parallel protein structure alignment service based on the Hadoop distribution framework. This service includes a protein structure alignment algorithm, a refinement algorithm, and a MapReduce programming model. The refinement algorithm refines the result of alignment. To process vast numbers of protein structures in parallel, the alignment and refinement algorithms are implemented using MapReduce. We analyzed and compared the structure alignments produced by different methods using a dataset randomly selected from the PDB database. The experimental results verify that the proposed algorithm refines the resulting alignments more accurately than existing algorithms. Meanwhile, the computational performance of the proposed service is proportional to the number of processors used in our cloud platform. PMID:23671842

  9. Data-Based Predictive Control with Multirate Prediction Step

    NASA Technical Reports Server (NTRS)

    Barlow, Jonathan S.

    2010-01-01

    Data-based predictive control is an emerging control method that stems from Model Predictive Control (MPC). MPC computes the current control action based on a prediction of the system output a number of time steps into the future and is generally derived from a known model of the system. Data-based predictive control has the advantage of deriving predictive models and controller gains from input-output data. Thus, a controller can be designed from the outputs of a complex simulation code or a physical system where no explicit model exists. If the output data happen to be corrupted by periodic disturbances, the designed controller will also have a built-in ability to reject these disturbances without needing to know them. When data-based predictive control is implemented online, it becomes a version of adaptive control. One challenge of MPC is that its computational requirements increase with prediction horizon length. This paper develops a closed-loop dynamic output feedback controller that minimizes a multi-step-ahead receding-horizon cost function with a multirate prediction step. One result is a reduced influence of the prediction horizon and the number of system outputs on the computational requirements of the controller. Another result is an emphasis on portions of the prediction window that are sampled more frequently. A third result is the ability to include more outputs in the feedback path than in the cost function.
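
    The core data-based idea can be sketched in its simplest form: fit a linear predictor of the next output directly from input-output records by least squares, with no explicit model. The system coefficients below are invented, and the paper's multirate receding-horizon formulation is considerably more elaborate.

      import numpy as np

      rng = np.random.default_rng(1)
      u = rng.standard_normal(500)           # measured input sequence
      y = np.zeros(500)
      for t in range(2, 500):                # "unknown" system generating data
          y[t] = 0.7*y[t-1] - 0.2*y[t-2] + 0.5*u[t-1] + 0.01*rng.standard_normal()

      # Regress y[t] on y[t-1], y[t-2], u[t-1] to identify the predictor.
      Phi = np.column_stack([y[1:-1], y[:-2], u[1:-1]])
      theta, *_ = np.linalg.lstsq(Phi, y[2:], rcond=None)
      print(theta)                           # approaches [0.7, -0.2, 0.5]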

  10. Developing a Near Real-time System for Earthquake Slip Distribution Inversion

    NASA Astrophysics Data System (ADS)

    Zhao, Li; Hsieh, Ming-Che; Luo, Yan; Ji, Chen

    2016-04-01

    Advances in observational and computational seismology in the past two decades have enabled completely automatic and real-time determinations of the focal mechanisms of earthquake point sources. However, seismic radiation from moderate and large earthquakes often exhibits a strong finite-source directivity effect, which is critically important for accurate ground motion estimation and earthquake damage assessment. Therefore, an effective procedure to determine earthquake rupture processes in near real-time is in high demand for hazard mitigation and risk assessment purposes. In this study, we develop an efficient waveform inversion approach for solving for finite-fault models in 3D structure. Full slip distribution inversions are carried out based on the fault planes identified in the point-source solutions. To ensure efficiency in calculating 3D synthetics during slip distribution inversions, a database of strain Green tensors (SGT) is established for a 3D structural model with realistic surface topography. The SGT database enables rapid calculation of accurate synthetic seismograms for waveform inversion on a regular desktop or even a laptop PC. We demonstrate our source inversion approach using two moderate earthquakes (Mw~6.0) in Taiwan and in mainland China. Our results show that the 3D velocity model provides better waveform fitting with more spatially concentrated slip distributions. Our source inversion technique based on the SGT database is effective for semi-automatic, near real-time determination of finite-source solutions for seismic hazard mitigation purposes.

  11. Mineral and Geochemical Classification From Spectroscopy/Diffraction Through Neural Networks

    NASA Astrophysics Data System (ADS)

    Ferralis, N.; Grossman, J.; Summons, R. E.

    2017-12-01

    Spectroscopy and diffraction techniques are essential for understanding the structural, chemical and functional properties of geological materials in the Earth and planetary sciences. Beyond data collection, quantitative insight relies on experimentally assembled or computationally derived spectra. Inference on the geochemical or geophysical properties (such as crystallographic order, chemical functionality, elemental composition, etc.) of a particular geological material (mineral, organic matter, etc.) is based on fitting unknown spectra and comparing the fit with consolidated databases. The complexity of fitting highly convoluted spectra often limits the ability to infer geochemical characteristics and limits throughput for extensive datasets. With the emergence of heuristic approaches to pattern recognition through machine learning, in this work we investigate the possibility and potential of using supervised neural networks, trained on available public spectroscopic databases, to infer geochemical parameters directly from unknown spectra. Using Raman spectroscopy, infrared spectroscopy and powder X-ray diffraction data from the publicly available RRUFF database, we train neural network models to classify mineral and organic compounds (pure or mixtures) based on crystallographic structure from diffraction, and on chemical functionality, elemental composition and bonding from spectroscopy. As expected, the accuracy of the inference is strongly dependent on the quality and extent of the training data. We will identify a series of requirements and guidelines for the training dataset needed to achieve consistently high-accuracy inference, along with methods to compensate for limited data.

  12. Viability of in-house datamarting approaches for population genetics analysis of SNP genotypes

    PubMed Central

    Amigo, Jorge; Phillips, Christopher; Salas, Antonio; Carracedo, Ángel

    2009-01-01

    Background Databases containing very large amounts of SNP (Single Nucleotide Polymorphism) data are now freely available for researchers interested in medical and/or population genetics applications. While many of these SNP repositories have implemented data retrieval tools for general-purpose mining, these alone cannot cover the broad spectrum of needs of most medical and population genetics studies. Results To address this limitation, we have built in-house customized data marts from the raw data provided by the largest public databases. In particular, for population genetics analysis based on genotypes we have built a set of data processing scripts that deal with raw data coming from the major SNP variation databases (e.g. HapMap, Perlegen), stripping them into single genotypes, grouping them into populations, and merging them with complementary descriptive information extracted from dbSNP. This allows not only in-house standardization and normalization of the genotyping data retrieved from different repositories, but also the calculation of statistical indices, from simple allele frequency estimates to more elaborate genetic differentiation tests within populations, together with the ability to combine population samples from different databases. Conclusion The present study demonstrates the viability of implementing scripts for handling extensive datasets of SNP genotypes with low computational costs, dealing with certain complex issues that arise from the divergent nature and configuration of the most popular SNP repositories. The information contained in these databases can also be enriched with additional information obtained from other complementary databases, in order to build a dedicated data mart. Updating the data structure is straightforward, and the structure also permits easy incorporation of new external data and the computation of supplementary statistical indices of interest. PMID:19344481

  13. Viability of in-house datamarting approaches for population genetics analysis of SNP genotypes.

    PubMed

    Amigo, Jorge; Phillips, Christopher; Salas, Antonio; Carracedo, Angel

    2009-03-19

    Databases containing very large amounts of SNP (Single Nucleotide Polymorphism) data are now freely available for researchers interested in medical and/or population genetics applications. While many of these SNP repositories have implemented data retrieval tools for general-purpose mining, these alone cannot cover the broad spectrum of needs of most medical and population genetics studies. To address this limitation, we have built in-house customized data marts from the raw data provided by the largest public databases. In particular, for population genetics analysis based on genotypes we have built a set of data processing scripts that deal with raw data coming from the major SNP variation databases (e.g. HapMap, Perlegen), stripping them into single genotypes, grouping them into populations, and merging them with complementary descriptive information extracted from dbSNP. This allows not only in-house standardization and normalization of the genotyping data retrieved from different repositories, but also the calculation of statistical indices, from simple allele frequency estimates to more elaborate genetic differentiation tests within populations, together with the ability to combine population samples from different databases. The present study demonstrates the viability of implementing scripts for handling extensive datasets of SNP genotypes with low computational costs, dealing with certain complex issues that arise from the divergent nature and configuration of the most popular SNP repositories. The information contained in these databases can also be enriched with additional information obtained from other complementary databases, in order to build a dedicated data mart. Updating the data structure is straightforward, and the structure also permits easy incorporation of new external data and the computation of supplementary statistical indices of interest.
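
    The "simple allele frequency estimates" step is easy to make concrete; the sketch below computes allele frequencies and expected heterozygosity from hypothetical genotype calls for one SNP in one population sample.

      from collections import Counter

      genotypes = ["AA", "AG", "GG", "AG", "AA", "AG", "GG", "AA"]

      alleles = Counter(allele for g in genotypes for allele in g)
      n = sum(alleles.values())
      freqs = {a: c / n for a, c in alleles.items()}
      het_exp = 1 - sum(p ** 2 for p in freqs.values())  # expected heterozygosity

      print(freqs)              # {'A': 0.5625, 'G': 0.4375}
      print(round(het_exp, 3))  # 0.492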

  14. Evaluating the efficacy of a structure-derived amino acid substitution matrix in detecting protein homologs by BLAST and PSI-BLAST.

    PubMed

    Goonesekere, Nalin Cw

    2009-01-01

    The large numbers of protein sequences generated by whole genome sequencing projects require rapid and accurate methods of annotation. The detection of homology through computational sequence analysis is a powerful tool in determining the complex evolutionary and functional relationships that exist between proteins. Homology search algorithms employ amino acid substitution matrices to detect similarity between protein sequences. The substitution matrices in common use today are constructed using sequences aligned without reference to protein structure. Here we present amino acid substitution matrices constructed from the alignment of a large number of protein domain structures from the Structural Classification of Proteins (SCOP) database. We show that when incorporated into the homology search algorithms BLAST and PSI-BLAST, the structure-based substitution matrices enhance the efficacy of detecting remote homologs.
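
    The log-odds construction behind such matrices can be sketched in a few lines: observed pair frequencies from aligned positions are compared with the frequencies expected from background composition, scaled in half-bit units as in the BLOSUM family. The pair counts below are toy numbers, not SCOP-derived data.

      from math import log2

      pairs = {("A", "A"): 60, ("A", "S"): 25, ("S", "S"): 40,
               ("A", "W"): 2, ("W", "W"): 10}        # toy aligned-pair counts
      total = sum(pairs.values())
      q = {p: c / total for p, c in pairs.items()}   # observed pair frequencies

      bg = {}                                        # background composition
      for (x, y), f in q.items():
          if x == y:
              bg[x] = bg.get(x, 0) + f
          else:
              bg[x] = bg.get(x, 0) + f / 2
              bg[y] = bg.get(y, 0) + f / 2

      def score(x, y):
          expected = bg[x] * bg[y] * (1.0 if x == y else 2.0)
          return round(2 * log2(q[(x, y)] / expected))   # half-bit log-odds

      print({p: score(*p) for p in pairs})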

  15. Operating Policies and Procedures of Computer Data-Base Systems.

    ERIC Educational Resources Information Center

    Anderson, David O.

    Speaking on the operating policies and procedures of computer data bases containing information on students, the author divides his remarks into three parts: content decisions, data base security, and user access. He offers nine recommended practices that should increase the data base's usefulness to the user community: (1) the cost of developing…

  16. Computational Study on New Natural Compound Inhibitors of Pyruvate Dehydrogenase Kinases

    PubMed Central

    Zhou, Xiaoli; Yu, Shanshan; Su, Jing; Sun, Liankun

    2016-01-01

    Pyruvate dehydrogenase kinases (PDKs) are key enzymes in glucose metabolism, negatively regulating pyruvate dehydrogenase complex (PDC) activity through phosphorylation. Inhibiting PDKs could upregulate PDC activity and drive cells into more aerobic metabolism. Therefore, PDKs are potential targets for metabolism-related diseases, such as cancers and diabetes. In this study, a series of computer-aided virtual screening techniques were utilized to discover potential inhibitors of PDKs. Structure-based screening using LibDock was carried out, followed by ADME (absorption, distribution, metabolism, excretion) and toxicity prediction. Molecular docking was used to analyze the binding mechanism between these compounds and PDKs. Molecular dynamics simulation was utilized to confirm the stability of potential compound binding. From the computational results, two novel natural coumarin compounds (ZINC12296427 and ZINC12389251) from the ZINC database were found to bind to PDKs with favorable interaction energy and were predicted to be non-toxic. Our study provides valuable information on PDK-coumarin binding mechanisms for PDK inhibitor-based drug discovery. PMID:26959013

  17. Computational Study on New Natural Compound Inhibitors of Pyruvate Dehydrogenase Kinases.

    PubMed

    Zhou, Xiaoli; Yu, Shanshan; Su, Jing; Sun, Liankun

    2016-03-04

    Pyruvate dehydrogenase kinases (PDKs) are key enzymes in glucose metabolism, negatively regulating pyruvate dehydrogenase complex (PDC) activity through phosphorylation. Inhibiting PDKs could upregulate PDC activity and drive cells into more aerobic metabolism. Therefore, PDKs are potential targets for metabolism-related diseases, such as cancers and diabetes. In this study, a series of computer-aided virtual screening techniques were utilized to discover potential inhibitors of PDKs. Structure-based screening using LibDock was carried out, followed by ADME (absorption, distribution, metabolism, excretion) and toxicity prediction. Molecular docking was used to analyze the binding mechanism between these compounds and PDKs. Molecular dynamics simulation was utilized to confirm the stability of potential compound binding. From the computational results, two novel natural coumarin compounds (ZINC12296427 and ZINC12389251) from the ZINC database were found to bind to PDKs with favorable interaction energy and were predicted to be non-toxic. Our study provides valuable information on PDK-coumarin binding mechanisms for PDK inhibitor-based drug discovery.

  18. Advances in Toxico-Cheminformatics: Supporting a New ...

    EPA Pesticide Factsheets

    EPA’s National Center for Computational Toxicology is building capabilities to support a new paradigm for toxicity screening and prediction through the harnessing of legacy toxicity data, creation of data linkages, and generation of new high-throughput screening (HTS) data. The DSSTox project is working to improve public access to quality structure-annotated chemical toxicity information in less summarized forms than traditionally employed in SAR modeling, and in ways that facilitate both data-mining and read-across. Both DSSTox Structure-Files and the dedicated on-line DSSTox Structure-Browser are enabling seamless structure-based searching and linkages to and from previously isolated, chemically indexed public toxicity data resources (e.g., NTP, EPA IRIS, CPDB). Most recently, structure-enabled search capabilities have been extended to chemical exposure-related microarray experiments in the public EBI Array Express database, additionally linking this resource to the NIEHS CEBS toxicogenomics database. The public DSSTox chemical and bioassay inventory has been recently integrated into PubChem, allowing a user to take full advantage of PubChem structure-activity and bioassay clustering features. The DSSTox project is providing cheminformatics support for EPA’s ToxCast™ project, as well as supporting collaborations with the National Toxicology Program (NTP) HTS and the NIH Chemical Genomics Center (NCGC). Phase I of the ToxCast™ project is generating HT

  19. Building blocks for automated elucidation of metabolites: machine learning methods for NMR prediction.

    PubMed

    Kuhn, Stefan; Egert, Björn; Neumann, Steffen; Steinbeck, Christoph

    2008-09-25

    Current efforts in Metabolomics, such as the Human Metabolome Project, collect structures of biological metabolites as well as data for their characterisation, such as spectra for identification of substances and measurements of their concentration. Still, only a fraction of existing metabolites and their spectral fingerprints are known. Computer-Assisted Structure Elucidation (CASE) of biological metabolites will be an important tool to leverage this lack of knowledge. Indispensable for CASE are modules to predict spectra for hypothetical structures. This paper evaluates different statistical and machine learning methods to perform predictions of proton NMR spectra based on data from our open database NMRShiftDB. A mean absolute error of 0.18 ppm was achieved for the prediction of proton NMR shifts ranging from 0 to 11 ppm. Random forest, J48 decision tree and support vector machines achieved similar overall errors. HOSE codes, a notably simple method, achieved a comparatively good result of 0.17 ppm mean absolute error. The NMR prediction methods applied in the course of this work delivered precise predictions which can serve as a building block for Computer-Assisted Structure Elucidation for biological metabolites.
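
    A minimal sketch of why HOSE codes are a notably simple method: prediction is a table lookup keyed on a canonical string describing an atom's spherical environment, averaging the shifts observed for identical environments. The codes and shift values below are invented for illustration.

      from statistics import mean

      # Hypothetical table: HOSE code -> observed 1H shifts (ppm)
      shift_db = {
          "C-3;*C*C,C(": [7.26, 7.31, 7.24],  # aromatic CH-like environment
          "C-4;CCH,(":   [1.02, 0.95, 1.10],  # aliphatic CH3-like environment
      }

      def predict_shift(hose):
          """Mean shift over identical environments; None if never observed."""
          observations = shift_db.get(hose)
          return round(mean(observations), 2) if observations else None

      print(predict_shift("C-3;*C*C,C("))  # 7.27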

  20. Efficient Multicriteria Protein Structure Comparison on Modern Processor Architectures

    PubMed Central

    Manolakos, Elias S.

    2015-01-01

    Fast increasing computational demand for all-to-all protein structures comparison (PSC) is a result of three confounding factors: rapidly expanding structural proteomics databases, high computational complexity of pairwise protein comparison algorithms, and the trend in the domain towards using multiple criteria for protein structures comparison (MCPSC) and combining results. We have developed a software framework that exploits many-core and multicore CPUs to implement efficient parallel MCPSC in modern processors based on three popular PSC methods, namely, TMalign, CE, and USM. We evaluate and compare the performance and efficiency of the two parallel MCPSC implementations using Intel's experimental many-core Single-Chip Cloud Computer (SCC) as well as Intel's Core i7 multicore processor. We show that the 48-core SCC is more efficient than the latest generation Core i7, achieving a speedup factor of 42 (efficiency of 0.9), making many-core processors an exciting emerging technology for large-scale structural proteomics. We compare and contrast the performance of the two processors on several datasets and also show that MCPSC outperforms its component methods in grouping related domains, achieving a high F-measure of 0.91 on the benchmark CK34 dataset. The software implementation for protein structure comparison using the three methods and combined MCPSC, along with the developed underlying rckskel algorithmic skeletons library, is available via GitHub. PMID:26605332

  1. Efficient Multicriteria Protein Structure Comparison on Modern Processor Architectures.

    PubMed

    Sharma, Anuj; Manolakos, Elias S

    2015-01-01

    Fast increasing computational demand for all-to-all protein structures comparison (PSC) is a result of three confounding factors: rapidly expanding structural proteomics databases, high computational complexity of pairwise protein comparison algorithms, and the trend in the domain towards using multiple criteria for protein structures comparison (MCPSC) and combining results. We have developed a software framework that exploits many-core and multicore CPUs to implement efficient parallel MCPSC in modern processors based on three popular PSC methods, namely, TMalign, CE, and USM. We evaluate and compare the performance and efficiency of the two parallel MCPSC implementations using Intel's experimental many-core Single-Chip Cloud Computer (SCC) as well as Intel's Core i7 multicore processor. We show that the 48-core SCC is more efficient than the latest generation Core i7, achieving a speedup factor of 42 (efficiency of 0.9), making many-core processors an exciting emerging technology for large-scale structural proteomics. We compare and contrast the performance of the two processors on several datasets and also show that MCPSC outperforms its component methods in grouping related domains, achieving a high F-measure of 0.91 on the benchmark CK34 dataset. The software implementation for protein structure comparison using the three methods and combined MCPSC, along with the developed underlying rckskel algorithmic skeletons library, is available via GitHub.

  2. Inquiry-Based Learning of Molecular Phylogenetics

    ERIC Educational Resources Information Center

    Campo, Daniel; Garcia-Vazquez, Eva

    2008-01-01

    Reconstructing phylogenies from nucleotide sequences is a challenge for students because it strongly depends on evolutionary models and computer tools that are frequently updated. We present here an inquiry-based course aimed at learning how to trace a phylogeny based on sequences existing in public databases. Computer tools are freely available…

  3. The MIGenAS integrated bioinformatics toolkit for web-based sequence analysis

    PubMed Central

    Rampp, Markus; Soddemann, Thomas; Lederer, Hermann

    2006-01-01

    We describe a versatile and extensible integrated bioinformatics toolkit for the analysis of biological sequences over the Internet. The web portal offers convenient interactive access to a growing pool of chainable bioinformatics software tools and databases that are centrally installed and maintained by the RZG. Currently, supported tasks comprise sequence similarity searches in public or user-supplied databases, computation and validation of multiple sequence alignments, phylogenetic analysis and protein–structure prediction. Individual tools can be seamlessly chained into pipelines allowing the user to conveniently process complex workflows without the necessity to take care of any format conversions or tedious parsing of intermediate results. The toolkit is part of the Max-Planck Integrated Gene Analysis System (MIGenAS) of the Max Planck Society available at (click ‘Start Toolkit’). PMID:16844980

  4. Building an efficient curation workflow for the Arabidopsis literature corpus

    PubMed Central

    Li, Donghui; Berardini, Tanya Z.; Muller, Robert J.; Huala, Eva

    2012-01-01

    TAIR (The Arabidopsis Information Resource) is the model organism database (MOD) for Arabidopsis thaliana, a model plant with a literature corpus of about 39,000 articles in PubMed, with over 4300 new articles added in 2011. We have developed a literature curation workflow incorporating both automated and manual elements to cope with this flood of new research articles. The current workflow can be divided into two phases: article selection and curation. Structured controlled vocabularies, such as the Gene Ontology and Plant Ontology, are used to capture free text information in the literature as succinct ontology-based annotations suitable for the application of computational analysis methods. We also describe our curation platform and the use of text mining tools in our workflow. Database URL: www.arabidopsis.org PMID:23221298

  5. Overview of data and conceptual approaches for derivation of quantitative structure-activity relationships for ecotoxicological effects of organic chemicals.

    PubMed

    Bradbury, Steven P; Russom, Christine L; Ankley, Gerald T; Schultz, T Wayne; Walker, John D

    2003-08-01

    The use of quantitative structure-activity relationships (QSARs) in assessing potential toxic effects of organic chemicals on aquatic organisms continues to evolve as computational efficiency and toxicological understanding advance. With the ever-increasing production of new chemicals, and the need to optimize resources to assess thousands of existing chemicals in commerce, regulatory agencies have turned to QSARs as essential tools to help prioritize tiered risk assessments when empirical data are not available to evaluate toxicological effects. Progress in designing scientifically credible QSARs is intimately associated with the development of empirically derived databases of well-defined and quantified toxicity endpoints, which are based on a strategic evaluation of diverse sets of chemical structures, modes of toxic action, and species. This review provides a brief overview of four databases created for the purpose of developing QSARs for estimating toxicity of chemicals to aquatic organisms. The evolution of QSARs based initially on general chemical classification schemes, to models founded on modes of toxic action that range from nonspecific partitioning into hydrophobic cellular membranes to receptor-mediated mechanisms is summarized. Finally, an overview of expert systems that integrate chemical-specific mode of action classification and associated QSAR selection for estimating potential toxicological effects of organic chemicals is presented.

  6. Automatic localization of the nipple in mammograms using Gabor filters and the Radon transform

    NASA Astrophysics Data System (ADS)

    Chakraborty, Jayasree; Mukhopadhyay, Sudipta; Rangayyan, Rangaraj M.; Sadhu, Anup; Azevedo-Marques, P. M.

    2013-02-01

    The nipple is an important landmark in mammograms. Detection of the nipple is useful for alignment and registration of mammograms in computer-aided diagnosis of breast cancer. In this paper, a novel approach is proposed for automatic detection of the nipple based on the oriented patterns of the breast tissues present in mammograms. The Radon transform is applied to the oriented patterns obtained by a bank of Gabor filters to detect the linear structures related to the tissue patterns. The detected linear structures are then used to locate the nipple position using the characteristics of convergence of the tissue patterns towards the nipple. The performance of the method was evaluated with 200 scanned-film images from the mini-MIAS database and 150 digital radiography (DR) images from a local database. Average errors of 5.84 mm and 6.36 mm were obtained with respect to the reference nipple location marked by a radiologist for the mini-MIAS and the DR images, respectively.
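
    A minimal sketch of the two building blocks named above, using scikit-image: a small Gabor filter bank to emphasize oriented tissue patterns, followed by a Radon transform whose peaks indicate dominant linear structures. The convergence analysis that actually locates the nipple is omitted, and the random image is a stand-in for a mammogram.

      import numpy as np
      from skimage.filters import gabor
      from skimage.transform import radon

      image = np.random.rand(128, 128)  # stand-in for a mammogram region

      # Bank of Gabor filters over orientations; keep the strongest response.
      thetas = np.deg2rad(np.arange(0, 180, 22.5))
      responses = [gabor(image, frequency=0.1, theta=t)[0] for t in thetas]
      oriented = np.max(np.stack(responses), axis=0)

      # Peaks in the Radon sinogram correspond to dominant linear structures.
      sinogram = radon(oriented, theta=np.arange(180.0), circle=False)
      print("dominant orientation (deg):", np.argmax(sinogram.max(axis=0)))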

  7. Multidisciplinary analysis of actively controlled large flexible spacecraft

    NASA Technical Reports Server (NTRS)

    Cooper, Paul A.; Young, John W.; Sutter, Thomas R.

    1986-01-01

    The Control of Flexible Structures (COFS) program has supported the development of an analysis capability at the Langley Research Center called the Integrated Multidisciplinary Analysis Tool (IMAT), which provides an efficient data storage and transfer capability among commercial computer codes to aid in the dynamic analysis of actively controlled structures. IMAT is a system of computer programs which transfers Computer-Aided-Design (CAD) configurations, structural finite element models, material property and stress information, structural and rigid-body dynamic model information, and linear system matrices for control law formulation among various commercial applications programs through a common database. Although general in its formulation, IMAT was developed specifically to aid in the evaluation of such structures. A description of the IMAT system and the results of an application of the system are given.

  8. A case study for a digital seabed database: Bohai Sea engineering geology database

    NASA Astrophysics Data System (ADS)

    Tianyun, Su; Shikui, Zhai; Baohua, Liu; Ruicai, Liang; Yanpeng, Zheng; Yong, Wang

    2006-07-01

    This paper discusses the design of the ORACLE-based Bohai Sea engineering geology database, covering requirements analysis, conceptual structure analysis, logical structure analysis, physical structure analysis and security design. In the study, we used the object-oriented Unified Modeling Language (UML) to model the conceptual structure of the database and used the powerful data management functions that the object-relational database ORACLE provides to organize and manage the storage space and improve its security performance. By this means, the database can provide rapid and highly effective performance in data storage, maintenance and query to satisfy the application requirements of the Bohai Sea Oilfield Paradigm Area Information System.

  9. ESCOLEX: a grade-level lexical database from European Portuguese elementary to middle school textbooks.

    PubMed

    Soares, Ana Paula; Medeiros, José Carlos; Simões, Alberto; Machado, João; Costa, Ana; Iriarte, Álvaro; de Almeida, José João; Pinheiro, Ana P; Comesaña, Montserrat

    2014-03-01

    In this article, we introduce ESCOLEX, the first European Portuguese children's lexical database with grade-level-adjusted word frequency statistics. Computed from a 3.2-million-word corpus, ESCOLEX provides 48,381 word forms extracted from 171 elementary and middle school textbooks for 6- to 11-year-old children attending the first six grades in the Portuguese educational system. Like other children's grade-level databases (e.g., Carroll, Davies, & Richman, 1971; Corral, Ferrero, & Goikoetxea, Behavior Research Methods, 41, 1009-1017, 2009; Lété, Sprenger-Charolles, & Colé, Behavior Research Methods, Instruments, & Computers, 36, 156-166, 2004; Zeno, Ivens, Millard, Duvvuri, 1995), ESCOLEX provides four frequency indices for each grade: overall word frequency (F), index of dispersion across the selected textbooks (D), estimated frequency per million words (U), and standard frequency index (SFI). It also provides a new measure, contextual diversity (CD). In addition, the number of letters in the word and its part(s) of speech, number of syllables, syllable structure, and adult frequencies taken from P-PAL (a European Portuguese corpus-based lexical database; Soares, Comesaña, Iriarte, Almeida, Simões, Costa, …, Machado, 2010; Soares, Iriarte, Almeida, Simões, Costa, França, …, Comesaña, in press) are provided. ESCOLEX will be a useful tool both for researchers interested in language processing and development and for professionals in need of verbal materials adjusted to children's developmental stages. ESCOLEX can be downloaded along with this article or from http://p-pal.di.uminho.pt/about/databases .
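
    Two of the listed indices are easy to sketch; the dispersion formula below is Juilland-style and the SFI follows Carroll's definition (SFI = 10*log10(U) + 40 for U occurrences per million), so treat this as illustrative rather than ESCOLEX's exact computation.

      import numpy as np

      def dispersion(subfreqs):
          """Dispersion across n subcorpora (grades): 1 = perfectly even."""
          f = np.asarray(subfreqs, dtype=float)
          cv = f.std() / f.mean()               # coefficient of variation
          return 1.0 - cv / np.sqrt(len(f) - 1)

      counts = [120, 95, 110, 80, 130, 105]     # hypothetical counts, grades 1-6
      corpus_size = 3_200_000
      U = sum(counts) / corpus_size * 1e6       # frequency per million words
      SFI = 10 * np.log10(U) + 40               # standard frequency index

      print(round(dispersion(counts), 3), round(U, 1), round(SFI, 1))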

  10. Centre-based restricted nearest feature plane with angle classifier for face recognition

    NASA Astrophysics Data System (ADS)

    Tang, Linlin; Lu, Huifen; Zhao, Liang; Li, Zuohua

    2017-10-01

    An improved classifier based on the nearest feature plane (NFP), called the centre-based restricted nearest feature plane with angle (RNFPA) classifier, is proposed here for face recognition problems. The well-known NFP classifier uses the geometrical information of samples to increase the effective number of training samples, but it increases the computational complexity and also suffers from an inaccuracy problem caused by the extended feature plane. To solve these problems, RNFPA exploits a centre-based feature plane and utilizes an angle threshold to restrict the extended feature space. By choosing an appropriate angle threshold, RNFPA can improve performance and decrease computational complexity. Experiments on the AT&T face database, the AR face database and the FERET face database are used to evaluate the proposed classifier. Compared with the original NFP classifier, the nearest feature line (NFL) classifier, the nearest neighbour (NN) classifier and some other improved NFP classifiers, the proposed one achieves competitive performance.
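
    The underlying NFP computation is the distance from a query feature vector to the plane through three prototype feature points, found by least-squares projection. RNFPA's centre-based construction and angle threshold are omitted from this minimal sketch.

      import numpy as np

      def plane_distance(x, p1, p2, p3):
          """Distance from x to the plane spanned by p2-p1 and p3-p1 at p1."""
          A = np.column_stack([p2 - p1, p3 - p1])
          coeffs, *_ = np.linalg.lstsq(A, x - p1, rcond=None)
          foot = p1 + A @ coeffs               # foot of the perpendicular
          return np.linalg.norm(x - foot)

      x  = np.array([1.0, 2.0, 0.5, 0.0])
      p1 = np.array([0.0, 0.0, 0.0, 0.0])
      p2 = np.array([1.0, 0.0, 0.0, 0.0])
      p3 = np.array([0.0, 1.0, 0.0, 0.0])
      print(plane_distance(x, p1, p2, p3))     # 0.5: off-plane components only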

  11. Enhancing SAMOS Data Access in DOMS via a Neo4j Property Graph Database.

    NASA Astrophysics Data System (ADS)

    Stallard, A. P.; Smith, S. R.; Elya, J. L.

    2016-12-01

    The Shipboard Automated Meteorological and Oceanographic System (SAMOS) initiative provides routine access to high-quality marine meteorological and near-surface oceanographic observations from research vessels. The Distributed Oceanographic Match-Up Service (DOMS) under development is a centralized service that allows researchers to easily match in situ and satellite oceanographic data from distributed sources to facilitate satellite calibration, validation, and retrieval algorithm development. The service currently uses Apache Solr as a backend search engine on each node in the distributed network. While Solr is a high-performance solution that facilitates creation and maintenance of indexed data, it is limited in the sense that its schema is fixed. The property graph model escapes this limitation by creating relationships between data objects. The authors will present the development of the SAMOS Neo4j property graph database, including new search possibilities that take advantage of the property graph model, performance comparisons with Apache Solr, and a vision for graph databases as a storage tool for oceanographic data. The integration of the SAMOS Neo4j graph into DOMS will also be described. Currently, Neo4j contains spatial and temporal records from SAMOS which are modeled into a time tree and an r-tree using the GraphAware and Spatial plugin tools for Neo4j. These extensions provide callable Java procedures within CYPHER (Neo4j's query language) that generate in-graph structures. Once generated, these structures can be queried using procedures from these libraries, or directly via CYPHER statements. Neo4j excels at performing relationship and path-based queries, which challenge relational SQL databases because such queries require memory-intensive joins. Consider a user who wants to find records over several years, but only for specific months. If a traditional database only stores timestamps, this type of query would be complex and likely prohibitively slow. Using the time tree model, one can specify a path from the root to the data which restricts resolutions to certain timeframes (e.g., months). This query can be executed without joins, unions, or other compute-intensive operations, putting Neo4j at a computational advantage over the SQL database alternative.
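
    A minimal sketch of querying such a time tree from Python with the official neo4j driver. The labels and relationship types below are hypothetical stand-ins for the GraphAware-generated structures described above, and the connection details are placeholders.

      from neo4j import GraphDatabase

      # Hypothetical schema: (:Root)-[:YEAR]->(:Year {value})
      #   -[:MONTH]->(:Month {value})-[:OBSERVED]->(:Record)
      QUERY = """
      MATCH (:Root)-[:YEAR]->(y:Year)-[:MONTH]->(m:Month)-[:OBSERVED]->(r:Record)
      WHERE y.value IN $years AND m.value IN $months
      RETURN r
      """

      driver = GraphDatabase.driver("bolt://localhost:7687",
                                    auth=("neo4j", "password"))
      with driver.session() as session:
          # Specific months across several years, with no joins or unions.
          for record in session.run(QUERY, years=[2014, 2015, 2016], months=[6, 7]):
              print(record["r"])
      driver.close()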

  12. Semi-automatic forensic approach using mandibular midline lingual structures as fingerprint: a pilot study.

    PubMed

    Shaheen, E; Mowafy, B; Politis, C; Jacobs, R

    2017-12-01

    Previous research proposed the use of the mandibular midline neurovascular canal structures as a forensic fingerprint, reaching an average correct identification rate of 95% in an observer study, which triggered the present work: to present a semi-automatic computer recognition approach that replaces the observers and to validate the accuracy of this newly proposed method. Imaging data from computed tomography (CT) and cone-beam computed tomography (CBCT) of mandibles scanned at two different moments were collected to simulate an ante-mortem (AM) and post-mortem (PM) situation, where the first scan served as the AM record and the second scan simulated the PM record. Ten cases with 20 scans were used to build a classifier which relies on voxel-based matching and assigns each comparison to one of two groups: "Unmatched" and "Matched". This protocol was then tested using five other scans from the database. Unpaired t-testing was applied and the accuracy of the computerized approach was determined. A significant difference was found between the "Unmatched" and "Matched" classes, with means of 0.41 and 0.86, respectively. Furthermore, the testing phase showed an accuracy of 100%. The validation of this method pushes the protocol further towards a fully automatic identification procedure for victim identification based on the mandibular midline canal structures alone, in cases where AM and PM CBCT/CT data are available.

  13. ProteinWorldDB: querying radical pairwise alignments among protein sets from complete genomes.

    PubMed

    Otto, Thomas Dan; Catanho, Marcos; Tristão, Cristian; Bezerra, Márcia; Fernandes, Renan Mathias; Elias, Guilherme Steinberger; Scaglia, Alexandre Capeletto; Bovermann, Bill; Berstis, Viktors; Lifschitz, Sergio; de Miranda, Antonio Basílio; Degrave, Wim

    2010-03-01

    Many analyses in modern biological research are based on comparisons between biological sequences, resulting in functional, evolutionary and structural inferences. When large numbers of sequences are compared, heuristics are often used, resulting in a certain lack of accuracy. In order to improve and validate the results of such comparisons, we have performed radical all-against-all comparisons of 4 million protein sequences belonging to the RefSeq database, using an implementation of the Smith-Waterman algorithm. This extremely intensive computational approach was made possible with the help of World Community Grid, through the Genome Comparison Project. The resulting database, ProteinWorldDB, which contains the coordinates of pairwise protein alignments and their respective scores, is now made available. Users can download, compare and analyze the results, filtered by genomes, protein functions or clusters. ProteinWorldDB is integrated with annotations derived from Swiss-Prot, Pfam, KEGG, the NCBI Taxonomy database and the Gene Ontology. The database is a unique and valuable asset, representing a major effort to create a reliable and consistent dataset of cross-comparisons of the whole protein content encoded in hundreds of completely sequenced genomes using a rigorous dynamic programming approach. The database can be accessed through http://proteinworlddb.org
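
    The rigorous dynamic programming at the core of the project is Smith-Waterman local alignment; the compact sketch below computes only the optimal local score, with simple match/mismatch scores and linear gap penalties rather than the substitution matrices and affine gaps used in large-scale practice.

      def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
          """Best local alignment score between a and b, O(len(a)*len(b))."""
          H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
          best = 0
          for i in range(1, len(a) + 1):
              for j in range(1, len(b) + 1):
                  diag = H[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
                  H[i][j] = max(0, diag, H[i-1][j] + gap, H[i][j-1] + gap)
                  best = max(best, H[i][j])
          return best

      print(smith_waterman("HEAGAWGHEE", "PAWHEAE"))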

  14. FRAGSION: ultra-fast protein fragment library generation by IOHMM sampling.

    PubMed

    Bhattacharya, Debswapna; Adhikari, Badri; Li, Jilong; Cheng, Jianlin

    2016-07-01

    Speed, accuracy and robustness of building a protein fragment library have important implications in de novo protein structure prediction, since fragment-based methods are among the most successful approaches in template-free modeling (FM). The majority of existing fragment detection methods rely on database-driven search strategies to identify candidate fragments, which are inherently time-consuming and often hinder the possibility of locating longer fragments due to the limited sizes of databases. It is also difficult to alleviate the effect of noisy sequence-based predicted features, such as secondary structures, on the quality of fragments. Here, we present FRAGSION, a database-free method to efficiently generate a protein fragment library by sampling from an Input-Output Hidden Markov Model (IOHMM). FRAGSION offers some unique features compared to existing approaches in that it (i) is lightning-fast, consuming only a few seconds of CPU time to generate a fragment library for a protein of typical length (300 residues); (ii) can generate dynamic-size fragments of any length (even the whole protein sequence) and (iii) offers ways to handle noise in predicted secondary structure during fragment sampling. On an FM dataset from the most recent Critical Assessment of Structure Prediction, we demonstrate that FRAGSION provides advantages over the state-of-the-art fragment-picking protocol of the ROSETTA suite by speeding up computation by several orders of magnitude while achieving comparable fragment quality. Source code and executable versions of FRAGSION for Linux and MacOS are freely available to non-commercial users at http://sysbio.rnet.missouri.edu/FRAGSION/ It is bundled with a manual and example data. Contact: chengji@missouri.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  15. Structural Mechanics Predictions Relating to Clinical Coronary Stent Fracture in a 5 Year Period in FDA MAUDE Database

    PubMed Central

    Everett, Kay D.; Conway, Claire; Desany, Gerard J.; Baker, Brian L.; Choi, Gilwoo; Taylor, Charles A.; Edelman, Elazer R.

    2016-01-01

    Endovascular stents are the mainstay of interventional cardiovascular medicine. Technological advances have reduced biological and clinical complications but not mechanical failure. Stent strut fracture is increasingly recognized as of paramount clinical importance. Though consensus reigns that fractures can result from material fatigue, how fracture is induced and the mechanisms underlying its clinical sequelae remain ill-defined. In this study, strut fractures were identified in the prospectively maintained Food and Drug Administration's (FDA) Manufacturer and User Facility Device Experience Database (MAUDE), covering years 2006–2011, and differentiated based on specific coronary artery implantation site and device configuration. These data, and knowledge of the extent of dynamic arterial deformations obtained from patient CT images and published data, were used to define boundary conditions for 3D finite element models incorporating multimodal, multi-cycle deformation. The structural response for a range of stent designs and configurations was predicted by computational models and included estimation of maximum principal, minimum principal and equivalent plastic strains. Fatigue assessment was performed with Goodman diagrams and safe/unsafe regions defined for different stent designs. Von Mises stress and maximum principal strain increased with multimodal, fully reversed deformation. Spatial maps of unsafe locations corresponded to the identified locations of fracture in different coronary arteries in the clinical database. These findings, for the first time, provide insight into a potential link between patient adverse events and computational modeling of stent deformation. Understanding of the mechanical forces imposed under different implantation conditions may assist in rational design and optimal placement of these devices. PMID:26467552

  16. Structural Mechanics Predictions Relating to Clinical Coronary Stent Fracture in a 5 Year Period in FDA MAUDE Database.

    PubMed

    Everett, Kay D; Conway, Claire; Desany, Gerard J; Baker, Brian L; Choi, Gilwoo; Taylor, Charles A; Edelman, Elazer R

    2016-02-01

    Endovascular stents are the mainstay of interventional cardiovascular medicine. Technological advances have reduced biological and clinical complications but not mechanical failure. Stent strut fracture is increasingly recognized as of paramount clinical importance. Though consensus reigns that fractures can result from material fatigue, how fracture is induced and the mechanisms underlying its clinical sequelae remain ill-defined. In this study, strut fractures were identified in the prospectively maintained Food and Drug Administration's (FDA) Manufacturer and User Facility Device Experience Database (MAUDE), covering years 2006-2011, and differentiated based on specific coronary artery implantation site and device configuration. These data, and knowledge of the extent of dynamic arterial deformations obtained from patient CT images and published data, were used to define boundary conditions for 3D finite element models incorporating multimodal, multi-cycle deformation. The structural response for a range of stent designs and configurations was predicted by computational models and included estimation of maximum principal, minimum principal and equivalent plastic strains. Fatigue assessment was performed with Goodman diagrams and safe/unsafe regions defined for different stent designs. Von Mises stress and maximum principal strain increased with multimodal, fully reversed deformation. Spatial maps of unsafe locations corresponded to the identified locations of fracture in different coronary arteries in the clinical database. These findings, for the first time, provide insight into a potential link between patient adverse events and computational modeling of stent deformation. Understanding of the mechanical forces imposed under different implantation conditions may assist in rational design and optimal placement of these devices.
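
    A sketch of the fatigue check a Goodman diagram encodes: a cyclic stress state is treated as safe when the alternating and mean stress components, normalized by the endurance limit and ultimate strength, sum to less than one. The material values below are placeholders, not properties of any stent alloy studied in the paper.

        # Modified Goodman criterion sketch: (mean, alternating) stress state
        # is "safe" when sigma_a/S_e + sigma_m/S_ut < 1. Material values are
        # placeholders, not properties of any particular stent alloy.
        def goodman_safe(sigma_mean, sigma_alt, endurance_limit, ultimate_strength):
            margin = sigma_alt / endurance_limit + sigma_mean / ultimate_strength
            return margin < 1.0, margin

        safe, margin = goodman_safe(sigma_mean=250.0, sigma_alt=90.0,
                                    endurance_limit=200.0, ultimate_strength=900.0)
        print(f"safe={safe}, utilization={margin:.2f}")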

  17. Ground Software Maintenance Facility (GSMF) system manual

    NASA Technical Reports Server (NTRS)

    Derrig, D.; Griffith, G.

    1986-01-01

    The Ground Software Maintenance Facility (GSMF) is designed to support development and maintenance of Spacelab ground support software. The GSMF consists of a Perkin Elmer 3250 (host computer) and a MITRA 125s (ATE computer), with appropriate interface devices and software to simulate the Electrical Ground Support Equipment (EGSE). This document is presented in three sections: (1) GSMF Overview; (2) Software Structure; and (3) Fault Isolation Capability. The overview contains information on hardware and software organization along with their corresponding block diagrams. The Software Structure section describes the modes of software structure, including source files, link information, and database files. The Fault Isolation section describes the capabilities of the Ground Computer Interface Device, the Perkin Elmer host, and the MITRA ATE.

  18. Computer-aided prediction of xenobiotic metabolism in the human body

    NASA Astrophysics Data System (ADS)

    Bezhentsev, V. M.; Tarasova, O. A.; Dmitriev, A. V.; Rudik, A. V.; Lagunin, A. A.; Filimonov, D. A.; Poroikov, V. V.

    2016-08-01

    The review describes the major databases containing information about the metabolism of xenobiotics, including data on drug metabolism, metabolic enzymes, schemes of biotransformation and the structures of some substrates and metabolites. Computational approaches used to predict the interaction of xenobiotics with metabolic enzymes, prediction of metabolic sites in the molecule, generation of structures of potential metabolites for subsequent evaluation of their properties are considered. The advantages and limitations of various computational methods for metabolism prediction and the prospects for their applications to improve the safety and efficacy of new drugs are discussed. Bibliography — 165 references.

  19. PropBase Query Layer: a single portal to UK subsurface physical property databases

    NASA Astrophysics Data System (ADS)

    Kingdon, Andrew; Nayembil, Martin L.; Richardson, Anne E.; Smith, A. Graham

    2013-04-01

    Until recently, the delivery of geological information to industry and the public was achieved by geological mapping. Pervasively available computers now mean that 3D geological models can deliver realistic representations of the geometric location of geological units, represented as shells or volumes. The next phase of this process is to populate these models with physical property data that describe subsurface heterogeneity and its associated uncertainty. Achieving this requires the capture and serving of physical, hydrological and other property information from diverse sources. The British Geological Survey (BGS) holds large volumes of subsurface property data, derived both from its own research data collection and from other, often commercially derived, data sources. These data can be voxelated to incorporate them into the models and demonstrate property variation within the subsurface geometry. All property data held by BGS have for many years been stored in relational databases to ensure their long-term continuity. These databases have, by necessity, complex structures: each contains positional reference data and model information, as well as metadata such as sample identification information and attributes that define the source and processing. Whilst this is critical to assessing the analyses, it also greatly complicates the understanding of variability of the property under assessment and requires multiple queries to study related datasets, making the extraction of physical properties difficult. The PropBase Query Layer has therefore been created to allow simplified aggregation and extraction of all related data, presenting complex data in simple, mostly denormalized tables that combine information from multiple databases into a single system. The structure from each relational database is denormalized into a generalised structure, so that each dataset can be viewed alongside the others in a common format using a simple interface. Data are re-engineered to facilitate easy loading. The query layer structure comprises tables, procedures, functions, triggers, views and materialised views. The structure contains a main table, PRB_DATA, which holds all of the data with the following attribution:
    • a unique identifier
    • the data source
    • the unique identifier from the parent database, for traceability
    • the 3D location
    • the property type
    • the property value
    • the units
    • necessary qualifiers
    • precision information and an audit trail
    Data sources, property types and units are constrained by dictionaries, a key component of the structure that defines which properties and inheritance hierarchies are to be coded and guides what is extracted from the structure, and how. Data types served by the Query Layer include site-investigation-derived geotechnical data, hydrogeology datasets, regional geochemistry, geophysical logs, and lithological and borehole metadata. The size and complexity of the datasets, with multiple parent structures, require a technically robust approach to keep the layer synchronised. This is achieved through Oracle procedures written in PL/SQL containing the logic required to carry out the data manipulation (inserts, updates, deletes) that keeps the layer synchronised with the underlying databases, either as regularly scheduled jobs (weekly, monthly, etc.) or invoked on demand.
    The PropBase Query Layer's implementation has enabled rapid data discovery, visualisation and interpretation of geological data, simplifying the parametrisation of 3D model volumes and facilitating the study of intra-unit heterogeneity.
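
    A minimal sketch of the denormalization idea, using SQLite in place of the Oracle implementation described above; the column set loosely follows the PRB_DATA attribution list, but the names and values are illustrative, not the actual PropBase schema.

        import sqlite3

        # Sketch of a denormalized query-layer table modelled on the PRB_DATA
        # attribution above. Schema and sample rows are invented placeholders.
        con = sqlite3.connect(":memory:")
        con.execute("""
            CREATE TABLE prb_data (
                id INTEGER PRIMARY KEY,
                source TEXT, parent_id TEXT,
                x REAL, y REAL, z REAL,
                property_type TEXT, value REAL, units TEXT,
                qualifier TEXT
            )""")
        con.executemany(
            "INSERT INTO prb_data VALUES (NULL,?,?,?,?,?,?,?,?,?)",
            [("geotech_db", "BH001/7", 4510.0, 2310.0, -12.5, "porosity", 0.21, "fraction", None),
             ("geophys_logs", "LOG-88", 4511.5, 2309.0, -13.1, "porosity", 0.19, "fraction", "derived")])

        # One simple query now spans datasets held in separate parent databases.
        for row in con.execute(
                "SELECT source, value, units FROM prb_data WHERE property_type = ?",
                ("porosity",)):
            print(row)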

  20. SAPIENS: Spreading Activation Processor for Information Encoded in Network Structures. Technical Report No. 296.

    ERIC Educational Resources Information Center

    Ortony, Andrew; Radin, Dean I.

    The product of researchers' efforts to develop a computer processor which distinguishes between relevant and irrelevant information in the database, Spreading Activation Processor for Information Encoded in Network Structures (SAPIENS) exhibits (1) context sensitivity, (2) efficiency, (3) decreasing activation over time, (4) summation of…

  1. Surface similarity-based molecular query-retrieval

    PubMed Central

    Singh, Rahul

    2007-01-01

    Background: Discerning the similarity between molecules is a challenging problem in drug discovery as well as in molecular biology. The importance of this problem is due to the fact that the biochemical characteristics of a molecule are closely related to its structure. Molecular similarity is therefore a key notion in investigations targeting exploration of molecular structural space, query-retrieval in molecular databases, and structure-activity modelling. Determining molecular similarity is related to the choice of molecular representation. Currently, representations with high descriptive power and physical relevance, like 3D surface-based descriptors, are available. Information from such representations is both surface-based and volumetric. However, most techniques for determining molecular similarity tend to focus on idealized 2D graph-based descriptors due to the complexity that accompanies reasoning with more elaborate representations.
    Results: This paper addresses the problem of determining similarity when molecules are described using complex surface-based representations. It proposes an intrinsic, spherical representation that systematically maps points on a molecular surface to points on a standard coordinate system (a sphere). Molecular surface properties such as shape, field strengths, and effects due to field super-positioning can then be captured as distributions on the surface of the sphere. Surface-based molecular similarity is subsequently determined by computing the similarity of the surface-property distributions using a novel formulation of histogram intersection. The similarity formulation is not only sensitive to the 3D distribution of the surface properties, but is also highly efficient to compute.
    Conclusion: The proposed method obviates the computationally expensive step of molecular pose-optimisation, can incorporate conformational variations, and facilitates highly efficient determination of similarity by directly comparing molecular surfaces and surface-based properties. Retrieval performance, applications in structure-activity modelling of complex biological properties, and comparisons with existing research and commercial methods demonstrate the validity and effectiveness of the approach. PMID:17634096
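
    The classic histogram-intersection measure underlying this similarity computation can be sketched in a few lines; the paper uses its own novel formulation, so this shows only the standard form on normalized distributions.

        import numpy as np

        # Histogram intersection: the overlap of two normalized distributions,
        # 1.0 for identical histograms. The paper's formulation differs.
        def histogram_intersection(h1, h2):
            h1 = h1 / h1.sum()
            h2 = h2 / h2.sum()
            return np.minimum(h1, h2).sum()

        a = np.array([4.0, 3.0, 2.0, 1.0])
        b = np.array([1.0, 2.0, 3.0, 4.0])
        print(histogram_intersection(a, b))  # 0.6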

  2. GLAD: a system for developing and deploying large-scale bioinformatics grid.

    PubMed

    Teo, Yong-Meng; Wang, Xianbing; Ng, Yew-Kwong

    2005-03-01

    Grid computing is used to solve large-scale bioinformatics problems with gigabyte-scale databases by distributing the computation across multiple platforms. Until now, developing bioinformatics grid applications has been extremely tedious: component algorithms and parallelization techniques must be designed and implemented for different classes of problems, and remotely located sequence database files of varying formats must be accessed across the grid. In this study, we propose a grid programming toolkit, GLAD (Grid Life sciences Applications Developer), which facilitates the development and deployment of bioinformatics applications on a grid. GLAD has been developed using ALiCE (Adaptive scaLable Internet-based Computing Engine), a Java-based grid middleware that exploits task-based parallelism. Two benchmark bioinformatics applications, distributed sequence comparison and distributed progressive multiple sequence alignment, have been developed using GLAD.

  3. SM-TF: A structural database of small molecule-transcription factor complexes.

    PubMed

    Xu, Xianjin; Ma, Zhiwei; Sun, Hongmin; Zou, Xiaoqin

    2016-06-30

    Transcription factors (TFs) are the proteins involved in the transcription process, ensuring the correct expression of specific genes. Numerous diseases arise from the dysfunction of specific TFs. In fact, over 30 TFs have been identified as therapeutic targets of about 9% of the approved drugs. In this study, we created a structural database of small molecule-transcription factor (SM-TF) complexes, available online at http://zoulab.dalton.missouri.edu/SM-TF. The 3D structures of the co-bound small molecule and the corresponding binding sites on TFs are provided in the database, serving as a valuable resource to assist structure-based drug design related to TFs. Currently, the SM-TF database contains 934 entries covering 176 TFs from a variety of species. The database is further classified into several subsets by species and organisms. The entries in the SM-TF database are linked to the UniProt database and other sequence-based TF databases. Furthermore, the druggable TFs from human and the corresponding approved drugs are linked to the DrugBank.

  4. New approach for T-wave peak detection and T-wave end location in 12-lead paced ECG signals based on a mathematical model.

    PubMed

    Madeiro, João P V; Nicolson, William B; Cortez, Paulo C; Marques, João A L; Vázquez-Seisdedos, Carlos R; Elangovan, Narmadha; Ng, G Andre; Schlindwein, Fernando S

    2013-08-01

    This paper presents an innovative approach for T-wave peak detection and subsequent T-wave end location in 12-lead paced ECG signals, based on a mathematical model of a skewed Gaussian function. Following the stage of QRS segmentation, we establish search windows using a number of the earliest intervals between each QRS offset and subsequent QRS onset. Then, we compute a template based on a Gaussian function, modified by a mathematical procedure to insert asymmetry, which models the T-wave. Cross-correlation and an approach based on the computation of the trapezium's area are used to locate, respectively, the peak and end point of each T-wave throughout the whole raw ECG signal. For evaluation purposes, we used a database of high-resolution 12-lead paced ECG signals, recorded from patients with ischaemic cardiomyopathy (ICM) in the University Hospitals of Leicester NHS Trust, UK, and the well-known QT database. The average T-wave detection rates, sensitivity and positive predictivity, were both equal to 99.12% for the first database and, respectively, equal to 99.32% and 99.47% for the QT database. The average time errors computed for T-wave peak and T-wave end locations were, respectively, -0.38±7.12 ms and -3.70±15.46 ms for the first database, and 1.40±8.99 ms and 2.83±15.27 ms for the QT database. The results demonstrate the accuracy, consistency and robustness of the proposed method for the wide variety of T-wave morphologies studied.
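
    A toy version of the template idea: build an asymmetric (skewed) Gaussian and locate it in a noisy signal by cross-correlation. The skewing procedure and all parameters below are invented for illustration and differ from the paper's mathematical model.

        import numpy as np

        # Sketch of a skewed-Gaussian T-wave template located by cross-correlation.
        # The skewing here multiplies a Gaussian by a logistic ramp; the paper's
        # asymmetry procedure and parameter values differ.
        def skewed_gaussian(n, center, width, skew):
            t = np.arange(n, dtype=float)
            g = np.exp(-0.5 * ((t - center) / width) ** 2)
            return g / (1.0 + np.exp(-skew * (t - center)))

        template = skewed_gaussian(40, center=20, width=6, skew=0.3)

        # Synthetic "ECG" window: template buried at offset 85 with noise.
        rng = np.random.default_rng(0)
        signal = 0.05 * rng.standard_normal(200)
        signal[85:125] += template

        corr = np.correlate(signal, template, mode="valid")
        print("estimated template offset:", int(np.argmax(corr)))  # ~85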

  5. Materials Databases Infrastructure Constructed by First Principles Calculations: A Review

    DOE PAGES

    Lin, Lianshan

    2015-10-13

    First Principles calculations, especially those based on high-throughput density functional theory (DFT), have been widely accepted as major tools in atomic-scale materials design. Emerging supercomputers, along with powerful First Principles calculations, have accumulated hundreds of thousands of crystal and compound records. The exponential growth of computational materials information urges the development of materials databases, which must not only provide storage for the daily increasing data but also remain efficient in data storage, management, querying, presentation and manipulation. This review covers the most cutting-edge materials databases in materials design and their applications, for example in fuel cells. By comparing the advantages and drawbacks of these high-throughput First Principles materials databases, an optimized computational framework can be identified to fit the needs of fuel cell applications. The further development of high-throughput DFT materials databases, which in essence accelerate materials innovation, is discussed in the summary as well.

  6. Highway rock slope management program.

    DOT National Transportation Integrated Search

    2001-06-30

    Development of a comprehensive geotechnical database for risk management of highway rock slope problems is described. Computer software selected to program the client/server application in windows environment, components and structure of the geote...

  7. Reflections on CD-ROM: Bridging the Gap between Technology and Purpose.

    ERIC Educational Resources Information Center

    Saviers, Shannon Smith

    1987-01-01

    Provides a technological overview of CD-ROM (Compact Disc-Read Only Memory), an optically-based medium for data storage offering large storage capacity, computer-based delivery system, read-only medium, and economic mass production. CD-ROM database attributes appropriate for information delivery are also reviewed, including large database size,…

  8. Automated CFD Database Generation for a 2nd Generation Glide-Back-Booster

    NASA Technical Reports Server (NTRS)

    Chaderjian, Neal M.; Rogers, Stuart E.; Aftosmis, Michael J.; Pandya, Shishir A.; Ahmad, Jasim U.; Tejmil, Edward

    2003-01-01

    A new software tool, AeroDB, is used to compute thousands of Euler and Navier-Stokes solutions for a 2nd generation glide-back booster in one week. The solution process exploits a common job-submission grid environment using 13 computers located at 4 different geographical sites. Process automation and web-based access to the database greatly reduces the user workload, removing much of the tedium and tendency for user input errors. The database consists of forces, moments, and solution files obtained by varying the Mach number, angle of attack, and sideslip angle. The forces and moments compare well with experimental data. Stability derivatives are also computed using a monotone cubic spline procedure. Flow visualization and three-dimensional surface plots are used to interpret and characterize the nature of computed flow fields.
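
    The stability-derivative step can be sketched with a monotone (PCHIP) cubic spline: fit force-coefficient samples against angle of attack and differentiate the spline. The coefficient values below are invented sample points, not AeroDB results.

        import numpy as np
        from scipy.interpolate import PchipInterpolator

        # Stability-derivative extraction with a monotone cubic spline, in the
        # spirit of the AeroDB post-processing step. Sample points are invented.
        alpha = np.array([-4.0, 0.0, 4.0, 8.0, 12.0])   # angle of attack, deg
        cl = np.array([-0.15, 0.10, 0.35, 0.55, 0.68])  # lift coefficient

        spline = PchipInterpolator(alpha, cl)
        cl_alpha = spline.derivative()                  # dCL/dalpha spline
        print("CL_alpha at 2 deg:", float(cl_alpha(2.0)), "per deg")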

  9. FY09 Final Report for LDRD Project: Understanding Viral Quasispecies Evolution through Computation and Experiment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhou, C

    2009-11-12

    In FY09 they will (1) complete the implementation, verification, calibration, and sensitivity and scalability analysis of the in-cell virus replication model; (2) complete the design of the cell culture (cell-to-cell infection) model; (3) continue the research, design, and development of their bioinformatics tools: the Web-based structure-alignment-based sequence variability tool and the functional annotation of the genome database; (4) collaborate with the University of California at San Francisco on areas of common interest; and (5) submit journal articles that describe the in-cell model with simulations and the bioinformatics approaches to evaluation of genome variability and fitness.

  10. Automated design synthesis of robotic/human workcells for improved manufacturing system design in hazardous environments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Williams, Joshua M.

    Manufacturing tasks that are deemed too hazardous for workers require the use of automation, robotics, and/or other remote handling tools. The associated hazards may be radiological or nonradiological, and based on the characteristics of the environment and processing, a design may necessitate robotic labor, human labor, or both. Other factors, such as cost, ergonomics, maintenance, and efficiency, also affect task allocation and other design choices. Handling the tradeoffs of these factors can be complex, and lack of experience can be an issue when trying to determine whether, and what, feasible automation/robotics options exist. To address this problem, we utilize common engineering design approaches adapted for manufacturing system design in hazardous environments. We limit our scope to the conceptual and embodiment design stages, specifically a computational algorithm for concept generation and early design evaluation. For concept generation, we first develop the functional model or function structure for the process, using the common 'verb-noun' format for describing function. A common language, or functional basis, for manufacturing was developed and utilized to formalize function descriptions and guide rules for function decomposition. Potential components for embodiment are also grouped in terms of this functional language and stored in a database. The properties of each component are given as quantitative and qualitative criteria. Operators are also rated on task-relevant criteria, which are used to address task compatibility. Through the gathering of process requirements/constraints, construction of the component database, and development of the manufacturing basis and rule set, design knowledge is stored and available for computer use. Thus, once the higher-level process functions are defined, the computer can automate the synthesis of new design concepts through alternating steps of embodiment and function-structure updates/decomposition. In the process, criteria guide function allocation of components/operators and help ensure compatibility and feasibility. Through multiple function-assignment options and varied function structures, multiple design concepts are created. All of the generated designs are then evaluated on a number of relevant criteria: cost, dose, ergonomics, hazards, efficiency, etc. These criteria are computed using physical properties/parameters of each system, based on the qualities an engineer would use to make evaluations. Nuclear processes such as oxide conversion and electrorefining are utilized to aid algorithm development and provide test cases for the completed program. Through our approach, we capture design knowledge related to manufacturing and other operations in hazardous environments, enabling a computational program to automatically generate and evaluate system design concepts.

  11. QuadBase2: web server for multiplexed guanine quadruplex mining and visualization

    PubMed Central

    Dhapola, Parashar; Chowdhury, Shantanu

    2016-01-01

    DNA guanine quadruplexes or G4s are non-canonical DNA secondary structures which affect genomic processes like replication, transcription and recombination. G4s are computationally identified by specific nucleotide motifs which are also called putative G4 (PG4) motifs. Despite the general relevance of these structures, there is currently no tool available that can allow batch queries and genome-wide analysis of these motifs in a user-friendly interface. QuadBase2 (quadbase.igib.res.in) presents a completely reinvented web server version of previously published QuadBase database. QuadBase2 enables users to mine PG4 motifs in up to 178 eukaryotes through the EuQuad module. This module interfaces with Ensembl Compara database, to allow users mine PG4 motifs in the orthologues of genes of interest across eukaryotes. PG4 motifs can be mined across genes and their promoter sequences in 1719 prokaryotes through ProQuad module. This module includes a feature that allows genome-wide mining of PG4 motifs and their visualization as circular histograms. TetraplexFinder, the module for mining PG4 motifs in user-provided sequences is now capable of handling up to 20 MB of data. QuadBase2 is a comprehensive PG4 motif mining tool that further expands the configurations and algorithms for mining PG4 motifs in a user-friendly way. PMID:27185890
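
    PG4 motifs of the kind these modules mine are commonly defined by four runs of three or more guanines separated by loops of one to seven bases; the sketch below applies that standard regular expression, while QuadBase2 itself supports configurable motif definitions.

        import re

        # Mining putative G-quadruplex (PG4) motifs with the widely used
        # G3+N1-7 pattern (four G-runs separated by short loops). QuadBase2
        # supports additional, configurable motif definitions.
        PG4 = re.compile(r"(G{3,}\w{1,7}){3}G{3,}", re.IGNORECASE)

        seq = "ATGGGATGGGTTAGGGTTAGGGTTAGGGCCGT"
        for m in PG4.finditer(seq):
            print(m.start(), m.group())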

  12. FDA toxicity databases and real-time data entry

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Arvidson, Kirk B.

    Structure-searchable electronic databases are valuable new tools that are assisting the FDA in its mission to promptly and efficiently review incoming submissions for regulatory approval of new food additives and food contact substances. The Center for Food Safety and Applied Nutrition's Office of Food Additive Safety (CFSAN/OFAS), in collaboration with Leadscope, Inc., is consolidating genetic toxicity data submitted in food additive petitions from the 1960s to the present day. The Center for Drug Evaluation and Research, Office of Pharmaceutical Science's Informatics and Computational Safety Analysis Staff (CDER/OPS/ICSAS) is separately gathering similar information from their submissions. Presently, these data are distributed across various locations such as paper files, microfiche, and non-standardized toxicology memoranda. The organization of the data into a consistent, searchable format will reduce paperwork, expedite the toxicology review process, and provide valuable information to industry that is currently available only to the FDA. Furthermore, by combining chemical structures with genetic toxicity information, biologically active moieties can be identified and used to develop quantitative structure-activity relationship (QSAR) modeling and testing guidelines. Additionally, chemicals devoid of toxicity data can be compared to known structures, allowing for improved safety review through the identification and analysis of structural analogs. Four database frameworks have been created: bacterial mutagenesis, in vitro chromosome aberration, in vitro mammalian mutagenesis, and in vivo micronucleus. Controlled vocabularies for these databases have been established. The four separate genetic toxicity databases are compiled into a single, structurally searchable database for easy access to the toxicity information. Beyond the genetic toxicity databases described here, additional databases for subchronic, chronic, and teratogenicity studies have been prepared.

  13. SuML: A Survey Markup Language for Generalized Survey Encoding

    PubMed Central

    Barclay, MW; Lober, WB; Karras, BT

    2002-01-01

    There is a need in clinical and research settings for a sophisticated, generalized, web based survey tool that supports complex logic, separation of content and presentation, and computable guidelines. There are many commercial and open source survey packages available that provide simple logic; few provide sophistication beyond “goto” statements; none support the use of guidelines. These tools are driven by databases, static web pages, and structured documents using markup languages such as eXtensible Markup Language (XML). We propose a generalized, guideline aware language and an implementation architecture using open source standards.

  14. Atomic interaction networks in the core of protein domains and their native folds.

    PubMed

    Soundararajan, Venkataramanan; Raman, Rahul; Raguram, S; Sasisekharan, V; Sasisekharan, Ram

    2010-02-23

    Vastly divergent sequences populate a majority of protein folds. In the quest to identify features that are conserved within protein domains belonging to the same fold, we set out to examine the entire protein universe on a fold-by-fold basis. We report that the atomic interaction network in the solvent-unexposed core of protein domains is fold-conserved, extraordinary sequence divergence notwithstanding. Further, we find that this feature, termed the protein core atomic interaction network (or PCAIN), is significantly distinguishable across different folds, thus appearing to be a "signature" of a domain's native fold. As part of this study, we computed the PCAINs for 8698 representative protein domains from families across the 1018 known protein folds to construct our seed database, and an automated framework was developed for PCAIN-based characterization of the protein fold universe. A test set of randomly selected domains that are not in the seed database was classified with over 97% accuracy, independent of sequence divergence. As an application of this novel fold signature, a PCAIN-based scoring scheme was developed for comparative (homology-based) structure prediction, with 1-2 Å (mean 1.61 Å) Cα RMSD generally observed between computed structures and reference crystal structures. Our results are consistent across the full spectrum of test domains, including those from recent CASP experiments, and most notably in the 'twilight' and 'midnight' zones wherein <30% and <10% target-template sequence identity prevails (mean twilight RMSD of 1.69 Å). We further demonstrate the utility of the PCAIN protocol to derive biological insight into protein structure-function relationships, by modeling the structure of the YopM effector novel E3 ligase (NEL) domain from the plague-causative bacterium Yersinia pestis and discussing its implications for host adaptive and innate immune modulation by the pathogen. Considering the several high-throughput, sequence-identity-independent applications demonstrated in this work, we suggest that the PCAIN is a fundamental fold feature that could be a valuable addition to the arsenal of protein modeling and analysis tools.

  15. Atomic Interaction Networks in the Core of Protein Domains and Their Native Folds

    PubMed Central

    Soundararajan, Venkataramanan; Raman, Rahul; Raguram, S.; Sasisekharan, V.; Sasisekharan, Ram

    2010-01-01

    Vastly divergent sequences populate a majority of protein folds. In the quest to identify features that are conserved within protein domains belonging to the same fold, we set out to examine the entire protein universe on a fold-by-fold basis. We report that the atomic interaction network in the solvent-unexposed core of protein domains is fold-conserved, extraordinary sequence divergence notwithstanding. Further, we find that this feature, termed the protein core atomic interaction network (or PCAIN), is significantly distinguishable across different folds, thus appearing to be a “signature” of a domain's native fold. As part of this study, we computed the PCAINs for 8698 representative protein domains from families across the 1018 known protein folds to construct our seed database, and an automated framework was developed for PCAIN-based characterization of the protein fold universe. A test set of randomly selected domains that are not in the seed database was classified with over 97% accuracy, independent of sequence divergence. As an application of this novel fold signature, a PCAIN-based scoring scheme was developed for comparative (homology-based) structure prediction, with 1–2 Å (mean 1.61 Å) Cα RMSD generally observed between computed structures and reference crystal structures. Our results are consistent across the full spectrum of test domains, including those from recent CASP experiments, and most notably in the ‘twilight’ and ‘midnight’ zones wherein <30% and <10% target-template sequence identity prevails (mean twilight RMSD of 1.69 Å). We further demonstrate the utility of the PCAIN protocol to derive biological insight into protein structure-function relationships, by modeling the structure of the YopM effector novel E3 ligase (NEL) domain from the plague-causative bacterium Yersinia pestis and discussing its implications for host adaptive and innate immune modulation by the pathogen. Considering the several high-throughput, sequence-identity-independent applications demonstrated in this work, we suggest that the PCAIN is a fundamental fold feature that could be a valuable addition to the arsenal of protein modeling and analysis tools. PMID:20186337
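
    A toy sketch of the network-construction idea: connect residues whose C-alpha atoms fall within a distance cutoff. The PCAIN itself is built from atomic interactions restricted to the solvent-unexposed core; the coordinates and 8 Å cutoff below are illustrative only.

        import numpy as np

        # Toy contact network: an edge between residues whose C-alpha atoms
        # lie within a distance cutoff. The PCAIN additionally restricts
        # attention to solvent-unexposed core atoms; values are illustrative.
        coords = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0],
                           [7.6, 0.0, 0.0], [3.8, 3.8, 0.0]])

        def contact_network(xyz, cutoff=8.0):
            diff = xyz[:, None, :] - xyz[None, :, :]
            dist = np.sqrt((diff ** 2).sum(axis=-1))
            i, j = np.where((dist < cutoff) & (np.triu(np.ones_like(dist), k=1) > 0))
            return list(zip(i.tolist(), j.tolist()))

        print(contact_network(coords))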

  16. Aerodynamic Database Development for Mars Smart Lander Vehicle Configurations

    NASA Technical Reports Server (NTRS)

    Bobskill, Glenn J.; Parikh, Paresh C.; Prabhu, Ramadas K.; Tyler, Erik D.

    2002-01-01

    An aerodynamic database has been generated for the Mars Smart Lander Shelf-All configuration using computational fluid dynamics (CFD) simulations. Three different CFD codes were used: USM3D and FELISA, which are based on unstructured grid technology, and LAURA, an established and validated structured-grid CFD code. As part of this database development, the results for the Mars continuum were validated with experimental data and comparisons made where applicable. The validation of USM3D and LAURA against the Unitary experimental data, the use of intermediate LAURA check analyses, and the validation of FELISA against the Mach 6 CF4 experimental data provided higher confidence in the ability of CFD to provide aerodynamic data for determining the static trim characteristics for longitudinal stability. The analyses of the noncontinuum regime showed the existence of multiple trim angles of attack, which can be stable or unstable trim points. This information is needed to design the guidance controller throughout the trajectory.

  17. Non-Coding RNA Analysis Using the Rfam Database.

    PubMed

    Kalvari, Ioanna; Nawrocki, Eric P; Argasinska, Joanna; Quinones-Olvera, Natalia; Finn, Robert D; Bateman, Alex; Petrov, Anton I

    2018-06-01

    Rfam is a database of non-coding RNA families in which each family is represented by a multiple sequence alignment, a consensus secondary structure, and a covariance model. Using a combination of manual and literature-based curation and a custom software pipeline, Rfam converts descriptions of RNA families found in the scientific literature into computational models that can be used to annotate RNAs belonging to those families in any DNA or RNA sequence. Valuable research outputs that are often locked up in figures and supplementary information files are encapsulated in Rfam entries and made accessible through the Rfam Web site. The data produced by Rfam have a broad application, from genome annotation to providing training sets for algorithm development. This article gives an overview of how to search and navigate the Rfam Web site, and how to annotate sequences with RNA families. The Rfam database is freely available at http://rfam.org.

  18. Drug Repositioning by Kernel-Based Integration of Molecular Structure, Molecular Activity, and Phenotype Data

    PubMed Central

    Wang, Yongcui; Chen, Shilong; Deng, Naiyang; Wang, Yong

    2013-01-01

    Computational inference of novel therapeutic values for existing drugs, i.e., drug repositioning, offers great prospects for faster, lower-risk drug development. Previous research has indicated that chemical structures, target proteins, and side-effects can provide rich information in drug similarity assessment and, further, disease similarity. However, each single data source is important in its own way, and data integration holds great promise for repositioning drugs more accurately. Here, we propose a new method for drug repositioning, PreDR (Predict Drug Repositioning), that integrates molecular structure, molecular activity, and phenotype data. Specifically, we characterize drugs by profiling them in chemical structure, target protein, and side-effect space, and define a kernel function to correlate drugs with diseases. We then train a support vector machine (SVM) to computationally predict novel drug-disease interactions. PreDR is validated on a well-established drug-disease network with 1,933 interactions among 593 drugs and 313 diseases. By cross-validation, we find that chemical structure, drug target, and side-effect information are all predictive of drug-disease relationships. More experimentally observed drug-disease interactions can be revealed by integrating these three data sources. Comparison with existing methods demonstrates that PreDR is competitive both in accuracy and in coverage. Follow-up database searches and pathway analysis indicate that our new predictions are worthy of further experimental validation. In particular, several novel predictions are supported by clinical trials databases, which shows the significant prospects of PreDR in future drug treatment. In conclusion, our new method, PreDR, can serve as a useful tool in drug discovery to efficiently identify novel drug-disease interactions. In addition, our heterogeneous data integration framework can be applied to other problems. PMID:24244318
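
    A minimal sketch of the kernel-integration step under stated assumptions: three per-source similarity matrices (standing in for chemical-structure, target-protein and side-effect kernels) are averaged into one kernel and passed to an SVM with a precomputed kernel, as scikit-learn supports. PreDR's actual kernel function and data differ.

        import numpy as np
        from sklearn.svm import SVC

        # Kernel-based data integration sketch: average per-source similarity
        # kernels and train an SVM on the combined kernel. The three random
        # matrices stand in for chemical, target and side-effect similarities.
        rng = np.random.default_rng(0)
        n = 40

        def random_kernel():
            m = rng.standard_normal((n, 10))
            return m @ m.T / 10.0            # positive semi-definite

        k_chem, k_target, k_side = (random_kernel() for _ in range(3))
        K = (k_chem + k_target + k_side) / 3.0   # simple average of kernels
        y = rng.integers(0, 2, size=n)           # toy interaction labels

        clf = SVC(kernel="precomputed").fit(K, y)
        print("training accuracy:", clf.score(K, y))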

  19. Role of Open Source Tools and Resources in Virtual Screening for Drug Discovery.

    PubMed

    Karthikeyan, Muthukumarasamy; Vyas, Renu

    2015-01-01

    Advances in chemoinformatics research, in parallel with the availability of high-performance computing platforms, have made the handling of large-scale, multi-dimensional scientific data for high-throughput drug discovery easier. In this study we have explored publicly available molecular databases with the help of open-source-based, integrated in-house molecular informatics tools for virtual screening. The virtual screening literature of the past decade has been extensively investigated and thoroughly analyzed to reveal interesting patterns with respect to the drug, target, scaffold and disease space. The review also focuses on integrated chemoinformatics tools that are capable of harvesting chemical data from textual literature information and transforming it into truly computable chemical structures, identifying unique fragments and scaffolds from a class of compounds, automatically generating focused virtual libraries, computing molecular descriptors for structure-activity relationship studies, and applying conventional filters used in lead discovery along with in-house-developed exhaustive PTC (Pharmacophore, Toxicophore and Chemophore) filters and machine learning tools for the design of potential disease-specific inhibitors. A case study on kinase inhibitors is provided as an example.

  20. A new relational database structure and online interface for the HITRAN database

    NASA Astrophysics Data System (ADS)

    Hill, Christian; Gordon, Iouli E.; Rothman, Laurence S.; Tennyson, Jonathan

    2013-11-01

    A new format for the HITRAN database is proposed. By storing the line-transition data in a number of linked tables described by a relational database schema, it is possible to overcome the limitations of the existing format, which have become increasingly apparent over the last few years as new and more varied data are being used by radiative-transfer models. Although the database in the new format can be searched using the well-established Structured Query Language (SQL), a web service, HITRANonline, has been deployed to allow users to make most common queries of the database using a graphical user interface in a web page. The advantages of the relational form of the database to ensuring data integrity and consistency are explored, and the compatibility of the online interface with the emerging standards of the Virtual Atomic and Molecular Data Centre (VAMDC) project is discussed. In particular, the ability to access HITRAN data using a standard query language from other websites, command line tools and from within computer programs is described.
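
    The kind of query the relational form enables can be sketched with SQLite: select line transitions by molecule and wavenumber range. Table and column names here are illustrative, not the actual HITRANonline schema, and the line parameters are made-up placeholders.

        import sqlite3

        # Sketch of a relational line-transition query. Schema and values are
        # invented placeholders, not the actual HITRANonline database.
        con = sqlite3.connect(":memory:")
        con.execute("""CREATE TABLE transitions (
                           molecule TEXT, nu REAL, sw REAL, elower REAL)""")
        con.executemany("INSERT INTO transitions VALUES (?,?,?,?)",
                        [("H2O", 1554.353, 1.2e-21, 42.3),
                         ("H2O", 1558.601, 8.7e-22, 134.9),
                         ("CO2", 2349.145, 3.5e-18, 0.0)])

        rows = con.execute("""SELECT nu, sw FROM transitions
                              WHERE molecule = ? AND nu BETWEEN ? AND ?
                              ORDER BY nu""", ("H2O", 1500.0, 1600.0)).fetchall()
        print(rows)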

  1. Tensoral for post-processing users and simulation authors

    NASA Technical Reports Server (NTRS)

    Dresselhaus, Eliot

    1993-01-01

    The CTR post-processing effort aims to make turbulence simulations and data more readily and usefully available to the research and industrial communities. The Tensoral language, which provides the foundation for this effort, is introduced here in the form of a user's guide. The Tensoral user's guide is presented in two main sections. Section one acts as a general introduction and guides database users who wish to post-process simulation databases. Section two gives a brief description of how database authors and other advanced users can make simulation codes and/or the databases they generate available to the user community via Tensoral database back ends. The two-part structure of this document conforms to the two-level design structure of the Tensoral language. Tensoral has been designed to be a general computer language for performing tensor calculus and statistics on numerical data. Tensoral's generality allows it to be used for stand-alone native coding of high-level post-processing tasks (as described in section one of this guide). At the same time, Tensoral's specialization to a minute task (namely, to numerical tensor calculus and statistics) allows it to be easily embedded into applications written partly in Tensoral and partly in other computer languages (here, C and Vectoral). Embedded Tensoral, aimed at advanced users for more general coding (e.g. of efficient simulations, for interfacing with pre-existing software, for visualization, etc.), is described in section two of this guide.

  2. Self Organizing Map-Based Classification of Cathepsin k and S Inhibitors with Different Selectivity Profiles Using Different Structural Molecular Fingerprints: Design and Application for Discovery of Novel Hits.

    PubMed

    Ihmaid, Saleh K; Ahmed, Hany E A; Zayed, Mohamed F; Abadleh, Mohammed M

    2016-01-30

    The main step in a successful drug discovery pipeline is the identification of small potent compounds that selectively bind to the target of interest with high affinity. However, there is still a shortage of efficient and accurate computational methods with the power to study, and hence predict, compound selectivity properties. In this work, we propose an affordable machine learning method to perform compound selectivity classification and prediction. For this purpose, we collected compounds with reported activity and built a selectivity database of 153 cathepsin K and S inhibitors that are considered of medicinal interest. This database comprises three compound sets: two selective ones (K/S and S/K) and one non-selective one (KS). We subjected this database to the selectivity classification tool 'Emergent Self-Organizing Maps' to explore its capability to differentiate inhibitors selective for one cathepsin over the other. The method exhibited good clustering performance for selective ligands, with high accuracy (up to 100%). Among the possibilities, BAPs and MACCS molecular structural fingerprints were used for the classification. The results demonstrate the ability of the method to support structure-selectivity relationship interpretation, and selectivity markers were identified for the design of further novel inhibitors with high activity and target selectivity.
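
    A sketch of the fingerprint-plus-SOM workflow under stated assumptions: MACCS keys computed with RDKit are clustered on a small self-organizing map via the minisom package; the SMILES, map size and training settings are arbitrary placeholders, and the paper's Emergent SOM tool and curated inhibitor set are not reproduced here.

        import numpy as np
        from minisom import MiniSom          # pip install minisom
        from rdkit import Chem               # pip install rdkit
        from rdkit.Chem import MACCSkeys

        # SOM clustering of MACCS fingerprints. The molecules and map size
        # below are arbitrary placeholders, not the paper's dataset.
        smiles = ["CCO", "CCN", "c1ccccc1", "c1ccccc1O", "CC(=O)O"]
        fps = np.array([np.array(MACCSkeys.GenMACCSKeys(Chem.MolFromSmiles(s)))
                        for s in smiles], dtype=float)

        som = MiniSom(4, 4, fps.shape[1], sigma=1.0, learning_rate=0.5, random_seed=0)
        som.train_random(fps, 500)

        for s, fp in zip(smiles, fps):
            print(s, "->", som.winner(fp))   # grid cell each molecule maps to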

  3. PDB explorer -- a web based algorithm for protein annotation viewer and 3D visualization.

    PubMed

    Nayarisseri, Anuraj; Shardiwal, Rakesh Kumar; Yadav, Mukesh; Kanungo, Neha; Singh, Pooja; Shah, Pratik; Ahmed, Sheaza

    2014-12-01

    The PDB file format is a text format characterizing the three-dimensional structures of macromolecules held in the Protein Data Bank (PDB). Determined protein structures are often found in association with other molecules or ions, such as nucleic acids, water, ions and drug molecules, which can likewise be described in the PDB format and have been deposited in the PDB database. A PDB file is machine generated and not easily human readable; a computational tool is needed to interpret it. The objective of our present study is to develop free online software for the retrieval, visualization and reading of annotation of a protein 3D structure available in the PDB database. The main aim is to render the PDB file in human-readable form, i.e., to convert the information in a PDB file into readable sentences. It displays all possible information from a PDB file, including the 3D structure of that file. Programming and scripting languages such as Perl, CSS, JavaScript, Ajax, and HTML have been used for the development of PDB Explorer. The PDB Explorer directly parses the PDB file, calling methods for each parsed element: secondary structure elements, atoms, coordinates, etc. PDB Explorer is freely available at http://www.pdbexplorer.eminentbio.com/home, with no requirement for log-in.
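
    Parsing the fixed-column ATOM records is the core of any such tool; the sketch below extracts atom name, residue, chain and coordinates per the wwPDB column layout. PDB Explorer itself handles many more record types and renders them as sentences.

        # Minimal parse of ATOM records from PDB text using the fixed column
        # layout of the format. A real viewer handles many more record types.
        pdb_text = """\
        ATOM      1  N   MET A   1      38.198  19.582   8.211  1.00 25.00           N
        ATOM      2  CA  MET A   1      38.766  20.875   7.790  1.00 24.50           C
        """

        for line in pdb_text.splitlines():
            line = line.strip() and line.lstrip() or line
            line = line.lstrip()
            if line.startswith(("ATOM", "HETATM")):
                name = line[12:16].strip()
                res = line[17:20].strip()
                chain = line[21]
                x, y, z = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
                print(f"{res} {chain} {name}: ({x:.3f}, {y:.3f}, {z:.3f})")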

  4. VaProS: a database-integration approach for protein/genome information retrieval.

    PubMed

    Gojobori, Takashi; Ikeo, Kazuho; Katayama, Yukie; Kawabata, Takeshi; Kinjo, Akira R; Kinoshita, Kengo; Kwon, Yeondae; Migita, Ohsuke; Mizutani, Hisashi; Muraoka, Masafumi; Nagata, Koji; Omori, Satoshi; Sugawara, Hideaki; Yamada, Daichi; Yura, Kei

    2016-12-01

    Life science research now heavily relies on all sorts of databases for genome sequences, transcription, protein three-dimensional (3D) structures, protein-protein interactions, phenotypes and so forth. The knowledge accumulated by all the omics research is so vast that a computer-aided search of data is now a prerequisite for starting a new study. In addition, a combinatory search throughout these databases has a chance to extract new ideas and new hypotheses that can be examined by wet-lab experiments. By virtually integrating the related databases on the Internet, we have built a new web application that helps life science researchers retrieve experts' knowledge stored in the databases and build new hypotheses about the research target. This web application, named VaProS, puts stress on the interconnection between the functional information of genome sequences and protein 3D structures, such as the structural effects of gene mutations. In this manuscript, we present the notion of VaProS, the databases and tools that can be accessed without any knowledge of database locations and data formats, and the power of search exemplified in the quest for the molecular mechanisms of lysosomal storage disease. VaProS can be freely accessed at http://p4d-info.nig.ac.jp/vapros/.

  5. Review of Spatial-Database System Usability: Recommendations for the ADDNS Project

    DTIC Science & Technology

    2007-12-01

    basic GIS background information, with a closer look at spatial databases. A GIS is also a computer-based system designed to capture, manage... foundation for deploying enterprise-wide spatial information systems. According to Oracle® [18], it enables accurate delivery of location-based services...

  6. An Inverse Interpolation Method Utilizing In-Flight Strain Measurements for Determining Loads and Structural Response of Aerospace Vehicles

    NASA Technical Reports Server (NTRS)

    Shkarayev, S.; Krashantisa, R.; Tessler, A.

    2004-01-01

    An important and challenging technology aimed at the next generation of aerospace vehicles is that of structural health monitoring. The key problem is to determine accurately, reliably, and in real time the applied loads, stresses, and displacements experienced in flight, with such data establishing an information database for structural health monitoring. The present effort is aimed at developing a finite element-based methodology involving an inverse formulation that employs measured surface strains to recover the applied loads, stresses, and displacements in an aerospace vehicle in real time. The computational procedure uses a standard finite element model (i.e., "direct analysis") of a given airframe, with the subsequent application of the inverse interpolation approach. The inverse interpolation formulation is based on a parametric approximation of the loading and is further constructed through a least-squares minimization of calculated and measured strains. This procedure results in the governing system of linear algebraic equations, providing the unknown coefficients that accurately define the load approximation. Numerical simulations are carried out for problems involving various levels of structural approximation. These include plate-loading examples and an aircraft wing box. Accuracy and computational efficiency of the proposed method are discussed in detail. The experimental validation of the methodology by way of structural testing of an aircraft wing is also discussed.
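
    A sketch of the inverse step under stated assumptions: measured strains are modelled as a linear combination of strain fields from unit load cases (obtained from the direct finite element analysis), and the load coefficients are recovered by least squares. The sensitivity matrix below is random stand-in data, not an airframe model.

        import numpy as np

        # Inverse load recovery sketch: strains = A @ p, solve for p by
        # least squares. A would come from the direct FE analysis; here it
        # is random stand-in data.
        rng = np.random.default_rng(1)
        n_sensors, n_loads = 12, 3

        A = rng.standard_normal((n_sensors, n_loads))   # strain per unit load case
        p_true = np.array([2.0, -1.0, 0.5])             # "actual" load coefficients
        strains = A @ p_true + 1e-3 * rng.standard_normal(n_sensors)  # measurements

        p_est, *_ = np.linalg.lstsq(A, strains, rcond=None)
        print("recovered load coefficients:", np.round(p_est, 3))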

  7. THGS: a web-based database of Transmembrane Helices in Genome Sequences

    PubMed Central

    Fernando, S. A.; Selvarani, P.; Das, Soma; Kumar, Ch. Kiran; Mondal, Sukanta; Ramakumar, S.; Sekar, K.

    2004-01-01

    Transmembrane Helices in Genome Sequences (THGS) is an interactive web-based database, developed to search the transmembrane helices in the user-interested gene sequences available in the Genome Database (GDB). The proposed database has provision to search sequence motifs in transmembrane and globular proteins. In addition, the motif can be searched in the other sequence databases (Swiss-Prot and PIR) or in the macromolecular structure database, Protein Data Bank (PDB). Further, the 3D structure of the corresponding queried motif, if it is available in the solved protein structures deposited in the Protein Data Bank, can also be visualized using the widely used graphics package RASMOL. All the sequence databases used in the present work are updated frequently and hence the results produced are up to date. The database THGS is freely available via the world wide web and can be accessed at http://pranag.physics.iisc.ernet.in/thgs/ or http://144.16.71.10/thgs/. PMID:14681375

  8. Design of Control Plane Architecture Based on Cloud Platform and Experimental Network Demonstration for Multi-domain SDON

    NASA Astrophysics Data System (ADS)

    Li, Ming; Yin, Hongxi; Xing, Fangyuan; Wang, Jingchao; Wang, Honghuan

    2016-02-01

    With the features of network virtualization and resource programming, the Software Defined Optical Network (SDON) is considered the future development trend of optical networks, providing more flexible, efficient and open network functions and supporting intraconnection and interconnection of data centers. Meanwhile, cloud platforms can provide powerful computing, storage and management capabilities. In this paper, with the coordination of SDON and a cloud platform, a multi-domain SDON architecture based on a cloud control plane is proposed, composed of data centers with a database (DB), a path computation element (PCE), an SDON controller and an orchestrator. In addition, the structures of the multi-domain SDON orchestrator and the OpenFlow-enabled optical node are proposed to realize a combined centralized and distributed management and control platform. Finally, functional verification and demonstration are performed on our optical experiment network.

  9. DTREEv2, a computer-based support system for the risk assessment of genetically modified plants.

    PubMed

    Pertry, Ine; Nothegger, Clemens; Sweet, Jeremy; Kuiper, Harry; Davies, Howard; Iserentant, Dirk; Hull, Roger; Mezzetti, Bruno; Messens, Kathy; De Loose, Marc; de Oliveira, Dulce; Burssens, Sylvia; Gheysen, Godelieve; Tzotzos, George

    2014-03-25

    Risk assessment of genetically modified organisms (GMOs) remains a contentious area and a major factor influencing the adoption of agricultural biotech. Methodologically, in many countries, risk assessment is conducted by expert committees with little or no recourse to databases and expert systems that can facilitate the risk assessment process. In this paper we describe DTREEv2, a computer-based decision support system for the identification of hazards related to the introduction of GM-crops into the environment. DTREEv2 structures hazard identification and evaluation by means of an Event-Tree type of analysis. The system produces an output flagging identified hazards and potential risks. It is intended to be used for the preparation and evaluation of biosafety dossiers and, as such, its usefulness extends to researchers, risk assessors and regulators in government and industry.

  10. 48 CFR 352.227-14 - Rights in Data-Exceptional Circumstances.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    [...] Computer database or database means a collection of recorded information in a form capable of, and for the... databases or computer software documentation. Computer software documentation means owner's manuals, user's... nature (including computer databases and computer software documentation). This term does not include...

  11. Computation of Flow Through Water-Control Structures Using Program DAMFLO.2

    USGS Publications Warehouse

    Sanders, Curtis L.; Feaster, Toby D.

    2004-01-01

    As part of its mission to collect, analyze, and store streamflow data, the U.S. Geological Survey computes flow through several dam structures throughout the country. Flows are computed using hydraulic equations that describe flow through sluice and Tainter gates, crest gates, lock gates, spillways, locks, pumps, and siphons, which are calibrated using flow measurements. The program DAMFLO.2 was written to compute, tabulate, and plot flow through dam structures using data that describe the physical properties of dams and various hydraulic parameters and ratings that use time-varying data, such as lake elevations or gate openings. The program uses electronic computer files of time-varying data, such as lake elevation or gate openings, retrieved from the U.S. Geological Survey Automated Data Processing System. Computed time-varying flow data from DAMFLO.2 are output in flat files, which can be entered into the Automated Data Processing System database. All computations are made in units of feet and seconds. DAMFLO.2 uses the procedures and language developed by the SAS Institute Inc.
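
    One hydraulic relation of the kind DAMFLO.2 calibrates can be sketched as free orifice flow through a sluice gate, Q = Cd * A * sqrt(2 g H), with the discharge coefficient fitted to flow measurements; the gate geometry and coefficient below are illustrative values in feet and seconds, matching the program's units.

        import math

        # Free orifice flow through a sluice gate: Q = Cd * b * w * sqrt(2 g H).
        # Cd is calibrated against flow measurements in practice; all values
        # here are illustrative, in feet and seconds as in DAMFLO.2.
        def sluice_gate_flow(cd, gate_width_ft, gate_opening_ft, head_ft, g=32.2):
            area = gate_width_ft * gate_opening_ft
            return cd * area * math.sqrt(2.0 * g * head_ft)   # cubic feet per second

        print(f"{sluice_gate_flow(0.61, 10.0, 2.0, 12.0):.1f} cfs")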

  12. Computer-Assisted Promotion of Recreational Opportunities in Natural Resource Areas: A Demonstration and Case Example

    Treesearch

    Emilyn Sheffield; Leslie Furr; Charles Nelson

    1992-01-01

    Filevision IV is a multilayer imaging and database management system that combines drawing, filing and extensive report-writing capabilities (Filevision IV, 1988). Filevision IV users access data by attaching graphics to text-oriented database records. Tourist attractions, support services, and geographic features can be located on a base map of an area or region....

  13. GLYDE-II: The GLYcan data exchange format

    PubMed Central

    Ranzinger, Rene; Kochut, Krys J.; Miller, John A.; Eavenson, Matthew; Lütteke, Thomas; York, William S.

    2017-01-01

    Summary: The GLYcan Data Exchange (GLYDE) standard has been developed for the representation of the chemical structures of monosaccharides, glycans and glycoconjugates using a connection table formalism formatted in XML. This format allows structures, including those that do not exist in any database, to be unambiguously represented and shared by diverse computational tools. GLYDE implements a partonomy model based on human language along with rules that provide consistent structural representations, including a robust namespace for specifying monosaccharides. This approach facilitates the reuse of data processing software at the level of granularity that is most appropriate for extraction of the desired information. GLYDE-II has already been used as a key element of several glycoinformatics tools. The philosophical and technical underpinnings of GLYDE-II and recent implementation of its enhanced features are described. PMID:28955652

  14. Fine-grained parallelism accelerating for RNA secondary structure prediction with pseudoknots based on FPGA.

    PubMed

    Xia, Fei; Jin, Guoqing

    2014-06-01

    PKNOTS is one of the most famous benchmark programs and has been widely used to predict RNA secondary structure including pseudoknots. It adopts the standard four-dimensional (4D) dynamic programming (DP) method and is the basis of many variants and improved algorithms. Unfortunately, the O(N^6) computing requirements and complicated data dependencies greatly limit the usefulness of the PKNOTS package as gene databases grow explosively. In this paper, we present a fine-grained parallel PKNOTS package and prototype system for accelerating the RNA folding application based on an FPGA chip. We adopted a series of storage optimization strategies to resolve the "memory wall" problem. We aggressively exploit parallel computing strategies to improve computational efficiency. We also propose several methods that collectively reduce the storage requirements for FPGA on-chip memory. To the best of our knowledge, our design is the first FPGA implementation to accelerate the 4D DP problem of RNA folding with pseudoknots. The experimental results show an average speedup of more than 50x over the PKNOTS-1.08 software running on a PC platform with an Intel Core2 Q9400 quad-core CPU for input RNA sequences. Moreover, the power consumption of our FPGA accelerator is only about 50% of that of general-purpose microprocessors.

  15. A PDB-wide, evolution-based assessment of protein-protein interfaces.

    PubMed

    Baskaran, Kumaran; Duarte, Jose M; Biyani, Nikhil; Bliven, Spencer; Capitani, Guido

    2014-10-18

    Thanks to the growth in sequence and structure databases, more than 50 million sequences are now available in UniProt and 100,000 structures in the PDB. Rich information about protein-protein interfaces can be obtained by a comprehensive study of protein contacts in the PDB, their sequence conservation and geometric features. An automated computational pipeline was developed to run our Evolutionary Protein-Protein Interface Classifier (EPPIC) software on the entire PDB and store the results in a relational database, currently containing > 800,000 interfaces. This allows the analysis of interface data on a PDB-wide scale. Two large benchmark datasets of biological interfaces and crystal contacts, each containing about 3000 entries, were automatically generated based on criteria thought to be strong indicators of interface type. The BioMany set of biological interfaces includes NMR dimers solved as crystal structures and interfaces that are preserved across diverse crystal forms, as catalogued by the Protein Common Interface Database (ProtCID) from Xu and Dunbrack. The second dataset, XtalMany, is derived from interfaces that would lead to infinite assemblies and are therefore crystal contacts. BioMany and XtalMany were used to benchmark the EPPIC approach. The performance of EPPIC was also compared to classifications from the Protein Interfaces, Surfaces, and Assemblies (PISA) program on a PDB-wide scale, finding that the two approaches give the same call in about 88% of PDB interfaces. By comparing our safest predictions to the PDB author annotations, we provide a lower-bound estimate of the error rate of biological unit annotations in the PDB. Additionally, we developed a PyMOL plugin for direct download and easy visualization of EPPIC interfaces for any PDB entry. Both the datasets and the PyMOL plugin are available at http://www.eppic-web.org/ewui/#downloads. Our computational pipeline allows us to analyze protein-protein contacts and their sequence conservation across the entire PDB. Two new benchmark datasets are provided, which are over an order of magnitude larger than existing manually curated ones. These tools enable the comprehensive study of several aspects of protein-protein contacts in the PDB and represent a basis for future, even larger scale studies of protein-protein interactions.

  16. Parallel algorithms for large-scale biological sequence alignment on Xeon-Phi based clusters.

    PubMed

    Lan, Haidong; Chan, Yuandong; Xu, Kai; Schmidt, Bertil; Peng, Shaoliang; Liu, Weiguo

    2016-07-19

    Computing alignments between two or more sequences is a common operation frequently performed in computational molecular biology. The continuing growth of biological sequence databases establishes the need for their efficient parallel implementation on modern accelerators. This paper presents new approaches to high-performance biological sequence database scanning with the Smith-Waterman algorithm and the first stage of progressive multiple sequence alignment based on the ClustalW heuristic on a Xeon Phi-based compute cluster. Our approach uses a three-level parallelization scheme to take full advantage of the compute power available on this type of architecture; i.e. cluster-level data parallelism, thread-level coarse-grained parallelism, and vector-level fine-grained parallelism. Furthermore, we re-organize the sequence datasets and use Xeon Phi shuffle operations to improve I/O efficiency. Evaluations show that our method achieves a peak overall performance of up to 220 GCUPS for scanning real protein sequence databanks on a single node consisting of two Intel E5-2620 CPUs and two Intel Xeon Phi 7110P cards. It also exhibits good scalability in terms of sequence length, database size, and number of compute nodes for both database scanning and multiple sequence alignment. Furthermore, the achieved performance is highly competitive with optimized Xeon Phi and GPU implementations. Our implementation is available at https://github.com/turbo0628/LSDBS-mpi.
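
    To make the underlying kernel concrete, here is a minimal serial sketch of the Smith-Waterman local-alignment recurrence in Python; the scoring parameters are illustrative, and the paper's actual contribution, the three-level parallelization of this dynamic program, is not shown.

    ```python
    import numpy as np

    def smith_waterman(seq1, seq2, match=2, mismatch=-1, gap=-2):
        """Serial Smith-Waterman local alignment score (linear gap penalty)."""
        m, n = len(seq1), len(seq2)
        H = np.zeros((m + 1, n + 1), dtype=np.int32)
        best = 0
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                s = match if seq1[i - 1] == seq2[j - 1] else mismatch
                H[i, j] = max(0,
                              H[i - 1, j - 1] + s,   # match/mismatch
                              H[i - 1, j] + gap,     # gap in seq2
                              H[i, j - 1] + gap)     # gap in seq1
                best = max(best, H[i, j])
        return best

    print(smith_waterman("HEAGAWGHEE", "PAWHEAE"))  # best local score
    ```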

  17. Development of a Boundary Layer Property Interpolation Tool in Support of Orbiter Return To Flight

    NASA Technical Reports Server (NTRS)

    Greene, Francis A.; Hamilton, H. Harris

    2006-01-01

    A new tool was developed to predict the boundary layer quantities required by several physics-based predictive/analytic methods that assess damaged Orbiter tile. This new tool, the Boundary Layer Property Prediction (BLPROP) tool, supplies boundary layer values used in correlations that determine boundary layer transition onset and surface heating-rate augmentation/attenuation factors inside tile gouges (i.e. cavities). BLPROP interpolates through a database of computed solutions and provides boundary layer and wall data (delta, theta, Re(sub theta)/M(sub e), Re(sub theta), P(sub w), and q(sub w)) based on user-input surface location and free-stream conditions. Surface locations are limited to the Orbiter's windward surface. Constructed using predictions from an inviscid/boundary-layer method and benchmark viscous CFD, the computed database covers the hypersonic continuum flight regime based on two reference flight trajectories. First-order one-dimensional Lagrange interpolation accounts for Mach number and angle-of-attack variations, whereas non-dimensional normalization accounts for differences between the reference and input Reynolds number. Employing the same computational methods used to construct the database, solutions at other trajectory points taken from previous STS flights were computed; these results validate the BLPROP algorithm. Percentage differences between interpolated and computed values are presented and are used to establish the level of uncertainty of the new tool.
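
    First-order one-dimensional Lagrange interpolation is simply two-point linear interpolation; a minimal sketch follows. The bracketing Mach numbers and boundary-layer values below are hypothetical, not taken from the BLPROP database.

    ```python
    def lagrange_linear(x, x0, y0, x1, y1):
        """First-order (two-point) Lagrange interpolation in one variable."""
        return y0 * (x - x1) / (x0 - x1) + y1 * (x - x0) / (x1 - x0)

    # Hypothetical example: a boundary-layer quantity at Mach 22, bracketed
    # by database solutions at Mach 20 and Mach 24 (values are made up).
    delta = lagrange_linear(22.0, 20.0, 0.31, 24.0, 0.27)
    print(delta)  # ~0.29
    ```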

  18. Mitigating component performance variation

    DOEpatents

    Gara, Alan G.; Sylvester, Steve S.; Eastep, Jonathan M.; Nagappan, Ramkumar; Cantalupo, Christopher M.

    2018-01-09

    Apparatus and methods may provide for characterizing a plurality of similar components of a distributed computing system based on a maximum safe operation level associated with each component, storing the characterization data in a database, and allocating non-uniform power to each similar component, based at least in part on the characterization data in the database, to substantially equalize performance of the components.

  19. A statistical view of FMRFamide neuropeptide diversity.

    PubMed

    Espinoza, E; Carrigan, M; Thomas, S G; Shaw, G; Edison, A S

    2000-01-01

    FMRFamide-like peptide (FLP) amino acid sequences have been collected and statistically analyzed. FLP amino acid composition as a function of position in the peptide is graphically presented for several major phyla. Results of total amino acid composition and frequencies of pairs of FLP amino acids have been computed and compared with corresponding values from the entire GenBank protein sequence database. The data for pairwise distributions of amino acids should help in future structure-function studies of FLPs. To aid in future peptide discovery, a computer program and search protocol were developed to identify FLPs from the GenBank protein database without the use of keywords.
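
    A keyword-free motif scan of this kind can be sketched in a few lines. The pattern below (an amidated -RFamide signature, i.e. R-F followed by the glycine amidation signal and a basic cleavage residue) is a simplified assumption for illustration, not the authors' actual search protocol.

    ```python
    import re

    # Simplified FLP signature: 2-10 residues, then R-F-G and a K/R cleavage site.
    FLP_PATTERN = re.compile(r"[A-Z]{2,10}RFG[KR]")

    def find_flp_candidates(precursor_seq):
        """Return candidate FMRFamide-like peptide matches in a precursor."""
        return [m.group() for m in FLP_PATTERN.finditer(precursor_seq)]

    # Toy precursor containing two amidated peptide copies with cleavage sites
    print(find_flp_candidates("MKLSAFLRFGKRDDEEFMRFGKSPA"))
    ```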

  20. Comprehensive Thematic T-Matrix Reference Database: A 2015-2017 Update

    NASA Technical Reports Server (NTRS)

    Mishchenko, Michael I.; Zakharova, Nadezhda; Khlebtsov, Nikolai G.; Videen, Gorden; Wriedt, Thomas

    2017-01-01

    The T-matrix method pioneered by Peter C. Waterman is one of the most versatile and efficient numerically exact computer solvers of the time-harmonic macroscopic Maxwell equations. It is widely used for the computation of electromagnetic scattering by single and composite particles, discrete random media, periodic structures (including metamaterials), and particles in the vicinity of plane or rough interfaces separating media with different refractive indices. This paper is the eighth update to the comprehensive thematic database of peer-reviewed T-matrix publications initiated in 2004 and lists relevant publications that have appeared since 2015. It also references a small number of earlier publications overlooked previously.

  1. Comprehensive thematic T-matrix reference database: A 2015-2017 update

    NASA Astrophysics Data System (ADS)

    Mishchenko, Michael I.; Zakharova, Nadezhda T.; Khlebtsov, Nikolai G.; Videen, Gorden; Wriedt, Thomas

    2017-11-01

    The T-matrix method pioneered by Peter C. Waterman is one of the most versatile and efficient numerically exact computer solvers of the time-harmonic macroscopic Maxwell equations. It is widely used for the computation of electromagnetic scattering by single and composite particles, discrete random media, periodic structures (including metamaterials), and particles in the vicinity of plane or rough interfaces separating media with different refractive indices. This paper is the eighth update to the comprehensive thematic database of peer-reviewed T-matrix publications initiated in 2004 and lists relevant publications that have appeared since 2015. It also references a small number of earlier publications overlooked previously.

  2. Autonomous facial recognition system inspired by human visual system based logarithmical image visualization technique

    NASA Astrophysics Data System (ADS)

    Wan, Qianwen; Panetta, Karen; Agaian, Sos

    2017-05-01

    Autonomous facial recognition systems are widely used in real-life applications, such as homeland border security, law enforcement identification and authentication, and video-based surveillance analysis. Issues like low image quality, non-uniform illumination, and variations in pose and facial expression can impair the performance of recognition systems. To address the non-uniform illumination challenge, we present a novel, robust autonomous facial recognition system based on a so-called logarithmical image visualization technique inspired by the human visual system. In this paper, the proposed method, for the first time, couples the logarithmical image visualization technique with the local binary pattern to perform discriminative feature extraction for facial recognition. The Yale database, the Yale-B database and the ATT database are used to test accuracy and efficiency in computer simulations. The extensive computer simulations demonstrate the method's efficiency, accuracy, and robustness to illumination variation.
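
    For reference, the local binary pattern step can be sketched as follows: each interior pixel is encoded by comparing its eight neighbours to the centre. This is a basic 3x3 LBP; the paper's logarithmical image visualization preprocessing is omitted and all parameters are illustrative.

    ```python
    import numpy as np

    def lbp_image(gray):
        """Basic 3x3 local binary pattern codes for a grayscale image."""
        g = gray.astype(np.int32)
        c = g[1:-1, 1:-1]  # interior pixels (centres)
        # Neighbour offsets in clockwise order, each contributing one bit
        offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                   (1, 1), (1, 0), (1, -1), (0, -1)]
        codes = np.zeros_like(c)
        for bit, (dy, dx) in enumerate(offsets):
            neigh = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
            codes |= (neigh >= c).astype(np.int32) << bit
        return codes

    demo = np.random.randint(0, 256, (5, 5), dtype=np.uint8)
    print(lbp_image(demo))  # 3x3 array of codes in [0, 255]
    ```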

  3. Automated prediction of protein function and detection of functional sites from structure.

    PubMed

    Pazos, Florencio; Sternberg, Michael J E

    2004-10-12

    Current structural genomics projects are yielding structures for proteins whose functions are unknown. Accordingly, there is a pressing requirement for computational methods for function prediction. Here we present PHUNCTIONER, an automatic method for structure-based function prediction using automatically extracted functional sites (residues associated with functions). The method relates proteins with the same function through structural alignments and extracts 3D profiles of conserved residues. Functional features to train the method are extracted from the Gene Ontology (GO) database. The method extracts these features from the entire GO hierarchy and hence is applicable across the whole range of function specificity. 3D profiles associated with 121 GO annotations were extracted. We tested the power of the method both for the prediction of function and for the extraction of functional sites. The success of function prediction by our method was compared with that of the standard homology-based method. In the zone of low sequence similarity (approximately 15%), our method assigns the correct GO annotation in 90% of the protein structures considered, approximately 20% higher than inheritance of function from the closest homologue.

  4. Development of a statewide landslide inventory program.

    DOT National Transportation Integrated Search

    2003-02-01

    Development of a comprehensive geotechnical database for risk management of highway landslide problems is described. Computer software selected to program the client/server application in a data window, components and structure of the geotechnical da...

  5. High-Performance Secure Database Access Technologies for HEP Grids

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Matthew Vranicar; John Weicher

    2006-04-17

    The Large Hadron Collider (LHC) at the CERN Laboratory will become the largest scientific instrument in the world when it starts operations in 2007. Large Scale Analysis Computer Systems (computational grids) are required to extract rare signals of new physics from petabytes of LHC detector data. In addition to file-based event data, LHC data processing applications require access to large amounts of data in relational databases: detector conditions, calibrations, etc. U.S. high energy physicists demand efficient performance of grid computing applications in LHC physics research where world-wide remote participation is vital to their success. To empower physicists with data-intensive analysis capabilities, a whole hyperinfrastructure of distributed databases cross-cuts a multi-tier hierarchy of computational grids. The crosscutting allows separation of concerns across both the global environment of a federation of computational grids and the local environment of a physicist's computer used for analysis. Very few efforts are on-going in the area of database and grid integration research. Most of these are outside of the U.S. and rely on traditional approaches to secure database access via an extraneous security layer separate from the database system core, preventing efficient data transfers. Our findings are shared by the Database Access and Integration Services Working Group of the Global Grid Forum, which states that "Research and development activities relating to the Grid have generally focused on applications where data is stored in files. However, in many scientific and commercial domains, database management systems have a central role in data storage, access, organization, authorization, etc., for numerous applications." There is a clear opportunity for a technological breakthrough, requiring innovative steps to provide high-performance secure database access technologies for grid computing. We believe that an innovative database architecture where the secure authorization is pushed into the database engine will eliminate inefficient data transfer bottlenecks. Furthermore, traditionally separated database and security layers provide an extra vulnerability, leaving a weak clear-text password authorization as the only protection on the database core systems. Due to the legacy limitations of the systems' security models, the allowed passwords often cannot even comply with the DOE password guideline requirements. We see an opportunity for the tight integration of the secure authorization layer with the database server engine, resulting in both improved performance and improved security. Phase I has focused on the development of a proof-of-concept prototype using Argonne National Laboratory's (ANL) Argonne Tandem-Linac Accelerator System (ATLAS) project as a test scenario. By developing a grid-security enabled version of the ATLAS project's current relational database solution, MySQL, PIOCON Technologies aims to offer a more efficient solution to secure database access.

  6. Construction of crystal structure prototype database: methods and applications.

    PubMed

    Su, Chuanxun; Lv, Jian; Li, Quan; Wang, Hui; Zhang, Lijun; Wang, Yanchao; Ma, Yanming

    2017-04-26

    Crystal structure prototype data have become a useful source of information for materials discovery in the fields of crystallography, chemistry, physics, and materials science. This work reports the development of a robust and efficient method for assessing the similarity of structures on the basis of their interatomic distances. Using this method, we proposed a simple and unambiguous definition of crystal structure prototype based on hierarchical clustering theory, and constructed the crystal structure prototype database (CSPD) by filtering the known crystallographic structures in a database. With a similar method, a structure prototype analysis package (SPAP) program was developed to remove similar structures in CALYPSO prediction results and to extract predicted low-energy structures for a separate theoretical structure database. A series of statistics describing the distribution of crystal structure prototypes in the CSPD was compiled to provide important insights for structure prediction and high-throughput calculations. Illustrative examples of the application of the proposed database are given, including the generation of initial structures for structure prediction and the determination of prototype structures in databases. These examples demonstrate the CSPD to be a generally applicable and useful tool for materials discovery.
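
    Filtering a structure set down to prototypes by hierarchical clustering of a distance-based similarity can be sketched as follows; the fingerprint definition and threshold are assumptions for illustration, not the CSPD's exact similarity measure.

    ```python
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    def prototype_clusters(fingerprints, threshold=0.1):
        """Group structures with similar interatomic-distance fingerprints.

        `fingerprints` is an (n_structures, n_features) array, e.g.
        histograms of interatomic distances; one cluster = one prototype.
        """
        Z = linkage(fingerprints, method="average", metric="euclidean")
        return fcluster(Z, t=threshold, criterion="distance")

    # Three toy "structures": the first two nearly identical, the third distinct
    fps = np.array([[0.0, 1.00, 2.0],
                    [0.0, 1.01, 2.0],
                    [5.0, 0.00, 0.0]])
    print(prototype_clusters(fps))  # e.g. [1 1 2]: first two share a prototype
    ```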

  7. Construction of crystal structure prototype database: methods and applications

    NASA Astrophysics Data System (ADS)

    Su, Chuanxun; Lv, Jian; Li, Quan; Wang, Hui; Zhang, Lijun; Wang, Yanchao; Ma, Yanming

    2017-04-01

    Crystal structure prototype data have become a useful source of information for materials discovery in the fields of crystallography, chemistry, physics, and materials science. This work reports the development of a robust and efficient method for assessing the similarity of structures on the basis of their interatomic distances. Using this method, we proposed a simple and unambiguous definition of crystal structure prototype based on hierarchical clustering theory, and constructed the crystal structure prototype database (CSPD) by filtering the known crystallographic structures in a database. With a similar method, a structure prototype analysis package (SPAP) program was developed to remove similar structures in CALYPSO prediction results and to extract predicted low-energy structures for a separate theoretical structure database. A series of statistics describing the distribution of crystal structure prototypes in the CSPD was compiled to provide important insights for structure prediction and high-throughput calculations. Illustrative examples of the application of the proposed database are given, including the generation of initial structures for structure prediction and the determination of prototype structures in databases. These examples demonstrate the CSPD to be a generally applicable and useful tool for materials discovery.

  8. New algorithms to represent complex pseudoknotted RNA structures in dot-bracket notation.

    PubMed

    Antczak, Maciej; Popenda, Mariusz; Zok, Tomasz; Zurkowski, Michal; Adamiak, Ryszard W; Szachniuk, Marta

    2018-04-15

    Understanding the formation, architecture and roles of pseudoknots in RNA structures is one of the most difficult challenges in RNA computational biology and structural bioinformatics. Methods that predict pseudoknots typically do so with poor accuracy, often despite the incorporation of experimental data. Existing bioinformatic approaches differ in how they recognize pseudoknots and reveal their nature. A few pseudoknot classification schemes exist; the most common refer to genus or order. Following the latter, we propose new algorithms that identify pseudoknots in an RNA structure provided in BPSEQ format, determine their order and encode them in dot-bracket-letter notation. The proposed encoding aims to illustrate the hierarchy of RNA folding. The new algorithms are based on dynamic programming and hybrid (combining exhaustive search and random walk) approaches. They evolved from an elementary algorithm implemented within the workflow of RNA FRABASE 1.0, our database of RNA structure fragments. They use different scoring functions to rank dissimilar dot-bracket representations of RNA structure. Computational experiments show an advantage of the new methods over existing ones, especially for large RNA structures. The presented algorithms have been implemented as new functionality of the RNApdbee webserver and are ready to use at http://rnapdbee.cs.put.poznan.pl. Contact: mszachniuk@cs.put.poznan.pl. Supplementary data are available at Bioinformatics online.
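
    The core idea of order assignment can be sketched with a greedy layering of base pairs: any pair that crosses an already-placed pair is pushed to the next bracket level. This is a simplified stand-in for the paper's dynamic-programming and hybrid algorithms, not their actual implementation.

    ```python
    # Bracket alphabets by pseudoknot order (orders 0-3 supported here)
    BRACKETS = ["()", "[]", "{}", "<>"]

    def to_dot_bracket(length, pairs):
        """pairs: list of 0-based base pairs (i, j) with i < j."""
        structure = ["."] * length
        layers = []  # one list of mutually non-crossing pairs per bracket order
        for i, j in sorted(pairs):
            depth = 0
            while depth < len(layers) and any(
                    a < i < b < j or i < a < j < b for a, b in layers[depth]):
                depth += 1  # crosses this layer: try the next bracket order
            if depth == len(layers):
                layers.append([])
            layers[depth].append((i, j))
            op, cl = BRACKETS[depth]
            structure[i], structure[j] = op, cl
        return "".join(structure)

    # A minimal H-type pseudoknot: pairs (0,6) and (3,9) cross, forcing order 1
    print(to_dot_bracket(10, [(0, 6), (3, 9)]))  # "(..[..)..]"
    ```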

  9. StraPep: a structure database of bioactive peptides

    PubMed Central

    Wang, Jian; Yin, Tailang; Xiao, Xuwen; He, Dan; Xue, Zhidong; Jiang, Xinnong; Wang, Yan

    2018-01-01

    Bioactive peptides, with a variety of biological activities and wide distribution in nature, have attracted great research interest in the biological and medical fields, especially in the pharmaceutical industry. The structural information of bioactive peptides is important for the development of peptide-based drugs. Many databases have been developed cataloguing bioactive peptides. However, to our knowledge, a database dedicated to collecting all bioactive peptides with known structures has not been available. Thus, we developed StraPep, a structure database of bioactive peptides. StraPep holds 3791 bioactive peptide structures, which belong to 1312 unique bioactive peptide sequences. About 905 out of 1312 (68%) bioactive peptides in StraPep contain disulfide bonds, a proportion significantly higher than that of the PDB (21%). Interestingly, 150 out of 616 (24%) bioactive peptides with three or more disulfide bonds form a structural motif known as the cystine knot, which confers considerable structural stability on proteins and is an attractive scaffold for drug design. Detailed information on each peptide, including the experimental structure, the location of disulfide bonds, secondary structure, classification, post-translational modifications and so on, is provided. A wide range of user-friendly tools, such as browsing, sequence- and structure-based searching and so on, has been incorporated into StraPep. We hope that this database will be helpful for the research community. Database URL: http://isyslab.info/StraPep PMID:29688386

  10. Data Stream Mining

    NASA Astrophysics Data System (ADS)

    Gaber, Mohamed Medhat; Zaslavsky, Arkady; Krishnaswamy, Shonali

    Data mining is concerned with the process of computationally extracting hidden knowledge structures represented in models and patterns from large data repositories. It is an interdisciplinary field of study that has its roots in databases, statistics, machine learning, and data visualization. Data mining has emerged as a direct outcome of the data explosion that resulted from the success in database and data warehousing technologies over the past two decades (Fayyad, 1997; Fayyad, 1998; Kantardzic, 2003).

  11. The research of network database security technology based on web service

    NASA Astrophysics Data System (ADS)

    Meng, Fanxing; Wen, Xiumei; Gao, Liting; Pang, Hui; Wang, Qinglin

    2013-03-01

    Database technology is one of the most widely applied computer technologies, and its security is becoming more and more important. This paper introduces database security and the security levels of network databases, studies network database security technology, analyzes a sub-key encryption algorithm in detail, and applies this algorithm successfully to a campus one-card system. The realization process of the encryption algorithm is discussed; the method is widely used as a reference in many fields, particularly in management information system security and e-commerce.

  12. Informatics application provides instant research to practice benefits.

    PubMed Central

    Bowles, K. H.; Peng, T.; Qian, R.; Naylor, M. D.

    2001-01-01

    A web-based research information system was designed to enable our research team to efficiently measure health-related quality of life among frail older adults in a variety of health care settings (home care, nursing homes, assisted living, PACE). The structure, process, and outcome data are collected using laptop computers and downloaded to a SQL database. Unique features of this project are the ability to transfer research to practice by instantly sharing individual and aggregate results with the clinicians caring for these elders and directly impacting the quality of their care. Clinicians can also dial in to the database to access standard queries or receive customized reports about the patients in their facilities. This paper describes the development and implementation of the information system. The conference presentation will include a demonstration and examples of research-to-practice benefits. PMID:11825156

  13. Heterogeneous distributed databases: A case study

    NASA Technical Reports Server (NTRS)

    Stewart, Tracy R.; Mukkamala, Ravi

    1991-01-01

    Alternatives are reviewed for accessing distributed heterogeneous databases and a recommended solution is proposed. The current study is limited to the Automated Information Systems Center at the Naval Sea Combat Systems Engineering Station at Norfolk, VA. This center maintains two databases located on Digital Equipment Corporation's VAX computers running under the VMS operating system. The first database, ICMS, resides on a VAX 11/780 and has been implemented using VAX DBMS, a CODASYL-based system. The second database, CSA, resides on a VAX 6460 and has been implemented using the ORACLE relational database management system (RDBMS). Both databases are used for configuration management within the U.S. Navy. Different customer bases are supported by each database. ICMS tracks U.S. Navy ships and major systems (anti-sub, sonar, etc.). Even though the major systems on ships and submarines have totally different functions, some of the equipment within the major systems is common to both ships and submarines.

  14. ACToR-Aggregated Computational Resource | Science ...

    EPA Pesticide Factsheets

    ACToR (Aggregated Computational Toxicology Resource) is a database and set of software applications that bring into one central location many types and sources of data on environmental chemicals. Currently, the ACToR chemical database contains information on chemical structure, in vitro bioassays and in vivo toxicology assays derived from more than 150 sources including the U.S. Environmental Protection Agency (EPA), Centers for Disease Control (CDC), U.S. Food & Drug Administration (FDA), National Institutes of Health (NIH), state agencies, corresponding government agencies in Canada, Europe and Japan, universities, the World Health Organization (WHO) and non-governmental organizations (NGOs). At the EPA National Center for Computational Toxicology, ACToR helps manage large data sets being used in a high throughput environmental chemical screening and prioritization program called ToxCast(TM).

  15. ACToR - Aggregated Computational Toxicology Resource

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Judson, Richard; Richard, Ann; Dix, David

    2008-11-15

    ACToR (Aggregated Computational Toxicology Resource) is a database and set of software applications that bring into one central location many types and sources of data on environmental chemicals. Currently, the ACToR chemical database contains information on chemical structure, in vitro bioassays and in vivo toxicology assays derived from more than 150 sources including the U.S. Environmental Protection Agency (EPA), Centers for Disease Control (CDC), U.S. Food and Drug Administration (FDA), National Institutes of Health (NIH), state agencies, corresponding government agencies in Canada, Europe and Japan, universities, the World Health Organization (WHO) and non-governmental organizations (NGOs). At the EPA National Center for Computational Toxicology, ACToR helps manage large data sets being used in a high-throughput environmental chemical screening and prioritization program called ToxCast(TM).

  16. PathCase-SB architecture and database design

    PubMed Central

    2011-01-01

    Background: Integration of metabolic pathways resources and regulatory metabolic network models, and deploying new tools on the integrated platform, can help perform more effective and more efficient systems biology research on understanding the regulation of metabolic networks. Therefore, the tasks of (a) integrating regulatory metabolic networks and existing models under a single database environment, and (b) building tools to help with modeling and analysis are desirable and intellectually challenging computational tasks. Description: PathCase Systems Biology (PathCase-SB) has been built and released. The PathCase-SB database provides data and an API for multiple user interfaces and software tools. The current PathCase-SB system provides a database-enabled framework and web-based computational tools towards facilitating the development of kinetic models for biological systems. PathCase-SB aims to integrate data from selected biological data sources on the web (currently, the BioModels database and KEGG), and to provide more powerful and/or new capabilities via the new web-based integrative framework. This paper describes architecture and database design issues encountered in PathCase-SB's design and implementation, and presents the current design of PathCase-SB's architecture and database. Conclusions: The PathCase-SB architecture and database provide a highly extensible and scalable environment with easy and fast (real-time) access to the data in the database. PathCase-SB itself is already being used by researchers across the world. PMID:22070889

  17. A Framework for Preliminary Design of Aircraft Structures Based on Process Information. Part 1

    NASA Technical Reports Server (NTRS)

    Rais-Rohani, Masoud

    1998-01-01

    This report discusses the general framework and development of a computational tool for preliminary design of aircraft structures based on process information. The described methodology is suitable for multidisciplinary design optimization (MDO) activities associated with integrated product and process development (IPPD). The framework consists of three parts: (1) product and process definitions; (2) engineering synthesis, and (3) optimization. The product and process definitions are part of input information provided by the design team. The backbone of the system is its ability to analyze a given structural design for performance as well as manufacturability and cost assessment. The system uses a database on material systems and manufacturing processes. Based on the identified set of design variables and an objective function, the system is capable of performing optimization subject to manufacturability, cost, and performance constraints. The accuracy of the manufacturability measures and cost models discussed here depend largely on the available data on specific methods of manufacture and assembly and associated labor requirements. As such, our focus in this research has been on the methodology itself and not so much on its accurate implementation in an industrial setting. A three-tier approach is presented for an IPPD-MDO based design of aircraft structures. The variable-complexity cost estimation methodology and an approach for integrating manufacturing cost assessment into design process are also discussed. This report is presented in two parts. In the first part, the design methodology is presented, and the computational design tool is described. In the second part, a prototype model of the preliminary design Tool for Aircraft Structures based on Process Information (TASPI) is described. Part two also contains an example problem that applies the methodology described here for evaluation of six different design concepts for a wing spar.

  18. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Berra, P.B.; Chung, S.M.; Hachem, N.I.

    This article presents techniques for managing a very large data/knowledge base to support multiple inference mechanisms for logic programming. Because evaluation of goals can require accessing data from the extensional database, or EDB, in very general ways, one must often resort to indexing on all fields of the extensional database facts. This presents a formidable management problem in that the index data may be larger than the EDB itself. This problem becomes even more serious in the case of very large data/knowledge bases (hundreds of gigabytes), since considerably more hardware will be required to process and store the index data. In order to reduce the amount of index data considerably without losing generality, the authors form a surrogate file, which is a hashing transformation of the facts. Superimposed code words (SCW), concatenated code words (CCW), and transformed inverted lists (TIL) are possible structures for the surrogate file. Since these transformations are quite regular and compact, the authors consider possible computer architectures for the processing of the surrogate file.
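
    A superimposed code word surrogate can be sketched as a Bloom-filter-style hash: each field of a fact sets a few bits, the fact's code word is the OR of its field codes, and a query field is tested against the code word with possible false positives but no false negatives. The code width and bits-per-field below are assumed parameters for illustration.

    ```python
    import hashlib

    CODE_WIDTH = 64     # bits per code word (assumed)
    BITS_PER_FIELD = 3  # bits set per field value (assumed)

    def field_code(value):
        """Hash one field value to a sparse bit pattern."""
        code = 0
        for k in range(BITS_PER_FIELD):
            h = hashlib.md5(f"{k}:{value}".encode()).digest()
            code |= 1 << (int.from_bytes(h[:4], "big") % CODE_WIDTH)
        return code

    def superimposed_code_word(fact):
        """OR together the field codes of a fact, e.g. ('parent','tom','bob')."""
        scw = 0
        for field in fact:
            scw |= field_code(field)
        return scw

    def may_match(query_fields, scw):
        """A fact can match only if every query field's bits are present."""
        return all(scw & field_code(f) == field_code(f) for f in query_fields)

    scw = superimposed_code_word(("parent", "tom", "bob"))
    print(may_match(["tom"], scw), may_match(["alice"], scw))  # True, likely False
    ```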

  19. Algorithm for detection the QRS complexes based on support vector machine

    NASA Astrophysics Data System (ADS)

    Van, G. V.; Podmasteryev, K. V.

    2017-11-01

    The efficiency of computer ECG analysis depends on the accurate detection of QRS complexes. This paper presents an algorithm for QRS complex detection based on a support vector machine (SVM). The proposed algorithm is evaluated on annotated standard databases such as the MIT-BIH Arrhythmia database. The QRS detector obtained a sensitivity Se = 98.32% and specificity Sp = 95.46% on the MIT-BIH Arrhythmia database. This algorithm can be used as the basis for software to diagnose the electrical activity of the heart.
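
    In outline, such a detector classifies fixed-length signal windows as QRS or non-QRS. A minimal scikit-learn sketch follows; the features, data and kernel choice are synthetic placeholders, not the paper's configuration, which is trained on annotated MIT-BIH beats.

    ```python
    import numpy as np
    from sklearn import svm

    rng = np.random.default_rng(0)
    qrs = rng.normal(1.0, 0.3, (100, 8))    # windows around R-peaks (synthetic)
    noise = rng.normal(0.0, 0.3, (100, 8))  # background windows (synthetic)
    X = np.vstack([qrs, noise])
    y = np.array([1] * 100 + [0] * 100)     # 1 = QRS, 0 = non-QRS

    clf = svm.SVC(kernel="rbf", C=1.0).fit(X, y)
    window = rng.normal(1.0, 0.3, (1, 8))
    print(clf.predict(window))  # [1] -> flagged as containing a QRS complex
    ```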

  20. The multi-modal Australian ScienceS Imaging and Visualization Environment (MASSIVE) high performance computing infrastructure: applications in neuroscience and neuroinformatics research

    PubMed Central

    Goscinski, Wojtek J.; McIntosh, Paul; Felzmann, Ulrich; Maksimenko, Anton; Hall, Christopher J.; Gureyev, Timur; Thompson, Darren; Janke, Andrew; Galloway, Graham; Killeen, Neil E. B.; Raniga, Parnesh; Kaluza, Owen; Ng, Amanda; Poudel, Govinda; Barnes, David G.; Nguyen, Toan; Bonnington, Paul; Egan, Gary F.

    2014-01-01

    The Multi-modal Australian ScienceS Imaging and Visualization Environment (MASSIVE) is a national imaging and visualization facility established by Monash University, the Australian Synchrotron, the Commonwealth Scientific Industrial Research Organization (CSIRO), and the Victorian Partnership for Advanced Computing (VPAC), with funding from the National Computational Infrastructure and the Victorian Government. The MASSIVE facility provides hardware, software, and expertise to drive research in the biomedical sciences, particularly advanced brain imaging research using synchrotron x-ray and infrared imaging, functional and structural magnetic resonance imaging (MRI), x-ray computed tomography (CT), electron microscopy and optical microscopy. The development of MASSIVE has been based on best practice in system integration methodologies, frameworks, and architectures. The facility has: (i) integrated multiple different neuroimaging analysis software components, (ii) enabled cross-platform and cross-modality integration of neuroinformatics tools, and (iii) brought together neuroimaging databases and analysis workflows. MASSIVE is now operational as a nationally distributed and integrated facility for neuroinformatics and brain imaging research. PMID:24734019

  1. Example-Based Super-Resolution Fluorescence Microscopy.

    PubMed

    Jia, Shu; Han, Boran; Kutz, J Nathan

    2018-04-23

    Capturing biological dynamics with high spatiotemporal resolution demands advances in imaging technologies. Super-resolution fluorescence microscopy offers spatial resolution surpassing the diffraction limit to resolve near-molecular-level details. While various strategies have been reported to improve the temporal resolution of super-resolution imaging, all super-resolution techniques are still fundamentally limited by the trade-off with the longer image acquisition time needed to gather higher spatial information. Here, we demonstrate an example-based computational method that aims to obtain super-resolution images from conventional imaging without increasing the imaging time. With a low-resolution image as input, the method provides an estimate of its super-resolution counterpart based on an example database that contains super- and low-resolution image pairs of the biological structures of interest. The computational imaging of cellular microtubules agrees approximately with experimental super-resolution STORM results. This new approach may offer potential improvements in temporal resolution for experimental super-resolution fluorescence microscopy and provides a new path for large-data-aided biomedical imaging.

  2. Graphical user interfaces for symbol-oriented database visualization and interaction

    NASA Astrophysics Data System (ADS)

    Brinkschulte, Uwe; Siormanolakis, Marios; Vogelsang, Holger

    1997-04-01

    In this approach, two basic services designed for the engineering of computer-based systems are combined: a symbol-oriented man-machine service and a high-speed database service. The man-machine service is used to build graphical user interfaces (GUIs) for the database service; these interfaces are themselves stored using the database service. The idea is to create a GUI-builder and a GUI-manager for the database service, based upon the man-machine service and the concept of symbols. With user-definable and predefined symbols, database contents can be visualized and manipulated in a very flexible and intuitive way. Using the GUI-builder and GUI-manager, users can build and operate their own graphical user interfaces for a given database according to their needs without writing a single line of code.

  3. Identification of family-specific residue packing motifs and their use for structure-based protein function prediction: I. Method development.

    PubMed

    Bandyopadhyay, Deepak; Huan, Jun; Prins, Jan; Snoeyink, Jack; Wang, Wei; Tropsha, Alexander

    2009-11-01

    Protein function prediction is one of the central problems in computational biology. We present a novel automated protein structure-based function prediction method using libraries of local residue packing patterns that are common to most proteins in a known functional family. Critical to this approach is the representation of a protein structure as a graph where residue vertices (residue name used as a vertex label) are connected by geometrical proximity edges. The approach employs two steps. First, it uses a fast subgraph mining algorithm to find all occurrences of family-specific labeled subgraphs for all well-characterized protein structural and functional families. Second, it queries a new structure for occurrences of a set of motifs characteristic of a known family, using a graph index to speed up Ullmann's subgraph isomorphism algorithm. The confidence of function inference from structure depends on the number of family-specific motifs found in the query structure compared with their distribution in a large non-redundant database of proteins. This method can assign a new structure to a specific functional family in cases where sequence alignments, sequence patterns, structural superposition and active site templates fail to provide accurate annotation.
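
    The graph formulation can be illustrated with networkx: residues become labeled vertices, proximity contacts become edges, and motif occurrence reduces to labeled subgraph isomorphism. The motif and contacts below are hypothetical, and networkx's matcher stands in for the indexed Ullmann search used in the paper.

    ```python
    import networkx as nx
    from networkx.algorithms import isomorphism

    def residue_graph(residues, contacts):
        """Protein as a labeled graph: vertices = residues (label = residue
        name), edges = geometric proximity contacts."""
        g = nx.Graph()
        for idx, name in residues:
            g.add_node(idx, label=name)
        g.add_edges_from(contacts)
        return g

    # Hypothetical catalytic-triad-like motif and a query structure
    motif = residue_graph([(0, "SER"), (1, "HIS"), (2, "ASP")],
                          [(0, 1), (1, 2)])
    query = residue_graph([(10, "SER"), (11, "HIS"), (12, "ASP"), (13, "GLY")],
                          [(10, 11), (11, 12), (12, 13)])

    nm = isomorphism.categorical_node_match("label", None)
    gm = isomorphism.GraphMatcher(query, motif, node_match=nm)
    print(gm.subgraph_is_isomorphic())  # True: the motif occurs in the query
    ```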

  4. Comparison and optimization of in silico algorithms for predicting the pathogenicity of sodium channel variants in epilepsy.

    PubMed

    Holland, Katherine D; Bouley, Thomas M; Horn, Paul S

    2017-07-01

    Variants in the neuronal voltage-gated sodium channel α-subunit genes SCN1A, SCN2A, and SCN8A are common in early onset epileptic encephalopathies and other autosomal dominant childhood epilepsy syndromes. However, in clinical practice, missense variants are often classified as variants of uncertain significance when heritability cannot be determined. Genetic testing reports often include results of computational tests to estimate pathogenicity and the frequency of that variant in population-based databases. The objective of this work was to enhance clinicians' understanding of results by (1) determining how effectively computational algorithms predict the epileptogenicity of sodium channel (SCN) missense variants; (2) optimizing their predictive capabilities; and (3) determining whether epilepsy-associated SCN variants are present in population-based databases. This will help clinicians better understand indeterminate SCN test results in people with epilepsy. Pathogenic, likely pathogenic, and benign variants in SCNs were identified using databases of sodium channel variants. Benign variants were also identified from population-based databases. Eight algorithms commonly used to predict pathogenicity were compared. In addition, logistic regression was used to determine if a combination of algorithms could better predict pathogenicity. Based on American College of Medical Genetics criteria, 440 variants were classified as pathogenic or likely pathogenic and 84 were classified as benign or likely benign. Twenty-eight variants previously associated with epilepsy were present in population-based gene databases. The output provided by most computational algorithms had a high sensitivity but low specificity, with an accuracy of 0.52-0.77. Accuracy could be improved by adjusting the threshold for pathogenicity. Using this adjustment, the Mendelian Clinically Applicable Pathogenicity (M-CAP) algorithm had an accuracy of 0.90, and a combination of algorithms increased the accuracy to 0.92. Potentially pathogenic variants are present in population-based sources. Most computational algorithms overestimate pathogenicity; however, a weighted combination of several algorithms increased classification accuracy to >0.90. Wiley Periodicals, Inc. © 2017 International League Against Epilepsy.
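
    Combining several algorithms' pathogenicity scores with logistic regression can be sketched as follows. The scores and labels are synthetic placeholders; the study fit the regression to scores from the eight compared algorithms and ACMG-classified variants.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    n = 200
    labels = rng.integers(0, 2, n)  # 1 = pathogenic, 0 = benign (synthetic)
    # Three hypothetical algorithm scores, each weakly correlated with the label
    scores = np.column_stack(
        [labels + rng.normal(0, s, n) for s in (0.6, 0.9, 1.2)])

    combo = LogisticRegression().fit(scores, labels)
    print(combo.score(scores, labels))  # training accuracy of the combination
    ```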

  5. A high-throughput approach to profile RNA structure.

    PubMed

    Delli Ponti, Riccardo; Marti, Stefanie; Armaos, Alexandros; Tartaglia, Gian Gaetano

    2017-03-17

    Here we introduce the Computational Recognition of Secondary Structure (CROSS) method to calculate the structural profile of an RNA sequence (single- or double-stranded state) at single-nucleotide resolution and without sequence-length restrictions. We trained CROSS using data from high-throughput experiments such as Selective 2'-Hydroxyl Acylation analyzed by Primer Extension (SHAPE; mouse and HIV transcriptomes) and Parallel Analysis of RNA Structure (PARS; human and yeast transcriptomes), as well as high-quality NMR/X-ray structures (PDB database). The algorithm uses primary structure information alone to predict experimental structural profiles with >80% accuracy, showing high performance on large RNAs such as Xist (17,900 nucleotides; area under the ROC curve (AUC) of 0.75 on dimethyl sulfate (DMS) experiments). We integrated CROSS into thermodynamics-based methods to predict secondary structure and observed an increase in their predictive power by up to 30%. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  6. Application of the 4D fingerprint method with a robust scoring function for scaffold-hopping and drug repurposing strategies.

    PubMed

    Hamza, Adel; Wagner, Jonathan M; Wei, Ning-Ning; Kwiatkowski, Stefan; Zhan, Chang-Guo; Watt, David S; Korotkov, Konstantin V

    2014-10-27

    Two factors contribute to the inefficiency associated with screening pharmaceutical library collections as a means of identifying new drugs: (1) the limited success of virtual screening (VS) methods in identifying new scaffolds; (2) the limited accuracy of computational methods in predicting off-target effects. We recently introduced the 3D shape-based similarity algorithm of the SABRE program, which encodes a consensus molecular shape pattern of a set of active ligands into a 4D fingerprint descriptor. Here, we report a mathematical model for shape similarity comparisons and ligand database filtering using this 4D fingerprint method, and benchmark the HWK (Hamza-Wei-Korotkov) scoring function using the 81 targets of the DEKOIS database. Subsequently, we applied our combined 4D fingerprint and HWK scoring function VS approach in scaffold-hopping and drug repurposing using the National Cancer Institute (NCI) and Food and Drug Administration (FDA) databases, and we identified new inhibitors with different scaffolds of MycP1 protease from the mycobacterial ESX-1 secretion system. In experimental evaluation, nine compounds from the NCI database and three from the FDA database displayed IC50 values ranging from 70 to 100 μM against MycP1 and possessed high structural diversity, which provides departure points for further structure-activity relationship (SAR) optimization. In addition, this study demonstrates that the combination of our 4D fingerprint algorithm and the HWK scoring function may provide a means for identifying repurposed drugs for the treatment of infectious diseases and may be used in drug-target profiling strategies.
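
    Fingerprint-based filtering of a ligand database generally reduces to a vector comparison; below is a generic continuous-Tanimoto sketch as a stand-in for illustration only, not SABRE's 4D fingerprint or the HWK scoring function.

    ```python
    import numpy as np

    def tanimoto_continuous(a, b):
        """Continuous Tanimoto similarity between two descriptor vectors."""
        num = float(np.dot(a, b))
        return num / (np.dot(a, a) + np.dot(b, b) - num)

    # Hypothetical query and library fingerprints (values are made up)
    fp_query = np.array([0.2, 0.9, 0.4, 0.0])
    fp_lib = np.array([0.1, 0.8, 0.5, 0.1])
    print(round(tanimoto_continuous(fp_query, fp_lib), 3))  # ~0.959
    ```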

  7. A geophysical perspective on mantle water content and melting: Inverting electromagnetic sounding data using laboratory-based electrical conductivity profiles

    NASA Astrophysics Data System (ADS)

    Khan, A.; Shankland, T. J.

    2012-02-01

    This paper applies electromagnetic sounding methods for Earth's mantle to constrain its thermal state, chemical composition, and "water" content. We consider long-period inductive response functions in the form of C-responses from four stations distributed across the Earth (Europe, North America, Asia and Australia) covering a period range from 3.9 to 95.2 days and sensitivity to ~ 1200 km depth. We invert C-responses directly for thermo-chemical state using a self-consistent thermodynamic method that computes phase equilibria as functions of pressure, temperature, and composition (in the Na2O-CaO-FeO-MgO-Al2O3-SiO2 model system). Computed mineral modes are combined with recent laboratory-based electrical conductivity models from independent experimental research groups (Yoshino (2010) and Karato (2011)) to compute bulk conductivity structure beneath each of the four stations from which C-responses are estimated. To reliably allocate water between the various mineral phases we include laboratory-measured water partition coefficients for major upper mantle and transition zone minerals. This scheme is interfaced with a sampling-based algorithm to solve the resulting non-linear inverse problem. This approach has two advantages: (1) It anchors temperatures, composition, electrical conductivities, and discontinuities that are in laboratory-based forward models, and (2) At the same time it permits the use of geophysical inverse methods to optimize conductivity profiles to match geophysical data. The results show lateral variations in upper mantle temperatures beneath the four stations that appear to persist throughout the upper mantle and parts of the transition zone. Calculated mantle temperatures at 410 and 660 km depth lie in the range 1250-1650 °C and 1500-1750 °C, respectively, and generally agree with the experimentally-determined temperatures at which the measured phase reactions olivine → β-spinel and γ-spinel → ferropericlase + perovskite occur. The retrieved conductivity structures beneath the various stations tend to follow trends observed for temperature with the strongest lateral variations in the uppermost mantle; for depths > 300 km conductivities appear to depend less on the particular conductivity database. Conductivities at 410 km and at 660 km depth are found to agree overall with purely geophysically-derived global and semi-global one-dimensional conductivity models. Both electrical conductivity databases point to < 0.01 wt.% H2O in the upper mantle. For transition zone minerals results from the laboratory database of Yoshino (2010) suggest that a much higher water content (up to 2 wt.% H2O) is required than in the other database (Karato, 2011), which favors a relatively "dry" transition zone (< 0.01 wt.% H2O). Incorporating laboratory measurements of hydrous silicate melting relations and available conductivity data allows us to consider the possibility of hydration melting and a high-conductivity melt layer above the 410-km discontinuity. The latter appears to be 1) regionally localized and 2) principally a feature from the Yoshino (2010) database. Further, there is evidence of lateral heterogeneity: The mantle beneath southwestern North America and central China appears "wetter" than that beneath central Europe or Australia.
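
    For orientation, laboratory-based mineral conductivity models of the kind inverted here are typically Arrhenius-type parameterizations in temperature with a water-content-dependent term; a schematic form (illustrative only, not the exact Yoshino (2010) or Karato (2011) parameterizations) is:

    ```latex
    % Schematic hydrous conductivity law (illustrative form):
    %   sigma_0: pre-exponential factor, C_w: water content,
    %   r: water-content exponent, Delta H: activation enthalpy
    %   (possibly itself water-dependent), k_B: Boltzmann constant,
    %   T: absolute temperature.
    \sigma(T, C_w) = \sigma_0 \, C_w^{\,r}\,
      \exp\!\left(-\frac{\Delta H(C_w)}{k_B T}\right)
    ```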

  8. JET2 Viewer: a database of predicted multiple, possibly overlapping, protein-protein interaction sites for PDB structures.

    PubMed

    Ripoche, Hugues; Laine, Elodie; Ceres, Nicoletta; Carbone, Alessandra

    2017-01-04

    The database JET2 Viewer, openly accessible at http://www.jet2viewer.upmc.fr/, reports putative protein binding sites for all three-dimensional (3D) structures available in the Protein Data Bank (PDB). This knowledge base was generated by applying the computational method JET2 at large scale to more than 20,000 chains. The JET2 strategy yields very precise predictions of interacting surfaces and unravels their evolutionary process and complexity. JET2 Viewer provides an online intelligent display, including interactive 3D visualization of the binding sites mapped onto PDB structures and suitable files recording JET2 analyses. Predictions were evaluated on more than 15,000 experimentally characterized protein interfaces. This is, to our knowledge, the largest evaluation of a protein binding site prediction method. The overall performance of JET2 on all interfaces is: Sen = 52.52, PPV = 51.24, Spe = 80.05, Acc = 75.89. The data can be used to foster new strategies for protein-protein interaction modulation and interaction surface redesign. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  9. Knowledge Based Engineering for Spatial Database Management and Use

    NASA Technical Reports Server (NTRS)

    Peuquet, D. (Principal Investigator)

    1984-01-01

    The use of artificial intelligence techniques applicable to Geographic Information Systems (GIS) is examined. Questions involving the performance of and modifications to the database structure, the definition of spectra in quadtree structures and their use in search heuristics, extension of the knowledge base, and learning algorithm concepts are investigated.

  10. ACToR: Aggregated Computational Toxicology Resource (T) ...

    EPA Pesticide Factsheets

    The EPA Aggregated Computational Toxicology Resource (ACToR) is a set of databases compiling information on chemicals in the environment from a large number of public and in-house EPA sources. ACToR has three main goals: (1) to serve as a repository of public toxicology information on chemicals of interest to the EPA, and in particular to be a central source for the testing data on all chemicals regulated by all EPA programs; (2) to be a source of in vivo training data sets for building in vitro to in vivo computational models; (3) to serve as a central source of chemical structure and identity information for the ToxCast(TM) and Tox21 programs. There are four main databases, all linked through a common set of chemical information and a common structure linking chemicals to assay data: the public ACToR system (available at http://actor.epa.gov); the ToxMiner database holding ToxCast and Tox21 data, along with results from statistical analyses of these data; the Tox21 chemical repository, which manages the ordering and sample-tracking process for the larger Tox21 project; and the public version of ToxRefDB. The public ACToR system contains information on ~500K compounds with toxicology, exposure and chemical property information from >400 public sources. The web site is visited by ~1,000 unique users per month and generates ~1,000 page requests per day on average. The databases are built on open source technology, which has allowed us to export them to a number of col

  11. Automated Euler and Navier-Stokes Database Generation for a Glide-Back Booster

    NASA Technical Reports Server (NTRS)

    Chaderjian, Neal M.; Rogers, Stuart E.; Aftosmis, Mike J.; Pandya, Shishir A.; Ahmad, Jasim U.; Tejnil, Edward

    2004-01-01

    The past two decades have seen a sustained increase in the use of high-fidelity Computational Fluid Dynamics (CFD) in basic research, aircraft design, and the analysis of post-design issues. As the fidelity of a CFD method increases, the number of cases that can be readily and affordably computed greatly diminishes. However, computer speeds now exceed 2 GHz, hundreds of processors are currently available and more affordable, and advances in parallel CFD algorithms scale more readily with large numbers of processors. All of these factors make it feasible to compute thousands of high-fidelity cases. However, there still remains the overwhelming task of monitoring the solution process. This paper presents an approach to automate the CFD solution process. A new software tool, AeroDB, is used to compute thousands of Euler and Navier-Stokes solutions for a 2nd-generation glide-back booster in one week. The solution process exploits a common job-submission grid environment, the NASA Information Power Grid (IPG), using 13 computers located at 4 different geographical sites. Process automation and web-based access to a MySQL database greatly reduce the user workload, removing much of the tedium and the tendency for user input errors. The AeroDB framework is described. The user submits/deletes jobs, monitors AeroDB's progress, and retrieves data and plots via a web portal. Once a job is in the database, a job launcher uses an IPG resource broker to decide which computers are best suited to run the job. Job/code requirements, the number of CPUs free on a remote system, and queue lengths are some of the parameters the broker takes into account. The Globus software provides secure services for user authentication, remote shell execution, and secure file transfers over an open network. AeroDB automatically decides when a job is completed. Currently, the Cart3D unstructured flow solver is used for the Euler equations, and the Overflow structured overset flow solver is used for the Navier-Stokes equations. Other codes can be readily included in the AeroDB framework.

  12. Computational predictive models for P-glycoprotein inhibition of in-house chalcone derivatives and drug-bank compounds.

    PubMed

    Ngo, Trieu-Du; Tran, Thanh-Dao; Le, Minh-Tri; Thai, Khac-Minh

    2016-11-01

    The human P-glycoprotein (P-gp) efflux pump is of great interest to medicinal chemists because of its important role in multidrug resistance (MDR). Because of the high polyspecificity of this transmembrane protein, as well as the unavailability of high-resolution X-ray crystal structures for it, ligand-based and structure-based approaches (machine learning, homology modeling, and molecular docking) were combined in this study. In the ligand-based approach, individual two-dimensional quantitative structure-activity relationship models were developed using different machine learning algorithms and subsequently combined into an ensemble model, which showed good performance on both the diverse training set and the validation sets. The applicability domain and the prediction quality of the developed models were also judged using state-of-the-art methods and tools. In our structure-based approach, the P-gp structure and its binding region were predicted for a docking study to determine possible interactions between the ligands and the receptor. Based on these in silico tools, hit compounds for reversing MDR were discovered from the in-house and DrugBank databases through virtual screening, using the prediction models and molecular docking, in an attempt to restore cancer cell sensitivity to cytotoxic drugs.

  13. Planform: an application and database of graph-encoded planarian regenerative experiments.

    PubMed

    Lobo, Daniel; Malone, Taylor J; Levin, Michael

    2013-04-15

    Understanding the mechanisms governing the regeneration capabilities of many organisms is a fundamental interest in biology and medicine. An ever-increasing number of manipulation and molecular experiments are attempting to discover a comprehensive model for regeneration, with the planarian flatworm being one of the most important model species. Despite much effort, no comprehensive, constructive, mechanistic models exist yet, and it is now clear that computational tools are needed to mine this huge dataset. However, until now there has been no database of regenerative experiments, and the current genotype-phenotype ontologies and databases are based on textual descriptions, which computers cannot readily parse. To overcome these difficulties, we present here Planform (Planarian formalization), a manually curated database and software tool for planarian regenerative experiments, based on a mathematical graph formalism. The database contains more than a thousand experiments from the main publications in the planarian literature. The software tool provides the user with a graphical interface to easily interact with and mine the database. The presented system is a valuable resource for the regeneration community and, more importantly, will pave the way for the application of novel artificial intelligence tools to extract knowledge from this dataset. The database and software tool are freely available at http://planform.daniel-lobo.com.
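
    The graph formalism can be pictured with a toy example: nodes for body regions carrying organ labels, edges for adjacency, so a phenotype becomes computer-readable rather than a textual description. The sketch below uses networkx and invented labels; Planform's actual encoding is considerably richer.

    ```python
    # Illustrative (not Planform's actual schema): a planarian phenotype as a
    # labeled graph of body regions, comparable with standard graph algorithms.
    import networkx as nx

    worm = nx.Graph(experiment="head amputation, 7 days regeneration")
    worm.add_node("head", organs=["eyes", "brain"])
    worm.add_node("trunk", organs=["pharynx"])
    worm.add_node("tail", organs=[])
    worm.add_edge("head", "trunk")
    worm.add_edge("trunk", "tail")

    # A two-headed phenotype would differ only in node/edge labels, so
    # outcomes can be compared programmatically.
    print(nx.is_isomorphic(worm, worm))  # trivially True for identical graphs
    ```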

  14. Image ratio features for facial expression recognition application.

    PubMed

    Song, Mingli; Tao, Dacheng; Liu, Zicheng; Li, Xuelong; Zhou, Mengchu

    2010-06-01

    Video-based facial expression recognition is a challenging problem in computer vision and human-computer interaction. To address this problem, texture features have been extracted and widely used, because they can capture image intensity changes caused by skin deformation. However, existing texture features encounter problems with albedo and lighting variations. To solve both problems, we propose new texture features called image ratio features. Compared with previously proposed texture features, e.g., high gradient component features, image ratio features are more robust to albedo and lighting variations. In addition, to further improve facial expression recognition accuracy based on image ratio features, we combine image ratio features with facial animation parameters (FAPs), which describe the geometric motions of facial feature points. The performance evaluation is based on the Carnegie Mellon University Cohn-Kanade database, our own database, and the Japanese Female Facial Expression database. Experimental results show that the proposed image ratio features are more robust to albedo and lighting variations, and that the combination of image ratio features and FAPs outperforms each feature alone. In addition, we study asymmetric facial expressions based on our own facial expression database and demonstrate the superior performance of our combined expression recognition system.
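
    The core idea behind ratio-type texture features can be sketched in a few lines: under a roughly multiplicative (Lambertian) image model, dividing an expression frame by a neutral reference cancels the per-pixel albedo term, leaving the change due to deformation and lighting. This is a simplified illustration, not the exact feature defined in the paper.

    ```python
    # Pixelwise ratio of an expression frame to a neutral reference frame;
    # a multiplicative albedo factor cancels in the division.
    import numpy as np

    def image_ratio(expression, neutral, eps=1e-6):
        """Ratio feature; eps avoids division by zero in dark regions."""
        return expression.astype(float) / (neutral.astype(float) + eps)

    neutral = np.full((4, 4), 100.0)
    smile = neutral.copy()
    smile[2:, :] *= 1.3          # brighter lower face from skin deformation
    print(image_ratio(smile, neutral)[3, 0])  # ~1.3, independent of albedo scale
    ```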

  15. CREDO: a structural interactomics database for drug discovery

    PubMed Central

    Schreyer, Adrian M.; Blundell, Tom L.

    2013-01-01

    CREDO is a unique relational database storing all pairwise atomic interactions of inter- as well as intra-molecular contacts between small molecules and macromolecules found in experimentally determined structures from the Protein Data Bank. These interactions are integrated with further chemical and biological data. The database implements useful data structures and algorithms such as cheminformatics routines to create a comprehensive analysis platform for drug discovery. The database can be accessed through a web-based interface, downloads of data sets and web services at http://www-cryst.bioc.cam.ac.uk/credo. Database URL: http://www-cryst.bioc.cam.ac.uk/credo PMID:23868908

  16. Design and Implementation of a Relational Database Management System for the AFIT Thesis Process.

    DTIC Science & Technology

    1985-09-01

    [OCR fragment of the report's thesis subject-category index; recoverable entries include: Airlift; Applied Mathematics; Artificial Intelligence; Capability Assessment; Communications; Computer Aided Design; Computer Based Training; Computer Software.]

  17. Computer-Based Education. The Best of ERIC, 1988.

    ERIC Educational Resources Information Center

    McLaughlin, Pamela

    This annotated bibliography provides an overview of literature entered into the ERIC database in 1988 on computer use in elementary and secondary education, adult education, and special education. The first of four sections provides a list of overview documents on: computer-assisted instruction. Focusing on special applications, the second section…

  18. Computer Series, 102: Bits and Pieces, 40.

    ERIC Educational Resources Information Center

    Birk, James P., Ed.

    1989-01-01

    Discussed are seven computer programs: (1) a computer graphics experiment for organic chemistry laboratory; (2) a gel filtration simulation; (3) judging spelling correctness; (4) interfacing the TLC548 ADC; (5) a digitizing circuit for the Apple II game port; (6) a chemical information base; and (7) an IBM PC article database. (MVL)

  19. The Natural Link between Teaching History and Computer Skills.

    ERIC Educational Resources Information Center

    Farnworth, George M.

    1992-01-01

    Suggests that, because both history and computers are information based, there is a natural link between the two. Argues that history teachers should exploit the technology to help students to understand history while they become computer literate. Points out uses for databases, word processing, desktop publishing, and telecommunications in…

  20. Searching for New Double Stars with a Computer

    NASA Astrophysics Data System (ADS)

    Bryant, T. V.

    2015-04-01

    The advent of computers with large amounts of RAM and fast processors, as well as easy internet access to large online astronomical databases, has made computer searches based on astrometric data practicable for most researchers. This paper describes one such search that has uncovered hitherto unrecognized double stars.

  1. Computational identification of potent inhibitors for Streptomycin 3″-adenylyltransferase of Serratia marcescens.

    PubMed

    Prabhu, Dhamodharan; Vidhyavathi, Ramasamy; Jeyakanthan, Jeyaraman

    2017-02-01

    Serratia marcescens is an opportunistic pathogen responsible for respiratory and urinary tract infections in humans. The antibiotic resistance of S. marcescens is mediated through aminoglycoside-modifying enzymes that transfer an adenyl group from a substrate to the antibiotic through regiospecific transfer, inactivating the antibiotic. Streptomycin 3″-adenylyltransferase acts on the 3″ position of the antibiotic and is considered a novel drug target for overcoming bacterial antibiotic resistance. To date, there is no experimentally solved crystal structure of streptomycin 3″-adenylyltransferase in S. marcescens. Hence, the present study was initiated to construct the three-dimensional structure of streptomycin 3″-adenylyltransferase in order to understand its binding mechanism. The modeled structure was subjected to structure-based virtual screening to identify potent compounds from five chemical structure databases. Furthermore, computational methods including molecular docking, molecular dynamics simulations, ADME/toxicity assessment, free-energy and density functional theory calculations predicted the structural, binding and pharmacokinetic properties of the best five compounds. Overall, the results suggest that the stable binding conformations of the five potent compounds were mediated through hydrophobic, π-π stacking, salt-bridge and hydrogen-bond interactions. The identified compounds could pave the way for the development of anti-pathogenic agents as potential drug entities. Copyright © 2016 Elsevier Ltd. All rights reserved.
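
    One early step in screening pipelines of this kind is a drug-likeness filter over the compound library; the RDKit sketch below applies Lipinski's rule of five as an illustration. Whether the paper uses exactly this filter is an assumption; its pipeline adds docking, molecular dynamics, free-energy and DFT stages on top.

    ```python
    # Lipinski-style drug-likeness filter over a small compound library.
    from rdkit import Chem
    from rdkit.Chem import Descriptors, Lipinski

    def passes_lipinski(smiles):
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            return False
        return (Descriptors.MolWt(mol) <= 500 and
                Descriptors.MolLogP(mol) <= 5 and
                Lipinski.NumHDonors(mol) <= 5 and
                Lipinski.NumHAcceptors(mol) <= 10)

    library = ["CC(=O)Oc1ccccc1C(=O)O",          # aspirin
               "CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC"] # C30 alkane, fails the logP rule
    print([passes_lipinski(s) for s in library])  # -> [True, False]
    ```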

  2. Computer systems and methods for the query and visualization of multidimensional databases

    DOEpatents

    Stolte, Chris; Tang, Diane L; Hanrahan, Patrick

    2015-03-03

    A computer displays a graphical user interface on its display. The graphical user interface includes a schema information region and a data visualization region. The schema information region includes multiple operand names, each operand corresponding to one or more fields of a multi-dimensional database that includes at least one data hierarchy. The data visualization region includes a columns shelf and a rows shelf. The computer detects user actions to associate one or more first operands with the columns shelf and to associate one or more second operands with the rows shelf. The computer generates a visual table in the data visualization region in accordance with the user actions. The visual table includes one or more panes. Each pane has an x-axis defined based on data for the one or more first operands, and each pane has a y-axis defined based on data for the one or more second operands.
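
    The shelf-and-pane construction described in this claim has a rough everyday analogue in a pivot table: one field on the columns shelf, one on the rows shelf, and one pane (here, one cell) per pair of field values. The pandas sketch below is only an analogy, not the patented implementation.

    ```python
    # Pivot-table analogue of placing fields on column and row "shelves".
    import pandas as pd

    sales = pd.DataFrame({"region": ["East", "East", "West", "West"],
                          "year":   [2014, 2015, 2014, 2015],
                          "amount": [10.0, 12.0, 7.0, 9.0]})

    columns_shelf, rows_shelf = "year", "region"
    visual_table = sales.pivot_table(index=rows_shelf, columns=columns_shelf,
                                     values="amount", aggfunc="sum")
    print(visual_table)   # one cell per (region, year) pair
    ```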

  3. Computer systems and methods for the query and visualization of multidimensional databases

    DOEpatents

    Stolte, Chris; Tang, Diane L.; Hanrahan, Patrick

    2015-11-10

    A computer displays a graphical user interface on its display. The graphical user interface includes a schema information region and a data visualization region. The schema information region includes a plurality of fields of a multi-dimensional database that includes at least one data hierarchy. The data visualization region includes a columns shelf and a rows shelf. The computer detects user actions to associate one or more first fields with the columns shelf and to associate one or more second fields with the rows shelf. The computer generates a visual table in the data visualization region in accordance with the user actions. The visual table includes one or more panes. Each pane has an x-axis defined based on data for the one or more first fields, and each pane has a y-axis defined based on data for the one or more second fields.

  4. ProteinWorldDB: querying radical pairwise alignments among protein sets from complete genomes

    PubMed Central

    Otto, Thomas Dan; Catanho, Marcos; Tristão, Cristian; Bezerra, Márcia; Fernandes, Renan Mathias; Elias, Guilherme Steinberger; Scaglia, Alexandre Capeletto; Bovermann, Bill; Berstis, Viktors; Lifschitz, Sergio; de Miranda, Antonio Basílio; Degrave, Wim

    2010-01-01

    Motivation: Many analyses in modern biological research are based on comparisons between biological sequences, resulting in functional, evolutionary and structural inferences. When large numbers of sequences are compared, heuristics are often used resulting in a certain lack of accuracy. In order to improve and validate results of such comparisons, we have performed radical all-against-all comparisons of 4 million protein sequences belonging to the RefSeq database, using an implementation of the Smith–Waterman algorithm. This extremely intensive computational approach was made possible with the help of World Community Grid™, through the Genome Comparison Project. The resulting database, ProteinWorldDB, which contains coordinates of pairwise protein alignments and their respective scores, is now made available. Users can download, compare and analyze the results, filtered by genomes, protein functions or clusters. ProteinWorldDB is integrated with annotations derived from Swiss-Prot, Pfam, KEGG, NCBI Taxonomy database and gene ontology. The database is a unique and valuable asset, representing a major effort to create a reliable and consistent dataset of cross-comparisons of the whole protein content encoded in hundreds of completely sequenced genomes using a rigorous dynamic programming approach. Availability: The database can be accessed through http://proteinworlddb.org Contact: otto@fiocruz.br PMID:20089515
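
    For reference, the rigorous algorithm behind the project is the classic Smith-Waterman dynamic program. A toy scoring-only version (one pair of sequences, simple match/mismatch/gap scores rather than a substitution matrix) looks like this:

    ```python
    # Compact Smith-Waterman local-alignment score in pure Python.
    def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
        rows, cols = len(a) + 1, len(b) + 1
        H = [[0] * cols for _ in range(rows)]
        best = 0
        for i in range(1, rows):
            for j in range(1, cols):
                diag = H[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
                H[i][j] = max(0, diag, H[i-1][j] + gap, H[i][j-1] + gap)
                best = max(best, H[i][j])
        return best

    print(smith_waterman("HEAGAWGHEE", "PAWHEAE"))  # local-alignment score
    ```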

  5. ASDB: a resource for probing protein functions with small molecules.

    PubMed

    Liu, Zhihong; Ding, Peng; Yan, Xin; Zheng, Minghao; Zhou, Huihao; Xu, Yuehua; Du, Yunfei; Gu, Qiong; Xu, Jun

    2016-06-01

    Identifying chemical probes or seeking scaffolds for a specific biological target is important for protein function studies. Therefore, we create the Annotated Scaffold Database (ASDB), a computer-readable and systematic target-annotated scaffold database, to serve such needs. The scaffolds in ASDB were derived from public databases, including ChEMBL, DrugBank and TCMSP, with a scaffold-based classification approach. Each scaffold was assigned an InChIKey as its unique identifier, energy-minimized 3D conformations, and other calculated properties. A scaffold is also associated with drugs, natural products, drug targets and medical indications. The database can be queried through text or structure query tools. ASDB collects 333 601 scaffolds, which are associated with 4368 targets; these include 3032 scaffolds derived from drugs and 5163 derived from natural products. For given scaffolds, scaffold-target networks can be generated from the database to demonstrate the relations between scaffolds and targets. ASDB is freely available at http://www.rcdd.org.cn/asdb/ with the major web browsers. Contact: junxu@biochemomes.com or xujun9@mail.sysu.edu.cn. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
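
    The scaffold-plus-identifier recipe described above can be approximated with RDKit, assuming a Bemis-Murcko definition of the scaffold; ASDB's exact classification approach may differ.

    ```python
    # Extract a Murcko scaffold and use its InChIKey as a unique identifier.
    from rdkit import Chem
    from rdkit.Chem.Scaffolds import MurckoScaffold

    mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")   # aspirin
    scaffold = MurckoScaffold.GetScaffoldForMol(mol)
    print(Chem.MolToSmiles(scaffold))                    # -> "c1ccccc1" (benzene core)
    print(Chem.MolToInchiKey(scaffold))                  # scaffold's unique identifier
    ```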

  6. Machine learning of molecular electronic properties in chemical compound space

    NASA Astrophysics Data System (ADS)

    Montavon, Grégoire; Rupp, Matthias; Gobre, Vivekanand; Vazquez-Mayagoitia, Alvaro; Hansen, Katja; Tkatchenko, Alexandre; Müller, Klaus-Robert; Anatole von Lilienfeld, O.

    2013-09-01

    The combination of modern scientific computing with electronic structure theory can lead to an unprecedented amount of data amenable to intelligent data analysis for the identification of meaningful, novel and predictive structure-property relationships. Such relationships enable high-throughput screening for relevant properties in an exponentially growing pool of virtual compounds that are synthetically accessible. Here, we present a machine learning model, trained on a database of ab initio calculation results for thousands of organic molecules, that simultaneously predicts multiple electronic ground- and excited-state properties. The properties include atomization energy, polarizability, frontier orbital eigenvalues, ionization potential, electron affinity and excitation energies. The machine learning model is based on a deep multi-task artificial neural network, exploiting the underlying correlations between various molecular properties. The input is identical to ab initio methods, i.e. nuclear charges and Cartesian coordinates of all atoms. For small organic molecules, the accuracy of such a ‘quantum machine’ is similar, and sometimes superior, to modern quantum-chemical methods—at negligible computational cost.
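
    A minimal multi-task analogue of this setup: one network fitted to several targets at once so that shared structure between the properties is exploited. The sketch below uses scikit-learn's multi-output MLPRegressor on synthetic data in place of the deep network and the ab initio database used in the paper.

    ```python
    # Multi-task regression: one network, several correlated targets.
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 30))                 # stand-in for molecular features
    W = rng.normal(size=(30, 4))
    Y = X @ W + 0.01 * rng.normal(size=(500, 4))   # four correlated "properties"

    net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
    net.fit(X, Y)                                  # multi-output fit = shared hidden layers
    print(net.score(X, Y))                         # R^2 across all tasks
    ```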

  7. A computational method for the identification of new candidate carcinogenic and non-carcinogenic chemicals.

    PubMed

    Chen, Lei; Chu, Chen; Lu, Jing; Kong, Xiangyin; Huang, Tao; Cai, Yu-Dong

    2015-09-01

    Cancer is one of the leading causes of human death. Based on current knowledge, one of the causes of cancer is exposure to toxic chemical compounds, including radioactive compounds, dioxin, and arsenic. The identification of new carcinogenic chemicals may warn us of potential danger and help to identify new ways to prevent cancer. In this study, a computational method was proposed to identify potential carcinogenic chemicals, as well as non-carcinogenic chemicals. According to the current validated carcinogenic and non-carcinogenic chemicals from the CPDB (Carcinogenic Potency Database), the candidate chemicals were searched in a weighted chemical network constructed according to chemical-chemical interactions. Then, the obtained candidate chemicals were further selected by a randomization test and information on chemical interactions and structures. The analyses identified several candidate carcinogenic chemicals, while those candidates identified as non-carcinogenic were supported by a literature search. In addition, several candidate carcinogenic/non-carcinogenic chemicals exhibit structural dissimilarity with validated carcinogenic/non-carcinogenic chemicals.

  8. Description and detection of burst events in turbulent flows

    NASA Astrophysics Data System (ADS)

    Schmid, P. J.; García-Gutierrez, A.; Jiménez, J.

    2018-04-01

    A mathematical and computational framework is developed for the detection and identification of coherent structures in turbulent wall-bounded shear flows. In a first step, this data-based technique uses an embedding methodology to formulate the fluid motion as a phase-space trajectory, from which state-transition probabilities can be computed. Within this formalism, a second step applies repeated clustering and graph-community techniques to determine a hierarchy of coherent structures ranked by their persistence. This latter information is used to detect highly transitory states that act as precursors to violent and intermittent events in turbulent fluid motion (e.g., bursts). Used as an analysis tool, this technique allows the objective identification of intermittent (but important) events in turbulent fluid motion; it also lays the foundation for advanced control strategies for their manipulation. The techniques are applied to low-dimensional model equations for turbulent transport, such as the self-sustaining process (SSP), at varying levels of complexity.
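
    The first two steps named above (delay embedding, then state-transition statistics over clustered states) can be sketched on a toy scalar signal; the SSP model, the graph-community step and the persistence ranking are omitted here.

    ```python
    # Delay embedding of a scalar series, k-means state labels, and an
    # empirical state-transition probability matrix.
    import numpy as np
    from sklearn.cluster import KMeans

    signal = np.sin(np.linspace(0, 60, 3000)) + \
             0.1 * np.random.default_rng(2).normal(size=3000)

    tau, dim = 5, 3   # each point becomes (u(t), u(t+tau), u(t+2*tau))
    emb = np.column_stack([signal[i * tau: len(signal) - (dim - 1 - i) * tau]
                           for i in range(dim)])

    labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(emb)
    P = np.zeros((4, 4))
    for s, t in zip(labels[:-1], labels[1:]):
        P[s, t] += 1
    P /= P.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
    print(np.round(P, 2))
    ```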

  9. Off-Site Indexing: A Cottage Industry.

    ERIC Educational Resources Information Center

    Fay, Catherine H.

    1984-01-01

    Briefly describes use of off-site staffing--indexers, abstractors, editors--in the production of two major databases: Management Contents and The Computer Data Base. Discussion covers the production sequence; database administrator; off-site indexer; savings (office space, furniture and equipment costs, salaries, and overhead); and problems…

  10. AQUIS: A PC-based air inventory and permit manager

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Smith, A.E.; Huber, C.C.; Tschanz, J.

    1992-01-01

    The Air Quality Utility Information System (AQUIS) was developed to calculate and track sources, emissions, stacks, permits, and related information. The system runs on IBM-compatible personal computers with dBASE IV and tracks more than 1,200 data items distributed among various source categories. AQUIS is currently operating at nine US Air Force facilities that have up to 1,000 sources. The system provides a flexible reporting capability that permits users who are unfamiliar with the database structure to design and prepare reports containing user-specified information. In addition to six criteria pollutants, AQUIS calculates compound-specific emissions and allows users to enter their own emission estimates.

  11. A machine-learned computational functional genomics-based approach to drug classification.

    PubMed

    Lötsch, Jörn; Ultsch, Alfred

    2016-12-01

    The public accessibility of "big data" about the molecular targets of drugs and the biological functions of genes allows novel data-science-based approaches to pharmacology that link drugs directly with their effects on pathophysiologic processes. This provides a phenotypic path to drug discovery and repurposing. This paper compares the performance of a functional genomics-based criterion with the traditional drug target-based classification. Knowledge discovery in the DrugBank and Gene Ontology databases allowed the construction of a "drug target versus biological process" matrix as a combination of "drug versus genes" and "genes versus biological processes" matrices. As a canonical example, such matrices were constructed for classical analgesic drugs. These matrices were projected onto a toroid grid of 50 × 82 artificial neurons using a self-organizing map (SOM). The distance and cluster structure of the high-dimensional feature space of the matrices were visualized on top of this SOM using a U-matrix. The cluster structure emerging on the U-matrix provided a correct classification of the analgesics into two main classes of opioid and non-opioid analgesics. The classification was flawless with both the functional genomics criterion and the traditional target-based criterion. The functional genomics approach inherently included the drugs' modulatory effects on biological processes. The main pharmacological actions known from pharmacological science were captured, e.g., actions on lipid signaling for the non-opioid analgesics, which comprised many NSAIDs, and actions on neuronal signal transmission for the opioid analgesics. Using machine-learned techniques for computational drug classification in a comparative assessment, a functional genomics-based criterion was found to be as suitable for drug classification as the traditional target-based criterion. This supports the utility of functional genomics-based approaches to computational systems pharmacology for drug discovery and repurposing.
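
    The matrix combination at the heart of the method is simple to state: the drug-versus-process matrix is the product of the drug-versus-gene and gene-versus-process matrices. A miniature example with two well-known analgesics (entries simplified for illustration):

    ```python
    # (drugs x genes) @ (genes x processes) -> (drugs x processes),
    # the matrix that is then fed to the SOM.
    import numpy as np

    drugs = ["morphine", "ibuprofen"]
    genes = ["OPRM1", "PTGS1", "PTGS2"]
    drug_gene = np.array([[1, 0, 0],      # morphine targets OPRM1
                          [0, 1, 1]])     # ibuprofen targets PTGS1/2
    gene_process = np.array([[1, 0],      # OPRM1  -> neuronal signaling
                             [0, 1],      # PTGS1  -> lipid signaling
                             [0, 1]])     # PTGS2  -> lipid signaling

    drug_process = drug_gene @ gene_process
    print(drug_process)  # rows: drugs; columns: (neuronal, lipid) process counts
    ```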

  12. Metabolomics Workbench: An international repository for metabolomics data and metadata, metabolite standards, protocols, tutorials and training, and analysis tools

    PubMed Central

    Sud, Manish; Fahy, Eoin; Cotter, Dawn; Azam, Kenan; Vadivelu, Ilango; Burant, Charles; Edison, Arthur; Fiehn, Oliver; Higashi, Richard; Nair, K. Sreekumaran; Sumner, Susan; Subramaniam, Shankar

    2016-01-01

    The Metabolomics Workbench, available at www.metabolomicsworkbench.org, is a public repository for metabolomics metadata and experimental data spanning various species and experimental platforms, metabolite standards, metabolite structures, protocols, tutorials, and training material and other educational resources. It provides a computational platform to integrate, analyze, track, deposit and disseminate large volumes of heterogeneous data from a wide variety of metabolomics studies including mass spectrometry (MS) and nuclear magnetic resonance spectrometry (NMR) data spanning over 20 different species covering all the major taxonomic categories including humans and other mammals, plants, insects, invertebrates and microorganisms. Additionally, a number of protocols are provided for a range of metabolite classes, sample types, and both MS and NMR-based studies, along with a metabolite structure database. The metabolites characterized in the studies available on the Metabolomics Workbench are linked to chemical structures in the metabolite structure database to facilitate comparative analysis across studies. The Metabolomics Workbench, part of the data coordinating effort of the National Institute of Health (NIH) Common Fund's Metabolomics Program, provides data from the Common Fund's Metabolomics Resource Cores, metabolite standards, and analysis tools to the wider metabolomics community and seeks data depositions from metabolomics researchers across the world. PMID:26467476

  13. ANN expert system screening for illicit amphetamines using molecular descriptors

    NASA Astrophysics Data System (ADS)

    Gosav, S.; Praisler, M.; Dorohoi, D. O.

    2007-05-01

    The goal of this study was to develop an artificial neural network (ANN) based on computed descriptors that would be able to classify the molecular structures of potential illicit amphetamines and to infer their biological activity from the similarity of their molecular structure to amphetamines of known toxicity. Such a system is needed for testing new molecular structures for epidemiological, clinical, and forensic purposes. It was built using a database of 146 compounds representing drugs of abuse (mainly central stimulants, hallucinogens, sympathomimetic amines, narcotics and other potent analgesics), precursors, and derivatized counterparts. Their molecular structures were characterized by computing three types of descriptors: 38 constitutional descriptors (CDs), 69 topological descriptors (TDs) and 160 3D-MoRSE descriptors (3DDs). An ANN was built for each category of variables. All three networks (CD-NN, TD-NN and 3DD-NN) were trained to distinguish between stimulant amphetamines, hallucinogenic amphetamines, and non-amphetamines. A selection of variables was performed where necessary. The efficiency with which each network identifies the class of an unknown sample was evaluated by calculating several figures of merit. The results of the comparative analysis are presented.

  14. New Toxico-Cheminformatics & Computational Toxicology ...

    EPA Pesticide Factsheets

    EPA’s National Center for Computational Toxicology is building capabilities to support a new paradigm for toxicity screening and prediction. The DSSTox project is improving public access to quality structure-annotated chemical toxicity information in less summarized forms than traditionally employed in SAR modeling, and in ways that facilitate data-mining, and data read-across. The DSSTox Structure-Browser provides structure searchability across all published DSSTox toxicity-related inventory, and is enabling linkages between previously isolated toxicity data resources. As of early March 2008, the public DSSTox inventory has been integrated into PubChem, allowing a user to take full advantage of PubChem structure-activity and bioassay clustering features. The most recent DSSTox version of the Carcinogenic Potency Database file (CPDBAS) illustrates ways in which various summary definitions of carcinogenic activity can be employed in modeling and data mining. Phase I of the ToxCastTM project is generating high-throughput screening data from several hundred biochemical and cell-based assays for a set of 320 chemicals, mostly pesticide actives, with rich toxicology profiles. Incorporating and expanding traditional SAR concepts into this new high-throughput and data-rich world pose conceptual and practical challenges, but also holds great promise for improving predictive capabilities.

  15. Searching the Cambridge Structural Database for polymorphs.

    PubMed

    van de Streek, Jacco; Motherwell, Sam

    2005-10-01

    In order to identify all pairs of polymorphs in the Cambridge Structural Database (CSD), a method was devised to compare two crystal structures automatically. The comparison is based on simulated powder diffraction patterns, with special provisions to deal with differences in unit-cell volumes caused by temperature or pressure. Among the 325,000 crystal structures in the Cambridge Structural Database, 35,000 pairs of crystal structures of the same chemical compound were identified and compared. A total of 7300 pairs of polymorphs were identified, of which 154 were previously unknown.
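
    The comparison principle can be sketched as follows: correlate two simulated powder patterns while scanning an isotropic 2-theta scale factor, so that a small temperature- or pressure-induced change in cell volume does not mask a true match. The implementation below is a toy stand-in for the paper's actual provisions.

    ```python
    # Correlate two powder patterns over a range of isotropic 2-theta scalings.
    import numpy as np

    def pattern_similarity(two_theta, pat_a, pat_b,
                           scales=np.linspace(0.97, 1.03, 61)):
        best = -1.0
        for s in scales:
            resampled = np.interp(two_theta, two_theta * s, pat_b)
            best = max(best, np.corrcoef(pat_a, resampled)[0, 1])
        return best

    tt = np.linspace(5, 50, 2000)
    peaks = lambda centers: sum(np.exp(-0.5 * ((tt - c) / 0.08) ** 2)
                                for c in centers)
    a = peaks([10.0, 17.3, 24.1])
    b = peaks([10.2, 17.65, 24.58])   # same phase, slightly expanded cell
    print(round(pattern_similarity(tt, a, b), 3))  # high score despite the shift
    ```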

  16. MicroUse: The Database on Microcomputer Applications in Libraries and Information Centers.

    ERIC Educational Resources Information Center

    Chen, Ching-chih; Wang, Xiaochu

    1984-01-01

    Describes MicroUse, a microcomputer-based database on microcomputer applications in libraries and information centers which was developed using relational database manager dBASE II. The description includes its system configuration, software utilized, the in-house-developed dBASE programs, multifile structure, basic functions, MicroUse records,…

  17. Computational assessment of hemodynamics-based diagnostic tools using a database of virtual subjects: Application to three case studies.

    PubMed

    Willemet, Marie; Vennin, Samuel; Alastruey, Jordi

    2016-12-08

    Many physiological indexes and algorithms based on pulse wave analysis have been suggested in order to better assess cardiovascular function. Because these tools are often computed from in-vivo hemodynamic measurements, their validation is time-consuming, challenging, and biased by measurement errors. Recently, a new methodology has been suggested to assess these computed tools theoretically: a database of virtual subjects generated using numerical 1D-0D modeling of arterial hemodynamics. The generated set of simulations encompasses a wide selection of healthy cases that could be encountered in a clinical study. We applied this new methodology to three case studies that demonstrate the potential of our new tool, each illustrated with a clinically relevant example: (i) we assessed the accuracy of indexes estimating pulse wave velocity; (ii) we validated and refined an algorithm that computes central blood pressure; and (iii) we investigated the theoretical mechanisms behind the augmentation index. Our database of virtual subjects is a new tool to assist the clinician: it provides insight into the physical mechanisms underlying the correlations observed in clinical practice. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.
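
    As a flavor of case study (i), the sketch below estimates foot-to-foot pulse wave velocity from two synthetic waveforms recorded a known distance apart, using peak positions for simplicity (clinical algorithms locate the wave foot instead). None of the numbers come from the virtual-subject database.

    ```python
    # Pulse wave velocity = distance / transit time between two sites.
    import numpy as np

    fs = 1000.0                                      # sampling frequency, Hz
    t = np.arange(0, 1.0, 1 / fs)
    pulse = lambda delay: np.exp(-0.5 * ((t - 0.2 - delay) / 0.02) ** 2)

    carotid, femoral = pulse(0.0), pulse(0.06)       # 60 ms transit time
    distance = 0.5                                   # metres between sites

    transit = (np.argmax(femoral) - np.argmax(carotid)) / fs
    print(distance / transit, "m/s")                 # ~8.3 m/s, a plausible PWV
    ```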

  18. ZifBASE: a database of zinc finger proteins and associated resources.

    PubMed

    Jayakanthan, Mannu; Muthukumaran, Jayaraman; Chandrasekar, Sanniyasi; Chawla, Konika; Punetha, Ankita; Sundar, Durai

    2009-09-09

    Information on the occurrence of zinc finger protein motifs in genomes is crucial to the developing field of molecular genome engineering, and knowledge of their target DNA-binding sequences is vital for developing chimeric proteins for targeted genome engineering and site-specific gene correction. There is a need for a computational resource on zinc finger proteins (ZFPs) to identify potential binding sites and their locations, which reduces the time spent on in vivo work and overcomes the difficulties in selecting the specific type of zinc finger protein and the target site in the DNA sequence. ZifBASE provides an extensive collection of natural and engineered ZFPs. It uses standard names and a genetic and structural classification scheme to present data retrieved from UniProtKB, GenBank, the Protein Data Bank, ModBase, the Protein Model Portal and the literature. It also incorporates specialized features of ZFPs, including finger sequences and positions, number of fingers, physicochemical properties, classes, framework, and PubMed citations, with links to experimental structures (PDB, if available) and modeled structures of natural zinc finger proteins. ZifBASE provides information on zinc finger proteins (both natural and engineered), the number of finger units in each of the zinc finger proteins (with multiple fingers), the synergy between adjacent fingers, and their positions. Additionally, it gives the individual finger sequence and the target DNA site to which it binds, for a better and clearer understanding of the interactions of adjacent fingers. The current version of ZifBASE contains 139 entries, of which 89 are engineered ZFPs containing 3-7 fingers, totaling 296 fingers, and 50 are natural zinc finger protein entries ranging from 2-13 fingers, totaling 307 fingers. It has sequences and structures from the literature, the Protein Data Bank, ModBase and the Protein Model Portal. The interface is cross-linked to other public databases such as UniProtKB, PDB, ModBase, the Protein Model Portal and PubMed to make it more informative. The database maintains information on sequence features, including the class, framework, number of fingers, residues, position, recognition site and physicochemical properties (molecular weight, isoelectric point) of both natural and engineered zinc finger proteins, and dissociation constants for a few. ZifBASE provides a more effective and efficient way of accessing zinc finger protein sequences and their target binding sites, with links to their three-dimensional structures. All the data and functions are available at the advanced web-based search interface http://web.iitd.ac.in/~sundar/zifbase.

  19. Improving Learning in Computer-Based Instruction through Questioning and Grouping Strategies

    ERIC Educational Resources Information Center

    Niemczyk, Mary; Savenye, Wilhelmina

    2010-01-01

    This study investigated the comparative effects of adjunct questions, student self-generated questions, and note taking on learning from a multimedia database. High school students worked individually or in cooperative dyads on a computer-based multimedia unit using a study guide to answer either adjunct questions, generate self-questions, or take…

  20. Computer-aided personal interviewing. A new technique for data collection in epidemiologic surveys.

    PubMed

    Birkett, N J

    1988-03-01

    Most epidemiologic studies involve the collection of data directly from selected respondents. Traditionally, interviewers are provided with the interview in booklet form on paper and answers are recorded therein. On receipt at the study office, the interview results are coded, transcribed, and keypunched for analysis. The author's team has developed a method of personal interviewing which uses a structured interview stored on a lap-sized computer. Responses are entered into the computer and are subject to immediate error-checking and correction. All skip-patterns are automatic. Data entry to the final data-base involves no manual data transcription. A pilot evaluation with a preliminary version of the system using tape-recorded interviews in a test/re-test methodology revealed a slightly higher error rate, probably related to weaknesses in the pilot system and the training process. Computer interviews tended to be longer but other features of the interview process were not affected by computer. The author's team has now completed 2,505 interviews using this system in a community-based blood pressure survey. It has been well accepted by both interviewers and respondents. Failure to complete an interview on the computer was uncommon (5 per cent) and well-handled by paper back-up questionnaires. The results show that computer-aided personal interviewing in the home is feasible but that further evaluation is needed to establish the impact of this methodology on overall data quality.

  1. mTM-align: a server for fast protein structure database search and multiple protein structure alignment.

    PubMed

    Dong, Runze; Pan, Shuo; Peng, Zhenling; Zhang, Yang; Yang, Jianyi

    2018-05-21

    With the rapid increase in the number of protein structures in the Protein Data Bank, it has become urgent to develop algorithms for efficient protein structure comparison. In this article, we present the mTM-align server, which consists of two closely related modules: one for structure database search and the other for multiple structure alignment. The database search is sped up using a heuristic algorithm and a hierarchical organization of the structures in the database. The multiple structure alignment is performed using the recently developed algorithm mTM-align. Benchmark tests demonstrate that our algorithms outperform peer methods in both modules, in terms of both speed and accuracy. One of the unique features of the server is the interplay between database search and multiple structure alignment: it provides service not only for fast database search, but also for accurate multiple structure alignment of the structures found by the search. A database search takes about 2-5 min for a structure of medium size (∼300 residues); a multiple structure alignment takes a few seconds for ∼10 structures of medium size. The server is freely available at: http://yanglab.nankai.edu.cn/mTM-align/.

  2. DG-AMMOS: a new tool to generate 3d conformation of small molecules using distance geometry and automated molecular mechanics optimization for in silico screening.

    PubMed

    Lagorce, David; Pencheva, Tania; Villoutreix, Bruno O; Miteva, Maria A

    2009-11-13

    Discovery of new bioactive molecules that could enter drug discovery programs or serve as chemical probes is a very complex and costly endeavor. Structure-based and ligand-based in silico screening approaches are nowadays used extensively to complement experimental screening, increasing the effectiveness of the process and facilitating the screening of thousands or millions of small molecules against a biomolecular target. Both in silico screening methods require a suitable chemical compound collection as input, and most often the 3D structures of the small molecules have to be generated, since compounds are usually delivered in 1D SMILES, CANSMILES or 2D SDF formats. Here, we describe the new open-source program DG-AMMOS, which generates 3D conformations of small molecules using distance geometry and performs their energy minimization via automated molecular mechanics optimization. The program is validated on the Astex dataset, the ChemBridge Diversity database and a number of small molecules with known crystal structures extracted from the Cambridge Structural Database. A comparison with the free program Balloon and the well-known commercial program Omega, which also generate 3D structures of small molecules, is carried out. The results show that the new free program DG-AMMOS is a very efficient 3D structure generation engine. DG-AMMOS provides fast, automated and reliable access to the generation of 3D conformations of small molecules and facilitates the preparation of a compound collection prior to high-throughput virtual screening computations. The validation of DG-AMMOS on several different datasets proves that the generated structures are generally of equal quality to, and sometimes better than, structures obtained by the other tested methods.
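
    DG-AMMOS itself is a standalone program, but its two-step recipe (distance-geometry embedding followed by molecular-mechanics minimization) can be reproduced in spirit with RDKit, whose default conformer generator is likewise distance-geometry based:

    ```python
    # Distance-geometry 3D embedding, then force-field minimization, with RDKit.
    from rdkit import Chem
    from rdkit.Chem import AllChem

    mol = Chem.AddHs(Chem.MolFromSmiles("CC(=O)Nc1ccc(O)cc1"))  # paracetamol
    AllChem.EmbedMolecule(mol, randomSeed=42)   # distance-geometry embedding
    AllChem.MMFFOptimizeMolecule(mol)           # molecular-mechanics minimization
    print(Chem.MolToMolBlock(mol)[:200])        # first lines of the 3D structure
    ```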

  3. Structure and Stability of Molecular Crystals with Many-Body Dispersion-Inclusive Density Functional Tight Binding.

    PubMed

    Mortazavi, Majid; Brandenburg, Jan Gerit; Maurer, Reinhard J; Tkatchenko, Alexandre

    2018-01-18

    Accurate prediction of structure and stability of molecular crystals is crucial in materials science and requires reliable modeling of long-range dispersion interactions. Semiempirical electronic structure methods are computationally more efficient than their ab initio counterparts, allowing structure sampling with significant speedups. We combine the Tkatchenko-Scheffler van der Waals method (TS) and the many-body dispersion method (MBD) with third-order density functional tight-binding (DFTB3) via a charge population-based method. We find an overall good performance for the X23 benchmark database of molecular crystals, despite an underestimation of crystal volume that can be traced to the DFTB parametrization. We achieve accurate lattice energy predictions with DFT+MBD energetics on top of vdW-inclusive DFTB3 structures, resulting in a speedup of up to 3000 times compared with a full DFT treatment. This suggests that vdW-inclusive DFTB3 can serve as a viable structural prescreening tool in crystal structure prediction.

  4. RepeatsDB-lite: a web server for unit annotation of tandem repeat proteins.

    PubMed

    Hirsh, Layla; Paladin, Lisanna; Piovesan, Damiano; Tosatto, Silvio C E

    2018-05-09

    RepeatsDB-lite (http://protein.bio.unipd.it/repeatsdb-lite) is a web server for the prediction of repetitive structural elements and units in tandem repeat (TR) proteins. TRs are a widespread but poorly annotated class of non-globular proteins carrying heterogeneous functions. RepeatsDB-lite extends the prediction to all TR types and strongly improves the performance both in terms of computational time and accuracy over previous methods, with precision above 95% for solenoid structures. The algorithm exploits an improved TR unit library derived from the RepeatsDB database to perform an iterative structural search and assignment. The web interface provides tools for analyzing the evolutionary relationships between units and manually refine the prediction by changing unit positions and protein classification. An all-against-all structure-based sequence similarity matrix is calculated and visualized in real-time for every user edit. Reviewed predictions can be submitted to RepeatsDB for review and inclusion.

  5. Vibrational Spectroscopy and Astrobiology

    NASA Technical Reports Server (NTRS)

    Chaban, Galina M.; Kwak, D. (Technical Monitor)

    2001-01-01

    The role of vibrational spectroscopy in solving problems related to astrobiology will be discussed. Vibrational (infrared) spectroscopy is a very sensitive tool for identifying molecules. The theoretical approach used in this work is based on the direct computation of anharmonic vibrational frequencies and intensities from electronic structure codes. One application of this computational technique is the possible identification of biological building blocks (amino acids, small peptides, DNA bases) in the interstellar medium (ISM). Identifying small biological molecules in the ISM is very important from the point of view of the origin of life. Hybrid (quantum mechanics/molecular mechanics) theoretical techniques will be discussed that may make it possible to obtain accurate vibrational spectra of biomolecular building blocks and to create a database of spectroscopic signatures that can assist observations of these molecules in space. Another application of the direct computational spectroscopy technique is to help design and analyze experimental observations of the ice surface of one of Jupiter's moons, Europa, which possibly contains hydrated salts. The presence of hydrated salts on the surface can be an indication of a subsurface ocean and the possible existence of life forms inhabiting such an ocean.

  6. Expression and Purification of a Novel Computationally Designed Antigen for Simultaneous Detection of HTLV-1 and HBV Antibodies.

    PubMed

    Heydari Zarnagh, Hafez; Ravanshad, Mehrdad; Pourfatollah, Ali Akbar; Rasaee, Mohammad Javad

    2015-04-01

    Computational tools are reliable alternatives to laborious work in chimeric protein design. In this study, a chimeric antigen was designed using computational techniques for the simultaneous detection of anti-HTLV-I and anti-HBV antibodies in infected sera. Databases were searched for the amino acid sequences of HBV/HTLV-I diagnostic antigens. The immunodominant fragments were selected based on propensity scales, and the diagnostic antigen was designed from these fragments. Secondary and tertiary structures were predicted and the B-cell epitopes were mapped on the surface of the built model. The synthetic DNA encoding the antigen was subcloned into the pGS21a expression vector. SDS-PAGE analysis showed that the glutathione-fused antigen was highly expressed in E. coli BL21 (DE3) cells. The recombinant antigen was purified by nickel affinity chromatography. ELISA results showed that the soluble antigen could react specifically with HTLV-I- and HBV-infected sera. This specific antigen could be used as a suitable agent for antibody-antigen based screening tests and can help clinicians perform quick and precise screening for HBV and HTLV-I infections.

  7. PASS2: an automated database of protein alignments organised as structural superfamilies.

    PubMed

    Bhaduri, Anirban; Pugalenthi, Ganesan; Sowdhamini, Ramanathan

    2004-04-02

    The functional selection and three-dimensional structural constraints of proteins in nature often relate to the retention of significant sequence similarity between proteins of similar fold and function despite poor sequence identity. Organization of structure-based sequence alignments for distantly related proteins provides a map of the conserved and critical regions of the protein universe that is useful for the analysis of folding principles, for the evolutionary unification of protein families and for maximizing the information return from experimental structure determination. The Protein Alignment organised as Structural Superfamily (PASS2) database represents continuously updated structural alignments for evolutionarily related, sequentially distant proteins. The automated, updated version of PASS2 is in direct correspondence with SCOP 1.63 and consists of sequences having identity below 40% among themselves. Protein domains have been grouped into 628 multi-member superfamilies and 566 single-member superfamilies. Structure-based sequence alignments for the superfamilies have been obtained using COMPARER, while initial equivalences were derived from a preliminary superposition using LSQMAN or STAMP 4.0. The final sequence alignments have been annotated for structural features using JOY4.0. The database is supplemented with sequence relatives belonging to different genomes, conserved spatially interacting and structural motifs, probabilistic hidden Markov models of superfamilies based on the alignments, and useful links to other databases. Probabilistic models and sensitive position-specific profiles obtained from reliable superfamily alignments aid the annotation of remote homologues and are useful tools in structural and functional genomics. PASS2 presents the phylogeny of its members based on both sequence and structural dissimilarities. Clustering of members allows us to understand the diversification of the family members. The search engine has been improved for simpler browsing of the database. The database resolves alignments among structural domains consisting of evolutionarily diverged sets of sequences. The availability of reliable sequence alignments of distantly related proteins despite poor sequence identity, together with single-member superfamilies, permits better sampling of structures in libraries for fold recognition of new sequences and for understanding protein structure-function relationships of individual superfamilies. PASS2 is accessible at http://www.ncbs.res.in/~faculty/mini/campass/pass2.html

  8. The Hype over Hyperbolic Browsers.

    ERIC Educational Resources Information Center

    Allen, Maryellen Mott

    2002-01-01

    Considers complaints about the usability in the human-computer interaction aspect of information retrieval and discusses information visualization, the Online Library of Information Visualization Environments, hyperbolic information structure, subject searching, real-world applications, relational databases and hyperbolic trees, and the future of…

  9. Embedded wavelet-based face recognition under variable position

    NASA Astrophysics Data System (ADS)

    Cotret, Pascal; Chevobbe, Stéphane; Darouich, Mehdi

    2015-02-01

    For several years, face recognition has been a hot topic in the image processing field: the technique is applied in several domains such as CCTV and electronic device unlocking. In this context, this work studies the efficiency of a wavelet-based face recognition method in terms of subject position robustness and performance on various systems. The use of the wavelet transform has a limited impact on the position robustness of PCA-based face recognition. This work shows, for a well-known database (the Yale Face Database B), that subject position in a 3D space can vary by up to 10% of the original ROI size without decreasing recognition rates. Face recognition is performed on the approximation coefficients of the image wavelet transform; results are still satisfying after 3 levels of decomposition. Furthermore, the face database size can be divided by a factor of 64 (2^(2K) with K = 3). In the context of ultra-embedded vision systems, memory footprint is one of the key points to be addressed, which is why compression techniques such as the wavelet transform are interesting. The approach also leads to a low-complexity face detection stage compliant with the limited computation resources available on such systems. The approach described in this work is tested on three platforms, from a standard x86-based computer to nanocomputers such as the Raspberry Pi and SECO boards. For K = 3 and a database of 40 faces, the mean execution time for one frame is 0.64 ms on an x86-based computer, 9 ms on a SECO board and 26 ms on a Raspberry Pi (model B).
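
    The factor-of-64 reduction quoted above follows directly from keeping only the level-K approximation of a 2D wavelet transform, since each level halves both image dimensions. A sketch with PyWavelets (the paper's exact wavelet and pipeline are not assumed here):

    ```python
    # Keep only the level-3 approximation of a 2D wavelet transform:
    # each level halves both dimensions, so storage shrinks by 2^(2K).
    import numpy as np
    import pywt

    face = np.random.default_rng(3).normal(size=(128, 128))  # stand-in face ROI
    coeffs = pywt.wavedec2(face, "haar", level=3)
    approx = coeffs[0]                                        # approximation at K=3
    print(face.size / approx.size)                            # -> 64.0
    ```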

  10. Visibiome: an efficient microbiome search engine based on a scalable, distributed architecture.

    PubMed

    Azman, Syafiq Kamarul; Anwar, Muhammad Zohaib; Henschel, Andreas

    2017-07-24

    Given the current influx of 16S rRNA profiles of microbiota samples, it is conceivable that large numbers of them will eventually be available for search, comparison and contextualization with respect to novel samples. This process facilitates the identification of similar compositional features in microbiota elsewhere and can therefore help to understand the driving factors of microbial community assembly. We present Visibiome, a microbiome search engine that can perform exhaustive, phylogeny-based similarity search and contextualization of user-provided samples against a comprehensive dataset of 16S rRNA profiles from a wide range of environments, while tackling several computational challenges. In order to scale to high demand, we developed a distributed system that combines web framework technology, task queueing and scheduling, cloud computing and a dedicated database server. To further ensure speed and efficiency, we have deployed nearest-neighbor search algorithms capable of sublinear searches in high-dimensional metric spaces, in combination with an optimized Earth Mover's Distance-based implementation of weighted UniFrac. The search also incorporates pairwise (adaptive) rarefaction and, optionally, 16S rRNA copy-number correction. The result of a query is the contextualization of the microbiome sample against a comprehensive database of microbiome samples from a diverse range of environments, visualized through a rich set of interactive figures and diagrams, including barchart-based compositional comparisons and a ranking of the closest matches in the database. Visibiome is a convenient, scalable and efficient framework for searching microbiomes against a comprehensive database of environmental samples. The search engine leverages a popular but computationally expensive phylogeny-based distance metric, while providing numerous advantages over the current state-of-the-art tool.

  11. 48 CFR 52.227-14 - Rights in Data-General.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... database or database means a collection of recorded information in a form capable of, and for the purpose... enable the computer program to be produced, created, or compiled. (2) Does not include computer databases... databases and computer software documentation). This term does not include computer software or financial...

  12. 48 CFR 52.227-14 - Rights in Data-General.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... database or database means a collection of recorded information in a form capable of, and for the purpose... enable the computer program to be produced, created, or compiled. (2) Does not include computer databases... databases and computer software documentation). This term does not include computer software or financial...

  13. 48 CFR 52.227-14 - Rights in Data-General.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... database or database means a collection of recorded information in a form capable of, and for the purpose... enable the computer program to be produced, created, or compiled. (2) Does not include computer databases... databases and computer software documentation). This term does not include computer software or financial...

  14. 48 CFR 52.227-14 - Rights in Data-General.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... database or database means a collection of recorded information in a form capable of, and for the purpose... enable the computer program to be produced, created, or compiled. (2) Does not include computer databases... databases and computer software documentation). This term does not include computer software or financial...

  15. A structural informatics approach to mine kinase knowledge bases.

    PubMed

    Brooijmans, Natasja; Mobilio, Dominick; Walker, Gary; Nilakantan, Ramaswamy; Denny, Rajiah A; Feyfant, Eric; Diller, David; Bikker, Jack; Humblet, Christine

    2010-03-01

    In this paper, we describe a combination of structural informatics approaches developed to mine data extracted from existing structure knowledge bases (the Protein Data Bank and the GVK database), with a focus on kinase ATP-binding site data. In contrast to existing systems that retrieve and analyze protein structures, our techniques are centered on a database of ligand-bound geometries in relation to the residues lining the binding site, and on transparent access to ligand-based SAR data. We illustrate the systems in the context of the Abelson kinase and related inhibitor structures. Copyright © 2009 Elsevier Ltd. All rights reserved.

  16. [A Terahertz Spectral Database Based on Browser/Server Technique].

    PubMed

    Zhang, Zhuo-yong; Song, Yue

    2015-09-01

    With the solution of key scientific and technical problems and the development of instrumentation, the application of terahertz technology in various fields has received more and more attention. Owing to its unique advantages, terahertz technology shows broad promise for fast, non-destructive detection, as well as in many other fields. Combined with complementary methods, it can be used to tackle many difficult practical problems that could not be solved before. One of the critical points for the further development of practical terahertz detection methods is a good and reliable terahertz spectral database. We recently developed a browser/server (B/S)-based terahertz spectral database, designing its main structure and functions to meet practical requirements. The database now includes more than 240 items, with spectral information collected from three sources: (1) collection and citation from terahertz spectral databases abroad; (2) published literature; and (3) spectral data measured in our laboratory. The present paper introduces the basic structure and fundamental functions of the database. One of its key functions is the calculation of optical parameters: the absorption coefficient, refractive index and other optical parameters can be computed from input THz time-domain spectra. The database search system provides registered users with convenient functions including registration, inquiry, display of spectral figures and molecular structures, and spectral matching; a registered user can compare an input THz spectrum with the spectra in the database and, using the resulting correlation coefficients, perform the search quickly and conveniently. The database can be accessed at http://www.teralibrary.com. It is based on spectral information so far and will be improved in the future; we hope it can provide users with powerful, convenient and highly efficient functions and promote broader applications of terahertz technology.
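
    The optical-parameter calculation mentioned above can be illustrated with the standard thick-sample transmission formulas of THz time-domain spectroscopy, n(w) = 1 + c*dphi/(w*d) and alpha = -(2/d)*ln(|T|*(n+1)^2/(4n)); whether the database uses exactly these expressions is an assumption here.

    ```python
    # Refractive index and absorption coefficient from a THz transmission
    # measurement (thick-sample approximation); synthetic inputs for an
    # n = 1.5 sample of thickness d.
    import numpy as np

    c = 2.998e8                                   # speed of light, m/s
    d = 1.0e-3                                    # sample thickness, m

    freq = np.linspace(0.2e12, 2.0e12, 10)        # Hz
    omega = 2 * np.pi * freq
    T_mag = np.full_like(freq, 0.6)               # |sample/reference| amplitude ratio
    dphi = omega * d * 0.5 / c                    # phase delay of an n = 1.5 sample

    n = 1 + c * dphi / (omega * d)                # -> 1.5 everywhere
    alpha = -(2 / d) * np.log(T_mag * (n + 1) ** 2 / (4 * n))
    print(n[0], alpha[0], "1/m")
    ```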

  17. Differentiation of several interstitial lung disease patterns in HRCT images using support vector machine: role of databases on performance

    NASA Astrophysics Data System (ADS)

    Kale, Mandar; Mukhopadhyay, Sudipta; Dash, Jatindra K.; Garg, Mandeep; Khandelwal, Niranjan

    2016-03-01

    Interstitial lung disease (ILD) is a complicated group of pulmonary disorders. High-resolution computed tomography (HRCT) is considered the best imaging technique for the analysis of pulmonary disorders. HRCT findings can be categorized into several patterns (consolidation, emphysema, ground-glass opacity, nodular, normal, etc.) based on their texture-like appearance. Clinicians often find these patterns difficult to diagnose because of their complex nature. In such scenarios, a computer-aided diagnosis system could help clinicians identify the patterns. Several approaches have been proposed for the classification of ILD patterns, typically the computation of textural features followed by the training and testing of classifiers such as artificial neural networks (ANN) and support vector machines (SVM). In this paper, wavelet features are calculated from two different ILD databases, the publicly available MedGIFT ILD database and a private ILD database, followed by a performance evaluation of ANN and SVM classifiers in terms of average accuracy. It is found that the average classification accuracy of the SVM exceeds that of the ANN when both are trained and tested on the same database. The investigation continued by testing the variation in classifier accuracy when training and testing are performed on alternate databases, and when the classifiers are trained and tested on a database formed by merging samples of the same class from the two individual databases. The average classification accuracy drops when two independent databases are used for training and testing, respectively, and improves significantly when the classifiers are trained and tested on the merged database. This indicates the dependence of classification accuracy on the training data.
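
    The three evaluation protocols compared in the paper (same-database, cross-database, merged-database) can be sketched schematically; the feature sets below are random stand-ins with a deliberate distribution shift, not the MedGIFT or private HRCT features.

    ```python
    # Same-database, cross-database and merged-database SVM evaluation on two
    # synthetic "databases" with a feature-distribution shift between them.
    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(4)

    def fake_db(shift):
        X = rng.normal(loc=shift, size=(200, 16))
        y = (X[:, 0] > shift).astype(int)
        return X, y

    (Xa, ya), (Xb, yb) = fake_db(0.0), fake_db(1.0)

    same = SVC().fit(Xa[:150], ya[:150]).score(Xa[150:], ya[150:])
    cross = SVC().fit(Xa, ya).score(Xb, yb)
    mX, my = np.vstack([Xa, Xb]), np.concatenate([ya, yb])
    merged = SVC().fit(mX[::2], my[::2]).score(mX[1::2], my[1::2])
    print(f"same={same:.2f} cross={cross:.2f} merged={merged:.2f}")
    ```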

  18. Structure-based virtual screening and characterization of a novel IL-6 antagonistic compound from synthetic compound database.

    PubMed

    Wang, Jing; Qiao, Chunxia; Xiao, He; Lin, Zhou; Li, Yan; Zhang, Jiyan; Shen, Beifen; Fu, Tinghuan; Feng, Jiannan

    2016-01-01

    According to the three-dimensional (3D) structure of the (hIL-6·hIL-6R·gp130)₂ complex and the binding orientation of hIL-6, three compounds with high affinity for hIL-6R and the bioactivity to block hIL-6 in vitro were screened theoretically from chemical databases, including the 3D-Available Chemicals Directory (ACD) and the MDL Drug Data Report (MDDR), by means of a computer-guided virtual screening method. Using distance geometry, molecular modeling and molecular dynamics trajectory analysis, the binding modes and binding energies of the three compounds were evaluated theoretically. Enzyme-linked immunosorbent assay analysis demonstrated that all three compounds could specifically block IL-6 binding to IL-6R. However, only compound 1 effectively antagonized the function of hIL-6 and inhibited the proliferation of XG-7 cells in a dose-dependent manner, while showing no cytotoxicity to SP2/0 or L929 cells. These data demonstrate that compound 1 could be a promising hIL-6 antagonist candidate.

  19. NALDB: nucleic acid ligand database for small molecules targeting nucleic acid

    PubMed Central

    Kumar Mishra, Subodh; Kumar, Amit

    2016-01-01

    The Nucleic Acid Ligand Database (NALDB) is a unique database that provides detailed experimental information on small molecules reported to target several types of nucleic acid structures. NALDB is the first ligand database covering all types of nucleic acids. It contains more than 3500 ligand entries with detailed pharmacokinetic and pharmacodynamic information such as target name, target sequence, ligand 2D/3D structure, SMILES, molecular formula, molecular weight, net formal charge, AlogP, number of rings, number of hydrogen-bond donors and acceptors, and potential energy, along with Ki, Kd and IC50 values. Having all these details on a single platform should aid the development and refinement of novel ligands targeting nucleic acids, which could serve as potential targets in different diseases including cancers and neurological disorders. With up to 255 conformers for each ligand entry, NALDB is a multi-conformer database and can facilitate the virtual screening process. NALDB provides powerful web-based search tools that make database searching efficient and simple, with options for text as well as structure queries. It also provides a multi-dimensional advanced search tool that can screen the database molecules on the basis of molecular properties supplied by the user. A 3D structure visualization tool has been included for 3D representation of ligand structures. NALDB offers inclusive pharmacological information and a structurally flexible set of small molecules with their three-dimensional conformers, which can accelerate virtual screening and other modeling processes and eventually complement nucleic acid-based drug discovery research. NALDB is routinely updated and freely available at bsbe.iiti.ac.in/bsbe/naldb/HOME.php. Database URL: http://bsbe.iiti.ac.in/bsbe/naldb/HOME.php PMID:26896846
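
    A multi-dimensional property search of the kind described can be reduced to range filters over ligand records. The sketch below uses hypothetical field names and a plain in-memory list rather than NALDB's actual schema or query engine.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Ligand:
    name: str
    mol_weight: float      # g/mol
    alogp: float
    n_rings: int
    hbd: int               # hydrogen-bond donors
    hba: int               # hydrogen-bond acceptors

def advanced_search(ligands: List[Ligand], *, mw=(0.0, 1e9), alogp=(-10.0, 10.0),
                    max_hbd=99, max_hba=99) -> List[Ligand]:
    """Return ligands whose properties fall inside every requested range."""
    return [lg for lg in ligands
            if mw[0] <= lg.mol_weight <= mw[1]
            and alogp[0] <= lg.alogp <= alogp[1]
            and lg.hbd <= max_hbd and lg.hba <= max_hba]

hits = advanced_search(
    [Ligand("L1", 342.4, 2.1, 3, 2, 5), Ligand("L2", 612.8, 4.9, 5, 6, 11)],
    mw=(150, 500), alogp=(-1, 4), max_hbd=5, max_hba=10)   # Lipinski-like cut
print([h.name for h in hits])    # -> ['L1']
```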

  20. A dynamic clinical dental relational database.

    PubMed

    Taylor, D; Naguib, R N G; Boulton, S

    2004-09-01

    The traditional approach to relational database design is based on the logical organization of data into a number of related normalized tables. One assumption is that the nature and structure of the data is known at the design stage. In the case of designing a relational database to store historical dental epidemiological data from individual clinical surveys, the structure of the data is not known until the data is presented for inclusion into the database. This paper addresses the issues concerned with the theoretical design of a clinical dynamic database capable of adapting the internal table structure to accommodate clinical survey data, and presents a prototype database application capable of processing, displaying, and querying the dental data.
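
    The core idea, creating tables whose schema is derived from each survey's data at load time, can be sketched with SQLite as follows; the survey format and column names are hypothetical, not the paper's implementation.

```python
import sqlite3

def load_survey(conn, survey_name, columns, rows):
    """Create one table per clinical survey, deriving its schema from the
    columns present in that survey's data file (all TEXT for simplicity)."""
    col_defs = ", ".join(f'"{c}" TEXT' for c in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{survey_name}" ({col_defs})')
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(f'INSERT INTO "{survey_name}" VALUES ({placeholders})', rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load_survey(conn, "survey_1998",
            ["subject_id", "age", "dmft_score"],   # columns discovered at load time
            [("s01", "12", "3"), ("s02", "11", "1")])
print(conn.execute('SELECT * FROM "survey_1998"').fetchall())
```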

  1. The Computer Bulletin Board. Modified Gran Plots of Very Weak Acids on a Spreadsheet.

    ERIC Educational Resources Information Center

    Chau, F. T.; And Others

    1990-01-01

    Presented are two applications of computer technology to chemistry instruction: the use of a spreadsheet program to analyze acid-base titration curves and the use of database software to catalog stockroom inventories. (CW)

  2. Fragger: a protein fragment picker for structural queries.

    PubMed

    Berenger, Francois; Simoncini, David; Voet, Arnout; Shrestha, Rojan; Zhang, Kam Y J

    2017-01-01

    Protein modeling and design activities often require querying the Protein Data Bank (PDB) with a structural fragment, possibly containing gaps. For some applications, it is preferable to work on a specific subset of the PDB or with unpublished structures. These requirements, along with specific user needs, motivated the creation of new software to manage and query 3D protein fragments. Fragger is a protein fragment picker that allows protein fragment databases to be created and queried. All fragment lengths are supported and any set of PDB files can be used to create a database. Fragger can efficiently search a fragment database with a query fragment and a distance threshold. Matching fragments are ranked by distance to the query. The query fragment can have structural gaps, and the allowed amino acid sequences matching a query can be constrained via a regular expression of one-letter amino acid codes. Fragger also incorporates a tool to compute the backbone RMSD of one versus many fragments in high throughput. Fragger should be useful for protein design, loop grafting and related structural bioinformatics tasks.
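
    A minimal sketch of this query model: stored fragments are filtered by a sequence regular expression and ranked by backbone RMSD against the query. The toy version below skips superposition (a full treatment would Kabsch-align first), and its data structures are hypothetical rather than Fragger's own.

```python
import re
import numpy as np

def rmsd(a, b):
    """Coordinate RMSD between two (N, 3) arrays (no superposition;
    a full treatment would Kabsch-align the fragments first)."""
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))

def query(db, q_coords, seq_regex=".*", threshold=2.0):
    """db: iterable of (sequence, coords) pairs. Returns (rmsd, sequence)
    hits under the distance threshold, best first."""
    pat = re.compile(seq_regex)
    hits = [(rmsd(q_coords, coords), seq)
            for seq, coords in db
            if pat.fullmatch(seq) and coords.shape == q_coords.shape]
    return sorted(h for h in hits if h[0] <= threshold)

db = [("ACDEF", np.random.rand(5, 3)), ("AMDEF", np.random.rand(5, 3))]
print(query(db, np.random.rand(5, 3), seq_regex="A.DEF", threshold=10.0))
```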

  3. A data analysis expert system for large established distributed databases

    NASA Technical Reports Server (NTRS)

    Gnacek, Anne-Marie; An, Y. Kim; Ryan, J. Patrick

    1987-01-01

    A design for a natural language database interface system, called the Deductively Augmented NASA Management Decision support System (DANMDS), is presented. The DANMDS system components have been chosen on the basis of the following considerations: maximal employment of the existing NASA IBM-PC computers and supporting software; local structuring and storing of external data via the entity-relationship model; a natural, easy-to-use, error-free database query language; user ability to alter the query language vocabulary and data analysis heuristics; and significant artificial intelligence data analysis heuristic techniques that allow the system to become progressively and automatically more useful.

  4. Structure and weights optimisation of a modified Elman network emotion classifier using hybrid computational intelligence algorithms: a comparative study

    NASA Astrophysics Data System (ADS)

    Sheikhan, Mansour; Abbasnezhad Arabi, Mahdi; Gharavian, Davood

    2015-10-01

    Artificial neural networks are efficient models in pattern recognition applications, but their performance depends on employing a suitable structure and connection weights. This study used a hybrid method to obtain the optimal weight set and architecture of a recurrent neural emotion classifier, based on the gravitational search algorithm (GSA) and its binary version (BGSA), respectively. By considering features of the speech signal related to prosody, voice quality, and spectrum, a rich feature set was constructed. To select the more efficient features, a fast feature selection method was employed. The performance of the proposed hybrid GSA-BGSA method was compared with similar hybrid methods based on the particle swarm optimisation (PSO) algorithm and its binary version, on PSO and the discrete firefly algorithm, and on a hybrid of error back-propagation and genetic algorithm used for optimisation. Experimental tests on the Berlin emotional database demonstrated the superior performance of the proposed method using a lighter network structure.

  5. UQlust: combining profile hashing with linear-time ranking for efficient clustering and analysis of big macromolecular data.

    PubMed

    Adamczak, Rafal; Meller, Jarek

    2016-12-28

    Advances in computing have enabled current protein and RNA structure prediction and molecular simulation methods to dramatically increase their sampling of conformational spaces. The quickly growing number of experimentally resolved structures, and databases such as the Protein Data Bank, also implies large scale structural similarity analyses to retrieve and classify macromolecular data. Consequently, the computational cost of structure comparison and clustering for large sets of macromolecular structures has become a bottleneck that necessitates further algorithmic improvements and development of efficient software solutions. uQlust is a versatile and easy-to-use tool for ultrafast ranking and clustering of macromolecular structures. uQlust makes use of structural profiles of proteins and nucleic acids, while combining a linear-time algorithm for implicit comparison of all pairs of models with profile hashing to enable efficient clustering of large data sets with a low memory footprint. In addition to ranking and clustering of large sets of models of the same protein or RNA molecule, uQlust can also be used in conjunction with fragment-based profiles in order to cluster structures of arbitrary length. For example, hierarchical clustering of the entire PDB using profile hashing can be performed on a typical laptop, thus opening an avenue for structural explorations previously limited to dedicated resources. The uQlust package is freely available under the GNU General Public License at https://github.com/uQlust . uQlust represents a drastic reduction in the computational complexity and memory requirements with respect to existing clustering and model quality assessment methods for macromolecular structure analysis, while yielding results on par with traditional approaches for both proteins and RNAs.
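
    The profile-hashing idea can be sketched in a few lines: once each model is reduced to a short discrete profile, identical profiles hash to the same bucket and grouping becomes linear-time, with no pairwise comparisons. The toy profiles below are illustrative; uQlust's real profiles and ranking are considerably richer.

```python
from collections import defaultdict

def hash_cluster(models):
    """models: (name, profile) pairs, where a profile is a short discrete
    string (e.g. one coarse secondary-structure symbol per residue).
    Identical profiles land in the same bucket, so grouping is O(n)."""
    buckets = defaultdict(list)
    for name, prof in models:
        buckets[prof].append(name)
    # rank clusters by size: the largest bucket's members are the most
    # "typical" models, a common model-quality heuristic
    return sorted(buckets.values(), key=len, reverse=True)

models = [("m1", "HHHEEC"), ("m2", "HHHEEC"), ("m3", "CCEEEC")]
print(hash_cluster(models))   # -> [['m1', 'm2'], ['m3']]
```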

  6. Vanderbilt University Institute of Imaging Science Center for Computational Imaging XNAT: A multimodal data archive and processing environment.

    PubMed

    Harrigan, Robert L; Yvernault, Benjamin C; Boyd, Brian D; Damon, Stephen M; Gibney, Kyla David; Conrad, Benjamin N; Phillips, Nicholas S; Rogers, Baxter P; Gao, Yurui; Landman, Bennett A

    2016-01-01

    The Vanderbilt University Institute for Imaging Science (VUIIS) Center for Computational Imaging (CCI) has developed a database built on XNAT housing over a quarter of a million scans. The database provides a framework for (1) rapid prototyping, (2) large-scale batch processing of images and (3) scalable project management. The system uses the web-based interfaces of XNAT and REDCap to allow for graphical interaction. A Python middleware layer, the Distributed Automation for XNAT (DAX) package, distributes computation across the Vanderbilt Advanced Computing Center for Research and Education high-performance computing center. All software is made available in open source for use in combining portable batch scripting (PBS) grids and XNAT servers. Copyright © 2015 Elsevier Inc. All rights reserved.

  7. Novel Hybrid Virtual Screening Protocol Based on Molecular Docking and Structure-Based Pharmacophore for Discovery of Methionyl-tRNA Synthetase Inhibitors as Antibacterial Agents

    PubMed Central

    Liu, Chi; He, Gu; Jiang, Qinglin; Han, Bo; Peng, Cheng

    2013-01-01

    Methionyl-tRNA synthetase (MetRS) is an essential enzyme involved in protein biosynthesis in all living organisms and is a potential antibacterial target. In the current study, a structure-based pharmacophore (SBP)-guided method is suggested for generating a comprehensive pharmacophore of MetRS based on fourteen crystal structures of MetRS-inhibitor complexes. A hybrid virtual screening protocol, comprising pharmacophore model-based virtual screening (PBVS) and rigid and flexible docking-based virtual screenings (DBVS), is used to retrieve new MetRS inhibitors from commercially available chemical databases. This hybrid virtual screening approach was applied to screen the Specs database (202,408 compounds), a structurally diverse chemical database. Fifteen hit compounds were selected from the final hits and advanced to experimental studies. These results may provide important information for further research on novel MetRS inhibitors as antibacterial agents. PMID:23839093

  8. Building an Integrated Environment for Multimedia

    NASA Technical Reports Server (NTRS)

    1997-01-01

    Multimedia courseware on the solar system and earth science suitable for use in elementary, middle, and high schools was developed under this grant. The courseware runs on Silicon Graphics, Incorporated (SGI) workstations and personal computers (PCs). There is also a version of the courseware accessible via the World Wide Web. Accompanying multimedia database systems were also developed to enhance the multimedia courseware. The database systems accompanying the PC software are based on the relational model, while the database systems accompanying the SGI software are based on the object-oriented model.

  9. MODBASE, a database of annotated comparative protein structure models

    PubMed Central

    Pieper, Ursula; Eswar, Narayanan; Stuart, Ashley C.; Ilyin, Valentin A.; Sali, Andrej

    2002-01-01

    MODBASE (http://guitar.rockefeller.edu/modbase) is a relational database of annotated comparative protein structure models for all available protein sequences matched to at least one known protein structure. The models are calculated by MODPIPE, an automated modeling pipeline that relies on PSI-BLAST, IMPALA and MODELLER. MODBASE uses the MySQL relational database management system for flexible and efficient querying, and the MODVIEW Netscape plugin for viewing and manipulating multiple sequences and structures. It is updated regularly to reflect the growth of the protein sequence and structure databases, as well as improvements in the software for calculating the models. For ease of access, MODBASE is organized into different datasets. The largest dataset contains models for domains in 304 517 out of 539 171 unique protein sequences in the complete TrEMBL database (23 March 2001); only models based on significant alignments (PSI-BLAST E-value < 10^-4) and models assessed to have the correct fold are included. Other datasets include models for target selection and structure-based annotation by the New York Structural Genomics Research Consortium, models for prediction of genes in the Drosophila melanogaster genome, models for structure determination of several ribosomal particles and models calculated by the MODWEB comparative modeling web server. PMID:11752309

  10. Relational databases: a transparent framework for encouraging biology students to think informatically.

    PubMed

    Rice, Michael; Gladstone, William; Weir, Michael

    2004-01-01

    We discuss how relational databases constitute an ideal framework for representing and analyzing large-scale genomic data sets in biology. As a case study, we describe a Drosophila splice-site database that we recently developed at Wesleyan University for use in research and teaching. The database stores data about splice sites computed by a custom algorithm using Drosophila cDNA transcripts and genomic DNA and supports a set of procedures for analyzing splice-site sequence space. A generic Web interface permits the execution of the procedures with a variety of parameter settings and also supports custom structured query language queries. Moreover, new analytical procedures can be added by updating special metatables in the database without altering the Web interface. The database provides a powerful setting for students to develop informatic thinking skills.

  11. Relational Databases: A Transparent Framework for Encouraging Biology Students To Think Informatically

    PubMed Central

    2004-01-01

    We discuss how relational databases constitute an ideal framework for representing and analyzing large-scale genomic data sets in biology. As a case study, we describe a Drosophila splice-site database that we recently developed at Wesleyan University for use in research and teaching. The database stores data about splice sites computed by a custom algorithm using Drosophila cDNA transcripts and genomic DNA and supports a set of procedures for analyzing splice-site sequence space. A generic Web interface permits the execution of the procedures with a variety of parameter settings and also supports custom structured query language queries. Moreover, new analytical procedures can be added by updating special metatables in the database without altering the Web interface. The database provides a powerful setting for students to develop informatic thinking skills. PMID:15592597
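
    The kind of custom SQL such a splice-site database supports can be illustrated with SQLite; the table layout, column names, and consensus test below are hypothetical stand-ins for the Wesleyan schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE splice_site (
        gene TEXT, kind TEXT,            -- 'donor' or 'acceptor'
        position INTEGER, sequence TEXT  -- window around the site
    );
    INSERT INTO splice_site VALUES
        ('dpp', 'donor',    1021, 'CAGGTAAGT'),
        ('dpp', 'acceptor', 2210, 'TTTTTCCAG'),
        ('wg',  'donor',     512, 'AAGGTGAGT');
""")
# count donor sites carrying the canonical GT at intron positions +1/+2
rows = conn.execute("""
    SELECT gene, COUNT(*) FROM splice_site
    WHERE kind = 'donor' AND substr(sequence, 4, 2) = 'GT'
    GROUP BY gene
""").fetchall()
print(rows)   # -> [('dpp', 1), ('wg', 1)]
```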

  12. Predicting domain-domain interaction based on domain profiles with feature selection and support vector machines

    PubMed Central

    2010-01-01

    Background Protein-protein interaction (PPI) plays essential roles in cellular functions. The cost, time and other limitations associated with the current experimental methods have motivated the development of computational methods for predicting PPIs. As protein interactions generally occur via domains instead of the whole molecules, predicting domain-domain interaction (DDI) is an important step toward PPI prediction. Computational methods developed so far have utilized information from various sources at different levels, from primary sequences, to molecular structures, to evolutionary profiles. Results In this paper, we propose a computational method to predict DDI using support vector machines (SVMs), based on domains represented as interaction profile hidden Markov models (ipHMM) where interacting residues in domains are explicitly modeled according to the three dimensional structural information available at the Protein Data Bank (PDB). Features about the domains are extracted first as the Fisher scores derived from the ipHMM and then selected using singular value decomposition (SVD). Domain pairs are represented by concatenating their selected feature vectors, and classified by a support vector machine trained on these feature vectors. The method is tested by leave-one-out cross validation experiments with a set of interacting protein pairs adopted from the 3DID database. The prediction accuracy has shown significant improvement as compared to InterPreTS (Interaction Prediction through Tertiary Structure), an existing method for PPI prediction that also uses the sequences and complexes of known 3D structure. Conclusions We show that domain-domain interaction prediction can be significantly enhanced by exploiting information inherent in the domain profiles via feature selection based on Fisher scores, singular value decomposition and supervised learning based on support vector machines. Datasets and source code are freely available on the web at http://liao.cis.udel.edu/pub/svdsvm. Implemented in Matlab and supported on Linux and MS Windows. PMID:21034480
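
    A minimal sketch of the pipeline's shape, under stated assumptions: random matrices stand in for ipHMM Fisher scores, an SVD projection stands in for the feature selection step, domain pairs are represented by concatenated reduced vectors, and an SVM is evaluated by cross-validation (the paper uses leave-one-out on real 3DID-derived data).

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_domains, dim, k = 60, 500, 20
F = rng.normal(size=(n_domains, dim))      # stand-in for ipHMM Fisher scores

# SVD-based reduction: keep projections onto the top-k right singular vectors
_, _, Vt = np.linalg.svd(F, full_matrices=False)
F_red = F @ Vt[:k].T                       # (n_domains, k)

# domain pairs = concatenated reduced vectors; labels are placeholders
pairs = [(i, j) for i in range(n_domains) for j in range(i + 1, n_domains)][:300]
X = np.array([np.concatenate([F_red[i], F_red[j]]) for i, j in pairs])
y = rng.integers(0, 2, size=len(pairs))

print(cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean())
```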

  13. Real-time solution of linear computational problems using databases of parametric reduced-order models with arbitrary underlying meshes

    NASA Astrophysics Data System (ADS)

    Amsallem, David; Tezaur, Radek; Farhat, Charbel

    2016-12-01

    A comprehensive approach for real-time computations using a database of parametric, linear, projection-based reduced-order models (ROMs) based on arbitrary underlying meshes is proposed. In the offline phase of this approach, the parameter space is sampled and linear ROMs defined by linear reduced operators are pre-computed at the sampled parameter points and stored. Then, these operators and associated ROMs are transformed into counterparts that satisfy a certain notion of consistency. In the online phase of this approach, a linear ROM is constructed in real-time at a queried but unsampled parameter point by interpolating the pre-computed linear reduced operators on matrix manifolds and therefore computing an interpolated linear ROM. The proposed overall model reduction framework is illustrated with two applications: a parametric inverse acoustic scattering problem associated with a mockup submarine, and a parametric flutter prediction problem associated with a wing-tank system. The second application is implemented on a mobile device, illustrating the capability of the proposed computational framework to operate in real-time.
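
    The online phase can be sketched as interpolation of pre-computed reduced operators at a queried parameter point. For brevity the sketch below interpolates matrix entries directly in one parameter dimension; the paper's method interpolates on matrix manifolds after a consistency transformation, which is not attempted here.

```python
import numpy as np

def interpolate_rom(mu_samples, A_samples, mu_query):
    """Piecewise-linear interpolation of reduced operators A(mu) in 1-D."""
    mu_samples = np.asarray(mu_samples)
    A = np.stack(A_samples)                        # (n_samples, r, r)
    i = np.searchsorted(mu_samples, mu_query) - 1  # enclosing interval
    i = int(np.clip(i, 0, len(mu_samples) - 2))
    w = (mu_query - mu_samples[i]) / (mu_samples[i + 1] - mu_samples[i])
    return (1 - w) * A[i] + w * A[i + 1]

r = 4
A_samples = [np.eye(r) * mu for mu in (0.1, 0.5, 1.0)]   # toy reduced operators
A_q = interpolate_rom([0.1, 0.5, 1.0], A_samples, 0.75)
print(np.allclose(A_q, np.eye(r) * 0.75))                # -> True
```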

  14. A reactive, scalable, and transferable model for molecular energies from a neural network approach based on local information

    NASA Astrophysics Data System (ADS)

    Unke, Oliver T.; Meuwly, Markus

    2018-06-01

    Despite the ever-increasing computer power, accurate ab initio calculations for large systems (thousands to millions of atoms) remain infeasible. Instead, approximate empirical energy functions are used. Most current approaches are either transferable between different chemical systems, but not particularly accurate, or they are fine-tuned to a specific application. In this work, a data-driven method to construct a potential energy surface based on neural networks is presented. Since the total energy is decomposed into local atomic contributions, the evaluation is easily parallelizable and scales linearly with system size. With prediction errors below 0.5 kcal mol^-1 for both unknown molecules and configurations, the method is accurate across chemical and configurational space, which is demonstrated by applying it to datasets from nonreactive and reactive molecular dynamics simulations and a diverse database of equilibrium structures. The possibility to use small molecules as reference data to predict larger structures is also explored. Since the descriptor only uses local information, high-level ab initio methods, which are computationally too expensive for large molecules, become feasible for generating the necessary reference data used to train the neural network.
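
    The decomposition at the heart of the approach is easy to state in code: the total energy is a sum of per-atom contributions, each predicted from a descriptor of that atom's local environment, which makes the sum trivially parallel and linear in system size. The tiny fixed-weight network below is a placeholder for the trained model.

```python
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(16, 8)), np.zeros(16)   # toy "trained" parameters
W2, b2 = rng.normal(size=(1, 16)), np.zeros(1)

def atomic_energy(descriptor):
    """One atom's energy contribution from its local-environment descriptor."""
    h = np.tanh(W1 @ descriptor + b1)
    return float(W2 @ h + b2)

def total_energy(descriptors):
    # locality: each term depends only on one atom's neighbourhood
    return sum(atomic_energy(d) for d in descriptors)

mol = [rng.normal(size=8) for _ in range(5)]      # 5 atoms, 8-dim descriptors
print(total_energy(mol))
```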

  15. Computational modeling-based discovery of novel classes of anti-inflammatory drugs that target lanthionine synthetase C-like protein 2.

    PubMed

    Lu, Pinyi; Hontecillas, Raquel; Horne, William T; Carbo, Adria; Viladomiu, Monica; Pedragosa, Mireia; Bevan, David R; Lewis, Stephanie N; Bassaganya-Riera, Josep

    2012-01-01

    Lanthionine synthetase component C-like protein 2 (LANCL2) is a member of the eukaryotic lanthionine synthetase component C-like protein family involved in signal transduction and insulin sensitization. Recently, LANCL2 was identified as a target for the binding and signaling of abscisic acid (ABA), a plant hormone with anti-diabetic and anti-inflammatory effects. The goal of this study was to determine the role of LANCL2 as a potential therapeutic target for developing novel drugs and nutraceuticals against inflammatory diseases. Previously, we performed homology modeling to construct a three-dimensional structure of LANCL2 using the crystal structure of lanthionine synthetase component C-like protein 1 (LANCL1) as a template. Using this model, structure-based virtual screening was performed using compounds from the NCI (National Cancer Institute) Diversity Set II, ChemBridge, ZINC natural products, and FDA-approved drugs databases. Several potential ligands were identified using molecular docking. To validate the anti-inflammatory efficacy of the top-ranked compound (NSC61610) in the NCI Diversity Set II, a series of in vitro and pre-clinical efficacy studies were performed using a mouse model of dextran sodium sulfate (DSS)-induced colitis. Our findings showed that the lead compound, NSC61610, activated peroxisome proliferator-activated receptor gamma in a LANCL2- and adenylate cyclase/cAMP-dependent manner in vitro and ameliorated experimental colitis by down-modulating colonic inflammatory gene expression and favoring regulatory T cell responses. LANCL2 is a novel therapeutic target for inflammatory diseases. High-throughput, structure-based virtual screening is an effective computational drug design method for discovering anti-inflammatory LANCL2-based drug candidates.

  16. Computational Modeling-Based Discovery of Novel Classes of Anti-Inflammatory Drugs That Target Lanthionine Synthetase C-Like Protein 2

    PubMed Central

    Lu, Pinyi; Hontecillas, Raquel; Horne, William T.; Carbo, Adria; Viladomiu, Monica; Pedragosa, Mireia; Bevan, David R.; Lewis, Stephanie N.; Bassaganya-Riera, Josep

    2012-01-01

    Background Lanthionine synthetase component C-like protein 2 (LANCL2) is a member of the eukaryotic lanthionine synthetase component C-like protein family involved in signal transduction and insulin sensitization. Recently, LANCL2 was identified as a target for the binding and signaling of abscisic acid (ABA), a plant hormone with anti-diabetic and anti-inflammatory effects. Methodology/Principal Findings The goal of this study was to determine the role of LANCL2 as a potential therapeutic target for developing novel drugs and nutraceuticals against inflammatory diseases. Previously, we performed homology modeling to construct a three-dimensional structure of LANCL2 using the crystal structure of lanthionine synthetase component C-like protein 1 (LANCL1) as a template. Using this model, structure-based virtual screening was performed using compounds from the NCI (National Cancer Institute) Diversity Set II, ChemBridge, ZINC natural products, and FDA-approved drugs databases. Several potential ligands were identified using molecular docking. To validate the anti-inflammatory efficacy of the top-ranked compound (NSC61610) in the NCI Diversity Set II, a series of in vitro and pre-clinical efficacy studies were performed using a mouse model of dextran sodium sulfate (DSS)-induced colitis. Our findings showed that the lead compound, NSC61610, activated peroxisome proliferator-activated receptor gamma in a LANCL2- and adenylate cyclase/cAMP-dependent manner in vitro and ameliorated experimental colitis by down-modulating colonic inflammatory gene expression and favoring regulatory T cell responses. Conclusions/Significance LANCL2 is a novel therapeutic target for inflammatory diseases. High-throughput, structure-based virtual screening is an effective computational drug design method for discovering anti-inflammatory LANCL2-based drug candidates. PMID:22509338

  17. End-User Use of Data Base Query Language: Pros and Cons.

    ERIC Educational Resources Information Center

    Nicholes, Walter

    1988-01-01

    Man-machine interface, the concept of a computer "query," a review of database technology, and a description of the use of query languages at Brigham Young University are discussed. The pros and cons of end-user use of database query languages are explored. (Author/MLW)

  18. Update of aircraft profile data for the Integrated Noise Model computer program, vol 1: final report

    DOT National Transportation Integrated Search

    1992-03-01

    This report provides aircraft takeoff and landing profiles, aircraft aerodynamic performance coefficients and engine performance coefficients for the aircraft data base (Database 9) in the Integrated Noise Model (INM) computer program. Flight profile...

  19. FLASHFLOOD: A 3D Field-based similarity search and alignment method for flexible molecules

    NASA Astrophysics Data System (ADS)

    Pitman, Michael C.; Huber, Wolfgang K.; Horn, Hans; Krämer, Andreas; Rice, Julia E.; Swope, William C.

    2001-07-01

    A three-dimensional field-based similarity search and alignment method for flexible molecules is introduced. The conformational space of a flexible molecule is represented in terms of fragments and torsional angles of allowed conformations. A user-definable property field is used to compute features of fragment pairs. Features are generalizations of CoMMA descriptors (Silverman, B.D. and Platt, D.E., J. Med. Chem., 39 (1996) 2129.) that characterize local regions of the property field by its local moments. The features are invariant under coordinate system transformations. Features taken from a query molecule are used to form alignments with fragment pairs in the database. An assembly algorithm is then used to merge the fragment pairs into full structures, aligned to the query. Key to the method is the use of a context adaptive descriptor scaling procedure as the basis for similarity. This allows the user to tune the weights of the various feature components based on examples relevant to the particular context under investigation. The property fields may range from simple, phenomenological fields, to fields derived from quantum mechanical calculations. We apply the method to the dihydrofolate/methotrexate benchmark system, and show that when one injects relevant contextual information into the descriptor scaling procedure, better results are obtained more efficiently. We also show how the method works and include computer times for a query from a database that represents approximately 23 million conformers of seventeen flexible molecules.

  20. Computer-Based Testing in the Medical Curriculum: A Decade of Experiences at One School

    ERIC Educational Resources Information Center

    McNulty, John; Chandrasekhar, Arcot; Hoyt, Amy; Gruener, Gregory; Espiritu, Baltazar; Price, Ron, Jr.

    2011-01-01

    This report summarizes more than a decade of experiences with implementing computer-based testing across a 4-year medical curriculum. Practical considerations are given to the fields incorporated within an item database and their use in the creation and analysis of examinations, security issues in the delivery and integrity of examinations,…

  1. Computer-based Interactive Literature Searching for CSU-Chico Chemistry Students.

    ERIC Educational Resources Information Center

    Cooke, Ron C.; And Others

    The intent of this instructional manual, which is aimed at exploring the literature of a discipline and presented in a self-paced, course segment format applicable to any course content, is to enable college students to conduct computer-based interactive searches through multiple databases. The manual is divided into 10 chapters: (1) Introduction,…

  2. Geo-spatial Service and Application based on National E-government Network Platform and Cloud

    NASA Astrophysics Data System (ADS)

    Meng, X.; Deng, Y.; Li, H.; Yao, L.; Shi, J.

    2014-04-01

    With the acceleration of China's informatization process, the party and government have taken substantive strides in advancing the development and application of digital technology, which has promoted the evolution of e-government and its informatization. Meanwhile, as an innovative resource-based service mode, cloud computing connects huge resource pools to provide a variety of IT services, and has become a relatively mature technical pattern backed by further studies and massive practical applications. Based on cloud computing technology and the national e-government network platform, the "National Natural Resources and Geospatial Database (NRGD)" project integrated and transformed natural resources and geospatial information dispersed across various sectors and regions, established a logically unified and physically dispersed fundamental database, and developed a national integrated information database system supporting main e-government applications. Cross-sector e-government applications and services are realized to provide long-term, stable and standardized natural resources and geospatial fundamental information products and services for national e-government and public users.

  3. Discovery and Development of ATP-Competitive mTOR Inhibitors Using Computational Approaches.

    PubMed

    Luo, Yao; Wang, Ling

    2017-11-16

    The mammalian target of rapamycin (mTOR) is a central controller of cell growth, proliferation, metabolism, and angiogenesis, and an attractive target for new anticancer drug development. Significant progress has been made in hit discovery, lead optimization, drug candidate development and determination of the three-dimensional (3D) structure of mTOR. Computational methods have been applied to accelerate the discovery and development of mTOR inhibitors, helping to model the structure of mTOR, screen compound databases, uncover structure-activity relationships (SAR), optimize hits, mine privileged fragments and design focused libraries. Computational approaches have also been applied to study protein-ligand interaction mechanisms and in natural product-driven drug discovery. Herein, we survey the most recent progress in the application of computational approaches to advance the discovery and development of compounds targeting mTOR. Future directions in the discovery of new mTOR inhibitors using computational methods are also discussed. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  4. Databases and Associated Tools for Glycomics and Glycoproteomics.

    PubMed

    Lisacek, Frederique; Mariethoz, Julien; Alocci, Davide; Rudd, Pauline M; Abrahams, Jodie L; Campbell, Matthew P; Packer, Nicolle H; Ståhle, Jonas; Widmalm, Göran; Mullen, Elaine; Adamczyk, Barbara; Rojas-Macias, Miguel A; Jin, Chunsheng; Karlsson, Niclas G

    2017-01-01

    The access to biodatabases for glycomics and glycoproteomics has proven to be essential for current glycobiological research. This chapter presents available databases that are devoted to different aspects of glycobioinformatics. This includes oligosaccharide sequence databases, experimental databases, 3D structure databases (of both glycans and glycorelated proteins) and association of glycans with tissue, disease, and proteins. Specific search protocols are also provided using tools associated with experimental databases for converting primary glycoanalytical data to glycan structural information. In particular, researchers using glycoanalysis methods by U/HPLC (GlycoBase), MS (GlycoWorkbench, UniCarb-DB, GlycoDigest), and NMR (CASPER) will benefit from this chapter. In addition we also include information on how to utilize glycan structural information to query databases that associate glycans with proteins (UniCarbKB) and with interactions with pathogens (SugarBind).

  5. Soft computing approach to 3D lung nodule segmentation in CT.

    PubMed

    Badura, P; Pietka, E

    2014-10-01

    This paper presents a novel, multilevel approach to the segmentation of various types of pulmonary nodules in computed tomography studies. It is based on two branches of computational intelligence: fuzzy connectedness (FC) and evolutionary computation. First, the image and auxiliary data are prepared for the 3D FC analysis during the first stage of the algorithm - mask generation. Its main goal is to process specific types of nodules attached to the pleura or vessels. It consists of basic image processing operations as well as dedicated routines for the specific cases of nodules. The evolutionary computation is performed on the image and seed points in order to shorten the FC analysis and improve its accuracy. After the FC application, the remaining vessels are removed during the postprocessing stage. The method has been validated using the first dataset of studies acquired and described by the Lung Image Database Consortium (LIDC) and by its latest release - the LIDC-IDRI (Image Database Resource Initiative) database. Copyright © 2014 Elsevier Ltd. All rights reserved.

  6. Image Engine: an object-oriented multimedia database for storing, retrieving and sharing medical images and text.

    PubMed Central

    Lowe, H. J.

    1993-01-01

    This paper describes Image Engine, an object-oriented, microcomputer-based, multimedia database designed to facilitate the storage and retrieval of digitized biomedical still images, video, and text using inexpensive desktop computers. The current prototype runs on Apple Macintosh computers and allows network database access via peer to peer file sharing protocols. Image Engine supports both free text and controlled vocabulary indexing of multimedia objects. The latter is implemented using the TView thesaurus model developed by the author. The current prototype of Image Engine uses the National Library of Medicine's Medical Subject Headings (MeSH) vocabulary (with UMLS Meta-1 extensions) as its indexing thesaurus. PMID:8130596

  7. HTSFinder: Powerful Pipeline of DNA Signature Discovery by Parallel and Distributed Computing

    PubMed Central

    Karimi, Ramin; Hajdu, Andras

    2016-01-01

    Comprehensive efforts toward low-cost sequencing in the past few years have led to the growth of complete genome databases. In parallel with this effort, and driven by a strong need, fast and cost-effective methods and applications have been developed to accelerate sequence analysis. Identification is the very first step of this task. Due to the difficulties, high costs, and computational challenges of alignment-based approaches, an alternative universal identification method is highly desirable. As an alignment-free approach, DNA signatures provide new opportunities for the rapid identification of species. In this paper, we present an effective pipeline, HTSFinder (high-throughput signature finder), with a corresponding k-mer generator, GkmerG (genome k-mers generator). Using this pipeline, we determine the frequency of k-mers from the available complete genome databases for the detection of extensive DNA signatures in a reasonably short time. Our application can detect both unique and common signatures in arbitrarily selected target and non-target databases. Hadoop and MapReduce are used in this pipeline as parallel and distributed computing tools on commodity hardware. This approach brings the power of high-performance computing to ordinary desktop personal computers for discovering DNA signatures in large databases such as bacterial genomes. The considerable number of unique and common DNA signatures detected in the target database creates opportunities to improve the identification process not only for polymerase chain reaction and microarray assays but also for more complex scenarios such as metagenomics and next-generation sequencing analysis. PMID:26884678

  8. HTSFinder: Powerful Pipeline of DNA Signature Discovery by Parallel and Distributed Computing.

    PubMed

    Karimi, Ramin; Hajdu, Andras

    2016-01-01

    Comprehensive efforts toward low-cost sequencing in the past few years have led to the growth of complete genome databases. In parallel with this effort, and driven by a strong need, fast and cost-effective methods and applications have been developed to accelerate sequence analysis. Identification is the very first step of this task. Due to the difficulties, high costs, and computational challenges of alignment-based approaches, an alternative universal identification method is highly desirable. As an alignment-free approach, DNA signatures provide new opportunities for the rapid identification of species. In this paper, we present an effective pipeline, HTSFinder (high-throughput signature finder), with a corresponding k-mer generator, GkmerG (genome k-mers generator). Using this pipeline, we determine the frequency of k-mers from the available complete genome databases for the detection of extensive DNA signatures in a reasonably short time. Our application can detect both unique and common signatures in arbitrarily selected target and non-target databases. Hadoop and MapReduce are used in this pipeline as parallel and distributed computing tools on commodity hardware. This approach brings the power of high-performance computing to ordinary desktop personal computers for discovering DNA signatures in large databases such as bacterial genomes. The considerable number of unique and common DNA signatures detected in the target database creates opportunities to improve the identification process not only for polymerase chain reaction and microarray assays but also for more complex scenarios such as metagenomics and next-generation sequencing analysis.
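
    The k-mer signature step common to both records above can be sketched in memory: a k-mer that occurs in every target genome but in no non-target genome is a candidate unique signature. HTSFinder distributes this counting with Hadoop/MapReduce; the toy version below is single-machine.

```python
from collections import Counter

def kmers(seq, k):
    return (seq[i:i + k] for i in range(len(seq) - k + 1))

def unique_signatures(targets, nontargets, k=12):
    """k-mers present in every target genome and absent from all non-targets."""
    presence = Counter(km for g in targets for km in set(kmers(g, k)))
    nontarget_kmers = {km for g in nontargets for km in kmers(g, k)}
    return {km for km, c in presence.items()
            if c == len(targets) and km not in nontarget_kmers}

targets = ["ACGTACGTACGTAGGT", "TTACGTACGTACGTAG"]
nontargets = ["GGGGCCCCGGGGCCCC"]
print(sorted(unique_signatures(targets, nontargets, k=8))[:3])
```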

  9. Security and privacy qualities of medical devices: an analysis of FDA postmarket surveillance.

    PubMed

    Kramer, Daniel B; Baker, Matthew; Ransford, Benjamin; Molina-Markham, Andres; Stewart, Quinn; Fu, Kevin; Reynolds, Matthew R

    2012-01-01

    Medical devices increasingly depend on computing functions such as wireless communication and Internet connectivity for software-based control of therapies and network-based transmission of patients' stored medical information. These computing capabilities introduce security and privacy risks, yet little is known about the prevalence of such risks within the clinical setting. We used three comprehensive, publicly available databases maintained by the Food and Drug Administration (FDA) to evaluate recalls and adverse events related to security and privacy risks of medical devices. Review of weekly enforcement reports identified 1,845 recalls; 605 (32.8%) of these included computers, 35 (1.9%) stored patient data, and 31 (1.7%) were capable of wireless communication. Searches of databases specific to recalls and adverse events identified only one event with a specific connection to security or privacy. Software-related recalls were relatively common, and most (81.8%) mentioned the possibility of upgrades, though only half of these provided specific instructions for the update mechanism. Our review of recalls and adverse events from federal government databases reveals sharp inconsistencies with databases at individual providers with respect to security and privacy risks. Recalls related to software may increase security risks because of unprotected update and correction mechanisms. To detect signals of security and privacy problems that adversely affect public health, federal postmarket surveillance strategies should rethink how to effectively and efficiently collect data on security and privacy problems in devices that increasingly depend on computing systems susceptible to malware.

  10. Security and Privacy Qualities of Medical Devices: An Analysis of FDA Postmarket Surveillance

    PubMed Central

    Kramer, Daniel B.; Baker, Matthew; Ransford, Benjamin; Molina-Markham, Andres; Stewart, Quinn; Fu, Kevin; Reynolds, Matthew R.

    2012-01-01

    Background Medical devices increasingly depend on computing functions such as wireless communication and Internet connectivity for software-based control of therapies and network-based transmission of patients’ stored medical information. These computing capabilities introduce security and privacy risks, yet little is known about the prevalence of such risks within the clinical setting. Methods We used three comprehensive, publicly available databases maintained by the Food and Drug Administration (FDA) to evaluate recalls and adverse events related to security and privacy risks of medical devices. Results Review of weekly enforcement reports identified 1,845 recalls; 605 (32.8%) of these included computers, 35 (1.9%) stored patient data, and 31 (1.7%) were capable of wireless communication. Searches of databases specific to recalls and adverse events identified only one event with a specific connection to security or privacy. Software-related recalls were relatively common, and most (81.8%) mentioned the possibility of upgrades, though only half of these provided specific instructions for the update mechanism. Conclusions Our review of recalls and adverse events from federal government databases reveals sharp inconsistencies with databases at individual providers with respect to security and privacy risks. Recalls related to software may increase security risks because of unprotected update and correction mechanisms. To detect signals of security and privacy problems that adversely affect public health, federal postmarket surveillance strategies should rethink how to effectively and efficiently collect data on security and privacy problems in devices that increasingly depend on computing systems susceptible to malware. PMID:22829874

  11. Building High School Science Department Inventory Records Using the Appleworks Data Base Subprogram and Apple IIe or GS Computers.

    ERIC Educational Resources Information Center

    Schlenker, Richard M.

    This manual was developed for use as a "how to" training device and provides a step-by-step introduction to using AppleWorks in the database mode. Instructions are given to prepare the original database with the headings of the user's choice. Inserting information records in the new database is covered, along with changing the layout of…

  12. NAViGaTing the Micronome – Using Multiple MicroRNA Prediction Databases to Identify Signalling Pathway-Associated MicroRNAs

    PubMed Central

    Shirdel, Elize A.; Xie, Wing; Mak, Tak W.; Jurisica, Igor

    2011-01-01

    Background MicroRNAs are a class of small RNAs known to regulate gene expression at the transcript level, the protein level, or both. Since microRNA binding is sequence-based but possibly structure-specific, work in this area has resulted in multiple databases storing predicted microRNA:target relationships computed using diverse algorithms. We integrate prediction databases, compare predictions to in vitro data, and use cross-database predictions to model the microRNA:transcript interactome – referred to as the micronome – to study microRNA involvement in well-known signalling pathways as well as associations with disease. We make this data freely available with a flexible user interface as our microRNA Data Integration Portal — mirDIP (http://ophid.utoronto.ca/mirDIP). Results mirDIP integrates prediction databases to elucidate accurate microRNA:target relationships. Using NAViGaTOR to produce interaction networks implicating microRNAs in literature-based, KEGG-based and Reactome-based pathways, we find these signalling pathway networks have significantly more microRNA involvement compared to chance (p<0.05), suggesting microRNAs co-target many genes in a given pathway. Further examination of the micronome shows two distinct classes of microRNAs; universe microRNAs, which are involved in many signalling pathways; and intra-pathway microRNAs, which target multiple genes within one signalling pathway. We find universe microRNAs to have more targets (p<0.0001), to be more studied (p<0.0002), and to have higher degree in the KEGG cancer pathway (p<0.0001), compared to intra-pathway microRNAs. Conclusions Our pathway-based analysis of mirDIP data suggests microRNAs are involved in intra-pathway signalling. We identify two distinct classes of microRNAs, suggesting a hierarchical organization of microRNAs co-targeting genes both within and between pathways, and implying differential involvement of universe and intra-pathway microRNAs at the disease level. PMID:21364759

  13. Segmentation of lung nodules in computed tomography images using dynamic programming and multidirection fusion techniques.

    PubMed

    Wang, Qian; Song, Enmin; Jin, Renchao; Han, Ping; Wang, Xiaotong; Zhou, Yanying; Zeng, Jianchao

    2009-06-01

    The aim of this study was to develop a novel algorithm for segmenting lung nodules on three-dimensional (3D) computed tomographic images to improve the performance of computer-aided diagnosis (CAD) systems. The database used in this study consists of two data sets obtained from the Lung Imaging Database Consortium. The first data set, containing 23 nodules (22% irregular nodules, 13% nonsolid nodules, 17% nodules attached to other structures), was used for training. The second data set, containing 64 nodules (37% irregular nodules, 40% nonsolid nodules, 62% nodules attached to other structures), was used for testing. Two key techniques were developed in the segmentation algorithm: (1) a 3D extended dynamic programming model, with a newly defined internal cost function based on the information between adjacent slices, allowing parameters to be adapted to each slice, and (2) a multidirection fusion technique, which makes use of the complementary relationships among different directions to improve the final segmentation accuracy. The performance of this approach was evaluated by the overlap criterion, complemented by the true-positive fraction and the false-positive fraction criteria. The mean values of the overlap, true-positive fraction, and false-positive fraction for the first data set achieved using the segmentation scheme were 66%, 75%, and 15%, respectively, and the corresponding values for the second data set were 58%, 71%, and 22%, respectively. The experimental results indicate that this segmentation scheme can achieve better performance for nodule segmentation than two existing algorithms reported in the literature. The proposed 3D extended dynamic programming model is an effective way to segment sequential images of lung nodules. The proposed multidirection fusion technique is capable of reducing segmentation errors especially for no-nodule and near-end slices, thus resulting in better overall performance.

  14. Prediction of beta-turns in proteins using the first-order Markov models.

    PubMed

    Lin, Thy-Hou; Wang, Ging-Ming; Wang, Yen-Tseng

    2002-01-01

    We present a method based on first-order Markov models for predicting simple beta-turns and loops containing multiple turns in proteins. Sequences of 338 proteins in a database are divided, using the published turn criteria, into three regions: the turn, the boundary, and the non-turn regions. A transition probability matrix is constructed for the turn and for the non-turn region using the weighted transition probabilities computed for dipeptides identified from each region. Two such matrices are constructed for the boundary region, since the transition probabilities for dipeptides immediately preceding or following a turn differ. The window used for scanning a protein sequence from the amino (N-) to the carboxyl (C-) terminal is a hexapeptide, since the transition probability computed for a turn tetrapeptide is capped at both the N- and C-termini with a boundary transition probability indexed from the two boundary transition matrices, respectively. A sum of the averaged products of the transition probabilities of all the hexapeptides involving each residue is computed and then weighted with a probability computed by assuming that all the hexapeptides are from the non-turn region, giving the final prediction quantity. Both simple beta-turns and loops containing multiple turns in a protein are then identified from the rise of the computed prediction quantity. The performance of the prediction scheme, expressed as the percentage of correct predictions, is evaluated through computation of Matthews correlation coefficients for each protein predicted. The method gives prediction results with a better correlation between the percentage of correct predictions and the Matthews correlation coefficients for a group of test proteins than several secondary-structure prediction methods. The prediction accuracy for about 40% of the proteins in the database, or 50% of the proteins in the test set, is better than 70%. For the test set, this percentage is reduced to 30 if the structures of all the proteins in the set are treated as unknown.
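
    A minimal sketch of first-order Markov scoring, with invented numbers: a window's turn propensity is the sum of log-ratios of turn-region to non-turn-region dipeptide transition probabilities, so positive scores flag likely turns. The published matrices, boundary handling, and hexapeptide weighting are not reproduced here.

```python
import math

P_TURN = {("G", "P"): 0.08, ("P", "G"): 0.07}      # toy turn-region dipeptides
P_NON = {("G", "P"): 0.02, ("P", "G"): 0.02}
DEFAULT_T, DEFAULT_N = 0.01, 0.04                  # fallback probabilities

def log_odds(window):
    """Sum of log P_turn/P_nonturn over consecutive residue pairs;
    positive values favour the turn state."""
    s = 0.0
    for a, b in zip(window, window[1:]):
        s += math.log(P_TURN.get((a, b), DEFAULT_T) /
                      P_NON.get((a, b), DEFAULT_N))
    return s

seq = "AAGPGPAA"
scores = [log_odds(seq[i:i + 4]) for i in range(len(seq) - 3)]  # tetrapeptide windows
print([round(s, 2) for s in scores])
```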

  15. Computer-Aided Clinical Trial Recruitment Based on Domain-Specific Language Translation: A Case Study of Retinopathy of Prematurity

    PubMed Central

    2017-01-01

    Reusing the data from healthcare information systems can effectively facilitate clinical trials (CTs). How to select candidate patients eligible for CT recruitment criteria is a central task. Related work either depends on DBA (database administrator) to convert the recruitment criteria to native SQL queries or involves the data mapping between a standard ontology/information model and individual data source schema. This paper proposes an alternative computer-aided CT recruitment paradigm, based on syntax translation between different DSLs (domain-specific languages). In this paradigm, the CT recruitment criteria are first formally represented as production rules. The referenced rule variables are all from the underlying database schema. Then the production rule is translated to an intermediate query-oriented DSL (e.g., LINQ). Finally, the intermediate DSL is directly mapped to native database queries (e.g., SQL) automated by ORM (object-relational mapping). PMID:29065644

  16. Computer-Aided Clinical Trial Recruitment Based on Domain-Specific Language Translation: A Case Study of Retinopathy of Prematurity.

    PubMed

    Zhang, Yinsheng; Zhang, Guoming; Shang, Qian

    2017-01-01

    Reusing the data from healthcare information systems can effectively facilitate clinical trials (CTs). How to select candidate patients eligible for CT recruitment criteria is a central task. Related work either depends on DBA (database administrator) to convert the recruitment criteria to native SQL queries or involves the data mapping between a standard ontology/information model and individual data source schema. This paper proposes an alternative computer-aided CT recruitment paradigm, based on syntax translation between different DSLs (domain-specific languages). In this paradigm, the CT recruitment criteria are first formally represented as production rules. The referenced rule variables are all from the underlying database schema. Then the production rule is translated to an intermediate query-oriented DSL (e.g., LINQ). Finally, the intermediate DSL is directly mapped to native database queries (e.g., SQL) automated by ORM (object-relational mapping).
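
    The translation idea in the two records above can be sketched end to end: a recruitment criterion written as a production rule over database columns is compiled into a parameterized SQL query. The rule grammar, the table, and the ROP-style criterion below are illustrative assumptions, not the paper's DSL.

```python
import sqlite3

def rule_to_sql(table, rule):
    """rule: list of (column, operator, value) triples ANDed together."""
    allowed = {"<", "<=", ">", ">=", "=", "<>"}
    clauses, params = [], []
    for col, op, val in rule:
        if op not in allowed:
            raise ValueError(f"unsupported operator: {op}")
        clauses.append(f'"{col}" {op} ?')
        params.append(val)
    return f'SELECT * FROM "{table}" WHERE ' + " AND ".join(clauses), params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (id TEXT, gest_age_weeks REAL, birth_weight_g REAL)")
conn.execute("INSERT INTO patient VALUES ('p1', 29.0, 1100), ('p2', 38.5, 3200)")
# an illustrative ROP-style criterion: gestational age < 31 weeks AND weight < 1251 g
sql, params = rule_to_sql("patient",
                          [("gest_age_weeks", "<", 31), ("birth_weight_g", "<", 1251)])
print(conn.execute(sql, params).fetchall())   # -> [('p1', 29.0, 1100.0)]
```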

  17. The CEBAF Element Database

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Theodore Larrieu, Christopher Slominski, Michele Joyce

    2011-03-01

    With the inauguration of the CEBAF Element Database (CED) in Fall 2010, Jefferson Lab computer scientists have taken a step toward the eventual goal of a model-driven accelerator. Once fully populated, the database will be the primary repository of information used for everything from generating lattice decks to booting control computers to building controls screens. A requirement influencing the CED design is that it provide access to not only present, but also future and past configurations of the accelerator. To accomplish this, an introspective database schema was designed that allows new elements, types, and properties to be defined on-the-fly with no changes to table structure. Used in conjunction with Oracle Workspace Manager, it allows users to query data from any time in the database history with the same tools used to query the present configuration. Users can also check out workspaces to use as staging areas for upcoming machine configurations. All access to the CED is through a well-documented Application Programming Interface (API) that is translated automatically from original C++ source code into native libraries for scripting languages such as Perl, PHP, and Tcl, making access to the CED easy and ubiquitous.
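
    The introspective schema can be sketched as an entity-attribute-value layout: new element types and properties become rows rather than new columns, so the accelerator model can grow without DDL changes. The table layout below is a hypothetical simplification, not the actual CED schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE element  (id INTEGER PRIMARY KEY, name TEXT, type TEXT);
    CREATE TABLE property (name TEXT, type TEXT);          -- property catalog
    CREATE TABLE value    (element_id INTEGER, property TEXT, value TEXT);
""")
# adding a new property is an INSERT, never an ALTER TABLE
conn.execute("INSERT INTO property VALUES ('length_m', 'Quad'), ('gradient', 'Quad')")
conn.execute("INSERT INTO element VALUES (1, 'MQA1L02', 'Quad')")
conn.execute("INSERT INTO value VALUES (1, 'length_m', '0.3'), (1, 'gradient', '4.2')")

# query one element's full property set without knowing the schema up front
rows = conn.execute("""
    SELECT e.name, v.property, v.value
    FROM element e JOIN value v ON v.element_id = e.id
    WHERE e.name = 'MQA1L02'
""").fetchall()
print(rows)
```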

  18. Automated bond order assignment as an optimization problem.

    PubMed

    Dehof, Anna Katharina; Rurainski, Alexander; Bui, Quang Bao Anh; Böcker, Sebastian; Lenhof, Hans-Peter; Hildebrandt, Andreas

    2011-03-01

    Numerous applications in computational biology process molecular structures and hence strongly rely not only on correct atomic coordinates but also on correct bond order information. For proteins and nucleic acids, bond orders can be deduced easily, but this does not hold for other types of molecules such as ligands. For ligands, bond order information is not always provided in molecular databases, and thus a variety of approaches tackling this problem have been developed. In this work, we extend an ansatz proposed by Wang et al. that assigns connectivity-based penalty scores and tries to heuristically approximate its optimum. We present three efficient and exact solvers for the problem, replacing the heuristic approximation scheme of the original approach: an A* approach, an integer linear programming (ILP) formulation, and a fixed-parameter tractable (FPT) approach. We implemented and evaluated the original implementation and our A*, ILP and FPT formulations on the MMFF94 validation suite and the KEGG Drug database. We show the benefit of computing exact solutions of the penalty minimization problem and the additional gain from computing all optimal (or even suboptimal) solutions. We close with a detailed comparison of our methods. The A* and ILP solvers are integrated into the open-source C++ LGPL library BALL and the molecular visualization and modelling tool BALLView and can be downloaded from our homepage www.ball-project.org. The FPT implementation can be downloaded from http://bio.informatik.uni-jena.de/software/.
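
    The underlying optimization problem can be stated compactly: choose a bond order for every bond so that a penalty on deviations from target atomic valences is minimized. The brute-force sketch below enumerates assignments for a tiny molecule; the paper's A*, ILP and FPT solvers solve the same problem exactly at scale, and the penalty table here is a toy.

```python
from itertools import product

TARGET_VALENCE = {"C": 4, "N": 3, "O": 2, "H": 1}

def best_assignment(atoms, bonds):
    """atoms: list of element symbols; bonds: list of (i, j) index pairs.
    Returns (penalty, bond orders) minimizing total valence deviation."""
    best = (float("inf"), None)
    for orders in product((1, 2, 3), repeat=len(bonds)):
        valence = [0] * len(atoms)
        for (i, j), o in zip(bonds, orders):
            valence[i] += o
            valence[j] += o
        penalty = sum(abs(v - TARGET_VALENCE[a]) for v, a in zip(valence, atoms))
        if penalty < best[0]:
            best = (penalty, orders)
    return best

# formaldehyde: C double-bonded to O, single bonds to two H
atoms = ["C", "O", "H", "H"]
bonds = [(0, 1), (0, 2), (0, 3)]
print(best_assignment(atoms, bonds))   # -> (0, (2, 1, 1))
```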

  19. Metabolonote: A Wiki-Based Database for Managing Hierarchical Metadata of Metabolome Analyses

    PubMed Central

    Ara, Takeshi; Enomoto, Mitsuo; Arita, Masanori; Ikeda, Chiaki; Kera, Kota; Yamada, Manabu; Nishioka, Takaaki; Ikeda, Tasuku; Nihei, Yoshito; Shibata, Daisuke; Kanaya, Shigehiko; Sakurai, Nozomu

    2015-01-01

    Metabolomics – technology for comprehensive detection of small molecules in an organism – lags behind the other “omics” in terms of publication and dissemination of experimental data. Among the reasons for this are the difficulty of precisely recording information about complicated analytical experiments (metadata), the existence of various databases with their own metadata descriptions, and the low reusability of published data, all of which leave submitters (the researchers who generate the data) insufficiently motivated. To tackle these issues, we developed Metabolonote, a Semantic MediaWiki-based database designed specifically for managing metabolomic metadata. We also defined a metadata and data description format, called “Togo Metabolome Data” (TogoMD), with an ID system that provides unique access to each level of the tree-structured metadata, such as study purpose, sample, analytical method, and data analysis. Separating the management of metadata from that of data, and permitting related information to be attached to the metadata, provides advantages for submitters, readers, and database developers. The metadata are enriched with information such as links to comparable data, thereby functioning as a hub of related data resources. They also enhance not only readers’ understanding and use of data but also submitters’ motivation to publish the data. The metadata are computationally shared among other systems via APIs, which facilitates the construction of novel databases by database developers. A permission system that allows publication of immature metadata, together with feedback from readers, also helps submitters improve their metadata. Hence, this aspect of Metabolonote, as a metadata preparation tool, is complementary to high-quality and persistent data repositories such as MetaboLights. A total of 808 metadata records for analyzed data obtained from 35 biological species are currently published. Metabolonote and related tools are available free of charge at http://metabolonote.kazusa.or.jp/. PMID:25905099
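
    The tree-structured, ID-addressable metadata can be pictured as path-like keys running from study down to data file. The sketch below only mimics the idea; the actual TogoMD ID grammar may differ.

        # Hierarchical metadata addressed by (hypothetical) TogoMD-style IDs.
        metadata = {
            "SE1":           {"title": "Example metabolome study"},
            "SE1.S01":       {"organism": "Arabidopsis thaliana"},
            "SE1.S01.M1":    {"instrument": "LC-MS"},
            "SE1.S01.M1.D1": {"file": "run01.mzML"},
        }

        def subtree(prefix):
            """Return every metadata level at or below the given ID."""
            return {k: v for k, v in metadata.items()
                    if k == prefix or k.startswith(prefix + ".")}

        print(sorted(subtree("SE1.S01")))    # sample level and everything below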

  1. [A systematic evaluation of application of the web-based cancer database].

    PubMed

    Huang, Tingting; Liu, Jialin; Li, Yong; Zhang, Rui

    2013-10-01

    In order to support the theory and practice of web-based cancer database development in China, we carried out a systematic evaluation of the development of web-based cancer databases in China and abroad. We performed computer-based retrieval of the Ovid-MEDLINE, SpringerLink, EBSCOhost, Wiley Online Library and CNKI databases for papers published between Jan. 1995 and Dec. 2011, and retrieved the references of these papers by hand. We selected qualified papers according to pre-established inclusion and exclusion criteria, and carried out information extraction and analysis. Searching the online databases yielded 1244 papers, and checking the reference lists turned up 19 further articles. Thirty-one articles met the inclusion and exclusion criteria, and we extracted and assessed their evidence. The analysis showed that the U.S.A. ranked first, accounting for 26% of the databases. Thirty-nine percent of the web-based cancer databases are comprehensive cancer databases. Among single-cancer databases, breast cancer and prostate cancer rank highest, each accounting for 10%. Thirty-two percent of the cancer databases are associated with cancer gene information. As for the technologies applied, MySQL and PHP are the most widely used, at nearly 23% each.

  2. On-Line Database of Vibration-Based Damage Detection Experiments

    NASA Technical Reports Server (NTRS)

    Pappa, Richard S.; Doebling, Scott W.; Kholwad, Tina D.

    2000-01-01

    This paper describes a new, on-line bibliographic database of vibration-based damage detection experiments. Publications in the database discuss experiments conducted on actual structures as well as those conducted with simulated data. The database can be searched and sorted in many ways, and it provides photographs of test structures when available. It currently contains 100 publications, which is estimated to be about 5-10% of the number of papers written to date on this subject. Additional entries are forthcoming. This database is available for public use on the Internet at the following address: http://sdbpappa-mac.larc.nasa.gov. Click on the link named "dd_experiments.fp3" and then type "guest" as the password. No user name is required.

  3. The electric dipole moment of DNA-binding HU protein calculated by the use of an NMR database.

    PubMed

    Takashima, S; Yamaoka, K

    1999-08-30

    Electric birefringence measurements indicated the presence of a large permanent dipole moment in the HU protein-DNA complex. In order to substantiate this observation, numerical computation of the dipole moment of the HU protein homodimer was carried out using NMR protein databases. The dipole moments of globular proteins have hitherto been calculated from X-ray databases; NMR data had never been used before. The advantages of NMR databases are: (a) NMR data are obtained, unlike X-ray data, from protein solutions. Accordingly, this method eliminates the bothersome question of possible alteration of the protein structure in the transition from the crystalline state to the solution state. This question is particularly important for proteins such as HU protein, which has some degree of internal flexibility. (b) The three-dimensional coordinates of hydrogen atoms in protein molecules can be determined with sufficient resolution, which enables the N-H as well as C=O bond moments to be calculated. Since the NMR database of HU protein from Bacillus stearothermophilus consists of 25 models, the surface charge as well as the core dipole moments were computed for each of these structures. The results of these calculations show that the net permanent dipole moment of the HU protein homodimer is approximately 500-530 D (1 D = 3.33 x 10^-30 C m) at pH 7.5 and 600-630 D at the isoelectric point (pH 10.5). These permanent dipole moments are unusually large for a small protein of 19.5 kDa. Nevertheless, the result of the numerical calculations is compatible with the electro-optical observation, confirming a very large dipole moment in this protein.
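
    The quantity being computed reduces to the charge-weighted sum of atomic positions, p = sum_i q_i r_i, evaluated per NMR model. A minimal numpy sketch with placeholder charges and coordinates (not HU-protein data):

        # Net dipole moment from partial charges: p = sum_i q_i * r_i.
        import numpy as np

        E_ANGSTROM_TO_DEBYE = 4.803          # 1 e*Angstrom ~= 4.803 D

        charges = np.array([0.4, -0.4, 0.25, -0.25])   # units of e, net neutral
        coords = np.array([[0.0, 0.0, 0.0],            # Angstrom
                           [1.2, 0.0, 0.0],
                           [0.0, 3.0, 0.0],
                           [0.0, 4.1, 0.0]])

        p = (charges[:, None] * coords).sum(axis=0)    # e*Angstrom
        print("dipole magnitude: %.1f D" % (np.linalg.norm(p) * E_ANGSTROM_TO_DEBYE))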

  4. A Computational Approach From Gene to Structure Analysis of the Human ABCA4 Transporter Involved in Genetic Retinal Diseases.

    PubMed

    Trezza, Alfonso; Bernini, Andrea; Langella, Andrea; Ascher, David B; Pires, Douglas E V; Sodi, Andrea; Passerini, Ilaria; Pelo, Elisabetta; Rizzo, Stanislao; Niccolai, Neri; Spiga, Ottavia

    2017-10-01

    The aim of this article is to report an investigation of the structural features of ABCA4, a protein associated with genetic retinal diseases. A new database collecting knowledge of the ABCA4 structure may facilitate predictions about the possible functional consequences of gene mutations observed in clinical practice. In order to correlate the structural and functional effects of the observed mutations, the structure of mouse P-glycoprotein was used as a template for homology modeling. The obtained structural information and genetic data form the basis of our relational database (ABCA4Database). Sequence variability among all deposited ABCA4 entries was calculated and reported as a Shannon entropy score at the residue level. The three-dimensional model of the ABCA4 structure was used to locate the spatial distribution of the observed variable regions. Our predictions from structural in silico tools were able to accurately link the functional effects of mutations to phenotype. The development of the ABCA4Database gathers all the available genetic and structural information, yielding a global view of the molecular basis of some retinal diseases. The modeled ABCA4 structure provides a molecular basis on which to analyze protein sequence mutations related to genetic retinal disease, in order to predict the risk of retinal disease across all possible ABCA4 mutations. Additionally, our predicted ABCA4 structure is a good starting point for the creation of a new data analysis model, appropriate for precision medicine, in order to develop a deeper knowledge network of the disease and to improve the management of patients.
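
    The per-residue Shannon entropy score used above is H = -sum_a p(a) log2 p(a) over the amino-acid frequencies observed at one alignment column; low values flag conserved positions. A small self-contained sketch:

        # Shannon entropy of an alignment column: low = conserved, high = variable.
        from collections import Counter
        from math import log2

        def shannon_entropy(column):
            counts = Counter(column)
            n = len(column)
            return -sum((c / n) * log2(c / n) for c in counts.values())

        print(shannon_entropy("GGGGGGGA"))   # ~0.54 bits, nearly conserved
        print(shannon_entropy("GAVLIFWY"))   # 3.0 bits, fully variable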

  5. Concentrations of indoor pollutants database: User's manual

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    1992-05-01

    This manual describes the computer-based database on indoor air pollutants. This comprehensive database allows utility personnel to perform rapid searches of the literature related to indoor air pollutants. Besides general information, it provides guidance for finding specific information on concentrations of indoor air pollutants. The manual includes information on installing and using the database as well as a tutorial to assist the user in becoming familiar with the procedures involved in doing bibliographic and summary section searches. The manual demonstrates how to search for information by going through a series of questions that provide search parameters such as pollutant type, year, building type, keywords (from a specific list), country, geographic region, author's last name, and title. As more parameters are specified, the list of references found in the data search becomes smaller and more specific to the user's needs. Appendixes list types of information that can be input into the database when making a request. The CIP database allows individual utilities to obtain information on indoor air quality based on building types and other factors in their own service territory. This information is useful for utilities with concerns about indoor air quality and the control of indoor air pollutants. The CIP database itself is distributed by the Electric Power Software Center and runs on IBM PC-compatible computers.

  6. Normal values and standardization of parameters in nuclear cardiology: Japanese Society of Nuclear Medicine working group database.

    PubMed

    Nakajima, Kenichi; Matsumoto, Naoya; Kasai, Tokuo; Matsuo, Shinro; Kiso, Keisuke; Okuda, Koichi

    2016-04-01

    As a 2-year project of the Japanese Society of Nuclear Medicine working group activity, normal myocardial imaging databases were accumulated and summarized. Stress-rest gated and non-gated image sets were accumulated for myocardial perfusion imaging and can be used for perfusion defect scoring and normal left ventricular (LV) function analysis. For single-photon emission computed tomography (SPECT) with a multi-focal collimator design, databases for supine and prone positions and computed tomography (CT)-based attenuation correction were created. The CT-based correction provided similar perfusion patterns between genders. In phase analysis of gated myocardial perfusion SPECT, a new approach for analyzing dyssynchrony, normal ranges of the phase bandwidth, standard deviation and entropy parameters were determined for four software programs. Although the results were not interchangeable, dependence on gender, ejection fraction and volumes was a common characteristic of these parameters. Standardization of (123)I-MIBG sympathetic imaging was performed for the heart-to-mediastinum ratio (HMR) using a calibration phantom method. HMRs from any collimator type can be converted to values comparable with medium-energy collimators. Appropriate quantification based on common normal databases and standard technology can play a pivotal role in clinical practice and research.
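
    The phantom-based HMR standardization is, to our understanding, a linear rescaling about the theoretical floor of 1.0 using camera-specific calibration coefficients. The sketch below assumes that linear form; the coefficient values are invented, so actual conversions must use the working group's published coefficients.

        # Hedged sketch of linear HMR cross-calibration between collimators.
        def convert_hmr(hmr_measured, k_camera, k_standard=0.88):
            """Rescale a camera-specific HMR toward the medium-energy standard.

            Assumed form: HMR_std = (k_standard / k_camera) * (HMR - 1) + 1.
            Both k values here are placeholders, not published coefficients.
            """
            return (k_standard / k_camera) * (hmr_measured - 1.0) + 1.0

        # Hypothetical low-energy collimator (k = 0.55) reading an HMR of 1.8:
        print(round(convert_hmr(1.8, k_camera=0.55), 2))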

  7. Accessing the public MIMIC-II intensive care relational database for clinical research.

    PubMed

    Scott, Daniel J; Lee, Joon; Silva, Ikaro; Park, Shinhyuk; Moody, George B; Celi, Leo A; Mark, Roger G

    2013-01-10

    The Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II) database is a free, public resource for intensive care research. The database was officially released in 2006, and has attracted a growing number of researchers in academia and industry. We present the two major software tools that facilitate accessing the relational database: the web-based QueryBuilder and a downloadable virtual machine (VM) image. QueryBuilder and the MIMIC-II VM have been developed successfully and are freely available to MIMIC-II users. Simple example SQL queries and the resulting data are presented. Clinical studies pertaining to acute kidney injury and prediction of fluid requirements in the intensive care unit are shown as typical examples of research performed with MIMIC-II. In addition, MIMIC-II has also provided data for annual PhysioNet/Computing in Cardiology Challenges, including the 2012 Challenge "Predicting mortality of ICU Patients". QueryBuilder is a web-based tool that provides easy access to MIMIC-II. For more computationally intensive queries, one can locally install a complete copy of MIMIC-II in a VM. Both publicly available tools provide the MIMIC-II research community with convenient querying interfaces and complement the value of the MIMIC-II relational database.
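
    For locally installed copies (e.g., the VM image), MIMIC-II is queried as an ordinary relational database. A sketch of such a query from Python; the connection settings, schema and column names below are placeholders, so the real names should be taken from the MIMIC-II documentation:

        # Example relational query against a local MIMIC-II instance.
        import psycopg2

        conn = psycopg2.connect(dbname="mimic2", user="mimic2", host="localhost")
        cur = conn.cursor()
        cur.execute("""
            SELECT subject_id, COUNT(*) AS n_stays
            FROM icustay_detail            -- placeholder table name
            GROUP BY subject_id
            ORDER BY n_stays DESC
            LIMIT 10
        """)
        for subject_id, n_stays in cur.fetchall():
            print(subject_id, n_stays)
        conn.close()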

  8. Cheminformatics meets molecular mechanics: a combined application of knowledge-based pose scoring and physical force field-based hit scoring functions improves the accuracy of structure-based virtual screening.

    PubMed

    Hsieh, Jui-Hua; Yin, Shuangye; Wang, Xiang S; Liu, Shubin; Dokholyan, Nikolay V; Tropsha, Alexander

    2012-01-23

    Poor performance of scoring functions is a well-known bottleneck in structure-based virtual screening (VS), most frequently manifested in the scoring functions' inability to discriminate between true ligands and known nonbinders (therefore designated binding decoys). This deficiency leads to a large number of false positive hits resulting from VS. We have hypothesized that filtering out or penalizing docking poses recognized as non-native (i.e., pose decoys) should improve the performance of VS in terms of improved identification of true binders. Using several concepts from the field of cheminformatics, we have developed a novel approach to identifying pose decoys from an ensemble of poses generated by computational docking procedures. We demonstrate that the use of a target-specific pose (scoring) filter in combination with a physical force field-based scoring function (MedusaScore) leads to significant improvement of hit rates in VS studies for 12 of the 13 benchmark sets from the clustered version of the Database of Useful Decoys (DUD). This new hybrid scoring function outperforms several conventional structure-based scoring functions, including XSCORE::HMSCORE, ChemScore, PLP, and Chemgauss3, in 6 out of 13 data sets at the early stage of VS (up to the top 1% of the screening database). We compare our hybrid method with several novel VS methods that were recently reported to perform well on the same DUD data sets. We find that the ligands retrieved using our method are chemically more diverse than those from two ligand-based methods (FieldScreen and FLAP::LBX). We also compare our method with FLAP::RBLB, a high-performance VS method that also utilizes both the receptor and the cognate ligand structures. Interestingly, we find that the top ligands retrieved using our method are highly complementary to those retrieved using FLAP::RBLB, hinting at effective directions for best VS applications. We suggest that this integrative VS approach combining cheminformatics and molecular mechanics methodologies may be applied to a broad variety of protein targets to improve the outcome of structure-based drug discovery studies.
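
    Schematically, the hybrid scheme is a two-stage decision: discard poses that a target-specific filter flags as pose decoys, then rank the survivors with the physics-based score. The sketch below uses stand-in scoring functions (neither is the actual pose filter nor MedusaScore):

        # Two-stage hybrid scoring: pose filter first, physics score second.
        def pose_looks_native(pose):
            return pose["pose_filter_score"] > 0.5      # stand-in classifier

        def physics_score(pose):
            return pose["energy"]                       # stand-in, lower = better

        def rank_compounds(docked_poses):
            kept = [p for p in docked_poses if pose_looks_native(p)]
            return sorted(kept, key=physics_score)

        poses = [
            {"name": "cmpd1", "pose_filter_score": 0.9, "energy": -42.0},
            {"name": "cmpd2", "pose_filter_score": 0.1, "energy": -55.0},  # decoy pose
            {"name": "cmpd3", "pose_filter_score": 0.8, "energy": -47.5},
        ]
        print([p["name"] for p in rank_compounds(poses)])   # ['cmpd3', 'cmpd1']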

  9. Impact of computational structure-based methods on drug discovery.

    PubMed

    Reynolds, Charles H

    2014-01-01

    Structure-based drug design has become an indispensable tool in drug discovery. The emergence of structure-based design is due to gains in structural biology that have provided exponential growth in the number of protein crystal structures, new computational algorithms and approaches for modeling protein-ligand interactions, and the tremendous growth of raw computer power in the last 30 years. Computer modeling and simulation have made major contributions to the discovery of many groundbreaking drugs in recent years. Examples are presented that highlight the evolution of computational structure-based design methodology, and the impact of that methodology on drug discovery.

  10. sc-PDB: a 3D-database of ligandable binding sites—10 years on

    PubMed Central

    Desaphy, Jérémy; Bret, Guillaume; Rognan, Didier; Kellenberger, Esther

    2015-01-01

    The sc-PDB database (available at http://bioinfo-pharma.u-strasbg.fr/scPDB/) is a comprehensive and up-to-date selection of ligandable binding sites of the Protein Data Bank. Sites are defined from complexes between a protein and a pharmacological ligand. The database provides the all-atom description of the protein, its ligand, their binding site and their binding mode. Currently, the sc-PDB archive registers 9283 binding sites from 3678 unique proteins and 5608 unique ligands. The sc-PDB database was publicly launched in 2004 with the aim of providing structure files suitable for computational approaches to drug design, such as docking. During the last 10 years we have improved and standardized the processes for (i) identifying binding sites, (ii) correcting structures, (iii) annotating protein function and ligand properties and (iv) characterizing their binding modes. This paper presents the latest enhancements to the database, specifically pertaining to the representation of molecular interactions and to the similarity between ligand/protein binding patterns. The new website puts emphasis on pictorial analysis of the data. PMID:25300483

  11. An online model composition tool for system biology models

    PubMed Central

    2013-01-01

    Background There are multiple representation formats for Systems Biology computational models, and the Systems Biology Markup Language (SBML) is one of the most widely used. SBML is used to capture, store, and distribute computational models by Systems Biology data sources (e.g., the BioModels Database) and researchers. Therefore, there is a need for all-in-one web-based solutions that support advanced SBML functionalities such as uploading, editing, composing, visualizing, simulating, querying, and browsing computational models. Results We present the design and implementation of the Model Composition Tool (Interface) within the PathCase-SB (PathCase Systems Biology) web portal. The tool helps users compose systems biology models, facilitating the complex process of merging them. We also present three tools that support the model composition tool, namely, (1) the Model Simulation Interface, which generates a visual plot of the simulation according to the user’s input; (2) the iModel Tool, a platform for users to upload their own models to compose; and (3) the SimCom Tool, which provides a side-by-side comparison of models being composed in the same pathway. Finally, we provide a web site that hosts BioModels Database models and a separate web site that hosts SBML Test Suite models. Conclusions The model composition tool (and the three supporting tools) can be used with little or no knowledge of the SBML document structure. For this reason, students or anyone who wants to learn about systems biology will benefit from the described functionalities. The SBML Test Suite models are a good starting point for beginners, and, for more advanced purposes, users are able to access and employ models from the BioModels Database as well. PMID:24006914
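
    The kind of SBML handling such tools wrap behind a web interface can be reproduced programmatically with the python-libsbml bindings. A minimal sketch; the file name is hypothetical, but any SBML model from the BioModels Database will do:

        # Minimal SBML inspection with python-libsbml.
        import libsbml

        doc = libsbml.readSBML("BIOMD0000000012.xml")    # hypothetical local file
        if doc.getNumErrors() > 0:
            raise RuntimeError("SBML read reported %d error(s)" % doc.getNumErrors())

        model = doc.getModel()
        print("species:", [model.getSpecies(i).getId()
                           for i in range(model.getNumSpecies())])
        print("reactions:", model.getNumReactions())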

  12. Consistent structures and interactions by density functional theory with small atomic orbital basis sets.

    PubMed

    Grimme, Stefan; Brandenburg, Jan Gerit; Bannwarth, Christoph; Hansen, Andreas

    2015-08-07

    A density functional theory (DFT) based composite electronic structure approach is proposed to efficiently compute structures and interaction energies in large chemical systems. It is based on the well-known and numerically robust Perdew-Burke-Ernzerhof (PBE) generalized gradient approximation in a modified global hybrid functional with a relatively large amount of non-local Fock exchange. The orbitals are expanded in Ahlrichs-type valence-double-zeta atomic orbital (AO) Gaussian basis sets, which are available for many elements. In order to correct for the basis set superposition error (BSSE) and to account for the important long-range London dispersion effects, our well-established atom-pairwise potentials are used. In the design of the new method, particular attention has been paid to an accurate description of structural parameters in various covalent and non-covalent bonding situations as well as in periodic systems. Together with the recently proposed three-fold corrected (3c) Hartree-Fock method, the new composite scheme (termed PBEh-3c) represents the next member in a hierarchy of "low-cost" electronic structure approaches. They are mainly free of BSSE and account for most interactions in a physically sound and asymptotically correct manner. PBEh-3c yields good results for thermochemical properties in the huge GMTKN30 energy database. Furthermore, the method shows excellent performance for non-covalent interaction energies in small and large complexes. For evaluating its performance on equilibrium structures, a new compilation of standard test sets is suggested. These consist of small (light) molecules; partially flexible, medium-sized organic molecules; molecules comprising heavy main group elements; larger systems with long bonds; 3d-transition metal systems; non-covalently bound complexes (S22 and S66×8 sets); and peptide conformations. For these sets, overall deviations from accurate reference data are smaller than for various other tested DFT methods and reach the level of triple-zeta AO basis set second-order perturbation theory (MP2/TZ) at a tiny fraction of the computational effort. Periodic calculations conducted for molecular crystals to test structures (including cell volumes) and sublimation enthalpies indicate very good accuracy competitive with computationally more involved plane-wave based calculations. PBEh-3c can be applied routinely to several hundred atoms on a single processor and is suggested as a robust "high-speed" computational tool in theoretical chemistry and physics.

  13. Update of aircraft profile data for the Integrated Noise Model computer program, vol. 3: appendix B aircraft performance coefficients

    DOT National Transportation Integrated Search

    1992-03-01

    This report provides aircraft takeoff and landing profiles, aircraft aerodynamic performance coefficients, and engine performance coefficients for the aircraft database (Database 9) in the Integrated Noise Model (INM) computer program. Flight...

  14. Development of a Searchable Metabolite Database and Simulator of Xenobiotic Metabolism

    EPA Science Inventory

    A computational tool (MetaPath) has been developed for storage and analysis of metabolic pathways and associated metadata. The system is capable of sophisticated text and chemical structure/substructure searching as well as rapid comparison of metabolites formed across chemicals,...

  15. UNIX: A Tool for Information Management.

    ERIC Educational Resources Information Center

    Frey, Dean

    1989-01-01

    Describes UNIX, a computer operating system that supports multi-task and multi-user operations. Characteristics that make it especially suitable for library applications are discussed, including a hierarchical file structure and utilities for text processing, database activities, and bibliographic work. Sources of information on hardware…

  16. Accelerating simulation for the multiple-point statistics algorithm using vector quantization

    NASA Astrophysics Data System (ADS)

    Zuo, Chen; Pan, Zhibin; Liang, Hao

    2018-03-01

    Multiple-point statistics (MPS) is a prominent algorithm for simulating categorical variables with a sequential simulation procedure. Taking training images (TIs) as prior conceptual models, MPS extracts patterns from the TIs using a template and records their occurrences in a database. However, complex patterns increase the size of the database and require considerable time to retrieve the desired elements. In order to speed up simulation and improve simulation quality over state-of-the-art MPS methods, we propose an accelerated MPS simulation using vector quantization (VQ), called VQ-MPS. First, a variable representation is presented to make categorical variables amenable to vector quantization. Second, we adopt a tree-structured VQ to compress the database so that stationary simulations are realized. Finally, a transformed template and classified VQ are used to address nonstationarity. A two-dimensional (2D) stationary channelized reservoir image is used to validate the proposed VQ-MPS. In comparison with several existing MPS programs, our method exhibits significantly better performance in terms of computational time, pattern reproduction, and spatial uncertainty. Further demonstrations consist of a 2D four-facies simulation, two 2D nonstationary channel simulations, and a three-dimensional (3D) rock simulation. The results reveal that our proposed method is also capable of handling multiple facies, nonstationarity, and 3D simulations based on 2D TIs.
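
    The core trick, compressing the pattern database into a small codebook, can be sketched with a plain (flat) k-means vector quantizer over flattened training-image patches. The toy data and flat quantizer below only illustrate the idea; the paper uses a tree-structured VQ.

        # Toy VQ of 3x3 training-image patterns into a 16-entry codebook.
        import numpy as np
        from scipy.cluster.vq import kmeans2

        rng = np.random.default_rng(0)
        ti = (rng.random((64, 64)) < 0.3).astype(float)       # toy binary TI

        patches = np.array([ti[i:i + 3, j:j + 3].ravel()      # all 3x3 patterns
                            for i in range(62) for j in range(62)])

        codebook, _ = kmeans2(patches, 16, minit="++", seed=0)

        def nearest_code(pattern):
            """Index of the codebook entry closest to the query pattern."""
            return int(np.argmin(((codebook - pattern.ravel()) ** 2).sum(axis=1)))

        print("codebook shape:", codebook.shape)              # (16, 9)
        print("query maps to code", nearest_code(ti[10:13, 10:13]))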

  17. Noise Radiation From a Leading-Edge Slat

    NASA Technical Reports Server (NTRS)

    Lockard, David P.; Choudhari, Meelan M.

    2009-01-01

    This paper extends our previous computations of unsteady flow within the slat cove region of a multi-element high-lift airfoil configuration, which showed that both statistical and structural aspects of the experimentally observed unsteady flow behavior can be captured via 3D simulations over a computational domain of narrow spanwise extent. Although such a narrow-domain simulation can account for the spanwise decorrelation of the slat cove fluctuations, the resulting database cannot be applied to acoustic predictions of the slat without invoking additional approximations to synthesize the fluctuation field over the rest of the span. This deficiency is partially alleviated in the present work by increasing the spanwise extent of the computational domain from 37.3% of the slat chord to nearly 226% (i.e., 15% of the model span). The simulation database is used to verify consistency with previous computational results and, then, to develop predictions of the far-field noise radiation in conjunction with a frequency-domain Ffowcs Williams-Hawkings solver.

  18. Spatial distribution of clinical computer systems in primary care in England in 2016 and implications for primary care electronic medical record databases: a cross-sectional population study.

    PubMed

    Kontopantelis, Evangelos; Stevens, Richard John; Helms, Peter J; Edwards, Duncan; Doran, Tim; Ashcroft, Darren M

    2018-02-28

    UK primary care databases (PCDs) are used by researchers worldwide to inform clinical practice. These databases have been primarily tied to single clinical computer systems, but little is known about the adoption of these systems by primary care practices or about their geographical representativeness. We explore the spatial distribution of clinical computing systems and discuss the implications for the longevity and regional representativeness of these resources. In this cross-sectional study of English primary care clinical computer systems, covering 7526 general practices in August 2016, we spatially mapped family practices by clinical computer system at two geographical levels: the lower Clinical Commissioning Group (CCG, 209 units) and the higher National Health Service regions (14 units). Practice data included numbers of doctors, nurses and patients, and area deprivation. Of 7526 practices, Egton Medical Information Systems (EMIS) was used in 4199 (56%), SystmOne in 2552 (34%) and Vision in 636 (9%). Great regional variability was observed for all systems, with EMIS having a stronger presence in the West of England, London and the South; SystmOne in the East and some regions in the South; and Vision in London, the South, Greater Manchester and Birmingham. PCDs based on single clinical computer systems are geographically clustered in England. For example, the Clinical Practice Research Datalink and The Health Improvement Network, the most popular primary care databases in terms of research outputs, are based on the Vision clinical computer system, which is used by <10% of practices and heavily concentrated in three major conurbations and the South. Researchers need to be aware of the analytical challenges posed by clustering, and barriers to accessing alternative PCDs need to be removed.

  19. A database to enable discovery and design of piezoelectric materials

    PubMed Central

    de Jong, Maarten; Chen, Wei; Geerlings, Henry; Asta, Mark; Persson, Kristin Aslaug

    2015-01-01

    Piezoelectric materials are used in numerous applications requiring a coupling between electrical fields and mechanical strain. Despite the technological importance of this class of materials, piezoelectricity has been characterized, experimentally or computationally, for only a small fraction of the inorganic compounds that display compatible crystallographic symmetry. In this work we employ first-principles calculations based on density functional perturbation theory to compute the piezoelectric tensors for nearly a thousand compounds, thereby increasing the available data for this property by more than an order of magnitude. The results are compared with select experimental data to establish the accuracy of the calculated properties. The details of the calculations are also presented, along with a description of the format of the database developed to make these computational results publicly available. In addition, the ways in which the database can be accessed and applied in materials development efforts are described. PMID:26451252
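
    Each database entry stores, among other properties, a 3x6 piezoelectric tensor in Voigt notation; a typical use is computing the polarization induced by a given strain. A small numpy sketch with an invented tensor (not values from the published database):

        # Polarization response P_i = e_ij s_j for a Voigt strain vector s.
        import numpy as np

        e = np.zeros((3, 6))                   # piezoelectric tensor, C/m^2
        e[2, 2] = 1.3                          # e_33-like longitudinal entry
        e[2, 0] = e[2, 1] = -0.2               # e_31/e_32-like entries

        strain = np.array([0.0, 0.0, 0.01, 0.0, 0.0, 0.0])   # 1% c-axis strain
        print("induced polarization (C/m^2):", e @ strain)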
