A case study for a digital seabed database: Bohai Sea engineering geology database
NASA Astrophysics Data System (ADS)
Tianyun, Su; Shikui, Zhai; Baohua, Liu; Ruicai, Liang; Yanpeng, Zheng; Yong, Wang
2006-07-01
This paper discusses the design of the ORACLE-based Bohai Sea engineering geology database structure, covering requirement analysis, conceptual structure analysis, logical structure analysis, physical structure analysis and security design. In the study, we used the object-oriented Unified Modeling Language (UML) to model the conceptual structure of the database and used the powerful data management capabilities provided by the object-relational database ORACLE to organize and manage the storage space and improve its security. By this means, the database can provide rapid and highly effective performance in data storage, maintenance and querying to satisfy the application requirements of the Bohai Sea Oilfield Paradigm Area Information System.
NASA Technical Reports Server (NTRS)
Rasmussen, Robert; Bennett, Matthew
2006-01-01
The State Analysis Database Tool software establishes a productive environment for collaboration among software and system engineers engaged in the development of complex interacting systems. The tool embodies State Analysis, a model-based system engineering methodology founded on a state-based control architecture. A state represents a momentary condition of an evolving system, and a model may describe how a state evolves and is affected by other states. The State Analysis methodology is a process for capturing system and software requirements in the form of explicit models and states, and defining goal-based operational plans consistent with the models. Requirements, models, and operational concerns have traditionally been documented in a variety of system engineering artifacts that address different aspects of a mission's lifecycle. In State Analysis, requirements, models, and operations information are State Analysis artifacts that are consistent and stored in a State Analysis Database. The tool includes a back-end database, a multi-platform front-end client, and Web-based administrative functions. The tool is structured to prompt an engineer to follow the State Analysis methodology, to encourage state discovery and model description, and to make software requirements and operations plans consistent with model descriptions.
A data model and database for high-resolution pathology analytical image informatics.
Wang, Fusheng; Kong, Jun; Cooper, Lee; Pan, Tony; Kurc, Tahsin; Chen, Wenjin; Sharma, Ashish; Niedermayr, Cristobal; Oh, Tae W; Brat, Daniel; Farris, Alton B; Foran, David J; Saltz, Joel
2011-01-01
The systematic analysis of imaged pathology specimens often results in a vast amount of morphological information at both the cellular and sub-cellular scales. While microscopy scanners and computerized analysis are capable of capturing and analyzing data rapidly, microscopy image data remain underutilized in research and clinical settings. One major obstacle which tends to reduce wider adoption of these new technologies throughout the clinical and scientific communities is the challenge of managing, querying, and integrating the vast amounts of data resulting from the analysis of large digital pathology datasets. This paper presents a data model, which addresses these challenges, and demonstrates its implementation in a relational database system. This paper describes a data model, referred to as Pathology Analytic Imaging Standards (PAIS), and a database implementation, which are designed to support the data management and query requirements of detailed characterization of micro-anatomic morphology through many interrelated analysis pipelines on whole-slide images and tissue microarrays (TMAs). (1) Development of a data model capable of efficiently representing and storing virtual slide related image, annotation, markup, and feature information. (2) Development of a database, based on the data model, capable of supporting queries for data retrieval based on analysis and image metadata, queries for comparison of results from different analyses, and spatial queries on segmented regions, features, and classified objects. The work described in this paper is motivated by the challenges associated with characterization of micro-scale features for comparative and correlative analyses involving whole-slide tissue images and TMAs. Technologies for digitizing tissues have advanced significantly in the past decade. Slide scanners are capable of producing high-magnification, high-resolution images from whole slides and TMAs within several minutes. Hence, it is becoming increasingly feasible for basic, clinical, and translational research studies to produce thousands of whole-slide images. Systematic analysis of these large datasets requires efficient data management support for representing and indexing results from hundreds of interrelated analyses generating very large volumes of quantifications such as shape and texture and of classifications of the quantified features. We have designed a data model and a database to address the data management requirements of detailed characterization of micro-anatomic morphology through many interrelated analysis pipelines. The data model represents virtual slide related image, annotation, markup and feature information. The database supports a wide range of metadata and spatial queries on images, annotations, markups, and features. We currently have three databases running on a Dell PowerEdge T410 server with CentOS 5.5 Linux operating system. The database server is IBM DB2 Enterprise Edition 9.7.2. The set of databases consists of 1) a TMA database containing image analysis results from 4740 cases of breast cancer, with 641 MB storage size; 2) an algorithm validation database, which stores markups and annotations from two segmentation algorithms and two parameter sets on 18 selected slides, with 66 GB storage size; and 3) an in silico brain tumor study database comprising results from 307 TCGA slides, with 365 GB storage size. The latter two databases also contain human-generated annotations and markups for regions and nuclei.
Modeling and managing pathology image analysis results in a database provide immediate benefits on the value and usability of data in a research study. The database provides powerful query capabilities, which are otherwise difficult or cumbersome to support by other approaches such as programming languages. Standardized, semantic annotated data representation and interfaces also make it possible to more efficiently share image data and analysis results.
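For illustration, a minimal sketch of the kind of metadata-driven comparison query such a data model supports is given below. The table layout, names (analysis, markup, feature, slide-001) and the sqlite3 back end are simplifying assumptions for this sketch, not the actual PAIS schema or its DB2 implementation.

```python
# Minimal sketch, not the PAIS schema: analysis results keyed to images, queried by metadata.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE analysis (analysis_id INTEGER PRIMARY KEY, algorithm TEXT, parameter_set TEXT);
CREATE TABLE markup   (markup_id INTEGER PRIMARY KEY, analysis_id INTEGER, image_id TEXT,
                       object_type TEXT);
CREATE TABLE feature  (markup_id INTEGER, name TEXT, value REAL);
""")
con.executemany("INSERT INTO analysis VALUES (?,?,?)",
                [(1, "nuclSegV1", "default"), (2, "nuclSegV2", "default")])
con.executemany("INSERT INTO markup VALUES (?,?,?,?)",
                [(10, 1, "slide-001", "nucleus"), (11, 2, "slide-001", "nucleus")])
con.executemany("INSERT INTO feature VALUES (?,?,?)",
                [(10, "area", 52.0), (11, "area", 47.5)])

# A typical comparison query: mean nuclear area per algorithm for one image.
rows = con.execute("""
SELECT a.algorithm, AVG(f.value)
FROM feature f JOIN markup m   ON f.markup_id = m.markup_id
               JOIN analysis a ON m.analysis_id = a.analysis_id
WHERE m.image_id = 'slide-001' AND f.name = 'area'
GROUP BY a.algorithm
""").fetchall()
print(rows)
```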
Building a generalized distributed system model
NASA Technical Reports Server (NTRS)
Mukkamala, Ravi
1991-01-01
A number of topics related to building a generalized distributed system model are discussed. The effects of distributed database modeling on evaluation of transaction rollbacks, the measurement of effects of distributed database models on transaction availability measures, and a performance analysis of static locking in replicated distributed database systems are covered.
Validation of a common data model for active safety surveillance research
Ryan, Patrick B; Reich, Christian G; Hartzema, Abraham G; Stang, Paul E
2011-01-01
Objective: Systematic analysis of observational medical databases for active safety surveillance is hindered by the variation in data models and coding systems. Data analysts often find robust clinical data models difficult to understand and ill suited to support their analytic approaches. Further, some models do not facilitate the computations required for systematic analysis across many interventions and outcomes for large datasets. Translating the data from these idiosyncratic data models to a common data model (CDM) could facilitate both the analysts' understanding and the suitability for large-scale systematic analysis. In addition to facilitating analysis, a suitable CDM has to faithfully represent the source observational database. Before beginning to use the Observational Medical Outcomes Partnership (OMOP) CDM and a related dictionary of standardized terminologies for a study of large-scale systematic active safety surveillance, the authors validated the model's suitability for this use by example. Validation by example: To validate the OMOP CDM, the model was instantiated into a relational database, data from 10 different observational healthcare databases were loaded into separate instances, a comprehensive array of analytic methods that operate on the data model was created, and these methods were executed against the databases to measure performance. Conclusion: There was acceptable representation of the data from 10 observational databases in the OMOP CDM using the standardized terminologies selected, and a range of analytic methods was developed and executed with sufficient performance to be useful for active safety surveillance. PMID:22037893
Developing a database for pedestrians' earthquake emergency evacuation in indoor scenarios.
Zhou, Junxue; Li, Sha; Nie, Gaozhong; Fan, Xiwei; Tan, Jinxian; Li, Huayue; Pang, Xiaoke
2018-01-01
With the booming development of evacuation simulation software, developing an extensive database in indoor scenarios for evacuation models is imperative. In this paper, we conduct a qualitative and quantitative analysis of the collected videotapes and aim to provide a complete and unitary database of pedestrians' earthquake emergency response behaviors in indoor scenarios, including human-environment interactions. Using the qualitative analysis method, we extract keyword groups and keywords that code the response modes of pedestrians and construct a general decision flowchart using chronological organization. Using the quantitative analysis method, we analyze data on the delay time, evacuation speed, evacuation route and emergency exit choices. Furthermore, we study the effect of classroom layout on emergency evacuation. The database for indoor scenarios provides reliable input parameters and allows the construction of real and effective constraints for use in software and mathematical models. The database can also be used to validate the accuracy of evacuation models.
2010-06-11
[Recovered fragments only: report table-of-contents entries covering modeling with implemented GBI and MD data (steady-state GB migration), formation and analysis of a grain-boundary (GB) properties database, relative GB energy for specified GBMs averaged over possible GBIs, database validation on available experimental data, and MC Potts recrystallization and grain-growth software figures; the surrounding abstract text was not recovered.]
Evaluation of Acoustic Propagation Paths into the Human Head
2005-07-25
A 3D finite-element solid mesh was constructed using a digital image database of an adult male head. Coupled acoustic-mechanical finite-element analysis (FEA) was used to model the wave propagation through the fluid-solid structure (relative to the air-borne sound pressure amplitude) via the alternate propagation paths.
The Monitoring Erosion of Agricultural Land and spatial database of erosion events
NASA Astrophysics Data System (ADS)
Kapicka, Jiri; Zizala, Daniel
2013-04-01
In 2011, the Monitoring Erosion of Agricultural Land project originated in the Czech Republic as a joint project of the State Land Office (SLO) and the Research Institute for Soil and Water Conservation (RISWC). The aim of the project is to collect, record and evaluate information about erosion events on agricultural land. The main idea is the creation of a spatial database that will serve as a source of data and information for evaluating and modeling the erosion process, for proposing preventive measures, and for measures to reduce the negative impacts of erosion events. The subjects of monitoring are manifestations of water erosion, wind erosion and slope deformation that cause damage to agricultural land. A website, available at http://me.vumop.cz, is used as a tool for keeping and browsing information about monitored events. SLO employees carry out the record keeping. RISWC is the specialist institute in the Monitoring Erosion of Agricultural Land project: it maintains the spatial database, runs the website, manages the record keeping of events, analyses the causes of events, and performs statistical evaluations of the recorded events and proposed measures. Records are inserted into the database through the user interface of the website, which includes a map server as a component. The website is based on the PostgreSQL database technology with the PostGIS extension and MapServer UMN. Each record is spatially localized in the database by a drawing and contains descriptive information about the character of the event (date, situation description, etc.), together with information about land cover and the grown crops. Part of the database is photodocumentation taken during field reconnaissance, which is performed within two days after an event is reported. Another part of the database is precipitation information from accessible precipitation gauges. The website allows simple spatial analyses such as area calculation, slope calculation, percentage representation of GAEC, etc. The database structure was designed on the basis of an analysis of the input needs of mathematical models. Mathematical models are used for detailed analysis of selected erosion events, including soil analysis. By the end of 2012 the database contained 135 events. The content of the database continues to grow and provides an extensive source of data usable for testing mathematical models.
Pitkänen, Esa; Akerlund, Arto; Rantanen, Ari; Jouhten, Paula; Ukkonen, Esko
2008-08-25
ReMatch is a web-based, user-friendly tool that constructs stoichiometric network models for metabolic flux analysis, integrating user-developed models into a database collected from several comprehensive metabolic data resources, including KEGG, MetaCyc and ChEBI. In particular, ReMatch augments the metabolic reactions of the model with carbon mappings to facilitate (13)C metabolic flux analysis. The construction of a network model consisting of biochemical reactions is the first step in most metabolic modelling tasks. This model construction can be a tedious task, as the required information is usually scattered across many separate databases whose interoperability is suboptimal due to the heterogeneous naming conventions of metabolites in different databases. Another, particularly severe, data integration problem is faced in (13)C metabolic flux analysis, where the mappings of carbon atoms from substrates into products in the model are required. ReMatch has been developed to solve the above data integration problems. First, ReMatch matches the imported user-developed model against the internal ReMatch database while considering a comprehensive metabolite name thesaurus. This, together with wild card support, allows the user to specify the model quickly without having to look the names up manually. Second, ReMatch is able to augment reactions of the model with carbon mappings, obtained either from the internal database or given by the user with an easy-to-use tool. The constructed models can be exported into 13C-FLUX and SBML file formats. Further, a stoichiometric matrix and visualizations of the network model can be generated. The constructed models of metabolic networks can optionally be made available to the other users of ReMatch. Thus, ReMatch provides a common repository for metabolic network models with carbon mappings for the needs of the metabolic flux analysis community. ReMatch is freely available for academic use at http://www.cs.helsinki.fi/group/sysfys/software/rematch/.
Advanced Technology Lifecycle Analysis System (ATLAS) Technology Tool Box (TTB)
NASA Technical Reports Server (NTRS)
Doyle, Monica; ONeil, Daniel A.; Christensen, Carissa B.
2005-01-01
The Advanced Technology Lifecycle Analysis System (ATLAS) is a decision support tool designed to aid program managers and strategic planners in determining how to invest technology research and development dollars. It is an Excel-based modeling package that allows a user to build complex space architectures and evaluate the impact of various technology choices. ATLAS contains system models, cost and operations models, a campaign timeline and a centralized technology database. Technology data for all system models is drawn from a common database, the ATLAS Technology Tool Box (TTB). The TTB provides a comprehensive, architecture-independent technology database that is keyed to current and future timeframes.
A Community Data Model for Hydrologic Observations
NASA Astrophysics Data System (ADS)
Tarboton, D. G.; Horsburgh, J. S.; Zaslavsky, I.; Maidment, D. R.; Valentine, D.; Jennings, B.
2006-12-01
The CUAHSI Hydrologic Information System project is developing information technology infrastructure to support hydrologic science. Hydrologic information science involves the description of hydrologic environments in a consistent way, using data models for information integration. This includes a hydrologic observations data model for the storage and retrieval of hydrologic observations in a relational database designed to facilitate data retrieval for integrated analysis of information collected by multiple investigators. It is intended to provide a standard format to facilitate the effective sharing of information between investigators and to facilitate analysis of information within a single study area or hydrologic observatory, or across hydrologic observatories and regions. The observations data model is designed to store hydrologic observations and sufficient ancillary information (metadata) about the observations to allow them to be unambiguously interpreted and used and to provide traceable heritage from raw measurements to usable information. The design is based on the premise that a relational database at the single observation level is most effective for providing querying capability and cross-dimension data retrieval and analysis. This premise is being tested through the implementation of a prototype hydrologic observations database, and the development of web services for the retrieval of data from and ingestion of data into the database. These web services, hosted by the San Diego Supercomputer Center, make data in the database accessible both through a Hydrologic Data Access System portal and directly from applications software such as Excel, Matlab and ArcGIS that have Simple Object Access Protocol (SOAP) capability. This paper will (1) describe the data model; (2) demonstrate the capability for representing diverse data in the same database; and (3) demonstrate the use of the database from applications software for the performance of hydrologic analysis across different observation types.
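The premise of storing each observation as a row in a relational database can be illustrated with a small sketch. The tables, column names and values below are simplified assumptions in the spirit of the description, not the actual CUAHSI observations data model.

```python
# Simplified observation-level relational sketch (illustrative names, not the CUAHSI ODM schema).
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sites      (site_id INTEGER PRIMARY KEY, site_code TEXT, latitude REAL, longitude REAL);
CREATE TABLE variables  (variable_id INTEGER PRIMARY KEY, variable_name TEXT, units TEXT);
CREATE TABLE datavalues (value_id INTEGER PRIMARY KEY, site_id INTEGER, variable_id INTEGER,
                         local_datetime TEXT, data_value REAL, quality_code TEXT);
""")
con.execute("INSERT INTO sites VALUES (1, 'LR-OUT', 41.72, -111.80)")
con.execute("INSERT INTO variables VALUES (1, 'Discharge', 'm^3/s')")
con.executemany("INSERT INTO datavalues VALUES (?,?,?,?,?,?)",
                [(1, 1, 1, "2006-07-01T00:00", 2.31, "ok"),
                 (2, 1, 1, "2006-07-01T00:30", 2.40, "ok")])

# One observation per row makes cross-dimension queries straightforward:
# all discharge values at one site in a time window, with metadata attached.
rows = con.execute("""
SELECT s.site_code, v.variable_name, d.local_datetime, d.data_value, v.units
FROM datavalues d JOIN sites s     ON d.site_id = s.site_id
                  JOIN variables v ON d.variable_id = v.variable_id
WHERE s.site_code = 'LR-OUT' AND v.variable_name = 'Discharge'
  AND d.local_datetime BETWEEN '2006-07-01T00:00' AND '2006-07-01T01:00'
""").fetchall()
print(rows)
```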
Compartmental and Data-Based Modeling of Cerebral Hemodynamics: Linear Analysis.
Henley, B C; Shin, D C; Zhang, R; Marmarelis, V Z
Compartmental and data-based modeling of cerebral hemodynamics are alternative approaches that utilize distinct model forms and have been employed in the quantitative study of cerebral hemodynamics. This paper examines the relation between a compartmental equivalent-circuit and a data-based input-output model of dynamic cerebral autoregulation (DCA) and CO2-vasomotor reactivity (DVR). The compartmental model is constructed as an equivalent-circuit utilizing putative first principles and previously proposed hypothesis-based models. The linear input-output dynamics of this compartmental model are compared with data-based estimates of the DCA-DVR process. This comparative study indicates that there are some qualitative similarities between the two-input compartmental model and experimental results.
Observational database for studies of nearby universe
NASA Astrophysics Data System (ADS)
Kaisina, E. I.; Makarov, D. I.; Karachentsev, I. D.; Kaisin, S. S.
2012-01-01
We present the description of a database of galaxies of the Local Volume (LVG), located within 10 Mpc around the Milky Way. It contains more than 800 objects. Based on an analysis of functional capabilities, we used the PostgreSQL DBMS as a management system for our LVG database. Applying semantic modelling methods, we developed a physical ER-model of the database. We describe the developed architecture of the database table structure, and the implemented web-access, available at http://www.sao.ru/lv/lvgdb.
Advanced Traffic Management Systems (ATMS) research analysis database system
DOT National Transportation Integrated Search
2001-06-01
The ATMS Research Analysis Database Systems (ARADS) consists of a Traffic Software Data Dictionary (TSDD) and a Traffic Software Object Model (TSOM) for application to microscopic traffic simulation and signal optimization domains. The purpose of thi...
Analysis and fit of stellar spectra using a mega-database of CMFGEN models
NASA Astrophysics Data System (ADS)
Fierro-Santillán, Celia; Zsargó, Janos; Klapp, Jaime; Díaz-Azuara, Santiago Alfredo; Arrieta, Anabel; Arias, Lorena
2017-11-01
We present a tool for the analysis and fitting of stellar spectra using a mega-database of 15,000 atmosphere models for OB stars. We have developed software tools which allow us to find the model that best fits an observed spectrum, comparing equivalent widths and line ratios in the observed spectrum with all models in the database. We use the Hα, Hβ, Hγ, and Hδ lines as a criterion of stellar gravity, and the ratios of He II λ4541/He I λ4471, He II λ4200/(He I+He II λ4026), He II λ4541/He I λ4387, and He II λ4200/He I λ4144 as a criterion of T_eff.
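A minimal sketch of the matching step described above follows; the diagnostics, uncertainties and grid values are invented for illustration and are not taken from the CMFGEN mega-database or the authors' software.

```python
# Rank database models by mismatch of their line diagnostics against the observed values.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical diagnostics per model: [EW(Hgamma), HeII 4541/HeI 4471, HeII 4200/HeI 4144]
model_grid = rng.uniform([1.0, 0.1, 0.1], [4.0, 2.0, 2.0], size=(15000, 3))
observed   = np.array([2.7, 0.9, 0.8])
sigma      = np.array([0.2, 0.1, 0.1])   # assumed measurement uncertainties

# Chi-square-like distance between each model and the observation; the smallest wins.
chi2 = np.sum(((model_grid - observed) / sigma) ** 2, axis=1)
best = int(np.argmin(chi2))
print("best-fitting model index:", best, "chi2 =", float(chi2[best]))
```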
Alaska IPASS database preparation manual.
P. McHugh; D. Olson; C. Schallau
1989-01-01
Describes the data, their sources, and the calibration procedures used in compiling a database for the Alaska IPASS (interactive policy analysis simulation system) model. Although this manual is for Alaska, it provides generic instructions for analysts preparing databases for other geographical areas.
Assessment of the SFC database for analysis and modeling
NASA Technical Reports Server (NTRS)
Centeno, Martha A.
1994-01-01
SFC is one of the four clusters that make up the Integrated Work Control System (IWCS), which will integrate the shuttle processing databases at Kennedy Space Center (KSC). The IWCS framework will enable communication among the four clusters and add new data collection protocols. The Shop Floor Control (SFC) module has been operational for two and a half years; however, at this stage, automatic links to the other 3 modules have not been implemented yet, except for a partial link to IOS (CASPR). SFC revolves around a DB/2 database with PFORMS acting as the database management system (DBMS). PFORMS is an off-the-shelf DB/2 application that provides a set of data entry screens and query forms. The main dynamic entity in the SFC and IOS database is a task; thus, the physical storage location and update privileges are driven by the status of the WAD. As we explored the SFC values, we realized that there was much to do before actually engaging in continuous analysis of the SFC data. Halfway into this effort, it was realized that full scale analysis would have to be a future third phase of this effort. So, we concentrated on getting to know the contents of the database, and on establishing an initial set of tools to start the continuous analysis process. Specifically, we set out to: (1) provide specific procedures for statistical models, so as to enhance the TP-OAO office analysis and modeling capabilities; (2) design a data exchange interface; (3) prototype the interface to provide inputs to SCRAM; and (4) design a modeling database. These objectives were set with the expectation that, if met, they would provide former TP-OAO engineers with tools that would help them demonstrate the importance of process-based analyses. The latter, in turn, will help them obtain the cooperation of various organizations in charting out their individual processes.
Building a Database for a Quantitative Model
NASA Technical Reports Server (NTRS)
Kahn, C. Joseph; Kleinhammer, Roger
2014-01-01
A database can greatly benefit a quantitative analysis. The defining characteristic of a quantitative risk, or reliability, model is the use of failure estimate data. Models can easily contain a thousand Basic Events, relying on hundreds of individual data sources. Obviously, entering so much data by hand will eventually lead to errors. Less obviously, entering data this way does not aid in linking the Basic Events to the data sources. The best way to organize large amounts of data on a computer is with a database. But a model does not require a large, enterprise-level database with dedicated developers and administrators. A database built in Excel can be quite sufficient. A simple spreadsheet database can link every Basic Event to the individual data source selected for it. This database can also contain the manipulations appropriate for how the data are used in the model. These manipulations include stressing factors based on use and maintenance cycles, dormancy, unique failure modes, the modeling of multiple items as a single "Super component" Basic Event, and Bayesian updating based on flight and testing experience. A simple, unique metadata field in both the model and the database provides a link from any Basic Event in the model to its data source and all relevant calculations. The credibility of the entire model often rests on the credibility and traceability of the data.
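A small sketch of that linkage is shown below: each Basic Event carries a metadata key that points at its data source, and the database row also holds the manipulation (here just a stress factor) applied before the rate enters the model. The column names, rates and pandas stand-in for the spreadsheet are illustrative assumptions, not the authors' actual layout.

```python
# Link Basic Events to data sources through a shared metadata key and apply a stress factor.
import pandas as pd

basic_events = pd.DataFrame({
    "basic_event": ["BE-VALVE-FTO", "BE-PUMP-FTR"],
    "data_key":    ["SRC-017",      "SRC-042"],
})
data_sources = pd.DataFrame({
    "data_key":      ["SRC-017", "SRC-042"],
    "source":        ["handbook A", "test report B"],
    "base_rate":     [1.2e-6, 3.0e-5],    # failures per hour (invented values)
    "stress_factor": [2.0, 1.0],          # e.g., duty-cycle adjustment
})

linked = basic_events.merge(data_sources, on="data_key", how="left")
linked["model_rate"] = linked["base_rate"] * linked["stress_factor"]
print(linked[["basic_event", "data_key", "source", "model_rate"]])
```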
Analysis of a virtual memory model for maintaining database views
NASA Technical Reports Server (NTRS)
Kinsley, Kathryn C.; Hughes, Charles E.
1992-01-01
This paper presents an analytical model for predicting the performance of a new support strategy for database views. This strategy, called the virtual method, is compared with traditional methods for supporting views. The analytical model's predictions of improved performance by the virtual method are then validated by comparing these results with those achieved in an experimental implementation.
International forensic automotive paint database
NASA Astrophysics Data System (ADS)
Bishea, Gregory A.; Buckle, Joe L.; Ryland, Scott G.
1999-02-01
The Technical Working Group for Materials Analysis (TWGMAT) is supporting an international forensic automotive paint database. The Federal Bureau of Investigation and the Royal Canadian Mounted Police (RCMP) are collaborating on this effort through TWGMAT. This paper outlines the support and further development of the RCMP's Automotive Paint Database, 'Paint Data Query'. This cooperative agreement augments and supports a current, validated, searchable, automotive paint database that is used to identify make(s), model(s), and year(s) of questioned paint samples in hit-and-run fatalities and other associated investigations involving automotive paint.
Consistency Analysis of Genome-Scale Models of Bacterial Metabolism: A Metamodel Approach
Ponce-de-Leon, Miguel; Calle-Espinosa, Jorge; Peretó, Juli; Montero, Francisco
2015-01-01
Genome-scale metabolic models usually contain inconsistencies that manifest as blocked reactions and gap metabolites. With the purpose of detecting recurrent inconsistencies in metabolic models, a large-scale analysis was performed using a previously published dataset of 130 genome-scale models. The results showed that a large number of reactions (~22%) are blocked in all the models where they are present. To unravel the nature of such inconsistencies a metamodel was constructed by joining the 130 models in a single network. This metamodel was manually curated using the unconnected-modules approach, and then it was used as a reference network to perform a gap-filling on each individual genome-scale model. Finally, a set of 36 models that had not been considered during the construction of the metamodel was used, as a proof of concept, to extend the metamodel with new biochemical information and to assess its impact on gap-filling results. The analysis performed on the metamodel allowed us to conclude that: 1) the recurrent inconsistencies found in the models were already present in the metabolic database used during the reconstruction process; 2) the presence of inconsistencies in a metabolic database can be propagated to the reconstructed models; 3) there are reactions that do not manifest as blocked but are active as a consequence of certain classes of artifacts; and 4) the results of an automatic gap-filling are highly dependent on the consistency and completeness of the metamodel or metabolic database used as the reference network. In conclusion, the consistency analysis should be applied to metabolic databases in order to detect and fill gaps as well as to detect and remove artifacts and redundant information. PMID:26629901
NASA Astrophysics Data System (ADS)
Ehlmann, Bryon K.
Current scientific experiments are often characterized by massive amounts of very complex data and the need for complex data analysis software. Object-oriented database (OODB) systems have the potential of improving the description of the structure and semantics of this data and of integrating the analysis software with the data. This dissertation results from research to enhance OODB functionality and methodology to support scientific databases (SDBs) and, more specifically, to support a nuclear physics experiments database for the Continuous Electron Beam Accelerator Facility (CEBAF). This research to date has identified a number of problems related to the practical application of OODB technology to the conceptual design of the CEBAF experiments database and other SDBs: the lack of a generally accepted OODB design methodology, the lack of a standard OODB model, the lack of a clear conceptual level in existing OODB models, and the limited support in existing OODB systems for many common object relationships inherent in SDBs. To address these problems, the dissertation describes an Object-Relationship Diagram (ORD) and an Object-oriented Database Definition Language (ODDL) that provide tools that allow SDB design and development to proceed systematically and independently of existing OODB systems. These tools define multi-level, conceptual data models for SDB design, which incorporate a simple notation for describing common types of relationships that occur in SDBs. ODDL allows these relationships and other desirable SDB capabilities to be supported by an extended OODB system. A conceptual model of the CEBAF experiments database is presented in terms of ORDs and the ODDL to demonstrate their functionality and use and provide a foundation for future development of experimental nuclear physics software using an OODB approach.
NASA Technical Reports Server (NTRS)
ONeil, D. A.; Mankins, J. C.; Christensen, C. B.; Gresham, E. C.
2005-01-01
The Advanced Technology Lifecycle Analysis System (ATLAS), a spreadsheet analysis tool suite, applies parametric equations for sizing and lifecycle cost estimation. Performance, operation, and programmatic data used by the equations come from a Technology Tool Box (TTB) database. In this second TTB Technical Interchange Meeting (TIM), technologists, system model developers, and architecture analysts discussed methods for modeling technology decisions in spreadsheet models, identified specific technology parameters, and defined detailed development requirements. This Conference Publication captures the consensus of the discussions and provides narrative explanations of the tool suite, the database, and applications of ATLAS within NASA s changing environment.
NASA Astrophysics Data System (ADS)
Radhakrishnan, A.; Balaji, V.; Schweitzer, R.; Nikonov, S.; O'Brien, K.; Vahlenkamp, H.; Burger, E. F.
2016-12-01
There are distinct phases in the development cycle of an Earth system model. During the model development phase, scientists make changes to code and parameters and require rapid access to results for evaluation. During the production phase, scientists may make an ensemble of runs with different settings and produce large quantities of output that must be further analyzed and quality controlled for scientific papers and for submission to international projects such as the Coupled Model Intercomparison Project (CMIP). During this phase, provenance is a key concern: being able to track back from outputs to inputs. We will discuss one of the paths taken at GFDL in delivering tools across this lifecycle, offering on-demand analysis of data by integrating the use of GFDL's in-house FRE-Curator, Unidata's THREDDS and NOAA PMEL's Live Access Servers (LAS). Experience over this lifecycle suggests that a major difficulty in developing analysis capabilities lies only partially in the scientific content; much of the effort is devoted to answering the questions "where is the data?" and "how do I get to it?". "FRE-Curator" is the name of a database-centric paradigm used at NOAA GFDL to ingest information about model runs into an RDBMS (the Curator database). The components of FRE-Curator are integrated into the Flexible Runtime Environment workflow and can be invoked during climate model simulation. The front end to FRE-Curator, known as the Model Development Database Interface (MDBI), provides in-house web-based access to GFDL experiments: metadata, analysis output and more. In order to provide on-demand visualization, MDBI uses the Live Access Server, a highly configurable web server designed to provide flexible access to geo-referenced scientific data and making use of OPeNDAP. Model output saved in GFDL's tape archive, the size of the database and experiments, and continuous model development initiatives with more dynamic configurations all add complexity and challenges to providing an on-demand visualization experience to our GFDL users.
IMPROVED SEARCH OF PRINCIPAL COMPONENT ANALYSIS DATABASES FOR SPECTRO-POLARIMETRIC INVERSION
DOE Office of Scientific and Technical Information (OSTI.GOV)
Casini, R.; Lites, B. W.; Ramos, A. Asensio
2013-08-20
We describe a simple technique for the acceleration of spectro-polarimetric inversions based on principal component analysis (PCA) of Stokes profiles. This technique involves the indexing of the database models based on the sign of the projections (PCA coefficients) of the first few relevant orders of principal components of the four Stokes parameters. In this way, each model in the database can be attributed a distinctive binary number of 4n bits, where n is the number of PCA orders used for the indexing. Each of these binary numbers (indices) identifies a group of "compatible" models for the inversion of a given set of observed Stokes profiles sharing the same index. The complete set of the binary numbers so constructed evidently determines a partition of the database. The search of the database for the PCA inversion of spectro-polarimetric data can profit greatly from this indexing. In practical cases it becomes possible to approach the ideal acceleration factor of 2^(4n) as compared to the systematic search of a non-indexed database for a traditional PCA inversion. This indexing method relies on the existence of a physical meaning in the sign of the PCA coefficients of a model. For this reason, the presence of model ambiguities and of spectro-polarimetric noise in the observations limits in practice the number n of relevant PCA orders that can be used for the indexing.
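The indexing scheme lends itself to a compact sketch. The code below is an illustration of the idea on synthetic data, not the authors' implementation; the eigenbases and model profiles are random stand-ins.

```python
# Each database model gets a 4n-bit index built from the signs of its first n PCA
# coefficients for each of the four Stokes parameters; inversion searches only its group.
import numpy as np

rng = np.random.default_rng(1)
n_orders, n_models, n_wave = 2, 10000, 64            # n = 2  ->  2**(4*2) = 256 index bins

bases  = rng.normal(size=(4, n_orders, n_wave))      # first n principal components per Stokes parameter
models = rng.normal(size=(4, n_models, n_wave))      # synthetic database of Stokes profiles

def sign_index(profiles):
    """Pack the signs of the leading PCA coefficients into one integer per model."""
    coeffs = np.einsum('skw,smw->msk', bases, profiles)   # (models, stokes, orders)
    bits = (coeffs > 0).reshape(coeffs.shape[0], -1)       # 4*n sign bits per model
    return bits.dot(1 << np.arange(bits.shape[1]))         # binary number (index)

index = sign_index(models)
partition = {}                                            # index -> list of model ids
for model_id, key in enumerate(index):
    partition.setdefault(int(key), []).append(model_id)

# An observation only needs to be compared against the "compatible" group sharing its index.
obs = models[:, 1234, :][:, None, :]                      # pretend this is an observation
candidates = partition.get(int(sign_index(obs)[0]), [])
print(len(candidates), "models searched instead of", n_models)
```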
Researchers in the National Exposure Research Laboratory (NERL) have performed a number of large human exposure measurement studies during the past decade. It is the goal of the NERL to make the data available to other researchers for analysis in order to further the scientific ...
dbMDEGA: a database for meta-analysis of differentially expressed genes in autism spectrum disorder.
Zhang, Shuyun; Deng, Libin; Jia, Qiyue; Huang, Shaoting; Gu, Junwang; Zhou, Fankun; Gao, Meng; Sun, Xinyi; Feng, Chang; Fan, Guangqin
2017-11-16
Autism spectrum disorders (ASD) are hereditary, heterogeneous and biologically complex neurodevelopmental disorders. Individual studies on gene expression in ASD cannot provide clear consensus conclusions. Therefore, a systematic review to synthesize the current findings from brain tissues and a search tool to share the meta-analysis results are urgently needed. Here, we conducted a meta-analysis of brain gene expression profiles in the current reported human ASD expression datasets (with 84 frozen male cortex samples, 17 female cortex samples, 32 cerebellum samples and 4 formalin fixed samples) and knock-out mouse ASD model expression datasets (with 80 collective brain samples). Then, we applied R language software and developed an interactive shared and updated database (dbMDEGA) displaying the results of meta-analysis of data from ASD studies regarding differentially expressed genes (DEGs) in the brain. This database, dbMDEGA ( https://dbmdega.shinyapps.io/dbMDEGA/ ), is a publicly available web-portal for manual annotation and visualization of DEGs in the brain from data from ASD studies. This database uniquely presents meta-analysis values and homologous forest plots of DEGs in brain tissues. Gene entries are annotated with meta-values, statistical values and forest plots of DEGs in brain samples. This database aims to provide searchable meta-analysis results based on the current reported brain gene expression datasets of ASD to help detect candidate genes underlying this disorder. This new analytical tool may provide valuable assistance in the discovery of DEGs and the elucidation of the molecular pathogenicity of ASD. This database model may be replicated to study other disorders.
Domain fusion analysis by applying relational algebra to protein sequence and domain databases
Truong, Kevin; Ikura, Mitsuhiko
2003-01-01
Background: Domain fusion analysis is a useful method to predict functionally linked proteins that may be involved in direct protein-protein interactions or in the same metabolic or signaling pathway. As separate domain databases like BLOCKS, PROSITE, Pfam, SMART, PRINTS-S, ProDom, TIGRFAMs, and amalgamated domain databases like InterPro continue to grow in size and quality, a computational method to perform domain fusion analysis that leverages these efforts will become increasingly powerful. Results: This paper proposes a computational method employing relational algebra to find domain fusions in protein sequence databases. The feasibility of this method was illustrated on the SWISS-PROT+TrEMBL sequence database using domain predictions from the Pfam HMM (hidden Markov model) database. We identified 235 and 189 putative functionally linked protein partners in H. sapiens and S. cerevisiae, respectively. From the scientific literature, we were able to confirm many of these functional linkages, while the remainder offer testable experimental hypotheses. Results can be viewed at . Conclusion: As the analysis can be computed quickly on any relational database that supports standard SQL (structured query language), it can be dynamically updated along with the sequence and domain databases, thereby improving the quality of predictions over time. PMID:12734020
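A simplified relational sketch of the domain-fusion ("Rosetta stone") idea follows. The toy table and the SQL query are assumptions for illustration only; they are not the authors' exact relational-algebra expressions, nor SWISS-PROT+TrEMBL/Pfam data.

```python
# Two proteins are predicted to be functionally linked when their domains co-occur on a
# single fused protein in another organism.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE domains (protein TEXT, organism TEXT, domain TEXT)")
con.executemany("INSERT INTO domains VALUES (?,?,?)", [
    ("fusionC", "E. coli",       "d1"),   # one protein carrying both domains ...
    ("fusionC", "E. coli",       "d2"),
    ("protA",   "S. cerevisiae", "d1"),   # ... that appear on separate proteins elsewhere
    ("protB",   "S. cerevisiae", "d2"),
])

rows = con.execute("""
SELECT DISTINCT x.protein, y.protein
FROM domains x
JOIN domains y  ON x.organism = y.organism AND x.protein < y.protein
JOIN domains f1 ON f1.domain = x.domain
JOIN domains f2 ON f2.domain = y.domain AND f2.protein = f1.protein
                AND f2.organism = f1.organism AND f1.organism <> x.organism
""").fetchall()
print(rows)   # [('protA', 'protB')]
```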
Detailed Uncertainty Analysis of the Ares I A106 Liftoff/Transition Database
NASA Technical Reports Server (NTRS)
Hanke, Jeremy L.
2011-01-01
The Ares I A106 Liftoff/Transition Force and Moment Aerodynamics Database describes the aerodynamics of the Ares I Crew Launch Vehicle (CLV) from the moment of liftoff through the transition from high to low total angles of attack at low subsonic Mach numbers. The database includes uncertainty estimates that were developed using a detailed uncertainty quantification procedure. The Ares I Aerodynamics Panel developed both the database and the uncertainties from wind tunnel test data acquired in the NASA Langley Research Center's 14- by 22-Foot Subsonic Wind Tunnel Test 591 using a 1.75 percent scale model of the Ares I and the tower assembly. The uncertainty modeling contains three primary uncertainty sources: experimental uncertainty, database modeling uncertainty, and database query interpolation uncertainty. The final database and uncertainty model represent a significant improvement in the quality of the aerodynamic predictions for this regime of flight over the estimates previously used by the Ares Project. The maximum possible aerodynamic force pushing the vehicle towards the launch tower assembly in a dispersed case using this database saw a 40 percent reduction from the worst-case scenario in previously released data for Ares I.
Interactive Scene Analysis Module - A sensor-database fusion system for telerobotic environments
NASA Technical Reports Server (NTRS)
Cooper, Eric G.; Vazquez, Sixto L.; Goode, Plesent W.
1992-01-01
Accomplishing a task with telerobotics typically involves a combination of operator control/supervision and a 'script' of preprogrammed commands. These commands usually assume that the location of various objects in the task space conforms to some internal representation (database) of that task space. The ability to quickly and accurately verify the task environment against the internal database would improve the robustness of these preprogrammed commands. In addition, the on-line initialization and maintenance of a task space database is difficult for operators using Cartesian coordinates alone. This paper describes the Interactive Scene Analysis Module (ISAM) developed to provide taskspace database initialization and verification utilizing 3-D graphic overlay modelling, video imaging, and laser radar based range imaging. Through the fusion of taskspace database information and image sensor data, a verifiable taskspace model is generated providing location and orientation data for objects in a task space. This paper also describes applications of the ISAM in the Intelligent Systems Research Laboratory (ISRL) at NASA Langley Research Center, and discusses its performance relative to representation accuracy and operator interface efficiency.
Amber Vanden Wymelenberg; Patrick Minges; Grzegorz Sabat; Diego Martinez; Andrea Aerts; Asaf Salamov; Igor Grigoriev; Harris Shapiro; Nik Putnam; Paula Belinky; Carlos Dosoretz; Jill Gaskell; Phil Kersten; Dan Cullen
2006-01-01
The white-rot basidiomycete Phanerochaete chrysosporium employs extracellular enzymes to completely degrade the major polymers of wood: cellulose, hemicellulose, and lignin. Analysis of a total of 10,048 v2.1 gene models predicts 769 secreted proteins, a substantial increase over the 268 models identified in the earlier database (v1.0). Within the v2.1 computational...
PathCase-SB architecture and database design
2011-01-01
Background: Integrating metabolic pathway resources and regulatory metabolic network models, and deploying new tools on the integrated platform, can help perform more effective and more efficient systems biology research on understanding the regulation in metabolic networks. Therefore, the tasks of (a) integrating regulatory metabolic networks and existing models under a single database environment, and (b) building tools to help with modeling and analysis are desirable and intellectually challenging computational tasks. Description: PathCase Systems Biology (PathCase-SB) is built and released. The PathCase-SB database provides data and an API for multiple user interfaces and software tools. The current PathCase-SB system provides a database-enabled framework and web-based computational tools towards facilitating the development of kinetic models for biological systems. PathCase-SB aims to integrate data of selected biological data sources on the web (currently, the BioModels database and KEGG), and to provide more powerful and/or new capabilities via the new web-based integrative framework. This paper describes architecture and database design issues encountered in PathCase-SB's design and implementation, and presents the current design of PathCase-SB's architecture and database. Conclusions: The PathCase-SB architecture and database provide a highly extensible and scalable environment with easy and fast (real-time) access to the data in the database. PathCase-SB itself is already being used by researchers across the world. PMID:22070889
Recent NASA Wake-Vortex Flight Tests, Flow-Physics Database and Wake-Development Analysis
NASA Technical Reports Server (NTRS)
Vicroy, Dan D.; Vijgen, Paul M.; Reimer, Heidi M.; Gallegos, Joey L.; Spalart, Philippe R.
1998-01-01
A series of flight tests over the ocean of a four engine turboprop airplane in the cruise configuration have provided a data set for improved understanding of wake vortex physics and atmospheric interaction. An integrated database has been compiled for wake characterization and validation of wake-vortex computational models. This paper describes the wake-vortex flight tests, the data processing, the database development and access, and results obtained from preliminary wake-characterization analysis using the data sets.
Ezra Tsur, Elishai
2017-01-01
Databases are imperative for research in bioinformatics and computational biology. Current challenges in database design include data heterogeneity and context-dependent interconnections between data entities. These challenges drove the development of unified data interfaces and specialized databases. The curation of specialized databases is an ever-growing challenge due to the introduction of new data sources and the emergence of new relational connections between established datasets. Here, an open-source framework for the curation of specialized databases is proposed. The framework supports user-designed models of data encapsulation, object persistency and structured interfaces to local and external data sources such as MalaCards, Biomodels and the National Center for Biotechnology Information (NCBI) databases. The proposed framework was implemented using Java as the development environment, EclipseLink as the data persistency agent and Apache Derby as the database manager. Syntactic analysis was based on the J3D, jsoup, Apache Commons and w3c.dom open libraries. Finally, the construction of a specialized database for aneurysm-associated vascular diseases is demonstrated. This database contains 3-dimensional geometries of aneurysms, patients' clinical information, articles, biological models, related diseases and our recently published model of aneurysms' risk of rupture. The framework is available at: http://nbel-lab.com.
A data analysis expert system for large established distributed databases
NASA Technical Reports Server (NTRS)
Gnacek, Anne-Marie; An, Y. Kim; Ryan, J. Patrick
1987-01-01
A design for a natural language database interface system, called the Deductively Augmented NASA Management Decision support System (DANMDS), is presented. The DANMDS system components have been chosen on the basis of the following considerations: maximal employment of the existing NASA IBM-PC computers and supporting software; local structuring and storing of external data via the entity-relationship model; a natural easy-to-use error-free database query language; user ability to alter query language vocabulary and data analysis heuristic; and significant artificial intelligence data analysis heuristic techniques that allow the system to become progressively and automatically more useful.
PGSB/MIPS PlantsDB Database Framework for the Integration and Analysis of Plant Genome Data.
Spannagl, Manuel; Nussbaumer, Thomas; Bader, Kai; Gundlach, Heidrun; Mayer, Klaus F X
2017-01-01
Plant Genome and Systems Biology (PGSB), formerly Munich Institute for Protein Sequences (MIPS) PlantsDB, is a database framework for the integration and analysis of plant genome data, developed and maintained for more than a decade now. Major components of that framework are genome databases and analysis resources focusing on individual (reference) genomes providing flexible and intuitive access to data. Another main focus is the integration of genomes from both model and crop plants to form a scaffold for comparative genomics, assisted by specialized tools such as the CrowsNest viewer to explore conserved gene order (synteny). Data exchange and integrated search functionality with/over many plant genome databases is provided within the transPLANT project.
NASA Technical Reports Server (NTRS)
Wang, Yi; Pant, Kapil; Brenner, Martin J.; Ouellette, Jeffrey A.
2018-01-01
This paper presents a data analysis and modeling framework to tailor and develop a linear parameter-varying (LPV) aeroservoelastic (ASE) model database for flexible aircraft across a broad 2D flight parameter space. The Kriging surrogate model is constructed using ASE models at a fraction of the grid points within the original model database, and then the ASE model at any flight condition can be obtained simply through surrogate model interpolation. A greedy sampling algorithm is developed to select the next sample point that carries the worst relative error between the surrogate model prediction and the benchmark model in the frequency domain among all input-output channels. The process is iterated to incrementally improve surrogate model accuracy until a pre-determined tolerance or iteration budget is met. The methodology is applied to the ASE model database of a flexible aircraft currently being tested at NASA/AFRC for flutter suppression and gust load alleviation. Our studies indicate that the proposed method can reduce the number of models in the original database by 67%. Even so, the ASE models obtained through Kriging interpolation match the models in the original database constructed directly from the physics-based tool, with the worst relative error far below 1%. The interpolated ASE model exhibits continuously varying gains along a set of prescribed flight conditions. More importantly, the selected grid points are distributed non-uniformly in the parameter space, a) capturing the distinctly different dynamic behavior and its dependence on flight parameters, and b) reiterating the need and utility of adaptive space sampling techniques for ASE model database compaction. The present framework is directly extendible to a high-dimensional flight parameter space, and can be used to guide the ASE model development, model order reduction, robust control synthesis and novel vehicle design of flexible aircraft.
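A minimal sketch of the greedy sampling loop is shown below, under stated assumptions: scikit-learn's Gaussian-process regressor stands in for the Kriging surrogate, and a scalar test function over two normalized flight parameters stands in for the worst-channel frequency-domain error metric used in the paper.

```python
# Greedy selection of grid points: repeatedly add the flight condition where the surrogate
# disagrees most with the benchmark, then refit, until a tolerance or budget is met.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def benchmark(x):                      # stand-in for the physics-based ASE model response
    return np.sin(3 * x[:, 0]) + 0.5 * np.cos(5 * x[:, 1])

g = np.linspace(0.0, 1.0, 21)          # normalized Mach x dynamic pressure grid
grid = np.array([[m, q] for m in g for q in g])
truth = benchmark(grid)

sampled = [0, len(grid) - 1]           # start from two corner conditions
tol, budget = 0.01, 60
while len(sampled) < budget:
    gp = GaussianProcessRegressor(normalize_y=True).fit(grid[sampled], truth[sampled])
    rel_err = np.abs(gp.predict(grid) - truth) / (np.abs(truth) + 1e-6)
    worst = int(np.argmax(rel_err))
    if rel_err[worst] < tol:           # surrogate already matches the benchmark everywhere
        break
    sampled.append(worst)              # add the worst-fitting flight condition and refit

print(f"kept {len(sampled)} of {len(grid)} grid points, worst rel. error {rel_err.max():.4f}")
```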
Plant Reactome: a resource for plant pathways and comparative analysis
Naithani, Sushma; Preece, Justin; D'Eustachio, Peter; Gupta, Parul; Amarasinghe, Vindhya; Dharmawardhana, Palitha D.; Wu, Guanming; Fabregat, Antonio; Elser, Justin L.; Weiser, Joel; Keays, Maria; Fuentes, Alfonso Munoz-Pomer; Petryszak, Robert; Stein, Lincoln D.; Ware, Doreen; Jaiswal, Pankaj
2017-01-01
Plant Reactome (http://plantreactome.gramene.org/) is a free, open-source, curated plant pathway database portal, provided as part of the Gramene project. The database provides intuitive bioinformatics tools for the visualization, analysis and interpretation of pathway knowledge to support genome annotation, genome analysis, modeling, systems biology, basic research and education. Plant Reactome employs the structural framework of a plant cell to show metabolic, transport, genetic, developmental and signaling pathways. We manually curate molecular details of pathways in these domains for reference species Oryza sativa (rice) supported by published literature and annotation of well-characterized genes. Two hundred twenty-two rice pathways, 1025 reactions associated with 1173 proteins, 907 small molecules and 256 literature references have been curated to date. These reference annotations were used to project pathways for 62 model, crop and evolutionarily significant plant species based on gene homology. Database users can search and browse various components of the database, visualize curated baseline expression of pathway-associated genes provided by the Expression Atlas and upload and analyze their Omics datasets. The database also offers data access via Application Programming Interfaces (APIs) and in various standardized pathway formats, such as SBML and BioPAX. PMID:27799469
Liu, Jie; Zhang, Fu-Dong; Teng, Fei; Li, Jun; Wang, Zhi-Hong
2014-10-01
In order to detect the oil yield of oil shale in situ, modeling and analysis methods for in-situ detection based on portable near-infrared spectroscopy were studied using 66 rock core samples from the No. 2 well drilling of the Fuyu oil shale base in Jilin. With the developed portable spectrometer, spectra in three data formats (reflectance, absorbance and K-M function) were acquired. Using four different modeling-data optimization methods: principal component-Mahalanobis distance (PCA-MD) for eliminating abnormal samples, uninformative variables elimination (UVE) for wavelength selection, and their combinations PCA-MD + UVE and UVE + PCA-MD; two modeling methods: partial least squares (PLS) and back-propagation artificial neural network (BPANN); and the same data pre-processing, modeling and analysis experiments were performed to determine the optimum analysis model and method. The results show that the data format, the modeling-data optimization method and the modeling method all affect the analysis precision of the model. The results also show that, whether or not an optimization method is used, reflectance or K-M function is the appropriate spectrum format of the modeling database for both modeling methods. With the two modeling methods and four data optimization methods, the model precisions obtained from the same modeling database differ. For the PLS modeling method, the PCA-MD and UVE + PCA-MD data optimization methods can improve the modeling precision of the database using the K-M function spectrum data format. For the BPANN modeling method, the UVE, UVE + PCA-MD and PCA-MD + UVE data optimization methods can improve the modeling precision of the database using any of the three spectrum data formats. Except when using reflectance spectra with the PCA-MD data optimization method, the modeling precision of the BPANN method is better than that of the PLS method. Modeling with reflectance spectra, the UVE optimization method and the BPANN modeling method gives the highest analysis precision, with a correlation coefficient (Rp) of 0.92 and a standard error of prediction (SEP) of 0.69%.
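A hedged sketch of the PLS modeling step follows, using synthetic spectra in place of the 66 core samples, which are not available here; the data, component count and error metrics are illustrative only and show the workflow of fitting a partial least squares model and reporting Rp and SEP.

```python
# Fit a PLS model relating spectra to oil yield and report correlation and prediction error.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
n_samples, n_wavelengths = 66, 200
spectra = rng.normal(size=(n_samples, n_wavelengths))            # stand-in reflectance spectra
oil_yield = spectra[:, :10].sum(axis=1) * 0.3 + 5.0 + rng.normal(scale=0.3, size=n_samples)

train, test = np.arange(0, 50), np.arange(50, 66)
pls = PLSRegression(n_components=5).fit(spectra[train], oil_yield[train])
pred = pls.predict(spectra[test]).ravel()

rp = np.corrcoef(pred, oil_yield[test])[0, 1]                     # correlation coefficient (Rp)
sep = np.sqrt(np.mean((pred - oil_yield[test]) ** 2))             # standard error of prediction
print(f"Rp = {rp:.2f}, SEP = {sep:.2f} (oil-yield units)")
```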
NASA Technical Reports Server (NTRS)
1996-01-01
Topics considered include: a new approach to turbulence modeling; second-moment closure analysis of the backstep flow database; prediction of the backflow and recovery regions in the backward-facing step at various Reynolds numbers; turbulent flame propagation in partially premixed flames; and ensemble-averaged dynamic modeling. Also included are a study of the turbulence structures of wall-bounded shear flows and the simulation and modeling of the elliptic streamline flow.
The Use of a Relational Database in Qualitative Research on Educational Computing.
ERIC Educational Resources Information Center
Winer, Laura R.; Carriere, Mario
1990-01-01
Discusses the use of a relational database as a data management and analysis tool for nonexperimental qualitative research, and describes the use of the Reflex Plus database in the Vitrine 2001 project in Quebec to study computer-based learning environments. Information systems are also discussed, and the use of a conceptual model is explained.…
A high-performance spatial database based approach for pathology imaging algorithm evaluation
Wang, Fusheng; Kong, Jun; Gao, Jingjing; Cooper, Lee A.D.; Kurc, Tahsin; Zhou, Zhengwen; Adler, David; Vergara-Niedermayr, Cristobal; Katigbak, Bryan; Brat, Daniel J.; Saltz, Joel H.
2013-01-01
Background: Algorithm evaluation provides a means to characterize variability across image analysis algorithms, validate algorithms by comparison with human annotations, combine results from multiple algorithms for performance improvement, and facilitate algorithm sensitivity studies. The sizes of images and image analysis results in pathology image analysis pose significant challenges in algorithm evaluation. We present an efficient parallel spatial database approach to model, normalize, manage, and query large volumes of analytical image result data. This provides an efficient platform for algorithm evaluation. Our experiments with a set of brain tumor images demonstrate the application, scalability, and effectiveness of the platform. Context: The paper describes an approach and platform for evaluation of pathology image analysis algorithms. The platform facilitates algorithm evaluation through a high-performance database built on the Pathology Analytic Imaging Standards (PAIS) data model. Aims: (1) Develop a framework to support algorithm evaluation by modeling and managing analytical results and human annotations from pathology images; (2) Create a robust data normalization tool for converting, validating, and fixing spatial data from algorithm or human annotations; (3) Develop a set of queries to support data sampling and result comparisons; (4) Achieve high performance computation capacity via a parallel data management infrastructure, parallel data loading and spatial indexing optimizations in this infrastructure. Materials and Methods: We have considered two scenarios for algorithm evaluation: (1) algorithm comparison where multiple result sets from different methods are compared and consolidated; and (2) algorithm validation where algorithm results are compared with human annotations. We have developed a spatial normalization toolkit to validate and normalize spatial boundaries produced by image analysis algorithms or human annotations. The validated data were formatted based on the PAIS data model and loaded into a spatial database. To support efficient data loading, we have implemented a parallel data loading tool that takes advantage of multi-core CPUs to accelerate data injection. The spatial database manages both geometric shapes and image features or classifications, and enables spatial sampling, result comparison, and result aggregation through expressive structured query language (SQL) queries with spatial extensions. To provide scalable and efficient query support, we have employed a shared nothing parallel database architecture, which distributes data homogenously across multiple database partitions to take advantage of parallel computation power and implements spatial indexing to achieve high I/O throughput. Results: Our work proposes a high performance, parallel spatial database platform for algorithm validation and comparison. This platform was evaluated by storing, managing, and comparing analysis results from a set of brain tumor whole slide images. The tools we develop are open source and available to download. Conclusions: Pathology image algorithm validation and comparison are essential to iterative algorithm development and refinement. One critical component is the support for queries involving spatial predicates and comparisons. In our work, we develop an efficient data model and parallel database approach to model, normalize, manage and query large volumes of analytical image result data. 
Our experiments demonstrate that the data partitioning strategy and the grid-based indexing result in good data distribution across database nodes and reduce I/O overhead in spatial join queries through parallel retrieval of relevant data and quick subsetting of datasets. The set of tools in the framework provides a full pipeline to normalize, load, manage and query analytical results for algorithm evaluation. PMID:23599905
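The result-comparison step described above rests on spatial overlap computations between algorithm markups and human annotations. As a minimal, self-contained sketch of that idea (the polygons, the shapely library, and the Jaccard metric are illustrative assumptions; in the platform itself this computation is expressed as spatial SQL inside the parallel database rather than in Python):

```python
from shapely.geometry import Polygon

# Hypothetical nucleus boundaries: one segmented by an algorithm, one drawn by a human.
algo_markup = Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])
human_markup = Polygon([(2, 2), (12, 2), (12, 12), (2, 12)])

# Overlap statistics of the kind a validation query would aggregate over all markups.
intersection = algo_markup.intersection(human_markup).area
union = algo_markup.union(human_markup).area
print(f"intersection={intersection}, union={union}, Jaccard={intersection / union:.3f}")
```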
NASA Astrophysics Data System (ADS)
Dziedzic, Adam; Mulawka, Jan
2014-11-01
NoSQL is a new approach to data storage and manipulation. The aim of this paper is to gain more insight into NoSQL databases, as we are still in the early stages of understanding when and how to use them appropriately. In this submission, descriptions of selected NoSQL databases are presented. Each database is analysed with a primary focus on its data model, data access, architecture and practical usage in real applications. Furthermore, the NoSQL databases are compared with respect to how they handle data references: relational databases offer foreign keys, whereas NoSQL databases provide only limited reference support. An intermediate model between graph theory and relational algebra that could address this problem should be created. Finally, a proposal for a new approach to the problem of inconsistent references in Big Data storage systems is introduced.
Wang, Julia; Al-Ouran, Rami; Hu, Yanhui; Kim, Seon-Young; Wan, Ying-Wooi; Wangler, Michael F; Yamamoto, Shinya; Chao, Hsiao-Tuan; Comjean, Aram; Mohr, Stephanie E; Perrimon, Norbert; Liu, Zhandong; Bellen, Hugo J
2017-06-01
One major challenge encountered with interpreting human genetic variants is the limited understanding of the functional impact of genetic alterations on biological processes. Furthermore, there remains an unmet demand for an efficient survey of the wealth of information on human homologs in model organisms across numerous databases. To efficiently assess the large volume of publicly available information, it is important to provide a concise summary of the most relevant information in a rapid user-friendly format. To this end, we created MARRVEL (model organism aggregated resources for rare variant exploration). MARRVEL is a publicly available website that integrates information from six human genetic databases and seven model organism databases. For any given variant or gene, MARRVEL displays information from OMIM, ExAC, ClinVar, Geno2MP, DGV, and DECIPHER. Importantly, it curates model organism-specific databases to concurrently display a concise summary regarding the human gene homologs in budding and fission yeast, worm, fly, fish, mouse, and rat on a single webpage. Experiment-based information on tissue expression, protein subcellular localization, biological process, and molecular function for the human gene and homologs in the seven model organisms is arranged into a concise output. Hence, rather than visiting multiple separate databases for variant and gene analysis, users can obtain important information by searching once through MARRVEL. Altogether, MARRVEL dramatically improves the efficiency and accessibility of data collection and facilitates analysis of human genes and variants by cross-disciplinary integration of 18 million records available in public databases to facilitate clinical diagnosis and basic research. Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
Domain fusion analysis by applying relational algebra to protein sequence and domain databases.
Truong, Kevin; Ikura, Mitsuhiko
2003-05-06
Domain fusion analysis is a useful method to predict functionally linked proteins that may be involved in direct protein-protein interactions or in the same metabolic or signaling pathway. As separate domain databases like BLOCKS, PROSITE, Pfam, SMART, PRINTS-S, ProDom, TIGRFAMs, and amalgamated domain databases like InterPro continue to grow in size and quality, a computational method to perform domain fusion analysis that leverages these efforts will become increasingly powerful. This paper proposes a computational method employing relational algebra to find domain fusions in protein sequence databases. The feasibility of this method was illustrated on the SWISS-PROT+TrEMBL sequence database using domain predictions from the Pfam HMM (hidden Markov model) database. We identified 235 and 189 putative functionally linked protein partners in H. sapiens and S. cerevisiae, respectively. From scientific literature, we were able to confirm many of these functional linkages, while the remainder offer testable experimental hypotheses. Results can be viewed at http://calcium.uhnres.utoronto.ca/pi. As the analysis can be computed quickly on any relational database that supports standard SQL (structured query language), it can be dynamically updated along with the sequence and domain databases, thereby improving the quality of predictions over time.
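The "Rosetta Stone" logic behind domain fusion analysis maps naturally onto a relational self-join: if one protein fuses two domains, separate proteins carrying those domains in another organism are predicted to be functionally linked. The toy schema and data below are invented for illustration and do not reproduce the authors' SWISS-PROT/Pfam queries; the sketch only shows the shape of the SQL involved.

```python
import sqlite3

# Hypothetical toy data: (protein, organism, domain) assignments, e.g. from Pfam.
rows = [
    ("fusAB", "E. coli",       "DomA"),
    ("fusAB", "E. coli",       "DomB"),
    ("protA", "S. cerevisiae", "DomA"),
    ("protB", "S. cerevisiae", "DomB"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE dom (protein TEXT, organism TEXT, domain TEXT)")
con.executemany("INSERT INTO dom VALUES (?, ?, ?)", rows)

# If one protein fuses domains d1 and d2, separate proteins carrying d1 and d2
# in another organism are predicted to be functionally linked.
query = """
SELECT DISTINCT p1.protein AS partner1, p2.protein AS partner2, f1.protein AS fusion
FROM dom f1 JOIN dom f2
     ON f1.protein = f2.protein AND f1.organism = f2.organism
        AND f1.domain < f2.domain              -- the fused (composite) protein
JOIN dom p1 ON p1.domain = f1.domain
JOIN dom p2 ON p2.domain = f2.domain
WHERE p1.organism = p2.organism
  AND p1.organism <> f1.organism
  AND p1.protein <> p2.protein
"""
for partner1, partner2, fusion in con.execute(query):
    print(f"{partner1} <-> {partner2} (evidence: fusion protein {fusion})")
```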
Mathematical models for exploring different aspects of genotoxicity and carcinogenicity databases.
Benigni, R; Giuliani, A
1991-12-01
One great obstacle to understanding and using the information contained in the genotoxicity and carcinogenicity databases is the very size of such databases. Their vastness makes them difficult to read; this leads to inadequate exploitation of the information, which becomes costly in terms of time, labor, and money. In its search for adequate approaches to the problem, the scientific community has, curiously, almost entirely neglected an existent series of very powerful methods of data analysis: the multivariate data analysis techniques. These methods were specifically designed for exploring large data sets. This paper presents the multivariate techniques and reports a number of applications to genotoxicity problems. These studies show how biology and mathematical modeling can be combined and how successful this combination is.
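Principal component analysis is a representative example of the multivariate techniques the authors advocate. The chemical-by-assay matrix below is invented purely for illustration; the point is that the projection and the variance decomposition come directly from a centered data matrix:

```python
import numpy as np

# Hypothetical matrix: rows = chemicals, columns = short-term genotoxicity assays
# (values are illustrative activity scores, not real data).
X = np.array([
    [1.0, 0.8, 0.9, 0.1],
    [0.9, 0.7, 1.0, 0.2],
    [0.1, 0.2, 0.0, 0.9],
    [0.2, 0.1, 0.1, 1.0],
    [0.5, 0.6, 0.4, 0.5],
])

Xc = X - X.mean(axis=0)                  # center each assay
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * s                           # chemicals projected onto principal components
explained = s**2 / np.sum(s**2)

print("variance explained per component:", np.round(explained, 3))
print("first two PC scores per chemical:\n", np.round(scores[:, :2], 3))
```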
Database Entity Persistence with Hibernate for the Network Connectivity Analysis Model
2014-04-01
...time savings in the Java coding development process. Appendices A and B describe the setup procedures for installing the MySQL database... development environment is required: • the open-source MySQL Database Management System (DBMS) from Oracle, which is a Java Database Connectivity (JDBC)-compliant DBMS; • the MySQL JDBC Driver library that comes as a plug-in with the NetBeans distribution; • the latest Java Development Kit with the latest...
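The report's persistence layer is Hibernate over MySQL in Java; as a language-neutral sketch of the same object-relational mapping idea (an entity class mapped to a table, with session-managed persistence), here is an analogous snippet using SQLAlchemy with an in-memory SQLite database. The entity name and columns are hypothetical, and this is not the report's code.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class NetworkNode(Base):
    """Hypothetical entity standing in for a connectivity-model record."""
    __tablename__ = "network_node"
    id = Column(Integer, primary_key=True)
    name = Column(String(64), nullable=False)

engine = create_engine("sqlite:///:memory:")   # MySQL/JDBC in the report; SQLite here for a self-contained demo
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(NetworkNode(name="router-1"))
    session.commit()
    print(session.query(NetworkNode).count(), "node(s) persisted")
```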
CyanOmics: an integrated database of omics for the model cyanobacterium Synechococcus sp. PCC 7002.
Yang, Yaohua; Feng, Jie; Li, Tao; Ge, Feng; Zhao, Jindong
2015-01-01
Cyanobacteria are an important group of organisms that carry out oxygenic photosynthesis and play vital roles in both the carbon and nitrogen cycles of the Earth. The annotated genome of Synechococcus sp. PCC 7002, as an ideal model cyanobacterium, is available. A series of transcriptomic and proteomic studies of Synechococcus sp. PCC 7002 cells grown under different conditions have been reported. However, no database of such integrated omics studies has been constructed. Here we present CyanOmics, a database based on the results of Synechococcus sp. PCC 7002 omics studies. CyanOmics comprises one genomic dataset, 29 transcriptomic datasets and one proteomic dataset and should prove useful for systematic and comprehensive analysis of all those data. Powerful browsing and searching tools are integrated to help users directly access information of interest with enhanced visualization of the analytical results. Furthermore, Blast is included for sequence-based similarity searching and Cluster 3.0, as well as the R hclust function is provided for cluster analyses, to increase CyanOmics's usefulness. To the best of our knowledge, it is the first integrated omics analysis database for cyanobacteria. This database should further understanding of the transcriptional patterns, and proteomic profiling of Synechococcus sp. PCC 7002 and other cyanobacteria. Additionally, the entire database framework is applicable to any sequenced prokaryotic genome and could be applied to other integrated omics analysis projects. Database URL: http://lag.ihb.ac.cn/cyanomics. © The Author(s) 2015. Published by Oxford University Press.
WholeCellSimDB: a hybrid relational/HDF database for whole-cell model predictions
Karr, Jonathan R.; Phillips, Nolan C.; Covert, Markus W.
2014-01-01
Mechanistic ‘whole-cell’ models are needed to develop a complete understanding of cell physiology. However, extracting biological insights from whole-cell models requires running and analyzing large numbers of simulations. We developed WholeCellSimDB, a database for organizing whole-cell simulations. WholeCellSimDB was designed to enable researchers to search simulation metadata to identify simulations for further analysis, and quickly slice and aggregate simulation results data. In addition, WholeCellSimDB enables users to share simulations with the broader research community. The database uses a hybrid relational/hierarchical data format architecture to efficiently store and retrieve both simulation setup metadata and results data. WholeCellSimDB provides a graphical Web-based interface to search, browse, plot and export simulations; a JavaScript Object Notation (JSON) Web service to retrieve data for Web-based visualizations; a command-line interface to deposit simulations; and a Python API to retrieve data for advanced analysis. Overall, we believe WholeCellSimDB will help researchers use whole-cell models to advance basic biological science and bioengineering. Database URL: http://www.wholecellsimdb.org Source code repository URL: http://github.com/CovertLab/WholeCellSimDB PMID:25231498
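The hybrid relational/HDF idea, searchable run metadata in a relational store with bulk timeseries in a hierarchical file format, can be sketched in a few lines. The schema, file names and the fake "mass" variable below are assumptions for illustration only; WholeCellSimDB's actual implementation and API differ.

```python
import sqlite3
import numpy as np
import h5py

# Hypothetical hybrid layout: relational metadata for search, HDF5 for bulk results.
meta = sqlite3.connect("simulations.db")
meta.execute("CREATE TABLE IF NOT EXISTS sim (id INTEGER PRIMARY KEY, label TEXT, length_s REAL)")

def deposit(sim_id, label, growth_rate):
    """Store metadata in SQLite and an illustrative per-second timeseries in HDF5."""
    t = np.arange(0, 3600.0)                      # one hour of simulated time
    mass = np.exp(growth_rate * t)                # illustrative state variable
    with h5py.File(f"sim_{sim_id}.h5", "w") as f:
        f.create_dataset("time", data=t)
        f.create_dataset("mass", data=mass, compression="gzip")
    meta.execute("INSERT OR REPLACE INTO sim VALUES (?, ?, ?)", (sim_id, label, t[-1]))
    meta.commit()

deposit(1, "wild-type", 2e-4)
# A metadata query selects runs; only the matching HDF5 files are then opened and sliced.
for sim_id, label, _ in meta.execute("SELECT * FROM sim WHERE length_s >= 3599"):
    with h5py.File(f"sim_{sim_id}.h5", "r") as f:
        print(label, "final mass:", float(f["mass"][-1]))
```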
Detailed Uncertainty Analysis for Ares I Ascent Aerodynamics Wind Tunnel Database
NASA Technical Reports Server (NTRS)
Hemsch, Michael J.; Hanke, Jeremy L.; Walker, Eric L.; Houlden, Heather P.
2008-01-01
A detailed uncertainty analysis for the Ares I ascent aero 6-DOF wind tunnel database is described. While the database itself is determined using only the test results for the latest configuration, the data used for the uncertainty analysis comes from four tests on two different configurations at the Boeing Polysonic Wind Tunnel in St. Louis and the Unitary Plan Wind Tunnel at NASA Langley Research Center. Four major error sources are considered: (1) systematic errors from the balance calibration curve fits and model + balance installation, (2) run-to-run repeatability, (3) boundary-layer transition fixing, and (4) tunnel-to-tunnel reproducibility.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kruger, Albert A.; Muller, I.; Gilbo, K.
2013-11-13
The objectives of this work are aimed at the development of enhanced LAW property-composition models that expand the composition region covered by the models. The models of interest include PCT, VHT, viscosity and electrical conductivity. This is planned as a multi-year effort that will be performed in phases, with the objectives listed below for the current phase. Incorporate property-composition data from the new glasses into the database. Assess the database and identify composition spaces in the database that need augmentation. Develop statistically-designed composition matrices to cover the composition regions identified in the above analysis. Prepare crucible melts of glass compositions from the statistically-designed composition matrix and measure the properties of interest. Incorporate the above property-composition data into the database. Assess existing models against the complete dataset and, as necessary, start development of new models.
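A minimal sketch of a first-order property-composition fit is shown below, assuming an ordinary least-squares mixture model of the property as a weighted sum of component mass fractions; the compositions, property values and model form are invented for illustration and are not the report's data or models.

```python
import numpy as np

# Hypothetical data: rows are crucible-melt glasses, columns are component mass
# fractions (e.g. SiO2, B2O3, Na2O); y is a measured property such as log viscosity.
X = np.array([
    [0.45, 0.10, 0.20],
    [0.50, 0.08, 0.18],
    [0.40, 0.12, 0.25],
    [0.48, 0.09, 0.22],
    [0.42, 0.11, 0.19],
])
y = np.array([2.1, 2.4, 1.8, 2.2, 2.0])

# First-order mixture model: property ~ sum_i b_i * x_i (no intercept), a common
# starting form for glass property-composition fits.
coef, residuals, *_ = np.linalg.lstsq(X, y, rcond=None)
print("component coefficients:", np.round(coef, 3))
print("predicted property for the first composition:", round(float(X[0] @ coef), 3))
```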
Expert system development for commonality analysis in space programs
NASA Technical Reports Server (NTRS)
Yeager, Dorian P.
1987-01-01
This report is a combination of foundational mathematics and software design. A mathematical model of the Commonality Analysis problem was developed and some important properties discovered. The complexity of the problem is described herein and techniques, both deterministic and heuristic, for reducing that complexity are presented. Weaknesses are pointed out in the existing software (System Commonality Analysis Tool) and several improvements are recommended. It is recommended that: (1) an expert system for guiding the design of new databases be developed; (2) a distributed knowledge base be created and maintained for the purpose of encoding the commonality relationships between design items in commonality databases; (3) a software module be produced which automatically generates commonality alternative sets from commonality databases using the knowledge associated with those databases; and (4) a more complete commonality analysis module be written which is capable of generating any type of feasible solution.
Comparative and Predictive Multimedia Assessments Using Monte Carlo Uncertainty Analyses
NASA Astrophysics Data System (ADS)
Whelan, G.
2002-05-01
Multiple-pathway frameworks (sometimes referred to as multimedia models) provide a platform for combining medium-specific environmental models and databases, such that they can be utilized in a more holistic assessment of contaminant fate and transport in the environment. These frameworks provide a relatively seamless transfer of information from one model to the next and from databases to models. Within these frameworks, multiple models are linked, resulting in models that consume information from upstream models and produce information to be consumed by downstream models. The Framework for Risk Analysis in Multimedia Environmental Systems (FRAMES) is an example, which allows users to link their models to other models and databases. FRAMES is an icon-driven, site-layout platform that is an open-architecture, object-oriented system that interacts with environmental databases; helps the user construct a Conceptual Site Model that is real-world based; allows the user to choose the most appropriate models to solve simulation requirements; solves the standard risk paradigm of release, transport and fate, and exposure/risk assessment for people and ecology; and presents graphical packages for analyzing results. FRAMES is specifically designed to allow users to link their own models into a system which contains models developed by others. This paper will present the use of FRAMES to evaluate potential human health exposures using real site data and realistic assumptions from sources, through the vadose and saturated zones, to exposure and risk assessment at three real-world sites, using the Multimedia Environmental Pollutant Assessment System (MEPAS), which is a multimedia model contained within FRAMES. These real-world examples use predictive and comparative approaches coupled with a Monte Carlo analysis. In a predictive analysis, models are calibrated to monitored site data prior to the assessment; in a comparative analysis, models are not calibrated but are based solely on literature or judgement, and the analysis is usually used to compare alternatives. In many cases, a combination is employed where the model is calibrated to a portion of the data (e.g., to determine hydrodynamics), then used to compare alternatives. Three subsurface-based multimedia examples are presented, increasing in complexity. The first presents the application of a predictive, deterministic assessment; the second presents a predictive and comparative, Monte Carlo analysis; and the third presents a comparative, multi-dimensional Monte Carlo analysis. Endpoints are typically presented in terms of concentration, hazard, risk, and dose, and because the vadose zone model typically represents a connection between a source and the aquifer, it does not generally represent the final medium in a multimedia risk assessment.
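As a sketch of the comparative Monte Carlo idea (not the MEPAS models themselves), the snippet below propagates uncertain parameters through a deliberately simple screening-level concentration expression and compares two hypothetical alternatives; the distributions, parameter names and model form are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Hypothetical screening-level transport model: steady-state concentration at a well,
# C = M / (Q * R), with uncertain source mass flux M, flow Q, and retardation R.
def concentration(mass_flux, flow, retardation):
    return mass_flux / (flow * retardation)

# Comparative Monte Carlo: two remediation alternatives differ only in the source term.
flow = rng.lognormal(mean=np.log(50.0), sigma=0.3, size=n)     # m3/day
retard = rng.uniform(1.5, 4.0, size=n)
alt_a = concentration(rng.lognormal(np.log(10.0), 0.5, n), flow, retard)
alt_b = concentration(rng.lognormal(np.log(4.0), 0.5, n), flow, retard)

for name, c in (("alternative A", alt_a), ("alternative B", alt_b)):
    print(f"{name}: median={np.median(c):.3f}, 95th percentile={np.percentile(c, 95):.3f} mg/L")
```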
Scholl, Joep H G; van Hunsel, Florence P A M; Hak, Eelko; van Puijenbroek, Eugène P
2018-02-01
The statistical screening of pharmacovigilance databases containing spontaneously reported adverse drug reactions (ADRs) is mainly based on disproportionality analysis. The aim of this study was to improve the efficiency of full database screening using a prediction model-based approach. A logistic regression-based prediction model containing 5 candidate predictors was developed and internally validated using the Summary of Product Characteristics as the gold standard for the outcome. All drug-ADR associations, with the exception of those related to vaccines, with a minimum of 3 reports formed the training data for the model. Performance was based on the area under the receiver operating characteristic curve (AUC). Results were compared with the current method of database screening based on the number of previously analyzed associations. A total of 25 026 unique drug-ADR associations formed the training data for the model. The final model contained all 5 candidate predictors (number of reports, disproportionality, reports from healthcare professionals, reports from marketing authorization holders, Naranjo score). The AUC for the full model was 0.740 (95% CI; 0.734-0.747). The internal validity was good based on the calibration curve and bootstrapping analysis (AUC after bootstrapping = 0.739). Compared with the old method, the AUC increased from 0.649 to 0.740, and the proportion of potential signals increased by approximately 50% (from 12.3% to 19.4%). A prediction model-based approach can be a useful tool to create priority-based listings for signal detection in databases consisting of spontaneous ADRs. © 2017 The Authors. Pharmacoepidemiology & Drug Safety Published by John Wiley & Sons Ltd.
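A minimal sketch of the modelling step, a logistic regression over the five candidate predictors with an apparent AUC, is shown below using scikit-learn. The synthetic features and outcome are invented so the snippet runs standalone; they do not reproduce the study's data, validation scheme or results.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 2000

# Hypothetical drug-ADR association features mirroring the five candidate predictors:
# number of reports, disproportionality measure, fraction of reports from healthcare
# professionals, fraction from marketing authorization holders, Naranjo score.
X = np.column_stack([
    rng.poisson(8, n),
    rng.normal(1.0, 0.8, n),
    rng.uniform(0, 1, n),
    rng.uniform(0, 1, n),
    rng.integers(0, 10, n),
])
# Synthetic "labelled in the SmPC" outcome, for illustration only.
logit = 0.15 * X[:, 0] + 0.9 * X[:, 1] + 0.3 * X[:, 4] - 3.5
y = rng.uniform(size=n) < 1 / (1 + np.exp(-logit))

model = LogisticRegression(max_iter=1000).fit(X, y)
print("apparent AUC:", round(roc_auc_score(y, model.predict_proba(X)[:, 1]), 3))
```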
A GIS-Enabled, Michigan-Specific, Hierarchical Groundwater Modeling and Visualization System
NASA Astrophysics Data System (ADS)
Liu, Q.; Li, S.; Mandle, R.; Simard, A.; Fisher, B.; Brown, E.; Ross, S.
2005-12-01
Efficient management of groundwater resources relies on a comprehensive database that represents the characteristics of the natural groundwater system as well as analysis and modeling tools to describe the impacts of decision alternatives. Many agencies in Michigan have spent several years compiling expensive and comprehensive surface water and groundwater inventories and other related spatial data that describe their respective areas of responsibility. However, most often this wealth of descriptive data has only been utilized for basic mapping purposes. The benefits from analyzing these data, using GIS analysis functions or externally developed analysis models or programs, have yet to be systematically realized. In this talk, we present a comprehensive software environment that allows Michigan groundwater resources managers and frontline professionals to make more effective use of the available data and improve their ability to manage and protect groundwater resources, address potential conflicts, design cleanup schemes, and prioritize investigation activities. In particular, we take advantage of the Interactive Ground Water (IGW) modeling system and convert it to a customized software environment specifically for analyzing, modeling, and visualizing the Michigan statewide groundwater database. The resulting Michigan IGW modeling system (IGW-M) is completely window-based, fully interactive, and seamlessly integrated with a GIS mapping engine. The system operates in real-time (on the fly) providing dynamic, hierarchical mapping, modeling, spatial analysis, and visualization. Specifically, IGW-M allows water resources and environmental professionals in Michigan to: * Access and utilize the extensive data from the statewide groundwater database, interactively manipulate GIS objects, and display and query the associated data and attributes; * Analyze and model the statewide groundwater database, interactively convert GIS objects into numerical model features, automatically extract data and attributes, and simulate unsteady groundwater flow and contaminant transport in response to water and land management decisions; * Visualize and map model simulations and predictions with data from the statewide groundwater database in a seamless interactive environment. IGW-M has the potential to significantly improve the productivity of Michigan groundwater management investigations. It changes the role of engineers and scientists in modeling and analyzing the statewide groundwater database from heavily physical to cognitive problem-solving and decision-making tasks. The seamless real-time integration, real-time visual interaction, and real-time processing capability allow a user to focus on critical management issues, conflicts, and constraints, to quickly and iteratively examine conceptual approximations, management and planning scenarios, and site characterization assumptions, to identify dominant processes, to evaluate data worth and sensitivity, and to guide further data-collection activities. We illustrate the power and effectiveness of the IGW-M modeling and visualization system with a real case study and a real-time, live demonstration.
Plant Reactome: a resource for plant pathways and comparative analysis.
Naithani, Sushma; Preece, Justin; D'Eustachio, Peter; Gupta, Parul; Amarasinghe, Vindhya; Dharmawardhana, Palitha D; Wu, Guanming; Fabregat, Antonio; Elser, Justin L; Weiser, Joel; Keays, Maria; Fuentes, Alfonso Munoz-Pomer; Petryszak, Robert; Stein, Lincoln D; Ware, Doreen; Jaiswal, Pankaj
2017-01-04
Plant Reactome (http://plantreactome.gramene.org/) is a free, open-source, curated plant pathway database portal, provided as part of the Gramene project. The database provides intuitive bioinformatics tools for the visualization, analysis and interpretation of pathway knowledge to support genome annotation, genome analysis, modeling, systems biology, basic research and education. Plant Reactome employs the structural framework of a plant cell to show metabolic, transport, genetic, developmental and signaling pathways. We manually curate molecular details of pathways in these domains for reference species Oryza sativa (rice) supported by published literature and annotation of well-characterized genes. Two hundred twenty-two rice pathways, 1025 reactions associated with 1173 proteins, 907 small molecules and 256 literature references have been curated to date. These reference annotations were used to project pathways for 62 model, crop and evolutionarily significant plant species based on gene homology. Database users can search and browse various components of the database, visualize curated baseline expression of pathway-associated genes provided by the Expression Atlas and upload and analyze their Omics datasets. The database also offers data access via Application Programming Interfaces (APIs) and in various standardized pathway formats, such as SBML and BioPAX. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
NoSQL Based 3D City Model Management System
NASA Astrophysics Data System (ADS)
Mao, B.; Harrie, L.; Cao, J.; Wu, Z.; Shen, J.
2014-04-01
To manage increasingly complicated 3D city models, a framework based on NoSQL database is proposed in this paper. The framework supports import and export of 3D city model according to international standards such as CityGML, KML/COLLADA and X3D. We also suggest and implement 3D model analysis and visualization in the framework. For city model analysis, 3D geometry data and semantic information (such as name, height, area, price and so on) are stored and processed separately. We use a Map-Reduce method to deal with the 3D geometry data since it is more complex, while the semantic analysis is mainly based on database query operation. For visualization, a multiple 3D city representation structure CityTree is implemented within the framework to support dynamic LODs based on user viewpoint. Also, the proposed framework is easily extensible and supports geoindexes to speed up the querying. Our experimental results show that the proposed 3D city management system can efficiently fulfil the analysis and visualization requirements.
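The Map-Reduce treatment of geometry data can be sketched with plain Python: map each geometric primitive to a key/value pair, group by key, and reduce each group to an aggregate. The records below (building IDs and facade areas) are hypothetical, and the sketch obviously omits the distributed execution a real NoSQL back end provides.

```python
from functools import reduce
from collections import defaultdict

# Hypothetical simplified records: (building_id, polygon_area_m2) extracted from a
# 3D city model; in the framework the geometry lives in the NoSQL store while
# semantic attributes are handled by separate query operations.
geometry = [("b1", 120.0), ("b1", 80.0), ("b2", 200.0), ("b2", 40.0), ("b3", 310.0)]

# Map phase: emit (key, value) pairs per geometric primitive.
mapped = map(lambda rec: (rec[0], rec[1]), geometry)

# Shuffle: group values by building key.
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce phase: aggregate each group, here a per-building surface-area total.
totals = {k: reduce(lambda a, b: a + b, v) for k, v in groups.items()}
print(totals)   # {'b1': 200.0, 'b2': 240.0, 'b3': 310.0}
```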
Aerodynamic Optimization of Rocket Control Surface Geometry Using Cartesian Methods and CAD Geometry
NASA Technical Reports Server (NTRS)
Nelson, Andrea; Aftosmis, Michael J.; Nemec, Marian; Pulliam, Thomas H.
2004-01-01
Aerodynamic design is an iterative process involving geometry manipulation and complex computational analysis subject to physical constraints and aerodynamic objectives. A design cycle consists of first establishing the performance of a baseline design, which is usually created with low-fidelity engineering tools, and then progressively optimizing the design to maximize its performance. Optimization techniques have evolved from relying exclusively on designer intuition and insight in traditional trial and error methods, to sophisticated local and global search methods. Recent attempts at automating the search through a large design space with formal optimization methods include both database-driven and direct evaluation schemes. Databases are being used in conjunction with surrogate and neural network models as a basis on which to run optimization algorithms. Optimization algorithms are also being driven by the direct evaluation of objectives and constraints using high-fidelity simulations. Surrogate methods use data points obtained from simulations, and possibly gradients evaluated at the data points, to create mathematical approximations of a database. Neural network models work in a similar fashion, using a number of high-fidelity database calculations as training iterations to create a database model. Optimal designs are obtained by coupling an optimization algorithm to the database model. Evaluation of the current best design then yields a new local optimum and/or increases the fidelity of the approximation model for the next iteration. Surrogate methods have also been developed that iterate on the selection of data points to decrease the uncertainty of the approximation model prior to searching for an optimal design. The database approximation models for each of these cases, however, become computationally expensive as dimensionality increases. Thus the method of using optimization algorithms to search a database model becomes problematic as the number of design variables is increased.
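The database-driven surrogate loop described above can be sketched in a few lines: fit a cheap approximation to the stored high-fidelity evaluations, optimize the approximation, then (in practice) verify the candidate with the real simulation and refit. The one-dimensional design variable, the sample data and the quadratic surrogate below are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical "database": a handful of high-fidelity evaluations of an objective
# (e.g. a drag coefficient) at sampled values of a single design variable.
x_db = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
f_db = np.array([1.00, 0.62, 0.55, 0.70, 1.10])

# Surrogate: a quadratic response surface fitted to the database points.
coeffs = np.polyfit(x_db, f_db, deg=2)
surrogate = np.poly1d(coeffs)

# Optimize the cheap surrogate instead of the expensive simulation.
result = minimize_scalar(surrogate, bounds=(0.0, 1.0), method="bounded")
print("surrogate optimum at x =", round(float(result.x), 3),
      "predicted f =", round(float(result.fun), 3))
# In practice the true simulation is re-run at result.x, the new point is added to
# the database, and the surrogate is refitted until the design converges.
```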
Veneman, Jolien B; Saetnan, Eli R; Clare, Amanda J; Newbold, Charles J
2016-12-01
The body of peer-reviewed papers on enteric methane mitigation strategies in ruminants is rapidly growing and allows for better estimation of the true effect of each strategy through the use of meta-analysis methods. Here we present the development of an online database of measured methane mitigation strategies called MitiGate, currently comprising 412 papers. The database is accessible through an online user-friendly interface that allows, on the one hand, data extraction at various levels of aggregation and, on the other, data upload for submission to the database, allowing for future refinement and updates of mitigation estimates as well as providing easy access to relevant data for integration into modelling efforts or policy recommendations. To demonstrate and verify the usefulness of the MitiGate database, those studies where methane emissions were expressed per unit of intake (293 papers resulting in 845 treatment comparisons) were used in a meta-analysis. The meta-analysis of the current database estimated the effect size of each of the mitigation strategies as well as the associated variance and a measure of heterogeneity. Currently, under-representation of certain strategies, geographic regions and long-term studies is the main limitation in providing an accurate quantitative estimation of the mitigation potential of each strategy under varying animal production systems. We have thus implemented the facility for researchers to upload meta-data of their peer-reviewed research through a simple input form in the hope that MitiGate will grow into a fully inclusive resource for those wishing to model methane mitigation strategies in ruminants. Copyright © 2016 Elsevier B.V. All rights reserved.
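A minimal sketch of the pooling step in such a meta-analysis is shown below using fixed-effect inverse-variance weighting and Cochran's Q; the effect sizes and variances are invented, and the MitiGate analysis itself may well use a random-effects estimator and additional moderators.

```python
import numpy as np

# Hypothetical treatment comparisons: each row is (effect size, variance) for the
# change in methane per unit of intake reported by one study.
effects = np.array([-0.12, -0.05, -0.20, -0.09, -0.15])
variances = np.array([0.004, 0.010, 0.006, 0.008, 0.005])

# Fixed-effect (inverse-variance) pooling, the simplest meta-analytic estimator.
w = 1.0 / variances
pooled = np.sum(w * effects) / np.sum(w)
se = np.sqrt(1.0 / np.sum(w))
print(f"pooled effect: {pooled:.3f}  (95% CI {pooled - 1.96*se:.3f} to {pooled + 1.96*se:.3f})")

# Cochran's Q, a standard measure of heterogeneity across studies.
q = np.sum(w * (effects - pooled) ** 2)
print("heterogeneity Q:", round(float(q), 2))
```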
Evaluating the Cassandra NoSQL Database Approach for Genomic Data Persistency
Aniceto, Rodrigo; Xavier, Rene; Guimarães, Valeria; Hondo, Fernanda; Holanda, Maristela; Walter, Maria Emilia; Lifschitz, Sérgio
2015-01-01
Rapid advances in high-throughput sequencing techniques have created interesting computational challenges in bioinformatics. One of them refers to management of massive amounts of data generated by automatic sequencers. We need to deal with the persistency of genomic data, particularly storing and analyzing these large-scale processed data. To find an alternative to the frequently considered relational database model becomes a compelling task. Other data models may be more effective when dealing with a very large amount of nonconventional data, especially for writing and retrieving operations. In this paper, we discuss the Cassandra NoSQL database approach for storing genomic data. We perform an analysis of persistency and I/O operations with real data, using the Cassandra database system. We also compare the results obtained with a classical relational database system and another NoSQL database approach, MongoDB. PMID:26558254
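A minimal sketch of the Cassandra write/read path with the DataStax Python driver is shown below, assuming a single local Cassandra node is running; the keyspace, table layout and record fields are hypothetical and are not the schema used in the paper.

```python
from cassandra.cluster import Cluster  # DataStax Python driver; assumes a local Cassandra node

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

# Hypothetical write-heavy schema for processed reads, partitioned by sample.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS genomics
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS genomics.reads (
        sample_id text, read_id text, chrom text, pos bigint, sequence text,
        PRIMARY KEY (sample_id, read_id)
    )
""")
session.execute(
    "INSERT INTO genomics.reads (sample_id, read_id, chrom, pos, sequence) VALUES (%s, %s, %s, %s, %s)",
    ("S1", "r0001", "chr1", 10432, "ACGTACGT"),
)
row = session.execute("SELECT sequence FROM genomics.reads WHERE sample_id = %s", ("S1",)).one()
print(row.sequence)
```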
GIS Toolsets for Planetary Geomorphology and Landing-Site Analysis
NASA Astrophysics Data System (ADS)
Nass, Andrea; van Gasselt, Stephan
2015-04-01
Modern Geographic Information Systems (GIS) allow expert and lay users alike to load and position geographic data and perform simple to highly complex surface analyses. For many applications dedicated and ready-to-use GIS tools are available in standard software systems, while other applications require the modular combination of available basic tools to answer more specific questions. This also applies to analyses in modern planetary geomorphology, where many such (basic) tools can be used to build complex analysis tools, e.g. in image- and terrain-model analysis. Apart from the simple application of sets of different tools, many complex tasks require a more sophisticated design for storing and accessing data using databases (e.g. ArcHydro for hydrological data analysis). In planetary sciences, complex database-driven models are often required to efficiently analyse potential landing sites or store rover data, but geologic mapping data can also be efficiently stored and accessed using database models rather than stand-alone shapefiles. For landing-site analyses, relief and surface roughness estimates are two common concepts that are of particular interest and, for both, a number of different definitions co-exist. We here present an advanced toolset for the analysis of image and terrain-model data with an emphasis on the extraction of landing-site characteristics using established criteria. We provide working examples and particularly focus on the concept of terrain roughness as it is interpreted in geomorphology and engineering studies.
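One of the co-existing roughness definitions mentioned above, the standard deviation of elevation in a moving window over a digital terrain model, can be sketched as follows; the synthetic terrain and window size are assumptions, and the toolset described in the abstract implements this and other definitions inside a GIS rather than in standalone Python:

```python
import numpy as np

# Hypothetical digital terrain model (heights in metres) on a regular grid.
rng = np.random.default_rng(0)
dtm = (np.cumsum(rng.normal(0, 0.2, (64, 64)), axis=0)
       + np.cumsum(rng.normal(0, 0.2, (64, 64)), axis=1))

def window_roughness(z, half=2):
    """One common roughness definition: standard deviation of elevation in a
    (2*half+1)^2 moving window; other definitions (slope variability, RMS
    deviation from a fitted plane) are computed analogously."""
    out = np.zeros_like(z)
    for i in range(z.shape[0]):
        for j in range(z.shape[1]):
            w = z[max(0, i - half):i + half + 1, max(0, j - half):j + half + 1]
            out[i, j] = w.std()
    return out

rough = window_roughness(dtm)
print("median roughness:", round(float(np.median(rough)), 3), "m")
```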
Development of an Integrated Hydrologic Modeling System for Rainfall-Runoff Simulation
NASA Astrophysics Data System (ADS)
Lu, B.; Piasecki, M.
2008-12-01
This paper aims to present the development of an integrated hydrological model which involves functionalities of digital watershed processing, online data retrieval, hydrologic simulation and post-event analysis. The proposed system is intended to work as a back end to the CUAHSI HIS cyberinfrastructure developments. As a first step in developing this system, the physics-based distributed hydrologic model PIHM (Penn State Integrated Hydrologic Model) is wrapped into OpenMI (Open Modeling Interface and Environment) so as to seamlessly interact with OpenMI-compliant meteorological models. The graphical user interface is being developed from the open GIS application MapWindow, which permits functionality expansion through the addition of plug-ins. Modules required to be set up through the GUI workboard include those for retrieving meteorological data from existing databases or meteorological prediction models, obtaining geospatial data from the output of digital watershed processing, and importing initial and boundary conditions. They are connected to the OpenMI-compliant PIHM to simulate rainfall-runoff processes, and a module is included for automatically displaying output after the simulation. Online databases are accessed through the WaterOneFlow web services, and the retrieved data are either stored in an observation database (OD) following the schema of the Observation Data Model (ODM), in the case of time-series data, or in a grid-based storage facility, which may be a format such as netCDF or a grid-based database schema. Specific development steps include the creation of a bridge to overcome interoperability issues between PIHM and the ODM, as well as the embedding of TauDEM (Terrain Analysis Using Digital Elevation Models) into the model. This module is responsible for developing the watershed and stream network using digital elevation models. Visualizing and editing geospatial data is achieved by using MapWinGIS, an ActiveX control developed by the MapWindow team. After application to a practical watershed, the performance of the model can be tested by the post-event analysis module.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kogalovskii, M.R.
This paper presents a review of problems related to statistical database systems, which are widespread in various fields of activity. Statistical databases (SDBs) are databases whose data are used for statistical analysis. Topics under consideration are: SDB peculiarities, properties of data models adequate for SDB requirements, metadata functions, null-value problems, SDB compromise protection problems, stored data compression techniques, and statistical data representation means. Also examined is whether present Database Management Systems (DBMS) satisfy the SDB requirements. Some current research directions in SDB systems are considered.
Legehar, Ashenafi; Xhaard, Henri; Ghemtio, Leo
2016-01-01
The disposition of a pharmaceutical compound within an organism, i.e. its Absorption, Distribution, Metabolism, Excretion, Toxicity (ADMET) properties and adverse effects, critically affects late stage failure of drug candidates and has led to the withdrawal of approved drugs. Computational methods are effective approaches to reduce the number of safety issues by analyzing possible links between chemical structures and ADMET or adverse effects, but this is limited by the size, quality, and heterogeneity of the data available from individual sources. Thus, large, clean and integrated databases of approved drug data, associated with fast and efficient predictive tools, are desirable early in the drug discovery process. We have built a relational database (IDAAPM) to integrate available approved drug data such as drug approval information, ADMET and adverse effects, chemical structures and molecular descriptors, targets, bioactivity and related references. The database has been coupled with a searchable web interface and modern data analytics platform (KNIME) to allow data access, data transformation, initial analysis and further predictive modeling. Data were extracted from FDA resources and supplemented from other publicly available databases. Currently, the database contains information on about 19,226 FDA approval applications for 31,815 products (small molecules and biologics) with their approval history, 2505 active ingredients, together with as many ADMET properties, 1629 molecular structures, 2.5 million adverse effects and 36,963 experimental drug-target bioactivity data. IDAAPM is a unique resource that, in a single relational database, provides detailed information on FDA approved drugs including their ADMET properties and adverse effects, the corresponding targets with bioactivity data, coupled with a data analytics platform. It can be used to perform basic to complex drug-target ADMET or adverse effects analysis and predictive modeling. IDAAPM is freely accessible at http://idaapm.helsinki.fi and can be exploited through a KNIME workflow connected to the database. Graphical abstract: FDA approved drug data integration for predictive modeling.
Aerodynamic Analyses and Database Development for Ares I Vehicle First Stage Separation
NASA Technical Reports Server (NTRS)
Pamadi, Bandu N.; Pei, Jing; Pinier, Jeremy T.; Klopfer, Goetz H.; Holland, Scott D.; Covell, Peter F.
2011-01-01
This paper presents the aerodynamic analysis and database development for first stage separation of Ares I A106 crew launch vehicle configuration. Separate 6-DOF databases were created for the first stage and upper stage and each database consists of three components: (a) isolated or freestream coefficients, (b) power-off proximity increments, and (c) power-on proximity increments. The isolated and power-off incremental databases were developed using data from 1% scaled model tests in AEDC VKF Tunnel A. The power-on proximity increments were developed using OVERFLOW CFD solutions. The database also includes incremental coefficients for one BDM and one USM failure scenarios.
[Establishment of the database of the 3D facial models for the plastic surgery based on network].
Liu, Zhe; Zhang, Hai-Lin; Zhang, Zheng-Guo; Qiao, Qun
2008-07-01
To collect three-dimensional (3D) facial data from 30 facial deformity patients with a 3D scanner and to establish a professional, Internet-based database. The database can be helpful for clinical intervention. The raw point data of the facial topography were collected by the 3D scanner. The 3D point cloud was then edited with reverse engineering software to reconstruct the 3D model of the face. The database system was divided into three parts: basic information, disease information and surgery information. The programming language of the web system is Java. The linkages between the tables of the database are reliable. Query operations and data mining are convenient. Users can visit the database via the Internet and use the image analysis system to observe the 3D facial models interactively. In this paper we present a database and a web system adapted to plastic surgery of the human face. It can be used both in the clinic and in basic research.
NASA Technical Reports Server (NTRS)
Pamadi, Bandu N.; Pei, Jing; Covell, Peter F.; Favaregh, Noah M.; Gumbert, Clyde R.; Hanke, Jeremy L.
2011-01-01
NASA Langley Research Center, in partnership with NASA Marshall Space Flight Center and NASA Ames Research Center, was involved in the aerodynamic analyses, testing, and database development for the Ares I A106 crew launch vehicle in support of the Ares Design and Analysis Cycle. This paper discusses the development of lift-off/transition and ascent databases. The lift-off/transition database was developed using data from tests on a 1.75% scale model of the A106 configuration in the NASA Langley 14x22 Subsonic Wind Tunnel. The power-off ascent database was developed using test data on a 1% A106 scale model from two different facilities, the Boeing Polysonic Wind Tunnel and the NASA Langley Unitary Plan Wind Tunnel. The ascent database was adjusted for differences in wind tunnel and flight Reynolds numbers using USM3D CFD code. The aerodynamic jet interaction effects due to first stage roll control system were modeled using USM3D and OVERFLOW CFD codes.
Identification and human condition analysis based on the human voice analysis
NASA Astrophysics Data System (ADS)
Mieshkov, Oleksandr Yu.; Novikov, Oleksandr O.; Novikov, Vsevolod O.; Fainzilberg, Leonid S.; Kotyra, Andrzej; Smailova, Saule; Kozbekova, Ainur; Imanbek, Baglan
2017-08-01
The paper presents a two-stage biotechnical system for human condition analysis that is based on analysis of the human voice signal. At the initial stage, the voice signal is pre-processed and its characteristics in the time domain are determined. At the first stage, the developed system is capable of identifying the person in the database on the basis of the extracted characteristics. At the second stage, a model of the human voice is built on the basis of the real voice signals after clustering the whole database.
Integrating diverse databases into an unified analysis framework: a Galaxy approach
Blankenberg, Daniel; Coraor, Nathan; Von Kuster, Gregory; Taylor, James; Nekrutenko, Anton
2011-01-01
Recent technological advances have led to the ability to generate large amounts of data for model and non-model organisms. Whereas in the past there was a relatively small number of central repositories serving genomic data, an increasing number of distinct specialized data repositories and resources have been established. Here, we describe a generic approach that provides for the integration of a diverse spectrum of data resources into a unified analysis framework, Galaxy (http://usegalaxy.org). This approach allows the simplified coupling of external data resources with the data analysis tools available to Galaxy users, while leveraging the native data mining facilities of the external data resources. Database URL: http://usegalaxy.org PMID:21531983
M4AST - A Tool for Asteroid Modelling
NASA Astrophysics Data System (ADS)
Birlan, Mirel; Popescu, Marcel; Irimiea, Lucian; Binzel, Richard
2016-10-01
M4AST (Modelling for asteroids) is an online tool devoted to the analysis and interpretation of reflection spectra of asteroids in the visible and near-infrared spectral intervals. It consists of a spectral database of individual objects and a set of routines for analysis which address scientific aspects such as taxonomy, curve matching with laboratory spectra, space weathering models, and mineralogical diagnosis. Spectral data were obtained using ground-based facilities; part of these data are compiled from the literature [1]. The database is composed of permanent and temporary files. Each permanent file contains a header and two or three columns (wavelength, spectral reflectance, and the error on spectral reflectance). Temporary files can be uploaded anonymously and are purged to protect the ownership of the submitted data. The computing routines are organized in order to accomplish several scientific objectives: visualize spectra, compute the asteroid taxonomic class, compare an asteroid spectrum with similar spectra of meteorites, and compute mineralogical parameters. A facility for using the Virtual Observatory protocols was also developed. A new version of the service was released in June 2016. This new release of M4AST contains a database and facilities to model more than 6,000 spectra of asteroids. A new web interface was designed. This development provides new functionality in a user-friendly environment. A bridge system for accessing and exploiting the SMASS-MIT database (http://smass.mit.edu) allows the treatment and analysis of these data within the M4AST environment. Reference: [1] M. Popescu, M. Birlan, and D.A. Nedelcu, "Modeling of asteroids: M4AST," Astronomy & Astrophysics 544, EDP Sciences, pp. A130, 2012.
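The curve-matching routine can be illustrated with a small chi-square comparison between an observed spectrum and laboratory spectra resampled to a common wavelength grid; the synthetic spectra, the normalization wavelength and the distance metric below are assumptions for illustration, not M4AST's exact procedure.

```python
import numpy as np

# Hypothetical reflectance spectra resampled to a common wavelength grid (microns).
wavelengths = np.linspace(0.45, 2.45, 50)
asteroid = 1.0 + 0.08 * (wavelengths - 0.45) - 0.15 * np.exp(-((wavelengths - 0.95) / 0.1) ** 2)

lab_spectra = {
    "ordinary chondrite": 1.0 + 0.07 * (wavelengths - 0.45) - 0.14 * np.exp(-((wavelengths - 0.95) / 0.1) ** 2),
    "carbonaceous chondrite": 1.0 + 0.01 * (wavelengths - 0.45),
}

def curve_match(obs, ref):
    """Chi-square distance after normalizing both curves near 0.55 micron."""
    i = np.argmin(np.abs(wavelengths - 0.55))
    obs_n, ref_n = obs / obs[i], ref / ref[i]
    return float(np.sum((obs_n - ref_n) ** 2) / obs.size)

scores = {name: curve_match(asteroid, ref) for name, ref in lab_spectra.items()}
best = min(scores, key=scores.get)
print(scores, "-> best match:", best)
```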
DOT National Transportation Integrated Search
2014-01-01
This study developed a new snow model and a database which warehouses geometric, weather and traffic data on New Jersey highways. The complexity of the model development lies in considering variable road width, different spreading/plowing pattern...
Voss, Frank D.; Mastin, Mark C.
2012-01-01
A database was developed to automate model execution and to provide users with Internet access to voluminous data products ranging from summary figures to model output timeseries. Database-enabled Internet tools were developed to allow users to create interactive graphs of output results based on their analysis needs. For example, users were able to create graphs by selecting time intervals, greenhouse gas emission scenarios, general circulation models, and specific hydrologic variables.
Validation Database Based Thermal Analysis of an Advanced RPS Concept
NASA Technical Reports Server (NTRS)
Balint, Tibor S.; Emis, Nickolas D.
2006-01-01
Advanced RPS concepts can be conceived, designed and assessed using high-end computational analysis tools. These predictions may provide an initial insight into the potential performance of these models, but verification and validation are necessary and required steps to gain confidence in the numerical analysis results. This paper discusses the findings from a numerical validation exercise for a small advanced RPS concept, based on a thermal analysis methodology developed at JPL and on a validation database obtained from experiments performed at Oregon State University. Both the numerical and experimental configurations utilized a single GPHS module enabled design, resembling a Mod-RTG concept. The analysis focused on operating and environmental conditions during the storage phase only. This validation exercise helped to refine key thermal analysis and modeling parameters, such as heat transfer coefficients, and conductivity and radiation heat transfer values. Improved understanding of the Mod-RTG concept through validation of the thermal model allows for future improvements to this power system concept.
Space Launch System Ascent Static Aerodynamic Database Development
NASA Technical Reports Server (NTRS)
Pinier, Jeremy T.; Bennett, David W.; Blevins, John A.; Erickson, Gary E.; Favaregh, Noah M.; Houlden, Heather P.; Tomek, William G.
2014-01-01
This paper describes the wind tunnel testing work and data analysis required to characterize the static aerodynamic environment of NASA's Space Launch System (SLS) during the ascent portion of flight. Scaled models of the SLS have been tested in transonic and supersonic wind tunnels to gather the high-fidelity data that are used to build aerodynamic databases. A detailed description of the wind tunnel test that was conducted to produce the latest version of the database is presented, and a representative set of aerodynamic data is shown. The wind tunnel data quality remains very high; however, some concerns with wall interference effects through transonic Mach numbers are also discussed. Post-processing and analysis of the wind tunnel dataset are crucial for the development of a formal ascent aerodynamics database.
ANALYSIS OF HUMAN ACTIVITY DATA FOR USE IN MODELING ENVIRONMENTAL EXPOSURES
Human activity data are a critical part of exposure models being developed by the US EPA's National Exposure Research Laboratory (NERL). An analysis of human activity data within NERL's Consolidated Human Activity Database (CHAD) was performed in two areas relevant to exposure ...
Detecting Spatial Patterns of Natural Hazards from the Wikipedia Knowledge Base
NASA Astrophysics Data System (ADS)
Fan, J.; Stewart, K.
2015-07-01
The Wikipedia database is a data source of immense richness and variety. Included in this database are thousands of geotagged articles, including, for example, almost real-time updates on current and historic natural hazards. This includes user-contributed information about the location of natural hazards, the extent of the disasters, and many details relating to response, impact, and recovery. In this research, a computational framework is proposed to detect spatial patterns of natural hazards from the Wikipedia database by combining topic modeling methods with spatial analysis techniques. The computation is performed on the Neon Cluster, a high-performance computing cluster at the University of Iowa. This work uses wildfires as the exemplar hazard, but this framework is easily generalizable to other types of hazards, such as hurricanes or flooding. Latent Dirichlet Allocation (LDA) modeling is first employed to train on the entire English Wikipedia dump, transforming the database dump into a 500-dimension topic model. Over 230,000 geo-tagged articles are then extracted from the Wikipedia database, spatially covering the contiguous United States. The geo-tagged articles are converted into the LDA topic space based on the topic model, with each article being represented as a weighted multidimensional topic vector. By treating each article's topic vector as an observed point in geographic space, a probability surface is calculated for each of the topics. In this work, Wikipedia articles about wildfires are extracted from the Wikipedia database, forming a wildfire corpus and creating a basis for the topic vector analysis. The spatial distribution of wildfire outbreaks in the US is estimated by calculating the weighted sum of the topic probability surfaces using a map algebra approach, and mapped using GIS. To provide an evaluation of the approach, the estimation is compared to wildfire hazard potential maps created by the USDA Forest Service.
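The pipeline, articles represented as LDA topic vectors followed by a weighted sum of per-topic probability surfaces, can be sketched at toy scale with scikit-learn and NumPy; the four "articles", the 3-topic model and the random rasters below are stand-ins for the 500-dimension, Wikipedia-scale computation described in the abstract:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy geotagged "articles"; the real pipeline trains on the full Wikipedia dump.
docs = [
    "wildfire burned forest acres evacuation",
    "fire smoke drought forest burned",
    "museum art collection city gallery",
    "river flood rainfall storm damage",
]
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(counts)
doc_topics = lda.transform(counts)          # each article as a weighted topic vector

# Map-algebra step: one (toy) probability raster per topic, combined with weights
# taken from the wildfire corpus' mean topic vector.
rasters = np.random.default_rng(0).random((3, 4, 4))      # 3 topics on a 4x4 grid
wildfire_weights = doc_topics[:2].mean(axis=0)             # articles 0-1 are the wildfire corpus
hazard_surface = np.tensordot(wildfire_weights, rasters, axes=1)
print(hazard_surface.round(2))
```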
Review of Methods for Buildings Energy Performance Modelling
NASA Astrophysics Data System (ADS)
Krstić, Hrvoje; Teni, Mihaela
2017-10-01
Research presented in this paper gives a brief review of methods used for building energy performance modelling. This paper also gives a comprehensive review of the advantages and disadvantages of the available methods as well as the input parameters used for modelling building energy performance. The European EPBD Directive obliges the implementation of an energy certification procedure which gives an insight into building energy performance via existing energy certificate databases. Some of the methods for building energy performance modelling mentioned in this paper are developed by employing data sets of buildings which have already undergone an energy certification procedure. Such a database is used in this paper, where the majority of buildings in the database have already undergone some form of partial retrofitting - replacement of windows or installation of thermal insulation - but still have poor energy performance. The case study presented in this paper utilizes an energy certificate database obtained from residential units in Croatia (over 400 buildings) in order to determine the dependence between building energy performance and variables from the database by using statistical dependence tests. Building energy performance in the database is represented by the building energy efficiency rating (from A+ to G), which is based on the specific annual energy need for heating for referential climatic data [kWh/(m2a)]. Independent variables in the database are the surface areas and volume of the conditioned part of the building, the building shape factor, energy used for heating, CO2 emission, building age and year of reconstruction. Research results presented in this paper give an insight into the possibilities of the methods used for building energy performance modelling. Furthermore, the paper gives an analysis of the dependencies between building energy performance as a dependent variable and the independent variables from the database. The presented results could be used for the development of a new building energy performance predictive model.
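The dependence analysis between the energy-efficiency outcome and the certificate variables can be sketched with rank-correlation tests; the synthetic certificate records below are assumptions, and the study's actual variables and test choices may differ:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 400

# Hypothetical certificate records: heated volume (m3), shape factor (1/m),
# building age (years) and the specific annual energy need for heating (kWh/m2a).
volume = rng.uniform(200, 2000, n)
shape_factor = rng.uniform(0.3, 1.1, n)
age = rng.integers(1, 100, n)
heating_need = 40 + 120 * shape_factor + 0.6 * age + rng.normal(0, 20, n)

# Rank-based dependence test between energy performance and each candidate variable.
for name, var in [("volume", volume), ("shape factor", shape_factor), ("age", age)]:
    rho, p = stats.spearmanr(var, heating_need)
    print(f"{name:12s} rho={rho:+.2f}  p={p:.3g}")
```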
SSER: Species specific essential reactions database.
Labena, Abraham A; Ye, Yuan-Nong; Dong, Chuan; Zhang, Fa-Z; Guo, Feng-Biao
2017-04-19
Essential reactions are vital components of cellular networks. They are the foundations of synthetic biology and are potential candidate targets for antimetabolic drug design. In particular, if a single reaction is catalyzed by multiple enzymes, inhibiting the reaction would be a better option than targeting the enzymes or the corresponding enzyme-encoding genes. Existing databases such as BRENDA, BiGG, KEGG, Bio-models, Biosilico, and many others offer useful and comprehensive information on biochemical reactions, but none of these databases focuses specifically on essential reactions. Therefore, building a centralized repository for this class of reactions would be of great value. Here, we present a species-specific essential reactions database (SSER). The current version comprises essential biochemical and transport reactions of twenty-six organisms, which are identified via flux balance analysis (FBA) combined with manual curation of experimentally validated metabolic network models. Quantitative data on the number of essential reactions, the number of essential reactions associated with their respective enzyme-encoding genes, and shared essential reactions across organisms are the main contents of the database. SSER is a prime source of essential reaction data and related gene and metabolite information, and it can significantly facilitate metabolic network model reconstruction and analysis as well as drug-target discovery studies. Users can browse, search, compare and download the essential reactions of organisms of their interest through the website http://cefg.uestc.edu.cn/sser.
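Flux balance analysis identifies a reaction as essential when constraining its flux to zero drives the optimal biomass flux to (near) zero. The from-scratch sketch below uses a four-reaction toy network and scipy's linear programming solver rather than the curated genome-scale models behind SSER; note how the two alternative A-to-B routes make either of those reactions individually non-essential.

```python
import numpy as np
from scipy.optimize import linprog

# Toy stoichiometric matrix S (rows: metabolites A, B; columns: reactions).
# R1: uptake -> A, R2: A -> B, R3: A -> B (alternative route), R4: B -> biomass
S = np.array([
    [ 1, -1, -1,  0],   # metabolite A
    [ 0,  1,  1, -1],   # metabolite B
])
bounds = [(0, 10), (0, 10), (0, 10), (0, 10)]
biomass = 3   # index of the objective (biomass) reaction

def max_growth(knockout=None):
    b = list(bounds)
    if knockout is not None:
        b[knockout] = (0, 0)                        # deletion = force zero flux
    c = np.zeros(S.shape[1]); c[biomass] = -1.0      # maximize biomass flux
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=b, method="highs")
    return -res.fun

wild_type = max_growth()
for j in range(S.shape[1]):
    growth = max_growth(knockout=j)
    tag = "essential" if growth < 1e-6 * max(wild_type, 1) else "non-essential"
    print(f"reaction R{j + 1}: growth {growth:.2f} -> {tag}")
```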
Wang, Wei; Xia, Minxuan; Chen, Jie; Deng, Fenni; Yuan, Rui; Zhang, Xiaopei; Shen, Fafu
2016-12-01
The data presented in this paper support the research article "Genome-Wide Analysis of Superoxide Dismutase Gene Family in Gossypium raimondii and G. arboreum" [1]. In this data article, we present a phylogenetic tree showing a dichotomy with two different clusters of SODs inferred by the Bayesian method of MrBayes (version 3.2.4), "Bayesian phylogenetic inference under mixed models" [2], Ramachandran plots of G. raimondii and G. arboreum SODs, the protein sequences used to generate the 3D structures of the proteins and the template accessions via the SWISS-MODEL server, "SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information." [3], and motif sequences of SODs identified by InterProScan (version 4.8) with the Pfam database, "Pfam: the protein families database" [4].
Steyaert, Louis T.; Loveland, Thomas R.; Brown, Jesslyn F.; Reed, Bradley C.
1993-01-01
Environmental modelers are testing and evaluating a prototype land cover characteristics database for the conterminous United States developed by the EROS Data Center of the U.S. Geological Survey and the University of Nebraska Center for Advanced Land Management Information Technologies. This database was developed from multitemporal, 1-kilometer advanced very high resolution radiometer (AVHRR) data for 1990 and various ancillary data sets such as elevation, ecological regions, and selected climatic normals. Several case studies using this database were analyzed to illustrate the integration of satellite remote sensing and geographic information systems technologies with land-atmosphere interactions models at a variety of spatial and temporal scales. The case studies are representative of contemporary environmental simulation modeling at local to regional levels in global change research, land and water resource management, and environmental risk assessment. The case studies feature land surface parameterizations for atmospheric mesoscale and global climate models; biogenic-hydrocarbons emissions models; distributed parameter watershed and other hydrological models; and various ecological models such as ecosystem dynamics, biogeochemical cycles, ecotone variability, and equilibrium vegetation models. The case studies demonstrate the importance of multitemporal AVHRR data to develop and maintain a flexible, near-real-time land cover characteristics database. Moreover, such a flexible database is needed to derive various vegetation classification schemes, to aggregate data for nested models, to develop remote sensing algorithms, and to provide data on dynamic landscape characteristics. The case studies illustrate how such a database supports research on spatial heterogeneity, land use, sensitivity analysis, and scaling issues involving regional extrapolations and parameterizations of dynamic land processes within simulation models.
Creating a model to detect dairy cattle farms with poor welfare using a national database.
Krug, C; Haskell, M J; Nunes, T; Stilwell, G
2015-12-01
The objective of this study was to determine whether dairy farms with poor cow welfare could be identified using a national database for bovine identification and registration that monitors cattle deaths and movements. The welfare of dairy cattle was assessed using the Welfare Quality(®) protocol (WQ) on 24 Portuguese dairy farms and on 1930 animals. Five farms were classified as having poor welfare and the other 19 were classified as having good welfare. Fourteen million records from the national cattle database were analysed to identify potential welfare indicators for dairy farms. Fifteen potential national welfare indicators were calculated based on that database, and the link between the results on the WQ evaluation and the national cattle database was made using the identification code of each farm. Within the potential national welfare indicators, only two were significantly different between farms with good welfare and poor welfare, 'proportion of on-farm deaths' (p<0.01) and 'female/male birth ratio' (p<0.05). To determine whether the database welfare indicators could be used to distinguish farms with good welfare from farms with poor welfare, we created a model using the classifier J48 of Waikato Environment for Knowledge Analysis. The model was a decision tree based on two variables, 'proportion of on-farm deaths' and 'calving-to-calving interval', and it was able to correctly identify 70% and 79% of the farms classified as having poor and good welfare, respectively. The national cattle database analysis could be useful in helping official veterinary services in detecting farms that have poor welfare and also in determining which welfare indicators are poor on each particular farm. Copyright © 2015 Elsevier B.V. All rights reserved.
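As a hedged illustration of this classification step, the sketch below fits a shallow decision tree on the two indicators named in the abstract; note that WEKA's J48 implements C4.5 while scikit-learn grows CART trees, and the file and column names are hypothetical.

```python
# Illustrative decision-tree classifier on the two indicators named above.
# WEKA's J48 (C4.5) differs from scikit-learn's CART, so this is an approximation;
# the data file and column names are placeholders.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

farms = pd.read_csv("farm_indicators.csv")   # hypothetical national-database extract
X = farms[["on_farm_death_proportion", "calving_to_calving_interval"]]
y = farms["poor_welfare"]                     # 1 = poor welfare, 0 = good welfare

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
print("cross-validated accuracy:", cross_val_score(tree, X, y, cv=5).mean())
```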
SSME environment database development
NASA Technical Reports Server (NTRS)
Reardon, John
1987-01-01
The internal environment of the Space Shuttle Main Engine (SSME) is being determined from hot firings of the prototype engines and from model tests using either air or water as the test fluid. The objectives are to develop a database system to facilitate management and analysis of test measurements and results, to enter available data into the database, and to analyze available data to establish conventions and procedures that provide consistency in data normalization and configuration geometry references.
IAC - INTEGRATED ANALYSIS CAPABILITY
NASA Technical Reports Server (NTRS)
Frisch, H. P.
1994-01-01
The objective of the Integrated Analysis Capability (IAC) system is to provide a highly effective, interactive analysis tool for the integrated design of large structures. With the goal of supporting the unique needs of engineering analysis groups concerned with interdisciplinary problems, IAC was developed to interface programs from the fields of structures, thermodynamics, controls, and system dynamics with an executive system and database to yield a highly efficient multi-disciplinary system. Special attention is given to user requirements such as data handling and on-line assistance with operational features, and the ability to add new modules of the user's choice at a future date. IAC contains an executive system, a data base, general utilities, interfaces to various engineering programs, and a framework for building interfaces to other programs. IAC has shown itself to be effective in automatic data transfer among analysis programs. IAC 2.5, designed to be compatible as far as possible with Level 1.5, contains a major upgrade in executive and database management system capabilities, and includes interfaces to enable thermal, structures, optics, and control interaction dynamics analysis. The IAC system architecture is modular in design. 1) The executive module contains an input command processor, an extensive data management system, and driver code to execute the application modules. 2) Technical modules provide standalone computational capability as well as support for various solution paths or coupled analyses. 3) Graphics and model generation interfaces are supplied for building and viewing models. Advanced graphics capabilities are provided within particular analysis modules such as INCA and NASTRAN. 4) Interface modules provide for the required data flow between IAC and other modules. 5) User modules can be arbitrary executable programs or JCL procedures with no pre-defined relationship to IAC. 6) Special purpose modules are included, such as MIMIC (Model Integration via Mesh Interpolation Coefficients), which transforms field values from one model to another; LINK, which simplifies incorporation of user specific modules into IAC modules; and DATAPAC, the National Bureau of Standards statistical analysis package. The IAC database contains structured files which provide a common basis for communication between modules and the executive system, and can contain unstructured files such as NASTRAN checkpoint files, DISCOS plot files, object code, etc. The user can define groups of data and relations between them. A full data manipulation and query system operates with the database. The current interface modules comprise five groups: 1) Structural analysis - IAC contains a NASTRAN interface for standalone analysis or certain structural/control/thermal combinations. IAC provides enhanced structural capabilities for normal modes and static deformation analysis via special DMAP sequences. IAC 2.5 contains several specialized interfaces from NASTRAN in support of multidisciplinary analysis. 2) Thermal analysis - IAC supports finite element and finite difference techniques for steady state or transient analysis. There are interfaces for the NASTRAN thermal analyzer, SINDA/SINFLO, and TRASYS II. FEMNET, which converts finite element structural analysis models to finite difference thermal analysis models, is also interfaced with the IAC database. 
3) System dynamics - The DISCOS simulation program, which allows for either nonlinear time domain analysis or linear frequency domain analysis, is fully interfaced to the IAC database management capability. 4) Control analysis - Interfaces for the ORACLS, SAMSAN, NBOD2, and INCA programs allow a wide range of control system analysis and synthesis techniques. Level 2.5 includes EIGEN, which provides tools for large-order system eigenanalysis, and BOPACE, which provides geometric capabilities and finite element analysis with nonlinear materials. Also included in IAC level 2.5 is SAMSAN 3.1, an engineering analysis program which contains a general-purpose library of over 600 subroutines.
NASA Technical Reports Server (NTRS)
Okong'o, Nora; Bellan, Josette
2005-01-01
Models for large eddy simulation (LES) are assessed on a database obtained from direct numerical simulations (DNS) of supercritical binary-species temporal mixing layers. The analysis is performed at the DNS transitional states for heptane/nitrogen, oxygen/hydrogen and oxygen/helium mixing layers. The incorporation of simplifying assumptions that are validated on the DNS database leads to a set of LES equations that requires only models for the subgrid scale (SGS) fluxes, which arise from filtering the convective terms in the DNS equations. Constant-coefficient versions of three different models for the SGS fluxes are assessed and calibrated. The Smagorinsky SGS-flux model shows poor correlations with the SGS fluxes, while the Gradient and Similarity models have high correlations, as well as good quantitative agreement with the SGS fluxes when the calibrated coefficients are used.
Database resources of the National Center for Biotechnology Information: 2002 update
Wheeler, David L.; Church, Deanna M.; Lash, Alex E.; Leipe, Detlef D.; Madden, Thomas L.; Pontius, Joan U.; Schuler, Gregory D.; Schriml, Lynn M.; Tatusova, Tatiana A.; Wagner, Lukas; Rapp, Barbara A.
2002-01-01
In addition to maintaining the GenBank nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources that operate on the data in GenBank and a variety of other biological data made available through NCBI’s web site. NCBI data retrieval resources include Entrez, PubMed, LocusLink and the Taxonomy Browser. Data analysis resources include BLAST, Electronic PCR, OrfFinder, RefSeq, UniGene, HomoloGene, Database of Single Nucleotide Polymorphisms (dbSNP), Human Genome Sequencing, Human MapViewer, Human–Mouse Homology Map, Cancer Chromosome Aberration Project (CCAP), Entrez Genomes, Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, SAGEmap, Gene Expression Omnibus (GEO), Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (MMDB) and the Conserved Domain Database (CDD). Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at http://www.ncbi.nlm.nih.gov. PMID:11752242
NASA Astrophysics Data System (ADS)
Gebhardt, Steffen; Wehrmann, Thilo; Klinger, Verena; Schettler, Ingo; Huth, Juliane; Künzer, Claudia; Dech, Stefan
2010-10-01
The German-Vietnamese water-related information system for the Mekong Delta (WISDOM) project supports business processes in Integrated Water Resources Management in Vietnam. Multiple disciplines bring together earth and ground based observation themes, such as environmental monitoring, water management, demographics, economy, information technology, and infrastructural systems. This paper introduces the components of the web-based WISDOM system including data, logic and presentation tier. It focuses on the data models upon which the database management system is built, including techniques for tagging or linking metadata with the stored information. The model also uses ordered groupings of spatial, thematic and temporal reference objects to semantically tag datasets to enable fast data retrieval, such as finding all data in a specific administrative unit belonging to a specific theme. A spatial database extension is employed by the PostgreSQL database. This object-oriented database was chosen over a relational database to tag spatial objects to tabular data, improving the retrieval of census and observational data at regional, provincial, and local areas. While the spatial database hinders processing raster data, a "work-around" was built into WISDOM to permit efficient management of both raster and vector data. The data model also incorporates styling aspects of the spatial datasets through styled layer descriptions (SLD) and web mapping service (WMS) layer specifications, allowing retrieval of rendered maps. Metadata elements of the spatial data are based on the ISO19115 standard. XML structured information of the SLD and metadata are stored in an XML database. The data models and the data management system are robust for managing the large quantity of spatial objects, sensor observations, census and document data. The operational WISDOM information system prototype contains modules for data management, automatic data integration, and web services for data retrieval, analysis, and distribution. The graphical user interfaces facilitate metadata cataloguing, data warehousing, web sensor data analysis and thematic mapping.
Rosato, Stefano; D'Errigo, Paola; Badoni, Gabriella; Fusco, Danilo; Perucci, Carlo A; Seccareccia, Fulvia
2008-08-01
The availability of two contemporary sources of information about coronary artery bypass graft (CABG) interventions made it possible 1) to verify the feasibility of performing outcome evaluation studies using administrative data sources, and 2) to compare hospital performance obtained using the CABG Project clinical database with hospital performance derived from current administrative data. Interventions recorded in the CABG Project were linked to the hospital discharge record (HDR) administrative database. Only the linked records were considered for subsequent analyses (46% of the total CABG Project). A new selected population, "clinical card-HDR", was then defined. Two independent risk-adjustment models were applied, each using information derived from one of the two different sources. HDR information was then supplemented with some patient preoperative conditions from the CABG clinical database. The two models were compared in terms of their adaptability to the data, and the hospital performances identified by each model as significantly different from the mean were compared. In only 4 of the 13 hospitals considered for analysis did the results obtained using the HDR model not completely overlap with those obtained with the CABG model. When comparing the statistical parameters of the HDR model and of the HDR model plus patient preoperative conditions, the latter showed the better adaptability to the data. In this "clinical card-HDR" population, hospital performance assessment obtained using information from the clinical database is similar to that derived from current administrative data. However, when risk-adjustment models built on administrative databases are supplemented with a few clinical variables, their statistical parameters improve and hospital performance assessment becomes more accurate.
This study presents an evaluation of summertime daily maximum ozone concentrations over North America (NA) and Europe (EU) using the database generated during Phase 1 of the Air Quality Model Evaluation International Initiative (AQMEII). The analysis focuses on identifying tempor...
NASA Astrophysics Data System (ADS)
Wang, Ji-Peng; François, Bertrand; Lambert, Pierre
2017-09-01
Estimating hydraulic conductivity from particle size distribution (PSD) is an important issue for various engineering problems. Classical models such as the Hazen, Beyer, and Kozeny-Carman models usually regard the grain diameter at 10% passing (d10) as an effective grain size, and the effects of particle size uniformity (in the Beyer model) or porosity (in the Kozeny-Carman model) are sometimes embedded. This technical note applies dimensional analysis (Buckingham's ∏ theorem) to analyze the relationship between hydraulic conductivity and the PSD. The porosity is regarded as a variable dependent on the grain size distribution in unconsolidated conditions. The analysis indicates that the coefficient of grain size uniformity and a dimensionless group representing the gravity effect, which is proportional to the mean grain volume, are the two main determinative parameters for estimating hydraulic conductivity. Regression analysis is then carried out on a database comprising 431 samples collected from different depositional environments, and new equations are developed for hydraulic conductivity estimation. The new equation, validated on specimens beyond the database, shows improved prediction compared with the classical models.
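For orientation, the sketch below implements the commonly quoted textbook forms of two of the classical estimators mentioned in the note (Hazen and Kozeny-Carman); the coefficients are standard textbook values, and the note's own regression equation is not reproduced here.

```python
# Textbook forms of two classical hydraulic-conductivity estimators (a sketch;
# not the new equations developed in the note). d10 is the grain diameter at
# 10% passing, n is porosity.
g = 9.81        # gravitational acceleration, m/s^2
nu = 1.0e-6     # kinematic viscosity of water at ~20 degC, m^2/s

def hazen(d10_mm, C=1.0):
    """Hazen (empirical): K in cm/s, d10 in mm, C typically 0.4-1.2."""
    return C * d10_mm ** 2

def kozeny_carman(d10_m, n):
    """Kozeny-Carman: K in m/s, with d10 in m used as the effective grain size."""
    return (g / nu) * (n ** 3 / (1.0 - n) ** 2) * (d10_m ** 2 / 180.0)

if __name__ == "__main__":
    print(hazen(0.2))                    # d10 = 0.2 mm -> K in cm/s
    print(kozeny_carman(0.2e-3, 0.35))   # d10 = 0.2 mm, n = 0.35 -> K in m/s
```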
Sweetness prediction of natural compounds.
Chéron, Jean-Baptiste; Casciuc, Iuri; Golebiowski, Jérôme; Antonczak, Serge; Fiorucci, Sébastien
2017-04-15
Based on the most exhaustive database of sweeteners with known sweetness values, a new quantitative structure-activity relationship model for sweetness prediction has been set up. Analysis of the physico-chemical properties of sweeteners in the database indicates that the structure of the most potent sweeteners combines a hydrophobic scaffold functionalized by a limited number of hydrogen bond sites (fewer than 4 hydrogen bond donors and 10 acceptors), with a moderate molecular weight ranging from 350 to 450 g·mol⁻¹. Prediction of the sweetness, bitterness and toxicity properties of the largest database of natural compounds has been performed. In silico screening reveals that the majority of the predicted natural intense sweeteners comprise saponin or stevioside scaffolds. The model highlights that their sweetness potency is comparable to known natural sweeteners. The identified compounds provide a rational basis to initiate the design and chemosensory analysis of new low-calorie sweeteners. Copyright © 2016 Elsevier Ltd. All rights reserved.
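A minimal sketch of the physico-chemical pre-filter implied by these property ranges, written with RDKit, is shown below; the SMILES list is a placeholder and this filter is not the paper's QSAR model itself.

```python
# Sketch of the property window reported above (MW ~350-450 g/mol, fewer than
# 4 H-bond donors, fewer than 10 acceptors) applied with RDKit; the candidate
# SMILES are placeholders, not entries from the paper's database.
from rdkit import Chem
from rdkit.Chem import Descriptors, Lipinski

candidates = ["OC[C@H]1OC(O)[C@H](O)[C@@H](O)[C@@H]1O"]   # hypothetical SMILES

for smiles in candidates:
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        continue
    ok = (350.0 <= Descriptors.MolWt(mol) <= 450.0
          and Lipinski.NumHDonors(mol) < 4
          and Lipinski.NumHAcceptors(mol) < 10)
    print(smiles, "passes" if ok else "fails")
```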
Database assessment of CMIP5 and hydrological models to determine flood risk areas
NASA Astrophysics Data System (ADS)
Limlahapun, Ponthip; Fukui, Hiromichi
2016-11-01
Water-related disasters are rarely addressed by a single scientific method. Based on this premise, we combined logical workflow design, sequential linkage of results between models, and database applications to analyse historical and future flooding scenarios. The three main models used in this study are (1) the fifth phase of the Coupled Model Intercomparison Project (CMIP5) to derive precipitation; (2) the Integrated Flood Analysis System (IFAS) to extract the amount of discharge; and (3) the Hydrologic Engineering Center (HEC) model to generate inundated areas. This research focused on integrating data regardless of system-design complexity, and database approaches are flexible, manageable, and well supported for system data transfer, which makes them suitable for monitoring a flood. The resulting flood maps, together with real-time stream data, can help local communities identify areas at risk of flooding in advance.
TransAtlasDB: an integrated database connecting expression data, metadata and variants
Adetunji, Modupeore O; Lamont, Susan J; Schmidt, Carl J
2018-01-01
High-throughput transcriptome sequencing (RNAseq) is the universally applied method for target-free transcript identification and gene expression quantification, generating huge amounts of data. Difficulty in accessing such data and interpreting results can be a major impediment to postulating suitable hypotheses, so an innovative storage solution that addresses limitations such as hard disk storage requirements, efficiency and reproducibility is paramount. By offering a uniform data storage and retrieval mechanism, various data can be compared and easily investigated. We present a sophisticated system, TransAtlasDB, which incorporates a hybrid architecture of relational and NoSQL databases for fast and efficient storage, processing and querying of large datasets from transcript expression analysis, together with corresponding metadata and gene-associated variants (such as SNPs) and their predicted gene effects. TransAtlasDB provides a data model for accurate storage of the large amount of data derived from RNAseq analysis, along with methods of interacting with the database either via command-line data management workflows, written in Perl, with functionality that simplifies the storage and manipulation of the massive amounts of data generated from RNAseq analysis, or through the web interface. The database application is currently modeled to handle analysis data from agricultural species and will be expanded to include more species groups. Overall, TransAtlasDB aims to serve as an accessible repository for the large, complex results files derived from RNAseq gene expression profiling and variant analysis. Database URL: https://modupeore.github.io/TransAtlasDB/ PMID:29688361
In silico analysis of fragile histidine triad involved in regression of carcinoma.
Rasheed, Muhammad Asif; Tariq, Fatima; Afzal, Sara; Mannanv, Shazia
2017-04-01
Hepatocellular carcinoma (HCCa) is a primary malignancy of the liver. Many different proteins are involved in HCCa, including insulin-like growth factor (IGF) II, signal transducers and activators of transcription (STAT) 3, STAT4, mothers against decapentaplegic homolog 4 (SMAD4), fragile histidine triad (FHIT) and sirtuins (SIRT), among others. The present study is a bioinformatics analysis of the FHIT protein, aimed at understanding its proteomic aspects and improving diagnosis of the disease. Information related to the protein was gathered from different databases, including the National Centre for Biotechnology Information (NCBI) Gene, Protein and Online Mendelian Inheritance in Man (OMIM) databases, the UniProt database, the STRING database and the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Moreover, the structure of the protein was modeled and its quality evaluated with the EasyModeller program. Hence, this analysis not only gathered information related to the protein in one place, but also examined the structure and quality of the protein model, supporting the conclusion that the protein has a role in carcinoma.
The Muon Conditions Data Management:. Database Architecture and Software Infrastructure
NASA Astrophysics Data System (ADS)
Verducci, Monica
2010-04-01
The management of the Muon Conditions Database will be one of the most challenging applications for the Muon System, both in terms of data volumes and rates and in terms of the variety of data stored and their analysis. The Muon conditions database is responsible for almost all of the 'non-event' data and detector quality flag storage needed for debugging of the detector operations and for performing the reconstruction and the analysis. In particular for the early data, knowledge of the detector performance and of the corrections in terms of efficiency and calibration will be extremely important for the correct reconstruction of the events. In this work, an overview of the entire Muon conditions database architecture is given, covering the different sources of the data and the storage model used, including the associated database technology. Particular emphasis is given to the Data Quality chain: the flow of the data, the analysis and the final results are described. In addition, the software interfaces used to access the conditions data are described, in particular within the ATLAS offline reconstruction framework ATHENA.
NASA Technical Reports Server (NTRS)
Gaonkar, G. H.; Subramanian, S.
1996-01-01
Since the early 1990s the Aeroflightdynamics Directorate at the Ames Research Center has been conducting tests on isolated hingeless rotors in hover and forward flight. The primary objective is to generate a database on aeroelastic stability in trimmed flight for torsionally soft rotors at realistic tip speeds. The rotor test model has four soft inplane blades of NACA 0012 airfoil section with low torsional stiffness. The collective pitch and shaft tilt are set prior to each test run, and then the rotor is trimmed in the following sense: the longitudinal and lateral cyclic pitch controls are adjusted through a swashplate to minimize the 1/rev flapping moment at the 12 percent radial station. In hover, the database comprises lag regressive-mode damping with pitch variations. In forward flight the database comprises cyclic pitch controls, root flap moment and lag regressive-mode damping with advance ratio, shaft angle and pitch variations. This report presents the predictions and their correlation with the database. A modal analysis is used, in which nonrotating modes in flap bending, lag bending and torsion are computed from the measured blade mass and stiffness distributions. The airfoil aerodynamics is represented by the ONERA dynamic stall models of lift, drag and pitching moment, and the wake dynamics is represented by a state-space wake model. The trim analysis of finding the cyclic controls and the corresponding periodic responses is based on periodic shooting with damped Newton iteration; the Floquet transition matrix (FTM) comes out as a byproduct. The stability analysis of finding the frequencies and damping levels is based on the eigenvalue-eigenvector analysis of the FTM. All the structural and aerodynamic states are included from modeling to trim analysis. A major finding is that dynamic wake dramatically improves the correlation for the lateral cyclic pitch control. Overall, the correlation is fairly good.
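A minimal numpy sketch of the final eigenanalysis step, assuming the Floquet transition matrix and rotor period are already available, is shown below; the matrix values are placeholders.

```python
# Minimal sketch of the Floquet eigenanalysis step described above: given the
# Floquet transition matrix (FTM) over one rotor period T, its eigenvalues give
# modal damping and frequency. The FTM here is a placeholder array.
import numpy as np

T = 0.1                               # rotor period in seconds (illustrative)
ftm = np.array([[0.90, 0.10],         # placeholder Floquet transition matrix
                [-0.10, 0.90]])

eigvals = np.linalg.eigvals(ftm)
damping = np.log(np.abs(eigvals)) / T              # negative values indicate a damped mode
frequency = np.angle(eigvals) / (2 * np.pi * T)    # Hz, defined modulo multiples of 1/T

for lam, sig, f in zip(eigvals, damping, frequency):
    print(f"multiplier {lam}: damping {sig:+.2f} 1/s, frequency {f:.2f} Hz")
```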
GIS applications for military operations in coastal zones
Fleming, S.; Jordan, T.; Madden, M.; Usery, E.L.; Welch, R.
2009-01-01
In order to successfully support current and future US military operations in coastal zones, geospatial information must be rapidly integrated and analyzed to meet ongoing force structure evolution and new mission directives. Coastal zones in a military-operational environment are complex regions that include sea, land and air features that demand high-volume databases of extreme detail within relatively narrow geographic corridors. Static products in the form of analog maps at varying scales traditionally have been used by military commanders and their operational planners. The rapidly changing battlefield of 21st Century warfare, however, demands dynamic mapping solutions. Commercial geographic information system (GIS) software for military-specific applications is now being developed and employed with digital databases to provide customized digital maps of variable scale, content and symbolization tailored to unique demands of military units. Research conducted by the Center for Remote Sensing and Mapping Science at the University of Georgia demonstrated the utility of GIS-based analysis and digital map creation when developing large-scale (1:10,000) products from littoral warfare databases. The methodology employed, including selection of data sources (such as high-resolution commercial images and lidar), establishment of analysis/modeling parameters, conduct of vehicle mobility analysis, development of models, and generation of products (such as a continuous sea-land DEM and geo-visualization of changing shorelines with tidal levels), is discussed. Based on observations and identified needs from the National Geospatial-Intelligence Agency, formerly the National Imagery and Mapping Agency, and the Department of Defense, prototype GIS models for military operations in sea, land and air environments were created from multiple data sets of a study area at US Marine Corps Base Camp Lejeune, North Carolina. Results of these models, along with methodologies for developing large-scale littoral warfare databases, aid the National Geospatial-Intelligence Agency in meeting littoral warfare analysis, modeling and map generation requirements for US military organizations. © 2008 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS).
Philipp, E E R; Kraemer, L; Mountfort, D; Schilhabel, M; Schreiber, S; Rosenstiel, P
2012-03-15
Next generation sequencing (NGS) technologies allow a rapid and cost-effective compilation of large RNA sequence datasets in model and non-model organisms. However, the storage and analysis of transcriptome information from different NGS platforms is still a significant bottleneck, leading to a delay in data dissemination and subsequent biological understanding. In particular, database interfaces with transcriptome analysis modules that go beyond mere read counts are missing. Here, we present the Transcriptome Analysis and Comparison Explorer (T-ACE), a tool designed for the organization and analysis of large sequence datasets, and especially suited for transcriptome projects of non-model organisms with little or no a priori sequence information. T-ACE offers a TCL-based interface, which accesses a PostgreSQL database via a PHP script. Within T-ACE, information belonging to single sequences or contigs, such as annotation or read coverage, is linked to the respective sequence and immediately accessible. Sequences and assigned information can be searched via keyword or BLAST search. Additionally, T-ACE provides within- and between-transcriptome analysis modules on the level of expression, GO terms, KEGG pathways and protein domains. Results are visualized and can be easily exported for external analysis. We developed T-ACE for laboratory environments that have only a limited amount of bioinformatics support, and for collaborative projects in which different partners work on the same dataset from different locations or platforms (Windows/Linux/MacOS). For laboratories with some experience in bioinformatics and programming, the low complexity of the database structure and open-source code provides a framework that can be customized according to the different needs of the user and transcriptome project.
NASA Astrophysics Data System (ADS)
Murumkar, Prashant Revan; Zambre, Vishal Prakash; Yadav, Mange Ram
2010-02-01
A chemical feature-based pharmacophore model was developed for Tumor Necrosis Factor-α converting enzyme (TACE) inhibitors. A five point pharmacophore model having two hydrogen bond acceptors (A), one hydrogen bond donor (D) and two aromatic rings (R) with discrete geometries as pharmacophoric features was developed. The pharmacophore model so generated was then utilized for in silico screening of a database. The pharmacophore model so developed was validated by using four compounds having proven TACE inhibitory activity which were grafted into the database. These compounds mapped well onto the five listed pharmacophoric features. This validated pharmacophore model was also used for alignment of molecules in CoMFA and CoMSIA analysis. The contour maps of the CoMFA/CoMSIA models were utilized to provide structural insight for activity improvement of potential novel TACE inhibitors. The pharmacophore model so developed could be used for in silico screening of any commercial/in house database for identification of TACE inhibiting lead compounds, and the leads so identified could be optimized using the developed CoMSIA model. The present work highlights the tremendous potential of the two mutually complementary ligand-based drug designing techniques (i.e. pharmacophore mapping and 3D-QSAR analysis) using TACE inhibitors as prototype biologically active molecules.
DataHub: Knowledge-based data management for data discovery
NASA Astrophysics Data System (ADS)
Handley, Thomas H.; Li, Y. Philip
1993-08-01
Currently available database technology is largely designed for business data-processing applications, and seems inadequate for scientific applications. The research described in this paper, the DataHub, will address the issues associated with this shortfall in technology utilization and development. The DataHub development is addressing the key issues in scientific data management of scientific database models and resource sharing in a geographically distributed, multi-disciplinary, science research environment. Thus, the DataHub will be a server between the data suppliers and data consumers to facilitate data exchanges, to assist science data analysis, and to provide a systematic approach for science data management. More specifically, the DataHub's objectives are to provide support for (1) exploratory data analysis (i.e., data-driven analysis); (2) data transformations; (3) data semantics capture and usage; (4) analysis-related knowledge capture and usage; and (5) data discovery, ingestion, and extraction. Applying technologies that range from deductive databases, semantic data models, data discovery, knowledge representation and inferencing, and exploratory data analysis techniques to modern man-machine interfaces, DataHub will provide a prototype, integrated environment to support research scientists' needs in multiple disciplines (i.e., oceanography, geology, and atmospheric science) while addressing the more general science data management issues. Additionally, the DataHub will provide data management services to exploratory data analysis applications such as LinkWinds and NCSA's XIMAGE.
[Relational database for urinary stone ambulatory consultation. Assessment of initial outcomes].
Sáenz Medina, J; Páez Borda, A; Crespo Martinez, L; Gómez Dos Santos, V; Barrado, C; Durán Poveda, M
2010-05-01
To create a relational database for monitoring lithiasic patients. We describe the architectural details and the initial results of the statistical analysis. Microsoft Access 2002 was used as the template. Four different tables were constructed to gather demographic data (table 1), clinical and laboratory findings (table 2), stone features (table 3) and therapeutic approach (table 4). For a reliability analysis of the database, the number of correctly stored data items was recorded. To evaluate the performance of the database, a prospective analysis was conducted, from May 2004 to August 2009, on 171 stone-free patients after treatment (ESWL, surgery or medical) from a total of 511 patients stored in the database. Lithiasic status (stone free or stone relapse) was used as the primary end point, while demographic factors (age, gender), lithiasic history, upper urinary tract alterations and characteristics of the stone (side, location, composition and size) were considered as predictive factors. A univariate analysis was conducted initially by the chi-square test and supplemented by Kaplan-Meier estimates for time to stone recurrence. A multiple Cox proportional hazards regression model was generated to jointly assess the prognostic value of the demographic factors and the predictive value of stone characteristics. For the reliability analysis, 22,084 data items were available, corresponding to 702 consultations on 511 patients. Analysis of data showed a recurrence rate of 85.4% (146/171, median time to recurrence 608 days, range 70-1758). In the univariate and multivariate analysis, none of the factors under consideration had a significant effect on recurrence rate (p=ns). The relational database is useful for monitoring patients with urolithiasis. It allows easy control and update, as well as data storage for later use. The analysis conducted for its evaluation showed no influence of demographic factors and stone features on stone recurrence.
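A hedged sketch of the time-to-recurrence analysis described above (a Kaplan-Meier estimate plus a Cox proportional-hazards model), written with the lifelines package, is shown below; the file and column names are assumed and do not reflect the consultation database schema.

```python
# Sketch of the Kaplan-Meier and Cox proportional-hazards analysis described
# above, using the lifelines package; file and column names are hypothetical.
import pandas as pd
from lifelines import KaplanMeierFitter, CoxPHFitter

df = pd.read_csv("stone_followup.csv")   # days_to_event, recurred, age, sex, stone_size

kmf = KaplanMeierFitter()
kmf.fit(df["days_to_event"], event_observed=df["recurred"])
print("median time to recurrence:", kmf.median_survival_time_)

cph = CoxPHFitter()
cph.fit(df[["days_to_event", "recurred", "age", "sex", "stone_size"]],
        duration_col="days_to_event", event_col="recurred")
cph.print_summary()
```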
Hydrologic Derivatives for Modeling and Analysis—A new global high-resolution database
Verdin, Kristine L.
2017-07-17
The U.S. Geological Survey has developed a new global high-resolution hydrologic derivative database. Loosely modeled on the HYDRO1k database, this new database, entitled Hydrologic Derivatives for Modeling and Analysis, provides comprehensive and consistent global coverage of topographically derived raster layers (digital elevation model data, flow direction, flow accumulation, slope, and compound topographic index) and vector layers (streams and catchment boundaries). The coverage of the data is global, and the underlying digital elevation model is a hybrid of three datasets: HydroSHEDS (Hydrological data and maps based on SHuttle Elevation Derivatives at multiple Scales), GMTED2010 (Global Multi-resolution Terrain Elevation Data 2010), and the SRTM (Shuttle Radar Topography Mission). For most of the globe south of 60°N., the raster resolution of the data is 3 arc-seconds, corresponding to the resolution of the SRTM. For the areas north of 60°N., the resolution is 7.5 arc-seconds (the highest resolution of the GMTED2010 dataset) except for Greenland, where the resolution is 30 arc-seconds. The streams and catchments are attributed with Pfafstetter codes, based on a hierarchical numbering system, that carry important topological information. This database is appropriate for use in continental-scale modeling efforts. The work described in this report was conducted by the U.S. Geological Survey in cooperation with the National Aeronautics and Space Administration Goddard Space Flight Center.
Qiao, Yuanhua; Keren, Nir; Mannan, M Sam
2009-08-15
Risk assessment and management of transportation of hazardous materials (HazMat) require the estimation of accident frequency. This paper presents a methodology to estimate hazardous materials transportation accident frequency by utilizing publicly available databases and expert knowledge. The estimation process addresses route-dependent and route-independent variables. Negative binomial regression is applied to an analysis of the Department of Public Safety (DPS) accident database to derive basic accident frequency as a function of route-dependent variables, while the effects of route-independent variables are modeled by fuzzy logic. The integrated methodology provides the basis for an overall transportation risk analysis, which can be used later to develop a decision support system.
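A hedged sketch of the route-dependent part of this approach is shown below: a negative binomial regression of segment accident counts on route attributes with an exposure offset, using statsmodels; the column names and data file are assumptions, and the fuzzy-logic treatment of route-independent variables is not reproduced.

```python
# Sketch of a negative binomial regression of accident counts on route
# attributes, with exposure (e.g., truck-miles) as an offset; the data file
# and column names are placeholders, not the DPS database schema.
import numpy as np
import pandas as pd
import statsmodels.api as sm

routes = pd.read_csv("dps_route_segments.csv")   # hypothetical extract
X = sm.add_constant(routes[["lanes", "speed_limit", "curvature"]])
y = routes["accident_count"]

nb = sm.GLM(y, X,
            family=sm.families.NegativeBinomial(alpha=1.0),
            offset=np.log(routes["truck_miles"]))
result = nb.fit()
print(result.summary())
```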
Analysis of NASA Common Research Model Dynamic Data
NASA Technical Reports Server (NTRS)
Balakrishna, S.; Acheson, Michael J.
2011-01-01
Recent NASA Common Research Model (CRM) tests at the Langley National Transonic Facility (NTF) and Ames 11-foot Transonic Wind Tunnel (11-foot TWT) have generated an experimental database for CFD code validation. The database consists of force and moment, surface pressures and wideband wing-root dynamic strain/wing Kulite data from continuous sweep pitch polars. The dynamic data sets, acquired at 12,800 Hz sampling rate, are analyzed in this study to evaluate CRM wing buffet onset and potential CRM wing flow separation.
MIPS plant genome information resources.
Spannagl, Manuel; Haberer, Georg; Ernst, Rebecca; Schoof, Heiko; Mayer, Klaus F X
2007-01-01
The Munich Institute for Protein Sequences (MIPS) has been involved in maintaining plant genome databases since the Arabidopsis thaliana genome project. Genome databases and analysis resources have focused on individual genomes and aim to provide flexible and maintainable data sets for model plant genomes as a backbone against which experimental data, for example from high-throughput functional genomics, can be organized and evaluated. In addition, model genomes also form a scaffold for comparative genomics, and much can be learned from genome-wide evolutionary studies.
Genomics Community Resources | Informatics Technology for Cancer Research (ITCR)
To facilitate genomic research and the dissemination of its products, the National Human Genome Research Institute (NHGRI) supports genomic resources that are crucial for basic research, disease studies, model organism studies, and other biomedical research. Awards under this FOA will support the development and distribution of genomic resources that will be valuable for the broad research community, using cost-effective approaches. Such resources include (but are not limited to) databases and informatics resources (such as human and model organism databases, ontologies, and analysis tools).
Filling Terrorism Gaps: VEOs, Evaluating Databases, and Applying Risk Terrain Modeling to Terrorism
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hagan, Ross F.
2016-08-29
This paper aims to address three issues: the lack of literature differentiating terrorism and violent extremist organizations (VEOs), terrorism incident databases, and the applicability of Risk Terrain Modeling (RTM) to terrorism. Current open source literature and publicly available government sources do not differentiate between terrorism and VEOs; furthermore, they fail to define them. Addressing the lack of a comprehensive comparison of existing terrorism data sources, a matrix comparing a dozen terrorism databases is constructed, providing insight toward the array of data available. RTM, a method for spatial risk analysis at a micro level, has some applicability to terrorism research, particularly for studies looking at risk indicators of terrorism. Leveraging attack data from multiple databases, combined with RTM, offers one avenue for closing existing research gaps in terrorism literature.
Schell, Scott R
2006-02-01
Enforcement of the Health Insurance Portability and Accountability Act (HIPAA) began in April 2003. Designed as a law mandating health insurance availability when coverage was lost, HIPAA imposed sweeping and broad-reaching protections of patient privacy. These changes dramatically altered clinical research by placing sizeable regulatory burdens upon investigators with threat of severe and costly federal and civil penalties. This report describes development of an algorithmic approach to clinical research database design based upon a central key-shared data (CK-SD) model allowing researchers to easily analyze, distribute, and publish clinical research without disclosure of HIPAA Protected Health Information (PHI). Three clinical database formats (small clinical trial, operating room performance, and genetic microchip array datasets) were modeled using standard structured query language (SQL)-compliant databases. The CK database was created to contain PHI data, whereas a shareable SD database was generated in real time containing relevant clinical outcome information while protecting PHI items. Small (< 100 records), medium (< 50,000 records), and large (> 10^8 records) model databases were created, and the resultant data models were evaluated in consultation with a HIPAA compliance officer. The SD database models complied fully with HIPAA regulations, and the resulting "shared" data could be distributed freely. Unique patient identifiers were not required for treatment or outcome analysis. Age data were resolved to single-integer years, grouping patients aged > 89 years. Admission, discharge, treatment, and follow-up dates were replaced with enrollment year, and follow-up/outcome intervals were calculated, eliminating the original dates. Two additional data fields identified as PHI (treating physician and facility) were replaced with integer values, and the original data corresponding to these values were stored in the CK database. Use of the algorithm at the time of database design did not increase cost or design effort. The CK-SD model for clinical database design provides an algorithm for investigators to create, maintain, and share clinical research data compliant with HIPAA regulations. This model is applicable to new projects and large institutional datasets, and should decrease regulatory efforts required for conduct of clinical research. Application of the design algorithm early in the clinical research enterprise does not increase cost or the effort of data collection.
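A minimal sketch of the CK-SD split described above is shown below, under assumed field names: PHI stays in the central-key store while the shareable record keeps only derived, less identifying values (single-integer age grouped above 89, enrollment year, follow-up interval, and integer stand-ins for physician and facility).

```python
# Minimal sketch of the CK-SD split: PHI remains in the central key (CK) store,
# and the shareable (SD) record carries only derived values. Field names are
# illustrative, not the report's schema.
from datetime import date
from itertools import count

ck_store = {}                        # PHI side: surrogate key -> original identifiers
_physician_ids, _facility_ids = {}, {}
_next_id = count(1)

def _code(table, value):
    # replace a PHI value with a stable integer; the mapping lives only in CK
    return table.setdefault(value, next(_next_id))

def to_shared_record(key, birth_date, admit_date, followup_date, physician, facility):
    ck_store[key] = {"physician": physician, "facility": facility,
                     "admit_date": admit_date}
    age = admit_date.year - birth_date.year
    return {
        "key": key,
        "age_years": age if age <= 89 else 90,          # ages > 89 grouped together
        "enrollment_year": admit_date.year,
        "followup_days": (followup_date - admit_date).days,
        "physician_id": _code(_physician_ids, physician),
        "facility_id": _code(_facility_ids, facility),
    }

print(to_shared_record("p001", date(1931, 5, 2), date(2005, 3, 10),
                       date(2005, 9, 1), "Dr. A", "Hospital B"))
```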
2012-09-01
This work evaluates the relative performance of several conventional SQL and NoSQL databases against a set of one billion file block hashes, in the context of digital forensics, sector hashing, and the National Software Reference Library (NSRL).
ERIC Educational Resources Information Center
Brown, Cecelia
2003-01-01
Discusses the growth in use and acceptance of Web-based genomic and proteomic databases (GPD) in scholarly communication. Confirms the role of GPD in the scientific literature cycle, suggests GPD are a storage and retrieval mechanism for molecular biology information, and recommends that existing models of scientific communication be updated to…
Family Support in Nursing Homes Serving Residents with a Mental Health History
ERIC Educational Resources Information Center
Frahm, Kathryn; Gammonley, Denise; Zhang, Ning Jackie; Paek, Seung Chun
2010-01-01
Using 2003 nursing home data from the Minimum Data Set (MDS) database, this study investigated the role of family support among nursing homes serving residents with a mental health history. Exploratory factor analysis was used to create and test a conceptual model of family support using indicators located within the MDS database. Families were…
Liang, Li-Jung; Weiss, Robert E; Redelings, Benjamin; Suchard, Marc A
2009-10-01
Statistical analyses of phylogenetic data culminate in uncertain estimates of underlying model parameters. Lack of additional data hinders the ability to reduce this uncertainty, as the original phylogenetic dataset is often complete, containing the entire gene or genome information available for the given set of taxa. Informative priors in a Bayesian analysis can reduce posterior uncertainty; however, publicly available phylogenetic software specifies vague priors for model parameters by default. We build objective and informative priors using hierarchical random effect models that combine additional datasets whose parameters are not of direct interest but are similar to the analysis of interest. We propose principled statistical methods that permit more precise parameter estimates in phylogenetic analyses by creating informative priors for parameters of interest. Using additional sequence datasets from our lab or public databases, we construct a fully Bayesian semiparametric hierarchical model to combine datasets. A dynamic iteratively reweighted Markov chain Monte Carlo algorithm conveniently recycles posterior samples from the individual analyses. We demonstrate the value of our approach by examining the insertion-deletion (indel) process in the enolase gene across the Tree of Life using the phylogenetic software BALI-PHY; we incorporate prior information about indels from 82 curated alignments downloaded from the BAliBASE database.
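As a much-simplified illustration of this idea, the sketch below pools estimates from auxiliary datasets into a normal prior and combines it with the target analysis (a normal-normal empirical-Bayes shortcut, not the paper's semiparametric hierarchical model or its MCMC machinery); all numbers are placeholders.

```python
# Much-simplified illustration: auxiliary datasets yield parameter estimates
# that are pooled into an informative prior for the analysis of interest.
# Values are placeholders; this is not the paper's model.
import numpy as np

aux_estimates = np.array([0.12, 0.08, 0.15, 0.10, 0.09])     # e.g., indel-rate estimates
aux_variances = np.array([0.02, 0.01, 0.03, 0.02, 0.01]) ** 2

prior_mean = np.average(aux_estimates, weights=1.0 / aux_variances)
prior_var = max(np.var(aux_estimates, ddof=1) - aux_variances.mean(), 1e-6)

target_estimate, target_var = 0.20, 0.04 ** 2   # vague-prior result for the target dataset
post_var = 1.0 / (1.0 / prior_var + 1.0 / target_var)
post_mean = post_var * (prior_mean / prior_var + target_estimate / target_var)
print(f"informative-prior posterior: {post_mean:.3f} +/- {np.sqrt(post_var):.3f}")
```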
NASA Astrophysics Data System (ADS)
Pan, Leyun; Cheng, Caixia; Haberkorn, Uwe; Dimitrakopoulou-Strauss, Antonia
2017-05-01
A variety of compartment models are used for the quantitative analysis of dynamic positron emission tomography (PET) data. Traditionally, these models use an iterative fitting (IF) method to find the least squares between the measured and calculated values over time, which may encounter some problems such as the overfitting of model parameters and a lack of reproducibility, especially when handling noisy data or error data. In this paper, a machine learning (ML) based kinetic modeling method is introduced, which can fully utilize a historical reference database to build a moderate kinetic model directly dealing with noisy data but not trying to smooth the noise in the image. Also, due to the database, the presented method is capable of automatically adjusting the models using a multi-thread grid parameter searching technique. Furthermore, a candidate competition concept is proposed to combine the advantages of the ML and IF modeling methods, which could find a balance between fitting to historical data and to the unseen target curve. The machine learning based method provides a robust and reproducible solution that is user-independent for VOI-based and pixel-wise quantitative analysis of dynamic PET data.
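One building block of such an approach can be sketched as follows: a one-tissue compartment model evaluated over a grid of (K1, k2) values and scored against a measured tissue curve; the plasma input function and data below are synthetic placeholders, and the reference-database and candidate-competition machinery of the paper is not reproduced.

```python
# Hedged sketch of a one-tissue compartment model,
# C_t(t) = K1 * integral( Cp(tau) * exp(-k2*(t - tau)) dtau ),
# scored over a (K1, k2) grid against a noisy tissue curve. All data are synthetic.
import numpy as np

t = np.linspace(0, 60, 121)                    # minutes
dt = t[1] - t[0]
cp = np.exp(-0.1 * t) * t                      # synthetic plasma input function

def one_tissue(k1, k2):
    kernel = np.exp(-k2 * t)
    return k1 * np.convolve(cp, kernel)[:t.size] * dt

rng = np.random.default_rng(42)
measured = one_tissue(0.3, 0.15) + rng.normal(0, 0.2, t.size)

k1_grid = np.linspace(0.05, 0.6, 56)
k2_grid = np.linspace(0.02, 0.4, 39)
best = min((np.sum((one_tissue(k1, k2) - measured) ** 2), k1, k2)
           for k1 in k1_grid for k2 in k2_grid)
print("best fit: SSE=%.2f, K1=%.2f, k2=%.2f" % best)
```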
MODELING LEACHING OF VIRUSES BY THE MONTE CARLO METHOD
A predictive screening model was developed for fate and transport of viruses in the unsaturated zone. A database of input parameters allowed Monte Carlo analysis with the model. The resulting kernel densities of predicted attenuation during percolation indicated very ...
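A generic sketch of such a Monte Carlo step is shown below: uncertain inputs are sampled from assumed distributions, a simple first-order attenuation model is run, and the predicted attenuation is summarized with a kernel density; the distributions and attenuation model are placeholders, not the screening model itself.

```python
# Generic Monte Carlo sketch: sample uncertain inputs, run a simple first-order
# attenuation model, and summarize predicted attenuation with a kernel density.
# Distributions and model form are placeholders, not the actual screening model.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
n = 10_000
decay_rate = rng.lognormal(mean=np.log(0.5), sigma=0.5, size=n)   # 1/day (assumed)
travel_time = rng.uniform(1.0, 20.0, size=n)                      # days (assumed)

log10_attenuation = decay_rate * travel_time / np.log(10)   # log10 reduction during percolation
kde = gaussian_kde(log10_attenuation)

print("median log10 reduction:", np.median(log10_attenuation))
print("density at 4 logs of removal:", kde(np.array([4.0]))[0])
```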
AIR QUALITY FORECAST DATABASE AND ANALYSIS
In 2003, NOAA and EPA signed a Memorandum of Agreement to collaborate on the design and implementation of a capability to produce daily air quality modeling forecast information for the U.S. NOAA's ETA meteorological model and EPA's Community Multiscale Air Quality (CMAQ) model ...
Hydroacoustic propagation grids for the CTBT knowledge database: BBN technical memorandum W1303
DOE Office of Scientific and Technical Information (OSTI.GOV)
J. Angell
1998-05-01
The Hydroacoustic Coverage Assessment Model (HydroCAM) has been used to develop components of the hydroacoustic knowledge database required by operational monitoring systems, particularly the US National Data Center (NDC). The database, which consists of travel time, amplitude correction and travel time standard deviation grids, is planned to support source location, discrimination and estimation functions of the monitoring network. The grids will also be used under the current BBN subcontract to support an analysis of the performance of the International Monitoring System (IMS) and national sensor systems. This report describes the format and contents of the hydroacoustic knowledgebase grids, and the procedures and model parameters used to generate these grids. Comparisons between the knowledge grids, measured data and other modeled results are presented to illustrate the strengths and weaknesses of the current approach. A recommended approach for augmenting the knowledge database with a database of expected spectral/waveform characteristics is provided in the final section of the report.
Modeling Real-Time Applications with Reusable Design Patterns
NASA Astrophysics Data System (ADS)
Rekhis, Saoussen; Bouassida, Nadia; Bouaziz, Rafik
Real-Time (RT) applications, which manipulate large volumes of data, need to be managed with RT databases that deal with time-constrained data and time-constrained transactions. In spite of their numerous advantages, RT database development remains a complex task, since developers must study many design issues related to the RT domain. In this paper, we tackle this problem by proposing RT design patterns that allow the modeling of structural and behavioral aspects of RT databases. We show how RT design patterns can provide design assistance through architecture reuse for recurring design problems. In addition, we present a UML profile that represents the patterns and further facilitates their reuse. This profile proposes, on the one hand, UML extensions for modeling the variability of patterns in the RT context and, on the other hand, extensions inspired by the MARTE (Modeling and Analysis of Real-Time Embedded systems) profile.
Specifying the ISS Plasma Environment
NASA Technical Reports Server (NTRS)
Minow, Joseph I.; Diekmann, Anne; Neergaard, Linda; Bui, Them; Mikatarian, Ronald; Barsamian, Hagop; Koontz, Steven
2002-01-01
Quantifying the spacecraft charging risks and corresponding hazards for the International Space Station (ISS) requires a plasma environment specification describing the natural variability of ionospheric temperature (Te) and density (Ne). Empirical ionospheric specification and forecast models such as the International Reference Ionosphere (IRI) model typically provide only estimates of long-term (seasonal) mean Te and Ne values for the low Earth orbit environment. Knowledge of the Te and Ne variability, as well as the likelihood of extreme deviations from the mean values, is required to estimate both the magnitude and frequency of occurrence of potentially hazardous spacecraft charging environments for a given ISS construction stage and flight configuration. This paper describes the statistical analysis of historical ionospheric low Earth orbit plasma measurements used to estimate Ne, Te variability in the ISS flight environment. The statistical variability analysis of Ne and Te enables calculation of the expected frequency of occurrence of any particular values of Ne and Te, especially those that correspond to possibly hazardous spacecraft charging environments. The database used in the original analysis included measurements from the AE-C, AE-D, and DE-2 satellites. Recent work has added additional satellites to the database as well as ground-based incoherent scatter radar observations. Deviations of the data values from the IRI-estimated Ne, Te parameters for each data point provide a statistical basis for modeling the deviations of the plasma environment from the IRI model output.
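A small sketch of the kind of variability statistics described above follows: given paired in-situ measurements and model values, compute the empirical probability that the density deviates from the model by more than a chosen factor; the arrays below are random placeholders, not flight or radar data.

```python
# Sketch of an empirical exceedance calculation for deviations of measured
# density from a model baseline; the arrays are random placeholders.
import numpy as np

rng = np.random.default_rng(1)
ne_measured = rng.lognormal(mean=11.5, sigma=0.6, size=5000)   # placeholder measurements
ne_model = np.full(5000, np.exp(11.5))                         # placeholder model baseline

ratio = ne_measured / ne_model
for threshold in (2.0, 5.0, 10.0):
    p = np.mean((ratio > threshold) | (ratio < 1.0 / threshold))
    print(f"P(deviation beyond factor {threshold:g}) = {p:.3f}")
```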
Database resources of the National Center for Biotechnology Information
Wheeler, David L.; Church, Deanna M.; Lash, Alex E.; Leipe, Detlef D.; Madden, Thomas L.; Pontius, Joan U.; Schuler, Gregory D.; Schriml, Lynn M.; Tatusova, Tatiana A.; Wagner, Lukas; Rapp, Barbara A.
2001-01-01
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources that operate on the data in GenBank and a variety of other biological data made available through NCBI’s Web site. NCBI data retrieval resources include Entrez, PubMed, LocusLink and the Taxonomy Browser. Data analysis resources include BLAST, Electronic PCR, OrfFinder, RefSeq, UniGene, HomoloGene, Database of Single Nucleotide Polymorphisms (dbSNP), Human Genome Sequencing, Human MapViewer, GeneMap’99, Human–Mouse Homology Map, Cancer Chromosome Aberration Project (CCAP), Entrez Genomes, Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, Cancer Genome Anatomy Project (CGAP), SAGEmap, Gene Expression Omnibus (GEO), Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (MMDB) and the Conserved Domain Database (CDD). Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at: http://www.ncbi.nlm.nih.gov. PMID:11125038
Lacagnina, Valerio; Leto-Barone, Maria S; La Piana, Simona; Seidita, Aurelio; Pingitore, Giuseppe; Di Lorenzo, Gabriele
2014-01-01
This article uses a logistic regression model for diagnostic decision making in patients with chronic nasal symptoms. We studied the ability of the logistic regression model, obtained from the evaluation of a database, to detect patients with a positive allergy skin-prick test (SPT) and patients with a negative SPT. The model developed was validated using a data set obtained from another medical institution. The analysis was performed using a database obtained from a questionnaire administered to patients with nasal symptoms, containing personal data, clinical data, and results of allergy testing (SPT). All variables found to be significantly different between patients with positive and negative SPT (p < 0.05) were selected for the logistic regression models and were analyzed with backward stepwise logistic regression, evaluated with the area under the receiver operating characteristic curve. A second set of patients from another institution was used to test the model. The accuracy of the model in identifying, in the second set, both patients whose SPT would be positive and those whose SPT would be negative was high: the model detected 96% of patients with nasal symptoms and a positive SPT and correctly classified 94% of those with a negative SPT. This study is preliminary to the creation of software that could help primary care doctors decide whether allergy testing is needed in patients complaining of chronic nasal symptoms.
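A hedged scikit-learn sketch of this workflow is shown below: fit a logistic model on the development institution's questionnaire data and check discrimination on the external validation set; the feature names and files are assumptions.

```python
# Sketch of the workflow described above: fit a logistic regression on the
# development dataset and evaluate discrimination (ROC AUC) on the external
# validation set. Feature names and file names are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

train = pd.read_csv("institution_a.csv")     # hypothetical questionnaire data
test = pd.read_csv("institution_b.csv")
features = ["age", "sneezing", "itchy_eyes", "seasonality", "family_atopy"]

model = LogisticRegression(max_iter=1000)
model.fit(train[features], train["spt_positive"])

probs = model.predict_proba(test[features])[:, 1]
print("external ROC AUC:", roc_auc_score(test["spt_positive"], probs))
```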
[Development of an analyzing system for soil parameters based on NIR spectroscopy].
Zheng, Li-Hua; Li, Min-Zan; Sun, Hong
2009-10-01
A rapid estimation system for soil parameters based on spectral analysis was developed using object-oriented (OO) technology. A SOIL class was designed; an instance of the SOIL class is an object representing soil samples of a particular type with specific physical properties and spectral characteristics. By extracting the effective information from the modeling spectral data of a soil object, a mapping model was established between the soil parameters and their spectral data, and the mapping model parameters could be saved in the model database. When forecasting the content of any soil parameter, the corresponding prediction model can be selected for objects with the same soil type and similar soil physical properties. After the target soil-sample object is passed to the prediction model and processed by the system, an accurate forecast of the content of the target soil samples is obtained. The system includes modules for file operations, spectra pretreatment, sample analysis, calibration and validation, and sample content forecasting, and it was designed to run independently of the measurement equipment. The parameters and spectral data files (*.xls) of known soil samples can be input into the system. With the data pretreatment selected according to the conditions at hand, the predicted content appears in the terminal and the forecasting model can be stored in the model database. Through the module interface, the system reads the prediction models and their parameters saved in the model database, and the data of the tested samples are then passed into the selected model. Finally, the content of the soil parameters is predicted by the developed system. The system was programmed with Visual C++ 6.0 and Matlab 7.0, and Access XP was used to create and manage the model database.
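The abstract does not name a specific regression technique; as a hedged sketch of a typical NIR calibration of the kind described (spectral pretreatment, a stored mapping model, prediction for a new target sample), one might use partial least squares with a Savitzky-Golay pretreatment. The data, band count, and soil parameter below are illustrative assumptions, not the system's actual implementation (which used Visual C++ and Matlab).

```python
# Illustrative NIR calibration sketch: pretreatment + PLS mapping model + prediction.
import numpy as np
from scipy.signal import savgol_filter
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(1)
spectra = rng.normal(size=(60, 200))                       # hypothetical spectra: 60 samples x 200 bands
som = 2.0 + 0.02 * spectra[:, 50:60].sum(axis=1) \
      + rng.normal(scale=0.1, size=60)                     # hypothetical soil organic matter (%)

X = savgol_filter(spectra, window_length=11, polyorder=2, axis=1)  # spectral pretreatment

model = PLSRegression(n_components=8).fit(X, som)          # the stored "mapping model"

# Predict the parameter content of a new target sample with the stored model.
new_spectrum = savgol_filter(rng.normal(size=(1, 200)), 11, 2, axis=1)
print("predicted SOM:", float(model.predict(new_spectrum)[0, 0]))
```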
NASA Astrophysics Data System (ADS)
Löbling, L.
2017-03-01
Aluminum (Al) nucleosynthesis takes place during the asymptotic-giant-branch (AGB) phase of stellar evolution. Al abundance determinations in hot white dwarf stars provide constraints to understand this process. Precise abundance measurements require advanced non-local thermodynamic equilibrium stellar-atmosphere models and reliable atomic data. In the framework of the German Astrophysical Virtual Observatory (GAVO), the Tübingen Model-Atom Database (TMAD) contains ready-to-use model atoms for elements from hydrogen to barium. A revised, elaborated Al model atom has recently been added. We present preliminary stellar-atmosphere models and emergent Al line spectra for the hot white dwarfs G191-B2B and RE 0503-289.
Database resources of the National Center for Biotechnology Information
Wheeler, David L.; Barrett, Tanya; Benson, Dennis A.; Bryant, Stephen H.; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M.; DiCuccio, Michael; Edgar, Ron; Federhen, Scott; Geer, Lewis Y.; Helmberg, Wolfgang; Kapustin, Yuri; Kenton, David L.; Khovayko, Oleg; Lipman, David J.; Madden, Thomas L.; Maglott, Donna R.; Ostell, James; Pruitt, Kim D.; Schuler, Gregory D.; Schriml, Lynn M.; Sequeira, Edwin; Sherry, Stephen T.; Sirotkin, Karl; Souvorov, Alexandre; Starchenko, Grigory; Suzek, Tugba O.; Tatusov, Roman; Tatusova, Tatiana A.; Wagner, Lukas; Yaschenko, Eugene
2006-01-01
In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI's Web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups, Retroviral Genotyping Tools, HIV-1, Human Protein Interaction Database, SAGEmap, Gene Expression Omnibus, Entrez Probe, GENSAT, Online Mendelian Inheritance in Man, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized datasets. All of the resources can be accessed through the NCBI home page at: http://www.ncbi.nlm.nih.gov. PMID:16381840
Database resources of the National Center for Biotechnology Information.
Sayers, Eric W; Barrett, Tanya; Benson, Dennis A; Bolton, Evan; Bryant, Stephen H; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M; Dicuccio, Michael; Federhen, Scott; Feolo, Michael; Fingerman, Ian M; Geer, Lewis Y; Helmberg, Wolfgang; Kapustin, Yuri; Krasnov, Sergey; Landsman, David; Lipman, David J; Lu, Zhiyong; Madden, Thomas L; Madej, Tom; Maglott, Donna R; Marchler-Bauer, Aron; Miller, Vadim; Karsch-Mizrachi, Ilene; Ostell, James; Panchenko, Anna; Phan, Lon; Pruitt, Kim D; Schuler, Gregory D; Sequeira, Edwin; Sherry, Stephen T; Shumway, Martin; Sirotkin, Karl; Slotta, Douglas; Souvorov, Alexandre; Starchenko, Grigory; Tatusova, Tatiana A; Wagner, Lukas; Wang, Yanli; Wilbur, W John; Yaschenko, Eugene; Ye, Jian
2012-01-01
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Website. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central (PMC), Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, Genome and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, BioProject, BioSample, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Probe, Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), Biosystems, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
Database resources of the National Center for Biotechnology
Wheeler, David L.; Church, Deanna M.; Federhen, Scott; Lash, Alex E.; Madden, Thomas L.; Pontius, Joan U.; Schuler, Gregory D.; Schriml, Lynn M.; Sequeira, Edwin; Tatusova, Tatiana A.; Wagner, Lukas
2003-01-01
In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI's Web site. NCBI resources include Entrez, PubMed, PubMed Central (PMC), LocusLink, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR (e-PCR), Open Reading Frame (ORF) Finder, Reference Sequence (RefSeq), UniGene, HomoloGene, ProtEST, Database of Single Nucleotide Polymorphisms (dbSNP), Human/Mouse Homology Map, Cancer Chromosome Aberration Project (CCAP), Entrez Genomes and related tools, the Map Viewer, Model Maker (MM), Evidence Viewer (EV), Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, SAGEmap, Gene Expression Omnibus (GEO), Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), and the Conserved Domain Architecture Retrieval Tool (CDART). Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at: http://www.ncbi.nlm.nih.gov. PMID:12519941
DOE Office of Scientific and Technical Information (OSTI.GOV)
W. Lynn Watney; John H. Doveton
GEMINI (Geo-Engineering Modeling through Internet Informatics) is a public-domain web application focused on analysis and modeling of petroleum reservoirs and plays (http://www.kgs.ukans.edu/Gemini/index.html). GEMINI creates a virtual project by "on-the-fly" assembly and analysis of on-line data either from the Kansas Geological Survey or uploaded from the user. GEMINI's suite of geological and engineering web applications for reservoir analysis include: (1) petrofacies-based core and log modeling using an interactive relational rock catalog and log analysis modules; (2) a well profile module; (3) interactive cross sections to display "marked" wireline logs; (4) deterministic gridding and mapping of petrophysical data; (5) calculation and mapping of layer volumetrics; (6) material balance calculations; (7) PVT calculator; (8) DST analyst; (9) automated hydrocarbon association navigator (KHAN) for database mining; and (10) tutorial and help functions. The Kansas Hydrocarbon Association Navigator (KHAN) utilizes petrophysical databases to estimate hydrocarbon pay or other constituent at a play- or field-scale. Databases analyzed and displayed include digital logs, core analysis and photos, DST, and production data. GEMINI accommodates distant collaborations using secure password protection and authorized access. Assembled data, analyses, charts, and maps can readily be moved to other applications. GEMINI's target audience includes small independents and consultants seeking to find, quantitatively characterize, and develop subtle and bypassed pays by leveraging the growing base of digital data resources. Participating companies involved in the testing and evaluation of GEMINI included Anadarko, BP, Conoco-Phillips, Lario, Mull, Murfin, and Pioneer Resources.
A BRDF-BPDF database for the analysis of Earth target reflectances
NASA Astrophysics Data System (ADS)
Breon, Francois-Marie; Maignan, Fabienne
2017-01-01
Land surface reflectance is not isotropic. It varies with the observation geometry, which is defined by the sun and view zenith angles and the relative azimuth. In addition, the reflectance is linearly polarized. The reflectance anisotropy is quantified by the bidirectional reflectance distribution function (BRDF), while its polarization properties are defined by the bidirectional polarization distribution function (BPDF). The POLDER radiometer that flew onboard the PARASOL microsatellite remains the only space instrument that measured numerous samples of the BRDF and BPDF of Earth targets. Here, we describe a database of representative BRDFs and BPDFs derived from the POLDER measurements. From the huge number of data acquired by the spaceborne instrument over a period of 7 years, we selected a set of targets with high-quality observations. The selection aimed for a large number of observations, free of significant cloud or aerosol contamination, acquired in diverse observation geometries with a focus on the backscatter direction that shows the specific hot spot signature. The targets are sorted according to the 16-class International Geosphere-Biosphere Programme (IGBP) land cover classification system, and the target selection aims at a spatial representativeness within the class. The database thus provides a set of high-quality BRDF and BPDF samples that can be used to assess the typical variability of natural surface reflectances or to evaluate models. It is available freely from the PANGAEA website (doi:10.1594/PANGAEA.864090). In addition to the database, we provide a visualization and analysis tool based on the Interactive Data Language (IDL). It allows an interactive analysis of the measurements and a comparison against various BRDF and BPDF analytical models. The present paper describes the input data, the selection principles, the database format, and the analysis tool.
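The abstract quantifies anisotropy via the BRDF without restating its definition; for orientation, the standard radiometric definition (our addition, not taken from the paper) expresses the BRDF as the ratio of reflected radiance to incident irradiance, with the BPDF defined analogously for the linearly polarized part of the reflected radiance.

```latex
% Standard BRDF definition; omega_i and omega_r are the incident and reflected
% directions, theta_i the incidence zenith angle.
f_r(\omega_i,\omega_r) \;=\; \frac{\mathrm{d}L_r(\omega_r)}{\mathrm{d}E_i(\omega_i)}
\;=\; \frac{\mathrm{d}L_r(\omega_r)}{L_i(\omega_i)\,\cos\theta_i\,\mathrm{d}\omega_i}
\quad [\mathrm{sr}^{-1}]
```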
Uddin, Md Jamal; Groenwold, Rolf H H; de Boer, Anthonius; Gardarsdottir, Helga; Martin, Elisa; Candore, Gianmario; Belitser, Svetlana V; Hoes, Arno W; Roes, Kit C B; Klungel, Olaf H
2016-03-01
Instrumental variable (IV) analysis can control for unmeasured confounding, yet it has not been widely used in pharmacoepidemiology. We aimed to assess the performance of IV analysis using different IVs in multiple databases in a study of antidepressant use and hip fracture. Information on adults with at least one prescription of a selective serotonin reuptake inhibitor (SSRI) or tricyclic antidepressant (TCA) during 2001-2009 was extracted from the THIN (UK), BIFAP (Spain), and Mondriaan (Netherlands) databases. IVs were created using the proportion of SSRI prescriptions per practice or using the one, five, or ten previous prescriptions by a physician. Data were analysed using conventional Cox regression and two-stage IV models. In the conventional analysis, SSRI (vs. TCA) was associated with an increased risk of hip fracture, which was consistently found across databases: the adjusted hazard ratio (HR) was approximately 1.35 for time-fixed and 1.50 to 2.49 for time-varying SSRI use, while the IV analysis based on the IVs that appeared to satisfy the IV assumptions showed conflicting results, e.g. the adjusted HRs ranged from 0.55 to 2.75 for time-fixed exposure. IVs for time-varying exposure violated at least one IV assumption and were therefore invalid. This multiple database study shows that the performance of IV analysis varied across the databases for time-fixed and time-varying exposures and strongly depends on the definition of IVs. It remains challenging to obtain valid IVs in pharmacoepidemiological studies, particularly for time-varying exposure, and IV analysis should therefore be interpreted cautiously. Copyright © 2016 John Wiley & Sons, Ltd.
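To make the two-stage idea concrete, the sketch below runs a simplified two-stage least-squares (2SLS) estimate on synthetic data with an unmeasured confounder. It is a deliberately reduced stand-in for the paper's analysis: the outcome is continuous rather than a time-to-event endpoint, no Cox models are fitted, and the instrument, effect sizes, and sample size are invented for illustration.

```python
# Simplified 2SLS sketch: stage 1 regresses exposure on the instrument,
# stage 2 regresses the outcome on the stage-1 prediction.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n = 5000
u = rng.normal(size=n)                                  # unmeasured confounder
z = rng.binomial(1, 0.5, size=n).astype(float)          # instrument, e.g. physician's previous prescription
treat = (0.8 * z + 0.5 * u + rng.normal(size=n) > 0.6).astype(float)  # SSRI (1) vs TCA (0)
y = 0.3 * treat + 0.7 * u + rng.normal(size=n)          # continuous stand-in for the outcome

stage1 = LinearRegression().fit(z.reshape(-1, 1), treat)
treat_hat = stage1.predict(z.reshape(-1, 1))
stage2 = LinearRegression().fit(treat_hat.reshape(-1, 1), y)

print("naive estimate:", LinearRegression().fit(treat.reshape(-1, 1), y).coef_[0])
print("IV (2SLS) estimate:", stage2.coef_[0])
```

With these synthetic parameters the naive estimate absorbs the confounding through u, while the 2SLS estimate recovers a value near the true effect of 0.3, which is the behaviour the IV assumptions are meant to buy.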
Database resources of the National Center for Biotechnology Information
Wheeler, David L.; Barrett, Tanya; Benson, Dennis A.; Bryant, Stephen H.; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M.; DiCuccio, Michael; Edgar, Ron; Federhen, Scott; Feolo, Michael; Geer, Lewis Y.; Helmberg, Wolfgang; Kapustin, Yuri; Khovayko, Oleg; Landsman, David; Lipman, David J.; Madden, Thomas L.; Maglott, Donna R.; Miller, Vadim; Ostell, James; Pruitt, Kim D.; Schuler, Gregory D.; Shumway, Martin; Sequeira, Edwin; Sherry, Steven T.; Sirotkin, Karl; Souvorov, Alexandre; Starchenko, Grigory; Tatusov, Roman L.; Tatusova, Tatiana A.; Wagner, Lukas; Yaschenko, Eugene
2008-01-01
In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data available through NCBI's web site. NCBI resources include Entrez, the Entrez Programming Utilities, My NCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link, Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genome, Genome Project and related tools, the Trace, Assembly, and Short Read Archives, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups, Influenza Viral Resources, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Entrez Probe, GENSAT, Database of Genotype and Phenotype, Online Mendelian Inheritance in Man, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool and the PubChem suite of small molecule databases. Augmenting the web applications are custom implementations of the BLAST program optimized to search specialized data sets. These resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov. PMID:18045790
Using an Integrated, Multi-disciplinary Framework to Support Quantitative Microbial Risk Assessments
The Framework for Risk Analysis in Multimedia Environmental Systems (FRAMES) provides the infrastructure to link disparate models and databases seamlessly, giving an assessor the ability to construct an appropriate conceptual site model from a host of modeling choices, so a numbe...
Viral genome analysis and knowledge management.
Kuiken, Carla; Yoon, Hyejin; Abfalterer, Werner; Gaschen, Brian; Lo, Chienchi; Korber, Bette
2013-01-01
One of the challenges of genetic data analysis is to combine information from sources that are distributed around the world and accessible through a wide array of different methods and interfaces. The HIV database and the databases that followed in its footsteps, the hepatitis C virus (HCV) and hemorrhagic fever virus (HFV) databases, have made it their mission to make different data types easily available to their users. This involves a large amount of behind-the-scenes processing, including quality control and analysis of the sequences and their annotation. Gene and protein sequences are distilled from the sequences that are stored in GenBank; to this end, both submitter annotation and script-generated sequences are used. Alignments of both nucleotide and amino acid sequences are generated, manually curated, distilled into an alignment model, and regenerated in an iterative cycle that results in ever better new alignments. Annotation of epidemiological and clinical information is parsed, checked, and added to the database. User interfaces are updated, and new interfaces are added based upon user requests. Vital for its success, the database staff are heavy users of the system, which enables them to fix bugs and find opportunities for improvement. In this chapter we describe some of the infrastructure that keeps these heavily used analysis platforms alive and vital after nearly 25 years of use. The database/analysis platforms described in this chapter can be accessed at http://hiv.lanl.gov http://hcv.lanl.gov http://hfv.lanl.gov.
Naithani, Sushma; Jaiswal, Pankaj
2017-01-01
The species-specific plant Pathway Genome Databases (PGDBs) based on the BioCyc platform provide a conceptual model of the cellular metabolic network of an organism. Such frameworks allow analysis of the genome-scale expression data to understand changes in the overall metabolisms of an organism (or organs, tissues, and cells) in response to various intrinsic (e.g. developmental and differentiation) and/or extrinsic signals (e.g. pathogens and abiotic stresses) from the surrounding environment. Using FragariaCyc, a pathway database for the diploid strawberry Fragaria vesca, we show (1) the basic navigation across a PGDB; (2) a case study of pathway comparison across plant species; and (3) an example of RNA-Seq data analysis using Omics Viewer tool. The protocols described here generally apply to other Pathway Tools-based PGDBs.
Modeling Constellation Virtual Missions Using the Vdot(Trademark) Process Management Tool
NASA Technical Reports Server (NTRS)
Hardy, Roger; ONeil, Daniel; Sturken, Ian; Nix, Michael; Yanez, Damian
2011-01-01
The authors have identified a software tool suite that will support NASA's Virtual Mission (VM) effort. This is accomplished by transforming a spreadsheet database of mission events, task inputs and outputs, timelines, and organizations into process visualization tools and a Vdot process management model that includes embedded analysis software as well as requirements and information related to data manipulation and transfer. This paper describes the progress to date, the application of the Virtual Mission not only to Constellation but to other architectures, and the pertinence to other aerospace applications. Vdot's intuitive visual interface brings VMs to life by turning static, paper-based processes into active, electronic processes that can be deployed, executed, managed, verified, and continuously improved. A VM can be executed using a computer-based, human-in-the-loop, real-time format, under the direction and control of the NASA VM Manager. Engineers in the various disciplines will not have to be Vdot-proficient but rather can fill out on-line, Excel-type databases with the mission information discussed above. The authors' tool suite converts this database into several process visualization tools for review and into Microsoft Project, which can be imported directly into Vdot. Many tools can be embedded directly into Vdot, and when the necessary data/information is received from a preceding task, the analysis can be initiated automatically. Other NASA analysis tools are too complex for this process, but Vdot automatically notifies the tool user that the data has been received and analysis can begin. The VM can be simulated from end to end using the authors' tool suite. The planned approach for the Vdot-based process simulation is to generate the process model from a database; other advantages of this semi-automated approach are that participants can be geographically remote and that, after the process models are refined via the human-in-the-loop simulation, the system can evolve into a process management server for the actual process.
Anderson, Beth M.; Stevens, Michael C.; Glahn, David C.; Assaf, Michal; Pearlson, Godfrey D.
2013-01-01
We present a modular, high-performance, open-source database system that incorporates popular neuroimaging database features with novel peer-to-peer sharing, and a simple installation. An increasing number of imaging centers have created a massive amount of neuroimaging data since fMRI became popular more than 20 years ago, with much of that data unshared. The Neuroinformatics Database (NiDB) provides a stable platform to store and manipulate neuroimaging data and addresses several of the impediments to data sharing presented by the INCF Task Force on Neuroimaging Datasharing, including 1) motivation to share data, 2) technical issues, and 3) standards development. NiDB solves these problems by 1) minimizing PHI use and providing a cost-effective, simple, locally stored platform, 2) storing and associating all data (including genome) with a subject and creating a peer-to-peer sharing model, and 3) defining a simple, normalized data storage structure that is used in NiDB. NiDB not only simplifies the local storage and analysis of neuroimaging data, but also enables simple sharing of raw data and analysis methods, which may encourage further sharing. PMID:23912507
SPIRE Data-Base Management System
NASA Technical Reports Server (NTRS)
Fuechsel, C. F.
1984-01-01
Spacelab Payload Integration and Rocket Experiment (SPIRE) data-base management system (DBMS) based on relational model of data bases. Data bases typically used for engineering and mission analysis tasks and, unlike most commercially available systems, allow data items and data structures stored in forms suitable for direct analytical computation. SPIRE DBMS designed to support data requests from interactive users as well as applications programs.
Timothy G.F. Kittel; Nan. A. Rosenbloom; J.A. Royle; C. Daly; W.P. Gibson; H.H. Fisher; P. Thornton; D.N. Yates; S. Aulenbach; C. Kaufman; R. McKeown; Dominque Bachelet; David S. Schimel
2004-01-01
Analysis and simulation of biospheric responses to historical forcing require surface climate data that capture those aspects of climate that control ecological processes, including key spatial gradients and modes of temporal variability. We developed a multivariate, gridded historical climate dataset for the conterminous USA as a common input database for the...
NASA Astrophysics Data System (ADS)
Michel-Sendis, Franco; Martinez-González, Jesus; Gauld, Ian
2017-09-01
SFCOMPO-2.0 is a database of experimental isotopic concentrations measured in destructive radiochemical analysis of spent nuclear fuel (SNF) samples. The database includes corresponding design description of the fuel rods and assemblies, relevant operating conditions and characteristics of the host reactors necessary for modelling and simulation. Aimed at establishing a thorough, reliable, and publicly available resource for code and data validation of safety-related applications, SFCOMPO-2.0 is developed and maintained by the OECD Nuclear Energy Agency (NEA). The SFCOMPO-2.0 database is a Java application which is downloadable from the NEA website.
Multidimensional Learner Model In Intelligent Learning System
NASA Astrophysics Data System (ADS)
Deliyska, B.; Rozeva, A.
2009-11-01
The learner model in an intelligent learning system (ILS) has to ensure the personalization (individualization) and the adaptability of e-learning in an online learner-centered environment. An ILS is a distributed e-learning system whose modules can be independent and located in different nodes (servers) on the Web. This kind of e-learning is achieved through the resources of the Semantic Web and is designed and developed around a course, group of courses or specialty. An essential part of an ILS is the learner model database, which contains structured data about the learner profile and temporal status in the learning process of one or more courses. In the paper, the position of the learner model within the ILS is considered and a relational database is designed from the learner's domain ontology. A multidimensional modeling agent for the source database is designed and the resulting learner data cube is presented. The agent's modules are proposed with corresponding algorithms and procedures. Guidelines for multidimensional (OLAP) analysis of the resulting learner model for designing a dynamic learning strategy are highlighted.
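As a toy illustration of the OLAP-style aggregation an agent like this performs, the snippet below builds a small "learner data cube" from a flat fact table with pandas. The fact table, dimension names (course, week, activity), and measure are illustrative assumptions, not the paper's actual schema.

```python
# Toy learner data cube: aggregate a score measure over three dimensions.
import pandas as pd

facts = pd.DataFrame({
    "learner":  ["a1", "a1", "b2", "b2", "c3", "c3"],
    "course":   ["math", "math", "math", "physics", "physics", "math"],
    "week":     [1, 2, 1, 1, 2, 2],
    "activity": ["quiz", "forum", "quiz", "quiz", "forum", "quiz"],
    "score":    [70, 3, 85, 60, 5, 90],
})

cube = pd.pivot_table(facts, values="score",
                      index=["course", "week"], columns="activity",
                      aggfunc="mean")
print(cube)

# "Slicing" the cube for one course is then an ordinary index selection.
print(cube.loc["math"])
```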
The aquatic animals' transcriptome resource for comparative functional analysis.
Chou, Chih-Hung; Huang, Hsi-Yuan; Huang, Wei-Chih; Hsu, Sheng-Da; Hsiao, Chung-Der; Liu, Chia-Yu; Chen, Yu-Hung; Liu, Yu-Chen; Huang, Wei-Yun; Lee, Meng-Lin; Chen, Yi-Chang; Huang, Hsien-Da
2018-05-09
Aquatic animals have great economic and ecological importance. Among them, non-model organisms have been studied regarding eco-toxicity, stress biology, and environmental adaptation. Due to recent advances in next-generation sequencing techniques, large amounts of RNA-seq data for aquatic animals are publicly available. However, no comprehensive resource currently exists for the analysis, unification, and integration of these datasets. This study utilizes computational approaches to build a new resource of transcriptomic maps for aquatic animals. The aquatic animal transcriptome map database (dbATM) provides de novo transcriptome assembly, gene annotation and comparative analysis for more than twenty aquatic organisms without a draft genome. To improve the assembly quality, three computational tools (Trinity, Oases and SOAPdenovo-Trans) were employed to enhance individual transcriptome assembly, and CAP3 and CD-HIT-EST software were then used to merge these three assembled transcriptomes. In addition, functional annotation analysis provides valuable clues to gene characteristics, including full-length transcript coding regions, conserved domains, gene ontology and KEGG pathways. Furthermore, all aquatic animal gene sets support comparative genomics tasks such as constructing homologous gene groups, building BLAST databases and performing phylogenetic analysis. In conclusion, we establish a resource for non-model aquatic animals, which are of great economic and ecological importance, and provide transcriptomic information including functional annotation and comparative transcriptome analysis. The database is publicly accessible at http://dbATM.mbc.nctu.edu.tw/.
MetPetDB: A database for metamorphic geochemistry
NASA Astrophysics Data System (ADS)
Spear, Frank S.; Hallett, Benjamin; Pyle, Joseph M.; Adalı, Sibel; Szymanski, Boleslaw K.; Waters, Anthony; Linder, Zak; Pearce, Shawn O.; Fyffe, Matthew; Goldfarb, Dennis; Glickenhouse, Nickolas; Buletti, Heather
2009-12-01
We present a data model for the initial implementation of MetPetDB, a geochemical database specific to metamorphic rock samples. The database is designed around the concept of preservation of spatial relationships, at all scales, of chemical analyses and their textural setting. Objects in the database (samples) represent physical rock samples; each sample may contain one or more subsamples with associated geochemical and image data. Samples, subsamples, geochemical data, and images are described with attributes (some required, some optional); these attributes also serve as search delimiters. All data in the database are classified as published (i.e., archived or published data), public or private. Public and published data may be freely searched and downloaded. All private data is owned; permission to view, edit, download and otherwise manipulate private data may be granted only by the data owner; all such editing operations are recorded by the database to create a data version log. The sharing of data permissions among a group of collaborators researching a common sample is done by the sample owner through the project manager. User interaction with MetPetDB is hosted by a web-based platform based upon the Java servlet application programming interface, with the PostgreSQL relational database. The database web portal includes modules that allow the user to interact with the database: registered users may save and download public and published data, upload private data, create projects, and assign permission levels to project collaborators. An Image Viewer module provides for spatial integration of image and geochemical data. A toolkit consisting of plotting and geochemical calculation software for data analysis and a mobile application for viewing the public and published data is being developed. Future issues to address include population of the database, integration with other geochemical databases, development of the analysis toolkit, creation of data models for derivative data, and building a community-wide user base. It is believed that this and other geochemical databases will enable more productive collaborations, generate more efficient research efforts, and foster new developments in basic research in the field of solid earth geochemistry.
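To make the sample/subsample/analysis relationships concrete, the following sketch creates a heavily simplified relational layout and runs a permission-aware query. SQLite stands in for PostgreSQL, and all table and column names are hypothetical, not the actual MetPetDB schema.

```python
# Hypothetical, simplified sketch of sample -> subsample -> spatially located analysis.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sample (
    sample_id   INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    owner       TEXT NOT NULL,
    visibility  TEXT CHECK (visibility IN ('published', 'public', 'private'))
);
CREATE TABLE subsample (
    subsample_id INTEGER PRIMARY KEY,
    sample_id    INTEGER NOT NULL REFERENCES sample(sample_id),
    label        TEXT
);
CREATE TABLE chemical_analysis (
    analysis_id  INTEGER PRIMARY KEY,
    subsample_id INTEGER NOT NULL REFERENCES subsample(subsample_id),
    mineral      TEXT,
    oxide        TEXT,
    wt_percent   REAL,
    x_um         REAL,   -- spot position, preserving the textural (spatial) context
    y_um         REAL
);
""")
conn.execute("INSERT INTO sample VALUES (1, 'GN-12', 'spear', 'public')")
conn.execute("INSERT INTO subsample VALUES (1, 1, 'thin section A')")
conn.execute("INSERT INTO chemical_analysis VALUES (1, 1, 'garnet', 'FeO', 28.4, 120.0, 85.5)")

# Only public/published data are freely searchable; private data would need owner checks.
for row in conn.execute("""
    SELECT s.name, c.mineral, c.oxide, c.wt_percent
    FROM sample s
    JOIN subsample ss ON ss.sample_id = s.sample_id
    JOIN chemical_analysis c ON c.subsample_id = ss.subsample_id
    WHERE s.visibility IN ('public', 'published')"""):
    print(row)
```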
Efficient hemodynamic event detection utilizing relational databases and wavelet analysis
NASA Technical Reports Server (NTRS)
Saeed, M.; Mark, R. G.
2001-01-01
Development of a temporal query framework for time-oriented medical databases has hitherto been a challenging problem. We describe a novel method for the detection of hemodynamic events in multiparameter trends utilizing wavelet coefficients in a MySQL relational database. Storage of the wavelet coefficients allowed for a compact representation of the trends, and provided robust descriptors for the dynamics of the parameter time series. A data model was developed to allow for simplified queries along several dimensions and time scales. Of particular importance, the data model and wavelet framework allowed for queries to be processed with minimal table-join operations. A web-based search engine was developed to allow for user-defined queries. Typical queries required between 0.01 and 0.02 seconds, with at least two orders of magnitude improvement in speed over conventional queries. This powerful and innovative structure will facilitate research on large-scale time-oriented medical databases.
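The snippet below sketches the core idea in miniature: decompose a parameter trend with a discrete wavelet transform and store the coefficients in a relational table so that trend queries operate on the compact representation rather than the raw samples. PyWavelets and SQLite stand in for the wavelet machinery and the MySQL back end; the table layout, signal, and threshold are illustrative assumptions.

```python
# Minimal sketch: wavelet coefficients of a hemodynamic trend stored in a relational table.
import sqlite3
import numpy as np
import pywt

rng = np.random.default_rng(3)
heart_rate = 80 + np.cumsum(rng.normal(scale=0.5, size=1024))
heart_rate[512:] += 25                      # injected abrupt change to detect

# Multilevel discrete wavelet transform: a compact description of the trend.
coeffs = pywt.wavedec(heart_rate, "db4", level=4)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wavelet (level INTEGER, idx INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO wavelet VALUES (?, ?, ?)",
    [(lvl, i, float(v)) for lvl, arr in enumerate(coeffs) for i, v in enumerate(arr)],
)

# Example "event" query: large detail coefficients flag abrupt hemodynamic changes.
rows = conn.execute(
    "SELECT level, idx, value FROM wavelet WHERE level > 0 AND ABS(value) > 10.0"
).fetchall()
print(f"{len(rows)} candidate change points")
```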
Assessing efficiency of software production for NASA-SEL data
NASA Technical Reports Server (NTRS)
Vonmayrhauser, Anneliese; Roeseler, Armin
1993-01-01
This paper uses production models to identify and quantify efficient allocation of resources and key drivers of software productivity for project data in the NASA-SEL database. While analysis allows identification of efficient projects, many of the metrics that could have provided a more detailed analysis are not at a level of measurement to allow production model analysis. Production models must be used with proper parameterization to be successful. This may mean a new look at which metrics are helpful for efficiency assessment.
Texas flexible pavements overlays : review and analysis of existing databases.
DOT National Transportation Integrated Search
2011-12-01
Proper calibration of pavement design and rehabilitation performance models to conditions in Texas is essential for cost-effective flexible pavement design. The degree of excellence with which TxDOT's pavement design models are calibrated will d...
An object model and database for functional genomics.
Jones, Andrew; Hunt, Ela; Wastling, Jonathan M; Pizarro, Angel; Stoeckert, Christian J
2004-07-10
Large-scale functional genomics analysis is now feasible and presents significant challenges in data analysis, storage and querying. Data standards are required to enable the development of public data repositories and to improve data sharing. There is an established data format for microarrays (microarray gene expression markup language, MAGE-ML) and a draft standard for proteomics (PEDRo). We believe that all types of functional genomics experiments should be annotated in a consistent manner, and we hope to open up new ways of comparing multiple datasets used in functional genomics. We have created a functional genomics experiment object model (FGE-OM), developed from the microarray model, MAGE-OM and two models for proteomics, PEDRo and our own model (Gla-PSI-Glasgow Proposal for the Proteomics Standards Initiative). FGE-OM comprises three namespaces representing (i) the parts of the model common to all functional genomics experiments; (ii) microarray-specific components; and (iii) proteomics-specific components. We believe that FGE-OM should initiate discussion about the contents and structure of the next version of MAGE and the future of proteomics standards. A prototype database called RNA And Protein Abundance Database (RAPAD), based on FGE-OM, has been implemented and populated with data from microbial pathogenesis. FGE-OM and the RAPAD schema are available from http://www.gusdb.org/fge.html, along with a set of more detailed diagrams. RAPAD can be accessed by registration at the site.
Stochastic Model for the Vocabulary Growth in Natural Languages
NASA Astrophysics Data System (ADS)
Gerlach, Martin; Altmann, Eduardo G.
2013-04-01
We propose a stochastic model for the number of different words in a given database which incorporates the dependence on the database size and historical changes. The main feature of our model is the existence of two different classes of words: (i) a finite number of core words, which have higher frequency and do not affect the probability of a new word to be used, and (ii) the remaining virtually infinite number of noncore words, which have lower frequency and, once used, reduce the probability of a new word to be used in the future. Our model relies on a careful analysis of the Google Ngram database of books published in the last centuries, and its main consequence is the generalization of Zipf’s and Heaps’ law to two-scaling regimes. We confirm that these generalizations yield the best simple description of the data among generic descriptive models and that the two free parameters depend only on the language but not on the database. From the point of view of our model, the main change on historical time scales is the composition of the specific words included in the finite list of core words, which we observe to decay exponentially in time with a rate of approximately 30 words per year for English.
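The toy simulation below is inspired by the two-class mechanism described in the abstract: a fixed set of core words plus non-core words whose accumulation damps the arrival of further new words, producing sublinear (Heaps'-law-like) vocabulary growth. It is not the authors' fitted model, and every parameter value is purely illustrative.

```python
# Toy two-class vocabulary growth simulation (illustrative parameters only).
import numpy as np

rng = np.random.default_rng(4)
N_CORE = 1000        # finite set of core words (always available, high frequency)
P_NEW0 = 0.05        # initial probability that a token is a brand-new non-core word

used_noncore = 0
vocab_growth = []
for n in range(1, 200_001):
    # Each non-core word already in use reduces the chance of a brand-new word.
    p_new = P_NEW0 / (1.0 + 1e-3 * used_noncore)
    if rng.random() < p_new:
        used_noncore += 1
    vocab_growth.append(N_CORE + used_noncore)

# Heaps'-law-style check: vocabulary grows sublinearly with text size.
print("vocabulary after 1e5 tokens:", vocab_growth[99_999])
print("vocabulary after 2e5 tokens:", vocab_growth[-1])
```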
Specification of the ISS Plasma Environment Variability
NASA Technical Reports Server (NTRS)
Minow, Joseph I.; Neergaard, Linda F.; Bui, Them H.; Mikatarian, Ronald R.; Barsamian, H.; Koontz, Steven L.
2002-01-01
Quantifying the spacecraft charging risks and corresponding hazards for the International Space Station (ISS) requires a plasma environment specification describing the natural variability of ionospheric temperature (Te) and density (Ne). Empirical ionospheric specification and forecast models such as the International Reference Ionosphere (IRI) model typically only provide estimates of long term (seasonal) mean Te and Ne values for the low Earth orbit environment. Knowledge of the Te and Ne variability as well as the likelihood of extreme deviations from the mean values are required to estimate both the magnitude and frequency of occurrence of potentially hazardous spacecraft charging environments for a given ISS construction stage and flight configuration. This paper describes the statistical analysis of historical ionospheric low Earth orbit plasma measurements used to estimate Ne, Te variability in the ISS flight environment. The statistical variability analysis of Ne and Te enables calculation of the expected frequency of occurrence of any particular values of Ne and Te, especially those that correspond to possibly hazardous spacecraft charging environments. The database used in the original analysis included measurements from the AE-C, AE-D, and DE-2 satellites. Recent work on the database has added additional satellites to the database and ground based incoherent scatter radar observations as well. Deviations of the data values from the IRI estimated Ne, Te parameters for each data point provide a statistical basis for modeling the deviations of the plasma environment from the IRI model output. This technique, while developed specifically for the Space Station analysis, can also be generalized to provide ionospheric plasma environment risk specification models for low Earth orbit over an altitude range of 200 km through approximately 1000 km.
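The statistical step described here (turning measured deviations from the IRI mean into an expected frequency of occurrence) can be sketched with an empirical exceedance calculation. The deviation sample and threshold below are synthetic and arbitrary; they only illustrate the kind of calculation the abstract refers to, not the actual ISS specification values.

```python
# Hedged sketch: empirical exceedance frequency of deviations from an IRI mean.
import numpy as np

rng = np.random.default_rng(5)
log10_ne_deviation = rng.normal(loc=0.0, scale=0.35, size=50_000)  # (data - IRI) in log10(Ne), synthetic

threshold = -0.7   # e.g. densities roughly a factor of five below the IRI mean (arbitrary example)
p_exceed = np.mean(log10_ne_deviation < threshold)
print(f"fraction of observations below threshold: {p_exceed:.4f}")

# Percentiles give the "extreme deviation" levels such a specification would quote.
print("1st / 99th percentile deviation:",
      np.percentile(log10_ne_deviation, [1, 99]))
```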
Thirty-seven species identified in the Clark County Multi-Species Habitat Conservation Plan were previously modeled through the Southwest Regional Gap Analysis Project. Existing SWReGAP habitat models and modeling databases were used to facilitate the revision of mo...
Analysis of Landslide Hazard Impact Using the Landslide Database for Germany
NASA Astrophysics Data System (ADS)
Klose, M.; Damm, B.
2014-12-01
The Federal Republic of Germany has long been among the few European countries that lack a national landslide database. Systematic collection and inventorying of landslide data nevertheless has a substantial research history in Germany, but one focused on the development of databases with only local or regional coverage. This has changed in recent years with the launch of a database initiative aimed at closing the data gap existing at the national level. The present contribution reports on this project, which is based on a landslide database that has evolved over the last 15 years into a database covering large parts of Germany. A strategy of systematic retrieval, extraction, and fusion of landslide data is at the heart of the methodology, providing the basis for a database with a broad potential of application. The database offers a data pool of more than 4,200 landslide data sets with over 13,000 single data files and dates back to the 12th century. All types of landslides are covered by the database, which stores not only core attributes, but also various complementary data, including data on landslide causes, impacts, and mitigation. The current database migration to PostgreSQL/PostGIS is focused on unlocking the full scientific potential of the database, while enabling data sharing and knowledge transfer via a web GIS platform. In this contribution, the goals and the research strategy of the database project are highlighted first, with a summary of best practices in database development providing perspective. Next, the focus is on key aspects of the methodology, which is followed by the results of different case studies in the German Central Uplands. The case study results exemplify database application in analysis of vulnerability to landslides, impact statistics, and hazard or cost modeling.
Fleeman, N; McLeod, C; Bagust, A; Beale, S; Boland, A; Dundar, Y; Jorgensen, A; Payne, K; Pirmohamed, M; Pushpakom, S; Walley, T; de Warren-Penny, P; Dickson, R
2010-01-01
To determine whether testing for cytochrome P450 (CYP) polymorphisms in adults entering antipsychotic treatment for schizophrenia leads to improvement in outcomes, is useful in medical, personal or public health decision-making, and is a cost-effective use of health-care resources. The following electronic databases were searched for relevant published literature: Cochrane Controlled Trials Register, Cochrane Database of Systematic Reviews, Database of Abstracts of Reviews of Effectiveness, EMBASE, Health Technology Assessment database, ISI Web of Knowledge, MEDLINE, PsycINFO, NHS Economic Evaluation Database, Health Economic Evaluation Database, Cost-effectiveness Analysis (CEA) Registry and the Centre for Health Economics website. In addition, publicly available information on various genotyping tests was sought from the internet and advisory panel members. A systematic review of analytical validity, clinical validity and clinical utility of CYP testing was undertaken. Data were extracted into structured tables and narratively discussed, and meta-analysis was undertaken when possible. A review of economic evaluations of CYP testing in psychiatry and a review of economic models related to schizophrenia were also carried out. For analytical validity, 46 studies of a range of different genotyping tests for 11 different CYP polymorphisms (most commonly CYP2D6) were included. Sensitivity and specificity were high (99-100%). For clinical validity, 51 studies were found. In patients tested for CYP2D6, an association between genotype and tardive dyskinesia (including Abnormal Involuntary Movement Scale scores) was found. The only other significant finding linked the CYP2D6 genotype to parkinsonism. One small unpublished study met the inclusion criteria for clinical utility. One economic evaluation assessing the costs and benefits of CYP testing for prescribing antidepressants and 28 economic models of schizophrenia were identified; none was suitable for developing a model to examine the cost-effectiveness of CYP testing. Tests for determining genotypes appear to be accurate although not all aspects of analytical validity were reported. Given the absence of convincing evidence from clinical validity studies, the lack of clinical utility and economic studies, and the unsuitability of published schizophrenia models, no model was developed; instead key features and data requirements for economic modelling are presented. Recommendations for future research cover both aspects of research quality and data that will be required to inform the development of future economic models.
Aerodynamic Tests of the Space Launch System for Database Development
NASA Technical Reports Server (NTRS)
Pritchett, Victor E.; Mayle, Melody N.; Blevins, John A.; Crosby, William A.; Purinton, David C.
2014-01-01
The Aerosciences Branch (EV33) at the George C. Marshall Space Flight Center (MSFC) has been responsible for a series of wind tunnel tests on the National Aeronautics and Space Administration's (NASA) Space Launch System (SLS) vehicles. The primary purpose of these tests was to obtain aerodynamic data during the ascent phase and establish databases that can be used by the Guidance, Navigation, and Mission Analysis Branch (EV42) for trajectory simulations. The paper describes the test particulars regarding models and measurements and the facilities used, as well as database preparations.
Well-to-refinery emissions and net-energy analysis of China's crude-oil supply
NASA Astrophysics Data System (ADS)
Masnadi, Mohammad S.; El-Houjeiri, Hassan M.; Schunack, Dominik; Li, Yunpo; Roberts, Samori O.; Przesmitzki, Steven; Brandt, Adam R.; Wang, Michael
2018-03-01
Oil is China's second-largest energy source, so it is essential to understand the country's greenhouse gas emissions from crude-oil production. Chinese crude supply is sourced from numerous major global petroleum producers. Here, we use a per-barrel well-to-refinery life-cycle analysis model with data derived from hundreds of public and commercial sources to model the Chinese crude mix and the upstream carbon intensities and energetic productivity of China's crude supply. We generate a carbon-denominated supply curve representing Chinese crude-oil supply from 146 oilfields in 20 countries. The selected fields are estimated to emit between 1.5 and 46.9 g CO2eq MJ-1 of oil, with volume-weighted average emissions of 8.4 g CO2eq MJ-1. These estimates are higher than some existing databases, illustrating the importance of bottom-up models to support life-cycle analysis databases. This study provides quantitative insight into China's energy policy and the economic and environmental implications of China's oil consumption.
QSAR modeling and chemical space analysis of antimalarial compounds
NASA Astrophysics Data System (ADS)
Sidorov, Pavel; Viira, Birgit; Davioud-Charvet, Elisabeth; Maran, Uko; Marcou, Gilles; Horvath, Dragos; Varnek, Alexandre
2017-05-01
Generative topographic mapping (GTM) has been used to visualize and analyze the chemical space of antimalarial compounds as well as to build predictive models linking structure of molecules with their antimalarial activity. For this, a database, including 3000 molecules tested in one or several of 17 anti-Plasmodium activity assessment protocols, has been compiled by assembling experimental data from in-house and ChEMBL databases. GTM classification models built on subsets corresponding to individual bioassays perform similarly to the earlier reported SVM models. Zones preferentially populated by active and inactive molecules, respectively, clearly emerge in the class landscapes supported by the GTM model. Their analysis resulted in identification of privileged structural motifs of potential antimalarial compounds. Projection of marketed antimalarial drugs on this map allowed us to delineate several areas in the chemical space corresponding to different mechanisms of antimalarial activity. This helped us to make a suggestion about the mode of action of the molecules populating these zones.
Analysis of DIRAC's behavior using model checking with process algebra
NASA Astrophysics Data System (ADS)
Remenska, Daniela; Templon, Jeff; Willemse, Tim; Bal, Henri; Verstoep, Kees; Fokkink, Wan; Charpentier, Philippe; Graciani Diaz, Ricardo; Lanciotti, Elisa; Roiser, Stefan; Ciba, Krzysztof
2012-12-01
DIRAC is the grid solution developed to support LHCb production activities as well as user data analysis. It consists of distributed services and agents delivering the workload to the grid resources. Services maintain database back-ends to store dynamic state information of entities such as jobs, queues, staging requests, etc. Agents use polling to check and possibly react to changes in the system state. Each agent's logic is relatively simple; the main complexity lies in their cooperation. Agents run concurrently, and collaborate using the databases as shared memory. The databases can be accessed directly by the agents if running locally or through a DIRAC service interface if necessary. This shared-memory model causes entities to occasionally get into inconsistent states. Tracing and fixing such problems becomes formidable due to the inherent parallelism present. We propose more rigorous methods to cope with this. Model checking is one such technique for analysis of an abstract model of a system. Unlike conventional testing, it allows full control over the execution of the parallel processes, and supports exhaustive state-space exploration. We used the mCRL2 language and toolset to model the behavior of two related DIRAC subsystems: the workload and storage management system. Based on process algebra, mCRL2 allows defining custom data types as well as functions over these. This makes it suitable for modeling the data manipulations made by DIRAC's agents. By visualizing the state space and replaying scenarios with the toolkit's simulator, we have detected race conditions and deadlocks in these systems, which, in several cases, were confirmed to occur in reality. Several properties of interest were formulated and verified with the tool. Our future direction is automating the translation from DIRAC to a formal model.
CARD 2017: expansion and model-centric curation of the comprehensive antibiotic resistance database
Jia, Baofeng; Raphenya, Amogelang R.; Alcock, Brian; Waglechner, Nicholas; Guo, Peiyao; Tsang, Kara K.; Lago, Briony A.; Dave, Biren M.; Pereira, Sheldon; Sharma, Arjun N.; Doshi, Sachin; Courtot, Mélanie; Lo, Raymond; Williams, Laura E.; Frye, Jonathan G.; Elsayegh, Tariq; Sardar, Daim; Westman, Erin L.; Pawlowski, Andrew C.; Johnson, Timothy A.; Brinkman, Fiona S.L.; Wright, Gerard D.; McArthur, Andrew G.
2017-01-01
The Comprehensive Antibiotic Resistance Database (CARD; http://arpcard.mcmaster.ca) is a manually curated resource containing high quality reference data on the molecular basis of antimicrobial resistance (AMR), with an emphasis on the genes, proteins and mutations involved in AMR. CARD is ontologically structured, model centric, and spans the breadth of AMR drug classes and resistance mechanisms, including intrinsic, mutation-driven and acquired resistance. It is built upon the Antibiotic Resistance Ontology (ARO), a custom built, interconnected and hierarchical controlled vocabulary allowing advanced data sharing and organization. Its design allows the development of novel genome analysis tools, such as the Resistance Gene Identifier (RGI) for resistome prediction from raw genome sequence. Recent improvements include extensive curation of additional reference sequences and mutations, development of a unique Model Ontology and accompanying AMR detection models to power sequence analysis, new visualization tools, and expansion of the RGI for detection of emergent AMR threats. CARD curation is updated monthly based on an interplay of manual literature curation, computational text mining, and genome analysis. PMID:27789705
NASA Astrophysics Data System (ADS)
Bartolini, S.; Becerril, L.; Martí, J.
2014-11-01
One of the most important issues in modern volcanology is the assessment of volcanic risk, which will depend - among other factors - on both the quantity and quality of the available data and an optimum storage mechanism. This will require the design of purpose-built databases that take into account data format and availability and afford easy data storage and sharing, and will provide for a more complete risk assessment that combines different analyses but avoids any duplication of information. Data contained in any such database should facilitate spatial and temporal analysis that will (1) produce probabilistic hazard models for future vent opening, (2) simulate volcanic hazards and (3) assess their socio-economic impact. We describe the design of a new spatial database structure, VERDI (Volcanic managEment Risk Database desIgn), which allows different types of data, including geological, volcanological, meteorological, monitoring and socio-economic information, to be manipulated, organized and managed. A central aim is to ensure that VERDI serves as a tool for connecting different kinds of data sources, GIS platforms and modeling applications. We present an overview of the database design, its components and the attributes that play an important role in the database model. The potential of the VERDI structure and the possibilities it offers for data organization are shown here through its application on El Hierro (Canary Islands). The VERDI database will provide scientists and decision makers with a useful tool that will assist in volcanic risk assessment and management.
BNDB - the Biochemical Network Database.
Küntzer, Jan; Backes, Christina; Blum, Torsten; Gerasch, Andreas; Kaufmann, Michael; Kohlbacher, Oliver; Lenhof, Hans-Peter
2007-10-02
Technological advances in high-throughput techniques and efficient data acquisition methods have resulted in a massive amount of life science data. The data is stored in numerous databases that have been established over the last decades and are essential resources for scientists nowadays. However, the diversity of the databases and the underlying data models make it difficult to combine this information for solving complex problems in systems biology. Currently, researchers typically have to browse several, often highly focused, databases to obtain the required information. Hence, there is a pressing need for more efficient systems for integrating, analyzing, and interpreting these data. The standardization and virtual consolidation of the databases is a major challenge resulting in a unified access to a variety of data sources. We present the Biochemical Network Database (BNDB), a powerful relational database platform, allowing a complete semantic integration of an extensive collection of external databases. BNDB is built upon a comprehensive and extensible object model called BioCore, which is powerful enough to model most known biochemical processes and at the same time easily extensible to be adapted to new biological concepts. Besides a web interface for the search and curation of the data, a Java-based viewer (BiNA) provides a powerful platform-independent visualization and navigation of the data. BiNA uses sophisticated graph layout algorithms for an interactive visualization and navigation of BNDB. BNDB allows a simple, unified access to a variety of external data sources. Its tight integration with the biochemical network library BN++ offers the possibility for import, integration, analysis, and visualization of the data. BNDB is freely accessible at http://www.bndb.org.
The construction and assessment of a statistical model for the prediction of protein assay data.
Pittman, J; Sacks, J; Young, S Stanley
2002-01-01
The focus of this work is the development of a statistical model for a bioinformatics database whose distinctive structure makes model assessment an interesting and challenging problem. The key components of the statistical methodology, including a fast approximation to the singular value decomposition and the use of adaptive spline modeling and tree-based methods, are described, and preliminary results are presented. These results are shown to compare favorably to selected results achieved using comparative methods. An attempt to determine the predictive ability of the model through the use of cross-validation experiments is discussed. In conclusion, a synopsis of the results of these experiments and their implications for the analysis of bioinformatic databases in general is presented.
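The abstract names its ingredients (a fast SVD approximation, tree-based modeling, cross-validation) without giving an implementation; the sketch below wires together off-the-shelf analogues of those ingredients on synthetic data. It is an illustrative stand-in under those assumptions, not the authors' method: randomized truncated SVD for the fast low-rank step, a decision tree for the nonlinear model, and cross-validated scoring for assessment.

```python
# Illustrative pipeline: randomized SVD reduction -> tree model -> cross-validation.
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)
X = rng.normal(size=(500, 2000))                            # wide descriptor matrix (synthetic)
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=500)  # synthetic assay response

model = make_pipeline(
    TruncatedSVD(n_components=20, algorithm="randomized", random_state=0),
    DecisionTreeRegressor(max_depth=5, random_state=0),
)
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print("cross-validated R^2:", scores.round(2))
```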
Senachak, Jittisak; Cheevadhanarak, Supapon; Hongsthong, Apiradee
2015-07-29
Spirulina (Arthrospira) platensis is the only cyanobacterium that in addition to being studied at the molecular level and subjected to gene manipulation, can also be mass cultivated in outdoor ponds for commercial use as a food supplement. Thus, encountering environmental changes, including temperature stresses, is common during the mass production of Spirulina. The use of cyanobacteria as an experimental platform, especially for photosynthetic gene manipulation in plants and bacteria, is becoming increasingly important. Understanding the mechanisms and protein-protein interaction networks that underlie low- and high-temperature responses is relevant to Spirulina mass production. To accomplish this goal, high-throughput techniques such as OMICs analyses are used. Thus, large datasets must be collected, managed and subjected to information extraction. Therefore, databases including (i) proteomic analysis and protein-protein interaction (PPI) data and (ii) domain/motif visualization tools are required for potential use in temperature response models for plant chloroplasts and photosynthetic bacteria. A web-based repository was developed including an embedded database, SpirPro, and tools for network visualization. Proteome data were analyzed integrated with protein-protein interactions and/or metabolic pathways from KEGG. The repository provides various information, ranging from raw data (2D-gel images) to associated results, such as data from interaction and/or pathway analyses. This integration allows in silico analyses of protein-protein interactions affected at the metabolic level and, particularly, analyses of interactions between and within the affected metabolic pathways under temperature stresses for comparative proteomic analysis. The developed tool, which is coded in HTML with CSS/JavaScript and depicted in Scalable Vector Graphics (SVG), is designed for interactive analysis and exploration of the constructed network. SpirPro is publicly available on the web at http://spirpro.sbi.kmutt.ac.th . SpirPro is an analysis platform containing an integrated proteome and PPI database that provides the most comprehensive data on this cyanobacterium at the systematic level. As an integrated database, SpirPro can be applied in various analyses, such as temperature stress response networking analysis in cyanobacterial models and interacting domain-domain analysis between proteins of interest.
WholeCellSimDB: a hybrid relational/HDF database for whole-cell model predictions.
Karr, Jonathan R; Phillips, Nolan C; Covert, Markus W
2014-01-01
Mechanistic 'whole-cell' models are needed to develop a complete understanding of cell physiology. However, extracting biological insights from whole-cell models requires running and analyzing large numbers of simulations. We developed WholeCellSimDB, a database for organizing whole-cell simulations. WholeCellSimDB was designed to enable researchers to search simulation metadata to identify simulations for further analysis, and quickly slice and aggregate simulation results data. In addition, WholeCellSimDB enables users to share simulations with the broader research community. The database uses a hybrid relational/hierarchical data format architecture to efficiently store and retrieve both simulation setup metadata and results data. WholeCellSimDB provides a graphical Web-based interface to search, browse, plot and export simulations; a JavaScript Object Notation (JSON) Web service to retrieve data for Web-based visualizations; a command-line interface to deposit simulations; and a Python API to retrieve data for advanced analysis. Overall, we believe WholeCellSimDB will help researchers use whole-cell models to advance basic biological science and bioengineering. http://www.wholecellsimdb.org SOURCE CODE REPOSITORY: URL: http://github.com/CovertLab/WholeCellSimDB. © The Author(s) 2014. Published by Oxford University Press.
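WholeCellSimDB exposes a JSON Web service for retrieving simulation data alongside its Web interface and Python API. The endpoint path, query parameters and response fields in the sketch below are hypothetical placeholders, intended only to show how such a JSON service would typically be queried; the actual routes should be taken from the project's documentation.

```python
import requests

BASE_URL = "http://www.wholecellsimdb.org"  # site named in the abstract

# Hypothetical endpoint and parameters -- not documented here, illustration only.
resp = requests.get(
    f"{BASE_URL}/api/simulations",
    params={"organism": "M. genitalium"},
    timeout=30,
)
resp.raise_for_status()

# Assuming the service returns a JSON list of simulation records with id/label fields.
for sim in resp.json():
    print(sim.get("id"), sim.get("label"))
```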
Yeung, Daniel; Boes, Peter; Ho, Meng Wei; Li, Zuofeng
2015-05-08
Image-guided radiotherapy (IGRT), based on radiopaque markers placed in the prostate gland, was used for proton therapy of prostate patients. Orthogonal X-rays and the IBA Digital Image Positioning System (DIPS) were used for setup correction prior to treatment and were repeated after treatment delivery. Following a rationale for margin estimates similar to that of van Herk,(1) the daily post-treatment DIPS data were analyzed to determine whether an adaptive radiotherapy plan was necessary. A Web application using ASP.NET MVC5, Entity Framework, and an SQL database was designed to automate this process. The designed features included state-of-the-art Web technologies, a domain model closely matching the workflow, a database supporting concurrency and data mining, access to the DIPS database, secured user access and roles management, and graphing and analysis tools. The Model-View-Controller (MVC) paradigm allowed clean domain logic, unit testing, and extensibility. Client-side technologies, such as jQuery, jQuery plug-ins, and Ajax, were adopted to achieve a rich user environment and fast response. Data models included patients, staff, treatment fields and records, correction vectors, DIPS images, and association logics. Data entry, analysis, workflow logics, and notifications were implemented. The system effectively modeled the clinical workflow and IGRT process.
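The margin rationale cited above follows van Herk, whose widely cited recipe combines the standard deviations of systematic and random setup errors as M = 2.5Σ + 0.7σ. The Python sketch below shows how per-patient daily displacements along one axis could feed such an estimate; the displacement numbers are invented, and the assumption that this particular recipe was applied to the post-treatment DIPS data rests only on the citation in the abstract.

```python
import numpy as np

def van_herk_margin(displacements_by_patient):
    """Estimate a setup margin (mm) from daily displacements along one axis.

    Uses the widely cited recipe M = 2.5*Sigma + 0.7*sigma, where Sigma is the
    SD of per-patient systematic errors (patient means) and sigma is the pooled
    SD of random, day-to-day errors.
    """
    patient_means = np.array([np.mean(d) for d in displacements_by_patient])
    Sigma = np.std(patient_means, ddof=1)  # systematic error SD
    sigma = np.sqrt(np.mean([np.var(d, ddof=1) for d in displacements_by_patient]))  # random error SD
    return 2.5 * Sigma + 0.7 * sigma

# Hypothetical post-treatment DIPS displacements (mm) for three patients
data = [
    np.array([1.2, 0.8, 1.5, 0.9]),
    np.array([-0.4, 0.1, -0.6]),
    np.array([2.1, 1.7, 2.4, 1.9]),
]
print(f"margin = {van_herk_margin(data):.1f} mm")
```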
Wauchope, R Don; Ahuja, Lajpat R; Arnold, Jeffrey G; Bingner, Ron; Lowrance, Richard; van Genuchten, Martinus T; Adams, Larry D
2003-01-01
We present an overview of USDA Agricultural Research Service (ARS) computer models and databases related to pest-management science, emphasizing current developments in environmental risk assessment and management simulation models. The ARS has a unique national interdisciplinary team of researchers in surface and sub-surface hydrology, soil and plant science, systems analysis and pesticide science, who have networked to develop empirical and mechanistic computer models describing the behavior of pests, pest responses to controls and the environmental impact of pest-control methods. Historically, much of this work has been in support of production agriculture and in support of the conservation programs of our 'action agency' sister, the Natural Resources Conservation Service (formerly the Soil Conservation Service). Because we are a public agency, our software/database products are generally offered without cost, unless they are developed in cooperation with a private-sector cooperator. Because ARS is a basic and applied research organization, with development of new science as our highest priority, these products tend to be offered on an 'as-is' basis with limited user support except for cooperating R&D relationship with other scientists. However, rapid changes in the technology for information analysis and communication continually challenge our way of doing business.
2011-09-01
We rate Python’s maturity as “High.” Python is nine years old and has been continuously developed and enhanced since then. During fiscal year 2010... We rate Python’s developer toolkit availability/extensibility as “Yes.” Python runs on a SQL database and is 64-bit compatible with Oracle databases...
Stochastic Modeling and Analysis of Energy Commodity Spot Price Processes
2014-06-27
Eurocurrency data set [129] collected from a Forex database. Table 10: estimates m̂_k, β_{m̂_k,k}, µ_{m̂_k,k}, δ_{m̂_k,k}, σ_{m̂_k,k}, γ_{m̂_k,k} for the U.S. Treasury Bill... U.S. Eurocurrency rate; US dollar Eurocurrency data set, January 1990-December 2004, Forex database. [130] U.S. Energy Information Administration
Claims-based risk model for first severe COPD exacerbation.
Stanford, Richard H; Nag, Arpita; Mapel, Douglas W; Lee, Todd A; Rosiello, Richard; Schatz, Michael; Vekeman, Francis; Gauthier-Loiselle, Marjolaine; Merrigan, J F Philip; Duh, Mei Sheng
2018-02-01
To develop and validate a predictive model for first severe chronic obstructive pulmonary disease (COPD) exacerbation using health insurance claims data and to validate the risk measure of controller medication to total COPD treatment (controller and rescue) ratio (CTR). A predictive model was developed and validated in 2 managed care databases: Truven Health MarketScan database and Reliant Medical Group database. This secondary analysis assessed risk factors, including CTR, during the baseline period (Year 1) to predict risk of severe exacerbation in the at-risk period (Year 2). Patients with COPD who were 40 years or older and who had at least 1 COPD medication dispensed during the year following COPD diagnosis were included. Subjects with severe exacerbations in the baseline year were excluded. Risk factors in the baseline period were included as potential predictors in multivariate analysis. Performance was evaluated using C-statistics. The analysis included 223,824 patients. The greatest risk factors for first severe exacerbation were advanced age, chronic oxygen therapy usage, COPD diagnosis type, dispensing of 4 or more canisters of rescue medication, and having 2 or more moderate exacerbations. A CTR of 0.3 or greater was associated with a 14% lower risk of severe exacerbation. The model performed well with C-statistics, ranging from 0.711 to 0.714. This claims-based risk model can predict the likelihood of first severe COPD exacerbation. The CTR could also potentially be used to target populations at greatest risk for severe exacerbations. This could be relevant for providers and payers in approaches to prevent severe exacerbations and reduce costs.
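Model performance above is reported as C-statistics. For a binary outcome such as "first severe exacerbation in Year 2", the C-statistic equals the area under the ROC curve, which can be computed as below (Python with scikit-learn; the predictors and outcome here are synthetic stand-ins, not the study's claims variables or results).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic claims-style predictors: e.g. age, oxygen-therapy flag, rescue
# canisters dispensed, prior moderate exacerbations, CTR >= 0.3 indicator.
rng = np.random.default_rng(0)
X = rng.random((1000, 5))
logit = 2.0 * X[:, 0] + 1.5 * X[:, 1] - 1.8 + rng.normal(0, 0.5, 1000)
y = (logit > 0).astype(int)  # 1 = severe exacerbation in the at-risk period

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# The C-statistic for a binary outcome is the area under the ROC curve.
c_stat = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"C-statistic: {c_stat:.3f}")
```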
[A basic research to share Fourier transform near-infrared spectrum information resource].
Zhang, Lu-Da; Li, Jun-Hui; Zhao, Long-Lian; Zhao, Li-Li; Qin, Fang-Li; Yan, Yan-Lu
2004-08-01
A method to share the information resource in the database of Fourier transform near-infrared (FTNIR) spectrum information of agricultural products, and to make full use of the spectrum information, is explored in this paper. Mapping spectrum information from one instrument to another is studied to express the spectrum information accurately between the instruments. The mapping spectrum information is then used to establish a mathematical model of quantitative analysis without including standard samples. The analysis result is that the correlation coefficient r is 0.941 and the relative error is 3.28% between the model estimates and the Kjeldahl values for the protein content of twenty-two wheat samples, while the correlation coefficient r is 0.963 and the relative error is 2.4% for the other model, which is established using standard samples. It is shown that the spectrum information can be shared by using the mapping spectrum information. It can therefore be concluded that the spectrum information in one FTNIR spectrum information database can be transformed into another instrument's mapping spectrum information, which makes full use of the information resource in the database of FTNIR spectrum information and realizes resource sharing between different instruments.
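The abstract does not describe the mapping procedure itself. One common way to map spectra between instruments is a direct-standardization-style least-squares transform learned from a transfer set measured on both instruments; the NumPy sketch below illustrates that pattern with synthetic spectra, and it is an illustration of the general idea rather than the authors' algorithm.

```python
import numpy as np

# Hypothetical transfer set: the same samples measured on two FTNIR instruments.
# Rows are samples, columns are wavelength channels.
rng = np.random.default_rng(1)
spectra_master = rng.random((30, 200))                                   # instrument A (model's home)
spectra_slave = spectra_master + 0.05 * rng.standard_normal((30, 200))   # instrument B

# Direct-standardization-style mapping: find F so that slave @ F approximates master.
F, *_ = np.linalg.lstsq(spectra_slave, spectra_master, rcond=None)

# New spectra measured on instrument B can now be mapped into instrument A's space,
# so instrument A's calibration model can be applied without new standard samples.
new_slave = rng.random((5, 200))
mapped = new_slave @ F
print(mapped.shape)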
Web application and database modeling of traffic impact analysis using Google Maps
NASA Astrophysics Data System (ADS)
Yulianto, Budi; Setiono
2017-06-01
Traffic impact analysis (TIA) is a traffic study that aims to identify the impact of traffic generated by development or a change in land use. In addition to identifying the traffic impact, TIA also includes mitigation measures to minimize that impact. TIA has become increasingly important since it was defined in legislation as one of the requirements for a building permit application. The legislation has encouraged a number of TIA studies in various cities in Indonesia, including Surakarta. For that reason, it is necessary to develop TIA further by adopting the concept of Transportation Impact Control (TIC) in the implementation of standard TIA documents and multimodal modeling. This includes standardizing TIA technical guidelines, databases and inspection through TIA checklists, monitoring and evaluation. The research was undertaken by collecting historical junction data, modeling the data as a relational database, and building a user interface for CRUD (Create, Read, Update and Delete) operations on the TIA data as a web application using the Google Maps libraries. The result of the research is a system that provides information supporting the improvement and repair of existing TIA documents, making the process more transparent, reliable and credible.
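The study stores junction data in a relational database behind a web interface with CRUD operations, with coordinates feeding a Google Maps layer. The sketch below is a minimal illustration of that pattern using Flask and SQLite; the paper does not state its server-side stack, and the table and field names here are hypothetical.

```python
import sqlite3
from flask import Flask, jsonify, request

app = Flask(__name__)
DB = "tia.db"  # hypothetical junction database; assumes a `junctions` table exists

def get_db():
    conn = sqlite3.connect(DB)
    conn.row_factory = sqlite3.Row
    return conn

@app.route("/junctions", methods=["GET"])
def list_junctions():
    # Read: coordinates (lat, lng) can be rendered as markers in a Google Maps layer.
    rows = get_db().execute("SELECT id, name, lat, lng, volume FROM junctions").fetchall()
    return jsonify([dict(r) for r in rows])

@app.route("/junctions", methods=["POST"])
def create_junction():
    # Create: add a newly surveyed junction with its traffic volume.
    j = request.get_json()
    db = get_db()
    db.execute(
        "INSERT INTO junctions (name, lat, lng, volume) VALUES (?, ?, ?, ?)",
        (j["name"], j["lat"], j["lng"], j["volume"]),
    )
    db.commit()
    return "", 201
```

Update and delete routes follow the same shape, mapping HTTP verbs onto SQL UPDATE and DELETE statements.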
The LANL hemorrhagic fever virus database, a new platform for analyzing biothreat viruses.
Kuiken, Carla; Thurmond, Jim; Dimitrijevic, Mira; Yoon, Hyejin
2012-01-01
Hemorrhagic fever viruses (HFVs) are a diverse set of over 80 viral species, found in 10 different genera comprising five different families: arena-, bunya-, flavi-, filo- and togaviridae. All these viruses are highly variable and evolve rapidly, making them elusive targets for the immune system and for vaccine and drug design. About 55,000 HFV sequences exist in the public domain today. A central website that provides annotated sequences and analysis tools will be helpful to HFV researchers worldwide. The HFV sequence database collects and stores sequence data and provides a user-friendly search interface and a large number of sequence analysis tools, following the model of the highly regarded and widely used Los Alamos HIV database [Kuiken, C., B. Korber, and R.W. Shafer, HIV sequence databases. AIDS Rev, 2003. 5: p. 52-61]. The database uses an algorithm that aligns each sequence to a species-wide reference sequence. The NCBI RefSeq database [Sayers et al. (2011) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res., 39, D38-D51.] is used for this; if a reference sequence is not available, a Blast search finds the best candidate. Using this method, sequences in each genus can be retrieved pre-aligned. The HFV website can be accessed via http://hfv.lanl.gov.
Moskalev, Alexey; Chernyagina, Elizaveta; de Magalhães, João Pedro; Barardo, Diogo; Thoppil, Harikrishnan; Shaposhnikov, Mikhail; Budovsky, Arie; Fraifeld, Vadim E; Garazha, Andrew; Tsvetkov, Vasily; Bronovitsky, Evgeny; Bogomolov, Vladislav; Scerbacov, Alexei; Kuryan, Oleg; Gurinovich, Roman; Jellen, Leslie C; Kennedy, Brian; Mamoshina, Polina; Dobrovolskaya, Evgeniya; Aliper, Alex; Kaminsky, Dmitry; Zhavoronkov, Alex
2015-09-01
As the level of interest in aging research increases, there is a growing number of geroprotectors, or therapeutic interventions that aim to extend the healthy lifespan and repair or reduce aging-related damage in model organisms and, eventually, in humans. There is a clear need for a manually-curated database of geroprotectors to compile and index their effects on aging and age-related diseases and link these effects to relevant studies and multiple biochemical and drug databases. Here, we introduce the first such resource, Geroprotectors (http://geroprotectors.org). Geroprotectors is a public, rapidly explorable database that catalogs over 250 experiments involving over 200 known or candidate geroprotectors that extend lifespan in model organisms. Each compound has a comprehensive profile complete with biochemistry, mechanisms, and lifespan effects in various model organisms, along with information ranging from chemical structure, side effects, and toxicity to FDA drug status. These are presented in a visually intuitive, efficient framework fit for casual browsing or in-depth research alike. Data are linked to the source studies or databases, providing quick and convenient access to original data. The Geroprotectors database facilitates cross-study, cross-organism, and cross-discipline analysis and saves countless hours of inefficient literature and web searching. Geroprotectors is a one-stop, knowledge-sharing, time-saving resource for researchers seeking healthy aging solutions.
Moskalev, Alexey; Chernyagina, Elizaveta; de Magalhães, João Pedro; Barardo, Diogo; Thoppil, Harikrishnan; Shaposhnikov, Mikhail; Budovsky, Arie; Fraifeld, Vadim E.; Garazha, Andrew; Tsvetkov, Vasily; Bronovitsky, Evgeny; Bogomolov, Vladislav; Scerbacov, Alexei; Kuryan, Oleg; Gurinovich, Roman; Jellen, Leslie C.; Kennedy, Brian; Mamoshina, Polina; Dobrovolskaya, Evgeniya; Aliper, Alex; Kaminsky, Dmitry; Zhavoronkov, Alex
2015-01-01
As the level of interest in aging research increases, there is a growing number of geroprotectors, or therapeutic interventions that aim to extend the healthy lifespan and repair or reduce aging-related damage in model organisms and, eventually, in humans. There is a clear need for a manually-curated database of geroprotectors to compile and index their effects on aging and age-related diseases and link these effects to relevant studies and multiple biochemical and drug databases. Here, we introduce the first such resource, Geroprotectors (http://geroprotectors.org). Geroprotectors is a public, rapidly explorable database that catalogs over 250 experiments involving over 200 known or candidate geroprotectors that extend lifespan in model organisms. Each compound has a comprehensive profile complete with biochemistry, mechanisms, and lifespan effects in various model organisms, along with information ranging from chemical structure, side effects, and toxicity to FDA drug status. These are presented in a visually intuitive, efficient framework fit for casual browsing or in-depth research alike. Data are linked to the source studies or databases, providing quick and convenient access to original data. The Geroprotectors database facilitates cross-study, cross-organism, and cross-discipline analysis and saves countless hours of inefficient literature and web searching. Geroprotectors is a one-stop, knowledge-sharing, time-saving resource for researchers seeking healthy aging solutions. PMID:26342919
Database resources of the National Center for Biotechnology Information
Sayers, Eric W.; Barrett, Tanya; Benson, Dennis A.; Bolton, Evan; Bryant, Stephen H.; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M.; DiCuccio, Michael; Federhen, Scott; Feolo, Michael; Fingerman, Ian M.; Geer, Lewis Y.; Helmberg, Wolfgang; Kapustin, Yuri; Krasnov, Sergey; Landsman, David; Lipman, David J.; Lu, Zhiyong; Madden, Thomas L.; Madej, Tom; Maglott, Donna R.; Marchler-Bauer, Aron; Miller, Vadim; Karsch-Mizrachi, Ilene; Ostell, James; Panchenko, Anna; Phan, Lon; Pruitt, Kim D.; Schuler, Gregory D.; Sequeira, Edwin; Sherry, Stephen T.; Shumway, Martin; Sirotkin, Karl; Slotta, Douglas; Souvorov, Alexandre; Starchenko, Grigory; Tatusova, Tatiana A.; Wagner, Lukas; Wang, Yanli; Wilbur, W. John; Yaschenko, Eugene; Ye, Jian
2012-01-01
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Website. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central (PMC), Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, Genome and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, BioProject, BioSample, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Probe, Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), Biosystems, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov. PMID:22140104
Database resources of the National Center for Biotechnology Information
2013-01-01
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, the Genetic Testing Registry, Genome and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, BioProject, BioSample, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Probe, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool, Biosystems, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page. PMID:23193264
Database resources of the National Center for Biotechnology Information.
Wheeler, David L; Barrett, Tanya; Benson, Dennis A; Bryant, Stephen H; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M; DiCuccio, Michael; Edgar, Ron; Federhen, Scott; Geer, Lewis Y; Kapustin, Yuri; Khovayko, Oleg; Landsman, David; Lipman, David J; Madden, Thomas L; Maglott, Donna R; Ostell, James; Miller, Vadim; Pruitt, Kim D; Schuler, Gregory D; Sequeira, Edwin; Sherry, Steven T; Sirotkin, Karl; Souvorov, Alexandre; Starchenko, Grigory; Tatusov, Roman L; Tatusova, Tatiana A; Wagner, Lukas; Yaschenko, Eugene
2007-01-01
In addition to maintaining the GenBank nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI's Web site. NCBI resources include Entrez, the Entrez Programming Utilities, My NCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link(BLink), Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genome, Genome Project and related tools, the Trace and Assembly Archives, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups (COGs), Viral Genotyping Tools, Influenza Viral Resources, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART) and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. These resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
Database resources of the National Center for Biotechnology Information.
Sayers, Eric W; Barrett, Tanya; Benson, Dennis A; Bryant, Stephen H; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M; DiCuccio, Michael; Edgar, Ron; Federhen, Scott; Feolo, Michael; Geer, Lewis Y; Helmberg, Wolfgang; Kapustin, Yuri; Landsman, David; Lipman, David J; Madden, Thomas L; Maglott, Donna R; Miller, Vadim; Mizrachi, Ilene; Ostell, James; Pruitt, Kim D; Schuler, Gregory D; Sequeira, Edwin; Sherry, Stephen T; Shumway, Martin; Sirotkin, Karl; Souvorov, Alexandre; Starchenko, Grigory; Tatusova, Tatiana A; Wagner, Lukas; Yaschenko, Eugene; Ye, Jian
2009-01-01
In addition to maintaining the GenBank nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups (COGs), Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART) and the PubChem suite of small molecule databases. Augmenting many of the web applications is custom implementation of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
BiGG Models: A platform for integrating, standardizing and sharing genome-scale models
King, Zachary A.; Lu, Justin; Drager, Andreas; ...
2015-10-17
Genome-scale metabolic models are mathematically structured knowledge bases that can be used to predict metabolic pathway usage and growth phenotypes. Furthermore, they can generate and test hypotheses when integrated with experimental data. To maximize the value of these models, centralized repositories of high-quality models must be established, models must adhere to established standards and model components must be linked to relevant databases. Tools for model visualization further enhance their utility. To meet these needs, we present BiGG Models (http://bigg.ucsd.edu), a completely redesigned Biochemical, Genetic and Genomic knowledge base. BiGG Models contains more than 75 high-quality, manually-curated genome-scale metabolic models. On the website, users can browse, search and visualize models. BiGG Models connects genome-scale models to genome annotations and external databases. Reaction and metabolite identifiers have been standardized across models to conform to community standards and enable rapid comparison across models. Furthermore, BiGG Models provides a comprehensive application programming interface for accessing BiGG Models with modeling and analysis tools. As a resource for highly curated, standardized and accessible models of metabolism, BiGG Models will facilitate diverse systems biology studies and support knowledge-based analysis of diverse experimental data.
BiGG Models: A platform for integrating, standardizing and sharing genome-scale models
King, Zachary A.; Lu, Justin; Dräger, Andreas; Miller, Philip; Federowicz, Stephen; Lerman, Joshua A.; Ebrahim, Ali; Palsson, Bernhard O.; Lewis, Nathan E.
2016-01-01
Genome-scale metabolic models are mathematically-structured knowledge bases that can be used to predict metabolic pathway usage and growth phenotypes. Furthermore, they can generate and test hypotheses when integrated with experimental data. To maximize the value of these models, centralized repositories of high-quality models must be established, models must adhere to established standards and model components must be linked to relevant databases. Tools for model visualization further enhance their utility. To meet these needs, we present BiGG Models (http://bigg.ucsd.edu), a completely redesigned Biochemical, Genetic and Genomic knowledge base. BiGG Models contains more than 75 high-quality, manually-curated genome-scale metabolic models. On the website, users can browse, search and visualize models. BiGG Models connects genome-scale models to genome annotations and external databases. Reaction and metabolite identifiers have been standardized across models to conform to community standards and enable rapid comparison across models. Furthermore, BiGG Models provides a comprehensive application programming interface for accessing BiGG Models with modeling and analysis tools. As a resource for highly curated, standardized and accessible models of metabolism, BiGG Models will facilitate diverse systems biology studies and support knowledge-based analysis of diverse experimental data. PMID:26476456
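BiGG Models exposes its content through a web application programming interface, as noted above. The sketch below queries it with the requests library; the endpoint paths and response field names reflect the public API as we recall it and should be verified against the live documentation at http://bigg.ucsd.edu before being relied on.

```python
import requests

base = "http://bigg.ucsd.edu/api/v2"  # API paths assumed from the public docs

# List the available genome-scale models.
models = requests.get(f"{base}/models", timeout=30).json()
print(models.get("results_count"), "models available")

# Fetch metadata for one model, e.g. the E. coli reconstruction iJO1366.
model = requests.get(f"{base}/models/iJO1366", timeout=30).json()
print(model.get("organism"), model.get("reaction_count"), "reactions")
```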
Zhang, Y J; Zhou, D H; Bai, Z P; Xue, F X
2018-02-10
Objective: To quantitatively analyze the current status and development trends regarding the land use regression (LUR) models on ambient air pollution studies. Methods: Relevant literature from the PubMed database before June 30, 2017 was analyzed, using the Bibliographic Items Co-occurrence Matrix Builder (BICOMB 2.0). Keywords co-occurrence networks, cluster mapping and timeline mapping were generated, using the CiteSpace 5.1.R5 software. Relevant literature identified in three Chinese databases was also reviewed. Results: Four hundred sixty four relevant papers were retrieved from the PubMed database. The number of papers published showed an annual increase, in line with the growing trend of the index. Most papers were published in the journal of Environmental Health Perspectives . Results from the Co-word cluster analysis identified five clusters: cluster#0 consisted of birth cohort studies related to the health effects of prenatal exposure to air pollution; cluster#1 referred to land use regression modeling and exposure assessment; cluster#2 was related to the epidemiology on traffic exposure; cluster#3 dealt with the exposure to ultrafine particles and related health effects; cluster#4 described the exposure to black carbon and related health effects. Data from Timeline mapping indicated that cluster#0 and#1 were the main research areas while cluster#3 and#4 were the up-coming hot areas of research. Ninety four relevant papers were retrieved from the Chinese databases with most of them related to studies on modeling. Conclusion: In order to better assess the health-related risks of ambient air pollution, and to best inform preventative public health intervention policies, application of LUR models to environmental epidemiology studies in China should be encouraged.
GIS model for identifying urban areas vulnerable to noise pollution: case study
NASA Astrophysics Data System (ADS)
Bilaşco, Ştefan; Govor, Corina; Roşca, Sanda; Vescan, Iuliu; Filip, Sorin; Fodorean, Ioan
2017-04-01
The unprecedented expansion of national car ownership over the last few years has been driven by economic growth and the need for the population and economic agents to reduce travel time in progressively expanding large urban centres. This has led to an increase in the level of road noise and a stronger impact on the quality of the environment. Noise pollution generated by means of transport represents one of the most important types of pollution, with negative effects on a population's health in large urban areas. As a consequence, tolerable limits of sound intensity for the comfort of inhabitants have been determined worldwide, and the generation of sound maps has been made compulsory in order to identify the vulnerable zones and to make recommendations on how to decrease the negative impact on humans. In this context, the present study aims at presenting a GIS spatial analysis model-based methodology for identifying and mapping zones vulnerable to noise pollution. The developed GIS model is based on the analysis of all the components influencing sound propagation, represented as vector databases (points of sound intensity measurements, buildings, land use, transport infrastructure), raster databases (DEM), and numerical databases (wind direction and speed, sound intensity). Secondly, the hourly changes (for representative hours) were analysed to identify the hotspots characterised by major traffic flows specific to rush hours. The validated results of the model are represented by GIS databases and maps that the local public administration can use as a source of information and in the decision-making process.
Trezza, Alfonso; Bernini, Andrea; Langella, Andrea; Ascher, David B; Pires, Douglas E V; Sodi, Andrea; Passerini, Ilaria; Pelo, Elisabetta; Rizzo, Stanislao; Niccolai, Neri; Spiga, Ottavia
2017-10-01
The aim of this article is to report the investigation of the structural features of ABCA4, a protein associated with a genetic retinal disease. A new database collecting knowledge of ABCA4 structure may facilitate predictions about the possible functional consequences of gene mutations observed in clinical practice. In order to correlate structural and functional effects of the observed mutations, the structure of mouse P-glycoprotein was used as a template for homology modeling. The obtained structural information and genetic data are the basis of our relational database (ABCA4Database). Sequence variability among all ABCA4-deposited entries was calculated and reported as Shannon entropy score at the residue level. The three-dimensional model of ABCA4 structure was used to locate the spatial distribution of the observed variable regions. Our predictions from structural in silico tools were able to accurately link the functional effects of mutations to phenotype. The development of the ABCA4Database gathers all the available genetic and structural information, yielding a global view of the molecular basis of some retinal diseases. ABCA4 modeled structure provides a molecular basis on which to analyze protein sequence mutations related to genetic retinal disease in order to predict the risk of retinal disease across all possible ABCA4 mutations. Additionally, our ABCA4 predicted structure is a good starting point for the creation of a new data analysis model, appropriate for precision medicine, in order to develop a deeper knowledge network of the disease and to improve the management of patients.
Austvoll-Dahlgren, Astrid; Guttersrud, Øystein; Nsangi, Allen; Semakula, Daniel; Oxman, Andrew D
2017-05-25
The Claim Evaluation Tools database contains multiple-choice items for measuring people's ability to apply the key concepts they need to know to be able to assess treatment claims. We assessed items from the database using Rasch analysis to develop an outcome measure to be used in two randomised trials in Uganda. Rasch analysis is a form of psychometric testing relying on Item Response Theory. It is a dynamic way of developing outcome measures that are valid and reliable. To assess the validity, reliability and responsiveness of 88 items addressing 22 key concepts using Rasch analysis. We administered four sets of multiple-choice items in English to 1114 people in Uganda and Norway, of whom 685 were children and 429 were adults (including 171 health professionals). We scored all items dichotomously. We explored summary and individual fit statistics using the RUMM2030 analysis package. We used SPSS to perform distractor analysis. Most items conformed well to the Rasch model, but some items needed revision. Overall, the four item sets had satisfactory reliability. We did not identify significant response dependence between any pairs of items and, overall, the magnitude of multidimensionality in the data was acceptable. The items had a high level of difficulty. Most of the items conformed well to the Rasch model's expectations. Following revision of some items, we concluded that most of the items were suitable for use in an outcome measure for evaluating the ability of children or adults to assess treatment claims. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
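Because the items were scored dichotomously, the fit statistics above compare observed responses with the expectations of the standard dichotomous Rasch model. That model is not written out in the abstract, but in its usual formulation the probability that person n succeeds on item i is

P(X_{ni} = 1 | \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)},

where \theta_n is the person's ability and b_i the item's difficulty. A "high level of difficulty" then corresponds to item difficulties b_i that are large relative to the abilities \theta_n in the sample.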
NASA Astrophysics Data System (ADS)
Sushko, Iurii; Novotarskyi, Sergii; Körner, Robert; Pandey, Anil Kumar; Rupp, Matthias; Teetz, Wolfram; Brandmaier, Stefan; Abdelaziz, Ahmed; Prokopenko, Volodymyr V.; Tanchuk, Vsevolod Y.; Todeschini, Roberto; Varnek, Alexandre; Marcou, Gilles; Ertl, Peter; Potemkin, Vladimir; Grishina, Maria; Gasteiger, Johann; Schwab, Christof; Baskin, Igor I.; Palyulin, Vladimir A.; Radchenko, Eugene V.; Welsh, William J.; Kholodovych, Vladyslav; Chekmarev, Dmitriy; Cherkasov, Artem; Aires-de-Sousa, Joao; Zhang, Qing-You; Bender, Andreas; Nigsch, Florian; Patiny, Luc; Williams, Antony; Tkachenko, Valery; Tetko, Igor V.
2011-06-01
The Online Chemical Modeling Environment is a web-based platform that aims to automate and simplify the typical steps required for QSAR modeling. The platform consists of two major subsystems: the database of experimental measurements and the modeling framework. A user-contributed database contains a set of tools for easy input, search and modification of thousands of records. The OCHEM database is based on the wiki principle and focuses primarily on the quality and verifiability of the data. The database is tightly integrated with the modeling framework, which supports all the steps required to create a predictive model: data search, calculation and selection of a vast variety of molecular descriptors, application of machine learning methods, validation, analysis of the model and assessment of the applicability domain. As compared to other similar systems, OCHEM is not intended to re-implement the existing tools or models but rather to invite the original authors to contribute their results, make them publicly available, share them with other users and to become members of the growing research community. Our intention is to make OCHEM a widely used platform to perform the QSPR/QSAR studies online and share it with other users on the Web. The ultimate goal of OCHEM is collecting all possible chemoinformatics tools within one simple, reliable and user-friendly resource. The OCHEM is free for web users and it is available online at http://www.ochem.eu.
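OCHEM's modeling framework chains descriptor calculation, machine learning and validation. The snippet below is not OCHEM code; it is a generic illustration of that descriptor-plus-learner workflow using RDKit and scikit-learn, with an invented five-molecule dataset standing in for real experimental measurements.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Hypothetical training set: SMILES strings with measured property values.
smiles = ["CCO", "c1ccccc1", "CC(=O)O", "CCN(CC)CC", "CCCCCC"]
y = [0.2, 1.5, 0.1, 0.9, 2.3]

def descriptors(smi):
    """A tiny descriptor vector: molecular weight, logP, TPSA, rotatable bonds."""
    mol = Chem.MolFromSmiles(smi)
    return [Descriptors.MolWt(mol), Descriptors.MolLogP(mol),
            Descriptors.TPSA(mol), Descriptors.NumRotatableBonds(mol)]

X = [descriptors(s) for s in smiles]
model = RandomForestRegressor(n_estimators=200, random_state=0)

# A minimal stand-in for the validation step of a QSAR workflow.
scores = cross_val_score(model, X, y, cv=3, scoring="r2")
print(scores.mean())
```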
A study on spatial decision support systems for HIV/AIDS prevention based on COM GIS technology
NASA Astrophysics Data System (ADS)
Yang, Kun; Luo, Huasong; Peng, Shungyun; Xu, Quanli
2007-06-01
Based on an in-depth analysis of the current status of, and existing problems with, GIS technology applications in epidemiology, this paper proposes a method and process for establishing spatial decision support systems for AIDS epidemic prevention by integrating COM GIS, spatial database, GPS, remote sensing, and communication technologies, as well as ASP and ActiveX software development technologies. One of the most important issues in constructing spatial decision support systems for AIDS epidemic prevention is how to integrate AIDS spreading models with GIS. The capabilities of GIS applications in AIDS epidemic prevention are described first. Then several mature epidemic spreading models are discussed in order to extract the computational parameters. Furthermore, a technical schema is proposed for integrating the AIDS spreading models with GIS and relevant geospatial technologies, in which the GIS and model running platforms share a common spatial database and the computing results can be spatially visualized on desktop or Web GIS clients. Finally, a complete solution for establishing decision support systems for AIDS epidemic prevention is offered, based on the model integration methods and ESRI COM GIS software packages. The general decision support systems are composed of data acquisition sub-systems, network communication sub-systems, model integrating sub-systems, AIDS epidemic information spatial database sub-systems, AIDS epidemic information querying and statistical analysis sub-systems, AIDS epidemic dynamic surveillance sub-systems, AIDS epidemic information spatial analysis and decision support sub-systems, as well as AIDS epidemic information publishing sub-systems based on Web GIS.
Integrating the Allen Brain Institute Cell Types Database into Automated Neuroscience Workflow.
Stockton, David B; Santamaria, Fidel
2017-10-01
We developed software tools to download, extract features, and organize the Cell Types Database from the Allen Brain Institute (ABI) in order to integrate its whole cell patch clamp characterization data into the automated modeling/data analysis cycle. To expand the potential user base we employed both Python and MATLAB. The basic set of tools downloads selected raw data and extracts cell, sweep, and spike features, using ABI's feature extraction code. To facilitate data manipulation we added a tool to build a local specialized database of raw data plus extracted features. Finally, to maximize automation, we extended our NeuroManager workflow automation suite to include these tools plus a separate investigation database. The extended suite allows the user to integrate ABI experimental and modeling data into an automated workflow deployed on heterogeneous computer infrastructures, from local servers, to high performance computing environments, to the cloud. Since our approach is focused on workflow procedures our tools can be modified to interact with the increasing number of neuroscience databases being developed to cover all scales and properties of the nervous system.
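The workflow above builds on the Allen Brain Institute's Cell Types Database and its Python tooling. The sketch below shows how that data is typically pulled with the AllenSDK's CellTypesCache; the method names and returned fields reflect the AllenSDK API as we understand it and should be checked against the current SDK documentation, and the manifest path is arbitrary.

```python
from allensdk.core.cell_types_cache import CellTypesCache

# Download and cache Cell Types data locally (manifest path is a local choice).
ctc = CellTypesCache(manifest_file="cell_types/manifest.json")

cells = ctc.get_cells()                         # cell-level metadata records
features = ctc.get_ephys_features()             # precomputed sweep/spike features
data_set = ctc.get_ephys_data(cells[0]["id"])   # raw NWB recording for one cell

print(len(cells), "cells;", len(features), "feature records")
```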
Powell, Kimberly R; Peterson, Shenita R
Web of Science and Scopus are the leading databases of scholarly impact. Recent studies outside the field of nursing report differences in journal coverage and quality. A comparative analysis of the reported impact of nursing publications was therefore performed. Journal coverage by each database for the field of nursing was compared. Additionally, publications by 2014 nursing faculty were collected in both databases and compared for overall coverage and reported quality, as modeled by SCImago Journal Rank, peer review status, and MEDLINE inclusion. Individual author impact, modeled by the h-index, was calculated by each database for comparison. Scopus offered significantly higher journal coverage. For 2014 faculty publications, 100% of journals were found in Scopus, while Web of Science covered 82%. No significant difference was found in the quality of reported journals. Author h-indices were found to be higher in Scopus. When reporting faculty publications and scholarly impact, academic nursing programs may be better represented by Scopus, without compromising journal quality. Programs with strong interdisciplinary work should examine all areas of strength to ensure appropriate coverage. Copyright © 2017 Elsevier Inc. All rights reserved.
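Author impact in both databases is summarized by the h-index: the largest h such that the author has h papers each cited at least h times. The small function below (Python; the citation counts are invented) makes explicit why the same author can receive different h-indices from databases with different coverage.

```python
def h_index(citation_counts):
    """h = largest h such that the author has h papers with at least h citations."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for i, c in enumerate(counts, start=1):
        if c >= i:
            h = i
        else:
            break
    return h

# Same author, hypothetical citation counts as seen by two databases
print(h_index([25, 14, 9, 6, 3, 1]))   # broader coverage -> 4
print(h_index([22, 11, 7, 3, 1]))      # narrower coverage -> 3
```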
Valid Statistical Analysis for Logistic Regression with Multiple Sources
NASA Astrophysics Data System (ADS)
Fienberg, Stephen E.; Nardi, Yuval; Slavković, Aleksandra B.
Considerable effort has gone into understanding issues of privacy protection of individual information in single databases, and various solutions have been proposed depending on the nature of the data, the ways in which the database will be used and the precise nature of the privacy protection being offered. Once data are merged across sources, however, the nature of the problem becomes far more complex and a number of privacy issues arise for the linked individual files that go well beyond those that are considered with regard to the data within individual sources. In the paper, we propose an approach that gives full statistical analysis on the combined database without actually combining it. We focus mainly on logistic regression, but the method and tools described may be applied essentially to other statistical models as well.
Kuchinke, Wolfgang; Ohmann, Christian; Verheij, Robert A; van Veen, Evert-Ben; Arvanitis, Theodoros N; Taweel, Adel; Delaney, Brendan C
2014-12-01
To develop a model describing core concepts and principles of data flow, data privacy and confidentiality, in a simple and flexible way, using concise process descriptions and a diagrammatic notation applied to research workflow processes. The model should help to generate robust data privacy frameworks for research done with patient data. Based on an exploration of EU legal requirements for data protection and privacy, data access policies, and existing privacy frameworks of research projects, basic concepts and common processes were extracted, described and incorporated into a model with a formal graphical representation and a standardised notation. The Unified Modelling Language (UML) notation was enriched by workflow and own symbols to enable the representation of extended data flow requirements, data privacy and data security requirements, privacy enhancing techniques (PET) and to allow privacy threat analysis for research scenarios. Our model is built upon the concept of three privacy zones (Care Zone, Non-care Zone and Research Zone) containing databases, data transformation operators, such as data linkers and privacy filters. Using these model components, a risk gradient for moving data from a zone of high risk for patient identification to a zone of low risk can be described. The model was applied to the analysis of data flows in several general clinical research use cases and two research scenarios from the TRANSFoRm project (e.g., finding patients for clinical research and linkage of databases). The model was validated by representing research done with the NIVEL Primary Care Database in the Netherlands. The model allows analysis of data privacy and confidentiality issues for research with patient data in a structured way and provides a framework to specify a privacy compliant data flow, to communicate privacy requirements and to identify weak points for an adequate implementation of data privacy. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Associative memory model for searching an image database by image snippet
NASA Astrophysics Data System (ADS)
Khan, Javed I.; Yun, David Y.
1994-09-01
This paper presents an associative memory called multidimensional holographic associative computing (MHAC), which can potentially be used to perform feature-based image database queries using an image snippet. MHAC has the unique capability to selectively focus on specific segments of a query frame during associative retrieval. As a result, this model can perform searches on the basis of featural significance described by a subset of the snippet pixels. This capability is critical for visual queries in image databases because quite often the cognitive index features in the snippet are statistically weak. Unlike conventional artificial associative memories, MHAC uses a two-level representation and incorporates additional meta-knowledge about the reliability status of segments of information it receives and forwards. In this paper we present the analysis of the focus characteristics of MHAC.
NASA Astrophysics Data System (ADS)
Zhao, L.; Chen, P.; Jordan, T. H.; Olsen, K. B.; Maechling, P.; Faerman, M.
2004-12-01
The Southern California Earthquake Center (SCEC) is developing a Community Modeling Environment (CME) to facilitate the computational pathways of physics-based seismic hazard analysis (Maechling et al., this meeting). Major goals are to facilitate the forward modeling of seismic wavefields in complex geologic environments, including the strong ground motions that cause earthquake damage, and the inversion of observed waveform data for improved models of Earth structure and fault rupture. Here we report on a unified approach to these coupled inverse problems that is based on the ability to generate and manipulate wavefields in densely gridded 3D Earth models. A main element of this approach is a database of receiver Green tensors (RGT) for the seismic stations, which comprises all of the spatial-temporal displacement fields produced by the three orthogonal unit impulsive point forces acting at each of the station locations. Once the RGT database is established, synthetic seismograms for any earthquake can be simply calculated by extracting a small, source-centered volume of the RGT from the database and applying the reciprocity principle. The partial derivatives needed for point- and finite-source inversions can be generated in the same way. Moreover, the RGT database can be employed in full-wave tomographic inversions launched from a 3D starting model, because the sensitivity (Fréchet) kernels for travel-time and amplitude anomalies observed at seismic stations in the database can be computed by convolving the earthquake-induced displacement field with the station RGTs. We illustrate all elements of this unified analysis with an RGT database for 33 stations of the California Integrated Seismic Network in and around the Los Angeles Basin, which we computed for the 3D SCEC Community Velocity Model (SCEC CVM3.0) using a fourth-order staggered-grid finite-difference code. For a spatial grid spacing of 200 m and a time resolution of 10 ms, the calculations took ~19,000 node-hours on the Linux cluster at USC's High-Performance Computing Center. The 33-station database with a volume of ~23.5 TB was archived in the SCEC digital library at the San Diego Supercomputer Center using the Storage Resource Broker (SRB). From a laptop, anyone with access to this SRB collection can compute synthetic seismograms for an arbitrary source in the CVM in a matter of minutes. Efficient approaches have been implemented to use this RGT database in the inversions of waveforms for centroid and finite moment tensors and tomographic inversions to improve the CVM. Our experience with these large problems suggests areas where the cyberinfrastructure currently available for geoscience computation needs to be improved.
Integration of Evidence Base into a Probabilistic Risk Assessment
NASA Technical Reports Server (NTRS)
Saile, Lyn; Lopez, Vilma; Bickham, Grandin; Kerstman, Eric; FreiredeCarvalho, Mary; Byrne, Vicky; Butler, Douglas; Myers, Jerry; Walton, Marlei
2011-01-01
INTRODUCTION: A probabilistic decision support model such as the Integrated Medical Model (IMM) utilizes an immense amount of input data, which necessitates a systematic, integrated approach to data collection and management. As a result of this approach, IMM is able to forecast medical events, resource utilization and crew health during space flight. METHODS: Inflight data is the most desirable input for the Integrated Medical Model. Non-attributable inflight data is collected from the Lifetime Surveillance of Astronaut Health study as well as from the engineers, flight surgeons, and astronauts themselves. When inflight data are unavailable, cohort studies, other models and Bayesian analyses are used, along with subject matter experts' input on occasion. To determine the quality of evidence for a medical condition, the data source is categorized and assigned a level of evidence from 1 to 5; the highest level is one. The collected data reside and are managed in a relational SQL database with a web-based interface for data entry and review. The database is also capable of interfacing with outside applications, which expands capabilities within the database itself. Via the public interface, customers can access a formatted Clinical Findings Form (CLiFF) that outlines the model input and evidence base for each medical condition. Changes to the database are tracked using a documented Configuration Management process. DISCUSSION: This strategic approach provides a comprehensive data management plan for IMM. The IMM database's structure and architecture have proven to support additional usages, as seen in the analysis of resource utilization across medical conditions. In addition, the IMM database's web-based interface provides a user-friendly format for customers to browse and download the clinical information for medical conditions. It is this type of functionality that will provide Exploratory Medicine Capabilities with the evidence base for their medical condition list. CONCLUSION: The IMM database, in conjunction with the IMM, is helping the NASA aerospace program improve health care and reduce risk for astronaut crews. Both the database and the model will continue to expand to meet customer needs through their multi-disciplinary, evidence-based approach to managing data. Future expansion could serve as a platform for a Space Medicine Wiki of medical conditions.
novPTMenzy: a database for enzymes involved in novel post-translational modifications
Khater, Shradha; Mohanty, Debasisa
2015-01-01
With the recent discoveries of novel post-translational modifications (PTMs) which play important roles in signaling and biosynthetic pathways, identification of such PTM catalyzing enzymes by genome mining has been an area of major interest. Unlike well-known PTMs like phosphorylation, glycosylation, SUMOylation, no bioinformatics resources are available for enzymes associated with novel and unusual PTMs. Therefore, we have developed the novPTMenzy database which catalogs information on the sequence, structure, active site and genomic neighborhood of experimentally characterized enzymes involved in five novel PTMs, namely AMPylation, Eliminylation, Sulfation, Hydroxylation and Deamidation. Based on a comprehensive analysis of the sequence and structural features of these known PTM catalyzing enzymes, we have created Hidden Markov Model profiles for the identification of similar PTM catalyzing enzymatic domains in genomic sequences. We have also created predictive rules for grouping them into functional subfamilies and deciphering their mechanistic details by structure-based analysis of their active site pockets. These analytical modules have been made available as user friendly search interfaces of novPTMenzy database. It also has a specialized analysis interface for some PTMs like AMPylation and Eliminylation. The novPTMenzy database is a unique resource that can aid in discovery of unusual PTM catalyzing enzymes in newly sequenced genomes. Database URL: http://www.nii.ac.in/novptmenzy.html PMID:25931459
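The genome-mining step described above relies on Hidden Markov Model profiles built from characterized PTM-catalyzing enzymes. Scanning a newly sequenced proteome against such profiles is conventionally done with HMMER's hmmsearch; the sketch below wraps that command from Python (it assumes HMMER is installed, and the profile and FASTA file names are placeholders rather than files distributed with novPTMenzy).

```python
import subprocess

# Scan predicted proteins against PTM-enzyme HMM profiles with HMMER's hmmsearch.
# File names are hypothetical placeholders.
subprocess.run(
    ["hmmsearch", "--tblout", "hits.tbl", "-E", "1e-5",
     "ampylation_domains.hmm", "predicted_proteome.fasta"],
    check=True,
)

# Collect hits passing the E-value threshold from the tabular output.
with open("hits.tbl") as fh:
    hits = [line.split() for line in fh if not line.startswith("#")]
print(len(hits), "candidate PTM-enzyme domains")
```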
NASA Astrophysics Data System (ADS)
Mehdizadeh, Saeid
2018-04-01
Evapotranspiration (ET) is considered a key factor in hydrological and climatological studies, agricultural water management, irrigation scheduling, etc. It can be directly measured using lysimeters. Moreover, other methods such as empirical equations and artificial intelligence methods can be used to model ET. In recent years, artificial intelligence methods have been widely utilized to estimate reference evapotranspiration (ETo). In the present study, the local and external performances of multivariate adaptive regression splines (MARS) and gene expression programming (GEP) were assessed for estimating daily ETo. For this aim, daily weather data of six stations with different climates in Iran, namely Urmia and Tabriz (semi-arid), Isfahan and Shiraz (arid), and Yazd and Zahedan (hyper-arid), were employed during 2000-2014. Two types of input patterns, consisting of weather data-based and lagged ETo data-based scenarios, were considered to develop the models. Four statistical indicators, including root mean square error (RMSE), mean absolute error (MAE), coefficient of determination (R2), and mean absolute percentage error (MAPE), were used to check the accuracy of the models. The local performance of the models revealed that the MARS and GEP approaches have the capability to estimate daily ETo using the meteorological parameters and the lagged ETo data as inputs. Nevertheless, MARS had the best performance in the weather data-based scenarios. On the other hand, considerable differences were not observed in the models' accuracy for the lagged ETo data-based scenarios. As the innovation of this study, novel hybrid models were proposed in the lagged ETo data-based scenarios through the combination of the MARS and GEP models with the autoregressive conditional heteroscedasticity (ARCH) time series model. It was concluded that the proposed novel models, named MARS-ARCH and GEP-ARCH, improved the performance of ETo modeling compared with the single MARS and GEP models. In addition, the external analysis of the performance of the models at stations with similar climatic conditions demonstrated the applicability of nearby stations' data for estimation of the daily ETo at a target station.
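The four accuracy indicators named in the abstract (RMSE, MAE, R2 and MAPE) have standard definitions. The short Python sketch below computes them for a made-up set of observed versus estimated daily ETo values; the numbers are illustrative only, and R2 is taken here in its usual sense of the coefficient of determination.

```python
import numpy as np

def rmse(obs, est):
    return float(np.sqrt(np.mean((np.asarray(obs) - np.asarray(est)) ** 2)))

def mae(obs, est):
    return float(np.mean(np.abs(np.asarray(obs) - np.asarray(est))))

def r2(obs, est):
    obs, est = np.asarray(obs), np.asarray(est)
    ss_res = np.sum((obs - est) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def mape(obs, est):
    obs, est = np.asarray(obs), np.asarray(est)
    return float(100.0 * np.mean(np.abs((obs - est) / obs)))

# Hypothetical daily ETo values (mm/day): reference vs. model estimate
observed  = [3.1, 4.2, 5.0, 4.4, 3.8]
estimated = [3.3, 4.0, 5.3, 4.1, 3.9]
print(rmse(observed, estimated), mae(observed, estimated),
      r2(observed, estimated), mape(observed, estimated))
```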
Analysis on the Correlation of Traffic Flow in Hainan Province Based on Baidu Search
NASA Astrophysics Data System (ADS)
Chen, Caixia; Shi, Chun
2018-03-01
Internet search data record users' search attention and consumer demand, providing a necessary database for the Hainan traffic flow model. Based on the Baidu Index, and taking Hainan traffic flow as an example, this paper conducts both qualitative and quantitative analyses of the relationship between Baidu Index search keywords and actual Hainan tourist traffic flow, and builds a multiple regression model in SPSS.
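The abstract fits a multiple regression of tourist traffic flow on Baidu search-keyword indices in SPSS. An equivalent fit can be sketched in Python with statsmodels' ordinary least squares; the keyword series and traffic figures below are invented placeholders, not the paper's data.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical monthly series: Baidu search indices for two travel-related
# keywords and the observed Hainan tourist traffic flow.
baidu_kw1 = np.array([120, 135, 150, 160, 170, 190, 210, 230])
baidu_kw2 = np.array([80, 82, 95, 101, 99, 120, 140, 150])
traffic   = np.array([1.1, 1.2, 1.4, 1.5, 1.5, 1.8, 2.0, 2.2])  # million trips

X = sm.add_constant(np.column_stack([baidu_kw1, baidu_kw2]))
model = sm.OLS(traffic, X).fit()
print(model.summary())  # coefficients, R-squared, significance tests
```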
Gacesa, Ranko; Zucko, Jurica; Petursdottir, Solveig K; Gudmundsdottir, Elisabet Eik; Fridjonsson, Olafur H; Diminic, Janko; Long, Paul F; Cullum, John; Hranueli, Daslav; Hreggvidsson, Gudmundur O; Starcevic, Antonio
2017-06-01
The MEGGASENSE platform constructs relational databases of DNA or protein sequences. The default functional analysis uses 14,106 hidden Markov model (HMM) profiles based on sequences in the KEGG database. The Solr search engine allows sophisticated queries, and a BLAST search function is also incorporated. These standard capabilities were used to generate the SCATT database from the predicted proteome of Streptomyces cattleya. The implementation of a specialised metagenome database (AMYLOMICS) for bioprospecting of carbohydrate-modifying enzymes is described. In addition to standard assembly of reads, a novel 'functional' assembly was developed, in which screening of reads with the HMM profiles occurs before the assembly. The AMYLOMICS database incorporates additional HMM profiles for carbohydrate-modifying enzymes, and it is illustrated how the combination of HMM and BLAST analyses helps identify interesting genes. A variety of different proteome and metagenome databases have been generated by MEGGASENSE.
NASA Technical Reports Server (NTRS)
Arnold, Steven M. (Editor); Wong, Terry T. (Editor)
2011-01-01
Topics covered include: An Annotative Review of Multiscale Modeling and its Application to Scales Inherent in the Field of ICME; and A Multiscale, Nonlinear, Modeling Framework Enabling the Design and Analysis of Composite Materials and Structures.
NASA Technical Reports Server (NTRS)
Donnellan, Andrea; Parker, Jay W.; Lyzenga, Gregory A.; Granat, Robert A.; Norton, Charles D.; Rundle, John B.; Pierce, Marlon E.; Fox, Geoffrey C.; McLeod, Dennis; Ludwig, Lisa Grant
2012-01-01
QuakeSim 2.0 improves understanding of earthquake processes by providing modeling tools and integrating model applications and various heterogeneous data sources within a Web services environment. QuakeSim is a multisource, synergistic, data-intensive environment for modeling the behavior of earthquake faults individually, and as part of complex interacting systems. Remotely sensed geodetic data products may be explored, compared with faults and landscape features, mined by pattern analysis applications, and integrated with models and pattern analysis applications in a rich Web-based and visualization environment. Integration of heterogeneous data products with pattern informatics tools enables efficient development of models. Federated database components and visualization tools allow rapid exploration of large datasets, while pattern informatics enables identification of subtle, but important, features in large data sets. QuakeSim is valuable for earthquake investigations and modeling in its current state, and also serves as a prototype and nucleus for broader systems under development. The framework provides access to physics-based simulation tools that model the earthquake cycle and related crustal deformation. Spaceborne GPS and Interferometric Synthetic Aperture Radar (InSAR) data provide information on near-term crustal deformation, while paleoseismic geologic data provide longer-term information on earthquake fault processes. These data sources are integrated into QuakeSim's QuakeTables database system, and are accessible by users or various model applications. UAVSAR repeat-pass interferometry data products are added to the QuakeTables database, and are available through a browseable map interface or Representational State Transfer (REST) interfaces. Model applications can retrieve data from QuakeTables, or from third-party GPS velocity data services; alternatively, users can manually input parameters into the models. Pattern analysis of GPS and seismicity data has proved useful for mid-term forecasting of earthquakes, and for detecting subtle changes in crustal deformation. The GPS time series analysis has also proved useful as a data-quality tool, enabling the discovery of station anomalies and data processing and distribution errors. Improved visualization tools enable more efficient data exploration and understanding. Tools provide flexibility to science users for exploring data in new ways through download links, but also facilitate standard, intuitive, and routine uses for science users and end users such as emergency responders.
Kim, Woo-Yeon; Kang, Sungsoo; Kim, Byoung-Chul; Oh, Jeehyun; Cho, Seongwoong; Bhak, Jong; Choi, Jong-Soon
2008-01-01
Cyanobacteria are model organisms for studying photosynthesis, carbon and nitrogen assimilation, evolution of plant plastids, and adaptability to environmental stresses. Despite many studies on cyanobacteria, there is no web-based database of their regulatory and signaling protein-protein interaction networks to date. We report a database and website SynechoNET that provides predicted protein-protein interactions. SynechoNET shows cyanobacterial domain-domain interactions as well as their protein-level interactions using the model cyanobacterium, Synechocystis sp. PCC 6803. It predicts the protein-protein interactions using public interaction databases that contain mutually complementary and redundant data. Furthermore, SynechoNET provides information on transmembrane topology, signal peptide, and domain structure in order to support the analysis of regulatory membrane proteins. Such biological information can be queried and visualized in user-friendly web interfaces that include the interactive network viewer and search pages by keyword and functional category. SynechoNET is an integrated protein-protein interaction database designed to analyze regulatory membrane proteins in cyanobacteria. It provides a platform for biologists to extend the genomic data of cyanobacteria by predicting interaction partners, membrane association, and membrane topology of Synechocystis proteins. SynechoNET is freely available at http://synechocystis.org/ or directly at http://bioportal.kobic.kr/SynechoNET/.
Advanced Modeling and Uncertainty Quantification for Flight Dynamics; Interim Results and Challenges
NASA Technical Reports Server (NTRS)
Hyde, David C.; Shweyk, Kamal M.; Brown, Frank; Shah, Gautam
2014-01-01
As part of the NASA Vehicle Systems Safety Technologies (VSST), Assuring Safe and Effective Aircraft Control Under Hazardous Conditions (Technical Challenge #3), an effort is underway within Boeing Research and Technology (BR&T) to address Advanced Modeling and Uncertainty Quantification for Flight Dynamics (VSST1-7). The scope of the effort is to develop and evaluate advanced multidisciplinary flight dynamics modeling techniques, including integrated uncertainties, to facilitate higher fidelity response characterization of current and future aircraft configurations approaching and during loss-of-control conditions. This approach is to incorporate multiple flight dynamics modeling methods for aerodynamics, structures, and propulsion, including experimental, computational, and analytical. Also to be included are techniques for data integration and uncertainty characterization and quantification. This research shall introduce new and updated multidisciplinary modeling and simulation technologies designed to improve the ability to characterize airplane response in off-nominal flight conditions. The research shall also introduce new techniques for uncertainty modeling that will provide a unified database model comprised of multiple sources, as well as an uncertainty bounds database for each data source such that a full vehicle uncertainty analysis is possible even when approaching or beyond Loss of Control boundaries. Methodologies developed as part of this research shall be instrumental in predicting and mitigating loss of control precursors and events directly linked to causal and contributing factors, such as stall, failures, damage, or icing. The tasks will include utilizing the BR&T Water Tunnel to collect static and dynamic data to be compared to the GTM extended WT database, characterizing flight dynamics in off-nominal conditions, developing tools for structural load estimation under dynamic conditions, devising methods for integrating various modeling elements into a real-time simulation capability, generating techniques for uncertainty modeling that draw data from multiple modeling sources, and providing a unified database model that includes nominal plus increments for each flight condition. This paper presents status of testing in the BR&T water tunnel and analysis of the resulting data and efforts to characterize these data using alternative modeling methods. Program challenges and issues are also presented.
2004-03-01
with MySQL. This choice was made because MySQL is open source. Any significant database engine such as Oracle or MS-SQL or even MS Access can be used... Figure 6. The DoD vs. Commercial Life Cycle... necessarily be interested in SCADA network security... MySQL (Database server) – This station represents a typical data server for a web page
The Effects of Liberalizing World Agricultural Trade: A Review of Modeling Studies
2006-06-01
agricultural policy issues. Like the LINKAGE model analysis, the GTAP-AGR model analysis uses version 6.05 of the GTAP database with a base year of... the 2006 study—fully liberalizing agricultural policy in equal increments from 2005 through 2010—would increase world economic welfare in 2015 by... Burfisher, ed., Agricultural Policy Reform in the WTO—The Road Ahead, Agricultural Economic Report No. 802 (U.S. Department of Agriculture, Economic
Lee, Ken Ka-Yin; Tang, Wai-Choi; Choi, Kup-Sze
2013-04-01
Clinical data are dynamic in nature, often arranged hierarchically and stored as free text and numbers. Effective management of clinical data and the transformation of the data into structured format for data analysis are therefore challenging issues in electronic health records development. Despite the popularity of relational databases, the scalability of the NoSQL database model and the document-centric data structure of XML databases appear to be promising features for effective clinical data management. In this paper, three database approaches--NoSQL, XML-enabled and native XML--are investigated to evaluate their suitability for structured clinical data. The database query performance is reported, together with our experience in the databases development. The results show that NoSQL database is the best choice for query speed, whereas XML databases are advantageous in terms of scalability, flexibility and extensibility, which are essential to cope with the characteristics of clinical data. While NoSQL and XML technologies are relatively new compared to the conventional relational database, both of them demonstrate potential to become a key database technology for clinical data management as the technology further advances. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
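To make the contrast concrete, the following minimal Python sketch (illustrative only, not code from the study) shows a hierarchical clinical record held whole, as a document store or native XML database would keep it, next to the flattened rows a relational schema would require:

```python
# A nested clinical record stored as one document versus the flat rows needed
# for a relational layout; the field names and values are invented.
import json

record = {
    "patient_id": "P001",
    "encounters": [
        {"date": "2012-03-01",
         "observations": [
             {"code": "glucose", "value": 5.8, "unit": "mmol/L"},
             {"code": "sbp", "value": 128, "unit": "mmHg"},
         ]},
    ],
}

# Document-centric (NoSQL / XML) style: the hierarchy is stored as-is.
document = json.dumps(record, indent=2)

# Relational style: the same data must be decomposed into flat rows
# (patient, encounter, observation tables) and re-joined at query time.
observation_rows = [
    (record["patient_id"], enc["date"], obs["code"], obs["value"], obs["unit"])
    for enc in record["encounters"]
    for obs in enc["observations"]
]
print(document)
print(observation_rows)
```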
Aßmann, C
2016-06-01
Besides large efforts regarding field work, provision of valid databases requires statistical and informational infrastructure to enable long-term access to longitudinal data sets on height, weight and related issues. To foster use of longitudinal data sets within the scientific community, provision of valid databases has to address data-protection regulations. It is, therefore, of major importance to hinder identifiability of individuals from publicly available databases. To reach this goal, one possible strategy is to provide a synthetic database to the public allowing for pretesting strategies for data analysis. The synthetic databases can be established using multiple imputation tools. Given the approval of the strategy, verification is based on the original data. Multiple imputation by chained equations is illustrated to facilitate provision of synthetic databases as it allows for capturing a wide range of statistical interdependencies. Also missing values, typically occurring within longitudinal databases for reasons of item non-response, can be addressed via multiple imputation when providing databases. The provision of synthetic databases using multiple imputation techniques is one possible strategy to ensure data protection, increase visibility of longitudinal databases and enhance the analytical potential.
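A toy version of the chained-equations synthesis can be written in a few lines of numpy. The sketch below assumes purely continuous variables and normal linear conditionals; a production implementation, like the one described, would also handle mixed variable types, item non-response and multiple imputations.

```python
# Sketch of synthesis by chained equations: each variable is regressed on the
# others in turn and replaced by draws from its estimated predictive
# distribution. A toy stand-in for the procedure, not the author's code.
import numpy as np

rng = np.random.default_rng(0)

def synthesize(X, sweeps=10):
    """X: (n, p) array of observed data; returns a synthetic copy."""
    Z = X.copy().astype(float)
    n, p = Z.shape
    for _ in range(sweeps):
        for j in range(p):
            others = np.delete(Z, j, axis=1)
            A = np.column_stack([np.ones(n), others])        # design matrix
            beta, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
            resid = X[:, j] - A @ beta
            sigma = resid.std(ddof=A.shape[1])
            # replace column j by draws from the estimated conditional model
            Z[:, j] = A @ beta + rng.normal(0.0, sigma, size=n)
    return Z

# toy longitudinal-style data: height, weight, age
X = rng.multivariate_normal([150, 45, 12],
                            [[40, 20, 5], [20, 30, 4], [5, 4, 3]], size=200)
X_synthetic = synthesize(X)
```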
Stability assessment of structures under earthquake hazard through GRID technology
NASA Astrophysics Data System (ADS)
Prieto Castrillo, F.; Boton Fernandez, M.
2009-04-01
This work presents a GRID framework to estimate the vulnerability of structures under earthquake hazard. The tool has been designed to cover the needs of a typical earthquake engineering stability analysis: preparation of input data (pre-processing), response computation and stability analysis (post-processing). In order to validate the application over GRID, a simplified model of a structure under artificially generated earthquake records has been implemented. To achieve this goal, the proposed scheme exploits the GRID technology and its main advantages (parallel intensive computing, huge storage capacity and collaborative analysis among institutions) through intensive interaction among the GRID elements (Computing Element, Storage Element, LHC File Catalogue, federated database, etc.). The dynamical model is described by a set of ordinary differential equations (ODEs) and by a set of parameters. Both elements, along with the integration engine, are encapsulated into Java classes. With this high-level design, subsequent improvements/changes of the model can be addressed with little effort. In the procedure, an earthquake record database is prepared and stored (pre-processing) in the GRID Storage Element (SE). The Metadata of these records is also stored in the GRID federated database. This Metadata contains both relevant information about the earthquake (as it is usual in a seismic repository) and also the Logical File Name (LFN) of the record for its later retrieval. Then, from the available set of accelerograms in the SE, the user can specify a range of earthquake parameters to carry out a dynamic analysis. This way, a GRID job is created for each selected accelerogram in the database. At the GRID Computing Element (CE), displacements are then obtained by numerical integration of the ODEs over time. The resulting response for that configuration is stored in the GRID Storage Element (SE) and the maximum structure displacement is computed. Then, the corresponding Metadata containing the response LFN, earthquake magnitude and maximum structure displacement is also stored. Finally, the displacements are post-processed through a statistically-based algorithm from the available Metadata to obtain the probability of collapse of the structure for different earthquake magnitudes. From this study, it is possible to build a vulnerability report for the structure type and seismic data. The proposed methodology can be combined with the on-going initiatives to build a European earthquake record database. In this context, the GRID enables collaborative analysis of shared seismic data and results among different institutions.
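The per-accelerogram GRID job described above essentially integrates the structural ODEs for one record and stores the peak displacement. A minimal serial sketch of that computational kernel, assuming a single-degree-of-freedom oscillator and a synthetic record in place of the real model and database, is:

```python
# Sketch of one job: integrate a simplified (single-degree-of-freedom)
# structural model under a ground-motion record and report the maximum
# displacement. Parameters and the record are illustrative only.
import numpy as np
from scipy.integrate import solve_ivp

omega = 2 * np.pi * 1.5      # natural frequency (rad/s), hypothetical structure
zeta = 0.05                  # damping ratio
t_rec = np.linspace(0.0, 20.0, 2001)
accel = 0.3 * 9.81 * np.sin(2 * np.pi * 2.0 * t_rec) * np.exp(-0.1 * t_rec)  # synthetic accelerogram

def rhs(t, y):
    """u'' + 2*zeta*omega*u' + omega**2*u = -a_g(t), written as a first-order system."""
    u, v = y
    ag = np.interp(t, t_rec, accel)          # ground acceleration at time t
    return [v, -2 * zeta * omega * v - omega**2 * u - ag]

sol = solve_ivp(rhs, (t_rec[0], t_rec[-1]), [0.0, 0.0], t_eval=t_rec, max_step=0.01)
max_disp = np.max(np.abs(sol.y[0]))
print(f"maximum displacement: {max_disp:.4f} m")
```

In the described framework each such run would be submitted as a separate GRID job, with the response time series written back to the Storage Element and the peak displacement recorded as Metadata.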
NASA Technical Reports Server (NTRS)
Trouve, A.; Veynante, D.; Bray, K. N. C.; Mantel, T.
1994-01-01
Current flamelet models based on a description of the flame surface dynamics require the closure of two inter-related equations: a transport equation for the mean reaction progress variable, $\tilde{c}$, and a transport equation for the flame surface density, $\Sigma$. The coupling between these two equations is investigated using direct numerical simulations (DNS) with emphasis on the correlation between the turbulent fluxes of $\tilde{c}$, $\overline{\rho u'' c''}$, and $\Sigma$, $(u'')_s\,\Sigma$. Two different DNS databases are used in the present work: a database developed at CTR by A. Trouve and a database developed by C. J. Rutland using a different code. Both databases correspond to statistically one-dimensional premixed flames in isotropic turbulent flow. The run parameters, however, are significantly different, and the two databases correspond to different combustion regimes. It is found that in all simulated flames, the correlation between $\overline{\rho u'' c''}$ and $(u'')_s\,\Sigma$ is always strong. The sign, however, of the turbulent flux of $\tilde{c}$ or $\Sigma$ with respect to the mean gradients, $\partial\tilde{c}/\partial x$ or $\partial\Sigma/\partial x$, is case-dependent. The CTR database is found to exhibit gradient turbulent transport of $\tilde{c}$ and $\Sigma$, whereas the Rutland DNS features counter-gradient diffusion. The two databases are analyzed and compared using various tools (a local analysis of the flow field near the flame, a classical analysis of the conservation equation for $\widetilde{u''c''}$, and a thin flame theoretical analysis). A mechanism is then proposed to explain the discrepancies between the two databases and a preliminary simple criterion is derived to predict the occurrence of gradient/counter-gradient turbulent diffusion.
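For reference, the closure problem discussed in the abstract can be written out explicitly. The equations below are the standard one-dimensional Favre-averaged progress-variable equation (molecular transport omitted) and the usual gradient-transport closure; they restate the notation rather than add results from the paper:

```latex
% Favre-averaged progress-variable equation and the eddy-diffusivity closure
% it is usually given; the DNS test is whether the actual flux has the sign
% this closure implies.
\frac{\partial \bar{\rho}\tilde{c}}{\partial t}
  + \frac{\partial \bar{\rho}\tilde{u}\tilde{c}}{\partial x}
  = -\frac{\partial}{\partial x}\,\overline{\rho u'' c''}
    + \overline{\dot{\omega}}_c ,
\qquad
\overline{\rho u'' c''} \;\approx\; -\,\frac{\mu_t}{\mathrm{Sc}_t}\,
  \frac{\partial \tilde{c}}{\partial x}.
```

Gradient transport corresponds to a flux directed down the mean gradient, i.e. $\overline{\rho u'' c''}\,\partial\tilde{c}/\partial x < 0$; a positive product is the counter-gradient case found in the Rutland database.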
Molecular Oxygen in the Thermosphere: Issues and Measurement Strategies
NASA Astrophysics Data System (ADS)
Picone, J. M.; Hedin, A. E.; Drob, D. P.; Meier, R. R.; Bishop, J.; Budzien, S. A.
2002-05-01
We review the state of empirical knowledge regarding the distribution of molecular oxygen in the lower thermosphere (100-200 km), as embodied by the new NRLMSISE-00 empirical atmospheric model, its predecessors, and the underlying databases. For altitudes above 120 km, the two major classes of data (mass spectrometer and solar ultraviolet [UV] absorption) disagree significantly regarding the magnitude of the O2 density and the dependence on solar activity. As a result, the addition of the Solar Maximum Mission (SMM) data set (based on solar UV absorption) to the NRLMSIS database has directly impacted the new model, increasing the complexity of the model's formulation and generally reducing the thermospheric O2 density relative to MSISE-90. Beyond interest in the thermosphere itself, this issue materially affects detailed models of ionospheric chemistry and dynamics as well as modeling of the upper atmospheric airglow. Because these are key elements of both experimental and operational systems which measure and forecast the near-Earth space environment, we present strategies for augmenting the database through analysis of existing data and through future measurements in order to resolve this issue.
IAC - INTEGRATED ANALYSIS CAPABILITY
NASA Technical Reports Server (NTRS)
Frisch, H. P.
1994-01-01
The objective of the Integrated Analysis Capability (IAC) system is to provide a highly effective, interactive analysis tool for the integrated design of large structures. With the goal of supporting the unique needs of engineering analysis groups concerned with interdisciplinary problems, IAC was developed to interface programs from the fields of structures, thermodynamics, controls, and system dynamics with an executive system and database to yield a highly efficient multi-disciplinary system. Special attention is given to user requirements such as data handling and on-line assistance with operational features, and the ability to add new modules of the user's choice at a future date. IAC contains an executive system, a data base, general utilities, interfaces to various engineering programs, and a framework for building interfaces to other programs. IAC has shown itself to be effective in automatic data transfer among analysis programs. IAC 2.5, designed to be compatible as far as possible with Level 1.5, contains a major upgrade in executive and database management system capabilities, and includes interfaces to enable thermal, structures, optics, and control interaction dynamics analysis. The IAC system architecture is modular in design. 1) The executive module contains an input command processor, an extensive data management system, and driver code to execute the application modules. 2) Technical modules provide standalone computational capability as well as support for various solution paths or coupled analyses. 3) Graphics and model generation interfaces are supplied for building and viewing models. Advanced graphics capabilities are provided within particular analysis modules such as INCA and NASTRAN. 4) Interface modules provide for the required data flow between IAC and other modules. 5) User modules can be arbitrary executable programs or JCL procedures with no pre-defined relationship to IAC. 6) Special purpose modules are included, such as MIMIC (Model Integration via Mesh Interpolation Coefficients), which transforms field values from one model to another; LINK, which simplifies incorporation of user specific modules into IAC modules; and DATAPAC, the National Bureau of Standards statistical analysis package. The IAC database contains structured files which provide a common basis for communication between modules and the executive system, and can contain unstructured files such as NASTRAN checkpoint files, DISCOS plot files, object code, etc. The user can define groups of data and relations between them. A full data manipulation and query system operates with the database. The current interface modules comprise five groups: 1) Structural analysis - IAC contains a NASTRAN interface for standalone analysis or certain structural/control/thermal combinations. IAC provides enhanced structural capabilities for normal modes and static deformation analysis via special DMAP sequences. IAC 2.5 contains several specialized interfaces from NASTRAN in support of multidisciplinary analysis. 2) Thermal analysis - IAC supports finite element and finite difference techniques for steady state or transient analysis. There are interfaces for the NASTRAN thermal analyzer, SINDA/SINFLO, and TRASYS II. FEMNET, which converts finite element structural analysis models to finite difference thermal analysis models, is also interfaced with the IAC database. 
3) System dynamics - The DISCOS simulation program which allows for either nonlinear time domain analysis or linear frequency domain analysis, is fully interfaced to the IAC database management capability. 4) Control analysis - Interfaces for the ORACLS, SAMSAN, NBOD2, and INCA programs allow a wide range of control system analyses and synthesis techniques. Level 2.5 includes EIGEN, which provides tools for large order system eigenanalysis, and BOPACE, which allows for geometric capabilities and finite element analysis with nonlinear material. Also included in IAC level 2.5 is SAMSAN 3.1, an engineering analysis program which contains a general purpose library of over 600 subroutines for numerical analysis. 5) Graphics - The graphics package IPLOT is included in IAC. IPLOT generates vector displays of tabular data in the form of curves, charts, correlation tables, etc. Either DI3000 or PLOT-10 graphics software is required for full graphic capability. In addition to these analysis tools, IAC 2.5 contains an IGES interface which allows the user to read arbitrary IGES files into an IAC database and to edit and output new IGES files. IAC is available by license for a period of 10 years to approved U.S. licensees. The licensed program product includes one set of supporting documentation. Additional copies may be purchased separately. IAC is written in FORTRAN 77 and has been implemented on a DEC VAX series computer operating under VMS. IAC can be executed by multiple concurrent users in batch or interactive mode. The program is structured to allow users to easily delete those program capabilities and "how to" examples they do not want in order to reduce the size of the package. The basic central memory requirement for IAC is approximately 750KB. The following programs are also available from COSMIC as separate packages: NASTRAN, SINDA/SINFLO, TRASYS II, DISCOS, ORACLS, SAMSAN, NBOD2, and INCA. The development of level 2.5 of IAC was completed in 1989.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, X; Fatyga, M; Vora, S
Purpose: To determine if differences in patient positioning methods have an impact on the incidence and modeling of grade >=2 acute rectal toxicity in prostate cancer patients who were treated with Intensity Modulated Radiation Therapy (IMRT). Methods: We compared two databases of patients treated with radiation therapy for prostate cancer: a database of 79 patients who were treated with 7-field IMRT and daily image-guided positioning based on implanted gold markers (IGRTdb), and a database of 302 patients who were treated with 5-field IMRT and daily positioning using a trans-abdominal ultrasound system (USdb). Complete planning dosimetry was available for IGRTdb patients while limited planning dosimetry, recorded at the time of planning, was available for USdb patients. We fit the Lyman-Kutcher-Burman (LKB) model to IGRTdb only, and a Univariate Logistic Regression (ULR) NTCP model to both databases. We performed Receiver Operating Characteristic (ROC) analysis to determine the predictive power of the NTCP models. Results: The incidence of grade >= 2 acute rectal toxicity in IGRTdb was 20%, while the incidence in USdb was 54%. Fits of both LKB and ULR models yielded predictive NTCP models for IGRTdb patients with Area Under the Curve (AUC) in the 0.63 – 0.67 range. Extrapolation of the ULR model from IGRTdb to planning dosimetry in USdb predicts that the incidence of acute rectal toxicity in USdb should not exceed 40%. Fits of the ULR model to the USdb do not yield predictive NTCP models and their AUC is consistent with AUC = 0.5. Conclusion: Accuracy of a patient positioning system affects clinically observed toxicity rates and the quality of NTCP models that can be derived from toxicity data. Poor correlation between planned and clinically delivered dosimetry may lead to erroneous or poorly performing NTCP models, even if the number of patients in a database is large.
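The ULR NTCP fit and ROC assessment mentioned above can be outlined with scikit-learn. The cohort, the dose metric and the toxicity outcomes in this sketch are synthetic placeholders, not the study's data:

```python
# Univariate logistic regression NTCP model plus ROC analysis on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 300
dose_metric = rng.normal(60.0, 8.0, size=n)            # e.g. a rectal DVH summary (Gy)
p_true = 1.0 / (1.0 + np.exp(-(dose_metric - 65.0) / 4.0))
toxicity = rng.binomial(1, p_true)                      # grade >= 2 toxicity indicator

X = dose_metric.reshape(-1, 1)
model = LogisticRegression().fit(X, toxicity)
ntcp = model.predict_proba(X)[:, 1]                     # predicted complication probability
auc = roc_auc_score(toxicity, ntcp)
print(f"ULR NTCP fit: slope = {model.coef_[0][0]:.3f}, AUC = {auc:.2f}")
```

An AUC near 0.5, as reported for the ultrasound-positioned cohort, would indicate that the fitted dose metric carries essentially no predictive information about the observed toxicity.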
1993-07-09
Calculate Oil and solve iteratively equation (18) for q and (1)-(5) for ex. 4. Solve the velocity problem through equation (19) to calculate q and (6)-(10) to... object-oriented models for the database to store the system information [1]. Using OOP on the formalism level is more difficult and a current field of... Multidimensional Physical Systems: Graph-theoretic Modeling, Systems and Cybernetics, vol. 21 (1992), 59-71. A RELATIONAL DATABASE FOR GENERAL...
Mysql Data-Base Applications for Dst-Like Physics Analysis
NASA Astrophysics Data System (ADS)
Tsenov, Roumen
2004-07-01
The data and analysis model developed and being used in the HARP experiment for studying hadron production at CERN Proton Synchrotron is discussed. Emphasis is put on usage of data-base (DB) back-ends for persistent storing and retrieving "alive" C++ objects encapsulating raw and reconstructed data. Concepts of "Data Summary Tape" (DST) as a logical collection of DB-persistent data of different types, and of "intermediate DST" (iDST) as a physical "tag" of DST, are introduced. iDST level of persistency allows a powerful, DST-level of analysis to be performed by applications running on an isolated machine (even laptop) with no connection to the experiment's main data storage. Implementation of these concepts is considered.
dbHiMo: a web-based epigenomics platform for histone-modifying enzymes
Choi, Jaeyoung; Kim, Ki-Tae; Huh, Aram; Kwon, Seomun; Hong, Changyoung; Asiegbu, Fred O.; Jeon, Junhyun; Lee, Yong-Hwan
2015-01-01
Over the past two decades, epigenetics has evolved into a key concept for understanding regulation of gene expression. Among many epigenetic mechanisms, covalent modifications such as acetylation and methylation of lysine residues on core histones emerged as a major mechanism in epigenetic regulation. Here, we present the database for histone-modifying enzymes (dbHiMo; http://hme.riceblast.snu.ac.kr/) aimed at facilitating functional and comparative analysis of histone-modifying enzymes (HMEs). HMEs were identified by applying a search pipeline built upon profile hidden Markov models (HMMs) to proteomes. The database incorporates 11,576 HMEs identified from 603 proteomes including 483 fungal, 32 plant and 51 metazoan species. dbHiMo provides users with web-based personalized data browsing and analysis tools, supporting comparative and evolutionary genomics. With comprehensive data entries and associated web-based tools, our database will be a valuable resource for future epigenetics/epigenomics studies. Database URL: http://hme.riceblast.snu.ac.kr/ PMID:26055100
Encryption Characteristics of Two USB-based Personal Health Record Devices
Wright, Adam; Sittig, Dean F.
2007-01-01
Personal health records (PHRs) hold great promise for empowering patients and increasing the accuracy and completeness of health information. We reviewed two small USB-based PHR devices that allow a patient to easily store and transport their personal health information. Both devices offer password protection and encryption features. Analysis of the devices shows that they store their data in a Microsoft Access database. Due to a flaw in the encryption of this database, recovering the user’s password can be accomplished with minimal effort. Our analysis also showed that, rather than encrypting health information with the password chosen by the user, the devices stored the user’s password as a string in the database and then encrypted that database with a common password set by the manufacturer. This is another serious vulnerability. This article describes the weaknesses we discovered, outlines three critical flaws with the security model used by the devices, and recommends four guidelines for improving the security of similar devices. PMID:17460132
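The fix implied by the article's recommendations is to derive the encryption key from the user's own password, so that the password itself never needs to be stored, rather than storing the password and encrypting with a vendor-wide key. A minimal sketch of the key-derivation step, using generic cryptographic practice rather than anything specific to the reviewed devices, is:

```python
# Contrast between the flawed pattern reported above and the usual alternative:
# derive the encryption key from the user's password and never store the
# password. Key derivation only; the cipher step is omitted.
import hashlib
import os

# --- flawed pattern (schematic) ----------------------------------------------
# db["user_password"] = "hunter2"          # secret stored in the clear
# encrypt(db, key=VENDOR_WIDE_PASSWORD)    # same key on every device

# --- recommended pattern ------------------------------------------------------
def derive_key(password: str, salt: bytes, iterations: int = 200_000) -> bytes:
    """Per-user key from the password via PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)

salt = os.urandom(16)                      # stored alongside the ciphertext
key = derive_key("correct horse battery staple", salt)
# 'key' would feed an authenticated cipher (e.g. AES-GCM); the password itself
# is never written to the device.
print(len(key), "byte key derived; password not stored")
```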
NASA Astrophysics Data System (ADS)
Ferroud, Anouck; Chesnaux, Romain; Rafini, Silvain
2018-01-01
The flow dimension parameter n, derived from the Generalized Radial Flow model, is a valuable tool to investigate the actual flow regimes that really occur during a pumping test rather than suppose them to be radial, as postulated by the Theis-derived models. A numerical approach has shown that, when the flow dimension is not radial, using the derivative analysis rather than the conventional Theis and Cooper-Jacob methods helps to estimate much more accurately the hydraulic conductivity of the aquifer. Although n has been analysed in numerous studies including field-based studies, there is a striking lack of knowledge about its occurrence in nature and how it may be related to the hydrogeological setting. This study provides an overview of the occurrence of n in natural aquifers located in various geological contexts including crystalline rock, carbonate rock and granular aquifers. A comprehensive database is compiled from governmental and industrial sources, based on 69 constant-rate pumping tests. By means of a sequential analysis approach, we systematically performed a flow dimension analysis in which straight segments on drawdown-log derivative time series are interpreted as successive, specific and independent flow regimes. To reduce the uncertainties inherent in the identification of n sequences, we used the proprietary SIREN code to execute a dual simultaneous fit on both the drawdown and the drawdown-log derivative signals. Using the stated database, we investigate the frequency with which the radial and non-radial flow regimes occur in fractured rock and granular aquifers, and also provide outcomes that indicate the lack of applicability of Theis-derived models in representing nature. The results also emphasize the complexity of hydraulic signatures observed in nature by pointing out n sequential signals and non-integer n values that are frequently observed in the database.
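The derivative analysis at the heart of the method can be illustrated in a few lines: for the GRF model the drawdown log-derivative follows $t^{1-n/2}$ along a given flow regime, so the flow dimension is read from the slope of a straight segment on the log-log derivative plot. The sketch below uses an idealised synthetic drawdown and numpy only; the proprietary SIREN dual fit used by the authors is not reproduced.

```python
# Derivative-based flow-dimension estimate: compute the Bourdet-style
# log-derivative ds/d(ln t) of a drawdown record and read n from the slope of
# its log-log straight segment (n = 2 * (1 - slope) under the GRF model).
import numpy as np

t = np.logspace(0, 4, 80)                 # time (s)
n_true = 1.5
s = 0.8 * t ** (1.0 - n_true / 2.0)       # idealised GRF late-time drawdown shape

ds_dlnt = np.gradient(s, np.log(t))       # log-derivative of drawdown
slope, _ = np.polyfit(np.log10(t[20:]), np.log10(ds_dlnt[20:]), 1)
n_est = 2.0 * (1.0 - slope)
print(f"estimated flow dimension n = {n_est:.2f}")
```

A constant derivative (slope 0) recovers the radial case n = 2 assumed by Theis-type models; non-zero slopes correspond to the non-radial regimes the study reports as common in nature.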
Karagiannidou, Maria; Wittenberg, Raphael; Landeiro, Filipa Isabel Trigo; Park, A-La; Fry, Andra; Knapp, Martin; Tockhorn-Heidenreich, Antje; Castro Sanchez, Amparo Yovanna; Ghinai, Isaac; Handels, Ron; Lecomte, Pascal; Wolstenholme, Jane
2018-01-01
Introduction Dementia is one of the greatest health challenges the world will face in the coming decades, as it is one of the principal causes of disability and dependency among older people. Economic modelling is used widely across many health conditions to inform decisions on health and social care policy and practice. The aim of this literature review is to systematically identify, review and critically evaluate existing health economics models in dementia. We included the full spectrum of dementia, including Alzheimer’s disease (AD), from preclinical stages through to severe dementia and end of life. This review forms part of the Real world Outcomes across the Alzheimer’s Disease spectrum for better care: multimodal data Access Platform (ROADMAP) project. Methods and analysis Electronic searches were conducted in Medical Literature Analysis and Retrieval System Online, Excerpta Medica dataBASE, Economic Literature Database, NHS Economic Evaluation Database, Cochrane Central Register of Controlled Trials, Cost-Effectiveness Analysis Registry, Research Papers in Economics, Database of Abstracts of Reviews of Effectiveness, Science Citation Index, Turning Research Into Practice and Open Grey for studies published between January 2000 and the end of June 2017. Two reviewers will independently assess each study against predefined eligibility criteria. A third reviewer will resolve any disagreement. Data will be extracted using a predefined data extraction form following best practice. Study quality will be assessed using the Phillips checklist for decision analytic modelling. A narrative synthesis will be used. Ethics and dissemination The results will be made available in a scientific peer-reviewed journal paper, will be presented at relevant conferences and will also be made available through the ROADMAP project. PROSPERO registration number CRD42017073874. PMID:29884696
Information model construction of MES oriented to mechanical blanking workshop
NASA Astrophysics Data System (ADS)
Wang, Jin-bo; Wang, Jin-ye; Yue, Yan-fang; Yao, Xue-min
2016-11-01
The Manufacturing Execution System (MES) is one of the crucial technologies for implementing informatization management in manufacturing enterprises, and the construction of its information model is the basis of MES database development. Based on an analysis of the manufacturing process information in a mechanical blanking workshop and the information requirements of each MES function module, the IDEF1X method was adopted to construct the information model of an MES oriented to the mechanical blanking workshop. A detailed description of the data structures included in each MES function module and their logical relationships was given from the point of view of information relationships, which lays the foundation for the design of the MES database.
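As an illustration of how such an IDEF1X information model typically translates into a physical schema, the following sketch creates a few hypothetical MES tables in SQLite; the entity and attribute names are invented for the example and are not taken from the paper.

```python
# Hypothetical fragment of a blanking-workshop MES schema derived from an
# entity/relationship model: orders, blanking operations and quality records.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE work_order (
    order_id      INTEGER PRIMARY KEY,
    part_number   TEXT NOT NULL,
    quantity      INTEGER NOT NULL,
    due_date      TEXT
);
CREATE TABLE blanking_operation (
    operation_id  INTEGER PRIMARY KEY,
    order_id      INTEGER NOT NULL REFERENCES work_order(order_id),
    machine_id    TEXT NOT NULL,
    sheet_batch   TEXT,
    start_time    TEXT,
    end_time      TEXT
);
CREATE TABLE quality_record (
    record_id     INTEGER PRIMARY KEY,
    operation_id  INTEGER NOT NULL REFERENCES blanking_operation(operation_id),
    defect_code   TEXT,
    measured_thickness REAL
);
""")
conn.execute("INSERT INTO work_order VALUES (1, 'BRKT-100', 500, '2016-12-01')")
conn.commit()
```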
Tribology and Friction of Soft Materials: Mississippi State Case Study
2010-03-18
elastomers, foams, and fabrics. B. Develop internal state variable (ISV) material model. Model will be calibrated using the database and verified... Rubbers: natural rubber, Santoprene (vulcanized elastomer), styrene butadiene rubber (SBR); Foams: polypropylene foam, polyurethane foam; Fabrics: Kevlar... (briefing-slide labels: axially symmetric model, PC disk, numerical implementation in FEM codes, experiment, SEM, optical methods, ISV model, void nucleation, FEM analysis)
The Watershed Health Assessment Tools-Investigating Fisheries (WHAT-IF) is a decision-analysis modeling toolkit for personal computers that supports watershed and fisheries management. The WHAT-IF toolkit includes a relational database, help-system functions and documentation, a...
Second-Tier Database for Ecosystem Focus, 2003-2004 Annual Report.
DOE Office of Scientific and Technical Information (OSTI.GOV)
University of Washington, Columbia Basin Research, DART Project Staff,
2004-12-01
The Second-Tier Database for Ecosystem Focus (Contract 00004124) provides direct and timely public access to Columbia Basin environmental, operational, fishery and riverine data resources for federal, state, public and private entities essential to sound operational and resource management. The database also assists with juvenile and adult mainstem passage modeling supporting federal decisions affecting the operation of the FCRPS. The Second-Tier Database, known as Data Access in Real Time (DART), integrates public data for effective access, consideration and application. DART also provides analysis tools and performance measures for evaluating the condition of Columbia Basin salmonid stocks. These services are critical to BPA's implementation of its fish and wildlife responsibilities under the Endangered Species Act (ESA).
Fragment virtual screening based on Bayesian categorization for discovering novel VEGFR-2 scaffolds.
Zhang, Yanmin; Jiao, Yu; Xiong, Xiao; Liu, Haichun; Ran, Ting; Xu, Jinxing; Lu, Shuai; Xu, Anyang; Pan, Jing; Qiao, Xin; Shi, Zhihao; Lu, Tao; Chen, Yadong
2015-11-01
The discovery of novel scaffolds against a specific target has long been one of the most significant but challenging goals in discovering lead compounds. A scaffold that binds in important regions of the active pocket is more favorable as a starting point because scaffolds generally possess greater optimization possibilities. However, due to the lack of sufficient chemical space diversity of the databases and the ineffectiveness of the screening methods, it still remains a great challenge to discover novel active scaffolds. Given the strengths and weaknesses of both fragment-based drug design and traditional virtual screening (VS), we proposed a fragment VS concept based on Bayesian categorization for the discovery of novel scaffolds. This work investigated the proposal through an application to the VEGFR-2 target. Firstly, the scaffold and structural diversity of chemical space for 10 compound databases were explicitly evaluated. Simultaneously, a robust Bayesian classification model was constructed for screening not only compound databases but also their corresponding fragment databases. Although analysis of the scaffold diversity demonstrated a very uneven distribution of scaffolds over molecules, results showed that our Bayesian model behaved better in screening fragments than molecules. Through a retrospective literature search, several generated fragments with relatively high Bayesian scores indeed exhibited VEGFR-2 biological activity, which strongly proved the effectiveness of fragment VS based on Bayesian categorization models. This investigation of Bayesian-based fragment VS further emphasizes the necessity of enriching the compound databases employed in lead discovery by amplifying the diversity of databases with novel structures.
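The core of the screening step, Bayesian categorization over binary structural fingerprints, can be sketched with scikit-learn. The fingerprints below are random stand-ins for real structural keys, and the model is a generic Bernoulli naive Bayes rather than the authors' exact Bayesian implementation:

```python
# Bayesian categorization over binary fragment fingerprints: train on actives
# and inactives, then rank a fragment library by its log-odds of activity.
import numpy as np
from sklearn.naive_bayes import BernoulliNB

rng = np.random.default_rng(7)
n_bits = 256
X_active = rng.binomial(1, 0.25, size=(150, n_bits))     # fingerprints of actives
X_inactive = rng.binomial(1, 0.15, size=(600, n_bits))   # fingerprints of inactives
X = np.vstack([X_active, X_inactive])
y = np.array([1] * 150 + [0] * 600)

clf = BernoulliNB().fit(X, y)

# Score a "fragment database": rank fragments by their log-odds of activity.
X_fragments = rng.binomial(1, 0.2, size=(1000, n_bits))
log_proba = clf.predict_log_proba(X_fragments)
scores = log_proba[:, 1] - log_proba[:, 0]
top = np.argsort(scores)[::-1][:10]
print("indices of top-scoring fragments:", top)
```

The top-ranked fragments would then pass to docking and 3D QSAR prediction, as in the workflow described above.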
The LANL hemorrhagic fever virus database, a new platform for analyzing biothreat viruses
Kuiken, Carla; Thurmond, Jim; Dimitrijevic, Mira; Yoon, Hyejin
2012-01-01
Hemorrhagic fever viruses (HFVs) are a diverse set of over 80 viral species, found in 10 different genera comprising five different families: arena-, bunya-, flavi-, filo- and togaviridae. All these viruses are highly variable and evolve rapidly, making them elusive targets for the immune system and for vaccine and drug design. About 55 000 HFV sequences exist in the public domain today. A central website that provides annotated sequences and analysis tools will be helpful to HFV researchers worldwide. The HFV sequence database collects and stores sequence data and provides a user-friendly search interface and a large number of sequence analysis tools, following the model of the highly regarded and widely used Los Alamos HIV database [Kuiken, C., B. Korber, and R.W. Shafer, HIV sequence databases. AIDS Rev, 2003. 5: p. 52–61]. The database uses an algorithm that aligns each sequence to a species-wide reference sequence. The NCBI RefSeq database [Sayers et al. (2011) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res., 39, D38–D51.] is used for this; if a reference sequence is not available, a Blast search finds the best candidate. Using this method, sequences in each genus can be retrieved pre-aligned. The HFV website can be accessed via http://hfv.lanl.gov. PMID:22064861
ERIC Educational Resources Information Center
Pozo, Pablo; Grao-Cruces, Alberto; Pérez-Ordás, Raquel
2018-01-01
The purpose of this study was to conduct a review of research on the Teaching Personal and Social Responsibility model-based programme within physical education. Papers selected for analysis were found through searches of Web of Science, SportDiscus (EBSCO), SCOPUS, and ERIC (ProQuest) databases. The keywords "responsibility model" and…
GénoPlante-Info (GPI): a collection of databases and bioinformatics resources for plant genomics
Samson, Delphine; Legeai, Fabrice; Karsenty, Emmanuelle; Reboux, Sébastien; Veyrieras, Jean-Baptiste; Just, Jeremy; Barillot, Emmanuel
2003-01-01
Génoplante is a partnership program between public French institutes (INRA, CIRAD, IRD and CNRS) and private companies (Biogemma, Bayer CropScience and Bioplante) that aims at developing genome analysis programs for crop species (corn, wheat, rapeseed, sunflower and pea) and model plants (Arabidopsis and rice). The outputs of these programs form a wealth of information (genomic sequence, transcriptome, proteome, allelic variability, mapping and synteny, and mutation data) and tools (databases, interfaces, analysis software), that are being integrated and made public at the public bioinformatics resource centre of Génoplante: GénoPlante-Info (GPI). This continuous flood of data and tools is regularly updated and will grow continuously during the coming two years. Access to the GPI databases and tools is available at http://genoplante-info.infobiogen.fr/. PMID:12519976
CycADS: an annotation database system to ease the development and update of BioCyc databases
Vellozo, Augusto F.; Véron, Amélie S.; Baa-Puyoulet, Patrice; Huerta-Cepas, Jaime; Cottret, Ludovic; Febvay, Gérard; Calevro, Federica; Rahbé, Yvan; Douglas, Angela E.; Gabaldón, Toni; Sagot, Marie-France; Charles, Hubert; Colella, Stefano
2011-01-01
In recent years, genomes from an increasing number of organisms have been sequenced, but their annotation remains a time-consuming process. The BioCyc databases offer a framework for the integrated analysis of metabolic networks. The Pathway Tools software suite allows the automated construction of a database starting from an annotated genome, but it requires prior integration of all annotations into a specific summary file or into a GenBank file. To allow the easy creation and update of a BioCyc database starting from the multiple genome annotation resources available over time, we have developed an ad hoc data management system that we called the Cyc Annotation Database System (CycADS). CycADS is centred on a specific database model and on a set of Java programs to import, filter and export relevant information. Data from GenBank and other annotation sources (including for example: KAAS, PRIAM, Blast2GO and PhylomeDB) are collected into a database to be subsequently filtered and extracted to generate a complete annotation file. This file is then used to build an enriched BioCyc database using the PathoLogic program of Pathway Tools. The CycADS pipeline for annotation management was used to build the AcypiCyc database for the pea aphid (Acyrthosiphon pisum) whose genome was recently sequenced. The AcypiCyc database webpage also includes, for comparative analyses, two other metabolic reconstruction BioCyc databases generated using CycADS: TricaCyc for Tribolium castaneum and DromeCyc for Drosophila melanogaster. Owing to its flexible design, CycADS offers a powerful software tool for the generation and regular updating of enriched BioCyc databases. The CycADS system is particularly suited for metabolic gene annotation and network reconstruction in newly sequenced genomes. Because of the uniform annotation used for metabolic network reconstruction, CycADS is particularly useful for comparative analysis of the metabolism of different organisms. Database URL: http://www.cycadsys.org PMID:21474551
Effects of exposure to malathion on blood glucose concentration: a meta-analysis.
Ramirez-Vargas, Marco Antonio; Flores-Alfaro, Eugenia; Uriostegui-Acosta, Mayrut; Alvarez-Fitz, Patricia; Parra-Rojas, Isela; Moreno-Godinez, Ma Elena
2018-02-01
Exposure to malathion (an organophosphate pesticide widely used around the world) has been associated with alterations in blood glucose concentration in animal models. However, the results are inconsistent. The aim of this meta-analysis was to evaluate whether malathion exposure can disturb the concentrations of blood glucose in exposed rats. We performed a literature search of online databases including PubMed, EBSCO, and Google Scholar and reviewed original articles that analyzed the relation between malathion exposure and glucose levels in animal models. The selection of articles was based on inclusion and exclusion criteria. The database search identified thirty-five possible articles, but only eight fulfilled our inclusion criteria, and these studies were included in the meta-analysis. The effect of malathion on blood glucose concentration showed a non-monotonic dose-response curve. In addition, pooled analysis showed that blood glucose concentrations were 3.3-fold higher in exposed rats than in the control group (95% CI, 2-5; Z = 3.9; p < 0.0001) in a random-effect model. This result suggested that alteration of glucose homeostasis is a possible mechanism of toxicity associated with exposure to malathion.
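The pooled estimate quoted above comes from a random-effects model; a standard DerSimonian-Laird computation of that kind looks like the following numpy sketch, with invented study-level numbers standing in for the eight included studies:

```python
# DerSimonian-Laird random-effects pooling of per-study effect sizes.
import numpy as np

y = np.array([1.8, 3.5, 2.6, 4.1, 2.9])     # study effect estimates (e.g. fold change)
se = np.array([0.6, 0.9, 0.7, 1.1, 0.8])    # their standard errors

w = 1.0 / se**2                              # fixed-effect (inverse-variance) weights
y_fe = np.sum(w * y) / np.sum(w)
Q = np.sum(w * (y - y_fe) ** 2)              # Cochran's Q heterogeneity statistic
k = len(y)
tau2 = max(0.0, (Q - (k - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

w_re = 1.0 / (se**2 + tau2)                  # random-effects weights
y_re = np.sum(w_re * y) / np.sum(w_re)
se_re = np.sqrt(1.0 / np.sum(w_re))
ci = (y_re - 1.96 * se_re, y_re + 1.96 * se_re)
z = y_re / se_re
print(f"pooled effect {y_re:.2f} (95% CI {ci[0]:.2f}-{ci[1]:.2f}), Z = {z:.1f}")
```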
eXtended CASA Line Analysis Software Suite (XCLASS)
NASA Astrophysics Data System (ADS)
Möller, T.; Endres, C.; Schilke, P.
2017-02-01
The eXtended CASA Line Analysis Software Suite (XCLASS) is a toolbox for the Common Astronomy Software Applications package (CASA) containing new functions for modeling interferometric and single dish data. Among the tools is the myXCLASS program which calculates synthetic spectra by solving the radiative transfer equation for an isothermal object in one dimension, whereas the finite source size and dust attenuation are considered as well. Molecular data required by the myXCLASS program are taken from an embedded SQLite3 database containing entries from the Cologne Database for Molecular Spectroscopy (CDMS) and JPL using the Virtual Atomic and Molecular Data Center (VAMDC) portal. Additionally, the toolbox provides an interface for the model optimizer package Modeling and Analysis Generic Interface for eXternal numerical codes (MAGIX), which helps to find the best description of observational data using myXCLASS (or another external model program), that is, finding the parameter set that most closely reproduces the data. http://www.astro.uni-koeln.de/projects/schilke/myXCLASSInterface A copy of the code is available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/598/A7
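The radiative transfer problem myXCLASS solves for an isothermal object has a simple closed form in one dimension; written in textbook notation (beam filling and multiple components, which the tool also handles, are omitted here), it reads:

```latex
% Formal solution for an isothermal medium of excitation temperature T_ex seen
% against a background I_bg; tau_nu includes the dust contribution.
I_\nu \;=\; I_{\nu,\mathrm{bg}}\, e^{-\tau_\nu}
        \;+\; B_\nu\!\left(T_\mathrm{ex}\right)\left(1 - e^{-\tau_\nu}\right)
```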
Fonseca, Carissa G; Backhaus, Michael; Bluemke, David A; Britten, Randall D; Chung, Jae Do; Cowan, Brett R; Dinov, Ivo D; Finn, J Paul; Hunter, Peter J; Kadish, Alan H; Lee, Daniel C; Lima, Joao A C; Medrano-Gracia, Pau; Shivkumar, Kalyanam; Suinesiaputra, Avan; Tao, Wenchao; Young, Alistair A
2011-08-15
Integrative mathematical and statistical models of cardiac anatomy and physiology can play a vital role in understanding cardiac disease phenotype and planning therapeutic strategies. However, the accuracy and predictive power of such models is dependent upon the breadth and depth of noninvasive imaging datasets. The Cardiac Atlas Project (CAP) has established a large-scale database of cardiac imaging examinations and associated clinical data in order to develop a shareable, web-accessible, structural and functional atlas of the normal and pathological heart for clinical, research and educational purposes. A goal of CAP is to facilitate collaborative statistical analysis of regional heart shape and wall motion and characterize cardiac function among and within population groups. Three main open-source software components were developed: (i) a database with web-interface; (ii) a modeling client for 3D + time visualization and parametric description of shape and motion; and (iii) open data formats for semantic characterization of models and annotations. The database was implemented using a three-tier architecture utilizing MySQL, JBoss and Dcm4chee, in compliance with the DICOM standard to provide compatibility with existing clinical networks and devices. Parts of Dcm4chee were extended to access image specific attributes as search parameters. To date, approximately 3000 de-identified cardiac imaging examinations are available in the database. All software components developed by the CAP are open source and are freely available under the Mozilla Public License Version 1.1 (http://www.mozilla.org/MPL/MPL-1.1.txt). http://www.cardiacatlas.org a.young@auckland.ac.nz Supplementary data are available at Bioinformatics online.
An integrated approach to reservoir modeling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Donaldson, K.
1993-08-01
The purpose of this research is to evaluate the usefulness of the following procedural and analytical methods in investigating the heterogeneity of the oil reserve for the Mississippian Big Injun Sandstone of the Granny Creek field, Clay and Roane counties, West Virginia: (1) relational database, (2) two-dimensional cross sections, (3) true three-dimensional modeling, (4) geohistory analysis, (5) a rule-based expert system, and (6) geographical information systems. The large data set could not be effectively integrated and interpreted without this approach. A relational database was designed to fully integrate three- and four-dimensional data. The database provides an effective means for maintaining and manipulating the data. A two-dimensional cross section program was designed to correlate stratigraphy, depositional environments, porosity, permeability, and petrographic data. This flexible design allows for additional four-dimensional data. Dynamic Graphics™ ...
NASA Astrophysics Data System (ADS)
Cheng, Tao; Rivard, Benoit; Sánchez-Azofeifa, Arturo G.; Féret, Jean-Baptiste; Jacquemoud, Stéphane; Ustin, Susan L.
2014-01-01
Leaf mass per area (LMA), the ratio of leaf dry mass to leaf area, is a trait of central importance to the understanding of plant light capture and carbon gain. It can be estimated from leaf reflectance spectroscopy in the infrared region, by making use of information about the absorption features of dry matter. This study reports on the application of continuous wavelet analysis (CWA) to the estimation of LMA across a wide range of plant species. We compiled a large database of leaf reflectance spectra acquired within the framework of three independent measurement campaigns (ANGERS, LOPEX and PANAMA) and generated a simulated database using the PROSPECT leaf optical properties model. CWA was applied to the measured and simulated databases to extract wavelet features that correlate with LMA. These features were assessed in terms of predictive capability and robustness while transferring predictive models from the simulated database to the measured database. The assessment was also conducted with two existing spectral indices, namely the Normalized Dry Matter Index (NDMI) and the Normalized Difference index for LMA (NDLMA). Five common wavelet features were determined from the two databases, which showed significant correlations with LMA (R2: 0.51-0.82, p < 0.0001). The best robustness (R2 = 0.74, RMSE = 18.97 g/m2 and Bias = 0.12 g/m2) was obtained using a combination of two low-scale features (1639 nm, scale 4) and (2133 nm, scale 5), the first being predominantly important. The transferability of the wavelet-based predictive model to the whole measured database was either better than or comparable to those based on spectral indices. Additionally, only the wavelet-based model showed consistent predictive capabilities among the three measured data sets. In comparison, the models based on spectral indices were sensitive to site-specific data sets. Integrating the NDLMA spectral index and the two robust wavelet features improved the LMA prediction. One of the bands used by this spectral index, 1368 nm, was located in a strong atmospheric water absorption region and replacing it with the next available band (1340 nm) led to lower predictive accuracies. However, the two wavelet features were not affected by data quality in the atmospheric absorption regions and therefore showed potential for canopy-level investigations. The wavelet approach provides a different perspective into spectral responses to LMA variation than the traditional spectral indices and holds greater promise for implementation with airborne or spaceborne imaging spectroscopy data for mapping canopy foliar dry biomass.
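The wavelet-feature step can be sketched with PyWavelets, which is an assumed stand-in for whatever CWT implementation the authors used; the spectra, the LMA values and the selected (wavelength, scale) feature below are simulated for illustration only:

```python
# Continuous wavelet transform of reflectance spectra, followed by correlating
# one (wavelength, scale) coefficient with LMA across samples.
import numpy as np
import pywt

rng = np.random.default_rng(3)
wavelengths = np.arange(1000, 2400)                 # nm
n_samples = 40
lma = rng.uniform(40, 160, n_samples)               # g/m^2

# toy spectra: a dry-matter absorption dip near 2100 nm deepening with LMA
spectra = 0.45 - 0.001 * lma[:, None] * np.exp(-((wavelengths - 2100) / 60.0) ** 2) \
          + rng.normal(0, 0.002, (n_samples, wavelengths.size))

scales = [2, 4, 8, 16, 32]
features = []
for s in spectra:
    coefs, _ = pywt.cwt(s, scales, "mexh")          # shape (n_scales, n_wavelengths)
    features.append(coefs[2, np.argmin(np.abs(wavelengths - 2133))])  # scale 8 near 2133 nm
features = np.array(features)

r = np.corrcoef(features, lma)[0, 1]
print(f"correlation of the selected wavelet feature with LMA: r = {r:.2f}")
```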
The Gap Analysis Program (GAP) is a national interagency program that maps the distribution of plant communities and selected animal species and compares these distributions with land stewardship to identify biotic elements at potential risk of endangerment. Acquisition of primar...
Analysis of Patent Activity in the Field of Quantum Information Processing
NASA Astrophysics Data System (ADS)
Winiarczyk, Ryszard; Gawron, Piotr; Miszczak, Jarosław Adam; Pawela, Łukasz; Puchała, Zbigniew
2013-03-01
This paper provides an analysis of patent activity in the field of quantum information processing. Data from the PatentScope database for the years 1993-2011 were used. In order to predict future trends in the number of filed patents, time series models were used.
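A minimal example of the kind of time-series fit involved, using statsmodels on an invented yearly count series and an arbitrarily chosen ARIMA(1,1,0) specification rather than the models actually fitted in the paper:

```python
# Fit a simple time-series model to yearly patent counts and project ahead.
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

years = pd.period_range("1993", "2011", freq="Y")
counts = pd.Series(
    [3, 5, 4, 8, 11, 15, 14, 22, 27, 30, 41, 45, 52, 63, 70, 78, 85, 96, 104],
    index=years,
)

model = ARIMA(counts, order=(1, 1, 0)).fit()
forecast = model.forecast(steps=5)          # projected filings for the next 5 years
print(forecast.round(1))
```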
Knowlton, Michelle N; Li, Tongbin; Ren, Yongliang; Bill, Brent R; Ellis, Lynda Bm; Ekker, Stephen C
2008-01-07
The zebrafish is a powerful model vertebrate amenable to high-throughput in vivo genetic analyses. Examples include reverse genetic screens using morpholino knockdown, expression-based screening using enhancer trapping and forward genetic screening using transposon insertional mutagenesis. We have created a database to facilitate web-based distribution of data from such genetic studies. The MOrpholino DataBase (MODB) is a MySQL relational database with an online, PHP interface. Multiple quality control levels allow differential access to data in raw and finished formats. MODBv1 includes sequence information relating to almost 800 morpholinos and their targets and phenotypic data regarding the dose effect of each morpholino (mortality, toxicity and defects). To improve the searchability of this database, we have incorporated a fixed-vocabulary defect ontology that allows for the organization of morpholino effects based on anatomical structure affected and defect produced. This also allows comparison between species utilizing Phenotypic Attribute Trait Ontology (PATO) designated terminology. MODB is also cross-linked with ZFIN, allowing full searches between the two databases. MODB offers users the ability to retrieve morpholino data by sequence of morpholino or target, name of target, anatomical structure affected and defect produced. MODB data can be used for functional genomic analysis of morpholino design to maximize efficacy and minimize toxicity. MODB also serves as a template for future sequence-based functional genetic screen databases, and it is currently being used as a model for the creation of a mutagenic insertional transposon database.
Database resources of the National Center for Biotechnology Information.
Sayers, Eric W; Barrett, Tanya; Benson, Dennis A; Bolton, Evan; Bryant, Stephen H; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M; DiCuccio, Michael; Federhen, Scott; Feolo, Michael; Fingerman, Ian M; Geer, Lewis Y; Helmberg, Wolfgang; Kapustin, Yuri; Landsman, David; Lipman, David J; Lu, Zhiyong; Madden, Thomas L; Madej, Tom; Maglott, Donna R; Marchler-Bauer, Aron; Miller, Vadim; Mizrachi, Ilene; Ostell, James; Panchenko, Anna; Phan, Lon; Pruitt, Kim D; Schuler, Gregory D; Sequeira, Edwin; Sherry, Stephen T; Shumway, Martin; Sirotkin, Karl; Slotta, Douglas; Souvorov, Alexandre; Starchenko, Grigory; Tatusova, Tatiana A; Wagner, Lukas; Wang, Yanli; Wilbur, W John; Yaschenko, Eugene; Ye, Jian
2011-01-01
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central (PMC), Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Electronic PCR, OrfFinder, Splign, ProSplign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), IBIS, Biosystems, Peptidome, OMSSA, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
A dynamic appearance descriptor approach to facial actions temporal modeling.
Jiang, Bihan; Valstar, Michel; Martinez, Brais; Pantic, Maja
2014-02-01
Both the configuration and the dynamics of facial expressions are crucial for the interpretation of human facial behavior. Yet to date, the vast majority of reported efforts in the field either do not take the dynamics of facial expressions into account, or focus only on prototypic facial expressions of six basic emotions. Facial dynamics can be explicitly analyzed by detecting the constituent temporal segments in Facial Action Coding System (FACS) Action Units (AUs)-onset, apex, and offset. In this paper, we present a novel approach to explicit analysis of temporal dynamics of facial actions using the dynamic appearance descriptor Local Phase Quantization from Three Orthogonal Planes (LPQ-TOP). Temporal segments are detected by combining a discriminative classifier for detecting the temporal segments on a frame-by-frame basis with Markov Models that enforce temporal consistency over the whole episode. The system is evaluated in detail over the MMI facial expression database, the UNBC-McMaster pain database, the SAL database, the GEMEP-FERA dataset in database-dependent experiments, in cross-database experiments using the Cohn-Kanade, and the SEMAINE databases. The comparison with other state-of-the-art methods shows that the proposed LPQ-TOP method outperforms the other approaches for the problem of AU temporal segment detection, and that overall AU activation detection benefits from dynamic appearance information.
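The combination of a frame-wise classifier with a Markov model for temporal consistency amounts to decoding the most probable state path over the episode. The sketch below uses random scores in place of the LPQ-TOP classifier output and a hand-set transition matrix, so it illustrates the smoothing step only:

```python
# Frame-wise class scores (stand-ins for an AU temporal-segment classifier over
# neutral/onset/apex/offset) smoothed with a Viterbi pass so the decoded
# segment sequence is temporally consistent.
import numpy as np

states = ["neutral", "onset", "apex", "offset"]
rng = np.random.default_rng(0)
T, S = 120, len(states)
log_emissions = np.log(rng.dirichlet(np.ones(S), size=T))   # placeholder classifier scores

# transition matrix favouring self-transitions and the neutral->onset->apex->offset cycle
A = np.full((S, S), 0.02)
np.fill_diagonal(A, 0.9)
for i, j in [(0, 1), (1, 2), (2, 3), (3, 0)]:
    A[i, j] = 0.08
A /= A.sum(axis=1, keepdims=True)
log_A = np.log(A)

# Viterbi decoding
delta = np.zeros((T, S))
psi = np.zeros((T, S), dtype=int)
delta[0] = np.log(1.0 / S) + log_emissions[0]
for t in range(1, T):
    scores = delta[t - 1][:, None] + log_A      # scores[i, j]: best path ending i -> j
    psi[t] = scores.argmax(axis=0)
    delta[t] = scores.max(axis=0) + log_emissions[t]

path = np.zeros(T, dtype=int)
path[-1] = delta[-1].argmax()
for t in range(T - 2, -1, -1):
    path[t] = psi[t + 1, path[t + 1]]
print([states[i] for i in path[:10]])
```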
A Participants' DSS for a Management Game with a DSS Generator.
ERIC Educational Resources Information Center
Yeo, Gee Kin; Nah, Fui Hoon
1992-01-01
Describes the design of a decision support system (DSS) for a management game called MAGNUS (Management Game for National University of Singapore). Built-in models for performance analysis and decision making are explained; database query and model building are described; and future work is discussed. (11 references) (LRW)
HESI EXPOSURE FACTORS DATABASE FOR AGGREGATE AND CUMULATIVE RISK ASSESSMENT
In recent years, the risk analysis community has broadened its use of complex aggregate and cumulative residential exposure models (e.g., to meet the requirements of the 1996 Food Quality Protection Act). The value of these models is their ability to incorporate a range of inp...
NASA Astrophysics Data System (ADS)
Ştefan, Bilaşco; Sanda, Roşca; Ioan, Fodorean; Iuliu, Vescan; Sorin, Filip; Dănuţ, Petrea
2017-12-01
Maramureş Land is mostly characterized by agricultural and forestry land use due to its specific configuration of topography and its specific pedoclimatic conditions. Taking into consideration the trend of the last century from the perspective of land management, a decrease in the surface of agricultural lands to the advantage of built-up and grass lands, as well as an accelerated decrease in the forest cover due to uncontrolled and irrational forest exploitation, has become obvious. The field analysis performed on the territory of Maramureş Land has highlighted a high frequency of two geomorphologic processes — landslides and soil erosion — which have a major negative impact on land use due to their rate of occurrence. The main aim of the present study is the GIS modeling of the two geomorphologic processes, determining a state of vulnerability (the USLE model for soil erosion and a quantitative model based on the morphometric characteristics of the territory, derived from the HG. 447/2003) and their integration in a complex model of cumulated vulnerability identification. The modeling of the risk exposure was performed using a quantitative approach based on models and equations of spatial analysis, which were developed with modeled raster data structures and primary vector data, through a matrix highlighting the correspondence between vulnerability and land use classes. The quantitative analysis of the risk was performed by taking into consideration the exposure classes as modeled databases and the land price as a primary alphanumeric database using spatial analysis techniques for each class by means of the attribute table. The spatial results highlight the territories with a high risk to present geomorphologic processes that have a high degree of occurrence and represent a useful tool in the process of spatial planning.
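The USLE part of the workflow is raster algebra: the factor grids are multiplied cell by cell and the resulting soil-loss surface is binned into vulnerability classes. A numpy sketch with random stand-in grids and illustrative class thresholds (not the paper's calibrated layers) is:

```python
# USLE raster algebra: A = R * K * LS * C * P, then classification into
# vulnerability classes. The factor grids are tiny random stand-ins.
import numpy as np

rng = np.random.default_rng(5)
shape = (100, 100)                         # raster of 100 x 100 cells
R = rng.uniform(60, 120, shape)            # rainfall erosivity
K = rng.uniform(0.1, 0.4, shape)           # soil erodibility
LS = rng.uniform(0.3, 6.0, shape)          # slope length/steepness
C = rng.uniform(0.01, 0.5, shape)          # cover management
P = rng.uniform(0.5, 1.0, shape)           # support practices

A = R * K * LS * C * P                     # soil loss (t/ha/yr)

# classify into vulnerability classes, e.g. low / moderate / high / very high
bounds = [2, 8, 20]
vulnerability = np.digitize(A, bounds)
print(np.bincount(vulnerability.ravel()))  # number of cells per class
```

In the study the analogous classified surface is then combined, through a correspondence matrix, with land-use and land-price layers to express risk in monetary terms.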
NASA Astrophysics Data System (ADS)
Ştefan, Bilaşco; Sanda, Roşca; Ioan, Fodorean; Iuliu, Vescan; Sorin, Filip; Dănuţ, Petrea
2018-06-01
Maramureş Land is mostly characterized by agricultural and forestry land use due to its specific configuration of topography and its specific pedoclimatic conditions. Taking into consideration the trend of the last century from the perspective of land management, a decrease in the surface of agricultural lands to the advantage of built-up and grass lands, as well as an accelerated decrease in the forest cover due to uncontrolled and irrational forest exploitation, has become obvious. The field analysis performed on the territory of Maramureş Land has highlighted a high frequency of two geomorphologic processes — landslides and soil erosion — which have a major negative impact on land use due to their rate of occurrence. The main aim of the present study is the GIS modeling of the two geomorphologic processes, determining a state of vulnerability (the USLE model for soil erosion and a quantitative model based on the morphometric characteristics of the territory, derived from the HG. 447/2003) and their integration in a complex model of cumulated vulnerability identification. The modeling of the risk exposure was performed using a quantitative approach based on models and equations of spatial analysis, which were developed with modeled raster data structures and primary vector data, through a matrix highlighting the correspondence between vulnerability and land use classes. The quantitative analysis of the risk was performed by taking into consideration the exposure classes as modeled databases and the land price as a primary alphanumeric database using spatial analysis techniques for each class by means of the attribute table. The spatial results highlight the territories with a high risk to present geomorphologic processes that have a high degree of occurrence and represent a useful tool in the process of spatial planning.
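A minimal sketch of how such a correspondence-matrix step might look in code, with invented class codes and land prices rather than the study's actual values: a cumulated-vulnerability raster and a land-use raster are combined into exposure classes, which are then weighted by a unit land price.

```python
import numpy as np

# Illustrative combination of a vulnerability raster and a land-use raster via a
# correspondence matrix; all class codes, matrix entries and prices are assumptions.

vulnerability = np.array([[1, 2, 3],
                          [2, 3, 3],
                          [1, 1, 2]])          # 1 = low .. 3 = high
land_use = np.array([[0, 0, 1],
                     [1, 2, 2],
                     [0, 2, 1]])               # 0 = forest, 1 = agriculture, 2 = built-up

# exposure_matrix[vulnerability class - 1, land-use class] -> exposure class (1..4)
exposure_matrix = np.array([[1, 1, 2],
                            [1, 2, 3],
                            [2, 3, 4]])
exposure = exposure_matrix[vulnerability - 1, land_use]

price_per_ha = np.array([500.0, 2000.0, 20000.0])   # assumed value per land-use class
risk = exposure * price_per_ha[land_use]             # simple exposure-weighted value per cell

print(exposure)
print(risk)
```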
Xie, Huiding; Chen, Lijun; Zhang, Jianqiang; Xie, Xiaoguang; Qiu, Kaixiong; Fu, Jijun
2015-01-01
B-Raf kinase is an important target in the treatment of cancers. In order to design and find potent B-Raf inhibitors (BRIs), 3D pharmacophore models were created using the Genetic Algorithm with Linear Assignment of Hypermolecular Alignment of Database (GALAHAD). The best pharmacophore model obtained, which was used for effective alignment of the data set, contains two acceptor atoms, three donor atoms and three hydrophobes. Subsequently, comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA) were performed on 39 imidazopyridine BRIs to build three-dimensional quantitative structure-activity relationship (3D QSAR) models based on both pharmacophore and docking alignments. The CoMSIA model based on the pharmacophore alignment shows the best result (q² = 0.621, r²pred = 0.885). This 3D QSAR approach provides significant insights that are useful for designing potent BRIs. In addition, the best pharmacophore model obtained was used for virtual screening against the NCI2000 database. The hit compounds were further filtered with molecular docking, their biological activities were predicted using the CoMSIA model, and three potential BRIs with new skeletons were obtained. PMID:26035757
Xie, Huiding; Chen, Lijun; Zhang, Jianqiang; Xie, Xiaoguang; Qiu, Kaixiong; Fu, Jijun
2015-05-29
B-Raf kinase is an important target in the treatment of cancers. In order to design and find potent B-Raf inhibitors (BRIs), 3D pharmacophore models were created using the Genetic Algorithm with Linear Assignment of Hypermolecular Alignment of Database (GALAHAD). The best pharmacophore model obtained, which was used for effective alignment of the data set, contains two acceptor atoms, three donor atoms and three hydrophobes. Subsequently, comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA) were performed on 39 imidazopyridine BRIs to build three-dimensional quantitative structure-activity relationship (3D QSAR) models based on both pharmacophore and docking alignments. The CoMSIA model based on the pharmacophore alignment shows the best result (q² = 0.621, r²pred = 0.885). This 3D QSAR approach provides significant insights that are useful for designing potent BRIs. In addition, the best pharmacophore model obtained was used for virtual screening against the NCI2000 database. The hit compounds were further filtered with molecular docking, their biological activities were predicted using the CoMSIA model, and three potential BRIs with new skeletons were obtained.
Demircan, Turan; Keskin, Ilknur; Dumlu, Seda Nilgün; Aytürk, Nilüfer; Avşaroğlu, Mahmut Erhan; Akgün, Emel; Öztürk, Gürkan; Baykal, Ahmet Tarık
2017-01-01
Salamander axolotl has been emerging as an important model for stem cell research due to its powerful regenerative capacity. Several advantages, such as the high capability of advanced tissue, organ, and appendages regeneration, promote axolotl as an ideal model system to extend our current understanding on the mechanisms of regeneration. Acknowledging the common molecular pathways between amphibians and mammals, there is a great potential to translate the messages from axolotl research to mammalian studies. However, the utilization of axolotl is hindered due to the lack of reference databases of genomic, transcriptomic, and proteomic data. Here, we introduce the proteome analysis of the axolotl tail section searched against an mRNA-seq database. We translated axolotl mRNA sequences to protein sequences and annotated these to process the LC-MS/MS data and identified 1001 nonredundant proteins. Functional classification of identified proteins was performed by gene ontology searches. The presence of some of the identified proteins was validated by in situ antibody labeling. Furthermore, we have analyzed the proteome expressional changes postamputation at three time points to evaluate the underlying mechanisms of the regeneration process. Taken together, this work expands the proteomics data of axolotl to contribute to its establishment as a fully utilized model. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Application of bioinformatics tools and databases in microbial dehalogenation research (a review).
Satpathy, R; Konkimalla, V B; Ratha, J
2015-01-01
Microbial dehalogenation is a biochemical process in which halogenated substances are enzymatically converted into their non-halogenated form. Microorganisms have a wide range of organohalogen degradation abilities, both specific and non-specific in nature. Most of these halogenated organic compounds are pollutants that need to be remediated; current approaches therefore explore the potential of microbes at the molecular level for effective biodegradation of these substances. Several microorganisms with dehalogenation activity have been identified and characterized. In this respect, bioinformatics plays a key role in gaining deeper knowledge in the field of dehalogenation. To facilitate data mining, many tools have been developed to annotate these data from databases. Therefore, once a microorganism is discovered, one can predict genes/proteins and perform sequence analysis, structural modelling, metabolic pathway analysis, biodegradation studies and so on. This review highlights various bioinformatics methods and describes the application of databases and specific tools in microbial dehalogenation research, with special focus on dehalogenase enzymes. Attempts have also been made to describe some recent applications of in silico modeling methods comprising gene finding, protein modelling, Quantitative Structure-Biodegradability Relationship (QSBR) studies and reconstruction of metabolic pathways employed in dehalogenation research.
NASA Technical Reports Server (NTRS)
Worrall, Diana M. (Editor); Biemesderfer, Chris (Editor); Barnes, Jeannette (Editor)
1992-01-01
Consideration is given to a definition of a distribution format for X-ray data, the Einstein on-line system, the NASA/IPAC extragalactic database, COBE astronomical databases, Cosmic Background Explorer astronomical databases, the ADAM software environment, the Groningen Image Processing System, the search for a common data model for astronomical data analysis systems, deconvolution for real and synthetic apertures, pitfalls in image reconstruction, a direct method for spectral and image restoration, and a description of a Poisson imagery super-resolution algorithm. Also discussed are multivariate statistics on HI and IRAS images, faint object classification using neural networks, a matched filter for improving the SNR of radio maps, automated aperture photometry of CCD images, an interactive graphics interpreter, the ROSAT extreme ultra-violet sky survey, a quantitative study of optimal extraction, automated analysis of spectra, applications of synthetic photometry, an algorithm for extra-solar planet system detection, and data reduction facilities for the William Herschel telescope.
Scientific information repository assisting reflectance spectrometry in legal medicine.
Belenki, Liudmila; Sterzik, Vera; Bohnert, Michael; Zimmermann, Klaus; Liehr, Andreas W
2012-06-01
Reflectance spectrometry is a fast and reliable method for the characterization of human skin if the spectra are analyzed with respect to a physical model describing the optical properties of human skin. For a field study performed at the Institute of Legal Medicine and the Freiburg Materials Research Center of the University of Freiburg, a scientific information repository has been developed, which is a variant of an electronic laboratory notebook and assists in the acquisition, management, and high-throughput analysis of reflectance spectra in heterogeneous research environments. At the core of the repository is a database management system hosting the master data. It is filled with primary data via a graphical user interface (GUI) programmed in Java, which also enables the user to browse the database and access the results of data analysis. The latter is carried out via Matlab, Python, and C programs, which retrieve the primary data from the scientific information repository, perform the analysis, and store the results in the database for further usage.
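The sketch below illustrates the retrieve-analyze-store loop described above, with SQLite standing in for the repository's database server and with made-up table and column names; it is not the project's actual schema or code.

```python
import sqlite3
import numpy as np

# Toy repository: fetch primary reflectance spectra, derive simple parameters,
# and write the results back so they can be browsed alongside the raw data.
# Table layout and the comma-separated exchange format are assumptions.

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE spectra (id INTEGER PRIMARY KEY, wavelengths TEXT, reflectance TEXT)")
con.execute("CREATE TABLE results (spectrum_id INTEGER, mean_reflectance REAL, slope REAL)")

# Insert one synthetic spectrum as primary data.
wl = np.linspace(450, 750, 31)
refl = 0.2 + 0.0005 * (wl - 450) + 0.01 * np.random.rand(wl.size)
con.execute("INSERT INTO spectra (wavelengths, reflectance) VALUES (?, ?)",
            (",".join(map(str, wl)), ",".join(map(str, refl))))

# Analysis pass: retrieve primary data, fit a linear trend, store the results.
rows = con.execute("SELECT id, wavelengths, reflectance FROM spectra").fetchall()
for sid, wl_txt, refl_txt in rows:
    w = np.array([float(v) for v in wl_txt.split(",")])
    r = np.array([float(v) for v in refl_txt.split(",")])
    slope, intercept = np.polyfit(w, r, 1)
    con.execute("INSERT INTO results VALUES (?, ?, ?)", (sid, float(r.mean()), float(slope)))

print(con.execute("SELECT * FROM results").fetchall())
```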
Application Program Interface for the Orion Aerodynamics Database
NASA Technical Reports Server (NTRS)
Robinson, Philip E.; Thompson, James
2013-01-01
The Application Programming Interface (API) for the Crew Exploration Vehicle (CEV) Aerodynamic Database has been developed to provide software developers with an easily implemented, fully self-contained method of accessing the CEV Aerodynamic Database for use in their analysis and simulation tools. The API is programmed in C and provides a series of functions to interact with the database, such as initialization, selecting various options, and calculating the aerodynamic data. No special functions (file read/write, table lookup) are required on the host system other than those included with a standard ANSI C installation. It reads one or more files of aero data tables. Previous releases of aerodynamic databases for space vehicles have only included data tables and a document of the algorithm and equations to combine them for the total aerodynamic forces and moments. This process required each software tool to have a unique implementation of the database code. Errors or omissions in the documentation, or errors in the implementation, led to a lengthy and burdensome process of having to debug each instance of the code. Additionally, input file formats differ for each space vehicle simulation tool, requiring the aero database tables to be reformatted to meet the tool's input file structure requirements. Finally, the capabilities for built-in table lookup routines vary for each simulation tool. Implementation of a new database may require an update to and verification of the table lookup routines. This may be required if the number of dimensions of a data table exceeds the capability of the simulation tool's built-in lookup routines. A single software solution was created to provide an aerodynamics software model that could be integrated into other simulation and analysis tools. The highly complex Orion aerodynamics model can then be quickly included in a wide variety of tools. The API code is written in ANSI C for ease of portability to a wide variety of systems. The input data files are in standard formatted ASCII, also for improved portability. The API contains its own implementation of multidimensional table reading and lookup routines. The same aerodynamics input file can be used without modification on all implementations. The turnaround time from aerodynamics model release to a working implementation is significantly reduced.
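As an illustration of the kind of multidimensional table lookup such an API encapsulates (with an invented table, not Orion/CEV data or the actual C interface), the following sketch interpolates a small lift-coefficient table over Mach number and angle of attack using SciPy.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Illustrative lift-coefficient table indexed by Mach number and angle of attack.
# A real aero database typically has many more dimensions and breakpoints.
mach = np.array([0.5, 0.9, 1.2, 2.0, 4.0])
alpha_deg = np.array([-10.0, 0.0, 10.0, 20.0])
cl_table = np.array([
    [-0.40, 0.00, 0.45, 0.85],
    [-0.45, 0.00, 0.50, 0.95],
    [-0.35, 0.00, 0.40, 0.80],
    [-0.30, 0.00, 0.35, 0.70],
    [-0.25, 0.00, 0.30, 0.60],
])  # rows: Mach, columns: alpha

cl_lookup = RegularGridInterpolator((mach, alpha_deg), cl_table)
print(cl_lookup([[1.05, 5.0]]))   # interpolated CL at Mach 1.05, alpha = 5 deg
```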
Using Cotton Model Simulations to Estimate Optimally Profitable Irrigation Strategies
NASA Astrophysics Data System (ADS)
Mauget, S. A.; Leiker, G.; Sapkota, P.; Johnson, J.; Maas, S.
2011-12-01
In recent decades irrigation pumping from the Ogallala Aquifer has led to declines in saturated thickness that have not been compensated for by natural recharge, which has led to questions about the long-term viability of agriculture in the cotton producing areas of west Texas. Adopting irrigation management strategies that optimize profitability while reducing irrigation waste is one way of conserving the aquifer's water resource. Here, a database of modeled cotton yields generated under drip and center pivot irrigated and dryland production scenarios is used in a stochastic dominance analysis that identifies such strategies under varying commodity price and pumping cost conditions. This database and analysis approach will serve as the foundation for a web-based decision support tool that will help producers identify optimal irrigation treatments under specified cotton price, electricity cost, and depth to water table conditions.
Research on Improved Depth Belief Network-Based Prediction of Cardiovascular Diseases
Zhang, Hongpo
2018-01-01
Quantitative analysis and prediction can help to reduce the risk of cardiovascular disease. Quantitative prediction based on traditional models has low accuracy, and the variance of predictions from shallow neural networks is large. In this paper, a cardiovascular disease prediction model based on an improved deep belief network (DBN) is proposed. Using the reconstruction error, the network depth is determined automatically, and unsupervised training and supervised optimization are combined, which ensures prediction accuracy while guaranteeing stability. Thirty experiments were performed independently on the Statlog (Heart) and Heart Disease Database data sets in the UCI database. Experimental results showed that the mean prediction accuracy was 91.26% and 89.78%, respectively, and the variance of prediction accuracy was 5.78 and 4.46, respectively. PMID:29854369
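A minimal sketch of the general "unsupervised layer-wise training plus supervised top layer" idea using scikit-learn's BernoulliRBM stack; this only approximates a DBN, substitutes the bundled breast-cancer dataset for the UCI heart data so it runs offline, and omits the reconstruction-error-based depth selection (two RBM layers are simply assumed).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Two unsupervised RBM layers followed by a supervised classifier on top.
# Dataset, layer sizes and hyperparameters are illustrative assumptions.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = Pipeline([
    ("scale", MinMaxScaler()),                    # RBMs expect inputs in [0, 1]
    ("rbm1", BernoulliRBM(n_components=64, learning_rate=0.05, n_iter=20, random_state=0)),
    ("rbm2", BernoulliRBM(n_components=32, learning_rate=0.05, n_iter=20, random_state=0)),
    ("clf", LogisticRegression(max_iter=1000)),   # supervised layer on top
])
model.fit(X_train, y_train)
print("test accuracy:", round(model.score(X_test, y_test), 3))
```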
NASA Astrophysics Data System (ADS)
Owens, P. R.; Libohova, Z.; Seybold, C. A.; Wills, S. A.; Peaslee, S.; Beaudette, D.; Lindbo, D. L.
2017-12-01
The measurement errors and spatial prediction uncertainties of soil properties in the modeling community are usually assessed against measured values when available. However, of equal importance is the assessment of the impacts of errors and uncertainty on cost-benefit analysis and risk assessments. Soil pH was selected as one of the most commonly measured soil properties used for liming recommendations. The objective of this study was to assess the error size from different sources and their implications with respect to management decisions. Error sources include measurement methods, laboratory sources, pedotransfer functions, database transections, spatial aggregations, etc. Several databases of measured and predicted soil pH were used for this study, including the United States National Cooperative Soil Survey Characterization Database (NCSS-SCDB) and the US Soil Survey Geographic (SSURGO) Database. The distribution of errors among different sources, from measurement methods to spatial aggregation, showed a wide range of values. The greatest RMSE of 0.79 pH units was from spatial aggregation (SSURGO vs Kriging), while the measurement methods had the lowest RMSE of 0.06 pH units. Assuming the order of data acquisition based on the transaction distance, i.e. from measurement method to spatial aggregation, the RMSE increased from 0.06 to 0.8 pH units, suggesting an "error propagation". This has major implications for practitioners and the modeling community. Most soil liming rate recommendations are based on 0.1 pH unit increments, while the desired soil pH level increments are based on 0.4 to 0.5 pH units. Thus, even when the measured and desired target soil pH are the same, most guidelines recommend 1 ton ha-1 of lime, which translates into roughly 111 ha-1 that the farmer has to factor into the cost-benefit analysis. However, this analysis needs to be based on prediction uncertainties (0.5-1.0 pH units) rather than measurement errors (0.1 pH units), which would translate into a 555-1,111 investment that needs to be assessed against the risk. The modeling community can benefit from such analysis; however, error size and spatial distribution for global and regional predictions need to be assessed against the variability of other drivers and their impact on management decisions.
The Chinchilla Research Resource Database: resource for an otolaryngology disease model
Shimoyama, Mary; Smith, Jennifer R.; De Pons, Jeff; Tutaj, Marek; Khampang, Pawjai; Hong, Wenzhou; Erbe, Christy B.; Ehrlich, Garth D.; Bakaletz, Lauren O.; Kerschner, Joseph E.
2016-01-01
The long-tailed chinchilla (Chinchilla lanigera) is an established animal model for diseases of the inner and middle ear, among others. In particular, chinchilla is commonly used to study diseases involving viral and bacterial pathogens and polymicrobial infections of the upper respiratory tract and the ear, such as otitis media. The value of the chinchilla as a model for human diseases prompted the sequencing of its genome in 2012 and the more recent development of the Chinchilla Research Resource Database (http://crrd.mcw.edu) to provide investigators with easy access to relevant datasets and software tools to enhance their research. The Chinchilla Research Resource Database contains a complete catalog of genes for chinchilla and, for comparative purposes, human. Chinchilla genes can be viewed in the context of their genomic scaffold positions using the JBrowse genome browser. In contrast to the corresponding records at NCBI, individual gene reports at CRRD include functional annotations for Disease, Gene Ontology (GO) Biological Process, GO Molecular Function, GO Cellular Component and Pathway assigned to chinchilla genes based on annotations from the corresponding human orthologs. Data can be retrieved via keyword and gene-specific searches. Lists of genes with similar functional attributes can be assembled by leveraging the hierarchical structure of the Disease, GO and Pathway vocabularies through the Ontology Search and Browser tool. Such lists can then be further analyzed for commonalities using the Gene Annotator (GA) Tool. All data in the Chinchilla Research Resource Database is freely accessible and downloadable via the CRRD FTP site or using the download functions available in the search and analysis tools. The Chinchilla Research Resource Database is a rich resource for researchers using, or considering the use of, chinchilla as a model for human disease. Database URL: http://crrd.mcw.edu PMID:27173523
Sukhinin, Dmitrii I.; Engel, Andreas K.; Manger, Paul; Hilgetag, Claus C.
2016-01-01
Databases of structural connections of the mammalian brain, such as CoCoMac (cocomac.g-node.org) or BAMS (https://bams1.org), are valuable resources for the analysis of brain connectivity and the modeling of brain dynamics in species such as the non-human primate or the rodent, and have also contributed to the computational modeling of the human brain. Another animal model that is widely used in electrophysiological or developmental studies is the ferret; however, no systematic compilation of brain connectivity is currently available for this species. Thus, we have started developing a database of anatomical connections and architectonic features of the ferret brain, the Ferret(connect)ome, www.Ferretome.org. The Ferretome database has adapted essential features of the CoCoMac methodology and legacy, such as the CoCoMac data model. This data model was simplified and extended in order to accommodate new data modalities that were not represented previously, such as the cytoarchitecture of brain areas. The Ferretome uses a semantic parcellation of brain regions as well as a logical brain map transformation algorithm (objective relational transformation, ORT). The ORT algorithm was also adopted for the transformation of architecture data. The database is being developed in MySQL and has been populated with literature reports on tract-tracing observations in the ferret brain using a custom-designed web interface that allows efficient and validated simultaneous input and proofreading by multiple curators. The database is equipped with a non-specialist web interface. This interface can be extended to produce connectivity matrices in several formats, including a graphical representation superimposed on established ferret brain maps. An important feature of the Ferretome database is the possibility to trace back entries in connectivity matrices to the original studies archived in the system. Currently, the Ferretome contains 50 reports on connections comprising 20 injection reports with more than 150 labeled source and target areas, the majority reflecting connectivity of subcortical nuclei and 15 descriptions of regional brain architecture. We hope that the Ferretome database will become a useful resource for neuroinformatics and neural modeling, and will support studies of the ferret brain as well as facilitate advances in comparative studies of mesoscopic brain connectivity. PMID:27242503
NASA Astrophysics Data System (ADS)
Hashimoto, Shoji; Nanko, Kazuki; Ťupek, Boris; Lehtonen, Aleksi
2017-03-01
Future climate change will dramatically change the carbon balance in the soil, and this change will affect the terrestrial carbon stock and the climate itself. Earth system models (ESMs) are used to understand the current climate and to project future climate conditions, but the soil organic carbon (SOC) stock simulated by ESMs and those of observational databases are not well correlated when the two are compared at fine grid scales. However, the specific key processes and factors, as well as the relationships among these factors that govern the SOC stock, remain unclear; the inclusion of such missing information would improve the agreement between modeled and observational data. In this study, we sought to identify the influential factors that govern global SOC distribution in observational databases, as well as those simulated by ESMs. We used a data-mining (machine-learning) scheme, boosted regression trees (BRT), to identify the factors affecting the SOC stock. We applied the BRT scheme to three observational databases and 15 ESM outputs from the fifth phase of the Coupled Model Intercomparison Project (CMIP5) and examined the effects of 13 variables/factors categorized into five groups (climate, soil property, topography, vegetation, and land-use history). Globally, the contributions of mean annual temperature, clay content, carbon-to-nitrogen (CN) ratio, wetland ratio, and land cover were high in observational databases, whereas the contributions of the mean annual temperature, land cover, and net primary productivity (NPP) were predominant in the SOC distribution in ESMs. A comparison of the influential factors at a global scale revealed that the most distinct differences between the SOCs from the observational databases and ESMs were the low clay content and CN ratio contributions, and the high NPP contribution in the ESMs. The results of this study will aid in identifying the causes of the current mismatches between observational SOC databases and ESM outputs and improve the modeling of terrestrial carbon dynamics in ESMs. This study also reveals how a data-mining algorithm can be used to assess model outputs.
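The following sketch shows how boosted regression trees can rank candidate drivers by relative influence, in the spirit of the BRT analysis above; the predictors and the SOC response are synthetic stand-ins, not the study's gridded data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-ins for grid-cell variables (temperature, clay, C:N ratio, NPP)
# and a made-up SOC response; only the ranking workflow is illustrated.
rng = np.random.default_rng(0)
n = 2000
temp = rng.uniform(-10, 28, n)          # mean annual temperature (deg C)
clay = rng.uniform(2, 60, n)            # clay content (%)
cn_ratio = rng.uniform(8, 30, n)        # carbon-to-nitrogen ratio
npp = rng.uniform(50, 1500, n)          # net primary productivity
X = np.column_stack([temp, clay, cn_ratio, npp])

# Assumed response: SOC decreases with temperature, increases with clay and C:N.
soc = 80 - 1.5 * temp + 0.6 * clay + 1.2 * cn_ratio + 0.005 * npp + rng.normal(0, 5, n)

brt = GradientBoostingRegressor(n_estimators=500, learning_rate=0.05, max_depth=3)
brt.fit(X, soc)
for name, imp in zip(["MAT", "clay", "C:N", "NPP"], brt.feature_importances_):
    print(f"{name}: {imp:.2f}")         # relative influence of each factor
```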
Sukhinin, Dmitrii I; Engel, Andreas K; Manger, Paul; Hilgetag, Claus C
2016-01-01
Databases of structural connections of the mammalian brain, such as CoCoMac (cocomac.g-node.org) or BAMS (https://bams1.org), are valuable resources for the analysis of brain connectivity and the modeling of brain dynamics in species such as the non-human primate or the rodent, and have also contributed to the computational modeling of the human brain. Another animal model that is widely used in electrophysiological or developmental studies is the ferret; however, no systematic compilation of brain connectivity is currently available for this species. Thus, we have started developing a database of anatomical connections and architectonic features of the ferret brain, the Ferret(connect)ome, www.Ferretome.org. The Ferretome database has adapted essential features of the CoCoMac methodology and legacy, such as the CoCoMac data model. This data model was simplified and extended in order to accommodate new data modalities that were not represented previously, such as the cytoarchitecture of brain areas. The Ferretome uses a semantic parcellation of brain regions as well as a logical brain map transformation algorithm (objective relational transformation, ORT). The ORT algorithm was also adopted for the transformation of architecture data. The database is being developed in MySQL and has been populated with literature reports on tract-tracing observations in the ferret brain using a custom-designed web interface that allows efficient and validated simultaneous input and proofreading by multiple curators. The database is equipped with a non-specialist web interface. This interface can be extended to produce connectivity matrices in several formats, including a graphical representation superimposed on established ferret brain maps. An important feature of the Ferretome database is the possibility to trace back entries in connectivity matrices to the original studies archived in the system. Currently, the Ferretome contains 50 reports on connections comprising 20 injection reports with more than 150 labeled source and target areas, the majority reflecting connectivity of subcortical nuclei and 15 descriptions of regional brain architecture. We hope that the Ferretome database will become a useful resource for neuroinformatics and neural modeling, and will support studies of the ferret brain as well as facilitate advances in comparative studies of mesoscopic brain connectivity.
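A minimal sketch of turning literature-style injection reports into a connectivity matrix, as the Ferretome interface can do; the area names, strengths and the pandas-based representation are illustrative assumptions rather than Ferretome content or code.

```python
import pandas as pd

# Made-up tract-tracing reports: each row records a labeled connection from a
# source area to a target area with a coarse strength code.
reports = pd.DataFrame([
    {"source": "A17", "target": "A18", "strength": 2},
    {"source": "A17", "target": "A19", "strength": 1},
    {"source": "A18", "target": "A17", "strength": 3},
    {"source": "PPC", "target": "A17", "strength": 1},
])

# Rows = source areas, columns = target areas; 0 means no reported connection.
matrix = reports.pivot_table(index="source", columns="target",
                             values="strength", aggfunc="max", fill_value=0)
print(matrix)
```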
Austvoll-Dahlgren, Astrid; Guttersrud, Øystein; Nsangi, Allen; Semakula, Daniel; Oxman, Andrew D
2017-01-01
Background The Claim Evaluation Tools database contains multiple-choice items for measuring people’s ability to apply the key concepts they need to know to be able to assess treatment claims. We assessed items from the database using Rasch analysis to develop an outcome measure to be used in two randomised trials in Uganda. Rasch analysis is a form of psychometric testing relying on Item Response Theory. It is a dynamic way of developing outcome measures that are valid and reliable. Objectives To assess the validity, reliability and responsiveness of 88 items addressing 22 key concepts using Rasch analysis. Participants We administered four sets of multiple-choice items in English to 1114 people in Uganda and Norway, of whom 685 were children and 429 were adults (including 171 health professionals). We scored all items dichotomously. We explored summary and individual fit statistics using the RUMM2030 analysis package. We used SPSS to perform distractor analysis. Results Most items conformed well to the Rasch model, but some items needed revision. Overall, the four item sets had satisfactory reliability. We did not identify significant response dependence between any pairs of items and, overall, the magnitude of multidimensionality in the data was acceptable. The items had a high level of difficulty. Conclusion Most of the items conformed well to the Rasch model’s expectations. Following revision of some items, we concluded that most of the items were suitable for use in an outcome measure for evaluating the ability of children or adults to assess treatment claims. PMID:28550019
Development Of New Databases For Tsunami Hazard Analysis In California
NASA Astrophysics Data System (ADS)
Wilson, R. I.; Barberopoulou, A.; Borrero, J. C.; Bryant, W. A.; Dengler, L. A.; Goltz, J. D.; Legg, M.; McGuire, T.; Miller, K. M.; Real, C. R.; Synolakis, C.; Uslu, B.
2009-12-01
The California Geological Survey (CGS) has partnered with other tsunami specialists to produce two statewide databases to facilitate the evaluation of tsunami hazard products for both emergency response and land-use planning and development. A robust, State-run tsunami deposit database is being developed that complements and expands on existing databases from the National Geophysical Data Center (global) and the USGS (Cascadia). Whereas these existing databases focus on references or individual tsunami layers, the new State-maintained database concentrates on the location and contents of individual borings/trenches that sample tsunami deposits. These data provide an important observational benchmark for evaluating the results of tsunami inundation modeling. CGS is collaborating with and sharing the database entry form with other states to encourage its continued development beyond California’s coastline so that historic tsunami deposits can be evaluated on a regional basis. CGS is also developing an internet-based, tsunami source scenario database and forum where tsunami source experts and hydrodynamic modelers can discuss the validity of tsunami sources and their contribution to hazard assessments for California and other coastal areas bordering the Pacific Ocean. The database includes all distant and local tsunami sources relevant to California starting with the forty scenarios evaluated during the creation of the recently completed statewide series of tsunami inundation maps for emergency response planning. Factors germane to probabilistic tsunami hazard analyses (PTHA), such as event histories and recurrence intervals, are also addressed in the database and discussed in the forum. Discussions with other tsunami source experts will help CGS determine what additional scenarios should be considered in PTHA for assessing the feasibility of generating products of value to local land-use planning and development.
Vernetti, Lawrence; Bergenthal, Luke; Shun, Tong Ying; Taylor, D. Lansing
2016-01-01
Microfluidic human organ models, microphysiology systems (MPS), are currently being developed as predictive models of drug safety and efficacy in humans. To design and validate MPS as predictive of human safety liabilities requires safety data for a reference set of compounds, combined with in vitro data from the human organ models. To address this need, we have developed an internet database, the MPS database (MPS-Db), as a powerful platform for experimental design, data management, and analysis, and to combine experimental data with reference data, to enable computational modeling. The present study demonstrates the capability of the MPS-Db in early safety testing using a human liver MPS to relate the effects of tolcapone and entacapone in the in vitro model to human in vivo effects. These two compounds were chosen to be evaluated as a representative pair of marketed drugs because they are structurally similar, have the same target, and were found safe or had an acceptable risk in preclinical and clinical trials, yet tolcapone induced unacceptable levels of hepatotoxicity while entacapone was found to be safe. Results demonstrate the utility of the MPS-Db as an essential resource for relating in vitro organ model data to the multiple biochemical, preclinical, and clinical data sources on in vivo drug effects. PMID:28781990
Statistical Analysis of the Uncertainty in Pre-Flight Aerodynamic Database of a Hypersonic Vehicle
NASA Astrophysics Data System (ADS)
Huh, Lynn
The objective of the present research was to develop a new method to derive the aerodynamic coefficients and the associated uncertainties for flight vehicles via post-flight inertial navigation analysis using data from the inertial measurement unit. Statistical estimates of vehicle state and aerodynamic coefficients are derived using Monte Carlo simulation. Trajectory reconstruction using the inertial navigation system (INS) is a simple and widely used method. However, deriving realistic uncertainties in the reconstructed state and any associated parameters is not so straightforward. Extended Kalman filters, batch minimum variance estimation and other approaches have been used. However, these methods generally depend on assumed physical models, assumed statistical distributions (usually Gaussian) or have convergence issues for non-linear problems. The approach here assumes no physical models, is applicable to any statistical distribution, and does not have any convergence issues. The new approach obtains the statistics directly from a sufficient number of Monte Carlo samples using only the generally well known gyro and accelerometer specifications and can be applied to systems of non-linear form and non-Gaussian distribution. When redundant data are available, the set of Monte Carlo simulations is constrained to satisfy the redundant data within the uncertainties specified for the additional data. The proposed method was applied to validate the uncertainty in the pre-flight aerodynamic database of the X-43A Hyper-X research vehicle. In addition to gyro and acceleration data, the actual flight data include redundant measurements of position and velocity from the global positioning system (GPS). The criteria derived from the blend of the GPS and INS accuracy were used to select valid trajectories for statistical analysis. The aerodynamic coefficients were derived from the selected trajectories either by a direct extraction method based on the equations of dynamics, or by querying the pre-flight aerodynamic database. After the application of the proposed method to the case of the X-43A Hyper-X research vehicle, it was found that 1) there were consistent differences in the aerodynamic coefficients from the pre-flight aerodynamic database and post-flight analysis, 2) the pre-flight estimation of the pitching moment coefficients was significantly different from the post-flight analysis, 3) the type of distribution of the states from the Monte Carlo simulation was affected by that of the perturbation parameters, 4) the uncertainties in the pre-flight model were overestimated, 5) the range where the aerodynamic coefficients from the pre-flight aerodynamic database and post-flight analysis are in closest agreement is between Mach *.* and *.* and more data points may be needed between Mach * and ** in the pre-flight aerodynamic database, 6) the selection criterion for valid trajectories from the Monte Carlo simulations was mostly driven by the horizontal velocity error, 7) the selection criterion must be based on a reasonable model to ensure the validity of the statistics from the proposed method, and 8) the results from the proposed method applied to the two different flights with the identical geometry and similar flight profile were consistent.
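A one-dimensional toy version of the approach, with invented sensor and GPS numbers: accelerometer bias and noise are perturbed within assumed specifications, the trajectory is re-integrated for each Monte Carlo sample, and only samples whose end position agrees with a redundant GPS-like measurement are retained for statistics.

```python
import numpy as np

# 1-D illustration only: perturb the measured acceleration record with bias and
# noise realizations drawn from an assumed sensor specification, re-integrate,
# and keep the Monte Carlo samples consistent with a redundant end-position fix.
rng = np.random.default_rng(1)
dt, T = 0.1, 60.0
t = np.arange(0, T, dt)
true_acc = 0.5 * np.sin(0.2 * t)                      # assumed "true" acceleration

# One measured record: truth plus an unknown bias and noise.
meas_acc = true_acc + 0.02 + rng.normal(0, 0.05, t.size)

def end_position(acc):
    vel = np.cumsum(acc) * dt
    pos = np.cumsum(vel) * dt
    return pos[-1]

gps_end, gps_sigma = end_position(true_acc), 5.0       # redundant GPS-like measurement

accepted = []
for _ in range(2000):
    bias = rng.normal(0, 0.03)                         # spec'd bias uncertainty
    noise = rng.normal(0, 0.05, t.size)                # spec'd noise level
    end = end_position(meas_acc - bias + noise)
    if abs(end - gps_end) < 3 * gps_sigma:             # constrain by the redundant data
        accepted.append(end)

print(len(accepted), "samples accepted; end-position spread:", round(np.std(accepted), 2))
```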
A spatial database for landslides in northern Bavaria: A methodological approach
NASA Astrophysics Data System (ADS)
Jäger, Daniel; Kreuzer, Thomas; Wilde, Martina; Bemm, Stefan; Terhorst, Birgit
2018-04-01
Landslide databases provide essential information for hazard modeling, assessment of damage to buildings and infrastructure, mitigation, and research. This study presents the development of a landslide database system named WISL (Würzburg Information System on Landslides), currently storing detailed landslide data for northern Bavaria, Germany, in order to enable scientific queries as well as comparisons with other regional landslide inventories. WISL is based on free, open-source software (PostgreSQL, PostGIS), ensuring good interoperability of the various components and enabling further extensions through specific adaptations of self-developed software. In addition, WISL was designed for easy communication with other databases. As a central prerequisite for standardized, homogeneous data acquisition in the field, a customized data sheet for landslide description was compiled. This sheet also serves as an input mask for all data registration procedures in WISL. A variety of "in-database" solutions for landslide analysis provides the necessary scalability for the database, enabling operations at the local server. In its current state, WISL already enables extensive analysis and queries. This paper presents an example analysis of landslides in Oxfordian Limestones in the northeastern Franconian Alb, northern Bavaria. The results reveal widely differing landslides in terms of geometry and size. Further queries related to landslide activity classify the majority of the landslides as currently inactive; however, they clearly possess a certain potential for remobilization. Along with some active mass movements, a significant percentage of landslides potentially endangers residential areas or infrastructure. Future enhancements of the WISL database mainly concern data extensions to increase research possibilities, as well as transferring the system to other regions and countries.
Mitchell, Joshua M.; Fan, Teresa W.-M.; Lane, Andrew N.; Moseley, Hunter N. B.
2014-01-01
Large-scale identification of metabolites is key to elucidating and modeling metabolism at the systems level. Advances in metabolomics technologies, particularly ultra-high resolution mass spectrometry (MS) enable comprehensive and rapid analysis of metabolites. However, a significant barrier to meaningful data interpretation is the identification of a wide range of metabolites including unknowns and the determination of their role(s) in various metabolic networks. Chemoselective (CS) probes to tag metabolite functional groups combined with high mass accuracy provide additional structural constraints for metabolite identification and quantification. We have developed a novel algorithm, Chemically Aware Substructure Search (CASS) that efficiently detects functional groups within existing metabolite databases, allowing for combined molecular formula and functional group (from CS tagging) queries to aid in metabolite identification without a priori knowledge. Analysis of the isomeric compounds in both Human Metabolome Database (HMDB) and KEGG Ligand demonstrated a high percentage of isomeric molecular formulae (43 and 28%, respectively), indicating the necessity for techniques such as CS-tagging. Furthermore, these two databases have only moderate overlap in molecular formulae. Thus, it is prudent to use multiple databases in metabolite assignment, since each major metabolite database represents different portions of metabolism within the biosphere. In silico analysis of various CS-tagging strategies under different conditions for adduct formation demonstrate that combined FT-MS derived molecular formulae and CS-tagging can uniquely identify up to 71% of KEGG and 37% of the combined KEGG/HMDB database vs. 41 and 17%, respectively without adduct formation. This difference between database isomer disambiguation highlights the strength of CS-tagging for non-lipid metabolite identification. However, unique identification of complex lipids still needs additional information. PMID:25120557
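The sketch below captures the basic idea of a combined molecular-formula and functional-group query: the formula alone matches several isomers, and a chemoselective-tagging constraint narrows the list. The entries and group counts are invented for illustration and do not come from HMDB, KEGG, or the CASS implementation.

```python
# Toy metabolite "database": isomers share a molecular formula but differ in
# functional groups, which is what CS-tagging information can discriminate.
database = [
    {"name": "alanine",   "formula": "C3H7NO2", "groups": {"amine": 1, "carboxyl": 1}},
    {"name": "sarcosine", "formula": "C3H7NO2", "groups": {"amine": 1, "carboxyl": 1}},
    {"name": "lactamide", "formula": "C3H7NO2", "groups": {"amide": 1, "hydroxyl": 1}},
]

def query(formula, tagged_groups):
    """Return formula-only hits and the subset consistent with tagged groups."""
    hits = [e for e in database if e["formula"] == formula]
    refined = [e for e in hits
               if all(e["groups"].get(g, 0) >= n for g, n in tagged_groups.items())]
    return hits, refined

hits, refined = query("C3H7NO2", {"carboxyl": 1})
print([e["name"] for e in hits])      # three isomers from the formula alone
print([e["name"] for e in refined])   # two remain after the carboxyl-tag constraint
```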
Improved Bond Equations for Fiber-Reinforced Polymer Bars in Concrete.
Pour, Sadaf Moallemi; Alam, M Shahria; Milani, Abbas S
2016-08-30
This paper explores a set of new equations to predict the bond strength between fiber-reinforced polymer (FRP) rebar and concrete. The proposed equations are based on a comprehensive statistical analysis and existing experimental results in the literature. Namely, the parameters with the greatest effect on the bond behavior of FRP-reinforced concrete were first identified by applying a factorial analysis to a part of the available database. Then the database, which contains 250 pullout tests, was divided into four groups based on the concrete compressive strength and the rebar surface. Afterward, nonlinear regression analysis was performed for each study group in order to determine the bond equations. The results show that the proposed equations predict bond strengths more accurately than previously reported models.
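A minimal sketch of the nonlinear regression step, assuming a generic bond-strength form tau = a*sqrt(fc)*(db/ld)^b fitted to synthetic pullout data with SciPy; the functional form and the data are illustrative, not the paper's equations or database.

```python
import numpy as np
from scipy.optimize import curve_fit

# Assumed bond-strength form: tau = a * sqrt(fc) * (db/ld)^b
def bond_model(X, a, b):
    fc, db_over_ld = X
    return a * np.sqrt(fc) * db_over_ld ** b

# Synthetic "pullout tests" generated from known coefficients plus noise.
rng = np.random.default_rng(2)
fc = rng.uniform(25, 60, 80)              # concrete compressive strength (MPa)
db_over_ld = rng.uniform(0.02, 0.1, 80)   # bar diameter / embedment length
tau = 1.8 * np.sqrt(fc) * db_over_ld ** 0.35 + rng.normal(0, 0.3, 80)

(a_hat, b_hat), _ = curve_fit(bond_model, (fc, db_over_ld), tau, p0=(1.0, 0.5))
print(f"tau = {a_hat:.2f} * sqrt(fc) * (db/ld)^{b_hat:.2f}")
```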
NASA Astrophysics Data System (ADS)
Hayrapetyan, David B.; Hovhannisyan, Levon; Mantashyan, Paytsar A.
2013-04-01
The analysis of complex spectra is a pressing problem in modern science. This work is devoted to the creation of a software package that analyzes spectra in different formats, possesses a dynamic knowledge database and a self-learning mechanism, and performs automated analysis of spectral composition based on the knowledge database through the application of certain algorithms. Hyper-spherical random search algorithms, gradient algorithms and genetic search algorithms were used as the search methods in the software package. The analysis of Raman and IR spectra of diamond-like carbon (DLC) samples was performed with the developed program. After processing the data, the program immediately displays all the calculated parameters of the DLC.
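As a hedged illustration of spectral decomposition by a genetic-type global search (here SciPy's differential evolution rather than the package's own algorithms), the sketch below fits two assumed Gaussian bands to a synthetic carbon-like spectrum.

```python
import numpy as np
from scipy.optimize import differential_evolution

x = np.linspace(1000, 1800, 400)   # wavenumber axis (cm^-1), illustrative range

def two_gaussians(p, x):
    a1, c1, w1, a2, c2, w2 = p
    return (a1 * np.exp(-((x - c1) / w1) ** 2) +
            a2 * np.exp(-((x - c2) / w2) ** 2))

# Synthetic "measured" spectrum with two bands roughly where the D and G bands
# of disordered carbon appear, plus noise; real DLC spectra are more complex.
rng = np.random.default_rng(3)
y_obs = two_gaussians([1.0, 1350, 80, 0.8, 1580, 50], x) + rng.normal(0, 0.02, x.size)

bounds = [(0, 2), (1200, 1450), (20, 150),   # amplitude, centre, width of band 1
          (0, 2), (1500, 1650), (20, 150)]   # amplitude, centre, width of band 2
result = differential_evolution(lambda p: np.sum((two_gaussians(p, x) - y_obs) ** 2),
                                bounds, seed=0)
print(np.round(result.x, 1))                 # recovered band parameters
```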
A survey of commercial object-oriented database management systems
NASA Technical Reports Server (NTRS)
Atkins, John
1992-01-01
The object-oriented data model is the culmination of over thirty years of database research. Initially, database research focused on the need to provide information in a consistent and efficient manner to the business community. Early data models such as the hierarchical model and the network model met the goal of consistent and efficient access to data and were substantial improvements over simple file mechanisms for storing and accessing data. However, these models required highly skilled programmers to provide access to the data. Consequently, in the early 1970s E.F. Codd, an IBM research computer scientist, proposed a new data model based on the simple mathematical notion of the relation. This model is known as the Relational Model. In the relational model, data is represented in flat tables (or relations) which have no physical or internal links between them. The simplicity of this model fostered the development of powerful but relatively simple query languages that now made data directly accessible to the general database user. Except for large, multi-user database systems, a database professional was in general no longer necessary. Database professionals found that traditional data in the form of character data, dates, and numeric data were easily represented and managed via the relational model. Commercial relational database management systems proliferated and performance of relational databases improved dramatically. However, there was a growing community of potential database users whose needs were not met by the relational model. These users needed to store data with data types not available in the relational model and required a far richer modelling environment than that provided by the relational model. Indeed, the complexity of the objects to be represented in the model mandated a new approach to database technology. The Object-Oriented Model was the result.
Geologic datasets for weights-of-evidence analysis in northeast Washington: 2. Mineral databases
Boleneus, D.E.
1999-01-01
Digital mineral databases are necessary to carry out weights-of-evidence modeling of mineral resources for epithermal gold and carbonate-hosted lead-zinc deposits in northeast Washington. This report describes spreadsheet tables consisting of: 1) training sites for epithermal gold, 2) placer gold sites, 3) training sites for carbonate-hosted lead-zinc, and 4) small lead-zinc mines and prospects. A fifth table provides location data about sites in the four tables.
Operations and Modeling Analysis
NASA Technical Reports Server (NTRS)
Ebeling, Charles
2005-01-01
The Reliability and Maintainability Analysis Tool (RMAT) provides NASA the capability to estimate reliability and maintainability (R&M) parameters and operational support requirements for proposed space vehicles based upon relationships established from both aircraft and Shuttle R&M data. RMAT has matured both in its underlying database and in its level of sophistication in extrapolating this historical data to satisfy proposed mission requirements, maintenance concepts and policies, and type of vehicle (i.e., ranging from aircraft-like to Shuttle-like). However, a companion analysis tool, the Logistics Cost Model (LCM), has not reached the same level of maturity as RMAT due, in large part, to nonexistent or outdated cost estimating relationships and underlying cost databases, and its almost exclusive dependence on Shuttle operations and logistics cost input parameters. As a result, the full capability of the RMAT/LCM suite of analysis tools to take a conceptual vehicle and derive its operations and support requirements along with the resulting operating and support costs has not been realized.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bowers, M; Robertson, S; Moore, J
Purpose: Advancement in Radiation Oncology (RO) practice develops through evidence-based medicine and clinical trials. Knowledge usable for treatment planning, decision support and research is contained in our clinical data, stored in an Oncospace database. This data store and the tools for populating and analyzing it are compatible with standard RO practice and are shared with collaborating institutions. The question is: what protocol for system development and data sharing within an Oncospace Consortium? We focus our example on the technology and data meaning necessary to share across the Consortium. Methods: Oncospace consists of a database schema, planning and outcome data import, and web-based analysis tools. 1) Database: The Consortium implements a federated data store; each member collects and maintains its own data within an Oncospace schema. For privacy, PHI is contained within a single table, accessible to the database owner. 2) Import: Spatial dose data from treatment plans (Pinnacle or DICOM) is imported via Oncolink. Treatment outcomes are imported from an OIS (MOSAIQ). 3) Analysis: JHU has built a number of webpages to answer analysis questions. Oncospace data can also be analyzed via MATLAB or SAS queries. These materials are available to Consortium members, who contribute enhancements and improvements. Results: 1) The Oncospace Consortium now consists of RO centers at JHU, UVA, UW and the University of Toronto. These members have successfully installed and populated Oncospace databases with over 1000 patients collectively. 2) Members contribute code and get updates via an SVN repository. Errors are reported and tracked via Redmine. Teleconferences include strategizing design and code reviews. 3) Federated databases were successfully queried remotely to combine multiple institutions' DVH data for dose-toxicity analysis (data combined from JHU and UW Oncospace). Conclusion: RO data sharing can be, and has been, effected according to the Oncospace Consortium model: http://oncospace.radonc.jhmi.edu/ . John Wong - SRA from Elekta; Todd McNutt - SRA from Elekta; Michael Bowers - funded by Elekta.
NASA Astrophysics Data System (ADS)
Tien Bui, Dieu; Hoang, Nhat-Duc
2017-09-01
In this study, a probabilistic model, named as BayGmmKda, is proposed for flood susceptibility assessment in a study area in central Vietnam. The new model is a Bayesian framework constructed by a combination of a Gaussian mixture model (GMM), radial-basis-function Fisher discriminant analysis (RBFDA), and a geographic information system (GIS) database. In the Bayesian framework, GMM is used for modeling the data distribution of flood-influencing factors in the GIS database, whereas RBFDA is utilized to construct a latent variable that aims at enhancing the model performance. As a result, the posterior probabilistic output of the BayGmmKda model is used as flood susceptibility index. Experiment results showed that the proposed hybrid framework is superior to other benchmark models, including the adaptive neuro-fuzzy inference system and the support vector machine. To facilitate the model implementation, a software program of BayGmmKda has been developed in MATLAB. The BayGmmKda program can accurately establish a flood susceptibility map for the study region. Accordingly, local authorities can overlay this susceptibility map onto various land-use maps for the purpose of land-use planning or management.
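A minimal sketch of the probabilistic core, assuming two synthetic conditioning factors: one Gaussian mixture is fitted to flood cells and one to non-flood cells, and the Bayes posterior P(flood | x) serves as a susceptibility index. The RBF discriminant latent variable of BayGmmKda is not reproduced here, and all data are invented.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic conditioning factors (e.g., elevation and slope) for flood and
# non-flood cells; in the real model there are many more influencing factors.
rng = np.random.default_rng(4)
flood = rng.normal([20.0, 2.0], [10.0, 1.5], size=(300, 2))      # low, flat cells
non_flood = rng.normal([120.0, 12.0], [40.0, 5.0], size=(700, 2))

gmm_flood = GaussianMixture(n_components=2, random_state=0).fit(flood)
gmm_non = GaussianMixture(n_components=2, random_state=0).fit(non_flood)
prior_flood = len(flood) / (len(flood) + len(non_flood))

def susceptibility(x):
    """Posterior probability of flood given the conditioning factors."""
    x = np.atleast_2d(x)
    lf = np.exp(gmm_flood.score_samples(x)) * prior_flood
    ln = np.exp(gmm_non.score_samples(x)) * (1 - prior_flood)
    return lf / (lf + ln)

print(susceptibility([[25.0, 3.0]]), susceptibility([[150.0, 15.0]]))
```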
Gianni, Daniele; McKeever, Steve; Yu, Tommy; Britten, Randall; Delingette, Hervé; Frangi, Alejandro; Hunter, Peter; Smith, Nicolas
2010-06-28
Sharing and reusing anatomical models over the Web offers a significant opportunity to progress the investigation of cardiovascular diseases. However, the current sharing methodology suffers from the limitations of static model delivery (i.e. embedding static links to the models within Web pages) and of a disaggregated view of the model metadata produced by publications and cardiac simulations in isolation. In the context of euHeart--a research project targeting the description and representation of cardiovascular models for disease diagnosis and treatment purposes--we aim to overcome the above limitations with the introduction of euHeartDB, a Web-enabled database for anatomical models of the heart. The database implements a dynamic sharing methodology by managing data access and by tracing all applications. In addition to this, euHeartDB establishes a knowledge link with the physiome model repository by linking geometries to CellML models embedded in the simulation of cardiac behaviour. Furthermore, euHeartDB uses the exFormat--a preliminary version of the interoperable FieldML data format--to effectively promote reuse of anatomical models, and currently incorporates Continuum Mechanics, Image Analysis, Signal Processing and System Identification Graphical User Interface (CMGUI), a rendering engine, to provide three-dimensional graphical views of the models populating the database. Currently, euHeartDB stores 11 cardiac geometries developed within the euHeart project consortium.
Life sciences domain analysis model
Freimuth, Robert R; Freund, Elaine T; Schick, Lisa; Sharma, Mukesh K; Stafford, Grace A; Suzek, Baris E; Hernandez, Joyce; Hipp, Jason; Kelley, Jenny M; Rokicki, Konrad; Pan, Sue; Buckler, Andrew; Stokes, Todd H; Fernandez, Anna; Fore, Ian; Buetow, Kenneth H
2012-01-01
Objective Meaningful exchange of information is a fundamental challenge in collaborative biomedical research. To help address this, the authors developed the Life Sciences Domain Analysis Model (LS DAM), an information model that provides a framework for communication among domain experts and technical teams developing information systems to support biomedical research. The LS DAM is harmonized with the Biomedical Research Integrated Domain Group (BRIDG) model of protocol-driven clinical research. Together, these models can facilitate data exchange for translational research. Materials and methods The content of the LS DAM was driven by analysis of life sciences and translational research scenarios and the concepts in the model are derived from existing information models, reference models and data exchange formats. The model is represented in the Unified Modeling Language and uses ISO 21090 data types. Results The LS DAM v2.2.1 is comprised of 130 classes and covers several core areas including Experiment, Molecular Biology, Molecular Databases and Specimen. Nearly half of these classes originate from the BRIDG model, emphasizing the semantic harmonization between these models. Validation of the LS DAM against independently derived information models, research scenarios and reference databases supports its general applicability to represent life sciences research. Discussion The LS DAM provides unambiguous definitions for concepts required to describe life sciences research. The processes established to achieve consensus among domain experts will be applied in future iterations and may be broadly applicable to other standardization efforts. Conclusions The LS DAM provides common semantics for life sciences research. Through harmonization with BRIDG, it promotes interoperability in translational science. PMID:22744959
National Institute of Standards and Technology Data Gateway
SRD 100 Database for Simulation of Electron Spectra for Surface Analysis (SESSA) (PC database for purchase). This database has been designed to facilitate quantitative interpretation of Auger-electron and X-ray photoelectron spectra and to improve the accuracy of quantitation in routine analysis. The database contains all physical data needed to perform quantitative interpretation of an electron spectrum for a thin-film specimen of given composition. A simulation module provides an estimate of peak intensities as well as the energy and angular distributions of the emitted electron flux.
The eNanoMapper database for nanomaterial safety information
Chomenidis, Charalampos; Doganis, Philip; Fadeel, Bengt; Grafström, Roland; Hardy, Barry; Hastings, Janna; Hegi, Markus; Jeliazkov, Vedrin; Kochev, Nikolay; Kohonen, Pekka; Munteanu, Cristian R; Sarimveis, Haralambos; Smeets, Bart; Sopasakis, Pantelis; Tsiliki, Georgia; Vorgrimmler, David; Willighagen, Egon
2015-01-01
Background: The NanoSafety Cluster, a cluster of projects funded by the European Commission, identified the need for a computational infrastructure for toxicological data management of engineered nanomaterials (ENMs). Ontologies, open standards, and interoperable designs were envisioned to empower a harmonized approach to European research in nanotechnology. This setting provides a number of opportunities and challenges in the representation of nanomaterials data and the integration of ENM information originating from diverse systems. Within this cluster, eNanoMapper works towards supporting the collaborative safety assessment for ENMs by creating a modular and extensible infrastructure for data sharing, data analysis, and building computational toxicology models for ENMs. Results: The eNanoMapper database solution builds on the previous experience of the consortium partners in supporting diverse data through flexible data storage, open source components and web services. We have recently described the design of the eNanoMapper prototype database along with a summary of challenges in the representation of ENM data and an extensive review of existing nano-related data models, databases, and nanomaterials-related entries in chemical and toxicogenomic databases. This paper continues with a focus on the database functionality exposed through its application programming interface (API), and its use in visualisation and modelling. Considering the preferred community practice of using spreadsheet templates, we developed a configurable spreadsheet parser facilitating user friendly data preparation and data upload. We further present a web application able to retrieve the experimental data via the API and analyze it with multiple data preprocessing and machine learning algorithms. Conclusion: We demonstrate how the eNanoMapper database is used to import and publish online ENM and assay data from several data sources, how the “representational state transfer” (REST) API enables building user friendly interfaces and graphical summaries of the data, and how these resources facilitate the modelling of reproducible quantitative structure–activity relationships for nanomaterials (NanoQSAR). PMID:26425413
The eNanoMapper database for nanomaterial safety information.
Jeliazkova, Nina; Chomenidis, Charalampos; Doganis, Philip; Fadeel, Bengt; Grafström, Roland; Hardy, Barry; Hastings, Janna; Hegi, Markus; Jeliazkov, Vedrin; Kochev, Nikolay; Kohonen, Pekka; Munteanu, Cristian R; Sarimveis, Haralambos; Smeets, Bart; Sopasakis, Pantelis; Tsiliki, Georgia; Vorgrimmler, David; Willighagen, Egon
2015-01-01
The NanoSafety Cluster, a cluster of projects funded by the European Commission, identified the need for a computational infrastructure for toxicological data management of engineered nanomaterials (ENMs). Ontologies, open standards, and interoperable designs were envisioned to empower a harmonized approach to European research in nanotechnology. This setting provides a number of opportunities and challenges in the representation of nanomaterials data and the integration of ENM information originating from diverse systems. Within this cluster, eNanoMapper works towards supporting the collaborative safety assessment for ENMs by creating a modular and extensible infrastructure for data sharing, data analysis, and building computational toxicology models for ENMs. The eNanoMapper database solution builds on the previous experience of the consortium partners in supporting diverse data through flexible data storage, open source components and web services. We have recently described the design of the eNanoMapper prototype database along with a summary of challenges in the representation of ENM data and an extensive review of existing nano-related data models, databases, and nanomaterials-related entries in chemical and toxicogenomic databases. This paper continues with a focus on the database functionality exposed through its application programming interface (API), and its use in visualisation and modelling. Considering the preferred community practice of using spreadsheet templates, we developed a configurable spreadsheet parser facilitating user friendly data preparation and data upload. We further present a web application able to retrieve the experimental data via the API and analyze it with multiple data preprocessing and machine learning algorithms. We demonstrate how the eNanoMapper database is used to import and publish online ENM and assay data from several data sources, how the "representational state transfer" (REST) API enables building user friendly interfaces and graphical summaries of the data, and how these resources facilitate the modelling of reproducible quantitative structure-activity relationships for nanomaterials (NanoQSAR).
Network-Based Visual Analysis of Tabular Data
ERIC Educational Resources Information Center
Liu, Zhicheng
2012-01-01
Tabular data is pervasive in the form of spreadsheets and relational databases. Although tables often describe multivariate data without explicit network semantics, it may be advantageous to explore the data modeled as a graph or network for analysis. Even when a given table design conveys some static network semantics, analysts may want to look…
In recent years, the risk analysis community has broadened its use of complex aggregate and cumulative residential exposure models (e.g., to meet the requirements of the 1996 Food Quality Protection Act). The value of these models is their ability to incorporate a range of input...
Space Shuttle critical function audit
NASA Technical Reports Server (NTRS)
Sacks, Ivan J.; Dipol, John; Su, Paul
1990-01-01
A large fault-tolerance model of the main propulsion system of the US space shuttle has been developed. This model is being used to identify single components and pairs of components that will cause loss of shuttle critical functions. In addition, this model is the basis for risk quantification of the shuttle. The process used to develop and analyze the model is digraph matrix analysis (DMA). The DMA modeling and analysis process is accessed via a graphics-based computer user interface. This interface provides coupled display of the integrated system schematics, the digraph models, the component database, and the results of the fault tolerance and risk analyses.
Validation analysis of probabilistic models of dietary exposure to food additives.
Gilsenan, M B; Thompson, R L; Lambe, J; Gibney, M J
2003-10-01
The validity of a range of simple conceptual models designed specifically for the estimation of food additive intakes using probabilistic analysis was assessed. Modelled intake estimates that fell below traditional conservative point estimates of intake and above 'true' additive intakes (calculated from a reference database at brand level) were considered to be in a valid region. Models were developed for 10 food additives by combining food intake data, the probability of an additive being present in a food group and additive concentration data. Food intake and additive concentration data were entered as raw data or as a lognormal distribution, and the probability of an additive being present was entered based on the per cent brands or the per cent eating occasions within a food group that contained an additive. Since the three model components assumed two possible modes of input, the validity of eight (2³) model combinations was assessed. All model inputs were derived from the reference database. An iterative approach was employed in which the validity of individual model components was assessed first, followed by validation of full conceptual models. While the distribution of intake estimates from models fell below conservative intakes, which assume that the additive is present at maximum permitted levels (MPLs) in all foods in which it is permitted, intake estimates were not consistently above 'true' intakes. These analyses indicate the need for more complex models for the estimation of food additive intakes using probabilistic analysis. Such models should incorporate information on market share and/or brand loyalty.
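The combination of the three model components lends itself to a simple Monte Carlo formulation: intake = food consumption × probability of presence × concentration, summed over food groups. The sketch below illustrates this structure; the food groups and parameter values are invented for illustration and are not the study's model or data.

    # A minimal Monte Carlo sketch of a probabilistic additive-intake model.
    # Parameter values are invented placeholders, not data from the study.
    import numpy as np

    rng = np.random.default_rng(0)
    n_sim = 100_000

    food_groups = {
        # name: (median intake g/day, log-sd, presence probability, median conc mg/kg, log-sd)
        "soft_drinks": (250.0, 0.6, 0.35, 120.0, 0.4),
        "confectionery": (30.0, 0.8, 0.60, 300.0, 0.5),
    }

    total_intake = np.zeros(n_sim)
    for med_g, sd_g, p_present, med_c, sd_c in food_groups.values():
        intake = rng.lognormal(np.log(med_g), sd_g, n_sim)      # g/day
        present = rng.random(n_sim) < p_present                 # additive present in this eating occasion?
        conc = rng.lognormal(np.log(med_c), sd_c, n_sim)        # mg/kg
        total_intake += intake * present * conc / 1000.0        # mg/day

    print("P50 intake (mg/day):", np.percentile(total_intake, 50))
    print("P97.5 intake (mg/day):", np.percentile(total_intake, 97.5))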
FDA toxicity databases and real-time data entry.
Arvidson, Kirk B
2008-11-15
Structure-searchable electronic databases are valuable new tools that are assisting the FDA in its mission to promptly and efficiently review incoming submissions for regulatory approval of new food additives and food contact substances. The Center for Food Safety and Applied Nutrition's Office of Food Additive Safety (CFSAN/OFAS), in collaboration with Leadscope, Inc., is consolidating genetic toxicity data submitted in food additive petitions from the 1960s to the present day. The Center for Drug Evaluation and Research, Office of Pharmaceutical Science's Informatics and Computational Safety Analysis Staff (CDER/OPS/ICSAS) is separately gathering similar information from their submissions. Presently, these data are distributed in various locations such as paper files, microfiche, and non-standardized toxicology memoranda. The organization of the data into a consistent, searchable format will reduce paperwork, expedite the toxicology review process, and provide valuable information to industry that is currently available only to the FDA. Furthermore, by combining chemical structures with genetic toxicity information, biologically active moieties can be identified and used to develop quantitative structure-activity relationship (QSAR) modeling and testing guidelines. Additionally, chemicals devoid of toxicity data can be compared to known structures, allowing for improved safety review through the identification and analysis of structural analogs. Four database frameworks have been created: bacterial mutagenesis, in vitro chromosome aberration, in vitro mammalian mutagenesis, and in vivo micronucleus. Controlled vocabularies for these databases have been established. The four separate genetic toxicity databases are compiled into a single, structurally-searchable database for easy accessibility of the toxicity information. Beyond the genetic toxicity databases described here, additional databases for subchronic, chronic, and teratogenicity studies have been prepared.
Nie, Wei; Li, Bing; Xiu, Qing-Yu
2012-01-01
A number of studies assessed the association of -675 4G/5G polymorphism in the promoter region of plasminogen activator inhibitor (PAI)-1 gene with asthma in different populations. However, most studies reported inconclusive results. A meta-analysis was conducted to investigate the association between polymorphism in the PAI-1 gene and asthma susceptibility. Databases including Pubmed, EMBASE, HuGE Literature Finder, Wanfang Database, China National Knowledge Infrastructure (CNKI) and Weipu Database were searched to find relevant studies. Odds ratios (ORs) with 95% confidence intervals (CIs) were used to assess the strength of association in the dominant model, recessive model, codominant model, and additive model. Eight studies involving 1817 cases and 2327 controls were included. Overall, significant association between 4G/5G polymorphism and asthma susceptibility was observed for 4G4G+4G5G vs. 5G5G (OR = 1.56, 95% CI 1.12-2.18, P = 0.008), 4G/4G vs. 4G/5G+5G/5G (OR = 1.38, 95% CI 1.06-1.80, P = 0.02), 4G/4G vs. 5G/5G (OR = 1.80, 95% CI 1.17-2.76, P = 0.007), 4G/5G vs. 5G/5G (OR = 1.40, 95% CI 1.07-1.84, P = 0.02), and 4G vs. 5G (OR = 1.35, 95% CI 1.08-1.68, P = 0.008). This meta-analysis suggested that the -675 4G/5G polymorphism of PAI-1 gene was a risk factor of asthma.
Xiu, Qing-yu
2012-01-01
Background A number of studies assessed the association of −675 4G/5G polymorphism in the promoter region of plasminogen activator inhibitor (PAI)-1 gene with asthma in different populations. However, most studies reported inconclusive results. A meta-analysis was conducted to investigate the association between polymorphism in the PAI-1 gene and asthma susceptibility. Methods Databases including Pubmed, EMBASE, HuGE Literature Finder, Wanfang Database, China National Knowledge Infrastructure (CNKI) and Weipu Database were searched to find relevant studies. Odds ratios (ORs) with 95% confidence intervals (CIs) were used to assess the strength of association in the dominant model, recessive model, codominant model, and additive model. Results Eight studies involving 1817 cases and 2327 controls were included. Overall, significant association between 4G/5G polymorphism and asthma susceptibility was observed for 4G4G+4G5G vs. 5G5G (OR = 1.56, 95% CI 1.12–2.18, P = 0.008), 4G/4G vs. 4G/5G+5G/5G (OR = 1.38, 95% CI 1.06–1.80, P = 0.02), 4G/4G vs. 5G/5G (OR = 1.80, 95% CI 1.17–2.76, P = 0.007), 4G/5G vs. 5G/5G (OR = 1.40, 95% CI 1.07–1.84, P = 0.02), and 4G vs. 5G (OR = 1.35, 95% CI 1.08–1.68, P = 0.008). Conclusions This meta-analysis suggested that the −675 4G/5G polymorphism of PAI-1 gene was a risk factor of asthma. PMID:22479620
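The pooled odds ratios reported above come from standard inverse-variance combination of study-level estimates. A generic fixed-effect sketch of that calculation is shown below; the 2×2 counts are made up for illustration and are not the data from this meta-analysis.

    # Inverse-variance (fixed-effect) pooling of odds ratios -- illustrative counts only.
    import math

    # (cases_exposed, cases_unexposed, controls_exposed, controls_unexposed) per study
    studies = [(120, 80, 150, 160), (60, 45, 90, 100), (200, 150, 240, 260)]

    num = den = 0.0
    for a, b, c, d in studies:
        log_or = math.log((a * d) / (b * c))
        var = 1 / a + 1 / b + 1 / c + 1 / d   # Woolf variance of the log odds ratio
        w = 1 / var
        num += w * log_or
        den += w

    pooled = num / den
    se = math.sqrt(1 / den)
    lo, hi = math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)
    print(f"Pooled OR = {math.exp(pooled):.2f} (95% CI {lo:.2f}-{hi:.2f})")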
Paganoni, Sabrina; Nicholson, Katharine; Chan, James; Shui, Amy; Schoenfeld, David; Sherman, Alexander; Berry, James; Cudkowicz, Merit; Atassi, Nazem
2018-03-01
Urate has been identified as a predictor of amyotrophic lateral sclerosis (ALS) survival in some but not all studies. Here we leverage the recent expansion of the Pooled Resource Open-Access ALS Clinical Trials (PRO-ACT) database to study the association between urate levels and ALS survival. Pooled data of 1,736 ALS participants from the PRO-ACT database were analyzed. Cox proportional hazards regression models were used to evaluate associations between urate levels at trial entry and survival. After adjustment for potential confounders (i.e., creatinine and body mass index), there was an 11% reduction in risk of reaching a survival endpoint during the study with each 1-mg/dL increase in uric acid levels (adjusted hazard ratio 0.89, 95% confidence interval 0.82-0.97, P < 0.01). Our pooled analysis provides further support for urate as a prognostic factor for survival in ALS and confirms the utility of the PRO-ACT database as a powerful resource for ALS epidemiological research. Muscle Nerve 57: 430-434, 2018. © 2017 Wiley Periodicals, Inc.
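For orientation, here is a minimal sketch of a Cox proportional hazards analysis of this kind using the lifelines package. The synthetic data frame and its column names stand in for the PRO-ACT variables and are assumptions made for illustration only.

    # Illustrative Cox proportional hazards sketch with simulated data (not PRO-ACT).
    import numpy as np
    import pandas as pd
    from lifelines import CoxPHFitter

    rng = np.random.default_rng(1)
    n = 500
    df = pd.DataFrame({
        "survival_months": rng.exponential(24, n),
        "event": rng.integers(0, 2, n),        # 1 = endpoint reached, 0 = censored
        "urate_mg_dl": rng.normal(5.0, 1.2, n),
        "creatinine": rng.normal(0.8, 0.2, n),
        "bmi": rng.normal(26, 4, n),
    })

    cph = CoxPHFitter()
    cph.fit(df, duration_col="survival_months", event_col="event")
    cph.print_summary()   # hazard ratio per 1-unit increase in each covariate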
Java Web Simulation (JWS); a web based database of kinetic models.
Snoep, J L; Olivier, B G
2002-01-01
Software to make a database of kinetic models accessible via the internet has been developed and a core database has been set up at http://jjj.biochem.sun.ac.za/. This repository of models, available to everyone with internet access, opens a whole new way in which we can make our models public. Via the database, a user can change enzyme parameters and run time simulations or steady state analyses. The interface is user friendly and no additional software is necessary. The database currently contains 10 models, but since the generation of the program code to include new models has largely been automated the addition of new models is straightforward and people are invited to submit their models to be included in the database.
PDXliver: a database of liver cancer patient derived xenograft mouse models.
He, Sheng; Hu, Bo; Li, Chao; Lin, Ping; Tang, Wei-Guo; Sun, Yun-Fan; Feng, Fang-You-Min; Guo, Wei; Li, Jia; Xu, Yang; Yao, Qian-Lan; Zhang, Xin; Qiu, Shuang-Jian; Zhou, Jian; Fan, Jia; Li, Yi-Xue; Li, Hong; Yang, Xin-Rong
2018-05-09
Liver cancer is the second leading cause of cancer-related deaths and is characterized by heterogeneity and drug resistance. Patient-derived xenograft (PDX) models have been widely used in cancer research because they reproduce the characteristics of original tumors. However, the current studies of liver cancer PDX mice are scattered and the number of available PDX models is too small to represent the heterogeneity of liver cancer patients. To improve this situation and to complement available PDX model-related resources, here we constructed a comprehensive database, PDXliver, to integrate and analyze liver cancer PDX models. Currently, PDXliver contains 116 PDX models from Chinese liver cancer patients, 51 of them established by the in-house PDX platform and the others curated from the published literature. These models are annotated with complete information, including clinical characteristics of patients, genome-wide expression profiles, germline variations, somatic mutations and copy number alterations. Analysis of expression subtypes and mutated genes shows that PDXliver represents the diversity of human patients. Another feature of PDXliver is storing drug response data of PDX mice, which makes it possible to explore the association between molecular profiles and drug sensitivity. All data can be accessed via the Browse and Search pages. Additionally, two tools are provided to interactively visualize the omics data of selected PDXs or to compare two groups of PDXs. As far as we know, PDXliver is the first public database of liver cancer PDX models. We hope that this comprehensive resource will accelerate the utility of PDX models and facilitate liver cancer research. The PDXliver database is freely available online at: http://www.picb.ac.cn/PDXliver/.
Makadia, Rupa; Matcho, Amy; Ma, Qianli; Knoll, Chris; Schuemie, Martijn; DeFalco, Frank J; Londhe, Ajit; Zhu, Vivienne; Ryan, Patrick B
2015-01-01
Objectives To evaluate the utility of applying the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) across multiple observational databases within an organization and to apply standardized analytics tools for conducting observational research. Materials and methods Six deidentified patient-level datasets were transformed to the OMOP CDM. We evaluated the extent of information loss that occurred through the standardization process. We developed a standardized analytic tool to replicate the cohort construction process from a published epidemiology protocol and applied the analysis to all 6 databases to assess time-to-execution and comparability of results. Results Transformation to the CDM resulted in minimal information loss across all 6 databases. Patients and observations excluded were due to identified data quality issues in the source system; 96% to 99% of condition records and 90% to 99% of drug records were successfully mapped into the CDM using the standard vocabulary. The full cohort replication and descriptive baseline summary was executed for 2 cohorts in 6 databases in less than 1 hour. Discussion The standardization process improved data quality, increased efficiency, and facilitated cross-database comparisons to support a more systematic approach to observational research. Comparisons across data sources showed consistency in the impact of inclusion criteria using the protocol, and identified differences in patient characteristics and coding practices across databases. Conclusion Standardizing data structure (through a CDM), content (through a standard vocabulary with source code mappings), and analytics can enable an institution to apply a network-based approach to observational research across multiple, disparate observational health databases. PMID:25670757
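One core step in such a CDM transformation is mapping source codes to standard concepts and measuring how much of the data maps successfully. The toy sketch below illustrates that step with pandas; the table layout mirrors OMOP conventions, but the records and concept IDs are invented for the example.

    # Toy sketch of source-to-standard concept mapping and a mapping-rate check (invented data).
    import pandas as pd

    source_drug = pd.DataFrame({
        "person_id": [1, 1, 2, 3],
        "source_code": ["NDC:0002-8215", "NDC:0002-8215", "NDC:9999-0000", "NDC:0173-0519"],
    })
    vocabulary_map = pd.DataFrame({
        "source_code": ["NDC:0002-8215", "NDC:0173-0519"],
        "standard_concept_id": [19078106, 1112807],
    })

    cdm = source_drug.merge(vocabulary_map, on="source_code", how="left")
    mapped = cdm["standard_concept_id"].notna().mean()
    print(cdm)
    print(f"{mapped:.0%} of drug records mapped to standard concepts")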
AlQuraishi, Mohammed; Tang, Shengdong; Xia, Xide
2015-11-19
Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. We have developed an integrated affinity-structure database in which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis. This database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.
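Position weight matrices, the summary representation mentioned above, are straightforward to use computationally: a candidate site is scored by summing the matrix entry for each base at each position. A small illustrative sketch follows; the matrix values and sequence are arbitrary and do not come from the database.

    # Scoring DNA sites against a position weight matrix (PWM) -- values are arbitrary.
    import numpy as np

    BASES = "ACGT"
    # log-odds PWM for a 4-bp site: rows = positions, columns = A, C, G, T
    pwm = np.array([
        [1.2, -0.5, -1.0, -0.8],
        [-1.0, 1.5, -0.7, -0.9],
        [-0.6, -0.8, 1.4, -1.1],
        [0.9, -0.4, -0.9, 0.3],
    ])

    def score_site(site):
        return sum(pwm[i, BASES.index(b)] for i, b in enumerate(site))

    seq = "TTACGATTGACGT"
    width = pwm.shape[0]
    scores = {i: score_site(seq[i:i + width]) for i in range(len(seq) - width + 1)}
    best = max(scores, key=scores.get)
    print("best site:", seq[best:best + width], "score:", round(scores[best], 2))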
GlobTherm, a global database on thermal tolerances for aquatic and terrestrial organisms.
Bennett, Joanne M; Calosi, Piero; Clusella-Trullas, Susana; Martínez, Brezo; Sunday, Jennifer; Algar, Adam C; Araújo, Miguel B; Hawkins, Bradford A; Keith, Sally; Kühn, Ingolf; Rahbek, Carsten; Rodríguez, Laura; Singer, Alexander; Villalobos, Fabricio; Ángel Olalla-Tárraga, Miguel; Morales-Castilla, Ignacio
2018-03-13
How climate affects species distributions is a longstanding question receiving renewed interest owing to the need to predict the impacts of global warming on biodiversity. Is climate change forcing species to live near their critical thermal limits? Are these limits likely to change through natural selection? These and other important questions can be addressed with models relating geographical distributions of species with climate data, but inferences made with these models are highly contingent on non-climatic factors such as biotic interactions. Improved understanding of climate change effects on species will require extensive analysis of thermal physiological traits, but such data are both scarce and scattered. To overcome current limitations, we created the GlobTherm database. The database contains experimentally derived species' thermal tolerance data currently comprising over 2,000 species of terrestrial, freshwater, intertidal and marine multicellular algae, plants, fungi, and animals. The GlobTherm database will be maintained and curated by iDiv with the aim to keep expanding it, and enable further investigations on the effects of climate on the distribution of life on Earth.
A neotropical Miocene pollen database employing image-based search and semantic modeling.
Han, Jing Ginger; Cao, Hongfei; Barb, Adrian; Punyasena, Surangi W; Jaramillo, Carlos; Shyu, Chi-Ren
2014-08-01
Digital microscopic pollen images are being generated with increasing speed and volume, producing opportunities to develop new computational methods that increase the consistency and efficiency of pollen analysis and provide the palynological community a computational framework for information sharing and knowledge transfer. • Mathematical methods were used to assign trait semantics (abstract morphological representations) of the images of neotropical Miocene pollen and spores. Advanced database-indexing structures were built to compare and retrieve similar images based on their visual content. A Web-based system was developed to provide novel tools for automatic trait semantic annotation and image retrieval by trait semantics and visual content. • Mathematical models that map visual features to trait semantics can be used to annotate images with morphology semantics and to search image databases with improved reliability and productivity. Images can also be searched by visual content, providing users with customized emphases on traits such as color, shape, and texture. • Content- and semantic-based image searches provide a powerful computational platform for pollen and spore identification. The infrastructure outlined provides a framework for building a community-wide palynological resource, streamlining the process of manual identification, analysis, and species discovery.
New laboratory approach to study Titan ionospheric chemistry
NASA Astrophysics Data System (ADS)
Thissen, R.; Dutuit, O.; Pernot, P.; Carrasco, N.; Lilensten, J.; Quirico, E.; Schmitt, B.
The exploration of Titan reveals a very complex chemistry occurring in the ionospheric region of the atmosphere. In order to interpret the observations performed by the Cassini spectrometers, we need to improve our description of the ion-molecule chemistry involving nitrogen and hydrocarbons. Up to now, models have been based on databases compiled over the years. These are fairly complete for describing the major ions, but they lack accuracy for some of them, they largely neglect questions of isomerization and chemical functionality in the description of ionic species, and they still miss many inputs for ionic species heavier than 50 daltons. We propose to improve the databases by systematic measurements of ion-molecule reaction rates, and further structural description, by means of a high resolution mass spectrometer allowing for MS/MS structural analysis of the ionic species. A thorough evaluation of current databases by means of uncertainty propagation will guide our choice of the most important reactions to be studied. This study should also lead to educated choices for chemistry simplification, which is mandatory in order to include the chemistry in 3D or fluid models of the atmosphere. We plan as well to use extracts from tholins as a molecular source for our analysis.
Updated regulation curation model at the Saccharomyces Genome Database
Engel, Stacia R; Skrzypek, Marek S; Hellerstedt, Sage T; Wong, Edith D; Nash, Robert S; Weng, Shuai; Binkley, Gail; Sheppard, Travis K; Karra, Kalpana; Cherry, J Michael
2018-01-01
Abstract The Saccharomyces Genome Database (SGD) provides comprehensive, integrated biological information for the budding yeast Saccharomyces cerevisiae, along with search and analysis tools to explore these data, enabling the discovery of functional relationships between sequence and gene products in fungi and higher organisms. We have recently expanded our data model for regulation curation to address regulation at the protein level in addition to transcription, and are presenting the expanded data on the ‘Regulation’ pages at SGD. These pages include a summary describing the context under which the regulator acts, manually curated and high-throughput annotations showing the regulatory relationships for that gene and a graphical visualization of its regulatory network and connected networks. For genes whose products regulate other genes or proteins, the Regulation page includes Gene Ontology enrichment analysis of the biological processes in which those targets participate. For DNA-binding transcription factors, we also provide other information relevant to their regulatory function, such as DNA binding site motifs and protein domains. As with other data types at SGD, all regulatory relationships and accompanying data are available through YeastMine, SGD’s data warehouse based on InterMine. Database URL: http://www.yeastgenome.org PMID:29688362
NASA Astrophysics Data System (ADS)
Knosp, B.; Gangl, M. E.; Hristova-Veleva, S. M.; Kim, R. M.; Lambrigtsen, B.; Li, P.; Niamsuwan, N.; Shen, T. P. J.; Turk, F. J.; Vu, Q. A.
2014-12-01
The JPL Tropical Cyclone Information System (TCIS) brings together satellite, aircraft, and model forecast data from several NASA, NOAA, and other data centers to assist researchers in comparing and analyzing data related to tropical cyclones. The TCIS has been supporting specific science field campaigns, such as the Genesis and Rapid Intensification Processes (GRIP) campaign and the Hurricane and Severe Storm Sentinel (HS3) campaign, by creating near real-time (NRT) data visualization portals. These portals are intended to assist in mission planning, enhance the understanding of current physical processes, and improve model data by comparing it to satellite and aircraft observations. The TCIS NRT portals allow the user to view plots on a Google Earth interface. To complement these visualizations, the team has been working on developing data analysis tools to let the user actively interrogate areas of Level 2 swath and two-dimensional plots they see on their screen. As expected, these observation and model data are quite voluminous and bottlenecks in the system architecture can occur when the databases try to run geospatial searches for data files that need to be read by the tools. To improve the responsiveness of the data analysis tools, the TCIS team has been conducting studies on how to best store Level 2 swath footprints and run sub-second geospatial searches to discover data. The first objective was to improve the sampling accuracy of the footprints being stored in the TCIS database by comparing the Java-based NASA PO.DAAC Level 2 Swath Generator with a TCIS Python swath generator. The second objective was to compare the performance of four database implementations - MySQL, MySQL+Solr, MongoDB, and PostgreSQL - to see which database management system would yield the best geospatial query and storage performance. The final objective was to integrate our chosen technologies with our Joint Probability Density Function (Joint PDF), Wave Number Analysis, and Automated Rotational Center Hurricane Eye Retrieval (ARCHER) tools. In this presentation, we will compare the enabling technologies we tested and discuss which ones we selected for integration into the TCIS' data analysis tool architecture. We will also show how these techniques have been automated to provide access to NRT data through our analysis tools.
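As a concrete illustration of the kind of geospatial footprint search being benchmarked, the sketch below issues a PostGIS-style bounding-box query from Python. The table and column names (swath_footprints, granule_id, geom, obs_time) and the connection string are assumptions for the example, not the actual TCIS schema.

    # Illustrative PostGIS footprint search; schema and connection details are assumptions.
    import psycopg2

    query = """
    SELECT granule_id, file_path
    FROM swath_footprints
    WHERE ST_Intersects(
        geom,
        ST_MakeEnvelope(%s, %s, %s, %s, 4326)   -- lon/lat bounding box around the storm
    )
    AND obs_time BETWEEN %s AND %s;
    """

    with psycopg2.connect("dbname=tcis user=reader") as conn, conn.cursor() as cur:
        cur.execute(query, (-90.0, 20.0, -80.0, 30.0,
                            "2014-09-01T00:00:00Z", "2014-09-02T00:00:00Z"))
        for granule_id, path in cur.fetchall():
            print(granule_id, path)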
MicroRNAs for osteosarcoma in the mouse: a meta-analysis
Chang, Junli; Yao, Min; Li, Yimian; Zhao, Dongfeng; Hu, Shaopu; Cui, Xuejun; Liu, Gang; Shi, Qi; Wang, Yongjun; Yang, Yanping
2016-01-01
Osteosarcoma (OS) is the most common primary malignant bone tumor, with high morbidity, that occurs mainly in children and young adults. As key components of gene-regulatory networks, microRNAs (miRNAs) control many critical pathophysiological processes, including the initiation and progression of cancers. The objective of this study is to summarize and evaluate the potential of miRNAs as targets for prevention and treatment of OS in mouse models, and to explore the methodological quality of current studies. We searched PubMed, Web of Science, Embase, Wan Fang Database, VIP Database, China Knowledge Resource Integrated Database, and Chinese BioMedical from their inception to 10 May 2016. Two reviewers separately screened the controlled studies, which estimate the effects of miRNAs on osteosarcoma in mice. A pair-wise analysis was performed. Thirty-six studies with adequate randomization were selected and included in the meta-analysis. We found that blocking oncogenic or restoring decreased miRNAs in cancer cells could significantly suppress the progression of OS in vivo, as assessed by tumor volume and tumor weight. This meta-analysis suggests that miRNAs are potential therapeutic targets for OS and that correction of the altered expression of miRNAs significantly suppresses the progression of OS in mouse models; however, the overall methodological quality of the studies included here was low, and more rigorously designed animal studies must be carried out before a miRNA-based treatment could be translated from animal studies to clinical trials. PMID:27852052
A resource for benchmarking the usefulness of protein structure models.
Carbajo, Daniel; Tramontano, Anna
2012-08-02
Increasingly, biologists and biochemists use computational tools to design experiments to probe the function of proteins and/or to engineer them for a variety of different purposes. The most effective strategies rely on the knowledge of the three-dimensional structure of the protein of interest. However, it is often the case that an experimental structure is not available and that models of different quality are used instead. On the other hand, the relationship between the quality of a model and its appropriate use is not easy to derive in general, and so far it has been analyzed in detail only for specific applications. This paper describes a database and related software tools that allow testing of a given structure-based method on models of a protein representing different levels of accuracy. The comparison of the results of a computational experiment on the experimental structure and on a set of its decoy models will allow developers and users to assess the specific threshold of accuracy required to perform the task effectively. The ModelDB server automatically builds decoy models of different accuracy for a given protein of known structure and provides a set of useful tools for their analysis. Pre-computed data for a non-redundant set of deposited protein structures are available for analysis and download in the ModelDB database. IMPLEMENTATION, AVAILABILITY AND REQUIREMENTS: Project name: A resource for benchmarking the usefulness of protein structure models. Project home page: http://bl210.caspur.it/MODEL-DB/MODEL-DB_web/MODindex.php. Operating system(s): Platform independent. Programming language: Perl-BioPerl (program); mySQL, Perl DBI and DBD modules (database); php, JavaScript, Jmol scripting (web server). Other requirements: Java Runtime Environment v1.4 or later, Perl, BioPerl, CPAN modules, HHsearch, Modeller, LGA, NCBI Blast package, DSSP, Speedfill (Surfnet) and PSAIA. License: Free. Any restrictions to use by non-academics: No.
Bailey, Sarah F; Scheible, Melissa K; Williams, Christopher; Silva, Deborah S B S; Hoggan, Marina; Eichman, Christopher; Faith, Seth A
2017-11-01
Next-generation Sequencing (NGS) is a rapidly evolving technology with demonstrated benefits for forensic genetic applications, and the strategies to analyze and manage the massive NGS datasets are currently in development. Here, the computing, data storage, connectivity, and security resources of the Cloud were evaluated as a model for forensic laboratory systems that produce NGS data. A complete front-to-end Cloud system was developed to upload, process, and interpret raw NGS data using a web browser dashboard. The system was extensible, demonstrating analysis capabilities of autosomal and Y-STRs from a variety of NGS instrumentation (Illumina MiniSeq and MiSeq, and Oxford Nanopore MinION). NGS data for STRs were concordant with standard reference materials previously characterized with capillary electrophoresis and Sanger sequencing. The computing power of the Cloud was implemented with on-demand auto-scaling to allow multiple file analysis in tandem. The system was designed to store resulting data in a relational database, amenable to downstream sample interpretations and databasing applications following the most recent guidelines in nomenclature for sequenced alleles. Lastly, a multi-layered Cloud security architecture was tested and showed that industry standards for securing data and computing resources were readily applied to the NGS system without disadvantageous effects for bioinformatic analysis, connectivity or data storage/retrieval. The results of this study demonstrate the feasibility of using Cloud-based systems for secured NGS data analysis, storage, databasing, and multi-user distributed connectivity. Copyright © 2017 Elsevier B.V. All rights reserved.
ANALYSIS OF DISCRIMINATING FACTORS IN HUMAN ACTIVITIES THAT AFFECT EXPOSURE
Accurately modeling exposure to particulate matter (PM) and other pollutants ultimately involves the utilization of human location-activity databases to assist in understanding the potential variability of microenvironmental exposures. This paper critically considers and stati...
Minnesota's Tech Prep Outcome Evaluation Model.
ERIC Educational Resources Information Center
Brown, James M.; Pucel, David; Twohig, Cathy; Semler, Steve; Kuchinke, K. Peter
1998-01-01
Describes the Minnesota Tech Prep Consortia Evaluation System, which collects outcomes data on enrollment, retention, related job placement, higher education, dropouts, and diplomas/degrees awarded. Explains outcome measures, database development, data collection and analysis methods, and remaining challenges. (SK)
NASA Technical Reports Server (NTRS)
Yeske, Lanny A.
1998-01-01
Numerous FY1998 student research projects were sponsored by the Mississippi State University Center for Air Sea Technology. This technical note describes these projects which include research on: (1) Graphical User Interfaces, (2) Master Environmental Library, (3) Database Management Systems, (4) Naval Interactive Data Analysis System, (5) Relocatable Modeling Environment, (6) Tidal Models, (7) Book Inventories, (8) System Analysis, (9) World Wide Web Development, (10) Virtual Data Warehouse, (11) Enterprise Information Explorer, (12) Equipment Inventories, (13) COADS, and (14) JavaScript Technology.
ASGARD: an open-access database of annotated transcriptomes for emerging model arthropod species.
Zeng, Victor; Extavour, Cassandra G
2012-01-01
The increased throughput and decreased cost of next-generation sequencing (NGS) have shifted the bottleneck in genomic research from sequencing to annotation, analysis and accessibility. This is particularly challenging for research communities working on organisms that lack the basic infrastructure of a sequenced genome, or an efficient way to utilize whatever sequence data may be available. Here we present a new database, the Assembled Searchable Giant Arthropod Read Database (ASGARD). This database is a repository and search engine for transcriptomic data from arthropods that are of high interest to multiple research communities but currently lack sequenced genomes. We demonstrate the functionality and utility of ASGARD using de novo assembled transcriptomes from the milkweed bug Oncopeltus fasciatus, the cricket Gryllus bimaculatus and the amphipod crustacean Parhyale hawaiensis. We have annotated these transcriptomes to assign putative orthology, coding region determination, protein domain identification and Gene Ontology (GO) term annotation to all possible assembly products. ASGARD allows users to search all assemblies by orthology annotation, GO term annotation or Basic Local Alignment Search Tool. User-friendly features of ASGARD include search term auto-completion suggestions based on database content, the ability to download assembly product sequences in FASTA format, direct links to NCBI data for predicted orthologs and graphical representation of the location of protein domains and matches to similar sequences from the NCBI non-redundant database. ASGARD will be a useful repository for transcriptome data from future NGS studies on these and other emerging model arthropods, regardless of sequencing platform, assembly or annotation status. This database thus provides easy, one-stop access to multi-species annotated transcriptome information. We anticipate that this database will be useful for members of multiple research communities, including developmental biology, physiology, evolutionary biology, ecology, comparative genomics and phylogenomics. Database URL: asgard.rc.fas.harvard.edu.
Application of 3D Spatio-Temporal Data Modeling, Management, and Analysis in DB4GEO
NASA Astrophysics Data System (ADS)
Kuper, P. V.; Breunig, M.; Al-Doori, M.; Thomsen, A.
2016-10-01
Many of today's worldwide challenges such as climate change, water supply and transport systems in cities, or movements of crowds need spatio-temporal data to be examined in detail. Thus the number of examinations in 3D space dealing with geospatial objects moving in space and time or even changing their shapes in time will rapidly increase in the future. Prominent spatio-temporal applications are subsurface reservoir modeling, water supply after seawater desalination and the development of transport systems in mega cities. All of these applications generate large spatio-temporal data sets. However, the modeling, management and analysis of 3D geo-objects with changing shape and attributes in time is still a challenge for geospatial database architectures. In this article we describe the application of concepts for the modeling, management and analysis of 2.5D and 3D spatial plus 1D temporal objects implemented in DB4GeO, our service-oriented geospatial database architecture. An example application with spatio-temporal data of a landfill near the city of Osnabrück, Germany, demonstrates the usage of the concepts. Finally, an outlook on our future research focusing on new applications with big data analysis in three spatial plus one temporal dimension in the United Arab Emirates, especially the Dubai area, is given.
Reiner, Bruce I
2017-10-01
Conventional peer review practice is compromised by a number of well-documented biases, which in turn limit the standard-of-care analysis that is fundamental to the determination of medical malpractice. In addition to these intrinsic biases, other deficiencies exist in current peer review, including the lack of standardization, objectivity, retrospective practice, and automation. An alternative model to address these deficiencies would be one which is completely blinded to the peer reviewer, requires independent reporting from both parties, utilizes automated data mining techniques for neutral and objective report analysis, and provides data reconciliation for resolution of finding-specific report differences. If properly implemented, this peer review model could result in the creation of a standardized, referenceable peer review database which could further assist in customizable education, technology refinement, and implementation of real-time context- and user-specific decision support.
Computer Administering of the Psychological Investigations: Set-Relational Representation
NASA Astrophysics Data System (ADS)
Yordzhev, Krasimir
Computer administering of a psychological investigation is the computer representation of the entire procedure of psychological assessments - test construction, test implementation, results evaluation, storage and maintenance of the developed database, its statistical processing, analysis and interpretation. A mathematical description of psychological assessment with the aid of personality tests is discussed in this article. The set theory and the relational algebra are used in this description. A relational model of data, needed to design a computer system for automation of certain psychological assessments is given. Some finite sets and relation on them, which are necessary for creating a personality psychological test, are described. The described model could be used to develop real software for computer administering of any psychological test and there is full automation of the whole process: test construction, test implementation, result evaluation, storage of the developed database, statistical implementation, analysis and interpretation. A software project for computer administering personality psychological tests is suggested.
CTLA-4 and MDR1 polymorphisms increase the risk for ulcerative colitis: A meta-analysis.
Zhao, Jia-Jun; Wang, Di; Yao, Hui; Sun, Da-Wei; Li, Hong-Yu
2015-09-14
To evaluate the correlations between cytotoxic T lymphocyte-associated antigen-4 (CTLA-4) and multi-drug resistance 1 (MDR1) gene polymorphisms and ulcerative colitis (UC) risk. PubMed, EMBASE, Web of Science, Cochrane Library, CBM databases, Springerlink, Wiley, EBSCO, Ovid, Wanfang database, VIP database, China National Knowledge Infrastructure, and Weipu Journal databases were exhaustively searched using combinations of keywords relating to CTLA-4, MDR1 and UC. The published studies were filtered using our stringent inclusion and exclusion criteria, the quality assessment for each eligible study was conducted using the Critical Appraisal Skills Programme, and the resultant high-quality data from the final selected studies were analyzed using Comprehensive Meta-analysis 2.0 (CMA 2.0) software. The correlations between SNPs of the CTLA-4 gene, the MDR1 gene and the risk of UC were evaluated by OR at 95%CI. The Z test was carried out to evaluate the significance of overall effect values. Cochran's Q-statistic and I² tests were applied to quantify heterogeneity among studies. Funnel plots, classic fail-safe N and Egger's linear regression test were inspected for indication of publication bias. A total of 107 studies were initially retrieved and 12 studies were eventually selected for meta-analysis. These 12 case-control studies involved 1860 UC patients and 2663 healthy controls. Our major result revealed that single nucleotide polymorphisms (SNPs) of the CTLA-4 gene rs3087243 G > A and rs231775 G > A may increase the risk of UC (rs3087243 G > A: allele model: OR = 1.365, 95%CI: 1.023-1.822, P = 0.035; dominant model: OR = 1.569, 95%CI: 1.269-1.940, P < 0.001; rs231775 G > A: allele model: OR = 1.583, 95%CI: 1.306-1.918, P < 0.001; dominant model: OR = 1.805, 95%CI: 1.393-2.340, P < 0.001). In addition, based on our results, SNPs of the MDR1 gene rs1045642 C > T might also confer a significant increase in the risk of UC (allele model: OR = 1.389, 95%CI: 1.214-1.590, P < 0.001; dominant model: OR = 1.518, 95%CI: 1.222-1.886, P < 0.001). CTLA-4 gene rs3087243 G > A and rs231775 G > A, and MDR1 gene rs1045642 C > T might confer an increased risk of UC.
2001-09-01
replication) -- all from Visual Basic and VBA . In fact, we found that the SQL Server engine actually had a plethora of options, most formidable of...2002, the new SQL Server 2000 database engine, and Microsoft Visual Basic.NET. This thesis describes our use of the Spiral Development Model to...versions of Microsoft products? Specifically, the pending release of Microsoft Office 2002, the new SQL Server 2000 database engine, and Microsoft
ERIC Educational Resources Information Center
Malrieu, Denise
1983-01-01
This overview of the ERIC system begins with a brief history of the system; a description of the types and numbers of materials contained in the database; sources of types of information for educators that are not processed by ERIC; and the various publications and reference materials produced by and for the system. The analysis of ERIC usage in…
Toward the automated generation of genome-scale metabolic networks in the SEED.
DeJongh, Matthew; Formsma, Kevin; Boillot, Paul; Gould, John; Rycenga, Matthew; Best, Aaron
2007-04-26
Current methods for the automated generation of genome-scale metabolic networks focus on genome annotation and preliminary biochemical reaction network assembly, but do not adequately address the process of identifying and filling gaps in the reaction network, and verifying that the network is suitable for systems level analysis. Thus, current methods are only sufficient for generating draft-quality networks, and refinement of the reaction network is still largely a manual, labor-intensive process. We have developed a method for generating genome-scale metabolic networks that produces substantially complete reaction networks, suitable for systems level analysis. Our method partitions the reaction space of central and intermediary metabolism into discrete, interconnected components that can be assembled and verified in isolation from each other, and then integrated and verified at the level of their interconnectivity. We have developed a database of components that are common across organisms, and have created tools for automatically assembling appropriate components for a particular organism based on the metabolic pathways encoded in the organism's genome. This focuses manual efforts on that portion of an organism's metabolism that is not yet represented in the database. We have demonstrated the efficacy of our method by reverse-engineering and automatically regenerating the reaction network from a published genome-scale metabolic model for Staphylococcus aureus. Additionally, we have verified that our method capitalizes on the database of common reaction network components created for S. aureus, by using these components to generate substantially complete reconstructions of the reaction networks from three other published metabolic models (Escherichia coli, Helicobacter pylori, and Lactococcus lactis). We have implemented our tools and database within the SEED, an open-source software environment for comparative genome annotation and analysis. Our method sets the stage for the automated generation of substantially complete metabolic networks for over 400 complete genome sequences currently in the SEED. With each genome that is processed using our tools, the database of common components grows to cover more of the diversity of metabolic pathways. This increases the likelihood that components of reaction networks for subsequently processed genomes can be retrieved from the database, rather than assembled and verified manually.
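A very reduced illustration of one gap-identification heuristic is to flag "dead-end" metabolites that are consumed but never produced (or produced but never consumed) in a draft network. The sketch below shows the idea on a few invented reactions; it is not the SEED implementation or database content.

    # Toy dead-end metabolite check on an invented reaction set (not the SEED database).
    reactions = {
        # name: (substrates, products)
        "hexokinase": ({"glc", "atp"}, {"g6p", "adp"}),
        "pgi": ({"g6p"}, {"f6p"}),
        "pfk": ({"f6p", "atp"}, {"fdp", "adp"}),
    }

    consumed = set().union(*(subs for subs, _ in reactions.values()))
    produced = set().union(*(prods for _, prods in reactions.values()))

    never_produced = consumed - produced   # needs an uptake or upstream reaction
    never_consumed = produced - consumed   # needs a sink or downstream reaction
    print("gaps (never produced):", sorted(never_produced))
    print("gaps (never consumed):", sorted(never_consumed))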
SBCDDB: Sleeping Beauty Cancer Driver Database for gene discovery in mouse models of human cancers
Mann, Michael B
2018-01-01
Abstract Large-scale oncogenomic studies have identified few frequently mutated cancer drivers and hundreds of infrequently mutated drivers. Defining the biological context for rare driving events is fundamentally important to increasing our understanding of the druggable pathways in cancer. Sleeping Beauty (SB) insertional mutagenesis is a powerful gene discovery tool used to model human cancers in mice. Our lab and others have published a number of studies that identify cancer drivers from these models using various statistical and computational approaches. Here, we have integrated SB data from primary tumor models into an analysis and reporting framework, the Sleeping Beauty Cancer Driver DataBase (SBCDDB, http://sbcddb.moffitt.org), which identifies drivers in individual tumors or tumor populations. Unique to this effort, the SBCDDB utilizes a single, scalable, statistical analysis method that enables data to be grouped by different biological properties. This allows for SB drivers to be evaluated (and re-evaluated) under different contexts. The SBCDDB provides visual representations highlighting the spatial attributes of transposon mutagenesis and couples this functionality with analysis of gene sets, enabling users to interrogate relationships between drivers. The SBCDDB is a powerful resource for comparative oncogenomic analyses with human cancer genomics datasets for driver prioritization. PMID:29059366
Design and Analysis Tools for Supersonic Inlets
NASA Technical Reports Server (NTRS)
Slater, John W.; Folk, Thomas C.
2009-01-01
Computational tools are being developed for the design and analysis of supersonic inlets. The objective is to update existing tools and provide design and low-order aerodynamic analysis capability for advanced inlet concepts. The Inlet Tools effort includes aspects of creating an electronic database of inlet design information, a document describing inlet design and analysis methods, a geometry model for describing the shape of inlets, and computer tools that implement the geometry model and methods. The geometry model has a set of basic inlet shapes that include pitot, two-dimensional, axisymmetric, and stream-traced inlet shapes. The inlet model divides the inlet flow field into parts that facilitate the design and analysis methods. The inlet geometry model constructs the inlet surfaces through the generation and transformation of planar entities based on key inlet design factors. Future efforts will focus on developing the inlet geometry model, the inlet design and analysis methods, a Fortran 95 code to implement the model and methods. Other computational platforms, such as Java, will also be explored.
Avalos, Marta; Adroher, Nuria Duran; Lagarde, Emmanuel; Thiessard, Frantz; Grandvalet, Yves; Contrand, Benjamin; Orriols, Ludivine
2012-09-01
Large data sets with many variables provide particular challenges when constructing analytic models. Lasso-related methods provide a useful tool, although one that remains unfamiliar to most epidemiologists. We illustrate the application of lasso methods in an analysis of the impact of prescribed drugs on the risk of a road traffic crash, using a large French nationwide database (PLoS Med 2010;7:e1000366). In the original case-control study, the authors analyzed each exposure separately. We use the lasso method, which can simultaneously perform estimation and variable selection in a single model. We compare point estimates and confidence intervals using (1) a separate logistic regression model for each drug with a Bonferroni correction and (2) lasso shrinkage logistic regression analysis. Shrinkage regression had little effect on (bias corrected) point estimates, but led to less conservative results, noticeably for drugs with moderate levels of exposure. Carbamates, carboxamide derivative and fatty acid derivative antiepileptics, drugs used in opioid dependence, and mineral supplements of potassium showed stronger associations. Lasso is a relevant method in the analysis of databases with large number of exposures and can be recommended as an alternative to conventional strategies.
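For orientation, the following is a minimal sketch of lasso (L1-penalized) logistic regression performing simultaneous estimation and variable selection over many exposure indicators, in the spirit of the analysis described. The data are simulated, and scikit-learn stands in for the authors' actual software.

    # Lasso logistic regression on simulated exposure indicators (illustrative only).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n, p = 5000, 100                      # subjects x drug-exposure indicators
    X = rng.integers(0, 2, size=(n, p)).astype(float)
    true_beta = np.zeros(p)
    true_beta[:3] = [0.8, 0.6, 0.5]       # only three exposures carry a real effect
    logit = -2.0 + X @ true_beta
    y = rng.random(n) < 1 / (1 + np.exp(-logit))

    lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.05)
    lasso.fit(X, y)
    selected = np.flatnonzero(lasso.coef_[0])
    print("exposures retained by the penalty:", selected)

The strength of the penalty (here the C parameter) controls how many exposures survive selection; in practice it is chosen by cross-validation rather than fixed as in this sketch.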
Improving accuracy and power with transfer learning using a meta-analytic database.
Schwartz, Yannick; Varoquaux, Gaël; Pallier, Christophe; Pinel, Philippe; Poline, Jean-Baptiste; Thirion, Bertrand
2012-01-01
Typical cohorts in brain imaging studies are not large enough for systematic testing of all the information contained in the images. To build testable working hypotheses, investigators thus rely on analysis of previous work, sometimes formalized in a so-called meta-analysis. In brain imaging, this approach underlies the specification of regions of interest (ROIs) that are usually selected on the basis of the coordinates of previously detected effects. In this paper, we propose to use a database of images, rather than coordinates, and frame the problem as transfer learning: learning a discriminant model on a reference task to apply it to a different but related new task. To facilitate statistical analysis of small cohorts, we use a sparse discriminant model that selects predictive voxels on the reference task and thus provides a principled procedure to define ROIs. The benefits of our approach are twofold. First it uses the reference database for prediction, i.e., to provide potential biomarkers in a clinical setting. Second it increases statistical power on the new task. We demonstrate on a set of 18 pairs of functional MRI experimental conditions that our approach gives good prediction. In addition, on a specific transfer situation involving different scanners at different locations, we show that voxel selection based on transfer learning leads to higher detection power on small cohorts.
The SUPERFAMILY database in 2004: additions and improvements.
Madera, Martin; Vogel, Christine; Kummerfeld, Sarah K; Chothia, Cyrus; Gough, Julian
2004-01-01
The SUPERFAMILY database provides structural assignments to protein sequences and a framework for analysis of the results. At the core of the database is a library of profile Hidden Markov Models that represent all proteins of known structure. The library is based on the SCOP classification of proteins: each model corresponds to a SCOP domain and aims to represent an entire superfamily. We have applied the library to predicted proteins from all completely sequenced genomes (currently 154), the Swiss-Prot and TrEMBL databases and other sequence collections. Close to 60% of all proteins have at least one match, and one half of all residues are covered by assignments. All models and full results are available for download and online browsing at http://supfam.org. Users can study the distribution of their superfamily of interest across all completely sequenced genomes, investigate with which other superfamilies it combines and retrieve proteins in which it occurs. Alternatively, concentrating on a particular genome as a whole, it is possible first, to find out its superfamily composition, and secondly, to compare it with that of other genomes to detect superfamilies that are over- or under-represented. In addition, the webserver provides the following standard services: sequence search; keyword search for genomes, superfamilies and sequence identifiers; and multiple alignment of genomic, PDB and custom sequences.
Improved Bond Equations for Fiber-Reinforced Polymer Bars in Concrete
Pour, Sadaf Moallemi; Alam, M. Shahria; Milani, Abbas S.
2016-01-01
This paper explores a set of new equations to predict the bond strength between fiber-reinforced polymer (FRP) rebar and concrete. The proposed equations are based on a comprehensive statistical analysis and existing experimental results in the literature. Namely, the parameters with the greatest effect on the bond behavior of FRP-reinforced concrete were first identified by applying a factorial analysis to a part of the available database. Then the database, which contains 250 pullout tests, was divided into four groups based on the concrete compressive strength and the rebar surface. Afterward, nonlinear regression analysis was performed for each study group in order to determine the bond equations. The results show that the proposed equations can predict bond strengths more accurately than other previously reported models. PMID:28773859
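The calibration step described, fitting the coefficients of a candidate bond-strength equation to pullout test data, can be sketched with a standard nonlinear least-squares routine. The functional form, data points and starting values below are illustrative assumptions, not the paper's equations or database.

    # Nonlinear regression sketch for a bond-strength equation (invented form and data).
    import numpy as np
    from scipy.optimize import curve_fit

    def bond_model(X, a, b, c):
        fc, embed_ratio = X                  # concrete strength, embedment length / bar diameter
        return a * np.sqrt(fc) * embed_ratio ** b + c

    fc = np.array([30, 35, 40, 50, 60, 70], dtype=float)      # MPa
    embed = np.array([5, 5, 10, 10, 15, 15], dtype=float)     # l_d / d_b
    tau = np.array([9.8, 10.6, 8.9, 10.1, 8.2, 9.0])          # measured bond stress, MPa

    params, _ = curve_fit(bond_model, (fc, embed), tau, p0=(1.0, -0.2, 0.0), maxfev=5000)
    print("fitted coefficients a, b, c:", np.round(params, 3))
    print("predicted tau:", np.round(bond_model((fc, embed), *params), 2))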
Feng, Ssj; Sechopoulos, I
2012-06-01
To develop an objective model of the shape of the compressed breast undergoing mammographic or tomosynthesis acquisition. Automated thresholding and edge detection were performed on 984 anonymized digital mammograms (492 craniocaudal (CC) view mammograms and 492 mediolateral oblique (MLO) view mammograms) to extract the edge of each breast. Principal Component Analysis (PCA) was performed on these edge vectors to identify a limited set of parameters and eigenvectors. These parameters and eigenvectors comprise a model that can be used to describe the breast shapes present in acquired mammograms and to generate realistic models of breasts undergoing acquisition. Sample breast shapes were then generated from this model and evaluated. The mammograms in the database were previously acquired for a separate study and authorized for use in further research. The PCA successfully identified two principal components and their corresponding eigenvectors, forming the basis for the breast shape model. The simulated breast shapes generated from the model are reasonable approximations of clinically acquired mammograms. Using PCA, we have obtained models of the compressed breast undergoing mammographic or tomosynthesis acquisition based on objective analysis of a large image database. Up to now, the breast in the CC view has been approximated as a semi-circular tube, while there has been no objectively obtained model for the MLO view breast shape. Such models can be used for various breast imaging research applications, such as x-ray scatter estimation and correction, dosimetry estimates, and computer-aided detection and diagnosis. © 2012 American Association of Physicists in Medicine.
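The modelling approach reduces each breast edge to a small number of PCA coefficients, from which new realistic edges can be synthesized. A minimal sketch of that workflow is given below; the edge vectors are random placeholders rather than mammogram data, and scikit-learn is used purely for illustration.

    # PCA-based shape model sketch: fit components to edge vectors, then synthesize a new edge.
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(3)
    n_images, n_points = 492, 100
    edges = rng.normal(size=(n_images, n_points))   # each row: one flattened edge contour (placeholder)

    pca = PCA(n_components=2)
    coeffs = pca.fit_transform(edges)               # two shape parameters per mammogram
    print("variance explained:", np.round(pca.explained_variance_ratio_, 3))

    # sample new shape parameters from the fitted distribution and reconstruct an edge
    new_coeffs = rng.normal(coeffs.mean(axis=0), coeffs.std(axis=0))
    synthetic_edge = pca.mean_ + new_coeffs @ pca.components_
    print("synthetic edge vector shape:", synthetic_edge.shape)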
Mochida, Keiichi; Uehara-Yamaguchi, Yukiko; Takahashi, Fuminori; Yoshida, Takuhiro; Sakurai, Tetsuya; Shinozaki, Kazuo
2013-01-01
A comprehensive collection of full-length cDNAs is essential for correct structural gene annotation and functional analyses of genes. We constructed a mixed full-length cDNA library from 21 different tissues of Brachypodium distachyon Bd21, and obtained 78,163 high quality expressed sequence tags (ESTs) from both ends of ca. 40,000 clones (including 16,079 contigs). We updated gene structure annotations of Brachypodium genes based on full-length cDNA sequences in comparison with the latest publicly available annotations. About 10,000 non-redundant gene models were supported by full-length cDNAs; ca. 6,000 showed some transcription unit modifications. We also found ca. 580 novel gene models, including 362 newly identified in Bd21. Using the updated transcription start sites, we searched a total of 580 plant cis-motifs in the −3 kb promoter regions and determined a genome-wide Brachypodium promoter architecture. Furthermore, we integrated the Brachypodium full-length cDNAs and updated gene structures with available sequence resources in wheat and barley in a web-accessible database, the RIKEN Brachypodium FL cDNA database. The database represents a “one-stop” information resource for all genomic information in the Pooideae, facilitating functional analysis of genes in this model grass plant and seamless knowledge transfer to the Triticeae crops. PMID:24130698
A Framework for Cloudy Model Optimization and Database Storage
NASA Astrophysics Data System (ADS)
Calvén, Emilia; Helton, Andrew; Sankrit, Ravi
2018-01-01
We present a framework for producing Cloudy photoionization models of the nebular emission from novae ejecta and storing a subset of the results in SQL database format for later usage. The database can be searched for the models best fitting observed spectral line ratios. Additionally, the framework includes an optimization feature that can be used in tandem with the database to search for and improve on models by creating new Cloudy models while varying the parameters. The database search and optimization can be used to explore the structures of nebulae by deriving their properties from the best-fit models. The goal is to provide the community with a large database of Cloudy photoionization models, generated from parameters reflecting conditions within novae ejecta, that can be easily fitted to observed spectral lines, either by directly accessing the database using the framework code or by usage of a website specifically made for this purpose.
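The "search the database for the best-fitting line ratios" idea can be illustrated with a small SQL example: store one row per Cloudy run and order rows by squared error against the observed ratios. The sketch below uses an in-memory SQLite table with invented columns and values; the actual framework targets a larger SQL database of Cloudy outputs.

    # Toy best-fit search over stored model line ratios (invented schema and values).
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE models (id INTEGER, density REAL, ratio_oiii_hb REAL, ratio_nii_ha REAL)")
    conn.executemany("INSERT INTO models VALUES (?, ?, ?, ?)", [
        (1, 1e4, 8.2, 0.30),
        (2, 1e5, 11.5, 0.45),
        (3, 1e6, 6.7, 0.60),
    ])

    obs_oiii_hb, obs_nii_ha = 10.9, 0.40   # observed ratios (placeholders)
    row = conn.execute("""
        SELECT id, density,
               (ratio_oiii_hb - ?) * (ratio_oiii_hb - ?) +
               (ratio_nii_ha - ?) * (ratio_nii_ha - ?) AS sq_err
        FROM models ORDER BY sq_err LIMIT 1
    """, (obs_oiii_hb, obs_oiii_hb, obs_nii_ha, obs_nii_ha)).fetchone()
    print("best-fit model:", row)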
Conceptual and logical level of database modeling
NASA Astrophysics Data System (ADS)
Hunka, Frantisek; Matula, Jiri
2016-06-01
Conceptual and logical levels form the topmost levels of database modeling. Usually, ORM (Object Role Modeling) and ER diagrams are utilized to capture the corresponding schema. The final aim of business process modeling is to store its results in the form of a database solution. For this reason, value-oriented business process modeling, which utilizes ER diagrams to express the modeled entities and the relationships between them, is used. However, ER diagrams form the logical level of the database schema. To extend the possibilities of different business process modeling methodologies, the conceptual level of database modeling is needed. The paper deals with the REA value modeling approach to business process modeling using ER diagrams, and derives a conceptual model utilizing the ORM modeling approach. The conceptual model extends the possibilities for value modeling to other business modeling approaches.
Hawthorne L. Beyer; Jeff Jenness; Samuel A. Cushman
2010-01-01
Spatial information systems (SIS) is a term that describes a wide diversity of concepts, techniques, and technologies related to the capture, management, display and analysis of spatial information. It encompasses technologies such as geographic information systems (GIS), global positioning systems (GPS), remote sensing, and relational database management systems (...
Enabling search over encrypted multimedia databases
NASA Astrophysics Data System (ADS)
Lu, Wenjun; Swaminathan, Ashwin; Varna, Avinash L.; Wu, Min
2009-02-01
Performing information retrieval tasks while preserving data confidentiality is a desirable capability when a database is stored on a server maintained by a third-party service provider. This paper addresses the problem of enabling content-based retrieval over encrypted multimedia databases. Search indexes, along with multimedia documents, are first encrypted by the content owner and then stored onto the server. Through jointly applying cryptographic techniques, such as order preserving encryption and randomized hash functions, with image processing and information retrieval techniques, secure indexing schemes are designed to provide both privacy protection and rank-ordered search capability. Retrieval results on an encrypted color image database and security analysis of the secure indexing schemes under different attack models show that data confidentiality can be preserved while retaining very good retrieval performance. This work has promising applications in secure multimedia management.
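As an illustration of the randomized-hash component of such a secure index (not the authors' actual scheme, which also uses order-preserving encryption and supports rank-ordered retrieval), the sketch below quantizes image features and replaces each bucket with a keyed HMAC, so the server can count matching buckets without learning the underlying feature values. All names and parameters are assumptions.

```python
import hashlib
import hmac

import numpy as np

def secure_index(features, key, n_bins=16):
    """Quantize each feature and replace it with a keyed hash bucket.

    Only a sketch of the randomized-hash idea: the server stores opaque digests,
    yet can still count how many buckets two items share.
    """
    q = np.clip((np.asarray(features) * n_bins).astype(int), 0, n_bins - 1)
    return [hmac.new(key, f"{i}:{v}".encode(), hashlib.sha256).hexdigest()
            for i, v in enumerate(q)]

def match_score(index_a, index_b):
    """Server-side comparison without access to the plaintext features."""
    return sum(a == b for a, b in zip(index_a, index_b))

key = b"content-owner-secret"          # kept by the content owner, not the server
stored = secure_index([0.10, 0.55, 0.90], key)
query = secure_index([0.12, 0.55, 0.88], key)
print(match_score(stored, query))      # higher score = more similar items
```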
Value of shared preclinical safety studies - The eTOX database.
Briggs, Katharine; Barber, Chris; Cases, Montserrat; Marc, Philippe; Steger-Hartmann, Thomas
2015-01-01
A first analysis of a database of shared preclinical safety data for 1214 small molecule drugs and drug candidates, extracted from 3970 reports donated by thirteen pharmaceutical companies for the eTOX project (www.etoxproject.eu), is presented. Species, duration of exposure and administration route data were analysed to assess whether large enough subsets of homogeneous data are available for building in silico predictive models. The prevalence of treatment-related effects for the different types of findings recorded was analysed. The eTOX ontology was used to determine the most common treatment-related clinical chemistry and histopathology findings reported in the database. The data were then mined to evaluate the sensitivity of established in vivo biomarkers for liver toxicity risk assessment. The value of the database to inform other drug development projects during early drug development is illustrated by a case study.
Scheduled Civil Aircraft Emission Inventories for 1999: Database Development and Analysis
NASA Technical Reports Server (NTRS)
Sutkus, Donald J., Jr.; Baughcum, Steven L.; DuBois, Douglas P.
2001-01-01
This report describes the development of a three-dimensional database of aircraft fuel burn and emissions (NO(x), CO, and hydrocarbons) for the scheduled commercial aircraft fleet for each month of 1999. Global totals of emissions and fuel burn for 1999 are compared to global totals from 1992 and 2015 databases. 1999 fuel burn, departure and distance totals for selected airlines are compared to data reported on DOT Form 41 to evaluate the accuracy of the calculations. DOT Form T-100 data were used to determine typical payloads for freighter aircraft and this information was used to model freighter aircraft more accurately by using more realistic payloads. Differences in the calculation methodology used to create the 1999 fuel burn and emissions database from the methodology used in previous work are described and evaluated.
Computational assessment of model-based wave separation using a database of virtual subjects.
Hametner, Bernhard; Schneider, Magdalena; Parragh, Stephanie; Wassertheurer, Siegfried
2017-11-07
The quantification of arterial wave reflection is an important area of interest in arterial pulse wave analysis. It can be achieved by wave separation analysis (WSA) if both the aortic pressure waveform and the aortic flow waveform are known. For better applicability, several mathematical models have been established to estimate aortic flow solely based on pressure waveforms. The aim of this study is to investigate and verify the model-based wave separation of the ARCSolver method on virtual pulse wave measurements. The study is based on an open access virtual database generated via simulations. Seven cardiac and arterial parameters were varied within physiological healthy ranges, leading to a total of 3325 virtual healthy subjects. For assessing the model-based ARCSolver method computationally, this method was used to perform WSA based on the aortic root pressure waveforms of the virtual patients. As a reference, the values of WSA using both the pressure and flow waveforms provided by the virtual database were taken. The investigated parameters showed a good overall agreement between the model-based method and the reference. Mean differences and standard deviations were -0.05 ± 0.02 AU for characteristic impedance, -3.93 ± 1.79 mmHg for forward pressure amplitude, 1.37 ± 1.56 mmHg for backward pressure amplitude and 12.42 ± 4.88% for reflection magnitude. The results indicate that the mathematical blood flow model of the ARCSolver method is a feasible surrogate for a measured flow waveform and provides a reasonable way to assess arterial wave reflection non-invasively in healthy subjects. Copyright © 2017 Elsevier Ltd. All rights reserved.
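For readers unfamiliar with WSA itself, the sketch below applies the standard separation equations P_f = (P + Z_c·Q)/2 and P_b = (P − Z_c·Q)/2 to a pressure/flow pair; it is not the ARCSolver flow model, and the early-systolic estimate of the characteristic impedance Z_c is only one common time-domain approximation. The toy waveforms at the end are invented.

```python
import numpy as np

def wave_separation(pressure, flow):
    """Classical wave separation: P_f = (P + Zc*Q)/2, P_b = (P - Zc*Q)/2."""
    p = np.asarray(pressure, dtype=float)
    p = p - p.min()                               # pulsatile component only
    q = np.asarray(flow, dtype=float)
    upstroke = slice(1, max(3, len(p) // 10))     # early systole, skipping t = 0
    zc = np.median(p[upstroke] / np.maximum(q[upstroke], 1e-9))
    p_fwd = (p + zc * q) / 2.0
    p_bwd = (p - zc * q) / 2.0
    reflection_magnitude = np.ptp(p_bwd) / np.ptp(p_fwd)
    return p_fwd, p_bwd, zc, reflection_magnitude

# Toy waveforms only, to show the call signature.
t = np.linspace(0.0, 1.0, 200)
p = 80 + 40 * np.clip(np.sin(2 * np.pi * t), 0, None)     # pressure (mmHg)
q = 400 * np.clip(np.sin(2 * np.pi * t + 0.2), 0, None)   # flow (ml/s)
print(wave_separation(p, q)[3])                            # reflection magnitude
```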
Optimal tree increment models for the Northeastern United States
Don C. Bragg
2003-01-01
I used the potential relative increment (PRI) methodology to develop optimal tree diameter growth models for the Northeastern United States. Thirty species from the Eastwide Forest Inventory Database yielded 69,676 individuals, which were then reduced to fast-growing subsets for PRI analysis. For instance, only 14 individuals from the greater than 6,300-tree eastern...
Optimal Tree Increment Models for the Northeastern United States
Don C. Bragg
2005-01-01
I used the potential relative increment (PRI) methodology to develop optimal tree diameter growth models for the Northeastern United States. Thirty species from the Eastwide Forest Inventory Database yielded 69,676 individuals, which were then reduced to fast-growing subsets for PRI analysis. For instance, only 14 individuals from the greater than 6,300-tree eastern...
American Association of University Women: Branch Operations Data Modeling Case
ERIC Educational Resources Information Center
Harris, Ranida B.; Wedel, Thomas L.
2015-01-01
A nationally prominent women's advocacy organization is featured in this case study. The scenario may be used as a teaching case, an assignment, or a project in systems analysis and design as well as database design classes. Students are required to document the system operations and requirements, apply logical data modeling concepts, and design…
Information support of monitoring of technical condition of buildings in construction risk area
NASA Astrophysics Data System (ADS)
Skachkova, M. E.; Lepihina, O. Y.; Ignatova, V. V.
2018-05-01
The paper presents the results of research devoted to the development of a model of information support for monitoring the technical condition of buildings located in a construction risk area. As a result of the visual and instrumental survey, as well as the analysis of existing approaches and techniques, attributive and cartographic databases have been created. These databases allow monitoring of defects and damage to buildings located in the 30-meter risk area surrounding the object under construction. The classification of structures and defects of the buildings under survey is presented. The functional capabilities of the developed model and the field of its practical application are determined.
Thomas, Paul D; Kejariwal, Anish; Campbell, Michael J; Mi, Huaiyu; Diemer, Karen; Guo, Nan; Ladunga, Istvan; Ulitsky-Lazareva, Betty; Muruganujan, Anushya; Rabkin, Steven; Vandergriff, Jody A; Doremieux, Olivier
2003-01-01
The PANTHER database was designed for high-throughput analysis of protein sequences. One of the key features is a simplified ontology of protein function, which allows browsing of the database by biological functions. Biologist curators have associated the ontology terms with groups of protein sequences rather than individual sequences. Statistical models (Hidden Markov Models, or HMMs) are built from each of these groups. The advantage of this approach is that new sequences can be automatically classified as they become available. To ensure accurate functional classification, HMMs are constructed not only for families, but also for functionally distinct subfamilies. Multiple sequence alignments and phylogenetic trees, including curator-assigned information, are available for each family. The current version of the PANTHER database includes training sequences from all organisms in the GenBank non-redundant protein database, and the HMMs have been used to classify gene products across the entire genomes of human and Drosophila melanogaster. The ontology terms and protein families and subfamilies, as well as Drosophila gene classifications, can be browsed and searched for free. Due to outstanding contractual obligations, access to human gene classifications and to protein family trees and multiple sequence alignments will temporarily require a nominal registration fee. PANTHER is publicly available on the web at http://panther.celera.com.
Integrating forensic information in a crime intelligence database.
Rossy, Quentin; Ioset, Sylvain; Dessimoz, Damien; Ribaux, Olivier
2013-07-10
Since 2008, intelligence units of six states of the western part of Switzerland have been sharing a common database for the analysis of high volume crimes. On a daily basis, events reported to the police are analysed, filtered and classified to detect crime repetitions and interpret the crime environment. Several forensic outcomes are integrated in the system, such as matches of traces with persons, and links between scenes detected by the comparison of forensic case data. Systematic procedures have been established to integrate links assumed mainly through DNA profiles, shoemark patterns and images. A statistical outlook on a retrospective dataset of series from 2009 to 2011 in the database informs, for instance, on the number of repetitions detected or confirmed and increased by forensic case data. The time needed to obtain forensic intelligence, with regard to the type of marks treated, is seen as a critical issue. Furthermore, the underlying process of integrating forensic intelligence into the crime intelligence database raised several difficulties with regard to the acquisition of data and the models used in the forensic databases. The solutions found and the operational procedures adopted are described and discussed. This process forms the basis for much other research aimed at developing forensic intelligence models. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
NASA Galactic Cosmic Radiation Environment Model: Badhwar - O'Neill (2014)
NASA Technical Reports Server (NTRS)
Golge, S.; O'Neill, P. M.; Slaba, T. C.
2015-01-01
The Badhwar-O'Neill (BON) Galactic Cosmic Ray (GCR) flux model has been used by NASA to certify microelectronic systems and in the analysis of radiation health risks for human space flight missions. Of special interest to NASA is the kinetic energy region below 4.0 GeV/n due to the fact that exposure from GCR behind shielding (e.g., inside a space vehicle) is heavily influenced by the GCR particles from this energy domain. The BON model numerically solves the Fokker-Planck differential equation to account for particle transport in the heliosphere due to diffusion, convection, and adiabatic deceleration under the assumption of a spherically symmetric heliosphere. The model utilizes a comprehensive database of GCR measurements from various particle detectors to determine boundary conditions. By using an updated GCR database and improved model fit parameters, the new BON model (BON14) is significantly improved over the previous BON models for describing the GCR radiation environment of interest to human space flight.
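For reference, the spherically symmetric, steady-state form of the Parker (Fokker-Planck) transport equation commonly used in such models is reproduced below; this is the standard textbook form, and the exact numerical formulation in BON14 may differ in detail.

```latex
\frac{1}{r^{2}}\frac{\partial}{\partial r}\!\left(r^{2}\,\kappa(r,p)\,\frac{\partial f}{\partial r}\right)
\;-\; V_{\mathrm{sw}}\,\frac{\partial f}{\partial r}
\;+\; \frac{1}{3r^{2}}\,\frac{\partial\!\left(r^{2}V_{\mathrm{sw}}\right)}{\partial r}\,
\frac{\partial f}{\partial \ln p} \;=\; 0
```

where f(r, p) is the omnidirectional phase-space density, κ the diffusion coefficient, V_sw the solar-wind speed, r the heliocentric distance, and p the particle momentum.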
NASA Galactic Cosmic Radiation Environment Model: Badhwar-O'Neill (2014)
NASA Technical Reports Server (NTRS)
O'Neill, P. M.; Golge, S.; Slaba, T. C.
2015-01-01
The Badhwar-O'Neill (BON) Galactic Cosmic Ray (GCR) flux model is used by NASA to certify microelectronic systems and in the analysis of radiation health risks for human space flight missions. Of special interest to NASA is the kinetic energy region below 4.0 GeV/n due to the fact that exposure from GCR behind shielding (e.g., inside a space vehicle) is heavily influenced by the GCR particles from this energy domain. The BON model numerically solves the Fokker-Planck differential equation to account for particle transport in the heliosphere due to diffusion, convection, and adiabatic deceleration under the assumption of a spherically symmetric heliosphere. The model utilizes a GCR measurements database from various particle detectors to determine the boundary conditions. By using an updated GCR database and improved model fit parameters, the new BON model (BON14) is significantly improved over the previous BON models for describing the GCR radiation environment of interest to human space flight.
SORTEZ: a relational translator for NCBI's ASN.1 database.
Hart, K W; Searls, D B; Overton, G C
1994-07-01
The National Center for Biotechnology Information (NCBI) has created a database collection that includes several protein and nucleic acid sequence databases, a biosequence-specific subset of MEDLINE, as well as value-added information such as links between similar sequences. Information in the NCBI database is modeled in Abstract Syntax Notation 1 (ASN.1), an Open Systems Interconnection protocol designed for the purpose of exchanging structured data between software applications rather than as a data model for database systems. While the NCBI database is distributed with an easy-to-use information retrieval system, ENTREZ, the ASN.1 data model currently lacks an ad hoc query language for general-purpose data access. For that reason, we have developed a software package, SORTEZ, that transforms the ASN.1 database (or other databases with nested data structures) to a relational data model and subsequently to a relational database management system (Sybase) where information can be accessed through the relational query language, SQL. Because the need to transform data from one data model and schema to another arises naturally in several important contexts, including efficient execution of specific applications, access to multiple databases and adaptation to database evolution, this work also serves as a practical study of the issues involved in the various stages of database transformation. We show that transformation from the ASN.1 data model to a relational data model can be largely automated, but that schema transformation and data conversion require considerable domain expertise and would greatly benefit from additional support tools.
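The sketch below is not SORTEZ itself, but it illustrates the core idea of mapping a nested (ASN.1-like) record into flat parent/child rows linked by foreign keys; the example record, table names, and the assumption that list items are themselves nested records are all invented for illustration.

```python
def flatten(record, table, parent_id=None, tables=None, counters=None):
    """Flatten a nested dict/list structure into flat relational rows.

    Scalars become columns; nested dicts and lists of dicts become child tables
    with a foreign key back to the parent row. A real ASN.1 transformation (as
    in SORTEZ) would also need schema and type mapping.
    """
    tables = tables if tables is not None else {}
    counters = counters if counters is not None else {}
    row_id = counters[table] = counters.get(table, 0) + 1
    row = {"id": row_id, "parent_id": parent_id}
    for key, value in record.items():
        if isinstance(value, dict):
            flatten(value, f"{table}_{key}", row_id, tables, counters)
        elif isinstance(value, list):
            for item in value:                      # assumes items are dicts
                flatten(item, f"{table}_{key}", row_id, tables, counters)
        else:
            row[key] = value
    tables.setdefault(table, []).append(row)
    return tables

seq = {"accession": "U00001", "title": "example",
       "features": [{"kind": "CDS", "start": 10, "stop": 400}]}
for name, rows in flatten(seq, "bioseq").items():
    print(name, rows)
```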
Evolution of grid-wide access to database resident information in ATLAS using Frontier
NASA Astrophysics Data System (ADS)
Barberis, D.; Bujor, F.; de Stefano, J.; Dewhurst, A. L.; Dykstra, D.; Front, D.; Gallas, E.; Gamboa, C. F.; Luehring, F.; Walker, R.
2012-12-01
The ATLAS experiment deployed Frontier technology worldwide during the initial year of LHC collision data taking to enable user analysis jobs running on the Worldwide LHC Computing Grid to access database-resident data. Since that time, the deployment model has evolved to optimize resources, improve performance, and streamline maintenance of Frontier and related infrastructure. In this presentation we focus on the specific changes and improvements undertaken in the deployment, such as the optimization of cache and launchpad location, the use of RPMs for more uniform deployment of the underlying Frontier-related components, improvements in monitoring, optimization of fail-over, and an increasing use of a centrally managed database containing site-specific information (for configuration of services and monitoring). In addition, analysis of Frontier logs has allowed us a deeper understanding of problematic queries and of use cases. Use of the system has grown beyond user analysis and subsystem-specific tasks such as calibration and alignment, extending into production processing areas such as initial reconstruction and trigger reprocessing. With a more robust and tuned system, we are better equipped to satisfy the still-growing number of diverse clients and the demands of increasingly sophisticated processing and analysis.
State analysis requirements database for engineering complex embedded systems
NASA Technical Reports Server (NTRS)
Bennett, Matthew B.; Rasmussen, Robert D.; Ingham, Michel D.
2004-01-01
It has become clear that spacecraft system complexity is reaching a threshold where customary methods of control are no longer affordable or sufficiently reliable. At the heart of this problem are the conventional approaches to systems and software engineering based on subsystem-level functional decomposition, which fail to scale in the tangled web of interactions typically encountered in complex spacecraft designs. Furthermore, there is a fundamental gap between the requirements on software specified by systems engineers and the implementation of these requirements by software engineers. Software engineers must perform the translation of requirements into software code, hoping to accurately capture the systems engineer's understanding of the system behavior, which is not always explicitly specified. This gap opens up the possibility for misinterpretation of the systems engineer's intent, potentially leading to software errors. This problem is addressed by a systems engineering tool called the State Analysis Database, which provides a means of capturing system and software requirements in the form of explicit models. This paper describes how requirements for complex aerospace systems can be developed using the State Analysis Database.
A scattering database of marine particles and its application in optical analysis
NASA Astrophysics Data System (ADS)
Xu, G.; Yang, P.; Kattawar, G.; Zhang, X.
2016-12-01
In modeling the scattering properties of marine particles (e.g. phytoplankton), laboratory studies imply a need to properly account for the influence of particle morphology, in addition to size and composition. In this study, a marine particle scattering database is constructed using a collection of distorted hexahedral shapes. Specifically, the scattering properties of each size bin and refractive index are obtained by the ensemble average associated with distorted hexahedra with randomly tilted facets and selected aspect ratios (from elongated to flattened). The degree of randomness in the shape-generation process defines the geometric irregularity of the particles in the group. The geometric irregularity and particle aspect ratios constitute a set of "shape factors" to be accounted for (e.g. in best-fit analysis). To cover most of the marine particle size range, we combine the Invariant Imbedding T-matrix (II-TM) method and the Physical-Geometric Optics Hybrid (PGOH) method in the calculations. The simulated optical properties are shown and compared with those obtained from Lorenz-Mie theory. Using the scattering database, we present a preliminary optical analysis of laboratory-measured optical properties of marine particles.
A Model Based Mars Climate Database for the Mission Design
NASA Technical Reports Server (NTRS)
2005-01-01
A viewgraph presentation on a model-based climate database is shown. The topics include: 1) Why a model-based climate database?; 2) Mars Climate Database v3.1: Who uses it? (approx. 60 users!); 3) The new Mars Climate Database MCD v4.0; 4) MCD v4.0: what's new?; 5) Simulation of water ice clouds; 6) Simulation of the water ice cycle; 7) A new tool for surface pressure prediction; 8) Access to the database MCD 4.0; 9) How to access the database; and 10) New web access.
Databases for multilevel biophysiology research available at Physiome.jp.
Asai, Yoshiyuki; Abe, Takeshi; Li, Li; Oka, Hideki; Nomura, Taishin; Kitano, Hiroaki
2015-01-01
Physiome.jp (http://physiome.jp) is a portal site inaugurated in 2007 to support model-based research in physiome and systems biology. At Physiome.jp, several tools and databases are available to support construction of physiological, multi-hierarchical, large-scale models. There are three databases in Physiome.jp, housing mathematical models, morphological data, and time-series data. In late 2013, the site was fully renovated, and in May 2015, new functions were implemented to provide information infrastructure to support collaborative activities for developing models and performing simulations within the database framework. This article describes updates to the databases implemented since 2013, including cooperation among the three databases, interactive model browsing, user management, version management of models, management of parameter sets, and interoperability with applications.
[Construction and application of special analysis database of geoherbs based on 3S technology].
Guo, Lan-ping; Huang, Lu-qi; Lv, Dong-mei; Shao, Ai-juan; Wang, Jian
2007-09-01
In this paper, the structure, data sources and data codes of "the spatial analysis database of geoherbs" based on 3S technology are introduced, and the essential functions of the database, such as data management, remote sensing, spatial interpolation, spatial statistics, spatial analysis and development, are described. Finally, two examples of database usage are given: one is the classification and calculation of the NDVI index from remote sensing images in the geoherbal area of Atractylodes lancea, and the other is an adaptation analysis of A. lancea. These indicate that "the spatial analysis database of geoherbs" has bright prospects in the spatial analysis of geoherbs.
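One of the cited usage examples is NDVI calculation from remote-sensing imagery; as a reminder of that computation (NDVI = (NIR − Red) / (NIR + Red)), a minimal sketch with made-up band values follows.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / np.maximum(nir + red, 1e-9)

# Toy 2x2 bands standing in for the remote-sensing image of the study area.
nir_band = np.array([[0.50, 0.42], [0.61, 0.30]])
red_band = np.array([[0.10, 0.12], [0.08, 0.25]])
print(ndvi(nir_band, red_band))
```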
Turi, Christina E; Murch, Susan J
2013-07-09
Ethnobotanical research and the study of plants used for rituals, ceremonies and to connect with the spirit world have led to the discovery of many novel psychoactive compounds such as nicotine, caffeine, and cocaine. In North America, spiritual and ceremonial uses of plants are well documented and can be accessed online via the University of Michigan's Native American Ethnobotany Database. The objective of the study was to compare Residual, Bayesian, Binomial and Imprecise Dirichlet Model (IDM) analyses of ritual, ceremonial and spiritual plants in Moerman's ethnobotanical database and to identify genera that may be good candidates for the discovery of novel psychoactive compounds. The database was queried with the following format "Family Name AND Ceremonial OR Spiritual" for 263 North American botanical families. Spiritual and ceremonial flora consisted of 86 families with 517 species belonging to 292 genera. Spiritual taxa were then grouped further into ceremonial medicines and items categories. Residual, Bayesian, Binomial and IDM analyses were performed to identify over- and under-utilized families. The 4 statistical approaches were in good agreement when identifying under-utilized families, but large families (>393 species) were underemphasized by the Binomial, Bayesian and IDM approaches for over-utilization. Residual, Binomial, and IDM analyses identified similar families as over-utilized in the medium (92-392 species) and small (<92 species) classes. The families Apiaceae, Asteraceae, Ericaceae, Pinaceae and Salicaceae were identified as significantly over-utilized as ceremonial medicines in medium- and large-sized families. Analysis of genera within the Apiaceae and Asteraceae suggests that the genera Ligusticum and Artemisia are good candidates for facilitating the discovery of novel psychoactive compounds. The 4 statistical approaches were not consistent in the selection of over-utilization of flora. Residual analysis revealed overall trends that were supported by Binomial analysis when separated into small, medium and large families. The Bayesian, Binomial and IDM approaches identified different genera as potentially important. Species belonging to the genera Artemisia and Ligusticum were most consistently identified and may be valuable in future ethnopharmacological studies. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
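To make the regression-residual idea concrete, the sketch below regresses ceremonial-use counts on total species counts per family and flags positive residuals as over-utilized; the family counts are invented and are not the values from Moerman's database.

```python
import numpy as np

# Made-up counts per botanical family: (total species, ceremonial/spiritual species).
families = {
    "Asteraceae": (393, 60), "Apiaceae": (120, 22), "Pinaceae": (35, 12),
    "Poaceae": (380, 10), "Cyperaceae": (220, 4),
}
names = list(families)
totals = np.array([families[f][0] for f in names], dtype=float)
used = np.array([families[f][1] for f in names], dtype=float)

# Ordinary least-squares line: expected ceremonial use given family size.
slope, intercept = np.polyfit(totals, used, 1)
residuals = used - (slope * totals + intercept)

for name, r in sorted(zip(names, residuals), key=lambda x: -x[1]):
    label = "over-utilized" if r > 0 else "under-utilized"
    print(f"{name:12s} residual={r:+6.1f} ({label})")
```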
Internet-based distributed collaborative environment for engineering education and design
NASA Astrophysics Data System (ADS)
Sun, Qiuli
2001-07-01
This research investigates the use of the Internet for engineering education, design, and analysis through the presentation of a Virtual City environment. The main focus of this research was to provide an infrastructure for engineering education, test the concept of distributed collaborative design and analysis, develop and implement the Virtual City environment, and assess the environment's effectiveness in the real world. A three-tier architecture was adopted in the development of the prototype, which contains an online database server, a Web server as well as multi-user servers, and client browsers. The environment is composed of five components: a 3D virtual world, multiple Internet-based multimedia modules, an online database, a collaborative geometric modeling module, and a collaborative analysis module. The environment was designed using multiple Internet-based technologies, such as Shockwave, Java, Java 3D, VRML, Perl, ASP, SQL, and a database. These various technologies together formed the basis of the environment and were programmed to communicate smoothly with each other. Three assessments were conducted over a period of three semesters. The Virtual City is open to the public at www.vcity.ou.edu. The online database was designed to manage the changeable data related to the environment. The virtual world was used to implement 3D visualization and tie the multimedia modules together. Students are allowed to build segments of the 3D virtual world upon completion of appropriate undergraduate courses in civil engineering. The end result is a complete virtual world that contains designs from all of their coursework and is viewable on the Internet. The environment is a content-rich educational system, which can be used to teach multiple engineering topics with the help of 3D visualization, animations, and simulations. The concept of collaborative design and analysis using the Internet was investigated and implemented. Geographically dispersed users can build the same geometric model simultaneously over the Internet and communicate with each other through a chat room. They can also conduct finite element analysis collaboratively on the same object over the Internet. They can mesh the same object, apply and edit the same boundary conditions and forces, obtain the same analysis results, and then discuss the results through the Internet.
Weckwerth, Wolfram; Wienkoop, Stefanie; Hoehenwarter, Wolfgang; Egelhofer, Volker; Sun, Xiaoliang
2014-01-01
Genome sequencing and systems biology are revolutionizing life sciences. Proteomics emerged as a fundamental technique of this novel research area as it is the basis for gene function analysis and modeling of dynamic protein networks. Here a complete proteomics platform suited for functional genomics and systems biology is presented. The strategy includes MAPA (mass accuracy precursor alignment; http://www.univie.ac.at/mosys/software.html ) as a rapid exploratory analysis step; MASS WESTERN for targeted proteomics; COVAIN ( http://www.univie.ac.at/mosys/software.html ) for multivariate statistical analysis, data integration, and data mining; and PROMEX ( http://www.univie.ac.at/mosys/databases.html ) as a database module for proteogenomics and proteotypic peptides for targeted analysis. Moreover, the presented platform can also be utilized to integrate metabolomics and transcriptomics data for the analysis of metabolite-protein-transcript correlations and time course analysis using COVAIN. Examples for the integration of MAPA and MASS WESTERN data, proteogenomic and metabolic modeling approaches for functional genomics, phosphoproteomics by integration of MOAC (metal-oxide affinity chromatography) with MAPA, and the integration of metabolomics, transcriptomics, proteomics, and physiological data using this platform are presented. All software and step-by-step tutorials for data processing and data mining can be downloaded from http://www.univie.ac.at/mosys/software.html.
FDA toxicity databases and real-time data entry
DOE Office of Scientific and Technical Information (OSTI.GOV)
Arvidson, Kirk B.
Structure-searchable electronic databases are valuable new tools that are assisting the FDA in its mission to promptly and efficiently review incoming submissions for regulatory approval of new food additives and food contact substances. The Center for Food Safety and Applied Nutrition's Office of Food Additive Safety (CFSAN/OFAS), in collaboration with Leadscope, Inc., is consolidating genetic toxicity data submitted in food additive petitions from the 1960s to the present day. The Center for Drug Evaluation and Research, Office of Pharmaceutical Science's Informatics and Computational Safety Analysis Staff (CDER/OPS/ICSAS) is separately gathering similar information from their submissions. Presently, these data are distributed in various locations such as paper files, microfiche, and non-standardized toxicology memoranda. The organization of the data into a consistent, searchable format will reduce paperwork, expedite the toxicology review process, and provide valuable information to industry that is currently available only to the FDA. Furthermore, by combining chemical structures with genetic toxicity information, biologically active moieties can be identified and used to develop quantitative structure-activity relationship (QSAR) modeling and testing guidelines. Additionally, chemicals devoid of toxicity data can be compared to known structures, allowing for improved safety review through the identification and analysis of structural analogs. Four database frameworks have been created: bacterial mutagenesis, in vitro chromosome aberration, in vitro mammalian mutagenesis, and in vivo micronucleus. Controlled vocabularies for these databases have been established. The four separate genetic toxicity databases are compiled into a single, structurally-searchable database for easy accessibility of the toxicity information. Beyond the genetic toxicity databases described here, additional databases for subchronic, chronic, and teratogenicity studies have been prepared.
dbWFA: a web-based database for functional annotation of Triticum aestivum transcripts
Vincent, Jonathan; Dai, Zhanwu; Ravel, Catherine; Choulet, Frédéric; Mouzeyar, Said; Bouzidi, M. Fouad; Agier, Marie; Martre, Pierre
2013-01-01
The functional annotation of genes based on sequence homology with genes from model species genomes is time-consuming because it is necessary to mine several unrelated databases. The aim of the present work was to develop a functional annotation database for common wheat Triticum aestivum (L.). The database, named dbWFA, is based on the reference NCBI UniGene set, an expressed gene catalogue built by expressed sequence tag clustering, and on full-length coding sequences retrieved from the TriFLDB database. Information from good-quality heterogeneous sources, including annotations for model plant species Arabidopsis thaliana (L.) Heynh. and Oryza sativa L., was gathered and linked to T. aestivum sequences through BLAST-based homology searches. Even though the complexity of the transcriptome cannot yet be fully appreciated, we developed a tool to easily and promptly obtain information from multiple functional annotation systems (Gene Ontology, MapMan bin codes, MIPS Functional Categories, PlantCyc pathway reactions and TAIR gene families). The use of dbWFA is illustrated here with several query examples. We were able to assign a putative function to 45% of the UniGenes and 81% of the full-length coding sequences from TriFLDB. Moreover, comparison of the annotation of the whole T. aestivum UniGene set along with curated annotations of the two model species assessed the accuracy of the annotation provided by dbWFA. To further illustrate the use of dbWFA, genes specifically expressed during the early cell division or late storage polymer accumulation phases of T. aestivum grain development were identified using a clustering analysis and then annotated using dbWFA. The annotation of these two sets of genes was consistent with previous analyses of T. aestivum grain transcriptomes and proteomes. Database URL: urgi.versailles.inra.fr/dbWFA/ PMID:23660284
NASA Technical Reports Server (NTRS)
Applebaum, Michael P.; Hall, Leslie, H.; Eppard, William M.; Purinton, David C.; Campbell, John R.; Blevins, John A.
2015-01-01
This paper describes the development, testing, and utilization of an aerodynamic force and moment database for the Space Launch System (SLS) Service Module (SM) panel jettison event. The database is a combination of inviscid Computational Fluid Dynamic (CFD) data and MATLAB code written to query the data at input values of vehicle/SM panel parameters and return the aerodynamic force and moment coefficients of the panels as they are jettisoned from the vehicle. The database encompasses over 5000 CFD simulations with the panels either in the initial stages of separation where they are hinged to the vehicle, in close proximity to the vehicle, or far enough from the vehicle that body interference effects are neglected. A series of viscous CFD check cases were performed to assess the accuracy of the Euler solutions for this class of problem and good agreement was obtained. The ultimate goal of the panel jettison database was to create a tool that could be coupled with any 6-Degree-Of-Freedom (DOF) dynamics model to rapidly predict SM panel separation from the SLS vehicle in a quasi-unsteady manner. Results are presented for panel jettison simulations that utilize the database at various SLS flight conditions. These results compare favorably to an approach that directly couples a 6-DOF model with the Cart3D Euler flow solver and obtains solutions for the panels at exact locations. This paper demonstrates a method of using inviscid CFD simulations coupled with a 6-DOF model that provides adequate fidelity to capture the physics of this complex multiple moving-body panel separation event.
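The sketch below illustrates the quasi-unsteady coupling pattern described here, with a small interpolated coefficient table standing in for the jettison database and a crude explicit integration standing in for the 6-DOF model; the table values, parameters, and reduced dimensionality are illustrative only, not the SLS database.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Illustrative 2-D table of a panel normal-force coefficient versus Mach number
# and separation distance (in panel lengths); the real database spans many more
# parameters and thousands of CFD cases.
mach_grid = np.array([1.5, 2.0, 3.0])
sep_grid = np.array([0.0, 0.5, 2.0])
cn_table = np.array([[1.2, 1.0, 0.8],
                     [1.1, 0.9, 0.7],
                     [1.0, 0.8, 0.6]])
cn_lookup = RegularGridInterpolator((mach_grid, sep_grid), cn_table)

def step_panel(pos, vel, mach, qbar, area, mass, dt):
    """One quasi-unsteady step: query the aero table, then advance the panel state."""
    s = min(pos, sep_grid[-1])              # beyond the far-field distance, hold the edge value
    cn = float(cn_lookup([[mach, s]])[0])
    accel = qbar * area * cn / mass         # force normal to the vehicle surface
    return pos + vel * dt, vel + accel * dt

pos, vel = 0.0, 0.0
for _ in range(200):
    pos, vel = step_panel(pos, vel, mach=2.2, qbar=30e3, area=5.0, mass=200.0, dt=1e-3)
print(pos, vel)
```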
Database resources of the National Center for Biotechnology Information
Acland, Abigail; Agarwala, Richa; Barrett, Tanya; Beck, Jeff; Benson, Dennis A.; Bollin, Colleen; Bolton, Evan; Bryant, Stephen H.; Canese, Kathi; Church, Deanna M.; Clark, Karen; DiCuccio, Michael; Dondoshansky, Ilya; Federhen, Scott; Feolo, Michael; Geer, Lewis Y.; Gorelenkov, Viatcheslav; Hoeppner, Marilu; Johnson, Mark; Kelly, Christopher; Khotomlianski, Viatcheslav; Kimchi, Avi; Kimelman, Michael; Kitts, Paul; Krasnov, Sergey; Kuznetsov, Anatoliy; Landsman, David; Lipman, David J.; Lu, Zhiyong; Madden, Thomas L.; Madej, Tom; Maglott, Donna R.; Marchler-Bauer, Aron; Karsch-Mizrachi, Ilene; Murphy, Terence; Ostell, James; O'Sullivan, Christopher; Panchenko, Anna; Phan, Lon; Pruitt, Don Preussm Kim D.; Rubinstein, Wendy; Sayers, Eric W.; Schneider, Valerie; Schuler, Gregory D.; Sequeira, Edwin; Sherry, Stephen T.; Shumway, Martin; Sirotkin, Karl; Siyan, Karanjit; Slotta, Douglas; Soboleva, Alexandra; Soussov, Vladimir; Starchenko, Grigory; Tatusova, Tatiana A.; Trawick, Bart W.; Vakatov, Denis; Wang, Yanli; Ward, Minghong; John Wilbur, W.; Yaschenko, Eugene; Zbicz, Kerry
2014-01-01
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, PubReader, Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link, Primer-BLAST, COBALT, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, the Genetic Testing Registry, Genome and related tools, the Map Viewer, Trace Archive, Sequence Read Archive, BioProject, BioSample, ClinVar, MedGen, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Probe, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool, Biosystems, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All these resources can be accessed through the NCBI home page. PMID:24259429
NASA Astrophysics Data System (ADS)
Knosp, B.; Gangl, M.; Hristova-Veleva, S. M.; Kim, R. M.; Li, P.; Turk, J.; Vu, Q. A.
2015-12-01
The JPL Tropical Cyclone Information System (TCIS) brings together satellite, aircraft, and model forecast data from several NASA, NOAA, and other data centers to assist researchers in comparing and analyzing data and model forecasts related to tropical cyclones. Since 2010, the TCIS has run a near-real-time (NRT) data portal during the North Atlantic hurricane season, which typically runs from June through October each year. Data collected by the TCIS varies by type, format, contents, and frequency and is served to the user in two ways: (1) as image overlays on a virtual globe and (2) as derived output from a suite of analysis tools. In order to support these two functions, the data must be collected and then made searchable by criteria such as date, mission, product, pressure level, and geospatial region. Creating a database architecture that is flexible enough to manage, intelligently interrogate, and ultimately present this disparate data to the user in a meaningful way has been the primary challenge. The database solution for the TCIS has been to use a hybrid MySQL + Solr implementation. After testing other relational database and NoSQL solutions, such as PostgreSQL and MongoDB respectively, this solution has given the TCIS the best offerings in terms of query speed and result reliability. This database solution also supports the challenging (and memory-overwhelming) geospatial queries that are necessary to support analysis tools requested by users. Though hardly new technologies on their own, our implementation of MySQL + Solr had to be customized and tuned to be able to accurately store, index, and search the TCIS data holdings. In this presentation, we will discuss how we arrived at our MySQL + Solr database architecture, why it offers us the most consistent, fast, and reliable results, and how it supports our front end so that we can offer users a look into our "big data" holdings.
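A hedged sketch of the hybrid pattern described here follows: Solr answers the spatio-temporal facet query and the relational side (sqlite3 standing in for MySQL) returns the full per-granule metadata. The Solr core name, field names, and table schema are hypothetical, not the actual TCIS configuration.

```python
import sqlite3

import requests

SOLR_URL = "http://localhost:8983/solr/tcis/select"      # hypothetical Solr core

def find_granules(product, start, end, bbox):
    """Spatio-temporal facet search in Solr, then a metadata join on the SQL side."""
    params = {
        "q": f"product:{product}",
        "fq": [f"obs_time:[{start} TO {end}]",            # repeated filter queries
               f"lat:[{bbox[0]} TO {bbox[2]}]",
               f"lon:[{bbox[1]} TO {bbox[3]}]"],
        "fl": "granule_id",
        "rows": 500,
        "wt": "json",
    }
    docs = requests.get(SOLR_URL, params=params, timeout=30).json()["response"]["docs"]
    ids = [d["granule_id"] for d in docs]
    if not ids:
        return []

    # sqlite3 stands in for MySQL; this table holds the remaining per-granule metadata.
    con = sqlite3.connect("tcis_metadata.db")
    marks = ",".join("?" * len(ids))
    rows = con.execute(
        "SELECT granule_id, mission, pressure_level, file_path "
        f"FROM granules WHERE granule_id IN ({marks})", ids).fetchall()
    con.close()
    return rows
```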
Assessment of CFD-based Response Surface Model for Ares I Supersonic Ascent Aerodynamics
NASA Technical Reports Server (NTRS)
Hanke, Jeremy L.
2011-01-01
The Ascent Force and Moment Aerodynamic (AFMA) Databases (DBs) for the Ares I Crew Launch Vehicle (CLV) were typically based on wind tunnel (WT) data, with increments provided by computational fluid dynamics (CFD) simulations for aspects of the vehicle that could not be tested in the WT tests. During the Design Analysis Cycle 3 analysis for the outer mold line (OML) geometry designated A106, a major tunnel mishap delayed the WT test for supersonic Mach numbers (M) greater than 1.6 in the Unitary Plan Wind Tunnel at NASA Langley Research Center, and the test delay pushed the final delivery of the A106 AFMA DB back by several months. The aero team developed an interim database based entirely on the already completed CFD simulations to mitigate the impact of the delay. This CFD-based database used a response surface methodology based on radial basis functions to predict the aerodynamic coefficients for M > 1.6 based on only the CFD data from both WT and flight Reynolds number conditions. The aero team used extensive knowledge of the previous AFMA DB for the A103 OML to guide the development of the CFD-based A106 AFMA DB. This report details the development of the CFD-based A106 Supersonic AFMA DB, constructs a prediction of the database uncertainty using data available at the time of development, and assesses the overall quality of the CFD-based DB both qualitatively and quantitatively. This assessment confirms that a reasonable aerodynamic database can be constructed for launch vehicles at supersonic conditions using only CFD data if sufficient knowledge of the physics and expected behavior is available. This report also demonstrates the applicability of non-parametric response surface modeling using radial basis functions for development of aerodynamic databases that exhibit both linear and non-linear behavior throughout a large data space.
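As an illustration of a radial-basis-function response surface (not the actual A106 database or its basis-function choice), the sketch below fits a Gaussian RBF through a handful of invented (Mach, alpha) points and evaluates it at an untested condition.

```python
import numpy as np

def fit_rbf(points, values, eps=1.0):
    """Fit a Gaussian radial-basis-function surface through scattered CFD points."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    phi = np.exp(-(eps * d) ** 2)
    weights = np.linalg.solve(phi, values)
    return lambda x: np.exp(-(eps * np.linalg.norm(points - x, axis=-1)) ** 2) @ weights

# Illustrative (Mach, alpha) training points and a made-up axial-force coefficient.
pts = np.array([[1.8, 0.0], [1.8, 4.0], [2.5, 0.0], [2.5, 4.0], [3.5, 2.0]])
ca = np.array([0.52, 0.55, 0.48, 0.50, 0.44])
surface = fit_rbf(pts, ca, eps=0.8)
print(surface(np.array([2.2, 2.0])))      # predicted CA at an untested condition
```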
Food Composition Database Format and Structure: A User Focused Approach
Clancy, Annabel K.; Woods, Kaitlyn; McMahon, Anne; Probst, Yasmine
2015-01-01
This study aimed to investigate the needs of Australian food composition database user’s regarding database format and relate this to the format of databases available globally. Three semi structured synchronous online focus groups (M = 3, F = 11) and n = 6 female key informant interviews were recorded. Beliefs surrounding the use, training, understanding, benefits and limitations of food composition data and databases were explored. Verbatim transcriptions underwent preliminary coding followed by thematic analysis with NVivo qualitative analysis software to extract the final themes. Schematic analysis was applied to the final themes related to database format. Desktop analysis also examined the format of six key globally available databases. 24 dominant themes were established, of which five related to format; database use, food classification, framework, accessibility and availability, and data derivation. Desktop analysis revealed that food classification systems varied considerably between databases. Microsoft Excel was a common file format used in all databases, and available software varied between countries. User’s also recognised that food composition databases format should ideally be designed specifically for the intended use, have a user-friendly food classification system, incorporate accurate data with clear explanation of data derivation and feature user input. However, such databases are limited by data availability and resources. Further exploration of data sharing options should be considered. Furthermore, user’s understanding of food composition data and databases limitations is inherent to the correct application of non-specific databases. Therefore, further exploration of user FCDB training should also be considered. PMID:26554836
NASA Technical Reports Server (NTRS)
Maluf, David A. (Inventor); Bell, David G. (Inventor); Gurram, Mohana M. (Inventor); Gawdiak, Yuri O. (Inventor)
2009-01-01
A system for managing a project that includes multiple tasks and a plurality of workers. Input information includes characterizations based upon a human model, a team model and a product model. Periodic reports, such as a monthly report, a task plan report, a budget report and a risk management report, are generated and made available for display or further analysis. An extensible database allows searching for information based upon context and upon content.
Voss, Erica A; Makadia, Rupa; Matcho, Amy; Ma, Qianli; Knoll, Chris; Schuemie, Martijn; DeFalco, Frank J; Londhe, Ajit; Zhu, Vivienne; Ryan, Patrick B
2015-05-01
To evaluate the utility of applying the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) across multiple observational databases within an organization and to apply standardized analytics tools for conducting observational research. Six deidentified patient-level datasets were transformed to the OMOP CDM. We evaluated the extent of information loss that occurred through the standardization process. We developed a standardized analytic tool to replicate the cohort construction process from a published epidemiology protocol and applied the analysis to all 6 databases to assess time-to-execution and comparability of results. Transformation to the CDM resulted in minimal information loss across all 6 databases. Patients and observations were excluded due to identified data quality issues in the source system; 96% to 99% of condition records and 90% to 99% of drug records were successfully mapped into the CDM using the standard vocabulary. The full cohort replication and descriptive baseline summary was executed for 2 cohorts in 6 databases in less than 1 hour. The standardization process improved data quality, increased efficiency, and facilitated cross-database comparisons to support a more systematic approach to observational research. Comparisons across data sources showed consistency in the impact of the protocol's inclusion criteria, and identified differences in patient characteristics and coding practices across databases. Standardizing data structure (through a CDM), content (through a standard vocabulary with source code mappings), and analytics can enable an institution to apply a network-based approach to observational research across multiple, disparate observational health databases. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association.
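To illustrate why a common data model enables this kind of cross-database work, the sketch below runs one cohort-count query, written against standard OMOP CDM tables (person, condition_occurrence), unchanged across several databases; the concept id, date range, file names, and the sqlite3 stand-in are illustrative assumptions, not the study's actual analytic tool.

```python
import sqlite3

# Once every source database is in the OMOP CDM, the same cohort SQL can run
# against each of them unchanged (sqlite3 stands in for the real RDBMS here).
COHORT_SQL = """
SELECT COUNT(DISTINCT co.person_id)
FROM condition_occurrence co
JOIN person p ON p.person_id = co.person_id
WHERE co.condition_concept_id = ?          -- a standard condition concept
  AND co.condition_start_date BETWEEN ? AND ?
"""

def cohort_size(db_path, concept_id, start, end):
    con = sqlite3.connect(db_path)
    (n,) = con.execute(COHORT_SQL, (concept_id, start, end)).fetchone()
    con.close()
    return n

for db in ["claims_a.db", "ehr_b.db"]:     # hypothetical CDM instances
    print(db, cohort_size(db, concept_id=201826, start="2010-01-01", end="2014-12-31"))
```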
VerSeDa: vertebrate secretome database
Cortazar, Ana R.; Oguiza, José A.
2017-01-01
Based on the current tools, de novo secretome (full set of proteins secreted by an organism) prediction is a time consuming bioinformatic task that requires a multifactorial analysis in order to obtain reliable in silico predictions. Hence, to accelerate this process and offer researchers a reliable repository where secretome information can be obtained for vertebrates and model organisms, we have developed VerSeDa (Vertebrate Secretome Database). This freely available database stores information about proteins that are predicted to be secreted through the classical and non-classical mechanisms, for the wide range of vertebrate species deposited at the NCBI, UCSC and ENSEMBL sites. To our knowledge, VerSeDa is the only state-of-the-art database designed to store secretome data from multiple vertebrate genomes, thus, saving an important amount of time spent in the prediction of protein features that can be retrieved from this repository directly. Database URL: VerSeDa is freely available at http://genomics.cicbiogune.es/VerSeDa/index.php PMID:28365718
Solomon, Nancy Pearl; Dietsch, Angela M; Dietrich-Burns, Katie E; Styrmisdottir, Edda L; Armao, Christopher S
2016-05-01
This report describes the development and preliminary analysis of a database for traumatically injured military service members with dysphagia. A multidimensional database was developed to capture clinical variables related to swallowing. Data were derived from clinical records and instrumental swallow studies, and ranged from demographics, injury characteristics, swallowing biomechanics, medications, and standardized tools (e.g., Glasgow Coma Scale, Penetration-Aspiration Scale). Bayesian Belief Network modeling was used to analyze the data at intermediate points, guide data collection, and predict outcomes. Predictive models were validated with independent data via receiver operating characteristic curves. The first iteration of the model (n = 48) revealed variables that could be collapsed for the second model (n = 96). The ability to predict recovery from dysphagia improved from the second to third models (area under the curve = 0.68 to 0.86). The third model, based on 161 cases, revealed "initial diet restrictions" as first-degree, and "Glasgow Coma Scale, intubation history, and diet change" as second-degree associates for diet restrictions at discharge. This project demonstrates the potential for bioinformatics to advance understanding of dysphagia. This database in concert with Bayesian Belief Network modeling makes it possible to explore predictive relationships between injuries and swallowing function, individual variability in recovery, and appropriate treatment options. Reprint & Copyright © 2016 Association of Military Surgeons of the U.S.
NASA Astrophysics Data System (ADS)
Heffernan, Julieanne; Biedermann, Eric; Mayes, Alexander; Livings, Richard; Jauriqui, Leanne; Goodlet, Brent; Aldrin, John C.; Mazdiyasni, Siamack
2018-04-01
Process Compensated Resonant Testing (PCRT) is a full-body nondestructive testing (NDT) method that measures the resonance frequencies of a part and correlates them to the part's material and/or damage state. PCRT testing is used in the automotive, aerospace, and power generation industries via automated PASS/FAIL inspections to distinguish parts with nominal process variation from those with the defect(s) of interest. Traditional PCRT tests are created through the statistical analysis of populations of "good" and "bad" parts. However, gathering a statistically significant number of parts can be costly and time-consuming, and the availability of defective parts may be limited. This work uses virtual databases of good and bad parts to create two targeted PCRT inspections for single crystal (SX) nickel-based superalloy turbine blades. Using finite element (FE) models, populations were modeled to include variations in geometric dimensions, material properties, crystallographic orientation, and creep damage. Model results were verified by comparing the frequency variation in the modeled populations with the measured frequency variations of several physical blade populations. Additionally, creep modeling results were verified through the experimental evaluation of coupon geometries. A virtual database of resonance spectra was created from the model data. The virtual database was used to create PCRT inspections to detect crystallographic defects and creep strain. Quantification of creep strain values using the PCRT inspection results was also demonstrated.
McCabe, Simon; Arndt, Jamie; Goldenberg, Jamie L; Vess, Matthew; Vail, Kenneth E; Gibbons, Frederick X; Rogers, Ross
2015-03-01
To use insights from an integration of the terror management health model and the prototype willingness model to inform and improve nutrition-related behavior using an ecologically valid outcome. Prior to shopping, grocery shoppers were exposed to a reminder of mortality (or pain) and then visualized a healthy (vs. neutral) prototype. Receipts were collected postshopping and food items purchased were coded using a nutrition database. Compared with those in the control conditions, participants who received the mortality reminder and who were led to visualize a healthy eater prototype purchased more nutritious foods. The integration of the terror management health model and the prototype willingness model has the potential for both basic and applied advances and offers a generative ground for future research. PsycINFO Database Record (c) 2015 APA, all rights reserved.
NASA Astrophysics Data System (ADS)
Liang, Y.; Gallaher, D. W.; Grant, G.; Lv, Q.
2011-12-01
Change over time is the central driver of climate change detection. The goal is to diagnose the underlying causes and make projections into the future. In an effort to optimize this process, we have developed the Data Rod model, an object-oriented approach that provides the ability to query grid cell changes and their relationships to neighboring grid cells through time. The time series data are organized in time-centric structures called "data rods." A single data rod can be pictured as the multi-spectral data history at one grid cell: a vertical column of data through time. This resolves the long-standing problem of managing time-series data and opens new possibilities for temporal data analysis. This structure enables rapid time-centric analysis at any grid cell across multiple sensors and satellite platforms. Collections of data rods can be spatially and temporally filtered, statistically analyzed, and aggregated for use with pattern matching algorithms. Likewise, individual image pixels can be extracted to generate multi-spectral imagery at any spatial and temporal location. The Data Rods project has created a series of prototype databases to store and analyze massive datasets containing multi-modality remote sensing data. Using object-oriented technology, this method overcomes the operational limitations of traditional relational databases. To demonstrate the speed and efficiency of time-centric analysis using the Data Rods model, we have developed a sea ice detection algorithm. This application determines the concentration of sea ice in a small spatial region across a long temporal window. If performed using traditional analytical techniques, this task would typically require extensive data downloads and spatial filtering. Using Data Rods databases, the exact spatio-temporal data set is immediately available; no extraneous data is downloaded, and all selected data querying occurs transparently on the server side. Moreover, fundamental statistical calculations such as running averages are easily implemented against the time-centric columns of data.
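A minimal sketch of the data-rod idea follows: the time series at one grid cell is a single column of a (time, y, x) cube, and a running mean is computed directly on that column. The cube contents and window length are invented for illustration.

```python
import numpy as np

def running_mean(rod, window):
    """Centered running average along a single grid cell's time series (a 'data rod')."""
    kernel = np.ones(window) / window
    return np.convolve(rod, kernel, mode="valid")

# A data cube of shape (time, y, x); one rod is the column cube[:, j, i].
rng = np.random.default_rng(0)
cube = rng.random((365, 4, 4))             # e.g. a year of daily brightness values
rod = cube[:, 2, 1]
print(running_mean(rod, window=30)[:5])    # 30-day running mean at that grid cell
```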
Use of modeling to identify vulnerabilities to human error in laparoscopy.
Funk, Kenneth H; Bauer, James D; Doolen, Toni L; Telasha, David; Nicolalde, R Javier; Reeber, Miriam; Yodpijit, Nantakrit; Long, Myra
2010-01-01
This article describes an exercise to investigate the utility of modeling and human factors analysis in understanding surgical processes and their vulnerabilities to medical error. A formal method to identify error vulnerabilities was developed and applied to a test case of Veress needle insertion during closed laparoscopy. A team of 2 surgeons, a medical assistant, and 3 engineers used hierarchical task analysis and Integrated DEFinition language 0 (IDEF0) modeling to create rich models of the processes used in initial port creation. Using terminology from a standardized human performance database, detailed task descriptions were written for 4 tasks executed in the process of inserting the Veress needle. Key terms from the descriptions were used to extract from the database generic errors that could occur. Task descriptions with potential errors were translated back into surgical terminology. Referring to the process models and task descriptions, the team used a modified failure modes and effects analysis (FMEA) to consider each potential error for its probability of occurrence, its consequences if it should occur and be undetected, and its probability of detection. The resulting likely and consequential errors were prioritized for intervention. A literature-based validation study confirmed the significance of the top error vulnerabilities identified using the method. Ongoing work includes design and evaluation of procedures to correct the identified vulnerabilities and improvements to the modeling and vulnerability identification methods. Copyright 2010 AAGL. Published by Elsevier Inc. All rights reserved.
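The prioritization step described here reduces to ranking candidate errors by a risk priority number; the sketch below shows that arithmetic (RPN = occurrence × severity × detection) with invented ratings, not the study's actual FMEA scores.

```python
# (occurrence, severity, detection) on 1-10 scales; higher detection = harder to detect.
candidate_errors = {
    "needle angle too steep":        (4, 9, 6),
    "insufficient abdominal lift":   (5, 7, 5),
    "wrong insertion site selected": (2, 8, 3),
}

def risk_priority(errors):
    """Rank error modes by risk priority number RPN = occurrence * severity * detection."""
    ranked = [(o * s * d, name) for name, (o, s, d) in errors.items()]
    return sorted(ranked, reverse=True)

for rpn, name in risk_priority(candidate_errors):
    print(f"RPN {rpn:4d}  {name}")
```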
NASA Astrophysics Data System (ADS)
Auer, M.; Agugiaro, G.; Billen, N.; Loos, L.; Zipf, A.
2014-05-01
Many important Cultural Heritage sites have been studied over long periods of time with different technical equipment, methods and intentions by different researchers. This has led to huge amounts of heterogeneous "traditional" datasets and formats. The rising popularity of 3D models in the field of Cultural Heritage in recent years has brought additional data formats and makes it even more necessary to find solutions to manage, publish and study these data in an integrated way. The MayaArch3D project aims to realize such an integrative approach by establishing a web-based research platform that brings spatial and non-spatial databases together and provides visualization and analysis tools. The 3D components of the platform in particular use hierarchical segmentation concepts to structure the data and to perform queries on semantic entities. This paper presents a database schema to organize not only segmented models but also different Levels-of-Detail and other representations of the same entity. It is further implemented in a spatial database which allows the storage of georeferenced 3D data. This enables organization and queries by semantic, geometric and spatial properties. As a service for the delivery of the segmented models, a standardization candidate of the Open Geospatial Consortium (OGC), the Web 3D Service (W3DS), has been extended to cope with the new database schema and deliver a web-friendly format for WebGL rendering. Finally, a generic user interface is presented which uses the segments as a navigation metaphor to browse and query the semantic segmentation levels and retrieve information from an external database of the German Archaeological Institute (DAI).
Spatio-Temporal Data Model for Integrating Evolving Nation-Level Datasets
NASA Astrophysics Data System (ADS)
Sorokine, A.; Stewart, R. N.
2017-10-01
The ability to easily combine data from diverse sources in a single analytical workflow is one of the greatest promises of Big Data technologies. However, such integration is often challenging because datasets originate from different vendors, governments, and research communities, which results in multiple incompatibilities in data representations, formats, and semantics. Semantic differences are the hardest to handle: different communities often use different attribute definitions and associate their records with different sets of evolving geographic entities. Analysis of global socioeconomic variables across multiple datasets over prolonged periods is often complicated by differences in how the boundaries and histories of countries or other geographic entities are represented. Here we propose an event-based data model for describing and tracking the histories of evolving geographic units (countries, provinces, etc.) and their representations in disparate data. The model addresses the semantic challenge of preserving the identity of geographic entities over time by defining criteria for an entity's existence, a set of events that may affect its existence, and rules for mapping between different representations (datasets). The proposed model is used for maintaining an evolving compound database of global socioeconomic and environmental data harvested from multiple sources. A practical implementation of our model is demonstrated using a PostgreSQL object-relational database with temporal, geospatial, and NoSQL extensions.
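The sketch below illustrates the flavor of such an event-based representation: a geographic unit is described by the events that create, split, merge or rename it, and its existence at a given date follows from that event history. The event types and the example history are illustrative assumptions, not the authors' schema.

```python
# A minimal sketch of an event-based history for an evolving geographic unit.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Event:
    when: date
    kind: str                 # e.g. "creation", "split", "merge", "rename"
    predecessors: list = field(default_factory=list)
    successors: list = field(default_factory=list)

@dataclass
class GeoEntity:
    name: str
    events: list = field(default_factory=list)

    def existed_on(self, day: date) -> bool:
        """True if the entity had been created and not yet dissolved on `day`."""
        created = min((e.when for e in self.events if e.kind == "creation"), default=None)
        dissolved = min((e.when for e in self.events if e.kind in ("split", "merge")), default=None)
        return created is not None and created <= day and (dissolved is None or day < dissolved)

ussr = GeoEntity("USSR", [Event(date(1922, 12, 30), "creation"),
                          Event(date(1991, 12, 26), "split", successors=["Russia", "Ukraine"])])
print(ussr.existed_on(date(1980, 1, 1)))   # True
print(ussr.existed_on(date(2000, 1, 1)))   # False
```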
Karagiannidou, Maria; Wittenberg, Raphael; Landeiro, Filipa Isabel Trigo; Park, A-La; Fry, Andra; Knapp, Martin; Gray, Alastair M; Tockhorn-Heidenreich, Antje; Castro Sanchez, Amparo Yovanna; Ghinai, Isaac; Handels, Ron; Lecomte, Pascal; Wolstenholme, Jane
2018-06-08
Dementia is one of the greatest health challenges the world will face in the coming decades, as it is one of the principal causes of disability and dependency among older people. Economic modelling is used widely across many health conditions to inform decisions on health and social care policy and practice. The aim of this literature review is to systematically identify, review and critically evaluate existing health economics models in dementia. We included the full spectrum of dementia, including Alzheimer's disease (AD), from preclinical stages through to severe dementia and end of life. This review forms part of the Real world Outcomes across the Alzheimer's Disease spectrum for better care: multimodal data Access Platform (ROADMAP) project. Electronic searches were conducted in Medical Literature Analysis and Retrieval System Online, Excerpta Medica dataBASE, Economic Literature Database, NHS Economic Evaluation Database, Cochrane Central Register of Controlled Trials, Cost-Effectiveness Analysis Registry, Research Papers in Economics, Database of Abstracts of Reviews of Effectiveness, Science Citation Index, Turning Research Into Practice and Open Grey for studies published between January 2000 and the end of June 2017. Two reviewers will independently assess each study against predefined eligibility criteria. A third reviewer will resolve any disagreement. Data will be extracted using a predefined data extraction form following best practice. Study quality will be assessed using the Phillips checklist for decision analytic modelling. A narrative synthesis will be used. The results will be made available in a scientific peer-reviewed journal paper, will be presented at relevant conferences and will also be made available through the ROADMAP project. CRD42017073874. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
NeisseriaBase: a specialised Neisseria genomic resource and analysis platform.
Zheng, Wenning; Mutha, Naresh V R; Heydari, Hamed; Dutta, Avirup; Siow, Cheuk Chuen; Jakubovics, Nicholas S; Wee, Wei Yee; Tan, Shi Yang; Ang, Mia Yang; Wong, Guat Jah; Choo, Siew Woh
2016-01-01
Background. The gram-negative Neisseria is associated with two of the most potent human epidemic diseases: meningococcal meningitis and gonorrhoea. In both cases, disease is caused by bacteria colonizing human mucosal membrane surfaces. Overall, the genus shows great diversity and genetic variation mainly due to its ability to acquire and incorporate genetic material from a diverse range of sources through horizontal gene transfer. Although a number of databases exist for Neisseria genomes, they are mostly focused on the pathogenic species. In the present study we present the freely available NeisseriaBase, a database dedicated to the genus Neisseria encompassing the complete and draft genomes of 15 pathogenic and commensal Neisseria species. Methods. The genomic data were retrieved from the National Center for Biotechnology Information (NCBI), annotated using the RAST server and then stored in a MySQL database. The protein-coding genes were further analyzed to obtain information such as GC content (%), predicted hydrophobicity and molecular weight (Da) using in-house Perl scripts. The web application was developed following the secure four-tier web application architecture: (1) client workstation, (2) web server, (3) application server, and (4) database server. The web interface was constructed using PHP, JavaScript, jQuery, AJAX and CSS, utilizing the model-view-controller (MVC) framework. The in-house bioinformatics tools implemented in NeisseriaBase were developed using the Python, Perl, BioPerl and R languages. Results. Currently, NeisseriaBase houses 603,500 Coding Sequences (CDSs), 16,071 RNAs and 13,119 tRNA genes from 227 Neisseria genomes. The database is equipped with interactive web interfaces. Incorporation of the JBrowse genome browser in the database enables fast and smooth browsing of Neisseria genomes. NeisseriaBase includes the standard BLAST program to facilitate homology searching, and for Virulence Factor Database (VFDB) specific homology searches, the VFDB BLAST is also incorporated into the database. In addition, NeisseriaBase is equipped with in-house designed tools such as the Pairwise Genome Comparison tool (PGC) for comparative genomic analysis and the Pathogenomics Profiling Tool (PathoProT) for the comparative pathogenomics analysis of Neisseria strains. Discussion. This user-friendly database not only provides access to a host of genomic resources on Neisseria but also enables high-quality comparative genome analysis, which is crucial for the expanding scientific community interested in Neisseria research. This database is freely available at http://neisseria.um.edu.my.
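As an illustration of the per-gene properties mentioned in the Methods (GC content of a coding sequence and the molecular weight of its protein product), a minimal sketch follows; the average residue masses are approximate textbook values, and the database itself performs these calculations with in-house Perl scripts.

```python
# A minimal sketch of per-gene property calculations: GC content of a CDS and an
# approximate protein molecular weight from average residue masses (in Da).
AVG_RESIDUE_MASS = {
    'A': 71.08, 'R': 156.19, 'N': 114.10, 'D': 115.09, 'C': 103.14,
    'E': 129.12, 'Q': 128.13, 'G': 57.05, 'H': 137.14, 'I': 113.16,
    'L': 113.16, 'K': 128.17, 'M': 131.19, 'F': 147.18, 'P': 97.12,
    'S': 87.08, 'T': 101.10, 'W': 186.21, 'Y': 163.18, 'V': 99.13,
}
WATER = 18.02  # one water molecule added to the residue chain

def gc_content(cds: str) -> float:
    """GC content of a nucleotide sequence, in percent."""
    cds = cds.upper()
    return 100.0 * sum(cds.count(b) for b in "GC") / len(cds)

def protein_mw(protein: str) -> float:
    """Approximate molecular weight (Da) of a protein sequence."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in protein.upper()) + WATER

print(round(gc_content("ATGGCGTGCAAATAA"), 1))   # toy CDS
print(round(protein_mw("MACK"), 1))              # toy protein
```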
Murumkar, Prashant R; Giridhar, Rajani; Yadav, Mange Ram
2008-04-01
A set of 29 benzothiadiazepine hydroxamates having selective tumor necrosis factor-alpha converting enzyme inhibitory activity was used to compare the quality and predictive power of 3D-quantitative structure-activity relationship, comparative molecular field analysis, and comparative molecular similarity indices models for atom-based, centroid/atom-based, database, and docked conformer-based alignments. Removal of two outliers from the initial training set of molecules improved the predictivity of the models. Among the 3D-quantitative structure-activity relationship models developed using the above four alignments, the database alignment provided the optimal predictive comparative molecular field analysis model for the training set with cross-validated r(2) (q(2)) = 0.510, non-cross-validated r(2) = 0.972, standard error of estimates (s) = 0.098, and F = 215.44, and the optimal comparative molecular similarity indices model with cross-validated r(2) (q(2)) = 0.556, non-cross-validated r(2) = 0.946, standard error of estimates (s) = 0.163, and F = 99.785. These models also showed the best test set prediction for six compounds, with predictive r(2) values of 0.460 and 0.535, respectively. The contour maps obtained from the 3D-quantitative structure-activity relationship studies were appraised for activity trends of the molecules analyzed. The comparative molecular similarity indices models exhibited good external predictivity as compared with that of the comparative molecular field analysis models. The data generated from the present study helped us to further design and report some novel and potent tumor necrosis factor-alpha converting enzyme inhibitors.
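For reference, the cross-validated r(2) (q(2)) quoted above is conventionally computed from leave-one-out prediction errors; a standard form (assuming the usual definition rather than the authors' exact software output) is:

```latex
q^{2} = 1 - \frac{\sum_{i}\left(y_{i} - \hat{y}_{i}^{\,\mathrm{LOO}}\right)^{2}}{\sum_{i}\left(y_{i} - \bar{y}\right)^{2}}
```

where y_i are the observed activities, y-hat_i^(LOO) the leave-one-out predictions, and y-bar the mean observed activity; values closer to 1 indicate better internal predictivity.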
SSBD: a database of quantitative data of spatiotemporal dynamics of biological phenomena
Tohsato, Yukako; Ho, Kenneth H. L.; Kyoda, Koji; Onami, Shuichi
2016-01-01
Motivation: Rapid advances in live-cell imaging analysis and mathematical modeling have produced a large amount of quantitative data on spatiotemporal dynamics of biological objects ranging from molecules to organisms. There is now a crucial need to bring these large amounts of quantitative biological dynamics data together centrally in a coherent and systematic manner. This will facilitate the reuse of this data for further analysis. Results: We have developed the Systems Science of Biological Dynamics database (SSBD) to store and share quantitative biological dynamics data. SSBD currently provides 311 sets of quantitative data for single molecules, nuclei and whole organisms in a wide variety of model organisms from Escherichia coli to Mus musculus. The data are provided in Biological Dynamics Markup Language format and also through a REST API. In addition, SSBD provides 188 sets of time-lapse microscopy images from which the quantitative data were obtained and software tools for data visualization and analysis. Availability and Implementation: SSBD is accessible at http://ssbd.qbic.riken.jp. Contact: sonami@riken.jp PMID:27412095
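A minimal sketch of programmatic access to such data through a REST API is shown below; the endpoint path and query parameter are hypothetical placeholders used only for illustration, not documented SSBD routes, so consult the SSBD site for the actual API.

```python
# A minimal sketch of fetching quantitative dynamics records over a REST API.
# The route "/api/data/?organism=..." is a hypothetical placeholder.
import json
import urllib.request

BASE = "http://ssbd.qbic.riken.jp"            # site given in the abstract
ENDPOINT = "/api/data/?organism=C.+elegans"   # hypothetical route for illustration

with urllib.request.urlopen(BASE + ENDPOINT) as resp:
    records = json.loads(resp.read().decode("utf-8"))

for rec in records[:5]:
    print(rec)
```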
Validation of multi-mission satellite altimetry for the Baltic Sea region
NASA Astrophysics Data System (ADS)
Kudryavtseva, Nadia; Soomere, Tarmo; Giudici, Andrea
2016-04-01
Currently, three sources of wave data are available for the research community, namely buoys, modelling, and satellite altimetry. Buoy measurements provide high-quality time series of wave properties, but buoys are deployed in only a few locations. Wave modelling covers large domains and provides good results for open-sea conditions. However, the limitation of modelling is that the results depend on the quality of the wind forcing and the assumptions built into the model. Satellite altimetry in many cases provides homogeneous data over large sea areas with an appreciable spatial and temporal resolution. The use of satellite altimetry is problematic in coastal areas and partially ice-covered water bodies. These limitations can be circumvented by careful analysis of the geometry of the basin, ice conditions and the spatial coverage of each altimetry snapshot. In this poster, for the first time, we discuss a validation of 30 years of multi-mission altimetry covering the whole Baltic Sea. We analysed data from the RADS database (Scharroo et al. 2013), which span from 1985 to 2015. To assess the limitations of the satellite altimeter data quality, the data were cross-matched with available wave measurements from buoys of the Swedish Meteorological and Hydrological Institute and the Finnish Meteorological Institute. The altimeter-measured significant wave heights showed a very good correspondence with the wave buoys. We show that data with backscatter coefficients greater than 13.5 and high errors in significant wave height and range should be excluded. We also examined the effect of ice cover and distance from land on satellite altimetry measurements. The analysis of cross-matches between the satellite altimetry data and the buoy measurements shows that the data are only corrupted in the nearshore domain within 0.2 degrees from the coast. The statistical analysis showed a significant decrease in wave heights for sea areas with ice concentration greater than 30 percent. We also checked and corrected the data for biases between different missions. This analysis provides a unique uniform database of satellite altimetry measurements over the whole Baltic Sea, which can be further used for finding biases in wave modelling and for studies of wave climatology. The database is available upon request.
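The quality-control thresholds described above translate into a simple record filter, sketched below; the field names and the unit of the backscatter coefficient are assumptions, and the toy records are not real RADS data.

```python
# A minimal sketch of the QC filter: drop records with backscatter above 13.5,
# records closer than 0.2 degrees to the coast, and records over sea areas with
# ice concentration above 30 percent. Field names are illustrative assumptions.
def keep_record(rec: dict) -> bool:
    return (rec["backscatter"] <= 13.5
            and rec["dist_to_coast_deg"] >= 0.2
            and rec["ice_concentration_pct"] <= 30.0)

records = [
    {"swh_m": 1.8, "backscatter": 11.2, "dist_to_coast_deg": 0.5, "ice_concentration_pct": 0.0},
    {"swh_m": 0.4, "backscatter": 14.9, "dist_to_coast_deg": 1.1, "ice_concentration_pct": 0.0},
    {"swh_m": 0.9, "backscatter": 10.3, "dist_to_coast_deg": 0.1, "ice_concentration_pct": 45.0},
]
good = [r for r in records if keep_record(r)]
print(len(good), "of", len(records), "records retained")
```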
FY17 Status Report on the Initial Development of a Constitutive Model for Grade 91 Steel
DOE Office of Scientific and Technical Information (OSTI.GOV)
Messner, M. C.; Phan, V. -T.; Sham, T. -L.
Grade 91 is a candidate structural material for high-temperature advanced reactor applications. Existing ASME Section III, Subsection HB, Subpart B simplified design rules based on elastic analysis are set up as conservative screening tools, with the intent to supplement these screening rules with full inelastic analysis when required. The Code provides general guidelines for suitable inelastic models but does not provide constitutive model implementations. This report describes the development of an inelastic constitutive model for Gr. 91 steel aimed at fulfilling the ASME Code requirements and at inclusion in a new Section III Code appendix, HBB-Z. A large database of over 300 experiments on Gr. 91 was collected and converted to a standard XML form. Five families of Gr. 91 material models were identified in the literature. Of these five, two are potentially suitable for use in the ASME Code. These two models were implemented and evaluated against the experimental database. Both models have deficiencies, so the report develops a framework for developing and calibrating an improved model. This required creating a new modeling method for representing changes in material rate sensitivity across the full ASME allowable temperature range for Gr. 91 structural components: room temperature to 650°C. On top of this framework for rate sensitivity, the report describes the calibration of a model for work hardening and softening in the material using genetic algorithm optimization. Future work will focus on improving this trial model by including the tension/compression asymmetry observed in experiments, which is necessary to capture material ratcheting under zero mean stress, and by improving the optimization and analysis framework.
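As an illustration of working with an experiment database stored in a standard XML form, a minimal sketch follows; the tag and attribute names are assumptions and do not reproduce the schema used in the report.

```python
# A minimal sketch of parsing an XML experiment database into (strain, stress) curves
# keyed by test conditions. Tag and attribute names are illustrative assumptions.
import xml.etree.ElementTree as ET

XML = """
<experiments>
  <test type="tension" temperature_C="550" strain_rate="1e-4">
    <point strain="0.002" stress_MPa="310"/>
    <point strain="0.010" stress_MPa="365"/>
  </test>
</experiments>
"""

root = ET.fromstring(XML)
for test in root.iter("test"):
    T = float(test.get("temperature_C"))
    curve = [(float(p.get("strain")), float(p.get("stress_MPa"))) for p in test.iter("point")]
    print(f"{test.get('type')} test at {T} C: {len(curve)} points")
```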
The PMDB Protein Model Database
Castrignanò, Tiziana; De Meo, Paolo D'Onorio; Cozzetto, Domenico; Talamo, Ivano Giuseppe; Tramontano, Anna
2006-01-01
The Protein Model Database (PMDB) is a public resource aimed at storing manually built 3D models of proteins. The database is designed to provide access to models published in the scientific literature, together with validating experimental data. It is a relational database and it currently contains >74 000 models for ∼240 proteins. The system is accessible online and allows predictors to submit models along with related supporting evidence, and users to download them through a simple and intuitive interface. Users can navigate in the database and retrieve models referring to the same target protein or to different regions of the same protein. Each model is assigned a unique identifier that allows interested users to directly access the data. PMID:16381873
Hou, Xiaowen; Chen, Xin; Shi, Jingpu
2015-07-01
The association between the 5,10-methylenetetrahydrofolate reductase (MTHFR) C677T gene polymorphism and premature coronary artery disease (PCAD) is controversial. To obtain a more precise estimate of the association, a meta-analysis was conducted in the present study. The relevant studies were identified by searching PubMed, EMBASE, the Web of Science, the Cochrane Collaboration Database, the Chinese National Knowledge Infrastructure, the Wanfang Database and China Biological Medicine up to November 2014. The meta-analysis was performed using STATA 11. Twenty-one studies with a total of 6912 subjects were included, comprising 2972 PCAD patients and 3940 controls. The pooled analysis showed that the MTHFR C677T gene polymorphism was probably associated with PCAD (CT vs. CC: OR=1.13, 95% CI=1.01-1.27; dominant model: OR=1.16, 95% CI=1.04-1.29; recessive model: OR=1.19, 95% CI=1.00-1.40; allele analysis: OR=1.17, 95% CI=1.01-1.34). Subgroup analysis by plasma homocysteine concentration showed a significant association in the homocysteine >15μmol/L subgroup (CT vs. CC: OR=1.44, 95% CI=1.10-1.88; TT vs. CC: OR=2.51, 95% CI=1.12-5.63; dominant model: OR=1.51, 95% CI=1.16-1.96; recessive model: OR=2.33, 95% CI=1.05-5.20; allele analysis: OR=1.48, 95% CI=1.18-1.87). Subgroup analysis by continent displayed a significant association among the Asian population (CT vs. CC: OR=1.51, 95% CI=1.23-1.86; TT vs. CC: OR=2.81, 95% CI=1.87-4.23; dominant model: OR=1.65, 95% CI=1.35-2.01; recessive model: OR=2.22, 95% CI=1.53-3.21; allele analysis: OR=1.61, 95% CI=1.37-1.89). Statistical stability and reliability were demonstrated by sensitivity analysis and publication bias outcomes. In conclusion, the meta-analysis suggests that the MTHFR C677T gene polymorphism may be associated with PCAD. Copyright © 2015 Elsevier B.V. All rights reserved.
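For readers unfamiliar with the arithmetic behind pooled estimates such as those above, the sketch below shows fixed-effect (inverse-variance) pooling of odds ratios on the log scale; the numbers are illustrative and are not the study data, and the published analysis itself was performed in STATA.

```python
# A minimal sketch of fixed-effect (inverse-variance) pooling of odds ratios.
import math

# (OR, lower 95% CI, upper 95% CI) for individual studies -- illustrative values
studies = [(1.20, 0.95, 1.52), (1.35, 1.02, 1.79), (0.98, 0.70, 1.37)]

weights, log_ors = [], []
for or_, lo, hi in studies:
    se = (math.log(hi) - math.log(lo)) / (2 * 1.96)   # SE of log OR from the CI width
    weights.append(1.0 / se**2)
    log_ors.append(math.log(or_))

pooled_log = sum(w * x for w, x in zip(weights, log_ors)) / sum(weights)
pooled_se = math.sqrt(1.0 / sum(weights))
print(f"pooled OR = {math.exp(pooled_log):.2f} "
      f"(95% CI {math.exp(pooled_log - 1.96*pooled_se):.2f}-{math.exp(pooled_log + 1.96*pooled_se):.2f})")
```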
Biermann, Martin
2014-04-01
Clinical trials aiming for regulatory approval of a therapeutic agent must be conducted according to Good Clinical Practice (GCP). Clinical Data Management Systems (CDMS) are specialized software solutions geared toward GCP trials. They are, however, less suited for data management in small non-GCP research projects. For use in researcher-initiated non-GCP studies, we developed a client-server database application based on the public domain CakePHP framework. The underlying MySQL database uses a simple data model based on only five data tables. The graphical user interface can be run in any web browser inside the hospital network. Data are validated upon entry. Data contained in external database systems can be imported interactively. Data are automatically anonymized on import, with the key lists identifying the subjects logged to a restricted part of the database. Data analysis is performed by separate statistics and analysis software connecting to the database via a generic Open Database Connectivity (ODBC) interface. Since its first pilot implementation in 2011, the solution has been applied to seven different clinical research projects covering different clinical problems in different organ systems such as cancer of the thyroid and the prostate glands. This paper shows how the adoption of a generic web application framework is a feasible, flexible, low-cost, and user-friendly way of managing multidimensional research data in researcher-initiated non-GCP clinical projects. Copyright © 2014 The Authors. Published by Elsevier Ireland Ltd. All rights reserved.
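A minimal sketch of the anonymize-on-import idea follows: subject identifiers are replaced by deterministic pseudonyms, and the key list linking pseudonyms to identities is kept apart from the research data. The hashing scheme and field names are assumptions, not the application's actual implementation.

```python
# A minimal sketch of pseudonymization on import, with the key list held separately
# from the research tables. The keyed hash and field names are illustrative choices.
import hashlib
import hmac

SECRET_KEY = b"keep-this-in-the-restricted-key-store"  # illustrative placeholder

def pseudonym(national_id: str) -> str:
    """Deterministic pseudonym so repeated imports map to the same subject."""
    return hmac.new(SECRET_KEY, national_id.encode(), hashlib.sha256).hexdigest()[:12]

def import_row(row: dict, key_list: dict) -> dict:
    pid = pseudonym(row["national_id"])
    key_list[pid] = row["national_id"]               # stored in the restricted part
    return {"subject": pid, "value": row["value"]}   # stored in the research tables

key_list = {}
print(import_row({"national_id": "010190-12345", "value": 42.0}, key_list))
```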
Data Quality Control and Maintenance for the Qweak Experiment
NASA Astrophysics Data System (ADS)
Heiner, Nicholas; Spayde, Damon
2014-03-01
The Qweak collaboration seeks to quantify the weak charge of the proton through the analysis of the parity-violating electron asymmetry in elastic electron-proton scattering. The asymmetry is calculated by measuring how many electrons scatter from a hydrogen target at the chosen scattering angle for aligned and anti-aligned electron spins, and then evaluating the difference between the yields for the two polarization states. The weak charge can then be extracted from these data. Knowing the weak charge will allow us to calculate the electroweak mixing angle at the particular Q2 value of the measurement, for which the Standard Model makes a firm prediction. Any significant deviation from this prediction would be a prime indicator of the existence of physics beyond what the Standard Model describes. After the experiment was conducted at Jefferson Lab, the collected data were stored in a MySQL database for further analysis. I will present an overview of the database and its functions as well as a demonstration of the quality checks and maintenance performed on the data itself. These checks include an analysis of errors occurring throughout the experiment, specifically data acquisition errors within the main detector array, and an analysis of data cuts.
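For reference, the measured asymmetry described above is conventionally written as the relative difference of the detector yields for the two helicity states,

```latex
A_{PV} = \frac{Y_{+} - Y_{-}}{Y_{+} + Y_{-}}
```

where Y+ and Y- are the normalized detector yields for the two electron polarization states.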
User's Manual for RESRAD-OFFSITE Version 2.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yu, C.; Gnanapragasam, E.; Biwer, B. M.
2007-09-05
The RESRAD-OFFSITE code is an extension of the RESRAD (onsite) code, which has been widely used for calculating doses and risks from exposure to radioactively contaminated soils. The development of RESRAD-OFFSITE started more than 10 years ago, but new models and methodologies have been developed, tested, and incorporated since then. Some of the new models have been benchmarked against other independently developed (international) models. The databases used have also been expanded to include all the radionuclides (more than 830) contained in the International Commission on Radiological Protection (ICRP) 38 database. This manual provides detailed information on the design and application of the RESRAD-OFFSITE code. It describes in detail the new models used in the code, such as the three-dimensional dispersion groundwater flow and radionuclide transport model, the Gaussian plume model for atmospheric dispersion, and the deposition model used to estimate the accumulation of radionuclides in offsite locations and in foods. Potential exposure pathways and exposure scenarios that can be modeled by the RESRAD-OFFSITE code are also discussed. A user's guide is included in Appendix A of this manual. The default parameter values and parameter distributions are presented in Appendix B, along with a discussion of the statistical distributions for probabilistic analysis. A detailed discussion of how to reduce run time, especially when conducting probabilistic (uncertainty) analysis, is presented in Appendix C of this manual.
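For orientation, the Gaussian plume model mentioned above has the standard textbook form (with ground reflection) shown below; the exact formulation implemented in RESRAD-OFFSITE may differ in details such as depletion and deposition terms.

```latex
C(x,y,z) = \frac{Q}{2\pi\, u\, \sigma_{y}\sigma_{z}}
\exp\!\left(-\frac{y^{2}}{2\sigma_{y}^{2}}\right)
\left[\exp\!\left(-\frac{(z-H)^{2}}{2\sigma_{z}^{2}}\right)
+ \exp\!\left(-\frac{(z+H)^{2}}{2\sigma_{z}^{2}}\right)\right]
```

where Q is the release rate, u the mean wind speed, H the effective release height, and sigma_y(x), sigma_z(x) the horizontal and vertical dispersion coefficients.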
ERIC Educational Resources Information Center
Budsankom, Prayoonsri; Sawangboon, Tatsirin; Damrongpanit, Suntorapot; Chuensirimongkol, Jariya
2015-01-01
The purpose of the research is to develop and identify the validity of factors affecting higher order thinking skills (HOTS) of students. The thinking skills can be divided into three types: analytical, critical, and creative thinking. This analysis is done by applying the meta-analytic structural equation modeling (MASEM) based on a database of…
CBS Genome Atlas Database: a dynamic storage for bioinformatic results and sequence data.
Hallin, Peter F; Ussery, David W
2004-12-12
Currently, new bacterial genomes are being published on a monthly basis. With the growing amount of genome sequence data, there is a demand for a flexible and easy-to-maintain structure for storing sequence data and results from bioinformatic analysis. More than 150 sequenced bacterial genomes are now available, and comparisons of properties for taxonomically similar organisms are not readily available to many biologists. In addition to the most basic information, such as AT content, chromosome length, tRNA count and rRNA count, a large number of more complex calculations are needed to perform detailed comparative genomics. DNA structural calculations like curvature and stacking energy, and DNA compositions like base skews, oligo skews and repeats at the local and global level, are just a few of the analyses presented on the CBS Genome Atlas Web page. Complex analyses, changing methods and frequent addition of new models are factors that require a dynamic database layout. Using basic tools like the GNU Make system, csh, Perl and MySQL, we have created a flexible database environment for storing and maintaining such results for a collection of complete microbial genomes. Currently, these results amount to more than 220 pieces of information. The backbone of this solution consists of a program package written in Perl, which enables administrators to synchronize and update the database content. The MySQL database has been connected to the CBS web server via PHP4 to present dynamic web content for users outside the center. This solution is tightly fitted to the existing server infrastructure, and the solutions proposed here can perhaps serve as a template for other research groups to solve database issues. A web-based user interface which is dynamically linked to the Genome Atlas Database can be accessed via www.cbs.dtu.dk/services/GenomeAtlas/. This paper has a supplemental information page which links to the examples presented: www.cbs.dtu.dk/services/GenomeAtlas/suppl/bioinfdatabase.
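One of the simpler whole-genome composition measures mentioned above, the base (GC) skew, can be computed in sliding windows as in the sketch below; the window and step sizes are illustrative choices, and the Genome Atlas pipeline itself is implemented in Perl.

```python
# A minimal sketch of GC skew, (G - C) / (G + C), computed in sliding windows.
def gc_skew(seq: str, window: int = 10000, step: int = 1000):
    """Yield (position, skew) pairs for each window along the sequence."""
    seq = seq.upper()
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        g, c = w.count("G"), w.count("C")
        skew = (g - c) / (g + c) if (g + c) else 0.0
        yield start, skew

demo = ("ATGC" * 5000) + ("AGGG" * 5000)   # toy sequence with a composition shift
for pos, skew in list(gc_skew(demo))[::10]:
    print(pos, round(skew, 3))
```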
Ensemble of ground subsidence hazard maps using fuzzy logic
NASA Astrophysics Data System (ADS)
Park, Inhye; Lee, Jiyeong; Saro, Lee
2014-06-01
Hazard maps of ground subsidence around abandoned underground coal mines (AUCMs) in Samcheok, Korea, were constructed using fuzzy ensemble techniques and a geographical information system (GIS). To evaluate the factors related to ground subsidence, a spatial database was constructed from topographic, geologic, mine tunnel, land use, groundwater, and ground subsidence maps. Spatial data, topography, geology, and various ground-engineering data for the subsidence area were collected and compiled in a database for mapping ground-subsidence hazard (GSH). The subsidence area was randomly split 70/30 for training and validation of the models. The relationships between the detected ground-subsidence area and the factors were identified and quantified by frequency ratio (FR), logistic regression (LR) and artificial neural network (ANN) models. The relationships were used as factor ratings in the overlay analysis to create ground-subsidence hazard indexes and maps. The three GSH maps were then used as new input factors and integrated using fuzzy-ensemble methods to make better hazard maps. All of the hazard maps were validated by comparison with known subsidence areas that were not used directly in the analysis. As a result, the ensemble model was found to be more effective in terms of prediction accuracy than the individual models.
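The frequency ratio model used as one of the factor ratings above reduces to simple class-wise arithmetic, sketched below with illustrative counts: for each class of a factor map, FR is the share of subsidence cells falling in that class divided by the share of all cells in that class.

```python
# A minimal sketch of the frequency ratio (FR) calculation for one factor map.
def frequency_ratio(class_cells: dict, class_subsidence: dict) -> dict:
    total_cells = sum(class_cells.values())
    total_subs = sum(class_subsidence.values())
    fr = {}
    for cls, n in class_cells.items():
        p_class = n / total_cells                          # share of all cells in class
        p_subs = class_subsidence.get(cls, 0) / total_subs # share of subsidence cells in class
        fr[cls] = p_subs / p_class if p_class else 0.0
    return fr

# e.g. distance-to-mine-tunnel classes (m), with illustrative cell counts
cells = {"0-50": 1200, "50-100": 2500, "100-200": 4300, ">200": 12000}
subsided = {"0-50": 60, "50-100": 55, "100-200": 30, ">200": 15}
print(frequency_ratio(cells, subsided))
```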
Kikuchi, Akira; Nakazato, Takeru; Ito, Katsuhiko; Nojima, Yosui; Yokoyama, Takeshi; Iwabuchi, Kikuo; Bono, Hidemasa; Toyoda, Atsushi; Fujiyama, Asao; Sato, Ryoichi; Tabunoki, Hiroko
2017-01-13
Various insect species have been added to genomic databases over the years. Thus, researchers can easily obtain online genomic information on invertebrates and insects. However, many incorrectly annotated genes are included in these databases, which can prevent the correct interpretation of subsequent functional analyses. To address this problem, we used a combination of dry and wet bench processes to select functional genes from public databases. Enolase is an important glycolytic enzyme in all organisms. Here, we applied this approach to identify functional enolases in the silkworm Bombyx mori (BmEno). First, we detected five annotated enolases from public databases using a Hidden Markov Model (HMM) search, and then through cDNA cloning, Northern blotting, and RNA-seq analysis, we revealed three functional enolases in B. mori: BmEno1, BmEno2, and BmEnoC. BmEno1 contained a key amino acid residue for metal binding and substrate binding that is conserved in other species. However, BmEno2 and BmEnoC showed a change in this key amino acid. Phylogenetic analysis showed that BmEno2 and BmEnoC were distinct from BmEno1 and other enolases, and were distributed only in lepidopteran clusters. BmEno1 was expressed in all of the tissues used in our study. In contrast, BmEno2 was mainly expressed in the testis, with some expression in the ovary and suboesophageal ganglion. BmEnoC was weakly expressed in the testis. Quantitative RT-PCR showed that the mRNA expression of BmEno2 and BmEnoC correlated with testis development; thus, BmEno2 and BmEnoC may be related to lepidopteran-specific spermiogenesis. Using a combination of dry and wet bench processes, we identified and characterized three functional enolases of the silkworm B. mori from public databases. In addition, we determined that BmEno2 and BmEnoC had species-specific functions. Our strategy could be helpful for the detection of minor genes and functional genes in non-model organisms from public databases.
MIPS PlantsDB: a database framework for comparative plant genome research.
Nussbaumer, Thomas; Martis, Mihaela M; Roessner, Stephan K; Pfeifer, Matthias; Bader, Kai C; Sharma, Sapna; Gundlach, Heidrun; Spannagl, Manuel
2013-01-01
The rapidly increasing amount of plant genome (sequence) data enables powerful comparative analyses and integrative approaches and also requires structured and comprehensive information resources. Databases are needed for both model and crop plant organisms, and both intuitive search/browse views and comparative genomics tools should communicate the data to researchers and help them interpret it. MIPS PlantsDB (http://mips.helmholtz-muenchen.de/plant/genomes.jsp) was initially described in NAR in 2007 [Spannagl,M., Noubibou,O., Haase,D., Yang,L., Gundlach,H., Hindemitt, T., Klee,K., Haberer,G., Schoof,H. and Mayer,K.F. (2007) MIPSPlantsDB-plant database resource for integrative and comparative plant genome research. Nucleic Acids Res., 35, D834-D840] and was set up from the start to provide data and information resources for individual plant species as well as a framework for integrative and comparative plant genome research. PlantsDB comprises database instances for tomato, Medicago, Arabidopsis, Brachypodium, Sorghum, maize, rice, barley and wheat. Building on that, state-of-the-art comparative genomics tools such as CrowsNest are integrated to visualize and investigate syntenic relationships between monocot genomes. Results from novel genome analysis strategies targeting the complex and repetitive genomes of triticeae species (wheat and barley) are provided and cross-linked with model species. The MIPS Repeat Element Database (mips-REdat) and Catalog (mips-REcat), as well as tight connections to other databases, e.g. via web services, are further important components of PlantsDB.
IAC-1.5 - INTEGRATED ANALYSIS CAPABILITY
NASA Technical Reports Server (NTRS)
Vos, R. G.
1994-01-01
The objective of the Integrated Analysis Capability (IAC) system is to provide a highly effective, interactive analysis tool for the integrated design of large structures. IAC was developed to interface programs from the fields of structures, thermodynamics, controls, and system dynamics with an executive system and a database to yield a highly efficient multi-disciplinary system. Special attention is given to user requirements such as data handling and on-line assistance with operational features, and the ability to add new modules of the user's choice at a future date. IAC contains an executive system, a database, general utilities, interfaces to various engineering programs, and a framework for building interfaces to other programs. IAC has shown itself to be effective in automating data transfer among analysis programs. The IAC system architecture is modular in design. 1) The executive module contains an input command processor, an extensive data management system, and driver code to execute the application modules. 2) Technical modules provide standalone computational capability as well as support for various solution paths or coupled analyses. 3) Graphics and model generation modules are supplied for building and viewing models. 4) Interface modules provide for the required data flow between IAC and other modules. 5) User modules can be arbitrary executable programs or JCL procedures with no pre-defined relationship to IAC. 6) Special purpose modules are included, such as MIMIC (Model Integration via Mesh Interpolation Coefficients), which transforms field values from one model to another; LINK, which simplifies incorporation of user specific modules into IAC modules; and DATAPAC, the National Bureau of Standards statistical analysis package. The IAC database contains structured files which provide a common basis for communication between modules and the executive system, and can contain unstructured files such as NASTRAN checkpoint files, DISCOS plot files, object code, etc. The user can define groups of data and relations between them. A full data manipulation and query system operates with the database. The current interface modules comprise five groups: 1) Structural analysis - IAC contains a NASTRAN interface for standalone analysis or certain structural/control/thermal combinations. IAC provides enhanced structural capabilities for normal modes and static deformation analysis via special DMAP sequences. 2) Thermal analysis - IAC supports finite element and finite difference techniques for steady state or transient analysis. There are interfaces for the NASTRAN thermal analyzer, SINDA/SINFLO, and TRASYS II. 3) System dynamics - A DISCOS interface allows full use of this simulation program for either nonlinear time domain analysis or linear frequency domain analysis. 4) Control analysis - Interfaces for the ORACLS, SAMSAN, NBOD2, and INCA programs allow a wide range of control system analyses and synthesis techniques. 5) Graphics - The graphics packages PLOT and MOSAIC are included in IAC. PLOT generates vector displays of tabular data in the form of curves, charts, correlation tables, etc., while MOSAIC generates color raster displays of either tabular or array type data. Either DI3000 or PLOT-10 graphics software is required for full graphics capability. IAC is available by license for a period of 10 years to approved licensees. The licensed program product includes one complete set of supporting documentation. Additional copies of the documentation may be purchased separately.
IAC is written in FORTRAN 77 and has been implemented on a DEC VAX series computer operating under VMS. IAC can be executed by multiple concurrent users in batch or interactive mode. The basic central memory requirement is approximately 750KB. IAC includes the executive system, graphics modules, a database, general utilities, and the interfaces to all analysis and controls programs described above. Source code is provided for the control programs ORACLS, SAMSAN, NBOD2, and DISCOS. The following programs are also available from COSMIC as separate packages: NASTRAN, SINDA/SINFLO, TRASYS II, DISCOS, ORACLS, SAMSAN, NBOD2, and INCA. IAC was developed in 1985.
Implementation of a computer database testing and analysis program.
Rouse, Deborah P
2007-01-01
The author is the coordinator of a computer software database testing and analysis program implemented in an associate degree nursing program. Computer software database programs help support the test development and analysis process. Critical thinking is measurable and promoted with their use. The reader of this article will learn what is involved in procuring and implementing a computer database testing and analysis program in an academic nursing program. The use of the computerized database for testing and analysis will be approached as a method to promote and evaluate nursing students' critical thinking skills and to prepare them for the National Council Licensure Examination.
Jeffries, D J; Donkor, S; Brookes, R H; Fox, A; Hill, P C
2004-09-01
The data requirements of a large multidisciplinary tuberculosis case contact study are complex. We describe an Access-based relational database system that meets our rigorous requirements for data entry and validation, while being user-friendly, flexible, exportable, and easy to install on a network or stand-alone system. This includes the development of a double data entry package for epidemiology and laboratory data, semi-automated entry of ELISPOT data directly from the plate reader, and a suite of new programmes for the manipulation and integration of flow cytometry data. The double-entered epidemiology and immunology databases are combined into a separate database, providing a near-real-time analysis of immuno-epidemiological data and allowing important trends to be identified early and major decisions about the study to be made and acted on. This dynamic data management model is portable and can easily be applied to other studies.
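The double data entry step described above amounts to a field-by-field comparison of two independently keyed copies of each record; a minimal sketch follows, with illustrative field names rather than the study's actual forms.

```python
# A minimal sketch of double data entry validation: compare two independently keyed
# copies of the records and flag discrepancies for resolution.
def compare_entries(first: dict, second: dict):
    """Return a list of (record_id, field, value_1, value_2) discrepancies."""
    issues = []
    for rec_id, row1 in first.items():
        row2 = second.get(rec_id, {})
        for field in set(row1) | set(row2):
            v1, v2 = row1.get(field), row2.get(field)
            if v1 != v2:
                issues.append((rec_id, field, v1, v2))
    return issues

entry1 = {"C001": {"age": 34, "bcg_scar": "yes"}}
entry2 = {"C001": {"age": 43, "bcg_scar": "yes"}}
print(compare_entries(entry1, entry2))   # [('C001', 'age', 34, 43)]
```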
Concepts and data model for a co-operative neurovascular database.
Mansmann, U; Taylor, W; Porter, P; Bernarding, J; Jäger, H R; Lasjaunias, P; Terbrugge, K; Meisel, J
2001-08-01
Problems of clinical management of neurovascular diseases are very complex. This is caused by the chronic character of the diseases, a long history of symptoms and diverse treatments. If patients are to benefit from treatment, then treatment decisions have to rely on reliable and accurate knowledge of the natural history of the disease and the various treatments. Recent developments in statistical methodology and experience from electronic patient records are used to establish an information infrastructure based on a centralized register. A protocol for collecting data on neurovascular diseases is described, together with the technical and logistical aspects of implementing the database. The database is designed as a co-operative tool for audit and research available to co-operating centres. When a database is linked to systematic patient follow-up, it can be used to study prognosis. Careful analysis of patient outcome is valuable for decision-making.
NASA Technical Reports Server (NTRS)
Rauch, T.; Rudkowski, A.; Kampka, D.; Werner, K.; Kruk, J. W.; Moehler, S.
2014-01-01
Context. In the framework of the Virtual Observatory (VO), the German Astrophysical VO (GAVO) developed the registered service TheoSSA (Theoretical Stellar Spectra Access). It provides easy access to stellar spectral energy distributions (SEDs) and is intended to ingest SEDs calculated by any model-atmosphere code, generally for all effective temperatures, surface gravities, and elemental compositions. We will establish a database of SEDs of flux standards that are easily accessible via TheoSSA's web interface. Aims. The OB-type subdwarf Feige 110 is a standard star for flux calibration. State-of-the-art non-local thermodynamic equilibrium stellar-atmosphere models that consider opacities of species up to trans-iron elements will be used to provide a reliable synthetic spectrum to compare with observations. Methods. In the case of Feige 110, we demonstrate that the model reproduces not only its overall continuum shape from the far-ultraviolet (FUV) to the optical wavelength range but also the numerous metal lines exhibited in its FUV spectrum. Results. We present a state-of-the-art spectral analysis of Feige 110. We determined Teff = 47 250 +/- 2000 K, log g = 6.00 +/- 0.20, and the abundances of He, N, P, S, Ti, V, Cr, Mn, Fe, Co, Ni, Zn, and Ge. Ti, V, Mn, Co, Zn, and Ge were identified for the first time in this star. Upper abundance limits were derived for C, O, Si, Ca, and Sc. Conclusions. The TheoSSA database of theoretical SEDs of stellar flux standards guarantees that the flux calibration of astronomical data and cross-calibration between different instruments can be based on models and SEDs calculated with state-of-the-art model atmosphere codes.
Costa, Raquel L; Gadelha, Luiz; Ribeiro-Alves, Marcelo; Porto, Fábio
2017-01-01
There are many steps in analyzing transcriptome data, from the acquisition of raw data to the selection of a subset of representative genes that explain a scientific hypothesis. The data produced can be represented as networks of interactions among genes, and these may additionally be integrated with other biological databases, such as Protein-Protein Interactions, transcription factors and gene annotation. However, the results of these analyses remain fragmented, imposing difficulties either for subsequent inspection of results or for meta-analysis through the incorporation of new related data. Integrating databases and tools into scientific workflows, orchestrating their execution, and managing the resulting data and its respective metadata are challenging tasks. Additionally, a great amount of effort is equally required to run in-silico experiments to structure and compose the information as needed for analysis. Different programs may need to be applied and different files are produced during the experiment cycle. In this context, the availability of a platform supporting experiment execution is paramount. We present GeNNet, an integrated transcriptome analysis platform that unifies scientific workflows with graph databases for selecting relevant genes according to the evaluated biological systems. It includes GeNNet-Wf, a scientific workflow that pre-loads biological data, pre-processes raw microarray data and conducts a series of analyses including normalization, differential expression inference, clusterization and gene set enrichment analysis. A user-friendly web interface, GeNNet-Web, allows for setting parameters, executing, and visualizing the results of GeNNet-Wf executions. To demonstrate the features of GeNNet, we performed case studies with data retrieved from GEO, particularly using a single-factor experiment in different analysis scenarios. As a result, we obtained differentially expressed genes for which biological functions were analyzed. The results are integrated into GeNNet-DB, a database about genes, clusters, experiments and their properties and relationships. The resulting graph database is explored with queries that demonstrate the expressiveness of this data model for reasoning about gene interaction networks. GeNNet is the first platform to integrate the analytical process of transcriptome data with graph databases. It provides a comprehensive set of tools that would otherwise be challenging for non-expert users to install and use. Developers can add new functionality to components of GeNNet. The derived data allow for testing previous hypotheses about an experiment and exploring new ones through the interactive graph database environment. It enables the analysis of data on humans, rhesus monkeys, mice and rats from Affymetrix platforms. GeNNet is available as an open source platform at https://github.com/raquele/GeNNet and can be retrieved as a software container with the command docker pull quelopes/gennet.
NASA Astrophysics Data System (ADS)
Brooks, G. R.
2011-12-01
Dust storm forecasting is a critical part of military theater operations in Afghanistan and Iraq as well as other strategic areas of the globe. The Air Force Weather Agency (AFWA) has been using the Dust Transport Application (DTA) as a forecasting tool since 2001. Initially developed by The Johns Hopkins University Applied Physics Laboratory (JHUAPL), output products include dust concentration and reduction of visibility due to dust. The performance of the products depends on several factors including the underlying dust source database, treatment of soil moisture, parameterization of dust processes, and validity of the input atmospheric model data. Over many years of analysis, seasonal dust forecast biases of the DTA have been observed and documented. As these products are unique and indispensable for U.S. and NATO forces, amendments were required to provide the best forecasts possible. One of the quickest ways to scientifically address the dust concentration biases noted over time was to analyze the weaknesses in, and adjust, the dust source database. Dust source database strengths and weaknesses, the satellite analysis and adjustment process, and tests confirming the resulting improvements in the final dust concentration and visibility products will be shown.
NASA Technical Reports Server (NTRS)
Shapiro, Linda G.; Tanimoto, Steven L.; Ahrens, James P.
1996-01-01
The goal of this task was to create a design and prototype implementation of a database environment particularly suited for handling the image, vision and scientific data associated with NASA's EOS Amazon project. The focus was on a data model and query facilities designed to execute efficiently on parallel computers. A key feature of the environment is an interface that allows a scientist to specify high-level directives about how query execution should occur.
Neural Network Cloud Classification Research
1993-03-01
[Only fragments of the report text are available: acknowledgments noting that analysis of the database made the study possible and thanking Don Chisolm and Rosemary Dyer, together with a note that elements of the model correspond closely to neurophysiological data about the visual cortex and that efficient versions of the BCS and FCS have been developed.]
Alliance Building in the Information and Online Database Industry.
ERIC Educational Resources Information Center
Alexander, Johanna Olson
2001-01-01
Presents an analysis of information industry alliance formation using environmental scanning methods. Highlights include why libraries and academic institutions should be interested; a literature review; historical context; industry and market structures; commercial and academic models; trends; and implications for information providers,…
Research for and by Practitioners.
ERIC Educational Resources Information Center
Templin, Thomas J.; And Others
1992-01-01
Seven articles discuss research by and for practitioners. The topics include demystification of research for practitioners, experiences with helping teacher researchers, an application of a collaborative action research model, one health practitioner's experience, creating a dance research database, basic data analysis for nonresearchers, and why…
dbHiMo: a web-based epigenomics platform for histone-modifying enzymes.
Choi, Jaeyoung; Kim, Ki-Tae; Huh, Aram; Kwon, Seomun; Hong, Changyoung; Asiegbu, Fred O; Jeon, Junhyun; Lee, Yong-Hwan
2015-01-01
Over the past two decades, epigenetics has evolved into a key concept for understanding regulation of gene expression. Among many epigenetic mechanisms, covalent modifications such as acetylation and methylation of lysine residues on core histones emerged as a major mechanism in epigenetic regulation. Here, we present the database for histone-modifying enzymes (dbHiMo; http://hme.riceblast.snu.ac.kr/) aimed at facilitating functional and comparative analysis of histone-modifying enzymes (HMEs). HMEs were identified by applying a search pipeline built upon profile hidden Markov models (HMMs) to proteomes. The database incorporates 11,576 HMEs identified from 603 proteomes including 483 fungal, 32 plant and 51 metazoan species. dbHiMo provides users with web-based personalized data browsing and analysis tools, supporting comparative and evolutionary genomics. With comprehensive data entries and associated web-based tools, our database will be a valuable resource for future epigenetics/epigenomics studies. © The Author(s) 2015. Published by Oxford University Press.
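The identification step described above rests on scanning proteomes with profile HMMs. A minimal sketch of such a scan using HMMER's hmmsearch is shown below; the file names are placeholders, and the actual dbHiMo pipeline (profiles used, thresholds, post-filtering) is not specified in this abstract.

```python
import subprocess

# Hypothetical file names; hmmsearch is part of the HMMER suite.
cmd = [
    "hmmsearch",
    "--tblout", "hme_hits.tbl",   # tabular per-sequence hit summary
    "--cut_ga",                   # use gathering thresholds if the profile defines them
    "hme_domains.hmm",            # profile HMM(s) for histone-modifying domains
    "proteome.fasta",             # proteome to scan
]
subprocess.run(cmd, check=True)
```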
A neotropical Miocene pollen database employing image-based search and semantic modeling
Han, Jing Ginger; Cao, Hongfei; Barb, Adrian; Punyasena, Surangi W.; Jaramillo, Carlos; Shyu, Chi-Ren
2014-01-01
• Premise of the study: Digital microscopic pollen images are being generated with increasing speed and volume, producing opportunities to develop new computational methods that increase the consistency and efficiency of pollen analysis and provide the palynological community a computational framework for information sharing and knowledge transfer. • Methods: Mathematical methods were used to assign trait semantics (abstract morphological representations) of the images of neotropical Miocene pollen and spores. Advanced database-indexing structures were built to compare and retrieve similar images based on their visual content. A Web-based system was developed to provide novel tools for automatic trait semantic annotation and image retrieval by trait semantics and visual content. • Results: Mathematical models that map visual features to trait semantics can be used to annotate images with morphology semantics and to search image databases with improved reliability and productivity. Images can also be searched by visual content, providing users with customized emphases on traits such as color, shape, and texture. • Discussion: Content- and semantic-based image searches provide a powerful computational platform for pollen and spore identification. The infrastructure outlined provides a framework for building a community-wide palynological resource, streamlining the process of manual identification, analysis, and species discovery. PMID:25202648
Health economic analyses in medical nutrition: a systematic literature review.
Walzer, Stefan; Droeschel, Daniel; Nuijten, Mark; Chevrou-Séverac, Hélène
2014-01-01
Medical nutrition is a specific nutrition category either covering specific dietary needs and/or nutrient deficiency in patients or feeding patients unable to eat normally. Medical nutrition is regulated by a specific bill in Europe and in the US, with specific legislation and guidelines, and is provided to patients with special nutritional needs and indications for nutrition support. Therefore, medical nutrition products are delivered by medical prescription and supervised by health care professionals. Although these products have existed for more than 2 decades, health economic evidence of medical nutrition interventions is scarce. This research assesses the current published health economic evidence for medical nutrition by performing a systematic literature review related to health economic analysis of medical nutrition. A systematic literature search was done using standard literature databases, including PubMed, the Health Technology Assessment Database, and the National Health Service Economic Evaluation Database. Additionally, a free web-based search was conducted using the same search terms utilized in the systematic database search. The clinical background and basis of the analysis, health economic design, and results were extracted from the papers finally selected. The Drummond checklist was used to validate the quality of health economic modeling studies and the AMSTAR (A Measurement Tool to Assess Systematic Reviews) checklist was used for published systematic reviews. Fifty-three papers were identified and obtained via PubMed, or directly via journal webpages for further assessment. Thirty-two papers were finally included in a thorough data extraction procedure, including those identified by a "gray literature search" utilizing the Google search engine and cross-reference searches. Results regarding content of the studies showed that malnutrition was the underlying clinical condition in most cases (32%). In addition, gastrointestinal disorders (eg, surgery, cancer) were often analyzed. In terms of settings, 56% of papers covered inpatients, whereas 14 papers (44%) captured outpatients, including patients in community centers. Interestingly, in comparison with the papers identified overall, very few health economic models were found. Most of the articles were modeling analyses and economic trials in different design settings. Overall, only eight health economic models were published and were validated applying the Drummond checklist. In summary, most of the models included were carried out to quite a high standard, although some areas were identified for further improvement. Of the two systematic health economic reviews identified, one achieved the highest quality score when applying the AMSTAR checklist. The reasons for finding only a few modeling studies but quite a large number of clinical trials with health economic endpoints, might be different. Until recently, health economics has not been required for reimbursement or coverage decisions concerning medical nutrition interventions. Further, there might be specifics of medical nutrition which might not allow easy modeling and consequently explain the limited uptake so far. The health economic data on medical nutrition generated and published is quite ample. However, it has been primarily based on database analysis and clinical studies. 
Only a few modeling analyses have been carried out, indicating a need for further research to understand the specifics of medical nutrition and their applicability for health economic modeling.
Li, Haiquan; Dai, Xinbin; Zhao, Xuechun
2008-05-01
Membrane transport proteins play a crucial role in the import and export of ions, small molecules or macromolecules across biological membranes. Currently, there are a limited number of published computational tools which enable the systematic discovery and categorization of transporters prior to costly experimental validation. To approach this problem, we utilized a nearest neighbor method which seamlessly integrates homology search and topological analysis into a machine-learning framework. Our approach satisfactorily distinguished 484 transporter families in the Transporter Classification Database, a curated and representative database for transporters. A five-fold cross-validation on the database achieved a positive classification rate of 72.3% on average. Furthermore, this method successfully detected transporters in seven model and four non-model organisms, ranging from archaeal to mammalian species. A preliminary literature-based validation cross-validated 65.8% of our predictions on the 11 organisms, with 55.9% of our predictions overlapping with 83.6% of the predicted transporters in TransportDB.
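The classification step, nearest-neighbour assignment evaluated by five-fold cross-validation, can be sketched with scikit-learn. The feature matrix and labels below are random placeholders for the homology-search and topology features the method actually derives; this is an illustration of the evaluation protocol, not the published classifier.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# Placeholder per-protein feature vectors and family labels
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))          # e.g. homology scores and topology features
y = rng.integers(0, 5, size=500)        # transporter family labels

clf = KNeighborsClassifier(n_neighbors=1)        # nearest-neighbour assignment
scores = cross_val_score(clf, X, y, cv=5)        # five-fold cross-validation
print("mean positive classification rate:", scores.mean())
```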
Direct hydrocarbon identification using AVO analysis in the Malay Basin
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lye, Y.C.; Yaacob, M.R.; Birkett, N.E.
1994-07-01
Esso Production Malaysia Inc. and Petronas Carigali Sdn. Bhd. have been conducting AVO (amplitude versus offset) processing and interpretation since April 1991 in an attempt to identify hydrocarbon fluids predrill. The major part of this effort was in the PM-5, PM-8, and PM-9 contract areas where an extensive exploration program is underway. To date, more than 1000 km of seismic data have been analyzed using the AVO technique, and the results were used to support the drilling of more than 50 exploration and delineation wells. Gather modeling of well data was used to calibrate and predict the presence of hydrocarbon in proposed well locations. In order to generate accurate gather models, a geophysical properties and AVO database was needed, and great effort was spent in producing an accurate and complete database. This database is continuously being updated so that an experience file can be built to further improve the reliability of the AVO prediction.
The BaMM web server for de-novo motif discovery and regulatory sequence analysis.
Kiesel, Anja; Roth, Christian; Ge, Wanwan; Wess, Maximilian; Meier, Markus; Söding, Johannes
2018-05-28
The BaMM web server offers four tools: (i) de-novo discovery of enriched motifs in a set of nucleotide sequences, (ii) scanning a set of nucleotide sequences with motifs to find motif occurrences, (iii) searching with an input motif for similar motifs in our BaMM database with motifs for >1000 transcription factors, trained from the GTRD ChIP-seq database and (iv) browsing and keyword searching the motif database. In contrast to most other servers, we represent sequence motifs not by position weight matrices (PWMs) but by Bayesian Markov Models (BaMMs) of order 4, which we showed previously to perform substantially better in ROC analyses than PWMs or first order models. To address the inadequacy of P- and E-values as measures of motif quality, we introduce the AvRec score, the average recall over the TP-to-FP ratio between 1 and 100. The BaMM server is freely accessible without registration at https://bammmotif.mpibpc.mpg.de.
Non-Coding RNA Analysis Using the Rfam Database.
Kalvari, Ioanna; Nawrocki, Eric P; Argasinska, Joanna; Quinones-Olvera, Natalia; Finn, Robert D; Bateman, Alex; Petrov, Anton I
2018-06-01
Rfam is a database of non-coding RNA families in which each family is represented by a multiple sequence alignment, a consensus secondary structure, and a covariance model. Using a combination of manual and literature-based curation and a custom software pipeline, Rfam converts descriptions of RNA families found in the scientific literature into computational models that can be used to annotate RNAs belonging to those families in any DNA or RNA sequence. Valuable research outputs that are often locked up in figures and supplementary information files are encapsulated in Rfam entries and made accessible through the Rfam Web site. The data produced by Rfam have a broad application, from genome annotation to providing training sets for algorithm development. This article gives an overview of how to search and navigate the Rfam Web site, and how to annotate sequences with RNA families. The Rfam database is freely available at http://rfam.org. © 2018 by John Wiley & Sons, Inc. Copyright © 2018 John Wiley & Sons, Inc.
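Annotating sequences with Rfam families is done with covariance-model search tools from the Infernal suite. The sketch below shows one plausible cmscan invocation with placeholder file names; the Rfam documentation gives the full recommended option set, so treat the flags here as an assumption rather than the definitive command.

```python
import subprocess

# Rfam.cm is the covariance-model library distributed by Rfam;
# sequences.fa is the user's DNA/RNA sequence file (placeholder names).
cmd = [
    "cmscan",
    "--cut_ga",               # apply Rfam's curated gathering thresholds
    "--rfam",                 # Rfam-recommended heuristic filter level
    "--tblout", "hits.tbl",   # tabular summary of family matches
    "Rfam.cm",
    "sequences.fa",
]
subprocess.run(cmd, check=True)
```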
Introduction to the DISRUPT postprandial database: subjects, studies and methodologies.
Jackson, Kim G; Clarke, Dave T; Murray, Peter; Lovegrove, Julie A; O'Malley, Brendan; Minihane, Anne M; Williams, Christine M
2010-03-01
Dysregulation of lipid and glucose metabolism in the postprandial state is recognised as an important risk factor for the development of cardiovascular disease and type 2 diabetes. Our objective was to create a comprehensive, standardised database of postprandial studies to provide insights into the physiological factors that influence postprandial lipid and glucose responses. Data were collated from subjects (n = 467) taking part in single and sequential meal postprandial studies conducted by researchers at the University of Reading, to form the DISRUPT (DIetary Studies: Reading Unilever Postprandial Trials) database. Subject attributes including age, gender, genotype, menopausal status, body mass index, blood pressure and a fasting biochemical profile, together with postprandial measurements of triacylglycerol (TAG), non-esterified fatty acids, glucose, insulin and TAG-rich lipoprotein composition are recorded. A particular strength of the studies is the frequency of blood sampling, with on average 10-13 blood samples taken during each postprandial assessment, and the fact that identical test meal protocols were used in a number of studies, allowing pooling of data to increase statistical power. The DISRUPT database is the most comprehensive postprandial metabolism database that exists worldwide, and preliminary analysis of the pooled sequential meal postprandial dataset has revealed both confirmatory and novel observations with respect to the impact of gender and age on the postprandial TAG response. Further analysis of the dataset using conventional statistical techniques along with integrated mathematical models and clustering analysis will provide a unique opportunity to greatly expand current knowledge of the aetiology of inter-individual variability in postprandial lipid and glucose responses.
Remote collection and analysis of witness reports on flash floods
NASA Astrophysics Data System (ADS)
Gourley, Jonathan; Erlingis, Jessica; Smith, Travis; Ortega, Kiel; Hong, Yang
2010-05-01
Typically, flash floods are studied ex post facto in response to a major impact event. A complement to field investigations is developing a detailed database of flash flood events, including minor events and null reports (i.e., where heavy rain occurred but there was no flash flooding), based on public survey questions conducted in near-real time. The Severe Hazards Analysis and Verification Experiment (SHAVE) has been in operation at the National Severe Storms Laboratory (NSSL) in Norman, OK, USA during the summers since 2006. The experiment employs undergraduate students to analyse real-time products from weather radars, target specific regions within the conterminous US, and poll public residences and businesses regarding the occurrence and severity of hail, wind, tornadoes, and now flash floods. In addition to providing a rich learning experience for students, SHAVE has been successful in creating high-resolution datasets of severe hazards used for algorithm and model verification. This talk describes the criteria used to initiate the flash flood survey, the specific questions asked and information entered to the database, and then provides an analysis of results for flash flood data collected during the summer of 2008. It is envisioned that specific details provided by the SHAVE flash flood observation database will complement databases collected by operational agencies and thus lead to better tools to predict the likelihood of flash floods and ultimately reduce their impacts on society.
NASA Astrophysics Data System (ADS)
Wilcox, William Edward, Jr.
1995-01-01
A computer program (LIDAR-PC) and associated atmospheric spectral databases have been developed which accurately simulate the laser remote sensing of the atmosphere and the system performance of a direct-detection Lidar or tunable Differential Absorption Lidar (DIAL) system. This simulation program allows, for the first time, the use of several different large atmospheric spectral databases to be coupled with Lidar parameter simulations on the same computer platform to provide a real-time, interactive, and easy to use design tool for atmospheric Lidar simulation and modeling. LIDAR-PC has been used for a range of different Lidar simulations and compared to experimental Lidar data. In general, the simulations agreed very well with the experimental measurements. In addition, the simulation offered, for the first time, the analysis and comparison of experimental Lidar data to easily determine the range-resolved attenuation coefficient of the atmosphere and the effect of the telescope overlap factor. The software and databases operate on an IBM-PC or compatible computer platform, and thus are very useful to the research community for Lidar analysis. The complete Lidar and atmospheric spectral transmission modeling program uses the HITRAN database for high-resolution molecular absorption lines of the atmosphere, the BACKSCAT/LOWTRAN computer databases and models for the effects of aerosol and cloud backscatter and attenuation, and the range-resolved Lidar equation. The program can calculate the Lidar backscattered signal-to-noise for a slant path geometry from space and simulate the effect of high resolution, tunable, single frequency, and moderate line width lasers on the Lidar/DIAL signal. The program was used to model and analyze the experimental Lidar data obtained from several measurements. A fixed wavelength Ho:YSGG aerosol Lidar (Sugimoto, 1990) developed at USF and a tunable Ho:YSGG DIAL system (Cha, 1991) for measuring atmospheric water vapor at 2.1 μm were analyzed. The simulations agreed very well with the measurements, and also yielded, for the first time, the ability to easily deduce the atmospheric attenuation coefficient, alpha, from the Lidar data. Simulations and analysis of other Lidar measurements included that of a 1.57 μm OPO aerosol Lidar system developed at USF (Harrell, 1995) and of the NASA LITE (Lidar In-space Technology Experiment) Lidar recently flown on the Space Shuttle. Finally, an extensive series of laboratory experiments were made with the 1.57 μm OPO Lidar system to test calculations of the telescope/laser overlap and the effect of different telescope sizes and designs. The simulations agreed well with the experimental data for the telescope diameter and central obscuration test cases. The LIDAR-PC programs are available on the Internet from the USF Lidar Home Page Web site, http://www.cas.usf.edu/physics/lidar.html/.
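The simulation is built around the range-resolved lidar equation. A minimal sketch of that calculation is shown below with made-up system parameters and uniform atmospheric profiles; it illustrates the form of the equation only and is not LIDAR-PC, which couples the calculation to HITRAN and BACKSCAT/LOWTRAN data.

```python
import numpy as np

# Elastic-backscatter lidar equation: E(R) = E0 * eta * (c*tau/2) * beta(R) * A/R^2 * exp(-2*int(alpha))
E0    = 10e-3        # transmitted pulse energy [J]
A     = 0.05         # receiver telescope area [m^2]
eta   = 0.3          # overall system/detector efficiency
c     = 3.0e8        # speed of light [m/s]
tau   = 10e-9        # pulse duration [s]
R     = np.arange(100.0, 10000.0, 10.0)     # range bins [m]
beta  = 1e-6 * np.ones_like(R)              # backscatter coefficient [1/(m sr)]
alpha = 1e-4 * np.ones_like(R)              # extinction coefficient [1/m]

# Two-way atmospheric transmission out to each range bin
dr = np.diff(R, prepend=R[0])
T2 = np.exp(-2.0 * np.cumsum(alpha * dr))

# Received energy per range bin
E_rx = E0 * eta * (c * tau / 2.0) * beta * (A / R**2) * T2
```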
dSED: A database tool for modeling sediment early diagenesis
NASA Astrophysics Data System (ADS)
Katsev, S.; Rancourt, D. G.; L'Heureux, I.
2003-04-01
Sediment early diagenesis reaction transport models (RTMs) are becoming powerful tools in providing kinetic descriptions of the metal and nutrient diagenetic cycling in marine, lacustrine, estuarine, and other aquatic sediments, as well as of exchanges with the water column. Whereas there exist several good database/program combinations for thermodynamic equilibrium calculations in aqueous systems, at present there exist no database tools for classification and analysis of the kinetic data essential to RTM development. We present a database tool that is intended to serve as an online resource for information about chemical reactions, solid phase and solute reactants, sorption reactions, transport mechanisms, and kinetic and equilibrium parameters that are relevant to sediment diagenesis processes. The list of reactive substances includes but is not limited to organic matter, Fe and Mn oxides and oxyhydroxides, sulfides and sulfates, calcium, iron, and manganese carbonates, phosphorus-bearing minerals, and silicates. Aqueous phases include dissolved carbon dioxide, oxygen, methane, hydrogen sulfide, sulfate, nitrate, phosphate, some organic compounds, and dissolved metal species. A number of filters allow extracting information according to user-specified criteria, e.g., about a class of substances contributing to the cycling of iron. The database also includes bibliographic information about published diagenetic models and the reactions and processes that they consider. At the time of preparing this abstract, dSED contained 128 reactions and 12 pre-defined filters. dSED is maintained by the Lake Sediment Structure and Evolution (LSSE) group at the University of Ottawa (www.science.uottawa.ca/LSSE/dSED) and we invite input from the geochemical community.
Killion, Patrick J; Sherlock, Gavin; Iyer, Vishwanath R
2003-01-01
Background: The power of microarray analysis can be realized only if data is systematically archived and linked to biological annotations as well as analysis algorithms. Description: The Longhorn Array Database (LAD) is a MIAME-compliant microarray database that operates on PostgreSQL and Linux. It is a fully open-source version of the Stanford Microarray Database (SMD), one of the largest microarray databases, and is freely available. Conclusions: Our development of LAD provides a simple, free, open, reliable and proven solution for storage and analysis of two-color microarray data. PMID:12930545
The composite load spectra project
NASA Technical Reports Server (NTRS)
Newell, J. F.; Ho, H.; Kurth, R. E.
1990-01-01
Probabilistic methods and generic load models capable of simulating the load spectra that are induced in space propulsion system components are being developed. Four engine component types (the transfer ducts, the turbine blades, the liquid oxygen posts and the turbopump oxidizer discharge duct) were selected as representative hardware examples. The composite load spectra that simulate the probabilistic loads for these components are typically used as the input loads for a probabilistic structural analysis. The knowledge-based system approach used for the composite load spectra project provides an ideal environment for incremental development. The intelligent database paradigm employed in developing the expert system provides a smooth coupling between the numerical processing and the symbolic (information) processing. Large volumes of engine load information and engineering data are stored in database format and managed by a database management system. Numerical procedures for probabilistic load simulation and database management functions are controlled by rule modules. Rules were hard-wired as decision trees into rule modules to perform process control tasks. There are modules to retrieve load information and models. There are modules to select loads and models to carry out quick load calculations or make an input file for full duty-cycle time dependent load simulation. The composite load spectra load expert system implemented today is capable of performing intelligent rocket engine load spectra simulation. Further development of the expert system will provide tutorial capability for users to learn from it.
A phenomenological biological dose model for proton therapy based on linear energy transfer spectra.
Rørvik, Eivind; Thörnqvist, Sara; Stokkevåg, Camilla H; Dahle, Tordis J; Fjaera, Lars Fredrik; Ytre-Hauge, Kristian S
2017-06-01
The relative biological effectiveness (RBE) of protons varies with the radiation quality, quantified by the linear energy transfer (LET). Most phenomenological models employ a linear dependency of the dose-averaged LET (LETd) to calculate the biological dose. However, several experiments have indicated a possible non-linear trend. Our aim was to investigate if biological dose models including non-linear LET dependencies should be considered, by introducing a LET spectrum based dose model. The RBE-LET relationship was investigated by fitting of polynomials from 1st to 5th degree to a database of 85 data points from aerobic in vitro experiments. We included both unweighted and weighted regression, the latter taking into account experimental uncertainties. Statistical testing was performed to decide whether higher degree polynomials provided better fits to the data as compared to lower degrees. The newly developed models were compared to three published LETd-based models for a simulated spread out Bragg peak (SOBP) scenario. The statistical analysis of the weighted regression analysis favored a non-linear RBE-LET relationship, with the quartic polynomial found to best represent the experimental data (P = 0.010). The results of the unweighted regression analysis were on the borderline of statistical significance for non-linear functions (P = 0.053), and with the current database a linear dependency could not be rejected. For the SOBP scenario, the weighted non-linear model estimated a similar mean RBE value (1.14) compared to the three established models (1.13-1.17). The unweighted model calculated a considerably higher RBE value (1.22). The analysis indicated that non-linear models could give a better representation of the RBE-LET relationship. However, this is not decisive, as inclusion of the experimental uncertainties in the regression analysis had a significant impact on the determination and ranking of the models. As differences between the models were observed for the SOBP scenario, both non-linear LET spectrum- and linear LETd-based models should be further evaluated in clinically realistic scenarios. © 2017 American Association of Physicists in Medicine.
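The weighted polynomial fitting described above can be sketched with NumPy, whose polyfit weights residuals by 1/sigma. The LET, RBE and uncertainty values below are placeholders rather than the 85-point experimental database, and the simple weighted-residual comparison stands in for the formal statistical tests used in the paper.

```python
import numpy as np

# Placeholder LET (keV/um), RBE and experimental uncertainties
let   = np.array([1.0, 2.0, 3.5, 5.0, 7.0, 9.0, 12.0, 15.0])
rbe   = np.array([1.02, 1.05, 1.10, 1.15, 1.25, 1.32, 1.45, 1.40])
sigma = np.array([0.05, 0.05, 0.06, 0.06, 0.08, 0.08, 0.10, 0.12])

# Weighted least-squares fits of polynomial degree 1 (linear) to 5 (quintic)
for deg in range(1, 6):
    coeffs = np.polyfit(let, rbe, deg, w=1.0 / sigma)
    resid = rbe - np.polyval(coeffs, let)
    wrss = np.sum((resid / sigma) ** 2)    # weighted residual sum of squares
    print("degree", deg, "weighted RSS", wrss)
```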
Sharon W. Woudenberg; Barbara L. Conkling; Barbara M. O' Connell; Elizabeth B. LaPoint; Jeffery A. Turner; Karen L. Waddell
2010-01-01
This document is based on previous documentation of the nationally standardized Forest Inventory and Analysis database (Hansen and others 1992; Woudenberg and Farrenkopf 1995; Miles and others 2001). Documentation of the structure of the Forest Inventory and Analysis database (FIADB) for Phase 2 data, as well as codes and definitions, is provided. Examples for...
Flight testing and frequency domain analysis for rotorcraft handling qualities characteristics
NASA Technical Reports Server (NTRS)
Ham, Johnnie A.; Gardner, Charles K.; Tischler, Mark B.
1993-01-01
A demonstration of frequency domain flight testing techniques and analyses was performed on a U.S. Army OH-58D helicopter in support of the OH-58D Airworthiness and Flight Characteristics Evaluation and the Army's development and ongoing review of Aeronautical Design Standard 33C, Handling Qualities Requirements for Military Rotorcraft. Hover and forward flight (60 knots) tests were conducted in 1 flight hour by Army experimental test pilots. Further processing of the hover data generated a complete database of velocity, angular rate, and acceleration frequency responses to control inputs. A joint effort was then undertaken by the Airworthiness Qualification Test Directorate (AQTD) and the U.S. Army Aeroflightdynamics Directorate (AFDD) to derive handling qualities information from the frequency response database. A significant amount of information could be extracted from the frequency domain database using a variety of approaches. This report documents numerous results that have been obtained from the simple frequency domain tests; in many areas, these results provide more insight into the aircraft dynamics that affect handling qualities than do traditional flight tests. The handling qualities results include ADS-33C bandwidth and phase delay calculations, vibration spectral determinations, transfer function models to examine single axis results, and a six degree of freedom fully coupled state space model. The ability of this model to accurately predict aircraft responses was verified using data from pulse inputs. This report also documents the frequency-sweep flight test technique and data analysis used to support the tests.
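Estimating control-input-to-response frequency responses from sweep data is commonly done with the H1 spectral estimator (cross-spectrum over input auto-spectrum). The sketch below uses synthetic signals and is only an illustration of that estimator, not the OH-58D processing chain described in the report.

```python
import numpy as np
from scipy.signal import csd, welch

fs = 100.0                                        # sample rate [Hz]
t = np.arange(0.0, 60.0, 1.0 / fs)
u = np.sin(2 * np.pi * t * (0.1 + 0.05 * t))      # synthetic frequency-sweep control input
y = np.roll(u, 20) + 0.05 * np.random.randn(t.size)  # stand-in for a measured rate response

# H1 estimator: cross-spectrum of input and output over input auto-spectrum
f, Puu = welch(u, fs=fs, nperseg=1024)
_, Puy = csd(u, y, fs=fs, nperseg=1024)
H = Puy / Puu

magnitude_db = 20.0 * np.log10(np.abs(H))
phase_deg = np.degrees(np.angle(H))
```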
NASA Astrophysics Data System (ADS)
Chen, Pengfei; Jing, Qi
2017-02-01
This study proposed and tested the assumption that a non-linear method is more appropriate than a linear one when canopy reflectance is used to establish a yield prediction model. For this purpose, partial least squares regression (PLSR) and artificial neural networks (ANN), representing linear and non-linear analysis methods respectively, were applied and compared for wheat yield prediction. Multi-period Landsat-8 OLI images were collected at two different wheat growth stages, and a field campaign was conducted to obtain grain yields at selected sampling sites in 2014. The field data were divided into a calibration database and a testing database. Using the calibration data, a cross-validation concept was introduced for the PLSR and ANN model construction to prevent over-fitting. All models were tested using the test data. The ANN yield-prediction model produced R2, RMSE and RMSE% values of 0.61, 979 kg ha-1, and 10.38%, respectively, in the testing phase, performing better than the PLSR yield-prediction model, which produced R2, RMSE, and RMSE% values of 0.39, 1211 kg ha-1, and 12.84%, respectively. The non-linear method was therefore suggested as the better method for yield prediction.
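The PLSR-versus-ANN comparison with cross-validation can be sketched with scikit-learn. The reflectance matrix and yields below are random placeholders for the Landsat-8 OLI features and field measurements, and the hyperparameters are illustrative rather than those used in the study.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import r2_score, mean_squared_error

# Placeholder multi-date band reflectances (X) and grain yields in kg/ha (y)
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 0.6, size=(120, 14))
y = 8000 + 3000 * X[:, 4] - 2000 * X[:, 3] + rng.normal(0, 500, 120)

models = {
    "PLSR": PLSRegression(n_components=5),
    "ANN": make_pipeline(StandardScaler(),
                         MLPRegressor(hidden_layer_sizes=(10,), max_iter=5000, random_state=0)),
}
for name, model in models.items():
    pred = np.asarray(cross_val_predict(model, X, y, cv=5)).ravel()  # CV guards against over-fitting
    rmse = mean_squared_error(y, pred) ** 0.5
    print(name, "R2:", r2_score(y, pred), "RMSE:", rmse)
```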
Estimating Biofuel Feedstock Water Footprints Using System Dynamics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Inman, Daniel; Warner, Ethan; Stright, Dana
Increased biofuel production has prompted concerns about the environmental tradeoffs of biofuels compared to petroleum-based fuels. Biofuel production in general, and feedstock production in particular, is under increased scrutiny. Water footprinting (measuring direct and indirect water use) has been proposed as one measure to evaluate water use in the context of concerns about depleting rural water supplies through activities such as irrigation for large-scale agriculture. Water footprinting literature has often been limited in one or more key aspects: complete assessment across multiple water stocks (e.g., vadose zone, surface, and ground water stocks), geographical resolution of data, consistent representation of many feedstocks, and flexibility to perform scenario analysis. We developed a model called BioSpatial H2O using a system dynamics modeling and database framework. BioSpatial H2O could be used to consistently evaluate the complete water footprints of multiple biomass feedstocks at high geospatial resolutions. BioSpatial H2O has the flexibility to perform simultaneous scenario analysis of current and potential future crops under alternative yield and climate conditions. In this proof-of-concept paper, we modeled corn grain (Zea mays L.) and soybeans (Glycine max) under current conditions as illustrative results. BioSpatial H2O links to a unique database that houses annual spatially explicit climate, soil, and plant physiological data. Parameters from the database are used as inputs to our system dynamics model for estimating annual crop water requirements using daily time steps. Based on our review of the literature, estimated green water footprints are comparable to other modeled results, suggesting that BioSpatial H2O is computationally sound for future scenario analysis. Our modeling framework builds on previous water use analyses to provide a platform for scenario-based assessment. BioSpatial H2O's system dynamics is a flexible and user-friendly interface for on-demand, spatially explicit, water use scenario analysis for many US agricultural crops. Built-in controls permit users to quickly make modifications to the model assumptions, such as those affecting yield, and to see the implications of those results in real time. BioSpatial H2O's dynamic capabilities and adjustable climate data allow for analyses of water use and management scenarios to inform current and potential future bioenergy policies. The model could also be adapted for scenario analysis of alternative climatic conditions and comparison of multiple crops. The results of such an analysis would help identify risks associated with water use competition among feedstocks in certain regions. Results could also inform research and development efforts that seek to reduce water-related risks of biofuel pathways.
Characterizing the genetic structure of a forensic DNA database using a latent variable approach.
Kruijver, Maarten
2016-07-01
Several problems in forensic genetics require a representative model of a forensic DNA database. Obtaining an accurate representation of the offender database can be difficult, since databases typically contain groups of persons with unregistered ethnic origins in unknown proportions. We propose to estimate the allele frequencies of the subpopulations comprising the offender database and their proportions from the database itself using a latent variable approach. We present a model for which parameters can be estimated using the expectation maximization (EM) algorithm. This approach does not rely on relatively small and possibly unrepresentative population surveys, but is driven by the actual genetic composition of the database only. We fit the model to a snapshot of the Dutch offender database (2014), which contains close to 180,000 profiles, and find that three subpopulations suffice to describe a large fraction of the heterogeneity in the database. We demonstrate the utility and reliability of the approach with three applications. First, we use the model to predict the number of false leads obtained in database searches. We assess how well the model predicts the number of false leads obtained in mock searches in the Dutch offender database, both for the case of familial searching for first degree relatives of a donor and searching for contributors to three-person mixtures. Second, we study the degree of partial matching between all pairs of profiles in the Dutch database and compare this to what is predicted using the latent variable approach. Third, we use the model to provide evidence to support that the Dutch practice of estimating match probabilities using the Balding-Nichols formula with a native Dutch reference database and θ=0.03 is conservative. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
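One way to read the latent variable approach is as a finite-mixture model fitted by expectation maximization, with the subpopulation proportions and allele frequencies as parameters. The sketch below implements a simplified version (independent loci, Hardy-Weinberg equilibrium within each latent subpopulation) on synthetic genotype arrays; it illustrates the idea only and is not the published model.

```python
import numpy as np

def em_database_mixture(genotypes, K, n_alleles, n_iter=200, seed=0):
    """EM for subpopulation proportions and allele frequencies.

    genotypes: integer array (N profiles, L loci, 2 allele copies) of allele indices.
    Simplifying assumptions: independent loci, HWE within each latent subpopulation.
    """
    rng = np.random.default_rng(seed)
    N, L, _ = genotypes.shape
    pi = np.full(K, 1.0 / K)                                 # mixing proportions
    freqs = rng.dirichlet(np.ones(n_alleles), size=(K, L))   # (K, L, A) allele frequencies
    a, b = genotypes[..., 0], genotypes[..., 1]              # (N, L) allele copies
    rows = np.arange(L)[None, :]

    for _ in range(n_iter):
        # E-step: posterior probability that each profile came from each subpopulation
        loglik = np.zeros((N, K))
        for k in range(K):
            pa, pb = freqs[k][rows, a], freqs[k][rows, b]
            loglik[:, k] = np.sum(np.log(pa * pb) + (a != b) * np.log(2.0), axis=1)
        log_r = loglik + np.log(pi)
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: re-estimate proportions and allele frequencies from weighted allele counts
        pi = r.mean(axis=0)
        for k in range(K):
            counts = np.zeros((L, n_alleles))
            for l in range(L):
                counts[l] = (np.bincount(a[:, l], weights=r[:, k], minlength=n_alleles) +
                             np.bincount(b[:, l], weights=r[:, k], minlength=n_alleles))
            freqs[k] = (counts + 1e-9) / (counts + 1e-9).sum(axis=1, keepdims=True)
    return pi, freqs, r

# Synthetic example: 300 profiles, 10 loci, 8 alleles per locus, three latent components
genotypes = np.random.default_rng(1).integers(0, 8, size=(300, 10, 2))
pi_hat, freq_hat, resp = em_database_mixture(genotypes, K=3, n_alleles=8)
```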
SEED Servers: High-Performance Access to the SEED Genomes, Annotations, and Metabolic Models
Aziz, Ramy K.; Devoid, Scott; Disz, Terrence; Edwards, Robert A.; Henry, Christopher S.; Olsen, Gary J.; Olson, Robert; Overbeek, Ross; Parrello, Bruce; Pusch, Gordon D.; Stevens, Rick L.; Vonstein, Veronika; Xia, Fangfang
2012-01-01
The remarkable advance in sequencing technology and the rising interest in medical and environmental microbiology, biotechnology, and synthetic biology resulted in a deluge of published microbial genomes. Yet, genome annotation, comparison, and modeling remain a major bottleneck to the translation of sequence information into biological knowledge, hence computational analysis tools are continuously being developed for rapid genome annotation and interpretation. Among the earliest, most comprehensive resources for prokaryotic genome analysis, the SEED project, initiated in 2003 as an integration of genomic data and analysis tools, now contains >5,000 complete genomes, a constantly updated set of curated annotations embodied in a large and growing collection of encoded subsystems, a derived set of protein families, and hundreds of genome-scale metabolic models. Until recently, however, maintaining current copies of the SEED code and data at remote locations has been a pressing issue. To allow high-performance remote access to the SEED database, we developed the SEED Servers (http://www.theseed.org/servers): four network-based servers intended to expose the data in the underlying relational database, support basic annotation services, offer programmatic access to the capabilities of the RAST annotation server, and provide access to a growing collection of metabolic models that support flux balance analysis. The SEED servers offer open access to regularly updated data, the ability to annotate prokaryotic genomes, the ability to create metabolic reconstructions and detailed models of metabolism, and access to hundreds of existing metabolic models. This work offers and supports a framework upon which other groups can build independent research efforts. Large integrations of genomic data represent one of the major intellectual resources driving research in biology, and programmatic access to the SEED data will provide significant utility to a broad collection of potential users. PMID:23110173
Multi-Sensor Scene Synthesis and Analysis
1981-09-01
[Only table-of-contents fragments of this report are available, covering: Quad Trees for Image Representation and Processing; Databases (Definitions and Basic Concepts); Use of Databases in Hierarchical Scene Analysis; Use of Relational Tables; Multisensor Image Database Systems (MIDAS); Relational Database System for Pictures; Relational Pictorial Database.]
Hammond, Davyda; Conlon, Kathryn; Barzyk, Timothy; Chahine, Teresa; Zartarian, Valerie; Schultz, Brad
2011-03-01
Communities are concerned over pollution levels and seek methods to systematically identify and prioritize the environmental stressors in their communities. Geographic information system (GIS) maps of environmental information can be useful tools for communities in their assessment of environmental-pollution-related risks. Databases and mapping tools that supply community-level estimates of ambient concentrations of hazardous pollutants, risk, and potential health impacts can provide relevant information for communities to understand, identify, and prioritize potential exposures and risk from multiple sources. An assessment of existing databases and mapping tools was conducted as part of this study to explore the utility of publicly available databases, and three of these databases were selected for use in a community-level GIS mapping application. Queried data from the U.S. EPA's National-Scale Air Toxics Assessment, Air Quality System, and National Emissions Inventory were mapped at the appropriate spatial and temporal resolutions for identifying risks of exposure to air pollutants in two communities. The maps combine monitored and model-simulated pollutant and health risk estimates, along with local survey results, to assist communities with the identification of potential exposure sources and pollution hot spots. Findings from this case study analysis will provide information to advance the development of new tools to assist communities with environmental risk assessments and hazard prioritization. © 2010 Society for Risk Analysis.
Wind Energy Conversion System Analysis Model (WECSAM) computer program documentation
NASA Astrophysics Data System (ADS)
Downey, W. T.; Hendrick, P. L.
1982-07-01
Described is a computer-based wind energy conversion system analysis model (WECSAM) developed to predict the technical and economic performance of wind energy conversion systems (WECS). The model is written in CDC FORTRAN V. The version described accesses a data base containing wind resource data, application loads, WECS performance characteristics, utility rates, state taxes, and state subsidies for a six state region (Minnesota, Michigan, Wisconsin, Illinois, Ohio, and Indiana). The model is designed for analysis at the county level. The computer model includes a technical performance module and an economic evaluation module. The modules can be run separately or together. The model can be run for any single user-selected county within the region or looped automatically through all counties within the region. In addition, the model has a restart capability that allows the user to modify any data-base value written to a scratch file prior to the technical or economic evaluation.
Big Data and Total Hip Arthroplasty: How Do Large Databases Compare?
Bedard, Nicholas A; Pugely, Andrew J; McHugh, Michael A; Lux, Nathan R; Bozic, Kevin J; Callaghan, John J
2018-01-01
Use of large databases for orthopedic research has become extremely popular in recent years. Each database varies in the methods used to capture data and the population it represents. The purpose of this study was to evaluate how these databases differed in reported demographics, comorbidities, and postoperative complications for primary total hip arthroplasty (THA) patients. Primary THA patients were identified within the National Surgical Quality Improvement Program (NSQIP), the Nationwide Inpatient Sample (NIS), Medicare Standard Analytic Files (MED), and the Humana administrative claims database (HAC). NSQIP definitions for comorbidities and complications were matched to corresponding International Classification of Diseases, 9th Revision/Current Procedural Terminology codes to query the other databases. Demographics, comorbidities, and postoperative complications were compared. The number of patients from each database was 22,644 in HAC, 371,715 in MED, 188,779 in NIS, and 27,818 in NSQIP. Age and gender distribution were clinically similar. Overall, there was variation in prevalence of comorbidities and rates of postoperative complications between databases. As an example, NSQIP recorded more than twice the prevalence of obesity seen in NIS, and HAC and MED recorded more than twice the prevalence of diabetes seen in NSQIP. Rates of deep infection and stroke 30 days after THA differed more than 2-fold across databases. Among databases commonly used in orthopedic research, there is considerable variation in complication rates following THA depending upon the database used for analysis. It is important to consider these differences when critically evaluating database research. Additionally, with the advent of bundled payments, these differences must be considered in risk adjustment models. Copyright © 2017 Elsevier Inc. All rights reserved.
Todd, Thomas; Dunn, Natalie; Xiang, Zuoshuang; He, Yongqun
2016-01-01
Animal models are indispensable for vaccine research and development. However, choosing which species to use and designing a vaccine study that is optimized for that species is often challenging. Vaxar (http://www.violinet.org/vaxar/) is a web-based database and analysis system that stores manually curated data regarding vaccine-induced responses in animals. To date, Vaxar encompasses models from 35 animal species including rodents, rabbits, ferrets, primates, and birds. These 35 species have been used to study more than 1300 experimentally tested vaccines for 164 pathogens and diseases significant to humans and domestic animals. The responses to vaccines by animals in more than 1500 experimental studies are recorded in Vaxar; these data can be used for systematic meta-analysis of various animal responses to a particular vaccine. For example, several variables, including animal strain, animal age, and the dose or route of either vaccination or challenge, might affect host response outcomes. Vaxar can also be used to identify variables that affect responses to different vaccines in a specific animal model. All data stored in Vaxar are publicly available for web-based queries and analyses. Overall, Vaxar provides a unique systematic approach for understanding vaccine-induced host immunity. PMID:27053566
An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system
DOE Office of Scientific and Technical Information (OSTI.GOV)
AlQuraishi, Mohammed; Tang, Shengdong; Xia, Xide
2015-11-19
Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. We have developed an integrated affinity-structure database in which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis. Lastly, this database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.
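The quantitative binding summaries mentioned above are position weight matrices. As a minimal illustration of how a PWM is built from aligned binding sites, using a handful of hypothetical sequences rather than data from the database:

```python
import numpy as np

# Hypothetical aligned DNA binding sites of equal length
sites = ["TGTGA", "TGTGA", "TTTGA", "TGAGA", "TGTGC"]
alphabet = "ACGT"

# Per-position base counts
counts = np.zeros((len(sites[0]), 4))
for s in sites:
    for pos, base in enumerate(s):
        counts[pos, alphabet.index(base)] += 1

pseudo = 0.5                                            # pseudocount to avoid log(0)
probs = (counts + pseudo) / (counts.sum(axis=1, keepdims=True) + 4 * pseudo)
background = np.full(4, 0.25)                           # uniform background model
pwm = np.log2(probs / background)                       # log-odds score per position and base
```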
CTLA-4 and MDR1 polymorphisms increase the risk for ulcerative colitis: A meta-analysis
Zhao, Jia-Jun; Wang, Di; Yao, Hui; Sun, Da-Wei; Li, Hong-Yu
2015-01-01
AIM: To evaluate the correlations between cytotoxic T lymphocyte-associated antigen-4 (CTLA-4) and multi-drug resistance 1 (MDR1) genes polymorphisms with ulcerative colitis (UC) risk. METHODS: PubMed, EMBASE, Web of Science, Cochrane Library, CBM databases, Springerlink, Wiley, EBSCO, Ovid, Wanfang database, VIP database, China National Knowledge Infrastructure, and Weipu Journal databases were exhaustively searched using combinations of keywords relating to CTLA-4, MDR1 and UC. The published studies were filtered using our stringent inclusion and exclusion criteria, the quality assessment for each eligible study was conducted using Critical Appraisal Skill Program and the resultant high-quality data from final selected studies were analyzed using Comprehensive Meta-analysis 2.0 (CMA 2.0) software. The correlations between SNPs of CTLA-4 gene, MDR1 gene and the risk of UC were evaluated by OR at 95%CI. Z test was carried out to evaluate the significance of overall effect values. Cochran’s Q-statistic and I2 tests were applied to quantify heterogeneity among studies. Funnel plots, classic fail-safe N and Egger’s linear regression test were inspected for indication of publication bias. RESULTS: A total of 107 studies were initially retrieved and 12 studies were eventually selected for meta-analysis. These 12 case-control studies involved 1860 UC patients and 2663 healthy controls. Our major result revealed that single nucleotide polymorphisms (SNPs) of CTLA-4 gene rs3087243 G > A and rs231775 G > A may increase the risk of UC (rs3087243 G > A: allele model: OR = 1.365, 95%CI: 1.023-1.822, P = 0.035; dominant model: OR = 1.569, 95%CI: 1.269-1.940, P < 0.001; rs231775 G > A: allele model: OR = 1.583, 95%CI: = 1.306-1.918, P < 0.001; dominant model: OR = 1.805, 95%CI: 1.393-2.340, P < 0.001). In addition, based on our result, SNPs of MDR1 gene rs1045642 C > T might also confer a significant increases for the risk of UC (allele model: OR = 1.389, 95%CI: 1.214-1.590, P < 0.001; dominant model: OR = 1.518, 95%CI: 1.222-1.886, P < 0.001). CONCLUSION: CTLA-4 gene rs3087243 G > A and rs231775 G > A, and MDR1 gene rs1045642 C > T might confer an increase for UC risk. PMID:26379408
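The pooling reported above follows standard inverse-variance meta-analysis of odds ratios. As an illustration of the mechanics (fixed-effect pooling, Cochran's Q and I²) on made-up per-study values rather than the included studies:

```python
import numpy as np

# Placeholder per-study odds ratios and upper 95% CI bounds
or_i  = np.array([1.42, 1.20, 1.75, 1.33, 1.58])
ci_hi = np.array([1.90, 1.65, 2.60, 1.80, 2.30])

log_or = np.log(or_i)
se = (np.log(ci_hi) - log_or) / 1.96            # SE of log(OR) recovered from the CI width
w = 1.0 / se**2                                 # fixed-effect (inverse-variance) weights

pooled = np.sum(w * log_or) / np.sum(w)
pooled_se = np.sqrt(1.0 / np.sum(w))
Q = np.sum(w * (log_or - pooled) ** 2)          # Cochran's Q heterogeneity statistic
I2 = max(0.0, (Q - (len(or_i) - 1)) / Q) * 100  # I^2 in percent

print("pooled OR:", np.exp(pooled),
      "95% CI:", (np.exp(pooled - 1.96 * pooled_se), np.exp(pooled + 1.96 * pooled_se)),
      "Q:", Q, "I2:", I2)
```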
Huala, Eva; Dickerman, Allan W.; Garcia-Hernandez, Margarita; Weems, Danforth; Reiser, Leonore; LaFond, Frank; Hanley, David; Kiphart, Donald; Zhuang, Mingzhe; Huang, Wen; Mueller, Lukas A.; Bhattacharyya, Debika; Bhaya, Devaki; Sobral, Bruno W.; Beavis, William; Meinke, David W.; Town, Christopher D.; Somerville, Chris; Rhee, Seung Yon
2001-01-01
Arabidopsis thaliana, a small annual plant belonging to the mustard family, is the subject of study by an estimated 7000 researchers around the world. In addition to the large body of genetic, physiological and biochemical data gathered for this plant, it will be the first higher plant genome to be completely sequenced, with completion expected at the end of the year 2000. The sequencing effort has been coordinated by an international collaboration, the Arabidopsis Genome Initiative (AGI). The rationale for intensive investigation of Arabidopsis is that it is an excellent model for higher plants. In order to maximize use of the knowledge gained about this plant, there is a need for a comprehensive database and information retrieval and analysis system that will provide user-friendly access to Arabidopsis information. This paper describes the initial steps we have taken toward realizing these goals in a project called The Arabidopsis Information Resource (TAIR) (www.arabidopsis.org). PMID:11125061
NASA Astrophysics Data System (ADS)
Pecoraro, Gaetano; Calvello, Michele
2017-04-01
In Italy, rainfall-induced landslides pose a significant and widespread hazard, resulting in a large number of casualties and enormous economic damages. Mitigation of such a diffuse risk cannot be attained with structural measures only. With respect to the risk to life, early warning systems represent a viable and useful tool for landslide risk mitigation over wide areas. Inventories of rainfall-induced landslides are critical to support investigations of where and when landslides have happened and may occur in the future, i.e. to establish reliable correlations between rainfall characteristics and landslide occurrences. In this work a parametric study has been conducted to evaluate the performance of correlation models between rainfall and landslides over the Italian territory using the "FraneItalia" database, an inventory of landslides retrieved from online Italian journalistic news. The information reported for each record of this database always includes: the site of occurrence of the landslides, the date of occurrence, and the source of the news. Multiple landslides occurring on the same date, within the same province or region, are inventoried together in one single record of the database, in this case also reporting the number of landslides of the event. Each record of the database may also include, if the related information is available: hour of occurrence; typology, volume and material of the landslide; activity phase; effects on people, structures, infrastructures, cars or other elements. The database currently contains six complete years of data (2010-2015), including more than 4000 landslide reports, most of them triggered by rainfall. For the aim of this study, different rainfall-landslide correlation models have been tested by analysing the reported landslides, within all the 144 zones identified by the national civil protection for weather-related warnings in Italy, in relation to satellite-based precipitation estimates from the Global Precipitation Measurement (GPM) NASA mission. This remote sensing database contains gridded precipitation and precipitation-error estimates, with a half-hour temporal resolution and a 0.10-degree spatial resolution, covering most of the earth starting from 2014. It is well known that satellite estimates of rainfall have some limitations in resolving specific rainfall features (e.g., shallow orographic events and short-duration, high-intensity events), yet the temporal and spatial accuracy of the GPM data may be considered adequate in relation to the scale of the analysis and the size of the warning zones used for this study. The results of the parametric analysis conducted herein, although providing some indications on the most relevant rainfall conditions leading to widespread landsliding over a warning zone, must be considered preliminary as they show a very heterogeneous behaviour of the employed rainfall-based warning models over the Italian territory. Nevertheless, they clearly show the strong potential of the continuous multi-year landslide records available from the "FraneItalia" database as an important source of information to evaluate the performance of warning models at regional scale throughout Italy.
Geographic information system/watershed model interface
Fisher, Gary T.
1989-01-01
Geographic information systems allow for the interactive analysis of spatial data related to water-resources investigations. A conceptual design for an interface between a geographic information system and a watershed model includes functions for the estimation of model parameter values. Design criteria include ease of use, minimal equipment requirements, a generic data-base management system, and use of a macro language. An application is demonstrated for a 90.1-square-kilometer subbasin of the Patuxent River near Unity, Maryland, that performs automated derivation of watershed parameters for hydrologic modeling.
NASA Astrophysics Data System (ADS)
Singh, Shailesh Kumar; Zammit, Christian; Hreinsson, Einar; Woods, Ross; Clark, Martyn; Hamlet, Alan
2013-04-01
Increased access to water is a key pillar of the New Zealand government's plan for economic growth. Variable climatic conditions, coupled with market drivers and increased demand on water resources, result in critical decisions being made by water managers on the basis of climate and streamflow forecasts. Because many of these decisions have serious economic implications, accurate forecasts of climate and streamflow are of paramount importance (e.g. for irrigated agriculture and electricity generation). New Zealand currently does not have a centralized, comprehensive, and state-of-the-art system in place for providing operational seasonal to interannual streamflow forecasts to guide water resources management decisions. As a pilot effort, we implement and evaluate an experimental ensemble streamflow forecasting system for the Waitaki and Rangitata River basins on New Zealand's South Island using a hydrologic simulation model (TopNet) and the familiar ensemble streamflow prediction (ESP) paradigm for estimating forecast uncertainty. To provide a comprehensive database for evaluation of the forecasting system, a set of retrospective model states simulated by the hydrologic model on the first day of each month was first archived for 1972-2009. Then, using the hydrologic simulation model, each of these historical model states was paired with the retrospective temperature and precipitation time series from each historical water year to create a database of retrospective hindcasts. Using the resulting database, the relative importance of initial state variables (such as soil moisture and snowpack) as fundamental drivers of forecast uncertainty was evaluated for different seasons and lead times. The analysis indicates that the sensitivity of flow forecasts to initial-condition uncertainty depends on the hydrological regime and the season of the forecast. However, initial conditions do not have a large impact on seasonal flow uncertainties for snow-dominated catchments. Further analysis indicates that this result remains valid when the hindcast database is conditioned by ENSO classification. As a result, hydrological forecasts based on the ESP technique, in which present initial conditions are paired with historical forcing data, may be plausible for New Zealand catchments.
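The ESP paradigm described above is simple to sketch: the current initial state is run forward once per historical forcing year, and the spread of the resulting traces expresses forecast uncertainty. The sketch below uses a toy linear-reservoir stand-in for the hydrologic model (TopNet itself is not reproduced here); all names and numbers are illustrative assumptions.

```python
import numpy as np

def esp_hindcast(initial_state, historical_forcings, simulate):
    """Ensemble streamflow prediction (ESP) sketch: pair one initial model
    state with the forcing time series of every historical year to obtain
    an ensemble of equally likely flow traces. `simulate(state, forcing)`
    stands in for the hydrologic model and must return a flow series."""
    return np.array([simulate(initial_state, forcing)
                     for forcing in historical_forcings])

def toy_model(state, forcing):
    """A fake linear reservoir used only to make the sketch runnable."""
    flows, storage = [], state
    for p in forcing:                 # p = precipitation for one time step
        storage += p
        q = 0.1 * storage             # outflow proportional to storage
        storage -= q
        flows.append(q)
    return np.array(flows)

rng = np.random.default_rng(0)
forcings = [rng.gamma(2.0, 2.0, size=90) for _ in range(30)]   # 30 historical seasons
ensemble = esp_hindcast(initial_state=50.0, historical_forcings=forcings,
                        simulate=toy_model)
print(ensemble.shape)                 # (30, 90): one trace per historical year
```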
Analysis of model development strategies: predicting ventral hernia recurrence.
Holihan, Julie L; Li, Linda T; Askenasy, Erik P; Greenberg, Jacob A; Keith, Jerrod N; Martindale, Robert G; Roth, J Scott; Liang, Mike K
2016-11-01
There have been many attempts to identify variables associated with ventral hernia recurrence; however, it is unclear which statistical modeling approach results in models with greatest internal and external validity. We aim to assess the predictive accuracy of models developed using five common variable selection strategies to determine variables associated with hernia recurrence. Two multicenter ventral hernia databases were used. Database 1 was randomly split into "development" and "internal validation" cohorts. Database 2 was designated "external validation". The dependent variable for model development was hernia recurrence. Five variable selection strategies were used: (1) "clinical"-variables considered clinically relevant, (2) "selective stepwise"-all variables with a P value <0.20 were assessed in a step-backward model, (3) "liberal stepwise"-all variables were included and step-backward regression was performed, (4) "restrictive internal resampling," and (5) "liberal internal resampling." Variables were included with P < 0.05 for the Restrictive model and P < 0.10 for the Liberal model. A time-to-event analysis using Cox regression was performed using these strategies. The predictive accuracy of the developed models was tested on the internal and external validation cohorts using Harrell's C-statistic where C > 0.70 was considered "reasonable". The recurrence rate was 32.9% (n = 173/526; median/range follow-up, 20/1-58 mo) for the development cohort, 36.0% (n = 95/264, median/range follow-up 20/1-61 mo) for the internal validation cohort, and 12.7% (n = 155/1224, median/range follow-up 9/1-50 mo) for the external validation cohort. Internal validation demonstrated reasonable predictive accuracy (C-statistics = 0.772, 0.760, 0.767, 0.757, 0.763), while on external validation, predictive accuracy dipped precipitously (C-statistic = 0.561, 0.557, 0.562, 0.553, 0.560). Predictive accuracy was equally adequate on internal validation among models; however, on external validation, all five models failed to demonstrate utility. Future studies should report multiple variable selection techniques and demonstrate predictive accuracy on external data sets for model validation. Copyright © 2016 Elsevier Inc. All rights reserved.
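Harrell's C-statistic used for validation in this study is a pairwise concordance measure. The following is a minimal sketch of its computation on synthetic survival-style data (not the study's cohorts), handling only the simple censoring rule that a pair is usable when the earlier time corresponds to an observed recurrence.

```python
import numpy as np

def harrell_c(time, event, risk_score):
    """Minimal Harrell's C-statistic: among usable pairs (the earlier time is
    an observed event), count pairs where the higher predicted risk belongs
    to the subject who fails earlier; ties in risk count as 0.5."""
    concordant, usable = 0.0, 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            if time[i] < time[j] and event[i] == 1:   # i recurs before j
                usable += 1
                if risk_score[i] > risk_score[j]:
                    concordant += 1.0
                elif risk_score[i] == risk_score[j]:
                    concordant += 0.5
    return concordant / usable

# Toy example: months to recurrence, recurrence indicator, model risk score
time  = np.array([ 5, 12, 20, 33, 40])
event = np.array([ 1,  1,  0,  1,  0])
risk  = np.array([0.9, 0.6, 0.4, 0.5, 0.2])
print(round(harrell_c(time, event, risk), 3))   # values > 0.70 read as "reasonable"
```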
CicerTransDB 1.0: a resource for expression and functional study of chickpea transcription factors.
Gayali, Saurabh; Acharya, Shankar; Lande, Nilesh Vikram; Pandey, Aarti; Chakraborty, Subhra; Chakraborty, Niranjan
2016-07-29
Transcription factor (TF) databases are a major resource for systematic studies of TFs in specific species as well as related family members. Even though there are several publicly available multi-species databases, the information on the amount and diversity of TFs within individual species is fragmented, especially for newly sequenced genomes of non-model species of agricultural significance. We constructed CicerTransDB (Cicer Transcription Factor Database), the first database of its kind, to provide a centralized, putatively complete list of TFs in a food legume, chickpea. CicerTransDB, available at www.cicertransdb.esy.es , is based on chickpea (Cicer arietinum L.) annotation v 1.0. The database is an outcome of a genome-wide domain study and manual classification of TF families. The database provides not only gene information but also gene ontology, domain and motif architecture. CicerTransDB v 1.0 comprises information on 1124 genes of chickpea and enables the user not only to search, browse and download sequences but also to retrieve sequence features. CicerTransDB also provides several single-click interfaces connecting to various other databases to ease further analysis. Several web APIs integrated in the database allow end-users direct access to the data. A critical comparison of CicerTransDB with PlantTFDB (Plant Transcription Factor Database) revealed 68 novel TFs in the chickpea genome, hitherto unexplored. Database URL: http://www.cicertransdb.esy.es.
Machine learning for medical images analysis.
Criminisi, A
2016-10-01
This article discusses the application of machine learning to the analysis of medical images. Specifically: (i) we show how a special type of learning model can be thought of as an automatically optimized, hierarchically structured, rule-based algorithm, and (ii) we discuss how the issue of collecting large labelled datasets applies to conventional algorithms as well as to machine learning techniques. The size of the training database is a function of model complexity rather than a characteristic of machine learning methods. Crown Copyright © 2016. Published by Elsevier B.V. All rights reserved.
Burnett, Leslie; Barlow-Stewart, Kris; Proos, Anné L; Aizenberg, Harry
2003-05-01
This article describes a generic model for access to samples and information in human genetic databases. The model utilises a "GeneTrustee", a third-party intermediary independent of the subjects and of the investigators or database custodians. The GeneTrustee model has been implemented successfully in various community genetics screening programs and has facilitated research access to genetic databases while protecting the privacy and confidentiality of research subjects. The GeneTrustee model could also be applied to various types of non-conventional genetic databases, including neonatal screening Guthrie card collections, and to forensic DNA samples.
Examples of finite element mesh generation using SDRC IDEAS
NASA Technical Reports Server (NTRS)
Zapp, John; Volakis, John L.
1990-01-01
IDEAS (Integrated Design Engineering Analysis Software) offers a comprehensive package for mechanical design engineers. Due to its multifaceted capabilities, however, it can also be adapted to serve the needs of electrical engineers. IDEAS can be used to perform the following tasks: system modeling, system assembly, kinematics, finite element pre/post processing, finite element solution, system dynamics, drafting, test data analysis, and project relational database.
Nerve growth factor for Bell’s palsy: A meta-analysis
SU, YIPENG; DONG, XIAOMENG; LIU, JUAN; HU, YAOZHI; CHEN, JINBO
2015-01-01
A meta-analysis was performed to evaluate the efficacy and safety of nerve growth factor (NGF) in the treatment of Bell’s palsy. PubMed, the Cochrane Central Register of Controlled Trials, Embase and a number of Chinese databases, including the China National Knowledge Infrastructure, China Biology Medicine disc, VIP Database for Chinese Technical Periodicals and Wan Fang Data, were used to collect randomised controlled trials (RCTs) of NGF for Bell’s palsy. The span of the search covered data from the date of database establishment until December 2013. The included trials were screened comprehensively and rigorously. The efficacies of NGF were pooled via meta-analysis performed using Review Manager 5.2 software. Odds ratios (ORs) and 95% confidence intervals (CIs) were calculated using the fixed-effects model. The meta-analysis of eight RCTs showed favorable effects of NGF on the disease response rate (n=642; OR, 3.87; 95% CI, 2.13–7.03; P<0.01; I2=0%). However, evidence supporting the effectiveness of NGF for the treatment of Bell’s palsy is limited. The number and quality of trials are too low to form solid conclusions. Further meticulous RCTs are required to overcome the limitations identified in the present study. PMID:25574223
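The fixed-effects pooling reported above (odds ratios combined across trials) can be sketched with inverse-variance weighting of log odds ratios. The 2x2 tables below are synthetic placeholders, not the data of the eight included RCTs.

```python
import numpy as np

def pooled_or_fixed(tables):
    """Inverse-variance fixed-effect pooling of log odds ratios.
    Each table is (a, b, c, d): events/non-events in treatment, then control."""
    log_or, weights = [], []
    for a, b, c, d in tables:
        lo = np.log((a * d) / (b * c))
        var = 1 / a + 1 / b + 1 / c + 1 / d      # Woolf variance of log OR
        log_or.append(lo)
        weights.append(1.0 / var)
    log_or, weights = np.array(log_or), np.array(weights)
    pooled = np.sum(weights * log_or) / np.sum(weights)
    se = np.sqrt(1.0 / np.sum(weights))
    ci = np.exp([pooled - 1.96 * se, pooled + 1.96 * se])
    return np.exp(pooled), ci

# Toy 2x2 tables (responders/non-responders, NGF arm vs. control arm)
tables = [(30, 10, 20, 20), (25, 15, 18, 22), (40, 8, 28, 20)]
or_hat, (lo, hi) = pooled_or_fixed(tables)
print(f"pooled OR = {or_hat:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```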
DHLAS: A web-based information system for statistical genetic analysis of HLA population data.
Thriskos, P; Zintzaras, E; Germenis, A
2007-03-01
DHLAS (database HLA system) is a user-friendly, web-based information system for the analysis of human leukocyte antigen (HLA) data from population studies. DHLAS has been developed using JAVA and the R system; it runs on a Java Virtual Machine and its user interface is web-based, powered by the servlet engine TOMCAT. It utilizes STRUTS, a Model-View-Controller framework, and uses several GNU packages to perform several of its tasks. The database engine it relies upon for fast access is MySQL, but others can be used as well. The system estimates metrics, performs statistical testing and produces graphs required for HLA population studies: (i) Hardy-Weinberg equilibrium (calculated using both asymptotic and exact tests), (ii) genetic distances (Euclidean or Nei), (iii) phylogenetic trees using the unweighted pair group method with averages and the neighbor-joining method, (iv) linkage disequilibrium (pairwise and overall, including variance estimations), (v) haplotype frequencies (estimated using the expectation-maximization algorithm) and (vi) discriminant analysis. The main merit of DHLAS is the incorporation of a database, so that the data can be stored and manipulated alongside integrated genetic data analysis procedures. In addition, it has an open architecture allowing the inclusion of other functions and procedures.
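As one example of the statistics listed above, the asymptotic Hardy-Weinberg test for a single biallelic locus reduces to a one-degree-of-freedom chi-square comparison of observed and expected genotype counts. A minimal sketch (synthetic counts, not DHLAS output) follows.

```python
from scipy.stats import chi2

def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Asymptotic Hardy-Weinberg test for one biallelic locus:
    compare observed genotype counts with p^2 : 2pq : q^2 expectations."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)             # allele frequency of A
    q = 1 - p
    expected = [p * p * n, 2 * p * q * n, q * q * n]
    observed = [n_AA, n_Aa, n_aa]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, chi2.sf(stat, df=1)            # 1 degree of freedom

stat, pval = hwe_chi_square(50, 40, 10)
print(f"chi2 = {stat:.3f}, p = {pval:.3f}")
```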
Design and Establishment of Quality Model of Fundamental Geographic Information Database
NASA Astrophysics Data System (ADS)
Ma, W.; Zhang, J.; Zhao, Y.; Zhang, P.; Dang, Y.; Zhao, T.
2018-04-01
In order to make the quality evaluation of Fundamental Geographic Information Databases (FGIDB) more comprehensive, objective and accurate, this paper studies and establishes a quality model of FGIDB, formed by the standardization of database construction and quality control, the conformity of dataset quality, and the functionality of the database management system. It also designs the overall principles, contents and methods of quality evaluation for FGIDB, providing a basis and reference for carrying out quality control and quality evaluation. The quality elements, evaluation items and properties of the Fundamental Geographic Information Database are designed step by step on the basis of the quality model framework. Organically connected, these quality elements and evaluation items constitute the quality model of the Fundamental Geographic Information Database. This model is the foundation for stipulating quality requirements and for quality evaluation of the Fundamental Geographic Information Database, and is of great significance for quality assurance in the design and development stage, for formulating requirements in the testing and evaluation stage, and for building a standard system for quality evaluation technology of the Fundamental Geographic Information Database.
Flood susceptibility analysis through remote sensing, GIS and frequency ratio model
NASA Astrophysics Data System (ADS)
Samanta, Sailesh; Pal, Dilip Kumar; Palsamanta, Babita
2018-05-01
Papua New Guinea (PNG) is saddled with frequent natural disasters like earthquakes, volcanic eruptions, landslides, droughts, floods etc. Flood, as a hydrological disaster to humankind's niche, brings about a powerful and often sudden, pernicious change in the surface distribution of water on land, while the benevolence of flood manifests in restoring the health of the thalweg from excessive siltation by redistributing the fertile sediments on the riverine floodplains. From a social, economic and environmental perspective, flood is one of the most devastating disasters in PNG. This research was conducted to investigate the usefulness of remote sensing, geographic information systems and the frequency ratio (FR) model for flood susceptibility mapping. The FR model was used to handle different independent variables via weighted bivariate probability values to generate a plausible flood susceptibility map. This study was conducted in the Markham riverine precinct under Morobe province in PNG. A historical flood inventory database of the PNG resource information system (PNGRIS) was used to generate 143 flood locations based on "create fishnet" analysis. 100 (70%) flood sample locations were selected randomly for model building. Ten independent variables, namely land use/land cover, elevation, slope, topographic wetness index, surface runoff, landform, lithology, distance from the main river, soil texture and soil drainage, were used in the FR model for flood vulnerability analysis. Finally, the database was developed for areas vulnerable to flood. The result demonstrated a span of FR values ranging from 2.66 (least flood prone) to 19.02 (most flood prone) for the study area. The developed database was reclassified into five (5) flood vulnerability zones by segmenting the FR values, namely very low (less than 5.0), low (5.0-7.5), moderate (7.5-10.0), high (10.0-12.5) and very high susceptibility (more than 12.5). The result indicated that about 19.4% of the land area falls in the 'very high' and 35.8% in the 'high' flood vulnerability class. The FR model output was validated with the remaining 43 (30%) flood points, of which 42 were correctly predicted, an accuracy of 97.7%. A total of 137,292 people live in these vulnerable zones. The flood susceptibility analysis using this model will be a very useful and efficient tool for local government administrators, researchers and planners in devising flood mitigation plans.
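The frequency ratio underlying the model above is the share of flood occurrences in a factor class divided by the share of the study area occupied by that class. A minimal sketch on a synthetic raster (not the PNGRIS data) is shown below.

```python
import numpy as np

def frequency_ratio(class_labels, flood_mask):
    """Frequency ratio for each class of one conditioning factor:
    FR = (share of flood cells in the class) / (share of all cells in the class).
    Values > 1 indicate above-average flood susceptibility for that class."""
    fr = {}
    total_cells = class_labels.size
    total_floods = flood_mask.sum()
    for c in np.unique(class_labels):
        in_class = class_labels == c
        pct_floods = flood_mask[in_class].sum() / total_floods
        pct_area = in_class.sum() / total_cells
        fr[c] = pct_floods / pct_area
    return fr

# Toy raster: 3 land-cover classes and a binary flood-inventory mask
rng = np.random.default_rng(1)
classes = rng.integers(1, 4, size=(100, 100))
floods = rng.random((100, 100)) < np.where(classes == 1, 0.05, 0.01)
print(frequency_ratio(classes, floods))   # class 1 should come out highest
```

In the usual FR workflow, the per-factor FR values of each cell are summed across all conditioning factors to give the susceptibility index that is then reclassified into zones such as those described above.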
BioNetSim: a Petri net-based modeling tool for simulations of biochemical processes.
Gao, Junhui; Li, Li; Wu, Xiaolin; Wei, Dong-Qing
2012-03-01
BioNetSim, a Petri net-based software tool for modeling and simulating biochemical processes, is developed; its design and implementation are presented in this paper, including the logic construction and real-time access to KEGG (Kyoto Encyclopedia of Genes and Genomes) and the BioModel database. Furthermore, glycolysis is simulated as an example of its application. BioNetSim is a helpful tool for researchers to download data, model biological networks, and simulate complicated biochemical processes. Gene regulatory networks, metabolic pathways, signaling pathways, and kinetics of cell interaction are all available in BioNetSim, which makes modeling more efficient and effective. Like other Petri net-based software, BioNetSim performs well in graphical application and mathematical construction. Moreover, it offers several notable advantages. (1) It creates models in a database. (2) It provides real-time access to KEGG and BioModel and transfers the data into Petri nets. (3) It provides qualitative analysis, such as computation of constants. (4) It generates graphs for tracing the concentration of every molecule during the simulation process.
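The Petri net formalism at the core of such tools can be illustrated with its basic firing rule: a transition consumes tokens from its input places and produces tokens in its output places. The sketch below is a generic toy example (a single reaction treated as one transition), not BioNetSim code.

```python
def enabled(marking, pre):
    """A transition is enabled when every input place holds enough tokens."""
    return all(marking.get(p, 0) >= w for p, w in pre.items())

def fire(marking, pre, post):
    """Fire one enabled transition: consume tokens from input places (pre)
    and produce tokens in output places (post)."""
    if not enabled(marking, pre):
        raise ValueError("transition not enabled")
    new = dict(marking)
    for p, w in pre.items():
        new[p] -= w
    for p, w in post.items():
        new[p] = new.get(p, 0) + w
    return new

# Toy reaction "glucose + ATP -> G6P + ADP" as a single Petri net transition
marking = {"glucose": 3, "ATP": 2, "G6P": 0, "ADP": 0}
pre  = {"glucose": 1, "ATP": 1}
post = {"G6P": 1, "ADP": 1}
while enabled(marking, pre):
    marking = fire(marking, pre, post)
print(marking)   # {'glucose': 1, 'ATP': 0, 'G6P': 2, 'ADP': 2}
```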
Ho, Lap; Cheng, Haoxiang; Wang, Jun; Simon, James E; Wu, Qingli; Zhao, Danyue; Carry, Eileen; Ferruzzi, Mario G; Faith, Jeremiah; Valcarcel, Breanna; Hao, Ke; Pasinetti, Giulio M
2018-03-05
The development of a given botanical preparation for eventual clinical application requires extensive, detailed characterization of the chemical composition, as well as the biological availability, biological activity, and safety profiles of the botanical. These issues are typically addressed using diverse experimental protocols and model systems. Based on this consideration, in this study we established a comprehensive database and analysis framework for the collection, collation, and integrative analysis of diverse, multiscale data sets. Using this framework, we conducted an integrative analysis of heterogeneous data from in vivo and in vitro investigation of a complex bioactive dietary polyphenol-rich preparation (BDPP) and built an integrated network linking data sets generated from this multitude of diverse experimental paradigms. We established a comprehensive database and analysis framework as well as a systematic and logical means to catalogue and collate the diverse array of information gathered, which is securely stored and added to in a standardized manner to enable fast query. We demonstrated the utility of the database in (1) a statistical ranking scheme to prioritize responses to treatments and (2) in-depth reconstruction of functionality studies. By examination of these data sets, the system allows analytical querying of heterogeneous data and access to information related to interactions, mechanisms of action, functions, etc., which ultimately provides a global overview of complex biological responses. Collectively, we present an integrative analysis framework that leads to novel insights on the biological activities of a complex botanical such as BDPP, based on data-driven characterization of interactions between BDPP-derived phenolic metabolites and their mechanisms of action, as well as synergism and/or potential cancellation of biological functions. Our integrative analytical approach provides novel means for a systematic integrative analysis of heterogeneous data types in the development of complex botanicals such as polyphenols for eventual clinical and translational applications.
Geospatial Database for Strata Objects Based on Land Administration Domain Model (ladm)
NASA Astrophysics Data System (ADS)
Nasorudin, N. N.; Hassan, M. I.; Zulkifli, N. A.; Rahman, A. Abdul
2016-09-01
Recently in our country, building construction has become more complex, and a strata objects database has become more important for registering the real world as people now own and use multiple levels of space. Furthermore, strata titles are increasingly important and need to be well managed. LADM, also known as ISO 19152, is a standard model for land administration that allows integrated 2D and 3D representation of spatial units. The aim of this paper is to develop a strata objects database using LADM. The paper discusses the current 2D geospatial database and the need for a 3D geospatial database in the future. It also attempts to develop a strata objects database using the standard data model (LADM) and to analyze the developed strata objects database against that data model. The current cadastre system in Malaysia, including strata titles, is discussed. The problems of the 2D geospatial database are listed, and the future need for a 3D geospatial database is also discussed. The design of the strata objects database proceeds through conceptual, logical and physical database design. The strata objects database allows us to retrieve both non-spatial and spatial strata title information and thus shows the location of each strata unit. The development of the strata objects database may help in handling strata titles and related information.
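To make the conceptual design step more concrete, the sketch below maps a few core LADM (ISO 19152) classes to simple Python types for a strata unit; the attribute names and example values are simplified assumptions for illustration, not the schema developed in the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class LAParty:                 # LA_Party: the owner of a strata parcel
    party_id: str
    name: str

@dataclass
class LASpatialUnit:           # LA_SpatialUnit: here a 3D strata volume
    su_id: str
    storey: int
    footprint: List[Tuple[float, float]]   # 2D outline of the unit
    height_range: Tuple[float, float]      # lower/upper elevation (metres)

@dataclass
class LABAUnit:                # LA_BAUnit: the registered strata title
    title_no: str
    spatial_units: List[LASpatialUnit]

@dataclass
class LARRR:                   # LA_RRR: right/restriction linking party and unit
    rrr_type: str              # e.g. "ownership"
    party: LAParty
    baunit: LABAUnit

unit = LASpatialUnit("SU-12", storey=3,
                     footprint=[(0, 0), (10, 0), (10, 8), (0, 8)],
                     height_range=(9.0, 12.0))
title = LABAUnit("GRN-4567", [unit])
print(LARRR("ownership", LAParty("P-1", "Hypothetical Owner"), title))
```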
Long term volcanic hazard analysis in the Canary Islands
NASA Astrophysics Data System (ADS)
Becerril, L.; Galindo, I.; Laín, L.; Llorente, M.; Mancebo, M. J.
2009-04-01
Historic volcanism in Spain is restricted to the Canary Islands, a volcanic archipelago formed by seven volcanic islands. Several historic eruptions have been registered in the last five hundred years. However, despite the large number of residents and tourists in the archipelago, only a few volcanic hazard studies have been carried out. These studies focus mainly on the development of hazard maps for the islands of Lanzarote and Tenerife, especially for land-use planning. The main handicap for such studies in the Canary Islands is the lack of well-reported historical eruptions, as well as the lack of geochronological, geochemical and structural data. In recent years, the use of Geographical Information Systems (GIS) and improvements in the modelling of volcanic processes have provided important tools for volcanic hazard assessment. Although these sophisticated programs are very useful, they need to be fed with a large amount of data that, as in the case of the Canary Islands, is sometimes not available. For this reason, the Spanish Geological Survey (IGME) is developing a complete geo-referenced database for long-term volcanic analysis in the Canary Islands. The Canarian Volcanic Hazard Database (HADA) is based on a GIS that helps to organize and manage volcanic information efficiently. HADA includes the following groups of information: (1) 1:25,000 scale geologic maps, (2) 1:25,000 topographic maps, (3) geochronologic data, (4) geochemical data, (5) structural information, (6) climatic data. Data must pass a quality control before they are included in the database. New data are easily integrated into the database. With the HADA database the IGME has started a systematic organization of the existing data. In the near future, the IGME will generate new information to be included in HADA, such as volcanological maps of the islands, structural information, geochronological data and other information to support long-term volcanic hazard analysis. HADA will provide information of sufficient quality to map volcanic hazards and to run more reliable volcanic hazard models; in addition, it aims to become a sharing system, improving communication between researchers, reducing redundant work and serving as the reference for geological research in the Canary Islands.
Aligning Event Logs to Task-Time Matrix Clinical Pathways in BPMN for Variance Analysis.
Yan, Hui; Van Gorp, Pieter; Kaymak, Uzay; Lu, Xudong; Ji, Lei; Chiau, Choo Chiap; Korsten, Hendrikus H M; Duan, Huilong
2018-03-01
Clinical pathways (CPs) are popular healthcare management tools to standardize care and ensure quality. Analyzing CP compliance levels and variances is known to be useful for training and CP redesign purposes. The flexible semantics of the business process model and notation (BPMN) language has been shown to be useful for the modeling and analysis of complex protocols. However, in practical cases one may want to exploit the fact that CPs often have the form of task-time matrices. This paper presents a new method for parsing complex BPMN models and aligning traces to the models heuristically. A case study on variance analysis is undertaken, where a CP from practice and two large sets of patient data from an electronic medical record (EMR) database are used. The results demonstrate that automated variance analysis between BPMN task-time models and real-life EMR data is feasible, whereas that was not the case for existing analysis techniques. We also provide meaningful insights for further improvement.
NASA Astrophysics Data System (ADS)
Vannier, Olivier; Braud, Isabelle; Anquetin, Sandrine
2013-04-01
The estimation of catchment-scale soil properties, such as water storage capacity and hydraulic conductivity, is of primary interest for the implementation of distributed hydrological models at the regional scale. This estimation is generally done on the basis of information provided by soil databases. However, such databases are often established for agronomic uses and generally do not document deep weathered rock horizons (i.e. pedologic horizons of type C and deeper), which can play a major role in water transfer and storage. Here we define the Drainable Storage Capacity Index (DSCI), an indicator that relies on the comparison of cumulated streamflow and precipitation to assess catchment-scale storage capacities. The DSCI is found to be reliable for detecting underestimation of soil storage capacities in soil databases. We also use the streamflow recession analysis methodology defined by Brutsaert and Nieber (Water Resources Research 13(3), 1977) to estimate water storage capacities and lateral saturated hydraulic conductivities of the non-documented deep horizons. The analysis is applied to a sample of twenty-three catchments (0.2 km² - 291 km²) located in the Cévennes-Vivarais region (south of France). For regionalisation purposes, the obtained results are compared with the dominant catchment geology. This highlights a clear hierarchy between the different geologies present in the area. Hard crystalline rocks are found to be associated with the thickest and least conductive deep soil horizons. Schist rocks present intermediate values of thickness and saturated hydraulic conductivity, whereas sedimentary rocks and alluvium are found to have the thinnest and most conductive horizons. Consequently, deep soil layers with thicknesses and hydraulic conductivities differing with the geology were added to a distributed hydrological model implemented over the Cévennes-Vivarais region. Preliminary simulations show a major improvement in terms of simulated discharge when compared with simulations done without deep soil layers. KEY WORDS: hydraulic soil properties, streamflow recession, deep soil horizons, soil databases, Boussinesq equation, storage capacity, regionalisation
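The recession analysis of Brutsaert and Nieber (1977) referenced above fits -dQ/dt = aQ^b on recession limbs, typically by regression in log-log space. A minimal sketch on a synthetic flow series (not the Cévennes-Vivarais data) follows.

```python
import numpy as np

def brutsaert_nieber_fit(q):
    """Fit -dQ/dt = a * Q**b on recession limbs (flow strictly decreasing),
    by ordinary least squares in log-log space, following the classical
    Brutsaert and Nieber (1977) approach in its simplest form."""
    dq = np.diff(q)
    q_mid = 0.5 * (q[1:] + q[:-1])
    mask = dq < 0                          # keep recession steps only
    x = np.log(q_mid[mask])
    y = np.log(-dq[mask])
    b, log_a = np.polyfit(x, y, 1)         # slope = b, intercept = log(a)
    return np.exp(log_a), b

# Synthetic recession from a nonlinear reservoir, with one small rain pulse
q = [10.0]
for t in range(1, 60):
    q.append(max(q[-1] - 0.05 * q[-1] ** 1.5, 0.1) + (2.0 if t == 30 else 0.0))
a, b = brutsaert_nieber_fit(np.array(q))
print(f"a = {a:.3f}, b = {b:.2f}")          # b near 1.5 for this toy reservoir
```

In the study's setting, the fitted a and b would then be related to storage capacity and lateral saturated hydraulic conductivity of the deep horizons via the Boussinesq-based expressions cited above.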
NASA Astrophysics Data System (ADS)
Kadow, Christopher; Illing, Sebastian; Kunst, Oliver; Schartner, Thomas; Kirchner, Ingo; Rust, Henning W.; Cubasch, Ulrich; Ulbrich, Uwe
2016-04-01
The Freie Univ Evaluation System Framework (Freva - freva.met.fu-berlin.de) is a software infrastructure for standardized data and tool solutions in Earth system science. Freva runs on high performance computers to handle customizable evaluation systems of research projects, institutes or universities. It combines different software technologies into one common hybrid infrastructure, including all features present in the shell and web environment. The database interface satisfies the international standards provided by the Earth System Grid Federation (ESGF). Freva indexes different data projects into one common search environment by storing the meta data information of the self-describing model, reanalysis and observational data sets in a database. This implemented meta data system with its advanced but easy-to-handle search tool supports users, developers and their plugins to retrieve the required information. A generic application programming interface (API) allows scientific developers to connect their analysis tools with the evaluation system independently of the programming language used. Users of the evaluation techniques benefit from the common interface of the evaluation system without any need to understand the different scripting languages. Facilitation of the provision and usage of tools and climate data automatically increases the number of scientists working with the data sets and identifying discrepancies. The integrated web-shell (shellinabox) adds a degree of freedom in the choice of the working environment and can be used as a gate to the research projects HPC. Plugins are able to integrate their e.g. post-processed results into the database of the user. This allows e.g. post-processing plugins to feed statistical analysis plugins, which fosters an active exchange between plugin developers of a research project. Additionally, the history and configuration sub-system stores every analysis performed with the evaluation system in a database. Configurations and results of the tools can be shared among scientists via shell or web system. Therefore, plugged-in tools benefit from transparency and reproducibility. Furthermore, if configurations match while starting an evaluation plugin, the system suggests to use results already produced by other users - saving CPU/h, I/O, disk space and time. The efficient interaction between different technologies improves the Earth system modeling science framed by Freva.
NASA Astrophysics Data System (ADS)
Kadow, C.; Illing, S.; Schartner, T.; Grieger, J.; Kirchner, I.; Rust, H.; Cubasch, U.; Ulbrich, U.
2017-12-01
The Freie Univ Evaluation System Framework (Freva - freva.met.fu-berlin.de) is a software infrastructure for standardized data and tool solutions in Earth system science (e.g. www-miklip.dkrz.de, cmip-eval.dkrz.de). Freva runs on high performance computers to handle customizable evaluation systems of research projects, institutes or universities. It combines different software technologies into one common hybrid infrastructure, including all features present in the shell and web environment. The database interface satisfies the international standards provided by the Earth System Grid Federation (ESGF). Freva indexes different data projects into one common search environment by storing the meta data information of the self-describing model, reanalysis and observational data sets in a database. This implemented meta data system with its advanced but easy-to-handle search tool supports users, developers and their plugins to retrieve the required information. A generic application programming interface (API) allows scientific developers to connect their analysis tools with the evaluation system independently of the programming language used. Users of the evaluation techniques benefit from the common interface of the evaluation system without any need to understand the different scripting languages. The integrated web-shell (shellinabox) adds a degree of freedom in the choice of the working environment and can be used as a gate to the research projects HPC. Plugins are able to integrate their e.g. post-processed results into the database of the user. This allows e.g. post-processing plugins to feed statistical analysis plugins, which fosters an active exchange between plugin developers of a research project. Additionally, the history and configuration sub-system stores every analysis performed with the evaluation system in a database. Configurations and results of the tools can be shared among scientists via shell or web system. Furthermore, if configurations match while starting an evaluation plugin, the system suggests to use results already produced by other users - saving CPU/h, I/O, disk space and time. The efficient interaction between different technologies improves the Earth system modeling science framed by Freva.
Numerical Model Sensitivity to Heterogeneous Satellite Derived Vegetation Roughness
NASA Technical Reports Server (NTRS)
Jasinski, Michael; Eastman, Joseph; Borak, Jordan
2011-01-01
The sensitivity of a mesoscale weather prediction model to a 1 km satellite-based vegetation roughness initialization is investigated for a domain within the south central United States. Three different roughness databases are employed: i) a control or standard lookup-table roughness that is a function only of land cover type, ii) a spatially heterogeneous roughness database, specific to the domain, that was previously derived using a physically based procedure and Moderate Resolution Imaging Spectroradiometer (MODIS) imagery, and iii) a MODIS climatologic roughness database that, like (i), is a function only of land cover type but possesses domain-specific mean values from (ii). The model used is the Weather Research and Forecast Model (WRF) coupled to the Community Land Model within the Land Information System (LIS). For each simulation, a statistical comparison is made between modeled results and ground observations within a domain including Oklahoma, Eastern Arkansas, and Northwest Louisiana during a 4-day period within IHOP 2002. The sensitivity analysis compares the impact of the three roughness initializations on time-series temperature, precipitation probability of detection (POD), average wind speed, boundary layer height, and turbulent kinetic energy (TKE). Overall, the results indicate that, for the current investigation, replacement of the standard lookup-table values with the satellite-derived values statistically improves model performance for most observed variables. Such natural roughness heterogeneity enhances the surface wind speed, PBL height and TKE production by up to 10 percent, with a lesser effect over grassland and a greater effect over mixed land cover domains.
Dynamic analysis environment for nuclear forensic analyses
NASA Astrophysics Data System (ADS)
Stork, C. L.; Ummel, C. C.; Stuart, D. S.; Bodily, S.; Goldblum, B. L.
2017-01-01
A Dynamic Analysis Environment (DAE) software package is introduced to facilitate group inclusion/exclusion method testing, evaluation and comparison for pre-detonation nuclear forensics applications. Employing DAE, the multivariate signatures of a questioned material can be compared to the signatures for different, known groups, enabling the linking of the questioned material to its potential process, location, or fabrication facility. Advantages of using DAE for group inclusion/exclusion include built-in query tools for retrieving data of interest from a database, the recording and documentation of all analysis steps, a clear visualization of the analysis steps intelligible to a non-expert, and the ability to integrate analysis tools developed in different programming languages. Two group inclusion/exclusion methods are implemented in DAE: principal component analysis, a parametric feature extraction method, and k nearest neighbors, a nonparametric pattern recognition method. Spent Fuel Isotopic Composition (SFCOMPO), an open source international database of isotopic compositions for spent nuclear fuels (SNF) from 14 reactors, is used to construct PCA and KNN models for known reactor groups, and 20 simulated SNF samples are utilized in evaluating the performance of these group inclusion/exclusion models. For all 20 simulated samples, PCA in conjunction with the Q statistic correctly excludes a large percentage of reactor groups and correctly includes the true reactor of origination. Employing KNN, 14 of the 20 simulated samples are classified to their true reactor of origination.
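The PCA-based exclusion step described above can be sketched with the Q statistic (squared reconstruction residual): a questioned sample whose Q exceeds a limit derived from a group's training data is excluded from that group. The data below are synthetic stand-ins, not SFCOMPO isotopic compositions.

```python
import numpy as np
from sklearn.decomposition import PCA

def q_statistic(pca, x):
    """Squared prediction error (Q statistic): the squared residual left after
    projecting a sample onto the retained principal components. Large Q means
    the sample does not fit the group model and can be excluded."""
    x = np.atleast_2d(x)
    recon = pca.inverse_transform(pca.transform(x))
    return np.sum((x - recon) ** 2, axis=1)

# Toy "reactor group" signatures: 6 correlated isotopic-ratio features
rng = np.random.default_rng(2)
group = (rng.normal(size=(40, 2)) @ rng.normal(size=(2, 6))
         + rng.normal(scale=0.05, size=(40, 6)) + 5.0)
pca = PCA(n_components=2).fit(group)

q_train = q_statistic(pca, group)
threshold = np.percentile(q_train, 95)       # simple empirical 95% limit
candidate = group[0] + np.array([0, 0, 3.0, 0, 0, 0])   # perturbed sample
print(q_statistic(pca, candidate) > threshold)           # [ True ] -> excluded
```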
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gerhard, M.A.; Sommer, S.C.
1995-04-01
AUTOCASK (AUTOmatic Generation of 3-D CASK models) is a microcomputer-based system of computer programs and databases developed at the Lawrence Livermore National Laboratory (LLNL) for the structural analysis of shipping casks for radioactive material. Model specification is performed on the microcomputer, and the analyses are performed on an engineering workstation or mainframe computer. AUTOCASK is based on 80386/80486 compatible microcomputers. The system is composed of a series of menus, input programs, display programs, a mesh generation program, and archive programs. All data is entered through fill-in-the-blank input screens that contain descriptive data requests.
Nonlinear and progressive failure aspects of transport composite fuselage damage tolerance
NASA Technical Reports Server (NTRS)
Walker, Tom; Ilcewicz, L.; Murphy, Dan; Dopker, Bernhard
1993-01-01
The purpose is to provide an end-user's perspective on the state of the art in life prediction and failure analysis by focusing on subsonic transport fuselage issues being addressed in the NASA/Boeing Advanced Technology Composite Aircraft Structure (ATCAS) contract and a related task-order contract. First, some discrepancies between the ATCAS tension-fracture test database and classical prediction methods are discussed, followed by an overview of material modeling work aimed at explaining some of these discrepancies. Finally, analysis efforts associated with a pressure-box test fixture are addressed, as an illustration of the modeling complexities required to model and interpret tests.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Robertson, SP; Quon, H; Kiess, AP
Purpose: To develop a framework for automatic extraction of clinically meaningful dosimetric-outcome relationships from an in-house, analytic oncology database. Methods: Dose-volume histograms (DVH) and clinical outcome-related structured data elements have been routinely stored to our database for 513 HN cancer patients treated from 2007 to 2014. SQL queries were developed to extract outcomes that had been assessed for at least 100 patients, as well as DVH curves for organs-at-risk (OAR) that were contoured for at least 100 patients. DVH curves for paired OAR (e.g., left and right parotids) were automatically combined and included as additional structures for analysis. For each OAR-outcome combination, DVH dose points, D(V_t), at a series of normalized volume thresholds, V_t = [0.01, 0.99], were stratified into two groups based on outcomes after treatment completion. The probability, P[D(V_t)], of an outcome was modeled at each V_t by logistic regression. Notable combinations, defined as having P[D(V_t)] increase by at least 5% per Gy (p<0.05), were further evaluated for clinical relevance using a custom graphical interface. Results: A total of 57 individual and combined structures and 115 outcomes were queried, resulting in over 6,500 combinations for analysis. Of these, 528 combinations met the 5%/Gy requirement, with further manual inspection revealing a number of reasonable models based on either reported literature or proximity between neighboring OAR. The data mining algorithm confirmed the following well-known toxicity/outcome relationships: dysphagia/larynx, voice changes/larynx, esophagitis/esophagus, xerostomia/combined parotids, and mucositis/oral mucosa. Other notable relationships included dysphagia/pharyngeal constrictors, nausea/brainstem, nausea/spinal cord, weight-loss/mandible, and weight-loss/combined parotids. Conclusion: Our database platform has enabled large-scale analysis of dose-outcome relationships. The current data-mining framework revealed both known and novel dosimetric and clinical relationships, underscoring the potential utility of this analytic approach. Multivariate models may be necessary to further evaluate the complex relationship between neighboring OARs and observed outcomes. This research was supported through collaborations with Elekta, Philips, and Toshiba.
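The per-threshold screening step is, in essence, a univariate logistic regression of outcome against the DVH dose point D(V_t). A minimal sketch on synthetic data (not the 513-patient database) shows how the probability and its per-Gy slope would be obtained.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Minimal sketch of the screening step described above: at one normalized
# volume threshold Vt, model the probability of an outcome as a function of
# the DVH dose point D(Vt) and flag the combination if the odds rise steeply
# with dose. The data below are synthetic, not the clinical records.
rng = np.random.default_rng(3)
dose_at_vt = rng.uniform(10, 70, size=200)               # D(Vt) in Gy
p_true = 1 / (1 + np.exp(-(dose_at_vt - 45) / 5))        # hidden dose response
outcome = (rng.random(200) < p_true).astype(int)          # e.g. dysphagia yes/no

model = LogisticRegression(max_iter=1000).fit(dose_at_vt.reshape(-1, 1), outcome)
p40, p41 = model.predict_proba([[40.0], [41.0]])[:, 1]
print(f"P[D(Vt)=40 Gy] = {p40:.2f}, slope ~ {p41 - p40:+.3f} per Gy")
# a combination would be "notable" if the probability rises >= 0.05 per Gy
```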
Kireeva, N; Baskin, I I; Gaspar, H A; Horvath, D; Marcou, G; Varnek, A
2012-04-01
Here, the utility of Generative Topographic Maps (GTM) for data visualization, structure-activity modeling and database comparison is evaluated using subsets of the Database of Useful Decoys (DUD). Unlike other popular dimensionality reduction approaches such as Principal Component Analysis, Sammon Mapping or Self-Organizing Maps, the great advantage of GTMs is that they provide data probability distribution functions (PDF), both in the high-dimensional space defined by the molecular descriptors and in the 2D latent space. PDFs for the molecules of different activity classes were successfully used to build classification models in the framework of the Bayesian approach. Because the PDFs are represented by a mixture of Gaussian functions, the Bhattacharyya kernel has been proposed as a measure of the overlap of datasets, which leads to an elegant method of global comparison of chemical libraries. Copyright © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
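For two single Gaussians the Bhattacharyya overlap has a closed form, which is the kind of building block used by the kernel mentioned above for comparing probability distribution functions; one common approximation for mixtures sums weighted component-pair overlaps. A minimal sketch with illustrative parameters (not the DUD descriptors) follows.

```python
import numpy as np

def bhattacharyya_coefficient(mu1, cov1, mu2, cov2):
    """Bhattacharyya coefficient BC = exp(-D_B) between two Gaussians;
    BC = 1 for identical distributions and tends to 0 as they separate."""
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    term1 = 0.125 * diff @ np.linalg.solve(cov, diff)
    term2 = 0.5 * np.log(np.linalg.det(cov) /
                         np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return np.exp(-(term1 + term2))

mu_a, cov_a = np.array([0.0, 0.0]), np.eye(2)
mu_b, cov_b = np.array([1.0, 1.0]), 1.5 * np.eye(2)
print(round(bhattacharyya_coefficient(mu_a, cov_a, mu_b, cov_b), 3))
```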
Prediction of pelvic organ prolapse using an artificial neural network.
Robinson, Christopher J; Swift, Steven; Johnson, Donna D; Almeida, Jonas S
2008-08-01
The objective of this investigation was to test the ability of a feedforward artificial neural network (ANN) to differentiate patients who have pelvic organ prolapse (POP) from those who retain good pelvic organ support. Following institutional review board approval, patients with POP (n = 87) and controls with good pelvic organ support (n = 368) were identified from the urogynecology research database. Historical and clinical information was extracted from the database. Data analysis included the training of a feedforward ANN, variable selection, and external validation of the model with an independent data set. Twenty variables were used. The median-performing ANN model used a median of 3 (quartile 1:3 to quartile 3:5) variables and achieved an area under the receiver operator curve of 0.90 (external, independent validation set). Ninety percent sensitivity and 83% specificity were obtained in the external validation by ANN classification. Feedforward ANN modeling is applicable to the identification and prediction of POP.
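The modeling step, a small feedforward network scored by the area under the receiver operator curve, can be sketched as follows; the predictors and data are synthetic placeholders rather than the urogynecology research database.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
n = 455
X = np.column_stack([rng.integers(20, 90, n),        # age (synthetic)
                     rng.integers(0, 6, n),          # parity (synthetic)
                     rng.integers(18, 45, n)])       # body mass index (synthetic)
logit = 0.06 * (X[:, 0] - 55) + 0.5 * X[:, 1] - 2.0  # hidden synthetic rule
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)   # POP yes/no

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
ann = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                    random_state=0).fit(X_tr, y_tr)       # small feedforward net
auc = roc_auc_score(y_te, ann.predict_proba(X_te)[:, 1])
print(f"held-out AUC = {auc:.2f}")
```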
NASA Astrophysics Data System (ADS)
Regazzoni, C.; Payraudeau, S.
2012-04-01
Runoff and associated erosion represent a primary mode of mobilization and transfer of pesticides from agricultural lands to watercourses and groundwater. Pesticide toxicity is potentially higher at the headwater catchment scale. These catchments are usually ungauged and characterized by temporary streams. Several mitigation strategies and management practices are currently used to mitigate pesticide mixtures in agro-ecosystems. Among those practices, Stormwater Wetlands (SW) can be implemented to store surface runoff and to mitigate pesticide loads. The implementation of New Potential Stormwater Wetlands (NPSW) requires a diagnosis of intermittent runoff at the headwater catchment scale. The main difficulty in performing this diagnosis at the headwater catchment scale is to characterize the landscape components spatially with enough accuracy. Indeed, fields and field margins enhance or decrease the runoff and determine the pathways of hortonian overland flow. Land use, soil and Digital Elevation Model databases are systematically used. The question of the respective weight of each of these databases in the uncertainty of the diagnostic results is rarely analyzed at the headwater catchment scale. Therefore, this work focused on (i) the uncertainties of each of these databases and their propagation in the hortonian overland flow modelling, (ii) the methods to improve the accuracy of each database, (iii) the propagation of the database uncertainties in intermittent runoff modelling and (iv) the impact of the modelling cell size on the diagnosis. The model developed was a raster implementation of the SCS-CN method integrating re-infiltration processes. The uncertainty propagation was analyzed on the Briançon vineyard catchment (Gard, France, 1400 ha). Based on this study site, the results showed that the geographic and thematic accuracies of the regional soil database (1:250 000) were insufficient to correctly simulate the hortonian overland flow. These results have to be weighed against the soil heterogeneity. Conversely, the regional land use (1:50 000) provided an acceptable diagnosis when combined with an accurate soil database (1:15 000). Moreover, the quality of the regional land use can be improved by integrating the road and river networks usually available at the national scale. Finally, a 5 m modelling cell size appeared to be optimal for correctly describing the landscape components and assessing the hortonian overland flow. A wrong assessment of the hortonian overland flow leads to a misinterpretation of the results and affects effective decision-making, e.g. the number and location of the NPSW. This uncertainty analysis and the improvement methods developed on this study site can be adapted to other headwater catchments characterized by intermittent surface runoff.
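The cell-level relation applied by such a raster model is the classical SCS Curve Number runoff equation; a minimal sketch (illustrative CN values, re-infiltration routing omitted) is given below.

```python
import numpy as np

def scs_cn_runoff(p_mm, cn, ia_ratio=0.2):
    """Classical SCS-CN runoff depth (mm) for one rainfall event:
    S = 25400/CN - 254, Ia = ia_ratio * S, Q = (P - Ia)^2 / (P - Ia + S)
    when P > Ia, else 0. This is the per-cell relation a raster model applies;
    the re-infiltration routing described above is not sketched here."""
    s = 25400.0 / cn - 254.0
    ia = ia_ratio * s
    p_eff = np.maximum(np.asarray(p_mm, dtype=float) - ia, 0.0)
    return np.where(p_eff > 0, p_eff ** 2 / (p_eff + s), 0.0)

# Toy 3x3 grid: vineyard rows (CN 75) next to a compacted track (CN 90)
cn_grid = np.array([[75, 75, 90],
                    [75, 75, 90],
                    [75, 75, 90]])
print(scs_cn_runoff(40.0, cn_grid).round(1))   # runoff depth per cell, in mm
```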
Application of materials database (MAT.DB.) to materials education
NASA Technical Reports Server (NTRS)
Liu, Ping; Waskom, Tommy L.
1994-01-01
Finding the right material for the job is an important aspect of engineering. Sometimes the choice is as fundamental as selecting between steel and aluminum. Other times, the choice may be between different compositions of an alloy. Discovering and compiling materials data is a demanding task, but it leads to accurate models for analysis and successful materials application. Mat.DB. is a database management system designed for maintaining information on the properties and processing of engineered materials, including metals, plastics, composites, and ceramics. It was developed by the Center for Materials Data of the American Society for Metals (ASM) International. The ASM Center for Materials Data collects and reviews material property data for publication in books, reports, and electronic databases. Mat.DB. was developed to aid data management and materials applications.
A hybrid CNN feature model for pulmonary nodule malignancy risk differentiation.
Wang, Huafeng; Zhao, Tingting; Li, Lihong Connie; Pan, Haixia; Liu, Wanquan; Gao, Haoqi; Han, Fangfang; Wang, Yuehai; Qi, Yifan; Liang, Zhengrong
2018-01-01
The malignancy risk differentiation of pulmonary nodules is one of the most challenging tasks of computer-aided diagnosis (CADx). Most recently reported CADx methods or schemes based on texture and shape estimation have shown relatively satisfactory performance in differentiating the risk level of malignancy among the nodules detected in lung cancer screening. However, the existing CADx schemes tend to detect and analyze characteristics of pulmonary nodules from a statistical perspective according to local features only. Motivated by the learning ability of convolutional neural networks (CNN), which simulate the human neural network for target recognition, and by our previous research on texture features, we present a hybrid model that takes into consideration both global and local features for pulmonary nodule differentiation using the largest public database, founded by the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI). By comparing three types of CNN models, two of which were newly proposed by us, we observed that the multi-channel CNN model yielded the best capacity for discriminating the malignancy risk of the nodules based on the projection of the distributions of extracted features. Moreover, the CADx scheme using the new multi-channel CNN model outperformed our previously developed CADx scheme based on the 3D texture feature analysis method, increasing the computed area under the receiver operating characteristic curve (AUC) from 0.9441 to 0.9702.
Chao, Edmund Y S; Armiger, Robert S; Yoshida, Hiroaki; Lim, Jonathan; Haraguchi, Naoki
2007-03-08
The ability to combine physiology and engineering analyses with computer sciences has opened the door to the possibility of creating the "Virtual Human" reality. This paper presents a broad foundation for a full-featured biomechanical simulator for the human musculoskeletal system physiology. This simulation technology unites the expertise in biomechanical analysis and graphic modeling to investigate joint and connective tissue mechanics at the structural level and to visualize the results in both static and animated forms together with the model. Adaptable anatomical models including prosthetic implants and fracture fixation devices and a robust computational infrastructure for static, kinematic, kinetic, and stress analyses under varying boundary and loading conditions are incorporated on a common platform, the VIMS (Virtual Interactive Musculoskeletal System). Within this software system, a manageable database containing long bone dimensions, connective tissue material properties and a library of skeletal joint system functional activities and loading conditions are also available and they can easily be modified, updated and expanded. Application software is also available to allow end-users to perform biomechanical analyses interactively. Examples using these models and the computational algorithms in a virtual laboratory environment are used to demonstrate the utility of this unique database and simulation technology. This integrated system, model library and database will have an impact on orthopaedic education, basic research, device development and application, and clinical patient care related to musculoskeletal joint system reconstruction, trauma management, and rehabilitation.
Chao, Edmund YS; Armiger, Robert S; Yoshida, Hiroaki; Lim, Jonathan; Haraguchi, Naoki
2007-01-01
The ability to combine physiology and engineering analyses with computer sciences has opened the door to the possibility of creating the "Virtual Human" reality. This paper presents a broad foundation for a full-featured biomechanical simulator for the human musculoskeletal system physiology. This simulation technology unites the expertise in biomechanical analysis and graphic modeling to investigate joint and connective tissue mechanics at the structural level and to visualize the results in both static and animated forms together with the model. Adaptable anatomical models including prosthetic implants and fracture fixation devices and a robust computational infrastructure for static, kinematic, kinetic, and stress analyses under varying boundary and loading conditions are incorporated on a common platform, the VIMS (Virtual Interactive Musculoskeletal System). Within this software system, a manageable database containing long bone dimensions, connective tissue material properties and a library of skeletal joint system functional activities and loading conditions are also available and they can easily be modified, updated and expanded. Application software is also available to allow end-users to perform biomechanical analyses interactively. Examples using these models and the computational algorithms in a virtual laboratory environment are used to demonstrate the utility of this unique database and simulation technology. This integrated system, model library and database will have an impact on orthopaedic education, basic research, device development and application, and clinical patient care related to musculoskeletal joint system reconstruction, trauma management, and rehabilitation. PMID:17343764
Levi, Benjamin; Jayakumar, Prakash; Giladi, Avi; Jupiter, Jesse B; Ring, David C; Kowalske, Karen; Gibran, Nicole S; Herndon, David; Schneider, Jeffrey C; Ryan, Colleen M
2015-11-01
Heterotopic ossification (HO) is a debilitating complication of burn injury; however, incidence and risk factors are poorly understood. In this study, we use a multicenter database of adults with burn injuries to identify and analyze clinical factors that predict HO formation. Data from six high-volume burn centers, in the Burn Injury Model System Database, were analyzed. Univariate logistic regression models were used for model selection. Cluster-adjusted multivariate logistic regression was then used to evaluate the relationship between clinical and demographic data and the development of HO. Of 2,979 patients in the database with information on HO that addressed risk factors for development of HO, 98 (3.5%) developed HO. Of these 98 patients, 97 had arm burns, and 96 had arm grafts. When controlling for age and sex in a multivariate model, patients with greater than 30% total body surface area burn had 11.5 times higher odds of developing HO (p < 0.001), and those with arm burns that required skin grafting had 96.4 times higher odds of developing HO (p = 0.04). For each additional time a patient went to the operating room, odds of HO increased by 30% (odds ratio, 1.32; p < 0.001), and each additional ventilator day increased odds by 3.5% (odds ratio, 1.035; p < 0.001). Joint contracture, inhalation injury, and bone exposure did not significantly increase odds of HO. Risk factors for HO development include greater than 30% total body surface area burn, arm burns, arm grafts, ventilator days, and number of trips to the operating room. Future studies can use these results to identify highest-risk patients to guide deployment of prophylactic and experimental treatments. Prognostic study, level III.
NASA Astrophysics Data System (ADS)
Obulesu, O.; Rama Mohan Reddy, A., Dr; Mahendra, M.
2017-08-01
Detecting regular and efficient cyclic patterns is a demanding activity for data analysts because of the unstructured, dynamic and enormous raw information produced from the web. Many existing approaches generate large candidate patterns in the presence of huge and complex databases. In this work, two novel algorithms are proposed and a comparative examination is performed by considering scalability and performance parameters. The first algorithm, EFPMA (Extended Regular Model Detection Algorithm), is used to find frequent sequential patterns from spatiotemporal datasets, and the second, ETMA (Enhanced Tree-based Mining Algorithm), detects effective cyclic models with a symbolic database representation. EFPMA grows patterns from both ends (prefixes and suffixes) of detected patterns, which results in faster pattern growth because of fewer levels of database projection compared with existing approaches such as PrefixSpan and SPADE. ETMA uses distinct notions (segment, sequence and individual symbols) to store and manage transaction data horizontally. ETMA exploits a partition-and-conquer method to find maximal patterns by using symbolic notations. Using this algorithm, cyclic models can be mined in full-series sequential patterns, including subsection series. ETMA reduces memory consumption and makes use of efficient symbolic operations. Furthermore, ETMA records time-series instances dynamically, in terms of character, series and section approaches, respectively. Evaluating the extent of the patterns and proving the efficiency of the reduction and retrieval techniques on synthetic and real datasets remains an open and challenging mining problem. These techniques are useful in data streams, traffic risk analysis, medical diagnosis, DNA sequence mining and earthquake prediction applications. Extensive experimental results illustrate that the algorithms outperform the ECLAT, STNR and MAFIA approaches in terms of efficiency and scalability.
Levi, Benjamin; Jayakumar, Prakash; Giladi, Avi; Jupiter, Jesse B.; Ring, David C.; Kowalske, Karen; Gibran, Nicole S.; Herndon, David; Schneider, Jeffrey C.; Ryan, Colleen M.
2015-01-01
Purpose Heterotopic ossification (HO) is a debilitating complication of burn injury; however, incidence and risk factors are poorly understood. In this study we utilize a multicenter database of adults with burn injuries to identify and analyze clinical factors that predict HO formation. Methods Data from 6 high-volume burn centers, in the Burn Injury Model System Database, were analyzed. Univariate logistic regression models were used for model selection. Cluster-adjusted multivariate logistic regression was then used to evaluate the relationship between clinical and demographic data and the development of HO. Results Of 2,979 patients in the database with information on HO that addressed risk factors for development of HO, 98 (3.5%) developed HO. Of these 98 patients, 97 had arm burns, and 96 had arm grafts. Controlling for age and sex in a multivariate model, patients with >30% total body surface area (TBSA) burn had 11.5x higher odds of developing HO (p<0.001), and those with arm burns that required skin grafting had 96.4x higher odds of developing HO (p=0.04). For each additional time a patient went to the operating room, odds of HO increased by 30% (OR 1.32, p<0.001), and each additional ventilator day increased odds by 3.5% (OR 1.035, p<0.001). Joint contracture, inhalation injury, and bone exposure did not significantly increase odds of HO. Conclusion Risk factors for HO development include >30% TBSA burn, arm burns, arm grafts, ventilator days, and number of trips to the operating room. Future studies can use these results to identify highest-risk patients to guide deployment of prophylactic and experimental treatments. PMID:26496115
Chess databases as a research vehicle in psychology: Modeling large data.
Vaci, Nemanja; Bilalić, Merim
2017-08-01
The game of chess has often been used for psychological investigations, particularly in cognitive science. The clear-cut rules and well-defined environment of chess provide a model for investigations of basic cognitive processes, such as perception, memory, and problem solving, while the precise rating system for the measurement of skill has enabled investigations of individual differences and expertise-related effects. In the present study, we focus on another appealing feature of chess-namely, the large archive databases associated with the game. The German national chess database presented in this study represents a fruitful ground for the investigation of multiple longitudinal research questions, since it collects the data of over 130,000 players and spans over 25 years. The German chess database collects the data of all players, including hobby players, and all tournaments played. This results in a rich and complete collection of the skill, age, and activity of the whole population of chess players in Germany. The database therefore complements the commonly used expertise approach in cognitive science by opening up new possibilities for the investigation of multiple factors that underlie expertise and skill acquisition. Since large datasets are not common in psychology, their introduction also raises the question of optimal and efficient statistical analysis. We offer the database for download and illustrate how it can be used by providing concrete examples and a step-by-step tutorial using different statistical analyses on a range of topics, including skill development over the lifetime, birth cohort effects, effects of activity and inactivity on skill, and gender differences.
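Because the database supports longitudinal questions such as skill development over the lifetime, a natural modeling choice is a mixed-effects model with player-specific random effects. The sketch below, on simulated ratings, shows one such analysis with statsmodels; it illustrates the general approach only and is not the tutorial's own code.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_players, n_obs = 300, 8
player = np.repeat(np.arange(n_players), n_obs)
age = np.tile(np.linspace(15, 50, n_obs), n_players)
player_effect = np.repeat(rng.normal(0, 120, n_players), n_obs)   # stable skill differences
# Invented inverted-U age curve plus noise
rating = 1400 + 30 * age - 0.35 * age ** 2 + player_effect + rng.normal(0, 40, player.size)
df = pd.DataFrame({"player": player, "age": age, "rating": rating})

# Random intercept per player, quadratic fixed effect of age on rating
fit = smf.mixedlm("rating ~ age + I(age ** 2)", df, groups=df["player"]).fit()
print(fit.summary())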
Zhao, Lei; Guo, Yi; Wang, Wei; Yan, Li-juan
2011-08-01
To evaluate the effectiveness of acupuncture as a treatment for neurovascular headache and to analyze the current situation related to acupuncture treatment. The PubMed database (1966-2010), EMBASE database (1986-2010), Cochrane Library (Issue 1, 2010), Chinese Biomedical Literature Database (1979-2010), China HowNet Knowledge Database (1979-2010), VIP Journals Database (1989-2010), and Wanfang database (1998-2010) were retrieved. Randomized or quasi-randomized controlled studies were included. Priority was given to high-quality randomized, controlled trials. Statistical outcome indicators were measured using RevMan 5.0.20 software. A total of 16 articles and 1 535 cases were included. Meta-analysis showed a significant difference between acupuncture therapy and Western medicine therapy [combined RR (random-effects model)=1.46, 95% CI (1.21, 1.75), Z=3.96, P<0.0001], indicating an obvious superior effect of the acupuncture therapy; a significant difference also existed between comprehensive acupuncture therapy and acupuncture therapy alone [combined RR (fixed-effects model)=3.35, 95% CI (1.92, 5.82), Z=4.28, P<0.0001], indicating that acupuncture combined with other therapies, such as point injection, scalp acupuncture, auricular acupuncture, etc., was superior to conventional body acupuncture therapy alone. The limited clinical studies included verified the efficacy of acupuncture in the treatment of neurovascular headache. Although acupuncture and its combined therapies provide certain advantages, most clinical studies are of small sample size. Large-sample, randomized, controlled trials are needed in the future for more definitive results.
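As a sketch of the pooling arithmetic behind such fixed- and random-effects summaries, the following Python function computes an inverse-variance fixed-effect and a DerSimonian-Laird random-effects pooled risk ratio from per-study event counts; the three "trials" are invented numbers, not the studies included in this review.

import numpy as np

def pooled_rr(events_t, n_t, events_c, n_c):
    """Fixed-effect (inverse variance) and DerSimonian-Laird random-effects pooled RR."""
    log_rr = np.log((events_t / n_t) / (events_c / n_c))
    var = 1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c      # variance of log RR
    w = 1 / var
    fixed = np.sum(w * log_rr) / np.sum(w)
    # Between-study heterogeneity (tau^2), DerSimonian-Laird estimator
    q = np.sum(w * (log_rr - fixed) ** 2)
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(log_rr) - 1)) / c)
    w_r = 1 / (var + tau2)
    random = np.sum(w_r * log_rr) / np.sum(w_r)
    se_r = np.sqrt(1 / np.sum(w_r))
    ci = (np.exp(random - 1.96 * se_r), np.exp(random + 1.96 * se_r))
    return {"RR_fixed": np.exp(fixed), "RR_random": np.exp(random), "CI95_random": ci}

# Hypothetical responder counts for three trials (treatment vs control)
print(pooled_rr(np.array([40, 35, 50]), np.array([50, 45, 60]),
                np.array([30, 25, 38]), np.array([50, 44, 62])))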
MIPS: a database for genomes and protein sequences.
Mewes, H W; Heumann, K; Kaps, A; Mayer, K; Pfeiffer, F; Stocker, S; Frishman, D
1999-01-01
The Munich Information Center for Protein Sequences (MIPS-GSF), Martinsried near Munich, Germany, develops and maintains genome oriented databases. It is commonplace that the amount of sequence data available increases rapidly, but not the capacity of qualified manual annotation at the sequence databases. Therefore, our strategy aims to cope with the data stream by the comprehensive application of analysis tools to sequences of complete genomes, the systematic classification of protein sequences and the active support of sequence analysis and functional genomics projects. This report describes the systematic and up-to-date analysis of genomes (PEDANT), a comprehensive database of the yeast genome (MYGD), a database reflecting the progress in sequencing the Arabidopsis thaliana genome (MATD), the database of assembled, annotated human EST clusters (MEST), and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database (described elsewhere in this volume). MIPS provides access through its WWW server (http://www.mips.biochem.mpg.de) to a spectrum of generic databases, including the above mentioned as well as a database of protein families (PROTFAM), the MITOP database, and the all-against-all FASTA database. PMID:9847138
NADM Conceptual Model 1.0 -- A Conceptual Model for Geologic Map Information
,
2004-01-01
Executive Summary -- The NADM Data Model Design Team was established in 1999 by the North American Geologic Map Data Model Steering Committee (NADMSC) with the purpose of drafting a geologic map data model for consideration as a standard for developing interoperable geologic map-centered databases by state, provincial, and federal geological surveys. The model is designed to be a technology-neutral conceptual model that can form the basis for a web-based interchange format using evolving information technology (e.g., XML, RDF, OWL), and guide implementation of geoscience databases in a common conceptual framework. The intended purpose is to allow geologic information sharing between geologic map data providers and users, independent of local information system implementation. The model emphasizes geoscience concepts and relationships related to information presented on geologic maps. Design has been guided by an informal requirements analysis, documentation of existing databases, technology developments, and other standardization efforts in the geoscience and computer-science communities. A key aspect of the model is the notion that representation of the conceptual framework (ontology) that underlies geologic map data must be part of the model, because this framework changes with time and understanding, and varies between information providers. The top level of the model distinguishes geologic concepts, geologic representation concepts, and metadata. The geologic representation part of the model provides a framework for representing the ontology that underlies geologic map data through a controlled vocabulary, and for establishing the relationships between this vocabulary and a geologic map visualization or portrayal. Top-level geologic classes in the model are Earth material (substance), geologic unit (parts of the Earth), geologic age, geologic structure, fossil, geologic process, geologic relation, and geologic event.
Generation of signature databases with fast codes
NASA Astrophysics Data System (ADS)
Bradford, Robert A.; Woodling, Arthur E.; Brazzell, James S.
1990-09-01
Using the FASTSIG signature code to generate optical signature databases for the Ground-based Surveillance and Tracking System (GSTS) Program has improved the efficiency of the database generation process. The goal of the current GSTS database is to provide standardized, threat-representative target signatures that can easily be used for acquisition and tracking studies, discrimination algorithm development, and system simulations. Large databases, with as many as eight interpolation parameters, are required to maintain the fidelity demands of discrimination and to generalize their application to other strategic systems. As the need increases for quick availability of long wave infrared (LWIR) target signatures for an evolving design-to-threat, FASTSIG has become a database generation alternative to using the industry standard Optical Signatures Code (OSC). FASTSIG, developed in 1985 to meet the unique strategic systems demands imposed by the discrimination function, has the significant advantage of being a faster running signature code than the OSC, typically requiring two percent of the cpu time. It uses analytical approximations to model axisymmetric targets, with the fidelity required for discrimination analysis. Access to the signature database is accomplished through use of the waveband integration and interpolation software, INTEG and SIGNAT. This paper gives details of this procedure as well as sample interpolated signatures and also covers sample verification by comparison to the OSC, in order to establish the fidelity of the FASTSIG generated database.
PlantTFDB: a comprehensive plant transcription factor database
Guo, An-Yuan; Chen, Xin; Gao, Ge; Zhang, He; Zhu, Qi-Hui; Liu, Xiao-Chuan; Zhong, Ying-Fu; Gu, Xiaocheng; He, Kun; Luo, Jingchu
2008-01-01
Transcription factors (TFs) play key roles in controlling gene expression. Systematic identification and annotation of TFs, followed by construction of TF databases may serve as useful resources for studying the function and evolution of transcription factors. We developed a comprehensive plant transcription factor database PlantTFDB (http://planttfdb.cbi.pku.edu.cn), which contains 26 402 TFs predicted from 22 species, including five model organisms with available whole genome sequence and 17 plants with available EST sequences. To provide comprehensive information for those putative TFs, we made extensive annotation at both family and gene levels. A brief introduction and key references were presented for each family. Functional domain information and cross-references to various well-known public databases were available for each identified TF. In addition, we predicted putative orthologs of those TFs among the 22 species. PlantTFDB has a simple interface to allow users to search the database by IDs or free texts, to make sequence similarity search against TFs of all or individual species, and to download TF sequences for local analysis. PMID:17933783
Common modeling system for digital simulation
NASA Technical Reports Server (NTRS)
Painter, Rick
1994-01-01
The Joint Modeling and Simulation System is a tri-service investigation into a common modeling framework for the development of digital models. The basis for the success of this framework is an X-window-based, open systems architecture, object-based/oriented methodology, standard interface approach to digital model construction, configuration, execution, and post processing. For years Department of Defense (DOD) agencies have produced various weapon systems/technologies and typically digital representations of the systems/technologies. These digital representations (models) have also been developed for other reasons such as studies and analysis, Cost Effectiveness Analysis (COEA) tradeoffs, etc. Unfortunately, there have been no Modeling and Simulation (M&S) standards, guidelines, or efforts towards commonality in DOD M&S. The typical scenario is that an organization hires a contractor to build hardware, and in doing so a digital model may be constructed. Until recently, this model was not even obtained by the organization. Even if it was procured, it was on a unique platform, in a unique language, with unique interfaces, and with the result being unique maintenance requirements. Additionally, the constructors of the model expended more effort in writing the 'infrastructure' of the model/simulation (e.g. user interface, database/database management system, data journalizing/archiving, graphical presentations, environment characteristics, other components in the simulation, etc.) than in producing the model of the desired system. Other side effects include: duplication of efforts; varying assumptions; lack of credibility/validation; and decentralization in policy and execution. J-MASS provides the infrastructure, standards, toolset, and architecture to permit M&S developers and analysts to concentrate on their area of interest.
Very fast road database verification using textured 3D city models obtained from airborne imagery
NASA Astrophysics Data System (ADS)
Bulatov, Dimitri; Ziems, Marcel; Rottensteiner, Franz; Pohl, Melanie
2014-10-01
Road databases are known to be an important part of any geodata infrastructure, e.g. as the basis for urban planning or emergency services. Updating road databases for crisis events must be performed quickly and with the highest possible degree of automation. We present a semi-automatic algorithm for road verification using textured 3D city models, starting from aerial or even UAV images. This algorithm contains two processes, which exchange input and output, but basically run independently from each other. These processes are textured urban terrain reconstruction and road verification. The first process contains a dense photogrammetric reconstruction of 3D geometry of the scene using depth maps. The second process is our core procedure, since it contains various methods for road verification. Each method represents a unique road model and a specific strategy, and thus is able to deal with a specific type of roads. Each method is designed to provide two probability distributions, where the first describes the state of a road object (correct, incorrect), and the second describes the state of its underlying road model (applicable, not applicable). Based on the Dempster-Shafer theory, both distributions are mapped to a single distribution that refers to three states: correct, incorrect, and unknown. With respect to the interaction of both processes, the normalized elevation map and the digital orthophoto generated during 3D reconstruction are the necessary input - together with initial road database entries - for the road verification process. If the entries of the database are too obsolete or not available at all, sensor data evaluation enables classification of the road pixels of the elevation map, followed by road map extraction by means of vectorization and filtering of the geometrically and topologically inconsistent objects. Depending on time constraints and the availability of a geo-database for buildings, the urban terrain reconstruction procedure outputs semantic models of buildings, trees, and ground. Buildings and ground are textured by means of available images. This facilitates orientation in the model and the interactive verification of the road objects that were initially classified as unknown. The three main modules of the texturing algorithm are: pose estimation (if the videos are not geo-referenced), occlusion analysis, and texture synthesis.
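The mapping from the two per-method distributions to the three output states can be illustrated with a small Dempster-Shafer sketch: mass not assigned to an applicable model goes to the full frame ("unknown"), and evidence from two hypothetical methods is fused with Dempster's rule. The probabilities below are invented and the mapping is one plausible reading of the description above, not the paper's exact formulation.

def to_mass(p_correct, p_applicable):
    """Map object-state and model-state probabilities to a mass function over
    {correct}, {incorrect}, and the full frame ('unknown')."""
    return {
        "correct": p_correct * p_applicable,
        "incorrect": (1 - p_correct) * p_applicable,
        "unknown": 1 - p_applicable,        # mass assigned to the whole frame
    }

def combine(m1, m2):
    """Dempster's rule of combination on the frame {correct, incorrect}."""
    # Conflict: one source supports 'correct', the other 'incorrect'
    k = m1["correct"] * m2["incorrect"] + m1["incorrect"] * m2["correct"]
    norm = 1 - k
    m = {
        "correct": (m1["correct"] * m2["correct"]
                    + m1["correct"] * m2["unknown"]
                    + m1["unknown"] * m2["correct"]) / norm,
        "incorrect": (m1["incorrect"] * m2["incorrect"]
                      + m1["incorrect"] * m2["unknown"]
                      + m1["unknown"] * m2["incorrect"]) / norm,
    }
    m["unknown"] = 1 - m["correct"] - m["incorrect"]
    return m

m_a = to_mass(p_correct=0.8, p_applicable=0.9)   # hypothetical verification method A
m_b = to_mass(p_correct=0.6, p_applicable=0.5)   # hypothetical verification method B
print(combine(m_a, m_b))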
Seaver, Samuel M. D.; Gerdes, Svetlana; Frelin, Océane; Lerma-Ortiz, Claudia; Bradbury, Louis M. T.; Zallot, Rémi; Hasnain, Ghulam; Niehaus, Thomas D.; El Yacoubi, Basma; Pasternak, Shiran; Olson, Robert; Pusch, Gordon; Overbeek, Ross; Stevens, Rick; de Crécy-Lagard, Valérie; Ware, Doreen; Hanson, Andrew D.; Henry, Christopher S.
2014-01-01
The increasing number of sequenced plant genomes is placing new demands on the methods applied to analyze, annotate, and model these genomes. Today’s annotation pipelines result in inconsistent gene assignments that complicate comparative analyses and prevent efficient construction of metabolic models. To overcome these problems, we have developed the PlantSEED, an integrated, metabolism-centric database to support subsystems-based annotation and metabolic model reconstruction for plant genomes. PlantSEED combines SEED subsystems technology, first developed for microbial genomes, with refined protein families and biochemical data to assign fully consistent functional annotations to orthologous genes, particularly those encoding primary metabolic pathways. Seamless integration with its parent, the prokaryotic SEED database, makes PlantSEED a unique environment for cross-kingdom comparative analysis of plant and bacterial genomes. The consistent annotations imposed by PlantSEED permit rapid reconstruction and modeling of primary metabolism for all plant genomes in the database. This feature opens the unique possibility of model-based assessment of the completeness and accuracy of gene annotation and thus allows computational identification of genes and pathways that are restricted to certain genomes or need better curation. We demonstrate the PlantSEED system by producing consistent annotations for 10 reference genomes. We also produce a functioning metabolic model for each genome, gapfilling to identify missing annotations and proposing gene candidates for missing annotations. Models are built around an extended biomass composition representing the most comprehensive published to date. To our knowledge, our models are the first to be published for seven of the genomes analyzed. PMID:24927599
Seaver, Samuel M D; Gerdes, Svetlana; Frelin, Océane; Lerma-Ortiz, Claudia; Bradbury, Louis M T; Zallot, Rémi; Hasnain, Ghulam; Niehaus, Thomas D; El Yacoubi, Basma; Pasternak, Shiran; Olson, Robert; Pusch, Gordon; Overbeek, Ross; Stevens, Rick; de Crécy-Lagard, Valérie; Ware, Doreen; Hanson, Andrew D; Henry, Christopher S
2014-07-01
The increasing number of sequenced plant genomes is placing new demands on the methods applied to analyze, annotate, and model these genomes. Today's annotation pipelines result in inconsistent gene assignments that complicate comparative analyses and prevent efficient construction of metabolic models. To overcome these problems, we have developed the PlantSEED, an integrated, metabolism-centric database to support subsystems-based annotation and metabolic model reconstruction for plant genomes. PlantSEED combines SEED subsystems technology, first developed for microbial genomes, with refined protein families and biochemical data to assign fully consistent functional annotations to orthologous genes, particularly those encoding primary metabolic pathways. Seamless integration with its parent, the prokaryotic SEED database, makes PlantSEED a unique environment for cross-kingdom comparative analysis of plant and bacterial genomes. The consistent annotations imposed by PlantSEED permit rapid reconstruction and modeling of primary metabolism for all plant genomes in the database. This feature opens the unique possibility of model-based assessment of the completeness and accuracy of gene annotation and thus allows computational identification of genes and pathways that are restricted to certain genomes or need better curation. We demonstrate the PlantSEED system by producing consistent annotations for 10 reference genomes. We also produce a functioning metabolic model for each genome, gapfilling to identify missing annotations and proposing gene candidates for missing annotations. Models are built around an extended biomass composition representing the most comprehensive published to date. To our knowledge, our models are the first to be published for seven of the genomes analyzed.
Research on the spatial analysis method of seismic hazard for island
NASA Astrophysics Data System (ADS)
Jia, Jing; Jiang, Jitong; Zheng, Qiuhong; Gao, Huiying
2017-05-01
Seismic hazard analysis (SHA) is a key component of earthquake disaster prevention for island engineering: microscopically, its results provide parameters for seismic design, and macroscopically it is prerequisite work for the earthquake and comprehensive disaster prevention components of island conservation planning, in the exploitation and construction process of both inhabited and uninhabited islands. The existing seismic hazard analysis methods are compared in their application, and their applicability and limitations for islands are analysed. A specialized spatial analysis method of seismic hazard for islands (SAMSHI) is then given to support further work on earthquake disaster prevention planning, based on spatial analysis tools in GIS and a fuzzy comprehensive evaluation model. The basic spatial database of SAMSHI includes fault data, historical earthquake records, geological data and Bouguer gravity anomaly data, which are the data sources for the 11 indices of the fuzzy comprehensive evaluation model; these indices are calculated by the spatial analysis model constructed in ArcGIS's Model Builder platform.
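A minimal numerical sketch of a fuzzy comprehensive evaluation step such as the one described above: a membership matrix for 11 indices against a set of hazard grades is combined with index weights using the weighted-average operator. The matrix, weights and number of grades are invented for illustration and are not SAMSHI's actual values.

import numpy as np

# Hypothetical membership matrix R: 11 indices (rows) evaluated against
# 4 hazard grades (columns), e.g. derived from fault, seismicity, geology
# and Bouguer gravity anomaly layers in a GIS.
R = np.random.default_rng(2).dirichlet(np.ones(4), size=11)

# Hypothetical index weights (sum to 1), e.g. from expert judgement
w = np.full(11, 1 / 11)

# Weighted-average fuzzy operator: B = w . R
B = w @ R
hazard_grade = int(np.argmax(B)) + 1
print("membership in each grade:", np.round(B, 3))
print("assigned seismic hazard grade:", hazard_grade)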
BioModels Database: a repository of mathematical models of biological processes.
Chelliah, Vijayalakshmi; Laibe, Camille; Le Novère, Nicolas
2013-01-01
BioModels Database is a public online resource that allows storing and sharing of published, peer-reviewed quantitative, dynamic models of biological processes. The model components and behaviour are thoroughly checked to correspond to the original publication and manually curated to ensure reliability. Furthermore, the model elements are annotated with terms from controlled vocabularies as well as linked to relevant external data resources. This greatly helps in model interpretation and reuse. Models are accepted in SBML and CellML formats, stored in SBML, and are available for download in various other common formats such as BioPAX, Octave, SciLab, VCML, XPP and PDF, in addition to SBML. The reaction network diagram of the models is also available in several formats. BioModels Database features a search engine, which provides simple and more advanced searches. Features such as online simulation and creation of smaller models (submodels) from the selected model elements of a larger one are provided. BioModels Database can be accessed both via a web interface and programmatically via web services. New models are added to BioModels Database in regular releases, about every 4 months.
Simonyan, Vahan; Chumakov, Konstantin; Dingerdissen, Hayley; Faison, William; Goldweber, Scott; Golikov, Anton; Gulzar, Naila; Karagiannis, Konstantinos; Vinh Nguyen Lam, Phuc; Maudru, Thomas; Muravitskaja, Olesja; Osipova, Ekaterina; Pan, Yang; Pschenichnov, Alexey; Rostovtsev, Alexandre; Santana-Quintero, Luis; Smith, Krista; Thompson, Elaine E.; Tkachenko, Valery; Torcivia-Rodriguez, John; Wan, Quan; Wang, Jing; Wu, Tsung-Jung; Wilson, Carolyn; Mazumder, Raja
2016-01-01
The High-performance Integrated Virtual Environment (HIVE) is a distributed storage and compute environment designed primarily to handle next-generation sequencing (NGS) data. This multicomponent cloud infrastructure provides secure web access for authorized users to deposit, retrieve, annotate and compute on NGS data, and to analyse the outcomes using web interface visual environments appropriately built in collaboration with research and regulatory scientists and other end users. Unlike many massively parallel computing environments, HIVE uses a cloud control server which virtualizes services, not processes. It is both very robust and flexible due to the abstraction layer introduced between computational requests and operating system processes. The novel paradigm of moving computations to the data, instead of moving data to computational nodes, has proven to be significantly less taxing for both hardware and network infrastructure. The honeycomb data model developed for HIVE integrates metadata into an object-oriented model. Its distinction from other object-oriented databases is in the additional implementation of a unified application program interface to search, view and manipulate data of all types. This model simplifies the introduction of new data types, thereby minimizing the need for database restructuring and streamlining the development of new integrated information systems. The honeycomb model employs a highly secure hierarchical access control and permission system, allowing determination of data access privileges in a finely granular manner without flooding the security subsystem with a multiplicity of rules. HIVE infrastructure will allow engineers and scientists to perform NGS analysis in a manner that is both efficient and secure. HIVE is actively supported in public and private domains, and project collaborations are welcomed. Database URL: https://hive.biochemistry.gwu.edu PMID:26989153
Simonyan, Vahan; Chumakov, Konstantin; Dingerdissen, Hayley; Faison, William; Goldweber, Scott; Golikov, Anton; Gulzar, Naila; Karagiannis, Konstantinos; Vinh Nguyen Lam, Phuc; Maudru, Thomas; Muravitskaja, Olesja; Osipova, Ekaterina; Pan, Yang; Pschenichnov, Alexey; Rostovtsev, Alexandre; Santana-Quintero, Luis; Smith, Krista; Thompson, Elaine E; Tkachenko, Valery; Torcivia-Rodriguez, John; Voskanian, Alin; Wan, Quan; Wang, Jing; Wu, Tsung-Jung; Wilson, Carolyn; Mazumder, Raja
2016-01-01
The High-performance Integrated Virtual Environment (HIVE) is a distributed storage and compute environment designed primarily to handle next-generation sequencing (NGS) data. This multicomponent cloud infrastructure provides secure web access for authorized users to deposit, retrieve, annotate and compute on NGS data, and to analyse the outcomes using web interface visual environments appropriately built in collaboration with research and regulatory scientists and other end users. Unlike many massively parallel computing environments, HIVE uses a cloud control server which virtualizes services, not processes. It is both very robust and flexible due to the abstraction layer introduced between computational requests and operating system processes. The novel paradigm of moving computations to the data, instead of moving data to computational nodes, has proven to be significantly less taxing for both hardware and network infrastructure.The honeycomb data model developed for HIVE integrates metadata into an object-oriented model. Its distinction from other object-oriented databases is in the additional implementation of a unified application program interface to search, view and manipulate data of all types. This model simplifies the introduction of new data types, thereby minimizing the need for database restructuring and streamlining the development of new integrated information systems. The honeycomb model employs a highly secure hierarchical access control and permission system, allowing determination of data access privileges in a finely granular manner without flooding the security subsystem with a multiplicity of rules. HIVE infrastructure will allow engineers and scientists to perform NGS analysis in a manner that is both efficient and secure. HIVE is actively supported in public and private domains, and project collaborations are welcomed. Database URL: https://hive.biochemistry.gwu.edu. © The Author(s) 2016. Published by Oxford University Press.
Genetic analysis of the cytoplasmic dynein subunit families.
Pfister, K Kevin; Shah, Paresh R; Hummerich, Holger; Russ, Andreas; Cotton, James; Annuar, Azlina Ahmad; King, Stephen M; Fisher, Elizabeth M C
2006-01-01
Cytoplasmic dyneins, the principal microtubule minus-end-directed motor proteins of the cell, are involved in many essential cellular processes. The major form of this enzyme is a complex of at least six protein subunits, and in mammals all but one of the subunits are encoded by at least two genes. Here we review current knowledge concerning the subunits, their interactions, and their functional roles as derived from biochemical and genetic analyses. We also carried out extensive database searches to look for new genes and to clarify anomalies in the databases. Our analysis documents evolutionary relationships among the dynein subunits of mammals and other model organisms, and sheds new light on the role of this diverse group of proteins, highlighting the existence of two cytoplasmic dynein complexes with distinct cellular roles.
Genetic Analysis of the Cytoplasmic Dynein Subunit Families
Pfister, K. Kevin; Shah, Paresh R; Hummerich, Holger; Russ, Andreas; Cotton, James; Annuar, Azlina Ahmad; King, Stephen M; Fisher, Elizabeth M. C
2006-01-01
Cytoplasmic dyneins, the principal microtubule minus-end-directed motor proteins of the cell, are involved in many essential cellular processes. The major form of this enzyme is a complex of at least six protein subunits, and in mammals all but one of the subunits are encoded by at least two genes. Here we review current knowledge concerning the subunits, their interactions, and their functional roles as derived from biochemical and genetic analyses. We also carried out extensive database searches to look for new genes and to clarify anomalies in the databases. Our analysis documents evolutionary relationships among the dynein subunits of mammals and other model organisms, and sheds new light on the role of this diverse group of proteins, highlighting the existence of two cytoplasmic dynein complexes with distinct cellular roles. PMID:16440056
Fan, Yaofu; Wang, Kun; Xu, Shuhang; Chen, Guofang; Di, Hongjie; Cao, Meng; Liu, Chao
2014-01-01
Recently, a number of studies have reported an association between the single nucleotide polymorphism (SNP) +45T>G in the adiponectin (ADIPOQ) gene and type 2 diabetes mellitus (T2DM) risk, though the results are inconsistent. In order to obtain a more precise estimate of the relationship, a meta-analysis was performed. In this study, the Medline, Embase, Pubmed, ISI Web of Knowledge, Ovid, Science Citation Index Expanded Database, Wanfang Database, and China National Knowledge Infrastructure were searched for eligible studies. Odds ratios (ORs) with 95% confidence intervals (CIs) were used to estimate the strength of association. Forty-five publications were included in the final meta-analysis, with 9986 T2DM patients and 16,222 controls for the ADIPOQ +45T>G polymorphism, according to our inclusion and exclusion criteria. The +45T>G polymorphism was associated with an overall significantly increased risk of T2DM (G vs. T: OR = 1.18, 95% CI = 1.06–1.32; dominant model: OR = 1.18, 95% CI = 1.03–1.33; recessive model: OR = 1.47, 95% CI = 1.20–1.78; homozygous model: OR = 1.62, 95% CI = 1.25–2.09); the heterozygous model did not reach significance (OR = 1.11, 95% CI = 0.98–1.24). Subgroup analysis revealed a significant association between the +45T>G polymorphism and T2DM in Asian populations. Thus, this meta-analysis indicates that the G allele of the ADIPOQ +45T>G polymorphism is associated with a significantly increased risk of T2DM in the Asian population. PMID:25561226
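For readers unfamiliar with the genetic contrasts quoted above, the following sketch computes allele, dominant and recessive model odds ratios with 95% CIs from hypothetical genotype counts; the counts do not come from the included studies.

import numpy as np

def odds_ratio(a, b, c, d):
    """OR with 95% CI from a 2x2 table: a=exposed cases, b=exposed controls,
    c=unexposed cases, d=unexposed controls."""
    or_ = (a * d) / (b * c)
    se = np.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo, hi = np.exp(np.log(or_) - 1.96 * se), np.exp(np.log(or_) + 1.96 * se)
    return round(or_, 2), (round(lo, 2), round(hi, 2))

# Hypothetical genotype counts (TT, TG, GG) in cases and controls
cases = {"TT": 400, "TG": 420, "GG": 180}
ctrls = {"TT": 520, "TG": 400, "GG": 80}

# Allele model: G vs T
g_case, t_case = 2 * cases["GG"] + cases["TG"], 2 * cases["TT"] + cases["TG"]
g_ctrl, t_ctrl = 2 * ctrls["GG"] + ctrls["TG"], 2 * ctrls["TT"] + ctrls["TG"]
print("allele   ", odds_ratio(g_case, g_ctrl, t_case, t_ctrl))

# Dominant model: TG+GG vs TT
print("dominant ", odds_ratio(cases["TG"] + cases["GG"], ctrls["TG"] + ctrls["GG"],
                              cases["TT"], ctrls["TT"]))

# Recessive model: GG vs TT+TG
print("recessive", odds_ratio(cases["GG"], ctrls["GG"],
                              cases["TT"] + cases["TG"], ctrls["TT"] + ctrls["TG"]))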
Digital Dental X-ray Database for Caries Screening
NASA Astrophysics Data System (ADS)
Rad, Abdolvahab Ehsani; Rahim, Mohd Shafry Mohd; Rehman, Amjad; Saba, Tanzila
2016-06-01
A standard database is an essential requirement for comparing the performance of image analysis techniques. A major issue in dental image analysis has been the lack of such an available image database, which is provided in this paper. Periapical dental X-ray images, suitable for analysis and approved by many dental experts, were collected. This type of dental radiograph imaging is common and inexpensive, and is normally used for dental disease diagnosis and abnormality detection. The database contains 120 periapical X-ray images covering the upper and lower jaws. This digital dental database is constructed to provide a source for researchers to use when comparing image analysis techniques and improving the performance of each technique.
Building an efficient curation workflow for the Arabidopsis literature corpus
Li, Donghui; Berardini, Tanya Z.; Muller, Robert J.; Huala, Eva
2012-01-01
TAIR (The Arabidopsis Information Resource) is the model organism database (MOD) for Arabidopsis thaliana, a model plant with a literature corpus of about 39 000 articles in PubMed, with over 4300 new articles added in 2011. We have developed a literature curation workflow incorporating both automated and manual elements to cope with this flood of new research articles. The current workflow can be divided into two phases: article selection and curation. Structured controlled vocabularies, such as the Gene Ontology and Plant Ontology are used to capture free text information in the literature as succinct ontology-based annotations suitable for the application of computational analysis methods. We also describe our curation platform and the use of text mining tools in our workflow. Database URL: www.arabidopsis.org PMID:23221298
Clinical study of the Erlanger silver catheter--data management and biometry.
Martus, P; Geis, C; Lugauer, S; Böswald, M; Guggenbichler, J P
1999-01-01
The clinical evaluation of venous catheters for catheter-induced infections must conform to a strict biometric methodology. The statistical planning of the study (target population, design, degree of blinding), data management (database design, definition of variables, coding), quality assurance (data inspection at several levels) and the biometric evaluation of the Erlanger silver catheter project are described. The three-step data flow included: 1) primary data from the hospital, 2) relational database, 3) files accessible for statistical evaluation. Two different statistical models were compared: analyzing the first catheter only of a patient in the analysis (independent data) and analyzing several catheters from the same patient (dependent data) by means of the generalized estimating equations (GEE) method. The main result of the study was based on the comparison of both statistical models.
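A minimal sketch of the two statistical models being compared, using hypothetical catheter data and statsmodels: a logistic regression restricted to each patient's first catheter (independent data) versus a GEE with exchangeable within-patient correlation for all catheters (dependent data). Variable names and infection rates are invented.

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
rows = []
for pid in range(150):                                   # hypothetical patients
    for _ in range(int(rng.integers(1, 4))):             # 1-3 catheters per patient
        silver = int(rng.integers(0, 2))                 # silver catheter yes/no
        p_inf = 0.10 if silver else 0.18                 # invented infection risks
        rows.append({"patient": pid, "silver": silver,
                     "infected": int(rng.random() < p_inf)})
df = pd.DataFrame(rows)

# Model 1: only the first catheter of each patient (independent observations)
first = df.groupby("patient", as_index=False).first()
fit_first = smf.logit("infected ~ silver", data=first).fit(disp=0)

# Model 2: all catheters, GEE with exchangeable correlation within patients
fit_gee = smf.gee("infected ~ silver", groups="patient", data=df,
                  family=sm.families.Binomial(),
                  cov_struct=sm.cov_struct.Exchangeable()).fit()

print(fit_first.params)
print(fit_gee.params)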
CARINA data synthesis project: pH data scale unification and cruise adjustments
NASA Astrophysics Data System (ADS)
Velo, A.; Pérez, F. F.; Lin, X.; Key, R. M.; Tanhua, T.; de La Paz, M.; van Heuven, S.; Jutterström, S.; Ríos, A. F.
2009-10-01
Data on carbon and carbon-relevant hydrographic and hydrochemical parameters from previously non-publicly available cruise data sets in the Arctic Mediterranean Seas (AMS), Atlantic and Southern Ocean have been retrieved and merged into a new database: CARINA (CARbon IN the Atlantic). These data have gone through rigorous quality control (QC) procedures to assure the highest possible quality and consistency. The data for most of the measured parameters in the CARINA database were objectively examined in order to quantify systematic differences in the reported values, i.e. secondary quality control. Systematic biases found in the data have been corrected in the data products, i.e. three merged data files with measured, calculated and interpolated data for each of the three CARINA regions: AMS, Atlantic and Southern Ocean. Out of a total of 188 cruise entries in the CARINA database, 59 reported measured pH values. Here we present details of the secondary QC on pH for the CARINA database. Procedures of quality control, including crossover analysis between cruises and inversion analysis of all crossover data, are briefly described. Adjustments were applied to the pH values for 21 of the cruises in the CARINA dataset. With these adjustments the CARINA database is consistent both internally and with GLODAP data, an oceanographic data set based on the World Hydrographic Program in the 1990s. Based on our analysis we estimate the internal accuracy of the CARINA pH data to be 0.005 pH units. The CARINA data are now suitable for accurate assessments of, for example, oceanic carbon inventories and uptake rates, and for model validation.
Ghandikota, Sudhir; Hershey, Gurjit K Khurana; Mersha, Tesfaye B
2018-03-24
Advances in high-throughput sequencing technologies have made it possible to generate multiple omics data at an unprecedented rate and scale. The accumulation of these omics data far outpaces the rate at which biologists can mine them and generate new hypotheses to test experimentally. There is an urgent need to develop a myriad of powerful tools to efficiently and effectively search and filter these resources to address specific post-GWAS functional genomics questions. However, to date, these resources are scattered across several databases and often lack a unified portal for data annotation and analytics. In addition, existing tools for analyzing and visualizing these databases are highly fragmented, forcing researchers to access multiple applications and perform manual interventions for each gene or variant in an ad hoc fashion until all the questions are answered. In this study, we present GENEASE, a web-based one-stop bioinformatics tool designed not only to query and explore multi-omics and phenotype databases (e.g., GTEx, ClinVar, dbGaP, GWAS Catalog, ENCODE, Roadmap Epigenomics, KEGG, Reactome, Gene and Phenotype Ontology) in a single web interface but also to perform seamless post genome-wide association downstream functional and overlap analysis for non-coding regulatory variants. GENEASE accesses over 50 different databases in the public domain, including model organism-specific databases, to facilitate gene/variant and disease exploration, enrichment and overlap analysis in real time. It is a user-friendly tool with a point-and-click interface containing links to support information including a user manual and examples. GENEASE can be accessed freely at http://research.cchmc.org/mershalab/genease_new/login.html. Tesfaye.Mersha@cchmc.org, Sudhir.Ghandikota@cchmc.org. Supplementary data are available at Bioinformatics online.
A dedicated database system for handling multi-level data in systems biology.
Pornputtapong, Natapol; Wanichthanarak, Kwanjeera; Nilsson, Avlant; Nookaew, Intawat; Nielsen, Jens
2014-01-01
Advances in high-throughput technologies have enabled extensive generation of multi-level omics data. These data are crucial for systems biology research, though they are complex, heterogeneous, highly dynamic, incomplete and distributed among public databases. This leads to difficulties in data accessibility and often results in errors when data are merged and integrated from varied resources. Therefore, integration and management of systems biological data remain very challenging. To overcome this, we designed and developed a dedicated database system that can serve and solve the vital issues in data management and hereby facilitate data integration, modeling and analysis in systems biology within a sole database. In addition, a yeast data repository was implemented as an integrated database environment which is operated by the database system. Two applications were implemented to demonstrate extensibility and utilization of the system. Both illustrate how the user can access the database via the web query function and implemented scripts. These scripts are specific for two sample cases: 1) Detecting the pheromone pathway in protein interaction networks; and 2) Finding metabolic reactions regulated by Snf1 kinase. In this study we present the design of database system which offers an extensible environment to efficiently capture the majority of biological entities and relations encountered in systems biology. Critical functions and control processes were designed and implemented to ensure consistent, efficient, secure and reliable transactions. The two sample cases on the yeast integrated data clearly demonstrate the value of a sole database environment for systems biology research.
Xin, J Y; Cong, H L
2018-06-24
Objective: To explore the association between genetic polymorphisms of rs2073617T/C (950T/C) and rs2073618G/C (1181G/C) in the osteoprotegerin gene and cardiovascular disease with meta-analysis. Methods: A computer-based search for studies of the relationship between genetic polymorphisms of rs2073617T/C and rs2073618G/C in the osteoprotegerin gene and cardiovascular disease was performed in electronic databases including China National Knowledge Infrastructure (CNKI), China Biomedical Literature Database, Wanfang Database, Chinese Journal Full-text Database, Embase, PubMed, and Cochrane Library, supplemented by manual search, from database inception to February 28, 2017. The quality of the included studies was assessed by the Newcastle-Ottawa Scale (NOS) scoring system. Data were analyzed using STATA 12.0 software. Results: Eleven clinical case-control studies that enrolled 2 115 patients with cardiovascular disease and 1 467 healthy subjects were included. The results indicated that the osteoprotegerin gene polymorphisms rs2073617T/C and rs2073618G/C might be closely associated with susceptibility to cardiovascular disease (rs2073617T/C allele model: OR= 0.79, 95% CI 0.73-0.87, P= 0.001; rs2073618G/C mutant (M) vs. wild-type (W) allele: OR= 0.83, 95% CI 0.74-0.92, P= 0.001). The osteoprotegerin gene polymorphisms rs2073617T/C and rs2073618G/C were significantly related to the incidence of coronary artery disease and acute coronary syndrome (coronary artery disease allele model: OR= 0.83, 95% CI 0.75-0.92, P= 0.001; acute coronary syndrome allele model: OR= 0.73, 95% CI 0.62-0.85, P< 0.001). However, there was no significant correlation between the genetic polymorphisms of these two sites and the number of diseased coronary vessels (rs2073617T/C allele model: OR= 1.00, 95% CI 0.81-1.24, P= 0.985; rs2073618G/C allele model: OR= 0.98, 95% CI 0.80-1.21, P= 0.626). Polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) and polymerase chain reaction-ligase detection reaction (PCR-LDR) studies evidenced the association between osteoprotegerin gene polymorphisms and cardiovascular disease (allele model: OR= 0.79, 95% CI 0.72-0.86, P< 0.001), but no obvious relationship was found with fluorogenic quantitative detection and molecular probe methods (allele model: OR= 0.86, 95% CI 0.65-1.12, P= 0.263). Conclusion: This meta-analysis indicates that the osteoprotegerin gene polymorphisms rs2073617T/C and rs2073618G/C may be closely related to an increased risk of cardiovascular disease.
Enabling comparative modeling of closely related genomes: Example genus Brucella
Faria, José P.; Edirisinghe, Janaka N.; Davis, James J.; ...
2014-03-08
For many scientific applications, it is highly desirable to be able to compare metabolic models of closely related genomes. In this study, we attempt to raise awareness of the fact that taking annotated genomes from public repositories and using them for metabolic model reconstructions is far from trivial due to annotation inconsistencies. We propose a protocol for comparative analysis of metabolic models on closely related genomes, using fifteen strains of genus Brucella, which contains pathogens of both humans and livestock. This study led to the identification and subsequent correction of inconsistent annotations in the SEED database, as well as the identification of 31 biochemical reactions that are common to Brucella but were not originally identified by automated metabolic reconstructions. We are currently implementing this protocol for improving automated annotations within the SEED database, and these improvements have been propagated into PATRIC, Model-SEED, KBase and RAST. This method is an enabling step for the future creation of consistent annotation systems and high-quality model reconstructions that will support the prediction of accurate phenotypes such as pathogenicity, media requirements or type of respiration.
Enabling comparative modeling of closely related genomes: Example genus Brucella
DOE Office of Scientific and Technical Information (OSTI.GOV)
Faria, José P.; Edirisinghe, Janaka N.; Davis, James J.
For many scientific applications, it is highly desirable to be able to compare metabolic models of closely related genomes. In this study, we attempt to raise awareness of the fact that taking annotated genomes from public repositories and using them for metabolic model reconstructions is far from trivial due to annotation inconsistencies. We propose a protocol for comparative analysis of metabolic models on closely related genomes, using fifteen strains of genus Brucella, which contains pathogens of both humans and livestock. This study led to the identification and subsequent correction of inconsistent annotations in the SEED database, as well as the identification of 31 biochemical reactions that are common to Brucella but were not originally identified by automated metabolic reconstructions. We are currently implementing this protocol for improving automated annotations within the SEED database, and these improvements have been propagated into PATRIC, Model-SEED, KBase and RAST. This method is an enabling step for the future creation of consistent annotation systems and high-quality model reconstructions that will support the prediction of accurate phenotypes such as pathogenicity, media requirements or type of respiration.
MOCCA-SURVEY Database I: Is NGC 6535 a dark star cluster harbouring an IMBH?
NASA Astrophysics Data System (ADS)
Askar, Abbas; Bianchini, Paolo; de Vita, Ruggero; Giersz, Mirek; Hypki, Arkadiusz; Kamann, Sebastian
2017-01-01
We describe the dynamical evolution of a unique type of dark star cluster model in which the majority of the cluster mass at Hubble time is dominated by an intermediate-mass black hole (IMBH). We analysed results from about 2000 star cluster models (Survey Database I) simulated using the Monte Carlo code MOnte Carlo Cluster simulAtor and identified these dark star cluster models. Taking one of these models, we apply the method of simulating realistic `mock observations' by utilizing the Cluster simulatiOn Comparison with ObservAtions (COCOA) and Simulating Stellar Cluster Observation (SISCO) codes to obtain the photometric and kinematic observational properties of the dark star cluster model at 12 Gyr. We find that the perplexing Galactic globular cluster NGC 6535 closely matches the observational photometric and kinematic properties of the dark star cluster model presented in this paper. Based on our analysis and currently observed properties of NGC 6535, we suggest that this globular cluster could potentially harbour an IMBH. If it exists, the presence of this IMBH can be detected robustly with proposed kinematic observations of NGC 6535.
Survey data and metadata modelling using document-oriented NoSQL
NASA Astrophysics Data System (ADS)
Rahmatuti Maghfiroh, Lutfi; Gusti Bagus Baskara Nugraha, I.
2018-03-01
Survey data collected from year to year are subject to metadata changes, yet they need to be stored in an integrated way so that statistical data can be obtained faster and more easily. A data warehouse (DW) can be used for this purpose. However, there is a change of variables in every period that cannot be accommodated by a traditional DW, which cannot handle variable changes via Slowly Changing Dimensions (SCD). Previous research handled the change of variables in a DW and managed metadata by using a multiversion DW (MVDW), designed using the relational model. Other studies have found that a non-relational model in a NoSQL database offers faster read times than the relational model. Therefore, we propose managing metadata changes by using NoSQL. This study proposes a DW model to manage change and algorithms to retrieve data with metadata changes. Evaluation of the proposed models and algorithms shows that a database with the proposed design can retrieve data with metadata changes properly. This paper contributes to comprehensive data analysis with metadata changes (especially survey data) in integrated storage.
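A small illustration (written here in plain Python) of the document-oriented idea described above: each survey round is stored as a self-describing document, and per-year metadata documents record variable changes so old and new rounds can still be queried together. The field names and the renaming rule are hypothetical, not the paper's actual schema.

# Each survey round is a self-describing document, so variables can appear,
# disappear, or be renamed between years without a schema migration.
surveys = [
    {"year": 2016, "respondent": 1, "income": 5_000_000, "sector": "agriculture"},
    {"year": 2017, "respondent": 1, "income": 5_400_000, "sector": "agriculture",
     "internet_access": True},                                # variable added this year
    {"year": 2018, "respondent": 1, "monthly_income": 470_000,   # variable renamed
     "internet_access": True},
]

# Per-year metadata documents record renamed variables and their units
metadata = {
    2018: {"monthly_income": {"replaces": "income", "unit": "per month"}},
}

def get_income(doc):
    """Resolve the income variable across metadata changes when reading any round."""
    for field, info in metadata.get(doc["year"], {}).items():
        if info.get("replaces") == "income" and field in doc:
            return doc[field] * 12          # convert back to a yearly figure
    return doc.get("income")

for doc in surveys:
    print(doc["year"], get_income(doc))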
Fourches, Denis; Muratov, Eugene; Tropsha, Alexander
2010-01-01
Molecular modelers and cheminformaticians typically analyze experimental data generated by other scientists. Consequently, when it comes to data accuracy, cheminformaticians are always at the mercy of data providers who may inadvertently publish (partially) erroneous data. Thus, dataset curation is crucial for any cheminformatics analysis such as similarity searching, clustering, QSAR modeling, virtual screening, etc., especially nowadays when the availability of chemical datasets in public domain has skyrocketed in recent years. Despite the obvious importance of this preliminary step in the computational analysis of any dataset, there appears to be no commonly accepted guidance or set of procedures for chemical data curation. The main objective of this paper is to emphasize the need for a standardized chemical data curation strategy that should be followed at the onset of any molecular modeling investigation. Herein, we discuss several simple but important steps for cleaning chemical records in a database including the removal of a fraction of the data that cannot be appropriately handled by conventional cheminformatics techniques. Such steps include the removal of inorganic and organometallic compounds, counterions, salts and mixtures; structure validation; ring aromatization; normalization of specific chemotypes; curation of tautomeric forms; and the deletion of duplicates. To emphasize the importance of data curation as a mandatory step in data analysis, we discuss several case studies where chemical curation of the original “raw” database enabled the successful modeling study (specifically, QSAR analysis) or resulted in a significant improvement of model's prediction accuracy. We also demonstrate that in some cases rigorously developed QSAR models could be even used to correct erroneous biological data associated with chemical compounds. We believe that good practices for curation of chemical records outlined in this paper will be of value to all scientists working in the fields of molecular modeling, cheminformatics, and QSAR studies. PMID:20572635
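Several of the curation steps listed above can be sketched with an open-source cheminformatics toolkit; the following example assumes RDKit is available and illustrates structure validation, salt/counterion stripping, removal of mixtures and non-organic species, canonicalization and duplicate removal. The allowed-element list and the order of steps are choices made for this sketch, not a prescribed workflow.

from rdkit import Chem
from rdkit.Chem.SaltRemover import SaltRemover

# Elements commonly kept in organic small-molecule datasets (an assumption)
ORGANIC = {1, 5, 6, 7, 8, 9, 14, 15, 16, 17, 34, 35, 53}
remover = SaltRemover()

def curate(smiles_list):
    seen, curated = set(), []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:                               # structure validation
            continue
        mol = remover.StripMol(mol)                   # remove counterions / salts
        if mol.GetNumAtoms() == 0:
            continue
        if "." in Chem.MolToSmiles(mol):              # still a mixture: drop
            continue
        if any(a.GetAtomicNum() not in ORGANIC for a in mol.GetAtoms()):
            continue                                  # inorganic / organometallic
        canon = Chem.MolToSmiles(mol)                 # canonical, aromatized SMILES
        if canon in seen:                             # duplicate removal
            continue
        seen.add(canon)
        curated.append(canon)
    return curated

print(curate(["CCO", "CCO.Cl", "c1ccccc1O", "[Na+].[Cl-]", "CCO"]))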
The landslide database for Germany: Closing the gap at national level
NASA Astrophysics Data System (ADS)
Damm, Bodo; Klose, Martin
2015-11-01
The Federal Republic of Germany has long been among the few European countries that lack a national landslide database. Systematic collection and inventory of landslide data still has a long research history in Germany, but one focussed on the development of databases with local or regional coverage. This has changed in recent years with the launch of a database initiative aimed at closing the data gap existing at national level. The present paper reports on this project that is based on a landslide database which evolved over the last 15 years to a database covering large parts of Germany. A strategy of systematic retrieval, extraction, and fusion of landslide data is at the heart of the methodology, providing the basis for a database with a broad potential of application. The database offers a data pool of more than 4,200 landslide data sets with over 13,000 single data files and dates back to the 12th century. All types of landslides are covered by the database, which stores not only core attributes, but also various complementary data, including data on landslide causes, impacts, and mitigation. The current database migration to PostgreSQL/PostGIS is focused on unlocking the full scientific potential of the database, while enabling data sharing and knowledge transfer via a web GIS platform. In this paper, the goals and the research strategy of the database project are highlighted at first, with a summary of best practices in database development providing perspective. Next, the focus is on key aspects of the methodology, which is followed by the results of three case studies in the German Central Uplands. The case study results exemplify database application in the analysis of landslide frequency and causes, impact statistics, and landslide susceptibility modeling. Using the example of these case studies, strengths and weaknesses of the database are discussed in detail. The paper concludes with a summary of the database project with regard to previous achievements and the strategic roadmap.
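The paper mentions an ongoing migration to PostgreSQL/PostGIS with web-GIS data sharing. As a rough sketch of what a spatially enabled landslide table could look like (the table layout, column names and connection string are assumptions, not the project's actual schema), here is a Python/psycopg2 example:

import psycopg2

# Hypothetical schema for a spatially enabled landslide inventory table
DDL = """
CREATE EXTENSION IF NOT EXISTS postgis;
CREATE TABLE IF NOT EXISTS landslide_event (
    event_id     serial PRIMARY KEY,
    event_date   date,
    ls_type      text,      -- e.g. fall, slide, flow
    trigger_type text,      -- e.g. rainfall, snowmelt, human activity
    damage_eur   numeric,   -- reported direct damage
    source_ref   text,      -- archive file, newspaper, survey report, ...
    geom         geometry(Point, 4326)
);
CREATE INDEX IF NOT EXISTS landslide_event_gix
    ON landslide_event USING gist (geom);
"""

if __name__ == "__main__":
    # Connection string is a placeholder for a local PostGIS-enabled database
    with psycopg2.connect("dbname=landslides user=postgres") as conn:
        with conn.cursor() as cur:
            cur.execute(DDL)
            # Example spatial query: events within 10 km of a point of interest
            cur.execute(
                """SELECT event_id, event_date, ls_type
                   FROM landslide_event
                   WHERE ST_DWithin(geom::geography,
                                    ST_MakePoint(%s, %s)::geography, 10000)""",
                (9.94, 51.54))
            print(cur.fetchall())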
Commercial Building Energy Saver, API
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hong, Tianzhen; Piette, Mary; Lee, Sang Hoon
2015-08-27
The CBES API provides Application Programming Interface to a suite of functions to improve energy efficiency of buildings, including building energy benchmarking, preliminary retrofit analysis using a pre-simulation database DEEP, and detailed retrofit analysis using energy modeling with the EnergyPlus simulation engine. The CBES API is used to power the LBNL CBES Web App. It can be adopted by third party developers and vendors into their software tools and platforms.
Allometric scaling theory applied to FIA biomass estimation
David C. Chojnacky
2002-01-01
Tree biomass estimates in the Forest Inventory and Analysis (FIA) database are derived from numerous methodologies whose abundance and complexity raise questions about consistent results throughout the U.S. A new model based on allometric scaling theory ("WBE") offers simplified methodology and a theoretically sound basis for improving the reliability and...
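As a brief illustration of the allometric approach referred to above, the sketch below fits the power-law form biomass = a * D^b by log-log regression on invented diameter/biomass pairs; the numbers are not FIA data, and the WBE exponent of roughly 8/3 is mentioned only as the theoretical reference point.

import numpy as np

# Hypothetical diameter at breast height (cm) and measured biomass (kg)
dbh = np.array([10, 15, 20, 28, 35, 42, 55])
biomass = np.array([35, 95, 200, 470, 830, 1300, 2600])

# Fit biomass = a * D^b by ordinary least squares in log-log space
b, log_a = np.polyfit(np.log(dbh), np.log(biomass), 1)
print(f"biomass ~ {np.exp(log_a):.3f} * D^{b:.2f}")   # WBE theory predicts b near 8/3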
Parental Influences on Adolescent Adjustment: Parenting Styles Versus Parenting Practices
ERIC Educational Resources Information Center
Lee, Sang Min; Daniels, M. Harry; Kissinger, Daniel B.
2006-01-01
The study identified distinct patterns of parental practices that differentially influence adolescent behavior using the National Educational Longitudinal Survey (NELS:88) database. Following Brenner and Fox's research model (1999), the cluster analysis was used to classify the four types of parental practices. The clusters of parenting practices…
DOT National Transportation Integrated Search
1999-12-01
This paper analyzes the freight demand characteristics that drive modal choice by means of a large scale, national, disaggregate revealed preference database for shippers in France in 1988, using a nested logit. Particular attention is given to priva...
National Operational Hydrologic Remote Sensing Center - The ultimate source
Science database products include Airborne Snow Surveys, Satellite Snow Cover Mapping, and Snow Modeling and Data Assimilation, with analyses of polar-orbiting and geostationary satellite imagery. Maps are provided for the U.S. and the northern
Exploring root symbiotic programs in the model legume Medicago truncatula using EST analysis.
Journet, Etienne-Pascal; van Tuinen, Diederik; Gouzy, Jérome; Crespeau, Hervé; Carreau, Véronique; Farmer, Mary-Jo; Niebel, Andreas; Schiex, Thomas; Jaillon, Olivier; Chatagnier, Odile; Godiard, Laurence; Micheli, Fabienne; Kahn, Daniel; Gianinazzi-Pearson, Vivienne; Gamas, Pascal
2002-12-15
We report on a large-scale expressed sequence tag (EST) sequencing and analysis program aimed at characterizing the sets of genes expressed in roots of the model legume Medicago truncatula during interactions with either of two microsymbionts, the nitrogen-fixing bacterium Sinorhizobium meliloti or the arbuscular mycorrhizal fungus Glomus intraradices. We have designed specific tools for in silico analysis of EST data, in relation to chimeric cDNA detection, EST clustering, encoded protein prediction, and detection of differential expression. Our 21 473 5'- and 3'-ESTs could be grouped into 6359 EST clusters, corresponding to distinct virtual genes, along with 52 498 other M.truncatula ESTs available in the dbEST (NCBI) database that were recruited in the process. These clusters were manually annotated, using a specifically developed annotation interface. Analysis of EST cluster distribution in various M.truncatula cDNA libraries, supported by a refined R test to evaluate statistical significance and by 'electronic northern' representation, enabled us to identify a large number of novel genes predicted to be up- or down-regulated during either symbiotic root interaction. These in silico analyses provide a first global view of the genetic programs for root symbioses in M.truncatula. A searchable database has been built and can be accessed through a public interface.
Global and regional ecosystem modeling: comparison of model outputs and field measurements
NASA Astrophysics Data System (ADS)
Olson, R. J.; Hibbard, K.
2003-04-01
The Ecosystem Model-Data Intercomparison (EMDI) Workshops provide a venue for global ecosystem modeling groups to compare model outputs against measurements of net primary productivity (NPP). The objective of EMDI Workshops is to evaluate model performance relative to observations in order to improve confidence in global model projections of terrestrial carbon cycling. The questions addressed by EMDI include: How does the simulated NPP compare with the field data across biome and environmental gradients? How sensitive are models to site-specific climate? Does additional mechanistic detail in models result in a better match with field measurements? How useful are the measures of NPP for evaluating model predictions? How well do models represent regional patterns of NPP? Initial EMDI results showed general agreement between model predictions and field measurements but with obvious differences that indicated areas for potential data and model improvement. The effort was built on the development and compilation of complete and consistent databases for model initialization and comparison. Database development improves the data as well as the models; however, there is a need to incorporate additional observations and model outputs (LAI, hydrology, etc.) for comprehensive analyses of biogeochemical processes and their relationships to ecosystem structure and function. EMDI initialization and NPP data sets are available from the Oak Ridge National Laboratory Distributed Active Archive Center http://www.daac.ornl.gov/. Acknowledgements: This work was partially supported by the International Geosphere-Biosphere Programme - Data and Information System (IGBP-DIS); the IGBP-Global Analysis, Interpretation and Modelling Task Force (GAIM); the National Center for Ecological Analysis and Synthesis (NCEAS); and the National Aeronautics and Space Administration (NASA) Terrestrial Ecosystem Program. Oak Ridge National Laboratory is managed by UT-Battelle LLC for the U.S. Department of Energy under contract DE-AC05-00OR22725
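To make the kind of model-data comparison described above concrete, here is a minimal sketch computing common skill metrics (bias, RMSE, correlation) between observed and modeled NPP; the site values are invented and the metric choice is illustrative rather than the EMDI protocol itself.

import numpy as np

def compare(observed, modeled):
    """Simple skill metrics for a model-data intercomparison of NPP."""
    bias = np.mean(modeled - observed)
    rmse = np.sqrt(np.mean((modeled - observed) ** 2))
    r = np.corrcoef(observed, modeled)[0, 1]
    return {"bias": bias, "rmse": rmse, "r": r}

# Hypothetical site NPP (g C m-2 yr-1): field measurements vs one model's output
obs = np.array([220, 480, 760, 910, 1150, 300])
mod = np.array([260, 430, 820, 870, 1010, 350])
print(compare(obs, mod))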
MIPS: a database for genomes and protein sequences
Mewes, H. W.; Frishman, D.; Güldener, U.; Mannhaupt, G.; Mayer, K.; Mokrejs, M.; Morgenstern, B.; Münsterkötter, M.; Rudd, S.; Weil, B.
2002-01-01
The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the databases for the comprehensive set of genomes (PEDANT genomes), the database of annotated human EST clusters (HIB), the database of complete cDNAs from the DHGP (German Human Genome Project), as well as the project-specific databases for the GABI (Genome Analysis in Plants) and HNB (Helmholtz–Netzwerk Bioinformatik) networks. The Arabidopsis thaliana database (MATDB), the database of mitochondrial proteins (MITOP) and our contribution to the PIR International Protein Sequence Database have been described elsewhere [Schoof et al. (2002) Nucleic Acids Res., 30, 91–93; Scharfe et al. (2000) Nucleic Acids Res., 28, 155–158; Barker et al. (2001) Nucleic Acids Res., 29, 29–32]. All databases described, the protein analysis tools provided and the detailed descriptions of our projects can be accessed through the MIPS World Wide Web server (http://mips.gsf.de). PMID:11752246
Security of statistical data bases: invasion of privacy through attribute correlational modeling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Palley, M.A.
This study develops, defines, and applies a statistical technique for the compromise of confidential information in a statistical data base. Attribute Correlational Modeling (ACM) recognizes that the information contained in a statistical data base represents real world statistical phenomena. As such, ACM assumes correlational behavior among the database attributes. ACM proceeds to compromise confidential information through creation of a regression model, where the confidential attribute is treated as the dependent variable. The typical statistical data base may preclude the direct application of regression. In this scenario, the research introduces the notion of a synthetic data base, created through legitimate queries of the actual data base, and through proportional random variation of responses to these queries. The synthetic data base is constructed to resemble the actual data base as closely as possible in a statistical sense. ACM then applies regression analysis to the synthetic data base, and utilizes the derived model to estimate confidential information in the actual database.
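A minimal sketch of the two ACM steps, using hypothetical salary micro-data, is shown below; the query mechanism is abstracted away, and both the synthetic data base and the regression are reduced to a single predictor.

```python
import random

def synthesize(records, noise=0.05):
    """Stand-in for synthetic data base construction: perturb each
    (non-confidential, confidential) pair with proportional random variation,
    mimicking responses assembled from legitimate aggregate queries."""
    return [(x * (1 + random.uniform(-noise, noise)),
             y * (1 + random.uniform(-noise, noise))) for x, y in records]

def fit_line(pairs):
    """Ordinary least-squares fit y = a + b*x on the synthetic data."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    b = sum((x - mx) * (y - my) for x, y in pairs) / sum((x - mx) ** 2 for x, _ in pairs)
    return my - b * mx, b

# Hypothetical micro-data: (years of service, confidential salary in $k).
actual = [(2, 48.0), (5, 61.0), (9, 75.0), (14, 92.0), (20, 110.0)]
a, b = fit_line(synthesize(actual))
print(f"inferred salary for a 12-year employee: about {a + b * 12:.1f}k")
```

The point of the attack is that the fitted model, built entirely from legitimate query responses, lets a snooper estimate the confidential attribute of a targeted individual whose non-confidential attributes are known.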
Seismic Search Engine: A distributed database for mining large scale seismic data
NASA Astrophysics Data System (ADS)
Liu, Y.; Vaidya, S.; Kuzma, H. A.
2009-12-01
The International Monitoring System (IMS) of the CTBTO collects terabytes worth of seismic measurements from many receiver stations situated around the earth with the goal of detecting underground nuclear testing events and distinguishing them from other benign, but more common events such as earthquakes and mine blasts. The International Data Center (IDC) processes and analyzes these measurements, as they are collected by the IMS, to summarize event detections in daily bulletins. Thereafter, the data measurements are archived into a large format database. Our proposed Seismic Search Engine (SSE) will facilitate a framework for data exploration of the seismic database as well as the development of seismic data mining algorithms. Analogous to GenBank, the annotated genetic sequence database maintained by NIH, through SSE, we intend to provide public access to seismic data and a set of processing and analysis tools, along with community-generated annotations and statistical models to help interpret the data. SSE will implement queries as user-defined functions composed from standard tools and models. Each query is compiled and executed over the database internally before reporting results back to the user. Since queries are expressed with standard tools and models, users can easily reproduce published results within this framework for peer-review and making metric comparisons. As an illustration, an example query is “what are the best receiver stations in East Asia for detecting events in the Middle East?” Evaluating this query involves listing all receiver stations in East Asia, characterizing known seismic events in that region, and constructing a profile for each receiver station to determine how effective its measurements are at predicting each event. The results of this query can be used to help prioritize how data is collected, identify defective instruments, and guide future sensor placements.
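The example query above can be sketched as a small user-defined function over an annotated detection archive; the station names and detection flags below are purely illustrative, not IDC bulletin data.

```python
from collections import defaultdict

# Hypothetical detection log: (station, event_id, detected) rows standing in
# for the annotated archive a query like "best receiver stations in East Asia
# for detecting events in the Middle East" would run over.
detections = [
    ("STA_A", "ev1", True), ("STA_A", "ev2", True), ("STA_A", "ev3", False),
    ("STA_B", "ev1", True), ("STA_B", "ev2", False), ("STA_B", "ev3", False),
    ("STA_C", "ev1", True), ("STA_C", "ev2", True), ("STA_C", "ev3", True),
]

def rank_stations(rows):
    """Rank receiver stations by the fraction of known events they detected."""
    hits, totals = defaultdict(int), defaultdict(int)
    for station, _event, detected in rows:
        totals[station] += 1
        hits[station] += int(detected)
    return sorted(((hits[s] / totals[s], s) for s in totals), reverse=True)

for rate, station in rank_stations(detections):
    print(f"{station}: {rate:.0%} of events detected")
```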
Global Distribution of Outbreaks of Water-Associated Infectious Diseases
Yang, Kun; LeJeune, Jeffrey; Alsdorf, Doug; Lu, Bo; Shum, C. K.; Liang, Song
2012-01-01
Background: Water plays an important role in the transmission of many infectious diseases, which pose a great burden on global public health. However, the global distribution of these water-associated infectious diseases and underlying factors remain largely unexplored. Methods and Findings: Based on the Global Infectious Disease and Epidemiology Network (GIDEON), a global database including water-associated pathogens and diseases was developed. In this study, reported outbreak events associated with corresponding water-associated infectious diseases from 1991 to 2008 were extracted from the database. The location of each reported outbreak event was identified and geocoded into a GIS database. The GIS database also included geo-referenced socio-environmental information: population density (2000), annual accumulated temperature, surface water area, and average annual precipitation. Poisson models with Bayesian inference were developed to explore the association between these socio-environmental factors and the distribution of the reported outbreak events. Based on model predictions, a global relative risk map was generated. A total of 1,428 reported outbreak events were retrieved from the database. The analysis suggested that outbreaks of water-associated diseases are significantly correlated with socio-environmental factors. Population density is a significant risk factor for all categories of reported outbreaks of water-associated diseases; water-related diseases (e.g., vector-borne diseases) are associated with accumulated temperature; water-washed diseases (e.g., conjunctivitis) are inversely related to surface water area; both water-borne and water-related diseases are inversely related to average annual rainfall. Based on the model predictions, “hotspots” of risk for all categories of water-associated diseases were explored. Conclusions: At the global scale, water-associated infectious diseases are significantly correlated with socio-environmental factors, and different regions are affected disproportionately by different categories of water-associated infectious diseases. PMID:22348158
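The study fits Poisson models with Bayesian inference; the sketch below shows the simpler maximum-likelihood version of the same idea, regressing simulated grid-cell outbreak counts on two illustrative covariates with statsmodels (the covariate values are invented, not GIDEON data).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
log_pop_density = rng.normal(5.0, 1.0, n)     # illustrative covariates
annual_precip = rng.normal(1000.0, 250.0, n)

# Simulate counts from a known Poisson model so the fit has something to recover.
true_rate = np.exp(-4.0 + 0.6 * log_pop_density - 0.0005 * annual_precip)
outbreaks = rng.poisson(true_rate)

X = sm.add_constant(np.column_stack([log_pop_density, annual_precip]))
fit = sm.GLM(outbreaks, X, family=sm.families.Poisson()).fit()
print(fit.params)    # estimated intercept and covariate effects (log scale)
```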
PGSB/MIPS Plant Genome Information Resources and Concepts for the Analysis of Complex Grass Genomes.
Spannagl, Manuel; Bader, Kai; Pfeifer, Matthias; Nussbaumer, Thomas; Mayer, Klaus F X
2016-01-01
PGSB (Plant Genome and Systems Biology; formerly MIPS, the Munich Institute for Protein Sequences) has been involved in developing, implementing and maintaining plant genome databases for more than a decade. Genome databases and analysis resources have focused on individual genomes and aim to provide flexible and maintainable datasets for model plant genomes as a backbone against which experimental data, e.g., from high-throughput functional genomics, can be organized and analyzed. In addition, genomes from both model and crop plants form a scaffold for comparative genomics, assisted by specialized tools such as the CrowsNest viewer to explore conserved gene order (synteny) between related species on macro- and micro-levels. The genomes of many economically important Triticeae plants such as wheat, barley, and rye present a great challenge for sequence assembly and bioinformatic analysis due to their enormous complexity and large genome size. Novel concepts and strategies have been developed to deal with these difficulties and have been applied to the genomes of wheat, barley, rye, and other cereals. These include the GenomeZipper concept, reference-guided exome assembly, and "chromosome genomics" based on flow-cytometry-sorted chromosomes.
Automated 3D Phenotype Analysis Using Data Mining
Plyusnin, Ilya; Evans, Alistair R.; Karme, Aleksis; Gionis, Aristides; Jernvall, Jukka
2008-01-01
The ability to analyze and classify three-dimensional (3D) biological morphology has lagged behind the analysis of other biological data types such as gene sequences. Here, we introduce the techniques of data mining to the study of 3D biological shapes to bring the analyses of phenomes closer to the efficiency of studying genomes. We compiled five training sets of highly variable morphologies of mammalian teeth from the MorphoBrowser database. Samples were labeled either by dietary class or by conventional dental types (e.g. carnassial, selenodont). We automatically extracted a multitude of topological attributes using Geographic Information Systems (GIS)-like procedures that were then used in several combinations of feature selection schemes and probabilistic classification models to build and optimize classifiers for predicting the labels of the training sets. In terms of classification accuracy, computational time and size of the feature sets used, non-repeated best-first search combined with 1-nearest neighbor classifier was the best approach. However, several other classification models combined with the same searching scheme proved practical. The current study represents a first step in the automatic analysis of 3D phenotypes, which will be increasingly valuable with the future increase in 3D morphology and phenomics databases. PMID:18320060
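As a sketch of the search-plus-classifier combination described above, the code below pairs leave-one-out 1-nearest-neighbour accuracy with greedy forward feature selection; this is a simplified stand-in for the non-repeated best-first search used in the study, and the toy attributes and labels are invented.

```python
import math

def one_nn_loo_accuracy(X, y, feats):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    restricted to the given feature indices."""
    correct = 0
    for i, (xi, yi) in enumerate(zip(X, y)):
        best_label, best_dist = None, math.inf
        for j, (xj, yj) in enumerate(zip(X, y)):
            if i == j:
                continue
            d = sum((xi[f] - xj[f]) ** 2 for f in feats)
            if d < best_dist:
                best_label, best_dist = yj, d
        correct += (best_label == yi)
    return correct / len(y)

def forward_select(X, y, n_features):
    """Greedy forward feature selection driven by 1-NN accuracy."""
    chosen = []
    while len(chosen) < n_features:
        remaining = [f for f in range(len(X[0])) if f not in chosen]
        best = max(remaining, key=lambda f: one_nn_loo_accuracy(X, y, chosen + [f]))
        chosen.append(best)
    return chosen

# Toy GIS-like topographic attributes per tooth, labelled by dietary class.
X = [[2.1, 0.3, 5.0], [2.0, 0.4, 4.8], [0.9, 1.8, 2.2], [1.0, 1.7, 2.0]]
y = ["carnivore", "carnivore", "herbivore", "herbivore"]
print(forward_select(X, y, 2))
```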
1989-03-01
Automated Photointerpretation Testbed (knowledge, inference, image, and database engines). Markov random field (MRF) theory provides a powerful alternative texture model and has prompted intensive research activity in MRF model-based texture analysis. Additional, and perhaps more powerful, features have to be incorporated into the image segmentation procedure, along with object detection.
VKCDB: voltage-gated K+ channel database updated and upgraded.
Gallin, Warren J; Boutet, Patrick A
2011-01-01
The Voltage-gated K(+) Channel DataBase (VKCDB) (http://vkcdb.biology.ualberta.ca) makes a comprehensive set of sequence data readily available for phylogenetic and comparative analysis. The current update contains 2063 entries for full-length or nearly full-length unique channel sequences from Bacteria (477), Archaea (18) and Eukaryotes (1568), an increase from 346 solely eukaryotic entries in the original release. In addition to protein sequences for channels, the nucleotide sequences of the open reading frames encoding those proteins are now available and can be extracted in parallel with sets of protein sequences. Channels are categorized into subfamilies by phylogenetic analysis and by using hidden Markov model analyses. Although the raw database contains a number of fragmentary, duplicated, obsolete and non-channel sequences that were collected in early steps of data collection, the web interface will only return entries that have been validated as likely K(+) channels. The retrieval function of the web interface allows retrieval of entries that contain a substantial fraction of the core structural elements of VKCs, fragmentary entries, or both. The full database can be downloaded as either a MySQL dump or as an XML dump from the web site. We have now implemented automated updates at quarterly intervals.
bpRNA: large-scale automated annotation and analysis of RNA secondary structure.
Danaee, Padideh; Rouches, Mason; Wiley, Michelle; Deng, Dezhong; Huang, Liang; Hendrix, David
2018-05-09
While RNA secondary structure prediction from sequence data has made remarkable progress, there is a need for improved strategies for annotating the features of RNA secondary structures. Here, we present bpRNA, a novel annotation tool capable of parsing RNA structures, including complex pseudoknot-containing RNAs, to yield an objective, precise, compact, unambiguous, easily-interpretable description of all loops, stems, and pseudoknots, along with the positions, sequence, and flanking base pairs of each such structural feature. We also introduce several new informative representations of RNA structure types to improve structure visualization and interpretation. We have further used bpRNA to generate a web-accessible meta-database, 'bpRNA-1m', of over 100 000 single-molecule, known secondary structures; this is both more fully and accurately annotated and over 20-times larger than existing databases. We use a subset of the database with highly similar (≥90% identical) sequences filtered out to report on statistical trends in sequence, flanking base pairs, and length. Both the bpRNA method and the bpRNA-1m database will be valuable resources both for specific analysis of individual RNA molecules and large-scale analyses such as are useful for updating RNA energy parameters for computational thermodynamic predictions, improving machine learning models for structure prediction, and for benchmarking structure-prediction algorithms.
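bpRNA's full annotation grammar is considerably richer; the sketch below only illustrates the first step of any such annotator, recovering base pairs from dot-bracket notation and grouping stacked pairs into stems, and it ignores pseudoknots and additional bracket alphabets.

```python
def base_pairs(dotbracket):
    """Return base-pair index tuples from simple dot-bracket notation
    (pseudoknots and extra bracket alphabets are not handled)."""
    stack, pairs = [], []
    for i, c in enumerate(dotbracket):
        if c == "(":
            stack.append(i)
        elif c == ")":
            pairs.append((stack.pop(), i))
    return sorted(pairs)

def stems(pairs):
    """Group consecutively stacked pairs (i, j), (i+1, j-1), ... into stems."""
    groups, current = [], []
    for p in pairs:
        if current and p == (current[-1][0] + 1, current[-1][1] - 1):
            current.append(p)
        else:
            if current:
                groups.append(current)
            current = [p]
    if current:
        groups.append(current)
    return groups

structure = "((((...))))..(((....)))"
print(stems(base_pairs(structure)))   # two stems: one of 4 pairs, one of 3
```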
Neurotree: a collaborative, graphical database of the academic genealogy of neuroscience.
David, Stephen V; Hayden, Benjamin Y
2012-01-01
Neurotree is an online database that documents the lineage of academic mentorship in neuroscience. Modeled on the tree format typically used to describe biological genealogies, the Neurotree web site provides a concise summary of the intellectual history of neuroscience and relationships between individuals in the current neuroscience community. The contents of the database are entirely crowd-sourced: any internet user can add information about researchers and the connections between them. As of July 2012, Neurotree has collected information from 10,000 users about 35,000 researchers and 50,000 mentor relationships, and continues to grow. The present report serves to highlight the utility of Neurotree as a resource for academic research and to summarize some basic analysis of its data. The tree structure of the database permits a variety of graphical analyses. We find that the connectivity and graphical distance between researchers entered into Neurotree early has stabilized and thus appears to be mostly complete. The connectivity of more recent entries continues to mature. A ranking of researcher fecundity based on their mentorship reveals a sustained period of influential researchers from 1850-1950, with the most influential individuals active at the later end of that period. Finally, a clustering analysis reveals that some subfields of neuroscience are reflected in tightly interconnected mentor-trainee groups.
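The fecundity ranking described above can be sketched as a descendant count over a mentor-trainee graph; the names and edges below are a tiny invented example, not Neurotree records.

```python
# Hypothetical mentor -> trainee edges; the real graph is crowd-sourced.
edges = [
    ("MentorA", "TraineeA1"), ("TraineeA1", "TraineeA1a"),
    ("TraineeA1", "TraineeA1b"), ("MentorA", "TraineeA2"),
    ("MentorB", "TraineeB1"),
]

graph = {}
for mentor, trainee in edges:
    graph.setdefault(mentor, []).append(trainee)

def descendant_count(person, seen=None):
    """Count all academic descendants reachable from `person`."""
    seen = set() if seen is None else seen
    for trainee in graph.get(person, []):
        if trainee not in seen:
            seen.add(trainee)
            descendant_count(trainee, seen)
    return len(seen)

for person in sorted(graph, key=descendant_count, reverse=True):
    print(person, descendant_count(person))
```

A per-era ranking would simply restrict this count to researchers active in a given time window.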
HEMORHEOLOGY INDEX CHANGES IN A RAT ACUTE BLOOD STASIS MODEL: A SYSTEMATIC REVIEW AND META-ANALYSIS
Zhang, Jun-Xiu; Feng, Yu; Zhang, Yin; Liu, Yi; Li, Shao-Dan; Yang, Ming-Hui
2017-01-01
Background: Blood stasis has received increasing attention in research related to traditional Chinese medicine (TCM) and integrative Chinese and Western medicine. More than 90% of research studies use hemorheology indexes rather than pathological methods to evaluate the establishment of animal blood stasis models, yet hemorheology index evaluation of blood stasis lacks a consolidated standard. The aim of this study was to evaluate the accuracy of hemorheology indexes in rat models of acute blood stasis (ABS) based on studies in which the ABS model had been confirmed by pathological methods. Materials and Methods: We searched the Chinese National Knowledge Infrastructure database (CNKI), Chinese Medical Journal Database (CMJD), Chinese Biology Medicine disc (CBM), Wanfang database, and PubMed for studies of rat blood stasis models; the search identified 18 studies of rat ABS models induced by subcutaneous injection of epinephrine combined with an ice bath. Each included study received a modified Collaborative Approach to Meta-Analysis and Review of Animal Data from Experimental Studies (CAMARADES) score and methodological quality assessment; data on whole blood viscosity, plasma viscosity, platelet aggregation rate, erythrocyte aggregation index, and fibrinogen concentration were then extracted. Extracted data were analyzed using RevMan 5.3; heterogeneity was tested using Egger's test. Results: A total of 343 studies of rat blood stasis were reviewed. Eighteen studies were included in this meta-analysis; the mean CAMARADES score was 3.5. The rat ABS model showed a significant increase in whole blood viscosity (medium shear rate), whole blood viscosity (high shear rate), plasma viscosity, platelet aggregation rate, erythrocyte aggregation index, and fibrinogen concentration compared to controls, with weighted mean differences (WMD) of 2.42 mPa·s (95% confidence interval (CI) = 1.73 - 3.10); 1.76 mPa·s (95% CI = 1.28 - 2.24); 0.39 mPa·s (95% CI = 0.24 - 0.55); 13.66% (95% CI = 9.78 - 17.55); 0.84 (95% CI = 0.53 - 1.16); and 1.22 g/L (95% CI = 0.76 - 1.67), respectively. Subgroup analysis showed that the whole blood viscosity, plasma viscosity, and platelet aggregation rate tests were more sensitive when measured at 0-24 h than at 24-72 h after induction of blood stasis. Conclusions: Rat blood stasis studies have incomplete experimental design and quality controls, and thus need integrated improvement. Meta-analysis of the included studies indicated that a unified hemorheology index of whole blood viscosity (medium and high shear rate), platelet aggregation rate, erythrocyte aggregation rate, and fibrinogen concentration might be used for assessment of rat ABS models independent of pathological methods. PMID:28638872
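Weighted mean differences of this kind are typically computed by inverse-variance pooling; a minimal fixed-effect sketch (the simplest case; RevMan also supports random-effects pooling) with invented per-study values follows.

```python
import math

def pooled_wmd(studies):
    """Fixed-effect inverse-variance pooling of per-study mean differences.

    Each study is (mean_difference, standard_error); returns the pooled
    weighted mean difference and its 95% confidence interval.
    """
    weights = [1.0 / se ** 2 for _, se in studies]
    wmd = sum(w * md for (md, _), w in zip(studies, weights)) / sum(weights)
    se_pooled = math.sqrt(1.0 / sum(weights))
    return wmd, (wmd - 1.96 * se_pooled, wmd + 1.96 * se_pooled)

# Invented whole-blood-viscosity differences (mPa.s), not the 18 included studies.
studies = [(2.1, 0.5), (2.8, 0.7), (2.3, 0.4)]
print(pooled_wmd(studies))
```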
Chen, Can; Wang, Ting; Wu, Fengbo; Huang, Wei; He, Gu; Ouyang, Liang; Xiang, Mingli; Peng, Cheng; Jiang, Qinglin
2014-01-01
Compared with normal differentiated cells, cancer cells upregulate the expression of pyruvate kinase isozyme M2 (PKM2) to support glycolytic intermediates for anabolic processes, including the synthesis of nucleic acids, amino acids, and lipids. In this study, a combination of the structure-based pharmacophore modeling and a hybrid protocol of virtual screening methods comprised of pharmacophore model-based virtual screening, docking-based virtual screening, and in silico ADMET (absorption, distribution, metabolism, excretion and toxicity) analysis were used to retrieve novel PKM2 activators from commercially available chemical databases. Tetrahydroquinoline derivatives were identified as potential scaffolds of PKM2 activators. Thus, the hybrid virtual screening approach was applied to screen the focused tetrahydroquinoline derivatives embedded in the ZINC database. Six hit compounds were selected from the final hits and experimental studies were then performed. Compound 8 displayed a potent inhibitory effect on human lung cancer cells. Following treatment with Compound 8, cell viability, apoptosis, and reactive oxygen species (ROS) production were examined in A549 cells. Finally, we evaluated the effects of Compound 8 on mice xenograft tumor models in vivo. These results may provide important information for further research on novel PKM2 activators as antitumor agents. PMID:25214764
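The in silico ADMET step of such a screening cascade is often approximated by simple drug-likeness filters; the sketch below applies a Lipinski-style rule-of-five check to hypothetical pre-computed properties (compound names and values are invented, and a real pipeline would compute the properties with a cheminformatics toolkit).

```python
# Hypothetical pre-computed molecular properties for candidate hits.
candidates = {
    "CANDIDATE_1": {"mw": 342.4, "logp": 2.8, "hbd": 1, "hba": 5},
    "CANDIDATE_2": {"mw": 611.2, "logp": 6.3, "hbd": 4, "hba": 11},
}

def passes_rule_of_five(p):
    """Lipinski-style drug-likeness filter: molecular weight <= 500,
    logP <= 5, at most 5 H-bond donors, at most 10 H-bond acceptors."""
    return (p["mw"] <= 500 and p["logp"] <= 5
            and p["hbd"] <= 5 and p["hba"] <= 10)

survivors = [name for name, props in candidates.items() if passes_rule_of_five(props)]
print(survivors)   # CANDIDATE_1 passes; CANDIDATE_2 is filtered out
```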