Sample records for large distributed databases

  1. Design and implementation of a distributed large-scale spatial database system based on J2EE

    NASA Astrophysics Data System (ADS)

    Gong, Jianya; Chen, Nengcheng; Zhu, Xinyan; Zhang, Xia

    2003-03-01

    With the increasing maturity of distributed object technology, CORBA, .NET and EJB are widely used in traditional IT fields. However, the theory and practice of distributed spatial databases need further improvement because of the tension between large-scale spatial data and limited network bandwidth, and between short-lived sessions and long transaction processing. Differences and trends among CORBA, .NET and EJB are discussed in detail; afterwards, the concept, architecture and characteristics of a distributed large-scale seamless spatial database system based on J2EE are presented, comprising a GIS client application, a web server, a GIS application server and a spatial data server. The design and implementation of the GIS client application components based on JavaBeans, the GIS engine based on servlets, and the GIS application server based on GIS enterprise JavaBeans (session beans and entity beans) are then explained. In addition, experiments on the relation between spatial data volume and response time under different conditions are conducted, which show that a distributed spatial database system based on J2EE can be used to manage, distribute and share large-scale spatial data on the Internet. Lastly, a distributed large-scale seamless image database on the Internet is presented.

  2. Comparison of the Frontier Distributed Database Caching System to NoSQL Databases

    NASA Astrophysics Data System (ADS)

    Dykstra, Dave

    2012-12-01

    One of the main attractions of non-relational “NoSQL” databases is their ability to scale to large numbers of readers, including readers spread over a wide area. The Frontier distributed database caching system, used in production by the Large Hadron Collider CMS and ATLAS detector projects for Conditions data, is based on traditional SQL databases but also adds high scalability and the ability to be distributed over a wide area for an important subset of applications. This paper compares the major characteristics of the two different approaches and identifies the criteria for choosing which approach to prefer over the other. It also compares in some detail the NoSQL databases used by CMS and ATLAS: MongoDB, CouchDB, HBase, and Cassandra.

  3. Comparison of the Frontier Distributed Database Caching System to NoSQL Databases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dykstra, Dave

    One of the main attractions of non-relational NoSQL databases is their ability to scale to large numbers of readers, including readers spread over a wide area. The Frontier distributed database caching system, used in production by the Large Hadron Collider CMS and ATLAS detector projects for Conditions data, is based on traditional SQL databases but also adds high scalability and the ability to be distributed over a wide area for an important subset of applications. This paper compares the major characteristics of the two different approaches and identifies the criteria for choosing which approach to prefer over the other. It also compares in some detail the NoSQL databases used by CMS and ATLAS: MongoDB, CouchDB, HBase, and Cassandra.

  4. DataHub knowledge based assistance for science visualization and analysis using large distributed databases

    NASA Technical Reports Server (NTRS)

    Handley, Thomas H., Jr.; Collins, Donald J.; Doyle, Richard J.; Jacobson, Allan S.

    1991-01-01

    Viewgraphs on DataHub knowledge based assistance for science visualization and analysis using large distributed databases. Topics covered include: DataHub functional architecture; data representation; logical access methods; preliminary software architecture; LinkWinds; data knowledge issues; expert systems; and data management.

  5. VIEWCACHE: An incremental pointer-based access method for autonomous interoperable databases

    NASA Technical Reports Server (NTRS)

    Roussopoulos, N.; Sellis, Timos

    1992-01-01

    One of the biggest problems facing NASA today is providing scientists with efficient access to a large number of distributed databases. Our pointer-based incremental database access method, VIEWCACHE, provides such an interface for accessing distributed data sets and directories. VIEWCACHE allows database browsing and searching, performing inter-database cross-referencing with no actual data movement between database sites. This organization and processing is especially suitable for managing Astrophysics databases which are physically distributed all over the world. Once the search is complete, the set of collected pointers pointing to the desired data is cached. VIEWCACHE includes spatial access methods for accessing image data sets, which provide much easier query formulation by referring directly to the image and very efficient search for objects contained within a two-dimensional window. We will develop and optimize a VIEWCACHE External Gateway Access to database management systems to facilitate distributed database search.
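
    A minimal sketch of the pointer-caching idea described above: a cross-database search collects pointers (site, object identifier, bounding box) instead of moving data, and a two-dimensional window query then filters the cached pointers locally. Class and field names are hypothetical illustrations, not the actual VIEWCACHE implementation.

    ```python
    # Illustrative pointer cache with a 2D window query; no data is moved between sites.
    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class ImagePointer:
        site: str          # remote database site holding the actual data
        object_id: str     # identifier of the image/object at that site
        bbox: Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

    class PointerCache:
        def __init__(self) -> None:
            self._pointers: List[ImagePointer] = []

        def add(self, ptr: ImagePointer) -> None:
            """Cache a pointer returned by a cross-database search."""
            self._pointers.append(ptr)

        def window_query(self, window: Tuple[float, float, float, float]) -> List[ImagePointer]:
            """Return cached pointers whose bounding box intersects a 2D window."""
            wx0, wy0, wx1, wy1 = window
            return [p for p in self._pointers
                    if not (p.bbox[2] < wx0 or p.bbox[0] > wx1 or
                            p.bbox[3] < wy0 or p.bbox[1] > wy1)]

    # Usage: cache pointers from two sites, then select those inside a sky window.
    cache = PointerCache()
    cache.add(ImagePointer("site_a", "img_001", (10.0, 20.0, 12.0, 22.0)))
    cache.add(ImagePointer("site_b", "img_107", (40.0, 45.0, 41.0, 46.0)))
    print(cache.window_query((9.0, 19.0, 15.0, 25.0)))  # only img_001 intersects
    ```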

  6. Compressing DNA sequence databases with coil.

    PubMed

    White, W Timothy J; Hendy, Michael D

    2008-05-20

    Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression - an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression - the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental additions to a sequence database. coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work.

  7. Compressing DNA sequence databases with coil

    PubMed Central

    White, W Timothy J; Hendy, Michael D

    2008-01-01

    Background Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression – an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. Results We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression – the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental additions to a sequence database. Conclusion coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work. PMID:18489794
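
    The "one-off investment" argument above can be made concrete with a small back-of-the-envelope calculation. The sizes, compression times, transfer count and link speed below are invented for illustration and are not figures from the paper.

    ```python
    # Amortisation of a slow, high-ratio compressor versus a fast, lower-ratio
    # baseline (e.g. gzip). All figures are hypothetical.
    def total_cost_seconds(compress_s: float, compressed_mb: float,
                           transfers: int, link_mb_per_s: float) -> float:
        """One-off compression time plus repeated transfer time."""
        return compress_s + transfers * (compressed_mb / link_mb_per_s)

    db_mb = 10_000.0   # hypothetical flat-file database size
    fast = total_cost_seconds(600, 0.30 * db_mb, transfers=200, link_mb_per_s=10)
    slow = total_cost_seconds(7200, 0.25 * db_mb, transfers=200, link_mb_per_s=10)
    print(f"fast/low-ratio: {fast:.0f} s total, slow/high-ratio: {slow:.0f} s total")
    # With enough transfers, the slower but tighter compressor wins overall.
    ```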

  8. VIEWCACHE: An incremental pointer-based access method for autonomous interoperable databases

    NASA Technical Reports Server (NTRS)

    Roussopoulos, N.; Sellis, Timos

    1993-01-01

    One of the biggest problems facing NASA today is providing scientists with efficient access to a large number of distributed databases. Our pointer-based incremental database access method, VIEWCACHE, provides such an interface for accessing distributed datasets and directories. VIEWCACHE allows database browsing and searching, performing inter-database cross-referencing with no actual data movement between database sites. This organization and processing is especially suitable for managing Astrophysics databases which are physically distributed all over the world. Once the search is complete, the set of collected pointers pointing to the desired data is cached. VIEWCACHE includes spatial access methods for accessing image datasets, which provide much easier query formulation by referring directly to the image and very efficient search for objects contained within a two-dimensional window. We will develop and optimize a VIEWCACHE External Gateway Access to database management systems to facilitate database search.

  9. Hydra: a scalable proteomic search engine which utilizes the Hadoop distributed computing framework

    PubMed Central

    2012-01-01

    Background For shotgun mass spectrometry based proteomics the most computationally expensive step is in matching the spectra against an increasingly large database of sequences and their post-translational modifications with known masses. Each mass spectrometer can generate data at an astonishingly high rate, and the scope of what is searched for is continually increasing. Therefore solutions for improving our ability to perform these searches are needed. Results We present a sequence database search engine that is specifically designed to run efficiently on the Hadoop MapReduce distributed computing framework. The search engine implements the K-score algorithm, generating comparable output for the same input files as the original implementation. The scalability of the system is shown, and the architecture required for the development of such distributed processing is discussed. Conclusion The software is scalable in its ability to handle a large peptide database, numerous modifications and large numbers of spectra. Performance scales with the number of processors in the cluster, allowing throughput to expand with the available resources. PMID:23216909

  10. Hydra: a scalable proteomic search engine which utilizes the Hadoop distributed computing framework.

    PubMed

    Lewis, Steven; Csordas, Attila; Killcoyne, Sarah; Hermjakob, Henning; Hoopmann, Michael R; Moritz, Robert L; Deutsch, Eric W; Boyle, John

    2012-12-05

    For shotgun mass spectrometry based proteomics the most computationally expensive step is in matching the spectra against an increasingly large database of sequences and their post-translational modifications with known masses. Each mass spectrometer can generate data at an astonishingly high rate, and the scope of what is searched for is continually increasing. Therefore solutions for improving our ability to perform these searches are needed. We present a sequence database search engine that is specifically designed to run efficiently on the Hadoop MapReduce distributed computing framework. The search engine implements the K-score algorithm, generating comparable output for the same input files as the original implementation. The scalability of the system is shown, and the architecture required for the development of such distributed processing is discussed. The software is scalable in its ability to handle a large peptide database, numerous modifications and large numbers of spectra. Performance scales with the number of processors in the cluster, allowing throughput to expand with the available resources.
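
    The search pattern described above maps naturally onto map/reduce: map each (spectrum, candidate peptide) pair to a score, then reduce per spectrum to the best-scoring match. The sketch below is plain Python rather than actual Hadoop code, and the scoring function is a stand-in, not the real K-score.

    ```python
    # Map/reduce-style peptide-spectrum matching sketch (toy scoring, not K-score).
    from typing import Dict, Iterable, List, Tuple

    def score(spectrum: List[float], peptide: str) -> float:
        """Placeholder similarity score; the real engine implements K-score."""
        return sum(1.0 for mz in spectrum if int(mz) % len(peptide) == 0)

    def map_phase(spectra: Dict[str, List[float]],
                  peptides: Iterable[str]) -> Iterable[Tuple[str, Tuple[str, float]]]:
        # Emit (spectrum_id, (peptide, score)) pairs; in Hadoop this runs on many mappers.
        for sid, spec in spectra.items():
            for pep in peptides:
                yield sid, (pep, score(spec, pep))

    def reduce_phase(pairs: Iterable[Tuple[str, Tuple[str, float]]]) -> Dict[str, Tuple[str, float]]:
        # Keep the best match per spectrum; in Hadoop this runs on the reducers.
        best: Dict[str, Tuple[str, float]] = {}
        for sid, (pep, s) in pairs:
            if sid not in best or s > best[sid][1]:
                best[sid] = (pep, s)
        return best

    spectra = {"scan_1": [114.0, 228.1, 342.2], "scan_2": [175.1, 276.2]}
    peptides = ["PEPTIDEK", "SAMPLER", "TESTSEQ"]
    print(reduce_phase(map_phase(spectra, peptides)))
    ```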

  11. Design considerations, architecture, and use of the Mini-Sentinel distributed data system.

    PubMed

    Curtis, Lesley H; Weiner, Mark G; Boudreau, Denise M; Cooper, William O; Daniel, Gregory W; Nair, Vinit P; Raebel, Marsha A; Beaulieu, Nicolas U; Rosofsky, Robert; Woodworth, Tiffany S; Brown, Jeffrey S

    2012-01-01

    We describe the design, implementation, and use of a large, multiorganizational distributed database developed to support the Mini-Sentinel Pilot Program of the US Food and Drug Administration (FDA). As envisioned by the US FDA, this implementation will inform and facilitate the development of an active surveillance system for monitoring the safety of medical products (drugs, biologics, and devices) in the USA. A common data model was designed to address the priorities of the Mini-Sentinel Pilot and to leverage the experience and data of participating organizations and data partners. A review of existing common data models informed the process. Each participating organization designed a process to extract, transform, and load its source data, applying the common data model to create the Mini-Sentinel Distributed Database. Transformed data were characterized and evaluated using a series of programs developed centrally and executed locally by participating organizations. A secure communications portal was designed to facilitate queries of the Mini-Sentinel Distributed Database and transfer of confidential data, analytic tools were developed to facilitate rapid response to common questions, and distributed querying software was implemented to facilitate rapid querying of summary data. As of July 2011, information on 99,260,976 health plan members was included in the Mini-Sentinel Distributed Database. The database includes 316,009,067 person-years of observation time, with members contributing, on average, 27.0 months of observation time. All data partners have successfully executed distributed code and returned findings to the Mini-Sentinel Operations Center. This work demonstrates the feasibility of building a large, multiorganizational distributed data system in which organizations retain possession of their data that are used in an active surveillance system. Copyright © 2012 John Wiley & Sons, Ltd.
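
    A sketch of the distributed-query pattern described above: each data partner keeps its own data, transformed into a shared common data model, runs the same query code locally, and returns only aggregate results to the coordinating center. The table and column names here are illustrative assumptions, not the actual Mini-Sentinel common data model.

    ```python
    # Each partner holds data locally in a common schema; only counts are returned.
    import sqlite3
    from typing import Dict

    COMMON_MODEL_DDL = """
    CREATE TABLE enrollment (member_id TEXT, start_date TEXT, end_date TEXT);
    CREATE TABLE dispensing (member_id TEXT, drug_code TEXT, dispense_date TEXT);
    """

    def partner_site(rows) -> sqlite3.Connection:
        """Stand-in for one data partner holding data in the common model."""
        conn = sqlite3.connect(":memory:")
        conn.executescript(COMMON_MODEL_DDL)
        conn.executemany("INSERT INTO dispensing VALUES (?, ?, ?)", rows)
        return conn

    def distributed_count(sites: Dict[str, sqlite3.Connection], drug_code: str) -> Dict[str, int]:
        """The 'distributed code': executed locally at each site, returns counts only."""
        query = "SELECT COUNT(DISTINCT member_id) FROM dispensing WHERE drug_code = ?"
        return {name: conn.execute(query, (drug_code,)).fetchone()[0]
                for name, conn in sites.items()}

    sites = {
        "partner_a": partner_site([("m1", "D123", "2011-01-05"), ("m2", "D123", "2011-02-01")]),
        "partner_b": partner_site([("m9", "D123", "2011-03-10"), ("m9", "D999", "2011-03-12")]),
    }
    print(distributed_count(sites, "D123"))  # {'partner_a': 2, 'partner_b': 1}
    ```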

  12. Recent advances on terrain database correlation testing

    NASA Astrophysics Data System (ADS)

    Sakude, Milton T.; Schiavone, Guy A.; Morelos-Borja, Hector; Martin, Glenn; Cortes, Art

    1998-08-01

    Terrain database correlation is a major requirement for interoperability in distributed simulation. There are numerous situations in which terrain database correlation problems can occur that, in turn, lead to a lack of interoperability in distributed training simulations. Examples are the use of different run-time terrain databases derived from inconsistent source data, the use of different resolutions, and the use of different data models between databases for both terrain and culture data. IST has been developing a suite of software tools, named ZCAP, to address terrain database interoperability issues. In this paper we discuss recent enhancements made to this suite, including improved algorithms for sampling and calculating line-of-sight, an improved method for measuring terrain roughness, and the application of a sparse matrix method to the terrain remediation solution developed at the Visual Systems Lab of the Institute for Simulation and Training. We review the application of some of these new algorithms to the terrain correlation measurement processes. The application of these new algorithms improves our support for very large terrain databases, and provides the capability for performing test replications to estimate the sampling error of the tests. With this set of tools, a user can quantitatively assess the degree of correlation between large terrain databases.

  13. LSD: Large Survey Database framework

    NASA Astrophysics Data System (ADS)

    Juric, Mario

    2012-09-01

    The Large Survey Database (LSD) is a Python framework and DBMS for distributed storage, cross-matching and querying of large survey catalogs (>10^9 rows, >1 TB). The primary driver behind its development is the analysis of Pan-STARRS PS1 data. It is specifically optimized for fast queries and parallel sweeps of positionally and temporally indexed datasets. It transparently scales to more than 10^2 nodes, and can be made to function in "shared nothing" architectures.
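
    A generic sketch of the positional cross-matching that a system like LSD parallelizes: sources are bucketed by sky position so only nearby cells need to be compared. This is plain NumPy under a small-angle approximation, not the LSD API itself, and the catalogs and match radius are invented.

    ```python
    # Grid-bucketed positional cross-match between two small catalogs.
    import numpy as np
    from collections import defaultdict

    def crossmatch(ra1, dec1, ra2, dec2, radius_deg=1.0 / 3600):
        """Match catalog 1 against catalog 2 within radius_deg (small-angle approx.)."""
        cell = radius_deg  # grid cell size equal to the match radius
        grid = defaultdict(list)
        for j, (r, d) in enumerate(zip(ra2, dec2)):
            grid[(int(r // cell), int(d // cell))].append(j)
        matches = []
        for i, (r, d) in enumerate(zip(ra1, dec1)):
            cx, cy = int(r // cell), int(d // cell)
            for dx in (-1, 0, 1):            # only neighbouring cells are examined
                for dy in (-1, 0, 1):
                    for j in grid.get((cx + dx, cy + dy), []):
                        sep = np.hypot((r - ra2[j]) * np.cos(np.radians(d)), d - dec2[j])
                        if sep <= radius_deg:
                            matches.append((i, j))
        return matches

    ra1, dec1 = np.array([10.00000, 10.5]), np.array([41.00000, 41.2])
    ra2, dec2 = np.array([10.00010, 99.0]), np.array([41.00005, -10.0])
    print(crossmatch(ra1, dec1, ra2, dec2))  # [(0, 0)]
    ```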

  14. FishTraits Database

    USGS Publications Warehouse

    Angermeier, Paul L.; Frimpong, Emmanuel A.

    2009-01-01

    The need for integrated and widely accessible sources of species traits data to facilitate studies of ecology, conservation, and management has motivated development of traits databases for various taxa. In spite of the increasing number of traits-based analyses of freshwater fishes in the United States, no consolidated database of traits of this group exists publicly, and much useful information on these species is documented only in obscure sources. The largely inaccessible and unconsolidated traits information makes large-scale analysis involving many fishes and/or traits particularly challenging. FishTraits is a database of >100 traits for 809 (731 native and 78 exotic) fish species found in freshwaters of the conterminous United States, including 37 native families and 145 native genera. The database contains information on four major categories of traits: (1) trophic ecology, (2) body size and reproductive ecology (life history), (3) habitat associations, and (4) salinity and temperature tolerances. Information on geographic distribution and conservation status is also included. Together, we refer to the traits, distribution, and conservation status information as attributes. Descriptions of the attributes are provided. Many sources were consulted to compile attributes, including state and regional species accounts and other databases.

  15. Mass measurement errors of Fourier-transform mass spectrometry (FTMS): distribution, recalibration, and application.

    PubMed

    Zhang, Jiyang; Ma, Jie; Dou, Lei; Wu, Songfeng; Qian, Xiaohong; Xie, Hongwei; Zhu, Yunping; He, Fuchu

    2009-02-01

    The hybrid linear trap quadrupole Fourier-transform (LTQ-FT) ion cyclotron resonance mass spectrometer, an instrument with high accuracy and resolution, is widely used in the identification and quantification of peptides and proteins. However, time-dependent errors in the system may lead to deterioration of the accuracy of these instruments, negatively influencing the determination of the mass error tolerance (MET) in database searches. Here, a comprehensive discussion of LTQ/FT precursor ion mass error is provided. On the basis of an investigation of the mass error distribution, we propose an improved recalibration formula and introduce a new tool, FTDR (Fourier-transform data recalibration), which employs a graphic user interface (GUI) for automatic calibration. It was found that the calibration could adjust the mass error distribution to more closely approximate a normal distribution and reduce the standard deviation (SD). Consequently, we present a new strategy, LDSF (Large MET database search and small MET filtration), for database search MET specification and validation of database search results. As the name implies, a large-MET database search is conducted and the search results are then filtered using the statistical MET estimated from high-confidence results. By applying this strategy to a standard protein data set and a complex data set, we demonstrate that LDSF can significantly improve the sensitivity of the result validation procedure.
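
    A sketch of the LDSF idea described above: search with a wide mass error tolerance, estimate the error distribution from high-confidence hits, then filter the full result list using the estimated statistical tolerance. The simulated errors, score threshold and 3-sigma cut are illustrative assumptions, not the paper's exact procedure.

    ```python
    # Wide-MET search results filtered with a data-derived small MET.
    import numpy as np

    rng = np.random.default_rng(0)
    # ppm mass errors of all candidate identifications from a wide-MET search
    all_errors = np.concatenate([rng.normal(2.0, 1.5, 900),   # true hits, offset mean
                                 rng.uniform(-20, 20, 300)])  # random matches
    scores = np.concatenate([rng.uniform(0.7, 1.0, 900), rng.uniform(0.0, 0.6, 300)])

    high_conf = all_errors[scores > 0.9]          # stand-in for high-confidence results
    mu, sd = high_conf.mean(), high_conf.std()    # estimated (recalibrated) error model
    small_met = 3 * sd                            # small MET derived from the data

    keep = np.abs(all_errors - mu) <= small_met   # filtration step
    print(f"estimated mean={mu:.2f} ppm, sd={sd:.2f} ppm, kept {keep.sum()} of {len(keep)}")
    ```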

  16. Design and Implementation of an Environmental Mercury Database for Northeastern North America

    NASA Astrophysics Data System (ADS)

    Clair, T. A.; Evers, D.; Smith, T.; Goodale, W.; Bernier, M.

    2002-12-01

    An important issue faced when attempting to interpret geochemical variability studies across large regions is the accumulation, access and consistent display of data from a large number of sources. We were given the opportunity to provide a regional assessment of mercury distribution in surface waters, sediments, invertebrates, fish, and birds in a region extending from New York State to the Island of Newfoundland. We received over 20 individual databases from State, Provincial, and Federal governments, as well as university researchers from both Canada and the United States. These databases came in a variety of formats and sizes. Our challenge was to find a way of accumulating and presenting the large amounts of acquired data in a consistent, easily accessible fashion, which could then be more easily interpreted. Moreover, the database had to be portable and easily distributable to the large number of study participants. We developed a static database structure using a web-based approach, which we were then able to mount on a server accessible to all project participants. The site also contained all the necessary documentation related to the data, its acquisition, and the methods used in its analysis and interpretation. We then copied the complete web site onto CD-ROMs, which we distributed to all project participants, funding agencies, and other interested parties. The CD-ROM formed a permanent record of the project and was issued ISSN and ISBN numbers so that the information remains accessible to researchers in perpetuity. Here we present an overview of the CD-ROM and data structures, of the information accumulated over the first year of the study, and initial interpretation of the results.

  17. Benchmarking distributed data warehouse solutions for storing genomic variant information

    PubMed Central

    Wiewiórka, Marek S.; Wysakowicz, Dawid P.; Okoniewski, Michał J.

    2017-01-01

    Abstract Genomic-based personalized medicine encompasses storing, analysing and interpreting genomic variants as its central issues. At a time when thousands of patients' sequenced exomes and genomes are becoming available, there is a growing need for efficient database storage and querying. The answer could be the application of modern distributed storage systems and query engines. However, the application of large genomic variant databases to this problem has not yet been sufficiently explored in the literature. To investigate the effectiveness of modern columnar storage [column-oriented Database Management System (DBMS)] and query engines, we have developed a prototypic genomic variant data warehouse, populated with large generated content of genomic variants and phenotypic data. Next, we have benchmarked the performance of a number of combinations of distributed storages and query engines on a set of SQL queries that address biological questions essential for both research and medical applications. In addition, a non-distributed, analytical database (MonetDB) has been used as a baseline. Comparison of query execution times confirms that distributed data warehousing solutions outperform classic relational DBMSs. Moreover, pre-aggregation and further denormalization of data, which reduce the number of distributed join operations, significantly improve query performance by several orders of magnitude. Most of the distributed back-ends offer good performance for complex analytical queries, while the Optimized Row Columnar (ORC) format paired with Presto and Parquet with Spark 2 query engines provide, on average, the lowest execution times. Apache Kudu, on the other hand, is the only solution that guarantees sub-second performance for simple genome range queries returning a small subset of data, where low-latency response is expected, while still offering decent performance for running analytical queries. In summary, research and clinical applications that require the storage and analysis of variants from thousands of samples can benefit from the scalability and performance of distributed data warehouse solutions. Database URL: https://github.com/ZSI-Bio/variantsdwh PMID:29220442
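
    A hedged sketch of the kinds of queries benchmarked above, expressed against a Parquet-backed variant table with Spark SQL. The file path, table layout and column names are assumptions for illustration, not the schema used in the paper, and running it requires a local Spark installation.

    ```python
    # Range and analytical queries over a hypothetical Parquet variant table.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("variants-dwh-sketch").getOrCreate()

    variants = spark.read.parquet("/data/variants.parquet")   # hypothetical path
    variants.createOrReplaceTempView("variants")

    # Genome-range query: variants in a region for one sample (low-latency case).
    region = spark.sql("""
        SELECT sample_id, chrom, pos, ref, alt
        FROM variants
        WHERE chrom = '17' AND pos BETWEEN 41196312 AND 41277500
          AND sample_id = 'S0001'
    """)
    region.show()

    # Analytical query: per-gene variant counts across all samples (a denormalized
    # gene_symbol column is assumed, avoiding a distributed join).
    per_gene = spark.sql("""
        SELECT gene_symbol, COUNT(*) AS n_variants, COUNT(DISTINCT sample_id) AS n_samples
        FROM variants
        GROUP BY gene_symbol
        ORDER BY n_variants DESC
    """)
    per_gene.show(10)
    ```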

  18. A Data Analysis Expert System For Large Established Distributed Databases

    NASA Astrophysics Data System (ADS)

    Gnacek, Anne-Marie; An, Y. Kim; Ryan, J. Patrick

    1987-05-01

    The purpose of this work is to analyze the applicability of artificial intelligence techniques for developing a user-friendly, parallel interface to large, isolated, incompatible NASA databases in order to assist the management decision process. To carry out this work, a survey was conducted to establish the data access requirements of several key NASA user groups. In addition, current NASA database access methods were evaluated. The results of this work are presented in the form of a design for a natural language database interface system, called the Deductively Augmented NASA Management Decision Support System (DANMDS). This design is feasible principally because of recently announced commercial hardware and software product developments which allow cross-vendor compatibility. The goal of the DANMDS system addresses the central dilemma confronting most large companies and institutions in America: the retrieval of information from large, established, incompatible database systems. The DANMDS system implementation would represent a significant first step toward this problem's resolution.

  19. Storage and distribution of pathology digital images using integrated web-based viewing systems.

    PubMed

    Marchevsky, Alberto M; Dulbandzhyan, Ronda; Seely, Kevin; Carey, Steve; Duncan, Raymond G

    2002-05-01

    Health care providers have expressed increasing interest in incorporating digital images of gross pathology specimens and photomicrographs in routine pathology reports. We describe the multiple technical and logistical challenges involved in integrating the various components needed to develop a system for integrated Web-based viewing, storage, and distribution of digital images in a large health system. An Oracle version 8.1.6 database was developed to store, index, and deploy pathology digital photographs via our Intranet. The database allows for retrieval of images by patient demographics or by SNOMED code information. The setting is the Intranet of a large health system, accessible from multiple computers located within the medical center and at distant private physician offices. The images can be viewed using any of the workstations of the health system that have authorized access to our Intranet, using a standard browser or a browser configured with an external viewer or inexpensive plug-in software, such as Prizm 2.0. The images can be printed on paper or transferred to film using a digital film recorder. Digital images can also be displayed at pathology conferences by using wireless local area network (LAN) and secure remote technologies. The standardization of technologies and the adoption of a Web interface for all our computer systems allow us to distribute digital images from a pathology database to a potentially large group of users distributed in multiple locations throughout a large medical center.

  20. Very large database of lipids: rationale and design.

    PubMed

    Martin, Seth S; Blaha, Michael J; Toth, Peter P; Joshi, Parag H; McEvoy, John W; Ahmed, Haitham M; Elshazly, Mohamed B; Swiger, Kristopher J; Michos, Erin D; Kwiterovich, Peter O; Kulkarni, Krishnaji R; Chimera, Joseph; Cannon, Christopher P; Blumenthal, Roger S; Jones, Steven R

    2013-11-01

    Blood lipids have major cardiovascular and public health implications. Lipid-lowering drugs are prescribed based in part on categorization of patients into normal or abnormal lipid metabolism, yet relatively little emphasis has been placed on: (1) the accuracy of current lipid measures used in clinical practice, (2) the reliability of current categorizations of dyslipidemia states, and (3) the relationship of advanced lipid characterization to other cardiovascular disease biomarkers. To these ends, we developed the Very Large Database of Lipids (NCT01698489), an ongoing database protocol that harnesses deidentified data from the daily operations of a commercial lipid laboratory. The database includes individuals who were referred for clinical purposes for a Vertical Auto Profile (Atherotech Inc., Birmingham, AL), which directly measures cholesterol concentrations of low-density lipoprotein, very low-density lipoprotein, intermediate-density lipoprotein, high-density lipoprotein, their subclasses, and lipoprotein(a). Individual Very Large Database of Lipids studies, ranging from studies of measurement accuracy, to dyslipidemia categorization, to biomarker associations, to characterization of rare lipid disorders, are investigator-initiated and utilize peer-reviewed statistical analysis plans to address a priori hypotheses/aims. In the first database harvest (Very Large Database of Lipids 1.0) from 2009 to 2011, there were 1 340 614 adult and 10 294 pediatric patients; the adult sample had a median age of 59 years (interquartile range, 49-70 years) with even representation by sex. Lipid distributions closely matched those from the population-representative National Health and Nutrition Examination Survey. The second harvest of the database (Very Large Database of Lipids 2.0) is underway. Overall, the Very Large Database of Lipids database provides an opportunity for collaboration and new knowledge generation through careful examination of granular lipid data on a large scale. © 2013 Wiley Periodicals, Inc.

  1. GLAD: a system for developing and deploying large-scale bioinformatics grid.

    PubMed

    Teo, Yong-Meng; Wang, Xianbing; Ng, Yew-Kwong

    2005-03-01

    Grid computing is used to solve large-scale bioinformatics problems with gigabyte-scale databases by distributing the computation across multiple platforms. Until now, in developing bioinformatics grid applications it has been extremely tedious to design and implement the component algorithms and parallelization techniques for different classes of problems, and to access remotely located sequence database files of varying formats across the grid. In this study, we propose a grid programming toolkit, GLAD (Grid Life sciences Applications Developer), which facilitates the development and deployment of bioinformatics applications on a grid. GLAD has been developed using ALiCE (Adaptive scaLable Internet-based Computing Engine), a Java-based grid middleware which exploits task-based parallelism. Two benchmark bioinformatics applications, distributed sequence comparison and distributed progressive multiple sequence alignment, have been developed using GLAD.

  2. Using Large Diabetes Databases for Research.

    PubMed

    Wild, Sarah; Fischbacher, Colin; McKnight, John

    2016-09-01

    There are an increasing number of clinical, administrative and trial databases that can be used for research. These are particularly valuable if there are opportunities for linkage to other databases. This paper describes examples of the use of large diabetes databases for research. It reviews the advantages and disadvantages of using large diabetes databases for research and suggests solutions for some challenges. Large, high-quality databases offer potential sources of information for research at relatively low cost. Fundamental issues for using databases for research are the completeness of capture of cases within the population and time period of interest and accuracy of the diagnosis of diabetes and outcomes of interest. The extent to which people included in the database are representative should be considered if the database is not population based and there is the intention to extrapolate findings to the wider diabetes population. Information on key variables such as date of diagnosis or duration of diabetes may not be available at all, may be inaccurate or may contain a large amount of missing data. Information on key confounding factors is rarely available for the nondiabetic or general population limiting comparisons with the population of people with diabetes. However comparisons that allow for differences in distribution of important demographic factors may be feasible using data for the whole population or a matched cohort study design. In summary, diabetes databases can be used to address important research questions. Understanding the strengths and limitations of this approach is crucial to interpret the findings appropriately. © 2016 Diabetes Technology Society.

  3. The BioMart community portal: an innovative alternative to large, centralized data repositories

    USDA-ARS's Scientific Manuscript database

    The BioMart Community Portal (www.biomart.org) is a community-driven effort to provide a unified interface to biomedical databases that are distributed worldwide. The portal provides access to numerous database projects supported by 30 scientific organizations. It includes over 800 different biologi...

  4. ARACHNID: A prototype object-oriented database tool for distributed systems

    NASA Technical Reports Server (NTRS)

    Younger, Herbert; Oreilly, John; Frogner, Bjorn

    1994-01-01

    This paper discusses the results of a Phase 2 SBIR project sponsored by NASA and performed by MIMD Systems, Inc. A major objective of this project was to develop specific concepts for improved performance in accessing large databases. An object-oriented and distributed approach was used for the general design, while a geographical decomposition was used as a specific solution. The resulting software framework is called ARACHNID. The Faint Source Catalog developed by NASA was the initial database testbed. This is a database of many giga-bytes, where an order of magnitude improvement in query speed is being sought. This database contains faint infrared point sources obtained from telescope measurements of the sky. A geographical decomposition of this database is an attractive approach to dividing it into pieces. Each piece can then be searched on individual processors with only a weak data linkage between the processors being required. As a further demonstration of the concepts implemented in ARACHNID, a tourist information system is discussed. This version of ARACHNID is the commercial result of the project. It is a distributed, networked, database application where speed, maintenance, and reliability are important considerations. This paper focuses on the design concepts and technologies that form the basis for ARACHNID.
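
    A sketch of the geographical decomposition idea described above: a large point-source catalog is split into sky tiles so that a query touches only the tiles it overlaps, and each tile can be searched on its own processor with only weak linkage between them. The tiling scheme and record layout are illustrative, not the ARACHNID design itself.

    ```python
    # Tile a point-source catalog by sky position and search tiles in parallel.
    from collections import defaultdict
    from concurrent.futures import ProcessPoolExecutor

    TILE_DEG = 10.0  # tile size in degrees

    def tile_of(ra: float, dec: float) -> tuple:
        return (int(ra // TILE_DEG), int(dec // TILE_DEG))

    def partition(catalog):
        tiles = defaultdict(list)
        for ra, dec, flux in catalog:
            tiles[tile_of(ra, dec)].append((ra, dec, flux))
        return tiles

    def search_tile(args):
        sources, ra_min, ra_max, dec_min, dec_max = args
        return [s for s in sources
                if ra_min <= s[0] <= ra_max and dec_min <= s[1] <= dec_max]

    def query(tiles, ra_min, ra_max, dec_min, dec_max):
        wanted = [tiles[(x, y)]
                  for x in range(int(ra_min // TILE_DEG), int(ra_max // TILE_DEG) + 1)
                  for y in range(int(dec_min // TILE_DEG), int(dec_max // TILE_DEG) + 1)
                  if (x, y) in tiles]
        with ProcessPoolExecutor() as pool:   # weak linkage: tiles searched independently
            parts = pool.map(search_tile,
                             [(srcs, ra_min, ra_max, dec_min, dec_max) for srcs in wanted])
        return [s for part in parts for s in part]

    if __name__ == "__main__":
        catalog = [(12.3, 45.6, 0.8), (123.4, -5.6, 1.2), (12.9, 44.1, 0.3)]
        print(query(partition(catalog), 10.0, 15.0, 40.0, 50.0))
    ```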

  5. Verification of the databases EXFOR and ENDF

    NASA Astrophysics Data System (ADS)

    Berton, Gottfried; Damart, Guillaume; Cabellos, Oscar; Beauzamy, Bernard; Soppera, Nicolas; Bossant, Manuel

    2017-09-01

    The objective of this work is the verification of the large experimental (EXFOR) and evaluated nuclear reaction databases (JEFF, ENDF, JENDL, TENDL…). The work is applied to neutron reactions in EXFOR data, including threshold reactions, isomeric transitions, angular distributions and data in the resonance region of both isotopes and natural elements. Finally, a comparison of the resonance integrals compiled in the EXFOR database with those derived from the evaluated libraries is also performed.

  6. Surviving the Glut: The Management of Event Streams in Cyberphysical Systems

    NASA Astrophysics Data System (ADS)

    Buchmann, Alejandro

    Alejandro Buchmann is Professor in the Department of Computer Science, Technische Universität Darmstadt, where he heads the Databases and Distributed Systems Group. He received his MS (1977) and PhD (1980) from the University of Texas at Austin. He was an Assistant/Associate Professor at the Institute for Applied Mathematics and Systems IIMAS/UNAM in Mexico, doing research on databases for CAD, geographic information systems, and object-oriented databases. At Computer Corporation of America (later Xerox Advanced Information Systems) in Cambridge, Mass., he worked in the areas of active databases and real-time databases, and at GTE Laboratories, Waltham, in the areas of distributed object systems and the integration of heterogeneous legacy systems. In 1991 he returned to academia and joined T.U. Darmstadt. His current research interests are at the intersection of middleware, databases, event-based distributed systems, ubiquitous computing, and very large distributed systems (P2P, WSN). Much of the current research is concerned with guaranteeing quality of service and reliability properties in these systems, for example, scalability, performance, transactional behaviour, consistency, and end-to-end security. Many research projects involve collaboration with industry and cover a broad spectrum of application domains. Further information can be found at http://www.dvs.tu-darmstadt.de

  7. Distribution of late Pleistocene ice-rich syngenetic permafrost of the Yedoma Suite in east and central Siberia, Russia

    USGS Publications Warehouse

    Grosse, Guido; Robinson, Joel E.; Bryant, Robin; Taylor, Maxwell D.; Harper, William; DeMasi, Amy; Kyker-Snowman, Emily; Veremeeva, Alexandra; Schirrmeister, Lutz; Harden, Jennifer

    2013-01-01

    This digital database is the product of collaboration between the U.S. Geological Survey, the Geophysical Institute at the University of Alaska, Fairbanks; the Los Altos Hills Foothill College GeoSpatial Technology Certificate Program; the Alfred Wegener Institute for Polar and Marine Research, Potsdam, Germany; and the Institute of Physical Chemical and Biological Problems in Soil Science of the Russian Academy of Sciences. The primary goal for creating this digital database is to enhance current estimates of soil organic carbon stored in deep permafrost, in particular the late Pleistocene syngenetic ice-rich permafrost deposits of the Yedoma Suite. Previous studies estimated that Yedoma deposits cover about 1 million square kilometers of a large region in central and eastern Siberia, but these estimates generally are based on maps with scales smaller than 1:10,000,000. Taking into account this large area, it was estimated that Yedoma may store as much as 500 petagrams of soil organic carbon, a large part of which is vulnerable to thaw and mobilization from thermokarst and erosion. To refine assessments of the spatial distribution of Yedoma deposits, we digitized 11 Russian Quaternary geologic maps. Our study focused on extracting geologic units interpreted by us as late Pleistocene ice-rich syngenetic Yedoma deposits based on lithology, ground ice conditions, stratigraphy, and geomorphological and spatial association. These Yedoma units then were merged into a single data layer across map tiles. The spatial database provides a useful update of the spatial distribution of this deposit for an approximately 2.32 million square kilometers land area in Siberia that will (1) serve as a core database for future refinements of Yedoma distribution in additional regions, and (2) provide a starting point to revise the size of deep but thaw-vulnerable permafrost carbon pools in the Arctic based on surface geology and the distribution of cryolithofacies types at high spatial resolution. However, we recognize that the extent of Yedoma deposits presented in this database is not complete for a global assessment, because Yedoma deposits also occur in the Taymyr lowlands and Chukotka, and in parts of Alaska and northwestern Canada.

  8. BioMart: a data federation framework for large collaborative projects.

    PubMed

    Zhang, Junjun; Haider, Syed; Baran, Joachim; Cros, Anthony; Guberman, Jonathan M; Hsu, Jack; Liang, Yong; Yao, Long; Kasprzyk, Arek

    2011-01-01

    BioMart is a freely available, open source, federated database system that provides a unified access to disparate, geographically distributed data sources. It is designed to be data agnostic and platform independent, such that existing databases can easily be incorporated into the BioMart framework. BioMart allows databases hosted on different servers to be presented seamlessly to users, facilitating collaborative projects between different research groups. BioMart contains several levels of query optimization to efficiently manage large data sets and offers a diverse selection of graphical user interfaces and application programming interfaces to ensure that queries can be performed in whatever manner is most convenient for the user. The software has now been adopted by a large number of different biological databases spanning a wide range of data types and providing a rich source of annotation available to bioinformaticians and biologists alike.

  9. Spatial distribution of GRBs and large scale structure of the Universe

    NASA Astrophysics Data System (ADS)

    Bagoly, Zsolt; Rácz, István I.; Balázs, Lajos G.; Tóth, L. Viktor; Horváth, István

    We studied the spatial distribution of starburst galaxies from the Millennium XXL database at z = 0.82. We examined the starburst distribution in the classical Millennium I simulation (De Lucia et al. 2006) using a semi-analytical model for the genesis of galaxies, and simulated a sample of starburst galaxies with a Markov Chain Monte Carlo method. The connection between the homogeneity of the large-scale structure and the distribution of starburst groups on a defined scale was also checked (Kofman and Shandarin 1998; Suhhonenko et al. 2011; Liivamägi et al. 2012; Park et al. 2012; Horvath et al. 2014; Horvath et al. 2015).

  10. Fault-tolerant symmetrically-private information retrieval

    NASA Astrophysics Data System (ADS)

    Wang, Tian-Yin; Cai, Xiao-Qiu; Zhang, Rui-Ling

    2016-08-01

    We propose two symmetrically-private information retrieval protocols based on quantum key distribution, which provide a good degree of database and user privacy while being flexible, loss-resistant and easily generalized to a large database, similar to previous works. Furthermore, one protocol is robust to collective-dephasing noise, and the other is robust to collective-rotation noise.

  11. Extending GIS Technology to Study Karst Features of Southeastern Minnesota

    NASA Astrophysics Data System (ADS)

    Gao, Y.; Tipping, R. G.; Alexander, E. C.; Alexander, S. C.

    2001-12-01

    This paper summarizes ongoing research on karst feature distribution of southeastern Minnesota. The main goals of this interdisciplinary research are: 1) to look for large-scale patterns in the rate and distribution of sinkhole development; 2) to conduct statistical tests of hypotheses about the formation of sinkholes; 3) to create management tools for land-use managers and planners; and 4) to deliver geomorphic and hydrogeologic criteria for making scientifically valid land-use policies and ethical decisions in karst areas of southeastern Minnesota. Existing county and sub-county karst feature datasets of southeastern Minnesota have been assembled into a large GIS-based database capable of analyzing the entire data set. The central database management system (DBMS) is a relational GIS-based system interacting with three modules: GIS, statistical and hydrogeologic modules. ArcInfo and ArcView were used to generate a series of 2D and 3D maps depicting karst feature distributions in southeastern Minnesota. IRIS Explorer was used to produce 3D maps and animations using data exported from the GIS-based database. Nearest-neighbor analysis has been used to test sinkhole distributions in different topographic and geologic settings. All current nearest-neighbor analyses indicate that sinkholes in southeastern Minnesota are not evenly distributed in this area (i.e., they tend to be clustered). More detailed statistical methods such as cluster analysis, histograms, probability estimation, correlation and regression have been used to study the spatial distributions of some mapped karst features of southeastern Minnesota. A sinkhole probability map for Goodhue County has been constructed based on sinkhole distribution, bedrock geology, depth to bedrock, GIS buffer analysis and nearest-neighbor analysis. A series of karst features for Winona County, including sinkholes, springs, seeps, stream sinks and outcrops, has been mapped and entered into the Karst Feature Database of Southeastern Minnesota. The Karst Feature Database of Winona County is being expanded to include all the mapped karst features of southeastern Minnesota. Air photos from the 1930s to the 1990s of the Spring Valley Cavern Area in Fillmore County were scanned and geo-referenced into our GIS system. This approach has proved very useful for identifying sinkholes and studying the rate of sinkhole development.
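
    A sketch of the nearest-neighbor test mentioned above, using the Clark-Evans ratio: R is the mean observed nearest-neighbor distance divided by the distance expected for a random (Poisson) pattern of the same density, and R < 1 indicates clustering. The coordinates below are synthetic, not actual sinkhole locations.

    ```python
    # Clark-Evans nearest-neighbour ratio for a clustered and a random point pattern.
    import numpy as np
    from scipy.spatial import cKDTree

    def clark_evans(points: np.ndarray, area: float) -> float:
        tree = cKDTree(points)
        d, _ = tree.query(points, k=2)          # k=2: nearest neighbour other than self
        observed = d[:, 1].mean()
        expected = 0.5 / np.sqrt(len(points) / area)
        return observed / expected

    rng = np.random.default_rng(1)
    clustered = rng.normal(loc=(5.0, 5.0), scale=0.3, size=(200, 2))   # tight cluster
    random_pts = rng.uniform(0.0, 10.0, size=(200, 2))                 # random pattern
    print(f"clustered R = {clark_evans(clustered, 100.0):.2f}")   # well below 1
    print(f"random    R = {clark_evans(random_pts, 100.0):.2f}")  # close to 1
    ```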

  12. A Hybrid Semi-supervised Classification Scheme for Mining Multisource Geospatial Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vatsavai, Raju; Bhaduri, Budhendra L

    2011-01-01

    Supervised learning methods such as Maximum Likelihood (ML) are often used in land cover (thematic) classification of remote sensing imagery. The ML classifier relies exclusively on spectral characteristics of thematic classes whose statistical distributions (class conditional probability densities) are often overlapping. The spectral response distributions of thematic classes are dependent on many factors including elevation, soil types, and ecological zones. A second problem with statistical classifiers is the requirement of a large number of accurate training samples (10 to 30 |dimensions|), which are often costly and time consuming to acquire over large geographic regions. With the increasing availability of geospatial databases, it is possible to exploit the knowledge derived from these ancillary datasets to improve classification accuracies even when the class distributions are highly overlapping. Likewise, newer semi-supervised techniques can be adopted to improve the parameter estimates of the statistical model by utilizing a large number of easily available unlabeled training samples. Unfortunately there is no convenient multivariate statistical model that can be employed for multisource geospatial databases. In this paper we present a hybrid semi-supervised learning algorithm that effectively exploits freely available unlabeled training samples from multispectral remote sensing images and also incorporates ancillary geospatial databases. We have conducted several experiments on real datasets, and our new hybrid approach shows a 25 to 35% improvement in overall classification accuracy over conventional classification schemes.
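
    A minimal sketch of the core semi-supervised idea described above: start from maximum-likelihood Gaussian class models fitted on a few labeled samples, then refine the parameters with EM over a large pool of unlabeled samples. This is a one-dimensional, two-class toy with synthetic data, not the paper's multisource algorithm.

    ```python
    # EM refinement of Gaussian class models using unlabeled samples.
    import numpy as np

    rng = np.random.default_rng(2)
    labelled = {0: rng.normal(0.0, 1.0, 15), 1: rng.normal(3.0, 1.0, 15)}   # few samples
    unlabelled = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(3.0, 1.0, 500)])

    def gauss(x, mu, sd):
        return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

    # Initial ML estimates from labelled data only
    mu = np.array([labelled[0].mean(), labelled[1].mean()])
    sd = np.array([labelled[0].std(ddof=1), labelled[1].std(ddof=1)])
    prior = np.array([0.5, 0.5])

    for _ in range(20):                                   # EM refinement
        resp = np.stack([prior[k] * gauss(unlabelled, mu[k], sd[k]) for k in (0, 1)])
        resp /= resp.sum(axis=0)                          # E-step: class posteriors
        for k in (0, 1):                                  # M-step: weighted updates
            w = resp[k]
            mu[k] = (w * unlabelled).sum() / w.sum()
            sd[k] = np.sqrt((w * (unlabelled - mu[k]) ** 2).sum() / w.sum())
            prior[k] = w.mean()

    print("refined means:", np.round(mu, 2), "refined sds:", np.round(sd, 2))
    ```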

  13. [Privacy and public benefit in using large scale health databases].

    PubMed

    Yamamoto, Ryuichi

    2014-01-01

    In Japan, large-scale health databases have been constructed in recent years, such as the national database of health insurance claims and health checkups (NDB) and the Japanese Sentinel project. However, there are legal issues in striking an adequate balance between privacy and public benefit when using such databases. The NDB is operated under the act on health care for elderly persons, but this act says nothing about using the database for the general public benefit. Researchers who use the database are therefore forced to pay so much attention to anonymization and information security that the research work itself may be hindered. The Japanese Sentinel project is a national project for detecting adverse drug reactions using large-scale distributed clinical databases of large hospitals. Although patients give broad consent for such use for the public good, the use of insufficiently anonymized data is still under discussion. Generally speaking, research conducted for the public benefit will not infringe patients' privacy, but vague and complex requirements in personal data protection legislation may hinder such research. Medical science does not progress without the use of clinical information, so adequate legislation that is simple and clear for both researchers and patients is strongly required. In Japan, a specific act for balancing privacy and public benefit is now under discussion. The author recommends that researchers, including those in the field of pharmacology, pay attention to, participate in the discussion of, and make suggestions for such acts and regulations.

  14. Hierarchical Data Distribution Scheme for Peer-to-Peer Networks

    NASA Astrophysics Data System (ADS)

    Bhushan, Shashi; Dave, M.; Patel, R. B.

    2010-11-01

    In the past few years, peer-to-peer (P2P) networks have become an extremely popular mechanism for large-scale content sharing. P2P systems have focused on specific application domains (e.g. music files, video files) or on providing file-system-like capabilities. P2P is a powerful paradigm which provides a large-scale and cost-effective mechanism for data sharing, and a P2P system may also be used for storing data globally. Can a conventional database be implemented on a P2P system? Successful implementations of conventional databases on P2P systems have yet to be reported. In this paper we present a mathematical model for the replication of partitions and a hierarchy-based data distribution scheme for P2P networks. We also analyze the resource utilization and throughput of the P2P system with respect to availability when a conventional database is implemented over the P2P system with a variable query rate. Simulation results show that database partitions placed on peers with a higher availability factor perform better. Degradation index, throughput and resource utilization are the parameters evaluated with respect to the availability factor.
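
    A sketch of the placement idea evaluated above: replicate each database partition on the peers with the highest availability and compute the resulting probability that the partition is reachable. The availability values, replica count and round-robin policy are illustrative assumptions, not the paper's scheme.

    ```python
    # Greedy availability-aware placement of partition replicas on peers.
    from itertools import cycle
    from math import prod

    peers = {"p1": 0.99, "p2": 0.95, "p3": 0.80, "p4": 0.60, "p5": 0.40}
    partitions = ["part_0", "part_1", "part_2"]
    replicas = 2

    # Hand out replicas over peers ordered by availability (round-robin).
    ranked = sorted(peers, key=peers.get, reverse=True)
    placement = {p: [] for p in partitions}
    peer_cycle = cycle(ranked)
    for part in partitions:
        for _ in range(replicas):
            placement[part].append(next(peer_cycle))

    def partition_availability(assigned_peers):
        """Partition is available if at least one replica-holding peer is up."""
        return 1.0 - prod(1.0 - peers[p] for p in assigned_peers)

    for part, assigned in placement.items():
        print(part, assigned, f"availability = {partition_availability(assigned):.4f}")
    ```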

  15. Insertion algorithms for network model database management systems

    NASA Astrophysics Data System (ADS)

    Mamadolimov, Abdurashid; Khikmat, Saburov

    2017-12-01

    The network model is a database model conceived as a flexible way of representing objects and their relationships. Its distinguishing feature is that the schema, viewed as a graph in which object types are nodes and relationship types are arcs, forms a partial order. When a database is large and query comparisons are expensive, the efficiency requirement for managing algorithms is to minimize the number of query comparisons. We consider the updating operation for network model database management systems and develop a new sequential algorithm for it. We also suggest a distributed version of the algorithm.
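
    A simplified sketch related to the setting above: the schema is kept as a directed acyclic graph (a partial order of record types), and inserting a new relationship must preserve acyclicity. This is a basic illustration of the data model, not the paper's insertion algorithm.

    ```python
    # Network-model schema as a DAG; inserts that would break the partial order are rejected.
    from collections import defaultdict

    class NetworkSchema:
        def __init__(self):
            self.children = defaultdict(set)   # owner type -> member types

        def _reaches(self, src, dst):
            stack, seen = [src], set()
            while stack:
                node = stack.pop()
                if node == dst:
                    return True
                if node not in seen:
                    seen.add(node)
                    stack.extend(self.children[node])
            return False

        def insert_relationship(self, owner, member):
            """Add owner -> member, rejecting arcs that would create a cycle."""
            if owner == member or self._reaches(member, owner):
                raise ValueError(f"{owner} -> {member} would create a cycle")
            self.children[owner].add(member)

    schema = NetworkSchema()
    schema.insert_relationship("department", "employee")
    schema.insert_relationship("employee", "timesheet")
    try:
        schema.insert_relationship("timesheet", "department")   # would close a cycle
    except ValueError as err:
        print(err)
    ```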

  16. Very Large Scale Distributed Information Processing Systems

    DTIC Science & Technology

    1991-09-27


  17. A blue carbon soil database: Tidal wetland stocks for the US National Greenhouse Gas Inventory

    NASA Astrophysics Data System (ADS)

    Feagin, R. A.; Eriksson, M.; Hinson, A.; Najjar, R. G.; Kroeger, K. D.; Herrmann, M.; Holmquist, J. R.; Windham-Myers, L.; MacDonald, G. M.; Brown, L. N.; Bianchi, T. S.

    2015-12-01

    Coastal wetlands contain large reservoirs of carbon, and in 2015 the US National Greenhouse Gas Inventory began the work of placing blue carbon within the national regulatory context. The potential value of a wetland carbon stock, in relation to its location, soon could be influential in determining governmental policy and management activities, or in stimulating market-based CO2 sequestration projects. To meet the national need for high-resolution maps, a blue carbon stock database was developed linking National Wetlands Inventory datasets with the USDA Soil Survey Geographic Database. Users of the database can identify the economic potential for carbon conservation or restoration projects within specific estuarine basins, states, wetland types, physical parameters, and land management activities. The database is geared towards both national-level assessments and local-level inquiries. Spatial analysis of the stocks shows high variance within individual estuarine basins, largely dependent on geomorphic position on the landscape, though there are continental-scale trends to the carbon distribution as well. Future plans include linking this database with a sedimentary accretion database to predict carbon flux in US tidal wetlands.

  18. Distributed data collection for a database of radiological image interpretations

    NASA Astrophysics Data System (ADS)

    Long, L. Rodney; Ostchega, Yechiam; Goh, Gin-Hua; Thoma, George R.

    1997-01-01

    The National Library of Medicine, in collaboration with the National Center for Health Statistics and the National Institute for Arthritis and Musculoskeletal and Skin Diseases, has built a system for collecting radiological interpretations for a large set of x-ray images acquired as part of the data gathered in the second National Health and Nutrition Examination Survey. This system is capable of delivering across the Internet 5- and 10-megabyte x-ray images to Sun workstations equipped with X Window based 2048 X 2560 image displays, for the purpose of having these images interpreted for the degree of presence of particular osteoarthritic conditions in the cervical and lumbar spines. The collected interpretations can then be stored in a database at the National Library of Medicine, under control of the Illustra DBMS. This system is a client/server database application which integrates (1) distributed server processing of client requests, (2) a customized image transmission method for faster Internet data delivery, (3) distributed client workstations with high resolution displays, image processing functions and an on-line digital atlas, and (4) relational database management of the collected data.

  19. An Improved Algorithm to Generate a Wi-Fi Fingerprint Database for Indoor Positioning

    PubMed Central

    Chen, Lina; Li, Binghao; Zhao, Kai; Rizos, Chris; Zheng, Zhengqi

    2013-01-01

    The major problem of Wi-Fi fingerprint-based positioning technology is the signal strength fingerprint database creation and maintenance. The significant temporal variation of received signal strength (RSS) is the main factor responsible for the positioning error. A probabilistic approach can be used, but the RSS distribution is required. The Gaussian distribution or an empirically-derived distribution (histogram) is typically used. However, these distributions are either not always correct or require a large amount of data for each reference point. Double peaks of the RSS distribution have been observed in experiments at some reference points. In this paper a new algorithm based on an improved double-peak Gaussian distribution is proposed. Kurtosis testing is used to decide if this new distribution, or the normal Gaussian distribution, should be applied. Test results show that the proposed algorithm can significantly improve the positioning accuracy, as well as reduce the workload of the off-line data training phase. PMID:23966197

  20. An improved algorithm to generate a Wi-Fi fingerprint database for indoor positioning.

    PubMed

    Chen, Lina; Li, Binghao; Zhao, Kai; Rizos, Chris; Zheng, Zhengqi

    2013-08-21

    The major problem of Wi-Fi fingerprint-based positioning technology is the signal strength fingerprint database creation and maintenance. The significant temporal variation of received signal strength (RSS) is the main factor responsible for the positioning error. A probabilistic approach can be used, but the RSS distribution is required. The Gaussian distribution or an empirically-derived distribution (histogram) is typically used. However, these distributions are either not always correct or require a large amount of data for each reference point. Double peaks of the RSS distribution have been observed in experiments at some reference points. In this paper a new algorithm based on an improved double-peak Gaussian distribution is proposed. Kurtosis testing is used to decide if this new distribution, or the normal Gaussian distribution, should be applied. Test results show that the proposed algorithm can significantly improve the positioning accuracy, as well as reduce the workload of the off-line data training phase.
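
    A sketch of the decision described above: use kurtosis to judge whether the RSS samples at a reference point look single-peaked (fit one Gaussian) or double-peaked (fit a two-component mixture). The simulated RSS values, the threshold of -0.5 and the mixture fit are illustrative choices, not the paper's exact procedure.

    ```python
    # Kurtosis-based choice between a single Gaussian and a double-peak model.
    import numpy as np
    from scipy.stats import kurtosis
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(3)
    # Simulated RSS samples (dBm) at one reference point: two temporal modes
    rss = np.concatenate([rng.normal(-62, 2, 400), rng.normal(-74, 2, 400)])

    excess_kurtosis = kurtosis(rss)          # bimodal data tends to be platykurtic
    if excess_kurtosis < -0.5:               # double peak suspected
        gm = GaussianMixture(n_components=2, random_state=0).fit(rss.reshape(-1, 1))
        print("double-peak model:",
              np.round(gm.means_.ravel(), 1),
              np.round(np.sqrt(gm.covariances_).ravel(), 1),
              np.round(gm.weights_, 2))
    else:                                    # ordinary single-Gaussian fingerprint
        print("single Gaussian:", round(rss.mean(), 1), round(rss.std(), 1))
    ```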

  1. [Benefits of large healthcare databases for drug risk research].

    PubMed

    Garbe, Edeltraut; Pigeot, Iris

    2015-08-01

    Large electronic healthcare databases have become an important worldwide data resource for post-approval drug safety research. Signal generation methods and drug safety studies based on these data facilitate the prospective monitoring of drug safety after approval, as has recently been required by EU law and the German Medicines Act. Despite its large size, a single healthcare database may contain too few patients for studies of rarely used drugs or for the investigation of very rare drug risks. For that reason, efforts have been made in the United States to develop models that link data from different electronic healthcare databases for monitoring the safety of medicines after authorization, in (i) the Sentinel Initiative and (ii) the Observational Medical Outcomes Partnership (OMOP). In July 2014, the pilot project Mini-Sentinel included a total of 178 million people from 18 different US databases. The merging of the data is based on a distributed data network with a common data model. In the European Network of Centres for Pharmacoepidemiology and Pharmacovigilance (ENCePP) there has been no comparable merging of data from different databases; however, initial experience has been gained in various EU drug safety projects. In Germany, the data of the statutory health insurance providers constitute the most important resource for establishing a large healthcare database. Their use for this purpose has so far been severely restricted by the Code of Social Law (Section 75, Book 10). A reform of this section is therefore absolutely necessary.

  2. Video quality pooling adaptive to perceptual distortion severity.

    PubMed

    Park, Jincheol; Seshadrinathan, Kalpana; Lee, Sanghoon; Bovik, Alan Conrad

    2013-02-01

    It is generally recognized that severe video distortions that are transient in space and/or time have a large effect on overall perceived video quality. In order to understand this phenomenon, we study the distribution of spatio-temporally local quality scores obtained from several video quality assessment (VQA) algorithms on videos suffering from compression and lossy transmission over communication channels. We propose a content-adaptive spatial and temporal pooling strategy based on the observed distribution. Our method adaptively emphasizes "worst" scores along both the spatial and temporal dimensions of a video sequence and also considers the perceptual effect of large-area cohesive motion flow such as egomotion. We demonstrate the efficacy of the method by testing it using three different VQA algorithms on the LIVE Video Quality database and the EPFL-PoliMI video quality database.
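
    To make the pooling idea concrete, the sketch below averages only the worst fraction of local scores, first within each frame and then across frames. The percentile value and the spatial-then-temporal order are illustrative assumptions, not the exact pooling rule proposed in the paper.

    ```python
    # Worst-score pooling of spatio-temporally local quality scores.
    import numpy as np

    def pool_quality(local_scores, worst_fraction=0.1):
        """local_scores: array of shape (frames, rows, cols), lower = worse.
        Returns a single video-level quality score."""
        scores = np.asarray(local_scores, dtype=float)
        frame_scores = []
        for frame in scores:
            flat = np.sort(frame.ravel())              # ascending: worst first
            k = max(1, int(worst_fraction * flat.size))
            frame_scores.append(flat[:k].mean())       # spatial worst-percentile pooling
        frame_scores = np.sort(np.asarray(frame_scores))
        k = max(1, int(worst_fraction * frame_scores.size))
        return frame_scores[:k].mean()                 # temporal worst-percentile pooling

    print(pool_quality(np.random.default_rng(0).uniform(0, 1, (120, 36, 64))))
    ```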

  3. Comparison of the NCI open database with seven large chemical structural databases.

    PubMed

    Voigt, J H; Bienfait, B; Wang, S; Nicklaus, M C

    2001-01-01

    Eight large chemical databases have been analyzed and compared to each other. Central to this comparison is the open National Cancer Institute (NCI) database, consisting of approximately 250 000 structures. The other databases analyzed are the Available Chemicals Directory ("ACD," from MDL, release 1.99, 3D-version); the ChemACX ("ACX," from CamSoft, Version 4.5); the Maybridge Catalog and the Asinex database (both as distributed by CamSoft as part of ChemInfo 4.5); the Sigma-Aldrich Catalog (CD-ROM, 1999 Version); the World Drug Index ("WDI," Derwent, version 1999.03); and the organic part of the Cambridge Crystallographic Database ("CSD," from Cambridge Crystallographic Data Center, 1999 Version 5.18). The database properties analyzed are internal duplication rates; compounds unique to each database; cumulative occurrence of compounds in an increasing number of databases; overlap of identical compounds between two databases; similarity overlap; diversity; and others. The crystallographic database CSD and the WDI show somewhat less overlap with the other databases than those databases show with each other. In particular, the collections of commercial compounds and compilations of vendor catalogs have a substantial degree of overlap with one another. Still, no database is completely a subset of any other, and each appears to have its own niche and thus "raison d'être". The NCI database has by far the highest number of compounds that are unique to it. Approximately 200 000 of the NCI structures were not found in any of the other analyzed databases.

  4. Large Survey Database: A Distributed Framework for Storage and Analysis of Large Datasets

    NASA Astrophysics Data System (ADS)

    Juric, Mario

    2011-01-01

    The Large Survey Database (LSD) is a Python framework and DBMS for distributed storage, cross-matching and querying of large survey catalogs (>10^9 rows, >1 TB). The primary driver behind its development is the analysis of Pan-STARRS PS1 data. It is specifically optimized for fast queries and parallel sweeps of positionally and temporally indexed datasets. It transparently scales to more than 10^2 nodes, and can be made to function in "shared nothing" architectures. An LSD database consists of a set of vertically and horizontally partitioned tables, physically stored as compressed HDF5 files. Vertically, we partition the tables into groups of related columns ('column groups'), storing together logically related data (e.g., astrometry, photometry). Horizontally, the tables are partitioned into partially overlapping "cells" by position in space (lon, lat) and time (t). This organization allows for fast lookups based on spatial and temporal coordinates, as well as data and task distribution. The design was inspired by the success of Google BigTable (Chang et al., 2006). Our programming model is a pipelined extension of MapReduce (Dean and Ghemawat, 2004). An SQL-like query language is used to access data. For complex tasks, map-reduce "kernels" that operate on query results on a per-cell basis can be written, with the framework taking care of scheduling and execution. The combination leverages users' familiarity with SQL, while offering a fully distributed computing environment. LSD adds little overhead compared to direct Python file I/O. In tests, we swept through 1.1 billion rows of PanSTARRS+SDSS data (220 GB) in less than 15 minutes on a dual-CPU machine. In a cluster environment, we achieved bandwidths of 17 Gbit/s (I/O limited). Based on current experience, we believe LSD should scale to be useful for analysis and storage of LSST-scale datasets. It can be downloaded from http://mwscience.net/lsd.
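
    The per-cell programming model described above can be illustrated with a minimal sketch: rows are bucketed into (lon, lat, time) cells and a user-supplied "kernel" runs on each cell in parallel. The cell sizes, function names, and use of multiprocessing are assumptions for illustration only; this is not LSD's actual API.

    ```python
    # Minimal sketch of per-cell map/reduce over a spatially+temporally partitioned catalog.
    from collections import defaultdict
    from multiprocessing import Pool

    def cell_key(row, dlon=10.0, dlat=10.0, dt=30.0):
        lon, lat, t = row
        return (int(lon // dlon), int(lat // dlat), int(t // dt))

    def count_kernel(cell_rows):
        # Example kernel: number of detections per cell.
        return len(cell_rows)

    def map_reduce(rows, kernel):
        cells = defaultdict(list)
        for row in rows:
            cells[cell_key(row)].append(row)           # horizontal partitioning into cells
        with Pool() as pool:
            results = pool.map(kernel, list(cells.values()))   # map phase, one task per cell
        return dict(zip(cells.keys(), results))        # reduce: collect per-cell results

    if __name__ == "__main__":
        rows = [(12.3, 45.6, 100.0), (12.9, 45.1, 101.0), (200.0, -30.0, 55.0)]
        print(map_reduce(rows, count_kernel))
    ```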

  5. Towards communication-efficient quantum oblivious key distribution

    NASA Astrophysics Data System (ADS)

    Panduranga Rao, M. V.; Jakobi, M.

    2013-01-01

    Symmetrically private information retrieval, a fundamental problem in the field of secure multiparty computation, is defined as follows: A database D of N bits held by Bob is queried by a user Alice who is interested in the bit Db in such a way that (1) Alice learns Db and only Db and (2) Bob does not learn anything about Alice's choice b. While solutions to this problem in the classical domain rely largely on unproven computational complexity theoretic assumptions, it is also known that perfect solutions that guarantee both database and user privacy are impossible in the quantum domain. Jakobi et al. [Phys. Rev. A 83, 022301 (2011)] proposed a protocol for oblivious transfer using well-known quantum key distribution (QKD) techniques to establish an oblivious key to solve this problem. Their solution provided a good degree of database and user privacy (using physical principles like the impossibility of perfectly distinguishing nonorthogonal quantum states and the impossibility of superluminal communication) while being loss-resistant and implementable with commercial QKD devices (due to the use of the Scarani-Acin-Ribordy-Gisin 2004 protocol). However, their quantum oblivious key distribution (QOKD) protocol requires a communication complexity of O(N log N). Since modern databases can be extremely large, it is important to reduce this communication as much as possible. In this paper, we first suggest a modification of their protocol wherein the number of qubits that need to be exchanged is reduced to O(N). A subsequent generalization reduces the quantum communication complexity even further, such that only a few hundred qubits need to be transferred even for very large databases.

  6. Very large hail occurrence in Poland from 2007 to 2015

    NASA Astrophysics Data System (ADS)

    Pilorz, Wojciech

    2015-10-01

    Very large hail is defined as the occurrence of hailstones of at least 5 cm in diameter. The phenomenon is rare, but its significant consequences, not only for agriculture but also for automobiles, households and people outdoors, make it essential to examine. Hail occurrence is closely connected with the frequency and type of storms, and the storm type most prone to producing hail is the supercell. The geographical distribution of hailstorms was compared with the geographical distribution of storms in Poland, and similarities were found. The area with the largest number of storms is southeastern Poland, and the analyzed European Severe Weather Database (ESWD) data showed that most very large hail reports occurred in this part of Poland. The probable reason is that tropical air masses persist longest over southeastern Poland. The spatial distribution analysis also shows more hail incidents over the Upper Silesia, Lesser Poland, Subcarpathia and Świętokrzyskie regions. The information source on hail occurrence was the ESWD, an open database in which everyone can add reports and find reports that meet given search criteria. In total, 69 hailstorms in the period 2007-2015 were examined, yielding 121 very large hail reports. A large disproportion in the number of hailstorms and hail reports between individual years was found. The very large hail season in Poland begins in May and ends in September, peaking in July. Most hail occurs between 12:00 and 17:00 UTC, but there were some cases of very large (one extremely large) hail at night and in the early morning hours. Although very large hail is a spectacular phenomenon, its local character implies a potentially high rate of unreported events, which is the most significant problem in hail research.

  7. "Mr. Database" : Jim Gray and the History of Database Technologies.

    PubMed

    Hanwahr, Nils C

    2017-12-01

    Although the widespread use of the term "Big Data" is comparatively recent, it invokes a phenomenon in the developments of database technology with distinct historical contexts. The database engineer Jim Gray, known as "Mr. Database" in Silicon Valley before his disappearance at sea in 2007, was involved in many of the crucial developments since the 1970s that constitute the foundation of exceedingly large and distributed databases. Jim Gray was involved in the development of relational database systems based on the concepts of Edgar F. Codd at IBM in the 1970s before he went on to develop principles of Transaction Processing that enable the parallel and highly distributed performance of databases today. He was also involved in creating forums for discourse between academia and industry, which influenced industry performance standards as well as database research agendas. As a co-founder of the San Francisco branch of Microsoft Research, Gray increasingly turned toward scientific applications of database technologies, e. g. leading the TerraServer project, an online database of satellite images. Inspired by Vannevar Bush's idea of the memex, Gray laid out his vision of a Personal Memex as well as a World Memex, eventually postulating a new era of data-based scientific discovery termed "Fourth Paradigm Science". This article gives an overview of Gray's contributions to the development of database technology as well as his research agendas and shows that central notions of Big Data have been occupying database engineers for much longer than the actual term has been in use.

  8. Chesapeake Bay Program Water Quality Database

    EPA Pesticide Factsheets

    The Chesapeake Information Management System (CIMS), designed in 1996, is an integrated, accessible information management system for the Chesapeake Bay Region. CIMS is an organized, distributed library of information and software tools designed to increase basin-wide public access to Chesapeake Bay information. The information delivered by CIMS includes technical and public information, educational material, environmental indicators, policy documents, and scientific data. Through the use of relational databases, web-based programming, and web-based GIS, a large number of Internet resources have been established. These resources include multiple distributed on-line databases, on-demand graphing and mapping of environmental data, and geographic searching tools for environmental information. Also available are baseline monitoring data, summarized data, and environmental indicators that document ecosystem status and trends and confirm linkages between water quality, habitat quality and abundance, and the distribution and integrity of biological populations. One of the major features of the CIMS network is the Chesapeake Bay Program's Data Hub, which provides users access to a suite of long-term water quality and living resources databases. Chesapeake Bay mainstem and tidal tributary water quality, benthic macroinvertebrate, toxics, plankton, and fluorescence data can be obtained for a network of over 800 monitoring stations.

  9. Cloud-Based Distributed Control of Unmanned Systems

    DTIC Science & Technology

    2015-04-01

    during mission execution. At best, the data is saved onto hard-drives and is accessible only by the local team. Data history in a form available and...following open source technologies: GeoServer, OpenLayers, PostgreSQL, and PostGIS are chosen to implement the back-end database and server. A brief...geospatial map data. 3. PostgreSQL: An SQL-compliant object-relational database that easily scales to accommodate large amounts of data - upwards to

  10. Evaluation of Online Information Sources on Alien Species in Europe: The Need of Harmonization and Integration

    NASA Astrophysics Data System (ADS)

    Gatto, Francesca; Katsanevakis, Stelios; Vandekerkhove, Jochen; Zenetos, Argyro; Cardoso, Ana Cristina

    2013-06-01

    Europe is severely affected by alien invasions, which impact biodiversity, ecosystem services, economy, and human health. A large number of national, regional, and global online databases provide information on the distribution, pathways of introduction, and impacts of alien species. The sufficiency and efficiency of the current online information systems to assist European policy on alien species were investigated by a comparative analysis of occurrence data across 43 online databases. Large differences among databases were found, which are partially explained by variations in their taxonomical, environmental, and geographical scopes but also by the variable efforts for continuous updates and by inconsistencies in the definition of "alien" or "invasive" species. No single database covered all European environments, countries, and taxonomic groups. In many European countries national databases do not exist, which greatly affects the quality of reported information. To be operational and useful to scientists, managers, and policy makers, online information systems need to be regularly updated through continuous monitoring on a country or regional level. We propose the creation of a network of online interoperable web services through which information in distributed resources can be accessed, aggregated and then used for reporting and further analysis at different geographical and political scales, as an efficient approach to increase the accessibility of information. Harmonization, standardization, conformity with international standards for nomenclature, and agreement on common definitions of alien and invasive species are among the necessary prerequisites.

  11. Nosql for Storage and Retrieval of Large LIDAR Data Collections

    NASA Astrophysics Data System (ADS)

    Boehm, J.; Liu, K.

    2015-08-01

    Developments in LiDAR technology over the past decades have made LiDAR a mature and widely accepted source of geospatial information. This in turn has led to an enormous growth in data volume. The central idea behind a file-centric storage of LiDAR point clouds is the observation that large collections of LiDAR data are typically delivered as large collections of files, rather than single files of terabyte size. This split of the dataset, commonly referred to as tiling, was usually done to accommodate a specific processing pipeline, and it therefore makes sense to preserve it. A document-oriented NoSQL database can easily emulate this data partitioning by representing each tile (file) in a separate document. The document stores the metadata of the tile, while the actual files are stored in a distributed file system emulated by the NoSQL database. We demonstrate the use of MongoDB, a highly scalable document-oriented NoSQL database, for storing large LiDAR files. MongoDB, like any NoSQL database, allows for queries on the attributes of the document; notably, it also supports spatial queries, so we can perform spatial queries on the bounding boxes of the LiDAR tiles. Inserting and retrieving files on a cloud-based database is compared to a native file system and cloud storage in terms of transfer speed.
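
    A minimal sketch of this file-centric pattern is given below: each tile becomes one document holding its metadata, the raw file goes into GridFS, and a 2dsphere index enables spatial queries on the tile footprint. The database, collection, and field names are assumptions, the footprint coordinates are assumed to be longitude/latitude, and a MongoDB instance on localhost is assumed to be running; this is not the authors' implementation.

    ```python
    # Store LiDAR tile metadata as documents and query them spatially with pymongo.
    from pymongo import MongoClient, GEOSPHERE
    import gridfs

    client = MongoClient("mongodb://localhost:27017")
    db = client["lidar"]
    tiles = db["tiles"]
    fs = gridfs.GridFS(db)                                   # distributed file storage for the raw files
    tiles.create_index([("footprint", GEOSPHERE)])           # spatial index on the tile footprint

    def insert_tile(path, min_lon, min_lat, max_lon, max_lat):
        with open(path, "rb") as f:
            file_id = fs.put(f, filename=path)               # store the LAS/LAZ file itself
        tiles.insert_one({
            "filename": path,
            "gridfs_id": file_id,
            "footprint": {                                   # GeoJSON polygon of the bounding box
                "type": "Polygon",
                "coordinates": [[[min_lon, min_lat], [max_lon, min_lat], [max_lon, max_lat],
                                 [min_lon, max_lat], [min_lon, min_lat]]],
            },
        })

    def tiles_intersecting(polygon_geojson):
        return tiles.find({"footprint": {"$geoIntersects": {"$geometry": polygon_geojson}}})
    ```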

  12. Bayesian screening for active compounds in high-dimensional chemical spaces combining property descriptors and molecular fingerprints.

    PubMed

    Vogt, Martin; Bajorath, Jürgen

    2008-01-01

    Bayesian classifiers are increasingly being used to distinguish active from inactive compounds and search large databases for novel active molecules. We introduce an approach to directly combine the contributions of property descriptors and molecular fingerprints in the search for active compounds that is based on a Bayesian framework. Conventionally, property descriptors and fingerprints are used as alternative features for virtual screening methods. Following the approach introduced here, probability distributions of descriptor values and fingerprint bit settings are calculated for active and database molecules and the divergence between the resulting combined distributions is determined as a measure of biological activity. In test calculations on a large number of compound activity classes, this methodology was found to consistently perform better than similarity searching using fingerprints and multiple reference compounds or Bayesian screening calculations using probability distributions calculated only from property descriptors. These findings demonstrate that there is considerable synergy between different types of property descriptors and fingerprints in recognizing diverse structure-activity relationships, at least in the context of Bayesian modeling.
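
    The combination idea can be sketched as follows: model each fingerprint bit as a Bernoulli variable and each descriptor as a Gaussian, then sum per-feature Kullback-Leibler divergences between the "active" and "database" distributions. This is an illustrative simplification under those distributional assumptions, not the paper's exact scoring function.

    ```python
    # Combine descriptor and fingerprint contributions into one divergence score.
    import numpy as np

    def bernoulli_kl(p, q, eps=1e-6):
        p, q = np.clip(p, eps, 1 - eps), np.clip(q, eps, 1 - eps)
        return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

    def gaussian_kl(mu_p, sd_p, mu_q, sd_q):
        return np.log(sd_q / sd_p) + (sd_p**2 + (mu_p - mu_q)**2) / (2 * sd_q**2) - 0.5

    def divergence(active_fp, db_fp, active_desc, db_desc):
        """active_fp, db_fp: (n, bits) binary arrays; *_desc: (n, d) float arrays."""
        kl_fp = bernoulli_kl(active_fp.mean(axis=0), db_fp.mean(axis=0)).sum()
        kl_desc = gaussian_kl(active_desc.mean(axis=0), active_desc.std(axis=0) + 1e-6,
                              db_desc.mean(axis=0), db_desc.std(axis=0) + 1e-6).sum()
        return kl_fp + kl_desc
    ```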

  13. A data analysis expert system for large established distributed databases

    NASA Technical Reports Server (NTRS)

    Gnacek, Anne-Marie; An, Y. Kim; Ryan, J. Patrick

    1987-01-01

    A design for a natural language database interface system, called the Deductively Augmented NASA Management Decision support System (DANMDS), is presented. The DANMDS system components have been chosen on the basis of the following considerations: maximal employment of the existing NASA IBM-PC computers and supporting software; local structuring and storing of external data via the entity-relationship model; a natural, easy-to-use, error-free database query language; user ability to alter the query language vocabulary and data analysis heuristics; and significant artificial intelligence data analysis heuristic techniques that allow the system to become progressively and automatically more useful.

  14. Neuroimaging Data Sharing on the Neuroinformatics Database Platform

    PubMed Central

    Book, Gregory A; Stevens, Michael; Assaf, Michal; Glahn, David; Pearlson, Godfrey D

    2015-01-01

    We describe the Neuroinformatics Database (NiDB), an open-source database platform for archiving, analysis, and sharing of neuroimaging data. Data from the multi-site projects Autism Brain Imaging Data Exchange (ABIDE), Bipolar-Schizophrenia Network on Intermediate Phenotypes parts one and two (B-SNIP1, B-SNIP2), and Monetary Incentive Delay task (MID) are available for download from the public instance of NiDB, with more projects sharing data as it becomes available. As demonstrated by making several large datasets available, NiDB is an extensible platform appropriately suited to archive and distribute shared neuroimaging data. PMID:25888923

  15. Constructing distributed Hippocratic video databases for privacy-preserving online patient training and counseling.

    PubMed

    Peng, Jinye; Babaguchi, Noboru; Luo, Hangzai; Gao, Yuli; Fan, Jianping

    2010-07-01

    Digital video now plays an important role in supporting more profitable online patient training and counseling, and integration of patient training videos from multiple competitive organizations in the health care network will result in better offerings for patients. However, privacy concerns often prevent multiple competitive organizations from sharing and integrating their patient training videos. In addition, patients with infectious or chronic diseases may not want the online patient training organizations to identify who they are or even which video clips they are interested in. Thus, there is an urgent need to develop more effective techniques to protect both video content privacy and access privacy. In this paper, we have developed a new approach to construct a distributed Hippocratic video database system for supporting more profitable online patient training and counseling. First, a new database modeling approach is developed to support concept-oriented video database organization and assign a degree of privacy of the video content for each database level automatically. Second, a new algorithm is developed to protect the video content privacy at the level of individual video clips by filtering out the privacy-sensitive human objects automatically. In order to integrate the patient training videos from multiple competitive organizations for constructing a centralized video database indexing structure, a privacy-preserving video sharing scheme is developed to support privacy-preserving distributed classifier training and prevent statistical inferences from the videos that are shared for cross-validation of video classifiers. Our experiments on large-scale video databases have also provided very convincing results.

  16. Effects of Energetic Additives on Combustion Dynamics

    DTIC Science & Technology

    2010-04-19

    ...and ethanol drops loaded with nano-Al additives burned differently. An exploratory computational study using Large Eddy Simulation indicated that

  17. The Identity Mapping Project: Demographic differences in patterns of distributed identity.

    PubMed

    Gilbert, Richard L; Dionisio, John David N; Forney, Andrew; Dorin, Philip

    2015-01-01

    The advent of cloud computing and a multi-platform digital environment is giving rise to a new phase of human identity called "The Distributed Self." In this conception, aspects of the self are distributed into a variety of 2D and 3D digital personas with the capacity to reflect any number of combinations of now malleable personality traits. In this way, the source of human identity remains internal and embodied, but the expression or enactment of the self becomes increasingly external, disembodied, and distributed on demand. The Identity Mapping Project (IMP) is an interdisciplinary collaboration between psychology and computer Science designed to empirically investigate the development of distributed forms of identity. Methodologically, it collects a large database of "identity maps" - computerized graphical representations of how active someone is online and how their identity is expressed and distributed across 7 core digital domains: email, blogs/personal websites, social networks, online forums, online dating sites, character based digital games, and virtual worlds. The current paper reports on gender and age differences in online identity based on an initial database of distributed identity profiles.

  18. FishTraits: a database of ecological and life-history traits of freshwater fishes of the United States

    USGS Publications Warehouse

    Angermeier, Paul L.; Frimpong, Emmanuel A.

    2011-01-01

    The need for integrated and widely accessible sources of species traits data to facilitate studies of ecology, conservation, and management has motivated development of traits databases for various taxa. In spite of the increasing number of traits-based analyses of freshwater fishes in the United States, no consolidated database of traits of this group exists publicly, and much useful information on these species is documented only in obscure sources. The largely inaccessible and unconsolidated traits information makes large-scale analysis involving many fishes and/or traits particularly challenging. We have compiled a database of > 100 traits for 809 (731 native and 78 nonnative) fish species found in freshwaters of the conterminous United States, including 37 native families and 145 native genera. The database, named FishTraits, contains information on four major categories of traits: (1) trophic ecology; (2) body size, reproductive ecology, and life history; (3) habitat preferences; and (4) salinity and temperature tolerances. Information on geographic distribution and conservation status was also compiled. The database enhances many opportunities for conducting research on fish species traits and constitutes the first step toward establishing a central repository for a continually expanding set of traits of North American fishes.

  19. Evolution of the use of relational and NoSQL databases in the ATLAS experiment

    NASA Astrophysics Data System (ADS)

    Barberis, D.

    2016-09-01

    The ATLAS experiment used for many years a large database infrastructure based on Oracle to store several different types of non-event data: time-dependent detector configuration and conditions data, calibrations and alignments, configurations of Grid sites, catalogues for data management tools, job records for distributed workload management tools, run and event metadata. The rapid development of "NoSQL" databases (structured storage services) in the last five years allowed an extended and complementary usage of traditional relational databases and new structured storage tools in order to improve the performance of existing applications and to extend their functionalities using the possibilities offered by the modern storage systems. The trend is towards using the best tool for each kind of data, separating for example the intrinsically relational metadata from payload storage, and records that are frequently updated and benefit from transactions from archived information. Access to all components has to be orchestrated by specialised services that run on front-end machines and shield the user from the complexity of data storage infrastructure. This paper describes this technology evolution in the ATLAS database infrastructure and presents a few examples of large database applications that benefit from it.

  20. Application of new type of distributed multimedia databases to networked electronic museum

    NASA Astrophysics Data System (ADS)

    Kuroda, Kazuhide; Komatsu, Naohisa; Komiya, Kazumi; Ikeda, Hiroaki

    1999-01-01

    Recently, various kinds of multimedia application systems have been actively developed, building on advanced high-speed communication networks, computer processing technologies, and digital content-handling technologies. Against this background, this paper proposes a new distributed multimedia database system which can effectively perform cooperative retrieval among distributed databases. The proposed system introduces the concept of a 'retrieval manager', which functions as an intelligent controller so that the user can treat a set of distributed databases as one logical database. The logical database dynamically generates and executes a preferred combination of retrieval parameters on the basis of both directory data and the system environment. Moreover, the concept of a 'domain' is defined in the system as a managing unit of retrieval, and retrieval can be performed effectively through cooperative processing among multiple domains. A communication language and protocols are also defined and are used in every communication within the system. A language interpreter in each machine translates the communication language into the internal language used by that machine; with this interpreter, internal processing modules such as the DBMS and user interface modules can be selected freely. The concept of a 'content-set' is also introduced. A content-set is defined as a package of related contents, and the system handles a content-set as one object. The user terminal can effectively control the display of retrieved contents by referring to data describing the relations among the contents in the content-set. In order to verify the functions of the proposed system, a networked electronic museum was experimentally built. The results of this experiment indicate that the proposed system can effectively retrieve the target contents under the control of a number of distributed domains, and that the system works effectively even as it becomes large.

  1. Overcoming barriers to a research-ready national commercial claims database.

    PubMed

    Newman, David; Herrera, Carolina-Nicole; Parente, Stephen T

    2014-11-01

    Billions of dollars have been spent on the goal of making healthcare data available to clinicians and researchers in the hopes of improving healthcare and lowering costs. However, the problems of data governance, distribution, and accessibility remain challenges for the healthcare system to overcome. In this study, we discuss some of the issues around holding, reporting, and distributing data, including the newest "big data" challenge: making the data accessible to researchers and policy makers. This article presents a case study in "big healthcare data" involving the Health Care Cost Institute (HCCI). HCCI is a nonprofit, nonpartisan, independent research institute that serves as a voluntary repository of national commercial healthcare claims data. Governance of large healthcare databases is complicated by the data-holding model and further complicated by issues related to distribution to research teams. For multi-payer healthcare claims databases, the 2 most common models of data holding (mandatory and voluntary) have different data security requirements. Furthermore, data transport and accessibility may require technological investment. HCCI's efforts offer insights from which other data managers and healthcare leaders may benefit when contemplating a data collaborative.

  2. Database technology and the management of multimedia data in the Mirror project

    NASA Astrophysics Data System (ADS)

    de Vries, Arjen P.; Blanken, H. M.

    1998-10-01

    Multimedia digital libraries require an open distributed architecture instead of a monolithic database system. In the Mirror project, we use the Monet extensible database kernel to manage different representations of multimedia objects. To maintain independence between content, meta-data, and the creation of meta-data, we allow distribution of data and operations using CORBA. This open architecture introduces new problems for data access. From an end user's perspective, the problem is how to search the available representations to fulfill an actual information need; the conceptual gap between human perceptual processes and the meta-data is too large. From a system's perspective, several representations of the data may semantically overlap or be irrelevant. We address these problems with an iterative query process and active user participation through relevance feedback. A retrieval model based on inference networks assists the user with query formulation. The integration of this model into the database design has two advantages. First, the user can query both the logical and the content structure of multimedia objects. Second, the use of different data models in the logical and the physical database design provides data independence and allows algebraic query optimization. We illustrate query processing with a music retrieval application.

  3. GPCALMA: A Tool For Mammography With A GRID-Connected Distributed Database

    NASA Astrophysics Data System (ADS)

    Bottigli, U.; Cerello, P.; Cheran, S.; Delogu, P.; Fantacci, M. E.; Fauci, F.; Golosio, B.; Lauria, A.; Lopez Torres, E.; Magro, R.; Masala, G. L.; Oliva, P.; Palmiero, R.; Raso, G.; Retico, A.; Stumbo, S.; Tangaro, S.

    2003-09-01

    The GPCALMA (Grid Platform for Computer Assisted Library for MAmmography) collaboration involves several physics departments, INFN (National Institute of Nuclear Physics) sections, and Italian hospitals. The aim of this collaboration is to develop a tool that can help radiologists in the early detection of breast cancer. GPCALMA has built a large distributed database of digitised mammographic images (about 5500 images corresponding to 1650 patients) and developed a CAD (Computer Aided Detection) software which is integrated in a station that can also be used to acquire new images, serve as an archive, and perform statistical analyses. The images (18×24 cm2, digitised by a CCD linear scanner with an 85 μm pitch and 4096 gray levels) are completely described: pathological ones have a consistent characterization with the radiologist's diagnosis and histological data, while non-pathological ones correspond to patients with a follow-up of at least three years. The distributed database is realized through the connection of all the hospitals and research centers using GRID technology. In each hospital, local patients' digital images are stored in the local database. Using the GRID connection, GPCALMA will allow each node to work on distributed database data as well as local database data. Using its database, the GPCALMA tools perform several analyses. A texture analysis, i.e. an automated classification into adipose, dense or glandular texture, can be provided by the system. The GPCALMA software also allows classification of pathological features, in particular analysis of massive lesions (both opacities and spiculated lesions) and of microcalcification clusters. The detection of pathological features is made using neural network software that provides a selection of areas showing a given "suspicion level" of lesion occurrence. The performance of the GPCALMA system will be presented in terms of ROC (Receiver Operating Characteristic) curves. The results of the GPCALMA system as a "second reader" will also be presented.

  4. Big Data and Total Hip Arthroplasty: How Do Large Databases Compare?

    PubMed

    Bedard, Nicholas A; Pugely, Andrew J; McHugh, Michael A; Lux, Nathan R; Bozic, Kevin J; Callaghan, John J

    2018-01-01

    Use of large databases for orthopedic research has become extremely popular in recent years. Each database varies in the methods used to capture data and the population it represents. The purpose of this study was to evaluate how these databases differed in reported demographics, comorbidities, and postoperative complications for primary total hip arthroplasty (THA) patients. Primary THA patients were identified within the National Surgical Quality Improvement Program (NSQIP), Nationwide Inpatient Sample (NIS), Medicare Standard Analytic Files (MED), and Humana administrative claims database (HAC). NSQIP definitions for comorbidities and complications were matched to corresponding International Classification of Diseases, 9th Revision/Current Procedural Terminology codes to query the other databases. Demographics, comorbidities, and postoperative complications were compared. The number of patients from each database was 22,644 in HAC, 371,715 in MED, 188,779 in NIS, and 27,818 in NSQIP. Age and gender distribution were clinically similar. Overall, there was variation in the prevalence of comorbidities and in rates of postoperative complications between databases. For example, NSQIP recorded more than twice as much obesity as NIS, while HAC and MED recorded more than twice as many diabetic patients as NSQIP. Rates of deep infection and stroke 30 days after THA differed by more than 2-fold between all databases. Among databases commonly used in orthopedic research, there is considerable variation in complication rates following THA depending upon the database used for analysis. It is important to consider these differences when critically evaluating database research. Additionally, with the advent of bundled payments, these differences must be considered in risk adjustment models. Copyright © 2017 Elsevier Inc. All rights reserved.

  5. Distributed databases for materials study of thermo-kinetic properties

    NASA Astrophysics Data System (ADS)

    Toher, Cormac

    2015-03-01

    High-throughput computational materials science provides researchers with the opportunity to rapidly generate large databases of materials properties. To rapidly add thermal properties to the AFLOWLIB consortium and Materials Project repositories, we have implemented an automated quasi-harmonic Debye model, the Automatic GIBBS Library (AGL). This enables us to screen thousands of materials for thermal conductivity, bulk modulus, thermal expansion and related properties. The search and sort functions of the online database can then be used to identify suitable materials for more in-depth study using more precise computational or experimental techniques. AFLOW-AGL source code is public domain and will soon be released within the GNU-GPL license.

  6. PREPping Students for Authentic Science

    ERIC Educational Resources Information Center

    Dolan, Erin L.; Lally, David J.; Brooks, Eric; Tax, Frans E.

    2008-01-01

    In this article, the authors describe a large-scale research collaboration, the Partnership for Research and Education in Plants (PREP), which has capitalized on publicly available databases that contain massive amounts of biological information; stock centers that house and distribute inexpensive organisms with different genotypes; and the…

  7. Quiet, Computer at Work.

    ERIC Educational Resources Information Center

    Black, Claudia

    Libraries are becoming information access points, not just book repositories. With greater distribution of printed materials, increased use of optical disks and other compact storage techniques, the emergence of publication on demand, and the proliferation of electronic databases, libraries without large collections will be able to provide prompt…

  8. Filling the gap in functional trait databases: use of ecological hypotheses to replace missing data.

    PubMed

    Taugourdeau, Simon; Villerd, Jean; Plantureux, Sylvain; Huguenin-Elie, Olivier; Amiaud, Bernard

    2014-04-01

    Functional trait databases are powerful tools in ecology, though most of them contain large amounts of missing values. The goal of this study was to test the effect of imputation methods on the evaluation of trait values at species level and on the subsequent calculation of functional diversity indices at community level using functional trait databases. Two simple imputation methods (average and median), two methods based on ecological hypotheses, and one multiple imputation method were tested using a large plant trait database, together with the influence of the percentage of missing data and differences between functional traits. At community level, the complete-case approach and three functional diversity indices calculated from grassland plant communities were included. At the species level, one of the methods based on ecological hypothesis was for all traits more accurate than imputation with average or median values, but the multiple imputation method was superior for most of the traits. The method based on functional proximity between species was the best method for traits with an unbalanced distribution, while the method based on the existence of relationships between traits was the best for traits with a balanced distribution. The ranking of the grassland communities for their functional diversity indices was not robust with the complete-case approach, even for low percentages of missing data. With the imputation methods based on ecological hypotheses, functional diversity indices could be computed with a maximum of 30% of missing data, without affecting the ranking between grassland communities. The multiple imputation method performed well, but not better than single imputation based on ecological hypothesis and adapted to the distribution of the trait values for the functional identity and range of the communities. Ecological studies using functional trait databases have to deal with missing data using imputation methods corresponding to their specific needs and making the most out of the information available in the databases. Within this framework, this study indicates the possibilities and limits of single imputation methods based on ecological hypothesis and concludes that they could be useful when studying the ranking of communities for their functional diversity indices.
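
    The comparison of imputation strategies described above can be sketched with scikit-learn, using mean/median imputation and a k-nearest-neighbour imputer as a rough stand-in for "functional proximity between species". The synthetic data, the choice of KNNImputer, and the error metric are assumptions for illustration; this is not the authors' implementation.

    ```python
    # Compare simple and proximity-based imputation on a trait matrix with missing values.
    import numpy as np
    from sklearn.impute import SimpleImputer, KNNImputer

    rng = np.random.default_rng(0)
    traits = rng.normal(size=(50, 6))                 # 50 species x 6 traits (synthetic)
    complete = traits.copy()
    mask = rng.random(traits.shape) < 0.2             # knock out 20% of the values
    traits[mask] = np.nan

    for name, imputer in [("mean", SimpleImputer(strategy="mean")),
                          ("median", SimpleImputer(strategy="median")),
                          ("knn", KNNImputer(n_neighbors=5))]:
        filled = imputer.fit_transform(traits)
        rmse = np.sqrt(np.mean((filled[mask] - complete[mask]) ** 2))
        print(f"{name}: RMSE on imputed cells = {rmse:.3f}")
    ```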

  9. Filling the gap in functional trait databases: use of ecological hypotheses to replace missing data

    PubMed Central

    Taugourdeau, Simon; Villerd, Jean; Plantureux, Sylvain; Huguenin-Elie, Olivier; Amiaud, Bernard

    2014-01-01

    Functional trait databases are powerful tools in ecology, though most of them contain large amounts of missing values. The goal of this study was to test the effect of imputation methods on the evaluation of trait values at species level and on the subsequent calculation of functional diversity indices at community level using functional trait databases. Two simple imputation methods (average and median), two methods based on ecological hypotheses, and one multiple imputation method were tested using a large plant trait database, together with the influence of the percentage of missing data and differences between functional traits. At community level, the complete-case approach and three functional diversity indices calculated from grassland plant communities were included. At the species level, one of the methods based on ecological hypothesis was for all traits more accurate than imputation with average or median values, but the multiple imputation method was superior for most of the traits. The method based on functional proximity between species was the best method for traits with an unbalanced distribution, while the method based on the existence of relationships between traits was the best for traits with a balanced distribution. The ranking of the grassland communities for their functional diversity indices was not robust with the complete-case approach, even for low percentages of missing data. With the imputation methods based on ecological hypotheses, functional diversity indices could be computed with a maximum of 30% of missing data, without affecting the ranking between grassland communities. The multiple imputation method performed well, but not better than single imputation based on ecological hypothesis and adapted to the distribution of the trait values for the functional identity and range of the communities. Ecological studies using functional trait databases have to deal with missing data using imputation methods corresponding to their specific needs and making the most out of the information available in the databases. Within this framework, this study indicates the possibilities and limits of single imputation methods based on ecological hypothesis and concludes that they could be useful when studying the ranking of communities for their functional diversity indices. PMID:24772273

  10. Distributed data mining on grids: services, tools, and applications.

    PubMed

    Cannataro, Mario; Congiusta, Antonio; Pugliese, Andrea; Talia, Domenico; Trunfio, Paolo

    2004-12-01

    Data mining algorithms are widely used today for the analysis of large corporate and scientific datasets stored in databases and data archives. Industry, science, and commerce fields often need to analyze very large datasets maintained over geographically distributed sites by using the computational power of distributed and parallel systems. The grid can play a significant role in providing an effective computational support for distributed knowledge discovery applications. For the development of data mining applications on grids we designed a system called Knowledge Grid. This paper describes the Knowledge Grid framework and presents the toolset provided by the Knowledge Grid for implementing distributed knowledge discovery. The paper discusses how to design and implement data mining applications by using the Knowledge Grid tools starting from searching grid resources, composing software and data components, and executing the resulting data mining process on a grid. Some performance results are also discussed.

  11. Spatial distribution of citizen science casuistic observations for different taxonomic groups.

    PubMed

    Tiago, Patrícia; Ceia-Hasse, Ana; Marques, Tiago A; Capinha, César; Pereira, Henrique M

    2017-10-16

    Opportunistic citizen science databases are becoming an important way of gathering information on species distributions. These data are temporally and spatially dispersed and may have limitations regarding biases in the distribution of the observations in space and/or time. In this work, we test the influence of landscape variables on the distribution of citizen science observations for eight taxonomic groups. We use data collected through a Portuguese citizen science database (biodiversity4all.org). We use a zero-inflated negative binomial regression to model the distribution of observations as a function of a set of variables representing the landscape features plausibly influencing the spatial distribution of the records. Results suggest that the density of paths is the most important variable, having a statistically significant positive relationship with the number of observations for seven of the eight taxa considered. Wetland coverage was also identified as having a significant positive relationship for birds, amphibians and reptiles, and mammals. Our results highlight that the distribution of species observations in citizen science projects is spatially biased. Higher frequency of observations is driven largely by accessibility and by the presence of water bodies. We conclude that efforts are required to increase the spatial evenness of sampling effort from volunteers.
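
    A minimal sketch of fitting a zero-inflated negative binomial model to observation counts is shown below, on synthetic data with covariates loosely inspired by the paper (path density, wetland coverage). The class name ZeroInflatedNegativeBinomialP is available in recent statsmodels versions; treat the exact class, arguments, and data generation as assumptions rather than the authors' analysis.

    ```python
    # Zero-inflated negative binomial regression of observation counts on landscape covariates.
    import numpy as np
    import statsmodels.api as sm
    from statsmodels.discrete.count_model import ZeroInflatedNegativeBinomialP

    rng = np.random.default_rng(42)
    n = 500
    path_density = rng.gamma(2.0, 1.0, n)             # covariate: density of paths
    wetland = rng.random(n)                           # covariate: wetland coverage
    X = sm.add_constant(np.column_stack([path_density, wetland]))

    # Synthetic counts: a count process driven by the covariates plus structural zeros
    lam = np.exp(-1.0 + 0.6 * path_density + 0.4 * wetland)
    counts = rng.poisson(lam) * (rng.random(n) > 0.4)

    model = ZeroInflatedNegativeBinomialP(counts, X, exog_infl=np.ones((n, 1)))
    result = model.fit(method="bfgs", maxiter=500, disp=False)
    print(result.summary())
    ```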

  12. Estimating Traveler Populations at Airport and Cruise Terminals for Population Distribution and Dynamics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jochem, Warren C; Sims, Kelly M; Bright, Eddie A

    In recent years, uses of high-resolution population distribution databases are increasing steadily for environmental, socioeconomic, public health, and disaster-related research and operations. With the development of daytime population distribution, temporal resolution of such databases has been improved. However, the lack of incorporation of transitional population, namely business and leisure travelers, leaves a significant population unaccounted for within the critical infrastructure networks, such as at transportation hubs. This paper presents two general methodologies for estimating passenger populations in airport and cruise port terminals at a high temporal resolution which can be incorporated into existing population distribution models. The methodologies are geographically scalable and are based on, and demonstrate how, two different transportation hubs with disparate temporal population dynamics can be modeled utilizing publicly available databases including novel data sources of flight activity from the Internet which are updated in near-real time. The airport population estimation model shows great potential for rapid implementation for a large collection of airports on a national scale, and the results suggest reasonable accuracy in the estimated passenger traffic. By incorporating population dynamics at high temporal resolutions into population distribution models, we hope to improve the estimates of populations exposed to or at risk to disasters, thereby improving emergency planning and response, and leading to more informed policy decisions.

  13. Design of a decentralized reusable research database architecture to support data acquisition in large research projects.

    PubMed

    Iavindrasana, Jimison; Depeursinge, Adrien; Ruch, Patrick; Spahni, Stéphane; Geissbuhler, Antoine; Müller, Henning

    2007-01-01

    The diagnostic and therapeutic processes, as well as the development of new treatments, are hindered by the fragmentation of the information which underlies them. In a multi-institutional research study database, the clinical information system (CIS) contains the primary data input. A substantial part of the budget of large-scale clinical studies is often spent on data creation and maintenance. The objective of this work is to design a decentralized, scalable, reusable database architecture with lower maintenance costs for managing and integrating the distributed heterogeneous data required as a basis for a large-scale research project. Technical and legal aspects are taken into account based on various use case scenarios. The architecture contains 4 layers: data storage and access are decentralized at their production source, a connector acts as a proxy between the CIS and the external world, an information mediator serves as a data access point, and the client side completes the stack. The proposed design will be implemented inside six clinical centers participating in the @neurIST project as part of a larger system on data integration and reuse for aneurysm treatment.

  14. Creation of clinical research databases in the 21st century: a practical algorithm for HIPAA Compliance.

    PubMed

    Schell, Scott R

    2006-02-01

    Enforcement of the Health Insurance Portability and Accountability Act (HIPAA) began in April 2003. Designed as a law mandating health insurance availability when coverage was lost, HIPAA imposed sweeping and broad-reaching protections of patient privacy. These changes dramatically altered clinical research by placing sizeable regulatory burdens upon investigators with threat of severe and costly federal and civil penalties. This report describes development of an algorithmic approach to clinical research database design based upon a central key-shared data (CK-SD) model allowing researchers to easily analyze, distribute, and publish clinical research without disclosure of HIPAA Protected Health Information (PHI). Three clinical database formats (small clinical trial, operating room performance, and genetic microchip array datasets) were modeled using standard structured query language (SQL)-compliant databases. The CK database was created to contain PHI data, whereas a shareable SD database was generated in real-time containing relevant clinical outcome information while protecting PHI items. Small (< 100 records), medium (< 50,000 records), and large (> 10^8 records) model databases were created, and the resultant data models were evaluated in consultation with a HIPAA compliance officer. The SD database models complied fully with HIPAA regulations, and resulting "shared" data could be distributed freely. Unique patient identifiers were not required for treatment or outcome analysis. Age data were resolved to single-integer years, grouping patients aged > 89 years. Admission, discharge, treatment, and follow-up dates were replaced with enrollment year, and follow-up/outcome intervals were calculated, eliminating the original dates. Two additional data fields identified as PHI (treating physician and facility) were replaced with integer values, and the original data corresponding to these values were stored in the CK database. Use of the algorithm at the time of database design did not increase cost or design effort. The CK-SD model for clinical database design provides an algorithm for investigators to create, maintain, and share clinical research data compliant with HIPAA regulations. This model is applicable to new projects and large institutional datasets, and should decrease regulatory efforts required for conduct of clinical research. Application of the design algorithm early in the clinical research enterprise does not increase cost or the effort of data collection.
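
    The CK-SD split described above can be sketched as a simple record transformation: PHI stays in the central-key (CK) record, while the shareable-data (SD) record keeps only de-identified fields (whole-year age grouped above 89, enrollment year, a computed follow-up interval, and integer surrogates for physician and facility). The field names and record layout are illustrative assumptions, not the paper's schema.

    ```python
    # Split one clinical record into a CK (identifying) part and an SD (shareable) part.
    from datetime import date

    physician_keys, facility_keys = {}, {}

    def surrogate(value, table):
        # Replace an identifying value with a stable integer key.
        return table.setdefault(value, len(table) + 1)

    def split_record(record):
        age = min(record["age_years"], 90)                   # group patients aged > 89
        sd = {
            "age_years": age if age < 90 else "90+",
            "enrollment_year": record["admission_date"].year,
            "followup_days": (record["followup_date"] - record["admission_date"]).days,
            "physician_id": surrogate(record["physician"], physician_keys),
            "facility_id": surrogate(record["facility"], facility_keys),
            "outcome": record["outcome"],
        }
        ck = {"patient_name": record["patient_name"],
              "physician": record["physician"],
              "facility": record["facility"]}
        return ck, sd

    ck, sd = split_record({
        "patient_name": "Doe, Jane", "age_years": 93,
        "admission_date": date(2005, 3, 14), "followup_date": date(2005, 9, 2),
        "physician": "Dr. Smith", "facility": "General Hospital",
        "outcome": "discharged",
    })
    print(sd)
    ```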

  15. HBLAST: Parallelised sequence similarity--A Hadoop MapReducable basic local alignment search tool.

    PubMed

    O'Driscoll, Aisling; Belogrudov, Vladislav; Carroll, John; Kropp, Kai; Walsh, Paul; Ghazal, Peter; Sleator, Roy D

    2015-04-01

    The recent exponential growth of genomic databases has resulted in the common task of sequence alignment becoming one of the major bottlenecks in the field of computational biology. It is typical for these large datasets and complex computations to require cost prohibitive High Performance Computing (HPC) to function. As such, parallelised solutions have been proposed but many exhibit scalability limitations and are incapable of effectively processing "Big Data" - the name attributed to datasets that are extremely large, complex and require rapid processing. The Hadoop framework, comprised of distributed storage and a parallelised programming framework known as MapReduce, is specifically designed to work with such datasets but it is not trivial to efficiently redesign and implement bioinformatics algorithms according to this paradigm. The parallelisation strategy of "divide and conquer" for alignment algorithms can be applied to both data sets and input query sequences. However, scalability is still an issue due to memory constraints or large databases, with very large database segmentation leading to additional performance decline. Herein, we present Hadoop Blast (HBlast), a parallelised BLAST algorithm that proposes a flexible method to partition both databases and input query sequences using "virtual partitioning". HBlast presents improved scalability over existing solutions and well balanced computational work load while keeping database segmentation and recompilation to a minimum. Enhanced BLAST search performance on cheap memory constrained hardware has significant implications for in field clinical diagnostic testing; enabling faster and more accurate identification of pathogenic DNA in human blood or tissue samples. Copyright © 2015 Elsevier Inc. All rights reserved.
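
    The "divide and conquer" layout behind a MapReduce BLAST can be sketched by splitting both the query set and the database into index ranges (only virtually, without copying data), turning every (query chunk, database segment) pair into one map task, and merging hits per query in the reduce step. This illustrates the partitioning idea only; the function names and task layout are assumptions, not HBlast's actual implementation.

    ```python
    # Plan map tasks over virtual partitions of queries and database sequences.
    from itertools import product
    from collections import defaultdict

    def virtual_partitions(n_items, n_parts):
        """Split indices 0..n_items-1 into contiguous ranges without copying data."""
        step = -(-n_items // n_parts)                  # ceiling division
        return [(i, min(i + step, n_items)) for i in range(0, n_items, step)]

    def plan_map_tasks(n_queries, n_db_sequences, query_parts=4, db_parts=8):
        return list(product(virtual_partitions(n_queries, query_parts),
                            virtual_partitions(n_db_sequences, db_parts)))

    def reduce_hits(mapper_outputs):
        """mapper_outputs: iterable of (query_id, hit) pairs from all map tasks."""
        merged = defaultdict(list)
        for query_id, hit in mapper_outputs:
            merged[query_id].append(hit)
        return {q: sorted(hits, key=lambda h: h["evalue"]) for q, hits in merged.items()}

    print(len(plan_map_tasks(n_queries=1000, n_db_sequences=10_000_000)))  # 32 map tasks
    ```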

  16. Distributed database kriging for adaptive sampling (D²KAS)

    DOE PAGES

    Roehm, Dominic; Pavel, Robert S.; Barros, Kipton; ...

    2015-03-18

    We present an adaptive sampling method supplemented by a distributed database and a prediction method for multiscale simulations using the Heterogeneous Multiscale Method. A finite-volume scheme integrates the macro-scale conservation laws for elastodynamics, which are closed by momentum and energy fluxes evaluated at the micro-scale. In the original approach, molecular dynamics (MD) simulations are launched for every macro-scale volume element. Our adaptive sampling scheme replaces a large fraction of costly micro-scale MD simulations with fast table lookup and prediction. The cloud database Redis provides the plain table lookup, and with locality aware hashing we gather input data for our prediction scheme. For the latter we use kriging, which estimates an unknown value and its uncertainty (error) at a specific location in parameter space by using weighted averages of the neighboring points. We find that our adaptive scheme significantly improves simulation performance by a factor of 2.5 to 25, while retaining high accuracy for various choices of the algorithm parameters.
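
    The lookup-then-predict idea can be sketched as follows: before launching a costly micro-scale simulation, check a Redis table keyed by the quantized macro-scale state; on a miss, estimate from stored neighbours (a simple inverse-distance weighted average stands in here for the kriging predictor) and fall back to the full simulation only when no neighbour is close enough. The key format, tolerances, the user-supplied run_simulation callable, and a Redis server on localhost are all assumptions; this is not the D²KAS code.

    ```python
    # Cache lookup with neighbour-based prediction and a simulation fallback.
    import json
    import numpy as np
    import redis

    r = redis.Redis(host="localhost", port=6379, db=0)

    def key_for(state, decimals=3):
        return "flux:" + ",".join(f"{x:.{decimals}f}" for x in state)

    def lookup_or_predict(state, neighbours, run_simulation, max_dist=0.05):
        """neighbours: list of (state_vector, flux_value) pairs already in the table."""
        cached = r.get(key_for(state))
        if cached is not None:                               # exact table hit
            return json.loads(cached)
        state = np.asarray(state, dtype=float)
        if neighbours:
            pts = np.array([n[0] for n in neighbours], dtype=float)
            d = np.linalg.norm(pts - state, axis=1)
            if d.min() < max_dist:                           # predict from nearby entries
                w = 1.0 / (d + 1e-12)
                values = np.array([n[1] for n in neighbours], dtype=float)
                return float(np.dot(w, values) / w.sum())
        value = run_simulation(state)                        # expensive MD fallback
        r.set(key_for(state), json.dumps(value))             # store for future lookups
        return value
    ```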

  17. Information-seeking behavior and the use of online resources: a snapshot of current health sciences faculty.

    PubMed

    De Groote, Sandra L; Shultz, Mary; Blecic, Deborah D

    2014-07-01

    The research assesses the information-seeking behaviors of health sciences faculty, including their use of online databases, journals, and social media. A survey was designed and distributed via email to 754 health sciences faculty at a large urban research university with 6 health sciences colleges. Twenty-six percent (198) of faculty responded. MEDLINE was the primary database utilized, with 78.5% of respondents indicating they use the database at least once a week. Compared to MEDLINE, Google was utilized more often on a daily basis. Other databases showed much lower usage. Low use of online databases other than MEDLINE, link-out tools to online journals, and online social media and collaboration tools demonstrates a need for meaningful promotion of online resources and informatics literacy instruction for faculty. Library resources are plentiful and perhaps somewhat overwhelming. Librarians need to help faculty discover and utilize the resources and tools that libraries have to offer.

  18. A Web-based Distributed Voluntary Computing Platform for Large Scale Hydrological Computations

    NASA Astrophysics Data System (ADS)

    Demir, I.; Agliamzanov, R.

    2014-12-01

    Distributed volunteer computing can enable researchers and scientists to form large parallel computing environments that utilize the computing power of the millions of computers on the Internet, and use them towards running large scale environmental simulations and models to serve the common good of local communities and the world. Recent developments in web technologies and standards allow client-side scripting languages to run at speeds close to native applications, and to utilize the power of Graphics Processing Units (GPU). Using a client-side scripting language like JavaScript, we have developed an open distributed computing framework that makes it easy for researchers to write their own hydrologic models, and run them on volunteer computers. Website owners can easily enable their sites so that visitors can volunteer their computer resources to help run advanced hydrological models and simulations. Using a web-based system allows users to start volunteering their computational resources within seconds without installing any software. The framework distributes the model simulation to thousands of nodes as work units of small spatial and computational size. A relational database system is utilized for managing data connections and for queue management of the distributed computing nodes. In this paper, we present a web-based distributed volunteer computing platform to enable large scale hydrological simulations and model runs in an open and integrated environment.
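
    The queue-management role played by the relational database in such a platform can be illustrated with a small task table. The sketch below uses Python and sqlite3 with invented column names and trivially small work units, purely to show the claim/complete cycle on the server side; the platform described above is itself JavaScript- and browser-based.

      import sqlite3

      con = sqlite3.connect(":memory:")
      con.execute("""CREATE TABLE tasks (
          id INTEGER PRIMARY KEY,
          params TEXT,                 -- small spatial/computational work unit
          status TEXT DEFAULT 'pending',
          volunteer TEXT)""")
      con.executemany("INSERT INTO tasks (params) VALUES (?)",
                      [(f"subcatchment-{i}",) for i in range(8)])

      def claim_task(volunteer_id):
          """Hand the next pending work unit to a browser that volunteered.
          (A real server would claim atomically under concurrent requests.)"""
          with con:                                    # one transaction per claim
              row = con.execute("SELECT id, params FROM tasks "
                                "WHERE status = 'pending' LIMIT 1").fetchone()
              if row is None:
                  return None
              con.execute("UPDATE tasks SET status = 'running', volunteer = ? WHERE id = ?",
                          (volunteer_id, row[0]))
              return row

      def complete_task(task_id):
          """Mark a finished simulation chunk returned by the volunteer node."""
          with con:
              con.execute("UPDATE tasks SET status = 'done' WHERE id = ?", (task_id,))

      task = claim_task("browser-42")
      if task:
          complete_task(task[0])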

  19. Terrestrial Sediments of the Earth: Development of a Global Unconsolidated Sediments Map Database (GUM)

    NASA Astrophysics Data System (ADS)

    Börker, J.; Hartmann, J.; Amann, T.; Romero-Mujalli, G.

    2018-04-01

    Mapped unconsolidated sediments cover half of the global land surface. They are of considerable importance for many Earth surface processes like weathering, hydrological fluxes or biogeochemical cycles. Ignoring their characteristics or spatial extent may lead to misinterpretations in Earth System studies. Therefore, a new Global Unconsolidated Sediments Map database (GUM) was compiled, using regional maps specifically representing unconsolidated and quaternary sediments. The new GUM database provides insights into the regional distribution of unconsolidated sediments and their properties. The GUM comprises 911,551 polygons and describes not only sediment types and subtypes, but also parameters like grain size, mineralogy, age and thickness where available. Previous global lithological maps or databases lacked detail for reported unconsolidated sediment areas or missed large areas, and reported a global coverage of 25 to 30%, considering the ice-free land area. Here, alluvial sediments cover about 23% of the mapped total ice-free area, followed by aeolian sediments (˜21%), glacial sediments (˜20%), and colluvial sediments (˜16%). A specific focus during the creation of the database was on the distribution of loess deposits, since loess is highly reactive and relevant to understand geochemical cycles related to dust deposition and weathering processes. An additional layer compiling pyroclastic sediment is added, which merges consolidated and unconsolidated pyroclastic sediments. The compilation shows latitudinal abundances of sediment types related to climate of the past. The GUM database is available at the PANGAEA database (https://doi.org/10.1594/PANGAEA.884822).

  20. An SQL query generator for CLIPS

    NASA Technical Reports Server (NTRS)

    Snyder, James; Chirica, Laurian

    1990-01-01

    As expert systems become more widely used, their access to large amounts of external information becomes increasingly important. This information exists in several forms such as statistical, tabular data, knowledge gained by experts and large databases of information maintained by companies. Because many expert systems, including CLIPS, do not provide access to this external information, much of the usefulness of expert systems is left untapped. The scope of this paper is to describe a database extension for the CLIPS expert system shell. The current industry standard database language is SQL. Due to SQL standardization, large amounts of information stored on various computers, potentially at different locations, will be more easily accessible. Expert systems should be able to directly access these existing databases rather than requiring information to be re-entered into the expert system environment. The ORACLE relational database management system (RDBMS) was used to provide a database connection within the CLIPS environment. To facilitate relational database access, a query generation system was developed as a CLIPS user function. The queries are entered in a CLIPS-like syntax and are passed to the query generator, which constructs an SQL query and submits it to the ORACLE RDBMS for execution. The query results are asserted as CLIPS facts. The query generator was developed primarily for use within the ICADS project (Intelligent Computer Aided Design System) currently being developed by the CAD Research Unit at the California Polytechnic State University (Cal Poly). In ICADS, there are several parallel or distributed expert systems accessing a common knowledge base of facts. Each expert system has a narrow domain of interest and therefore needs only certain portions of the information. The query generator provides a common method of accessing this information and allows the expert system to specify what data is needed without specifying how to retrieve it.
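
    A query generator of this kind essentially walks a declarative query description and emits the corresponding SQL text with bind parameters. The Python sketch below is a much-reduced analogue using an invented dictionary input rather than the actual CLIPS syntax, and it stops at producing the SQL string instead of submitting it to ORACLE.

      def build_select(spec):
          """Turn a small declarative query description into an SQL string plus
          bind parameters for execution through any DB-API driver."""
          cols = ", ".join(spec.get("select", ["*"]))
          sql = f"SELECT {cols} FROM {spec['from']}"
          params = []
          if spec.get("where"):
              clauses = []
              for column, op, value in spec["where"]:
                  if op.upper() not in {"=", "<", ">", "<=", ">=", "<>", "LIKE"}:
                      raise ValueError(f"unsupported operator: {op}")
                  clauses.append(f"{column} {op} ?")   # placeholder; value bound separately
                  params.append(value)
              sql += " WHERE " + " AND ".join(clauses)
          return sql, params

      # Hypothetical CLIPS-like intent: (query (select name area) (from rooms)
      #                                        (where (= floor 2) (> area 20)))
      sql, params = build_select({"select": ["name", "area"], "from": "rooms",
                                  "where": [("floor", "=", 2), ("area", ">", 20)]})
      # sql == "SELECT name, area FROM rooms WHERE floor = ? AND area > ?"; params == [2, 20]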

  1. Vanderbilt University Institute of Imaging Science Center for Computational Imaging XNAT: A multimodal data archive and processing environment.

    PubMed

    Harrigan, Robert L; Yvernault, Benjamin C; Boyd, Brian D; Damon, Stephen M; Gibney, Kyla David; Conrad, Benjamin N; Phillips, Nicholas S; Rogers, Baxter P; Gao, Yurui; Landman, Bennett A

    2016-01-01

    The Vanderbilt University Institute for Imaging Science (VUIIS) Center for Computational Imaging (CCI) has developed a database built on XNAT housing over a quarter of a million scans. The database provides a framework for (1) rapid prototyping, (2) large scale batch processing of images and (3) scalable project management. The system uses the web-based interfaces of XNAT and REDCap to allow for graphical interaction. A python middleware layer, the Distributed Automation for XNAT (DAX) package, distributes computation across the Vanderbilt Advanced Computing Center for Research and Education high performance computing center. All software is made available in open source for use in combining portable batch scripting (PBS) grids and XNAT servers. Copyright © 2015 Elsevier Inc. All rights reserved.

  2. Database Objects vs Files: Evaluation of alternative strategies for managing large remote sensing data

    NASA Astrophysics Data System (ADS)

    Baru, Chaitan; Nandigam, Viswanath; Krishnan, Sriram

    2010-05-01

    Increasingly, the geoscience user community expects modern IT capabilities to be available in service of their research and education activities, including the ability to easily access and process large remote sensing datasets via online portals such as GEON (www.geongrid.org) and OpenTopography (opentopography.org). However, serving such datasets via online data portals presents a number of challenges. In this talk, we will evaluate the pros and cons of alternative storage strategies for management and processing of such datasets using binary large object implementations (BLOBs) in database systems versus implementation in Hadoop files using the Hadoop Distributed File System (HDFS). The storage and I/O requirements for providing online access to large datasets dictate the need for declustering data across multiple disks, for capacity as well as bandwidth and response time performance. This requires partitioning larger files into a set of smaller files, and is accompanied by the concomitant requirement for managing large numbers of files. Storing these sub-files as blobs in a shared-nothing database implemented across a cluster provides the advantage that all the distributed storage management is done by the DBMS. Furthermore, subsetting and processing routines can be implemented as user-defined functions (UDFs) on these blobs and would run in parallel across the set of nodes in the cluster. On the other hand, there are both storage overheads and constraints, and software licensing dependencies created by such an implementation. Another approach is to store the files in an external filesystem with pointers to them from within database tables. The filesystem may be a regular UNIX filesystem, a parallel filesystem, or HDFS. In the HDFS case, HDFS would provide the file management capability, while the subsetting and processing routines would be implemented as Hadoop programs using the MapReduce model. Hadoop and its related software libraries are freely available. Another consideration is the strategy used for partitioning large data collections, and large datasets within collections, using round-robin vs hash partitioning vs range partitioning methods. Each has different characteristics in terms of spatial locality of data and resultant degree of declustering of the computations on the data. Furthermore, we have observed that, in practice, there can be large variations in the frequency of access to different parts of a large data collection and/or dataset, thereby creating "hotspots" in the data. We will evaluate the ability of different approaches to deal effectively with such hotspots, and alternative strategies for mitigating them.
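
    The three partitioning strategies mentioned above differ only in how a record or tile is mapped to a node. The short Python sketch below (with an invented tile-naming scheme) shows the trade-off: range partitioning preserves spatial locality but concentrates popular ranges on one node, while hash and round-robin partitioning spread records, and therefore hotspots, more evenly.

      import hashlib
      from bisect import bisect_right

      N_NODES = 4

      def round_robin(record_index):
          """Spread records evenly regardless of their content."""
          return record_index % N_NODES

      def hash_partition(key):
          """Decluster by a stable hash of the key; neighbouring tiles usually
          land on different nodes, which helps dilute hotspots."""
          digest = hashlib.md5(key.encode()).hexdigest()
          return int(digest, 16) % N_NODES

      def range_partition(key, boundaries=("b", "d", "f")):
          """Keep lexicographically close keys together (good spatial locality,
          but a frequently accessed range burdens a single node)."""
          return bisect_right(boundaries, key)

      tile = "c_1024_2048"          # hypothetical tile name within a large dataset
      placements = (round_robin(17), hash_partition(tile), range_partition(tile))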

  3. USBombus, a database of contemporary survey data for North American Bumble Bees (Hymenoptera, Apidae, Bombus) distributed in the United States.

    PubMed

    Koch, Jonathan B; Lozier, Jeffrey; Strange, James P; Ikerd, Harold; Griswold, Terry; Cordes, Nils; Solter, Leellen; Stewart, Isaac; Cameron, Sydney A

    2015-01-01

    Bumble bees (Hymenoptera: Apidae, Bombus) are pollinators of wild and economically important flowering plants. However, at least four bumble bee species have declined significantly in population abundance and geographic range relative to historic estimates, and one species is possibly extinct. While a wealth of historic data is now available for many of the North American species found to be in decline in online databases, systematic survey data of stable species is still not publicly available. The availability of contemporary survey data is critically important for the future monitoring of wild bumble bee populations. Without such data, the ability to ascertain the conservation status of bumble bees in the United States will remain challenging. This paper describes USBombus, a large database that represents the outcomes of one of the largest standardized surveys of bumble bee pollinators (Hymenoptera, Apidae, Bombus) globally. The motivation to collect live bumble bees across the United States was to examine the decline and conservation status of Bombus affinis, B. occidentalis, B. pensylvanicus, and B. terricola. Prior to our national survey of bumble bees in the United States from 2007 to 2010, there had been only regional accounts of bumble bee abundance and richness. In addition to surveying declining bumble bees, we also collected and documented a diversity of co-occurring bumble bees. However, we have not yet completely reported their distribution and diversity on a public online platform. Now, for the first time, we report the geographic distribution of bumble bees reported to be in decline (Cameron et al. 2011), as well as bumble bees that appeared to be stable on a large geographic scale in the United States (not in decline). In this database we report a total of 17,930 adult occurrence records across 397 locations and 39 species of Bombus detected in our national survey. We summarize their abundance and distribution across the United States and their association with different ecoregions. The geospatial coverage of the dataset extends across 41 of the 50 US states, and from 0 to 3500 m a.s.l. Authors and respective field crews spent a total of 512 hours surveying bumble bees from 2007 to 2010. The dataset was developed using SQL Server 2008 R2. For each specimen, the following information is generally provided: species, name, sex, caste, temporal and geospatial details, Cartesian coordinates, data collector(s), and when available, host plants. This database has already proven useful for a variety of studies on bumble bee ecology and conservation. However, it is not publicly available. Considering the value of pollinators in agriculture and wild ecosystems, this large database of bumble bees will likely prove useful for investigations of the effects of anthropogenic activities on pollinator community composition and conservation status.

  4. Building a generalized distributed system model

    NASA Technical Reports Server (NTRS)

    Mukkamala, Ravi

    1991-01-01

    A number of topics related to building a generalized distributed system model are discussed. The effects of distributed database modeling on evaluation of transaction rollbacks, the measurement of effects of distributed database models on transaction availability measures, and a performance analysis of static locking in replicated distributed database systems are covered.

  5. Data Intensive Systems (DIS) Benchmark Performance Summary

    DTIC Science & Technology

    2003-08-01

    models assumed by today’s conventional architectures. Such applications include model-based Automatic Target Recognition (ATR), synthetic aperture...radar (SAR) codes, large scale dynamic databases/battlefield integration, dynamic sensor-based processing, high-speed cryptanalysis, high speed...distributed interactive and data intensive simulations, data-oriented problems characterized by pointer-based and other highly irregular data structures

  6. VizieR Online Data Catalog: 2nd and 3rd parameters of HB of globular clusters (Gratton+, 2010)

    NASA Astrophysics Data System (ADS)

    Gratton, R. G.; Carretta, E.; Bragaglia, A.; Lucatello, S.; D'Orazi, V.

    2010-05-01

    The second parameter (the first being metallicity) defining the distribution of stars on the horizontal branch (HB) of globular clusters (GCs) has long been one of the major open issues in our understanding of the evolution of normal stars. Large photometric and spectroscopic databases are now available: they include large and homogeneous sets of colour-magnitude diagrams, cluster ages, and homogeneous data about chemical compositions from our FLAMES survey. We use these databases to re-examine this issue. We use the photometric data to derive median and extreme (i.e., the values including 90% of the distribution) colours and magnitudes of stars along the HB for about a hundred GCs. We transform these into median and extreme masses of stars on the HB, using the models developed by the Pisa group, and taking into account evolutionary effects. We compare these masses with those expected at the tip of the red giant branch (RGB) to derive the total mass lost by the stars. (11 data files).

  7. Using an object-based grid system to evaluate a newly developed EP approach to formulate SVMs as applied to the classification of organophosphate nerve agents

    NASA Astrophysics Data System (ADS)

    Land, Walker H., Jr.; Lewis, Michael; Sadik, Omowunmi; Wong, Lut; Wanekaya, Adam; Gonzalez, Richard J.; Balan, Arun

    2004-04-01

    This paper extends the classification approaches described in reference [1] in the following ways: (1) developing and evaluating a new method for evolving organophosphate nerve agent Support Vector Machine (SVM) classifiers using Evolutionary Programming, (2) conducting research experiments using a larger database of organophosphate nerve agents, and (3) upgrading the architecture to an object-based grid system for evaluating the classification of EP-derived SVMs. Due to the increased threats of chemical and biological weapons of mass destruction (WMD) by international terrorist organizations, a significant effort is underway to develop tools that can be used to detect and effectively combat biochemical warfare. This paper reports the integration of multi-array sensors with Support Vector Machines (SVMs) for the detection of organophosphate nerve agents using a grid computing system called Legion. Grid computing is the use of large collections of heterogeneous, distributed resources (including machines, databases, devices, and users) to support large-scale computations and wide-area data access. Finally, preliminary results using EP-derived support vector machines designed to operate on distributed systems have provided accurate classification results. In addition, distributed training architectures are 50 times faster in training time when compared to standard iterative training methods.
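
    As a toy illustration of the general idea — an evolutionary loop that mutates SVM hyperparameters and keeps the fittest classifiers — the scikit-learn sketch below evolves (C, gamma) on synthetic data. It is only a schematic stand-in for the paper's EP-derived SVMs and says nothing about the sensor data or the Legion grid deployment.

      import random
      from sklearn.datasets import make_classification
      from sklearn.model_selection import cross_val_score
      from sklearn.svm import SVC

      X, y = make_classification(n_samples=300, n_features=10, random_state=0)

      def fitness(individual):
          """Cross-validated accuracy of an SVM built from the individual's parameters."""
          C, gamma = individual
          return cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=3).mean()

      # Evolutionary programming: mutate log-scaled parameters, keep the best half.
      random.seed(0)
      population = [(10 ** random.uniform(-2, 2), 10 ** random.uniform(-4, 0)) for _ in range(8)]
      for generation in range(5):
          offspring = [(C * 10 ** random.gauss(0, 0.3), g * 10 ** random.gauss(0, 0.3))
                       for C, g in population]
          population = sorted(population + offspring, key=fitness, reverse=True)[:8]

      best_C, best_gamma = population[0]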

  8. Developing a Near Real-time System for Earthquake Slip Distribution Inversion

    NASA Astrophysics Data System (ADS)

    Zhao, Li; Hsieh, Ming-Che; Luo, Yan; Ji, Chen

    2016-04-01

    Advances in observational and computational seismology in the past two decades have enabled completely automatic and real-time determinations of the focal mechanisms of earthquake point sources. However, seismic radiations from moderate and large earthquakes often exhibit strong finite-source directivity effect, which is critically important for accurate ground motion estimations and earthquake damage assessments. Therefore, an effective procedure to determine earthquake rupture processes in near real-time is in high demand for hazard mitigation and risk assessment purposes. In this study, we develop an efficient waveform inversion approach for the purpose of solving for finite-fault models in 3D structure. Full slip distribution inversions are carried out based on the identified fault planes in the point-source solutions. To ensure efficiency in calculating 3D synthetics during slip distribution inversions, a database of strain Green tensors (SGT) is established for 3D structural model with realistic surface topography. The SGT database enables rapid calculations of accurate synthetic seismograms for waveform inversion on a regular desktop or even a laptop PC. We demonstrate our source inversion approach using two moderate earthquakes (Mw~6.0) in Taiwan and in mainland China. Our results show that 3D velocity model provides better waveform fitting with more spatially concentrated slip distributions. Our source inversion technique based on the SGT database is effective for semi-automatic, near real-time determinations of finite-source solutions for seismic hazard mitigation purposes.

  9. Adopting a corporate perspective on databases. Improving support for research and decision making.

    PubMed

    Meistrell, M; Schlehuber, C

    1996-03-01

    The Veterans Health Administration (VHA) is at the forefront of designing and managing health care information systems that accommodate the needs of clinicians, researchers, and administrators at all levels. Rather than using one single-site, centralized corporate database, VHA has constructed several large databases with different configurations to meet the needs of users with different perspectives. The largest VHA database is the Decentralized Hospital Computer Program (DHCP), a multisite, distributed data system that uses decoupled hospital databases. The centralization of DHCP policy has promoted data coherence, whereas the decentralization of DHCP management has permitted system development to be done with maximum relevance to the users' local practices. A more recently developed VHA data system, the Event Driven Reporting system (EDR), uses multiple, highly coupled databases to provide workload data at facility, regional, and national levels. The EDR automatically posts a subset of DHCP data to local and national VHA management. The development of the EDR illustrates how adoption of a corporate perspective can offer significant database improvements at reasonable cost and with modest impact on the legacy system.

  10. Combining knowledge discovery from databases (KDD) and case-based reasoning (CBR) to support diagnosis of medical images

    NASA Astrophysics Data System (ADS)

    Stranieri, Andrew; Yearwood, John; Pham, Binh

    1999-07-01

    The development of data warehouses for the storage and analysis of very large corpora of medical image data represents a significant trend in health care and research. Amongst other benefits, the trend toward warehousing enables the use of techniques for automatically discovering knowledge from large and distributed databases. In this paper, we present an application design for knowledge discovery from databases (KDD) techniques that enhance the performance of the problem-solving strategy known as case-based reasoning (CBR) for the diagnosis of radiological images. The problem of diagnosing the abnormality of the cervical spine is used to illustrate the method. The design of a case-based medical image diagnostic support system has three essential characteristics. The first is a case representation that comprises textual descriptions of the image, visual features that are known to be useful for indexing images, and additional visual features to be discovered by data mining many existing images. The second characteristic of the approach presented here involves the development of a case base that comprises an optimal number and distribution of cases. The third characteristic involves the automatic discovery, using KDD techniques, of adaptation knowledge to enhance the performance of the case-based reasoner. Together, the three characteristics of our approach can overcome real-time efficiency obstacles that otherwise militate against the use of CBR in the domain of medical image analysis.

  11. Computer Science Research in Europe.

    DTIC Science & Technology

    1984-08-29

    most attention, multi-database systems and their structure, and (3) the dependencies between databases and multi-databases ... Newcastle University, UK: at the University of Newcastle, having completed a multi-database system for distributed data management ... INRIA: a project called SIRIUS was established in 1977 at the INRIA, which is now working on ... communications requirements of distributed database systems, protocols for checking the ...

  12. Information integration for a sky survey by data warehousing

    NASA Astrophysics Data System (ADS)

    Luo, A.; Zhang, Y.; Zhao, Y.

    The virtualization service of the data system for the sky survey LAMOST is very important for astronomers. The service needs to integrate information from data collections, catalogs and references, and to support simple federation of a set of distributed files and associated metadata. Data warehousing has been in existence for several years and has demonstrated superiority over traditional relational database management systems by providing novel indexing schemes that support efficient on-line analytical processing (OLAP) of large databases. Now relational database systems such as Oracle support the warehouse capability, which includes extensions to the SQL language to support OLAP operations, and a number of metadata management tools have been created. The information integration of LAMOST by applying data warehousing aims to effectively provide data and knowledge on-line.

  13. Distribution Grid Integration Unit Cost Database | Solar Research | NREL

    Science.gov Websites

    NREL's Distribution Grid Integration Unit Cost Database contains unit cost information for different components that may be used for distribution grid integration associated with PV. It includes information from the California utility unit cost guides on traditional ...

  14. Analysis of data on large explosive eruptions of stratovolcanoes to constrain under-recording and eruption rates

    NASA Astrophysics Data System (ADS)

    Rougier, Jonty; Cashman, Kathy; Sparks, Stephen

    2016-04-01

    We have analysed the Large Magnitude Explosive Volcanic Eruptions database (LaMEVE) for volcanoes that classify as stratovolcanoes. A non-parametric statistical approach is used to assess the global recording rate for large (M4+) eruptions. The approach imposes minimal structure on the shape of the recording rate through time. We find that the recording rates have declined rapidly, going backwards in time. Prior to 1600 they are below 50%, and prior to 1100 they are below 20%. Even in the recent past, e.g. the 1800s, they are likely to be appreciably less than 100%. The assessment for very large (M5+) eruptions is more uncertain, due to the scarcity of events. Having taken under-recording into account, the large-eruption rates of stratovolcanoes are modelled exchangeably, in order to derive an informative prior distribution as an input into a subsequent volcano-by-volcano hazard assessment. The statistical model implies that volcano-by-volcano predictions can be grouped by the number of recorded large eruptions. Further, it is possible to combine all volcanoes together into a global large eruption prediction, with an M4+ rate of 0.57/yr computed from the LaMEVE database.

  15. Parallel computing method for simulating hydrological processesof large rivers under climate change

    NASA Astrophysics Data System (ADS)

    Wang, H.; Chen, Y.

    2016-12-01

    Climate change is one of the best-known global environmental problems. It has altered the temporal and spatial distribution of watershed hydrological processes, especially in the world's large rivers. Watershed hydrological process simulation based on a physically based distributed hydrological model can give better results than lumped models. However, such simulation involves a large amount of calculation, especially in large rivers, and thus needs huge computing resources that may not be steadily available to researchers, or only at high expense; this has seriously restricted research and application. Current parallel methods mostly parallelize the computation in the space and time dimensions: they calculate the natural features of the distributed hydrological model in order, grid by grid (unit or basin), from upstream to downstream. This article proposes a high-performance computing method for hydrological process simulation with a high speedup ratio and parallel efficiency. It combines the temporal and spatial runoff characteristics of the distributed hydrological model with distributed data storage, an in-memory database, distributed computing, and parallel computing based on computing power units. The method is adaptable and extensible, which means it can make full use of the available computing and storage resources even when these are limited, and computing efficiency improves linearly as computing resources increase. The method can satisfy the parallel computing requirements of hydrological process simulation in small, medium and large rivers.

  16. Building a highly available and intrusion tolerant Database Security and Protection System (DSPS).

    PubMed

    Cai, Liang; Yang, Xiao-Hu; Dong, Jin-Xiang

    2003-01-01

    Database Security and Protection System (DSPS) is a security platform for fighting malicious DBMS. The security and performance are critical to DSPS. The authors suggested a key management scheme by combining the server group structure to improve availability and the key distribution structure needed by proactive security. This paper detailed the implementation of proactive security in DSPS. After thorough performance analysis, the authors concluded that the performance difference between the replicated mechanism and proactive mechanism becomes smaller and smaller with increasing number of concurrent connections; and that proactive security is very useful and practical for large, critical applications.

  17. USBombus, a database of contemporary survey data for North American Bumble Bees (Hymenoptera, Apidae, Bombus) distributed in the United States

    USDA-ARS?s Scientific Manuscript database

    This paper describes USBombus, a large dataset that represents the outcomes of one of the largest standardized surveys of bee pollinators (Hymenoptera, Apidae, Bombus) globally. The motivation to collect live bumble bees across the US was to examine the decline and conservation status of Bombus affi...

  18. Australia's continental-scale acoustic tracking database and its automated quality control process

    NASA Astrophysics Data System (ADS)

    Hoenner, Xavier; Huveneers, Charlie; Steckenreuter, Andre; Simpfendorfer, Colin; Tattersall, Katherine; Jaine, Fabrice; Atkins, Natalia; Babcock, Russ; Brodie, Stephanie; Burgess, Jonathan; Campbell, Hamish; Heupel, Michelle; Pasquer, Benedicte; Proctor, Roger; Taylor, Matthew D.; Udyawer, Vinay; Harcourt, Robert

    2018-01-01

    Our ability to predict species responses to environmental changes relies on accurate records of animal movement patterns. Continental-scale acoustic telemetry networks are increasingly being established worldwide, producing large volumes of information-rich geospatial data. During the last decade, the Integrated Marine Observing System's Animal Tracking Facility (IMOS ATF) established a permanent array of acoustic receivers around Australia. Simultaneously, IMOS developed a centralised national database to foster collaborative research across the user community and quantify individual behaviour across a broad range of taxa. Here we present the database and quality control procedures developed to collate 49.6 million valid detections from 1891 receiving stations. This dataset consists of detections for 3,777 tags deployed on 117 marine species, with distances travelled ranging from a few to thousands of kilometres. Connectivity between regions was only made possible by the joint contribution of IMOS infrastructure and researcher-funded receivers. This dataset constitutes a valuable resource facilitating meta-analysis of animal movement, distributions, and habitat use, and is important for relating species distribution shifts with environmental covariates.

  19. Global Distribution of Outbreaks of Water-Associated Infectious Diseases

    PubMed Central

    Yang, Kun; LeJeune, Jeffrey; Alsdorf, Doug; Lu, Bo; Shum, C. K.; Liang, Song

    2012-01-01

    Background Water plays an important role in the transmission of many infectious diseases, which pose a great burden on global public health. However, the global distribution of these water-associated infectious diseases and underlying factors remain largely unexplored. Methods and Findings Based on the Global Infectious Disease and Epidemiology Network (GIDEON), a global database including water-associated pathogens and diseases was developed. In this study, reported outbreak events associated with corresponding water-associated infectious diseases from 1991 to 2008 were extracted from the database. The location of each reported outbreak event was identified and geocoded into a GIS database. Also collected in the GIS database included geo-referenced socio-environmental information including population density (2000), annual accumulated temperature, surface water area, and average annual precipitation. Poisson models with Bayesian inference were developed to explore the association between these socio-environmental factors and distribution of the reported outbreak events. Based on model predictions a global relative risk map was generated. A total of 1,428 reported outbreak events were retrieved from the database. The analysis suggested that outbreaks of water-associated diseases are significantly correlated with socio-environmental factors. Population density is a significant risk factor for all categories of reported outbreaks of water-associated diseases; water-related diseases (e.g., vector-borne diseases) are associated with accumulated temperature; water-washed diseases (e.g., conjunctivitis) are inversely related to surface water area; both water-borne and water-related diseases are inversely related to average annual rainfall. Based on the model predictions, “hotspots” of risks for all categories of water-associated diseases were explored. Conclusions At the global scale, water-associated infectious diseases are significantly correlated with socio-environmental factors, impacting all regions which are affected disproportionately by different categories of water-associated infectious diseases. PMID:22348158

  20. SIRSALE: integrated video database management tools

    NASA Astrophysics Data System (ADS)

    Brunie, Lionel; Favory, Loic; Gelas, J. P.; Lefevre, Laurent; Mostefaoui, Ahmed; Nait-Abdesselam, F.

    2002-07-01

    Video databases became an active field of research during the last decade. The main objective in such systems is to provide users with capabilities to search, access and play back distributed stored video data in a user-friendly way, much as they do for traditional distributed databases. Hence, such systems need to deal with hard issues: (a) video documents generate huge volumes of data and are time sensitive (streams must be delivered at a specific bitrate), and (b) the content of video data is very hard to extract automatically and needs to be annotated by humans. To cope with these issues, many approaches have been proposed in the literature, including data models, query languages, video indexing etc. In this paper, we present SIRSALE: a set of video database management tools that allow users to manipulate video documents and streams stored in large distributed repositories. All the proposed tools are based on generic models that can be customized for specific applications using ad-hoc adaptation modules. More precisely, SIRSALE allows users to: (a) browse video documents by structures (sequences, scenes, shots) and (b) query the video database content by using a graphical tool, adapted to the nature of the target video documents. This paper also presents an annotation interface that allows archivists to describe the content of video documents. All these tools are coupled to a video player integrating remote VCR functionalities and are based on active network technology. So, we present how dedicated active services allow an optimized video transport for video streams (with Tamanoir active nodes). We then describe experiments using SIRSALE on an archive of news video and soccer matches. The system has been demonstrated to professionals with positive feedback. Finally, we discuss open issues and present some perspectives.

  1. Multi-parameter vital sign database to assist in alarm optimization for general care units.

    PubMed

    Welch, James; Kanter, Benjamin; Skora, Brooke; McCombie, Scott; Henry, Isaac; McCombie, Devin; Kennedy, Rosemary; Soller, Babs

    2016-12-01

    Continual vital sign assessment on the general care, medical-surgical floor is expected to provide early indication of patient deterioration and increase the effectiveness of rapid response teams. However, there is concern that continual, multi-parameter vital sign monitoring will produce alarm fatigue. The objective of this study was the development of a methodology to help care teams optimize alarm settings. An on-body wireless monitoring system was used to continually assess heart rate, respiratory rate, SpO2 and noninvasive blood pressure in the general ward of ten hospitals between April 1, 2014 and January 19, 2015. These data, 94,575 h for 3430 patients, are contained in a large database, accessible with cloud computing tools. Simulation scenarios assessed the total alarm rate as a function of threshold and annunciation delay (s). The total alarm rate of ten alarms/patient/day predicted from the cloud-hosted database was the same as the total alarm rate for a 10-day evaluation (1550 h for 36 patients) in an independent hospital. Plots of vital sign distributions in the cloud-hosted database were similar to other large databases published by different authors. The cloud-hosted database can be used to run simulations for various alarm thresholds and annunciation delays to predict the total alarm burden experienced by nursing staff. This methodology might, in the future, be used to help reduce alarm fatigue without sacrificing the ability to continually monitor all vital signs.

  2. Generalized entropies and the similarity of texts

    NASA Astrophysics Data System (ADS)

    Altmann, Eduardo G.; Dias, Laércio; Gerlach, Martin

    2017-01-01

    We show how generalized Gibbs-Shannon entropies can provide new insights on the statistical properties of texts. The universal distribution of word frequencies (Zipf’s law) implies that the generalized entropies, computed at the word level, are dominated by words in a specific range of frequencies. Here we show that this is the case not only for the generalized entropies but also for the generalized (Jensen-Shannon) divergences, used to compute the similarity between different texts. This finding allows us to identify the contribution of specific words (and word frequencies) for the different generalized entropies and also to estimate the size of the databases needed to obtain a reliable estimation of the divergences. We test our results in large databases of books (from the google n-gram database) and scientific papers (indexed by Web of Science).
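
    To make the word-level quantities concrete, the Python sketch below computes a generalized entropy of order alpha (in one common, Tsallis-type parametrization, which is an assumption here) and the corresponding generalized Jensen-Shannon-style divergence between two toy word-frequency distributions; it is not the authors' code and ignores the finite-size effects discussed in the paper.

      from collections import Counter
      import numpy as np

      def word_freqs(text, vocab):
          """Empirical word-frequency distribution of a text over a fixed vocabulary."""
          counts = Counter(text.lower().split())
          p = np.array([counts[w] for w in vocab], dtype=float)
          return p / p.sum()

      def h_alpha(p, alpha):
          """Generalized entropy of order alpha (Tsallis-type form, assumed);
          alpha -> 1 recovers the Shannon entropy."""
          p = p[p > 0]
          if np.isclose(alpha, 1.0):
              return -(p * np.log(p)).sum()
          return (1.0 - (p ** alpha).sum()) / (alpha - 1.0)

      def d_alpha(p, q, alpha):
          """Jensen-Shannon-style divergence built from the generalized entropy:
          entropy of the mixture minus the mean entropy of the parts."""
          m = 0.5 * (p + q)
          return h_alpha(m, alpha) - 0.5 * (h_alpha(p, alpha) + h_alpha(q, alpha))

      text_a = "the cat sat on the mat"
      text_b = "the dog sat on the log"
      vocab = sorted(set(text_a.split()) | set(text_b.split()))
      p, q = word_freqs(text_a, vocab), word_freqs(text_b, vocab)
      divergence = d_alpha(p, q, alpha=2.0)   # smaller value = more similar texts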

  3. Patterns, biases and prospects in the distribution and diversity of Neotropical snakes.

    PubMed

    Guedes, Thaís B; Sawaya, Ricardo J; Zizka, Alexander; Laffan, Shawn; Faurby, Søren; Pyron, R Alexander; Bérnils, Renato S; Jansen, Martin; Passos, Paulo; Prudente, Ana L C; Cisneros-Heredia, Diego F; Braz, Henrique B; Nogueira, Cristiano de C; Antonelli, Alexandre; Meiri, Shai

    2018-01-01

    We generated a novel database of Neotropical snakes (one of the world's richest herpetofauna) combining the most comprehensive, manually compiled distribution dataset with publicly available data. We assess, for the first time, the diversity patterns for all Neotropical snakes as well as sampling density and sampling biases. We compiled three databases of species occurrences: a dataset downloaded from the Global Biodiversity Information Facility (GBIF), a verified dataset built through taxonomic work and specialized literature, and a combined dataset comprising a cleaned version of the GBIF dataset merged with the verified dataset. Neotropics, Behrmann projection equivalent to 1° × 1°. Specimens housed in museums during the last 150 years. Squamata: Serpentes. Geographical information system (GIS). The combined dataset provides the most comprehensive distribution database for Neotropical snakes to date. It contains 147,515 records for 886 species across 12 families, representing 74% of all species of snakes, spanning 27 countries in the Americas. Species richness and phylogenetic diversity show overall similar patterns. Amazonia is the least sampled Neotropical region, whereas most well-sampled sites are located near large universities and scientific collections. We provide a list and updated maps of geographical distribution of all snake species surveyed. The biodiversity metrics of Neotropical snakes reflect patterns previously documented for other vertebrates, suggesting that similar factors may determine the diversity of both ectothermic and endothermic animals. We suggest conservation strategies for high-diversity areas and sampling efforts be directed towards Amazonia and poorly known species.

  4. Global Statistics of Bolides in the Terrestrial Atmosphere

    NASA Astrophysics Data System (ADS)

    Chernogor, L. F.; Shevelyov, M. B.

    2017-06-01

    Purpose: Evaluation and analysis of distribution of the number of meteoroid (mini asteroid) falls as a function of glow energy, velocity, the region of maximum glow altitude, and geographic coordinates. Design/methodology/approach: The satellite database on the glow of 693 mini asteroids, which were decelerated in the terrestrial atmosphere, has been used for evaluating basic meteoroid statistics. Findings: A rapid decrease in the number of asteroids with increasing of their glow energy is confirmed. The average speed of the celestial bodies is equal to about 17.9 km/s. The altitude of maximum glow most often equals to 30-40 km. The distribution law for a number of meteoroids entering the terrestrial atmosphere in longitude and latitude (after excluding the component in latitudinal dependence due to the geometry) is approximately uniform. Conclusions: Using a large enough database of measurements, the meteoroid (mini asteroid) statistics has been evaluated.

  5. Diamond Eye: a distributed architecture for image data mining

    NASA Astrophysics Data System (ADS)

    Burl, Michael C.; Fowlkes, Charless; Roden, Joe; Stechert, Andre; Mukhtar, Saleem

    1999-02-01

    Diamond Eye is a distributed software architecture, which enables users (scientists) to analyze large image collections by interacting with one or more custom data mining servers via a Java applet interface. Each server is coupled with an object-oriented database and a computational engine, such as a network of high-performance workstations. The database provides persistent storage and supports querying of the 'mined' information. The computational engine provides parallel execution of expensive image processing, object recognition, and query-by-content operations. Key benefits of the Diamond Eye architecture are: (1) the design promotes trial evaluation of advanced data mining and machine learning techniques by potential new users (all that is required is to point a web browser to the appropriate URL), (2) software infrastructure that is common across a range of science mining applications is factored out and reused, and (3) the system facilitates closer collaborations between algorithm developers and domain experts.

  6. VST project: distributed control system overview

    NASA Astrophysics Data System (ADS)

    Mancini, Dario; Mazzola, Germana; Molfese, C.; Schipani, Pietro; Brescia, Massimo; Marty, Laurent; Rossi, Emilio

    2003-02-01

    The VLT Survey Telescope (VST) is a co-operative program between the European Southern Observatory (ESO) and the INAF Capodimonte Astronomical Observatory (OAC), Naples, for the study, design, and realization of a 2.6-m wide-field optical imaging telescope to be operated at the Paranal Observatory, Chile. The telescope design, manufacturing and integration are responsibility of OAC. The VST has been specifically designed to carry out stand-alone observations in the UV to I spectral range and to supply target databases for the ESO Very Large Telescope (VLT). The control hardware is based on a large utilization of distributed embedded specialized controllers specifically designed, prototyped and manufactured by the Technology Working Group for VST project. The use of a field bus improves the whole system reliability in terms of high level flexibility, control speed and allow to reduce drastically the plant distribution in the instrument. The paper describes the philosophy and the architecture of the VST control HW with particular reference to the advantages of this distributed solution for the VST project.

  7. Data Sharing in DHT Based P2P Systems

    NASA Astrophysics Data System (ADS)

    Roncancio, Claudia; Del Pilar Villamil, María; Labbé, Cyril; Serrano-Alvarado, Patricia

    The evolution of peer-to-peer (P2P) systems triggered the building of large-scale distributed applications. The main application domain is data sharing across a very large number of highly autonomous participants. Building such data sharing systems is particularly challenging because of the “extreme” characteristics of P2P infrastructures: massive distribution, high churn rate, no global control, potentially untrusted participants... This article focuses on declarative querying support, query optimization and data privacy on a major class of P2P systems, namely those based on Distributed Hash Tables (P2P DHT). The usual approaches and the algorithms used by classic distributed systems and databases for providing data privacy and querying services are not well suited to P2P DHT systems. A considerable amount of work was required to adapt them for the new challenges such systems present. This paper describes the most important solutions found. It also identifies important future research trends in data management in P2P DHT systems.
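
    The primitive such systems build on is the DHT's key-to-node mapping. The short Python sketch below shows one common way to realize it, a consistent-hashing ring with invented peer names; real DHTs such as Chord or Kademlia add multi-hop routing, replication and churn handling on top of this mapping.

      import hashlib
      from bisect import bisect

      def ident(value):
          """160-bit identifier used for both peer names and data keys."""
          return int(hashlib.sha1(value.encode()).hexdigest(), 16)

      class HashRing:
          """Minimal consistent-hashing ring: a key is stored on the first peer
          whose identifier follows the key identifier clockwise."""
          def __init__(self, peers):
              self.ring = sorted((ident(p), p) for p in peers)
              self.ids = [peer_id for peer_id, _ in self.ring]

          def lookup(self, key):
              idx = bisect(self.ids, ident(key)) % len(self.ring)   # wrap around the ring
              return self.ring[idx][1]

      ring = HashRing(["peer-A", "peer-B", "peer-C", "peer-D"])
      owner = ring.lookup("video:lecture-42")      # the peer responsible for this key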

  8. Fullerene data mining using bibliometrics and database tomography

    PubMed

    Kostoff; Braun; Schubert; Toothman; Humenik

    2000-01-01

    Database tomography (DT) is a textual database analysis system consisting of two major components: (1) algorithms for extracting multiword phrase frequencies and phrase proximities (physical closeness of the multiword technical phrases) from any type of large textual database, to augment (2) interpretative capabilities of the expert human analyst. DT was used to derive technical intelligence from a fullerenes database derived from the Science Citation Index and the Engineering Compendex. Phrase frequency analysis by the technical domain experts provided the pervasive technical themes of the fullerenes database, and phrase proximity analysis provided the relationships among the pervasive technical themes. Bibliometric analysis of the fullerenes literature supplemented the DT results with author/journal/institution publication and citation data. Comparisons of fullerenes results with past analyses of similarly structured near-earth space, chemistry, hypersonic/supersonic flow, aircraft, and ship hydrodynamics databases are made. One important finding is that many of the normalized bibliometric distribution functions are extremely consistent across these diverse technical domains and could reasonably be expected to apply to broader chemical topics than fullerenes that span multiple structural classes. Finally, lessons learned about integrating the technical domain experts with the data mining tools are presented.

  9. There is Diversity in Disorder-"In all Chaos there is a Cosmos, in all Disorder a Secret Order".

    PubMed

    Nielsen, Jakob T; Mulder, Frans A A

    2016-01-01

    The protein universe consists of a continuum of structures ranging from full order to complete disorder. As the structured part of the proteome has been intensively studied, stably folded proteins are increasingly well documented and understood. However, proteins that are fully, or in large part, disordered are much less well characterized. Here we collected NMR chemical shifts in a small database for 117 protein sequences that are known to contain disorder. We demonstrate that NMR chemical shift data can be brought to bear as an exquisite judge of protein disorder at the residue level, and help in validation. With the help of secondary chemical shift analysis we demonstrate that the proteins in the database span the full spectrum of disorder, but still, largely segregate into two classes; disordered with small segments of order scattered along the sequence, and structured with small segments of disorder inserted between the different structured regions. A detailed analysis reveals that the distribution of order/disorder along the sequence shows a complex and asymmetric distribution, that is highly protein-dependent. Access to ratified training data further suggests an avenue to improving prediction of disorder from sequence.

  10. Data Processing Factory for the Sloan Digital Sky Survey

    NASA Astrophysics Data System (ADS)

    Stoughton, Christopher; Adelman, Jennifer; Annis, James T.; Hendry, John; Inkmann, John; Jester, Sebastian; Kent, Steven M.; Kuropatkin, Nickolai; Lee, Brian; Lin, Huan; Peoples, John, Jr.; Sparks, Robert; Tucker, Douglas; Vanden Berk, Dan; Yanny, Brian; Yocum, Dan

    2002-12-01

    The Sloan Digital Sky Survey (SDSS) data handling presents two challenges: large data volume and timely production of spectroscopic plates from imaging data. A data processing factory, using technologies both old and new, handles this flow. Distribution to end users is via disk farms, to serve corrected images and calibrated spectra, and a database, to efficiently process catalog queries. For distribution of modest amounts of data from Apache Point Observatory to Fermilab, scripts use rsync to update files, while larger data transfers are accomplished by shipping magnetic tapes commercially. All data processing pipelines are wrapped in scripts to address consecutive phases: preparation, submission, checking, and quality control. We constructed the factory by chaining these pipelines together while using an operational database to hold processed imaging catalogs. The science database catalogs all imaging and spectroscopic objects, with pointers to the various external files associated with them. Diverse computing systems address particular processing phases. UNIX computers handle tape reading and writing, as well as calibration steps that require access to a large amount of data with relatively modest computational demands. Commodity CPUs process steps that require access to a limited amount of data with more demanding computational requirements. Disk servers optimized for cost per Gbyte serve terabytes of processed data, while servers optimized for disk read speed run SQLServer software to process queries on the catalogs. This factory produced data for the SDSS Early Data Release in June 2001, and it is currently producing Data Release One, scheduled for January 2003.

  11. Regeneration of cervix after excisional treatment for cervical intraepithelial neoplasia: a study of collagen distribution.

    PubMed

    Phadnis, S V; Atilade, A; Bowring, J; Kyrgiou, M; Young, M P A; Evans, H; Paraskevaidis, E; Walker, P

    2011-12-01

    To study the distribution of collagen in the regenerated cervical tissue after excisional treatment for cervical intraepithelial neoplasia (CIN). Cohort study. A large tertiary teaching hospital in London. Women who underwent repeat excisional treatment for treatment failure or persistent CIN. Eligible women who underwent a repeat excisional treatment for treatment failure, including hysterectomy, between January 2002 and December 2007 in our colposcopy unit were identified by the Infoflex(®) database and SNOMED encoded histopathology database. Collagen expression was assessed using picro-Sirius red stain and the intensity of staining was compared in paired specimens from the first and second treatments. Differences in collagen expression were examined in the paired excisional treatment specimens. A total of 17 women were included. Increased collagen expression in the regenerated cervical tissue of the second cone compared with the first cone was noted in six women, decreased expression was noted in five women, and the pattern of collagen distribution was equivocal in six women. There is no overall change in collagen distribution during regeneration following excisional treatment for CIN. © 2011 The Authors BJOG An International Journal of Obstetrics and Gynaecology © 2011 RCOG.

  12. The FRUITY database on AGB stars: past, present and future

    NASA Astrophysics Data System (ADS)

    Cristallo, S.; Piersanti, L.; Straniero, O.

    2016-01-01

    We present and show the features of the FRUITY database, an interactive web-based interface devoted to the nucleosynthesis in AGB stars. We describe the currently available set of AGB models (largely expanded with respect to the original one) with masses in the range 1.3≤M/M⊙≤3.0 and metallicities -2.15≤[Fe/H]≤+0.15. We illustrate the details of our s-process surface distributions and we compare our results to observations. Moreover, we introduce a new set of models where the effects of rotation are taken into account. Finally, we briefly describe the next planned upgrades.

  13. HTSFinder: Powerful Pipeline of DNA Signature Discovery by Parallel and Distributed Computing

    PubMed Central

    Karimi, Ramin; Hajdu, Andras

    2016-01-01

    Comprehensive effort for low-cost sequencing in the past few years has led to the growth of complete genome databases. In parallel with this effort, a strong need has developed for fast and cost-effective methods and applications to accelerate sequence analysis. Identification is the very first step of this task. Due to the difficulties, high costs, and computational challenges of alignment-based approaches, an alternative universal identification method is highly required. As an alignment-free approach, DNA signatures have provided new opportunities for the rapid identification of species. In this paper, we present an effective pipeline, HTSFinder (high-throughput signature finder), with a corresponding k-mer generator, GkmerG (genome k-mers generator). Using this pipeline, we determine the frequency of k-mers from the available complete genome databases for the detection of extensive DNA signatures in a reasonably short time. Our application can detect both unique and common signatures in the arbitrarily selected target and nontarget databases. Hadoop and MapReduce as parallel and distributed computing tools with commodity hardware are used in this pipeline. This approach brings the power of high-performance computing into ordinary desktop personal computers for discovering DNA signatures in large databases such as bacterial genomes. A considerable number of detected unique and common DNA signatures of the target database bring opportunities to improve the identification process not only for polymerase chain reaction and microarray assays but also for more complex scenarios such as metagenomics and next-generation sequencing analysis. PMID:26884678

  14. HTSFinder: Powerful Pipeline of DNA Signature Discovery by Parallel and Distributed Computing.

    PubMed

    Karimi, Ramin; Hajdu, Andras

    2016-01-01

    Comprehensive effort for low-cost sequencing in the past few years has led to the growth of complete genome databases. In parallel with this effort, a strong need has developed for fast and cost-effective methods and applications to accelerate sequence analysis. Identification is the very first step of this task. Due to the difficulties, high costs, and computational challenges of alignment-based approaches, an alternative universal identification method is highly required. As an alignment-free approach, DNA signatures have provided new opportunities for the rapid identification of species. In this paper, we present an effective pipeline, HTSFinder (high-throughput signature finder), with a corresponding k-mer generator, GkmerG (genome k-mers generator). Using this pipeline, we determine the frequency of k-mers from the available complete genome databases for the detection of extensive DNA signatures in a reasonably short time. Our application can detect both unique and common signatures in the arbitrarily selected target and nontarget databases. Hadoop and MapReduce as parallel and distributed computing tools with commodity hardware are used in this pipeline. This approach brings the power of high-performance computing into ordinary desktop personal computers for discovering DNA signatures in large databases such as bacterial genomes. A considerable number of detected unique and common DNA signatures of the target database bring opportunities to improve the identification process not only for polymerase chain reaction and microarray assays but also for more complex scenarios such as metagenomics and next-generation sequencing analysis.
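
    The k-mer bookkeeping at the heart of such a pipeline is easy to sketch in plain Python, here without Hadoop and on tiny invented "genomes": count every k-mer per genome, then keep those present in all target genomes and absent from all non-target genomes. This is only a schematic of the signature idea, not GkmerG or HTSFinder itself.

      from collections import Counter

      def kmers(sequence, k):
          """All overlapping k-mers of one sequence (the map step of the pipeline)."""
          return (sequence[i:i + k] for i in range(len(sequence) - k + 1))

      def kmer_counts(genome, k):
          """k-mer frequency table for one genome (the reduce step aggregates these)."""
          return Counter(kmers(genome, k))

      def common_signatures(targets, non_targets, k):
          """k-mers shared by every target genome and found in no non-target genome."""
          shared = set.intersection(*(set(kmer_counts(g, k)) for g in targets))
          background = set().union(*(set(kmer_counts(g, k)) for g in non_targets))
          return shared - background

      # Toy sequences stand in for the complete-genome databases used by the pipeline.
      targets = ["ACGTACGTGG", "TTACGTACGA"]
      non_targets = ["GGGGCCCCAA"]
      signatures = common_signatures(targets, non_targets, k=4)   # {'ACGT', 'CGTA', 'GTAC', 'TACG'}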

  15. Data Mining on Distributed Medical Databases: Recent Trends and Future Directions

    NASA Astrophysics Data System (ADS)

    Atilgan, Yasemin; Dogan, Firat

    As computerization in healthcare services increases, the amount of available digital data is growing at an unprecedented rate, and as a result healthcare organizations are far more able to store data than to extract knowledge from it. Today the major challenge is to transform these data into useful information and knowledge. It is important for healthcare organizations to use stored data to improve quality while reducing cost. This paper first investigates data mining applications on centralized medical databases and how they are used for diagnostics and population health, then introduces distributed databases. The integration needs and issues of distributed medical databases are described. Finally, the paper focuses on data mining studies on distributed medical databases.

  16. Production and distribution of scientific and technical databases - Comparison among Japan, US and Europe

    NASA Astrophysics Data System (ADS)

    Onodera, Natsuo; Mizukami, Masayuki

    This paper estimates several quantitative indices on the production and distribution of scientific and technical databases based on various recent publications and attempts to compare these indices internationally. Raw data used for the estimation are drawn mainly from the Database Directory (published by MITI) for database production and from some domestic and foreign study reports for database revenues. The ratios of these indices among Japan, the US, and Europe for database usage are similar to those for general scientific and technical activities such as population and R&D expenditures. However, Japanese contributions to the production, revenue, and cross-country distribution of databases are still lower than those of the US and European countries. International comparison of relative database activities between the public and private sectors is also discussed.

  17. The History and Legacy of BATSE

    NASA Technical Reports Server (NTRS)

    Fishman, Gerald J.

    2012-01-01

    The BATSE experiment on the Compton Gamma-ray Observatory was the first large detector system specifically designed for the study of gamma-ray bursts. The eight large-area detectors allowed full-sky coverage and were optimized to operate in the energy region of the peak emission of most GRBs. BATSE provided detailed observations of the temporal and spectral characteristics of large samples of GRBs, and it was the first experiment to provide rapid notifications of the coarse locations of many of them. It also provided strong evidence for the cosmological distances of GRBs through the observation of the sky distribution and intensity distribution of numerous GRBs. The large number of GRBs observed with the high-sensitivity BATSE detectors continues to provide a database of GRB spectral and temporal properties in the primary energy range of GRB emission that will likely not be exceeded for at least another decade. The origin and development of the BATSE experiment, some highlights from the mission, and its continuing legacy are described in this paper.

  18. Megastudies, crowdsourcing, and large datasets in psycholinguistics: An overview of recent developments.

    PubMed

    Keuleers, Emmanuel; Balota, David A

    2015-01-01

    This paper introduces and summarizes the special issue on megastudies, crowdsourcing, and large datasets in psycholinguistics. We provide a brief historical overview and show how the papers in this issue have extended the field by compiling new databases and making important theoretical contributions. In addition, we discuss several studies that use text corpora to build distributional semantic models to tackle various interesting problems in psycholinguistics. Finally, as is the case across the papers, we highlight some methodological issues that are brought forth via the analyses of such datasets.

  19. Performance related issues in distributed database systems

    NASA Technical Reports Server (NTRS)

    Mukkamala, Ravi

    1991-01-01

    The key elements of research performed during the year-long effort of this project were to: investigate the effects of heterogeneity in distributed real-time systems; study the requirements of TRAC for building a heterogeneous database system; study the effects of performance modeling on distributed database performance; and experiment with an ORACLE-based heterogeneous system.

  20. The role of digital cartographic data in the geosciences

    USGS Publications Warehouse

    Guptill, S.C.

    1983-01-01

    The increasing demand of the Nation's natural resource developers for the manipulation, analysis, and display of large quantities of earth-science data has necessitated the use of computers and the building of geoscience information systems. These systems require, in digital form, the spatial data on map products. The basic cartographic data shown on quadrangle maps provide a foundation for the addition of geological and geophysical data. If geoscience information systems are to realize their full potential, large amounts of digital cartographic base data must be available. A major goal of the U.S. Geological Survey is to create, maintain, manage, and distribute a national cartographic and geographic digital database. This unified database will contain numerous categories (hydrography, hypsography, land use, etc.) that, through the use of standardized data-element definitions and formats, can be used easily and flexibly to prepare cartographic products and perform geoscience analysis. © 1983.

  1. Multiple elastic scattering of electrons in condensed matter

    NASA Astrophysics Data System (ADS)

    Jablonski, A.

    2017-01-01

    Since the 1940s, much attention has been devoted to the problem of accurate theoretical description of electron transport in condensed matter. The needed information for describing different aspects of the electron transport is the angular distribution of electron directions after multiple elastic collisions. This distribution can be expanded into a series of Legendre polynomials with coefficients A_l. In the present work, a database of these coefficients for all elements up to uranium (Z=92) and a dense grid of electron energies varying from 50 to 5000 eV has been created. The database makes possible the following applications: (i) accurate interpolation of coefficients A_l for any element and any energy from the above range, (ii) fast calculations of the differential and total elastic-scattering cross sections, (iii) determination of the angular distribution of directions after multiple collisions, (iv) calculations of the probability of elastic backscattering from solids, and (v) calculations of the calibration curves for determination of the inelastic mean free paths of electrons. The last two applications provide data with comparable accuracy to Monte Carlo simulations, yet the running time is decreased by several orders of magnitude. All of the above applications are implemented in the Fortran program MULTI_SCATT. Numerous illustrative runs of this program are described. Despite a relatively large volume of the database of coefficients A_l, the program MULTI_SCATT can be readily run on personal computers.
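
    For reference, the expansion referred to above has the generic Legendre-series form shown below; the symbols follow common usage rather than the paper's own notation, and the normalization convention is left unspecified.

        % Angular distribution after multiple elastic collisions, expanded in
        % Legendre polynomials P_l with coefficients A_l (generic form):
        W(\theta) = \sum_{l=0}^{\infty} A_l \, P_l(\cos\theta)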

  2. SORTEZ: a relational translator for NCBI's ASN.1 database.

    PubMed

    Hart, K W; Searls, D B; Overton, G C

    1994-07-01

    The National Center for Biotechnology Information (NCBI) has created a database collection that includes several protein and nucleic acid sequence databases, a biosequence-specific subset of MEDLINE, as well as value-added information such as links between similar sequences. Information in the NCBI database is modeled in Abstract Syntax Notation 1 (ASN.1), an Open Systems Interconnection protocol designed for the purpose of exchanging structured data between software applications rather than as a data model for database systems. While the NCBI database is distributed with an easy-to-use information retrieval system, ENTREZ, the ASN.1 data model currently lacks an ad hoc query language for general-purpose data access. For that reason, we have developed a software package, SORTEZ, that transforms the ASN.1 database (or other databases with nested data structures) to a relational data model and subsequently to a relational database management system (Sybase) where information can be accessed through the relational query language, SQL. Because the need to transform data from one data model and schema to another arises naturally in several important contexts, including efficient execution of specific applications, access to multiple databases, and adaptation to database evolution, this work also serves as a practical study of the issues involved in the various stages of database transformation. We show that transformation from the ASN.1 data model to a relational data model can be largely automated, but that schema transformation and data conversion require considerable domain expertise and would greatly benefit from additional support tools.
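
    The kind of transformation described above, flattening a nested record structure into relational rows, can be sketched as follows. This is a generic illustration, not SORTEZ itself; the record layout and table names are invented.

        # Sketch: flatten a nested (ASN.1-like) sequence record into relational rows (not SORTEZ).
        record = {
            "id": "X00001",
            "title": "example sequence",
            "features": [
                {"key": "CDS",  "start": 10, "end": 450},
                {"key": "gene", "start": 1,  "end": 500},
            ],
        }

        def to_relational(rec):
            """Return rows for a parent 'sequence' table and a child 'feature' table,
            linked by a foreign key, as a relational translator would generate."""
            seq_row = (rec["id"], rec["title"])
            feature_rows = [(rec["id"], f["key"], f["start"], f["end"]) for f in rec["features"]]
            return seq_row, feature_rows

        seq_row, feature_rows = to_relational(record)
        print(seq_row)       # ('X00001', 'example sequence')
        print(feature_rows)  # [('X00001', 'CDS', 10, 450), ('X00001', 'gene', 1, 500)]

    The point of the sketch is the shape of the mapping: each nested list becomes a child table keyed back to its parent, which is the part that can be automated; choosing good table names, types, and indexes is where the domain expertise mentioned in the abstract comes in.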

  3. Exploratory visualization of astronomical data on ultra-high-resolution wall displays

    NASA Astrophysics Data System (ADS)

    Pietriga, Emmanuel; del Campo, Fernando; Ibsen, Amanda; Primet, Romain; Appert, Caroline; Chapuis, Olivier; Hempel, Maren; Muñoz, Roberto; Eyheramendy, Susana; Jordan, Andres; Dole, Hervé

    2016-07-01

    Ultra-high-resolution wall displays feature a very high pixel density over a large physical surface, which makes them well-suited to the collaborative, exploratory visualization of large datasets. We introduce FITS-OW, an application designed for such wall displays, that enables astronomers to navigate in large collections of FITS images, query astronomical databases, and display detailed, complementary data and documents about multiple sources simultaneously. We describe how astronomers interact with their data using both the wall's touch-sensitive surface and handheld devices. We also report on the technical challenges we addressed in terms of distributed graphics rendering and data sharing over the computer clusters that drive wall displays.

  4. High-Performance Secure Database Access Technologies for HEP Grids

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Matthew Vranicar; John Weicher

    2006-04-17

    The Large Hadron Collider (LHC) at the CERN Laboratory will become the largest scientific instrument in the world when it starts operations in 2007. Large Scale Analysis Computer Systems (computational grids) are required to extract rare signals of new physics from petabytes of LHC detector data. In addition to file-based event data, LHC data processing applications require access to large amounts of data in relational databases: detector conditions, calibrations, etc. U.S. high energy physicists demand efficient performance of grid computing applications in LHC physics research where world-wide remote participation is vital to their success. To empower physicists with data-intensive analysis capabilities, a whole hyperinfrastructure of distributed databases cross-cuts a multi-tier hierarchy of computational grids. The crosscutting allows separation of concerns across both the global environment of a federation of computational grids and the local environment of a physicist’s computer used for analysis. Very few efforts are ongoing in the area of database and grid integration research. Most of these are outside of the U.S. and rely on traditional approaches to secure database access via an extraneous security layer separate from the database system core, preventing efficient data transfers. Our findings are shared by the Database Access and Integration Services Working Group of the Global Grid Forum, which states that “Research and development activities relating to the Grid have generally focused on applications where data is stored in files. However, in many scientific and commercial domains, database management systems have a central role in data storage, access, organization, authorization, etc, for numerous applications.” There is a clear opportunity for a technological breakthrough, requiring innovative steps to provide high-performance secure database access technologies for grid computing. We believe that an innovative database architecture where the secure authorization is pushed into the database engine will eliminate inefficient data transfer bottlenecks. Furthermore, traditionally separated database and security layers provide an extra vulnerability, leaving a weak clear-text password authorization as the only protection on the database core systems. Due to the legacy limitations of the systems’ security models, the allowed passwords often cannot even comply with the DOE password guideline requirements. We see an opportunity for the tight integration of the secure authorization layer with the database server engine, resulting in both improved performance and improved security. Phase I has focused on the development of a proof-of-concept prototype using Argonne National Laboratory’s (ANL) Argonne Tandem-Linac Accelerator System (ATLAS) project as a test scenario. By developing a grid-security enabled version of the ATLAS project’s current relational database solution, MySQL, PIOCON Technologies aims to offer a more efficient solution to secure database access.

  5. Developing science gateways for drug discovery in a grid environment.

    PubMed

    Pérez-Sánchez, Horacio; Rezaei, Vahid; Mezhuyev, Vitaliy; Man, Duhu; Peña-García, Jorge; den-Haan, Helena; Gesing, Sandra

    2016-01-01

    Methods for in silico screening of large databases of molecules increasingly complement and replace experimental techniques to discover novel compounds to combat diseases. As these techniques become more complex and computationally costly, we are faced with an increasing challenge: providing the life-sciences research community with a convenient tool for high-throughput virtual screening on distributed computing resources. To this end, we recently integrated the biophysics-based drug-screening program FlexScreen into a service, applicable for large-scale parallel screening and reusable in the context of scientific workflows. Our implementation is based on Pipeline Pilot and the Simple Object Access Protocol and provides an easy-to-use graphical user interface to construct complex workflows, which can be executed on distributed computing resources, thus accelerating the throughput by several orders of magnitude.
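
    A screening service of this kind ultimately fans independent docking tasks out over compute resources and gathers the scores. The sketch below shows only that idea, using local multiprocessing; it is not FlexScreen or Pipeline Pilot, and dock_one is a hypothetical stand-in for a real docking call.

        # Sketch of fan-out/fan-in virtual screening (not FlexScreen/Pipeline Pilot code).
        from multiprocessing import Pool

        def dock_one(ligand_id):
            # Hypothetical stand-in for a docking engine call; returns (ligand, score).
            score = hash(ligand_id) % 1000 / 100.0   # placeholder "score"
            return ligand_id, score

        if __name__ == "__main__":
            ligands = [f"ligand_{i:06d}" for i in range(10_000)]
            with Pool(processes=8) as pool:
                results = pool.map(dock_one, ligands, chunksize=250)
            # Keep the best-scoring candidates for follow-up.
            top = sorted(results, key=lambda r: r[1])[:10]
            print(top)

    On a grid or cloud back end the Pool would be replaced by job submission to the distributed infrastructure, but the chunking and result-gathering pattern stays the same.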

  6. Large-scale patterns of insect and disease activity in the conterminous United States and Alaska from the National Insect and Disease Detection Survey Database, 2010

    Treesearch

    Kevin M. Potter; Jeanine L. Paschke

    2013-01-01

    Analyzing patterns of forest pest infestations, disease occurrences, forest declines, and related biotic stress factors is necessary to monitor the health of forested ecosystems and their potential impacts on forest structure, composition, biodiversity, and species distributions (Castello and others 1995). Introduced nonnative insects and diseases, in particular, can...

  7. Reprocessing Microflare Data

    NASA Technical Reports Server (NTRS)

    Ryan, James M.

    1999-01-01

    The report concerns work on detecting and cataloging solar microflares using an automated search algorithm. An accompanying figure represents the solar microflare distribution during the period of April 1991 to November 1992, the height of solar activity after the launch of CGRO. It also shows the distribution extending below the distribution obtained at GSFC by manual means. We have implemented significant refinements in the search algorithm. The algorithm in its simplest form searches for transient events and, based upon the distribution of the signal among the different BATSE detectors, assigns a solar origin if the signal distribution conforms to what one expects from a burst or transient from that direction. One of the major problems in an earlier effort was to search for microflares and large flares simultaneously. The requirement for a dynamic range of almost 10^4 resulted in ambiguous identifications at the low side of the distribution. We have since restricted the search to events with peak count rates under 2000/s. Larger events are easily identified in the manual search, so we have chosen not to duplicate that work. The second problem was that missing counts existed below channel 0 in the BATSE Large Area Detector (LAD) data. These have been recovered and are now included in the search process. This provides data below 20 keV, and as we get closer to the thermal part of the spectrum, it provides greater sensitivity. The third problem was that too many BATSE detectors were used in the search. Detectors with pointing directions far from the Sun, although detecting the event, had poorly known responses. Detectors greater than approximately 60 degrees off the Sun are no longer included in the search process. By reducing the systematic errors associated with the large off-axis detectors, we can conduct more rigorous statistical tests of a candidate event to ascertain whether it originated from the solar direction. We have reprocessed the period in the early mission that covers solar maximum and constructed the microflare distribution shown in the figure. The results of the automated search start to deviate from the manual search results below about 1000/s. Not only do we now have this distribution, but we also have a database of solar microflares that was used to construct the distribution. This database contains the signal in higher energy channels as well as that in channel zero (and below). From this one can, using software at GSFC, construct a photon spectrum for some of the larger microflares. It can also be used in other solar studies, especially those that correlate the X-ray flux with emission at other wavelengths. With some additional effort we hope to integrate this database into the corresponding one residing at the Solar Data Analysis Center at GSFC. The entire CGRO mission's data can now be reprocessed to obtain the microflare distribution at all phases of the solar cycle. This work is in progress. The results of this work will be presented in forthcoming scientific workshops and conferences.
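
    The two selection criteria quoted above (peak count rate under 2000/s, detectors within roughly 60 degrees of the Sun) can be expressed as a simple filter. The sketch below is only an illustration under assumptions, not the original search code; the event fields and example numbers are invented.

        # Illustrative candidate filter (not the original BATSE search code).
        MAX_PEAK_RATE = 2000.0   # counts/s; larger events are left to the manual search
        MAX_OFF_AXIS  = 60.0     # degrees; far off-axis detectors have poorly known responses

        def select_candidates(events):
            """Keep events whose peak rate is below the threshold, and for each event
            keep only detectors pointing within MAX_OFF_AXIS degrees of the Sun."""
            selected = []
            for ev in events:
                if ev["peak_rate"] >= MAX_PEAK_RATE:
                    continue
                usable = [d for d in ev["detectors"] if d["sun_angle_deg"] <= MAX_OFF_AXIS]
                if usable:
                    selected.append({**ev, "detectors": usable})
            return selected

        # Example usage with made-up numbers
        events = [
            {"peak_rate": 850.0,
             "detectors": [{"id": 0, "sun_angle_deg": 22.0}, {"id": 5, "sun_angle_deg": 95.0}]},
            {"peak_rate": 4200.0,
             "detectors": [{"id": 2, "sun_angle_deg": 10.0}]},
        ]
        print(select_candidates(events))   # only the first event survives, with detector 0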

  8. Database of potential sources for earthquakes larger than magnitude 6 in Northern California

    USGS Publications Warehouse

    ,

    1996-01-01

    The Northern California Earthquake Potential (NCEP) working group, composed of many contributors and reviewers in industry, academia, and government, has pooled its collective expertise and knowledge of regional tectonics to identify potential sources of large earthquakes in northern California. We have created a map and database of active faults, both surficial and buried, that forms the basis for the northern California portion of the national map of probabilistic seismic hazard. The database contains 62 potential sources, including fault segments and areally distributed zones. The working group has integrated constraints from broadly based plate tectonic and VLBI models with local geologic slip rates, geodetic strain rate, and microseismicity. Our earthquake source database derives from a scientific consensus that accounts for conflict in the diverse data. Our preliminary product, as described in this report, brings to light many gaps in the data, including a need for better information on the proportion of deformation in fault systems that is aseismic.

  9. Big biology meets microclimatology: defining thermal niches of ectotherms at landscape scales for conservation planning.

    PubMed

    Isaak, Daniel J; Wenger, Seth J; Young, Michael K

    2017-04-01

    Temperature profoundly affects ecology, a fact ever more evident as the ability to measure thermal environments increases and global changes alter these environments. The spatial structure of thermalscapes is especially relevant to the distribution and abundance of ectothermic organisms, but the ability to describe biothermal relationships at extents and grains relevant to conservation planning has been limited by small or sparse data sets. Here, we combine a large occurrence database of >23 000 aquatic species surveys with stream microclimate scenarios supported by an equally large temperature database for a 149 000-km mountain stream network to describe thermal relationships for 14 fish and amphibian species. Species occurrence probabilities peaked across a wide range of temperatures (7.0-18.8°C) but distinct warm- or cold-edge distribution boundaries were apparent for all species and represented environments where populations may be most sensitive to thermal changes. Warm-edge boundary temperatures for a native species of conservation concern were used with geospatial data sets and a habitat occupancy model to highlight subsets of the network where conservation measures could benefit local populations by maintaining cool temperatures. Linking that strategic approach to local estimates of habitat impairment remains a key challenge but is also an opportunity to build relationships and develop synergies between the research, management, and regulatory communities. As with any data mining or species distribution modeling exercise, care is required in analysis and interpretation of results, but the use of large biological data sets with accurate microclimate scenarios can provide valuable information about the thermal ecology of many ectotherms and a spatially explicit way of guiding conservation investments. © 2017 by the Ecological Society of America.

  10. WLN's Database: New Directions.

    ERIC Educational Resources Information Center

    Ziegman, Bruce N.

    1988-01-01

    Describes features of the Western Library Network's database, including the database structure, authority control, contents, quality control, and distribution methods. The discussion covers changes in distribution necessitated by increasing telecommunications costs and the development of optical data disk products. (CLB)

  11. Model for non-Gaussian intraday stock returns

    NASA Astrophysics Data System (ADS)

    Gerig, Austin; Vicente, Javier; Fuentes, Miguel A.

    2009-12-01

    Stock prices are known to exhibit non-Gaussian dynamics, and there is much interest in understanding the origin of this behavior. Here, we present a model that explains the shape and scaling of the distribution of intraday stock price fluctuations (called intraday returns) and verify the model using a large database for several stocks traded on the London Stock Exchange. We provide evidence that the return distribution for these stocks is non-Gaussian and similar in shape and that the distribution appears stable over intraday time scales. We explain these results by assuming the volatility of returns is constant intraday but varies over longer periods such that its inverse square follows a gamma distribution. This produces returns that are Student distributed for intraday time scales. The predicted results show excellent agreement with the data for all stocks in our study and over all regions of the return distribution.
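
    The mixture argument summarized above can be written out explicitly. A standard calculation, with a parameterization chosen here purely for illustration, shows that a Gaussian return whose inverse variance is gamma distributed is marginally Student-t:

        % Gaussian return with gamma-distributed precision (illustrative parameterization)
        r \mid \sigma^2 \sim \mathcal{N}(0, \sigma^2), \qquad
        \lambda = 1/\sigma^2 \sim \mathrm{Gamma}(\alpha, \beta)

        p(r) = \int_0^\infty \sqrt{\tfrac{\lambda}{2\pi}}\, e^{-\lambda r^2/2}\,
               \frac{\beta^\alpha}{\Gamma(\alpha)}\, \lambda^{\alpha-1} e^{-\beta\lambda}\, d\lambda
             \;\propto\; \left(1 + \frac{r^2}{2\beta}\right)^{-(\alpha + \tfrac{1}{2})}

    This is a Student-t distribution with \nu = 2\alpha degrees of freedom and scale \sqrt{\beta/\alpha}, which is the intraday return law the model predicts.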

  12. Mars Global Digital Dune Database: MC2-MC29

    USGS Publications Warehouse

    Hayward, Rosalyn K.; Mullins, Kevin F.; Fenton, L.K.; Hare, T.M.; Titus, T.N.; Bourke, M.C.; Colaprete, Anthony; Christensen, P.R.

    2007-01-01

    Introduction The Mars Global Digital Dune Database presents data and describes the methodology used in creating the database. The database provides a comprehensive and quantitative view of the geographic distribution of moderate- to large-size dune fields from 65? N to 65? S latitude and encompasses ~ 550 dune fields. The database will be expanded to cover the entire planet in later versions. Although we have attempted to include all dune fields between 65? N and 65? S, some have likely been excluded for two reasons: 1) incomplete THEMIS IR (daytime) coverage may have caused us to exclude some moderate- to large-size dune fields or 2) resolution of THEMIS IR coverage (100m/pixel) certainly caused us to exclude smaller dune fields. The smallest dune fields in the database are ~ 1 km2 in area. While the moderate to large dune fields are likely to constitute the largest compilation of sediment on the planet, smaller stores of sediment of dunes are likely to be found elsewhere via higher resolution data. Thus, it should be noted that our database excludes all small dune fields and some moderate to large dune fields as well. Therefore the absence of mapped dune fields does not mean that such dune fields do not exist and is not intended to imply a lack of saltating sand in other areas. Where availability and quality of THEMIS visible (VIS) or Mars Orbiter Camera narrow angle (MOC NA) images allowed, we classifed dunes and included dune slipface measurements, which were derived from gross dune morphology and represent the prevailing wind direction at the last time of significant dune modification. For dunes located within craters, the azimuth from crater centroid to dune field centroid was calculated. Output from a general circulation model (GCM) is also included. In addition to polygons locating dune fields, the database includes over 1800 selected Thermal Emission Imaging System (THEMIS) infrared (IR), THEMIS visible (VIS) and Mars Orbiter Camera Narrow Angle (MOC NA) images that were used to build the database. The database is presented in a variety of formats. It is presented as a series of ArcReader projects which can be opened using the free ArcReader software. The latest version of ArcReader can be downloaded at http://www.esri.com/software/arcgis/arcreader/download.html. The database is also presented in ArcMap projects. The ArcMap projects allow fuller use of the data, but require ESRI ArcMap? software. Multiple projects were required to accommodate the large number of images needed. A fuller description of the projects can be found in the Dunes_ReadMe file and the ReadMe_GIS file in the Documentation folder. For users who prefer to create their own projects, the data is available in ESRI shapefile and geodatabase formats, as well as the open Geographic Markup Language (GML) format. A printable map of the dunes and craters in the database is available as a Portable Document Format (PDF) document. The map is also included as a JPEG file. ReadMe files are available in PDF and ASCII (.txt) files. Tables are available in both Excel (.xls) and ASCII formats.

  13. In-Memory Graph Databases for Web-Scale Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Castellana, Vito G.; Morari, Alessandro; Weaver, Jesse R.

    RDF databases have emerged as one of the most relevant ways of organizing, integrating, and managing exponentially growing, often heterogeneous, and not rigidly structured data for a variety of scientific and commercial fields. In this paper we discuss the solutions integrated in GEMS (Graph database Engine for Multithreaded Systems), a software framework for implementing RDF databases on commodity, distributed-memory high-performance clusters. Unlike the majority of current RDF databases, GEMS has been designed from the ground up to primarily employ graph-based methods. This is reflected in all the layers of its stack. The GEMS framework is composed of: a SPARQL-to-C++ compiler, a library of data structures and related methods to access and modify them, and a custom runtime providing lightweight software multithreading, network message aggregation, and a partitioned global address space. We provide an overview of the framework, detailing its components and how they have been closely designed and customized to address issues of graph methods applied to large-scale datasets on clusters. We discuss in detail the principles that enable automatic translation of the queries (expressed in SPARQL, the query language of choice for RDF databases) to graph methods, and identify differences with respect to other RDF databases.
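
    The idea of translating a SPARQL basic graph pattern into graph operations can be illustrated with a small sketch. This is not GEMS and not generated code; the triple store, query, and data below are invented. Each triple pattern becomes a scan or neighbor lookup, and shared variables become join constraints.

        # Illustrative sketch of SPARQL-to-graph-method translation (not GEMS code).
        # Hypothetical query:  SELECT ?person ?city WHERE {
        #     ?person <livesIn> ?city . ?city <locatedIn> "Italy" . }
        triples = [
            ("alice", "livesIn", "rome"),
            ("bob", "livesIn", "paris"),
            ("rome", "locatedIn", "Italy"),
            ("paris", "locatedIn", "France"),
        ]

        # Index triples by predicate, the kind of structure a graph engine would build.
        by_pred = {}
        for s, p, o in triples:
            by_pred.setdefault(p, []).append((s, o))

        # Hand-compiled form of the query: one loop per triple pattern,
        # with the shared variable ?city enforced as a join condition.
        results = [
            (person, city)
            for person, city in by_pred.get("livesIn", [])
            for city2, country in by_pred.get("locatedIn", [])
            if city == city2 and country == "Italy"
        ]
        print(results)  # [('alice', 'rome')]

    A compiler such as the one described above would emit the equivalent loops in C++ over partitioned, distributed data structures rather than over an in-memory list.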

  14. The pipeline system for Octave and Matlab (PSOM): a lightweight scripting framework and execution engine for scientific workflows.

    PubMed

    Bellec, Pierre; Lavoie-Courchesne, Sébastien; Dickinson, Phil; Lerch, Jason P; Zijdenbos, Alex P; Evans, Alan C

    2012-01-01

    The analysis of neuroimaging databases typically involves a large number of inter-connected steps called a pipeline. The pipeline system for Octave and Matlab (PSOM) is a flexible framework for the implementation of pipelines in the form of Octave or Matlab scripts. PSOM does not introduce new language constructs to specify the steps and structure of the workflow. All steps of analysis are instead described by a regular Matlab data structure, documenting their associated command and options, as well as their input, output, and cleaned-up files. The PSOM execution engine provides a number of automated services: (1) it executes jobs in parallel on a local computing facility as long as the dependencies between jobs allow for it and sufficient resources are available; (2) it generates a comprehensive record of the pipeline stages and the history of execution, which is detailed enough to fully reproduce the analysis; (3) if an analysis is started multiple times, it executes only the parts of the pipeline that need to be reprocessed. PSOM is distributed under an open-source MIT license and can be used without restriction for academic or commercial projects. The package has no external dependencies besides Matlab or Octave, is straightforward to install, and supports a variety of operating systems (Linux, Windows, Mac). We ran several benchmark experiments on a public database including 200 subjects, using a pipeline for the preprocessing of functional magnetic resonance images (fMRI). The benchmark results showed that PSOM is a powerful solution for the analysis of large databases using local or distributed computing resources.
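
    As a rough illustration of the execution model described above (jobs as plain data with declared files, run in parallel once their dependencies are satisfied, finished jobs skipped on restart), here is a minimal sketch in Python. It is not PSOM, and the job structure and commands are invented.

        # Minimal sketch of a dependency-driven pipeline executor (not PSOM itself).
        # Jobs are plain data: a command plus declared input and output files.
        import os, subprocess
        from concurrent.futures import ThreadPoolExecutor

        jobs = {
            "smooth": {"cmd": "echo smoothing > smooth.txt", "in": [],             "out": ["smooth.txt"]},
            "stats":  {"cmd": "echo stats > stats.txt",      "in": ["smooth.txt"], "out": ["stats.txt"]},
            "report": {"cmd": "echo report > report.txt",    "in": ["stats.txt"],  "out": ["report.txt"]},
        }

        def ready(job, done_files):
            return all(f in done_files for f in job["in"])

        def run(name, job):
            # Skip jobs whose outputs already exist (crude restart behaviour).
            if all(os.path.exists(f) for f in job["out"]):
                return name, job["out"]
            subprocess.run(job["cmd"], shell=True, check=True)
            return name, job["out"]

        done_files, pending = set(), dict(jobs)
        with ThreadPoolExecutor(max_workers=4) as pool:
            while pending:
                runnable = {n: j for n, j in pending.items() if ready(j, done_files)}
                if not runnable:
                    raise RuntimeError("cyclic or unsatisfiable dependencies")
                for fut in [pool.submit(run, n, j) for n, j in runnable.items()]:
                    name, outs = fut.result()
                    done_files.update(outs)
                    pending.pop(name)

    PSOM additionally records execution history and options for reproducibility; the sketch only captures the scheduling and restart logic.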

  15. The pipeline system for Octave and Matlab (PSOM): a lightweight scripting framework and execution engine for scientific workflows

    PubMed Central

    Bellec, Pierre; Lavoie-Courchesne, Sébastien; Dickinson, Phil; Lerch, Jason P.; Zijdenbos, Alex P.; Evans, Alan C.

    2012-01-01

    The analysis of neuroimaging databases typically involves a large number of inter-connected steps called a pipeline. The pipeline system for Octave and Matlab (PSOM) is a flexible framework for the implementation of pipelines in the form of Octave or Matlab scripts. PSOM does not introduce new language constructs to specify the steps and structure of the workflow. All steps of analysis are instead described by a regular Matlab data structure, documenting their associated command and options, as well as their input, output, and cleaned-up files. The PSOM execution engine provides a number of automated services: (1) it executes jobs in parallel on a local computing facility as long as the dependencies between jobs allow for it and sufficient resources are available; (2) it generates a comprehensive record of the pipeline stages and the history of execution, which is detailed enough to fully reproduce the analysis; (3) if an analysis is started multiple times, it executes only the parts of the pipeline that need to be reprocessed. PSOM is distributed under an open-source MIT license and can be used without restriction for academic or commercial projects. The package has no external dependencies besides Matlab or Octave, is straightforward to install, and supports a variety of operating systems (Linux, Windows, Mac). We ran several benchmark experiments on a public database including 200 subjects, using a pipeline for the preprocessing of functional magnetic resonance images (fMRI). The benchmark results showed that PSOM is a powerful solution for the analysis of large databases using local or distributed computing resources. PMID:22493575

  16. Scenarios of large mammal loss in Europe for the 21st century.

    PubMed

    Rondinini, Carlo; Visconti, Piero

    2015-08-01

    Distributions and populations of large mammals are declining globally, leading to an increase in their extinction risk. We forecasted the distribution of extant European large mammals (17 carnivores and 10 ungulates) based on 2 Rio+20 scenarios of socioeconomic development: business as usual and reduced impact through changes in human consumption of natural resources. These scenarios are linked to scenarios of land-use change and climate change through the spatial allocation of land conversion up to 2050. We used a hierarchical framework to forecast the extent and distribution of mammal habitat based on species' habitat preferences (as described in the International Union for Conservation of Nature Red List database) within a suitable climatic space fitted to the species' current geographic range. We analyzed the geographic and taxonomic variation of habitat loss for large mammals and the potential effect of the reduced impact policy on loss mitigation. Averaging across scenarios, European large mammals were predicted to lose 10% of their habitat by 2050 (25% in the worst-case scenario). Predicted loss was much higher for species in northwestern Europe, where habitat is expected to be lost due to climate and land-use change. Change in human consumption patterns was predicted to substantially improve the conservation of habitat for European large mammals, but not enough to reduce extinction risk if species cannot adapt locally to climate change or disperse. © 2015 Society for Conservation Biology.

  17. Spatiotemporal database of US congressional elections, 1896–2014

    PubMed Central

    Wolf, Levi John

    2017-01-01

    High-quality historical data about US Congressional elections has long provided common ground for electoral studies. However, advances in geographic information science have recently made it efficient to compile, distribute, and analyze large spatio-temporal data sets on the structure of US Congressional districts. A single spatio-temporal data set that relates US Congressional election results to the spatial extent of the constituencies has not yet been developed. To address this, existing high-quality data sets of elections returns were combined with a spatiotemporal data set on Congressional district boundaries to generate a new spatio-temporal database of US Congressional election results that are explicitly linked to the geospatial data about the districts themselves. PMID:28809849

  18. The Advanced Composition Explorer Shock Database and Application to Particle Acceleration Theory

    NASA Technical Reports Server (NTRS)

    Parker, L. Neergaard; Zank, G. P.

    2015-01-01

    The theory of particle acceleration via diffusive shock acceleration (DSA) has been studied in depth by Gosling et al. (1981), van Nes et al. (1984), Mason (2000), Desai et al. (2003), Zank et al. (2006), among many others. Recently, Parker and Zank (2012, 2014) and Parker et al. (2014) using the Advanced Composition Explorer (ACE) shock database at 1 AU explored two questions: does the upstream distribution alone have enough particles to account for the accelerated downstream distribution and can the slope of the downstream accelerated spectrum be explained using DSA? As was shown in this research, diffusive shock acceleration can account for a large population of the shocks. However, Parker and Zank (2012, 2014) and Parker et al. (2014) used a subset of the larger ACE database. Recently, work has successfully been completed that allows for the entire ACE database to be considered in a larger statistical analysis. We explain DSA as it applies to single and multiple shocks and the shock criteria used in this statistical analysis. We calculate the expected injection energy via diffusive shock acceleration given upstream parameters defined from the ACE Solar Wind Electron, Proton, and Alpha Monitor (SWEPAM) data to construct the theoretical upstream distribution. We show the comparison of shock strength derived from diffusive shock acceleration theory to observations in the 50 keV to 5 MeV range from an instrument on ACE. Parameters such as shock velocity, shock obliquity, particle number, and time between shocks are considered. This study is further divided into single and multiple shock categories, with an additional emphasis on forward-forward multiple shock pairs. Finally with regard to forward-forward shock pairs, results comparing injection energies of the first shock, second shock, and second shock with previous energetic population will be given.
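
    For background, the standard test-particle DSA result relates the power-law index of the accelerated distribution to the shock compression ratio; this is the textbook relation rather than a result quoted from the study itself:

        % Test-particle diffusive shock acceleration (standard background result)
        f(p) \propto p^{-q}, \qquad q = \frac{3r}{r-1}, \qquad r = \frac{u_1}{u_2}

    Here u_1 and u_2 are the upstream and downstream flow speeds in the shock frame and r is the compression ratio; a strong shock (r = 4) gives q = 4, while weaker shocks give steeper spectra, which is what makes the comparison of predicted and observed downstream slopes meaningful.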

  19. The Advanced Composition Explorer Shock Database and Application to Particle Acceleration Theory

    NASA Technical Reports Server (NTRS)

    Parker, L. Neergaard; Zank, G. P.

    2015-01-01

    The theory of particle acceleration via diffusive shock acceleration (DSA) has been studied in depth by Gosling et al. (1981), van Nes et al. (1984), Mason (2000), Desai et al. (2003), Zank et al. (2006), among many others. Recently, Parker and Zank (2012, 2014) and Parker et al. (2014) using the Advanced Composition Explorer (ACE) shock database at 1 AU explored two questions: does the upstream distribution alone have enough particles to account for the accelerated downstream distribution and can the slope of the downstream accelerated spectrum be explained using DSA? As was shown in this research, diffusive shock acceleration can account for a large population of the shocks. However, Parker and Zank (2012, 2014) and Parker et al. (2014) used a subset of the larger ACE database. Recently, work has successfully been completed that allows for the entire ACE database to be considered in a larger statistical analysis. We explain DSA as it applies to single and multiple shocks and the shock criteria used in this statistical analysis. We calculate the expected injection energy via diffusive shock acceleration given upstream parameters defined from the ACE Solar Wind Electron, Proton, and Alpha Monitor (SWEPAM) data to construct the theoretical upstream distribution. We show the comparison of shock strength derived from diffusive shock acceleration theory to observations in the 50 keV to 5 MeV range from an instrument on ACE. Parameters such as shock velocity, shock obliquity, particle number, and time between shocks are considered. This study is further divided into single and multiple shock categories, with an additional emphasis on forward-forward multiple shock pairs. Finally, with regard to forward-forward shock pairs, results comparing injection energies of the first shock, second shock, and second shock with previous energetic population will be given.

  20. Column Store for GWAC: A High-cadence, High-density, Large-scale Astronomical Light Curve Pipeline and Distributed Shared-nothing Database

    NASA Astrophysics Data System (ADS)

    Wan, Meng; Wu, Chao; Wang, Jing; Qiu, Yulei; Xin, Liping; Mullender, Sjoerd; Mühleisen, Hannes; Scheers, Bart; Zhang, Ying; Nes, Niels; Kersten, Martin; Huang, Yongpan; Deng, Jinsong; Wei, Jianyan

    2016-11-01

    The ground-based wide-angle camera array (GWAC), a part of the SVOM space mission, will search for various types of optical transients by continuously imaging a field of view (FOV) of 5000 degrees² every 15 s. Each exposure consists of 36 × 4k × 4k pixels, typically resulting in 36 × ~175,600 extracted sources. For a modern time-domain astronomy project like GWAC, which produces massive amounts of data with a high cadence, it is challenging to search for short timescale transients in both real-time and archived data, and to build long-term light curves for variable sources. Here, we develop a high-cadence, high-density light curve pipeline (HCHDLP) to process the GWAC data in real-time, and design a distributed shared-nothing database to manage the massive amount of archived data which will be used to generate a source catalog with more than 100 billion records during 10 years of operation. First, we develop HCHDLP based on the column-store DBMS of MonetDB, taking advantage of MonetDB’s high performance when applied to massive data processing. To realize the real-time functionality of HCHDLP, we optimize the pipeline in its source association function, including both time and space complexity from outside the database (SQL semantic) and inside (RANGE-JOIN implementation), as well as in its strategy of building complex light curves. The optimized source association function is accelerated by three orders of magnitude. Second, we build a distributed database using a two-level time partitioning strategy via the MERGE TABLE and REMOTE TABLE technology of MonetDB. Intensive tests validate that our database architecture is able to achieve both linear scalability in response time and concurrent access by multiple users. In summary, our studies provide guidance for a solution to GWAC in real-time data processing and management of massive data.
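
    A rough sketch of the time-partitioning idea follows. It is only an illustration under assumptions, not the GWAC schema: pymonetdb is used simply as a MonetDB client, the table and column names are invented, and the second partitioning level (grouping slices onto remote nodes via REMOTE TABLE) is only indicated in a comment.

        # Sketch: register a new time partition under a MonetDB merge table (not GWAC code).
        import pymonetdb

        conn = pymonetdb.connect(username="monetdb", password="monetdb",
                                 hostname="localhost", database="gwac")
        cur = conn.cursor()

        # Top-level merge table seen by queries; partitions are added as they are created.
        cur.execute("""CREATE MERGE TABLE lightcurve
                       (source_id BIGINT, mjd DOUBLE, mag REAL, mag_err REAL)""")

        # One physical table per time slice (e.g. per night); a second level could group
        # nights onto remote nodes using REMOTE TABLE for the distributed layout.
        cur.execute("""CREATE TABLE lightcurve_20161101
                       (source_id BIGINT, mjd DOUBLE, mag REAL, mag_err REAL)""")
        cur.execute("ALTER TABLE lightcurve ADD TABLE lightcurve_20161101")

        conn.commit()

    Queries against the merge table then transparently cover all registered slices, which is what allows long-term light curves to be built without the application tracking partitions itself.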

  1. Large-scale patterns of insect and disease activity in the Conterminous United States and Alaska from the National Insect and Disease Detection Survey Database, 2007 and 2008

    Treesearch

    Kevin M. Potter

    2012-01-01

    Analyzing patterns of forest pest infestation is necessary for monitoring the health of forested ecosystems because of the impacts that insects and diseases can have on forest structure, composition, biodiversity, and species distributions (Castello and others 1995). In particular, introduced nonnative insects and diseases can extensively damage the diversity, ecology...

  2. Estimating haplotype frequencies by combining data from large DNA pools with database information.

    PubMed

    Gasbarra, Dario; Kulathinal, Sangita; Pirinen, Matti; Sillanpää, Mikko J

    2011-01-01

    We assume that allele frequency data have been extracted from several large DNA pools, each containing genetic material of up to hundreds of sampled individuals. Our goal is to estimate the haplotype frequencies among the sampled individuals by combining the pooled allele frequency data with prior knowledge about the set of possible haplotypes. Such prior information can be obtained, for example, from a database such as HapMap. We present a Bayesian haplotyping method for pooled DNA based on a continuous approximation of the multinomial distribution. The proposed method is applicable when the sizes of the DNA pools and/or the number of considered loci exceed the limits of several earlier methods. In the example analyses, the proposed model clearly outperforms a deterministic greedy algorithm on real data from the HapMap database. With a small number of loci, the performance of the proposed method is similar to that of an EM-algorithm, which uses a multinormal approximation for the pooled allele frequencies, but which does not utilize prior information about the haplotypes. The method has been implemented using Matlab and the code is available upon request from the authors.
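
    The link between pooled allele counts and haplotype frequencies that such a model exploits can be written compactly; the notation below is chosen for illustration and is not taken from the paper:

        % Expected count of allele a at locus l in a pool of n diploid individuals
        \mathrm{E}[c_{l,a}] = 2n \sum_{h \in \mathcal{H}} \mathbf{1}\{h_l = a\}\, \theta_h

    Here \mathcal{H} is the set of possible haplotypes (for example, taken from HapMap), \theta_h are the haplotype frequencies to be estimated, and c_{l,a} is the observed allele count in the pool; the observed counts are then modeled around these expectations through a continuous approximation of the multinomial distribution.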

  3. The BioMart community portal: an innovative alternative to large, centralized data repositories

    PubMed Central

    Smedley, Damian; Haider, Syed; Durinck, Steffen; Pandini, Luca; Provero, Paolo; Allen, James; Arnaiz, Olivier; Awedh, Mohammad Hamza; Baldock, Richard; Barbiera, Giulia; Bardou, Philippe; Beck, Tim; Blake, Andrew; Bonierbale, Merideth; Brookes, Anthony J.; Bucci, Gabriele; Buetti, Iwan; Burge, Sarah; Cabau, Cédric; Carlson, Joseph W.; Chelala, Claude; Chrysostomou, Charalambos; Cittaro, Davide; Collin, Olivier; Cordova, Raul; Cutts, Rosalind J.; Dassi, Erik; Genova, Alex Di; Djari, Anis; Esposito, Anthony; Estrella, Heather; Eyras, Eduardo; Fernandez-Banet, Julio; Forbes, Simon; Free, Robert C.; Fujisawa, Takatomo; Gadaleta, Emanuela; Garcia-Manteiga, Jose M.; Goodstein, David; Gray, Kristian; Guerra-Assunção, José Afonso; Haggarty, Bernard; Han, Dong-Jin; Han, Byung Woo; Harris, Todd; Harshbarger, Jayson; Hastings, Robert K.; Hayes, Richard D.; Hoede, Claire; Hu, Shen; Hu, Zhi-Liang; Hutchins, Lucie; Kan, Zhengyan; Kawaji, Hideya; Keliet, Aminah; Kerhornou, Arnaud; Kim, Sunghoon; Kinsella, Rhoda; Klopp, Christophe; Kong, Lei; Lawson, Daniel; Lazarevic, Dejan; Lee, Ji-Hyun; Letellier, Thomas; Li, Chuan-Yun; Lio, Pietro; Liu, Chu-Jun; Luo, Jie; Maass, Alejandro; Mariette, Jerome; Maurel, Thomas; Merella, Stefania; Mohamed, Azza Mostafa; Moreews, Francois; Nabihoudine, Ibounyamine; Ndegwa, Nelson; Noirot, Céline; Perez-Llamas, Cristian; Primig, Michael; Quattrone, Alessandro; Quesneville, Hadi; Rambaldi, Davide; Reecy, James; Riba, Michela; Rosanoff, Steven; Saddiq, Amna Ali; Salas, Elisa; Sallou, Olivier; Shepherd, Rebecca; Simon, Reinhard; Sperling, Linda; Spooner, William; Staines, Daniel M.; Steinbach, Delphine; Stone, Kevin; Stupka, Elia; Teague, Jon W.; Dayem Ullah, Abu Z.; Wang, Jun; Ware, Doreen; Wong-Erasmus, Marie; Youens-Clark, Ken; Zadissa, Amonida; Zhang, Shi-Jian; Kasprzyk, Arek

    2015-01-01

    The BioMart Community Portal (www.biomart.org) is a community-driven effort to provide a unified interface to biomedical databases that are distributed worldwide. The portal provides access to numerous database projects supported by 30 scientific organizations. It includes over 800 different biological datasets spanning genomics, proteomics, model organisms, cancer data, ontology information and more. All resources available through the portal are independently administered and funded by their host organizations. The BioMart data federation technology provides a unified interface to all the available data. The latest version of the portal comes with many new databases that have been created by our ever-growing community. It also comes with better support and extensibility for data analysis and visualization tools. A new addition to our toolbox, the enrichment analysis tool is now accessible through graphical and web service interface. The BioMart community portal averages over one million requests per day. Building on this level of service and the wealth of information that has become available, the BioMart Community Portal has introduced a new, more scalable and cheaper alternative to the large data stores maintained by specialized organizations. PMID:25897122

  4. Performance analysis of static locking in replicated distributed database systems

    NASA Technical Reports Server (NTRS)

    Kuang, Yinghong; Mukkamala, Ravi

    1991-01-01

    Data replication and transaction deadlocks can severely affect the performance of distributed database systems. Many current evaluation techniques ignore these aspects because it is difficult to evaluate them through analysis and time-consuming to evaluate them through simulation. A technique is used that combines simulation and analysis to closely illustrate the impact of deadlock and to evaluate the performance of replicated distributed databases with both shared and exclusive locks.

  5. TCOF1 mutation database: novel mutation in the alternatively spliced exon 6A and update in mutation nomenclature.

    PubMed

    Splendore, Alessandra; Fanganiello, Roberto D; Masotti, Cibele; Morganti, Lucas S C; Passos-Bueno, M Rita

    2005-05-01

    Recently, a novel exon was described in TCOF1 that, although alternatively spliced, is included in the major protein isoform. In addition, most published mutations in this gene do not conform to current mutation nomenclature guidelines. Given these observations, we developed an online database of TCOF1 mutations in which all the reported mutations are renamed according to standard recommendations and in reference to the genomic and novel cDNA reference sequences (www.genoma.ib.usp.br/TCOF1_database). We also report in this work: 1) results of the first screening for large deletions in TCOF1 by Southern blot in patients without mutation detected by direct sequencing; 2) the identification of the first pathogenic mutation in the newly described exon 6A; and 3) statistical analysis of pathogenic mutations and polymorphism distribution throughout the gene.

  6. Heterogeneous distributed query processing: The DAVID system

    NASA Technical Reports Server (NTRS)

    Jacobs, Barry E.

    1985-01-01

    The objective of the Distributed Access View Integrated Database (DAVID) project is the development of an easy to use computer system with which NASA scientists, engineers and administrators can uniformly access distributed heterogeneous databases. Basically, DAVID will be a database management system that sits alongside already existing database and file management systems. Its function is to enable users to access the data in other languages and file systems without having to learn the data manipulation languages. Given here is an outline of a talk on the DAVID project and several charts.

  7. Unleashing spatially distributed ecohydrology modeling using Big Data tools

    NASA Astrophysics Data System (ADS)

    Miles, B.; Idaszak, R.

    2015-12-01

    Physically based spatially distributed ecohydrology models are useful for answering science and management questions related to the hydrology and biogeochemistry of prairie, savanna, forested, as well as urbanized ecosystems. However, these models can produce hundreds of gigabytes of spatial output for a single model run over decadal time scales when run at regional spatial scales and moderate spatial resolutions (~100-km²+ at 30-m spatial resolution) or when run for small watersheds at high spatial resolutions (~1-km² at 3-m spatial resolution). Numerical data formats such as HDF5 can store arbitrarily large datasets. However, even in HPC environments, there are practical limits on the size of single files that can be stored and reliably backed up. Even when such large datasets can be stored, querying and analyzing these data can suffer from poor performance due to memory limitations and I/O bottlenecks, for example on single workstations where memory and bandwidth are limited, or in HPC environments where data are stored separately from computational nodes. The difficulty of storing and analyzing spatial data from ecohydrology models limits our ability to harness these powerful tools. Big Data tools such as distributed databases have the potential to surmount the data storage and analysis challenges inherent to large spatial datasets. Distributed databases solve these problems by storing data close to computational nodes while enabling horizontal scalability and fault tolerance. Here we present the architecture of and preliminary results from PatchDB, a distributed datastore for managing spatial output from the Regional Hydro-Ecological Simulation System (RHESSys). The initial version of PatchDB uses message queueing to asynchronously write RHESSys model output to an Apache Cassandra cluster. Once stored in the cluster, these data can be efficiently queried to quickly produce both spatial visualizations for a particular variable (e.g. maps and animations), as well as point time series of arbitrary variables at arbitrary points in space within a watershed or river basin. By treating ecohydrology modeling as a Big Data problem, we hope to provide a platform for answering transformative science and management questions related to water quantity and quality in a world of non-stationary climate.
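
    A minimal sketch of the kind of time-series layout such a datastore might use is shown below. It is an illustration under assumptions, not PatchDB itself: the keyspace, table, and column names are invented, and only the cassandra-driver Python client is assumed.

        # Sketch of a Cassandra time-series layout for per-patch model output (not PatchDB).
        from datetime import datetime
        from cassandra.cluster import Cluster

        cluster = Cluster(["127.0.0.1"])
        session = cluster.connect()

        session.execute("""CREATE KEYSPACE IF NOT EXISTS rhessys
            WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}""")
        session.execute("""CREATE TABLE IF NOT EXISTS rhessys.patch_output (
            variable text, patch_id bigint, ts timestamp, value double,
            PRIMARY KEY ((variable, patch_id), ts))""")

        # Writes can be issued asynchronously, e.g. from a message-queue consumer.
        insert = session.prepare(
            "INSERT INTO rhessys.patch_output (variable, patch_id, ts, value) VALUES (?, ?, ?, ?)")
        session.execute(insert, ("streamflow", 42, datetime(2010, 6, 1), 1.37))

        # Point time series for one variable at one patch: a single-partition query.
        rows = session.execute(
            "SELECT ts, value FROM rhessys.patch_output WHERE variable='streamflow' AND patch_id=42")
        for row in rows:
            print(row.ts, row.value)

    Partitioning by (variable, patch_id) with time as the clustering column is one way to make point time-series queries hit a single partition while spreading the write load across the cluster.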

  8. Performance Studies on Distributed Virtual Screening

    PubMed Central

    Krüger, Jens; de la Garza, Luis; Kohlbacher, Oliver; Nagel, Wolfgang E.

    2014-01-01

    Virtual high-throughput screening (vHTS) is an invaluable method in modern drug discovery. It permits screening large datasets or databases of chemical structures for those structures that may bind to a drug target. Virtual screening is typically performed by docking code, which often runs sequentially. Processing of huge vHTS datasets can be parallelized by chunking the data because individual docking runs are independent of each other. The goal of this work is to find an optimal splitting that maximizes the speedup while considering overhead and available cores on Distributed Computing Infrastructures (DCIs). We have conducted thorough performance studies accounting not only for the runtime of the docking itself, but also for structure preparation. Performance studies were conducted via the workflow-enabled science gateway MoSGrid (Molecular Simulation Grid). As input we used benchmark datasets for protein kinases. Our performance studies show that docking workflows can be made to scale almost linearly up to 500 concurrent processes distributed even over large DCIs, thus accelerating vHTS campaigns significantly. PMID:25032219
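
    The trade-off behind the optimal splitting (more chunks expose more parallelism but each chunk pays submission and preparation overhead) can be sketched with a toy wall-time model. The formula and numbers below are illustrative assumptions, not measurements from the study.

        # Toy model of chunked screening wall time (illustrative, not from the MoSGrid study).
        import math

        def wall_time(n_ligands, chunk_size, cores, t_dock=10.0, t_overhead=60.0):
            """Estimate wall time: chunks run 'cores' at a time; each chunk pays a fixed
            submission/preparation overhead plus docking time proportional to its size."""
            n_chunks = math.ceil(n_ligands / chunk_size)
            per_chunk = t_overhead + chunk_size * t_dock
            waves = math.ceil(n_chunks / cores)
            return waves * per_chunk

        def speedup(n_ligands, chunk_size, cores, **kw):
            serial = n_ligands * kw.get("t_dock", 10.0)
            return serial / wall_time(n_ligands, chunk_size, cores, **kw)

        # Scan chunk sizes for a hypothetical 100,000-ligand library on 500 cores.
        for c in (10, 50, 200, 1000):
            print(c, round(speedup(100_000, c, 500), 1))

    In this toy model very small chunks waste time on overhead and very large chunks leave cores idle, so the speedup peaks at an intermediate chunk size, which is the effect the performance study quantifies on real infrastructure.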

  9. Assessing species distribution using Google Street View: a pilot study with the Pine Processionary Moth.

    PubMed

    Rousselet, Jérôme; Imbert, Charles-Edouard; Dekri, Anissa; Garcia, Jacques; Goussard, Francis; Vincent, Bruno; Denux, Olivier; Robinet, Christelle; Dorkeld, Franck; Roques, Alain; Rossi, Jean-Pierre

    2013-01-01

    Mapping species spatial distribution using spatial inference and prediction requires a lot of data. Occurrence data are generally not easily available from the literature and are very time-consuming to collect in the field. For that reason, we designed a survey to explore to what extent large-scale databases such as Google Maps and Google Street View could be used to derive valid occurrence data. We worked with the Pine Processionary Moth (PPM) Thaumetopoea pityocampa because the larvae of that moth build silk nests that are easily visible. The presence of the species at one location can therefore be inferred from visual records derived from the panoramic views available from Google Street View. We designed a standardized procedure for evaluating the presence of the PPM on a sampling grid covering the landscape under study. The outputs were compared to field data. We investigated two landscapes using grids of different extent and mesh size. Data derived from Google Street View were highly similar to field data in the large-scale analysis based on a square grid with a mesh of 16 km (96% of matching records). Using a 2 km mesh size led to a strong divergence between field and Google-derived data (46% of matching records). We conclude that the Google database might provide useful occurrence data for mapping the distribution of species whose presence can be visually evaluated, such as the PPM. However, the accuracy of the output strongly depends on the spatial scales considered and on the sampling grid used. Other factors, such as the coverage of the Google Street View network with regard to sampling grid size and the spatial distribution of host trees with regard to the road network, may also be determinant.
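
    The comparison step (matching Google-derived and field presence/absence records cell by cell on the sampling grid) amounts to a simple agreement rate. The sketch below is only an illustration; the grid-cell keys and example values are invented.

        # Sketch: percentage of grid cells where Google-derived and field surveys agree.
        def matching_rate(google_records, field_records):
            """Both inputs map grid-cell id -> True/False (PPM nests seen or not)."""
            cells = set(google_records) & set(field_records)
            matches = sum(google_records[c] == field_records[c] for c in cells)
            return 100.0 * matches / len(cells)

        google = {(1, 1): True, (1, 2): False, (2, 1): True,  (2, 2): False}
        field  = {(1, 1): True, (1, 2): False, (2, 1): False, (2, 2): False}
        print(f"{matching_rate(google, field):.0f}% matching records")  # 75%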

  10. Assessing Species Distribution Using Google Street View: A Pilot Study with the Pine Processionary Moth

    PubMed Central

    Dekri, Anissa; Garcia, Jacques; Goussard, Francis; Vincent, Bruno; Denux, Olivier; Robinet, Christelle; Dorkeld, Franck; Roques, Alain; Rossi, Jean-Pierre

    2013-01-01

    Mapping species spatial distribution using spatial inference and prediction requires a lot of data. Occurrence data are generally not easily available from the literature and are very time-consuming to collect in the field. For that reason, we designed a survey to explore to what extent large-scale databases such as Google Maps and Google Street View could be used to derive valid occurrence data. We worked with the Pine Processionary Moth (PPM) Thaumetopoea pityocampa because the larvae of that moth build silk nests that are easily visible. The presence of the species at one location can therefore be inferred from visual records derived from the panoramic views available from Google Street View. We designed a standardized procedure for evaluating the presence of the PPM on a sampling grid covering the landscape under study. The outputs were compared to field data. We investigated two landscapes using grids of different extent and mesh size. Data derived from Google Street View were highly similar to field data in the large-scale analysis based on a square grid with a mesh of 16 km (96% of matching records). Using a 2 km mesh size led to a strong divergence between field and Google-derived data (46% of matching records). We conclude that the Google database might provide useful occurrence data for mapping the distribution of species whose presence can be visually evaluated, such as the PPM. However, the accuracy of the output strongly depends on the spatial scales considered and on the sampling grid used. Other factors, such as the coverage of the Google Street View network with regard to sampling grid size and the spatial distribution of host trees with regard to the road network, may also be determinant. PMID:24130675

  11. CanvasDB: a local database infrastructure for analysis of targeted- and whole genome re-sequencing projects

    PubMed Central

    Ameur, Adam; Bunikis, Ignas; Enroth, Stefan; Gyllensten, Ulf

    2014-01-01

    CanvasDB is an infrastructure for management and analysis of genetic variants from massively parallel sequencing (MPS) projects. The system stores SNP and indel calls in a local database, designed to handle very large datasets, to allow for rapid analysis using simple commands in R. Functional annotations are included in the system, making it suitable for direct identification of disease-causing mutations in human exome- (WES) or whole-genome sequencing (WGS) projects. The system has a built-in filtering function implemented to simultaneously take into account variant calls from all individual samples. This enables advanced comparative analysis of variant distribution between groups of samples, including detection of candidate causative mutations within family structures and genome-wide association by sequencing. In most cases, these analyses are executed within just a matter of seconds, even when there are several hundreds of samples and millions of variants in the database. We demonstrate the scalability of canvasDB by importing the individual variant calls from all 1092 individuals present in the 1000 Genomes Project into the system, over 4.4 billion SNPs and indels in total. Our results show that canvasDB makes it possible to perform advanced analyses of large-scale WGS projects on a local server. Database URL: https://github.com/UppsalaGenomeCenter/CanvasDB PMID:25281234
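
    As a rough illustration of the kind of cross-sample filtering described above, the sketch below expresses a "present in every case sample, absent from every control sample" query in Python with SQLite. The table layout (a variants table with sample_id, chrom, pos, ref and alt columns) is a hypothetical assumption for illustration only and is not canvasDB's actual schema or R interface.

        # Minimal sketch, not canvasDB itself: find variants shared by all case
        # samples and absent from all control samples. Table and column names
        # are illustrative assumptions.
        import sqlite3

        def candidate_variants(db_path, case_ids, control_ids):
            """Return (chrom, pos, ref, alt) present in every case and in no control."""
            con = sqlite3.connect(db_path)
            in_cases = ",".join("?" * len(case_ids))
            in_ctrls = ",".join("?" * len(control_ids))
            sql = f"""
                SELECT chrom, pos, ref, alt
                FROM variants
                WHERE sample_id IN ({in_cases})
                GROUP BY chrom, pos, ref, alt
                HAVING COUNT(DISTINCT sample_id) = ?
                EXCEPT
                SELECT DISTINCT chrom, pos, ref, alt
                FROM variants
                WHERE sample_id IN ({in_ctrls})
            """
            rows = con.execute(sql, (*case_ids, len(case_ids), *control_ids)).fetchall()
            con.close()
            return rows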

  12. CanvasDB: a local database infrastructure for analysis of targeted- and whole genome re-sequencing projects.

    PubMed

    Ameur, Adam; Bunikis, Ignas; Enroth, Stefan; Gyllensten, Ulf

    2014-01-01

    CanvasDB is an infrastructure for management and analysis of genetic variants from massively parallel sequencing (MPS) projects. The system stores SNP and indel calls in a local database, designed to handle very large datasets, to allow for rapid analysis using simple commands in R. Functional annotations are included in the system, making it suitable for direct identification of disease-causing mutations in human exome- (WES) or whole-genome sequencing (WGS) projects. The system has a built-in filtering function implemented to simultaneously take into account variant calls from all individual samples. This enables advanced comparative analysis of variant distribution between groups of samples, including detection of candidate causative mutations within family structures and genome-wide association by sequencing. In most cases, these analyses are executed within just a matter of seconds, even when there are several hundreds of samples and millions of variants in the database. We demonstrate the scalability of canvasDB by importing the individual variant calls from all 1092 individuals present in the 1000 Genomes Project into the system, over 4.4 billion SNPs and indels in total. Our results show that canvasDB makes it possible to perform advanced analyses of large-scale WGS projects on a local server. Database URL: https://github.com/UppsalaGenomeCenter/CanvasDB. © The Author(s) 2014. Published by Oxford University Press.

  13. Architecture Knowledge for Evaluating Scalable Databases

    DTIC Science & Technology

    2015-01-16

    problems, arising from the proliferation of new data models and distributed technologies for building scalable, available data stores. Architects must ... longer are relational databases the de facto standard for building data repositories. Highly distributed, scalable "NoSQL" databases [11] have emerged ... This is especially challenging at the data storage layer. The multitude of competing NoSQL database technologies creates a complex and rapidly

  14. Preliminary surficial geologic map database of the Amboy 30 x 60 minute quadrangle, California

    USGS Publications Warehouse

    Bedford, David R.; Miller, David M.; Phelps, Geoffrey A.

    2006-01-01

    The surficial geologic map database of the Amboy 30x60 minute quadrangle presents characteristics of surficial materials for an area approximately 5,000 km2 in the eastern Mojave Desert of California. This map consists of new surficial mapping conducted between 2000 and 2005, as well as compilations of previous surficial mapping. Surficial geology units are mapped and described based on depositional process and age categories that reflect the mode of deposition, pedogenic effects occurring post-deposition, and, where appropriate, the lithologic nature of the material. The physical properties recorded in the database focus on those that drive hydrologic, biologic, and physical processes such as particle size distribution (PSD) and bulk density. This version of the database is distributed with point data representing locations of samples for both laboratory determined physical properties and semi-quantitative field-based information. Future publications will include the field and laboratory data as well as maps of distributed physical properties across the landscape tied to physical process models where appropriate. The database is distributed in three parts: documentation, spatial map-based data, and printable map graphics of the database. Documentation includes this file, which provides a discussion of the surficial geology and describes the format and content of the map data, a database 'readme' file, which describes the database contents, and FGDC metadata for the spatial map information. Spatial data are distributed as Arc/Info coverage in ESRI interchange (e00) format, or as tabular data in the form of DBF3-file (.DBF) file formats. Map graphics files are distributed as Postscript and Adobe Portable Document Format (PDF) files, and are appropriate for representing a view of the spatial database at the mapped scale.

  15. Modelling the distribution of domestic ducks in Monsoon Asia

    USGS Publications Warehouse

    Van Bockel, Thomas P.; Prosser, Diann; Franceschini, Gianluca; Biradar, Chandra; Wint, William; Robinson, Tim; Gilbert, Marius

    2011-01-01

    Domestic ducks are considered to be an important reservoir of highly pathogenic avian influenza (HPAI), as shown by a number of geospatial studies in which they have been identified as a significant risk factor associated with disease presence. Despite their importance in HPAI epidemiology, their large-scale distribution in Monsoon Asia is poorly understood. In this study, we created a spatial database of domestic duck census data in Asia and used it to train statistical distribution models for domestic duck distributions at a spatial resolution of 1km. The method was based on a modelling framework used by the Food and Agriculture Organisation to produce the Gridded Livestock of the World (GLW) database, and relies on stratified regression models between domestic duck densities and a set of agro-ecological explanatory variables. We evaluated different ways of stratifying the analysis and of combining the prediction to optimize the goodness of fit of the predictions. We found that domestic duck density could be predicted with reasonable accuracy (mean RMSE and correlation coefficient between log-transformed observed and predicted densities being 0.58 and 0.80, respectively), using a stratification based on livestock production systems. We tested the use of artificially degraded data on duck distributions in Thailand and Vietnam as training data, and compared the modelled outputs with the original high-resolution data. This showed, for these two countries at least, that these approaches could be used to accurately disaggregate provincial level (administrative level 1) statistical data to provide high resolution model distributions.

  16. In-database processing of a large collection of remote sensing data: applications and implementation

    NASA Astrophysics Data System (ADS)

    Kikhtenko, Vladimir; Mamash, Elena; Chubarov, Dmitri; Voronina, Polina

    2016-04-01

    Large archives of remote sensing data are now available to scientists, yet the need to work with individual satellite scenes or product files constrains studies that span a wide temporal range or spatial extent. The resources (storage capacity, computing power and network bandwidth) required for such studies are often beyond the capabilities of individual geoscientists. This problem has been tackled before in remote sensing research and has inspired several information systems. Some of them, such as NASA Giovanni [1] and Google Earth Engine, have already proved their utility for science. Analysis tasks involving large volumes of numerical data are not unique to Earth Sciences. Recent advances in data science are enabled by the development of in-database processing engines that bring processing closer to storage, use declarative query languages to facilitate parallel scalability and provide a high-level abstraction of the whole dataset. We build on the idea of bridging the gap between file archives containing remote sensing data and databases by integrating files into a relational database as foreign data sources and performing analytical processing inside the database engine. Thereby a higher-level query language can efficiently address problems of arbitrary size: from accessing the data associated with a specific pixel or a grid cell to complex aggregation over spatial or temporal extents over a large number of individual data files. This approach was implemented using PostgreSQL for a Siberian regional archive of satellite data products holding hundreds of terabytes of measurements from multiple sensors and missions taken over a decade-long span. While preserving the original storage layout, and therefore compatibility with existing applications, the in-database processing engine provides a toolkit for provisioning remote sensing data in scientific workflows and applications. The use of SQL - a widely used higher-level declarative query language - simplifies interoperability between desktop GIS, web applications and geographic web services, and interactive scientific applications (MATLAB, IPython). The system also automatically ingests direct readout data from meteorological and research satellites in near-real time, with distributed acquisition workflows managed by the Taverna workflow engine [2]. The system has demonstrated its utility in performing non-trivial analytic processing such as the computation of the Robust Satellite Technique (RST) indices [3]. It has been useful in tasks such as studying urban heat islands, analyzing patterns in the distribution of wildfire occurrences, and detecting phenomena related to seismic and earthquake activity. Initial experience has highlighted several limitations of the proposed approach, yet it has demonstrated the ability to facilitate the use of large archives of remote sensing data by geoscientists. 1. J.G. Acker, G. Leptoukh, Online analysis enhances use of NASA Earth science data. EOS Trans. AGU, 2007, 88(2), P. 14-17. 2. D. Hull, K. Wolsfencroft, R. Stevens, C. Goble, M.R. Pocock, P. Li and T. Oinn, Taverna: a tool for building and running workflows of services. Nucleic Acids Research. 2006. V. 34. P. W729-W732. 3. V. Tramutoli, G. Di Bello, N. Pergola, S. Piscitelli, Robust satellite techniques for remote sensing of seismically active areas // Annals of Geophysics. 2001. no. 44(2). P. 295-312.
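
    To make the "processing inside the database engine" idea concrete, here is a hedged sketch of what the client side of such a query could look like in Python: a single SQL aggregation is pushed to PostgreSQL instead of downloading whole scenes. The table and column names (rs_measurements, sensor, cell_id, acquired, value) and the use of psycopg2 are illustrative assumptions, not the archive's actual schema or API.

        # Hedged sketch: let PostgreSQL aggregate a grid cell's time series server-side.
        # Schema names are assumptions made for this example.
        import psycopg2

        def monthly_mean_for_cell(dsn, sensor, cell_id):
            with psycopg2.connect(dsn) as con, con.cursor() as cur:
                cur.execute(
                    """
                    SELECT date_trunc('month', acquired) AS month, avg(value)
                    FROM rs_measurements
                    WHERE sensor = %s AND cell_id = %s
                    GROUP BY month
                    ORDER BY month
                    """,
                    (sensor, cell_id),
                )
                return cur.fetchall()  # list of (month, mean value) rows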

  17. High throughput profile-profile based fold recognition for the entire human proteome.

    PubMed

    McGuffin, Liam J; Smith, Richard T; Bryson, Kevin; Sørensen, Søren-Aksel; Jones, David T

    2006-06-07

    In order to maintain the most comprehensive structural annotation databases we must carry out regular updates for each proteome using the latest profile-profile fold recognition methods. The ability to carry out these updates on demand is necessary to keep pace with the regular updates of sequence and structure databases. Providing the highest quality structural models requires the most intensive profile-profile fold recognition methods running with the very latest available sequence databases and fold libraries. However, running these methods on such a regular basis for every sequenced proteome requires large amounts of processing power. In this paper we describe and benchmark the JYDE (Job Yield Distribution Environment) system, which is a meta-scheduler designed to work above cluster schedulers, such as Sun Grid Engine (SGE) or Condor. We demonstrate the ability of JYDE to distribute the load of genomic-scale fold recognition across multiple independent Grid domains. We use the most recent profile-profile version of our mGenTHREADER software in order to annotate the latest version of the Human proteome against the latest sequence and structure databases in as short a time as possible. We show that our JYDE system is able to scale to large numbers of intensive fold recognition jobs running across several independent computer clusters. Using our JYDE system we have been able to annotate 99.9% of the protein sequences within the Human proteome in less than 24 hours, by harnessing over 500 CPUs from 3 independent Grid domains. This study clearly demonstrates the feasibility of carrying out on demand high quality structural annotations for the proteomes of major eukaryotic organisms. Specifically, we have shown that it is now possible to provide complete regular updates of profile-profile based fold recognition models for entire eukaryotic proteomes, through the use of Grid middleware such as JYDE.

  18. The ATLAS TAGS database distribution and management - Operational challenges of a multi-terabyte distributed database

    NASA Astrophysics Data System (ADS)

    Viegas, F.; Malon, D.; Cranshaw, J.; Dimitrov, G.; Nowak, M.; Nairz, A.; Goossens, L.; Gallas, E.; Gamboa, C.; Wong, A.; Vinek, E.

    2010-04-01

    The TAG files store summary event quantities that allow a quick selection of interesting events. This data will be produced at a nominal rate of 200 Hz, and is uploaded into a relational database for access from websites and other tools. The estimated database volume is 6 TB per year, making it the largest application running on the ATLAS relational databases, at CERN and at other voluntary sites. The sheer volume and high rate of production make this application a challenge to data and resource management in many aspects. This paper will focus on the operational challenges of this system. These include: uploading the data from files to CERN's and remote sites' databases; distributing the TAG metadata that is essential to guide the user through event selection; and controlling resource usage of the database, from the user query load to the strategy of cleaning and archiving of old TAG data.
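
    A quick back-of-envelope check ties the quoted numbers together; the per-record size below is an estimate implied by the 200 Hz rate and the ~6 TB/year volume, not a figure stated in the paper.

        # Rough consistency check of the figures quoted above.
        rate_hz = 200                        # nominal event rate
        seconds_per_year = 3.15e7            # one calendar year, ignoring duty cycle
        records = rate_hz * seconds_per_year          # ~6.3e9 TAG records per year
        bytes_per_record = 6e12 / records             # ~950 bytes/record implied by 6 TB/year
        print(f"{records:.1e} records/year, ~{bytes_per_record:.0f} bytes per record")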

  19. Brief Report: Databases in the Asia-Pacific Region: The Potential for a Distributed Network Approach.

    PubMed

    Lai, Edward Chia-Cheng; Man, Kenneth K C; Chaiyakunapruk, Nathorn; Cheng, Ching-Lan; Chien, Hsu-Chih; Chui, Celine S L; Dilokthornsakul, Piyameth; Hardy, N Chantelle; Hsieh, Cheng-Yang; Hsu, Chung Y; Kubota, Kiyoshi; Lin, Tzu-Chieh; Liu, Yanfang; Park, Byung Joo; Pratt, Nicole; Roughead, Elizabeth E; Shin, Ju-Young; Watcharathanakij, Sawaeng; Wen, Jin; Wong, Ian C K; Yang, Yea-Huei Kao; Zhang, Yinghong; Setoguchi, Soko

    2015-11-01

    This study describes the availability and characteristics of databases in Asian-Pacific countries and assesses the feasibility of a distributed network approach in the region. A web-based survey was conducted among investigators using healthcare databases in the Asia-Pacific countries. Potential survey participants were identified through the Asian Pharmacoepidemiology Network. Investigators from a total of 11 databases participated in the survey. Database sources included four nationwide claims databases from Japan, South Korea, and Taiwan; two nationwide electronic health records from Hong Kong and Singapore; a regional electronic health record from western China; two electronic health records from Thailand; and cancer and stroke registries from Taiwan. We identified 11 databases with capabilities for distributed network approaches. Many country-specific coding systems and terminologies have been already converted to international coding systems. The harmonization of health expenditure data is a major obstacle for future investigations attempting to evaluate issues related to medical costs.

  20. Performance analysis of static locking in replicated distributed database systems

    NASA Technical Reports Server (NTRS)

    Kuang, Yinghong; Mukkamala, Ravi

    1991-01-01

    Data replication and transaction deadlocks can severely affect the performance of distributed database systems. Many current evaluation techniques ignore these aspects because they are difficult to evaluate through analysis and time-consuming to evaluate through simulation. Here, a technique is discussed that combines simulation and analysis to closely illustrate the impact of deadlock and to evaluate the performance of replicated distributed databases with both shared and exclusive locks.

  1. A Database for Decision-Making in Training and Distributed Learning Technology

    DTIC Science & Technology

    1998-04-01

    developer must answer these questions: ♦ Who will develop the courseware? Should we outsource? ♦ What media should we use? How much will it cost? ♦ What ... to develop, the database can be useful for answering staffing questions and planning transitions to technology-assisted courses. The database ... of distributed learning curricula in comparison to traditional methods. To develop a military-wide distributed learning plan, the existing course

  2. Waiting time distributions in financial markets

    NASA Astrophysics Data System (ADS)

    Sabatelli, L.; Keating, S.; Dudley, J.; Richmond, P.

    2002-05-01

    We study waiting time distributions for data representing two completely different financial markets that have dramatically different characteristics. The first are data for the Irish market during the 19th century over the period 1850 to 1854. A total of 10 stocks out of a database of 60 are examined. The second database is for Japanese yen currency fluctuations during the latter part of the 20th century (1989-1992). The Irish stock activity was recorded on a daily basis and activity was characterised by waiting times that varied from one day to a few months. The Japanese yen data were recorded every minute over 24-hour periods and the waiting times varied from a minute to an hour or so. For both data sets, the waiting time distributions exhibit power law tails. The results for Irish daily data can be easily interpreted using the model of a continuous time random walk first proposed by Montroll and applied recently to some financial data by Mainardi, Scalas and colleagues. The yen data show quite different behaviour. For large waiting times, the Irish data exhibit a cut-off; the yen data exhibit two humps that could arise as a result of major trading centres in the world.

  3. Distribution System Upgrade Unit Cost Database

    DOE Data Explorer

    Horowitz, Kelsey

    2017-11-30

    This database contains unit cost information for different components that may be used to integrate distributed photovoltaic (D-PV) systems onto distribution systems. Some of these upgrades and costs may also apply to integration of other distributed energy resources (DER). Which components are required, and how many of each, is system-specific and should be determined by analyzing the effects of distributed PV at a given penetration level on the circuit of interest in combination with engineering assessments on the efficacy of different solutions to increase the ability of the circuit to host additional PV as desired. The current state of the distribution system should always be considered in these types of analysis. The data in this database were collected from a variety of utilities, PV developers, technology vendors, and published research reports. Where possible, we have included information on the source of each data point and relevant notes. In some cases where data provided is sensitive or proprietary, we were not able to specify the source, but provide other information that may be useful to the user (e.g. year, location where equipment was installed). NREL has carefully reviewed these sources prior to inclusion in this database. Additional information about the database, data sources, and assumptions is included in the "Unit_cost_database_guide.doc" file included in this submission. This guide provides important information on what costs are included in each entry. Please refer to this guide before using the unit cost database for any purpose.

  4. Study on parallel and distributed management of RS data based on spatial database

    NASA Astrophysics Data System (ADS)

    Chen, Yingbiao; Qian, Qinglan; Wu, Hongqiao; Liu, Shijin

    2009-10-01

    With the rapid development of current earth-observing technology, RS image data storage, management and information publication have become a bottleneck for its application and popularization. There are two prominent problems in RS image data storage and management systems. First, the background server can hardly handle the heavy processing of the large volume of RS data stored at different nodes in a distributed environment, which places a heavy burden on the background server. Second, there is no unified, standard and rational organization of multi-sensor RS data for storage and management, and much information is lost or not included at storage time. Facing these two problems, this paper puts forward a framework for a parallel and distributed RS image data management and storage system. The system aims at an RS data information system based on a parallel background server and a distributed data management system. Toward these two goals, this paper studies the following key techniques and draws some instructive conclusions. The paper puts forward a solid index of "Pyramid, Block, Layer, Epoch" according to the properties of RS image data. With this solid index mechanism, a rational organization of multi-sensor RS image data of different resolutions, areas, bands and periods is achieved. For data storage, RS data are not divided into binary large objects to be stored in a current relational database system; instead, they are reconstructed through the above solid index mechanism, and a logical image database for the RS image data files is constructed. For system architecture, this paper sets up a framework based on a parallel server of several common computers. Under this framework, the background process is divided into two parts, the common web process and the parallel process.
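
    A minimal sketch of how a "Pyramid, Block, Layer, Epoch" solid index might be keyed is given below; the field names, types and file-path layout are assumptions made for illustration, not the implementation described in the paper.

        # Sketch of a solid-index lookup keyed by (resolution, area, band, period).
        from collections import namedtuple

        TileKey = namedtuple("TileKey", ["pyramid_level", "block_row", "block_col", "layer", "epoch"])

        class SolidIndex:
            """Map a (pyramid, block, layer, epoch) key to the stored image tile."""
            def __init__(self):
                self._tiles = {}

            def put(self, key: TileKey, tile_path: str):
                self._tiles[key] = tile_path

            def get(self, key: TileKey):
                return self._tiles.get(key)

        # Example: level-3 pyramid tile at block (12, 7), near-infrared layer, epoch 2008-06
        idx = SolidIndex()
        idx.put(TileKey(3, 12, 7, "NIR", "2008-06"), "tiles/3/12_7_NIR_200806.img")
        print(idx.get(TileKey(3, 12, 7, "NIR", "2008-06")))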

  5. Patterns, biases and prospects in the distribution and diversity of Neotropical snakes

    PubMed Central

    Sawaya, Ricardo J.; Zizka, Alexander; Laffan, Shawn; Faurby, Søren; Pyron, R. Alexander; Bérnils, Renato S.; Jansen, Martin; Passos, Paulo; Prudente, Ana L. C.; Cisneros‐Heredia, Diego F.; Braz, Henrique B.; Nogueira, Cristiano de C.; Antonelli, Alexandre; Meiri, Shai

    2017-01-01

    Motivation: We generated a novel database of Neotropical snakes (one of the world's richest herpetofauna) combining the most comprehensive, manually compiled distribution dataset with publicly available data. We assess, for the first time, the diversity patterns for all Neotropical snakes as well as sampling density and sampling biases. Main types of variables contained: We compiled three databases of species occurrences: a dataset downloaded from the Global Biodiversity Information Facility (GBIF), a verified dataset built through taxonomic work and specialized literature, and a combined dataset comprising a cleaned version of the GBIF dataset merged with the verified dataset. Spatial location and grain: Neotropics, Behrmann projection equivalent to 1° × 1°. Time period: Specimens housed in museums during the last 150 years. Major taxa studied: Squamata: Serpentes. Software format: Geographical information system (GIS). Results: The combined dataset provides the most comprehensive distribution database for Neotropical snakes to date. It contains 147,515 records for 886 species across 12 families, representing 74% of all species of snakes, spanning 27 countries in the Americas. Species richness and phylogenetic diversity show overall similar patterns. Amazonia is the least sampled Neotropical region, whereas most well‐sampled sites are located near large universities and scientific collections. We provide a list and updated maps of geographical distribution of all snake species surveyed. Main conclusions: The biodiversity metrics of Neotropical snakes reflect patterns previously documented for other vertebrates, suggesting that similar factors may determine the diversity of both ectothermic and endothermic animals. We suggest conservation strategies for high‐diversity areas and sampling efforts be directed towards Amazonia and poorly known species. PMID:29398972

  6. Unsuspected Diversity of Arsenite-Oxidizing Bacteria as Revealed by Widespread Distribution of the aoxB Gene in Prokaryotes

    PubMed Central

    Heinrich-Salmeron, Audrey; Cordi, Audrey; Brochier-Armanet, Céline; Halter, David; Pagnout, Christophe; Abbaszadeh-fard, Elham; Montaut, Didier; Seby, Fabienne; Bertin, Philippe N.; Bauda, Pascale; Arsène-Ploetze, Florence

    2011-01-01

    In this study, new strains were isolated from an environment with elevated arsenic levels, Sainte-Marie-aux-Mines (France), and the diversity of aoxB genes encoding the arsenite oxidase large subunit was investigated. The distribution of bacterial aoxB genes is wider than what was previously thought. AoxB subfamilies characterized by specific signatures were identified. An exhaustive analysis of AoxB sequences from this study and from public databases shows that horizontal gene transfer has likely played a role in the spreading of aoxB in prokaryotic communities. PMID:21571879

  7. Surface Observation Climatic Summaries for Nellis AFB, Nevada

    DTIC Science & Technology

    1992-05-01

    DISTRIBUTION OF THIS DOCUMENT TO THE PUBLIC AT LARGE, OR BY THE DEFENSE TECHNICAL INFORMATION CENTER (DTIC) TO THE NATIONAL TECHNICAL INFORMATION SERVICE (NTIS). JOSEPH... DOCUMENTS FORMERLY KNOWN AS THE REVISED UNIFORM SUMMARY OF SURFACE OBSERVATIONS (RUSSWO) AND THE LIMITED SURFACE OBSERVATIONS CLIMATIC SUMMARY (LISOCS)... RECORD (POR). -SUMMARY OF DAY- (SOD) INFORMATION IS SUMMARIZED FROM ALL AVAILABLE DATA IN THE OL-A, USAFETAC CLIMATIC DATABASE. 14. SUBJECT TERMS

  8. On the frequency-magnitude distribution of converging boundaries

    NASA Astrophysics Data System (ADS)

    Marzocchi, W.; Laura, S.; Heuret, A.; Funiciello, F.

    2011-12-01

    The occurrence of the last mega-thrust earthquake in Japan has clearly highlighted the high risk posed to society by such events in terms of social and economic losses, even at a large spatial scale. The primary component of a balanced and objective mitigation of the impact of these earthquakes is the correct forecast of where such events may occur in the future. To date, there is a wide range of opinions about where mega-thrust earthquakes can occur. Here, we present a detailed statistical analysis of a database of worldwide interplate earthquakes occurring at current subduction zones. The database has been recently published in the framework of the EURYI Project 'Convergent margins and seismogenesis: defining the risk of great earthquakes by using statistical data and modelling', and it provides a unique opportunity to explore in detail the seismogenic process in subducting lithosphere. In particular, the statistical analysis of this database allows us to explore many interesting scientific issues, such as the existence of different frequency-magnitude distributions across the trenches, the quantitative characterization of subduction zones that are more likely to produce mega-thrust earthquakes, and the prominent features that characterize converging boundaries with different seismic activity. Besides their scientific importance, such issues may lead to improvements in our mega-thrust earthquake forecasting capability.
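
    For reference, frequency-magnitude distributions of this kind are conventionally summarized by the Gutenberg-Richter relation, $\log_{10} N(\ge M) = a - bM$, where $N(\ge M)$ is the number of events with magnitude at least $M$ and the $b$-value is the parameter usually compared across subduction zones. The abstract does not spell out this form, so it is quoted here only as the standard reference point for such analyses.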

  9. Resident database interfaces to the DAVID system, a heterogeneous distributed database management system

    NASA Technical Reports Server (NTRS)

    Moroh, Marsha

    1988-01-01

    A methodology for building interfaces of resident database management systems to a heterogeneous distributed database management system under development at NASA, the DAVID system, was developed. The feasibility of that methodology was demonstrated by construction of the software necessary to perform the interface task. The interface terminology developed in the course of this research is presented. The work performed and the results are summarized.

  10. Telecommunications issues of intelligent database management for ground processing systems in the EOS era

    NASA Technical Reports Server (NTRS)

    Touch, Joseph D.

    1994-01-01

    Future NASA earth science missions, including the Earth Observing System (EOS), will be generating vast amounts of data that must be processed and stored at various locations around the world. Here we present a stepwise-refinement of the intelligent database management (IDM) of the distributed active archive center (DAAC - one of seven regionally-located EOSDIS archive sites) architecture, to showcase the telecommunications issues involved. We develop this architecture into a general overall design. We show that the current evolution of protocols is sufficient to support IDM at Gbps rates over large distances. We also show that network design can accommodate a flexible data ingestion storage pipeline and a user extraction and visualization engine, without interference between the two.

  11. Extreme Precipitation and High-Impact Landslides

    NASA Technical Reports Server (NTRS)

    Kirschbaum, Dalia; Adler, Robert; Huffman, George; Peters-Lidard, Christa

    2012-01-01

    It is well known that extreme or prolonged rainfall is the dominant trigger of landslides; however, there remain large uncertainties in characterizing the distribution of these hazards and meteorological triggers at the global scale. Researchers have evaluated the spatiotemporal distribution of extreme rainfall and landslides at local and regional scales primarily using in situ data, yet few studies have mapped rainfall-triggered landslide distribution globally due to the dearth of landslide data and consistent precipitation information. This research uses a newly developed Global Landslide Catalog (GLC) and a 13-year satellite-based precipitation record from Tropical Rainfall Measuring Mission (TRMM) data. For the first time, these two unique products provide the foundation to quantitatively evaluate the co-occurrence of precipitation and rainfall-triggered landslides globally. The GLC, available from 2007 to the present, contains information on reported rainfall-triggered landslide events around the world using online media reports, disaster databases, etc. When evaluating this database, we observed that 2010 had a large number of high-impact landslide events relative to previous years. This study considers how variations in extreme and prolonged satellite-based rainfall are related to the distribution of landslides over the same time scales for three active landslide areas: Central America, the Himalayan Arc, and central-eastern China. Several test statistics confirm that TRMM rainfall generally scales with the observed increase in landslide reports and fatal events for 2010 and previous years over each region. These findings suggest that the co-occurrence of satellite precipitation and landslide reports may serve as a valuable indicator for characterizing the spatiotemporal distribution of landslide-prone areas in order to establish a global rainfall-triggered landslide climatology. This research also considers the sources for this extreme rainfall, citing teleconnections from ENSO as likely contributors to regional precipitation variability. This work demonstrates the potential for using satellite-based precipitation estimates to identify potentially active landslide areas at the global scale in order to improve landslide cataloging and quantify landslide triggering at daily, monthly and yearly time scales.

  12. CMO: Cruise Metadata Organizer for JAMSTEC Research Cruises

    NASA Astrophysics Data System (ADS)

    Fukuda, K.; Saito, H.; Hanafusa, Y.; Vanroosebeke, A.; Kitayama, T.

    2011-12-01

    JAMSTEC's Data Research Center for Marine-Earth Sciences manages and distributes a wide variety of observational data and samples obtained from JAMSTEC research vessels and deep sea submersibles. Generally, metadata are essential to identify how data and samples were obtained. In JAMSTEC, cruise metadata include cruise information such as cruise ID, name of vessel and research theme, and diving information such as dive number, name of submersible and position of diving point. They are submitted by chief scientists of research cruises in the Microsoft Excel spreadsheet format, and registered into a data management database to confirm receipt of observational data files, cruise summaries, and cruise reports. The cruise metadata are also published via "JAMSTEC Data Site for Research Cruises" within two months after the end of the cruise. Furthermore, these metadata are distributed with observational data, images and samples via several data and sample distribution websites after a publication moratorium period. However, there are two operational issues in the metadata publishing process. One is the duplication of effort and asynchronous metadata across multiple distribution websites, due to manual metadata entry into individual websites by administrators. The other is the differing data types or representations of metadata in each website. To solve those problems, we have developed a cruise metadata organizer (CMO) which allows cruise metadata to be connected from the data management database to several distribution websites. CMO is comprised of three components: an Extensible Markup Language (XML) database, Enterprise Application Integration (EAI) software, and a web-based interface. The XML database is used because of its flexibility for any change of metadata. Daily differential uptake of metadata from the data management database to the XML database is automatically processed via the EAI software. Some metadata are entered into the XML database using the web-based interface by a metadata editor in CMO as needed. Then daily differential uptake of metadata from the XML database to databases in several distribution websites is automatically processed using a convertor defined by the EAI software. Currently, CMO is available for three distribution websites: "Deep Sea Floor Rock Sample Database GANSEKI", "Marine Biological Sample Database", and "JAMSTEC E-library of Deep-sea Images". CMO is planned to provide "JAMSTEC Data Site for Research Cruises" with metadata in the future.

  13. Mynodbcsv: lightweight zero-config database solution for handling very large CSV files.

    PubMed

    Adaszewski, Stanisław

    2014-01-01

    Volumes of data used in science and industry are growing rapidly. When researchers face the challenge of analyzing them, their format is often the first obstacle. Lack of standardized ways of exploring different data layouts requires an effort each time to solve the problem from scratch. The possibility to access data in a rich, uniform manner, e.g. using Structured Query Language (SQL), would offer expressiveness and user-friendliness. Comma-separated values (CSV) are one of the most common data storage formats. Despite its simplicity, handling it becomes non-trivial as file size grows. Importing CSVs into existing databases is time-consuming and troublesome, or even impossible if the horizontal dimension reaches thousands of columns. Most databases are optimized for handling a large number of rows rather than columns; therefore, performance for datasets with non-typical layouts is often unacceptable. Other challenges include schema creation, updates and repeated data imports. To address the above-mentioned problems, I present a system for accessing very large CSV-based datasets by means of SQL. It's characterized by: a "no copy" approach--data stay mostly in the CSV files; "zero configuration"--no need to specify a database schema; written in C++, with boost [1], SQLite [2] and Qt [3], it doesn't require installation and has a very small size; query rewriting, dynamic creation of indices for appropriate columns and static data retrieval directly from CSV files ensure efficient plan execution; effortless support for millions of columns; due to per-value typing, using mixed text/number data is easy; a very simple network protocol provides an efficient interface for MATLAB and reduces implementation time for other languages. The software is available as freeware along with educational videos on its website [4]. It doesn't need any prerequisites to run, as all of the libraries are included in the distribution package. I test it against existing database solutions using a battery of benchmarks and discuss the results.
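
    For readers who want to see what "SQL over CSV" looks like in practice, the sketch below copies a CSV file into an in-memory SQLite table and queries it with SQL. Note that this deliberately differs from Mynodbcsv's own "no copy" C++ engine; it only illustrates the query interface, and the file and column names in the usage comment are hypothetical.

        # Illustration only: expose a CSV file to SQL queries via in-memory SQLite.
        import csv, sqlite3

        def open_csv_as_sql(path, table="data"):
            con = sqlite3.connect(":memory:")
            with open(path, newline="") as fh:
                reader = csv.reader(fh)
                header = next(reader)
                cols = ", ".join(f'"{c}"' for c in header)          # untyped columns
                placeholders = ", ".join("?" * len(header))
                con.execute(f'CREATE TABLE {table} ({cols})')
                con.executemany(f'INSERT INTO {table} VALUES ({placeholders})', reader)
            return con

        # Hypothetical usage:
        # con = open_csv_as_sql("measurements.csv")
        # rows = con.execute('SELECT "subject", avg("score") FROM data GROUP BY "subject"').fetchall()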

  14. WikiPEATia - a web based platform for assembling peatland data through ‘crowd sourcing’

    NASA Astrophysics Data System (ADS)

    Wisser, D.; Glidden, S.; Fieseher, C.; Treat, C. C.; Routhier, M.; Frolking, S. E.

    2009-12-01

    The Earth System Science community is realizing that peatlands are an important and unique terrestrial ecosystem that has not yet been well-integrated into large-scale earth system analyses. A major hurdle is the lack of accessible geospatial data on peatland distribution, coupled with data on peatland properties (e.g., vegetation composition, peat depth, basal dates, soil chemistry, peatland class) at the global scale. These data, however, are available at the local scale. Although a comprehensive global database on peatlands probably lags similar data on more economically important ecosystems such as forests, grasslands, and croplands, a large amount of field data have been collected over the past several decades. A few efforts have been made to map peatlands at large scales, but the existing data have either not been assembled into a single, publicly accessible geospatial database or do not depict the level of detail needed by the Earth System Science community. A global peatland database would contribute to advances in a number of research fields such as hydrology, vegetation and ecosystem modeling, permafrost modeling, and earth system modeling. We present a Web 2.0 approach that uses state-of-the-art web server and innovative online mapping technologies and is designed to create such a global database through ‘crowd-sourcing’. Primary functions of the online system include form-driven textual user input of peatland research metadata, spatial data input of peatland areas via a mapping interface, database editing and querying capabilities, as well as advanced visualization and data analysis tools. WikiPEATia provides an integrated information technology platform for assembling, integrating, and posting peatland-related geospatial datasets, and it facilitates and encourages research community involvement. A successful effort will make existing peatland data much more useful to the research community, and will help to identify significant data gaps.

  15. Mynodbcsv: Lightweight Zero-Config Database Solution for Handling Very Large CSV Files

    PubMed Central

    Adaszewski, Stanisław

    2014-01-01

    Volumes of data used in science and industry are growing rapidly. When researchers face the challenge of analyzing them, their format is often the first obstacle. Lack of standardized ways of exploring different data layouts requires an effort each time to solve the problem from scratch. The possibility to access data in a rich, uniform manner, e.g. using Structured Query Language (SQL), would offer expressiveness and user-friendliness. Comma-separated values (CSV) are one of the most common data storage formats. Despite its simplicity, handling it becomes non-trivial as file size grows. Importing CSVs into existing databases is time-consuming and troublesome, or even impossible if the horizontal dimension reaches thousands of columns. Most databases are optimized for handling a large number of rows rather than columns; therefore, performance for datasets with non-typical layouts is often unacceptable. Other challenges include schema creation, updates and repeated data imports. To address the above-mentioned problems, I present a system for accessing very large CSV-based datasets by means of SQL. It's characterized by: a “no copy” approach – data stay mostly in the CSV files; “zero configuration” – no need to specify a database schema; written in C++, with boost [1], SQLite [2] and Qt [3], it doesn't require installation and has a very small size; query rewriting, dynamic creation of indices for appropriate columns and static data retrieval directly from CSV files ensure efficient plan execution; effortless support for millions of columns; due to per-value typing, using mixed text/number data is easy; a very simple network protocol provides an efficient interface for MATLAB and reduces implementation time for other languages. The software is available as freeware along with educational videos on its website [4]. It doesn't need any prerequisites to run, as all of the libraries are included in the distribution package. I test it against existing database solutions using a battery of benchmarks and discuss the results. PMID:25068261

  16. SMALL-SCALE AND GLOBAL DYNAMOS AND THE AREA AND FLUX DISTRIBUTIONS OF ACTIVE REGIONS, SUNSPOT GROUPS, AND SUNSPOTS: A MULTI-DATABASE STUDY

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Muñoz-Jaramillo, Andrés; Windmueller, John C.; Amouzou, Ernest C.

    2015-02-10

    In this work, we take advantage of 11 different sunspot group, sunspot, and active region databases to characterize the area and flux distributions of photospheric magnetic structures. We find that, when taken separately, different databases are better fitted by different distributions (as has been reported previously in the literature). However, we find that all our databases can be reconciled by the simple application of a proportionality constant, and that, in reality, different databases are sampling different parts of a composite distribution. This composite distribution is made up by a linear combination of Weibull and log-normal distributions, where a pure Weibull (log-normal) characterizes the distribution of structures with fluxes below (above) 10^21 Mx (10^22 Mx). Additionally, we demonstrate that the Weibull distribution shows the expected linear behavior of a power-law distribution (when extended to smaller fluxes), making our results compatible with the results of Parnell et al. We propose that this is evidence of two separate mechanisms giving rise to visible structures on the photosphere: one directly connected to the global component of the dynamo (and the generation of bipolar active regions), and the other with the small-scale component of the dynamo (and the fragmentation of magnetic structures due to their interaction with turbulent convection).
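
    In generic form, the composite distribution described above is a weighted sum of a Weibull and a log-normal probability density, $P(\Phi) = c_1 \frac{k}{\lambda}\left(\frac{\Phi}{\lambda}\right)^{k-1} e^{-(\Phi/\lambda)^k} + c_2 \frac{1}{\Phi\,\sigma\sqrt{2\pi}} e^{-(\ln\Phi-\mu)^2/(2\sigma^2)}$, where $\Phi$ is the magnetic flux. The weights $c_1$, $c_2$ and the parameters are left symbolic here because the paper fits them per database; this generic form is provided only as a reader's aid, not as the authors' exact parameterization.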

  17. Monitoring of services with non-relational databases and map-reduce framework

    NASA Astrophysics Data System (ADS)

    Babik, M.; Souto, F.

    2012-12-01

    Service Availability Monitoring (SAM) is a well-established monitoring framework that performs regular measurements of the core site services and reports the corresponding availability and reliability of the Worldwide LHC Computing Grid (WLCG) infrastructure. One of the existing extensions of SAM is Site Wide Area Testing (SWAT), which gathers monitoring information from the worker nodes via instrumented jobs. This generates quite a lot of monitoring data to process, as there are several data points for every job and several million jobs are executed every day. The recent uptake of non-relational databases opens a new paradigm in the large-scale storage and distributed processing of systems with heavy read-write workloads. For SAM this brings new possibilities to improve its model, from performing aggregation of measurements to storing raw data and subsequent re-processing. Both SAM and SWAT are currently tuned to run at top performance, reaching some of the limits in storage and processing power of their existing Oracle relational database. We investigated the usability and performance of non-relational storage together with its distributed data processing capabilities. For this, several popular systems have been compared. In this contribution we describe our investigation of the existing non-relational databases suited for monitoring systems covering Cassandra, HBase and MongoDB. Further, we present our experiences in data modeling and prototyping map-reduce algorithms focusing on the extension of the already existing availability and reliability computations. Finally, possible future directions in this area are discussed, analyzing the current deficiencies of the existing Grid monitoring systems and proposing solutions to leverage the benefits of the non-relational databases to get more scalable and flexible frameworks.
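
    The shape of such a map-reduce availability computation can be sketched in a few lines; this is not the SAM/SWAT code, and the record fields and status values are illustrative assumptions.

        # Illustrative map-reduce availability aggregation: map each probe result to
        # (site, (ok, total)), reduce by summing, then derive availability = ok / total.
        from functools import reduce
        from collections import defaultdict

        results = [
            {"site": "SITE-A", "status": "OK"},
            {"site": "SITE-A", "status": "CRITICAL"},
            {"site": "SITE-B", "status": "OK"},
        ]

        def mapper(r):
            return (r["site"], (1 if r["status"] == "OK" else 0, 1))

        def reducer(acc, kv):
            site, (ok, total) = kv
            acc[site] = (acc[site][0] + ok, acc[site][1] + total)
            return acc

        sums = reduce(reducer, map(mapper, results), defaultdict(lambda: (0, 0)))
        availability = {site: ok / total for site, (ok, total) in sums.items()}
        # availability == {'SITE-A': 0.5, 'SITE-B': 1.0}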

  18. Community-Supported Data Repositories in Paleobiology: A 'Middle Tail' Between the Geoscientific and Informatics Communities

    NASA Astrophysics Data System (ADS)

    Williams, J. W.; Ashworth, A. C.; Betancourt, J. L.; Bills, B.; Blois, J.; Booth, R.; Buckland, P.; Charles, D.; Curry, B. B.; Goring, S. J.; Davis, E.; Grimm, E. C.; Graham, R. W.; Smith, A. J.

    2015-12-01

    Community-supported data repositories (CSDRs) in paleoecology and paleoclimatology have a decades-long tradition and serve multiple critical scientific needs. CSDRs facilitate synthetic large-scale scientific research by providing open-access and curated data that employ community-supported metadata and data standards. CSDRs serve as a 'middle tail' or boundary organization between information scientists and the long-tail community of individual geoscientists collecting and analyzing paleoecological data. Over the past decades, a distributed network of CSDRs has emerged, each serving a particular suite of data and research communities, e.g. Neotoma Paleoecology Database, Paleobiology Database, International Tree Ring Database, NOAA NCEI for Paleoclimatology, Morphobank, iDigPaleo, and Integrated Earth Data Alliance. Recently, these groups have organized into a common Paleobiology Data Consortium dedicated to improving interoperability and sharing best practices and protocols. The Neotoma Paleoecology Database offers one example of an active and growing CSDR, designed to facilitate research into ecological and evolutionary dynamics during recent past global change. Neotoma combines a centralized database structure with distributed scientific governance via multiple virtual constituent data working groups. The Neotoma data model is flexible and can accommodate a variety of paleoecological proxies from many depositional contexts. Data input into Neotoma is done by trained Data Stewards, drawn from their communities. Neotoma data can be searched, viewed, and returned to users through multiple interfaces, including the interactive Neotoma Explorer map interface, REST-ful Application Programming Interfaces (APIs), the neotoma R package, and the Tilia stratigraphic software. Neotoma is governed by geoscientists and provides community engagement through training workshops for data contributors, stewards, and users. Neotoma is engaged in the Paleobiological Data Consortium and other efforts to improve interoperability among cyberinfrastructure in the paleogeosciences.

  19. A BRDF-BPDF database for the analysis of Earth target reflectances

    NASA Astrophysics Data System (ADS)

    Breon, Francois-Marie; Maignan, Fabienne

    2017-01-01

    Land surface reflectance is not isotropic. It varies with the observation geometry, which is defined by the sun and view zenith angles and the relative azimuth. In addition, the reflectance is linearly polarized. The reflectance anisotropy is quantified by the bidirectional reflectance distribution function (BRDF), while its polarization properties are defined by the bidirectional polarization distribution function (BPDF). The POLDER radiometer that flew onboard the PARASOL microsatellite remains the only space instrument that measured numerous samples of the BRDF and BPDF of Earth targets. Here, we describe a database of representative BRDFs and BPDFs derived from the POLDER measurements. From the huge number of data acquired by the spaceborne instrument over a period of 7 years, we selected a set of targets with high-quality observations. The selection aimed for a large number of observations, free of significant cloud or aerosol contamination, acquired in diverse observation geometries with a focus on the backscatter direction that shows the specific hot spot signature. The targets are sorted according to the 16-class International Geosphere-Biosphere Programme (IGBP) land cover classification system, and the target selection aims at a spatial representativeness within the class. The database thus provides a set of high-quality BRDF and BPDF samples that can be used to assess the typical variability of natural surface reflectances or to evaluate models. It is available freely from the PANGAEA website (doi:10.1594/PANGAEA.864090). In addition to the database, we provide a visualization and analysis tool based on the Interactive Data Language (IDL). It allows an interactive analysis of the measurements and a comparison against various BRDF and BPDF analytical models. The present paper describes the input data, the selection principles, the database format, and the analysis tool.
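
    For reference, the BRDF underlying such a database is conventionally defined as the ratio of the differential reflected radiance to the differential incident irradiance, $f_r(\theta_s, \theta_v, \phi) = dL_r(\theta_v, \phi) / dE_i(\theta_s)$, with $\theta_s$ the sun zenith angle, $\theta_v$ the view zenith angle and $\phi$ the relative azimuth; the BPDF is the analogous quantity for the polarized component of the reflected radiance. The paper's exact normalization may differ, so this is given only as the standard textbook form.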

  20. An environmental database for Venice and tidal zones

    NASA Astrophysics Data System (ADS)

    Macaluso, L.; Fant, S.; Marani, A.; Scalvini, G.; Zane, O.

    2003-04-01

    The natural environment is a complex, highly variable and physically non-reproducible system (neither in the laboratory nor in a confined territory). Environmental experimental studies are thus necessarily based on field measurements distributed in time and space. Only extensive data collections can provide the representative samples of the system behavior which are essential for scientific advancement. The assimilation of large data collections into accessible archives must necessarily be implemented in electronic databases. In the case of tidal environments in general, and of the Venice lagoon in particular, it is useful to establish a database, freely accessible to the scientific community, documenting the dynamics of such systems and their response to anthropic pressures and climatic variability. At the Istituto Veneto di Scienze, Lettere ed Arti in Venice (Italy), two internet environmental databases have been developed: one collects detailed information regarding the Venice lagoon; the other coordinates the research consortium of the "TIDE" EU RTD project, which addresses three different tidal areas: Venice Lagoon (Italy), Morecambe Bay (England), and Forth Estuary (Scotland). The archives may be accessed through the URL: www.istitutoveneto.it. The first one is freely available and open to anyone who is interested. It is continuously updated and has been structured to promote documentation concerning the Venetian environment and to disseminate this information for educational purposes (see "Dissemination" section). The second one is supplied by scientists and engineers working on this tidal system for various purposes (scientific, management, conservation, etc.); it is aimed at interested researchers and grows with their own contributions. Both intend to promote scientific communication, to contribute to the realization of a distributed information system collecting homogeneous themes, and to initiate the interconnection among databases regarding different kinds of environments.

  1. Bridging the Gap between the Data Base and User in a Distributed Environment.

    ERIC Educational Resources Information Center

    Howard, Richard D.; And Others

    1989-01-01

    The distribution of databases physically separates users from the administrators who perform database administration. By drawing on the work of social scientists in reliability and validity, a set of concepts and a list of questions to ensure data quality were developed. (Author/MLW)

  2. A Web-based open-source database for the distribution of hyperspectral signatures

    NASA Astrophysics Data System (ADS)

    Ferwerda, J. G.; Jones, S. D.; Du, Pei-Jun

    2006-10-01

    With the coming of age of field spectroscopy as a non-destructive means to collect information on the physiology of vegetation, there is a need for storage of signatures and, more importantly, their metadata. Without the proper organisation of metadata, the signatures themselves become of limited use. In order to facilitate the redistribution of data, a database for the storage and distribution of hyperspectral signatures and their metadata was designed. The database was built using open-source software and can be used by the hyperspectral community to share their data. Data are uploaded through a simple web-based interface. The database recognizes major file formats from ASD, GER and International Spectronics. The database source code is available for download through the hyperspectral.info web domain, and we invite suggestions for additions and modifications to the database, to be submitted through the online forums on the same website.

  3. Distributed Database Control and Allocation. Volume 3. Distributed Database System Designer’s Handbook.

    DTIC Science & Technology

    1983-10-01

    Multiversion Data 2-18 2.7.1 Multiversion Timestamping 2-20 2.7.2 Multiversion Locking 2-20 2.8 Combining the Techniques 2-22 3. Database Recovery Algorithms... See [THEM79, GIFF79] for details. 2.7 Multiversion Data Let us return to a database system model where each logical data item is stored at one DM... In a multiversion database each Write wi[x] produces a new copy (or version) of x, denoted xi. Thus, the value of x is a set of versions. For each
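
    A toy sketch of the multiversion idea in this excerpt: each write of a data item creates a new version rather than overwriting, and a read at a given timestamp returns the most recent version written at or before it. This illustrates multiversion timestamp ordering in general, not the report's own algorithms; the class and method names are invented for the example.

        # Toy multiversion data item: writes append versions, reads pick by timestamp.
        import bisect

        class MultiversionItem:
            def __init__(self):
                self._ts = []      # sorted write timestamps
                self._vals = []    # corresponding versions

            def write(self, ts, value):
                i = bisect.bisect(self._ts, ts)
                self._ts.insert(i, ts)
                self._vals.insert(i, value)

            def read(self, ts):
                i = bisect.bisect_right(self._ts, ts)
                return self._vals[i - 1] if i else None

        x = MultiversionItem()
        x.write(1, "v1"); x.write(5, "v2")
        assert x.read(3) == "v1" and x.read(9) == "v2"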

  4. Antibiotic distribution channels in Thailand: results of key-informant interviews, reviews of drug regulations and database searches.

    PubMed

    Sommanustweechai, Angkana; Chanvatik, Sunicha; Sermsinsiri, Varavoot; Sivilaikul, Somsajee; Patcharanarumol, Walaiporn; Yeung, Shunmay; Tangcharoensathien, Viroj

    2018-02-01

    The aim was to analyse how antibiotics are imported, manufactured, distributed and regulated in Thailand. We gathered information on antibiotic distribution in Thailand through in-depth interviews with 43 key informants from farms, health facilities, pharmaceutical and animal feed industries, private pharmacies and regulators, and through database and literature searches. In 2016-2017, licensed antibiotic distribution in Thailand involved over 700 importers and about 24 000 distributors, e.g. retail pharmacies and wholesalers. Thailand imports antibiotics and active pharmaceutical ingredients. There is no system for monitoring the distribution of active ingredients, some of which are used directly on farms, without being processed. Most antibiotics can be bought from pharmacies, for home or farm use, without a prescription. Although the 1987 Drug Act classified most antibiotics as "dangerous drugs", it only classified a few of them as prescription-only medicines and placed no restrictions on the quantities of antibiotics that could be sold to any individual. Pharmacists working in pharmacies are covered by some of the Act's regulations, but the quality of their dispensing and prescribing appears to be largely reliant on their competences. In Thailand, most antibiotics are easily and widely available from retail pharmacies, without a prescription. If the inappropriate use of active pharmaceutical ingredients and antibiotics is to be reduced, we need to reclassify and restrict access to certain antibiotics and to develop systems to audit the dispensing of antibiotics in the retail sector and track the movements of active ingredients.

  5. BioModels Database: An enhanced, curated and annotated resource for published quantitative kinetic models

    PubMed Central

    2010-01-01

    Background Quantitative models of biochemical and cellular systems are used to answer a variety of questions in the biological sciences. The number of published quantitative models is growing steadily thanks to increasing interest in the use of models as well as the development of improved software systems and the availability of better, cheaper computer hardware. To maximise the benefits of this growing body of models, the field needs centralised model repositories that will encourage, facilitate and promote model dissemination and reuse. Ideally, the models stored in these repositories should be extensively tested and encoded in community-supported and standardised formats. In addition, the models and their components should be cross-referenced with other resources in order to allow their unambiguous identification. Description BioModels Database http://www.ebi.ac.uk/biomodels/ is aimed at addressing exactly these needs. It is a freely-accessible online resource for storing, viewing, retrieving, and analysing published, peer-reviewed quantitative models of biochemical and cellular systems. The structure and behaviour of each simulation model distributed by BioModels Database are thoroughly checked; in addition, model elements are annotated with terms from controlled vocabularies as well as linked to relevant data resources. Models can be examined online or downloaded in various formats. Reaction network diagrams generated from the models are also available in several formats. BioModels Database also provides features such as online simulation and the extraction of components from large scale models into smaller submodels. Finally, the system provides a range of web services that external software systems can use to access up-to-date data from the database. Conclusions BioModels Database has become a recognised reference resource for systems biology. It is being used by the community in a variety of ways; for example, it is used to benchmark different simulation systems, and to study the clustering of models based upon their annotations. Model deposition to the database today is advised by several publishers of scientific journals. The models in BioModels Database are freely distributed and reusable; the underlying software infrastructure is also available from SourceForge https://sourceforge.net/projects/biomodels/ under the GNU General Public License. PMID:20587024

  6. The Design and Implementation of a Relational to Network Query Translator for a Distributed Database Management System.

    DTIC Science & Technology

    1985-12-01

    RELATIONAL TO NETWORK QUERY TRANSLATOR FOR A DISTRIBUTED DATABASE MANAGEMENT SYSTEM - THESIS - Kevin H. Mahoney, Captain, USAF, AFIT/GCS/ENG/85D-7 ... NETWORK QUERY TRANSLATOR FOR A DISTRIBUTED DATABASE MANAGEMENT SYSTEM - THESIS Presented to the Faculty of the School of Engineering of the Air Force Institute of Technology, Air University, In Partial Fulfillment of the Requirements for the Degree of Master of Science in Computer Systems - Kevin H. Mahoney

  7. Canopies to Continents: What spatial scales are needed to represent landcover distributions in earth system models?

    NASA Astrophysics Data System (ADS)

    Guenther, A. B.; Duhl, T.

    2011-12-01

    Increasing computational resources have enabled a steady improvement in the spatial resolution used for earth system models. Land surface models and landcover distributions have kept ahead by providing higher spatial resolution than typically used in these models. Satellite observations have played a major role in providing high resolution landcover distributions over large regions or the entire earth surface, but ground observations are needed to calibrate these data and provide accurate inputs for models. As our ability to resolve individual landscape components improves, it is important to consider what scale is sufficient for providing inputs to earth system models. The required spatial scale is dependent on the processes being represented and the scientific questions being addressed. This presentation will describe the development of a contiguous U.S. landcover database using high resolution imagery (1 to 1000 meters) and surface observations of species composition and other landcover characteristics. The database includes plant functional types and species composition and is suitable for driving land surface models (CLM and MEGAN) that predict land surface exchange of carbon, water, energy and biogenic reactive gases (e.g., isoprene, sesquiterpenes, and NO). We investigate the sensitivity of model results to landcover distributions with spatial scales ranging over six orders of magnitude (1 meter to 1000000 meters). The implications for predictions of regional climate and air quality will be discussed along with recommendations for regional and global earth system modeling.

  8. Antarctic icebergs distributions 1992-2014

    NASA Astrophysics Data System (ADS)

    Tournadre, J.; Bouhier, N.; Girard-Ardhuin, F.; Rémy, F.

    2016-01-01

    Basal melting of floating ice shelves and iceberg calving constitute the two almost equal paths of freshwater flux between the Antarctic ice cap and the Southern Ocean. The largest icebergs (>100 km2) transport most of the ice volume, but their basal melting is small compared to their breaking into smaller icebergs, which thus constitute the major vector of freshwater. The archives of nine altimeters have been processed to create a database of small icebergs (<8 km2) within open water, containing the positions, sizes and volumes spanning the 1992-2014 period. The intercalibrated monthly ice volumes from the different altimeters have been merged into a homogeneous 23 year climatology. The iceberg size distribution, covering the 0.1-10,000 km2 range and estimated by combining small and large iceberg size measurements, follows a power law of slope -1.52 ± 0.32, close to the -3/2 law observed and modeled for brittle fragmentation. The global volume of ice and its distribution between the ocean basins present a very strong interannual variability that is only partially explained by the number of large icebergs. Indeed, vast zones of the Southern Ocean free of large icebergs are largely populated by small icebergs drifting over thousands of kilometers. The correlation between the global small and large iceberg volumes shows that small icebergs are mainly generated by the breaking of large ones. Drifting and trapping by sea ice can transport small icebergs over long periods and distances. Small icebergs act as a diffuse ice-release process along the trajectories of large icebergs, while sea ice trapping acts as a buffer that delays melting.
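
    A fragmentation slope of the kind quoted above can be checked on any comparable size catalogue with a simple log-log fit. The sketch below is illustrative only and assumes a hypothetical array of iceberg areas in km2; it bins the sizes and estimates the power-law exponent of the size distribution by least squares on the logarithms.

        import numpy as np

        # Hypothetical catalogue of iceberg areas (km^2); replace with real altimeter data.
        rng = np.random.default_rng(0)
        areas = (rng.pareto(0.52, 50_000) + 1.0) * 0.1   # synthetic heavy-tailed sizes >= 0.1 km^2

        # Histogram the sizes in logarithmically spaced bins and normalise to a density.
        bins = np.logspace(np.log10(0.1), np.log10(1e4), 40)
        counts, edges = np.histogram(areas, bins=bins)
        density = counts / np.diff(edges)
        centers = np.sqrt(edges[:-1] * edges[1:])

        # Fit log(density) = slope * log(size) + intercept over non-empty bins.
        mask = density > 0
        slope, intercept = np.polyfit(np.log10(centers[mask]), np.log10(density[mask]), 1)
        print(f"estimated power-law slope: {slope:.2f}")   # expected near -1.5 for brittle fragmentation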

  9. E-MSD: improving data deposition and structure quality.

    PubMed

    Tagari, M; Tate, J; Swaminathan, G J; Newman, R; Naim, A; Vranken, W; Kapopoulou, A; Hussain, A; Fillon, J; Henrick, K; Velankar, S

    2006-01-01

    The Macromolecular Structure Database (MSD) (http://www.ebi.ac.uk/msd/) [H. Boutselakis, D. Dimitropoulos, J. Fillon, A. Golovin, K. Henrick, A. Hussain, J. Ionides, M. John, P. A. Keller, E. Krissinel et al. (2003) E-MSD: the European Bioinformatics Institute Macromolecular Structure Database. Nucleic Acids Res., 31, 458-462.] group is one of the three partners in the worldwide Protein DataBank (wwPDB), the consortium entrusted with the collation, maintenance and distribution of the global repository of macromolecular structure data [H. Berman, K. Henrick and H. Nakamura (2003) Announcing the worldwide Protein Data Bank. Nature Struct. Biol., 10, 980.]. Since its inception, the MSD group has worked with partners around the world to improve the quality of PDB data, through a clean up programme that addresses inconsistencies and inaccuracies in the legacy archive. The improvements in data quality in the legacy archive have been achieved largely through the creation of a unified data archive, in the form of a relational database that stores all of the data in the wwPDB. The three partners are working towards improving the tools and methods for the deposition of new data by the community at large. The implementation of the MSD database, together with the parallel development of improved tools and methodologies for data harvesting, validation and archival, has lead to significant improvements in the quality of data that enters the archive. Through this and related projects in the NMR and EM realms the MSD continues to improve the quality of publicly available structural data.

  10. Hotspot Patterns: The Formal Definition and Automatic Detection of Architecture Smells

    DTIC Science & Technology

    2015-01-15

    ... a serious question for a project manager or architect: how to determine which parts of the code base should be given higher priority for maintenance and ... services framework; Hadoop is a tool for distributed processing of large data sets; HBase is the Hadoop database; Ivy is a dependency management tool ... to answer this question more rigorously, we conducted Pearson correlation analysis to test the dependency between the number of issues a file involves ...

  11. Generating equilateral random polygons in confinement

    NASA Astrophysics Data System (ADS)

    Diao, Y.; Ernst, C.; Montemayor, A.; Ziegler, U.

    2011-10-01

    One challenging problem in biology is to understand the mechanism of DNA packing in a confined volume such as a cell. It is known that confined circular DNA is often knotted and hence the topology of the extracted (and relaxed) circular DNA can be used as a probe of the DNA packing mechanism. However, in order to properly estimate the topological properties of the confined circular DNA structures using mathematical models, it is necessary to generate large ensembles of simulated closed chains (i.e. polygons) of equal edge lengths that are confined in a volume such as a sphere of certain fixed radius. Finding efficient algorithms that properly sample the space of such confined equilateral random polygons is a difficult problem. In this paper, we propose a method that generates confined equilateral random polygons based on their probability distribution. This method requires the creation of a large database initially. However, once the database has been created, a confined equilateral random polygon of length n can be generated in linear time in terms of n. The errors introduced by the method can be controlled and reduced by the refinement of the database. Furthermore, our numerical simulations indicate that these errors are unbiased and tend to cancel each other in a long polygon.

  12. Access to Emissions Distributions and Related Ancillary Data through the ECCAD database

    NASA Astrophysics Data System (ADS)

    Darras, Sabine; Granier, Claire; Liousse, Catherine; De Graaf, Erica; Enriquez, Edgar; Boulanger, Damien; Brissebrat, Guillaume

    2017-04-01

    The ECCAD database (Emissions of atmospheric Compounds and Compilation of Ancillary Data) provides user-friendly access to global and regional surface emissions for a large set of chemical compounds and to ancillary data (land use, active fires, burned areas, population, etc.). The emissions inventories are time series of gridded data at spatial resolutions from 1x1 to 0.1x0.1 degrees. ECCAD is the emissions database of the GEIA (Global Emissions InitiAtive) project and a sub-project of the French Atmospheric Data Center AERIS (http://www.aeris-data.fr). ECCAD currently has more than 2200 users originating from more than 80 countries. The project benefits from this large international community of users to expand the number of emission datasets made available. ECCAD provides detailed metadata for each of the datasets, as well as various tools for data visualization, for computing global and regional totals and for interactive spatial and temporal analysis. The data can be downloaded as interoperable NetCDF CF-compliant files, i.e. the data are compatible with many other client interfaces. The presentation will provide information on the datasets available within ECCAD, as well as examples of the analysis work that can be done online through the website: http://eccad.aeris-data.fr.
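
    Since ECCAD distributes its inventories as CF-compliant NetCDF, a downloaded file can be inspected and aggregated with standard tools. The sketch below uses xarray; the file name, the variable name "emission" and the coordinate names are assumptions for illustration and will differ per dataset.

        import xarray as xr

        # Hypothetical ECCAD download; actual file, variable and coordinate names depend on the inventory chosen.
        ds = xr.open_dataset("eccad_example_emissions.nc")
        print(ds)                                   # list variables, dimensions and CF metadata

        emis = ds["emission"]                       # assumed variable name; check ds.data_vars
        annual_mean = emis.mean(dim="time")         # temporal average on the native grid
        regional = emis.sel(lat=slice(30, 60), lon=slice(-10, 40)).mean(dim=("lat", "lon"))
        regional.to_dataframe().to_csv("regional_mean_timeseries.csv")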

  13. Access to Emissions Distributions and Related Ancillary Data through the ECCAD database

    NASA Astrophysics Data System (ADS)

    Darras, Sabine; Enriquez, Edgar; Granier, Claire; Liousse, Catherine; Boulanger, Damien; Fontaine, Alain

    2016-04-01

    The ECCAD database (Emissions of atmospheric Compounds and Compilation of Ancillary Data) provides user-friendly access to global and regional surface emissions for a large set of chemical compounds and to ancillary data (land use, active fires, burned areas, population, etc.). The emissions inventories are time series of gridded data at spatial resolutions from 1x1 to 0.1x0.1 degrees. ECCAD is the emissions database of the GEIA (Global Emissions InitiAtive) project and a sub-project of the French Atmospheric Data Center AERIS (http://www.aeris-data.fr). ECCAD currently has more than 2200 users originating from more than 80 countries. The project benefits from this large international community of users to expand the number of emission datasets made available. ECCAD provides detailed metadata for each of the datasets, as well as various tools for data visualization, for computing global and regional totals and for interactive spatial and temporal analysis. The data can be downloaded as interoperable NetCDF CF-compliant files, i.e. the data are compatible with many other client interfaces. The presentation will provide information on the datasets available within ECCAD, as well as examples of the analysis work that can be done online through the website: http://eccad.aeris-data.fr.

  14. The BioMart community portal: an innovative alternative to large, centralized data repositories.

    PubMed

    Smedley, Damian; Haider, Syed; Durinck, Steffen; Pandini, Luca; Provero, Paolo; Allen, James; Arnaiz, Olivier; Awedh, Mohammad Hamza; Baldock, Richard; Barbiera, Giulia; Bardou, Philippe; Beck, Tim; Blake, Andrew; Bonierbale, Merideth; Brookes, Anthony J; Bucci, Gabriele; Buetti, Iwan; Burge, Sarah; Cabau, Cédric; Carlson, Joseph W; Chelala, Claude; Chrysostomou, Charalambos; Cittaro, Davide; Collin, Olivier; Cordova, Raul; Cutts, Rosalind J; Dassi, Erik; Di Genova, Alex; Djari, Anis; Esposito, Anthony; Estrella, Heather; Eyras, Eduardo; Fernandez-Banet, Julio; Forbes, Simon; Free, Robert C; Fujisawa, Takatomo; Gadaleta, Emanuela; Garcia-Manteiga, Jose M; Goodstein, David; Gray, Kristian; Guerra-Assunção, José Afonso; Haggarty, Bernard; Han, Dong-Jin; Han, Byung Woo; Harris, Todd; Harshbarger, Jayson; Hastings, Robert K; Hayes, Richard D; Hoede, Claire; Hu, Shen; Hu, Zhi-Liang; Hutchins, Lucie; Kan, Zhengyan; Kawaji, Hideya; Keliet, Aminah; Kerhornou, Arnaud; Kim, Sunghoon; Kinsella, Rhoda; Klopp, Christophe; Kong, Lei; Lawson, Daniel; Lazarevic, Dejan; Lee, Ji-Hyun; Letellier, Thomas; Li, Chuan-Yun; Lio, Pietro; Liu, Chu-Jun; Luo, Jie; Maass, Alejandro; Mariette, Jerome; Maurel, Thomas; Merella, Stefania; Mohamed, Azza Mostafa; Moreews, Francois; Nabihoudine, Ibounyamine; Ndegwa, Nelson; Noirot, Céline; Perez-Llamas, Cristian; Primig, Michael; Quattrone, Alessandro; Quesneville, Hadi; Rambaldi, Davide; Reecy, James; Riba, Michela; Rosanoff, Steven; Saddiq, Amna Ali; Salas, Elisa; Sallou, Olivier; Shepherd, Rebecca; Simon, Reinhard; Sperling, Linda; Spooner, William; Staines, Daniel M; Steinbach, Delphine; Stone, Kevin; Stupka, Elia; Teague, Jon W; Dayem Ullah, Abu Z; Wang, Jun; Ware, Doreen; Wong-Erasmus, Marie; Youens-Clark, Ken; Zadissa, Amonida; Zhang, Shi-Jian; Kasprzyk, Arek

    2015-07-01

    The BioMart Community Portal (www.biomart.org) is a community-driven effort to provide a unified interface to biomedical databases that are distributed worldwide. The portal provides access to numerous database projects supported by 30 scientific organizations. It includes over 800 different biological datasets spanning genomics, proteomics, model organisms, cancer data, ontology information and more. All resources available through the portal are independently administered and funded by their host organizations. The BioMart data federation technology provides a unified interface to all the available data. The latest version of the portal comes with many new databases that have been created by our ever-growing community. It also comes with better support and extensibility for data analysis and visualization tools. A new addition to our toolbox, the enrichment analysis tool is now accessible through graphical and web service interface. The BioMart community portal averages over one million requests per day. Building on this level of service and the wealth of information that has become available, the BioMart Community Portal has introduced a new, more scalable and cheaper alternative to the large data stores maintained by specialized organizations. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  15. Virtual screening applications: a study of ligand-based methods and different structure representations in four different scenarios.

    PubMed

    Hristozov, Dimitar P; Oprea, Tudor I; Gasteiger, Johann

    2007-01-01

    Four different ligand-based virtual screening scenarios are studied: (1) prioritizing compounds for subsequent high-throughput screening (HTS); (2) selecting a predefined (small) number of potentially active compounds from a large chemical database; (3) assessing the probability that a given structure will exhibit a given activity; (4) selecting the most active structure(s) for a biological assay. Each of the four scenarios is exemplified by performing retrospective ligand-based virtual screening for eight different biological targets using two large databases, MDDR and WOMBAT. A comparison between the chemical spaces covered by these two databases is presented. The performance of two techniques for ligand-based virtual screening, similarity search with subsequent data fusion (SSDF) and novelty detection with Self-Organizing Maps (ndSOM), is investigated. Three different structure representations, 2,048-dimensional Daylight fingerprints, topological autocorrelation weighted by atomic physicochemical properties (sigma electronegativity, polarizability, partial charge, and identity), and radial distribution functions weighted by the same atomic physicochemical properties, are compared. Both methods were found applicable in scenario one. The similarity search was found to perform slightly better in scenario two, while the SOM novelty detection is preferred in scenario three. No method/descriptor combination achieved significant success in scenario four.
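
    As a rough illustration of the similarity-search step in ligand-based screening, the sketch below ranks a small database by Tanimoto similarity to a known active. It uses RDKit Morgan fingerprints as a stand-in for the Daylight fingerprints used in the study, and the SMILES strings are generic placeholders rather than MDDR or WOMBAT entries.

        from rdkit import Chem, DataStructs
        from rdkit.Chem import AllChem

        # Placeholder structures; in practice these would come from MDDR/WOMBAT-like databases.
        query_smiles = "CC(=O)Oc1ccccc1C(=O)O"             # aspirin as a toy query
        database_smiles = ["O=C(O)c1ccccc1O",              # salicylic acid
                           "CC(C)Cc1ccc(cc1)C(C)C(=O)O",   # ibuprofen
                           "Oc1ccccc1"]                    # phenol

        def fingerprint(smiles):
            mol = Chem.MolFromSmiles(smiles)
            return AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=2048)

        query_fp = fingerprint(query_smiles)
        scores = [(smi, DataStructs.TanimotoSimilarity(query_fp, fingerprint(smi)))
                  for smi in database_smiles]

        # Rank the database by decreasing similarity to the query; a simple data-fusion
        # scheme would combine ranks obtained from several query actives.
        for smi, score in sorted(scores, key=lambda x: x[1], reverse=True):
            print(f"{score:.2f}  {smi}")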

  16. The European Southern Observatory-MIDAS table file system

    NASA Technical Reports Server (NTRS)

    Peron, M.; Grosbol, P.

    1992-01-01

    The new and substantially upgraded version of the Table File System in MIDAS is presented as a scientific database system. MIDAS applications for performing database operations on tables are discussed, for instance, the exchange of the data to and from the TFS, the selection of objects, the uncertainty joins across tables, and the graphical representation of data. This upgraded version of the TFS is a full implementation of the binary table extension of the FITS format; in addition, it also supports arrays of strings. Different storage strategies for optimal access of very large data sets are implemented and are addressed in detail. As a simple relational database, the TFS may be used for the management of personal data files. This opens the way to intelligent pipeline processing of large amounts of data. One of the key features of the Table File System is to provide also an extensive set of tools for the analysis of the final results of a reduction process. Column operations using standard and special mathematical functions as well as statistical distributions can be carried out; commands for linear regression and model fitting using nonlinear least square methods and user-defined functions are available. Finally, statistical tests of hypothesis and multivariate methods can also operate on tables.

  17. Distributed Episodic Exploratory Planning (DEEP)

    DTIC Science & Technology

    2008-12-01

    ... API). For DEEP, Hibernate offered the following advantages: • Abstracts SQL by utilizing HQL so any database with a Java Database Connectivity ... Hibernate SQL; ICCRTS, International Command and Control Research and Technology Symposium; JDB, Java Distributed Blackboard; JDBC, Java Database Connectivity ... selected because of its opportunistic reasoning capabilities and implemented in Java for platform independence. Java was chosen for ease of ...

  18. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bourham, Mohamed A.; Gilligan, John G.

    Safety considerations in large future fusion reactors like ITER are important before licensing the reactor. Several scenarios are considered hazardous, which include safety of plasma-facing components during hard disruptions, high heat fluxes and thermal stresses during normal operation, accidental energy release, and aerosol formation and transport. Disruption events, in large tokamaks like ITER, are expected to produce local heat fluxes on plasma-facing components which may exceed 100 GW/m2 over a period of about 0.1 ms. As a result, the surface temperature dramatically increases, which results in surface melting and vaporization and produces thermal stresses and surface erosion. Plasma-facing component safety issues extend to cover a wide range of possible scenarios, including disruption severity and the impact of plasma-facing components on disruption parameters, accidental energy release and short/long term LOCAs, and formation of airborne particles by convective current transport during a LOVA (water/air ingress disruption) accident scenario. Study and evaluation of disruption-induced aerosol generation and mobilization are essential to characterize a database on particulate formation and distribution for a large future fusion tokamak reactor like ITER. In order to provide a database relevant to ITER, the SIRENS electrothermal plasma facility at NCSU has been modified to closely simulate heat fluxes expected in ITER.

  19. Monte Carlo simulations of product distributions and contained metal estimates

    USGS Publications Warehouse

    Gettings, Mark E.

    2013-01-01

    Estimation of product distributions of two factors was simulated by conventional Monte Carlo techniques using factor distributions that were independent (uncorrelated). Several simulations using uniform distributions of factors show that the product distribution has a central peak approximately centered at the product of the medians of the factor distributions. Factor distributions that are peaked, such as Gaussian (normal) produce an even more peaked product distribution. Piecewise analytic solutions can be obtained for independent factor distributions and yield insight into the properties of the product distribution. As an example, porphyry copper grades and tonnages are now available in at least one public database and their distributions were analyzed. Although both grade and tonnage can be approximated with lognormal distributions, they are not exactly fit by them. The grade shows some nonlinear correlation with tonnage for the published database. Sampling by deposit from available databases of grade, tonnage, and geological details of each deposit specifies both grade and tonnage for that deposit. Any correlation between grade and tonnage is then preserved and the observed distribution of grades and tonnages can be used with no assumption of distribution form.
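
    The product-distribution idea described above is easy to reproduce numerically. The sketch below is a minimal illustration, assuming two independent lognormal factors standing in for grade and tonnage with made-up parameters; it draws many factor pairs, forms their products (contained metal), and summarizes the resulting distribution.

        import numpy as np

        rng = np.random.default_rng(42)
        n = 100_000

        # Hypothetical, independent factor distributions (parameters are illustrative only):
        # grade in percent copper, tonnage in millions of tonnes.
        grade = rng.lognormal(mean=np.log(0.5), sigma=0.5, size=n)
        tonnage = rng.lognormal(mean=np.log(100.0), sigma=1.0, size=n)

        # Contained metal is the product of the two independent factors.
        contained = grade / 100.0 * tonnage          # millions of tonnes of copper

        # For independent lognormal factors the median of the product is the product of the medians,
        # matching the observation that the product distribution peaks near that value.
        print("median grade x median tonnage:", np.median(grade) / 100.0 * np.median(tonnage))
        print("median contained metal:      ", np.median(contained))
        print("5th-95th percentile range:   ", np.percentile(contained, [5, 95]))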

  20. Patterns of Spatial Variation of Assemblages Associated with Intertidal Rocky Shores: A Global Perspective

    PubMed Central

    Cruz-Motta, Juan José; Miloslavich, Patricia; Palomo, Gabriela; Iken, Katrin; Konar, Brenda; Pohle, Gerhard; Trott, Tom; Benedetti-Cecchi, Lisandro; Herrera, César; Hernández, Alejandra; Sardi, Adriana; Bueno, Andrea; Castillo, Julio; Klein, Eduardo; Guerra-Castro, Edlin; Gobin, Judith; Gómez, Diana Isabel; Riosmena-Rodríguez, Rafael; Mead, Angela; Bigatti, Gregorio; Knowlton, Ann; Shirayama, Yoshihisa

    2010-01-01

    Assemblages associated with intertidal rocky shores were examined for large scale distribution patterns with specific emphasis on identifying latitudinal trends of species richness and taxonomic distinctiveness. Seventy-two sites distributed around the globe were evaluated following the standardized sampling protocol of the Census of Marine Life NaGISA project (www.nagisa.coml.org). There were no clear patterns of standardized estimators of species richness along latitudinal gradients or among Large Marine Ecosystems (LMEs); however, a strong latitudinal gradient in taxonomic composition (i.e., proportion of different taxonomic groups in a given sample) was observed. Environmental variables related to natural influences were strongly related to the distribution patterns of the assemblages on the LME scale, particularly photoperiod, sea surface temperature (SST) and rainfall. In contrast, no environmental variables directly associated with human influences (with the exception of the inorganic pollution index) were related to assemblage patterns among LMEs. Correlations of the natural assemblages with either latitudinal gradients or environmental variables were equally strong suggesting that neither neutral models nor models based solely on environmental variables sufficiently explain spatial variation of these assemblages at a global scale. Despite the data shortcomings in this study (e.g., unbalanced sample distribution), we show the importance of generating biological global databases for the use in large-scale diversity comparisons of rocky intertidal assemblages to stimulate continued sampling and analyses. PMID:21179546

  1. Bringing modeling to the masses: A web based system to predict potential species distributions

    USGS Publications Warehouse

    Graham, Jim; Newman, Greg; Kumar, Sunil; Jarnevich, Catherine S.; Young, Nick; Crall, Alycia W.; Stohlgren, Thomas J.; Evangelista, Paul

    2010-01-01

    Predicting current and potential species distributions and abundance is critical for managing invasive species, preserving threatened and endangered species, and conserving native species and habitats. Accurate predictive models are needed at local, regional, and national scales to guide field surveys, improve monitoring, and set priorities for conservation and restoration. Modeling capabilities, however, are often limited by access to software and environmental data required for predictions. To address these needs, we built a comprehensive web-based system that: (1) maintains a large database of field data; (2) provides access to field data and a wealth of environmental data; (3) accesses values in rasters representing environmental characteristics; (4) runs statistical spatial models; and (5) creates maps that predict the potential species distribution. The system is available online at www.niiss.org, and provides web-based tools for stakeholders to create potential species distribution models and maps under current and future climate scenarios.

  2. A Latency-Tolerant Partitioner for Distributed Computing on the Information Power Grid

    NASA Technical Reports Server (NTRS)

    Das, Sajal K.; Harvey, Daniel J.; Biwas, Rupak; Kwak, Dochan (Technical Monitor)

    2001-01-01

    NASA's Information Power Grid (IPG) is an infrastructure designed to harness the power of geographically distributed computers, databases, and human expertise, in order to solve large-scale realistic computational problems. This type of meta-computing environment is necessary to present a unified virtual machine to application developers that hides the intricacies of a highly heterogeneous environment and yet maintains adequate security. In this paper, we present a novel partitioning scheme, called MinEX, that dynamically balances processor workloads while minimizing data movement and runtime communication, for applications that are executed in a parallel distributed fashion on the IPG. We also analyze the conditions that are required for the IPG to be an effective tool for such distributed computations. Our results show that MinEX is a viable load balancer provided the nodes of the IPG are connected by a high-speed asynchronous interconnection network.

  3. California dragonfly and damselfly (Odonata) database: temporal and spatial distribution of species records collected over the past century

    PubMed Central

    Ball-Damerow, Joan E.; Oboyski, Peter T.; Resh, Vincent H.

    2015-01-01

    Abstract The recently completed Odonata database for California consists of specimen records from the major entomology collections of the state, large Odonata collections outside of the state, previous literature, historical and recent field surveys, and from enthusiast group observations. The database includes 32,025 total records and 19,000 unique records for 106 species of dragonflies and damselflies, with records spanning 1879–2013. Records have been geographically referenced using the point-radius method to assign coordinates and an uncertainty radius to specimen locations. In addition to describing techniques used in data acquisition, georeferencing, and quality control, we present assessments of the temporal, spatial, and taxonomic distribution of records. We use this information to identify biases in the data, and to determine changes in species prevalence, latitudinal ranges, and elevation ranges when comparing records before 1976 and after 1979. The average latitude of where records occurred increased by 78 km over these time periods. While average elevation did not change significantly, the average minimum elevation across species declined by 108 m. Odonata distribution may be generally shifting northwards as temperature warms and to lower minimum elevations in response to increased summer water availability in low-elevation agricultural regions. The unexpected decline in elevation may also be partially the result of bias in recent collections towards centers of human population, which tend to occur at lower elevations. This study emphasizes the need to address temporal, spatial, and taxonomic biases in museum and observational records in order to produce reliable conclusions from such data. PMID:25709531
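
    Comparisons such as the reported 78 km latitudinal shift come from splitting the records into an early and a recent period and differencing per-species summary statistics. The sketch below shows that kind of calculation with pandas, assuming a hypothetical table of records with species, year, latitude and elevation columns; it is not the authors' actual workflow.

        import pandas as pd

        # Hypothetical occurrence table; the real database holds ~32,000 Odonata records.
        records = pd.DataFrame({
            "species":   ["A. junius", "A. junius", "L. luctuosa", "L. luctuosa"],
            "year":      [1950, 1995, 1962, 2004],
            "latitude":  [34.1, 38.9, 36.5, 37.2],
            "elevation": [900.0, 400.0, 650.0, 500.0],
        })

        early = records[records["year"] <= 1976]
        late = records[records["year"] >= 1979]

        # Per-species means in each period, then the average shift across species.
        shift = (late.groupby("species")[["latitude", "elevation"]].mean()
                 - early.groupby("species")[["latitude", "elevation"]].mean())
        print(shift.mean())   # positive values: northward / upward on average; 1 deg latitude is ~111 km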

  4. Global rates of habitat loss and implications for amphibian conservation

    USGS Publications Warehouse

    Gallant, Alisa L.; Klaver, R.W.; Casper, G.S.; Lannoo, M.J.

    2007-01-01

    A large number of factors are known to affect amphibian population viability, but most authors agree that the principal causes of amphibian declines are habitat loss, alteration, and fragmentation. We provide a global assessment of land use dynamics in the context of amphibian distributions. We accomplished this by compiling global maps of amphibian species richness and recent rates of change in land cover, land use, and human population growth. The amphibian map was developed using a combination of published literature and digital databases. We used an ecoregion framework to help interpret species distributions across environmental, rather than political, boundaries. We mapped rates of land cover and use change with statistics from the World Resources Institute, refined with a global digital dataset on land cover derived from satellite data. Temporal maps of human population were developed from the World Resources Institute database and other published sources. Our resultant map of amphibian species richness illustrates that amphibians are distributed in an uneven pattern around the globe, preferring terrestrial and freshwater habitats in ecoregions that are warm and moist. Spatiotemporal patterns of human population show that, prior to the 20th century, population growth and spread was slower, most extensive in the temperate ecoregions, and largely exclusive of major regions of high amphibian richness. Since the beginning of the 20th century, human population growth has been exponential and has occurred largely in the subtropical and tropical ecoregions favored by amphibians. Population growth has been accompanied by broad-scale changes in land cover and land use, typically in support of agriculture. We merged information on land cover, land use, and human population growth to generate a composite map showing the rates at which humans have been changing the world. When compared with the map of amphibian species richness, we found that many of the regions of the earth supporting the richest assemblages of amphibians are currently undergoing the highest rates of landscape modification.

  5. How to ensure sustainable interoperability in heterogeneous distributed systems through architectural approach.

    PubMed

    Pape-Haugaard, Louise; Frank, Lars

    2011-01-01

    A major obstacle in ensuring ubiquitous information is the utilization of heterogeneous systems in eHealth. The objective in this paper is to illustrate how an architecture for distributed eHealth databases can be designed without lacking the characteristic features of traditional sustainable databases. The approach is firstly to explain traditional architecture in central and homogeneous distributed database computing, followed by a possible approach to use an architectural framework to obtain sustainability across disparate systems i.e. heterogeneous databases, concluded with a discussion. It is seen that through a method of using relaxed ACID properties on a service-oriented architecture it is possible to achieve data consistency which is essential when ensuring sustainable interoperability.

  6. Generalized Drivers in the Mammalian Endangerment Process

    PubMed Central

    González-Suárez, Manuela; Revilla, Eloy

    2014-01-01

    An important challenge for conservation today is to understand the endangerment process and identify any generalized patterns in how threats occur and aggregate across taxa. Here we use a global database describing main current external threats in mammals to evaluate the prevalence of distinct threatening processes, primarily of anthropogenic origin, and to identify generalized drivers of extinction and their association with vulnerability status and intrinsic species' traits. We detect several primary threat combinations that are generally associated with distinct species. In particular, large and widely distributed mammals are affected by combinations of direct exploitation and threats associated with increasing landscape modification that go from logging to intense human land-use. Meanwhile, small, narrowly distributed species are affected by intensifying levels of landscape modification but are not directly exploited. In general more vulnerable species are affected by a greater number of threats, suggesting increased extinction risk is associated with the accumulation of external threats. Overall, our findings show that endangerment in mammals is strongly associated with increasing habitat loss and degradation caused by human land-use intensification. For large and widely distributed mammals there is the additional risk of being hunted. PMID:24587315

  7. Statistical characterization of a large geochemical database and effect of sample size

    USGS Publications Warehouse

    Zhang, C.; Manheim, F.T.; Hinde, J.; Grossman, J.N.

    2005-01-01

    The authors investigated statistical distributions for concentrations of chemical elements from the National Geochemical Survey (NGS) database of the U.S. Geological Survey. At the time of this study, the NGS data set encompassed 48,544 stream sediment and soil samples from the conterminous United States analyzed by ICP-AES following a 4-acid near-total digestion. This report includes 27 elements: Al, Ca, Fe, K, Mg, Na, P, Ti, Ba, Ce, Co, Cr, Cu, Ga, La, Li, Mn, Nb, Nd, Ni, Pb, Sc, Sr, Th, V, Y and Zn. The goal and challenge for the statistical overview was to delineate chemical distributions in a complex, heterogeneous data set spanning a large geographic range (the conterminous United States), and many different geological provinces and rock types. After declustering to create a uniform spatial sample distribution with 16,511 samples, histograms and quantile-quantile (Q-Q) plots were employed to delineate subpopulations that have coherent chemical and mineral affinities. Probability groupings are discerned by changes in slope (kinks) on the plots. Major rock-forming elements, e.g., Al, Ca, K and Na, tend to display linear segments on normal Q-Q plots. These segments can commonly be linked to petrologic or mineralogical associations. For example, linear segments on K and Na plots reflect dilution of clay minerals by quartz sand (low in K and Na). Minor and trace element relationships are best displayed on lognormal Q-Q plots. These sensitively reflect discrete relationships in subpopulations within the wide range of the data. For example, small but distinctly log-linear subpopulations for Pb, Cu, Zn and Ag are interpreted to represent ore-grade enrichment of naturally occurring minerals such as sulfides. None of the 27 chemical elements could pass the test for either normal or lognormal distribution on the declustered data set. Part of the reason relates to the presence of mixtures of subpopulations and outliers. Random samples of the data set with successively smaller numbers of data points showed that few elements passed standard statistical tests for normality or log-normality until sample size decreased to a few hundred data points. Large sample size enhances the power of statistical tests and leads to rejection of most statistical hypotheses for real data sets. For large sample sizes (e.g., n > 1000), graphical methods such as histogram, stem-and-leaf, and probability plots are recommended for a rough judgement of the probability distribution, if needed. © 2005 Elsevier Ltd. All rights reserved.
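
    The kind of graphical screening described above (histograms and Q-Q plots, alongside formal normality tests that almost always reject for large n) can be reproduced with scipy. The sketch below is illustrative only, using a synthetic element concentration with an enriched subpopulation in place of the NGS data.

        import numpy as np
        from scipy import stats
        import matplotlib.pyplot as plt

        rng = np.random.default_rng(1)
        # Synthetic trace-element concentrations: a lognormal background plus a small enriched subpopulation.
        background = rng.lognormal(mean=np.log(20.0), sigma=0.6, size=15_000)
        enriched = rng.lognormal(mean=np.log(300.0), sigma=0.4, size=500)
        conc = np.concatenate([background, enriched])

        # Formal test: with n > 15,000 even mild departures from log-normality are rejected.
        stat, p = stats.normaltest(np.log(conc))
        print(f"D'Agostino K2 on log-concentrations: p = {p:.3g}")

        # Lognormal Q-Q plot: kinks in the upper tail reveal the enriched subpopulation.
        stats.probplot(np.log(conc), dist="norm", plot=plt)
        plt.title("Normal Q-Q plot of log concentrations")
        plt.savefig("qq_plot.png")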

  8. Monitoring product safety in the postmarketing environment.

    PubMed

    Sharrar, Robert G; Dieck, Gretchen S

    2013-10-01

    The safety profile of a medicinal product may change in the postmarketing environment. Safety issues not identified in clinical development may be seen and need to be evaluated. Methods of evaluating spontaneous adverse experience reports and identifying new safety risks include a review of individual reports, a review of a frequency distribution of a list of the adverse experiences, the development and analysis of a case series, and various ways of examining the database for signals of disproportionality, which may suggest a possible association. Regulatory agencies monitor product safety through a variety of mechanisms including signal detection of the adverse experience safety reports in databases and by requiring and monitoring risk management plans, periodic safety update reports and postauthorization safety studies. The United States Food and Drug Administration is working with public, academic and private entities to develop methods for using large electronic databases to actively monitor product safety. Important identified risks will have to be evaluated through observational studies and registries.
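
    One common way to examine a spontaneous-report database "for signals of disproportionality", as mentioned above, is the proportional reporting ratio (PRR) computed from a 2x2 table of reports. The sketch below shows the standard formula on made-up counts; it is a generic illustration, not the method of any specific agency or company.

        import math

        def prr(a, b, c, d):
            """Proportional reporting ratio with an approximate 95% confidence interval.

            a: reports with the drug and the event of interest
            b: reports with the drug and any other event
            c: reports with other drugs and the event of interest
            d: reports with other drugs and any other event
            """
            ratio = (a / (a + b)) / (c / (c + d))
            # Standard error of log(PRR), as usually approximated for spontaneous report data.
            se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
            lower = math.exp(math.log(ratio) - 1.96 * se)
            upper = math.exp(math.log(ratio) + 1.96 * se)
            return ratio, lower, upper

        # Made-up counts for illustration.
        print(prr(a=30, b=970, c=120, d=98880))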

  9. CHEMICAL STRUCTURE INDEXING OF TOXICITY DATA ON ...

    EPA Pesticide Factsheets

    Standardized chemical structure annotation of public toxicity databases and information resources is playing an increasingly important role in the 'flattening' and integration of diverse sets of biological activity data on the Internet. This review discusses public initiatives that are accelerating the pace of this transformation, with particular reference to toxicology-related chemical information. Chemical content annotators, structure locator services, large structure/data aggregator web sites, structure browsers, International Union of Pure and Applied Chemistry (IUPAC) International Chemical Identifier (InChI) codes, toxicity data models and public chemical/biological activity profiling initiatives are all playing a role in overcoming barriers to the integration of toxicity data, and are bringing researchers closer to the reality of a mineable chemical Semantic Web. An example of this integration of data is provided by the collaboration among researchers involved with the Distributed Structure-Searchable Toxicity (DSSTox) project, the Carcinogenic Potency Project, projects at the National Cancer Institute and the PubChem database. Standardizing chemical structure annotation of public toxicity databases

  10. A Grid Metadata Service for Earth and Environmental Sciences

    NASA Astrophysics Data System (ADS)

    Fiore, Sandro; Negro, Alessandro; Aloisio, Giovanni

    2010-05-01

    Critical challenges for climate modeling researchers are strongly connected with increasingly complex simulation models and the huge quantities of produced datasets. Future trends in climate modeling will only increase computational and storage requirements. For this reason, the ability to transparently access both computational and data resources for large-scale, complex climate simulations must be considered a key requirement for Earth Science and Environmental distributed systems. From the data management perspective, (i) the quantity of data will continuously increase, (ii) data will become more and more distributed and widespread, (iii) data sharing/federation will represent a key challenging issue among different sites distributed worldwide, and (iv) the potential community of users (large and heterogeneous) will be interested in discovering experimental results, searching metadata, browsing collections of files, comparing different results, displaying output, etc. A key element to carry out data search and discovery, and to manage and access huge and distributed amounts of data, is the metadata handling framework. What we propose for the management of distributed datasets is the GRelC service (a data grid solution focusing on metadata management). Unlike classical approaches, the proposed data-grid solution is able to address scalability, transparency, security, efficiency and interoperability. The GRelC service we propose is able to provide access to metadata stored in different and widespread data sources (relational databases running on top of MySQL, Oracle, DB2, etc., leveraging SQL as the query language, as well as XML databases - XIndice, eXist, and libxml2-based documents, adopting either XPath or XQuery), providing a strong data virtualization layer in a grid environment. Such a technological solution for distributed metadata management (i) leverages well-known, adopted standards (W3C, OASIS, etc.); (ii) supports role-based management (based on VOMS), which increases flexibility and scalability; (iii) provides full support for the Grid Security Infrastructure (authorization, mutual authentication, data integrity, data confidentiality and delegation); (iv) is compatible with existing grid middleware such as gLite and Globus; and finally (v) is currently adopted at the Euro-Mediterranean Centre for Climate Change (CMCC - Italy) to manage the entire CMCC data production activity as well as in the international Climate-G testbed.

  11. Seismic Search Engine: A distributed database for mining large scale seismic data

    NASA Astrophysics Data System (ADS)

    Liu, Y.; Vaidya, S.; Kuzma, H. A.

    2009-12-01

    The International Monitoring System (IMS) of the CTBTO collects terabytes worth of seismic measurements from many receiver stations situated around the earth with the goal of detecting underground nuclear testing events and distinguishing them from other benign, but more common events such as earthquakes and mine blasts. The International Data Center (IDC) processes and analyzes these measurements, as they are collected by the IMS, to summarize event detections in daily bulletins. Thereafter, the data measurements are archived into a large format database. Our proposed Seismic Search Engine (SSE) will facilitate a framework for data exploration of the seismic database as well as the development of seismic data mining algorithms. Analogous to GenBank, the annotated genetic sequence database maintained by NIH, through SSE, we intend to provide public access to seismic data and a set of processing and analysis tools, along with community-generated annotations and statistical models to help interpret the data. SSE will implement queries as user-defined functions composed from standard tools and models. Each query is compiled and executed over the database internally before reporting results back to the user. Since queries are expressed with standard tools and models, users can easily reproduce published results within this framework for peer-review and making metric comparisons. As an illustration, an example query is “what are the best receiver stations in East Asia for detecting events in the Middle East?” Evaluating this query involves listing all receiver stations in East Asia, characterizing known seismic events in that region, and constructing a profile for each receiver station to determine how effective its measurements are at predicting each event. The results of this query can be used to help prioritize how data is collected, identify defective instruments, and guide future sensor placements.
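
    Because SSE expresses queries as compositions of standard tools over the archived measurements, the example query in the abstract can be thought of, very roughly, as a join-and-rank over event and detection tables. The Python sketch below is a purely hypothetical illustration of that idea; the table layouts, column names and station assignments are invented and do not reflect the actual IDC/SSE schema.

        import pandas as pd

        # Invented toy tables standing in for the archived bulletins; the real schema differs.
        events = pd.DataFrame({"event_id": [1, 2, 3],
                               "region":   ["Middle East", "Middle East", "Middle East"]})
        detections = pd.DataFrame({"event_id": [1, 1, 2, 3, 3],
                                   "station":  ["MKAR", "USRK", "MKAR", "MKAR", "KSRS"],
                                   "station_region": ["East Asia"] * 5})

        # "Best receiver stations in East Asia for detecting events in the Middle East":
        # join detections to events, then rank stations by the fraction of events they detected.
        joined = detections.merge(events, on="event_id")
        hits = joined[joined["station_region"] == "East Asia"].groupby("station")["event_id"].nunique()
        detection_rate = (hits / events["event_id"].nunique()).sort_values(ascending=False)
        print(detection_rate)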

  12. Filling in the GAPS: evaluating completeness and coverage of open-access biodiversity databases in the United States

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Troia, Matthew J.; McManamay, Ryan A.

    Primary biodiversity data constitute observations of particular species at given points in time and space. Open-access electronic databases provide unprecedented access to these data, but their usefulness in characterizing species distributions and patterns in biodiversity depends on how complete species inventories are at a given survey location and how uniformly distributed survey locations are along dimensions of time, space, and environment. Our aim was to compare completeness and coverage among three open-access databases representing ten taxonomic groups (amphibians, birds, freshwater bivalves, crayfish, freshwater fish, fungi, insects, mammals, plants, and reptiles) in the contiguous United States. We compiled occurrence records from the Global Biodiversity Information Facility (GBIF), the North American Breeding Bird Survey (BBS), and federally administered fish surveys (FFS). In this study, we aggregated occurrence records by 0.1° × 0.1° grid cells and computed three completeness metrics to classify each grid cell as well-surveyed or not. Next, we compared frequency distributions of surveyed grid cells to background environmental conditions in a GIS and performed Kolmogorov–Smirnov tests to quantify coverage through time, along two spatial gradients, and along eight environmental gradients. The three databases contributed >13.6 million reliable occurrence records distributed among >190,000 grid cells. The percent of well-surveyed grid cells was substantially lower for GBIF (5.2%) than for systematic surveys (BBS and FFS; 82.5%). Still, the large number of GBIF occurrence records produced at least 250 well-surveyed grid cells for six of nine taxonomic groups. Coverages of systematic surveys were less biased across spatial and environmental dimensions but were more biased in temporal coverage compared to GBIF data. GBIF coverages also varied among taxonomic groups, consistent with commonly recognized geographic, environmental, and institutional sampling biases. Lastly, this comprehensive assessment of biodiversity data across the contiguous United States provides a prioritization scheme to fill in the gaps by contributing existing occurrence records to the public domain and planning future surveys.

  13. Filling in the GAPS: evaluating completeness and coverage of open-access biodiversity databases in the United States

    DOE PAGES

    Troia, Matthew J.; McManamay, Ryan A.

    2016-06-12

    Primary biodiversity data constitute observations of particular species at given points in time and space. Open-access electronic databases provide unprecedented access to these data, but their usefulness in characterizing species distributions and patterns in biodiversity depends on how complete species inventories are at a given survey location and how uniformly distributed survey locations are along dimensions of time, space, and environment. Our aim was to compare completeness and coverage among three open-access databases representing ten taxonomic groups (amphibians, birds, freshwater bivalves, crayfish, freshwater fish, fungi, insects, mammals, plants, and reptiles) in the contiguous United States. We compiled occurrence records from the Global Biodiversity Information Facility (GBIF), the North American Breeding Bird Survey (BBS), and federally administered fish surveys (FFS). In this study, we aggregated occurrence records by 0.1° × 0.1° grid cells and computed three completeness metrics to classify each grid cell as well-surveyed or not. Next, we compared frequency distributions of surveyed grid cells to background environmental conditions in a GIS and performed Kolmogorov–Smirnov tests to quantify coverage through time, along two spatial gradients, and along eight environmental gradients. The three databases contributed >13.6 million reliable occurrence records distributed among >190,000 grid cells. The percent of well-surveyed grid cells was substantially lower for GBIF (5.2%) than for systematic surveys (BBS and FFS; 82.5%). Still, the large number of GBIF occurrence records produced at least 250 well-surveyed grid cells for six of nine taxonomic groups. Coverages of systematic surveys were less biased across spatial and environmental dimensions but were more biased in temporal coverage compared to GBIF data. GBIF coverages also varied among taxonomic groups, consistent with commonly recognized geographic, environmental, and institutional sampling biases. Lastly, this comprehensive assessment of biodiversity data across the contiguous United States provides a prioritization scheme to fill in the gaps by contributing existing occurrence records to the public domain and planning future surveys.
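
    The aggregation step described above (binning occurrence records into 0.1° x 0.1° cells and then judging whether each cell is well surveyed) can be sketched with pandas. The example below is a simplified stand-in: it uses a single, made-up completeness rule (a minimum number of records per cell) rather than the three metrics used in the study, and the records are toy data.

        import numpy as np
        import pandas as pd

        # Toy occurrence records; real inputs would come from GBIF, BBS and FFS downloads.
        records = pd.DataFrame({
            "species": ["A", "B", "A", "C", "B", "A"],
            "lat": [35.01, 35.04, 35.06, 36.55, 36.58, 35.02],
            "lon": [-83.21, -83.25, -83.22, -84.91, -84.95, -83.28],
        })

        # Assign each record to a 0.1-degree grid cell by flooring its coordinates.
        records["cell_lat"] = np.floor(records["lat"] * 10) / 10
        records["cell_lon"] = np.floor(records["lon"] * 10) / 10

        cells = records.groupby(["cell_lat", "cell_lon"]).agg(
            n_records=("species", "size"),
            n_species=("species", "nunique"),
        )

        # Simplified completeness rule (illustrative only): at least 4 records in the cell.
        cells["well_surveyed"] = cells["n_records"] >= 4
        print(cells)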

  14. Building a database for statistical characterization of ELMs on DIII-D

    NASA Astrophysics Data System (ADS)

    Fritch, B. J.; Marinoni, A.; Bortolon, A.

    2017-10-01

    Edge localized modes (ELMs) are bursty instabilities which occur in the edge region of H-mode plasmas and have the potential to damage in-vessel components of future fusion machines by exposing the divertor region to large energy and particle fluxes during each ELM event. While most ELM studies focus on average quantities (e.g. energy loss per ELM), this work investigates the statistical distributions of ELM characteristics, as a function of plasma parameters. A semi-automatic algorithm is being used to create a database documenting trigger times of the tens of thousands of ELMs for DIII-D discharges in scenarios relevant to ITER, thus allowing statistically significant analysis. Probability distributions of inter-ELM periods and energy losses will be determined and related to relevant plasma parameters such as density, stored energy, and current in order to constrain models and improve estimates of the expected inter-ELM periods and sizes, both of which must be controlled in future reactors. Work supported in part by US DoE under the Science Undergraduate Laboratory Internships (SULI) program, DE-FC02-04ER54698 and DE-FG02- 94ER54235.
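
    Given the ELM trigger times collected by the semi-automatic algorithm, the inter-ELM period statistics described above reduce to differencing consecutive trigger times and histogramming them per discharge. A minimal sketch, assuming a hypothetical array of trigger times in seconds:

        import numpy as np

        # Hypothetical ELM trigger times (s) for one discharge, as produced by a detection algorithm.
        rng = np.random.default_rng(7)
        trigger_times = np.cumsum(rng.gamma(shape=4.0, scale=0.005, size=500)) + 1.0

        inter_elm = np.diff(trigger_times)            # inter-ELM periods (s)
        print(f"mean period: {inter_elm.mean()*1e3:.1f} ms, "
              f"std: {inter_elm.std()*1e3:.1f} ms, "
              f"5th/95th percentiles (ms): {np.percentile(inter_elm, [5, 95])*1e3}")

        # Empirical probability distribution of inter-ELM periods, ready to relate to plasma parameters.
        counts, edges = np.histogram(inter_elm, bins=30, density=True)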

  15. Legal Agreements and the Governance of Research Commons: Lessons from Materials Sharing in Mouse Genomics

    PubMed Central

    Mishra, Amrita

    2014-01-01

    Abstract Omics research infrastructure such as databases and bio-repositories requires effective governance to support pre-competitive research. Governance includes the use of legal agreements, such as Material Transfer Agreements (MTAs). We analyze the use of such agreements in the mouse research commons, including by two large-scale resource development projects: the International Knockout Mouse Consortium (IKMC) and International Mouse Phenotyping Consortium (IMPC). We combine an analysis of legal agreements and semi-structured interviews with 87 members of the mouse model research community to examine legal agreements in four contexts: (1) between researchers; (2) deposit into repositories; (3) distribution by repositories; and (4) exchanges between repositories, especially those that are consortium members of the IKMC and IMPC. We conclude that legal agreements for the deposit and distribution of research reagents should be kept as simple and standard as possible, especially when minimal enforcement capacity and resources exist. Simple and standardized legal agreements reduce transactional bottlenecks and facilitate the creation of a vibrant and sustainable research commons, supported by repositories and databases. PMID:24552652

  16. Estimation of time-delayed mutual information and bias for irregularly and sparsely sampled time-series

    PubMed Central

    Albers, D. J.; Hripcsak, George

    2012-01-01

    A method to estimate the time-dependent correlation via an empirical bias estimate of the time-delayed mutual information for a time-series is proposed. In particular, the bias of the time-delayed mutual information is shown to often be equivalent to the mutual information between two distributions of points from the same system separated by infinite time. Thus intuitively, estimation of the bias is reduced to estimation of the mutual information between distributions of data points separated by large time intervals. The proposed bias estimation techniques are shown to work for Lorenz equations data and glucose time series data of three patients from the Columbia University Medical Center database. PMID:22536009
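
    A crude version of the quantity discussed above (the time-delayed mutual information, with its bias approximated by the mutual information at a very large lag) can be estimated from a joint histogram. The sketch below uses a synthetic autocorrelated series and simple equal-width binning; it illustrates the estimator, not the authors' bias-correction procedure.

        import numpy as np

        def mutual_information(x, y, bins=16):
            """Histogram-based estimate of I(X;Y) in nats."""
            joint, _, _ = np.histogram2d(x, y, bins=bins)
            pxy = joint / joint.sum()
            px = pxy.sum(axis=1, keepdims=True)
            py = pxy.sum(axis=0, keepdims=True)
            nz = pxy > 0
            return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

        rng = np.random.default_rng(3)
        # Synthetic autocorrelated series standing in for a glucose time series.
        x = np.zeros(20_000)
        for t in range(1, x.size):
            x[t] = 0.95 * x[t - 1] + rng.normal()

        lag = 10
        tdmi = mutual_information(x[:-lag], x[lag:])
        # Bias proxy: mutual information between points separated by a very large lag.
        big = 5_000
        bias = mutual_information(x[:-big], x[big:])
        print(f"TDMI(lag={lag}) = {tdmi:.3f} nats, large-lag bias estimate = {bias:.3f} nats")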

  17. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    This report contains papers on the following topics: NREN Security Issues: Policies and Technologies; Layer Wars: Protect the Internet with Network Layer Security; Electronic Commission Management; Workflow 2000 - Electronic Document Authorization in Practice; Security Issues of a UNIX PEM Implementation; Implementing Privacy Enhanced Mail on VMS; Distributed Public Key Certificate Management; Protecting the Integrity of Privacy-enhanced Electronic Mail; Practical Authorization in Large Heterogeneous Distributed Systems; Security Issues in the Truffles File System; Issues surrounding the use of Cryptographic Algorithms and Smart Card Applications; Smart Card Augmentation of Kerberos; and An Overview of the Advanced Smart Card Access Control System. Selected papers were processed separately for inclusion in the Energy Science and Technology Database.

  18. A global organism detection and monitoring system for non-native species

    USGS Publications Warehouse

    Graham, J.; Newman, G.; Jarnevich, C.; Shory, R.; Stohlgren, T.J.

    2007-01-01

    Harmful invasive non-native species are a significant threat to native species and ecosystems, and the costs associated with non-native species in the United States are estimated at over $120 billion/year. While some local or regional databases exist for some taxonomic groups, there are no effective geographic databases designed to detect and monitor all species of non-native plants, animals, and pathogens. We developed a web-based solution called the Global Organism Detection and Monitoring (GODM) system to provide real-time data from a broad spectrum of users on the distribution and abundance of non-native species, including attributes of their habitats for predictive spatial modeling of current and potential distributions. The four major subsystems of GODM provide dynamic links between the organism data, web pages, spatial data, and modeling capabilities. The core survey database tables for recording invasive species survey data are organized into three categories: "Where, Who & When, and What." Organisms are identified with Taxonomic Serial Numbers from the Integrated Taxonomic Information System. To allow users to immediately see a map of their data combined with other users' data, a custom geographic information system (GIS) Internet solution was required. The GIS solution provides an unprecedented level of flexibility in database access, allowing users to display maps of invasive species distributions or abundances based on various criteria including taxonomic classification (i.e., phylum or division, order, class, family, genus, species, subspecies, and variety), a specific project, a range of dates, and a range of attributes (percent cover, age, height, sex, weight). This is a significant paradigm shift from "map servers" to true Internet-based GIS solutions. The remainder of the system was created with a mix of commercial products, open source software, and custom software. Custom GIS libraries were created where required for processing large datasets, accessing the operating system, and using existing libraries in C++, R, and other languages to develop the tools to track harmful species in space and time. The GODM database and system are crucial for early detection and rapid containment of invasive species. © 2007 Elsevier B.V. All rights reserved.
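
    The "Where, Who & When, and What" organization of the survey tables can be illustrated with a toy relational schema. The sketch below uses Python's built-in sqlite3 module, and the table and column names are invented for illustration; the actual GODM schema is certainly richer.

        import sqlite3

        con = sqlite3.connect(":memory:")
        con.executescript("""
        CREATE TABLE location (          -- "Where": position and spatial uncertainty
            location_id INTEGER PRIMARY KEY,
            latitude REAL, longitude REAL, uncertainty_m REAL
        );
        CREATE TABLE survey_event (      -- "Who & When": observer, project and date
            event_id INTEGER PRIMARY KEY,
            location_id INTEGER REFERENCES location(location_id),
            observer TEXT, project TEXT, observed_on TEXT
        );
        CREATE TABLE observation (       -- "What": taxon (ITIS TSN) and measured attributes
            observation_id INTEGER PRIMARY KEY,
            event_id INTEGER REFERENCES survey_event(event_id),
            itis_tsn INTEGER, percent_cover REAL, abundance INTEGER
        );
        """)

        # Example query: all observations of a given taxon (placeholder TSN) with their coordinates.
        rows = con.execute("""
            SELECT o.itis_tsn, l.latitude, l.longitude
            FROM observation o
            JOIN survey_event e ON e.event_id = o.event_id
            JOIN location l ON l.location_id = e.location_id
            WHERE o.itis_tsn = ?
        """, (36455,)).fetchall()
        print(rows)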

  19. Database System Design and Implementation for Marine Air-Traffic-Controller Training

    DTIC Science & Technology

    2017-06-01

    NAVAL POSTGRADUATE SCHOOL, MONTEREY, CALIFORNIA - THESIS. Approved for public release; distribution is unlimited. DATABASE SYSTEM DESIGN AND IMPLEMENTATION FOR MARINE AIR-TRAFFIC-CONTROLLER TRAINING ... This project focused on the design, development, and implementation of a centralized ...

  20. Optical/IR from ground

    NASA Technical Reports Server (NTRS)

    Strom, Stephen; Sargent, Wallace L. W.; Wolff, Sidney; Ahearn, Michael F.; Angel, J. Roger; Beckwith, Steven V. W.; Carney, Bruce W.; Conti, Peter S.; Edwards, Suzan; Grasdalen, Gary

    1991-01-01

    Optical/infrared (O/IR) astronomy in the 1990's is reviewed. The following subject areas are included: research environment; science opportunities; technical development of the 1980's and opportunities for the 1990's; and ground-based O/IR astronomy outside the U.S. Recommendations are presented for: (1) large scale programs (Priority 1: a coordinated program for large O/IR telescopes); (2) medium scale programs (Priority 1: a coordinated program for high angular resolution; Priority 2: a new generation of 4-m class telescopes); (3) small scale programs (Priority 1: near-IR and optical all-sky surveys; Priority 2: a National Astrometric Facility); and (4) infrastructure issues (develop, purchase, and distribute optical CCDs and infrared arrays; a program to support large optics technology; a new generation of large filled aperture telescopes; a program to archive and disseminate astronomical databases; and a program for training new instrumentalists)

  1. An analysis of the lithology to resistivity relationships using airborne EM and boreholes

    NASA Astrophysics Data System (ADS)

    Barfod, Adrian A. S.; Christiansen, Anders V.; Møller, Ingelise

    2014-05-01

    We present a study of the relationship between dense airborne SkyTEM resistivity data and sparse lithological borehole data. Understanding the geological structures of the subsurface is of great importance to hydrogeological surveys. Large-scale geological information can be gathered directly from boreholes or indirectly from large geophysical surveys. Borehole data provide detailed lithological information only at the position of the borehole and, due to the sparse nature of boreholes, rarely provide the information needed for high-accuracy groundwater models. Airborne geophysical data, on the other hand, provide dense spatial coverage, but only bear indirect information on lithology through the resistivity models. Hitherto, the integration of geophysical data into geological and hydrogeological models has often been subjective, largely undocumented and painstakingly manual. This project presents a detailed study of the relationships between resistivity data and lithological borehole data. The purpose is to objectively describe and document the relationships between lithology and geophysical parameters. The project has focused on utilizing preexisting datasets from the Danish national borehole database (JUPITER) and national geophysical database (GERDA). The study presented here is from the Norsminde catchment area (208 sq. km), situated in the municipality of Odder, Denmark. The Norsminde area contains a total of 758 boreholes and 106,770 SkyTEM soundings. The large amounts of data make the Norsminde area ideal for studying the relationship between geophysical data and lithological data. The subsurface is discretized into 20 cm horizontal sampling intervals from the highest elevation point to the depth of the deepest borehole. For each of these intervals a resistivity value is calculated at the position of the boreholes using a kriging formulation. The lithology data from the boreholes are then used to categorize the interpolated resistivity values according to lithology. The end result of this comparison is a set of resistivity distributions for different lithology categories. The distributions provide detailed, objective information on the resistivity properties of the subsurface and document the resistivity imaging of the geological lithologies. We show that different lithologies are mapped at distinctly different resistivities, but also that the geophysical inversion strategy influences the resulting distributions significantly.
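
    A minimal sketch of the final comparison step described above: grouping resistivity values (already interpolated to borehole positions) by borehole lithology and summarizing the per-lithology distributions. The sample values and lithology classes are made up for illustration:

    ```python
    import statistics
    from collections import defaultdict

    # Hypothetical (resistivity [ohm-m], lithology) pairs, standing in for SkyTEM
    # resistivities kriged to borehole positions at 20 cm depth intervals.
    samples = [
        (12.0, "clay"), (9.5, "clay"), (15.2, "clay"),
        (85.0, "sand"), (120.0, "sand"), (95.4, "sand"),
        (45.0, "till"), (60.3, "till"),
    ]

    by_lithology = defaultdict(list)
    for resistivity, lithology in samples:
        by_lithology[lithology].append(resistivity)

    # Per-lithology resistivity distributions (here just median and spread).
    for lithology, values in by_lithology.items():
        print(lithology, round(statistics.median(values), 1),
              round(statistics.pstdev(values), 1))
    ```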

  2. An indoor positioning technology in the BLE mobile payment system

    NASA Astrophysics Data System (ADS)

    Han, Tiantian; Ding, Lei

    2017-05-01

    In a mobile payment system for large supermarkets, the core function, payment, is implemented with BLE low-power Bluetooth technology; an indoor positioning technology can then be added to provide value-added services. The technique collects Bluetooth RSSI values and builds a fingerprint database of the corresponding sampling points. The RSSI of the Bluetooth module is obtained through the AP, and a k-Nearest Neighbor search matches measured values against the fingerprint database. This lets businesses locate customers within the mall and, combined with the settlement amount of the customer's purchases, analyze customer behavior. When the system collects signal strength, the distribution of RSSI at the sampling points is analyzed and the values are filtered. A system used in the laboratory was designed to demonstrate feasibility.
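
    A minimal sketch of the k-Nearest Neighbor fingerprint matching described above, assuming a toy fingerprint database of RSSI vectors from three beacons at known positions; the beacon count, positions and RSSI values are invented for illustration:

    ```python
    import math

    # Hypothetical fingerprint database: known positions -> RSSI vector from 3 BLE beacons.
    fingerprints = {
        (0.0, 0.0): [-48, -71, -80],
        (0.0, 5.0): [-60, -55, -78],
        (5.0, 0.0): [-70, -75, -52],
        (5.0, 5.0): [-66, -58, -57],
    }

    def knn_locate(rssi, k=3):
        """Estimate position as the average of the k nearest fingerprints in RSSI space."""
        ranked = sorted(fingerprints.items(),
                        key=lambda item: math.dist(item[1], rssi))
        nearest = ranked[:k]
        x = sum(p[0] for p, _ in nearest) / k
        y = sum(p[1] for p, _ in nearest) / k
        return x, y

    print(knn_locate([-58, -60, -70]))  # rough position from a measured RSSI vector
    ```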

  3. Data Aggregation System: A system for information retrieval on demand over relational and non-relational distributed data sources

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ball, G.; Kuznetsov, V.; Evans, D.

    We present the Data Aggregation System (DAS), a system for information retrieval and aggregation from heterogeneous sources of relational and non-relational data for the Compact Muon Solenoid experiment on the CERN Large Hadron Collider. The experiment currently has a number of organically-developed data sources, including front-ends to a number of different relational databases and non-database data services which do not share common data structures or APIs (Application Programming Interfaces), and cannot at this stage be readily converged. DAS provides a single interface for querying all these services, a caching layer to speed up access to expensive underlying calls, and the ability to merge records from different data services pertaining to a single primary key.
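
    A rough sketch of the record-merging and caching idea described above, under strong simplifying assumptions: two toy "services" stand in for the heterogeneous CMS data sources, and Python's lru_cache stands in for the caching layer; none of this reflects the actual DAS code or APIs:

    ```python
    import functools

    # Two hypothetical "data services" returning partial records keyed by dataset name;
    # stand-ins for the heterogeneous sources a system like DAS federates.
    def catalog_service(dataset):
        return {"dataset": dataset, "nevents": 1_200_000}

    def location_service(dataset):
        return {"dataset": dataset, "site": "T1_EXAMPLE"}

    @functools.lru_cache(maxsize=1024)          # caching layer for expensive calls
    def query(dataset):
        """Merge records from both services on the shared primary key 'dataset'."""
        merged = {}
        for service in (catalog_service, location_service):
            merged.update(service(dataset))
        return merged

    print(query("/Example/Dataset/RAW"))
    print(query("/Example/Dataset/RAW"))        # second call served from the cache
    ```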

  4. Change detection analysis of multi-temporal imagery to assess environmental development on AL Sammalyah Island, Abu-Dhabi

    NASA Astrophysics Data System (ADS)

    Essa, Salem M.; Loughland, R.; Khogali, Mohamed E.

    2005-10-01

    AL Sammalyah Island is considered an important protected area in Abu Dhabi Emirate. The island has witnessed high rates of change in land use in the past few years, starting from the early 1990s. Change detection analysis is conducted to monitor the rate and spatial distribution of change occurring on the island. A three-phase research project has been implemented, with an integrated Geographic Information System (GIS) database for the island as its focus; the main objective of the current phase was to assess the rate and spatial distribution of change on the island using multi-date, large-scale aerial photos. Results of the current study demonstrated that total vegetation cover extent increased from 3.742 km2 in 1994 to 5.101 km2 in 2005, an increase of 36.3% between 1994 and 2005. The study also showed that this increase in vegetation extent is mostly attributed to the increase in mangrove-planted areas, which grew from 2.256 km2 in 1994 to 3.568 km2 in 2005, an increase of 58.2%. Remote sensing and GIS have been successfully used to quantify the extent, distribution and trajectories of change. The next step will be to complete the GIS database for AL Sammalyah Island.

  5. Regional trends and controlling factors of fatal landslides in Latin America and the Caribbean

    NASA Astrophysics Data System (ADS)

    Sepúlveda, S. A.; Petley, D. N.

    2015-04-01

    A database of landslides that caused loss of life in Latin America and the Caribbean in the period from 2004 to 2013 inclusive has been compiled using established techniques. This database indicates that in this ten-year period a total of 11 631 people lost their lives across the region in 611 landslides. The geographical distribution of the landslides is very heterogeneous, with areas of high incidence in parts of the Caribbean (most notably Haiti), Central America, Colombia, and SE Brazil. The number of landslides varies considerably between years; the El Niño/La Niña cycle emerges as a major factor controlling this variation, although the study period did not capture a large event. Analysis suggests that on a continental scale the mapped factors that best explain the observed distribution are topography, annual precipitation and population density. On a national basis we have compared the occurrence of fatality-inducing landslides with the production of research articles with a local author, which shows that there is a landslide research deficit in Latin America and the Caribbean. Understanding better the mechanisms, distributions, causes and triggers of landslides in Latin America and the Caribbean must be an essential first step towards managing the hazard.

  6. Designing for Peta-Scale in the LSST Database

    NASA Astrophysics Data System (ADS)

    Kantor, J.; Axelrod, T.; Becla, J.; Cook, K.; Nikolaev, S.; Gray, J.; Plante, R.; Nieto-Santisteban, M.; Szalay, A.; Thakar, A.

    2007-10-01

    The Large Synoptic Survey Telescope (LSST), a proposed ground-based 8.4 m telescope with a 10 deg^2 field of view, will generate 15 TB of raw images every observing night. When calibration and processed data are added, the image archive, catalogs, and meta-data will grow by 15 PB yr^{-1} on average. The LSST Data Management System (DMS) must capture, process, store, index, replicate, and provide open access to this data. Alerts must be triggered within 30 s of data acquisition. To do this in real time at these data volumes will require advances in data management, database, and file system techniques. This paper describes the design of the LSST DMS and emphasizes features for peta-scale data. The LSST DMS will employ a combination of distributed database and file systems, with schema, partitioning, and indexing oriented for parallel operations. Image files are stored in a distributed file system with references to, and meta-data from, each file stored in the databases. The schema design supports pipeline processing, rapid ingest, and efficient query. Vertical partitioning reduces disk input/output requirements, while horizontal partitioning allows parallel data access using arrays of servers and disks. Indexing is extensive, utilizing both conventional RAM-resident indexes and column-narrow, row-deep tag tables/covering indices that are extracted from tables that contain many more attributes. The DMS Data Access Framework is encapsulated in a middleware framework to provide a uniform service interface to all framework capabilities. This framework will provide the automated work-flow, replication, and data analysis capabilities necessary to make data processing and data quality analysis feasible at this scale.

  7. Visibiome: an efficient microbiome search engine based on a scalable, distributed architecture.

    PubMed

    Azman, Syafiq Kamarul; Anwar, Muhammad Zohaib; Henschel, Andreas

    2017-07-24

    Given the current influx of 16S rRNA profiles of microbiota samples, it is conceivable that large amounts of them will eventually be available for search, comparison and contextualization with respect to novel samples. This process facilitates the identification of similar compositional features in microbiota elsewhere and therefore can help to understand the driving factors for microbial community assembly. We present Visibiome, a microbiome search engine that can perform exhaustive, phylogeny-based similarity search and contextualization of user-provided samples against a comprehensive dataset of 16S rRNA profiles from a wide range of environments, while tackling several computational challenges. In order to scale to high demands, we developed a distributed system that combines web framework technology, task queueing and scheduling, cloud computing and a dedicated database server. To further ensure speed and efficiency, we have deployed nearest neighbor search algorithms, capable of sublinear searches in high-dimensional metric spaces, in combination with an optimized Earth Mover Distance based implementation of weighted UniFrac. The search also incorporates pairwise (adaptive) rarefaction and, optionally, 16S rRNA copy number correction. The result of a query microbiome sample is its contextualization against a comprehensive database of microbiome samples from a diverse range of environments, visualized through a rich set of interactive figures and diagrams, including barchart-based compositional comparisons and ranking of the closest matches in the database. Visibiome is a convenient, scalable and efficient framework to search microbiomes against a comprehensive database of environmental samples. The search engine leverages a popular but computationally expensive, phylogeny-based distance metric, while providing numerous advantages over the current state of the art tool.

  8. Percentiles of the product of uncertainty factors for establishing probabilistic reference doses.

    PubMed

    Gaylor, D W; Kodell, R L

    2000-04-01

    Exposure guidelines for potentially toxic substances are often based on a reference dose (RfD) that is determined by dividing a no-observed-adverse-effect-level (NOAEL), lowest-observed-adverse-effect-level (LOAEL), or benchmark dose (BD) corresponding to a low level of risk, by a product of uncertainty factors. The uncertainty factors for animal to human extrapolation, variable sensitivities among humans, extrapolation from measured subchronic effects to unknown results for chronic exposures, and extrapolation from a LOAEL to a NOAEL can be thought of as random variables that vary from chemical to chemical. Selected databases are examined that provide distributions across chemicals of inter- and intraspecies effects, ratios of LOAELs to NOAELs, and differences in acute and chronic effects, to illustrate the determination of percentiles for uncertainty factors. The distributions of uncertainty factors tend to be approximately lognormally distributed. The logarithm of the product of independent uncertainty factors is approximately distributed as the sum of normally distributed variables, making it possible to estimate percentiles for the product. Hence, the size of the products of uncertainty factors can be selected to provide adequate safety for a large percentage (e.g., approximately 95%) of RfDs. For the databases used to describe the distributions of uncertainty factors, using values of 10 appears to be reasonable and conservative. For the databases examined, the following simple "Rule of 3s" is suggested that exceeds the estimated 95th percentile of the product of uncertainty factors: if only a single uncertainty factor is required use 33, for any two uncertainty factors use 3 x 33 ≈ 100, for any three uncertainty factors use a combined factor of 3 x 100 = 300, and if all four uncertainty factors are needed use a total factor of 3 x 300 = 900. If near the 99th percentile is desired, use another factor of 3. An additional factor may be needed for inadequate data, or a modifying factor for other uncertainties (e.g., different routes of exposure) not covered above.
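
    The percentile calculation described above can be illustrated with a short sketch: if each uncertainty factor is lognormal, the log of their product is a sum of normals, so an upper percentile of the product follows directly. The medians and geometric standard deviations below are illustrative, not the values estimated from the databases in the paper:

    ```python
    import math

    # Hypothetical lognormal uncertainty factors: (median, geometric standard deviation).
    # Values are illustrative placeholders, not the paper's estimates.
    factors = [(3.0, 2.5), (3.0, 2.5), (2.0, 2.0)]

    # Log of the product of independent lognormals is normal with these moments:
    mu = sum(math.log(median) for median, _ in factors)
    var = sum(math.log(gsd) ** 2 for _, gsd in factors)

    z95 = 1.645  # standard normal 95th percentile
    p95_product = math.exp(mu + z95 * math.sqrt(var))
    print(round(p95_product, 1))
    ```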

  9. Life Sciences Data Archive (LSDA)

    NASA Technical Reports Server (NTRS)

    Fitts, M.; Johnson-Throop, Kathy; Thomas, D.; Shackelford, K.

    2008-01-01

    In the early days of spaceflight, space life sciences data were collected and stored in numerous databases, formats, media types and geographical locations. While serving the needs of individual research teams, these data were largely unknown or unavailable to the scientific community at large. As a result, the Space Act of 1958 and the Science Data Management Policy mandated that research data collected by the National Aeronautics and Space Administration be made available to the science community at large. The Biomedical Informatics and Health Care Systems Branch of the Space Life Sciences Directorate at JSC and the Data Archive Project at ARC, with funding from the Human Research Program through the Exploration Medical Capability Element, are fulfilling these requirements through the systematic population of the Life Sciences Data Archive. This program constitutes a formal system for the acquisition, archival and distribution of data for Life Sciences-sponsored experiments and investigations. The general goal of the archive is to acquire, preserve, and distribute these data using a variety of media which are accessible and responsive to inquiries from the science communities.

  10. Studies of Tissue Perfusion Failure at LAC+USCMC and the Incorporation of the Results into a National Trauma Database

    DTIC Science & Technology

    2005-04-01

    Extraction fragments from the report's reference list and author listings; recoverable items include a cited reference (Gibson JG, Seligman AM, Peacock WC, et al.: The circulating red cell and plasma volume and the distribution of blood ...) and the article "Outcome Prediction of Emergency Patients by Noninvasive Hemodynamic Monitoring" by William C. Shoemaker, MD; Charles C. J. Wo, BS; Li-Chien Chien, MD; Kevin Lu, MD; Matthew J. Martin, MD; Linda S. Chan, PhD; Demetrios Demetriades, MD, PhD; and colleagues.

  11. A Secure Multicast Framework in Large and High-Mobility Network Groups

    NASA Astrophysics Data System (ADS)

    Lee, Jung-San; Chang, Chin-Chen

    With the widespread use of Internet applications such as teleconferencing, Pay-TV, collaborative tasks, and message services, how to construct and distribute the group session key to all group members securely is becoming more and more important. Instead of adopting point-to-point packet delivery, these emerging applications are based upon the mechanism of multicast communication, which allows group members to communicate with multiple parties efficiently. There are two main issues in the mechanism of multicast communication: key distribution and scalability. The first issue is how to distribute the group session key to all group members securely. The second is how to maintain high performance in large network groups. Group members in conventional multicast systems have to keep numerous secret keys in databases, which makes it very inconvenient for them. Furthermore, in case a member joins or leaves the communication group, many involved participants have to change their own secret keys to preserve forward secrecy and backward secrecy. We consequently propose a novel scheme for providing secure multicast communication in large network groups. Our proposed framework not only preserves forward secrecy and backward secrecy but also possesses better performance than existing alternatives. Specifically, simulation results demonstrate that our scheme is suitable for high-mobility environments.

  12. Monte Carlo simulation of prompt γ-ray emission in proton therapy using a specific track length estimator

    NASA Astrophysics Data System (ADS)

    El Kanawati, W.; Létang, J. M.; Dauvergne, D.; Pinto, M.; Sarrut, D.; Testa, É.; Freud, N.

    2015-10-01

    A Monte Carlo (MC) variance reduction technique is developed for prompt-γ emission calculations in proton therapy. Prompt γ-rays emitted through nuclear fragmentation reactions and exiting the patient during proton therapy could play an important role in helping to monitor the treatment. However, the estimation of the number and energy of emitted prompt-γ per primary proton with MC simulations is a slow process. In order to estimate the local distribution of prompt-γ emission in a volume of interest for a given proton beam of the treatment plan, a MC variance reduction technique based on a specific track length estimator (TLE) has been developed. First, an elemental database of prompt-γ emission spectra is established in the clinical energy range of incident protons for all elements in the composition of human tissues. This database of prompt-γ spectra is built offline with high statistics. Regarding the implementation of the prompt-γ TLE MC tally, each proton deposits along its track the expectation of the prompt-γ spectra from the database according to the proton kinetic energy and the local material composition. A detailed statistical study shows that the relative efficiency mainly depends on the geometrical distribution of the track length. Benchmarking of the proposed prompt-γ TLE MC technique with respect to an analogous MC technique is carried out. A large relative efficiency gain is reported, ca. 10^5.
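
    A schematic sketch of the track length estimator idea described above: along each proton step, an expected prompt-γ spectrum looked up from a precomputed database is scored, weighted by the step length. The lookup function, energy bins and step data are placeholders, not the actual elemental database or the authors' MC implementation:

    ```python
    import numpy as np

    ENERGY_BINS = np.linspace(0.0, 10.0, 11)          # prompt-gamma energy bins [MeV]

    def expected_spectrum(proton_energy_mev, material):
        """Placeholder lookup: prompt-gamma yield per unit track length per energy bin."""
        scale = {"water": 1.0, "bone": 1.3}[material]  # invented material scaling
        return scale * 1e-4 * np.exp(-ENERGY_BINS[:-1] / max(proton_energy_mev, 1.0))

    def tle_tally(steps):
        """Accumulate the expected spectrum over (length_cm, energy_MeV, material) steps."""
        tally = np.zeros(len(ENERGY_BINS) - 1)
        for length, energy, material in steps:
            tally += length * expected_spectrum(energy, material)
        return tally

    # A toy proton track made of three steps.
    track = [(0.5, 150.0, "water"), (0.5, 120.0, "water"), (0.2, 90.0, "bone")]
    print(tle_tally(track))
    ```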

  13. Clinical results of HIS, RIS, PACS integration using data integration CASE tools

    NASA Astrophysics Data System (ADS)

    Taira, Ricky K.; Chan, Hing-Ming; Breant, Claudine M.; Huang, Lu J.; Valentino, Daniel J.

    1995-05-01

    Current infrastructure research in PACS is dominated by the development of communication networks (local area networks, teleradiology, ATM networks, etc.), multimedia display workstations, and hierarchical image storage architectures. However, limited work has been performed on developing flexible, expansible, and intelligent information processing architectures for the vast decentralized image and text data repositories prevalent in healthcare environments. Patient information is often distributed among multiple data management systems. Current large-scale efforts to integrate medical information and knowledge sources have been costly, with limited retrieval functionality. Software integration strategies to unify distributed data and knowledge sources are still lacking commercially. Systems heterogeneity (i.e., differences in hardware platforms, communication protocols, database management software, nomenclature, etc.) is at the heart of the problem and is unlikely to be standardized in the near future. In this paper, we demonstrate the use of newly available CASE (computer-aided software engineering) tools to rapidly integrate HIS, RIS, and PACS information systems. The advantages of these tools include fast development time (low-level code is generated from graphical specifications) and easy system maintenance (excellent documentation, easy-to-perform changes, and a centralized code repository in an object-oriented database). The CASE tools are used to develop and manage the 'middleware' in our client-mediator-server architecture for systems integration. Our architecture is scalable and can accommodate heterogeneous database and communication protocols.

  14. OBSIFRAC: database-supported software for 3D modeling of rock mass fragmentation

    NASA Astrophysics Data System (ADS)

    Empereur-Mot, Luc; Villemin, Thierry

    2003-03-01

    Under stress, fractures in rock masses tend to form fully connected networks. The mass can thus be thought of as a 3D series of blocks produced by fragmentation processes. A numerical model has been developed that uses a relational database to describe such a mass. The model, which assumes the fractures to be planar, allows data from natural networks to be used to test theories concerning fragmentation processes. In the model, blocks are bordered by faces that are composed of edges and vertices. A fracture can originate from a seed point, its orientation being controlled by the stress field specified by an orientation matrix. Alternatively, it can be generated from a discrete set of given orientations and positions. Both kinds of fracture can occur together in a model. From an original simple block, a given fracture produces two simple polyhedral blocks, and the original block becomes compound. Compound and simple blocks created throughout fragmentation are stored in the database. Several fragmentation processes have been studied. In one scenario, a constant proportion of blocks is fragmented at each step of the process. The resulting distribution appears to be fractal, although seed points are random in each fragmented block. In a second scenario, division affects only one random block at each stage of the process, and gives a Weibull volume distribution law. This software can be used for a large number of other applications.

  15. Effects of distributed database modeling on evaluation of transaction rollbacks

    NASA Technical Reports Server (NTRS)

    Mukkamala, Ravi

    1991-01-01

    Data distribution, degree of data replication, and transaction access patterns are key factors in determining the performance of distributed database systems. In order to simplify the evaluation of performance measures, database designers and researchers tend to make simplistic assumptions about the system. The effect of modeling assumptions on the evaluation of one such measure, the number of transaction rollbacks, is studied in a partitioned distributed database system. Six probabilistic models are developed, with expressions for the number of rollbacks under each of these models. Essentially, the models differ in terms of the available system information. The analytical results so obtained are compared to results from simulation. From this, it is concluded that most of the probabilistic models yield overly conservative estimates of the number of rollbacks. The effect of transaction commutativity on system throughput is also grossly undermined when such models are employed.

  16. Effects of distributed database modeling on evaluation of transaction rollbacks

    NASA Technical Reports Server (NTRS)

    Mukkamala, Ravi

    1991-01-01

    Data distribution, degree of data replication, and transaction access patterns are key factors in determining the performance of distributed database systems. In order to simplify the evaluation of performance measures, database designers and researchers tend to make simplistic assumptions about the system. Here, researchers investigate the effect of modeling assumptions on the evaluation of one such measure, the number of transaction rollbacks in a partitioned distributed database system. The researchers developed six probabilistic models and expressions for the number of rollbacks under each of these models. Essentially, the models differ in terms of the available system information. The analytical results obtained are compared to results from simulation. It was concluded that most of the probabilistic models yield overly conservative estimates of the number of rollbacks. The effect of transaction commutativity on system throughput is also grossly undermined when such models are employed.

  17. Deriving spatial patterns from a novel database of volcanic rock geochemistry in the Virunga Volcanic Province, East African Rift

    NASA Astrophysics Data System (ADS)

    Poppe, Sam; Barette, Florian; Smets, Benoît; Benbakkar, Mhammed; Kervyn, Matthieu

    2016-04-01

    The Virunga Volcanic Province (VVP) is situated within the western branch of the East African Rift. The geochemistry and petrology of its volcanic products have been studied extensively, but in a fragmented manner. They represent a unique collection of silica-undersaturated, ultra-alkaline and ultra-potassic compositions, displaying marked geochemical variations over the area occupied by the VVP. We present a novel spatially-explicit database of existing whole-rock geochemical analyses of the VVP volcanics, compiled from international publications, (post-)colonial scientific reports and PhD theses. In the database, a total of 703 geochemical analyses of whole-rock samples collected from the 1950s until recently have been characterised with a geographical location, eruption source location, analytical results and uncertainty estimates for each of these categories. Comparative box plots and Kruskal-Wallis H tests on subsets of analyses with contrasting ages or analytical methods suggest that the overall database accuracy is consistent. We demonstrate how statistical techniques such as Principal Component Analysis (PCA) and subsequent cluster analysis allow the identification of clusters of samples with similar major-element compositions. The spatial patterns represented by the contrasting clusters show that both historically active volcanoes represent compositional clusters that can be identified based on their contrasting silica and alkali contents. Furthermore, two sample clusters are interpreted to represent the most primitive, deep magma source within the VVP, different from the shallow magma reservoirs that feed the eight dominant large volcanoes. The samples from these two clusters systematically originate from locations which (1) are distal compared to the eight large volcanoes and (2) mostly coincide with the surface expressions of rift faults or NE-SW-oriented inherited Precambrian structures which were reactivated during rifting. The lava from the Mugogo eruption of 1957 belongs to these primitive clusters and is the only one known to have erupted outside the current rift valley in historical times. We thus infer that there is a distributed hazard of vent-opening susceptibility in addition to the susceptibility associated with the main Virunga edifices. This study suggests that the statistical analysis of such a geochemical database may help to understand complex volcanic plumbing systems and the spatial distribution of volcanic hazards in active and poorly known volcanic areas such as the Virunga Volcanic Province.
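
    A minimal sketch of the PCA-plus-clustering workflow mentioned above, using scikit-learn on a handful of made-up major-element analyses; the oxide values, number of components and number of clusters are all illustrative assumptions, not the database or the authors' settings:

    ```python
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    # Hypothetical major-element analyses (SiO2, K2O, MgO in wt%); the real database
    # holds 703 whole-rock analyses with many more oxides.
    X = np.array([
        [42.1, 3.5, 8.2],
        [43.0, 3.8, 7.9],
        [47.5, 5.2, 4.1],
        [48.1, 5.5, 3.8],
        [44.9, 4.4, 6.0],
        [46.7, 5.0, 4.5],
    ])

    Xs = StandardScaler().fit_transform(X)          # standardize before PCA
    scores = PCA(n_components=2).fit_transform(Xs)  # principal component scores
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scores)
    print(labels)                                   # cluster membership per sample
    ```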

  18. A checklist of macroparasites of Liza haematocheila (Temminck & Schlegel) (Teleostei: Mugilidae)

    PubMed Central

    Kostadinova, Aneta

    2008-01-01

    Background The mugilid fish Liza haematocheila (syn. Mugil soiuy), native to the Western North Pacific, provides opportunities to examine the changes in its parasite fauna after its translocation to the Sea of Azov and subsequent establishment in the Black Sea. However, the information on macroparasites of this host in both ranges of its current distribution comes from isolated studies published in difficult-to-access literature sources. Materials and methods Data from 53 publications, predominantly in Chinese, Russian and Ukrainian, were compiled from an extensive search of the literature and the Host-Parasite Database maintained up to 2005 at the Natural History Museum, London. Results The complete checklist of the metazoan parasites of L. haematocheila throughout its distributional range comprises summarised information for 69 nominal species of helminth and ectoparasitic crustacean parasites, from 45 genera and 27 families (370 host-parasite records in total), and includes the name of the parasite species, the area/locality of the host capture, and the author and date of the published record. The taxonomy is updated and the validity of the records and synonymies are critically evaluated. A comparison of the parasite faunas based on the records in the native and introduced/invasive range of L. haematocheila suggests that a large number of parasite species was 'lost' in the new distributional range whereas an even greater number was 'gained'. Conclusion Although the present checklist provides information that will facilitate future studies, the interesting question of macroparasite faunal diversity in L. haematocheila in its natural and introduced/invasive ranges cannot be addressed with the current data because of the unreliability associated with the large number of undocumented and questionable records. This stresses the importance of data quality analysis when using host-parasite database and checklist data. PMID:19117506

  19. A checklist of macroparasites of Liza haematocheila (Temminck & Schlegel) (Teleostei: Mugilidae).

    PubMed

    Kostadinova, Aneta

    2008-12-31

    The mugilid fish Liza haematocheila (syn. Mugil soiuy), native to the Western North Pacific, provides opportunities to examine the changes in its parasite fauna after its translocation to the Sea of Azov and subsequent establishment in the Black Sea. However, the information on macroparasites of this host in both ranges of its current distribution comes from isolated studies published in difficult-to-access literature sources. Data from 53 publications, predominantly in Chinese, Russian and Ukrainian, were compiled from an extensive search of the literature and the Host-Parasite Database maintained up to 2005 at the Natural History Museum, London. The complete checklist of the metazoan parasites of L. haematocheila throughout its distributional range comprises summarised information for 69 nominal species of helminth and ectoparasitic crustacean parasites, from 45 genera and 27 families (370 host-parasite records in total), and includes the name of the parasite species, the area/locality of the host capture, and the author and date of the published record. The taxonomy is updated and the validity of the records and synonymies are critically evaluated. A comparison of the parasite faunas based on the records in the native and introduced/invasive range of L. haematocheila suggests that a large number of parasite species was 'lost' in the new distributional range whereas an even greater number was 'gained'. Although the present checklist provides information that will facilitate future studies, the interesting question of macroparasite faunal diversity in L. haematocheila in its natural and introduced/invasive ranges cannot be addressed with the current data because of the unreliability associated with the large number of undocumented and questionable records. This stresses the importance of data quality analysis when using host-parasite database and checklist data.

  20. Landslide database dominated by rainfall triggered events

    NASA Astrophysics Data System (ADS)

    Devoli, G.; Strauch, W.; Álvarez, A.

    2009-04-01

    A digital landslide database has been created for Nicaragua to provide the scientific community and national authorities with a tool for landslide hazard assessment. Valuable information on landslide events has been obtained from a great variety of sources. On the basis of the data stored in the database, preliminary analyses performed at the national scale aimed to characterize landslides in terms of spatial and temporal distribution, types of slope movements, triggering mechanisms, number of casualties and damage to infrastructure. A total of about 17,000 events spatially distributed in mountainous and volcanic terrains have been collected in the database. The events are temporally distributed between 1826 and 2003, but a large number of the records (62% of the total) occurred during the disastrous Hurricane Mitch in October 1998. The results showed that debris flows are the most common type of landslide recorded in the database (66% of the total), but other types, including rockfalls and slides, have also been identified. Rainfall, often associated with tropical cyclones, is the most frequent triggering mechanism of landslides in Nicaragua, but seismic and volcanic activity are also important triggers, especially in combination with rainfall. Rainfall has caused all types of failures, with debris flows and translational shallow slides the most frequent types. Earthquakes have most frequently triggered rockfalls and slides, while volcanic eruptions have triggered rockfalls and debris flows. Landslides triggered by rainfall were limited in time to the wet season, which lasts from May to October, and an increase in the number of events is observed during September and October, in accord with the rainy season in the Pacific and Northern and Central regions and the period when the country has the highest probability of being impacted by hurricanes. Both Atlantic and Pacific tropical cyclones have triggered landslides. At the medium scale, the influence of topographic and lithologic parameters on the occurrence of landslides was also analyzed, and the physical characterization of landslides was done to better understand landslide dynamics and run-out distances in both volcanic and non-volcanic areas. Data from fairly well documented events in Nicaragua were compared with other similar events in Central America and elsewhere and treated statistically to search for possible correlations and empirical relationships to predict run-out distances for different types of landslides, knowing the height of fall or the volume. The empirical relationships showed that debris flows and debris avalanches at volcanoes have the highest mobility and reach longer distances compared to other types of landslides. Because of their characteristics and downstream behaviour (long run-out distances and large volumes), both types of landslides have produced the highest number of victims in the country, making them the most dangerous to life and property.

  1. Polynomial chaos representation of databases on manifolds

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Soize, C., E-mail: christian.soize@univ-paris-est.fr; Ghanem, R., E-mail: ghanem@usc.edu

    2017-04-15

    Characterizing the polynomial chaos expansion (PCE) of a vector-valued random variable with probability distribution concentrated on a manifold is a relevant problem in data-driven settings. The probability distribution of such random vectors is multimodal in general, leading to potentially very slow convergence of the PCE. In this paper, we build on a recent development for estimating and sampling from probabilities concentrated on a diffusion manifold. The proposed methodology constructs a PCE of the random vector together with an associated generator that samples from the target probability distribution which is estimated from data concentrated in the neighborhood of the manifold. The method is robust and remains efficient for high dimension and large datasets. The resulting polynomial chaos construction on manifolds permits the adaptation of many uncertainty quantification and statistical tools to emerging questions motivated by data-driven queries.
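
    For reference, the generic form of a polynomial chaos expansion is sketched below in standard notation; this is the textbook expansion of a random vector in an orthonormal polynomial basis of a germ, not the specific manifold-adapted construction or notation of the paper:

    ```latex
    % Generic PCE of a random vector X in an orthonormal polynomial basis
    % {Psi_alpha} of a germ Xi (standard form; notation is not the paper's).
    \[
      \mathbf{X} \;\approx\; \sum_{\alpha \in \mathcal{A}} \mathbf{x}_{\alpha}\,\Psi_{\alpha}(\boldsymbol{\Xi}),
      \qquad
      \mathbf{x}_{\alpha} \;=\; \mathbb{E}\!\left[\mathbf{X}\,\Psi_{\alpha}(\boldsymbol{\Xi})\right],
      \qquad
      \mathbb{E}\!\left[\Psi_{\alpha}(\boldsymbol{\Xi})\,\Psi_{\beta}(\boldsymbol{\Xi})\right] \;=\; \delta_{\alpha\beta}.
    \]
    ```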

  2. NASA's MERBoard: An Interactive Collaborative Workspace Platform. Chapter 4

    NASA Technical Reports Server (NTRS)

    Trimble, Jay; Wales, Roxana; Gossweiler, Rich

    2003-01-01

    This chapter describes the ongoing process by which a multidisciplinary group at NASA's Ames Research Center is designing and implementing a large interactive work surface called the MERBoard Collaborative Workspace. A MERBoard system involves several distributed, large, touch-enabled, plasma display systems with custom MERBoard software. A centralized server and database back the system. We are continually tuning MERBoard to support over two hundred scientists and engineers during the surface operations of the Mars Exploration Rover Missions. These scientists and engineers come from various disciplines and are working both in small and large groups over a span of space and time. We describe the multidisciplinary, human-centered process by which this MERBoard system is being designed, the usage patterns and social interactions that we have observed, and issues we are currently facing.

  3. A database of microwave and sub-millimetre ice particle single scattering properties

    NASA Astrophysics Data System (ADS)

    Ekelund, Robin; Eriksson, Patrick

    2016-04-01

    Ice crystal particles are today a large contributing factor as to why cold-type clouds such as cirrus remain a large uncertainty in global climate models and measurements. The reason for this is the complex and varied morphology in which ice particles appear, as compared to liquid droplets with their generally spheroidal shape, which makes the description of the electromagnetic properties of ice particles more complicated. Single scattering properties of frozen hydrometeors have traditionally been approximated by representing the particles as spheres using Mie theory. While such practices may work well in radio applications, where the size parameter of the particles is generally low, comparisons with measurements and simulations show that this assumption is insufficient when observing tropospheric cloud ice in the microwave or sub-millimetre regions. In order to assist the radiative transfer and remote sensing communities, a database of single scattering properties of semi-realistic particles is being produced. The data are being produced using DDA (Discrete Dipole Approximation) code, which can treat arbitrarily shaped particles, and T-matrix code for simpler shapes when found sufficiently accurate. The aim has been to mainly cover frequencies used by the upcoming ICI (Ice Cloud Imager) mission, with launch in 2022. Examples of particles to be included are columns, plates, bullet rosettes, sector snowflakes and aggregates. The idea is to treat particles with good average optical properties with respect to the multitude of particle and aggregate types appearing in nature. The database will initially only cover macroscopically isotropic orientation, but will eventually also include horizontally aligned particles. Databases of DDA particles do already exist with varying accessibility. The goal of this database is to complement existing data. Regarding the distribution of the data, the plan is that the database shall be available in conjunction with the ARTS (Atmospheric Radiative Transfer Simulator) project.

  4. DSSTOX WEBSITE LAUNCH: IMPROVING PUBLIC ACCESS ...

    EPA Pesticide Factsheets

    DSSTox Website Launch: Improving Public Access to Databases for Building Structure-Toxicity Prediction Models. Ann M. Richard, US Environmental Protection Agency, Research Triangle Park, NC, USA. Distributed: decentralized set of standardized, field-delimited databases, each separately authored and maintained, that are able to accommodate diverse toxicity data content; Structure-Searchable: standard format (SDF) structure-data files that can be readily imported into available chemical relational databases and structure-searched; Tox: toxicity data as it exists in widely disparate forms in current public databases, spanning diverse toxicity endpoints, test systems, levels of biological content, degrees of summarization, and information content. INTRODUCTION: The economic and social pressures to reduce the need for animal testing and to better anticipate the potential for human and eco-toxicity of environmental, industrial, or pharmaceutical chemicals are as pressing today as at any time prior. However, the goal of predicting chemical toxicity in its many manifestations, the 'T' in 'ADMET' (adsorption, distribution, metabolism, elimination, toxicity), remains one of the most difficult and largely unmet challenges in a chemical screening paradigm [1]. It is widely acknowledged that the single greatest hurdle to improving structure-activity relationship (SAR) toxicity prediction capabilities, in both the pharmaceutical and environmental regulation arenas, is the lack of suffici

  5. Distribution Characteristics of Air-Bone Gaps – Evidence of Bias in Manual Audiometry

    PubMed Central

    Margolis, Robert H.; Wilson, Richard H.; Popelka, Gerald R.; Eikelboom, Robert H.; Swanepoel, De Wet; Saly, George L.

    2015-01-01

    Objective Five databases were mined to examine distributions of air-bone gaps obtained by automated and manual audiometry. Differences in distribution characteristics were examined for evidence of influences unrelated to the audibility of test signals. Design The databases provided air- and bone-conduction thresholds that permitted examination of air-bone gap distributions that were free of ceiling and floor effects. Cases with conductive hearing loss were eliminated based on air-bone gaps, tympanometry, and otoscopy, when available. The analysis is based on 2,378,921 threshold determinations from 721,831 subjects from five databases. Results Automated audiometry produced air-bone gaps that were normally distributed suggesting that air- and bone-conduction thresholds are normally distributed. Manual audiometry produced air-bone gaps that were not normally distributed and show evidence of biasing effects of assumptions of expected results. In one database, the form of the distributions showed evidence of inclusion of conductive hearing losses. Conclusions Thresholds obtained by manual audiometry show tester bias effects from assumptions of the patient’s hearing loss characteristics. Tester bias artificially reduces the variance of bone-conduction thresholds and the resulting air-bone gaps. Because the automated method is free of bias from assumptions of expected results, these distributions are hypothesized to reflect the true variability of air- and bone-conduction thresholds and the resulting air-bone gaps. PMID:26627469
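
    The reasoning above, that normally distributed air- and bone-conduction thresholds yield a normally distributed air-bone gap, can be checked with a tiny simulation; the threshold means and standard deviations below are invented, not values from the five databases:

    ```python
    import random
    import statistics

    # Minimal check: if air- and bone-conduction thresholds are independent and
    # normally distributed, their difference (the air-bone gap) is also normal.
    random.seed(0)
    air  = [random.gauss(10.0, 6.0) for _ in range(10_000)]   # hypothetical AC thresholds (dB HL)
    bone = [random.gauss(10.0, 6.0) for _ in range(10_000)]   # hypothetical BC thresholds (dB HL)
    gaps = [a - b for a, b in zip(air, bone)]

    # Expected under independence: mean near 0, SD near sqrt(6^2 + 6^2) ~ 8.5 dB.
    print(round(statistics.mean(gaps), 2), round(statistics.stdev(gaps), 2))
    ```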

  6. Private database queries based on counterfactual quantum key distribution

    NASA Astrophysics Data System (ADS)

    Zhang, Jia-Li; Guo, Fen-Zhuo; Gao, Fei; Liu, Bin; Wen, Qiao-Yan

    2013-08-01

    Based on the fundamental concept of quantum counterfactuality, we propose a protocol to achieve quantum private database queries, which is a theoretical study of how counterfactuality can be employed beyond counterfactual quantum key distribution (QKD). By adding crucial detecting apparatus to the device of QKD, the privacy of both the distrustful user and the database owner can be guaranteed. Furthermore, the proposed private-database-query protocol makes full use of the low efficiency in the counterfactual QKD, and by adjusting the relevant parameters, the protocol obtains excellent flexibility and extensibility.

  7. Antibiotic distribution channels in Thailand: results of key-informant interviews, reviews of drug regulations and database searches

    PubMed Central

    Chanvatik, Sunicha; Sermsinsiri, Varavoot; Sivilaikul, Somsajee; Patcharanarumol, Walaiporn; Yeung, Shunmay; Tangcharoensathien, Viroj

    2018-01-01

    Abstract Objective To analyse how antibiotics are imported, manufactured, distributed and regulated in Thailand. Methods We gathered information on antibiotic distribution in Thailand in in-depth interviews – with 43 key informants from farms, health facilities, pharmaceutical and animal feed industries, private pharmacies and regulators – and in database and literature searches. Findings In 2016–2017, licensed antibiotic distribution in Thailand involved over 700 importers and about 24 000 distributors – e.g. retail pharmacies and wholesalers. Thailand imports antibiotics and active pharmaceutical ingredients. There is no system for monitoring the distribution of active ingredients, some of which are used directly on farms, without being processed. Most antibiotics can be bought from pharmacies, for home or farm use, without a prescription. Although the 1987 Drug Act classified most antibiotics as "dangerous drugs", it only classified a few of them as prescription-only medicines and placed no restrictions on the quantities of antibiotics that could be sold to any individual. Pharmacists working in pharmacies are covered by some of the Act's regulations, but the quality of their dispensing and prescribing appears to be largely reliant on their competences. Conclusion In Thailand, most antibiotics are easily and widely available from retail pharmacies, without a prescription. If the inappropriate use of active pharmaceutical ingredients and antibiotics is to be reduced, we need to reclassify and restrict access to certain antibiotics and to develop systems to audit the dispensing of antibiotics in the retail sector and track the movements of active ingredients. PMID:29403113

  8. Scripps Genome ADVISER: Annotation and Distributed Variant Interpretation SERver

    PubMed Central

    Pham, Phillip H.; Shipman, William J.; Erikson, Galina A.; Schork, Nicholas J.; Torkamani, Ali

    2015-01-01

    Interpretation of human genomes is a major challenge. We present the Scripps Genome ADVISER (SG-ADVISER) suite, which aims to fill the gap between data generation and genome interpretation by performing holistic, in-depth, annotations and functional predictions on all variant types and effects. The SG-ADVISER suite includes a de-identification tool, a variant annotation web-server, and a user interface for inheritance and annotation-based filtration. SG-ADVISER allows users with no bioinformatics expertise to manipulate large volumes of variant data with ease – without the need to download large reference databases, install software, or use a command line interface. SG-ADVISER is freely available at genomics.scripps.edu/ADVISER. PMID:25706643

  9. A multi-user real time inventorying system for radioactive materials: a networking approach.

    PubMed

    Mehta, S; Bandyopadhyay, D; Hoory, S

    1998-01-01

    A computerized system for radioisotope management and real time inventory coordinated across a large organization is reported. It handles hundreds of individual users and their separate inventory records. Use of highly efficient computer network and database technologies makes it possible to accept, maintain, and furnish all records related to receipt, usage, and disposal of the radioactive materials for the users separately and collectively. The system's central processor is an HP-9000/800 G60 RISC server and users from across the organization use their personal computers to login to this server using the TCP/IP networking protocol, which makes distributed use of the system possible. Radioisotope decay is automatically calculated by the program, so that it can make the up-to-date radioisotope inventory data of an entire institution available immediately. The system is specifically designed to allow use by large numbers of users (about 300) and accommodates high volumes of data input and retrieval without compromising simplicity and accuracy. Overall, it is an example of a true multi-user, on-line, relational database information system that makes the functioning of a radiation safety department efficient.

  10. The Epimed Monitor ICU Database®: a cloud-based national registry for adult intensive care unit patients in Brazil.

    PubMed

    Zampieri, Fernando Godinho; Soares, Márcio; Borges, Lunna Perdigão; Salluh, Jorge Ibrain Figueira; Ranzani, Otávio Tavares

    2017-01-01

    To describe the Epimed Monitor Database®, a Brazilian intensive care unit quality improvement database. We described the Epimed Monitor® Database, including its structure and core data. We presented aggregated informative data from intensive care unit admissions from 2010 to 2016 using descriptive statistics. We also described the expansion and growth of the database along with the geographical distribution of participating units in Brazil. The core data from the database includes demographic, administrative and physiological parameters, as well as specific report forms used to gather detailed data regarding the use of intensive care unit resources, infectious episodes, adverse events and checklists for adherence to best clinical practices. As of the end of 2016, 598 adult intensive care units in 318 hospitals totaling 8,160 intensive care unit beds were participating in the database. Most units were located at private hospitals in the southeastern region of the country. The number of yearly admissions rose during this period and included a predominance of medical admissions. The proportion of admissions due to cardiovascular disease declined, while admissions due to sepsis or infections became more common. Illness severity (Simplified Acute Physiology Score - SAPS 3 - 62 points), patient age (mean = 62 years) and hospital mortality (approximately 17%) remained reasonably stable during this time period. A large private database of critically ill patients is feasible and may provide relevant nationwide epidemiological data for quality improvement and benchmarking purposes among the participating intensive care units. This database is useful not only for administrative reasons but also for the improvement of daily care by facilitating the adoption of best practices and use for clinical research.

  11. IUEAGN: A database of ultraviolet spectra of active galactic nuclei

    NASA Technical Reports Server (NTRS)

    Pike, G.; Edelson, R.; Shull, J. M.; Saken, J.

    1993-01-01

    In 13 years of operation, IUE has gathered approximately 5000 spectra of almost 600 Active Galactic Nuclei (AGN). In order to undertake AGN studies which require large amounts of data, we are consistently reducing this entire archive and creating a homogeneous, easy-to-use database. First, the spectra are extracted using the Optimal extraction algorithm. Continuum fluxes are then measured across predefined bands, and line fluxes are measured with a multi-component fit. These results, along with source information such as redshifts and positions, are placed in the IUEAGN relational database. Analysis algorithms, statistical tests, and plotting packages run within the structure, and this flexible database can accommodate future data when they are released. This archival approach has already been used to survey line and continuum variability in six bright Seyfert 1s and rapid continuum variability in 14 blazars. Among the results that could only be obtained using a large archival study is evidence that blazars show a positive correlation between degree of variability and apparent luminosity, while Seyfert 1s show an anti-correlation. This suggests that beaming dominates the ultraviolet properties for blazars, while thermal emission from an accretion disk dominates for Seyfert 1s. Our future plans include a survey of line ratios in Seyfert 1s, to be fitted with photoionization models to test the models and determine the range of temperatures, densities and ionization parameters. We will also include data from IRAS, Einstein, EXOSAT, and ground-based telescopes to measure multi-wavelength correlations and broadband spectral energy distributions.

  12. The Interannual Stability of Cumulative Frequency Distributions for Convective System Size and Intensity

    NASA Technical Reports Server (NTRS)

    Mohr, Karen I.; Molinari, John; Thorncroft, Chris D.

    2010-01-01

    The characteristics of convective system populations in West Africa and the western Pacific tropical cyclone basin were analyzed to investigate whether interannual variability in convective activity in tropical continental and oceanic environments is driven by variations in the number of events during the wet season or by conditions favoring large and/or intense convective systems. Convective systems were defined from TRMM data as a cluster of pixels with an 85 GHz polarization-corrected brightness temperature below 255 K and with an area of at least 64 km^2. The study database consisted of convective systems in West Africa from May–Sep of 1998–2007 and in the western Pacific from May–Nov of 1998–2007. Annual cumulative frequency distributions for system minimum brightness temperature and system area were constructed for both regions. For both regions, there were no statistically significant differences among the annual curves for system minimum brightness temperature. There were two groups of system area curves, split by the TRMM altitude boost in 2001. Within each set, there was no statistically significant interannual variability. Subsetting the database revealed some sensitivity of distribution shape to the size of the sampling area, length of the sample period, and climate zone. From a regional perspective, the stability of the cumulative frequency distributions implied that the probability that a convective system would attain a particular size or intensity does not change interannually. Variability in the number of convective events appeared to be more important in determining whether a year is wetter or drier than normal.

  13. The longevity of lava dome eruptions: analysis of the global DomeHaz database

    NASA Astrophysics Data System (ADS)

    Ogburn, S. E.; Wolpert, R.; Calder, E.; Pallister, J. S.; Wright, H. M. N.

    2015-12-01

    The likely duration of ongoing volcanic eruptions is a topic of great interest to volcanologists, volcano observatories, and communities near volcanoes. Lava dome forming eruptions can last from days to centuries, and can produce violent, difficult-to-forecast activity including vulcanian to plinian explosions and pyroclastic density currents. Periods of active dome extrusion are often interspersed with periods of relative quiescence, during which extrusion may slow or pause altogether, but persistent volcanic unrest continues. This contribution focuses on the durations of these longer-term unrest phases, hereafter eruptions, that include periods of both lava extrusion and quiescence. A new database of lava dome eruptions, DomeHaz, provides characteristics of 228 eruptions at 127 volcanoes, of which 177 have duration information. We find that while 78% of dome-forming eruptions do not continue for more than 5 years, the remainder can be very long-lived. The probability distributions of eruption durations are shown to be heavy-tailed and vary by magma composition. For this reason, eruption durations are modeled with generalized Pareto distributions whose governing parameters depend on each volcano's composition and eruption duration to date. Bayesian predictive distributions and associated uncertainties are presented for the remaining duration of ongoing eruptions of specified composition and duration to date. Forecasts of such natural events will always have large uncertainties, but the ability to quantify such uncertainty is key to effective communication with stakeholders and to mitigation of hazards. Projections are made for the remaining eruption durations of ongoing eruptions, including those at Soufrière Hills Volcano, Montserrat and Sinabung, Indonesia. This work provides a quantitative, transferable method and rationale on which to base long-term planning decisions for dome-forming volcanoes of different compositions, regardless of the quality of an individual volcano's eruptive record, by leveraging a global database.
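
    A small sketch of the kind of generalized Pareto fit described above, using SciPy on invented heavy-tailed durations; the data, the fixed zero location and the 5-year exceedance query are illustrative assumptions, not the DomeHaz analysis itself (which is Bayesian and composition-dependent):

    ```python
    import numpy as np
    from scipy import stats

    # Hypothetical dome-eruption durations in years (heavy-tailed stand-in data).
    durations = np.array([0.2, 0.4, 0.6, 0.9, 1.2, 1.8, 2.5, 3.1, 4.0, 5.5,
                          7.0, 9.5, 14.0, 22.0, 35.0, 60.0, 110.0])

    # Fit a generalized Pareto distribution with the location fixed at zero.
    shape, loc, scale = stats.genpareto.fit(durations, floc=0.0)

    # Probability that an eruption lasts beyond 5 years under the fitted model.
    print(round(stats.genpareto.sf(5.0, shape, loc=loc, scale=scale), 2))
    ```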

  14. New model for distributed multimedia databases and its application to networking of museums

    NASA Astrophysics Data System (ADS)

    Kuroda, Kazuhide; Komatsu, Naohisa; Komiya, Kazumi; Ikeda, Hiroaki

    1998-02-01

    This paper proposes a new distributed multimedia database system in which databases storing MPEG-2 video and/or super-high-definition images are connected through B-ISDN networks, and describes an example of networking museums on the basis of the proposed database system. The proposed system introduces the new concept of a 'retrieval manager', which functions as an intelligent controller so that the user can treat a set of image databases as one logical database. A user terminal issues a content retrieval request to the retrieval manager located closest to that terminal on the network. The retrieved contents are then sent directly over the B-ISDN to the user terminal from the server that stores the designated contents. In this process, the designated logical database dynamically generates the best combination of retrieval parameters, such as the data transfer path, according to the current state of the system; the generated parameters are then used to select the most suitable data transfer path on the network, so that the chosen combination of parameters fits the distributed multimedia database system.

  15. Sex Determination, Sex Chromosomes, and Karyotype Evolution in Insects.

    PubMed

    Blackmon, Heath; Ross, Laura; Bachtrog, Doris

    2017-01-01

    Insects harbor a tremendous diversity of sex determining mechanisms both within and between groups. For example, in some orders such as Hymenoptera, all members are haplodiploid, whereas Diptera contain species with homomorphic as well as male and female heterogametic sex chromosome systems or paternal genome elimination. We have established a large database on karyotypes and sex chromosomes in insects, containing information on over 13000 species covering 29 orders of insects. This database constitutes a unique starting point to report phylogenetic patterns on the distribution of sex determination mechanisms, sex chromosomes, and karyotypes among insects and allows us to test general theories on the evolutionary dynamics of karyotypes, sex chromosomes, and sex determination systems in a comparative framework. Phylogenetic analysis reveals that male heterogamety is the ancestral mode of sex determination in insects, and transitions to female heterogamety are extremely rare. Many insect orders harbor species with complex sex chromosomes, and gains and losses of the sex-limited chromosome are frequent in some groups. Haplodiploidy originated several times within insects, and parthenogenesis is rare but evolves frequently. Providing a single source to electronically access data previously distributed among more than 500 articles and books will not only accelerate analyses of the assembled data, but also provide a unique resource to guide research on which taxa are likely to be informative to address specific questions, for example, for genome sequencing projects or large-scale comparative studies. © The American Genetic Association 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  16. Reducing process delays for real-time earthquake parameter estimation - An application of KD tree to large databases for Earthquake Early Warning

    NASA Astrophysics Data System (ADS)

    Yin, Lucy; Andrews, Jennifer; Heaton, Thomas

    2018-05-01

    Earthquake parameter estimation using nearest-neighbor searching among a large database of observations can lead to reliable prediction results. However, in the real-time application of Earthquake Early Warning (EEW) systems, the accurate prediction obtained from a large database is penalized by a significant delay in processing time. We propose to use a multidimensional binary search tree (KD tree) data structure to organize large seismic databases and thereby reduce the processing time of the nearest-neighbor search used for predictions. We evaluated the performance of the KD tree on the Gutenberg Algorithm, a database-searching algorithm for EEW. We constructed an offline test to predict peak ground motions using a database with feature sets of waveform filter-bank characteristics, and compared the results with the observed seismic parameters. We concluded that a large database provides more accurate predictions of ground motion information, such as peak ground acceleration, velocity, and displacement (PGA, PGV, PGD), than of source parameters, such as hypocenter distance. Applying the KD tree search to organize the database reduced the average search time by 85% relative to the exhaustive method, making the approach feasible for real-time implementation. The algorithm is straightforward, and the results will reduce the overall time of warning delivery for EEW.
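
    A KD tree lookup of this kind can be prototyped in a few lines; the sketch below is a minimal illustration under stated assumptions (synthetic filter-bank feature vectors and stored PGA values, with scipy's cKDTree as the tree implementation), not the Gutenberg Algorithm itself.

        # Minimal sketch of KD-tree nearest-neighbor search over a ground-motion
        # database. Feature vectors and target values are synthetic placeholders.
        import numpy as np
        from scipy.spatial import cKDTree

        rng = np.random.default_rng(0)
        n_records, n_features = 100_000, 9            # e.g., filter-bank amplitudes
        features = rng.normal(size=(n_records, n_features))
        log_pga = rng.normal(size=n_records)           # stored peak ground acceleration (log units)

        tree = cKDTree(features)                       # built once, offline

        # Real-time step: predict PGA for a new waveform's feature vector by
        # averaging over its k nearest neighbors in the database.
        query = rng.normal(size=n_features)
        dist, idx = tree.query(query, k=30)
        pga_prediction = log_pga[idx].mean()
        print(pga_prediction)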

  17. Content Based Image Retrieval based on Wavelet Transform coefficients distribution

    PubMed Central

    Lamard, Mathieu; Cazuguel, Guy; Quellec, Gwénolé; Bekri, Lynda; Roux, Christian; Cochener, Béatrice

    2007-01-01

    In this paper we propose a content-based image retrieval method for diagnosis aid in medical fields. We characterize images without extracting significant features, instead building signatures from the distribution of wavelet transform coefficients. Retrieval is carried out by computing signature distances between the query and database images. Several signatures are proposed; they use a model of the wavelet coefficient distribution. To enhance results, a weighted distance between signatures is used and an adapted wavelet basis is proposed. Retrieval efficiency is given for different databases, including a diabetic retinopathy, a mammography, and a face database. Results are promising: the retrieval efficiency is higher than 95% in some cases using an optimization process. PMID:18003013
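
    As a rough illustration of the signature idea (not the paper's exact model, which fits parametric wavelet-coefficient distributions), the sketch below builds a per-subband signature from simple coefficient statistics and compares two images with a weighted distance; PyWavelets is assumed to be available.

        # Illustrative sketch: per-subband wavelet-coefficient signatures and a
        # weighted signature distance. Random arrays stand in for real images.
        import numpy as np
        import pywt

        def wavelet_signature(image, wavelet="db2", level=3):
            """Mean absolute value and standard deviation of each detail subband."""
            coeffs = pywt.wavedec2(image, wavelet, level=level)
            stats = []
            for detail_level in coeffs[1:]:            # skip the approximation band
                for band in detail_level:              # horizontal, vertical, diagonal
                    stats.extend([np.mean(np.abs(band)), np.std(band)])
            return np.array(stats)

        def signature_distance(sig_a, sig_b, weights=None):
            weights = np.ones_like(sig_a) if weights is None else weights
            return np.sqrt(np.sum(weights * (sig_a - sig_b) ** 2))

        rng = np.random.default_rng(1)
        img_a, img_b = rng.random((256, 256)), rng.random((256, 256))
        print(signature_distance(wavelet_signature(img_a), wavelet_signature(img_b)))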

  18. Application of kernel functions for accurate similarity search in large chemical databases.

    PubMed

    Wang, Xiaohong; Huan, Jun; Smalter, Aaron; Lushington, Gerald H

    2010-04-29

    Similarity search in chemical structure databases is an important problem with many applications in chemical genomics, drug design, and efficient chemical probe screening, among others. It is widely believed that structure-based methods provide an efficient way to perform such queries. Recently, various graph kernel functions have been designed to capture the intrinsic similarity of graphs. Though successful in constructing accurate predictive and classification models, graph kernel functions cannot be applied to large chemical compound databases, due to their high computational complexity and the difficulty of indexing similarity search for large databases. To bridge graph kernel functions and similarity search in chemical databases, we applied a novel kernel-based similarity measurement, developed by our team, to measure the similarity of graph-represented chemicals. In our method, we utilize a hash table to support the new graph kernel function definition, efficient storage, and fast search. We have applied our method, named G-hash, to large chemical databases. Our results show that the G-hash method achieves state-of-the-art performance for k-nearest-neighbor (k-NN) classification. Moreover, the similarity measurement and the index structure are scalable to large chemical databases, with a smaller index size and faster query processing time compared to state-of-the-art indexing methods such as Daylight fingerprints, C-tree, and GraphGrep. Efficient similarity query processing for large chemical databases is challenging, since we need to balance running-time efficiency and similarity search accuracy. Our previous similarity search method, G-hash, provides a new way to perform similarity search in chemical databases. An experimental study validates the utility of G-hash in chemical databases.
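
    The hashing idea can be illustrated with a much-simplified analogue (this is not the G-hash implementation): local structural features of each compound are hashed into a fixed-length count vector, stored in a table keyed by compound id, and queries are answered by a kernel-style similarity over those vectors. The feature strings below are invented placeholders.

        # Simplified analogue of hash-based chemical similarity search.
        import numpy as np

        N_BINS = 1024

        def feature_vector(features, n_bins=N_BINS):
            """features: iterable of hashable local descriptors (e.g., atom/bond patterns)."""
            v = np.zeros(n_bins)
            for f in features:
                v[hash(f) % n_bins] += 1.0
            return v

        # "Database": compound id -> hashed feature vector (toy data)
        database = {
            "cmpd_1": feature_vector(["C-C", "C=O", "C-N", "ring6"]),
            "cmpd_2": feature_vector(["C-C", "C-C", "C-O", "ring5"]),
            "cmpd_3": feature_vector(["C=O", "C-N", "ring6", "ring6"]),
        }

        def knn(query_features, k=2):
            q = feature_vector(query_features)
            sims = {cid: float(q @ v) / (np.linalg.norm(q) * np.linalg.norm(v) + 1e-12)
                    for cid, v in database.items()}
            return sorted(sims.items(), key=lambda kv: kv[1], reverse=True)[:k]

        print(knn(["C-C", "C=O", "ring6"]))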

  19. Do large hiatal hernias affect esophageal peristalsis?

    PubMed Central

    Roman, Sabine; Kahrilas, Peter J; Kia, Leila; Luger, Daniel; Soper, Nathaniel; Pandolfino, John E

    2013-01-01

    Background & Aim Large hiatal hernias can be associated with a shortened or tortuous esophagus. We hypothesized that these anatomic changes may alter esophageal pressure topography (EPT) measurements made during high-resolution manometry (HRM). Our aim was to compare EPT measures of esophageal motility in patients with large hiatal hernias to those of patients without hernia. Methods Among 2000 consecutive clinical EPT, we identified 90 patients with large (>5 cm) hiatal hernias on endoscopy and at least 7 evaluable swallows on EPT. Within the same database a control group without hernia was selected. EPT was analyzed for lower esophageal sphincter (LES) pressure, Distal Contractile Integral (DCI), contraction amplitude, Contractile Front Velocity (CFV) and Distal Latency time (DL). Esophageal length was measured on EPT from the distal border of upper esophageal sphincter to the proximal border of the LES. EPT diagnosis was based on the Chicago Classification. Results The manometry catheter was coiled in the hernia and did not traverse the crural diaphragm in 44 patients (49%) with large hernia. Patients with large hernias had lower average LES pressures, lower DCI, slower CFV and shorter DL than patients without hernia. They also exhibited a shorter mean esophageal length. However, the distribution of peristaltic abnormalities was not different in patients with and without large hernia. Conclusions Patients with large hernias had an alteration of EPT measurements as a consequence of the associated shortened esophagus. However, the distribution of peristaltic disorders was unaffected by the presence of hernia. PMID:22508779

  20. Gamma-Ray Burst Intensity Distributions

    NASA Technical Reports Server (NTRS)

    Band, David L.; Norris, Jay P.; Bonnell, Jerry T.

    2004-01-01

    We use the lag-luminosity relation to calculate self-consistently the redshifts, apparent peak bolometric luminosities L(sub B1), and isotropic energies E(sub iso) for a large sample of BATSE bursts. We consider two different forms of the lag-luminosity relation; for both forms the median redshift for our burst database is 1.6. We model the resulting sample of burst energies with power law and Gaussian distributions, both of which are reasonable models. The power law model has an index of a = 1.76 plus or minus 0.05 (95% confidence), as opposed to the index of a = 2 predicted by the simple universal jet profile model; however, reasonable refinements to this model permit much greater flexibility in reconciling predicted and observed energy distributions.

  1. Introduction

    NASA Astrophysics Data System (ADS)

    Zhao, Ben; Garbacki, Paweł; Gkantsidis, Christos; Iamnitchi, Adriana; Voulgaris, Spyros

    After a decade of intensive investigation, peer-to-peer computing has established itself as an accepted research field in the general area of distributed systems. Peer-to-peer computing can be seen as the democratization of computing, overthrowing the traditional hierarchical designs favored in client-server systems, a shift largely brought about by last-mile network improvements which have made individual PCs first-class citizens in the network community. Much of the early focus in peer-to-peer systems was on best-effort file sharing applications. In recent years, however, research has focused on peer-to-peer systems that provide operational properties and functionality similar to those shown by more traditional distributed systems. These properties include stronger consistency, reliability, and security guarantees suitable for supporting traditional applications such as databases.

  2. Integrating a local database into the StarView distributed user interface

    NASA Technical Reports Server (NTRS)

    Silberberg, D. P.

    1992-01-01

    A distributed user interface to the Space Telescope Data Archive and Distribution Service (DADS) known as StarView is being developed. The DADS architecture consists of the data archive as well as a relational database catalog describing the archive. StarView is a client/server system in which the user interface is the front-end client to the DADS catalog and archive servers. Users query the DADS catalog from the StarView interface. Query commands are transmitted via a network and evaluated by the database. The results are returned via the network and are displayed on StarView forms. Based on the results, users decide which data sets to retrieve from the DADS archive. Archive requests are packaged by StarView and sent to DADS, which returns the requested data sets to the users. The advantages of distributed client/server user interfaces over traditional one-machine systems are well known. Since users run software on machines separate from the database, the overall client response time is much faster. Also, since the server is free to process only database requests, the database response time is much faster. Disadvantages inherent in this architecture are slow overall database access due to network delays, the lack of a 'get previous row' command, and the need to submit refinements of a previously issued query to the database server even though the domain of values has already been returned by the previous query. This architecture also does not allow users to cross-correlate DADS catalog data with other catalogs. Clearly, a distributed user interface would be more powerful if it overcame these disadvantages. A local database is being integrated into StarView to overcome them. When a query is made through a StarView form, which is often composed of fields from multiple tables, it is translated to an SQL query and issued to the DADS catalog. At the same time, a local database table is created to contain the resulting rows of the query. The returned rows are displayed on the form as well as inserted into the local database table. Identical results are produced by reissuing the query to either the DADS catalog or the local table. Relational databases do not provide a 'get previous row' function because of the inherent complexity of retrieving previous rows of multiple-table joins. However, since this function is easily implemented on a single table, StarView uses the local table to retrieve the previous row. Also, StarView issues subsequent query refinements to the local table instead of the DADS catalog, eliminating the network transmission overhead. Finally, other catalogs can be imported into the local database for cross-correlation with local tables. Overall, it is believed that this is a more powerful architecture for distributed database user interfaces.
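
    The local-caching scheme can be illustrated with a small sketch (this is not StarView's code; table and column names are invented): rows returned by the remote catalog are mirrored into a local SQLite table, after which query refinements and "get previous row" requests are answered locally.

        # Minimal sketch of local result caching for a distributed catalog interface.
        import sqlite3

        def cache_results(rows, columns):
            """rows: list of tuples returned by the remote catalog query."""
            local = sqlite3.connect(":memory:")
            col_defs = ", ".join(f"{c} TEXT" for c in columns)
            local.execute(f"CREATE TABLE query_cache (seq INTEGER PRIMARY KEY, {col_defs})")
            placeholders = ", ".join("?" for _ in columns)
            local.executemany(
                f"INSERT INTO query_cache ({', '.join(columns)}) VALUES ({placeholders})", rows)
            local.commit()
            return local

        # Pretend these rows came back from the remote catalog
        rows = [("obs1", "M31", "2400.0"), ("obs2", "M33", "1200.0"), ("obs3", "M31", "600.0")]
        local = cache_results(rows, ["dataset", "target", "exptime"])

        # Refinement of the previous query, answered locally (no network round trip)
        print(local.execute(
            "SELECT dataset, exptime FROM query_cache WHERE target = ?", ("M31",)).fetchall())

        # 'Get previous row' relative to the row currently displayed (seq = 3)
        print(local.execute(
            "SELECT * FROM query_cache WHERE seq < 3 ORDER BY seq DESC LIMIT 1").fetchone())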

  3. Low Cost, Scalable Proteomics Data Analysis Using Amazon's Cloud Computing Services and Open Source Search Algorithms

    PubMed Central

    Halligan, Brian D.; Geiger, Joey F.; Vallejos, Andrew K.; Greene, Andrew S.; Twigger, Simon N.

    2009-01-01

    One of the major difficulties for many laboratories setting up proteomics programs has been obtaining and maintaining the computational infrastructure required for the analysis of the large flow of proteomics data. We describe a system that combines distributed cloud computing and open source software to allow laboratories to set up scalable virtual proteomics analysis clusters without the investment in computational hardware or software licensing fees. Additionally, the pricing structure of distributed computing providers, such as Amazon Web Services, allows laboratories or even individuals to have large-scale computational resources at their disposal at a very low cost per run. We provide detailed step by step instructions on how to implement the virtual proteomics analysis clusters as well as a list of current available preconfigured Amazon machine images containing the OMSSA and X!Tandem search algorithms and sequence databases on the Medical College of Wisconsin Proteomics Center website (http://proteomics.mcw.edu/vipdac). PMID:19358578
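
    As a hedged illustration of starting such a virtual cluster programmatically (the AMI ID, instance type, and key name below are placeholders, not the authors' published machine images), preconfigured worker nodes might be launched with the AWS SDK:

        # Launch a handful of preconfigured EC2 worker instances (illustrative only).
        import boto3

        ec2 = boto3.client("ec2", region_name="us-east-1")
        response = ec2.run_instances(
            ImageId="ami-0123456789abcdef0",   # hypothetical preconfigured proteomics AMI
            InstanceType="c5.2xlarge",         # placeholder worker size
            MinCount=1,
            MaxCount=4,                        # number of search workers to start
            KeyName="my-keypair",              # placeholder SSH key pair
        )
        worker_ids = [i["InstanceId"] for i in response["Instances"]]
        print("Launched workers:", worker_ids)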

  4. Document similarity measures and document browsing

    NASA Astrophysics Data System (ADS)

    Ahmadullin, Ildus; Fan, Jian; Damera-Venkata, Niranjan; Lim, Suk Hwan; Lin, Qian; Liu, Jerry; Liu, Sam; O'Brien-Strain, Eamonn; Allebach, Jan

    2011-03-01

    Managing large document databases is an important task today. Being able to automatically compare document layouts and to classify and search documents with respect to their visual appearance proves to be desirable in many applications. We measure the similarity of single-page documents with respect to distance functions between three document components: background, text, and saliency. Each document component is represented as a Gaussian mixture distribution, and distances between different documents' components are calculated as probabilistic similarities between the corresponding distributions. The similarity measure between documents is represented as a weighted sum of the components' distances. Using this document similarity measure, we propose a browsing mechanism operating on a document dataset. For this purpose, we use a hierarchical browsing environment which we call the document similarity pyramid. It allows the user to browse a large document dataset and to search for documents in the dataset that are similar to the query. The user can browse the dataset on different levels of the pyramid, and zoom into the documents that are of interest.
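
    A much-simplified sketch of this kind of measure is given below; it is not the authors' implementation. Each component is summarized by a single Gaussian rather than a full mixture, components are compared with the Bhattacharyya distance, and the document distance is a weighted sum. Feature samples are synthetic placeholders.

        # Simplified document-layout distance from per-component Gaussian statistics.
        import numpy as np

        def bhattacharyya(mu1, cov1, mu2, cov2):
            cov = 0.5 * (cov1 + cov2)
            diff = mu1 - mu2
            term1 = 0.125 * diff @ np.linalg.solve(cov, diff)
            term2 = 0.5 * np.log(np.linalg.det(cov) /
                                 np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
            return term1 + term2

        def component_stats(samples):
            return samples.mean(axis=0), np.cov(samples, rowvar=False)

        def document_distance(doc_a, doc_b, weights=(0.3, 0.5, 0.2)):
            """doc_*: dict mapping component name -> (n_samples, n_features) array."""
            total = 0.0
            for w, name in zip(weights, ("background", "text", "saliency")):
                mu1, c1 = component_stats(doc_a[name])
                mu2, c2 = component_stats(doc_b[name])
                total += w * bhattacharyya(mu1, c1, mu2, c2)
            return total

        rng = np.random.default_rng(2)
        doc_a = {k: rng.normal(size=(200, 3)) for k in ("background", "text", "saliency")}
        doc_b = {k: rng.normal(loc=0.5, size=(200, 3)) for k in ("background", "text", "saliency")}
        print(document_distance(doc_a, doc_b))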

  5. Low cost, scalable proteomics data analysis using Amazon's cloud computing services and open source search algorithms.

    PubMed

    Halligan, Brian D; Geiger, Joey F; Vallejos, Andrew K; Greene, Andrew S; Twigger, Simon N

    2009-06-01

    One of the major difficulties for many laboratories setting up proteomics programs has been obtaining and maintaining the computational infrastructure required for the analysis of the large flow of proteomics data. We describe a system that combines distributed cloud computing and open source software to allow laboratories to set up scalable virtual proteomics analysis clusters without the investment in computational hardware or software licensing fees. Additionally, the pricing structure of distributed computing providers, such as Amazon Web Services, allows laboratories or even individuals to have large-scale computational resources at their disposal at a very low cost per run. We provide detailed step-by-step instructions on how to implement the virtual proteomics analysis clusters as well as a list of current available preconfigured Amazon machine images containing the OMSSA and X!Tandem search algorithms and sequence databases on the Medical College of Wisconsin Proteomics Center Web site ( http://proteomics.mcw.edu/vipdac ).

  6. A WebGIS system on the base of satellite data processing system for marine application

    NASA Astrophysics Data System (ADS)

    Gong, Fang; Wang, Difeng; Huang, Haiqing; Chen, Jianyu

    2007-10-01

    From 2002 to 2004, a satellite data processing system for marine applications was built at the State Key Laboratory of Satellite Ocean Environment Dynamics (Second Institute of Oceanography, State Oceanic Administration). The system received satellite data from TERRA, AQUA, NOAA-12/15/16/17/18, and FY-1D and automatically generated Level-3 and Level-4 products (single-orbit products and merged multi-orbit products) derived from Level-0 data, under the control of an operational control sub-system. Currently, the products created by this system play an important role in marine environment monitoring, disaster monitoring, and research. A distribution platform has now been developed on this foundation, namely a WebGIS system for querying and browsing oceanic remote sensing data. The system is built on the Oracle database system and additionally uses the ArcSDE spatial database engine and other middleware for database operations. The J2EE framework was adopted as the development model, with the Oracle 9.2 DBMS as the back-end database server. Using a standard browser (such as IE 6.0), users can access the public services provided by the system, including browsing oceanic remote sensing data; zooming, panning, and refreshing views; further data queries; attribute searches; and data download. The system is still under test. Such a system will become an important distribution platform for thematic Chinese satellite ocean-environment products (including sea surface temperature, chlorophyll concentration, and so on), raising the utilization of satellite products and promoting data sharing and oceanic remote sensing research.

  7. Data model and relational database design for the New England Water-Use Data System (NEWUDS)

    USGS Publications Warehouse

    Tessler, Steven

    2001-01-01

    The New England Water-Use Data System (NEWUDS) is a database for the storage and retrieval of water-use data. NEWUDS can handle data covering many facets of water use, including (1) tracking various types of water-use activities (withdrawals, returns, transfers, distributions, consumptive-use, wastewater collection, and treatment); (2) the description, classification and location of places and organizations involved in water-use activities; (3) details about measured or estimated volumes of water associated with water-use activities; and (4) information about data sources and water resources associated with water use. In NEWUDS, each water transaction occurs unidirectionally between two site objects, and the sites and conveyances form a water network. The core entities in the NEWUDS model are site, conveyance, transaction/rate, location, and owner. Other important entities include water resources (used for withdrawals and returns), data sources, and aliases. Multiple water-exchange estimates can be stored for individual transactions based on different methods or data sources. Storage of user-defined details is accommodated for several of the main entities. Numerous tables containing classification terms facilitate detailed descriptions of data items and can be used for routine or custom data summarization. NEWUDS handles single-user and aggregate-user water-use data, can be used for large or small water-network projects, and is available as a stand-alone Microsoft Access database structure. Users can customize and extend the database, link it to other databases, or implement the design in other relational database applications.
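
    A minimal relational sketch of the core idea (site, conveyance, and transaction tables, with each transaction flowing unidirectionally between two sites) is shown below. It is illustrative only and uses SQLite; it is not the actual NEWUDS schema, which is distributed as a Microsoft Access database.

        # Illustrative NEWUDS-style core tables in SQLite.
        import sqlite3

        db = sqlite3.connect(":memory:")
        db.executescript("""
        CREATE TABLE site (
            site_id    INTEGER PRIMARY KEY,
            name       TEXT NOT NULL,
            site_type  TEXT                  -- e.g., well, treatment plant, service area
        );
        CREATE TABLE conveyance (
            conveyance_id INTEGER PRIMARY KEY,
            from_site_id  INTEGER NOT NULL REFERENCES site(site_id),
            to_site_id    INTEGER NOT NULL REFERENCES site(site_id)
        );
        CREATE TABLE water_transaction (
            transaction_id INTEGER PRIMARY KEY,
            conveyance_id  INTEGER NOT NULL REFERENCES conveyance(conveyance_id),
            start_date     TEXT,
            end_date       TEXT,
            volume_mgal    REAL,             -- measured or estimated volume
            data_source    TEXT
        );
        """)

        # Each transaction flows unidirectionally between two sites via a conveyance
        db.execute("INSERT INTO site VALUES (1, 'Town well field', 'withdrawal')")
        db.execute("INSERT INTO site VALUES (2, 'Town distribution system', 'distribution')")
        db.execute("INSERT INTO conveyance VALUES (1, 1, 2)")
        db.execute("INSERT INTO water_transaction VALUES "
                   "(1, 1, '2000-01-01', '2000-12-31', 120.5, 'state report')")
        print(db.execute("SELECT COUNT(*) FROM water_transaction").fetchone())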

  8. Process evaluation distributed system

    NASA Technical Reports Server (NTRS)

    Moffatt, Christopher L. (Inventor)

    2006-01-01

    The distributed system includes a database server, an administration module, a process evaluation module, and a data display module. The administration module is in communication with the database server for providing observation criteria information to the database server. The process evaluation module is in communication with the database server for obtaining the observation criteria information from the database server and collecting process data based on the observation criteria information. The process evaluation module utilizes a personal digital assistant (PDA). A data display module, in communication with the database server, includes a website for viewing collected process data in a desired metrics form and also provides editing and modification of the collected process data. The connectivity established by the database server to the administration module, the process evaluation module, and the data display module minimizes the requirement for manual input of the collected process data.

  9. Heterogeneous distributed databases: A case study

    NASA Technical Reports Server (NTRS)

    Stewart, Tracy R.; Mukkamala, Ravi

    1991-01-01

    Alternatives are reviewed for accessing distributed heterogeneous databases and a recommended solution is proposed. The current study is limited to the Automated Information Systems Center at the Naval Sea Combat Systems Engineering Station at Norfolk, VA. This center maintains two databases located on Digital Equipment Corporation VAX computers running under the VMS operating system. The first database, ICMS, resides on a VAX 11/780 and has been implemented using VAX DBMS, a CODASYL-based system. The second database, CSA, resides on a VAX 6460 and has been implemented using the ORACLE relational database management system (RDBMS). Both databases are used for configuration management within the U.S. Navy. Different customer bases are supported by each database. ICMS tracks U.S. Navy ships and major systems (anti-sub, sonar, etc.). Even though the major systems on ships and submarines have totally different functions, some of the equipment within the major systems is common to both ships and submarines.

  10. Database Search Strategies & Tips. Reprints from the Best of "ONLINE" [and]"DATABASE."

    ERIC Educational Resources Information Center

    Online, Inc., Weston, CT.

    Reprints of 17 articles presenting strategies and tips for searching databases online appear in this collection, which is one in a series of volumes of reprints from "ONLINE" and "DATABASE" magazines. Edited for information professionals who use electronically distributed databases, these articles address such topics as: (1)…

  11. Characterizing worldwide patterns of fluvial geomorphology and hydrology with the Global River Widths from Landsat (GRWL) database

    NASA Astrophysics Data System (ADS)

    Allen, G. H.; Pavelsky, T.

    2015-12-01

    The width of a river reflects complex interactions between river water hydraulics and other physical factors like bank erosional resistance, sediment supply, and human-made structures. A broad range of fluvial process studies use spatially distributed river width data to understand and quantify flood hazards, river water flux, or fluvial greenhouse gas efflux. Ongoing technological advances in remote sensing, computing power, and model sophistication are moving river system science towards global-scale studies that aim to understand the Earth's fluvial system as a whole. As such, a global spatially distributed database of river location and width is necessary to better constrain these studies. Here we present the Global River Widths from Landsat (GRWL) Database, the first global-scale database of river planform at mean discharge. With a resolution of 30 m, GRWL consists of 58 million measurements of river centerline location, width, and braiding index. In total, GRWL measures 2.1 million km of rivers wider than 30 m, corresponding to 602 thousand km2 of river water surface area, a metric used to calculate global greenhouse gas emissions from rivers to the atmosphere. Using data from GRWL, we find that ~20% of the world's rivers are located above 60°N, where little high-quality information exists about rivers of any kind. Further, we find that ~10% of the world's large rivers are multichannel, which may impact the development of the new generation of regional and global hydrodynamic models. We also investigate the spatial controls of global fluvial geomorphology and river hydrology by comparing climate, topography, geology, and human population density to GRWL measurements. The GRWL Database will be made publicly available upon publication to facilitate improved understanding of Earth's fluvial system. Finally, GRWL will be used as a priori data for the joint NASA/CNES Surface Water and Ocean Topography (SWOT) Satellite Mission, planned for launch in 2020.

  12. Measures of dependence for multivariate Lévy distributions

    NASA Astrophysics Data System (ADS)

    Boland, J.; Hurd, T. R.; Pivato, M.; Seco, L.

    2001-02-01

    Recent statistical analysis of a number of financial databases is summarized. Increasing agreement is found that logarithmic equity returns show a certain type of asymptotic behavior of the largest events, namely that the probability density functions have power law tails with an exponent α≈3.0. This behavior does not vary much over different stock exchanges or over time, despite large variations in trading environments. The present paper proposes a class of multivariate distributions which generalizes the observed qualities of univariate time series. A new consequence of the proposed class is the "spectral measure" which completely characterizes the multivariate dependences of the extreme tails of the distribution. This measure on the unit sphere in M-dimensions, in principle completely general, can be determined empirically by looking at extreme events. If it can be observed and determined, it will prove to be of importance for scenario generation in portfolio risk management.
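
    For reference, the tail behavior described above can be written in the standard regularly-varying form (this is a generic statement of a power-law tail, not an equation taken from the paper):

        \Pr\bigl(|r| > x\bigr) \sim C\,x^{-\alpha},
        \qquad
        f(x) \sim \alpha C\,x^{-(\alpha+1)} \quad (x \to \infty),
        \qquad \alpha \approx 3.0,

    where r denotes a logarithmic equity return and C is a scale constant.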

  13. Gender Distribution of Pediatric Stone Formers

    NASA Astrophysics Data System (ADS)

    Novak, Thomas E.; Trock, Bruce J.; Lakshmanan, Yegappan; Gearhart, John P.; Matlaga, Brian R.

    2008-09-01

    Recent epidemiologic evidence suggests that the gender prevalence among adult stone-formers is changing, with an increasing incidence of stone disease among women. No similar data have ever been reported for the pediatric stone-forming population. We performed a study to define the gender distribution among pediatric stone-formers using a large-scale national pediatric database. Our findings suggest that gender distribution among stone formers varies by age, with male predominance in the first decade of life shifting to female predominance in the second decade. In contrast to adults, females in the pediatric population are more commonly affected by stones than are males. The incidence of pediatric stone disease appears to be increasing at a great rate in both sexes. Further studies should build on this hypothesis-generating work and define the effects of metabolic and environmental risk factors that may influence stone risk in the pediatric patient population.

  14. Resources | Division of Cancer Prevention

    Cancer.gov

    Manual of Operations Version 3, 12/13/2012 (PDF, 162KB); Database Sources: Consortium for Functional Glycomics databases; Design Studies Related to the Development of Distributed, Web-based European Carbohydrate Databases (EUROCarbDB)

  15. Accounting for rainfall spatial variability in the prediction of flash floods

    NASA Astrophysics Data System (ADS)

    Saharia, Manabendra; Kirstetter, Pierre-Emmanuel; Gourley, Jonathan J.; Hong, Yang; Vergara, Humberto; Flamig, Zachary L.

    2017-04-01

    Flash floods are a particularly damaging natural hazard worldwide in terms of both fatalities and property damage. In the United States, the lack of a comprehensive database that catalogues information related to flash flood timing, location, causative rainfall, and basin geomorphology has hindered broad characterization studies. First, a representative and long archive of more than 15,000 flooding events during 2002-2011 is used to analyze the spatial and temporal variability of flash floods. We also derive a large number of spatially distributed geomorphological and climatological parameters, such as basin area, mean annual precipitation, and basin slope, to identify static basin characteristics that influence flood response. For the same period, the National Severe Storms Laboratory (NSSL) has produced a decadal archive of Multi-Radar/Multi-Sensor (MRMS) radar-only precipitation rates at 1-km spatial resolution and 5-min temporal resolution. This provides an unprecedented opportunity to analyze the impact of event-level precipitation variability on flooding using a big-data approach. To analyze the impact of sub-basin-scale rainfall spatial variability on flooding, indices such as the first and second scaled moments of rainfall, the horizontal gap, and the vertical gap are computed from the MRMS dataset. Flooding characteristics such as rise time, lag time, and peak discharge are then linked to the derived geomorphologic, climatologic, and rainfall indices to identify basin characteristics that drive flash floods. The database has been subjected to rigorous quality control by accounting for radar beam height and the percentage of snow in basins. So far, studies involving rainfall variability indices have only been performed on a case-study basis, and a large-scale approach is expected to provide deeper insight into how sub-basin-scale precipitation variability affects flooding. Finally, these findings are validated using the National Weather Service storm reports and a historical flood fatalities database. This analysis framework will serve as a baseline for evaluating distributed hydrologic model simulations such as the Flooded Locations And Simulated Hydrographs Project (FLASH) (http://flash.ou.edu).

  16. Development, Use, and Impact of a Global Laboratory Database During the 2014 Ebola Outbreak in West Africa.

    PubMed

    Durski, Kara N; Singaravelu, Shalini; Teo, Junxiong; Naidoo, Dhamari; Bawo, Luke; Jambai, Amara; Keita, Sakoba; Yahaya, Ali Ahmed; Muraguri, Beatrice; Ahounou, Brice; Katawera, Victoria; Kuti-George, Fredson; Nebie, Yacouba; Kohar, T Henry; Hardy, Patrick Jowlehpah; Djingarey, Mamoudou Harouna; Kargbo, David; Mahmoud, Nuha; Assefa, Yewondwossen; Condell, Orla; N'Faly, Magassouba; Van Gurp, Leon; Lamanu, Margaret; Ryan, Julia; Diallo, Boubacar; Daffae, Foday; Jackson, Dikena; Malik, Fayyaz Ahmed; Raftery, Philomena; Formenty, Pierre

    2017-06-15

    The international impact, rapid widespread transmission, and reporting delays during the 2014 Ebola outbreak in West Africa highlighted the need for a global, centralized database to inform outbreak response. The World Health Organization and Emerging and Dangerous Pathogens Laboratory Network addressed this need by supporting the development of a global laboratory database. Specimens were collected in the affected countries from patients and dead bodies meeting the case definitions for Ebola virus disease. Test results were entered in nationally standardized spreadsheets and consolidated onto a central server. From March 2014 through August 2016, 256,343 specimens tested for Ebola virus disease were captured in the database. Thirty-one specimen types were collected, and a variety of diagnostic tests were performed. Regular analysis of data described the functionality of laboratory and response systems, positivity rates, and the geographic distribution of specimens. With data standardization and end user buy-in, the collection and analysis of large amounts of data with multiple stakeholders and collaborators across various user-access levels was made possible and contributed to outbreak response needs. The usefulness and value of a multifunctional global laboratory database is far reaching, with uses including virtual biobanking, disease forecasting, and adaptation to other disease outbreaks. © The Author 2017. Published by Oxford University Press for the Infectious Diseases Society of America. All rights reserved. For permissions, e-mail: journals.permissions@oup.com.

  17. Viral genome analysis and knowledge management.

    PubMed

    Kuiken, Carla; Yoon, Hyejin; Abfalterer, Werner; Gaschen, Brian; Lo, Chienchi; Korber, Bette

    2013-01-01

    One of the challenges of genetic data analysis is to combine information from sources that are distributed around the world and accessible through a wide array of different methods and interfaces. The HIV database and its footsteps, the hepatitis C virus (HCV) and hemorrhagic fever virus (HFV) databases, have made it their mission to make different data types easily available to their users. This involves a large amount of behind-the-scenes processing, including quality control and analysis of the sequences and their annotation. Gene and protein sequences are distilled from the sequences that are stored in GenBank; to this end, both submitter annotation and script-generated sequences are used. Alignments of both nucleotide and amino acid sequences are generated, manually curated, distilled into an alignment model, and regenerated in an iterative cycle that results in ever better new alignments. Annotation of epidemiological and clinical information is parsed, checked, and added to the database. User interfaces are updated, and new interfaces are added based upon user requests. Vital for its success, the database staff are heavy users of the system, which enables them to fix bugs and find opportunities for improvement. In this chapter we describe some of the infrastructure that keeps these heavily used analysis platforms alive and vital after nearly 25 years of use. The database/analysis platforms described in this chapter can be accessed at http://hiv.lanl.gov http://hcv.lanl.gov http://hfv.lanl.gov.

  18. Analysis and Design of a Distributed System for Management and Distribution of Natural Language Assertions

    DTIC Science & Technology

    2010-09-01

    [Abstract not available: the retrieved record contains only front-matter fragments, namely table-of-contents entries for the SCIL architecture and assertions, a list-of-figures entry (Figure 1, SCIL architecture), and an acronym list including LAN (Local Area Network), ODBC (Open Database Connectivity), and SCIL (Social-Cultural Content in Language).]

  19. An evaluation of multi-probe locality sensitive hashing for computing similarities over web-scale query logs.

    PubMed

    Cormode, Graham; Dasgupta, Anirban; Goyal, Amit; Lee, Chi Hoon

    2018-01-01

    Many modern applications of AI such as web search, mobile browsing, image processing, and natural language processing rely on finding similar items from a large database of complex objects. Due to the very large scale of the data involved (e.g., users' queries from commercial search engines), computing such near or nearest neighbors is a non-trivial task, as the computational cost grows significantly with the number of items. To address this challenge, we adopt Locality Sensitive Hashing (LSH) methods and evaluate four variants in a distributed computing environment (specifically, Hadoop). We identify several optimizations which improve performance and are suitable for deployment in very large scale settings. The experimental results demonstrate that our variants of LSH achieve robust performance with better recall compared with "vanilla" LSH, even when using the same amount of space.
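
    The basic LSH mechanism can be sketched compactly; the example below implements plain random-hyperplane LSH for cosine similarity over synthetic vectors and is not one of the multi-probe variants evaluated in the study.

        # Basic random-hyperplane LSH: hash items into buckets in several tables,
        # then draw candidate neighbors only from the query's buckets.
        import numpy as np
        from collections import defaultdict

        rng = np.random.default_rng(3)
        DIM, N_TABLES, N_BITS = 64, 8, 12

        planes = [rng.normal(size=(N_BITS, DIM)) for _ in range(N_TABLES)]
        tables = [defaultdict(list) for _ in range(N_TABLES)]

        def signature(vec, t):
            return ((planes[t] @ vec) > 0).tobytes()   # hashable bucket key

        def index(item_id, vec):
            for t in range(N_TABLES):
                tables[t][signature(vec, t)].append(item_id)

        def query(vec):
            candidates = set()
            for t in range(N_TABLES):
                candidates.update(tables[t].get(signature(vec, t), []))
            return candidates

        data = rng.normal(size=(10_000, DIM))          # synthetic "query embeddings"
        for i, v in enumerate(data):
            index(i, v)
        print(len(query(data[0])), 0 in query(data[0]))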

  20. Incorporating client-server database architecture and graphical user interface into outpatient medical records.

    PubMed Central

    Fiacco, P. A.; Rice, W. H.

    1991-01-01

    Computerized medical record systems require structured database architectures for information processing. However, the data must be able to be transferred across heterogeneous platforms and software systems. Client-server architecture allows for distributed processing of information among networked computers and provides the flexibility needed to link diverse systems together effectively. We have incorporated this client-server model with a graphical user interface into an outpatient medical record system, known as SuperChart, for the Department of Family Medicine at SUNY Health Science Center at Syracuse. SuperChart was developed using SuperCard and Oracle. SuperCard uses modern object-oriented programming to support a hypermedia environment. Oracle is a powerful relational database management system that incorporates a client-server architecture. This provides both a distributed database and distributed processing, which improves performance. PMID:1807732

  1. On the connection of permafrost and debris flow activity in Austria

    NASA Astrophysics Data System (ADS)

    Huber, Thomas; Kaitna, Roland

    2016-04-01

    Debris flows represent a severe hazard in alpine regions and typically result from a critical combination of relief energy, water, and sediment. Hence, besides water-related trigger conditions, the availability of abundant sediment is a major control on debris flow activity in alpine regions. Increasing temperatures due to global warming are expected to affect periglacial regions and thereby the distribution of alpine permafrost and the depth of the active layer, which in turn might lead to increased debris flow activity and increased interference with human interests. In this contribution we assess the importance of permafrost for documented past debris flows by connecting the modeled permafrost distribution with a large database of historic debris flows in Austria. The permafrost distribution is estimated with a published model approach and depends mainly on altitude, relief, and aspect. The database of debris flows includes more than 4000 debris flow events in around 1900 watersheds. We find that 27 % of watersheds experiencing debris flow activity have a modeled permafrost area smaller than 5 % of the total area, while around 7 % of the debris-flow-prone watersheds have a modeled permafrost area larger than 5 %. Interestingly, our first results indicate that watersheds without permafrost experience significantly fewer, but more intense, debris flow events than watersheds with modeled permafrost occurrence. Our study aims to contribute to a better understanding of geomorphic activity and the impact of climate change in alpine environments.

  2. National Databases for Neurosurgical Outcomes Research: Options, Strengths, and Limitations.

    PubMed

    Karhade, Aditya V; Larsen, Alexandra M G; Cote, David J; Dubois, Heloise M; Smith, Timothy R

    2017-08-05

    Quality improvement, value-based care delivery, and personalized patient care depend on robust clinical, financial, and demographic data streams of neurosurgical outcomes. The neurosurgical literature lacks a comprehensive review of large national databases; our objective is to assess the strengths and limitations of various resources for outcomes research in neurosurgery. A review of the literature was conducted to identify surgical outcomes studies using national data sets. The databases were assessed for the availability of patient demographics and clinical variables, longitudinal follow-up of patients, strengths, and limitations. The number of unique patients contained within each data set ranged from thousands (Quality Outcomes Database [QOD]) to hundreds of millions (MarketScan). Databases with both clinical and financial data included PearlDiver, Premier Healthcare Database, Vizient Clinical Data Base and Resource Manager, and the National Inpatient Sample. Outcomes collected by databases included patient-reported outcomes (QOD); 30-day morbidity, readmissions, and reoperations (National Surgical Quality Improvement Program); and disease incidence and disease-specific survival (Surveillance, Epidemiology, and End Results-Medicare). The strengths of large databases include large numbers of rare pathologies and multi-institutional, nationally representative sampling; their limitations include variable data veracity, variable data completeness, and missing disease-specific variables. The improvement of existing large national databases and the establishment of new registries will be crucial to the future of neurosurgical outcomes research. Copyright © 2017 by the Congress of Neurological Surgeons

  3. Performance of Point and Range Queries for In-memory Databases using Radix Trees on GPUs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Alam, Maksudul; Yoginath, Srikanth B; Perumalla, Kalyan S

    In in-memory database systems augmented by hardware accelerators, accelerating the index searching operations can greatly increase the runtime performance of database queries. Recently, adaptive radix trees (ART) have been shown to provide very fast index search implementation on the CPU. Here, we focus on an accelerator-based implementation of ART. We present a detailed performance study of our GPU-based adaptive radix tree (GRT) implementation over a variety of key distributions, synthetic benchmarks, and actual keys from music and book data sets. The performance is also compared with other index-searching schemes on the GPU. GRT on modern GPUs achieves some of the highest rates of index searches reported in the literature. For point queries, a throughput of up to 106 million and 130 million lookups per second is achieved for sparse and dense keys, respectively. For range queries, GRT yields 600 million and 1000 million lookups per second for sparse and dense keys, respectively, on a large dataset of 64 million 32-bit keys.

  4. Influence of Electron Molecule Resonant Vibrational Collisions over the Symmetric Mode and Direct Excitation-Dissociation Cross Sections of CO2 on the Electron Energy Distribution Function and Dissociation Mechanisms in Cold Pure CO2 Plasmas.

    PubMed

    Pietanza, L D; Colonna, G; Laporta, V; Celiberto, R; D'Ammando, G; Laricchiuta, A; Capitelli, M

    2016-05-05

    A new set of electron-vibrational (e-V) processes linking the first 10 vibrational levels of the symmetric mode of CO2 is derived by using a decoupled vibrational model and inserted in the Boltzmann equation for the electron energy distribution function (eedf). The new eedf and dissociation rates are in satisfactory agreement with the corresponding ones obtained by using the e-V cross sections reported in the database of Hake and Phelps (H-P). Large differences are, on the contrary, found when the experimental dissociation cross sections of Cosby and Helm are inserted in the Boltzmann equation. Comparison of the corresponding rates with those obtained by using the low-threshold-energy dissociation data reported in the H-P database shows differences of up to orders of magnitude, which decrease with increasing reduced electric field. In all cases, we show the importance of superelastic vibrational collisions in affecting the eedf and dissociation rates in both the direct electron impact mechanism and the pure vibrational mechanism.

  5. Cultural macroevolution on neighbor graphs : vertical and horizontal transmission among Western North American Indian societies.

    PubMed

    Towner, Mary C; Grote, Mark N; Venti, Jay; Borgerhoff Mulder, Monique

    2012-09-01

    What are the driving forces of cultural macroevolution, the evolution of cultural traits that characterize societies or populations? This question has engaged anthropologists for more than a century, with little consensus regarding the answer. We develop and fit autologistic models, built upon both spatial and linguistic neighbor graphs, for 44 cultural traits of 172 societies in the Western North American Indian (WNAI) database. For each trait, we compare models including or excluding one or both neighbor graphs, and for the majority of traits we find strong evidence in favor of a model which uses both spatial and linguistic neighbors to predict a trait's distribution. Our results run counter to the assertion that cultural trait distributions can be explained largely by the transmission of traits from parent to daughter populations and are thus best analyzed with phylogenies. In contrast, we show that vertical and horizontal transmission pathways can be incorporated in a single model, that both transmission modes may indeed operate on the same trait, and that for most traits in the WNAI database, accounting for only one mode of transmission would result in a loss of information.
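
    For orientation, a generic autologistic model on two neighbor graphs can be written as follows (an illustrative parameterization; the paper's exact model may differ):

        \Pr\bigl(y_i = 1 \mid y_{-i}\bigr)
          = \operatorname{logit}^{-1}\!\Bigl(\alpha
            + \beta_{s} \sum_{j \in S(i)} y_j
            + \beta_{l} \sum_{k \in L(i)} y_k\Bigr),

    where y_i is the presence of the trait in society i, S(i) and L(i) are its spatial and linguistic neighbors, and the coefficients beta_s and beta_l weight the horizontal (spatial) and vertical (linguistic) transmission pathways.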

  6. Big data in sleep medicine: prospects and pitfalls in phenotyping

    PubMed Central

    Bianchi, Matt T; Russo, Kathryn; Gabbidon, Harriett; Smith, Tiaundra; Goparaju, Balaji; Westover, M Brandon

    2017-01-01

    Clinical polysomnography (PSG) databases are a rich resource in the era of “big data” analytics. We explore the uses and potential pitfalls of clinical data mining of PSG using statistical principles and analysis of clinical data from our sleep center. We performed retrospective analysis of self-reported and objective PSG data from adults who underwent overnight PSG (diagnostic tests, n=1835). Self-reported symptoms overlapped markedly between the two most common categories, insomnia and sleep apnea, with the majority reporting symptoms of both disorders. Standard clinical metrics routinely reported on objective data were analyzed for basic properties (missing values, distributions), pairwise correlations, and descriptive phenotyping. Of 41 continuous variables, including clinical and PSG derived, none passed testing for normality. Objective findings of sleep apnea and periodic limb movements were common, with 51% having an apnea–hypopnea index (AHI) >5 per hour and 25% having a leg movement index >15 per hour. Different visualization methods are shown for common variables to explore population distributions. Phenotyping methods based on clinical databases are discussed for sleep architecture, sleep apnea, and insomnia. Inferential pitfalls are discussed using the current dataset and case examples from the literature. The increasing availability of clinical databases for large-scale analytics holds important promise in sleep medicine, especially as it becomes increasingly important to demonstrate the utility of clinical testing methods in management of sleep disorders. Awareness of the strengths, as well as caution regarding the limitations, will maximize the productive use of big data analytics in sleep medicine. PMID:28243157

  7. Secondary analysis of a marketing research database reveals patterns in dairy product purchases over time.

    PubMed

    Van Wave, Timothy W; Decker, Michael

    2003-04-01

    Development of a method using marketing research data to assess food purchase behavior and consequent nutrient availability for purposes of nutrition surveillance, evaluation of intervention effects, and epidemiologic studies of diet-health relationships. Data collected on household food purchases accrued over a 13-week period were selected by using Universal Product Code numbers and household characteristics from a marketing research database. Universal Product Code numbers for 39,408 dairy product purchases were linked to a standard reference for food composition to estimate the nutrient content of foods purchased over time. Two thousand one hundred sixty-one households located in Victoria, Texas, and surrounding communities who were active members of a frequent shopper program. Demographic characteristics of sample households and the nutrient content of their dairy product purchases were analyzed using frequency distribution, cross tabulation, analysis of variance, and t test procedures. A method for using marketing research data was successfully used to estimate household purchases of specific foods and their nutrient content from a marketing database containing hundreds of thousands of records. Distribution of dairy product purchases and their concomitant nutrients between Hispanic and non-Hispanic households were significant (P<.01, P<.001, respectively) and sustained over time. Purchase records from large, nationally representative panels of shoppers, such as those maintained by major market research companies, might be used to accomplish detailed longitudinal epidemiologic studies or surveillance of national food- and nutrient-purchasing patterns within and between countries and segments of their respective populations.

  8. A Spatiotemporal Database to Track Human Scrub Typhus Using the VectorMap Application

    PubMed Central

    Kelly, Daryl J.; Foley, Desmond H.; Richards, Allen L.

    2015-01-01

    Scrub typhus is a potentially fatal mite-borne febrile illness, primarily of the Asia-Pacific Rim. With an endemic area greater than 13 million km2 and millions of people at risk, scrub typhus remains an underreported, often misdiagnosed febrile illness. A comprehensive, updatable map of the true distribution of cases has been lacking, and therefore the true risk of disease within the very large endemic area remains unknown. The purpose of this study was to establish a database and map to track human scrub typhus. An online search using PubMed and the United States Armed Forces Pest Management Board Literature Retrieval System was performed to identify articles describing human scrub typhus cases both within and outside the traditionally accepted endemic regions. Using World Health Organization guidelines, stringent criteria were used to establish diagnoses for inclusion in the database. The preliminary screening of 181 scrub typhus publications yielded 145 publications that met the case criterion, 267 case records, and 13 serosurvey records that could be georeferenced, describing 13,739 probable or confirmed human cases in 28 countries. A map service has been established within VectorMap (www.vectormap.org) to explore the role that relative location of vectors, hosts, and the pathogen play in the transmission of mite-borne scrub typhus. The online display of scrub typhus cases in VectorMap illustrates their presence and provides an up-to-date geographic distribution of proven scrub typhus cases. PMID:26678263

  9. A Spatiotemporal Database to Track Human Scrub Typhus Using the VectorMap Application.

    PubMed

    Kelly, Daryl J; Foley, Desmond H; Richards, Allen L

    2015-12-01

    Scrub typhus is a potentially fatal mite-borne febrile illness, primarily of the Asia-Pacific Rim. With an endemic area greater than 13 million km2 and millions of people at risk, scrub typhus remains an underreported, often misdiagnosed febrile illness. A comprehensive, updatable map of the true distribution of cases has been lacking, and therefore the true risk of disease within the very large endemic area remains unknown. The purpose of this study was to establish a database and map to track human scrub typhus. An online search using PubMed and the United States Armed Forces Pest Management Board Literature Retrieval System was performed to identify articles describing human scrub typhus cases both within and outside the traditionally accepted endemic regions. Using World Health Organization guidelines, stringent criteria were used to establish diagnoses for inclusion in the database. The preliminary screening of 181 scrub typhus publications yielded 145 publications that met the case criterion, 267 case records, and 13 serosurvey records that could be georeferenced, describing 13,739 probable or confirmed human cases in 28 countries. A map service has been established within VectorMap (www.vectormap.org) to explore the role that relative location of vectors, hosts, and the pathogen play in the transmission of mite-borne scrub typhus. The online display of scrub typhus cases in VectorMap illustrates their presence and provides an up-to-date geographic distribution of proven scrub typhus cases.

  10. COSPO/CENDI Industry Day Conference

    NASA Technical Reports Server (NTRS)

    1995-01-01

    The conference's objective was to provide a forum where government information managers and industry information technology experts could have an open exchange and discuss their respective needs and compare them to the available, or soon to be available, solutions. Technical summaries and points of contact are provided for the following sessions: secure products, protocols, and encryption; information providers; electronic document management and publishing; information indexing, discovery, and retrieval (IIDR); automated language translators; IIDR - natural language capabilities; IIDR - advanced technologies; IIDR - distributed heterogeneous and large database support; and communications - speed, bandwidth, and wireless.

  11. Cross-checking of Large Evaluated and Experimental Nuclear Reaction Databases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zeydina, O.; Koning, A.J.; Soppera, N.

    2014-06-15

    Automated methods are presented for the verification of large experimental and evaluated nuclear reaction databases (e.g. EXFOR, JEFF, TENDL). These methods allow an assessment of the overall consistency of the data and detect aberrant values in both evaluated and experimental databases.

  12. Using relational databases for improved sequence similarity searching and large-scale genomic analyses.

    PubMed

    Mackey, Aaron J; Pearson, William R

    2004-10-01

    Relational databases are designed to integrate diverse types of information and manage large sets of search results, greatly simplifying genome-scale analyses. Relational databases are essential for management and analysis of large-scale sequence analyses, and can also be used to improve the statistical significance of similarity searches by focusing on subsets of sequence libraries most likely to contain homologs. This unit describes using relational databases to improve the efficiency of sequence similarity searching and to demonstrate various large-scale genomic analyses of homology-related data. This unit describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. These include basic use of the database to generate a novel sequence library subset, how to extend and use seqdb_demo for the storage of sequence similarity search results and making use of various kinds of stored search results to address aspects of comparative genomic analysis.
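
    The subset-library idea can be sketched with a toy relational table (this is an illustrative analogue, not the actual seqdb_demo schema): sequences are stored with taxonomic annotation, and a taxon-restricted subset is exported as a FASTA library for a subsequent similarity search.

        # Toy example: extract a taxon-specific subset library from a sequence table.
        import sqlite3

        db = sqlite3.connect(":memory:")
        db.execute("""
        CREATE TABLE protein (
            acc    TEXT PRIMARY KEY,
            taxon  TEXT,
            descr  TEXT,
            seq    TEXT
        )""")
        db.executemany("INSERT INTO protein VALUES (?, ?, ?, ?)", [
            ("P00001", "Homo sapiens",     "cytochrome c (toy)", "MGDVEKGKKIF"),
            ("P00002", "Mus musculus",     "cytochrome c (toy)", "MGDVEKGKKIF"),
            ("Q99999", "Escherichia coli", "hypothetical (toy)", "MKTAYIAKQR"),
        ])

        # Write a FASTA-format subset library restricted to mammalian entries
        with open("mammal_subset.fasta", "w") as out:
            for acc, descr, seq in db.execute(
                    "SELECT acc, descr, seq FROM protein WHERE taxon IN (?, ?)",
                    ("Homo sapiens", "Mus musculus")):
                out.write(f">{acc} {descr}\n{seq}\n")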

  13. A high-performance spatial database based approach for pathology imaging algorithm evaluation

    PubMed Central

    Wang, Fusheng; Kong, Jun; Gao, Jingjing; Cooper, Lee A.D.; Kurc, Tahsin; Zhou, Zhengwen; Adler, David; Vergara-Niedermayr, Cristobal; Katigbak, Bryan; Brat, Daniel J.; Saltz, Joel H.

    2013-01-01

    Background: Algorithm evaluation provides a means to characterize variability across image analysis algorithms, validate algorithms by comparison with human annotations, combine results from multiple algorithms for performance improvement, and facilitate algorithm sensitivity studies. The sizes of images and image analysis results in pathology image analysis pose significant challenges in algorithm evaluation. We present an efficient parallel spatial database approach to model, normalize, manage, and query large volumes of analytical image result data. This provides an efficient platform for algorithm evaluation. Our experiments with a set of brain tumor images demonstrate the application, scalability, and effectiveness of the platform. Context: The paper describes an approach and platform for evaluation of pathology image analysis algorithms. The platform facilitates algorithm evaluation through a high-performance database built on the Pathology Analytic Imaging Standards (PAIS) data model. Aims: (1) Develop a framework to support algorithm evaluation by modeling and managing analytical results and human annotations from pathology images; (2) Create a robust data normalization tool for converting, validating, and fixing spatial data from algorithm or human annotations; (3) Develop a set of queries to support data sampling and result comparisons; (4) Achieve high performance computation capacity via a parallel data management infrastructure, parallel data loading and spatial indexing optimizations in this infrastructure. Materials and Methods: We have considered two scenarios for algorithm evaluation: (1) algorithm comparison where multiple result sets from different methods are compared and consolidated; and (2) algorithm validation where algorithm results are compared with human annotations. We have developed a spatial normalization toolkit to validate and normalize spatial boundaries produced by image analysis algorithms or human annotations. The validated data were formatted based on the PAIS data model and loaded into a spatial database. To support efficient data loading, we have implemented a parallel data loading tool that takes advantage of multi-core CPUs to accelerate data injection. The spatial database manages both geometric shapes and image features or classifications, and enables spatial sampling, result comparison, and result aggregation through expressive structured query language (SQL) queries with spatial extensions. To provide scalable and efficient query support, we have employed a shared nothing parallel database architecture, which distributes data homogenously across multiple database partitions to take advantage of parallel computation power and implements spatial indexing to achieve high I/O throughput. Results: Our work proposes a high performance, parallel spatial database platform for algorithm validation and comparison. This platform was evaluated by storing, managing, and comparing analysis results from a set of brain tumor whole slide images. The tools we develop are open source and available to download. Conclusions: Pathology image algorithm validation and comparison are essential to iterative algorithm development and refinement. One critical component is the support for queries involving spatial predicates and comparisons. In our work, we develop an efficient data model and parallel database approach to model, normalize, manage and query large volumes of analytical image result data. 
Our experiments demonstrate that the data partitioning strategy and the grid-based indexing result in good data distribution across database nodes and reduce I/O overhead in spatial join queries through parallel retrieval of relevant data and quick subsetting of datasets. The set of tools in the framework provide a full pipeline to normalize, load, manage and query analytical results for algorithm evaluation. PMID:23599905
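
    The sketch below is a small in-memory stand-in for the kind of spatial comparison the platform runs as SQL spatial joins: the overlap (Jaccard index) between boundaries produced by two algorithms for the same image region. It assumes the third-party shapely package is available, and the polygon coordinates are invented; the real system stores PAIS-modeled results in a parallel spatial database.

```python
# In-memory stand-in for a spatial join comparison: Jaccard overlap between
# two segmented boundaries. Coordinates are made up for illustration.
from shapely.geometry import Polygon

def jaccard(a: Polygon, b: Polygon) -> float:
    """Area of intersection over area of union for two segmented regions."""
    inter = a.intersection(b).area
    union = a.union(b).area
    return inter / union if union > 0 else 0.0

algo_a = Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])
algo_b = Polygon([(1, 1), (11, 1), (11, 11), (1, 11)])
print(f"Jaccard overlap: {jaccard(algo_a, algo_b):.3f}")
```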

  14. Toward unification of taxonomy databases in a distributed computer environment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kitakami, Hajime; Tateno, Yoshio; Gojobori, Takashi

    1994-12-31

    All the taxonomy databases constructed with the DNA databases of the international DNA data banks are powerful electronic dictionaries which aid in biological research by computer. The taxonomy databases are, however, not consistently unified with a relational format. If we can achieve consistent unification of the taxonomy databases, it will be useful in comparing many research results, and investigating future research directions from existent research results. In particular, it will be useful in comparing relationships between phylogenetic trees inferred from molecular data and those constructed from morphological data. The goal of the present study is to unify the existent taxonomy databases and eliminate inconsistencies (errors) that are present in them. Inconsistencies occur particularly in the restructuring of the existent taxonomy databases, since classification rules for constructing the taxonomy have rapidly changed with biological advancements. A repair system is needed to remove inconsistencies in each data bank and mismatches among data banks. This paper describes a new methodology for removing both inconsistencies and mismatches from the databases in a distributed computer environment. The methodology is implemented in a relational database management system, SYBASE.

  15. Fossil-Fuel CO2 Emissions Database and Exploration System

    NASA Astrophysics Data System (ADS)

    Krassovski, M.; Boden, T.; Andres, R. J.; Blasing, T. J.

    2012-12-01

    The Carbon Dioxide Information Analysis Center (CDIAC) at Oak Ridge National Laboratory (ORNL) quantifies the release of carbon from fossil-fuel use and cement production at global, regional, and national spatial scales. The CDIAC emission time series estimates are based largely on annual energy statistics published at the national level by the United Nations (UN). CDIAC has developed a relational database to house collected data and information and a web-based interface to help users worldwide identify, explore and download desired emission data. The available information is divided into two major groups: time series and gridded data. The time series data is offered for global, regional and national scales. Publications containing historical energy statistics make it possible to estimate fossil fuel CO2 emissions back to 1751. Etemad et al. (1991) published a summary compilation that tabulates coal, brown coal, peat, and crude oil production by nation and year. Footnotes in the Etemad et al. (1991) publication extend the energy statistics time series back to 1751. Summary compilations of fossil fuel trade were published by Mitchell (1983, 1992, 1993, 1995). Mitchell's work tabulates solid and liquid fuel imports and exports by nation and year. These pre-1950 production and trade data were digitized and CO2 emission calculations were made following the procedures discussed in Marland and Rotty (1984) and Boden et al. (1995). The gridded data comprise annual and monthly estimates. The annual data present a time series of 1° latitude by 1° longitude CO2 emissions in units of million metric tons of carbon per year from anthropogenic sources for 1751-2008. The monthly fossil-fuel CO2 emission estimates for 1950-2008 provided in this database are derived from time series of global, regional, and national fossil-fuel CO2 emissions (Boden et al. 2011), the references therein, and the methodology described in Andres et al. (2011). The data accessible here take these tabular, national, mass-emissions data and distribute them spatially on a one degree latitude by one degree longitude grid. The within-country spatial distribution is achieved through a fixed population distribution as reported in Andres et al. (1996). This presentation introduces the newly built database and web interface, reflects the present state and functionality of the Fossil-Fuel CO2 Emissions Database and Exploration System, and outlines future plans for expansion.
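
    The within-country gridding step described above can be illustrated with a short, hypothetical calculation: a national emissions total is spread across that country's one-degree cells in proportion to a fixed population distribution. All numbers below are invented and are not CDIAC values.

```python
# Illustrative sketch (not CDIAC code): distribute a national fossil-fuel
# CO2 total across 1x1 degree cells proportionally to cell population.
national_total_ktC = 120_000.0          # national fossil-fuel CO2, kt C/yr
cell_population = {                      # population per 1x1 degree cell
    (40, -100): 4.0e6,
    (41, -100): 1.0e6,
    (40,  -99): 3.0e6,
}

total_pop = sum(cell_population.values())
cell_emissions = {cell: national_total_ktC * pop / total_pop
                  for cell, pop in cell_population.items()}

for (lat, lon), e in cell_emissions.items():
    print(f"cell ({lat:3d},{lon:5d}): {e:10.1f} kt C/yr")
```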

  16. Surgical research using national databases

    PubMed Central

    Leland, Hyuma; Heckmann, Nathanael

    2016-01-01

    Recent changes in healthcare and advances in technology have increased the use of large-volume national databases in surgical research. These databases have been used to develop perioperative risk stratification tools, assess postoperative complications, calculate costs, and investigate numerous other topics across multiple surgical specialties. The results of these studies contain variable information but are subject to unique limitations. The use of large-volume national databases is increasing in popularity, and thorough understanding of these databases will allow for a more sophisticated and better educated interpretation of studies that utilize such databases. This review will highlight the composition, strengths, and weaknesses of commonly used national databases in surgical research. PMID:27867945

  17. Surgical research using national databases.

    PubMed

    Alluri, Ram K; Leland, Hyuma; Heckmann, Nathanael

    2016-10-01

    Recent changes in healthcare and advances in technology have increased the use of large-volume national databases in surgical research. These databases have been used to develop perioperative risk stratification tools, assess postoperative complications, calculate costs, and investigate numerous other topics across multiple surgical specialties. The results of these studies contain variable information but are subject to unique limitations. The use of large-volume national databases is increasing in popularity, and thorough understanding of these databases will allow for a more sophisticated and better educated interpretation of studies that utilize such databases. This review will highlight the composition, strengths, and weaknesses of commonly used national databases in surgical research.

  18. Warming-related shifts in the distribution of two competing coastal wrasses.

    PubMed

    Milazzo, Marco; Quattrocchi, Federico; Azzurro, Ernesto; Palmeri, Angelo; Chemello, Renato; Di Franco, Antonio; Guidetti, Paolo; Sala, Enric; Sciandra, Mariangela; Badalamenti, Fabio; García-Charton, José A

    2016-09-01

    Warming induces organisms to adapt or to move to track thermal optima, driving novel interspecific interactions or altering pre-existing ones. We investigated how rising temperatures can affect the distribution of two antagonist Mediterranean wrasses: the 'warm-water' Thalassoma pavo and the 'cool-water' Coris julis. Using field surveys and an extensive database of depth-related patterns of distribution of wrasses across 346 sites, last-decade and projected patterns of distribution for the middle (2040-2059) and the end of century (2080-2099) were analysed by a multivariate model-based framework. Results show that T. pavo dominates shallow waters at warmest locations, where C. julis locates deeper. The northernmost shallow locations are dominated by C. julis where T. pavo abundance is low. Projections suggest that the W-Mediterranean will become more suitable for T. pavo whilst large sectors of the E-Mediterranean will be unsuitable for C. julis, progressively restricting its distribution range. These shifts might result in fish communities' re-arrangement and novel functional responses throughout the food-web. Copyright © 2016 Elsevier Ltd. All rights reserved.

  19. Numerical Prediction of the Influence of Process Parameters on Large Area Diamond Deposition by DC Arcjet with ARC Roots Rotating and Operating at Gas Recycling Mode

    NASA Astrophysics Data System (ADS)

    Lu, F. X.; Huang, T. B.; Tang, W. Z.; Song, J. H.; Tong, Y. M.

    A computer model has been set up for simulation of the flow and temperature field, and the radial distribution of atomic hydrogen and active carbonaceous species over a large area substrate surface, for a new type of dc arc plasma torch with rotating arc roots operating in gas recycling mode. A gas recycling ratio of 90% was assumed. In the numerical calculation of plasma chemistry, the Thermal-Calc program and a powerful thermodynamic database were employed. Numerical calculations with the computer model were performed using boundary conditions close to the experimental setup for large area diamond film deposition. The results showed that the flow and temperature field over a substrate surface of Φ60-100 mm were smooth and uniform. Calculations were also made for a plasma of the same geometry but without arc root rotation. It was clearly demonstrated that the design of rotating arc roots was advantageous for high quality, uniform deposition of large area diamond films. Theoretical predictions of growth rate and film quality, as well as their radial uniformity, and the influence of process parameters on large area diamond deposition were discussed in detail based on the spatial distribution of atomic hydrogen and carbonaceous species in the plasma over the substrate surface obtained from thermodynamic calculations of plasma chemistry, and were compared with experimental observations.

  20. Design of special purpose database for credit cooperation bank business processing network system

    NASA Astrophysics Data System (ADS)

    Yu, Yongling; Zong, Sisheng; Shi, Jinfa

    2011-12-01

    With the popularization of e-finance in cities, its construction is now extending to the vast rural market and developing rapidly in depth. A business processing network system suited to rural credit cooperative banks can make business processing more convenient and has good application prospects. In this paper, we analyse the necessity of adopting a special purpose distributed database in a credit cooperative bank system, give the corresponding distributed database system structure, and design the special purpose database and its interface technology. The application in Tongbai Rural Credit Cooperatives has shown that the system achieves better performance and higher efficiency.

  1. Some Reliability Issues in Very Large Databases.

    ERIC Educational Resources Information Center

    Lynch, Clifford A.

    1988-01-01

    Describes the unique reliability problems of very large databases that necessitate specialized techniques for hardware problem management. The discussion covers the use of controlled partial redundancy to improve reliability, issues in operating systems and database management systems design, and the impact of disk technology on very large…

  2. Use of large healthcare databases for rheumatology clinical research.

    PubMed

    Desai, Rishi J; Solomon, Daniel H

    2017-03-01

    Large healthcare databases, which contain data collected during routinely delivered healthcare to patients, can serve as a valuable resource for generating actionable evidence to assist medical and healthcare policy decision-making. In this review, we summarize use of large healthcare databases in rheumatology clinical research. Large healthcare data are critical to evaluate medication safety and effectiveness in patients with rheumatologic conditions. Three major sources of large healthcare data are: first, electronic medical records, second, health insurance claims, and third, patient registries. Each of these sources offers unique advantages, but also has some inherent limitations. To address some of these limitations and maximize the utility of these data sources for evidence generation, recent efforts have focused on linking different data sources. Innovations such as randomized registry trials, which aim to facilitate design of low-cost randomized controlled trials built on existing infrastructure provided by large healthcare databases, are likely to make clinical research more efficient in coming years. Harnessing the power of information contained in large healthcare databases, while paying close attention to their inherent limitations, is critical to generate a rigorous evidence-base for medical decision-making and ultimately enhancing patient care.

  3. The distribution of soil phosphorus for global biogeochemical modeling

    DOE PAGES

    Yang, Xiaojuan; Post, Wilfred M.; Thornton, Peter E.; ...

    2013-04-16

    Phosphorus (P) is a major element required for biological activity in terrestrial ecosystems. Although the total P content in most soils can be large, only a small fraction is available or in an organic form for biological utilization because it is bound either in incompletely weathered mineral particles, adsorbed on mineral surfaces, or, over the time of soil formation, made unavailable by secondary mineral formation (occluded). In order to adequately represent phosphorus availability in global biogeochemistry–climate models, a representation of the amount and form of P in soils globally is required. We develop an approach that builds on existing knowledge of soil P processes and databases of parent material and soil P measurements to provide spatially explicit estimates of different forms of naturally occurring soil P on the global scale. We assembled data on the various forms of phosphorus in soils globally, chronosequence information, and several global spatial databases to develop a map of total soil P and the distribution among mineral bound, labile, organic, occluded, and secondary P forms in soils globally. The amount of P, to 50 cm soil depth, in soil labile, organic, occluded, and secondary pools is 3.6 ± 3, 8.6 ± 6, 12.2 ± 8, and 3.2 ± 2 Pg P (petagrams of P, 1 Pg = 1 × 10^15 g), respectively. The amount in soil mineral particles to the same depth is estimated at 13.0 ± 8 Pg P for a global soil total of 40.6 ± 18 Pg P. The large uncertainty in our estimates reflects our limited understanding of the processes controlling soil P transformations during pedogenesis and a deficiency in the number of soil P measurements. In spite of the large uncertainty, the estimated global spatial variation and distribution of different soil P forms presented in this study will be useful for global biogeochemistry models that include P as a limiting element in biological production by providing initial estimates of the available soil P for plant uptake and microbial utilization.

  4. Experimental evaluation of dynamic data allocation strategies in a distributed database with changing workloads

    NASA Technical Reports Server (NTRS)

    Brunstrom, Anna; Leutenegger, Scott T.; Simha, Rahul

    1995-01-01

    Traditionally, allocation of data in distributed database management systems has been determined by off-line analysis and optimization. This technique works well for static database access patterns, but is often inadequate for frequently changing workloads. In this paper we address how to dynamically reallocate data for partitionable distributed databases with changing access patterns. Rather than using complicated and expensive optimization algorithms, a simple heuristic is presented and shown, via an implementation study, to improve system throughput by 30 percent in a local area network based system. Based on artificial wide area network delays, we show that dynamic reallocation can improve system throughput by a factor of two and a half for wide area networks. We also show that individual site load must be taken into consideration when reallocating data, and provide a simple policy that incorporates load in the reallocation decision.
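
    The paper's heuristic is not spelled out in this abstract, so the sketch below shows one plausible policy of the same flavor, not the published one: periodically move each partition to the site that accesses it most, unless that site's load already exceeds a threshold.

```python
# Hypothetical reallocation heuristic (NOT the exact policy from the paper):
# move a partition to its most frequent accessor unless that site is loaded.
from collections import Counter

def reallocate(access_log, current_home, site_load, max_load=0.8):
    """access_log: list of (partition, site); returns new partition->site map."""
    per_partition = {}
    for partition, site in access_log:
        per_partition.setdefault(partition, Counter())[site] += 1

    placement = dict(current_home)
    for partition, counts in per_partition.items():
        best_site, _ = counts.most_common(1)[0]
        # Only migrate if the preferred site is not overloaded.
        if best_site != placement.get(partition) and site_load.get(best_site, 0.0) < max_load:
            placement[partition] = best_site
    return placement

log = [("p1", "A"), ("p1", "A"), ("p1", "B"), ("p2", "B"), ("p2", "B")]
print(reallocate(log, {"p1": "B", "p2": "A"}, {"A": 0.3, "B": 0.9}))
```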

  5. Beyond a Climate-Centric View of Plant Distribution: Edaphic Variables Add Value to Distribution Models

    PubMed Central

    Beauregard, Frieda; de Blois, Sylvie

    2014-01-01

    Both climatic and edaphic conditions determine plant distribution, however many species distribution models do not include edaphic variables especially over large geographical extent. Using an exceptional database of vegetation plots (n = 4839) covering an extent of ∼55000 km2, we tested whether the inclusion of fine scale edaphic variables would improve model predictions of plant distribution compared to models using only climate predictors. We also tested how well these edaphic variables could predict distribution on their own, to evaluate the assumption that at large extents, distribution is governed largely by climate. We also hypothesized that the relative contribution of edaphic and climatic data would vary among species depending on their growth forms and biogeographical attributes within the study area. We modelled 128 native plant species from diverse taxa using four statistical model types and three sets of abiotic predictors: climate, edaphic, and edaphic-climate. Model predictive accuracy and variable importance were compared among these models and for species' characteristics describing growth form, range boundaries within the study area, and prevalence. For many species both the climate-only and edaphic-only models performed well, however the edaphic-climate models generally performed best. The three sets of predictors differed in the spatial information provided about habitat suitability, with climate models able to distinguish range edges, but edaphic models able to better distinguish within-range variation. Model predictive accuracy was generally lower for species without a range boundary within the study area and for common species, but these effects were buffered by including both edaphic and climatic predictors. The relative importance of edaphic and climatic variables varied with growth forms, with trees being more related to climate whereas lower growth forms were more related to edaphic conditions. Our study identifies the potential for non-climate aspects of the environment to pose a constraint to range expansion under climate change. PMID:24658097

  6. Beyond a climate-centric view of plant distribution: edaphic variables add value to distribution models.

    PubMed

    Beauregard, Frieda; de Blois, Sylvie

    2014-01-01

    Both climatic and edaphic conditions determine plant distribution, however many species distribution models do not include edaphic variables especially over large geographical extent. Using an exceptional database of vegetation plots (n = 4839) covering an extent of ∼55,000 km2, we tested whether the inclusion of fine scale edaphic variables would improve model predictions of plant distribution compared to models using only climate predictors. We also tested how well these edaphic variables could predict distribution on their own, to evaluate the assumption that at large extents, distribution is governed largely by climate. We also hypothesized that the relative contribution of edaphic and climatic data would vary among species depending on their growth forms and biogeographical attributes within the study area. We modelled 128 native plant species from diverse taxa using four statistical model types and three sets of abiotic predictors: climate, edaphic, and edaphic-climate. Model predictive accuracy and variable importance were compared among these models and for species' characteristics describing growth form, range boundaries within the study area, and prevalence. For many species both the climate-only and edaphic-only models performed well, however the edaphic-climate models generally performed best. The three sets of predictors differed in the spatial information provided about habitat suitability, with climate models able to distinguish range edges, but edaphic models able to better distinguish within-range variation. Model predictive accuracy was generally lower for species without a range boundary within the study area and for common species, but these effects were buffered by including both edaphic and climatic predictors. The relative importance of edaphic and climatic variables varied with growth forms, with trees being more related to climate whereas lower growth forms were more related to edaphic conditions. Our study identifies the potential for non-climate aspects of the environment to pose a constraint to range expansion under climate change.
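
    The comparison made in this study can be sketched, under strong simplifying assumptions, as fitting a presence/absence model with climate predictors only and then with climate plus edaphic predictors and comparing predictive accuracy (AUC). The data below are synthetic, and a single logistic model stands in for the four model types used in the paper; numpy and scikit-learn are assumed available.

```python
# Synthetic illustration: does adding edaphic predictors improve a species
# distribution model over climate alone?
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 2000
climate = rng.normal(size=(n, 2))      # e.g. temperature, precipitation
edaphic = rng.normal(size=(n, 2))      # e.g. soil pH, drainage
logit = 1.2 * climate[:, 0] + 0.8 * edaphic[:, 0]
presence = (logit + rng.normal(size=n) > 0).astype(int)

for name, X in [("climate only", climate),
                ("climate + edaphic", np.hstack([climate, edaphic]))]:
    Xtr, Xte, ytr, yte = train_test_split(X, presence, random_state=0)
    model = LogisticRegression().fit(Xtr, ytr)
    auc = roc_auc_score(yte, model.predict_proba(Xte)[:, 1])
    print(f"{name:18s} AUC = {auc:.3f}")
```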

  7. An algorithm of discovering signatures from DNA databases on a computer cluster.

    PubMed

    Lee, Hsiao Ping; Sheu, Tzu-Fang

    2014-10-05

    Signatures are short sequences that are unique and not similar to any other sequence in a database, and they can be used as the basis to identify different species. Although several signature discovery algorithms have been proposed in the past, these algorithms require entire databases to be loaded into memory, which restricts the amount of data they can process and leaves them unable to handle very large databases. Those algorithms also use sequential models and have slow discovery speeds, so their efficiency can be improved. In this research, we introduce a divide-and-conquer strategy for signature discovery and propose a parallel signature discovery algorithm for a computer cluster. The algorithm applies the divide-and-conquer strategy to overcome the limitation of existing algorithms that cannot process large databases, and uses parallel computing to effectively improve the efficiency of signature discovery. Even when run with just the memory of regular personal computers, the algorithm can still process large databases, such as the human whole-genome EST database, that the existing algorithms could not handle. The algorithm proposed in this research is not limited by the amount of usable memory and can rapidly find signatures in large databases, making it useful in applications such as Next Generation Sequencing and other large-scale database analysis and processing. The implementation of the proposed algorithm is available at http://www.cs.pu.edu.tw/~fang/DDCSDPrograms/DDCSD.htm.
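
    A much-simplified sketch of the divide-and-conquer idea (not the published DDCSD implementation) is shown below: the database is processed in chunks that fit in memory, fixed-length substrings are counted per chunk, and substrings occurring exactly once overall are kept as candidate signatures.

```python
# Simplified divide-and-conquer signature sketch (not DDCSD itself):
# count fixed-length substrings chunk by chunk, keep those seen only once.
from collections import Counter

def kmers(seq, k):
    return (seq[i:i + k] for i in range(len(seq) - k + 1))

def find_signatures(database, k=8, chunk_size=2):
    total = Counter()
    # "Divide": process the database a chunk of sequences at a time.
    for start in range(0, len(database), chunk_size):
        chunk = database[start:start + chunk_size]
        local = Counter()
        for seq in chunk:
            local.update(kmers(seq, k))
        # Merge per-chunk counts into the running total.
        total.update(local)
    # Candidate signatures: substrings unique across the whole database.
    return [kmer for kmer, count in total.items() if count == 1]

db = ["ACGTACGTGGTT", "ACGTACGTCCAA", "TTGGCCAATTGG"]
print(find_signatures(db, k=8))
```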

  8. Meteoroid, and debris special investigation group preliminary results: Size-frequency distribution and spatial density of large impact features on LDEF

    NASA Technical Reports Server (NTRS)

    See, Thomas H.; Hoerz, Friedrich; Zolensky, Michael E.; Allbrooks, Martha K.; Atkinson, Dale R.; Simon, Charles G.

    1992-01-01

    All craters greater than or equal to 500 microns and penetration holes greater than or equal to 300 microns in diameter on the entire Long Duration Exposure Facility (LDEF) were documented. Summarized here are the observations on the LDEF frame, which exposed aluminum 6061-T6 in 26 specific directions relative to LDEF's velocity vector. In addition, the opportunity arose to characterize the penetration holes in the A0178 thermal blankets, which pointed in nine directions. For each of the 26 directions, LDEF provided time-area products that approach those afforded by all previous space-retrieved materials combined. The objective here is to provide a factual database pertaining to the largest collisional events on the entire LDEF spacecraft with a minimum of interpretation. This database may serve to encourage and guide more interpretative efforts and modeling attempts.

  9. Quantitative analysis of the evolution of novelty in cinema through crowdsourced keywords.

    PubMed

    Sreenivasan, Sameet

    2013-09-26

    The generation of novelty is central to any creative endeavor. Novelty generation and the relationship between novelty and individual hedonic value have long been subjects of study in social psychology. However, few studies have utilized large-scale datasets to quantitatively investigate these issues. Here we consider the domain of American cinema and explore these questions using a database of films spanning a 70 year period. We use crowdsourced keywords from the Internet Movie Database as a window into the contents of films, and prescribe novelty scores for each film based on occurrence probabilities of individual keywords and keyword-pairs. These scores provide revealing insights into the dynamics of novelty in cinema. We investigate how novelty influences the revenue generated by a film, and find a relationship that resembles the Wundt-Berlyne curve. We also study the statistics of keyword occurrence and the aggregate distribution of keywords over a 100 year period.
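
    One hedged way to sketch a keyword-based novelty score in the spirit of this record (not the paper's exact formula) is to score a film by the mean self-information, -log2 p, of its keywords, where p is each keyword's occurrence probability among earlier films. The keywords below are invented.

```python
# Toy novelty score: mean self-information of a film's keywords relative to
# their occurrence frequency among previously released films.
import math
from collections import Counter

prior_films = [
    {"heist", "betrayal", "city"},
    {"romance", "city", "war"},
    {"war", "betrayal"},
]
keyword_counts = Counter(k for film in prior_films for k in film)
n_prior = len(prior_films)

def novelty(film_keywords, smoothing=1.0):
    score = 0.0
    for kw in film_keywords:
        # Smoothed occurrence probability of this keyword among prior films.
        p = (keyword_counts[kw] + smoothing) / (n_prior + smoothing)
        score += -math.log2(p)
    return score / len(film_keywords)

print(f"familiar film: {novelty({'war', 'city'}):.2f} bits")
print(f"novel film:    {novelty({'time travel', 'whale'}):.2f} bits")
```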

  10. PhAST: pharmacophore alignment search tool.

    PubMed

    Hähnke, Volker; Hofmann, Bettina; Grgat, Tomislav; Proschak, Ewgenij; Steinhilber, Dieter; Schneider, Gisbert

    2009-04-15

    We present a ligand-based virtual screening technique (PhAST) for rapid hit and lead structure searching in large compound databases. Molecules are represented as strings encoding the distribution of pharmacophoric features on the molecular graph. In contrast to other text-based methods using SMILES strings, we introduce a new form of text representation that describes the pharmacophore of molecules. This string representation opens the opportunity for revealing functional similarity between molecules by sequence alignment techniques in analogy to homology searching in protein or nucleic acid sequence databases. We favorably compared PhAST with other current ligand-based virtual screening methods in a retrospective analysis using the BEDROC metric. In a prospective application, PhAST identified two novel inhibitors of 5-lipoxygenase product formation with minimal experimental effort. This outcome demonstrates the applicability of PhAST to drug discovery projects and provides an innovative concept of sequence-based compound screening with substantial scaffold hopping potential. 2008 Wiley Periodicals, Inc.
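
    As a rough illustration of comparing molecules as pharmacophore strings (this is not PhAST itself), the sketch below reduces each molecule to a hypothetical feature string and ranks candidates by a generic string-similarity ratio, which stands in for the alignment scoring and substitution matrix PhAST actually uses.

```python
# Rough stand-in for pharmacophore-string comparison: each position is a
# feature letter (D = donor, A = acceptor, L = lipophilic, R = aromatic);
# difflib's ratio replaces a proper sequence alignment score.
from difflib import SequenceMatcher

def string_similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()

query = "RRRALDA"            # hypothetical pharmacophore string of a query ligand
candidates = {
    "mol_1": "RRRALDA",
    "mol_2": "RRRLLDA",
    "mol_3": "DDLLAAA",
}

ranked = sorted(candidates.items(),
                key=lambda kv: string_similarity(query, kv[1]),
                reverse=True)
for name, s in ranked:
    print(f"{name}: {string_similarity(query, s):.2f}")
```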

  11. Interpretation guidelines of a standard Y-chromosome STR 17-plex PCR-CE assay for crime casework.

    PubMed

    Roewer, Lutz; Geppert, Maria

    2012-01-01

    Y-STR analysis is an invaluable tool to examine evidence in sexual assault cases and in other forensic casework. Unambiguous detection of the male component in DNA mixtures with a high female background is still the main field of application of forensic Y-STR haplotyping. In recent years, powerful technologies, including a 17-locus multiplex PCR assay, have been introduced in forensic laboratories. At the same time, statistical methods have been developed and adapted for interpretation of a nonrecombining, linear marker such as the Y chromosome, which shows a strongly clustered geographical distribution due to its linear inheritance and the patrilocality of ancestral groups. Large population databases, namely the Y-STR Haplotype Reference Database (YHRD), have been established to assess the evidentiary value of Y-STR matches by means of frequency estimation methods (counting and extrapolation).
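
    The "counting" estimate mentioned above can be sketched with a short, hypothetical calculation: the haplotype frequency is estimated from its count in a reference database of N haplotypes, with an upper confidence bound that matters most when the haplotype has never been observed. The numbers are illustrative, not YHRD values.

```python
# Counting estimate of a Y-STR haplotype frequency, with a simple upper
# bound for the unobserved case (Clopper-Pearson with zero successes).
def haplotype_frequency(observed: int, database_size: int, alpha=0.05):
    point = observed / database_size
    if observed == 0:
        # Upper (1 - alpha) bound when the haplotype is absent: 1 - alpha**(1/N).
        upper = 1 - alpha ** (1 / database_size)
    else:
        upper = None  # a full confidence-interval method would be used here
    return point, upper

point, upper = haplotype_frequency(observed=0, database_size=20000)
print(f"point estimate: {point}, 95% upper bound: {upper:.2e}")
```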

  12. KGCAK: a K-mer based database for genome-wide phylogeny and complexity evaluation.

    PubMed

    Wang, Dapeng; Xu, Jiayue; Yu, Jun

    2015-09-16

    The K-mer approach, treating genomic sequences as simple characters and counting the relative abundance of each string for a fixed K, has been extensively applied to phylogeny inference for genome assembly, annotation, and comparison. To meet increasing demands for comparing large genome sequences and to promote the use of the K-mer approach, we develop a versatile database, KGCAK ( http://kgcak.big.ac.cn/KGCAK/ ), containing ~8,000 genomes that include genome sequences of diverse life forms (viruses, prokaryotes, protists, animals, and plants) and cellular organelles of eukaryotic lineages. It builds phylogeny based on genomic elements in an alignment-free fashion and provides in-depth data processing enabling users to compare the complexity of genome sequences based on K-mer distribution. We hope that KGCAK becomes a powerful tool for exploring relationships within and among groups of species in a tree of life based on genomic data.
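
    A minimal alignment-free comparison in the spirit of the K-mer approach (not the KGCAK pipeline itself) is sketched below: each sequence is represented by its K-mer frequency profile and profiles are compared with a cosine distance. The sequences are toy examples.

```python
# Alignment-free comparison: K-mer frequency profiles + cosine distance.
import math
from collections import Counter

def kmer_profile(seq: str, k: int = 4) -> Counter:
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def cosine_distance(p: Counter, q: Counter) -> float:
    keys = set(p) | set(q)
    dot = sum(p[key] * q[key] for key in keys)
    norm = math.sqrt(sum(v * v for v in p.values())) * \
           math.sqrt(sum(v * v for v in q.values()))
    return 1 - dot / norm if norm else 1.0

a = "ACGTACGTACGTTTGACG"
b = "ACGTACGAACGTTTGACG"
c = "GGGGCCCCGGGGCCCCGG"
print(f"d(a,b) = {cosine_distance(kmer_profile(a), kmer_profile(b)):.3f}")
print(f"d(a,c) = {cosine_distance(kmer_profile(a), kmer_profile(c)):.3f}")
```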

  13. Quantitative analysis of the evolution of novelty in cinema through crowdsourced keywords

    PubMed Central

    Sreenivasan, Sameet

    2013-01-01

    The generation of novelty is central to any creative endeavor. Novelty generation and the relationship between novelty and individual hedonic value have long been subjects of study in social psychology. However, few studies have utilized large-scale datasets to quantitatively investigate these issues. Here we consider the domain of American cinema and explore these questions using a database of films spanning a 70 year period. We use crowdsourced keywords from the Internet Movie Database as a window into the contents of films, and prescribe novelty scores for each film based on occurrence probabilities of individual keywords and keyword-pairs. These scores provide revealing insights into the dynamics of novelty in cinema. We investigate how novelty influences the revenue generated by a film, and find a relationship that resembles the Wundt-Berlyne curve. We also study the statistics of keyword occurrence and the aggregate distribution of keywords over a 100 year period. PMID:24067890

  14. XRootD popularity on hadoop clusters

    NASA Astrophysics Data System (ADS)

    Meoni, Marco; Boccali, Tommaso; Magini, Nicolò; Menichetti, Luca; Giordano, Domenico; CMS Collaboration

    2017-10-01

    Performance data and metadata of the computing operations at the CMS experiment are collected through a distributed monitoring infrastructure, currently relying on a traditional Oracle database system. This paper shows how to harness Big Data architectures in order to improve the throughput and the efficiency of such monitoring. A large set of operational data - user activities, job submissions, resources, file transfers, site efficiencies, software releases, network traffic, machine logs - is being injected into a readily available Hadoop cluster via several data streamers. The collected metadata is further organized to support fast arbitrary queries; this offers the ability to test several MapReduce-based frameworks and measure the system speed-up compared to the original database infrastructure. By leveraging a quality Hadoop data store and enabling an analytics framework on top, it is possible to design a mining platform to predict dataset popularity and discover patterns and correlations.

  15. Selectivity by host plants affects the distribution of arbuscular mycorrhizal fungi: evidence from ITS rDNA sequence metadata.

    PubMed

    Yang, Haishui; Zang, Yanyan; Yuan, Yongge; Tang, Jianjun; Chen, Xin

    2012-04-12

    Arbuscular mycorrhizal fungi (AMF) can form obligate symbioses with the vast majority of land plants, and AMF distribution patterns have received increasing attention from researchers. At the local scale, the distribution of AMF is well documented. Studies at large scales, however, are limited because intensive sampling is difficult. Here, we used ITS rDNA sequence metadata obtained from public databases to study the distribution of AMF at continental and global scales. We also used these sequence metadata to investigate whether host plant is the main factor that affects the distribution of AMF at large scales. We defined 305 ITS virtual taxa (ITS-VTs) among all sequences of the Glomeromycota by using a comprehensive maximum likelihood phylogenetic analysis. Each host taxonomic order averaged about 53% specific ITS-VTs, and approximately 60% of the ITS-VTs were host specific. Those ITS-VTs with wide host range showed wide geographic distribution. Most ITS-VTs occurred in only one type of host functional group. The distributions of most ITS-VTs were limited across ecosystem, across continent, across biogeographical realm, and across climatic zone. Non-metric multidimensional scaling analysis (NMDS) showed that AMF community composition differed among functional groups of hosts, and among ecosystem, continent, biogeographical realm, and climatic zone. The Mantel test showed that AMF community composition was significantly correlated with plant community composition among ecosystem, among continent, among biogeographical realm, and among climatic zone. Structural equation modeling (SEM) showed that the effects of ecosystem, continent, biogeographical realm, and climatic zone on AMF distribution were mainly indirect, whereas host plants had strong direct effects on AMF. The distribution of AMF as indicated by ITS rDNA sequences showed a pattern of high endemism at large scales. This pattern indicates high specificity of AMF for hosts at different scales (plant taxonomic order and functional group) and high selectivity from host plants for AMF. The effects of ecosystemic, biogeographical, continental and climatic factors on AMF distribution might be mediated by host plants.

  16. The Interannual Stability of Cumulative Frequency Distributions for Convective System Size and Intensity

    NASA Technical Reports Server (NTRS)

    Mohr, Karen I.; Molinari, John; Thorncroft, Chris

    2009-01-01

    The characteristics of convective system populations in West Africa and the western Pacific tropical cyclone basin were analyzed to investigate whether interannual variability in convective activity in tropical continental and oceanic environments is driven by variations in the number of events during the wet season or by conditions favoring larger and/or more intense convective systems. Convective systems were defined from Tropical Rainfall Measuring Mission (TRMM) data as clusters of pixels with an 85-GHz polarization-corrected brightness temperature below 255 K and an area of at least 64 square kilometers. The study database consisted of convective systems in West Africa from May to September 1998-2007, and in the western Pacific from May to November 1998-2007. Annual cumulative frequency distributions for system minimum brightness temperature and system area were constructed for both regions. For both regions, there were no statistically significant differences between the annual curves for system minimum brightness temperature. There were two groups of system area curves, split by the TRMM altitude boost in 2001. Within each set, there was no statistically significant interannual variability. Subsetting the database revealed some sensitivity in distribution shape to the size of the sampling area, the length of the sample period, and the climate zone. From a regional perspective, the stability of the cumulative frequency distributions implied that the probability that a convective system would attain a particular size or intensity does not change interannually. Variability in the number of convective events appeared to be more important in determining whether a year is either wetter or drier than normal.
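
    The kind of comparison behind this study can be sketched with invented data: build annual cumulative frequency distributions of a system property and test whether two years differ. A two-sample Kolmogorov-Smirnov test is used here only as a generic stand-in for the paper's statistical comparison, and numpy and scipy are assumed available.

```python
# Synthetic sketch: compare two annual distributions of convective-system
# area and report cumulative frequencies at a few size thresholds.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(7)
areas_1998 = rng.lognormal(mean=5.0, sigma=1.0, size=3000)   # km^2, synthetic
areas_1999 = rng.lognormal(mean=5.0, sigma=1.0, size=3200)

stat, p = ks_2samp(areas_1998, areas_1999)
print(f"KS statistic = {stat:.3f}, p = {p:.3f}")

# Empirical exceedance frequency at a few system sizes.
for threshold in (64, 500, 5000):
    frac = np.mean(areas_1998 >= threshold)
    print(f"fraction of 1998 systems >= {threshold:5d} km^2: {frac:.3f}")
```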

  17. K-distribution models for gas mixtures in hypersonic nonequilibrium flows

    NASA Astrophysics Data System (ADS)

    Bansal, Ankit

    Calculation of nonequilibrium radiation field in plasmas around a spacecraft entering into an atmosphere at hypersonic velocities is a very complicated and computationally expensive task. The objective of this Dissertation is to collect state-of-the art spectroscopic data for the evaluation of spectral absorption and emission coefficients of atomic and molecular gases, develop efficient and accurate spectral models and databases, and study the effect of radiation on wall heat loads and flowfield around the spacecraft. The most accurate simulation of radiative transport in the shock layer requires calculating the gas properties at a large number of wavelengths and solving the Radiative Transfer Equation (RTE) in a line-by-line (LBL) fashion, which is prohibitively expensive for coupled simulations. A number of k-distribution based spectral models are developed for atomic lines, continuum and molecular bands that allow efficient evaluation of radiative properties and heat loads in hypersonic shock layer plasma. Molecular radiation poses very different challenges than atomic radiation. A molecular spectrum is governed by simultaneous electronic, vibrational and rotational transitions, making the spectrum very strongly dependent on wavelength. In contrast to an atomic spectrum, where line wings play a major role in heat transfer, most of the heat transfer in molecular spectra occurs near line centers. As the first step, k-distribution models are developed separately for atomic and molecular species, taking advantage of the fact that in the Earth's atmosphere the radiative field is dominated by atomic species (N and O) and in Titan's and Mars' atmospheres molecular bands of CN and CO are dominant. There are a number of practical applications where both atomic and molecular species are present, for example, the vacuum-ultra-violet spectrum during Earth's reentry conditions is marked by emission from atomic bound-bound lines and continuum and simultaneous absorption by strong bands of N2. For such cases, a new model is developed for the treatment of gas mixtures containing atomic lines, continuum and molecular bands. Full-spectrum k-distribution (FSK) method provides very accurate results compared to those obtained from the exact line-by-line method. For cases involving more extreme gradients in species concentrations and temperature, full-spectrum k-distribution model is relatively less accurate, and the method is refined by dividing the spectrum into a number of groups or scales, leading to the development of multi-scale models. The detailed methodology of splitting the gas mixture into scales is presented. To utilize the full potential of the k-distribution methods, pre-calculated values of k-distributions are stored in databases, which can later be interpolated at local flow conditions. Accurate and compact part-spectrum k-distribution databases are developed for atomic species and molecular bands. These databases allow users to calculate desired full-spectrum k-distributions through look-up and interpolation. Application of the new spectral models and databases to shock layer plasma radiation is demonstrated by solving the radiative transfer equation along typical one-dimensional flowfields in Earth's, Titan's and Mars' atmospheres. The k-distribution methods are vastly more efficient than the line-by-line method. 
The efficiency of the method is compared with the line-by-line method by measuring computational times for a number of test problems, showing typical reductions in computational time by a factor of more than 500 for property evaluation and a factor of about 32,000 for the solution of the RTE. A large percentage of radiative energy emitted in the shock layer is likely to escape the region, resulting in cooling of the shock layer. This may change the flow parameters in the flowfield and, in turn, can affect radiative as well as convective heat loads. A new flow solver is constructed to simulate coupled hypersonic flow-radiation over a reentry vehicle. The flow solver employs a number of existing schemes and tools available in OpenFOAM, along with a number of additional features for high temperature, compressible and chemically reacting flows, and k-distribution models for radiative calculations. The radiative transport is solved with the one-dimensional tangent slab and P1 solvers, and also with the two-dimensional P1 solver. The new solver is applied to simulate flow around an entry vehicle in the Martian atmosphere. Results for uncoupled and coupled flow-radiation simulations are presented, highlighting the effects of radiative cooling on the flowfield and wall fluxes.
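
    The core reordering step behind a k-distribution can be sketched as follows: the absorption coefficient over a spectral interval is sorted into a monotonic function k(g), where g is the fraction of the interval whose absorption coefficient lies below k. The synthetic spectrum below stands in for line-by-line data, and the Planck weighting and multi-scale treatment used in the dissertation are omitted.

```python
# Build a simple k-distribution k(g) from a synthetic absorption spectrum.
import numpy as np

wavenumber = np.linspace(1000.0, 1100.0, 50_000)          # cm^-1
# Synthetic spectrum: smooth continuum plus a few strong Lorentzian lines.
kappa = 0.01 + sum(5.0 / (1.0 + ((wavenumber - c) / 0.05) ** 2)
                   for c in (1020.0, 1050.0, 1080.0))

k_sorted = np.sort(kappa)                               # reordered coefficient
g = np.arange(1, k_sorted.size + 1) / k_sorted.size     # cumulative fraction

# A handful of quadrature points in g now represent the whole interval.
for gq in (0.1, 0.5, 0.9, 0.99):
    print(f"k(g={gq:4.2f}) = {np.interp(gq, g, k_sorted):8.3f} cm^-1")
```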

  18. Building the Infrastructure of Resource Sharing: Union Catalogs, Distributed Search, and Cross-Database Linkage.

    ERIC Educational Resources Information Center

    Lynch, Clifford A.

    1997-01-01

    Union catalogs and distributed search systems are two ways users can locate materials in print and electronic formats. This article examines the advantages and limitations of both approaches and argues that they should be considered complementary rather than competitive. Discusses technologies creating linkage between catalogs and databases and…

  19. DISTRIBUTED STRUCTURE-SEARCHABLE TOXICITY (DSSTOX) DATABASE NETWORK: MAKING PUBLIC TOXICITY DATA RESOURCES MORE ACCESSIBLE AND USABLE FOR DATA EXPLORATION AND SAR DEVELOPMENT

    EPA Science Inventory


    Distributed Structure-Searchable Toxicity (DSSTox) Database Network: Making Public Toxicity Data Resources More Accessible and Usable for Data Exploration and SAR Development

    Many sources of public toxicity data are not currently linked to chemical structure, are not ...

  20. SPANG: a SPARQL client supporting generation and reuse of queries for distributed RDF databases.

    PubMed

    Chiba, Hirokazu; Uchiyama, Ikuo

    2017-02-08

    Toward improved interoperability of distributed biological databases, an increasing number of datasets have been published in the standardized Resource Description Framework (RDF). Although the powerful SPARQL Protocol and RDF Query Language (SPARQL) provides a basis for exploiting RDF databases, writing SPARQL code is burdensome for users including bioinformaticians. Thus, an easy-to-use interface is necessary. We developed SPANG, a SPARQL client that has unique features for querying RDF datasets. SPANG dynamically generates typical SPARQL queries according to specified arguments. It can also call SPARQL template libraries constructed in a local system or published on the Web. Further, it enables combinatorial execution of multiple queries, each with a distinct target database. These features facilitate easy and effective access to RDF datasets and integrative analysis of distributed data. SPANG helps users to exploit RDF datasets by generation and reuse of SPARQL queries through a simple interface. This client will enhance integrative exploitation of biological RDF datasets distributed across the Web. This software package is freely available at http://purl.org/net/spang .
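
    A toy illustration of the kind of query generation such a client performs is shown below: a few arguments (class URI, predicate, limit) are turned into a complete SPARQL query string. The prefixes and URIs are generic placeholders, not SPANG's actual templates or command-line interface.

```python
# Generate a SPARQL query string from a few arguments (placeholder URIs).
def build_sparql(class_uri: str, predicate: str, limit: int = 10) -> str:
    return f"""PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?s ?o
WHERE {{
  ?s rdf:type <{class_uri}> .
  ?s <{predicate}> ?o .
}}
LIMIT {limit}"""

query = build_sparql("http://example.org/Gene",
                     "http://www.w3.org/2000/01/rdf-schema#label", 5)
print(query)
```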

  1. Web Proxy Auto Discovery for the WLCG

    NASA Astrophysics Data System (ADS)

    Dykstra, D.; Blomer, J.; Blumenfeld, B.; De Salvo, A.; Dewhurst, A.; Verguilov, V.

    2017-10-01

    All four of the LHC experiments depend on web proxies (that is, squids) at each grid site to support software distribution by the CernVM FileSystem (CVMFS). CMS and ATLAS also use web proxies for conditions data distributed through the Frontier Distributed Database caching system. ATLAS & CMS each have their own methods for their grid jobs to find out which web proxies to use for Frontier at each site, and CVMFS has a third method. Those diverse methods limit usability and flexibility, particularly for opportunistic use cases, where an experiment’s jobs are run at sites that do not primarily support that experiment. This paper describes a new Worldwide LHC Computing Grid (WLCG) system for discovering the addresses of web proxies. The system is based on an internet standard called Web Proxy Auto Discovery (WPAD). WPAD is in turn based on another standard called Proxy Auto Configuration (PAC). Both the Frontier and CVMFS clients support this standard. The input into the WLCG system comes from squids registered in the ATLAS Grid Information System (AGIS) and CMS SITECONF files, cross-checked with squids registered by sites in the Grid Configuration Database (GOCDB) and the OSG Information Management (OIM) system, and combined with some exceptions manually configured by people from ATLAS and CMS who operate WLCG Squid monitoring. WPAD servers at CERN respond to http requests from grid nodes all over the world with a PAC file that lists available web proxies, based on IP addresses matched from a database that contains the IP address ranges registered to organizations. Large grid sites are encouraged to supply their own WPAD web servers for more flexibility, to avoid being affected by short term long distance network outages, and to offload the WLCG WPAD servers at CERN. The CERN WPAD servers additionally support requests from jobs running at non-grid sites (particularly for LHC@Home) which they direct to the nearest publicly accessible web proxy servers. The responses to those requests are geographically ordered based on a separate database that maps IP addresses to longitude and latitude.

  2. Web Proxy Auto Discovery for the WLCG

    DOE PAGES

    Dykstra, D.; Blomer, J.; Blumenfeld, B.; ...

    2017-11-23

    All four of the LHC experiments depend on web proxies (that is, squids) at each grid site to support software distribution by the CernVM FileSystem (CVMFS). CMS and ATLAS also use web proxies for conditions data distributed through the Frontier Distributed Database caching system. ATLAS & CMS each have their own methods for their grid jobs to find out which web proxies to use for Frontier at each site, and CVMFS has a third method. Those diverse methods limit usability and flexibility, particularly for opportunistic use cases, where an experiment’s jobs are run at sites that do not primarily support that experiment. This paper describes a new Worldwide LHC Computing Grid (WLCG) system for discovering the addresses of web proxies. The system is based on an internet standard called Web Proxy Auto Discovery (WPAD). WPAD is in turn based on another standard called Proxy Auto Configuration (PAC). Both the Frontier and CVMFS clients support this standard. The input into the WLCG system comes from squids registered in the ATLAS Grid Information System (AGIS) and CMS SITECONF files, cross-checked with squids registered by sites in the Grid Configuration Database (GOCDB) and the OSG Information Management (OIM) system, and combined with some exceptions manually configured by people from ATLAS and CMS who operate WLCG Squid monitoring. WPAD servers at CERN respond to http requests from grid nodes all over the world with a PAC file that lists available web proxies, based on IP addresses matched from a database that contains the IP address ranges registered to organizations. Large grid sites are encouraged to supply their own WPAD web servers for more flexibility, to avoid being affected by short term long distance network outages, and to offload the WLCG WPAD servers at CERN. The CERN WPAD servers additionally support requests from jobs running at non-grid sites (particularly for LHC@Home), which they direct to the nearest publicly accessible web proxy servers. Furthermore, the responses to those requests are geographically ordered based on a separate database that maps IP addresses to longitude and latitude.

  3. Web Proxy Auto Discovery for the WLCG

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dykstra, D.; Blomer, J.; Blumenfeld, B.

    All four of the LHC experiments depend on web proxies (that is, squids) at each grid site to support software distribution by the CernVM FileSystem (CVMFS). CMS and ATLAS also use web proxies for conditions data distributed through the Frontier Distributed Database caching system. ATLAS & CMS each have their own methods for their grid jobs to find out which web proxies to use for Frontier at each site, and CVMFS has a third method. Those diverse methods limit usability and flexibility, particularly for opportunistic use cases, where an experiment’s jobs are run at sites that do not primarily support that experiment. This paper describes a new Worldwide LHC Computing Grid (WLCG) system for discovering the addresses of web proxies. The system is based on an internet standard called Web Proxy Auto Discovery (WPAD). WPAD is in turn based on another standard called Proxy Auto Configuration (PAC). Both the Frontier and CVMFS clients support this standard. The input into the WLCG system comes from squids registered in the ATLAS Grid Information System (AGIS) and CMS SITECONF files, cross-checked with squids registered by sites in the Grid Configuration Database (GOCDB) and the OSG Information Management (OIM) system, and combined with some exceptions manually configured by people from ATLAS and CMS who operate WLCG Squid monitoring. WPAD servers at CERN respond to http requests from grid nodes all over the world with a PAC file that lists available web proxies, based on IP addresses matched from a database that contains the IP address ranges registered to organizations. Large grid sites are encouraged to supply their own WPAD web servers for more flexibility, to avoid being affected by short term long distance network outages, and to offload the WLCG WPAD servers at CERN. The CERN WPAD servers additionally support requests from jobs running at non-grid sites (particularly for LHC@Home), which they direct to the nearest publicly accessible web proxy servers. Furthermore, the responses to those requests are geographically ordered based on a separate database that maps IP addresses to longitude and latitude.
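
    Conceptually, the per-request behavior of the WPAD service can be sketched as matching the requesting node's IP address against registered site ranges and answering with that site's proxies, falling back to a default list otherwise. The sketch below shows only that lookup step (the PAC file itself is JavaScript, which is not reproduced here), and every address and proxy name in it is invented.

```python
# Conceptual lookup step (not the WLCG implementation): map a client IP to
# a registered site range and return that site's proxy list.
import ipaddress

SITE_PROXIES = {
    "192.0.2.0/24":    ["http://squid1.site-a.example:3128",
                        "http://squid2.site-a.example:3128"],
    "198.51.100.0/22": ["http://squid.site-b.example:3128"],
}
DEFAULT_PROXIES = ["http://backup-proxy.example:3128"]

def proxies_for(client_ip: str) -> list:
    addr = ipaddress.ip_address(client_ip)
    for cidr, proxies in SITE_PROXIES.items():
        if addr in ipaddress.ip_network(cidr):
            return proxies
    return DEFAULT_PROXIES

print(proxies_for("192.0.2.17"))     # matched to site A's squids
print(proxies_for("203.0.113.5"))    # falls back to the default list
```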

  4. When is Chemical Similarity Significant? The Statistical Distribution of Chemical Similarity Scores and Its Extreme Values

    PubMed Central

    Baldi, Pierre

    2010-01-01

    As repositories of chemical molecules continue to expand and become more open, it becomes increasingly important to develop tools to search them efficiently and assess the statistical significance of chemical similarity scores. Here we develop a general framework for understanding, modeling, predicting, and approximating the distribution of chemical similarity scores and its extreme values in large databases. The framework can be applied to different chemical representations and similarity measures but is demonstrated here using the most common binary fingerprints with the Tanimoto similarity measure. After introducing several probabilistic models of fingerprints, including the Conditional Gaussian Uniform model, we show that the distribution of Tanimoto scores can be approximated by the distribution of the ratio of two correlated Normal random variables associated with the corresponding unions and intersections. This remains true also when the distribution of similarity scores is conditioned on the size of the query molecules in order to derive more fine-grained results and improve chemical retrieval. The corresponding extreme value distributions for the maximum scores are approximated by Weibull distributions. From these various distributions and their analytical forms, Z-scores, E-values, and p-values are derived to assess the significance of similarity scores. In addition, the framework allows one to predict also the value of standard chemical retrieval metrics, such as Sensitivity and Specificity at fixed thresholds, or ROC (Receiver Operating Characteristic) curves at multiple thresholds, and to detect outliers in the form of atypical molecules. Numerous and diverse experiments carried in part with large sets of molecules from the ChemDB show remarkable agreement between theory and empirical results. PMID:20540577
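
    A hedged numerical sketch of the setting analyzed here is given below: Tanimoto scores between a query fingerprint and a database of random binary fingerprints, with an empirical p-value for the best score obtained by simulation. The analytic Gaussian-ratio and Weibull approximations derived in the paper are not reproduced, and numpy is assumed available.

```python
# Empirical significance of a maximum Tanimoto score over a fingerprint database.
import numpy as np

rng = np.random.default_rng(1)
n_bits, db_size, density = 1024, 2000, 0.1
database = rng.random((db_size, n_bits)) < density

def tanimoto_scores(query, db):
    inter = np.logical_and(query, db).sum(axis=1)
    union = np.logical_or(query, db).sum(axis=1)
    return inter / union

# Query: a near-duplicate of one database molecule (a few bits flipped),
# so its best score should stand out from chance.
query = database[0].copy()
flip = rng.choice(n_bits, size=10, replace=False)
query[flip] = ~query[flip]
observed_max = tanimoto_scores(query, database).max()

# Null distribution of the maximum score for random queries of equal density.
null_max = np.array([tanimoto_scores(rng.random(n_bits) < density, database).max()
                     for _ in range(200)])
p_value = (null_max >= observed_max).mean()
print(f"best Tanimoto score {observed_max:.3f}, empirical p ~ {p_value:.3f}")
```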

  5. Distributed and parallel approach for handle and perform huge datasets

    NASA Astrophysics Data System (ADS)

    Konopko, Joanna

    2015-12-01

    Big Data refers to dynamic, large, and disparate volumes of data coming from many different sources (tools, machines, sensors, mobile devices) that are uncorrelated with each other. It requires new, innovative, and scalable technology to collect, host, and analytically process the vast amount of data. A proper architecture for systems that process huge data sets is needed. In this paper, a comparison of distributed and parallel system architectures is presented using the example of the MapReduce (MR) Hadoop platform and a parallel database platform (DBMS). This paper also analyzes the problem of extracting and handling valuable information from petabytes of data. Both paradigms, MapReduce and parallel DBMS, are described and compared. A hybrid architecture approach is also proposed and could be used to solve the analyzed problem of storing and processing Big Data.
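
    The MapReduce programming model discussed above can be illustrated with a minimal single-process sketch (Hadoop, of course, distributes these phases across a cluster): map each input record to key/value pairs, shuffle by key, then reduce each group. The records below are invented.

```python
# Minimal single-process illustration of the map / shuffle / reduce phases.
from collections import defaultdict

records = ["error disk", "ok", "error net", "ok", "error disk"]

# Map phase: emit (status, 1) for every record.
mapped = [(line.split()[0], 1) for line in records]

# Shuffle phase: group values by key.
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce phase: aggregate each group.
counts = {key: sum(values) for key, values in groups.items()}
print(counts)   # {'error': 3, 'ok': 2}
```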

  6. Does filler database size influence identification accuracy?

    PubMed

    Bergold, Amanda N; Heaton, Paul

    2018-06-01

    Police departments increasingly use large photo databases to select lineup fillers using facial recognition software, but this technological shift's implications have been largely unexplored in eyewitness research. Database use, particularly if coupled with facial matching software, could enable lineup constructors to increase filler-suspect similarity and thus enhance eyewitness accuracy (Fitzgerald, Oriet, Price, & Charman, 2013). However, with a large pool of potential fillers, such technologies might theoretically produce lineup fillers too similar to the suspect (Fitzgerald, Oriet, & Price, 2015; Luus & Wells, 1991; Wells, Rydell, & Seelau, 1993). This research proposes a new factor, filler database size, as a lineup feature affecting eyewitness accuracy. In a facial recognition experiment, we select lineup fillers in a legally realistic manner using facial matching software applied to filler databases of 5,000, 25,000, and 125,000 photos, and find that larger databases are associated with a higher objective similarity rating between suspects and fillers and lower overall identification accuracy. In target present lineups, witnesses viewing lineups created from the larger databases were less likely to make correct identifications and more likely to select known innocent fillers. When the target was absent, database size was associated with a lower rate of correct rejections and a higher rate of filler identifications. Higher algorithmic similarity ratings were also associated with decreases in eyewitness identification accuracy. The results suggest that using facial matching software to select fillers from large photograph databases may reduce identification accuracy, and provides support for filler database size as a meaningful system variable. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  7. Meta-Storms: efficient search for similar microbial communities based on a novel indexing scheme and similarity score for metagenomic data.

    PubMed

    Su, Xiaoquan; Xu, Jian; Ning, Kang

    2012-10-01

    Scientists have long sought to compare different microbial communities (also referred to as 'metagenomic samples' here) effectively and at large scale: given a set of unknown samples, find similar metagenomic samples in a large repository and examine how similar these samples are. With the metagenomic samples accumulated so far, it is possible to build a database of metagenomic samples of interest. Any metagenomic sample could then be searched against this database to find the most similar sample(s). However, on one hand, current databases with a large number of metagenomic samples mostly serve as data repositories that offer few functionalities for analysis; on the other hand, methods to measure the similarity of metagenomic data work well only for small sets of samples compared pairwise. It is not yet clear how to efficiently search for metagenomic samples against a large metagenomic database. In this study, we propose a novel method, Meta-Storms, that can systematically and efficiently organize and search metagenomic data. It includes the following components: (i) creating a database of metagenomic samples based on their taxonomical annotations, (ii) efficient indexing of samples in the database based on a hierarchical taxonomy indexing strategy, (iii) searching for a metagenomic sample against the database with a fast scoring function based on quantitative phylogeny and (iv) managing the database by index export, index import, data insertion, data deletion and database merging. We collected more than 1300 metagenomic datasets from the public domain and in-house facilities, and tested the Meta-Storms method on them. Our experimental results show that Meta-Storms is capable of database creation and effective searching for a large number of metagenomic samples, and it achieves accuracies similar to the current popular significance testing-based methods. The Meta-Storms method could serve as a suitable database management and search system to quickly identify similar metagenomic samples from a large pool of samples. ningkang@qibebt.ac.cn Supplementary data are available at Bioinformatics online.
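
    The scoring idea, comparing two samples' taxon abundances and giving partial credit when abundance only matches at a higher taxonomic level, can be sketched as follows. The two-level taxonomy, the penalty factor, and the sample profiles are all invented for illustration; this shows the spirit of a phylogeny-aware score, not the Meta-Storms algorithm itself.

```python
# Hypothetical two-level taxonomy: phylum -> genera.
TAXONOMY = {"Firmicutes": ["Lactobacillus", "Clostridium"],
            "Bacteroidetes": ["Bacteroides", "Prevotella"]}

def similarity(sample_a, sample_b, penalty=0.5):
    """Score two relative-abundance profiles over a shared taxonomy.

    Abundance matched at the genus level counts fully; abundance that only
    matches at the parent (phylum) level is discounted by `penalty`.
    """
    score = 0.0
    for phylum, genera in TAXONOMY.items():
        leftover_a = leftover_b = 0.0
        for genus in genera:
            a, b = sample_a.get(genus, 0.0), sample_b.get(genus, 0.0)
            score += min(a, b)            # exact match at the leaf
            leftover_a += a - min(a, b)   # unmatched abundance bubbles up
            leftover_b += b - min(a, b)
        score += penalty * min(leftover_a, leftover_b)  # partial credit at phylum level
    return score

s1 = {"Lactobacillus": 0.4, "Clostridium": 0.1, "Bacteroides": 0.5}
s2 = {"Lactobacillus": 0.2, "Prevotella": 0.6, "Bacteroides": 0.2}
print(round(similarity(s1, s2), 3))
```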

  8. Creation of a Genome-Wide Metabolic Pathway Database for Populus trichocarpa Using a New Approach for Reconstruction and Curation of Metabolic Pathways for Plants

    PubMed Central

    Zhang, Peifen; Dreher, Kate; Karthikeyan, A.; Chi, Anjo; Pujar, Anuradha; Caspi, Ron; Karp, Peter; Kirkup, Vanessa; Latendresse, Mario; Lee, Cynthia; Mueller, Lukas A.; Muller, Robert; Rhee, Seung Yon

    2010-01-01

    Metabolic networks reconstructed from sequenced genomes or transcriptomes can help visualize and analyze large-scale experimental data, predict metabolic phenotypes, discover enzymes, engineer metabolic pathways, and study metabolic pathway evolution. We developed a general approach for reconstructing metabolic pathway complements of plant genomes. Two new reference databases were created and added to the core of the infrastructure: a comprehensive, all-plant reference pathway database, PlantCyc, and a reference enzyme sequence database, RESD, for annotating metabolic functions of protein sequences. PlantCyc (version 3.0) includes 714 metabolic pathways and 2,619 reactions from over 300 species. RESD (version 1.0) contains 14,187 literature-supported enzyme sequences from across all kingdoms. We used RESD, PlantCyc, and MetaCyc (an all-species reference metabolic pathway database), in conjunction with the pathway prediction software Pathway Tools, to reconstruct a metabolic pathway database, PoplarCyc, from the recently sequenced genome of Populus trichocarpa. PoplarCyc (version 1.0) contains 321 pathways with 1,807 assigned enzymes. Comparing PoplarCyc (version 1.0) with AraCyc (version 6.0, Arabidopsis [Arabidopsis thaliana]) showed comparable numbers of pathways distributed across all domains of metabolism in both databases, except for a higher number of AraCyc pathways in secondary metabolism and a 1.5-fold increase in carbohydrate metabolic enzymes in PoplarCyc. Here, we introduce these new resources and demonstrate the feasibility of using them to identify candidate enzymes for specific pathways and to analyze metabolite profiling data through concrete examples. These resources can be searched by text or BLAST, browsed, and downloaded from our project Web site (http://plantcyc.org). PMID:20522724

  9. Marine Biodiversity in the Australian Region

    PubMed Central

    Butler, Alan J.; Rees, Tony; Beesley, Pam; Bax, Nicholas J.

    2010-01-01

    The entire Australian marine jurisdictional area, including offshore and sub-Antarctic islands, is considered in this paper. Most records, however, come from the Exclusive Economic Zone (EEZ) around the continent of Australia itself. The counts of species have been obtained from four primary databases (the Australian Faunal Directory, Codes for Australian Aquatic Biota, Online Zoological Collections of Australian Museums, and the Australian node of the Ocean Biogeographic Information System), but even these are an underestimate of described species. In addition, some partially completed databases for particular taxonomic groups, and specialized databases (for introduced and threatened species) have been used. Experts also provided estimates of the number of known species not yet in the major databases. For only some groups could we obtain an (expert opinion) estimate of undiscovered species. The databases provide patchy information about endemism, levels of threat, and introductions. We conclude that there are about 33,000 marine species (mainly animals) in the major databases, of which 130 are introduced, 58 listed as threatened and an unknown percentage endemic. An estimated 17,000 more named species are either known from the Australian EEZ but not in the present databases, or potentially occur there. It is crudely estimated that there may be as many as 250,000 species (known and yet to be discovered) in the Australian EEZ. For 17 higher taxa, there is sufficient detail for subdivision by Large Marine Domains, for comparison with other National and Regional Implementation Committees of the Census of Marine Life. Taxonomic expertise in Australia is unevenly distributed across taxa, and declining. Comments are given briefly on biodiversity management measures in Australia, including but not limited to marine protected areas. PMID:20689847

  10. Radiocarbon Dating the Anthropocene

    NASA Astrophysics Data System (ADS)

    Chaput, M. A.; Gajewski, K. J.

    2015-12-01

    The Anthropocene has no agreed start date since current suggestions for its beginning range from Pre-Industrial times to the Industrial Revolution, and from the mid-twentieth century to the future. To set the boundary of the Anthropocene in geological time, we must first understand when, how and to what extent humans began altering the Earth system. One aspect of this involves reconstructing the effects of prehistoric human activity on the physical landscape. However, for global reconstructions of land use and land cover change to be more accurately interpreted in the context of human interaction with the landscape, large-scale spatio-temporal demographic changes in prehistoric populations must be known. Estimates of the relative number of prehistoric humans in different regions of the world and at different moments in time are needed. To this end, we analyze a dataset of radiocarbon dates from the Canadian Archaeological Radiocarbon Database (CARD), the Palaeolithic Database of Europe and the AustArch Database of Australia, as well as published dates from South America. This is the first time such a large quantity of dates (approximately 60,000) has been mapped and studied at a global scale. Initial results from the analysis of temporal frequency distributions of calibrated radiocarbon dates, assumed to be proportional to population density, will be discussed. The utility of radiocarbon dates in studies of the Anthropocene will be evaluated and potential links between population density and changes in atmospheric greenhouse gas concentrations, climate, migration patterning and fire frequency coincidence will be considered.
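
    A crude version of the 'dates as data' computation described above can be sketched in a few lines: pool calibrated median ages and bin them into a temporal frequency distribution that is then read as a relative-population proxy. The ages below are synthetic, and the sketch ignores calibration uncertainty and taphonomic correction.

```python
import numpy as np

# Hypothetical calibrated median ages (cal yr BP) pooled from several archives.
rng = np.random.default_rng(1)
dates = rng.uniform(500, 12_000, size=5_000)

# Temporal frequency distribution in 200-year bins, read here as a crude
# relative-population proxy.
bins = np.arange(0, 12_200, 200)
counts, edges = np.histogram(dates, bins=bins)

# Report the three busiest bins.
for i in counts.argsort()[::-1][:3]:
    print(f"{int(edges[i])}-{int(edges[i + 1])} cal yr BP: {counts[i]} dates")
```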

  11. Uncovering Capgras delusion using a large-scale medical records database

    PubMed Central

    Marshall, Caryl; Kanji, Zara; Wilkinson, Sam; Halligan, Peter; Deeley, Quinton

    2017-01-01

    Background Capgras delusion is scientifically important but most commonly reported as single case studies. Studies analysing large clinical records databases focus on common disorders but none have investigated rare syndromes. Aims Identify cases of Capgras delusion and associated psychopathology, demographics, cognitive function and neuropathology in light of existing models. Method Combined computational data extraction and qualitative classification using 250 000 case records from South London and Maudsley Clinical Record Interactive Search (CRIS) database. Results We identified 84 individuals and extracted diagnosis-matched comparison groups. Capgras was not ‘monothematic’ in the majority of cases. Most cases involved misidentified family members or close partners but others were misidentified in 25% of cases, contrary to dual-route face recognition models. Neuroimaging provided no evidence for predominantly right hemisphere damage. Individuals were ethnically diverse with a range of psychosis spectrum diagnoses. Conclusions Capgras is more diverse than current models assume. Identification of rare syndromes complements existing ‘big data’ approaches in psychiatry. Declaration of interests V.B. is supported by a Wellcome Trust Seed Award in Science (200589/Z/16/Z) and the UCLH NIHR Biomedical Research Centre. S.W. is supported by a Wellcome Trust Strategic Award (WT098455MA). Q.D. has received a grant from King’s Health Partners. Copyright and usage © The Royal College of Psychiatrists 2017. This is an open access article distributed under the terms of the Creative Commons Non-Commercial, No Derivatives (CC BY-NC-ND) license. PMID:28794897

  12. Collection Fusion Using Bayesian Estimation of a Linear Regression Model in Image Databases on the Web.

    ERIC Educational Resources Information Center

    Kim, Deok-Hwan; Chung, Chin-Wan

    2003-01-01

    Discusses the collection fusion problem of image databases, concerned with retrieving relevant images by content based retrieval from image databases distributed on the Web. Focuses on a metaserver which selects image databases supporting similarity measures and proposes a new algorithm which exploits a probabilistic technique using Bayesian…
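
    The fusion idea, predicting how many relevant images each remote database is likely to return and ranking databases accordingly, can be illustrated with a tiny conjugate Bayesian linear regression. The features, priors, and numbers below are assumptions for demonstration and are not the algorithm proposed in the article.

```python
import numpy as np

# Toy training data for one image database: x holds query features (bias term
# plus, e.g., similarity of the query to the database's sample images) and
# y the number of relevant images that database returned for past queries.
X = np.array([[1.0, 0.2], [1.0, 0.5], [1.0, 0.7], [1.0, 0.9]])
y = np.array([2.0, 5.0, 8.0, 11.0])

# Conjugate Bayesian linear regression: zero-mean Gaussian prior on the
# weights (precision alpha) and Gaussian noise (precision beta).
alpha, beta = 1.0, 2.0
S_inv = alpha * np.eye(X.shape[1]) + beta * X.T @ X   # posterior precision
S = np.linalg.inv(S_inv)                              # posterior covariance
m = beta * S @ X.T @ y                                # posterior mean of weights

# Predictive distribution for a new query; databases could be ranked by this mean.
x_new = np.array([1.0, 0.6])
pred_mean = x_new @ m
pred_var = 1.0 / beta + x_new @ S @ x_new
print(f"predicted relevant images: {pred_mean:.1f} +/- {pred_var ** 0.5:.1f}")
```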

  13. Development of an exposure measurement database on five lung carcinogens (ExpoSYN) for quantitative retrospective occupational exposure assessment.

    PubMed

    Peters, Susan; Vermeulen, Roel; Olsson, Ann; Van Gelder, Rainer; Kendzia, Benjamin; Vincent, Raymond; Savary, Barbara; Williams, Nick; Woldbæk, Torill; Lavoué, Jérôme; Cavallo, Domenico; Cattaneo, Andrea; Mirabelli, Dario; Plato, Nils; Dahmann, Dirk; Fevotte, Joelle; Pesch, Beate; Brüning, Thomas; Straif, Kurt; Kromhout, Hans

    2012-01-01

    SYNERGY is a large pooled analysis of case-control studies on the joint effects of occupational carcinogens and smoking in the development of lung cancer. A quantitative job-exposure matrix (JEM) will be developed to assign exposures to five major lung carcinogens [asbestos, chromium, nickel, polycyclic aromatic hydrocarbons (PAH), and respirable crystalline silica (RCS)]. We assembled an exposure database, called ExpoSYN, to enable such a quantitative exposure assessment. Existing exposure databases were identified and European and Canadian research institutes were approached to identify pertinent exposure measurement data. Results of individual air measurements were entered anonymized according to a standardized protocol. The ExpoSYN database currently includes 356 551 measurements from 19 countries. In total, 140 666 personal and 215 885 stationary data points were available. Measurements were distributed over the five agents as follows: RCS (42%), asbestos (20%), chromium (16%), nickel (15%), and PAH (7%). The measurement data cover the time period from 1951 to present. However, only a small portion of measurements (1.4%) were performed prior to 1975. The major contributing countries for personal measurements were Germany (32%), UK (22%), France (14%), and Norway and Canada (both 11%). ExpoSYN is a unique occupational exposure database with measurements from 18 European countries and Canada covering a time period of >50 years. This database will be used to develop a country-, job-, and time period-specific quantitative JEM. This JEM will enable data-driven quantitative exposure assessment in a multinational pooled analysis of community-based lung cancer case-control studies.

  14. Width of surface rupture zone for thrust earthquakes: implications for earthquake fault zoning

    NASA Astrophysics Data System (ADS)

    Boncio, Paolo; Liberi, Francesca; Caldarella, Martina; Nurminen, Fiia-Charlotta

    2018-01-01

    The criteria for zoning the surface fault rupture hazard (SFRH) along thrust faults are defined by analysing the characteristics of the areas of coseismic surface faulting in thrust earthquakes. Normal and strike-slip faults have been deeply studied by other authors concerning the SFRH, while thrust faults have not been studied with comparable attention. Surface faulting data were compiled for 11 well-studied historic thrust earthquakes that occurred globally (5.4 ≤ M ≤ 7.9). Several different types of coseismic fault scarps characterize the analysed earthquakes, depending on the topography, fault geometry and near-surface materials (simple and hanging wall collapse scarps, pressure ridges, fold scarps and thrust or pressure ridges with bending-moment or flexural-slip fault ruptures due to large-scale folding). For all the earthquakes, the distance of distributed ruptures from the principal fault rupture (r) and the width of the rupture zone (WRZ) were compiled directly from the literature or measured systematically in GIS-georeferenced published maps. Overall, surface ruptures can occur up to large distances from the main fault (~2150 m on the footwall and ~3100 m on the hanging wall). Most of the ruptures occur on the hanging wall, preferentially in the vicinity of the principal fault trace (>~50% at distances <~250 m). The widest WRZs are recorded where sympathetic slip (Sy) on distant faults occurs, and/or where bending-moment (B-M) or flexural-slip (F-S) fault ruptures, associated with large-scale folds (hundreds of metres to kilometres in wavelength), are present. A positive relation between the earthquake magnitude and the total WRZ is evident, while a clear correlation between the vertical displacement on the principal fault and the total WRZ is not found. The distribution of surface ruptures is fitted with probability density functions, in order to define a criterion to remove outliers (e.g. 90% probability of the cumulative distribution function) and define the zone where the likelihood of having surface ruptures is the highest. This might help in sizing the zones of SFRH during seismic microzonation (SM) mapping. In order to shape zones of SFRH, a very detailed earthquake geologic study of the fault is necessary (the highest level of SM, i.e. Level 3 SM according to Italian guidelines). In the absence of such a very detailed study (basic SM, i.e. Level 1 SM of Italian guidelines) a width of ~840 m (90% probability from the "simple thrust" database of distributed ruptures, excluding B-M, F-S and Sy fault ruptures) is suggested to be sufficiently precautionary. For more detailed SM, where the fault is carefully mapped, one must consider that the highest SFRH is concentrated in a narrow zone, ~60 m in width, that should be considered as a fault avoidance zone (more than one-third of the distributed ruptures are expected to occur within this zone). The fault rupture hazard zones should be asymmetric relative to the trace of the principal fault. The average footwall to hanging wall ratio (FW : HW) is close to 1 : 2 in all analysed cases. These criteria are applicable to "simple thrust" faults, without considering possible B-M or F-S fault ruptures due to large-scale folding, and without considering sympathetic slip on distant faults.
Areas potentially susceptible to B-M or F-S fault ruptures should have their own zones of fault rupture hazard that can be defined by detailed knowledge of the structural setting of the area (shape, wavelength, tightness and lithology of the thrust-related large-scale folds) and by geomorphic evidence of past secondary faulting. Distant active faults, potentially susceptible to sympathetic triggering, should be zoned as separate principal faults. The entire database of distributed ruptures (including B-M, F-S and Sy fault ruptures) can be useful in poorly known areas, in order to assess the extent of the area within which potential sources of fault displacement hazard can be present. The results from this study and the database made available in the Supplement can be used for improving the attenuation relationships for distributed faulting, with possible applications in probabilistic studies of fault displacement hazard.
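
    The percentile-based zoning step described above can be sketched as follows: fit a candidate probability density function to the distances of distributed ruptures from the principal fault and take the 90% point of its cumulative distribution as a zone width. The synthetic distances and the choice of a gamma distribution are assumptions for illustration, not the distribution adopted in the study.

```python
import numpy as np
from scipy import stats

# Hypothetical distances (m) of distributed ruptures from the principal fault
# trace on the hanging wall (stand-in for the "simple thrust" dataset).
rng = np.random.default_rng(2)
distances = rng.gamma(shape=0.8, scale=300.0, size=400)

# Fit a candidate probability density function and take the distance below
# which 90% of distributed ruptures fall, as one way to size an SFRH zone.
shape, loc, scale = stats.gamma.fit(distances, floc=0)
d90 = stats.gamma.ppf(0.90, shape, loc=loc, scale=scale)
print(f"90% of distributed ruptures within ~{d90:.0f} m of the principal fault")
```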

  15. An evaluation of multi-probe locality sensitive hashing for computing similarities over web-scale query logs

    PubMed Central

    2018-01-01

    Many modern applications of AI such as web search, mobile browsing, image processing, and natural language processing rely on finding similar items from a large database of complex objects. Due to the very large scale of data involved (e.g., users’ queries from commercial search engines), computing such near or nearest neighbors is a non-trivial task, as the computational cost grows significantly with the number of items. To address this challenge, we adopt Locality Sensitive Hashing (a.k.a. LSH) methods and evaluate four variants in a distributed computing environment (specifically, Hadoop). We identify several optimizations which improve performance, suitable for deployment in very large scale settings. The experimental results demonstrate that our variants of LSH achieve robust performance with better recall than “vanilla” LSH, even when using the same amount of space. PMID:29346410
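
    A minimal sign-random-projection LSH sketch (one hash table, no multi-probe) illustrates why this is cheaper than a full scan: items are bucketed by the signs of their projections onto random hyperplanes, and only the query's bucket is examined. The dimensions, bit count, and data are illustrative; the paper's Hadoop-based variants and optimizations are not reproduced here.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(3)
dim, n_bits = 64, 16
planes = rng.normal(size=(n_bits, dim))   # random hyperplanes define one hash table

def signature(v: np.ndarray) -> tuple:
    """Sign of the projection on each hyperplane; nearby vectors tend to collide."""
    return tuple((planes @ v > 0).astype(int))

# Index a toy "query log" of embedding vectors into hash buckets.
vectors = rng.normal(size=(10_000, dim))
buckets = defaultdict(list)
for idx, v in enumerate(vectors):
    buckets[signature(v)].append(idx)

# Candidate neighbours of a new query are the items sharing its bucket;
# multi-probe variants would also inspect buckets differing in a few bits.
query = vectors[0] + 0.05 * rng.normal(size=dim)
candidates = buckets[signature(query)]
print(f"{len(candidates)} candidates instead of scanning all {len(vectors)} items")
```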

  16. Using SQL Databases for Sequence Similarity Searching and Analysis.

    PubMed

    Pearson, William R; Mackey, Aaron J

    2017-09-13

    Relational databases can integrate diverse types of information and manage large sets of similarity search results, greatly simplifying genome-scale analyses. By focusing on taxonomic subsets of sequences, relational databases can reduce the size and redundancy of sequence libraries and improve the statistical significance of homologs. In addition, by loading similarity search results into a relational database, it becomes possible to explore and summarize the relationships between all of the proteins in an organism and those in other biological kingdoms. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It also describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. The unit also introduces search_demo, a database that stores sequence similarity search results. The search_demo database is then used to explore the evolutionary relationships between E. coli proteins and proteins in other organisms in a large-scale comparative genomic analysis. © 2017 by John Wiley & Sons, Inc.
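
    The pattern of loading similarity-search hits into a relational store and summarizing them with SQL can be sketched with Python's built-in sqlite3. The schema, accessions, and thresholds below are illustrative and are not the actual seqdb_demo/search_demo schema from the unit.

```python
import sqlite3

# Minimal stand-in for a search_demo-style table of similarity-search hits.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE hits (
    query_acc TEXT, subject_acc TEXT, subject_taxon TEXT,
    percent_id REAL, evalue REAL)""")
con.executemany(
    "INSERT INTO hits VALUES (?, ?, ?, ?, ?)",
    [("P0A7G6", "Q9X0X0", "Thermotoga", 41.2, 1e-30),
     ("P0A7G6", "P0A7G7", "Salmonella", 98.5, 1e-160),
     ("P0AB80", "Q8ZP12", "Salmonella", 87.0, 1e-120)],
)

# How many significant homologs does each query protein have per taxon?
rows = con.execute("""
    SELECT query_acc, subject_taxon, COUNT(*) AS n_hits, MIN(evalue) AS best_e
    FROM hits WHERE evalue < 1e-10
    GROUP BY query_acc, subject_taxon
    ORDER BY query_acc""").fetchall()
for row in rows:
    print(row)
```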

  17. National Map Data Base On Landslide Prerequisites In Clay and Silt Areas - Development of Prototype

    NASA Astrophysics Data System (ADS)

    Viberg, Leif

    The Swedish Geotechnical Institute (SGI) has, in co-operation with the Swedish Geological Survey, Lantmateriet (land surveying) and the Swedish Rescue Service, developed a theme database on landslide prerequisites in clay and silt areas. The work was carried out on commission of the Swedish government, and a report with suggestions for production of the database has been delivered to the government. The database is a prototype that has been tested in an area in northern Sweden. The recommended presentation map scale is about 1:50 000, and distribution of the database via the Internet is discussed. The aim is to use the database as a modern planning tool in combination with other databases, e.g. databases on flooding prognoses. The main use is expected to be in early planning stages, e.g. for new building and infrastructure development and for risk analyses. The database can also be used in more acute cases, e.g. for risk analyses and rescue operations in connection with flooding over large areas. The intended users are municipal and county planners and rescue services, infrastructure planners, consultants and insurance companies. The database is constructed by combining two existing databases: elevation data and soil map data. The investigation area is divided into three zones with different stability criteria: 1. clay and silt in sloping ground or adjoining water; 2. clay and silt in flat ground; 3. rock and soils other than clay and silt. The geometric and soil criteria for the zones are specified in an algorithm that sorts the area into the different zones, using data from the elevation and soil databases. The investigation area is divided into cells (raster format) with a 5 x 5 m side length, and several algorithms had to be developed before a reasonable calculation time was reached. The theme may be presented on screen or as a map plot, and a prototype map with an accompanying description has been produced for the test area. The database is suggested to be produced for landslide-prone areas in Sweden, requiring approximately 200-300 map sheets (25 x 25 km).
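
    The zoning step, combining a soil raster with slope derived from an elevation raster to classify 5 x 5 m cells into the three stability-criteria zones, can be sketched with NumPy. The soil codes, slope threshold, and toy grids are assumptions for illustration; they are not SGI's actual criteria (which also consider proximity to water).

```python
import numpy as np

# Toy 5 m-resolution rasters: soil class codes and elevation (m).
CLAY, SILT, ROCK = 1, 2, 3
soil = np.array([[CLAY, CLAY, ROCK],
                 [SILT, CLAY, ROCK],
                 [SILT, SILT, ROCK]])
elev = np.array([[12.0, 11.0, 30.0],
                 [ 8.0,  6.0, 28.0],
                 [ 5.0,  5.0, 25.0]])
cell = 5.0  # cell side length in metres

# Approximate slope angle from elevation gradients.
dzdy, dzdx = np.gradient(elev, cell)
slope_deg = np.degrees(np.arctan(np.hypot(dzdx, dzdy)))

fine_grained = np.isin(soil, (CLAY, SILT))
zone = np.full(soil.shape, 3)                 # 3: rock and other soils
zone[fine_grained] = 2                        # 2: clay or silt, flat ground
zone[fine_grained & (slope_deg > 10)] = 1     # 1: clay or silt, sloping ground
print(zone)
```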

  18. The Make 2D-DB II package: conversion of federated two-dimensional gel electrophoresis databases into a relational format and interconnection of distributed databases.

    PubMed

    Mostaguir, Khaled; Hoogland, Christine; Binz, Pierre-Alain; Appel, Ron D

    2003-08-01

    The Make 2D-DB tool has been previously developed to help build federated two-dimensional gel electrophoresis (2-DE) databases on one's own web site. The purpose of our work is to extend the strength of the first package and to build a more efficient environment. Such an environment should be able to fulfill the different needs and requirements arising from both the growing use of 2-DE techniques and the increasing amount of distributed experimental data.

  19. Establishment of an international database for genetic variants in esophageal cancer.

    PubMed

    Vihinen, Mauno

    2016-10-01

    The establishment of a database has been suggested in order to collect, organize, and distribute genetic information about esophageal cancer. The World Organization for Specialized Studies on Diseases of the Esophagus and the Human Variome Project will be in charge of a central database of information about esophageal cancer-related variations from publications, databases, and laboratories; in addition to genetic details, clinical parameters will also be included. The aim will be to get all the central players in research, clinical, and commercial laboratories to contribute. The database will follow established recommendations and guidelines. The database will require a team of dedicated curators with different backgrounds. Numerous layers of systematics will be applied to facilitate computational analyses. The data items will be extensively integrated with other information sources. The database will be distributed as open access to ensure exchange of the data with other databases. Variations will be reported in relation to reference sequences on three levels (DNA, RNA, and protein) whenever applicable. In the first phase, the database will concentrate on genetic variations including both somatic and germline variations for susceptibility genes. Additional types of information can be integrated at a later stage. © 2016 New York Academy of Sciences.

  20. ICA model order selection of task co-activation networks.

    PubMed

    Ray, Kimberly L; McKay, D Reese; Fox, Peter M; Riedel, Michael C; Uecker, Angela M; Beckmann, Christian F; Smith, Stephen M; Fox, Peter T; Laird, Angela R

    2013-01-01

    Independent component analysis (ICA) has become a widely used method for extracting functional networks in the brain during rest and task. Historically, preferred ICA dimensionality has widely varied within the neuroimaging community, but typically varies between 20 and 100 components. This can be problematic when comparing results across multiple studies because of the impact ICA dimensionality has on the topology of its resultant components. Recent studies have demonstrated that ICA can be applied to peak activation coordinates archived in a large neuroimaging database (i.e., BrainMap Database) to yield whole-brain task-based co-activation networks. A strength of applying ICA to BrainMap data is that the vast amount of metadata in BrainMap can be used to quantitatively assess tasks and cognitive processes contributing to each component. In this study, we investigated the effect of model order on the distribution of functional properties across networks as a method for identifying the most informative decompositions of BrainMap-based ICA components. Our findings suggest dimensionality of 20 for low model order ICA to examine large-scale brain networks, and dimensionality of 70 to provide insight into how large-scale networks fractionate into sub-networks. We also provide a functional and organizational assessment of visual, motor, emotion, and interoceptive task co-activation networks as they fractionate from low to high model-orders.

  1. ICA model order selection of task co-activation networks

    PubMed Central

    Ray, Kimberly L.; McKay, D. Reese; Fox, Peter M.; Riedel, Michael C.; Uecker, Angela M.; Beckmann, Christian F.; Smith, Stephen M.; Fox, Peter T.; Laird, Angela R.

    2013-01-01

    Independent component analysis (ICA) has become a widely used method for extracting functional networks in the brain during rest and task. Historically, preferred ICA dimensionality has widely varied within the neuroimaging community, but typically varies between 20 and 100 components. This can be problematic when comparing results across multiple studies because of the impact ICA dimensionality has on the topology of its resultant components. Recent studies have demonstrated that ICA can be applied to peak activation coordinates archived in a large neuroimaging database (i.e., BrainMap Database) to yield whole-brain task-based co-activation networks. A strength of applying ICA to BrainMap data is that the vast amount of metadata in BrainMap can be used to quantitatively assess tasks and cognitive processes contributing to each component. In this study, we investigated the effect of model order on the distribution of functional properties across networks as a method for identifying the most informative decompositions of BrainMap-based ICA components. Our findings suggest dimensionality of 20 for low model order ICA to examine large-scale brain networks, and dimensionality of 70 to provide insight into how large-scale networks fractionate into sub-networks. We also provide a functional and organizational assessment of visual, motor, emotion, and interoceptive task co-activation networks as they fractionate from low to high model-orders. PMID:24339802

  2. The VIMOS Ultra Deep Survey first data release: Spectra and spectroscopic redshifts of 698 objects up to zspec ~ 6 in CANDELS

    NASA Astrophysics Data System (ADS)

    Tasca, L. A. M.; Le Fèvre, O.; Ribeiro, B.; Thomas, R.; Moreau, C.; Cassata, P.; Garilli, B.; Le Brun, V.; Lemaux, B. C.; Maccagni, D.; Pentericci, L.; Schaerer, D.; Vanzella, E.; Zamorani, G.; Zucca, E.; Amorin, R.; Bardelli, S.; Cassarà, L. P.; Castellano, M.; Cimatti, A.; Cucciati, O.; Durkalec, A.; Fontana, A.; Giavalisco, M.; Grazian, A.; Hathi, N. P.; Ilbert, O.; Paltani, S.; Pforr, J.; Scodeggio, M.; Sommariva, V.; Talia, M.; Tresse, L.; Vergani, D.; Capak, P.; Charlot, S.; Contini, T.; de la Torre, S.; Dunlop, J.; Fotopoulou, S.; Guaita, L.; Koekemoer, A.; López-Sanjuan, C.; Mellier, Y.; Salvato, M.; Scoville, N.; Taniguchi, Y.; Wang, P. W.

    2017-04-01

    This paper describes the first data release (DR1) of the VIMOS Ultra Deep Survey (VUDS). The VUDS-DR1 is the release of all low-resolution spectroscopic data obtained in 276.9 arcmin2 of the CANDELS-COSMOS and CANDELS-ECDFS survey areas, including accurate spectroscopic redshifts zspec and individual spectra obtained with VIMOS on the ESO-VLT. A total of 698 objects have a measured redshift, with 677 galaxies, two type-I AGN, and a small number of 19 contaminating stars. The targets of the spectroscopic survey are selected primarily on the basis of their photometric redshifts to ensure a broad population coverage. About 500 galaxies have zspec > 2, 48 of which have zspec > 4; the highest reliable redshifts reach beyond zspec = 6. This data set approximately doubles the number of galaxies with spectroscopic redshifts at z > 3 in these fields. We discuss the general properties of the VUDS-DR1 sample in terms of the spectroscopic redshift distribution, the distribution of Lyman-α equivalent widths, and physical properties including stellar masses M⋆ and star formation rates derived from spectral energy distribution fitting with the knowledge of zspec. We highlight the properties of the most massive star-forming galaxies, noting the wide range in spectral properties, with Lyman-α in emission or in absorption, and in imaging properties with compact, multi-component, or pair morphologies. We present the catalogue database and data products. All VUDS-DR1 data are publicly available and can be retrieved from a dedicated query-based database. Future VUDS data releases will follow this VUDS-DR1 to give access to the spectra and associated measurements of 8000 objects in the full 1 square degree of the VUDS survey. Based on data obtained with the European Southern Observatory Very Large Telescope, Paranal, Chile, under Large Program 185.A-0791. http://cesam.lam.fr/vuds

  3. The use of Benford's law for evaluation of quality of occupational hygiene data.

    PubMed

    De Vocht, Frank; Kromhout, Hans

    2013-04-01

    Benford's law is the counter-intuitive empirical observation that the digits 1-9 are not equally likely to appear as the initial digit in numbers resulting from the same phenomenon. Manipulated, unrelated, or created numbers usually do not follow Benford's law, and as such this law has been used in the investigation of fraudulent data in, for example, accounting and to identify errors in data sets due to, for example, data transfer. We describe the use of Benford's law to screen occupational hygiene measurement data sets using exposure data from the European rubber manufacturing industry as an illustration. Two rubber process dust measurement data sets added to the European Union ExAsRub project but initially collected by the UK Health and Safety Executive (HSE) and British Rubber Manufacturers' Association (BRMA) and one pre- and one post-treatment n-nitrosamines data set collated in the German MEGA database and also added to the ExAsRub database were compared with the expected first-digit (1BL) and second-digit (2BL) Benford distributions. Evaluation indicated only small deviations from the expected 1BL and 2BL distributions for the data sets collated by the UK HSE and industry (BRMA), respectively, while for the MEGA data larger deviations were observed. To a large extent the latter could be attributed to imputation and replacement by a constant of n-nitrosamine measurements below the limit of detection, but further evaluation of these data to determine why other deviations from 1BL and 2BL expected distributions exist may be beneficial. Benford's law is a straightforward and easy-to-implement analytical tool to evaluate the quality of occupational hygiene data sets, and as such can be used to detect potential problems in large data sets that may be caused by ill-intentioned a priori or a posteriori manipulation of data sets and by issues like treatment of observations below the limit of detection, rounding and transfer of data.
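
    A minimal first-digit (1BL) screen can be written in a few lines: tabulate leading digits, compare them with the log10(1 + 1/d) expectation, and run a chi-square goodness-of-fit test. The synthetic "measurements" below are only meant to illustrate the mechanics, not to reproduce the ExAsRub or MEGA analyses.

```python
import numpy as np
from scipy import stats

def first_digits(values):
    """Leading non-zero digit of each value (values assumed positive)."""
    return np.array([int(str(v).lstrip("0.")[0]) for v in values])

digits = np.arange(1, 10)
expected_p = np.log10(1 + 1 / digits)   # Benford first-digit probabilities

# Synthetic measurements spanning several orders of magnitude (roughly Benford-like).
rng = np.random.default_rng(4)
measurements = 10 ** rng.uniform(-1, 3, size=2_000)

fd = first_digits(measurements)
observed = np.array([(fd == d).sum() for d in digits])
chi2, p = stats.chisquare(observed, f_exp=expected_p * observed.sum())
print(dict(zip(digits.tolist(), observed.tolist())))
print(f"chi-square = {chi2:.1f}, p = {p:.3f}")
```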

  4. Design and deployment of a large brain-image database for clinical and nonclinical research

    NASA Astrophysics Data System (ADS)

    Yang, Guo Liang; Lim, Choie Cheio Tchoyoson; Banukumar, Narayanaswami; Aziz, Aamer; Hui, Francis; Nowinski, Wieslaw L.

    2004-04-01

    An efficient database is an essential component for organizing diverse image metadata and patient information for research in medical imaging. This paper describes the design, development and deployment of a large database system serving as a brain image repository that can be used across different platforms in various medical research projects. It forms the infrastructure that links hospitals and institutions together and shares data among them. The database contains patient-, pathology-, image-, research- and management-specific data. The functionalities of the database system include image uploading, storage, indexing, downloading and sharing, as well as database querying and management, with security and data anonymization concerns well taken care of. The database has a multi-tier client-server architecture comprising a Relational Database Management System, a Security Layer, an Application Layer and a User Interface. An image source adapter has been developed to handle most of the popular image formats. The database has a user interface based on web browsers and is easy to use. We used the Java programming language for its platform independence and vast function libraries. The brain image database can sort data according to clinically relevant information, which can be used effectively in research from the clinicians' point of view. The database is suitable for validation of algorithms on large populations of cases, and medical images for processing can be identified and organized based on information in image metadata. Clinical research in various pathologies can thus be performed with greater efficiency, and large image repositories can be managed more effectively. A prototype of the system has been installed in a few hospitals and is working to the satisfaction of the clinicians.

  5. A design for the geoinformatics system

    NASA Astrophysics Data System (ADS)

    Allison, M. L.

    2002-12-01

    Informatics integrates and applies information technologies with scientific and technical disciplines. A geoinformatics system targets the spatially based sciences. The system is not a master database, but will collect pertinent information from disparate databases distributed around the world. Seamless interoperability of databases promises quantum leaps in productivity not only for scientific researchers but also for many areas of society including business and government. The system will incorporate: acquisition of analog and digital legacy data; efficient information and data retrieval mechanisms (via data mining and web services); accessibility to and application of visualization, analysis, and modeling capabilities; online workspace, software, and tutorials; GIS; integration with online scientific journal aggregates and digital libraries; access to real time data collection and dissemination; user-defined automatic notification and quality control filtering for selection of new resources; and application to field techniques such as mapping. In practical terms, such a system will provide the ability to gather data over the Web from a variety of distributed sources, regardless of computer operating systems, database formats, and servers. Search engines will gather data about any geographic location, above, on, or below ground, covering any geologic time, and at any scale or detail. A distributed network of digital geolibraries can archive permanent copies of databases at risk of being discontinued and those that continue to be maintained by the data authors. The geoinformatics system will generate results from widely distributed sources to function as a dynamic data network. Instead of posting a variety of pre-made tables, charts, or maps based on static databases, the interactive dynamic system creates these products on the fly, each time an inquiry is made, using the latest information in the appropriate databases. Thus, in the dynamic system, a map generated today may differ from one created yesterday and one to be created tomorrow, because the databases used to make it are constantly (and sometimes automatically) being updated.

  6. Peer-to-peer architecture for multi-departmental distributed PACS

    NASA Astrophysics Data System (ADS)

    Rosset, Antoine; Heuberger, Joris; Pysher, Lance; Ratib, Osman

    2006-03-01

    We elected to explore peer-to-peer technology as an alternative to a centralized PACS architecture, given the increasing requirements for wide access to images inside and outside a radiology department. The goal is to allow users across the enterprise to access any study at any time without the need for prefetching or routing of images from a central archive. Images can be accessed between different workstations and local storage nodes. We implemented "Bonjour", a remote file access technology developed by Apple that allows applications to share data and files remotely with optimized data access and data transfer. Our open-source image display platform, OsiriX, was adapted to share local DICOM images by making each workstation's local SQL database directly accessible from any other OsiriX workstation over the network. A server version of the OsiriX Core Data database also allows access to distributed archive servers in the same way. The implemented infrastructure allows fast and efficient access to any image, anywhere, anytime, independently of the actual physical location of the data. It also benefits from the performance of distributed low-cost, high-capacity storage servers that can provide efficient caching of PACS data, which was found to be 10 to 20 times faster than accessing the same data from the central PACS archive. It is particularly suitable for large hospitals and academic environments where clinical conferences, interdisciplinary discussions and successive sessions of image processing are often part of complex workflows for patient management and decision making.
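
    The discovery side of such a peer-to-peer setup can be illustrated with the python-zeroconf library, which implements the same mDNS/DNS-SD protocol that Bonjour uses: each workstation advertises its local image database as a service, and peers browse for that service type instead of being configured with a central server address. The service type, name, address, port, and TXT properties below are invented for illustration and are not the ones OsiriX actually registers.

```python
import socket
from zeroconf import ServiceInfo, Zeroconf  # pip install zeroconf

# Advertise this workstation's local DICOM index as an mDNS/DNS-SD service so
# that peers on the LAN can discover it without central configuration.
info = ServiceInfo(
    type_="_dicomdb._tcp.local.",
    name="Workstation-A._dicomdb._tcp.local.",
    addresses=[socket.inet_aton("192.168.1.23")],
    port=8780,
    properties={"modality": "ALL", "readonly": "0"},
)

zc = Zeroconf()
zc.register_service(info)  # browsers of "_dicomdb._tcp.local." now see this node
try:
    input("Serving local DICOM index; press Enter to stop.\n")
finally:
    zc.unregister_service(info)
    zc.close()
```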

  7. Monitoring performance of a highly distributed and complex computing infrastructure in LHCb

    NASA Astrophysics Data System (ADS)

    Mathe, Z.; Haen, C.; Stagni, F.

    2017-10-01

    In order to ensure optimal performance of the LHCb Distributed Computing, based on LHCbDIRAC, it is necessary to be able to inspect the behavior over time of many components: not only the agents and services on which the infrastructure is built, but also all the computing tasks and data transfers that are managed by this infrastructure. This consists of recording and then analyzing time series of a large number of observables, for which the usage of SQL relational databases is far from optimal. Therefore, within DIRAC we have been studying novel possibilities based on NoSQL databases (ElasticSearch, OpenTSDB and InfluxDB); as a result of this study we developed a new monitoring system based on ElasticSearch. It has been deployed on the LHCb Distributed Computing infrastructure, for which it collects data from all the components (agents, services, jobs) and allows creating reports through Kibana and a web user interface based on the DIRAC web framework. In this paper we describe this new implementation of the DIRAC monitoring system. We give details on the ElasticSearch implementation within the general DIRAC framework, as well as an overview of the advantages of the pipeline aggregation used for creating a dynamic bucketing of the time series. We present the advantages of using the ElasticSearch DSL high-level library for creating and running queries. Finally, we present the performance of the system.
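
    The kind of query the abstract refers to, bucketing job records into a time series with a date_histogram aggregation via the high-level DSL, can be sketched as follows. The index name, field names, and filter values are illustrative and do not reflect LHCbDIRAC's actual schema.

```python
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search

client = Elasticsearch("http://localhost:9200")

# Hourly buckets of running-job records, with an average-CPU metric per bucket.
s = Search(using=client, index="lhcb-jobs").filter("term", Status="Running").extra(size=0)
s.aggs.bucket("per_hour", "date_histogram", field="timestamp", fixed_interval="1h") \
      .metric("avg_cpu", "avg", field="CPUTime")

response = s.execute()
for bucket in response.aggregations.per_hour.buckets:
    print(bucket.key_as_string, bucket.doc_count, bucket.avg_cpu.value)
```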

  8. Cyclic subway networks are less risky in metropolises

    NASA Astrophysics Data System (ADS)

    Xiao, Ying; Zhang, Hai-Tao; Xu, Bowen; Zhu, Tao; Chen, Guanrong; Chen, Duxin

    2018-02-01

    Subways are crucial in the modern transportation systems of metropolises. To quantitatively evaluate the potential risks that subway networks face from natural disasters or deliberate attacks, real data from seven Chinese subway systems are collected and their population distributions and anti-risk capabilities are analyzed. Counterintuitively, it is found that when subway networks are attacked, transfer stations with large numbers of connections are not the most crucial; rather, the stations and lines with large betweenness centrality are essential. It is also found that cycles reduce such correlations due to the existence of alternative paths. To simulate the data-based observations, a network model is proposed to characterize the dynamics of subway systems under various intensities of attacks on stations and lines. This study sheds some light on risk assessment of subway networks in metropolitan cities.
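
    Betweenness centrality, the quantity the study finds most informative, is straightforward to compute with NetworkX; the toy graph below (a loop line plus a dead-end branch) also shows how a cycle provides alternative paths while the branch does not. Station names and topology are invented.

```python
import networkx as nx

# Tiny illustrative network: a loop line plus a branch.
G = nx.Graph()
G.add_edges_from([
    ("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"),   # a cycle (loop line)
    ("C", "E"), ("E", "F"),                           # a branch with no alternative path
])

bc_nodes = nx.betweenness_centrality(G)
bc_edges = nx.edge_betweenness_centrality(G)

# Stations/edges with high betweenness carry many shortest paths, so removing
# them fragments the network more than removing a merely well-connected hub.
print(max(bc_nodes, key=bc_nodes.get), max(bc_edges, key=bc_edges.get))
```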

  9. Online Cross-Validation-Based Ensemble Learning

    PubMed Central

    Benkeser, David; Ju, Cheng; Lendle, Sam; van der Laan, Mark

    2017-01-01

    Online estimators update a current estimate with a new incoming batch of data without having to revisit past data, thereby providing streaming estimates that are scalable to big data. We develop flexible, ensemble-based online estimators of an infinite-dimensional target parameter, such as a regression function, in the setting where data are generated sequentially by a common conditional data distribution given summary measures of the past. This setting encompasses a wide range of time-series models and, as a special case, models for independent and identically distributed data. Our estimator considers a large library of candidate online estimators and uses online cross-validation to identify the algorithm with the best performance. We show that by basing estimates on the cross-validation-selected algorithm, we are asymptotically guaranteed to perform as well as the true, unknown best-performing algorithm. We provide extensions of this approach, including online estimation of the optimal ensemble of candidate online estimators. We illustrate excellent performance of our methods using simulations and a real data example where we make streaming predictions of infectious disease incidence using data from a large database. PMID:28474419
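
    The core selection idea, score each candidate online learner on every new batch before training on it and then favor the candidate with the lowest cumulative validation loss, can be sketched as follows. The two candidate learners, the loss, and the data-generating process are all invented for illustration; this is only the discrete selection step, not the authors' full ensemble estimator or its theoretical guarantees.

```python
import numpy as np

class OnlineMean:
    """Trivial candidate: predicts the running mean of y."""
    def __init__(self):
        self.n, self.mean = 0, 0.0
    def predict(self, x):
        return np.full(len(x), self.mean)
    def update(self, x, y):
        for yi in y:
            self.n += 1
            self.mean += (yi - self.mean) / self.n

class OnlineSGDLinear:
    """One-feature linear model updated by stochastic gradient descent."""
    def __init__(self, lr=0.05):
        self.w, self.b, self.lr = 0.0, 0.0, lr
    def predict(self, x):
        return self.w * x + self.b
    def update(self, x, y):
        for xi, yi in zip(x, y):
            err = self.w * xi + self.b - yi
            self.w -= self.lr * err * xi
            self.b -= self.lr * err

candidates = {"mean": OnlineMean(), "sgd": OnlineSGDLinear()}
cum_loss = {name: 0.0 for name in candidates}

rng = np.random.default_rng(5)
for _ in range(50):                                   # stream of batches
    x = rng.uniform(0, 1, size=20)
    y = 2.0 * x + 0.5 + rng.normal(0, 0.1, size=20)
    for name, model in candidates.items():
        cum_loss[name] += np.mean((model.predict(x) - y) ** 2)  # validate first...
        model.update(x, y)                                      # ...then train
selected = min(cum_loss, key=cum_loss.get)
print("selected:", selected, {k: round(v, 3) for k, v in cum_loss.items()})
```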

  10. A distributed computing environment with support for constraint-based task scheduling and scientific experimentation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ahrens, J.P.; Shapiro, L.G.; Tanimoto, S.L.

    1997-04-01

    This paper describes a computing environment which supports computer-based scientific research work. Key features include support for automatic distributed scheduling and execution and computer-based scientific experimentation. A new flexible and extensible scheduling technique that is responsive to a user's scheduling constraints, such as the ordering of program results and the specification of task assignments and processor utilization levels, is presented. An easy-to-use constraint language for specifying scheduling constraints, based on the relational database query language SQL, is described along with a search-based algorithm for fulfilling these constraints. A set of performance studies show that the environment can schedule and execute program graphs on a network of workstations as the user requests. A method for automatically generating computer-based scientific experiments is described. Experiments provide a concise method of specifying a large collection of parameterized program executions. The environment achieved significant speedups when executing experiments; for a large collection of scientific experiments an average speedup of 3.4 on an average of 5.5 scheduled processors was obtained.

  11. Climate-driven geographic distribution of the desert locust during recession periods: Subspecies' niche differentiation and relative risks under scenarios of climate change.

    PubMed

    Meynard, Christine N; Gay, Pierre-Emmanuel; Lecoq, Michel; Foucart, Antoine; Piou, Cyril; Chapuis, Marie-Pierre

    2017-11-01

    The desert locust is an agricultural pest that is able to switch from a harmless solitarious stage, during recession periods, to swarms of gregarious individuals that disperse long distances and affect areas from western Africa to India during outbreak periods. Large outbreaks have been recorded through centuries, and the Food and Agriculture Organization keeps a long-term, large-scale monitoring survey database in the area. However, there is also a much less known subspecies that occupies a limited area in Southern Africa. We used large-scale climatic and occurrence data of the solitarious phase of each subspecies during recession periods to understand whether both subspecies climatic niches differ from each other, what is the current potential geographical distribution of each subspecies, and how climate change is likely to shift their potential distribution with respect to current conditions. We evaluated whether subspecies are significantly specialized along available climate gradients by using null models of background climatic differences within and between southern and northern ranges and applying niche similarity and niche equivalency tests. The results point to climatic niche conservatism between the two clades. We complemented this analysis with species distribution modeling to characterize current solitarious distributions and forecast potential recession range shifts under two extreme climate change scenarios at the 2050 and 2090 time horizon. Projections suggest that, at a global scale, the northern clade could contract its solitarious recession range, while the southern clade is likely to expand its recession range. However, local expansions were also predicted in the northern clade, in particular in southern and northern margins of the current geographical distribution. In conclusion, monitoring and management practices should remain in place in northern Africa, while in Southern Africa the potential for the subspecies to pose a threat in the future should be investigated more closely. © 2017 John Wiley & Sons Ltd.

  12. Database of tsunami scenario simulations for Western Iberia: a tool for the TRIDEC Project Decision Support System for tsunami early warning

    NASA Astrophysics Data System (ADS)

    Armigliato, Alberto; Pagnoni, Gianluca; Zaniboni, Filippo; Tinti, Stefano

    2013-04-01

    TRIDEC is an EU-FP7 Project whose main goal is, in general terms, to develop suitable strategies for the management of crises possibly arising in the Earth management field. The general paradigms adopted by TRIDEC to develop those strategies include intelligent information management, the capability of managing dynamically increasing volumes and dimensionality of information in complex events, and collaborative decision making in systems that are typically very loosely coupled. The two areas where TRIDEC applies and tests its strategies are tsunami early warning and industrial subsurface development. In the field of tsunami early warning, TRIDEC aims at developing a Decision Support System (DSS) that integrates 1) a set of seismic, geodetic and marine sensors devoted to the detection and characterisation of possible tsunamigenic sources and to monitoring the time and space evolution of the generated tsunami, 2) large-volume databases of pre-computed numerical tsunami scenarios, 3) a proper overall system architecture. Two test areas are dealt with in TRIDEC: the western Iberian margin and the eastern Mediterranean. In this study, we focus on the western Iberian margin with special emphasis on the Portuguese coasts. The strategy adopted in TRIDEC plans to populate two different databases, called "Virtual Scenario Database" (VSDB) and "Matching Scenario Database" (MSDB), both of which deal only with earthquake-generated tsunamis. In the VSDB we numerically simulate a few large-magnitude events generated by the major known tectonic structures in the study area. Heterogeneous slip distributions on the earthquake faults are introduced to simulate events as "realistically" as possible. The members of the VSDB represent the unknowns that the TRIDEC platform must be able to recognise and match during the early crisis management phase. On the other hand, the MSDB contains a very large number (order of thousands) of tsunami simulations performed starting from many different simple earthquake sources of different magnitudes and located in the "vicinity" of the virtual scenario earthquake. From the DSS perspective, the members of the MSDB have to be suitably combined based on the information coming from the sensor networks, and the results are used during the crisis evolution phase to forecast the degree of exposure of different coastal areas. We provide examples from both databases, whose members are computed by means of the in-house software called UBO-TSUFD, implementing the non-linear shallow-water equations and solving them over a set of nested grids that guarantee a suitable spatial resolution (a few tens of meters) in specific, suitably chosen, coastal areas.

  13. Random vs. systematic sampling from administrative databases involving human subjects.

    PubMed

    Hagino, C; Lo, R J

    1998-09-01

    Two sampling techniques, simple random sampling (SRS) and systematic sampling (SS), were compared to determine whether they yield similar and accurate distributions for the following four factors: age, gender, geographic location and years in practice. Any point estimate within 7 yr or 7 percentage points of its reference standard (SRS or the entire data set, i.e., the target population) was considered "acceptably similar" to the reference standard. The sampling frame was the entire membership database of the Canadian Chiropractic Association. The two sampling methods were tested using eight different sample sizes of n (50, 100, 150, 200, 250, 300, 500, 800). From the profile/characteristic summaries of the four known factors [gender, average age, number (%) of chiropractors in each province and years in practice], between- and within-method chi-square tests and unpaired t-tests were performed to determine whether any of the differences [descriptively greater than 7% or 7 yr] were also statistically significant. The strengths of the agreements between the provincial distributions were quantified by calculating the percent agreements for each (provincial pairwise-comparison methods). Any percent agreement less than 70% was judged to be unacceptable. Our assessments of the two sampling methods (SRS and SS) for the different sample sizes tested suggest that SRS and SS yielded acceptably similar results. Both methods started to yield "correct" sample profiles at approximately the same sample size (n > 200). SS is not only convenient, it can be recommended for sampling from large databases in which the data are listed without any inherent order biases other than alphabetical listing by surname.
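
    The two schemes are easy to contrast in code: SRS draws n members with equal probability for every subset, while SS takes a random start and then every k-th record of the ordered list. The toy membership list and the age variable below are invented for illustration.

```python
import random

random.seed(42)

# Toy alphabetically ordered membership list with one attribute of interest.
population = [{"id": i, "age": random.gauss(45, 10)} for i in range(12_000)]
n = 300

# Simple random sampling: every subset of size n is equally likely.
srs = random.sample(population, n)

# Systematic sampling: random start, then every k-th record.
k = len(population) // n
start = random.randrange(k)
ss = population[start::k][:n]

def mean_age(sample):
    return sum(p["age"] for p in sample) / len(sample)

print(f"population {mean_age(population):.1f}  SRS {mean_age(srs):.1f}  SS {mean_age(ss):.1f}")
```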

  14. Capturing the Petermann Ice Island Flux With the CI2D3 Database

    NASA Astrophysics Data System (ADS)

    Crawford, A. J.; Crocker, G.; Mueller, D.; Saper, R.; Desjardins, L.; Carrieres, T.

    2017-12-01

    The Petermann Glacier ice tongue lost >460 km2 of areal extent (~38 Gt of mass) due to three large calving events in 2008, 2010 and 2012, as well as three previously unrecorded events in 2011 and 2012. Hundreds of ice islands subsequently drifted south between Hall Basin and Newfoundland's Grand Banks, but no systematic data collection or analysis has been conducted for the full flux of fragments prior to the present study. To accomplish this, the Canadian Ice Service's extensive RADARSAT-1 and -2 synthetic aperture radar image archive was mined to create the Canadian Ice Island Drift, Deterioration and Detection (CI2D3) Database. Over 15000 fragments have been digitized in GIS software from 3200 SAR scenes. A unique characteristic of the database is the inclusion of the lineage (i.e., connecting repeat observations or mother-daughter fragments) for all tracked fragments with areas >0.25 km2. This genealogical information was used to isolate ice islands that were about to fracture in order to assess the environmental conditions and morphological characteristics that influence this deterioration mechanism. Fracture counts showed a significant relationship with sea ice concentration (r = -0.56). However, variations in relative thickness played a large role in fracturing likelihood regardless of sea ice conditions. The exceedance probability of the daughter fragment length was calculated, as is often conducted for offshore industry hazard assessment. Grounded ice islands, which are hazards to seafloor installations and disturb benthic ecology, were recognized from their negligible drift speeds and two grounding hot-spots were identified along the Coburg and eastern Baffin island coasts. Petermann ice islands have been noted to drift along specific isobaths due to the influence of bathymetry on ocean currents. 50% of observations occurred between the 100 and 300 m isobaths, and smaller ice islands were observed more frequently in deeper regions. The CI2D3 Database can be utilized for the development of operational models and remote sensing tools for ice island detection, as well as assessing the distribution of Greenland Ice Sheet freshwater. The database will contribute to the study of these large, tabular icebergs that are anticipated to continue calving in both Polar Regions, including at the Petermann Glacier.

  15. National Utility Rate Database: Preprint

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ong, S.; McKeel, R.

    2012-08-01

    When modeling solar energy technologies and other distributed energy systems, using high-quality expansive electricity rates is essential. The National Renewable Energy Laboratory (NREL) developed a utility rate platform for entering, storing, updating, and accessing a large collection of utility rates from around the United States. This utility rate platform lives on the Open Energy Information (OpenEI) website, OpenEI.org, allowing the data to be programmatically accessed from a web browser, using an application programming interface (API). The semantic-based utility rate platform currently has records of 1,885 utility rates and covers over 85% of the electricity consumption in the United States.

  16. Workflow based framework for life science informatics.

    PubMed

    Tiwari, Abhishek; Sekhar, Arvind K T

    2007-10-01

    Workflow technology is a generic mechanism to integrate diverse types of available resources (databases, servers, software applications and different services) which facilitate knowledge exchange within traditionally divergent fields such as molecular biology, clinical research, computational science, physics, chemistry and statistics. Researchers can easily incorporate and access diverse, distributed tools and data to develop their own research protocols for scientific analysis. Application of workflow technology has been reported in areas like drug discovery, genomics, large-scale gene expression analysis, proteomics, and system biology. In this article, we have discussed the existing workflow systems and the trends in applications of workflow based systems.

  17. Use of Patient Registries and Administrative Datasets for the Study of Pediatric Cancer

    PubMed Central

    Rice, Henry E.; Englum, Brian R.; Gulack, Brian C.; Adibe, Obinna O.; Tracy, Elizabeth T.; Kreissman, Susan G.; Routh, Jonathan C.

    2015-01-01

    Analysis of data from large administrative databases and patient registries is increasingly being used to study childhood cancer care, although the value of these data sources remains unclear to many clinicians. Interpretation of large databases requires a thorough understanding of how the dataset was designed, how data were collected, and how to assess data quality. This review will detail the role of administrative databases and registry databases for the study of childhood cancer, tools to maximize information from these datasets, and recommendations to improve the use of these databases for the study of pediatric oncology. PMID:25807938

  18. Spatial trends in leaf size of Amazonian rainforest trees

    NASA Astrophysics Data System (ADS)

    Malhado, A. C. M.; Malhi, Y.; Whittaker, R. J.; Ladle, R. J.; Ter Steege, H.; Aragão, L. E. O. C.; Quesada, C. A.; Araujo-Murakami, A.; Phillips, O. L.; Peacock, J.; Lopez-Gonzalez, G.; Baker, T. R.; Butt, N.; Anderson, L. O.; Arroyo, L.; Almeida, S.; Higuchi, N.; Killeen, T. J.; Monteagudo, A.; Neill, D.; Pitman, N.; Prieto, A.; Salomão, R. P.; Silva, N.; Vásquez-Martínez, R.; Laurance, W. F.

    2009-02-01

    Leaf size influences many aspects of tree function such as rates of transpiration and photosynthesis and, consequently, often varies in a predictable way in response to environmental gradients. The recent development of pan-Amazonian databases based on permanent botanical plots (e.g. RAINFOR, ATDN) has now made it possible to assess trends in leaf size across environmental gradients in Amazonia. Previous plot-based studies have shown that the community structure of Amazonian trees breaks down into at least two major ecological gradients corresponding with variations in soil fertility (decreasing from south to northeast) and length of the dry season (increasing from northwest to south and east). Here we describe the geographic distribution of leaf size categories based on 121 plots distributed across eight South American countries. We find that, as predicted, the Amazon forest is predominantly populated by tree species and individuals in the mesophyll size class (20.25-182.25 cm2). The geographic distribution of species and individuals with large leaves (>20.25 cm2) is complex but is generally characterized by a higher proportion of such trees in the north-west of the region. Spatially corrected regressions reveal weak correlations between the proportion of large-leaved species and metrics of water availability. We also find a significant negative relationship between leaf size and wood density.

  19. Spatial trends in leaf size of Amazonian rainforest trees

    NASA Astrophysics Data System (ADS)

    Malhado, A. C. M.; Malhi, Y.; Whittaker, R. J.; Ladle, R. J.; Ter Steege, H.; Phillips, O. L.; Butt, N.; Aragão, L. E. O. C.; Quesada, C. A.; Araujo-Murakami, A.; Arroyo, L.; Peacock, J.; Lopez-Gonzalez, G.; Baker, T. R.; Anderson, L. O.; Almeida, S.; Higuchi, N.; Killeen, T. J.; Monteagudo, A.; Neill, D.; Pitman, N.; Prieto, A.; Salomão, R. P.; Vásquez-Martínez, R.; Laurance, W. F.

    2009-08-01

    Leaf size influences many aspects of tree function such as rates of transpiration and photosynthesis and, consequently, often varies in a predictable way in response to environmental gradients. The recent development of pan-Amazonian databases based on permanent botanical plots has now made it possible to assess trends in leaf size across environmental gradients in Amazonia. Previous plot-based studies have shown that the community structure of Amazonian trees breaks down into at least two major ecological gradients corresponding with variations in soil fertility (decreasing from southwest to northeast) and length of the dry season (increasing from northwest to south and east). Here we describe the geographic distribution of leaf size categories based on 121 plots distributed across eight South American countries. We find that the Amazon forest is predominantly populated by tree species and individuals in the mesophyll size class (20.25-182.25 cm2). The geographic distribution of species and individuals with large leaves (>20.25 cm2) is complex but is generally characterized by a higher proportion of such trees in the northwest of the region. Spatially corrected regressions reveal weak correlations between the proportion of large-leaved species and metrics of water availability. We also find a significant negative relationship between leaf size and wood density.

  20. Creating databases for biological information: an introduction.

    PubMed

    Stein, Lincoln

    2013-06-01

    The essence of bioinformatics is dealing with large quantities of information. Whether it be sequencing data, microarray data files, mass spectrometric data (e.g., fingerprints), the catalog of strains arising from an insertional mutagenesis project, or even large numbers of PDF files, there inevitably comes a time when the information can simply no longer be managed with files and directories. This is where databases come into play. This unit briefly reviews the characteristics of several database management systems, including flat file, indexed file, relational databases, and NoSQL databases. It compares their strengths and weaknesses and offers some general guidelines for selecting an appropriate database management system. Copyright 2013 by John Wiley & Sons, Inc.
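
    To make the trade-offs concrete, the sketch below contrasts a flat-file scan with an indexed relational lookup using Python's built-in sqlite3 module; the strain-catalog records are invented for the example and the comparison is only meant to show why indexed systems scale better for lookups.

      # Flat-file scan versus indexed relational lookup (illustrative data).
      import csv, os, sqlite3, tempfile

      rows = [("strain%05d" % i, "chr%d" % (i % 5 + 1), i * 10) for i in range(10000)]

      # Flat-file approach: every lookup is a full scan of the file.
      flat_path = os.path.join(tempfile.mkdtemp(), "strains.tsv")
      with open(flat_path, "w", newline="") as fh:
          csv.writer(fh, delimiter="\t").writerows(rows)

      def flat_lookup(name):
          with open(flat_path, newline="") as fh:
              for rec in csv.reader(fh, delimiter="\t"):
                  if rec[0] == name:
                      return rec
          return None

      # Relational approach: the primary-key index avoids scanning every record.
      conn = sqlite3.connect(":memory:")
      conn.execute("CREATE TABLE strains (name TEXT PRIMARY KEY, chrom TEXT, pos INT)")
      conn.executemany("INSERT INTO strains VALUES (?, ?, ?)", rows)

      print(flat_lookup("strain04321"))
      print(conn.execute("SELECT * FROM strains WHERE name = ?", ("strain04321",)).fetchone())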

  1. The importance of data quality for generating reliable distribution models for rare, elusive, and cryptic species

    Treesearch

    Keith B. Aubry; Catherine M. Raley; Kevin S. McKelvey

    2017-01-01

    The availability of spatially referenced environmental data and species occurrence records in online databases enable practitioners to easily generate species distribution models (SDMs) for a broad array of taxa. Such databases often include occurrence records of unknown reliability, yet little information is available on the influence of data quality on SDMs generated...

  2. The methane distribution on Titan: high resolution spectroscopy in the near-IR with Keck NIRSPEC/AO

    NASA Astrophysics Data System (ADS)

    Adamkovics, Mate; Mitchell, Jonathan L.

    2014-11-01

    The distribution of methane on Titan is a diagnostic of regional scale meteorology and large scale atmospheric circulation. The observed formation of clouds and the transport of heat through the atmosphere both depend on spatial and temporal variations in methane humidity. We have performed observations to measure the distribution of methane on Titan using high spectral resolution near-IR (H-band) observations made with NIRSPEC, with adaptive optics, at Keck Observatory in July 2014. This work builds on previous attempts at this measurement with improvements in the observing protocol and data reduction, together with increased integration times. Radiative transfer models using line-by-line calculation of methane opacities from the HITRAN2012 database are used to retrieve methane abundances. We will describe analysis of the reduced observations, which show latitudinal spatial variation in the region of the spectrum that is thought to be sensitive to methane abundance. Quantifying the methane abundance variation requires models that include the spatial variation in surface albedo and the meridional haze gradient; we will describe (currently preliminary) analysis of the methane distribution and uncertainties in the retrieval.

  3. DSSTOX WEBSITE LAUNCH: IMPROVING PUBLIC ACCESS TO DATABASES FOR BUILDING STRUCTURE-TOXICITY PREDICTION MODELS

    EPA Science Inventory

    DSSTox Website Launch: Improving Public Access to Databases for Building Structure-Toxicity Prediction Models
    Ann M. Richard
    US Environmental Protection Agency, Research Triangle Park, NC, USA

    Distributed: Decentralized set of standardized, field-delimited databases,...

  4. PROGRESS REPORT ON THE DSSTOX DATABASE NETWORK: NEWLY LAUNCHED WEBSITE, APPLICATIONS, FUTURE PLANS

    EPA Science Inventory

    Progress Report on the DSSTox Database Network: Newly Launched Website, Applications, Future Plans

    Progress will be reported on development of the Distributed Structure-Searchable Toxicity (DSSTox) Database Network and the newly launched public website that coordinates and...

  5. Image Databases.

    ERIC Educational Resources Information Center

    Pettersson, Rune

    Different kinds of pictorial databases are described with respect to aims, user groups, search possibilities, storage, and distribution. Some specific examples are given for databases used for the following purposes: (1) labor markets for artists; (2) document management; (3) telling a story; (4) preservation (archives and museums); (5) research;…

  6. Practical Quantum Private Database Queries Based on Passive Round-Robin Differential Phase-shift Quantum Key Distribution.

    PubMed

    Li, Jian; Yang, Yu-Guang; Chen, Xiu-Bo; Zhou, Yi-Hua; Shi, Wei-Min

    2016-08-19

    A novel quantum private database query protocol is proposed, based on passive round-robin differential phase-shift quantum key distribution. Compared with previous quantum private database query protocols, the present protocol has the following unique merits: (i) the user Alice can obtain one and only one key bit, so that both the efficiency and security of the present protocol can be ensured, and (ii) it does not require changing the length difference of the two arms in a Mach-Zehnder interferometer and simply chooses two pulses passively to interfere, so that it is much simpler and more practical. The present protocol is also proved to be secure in terms of user security and database security.

  7. Domain Regeneration for Cross-Database Micro-Expression Recognition

    NASA Astrophysics Data System (ADS)

    Zong, Yuan; Zheng, Wenming; Huang, Xiaohua; Shi, Jingang; Cui, Zhen; Zhao, Guoying

    2018-05-01

    In this paper, we investigate the cross-database micro-expression recognition problem, where the training and testing samples are from two different micro-expression databases. Under this setting, the training and testing samples would have different feature distributions, and hence the performance of most existing micro-expression recognition methods may decrease greatly. To solve this problem, we propose a simple yet effective method called the Target Sample Re-Generator (TSRG). By using TSRG, we are able to re-generate the samples from the target micro-expression database such that the re-generated target samples share the same or similar feature distributions with the original source samples. For this reason, we can then use the classifier learned on the labeled source samples to accurately predict the micro-expression categories of the unlabeled target samples. To evaluate the performance of the proposed TSRG method, extensive cross-database micro-expression recognition experiments designed based on the SMIC and CASME II databases are conducted. Compared with recent state-of-the-art cross-database emotion recognition methods, the proposed TSRG achieves more promising results.

  8. Conducting Privacy-Preserving Multivariable Propensity Score Analysis When Patient Covariate Information Is Stored in Separate Locations.

    PubMed

    Bohn, Justin; Eddings, Wesley; Schneeweiss, Sebastian

    2017-03-15

    Distributed networks of health-care data sources are increasingly being utilized to conduct pharmacoepidemiologic database studies. Such networks may contain data that are not physically pooled but instead are distributed horizontally (separate patients within each data source) or vertically (separate measures within each data source) in order to preserve patient privacy. While multivariable methods for the analysis of horizontally distributed data are frequently employed, few practical approaches have been put forth to deal with vertically distributed health-care databases. In this paper, we propose 2 propensity score-based approaches to vertically distributed data analysis and test their performance using 5 example studies. We found that these approaches produced point estimates close to what could be achieved without partitioning. We further found a performance benefit (i.e., lower mean squared error) for sequentially passing a propensity score through each data domain (called the "sequential approach") as compared with fitting separate domain-specific propensity scores (called the "parallel approach"). These results were validated in a small simulation study. This proof-of-concept study suggests a new multivariable analysis approach to vertically distributed health-care databases that is practical, preserves patient privacy, and warrants further investigation for use in clinical research applications that rely on health-care databases. © The Author 2017. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
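
    The abstract does not give the estimation procedure in detail. The sketch below is one simplified reading of the "sequential" strategy under stated assumptions: a propensity model is fitted on the covariates held by the first data domain, and only its score is passed to the second domain, which adds it to its own covariates and refits. The data are simulated and scikit-learn's logistic regression stands in for whatever model the authors used.

      # Simplified illustration of a "sequential" propensity score across two
      # vertically partitioned covariate domains (simulated data; not the
      # authors' exact algorithm).
      import numpy as np
      from sklearn.linear_model import LogisticRegression

      rng = np.random.default_rng(0)
      n = 5000
      X_a = rng.normal(size=(n, 3))      # covariates held by data domain A
      X_b = rng.normal(size=(n, 2))      # covariates held by data domain B
      logit = 0.8 * X_a[:, 0] - 0.5 * X_b[:, 1]
      treatment = rng.binomial(1, 1 / (1 + np.exp(-logit)))

      # Domain A fits a propensity model on its covariates and shares only the score.
      ps_a = LogisticRegression().fit(X_a, treatment).predict_proba(X_a)[:, 1]

      # Domain B augments its own covariates with the incoming score and refits.
      X_b_aug = np.column_stack([X_b, ps_a])
      ps_final = LogisticRegression().fit(X_b_aug, treatment).predict_proba(X_b_aug)[:, 1]

      print("final propensity scores, first 5 subjects:", np.round(ps_final[:5], 3))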

  9. PDS: A Performance Database Server

    DOE PAGES

    Berry, Michael W.; Dongarra, Jack J.; Larose, Brian H.; ...

    1994-01-01

    The process of gathering, archiving, and distributing computer benchmark data is a cumbersome task usually performed by computer users and vendors with little coordination. Most important, there is no publicly available central depository of performance data for all ranges of machines from personal computers to supercomputers. We present an Internet-accessible performance database server (PDS) that can be used to extract current benchmark data and literature. As an extension to the X-Windows-based user interface (Xnetlib) to the Netlib archival system, PDS provides an on-line catalog of public domain computer benchmarks such as the LINPACK benchmark, Perfect benchmarks, and the NAS parallel benchmarks. PDS does not reformat or present the benchmark data in any way that conflicts with the original methodology of any particular benchmark; it is thereby devoid of any subjective interpretations of machine performance. We believe that all branches (research laboratories, academia, and industry) of the general computing community can use this facility to archive performance metrics and make them readily available to the public. PDS can provide a more manageable approach to the development and support of a large dynamic database of published performance metrics.

  10. HC Forum®: a web site based on an international human cytogenetic database

    PubMed Central

    Cohen, Olivier; Mermet, Marie-Ange; Demongeot, Jacques

    2001-01-01

    Familial structural rearrangements of chromosomes represent a factor of malformation risk that can vary over a large range, making genetic counseling difficult. However, they also represent a powerful tool for increasing knowledge of the genome, particularly by studying breakpoints and viable imbalances of the genome. We have developed a collaborative database that now includes data on more than 4100 families, from which we have developed a web site called HC Forum® (http://HCForum.imag.fr). It offers geneticists assistance in diagnosis and in genetic counseling by assessing the malformation risk with statistical models. For researchers, interactive interfaces exhibit the distribution of chromosomal breakpoints and of the genome regions observed at birth in trisomy or in monosomy. Dedicated tools, including an interactive pedigree, allow electronic submission of data, which is shown anonymously in a forum for discussion. After validation, data are definitively registered in the database with the sender's email address, allowing biological material to be located directly. HC Forum® thus constitutes a link between diagnosis laboratories and genome research centers; after one year, it already has more than 700 users from about 40 different countries. PMID:11125121

  11. Dynamic publication model for neurophysiology databases.

    PubMed

    Gardner, D; Abato, M; Knuth, K H; DeBellis, R; Erde, S M

    2001-08-29

    We have implemented a pair of database projects, one serving cortical electrophysiology and the other invertebrate neurones and recordings. The design for each combines aspects of two proven schemes for information interchange. The journal article metaphor determined the type, scope, organization and quantity of data to comprise each submission. Sequence databases encouraged intuitive tools for data viewing, capture, and direct submission by authors. Neurophysiology required transcending these models with new datatypes. Time-series, histogram and bivariate datatypes, including illustration-like wrappers, were selected by their utility to the community of investigators. As interpretation of neurophysiological recordings depends on context supplied by metadata attributes, searches are via visual interfaces to sets of controlled-vocabulary metadata trees. Neurones, for example, can be specified by metadata describing functional and anatomical characteristics. Permanence is advanced by data model and data formats largely independent of contemporary technology or implementation, including Java and the XML standard. All user tools, including dynamic data viewers that serve as a virtual oscilloscope, are Java-based, free, multiplatform, and distributed by our application servers to any contemporary networked computer. Copyright is retained by submitters; viewer displays are dynamic and do not violate copyright of related journal figures. Panels of neurophysiologists view and test schemas and tools, enhancing community support.

  12. Very Large Data Volumes Analysis of Collaborative Systems with Finite Number of States

    ERIC Educational Resources Information Center

    Ivan, Ion; Ciurea, Cristian; Pavel, Sorin

    2010-01-01

    The collaborative system with finite number of states is defined. A very large database is structured. Operations on large databases are identified. Repetitive procedures for collaborative systems operations are derived. The efficiency of such procedures is analyzed. (Contains 6 tables, 5 footnotes and 3 figures.)

  13. Biodiversity and distribution of polar freshwater DNA viruses

    PubMed Central

    Aguirre de Cárcer, Daniel; López-Bueno, Alberto; Pearce, David A.; Alcamí, Antonio

    2015-01-01

    Viruses constitute the most abundant biological entities and a large reservoir of genetic diversity on Earth. Despite the recent surge in their study, our knowledge on their actual biodiversity and distribution remains sparse. We report the first metagenomic analysis of Arctic freshwater viral DNA communities and a comparative analysis with other freshwater environments. Arctic viromes are dominated by unknown and single-stranded DNA viruses with no close relatives in the database. These unique viral DNA communities mostly relate to each other and present some minor genetic overlap with other environments studied, including an Arctic Ocean virome. Despite common environmental conditions in polar ecosystems, the Arctic and Antarctic DNA viromes differ at the fine-grain genetic level while sharing a similar taxonomic composition. The study uncovers some viral lineages with a bipolar distribution, suggesting a global dispersal capacity for viruses, and seemingly indicates that viruses do not follow the latitudinal diversity gradient known for macroorganisms. Our study sheds light into the global biogeography and connectivity of viral communities. PMID:26601189

  14. Source environment feature related phylogenetic distribution pattern of anoxygenic photosynthetic bacteria as revealed by pufM analysis.

    PubMed

    Zeng, Yonghui; Jiao, Nianzhi

    2007-06-01

    Anoxygenic photosynthesis, performed primarily by anoxygenic photosynthetic bacteria (APB), is thought to have arisen on Earth more than 3 billion years ago. The long-established APB are distributed in almost every corner where light can reach. However, the relationship between APB phylogeny and source environments has been largely unexplored. Here we retrieved the pufM sequences and related source information of 89 pufM-containing species from the public database. Phylogenetic analysis revealed that horizontal gene transfer (HGT) most likely occurred within 11 out of a total of 21 pufM subgroups, not only among species within the same class but also among species of different phyla or subphyla. A clear source-environment-related phylogenetic distribution pattern was observed, with all species from oxic habitats and those from anoxic habitats clustering into independent subgroups, respectively. HGT among ancient APB and subsequent long-term evolution and adaptation to separated niches may have contributed to the coupling of environment and pufM phylogeny.

  15. System for Performing Single Query Searches of Heterogeneous and Dispersed Databases

    NASA Technical Reports Server (NTRS)

    Maluf, David A. (Inventor); Okimura, Takeshi (Inventor); Gurram, Mohana M. (Inventor); Tran, Vu Hoang (Inventor); Knight, Christopher D. (Inventor); Trinh, Anh Ngoc (Inventor)

    2017-01-01

    The present invention is a distributed computer system of heterogeneous databases joined in an information grid and configured with an Application Programming Interface hardware which includes a search engine component for performing user-structured queries on multiple heterogeneous databases in real time. This invention reduces overhead associated with the impedance mismatch that commonly occurs in heterogeneous database queries.
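
    As a rough illustration of the single-query idea, and not of the patented system itself, the sketch below fans one query out to several independent databases and merges the results; SQLite stands in for what would be heterogeneous, dispersed back ends in an information grid.

      # One query issued against several independent databases, results merged.
      # SQLite keeps the sketch self-contained; real back ends would be remote
      # and heterogeneous.
      import sqlite3

      def make_db(rows):
          conn = sqlite3.connect(":memory:")
          conn.execute("CREATE TABLE docs (id INTEGER, title TEXT)")
          conn.executemany("INSERT INTO docs VALUES (?, ?)", rows)
          return conn

      backends = [
          make_db([(1, "thermal model"), (2, "orbit data")]),
          make_db([(3, "thermal vacuum test"), (4, "antenna layout")]),
      ]

      def federated_search(keyword):
          """Run the same query on every back end and concatenate the results."""
          results = []
          for conn in backends:
              results.extend(conn.execute(
                  "SELECT id, title FROM docs WHERE title LIKE ?",
                  (f"%{keyword}%",)).fetchall())
          return results

      print(federated_search("thermal"))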

  16. DSSTOX (DISTRIBUTED STRUCTURE-SEARCHABLE ...

    EPA Pesticide Factsheets

    Distributed Structure-Searchable Toxicity Database Network. Major trends affecting public toxicity information resources have the potential to significantly alter the future of predictive toxicology. Chemical toxicity screening is undergoing shifts towards greater use of more fundamental information on gene/protein expression patterns and bioactivity and bioassay profiles, the latter generated with high-throughput screening technologies. Curated, systematically organized, and web-accessible toxicity and biological activity data in association with chemical structures, enabling the integration of diverse information domains, will fuel the next frontier of advancement for QSAR (quantitative structure-activity relationship) and data mining technologies. The DSSTox project is supporting progress towards these goals on many fronts, promoting the use of formalized and structure-annotated toxicity data models, helping to interface these efforts with QSAR modelers, linking data from diverse sources, and creating a large, quality-reviewed, central chemical structure information resource linked to various toxicity data sources.

  17. Organization and dissemination of multimedia medical databases on the WWW.

    PubMed

    Todorovski, L; Ribaric, S; Dimec, J; Hudomalj, E; Lunder, T

    1999-01-01

    In the paper, we focus on the problem of building and disseminating multimedia medical databases on the World Wide Web (WWW). The current results of the ongoing project of building a prototype dermatology images database and its WWW presentation are presented. The dermatology database is part of an ambitious plan concerning an organization of a network of medical institutions building distributed and federated multimedia databases of a much wider scale.

  18. Development, deployment and operations of ATLAS databases

    NASA Astrophysics Data System (ADS)

    Vaniachine, A. V.; Schmitt, J. G. v. d.

    2008-07-01

    In preparation for ATLAS data taking, a coordinated shift from development towards operations has occurred in ATLAS database activities. In addition to development and commissioning activities in databases, ATLAS is active in the development and deployment (in collaboration with the WLCG 3D project) of the tools that allow the worldwide distribution and installation of databases and related datasets, as well as the actual operation of this system on ATLAS multi-grid infrastructure. We describe development and commissioning of major ATLAS database applications for online and offline. We present the first scalability test results and ramp-up schedule over the initial LHC years of operations towards the nominal year of ATLAS running, when the database storage volumes are expected to reach 6.1 TB for the Tag DB and 1.0 TB for the Conditions DB. ATLAS database applications require robust operational infrastructure for data replication between online and offline at Tier-0, and for the distribution of the offline data to Tier-1 and Tier-2 computing centers. We describe ATLAS experience with Oracle Streams and other technologies for coordinated replication of databases in the framework of the WLCG 3D services.

  19. PIGD: a database for intronless genes in the Poaceae.

    PubMed

    Yan, Hanwei; Jiang, Cuiping; Li, Xiaoyu; Sheng, Lei; Dong, Qing; Peng, Xiaojian; Li, Qian; Zhao, Yang; Jiang, Haiyang; Cheng, Beijiu

    2014-10-01

    Intronless genes are a feature of prokaryotes; however, they are widespread and unequally distributed among eukaryotes and represent an important resource to study the evolution of gene architecture. Although many databases on exons and introns exist, there is currently no cohesive resource that collects intronless genes in plants into a single database. In this study, we present the Poaceae Intronless Genes Database (PIGD), a user-friendly web interface to explore information on intronless genes from different plants. Five Poaceae species, Sorghum bicolor, Zea mays, Setaria italica, Panicum virgatum and Brachypodium distachyon, are included in the current release of PIGD. Gene annotations and sequence data were collected and integrated from different databases. The primary focus of this study was to provide gene descriptions and gene product records. In addition, functional annotations, subcellular localization prediction and taxonomic distribution are reported. PIGD allows users to readily browse, search and download data. BLAST and comparative analyses are also provided through this online database, which is available at http://pigd.ahau.edu.cn/. PIGD provides a solid platform for the collection, integration and analysis of intronless genes in the Poaceae. As such, this database will be useful for subsequent bio-computational analysis in comparative genomics and evolutionary studies.

  20. Teaching Case: Adapting the Access Northwind Database to Support a Database Course

    ERIC Educational Resources Information Center

    Dyer, John N.; Rogers, Camille

    2015-01-01

    A common problem encountered when teaching database courses is that few large illustrative databases exist to support teaching and learning. Most database textbooks have small "toy" databases that are chapter objective specific, and thus do not support application over the complete domain of design, implementation and management concepts…

  1. Virtual Queue in a Centralized Database Environment

    NASA Astrophysics Data System (ADS)

    Kar, Amitava; Pal, Dibyendu Kumar

    2010-10-01

    Today is the era of the Internet. Almost anything, whether gathering knowledge, planning a holiday or booking a ticket, can be done over the Internet. This paper calculates various queuing measures for bookings or purchases made through the Internet, subject to limits on the number of tickets or seats available. Such transactions involve many database activities, such as reads and writes. The paper treats the time at which a service is requested as the arrival and the time taken to provide the required information as the service, and from these derives the arrival and service distributions and the various queuing measures. For simplicity, the database is treated as a centralized database, since the alternative of a distributed database would considerably complicate the calculation.
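
    The abstract does not state which queuing model is assumed. As a simple worked example of the kind of measures described, the sketch below computes standard single-server (M/M/1) quantities from an arrival rate and a service rate.

      # Basic M/M/1 queuing measures; the specific model is an assumption made
      # here for illustration, not something stated in the paper.
      def mm1_measures(arrival_rate, service_rate):
          if arrival_rate >= service_rate:
              raise ValueError("unstable queue: arrival rate must be below service rate")
          rho = arrival_rate / service_rate          # server utilization
          L = rho / (1 - rho)                        # mean number in system
          Lq = rho ** 2 / (1 - rho)                  # mean number waiting
          W = 1 / (service_rate - arrival_rate)      # mean time in system
          Wq = rho / (service_rate - arrival_rate)   # mean waiting time
          return {"utilization": rho, "L": L, "Lq": Lq, "W": W, "Wq": Wq}

      # e.g. 30 booking requests per minute served at a rate of 40 per minute
      print(mm1_measures(30, 40))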

  2. Large-Scale 1:1 Computing Initiatives: An Open Access Database

    ERIC Educational Resources Information Center

    Richardson, Jayson W.; McLeod, Scott; Flora, Kevin; Sauers, Nick J.; Kannan, Sathiamoorthy; Sincar, Mehmet

    2013-01-01

    This article details the spread and scope of large-scale 1:1 computing initiatives around the world. What follows is a review of the existing literature around 1:1 programs followed by a description of the large-scale 1:1 database. Main findings include: 1) the XO and the Classmate PC dominate large-scale 1:1 initiatives; 2) if professional…

  3. Study on parallel and distributed management of RS data based on spatial data base

    NASA Astrophysics Data System (ADS)

    Chen, Yingbiao; Qian, Qinglan; Liu, Shijin

    2006-12-01

    With the rapid development of current Earth-observing technology, RS image data storage, management and information publication have become a bottleneck for its application and popularization. There are two prominent problems in RS image data storage and management systems. First, the background server can hardly handle the heavy processing of the large volumes of RS data stored at different nodes in a distributed environment, so a heavy burden is placed on the background server. Second, there is no unique, standard and rational organization of multi-sensor RS data for storage and management, and much information is lost or not recorded at storage time. Faced with these two problems, the paper puts forward a framework for a parallel and distributed RS image data management and storage system. The system aims at an RS data information system based on a parallel background server and a distributed data management system. Toward these two goals, this paper studies the following key techniques and draws some instructive conclusions. The paper puts forward a solid index of "Pyramid, Block, Layer, Epoch" according to the properties of RS image data. With this solid index mechanism, a rational organization of multi-sensor RS image data across different resolutions, areas, bands and periods is achieved. In data storage, RS data are not divided into binary large objects stored in a conventional relational database system; instead, they are reconstructed through the above solid index mechanism, and a logical image database for the RS image data files is constructed. In system architecture, this paper sets up a framework based on a parallel server composed of several commodity computers. Under this framework, the background processing is divided into two parts: the common Web process and the parallel process.
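
    The paper does not define the exact encoding of the "Pyramid, Block, Layer, Epoch" index. A minimal sketch of what such a composite tile key might look like is given below; all field choices are assumptions made for illustration.

      # Hypothetical composite "Pyramid, Block, Layer, Epoch" tile key for
      # organizing multi-sensor RS imagery; the field layout is assumed.
      from dataclasses import dataclass

      @dataclass(frozen=True)
      class SolidIndex:
          pyramid: int   # resolution level (0 = full resolution)
          block_x: int   # tile column within the level
          block_y: int   # tile row within the level
          layer: str     # spectral band or product layer, e.g. "B4"
          epoch: str     # acquisition period, e.g. "2006-07"

          def key(self) -> str:
              return f"P{self.pyramid}/X{self.block_x}/Y{self.block_y}/{self.layer}/{self.epoch}"

      # A logical image database can then map keys to stored tiles.
      tiles = {}
      idx = SolidIndex(pyramid=2, block_x=13, block_y=7, layer="B4", epoch="2006-07")
      tiles[idx.key()] = b"...compressed tile bytes..."
      print(idx.key())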

  4. Trends in Solar energy Driven Vertical Ground Source Heat Pump Systems in Sweden - An Analysis Based on the Swedish Well Database

    NASA Astrophysics Data System (ADS)

    Juhlin, K.; Gehlin, S.

    2016-12-01

    Sweden is a world leader in developing and using vertical ground source heat pump (GSHP) technology. GSHP systems extract passively stored solar energy in the ground and the Earth's natural geothermal energy. Geothermal energy has been recognized as a renewable energy source in Sweden since 2007 and is the third largest renewable energy source in the country today. The Geological Survey of Sweden (SGU) is the authority in Sweden that provides open access geological data of rock, soil and groundwater for the public. All wells drilled must be registered in the SGU Well Database, and it is the well driller's duty to submit registration of drilled wells. Both active and passive geothermal energy systems are in use. Large GSHP systems, with at least 20 boreholes, are active geothermal energy systems. Energy is stored in the ground, which allows both comfort heating and cooling to be extracted. Active systems are therefore relevant for larger properties and industrial buildings. Since 1978 more than 600 000 wells (water wells, GSHP boreholes, etc.) have been registered in the Well Database, with around 20 000 new registrations per year. Of these, an estimated 320 000 wells are registered as GSHP boreholes. The vast majority of these boreholes are single boreholes for single-family houses. The number of properties with registered vertical borehole GSHP installations amounts to approximately 243 000. Of these sites, between 300 and 350 are large GSHP systems with at least 20 boreholes. While the increase in the number of new registrations for smaller homes and households has slowed down after the rapid development in the 80's and 90's, the larger installations for commercial and industrial buildings have increased in number over the last ten years. This poster uses data from the SGU Well Database to quantify and analyze the trends in vertical GSHP systems reported between 1978 and 2015 in Sweden, with special focus on large systems. From the newly aggregated data, conclusions can be drawn about the development of larger vertical GSHP system installations over the years and their geographical distribution in Sweden.

  5. A Database as a Service for the Healthcare System to Store Physiological Signal Data.

    PubMed

    Chang, Hsien-Tsung; Lin, Tsai-Huei

    2016-01-01

    Wearable devices that measure physiological signals to help develop self-health management habits have become increasingly popular in recent years. These records are conducive for follow-up health and medical care. In this study, based on the characteristics of the observed physiological signal records, namely 1) a large number of users, 2) a large amount of data, 3) low information variability, 4) data privacy authorization, and 5) data access by designated users, we wish to resolve physiological signal record-relevant issues utilizing the advantages of the Database as a Service (DaaS) model. Storing a large amount of data using file patterns can reduce database load, allowing users to access data efficiently; the privacy control settings allow users to store data securely. The results of the experiment show that the proposed system has better database access performance than a traditional relational database, with a small difference in database volume, thus proving that the proposed system can improve data storage performance.
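
    A minimal sketch of the storage pattern described, with bulky signal samples kept in files while the database holds only metadata, file paths and access permissions, is shown below. The schema and permission model are assumptions for illustration and not the authors' implementation.

      # Signal samples stored as files; metadata, paths and access grants in the
      # database. Schema and permission model are assumed for illustration.
      import json, os, sqlite3, tempfile

      store_dir = tempfile.mkdtemp()
      db = sqlite3.connect(":memory:")
      db.executescript("""
          CREATE TABLE signals (id INTEGER PRIMARY KEY, owner TEXT, kind TEXT, path TEXT);
          CREATE TABLE grants  (signal_id INTEGER, grantee TEXT);
      """)

      def store_signal(owner, kind, samples):
          path = os.path.join(store_dir, f"{owner}_{kind}.json")
          with open(path, "w") as fh:
              json.dump(samples, fh)                 # bulk data go to a file
          cur = db.execute("INSERT INTO signals (owner, kind, path) VALUES (?, ?, ?)",
                           (owner, kind, path))
          return cur.lastrowid

      def read_signal(signal_id, requester):
          owner, path = db.execute(
              "SELECT owner, path FROM signals WHERE id = ?", (signal_id,)).fetchone()
          allowed = requester == owner or db.execute(
              "SELECT 1 FROM grants WHERE signal_id = ? AND grantee = ?",
              (signal_id, requester)).fetchone()
          if not allowed:
              raise PermissionError("requester is not authorized for this record")
          with open(path) as fh:
              return json.load(fh)

      sid = store_signal("alice", "heart_rate", [72, 71, 75, 78])
      db.execute("INSERT INTO grants VALUES (?, ?)", (sid, "dr_bob"))
      print(read_signal(sid, "dr_bob"))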

  6. A Database as a Service for the Healthcare System to Store Physiological Signal Data

    PubMed Central

    Lin, Tsai-Huei

    2016-01-01

    Wearable devices that measure physiological signals to help develop self-health management habits have become increasingly popular in recent years. These records are conducive for follow-up health and medical care. In this study, based on the characteristics of the observed physiological signal records– 1) a large number of users, 2) a large amount of data, 3) low information variability, 4) data privacy authorization, and 5) data access by designated users—we wish to resolve physiological signal record-relevant issues utilizing the advantages of the Database as a Service (DaaS) model. Storing a large amount of data using file patterns can reduce database load, allowing users to access data efficiently; the privacy control settings allow users to store data securely. The results of the experiment show that the proposed system has better database access performance than a traditional relational database, with a small difference in database volume, thus proving that the proposed system can improve data storage performance. PMID:28033415

  7. Digital Video of Live-Scan Fingerprint Data

    National Institute of Standards and Technology Data Gateway

    NIST Digital Video of Live-Scan Fingerprint Data (PC database for purchase)   NIST Special Database 24 contains MPEG-2 (Moving Picture Experts Group) compressed digital video of live-scan fingerprint data. The database is being distributed for use in developing and testing of fingerprint verification systems.

  8. The HARPS-N archive through a Cassandra, NoSQL database suite?

    NASA Astrophysics Data System (ADS)

    Molinari, Emilio; Guerra, Jose; Harutyunyan, Avet; Lodi, Marcello; Martin, Adrian

    2016-07-01

    The TNG-INAF is developing the science archive for the WEAVE instrument. The underlying architecture of the archive is based on a non-relational database, more precisely an Apache Cassandra cluster, which uses NoSQL technology. In order to test and validate the use of this architecture, we created a local archive which we populated with all the HARPS-N spectra collected at the TNG since the instrument's start of operations in mid-2012, and developed tools for the analysis of this data set. The HARPS-N data set is two orders of magnitude smaller than WEAVE, but we want to demonstrate the ability to walk through a complete data set and produce scientific output, as valuable as that produced by an ordinary pipeline, though without directly accessing the FITS files. The analytics are done with Apache Solr and Spark, and on a relational PostgreSQL database. As an example, we produce observables such as metallicity indexes for the targets in the archive and compare the results with those from the regular HARPS-N data reduction software. The aim of this experiment is to explore the viability of a high-availability cluster and distributed NoSQL database as a platform for complex scientific analytics on a large data set, which will then be ported to the WEAVE Archive System (WAS) which we are developing for the WEAVE multi-object fiber spectrograph.
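
    The archive schema is not described in the abstract. As a rough sketch of the access pattern only, the example below uses the DataStax Python driver to pull spectrum metadata for a single target from a hypothetical Cassandra keyspace and table.

      # Rough sketch of a Cassandra query with the DataStax Python driver.
      # Keyspace, table and column names are hypothetical, not the real
      # HARPS-N archive schema.
      from cassandra.cluster import Cluster

      cluster = Cluster(["archive-node1", "archive-node2"])   # hypothetical hosts
      session = cluster.connect("harpsn_archive")             # hypothetical keyspace

      query = session.prepare(
          "SELECT obs_date, snr, rv FROM spectra WHERE target_name = ?")
      for row in session.execute(query, ["HD209458"]):
          print(row.obs_date, row.snr, row.rv)

      cluster.shutdown()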

  9. Rasdaman for Big Spatial Raster Data

    NASA Astrophysics Data System (ADS)

    Hu, F.; Huang, Q.; Scheele, C. J.; Yang, C. P.; Yu, M.; Liu, K.

    2015-12-01

    Spatial raster data have grown exponentially over the past decade. Recent advancements in data acquisition technology, such as remote sensing, have allowed us to collect massive observation data of various spatial resolutions and domain coverage. The volume, velocity, and variety of such spatial data, along with the computationally intensive nature of spatial queries, pose a grand challenge to storage technologies for effective big data management. While high performance computing platforms (e.g., cloud computing) can be used to solve the computing-intensive issues in big data analysis, data have to be managed in a way that is suitable for distributed parallel processing. Recently, rasdaman (raster data manager) has emerged as a scalable and cost-effective database solution to store and retrieve massive multi-dimensional arrays, such as sensor, image, and statistics data. Within this paper, the pros and cons of using rasdaman to manage and query spatial raster data will be examined and compared with other common approaches, including file-based systems, relational databases (e.g., PostgreSQL/PostGIS), and NoSQL databases (e.g., MongoDB and Hive). Earth Observing System (EOS) data collected from NASA's Atmospheric Scientific Data Center (ASDC) will be used and stored in these selected database systems, and a set of spatial and non-spatial queries will be designed to benchmark their performance on retrieving large-scale, multi-dimensional arrays of EOS data. Lessons learnt from using rasdaman will be discussed as well.

  10. Nationwide incidence of motor neuron disease using the French health insurance information system database.

    PubMed

    Kab, Sofiane; Moisan, Frédéric; Preux, Pierre-Marie; Marin, Benoît; Elbaz, Alexis

    2017-08-01

    There are no estimates of the nationwide incidence of motor neuron disease (MND) in France. We used the French health insurance information system to identify incident MND cases (2012-2014), and compared incidence figures to those from three external sources. We identified incident MND cases (2012-2014) based on three data sources (riluzole claims, hospitalisation records, long-term chronic disease benefits), and computed MND incidence by age, gender, and geographic region. We used French mortality statistics, Limousin ALS registry data, and previous European studies based on administrative databases to perform external comparisons. We identified 6553 MND incident cases. After standardisation to the United States 2010 population, the age/gender-standardised incidence was 2.72/100,000 person-years (males, 3.37; females, 2.17; male:female ratio = 1.53, 95% CI = 1.46-1.61). There was no major spatial difference in MND distribution. Our data were in agreement with the French death database (standardised mortality ratio = 1.01, 95% CI = 0.96-1.06) and Limousin ALS registry (standardised incidence ratio = 0.92, 95% CI = 0.72-1.15). Incidence estimates were in the same range as those from previous studies. We report French nationwide incidence estimates of MND. Administrative databases including hospital discharge data and riluzole claims offer an interesting approach to identify large population-based samples of patients with MND for epidemiologic studies and surveillance.

  11. Overview of Historical Earthquake Document Database in Japan and Future Development

    NASA Astrophysics Data System (ADS)

    Nishiyama, A.; Satake, K.

    2014-12-01

    In Japan, damage and disasters from historical large earthquakes have been documented and preserved. Compilation of historical earthquake documents started in the early 20th century, and 33 volumes of historical document source books (about 27,000 pages) have been published. However, these source books are not effectively utilized by researchers, due to contamination by low-reliability historical records and the difficulty of keyword searching by characters and dates. To overcome these problems and to promote historical earthquake studies in Japan, construction of a text database started in the 21st century. For historical earthquakes from the beginning of the 7th century to the early 17th century, the "Online Database of Historical Documents in Japanese Earthquakes and Eruptions in the Ancient and Medieval Ages" (Ishibashi, 2009) has already been constructed. Its authors investigated the source books or original texts of historical literature, emended the descriptions, and assigned the reliability of each historical document on the basis of written age. Another project compiled the historical documents for seven damaging earthquakes that occurred along the Sea of Japan coast of Honshu, central Japan, in the Edo period (from the beginning of the 17th century to the middle of the 19th century) and constructed a text database and a seismic intensity database. These are now available on the web (in Japanese only). However, only about 9% of the earthquake source books have been digitized so far. Therefore, we plan to digitize all of the remaining historical documents under the research program that started in 2014. The specification of the database will be similar to that of the previous ones. We also plan to combine this database with a liquefaction trace database, which will be constructed by another research program, by adding the location information described in the historical documents. The constructed database will be used to estimate the distributions of seismic intensities and tsunami heights.

  12. IDAAPM: integrated database of ADMET and adverse effects of predictive modeling based on FDA approved drug data.

    PubMed

    Legehar, Ashenafi; Xhaard, Henri; Ghemtio, Leo

    2016-01-01

    The disposition of a pharmaceutical compound within an organism, i.e. its Absorption, Distribution, Metabolism, Excretion, Toxicity (ADMET) properties and adverse effects, critically affects late stage failure of drug candidates and has led to the withdrawal of approved drugs. Computational methods are effective approaches to reduce the number of safety issues by analyzing possible links between chemical structures and ADMET or adverse effects, but this is limited by the size, quality, and heterogeneity of the data available from individual sources. Thus, large, clean and integrated databases of approved drug data, associated with fast and efficient predictive tools are desirable early in the drug discovery process. We have built a relational database (IDAAPM) to integrate available approved drug data such as drug approval information, ADMET and adverse effects, chemical structures and molecular descriptors, targets, bioactivity and related references. The database has been coupled with a searchable web interface and modern data analytics platform (KNIME) to allow data access, data transformation, initial analysis and further predictive modeling. Data were extracted from FDA resources and supplemented from other publicly available databases. Currently, the database contains information regarding about 19,226 FDA approval applications for 31,815 products (small molecules and biologics) with their approval history, 2505 active ingredients, together with as many ADMET properties, 1629 molecular structures, 2.5 million adverse effects and 36,963 experimental drug-target bioactivity data. IDAAPM is a unique resource that, in a single relational database, provides detailed information on FDA approved drugs including their ADMET properties and adverse effects, the corresponding targets with bioactivity data, coupled with a data analytics platform. It can be used to perform basic to complex drug-target ADMET or adverse effects analysis and predictive modeling. IDAAPM is freely accessible at http://idaapm.helsinki.fi and can be exploited through a KNIME workflow connected to the database. Graphical abstract: FDA approved drug data integration for predictive modeling.
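
    To show the kind of drug-to-adverse-effect query that such a relational design supports, here is a small sketch against a simplified, hypothetical schema; the real IDAAPM table layout is not detailed in the abstract.

      # Drug-to-adverse-effect join over a simplified, hypothetical schema
      # (not the actual IDAAPM tables).
      import sqlite3

      db = sqlite3.connect(":memory:")
      db.executescript("""
          CREATE TABLE drugs (drug_id INTEGER PRIMARY KEY, name TEXT, logp REAL);
          CREATE TABLE adverse_effects (drug_id INTEGER, effect TEXT, report_count INTEGER);
      """)
      db.executemany("INSERT INTO drugs VALUES (?, ?, ?)",
                     [(1, "drug_a", 2.1), (2, "drug_b", 4.7)])
      db.executemany("INSERT INTO adverse_effects VALUES (?, ?, ?)",
                     [(1, "nausea", 120), (1, "headache", 45), (2, "dizziness", 80)])

      rows = db.execute("""
          SELECT d.name, ae.effect, ae.report_count
          FROM drugs d JOIN adverse_effects ae ON ae.drug_id = d.drug_id
          WHERE d.logp > 2.0
          ORDER BY ae.report_count DESC
      """).fetchall()
      print(rows)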

  13. Database architectures for Space Telescope Science Institute

    NASA Astrophysics Data System (ADS)

    Lubow, Stephen

    1993-08-01

    At STScI nearly all large applications require database support. A general purpose architecture has been developed and is in use that relies upon an extended client-server paradigm. Processing is in general distributed across three processes, each of which generally resides on its own processor. Database queries are evaluated on one such process, called the DBMS server. The DBMS server software is provided by a database vendor. The application issues database queries and is called the application client. This client uses a set of generic DBMS application programming calls through our STDB/NET programming interface. Intermediate between the application client and the DBMS server is the STDB/NET server. This server accepts generic query requests from the application and converts them into the specific requirements of the DBMS server. In addition, it accepts query results from the DBMS server and passes them back to the application. Typically the STDB/NET server is local to the DBMS server, while the application client may be remote. The STDB/NET server provides additional capabilities such as database deadlock restart and performance monitoring. This architecture is currently in use for some major STScI applications, including the ground support system. We are currently investigating means of providing ad hoc query support to users through the above architecture. Such support is critical for providing flexible user interface capabilities. The Universal Relation advocated by Ullman, Kernighan, and others appears to be promising. In this approach, the user sees the entire database as a single table, thereby freeing the user from needing to understand the detailed schema. A software layer provides the translation between the user and detailed schema views of the database. However, many subtle issues arise in making this transformation. We are currently exploring this scheme for use in the Hubble Space Telescope user interface to the data archive system (DADS).
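
    A minimal sketch of the pattern described above, in which an application client issues generic queries to an intermediate server that translates them for a particular DBMS, is given below; the class and method names are invented and this is not the STDB/NET interface itself.

      # Generic-query client, intermediate translation layer, and a DBMS back end.
      # Names are invented; SQLite stands in for the vendor DBMS server.
      import sqlite3

      class GenericQuery:
          def __init__(self, table, columns, where=None):
              self.table, self.columns, self.where = table, columns, where

      class IntermediateServer:
          """Stands in for the intermediate layer between client and DBMS server."""
          def __init__(self, dbms_connection):
              self.conn = dbms_connection

          def run(self, q):
              sql = f"SELECT {', '.join(q.columns)} FROM {q.table}"
              params = ()
              if q.where:
                  sql += f" WHERE {q.where[0]} = ?"
                  params = (q.where[1],)
              return self.conn.execute(sql, params).fetchall()   # vendor-specific call

      conn = sqlite3.connect(":memory:")
      conn.execute("CREATE TABLE observations (id INTEGER, target TEXT)")
      conn.execute("INSERT INTO observations VALUES (1, 'NGC 1275')")
      server = IntermediateServer(conn)
      print(server.run(GenericQuery("observations", ["id", "target"], ("target", "NGC 1275"))))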

  14. [The 'Beijing clinical database' on severe acute respiratory syndrome patients: its design, process, quality control and evaluation].

    PubMed

    2004-04-01

    To develop a large database on the clinical presentation, treatment and prognosis of all clinically diagnosed severe acute respiratory syndrome (SARS) cases in Beijing during the 2003 "crisis", in order to conduct further clinical studies. The database was designed by specialists, under the organization of the Beijing Commanding Center for SARS Treatment and Cure, and includes 686 data items in six sub-databases: primary medical-care seeking, vital signs, common symptoms and signs, treatment, laboratory and auxiliary tests, and cost. All hospitals having received SARS inpatients were involved in the project. Clinical data were transferred and coded by trained doctors and data entry was carried out by trained nurses, according to a uniform protocol. A series of procedures was carried out before the database was finally established, including programmed logic checking, digit-by-digit checks on a 5% random sample, data linkage for transferred cases, coding of characterized information, database structure standardization, case review by a computer program according to the SARS Clinical Diagnosis Criteria issued by the Ministry of Health, and exclusion of unqualified patients. The database involved 2148 probable SARS cases in accordance with the clinical diagnosis criteria, including 1291 with complete records. All cases and record-complete cases showed an almost identical distribution in sex, age, occupation, residence area and time of onset. The completion rate of data was not significantly different between the two groups except for some items on primary medical-care seeking. Specifically, the data completion rate was 73%-100% for primary medical-care seeking, 90% for common symptoms and signs, 100% for treatment, 98% for temperature, 90% for pulse, 100% for outcomes and 98% for costs in hospital. The number of cases collected in the Beijing Clinical Database of SARS Patients was fairly complete, and cases with complete records were shown to be representative of all cases. The completeness of data was satisfactory for the primary clinical items, which allows for further clinical studies.

  15. Managing Data, Provenance and Chaos through Standardization and Automation at the Georgia Coastal Ecosystems LTER Site

    NASA Astrophysics Data System (ADS)

    Sheldon, W.

    2013-12-01

    Managing data for a large, multidisciplinary research program such as a Long Term Ecological Research (LTER) site is a significant challenge, but also presents unique opportunities for data stewardship. LTER research is conducted within multiple organizational frameworks (i.e. a specific LTER site as well as the broader LTER network), and addresses both specific goals defined in an NSF proposal as well as broader goals of the network; therefore, all LTER data can be linked to rich contextual information to guide interpretation and comparison. The challenge is how to link the data to this wealth of contextual metadata. At the Georgia Coastal Ecosystems LTER we developed an integrated information management system (GCE-IMS) to manage, archive and distribute data, metadata and other research products as well as manage project logistics, administration and governance (figure 1). This system allows us to store all project information in one place, and provide dynamic links through web applications and services to ensure content is always up to date on the web as well as in data set metadata. The database model supports tracking changes over time in personnel roles, projects and governance decisions, allowing these databases to serve as canonical sources of project history. Storing project information in a central database has also allowed us to standardize both the formatting and content of critical project information, including personnel names, roles, keywords, place names, attribute names, units, and instrumentation, providing consistency and improving data and metadata comparability. Lookup services for these standard terms also simplify data entry in web and database interfaces. We have also coupled the GCE-IMS to our MATLAB- and Python-based data processing tools (i.e. through database connections) to automate metadata generation and packaging of tabular and GIS data products for distribution. Data processing history is automatically tracked throughout the data lifecycle, from initial import through quality control, revision and integration by our data processing system (GCE Data Toolbox for MATLAB), and included in metadata for versioned data products. This high level of automation and system integration has proven very effective in managing the chaos and scalability of our information management program.

  16. Submarine canyons represent an essential habitat network for krill hotspots in a Large Marine Ecosystem.

    PubMed

    Santora, Jarrod A; Zeno, Ramona; Dorman, Jeffrey G; Sydeman, William J

    2018-05-15

    Submarine canyon systems are ubiquitous features of marine ecosystems, known to support high levels of biodiversity. Canyons may be important to benthic-pelagic ecosystem coupling, but their role in concentrating plankton and structuring pelagic communities is not well known. We hypothesize that at the scale of a large marine ecosystem, canyons provide a critical habitat network, which maintain energy flow and trophic interactions. We evaluate canyon characteristics relative to the distribution and abundance of krill, critically important prey in the California Current Ecosystem. Using a geological database, we conducted a census of canyon locations, evaluated their dimensions, and quantified functional relationships with krill hotspots (i.e., sites of persistently elevated abundance) derived from hydro-acoustic surveys. We found that 76% of krill hotspots occurred within and adjacent to canyons. Most krill hotspots were associated with large shelf-incising canyons. Krill hotspots and canyon dimensions displayed similar coherence as a function of latitude and indicate a potential regional habitat network. The latitudinal migration of many fish, seabirds and mammals may be enhanced by using this canyon-krill network to maintain foraging opportunities. Biogeographic assessments and predictions of krill and krill-predator distributions under climate change may be improved by accounting for canyons in habitat models.

  17. ClearedLeavesDB: an online database of cleared plant leaf images

    PubMed Central

    2014-01-01

    Background: Leaf vein networks are critical to both the structure and function of leaves. A growing body of recent work has linked leaf vein network structure to the physiology, ecology and evolution of land plants. In the process, multiple institutions and individual researchers have assembled collections of cleared leaf specimens in which vascular bundles (veins) are rendered visible. In an effort to facilitate analysis and digitally preserve these specimens, high-resolution images are usually created, either of entire leaves or of magnified leaf subsections. In a few cases, collections of digital images of cleared leaves are available for use online. However, these collections do not share a common platform nor is there a means to digitally archive cleared leaf images held by individual researchers (in addition to those held by institutions). Hence, there is a growing need for a digital archive that enables online viewing, sharing and disseminating of cleared leaf image collections held by both institutions and individual researchers. Description: The Cleared Leaf Image Database (ClearedLeavesDB) is an online web-based resource for a community of researchers to contribute, access and share cleared leaf images. ClearedLeavesDB leverages resources of large-scale, curated collections while enabling the aggregation of small-scale collections within the same online platform. ClearedLeavesDB is built on Drupal, an open source content management platform. It allows plant biologists to store leaf images online with corresponding meta-data, share image collections with a user community and discuss images and collections via a common forum. We provide tools to upload processed images and results to the database via a web services client application that can be downloaded from the database. Conclusions: We developed ClearedLeavesDB, a database focusing on cleared leaf images that combines interactions between users and data via an intuitive web interface. The web interface allows storage of large collections and integrates with leaf image analysis applications via an open application programming interface (API). The open API allows uploading of processed images and other trait data to the database, further enabling distribution and documentation of analyzed data within the community. The initial database is seeded with nearly 19,000 cleared leaf images representing over 40 GB of image data. Extensible storage and growth of the database is ensured by using the data storage resources of the iPlant Discovery Environment. ClearedLeavesDB can be accessed at http://clearedleavesdb.org. PMID:24678985

  18. ClearedLeavesDB: an online database of cleared plant leaf images.

    PubMed

    Das, Abhiram; Bucksch, Alexander; Price, Charles A; Weitz, Joshua S

    2014-03-28

    Leaf vein networks are critical to both the structure and function of leaves. A growing body of recent work has linked leaf vein network structure to the physiology, ecology and evolution of land plants. In the process, multiple institutions and individual researchers have assembled collections of cleared leaf specimens in which vascular bundles (veins) are rendered visible. In an effort to facilitate analysis and digitally preserve these specimens, high-resolution images are usually created, either of entire leaves or of magnified leaf subsections. In a few cases, collections of digital images of cleared leaves are available for use online. However, these collections do not share a common platform nor is there a means to digitally archive cleared leaf images held by individual researchers (in addition to those held by institutions). Hence, there is a growing need for a digital archive that enables online viewing, sharing and disseminating of cleared leaf image collections held by both institutions and individual researchers. The Cleared Leaf Image Database (ClearedLeavesDB), is an online web-based resource for a community of researchers to contribute, access and share cleared leaf images. ClearedLeavesDB leverages resources of large-scale, curated collections while enabling the aggregation of small-scale collections within the same online platform. ClearedLeavesDB is built on Drupal, an open source content management platform. It allows plant biologists to store leaf images online with corresponding meta-data, share image collections with a user community and discuss images and collections via a common forum. We provide tools to upload processed images and results to the database via a web services client application that can be downloaded from the database. We developed ClearedLeavesDB, a database focusing on cleared leaf images that combines interactions between users and data via an intuitive web interface. The web interface allows storage of large collections and integrates with leaf image analysis applications via an open application programming interface (API). The open API allows uploading of processed images and other trait data to the database, further enabling distribution and documentation of analyzed data within the community. The initial database is seeded with nearly 19,000 cleared leaf images representing over 40 GB of image data. Extensible storage and growth of the database is ensured by using the data storage resources of the iPlant Discovery Environment. ClearedLeavesDB can be accessed at http://clearedleavesdb.org.

  19. Security in the CernVM File System and the Frontier Distributed Database Caching System

    NASA Astrophysics Data System (ADS)

    Dykstra, D.; Blomer, J.

    2014-06-01

    Both the CernVM File System (CVMFS) and the Frontier Distributed Database Caching System (Frontier) distribute centrally updated data worldwide for LHC experiments using http proxy caches. Neither system provides privacy or access control on reading the data, but both control access to updates of the data and can guarantee the authenticity and integrity of the data transferred to clients over the internet. CVMFS has since its early days required digital signatures and secure hashes on all distributed data, and recently Frontier has added X.509-based authenticity and integrity checking. In this paper we detail and compare the security models of CVMFS and Frontier.
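
    The sketch below illustrates, in Python, the general integrity pattern the abstract describes for both systems: content is identified by a secure hash, and the hash is covered by a digital signature, so clients can verify data fetched through untrusted HTTP proxy caches. It is a minimal stand-in using hashlib and the third-party cryptography package, not the actual CVMFS or Frontier code; the key and payload are invented.

      # Illustrative sketch of the integrity pattern both systems rely on:
      # content is identified by a secure hash and the hash is digitally signed,
      # so clients can trust data fetched through untrusted HTTP proxy caches.
      # This is NOT the CVMFS or Frontier implementation.
      import hashlib
      from cryptography.hazmat.primitives import hashes
      from cryptography.hazmat.primitives.asymmetric import padding, rsa

      # Publisher side: hash the payload and sign the hash with a private key.
      private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
      payload = b"conditions-data-or-file-catalog-bytes"
      digest = hashlib.sha256(payload).hexdigest().encode()
      signature = private_key.sign(digest, padding.PKCS1v15(), hashes.SHA256())

      # Client side: recompute the hash, then verify the signature with the
      # publisher's public key (verify() raises InvalidSignature on tampering).
      public_key = private_key.public_key()
      assert hashlib.sha256(payload).hexdigest().encode() == digest
      public_key.verify(signature, digest, padding.PKCS1v15(), hashes.SHA256())
      print("payload hash and signature verified")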

  20. The distribution of common construction materials at risk to acid deposition in the United States

    NASA Astrophysics Data System (ADS)

    Lipfert, Frederick W.; Daum, Mary L.

    Information on the geographic distribution of various types of exposed materials is required to estimate the economic costs of damage to construction materials from acid deposition. This paper focuses on the identification, evaluation and interpretation of data describing the distributions of exterior construction materials, primarily in the United States. This information could provide guidance on how data needed for future economic assessments might be acquired in the most cost-effective ways. Materials distribution surveys from 16 cities in the U.S. and Canada and five related databases from government agencies and trade organizations were examined. Data on residential buildings are more commonly available than on nonresidential buildings; little geographically resolved information on distributions of materials in infrastructure was found. Survey results generally agree with the appropriate ancillary databases, but the usefulness of the databases is often limited by their coarse spatial resolution. Information on those materials which are most sensitive to acid deposition is especially scarce. Since a comprehensive error analysis has never been performed on the data required for an economic assessment, it is not possible to specify the corresponding detailed requirements for data on the distributions of materials.

  1. TabSQL: a MySQL tool to facilitate mapping user data to public databases.

    PubMed

    Xia, Xiao-Qin; McClelland, Michael; Wang, Yipeng

    2010-06-23

    With advances in high-throughput genomics and proteomics, it is challenging for biologists to deal with large data files and to map their data to annotations in public databases. We developed TabSQL, a MySQL-based application tool, for viewing, filtering and querying data files with large numbers of rows. TabSQL provides functions for downloading and installing table files from public databases including the Gene Ontology database (GO), the Ensembl databases, and genome databases from the UCSC genome bioinformatics site. Any other database that provides tab-delimited flat files can also be imported. The downloaded gene annotation tables can be queried together with users' data in TabSQL using either a graphic interface or command line. TabSQL allows queries across the user's data and public databases without programming. It is a convenient tool for biologists to annotate and enrich their data.
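
    The following sketch shows the kind of cross-table query TabSQL enables: a user's gene list joined against an imported annotation table. Python's built-in sqlite3 stands in for MySQL here, and the table and column names are invented for illustration; they are not TabSQL's actual schema.

      # Sketch of the kind of query TabSQL enables: a user's gene list joined
      # against an imported annotation table. sqlite3 stands in for MySQL, and
      # the table/column names are assumptions for illustration only.
      import sqlite3

      con = sqlite3.connect(":memory:")
      con.executescript("""
          CREATE TABLE user_genes (gene_id TEXT, log2_fold_change REAL);
          CREATE TABLE go_annotation (gene_id TEXT, go_term TEXT, description TEXT);
          INSERT INTO user_genes VALUES ('ENSG000001', 2.4), ('ENSG000002', -1.1);
          INSERT INTO go_annotation VALUES
              ('ENSG000001', 'GO:0006915', 'apoptotic process'),
              ('ENSG000002', 'GO:0008283', 'cell population proliferation');
      """)

      # Annotate the user's differentially expressed genes with GO terms.
      rows = con.execute("""
          SELECT u.gene_id, u.log2_fold_change, g.go_term, g.description
          FROM user_genes u
          JOIN go_annotation g ON g.gene_id = u.gene_id
          WHERE ABS(u.log2_fold_change) > 1.0
      """).fetchall()
      for row in rows:
          print(row)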

  2. TabSQL: a MySQL tool to facilitate mapping user data to public databases

    PubMed Central

    2010-01-01

    Background With advances in high-throughput genomics and proteomics, it is challenging for biologists to deal with large data files and to map their data to annotations in public databases. Results We developed TabSQL, a MySQL-based application tool, for viewing, filtering and querying data files with large numbers of rows. TabSQL provides functions for downloading and installing table files from public databases including the Gene Ontology database (GO), the Ensembl databases, and genome databases from the UCSC genome bioinformatics site. Any other database that provides tab-delimited flat files can also be imported. The downloaded gene annotation tables can be queried together with users' data in TabSQL using either a graphic interface or command line. Conclusions TabSQL allows queries across the user's data and public databases without programming. It is a convenient tool for biologists to annotate and enrich their data. PMID:20573251

  3. Maritime Operations in Disconnected, Intermittent, and Low-Bandwidth Environments

    DTIC Science & Technology

    2013-06-01

    of a Dynamic Distributed Database (DDD) is a core element enabling the distributed operation of networks and applications, as described in this...document. The DDD is a database containing all the relevant information required to reconfigure the applications, routing, and other network services...optimize application configuration. Figure 5 gives a snapshot of entries in the DDD. In current testing, the DDD is replicated using Domino

  4. Data model and relational database design for the New Jersey Water-Transfer Data System (NJWaTr)

    USGS Publications Warehouse

    Tessler, Steven

    2003-01-01

    The New Jersey Water-Transfer Data System (NJWaTr) is a database design for the storage and retrieval of water-use data. NJWaTr can manage data encompassing many facets of water use, including (1) the tracking of various types of water-use activities (withdrawals, returns, transfers, distributions, consumptive-use, wastewater collection, and treatment); (2) the storage of descriptions, classifications and locations of places and organizations involved in water-use activities; (3) the storage of details about measured or estimated volumes of water associated with water-use activities; and (4) the storage of information about data sources and water resources associated with water use. In NJWaTr, each water transfer occurs unidirectionally between two site objects, and the sites and conveyances form a water network. The core entities in the NJWaTr model are site, conveyance, transfer/volume, location, and owner. Other important entities include water resource (used for withdrawals and returns), data source, permit, and alias. Multiple water-exchange estimates based on different methods or data sources can be stored for individual transfers. Storage of user-defined details is accommodated for several of the main entities. Many tables contain classification terms to facilitate the detailed description of data items and can be used for routine or custom data summarization. NJWaTr accommodates single-user and aggregate-user water-use data, can be used for large or small water-network projects, and is available as a stand-alone Microsoft Access database. Data stored in the NJWaTr structure can be retrieved in user-defined combinations to serve visualization and analytical applications. Users can customize and extend the database, link it to other databases, or implement the design in other relational database applications.
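
    A minimal relational sketch of three of the core entities named above (site, conveyance, and transfer), expressed in Python with sqlite3, is shown below. The column choices and sample values are assumptions for illustration; the published NJWaTr design is considerably more detailed.

      # Minimal relational sketch of the NJWaTr core entities named above
      # (site, conveyance, transfer). Column choices are assumptions; the
      # published design is far more detailed.
      import sqlite3

      con = sqlite3.connect(":memory:")
      con.executescript("""
          CREATE TABLE site       (site_id INTEGER PRIMARY KEY, name TEXT, site_type TEXT);
          CREATE TABLE conveyance (conveyance_id INTEGER PRIMARY KEY,
                                   from_site INTEGER REFERENCES site(site_id),
                                   to_site   INTEGER REFERENCES site(site_id));
          CREATE TABLE transfer   (transfer_id INTEGER PRIMARY KEY,
                                   conveyance_id INTEGER REFERENCES conveyance(conveyance_id),
                                   month TEXT, volume_mgd REAL, estimation_method TEXT);
          INSERT INTO site VALUES (1, 'Well field A', 'withdrawal'),
                                  (2, 'Treatment plant B', 'treatment');
          INSERT INTO conveyance VALUES (10, 1, 2);
          INSERT INTO transfer VALUES (100, 10, '2003-06', 1.75, 'metered');
      """)

      # Each transfer occurs unidirectionally between two sites along a conveyance.
      print(con.execute("""
          SELECT s1.name, s2.name, t.month, t.volume_mgd
          FROM transfer t
          JOIN conveyance c ON c.conveyance_id = t.conveyance_id
          JOIN site s1 ON s1.site_id = c.from_site
          JOIN site s2 ON s2.site_id = c.to_site
      """).fetchall())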

  5. WOVOdat, A Worldwide Volcano Unrest Database, to Improve Eruption Forecasts

    NASA Astrophysics Data System (ADS)

    Widiwijayanti, C.; Costa, F.; Win, N. T. Z.; Tan, K.; Newhall, C. G.; Ratdomopurbo, A.

    2015-12-01

    WOVOdat is the World Organization of Volcano Observatories' Database of Volcanic Unrest, an international effort to develop common standards for compiling and storing data on volcanic unrest in a centralized database that is freely web-accessible for reference during volcanic crises, comparative studies, and basic research on pre-eruption processes. WOVOdat will be to volcanology as an epidemiological database is to medicine. Despite the large spectrum of monitoring techniques, interpreting monitoring data throughout the evolution of unrest and making timely forecasts remain the most challenging tasks for volcanologists. The field of eruption forecasting is becoming more quantitative, based on the understanding of pre-eruptive magmatic processes and the dynamic interaction between variables that are at play in a volcanic system. Such forecasts must also acknowledge and express uncertainties; therefore most current research in this field focuses on the application of event tree analysis to reflect multiple possible scenarios and the probability of each scenario. Such forecasts are critically dependent on comprehensive and authoritative global volcano unrest data sets - the very information currently collected in WOVOdat. As the database becomes more complete, Boolean searches, side-by-side digital (and thus scalable) comparisons of unrest, and pattern recognition will generate reliable results. Statistical distributions obtained from WOVOdat can then be used to estimate the probabilities of each scenario after specific patterns of unrest. We have established the main web interface for data submission and visualization, and have now incorporated ~20% of worldwide unrest data into the database, covering more than 100 eruptive episodes. In the upcoming years we will concentrate on acquiring data from volcano observatories, developing a robust data query interface, optimizing data mining, and creating tools by which WOVOdat can be used for probabilistic eruption forecasting. The more data in WOVOdat, the more useful it will be.
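
    As a toy illustration of the event-tree forecasting the abstract refers to, the Python sketch below multiplies conditional probabilities along one branch of an event tree to obtain a scenario probability. All branch labels and numbers are invented and are not WOVOdat outputs.

      # Toy event-tree sketch: each branch carries a conditional probability and
      # a scenario's probability is the product along its path. Numbers invented.
      def scenario_probability(branches):
          """Multiply conditional probabilities along one path of the event tree."""
          p = 1.0
          for _, conditional_p in branches:
              p *= conditional_p
          return p

      # unrest -> magmatic origin -> eruption -> VEI >= 3
      path = [("unrest observed",    1.00),
              ("unrest is magmatic", 0.60),
              ("eruption occurs",    0.35),
              ("eruption is VEI>=3", 0.20)]
      print(f"P(VEI>=3 eruption | unrest) = {scenario_probability(path):.3f}")  # 0.042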

  6. The EXOSAT database and archive

    NASA Technical Reports Server (NTRS)

    Reynolds, A. P.; Parmar, A. N.

    1992-01-01

    The EXOSAT database provides on-line access to the results and data products (spectra, images, and lightcurves) from the EXOSAT mission as well as access to data and logs from a number of other missions (such as EINSTEIN, COS-B, ROSAT, and IRAS). In addition, a number of familiar optical, infrared, and x ray catalogs, including the Hubble Space Telescope (HST) guide star catalog are available. The complete database is located at the EXOSAT observatory at ESTEC in the Netherlands and is accessible remotely via a captive account. The database management system was specifically developed to efficiently access the database and to allow the user to perform statistical studies on large samples of astronomical objects as well as to retrieve scientific and bibliographic information on single sources. The system was designed to be mission independent and includes timing, image processing, and spectral analysis packages as well as software to allow the easy transfer of analysis results and products to the user's own institute. The archive at ESTEC comprises a subset of the EXOSAT observations, stored on magnetic tape. Observations of particular interest were copied in compressed format to an optical jukebox, allowing users to retrieve and analyze selected raw data entirely from their terminals. Such analysis may be necessary if the user's needs are not accommodated by the products contained in the database (in terms of time resolution, spectral range, and the finesse of the background subtraction, for instance). Long-term archiving of the full final observation data is taking place at ESRIN in Italy as part of the ESIS program, again using optical media, and ESRIN have now assumed responsibility for distributing the data to the community. Tests showed that raw observational data (typically several tens of megabytes for a single target) can be transferred via the existing networks in reasonable time.

  7. CHERNOLITTM. Chernobyl Bibliographic Search System

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Caff, F., Jr.; Kennedy, R.A.; Mahaffey, J.A.

    1992-03-02

    The Chernobyl Bibliographic Search System (Chernolit TM) provides bibliographic data in a usable format for research studies relating to the Chernobyl nuclear accident that occurred in the former Ukrainian Republic of the USSR in 1986. Chernolit TM is a portable and easy to use product. The bibliographic data is provided under the control of a graphical user interface so that the user may quickly and easily retrieve pertinent information from the large database. The user may search the database for occurrences of words, names, or phrases; view bibliographic references on screen; and obtain reports of selected references. Reports may be viewed on the screen, printed, or accumulated in a folder that is written to a disk file when the user exits the software. Chernolit TM provides a cost-effective alternative to multiple, independent literature searches. Forty-five hundred references concerning the accident, including abstracts, are distributed with Chernolit TM. The data contained in the database were obtained from electronic literature searches and from requested donations from individuals and organizations. These literature searches interrogated the Energy Science and Technology database (formerly DOE ENERGY) of the DIALOG Information Retrieval Service. Energy Science and Technology, provided by the U.S. DOE, Washington, D.C., is a multi-disciplinary database containing references to the world's scientific and technical literature on energy. All unclassified information processed at the Office of Scientific and Technical Information (OSTI) of the U.S. DOE is included in the database. In addition, information on many documents has been manually added to Chernolit TM. Most of this information was obtained in response to requests for data sent to people and/or organizations throughout the world.

  8. Chernobyl Bibliographic Search System

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Carr, Jr, F.; Kennedy, R. A.; Mahaffey, J. A.

    1992-05-11

    The Chernobyl Bibliographic Search System (Chernolit TM) provides bibliographic data in a usable format for research studies relating to the Chernobyl nuclear accident that occurred in the former Ukrainian Republic of the USSR in 1986. Chernolit TM is a portable and easy to use product. The bibliographic data is provided under the control of a graphical user interface so that the user may quickly and easily retrieve pertinent information from the large database. The user may search the database for occurrences of words, names, or phrases; view bibliographic references on screen; and obtain reports of selected references. Reports may be viewed on the screen, printed, or accumulated in a folder that is written to a disk file when the user exits the software. Chernolit TM provides a cost-effective alternative to multiple, independent literature searches. Forty-five hundred references concerning the accident, including abstracts, are distributed with Chernolit TM. The data contained in the database were obtained from electronic literature searches and from requested donations from individuals and organizations. These literature searches interrogated the Energy Science and Technology database (formerly DOE ENERGY) of the DIALOG Information Retrieval Service. Energy Science and Technology, provided by the U.S. DOE, Washington, D.C., is a multi-disciplinary database containing references to the world's scientific and technical literature on energy. All unclassified information processed at the Office of Scientific and Technical Information (OSTI) of the U.S. DOE is included in the database. In addition, information on many documents has been manually added to Chernolit TM. Most of this information was obtained in response to requests for data sent to people and/or organizations throughout the world.

  9. An integrated database on ticks and tick-borne zoonoses in the tropics and subtropics with special reference to developing and emerging countries.

    PubMed

    Vesco, Umberto; Knap, Nataša; Labruna, Marcelo B; Avšič-Županc, Tatjana; Estrada-Peña, Agustín; Guglielmone, Alberto A; Bechara, Gervasio H; Gueye, Arona; Lakos, Andras; Grindatto, Anna; Conte, Valeria; De Meneghi, Daniele

    2011-05-01

    Tick-borne zoonoses (TBZ) are emerging diseases worldwide. A large amount of information (e.g. case reports, results of epidemiological surveillance, etc.) is dispersed through various reference sources (ISI and non-ISI journals, conference proceedings, technical reports, etc.). An integrated database, derived from the ICTTD-3 project (http://www.icttd.nl), was developed in order to gather TBZ records in the (sub-)tropics, collected both by the authors and collaborators worldwide. A dedicated website (http://www.tickbornezoonoses.org) was created to promote collaboration and circulate information. Data collected are made freely available to researchers for analysis by spatial methods, integrating mapped ecological factors for predicting TBZ risk. The authors present the assembly process of the TBZ database: the compilation of an updated list of TBZ relevant for the (sub-)tropics, the database design and its structure, the method of bibliographic search, and the assessment of spatial precision of geo-referenced records. At the time of writing, 725 records, extracted from 337 publications related to 59 countries in the (sub-)tropics, have been entered in the database. TBZ distribution maps were also produced. Imported cases have also been accounted for. The most important datasets with geo-referenced records were those on Spotted Fever Group rickettsiosis in Latin-America and Crimean-Congo Haemorrhagic Fever in Africa. The authors stress the need for international collaboration in data collection to update and improve the database. Supervision of entered data remains necessary. Means to foster collaboration are discussed. The paper is also intended to describe the challenges encountered in assembling spatial data from various sources and to help develop similar data collections.

  10. Development and application of a database of food ingredient fraud and economically motivated adulteration from 1980 to 2010.

    PubMed

    Moore, Jeffrey C; Spink, John; Lipp, Markus

    2012-04-01

    Food ingredient fraud and economically motivated adulteration are emerging risks, but a comprehensive compilation of information about known problematic ingredients and detection methods does not currently exist. The objectives of this research were to collect such information from publicly available articles in scholarly journals and general media, organize it into a database, and review and analyze the data to identify trends. The result is a database that will be published in the US Pharmacopeial Convention's Food Chemicals Codex, 8th edition, and includes 1305 records, including 1000 records with analytical methods collected from 677 references. Olive oil, milk, honey, and saffron were the most common targets for adulteration reported in scholarly journals, and potentially harmful issues identified include spices diluted with lead chromate and lead tetraoxide, substitution of Chinese star anise with toxic Japanese star anise, and melamine adulteration of high protein content foods. High-performance liquid chromatography and infrared spectroscopy were the most common analytical detection procedures, and chemometrics data analysis was used in a large number of reports. Future expansion of this database will include additional publicly available articles published before 1980 and in other languages, as well as data outside the public domain. The authors recommend in-depth analyses of individual incidents. This report describes the development and application of a database of food ingredient fraud issues from publicly available references. The database provides baseline information and data useful to governments, agencies, and individual companies assessing the risks of specific products produced in specific regions as well as products distributed and sold in other regions. In addition, the report describes current analytical technologies for detecting food fraud and identifies trends and developments. © 2012 US Pharmacopeia. Journal of Food Science © 2012 Institute of Food Technologists®

  11. Creating databases for biological information: an introduction.

    PubMed

    Stein, Lincoln

    2002-08-01

    The essence of bioinformatics is dealing with large quantities of information. Whether it be sequencing data, microarray data files, mass spectrometric data (e.g., fingerprints), the catalog of strains arising from an insertional mutagenesis project, or even large numbers of PDF files, there inevitably comes a time when the information can simply no longer be managed with files and directories. This is where databases come into play. This unit briefly reviews the characteristics of several database management systems, including flat file, indexed file, and relational databases, as well as ACeDB. It compares their strengths and weaknesses and offers some general guidelines for selecting an appropriate database management system.

  12. Large-Scale Spatial Distribution Patterns of Gastropod Assemblages in Rocky Shores

    PubMed Central

    Miloslavich, Patricia; Cruz-Motta, Juan José; Klein, Eduardo; Iken, Katrin; Weinberger, Vanessa; Konar, Brenda; Trott, Tom; Pohle, Gerhard; Bigatti, Gregorio; Benedetti-Cecchi, Lisandro; Shirayama, Yoshihisa; Mead, Angela; Palomo, Gabriela; Ortiz, Manuel; Gobin, Judith; Sardi, Adriana; Díaz, Juan Manuel; Knowlton, Ann; Wong, Melisa; Peralta, Ana C.

    2013-01-01

    Gastropod assemblages from nearshore rocky habitats were studied over large spatial scales to (1) describe broad-scale patterns in assemblage composition, including patterns by feeding modes, (2) identify latitudinal pattern of biodiversity, i.e., richness and abundance of gastropods and/or regional hotspots, and (3) identify potential environmental and anthropogenic drivers of these assemblages. Gastropods were sampled from 45 sites distributed within 12 Large Marine Ecosystem regions (LME) following the NaGISA (Natural Geography in Shore Areas) standard protocol (www.nagisa.coml.org). A total of 393 gastropod taxa from 87 families were collected. Eight of these families (9.2%) appeared in four or more different LMEs. Among these, the Littorinidae was the most widely distributed (8 LMEs) followed by the Trochidae and the Columbellidae (6 LMEs). In all regions, assemblages were dominated by few species, the most diverse and abundant of which were herbivores. No latitudinal gradients were evident in relation to species richness or densities among sampling sites. Highest diversity was found in the Mediterranean and in the Gulf of Alaska, while highest densities were found at different latitudes and represented by few species within one genus (e.g. Afrolittorina in the Agulhas Current, Littorina in the Scotian Shelf, and Lacuna in the Gulf of Alaska). No significant correlation was found between species composition and environmental variables (r≤0.355, p>0.05). Contributing variables to this low correlation included invasive species, inorganic pollution, SST anomalies, and chlorophyll-a anomalies. Despite data limitations in this study which restrict conclusions in a global context, this work represents the first effort to sample gastropod biodiversity on rocky shores using a standardized protocol across a wide scale. Our results will generate more work to build global databases allowing for large-scale diversity comparisons of rocky intertidal assemblages. PMID:23967204

  13. Ultraviolet studies of the intergalactic medium, active galactic nuclei, and the low-z Ly-alpha forest

    NASA Astrophysics Data System (ADS)

    Penton, Steven Victor

    1999-05-01

    A database of all active galactic nuclei (AGN) observed with the International Ultraviolet Explorer (IUE, 1976-1995) was created to determine the brightest UV (1250 Å) extragalactic sources. Combined spectra and continuum lightcurves are available for ~700 AGN. Fifteen targets were selected from this database for observation of the low-z Lyα forest with the Hubble Space Telescope. These observations were taken with the Goddard High Resolution Spectrograph and the G160M grating (1991-1997). 111 Lyα absorbers with significance level >3σ were detected in the redshift range 0.002 < z < 0.069. This thesis evaluates the physical properties of these Lyα absorbers and compares them to their high-z counterparts. In addition, we use large galaxy catalogs (i.e. the CfA Redshift Survey) to examine the relationship between known galaxies and the low-z Lyα forest. We find that the low-z absorbers are similar in physical characteristics and density to those detected at high-z. Some of these clouds appear to be primordial matter, owing to the lack of detected metallicity. A comparison to the known galaxy distribution indicates that the low-z Lyα forest clusters less than galaxies, but more than random. This suggests that at least a fraction of the absorbers are associated with the gas in galaxy associations (i.e. filaments), while a second population is distributed more uniformly. Over equal pathlengths (cΔz ~60,000 km s^-1 each) of galaxy-rich and galaxy-poor environments (voids), we determine that 80% of Lyα absorbers are near large-scale galactic structures (i.e. filaments), while 20% are in galaxy voids.

  14. Algorithm development and the clinical and economic burden of Cushing's disease in a large US health plan database.

    PubMed

    Burton, Tanya; Le Nestour, Elisabeth; Neary, Maureen; Ludlam, William H

    2016-04-01

    This study aimed to develop an algorithm to identify patients with Cushing's disease (CD), and to quantify the clinical and economic burden that patients with CD face compared to CD-free controls. A retrospective cohort study of CD patients was conducted in a large US commercial health plan database between 1/1/2007 and 12/31/2011. A control group with no evidence of CD during the same time was matched 1:3 based on demographics. Comorbidity rates were compared using Poisson models and health care costs were compared using robust variance estimation. A case-finding algorithm identified 877 CD patients, who were matched to 2631 CD-free controls. The age and sex distribution of the selected population matched the known epidemiology of CD. CD patients were found to have comorbidity rates that were two to five times higher and health care costs that were four to seven times higher than CD-free controls. An algorithm based on eight pituitary conditions and procedures appeared to identify CD patients in a claims database without a unique diagnosis code. Young CD patients had high rates of comorbidities that are more commonly observed in an older population (e.g., diabetes, hypertension, and cardiovascular disease). Observed health care costs were also high for CD patients compared to CD-free controls, but may have been even higher if the sample had included healthier controls with no health care use as well. Earlier diagnosis, improved surgery success rates, and better treatments may all help to reduce the chronic comorbidity and high health care costs associated with CD.

  15. Frequency of pacemaker malfunction associated with monopolar electrosurgery during pulse generator replacement or upgrade surgery.

    PubMed

    Lin, Yun; Melby, Daniel P; Krishnan, Balaji; Adabag, Selcuk; Tholakanahalli, Venkatakrishna; Li, Jian-Ming

    2017-08-01

    The aim of this study is to investigate the frequency of electrosurgery-related pacemaker malfunction. A retrospective study was conducted to investigate electrosurgery-related pacemaker malfunction in consecutive patients undergoing pulse generator (PG) replacement or upgrade at two large hospitals in Minneapolis, MN between January 2011 and January 2014. The occurrence of this pacemaker malfunction was then studied by using the MAUDE database for all four major device vendors. A total of 1398 consecutive patients from 2 large tertiary referral centers in Minneapolis, MN undergoing PG replacement or upgrade surgery were retrospectively studied. Four patients (0.3% of all patients), all with pacemakers from St Jude Medical (2.8%, 4 of 142), had output failure or an inappropriately low pacing rate below 30 bpm during electrosurgery, despite being programmed in an asynchronous mode. During the same period, 1174 cases of pacemaker malfunction were reported on the same models in the MAUDE database, 37 of which (3.2%) were electrosurgery-related. Twenty-four cases (65%) had output failure or an inappropriately low pacing rate. The distribution of adverse events was loss of pacing (59.5%), reversion to backup pacing (32.4%), inappropriately low pacing rate (5.4%), and ventricular fibrillation (2.7%). The majority of these (78.5%) occurred during PG replacement at ERI or upgrade surgery. No electrosurgery-related malfunction was found in the MAUDE database among 862 pacemaker malfunction cases from other vendors during the same period. Electrosurgery during PG replacement or upgrade surgery can trigger output failure or an inappropriately low pacing rate in certain models of modern pacemakers. Caution should be taken for pacemaker-dependent patients.

  16. Inferring rupture characteristics using new databases for 3D slab geometry and earthquake rupture models

    NASA Astrophysics Data System (ADS)

    Hayes, G. P.; Plescia, S. M.; Moore, G.

    2017-12-01

    The U.S. Geological Survey National Earthquake Information Center has recently published a database of finite fault models for globally distributed M7.5+ earthquakes since 1990. Concurrently, we have also compiled a database of three-dimensional slab geometry models for all global subduction zones, to update and replace Slab1.0. Here, we use these two new and valuable resources to infer characteristics of earthquake rupture and propagation in subduction zones, where the vast majority of large-to-great-sized earthquakes occur. For example, we can test questions that are fairly prevalent in seismological literature. Do large ruptures preferentially occur where subduction zones are flat (e.g., Bletery et al., 2016)? Can `flatness' be mapped to understand and quantify earthquake potential? Do the ends of ruptures correlate with significant changes in slab geometry, and/or bathymetric features entering the subduction zone? Do local subduction zone geometry changes spatially correlate with areas of low slip in rupture models (e.g., Moreno et al., 2012)? Is there a correlation between average seismogenic zone dip, and/or seismogenic zone width, and earthquake size? (e.g., Hayes et al., 2012; Heuret et al., 2011). These issues are fundamental to the understanding of earthquake rupture dynamics and subduction zone seismogenesis, and yet many are poorly understood or are still debated in scientific literature. We attempt to address these questions and similar issues in this presentation, and show how these models can be used to improve our understanding of earthquake hazard in subduction zones.

  17. Macrostrat: A Platform for Geological Data Integration and Deep-Time Earth Crust Research

    NASA Astrophysics Data System (ADS)

    Peters, Shanan E.; Husson, Jon M.; Czaplewski, John

    2018-04-01

    Characterizing the lithology, age, and physical-chemical properties of rocks and sediments in the Earth's upper crust is necessary to fully assess energy, water, and mineral resources and to address many fundamental questions. Although a large number of geological maps, regional geological syntheses, and sample-based measurements have been produced, there is no openly available database that integrates rock record-derived data, while also facilitating large-scale, quantitative characterization of the volume, age, and material properties of the upper crust. Here we describe Macrostrat, a relational geospatial database and supporting cyberinfrastructure that is designed to enable quantitative spatial and geochronological analyses of the entire assemblage of surface and subsurface sedimentary, igneous, and metamorphic rocks. Macrostrat contains general, comprehensive summaries of the age and properties of 33,903 lithologically and chronologically defined geological units distributed across 1,474 regions in North and South America, the Caribbean, New Zealand, and the deep sea. Sample-derived data, including fossil occurrences in the Paleobiology Database, more than 180,000 geochemical and outcrop-derived measurements, and more than 2.3 million bedrock geologic map units from over 200 map sources, are linked to specific Macrostrat units and/or lithologies. Macrostrat has generated numerous quantitative results and its infrastructure is used as a data platform in several independently developed mobile applications. It is necessary to expand geographic coverage and to refine age models and material properties to arrive at a more precise characterization of the upper crust globally and test fundamental hypotheses about the long-term evolution of Earth systems.

  18. Towards a New Assessment of Urban Areas from Local to Global Scales

    NASA Astrophysics Data System (ADS)

    Bhaduri, B. L.; Roy Chowdhury, P. K.; McKee, J.; Weaver, J.; Bright, E.; Weber, E.

    2015-12-01

    Since the early 2000s, starting with NASA MODIS, satellite-based remote sensing has facilitated collection of imagery with medium spatial resolution but high temporal resolution (daily). This trend continues with an increasing number of sensors and data products. Increasing spatial and temporal resolutions of remotely sensed data archives, from both public and commercial sources, have significantly enhanced the quality of mapping and change data products. However, even with automation of such analysis on evolving computing platforms, rates of data processing have been suboptimal, largely because of the ever-increasing pixel-to-processor ratio coupled with limitations of the computing architectures. Novel approaches utilizing spatiotemporal data mining techniques and computational architectures have emerged that demonstrate the potential for sustained and geographically scalable landscape monitoring to become operational. We exemplify this challenge with two broad research initiatives on High Performance Geocomputation at Oak Ridge National Laboratory: (a) mapping global settlement distribution; (b) developing national critical infrastructure databases. Our present effort, on large GPU-based architectures, to exploit high resolution (1 m or less) satellite and airborne imagery for extracting settlements at global scale is yielding understanding of human settlement patterns and urban areas at unprecedented resolution. Comparison of such an urban land cover database with existing national and global land cover products, at various geographic scales in selected parts of the world, is revealing intriguing patterns and insights for urban assessment. Early results, from the USA, Taiwan, and Egypt, indicate closer agreements (5-10%) in urban area assessments among databases at larger, aggregated geographic extents. However, spatial variability at local scales could be significantly different (over 50% disagreement).

  19. Unsolicited Patient Complaints in Ophthalmology: An Empirical Analysis from a Large National Database.

    PubMed

    Kohanim, Sahar; Sternberg, Paul; Karrass, Jan; Cooper, William O; Pichert, James W

    2016-02-01

    The number of unsolicited patient complaints about a physician has been shown to correlate with increased malpractice risk. Using a large national patient complaint database, we evaluated the number and content of unsolicited patient complaints about ophthalmologists to identify significant risk factors for receiving a complaint. Retrospective cohort study. Ophthalmologists, nonophthalmic surgeons, nonophthalmic nonsurgeons. We analyzed 2087 unsolicited or spontaneous complaints reported about 815 ophthalmologists practicing in 24 academic and nonacademic organizations using the Patient Advocacy Reporting System (PARS). Complaints against 5273 nonophthalmic surgeons and 19487 nonophthalmic nonsurgeons during the same period were used for comparison. Complaint type profiles were assigned using a previously validated standardized coding system. We (1) described the distribution of complaints against ophthalmologists; (2) compared the distribution and rates of patient complaints about ophthalmologists with those of nonophthalmic surgeons and nonophthalmic nonsurgeons in the database; (3) analyzed differences in complaint type profiles and quantity of complaints by ophthalmic subspecialty, practice setting, physician gender, medical school type, and graduation date; and (4) identified significant risk factors for high numbers of unsolicited patient complaints after adjusting for other covariates. Unsolicited patient complaints. Ophthalmologists had significantly fewer complaints per physician than other nonophthalmic surgeons and nonsurgeons. Sixty-three percent of ophthalmologists had 0 complaints, whereas 10% of ophthalmologists accounted for 61% of all complaints. Ophthalmologists from academic centers, female ophthalmologists, and younger ophthalmologists had significantly more complaints (P < 0.01), and general ophthalmologists had significantly fewer complaints than subspecialists (P < 0.05). After adjusting for covariates using multivariable analysis, working at an academic center was a statistically significant risk factor (adjusted relative risk, 1.82; 95% confidence interval, 1.36-2.43; P < 0.001). Ophthalmologists had significantly fewer complaints than nonophthalmic surgeons and nonophthalmic nonsurgeons, and by implication may have a lower malpractice risk as a group. Nevertheless, a small number of ophthalmologists generated a disproportionate number of complaints. Working at an academic center was a significant independent risk factor for having more patient complaints. Further research is needed to clarify the underlying reasons for this association and to identify interventions that may decrease this risk. Copyright © 2016 American Academy of Ophthalmology. Published by Elsevier Inc. All rights reserved.

  20. Macrostrat and GeoDeepDive: A Platform for Geological Data Integration and Deep-Time Research

    NASA Astrophysics Data System (ADS)

    Husson, J. M.; Peters, S. E.; Ross, I.; Czaplewski, J. J.

    2016-12-01

    Characterizing the quantity, lithology, age, and properties of rocks and sediments in the upper crust is central to many questions in Earth science. Although a large number of geological maps, regional syntheses, and sample-based measurements have been published in a variety of formats, there is no system for integrating and accessing rock record-derived data or for facilitating the large-scale quantitative interrogation of the physical, chemical, and biological properties of Earth's crust. Here we describe two data resources that aim to overcome some of these limitations: 1) Macrostrat, a geospatial database and supporting cyberinfrastructure that is designed to enable quantitative analyses of the entire assemblage of surface and subsurface sedimentary, igneous and metamorphic rocks, and 2) GeoDeepDive, a digital library and high throughput computing system designed to facilitate the location and extraction of information and data from the published literature. Macrostrat currently contains general summaries of the age and lithology of rocks and sediments in the upper crust at 1,474 regions in North and Central America, the Caribbean, New Zealand, and the deep sea. Distributed among these geographic regions are nearly 34,000 lithologically and chronologically-defined geological units, many of which are linked to a bedrock geologic map database with more than 1.7 million globally distributed units. Sample-derived data, including fossil occurrences in the Paleobiology Database and more than 180,000 geochemical and outcrop-derived measurements are linked to Macrostrat units and/or lithologies within those units. The rock names, lithological terms, and geological time intervals that are applied to Macrostrat units define a hierarchical, spatially and temporally indexed vocabulary that is leveraged by GeoDeepDive in order to provide researchers access to data within the scientific literature as it is published and ingested into the infrastructure. All data in Macrostrat are accessible via an Application Programming Interface, which enables the development of mobile and analytical applications. The GeoDeepDive infrastructure also supports the development and execution of applications that are tailored to the specific, literature-based data location and extraction needs of geoscientists.
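
    A hedged sketch of querying the Macrostrat API mentioned above is given below in Python. The endpoint path and parameters shown are assumptions made for illustration; consult the live API documentation for the actual routes and response format.

      # Hedged sketch of pulling unit summaries from the Macrostrat API described
      # above. The endpoint path and parameters are assumptions for illustration.
      import requests

      def fetch_units(interval_name: str):
          """Request lithologic units overlapping a named geologic time interval."""
          resp = requests.get(
              "https://macrostrat.org/api/units",          # assumed endpoint
              params={"interval_name": interval_name, "format": "json"},
              timeout=30,
          )
          resp.raise_for_status()
          return resp.json()

      if __name__ == "__main__":
          units = fetch_units("Permian")
          # Response structure is not assumed beyond being JSON.
          print(type(units))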

  1. Contribution of human, climate and biophysical drivers to the spatial distribution of wildfires in a French Mediterranean area: where do wildfires start and spread?

    NASA Astrophysics Data System (ADS)

    Ruffault, Julien; Mouillot, Florent; Moebius, Flavia

    2013-04-01

    Understanding the contribution of biophysical and human drivers to the spatial distribution of fires at regional scale has many ecological and economic implications in a context of ongoing global changes. However, these fire drivers often interact in complex ways, such that disentangling and assessing the relative contribution of human vs. biophysical factors remains a major challenge. Indeed, the identification of biophysical conditions that promote fires is confounded by the inherent stochasticity in fire occurrence and fire spread on the one hand and by the influence of human factors - through both fire ignition and suppression - on the other. Moreover, different factors may drive fire ignition and fire spread, in such a way that the areas with the highest density of ignitions may not coincide with those where large fires occur. In the present study, we investigated the drivers of fire ignition and spread in a Mediterranean area of southern France. We used a 17-year fire database (the PROMETHEE database, 1989-2006) combined with a set of 8 explanatory variables describing the spatial pattern in ignitions, vegetation and fire weather. We first isolated the weather conditions affecting fire occurrence and spread using a statistical model of the weather/fuel water status for each fire event. The results of these statistical models were used to map the fire weather in terms of the average number of days with suitable conditions for burning. Then, we used Boosted Regression Tree (BRT) models to assess the relative importance of the different variables on the distribution of wildfires of different sizes and to assess the relationship between each variable and fire occurrence and spread probabilities. We found that human activities explained up to 50% of the spatial distribution of fire ignitions (SDI). The distribution of large fires was chiefly explained by fuel characteristics (about 40%). Surprisingly, the weather indices explained only 20% of the SDI, and their contribution did not vary with the size of the fire events considered. These results suggest that changes in fuel characteristics and human settlements/activities, rather than weather conditions, are the most likely to modify the future distribution of fires in this Mediterranean area. These conclusions provide useful information on the scenarios that could arise from the interaction of changes in climate and land cover for the Mediterranean area in the near future.
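
    The sketch below illustrates the boosted-regression-tree step described above: fit a gradient boosting model to a binary fire-occurrence response and read off the relative importance of each predictor. The predictors and response are synthetic, and scikit-learn's GradientBoostingClassifier stands in for whatever BRT implementation the authors used.

      # Sketch of a BRT-style variable-importance analysis on synthetic data.
      import numpy as np
      from sklearn.ensemble import GradientBoostingClassifier

      rng = np.random.default_rng(42)
      n = 2000
      X = np.column_stack([
          rng.random(n),            # stand-in for human settlement density
          rng.random(n),            # stand-in for fuel (vegetation) cover
          rng.random(n),            # stand-in for a fire-weather index
      ])
      # Synthetic response: ignition probability driven mostly by the first two columns.
      p = 0.6 * X[:, 0] + 0.3 * X[:, 1] + 0.1 * X[:, 2]
      y = (rng.random(n) < p).astype(int)

      model = GradientBoostingClassifier(n_estimators=300, max_depth=3).fit(X, y)
      for name, imp in zip(["human", "fuel", "weather"], model.feature_importances_):
          print(f"{name:8s} relative importance: {imp:.2f}")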

  2. Responses of coral reef fishes to past climate changes are related to life-history traits.

    PubMed

    Ottimofiore, Eduardo; Albouy, Camille; Leprieur, Fabien; Descombes, Patrice; Kulbicki, Michel; Mouillot, David; Parravicini, Valeriano; Pellissier, Loïc

    2017-03-01

    Coral reefs and their associated fauna are largely impacted by ongoing climate change. Unravelling species responses to past climatic variations might provide clues on the consequence of ongoing changes. Here, we tested the relationship between changes in sea surface temperature and sea levels during the Quaternary and present-day distributions of coral reef fish species. We investigated whether species-specific responses are associated with life-history traits. We collected a database of coral reef fish distribution together with life-history traits for the Indo-Pacific Ocean. We ran species distribution models (SDMs) on 3,725 tropical reef fish species using contemporary environmental factors together with a variable describing isolation from stable coral reef areas during the Quaternary. We quantified the variance explained independently by isolation from stable areas in the SDMs and related it to a set of species traits including body size and mobility. The variance purely explained by isolation from stable coral reef areas on the distribution of extant coral reef fish species largely varied across species. We observed a triangular relationship between the contribution of isolation from stable areas in the SDMs and body size. Species, whose distribution is more associated with historical changes, occurred predominantly in the Indo-Australian archipelago, where the mean size of fish assemblages is the lowest. Our results suggest that the legacy of habitat changes of the Quaternary is still detectable in the extant distribution of many fish species, especially those with small body size and the most sedentary. Because they were the least able to colonize distant habitats in the past, fish species with smaller body size might have the most pronounced lags in tracking ongoing climate change.

  3. High Performance Semantic Factoring of Giga-Scale Semantic Graph Databases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Joslyn, Cliff A.; Adolf, Robert D.; Al-Saffar, Sinan

    2010-10-04

    As semantic graph database technology grows to address components ranging from extant large triple stores to SPARQL endpoints over SQL-structured relational databases, it will become increasingly important to be able to bring high performance computational resources to bear on their analysis, interpretation, and visualization, especially with respect to their innate semantic structure. Our research group built a novel high performance hybrid system comprising computational capability for semantic graph database processing utilizing the large multithreaded architecture of the Cray XMT platform, conventional clusters, and large data stores. In this paper we describe that architecture, and present the results of our deploying that for the analysis of the Billion Triple dataset with respect to its semantic factors.
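
    As a small-scale stand-in for the kind of semantic-structure analysis described above, the Python sketch below loads a few N-Triples into an rdflib graph and tallies predicate usage. It only illustrates working with RDF triples; the paper's analysis of the Billion Triple dataset ran on the Cray XMT, not on this code. rdflib is a third-party package (pip install rdflib).

      # Toy predicate-frequency tally over a tiny RDF graph; a small-scale
      # illustration of inspecting semantic structure, not the paper's system.
      from collections import Counter
      from rdflib import Graph

      NT_DATA = """
      <http://example.org/a> <http://xmlns.com/foaf/0.1/knows> <http://example.org/b> .
      <http://example.org/b> <http://xmlns.com/foaf/0.1/knows> <http://example.org/c> .
      <http://example.org/a> <http://xmlns.com/foaf/0.1/name> "Alice" .
      """

      g = Graph()
      g.parse(data=NT_DATA, format="nt")

      predicate_counts = Counter(str(p) for _, p, _ in g)
      for predicate, count in predicate_counts.most_common():
          print(count, predicate)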

  4. Recovery and validation of historical sediment quality data from coastal and estuarine areas: An integrated approach

    USGS Publications Warehouse

    Manheim, F.T.; Buchholtz ten Brink, Marilyn R.; Mecray, E.L.

    1998-01-01

    A comprehensive database of sediment chemistry and environmental parameters has been compiled for Boston Harbor and Massachusetts Bay. This work illustrates methodologies for rescuing and validating sediment data from heterogeneous historical sources. It greatly expands spatial and temporal data coverage of estuarine and coastal sediments. The database contains about 3500 samples containing inorganic chemical, organic, texture and other environmental data dating from 1955 to 1994. Cooperation with local and federal agencies as well as universities was essential in locating and screening documents for the database. More than 80% of references utilized came from sources with limited distribution (gray literature). Task sharing was facilitated by a comprehensive and clearly defined data dictionary for sediments. It also served as a data entry template and flat file format for data processing and as a basis for interpretation and graphical illustration. Standard QA/QC protocols are usually inapplicable to historical sediment data. In this work outliers and data quality problems were identified by batch screening techniques that also provide visualizations of data relationships and geochemical affinities. No data were excluded, but qualifying comments warn users of problem data. For Boston Harbor, the proportion of irreparable or seriously questioned data was remarkably small (<5%), although concentration values for metals and organic contaminants spanned 3 orders of magnitude for many elements or compounds. Data from the historical database provide alternatives to dated cores for measuring changes in surficial sediment contamination level with time. The data indicate that spatial inhomogeneity in harbor environments can be large with respect to sediment-hosted contaminants. Boston Inner Harbor surficial sediments showed decreases in concentrations of Cu, Hg, and Zn of 40 to 60% over a 17-year period.

  5. A knowledge base architecture for distributed knowledge agents

    NASA Technical Reports Server (NTRS)

    Riedesel, Joel; Walls, Bryan

    1990-01-01

    A tuple space based object oriented model for knowledge base representation and interpretation is presented. An architecture for managing distributed knowledge agents is then implemented within the model. The general model is based upon a database implementation of a tuple space. Objects are then defined as an additional layer upon the database. The tuple space may or may not be distributed depending upon the database implementation. A language for representing knowledge and inference strategy is defined whose implementation takes advantage of the tuple space. The general model may then be instantiated in many different forms, each of which may be a distinct knowledge agent. Knowledge agents may communicate using tuple space mechanisms as in the LINDA model as well as using more well known message passing mechanisms. An implementation of the model is presented describing strategies used to keep inference tractable without giving up expressivity. An example applied to a power management and distribution network for Space Station Freedom is given.
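
    A minimal in-memory tuple space with LINDA-style out/rd/in operations is sketched below in Python, standing in for the database-backed tuple space the abstract describes. Pattern fields set to None act as wildcards; the example facts are invented and this is not the paper's implementation.

      # Minimal in-memory tuple space with LINDA-style operations (out/rd/in).
      class TupleSpace:
          def __init__(self):
              self._tuples = []

          def out(self, tup):
              """Write a tuple into the space."""
              self._tuples.append(tup)

          def _match(self, pattern, tup):
              return len(pattern) == len(tup) and all(
                  p is None or p == v for p, v in zip(pattern, tup))

          def rd(self, pattern):
              """Read (without removing) the first tuple matching the pattern."""
              return next((t for t in self._tuples if self._match(pattern, t)), None)

          def in_(self, pattern):
              """Remove and return the first tuple matching the pattern."""
              t = self.rd(pattern)
              if t is not None:
                  self._tuples.remove(t)
              return t

      space = TupleSpace()
      space.out(("bus_A", "load_3", "powered"))          # one knowledge agent asserts a fact
      print(space.rd(("bus_A", None, None)))             # another agent queries it
      print(space.in_(("bus_A", "load_3", "powered")))   # and may consume it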

  6. Study on distributed generation algorithm of variable precision concept lattice based on ontology heterogeneous database

    NASA Astrophysics Data System (ADS)

    WANG, Qingrong; ZHU, Changfeng

    2017-06-01

    Integration of distributed heterogeneous data sources is a key issue in big data applications. In this paper, the strategy of variable precision is introduced into the concept lattice, and a one-to-one mapping between the variable precision concept lattice and the ontology concept lattice is constructed to produce a local ontology by building the variable precision concept lattice for each subsystem. A distributed generation algorithm for variable precision concept lattices over an ontology-based heterogeneous database is then proposed, drawing on the close relationship between concept lattices and ontology construction. Finally, using the main concept lattice generated from an existing heterogeneous database as a reference standard, a case study was carried out to verify the feasibility and validity of the algorithm, and the differences between the main concept lattice and the standard concept lattice are compared. Analysis results show that the algorithm can automatically construct distributed concept lattices over heterogeneous data sources.
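
    For readers unfamiliar with concept lattices, the Python sketch below enumerates the formal concepts of a toy object-attribute context by closing each subset of objects. It illustrates the plain concept lattice that the variable-precision approach above extends, not the paper's distributed algorithm; the context and names are invented.

      # Enumerate all formal concepts of a toy object-attribute context.
      from itertools import combinations

      context = {                      # object -> set of attributes
          "train_1": {"freight", "electric"},
          "train_2": {"freight", "diesel"},
          "train_3": {"passenger", "electric"},
          "train_4": {"passenger", "electric", "high_speed"},
      }
      objects = list(context)

      def intent(objs):
          """Attributes shared by every object in objs (all attributes if objs is empty)."""
          attr_sets = [context[o] for o in objs]
          return set.intersection(*attr_sets) if attr_sets else set().union(*context.values())

      def extent(attrs):
          """Objects possessing every attribute in attrs."""
          return {o for o in objects if attrs <= context[o]}

      concepts = set()
      for r in range(len(objects) + 1):
          for objs in combinations(objects, r):
              b = intent(objs)
              a = extent(b)                      # closure: extent of the common intent
              concepts.add((frozenset(a), frozenset(b)))

      for a, b in sorted(concepts, key=lambda c: (len(c[0]), sorted(c[1]))):
          print(sorted(a), sorted(b))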

  7. Practical Quantum Private Database Queries Based on Passive Round-Robin Differential Phase-shift Quantum Key Distribution

    PubMed Central

    Li, Jian; Yang, Yu-Guang; Chen, Xiu-Bo; Zhou, Yi-Hua; Shi, Wei-Min

    2016-01-01

    A novel quantum private database query protocol is proposed, based on passive round-robin differential phase-shift quantum key distribution. Compared with previous quantum private database query protocols, the present protocol has the following unique merits: (i) the user Alice can obtain one and only one key bit, so that both the efficiency and security of the present protocol can be ensured, and (ii) it does not require changing the length difference of the two arms in a Mach-Zehnder interferometer and simply chooses two pulses passively to interfere, so it is much simpler and more practical. The present protocol is also proved to be secure in terms of user security and database security. PMID:27539654

  8. Information resources at the National Center for Biotechnology Information.

    PubMed Central

    Woodsmall, R M; Benson, D A

    1993-01-01

    The National Center for Biotechnology Information (NCBI), part of the National Library of Medicine, was established in 1988 to perform basic research in the field of computational molecular biology as well as build and distribute molecular biology databases. The basic research has led to new algorithms and analysis tools for interpreting genomic data and has been instrumental in the discovery of human disease genes for neurofibromatosis and Kallmann syndrome. The principal database responsibility is the National Institutes of Health (NIH) genetic sequence database, GenBank. NCBI, in collaboration with international partners, builds, distributes, and provides online and CD-ROM access to over 112,000 DNA sequences. Another major program is the integration of multiple sequences databases and related bibliographic information and the development of network-based retrieval systems for Internet access. PMID:8374583

  9. Thermodynamics of firms' growth

    PubMed Central

    Zambrano, Eduardo; Hernando, Alberto; Hernando, Ricardo; Plastino, Angelo

    2015-01-01

    The distribution of firms' growth and firms' sizes is a topic under intense scrutiny. In this paper, we show that a thermodynamic model based on the maximum entropy principle, with dynamical prior information, can be constructed that adequately describes the dynamics and distribution of firms' growth. Our theoretical framework is tested against a comprehensive database of Spanish firms, which covers, to a very large extent, Spain's economic activity, with a total of 1 155 142 firms evolving along a full decade. We show that the empirical exponent of Pareto's law, a rule often observed in the rank distribution of large-size firms, is explained by the capacity of the economic system for creating/destroying firms, and can be used to measure the health of a capitalist-based economy. Indeed, our model predicts that when the exponent is larger than 1, creation of firms is favoured; when it is smaller than 1, destruction of firms is favoured instead; and when it equals 1 (matching Zipf's law), the system is in a full macroeconomic equilibrium, entailing ‘free' creation and/or destruction of firms. For medium and smaller firm sizes, the dynamical regime changes, the whole distribution can no longer be fitted to a single simple analytical form and numerical prediction is required. Our model constitutes the basis for a full predictive framework regarding the economic evolution of an ensemble of firms. Such a structure can be potentially used to develop simulations and test hypothetical scenarios, such as economic crisis or the response to specific policy measures. PMID:26510828

  10. Thermodynamics of firms' growth.

    PubMed

    Zambrano, Eduardo; Hernando, Alberto; Fernández Bariviera, Aurelio; Hernando, Ricardo; Plastino, Angelo

    2015-11-06

    The distribution of firms' growth and firms' sizes is a topic under intense scrutiny. In this paper, we show that a thermodynamic model based on the maximum entropy principle, with dynamical prior information, can be constructed that adequately describes the dynamics and distribution of firms' growth. Our theoretical framework is tested against a comprehensive database of Spanish firms, which covers, to a very large extent, Spain's economic activity, with a total of 1,155,142 firms evolving along a full decade. We show that the empirical exponent of Pareto's law, a rule often observed in the rank distribution of large-size firms, is explained by the capacity of the economic system for creating/destroying firms, and can be used to measure the health of a capitalist-based economy. Indeed, our model predicts that when the exponent is larger than 1, creation of firms is favoured; when it is smaller than 1, destruction of firms is favoured instead; and when it equals 1 (matching Zipf's law), the system is in a full macroeconomic equilibrium, entailing 'free' creation and/or destruction of firms. For medium and smaller firm sizes, the dynamical regime changes, the whole distribution can no longer be fitted to a single simple analytical form and numerical prediction is required. Our model constitutes the basis for a full predictive framework regarding the economic evolution of an ensemble of firms. Such a structure can be potentially used to develop simulations and test hypothetical scenarios, such as an economic crisis or the response to specific policy measures. © 2015 The Authors.

  11. Evaluation and validity of a LORETA normative EEG database.

    PubMed

    Thatcher, R W; North, D; Biver, C

    2005-04-01

    To evaluate the reliability and validity of a Z-score normative EEG database for Low Resolution Electromagnetic Tomography (LORETA), EEG digital samples (2-second intervals sampled at 128 Hz, 1 to 2 minutes eyes closed) were acquired from 106 normal subjects, and the cross-spectrum was computed and multiplied by the Key Institute's LORETA 2,394 gray matter pixel T Matrix. After a log10 transform or a Box-Cox transform the mean and standard deviation of the *.lor files were computed for each of the 2,394 gray matter pixels, from 1 to 30 Hz, for each of the subjects. Tests of Gaussianity were computed in order to best approximate a normal distribution for each frequency and gray matter pixel. The relative sensitivity of a Z-score database was computed by measuring the approximation to a Gaussian distribution. The validity of the LORETA normative database was evaluated by the degree to which confirmed brain pathologies were localized using the LORETA normative database. Log10 and Box-Cox transforms approximated a Gaussian distribution in the range of 95.64% to 99.75% accuracy. The percentage of normative Z-score values at 2 standard deviations ranged from 1.21% to 3.54%, and the percentage of Z-scores at 3 standard deviations ranged from 0% to 0.83%. Left temporal lobe epilepsy, right sensory motor hematoma and a right hemisphere stroke exhibited maximum Z-score deviations in the same locations as the pathologies. We conclude: (1) adequate approximation to a Gaussian distribution can be achieved using LORETA by using a log10 transform or a Box-Cox transform and parametric statistics, (2) a Z-score normative database is valid with adequate sensitivity when using LORETA, and (3) the Z-score LORETA normative database also consistently localized known pathologies to the expected Brodmann areas as a hypothesis test based on the surface EEG before computing LORETA.
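    As a rough sketch of the Z-score normalization step described above (not the Key Institute's or the authors' implementation), the snippet below builds norms for a single voxel/frequency from normative-subject values using a log10 or Box-Cox transform and then Z-scores a new value; the synthetic log-normal data and function names are assumptions.

```python
import numpy as np
from scipy import stats, special

def build_norms(norm_values, use_boxcox=False):
    """Build per-pixel/per-frequency norms from normative-subject values.

    Applies a log10 or Box-Cox transform to approximate Gaussianity,
    then stores the transform parameter plus the mean and SD.
    """
    x = np.asarray(norm_values, dtype=float)
    if use_boxcox:
        xt, lam = stats.boxcox(x)          # requires strictly positive values
    else:
        xt, lam = np.log10(x), None
    return {"lambda": lam, "mean": xt.mean(), "sd": xt.std(ddof=1)}

def z_score(value, norms):
    """Z-score a single subject's value against the stored norms."""
    if norms["lambda"] is None:
        v = np.log10(value)
    else:
        v = special.boxcox(value, norms["lambda"])
    return (v - norms["mean"]) / norms["sd"]

# Example with synthetic log-normal values for one voxel/frequency of 106 subjects.
rng = np.random.default_rng(1)
norm_values = rng.lognormal(mean=0.0, sigma=0.5, size=106)
norms = build_norms(norm_values, use_boxcox=True)
print(f"Z = {z_score(2.5, norms):+.2f}")
```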

  12. Apollo2Go: a web service adapter for the Apollo genome viewer to enable distributed genome annotation.

    PubMed

    Klee, Kathrin; Ernst, Rebecca; Spannagl, Manuel; Mayer, Klaus F X

    2007-08-30

    Apollo, a genome annotation viewer and editor, has become a widely used genome annotation and visualization tool for distributed genome annotation projects. When using Apollo for annotation, database updates are carried out by uploading intermediate annotation files into the respective database. This non-direct database upload is laborious and evokes problems of data synchronicity. To overcome these limitations we extended the Apollo data adapter with a generic, configurable web service client that is able to retrieve annotation data in a GAME-XML-formatted string and pass it on to Apollo's internal input routine. This Apollo web service adapter, Apollo2Go, simplifies the data exchange in distributed projects and aims to render the annotation process more comfortable. The Apollo2Go software is freely available from ftp://ftpmips.gsf.de/plants/apollo_webservice.

  13. Apollo2Go: a web service adapter for the Apollo genome viewer to enable distributed genome annotation

    PubMed Central

    Klee, Kathrin; Ernst, Rebecca; Spannagl, Manuel; Mayer, Klaus FX

    2007-01-01

    Background Apollo, a genome annotation viewer and editor, has become a widely used genome annotation and visualization tool for distributed genome annotation projects. When using Apollo for annotation, database updates are carried out by uploading intermediate annotation files into the respective database. This non-direct database upload is laborious and evokes problems of data synchronicity. Results To overcome these limitations we extended the Apollo data adapter with a generic, configurable web service client that is able to retrieve annotation data in a GAME-XML-formatted string and pass it on to Apollo's internal input routine. Conclusion This Apollo web service adapter, Apollo2Go, simplifies the data exchange in distributed projects and aims to render the annotation process more comfortable. The Apollo2Go software is freely available from ftp://ftpmips.gsf.de/plants/apollo_webservice. PMID:17760972

  14. Heterogeneous slip distribution on faults responsible for large earthquakes: characterization and implications for tsunami modelling

    NASA Astrophysics Data System (ADS)

    Baglione, Enrico; Armigliato, Alberto; Pagnoni, Gianluca; Tinti, Stefano

    2017-04-01

    The fact that ruptures on the generating faults of large earthquakes are strongly heterogeneous has been demonstrated over the last few decades by a large number of studies. The effort to retrieve reliable finite-fault models (FFMs) for large earthquakes that occurred worldwide, mainly by means of the inversion of different kinds of geophysical data, has been accompanied in recent years by the systematic collection and format homogenisation of the published/proposed FFMs for different earthquakes into specifically conceived databases, such as SRCMOD. The main aim of this study is to explore characteristic patterns of the slip distribution of large earthquakes, by using a subset of the FFMs contained in SRCMOD, covering events with moment magnitude equal to or larger than 6 that occurred worldwide over the last 25 years. We focus on those FFMs that exhibit a single and clear region of high slip (i.e. a single asperity), which is found to represent the majority of the events. For these FFMs, it is reasonable to best-fit the slip model by means of a 2D Gaussian distribution. Two different methods are used (least-squares and highest-similarity) and correspondingly two "best-fit" indexes are introduced. As a result, two distinct 2D Gaussian distributions for each FFM are obtained. To quantify how well these distributions are able to mimic the original slip heterogeneity, we calculate and compare the vertical displacements at the Earth surface in the near field induced by the original FFM slip, by an equivalent uniform-slip model, by a depth-dependent slip model, and by the two "best" Gaussian slip models. The coseismic vertical surface displacement is used as the metric for comparison. Results show that, on average, the best results are the ones obtained with 2D Gaussian distributions based on similarity-index fitting. Finally, we restrict our attention to those single-asperity FFMs associated with earthquakes which generated tsunamis. We chose a few events for which tsunami data (water level time series and/or run-up measurements) are available. Using the results mentioned above, for each chosen event the coseismic vertical displacement fields computed for different slip distributions are used as initial conditions for numerical tsunami simulations, performed by means of the shallow-water code UBO-TSUFD. The comparison of the numerical results for different initial conditions to the experimental data is presented and discussed. This study was funded in the frame of the EU Project called ASTARTE - "Assessment, STrategy And Risk Reduction for Tsunamis in Europe", Grant 603839, 7th FP (ENV.2013.6.4-3).
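    The least-squares variant of the 2D Gaussian fitting described above can be sketched as follows; the axis-aligned Gaussian form, unit grid spacing, and synthetic single-asperity slip model are simplifying assumptions, not the authors' exact parameterization.

```python
import numpy as np
from scipy.optimize import curve_fit

def gauss2d(xz, amp, x0, z0, sx, sz):
    """Axis-aligned elliptical 2D Gaussian over along-strike (x) and down-dip (z) coordinates."""
    x, z = xz
    return amp * np.exp(-((x - x0) ** 2 / (2 * sx ** 2) + (z - z0) ** 2 / (2 * sz ** 2)))

def fit_gaussian_slip(slip, dx=1.0, dz=1.0):
    """Least-squares fit of a 2D Gaussian to a gridded finite-fault slip model."""
    nz, nx = slip.shape
    x, z = np.meshgrid(np.arange(nx) * dx, np.arange(nz) * dz)
    xz = np.vstack([x.ravel(), z.ravel()])
    p0 = [slip.max(), x.ravel()[slip.argmax()], z.ravel()[slip.argmax()],
          nx * dx / 4, nz * dz / 4]                      # crude initial guess
    popt, _ = curve_fit(gauss2d, xz, slip.ravel(), p0=p0)
    return popt                                          # amp, x0, z0, sigma_x, sigma_z

# Synthetic single-asperity slip model used purely as a smoke test.
nz, nx = 20, 40
xg, zg = np.meshgrid(np.arange(nx), np.arange(nz))
slip = 3.0 * np.exp(-((xg - 25) ** 2 / 50.0 + (zg - 8) ** 2 / 18.0))
print(fit_gaussian_slip(slip))
```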

  15. A two-step database search method improves sensitivity in peptide sequence matches for metaproteomics and proteogenomics studies.

    PubMed

    Jagtap, Pratik; Goslinga, Jill; Kooren, Joel A; McGowan, Thomas; Wroblewski, Matthew S; Seymour, Sean L; Griffin, Timothy J

    2013-04-01

    Large databases (>10^6 sequences) used in metaproteomic and proteogenomic studies present challenges in matching peptide sequences to MS/MS data using database-search programs. Most notably, strict filtering to avoid false-positive matches leads to more false negatives, thus constraining the number of peptide matches. To address this challenge, we developed a two-step method wherein matches derived from a primary search against a large database were used to create a smaller subset database. The second search was performed against a target-decoy version of this subset database merged with a host database. High confidence peptide sequence matches were then used to infer protein identities. Applying our two-step method for both metaproteomic and proteogenomic analysis resulted in twice the number of high confidence peptide sequence matches in each case, as compared to the conventional one-step method. The two-step method captured almost all of the same peptides matched by the one-step method, with a majority of the additional matches being false negatives from the one-step method. Furthermore, the two-step method improved results regardless of the database search program used. Our results show that our two-step method maximizes the peptide matching sensitivity for applications requiring large databases, especially valuable for proteogenomics and metaproteomics studies. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
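    A schematic sketch of the subset-database construction step in the two-step strategy (the spectral searches themselves are run by an external engine and are not shown); the FASTA handling, file names, and reversed-sequence decoys below are assumptions made only for illustration.

```python
"""Illustrative sketch (not the authors' code) of the two-step strategy:
1) run a primary search of MS/MS spectra against the large database (external engine),
2) build a reduced target-decoy database from the primary hits, merged with a host
   proteome, and 3) re-search against that subset database.
"""

def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, seq = None, []
    with open(path) as fh:
        for line in fh:
            line = line.rstrip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(seq)
                header, seq = line[1:], []
            elif line:
                seq.append(line)
        if header is not None:
            yield header, "".join(seq)

def build_subset_db(large_fasta, host_fasta, primary_hit_accessions, out_fasta):
    """Write a subset target-decoy database from primary-search hits plus the host proteome."""
    hits = set(primary_hit_accessions)
    with open(out_fasta, "w") as out:
        for path, keep_all in ((large_fasta, False), (host_fasta, True)):
            for header, seq in read_fasta(path):
                accession = header.split()[0]
                if keep_all or accession in hits:
                    out.write(f">{header}\n{seq}\n")                 # target entry
                    out.write(f">DECOY_{accession}\n{seq[::-1]}\n")  # reversed decoy

# build_subset_db("metagenome.fasta", "host.fasta", accessions_from_primary_search, "subset_td.fasta")
```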

  16. Influence of ECG measurement accuracy on ECG diagnostic statements.

    PubMed

    Zywietz, C; Celikag, D; Joseph, G

    1996-01-01

    Computer analysis of electrocardiograms (ECGs) provides a large amount of ECG measurement data, which may be used for diagnostic classification and storage in ECG databases. Until now, neither error limits for ECG measurements have been specified nor has their influence on diagnostic statements been systematically investigated. An analytical method is presented to estimate the influence of measurement errors on the accuracy of diagnostic ECG statements. Systematic (offset) errors will usually result in an increase of false positive or false negative statements since they cause a shift of the working point on the receiver operating characteristics curve. Measurement error dispersion broadens the distribution function of discriminative measurement parameters and, therefore, usually increases the overlap between discriminative parameters. This results in a flattening of the receiver operating characteristics curve and an increase of false positive and false negative classifications. The method developed has been applied to ECG conduction defect diagnoses by using the proposed International Electrotechnical Commission's interval measurement tolerance limits. These limits appear too large because more than 30% of false positive atrial conduction defect statements and 10-18% of false intraventricular conduction defect statements could be expected due to tolerated measurement errors. To assure long-term usability of ECG measurement databases, it is recommended that systems provide their error tolerance limits obtained on a defined test set.
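    The effect of a systematic offset and added dispersion on the false-positive rate can be illustrated with a simple Gaussian threshold model, in the spirit of the analytical method described above; the QRS-duration numbers below are illustrative assumptions, not the IEC tolerance limits.

```python
from scipy.stats import norm

def false_positive_rate(mu_normal, sd_normal, threshold, offset=0.0, dispersion=0.0):
    """Fraction of normal subjects pushed above a diagnostic threshold.

    A systematic (offset) error shifts the working point; extra measurement
    dispersion broadens the distribution. All numbers here are illustrative.
    """
    sd_total = (sd_normal ** 2 + dispersion ** 2) ** 0.5
    return norm.sf(threshold, loc=mu_normal + offset, scale=sd_total)

# Example: QRS duration ~ N(95 ms, 10 ms), conduction-defect threshold at 120 ms.
base = false_positive_rate(95, 10, 120)
with_error = false_positive_rate(95, 10, 120, offset=10, dispersion=8)
print(f"FPR without error: {base:.3%}, with 10 ms offset + 8 ms dispersion: {with_error:.3%}")
```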

  17. Environmental concern-based site screening of carbon dioxide geological storage in China.

    PubMed

    Cai, Bofeng; Li, Qi; Liu, Guizhen; Liu, Lancui; Jin, Taotao; Shi, Hui

    2017-08-08

    Environmental impacts and risks related to carbon dioxide (CO2) capture and storage (CCS) projects may have direct effects on the decision-making process during CCS site selection. This paper proposes a novel method of environmental optimization for CCS site selection using China's ecological red line approach. Moreover, this paper establishes a GIS-based spatial analysis model of environmental optimization for CCS site selection using a large database. Comprehensive data coverage of environmental elements and a fine 1 km spatial resolution were used in the database. The quartile method was used for value assignment for specific indicators, including the prohibited index and the restricted index. The screening results show that areas classified as having high environmental suitability (classes III and IV) in China account for 620,800 km² and 156,600 km², respectively, and are mainly distributed in Inner Mongolia, Qinghai and Xinjiang. The environmental suitability class IV areas of Bayingol Mongolian Autonomous Prefecture, Hotan Prefecture, Aksu Prefecture, Hulunbuir, Xilingol League and other prefecture-level regions not only cover large land areas, but also form a continuous area in the three provincial-level administrative units. This study may benefit the national macro-strategic deployment and implementation of CCS spatial layout and environmental management in China.
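    A generic sketch of quartile-based value assignment for a screening indicator, in the spirit of the method described above; the class labels and the direction of suitability are assumptions, not the paper's exact scoring scheme.

```python
import numpy as np

def quartile_classes(values):
    """Assign suitability classes 1-4 by quartiles of a restricted-index indicator.

    Generic sketch of quartile-based value assignment; here a higher class
    number is taken to mean higher environmental suitability.
    """
    v = np.asarray(values, dtype=float)
    q1, q2, q3 = np.percentile(v, [25, 50, 75])
    return np.digitize(v, [q1, q2, q3]) + 1   # classes 1..4

# Example: one restricted-index indicator sampled over grid cells.
indicator = np.random.default_rng(2).normal(size=10)
print(quartile_classes(indicator))
```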

  18. Exploring root symbiotic programs in the model legume Medicago truncatula using EST analysis.

    PubMed

    Journet, Etienne-Pascal; van Tuinen, Diederik; Gouzy, Jérome; Crespeau, Hervé; Carreau, Véronique; Farmer, Mary-Jo; Niebel, Andreas; Schiex, Thomas; Jaillon, Olivier; Chatagnier, Odile; Godiard, Laurence; Micheli, Fabienne; Kahn, Daniel; Gianinazzi-Pearson, Vivienne; Gamas, Pascal

    2002-12-15

    We report on a large-scale expressed sequence tag (EST) sequencing and analysis program aimed at characterizing the sets of genes expressed in roots of the model legume Medicago truncatula during interactions with either of two microsymbionts, the nitrogen-fixing bacterium Sinorhizobium meliloti or the arbuscular mycorrhizal fungus Glomus intraradices. We have designed specific tools for in silico analysis of EST data, in relation to chimeric cDNA detection, EST clustering, encoded protein prediction, and detection of differential expression. Our 21 473 5'- and 3'-ESTs could be grouped into 6359 EST clusters, corresponding to distinct virtual genes, along with 52 498 other M.truncatula ESTs available in the dbEST (NCBI) database that were recruited in the process. These clusters were manually annotated, using a specifically developed annotation interface. Analysis of EST cluster distribution in various M.truncatula cDNA libraries, supported by a refined R test to evaluate statistical significance and by 'electronic northern' representation, enabled us to identify a large number of novel genes predicted to be up- or down-regulated during either symbiotic root interaction. These in silico analyses provide a first global view of the genetic programs for root symbioses in M.truncatula. A searchable database has been built and can be accessed through a public interface.

  19. Regional spatial-temporal spread of citrus huanglongbing is affected by rain in Florida.

    PubMed

    Shimwela, Mpoki; Schubert, Timothy S; Albritton, Matthew; Halbert, Susan E; Jones, Debra J; Sun, Xiaoan; Roberts, Pamela; Singer, Burton; Lee, Wen Suk; Jones, Jeffrey B; Ploetz, Randy; van Bruggen, Ariena H C

    2018-06-06

    Citrus huanglongbing (HLB), associated with Candidatus Liberibacter asiaticus (Las) and disseminated by the Asian citrus psyllid (ACP), has devastated citrus in Florida since 2005. Data on HLB occurrence were stored in databases (2005-2012). Cumulative HLB-positive citrus blocks were subjected to kernel density analysis and kriging. Relative disease incidence per county was calculated by dividing HLB numbers by relative tree numbers and maximum incidence. Spatio-temporal HLB distributions were correlated with weather. Relative HLB incidence correlated positively with rainfall. The focus expansion rate was 1626 m per month, similar to that in Brazil. Relative HLB incidence in counties with primarily large groves increased at a lower rate (0.24 per year) than in counties with smaller groves in hotspot areas (0.67 per year), confirming reports that large-scale HLB management may slow epidemic progress.

  20. A Computational framework for telemedicine.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Foster, I.; von Laszewski, G.; Thiruvathukal, G. K.

    1998-07-01

    Emerging telemedicine applications require the ability to exploit diverse and geographically distributed resources. Highspeed networks are used to integrate advanced visualization devices, sophisticated instruments, large databases, archival storage devices, PCs, workstations, and supercomputers. This form of telemedical environment is similar to networked virtual supercomputers, also known as metacomputers. Metacomputers are already being used in many scientific application areas. In this article, we analyze requirements necessary for a telemedical computing infrastructure and compare them with requirements found in a typical metacomputing environment. We will show that metacomputing environments can be used to enable a more powerful and unified computational infrastructure for telemedicine. The Globus metacomputing toolkit can provide the necessary low level mechanisms to enable a large scale telemedical infrastructure. The Globus toolkit components are designed in a modular fashion and can be extended to support the specific requirements for telemedicine.

  1. Database Development for Ocean Impacts: Imaging, Outreach, and Rapid Response

    DTIC Science & Technology

    2012-09-30

    DISTRIBUTION STATEMENT A. Approved for public release; distribution is unlimited. Database Development for Ocean Impacts: Imaging, Outreach, and Rapid Response. Work with the Applied Ocean Physics & Engineering department, WHOI, to evaluate wear in mooring optical cables used in Right Whale monitoring.

  2. Relativistic quantum private database queries

    NASA Astrophysics Data System (ADS)

    Sun, Si-Jia; Yang, Yu-Guang; Zhang, Ming-Ou

    2015-04-01

    Recently, Jakobi et al. (Phys Rev A 83, 022301, 2011) suggested the first practical private database query protocol (J-protocol) based on the Scarani et al. (Phys Rev Lett 92, 057901, 2004) quantum key distribution protocol. Unfortunately, the J-protocol is just a cheat-sensitive private database query protocol. In this paper, we present an idealized relativistic quantum private database query protocol based on Minkowski causality and the properties of quantum information. Also, we prove that the protocol is secure in terms of the user security and the database security.

  3. The Network Configuration of an Object Relational Database Management System

    NASA Technical Reports Server (NTRS)

    Diaz, Philip; Harris, W. C.

    2000-01-01

    The networking and implementation of the Oracle Database Management System (ODBMS) requires developers to have knowledge of the UNIX operating system as well as all the features of the Oracle Server. The server is an object relational database management system (DBMS). By using distributed processing, processes are split up between the database server and client application programs. The DBMS handles all the responsibilities of the server. The workstations running the database application concentrate on the interpretation and display of data.

  4. Orthographic and Phonological Neighborhood Databases across Multiple Languages.

    PubMed

    Marian, Viorica

    2017-01-01

    The increased globalization of science and technology and the growing number of bilinguals and multilinguals in the world have made research with multiple languages a mainstay for scholars who study human function and especially those who focus on language, cognition, and the brain. Such research can benefit from large-scale databases and online resources that describe and measure lexical, phonological, orthographic, and semantic information. The present paper discusses currently-available resources and underscores the need for tools that enable measurements both within and across multiple languages. A general review of language databases is followed by a targeted introduction to databases of orthographic and phonological neighborhoods. A specific focus on CLEARPOND illustrates how databases can be used to assess and compare neighborhood information across languages, to develop research materials, and to provide insight into broad questions about language. As an example of how using large-scale databases can answer questions about language, a closer look at neighborhood effects on lexical access reveals that not only orthographic, but also phonological neighborhoods can influence visual lexical access both within and across languages. We conclude that capitalizing upon large-scale linguistic databases can advance, refine, and accelerate scientific discoveries about the human linguistic capacity.
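    As an illustration of the kind of measure such databases provide, the sketch below computes substitution-only orthographic neighborhoods (Coltheart's N style) for a toy lexicon; it is not CLEARPOND's implementation, and the bucketing trick and word list are assumptions.

```python
from collections import defaultdict

def orthographic_neighbors(lexicon):
    """Map each word to its orthographic neighbors: same length, differing by
    exactly one letter substitution (no additions or deletions).
    """
    buckets = defaultdict(set)
    for w in lexicon:
        for i in range(len(w)):
            buckets[(i, w[:i], w[i + 1:])].add(w)   # wildcard at position i
    neighbors = {w: set() for w in lexicon}
    for group in buckets.values():
        for w in group:
            neighbors[w] |= group - {w}
    return neighbors

words = ["cat", "cot", "cut", "car", "dog", "dot"]
for w, ns in orthographic_neighbors(words).items():
    print(w, sorted(ns))
```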

  5. Information Power Grid: Distributed High-Performance Computing and Large-Scale Data Management for Science and Engineering

    NASA Technical Reports Server (NTRS)

    Johnston, William E.; Gannon, Dennis; Nitzberg, Bill

    2000-01-01

    We use the term "Grid" to refer to distributed, high performance computing and data handling infrastructure that incorporates geographically and organizationally dispersed, heterogeneous resources that are persistent and supported. This infrastructure includes: (1) Tools for constructing collaborative, application oriented Problem Solving Environments / Frameworks (the primary user interfaces for Grids); (2) Programming environments, tools, and services providing various approaches for building applications that use aggregated computing and storage resources, and federated data sources; (3) Comprehensive and consistent set of location independent tools and services for accessing and managing dynamic collections of widely distributed resources: heterogeneous computing systems, storage systems, real-time data sources and instruments, human collaborators, and communications systems; (4) Operational infrastructure including management tools for distributed systems and distributed resources, user services, accounting and auditing, strong and location independent user authentication and authorization, and overall system security services. The vision for NASA's Information Power Grid - a computing and data Grid - is that it will provide significant new capabilities to scientists and engineers by facilitating routine construction of information based problem solving environments / frameworks. Such Grids will knit together widely distributed computing, data, instrument, and human resources into just-in-time systems that can address complex and large-scale computing and data analysis problems. Examples of these problems include: (1) Coupled, multidisciplinary simulations too large for single systems (e.g., multi-component NPSS turbomachine simulation); (2) Use of widely distributed, federated data archives (e.g., simultaneous access to meteorological, topological, aircraft performance, and flight path scheduling databases supporting a National Air Space Simulation system); (3) Coupling large-scale computing and data systems to scientific and engineering instruments (e.g., real-time interaction with experiments through real-time data analysis and interpretation presented to the experimentalist in ways that allow direct interaction with the experiment, instead of just with instrument control); (4) Highly interactive, augmented reality and virtual reality remote collaborations (e.g., Ames / Boeing Remote Help Desk providing field maintenance use of coupled video and NDI to a remote, on-line airframe structures expert who uses this data to index into detailed design databases, and returns 3D internal aircraft geometry to the field); (5) Single computational problems too large for any single system (e.g., the rotorcraft reference calculation). Grids also have the potential to provide pools of resources that could be called on in extraordinary / rapid response situations (such as disaster response) because they can provide common interfaces and access mechanisms, standardized management, and uniform user authentication and authorization, for large collections of distributed resources (whether or not they normally function in concert). IPG development and deployment is addressing requirements obtained by analyzing a number of different application areas, in particular from the NASA Aero-Space Technology Enterprise. This analysis has focussed primarily on two types of users: the scientist / design engineer whose primary interest is problem solving (e.g., determining wing aerodynamic characteristics in many different operating environments), and whose primary interface to IPG will be through various sorts of problem solving frameworks. The second type of user is the tool designer: the computational scientists who convert physics and mathematics into code that can simulate the physical world. These are the two primary users of IPG, and they have rather different requirements. The results of the analysis of the needs of these two types of users provide a broad set of requirements that gives rise to a general set of required capabilities. The IPG project is intended to address all of these requirements. In some cases the required computing technology exists, and in some cases it must be researched and developed. The project is using available technology to provide a prototype set of capabilities in a persistent distributed computing testbed. Beyond this, there are required capabilities that are not immediately available, and whose development spans the range from near-term engineering development (one to two years) to much longer term R&D (three to six years). Additional information is contained in the original.

  6. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Roehm, Dominic; Pavel, Robert S.; Barros, Kipton

    We present an adaptive sampling method supplemented by a distributed database and a prediction method for multiscale simulations using the Heterogeneous Multiscale Method. A finite-volume scheme integrates the macro-scale conservation laws for elastodynamics, which are closed by momentum and energy fluxes evaluated at the micro-scale. In the original approach, molecular dynamics (MD) simulations are launched for every macro-scale volume element. Our adaptive sampling scheme replaces a large fraction of costly micro-scale MD simulations with fast table lookup and prediction. The cloud database Redis provides the plain table lookup, and with locality aware hashing we gather input data for our prediction scheme. For the latter we use kriging, which estimates an unknown value and its uncertainty (error) at a specific location in parameter space by using weighted averages of the neighboring points. We find that our adaptive scheme significantly improves simulation performance by a factor of 2.5 to 25, while retaining high accuracy for various choices of the algorithm parameters.
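    A much simplified sketch of the lookup-then-predict idea described above: an in-memory dict stands in for the Redis table, and a Gaussian distance-weighted average with a crude spread estimate stands in for kriging; the tolerance, bandwidth, and toy "expensive model" are assumptions.

```python
import numpy as np

class AdaptiveSampler:
    """Look up previously computed micro-scale results, otherwise predict by
    distance-weighted interpolation, and only run the expensive micro-scale
    model when the prediction looks unreliable. An in-memory dict replaces
    the distributed Redis table used in the paper.
    """

    def __init__(self, expensive_model, tol=0.05, bandwidth=0.1):
        self.model = expensive_model
        self.tol = tol
        self.bw = bandwidth
        self.table = {}                      # key: rounded input tuple -> output

    def _predict(self, x):
        if not self.table:
            return None, np.inf
        pts = np.array(list(self.table.keys()))
        vals = np.array(list(self.table.values()))
        d2 = ((pts - x) ** 2).sum(axis=1)
        w = np.exp(-d2 / (2 * self.bw ** 2))
        if w.sum() < 1e-12:
            return None, np.inf
        mean = (w * vals).sum() / w.sum()
        spread = np.sqrt((w * (vals - mean) ** 2).sum() / w.sum())  # crude error proxy
        return mean, spread

    def __call__(self, x):
        x = np.asarray(x, dtype=float)
        key = tuple(np.round(x, 6))
        if key in self.table:                            # plain table lookup
            return self.table[key]
        mean, err = self._predict(x)
        if mean is not None and err < self.tol:          # accept the cheap prediction
            return mean
        y = self.model(x)                                # fall back to the expensive model
        self.table[key] = y
        return y

sampler = AdaptiveSampler(lambda x: float(np.sin(x).sum()))
queries = np.random.default_rng(3).uniform(0, 1, size=(200, 2))
results = [sampler(q) for q in queries]
print(f"expensive evaluations: {len(sampler.table)} of {len(queries)} queries")
```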

  7. MedBlock: Efficient and Secure Medical Data Sharing Via Blockchain.

    PubMed

    Fan, Kai; Wang, Shangyang; Ren, Yanhui; Li, Hui; Yang, Yintang

    2018-06-21

    With the development of electronic information technology, electronic medical records (EMRs) have become a common way to store patients' data in hospitals. They are stored in different hospitals' databases, even for the same patient. Therefore, it is difficult to construct a summarized EMR for one patient from multiple hospital databases due to security and privacy concerns. Meanwhile, current EMR systems lack a standard data management and sharing policy, making it difficult for pharmaceutical scientists to develop precise medicines based on data obtained under different policies. To solve the above problems, we propose a blockchain-based information management system, MedBlock, to handle patients' information. In this scheme, the distributed ledger of MedBlock allows efficient EMR access and retrieval. The improved consensus mechanism achieves consensus on EMRs without large energy consumption and network congestion. In addition, MedBlock also exhibits high information security by combining customized access control protocols and symmetric cryptography. MedBlock can play an important role in sensitive medical information sharing.
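    To illustrate the tamper-evident chaining that underlies blockchain-based EMR indexing, here is a minimal hash-linked ledger sketch; it is not MedBlock's protocol (no consensus mechanism, access control, or encryption), and all identifiers are made up.

```python
import hashlib
import json
import time

class SimpleEMRLedger:
    """Minimal hash-linked, append-only ledger of EMR index records."""

    def __init__(self):
        self.chain = [self._block(prev_hash="0" * 64, payload={"genesis": True})]

    def _block(self, prev_hash, payload):
        block = {"time": time.time(), "prev_hash": prev_hash, "payload": payload}
        block["hash"] = hashlib.sha256(
            json.dumps(block, sort_keys=True).encode()).hexdigest()
        return block

    def add_record(self, patient_id, record_pointer):
        """Append a pointer to an EMR held in some hospital database."""
        payload = {"patient": patient_id, "emr_pointer": record_pointer}
        self.chain.append(self._block(self.chain[-1]["hash"], payload))

    def verify(self):
        """Check that every block references the hash of its predecessor."""
        for prev, cur in zip(self.chain, self.chain[1:]):
            if cur["prev_hash"] != prev["hash"]:
                return False
        return True

ledger = SimpleEMRLedger()
ledger.add_record("patient-42", "hospitalA/emr/9f3c")
ledger.add_record("patient-42", "hospitalB/emr/a771")
print("chain valid:", ledger.verify())
```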

  8. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lees, R. M.; Xu, Li-Hong; Appadoo, D. R. T.

    New astronomical facilities such as HIFI on the Herschel Space Observatory, the SOFIA airborne IR telescope and the ALMA sub-mm telescope array will yield spectra from interstellar and protostellar sources with vastly increased sensitivity and frequency coverage. This creates the need for major enhancements to laboratory databases for the more prominent interstellar 'weed' species in order to model and account for their lines in observed spectra in the search for new and more exotic interstellar molecular 'flowers'. With its large-amplitude internal torsional motion, methanol has particularly rich spectra throughout the FIR and IR regions and, being very widely distributed throughout the galaxy, is perhaps the most notorious interstellar weed. Thus, we have recorded new spectra for a variety of methanol isotopic species on the high-resolution FTIR spectrometer on the CLS FIR beamline. The aim is to extend quantum number coverage of the data, improve our understanding of the energy level structure, and provide the astronomical community with better databases and models of the spectral patterns with greater predictive power for a range of astrophysical conditions.

  9. Large Scale Landslide Database System Established for the Reservoirs in Southern Taiwan

    NASA Astrophysics Data System (ADS)

    Tsai, Tsai-Tsung; Tsai, Kuang-Jung; Shieh, Chjeng-Lun

    2017-04-01

    Typhoon Morakot's severe attack on southern Taiwan awakened public awareness of large-scale landslide disasters. Such disasters produce large quantities of sediment, which negatively affect the operating functions of reservoirs. In order to reduce the risk of these disasters within the study area, the establishment of a database for hazard mitigation / disaster prevention is necessary. Real-time data and numerous archives of engineering data, environmental information, photos, and video will not only help people make appropriate decisions, but also pose a major challenge in terms of processing and adding value. This study defined basic data formats / standards from the various types of data collected about these reservoirs and then provided a management platform based on these formats / standards. Meanwhile, for practicality and convenience, the large-scale landslide disaster database system is built with both data-provision and data-reception capabilities, so that users can access it on different types of devices. IT technology progresses extremely quickly, and even the most modern system may become outdated at any time. In order to provide long-term service, the system reserves the possibility of user-defined data formats / standards and a user-defined system structure. The system established by this study is based on the HTML5 standard language and uses responsive web design technology, so that users can easily operate and further develop this large-scale landslide disaster database system.

  10. Assessment of landslide distribution map reliability in Niigata prefecture - Japan using frequency ratio approach

    NASA Astrophysics Data System (ADS)

    Rahardianto, Trias; Saputra, Aditya; Gomez, Christopher

    2017-07-01

    Research on landslide susceptibility has evolved rapidly over the last few decades thanks to the availability of large databases. Landslide research used to be focused on discrete events, but the use of large inventory datasets has become a central pillar of landslide susceptibility, hazard, and risk assessment. Indeed, extracting meaningful information from large databases is now at the forefront of geoscientific research, following the big-data research trend. The more comprehensive the information on past landslides available in a particular area, the better the produced map will be at supporting effective decision making, planning, and engineering practice. Landslide inventory data that are freely accessible online give many researchers and decision makers an opportunity to prevent casualties and economic loss caused by future landslides. These data are especially advantageous for areas with poor historical landslide records. Since the construction criteria for landslide inventory maps and their quality evaluation remain poorly defined, an assessment of open-source landslide inventory map reliability is required. The present contribution aims to assess the reliability of open-source landslide inventory data based on the particular topographical setting of the observed area in Niigata prefecture, Japan. A Geographic Information System (GIS) platform and a statistical approach are applied to analyze the data. The frequency ratio method is used to model and assess the landslide map. The generated model showed unsatisfactory results, with an AUC value of 0.603, indicating low prediction accuracy and unreliability of the model.
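    The frequency ratio computation itself is straightforward; a minimal sketch over a toy grid is shown below, with the class layout and landslide mask being assumptions made only for illustration.

```python
import numpy as np

def frequency_ratio(factor_class, landslide_mask):
    """Frequency ratio per factor class: (landslide share in class) / (area share of class).

    factor_class: integer class id per cell (e.g., slope class); landslide_mask: 1 where
    a mapped landslide is present. FR > 1 means the class is over-represented in landslides.
    """
    factor_class = np.asarray(factor_class).ravel()
    landslide_mask = np.asarray(landslide_mask).ravel().astype(bool)
    fr = {}
    for c in np.unique(factor_class):
        in_class = factor_class == c
        class_share = in_class.mean()
        slide_share = (in_class & landslide_mask).sum() / max(landslide_mask.sum(), 1)
        fr[int(c)] = slide_share / class_share
    return fr

# Tiny illustrative grid: 3 slope classes, landslides concentrated in class 3.
classes = np.array([1, 1, 2, 2, 2, 3, 3, 3, 3, 3])
slides  = np.array([0, 0, 0, 1, 0, 1, 1, 0, 1, 0])
print(frequency_ratio(classes, slides))
```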

  11. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Langan, Roisin T.; Archibald, Richard K.; Lamberti, Vincent

    We have applied a new imputation-based method for analyzing incomplete data, called Monte Carlo Bayesian Database Generation (MCBDG), to the Spent Fuel Isotopic Composition (SFCOMPO) database. About 60% of the entries are absent for SFCOMPO. The method estimates missing values of a property from a probability distribution created from the existing data for the property, and then generates multiple instances of the completed database for training a machine learning algorithm. Uncertainty in the data is represented by an empirical or an assumed error distribution. The method makes few assumptions about the underlying data, and compares favorably against results obtained by replacing missing information with constant values.
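    A simplified sketch of the multiple-instance imputation idea (sampling missing values from the empirical distribution of the observed values and generating several completed copies); it omits the error model and Bayesian elements of MCBDG, and the toy table is an assumption.

```python
import numpy as np

def generate_completed_databases(data, n_instances=10, rng=None):
    """Generate multiple completed copies of an incomplete table by drawing each
    missing value from the empirical distribution of the observed values in its column.
    """
    rng = rng or np.random.default_rng()
    data = np.asarray(data, dtype=float)
    completed = []
    for _ in range(n_instances):
        filled = data.copy()
        for j in range(data.shape[1]):
            col = data[:, j]
            missing = np.isnan(col)
            observed = col[~missing]
            if missing.any() and observed.size:
                filled[missing, j] = rng.choice(observed, size=missing.sum())
        completed.append(filled)
    return completed

table = np.array([[1.0, 2.0], [np.nan, 3.0], [2.5, np.nan], [4.0, 5.0]])
instances = generate_completed_databases(table, n_instances=3, rng=np.random.default_rng(4))
print(instances[0])
```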

  12. Toward server-side, high performance climate change data analytics in the Earth System Grid Federation (ESGF) eco-system

    NASA Astrophysics Data System (ADS)

    Fiore, Sandro; Williams, Dean; Aloisio, Giovanni

    2016-04-01

    In many scientific domains such as climate, data is often n-dimensional and requires tools that support specialized data types and primitives to be properly stored, accessed, analysed and visualized. Moreover, new challenges arise in large-scale scenarios and eco-systems where petabytes (PB) of data can be available and data can be distributed and/or replicated (e.g., the Earth System Grid Federation (ESGF) serving the Coupled Model Intercomparison Project, Phase 5 (CMIP5) experiment, providing access to 2.5 PB of data for the Intergovernmental Panel on Climate Change (IPCC) Fifth Assessment Report (AR5)). Most of the tools currently available for scientific data analysis in the climate domain fail at large scale since they: (1) are desktop based and need the data locally; (2) are sequential, so do not benefit from available multicore/parallel machines; (3) do not provide declarative languages to express scientific data analysis tasks; (4) are domain-specific, which ties their adoption to a specific domain; and (5) do not provide workflow support, to enable the definition of complex "experiments". The Ophidia project aims at facing most of the challenges highlighted above by providing a big data analytics framework for eScience. Ophidia provides declarative, server-side, and parallel data analysis, jointly with an internal storage model able to efficiently deal with multidimensional data and a hierarchical data organization to manage large data volumes ("datacubes"). The project relies on a strong background of high performance database management and OLAP systems to manage large scientific data sets. It also provides native workflow management support, to define processing chains and workflows with tens to hundreds of data analytics operators to build real scientific use cases. With regard to interoperability aspects, the talk will present the contribution provided both to the RDA Working Group on Array Databases, and the Earth System Grid Federation (ESGF) Compute Working Team. Also highlighted will be the results of large scale climate model intercomparison data analysis experiments, for example: (1) defined in the context of the EU H2020 INDIGO-DataCloud project; (2) implemented in a real geographically distributed environment involving CMCC (Italy) and LLNL (US) sites; (3) exploiting Ophidia as a server-side, parallel analytics engine; and (4) applied on real CMIP5 data sets available through ESGF.

  13. GlobTherm, a global database on thermal tolerances for aquatic and terrestrial organisms.

    PubMed

    Bennett, Joanne M; Calosi, Piero; Clusella-Trullas, Susana; Martínez, Brezo; Sunday, Jennifer; Algar, Adam C; Araújo, Miguel B; Hawkins, Bradford A; Keith, Sally; Kühn, Ingolf; Rahbek, Carsten; Rodríguez, Laura; Singer, Alexander; Villalobos, Fabricio; Ángel Olalla-Tárraga, Miguel; Morales-Castilla, Ignacio

    2018-03-13

    How climate affects species distributions is a longstanding question receiving renewed interest owing to the need to predict the impacts of global warming on biodiversity. Is climate change forcing species to live near their critical thermal limits? Are these limits likely to change through natural selection? These and other important questions can be addressed with models relating geographical distributions of species with climate data, but inferences made with these models are highly contingent on non-climatic factors such as biotic interactions. Improved understanding of climate change effects on species will require extensive analysis of thermal physiological traits, but such data are both scarce and scattered. To overcome current limitations, we created the GlobTherm database. The database contains experimentally derived species' thermal tolerance data currently comprising over 2,000 species of terrestrial, freshwater, intertidal and marine multicellular algae, plants, fungi, and animals. The GlobTherm database will be maintained and curated by iDiv with the aim to keep expanding it, and enable further investigations on the effects of climate on the distribution of life on Earth.

  14. Large-scale annotation of small-molecule libraries using public databases.

    PubMed

    Zhou, Yingyao; Zhou, Bin; Chen, Kaisheng; Yan, S Frank; King, Frederick J; Jiang, Shumei; Winzeler, Elizabeth A

    2007-01-01

    While many large publicly accessible databases provide excellent annotation for biological macromolecules, the same is not true for small chemical compounds. Commercial data sources also fail to provide an annotation interface for large numbers of compounds and tend to be too costly to be widely available to biomedical researchers. Therefore, the use of annotation information for the selection of lead compounds from a modern-day high-throughput screening (HTS) campaign presently occurs only on a very limited scale. The recent rapid expansion of the NIH PubChem database provides an opportunity to link existing biological databases with compound catalogs and provides relevant information that could potentially improve the information garnered from large-scale screening efforts. Using the 2.5 million compound collection at the Genomics Institute of the Novartis Research Foundation (GNF) as a model, we determined that approximately 4% of the library contained compounds with potential annotation in such databases as PubChem and the World Drug Index (WDI) as well as related databases such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) and ChemIDplus. Furthermore, exact structure match analysis showed that 32% of GNF compounds can be linked to third-party databases via PubChem. We also showed that annotations such as MeSH (medical subject headings) terms can be applied to in-house HTS databases to identify signature biological inhibition profiles of interest and to expedite the assay validation process. The automated annotation of thousands of screening hits in batch is becoming feasible and has the potential to play an essential role in the hit-to-lead decision making process.

  15. Library Micro-Computing, Vol. 2. Reprints from the Best of "ONLINE" [and]"DATABASE."

    ERIC Educational Resources Information Center

    Online, Inc., Weston, CT.

    Reprints of 19 articles pertaining to library microcomputing appear in this collection, the second of two volumes on this topic in a series of volumes of reprints from "ONLINE" and "DATABASE" magazines. Edited for information professionals who use electronically distributed databases, these articles address such topics as: (1)…

  16. Web Database Development: Implications for Academic Publishing.

    ERIC Educational Resources Information Center

    Fernekes, Bob

    This paper discusses the preliminary planning, design, and development of a pilot project to create an Internet accessible database and search tool for locating and distributing company data and scholarly work. Team members established four project objectives: (1) to develop a Web accessible database and decision tool that creates Web pages on the…

  17. Prevalence and geographical distribution of Usher syndrome in Germany.

    PubMed

    Spandau, Ulrich H M; Rohrschneider, Klaus

    2002-06-01

    To estimate the prevalence of Usher syndrome in Heidelberg and Mannheim and to map its geographical distribution in Germany. Usher syndrome patients were ascertained through the databases of the Low Vision Department at the University of Heidelberg, and of the patient support group Pro Retina. Ophthalmic and audiologic examinations and medical records were used to classify patients into one of the subtypes. The database of the University of Heidelberg contains 247 Usher syndrome patients, 63 with Usher syndrome type 1 (USH1) and 184 with Usher syndrome type 2 (USH2). The USH1:USH2 ratio in the Heidelberg database was 1:3. The Pro Retina database includes 248 Usher syndrome patients, 21 with USH1 and 227 with USH2. The total number of Usher syndrome patients was 424, with 75 USH1 and 349 USH2 patients; 71 patients were in both databases. The prevalence of Usher syndrome in Heidelberg and suburbs was calculated to be 6.2 per 100,000 inhabitants. There seems to be a homogeneous distribution in Germany for both subtypes. Knowledge of the high prevalence of Usher syndrome, with up to 5,000 patients in Germany, should lead to increased awareness and timely diagnosis by ophthalmologists and otologists. It should also ensure that these patients receive good support through hearing and vision aids.

  18. Geologic map and map database of parts of Marin, San Francisco, Alameda, Contra Costa, and Sonoma counties, California

    USGS Publications Warehouse

    Blake, M.C.; Jones, D.L.; Graymer, R.W.; digital database by Soule, Adam

    2000-01-01

    This digital map database, compiled from previously published and unpublished data, and new mapping by the authors, represents the general distribution of bedrock and surficial deposits in the mapped area. Together with the accompanying text file (mageo.txt, mageo.pdf, or mageo.ps), it provides current information on the geologic structure and stratigraphy of the area covered. The database delineates map units that are identified by general age and lithology following the stratigraphic nomenclature of the U.S. Geological Survey. The scale of the source maps limits the spatial resolution (scale) of the database to 1:62,500 or smaller.

  19. MADGE: scalable distributed data management software for cDNA microarrays.

    PubMed

    McIndoe, Richard A; Lanzen, Aaron; Hurtz, Kimberly

    2003-01-01

    The human genome project and the development of new high-throughput technologies have created unparalleled opportunities to study the mechanism of diseases, monitor the disease progression and evaluate effective therapies. Gene expression profiling is a critical tool to accomplish these goals. The use of nucleic acid microarrays to assess the gene expression of thousands of genes simultaneously has seen phenomenal growth over the past five years. Although commercial sources of microarrays exist, investigators wanting more flexibility in the genes represented on the array will turn to in-house production. The creation and use of cDNA microarrays is a complicated process that generates an enormous amount of information. Effective data management of this information is essential to efficiently access, analyze, troubleshoot and evaluate the microarray experiments. We have developed a distributable software package designed to track and store the various pieces of data generated by a cDNA microarray facility. This includes the clone collection storage data, annotation data, workflow queues, microarray data, data repositories, sample submission information, and project/investigator information. This application was designed using a 3-tier client server model. The data access layer (1st tier) contains the relational database system tuned to support a large number of transactions. The data services layer (2nd tier) is a distributed COM server with full database transaction support. The application layer (3rd tier) is an internet based user interface that contains both client and server side code for dynamic interactions with the user. This software is freely available to academic institutions and non-profit organizations at http://www.genomics.mcg.edu/niddkbtc.

  20. The Brazilian Portuguese Lexicon: An Instrument for Psycholinguistic Research

    PubMed Central

    Estivalet, Gustavo L.; Meunier, Fanny

    2015-01-01

    In this article, we present the Brazilian Portuguese Lexicon, a new word-based corpus for psycholinguistic and computational linguistic research in Brazilian Portuguese. We describe the corpus development and the specific characteristics of the internet site and database for user access. We also perform distributional analyses of the corpus and comparisons to other current databases. Our main objective was to provide a large, reliable, and useful word-based corpus with a dynamic, easy-to-use, and intuitive interface with free internet access for word and word-criteria searches. We used the Núcleo Interinstitucional de Linguística Computacional’s corpus as the basic data source and developed the Brazilian Portuguese Lexicon by deriving and adding metalinguistic and psycholinguistic information about Brazilian Portuguese words. We obtained a final corpus with more than 30 million word tokens, 215 thousand word types and 25 categories of information about each word. This corpus was made available on the internet via a free-access site with two search engines: a simple search and a complex search. The simple engine basically searches for a list of words, while the complex engine accepts all types of criteria in the corpus categories. The output result presents all entries found in the corpus with the criteria specified in the input search and can be downloaded as a .csv file. We created a module in the results that delivers basic statistics about each search. The Brazilian Portuguese Lexicon also provides a pseudoword engine and specific tools for linguistic and statistical analysis. Therefore, the Brazilian Portuguese Lexicon is a convenient instrument for stimulus search, selection, control, and manipulation in psycholinguistic experiments, and it is also a powerful database for computational linguistics research and language modeling related to lexicon distribution, functioning, and behavior. PMID:26630138

  1. Craters of the Pluto-Charon system

    NASA Astrophysics Data System (ADS)

    Robbins, Stuart J.; Singer, Kelsi N.; Bray, Veronica J.; Schenk, Paul; Lauer, Tod R.; Weaver, Harold A.; Runyon, Kirby; McKinnon, William B.; Beyer, Ross A.; Porter, Simon; White, Oliver L.; Hofgartner, Jason D.; Zangari, Amanda M.; Moore, Jeffrey M.; Young, Leslie A.; Spencer, John R.; Binzel, Richard P.; Buie, Marc W.; Buratti, Bonnie J.; Cheng, Andrew F.; Grundy, William M.; Linscott, Ivan R.; Reitsema, Harold J.; Reuter, Dennis C.; Showalter, Mark R.; Tyler, G. Len; Olkin, Catherine B.; Ennico, Kimberly S.; Stern, S. Alan; New Horizons Lorri, Mvic Instrument Teams

    2017-05-01

    NASA's New Horizons flyby mission of the Pluto-Charon binary system and its four moons provided humanity with its first spacecraft-based look at a large Kuiper Belt Object beyond Triton. Excluding this system, multiple Kuiper Belt Objects (KBOs) have been observed for only 20 years from Earth, and the KBO size distribution is unconstrained except among the largest objects. Because small KBOs will remain beyond the capabilities of ground-based observatories for the foreseeable future, one of the best ways to constrain the small KBO population is to examine the craters they have made on the Pluto-Charon system. The first step to understanding the crater population is to map it. In this work, we describe the steps undertaken to produce a robust crater database of impact features on Pluto, Charon, and their two largest moons, Nix and Hydra. These include an examination of different types of images and image processing, and we present an analysis of variability among the crater mapping team, where crater diameters were found to average ± 10% uncertainty across all sizes measured (∼0.5-300 km). We also present a few basic analyses of the crater databases, finding that Pluto's craters' differential size-frequency distribution across the encounter hemisphere has a power-law slope of approximately -3.1 ± 0.1 over diameters D ≈ 15-200 km, and Charon's has a slope of -3.0 ± 0.2 over diameters D ≈ 10-120 km; it is significantly shallower on both bodies at smaller diameters. We also better quantify evidence of resurfacing evidenced by Pluto's craters in contrast with Charon's. With this work, we are also releasing our database of potential and probable impact craters: 5287 on Pluto, 2287 on Charon, 35 on Nix, and 6 on Hydra.
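    A generic sketch of estimating a differential size-frequency slope by a log-log fit over logarithmic diameter bins, as a rough analogue of the slopes quoted above; the binning, lack of area normalization, and synthetic -3-slope population are assumptions, not the New Horizons team's analysis pipeline.

```python
import numpy as np

def differential_sfd_slope(diameters_km, d_min, d_max, bins_per_decade=10):
    """Estimate the power-law slope of the differential crater size-frequency
    distribution by a log-log fit over logarithmic diameter bins.
    """
    d = np.asarray(diameters_km, dtype=float)
    d = d[(d >= d_min) & (d <= d_max)]
    edges = np.logspace(np.log10(d_min), np.log10(d_max),
                        int(np.log10(d_max / d_min) * bins_per_decade) + 1)
    counts, _ = np.histogram(d, bins=edges)
    widths = np.diff(edges)
    centers = np.sqrt(edges[:-1] * edges[1:])        # geometric bin centers
    ok = counts > 0
    slope, _ = np.polyfit(np.log10(centers[ok]), np.log10(counts[ok] / widths[ok]), 1)
    return slope

# Synthetic crater population with a -3 differential slope, as a smoke test.
rng = np.random.default_rng(5)
u = rng.random(20_000)
d_synth = 10.0 * (1 - u) ** (-1 / 2.0)   # cumulative slope -2 => differential slope -3
print(f"fitted differential slope: {differential_sfd_slope(d_synth, 15, 200):.2f}")
```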

  2. The Brazilian Portuguese Lexicon: An Instrument for Psycholinguistic Research.

    PubMed

    Estivalet, Gustavo L; Meunier, Fanny

    2015-01-01

    In this article, we present the Brazilian Portuguese Lexicon, a new word-based corpus for psycholinguistic and computational linguistic research in Brazilian Portuguese. We describe the corpus development and the specific characteristics of the internet site and database for user access. We also perform distributional analyses of the corpus and comparisons to other current databases. Our main objective was to provide a large, reliable, and useful word-based corpus with a dynamic, easy-to-use, and intuitive interface with free internet access for word and word-criteria searches. We used the Núcleo Interinstitucional de Linguística Computacional's corpus as the basic data source and developed the Brazilian Portuguese Lexicon by deriving and adding metalinguistic and psycholinguistic information about Brazilian Portuguese words. We obtained a final corpus with more than 30 million word tokens, 215 thousand word types and 25 categories of information about each word. This corpus was made available on the internet via a free-access site with two search engines: a simple search and a complex search. The simple engine basically searches for a list of words, while the complex engine accepts all types of criteria in the corpus categories. The output result presents all entries found in the corpus with the criteria specified in the input search and can be downloaded as a .csv file. We created a module in the results that delivers basic statistics about each search. The Brazilian Portuguese Lexicon also provides a pseudoword engine and specific tools for linguistic and statistical analysis. Therefore, the Brazilian Portuguese Lexicon is a convenient instrument for stimulus search, selection, control, and manipulation in psycholinguistic experiments, and it is also a powerful database for computational linguistics research and language modeling related to lexicon distribution, functioning, and behavior.

  3. Bovine Genome Database: supporting community annotation and analysis of the Bos taurus genome

    PubMed Central

    2010-01-01

    Background A goal of the Bovine Genome Database (BGD; http://BovineGenome.org) has been to support the Bovine Genome Sequencing and Analysis Consortium (BGSAC) in the annotation and analysis of the bovine genome. We were faced with several challenges, including the need to maintain consistent quality despite diversity in annotation expertise in the research community, the need to maintain consistent data formats, and the need to minimize the potential duplication of annotation effort. With new sequencing technologies allowing many more eukaryotic genomes to be sequenced, the demand for collaborative annotation is likely to increase. Here we present our approach, challenges and solutions facilitating a large distributed annotation project. Results and Discussion BGD has provided annotation tools that supported 147 members of the BGSAC in contributing 3,871 gene models over a fifteen-week period, and these annotations have been integrated into the bovine Official Gene Set. Our approach has been to provide an annotation system, which includes a BLAST site, multiple genome browsers, an annotation portal, and the Apollo Annotation Editor configured to connect directly to our Chado database. In addition to implementing and integrating components of the annotation system, we have performed computational analyses to create gene evidence tracks and a consensus gene set, which can be viewed on individual gene pages at BGD. Conclusions We have provided annotation tools that alleviate challenges associated with distributed annotation. Our system provides a consistent set of data to all annotators and eliminates the need for annotators to format data. Involving the bovine research community in genome annotation has allowed us to leverage expertise in various areas of bovine biology to provide biological insight into the genome sequence. PMID:21092105

  4. Craters of the Pluto-Charon System

    NASA Technical Reports Server (NTRS)

    Robbins, Stuart J.; Singer, Kelsi N.; Bray, Veronica J.; Schenk, Paul; Lauer, Todd R.; Weaver, Harold A.; Runyon, Kirby; Mckinnon, William B.; Beyer, Ross A.; Porter, Simon

    2016-01-01

    NASA's New Horizons flyby mission of the Pluto-Charon binary system and its four moons provided humanity with its first spacecraft-based look at a large Kuiper Belt Object beyond Triton. Excluding this system, multiple Kuiper Belt Objects (KBOs) have been observed for only 20 years from Earth, and the KBO size distribution is unconstrained except among the largest objects. Because small KBOs will remain beyond the capabilities of ground-based observatories for the foreseeable future, one of the best ways to constrain the small KBO population is to examine the craters they have made on the Pluto-Charon system. The first step to understanding the crater population is to map it. In this work, we describe the steps undertaken to produce a robust crater database of impact features on Pluto, Charon, and their two largest moons, Nix and Hydra. These include an examination of different types of images and image processing, and we present an analysis of variability among the crater mapping team, where crater diameters were found to average +/-10% uncertainty across all sizes measured (approx.0.5-300 km). We also present a few basic analyses of the crater databases, finding that Pluto's craters' differential size-frequency distribution across the encounter hemisphere has a power-law slope of approximately -3.1 +/- 0.1 over diameters D approx. = 15-200 km, and Charon's has a slope of -3.0 +/- 0.2 over diameters D approx. = 10-120 km; it is significantly shallower on both bodies at smaller diameters. We also better quantify evidence of resurfacing evidenced by Pluto's craters in contrast with Charon's. With this work, we are also releasing our database of potential and probable impact craters: 5287 on Pluto, 2287 on Charon, 35 on Nix, and 6 on Hydra.

  5. Digital Image Support in the ROADNet Real-time Monitoring Platform

    NASA Astrophysics Data System (ADS)

    Lindquist, K. G.; Hansen, T. S.; Newman, R. L.; Vernon, F. L.; Nayak, A.; Foley, S.; Fricke, T.; Orcutt, J.; Rajasekar, A.

    2004-12-01

    The ROADNet real-time monitoring infrastructure has allowed researchers to integrate geophysical monitoring data from a wide variety of signal domains. Antelope-based data transport, relational-database buffering and archiving, backup/replication/archiving through the Storage Resource Broker, and a variety of web-based distribution tools create a powerful monitoring platform. In this work we discuss our use of the ROADNet system for the collection and processing of digital image data. Remote cameras have been deployed at approximately 32 locations as of September 2004, including the SDSU Santa Margarita Ecological Reserve, the Imperial Beach pier, and the Pinon Flats geophysical observatory. Fire monitoring imagery has been obtained through a connection to the HPWREN project. Near-real-time images obtained from the R/V Roger Revelle include records of seafloor operations by the JASON submersible, as part of a maintenance mission for the H2O underwater seismic observatory. We discuss acquisition mechanisms and the packet architecture for image transport via Antelope orbservers, including multi-packet support for arbitrarily large images. Relational database storage supports archiving of timestamped images, image-processing operations, grouping of related images and cameras, support for motion-detect triggers, thumbnail images, pre-computed video frames, support for time-lapse movie generation and storage of time-lapse movies. Available ROADNet monitoring tools include both orbserver-based display of incoming real-time images and web-accessible searching and distribution of images and movies driven by the relational database (http://mercali.ucsd.edu/rtapps/rtimbank.php). An extension to the Kepler Scientific Workflow System also allows real-time image display via the Ptolemy project. Custom time-lapse movies may be made from the ROADNet web pages.

  6. High performance semantic factoring of giga-scale semantic graph databases.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    al-Saffar, Sinan; Adolf, Bob; Haglin, David

    2010-10-01

    As semantic graph database technology grows to address components ranging from extant large triple stores to SPARQL endpoints over SQL-structured relational databases, it will become increasingly important to be able to bring high performance computational resources to bear on their analysis, interpretation, and visualization, especially with respect to their innate semantic structure. Our research group built a novel high performance hybrid system comprising computational capability for semantic graph database processing utilizing the large multithreaded architecture of the Cray XMT platform, conventional clusters, and large data stores. In this paper we describe that architecture and present the results of deploying it for the analysis of the Billion Triple dataset with respect to its semantic factors, including basic properties, connected components, namespace interaction, and typed paths.

  7. Reverse screening methods to search for the protein targets of chemopreventive compounds

    NASA Astrophysics Data System (ADS)

    Huang, Hongbin; Zhang, Guigui; Zhou, Yuquan; Lin, Chenru; Chen, Suling; Lin, Yutong; Mai, Shangkang; Huang, Zunnan

    2018-05-01

    This article is a systematic review of reverse screening methods used to search for the protein targets of chemopreventive compounds or drugs. Typical chemopreventive compounds include components of traditional Chinese medicine, natural compounds and Food and Drug Administration (FDA)-approved drugs. Such compounds are somewhat selective but are predisposed to bind multiple protein targets distributed throughout diverse signaling pathways in human cells. In contrast to conventional virtual screening, which identifies the ligands of a targeted protein from a compound database, reverse screening is used to identify the potential targets or unintended targets of a given compound from a large number of receptors by examining their known ligands or crystal structures. This method, also known as in silico or computational target fishing, is highly valuable for discovering the target receptors of query molecules from terrestrial or marine natural products, exploring the molecular mechanisms of chemopreventive compounds, finding alternative indications of existing drugs by drug repositioning, and detecting adverse drug reactions and drug toxicity. Reverse screening can be divided into three major groups: shape screening, pharmacophore screening and reverse docking. Several large software packages, such as Schrödinger and Discovery Studio; typical software/network services such as ChemMapper, PharmMapper, idTarget and INVDOCK; and practical databases of known target ligands and receptor crystal structures, such as ChEMBL, BindingDB and the Protein Data Bank (PDB), are available for use in these computational methods. Different programs, online services and databases have different applications and constraints. Here, we conducted a systematic analysis and multilevel classification of the computational programs, online services and compound libraries available for shape screening, pharmacophore screening and reverse docking to enable non-specialist users to quickly learn and grasp the types of calculations used in protein target fishing. In addition, we review the main features of these methods, programs and databases and provide a variety of examples illustrating the application of one or a combination of reverse screening methods for accurate target prediction.
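
    As a concrete hint of what the ligand-based (shape/similarity) branch of reverse screening does, the sketch below ranks candidate targets by the Tanimoto similarity between a query compound and each target's known ligands, using RDKit Morgan fingerprints. The target names and SMILES strings are invented placeholders, not records from ChEMBL, BindingDB or the PDB, and real reverse screening pipelines add 3D shape, pharmacophore or docking scores on top of this.

      from rdkit import Chem, DataStructs
      from rdkit.Chem import AllChem

      # Hypothetical known-ligand sets keyed by target name (placeholders only).
      target_ligands = {
          "TargetA": ["CCOC(=O)c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O"],
          "TargetB": ["CN1CCC[C@H]1c1cccnc1", "c1ccc2[nH]ccc2c1"],
      }

      def fingerprint(smiles):
          mol = Chem.MolFromSmiles(smiles)
          return AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)

      def rank_targets(query_smiles, target_ligands):
          """Rank targets by the best Tanimoto similarity between the query
          compound and any of that target's known ligands."""
          q = fingerprint(query_smiles)
          scores = {}
          for target, smiles_list in target_ligands.items():
              sims = [DataStructs.TanimotoSimilarity(q, fingerprint(s))
                      for s in smiles_list]
              scores[target] = max(sims)
          return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

      # Query compound (also a placeholder); highest-scoring target comes first.
      print(rank_targets("CC(=O)Oc1ccccc1C(=O)OC", target_ligands))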

  8. Visualising biological data: a semantic approach to tool and database integration

    PubMed Central

    Pettifer, Steve; Thorne, David; McDermott, Philip; Marsh, James; Villéger, Alice; Kell, Douglas B; Attwood, Teresa K

    2009-01-01

    Motivation: In the biological sciences, the need to analyse vast amounts of information has become commonplace. Such large-scale analyses often involve drawing together data from a variety of different databases, held remotely on the internet or locally on in-house servers. Supporting these tasks are ad hoc collections of data-manipulation tools, scripting languages and visualisation software, which are often combined in arcane ways to create cumbersome systems that have been customised for a particular purpose, and are consequently not readily adaptable to other uses. For many day-to-day bioinformatics tasks, the sizes of current databases, and the scale of the analyses necessary, now demand increasing levels of automation; nevertheless, the unique experience and intuition of human researchers is still required to interpret the end results in any meaningful biological way. Putting humans in the loop requires tools to support real-time interaction with these vast and complex data-sets. Numerous tools do exist for this purpose, but many do not have optimal interfaces, most are effectively isolated from other tools and databases owing to incompatible data formats, and many have limited real-time performance when applied to realistically large data-sets: much of the user's cognitive capacity is therefore focused on controlling the software and manipulating esoteric file formats rather than on performing the research. Methods: To confront these issues, harnessing expertise in human-computer interaction (HCI), high-performance rendering and distributed systems, and guided by bioinformaticians and end-user biologists, we are building reusable software components that, together, create a toolkit that is both architecturally sound from a computing point of view, and addresses both user and developer requirements. Key to the system's usability is its direct exploitation of semantics, which, crucially, gives individual components knowledge of their own functionality and allows them to interoperate seamlessly, removing many of the existing barriers and bottlenecks from standard bioinformatics tasks. Results: The toolkit, named Utopia, is freely available from http://utopia.cs.man.ac.uk/. PMID:19534744

  9. Reverse Screening Methods to Search for the Protein Targets of Chemopreventive Compounds.

    PubMed

    Huang, Hongbin; Zhang, Guigui; Zhou, Yuquan; Lin, Chenru; Chen, Suling; Lin, Yutong; Mai, Shangkang; Huang, Zunnan

    2018-01-01

    This article is a systematic review of reverse screening methods used to search for the protein targets of chemopreventive compounds or drugs. Typical chemopreventive compounds include components of traditional Chinese medicine, natural compounds and Food and Drug Administration (FDA)-approved drugs. Such compounds are somewhat selective but are predisposed to bind multiple protein targets distributed throughout diverse signaling pathways in human cells. In contrast to conventional virtual screening, which identifies the ligands of a targeted protein from a compound database, reverse screening is used to identify the potential targets or unintended targets of a given compound from a large number of receptors by examining their known ligands or crystal structures. This method, also known as in silico or computational target fishing, is highly valuable for discovering the target receptors of query molecules from terrestrial or marine natural products, exploring the molecular mechanisms of chemopreventive compounds, finding alternative indications of existing drugs by drug repositioning, and detecting adverse drug reactions and drug toxicity. Reverse screening can be divided into three major groups: shape screening, pharmacophore screening and reverse docking. Several large software packages, such as Schrödinger and Discovery Studio; typical software/network services such as ChemMapper, PharmMapper, idTarget, and INVDOCK; and practical databases of known target ligands and receptor crystal structures, such as ChEMBL, BindingDB, and the Protein Data Bank (PDB), are available for use in these computational methods. Different programs, online services and databases have different applications and constraints. Here, we conducted a systematic analysis and multilevel classification of the computational programs, online services and compound libraries available for shape screening, pharmacophore screening and reverse docking to enable non-specialist users to quickly learn and grasp the types of calculations used in protein target fishing. In addition, we review the main features of these methods, programs and databases and provide a variety of examples illustrating the application of one or a combination of reverse screening methods for accurate target prediction.

  10. Reverse Screening Methods to Search for the Protein Targets of Chemopreventive Compounds

    PubMed Central

    Huang, Hongbin; Zhang, Guigui; Zhou, Yuquan; Lin, Chenru; Chen, Suling; Lin, Yutong; Mai, Shangkang; Huang, Zunnan

    2018-01-01

    This article is a systematic review of reverse screening methods used to search for the protein targets of chemopreventive compounds or drugs. Typical chemopreventive compounds include components of traditional Chinese medicine, natural compounds and Food and Drug Administration (FDA)-approved drugs. Such compounds are somewhat selective but are predisposed to bind multiple protein targets distributed throughout diverse signaling pathways in human cells. In contrast to conventional virtual screening, which identifies the ligands of a targeted protein from a compound database, reverse screening is used to identify the potential targets or unintended targets of a given compound from a large number of receptors by examining their known ligands or crystal structures. This method, also known as in silico or computational target fishing, is highly valuable for discovering the target receptors of query molecules from terrestrial or marine natural products, exploring the molecular mechanisms of chemopreventive compounds, finding alternative indications of existing drugs by drug repositioning, and detecting adverse drug reactions and drug toxicity. Reverse screening can be divided into three major groups: shape screening, pharmacophore screening and reverse docking. Several large software packages, such as Schrödinger and Discovery Studio; typical software/network services such as ChemMapper, PharmMapper, idTarget, and INVDOCK; and practical databases of known target ligands and receptor crystal structures, such as ChEMBL, BindingDB, and the Protein Data Bank (PDB), are available for use in these computational methods. Different programs, online services and databases have different applications and constraints. Here, we conducted a systematic analysis and multilevel classification of the computational programs, online services and compound libraries available for shape screening, pharmacophore screening and reverse docking to enable non-specialist users to quickly learn and grasp the types of calculations used in protein target fishing. In addition, we review the main features of these methods, programs and databases and provide a variety of examples illustrating the application of one or a combination of reverse screening methods for accurate target prediction. PMID:29868550

  11. Biomineralization of Schlumbergerella floresiana, a significant carbonate-producing benthic foraminifer.

    PubMed

    Sabbatini, A; Bédouet, L; Marie, A; Bartolini, A; Landemarre, L; Weber, M X; Gusti Ngurah Kade Mahardika, I; Berland, S; Zito, F; Vénec-Peyré, M-T

    2014-07-01

    Most foraminifera that produce a shell are efficient biomineralizers. We analyzed the calcitic shell of the large tropical benthic foraminifer Schlumbergerella floresiana. We found a suite of macromolecules containing many charged and polar amino acids and glycine that are also abundant in biomineralization proteins of other phyla. As neither genomic nor transcriptomic data are available for foraminiferal biomineralization yet, de novo-generated sequences, obtained from organic matrices submitted to MS BLAST database searches, led to the characterization of 156 peptides. Very few homologous proteins were matched in the proteomic database, implying that the peptides are derived from unknown proteins present in the foraminiferal organic matrices. The amino acid distribution of these peptides was queried against the UniProt database and the mollusk UniProt database for comparison. The mollusks compose a well-studied phylum that yields a large variety of biomineralization proteins. These results showed that proteins extracted from S. floresiana shells contained sequences enriched with glycine, alanine, and proline, making a set of residues that provided a signature unique to foraminifera. Three of the de novo peptides exhibited sequence similarities to peptides found in proteins such as pre-collagen-P and a group of P-type ATPases including a calcium-transporting ATPase. Surprisingly, the peptide that was most similar to the collagen-like protein was a glycine-rich peptide reported from the test and spine proteome of sea urchin. The molecules, identified by matrix-assisted laser desorption ionization-time of flight mass spectrometry analyses, included acid-soluble N-glycoproteins with sugar moieties represented by high-mannose-type glycans and carbohydrates. Describing the nature of the proteins, and associated molecules in the skeletal structure of living foraminifera, can elucidate the biomineralization mechanisms of these major carbonate producers in marine ecosystems. As fossil foraminifera provide important paleoenvironmental and paleoclimatic information, a better understanding of biomineralization in these organisms will have far-reaching impacts. © 2014 John Wiley & Sons Ltd.

  12. Visualising biological data: a semantic approach to tool and database integration.

    PubMed

    Pettifer, Steve; Thorne, David; McDermott, Philip; Marsh, James; Villéger, Alice; Kell, Douglas B; Attwood, Teresa K

    2009-06-16

    In the biological sciences, the need to analyse vast amounts of information has become commonplace. Such large-scale analyses often involve drawing together data from a variety of different databases, held remotely on the internet or locally on in-house servers. Supporting these tasks are ad hoc collections of data-manipulation tools, scripting languages and visualisation software, which are often combined in arcane ways to create cumbersome systems that have been customized for a particular purpose, and are consequently not readily adaptable to other uses. For many day-to-day bioinformatics tasks, the sizes of current databases, and the scale of the analyses necessary, now demand increasing levels of automation; nevertheless, the unique experience and intuition of human researchers is still required to interpret the end results in any meaningful biological way. Putting humans in the loop requires tools to support real-time interaction with these vast and complex data-sets. Numerous tools do exist for this purpose, but many do not have optimal interfaces, most are effectively isolated from other tools and databases owing to incompatible data formats, and many have limited real-time performance when applied to realistically large data-sets: much of the user's cognitive capacity is therefore focused on controlling the software and manipulating esoteric file formats rather than on performing the research. To confront these issues, harnessing expertise in human-computer interaction (HCI), high-performance rendering and distributed systems, and guided by bioinformaticians and end-user biologists, we are building reusable software components that, together, create a toolkit that is both architecturally sound from a computing point of view, and addresses both user and developer requirements. Key to the system's usability is its direct exploitation of semantics, which, crucially, gives individual components knowledge of their own functionality and allows them to interoperate seamlessly, removing many of the existing barriers and bottlenecks from standard bioinformatics tasks. The toolkit, named Utopia, is freely available from http://utopia.cs.man.ac.uk/.

  13. Internet Portal For A Distributed Management of Groundwater

    NASA Astrophysics Data System (ADS)

    Meissner, U. F.; Rueppel, U.; Gutzke, T.; Seewald, G.; Petersen, M.

    The management of groundwater resources for the supply of German cities and suburban areas has become a matter of public interest during recent years. Negative headlines in the Rhein-Main-Area dealt with cracks in buildings as well as damaged woodlands and inundated agricultural areas as an effect of varying groundwater levels. Usually a holistic management of groundwater resources does not exist because of the complexity of the geological system, the large number of involved groups with divergent interests, and a lack of essential information. The development of a network-based information system for efficient groundwater management was the target of the project "Grundwasser-Online" [1]. The management of groundwater resources has to take into account various hydrogeological, climatic, water-economical, chemical and biological interrelations [2]. Thus, the traditional approaches to information retrieval, which are characterised by high personnel and time expenditure, are not sufficient. Furthermore, the efficient control of groundwater cultivation requires direct communication between the different water supply companies, the consultant engineers, the scientists, the governmental agencies and the public, using computer networks. The presented groundwater information system consists of different components, especially for the collection, storage, evaluation and visualisation of groundwater-relevant information. Network-based technologies are used [3]. For the collection of time-dependent groundwater-relevant information, modern technologies of Mobile Computing have been analysed in order to provide an integrated approach to the management of large groundwater systems. The aggregated information is stored within a distributed geo-scientific database system which enables a direct integration of simulation programs for the evaluation of interactions in groundwater systems. Thus, even a prognosis for the evolution of groundwater states can be given. Reports are generated automatically using appropriate technologies. The visualisation of geo-scientific databases in the internet, considering their geographic reference, is performed with internet map servers. For the communication of the map server with the underlying geo-scientific database, it is necessary that the requested data can be filtered interactively in the internet browser using chronological and logical criteria. With regard to public use, the security aspects of the described distributed system are of major importance. Therefore, security methods for the modelling of access rights in combination with digital signatures have been analysed and implemented in order to provide secure data exchange and communication between the different partners in the network.

  14. Income distribution patterns from a complete social security database

    NASA Astrophysics Data System (ADS)

    Derzsy, N.; Néda, Z.; Santos, M. A.

    2012-11-01

    We analyze the income distribution of employees for 9 consecutive years (2001-2009) using a complete social security database for an economically important district of Romania. The database contains detailed information on more than half a million taxpayers, including their monthly salaries from all employers where they worked. Besides studying the characteristic distribution functions in the high and low/medium income limits, the database allows a detailed dynamical study by following the time evolution of the taxpayers' income. To our knowledge, this is the first extensive study of this kind (a previous Japanese taxpayers survey was limited to two years). In the high income limit we prove once again the validity of Pareto’s law, obtaining a perfect scaling on four orders of magnitude in the rank for all the studied years. The obtained Pareto exponents are quite stable with values around α≈2.5, in spite of the fact that during this period the economy developed rapidly and a financial-economic crisis hit Romania in 2007-2008. For the low and medium income category we confirmed the exponential-type income distribution. Following the income of employees in time, we have found that the top limit of the income distribution is a highly dynamical region with strong fluctuations in the rank. In this region, the observed dynamics is consistent with a multiplicative random growth hypothesis. Contrary to previous results obtained for Japanese employees, we find that the logarithmic growth-rate is not independent of the income.
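
    The Pareto behaviour reported for the high-income tail can be checked with a rank-size regression: if P(income > x) ~ x^(-alpha), then log(rank) versus log(income) is linear with slope -alpha. The sketch below runs that fit on a synthetic Pareto sample with alpha = 2.5; it stands in for, and makes no use of, the social security data analysed in the paper.

      import numpy as np

      def pareto_exponent_from_tail(incomes, tail_fraction=0.05):
          """Estimate the Pareto exponent alpha from the upper tail of an income
          sample via a log-log rank-size regression (rank ~ income^(-alpha))."""
          x = np.sort(np.asarray(incomes))[::-1]        # incomes in descending order
          n_tail = max(int(len(x) * tail_fraction), 10)
          tail = x[:n_tail]
          ranks = np.arange(1, n_tail + 1)
          slope, _ = np.polyfit(np.log(tail), np.log(ranks), 1)
          return -slope

      # Synthetic incomes drawn from a classical Pareto law with alpha = 2.5 and a
      # minimum income of 5000 (arbitrary units), purely for illustration.
      rng = np.random.default_rng(1)
      incomes = 5000.0 * (1.0 + rng.pareto(2.5, size=200_000))
      print(pareto_exponent_from_tail(incomes))          # close to 2.5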

  15. HZE reactions and data-base development

    NASA Technical Reports Server (NTRS)

    Townsend, Lawrence W.; Cucinotta, Francis A.; Wilson, John W.

    1993-01-01

    The primary cosmic rays are dispersed over a large range of linear energy transfer (LET) values and their distribution over LET is a determinant of biological response. This LET distribution is modified by radiation shielding thickness and shield material composition. The current uncertainties in nuclear cross sections will not allow the composition of the shield material to be distinguished in order to minimize biological risk. An overview of the development of quantum mechanical models of heavy ion reactions will be given and computational results compared with experiments. A second approach is the development of phenomenological models from semi-classical considerations. These models provide the current data base in high charge and energy (HZE) shielding studies. They will be compared with available experimental data. The background material for this lecture will be available as a review document of over 30 years of research at Langley but will include new results obtained over the last year.

  16. Gossip-Based Dissemination

    NASA Astrophysics Data System (ADS)

    Friedman, Roy; Kermarrec, Anne-Marie; Miranda, Hugo; Rodrigues, Luís

    Gossip-based networking has emerged as a viable approach to disseminate information reliably and efficiently in large-scale systems. Initially introduced for database replication [222], the applicability of the approach extends much further now. For example, it has been applied for data aggregation [415], peer sampling [416] and publish/subscribe systems [845]. Gossip-based protocols rely on a periodic peer-wise exchange of information in wired systems. By changing the way each peer is selected for the gossip communication, and which data are exchanged and processed [451], gossip systems can be used to perform different distributed tasks, such as, among others: overlay maintenance, distributed computation, and information dissemination (a collection of papers on gossip can be found in [451]). In a wired setting, the peer sampling service, allowing for a random or specific peer selection, is often provided as an independent service, able to operate independently from other gossip-based services [416].
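
    The round structure sketched in this entry (each informed node periodically picks random peers, here a crude stand-in for a peer sampling service, and pushes what it knows) is easy to simulate. The toy below is illustrative only; the fan-out and termination rule are arbitrary choices, not parameters of the cited protocols.

      import random

      def gossip_rounds(n_nodes, fanout=2, seed=42):
          """Simulate push gossip: in each round every informed node forwards the
          rumour to `fanout` peers drawn uniformly at random (a crude stand-in for
          a peer sampling service). Returns the number of informed nodes per round."""
          rng = random.Random(seed)
          informed = {0}                      # node 0 injects the rumour
          history = [len(informed)]
          while len(informed) < n_nodes:
              newly = set()
              for _node in informed:
                  for peer in rng.sample(range(n_nodes), fanout):
                      if peer not in informed:
                          newly.add(peer)
              if not newly:                   # no progress this round (unlikely)
                  break
              informed |= newly
              history.append(len(informed))
          return history

      # Dissemination completes in roughly O(log N) rounds for a 10,000-node system.
      print(gossip_rounds(10_000))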

  17. Aerosol Optical Depth Distribution in Extratropical Cyclones over the Northern Hemisphere Oceans

    NASA Technical Reports Server (NTRS)

    Naud, Catherine M.; Posselt, Derek J.; van den Heever, Susan C.

    2016-01-01

    Using Moderate Resolution Imaging Spectroradiometer data and an extratropical cyclone database, the climatological distribution of aerosol optical depth (AOD) in extratropical cyclones is explored based solely on observations. Cyclone-centered composites of aerosol optical depth are constructed for the Northern Hemisphere mid-latitude ocean regions, and their seasonal variations are examined. These composites are found to be qualitatively stable when the impact of clouds and surface insolation or brightness is tested. The larger AODs occur in spring and summer and are preferentially found in the warm frontal and in the post-cold frontal regions in all seasons. The fine mode aerosols dominate the cold sector AODs, but the coarse mode aerosols display large AODs in the warm sector. These differences between the aerosol modes are related to the varying source regions of the aerosols and could potentially have different impacts on cloud and precipitation within the cyclones.

  18. Effect of extreme data loss on heart rate signals quantified by entropy analysis

    NASA Astrophysics Data System (ADS)

    Li, Yu; Wang, Jun; Li, Jin; Liu, Dazhao

    2015-02-01

    The phenomenon of data loss always occurs in the analysis of large databases. Maintaining the stability of analysis results in the event of data loss is very important. In this paper, we used a segmentation approach to generate synthetic signals by randomly removing data segments from the original signal according to Gaussian and exponential distributions. Then, the logistic map is used as verification. Finally, two methods of measuring entropy, base-scale entropy and approximate entropy, are comparatively analyzed. Our results show the following: (1) Two key parameters, the percentage and the average length of removed data segments, can change the sequence complexity according to logistic map testing. (2) The results of base-scale entropy analysis remain notably stable, as this measure is not sensitive to data loss. (3) The loss percentage of HRV signals should be controlled below p = 30%, so that the analysis can still provide useful information in clinical applications.
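
    The testing idea described above, removing a controlled fraction of segments from a logistic-map series and watching how an entropy measure responds, can be sketched as follows. The approximate-entropy routine is a plain textbook implementation and the segment-length settings are arbitrary; this is not the authors' code, and base-scale entropy is not reproduced here.

      import numpy as np

      def logistic_map(n, r=4.0, x0=0.4):
          x = np.empty(n)
          x[0] = x0
          for i in range(1, n):
              x[i] = r * x[i - 1] * (1.0 - x[i - 1])
          return x

      def drop_segments(signal, loss_fraction, mean_seg_len, rng):
          """Randomly delete segments (exponentially distributed lengths) until
          roughly `loss_fraction` of the samples has been removed."""
          keep = np.ones(len(signal), dtype=bool)
          target = int(loss_fraction * len(signal))
          while (~keep).sum() < target:
              start = rng.integers(0, len(signal))
              length = max(1, int(rng.exponential(mean_seg_len)))
              keep[start:start + length] = False
          return signal[keep]

      def approximate_entropy(u, m=2, r_tol=0.2):
          """Plain O(N^2) approximate entropy with tolerance r_tol * std(u)."""
          u = np.asarray(u)
          r = r_tol * np.std(u)
          def phi(m):
              emb = np.array([u[i:i + m] for i in range(len(u) - m + 1)])
              dist = np.max(np.abs(emb[:, None, :] - emb[None, :, :]), axis=2)
              c = (dist <= r).sum(axis=1) / (len(u) - m + 1)
              return np.mean(np.log(c))
          return phi(m) - phi(m + 1)

      rng = np.random.default_rng(3)
      x = logistic_map(1000)
      for p in (0.0, 0.1, 0.3):   # loss percentages: none, 10%, 30%
          print(p, approximate_entropy(drop_segments(x, p, mean_seg_len=20, rng=rng)))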

  19. Complexity of the international agro-food trade network and its impact on food safety.

    PubMed

    Ercsey-Ravasz, Mária; Toroczkai, Zoltán; Lakner, Zoltán; Baranyi, József

    2012-01-01

    With the world's population now in excess of 7 billion, it is vital to ensure the chemical and microbiological safety of our food, while maintaining the sustainability of its production, distribution and trade. Using UN databases, here we show that the international agro-food trade network (IFTN), with nodes and edges representing countries and import-export fluxes, respectively, has evolved into a highly heterogeneous, complex supply-chain network. Seven countries form the core of the IFTN, with high values of betweenness centrality and each trading with over 77% of all the countries in the world. Graph theoretical analysis and a dynamic food flux model show that the IFTN provides a vehicle suitable for the fast distribution of potential contaminants but unsuitable for tracing their origin. In particular, we show that high values of node betweenness and vulnerability correlate well with recorded large food poisoning outbreaks.
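
    The betweenness-centrality measure used to single out the core countries is straightforward to compute on any weighted trade graph. The sketch below uses networkx on a tiny invented edge list (the country codes and flux values are placeholders, not the UN trade data), treating the inverse of the flux as the link distance.

      import networkx as nx

      # Hypothetical, heavily simplified trade fluxes (placeholders only).
      edges = [
          ("NLD", "DEU", 120.0), ("NLD", "FRA", 80.0), ("DEU", "USA", 200.0),
          ("USA", "BRA", 90.0), ("BRA", "ARG", 40.0), ("FRA", "USA", 150.0),
          ("DEU", "CHN", 170.0), ("CHN", "USA", 300.0),
      ]

      g = nx.DiGraph()
      g.add_weighted_edges_from(edges)

      # Betweenness centrality: fraction of shortest paths passing through a node.
      # Use 1/weight as the distance so that larger fluxes mean "shorter" links.
      for _u, _v, d in g.edges(data=True):
          d["distance"] = 1.0 / d["weight"]

      bc = nx.betweenness_centrality(g, weight="distance")
      for country, score in sorted(bc.items(), key=lambda kv: kv[1], reverse=True):
          print(country, round(score, 3))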

  20. Using Historical Atlas Data to Develop High-Resolution Distribution Models of Freshwater Fishes

    PubMed Central

    Huang, Jian; Frimpong, Emmanuel A.

    2015-01-01

    Understanding the spatial pattern of species distributions is fundamental in biogeography, and conservation and resource management applications. Most species distribution models (SDMs) require or prefer species presence and absence data for adequate estimation of model parameters. However, observations with unreliable or unreported species absences dominate and limit the implementation of SDMs. Presence-only models generally yield less accurate predictions of species distribution, and make it difficult to incorporate spatial autocorrelation. The availability of large amounts of historical presence records for freshwater fishes of the United States provides an opportunity for deriving reliable absences from data reported as presence-only, when sampling was predominantly community-based. In this study, we used boosted regression trees (BRT), logistic regression, and MaxEnt models to assess the performance of a historical metacommunity database with inferred absences, for modeling fish distributions, investigating the effect of model choice and data properties thereby. With models of the distribution of 76 native, non-game fish species of varied traits and rarity attributes in four river basins across the United States, we show that model accuracy depends on data quality (e.g., sample size, location precision), species’ rarity, statistical modeling technique, and consideration of spatial autocorrelation. The cross-validation area under the receiver-operating-characteristic curve (AUC) tended to be high in the spatial presence-absence models at the highest level of resolution for species with large geographic ranges and small local populations. Prevalence affected training but not validation AUC. The key habitat predictors identified and the fish-habitat relationships evaluated through partial dependence plots corroborated most previous studies. The community-based SDM framework broadens our capability to model species distributions by innovatively removing the constraint of lack of species absence data, thus providing a robust prediction of distribution for stream fishes in other regions where historical data exist, and for other taxa (e.g., benthic macroinvertebrates, birds) usually observed by community-based sampling designs. PMID:26075902
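
    Stripped of the ecological detail, the modelling comparison above amounts to fitting presence/absence against habitat predictors and scoring with cross-validated AUC. The scikit-learn sketch below shows that skeleton for a logistic-regression SDM on synthetic data; the predictor names are invented, and the BRT and MaxEnt models compared in the study are not reproduced.

      import numpy as np
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler

      # Synthetic "sites": two invented habitat predictors and a presence/absence
      # response loosely driven by them (placeholder for real survey data).
      rng = np.random.default_rng(7)
      n_sites = 2000
      stream_temp = rng.normal(15.0, 4.0, n_sites)        # hypothetical predictor
      drainage_area = rng.lognormal(3.0, 1.0, n_sites)    # hypothetical predictor
      logit = -4.0 + 0.25 * stream_temp + 0.4 * np.log(drainage_area)
      presence = (rng.random(n_sites) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

      X = np.column_stack([stream_temp, np.log(drainage_area)])
      model = make_pipeline(StandardScaler(), LogisticRegression())

      # 5-fold cross-validated AUC, the same headline metric reported in the study.
      auc = cross_val_score(model, X, presence, cv=5, scoring="roc_auc")
      print("mean cross-validation AUC:", auc.mean().round(3))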

  1. PANDA asymmetric-configuration passive decay heat removal test results

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fischer, O.; Dreier, J.; Aubert, C.

    1997-12-01

    PANDA is a large-scale, low-pressure test facility for investigating passive decay heat removal systems for the next generation of LWRs. In the first series of experiments, PANDA was used to examine the long-term LOCA response of the Passive Containment Cooling System (PCCS) for the General Electric (GE) Simplified Boiling Water Reactor (SBWR). The test objectives include concept demonstration and extension of the database available for qualification of containment codes. Also included is the study of the effects of nonuniform distributions of steam and noncondensable gases in the Dry-well (DW) and in the Suppression Chamber (SC). 3 refs., 9 figs.

  2. High rate information systems - Architectural trends in support of the interdisciplinary investigator

    NASA Technical Reports Server (NTRS)

    Handley, Thomas H., Jr.; Preheim, Larry E.

    1990-01-01

    Data systems requirements in the Earth Observing System (EOS) Space Station Freedom (SSF) eras indicate increasing data volume, increased discipline interplay, higher complexity and broader data integration and interpretation. A response to the needs of the interdisciplinary investigator is proposed, considering the increasing complexity and rising costs of scientific investigation. The EOS Data Information System, conceived to be a widely distributed system with reliable communication links between central processing and the science user community, is described. Details are provided on information architecture, system models, intelligent data management of large complex databases, and standards for archiving ancillary data, using a research library, a laboratory and collaboration services.

  3. The clinical value of large neuroimaging data sets in Alzheimer's disease.

    PubMed

    Toga, Arthur W

    2012-02-01

    Rapid advances in neuroimaging and cyberinfrastructure technologies have brought explosive growth in the Web-based warehousing, availability, and accessibility of imaging data on a variety of neurodegenerative and neuropsychiatric disorders and conditions. There has been a prolific development and emergence of complex computational infrastructures that serve as repositories of databases and provide critical functionalities such as sophisticated image analysis algorithm pipelines and powerful three-dimensional visualization and statistical tools. The statistical and operational advantages of collaborative, distributed team science in the form of multisite consortia push this approach in a diverse range of population-based investigations. Copyright © 2012 Elsevier Inc. All rights reserved.

  4. Morphological classification and spatial distribution of Philippine volcanoes

    NASA Astrophysics Data System (ADS)

    Paguican, E. M. R.; Kervyn, M.; Grosse, P.

    2016-12-01

    The Philippines is an island arc composed of two major blocks: the aseismic Palawan microcontinental block and the Philippine mobile belt. It is bounded by opposing subduction zones, with the left-lateral Philippine Fault running north-south. This setting is ideal for volcano formation and growth, making it one of the best places to study the controls on island arc volcano morphometry and evolution. In this study, we created a database of volcanic edifices and structures identified on the SRTM 30 m digital elevation models (DEM). We computed the morphometry of each edifice using MORVOLC, an IDL code for generating quantitative parameters based on a defined volcano base and DEM. Morphometric results illustrate the large range of sizes and volumes of Philippine volcanoes. Hierarchical classification by principal component analysis distinguishes between large massifs, large cones/sub-cones, small shields/sub-cones, and small cones, based mainly on size (volume, basal width) and steepness (height/basal width ratio, average slopes). Poisson Nearest Neighbor analysis was used to examine the spatial distribution of volcano centroids. Spatial distribution of the different types of volcanoes suggests that large volcanic massifs formed on thickened crust. Although all the volcanic fields and arcs are a response to tectonic activity such as subduction or rifting, only West Luzon, North and South Mindanao, and Eastern Philippines volcanic arcs and Basilan, Macolod, and Maramag volcanic fields present a statistical clustering of volcanic centers. Spatial distribution and preferential alignment of edifices in all volcanic fields confirm that regional structures had some control on their formation. Volcanoes start either as steep cones or as less steep sub-cones and shields. They then grow into large cones, sub-cones and eventually into massifs as eruption focus shifts within the volcano and new eruptive material is deposited on the slopes. Examination of the directions of volcano collapse scars and erosional amphitheater valleys suggests that, during their development, volcano growth is affected by movement of underlying tectonic structures, weight and stability of the growing edifice, structure and composition of the substrata, and intense erosion associated with tropical rainfall.
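
    The classification step described here, principal components on size and steepness parameters followed by a hierarchical grouping into four classes, can be sketched with scikit-learn and SciPy. The morphometric table below is synthetic and the class counts are meaningless; it only illustrates the PCA-plus-dendrogram-cut workflow, not the MORVOLC output for the Philippine edifices.

      import numpy as np
      from scipy.cluster.hierarchy import fcluster, linkage
      from sklearn.decomposition import PCA
      from sklearn.preprocessing import StandardScaler

      # Synthetic morphometry: volume (km^3), basal width (km), height/width ratio,
      # mean slope (deg). Placeholder values, not real MORVOLC measurements.
      rng = np.random.default_rng(11)
      small_cones = np.column_stack([rng.uniform(0.1, 2, 40), rng.uniform(0.5, 3, 40),
                                     rng.uniform(0.15, 0.3, 40), rng.uniform(20, 33, 40)])
      massifs = np.column_stack([rng.uniform(50, 400, 15), rng.uniform(15, 40, 15),
                                 rng.uniform(0.03, 0.1, 15), rng.uniform(5, 15, 15)])
      morphometry = np.vstack([small_cones, massifs])

      # Standardize, reduce with PCA, then cut a Ward dendrogram into 4 classes
      # (e.g. massifs, large cones/sub-cones, small shields/sub-cones, small cones).
      scores = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(morphometry))
      labels = fcluster(linkage(scores, method="ward"), t=4, criterion="maxclust")
      print(np.bincount(labels)[1:])   # number of edifices per morphometric class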

  5. The Ophidia Stack: Toward Large Scale, Big Data Analytics Experiments for Climate Change

    NASA Astrophysics Data System (ADS)

    Fiore, S.; Williams, D. N.; D'Anca, A.; Nassisi, P.; Aloisio, G.

    2015-12-01

    The Ophidia project is a research effort on big data analytics facing scientific data analysis challenges in multiple domains (e.g. climate change). It provides a "datacube-oriented" framework responsible for atomically processing and manipulating scientific datasets, by providing a common way to run distributive tasks on large set of data fragments (chunks). Ophidia provides declarative, server-side, and parallel data analysis, jointly with an internal storage model able to efficiently deal with multidimensional data and a hierarchical data organization to manage large data volumes. The project relies on a strong background on high performance database management and On-Line Analytical Processing (OLAP) systems to manage large scientific datasets. The Ophidia analytics platform provides several data operators to manipulate datacubes (about 50), and array-based primitives (more than 100) to perform data analysis on large scientific data arrays. To address interoperability, Ophidia provides multiple server interfaces (e.g. OGC-WPS). From a client standpoint, a Python interface enables the exploitation of the framework into Python-based eco-systems/applications (e.g. IPython) and the straightforward adoption of a strong set of related libraries (e.g. SciPy, NumPy). The talk will highlight a key feature of the Ophidia framework stack: the "Analytics Workflow Management System" (AWfMS). The Ophidia AWfMS coordinates, orchestrates, optimises and monitors the execution of multiple scientific data analytics and visualization tasks, thus supporting "complex analytics experiments". Some real use cases related to the CMIP5 experiment will be discussed. In particular, with regard to the "Climate models intercomparison data analysis" case study proposed in the EU H2020 INDIGO-DataCloud project, workflows related to (i) anomalies, (ii) trend, and (iii) climate change signal analysis will be presented. Such workflows will be distributed across multiple sites - according to the datasets distribution - and will include intercomparison, ensemble, and outlier analysis. The two-level workflow solution envisioned in INDIGO (coarse grain for distributed tasks orchestration, and fine grain, at the level of a single data analytics cluster instance) will be presented and discussed.

  6. The Raid distributed database system

    NASA Technical Reports Server (NTRS)

    Bhargava, Bharat; Riedl, John

    1989-01-01

    Raid, a robust and adaptable distributed database system for transaction processing (TP), is described. Raid is a message-passing system, with server processes on each site to manage concurrent processing, consistent replicated copies during site failures, and atomic distributed commitment. A high-level layered communications package provides a clean location-independent interface between servers. The latest design of the package delivers messages via shared memory in a configuration with several servers linked into a single process. Raid provides the infrastructure to investigate various methods for supporting reliable distributed TP. Measurements on TP and server CPU time are presented, along with data from experiments on communications software, consistent replicated copy control during site failures, and concurrent distributed checkpointing. A software tool for evaluating the implementation of TP algorithms in an operating-system kernel is proposed.

  7. Statistical Downscaling in Multi-dimensional Wave Climate Forecast

    NASA Astrophysics Data System (ADS)

    Camus, P.; Méndez, F. J.; Medina, R.; Losada, I. J.; Cofiño, A. S.; Gutiérrez, J. M.

    2009-04-01

    Wave climate at a particular site is defined by the statistical distribution of sea state parameters, such as significant wave height, mean wave period, mean wave direction, wind velocity, wind direction and storm surge. Nowadays, long-term time series of these parameters are available from reanalysis databases obtained by numerical models. The Self-Organizing Map (SOM) technique is applied to characterize multi-dimensional wave climate, obtaining the relevant "wave types" spanning the historical variability. This technique summarizes multi-dimension of wave climate in terms of a set of clusters projected in low-dimensional lattice with a spatial organization, providing Probability Density Functions (PDFs) on the lattice. On the other hand, wind and storm surge depend on instantaneous local large-scale sea level pressure (SLP) fields while waves depend on the recent history of these fields (say, 1 to 5 days). Thus, these variables are associated with large-scale atmospheric circulation patterns. In this work, a nearest-neighbors analog method is used to predict monthly multi-dimensional wave climate. This method establishes relationships between the large-scale atmospheric circulation patterns from numerical models (SLP fields as predictors) with local wave databases of observations (monthly wave climate SOM PDFs as predictand) to set up statistical models. A wave reanalysis database, developed by Puertos del Estado (Ministerio de Fomento), is considered as historical time series of local variables. The simultaneous SLP fields calculated by NCEP atmospheric reanalysis are used as predictors. Several applications with different size of sea level pressure grid and with different temporal domain resolution are compared to obtain the optimal statistical model that better represents the monthly wave climate at a particular site. In this work we examine the potential skill of this downscaling approach considering perfect-model conditions, but we will also analyze the suitability of this methodology to be used for seasonal forecast and for long-term climate change scenario projection of wave climate.
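
    The analog step of the downscaling chain described above reduces to: given the current large-scale SLP pattern, find the k most similar historical SLP fields and predict the local wave parameter from the waves observed on those days. The sketch below shows that nearest-neighbour logic on synthetic fields; the grid size, the value of k and the fabricated SLP-wave relationship are arbitrary stand-ins for the NCEP and Puertos del Estado data.

      import numpy as np

      def analog_forecast(slp_history, hs_history, slp_today, k=25):
          """Nearest-neighbour analog method: predict significant wave height as the
          mean over the k historical days whose (flattened) SLP fields are closest
          in Euclidean distance to today's field."""
          diffs = slp_history - slp_today[None, :]
          dist = np.sqrt((diffs ** 2).sum(axis=1))
          nearest = np.argsort(dist)[:k]
          return hs_history[nearest].mean()

      # Synthetic predictor/predictand pair (purely illustrative numbers).
      rng = np.random.default_rng(5)
      n_days, n_gridpoints = 5000, 200
      slp_history = rng.normal(1013.0, 8.0, (n_days, n_gridpoints))       # hPa
      hs_history = (2.0 + 0.05 * (1013.0 - slp_history.mean(axis=1))
                    + rng.normal(0, 0.2, n_days))                         # metres

      slp_today = rng.normal(1005.0, 8.0, n_gridpoints)   # a "stormy" pattern
      print(analog_forecast(slp_history, hs_history, slp_today))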

  8. KA-SB: from data integration to large scale reasoning

    PubMed Central

    Roldán-García, María del Mar; Navas-Delgado, Ismael; Kerzazi, Amine; Chniber, Othmane; Molina-Castro, Joaquín; Aldana-Montes, José F

    2009-01-01

    Background: The analysis of information in the biological domain is usually focused on the analysis of data from single on-line data sources. Unfortunately, studying a biological process requires having access to disperse, heterogeneous, autonomous data sources. In this context, an analysis of the information is not possible without the integration of such data. Methods: KA-SB is a querying and analysis system for final users based on combining a data integration solution with a reasoner. Thus, the tool has been created with a process divided into two steps: 1) KOMF, the Khaos Ontology-based Mediator Framework, is used to retrieve information from heterogeneous and distributed databases; 2) the integrated information is crystallized in a (persistent and high performance) reasoner (DBOWL). This information could be further analyzed later (by means of querying and reasoning). Results: In this paper we present a novel system that combines the use of a mediation system with the reasoning capabilities of a large scale reasoner to provide a way of finding new knowledge and of analyzing the integrated information from different databases, which is retrieved as a set of ontology instances. This tool uses a graphical query interface to build user queries easily, which shows a graphical representation of the ontology and allows users to build queries by clicking on the ontology concepts. Conclusion: These kinds of systems (based on KOMF) will provide users with very large amounts of information (interpreted as ontology instances once retrieved), which cannot be managed using traditional main memory-based reasoners. We propose a process for creating persistent and scalable knowledgebases from sets of OWL instances obtained by integrating heterogeneous data sources with KOMF. This process has been applied to develop a demo tool, which uses the BioPax Level 3 ontology as the integration schema, and integrates UNIPROT, KEGG, CHEBI, BRENDA and SABIORK databases. PMID:19796402

  9. Further Refinement of the LEWICE SLD Model

    NASA Technical Reports Server (NTRS)

    Wright, William B.

    2006-01-01

    A research project is underway at NASA Glenn Research Center to produce computer software that can accurately predict ice growth for any meteorological conditions for any aircraft surface. This report will present results from version 3.2 of this software, which is called LEWICE. This version differs from previous releases in that it incorporates additional thermal analysis capabilities, a pneumatic boot model, interfaces to external computational fluid dynamics (CFD) flow solvers and has an empirical model for the supercooled large droplet (SLD) regime. An extensive comparison against the database of ice shapes and collection efficiencies that have been generated in the NASA Glenn Icing Research Tunnel (IRT) has also been performed. The complete set of data used for this comparison will eventually be available in a contractor report. This paper will show the differences in collection efficiency and ice shape between LEWICE 3.2 and experimental data. This report will first describe the LEWICE 3.2 SLD model. A semi-empirical approach was used to incorporate first order physical effects of large droplet phenomena into icing software. Comparisons are then made to every two-dimensional case in the water collection database and the ice shape database. Each collection efficiency condition was run using the following four assumptions: 1) potential flow, no splashing; 2) potential flow, with splashing; 3) Navier-Stokes, no splashing; 4) Navier-Stokes, with splashing. All cases were run with 21-bin drop size distributions and a lift correction (angle of attack adjustment). Quantitative comparisons are shown for impingement limit, maximum water catch, and total collection efficiency. Due to the large number of ice shape cases, comprehensive comparisons were limited to potential flow cases with and without splashing. Quantitative comparisons are shown for horn height, horn angle, icing limit, area, and leading edge thickness. The results show that the predictions for both ice shape and water collection are within the accuracy limits of the experimental data for the majority of cases.

  10. Statistical properties of share volume traded in financial markets

    NASA Astrophysics Data System (ADS)

    Gopikrishnan, Parameswaran; Plerou, Vasiliki; Gabaix, Xavier; Stanley, H. Eugene

    2000-10-01

    We quantitatively investigate the ideas behind the often-expressed adage "it takes volume to move stock prices," and study the statistical properties of the number of shares traded QΔt for a given stock in a fixed time interval Δt. We analyze transaction data for the largest 1000 stocks for the two-year period 1994-95, using a database that records every transaction for all securities in three major US stock markets. We find that the distribution P(QΔt) displays a power-law decay, and that the time correlations in QΔt display long-range persistence. Further, we investigate the relation between QΔt and the number of transactions NΔt in a time interval Δt, and find that the long-range correlations in QΔt are largely due to those of NΔt. Our results are consistent with the interpretation that the large equal-time correlation previously found between QΔt and the absolute value of price change |GΔt| (related to volatility) is largely due to NΔt.
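
    The long-range persistence claimed for QΔt can be probed with a plain sample autocorrelation function of the volume series. The sketch below builds a synthetic persistent series (an AR(1) proxy, which decays exponentially rather than as the power law reported for real volume) purely to show the calculation; no transaction data are used.

      import numpy as np

      def autocorrelation(x, max_lag):
          """Sample autocorrelation function for lags 0..max_lag."""
          x = np.asarray(x, dtype=float)
          x = x - x.mean()
          var = (x ** 2).mean()
          return np.array([(x[:-lag] * x[lag:]).mean() / var if lag else 1.0
                           for lag in range(max_lag + 1)])

      # Synthetic persistent "volume" series: an AR(1) proxy with slowly decaying
      # correlations, exponentiated so the series stays positive like share volume.
      rng = np.random.default_rng(13)
      n = 50_000
      q = np.empty(n)
      q[0] = 0.0
      for t in range(1, n):
          q[t] = 0.98 * q[t - 1] + rng.normal()
      volume = np.exp(0.1 * q)

      acf = autocorrelation(volume, max_lag=200)
      print(acf[[1, 10, 50, 100, 200]])   # slow decay indicates persistence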

  11. Structuring intuition with theory: The high-throughput way

    NASA Astrophysics Data System (ADS)

    Fornari, Marco

    2015-03-01

    First principles methodologies have grown in accuracy and applicability to the point where large databases can be built, shared, and analyzed with the goal of predicting novel compositions, optimizing functional properties, and discovering unexpected relationships between the data. In order to be useful to a large community of users, data should be standardized, validated, and distributed. In addition, tools to easily manage large datasets should be made available to effectively lead to materials development. Within the AFLOW consortium we have developed a simple frame to expand, validate, and mine data repositories: the MTFrame. Our minimalistic approach complements AFLOW and other existing high-throughput infrastructures and aims to integrate data generation with data analysis. We present a few examples from our work on materials for energy conversion. Our intent is to pinpoint the usefulness of high-throughput methodologies to guide the discovery process by quantitatively structuring the scientific intuition. This work was supported by ONR-MURI under Contract N00014-13-1-0635 and the Duke University Center for Materials Genomics.

  12. Visualizing the semantic content of large text databases using text maps

    NASA Technical Reports Server (NTRS)

    Combs, Nathan

    1993-01-01

    A methodology for generating text map representations of the semantic content of text databases is presented. Text maps provide a graphical metaphor for conceptualizing and visualizing the contents and data interrelationships of large text databases. Described are a set of experiments conducted against the TIPSTER corpora of Wall Street Journal articles. These experiments provide an introduction to current work in the representation and visualization of documents by way of their semantic content.

  13. Improving data management and dissemination in web based information systems by semantic enrichment of descriptive data aspects

    NASA Astrophysics Data System (ADS)

    Gebhardt, Steffen; Wehrmann, Thilo; Klinger, Verena; Schettler, Ingo; Huth, Juliane; Künzer, Claudia; Dech, Stefan

    2010-10-01

    The German-Vietnamese water-related information system for the Mekong Delta (WISDOM) project supports business processes in Integrated Water Resources Management in Vietnam. Multiple disciplines bring together earth and ground based observation themes, such as environmental monitoring, water management, demographics, economy, information technology, and infrastructural systems. This paper introduces the components of the web-based WISDOM system including data, logic and presentation tier. It focuses on the data models upon which the database management system is built, including techniques for tagging or linking metadata with the stored information. The model also uses ordered groupings of spatial, thematic and temporal reference objects to semantically tag datasets to enable fast data retrieval, such as finding all data in a specific administrative unit belonging to a specific theme. A spatial database extension is employed by the PostgreSQL database. This object-oriented database was chosen over a relational database to tag spatial objects to tabular data, improving the retrieval of census and observational data at regional, provincial, and local areas. While the spatial database hinders processing raster data, a "work-around" was built into WISDOM to permit efficient management of both raster and vector data. The data model also incorporates styling aspects of the spatial datasets through styled layer descriptions (SLD) and web mapping service (WMS) layer specifications, allowing retrieval of rendered maps. Metadata elements of the spatial data are based on the ISO19115 standard. XML structured information of the SLD and metadata are stored in an XML database. The data models and the data management system are robust for managing the large quantity of spatial objects, sensor observations, census and document data. The operational WISDOM information system prototype contains modules for data management, automatic data integration, and web services for data retrieval, analysis, and distribution. The graphical user interfaces facilitate metadata cataloguing, data warehousing, web sensor data analysis and thematic mapping.
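
    The retrieval pattern described above, "find all data in a specific administrative unit belonging to a specific theme", reduces to joining datasets against their spatial and thematic reference-object tags. The sqlite3 sketch below illustrates that join; the three-table layout, names and records are invented for illustration and are far simpler than the WISDOM PostgreSQL/PostGIS schema.

      import sqlite3

      conn = sqlite3.connect(":memory:")
      conn.executescript("""
      CREATE TABLE dataset (id INTEGER PRIMARY KEY, name TEXT);
      CREATE TABLE reference_object (id INTEGER PRIMARY KEY, kind TEXT, value TEXT);
      CREATE TABLE dataset_tag (dataset_id INTEGER, reference_id INTEGER);
      """)

      # Hypothetical content: two datasets tagged with spatial and thematic references.
      conn.executemany("INSERT INTO dataset VALUES (?, ?)",
                       [(1, "flood_extent_2008"), (2, "population_census_2009")])
      conn.executemany("INSERT INTO reference_object VALUES (?, ?, ?)",
                       [(10, "spatial", "Can Tho province"),
                        (20, "thematic", "water management"),
                        (21, "thematic", "demographics")])
      conn.executemany("INSERT INTO dataset_tag VALUES (?, ?)",
                       [(1, 10), (1, 20), (2, 10), (2, 21)])

      # "All data in a given administrative unit belonging to a given theme."
      rows = conn.execute("""
      SELECT d.name FROM dataset d
      JOIN dataset_tag ts ON ts.dataset_id = d.id
      JOIN reference_object s ON s.id = ts.reference_id
           AND s.kind = 'spatial' AND s.value = ?
      JOIN dataset_tag tt ON tt.dataset_id = d.id
      JOIN reference_object t ON t.id = tt.reference_id
           AND t.kind = 'thematic' AND t.value = ?
      """, ("Can Tho province", "water management")).fetchall()
      print(rows)   # -> [('flood_extent_2008',)]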

  14. Distributed processor allocation for launching applications in a massively connected processors complex

    DOEpatents

    Pedretti, Kevin

    2008-11-18

    A compute processor allocator architecture for allocating compute processors to run applications in a multiple processor computing apparatus is distributed among a subset of processors within the computing apparatus. Each processor of the subset includes a compute processor allocator. The compute processor allocators can share a common database of information pertinent to compute processor allocation. A communication path permits retrieval of information from the database independently of the compute processor allocators.
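
    A toy version of the idea, several allocator instances consulting one shared table of free compute nodes, is sketched below. It is a thread-locked in-process stand-in, not the patented architecture, and every class and name in it is invented.

      import threading

      class SharedAllocationDB:
          """Toy stand-in for the common database of compute-node states that the
          distributed allocators share (here simply a lock-protected set)."""

          def __init__(self, n_nodes):
              self._lock = threading.Lock()
              self._free = set(range(n_nodes))

          def claim(self, count):
              # Atomically claim `count` free nodes, or none if not enough remain.
              with self._lock:
                  if len(self._free) < count:
                      return []
                  return [self._free.pop() for _ in range(count)]

          def release(self, nodes):
              with self._lock:
                  self._free.update(nodes)

      class ComputeProcessorAllocator:
          """One allocator per 'service' processor; all instances share the database."""

          def __init__(self, name, db):
              self.name, self.db = name, db

          def launch(self, app, nodes_needed):
              nodes = self.db.claim(nodes_needed)
              if not nodes:
                  return f"{self.name}: cannot launch {app}, not enough free nodes"
              return f"{self.name}: launched {app} on nodes {sorted(nodes)}"

      db = SharedAllocationDB(n_nodes=8)
      cpa0 = ComputeProcessorAllocator("cpa-0", db)
      cpa1 = ComputeProcessorAllocator("cpa-1", db)
      print(cpa0.launch("app-A", 5))
      print(cpa1.launch("app-B", 5))   # only 3 nodes remain, so this request is refused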

  15. Practical private database queries based on a quantum-key-distribution protocol

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jakobi, Markus; Humboldt-Universitaet zu Berlin, D-10117 Berlin; Simon, Christoph

    2011-02-15

    Private queries allow a user, Alice, to learn an element of a database held by a provider, Bob, without revealing which element she is interested in, while limiting her information about the other elements. We propose to implement private queries based on a quantum-key-distribution protocol, with changes only in the classical postprocessing of the key. This approach makes our scheme both easy to implement and loss tolerant. While unconditionally secure private queries are known to be impossible, we argue that an interesting degree of security can be achieved by relying on fundamental physical principles instead of unverifiable security assumptions in order to protect both the user and the database. We think that the scope exists for such practical private queries to become another remarkable application of quantum information in the footsteps of quantum key distribution.

  16. A revision of the distribution of sea kraits (Reptilia, Laticauda) with an updated occurrence dataset for ecological and conservation research

    PubMed Central

    Gherghel, Iulian; Papeş, Monica; Brischoux, François; Sahlean, Tiberiu; Strugariu, Alexandru

    2016-01-01

    The genus Laticauda (Reptilia: Elapidae), commonly known as sea kraits, comprises eight species of marine amphibious snakes distributed along the shores of the Western Pacific Ocean and the Eastern Indian Ocean. We review the information available on the geographic range of sea kraits and analyze their distribution patterns. Generally, we found that south and south-west of Japan, Philippines Archipelago, parts of Indonesia, and Vanuatu have the highest diversity of sea krait species. Further, we compiled the information available on sea kraits’ occurrences from a variety of sources, including museum records, field surveys, and the scientific literature. The final database comprises 694 occurrence records, with Laticauda colubrina having the highest number of records and Laticauda schistorhyncha the lowest. The occurrence records were georeferenced and compiled as a database for each sea krait species. This database can be freely used for future studies. PMID:27110155

  17. A revision of the distribution of sea kraits (Reptilia, Laticauda) with an updated occurrence dataset for ecological and conservation research.

    PubMed

    Gherghel, Iulian; Papeş, Monica; Brischoux, François; Sahlean, Tiberiu; Strugariu, Alexandru

    2016-01-01

    The genus Laticauda (Reptilia: Elapidae), commonly known as sea kraits, comprises eight species of marine amphibious snakes distributed along the shores of the Western Pacific Ocean and the Eastern Indian Ocean. We review the information available on the geographic range of sea kraits and analyze their distribution patterns. Generally, we found that south and south-west of Japan, Philippines Archipelago, parts of Indonesia, and Vanuatu have the highest diversity of sea krait species. Further, we compiled the information available on sea kraits' occurrences from a variety of sources, including museum records, field surveys, and the scientific literature. The final database comprises 694 occurrence records, with Laticauda colubrina having the highest number of records and Laticauda schistorhyncha the lowest. The occurrence records were georeferenced and compiled as a database for each sea krait species. This database can be freely used for future studies.

  18. The EpiSLI Database: A Publicly Available Database on Speech and Language

    ERIC Educational Resources Information Center

    Tomblin, J. Bruce

    2010-01-01

    Purpose: This article describes a database that was created in the process of conducting a large-scale epidemiologic study of specific language impairment (SLI). As such, this database will be referred to as the EpiSLI database. Children with SLI have unexpected and unexplained difficulties learning and using spoken language. Although there is no…

  19. Mugshot Identification Database (MID)

    National Institute of Standards and Technology Data Gateway

    NIST Mugshot Identification Database (MID) (Web, free access)   NIST Special Database 18 is being distributed for use in development and testing of automated mugshot identification systems. The database consists of three CD-ROMs, containing a total of 3248 images of variable size using lossless compression. A newer version of the compression/decompression software on the CDROM can be found at the website http://www.nist.gov/itl/iad/ig/nigos.cfm as part of the NBIS package.

  20. Database Entity Persistence with Hibernate for the Network Connectivity Analysis Model

    DTIC Science & Technology

    2014-04-01

    time savings in the Java coding development process. Appendices A and B describe address setup procedures for installing the MySQL database... development environment is required:
    • The open source MySQL Database Management System (DBMS) from Oracle, which is a Java Database Connectivity (JDBC...compliant DBMS
    • MySQL JDBC Driver library that comes as a plug-in with the Netbeans distribution
    • The latest Java Development Kit with the latest
