Sample records for scalable database access

  1. Advanced technologies for scalable ATLAS conditions database access on the grid

    NASA Astrophysics Data System (ADS)

    Basset, R.; Canali, L.; Dimitrov, G.; Girone, M.; Hawkings, R.; Nevski, P.; Valassi, A.; Vaniachine, A.; Viegas, F.; Walker, R.; Wong, A.

    2010-04-01

    During massive data reprocessing operations an ATLAS Conditions Database application must support concurrent access from numerous ATLAS data processing jobs running on the Grid. By simulating realistic work-flow, ATLAS database scalability tests provided feedback for Conditions Db software optimization and allowed precise determination of required distributed database resources. In distributed data processing one must take into account the chaotic nature of Grid computing characterized by peak loads, which can be much higher than average access rates. To validate database performance at peak loads, we tested database scalability at very high concurrent jobs rates. This has been achieved through coordinated database stress tests performed in series of ATLAS reprocessing exercises at the Tier-1 sites. The goal of database stress tests is to detect scalability limits of the hardware deployed at the Tier-1 sites, so that the server overload conditions can be safely avoided in a production environment. Our analysis of server performance under stress tests indicates that Conditions Db data access is limited by the disk I/O throughput. An unacceptable side-effect of the disk I/O saturation is a degradation of the WLCG 3D Services that update Conditions Db data at all ten ATLAS Tier-1 sites using the technology of Oracle Streams. To avoid such bottlenecks we prototyped and tested a novel approach for database peak load avoidance in Grid computing. Our approach is based upon the proven idea of pilot job submission on the Grid: instead of the actual query, an ATLAS utility library sends to the database server a pilot query first.

  2. A Web-Based Database for Nurse Led Outreach Teams (NLOT) in Toronto.

    PubMed

    Li, Shirley; Kuo, Mu-Hsing; Ryan, David

    2016-01-01

    A web-based system can provide access to real-time data and information. Healthcare is moving towards digitizing patients' medical information and securely exchanging it through web-based systems. In one of Ontario's health regions, Nurse Led Outreach Teams (NLOT) provide emergency mobile nursing services to help reduce unnecessary transfers from long-term care homes to emergency departments. Currently the NLOT team uses a Microsoft Access database to keep track of the health information on the residents that they serve. The Access database lacks scalability, portability, and interoperability. The objective of this study is the development of a web-based database using Oracle Application Express that is easily accessible from mobile devices. The web-based database will allow NLOT nurses to enter and access resident information anytime and from anywhere.

  3. A Scalable Data Access Layer to Manage Structured Heterogeneous Biomedical Data.

    PubMed

    Delussu, Giovanni; Lianas, Luca; Frexia, Francesca; Zanetti, Gianluigi

    2016-01-01

    This work presents a scalable data access layer, called PyEHR, designed to support the implementation of data management systems for secondary use of structured heterogeneous biomedical and clinical data. PyEHR adopts the openEHR's formalisms to guarantee the decoupling of data descriptions from implementation details and exploits structure indexing to accelerate searches. Data persistence is guaranteed by a driver layer with a common driver interface. Interfaces for two NoSQL Database Management Systems are already implemented: MongoDB and Elasticsearch. We evaluated the scalability of PyEHR experimentally through two types of tests, called "Constant Load" and "Constant Number of Records", with queries of increasing complexity on synthetic datasets of ten million records each, containing very complex openEHR archetype structures, distributed on up to ten computing nodes.

  4. Multi-Resolution Playback of Network Trace Files

    DTIC Science & Technology

    2015-06-01

    a com- plete MySQL database, C++ developer tools and the libraries utilized in the development of the system (Boost and Libcrafter), and Wireshark...XE suite has a limit to the allowed size of each database. In order to be scalable, the project had to switch to the MySQL database suite. The...programs that access the database use the MySQL C++ connector, provided by Oracle, and the supplied methods and libraries. 4.4 Flow Generator Chapter 3

  5. A Scalable Data Access Layer to Manage Structured Heterogeneous Biomedical Data

    PubMed Central

    Lianas, Luca; Frexia, Francesca; Zanetti, Gianluigi

    2016-01-01

    This work presents a scalable data access layer, called PyEHR, designed to support the implementation of data management systems for secondary use of structured heterogeneous biomedical and clinical data. PyEHR adopts the openEHR’s formalisms to guarantee the decoupling of data descriptions from implementation details and exploits structure indexing to accelerate searches. Data persistence is guaranteed by a driver layer with a common driver interface. Interfaces for two NoSQL Database Management Systems are already implemented: MongoDB and Elasticsearch. We evaluated the scalability of PyEHR experimentally through two types of tests, called “Constant Load” and “Constant Number of Records”, with queries of increasing complexity on synthetic datasets of ten million records each, containing very complex openEHR archetype structures, distributed on up to ten computing nodes. PMID:27936191

  6. Scalable global grid catalogue for Run3 and beyond

    NASA Astrophysics Data System (ADS)

    Martinez Pedreira, M.; Grigoras, C.; ALICE Collaboration

    2017-10-01

    The AliEn (ALICE Environment) file catalogue is a global unique namespace providing mapping between a UNIX-like logical name structure and the corresponding physical files distributed over 80 storage elements worldwide. Powerful search tools and hierarchical metadata information are integral parts of the system and are used by the Grid jobs as well as local users to store and access all files on the Grid storage elements. The catalogue has been in production since 2005 and over the past 11 years has grown to more than 2 billion logical file names. The backend is a set of distributed relational databases, ensuring smooth growth and fast access. Due to the anticipated fast future growth, we are looking for ways to enhance the performance and scalability by simplifying the catalogue schema while keeping the functionality intact. We investigated different backend solutions, such as distributed key value stores, as replacement for the relational database. This contribution covers the architectural changes in the system, together with the technology evaluation, benchmark results and conclusions.

  7. [Application of the life sciences platform based on oracle to biomedical informations].

    PubMed

    Zhao, Zhi-Yun; Li, Tai-Huan; Yang, Hong-Qiao

    2008-03-01

    The life sciences platform based on Oracle database technology is introduced in this paper. By providing a powerful data access, integrating a variety of data types, and managing vast quantities of data, the software presents a flexible, safe and scalable management platform for biomedical data processing.

  8. Design of a Multi Dimensional Database for the Archimed DataWarehouse.

    PubMed

    Bréant, Claudine; Thurler, Gérald; Borst, François; Geissbuhler, Antoine

    2005-01-01

    The Archimed data warehouse project started in 1993 at the Geneva University Hospital. It has progressively integrated seven data marts (or domains of activity) archiving medical data such as Admission/Discharge/Transfer (ADT) data, laboratory results, radiology exams, diagnoses, and procedure codes. The objective of the Archimed data warehouse is to facilitate the access to an integrated and coherent view of patient medical in order to support analytical activities such as medical statistics, clinical studies, retrieval of similar cases and data mining processes. This paper discusses three principal design aspects relative to the conception of the database of the data warehouse: 1) the granularity of the database, which refers to the level of detail or summarization of data, 2) the database model and architecture, describing how data will be presented to end users and how new data is integrated, 3) the life cycle of the database, in order to ensure long term scalability of the environment. Both, the organization of patient medical data using a standardized elementary fact representation and the use of the multi dimensional model have proved to be powerful design tools to integrate data coming from the multiple heterogeneous database systems part of the transactional Hospital Information System (HIS). Concurrently, the building of the data warehouse in an incremental way has helped to control the evolution of the data content. These three design aspects bring clarity and performance regarding data access. They also provide long term scalability to the system and resilience to further changes that may occur in source systems feeding the data warehouse.

  9. CORAL Server and CORAL Server Proxy: Scalable Access to Relational Databases from CORAL Applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Valassi, A.; /CERN; Bartoldus, R.

    The CORAL software is widely used at CERN by the LHC experiments to access the data they store on relational databases, such as Oracle. Two new components have recently been added to implement a model involving a middle tier 'CORAL server' deployed close to the database and a tree of 'CORAL server proxies', providing data caching and multiplexing, deployed close to the client. A first implementation of the two new components, released in the summer 2009, is now deployed in the ATLAS online system to read the data needed by the High Level Trigger, allowing the configuration of a farmmore » of several thousand processes. This paper reviews the architecture of the software, its development status and its usage in ATLAS.« less

  10. Architecture Knowledge for Evaluating Scalable Databases

    DTIC Science & Technology

    2015-01-16

    problems, arising from the proliferation of new data models and distributed technologies for building scalable, available data stores . Architects must...longer are relational databases the de facto standard for building data repositories. Highly distributed, scalable “ NoSQL ” databases [11] have emerged...This is especially challenging at the data storage layer. The multitude of competing NoSQL database technologies creates a complex and rapidly

  11. Perspectives in astrophysical databases

    NASA Astrophysics Data System (ADS)

    Frailis, Marco; de Angelis, Alessandro; Roberto, Vito

    2004-07-01

    Astrophysics has become a domain extremely rich of scientific data. Data mining tools are needed for information extraction from such large data sets. This asks for an approach to data management emphasizing the efficiency and simplicity of data access; efficiency is obtained using multidimensional access methods and simplicity is achieved by properly handling metadata. Moreover, clustering and classification techniques on large data sets pose additional requirements in terms of computation and memory scalability and interpretability of results. In this study we review some possible solutions.

  12. Sequential data access with Oracle and Hadoop: a performance comparison

    NASA Astrophysics Data System (ADS)

    Baranowski, Zbigniew; Canali, Luca; Grancher, Eric

    2014-06-01

    The Hadoop framework has proven to be an effective and popular approach for dealing with "Big Data" and, thanks to its scaling ability and optimised storage access, Hadoop Distributed File System-based projects such as MapReduce or HBase are seen as candidates to replace traditional relational database management systems whenever scalable speed of data processing is a priority. But do these projects deliver in practice? Does migrating to Hadoop's "shared nothing" architecture really improve data access throughput? And, if so, at what cost? Authors answer these questions-addressing cost/performance as well as raw performance- based on a performance comparison between an Oracle-based relational database and Hadoop's distributed solutions like MapReduce or HBase for sequential data access. A key feature of our approach is the use of an unbiased data model as certain data models can significantly favour one of the technologies tested.

  13. Authomatization of Digital Collection Access Using Mobile and Wireless Data Terminals

    NASA Astrophysics Data System (ADS)

    Leontiev, I. V.

    Information technologies become vital due to information processing needs, database access, data analysis and decision support. Currently, a lot of scientific projects are oriented on database integration of heterogeneous systems. The problem of on-line and rapid access to large integrated systems of digital collections is also very important. Usually users move between different locations, either at work or at home. In most cases users need an efficient and remote access to information, stored in integrated data collections. Desktop computers are unable to fulfill the needs, so mobile and wireless devices become helpful. Handhelds and data terminals are nessessary in medical assistance (they store detailed information about each patient, and helpful for nurses), immediate access to data collections is used in a Highway patrol services (databanks of cars, owners, driver licences). Using mobile access, warehouse operations can be validated. Library and museum items cyclecounting will speed up using online barcode-scanning and central database access. That's why mobile devices - cell phones, PDA, handheld computers with wireless access, WindowsCE and PalmOS terminals become popular. Generally, mobile devices have a relatively slow processor, and limited display capabilities, but they are effective for storing and displaying textual data, recognize user hand-writing with stylus, support GUI. Users can perform operations on handheld terminal, and exchange data with the main system (using immediate radio access, or offline access during syncronization process) for update. In our report, we give an approach for mobile access to data collections, which raises an efficiency of data processing in a book library, helps to control available books, books in stock, validate service charges, eliminate staff mistakes, generate requests for book delivery. Our system uses mobile devices Symbol RF (with radio-channel access), and data terminals Symbol Palm Terminal for batch-processing and synchronization with remote library databases. We discuss the use of PalmOS-compatible devices, and WindowsCE terminals. Our software system is based on modular, scalable three-tier architecture. Additional functionality can be easily customized. Scalability is also supplied by Internet / Intranet technologies, and radio-access points. The base module of the system supports generic warehouse operations: cyclecounting with handheld barcode-scanners, efficient items delivery and issue, item movement, reserving, report generating on finished and in-process operations. Movements are optimized using worker's current location, operations are sorted in a priority order and transmitted to mobile and wireless worker's terminals. Mobile terminals improve of tasks processing control, eliminate staff mistakes, display actual information about main processes, provide data for online-reports, and significantly raise the efficiency of data exchange.

  14. Nuclear data made easily accessible through the Notre Dame Nuclear Database

    NASA Astrophysics Data System (ADS)

    Khouw, Timothy; Lee, Kevin; Fasano, Patrick; Mumpower, Matthew; Aprahamian, Ani

    2014-09-01

    In 1994, the NNDC revolutionized nuclear research by providing a colorful, clickable, searchable database over the internet. Over the last twenty years, web technology has evolved dramatically. Our project, the Notre Dame Nuclear Database, aims to provide a more comprehensive and broadly searchable interactive body of data. The database can be searched by an array of filters which includes metadata such as the facility where a measurement is made, the author(s), or date of publication for the datum of interest. The user interface takes full advantage of HTML, a web markup language, CSS (cascading style sheets to define the aesthetics of the website), and JavaScript, a language that can process complex data. A command-line interface is supported that interacts with the database directly on a user's local machine which provides single command access to data. This is possible through the use of a standardized API (application programming interface) that relies upon well-defined filtering variables to produce customized search results. We offer an innovative chart of nuclides utilizing scalable vector graphics (SVG) to deliver users an unsurpassed level of interactivity supported on all computers and mobile devices. We will present a functional demo of our database at the conference.

  15. Earth science big data at users' fingertips: the EarthServer Science Gateway Mobile

    NASA Astrophysics Data System (ADS)

    Barbera, Roberto; Bruno, Riccardo; Calanducci, Antonio; Fargetta, Marco; Pappalardo, Marco; Rundo, Francesco

    2014-05-01

    The EarthServer project (www.earthserver.eu), funded by the European Commission under its Seventh Framework Program, aims at establishing open access and ad-hoc analytics on extreme-size Earth Science data, based on and extending leading-edge Array Database technology. The core idea is to use database query languages as client/server interface to achieve barrier-free "mix & match" access to multi-source, any-size, multi-dimensional space-time data -- in short: "Big Earth Data Analytics" - based on the open standards of the Open Geospatial Consortium Web Coverage Processing Service (OGC WCPS) and the W3C XQuery. EarthServer combines both, thereby achieving a tight data/metadata integration. Further, the rasdaman Array Database System (www.rasdaman.com) is extended with further space-time coverage data types. On server side, highly effective optimizations - such as parallel and distributed query processing - ensure scalability to Exabyte volumes. In this contribution we will report on the EarthServer Science Gateway Mobile, an app for both iOS and Android-based devices that allows users to seamlessly access some of the EarthServer applications using SAML-based federated authentication and fine-grained authorisation mechanisms.

  16. Searching Across the International Space Station Databases

    NASA Technical Reports Server (NTRS)

    Maluf, David A.; McDermott, William J.; Smith, Ernest E.; Bell, David G.; Gurram, Mohana

    2007-01-01

    Data access in the enterprise generally requires us to combine data from different sources and different formats. It is advantageous thus to focus on the intersection of the knowledge across sources and domains; keeping irrelevant knowledge around only serves to make the integration more unwieldy and more complicated than necessary. A context search over multiple domain is proposed in this paper to use context sensitive queries to support disciplined manipulation of domain knowledge resources. The objective of a context search is to provide the capability for interrogating many domain knowledge resources, which are largely semantically disjoint. The search supports formally the tasks of selecting, combining, extending, specializing, and modifying components from a diverse set of domains. This paper demonstrates a new paradigm in composition of information for enterprise applications. In particular, it discusses an approach to achieving data integration across multiple sources, in a manner that does not require heavy investment in database and middleware maintenance. This lean approach to integration leads to cost-effectiveness and scalability of data integration with an underlying schemaless object-relational database management system. This highly scalable, information on demand system framework, called NX-Search, which is an implementation of an information system built on NETMARK. NETMARK is a flexible, high-throughput open database integration framework for managing, storing, and searching unstructured or semi-structured arbitrary XML and HTML used widely at the National Aeronautics Space Administration (NASA) and industry.

  17. Data, Metadata - Who Cares?

    NASA Astrophysics Data System (ADS)

    Baumann, Peter

    2013-04-01

    There is a traditional saying that metadata are understandable, semantic-rich, and searchable. Data, on the other hand, are big, with no accessible semantics, and just downloadable. Not only has this led to an imbalance of search support form a user perspective, but also underneath to a deep technology divide often using relational databases for metadata and bespoke archive solutions for data. Our vision is that this barrier will be overcome, and data and metadata become searchable likewise, leveraging the potential of semantic technologies in combination with scalability technologies. Ultimately, in this vision ad-hoc processing and filtering will not distinguish any longer, forming a uniformly accessible data universe. In the European EarthServer initiative, we work towards this vision by federating database-style raster query languages with metadata search and geo broker technology. We present our approach taken, how it can leverage OGC standards, the benefits envisaged, and first results.

  18. Database Resources of the BIG Data Center in 2018

    PubMed Central

    Xu, Xingjian; Hao, Lili; Zhu, Junwei; Tang, Bixia; Zhou, Qing; Song, Fuhai; Chen, Tingting; Zhang, Sisi; Dong, Lili; Lan, Li; Wang, Yanqing; Sang, Jian; Hao, Lili; Liang, Fang; Cao, Jiabao; Liu, Fang; Liu, Lin; Wang, Fan; Ma, Yingke; Xu, Xingjian; Zhang, Lijuan; Chen, Meili; Tian, Dongmei; Li, Cuiping; Dong, Lili; Du, Zhenglin; Yuan, Na; Zeng, Jingyao; Zhang, Zhewen; Wang, Jinyue; Shi, Shuo; Zhang, Yadong; Pan, Mengyu; Tang, Bixia; Zou, Dong; Song, Shuhui; Sang, Jian; Xia, Lin; Wang, Zhennan; Li, Man; Cao, Jiabao; Niu, Guangyi; Zhang, Yang; Sheng, Xin; Lu, Mingming; Wang, Qi; Xiao, Jingfa; Zou, Dong; Wang, Fan; Hao, Lili; Liang, Fang; Li, Mengwei; Sun, Shixiang; Zou, Dong; Li, Rujiao; Yu, Chunlei; Wang, Guangyu; Sang, Jian; Liu, Lin; Li, Mengwei; Li, Man; Niu, Guangyi; Cao, Jiabao; Sun, Shixiang; Xia, Lin; Yin, Hongyan; Zou, Dong; Xu, Xingjian; Ma, Lina; Chen, Huanxin; Sun, Yubin; Yu, Lei; Zhai, Shuang; Sun, Mingyuan; Zhang, Zhang; Zhao, Wenming; Xiao, Jingfa; Bao, Yiming; Song, Shuhui; Hao, Lili; Li, Rujiao; Ma, Lina; Sang, Jian; Wang, Yanqing; Tang, Bixia; Zou, Dong; Wang, Fan

    2018-01-01

    Abstract The BIG Data Center at Beijing Institute of Genomics (BIG) of the Chinese Academy of Sciences provides freely open access to a suite of database resources in support of worldwide research activities in both academia and industry. With the vast amounts of omics data generated at ever-greater scales and rates, the BIG Data Center is continually expanding, updating and enriching its core database resources through big-data integration and value-added curation, including BioCode (a repository archiving bioinformatics tool codes), BioProject (a biological project library), BioSample (a biological sample library), Genome Sequence Archive (GSA, a data repository for archiving raw sequence reads), Genome Warehouse (GWH, a centralized resource housing genome-scale data), Genome Variation Map (GVM, a public repository of genome variations), Gene Expression Nebulas (GEN, a database of gene expression profiles based on RNA-Seq data), Methylation Bank (MethBank, an integrated databank of DNA methylomes), and Science Wikis (a series of biological knowledge wikis for community annotations). In addition, three featured web services are provided, viz., BIG Search (search as a service; a scalable inter-domain text search engine), BIG SSO (single sign-on as a service; a user access control system to gain access to multiple independent systems with a single ID and password) and Gsub (submission as a service; a unified submission service for all relevant resources). All of these resources are publicly accessible through the home page of the BIG Data Center at http://bigd.big.ac.cn. PMID:29036542

  19. Lean Middleware

    NASA Technical Reports Server (NTRS)

    Maluf, David A.; Bell, David g.; Ashish, Naveen

    2005-01-01

    This paper describes an approach to achieving data integration across multiple sources in an enterprise, in a manner that is cost efficient and economically scalable. We present an approach that does not rely on major investment in structured, heavy-weight database systems for data storage or heavy-weight middleware responsible for integrated access. The approach is centered around pushing any required data structure and semantics functionality (schema) to application clients, as well as pushing integration specification and functionality to clients where integration can be performed on-the-fly .

  20. Integrating Scientific Array Processing into Standard SQL

    NASA Astrophysics Data System (ADS)

    Misev, Dimitar; Bachhuber, Johannes; Baumann, Peter

    2014-05-01

    We live in a time that is dominated by data. Data storage is cheap and more applications than ever accrue vast amounts of data. Storing the emerging multidimensional data sets efficiently, however, and allowing them to be queried by their inherent structure, is a challenge many databases have to face today. Despite the fact that multidimensional array data is almost always linked to additional, non-array information, array databases have mostly developed separately from relational systems, resulting in a disparity between the two database categories. The current SQL standard and SQL DBMS supports arrays - and in an extension also multidimensional arrays - but does so in a very rudimentary and inefficient way. This poster demonstrates the practicality of an SQL extension for array processing, implemented in a proof-of-concept multi-faceted system that manages a federation of array and relational database systems, providing transparent, efficient and scalable access to the heterogeneous data in them.

  1. Design of a decentralized reusable research database architecture to support data acquisition in large research projects.

    PubMed

    Iavindrasana, Jimison; Depeursinge, Adrien; Ruch, Patrick; Spahni, Stéphane; Geissbuhler, Antoine; Müller, Henning

    2007-01-01

    The diagnostic and therapeutic processes, as well as the development of new treatments, are hindered by the fragmentation of information which underlies them. In a multi-institutional research study database, the clinical information system (CIS) contains the primary data input. An important part of the money of large scale clinical studies is often paid for data creation and maintenance. The objective of this work is to design a decentralized, scalable, reusable database architecture with lower maintenance costs for managing and integrating distributed heterogeneous data required as basis for a large-scale research project. Technical and legal aspects are taken into account based on various use case scenarios. The architecture contains 4 layers: data storage and access are decentralized at their production source, a connector as a proxy between the CIS and the external world, an information mediator as a data access point and the client side. The proposed design will be implemented inside six clinical centers participating in the @neurIST project as part of a larger system on data integration and reuse for aneurism treatment.

  2. Dynameomics: design of a computational lab workflow and scientific data repository for protein simulations.

    PubMed

    Simms, Andrew M; Toofanny, Rudesh D; Kehl, Catherine; Benson, Noah C; Daggett, Valerie

    2008-06-01

    Dynameomics is a project to investigate and catalog the native-state dynamics and thermal unfolding pathways of representatives of all protein folds using solvated molecular dynamics simulations, as described in the preceding paper. Here we introduce the design of the molecular dynamics data warehouse, a scalable, reliable repository that houses simulation data that vastly simplifies management and access. In the succeeding paper, we describe the development of a complementary multidimensional database. A single protein unfolding or native-state simulation can take weeks to months to complete, and produces gigabytes of coordinate and analysis data. Mining information from over 3000 completed simulations is complicated and time-consuming. Even the simplest queries involve writing intricate programs that must be built from low-level file system access primitives and include significant logic to correctly locate and parse data of interest. As a result, programs to answer questions that require data from hundreds of simulations are very difficult to write. Thus, organization and access to simulation data have been major obstacles to the discovery of new knowledge in the Dynameomics project. This repository is used internally and is the foundation of the Dynameomics portal site http://www.dynameomics.org. By organizing simulation data into a scalable, manageable and accessible form, we can begin to address substantial questions that move us closer to solving biomedical and bioengineering problems.

  3. Distributed PACS using distributed file system with hierarchical meta data servers.

    PubMed

    Hiroyasu, Tomoyuki; Minamitani, Yoshiyuki; Miki, Mitsunori; Yokouchi, Hisatake; Yoshimi, Masato

    2012-01-01

    In this research, we propose a new distributed PACS (Picture Archiving and Communication Systems) which is available to integrate several PACSs that exist in each medical institution. The conventional PACS controls DICOM file into one data-base. On the other hand, in the proposed system, DICOM file is separated into meta data and image data and those are stored individually. Using this mechanism, since file is not always accessed the entire data, some operations such as finding files, changing titles, and so on can be performed in high-speed. At the same time, as distributed file system is utilized, accessing image files can also achieve high-speed access and high fault tolerant. The introduced system has a more significant point. That is the simplicity to integrate several PACSs. In the proposed system, only the meta data servers are integrated and integrated system can be constructed. This system also has the scalability of file access with along to the number of file numbers and file sizes. On the other hand, because meta-data server is integrated, the meta data server is the weakness of this system. To solve this defect, hieratical meta data servers are introduced. Because of this mechanism, not only fault--tolerant ability is increased but scalability of file access is also increased. To discuss the proposed system, the prototype system using Gfarm was implemented. For evaluating the implemented system, file search operating time of Gfarm and NFS were compared.

  4. Scalable and expressive medical terminologies.

    PubMed

    Mays, E; Weida, R; Dionne, R; Laker, M; White, B; Liang, C; Oles, F J

    1996-01-01

    The K-Rep system, based on description logic, is used to represent and reason with large and expressive controlled medical terminologies. Expressive concept descriptions incorporate semantically precise definitions composed using logical operators, together with important non-semantic information such as synonyms and codes. Examples are drawn from our experience with K-Rep in modeling the InterMed laboratory terminology and also developing a large clinical terminology now in production use at Kaiser-Permanente. System-level scalability of performance is achieved through an object-oriented database system which efficiently maps persistent memory to virtual memory. Equally important is conceptual scalability-the ability to support collaborative development, organization, and visualization of a substantial terminology as it evolves over time. K-Rep addresses this need by logically completing concept definitions and automatically classifying concepts in a taxonomy via subsumption inferences. The K-Rep system includes a general-purpose GUI environment for terminology development and browsing, a custom interface for formulary term maintenance, a C+2 application program interface, and a distributed client-server mode which provides lightweight clients with efficient run-time access to K-Rep by means of a scripting language.

  5. P43-S Computational Biology Applications Suite for High-Performance Computing (BioHPC.net)

    PubMed Central

    Pillardy, J.

    2007-01-01

    One of the challenges of high-performance computing (HPC) is user accessibility. At the Cornell University Computational Biology Service Unit, which is also a Microsoft HPC institute, we have developed a computational biology application suite that allows researchers from biological laboratories to submit their jobs to the parallel cluster through an easy-to-use Web interface. Through this system, we are providing users with popular bioinformatics tools including BLAST, HMMER, InterproScan, and MrBayes. The system is flexible and can be easily customized to include other software. It is also scalable; the installation on our servers currently processes approximately 8500 job submissions per year, many of them requiring massively parallel computations. It also has a built-in user management system, which can limit software and/or database access to specified users. TAIR, the major database of the plant model organism Arabidopsis, and SGN, the international tomato genome database, are both using our system for storage and data analysis. The system consists of a Web server running the interface (ASP.NET C#), Microsoft SQL server (ADO.NET), compute cluster running Microsoft Windows, ftp server, and file server. Users can interact with their jobs and data via a Web browser, ftp, or e-mail. The interface is accessible at http://cbsuapps.tc.cornell.edu/.

  6. Future mobile access for open-data platforms and the BBC-DaaS system

    NASA Astrophysics Data System (ADS)

    Edlich, Stefan; Singh, Sonam; Pfennigstorf, Ingo

    2013-03-01

    In this paper, we develop an open data platform on multimedia devices to act as marketplace of data for information seekers and data providers. We explore the important aspects of Data-as-a-Service (DaaS) service in the cloud with a mobile access point. The basis of the DaaS service is to act as a marketplace for information, utilizing new technologies and recent new scalable polyglot architectures based on NoSql databases. Whereas Open-Data platforms are beginning to be widely accepted, its mobile use is not. We compare similar products, their approach and a possible mobile usage. We discuss several approaches to address the mobile access as a native app, html5 and a mobile first approach together with the several frontend presentation techniques. Big data visualization itself is in the early days and we explore some possibilities to get big data / open data accessed by mobile users.

  7. Database Resources of the BIG Data Center in 2018.

    PubMed

    2018-01-04

    The BIG Data Center at Beijing Institute of Genomics (BIG) of the Chinese Academy of Sciences provides freely open access to a suite of database resources in support of worldwide research activities in both academia and industry. With the vast amounts of omics data generated at ever-greater scales and rates, the BIG Data Center is continually expanding, updating and enriching its core database resources through big-data integration and value-added curation, including BioCode (a repository archiving bioinformatics tool codes), BioProject (a biological project library), BioSample (a biological sample library), Genome Sequence Archive (GSA, a data repository for archiving raw sequence reads), Genome Warehouse (GWH, a centralized resource housing genome-scale data), Genome Variation Map (GVM, a public repository of genome variations), Gene Expression Nebulas (GEN, a database of gene expression profiles based on RNA-Seq data), Methylation Bank (MethBank, an integrated databank of DNA methylomes), and Science Wikis (a series of biological knowledge wikis for community annotations). In addition, three featured web services are provided, viz., BIG Search (search as a service; a scalable inter-domain text search engine), BIG SSO (single sign-on as a service; a user access control system to gain access to multiple independent systems with a single ID and password) and Gsub (submission as a service; a unified submission service for all relevant resources). All of these resources are publicly accessible through the home page of the BIG Data Center at http://bigd.big.ac.cn. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  8. Ergatis: a web interface and scalable software system for bioinformatics workflows

    PubMed Central

    Orvis, Joshua; Crabtree, Jonathan; Galens, Kevin; Gussman, Aaron; Inman, Jason M.; Lee, Eduardo; Nampally, Sreenath; Riley, David; Sundaram, Jaideep P.; Felix, Victor; Whitty, Brett; Mahurkar, Anup; Wortman, Jennifer; White, Owen; Angiuoli, Samuel V.

    2010-01-01

    Motivation: The growth of sequence data has been accompanied by an increasing need to analyze data on distributed computer clusters. The use of these systems for routine analysis requires scalable and robust software for data management of large datasets. Software is also needed to simplify data management and make large-scale bioinformatics analysis accessible and reproducible to a wide class of target users. Results: We have developed a workflow management system named Ergatis that enables users to build, execute and monitor pipelines for computational analysis of genomics data. Ergatis contains preconfigured components and template pipelines for a number of common bioinformatics tasks such as prokaryotic genome annotation and genome comparisons. Outputs from many of these components can be loaded into a Chado relational database. Ergatis was designed to be accessible to a broad class of users and provides a user friendly, web-based interface. Ergatis supports high-throughput batch processing on distributed compute clusters and has been used for data management in a number of genome annotation and comparative genomics projects. Availability: Ergatis is an open-source project and is freely available at http://ergatis.sourceforge.net Contact: jorvis@users.sourceforge.net PMID:20413634

  9. Relax with CouchDB - Into the non-relational DBMS era of Bioinformatics

    PubMed Central

    Manyam, Ganiraju; Payton, Michelle A.; Roth, Jack A.; Abruzzo, Lynne V.; Coombes, Kevin R.

    2012-01-01

    With the proliferation of high-throughput technologies, genome-level data analysis has become common in molecular biology. Bioinformaticians are developing extensive resources to annotate and mine biological features from high-throughput data. The underlying database management systems for most bioinformatics software are based on a relational model. Modern non-relational databases offer an alternative that has flexibility, scalability, and a non-rigid design schema. Moreover, with an accelerated development pace, non-relational databases like CouchDB can be ideal tools to construct bioinformatics utilities. We describe CouchDB by presenting three new bioinformatics resources: (a) geneSmash, which collates data from bioinformatics resources and provides automated gene-centric annotations, (b) drugBase, a database of drug-target interactions with a web interface powered by geneSmash, and (c) HapMap-CN, which provides a web interface to query copy number variations from three SNP-chip HapMap datasets. In addition to the web sites, all three systems can be accessed programmatically via web services. PMID:22609849

  10. The BioMart community portal: an innovative alternative to large, centralized data repositories

    PubMed Central

    Smedley, Damian; Haider, Syed; Durinck, Steffen; Pandini, Luca; Provero, Paolo; Allen, James; Arnaiz, Olivier; Awedh, Mohammad Hamza; Baldock, Richard; Barbiera, Giulia; Bardou, Philippe; Beck, Tim; Blake, Andrew; Bonierbale, Merideth; Brookes, Anthony J.; Bucci, Gabriele; Buetti, Iwan; Burge, Sarah; Cabau, Cédric; Carlson, Joseph W.; Chelala, Claude; Chrysostomou, Charalambos; Cittaro, Davide; Collin, Olivier; Cordova, Raul; Cutts, Rosalind J.; Dassi, Erik; Genova, Alex Di; Djari, Anis; Esposito, Anthony; Estrella, Heather; Eyras, Eduardo; Fernandez-Banet, Julio; Forbes, Simon; Free, Robert C.; Fujisawa, Takatomo; Gadaleta, Emanuela; Garcia-Manteiga, Jose M.; Goodstein, David; Gray, Kristian; Guerra-Assunção, José Afonso; Haggarty, Bernard; Han, Dong-Jin; Han, Byung Woo; Harris, Todd; Harshbarger, Jayson; Hastings, Robert K.; Hayes, Richard D.; Hoede, Claire; Hu, Shen; Hu, Zhi-Liang; Hutchins, Lucie; Kan, Zhengyan; Kawaji, Hideya; Keliet, Aminah; Kerhornou, Arnaud; Kim, Sunghoon; Kinsella, Rhoda; Klopp, Christophe; Kong, Lei; Lawson, Daniel; Lazarevic, Dejan; Lee, Ji-Hyun; Letellier, Thomas; Li, Chuan-Yun; Lio, Pietro; Liu, Chu-Jun; Luo, Jie; Maass, Alejandro; Mariette, Jerome; Maurel, Thomas; Merella, Stefania; Mohamed, Azza Mostafa; Moreews, Francois; Nabihoudine, Ibounyamine; Ndegwa, Nelson; Noirot, Céline; Perez-Llamas, Cristian; Primig, Michael; Quattrone, Alessandro; Quesneville, Hadi; Rambaldi, Davide; Reecy, James; Riba, Michela; Rosanoff, Steven; Saddiq, Amna Ali; Salas, Elisa; Sallou, Olivier; Shepherd, Rebecca; Simon, Reinhard; Sperling, Linda; Spooner, William; Staines, Daniel M.; Steinbach, Delphine; Stone, Kevin; Stupka, Elia; Teague, Jon W.; Dayem Ullah, Abu Z.; Wang, Jun; Ware, Doreen; Wong-Erasmus, Marie; Youens-Clark, Ken; Zadissa, Amonida; Zhang, Shi-Jian; Kasprzyk, Arek

    2015-01-01

    The BioMart Community Portal (www.biomart.org) is a community-driven effort to provide a unified interface to biomedical databases that are distributed worldwide. The portal provides access to numerous database projects supported by 30 scientific organizations. It includes over 800 different biological datasets spanning genomics, proteomics, model organisms, cancer data, ontology information and more. All resources available through the portal are independently administered and funded by their host organizations. The BioMart data federation technology provides a unified interface to all the available data. The latest version of the portal comes with many new databases that have been created by our ever-growing community. It also comes with better support and extensibility for data analysis and visualization tools. A new addition to our toolbox, the enrichment analysis tool is now accessible through graphical and web service interface. The BioMart community portal averages over one million requests per day. Building on this level of service and the wealth of information that has become available, the BioMart Community Portal has introduced a new, more scalable and cheaper alternative to the large data stores maintained by specialized organizations. PMID:25897122

  11. Field of genes: using Apache Kafka as a bioinformatic data repository.

    PubMed

    Lawlor, Brendan; Lynch, Richard; Mac Aogáin, Micheál; Walsh, Paul

    2018-04-01

    Bioinformatic research is increasingly dependent on large-scale datasets, accessed either from private or public repositories. An example of a public repository is National Center for Biotechnology Information's (NCBI's) Reference Sequence (RefSeq). These repositories must decide in what form to make their data available. Unstructured data can be put to almost any use but are limited in how access to them can be scaled. Highly structured data offer improved performance for specific algorithms but limit the wider usefulness of the data. We present an alternative: lightly structured data stored in Apache Kafka in a way that is amenable to parallel access and streamed processing, including subsequent transformations into more highly structured representations. We contend that this approach could provide a flexible and powerful nexus of bioinformatic data, bridging the gap between low structure on one hand, and high performance and scale on the other. To demonstrate this, we present a proof-of-concept version of NCBI's RefSeq database using this technology. We measure the performance and scalability characteristics of this alternative with respect to flat files. The proof of concept scales almost linearly as more compute nodes are added, outperforming the standard approach using files. Apache Kafka merits consideration as a fast and more scalable but general-purpose way to store and retrieve bioinformatic data, for public, centralized reference datasets such as RefSeq and for private clinical and experimental data.

  12. Scalable Database Design of End-Game Model with Decoupled Countermeasure and Threat Information

    DTIC Science & Technology

    2017-11-01

    Threat Information by Decetria Akole and Michael Chen Approved for public release; distribution is unlimited...Scalable Database Design of End-Game Model with Decoupled Countermeasure and Threat Information by Decetria Akole The Thurgood Marshall...for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data

  13. GLAD: a system for developing and deploying large-scale bioinformatics grid.

    PubMed

    Teo, Yong-Meng; Wang, Xianbing; Ng, Yew-Kwong

    2005-03-01

    Grid computing is used to solve large-scale bioinformatics problems with gigabytes database by distributing the computation across multiple platforms. Until now in developing bioinformatics grid applications, it is extremely tedious to design and implement the component algorithms and parallelization techniques for different classes of problems, and to access remotely located sequence database files of varying formats across the grid. In this study, we propose a grid programming toolkit, GLAD (Grid Life sciences Applications Developer), which facilitates the development and deployment of bioinformatics applications on a grid. GLAD has been developed using ALiCE (Adaptive scaLable Internet-based Computing Engine), a Java-based grid middleware, which exploits the task-based parallelism. Two bioinformatics benchmark applications, such as distributed sequence comparison and distributed progressive multiple sequence alignment, have been developed using GLAD.

  14. An innovative approach to capability-based emergency operations planning

    PubMed Central

    Keim, Mark E

    2013-01-01

    This paper describes the innovative use information technology for assisting disaster planners with an easily-accessible method for writing and improving evidence-based emergency operations plans. This process is used to identify all key objectives of the emergency response according to capabilities of the institution, community or society. The approach then uses a standardized, objective-based format, along with a consensus-based method for drafting capability-based operational-level plans. This information is then integrated within a relational database to allow for ease of access and enhanced functionality to search, sort and filter and emergency operations plan according to user need and technological capacity. This integrated approach is offered as an effective option for integrating best practices of planning with the efficiency, scalability and flexibility of modern information and communication technology. PMID:28228987

  15. An innovative approach to capability-based emergency operations planning.

    PubMed

    Keim, Mark E

    2013-01-01

    This paper describes the innovative use information technology for assisting disaster planners with an easily-accessible method for writing and improving evidence-based emergency operations plans. This process is used to identify all key objectives of the emergency response according to capabilities of the institution, community or society. The approach then uses a standardized, objective-based format, along with a consensus-based method for drafting capability-based operational-level plans. This information is then integrated within a relational database to allow for ease of access and enhanced functionality to search, sort and filter and emergency operations plan according to user need and technological capacity. This integrated approach is offered as an effective option for integrating best practices of planning with the efficiency, scalability and flexibility of modern information and communication technology.

  16. Alternatives to relational databases in precision medicine: Comparison of NoSQL approaches for big data storage using supercomputers

    NASA Astrophysics Data System (ADS)

    Velazquez, Enrique Israel

    Improvements in medical and genomic technologies have dramatically increased the production of electronic data over the last decade. As a result, data management is rapidly becoming a major determinant, and urgent challenge, for the development of Precision Medicine. Although successful data management is achievable using Relational Database Management Systems (RDBMS), exponential data growth is a significant contributor to failure scenarios. Growing amounts of data can also be observed in other sectors, such as economics and business, which, together with the previous facts, suggests that alternate database approaches (NoSQL) may soon be required for efficient storage and management of big databases. However, this hypothesis has been difficult to test in the Precision Medicine field since alternate database architectures are complex to assess and means to integrate heterogeneous electronic health records (EHR) with dynamic genomic data are not easily available. In this dissertation, we present a novel set of experiments for identifying NoSQL database approaches that enable effective data storage and management in Precision Medicine using patients' clinical and genomic information from the cancer genome atlas (TCGA). The first experiment draws on performance and scalability from biologically meaningful queries with differing complexity and database sizes. The second experiment measures performance and scalability in database updates without schema changes. The third experiment assesses performance and scalability in database updates with schema modifications due dynamic data. We have identified two NoSQL approach, based on Cassandra and Redis, which seems to be the ideal database management systems for our precision medicine queries in terms of performance and scalability. We present NoSQL approaches and show how they can be used to manage clinical and genomic big data. Our research is relevant to the public health since we are focusing on one of the main challenges to the development of Precision Medicine and, consequently, investigating a potential solution to the progressively increasing demands on health care.

  17. LVFS: A Scalable Petabye/Exabyte Data Storage System

    NASA Astrophysics Data System (ADS)

    Golpayegani, N.; Halem, M.; Masuoka, E. J.; Ye, G.; Devine, N. K.

    2013-12-01

    Managing petabytes of data with hundreds of millions of files is the first step necessary towards an effective big data computing and collaboration environment in a distributed system. We describe here the MODAPS LAADS Virtual File System (LVFS), a new storage architecture which replaces the previous MODAPS operational Level 1 Land Atmosphere Archive Distribution System (LAADS) NFS based approach to storing and distributing datasets from several instruments, such as MODIS, MERIS, and VIIRS. LAADS is responsible for the distribution of over 4 petabytes of data and over 300 million files across more than 500 disks. We present here the first LVFS big data comparative performance results and new capabilities not previously possible with the LAADS system. We consider two aspects in addressing inefficiencies of massive scales of data. First, is dealing in a reliable and resilient manner with the volume and quantity of files in such a dataset, and, second, minimizing the discovery and lookup times for accessing files in such large datasets. There are several popular file systems that successfully deal with the first aspect of the problem. Their solution, in general, is through distribution, replication, and parallelism of the storage architecture. The Hadoop Distributed File System (HDFS), Parallel Virtual File System (PVFS), and Lustre are examples of such file systems that deal with petabyte data volumes. The second aspect deals with data discovery among billions of files, the largest bottleneck in reducing access time. The metadata of a file, generally represented in a directory layout, is stored in ways that are not readily scalable. This is true for HDFS, PVFS, and Lustre as well. Recent experimental file systems, such as Spyglass or Pantheon, have attempted to address this problem through redesign of the metadata directory architecture. LVFS takes a radically different architectural approach by eliminating the need for a separate directory within the file system. The LVFS system replaces the NFS disk mounting approach of LAADS and utilizes the already existing highly optimized metadata database server, which is applicable to most scientific big data intensive compute systems. Thus, LVFS ties the existing storage system with the existing metadata infrastructure system which we believe leads to a scalable exabyte virtual file system. The uniqueness of the implemented design is not limited to LAADS but can be employed with most scientific data processing systems. By utilizing the Filesystem In Userspace (FUSE), a kernel module available in many operating systems, LVFS was able to replace the NFS system while staying POSIX compliant. As a result, the LVFS system becomes scalable to exabyte sizes owing to the use of highly scalable database servers optimized for metadata storage. The flexibility of the LVFS design allows it to organize data on the fly in different ways, such as by region, date, instrument or product without the need for duplication, symbolic links, or any other replication methods. We proposed here a strategic reference architecture that addresses the inefficiencies of scientific petabyte/exabyte file system access through the dynamic integration of the observing system's large metadata file.

  18. Integration of Oracle and Hadoop: Hybrid Databases Affordable at Scale

    NASA Astrophysics Data System (ADS)

    Canali, L.; Baranowski, Z.; Kothuri, P.

    2017-10-01

    This work reports on the activities aimed at integrating Oracle and Hadoop technologies for the use cases of CERN database services and in particular on the development of solutions for offloading data and queries from Oracle databases into Hadoop-based systems. The goal and interest of this investigation is to increase the scalability and optimize the cost/performance footprint for some of our largest Oracle databases. These concepts have been applied, among others, to build offline copies of CERN accelerator controls and logging databases. The tested solution allows to run reports on the controls data offloaded in Hadoop without affecting the critical production database, providing both performance benefits and cost reduction for the underlying infrastructure. Other use cases discussed include building hybrid database solutions with Oracle and Hadoop, offering the combined advantages of a mature relational database system with a scalable analytics engine.

  19. A Parallel Ghosting Algorithm for The Flexible Distributed Mesh Database

    DOE PAGES

    Mubarak, Misbah; Seol, Seegyoung; Lu, Qiukai; ...

    2013-01-01

    Critical to the scalability of parallel adaptive simulations are parallel control functions including load balancing, reduced inter-process communication and optimal data decomposition. In distributed meshes, many mesh-based applications frequently access neighborhood information for computational purposes which must be transmitted efficiently to avoid parallel performance degradation when the neighbors are on different processors. This article presents a parallel algorithm of creating and deleting data copies, referred to as ghost copies, which localize neighborhood data for computation purposes while minimizing inter-process communication. The key characteristics of the algorithm are: (1) It can create ghost copies of any permissible topological order in amore » 1D, 2D or 3D mesh based on selected adjacencies. (2) It exploits neighborhood communication patterns during the ghost creation process thus eliminating all-to-all communication. (3) For applications that need neighbors of neighbors, the algorithm can create n number of ghost layers up to a point where the whole partitioned mesh can be ghosted. Strong and weak scaling results are presented for the IBM BG/P and Cray XE6 architectures up to a core count of 32,768 processors. The algorithm also leads to scalable results when used in a parallel super-convergent patch recovery error estimator, an application that frequently accesses neighborhood data to carry out computation.« less

  20. Field of genes: using Apache Kafka as a bioinformatic data repository

    PubMed Central

    Lynch, Richard; Walsh, Paul

    2018-01-01

    Abstract Background Bioinformatic research is increasingly dependent on large-scale datasets, accessed either from private or public repositories. An example of a public repository is National Center for Biotechnology Information's (NCBI’s) Reference Sequence (RefSeq). These repositories must decide in what form to make their data available. Unstructured data can be put to almost any use but are limited in how access to them can be scaled. Highly structured data offer improved performance for specific algorithms but limit the wider usefulness of the data. We present an alternative: lightly structured data stored in Apache Kafka in a way that is amenable to parallel access and streamed processing, including subsequent transformations into more highly structured representations. We contend that this approach could provide a flexible and powerful nexus of bioinformatic data, bridging the gap between low structure on one hand, and high performance and scale on the other. To demonstrate this, we present a proof-of-concept version of NCBI’s RefSeq database using this technology. We measure the performance and scalability characteristics of this alternative with respect to flat files. Results The proof of concept scales almost linearly as more compute nodes are added, outperforming the standard approach using files. Conclusions Apache Kafka merits consideration as a fast and more scalable but general-purpose way to store and retrieve bioinformatic data, for public, centralized reference datasets such as RefSeq and for private clinical and experimental data. PMID:29635394

  1. The BioMart community portal: an innovative alternative to large, centralized data repositories.

    PubMed

    Smedley, Damian; Haider, Syed; Durinck, Steffen; Pandini, Luca; Provero, Paolo; Allen, James; Arnaiz, Olivier; Awedh, Mohammad Hamza; Baldock, Richard; Barbiera, Giulia; Bardou, Philippe; Beck, Tim; Blake, Andrew; Bonierbale, Merideth; Brookes, Anthony J; Bucci, Gabriele; Buetti, Iwan; Burge, Sarah; Cabau, Cédric; Carlson, Joseph W; Chelala, Claude; Chrysostomou, Charalambos; Cittaro, Davide; Collin, Olivier; Cordova, Raul; Cutts, Rosalind J; Dassi, Erik; Di Genova, Alex; Djari, Anis; Esposito, Anthony; Estrella, Heather; Eyras, Eduardo; Fernandez-Banet, Julio; Forbes, Simon; Free, Robert C; Fujisawa, Takatomo; Gadaleta, Emanuela; Garcia-Manteiga, Jose M; Goodstein, David; Gray, Kristian; Guerra-Assunção, José Afonso; Haggarty, Bernard; Han, Dong-Jin; Han, Byung Woo; Harris, Todd; Harshbarger, Jayson; Hastings, Robert K; Hayes, Richard D; Hoede, Claire; Hu, Shen; Hu, Zhi-Liang; Hutchins, Lucie; Kan, Zhengyan; Kawaji, Hideya; Keliet, Aminah; Kerhornou, Arnaud; Kim, Sunghoon; Kinsella, Rhoda; Klopp, Christophe; Kong, Lei; Lawson, Daniel; Lazarevic, Dejan; Lee, Ji-Hyun; Letellier, Thomas; Li, Chuan-Yun; Lio, Pietro; Liu, Chu-Jun; Luo, Jie; Maass, Alejandro; Mariette, Jerome; Maurel, Thomas; Merella, Stefania; Mohamed, Azza Mostafa; Moreews, Francois; Nabihoudine, Ibounyamine; Ndegwa, Nelson; Noirot, Céline; Perez-Llamas, Cristian; Primig, Michael; Quattrone, Alessandro; Quesneville, Hadi; Rambaldi, Davide; Reecy, James; Riba, Michela; Rosanoff, Steven; Saddiq, Amna Ali; Salas, Elisa; Sallou, Olivier; Shepherd, Rebecca; Simon, Reinhard; Sperling, Linda; Spooner, William; Staines, Daniel M; Steinbach, Delphine; Stone, Kevin; Stupka, Elia; Teague, Jon W; Dayem Ullah, Abu Z; Wang, Jun; Ware, Doreen; Wong-Erasmus, Marie; Youens-Clark, Ken; Zadissa, Amonida; Zhang, Shi-Jian; Kasprzyk, Arek

    2015-07-01

    The BioMart Community Portal (www.biomart.org) is a community-driven effort to provide a unified interface to biomedical databases that are distributed worldwide. The portal provides access to numerous database projects supported by 30 scientific organizations. It includes over 800 different biological datasets spanning genomics, proteomics, model organisms, cancer data, ontology information and more. All resources available through the portal are independently administered and funded by their host organizations. The BioMart data federation technology provides a unified interface to all the available data. The latest version of the portal comes with many new databases that have been created by our ever-growing community. It also comes with better support and extensibility for data analysis and visualization tools. A new addition to our toolbox, the enrichment analysis tool is now accessible through graphical and web service interface. The BioMart community portal averages over one million requests per day. Building on this level of service and the wealth of information that has become available, the BioMart Community Portal has introduced a new, more scalable and cheaper alternative to the large data stores maintained by specialized organizations. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  2. SiC: An Agent Based Architecture for Preventing and Detecting Attacks to Ubiquitous Databases

    NASA Astrophysics Data System (ADS)

    Pinzón, Cristian; de Paz, Yanira; Bajo, Javier; Abraham, Ajith; Corchado, Juan M.

    One of the main attacks to ubiquitous databases is the structure query language (SQL) injection attack, which causes severe damages both in the commercial aspect and in the user’s confidence. This chapter proposes the SiC architecture as a solution to the SQL injection attack problem. This is a hierarchical distributed multiagent architecture, which involves an entirely new approach with respect to existing architectures for the prevention and detection of SQL injections. SiC incorporates a kind of intelligent agent, which integrates a case-based reasoning system. This agent, which is the core of the architecture, allows the application of detection techniques based on anomalies as well as those based on patterns, providing a great degree of autonomy, flexibility, robustness and dynamic scalability. The characteristics of the multiagent system allow an architecture to detect attacks from different types of devices, regardless of the physical location. The architecture has been tested on a medical database, guaranteeing safe access from various devices such as PDAs and notebook computers.

  3. Introducing the PRIDE Archive RESTful web services.

    PubMed

    Reisinger, Florian; del-Toro, Noemi; Ternent, Tobias; Hermjakob, Henning; Vizcaíno, Juan Antonio

    2015-07-01

    The PRIDE (PRoteomics IDEntifications) database is one of the world-leading public repositories of mass spectrometry (MS)-based proteomics data and it is a founding member of the ProteomeXchange Consortium of proteomics resources. In the original PRIDE database system, users could access data programmatically by accessing the web services provided by the PRIDE BioMart interface. New REST (REpresentational State Transfer) web services have been developed to serve the most popular functionality provided by BioMart (now discontinued due to data scalability issues) and address the data access requirements of the newly developed PRIDE Archive. Using the API (Application Programming Interface) it is now possible to programmatically query for and retrieve peptide and protein identifications, project and assay metadata and the originally submitted files. Searching and filtering is also possible by metadata information, such as sample details (e.g. species and tissues), instrumentation (mass spectrometer), keywords and other provided annotations. The PRIDE Archive web services were first made available in April 2014. The API has already been adopted by a few applications and standalone tools such as PeptideShaker, PRIDE Inspector, the Unipept web application and the Python-based BioServices package. This application is free and open to all users with no login requirement and can be accessed at http://www.ebi.ac.uk/pride/ws/archive/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  4. Service Management Database for DSN Equipment

    NASA Technical Reports Server (NTRS)

    Zendejas, Silvino; Bui, Tung; Bui, Bach; Malhotra, Shantanu; Chen, Fannie; Wolgast, Paul; Allen, Christopher; Luong, Ivy; Chang, George; Sadaqathulla, Syed

    2009-01-01

    This data- and event-driven persistent storage system leverages the use of commercial software provided by Oracle for portability, ease of maintenance, scalability, and ease of integration with embedded, client-server, and multi-tiered applications. In this role, the Service Management Database (SMDB) is a key component of the overall end-to-end process involved in the scheduling, preparation, and configuration of the Deep Space Network (DSN) equipment needed to perform the various telecommunication services the DSN provides to its customers worldwide. SMDB makes efficient use of triggers, stored procedures, queuing functions, e-mail capabilities, data management, and Java integration features provided by the Oracle relational database management system. SMDB uses a third normal form schema design that allows for simple data maintenance procedures and thin layers of integration with client applications. The software provides an integrated event logging system with ability to publish events to a JMS messaging system for synchronous and asynchronous delivery to subscribed applications. It provides a structured classification of events and application-level messages stored in database tables that are accessible by monitoring applications for real-time monitoring or for troubleshooting and analysis over historical archives.

  5. Relax with CouchDB--into the non-relational DBMS era of bioinformatics.

    PubMed

    Manyam, Ganiraju; Payton, Michelle A; Roth, Jack A; Abruzzo, Lynne V; Coombes, Kevin R

    2012-07-01

    With the proliferation of high-throughput technologies, genome-level data analysis has become common in molecular biology. Bioinformaticians are developing extensive resources to annotate and mine biological features from high-throughput data. The underlying database management systems for most bioinformatics software are based on a relational model. Modern non-relational databases offer an alternative that has flexibility, scalability, and a non-rigid design schema. Moreover, with an accelerated development pace, non-relational databases like CouchDB can be ideal tools to construct bioinformatics utilities. We describe CouchDB by presenting three new bioinformatics resources: (a) geneSmash, which collates data from bioinformatics resources and provides automated gene-centric annotations, (b) drugBase, a database of drug-target interactions with a web interface powered by geneSmash, and (c) HapMap-CN, which provides a web interface to query copy number variations from three SNP-chip HapMap datasets. In addition to the web sites, all three systems can be accessed programmatically via web services. Copyright © 2012 Elsevier Inc. All rights reserved.

  6. PathCase-SB architecture and database design

    PubMed Central

    2011-01-01

    Background Integration of metabolic pathways resources and regulatory metabolic network models, and deploying new tools on the integrated platform can help perform more effective and more efficient systems biology research on understanding the regulation in metabolic networks. Therefore, the tasks of (a) integrating under a single database environment regulatory metabolic networks and existing models, and (b) building tools to help with modeling and analysis are desirable and intellectually challenging computational tasks. Description PathCase Systems Biology (PathCase-SB) is built and released. The PathCase-SB database provides data and API for multiple user interfaces and software tools. The current PathCase-SB system provides a database-enabled framework and web-based computational tools towards facilitating the development of kinetic models for biological systems. PathCase-SB aims to integrate data of selected biological data sources on the web (currently, BioModels database and KEGG), and to provide more powerful and/or new capabilities via the new web-based integrative framework. This paper describes architecture and database design issues encountered in PathCase-SB's design and implementation, and presents the current design of PathCase-SB's architecture and database. Conclusions PathCase-SB architecture and database provide a highly extensible and scalable environment with easy and fast (real-time) access to the data in the database. PathCase-SB itself is already being used by researchers across the world. PMID:22070889

  7. Evaluating a NoSQL Alternative for Chilean Virtual Observatory Services

    NASA Astrophysics Data System (ADS)

    Antognini, J.; Araya, M.; Solar, M.; Valenzuela, C.; Lira, F.

    2015-09-01

    Currently, the standards and protocols for data access in the Virtual Observatory architecture (DAL) are generally implemented with relational databases based on SQL. In particular, the Astronomical Data Query Language (ADQL), language used by IVOA to represent queries to VO services, was created to satisfy the different data access protocols, such as Simple Cone Search. ADQL is based in SQL92, and has extra functionality implemented using PgSphere. An emergent alternative to SQL are the so called NoSQL databases, which can be classified in several categories such as Column, Document, Key-Value, Graph, Object, etc.; each one recommended for different scenarios. Within their notable characteristics we can find: schema-free, easy replication support, simple API, Big Data, etc. The Chilean Virtual Observatory (ChiVO) is developing a functional prototype based on the IVOA architecture, with the following relevant factors: Performance, Scalability, Flexibility, Complexity, and Functionality. Currently, it's very difficult to compare these factors, due to a lack of alternatives. The objective of this paper is to compare NoSQL alternatives with SQL through the implementation of a Web API REST that satisfies ChiVO's needs: a SESAME-style name resolver for the data from ALMA. Therefore, we propose a test scenario by configuring a NoSQL database with data from different sources and evaluating the feasibility of creating a Simple Cone Search service and its performance. This comparison will allow to pave the way for the application of Big Data databases in the Virtual Observatory.

  8. A scalable database model for multiparametric time series: a volcano observatory case study

    NASA Astrophysics Data System (ADS)

    Montalto, Placido; Aliotta, Marco; Cassisi, Carmelo; Prestifilippo, Michele; Cannata, Andrea

    2014-05-01

    The variables collected by a sensor network constitute a heterogeneous data source that needs to be properly organized in order to be used in research and geophysical monitoring. With the time series term we refer to a set of observations of a given phenomenon acquired sequentially in time. When the time intervals are equally spaced one speaks of period or sampling frequency. Our work describes in detail a possible methodology for storage and management of time series using a specific data structure. We designed a framework, hereinafter called TSDSystem (Time Series Database System), in order to acquire time series from different data sources and standardize them within a relational database. The operation of standardization provides the ability to perform operations, such as query and visualization, of many measures synchronizing them using a common time scale. The proposed architecture follows a multiple layer paradigm (Loaders layer, Database layer and Business Logic layer). Each layer is specialized in performing particular operations for the reorganization and archiving of data from different sources such as ASCII, Excel, ODBC (Open DataBase Connectivity), file accessible from the Internet (web pages, XML). In particular, the loader layer performs a security check of the working status of each running software through an heartbeat system, in order to automate the discovery of acquisition issues and other warning conditions. Although our system has to manage huge amounts of data, performance is guaranteed by using a smart partitioning table strategy, that keeps balanced the percentage of data stored in each database table. TSDSystem also contains modules for the visualization of acquired data, that provide the possibility to query different time series on a specified time range, or follow the realtime signal acquisition, according to a data access policy from the users.

  9. Enhancing UCSF Chimera through web services

    PubMed Central

    Huang, Conrad C.; Meng, Elaine C.; Morris, John H.; Pettersen, Eric F.; Ferrin, Thomas E.

    2014-01-01

    Integrating access to web services with desktop applications allows for an expanded set of application features, including performing computationally intensive tasks and convenient searches of databases. We describe how we have enhanced UCSF Chimera (http://www.rbvi.ucsf.edu/chimera/), a program for the interactive visualization and analysis of molecular structures and related data, through the addition of several web services (http://www.rbvi.ucsf.edu/chimera/docs/webservices.html). By streamlining access to web services, including the entire job submission, monitoring and retrieval process, Chimera makes it simpler for users to focus on their science projects rather than data manipulation. Chimera uses Opal, a toolkit for wrapping scientific applications as web services, to provide scalable and transparent access to several popular software packages. We illustrate Chimera's use of web services with an example workflow that interleaves use of these services with interactive manipulation of molecular sequences and structures, and we provide an example Python program to demonstrate how easily Opal-based web services can be accessed from within an application. Web server availability: http://webservices.rbvi.ucsf.edu/opal2/dashboard?command=serviceList. PMID:24861624

  10. Hydra: a scalable proteomic search engine which utilizes the Hadoop distributed computing framework

    PubMed Central

    2012-01-01

    Background For shotgun mass spectrometry based proteomics the most computationally expensive step is in matching the spectra against an increasingly large database of sequences and their post-translational modifications with known masses. Each mass spectrometer can generate data at an astonishingly high rate, and the scope of what is searched for is continually increasing. Therefore solutions for improving our ability to perform these searches are needed. Results We present a sequence database search engine that is specifically designed to run efficiently on the Hadoop MapReduce distributed computing framework. The search engine implements the K-score algorithm, generating comparable output for the same input files as the original implementation. The scalability of the system is shown, and the architecture required for the development of such distributed processing is discussed. Conclusion The software is scalable in its ability to handle a large peptide database, numerous modifications and large numbers of spectra. Performance scales with the number of processors in the cluster, allowing throughput to expand with the available resources. PMID:23216909

  11. Hydra: a scalable proteomic search engine which utilizes the Hadoop distributed computing framework.

    PubMed

    Lewis, Steven; Csordas, Attila; Killcoyne, Sarah; Hermjakob, Henning; Hoopmann, Michael R; Moritz, Robert L; Deutsch, Eric W; Boyle, John

    2012-12-05

    For shotgun mass spectrometry based proteomics the most computationally expensive step is in matching the spectra against an increasingly large database of sequences and their post-translational modifications with known masses. Each mass spectrometer can generate data at an astonishingly high rate, and the scope of what is searched for is continually increasing. Therefore solutions for improving our ability to perform these searches are needed. We present a sequence database search engine that is specifically designed to run efficiently on the Hadoop MapReduce distributed computing framework. The search engine implements the K-score algorithm, generating comparable output for the same input files as the original implementation. The scalability of the system is shown, and the architecture required for the development of such distributed processing is discussed. The software is scalable in its ability to handle a large peptide database, numerous modifications and large numbers of spectra. Performance scales with the number of processors in the cluster, allowing throughput to expand with the available resources.

  12. Towards Scalable Deep Learning via I/O Analysis and Optimization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pumma, Sarunya; Si, Min; Feng, Wu-Chun

    Deep learning systems have been growing in prominence as a way to automatically characterize objects, trends, and anomalies. Given the importance of deep learning systems, researchers have been investigating techniques to optimize such systems. An area of particular interest has been using large supercomputing systems to quickly generate effective deep learning networks: a phase often referred to as “training” of the deep learning neural network. As we scale existing deep learning frameworks—such as Caffe—on these large supercomputing systems, we notice that the parallelism can help improve the computation tremendously, leaving data I/O as the major bottleneck limiting the overall systemmore » scalability. In this paper, we first present a detailed analysis of the performance bottlenecks of Caffe on large supercomputing systems. Our analysis shows that the I/O subsystem of Caffe—LMDB—relies on memory-mapped I/O to access its database, which can be highly inefficient on large-scale systems because of its interaction with the process scheduling system and the network-based parallel filesystem. Based on this analysis, we then present LMDBIO, our optimized I/O plugin for Caffe that takes into account the data access pattern of Caffe in order to vastly improve I/O performance. Our experimental results show that LMDBIO can improve the overall execution time of Caffe by nearly 20-fold in some cases.« less

  13. ArrayBridge: Interweaving declarative array processing with high-performance computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Xing, Haoyuan; Floratos, Sofoklis; Blanas, Spyros

    Scientists are increasingly turning to datacenter-scale computers to produce and analyze massive arrays. Despite decades of database research that extols the virtues of declarative query processing, scientists still write, debug and parallelize imperative HPC kernels even for the most mundane queries. This impedance mismatch has been partly attributed to the cumbersome data loading process; in response, the database community has proposed in situ mechanisms to access data in scientific file formats. Scientists, however, desire more than a passive access method that reads arrays from files. This paper describes ArrayBridge, a bi-directional array view mechanism for scientific file formats, that aimsmore » to make declarative array manipulations interoperable with imperative file-centric analyses. Our prototype implementation of ArrayBridge uses HDF5 as the underlying array storage library and seamlessly integrates into the SciDB open-source array database system. In addition to fast querying over external array objects, ArrayBridge produces arrays in the HDF5 file format just as easily as it can read from it. ArrayBridge also supports time travel queries from imperative kernels through the unmodified HDF5 API, and automatically deduplicates between array versions for space efficiency. Our extensive performance evaluation in NERSC, a large-scale scientific computing facility, shows that ArrayBridge exhibits statistically indistinguishable performance and I/O scalability to the native SciDB storage engine.« less

  14. Gigwa-Genotype investigator for genome-wide analyses.

    PubMed

    Sempéré, Guilhem; Philippe, Florian; Dereeper, Alexis; Ruiz, Manuel; Sarah, Gautier; Larmande, Pierre

    2016-06-06

    Exploring the structure of genomes and analyzing their evolution is essential to understanding the ecological adaptation of organisms. However, with the large amounts of data being produced by next-generation sequencing, computational challenges arise in terms of storage, search, sharing, analysis and visualization. This is particularly true with regards to studies of genomic variation, which are currently lacking scalable and user-friendly data exploration solutions. Here we present Gigwa, a web-based tool that provides an easy and intuitive way to explore large amounts of genotyping data by filtering it not only on the basis of variant features, including functional annotations, but also on genotype patterns. The data storage relies on MongoDB, which offers good scalability properties. Gigwa can handle multiple databases and may be deployed in either single- or multi-user mode. In addition, it provides a wide range of popular export formats. The Gigwa application is suitable for managing large amounts of genomic variation data. Its user-friendly web interface makes such processing widely accessible. It can either be simply deployed on a workstation or be used to provide a shared data portal for a given community of researchers.

  15. Alternatives to relational database: comparison of NoSQL and XML approaches for clinical data storage.

    PubMed

    Lee, Ken Ka-Yin; Tang, Wai-Choi; Choi, Kup-Sze

    2013-04-01

    Clinical data are dynamic in nature, often arranged hierarchically and stored as free text and numbers. Effective management of clinical data and the transformation of the data into structured format for data analysis are therefore challenging issues in electronic health records development. Despite the popularity of relational databases, the scalability of the NoSQL database model and the document-centric data structure of XML databases appear to be promising features for effective clinical data management. In this paper, three database approaches--NoSQL, XML-enabled and native XML--are investigated to evaluate their suitability for structured clinical data. The database query performance is reported, together with our experience in the databases development. The results show that NoSQL database is the best choice for query speed, whereas XML databases are advantageous in terms of scalability, flexibility and extensibility, which are essential to cope with the characteristics of clinical data. While NoSQL and XML technologies are relatively new compared to the conventional relational database, both of them demonstrate potential to become a key database technology for clinical data management as the technology further advances. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  16. Design and implementation of scalable tape archiver

    NASA Technical Reports Server (NTRS)

    Nemoto, Toshihiro; Kitsuregawa, Masaru; Takagi, Mikio

    1996-01-01

    In order to reduce costs, computer manufacturers try to use commodity parts as much as possible. Mainframes using proprietary processors are being replaced by high performance RISC microprocessor-based workstations, which are further being replaced by the commodity microprocessor used in personal computers. Highly reliable disks for mainframes are also being replaced by disk arrays, which are complexes of disk drives. In this paper we try to clarify the feasibility of a large scale tertiary storage system composed of 8-mm tape archivers utilizing robotics. In the near future, the 8-mm tape archiver will be widely used and become a commodity part, since recent rapid growth of multimedia applications requires much larger storage than disk drives can provide. We designed a scalable tape archiver which connects as many 8-mm tape archivers (element archivers) as possible. In the scalable archiver, robotics can exchange a cassette tape between two adjacent element archivers mechanically. Thus, we can build a large scalable archiver inexpensively. In addition, a sophisticated migration mechanism distributes frequently accessed tapes (hot tapes) evenly among all of the element archivers, which improves the throughput considerably. Even with the failures of some tape drives, the system dynamically redistributes hot tapes to the other element archivers which have live tape drives. Several kinds of specially tailored huge archivers are on the market, however, the 8-mm tape scalable archiver could replace them. To maintain high performance in spite of high access locality when a large number of archivers are attached to the scalable archiver, it is necessary to scatter frequently accessed cassettes among the element archivers and to use the tape drives efficiently. For this purpose, we introduce two cassette migration algorithms, foreground migration and background migration. Background migration transfers cassettes between element archivers to redistribute frequently accessed cassettes, thus balancing the load of each archiver. Background migration occurs the robotics are idle. Both migration algorithms are based on access frequency and space utility of each element archiver. To normalize these parameters according to the number of drives in each element archiver, it is possible to maintain high performance even if some tape drives fail. We found that the foreground migration is efficient at reducing access response time. Beside the foreground migration, the background migration makes it possible to track the transition of spatial access locality quickly.

  17. The Virtual Watershed Observatory: Cyberinfrastructure for Model-Data Integration and Access

    NASA Astrophysics Data System (ADS)

    Duffy, C.; Leonard, L. N.; Giles, L.; Bhatt, G.; Yu, X.

    2011-12-01

    The Virtual Watershed Observatory (VWO) is a concept where scientists, water managers, educators and the general public can create a virtual observatory from integrated hydrologic model results, national databases and historical or real-time observations via web services. In this paper, we propose a prototype for automated and virtualized web services software using national data products for climate reanalysis, soils, geology, terrain and land cover. The VWO has the broad purpose of making accessible water resource simulations, real-time data assimilation, calibration and archival at the scale of HUC 12 watersheds (Hydrologic Unit Code) anywhere in the continental US. Our prototype for model-data integration focuses on creating tools for fast data storage from selected national databases, as well as the computational resources necessary for a dynamic, distributed watershed simulation. The paper will describe cyberinfrastructure tools and workflow that attempts to resolve the problem of model-data accessibility and scalability such that individuals, research teams, managers and educators can create a WVO in a desired context. Examples are given for the NSF-funded Shale Hills Critical Zone Observatory and the European Critical Zone Observatories within the SoilTrEC project. In the future implementation of WVO services will benefit from the development of a cloud cyber infrastructure as the prototype evolves to data and model intensive computation for continental scale water resource predictions.

  18. A Grid Metadata Service for Earth and Environmental Sciences

    NASA Astrophysics Data System (ADS)

    Fiore, Sandro; Negro, Alessandro; Aloisio, Giovanni

    2010-05-01

    Critical challenges for climate modeling researchers are strongly connected with the increasingly complex simulation models and the huge quantities of produced datasets. Future trends in climate modeling will only increase computational and storage requirements. For this reason the ability to transparently access to both computational and data resources for large-scale complex climate simulations must be considered as a key requirement for Earth Science and Environmental distributed systems. From the data management perspective (i) the quantity of data will continuously increases, (ii) data will become more and more distributed and widespread, (iii) data sharing/federation will represent a key challenging issue among different sites distributed worldwide, (iv) the potential community of users (large and heterogeneous) will be interested in discovery experimental results, searching of metadata, browsing collections of files, compare different results, display output, etc.; A key element to carry out data search and discovery, manage and access huge and distributed amount of data is the metadata handling framework. What we propose for the management of distributed datasets is the GRelC service (a data grid solution focusing on metadata management). Despite the classical approaches, the proposed data-grid solution is able to address scalability, transparency, security and efficiency and interoperability. The GRelC service we propose is able to provide access to metadata stored in different and widespread data sources (relational databases running on top of MySQL, Oracle, DB2, etc. leveraging SQL as query language, as well as XML databases - XIndice, eXist, and libxml2 based documents, adopting either XPath or XQuery) providing a strong data virtualization layer in a grid environment. Such a technological solution for distributed metadata management leverages on well known adopted standards (W3C, OASIS, etc.); (ii) supports role-based management (based on VOMS), which increases flexibility and scalability; (iii) provides full support for Grid Security Infrastructure, which means (authorization, mutual authentication, data integrity, data confidentiality and delegation); (iv) is compatible with existing grid middleware such as gLite and Globus and finally (v) is currently adopted at the Euro-Mediterranean Centre for Climate Change (CMCC - Italy) to manage the entire CMCC data production activity as well as in the international Climate-G testbed.

  19. A Parallel Relational Database Management System Approach to Relevance Feedback in Information Retrieval.

    ERIC Educational Resources Information Center

    Lundquist, Carol; Frieder, Ophir; Holmes, David O.; Grossman, David

    1999-01-01

    Describes a scalable, parallel, relational database-drive information retrieval engine. To support portability across a wide range of execution environments, all algorithms adhere to the SQL-92 standard. By incorporating relevance feedback algorithms, accuracy is enhanced over prior database-driven information retrieval efforts. Presents…

  20. Design of Community Resource Inventories as a Component of Scalable Earth Science Infrastructure: Experience of the Earthcube CINERGI Project

    NASA Astrophysics Data System (ADS)

    Zaslavsky, I.; Richard, S. M.; Valentine, D. W., Jr.; Grethe, J. S.; Hsu, L.; Malik, T.; Bermudez, L. E.; Gupta, A.; Lehnert, K. A.; Whitenack, T.; Ozyurt, I. B.; Condit, C.; Calderon, R.; Musil, L.

    2014-12-01

    EarthCube is envisioned as a cyberinfrastructure that fosters new, transformational geoscience by enabling sharing, understanding and scientifically-sound and efficient re-use of formerly unconnected data resources, software, models, repositories, and computational power. Its purpose is to enable science enterprise and workforce development via an extensible and adaptable collaboration and resource integration framework. A key component of this vision is development of comprehensive inventories supporting resource discovery and re-use across geoscience domains. The goal of the EarthCube CINERGI (Community Inventory of EarthCube Resources for Geoscience Interoperability) project is to create a methodology and assemble a large inventory of high-quality information resources with standard metadata descriptions and traceable provenance. The inventory is compiled from metadata catalogs maintained by geoscience data facilities, as well as from user contributions. The latter mechanism relies on community resource viewers: online applications that support update and curation of metadata records. Once harvested into CINERGI, metadata records from domain catalogs and community resource viewers are loaded into a staging database implemented in MongoDB, and validated for compliance with ISO 19139 metadata schema. Several types of metadata defects detected by the validation engine are automatically corrected with help of several information extractors or flagged for manual curation. The metadata harvesting, validation and processing components generate provenance statements using W3C PROV notation, which are stored in a Neo4J database. Thus curated metadata, along with the provenance information, is re-published and accessed programmatically and via a CINERGI online application. This presentation focuses on the role of resource inventories in a scalable and adaptable information infrastructure, and on the CINERGI metadata pipeline and its implementation challenges. Key project components are described at the project's website (http://workspace.earthcube.org/cinergi), which also provides access to the initial resource inventory, the inventory metadata model, metadata entry forms and a collection of the community resource viewers.

  1. Connecting real-time data to algorithms and databases: EarthCube's Cloud-Hosted Real-time Data Services for the Geosciences (CHORDS)

    NASA Astrophysics Data System (ADS)

    Daniels, M. D.; Graves, S. J.; Kerkez, B.; Chandrasekar, V.; Vernon, F.; Martin, C. L.; Maskey, M.; Keiser, K.; Dye, M. J.

    2015-12-01

    The Cloud-Hosted Real-time Data Services for the Geosciences (CHORDS) project was funded under the National Science Foundation's EarthCube initiative. CHORDS addresses the ever-increasing importance of real-time scientific data in the geosciences, particularly in mission critical scenarios, where informed decisions must be made rapidly. Access to constant streams of real-time data also allow many new transient phenomena in space-time to be observed, however, much of these streaming data are either completely inaccessible or only available to proprietary in-house tools or displays. Small research teams do not have the resources to develop tools for the broad dissemination of their unique real-time data and require an easy to use, scalable, cloud-based solution to facilitate this access. CHORDS will make these diverse streams of real-time data available to the broader geosciences community. This talk will highlight a recently developed CHORDS portal tools and processing systems which address some of the gaps in handling real-time data, particularly in the provisioning of data from the "long-tail" scientific community through a simple interface that is deployed in the cloud, is scalable and is able to be customized by research teams. A running portal, with operational data feeds from across the nation, will be presented. The processing within the CHORDS system will expose these real-time streams via standard services from the Open Geospatial Consortium (OGC) in a way that is simple and transparent to the data provider, while maximizing the usage of these investments. The ingestion of high velocity, high volume and diverse data has allowed the project to explore a NoSQL database implementation. Broad use of the CHORDS framework by geoscientists will help to facilitate adaptive experimentation, model assimilation and real-time hypothesis testing.

  2. Wired/wireless access integrated RoF-PON with scalable generation of multi-frequency MMWs enabled by polarization multiplexed FWM in SOA.

    PubMed

    Xiang, Yu; Chen, Chen; Zhang, Chongfu; Qiu, Kun

    2013-01-14

    In this paper, we propose and demonstrate a novel integrated radio-over-fiber passive optical network (RoF-PON) system for both wired and wireless access. By utilizing the polarization multiplexed four-wave mixing (FWM) effect in a semiconductor optical amplifier (SOA), scalable generation of multi-frequency millimeter-waves (MMWs) can be provided so as to assist the configuration of multi-frequency wireless access for the wire/wireless access integrated ROF-PON system. In order to obtain a better performance, the polarization multiplexed FWM effect is investigated in detail. Simulation results successfully verify the feasibility of our proposed scheme.

  3. PM2006: a highly scalable urban planning management information system--Case study: Suzhou Urban Planning Bureau

    NASA Astrophysics Data System (ADS)

    Jing, Changfeng; Liang, Song; Ruan, Yong; Huang, Jie

    2008-10-01

    During the urbanization process, when facing complex requirements of city development, ever-growing urban data, rapid development of planning business and increasing planning complexity, a scalable, extensible urban planning management information system is needed urgently. PM2006 is such a system that can deal with these problems. In response to the status and problems in urban planning, the scalability and extensibility of PM2006 are introduced which can be seen as business-oriented workflow extensibility, scalability of DLL-based architecture, flexibility on platforms of GIS and database, scalability of data updating and maintenance and so on. It is verified that PM2006 system has good extensibility and scalability which can meet the requirements of all levels of administrative divisions and can adapt to ever-growing changes in urban planning business. At the end of this paper, the application of PM2006 in Urban Planning Bureau of Suzhou city is described.

  4. Enhancing UCSF Chimera through web services.

    PubMed

    Huang, Conrad C; Meng, Elaine C; Morris, John H; Pettersen, Eric F; Ferrin, Thomas E

    2014-07-01

    Integrating access to web services with desktop applications allows for an expanded set of application features, including performing computationally intensive tasks and convenient searches of databases. We describe how we have enhanced UCSF Chimera (http://www.rbvi.ucsf.edu/chimera/), a program for the interactive visualization and analysis of molecular structures and related data, through the addition of several web services (http://www.rbvi.ucsf.edu/chimera/docs/webservices.html). By streamlining access to web services, including the entire job submission, monitoring and retrieval process, Chimera makes it simpler for users to focus on their science projects rather than data manipulation. Chimera uses Opal, a toolkit for wrapping scientific applications as web services, to provide scalable and transparent access to several popular software packages. We illustrate Chimera's use of web services with an example workflow that interleaves use of these services with interactive manipulation of molecular sequences and structures, and we provide an example Python program to demonstrate how easily Opal-based web services can be accessed from within an application. Web server availability: http://webservices.rbvi.ucsf.edu/opal2/dashboard?command=serviceList. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  5. New capabilities in the HENP grand challenge storage access systemand its application at RHIC

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bernardo, L.; Gibbard, B.; Malon, D.

    2000-04-25

    The High Energy and Nuclear Physics Data Access GrandChallenge project has developed an optimizing storage access softwaresystem that was prototyped at RHIC. It is currently undergoingintegration with the STAR experiment in preparation for data taking thatstarts in mid-2000. The behavior and lessons learned in the RHIC MockData Challenge exercises are described as well as the observedperformance under conditions designed to characterize scalability. Up to250 simultaneous queries were tested and up to 10 million events across 7event components were involved in these queries. The system coordinatesthe staging of "bundles" of files from the HPSS tape system, so that allthe needed componentsmore » of each event are in disk cache when accessed bythe application software. The caching policy algorithm for thecoordinated bundle staging is described in the paper. The initialprototype implementation interfaced to the Objectivity/DB. In this latestversion, it evolved to work with arbitrary files and use CORBA interfacesto the tag database and file catalog services. The interface to the tagdatabase and the MySQL-based file catalog services used by STAR aredescribed along with the planned usage scenarios.« less

  6. The CEBAF Element Database and Related Operational Software

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Larrieu, Theodore; Slominski, Christopher; Keesee, Marie

    The newly commissioned 12GeV CEBAF accelerator relies on a flexible, scalable and comprehensive database to define the accelerator. This database delivers the configuration for CEBAF operational tools, including hardware checkout, the downloadable optics model, control screens, and much more. The presentation will describe the flexible design of the CEBAF Element Database (CED), its features and assorted use case examples.

  7. IntNetDB v1.0: an integrated protein-protein interaction network database generated by a probabilistic model

    PubMed Central

    Xia, Kai; Dong, Dong; Han, Jing-Dong J

    2006-01-01

    Background Although protein-protein interaction (PPI) networks have been explored by various experimental methods, the maps so built are still limited in coverage and accuracy. To further expand the PPI network and to extract more accurate information from existing maps, studies have been carried out to integrate various types of functional relationship data. A frequently updated database of computationally analyzed potential PPIs to provide biological researchers with rapid and easy access to analyze original data as a biological network is still lacking. Results By applying a probabilistic model, we integrated 27 heterogeneous genomic, proteomic and functional annotation datasets to predict PPI networks in human. In addition to previously studied data types, we show that phenotypic distances and genetic interactions can also be integrated to predict PPIs. We further built an easy-to-use, updatable integrated PPI database, the Integrated Network Database (IntNetDB) online, to provide automatic prediction and visualization of PPI network among genes of interest. The networks can be visualized in SVG (Scalable Vector Graphics) format for zooming in or out. IntNetDB also provides a tool to extract topologically highly connected network neighborhoods from a specific network for further exploration and research. Using the MCODE (Molecular Complex Detections) algorithm, 190 such neighborhoods were detected among all the predicted interactions. The predicted PPIs can also be mapped to worm, fly and mouse interologs. Conclusion IntNetDB includes 180,010 predicted protein-protein interactions among 9,901 human proteins and represents a useful resource for the research community. Our study has increased prediction coverage by five-fold. IntNetDB also provides easy-to-use network visualization and analysis tools that allow biological researchers unfamiliar with computational biology to access and analyze data over the internet. The web interface of IntNetDB is freely accessible at . Visualization requires Mozilla version 1.8 (or higher) or Internet Explorer with installation of SVGviewer. PMID:17112386

  8. Chemistry Modeling for Aerothermodynamics and TPS

    NASA Technical Reports Server (NTRS)

    Wang, Dunyou; Stallcop, James R.; Dateo, Christopher e.; Schwenke, David W.; Halicioglu, Timur; Huo, winifred M.

    2005-01-01

    Recent advances in supercomputers and highly scalable quantum chemistry software render computational chemistry methods a viable means of providing chemistry data for aerothermal analysis at a specific level of confidence. Four examples of first principles quantum chemistry calculations will be presented. Study of the highly nonequilibrium rotational distribution of a nitrogen molecule from the exchange reaction N + N2 illustrates how chemical reactions can influence rotational distribution. The reaction C2H + H2 is one example of a radical reaction that occurs during hypersonic entry into an atmosphere containing methane. A study of the etching of a Si surface illustrates our approach to surface reactions. A recently developed web accessible database and software tool (DDD) that provides the radiation profile of diatomic molecules is also described.

  9. Chemistry Modeling for Aerothermodynamics and TPS

    NASA Technical Reports Server (NTRS)

    Wang, Dun-You; Stallcop, James R.; Dateo, Christopher E.; Schwenke, David W.; Haliciogiu, Timur; Huo, Winifred

    2004-01-01

    Recent advances in supercomputers and highly scalable quantum chemistry software render computational chemistry methods a viable means of providing chemistry data for aerothermal analysis at a specific level of confidence. Four examples of first principles quantum chemistry calculations will be presented. The study of the highly nonequilibrium rotational distribution of nitrogen molecule from the exchange reaction N + N2 illustrates how chemical reactions can influence the rotational distribution. The reaction C2H + H2 is one example of a radical reaction that occurs during hypersonic entry into a methane containing atmosphere. A study of the etching of Si surface illustrates our approach to surface reactions. A recently developed web accessible database and software tool (DDD) that provides the radiation profile of diatomic molecules is also described.

  10. Quantum-secured blockchain

    NASA Astrophysics Data System (ADS)

    Kiktenko, E. O.; Pozhar, N. O.; Anufriev, M. N.; Trushechkin, A. S.; Yunusov, R. R.; Kurochkin, Y. V.; Lvovsky, A. I.; Fedorov, A. K.

    2018-07-01

    Blockchain is a distributed database which is cryptographically protected against malicious modifications. While promising for a wide range of applications, current blockchain platforms rely on digital signatures, which are vulnerable to attacks by means of quantum computers. The same, albeit to a lesser extent, applies to cryptographic hash functions that are used in preparing new blocks, so parties with access to quantum computation would have unfair advantage in procuring mining rewards. Here we propose a possible solution to the quantum era blockchain challenge and report an experimental realization of a quantum-safe blockchain platform that utilizes quantum key distribution across an urban fiber network for information-theoretically secure authentication. These results address important questions about realizability and scalability of quantum-safe blockchains for commercial and governmental applications.

  11. UPM: unified policy-based network management

    NASA Astrophysics Data System (ADS)

    Law, Eddie; Saxena, Achint

    2001-07-01

    Besides providing network management to the Internet, it has become essential to offer different Quality of Service (QoS) to users. Policy-based management provides control on network routers to achieve this goal. The Internet Engineering Task Force (IETF) has proposed a two-tier architecture whose implementation is based on the Common Open Policy Service (COPS) protocol and Lightweight Directory Access Protocol (LDAP). However, there are several limitations to this design such as scalability and cross-vendor hardware compatibility. To address these issues, we present a functionally enhanced multi-tier policy management architecture design in this paper. Several extensions are introduced thereby adding flexibility and scalability. In particular, an intermediate entity between the policy server and policy rule database called the Policy Enforcement Agent (PEA) is introduced. By keeping internal data in a common format, using a standard protocol, and by interpreting and translating request and decision messages from multi-vendor hardware, this agent allows a dynamic Unified Information Model throughout the architecture. We have tailor-made this unique information system to save policy rules in the directory server and allow executions of policy rules with dynamic addition of new equipment during run-time.

  12. From EGEE Operations Portal towards EGI Operations Portal

    NASA Astrophysics Data System (ADS)

    Cordier, Hélène; L'Orphelin, Cyril; Reynaud, Sylvain; Lequeux, Olivier; Loikkanen, Sinikka; Veyre, Pierre

    Grid operators in EGEE have been using a dedicated dashboard as their central operational tool, stable and scalable for the last 5 years despite continuous upgrade from specifications by users, monitoring tools or data providers. In EGEE-III, recent regionalisation of operations led the Operations Portal developers to conceive a standalone instance of this tool. We will see how the dashboard reorganization paved the way for the re-engineering of the portal itself. The outcome is an easily deployable package customized with relevant information sources and specific decentralized operational requirements. This package is composed of a generic and scalable data access mechanism, Lavoisier; a renowned php framework for configuration flexibility, Symfony and a MySQL database. VO life cycle and operational information, EGEE broadcast and Downtime notifications are next for the major reorganization until all other key features of the Operations Portal are migrated to the framework. Features specifications will be sketched at the same time to adapt to EGI requirements and to upgrade. Future work on feature regionalisation, on new advanced features or strategy planning will be tracked in EGI- Inspire through the Operations Tools Advisory Group, OTAG, where all users, customers and third parties of the Operations Portal are represented from January 2010.

  13. CellAtlasSearch: a scalable search engine for single cells.

    PubMed

    Srivastava, Divyanshu; Iyer, Arvind; Kumar, Vibhor; Sengupta, Debarka

    2018-05-21

    Owing to the advent of high throughput single cell transcriptomics, past few years have seen exponential growth in production of gene expression data. Recently efforts have been made by various research groups to homogenize and store single cell expression from a large number of studies. The true value of this ever increasing data deluge can be unlocked by making it searchable. To this end, we propose CellAtlasSearch, a novel search architecture for high dimensional expression data, which is massively parallel as well as light-weight, thus infinitely scalable. In CellAtlasSearch, we use a Graphical Processing Unit (GPU) friendly version of Locality Sensitive Hashing (LSH) for unmatched speedup in data processing and query. Currently, CellAtlasSearch features over 300 000 reference expression profiles including both bulk and single-cell data. It enables the user query individual single cell transcriptomes and finds matching samples from the database along with necessary meta information. CellAtlasSearch aims to assist researchers and clinicians in characterizing unannotated single cells. It also facilitates noise free, low dimensional representation of single-cell expression profiles by projecting them on a wide variety of reference samples. The web-server is accessible at: http://www.cellatlassearch.com.

  14. Scalable Lunar Surface Networks and Adaptive Orbit Access

    NASA Technical Reports Server (NTRS)

    Wang, Xudong

    2015-01-01

    Teranovi Technologies, Inc., has developed innovative network architecture, protocols, and algorithms for both lunar surface and orbit access networks. A key component of the overall architecture is a medium access control (MAC) protocol that includes a novel mechanism of overlaying time division multiple access (TDMA) and carrier sense multiple access with collision avoidance (CSMA/CA), ensuring scalable throughput and quality of service. The new MAC protocol is compatible with legacy Institute of Electrical and Electronics Engineers (IEEE) 802.11 networks. Advanced features include efficiency power management, adaptive channel width adjustment, and error control capability. A hybrid routing protocol combines the advantages of ad hoc on-demand distance vector (AODV) routing and disruption/delay-tolerant network (DTN) routing. Performance is significantly better than AODV or DTN and will be particularly effective for wireless networks with intermittent links, such as lunar and planetary surface networks and orbit access networks.

  15. A scalable healthcare information system based on a service-oriented architecture.

    PubMed

    Yang, Tzu-Hsiang; Sun, Yeali S; Lai, Feipei

    2011-06-01

    Many existing healthcare information systems are composed of a number of heterogeneous systems and face the important issue of system scalability. This paper first describes the comprehensive healthcare information systems used in National Taiwan University Hospital (NTUH) and then presents a service-oriented architecture (SOA)-based healthcare information system (HIS) based on the service standard HL7. The proposed architecture focuses on system scalability, in terms of both hardware and software. Moreover, we describe how scalability is implemented in rightsizing, service groups, databases, and hardware scalability. Although SOA-based systems sometimes display poor performance, through a performance evaluation of our HIS based on SOA, the average response time for outpatient, inpatient, and emergency HL7Central systems are 0.035, 0.04, and 0.036 s, respectively. The outpatient, inpatient, and emergency WebUI average response times are 0.79, 1.25, and 0.82 s. The scalability of the rightsizing project and our evaluation results show that the SOA HIS we propose provides evidence that SOA can provide system scalability and sustainability in a highly demanding healthcare information system.

  16. HBLAST: Parallelised sequence similarity--A Hadoop MapReducable basic local alignment search tool.

    PubMed

    O'Driscoll, Aisling; Belogrudov, Vladislav; Carroll, John; Kropp, Kai; Walsh, Paul; Ghazal, Peter; Sleator, Roy D

    2015-04-01

    The recent exponential growth of genomic databases has resulted in the common task of sequence alignment becoming one of the major bottlenecks in the field of computational biology. It is typical for these large datasets and complex computations to require cost prohibitive High Performance Computing (HPC) to function. As such, parallelised solutions have been proposed but many exhibit scalability limitations and are incapable of effectively processing "Big Data" - the name attributed to datasets that are extremely large, complex and require rapid processing. The Hadoop framework, comprised of distributed storage and a parallelised programming framework known as MapReduce, is specifically designed to work with such datasets but it is not trivial to efficiently redesign and implement bioinformatics algorithms according to this paradigm. The parallelisation strategy of "divide and conquer" for alignment algorithms can be applied to both data sets and input query sequences. However, scalability is still an issue due to memory constraints or large databases, with very large database segmentation leading to additional performance decline. Herein, we present Hadoop Blast (HBlast), a parallelised BLAST algorithm that proposes a flexible method to partition both databases and input query sequences using "virtual partitioning". HBlast presents improved scalability over existing solutions and well balanced computational work load while keeping database segmentation and recompilation to a minimum. Enhanced BLAST search performance on cheap memory constrained hardware has significant implications for in field clinical diagnostic testing; enabling faster and more accurate identification of pathogenic DNA in human blood or tissue samples. Copyright © 2015 Elsevier Inc. All rights reserved.

  17. Linear Time Algorithms to Restrict Insider Access using Multi-Policy Access Control Systems

    PubMed Central

    Mell, Peter; Shook, James; Harang, Richard; Gavrila, Serban

    2017-01-01

    An important way to limit malicious insiders from distributing sensitive information is to as tightly as possible limit their access to information. This has always been the goal of access control mechanisms, but individual approaches have been shown to be inadequate. Ensemble approaches of multiple methods instantiated simultaneously have been shown to more tightly restrict access, but approaches to do so have had limited scalability (resulting in exponential calculations in some cases). In this work, we take the Next Generation Access Control (NGAC) approach standardized by the American National Standards Institute (ANSI) and demonstrate its scalability. The existing publicly available reference implementations all use cubic algorithms and thus NGAC was widely viewed as not scalable. The primary NGAC reference implementation took, for example, several minutes to simply display the set of files accessible to a user on a moderately sized system. In our approach, we take these cubic algorithms and make them linear. We do this by reformulating the set theoretic approach of the NGAC standard into a graph theoretic approach and then apply standard graph algorithms. We thus can answer important access control decision questions (e.g., which files are available to a user and which users can access a file) using linear time graph algorithms. We also provide a default linear time mechanism to visualize and review user access rights for an ensemble of access control mechanisms. Our visualization appears to be a simple file directory hierarchy but in reality is an automatically generated structure abstracted from the underlying access control graph that works with any set of simultaneously instantiated access control policies. It also provide an implicit mechanism for symbolic linking that provides a powerful access capability. Our work thus provides the first efficient implementation of NGAC while enabling user privilege review through a novel visualization approach. This may help transition from concept to reality the idea of using ensembles of simultaneously instantiated access control methodologies, thereby limiting insider threat. PMID:28758045

  18. Efficient and Scalable Cross-Matching of (Very) Large Catalogs

    NASA Astrophysics Data System (ADS)

    Pineau, F.-X.; Boch, T.; Derriere, S.

    2011-07-01

    Whether it be for building multi-wavelength datasets from independent surveys, studying changes in objects luminosities, or detecting moving objects (stellar proper motions, asteroids), cross-catalog matching is a technique widely used in astronomy. The need for efficient, reliable and scalable cross-catalog matching is becoming even more pressing with forthcoming projects which will produce huge catalogs in which astronomers will dig for rare objects, perform statistical analysis and classification, or real-time transients detection. We have developed a formalism and the corresponding technical framework to address the challenge of fast cross-catalog matching. Our formalism supports more than simple nearest-neighbor search, and handles elliptical positional errors. Scalability is improved by partitioning the sky using the HEALPix scheme, and processing independently each sky cell. The use of multi-threaded two-dimensional kd-trees adapted to managing equatorial coordinates enables efficient neighbor search. The whole process can run on a single computer, but could also use clusters of machines to cross-match future very large surveys such as GAIA or LSST in reasonable times. We already achieve performances where the 2MASS (˜470M sources) and SDSS DR7 (˜350M sources) can be matched on a single machine in less than 10 minutes. We aim at providing astronomers with a catalog cross-matching service, available on-line and leveraging on the catalogs present in the VizieR database. This service will allow users both to access pre-computed cross-matches across some very large catalogs, and to run customized cross-matching operations. It will also support VO protocols for synchronous or asynchronous queries.

  19. Conditions Database for the Belle II Experiment

    NASA Astrophysics Data System (ADS)

    Wood, L.; Elsethagen, T.; Schram, M.; Stephan, E.

    2017-10-01

    The Belle II experiment at KEK is preparing for first collisions in 2017. Processing the large amounts of data that will be produced will require conditions data to be readily available to systems worldwide in a fast and efficient manner that is straightforward for both the user and maintainer. The Belle II conditions database was designed with a straightforward goal: make it as easily maintainable as possible. To this end, HEP-specific software tools were avoided as much as possible and industry standard tools used instead. HTTP REST services were selected as the application interface, which provide a high-level interface to users through the use of standard libraries such as curl. The application interface itself is written in Java and runs in an embedded Payara-Micro Java EE application server. Scalability at the application interface is provided by use of Hazelcast, an open source In-Memory Data Grid (IMDG) providing distributed in-memory computing and supporting the creation and clustering of new application interface instances as demand increases. The IMDG provides fast and efficient access to conditions data via in-memory caching.

  20. ATGC transcriptomics: a web-based application to integrate, explore and analyze de novo transcriptomic data.

    PubMed

    Gonzalez, Sergio; Clavijo, Bernardo; Rivarola, Máximo; Moreno, Patricio; Fernandez, Paula; Dopazo, Joaquín; Paniego, Norma

    2017-02-22

    In the last years, applications based on massively parallelized RNA sequencing (RNA-seq) have become valuable approaches for studying non-model species, e.g., without a fully sequenced genome. RNA-seq is a useful tool for detecting novel transcripts and genetic variations and for evaluating differential gene expression by digital measurements. The large and complex datasets resulting from functional genomic experiments represent a challenge in data processing, management, and analysis. This problem is especially significant for small research groups working with non-model species. We developed a web-based application, called ATGC transcriptomics, with a flexible and adaptable interface that allows users to work with new generation sequencing (NGS) transcriptomic analysis results using an ontology-driven database. This new application simplifies data exploration, visualization, and integration for a better comprehension of the results. ATGC transcriptomics provides access to non-expert computer users and small research groups to a scalable storage option and simple data integration, including database administration and management. The software is freely available under the terms of GNU public license at http://atgcinta.sourceforge.net .

  1. Public Access Workstations in the Library: New Trends.

    ERIC Educational Resources Information Center

    Beecher, Henry

    1991-01-01

    Discusses the use of microcomputer-based workstations that are provided for public access in libraries. Criteria for workstations are discussed, including standard hardware, open-design software, scalable interface, and connectivity options for networking; systems that provide full-text access are described; and the need for standards is…

  2. Performances of the PIPER scalable child human body model in accident reconstruction

    PubMed Central

    Giordano, Chiara; Kleiven, Svein

    2017-01-01

    Human body models (HBMs) have the potential to provide significant insights into the pediatric response to impact. This study describes a scalable/posable approach to perform child accident reconstructions using the Position and Personalize Advanced Human Body Models for Injury Prediction (PIPER) scalable child HBM of different ages and in different positions obtained by the PIPER tool. Overall, the PIPER scalable child HBM managed reasonably well to predict the injury severity and location of the children involved in real-life crash scenarios documented in the medical records. The developed methodology and workflow is essential for future work to determine child injury tolerances based on the full Child Advanced Safety Project for European Roads (CASPER) accident reconstruction database. With the workflow presented in this study, the open-source PIPER scalable HBM combined with the PIPER tool is also foreseen to have implications for improved safety designs for a better protection of children in traffic accidents. PMID:29135997

  3. Open, Cross Platform Chemistry Application Unifying Structure Manipulation, External Tools, Databases and Visualization

    DTIC Science & Technology

    2012-11-27

    with powerful analysis tools and an informatics approach leveraging best-of-breed NoSQL databases, in order to store, search and retrieve relevant...dictionaries, and JavaScript also has good support. The MongoDB project[15] was chosen as a scalable NoSQL data store for the cheminfor- matics components

  4. Improving the Scalability of an Exact Approach for Frequent Item Set Hiding

    ERIC Educational Resources Information Center

    LaMacchia, Carolyn

    2013-01-01

    Technological advances have led to the generation of large databases of organizational data recognized as an information-rich, strategic asset for internal analysis and sharing with trading partners. Data mining techniques can discover patterns in large databases including relationships considered strategically relevant to the owner of the data.…

  5. FDT 2.0: Improving scalability of the fuzzy decision tree induction tool - integrating database storage.

    PubMed

    Durham, Erin-Elizabeth A; Yu, Xiaxia; Harrison, Robert W

    2014-12-01

    Effective machine-learning handles large datasets efficiently. One key feature of handling large data is the use of databases such as MySQL. The freeware fuzzy decision tree induction tool, FDT, is a scalable supervised-classification software tool implementing fuzzy decision trees. It is based on an optimized fuzzy ID3 (FID3) algorithm. FDT 2.0 improves upon FDT 1.0 by bridging the gap between data science and data engineering: it combines a robust decisioning tool with data retention for future decisions, so that the tool does not need to be recalibrated from scratch every time a new decision is required. In this paper we briefly review the analytical capabilities of the freeware FDT tool and its major features and functionalities; examples of large biological datasets from HIV, microRNAs and sRNAs are included. This work shows how to integrate fuzzy decision algorithms with modern database technology. In addition, we show that integrating the fuzzy decision tree induction tool with database storage allows for optimal user satisfaction in today's Data Analytics world.

  6. Privacy-Aware Location Database Service for Granular Queries

    NASA Astrophysics Data System (ADS)

    Kiyomoto, Shinsaku; Martin, Keith M.; Fukushima, Kazuhide

    Future mobile markets are expected to increasingly embrace location-based services. This paper presents a new system architecture for location-based services, which consists of a location database and distributed location anonymizers. The service is privacy-aware in the sense that the location database always maintains a degree of anonymity. The location database service permits three different levels of query and can thus be used to implement a wide range of location-based services. Furthermore, the architecture is scalable and employs simple functions that are similar to those found in general database systems.

  7. Space Situational Awareness Data Processing Scalability Utilizing Google Cloud Services

    NASA Astrophysics Data System (ADS)

    Greenly, D.; Duncan, M.; Wysack, J.; Flores, F.

    Space Situational Awareness (SSA) is a fundamental and critical component of current space operations. The term SSA encompasses the awareness, understanding and predictability of all objects in space. As the population of orbital space objects and debris increases, the number of collision avoidance maneuvers grows and prompts the need for accurate and timely process measures. The SSA mission continually evolves to near real-time assessment and analysis demanding the need for higher processing capabilities. By conventional methods, meeting these demands requires the integration of new hardware to keep pace with the growing complexity of maneuver planning algorithms. SpaceNav has implemented a highly scalable architecture that will track satellites and debris by utilizing powerful virtual machines on the Google Cloud Platform. SpaceNav algorithms for processing CDMs outpace conventional means. A robust processing environment for tracking data, collision avoidance maneuvers and various other aspects of SSA can be created and deleted on demand. Migrating SpaceNav tools and algorithms into the Google Cloud Platform will be discussed and the trials and tribulations involved. Information will be shared on how and why certain cloud products were used as well as integration techniques that were implemented. Key items to be presented are: 1.Scientific algorithms and SpaceNav tools integrated into a scalable architecture a) Maneuver Planning b) Parallel Processing c) Monte Carlo Simulations d) Optimization Algorithms e) SW Application Development/Integration into the Google Cloud Platform 2. Compute Engine Processing a) Application Engine Automated Processing b) Performance testing and Performance Scalability c) Cloud MySQL databases and Database Scalability d) Cloud Data Storage e) Redundancy and Availability

  8. Adaptive format conversion for scalable video coding

    NASA Astrophysics Data System (ADS)

    Wan, Wade K.; Lim, Jae S.

    2001-12-01

    The enhancement layer in many scalable coding algorithms is composed of residual coding information. There is another type of information that can be transmitted instead of (or in addition to) residual coding. Since the encoder has access to the original sequence, it can utilize adaptive format conversion (AFC) to generate the enhancement layer and transmit the different format conversion methods as enhancement data. This paper investigates the use of adaptive format conversion information as enhancement data in scalable video coding. Experimental results are shown for a wide range of base layer qualities and enhancement bitrates to determine when AFC can improve video scalability. Since the parameters needed for AFC are small compared to residual coding, AFC can provide video scalability at low enhancement layer bitrates that are not possible with residual coding. In addition, AFC can also be used in addition to residual coding to improve video scalability at higher enhancement layer bitrates. Adaptive format conversion has not been studied in detail, but many scalable applications may benefit from it. An example of an application that AFC is well-suited for is the migration path for digital television where AFC can provide immediate video scalability as well as assist future migrations.

  9. Scalable Active Optical Access Network Using Variable High-Speed PLZT Optical Switch/Splitter

    NASA Astrophysics Data System (ADS)

    Ashizawa, Kunitaka; Sato, Takehiro; Tokuhashi, Kazumasa; Ishii, Daisuke; Okamoto, Satoru; Yamanaka, Naoaki; Oki, Eiji

    This paper proposes a scalable active optical access network using high-speed Plumbum Lanthanum Zirconate Titanate (PLZT) optical switch/splitter. The Active Optical Network, called ActiON, using PLZT switching technology has been presented to increase the number of subscribers and the maximum transmission distance, compared to the Passive Optical Network (PON). ActiON supports the multicast slot allocation realized by running the PLZT switch elements in the splitter mode, which forces the switch to behave as an optical splitter. However, the previous ActiON creates a tradeoff between the network scalability and the power loss experienced by the optical signal to each user. It does not use the optical power efficiently because the optical power is simply divided into 0.5 to 0.5 without considering transmission distance from OLT to each ONU. The proposed network adopts PLZT switch elements in the variable splitter mode, which controls the split ratio of the optical power considering the transmission distance from OLT to each ONU, in addition to PLZT switch elements in existing two modes, the switching mode and the splitter mode. The proposed network introduces the flexible multicast slot allocation according to the transmission distance from OLT to each user and the number of required users using three modes, while keeping the advantages of ActiON, which are to support scalable and secure access services. Numerical results show that the proposed network dramatically reduces the required number of slots and supports high bandwidth efficiency services and extends the coverage of access network, compared to the previous ActiON, and the required computation time for selecting multicast users is less than 30msec, which is acceptable for on-demand broadcast services.

  10. Asteroseismology and the Virtual Observatory

    NASA Astrophysics Data System (ADS)

    Suárez, J. C.

    2010-12-01

    Virtual Observatory is an international project aiming at solving the problem of interoperability among astronomical archives and the scalability in the classical methods of retrieving and analyzing astronomical data in order to deal with huge amounts of datasets. This is being tackled thanks to the standardization of astronomical archives favoring their access in a efficient manner. This project, which is nowadays a reality, is more and more adopted by many fields of Science. In the present paper I will describe the origin of a new era in Stellar Physics whose main role is played by the relationship between asteroseismology and V.O. I will summarize the main concerns of both fields and the current development of VO tools for the development of what we could name as asteroseismology online, in which not only observed datasets are concerned but also the management of model databases.

  11. Scalable Motion Estimation Processor Core for Multimedia System-on-Chip Applications

    NASA Astrophysics Data System (ADS)

    Lai, Yeong-Kang; Hsieh, Tian-En; Chen, Lien-Fei

    2007-04-01

    In this paper, we describe a high-throughput and scalable motion estimation processor architecture for multimedia system-on-chip applications. The number of processing elements (PEs) is scalable according to the variable algorithm parameters and the performance required for different applications. Using the PE rings efficiently and an intelligent memory-interleaving organization, the efficiency of the architecture can be increased. Moreover, using efficient on-chip memories and a data management technique can effectively decrease the power consumption and memory bandwidth. Techniques for reducing the number of interconnections and external memory accesses are also presented. Our results demonstrate that the proposed scalable PE-ringed architecture is a flexible and high-performance processor core in multimedia system-on-chip applications.

  12. Information Security Considerations for Applications Using Apache Accumulo

    DTIC Science & Technology

    2014-09-01

    Distributed File System INSCOM United States Army Intelligence and Security Command JPA Java Persistence API JSON JavaScript Object Notation MAC Mandatory... MySQL [13]. BigTable can process 20 petabytes per day [14]. High degree of scalability on commodity hardware. NoSQL databases do not rely on highly...manipulation in relational databases. NoSQL databases each have a unique programming interface that uses a lower level procedural language (e.g., Java

  13. Enabling High-performance Interactive Geoscience Data Analysis Through Data Placement and Movement Optimization

    NASA Astrophysics Data System (ADS)

    Zhu, F.; Yu, H.; Rilee, M. L.; Kuo, K. S.; Yu, L.; Pan, Y.; Jiang, H.

    2017-12-01

    Since the establishment of data archive centers and the standardization of file formats, scientists are required to search metadata catalogs for data needed and download the data files to their local machines to carry out data analysis. This approach has facilitated data discovery and access for decades, but it inevitably leads to data transfer from data archive centers to scientists' computers through low-bandwidth Internet connections. Data transfer becomes a major performance bottleneck in such an approach. Combined with generally constrained local compute/storage resources, they limit the extent of scientists' studies and deprive them of timely outcomes. Thus, this conventional approach is not scalable with respect to both the volume and variety of geoscience data. A much more viable solution is to couple analysis and storage systems to minimize data transfer. In our study, we compare loosely coupled approaches (exemplified by Spark and Hadoop) and tightly coupled approaches (exemplified by parallel distributed database management systems, e.g., SciDB). In particular, we investigate the optimization of data placement and movement to effectively tackle the variety challenge, and boost the popularization of parallelization to address the volume challenge. Our goal is to enable high-performance interactive analysis for a good portion of geoscience data analysis exercise. We show that tightly coupled approaches can concentrate data traffic between local storage systems and compute units, and thereby optimizing bandwidth utilization to achieve a better throughput. Based on our observations, we develop a geoscience data analysis system that tightly couples analysis engines with storages, which has direct access to the detailed map of data partition locations. Through an innovation data partitioning and distribution scheme, our system has demonstrated scalable and interactive performance in real-world geoscience data analysis applications.

  14. Database and Analytical Tool Development for the Management of Data Derived from US DOE (NETL) Funded Fine Particulate (PM2.5) Research

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Robinson Khosah

    2007-07-31

    Advanced Technology Systems, Inc. (ATS) was contracted by the U. S. Department of Energy's National Energy Technology Laboratory (DOE-NETL) to develop a state-of-the-art, scalable and robust web-accessible database application to manage the extensive data sets resulting from the DOE-NETL-sponsored ambient air monitoring programs in the upper Ohio River valley region. The data management system was designed to include a web-based user interface that will allow easy access to the data by the scientific community, policy- and decision-makers, and other interested stakeholders, while providing detailed information on sampling, analytical and quality control parameters. In addition, the system will provide graphical analyticalmore » tools for displaying, analyzing and interpreting the air quality data. The system will also provide multiple report generation capabilities and easy-to-understand visualization formats that can be utilized by the media and public outreach/educational institutions. The project was conducted in two phases. Phase One included the following tasks: (1) data inventory/benchmarking, including the establishment of an external stakeholder group; (2) development of a data management system; (3) population of the database; (4) development of a web-based data retrieval system, and (5) establishment of an internal quality assurance/quality control system on data management. Phase Two involved the development of a platform for on-line data analysis. Phase Two included the following tasks: (1) development of a sponsor and stakeholder/user website with extensive online analytical tools; (2) development of a public website; (3) incorporation of an extensive online help system into each website; and (4) incorporation of a graphical representation (mapping) system into each website. The project is now technically completed.« less

  15. Benefits Access for College Completion: Innovative Approaches to Meet the Financial Needs of Low-Income Students

    ERIC Educational Resources Information Center

    Center for Postsecondary and Economic Success, 2014

    2014-01-01

    Benefits Access for College Completion (BACC) was designed to help colleges develop new policies that increase low-income students' access to public benefits, easing their financial burden to allow them to finish school and earn postsecondary credentials. Colleges participating in BACC have developed and institutionalized scalable, sustainable…

  16. Enhancing navigation in biomedical databases by community voting and database-driven text classification

    PubMed Central

    Duchrow, Timo; Shtatland, Timur; Guettler, Daniel; Pivovarov, Misha; Kramer, Stefan; Weissleder, Ralph

    2009-01-01

    Background The breadth of biological databases and their information content continues to increase exponentially. Unfortunately, our ability to query such sources is still often suboptimal. Here, we introduce and apply community voting, database-driven text classification, and visual aids as a means to incorporate distributed expert knowledge, to automatically classify database entries and to efficiently retrieve them. Results Using a previously developed peptide database as an example, we compared several machine learning algorithms in their ability to classify abstracts of published literature results into categories relevant to peptide research, such as related or not related to cancer, angiogenesis, molecular imaging, etc. Ensembles of bagged decision trees met the requirements of our application best. No other algorithm consistently performed better in comparative testing. Moreover, we show that the algorithm produces meaningful class probability estimates, which can be used to visualize the confidence of automatic classification during the retrieval process. To allow viewing long lists of search results enriched by automatic classifications, we added a dynamic heat map to the web interface. We take advantage of community knowledge by enabling users to cast votes in Web 2.0 style in order to correct automated classification errors, which triggers reclassification of all entries. We used a novel framework in which the database "drives" the entire vote aggregation and reclassification process to increase speed while conserving computational resources and keeping the method scalable. In our experiments, we simulate community voting by adding various levels of noise to nearly perfectly labelled instances, and show that, under such conditions, classification can be improved significantly. Conclusion Using PepBank as a model database, we show how to build a classification-aided retrieval system that gathers training data from the community, is completely controlled by the database, scales well with concurrent change events, and can be adapted to add text classification capability to other biomedical databases. The system can be accessed at . PMID:19799796

  17. Web tools for predictive toxicology model building.

    PubMed

    Jeliazkova, Nina

    2012-07-01

    The development and use of web tools in chemistry has accumulated more than 15 years of history already. Powered by the advances in the Internet technologies, the current generation of web systems are starting to expand into areas, traditional for desktop applications. The web platforms integrate data storage, cheminformatics and data analysis tools. The ease of use and the collaborative potential of the web is compelling, despite the challenges. The topic of this review is a set of recently published web tools that facilitate predictive toxicology model building. The focus is on software platforms, offering web access to chemical structure-based methods, although some of the frameworks could also provide bioinformatics or hybrid data analysis functionalities. A number of historical and current developments are cited. In order to provide comparable assessment, the following characteristics are considered: support for workflows, descriptor calculations, visualization, modeling algorithms, data management and data sharing capabilities, availability of GUI or programmatic access and implementation details. The success of the Web is largely due to its highly decentralized, yet sufficiently interoperable model for information access. The expected future convergence between cheminformatics and bioinformatics databases provides new challenges toward management and analysis of large data sets. The web tools in predictive toxicology will likely continue to evolve toward the right mix of flexibility, performance, scalability, interoperability, sets of unique features offered, friendly user interfaces, programmatic access for advanced users, platform independence, results reproducibility, curation and crowdsourcing utilities, collaborative sharing and secure access.

  18. Scalable Indoor Localization via Mobile Crowdsourcing and Gaussian Process

    PubMed Central

    Chang, Qiang; Li, Qun; Shi, Zesen; Chen, Wei; Wang, Weiping

    2016-01-01

    Indoor localization using Received Signal Strength Indication (RSSI) fingerprinting has been extensively studied for decades. The positioning accuracy is highly dependent on the density of the signal database. In areas without calibration data, however, this algorithm breaks down. Building and updating a dense signal database is labor intensive, expensive, and even impossible in some areas. Researchers are continually searching for better algorithms to create and update dense databases more efficiently. In this paper, we propose a scalable indoor positioning algorithm that works both in surveyed and unsurveyed areas. We first propose Minimum Inverse Distance (MID) algorithm to build a virtual database with uniformly distributed virtual Reference Points (RP). The area covered by the virtual RPs can be larger than the surveyed area. A Local Gaussian Process (LGP) is then applied to estimate the virtual RPs’ RSSI values based on the crowdsourced training data. Finally, we improve the Bayesian algorithm to estimate the user’s location using the virtual database. All the parameters are optimized by simulations, and the new algorithm is tested on real-case scenarios. The results show that the new algorithm improves the accuracy by 25.5% in the surveyed area, with an average positioning error below 2.2 m for 80% of the cases. Moreover, the proposed algorithm can localize the users in the neighboring unsurveyed area. PMID:26999139

  19. Supporting Social Data Observatory with Customizable Index Structures on HBase - Architecture and Performance

    DTIC Science & Technology

    2013-01-01

    commercial NoSQL database system. The results show that In-dexedHBase provides a data loading speed that is 6 times faster than Riak, and is...compare it with Riak, a widely adopted commercial NoSQL database system. The results show that In- dexedHBase provides a data loading speed that is 6...events. This chapter describes our research towards building an efficient and scalable storage platform for Truthy. Many existing NoSQL databases

  20. WOVOdat, A Worldwide Volcano Unrest Database, to Improve Eruption Forecasts

    NASA Astrophysics Data System (ADS)

    Widiwijayanti, C.; Costa, F.; Win, N. T. Z.; Tan, K.; Newhall, C. G.; Ratdomopurbo, A.

    2015-12-01

    WOVOdat is the World Organization of Volcano Observatories' Database of Volcanic Unrest. An international effort to develop common standards for compiling and storing data on volcanic unrests in a centralized database and freely web-accessible for reference during volcanic crises, comparative studies, and basic research on pre-eruption processes. WOVOdat will be to volcanology as an epidemiological database is to medicine. Despite the large spectrum of monitoring techniques, the interpretation of monitoring data throughout the evolution of the unrest and making timely forecasts remain the most challenging tasks for volcanologists. The field of eruption forecasting is becoming more quantitative, based on the understanding of the pre-eruptive magmatic processes and dynamic interaction between variables that are at play in a volcanic system. Such forecasts must also acknowledge and express the uncertainties, therefore most of current research in this field focused on the application of event tree analysis to reflect multiple possible scenarios and the probability of each scenario. Such forecasts are critically dependent on comprehensive and authoritative global volcano unrest data sets - the very information currently collected in WOVOdat. As the database becomes more complete, Boolean searches, side-by-side digital and thus scalable comparisons of unrest, pattern recognition, will generate reliable results. Statistical distribution obtained from WOVOdat can be then used to estimate the probabilities of each scenario after specific patterns of unrest. We established main web interface for data submission and visualizations, and have now incorporated ~20% of worldwide unrest data into the database, covering more than 100 eruptive episodes. In the upcoming years we will concentrate in acquiring data from volcano observatories develop a robust data query interface, optimizing data mining, and creating tools by which WOVOdat can be used for probabilistic eruption forecasting. The more data in WOVOdat, the more useful it will be.

  1. A Living Laboratory for Energy Systems Integration - Continuum Magazine |

    Science.gov Websites

    research centers across NREL to study how to optimize the campus's energy use. The Energy DataBus The , at second-by-second intervals, 24 hours per day, and stores it all in one giant database. And the use solution that is designed for large, scalable databases. "It's similar to the one that Facebook and

  2. TreeVector: scalable, interactive, phylogenetic trees for the web.

    PubMed

    Pethica, Ralph; Barker, Gary; Kovacs, Tim; Gough, Julian

    2010-01-28

    Phylogenetic trees are complex data forms that need to be graphically displayed to be human-readable. Traditional techniques of plotting phylogenetic trees focus on rendering a single static image, but increases in the production of biological data and large-scale analyses demand scalable, browsable, and interactive trees. We introduce TreeVector, a Scalable Vector Graphics-and Java-based method that allows trees to be integrated and viewed seamlessly in standard web browsers with no extra software required, and can be modified and linked using standard web technologies. There are now many bioinformatics servers and databases with a range of dynamic processes and updates to cope with the increasing volume of data. TreeVector is designed as a framework to integrate with these processes and produce user-customized phylogenies automatically. We also address the strengths of phylogenetic trees as part of a linked-in browsing process rather than an end graphic for print. TreeVector is fast and easy to use and is available to download precompiled, but is also open source. It can also be run from the web server listed below or the user's own web server. It has already been deployed on two recognized and widely used database Web sites.

  3. A Hybrid EAV-Relational Model for Consistent and Scalable Capture of Clinical Research Data.

    PubMed

    Khan, Omar; Lim Choi Keung, Sarah N; Zhao, Lei; Arvanitis, Theodoros N

    2014-01-01

    Many clinical research databases are built for specific purposes and their design is often guided by the requirements of their particular setting. Not only does this lead to issues of interoperability and reusability between research groups in the wider community but, within the project itself, changes and additions to the system could be implemented using an ad hoc approach, which may make the system difficult to maintain and even more difficult to share. In this paper, we outline a hybrid Entity-Attribute-Value and relational model approach for modelling data, in light of frequently changing requirements, which enables the back-end database schema to remain static, improving the extensibility and scalability of an application. The model also facilitates data reuse. The methods used build on the modular architecture previously introduced in the CURe project.

  4. SeqWare Query Engine: storing and searching sequence data in the cloud.

    PubMed

    O'Connor, Brian D; Merriman, Barry; Nelson, Stanley F

    2010-12-21

    Since the introduction of next-generation DNA sequencers the rapid increase in sequencer throughput, and associated drop in costs, has resulted in more than a dozen human genomes being resequenced over the last few years. These efforts are merely a prelude for a future in which genome resequencing will be commonplace for both biomedical research and clinical applications. The dramatic increase in sequencer output strains all facets of computational infrastructure, especially databases and query interfaces. The advent of cloud computing, and a variety of powerful tools designed to process petascale datasets, provide a compelling solution to these ever increasing demands. In this work, we present the SeqWare Query Engine which has been created using modern cloud computing technologies and designed to support databasing information from thousands of genomes. Our backend implementation was built using the highly scalable, NoSQL HBase database from the Hadoop project. We also created a web-based frontend that provides both a programmatic and interactive query interface and integrates with widely used genome browsers and tools. Using the query engine, users can load and query variants (SNVs, indels, translocations, etc) with a rich level of annotations including coverage and functional consequences. As a proof of concept we loaded several whole genome datasets including the U87MG cell line. We also used a glioblastoma multiforme tumor/normal pair to both profile performance and provide an example of using the Hadoop MapReduce framework within the query engine. This software is open source and freely available from the SeqWare project (http://seqware.sourceforge.net). The SeqWare Query Engine provided an easy way to make the U87MG genome accessible to programmers and non-programmers alike. This enabled a faster and more open exploration of results, quicker tuning of parameters for heuristic variant calling filters, and a common data interface to simplify development of analytical tools. The range of data types supported, the ease of querying and integrating with existing tools, and the robust scalability of the underlying cloud-based technologies make SeqWare Query Engine a nature fit for storing and searching ever-growing genome sequence datasets.

  5. SeqWare Query Engine: storing and searching sequence data in the cloud

    PubMed Central

    2010-01-01

    Background Since the introduction of next-generation DNA sequencers the rapid increase in sequencer throughput, and associated drop in costs, has resulted in more than a dozen human genomes being resequenced over the last few years. These efforts are merely a prelude for a future in which genome resequencing will be commonplace for both biomedical research and clinical applications. The dramatic increase in sequencer output strains all facets of computational infrastructure, especially databases and query interfaces. The advent of cloud computing, and a variety of powerful tools designed to process petascale datasets, provide a compelling solution to these ever increasing demands. Results In this work, we present the SeqWare Query Engine which has been created using modern cloud computing technologies and designed to support databasing information from thousands of genomes. Our backend implementation was built using the highly scalable, NoSQL HBase database from the Hadoop project. We also created a web-based frontend that provides both a programmatic and interactive query interface and integrates with widely used genome browsers and tools. Using the query engine, users can load and query variants (SNVs, indels, translocations, etc) with a rich level of annotations including coverage and functional consequences. As a proof of concept we loaded several whole genome datasets including the U87MG cell line. We also used a glioblastoma multiforme tumor/normal pair to both profile performance and provide an example of using the Hadoop MapReduce framework within the query engine. This software is open source and freely available from the SeqWare project (http://seqware.sourceforge.net). Conclusions The SeqWare Query Engine provided an easy way to make the U87MG genome accessible to programmers and non-programmers alike. This enabled a faster and more open exploration of results, quicker tuning of parameters for heuristic variant calling filters, and a common data interface to simplify development of analytical tools. The range of data types supported, the ease of querying and integrating with existing tools, and the robust scalability of the underlying cloud-based technologies make SeqWare Query Engine a nature fit for storing and searching ever-growing genome sequence datasets. PMID:21210981

  6. Comparing NetCDF and SciDB on managing and querying 5D hydrologic dataset

    NASA Astrophysics Data System (ADS)

    Liu, Haicheng; Xiao, Xiao

    2016-11-01

    Efficiently extracting information from high dimensional hydro-meteorological modelling datasets requires smart solutions. Traditional methods are mostly based on files, which can be edited and accessed handily. But they have problems of efficiency due to contiguous storage structure. Others propose databases as an alternative for advantages such as native functionalities for manipulating multidimensional (MD) arrays, smart caching strategy and scalability. In this research, NetCDF file based solutions and the multidimensional array database management system (DBMS) SciDB applying chunked storage structure are benchmarked to determine the best solution for storing and querying 5D large hydrologic modelling dataset. The effect of data storage configurations including chunk size, dimension order and compression on query performance is explored. Results indicate that dimension order to organize storage of 5D data has significant influence on query performance if chunk size is very large. But the effect becomes insignificant when chunk size is properly set. Compression of SciDB mostly has negative influence on query performance. Caching is an advantage but may be influenced by execution of different query processes. On the whole, NetCDF solution without compression is in general more efficient than the SciDB DBMS.

  7. Scientific Data Services -- A High-Performance I/O System with Array Semantics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Kesheng; Byna, Surendra; Rotem, Doron

    2011-09-21

    As high-performance computing approaches exascale, the existing I/O system design is having trouble keeping pace in both performance and scalability. We propose to address this challenge by adopting database principles and techniques in parallel I/O systems. First, we propose to adopt an array data model because many scientific applications represent their data in arrays. This strategy follows a cardinal principle from database research, which separates the logical view from the physical layout of data. This high-level data model gives the underlying implementation more freedom to optimize the physical layout and to choose the most effective way of accessing the data.more » For example, knowing that a set of write operations is working on a single multi-dimensional array makes it possible to keep the subarrays in a log structure during the write operations and reassemble them later into another physical layout as resources permit. While maintaining the high-level view, the storage system could compress the user data to reduce the physical storage requirement, collocate data records that are frequently used together, or replicate data to increase availability and fault-tolerance. Additionally, the system could generate secondary data structures such as database indexes and summary statistics. We expect the proposed Scientific Data Services approach to create a “live” storage system that dynamically adjusts to user demands and evolves with the massively parallel storage hardware.« less

  8. Multi-service small-cell cloud wired/wireless access network based on tunable optical frequency comb

    NASA Astrophysics Data System (ADS)

    Xiang, Yu; Zhou, Kun; Yang, Liu; Pan, Lei; Liao, Zhen-wan; Zhang, Qiang

    2015-11-01

    In this paper, we demonstrate a novel multi-service wired/wireless integrated access architecture of cloud radio access network (C-RAN) based on radio-over-fiber passive optical network (RoF-PON) system, which utilizes scalable multiple- frequency millimeter-wave (MF-MMW) generation based on tunable optical frequency comb (TOFC). In the baseband unit (BBU) pool, the generated optical comb lines are modulated into wired, RoF and WiFi/WiMAX signals, respectively. The multi-frequency RoF signals are generated by beating the optical comb line pairs in the small cell. The WiFi/WiMAX signals are demodulated after passing through the band pass filter (BPF) and band stop filter (BSF), respectively, whereas the wired signal can be received directly. The feasibility and scalability of the proposed multi-service wired/wireless integrated C-RAN are confirmed by the simulations.

  9. Cost Considerations in Cloud Computing

    DTIC Science & Technology

    2014-01-01

    investments. 2. Database Options The potential promise that “ big data ” analytics holds for many enterprise mission areas makes relevant the question of the...development of a range of new distributed file systems and data - bases that have better scalability properties than traditional SQL databases. Hadoop ... data . Many systems exist that extend or supplement Hadoop —such as Apache Accumulo, which provides a highly granular mechanism for managing security

  10. The LSST Data Mining Research Agenda

    NASA Astrophysics Data System (ADS)

    Borne, K.; Becla, J.; Davidson, I.; Szalay, A.; Tyson, J. A.

    2008-12-01

    We describe features of the LSST science database that are amenable to scientific data mining, object classification, outlier identification, anomaly detection, image quality assurance, and survey science validation. The data mining research agenda includes: scalability (at petabytes scales) of existing machine learning and data mining algorithms; development of grid-enabled parallel data mining algorithms; designing a robust system for brokering classifications from the LSST event pipeline (which may produce 10,000 or more event alerts per night) multi-resolution methods for exploration of petascale databases; indexing of multi-attribute multi-dimensional astronomical databases (beyond spatial indexing) for rapid querying of petabyte databases; and more.

  11. SPMBR: a scalable algorithm for mining sequential patterns based on bitmaps

    NASA Astrophysics Data System (ADS)

    Xu, Xiwei; Zhang, Changhai

    2013-12-01

    Now some sequential patterns mining algorithms generate too many candidate sequences, and increase the processing cost of support counting. Therefore, we present an effective and scalable algorithm called SPMBR (Sequential Patterns Mining based on Bitmap Representation) to solve the problem of mining the sequential patterns for large databases. Our method differs from previous related works of mining sequential patterns. The main difference is that the database of sequential patterns is represented by bitmaps, and a simplified bitmap structure is presented firstly. In this paper, First the algorithm generate candidate sequences by SE(Sequence Extension) and IE(Item Extension), and then obtain all frequent sequences by comparing the original bitmap and the extended item bitmap .This method could simplify the problem of mining the sequential patterns and avoid the high processing cost of support counting. Both theories and experiments indicate that the performance of SPMBR is predominant for large transaction databases, the required memory size for storing temporal data is much less during mining process, and all sequential patterns can be mined with feasibility.

  12. Efficient data management tools for the heterogeneous big data warehouse

    NASA Astrophysics Data System (ADS)

    Alekseev, A. A.; Osipova, V. V.; Ivanov, M. A.; Klimentov, A.; Grigorieva, N. V.; Nalamwar, H. S.

    2016-09-01

    The traditional RDBMS has been consistent for the normalized data structures. RDBMS served well for decades, but the technology is not optimal for data processing and analysis in data intensive fields like social networks, oil-gas industry, experiments at the Large Hadron Collider, etc. Several challenges have been raised recently on the scalability of data warehouse like workload against the transactional schema, in particular for the analysis of archived data or the aggregation of data for summary and accounting purposes. The paper evaluates new database technologies like HBase, Cassandra, and MongoDB commonly referred as NoSQL databases for handling messy, varied and large amount of data. The evaluation depends upon the performance, throughput and scalability of the above technologies for several scientific and industrial use-cases. This paper outlines the technologies and architectures needed for processing Big Data, as well as the description of the back-end application that implements data migration from RDBMS to NoSQL data warehouse, NoSQL database organization and how it could be useful for further data analytics.

  13. On-demand virtual optical network access using 100 Gb/s Ethernet.

    PubMed

    Ishida, Osamu; Takamichi, Toru; Arai, Sachine; Kawate, Ryusuke; Toyoda, Hidehiro; Morita, Itsuro; Araki, Soichiro; Ichikawa, Toshiyuki; Hoshida, Takeshi; Murai, Hitoshi

    2011-12-12

    Our Terabit LAN initiatives attempt to enhance the scalability and utilization of lambda resources. This paper describes bandwidth-on-demand virtualized 100GE access to WDM networks on a field fiber test-bed using multi-domain optical-path provisioning. © 2011 Optical Society of America

  14. Towards Direct Manipulation and Remixing of Massive Data: The EarthServer Approach

    NASA Astrophysics Data System (ADS)

    Baumann, P.

    2012-04-01

    Complex analytics on "big data" is one of the core challenges of current Earth science, generating strong requirements for on-demand processing and fil tering of massive data sets. Issues under discussion include flexibility, performance, scalability, and the heterogeneity of the information types invo lved. In other domains, high-level query languages (such as those offered by database systems) have proven successful in the quest for flexible, scalable data access interfaces to massive amounts of data. However, due to the lack of support for many of the Earth science data structures, database systems are only used for registries and catalogs, but not for the bulk of spatio-temporal data. One core information category in this field is given by coverage data. ISO 19123 defines coverages, simplifying, as a representation of a "space-time varying phenomenon". This model can express a large class of Earth science data structures, including rectified and non-rectified rasters, curvilinear grids, point clouds, TINs, general meshes, trajectories, surfaces, and solids. This abstract definition, which is too high-level to establish interoperability, is concretized by the OGC GML 3.2.1 Application Schema for Coverages Standard into an interoperable representation. The OGC Web Coverage Processing Service (WCPS) Standard defines a declarative query language on multi-dimensional raster-type coverages, such as 1D in-situ sensor timeseries, 2D EO imagery, 3D x/y/t image time series and x/y/z geophysical data, 4D x/y/z/t climate and ocean data. Hence, important ingredients for versatile coverage retrieval are given - however, this potential has not been fully unleashed by service architectures up to now. The EU FP7-INFRA project EarthServer, launched in September 2011, aims at enabling standards-based on-demand analytics over the Web for Earth science data based on an integration of W3C XQuery for alphanumeric data and OGC-WCPS for raster data. Ultimately, EarthServer will support all OGC coverage types. The platform used by EarthServer is the rasdaman raster database system. To exploit heterogeneous multi-parallel platforms, automatic request distribution and orchestration is being established. Client toolkits are under development which will allow to quickly compose bespoke interactive clients, ranging from mobile devices over Web clients to high-end immersive virtual reality. The EarthServer platform has been deployed in six large-scale data centres with the aim of setting up Lighthouse Applications addressing all Earth Sciences, including satellite and airborne earth observation as well as use cases from atmosphere, ocean, snow, and ice monitoring, and geology on Earth and Mars. These services, each of which will ultimately host at least 100 TB, will form a peer cloud with distributed query processing for arbitrarily mixing database and in-situ access. With its ability to directly manipulate, analyze and remix massive data, the goal of EarthServer is to lift the data providers' semantic level from data stewardship to service stewardship.

  15. SBROME: a scalable optimization and module matching framework for automated biosystems design.

    PubMed

    Huynh, Linh; Tsoukalas, Athanasios; Köppe, Matthias; Tagkopoulos, Ilias

    2013-05-17

    The development of a scalable framework for biodesign automation is a formidable challenge given the expected increase in part availability and the ever-growing complexity of synthetic circuits. To allow for (a) the use of previously constructed and characterized circuits or modules and (b) the implementation of designs that can scale up to hundreds of nodes, we here propose a divide-and-conquer Synthetic Biology Reusable Optimization Methodology (SBROME). An abstract user-defined circuit is first transformed and matched against a module database that incorporates circuits that have previously been experimentally characterized. Then the resulting circuit is decomposed to subcircuits that are populated with the set of parts that best approximate the desired function. Finally, all subcircuits are subsequently characterized and deposited back to the module database for future reuse. We successfully applied SBROME toward two alternative designs of a modular 3-input multiplexer that utilize pre-existing logic gates and characterized biological parts.

  16. PIQMIe: a web server for semi-quantitative proteomics data management and analysis

    PubMed Central

    Kuzniar, Arnold; Kanaar, Roland

    2014-01-01

    We present the Proteomics Identifications and Quantitations Data Management and Integration Service or PIQMIe that aids in reliable and scalable data management, analysis and visualization of semi-quantitative mass spectrometry based proteomics experiments. PIQMIe readily integrates peptide and (non-redundant) protein identifications and quantitations from multiple experiments with additional biological information on the protein entries, and makes the linked data available in the form of a light-weight relational database, which enables dedicated data analyses (e.g. in R) and user-driven queries. Using the web interface, users are presented with a concise summary of their proteomics experiments in numerical and graphical forms, as well as with a searchable protein grid and interactive visualization tools to aid in the rapid assessment of the experiments and in the identification of proteins of interest. The web server not only provides data access through a web interface but also supports programmatic access through RESTful web service. The web server is available at http://piqmie.semiqprot-emc.cloudlet.sara.nl or http://www.bioinformatics.nl/piqmie. This website is free and open to all users and there is no login requirement. PMID:24861615

  17. PIQMIe: a web server for semi-quantitative proteomics data management and analysis.

    PubMed

    Kuzniar, Arnold; Kanaar, Roland

    2014-07-01

    We present the Proteomics Identifications and Quantitations Data Management and Integration Service or PIQMIe that aids in reliable and scalable data management, analysis and visualization of semi-quantitative mass spectrometry based proteomics experiments. PIQMIe readily integrates peptide and (non-redundant) protein identifications and quantitations from multiple experiments with additional biological information on the protein entries, and makes the linked data available in the form of a light-weight relational database, which enables dedicated data analyses (e.g. in R) and user-driven queries. Using the web interface, users are presented with a concise summary of their proteomics experiments in numerical and graphical forms, as well as with a searchable protein grid and interactive visualization tools to aid in the rapid assessment of the experiments and in the identification of proteins of interest. The web server not only provides data access through a web interface but also supports programmatic access through RESTful web service. The web server is available at http://piqmie.semiqprot-emc.cloudlet.sara.nl or http://www.bioinformatics.nl/piqmie. This website is free and open to all users and there is no login requirement. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  18. Scalability enhancement of AODV using local link repairing

    NASA Astrophysics Data System (ADS)

    Jain, Jyoti; Gupta, Roopam; Bandhopadhyay, T. K.

    2014-09-01

    Dynamic change in the topology of an ad hoc network makes it difficult to design an efficient routing protocol. Scalability of an ad hoc network is also one of the important criteria of research in this field. Most of the research works in ad hoc network focus on routing and medium access protocols and produce simulation results for limited-size networks. Ad hoc on-demand distance vector (AODV) is one of the best reactive routing protocols. In this article, modified routing protocols based on local link repairing of AODV are proposed. Method of finding alternate routes for next-to-next node is proposed in case of link failure. These protocols are beacon-less, means periodic hello message is removed from the basic AODV to improve scalability. Few control packet formats have been changed to accommodate suggested modification. Proposed protocols are simulated to investigate scalability performance and compared with basic AODV protocol. This also proves that local link repairing of proposed protocol improves scalability of the network. From simulation results, it is clear that scalability performance of routing protocol is improved because of link repairing method. We have tested protocols for different terrain area with approximate constant node densities and different traffic load.

  19. Research on high availability architecture of SQL and NoSQL

    NASA Astrophysics Data System (ADS)

    Wang, Zhiguo; Wei, Zhiqiang; Liu, Hao

    2017-03-01

    With the advent of the era of big data, amount and importance of data have increased dramatically. SQL database develops in performance and scalability, but more and more companies tend to use NoSQL database as their databases, because NoSQL database has simpler data model and stronger extension capacity than SQL database. Almost all database designers including SQL database and NoSQL database aim to improve performance and ensure availability by reasonable architecture which can reduce the effects of software failures and hardware failures, so that they can provide better experiences for their customers. In this paper, I mainly discuss the architectures of MySQL, MongoDB, and Redis, which are high available and have been deployed in practical application environment, and design a hybrid architecture.

  20. Cheetah: A Framework for Scalable Hierarchical Collective Operations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Graham, Richard L; Gorentla Venkata, Manjunath; Ladd, Joshua S

    2011-01-01

    Collective communication operations, used by many scientific applications, tend to limit overall parallel application performance and scalability. Computer systems are becoming more heterogeneous with increasing node and core-per-node counts. Also, a growing number of data-access mechanisms, of varying characteristics, are supported within a single computer system. We describe a new hierarchical collective communication framework that takes advantage of hardware-specific data-access mechanisms. It is flexible, with run-time hierarchy specification, and sharing of collective communication primitives between collective algorithms. Data buffers are shared between levels in the hierarchy reducing collective communication management overhead. We have implemented several versions of the Message Passingmore » Interface (MPI) collective operations, MPI Barrier() and MPI Bcast(), and run experiments using up to 49, 152 processes on a Cray XT5, and a small InfiniBand based cluster. At 49, 152 processes our barrier implementation outperforms the optimized native implementation by 75%. 32 Byte and one Mega-Byte broadcasts outperform it by 62% and 11%, respectively, with better scalability characteristics. Improvements relative to the default Open MPI implementation are much larger.« less

  1. A scalable and continuous-upgradable optical wireless and wired convergent access network.

    PubMed

    Sung, J Y; Cheng, K T; Chow, C W; Yeh, C H; Pan, C-L

    2014-06-02

    In this work, a scalable and continuous upgradable convergent optical access network is proposed. By using a multi-wavelength coherent comb source and a programmable waveshaper at the central office (CO), optical millimeter-wave (mm-wave) signals of different frequencies (from baseband to > 100 GHz) can be generated. Hence, it provides a scalable and continuous upgradable solution for end-user who needs 60 GHz wireless services now and > 100 GHz wireless services in the future. During the upgrade, user only needs to upgrade their optical networking unit (ONU). A programmable waveshaper is used to select the suitable optical tones with wavelength separation equals to the desired mm-wave frequency; while the CO remains intact. The centralized characteristics of the proposed system can easily add any new service and end-user. The centralized control of the wavelength makes the system more stable. Wired data rate of 17.45 Gb/s and w-band wireless data rate up to 3.36 Gb/s were demonstrated after transmission over 40 km of single-mode fiber (SMF).

  2. Implementation of a Big Data Accessing and Processing Platform for Medical Records in Cloud.

    PubMed

    Yang, Chao-Tung; Liu, Jung-Chun; Chen, Shuo-Tsung; Lu, Hsin-Wen

    2017-08-18

    Big Data analysis has become a key factor of being innovative and competitive. Along with population growth worldwide and the trend aging of population in developed countries, the rate of the national medical care usage has been increasing. Due to the fact that individual medical data are usually scattered in different institutions and their data formats are varied, to integrate those data that continue increasing is challenging. In order to have scalable load capacity for these data platforms, we must build them in good platform architecture. Some issues must be considered in order to use the cloud computing to quickly integrate big medical data into database for easy analyzing, searching, and filtering big data to obtain valuable information.This work builds a cloud storage system with HBase of Hadoop for storing and analyzing big data of medical records and improves the performance of importing data into database. The data of medical records are stored in HBase database platform for big data analysis. This system performs distributed computing on medical records data processing through Hadoop MapReduce programming, and to provide functions, including keyword search, data filtering, and basic statistics for HBase database. This system uses the Put with the single-threaded method and the CompleteBulkload mechanism to import medical data. From the experimental results, we find that when the file size is less than 300MB, the Put with single-threaded method is used and when the file size is larger than 300MB, the CompleteBulkload mechanism is used to improve the performance of data import into database. This system provides a web interface that allows users to search data, filter out meaningful information through the web, and analyze and convert data in suitable forms that will be helpful for medical staff and institutions.

  3. The EarthServer project: Exploiting Identity Federations, Science Gateways and Social and Mobile Clients for Big Earth Data Analysis

    NASA Astrophysics Data System (ADS)

    Barbera, Roberto; Bruno, Riccardo; Calanducci, Antonio; Messina, Antonio; Pappalardo, Marco; Passaro, Gianluca

    2013-04-01

    The EarthServer project (www.earthserver.eu), funded by the European Commission under its Seventh Framework Program, aims at establishing open access and ad-hoc analytics on extreme-size Earth Science data, based on and extending leading-edge Array Database technology. The core idea is to use database query languages as client/server interface to achieve barrier-free "mix & match" access to multi-source, any-size, multi-dimensional space-time data -- in short: "Big Earth Data Analytics" - based on the open standards of the Open Geospatial Consortium Web Coverage Processing Service (OGC WCPS) and the W3C XQuery. EarthServer combines both, thereby achieving a tight data/metadata integration. Further, the rasdaman Array Database System (www.rasdaman.com) is extended with further space-time coverage data types. On server side, highly effective optimizations - such as parallel and distributed query processing - ensure scalability to Exabyte volumes. Six Lighthouse Applications are being established in EarthServer, each of which poses distinct challenges on Earth Data Analytics: Cryospheric Science, Airborne Science, Atmospheric Science, Geology, Oceanography, and Planetary Science. Altogether, they cover all Earth Science domains; the Planetary Science use case has been added to challenge concepts and standards in non-standard environments. In addition, EarthLook (maintained by Jacobs University) showcases use of OGC standards in 1D through 5D use cases. In this contribution we will report on the first applications integrated in the EarthServer Science Gateway and on the clients for mobile appliances developed to access them. We will also show how federated and social identity services can allow Big Earth Data Providers to expose their data in a distributed environment keeping a strict and fine-grained control on user authentication and authorisation. The degree of fulfilment of the EarthServer implementation with the recommendations made in the recent TERENA Study on AAA Platforms For Scientific Resources in Europe (https://confluence.terena.org/display/aaastudy/AAA+Study+Home+Page) will also be assessed.

  4. A Flexible Online Metadata Editing and Management System

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aguilar, Raul; Pan, Jerry Yun; Gries, Corinna

    2010-01-01

    A metadata editing and management system is being developed employing state of the art XML technologies. A modular and distributed design was chosen for scalability, flexibility, options for customizations, and the possibility to add more functionality at a later stage. The system consists of a desktop design tool or schema walker used to generate code for the actual online editor, a native XML database, and an online user access management application. The design tool is a Java Swing application that reads an XML schema, provides the designer with options to combine input fields into online forms and give the fieldsmore » user friendly tags. Based on design decisions, the tool generates code for the online metadata editor. The code generated is an implementation of the XForms standard using the Orbeon Framework. The design tool fulfills two requirements: First, data entry forms based on one schema may be customized at design time and second data entry applications may be generated for any valid XML schema without relying on custom information in the schema. However, the customized information generated at design time is saved in a configuration file which may be re-used and changed again in the design tool. Future developments will add functionality to the design tool to integrate help text, tool tips, project specific keyword lists, and thesaurus services. Additional styling of the finished editor is accomplished via cascading style sheets which may be further customized and different look-and-feels may be accumulated through the community process. The customized editor produces XML files in compliance with the original schema, however, data from the current page is saved into a native XML database whenever the user moves to the next screen or pushes the save button independently of validity. Currently the system uses the open source XML database eXist for storage and management, which comes with third party online and desktop management tools. However, access to metadata files in the application introduced here is managed in a custom online module, using a MySQL backend accessed by a simple Java Server Faces front end. A flexible system with three grouping options, organization, group and single editing access is provided. Three levels were chosen to distribute administrative responsibilities and handle the common situation of an information manager entering the bulk of the metadata but leave specifics to the actual data provider.« less

  5. Scalable Self-Propagating High-Temperature Synthesis of Graphene for Supercapacitors with Superior Power Density and Cyclic Stability.

    PubMed

    Li, Chen; Zhang, Xiong; Wang, Kai; Sun, Xianzhong; Liu, Guanghua; Li, Jiangtao; Tian, Huanfang; Li, Jianqi; Ma, Yanwei

    2017-02-01

    An ultrafast self-propagating high-temperature synthesis technique offers scalable routes for the fabrication of mesoporous graphene directly from CO 2 . Due to the excellent electrical conductivity and high ion-accessible surface area, supercapacitor electrodes based on the obtained graphene exhibit superior energy and power performance. The capacitance retention is higher than 90% after one million charge/discharge cycles. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  6. Distributed cyberinfrastructure tools for automated data processing of structural monitoring data

    NASA Astrophysics Data System (ADS)

    Zhang, Yilan; Kurata, Masahiro; Lynch, Jerome P.; van der Linden, Gwendolyn; Sederat, Hassan; Prakash, Atul

    2012-04-01

    The emergence of cost-effective sensing technologies has now enabled the use of dense arrays of sensors to monitor the behavior and condition of large-scale bridges. The continuous operation of dense networks of sensors presents a number of new challenges including how to manage such massive amounts of data that can be created by the system. This paper reports on the progress of the creation of cyberinfrastructure tools which hierarchically control networks of wireless sensors deployed in a long-span bridge. The internet-enabled cyberinfrastructure is centrally managed by a powerful database which controls the flow of data in the entire monitoring system architecture. A client-server model built upon the database provides both data-provider and system end-users with secured access to various levels of information of a bridge. In the system, information on bridge behavior (e.g., acceleration, strain, displacement) and environmental condition (e.g., wind speed, wind direction, temperature, humidity) are uploaded to the database from sensor networks installed in the bridge. Then, data interrogation services interface with the database via client APIs to autonomously process data. The current research effort focuses on an assessment of the scalability and long-term robustness of the proposed cyberinfrastructure framework that has been implemented along with a permanent wireless monitoring system on the New Carquinez (Alfred Zampa Memorial) Suspension Bridge in Vallejo, CA. Many data interrogation tools are under development using sensor data and bridge metadata (e.g., geometric details, material properties, etc.) Sample data interrogation clients including those for the detection of faulty sensors, automated modal parameter extraction.

  7. A comprehensive and scalable database search system for metaproteomics.

    PubMed

    Chatterjee, Sandip; Stupp, Gregory S; Park, Sung Kyu Robin; Ducom, Jean-Christophe; Yates, John R; Su, Andrew I; Wolan, Dennis W

    2016-08-16

    Mass spectrometry-based shotgun proteomics experiments rely on accurate matching of experimental spectra against a database of protein sequences. Existing computational analysis methods are limited in the size of their sequence databases, which severely restricts the proteomic sequencing depth and functional analysis of highly complex samples. The growing amount of public high-throughput sequencing data will only exacerbate this problem. We designed a broadly applicable metaproteomic analysis method (ComPIL) that addresses protein database size limitations. Our approach to overcome this significant limitation in metaproteomics was to design a scalable set of sequence databases assembled for optimal library querying speeds. ComPIL was integrated with a modified version of the search engine ProLuCID (termed "Blazmass") to permit rapid matching of experimental spectra. Proof-of-principle analysis of human HEK293 lysate with a ComPIL database derived from high-quality genomic libraries was able to detect nearly all of the same peptides as a search with a human database (~500x fewer peptides in the database), with a small reduction in sensitivity. We were also able to detect proteins from the adenovirus used to immortalize these cells. We applied our method to a set of healthy human gut microbiome proteomic samples and showed a substantial increase in the number of identified peptides and proteins compared to previous metaproteomic analyses, while retaining a high degree of protein identification accuracy and allowing for a more in-depth characterization of the functional landscape of the samples. The combination of ComPIL with Blazmass allows proteomic searches to be performed with database sizes much larger than previously possible. These large database searches can be applied to complex meta-samples with unknown composition or proteomic samples where unexpected proteins may be identified. The protein database, proteomic search engine, and the proteomic data files for the 5 microbiome samples characterized and discussed herein are open source and available for use and additional analysis.

  8. A Fast Approximate Algorithm for Mapping Long Reads to Large Reference Databases.

    PubMed

    Jain, Chirag; Dilthey, Alexander; Koren, Sergey; Aluru, Srinivas; Phillippy, Adam M

    2018-04-30

    Emerging single-molecule sequencing technologies from Pacific Biosciences and Oxford Nanopore have revived interest in long-read mapping algorithms. Alignment-based seed-and-extend methods demonstrate good accuracy, but face limited scalability, while faster alignment-free methods typically trade decreased precision for efficiency. In this article, we combine a fast approximate read mapping algorithm based on minimizers with a novel MinHash identity estimation technique to achieve both scalability and precision. In contrast to prior methods, we develop a mathematical framework that defines the types of mapping targets we uncover, establish probabilistic estimates of p-value and sensitivity, and demonstrate tolerance for alignment error rates up to 20%. With this framework, our algorithm automatically adapts to different minimum length and identity requirements and provides both positional and identity estimates for each mapping reported. For mapping human PacBio reads to the hg38 reference, our method is 290 × faster than Burrows-Wheeler Aligner-MEM with a lower memory footprint and recall rate of 96%. We further demonstrate the scalability of our method by mapping noisy PacBio reads (each ≥5 kbp in length) to the complete NCBI RefSeq database containing 838 Gbp of sequence and >60,000 genomes.

  9. A review of accessibility of administrative healthcare databases in the Asia-Pacific region.

    PubMed

    Milea, Dominique; Azmi, Soraya; Reginald, Praveen; Verpillat, Patrice; Francois, Clement

    2015-01-01

    We describe and compare the availability and accessibility of administrative healthcare databases (AHDB) in several Asia-Pacific countries: Australia, Japan, South Korea, Taiwan, Singapore, China, Thailand, and Malaysia. The study included hospital records, reimbursement databases, prescription databases, and data linkages. Databases were first identified through PubMed, Google Scholar, and the ISPOR database register. Database custodians were contacted. Six criteria were used to assess the databases and provided the basis for a tool to categorise databases into seven levels ranging from least accessible (Level 1) to most accessible (Level 7). We also categorised overall data accessibility for each country as high, medium, or low based on accessibility of databases as well as the number of academic articles published using the databases. Fifty-four administrative databases were identified. Only a limited number of databases allowed access to raw data and were at Level 7 [Medical Data Vision EBM Provider, Japan Medical Data Centre (JMDC) Claims database and Nihon-Chouzai Pharmacy Claims database in Japan, and Medicare, Pharmaceutical Benefits Scheme (PBS), Centre for Health Record Linkage (CHeReL), HealthLinQ, Victorian Data Linkages (VDL), SA-NT DataLink in Australia]. At Levels 3-6 were several databases from Japan [Hamamatsu Medical University Database, Medi-Trend, Nihon University School of Medicine Clinical Data Warehouse (NUSM)], Australia [Western Australia Data Linkage (WADL)], Taiwan [National Health Insurance Research Database (NHIRD)], South Korea [Health Insurance Review and Assessment Service (HIRA)], and Malaysia [United Nations University (UNU)-Casemix]. Countries were categorised as having a high level of data accessibility (Australia, Taiwan, and Japan), medium level of accessibility (South Korea), or a low level of accessibility (Thailand, China, Malaysia, and Singapore). In some countries, data may be available but accessibility was restricted based on requirements by data custodians. Compared with previous research, this study describes the landscape of databases in the selected countries with more granularity using an assessment tool developed for this purpose. A high number of databases were identified but most had restricted access, preventing their potential use to support research. We hope that this study helps to improve the understanding of the AHDB landscape, increase data sharing and database research in Asia-Pacific countries.

  10. MADGE: scalable distributed data management software for cDNA microarrays.

    PubMed

    McIndoe, Richard A; Lanzen, Aaron; Hurtz, Kimberly

    2003-01-01

    The human genome project and the development of new high-throughput technologies have created unparalleled opportunities to study the mechanism of diseases, monitor the disease progression and evaluate effective therapies. Gene expression profiling is a critical tool to accomplish these goals. The use of nucleic acid microarrays to assess the gene expression of thousands of genes simultaneously has seen phenomenal growth over the past five years. Although commercial sources of microarrays exist, investigators wanting more flexibility in the genes represented on the array will turn to in-house production. The creation and use of cDNA microarrays is a complicated process that generates an enormous amount of information. Effective data management of this information is essential to efficiently access, analyze, troubleshoot and evaluate the microarray experiments. We have developed a distributable software package designed to track and store the various pieces of data generated by a cDNA microarray facility. This includes the clone collection storage data, annotation data, workflow queues, microarray data, data repositories, sample submission information, and project/investigator information. This application was designed using a 3-tier client server model. The data access layer (1st tier) contains the relational database system tuned to support a large number of transactions. The data services layer (2nd tier) is a distributed COM server with full database transaction support. The application layer (3rd tier) is an internet based user interface that contains both client and server side code for dynamic interactions with the user. This software is freely available to academic institutions and non-profit organizations at http://www.genomics.mcg.edu/niddkbtc.

  11. Web Application Software for Ground Operations Planning Database (GOPDb) Management

    NASA Technical Reports Server (NTRS)

    Lanham, Clifton; Kallner, Shawn; Gernand, Jeffrey

    2013-01-01

    A Web application facilitates collaborative development of the ground operations planning document. This will reduce costs and development time for new programs by incorporating the data governance, access control, and revision tracking of the ground operations planning data. Ground Operations Planning requires the creation and maintenance of detailed timelines and documentation. The GOPDb Web application was created using state-of-the-art Web 2.0 technologies, and was deployed as SaaS (Software as a Service), with an emphasis on data governance and security needs. Application access is managed using two-factor authentication, with data write permissions tied to user roles and responsibilities. Multiple instances of the application can be deployed on a Web server to meet the robust needs for multiple, future programs with minimal additional cost. This innovation features high availability and scalability, with no additional software that needs to be bought or installed. For data governance and security (data quality, management, business process management, and risk management for data handling), the software uses NAMS. No local copy/cloning of data is permitted. Data change log/tracking is addressed, as well as collaboration, work flow, and process standardization. The software provides on-line documentation and detailed Web-based help. There are multiple ways that this software can be deployed on a Web server to meet ground operations planning needs for future programs. The software could be used to support commercial crew ground operations planning, as well as commercial payload/satellite ground operations planning. The application source code and database schema are owned by NASA.

  12. Cloud-Based Computational Tools for Earth Science Applications

    NASA Astrophysics Data System (ADS)

    Arendt, A. A.; Fatland, R.; Howe, B.

    2015-12-01

    Earth scientists are increasingly required to think across disciplines and utilize a wide range of datasets in order to solve complex environmental challenges. Although significant progress has been made in distributing data, researchers must still invest heavily in developing computational tools to accommodate their specific domain. Here we document our development of lightweight computational data systems aimed at enabling rapid data distribution, analytics and problem solving tools for Earth science applications. Our goal is for these systems to be easily deployable, scalable and flexible to accommodate new research directions. As an example we describe "Ice2Ocean", a software system aimed at predicting runoff from snow and ice in the Gulf of Alaska region. Our backend components include relational database software to handle tabular and vector datasets, Python tools (NumPy, pandas and xray) for rapid querying of gridded climate data, and an energy and mass balance hydrological simulation model (SnowModel). These components are hosted in a cloud environment for direct access across research teams, and can also be accessed via API web services using a REST interface. This API is a vital component of our system architecture, as it enables quick integration of our analytical tools across disciplines, and can be accessed by any existing data distribution centers. We will showcase several data integration and visualization examples to illustrate how our system has expanded our ability to conduct cross-disciplinary research.

  13. Wireless Computing Architecture III

    DTIC Science & Technology

    2013-09-01

    MIMO Multiple-Input and Multiple-Output MIMO /CON MIMO with concurrent hannel access and estimation MU- MIMO Multiuser MIMO OFDM Orthogonal...compressive sensing \\; a design for concurrent channel estimation in scalable multiuser MIMO networking; and novel networking protocols based on machine...Network, Antenna Arrays, UAV networking, Angle of Arrival, Localization MIMO , Access Point, Channel State Information, Compressive Sensing 16

  14. A Stateful Multicast Access Control Mechanism for Future Metro-Area-Networks.

    ERIC Educational Resources Information Center

    Sun, Wei-qiang; Li, Jin-sheng; Hong, Pei-lin

    2003-01-01

    Multicasting is a necessity for a broadband metro-area-network; however security problems exist with current multicast protocols. A stateful multicast access control mechanism, based on MAPE, is proposed. The architecture of MAPE is discussed, as well as the states maintained and messages exchanged. The scheme is flexible and scalable. (Author/AEF)

  15. NPTool: Towards Scalability and Reliability of Business Process Management

    NASA Astrophysics Data System (ADS)

    Braghetto, Kelly Rosa; Ferreira, João Eduardo; Pu, Calton

    Currently one important challenge in business process management is provide at the same time scalability and reliability of business process executions. This difficulty becomes more accentuated when the execution control assumes complex countless business processes. This work presents NavigationPlanTool (NPTool), a tool to control the execution of business processes. NPTool is supported by Navigation Plan Definition Language (NPDL), a language for business processes specification that uses process algebra as formal foundation. NPTool implements the NPDL language as a SQL extension. The main contribution of this paper is a description of the NPTool showing how the process algebra features combined with a relational database model can be used to provide a scalable and reliable control in the execution of business processes. The next steps of NPTool include reuse of control-flow patterns and support to data flow management.

  16. Reflective Database Access Control

    ERIC Educational Resources Information Center

    Olson, Lars E.

    2009-01-01

    "Reflective Database Access Control" (RDBAC) is a model in which a database privilege is expressed as a database query itself, rather than as a static privilege contained in an access control list. RDBAC aids the management of database access controls by improving the expressiveness of policies. However, such policies introduce new interactions…

  17. Database Access Systems.

    ERIC Educational Resources Information Center

    Dalrymple, Prudence W.; Roderer, Nancy K.

    1994-01-01

    Highlights the changes that have occurred from 1987-93 in database access systems. Topics addressed include types of databases, including CD-ROMs; enduser interface; database selection; database access management, including library instruction and use of primary literature; economic issues; database users; the search process; and improving…

  18. VIEWCACHE: An incremental pointer-based access method for autonomous interoperable databases

    NASA Technical Reports Server (NTRS)

    Roussopoulos, N.; Sellis, Timos

    1992-01-01

    One of biggest problems facing NASA today is to provide scientists efficient access to a large number of distributed databases. Our pointer-based incremental database access method, VIEWCACHE, provides such an interface for accessing distributed data sets and directories. VIEWCACHE allows database browsing and search performing inter-database cross-referencing with no actual data movement between database sites. This organization and processing is especially suitable for managing Astrophysics databases which are physically distributed all over the world. Once the search is complete, the set of collected pointers pointing to the desired data are cached. VIEWCACHE includes spatial access methods for accessing image data sets, which provide much easier query formulation by referring directly to the image and very efficient search for objects contained within a two-dimensional window. We will develop and optimize a VIEWCACHE External Gateway Access to database management systems to facilitate distributed database search.

  19. BioBenchmark Toyama 2012: an evaluation of the performance of triple stores on biological data

    PubMed Central

    2014-01-01

    Background Biological databases vary enormously in size and data complexity, from small databases that contain a few million Resource Description Framework (RDF) triples to large databases that contain billions of triples. In this paper, we evaluate whether RDF native stores can be used to meet the needs of a biological database provider. Prior evaluations have used synthetic data with a limited database size. For example, the largest BSBM benchmark uses 1 billion synthetic e-commerce knowledge RDF triples on a single node. However, real world biological data differs from the simple synthetic data much. It is difficult to determine whether the synthetic e-commerce data is efficient enough to represent biological databases. Therefore, for this evaluation, we used five real data sets from biological databases. Results We evaluated five triple stores, 4store, Bigdata, Mulgara, Virtuoso, and OWLIM-SE, with five biological data sets, Cell Cycle Ontology, Allie, PDBj, UniProt, and DDBJ, ranging in size from approximately 10 million to 8 billion triples. For each database, we loaded all the data into our single node and prepared the database for use in a classical data warehouse scenario. Then, we ran a series of SPARQL queries against each endpoint and recorded the execution time and the accuracy of the query response. Conclusions Our paper shows that with appropriate configuration Virtuoso and OWLIM-SE can satisfy the basic requirements to load and query biological data less than 8 billion or so on a single node, for the simultaneous access of 64 clients. OWLIM-SE performs best for databases with approximately 11 million triples; For data sets that contain 94 million and 590 million triples, OWLIM-SE and Virtuoso perform best. They do not show overwhelming advantage over each other; For data over 4 billion Virtuoso works best. 4store performs well on small data sets with limited features when the number of triples is less than 100 million, and our test shows its scalability is poor; Bigdata demonstrates average performance and is a good open source triple store for middle-sized (500 million or so) data set; Mulgara shows a little of fragility. PMID:25089180

  20. BioBenchmark Toyama 2012: an evaluation of the performance of triple stores on biological data.

    PubMed

    Wu, Hongyan; Fujiwara, Toyofumi; Yamamoto, Yasunori; Bolleman, Jerven; Yamaguchi, Atsuko

    2014-01-01

    Biological databases vary enormously in size and data complexity, from small databases that contain a few million Resource Description Framework (RDF) triples to large databases that contain billions of triples. In this paper, we evaluate whether RDF native stores can be used to meet the needs of a biological database provider. Prior evaluations have used synthetic data with a limited database size. For example, the largest BSBM benchmark uses 1 billion synthetic e-commerce knowledge RDF triples on a single node. However, real world biological data differs from the simple synthetic data much. It is difficult to determine whether the synthetic e-commerce data is efficient enough to represent biological databases. Therefore, for this evaluation, we used five real data sets from biological databases. We evaluated five triple stores, 4store, Bigdata, Mulgara, Virtuoso, and OWLIM-SE, with five biological data sets, Cell Cycle Ontology, Allie, PDBj, UniProt, and DDBJ, ranging in size from approximately 10 million to 8 billion triples. For each database, we loaded all the data into our single node and prepared the database for use in a classical data warehouse scenario. Then, we ran a series of SPARQL queries against each endpoint and recorded the execution time and the accuracy of the query response. Our paper shows that with appropriate configuration Virtuoso and OWLIM-SE can satisfy the basic requirements to load and query biological data less than 8 billion or so on a single node, for the simultaneous access of 64 clients. OWLIM-SE performs best for databases with approximately 11 million triples; For data sets that contain 94 million and 590 million triples, OWLIM-SE and Virtuoso perform best. They do not show overwhelming advantage over each other; For data over 4 billion Virtuoso works best. 4store performs well on small data sets with limited features when the number of triples is less than 100 million, and our test shows its scalability is poor; Bigdata demonstrates average performance and is a good open source triple store for middle-sized (500 million or so) data set; Mulgara shows a little of fragility.

  1. Tunable optical frequency comb enabled scalable and cost-effective multiuser orthogonal frequency-division multiple access passive optical network with source-free optical network units.

    PubMed

    Chen, Chen; Zhang, Chongfu; Liu, Deming; Qiu, Kun; Liu, Shuang

    2012-10-01

    We propose and experimentally demonstrate a multiuser orthogonal frequency-division multiple access passive optical network (OFDMA-PON) with source-free optical network units (ONUs), enabled by tunable optical frequency comb generation technology. By cascading a phase modulator (PM) and an intensity modulator and dynamically controlling the peak-to-peak voltage of a PM driven signal, a tunable optical frequency comb source can be generated. It is utilized to assist the configuration of a multiple source-free ONUs enhanced OFDMA-PON where simultaneous and interference-free multiuser upstream transmission over a single wavelength can be efficiently supported. The proposed multiuser OFDMA-PON is scalable and cost effective, and its feasibility is successfully verified by experiment.

  2. A review of accessibility of administrative healthcare databases in the Asia-Pacific region

    PubMed Central

    Milea, Dominique; Azmi, Soraya; Reginald, Praveen; Verpillat, Patrice; Francois, Clement

    2015-01-01

    Objective We describe and compare the availability and accessibility of administrative healthcare databases (AHDB) in several Asia-Pacific countries: Australia, Japan, South Korea, Taiwan, Singapore, China, Thailand, and Malaysia. Methods The study included hospital records, reimbursement databases, prescription databases, and data linkages. Databases were first identified through PubMed, Google Scholar, and the ISPOR database register. Database custodians were contacted. Six criteria were used to assess the databases and provided the basis for a tool to categorise databases into seven levels ranging from least accessible (Level 1) to most accessible (Level 7). We also categorised overall data accessibility for each country as high, medium, or low based on accessibility of databases as well as the number of academic articles published using the databases. Results Fifty-four administrative databases were identified. Only a limited number of databases allowed access to raw data and were at Level 7 [Medical Data Vision EBM Provider, Japan Medical Data Centre (JMDC) Claims database and Nihon-Chouzai Pharmacy Claims database in Japan, and Medicare, Pharmaceutical Benefits Scheme (PBS), Centre for Health Record Linkage (CHeReL), HealthLinQ, Victorian Data Linkages (VDL), SA-NT DataLink in Australia]. At Levels 3–6 were several databases from Japan [Hamamatsu Medical University Database, Medi-Trend, Nihon University School of Medicine Clinical Data Warehouse (NUSM)], Australia [Western Australia Data Linkage (WADL)], Taiwan [National Health Insurance Research Database (NHIRD)], South Korea [Health Insurance Review and Assessment Service (HIRA)], and Malaysia [United Nations University (UNU)-Casemix]. Countries were categorised as having a high level of data accessibility (Australia, Taiwan, and Japan), medium level of accessibility (South Korea), or a low level of accessibility (Thailand, China, Malaysia, and Singapore). In some countries, data may be available but accessibility was restricted based on requirements by data custodians. Conclusions Compared with previous research, this study describes the landscape of databases in the selected countries with more granularity using an assessment tool developed for this purpose. A high number of databases were identified but most had restricted access, preventing their potential use to support research. We hope that this study helps to improve the understanding of the AHDB landscape, increase data sharing and database research in Asia-Pacific countries. PMID:27123180

  3. Adaptable Information Models in the Global Change Information System

    NASA Astrophysics Data System (ADS)

    Duggan, B.; Buddenberg, A.; Aulenbach, S.; Wolfe, R.; Goldstein, J.

    2014-12-01

    The US Global Change Research Program has sponsored the creation of the Global Change Information System () to provide a web based source of accessible, usable, and timely information about climate and global change for use by scientists, decision makers, and the public. The GCIS played multiple roles during the assembly and release of the Third National Climate Assessment. It provided human and programmable interfaces, relational and semantic representations of information, and discrete identifiers for various types of resources, which could then be manipulated by a distributed team with a wide range of specialties. The GCIS also served as a scalable backend for the web based version of the report. In this talk, we discuss the infrastructure decisions made during the design and deployment of the GCIS, as well as ongoing work to adapt to new types of information. Both a constrained relational database and an open ended triple store are used to ensure data integrity while maintaining fluidity. Using natural primary keys allows identifiers to propagate through both models. Changing identifiers are accomodated through fine grained auditing and explicit mappings to external lexicons. A practical RESTful API is used whose endpoints are also URIs in an ontology. Both the relational schema and the ontology are maleable, and stability is ensured through test driven development and continuous integration testing using modern open source techniques. Content is also validated through continuous testing techniques. A high degres of scalability is achieved through caching.

  4. Database and Analytical Tool Development for the Management of Data Derived from US DOE (NETL) Funded Fine Particulate (PM2.5) Research

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Robinson P. Khosah; Frank T. Alex

    2007-02-11

    Advanced Technology Systems, Inc. (ATS) was contracted by the U. S. Department of Energy's National Energy Technology Laboratory (DOE-NETL) to develop a state-of-the-art, scalable and robust web-accessible database application to manage the extensive data sets resulting from the DOE-NETL-sponsored ambient air monitoring programs in the upper Ohio River valley region. The data management system was designed to include a web-based user interface that will allow easy access to the data by the scientific community, policy- and decision-makers, and other interested stakeholders, while providing detailed information on sampling, analytical and quality control parameters. In addition, the system will provide graphical analyticalmore » tools for displaying, analyzing and interpreting the air quality data. The system will also provide multiple report generation capabilities and easy-to-understand visualization formats that can be utilized by the media and public outreach/educational institutions. The project is being conducted in two phases. Phase One includes the following tasks: (1) data inventory/benchmarking, including the establishment of an external stakeholder group; (2) development of a data management system; (3) population of the database; (4) development of a web-based data retrieval system, and (5) establishment of an internal quality assurance/quality control system on data management. Phase Two, which is currently underway, involves the development of a platform for on-line data analysis. Phase Two includes the following tasks: (1) development of a sponsor and stakeholder/user website with extensive online analytical tools; (2) development of a public website; (3) incorporation of an extensive online help system into each website; and (4) incorporation of a graphical representation (mapping) system into each website. The project is now into its forty-eighth month of development activities.« less

  5. DATABASE AND ANALYTICAL TOOL DEVELOPMENT FOR THE MANAGEMENT OF DATA DERIVED FROM US DOE (NETL) FUNDED FINE PARTICULATE (PM 2.5) RESEARCH

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Robinson P. Khosah; Charles G. Crawford

    2006-02-11

    Advanced Technology Systems, Inc. (ATS) was contracted by the U. S. Department of Energy's National Energy Technology Laboratory (DOE-NETL) to develop a state-of-the-art, scalable and robust web-accessible database application to manage the extensive data sets resulting from the DOE-NETL-sponsored ambient air monitoring programs in the upper Ohio River valley region. The data management system was designed to include a web-based user interface that will allow easy access to the data by the scientific community, policy- and decision-makers, and other interested stakeholders, while providing detailed information on sampling, analytical and quality control parameters. In addition, the system will provide graphical analyticalmore » tools for displaying, analyzing and interpreting the air quality data. The system will also provide multiple report generation capabilities and easy-to-understand visualization formats that can be utilized by the media and public outreach/educational institutions. The project is being conducted in two phases. Phase One includes the following tasks: (1) data inventory/benchmarking, including the establishment of an external stakeholder group; (2) development of a data management system; (3) population of the database; (4) development of a web-based data retrieval system, and (5) establishment of an internal quality assurance/quality control system on data management. Phase Two, which is currently underway, involves the development of a platform for on-line data analysis. Phase Two includes the following tasks: (1) development of a sponsor and stakeholder/user website with extensive online analytical tools; (2) development of a public website; (3) incorporation of an extensive online help system into each website; and (4) incorporation of a graphical representation (mapping) system into each website. The project is now into its forty-second month of development activities.« less

  6. DATABASE AND ANALYTICAL TOOL DEVELOPMENT FOR THE MANAGEMENT OF DATA DERIVED FROM US DOE (NETL) FUNDED FINE PARTICULATE (PM2.5) RESEARCH

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Robinson P. Khosah; Charles G. Crawford

    Advanced Technology Systems, Inc. (ATS) was contracted by the U. S. Department of Energy's National Energy Technology Laboratory (DOE-NETL) to develop a state-of-the-art, scalable and robust web-accessible database application to manage the extensive data sets resulting from the DOE-NETL-sponsored ambient air monitoring programs in the upper Ohio River valley region. The data management system was designed to include a web-based user interface that will allow easy access to the data by the scientific community, policy- and decision-makers, and other interested stakeholders, while providing detailed information on sampling, analytical and quality control parameters. In addition, the system will provide graphical analyticalmore » tools for displaying, analyzing and interpreting the air quality data. The system will also provide multiple report generation capabilities and easy-to-understand visualization formats that can be utilized by the media and public outreach/educational institutions. The project is being conducted in two phases. Phase 1, which is currently in progress and will take twelve months to complete, will include the following tasks: (1) data inventory/benchmarking, including the establishment of an external stakeholder group; (2) development of a data management system; (3) population of the database; (4) development of a web-based data retrieval system, and (5) establishment of an internal quality assurance/quality control system on data management. In Phase 2, which will be completed in the second year of the project, a platform for on-line data analysis will be developed. Phase 2 will include the following tasks: (1) development of a sponsor and stakeholder/user website with extensive online analytical tools; (2) development of a public website; (3) incorporation of an extensive online help system into each website; and (4) incorporation of a graphical representation (mapping) system into each website. The project is now into its eleventh month of Phase 1 development activities.« less

  7. VIEWCACHE: An incremental pointer-based access method for autonomous interoperable databases

    NASA Technical Reports Server (NTRS)

    Roussopoulos, N.; Sellis, Timos

    1993-01-01

    One of the biggest problems facing NASA today is to provide scientists efficient access to a large number of distributed databases. Our pointer-based incremental data base access method, VIEWCACHE, provides such an interface for accessing distributed datasets and directories. VIEWCACHE allows database browsing and search performing inter-database cross-referencing with no actual data movement between database sites. This organization and processing is especially suitable for managing Astrophysics databases which are physically distributed all over the world. Once the search is complete, the set of collected pointers pointing to the desired data are cached. VIEWCACHE includes spatial access methods for accessing image datasets, which provide much easier query formulation by referring directly to the image and very efficient search for objects contained within a two-dimensional window. We will develop and optimize a VIEWCACHE External Gateway Access to database management systems to facilitate database search.

  8. A Scalable Cloud Library Empowering Big Data Management, Diagnosis, and Visualization of Cloud-Resolving Models

    NASA Astrophysics Data System (ADS)

    Zhou, S.; Tao, W. K.; Li, X.; Matsui, T.; Sun, X. H.; Yang, X.

    2015-12-01

    A cloud-resolving model (CRM) is an atmospheric numerical model that can numerically resolve clouds and cloud systems at 0.25~5km horizontal grid spacings. The main advantage of the CRM is that it can allow explicit interactive processes between microphysics, radiation, turbulence, surface, and aerosols without subgrid cloud fraction, overlapping and convective parameterization. Because of their fine resolution and complex physical processes, it is challenging for the CRM community to i) visualize/inter-compare CRM simulations, ii) diagnose key processes for cloud-precipitation formation and intensity, and iii) evaluate against NASA's field campaign data and L1/L2 satellite data products due to large data volume (~10TB) and complexity of CRM's physical processes. We have been building the Super Cloud Library (SCL) upon a Hadoop framework, capable of CRM database management, distribution, visualization, subsetting, and evaluation in a scalable way. The current SCL capability includes (1) A SCL data model enables various CRM simulation outputs in NetCDF, including the NASA-Unified Weather Research and Forecasting (NU-WRF) and Goddard Cumulus Ensemble (GCE) model, to be accessed and processed by Hadoop, (2) A parallel NetCDF-to-CSV converter supports NU-WRF and GCE model outputs, (3) A technique visualizes Hadoop-resident data with IDL, (4) A technique subsets Hadoop-resident data, compliant to the SCL data model, with HIVE or Impala via HUE's Web interface, (5) A prototype enables a Hadoop MapReduce application to dynamically access and process data residing in a parallel file system, PVFS2 or CephFS, where high performance computing (HPC) simulation outputs such as NU-WRF's and GCE's are located. We are testing Apache Spark to speed up SCL data processing and analysis.With the SCL capabilities, SCL users can conduct large-domain on-demand tasks without downloading voluminous CRM datasets and various observations from NASA Field Campaigns and Satellite data to a local computer, and inter-compare CRM output and data with GCE and NU-WRF.

  9. Comparison of the Frontier Distributed Database Caching System to NoSQL Databases

    NASA Astrophysics Data System (ADS)

    Dykstra, Dave

    2012-12-01

    One of the main attractions of non-relational “NoSQL” databases is their ability to scale to large numbers of readers, including readers spread over a wide area. The Frontier distributed database caching system, used in production by the Large Hadron Collider CMS and ATLAS detector projects for Conditions data, is based on traditional SQL databases but also adds high scalability and the ability to be distributed over a wide-area for an important subset of applications. This paper compares the major characteristics of the two different approaches and identifies the criteria for choosing which approach to prefer over the other. It also compares in some detail the NoSQL databases used by CMS and ATLAS: MongoDB, CouchDB, HBase, and Cassandra.

  10. Comparison of the Frontier Distributed Database Caching System to NoSQL Databases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dykstra, Dave

    One of the main attractions of non-relational NoSQL databases is their ability to scale to large numbers of readers, including readers spread over a wide area. The Frontier distributed database caching system, used in production by the Large Hadron Collider CMS and ATLAS detector projects for Conditions data, is based on traditional SQL databases but also adds high scalability and the ability to be distributed over a wide-area for an important subset of applications. This paper compares the major characteristics of the two different approaches and identifies the criteria for choosing which approach to prefer over the other. It alsomore » compares in some detail the NoSQL databases used by CMS and ATLAS: MongoDB, CouchDB, HBase, and Cassandra.« less

  11. Layered video transmission over multirate DS-CDMA wireless systems

    NASA Astrophysics Data System (ADS)

    Kondi, Lisimachos P.; Srinivasan, Deepika; Pados, Dimitris A.; Batalama, Stella N.

    2003-05-01

    n this paper, we consider the transmission of video over wireless direct-sequence code-division multiple access (DS-CDMA) channels. A layered (scalable) video source codec is used and each layer is transmitted over a different CDMA channel. Spreading codes with different lengths are allowed for each CDMA channel (multirate CDMA). Thus, a different number of chips per bit can be used for the transmission of each scalable layer. For a given fixed energy value per chip and chip rate, the selection of a spreading code length affects the transmitted energy per bit and bit rate for each scalable layer. An MPEG-4 source encoder is used to provide a two-layer SNR scalable bitstream. Each of the two layers is channel-coded using Rate-Compatible Punctured Convolutional (RCPC) codes. Then, the data are interleaved, spread, carrier-modulated and transmitted over the wireless channel. A multipath Rayleigh fading channel is assumed. At the other end, we assume the presence of an antenna array receiver. After carrier demodulation, multiple-access-interference suppressing despreading is performed using space-time auxiliary vector (AV) filtering. The choice of the AV receiver is dictated by realistic channel fading rates that limit the data record available for receiver adaptation and redesign. Indeed, AV filter short-data-record estimators have been shown to exhibit superior bit-error-rate performance in comparison with LMS, RLS, SMI, or 'multistage nested Wiener' adaptive filter implementations. Our experimental results demonstrate the effectiveness of multirate DS-CDMA systems for wireless video transmission.

  12. PathwayAccess: CellDesigner plugins for pathway databases.

    PubMed

    Van Hemert, John L; Dickerson, Julie A

    2010-09-15

    CellDesigner provides a user-friendly interface for graphical biochemical pathway description. Many pathway databases are not directly exportable to CellDesigner models. PathwayAccess is an extensible suite of CellDesigner plugins, which connect CellDesigner directly to pathway databases using respective Java application programming interfaces. The process is streamlined for creating new PathwayAccess plugins for specific pathway databases. Three PathwayAccess plugins, MetNetAccess, BioCycAccess and ReactomeAccess, directly connect CellDesigner to the pathway databases MetNetDB, BioCyc and Reactome. PathwayAccess plugins enable CellDesigner users to expose pathway data to analytical CellDesigner functions, curate their pathway databases and visually integrate pathway data from different databases using standard Systems Biology Markup Language and Systems Biology Graphical Notation. Implemented in Java, PathwayAccess plugins run with CellDesigner version 4.0.1 and were tested on Ubuntu Linux, Windows XP and 7, and MacOSX. Source code, binaries, documentation and video walkthroughs are freely available at http://vrac.iastate.edu/~jlv.

  13. Criteria to assess potential reverse innovations: opportunities for shared learning between high- and low-income countries.

    PubMed

    Bhattacharyya, Onil; Wu, Diane; Mossman, Kathryn; Hayden, Leigh; Gill, Pavan; Cheng, Yu-Ling; Daar, Abdallah; Soman, Dilip; Synowiec, Christina; Taylor, Andrea; Wong, Joseph; von Zedtwitz, Max; Zlotkin, Stanley; Mitchell, William; McGahan, Anita

    2017-01-25

    Low- and middle-income countries (LMICs) are developing novel approaches to healthcare that may be relevant to high-income countries (HICs). These include products, services, organizational processes, or policies that improve access, cost, or efficiency of healthcare. However, given the challenge of replication, it is difficult to identify innovations that could be successfully adapted to high-income settings. We present a set of criteria for evaluating the potential impact of LMIC innovations in HIC settings. An initial framework was drafted based on a literature review, and revised iteratively by applying it to LMIC examples from the Center for Health Market Innovations (CHMI) program database. The resulting criteria were then reviewed using a modified Delphi process by the Reverse Innovation Working Group, consisting of 31 experts in medicine, engineering, management and political science, as well as representatives from industry and government, all with an expressed interest in reverse innovation. The resulting 8 criteria are divided into two steps with a simple scoring system. First, innovations are assessed according to their success within the LMIC context according to metrics of improving accessibility, cost-effectiveness, scalability, and overall effectiveness. Next, they are scored for their potential for spread to HICs, according to their ability to address an HIC healthcare challenge, compatibility with infrastructure and regulatory requirements, degree of novelty, and degree of current collaboration with HICs. We use examples to illustrate where programs which appear initially promising may be unlikely to succeed in a HIC setting due to feasibility concerns. This study presents a framework for identifying reverse innovations that may be useful to policymakers and funding agencies interested in identifying novel approaches to addressing cost and access to care in HICs. We solicited expert feedback and consensus on an empirically-derived set of criteria to create a practical tool for funders that can be used directly and tested prospectively using current databases of LMIC programs.

  14. Pulling on the Long Tail with Flyover Country, a Mobile App to Expose, Visualize, Discover, and Explore Open Geoscience Data

    NASA Astrophysics Data System (ADS)

    Myrbo, A.; Loeffler, S.; Ai, S.; McEwan, R.

    2015-12-01

    The ultimate EarthCube product has been described as a mobile app that provides all of the known geoscience data for a geographic point or polygon, from the top of the atmosphere to the core of the Earth, throughout geologic time. The database queries are hidden from the user, and the data are visually rendered for easy recognition of patterns and associations. This fanciful vision is not so remote: NSF EarthCube and Geoinformatics support has already fostered major advances in database interoperability and harmonization of APIs; numerous "domain repositories," databases curated by subject matter experts, now provide a vast wealth of open, easily-accessible georeferenced data on rock and sediment chemistry and mineralogy, paleobiology, stratigraphy, rock magnetics, and more. New datasets accrue daily, including many harvested from the literature by automated means. None of these constitute big data - all are part of the long tail of geoscience, heterogeneous data consisting of relatively small numbers of measurements made by a large number of people, typically on physical samples. This vision of mobile data discovery requires a software package to cleverly expose these domain repositories' holdings; currently, queries mainly come from single investigators to single databases. The NSF-funded mobile app Flyover Country (FC; fc.umn.edu), developed for geoscience outreach and education, has been welcomed by data curators and cyberinfrastructure developers as a testing ground for their API services, data provision, and scalability. FC pulls maps and data within a bounding envelope and caches them for offline use; location-based services alert users to nearby points of interest (POI). The incorporation of data from multiple databases across domains requires parsimonious data requests and novel visualization techniques, especially for mapping of data with a time or stratigraphic depth component. The preservation of data provenance and authority is critical for researcher buy-in to all community databases, and further allows exploration and suggestions of collaborators, based upon geography and topical relevance.

  15. Scalable Video Streaming Relay for Smart Mobile Devices in Wireless Networks

    PubMed Central

    Kwon, Dongwoo; Je, Huigwang; Kim, Hyeonwoo; Ju, Hongtaek; An, Donghyeok

    2016-01-01

    Recently, smart mobile devices and wireless communication technologies such as WiFi, third generation (3G), and long-term evolution (LTE) have been rapidly deployed. Many smart mobile device users can access the Internet wirelessly, which has increased mobile traffic. In 2014, more than half of the mobile traffic around the world was devoted to satisfying the increased demand for the video streaming. In this paper, we propose a scalable video streaming relay scheme. Because many collisions degrade the scalability of video streaming, we first separate networks to prevent excessive contention between devices. In addition, the member device controls the video download rate in order to adapt to video playback. If the data are sufficiently buffered, the member device stops the download. If not, it requests additional video data. We implemented apps to evaluate the proposed scheme and conducted experiments with smart mobile devices. The results showed that our scheme improves the scalability of video streaming in a wireless local area network (WLAN). PMID:27907113

  16. Scalable Video Streaming Relay for Smart Mobile Devices in Wireless Networks.

    PubMed

    Kwon, Dongwoo; Je, Huigwang; Kim, Hyeonwoo; Ju, Hongtaek; An, Donghyeok

    2016-01-01

    Recently, smart mobile devices and wireless communication technologies such as WiFi, third generation (3G), and long-term evolution (LTE) have been rapidly deployed. Many smart mobile device users can access the Internet wirelessly, which has increased mobile traffic. In 2014, more than half of the mobile traffic around the world was devoted to satisfying the increased demand for the video streaming. In this paper, we propose a scalable video streaming relay scheme. Because many collisions degrade the scalability of video streaming, we first separate networks to prevent excessive contention between devices. In addition, the member device controls the video download rate in order to adapt to video playback. If the data are sufficiently buffered, the member device stops the download. If not, it requests additional video data. We implemented apps to evaluate the proposed scheme and conducted experiments with smart mobile devices. The results showed that our scheme improves the scalability of video streaming in a wireless local area network (WLAN).

  17. A federated capability-based access control mechanism for internet of things (IoTs)

    NASA Astrophysics Data System (ADS)

    Xu, Ronghua; Chen, Yu; Blasch, Erik; Chen, Genshe

    2018-05-01

    The prevalence of Internet of Things (IoTs) allows heterogeneous embedded smart devices to collaboratively provide intelligent services with or without human intervention. While leveraging the large-scale IoT-based applications like Smart Gird and Smart Cities, IoT also incurs more concerns on privacy and security. Among the top security challenges that IoTs face is that access authorization is critical in resource and information protection over IoTs. Traditional access control approaches, like Access Control Lists (ACL), Role-based Access Control (RBAC) and Attribute-based Access Control (ABAC), are not able to provide a scalable, manageable and efficient mechanisms to meet requirement of IoT systems. The extraordinary large number of nodes, heterogeneity as well as dynamicity, necessitate more fine-grained, lightweight mechanisms for IoT devices. In this paper, a federated capability-based access control (FedCAC) framework is proposed to enable an effective access control processes to devices, services and information in large scale IoT systems. The federated capability delegation mechanism, based on a propagation tree, is illustrated for access permission propagation. An identity-based capability token management strategy is presented, which involves registering, propagation and revocation of the access authorization. Through delegating centralized authorization decision-making policy to local domain delegator, the access authorization process is locally conducted on the service provider that integrates situational awareness (SAW) and customized contextual conditions. Implemented and tested on both resources-constrained devices, like smart sensors and Raspberry PI, and non-resource-constrained devices, like laptops and smart phones, our experimental results demonstrate the feasibility of the proposed FedCAC approach to offer a scalable, lightweight and fine-grained access control solution to IoT systems connected to a system network.

  18. Visualizing Cloud Properties and Satellite Imagery: A Tool for Visualization and Information Integration

    NASA Astrophysics Data System (ADS)

    Chee, T.; Nguyen, L.; Smith, W. L., Jr.; Spangenberg, D.; Palikonda, R.; Bedka, K. M.; Minnis, P.; Thieman, M. M.; Nordeen, M.

    2017-12-01

    Providing public access to research products including cloud macro and microphysical properties and satellite imagery are a key concern for the NASA Langley Research Center Cloud and Radiation Group. This work describes a web based visualization tool and API that allows end users to easily create customized cloud product and satellite imagery, ground site data and satellite ground track information that is generated dynamically. The tool has two uses, one to visualize the dynamically created imagery and the other to provide access to the dynamically generated imagery directly at a later time. Internally, we leverage our practical experience with large, scalable application practices to develop a system that has the largest potential for scalability as well as the ability to be deployed on the cloud to accommodate scalability issues. We build upon NASA Langley Cloud and Radiation Group's experience with making real-time and historical satellite cloud product information, satellite imagery, ground site data and satellite track information accessible and easily searchable. This tool is the culmination of our prior experience with dynamic imagery generation and provides a way to build a "mash-up" of dynamically generated imagery and related kinds of information that are visualized together to add value to disparate but related information. In support of NASA strategic goals, our group aims to make as much scientific knowledge, observations and products available to the citizen science, research and interested communities as well as for automated systems to acquire the same information for data mining or other analytic purposes. This tool and the underlying API's provide a valuable research tool to a wide audience both as a standalone research tool and also as an easily accessed data source that can easily be mined or used with existing tools.

  19. Selecting the Right Courseware for Your Online Learning Program.

    ERIC Educational Resources Information Center

    O'Mara, Heather

    2000-01-01

    Presents criteria for selecting courseware for online classes. Highlights include ease of use, including navigation; assessment tools; advantages of Java-enabled courseware; advantages of Oracle databases, including scalability; future possibilities for multimedia technology; and open architecture that will integrate with other systems. (LRW)

  20. Dynamic and scalable audio classification by collective network of binary classifiers framework: an evolutionary approach.

    PubMed

    Kiranyaz, Serkan; Mäkinen, Toni; Gabbouj, Moncef

    2012-10-01

    In this paper, we propose a novel framework based on a collective network of evolutionary binary classifiers (CNBC) to address the problems of feature and class scalability. The main goal of the proposed framework is to achieve a high classification performance over dynamic audio and video repositories. The proposed framework adopts a "Divide and Conquer" approach in which an individual network of binary classifiers (NBC) is allocated to discriminate each audio class. An evolutionary search is applied to find the best binary classifier in each NBC with respect to a given criterion. Through the incremental evolution sessions, the CNBC framework can dynamically adapt to each new incoming class or feature set without resorting to a full-scale re-training or re-configuration. Therefore, the CNBC framework is particularly designed for dynamically varying databases where no conventional static classifiers can adapt to such changes. In short, it is entirely a novel topology, an unprecedented approach for dynamic, content/data adaptive and scalable audio classification. A large set of audio features can be effectively used in the framework, where the CNBCs make appropriate selections and combinations so as to achieve the highest discrimination among individual audio classes. Experiments demonstrate a high classification accuracy (above 90%) and efficiency of the proposed framework over large and dynamic audio databases. Copyright © 2012 Elsevier Ltd. All rights reserved.

  1. FPGA-based prototype storage system with phase change memory

    NASA Astrophysics Data System (ADS)

    Li, Gezi; Chen, Xiaogang; Chen, Bomy; Li, Shunfen; Zhou, Mi; Han, Wenbing; Song, Zhitang

    2016-10-01

    With the ever-increasing amount of data being stored via social media, mobile telephony base stations, and network devices etc. the database systems face severe bandwidth bottlenecks when moving vast amounts of data from storage to the processing nodes. At the same time, Storage Class Memory (SCM) technologies such as Phase Change Memory (PCM) with unique features like fast read access, high density, non-volatility, byte-addressability, positive response to increasing temperature, superior scalability, and zero standby leakage have changed the landscape of modern computing and storage systems. In such a scenario, we present a storage system called FLEET which can off-load partial or whole SQL queries to the storage engine from CPU. FLEET uses an FPGA rather than conventional CPUs to implement the off-load engine due to its highly parallel nature. We have implemented an initial prototype of FLEET with PCM-based storage. The results demonstrate that significant performance and CPU utilization gains can be achieved by pushing selected query processing components inside in PCM-based storage.

  2. Improved data retrieval from TreeBASE via taxonomic and linguistic data enrichment

    PubMed Central

    Anwar, Nadia; Hunt, Ela

    2009-01-01

    Background TreeBASE, the only data repository for phylogenetic studies, is not being used effectively since it does not meet the taxonomic data retrieval requirements of the systematics community. We show, through an examination of the queries performed on TreeBASE, that data retrieval using taxon names is unsatisfactory. Results We report on a new wrapper supporting taxon queries on TreeBASE by utilising a Taxonomy and Classification Database (TCl-Db) we created. TCl-Db holds merged and consolidated taxonomic names from multiple data sources and can be used to translate hierarchical, vernacular and synonym queries into specific query terms in TreeBASE. The query expansion supported by TCl-Db shows very significant information retrieval quality improvement. The wrapper can be accessed at the URL The methodology we developed is scalable and can be applied to new data, as those become available in the future. Conclusion Significantly improved data retrieval quality is shown for all queries, and additional flexibility is achieved via user-driven taxonomy selection. PMID:19426482

  3. A Node Linkage Approach for Sequential Pattern Mining

    PubMed Central

    Navarro, Osvaldo; Cumplido, René; Villaseñor-Pineda, Luis; Feregrino-Uribe, Claudia; Carrasco-Ochoa, Jesús Ariel

    2014-01-01

    Sequential Pattern Mining is a widely addressed problem in data mining, with applications such as analyzing Web usage, examining purchase behavior, and text mining, among others. Nevertheless, with the dramatic increase in data volume, the current approaches prove inefficient when dealing with large input datasets, a large number of different symbols and low minimum supports. In this paper, we propose a new sequential pattern mining algorithm, which follows a pattern-growth scheme to discover sequential patterns. Unlike most pattern growth algorithms, our approach does not build a data structure to represent the input dataset, but instead accesses the required sequences through pseudo-projection databases, achieving better runtime and reducing memory requirements. Our algorithm traverses the search space in a depth-first fashion and only preserves in memory a pattern node linkage and the pseudo-projections required for the branch being explored at the time. Experimental results show that our new approach, the Node Linkage Depth-First Traversal algorithm (NLDFT), has better performance and scalability in comparison with state of the art algorithms. PMID:24933123

  4. The evolution of monitoring system: the INFN-CNAF case study

    NASA Astrophysics Data System (ADS)

    Bovina, Stefano; Michelotto, Diego

    2017-10-01

    Over the past two years, the operations at CNAF, the ICT center of the Italian Institute for Nuclear Physics, have undergone significant changes. The adoption of configuration management tools, such as Puppet, and the constant increase of dynamic and cloud infrastructures have led us to investigate a new monitoring approach. The present work deals with the centralization of the monitoring service at CNAF through a scalable and highly configurable monitoring infrastructure. The selection of tools has been made taking into account the following requirements given by users: (I) adaptability to dynamic infrastructures, (II) ease of configuration and maintenance, capability to provide more flexibility, (III) compatibility with existing monitoring system, (IV) re-usability and ease of access to information and data. In the paper, the CNAF monitoring infrastructure and its related components are hereafter described: Sensu as monitoring router, InfluxDB as time series database to store data gathered from sensors, Uchiwa as monitoring dashboard and Grafana as a tool to create dashboards and to visualize time series metrics.

  5. Software Defined Networking challenges and future direction: A case study of implementing SDN features on OpenStack private cloud

    NASA Astrophysics Data System (ADS)

    Shamugam, Veeramani; Murray, I.; Leong, J. A.; Sidhu, Amandeep S.

    2016-03-01

    Cloud computing provides services on demand instantly, such as access to network infrastructure consisting of computing hardware, operating systems, network storage, database and applications. Network usage and demands are growing at a very fast rate and to meet the current requirements, there is a need for automatic infrastructure scaling. Traditional networks are difficult to automate because of the distributed nature of their decision making process for switching or routing which are collocated on the same device. Managing complex environments using traditional networks is time-consuming and expensive, especially in the case of generating virtual machines, migration and network configuration. To mitigate the challenges, network operations require efficient, flexible, agile and scalable software defined networks (SDN). This paper discuss various issues in SDN and suggests how to mitigate the network management related issues. A private cloud prototype test bed was setup to implement the SDN on the OpenStack platform to test and evaluate the various network performances provided by the various configurations.

  6. Scalable quantum memory in the ultrastrong coupling regime.

    PubMed

    Kyaw, T H; Felicetti, S; Romero, G; Solano, E; Kwek, L-C

    2015-03-02

    Circuit quantum electrodynamics, consisting of superconducting artificial atoms coupled to on-chip resonators, represents a prime candidate to implement the scalable quantum computing architecture because of the presence of good tunability and controllability. Furthermore, recent advances have pushed the technology towards the ultrastrong coupling regime of light-matter interaction, where the qubit-resonator coupling strength reaches a considerable fraction of the resonator frequency. Here, we propose a qubit-resonator system operating in that regime, as a quantum memory device and study the storage and retrieval of quantum information in and from the Z2 parity-protected quantum memory, within experimentally feasible schemes. We are also convinced that our proposal might pave a way to realize a scalable quantum random-access memory due to its fast storage and readout performances.

  7. Scalable quantum memory in the ultrastrong coupling regime

    PubMed Central

    Kyaw, T. H.; Felicetti, S.; Romero, G.; Solano, E.; Kwek, L.-C.

    2015-01-01

    Circuit quantum electrodynamics, consisting of superconducting artificial atoms coupled to on-chip resonators, represents a prime candidate to implement the scalable quantum computing architecture because of the presence of good tunability and controllability. Furthermore, recent advances have pushed the technology towards the ultrastrong coupling regime of light-matter interaction, where the qubit-resonator coupling strength reaches a considerable fraction of the resonator frequency. Here, we propose a qubit-resonator system operating in that regime, as a quantum memory device and study the storage and retrieval of quantum information in and from the Z2 parity-protected quantum memory, within experimentally feasible schemes. We are also convinced that our proposal might pave a way to realize a scalable quantum random-access memory due to its fast storage and readout performances. PMID:25727251

  8. High Optical Access Trap 2.0.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Maunz, Peter Lukas Wilhelm

    2016-01-26

    The High Optical Access (HOA) trap was designed in collaboration with the Modular Universal Scalable Ion-trap Quantum Computer (MUSIQC) team, funded along with Sandia National Laboratories through IARPA's Multi Qubit Coherent Operations (MQCO) program. The design of version 1 of the HOA trap was completed in September 2012 and initial devices were completed and packaged in February 2013. The second version of the High Optical Access Trap (HOA-2) was completed in September 2014 and is available at IARPA's disposal.

  9. Problem Reporting System

    NASA Technical Reports Server (NTRS)

    Potter, Don; Serian, Charles; Sweet, Robert; Sapir, Babak; Gamez, Enrique; Mays, David

    2008-01-01

    The Problem Reporting System (PRS) is a Web application, running on two Web servers (load-balanced) and two database servers (RAID-5), which establishes a system for submission, editing, and sharing of reports to manage risk assessment of anomalies identified in NASA's flight projects. PRS consolidates diverse anomaly-reporting systems, maintains a rich database set, and incorporates a robust engine, which allows tracking of any hardware, software, or paper process by configuring an appropriate life cycle. Global and specific project administration and setup tools allow lifecycle tailoring, along with customizable controls for user, e-mail, notifications, and more. PRS is accessible via the World Wide Web for authorized user at most any location. Upon successful log-in, the user receives a customizable window, which displays time-critical 'To Do' items (anomalies requiring the user s input before the system moves the anomaly to the next phase of the lifecycle), anomalies originated by the user, anomalies the user has addressed, and custom queries that can be saved for future use. Access controls exist depending on a user's role as system administrator, project administrator, user, or developer, and then, further by association with user, project, subsystem, company, or item with provisions for business-to-business exclusions, limitations on access according to the covert or overt nature of a given project, all with multiple layers of filtration, as needed. Reporting of metrics is built in. There is a provision for proxy access (in which the user may choose to grant one or more other users to view screens and perform actions as though they were the user, during any part of a tracking life cycle - especially useful during tight build schedules and vacations to keep things moving). The system also provides users the ability to have an anomaly link to or notify other systems, including QA Inspection Reports, Safety, GIDEP (Government-Industry Data Exchange Program) Alert, Corrective Actions, and Lessons Learned. The PRS tracking engine was designed as a very extensible and scalable system, able to support additional applications, with future development possibilities already discussed, including Incident Surprise Anomalies (for anomalies occurring during Operations phases of NASA Flight projects), GIDEP and NASA Alerts, and others.

  10. Energy challenges in optical access and aggregation networks.

    PubMed

    Kilper, Daniel C; Rastegarfar, Houman

    2016-03-06

    Scalability is a critical issue for access and aggregation networks as they must support the growth in both the size of data capacity demands and the multiplicity of access points. The number of connected devices, the Internet of Things, is growing to the tens of billions. Prevailing communication paradigms are reaching physical limitations that make continued growth problematic. Challenges are emerging in electronic and optical systems and energy increasingly plays a central role. With the spectral efficiency of optical systems approaching the Shannon limit, increasing parallelism is required to support higher capacities. For electronic systems, as the density and speed increases, the total system energy, thermal density and energy per bit are moving into regimes that become impractical to support-for example requiring single-chip processor powers above the 100 W limit common today. We examine communication network scaling and energy use from the Internet core down to the computer processor core and consider implications for optical networks. Optical switching in data centres is identified as a potential model from which scalable access and aggregation networks for the future Internet, with the application of integrated photonic devices and intelligent hybrid networking, will emerge. © 2016 The Author(s).

  11. Visualization for genomics: the Microbial Genome Viewer.

    PubMed

    Kerkhoven, Robert; van Enckevort, Frank H J; Boekhorst, Jos; Molenaar, Douwe; Siezen, Roland J

    2004-07-22

    A Web-based visualization tool, the Microbial Genome Viewer, is presented that allows the user to combine complex genomic data in a highly interactive way. This Web tool enables the interactive generation of chromosome wheels and linear genome maps from genome annotation data stored in a MySQL database. The generated images are in scalable vector graphics (SVG) format, which is suitable for creating high-quality scalable images and dynamic Web representations. Gene-related data such as transcriptome and time-course microarray experiments can be superimposed on the maps for visual inspection. The Microbial Genome Viewer 1.0 is freely available at http://www.cmbi.kun.nl/MGV

  12. k-RP*{sub s}: A scalable distributed data structure for high-performance multi-attribute access

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Litwin, W.; Neimat, M.A.

    k-RP*{sub s} is a new data structure for scalable multicomputer files with multi-attribute (k-d) keys. We discuss the k-RP*{sub s} file evolution and search algorithms. Performance analysis shows that a k-RP*{sub s} file can be much larger and orders of magnitude faster than a traditional k-d file. The speed-up is especially important for range and partial match searches that are often impractical with traditional k-d files. This opens up a new perspective for many applications.

  13. CICS Region Virtualization for Cost Effective Application Development

    ERIC Educational Resources Information Center

    Khan, Kamal Waris

    2012-01-01

    Mainframe is used for hosting large commercial databases, transaction servers and applications that require a greater degree of reliability, scalability and security. Customer Information Control System (CICS) is a mainframe software framework for implementing transaction services. It is designed for rapid, high-volume online processing. In order…

  14. OmniPHR: A distributed architecture model to integrate personal health records.

    PubMed

    Roehrs, Alex; da Costa, Cristiano André; da Rosa Righi, Rodrigo

    2017-07-01

    The advances in the Information and Communications Technology (ICT) brought many benefits to the healthcare area, specially to digital storage of patients' health records. However, it is still a challenge to have a unified viewpoint of patients' health history, because typically health data is scattered among different health organizations. Furthermore, there are several standards for these records, some of them open and others proprietary. Usually health records are stored in databases within health organizations and rarely have external access. This situation applies mainly to cases where patients' data are maintained by healthcare providers, known as EHRs (Electronic Health Records). In case of PHRs (Personal Health Records), in which patients by definition can manage their health records, they usually have no control over their data stored in healthcare providers' databases. Thereby, we envision two main challenges regarding PHR context: first, how patients could have a unified view of their scattered health records, and second, how healthcare providers can access up-to-date data regarding their patients, even though changes occurred elsewhere. For addressing these issues, this work proposes a model named OmniPHR, a distributed model to integrate PHRs, for patients and healthcare providers use. The scientific contribution is to propose an architecture model to support a distributed PHR, where patients can maintain their health history in an unified viewpoint, from any device anywhere. Likewise, for healthcare providers, the possibility of having their patients data interconnected among health organizations. The evaluation demonstrates the feasibility of the model in maintaining health records distributed in an architecture model that promotes a unified view of PHR with elasticity and scalability of the solution. Copyright © 2017 Elsevier Inc. All rights reserved.

  15. A public database of macromolecular diffraction experiments.

    PubMed

    Grabowski, Marek; Langner, Karol M; Cymborowski, Marcin; Porebski, Przemyslaw J; Sroka, Piotr; Zheng, Heping; Cooper, David R; Zimmerman, Matthew D; Elsliger, Marc André; Burley, Stephen K; Minor, Wladek

    2016-11-01

    The low reproducibility of published experimental results in many scientific disciplines has recently garnered negative attention in scientific journals and the general media. Public transparency, including the availability of `raw' experimental data, will help to address growing concerns regarding scientific integrity. Macromolecular X-ray crystallography has led the way in requiring the public dissemination of atomic coordinates and a wealth of experimental data, making the field one of the most reproducible in the biological sciences. However, there remains no mandate for public disclosure of the original diffraction data. The Integrated Resource for Reproducibility in Macromolecular Crystallography (IRRMC) has been developed to archive raw data from diffraction experiments and, equally importantly, to provide related metadata. Currently, the database of our resource contains data from 2920 macromolecular diffraction experiments (5767 data sets), accounting for around 3% of all depositions in the Protein Data Bank (PDB), with their corresponding partially curated metadata. IRRMC utilizes distributed storage implemented using a federated architecture of many independent storage servers, which provides both scalability and sustainability. The resource, which is accessible via the web portal at http://www.proteindiffraction.org, can be searched using various criteria. All data are available for unrestricted access and download. The resource serves as a proof of concept and demonstrates the feasibility of archiving raw diffraction data and associated metadata from X-ray crystallographic studies of biological macromolecules. The goal is to expand this resource and include data sets that failed to yield X-ray structures in order to facilitate collaborative efforts that will improve protein structure-determination methods and to ensure the availability of `orphan' data left behind for various reasons by individual investigators and/or extinct structural genomics projects.

  16. Usage Patterns of a Mobile Palliative Care Application.

    PubMed

    Zhang, Haipeng; Liu, David; Marks, Sean; Rickerson, Elizabeth M; Wright, Adam; Gordon, William J; Landman, Adam

    2018-06-01

    Fast Facts Mobile (FFM) was created to be a convenient way for clinicians to access the Fast Facts and Concepts database of palliative care articles on a smartphone or tablet device. We analyzed usage patterns of FFM through an integrated analytics platform on the mobile versions of the FFM application. The primary objective of this study was to evaluate the usage data from FFM as a way to better understand user behavior for FFM as a palliative care educational tool. This is an exploratory, retrospective analysis of de-identified analytics data collected through the iOS and Android versions of FFM captured from November 2015 to November 2016. FFM App download statistics from November 1, 2015, to November 1, 2016, were accessed from the Apple and Google development websites. Further FFM session data were obtained from the analytics platform built into FFM. FFM was downloaded 9409 times over the year with 201,383 articles accessed. The most searched-for terms in FFM include the following: nausea, methadone, and delirium. We compared frequent users of FFM to infrequent users of FFM and found that 13% of all users comprise 66% of all activity in the application. Demand for useful and scalable tools for both primary palliative care and specialty palliative care will likely continue to grow. Understanding the usage patterns for FFM has the potential to inform the development of future versions of Fast Facts. Further studies of mobile palliative care educational tools will be needed to further define the impact of these educational tools.

  17. Factors influencing the implementation, adoption, use, sustainability and scalability of eLearning for family medicine specialty training: a systematic review protocol.

    PubMed

    Cotič, Živa; Rees, Rebecca; Wark, Petra A; Car, Josip

    2016-10-19

    In 2013, there was a shortage of approximately 7.2 million health workers worldwide, which is larger among family physicians than among specialists. eLearning could provide a potential solution to some of these global workforce challenges. However, there is little evidence on factors facilitating or hindering implementation, adoption, use, scalability and sustainability of eLearning. This review aims to synthesise results from qualitative and mixed methods studies to provide insight on factors influencing implementation of eLearning for family medicine specialty education and training. Additionally, this review aims to identify the actions needed to increase effectiveness of eLearning and identify the strategies required to improve eLearning implementation, adoption, use, sustainability and scalability for family medicine speciality education and training. A systematic search will be conducted across a range of databases for qualitative studies focusing on experiences, barriers, facilitators, and other factors related to the implementation, adoption, use, sustainability and scalability of eLearning for family medicine specialty education and training. Studies will be synthesised by using the framework analysis approach. This study will contribute to the evaluation of eLearning implementation, adoption, use, sustainability and scalability for family medicine specialty training and education and the development of eLearning guidelines for postgraduate medical education. PROSPERO http://www.crd.york.ac.uk/PROSPERO/display_record.asp?ID=CRD42016036449.

  18. A slotted access control protocol for metropolitan WDM ring networks

    NASA Astrophysics Data System (ADS)

    Baziana, P. A.; Pountourakis, I. E.

    2009-03-01

    In this study we focus on the serious scalability problems that many access protocols for WDM ring networks introduce due to the use of a dedicated wavelength per access node for either transmission or reception. We propose an efficient slotted MAC protocol suitable for WDM ring metropolitan area networks. The proposed network architecture employs a separate wavelength for control information exchange prior to the data packet transmission. Each access node is equipped with a pair of tunable transceivers for data communication and a pair of fixed tuned transceivers for control information exchange. Also, each access node includes a set of fixed delay lines for synchronization reasons; to keep the data packets, while the control information is processed. An efficient access algorithm is applied to avoid both the data wavelengths and the receiver collisions. In our protocol, each access node is capable of transmitting and receiving over any of the data wavelengths, facing the scalability issues. Two different slot reuse schemes are assumed: the source and the destination stripping schemes. For both schemes, performance measures evaluation is provided via an analytic model. The analytical results are validated by a discrete event simulation model that uses Poisson traffic sources. Simulation results show that the proposed protocol manages efficient bandwidth utilization, especially under high load. Also, comparative simulation results prove that our protocol achieves significant performance improvement as compared with other WDMA protocols which restrict transmission over a dedicated data wavelength. Finally, performance measures evaluation is explored for diverse numbers of buffer size, access nodes and data wavelengths.

  19. Mining and Indexing Graph Databases

    ERIC Educational Resources Information Center

    Yuan, Dayu

    2013-01-01

    Graphs are widely used to model structures and relationships of objects in various scientific and commercial fields. Chemical molecules, proteins, malware system-call dependencies and three-dimensional mechanical parts are all modeled as graphs. In this dissertation, we propose to mine and index those graph data to enable fast and scalable search.…

  20. High Productivity Computing Systems Analysis and Performance

    DTIC Science & Technology

    2005-07-01

    cubic grid Discrete Math Global Updates per second (GUP/S) RandomAccess Paper & Pencil Contact Bob Lucas (ISI) Multiple Precision none...can be found at the web site. One of the HPCchallenge codes, RandomAccess, is derived from the HPCS discrete math benchmarks that we released, and...Kernels Discrete Math … Graph Analysis … Linear Solvers … Signal Processi ng Execution Bounds Execution Indicators 6 Scalable Compact

  1. Cloud-based Web Services for Near-Real-Time Web access to NPP Satellite Imagery and other Data

    NASA Astrophysics Data System (ADS)

    Evans, J. D.; Valente, E. G.

    2010-12-01

    We are building a scalable, cloud computing-based infrastructure for Web access to near-real-time data products synthesized from the U.S. National Polar-Orbiting Environmental Satellite System (NPOESS) Preparatory Project (NPP) and other geospatial and meteorological data. Given recent and ongoing changes in the the NPP and NPOESS programs (now Joint Polar Satellite System), the need for timely delivery of NPP data is urgent. We propose an alternative to a traditional, centralized ground segment, using distributed Direct Broadcast facilities linked to industry-standard Web services by a streamlined processing chain running in a scalable cloud computing environment. Our processing chain, currently implemented on Amazon.com's Elastic Compute Cloud (EC2), retrieves raw data from NASA's Moderate Resolution Imaging Spectroradiometer (MODIS) and synthesizes data products such as Sea-Surface Temperature, Vegetation Indices, etc. The cloud computing approach lets us grow and shrink computing resources to meet large and rapid fluctuations (twice daily) in both end-user demand and data availability from polar-orbiting sensors. Early prototypes have delivered various data products to end-users with latencies between 6 and 32 minutes. We have begun to replicate machine instances in the cloud, so as to reduce latency and maintain near-real time data access regardless of increased data input rates or user demand -- all at quite moderate monthly costs. Our service-based approach (in which users invoke software processes on a Web-accessible server) facilitates access into datasets of arbitrary size and resolution, and allows users to request and receive tailored and composite (e.g., false-color multiband) products on demand. To facilitate broad impact and adoption of our technology, we have emphasized open, industry-standard software interfaces and open source software. Through our work, we envision the widespread establishment of similar, derived, or interoperable systems for processing and serving near-real-time data from NPP and other sensors. A scalable architecture based on cloud computing ensures cost-effective, real-time processing and delivery of NPP and other data. Access via standard Web services maximizes its interoperability and usefulness.

  2. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ahn, Gail-Joon

    The project seeks an innovative framework to enable users to access and selectively share resources in distributed environments, enhancing the scalability of information sharing. We have investigated secure sharing & assurance approaches for ad-hoc collaboration, focused on Grids, Clouds, and ad-hoc network environments.

  3. A Cloud-based Approach to Medical NLP

    PubMed Central

    Chard, Kyle; Russell, Michael; Lussier, Yves A.; Mendonça, Eneida A; Silverstein, Jonathan C.

    2011-01-01

    Natural Language Processing (NLP) enables access to deep content embedded in medical texts. To date, NLP has not fulfilled its promise of enabling robust clinical encoding, clinical use, quality improvement, and research. We submit that this is in part due to poor accessibility, scalability, and flexibility of NLP systems. We describe here an approach and system which leverages cloud-based approaches such as virtual machines and Representational State Transfer (REST) to extract, process, synthesize, mine, compare/contrast, explore, and manage medical text data in a flexibly secure and scalable architecture. Available architectures in which our Smntx (pronounced as semantics) system can be deployed include: virtual machines in a HIPAA-protected hospital environment, brought up to run analysis over bulk data and destroyed in a local cloud; a commercial cloud for a large complex multi-institutional trial; and within other architectures such as caGrid, i2b2, or NHIN. PMID:22195072

  4. A cloud-based approach to medical NLP.

    PubMed

    Chard, Kyle; Russell, Michael; Lussier, Yves A; Mendonça, Eneida A; Silverstein, Jonathan C

    2011-01-01

    Natural Language Processing (NLP) enables access to deep content embedded in medical texts. To date, NLP has not fulfilled its promise of enabling robust clinical encoding, clinical use, quality improvement, and research. We submit that this is in part due to poor accessibility, scalability, and flexibility of NLP systems. We describe here an approach and system which leverages cloud-based approaches such as virtual machines and Representational State Transfer (REST) to extract, process, synthesize, mine, compare/contrast, explore, and manage medical text data in a flexibly secure and scalable architecture. Available architectures in which our Smntx (pronounced as semantics) system can be deployed include: virtual machines in a HIPAA-protected hospital environment, brought up to run analysis over bulk data and destroyed in a local cloud; a commercial cloud for a large complex multi-institutional trial; and within other architectures such as caGrid, i2b2, or NHIN.

  5. Selective access and editing in a database

    NASA Technical Reports Server (NTRS)

    Maluf, David A. (Inventor); Gawdiak, Yuri O. (Inventor)

    2010-01-01

    Method and system for providing selective access to different portions of a database by different subgroups of database users. Where N users are involved, up to 2.sup.N-1 distinguishable access subgroups in a group space can be formed, where no two access subgroups have the same members. Two or more members of a given access subgroup can edit, substantially simultaneously, a document accessible to each member.

  6. LAILAPS-QSM: A RESTful API and JAVA library for semantic query suggestions.

    PubMed

    Chen, Jinbo; Scholz, Uwe; Zhou, Ruonan; Lange, Matthias

    2018-03-01

    In order to access and filter content of life-science databases, full text search is a widely applied query interface. But its high flexibility and intuitiveness is paid for with potentially imprecise and incomplete query results. To reduce this drawback, query assistance systems suggest those combinations of keywords with the highest potential to match most of the relevant data records. Widespread approaches are syntactic query corrections that avoid misspelling and support expansion of words by suffixes and prefixes. Synonym expansion approaches apply thesauri, ontologies, and query logs. All need laborious curation and maintenance. Furthermore, access to query logs is in general restricted. Approaches that infer related queries by their query profile like research field, geographic location, co-authorship, affiliation etc. require user's registration and its public accessibility that contradict privacy concerns. To overcome these drawbacks, we implemented LAILAPS-QSM, a machine learning approach that reconstruct possible linguistic contexts of a given keyword query. The context is referred from the text records that are stored in the databases that are going to be queried or extracted for a general purpose query suggestion from PubMed abstracts and UniProt data. The supplied tool suite enables the pre-processing of these text records and the further computation of customized distributed word vectors. The latter are used to suggest alternative keyword queries. An evaluated of the query suggestion quality was done for plant science use cases. Locally present experts enable a cost-efficient quality assessment in the categories trait, biological entity, taxonomy, affiliation, and metabolic function which has been performed using ontology term similarities. LAILAPS-QSM mean information content similarity for 15 representative queries is 0.70, whereas 34% have a score above 0.80. In comparison, the information content similarity for human expert made query suggestions is 0.90. The software is either available as tool set to build and train dedicated query suggestion services or as already trained general purpose RESTful web service. The service uses open interfaces to be seamless embeddable into database frontends. The JAVA implementation uses highly optimized data structures and streamlined code to provide fast and scalable response for web service calls. The source code of LAILAPS-QSM is available under GNU General Public License version 2 in Bitbucket GIT repository: https://bitbucket.org/ipk_bit_team/bioescorte-suggestion.

  7. Bringing the CMS distributed computing system into scalable operations

    NASA Astrophysics Data System (ADS)

    Belforte, S.; Fanfani, A.; Fisk, I.; Flix, J.; Hernández, J. M.; Kress, T.; Letts, J.; Magini, N.; Miccio, V.; Sciabà, A.

    2010-04-01

    Establishing efficient and scalable operations of the CMS distributed computing system critically relies on the proper integration, commissioning and scale testing of the data and workload management tools, the various computing workflows and the underlying computing infrastructure, located at more than 50 computing centres worldwide and interconnected by the Worldwide LHC Computing Grid. Computing challenges periodically undertaken by CMS in the past years with increasing scale and complexity have revealed the need for a sustained effort on computing integration and commissioning activities. The Processing and Data Access (PADA) Task Force was established at the beginning of 2008 within the CMS Computing Program with the mandate of validating the infrastructure for organized processing and user analysis including the sites and the workload and data management tools, validating the distributed production system by performing functionality, reliability and scale tests, helping sites to commission, configure and optimize the networking and storage through scale testing data transfers and data processing, and improving the efficiency of accessing data across the CMS computing system from global transfers to local access. This contribution reports on the tools and procedures developed by CMS for computing commissioning and scale testing as well as the improvements accomplished towards efficient, reliable and scalable computing operations. The activities include the development and operation of load generators for job submission and data transfers with the aim of stressing the experiment and Grid data management and workload management systems, site commissioning procedures and tools to monitor and improve site availability and reliability, as well as activities targeted to the commissioning of the distributed production, user analysis and monitoring systems.

  8. NoSQL: collection document and cloud by using a dynamic web query form

    NASA Astrophysics Data System (ADS)

    Abdalla, Hemn B.; Lin, Jinzhao; Li, Guoquan

    2015-07-01

    Mongo-DB (from "humongous") is an open-source document database and the leading NoSQL database. A NoSQL (Not Only SQL, next generation databases, being non-relational, deal, open-source and horizontally scalable) presenting a mechanism for storage and retrieval of documents. Previously, we stored and retrieved the data using the SQL queries. Here, we use the MonogoDB that means we are not utilizing the MySQL and SQL queries. Directly importing the documents into our Drives, retrieving the documents on that drive by not applying the SQL queries, using the IO BufferReader and Writer, BufferReader for importing our type of document files to my folder (Drive). For retrieving the document files, the usage is BufferWriter from the particular folder (or) Drive. In this sense, providing the security for those storing files for what purpose means if we store the documents in our local folder means all or views that file and modified that file. So preventing that file, we are furnishing the security. The original document files will be changed to another format like in this paper; Binary format is used. Our documents will be converting to the binary format after that direct storing in one of our folder, that time the storage space will provide the private key for accessing that file. Wherever any user tries to discover the Document files means that file data are in the binary format, the document's file owner simply views that original format using that personal key from receive the secret key from the cloud.

  9. Scalable isosurface visualization of massive datasets on commodity off-the-shelf clusters

    PubMed Central

    Bajaj, Chandrajit

    2009-01-01

    Tomographic imaging and computer simulations are increasingly yielding massive datasets. Interactive and exploratory visualizations have rapidly become indispensable tools to study large volumetric imaging and simulation data. Our scalable isosurface visualization framework on commodity off-the-shelf clusters is an end-to-end parallel and progressive platform, from initial data access to the final display. Interactive browsing of extracted isosurfaces is made possible by using parallel isosurface extraction, and rendering in conjunction with a new specialized piece of image compositing hardware called Metabuffer. In this paper, we focus on the back end scalability by introducing a fully parallel and out-of-core isosurface extraction algorithm. It achieves scalability by using both parallel and out-of-core processing and parallel disks. It statically partitions the volume data to parallel disks with a balanced workload spectrum, and builds I/O-optimal external interval trees to minimize the number of I/O operations of loading large data from disk. We also describe an isosurface compression scheme that is efficient for progress extraction, transmission and storage of isosurfaces. PMID:19756231

  10. Scalability of voltage-controlled filamentary and nanometallic resistance memory devices.

    PubMed

    Lu, Yang; Lee, Jong Ho; Chen, I-Wei

    2017-08-31

    Much effort has been devoted to device and materials engineering to realize nanoscale resistance random access memory (RRAM) for practical applications, but a rational physical basis to be relied on to design scalable devices spanning many length scales is still lacking. In particular, there is no clear criterion for switching control in those RRAM devices in which resistance changes are limited to localized nanoscale filaments that experience concentrated heat, electric current and field. Here, we demonstrate voltage-controlled resistance switching, always at a constant characteristic critical voltage, for macro and nanodevices in both filamentary RRAM and nanometallic RRAM, and the latter switches uniformly and does not require a forming process. As a result, area-scalability can be achieved under a device-area-proportional current compliance for the low resistance state of the filamentary RRAM, and for both the low and high resistance states of the nanometallic RRAM. This finding will help design area-scalable RRAM at the nanoscale. It also establishes an analogy between RRAM and synapses, in which signal transmission is also voltage-controlled.

  11. Challenges in Database Design with Microsoft Access

    ERIC Educational Resources Information Center

    Letkowski, Jerzy

    2014-01-01

    Design, development and explorations of databases are popular topics covered in introductory courses taught at business schools. Microsoft Access is the most popular software used in those courses. Despite quite high complexity of Access, it is considered to be one of the most friendly database programs for beginners. A typical Access textbook…

  12. Column Store for GWAC: A High-cadence, High-density, Large-scale Astronomical Light Curve Pipeline and Distributed Shared-nothing Database

    NASA Astrophysics Data System (ADS)

    Wan, Meng; Wu, Chao; Wang, Jing; Qiu, Yulei; Xin, Liping; Mullender, Sjoerd; Mühleisen, Hannes; Scheers, Bart; Zhang, Ying; Nes, Niels; Kersten, Martin; Huang, Yongpan; Deng, Jinsong; Wei, Jianyan

    2016-11-01

    The ground-based wide-angle camera array (GWAC), a part of the SVOM space mission, will search for various types of optical transients by continuously imaging a field of view (FOV) of 5000 degrees2 every 15 s. Each exposure consists of 36 × 4k × 4k pixels, typically resulting in 36 × ˜175,600 extracted sources. For a modern time-domain astronomy project like GWAC, which produces massive amounts of data with a high cadence, it is challenging to search for short timescale transients in both real-time and archived data, and to build long-term light curves for variable sources. Here, we develop a high-cadence, high-density light curve pipeline (HCHDLP) to process the GWAC data in real-time, and design a distributed shared-nothing database to manage the massive amount of archived data which will be used to generate a source catalog with more than 100 billion records during 10 years of operation. First, we develop HCHDLP based on the column-store DBMS of MonetDB, taking advantage of MonetDB’s high performance when applied to massive data processing. To realize the real-time functionality of HCHDLP, we optimize the pipeline in its source association function, including both time and space complexity from outside the database (SQL semantic) and inside (RANGE-JOIN implementation), as well as in its strategy of building complex light curves. The optimized source association function is accelerated by three orders of magnitude. Second, we build a distributed database using a two-level time partitioning strategy via the MERGE TABLE and REMOTE TABLE technology of MonetDB. Intensive tests validate that our database architecture is able to achieve both linear scalability in response time and concurrent access by multiple users. In summary, our studies provide guidance for a solution to GWAC in real-time data processing and management of massive data.

  13. Scalable web services for the PSIPRED Protein Analysis Workbench.

    PubMed

    Buchan, Daniel W A; Minneci, Federico; Nugent, Tim C O; Bryson, Kevin; Jones, David T

    2013-07-01

    Here, we present the new UCL Bioinformatics Group's PSIPRED Protein Analysis Workbench. The Workbench unites all of our previously available analysis methods into a single web-based framework. The new web portal provides a greatly streamlined user interface with a number of new features to allow users to better explore their results. We offer a number of additional services to enable computationally scalable execution of our prediction methods; these include SOAP and XML-RPC web server access and new HADOOP packages. All software and services are available via the UCL Bioinformatics Group website at http://bioinf.cs.ucl.ac.uk/.

  14. A distributed infrastructure for publishing VO services: an implementation

    NASA Astrophysics Data System (ADS)

    Cepparo, Francesco; Scagnetto, Ivan; Molinaro, Marco; Smareglia, Riccardo

    2016-07-01

    This contribution describes both the design and the implementation details of a new solution for publishing VO services, enlightening its maintainable, distributed, modular and scalable architecture. Indeed, the new publisher is multithreaded and multiprocess. Multiple instances of the modules can run on different machines to ensure high performance and high availability, and this will be true both for the interface modules of the services and the back end data access ones. The system uses message passing to let its components communicate through an AMQP message broker that can itself be distributed to provide better scalability and availability.

  15. An Extensible "SCHEMA-LESS" Database Framework for Managing High-Throughput Semi-Structured Documents

    NASA Technical Reports Server (NTRS)

    Maluf, David A.; Tran, Peter B.

    2003-01-01

    Object-Relational database management system is an integrated hybrid cooperative approach to combine the best practices of both the relational model utilizing SQL queries and the object-oriented, semantic paradigm for supporting complex data creation. In this paper, a highly scalable, information on demand database framework, called NETMARK, is introduced. NETMARK takes advantages of the Oracle 8i object-relational database using physical addresses data types for very efficient keyword search of records spanning across both context and content. NETMARK was originally developed in early 2000 as a research and development prototype to solve the vast amounts of unstructured and semistructured documents existing within NASA enterprises. Today, NETMARK is a flexible, high-throughput open database framework for managing, storing, and searching unstructured or semi-structured arbitrary hierarchal models, such as XML and HTML.

  16. An Extensible Schema-less Database Framework for Managing High-throughput Semi-Structured Documents

    NASA Technical Reports Server (NTRS)

    Maluf, David A.; Tran, Peter B.; La, Tracy; Clancy, Daniel (Technical Monitor)

    2002-01-01

    Object-Relational database management system is an integrated hybrid cooperative approach to combine the best practices of both the relational model utilizing SQL queries and the object oriented, semantic paradigm for supporting complex data creation. In this paper, a highly scalable, information on demand database framework, called NETMARK is introduced. NETMARK takes advantages of the Oracle 8i object-relational database using physical addresses data types for very efficient keyword searches of records for both context and content. NETMARK was originally developed in early 2000 as a research and development prototype to solve the vast amounts of unstructured and semi-structured documents existing within NASA enterprises. Today, NETMARK is a flexible, high throughput open database framework for managing, storing, and searching unstructured or semi structured arbitrary hierarchal models, XML and HTML.

  17. NETMARK: A Schema-less Extension for Relational Databases for Managing Semi-structured Data Dynamically

    NASA Technical Reports Server (NTRS)

    Maluf, David A.; Tran, Peter B.

    2003-01-01

    Object-Relational database management system is an integrated hybrid cooperative approach to combine the best practices of both the relational model utilizing SQL queries and the object-oriented, semantic paradigm for supporting complex data creation. In this paper, a highly scalable, information on demand database framework, called NETMARK, is introduced. NETMARK takes advantages of the Oracle 8i object-relational database using physical addresses data types for very efficient keyword search of records spanning across both context and content. NETMARK was originally developed in early 2000 as a research and development prototype to solve the vast amounts of unstructured and semi-structured documents existing within NASA enterprises. Today, NETMARK is a flexible, high-throughput open database framework for managing, storing, and searching unstructured or semi-structured arbitrary hierarchal models, such as XML and HTML.

  18. The Open Connectome Project Data Cluster: Scalable Analysis and Vision for High-Throughput Neuroscience.

    PubMed

    Burns, Randal; Roncal, William Gray; Kleissas, Dean; Lillaney, Kunal; Manavalan, Priya; Perlman, Eric; Berger, Daniel R; Bock, Davi D; Chung, Kwanghun; Grosenick, Logan; Kasthuri, Narayanan; Weiler, Nicholas C; Deisseroth, Karl; Kazhdan, Michael; Lichtman, Jeff; Reid, R Clay; Smith, Stephen J; Szalay, Alexander S; Vogelstein, Joshua T; Vogelstein, R Jacob

    2013-01-01

    We describe a scalable database cluster for the spatial analysis and annotation of high-throughput brain imaging data, initially for 3-d electron microscopy image stacks, but for time-series and multi-channel data as well. The system was designed primarily for workloads that build connectomes - neural connectivity maps of the brain-using the parallel execution of computer vision algorithms on high-performance compute clusters. These services and open-science data sets are publicly available at openconnecto.me. The system design inherits much from NoSQL scale-out and data-intensive computing architectures. We distribute data to cluster nodes by partitioning a spatial index. We direct I/O to different systems-reads to parallel disk arrays and writes to solid-state storage-to avoid I/O interference and maximize throughput. All programming interfaces are RESTful Web services, which are simple and stateless, improving scalability and usability. We include a performance evaluation of the production system, highlighting the effec-tiveness of spatial data organization.

  19. The Open Connectome Project Data Cluster: Scalable Analysis and Vision for High-Throughput Neuroscience

    PubMed Central

    Burns, Randal; Roncal, William Gray; Kleissas, Dean; Lillaney, Kunal; Manavalan, Priya; Perlman, Eric; Berger, Daniel R.; Bock, Davi D.; Chung, Kwanghun; Grosenick, Logan; Kasthuri, Narayanan; Weiler, Nicholas C.; Deisseroth, Karl; Kazhdan, Michael; Lichtman, Jeff; Reid, R. Clay; Smith, Stephen J.; Szalay, Alexander S.; Vogelstein, Joshua T.; Vogelstein, R. Jacob

    2013-01-01

    We describe a scalable database cluster for the spatial analysis and annotation of high-throughput brain imaging data, initially for 3-d electron microscopy image stacks, but for time-series and multi-channel data as well. The system was designed primarily for workloads that build connectomes— neural connectivity maps of the brain—using the parallel execution of computer vision algorithms on high-performance compute clusters. These services and open-science data sets are publicly available at openconnecto.me. The system design inherits much from NoSQL scale-out and data-intensive computing architectures. We distribute data to cluster nodes by partitioning a spatial index. We direct I/O to different systems—reads to parallel disk arrays and writes to solid-state storage—to avoid I/O interference and maximize throughput. All programming interfaces are RESTful Web services, which are simple and stateless, improving scalability and usability. We include a performance evaluation of the production system, highlighting the effec-tiveness of spatial data organization. PMID:24401992

  20. A Scalable Infrastructure for Lidar Topography Data Distribution, Processing, and Discovery

    NASA Astrophysics Data System (ADS)

    Crosby, C. J.; Nandigam, V.; Krishnan, S.; Phan, M.; Cowart, C. A.; Arrowsmith, R.; Baru, C.

    2010-12-01

    High-resolution topography data acquired with lidar (light detection and ranging) technology have emerged as a fundamental tool in the Earth sciences, and are also being widely utilized for ecological, planning, engineering, and environmental applications. Collected from airborne, terrestrial, and space-based platforms, these data are revolutionary because they permit analysis of geologic and biologic processes at resolutions essential for their appropriate representation. Public domain lidar data collection by federal, state, and local agencies are a valuable resource to the scientific community, however the data pose significant distribution challenges because of the volume and complexity of data that must be stored, managed, and processed. Lidar data acquisition may generate terabytes of data in the form of point clouds, digital elevation models (DEMs), and derivative products. This massive volume of data is often challenging to host for resource-limited agencies. Furthermore, these data can be technically challenging for users who lack appropriate software, computing resources, and expertise. The National Science Foundation-funded OpenTopography Facility (www.opentopography.org) has developed a cyberinfrastructure-based solution to enable online access to Earth science-oriented high-resolution lidar topography data, online processing tools, and derivative products. OpenTopography provides access to terabytes of point cloud data, standard DEMs, and Google Earth image data, all co-located with computational resources for on-demand data processing. The OpenTopography portal is built upon a cyberinfrastructure platform that utilizes a Services Oriented Architecture (SOA) to provide a modular system that is highly scalable and flexible enough to support the growing needs of the Earth science lidar community. OpenTopography strives to host and provide access to datasets as soon as they become available, and also to expose greater application level functionalities to our end-users (such as generation of custom DEMs via various gridding algorithms, and hydrological modeling algorithms). In the future, the SOA will enable direct authenticated access to back-end functionality through simple Web service Application Programming Interfaces (APIs), so that users may access our data and compute resources via clients other than Web browsers. In addition to an overview of the OpenTopography SOA, this presentation will discuss our recently developed lidar data ingestion and management system for point cloud data delivered in the binary LAS standard. This system compliments our existing partitioned database approach for data delivered in ASCII format, and permits rapid ingestion of data. The system has significantly reduced data ingestion times and has implications for data distribution in emergency response situations. We will also address on ongoing work to develop a community lidar metadata catalog based on the OGC Catalogue Service for Web (CSW) standard, which will help to centralize discovery of public domain lidar data.

  1. Direct accessibility of mixed-metal (III/II) acid sites through the rational synthesis of porous metal carboxylates.

    PubMed

    Wongsakulphasatch, S; Nouar, F; Rodriguez, J; Scott, L; Le Guillouzer, C; Devic, T; Horcajada, P; Grenèche, J-M; Llewellyn, P L; Vimont, A; Clet, G; Daturi, M; Serre, C

    2015-06-25

    The scalable and environmentally-friendly synthesis of mixed Fe(III)/M(II) (M = Ni, Co, Mg) polycarboxylate porous MOFs based on the Secondary Building Unit approach is reported. A combination of in situ infrared spectroscopy, (57)Fe Mössbauer spectrometry and adsorption microcalorimetry confirms the direct accessibility of the iron(III) and metal(II) sites under low temperature activation conditions.

  2. Flexible services for the support of research.

    PubMed

    Turilli, Matteo; Wallom, David; Williams, Chris; Gough, Steve; Curran, Neal; Tarrant, Richard; Bretherton, Dan; Powell, Andy; Johnson, Matt; Harmer, Terry; Wright, Peter; Gordon, John

    2013-01-28

    Cloud computing has been increasingly adopted by users and providers to promote a flexible, scalable and tailored access to computing resources. Nonetheless, the consolidation of this paradigm has uncovered some of its limitations. Initially devised by corporations with direct control over large amounts of computational resources, cloud computing is now being endorsed by organizations with limited resources or with a more articulated, less direct control over these resources. The challenge for these organizations is to leverage the benefits of cloud computing while dealing with limited and often widely distributed computing resources. This study focuses on the adoption of cloud computing by higher education institutions and addresses two main issues: flexible and on-demand access to a large amount of storage resources, and scalability across a heterogeneous set of cloud infrastructures. The proposed solutions leverage a federated approach to cloud resources in which users access multiple and largely independent cloud infrastructures through a highly customizable broker layer. This approach allows for a uniform authentication and authorization infrastructure, a fine-grained policy specification and the aggregation of accounting and monitoring. Within a loosely coupled federation of cloud infrastructures, users can access vast amount of data without copying them across cloud infrastructures and can scale their resource provisions when the local cloud resources become insufficient.

  3. The EarthServer Federation: State, Role, and Contribution to GEOSS

    NASA Astrophysics Data System (ADS)

    Merticariu, Vlad; Baumann, Peter

    2016-04-01

    The intercontinental EarthServer initiative has established a European datacube platform with proven scalability: known databases exceed 100 TB, and single queries have been split across more than 1,000 cloud nodes. Its service interface being rigorously based on the OGC "Big Geo Data" standards, Web Coverage Service (WCS) and Web Coverage Processing Service (WCPS), a series of clients can dock into the services, ranging from open-source OpenLayers and QGIS over open-source NASA WorldWind to proprietary ESRI ArcGIS. Datacube fusion in a "mix and match" style is supported by the platform technolgy, the rasdaman Array Database System, which transparently federates queries so that users simply approach any node of the federation to access any data item, internally optimized for minimal data transfer. Notably, rasdaman is part of GEOSS GCI. NASA is contributing its Web WorldWind virtual globe for user-friendly data extraction, navigation, and analysis. Integrated datacube / metadata queries are contributed by CITE. Current federation members include ESA (managed by MEEO sr.l.), Plymouth Marine Laboratory (PML), the European Centre for Medium-Range Weather Forecast (ECMWF), Australia's National Computational Infrastructure, and Jacobs University (adding in Planetary Science). Further data centers have expressed interest in joining. We present the EarthServer approach, discuss its underlying technology, and illustrate the contribution this datacube platform can make to GEOSS.

  4. Integrating cancer survivors' experiences into UK cancer registries: design and development of the ePOCS system (electronic Patient-reported Outcomes from Cancer Survivors)

    PubMed Central

    Ashley, L; Jones, H; Thomas, J; Forman, D; Newsham, A; Morris, E; Johnson, O; Velikova, G; Wright, P

    2011-01-01

    Background: Understanding the psychosocial challenges of cancer survivorship, and identifying which patients experience ongoing difficulties, is a key priority. The ePOCS (electronic patient-reported outcomes from cancer survivors) project aims to develop and evaluate a cost-efficient, UK-scalable electronic system for collecting patient-reported outcome measures (PROMs), at regular post-diagnostic timepoints, and linking these with clinical data in cancer registries. Methods: A multidisciplinary team developed the system using agile methods. Design entailed process mapping the system's constituent parts, data flows and involved human activities, and undertaking usability testing. Informatics specialists built new technical components, including a web-based questionnaire tool and tracking database, and established component-connecting data flows. Development challenges were overcome, including patient usability and data linkage and security. Results: We have developed a system in which PROMs are completed online, using a secure questionnaire administration tool, accessed via a public-facing website, and the responses are linked and stored with clinical registry data. Patient monitoring and communications are semiautomated via a tracker database, and patient correspondence is primarily Email-based. The system is currently honed for clinician-led hospital-based patient recruitment. Conclusions: A feasibility test study is underway. Although there are possible challenges to sustaining and scaling up ePOCS, the system has potential to support UK epidemiological PROMs collection and clinical data linkage. PMID:22048035

  5. Reactome graph database: Efficient access to complex pathway data

    PubMed Central

    Korninger, Florian; Viteri, Guilherme; Marin-Garcia, Pablo; Ping, Peipei; Wu, Guanming; Stein, Lincoln; D’Eustachio, Peter

    2018-01-01

    Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. One of its main priorities is to provide easy and efficient access to its high quality curated data. At present, biological pathway databases typically store their contents in relational databases. This limits access efficiency because there are performance issues associated with queries traversing highly interconnected data. The same data in a graph database can be queried more efficiently. Here we present the rationale behind the adoption of a graph database (Neo4j) as well as the new ContentService (REST API) that provides access to these data. The Neo4j graph database and its query language, Cypher, provide efficient access to the complex Reactome data model, facilitating easy traversal and knowledge discovery. The adoption of this technology greatly improved query efficiency, reducing the average query time by 93%. The web service built on top of the graph database provides programmatic access to Reactome data by object oriented queries, but also supports more complex queries that take advantage of the new underlying graph-based data storage. By adopting graph database technology we are providing a high performance pathway data resource to the community. The Reactome graph database use case shows the power of NoSQL database engines for complex biological data types. PMID:29377902

  6. Reactome graph database: Efficient access to complex pathway data.

    PubMed

    Fabregat, Antonio; Korninger, Florian; Viteri, Guilherme; Sidiropoulos, Konstantinos; Marin-Garcia, Pablo; Ping, Peipei; Wu, Guanming; Stein, Lincoln; D'Eustachio, Peter; Hermjakob, Henning

    2018-01-01

    Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. One of its main priorities is to provide easy and efficient access to its high quality curated data. At present, biological pathway databases typically store their contents in relational databases. This limits access efficiency because there are performance issues associated with queries traversing highly interconnected data. The same data in a graph database can be queried more efficiently. Here we present the rationale behind the adoption of a graph database (Neo4j) as well as the new ContentService (REST API) that provides access to these data. The Neo4j graph database and its query language, Cypher, provide efficient access to the complex Reactome data model, facilitating easy traversal and knowledge discovery. The adoption of this technology greatly improved query efficiency, reducing the average query time by 93%. The web service built on top of the graph database provides programmatic access to Reactome data by object oriented queries, but also supports more complex queries that take advantage of the new underlying graph-based data storage. By adopting graph database technology we are providing a high performance pathway data resource to the community. The Reactome graph database use case shows the power of NoSQL database engines for complex biological data types.

  7. The plant phenological online database (PPODB): an online database for long-term phenological data.

    PubMed

    Dierenbach, Jonas; Badeck, Franz-W; Schaber, Jörg

    2013-09-01

    We present an online database that provides unrestricted and free access to over 16 million plant phenological observations from over 8,000 stations in Central Europe between the years 1880 and 2009. Unique features are (1) a flexible and unrestricted access to a full-fledged database, allowing for a wide range of individual queries and data retrieval, (2) historical data for Germany before 1951 ranging back to 1880, and (3) more than 480 curated long-term time series covering more than 100 years for individual phenological phases and plants combined over Natural Regions in Germany. Time series for single stations or Natural Regions can be accessed through a user-friendly graphical geo-referenced interface. The joint databases made available with the plant phenological database PPODB render accessible an important data source for further analyses of long-term changes in phenology. The database can be accessed via www.ppodb.de .

  8. MerCat: a versatile k-mer counter and diversity estimator for database-independent property analysis obtained from metagenomic and/or metatranscriptomic sequencing data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    White, Richard A.; Panyala, Ajay R.; Glass, Kevin A.

    MerCat is a parallel, highly scalable and modular property software package for robust analysis of features in next-generation sequencing data. MerCat inputs include assembled contigs and raw sequence reads from any platform resulting in feature abundance counts tables. MerCat allows for direct analysis of data properties without reference sequence database dependency commonly used by search tools such as BLAST and/or DIAMOND for compositional analysis of whole community shotgun sequencing (e.g. metagenomes and metatranscriptomes).

  9. Data warehousing with Oracle

    NASA Astrophysics Data System (ADS)

    Shahzad, Muhammad A.

    1999-02-01

    With the emergence of data warehousing, Decision support systems have evolved to its best. At the core of these warehousing systems lies a good database management system. Database server, used for data warehousing, is responsible for providing robust data management, scalability, high performance query processing and integration with other servers. Oracle being the initiator in warehousing servers, provides a wide range of features for facilitating data warehousing. This paper is designed to review the features of data warehousing - conceptualizing the concept of data warehousing and, lastly, features of Oracle servers for implementing a data warehouse.

  10. Fermilab Security Site Access Request Database

    Science.gov Websites

    Fermilab Security Site Access Request Database Use of the online version of the Fermilab Security Site Access Request Database requires that you login into the ESH&Q Web Site. Note: Only Fermilab generated from the ESH&Q Section's Oracle database on May 27, 2018 05:48 AM. If you have a question

  11. 47 CFR 54.410 - Subscriber eligibility determination and certification.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... eligibility by accessing one or more databases containing information regarding the subscriber's income (“income databases”), the eligible telecommunications carrier must access such income databases and... carrier cannot determine a prospective subscriber's income-based eligibility by accessing income databases...

  12. 47 CFR 54.410 - Subscriber eligibility determination and certification.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... eligibility by accessing one or more databases containing information regarding the subscriber's income (“income databases”), the eligible telecommunications carrier must access such income databases and... carrier cannot determine a prospective subscriber's income-based eligibility by accessing income databases...

  13. 47 CFR 54.410 - Subscriber eligibility determination and certification.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... eligibility by accessing one or more databases containing information regarding the subscriber's income (“income databases”), the eligible telecommunications carrier must access such income databases and... carrier cannot determine a prospective subscriber's income-based eligibility by accessing income databases...

  14. The AAS Working Group on Accessibility and Disability (WGAD) Year 1 Highlights and Database Access

    NASA Astrophysics Data System (ADS)

    Knierman, Karen A.; Diaz Merced, Wanda; Aarnio, Alicia; Garcia, Beatriz; Monkiewicz, Jacqueline A.; Murphy, Nicholas Arnold

    2017-06-01

    The AAS Working Group on Accessibility and Disability (WGAD) was formed in January of 2016 with the express purpose of seeking equity of opportunity and building inclusive practices for disabled astronomers at all educational and career stages. In this presentation, we will provide a summary of current activities, focusing on developing best practices for accessibility with respect to astronomical databases, publications, and meetings. Due to the reliance of space sciences on databases, it is important to have user centered design systems for data retrieval. The cognitive overload that may be experienced by users of current databases may be mitigated by use of multi-modal interfaces such as xSonify. Such interfaces would be in parallel or outside the original database and would not require additional software efforts from the original database. WGAD is partnering with the IAU Commission C1 WG Astronomy for Equity and Inclusion to develop such accessibility tools for databases and methods for user testing. To collect data on astronomical conference and meeting accessibility considerations, WGAD solicited feedback from January AAS attendees via a web form. These data, together with upcoming input from the community and analysis of accessibility documents of similar conferences, will be used to create a meeting accessibility document. Additionally, we will update the progress of journal access guidelines and our social media presence via Twitter. We recommend that astronomical journals form committees to evaluate the accessibility of their publications by performing user-centered usability studies.

  15. Advanced Software Development Workstation Project, phase 3

    NASA Technical Reports Server (NTRS)

    1991-01-01

    ACCESS provides a generic capability to develop software information system applications which are explicitly intended to facilitate software reuse. In addition, it provides the capability to retrofit existing large applications with a user friendly front end for preparation of input streams in a way that will reduce required training time, improve the productivity even of experienced users, and increase accuracy. Current and past work shows that ACCESS will be scalable to much larger object bases.

  16. The iPlant collaborative: cyberinfrastructure for enabling data to discovery for the life sciences

    USDA-ARS?s Scientific Manuscript database

    The iPlant Collaborative provides life science research communities access to comprehensive, scalable, and cohesive computational infrastructure for data management; identify management; collaboration tools; and cloud, high-performance, high-throughput computing. iPlant provides training, learning m...

  17. Real-time Data Access to First Responders: A VORB application

    NASA Astrophysics Data System (ADS)

    Lu, S.; Kim, J. B.; Bryant, P.; Foley, S.; Vernon, F.; Rajasekar, A.; Meier, S.

    2006-12-01

    Getting information to first responders is not an easy task. The sensors that provide the information are diverse in formats and come from many disciplines. They are also distributed by location, transmit data at different frequencies and are managed and owned by autonomous administrative entities. Pulling such types of data in real-time, needs a very robust sensor network with reliable data transport and buffering capabilities. Moreover, the system should be extensible and scalable in numbers and sensor types. ROADNet is a real- time sensor network project at UCSD gathering diverse environmental data in real-time or near-real-time. VORB (Virtual Object Ring Buffer) is the middleware used in ROADNet offering simple, uniform and scalable real-time data management for discovering (through metadata), accessing and archiving real-time data and data streams. Recent development in VORB, a web API, has offered quick and simple real-time data integration with web applications. In this poster, we discuss one application developed as part of ROADNet. SMER (Santa Margarita Ecological Reserve) is located in interior Southern California, a region prone to catastrophic wildfires each summer and fall. To provide data during emergencies, we have applied the VORB framework to develop a web-based application for providing access to diverse sensor data including weather data, heat sensor information, and images from cameras. Wildfire fighters have access to real-time data about weather and heat conditions in the area and view pictures taken from cameras at multiple points in the Reserve to pinpoint problem areas. Moreover, they can browse archived images and sensor data from earlier times to provide a comparison framework. To show scalability of the system, we have expanded the sensor network under consideration through other areas in Southern California including sensors accessible by Los Angeles County Fire Department (LACOFD) and those available through the High Performance Wireless Research and Education Network (HPWREN). The poster will discuss the system architecture and components, the types of sensor being used and usage scenarios. The system is currently operational through the SMER web-site.

  18. LexisNexis

    EPA Pesticide Factsheets

    LexisNexis provides access to electronic legal and non-legal research databases to the Agency's attorneys, administrative law judges, law clerks, investigators, and certain non-legal staff (e.g. staff in the Office of Public Affairs). The agency requires access to the following types of electronic databases: Legal databases, Non-legal databases, Public Records databases, and Financial databases.

  19. Visibiome: an efficient microbiome search engine based on a scalable, distributed architecture.

    PubMed

    Azman, Syafiq Kamarul; Anwar, Muhammad Zohaib; Henschel, Andreas

    2017-07-24

    Given the current influx of 16S rRNA profiles of microbiota samples, it is conceivable that large amounts of them eventually are available for search, comparison and contextualization with respect to novel samples. This process facilitates the identification of similar compositional features in microbiota elsewhere and therefore can help to understand driving factors for microbial community assembly. We present Visibiome, a microbiome search engine that can perform exhaustive, phylogeny based similarity search and contextualization of user-provided samples against a comprehensive dataset of 16S rRNA profiles environments, while tackling several computational challenges. In order to scale to high demands, we developed a distributed system that combines web framework technology, task queueing and scheduling, cloud computing and a dedicated database server. To further ensure speed and efficiency, we have deployed Nearest Neighbor search algorithms, capable of sublinear searches in high-dimensional metric spaces in combination with an optimized Earth Mover Distance based implementation of weighted UniFrac. The search also incorporates pairwise (adaptive) rarefaction and optionally, 16S rRNA copy number correction. The result of a query microbiome sample is the contextualization against a comprehensive database of microbiome samples from a diverse range of environments, visualized through a rich set of interactive figures and diagrams, including barchart-based compositional comparisons and ranking of the closest matches in the database. Visibiome is a convenient, scalable and efficient framework to search microbiomes against a comprehensive database of environmental samples. The search engine leverages a popular but computationally expensive, phylogeny based distance metric, while providing numerous advantages over the current state of the art tool.

  20. A method of demand-driven and data-centric Web service configuration for flexible business process implementation

    NASA Astrophysics Data System (ADS)

    Xu, Boyi; Xu, Li Da; Fei, Xiang; Jiang, Lihong; Cai, Hongming; Wang, Shuai

    2017-08-01

    Facing the rapidly changing business environments, implementation of flexible business process is crucial, but difficult especially in data-intensive application areas. This study aims to provide scalable and easily accessible information resources to leverage business process management. In this article, with a resource-oriented approach, enterprise data resources are represented as data-centric Web services, grouped on-demand of business requirement and configured dynamically to adapt to changing business processes. First, a configurable architecture CIRPA involving information resource pool is proposed to act as a scalable and dynamic platform to virtualise enterprise information resources as data-centric Web services. By exposing data-centric resources as REST services in larger granularities, tenant-isolated information resources could be accessed in business process execution. Second, dynamic information resource pool is designed to fulfil configurable and on-demand data accessing in business process execution. CIRPA also isolates transaction data from business process while supporting diverse business processes composition. Finally, a case study of using our method in logistics application shows that CIRPA provides an enhanced performance both in static service encapsulation and dynamic service execution in cloud computing environment.

  1. Python Winding Itself Around Datacubes: How to Access Massive Multi-Dimensional Arrays in a Pythonic Way

    NASA Astrophysics Data System (ADS)

    Merticariu, Vlad; Misev, Dimitar; Baumann, Peter

    2017-04-01

    While python has developed into the lingua franca in Data Science there is often a paradigm break when accessing specialized tools. In particular for one of the core data categories in science and engineering, massive multi-dimensional arrays, out-of-memory solutions typically employ their own, different models. We discuss this situation on the example of the scalable open-source array engine, rasdaman ("raster data manager") which offers access to and processing of Petascale multi-dimensional arrays through an SQL-style array query language, rasql. Such queries are executed in the server on a storage engine utilizing adaptive array partitioning and based on a processing engine implementing a "tile streaming" paradigm to allow processing of arrays massively larger than server RAM. The rasdaman QL has acted as blueprint for forthcoming ISO Array SQL and the Open Geospatial Consortium (OGC) geo analytics language, Web Coverage Processing Service, adopted in 2008. Not surprisingly, rasdaman is OGC and INSPIRE Reference Implementation for their "Big Earth Data" standards suite. Recently, rasdaman has been augmented with a python interface which allows to transparently interact with the database (credits go to Siddharth Shukla's Master Thesis at Jacobs University). Programmers do not need to know the rasdaman query language, as the operators are silently transformed, through lazy evaluation, into queries. Arrays delivered are likewise automatically transformed into their python representation. In the talk, the rasdaman concept will be illustrated with the help of large-scale real-life examples of operational satellite image and weather data services, and sample python code.

  2. CloVR-Comparative: automated, cloud-enabled comparative microbial genome sequence analysis pipeline.

    PubMed

    Agrawal, Sonia; Arze, Cesar; Adkins, Ricky S; Crabtree, Jonathan; Riley, David; Vangala, Mahesh; Galens, Kevin; Fraser, Claire M; Tettelin, Hervé; White, Owen; Angiuoli, Samuel V; Mahurkar, Anup; Fricke, W Florian

    2017-04-27

    The benefit of increasing genomic sequence data to the scientific community depends on easy-to-use, scalable bioinformatics support. CloVR-Comparative combines commonly used bioinformatics tools into an intuitive, automated, and cloud-enabled analysis pipeline for comparative microbial genomics. CloVR-Comparative runs on annotated complete or draft genome sequences that are uploaded by the user or selected via a taxonomic tree-based user interface and downloaded from NCBI. CloVR-Comparative runs reference-free multiple whole-genome alignments to determine unique, shared and core coding sequences (CDSs) and single nucleotide polymorphisms (SNPs). Output includes short summary reports and detailed text-based results files, graphical visualizations (phylogenetic trees, circular figures), and a database file linked to the Sybil comparative genome browser. Data up- and download, pipeline configuration and monitoring, and access to Sybil are managed through CloVR-Comparative web interface. CloVR-Comparative and Sybil are distributed as part of the CloVR virtual appliance, which runs on local computers or the Amazon EC2 cloud. Representative datasets (e.g. 40 draft and complete Escherichia coli genomes) are processed in <36 h on a local desktop or at a cost of <$20 on EC2. CloVR-Comparative allows anybody with Internet access to run comparative genomics projects, while eliminating the need for on-site computational resources and expertise.

  3. Science Initiatives of the US Virtual Astronomical Observatory

    NASA Astrophysics Data System (ADS)

    Hanisch, R. J.

    2012-09-01

    The United States Virtual Astronomical Observatory program is the operational facility successor to the National Virtual Observatory development project. The primary goal of the US VAO is to build on the standards, protocols, and associated infrastructure developed by NVO and the International Virtual Observatory Alliance partners and to bring to fruition a suite of applications and web-based tools that greatly enhance the research productivity of professional astronomers. To this end, and guided by the advice of our Science Council (Fabbiano et al. 2011), we have focused on five science initiatives in the first two years of VAO operations: 1) scalable cross-comparisons between astronomical source catalogs, 2) dynamic spectral energy distribution construction, visualization, and model fitting, 3) integration and periodogram analysis of time series data from the Harvard Time Series Center and NASA Star and Exoplanet Database, 4) integration of VO data discovery and access tools into the IRAF data analysis environment, and 5) a web-based portal to VO data discovery, access, and display tools. We are also developing tools for data linking and semantic discovery, and have a plan for providing data mining and advanced statistical analysis resources for VAO users. Initial versions of these applications and web-based services are being released over the course of the summer and fall of 2011, with further updates and enhancements planned for throughout 2012 and beyond.

  4. The plant phenological online database (PPODB): an online database for long-term phenological data

    NASA Astrophysics Data System (ADS)

    Dierenbach, Jonas; Badeck, Franz-W.; Schaber, Jörg

    2013-09-01

    We present an online database that provides unrestricted and free access to over 16 million plant phenological observations from over 8,000 stations in Central Europe between the years 1880 and 2009. Unique features are (1) a flexible and unrestricted access to a full-fledged database, allowing for a wide range of individual queries and data retrieval, (2) historical data for Germany before 1951 ranging back to 1880, and (3) more than 480 curated long-term time series covering more than 100 years for individual phenological phases and plants combined over Natural Regions in Germany. Time series for single stations or Natural Regions can be accessed through a user-friendly graphical geo-referenced interface. The joint databases made available with the plant phenological database PPODB render accessible an important data source for further analyses of long-term changes in phenology. The database can be accessed via www.ppodb.de .

  5. 47 CFR 15.711 - Interference avoidance methods.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... channel availability for a TVBD is determined based on the geo-location and database access method described in paragraphs (a) and (b) of this section. (a) Geo-location and database access. A TVBD shall rely on the geo-location and database access mechanism to identify available television channels...

  6. 47 CFR 15.711 - Interference avoidance methods.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... channel availability for a TVBD is determined based on the geo-location and database access method described in paragraphs (a) and (b) of this section. (a) Geo-location and database access. A TVBD shall rely on the geo-location and database access mechanism to identify available television channels...

  7. 47 CFR 15.711 - Interference avoidance methods.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... channel availability for a TVBD is determined based on the geo-location and database access method described in paragraphs (a) and (b) of this section. (a) Geo-location and database access. A TVBD shall rely on the geo-location and database access mechanism to identify available television channels...

  8. 47 CFR 15.711 - Interference avoidance methods.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... channel availability for a TVBD is determined based on the geo-location and database access method described in paragraphs (a) and (b) of this section. (a) Geo-location and database access. A TVBD shall rely on the geo-location and database access mechanism to identify available television channels...

  9. WASP: a Web-based Allele-Specific PCR assay designing tool for detecting SNPs and mutations

    PubMed Central

    Wangkumhang, Pongsakorn; Chaichoompu, Kridsadakorn; Ngamphiw, Chumpol; Ruangrit, Uttapong; Chanprasert, Juntima; Assawamakin, Anunchai; Tongsima, Sissades

    2007-01-01

    Background Allele-specific (AS) Polymerase Chain Reaction is a convenient and inexpensive method for genotyping Single Nucleotide Polymorphisms (SNPs) and mutations. It is applied in many recent studies including population genetics, molecular genetics and pharmacogenomics. Using known AS primer design tools to create primers leads to cumbersome process to inexperience users since information about SNP/mutation must be acquired from public databases prior to the design. Furthermore, most of these tools do not offer the mismatch enhancement to designed primers. The available web applications do not provide user-friendly graphical input interface and intuitive visualization of their primer results. Results This work presents a web-based AS primer design application called WASP. This tool can efficiently design AS primers for human SNPs as well as mutations. To assist scientists with collecting necessary information about target polymorphisms, this tool provides a local SNP database containing over 10 million SNPs of various populations from public domain databases, namely NCBI dbSNP, HapMap and JSNP respectively. This database is tightly integrated with the tool so that users can perform the design for existing SNPs without going off the site. To guarantee specificity of AS primers, the proposed system incorporates a primer specificity enhancement technique widely used in experiment protocol. In particular, WASP makes use of different destabilizing effects by introducing one deliberate 'mismatch' at the penultimate (second to last of the 3'-end) base of AS primers to improve the resulting AS primers. Furthermore, WASP offers graphical user interface through scalable vector graphic (SVG) draw that allow users to select SNPs and graphically visualize designed primers and their conditions. Conclusion WASP offers a tool for designing AS primers for both SNPs and mutations. By integrating the database for known SNPs (using gene ID or rs number), this tool facilitates the awkward process of getting flanking sequences and other related information from public SNP databases. It takes into account the underlying destabilizing effect to ensure the effectiveness of designed primers. With user-friendly SVG interface, WASP intuitively presents resulting designed primers, which assist users to export or to make further adjustment to the design. This software can be freely accessed at . PMID:17697334

  10. CloudMan as a platform for tool, data, and analysis distribution.

    PubMed

    Afgan, Enis; Chapman, Brad; Taylor, James

    2012-11-27

    Cloud computing provides an infrastructure that facilitates large scale computational analysis in a scalable, democratized fashion, However, in this context it is difficult to ensure sharing of an analysis environment and associated data in a scalable and precisely reproducible way. CloudMan (usecloudman.org) enables individual researchers to easily deploy, customize, and share their entire cloud analysis environment, including data, tools, and configurations. With the enabled customization and sharing of instances, CloudMan can be used as a platform for collaboration. The presented solution improves accessibility of cloud resources, tools, and data to the level of an individual researcher and contributes toward reproducibility and transparency of research solutions.

  11. Reducing the language content in ToM tests: A developmental scale.

    PubMed

    Burnel, Morgane; Perrone-Bertolotti, Marcela; Reboul, Anne; Baciu, Monica; Durrleman, Stephanie

    2018-02-01

    The goal of the current study was to statistically evaluate the reliable scalability of a set of tasks designed to assess Theory of Mind (ToM) without language as a confounding variable. This tool might be useful to study ToM in populations where language is impaired or to study links between language and ToM. Low verbal versions of the ToM tasks proposed by Wellman and Liu (2004) for their scale were tested in 234 children (2.5 years to 11.9 years). Results showed that 5 of the tasks formed a scale according to both Guttman and Rasch models whereas all 6 tasks could form a scale according to the Rasch model only. The main difference from the original scale was that the Explicit False Belief task could be included whereas the Knowledge Access (KA) task could not. The authors argue that the more verbal version of the KA task administered in previous studies could have measured language understanding rather than ToM. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  12. A distributed cloud-based cyberinfrastructure framework for integrated bridge monitoring

    NASA Astrophysics Data System (ADS)

    Jeong, Seongwoon; Hou, Rui; Lynch, Jerome P.; Sohn, Hoon; Law, Kincho H.

    2017-04-01

    This paper describes a cloud-based cyberinfrastructure framework for the management of the diverse data involved in bridge monitoring. Bridge monitoring involves various hardware systems, software tools and laborious activities that include, for examples, a structural health monitoring (SHM), sensor network, engineering analysis programs and visual inspection. Very often, these monitoring systems, tools and activities are not coordinated, and the collected information are not shared. A well-designed integrated data management framework can support the effective use of the data and, thereby, enhance bridge management and maintenance operations. The cloud-based cyberinfrastructure framework presented herein is designed to manage not only sensor measurement data acquired from the SHM system, but also other relevant information, such as bridge engineering model and traffic videos, in an integrated manner. For the scalability and flexibility, cloud computing services and distributed database systems are employed. The information stored can be accessed through standard web interfaces. For demonstration, the cyberinfrastructure system is implemented for the monitoring of the bridges located along the I-275 Corridor in the state of Michigan.

  13. Freely Accessible Chemical Database Resources of Compounds for in Silico Drug Discovery.

    PubMed

    Yang, JingFang; Wang, Di; Jia, Chenyang; Wang, Mengyao; Hao, GeFei; Yang, GuangFu

    2018-05-07

    In silico drug discovery has been proved to be a solidly established key component in early drug discovery. However, this task is hampered by the limitation of quantity and quality of compound databases for screening. In order to overcome these obstacles, freely accessible database resources of compounds have bloomed in recent years. Nevertheless, how to choose appropriate tools to treat these freely accessible databases are crucial. To the best of our knowledge, this is the first systematic review on this issue. The existed advantages and drawbacks of chemical databases were analyzed and summarized based on the collected six categories of freely accessible chemical databases from literature in this review. Suggestions on how and in which conditions the usage of these databases could be reasonable were provided. Tools and procedures for building 3D structure chemical libraries were also introduced. In this review, we described the freely accessible chemical database resources for in silico drug discovery. In particular, the chemical information for building chemical database appears as attractive resources for drug design to alleviate experimental pressure. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  14. Correlates of Access to Business Research Databases

    ERIC Educational Resources Information Center

    Gottfried, John C.

    2010-01-01

    This study examines potential correlates of business research database access through academic libraries serving top business programs in the United States. Results indicate that greater access to research databases is related to enrollment in graduate business programs, but not to overall enrollment or status as a public or private institution.…

  15. 47 CFR 51.217 - Nondiscriminatory access: Telephone numbers, operator services, directory assistance services...

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... to have access to its directory assistance services, including directory assistance databases, so... provider, including transfer of the LECs' directory assistance databases in readily accessible magnetic.... Updates to the directory assistance database shall be made in the same format as the initial transfer...

  16. Multifarenes: new modular cavitands.

    PubMed

    Parvari, Galit; Annamalai, Senthilmurugan; Borovoi, Iris; Chechik, Helena; Botoshansky, Mark; Pappo, Doron; Keinan, Ehud

    2014-03-07

    Multifarenes, a new class of macrocycles, which are constructed of alternating building blocks, are conveniently accessible by three complementary syntheses that provide modularity and scalability. In addition to metal-ion coordination, these cavitands show increased flexibility with increasing ring size, offering opportunities for induced fit to guest molecules.

  17. PACKMAN-Net: A Distributed, Open-Access, and Scalable Network of User-Friendly Space Weather Stations

    NASA Astrophysics Data System (ADS)

    Zorzano, M.-P.; Martín-Torres, J.; Mathanlal, T.; Vakkada Ramachandran, A.; Ramirez-Luque, J.-A.

    2018-04-01

    The purpose of this work is to demonstrate the operability of a network of small-sized detectors of the PACKMAN instrument, operated simultaneously to provide real time cosmic ray and solar activity monitoring over the entire planet.

  18. A computational platform to maintain and migrate manual functional annotations for BioCyc databases.

    PubMed

    Walsh, Jesse R; Sen, Taner Z; Dickerson, Julie A

    2014-10-12

    BioCyc databases are an important resource for information on biological pathways and genomic data. Such databases represent the accumulation of biological data, some of which has been manually curated from literature. An essential feature of these databases is the continuing data integration as new knowledge is discovered. As functional annotations are improved, scalable methods are needed for curators to manage annotations without detailed knowledge of the specific design of the BioCyc database. We have developed CycTools, a software tool which allows curators to maintain functional annotations in a model organism database. This tool builds on existing software to improve and simplify annotation data imports of user provided data into BioCyc databases. Additionally, CycTools automatically resolves synonyms and alternate identifiers contained within the database into the appropriate internal identifiers. Automating steps in the manual data entry process can improve curation efforts for major biological databases. The functionality of CycTools is demonstrated by transferring GO term annotations from MaizeCyc to matching proteins in CornCyc, both maize metabolic pathway databases available at MaizeGDB, and by creating strain specific databases for metabolic engineering.

  19. Accessing Data Federations with CVMFS

    DOE PAGES

    Weitzel, Derek; Bockelman, Brian; Dykstra, Dave; ...

    2017-11-23

    Data federations have become an increasingly common tool for large collaborations such as CMS and Atlas to efficiently distribute large data files. Unfortunately, these typically are implemented with weak namespace semantics and a non-POSIX API. On the other hand, CVMFS has provided a POSIX-compliant read-only interface for use cases with a small working set size (such as software distribution). The metadata required for the CVMFS POSIX interface is distributed through a caching hierarchy, allowing it to scale to the level of about a hundred thousand hosts. In this paper, we will describe our contributions to CVMFS that merges the datamore » scalability of XRootD-based data federations (such as AAA) with metadata scalability and POSIX interface of CVMFS. We modified CVMFS so it can serve unmodified files without copying them to the repository server. CVMFS 2.2.0 is also able to redirect requests for data files to servers outside of the CVMFS content distribution network. Finally, we added the ability to manage authorization and authentication using security credentials such as X509 proxy certificates. We combined these modifications with the OSGs StashCache regional XRootD caching infrastructure to create a cached data distribution network. Here, we will show performance metrics accessing the data federation through CVMFS compared to direct data federation access. Additionally, we will discuss the improved user experience of providing access to a data federation through a POSIX filesystem.« less

  20. Accessing Data Federations with CVMFS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Weitzel, Derek; Bockelman, Brian; Dykstra, Dave

    Data federations have become an increasingly common tool for large collaborations such as CMS and Atlas to efficiently distribute large data files. Unfortunately, these typically are implemented with weak namespace semantics and a non-POSIX API. On the other hand, CVMFS has provided a POSIX-compliant read-only interface for use cases with a small working set size (such as software distribution). The metadata required for the CVMFS POSIX interface is distributed through a caching hierarchy, allowing it to scale to the level of about a hundred thousand hosts. In this paper, we will describe our contributions to CVMFS that merges the datamore » scalability of XRootD-based data federations (such as AAA) with metadata scalability and POSIX interface of CVMFS. We modified CVMFS so it can serve unmodified files without copying them to the repository server. CVMFS 2.2.0 is also able to redirect requests for data files to servers outside of the CVMFS content distribution network. Finally, we added the ability to manage authorization and authentication using security credentials such as X509 proxy certificates. We combined these modifications with the OSGs StashCache regional XRootD caching infrastructure to create a cached data distribution network. Here, we will show performance metrics accessing the data federation through CVMFS compared to direct data federation access. Additionally, we will discuss the improved user experience of providing access to a data federation through a POSIX filesystem.« less

  1. Accessing Data Federations with CVMFS

    NASA Astrophysics Data System (ADS)

    Weitzel, Derek; Bockelman, Brian; Dykstra, Dave; Blomer, Jakob; Meusel, Ren

    2017-10-01

    Data federations have become an increasingly common tool for large collaborations such as CMS and Atlas to efficiently distribute large data files. Unfortunately, these typically are implemented with weak namespace semantics and a non-POSIX API. On the other hand, CVMFS has provided a POSIX-compliant read-only interface for use cases with a small working set size (such as software distribution). The metadata required for the CVMFS POSIX interface is distributed through a caching hierarchy, allowing it to scale to the level of about a hundred thousand hosts. In this paper, we will describe our contributions to CVMFS that merges the data scalability of XRootD-based data federations (such as AAA) with metadata scalability and POSIX interface of CVMFS. We modified CVMFS so it can serve unmodified files without copying them to the repository server. CVMFS 2.2.0 is also able to redirect requests for data files to servers outside of the CVMFS content distribution network. Finally, we added the ability to manage authorization and authentication using security credentials such as X509 proxy certificates. We combined these modifications with the OSGs StashCache regional XRootD caching infrastructure to create a cached data distribution network. We will show performance metrics accessing the data federation through CVMFS compared to direct data federation access. Additionally, we will discuss the improved user experience of providing access to a data federation through a POSIX filesystem.

  2. Fast and Scalable Computation of the Forward and Inverse Discrete Periodic Radon Transform.

    PubMed

    Carranza, Cesar; Llamocca, Daniel; Pattichis, Marios

    2016-01-01

    The discrete periodic radon transform (DPRT) has extensively been used in applications that involve image reconstructions from projections. Beyond classic applications, the DPRT can also be used to compute fast convolutions that avoids the use of floating-point arithmetic associated with the use of the fast Fourier transform. Unfortunately, the use of the DPRT has been limited by the need to compute a large number of additions and the need for a large number of memory accesses. This paper introduces a fast and scalable approach for computing the forward and inverse DPRT that is based on the use of: a parallel array of fixed-point adder trees; circular shift registers to remove the need for accessing external memory components when selecting the input data for the adder trees; an image block-based approach to DPRT computation that can fit the proposed architecture to available resources; and fast transpositions that are computed in one or a few clock cycles that do not depend on the size of the input image. As a result, for an N × N image (N prime), the proposed approach can compute up to N(2) additions per clock cycle. Compared with the previous approaches, the scalable approach provides the fastest known implementations for different amounts of computational resources. For example, for a 251×251 image, for approximately 25% fewer flip-flops than required for a systolic implementation, we have that the scalable DPRT is computed 36 times faster. For the fastest case, we introduce optimized just 2N + ⌈log(2) N⌉ + 1 and 2N + 3 ⌈log(2) N⌉ + B + 2 cycles, architectures that can compute the DPRT and its inverse in respectively, where B is the number of bits used to represent each input pixel. On the other hand, the scalable DPRT approach requires more 1-b additions than for the systolic implementation and provides a tradeoff between speed and additional 1-b additions. All of the proposed DPRT architectures were implemented in VHSIC Hardware Description Language (VHDL) and validated using an Field-Programmable Gate Array (FPGA) implementation.

  3. NCBI2RDF: enabling full RDF-based access to NCBI databases.

    PubMed

    Anguita, Alberto; García-Remesal, Miguel; de la Iglesia, Diana; Maojo, Victor

    2013-01-01

    RDF has become the standard technology for enabling interoperability among heterogeneous biomedical databases. The NCBI provides access to a large set of life sciences databases through a common interface called Entrez. However, the latter does not provide RDF-based access to such databases, and, therefore, they cannot be integrated with other RDF-compliant databases and accessed via SPARQL query interfaces. This paper presents the NCBI2RDF system, aimed at providing RDF-based access to the complete NCBI data repository. This API creates a virtual endpoint for servicing SPARQL queries over different NCBI repositories and presenting to users the query results in SPARQL results format, thus enabling this data to be integrated and/or stored with other RDF-compliant repositories. SPARQL queries are dynamically resolved, decomposed, and forwarded to the NCBI-provided E-utilities programmatic interface to access the NCBI data. Furthermore, we show how our approach increases the expressiveness of the native NCBI querying system, allowing several databases to be accessed simultaneously. This feature significantly boosts productivity when working with complex queries and saves time and effort to biomedical researchers. Our approach has been validated with a large number of SPARQL queries, thus proving its reliability and enhanced capabilities in biomedical environments.

  4. SRD: a Staphylococcus regulatory RNA database.

    PubMed

    Sassi, Mohamed; Augagneur, Yoann; Mauro, Tony; Ivain, Lorraine; Chabelskaya, Svetlana; Hallier, Marc; Sallou, Olivier; Felden, Brice

    2015-05-01

    An overflow of regulatory RNAs (sRNAs) was identified in a wide range of bacteria. We designed and implemented a new resource for the hundreds of sRNAs identified in Staphylococci, with primary focus on the human pathogen Staphylococcus aureus. The "Staphylococcal Regulatory RNA Database" (SRD, http://srd.genouest.org/) compiled all published data in a single interface including genetic locations, sequences and other features. SRD proposes novel and simplified identifiers for Staphylococcal regulatory RNAs (srn) based on the sRNA's genetic location in S. aureus strain N315 which served as a reference. From a set of 894 sequences and after an in-depth cleaning, SRD provides a list of 575 srn exempt of redundant sequences. For each sRNA, their experimental support(s) is provided, allowing the user to individually assess their validity and significance. RNA-seq analysis performed on strains N315, NCTC8325, and Newman allowed us to provide further details, upgrade the initial annotation, and identified 159 RNA-seq independent transcribed sRNAs. The lists of 575 and 159 sRNAs sequences were used to predict the number and location of srns in 18 S. aureus strains and 10 other Staphylococci. A comparison of the srn contents within 32 Staphylococcal genomes revealed a poor conservation between species. In addition, sRNA structure predictions obtained with MFold are accessible. A BLAST server and the intaRNA program, which is dedicated to target prediction, were implemented. SRD is the first sRNA database centered on a genus; it is a user-friendly and scalable device with the possibility to submit new sequences that should spread in the literature. © 2015 Sassi et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  5. Protein Simulation Data in the Relational Model.

    PubMed

    Simms, Andrew M; Daggett, Valerie

    2012-10-01

    High performance computing is leading to unprecedented volumes of data. Relational databases offer a robust and scalable model for storing and analyzing scientific data. However, these features do not come without a cost-significant design effort is required to build a functional and efficient repository. Modeling protein simulation data in a relational database presents several challenges: the data captured from individual simulations are large, multi-dimensional, and must integrate with both simulation software and external data sites. Here we present the dimensional design and relational implementation of a comprehensive data warehouse for storing and analyzing molecular dynamics simulations using SQL Server.

  6. Protein Simulation Data in the Relational Model

    PubMed Central

    Simms, Andrew M.; Daggett, Valerie

    2011-01-01

    High performance computing is leading to unprecedented volumes of data. Relational databases offer a robust and scalable model for storing and analyzing scientific data. However, these features do not come without a cost—significant design effort is required to build a functional and efficient repository. Modeling protein simulation data in a relational database presents several challenges: the data captured from individual simulations are large, multi-dimensional, and must integrate with both simulation software and external data sites. Here we present the dimensional design and relational implementation of a comprehensive data warehouse for storing and analyzing molecular dynamics simulations using SQL Server. PMID:23204646

  7. Scalable metagenomic taxonomy classification using a reference genome database

    PubMed Central

    Ames, Sasha K.; Hysom, David A.; Gardner, Shea N.; Lloyd, G. Scott; Gokhale, Maya B.; Allen, Jonathan E.

    2013-01-01

    Motivation: Deep metagenomic sequencing of biological samples has the potential to recover otherwise difficult-to-detect microorganisms and accurately characterize biological samples with limited prior knowledge of sample contents. Existing metagenomic taxonomic classification algorithms, however, do not scale well to analyze large metagenomic datasets, and balancing classification accuracy with computational efficiency presents a fundamental challenge. Results: A method is presented to shift computational costs to an off-line computation by creating a taxonomy/genome index that supports scalable metagenomic classification. Scalable performance is demonstrated on real and simulated data to show accurate classification in the presence of novel organisms on samples that include viruses, prokaryotes, fungi and protists. Taxonomic classification of the previously published 150 giga-base Tyrolean Iceman dataset was found to take <20 h on a single node 40 core large memory machine and provide new insights on the metagenomic contents of the sample. Availability: Software was implemented in C++ and is freely available at http://sourceforge.net/projects/lmat Contact: allen99@llnl.gov Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23828782

  8. Building the European Seismological Research Infrastructure: results from 4 years NERIES EC project

    NASA Astrophysics Data System (ADS)

    van Eck, T.; Giardini, D.

    2010-12-01

    The EC Research Infrastructure (RI) project, Network of Research Infrastructures for European Seismology (NERIES), implemented a comprehensive European integrated RI for earthquake seismological data that is scalable and sustainable. NERIES opened a significant amount of additional seismological data, integrated different distributed data archives, implemented and produced advanced analysis tools and advanced software packages and tools. A single seismic data portal provides a single access point and overview for European seismological data available for the earth science research community. Additional data access tools and sites have been implemented to meet user and robustness requirements, notably those at the EMSC and ORFEUS. The datasets compiled in NERIES and available through the portal include among others: - The expanded Virtual European Broadband Seismic Network (VEBSN) with real-time access to more then 500 stations from > 53 observatories. This data is continuously monitored, quality controlled and archived in the European Integrated Distributed waveform Archive (EIDA). - A unique integration of acceleration datasets from seven networks in seven European or associated countries centrally accessible in a homogeneous format, thus forming the core comprehensive European acceleration database. Standardized parameter analysis and actual software are included in the database. - A Distributed Archive of Historical Earthquake Data (AHEAD) for research purposes, containing among others a comprehensive European Macroseismic Database and Earthquake Catalogue (1000 - 1963, M ≥5.8), including analysis tools. - Data from 3 one year OBS deployments at three sites, Atlantic, Ionian and Ligurian Sea within the general SEED format, thus creating the core integrated data base for ocean, sea and land based seismological observatories. Tools to facilitate analysis and data mining of the RI datasets are: - A comprehensive set of European seismological velocity reference model including a standardized model description with several visualisation tools currently adapted on a global scale. - An integrated approach to seismic hazard modelling and forecasting, a community accepted forecasting testing and model validation approach and the core hazard portal developed along the same technologies as the NERIES data portal. - Implemented homogeneous shakemap estimation tools at several large European observatories and a complementary new loss estimation software tool. - A comprehensive set of new techniques for geotechnical site characterization with relevant software packages documented and maintained (www.geopsy.org). - A set of software packages for data mining, data reduction, data exchange and information management in seismology as research and observatory analysis tools NERIES has a long-term impact and is coordinated with related US initiatives IRIS and EarthScope. The follow-up EC project of NERIES, NERA (2010 - 2014), is funded and will integrate the seismological and the earthquake engineering infrastructures. NERIES further provided the proof of concept for the ESFRI2008 initiative: the European Plate Observing System (EPOS). Its preparatory phase (2010 - 2014) is also funded by the EC.

  9. DoD Identity Matching Engine for Security and Analysis (IMESA) Access to Criminal Justice Information (CJI) and Terrorist Screening Databases (TSDB)

    DTIC Science & Technology

    2016-05-04

    IMESA) Access to Criminal Justice Information (CJI) and Terrorist Screening Databases (TSDB) References: See Enclosure 1 1. PURPOSE. In...CJI database mirror image files. (3) Memorandums of understanding with the FBI CJIS as the data broker for DoD organizations that need access ...not for access determinations. (3) Legal restrictions established by the Sex Offender Registration and Notification Act (SORNA) jurisdictions on

  10. Archiving and Near Real Time Visualization of USGS Instantaneous Data

    NASA Astrophysics Data System (ADS)

    Zaslavsky, I.; Ryan, D.; Whitenack, T.; Valentine, D. W.; Rodriguez, M.

    2009-12-01

    The CUAHSI Hydrologic Information System project has been developing databases, services and online and desktop software applications supporting standards-based publication and access to large volumes of hydrologic data from US federal agencies and academic partners. In particular, the CUAHSI WaterML 1.x schema specification for exchanging hydrologic time series, earlier published as an OGC Discussion Paper (2007), has been adopted by the United States Geological Survey to provide web service access to USGS daily values and instantaneous data. The latter service, making available raw measurements of discharge, gage height and several other parameters for over 10,000 USGS real time measurement points, was announced by USGS, as an experimental WaterML-compliant service, at the end of July 2009. We demonstrate an online application that leverages the new service for nearly continuous harvesting of USGS real time data, and simultaneous visualization and analysis of the data streams. To make this possible, we integrate service components of the CUAHSI software stack with Open Source Data Turbine (OSDT) system, an NSF-supported software environment for robust and scalable assimilation of multimedia data streams (e.g. from sensors), and interfacing with a variety of viewers, databases, archival systems and client applications. Our application continuously queries USGS Instantaneous water data service (which provides access to 15-min measurements updated at USGS every 4 hours), and maps the results for each station-variable combination to a separate "channel", which is used by OSDT to quickly access and manipulate the time series. About 15,000 channels are used, which makes it by far the largest deployment of OSDT. Using RealTime Data Viewer, users can now select one or more stations of interest (e.g. from upstream or downstream from each other), and observe and annotate simultaneous dynamics in the respective discharge and gage height values, using fast forward or backward modes, real-time mode, etc. Memory management, scheduling service-based retrieval from USGS web services, and organizing access to 7,330 selected stations, turned out to be the major challenges in this project. To allow station navigation, they are grouped by state and county in the user interface. Memory footprint has been monitored under different Java VM settings, to find the correct regime. These and other solutions are discussed in the paper, and accompanied with a series of examples of simultaneous visualization of discharge from multiple stations as a component of hydrologic analysis.

  11. The NCAR Research Data Archive's Hybrid Approach for Data Discovery and Access

    NASA Astrophysics Data System (ADS)

    Schuster, D.; Worley, S. J.

    2013-12-01

    The NCAR Research Data Archive (RDA http://rda.ucar.edu) maintains a variety of data discovery and access capabilities for it's 600+ dataset collections to support the varying needs of a diverse user community. In-house developed and standards-based community tools offer services to more than 10,000 users annually. By number of users the largest group is external and access the RDA through web based protocols; the internal NCAR HPC users are fewer in number, but typically access more data volume. This paper will detail the data discovery and access services maintained by the RDA to support both user groups, and show metrics that illustrate how the community is using the services. The distributed search capability enabled by standards-based community tools, such as Geoportal and an OAI-PMH access point that serves multiple metadata standards, provide pathways for external users to initially discover RDA holdings. From here, in-house developed web interfaces leverage primary discovery level metadata databases that support keyword and faceted searches. Internal NCAR HPC users, or those familiar with the RDA, may go directly to the dataset collection of interest and refine their search based on rich file collection metadata. Multiple levels of metadata have proven to be invaluable for discovery within terabyte-sized archives composed of many atmospheric or oceanic levels, hundreds of parameters, and often numerous grid and time resolutions. Once users find the data they want, their access needs may vary as well. A THREDDS data server running on targeted dataset collections enables remote file access through OPENDAP and other web based protocols primarily for external users. In-house developed tools give all users the capability to submit data subset extraction and format conversion requests through scalable, HPC based delayed mode batch processing. Users can monitor their RDA-based data processing progress and receive instructions on how to access the data when it is ready. External users are provided with RDA server generated scripts to download the resulting request output. Similarly they can download native dataset collection files or partial files using Wget or cURL based scripts supplied by the RDA server. Internal users can access the resulting request output or native dataset collection files directly from centralized file systems.

  12. WaveNet: A Web-Based Metocean Data Access, Processing and Analysis Tool. Part 4 - GLOS/GLCFS Database

    DTIC Science & Technology

    2014-06-01

    and Coastal Data Information Program ( CDIP ). This User’s Guide includes step-by-step instructions for accessing the GLOS/GLCFS database via WaveNet...access, processing and analysis tool; part 3 – CDIP database. ERDC/CHL CHETN-xx-14. Vicksburg, MS: U.S. Army Engineer Research and Development Center

  13. Evaluation of an Online Instructional Database Accessed by QR Codes to Support Biochemistry Practical Laboratory Classes

    ERIC Educational Resources Information Center

    Yip, Tor; Melling, Louise; Shaw, Kirsty J.

    2016-01-01

    An online instructional database containing information on commonly used pieces of laboratory equipment was created. In order to make the database highly accessible and to promote its use, QR codes were utilized. The instructional materials were available anytime and accessed using QR codes located on the equipment itself and within undergraduate…

  14. Using ontology databases for scalable query answering, inconsistency detection, and data integration

    PubMed Central

    Dou, Dejing

    2011-01-01

    An ontology database is a basic relational database management system that models an ontology plus its instances. To reason over the transitive closure of instances in the subsumption hierarchy, for example, an ontology database can either unfold views at query time or propagate assertions using triggers at load time. In this paper, we use existing benchmarks to evaluate our method—using triggers—and we demonstrate that by forward computing inferences, we not only improve query time, but the improvement appears to cost only more space (not time). However, we go on to show that the true penalties were simply opaque to the benchmark, i.e., the benchmark inadequately captures load-time costs. We have applied our methods to two case studies in biomedicine, using ontologies and data from genetics and neuroscience to illustrate two important applications: first, ontology databases answer ontology-based queries effectively; second, using triggers, ontology databases detect instance-based inconsistencies—something not possible using views. Finally, we demonstrate how to extend our methods to perform data integration across multiple, distributed ontology databases. PMID:22163378

  15. An Ephemeral Burst-Buffer File System for Scientific Applications

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, Teng; Moody, Adam; Yu, Weikuan

    BurstFS is a distributed file system for node-local burst buffers on high performance computing systems. BurstFS presents a shared file system space across the burst buffers so that applications that use shared files can access the highly-scalable burst buffers without changing their applications.

  16. Social Media Tools for Teaching and Learning

    ERIC Educational Resources Information Center

    Wagner, Ronald

    2011-01-01

    According to Wikipedia, "social media is the media designed to be disseminated through social interaction, created using highly accessible scalable techniques. Social media is the use of web-based and mobile technologies to turn communication into interactive dialogue." Social networks, such as Facebook and Twitter, contain millions of members who…

  17. Scalable and Accurate SMT-Based Model Checking of Data Flow Systems

    DTIC Science & Technology

    2013-10-31

    accessed from C, C++, Java, and OCaml , and provisions have been made to support other languages . CVC4 can be compiled and run on various flavors of...be accessed from C, C++, Java, and OCaml , and provisions have been made to support other languages . CVC4 can be compiled and run on various flavors of...C, C++, Java, and OCaml , and provisions have been made to support other languages . CVC4 can be compiled and run on various flavors of Linux, Mac OS

  18. A graphics to scalable vector graphics adaptation framework for progressive remote line rendering on mobile devices

    NASA Astrophysics Data System (ADS)

    Le, Minh Tuan; Nguyen, Congdu; Yoon, Dae-Il; Jung, Eun Ku; Kim, Hae-Kwang

    2007-12-01

    In this paper, we introduce a graphics to Scalable Vector Graphics (SVG) adaptation framework with a mechanism of vector graphics transmission to overcome the shortcoming of real-time representation and interaction experiences of 3D graphics application running on mobile devices. We therefore develop an interactive 3D visualization system based on the proposed framework for rapidly representing a 3D scene on mobile devices without having to download it from the server. Our system scenario is composed of a client viewer and a graphic to SVG adaptation server. The client viewer offers the user to access to the same 3D contents with different devices according to consumer interactions.

  19. The EDRN knowledge environment: an open source, scalable informatics platform for biological sciences research

    NASA Astrophysics Data System (ADS)

    Crichton, Daniel; Mahabal, Ashish; Anton, Kristen; Cinquini, Luca; Colbert, Maureen; Djorgovski, S. George; Kincaid, Heather; Kelly, Sean; Liu, David

    2017-05-01

    We describe here the Early Detection Research Network (EDRN) for Cancer's knowledge environment. It is an open source platform built by NASA's Jet Propulsion Laboratory with contributions from the California Institute of Technology, and Giesel School of Medicine at Dartmouth. It uses tools like Apache OODT, Plone, and Solr, and borrows heavily from JPL's Planetary Data System's ontological infrastructure. It has accumulated data on hundreds of thousands of biospecemens and serves over 1300 registered users across the National Cancer Institute (NCI). The scalable computing infrastructure is built such that we are being able to reach out to other agencies, provide homogeneous access, and provide seamless analytics support and bioinformatics tools through community engagement.

  20. CloudMan as a platform for tool, data, and analysis distribution

    PubMed Central

    2012-01-01

    Background Cloud computing provides an infrastructure that facilitates large scale computational analysis in a scalable, democratized fashion, However, in this context it is difficult to ensure sharing of an analysis environment and associated data in a scalable and precisely reproducible way. Results CloudMan (usecloudman.org) enables individual researchers to easily deploy, customize, and share their entire cloud analysis environment, including data, tools, and configurations. Conclusions With the enabled customization and sharing of instances, CloudMan can be used as a platform for collaboration. The presented solution improves accessibility of cloud resources, tools, and data to the level of an individual researcher and contributes toward reproducibility and transparency of research solutions. PMID:23181507

  1. A Comparative Study of Frequent and Maximal Periodic Pattern Mining Algorithms in Spatiotemporal Databases

    NASA Astrophysics Data System (ADS)

    Obulesu, O.; Rama Mohan Reddy, A., Dr; Mahendra, M.

    2017-08-01

    Detecting regular and efficient cyclic models is the demanding activity for data analysts due to unstructured, vigorous and enormous raw information produced from web. Many existing approaches generate large candidate patterns in the occurrence of huge and complex databases. In this work, two novel algorithms are proposed and a comparative examination is performed by considering scalability and performance parameters. The first algorithm is, EFPMA (Extended Regular Model Detection Algorithm) used to find frequent sequential patterns from the spatiotemporal dataset and the second one is, ETMA (Enhanced Tree-based Mining Algorithm) for detecting effective cyclic models with symbolic database representation. EFPMA is an algorithm grows models from both ends (prefixes and suffixes) of detected patterns, which results in faster pattern growth because of less levels of database projection compared to existing approaches such as Prefixspan and SPADE. ETMA uses distinct notions to store and manage transactions data horizontally such as segment, sequence and individual symbols. ETMA exploits a partition-and-conquer method to find maximal patterns by using symbolic notations. Using this algorithm, we can mine cyclic models in full-series sequential patterns including subsection series also. ETMA reduces the memory consumption and makes use of the efficient symbolic operation. Furthermore, ETMA only records time-series instances dynamically, in terms of character, series and section approaches respectively. The extent of the pattern and proving efficiency of the reducing and retrieval techniques from synthetic and actual datasets is a really open & challenging mining problem. These techniques are useful in data streams, traffic risk analysis, medical diagnosis, DNA sequence Mining, Earthquake prediction applications. Extensive investigational outcomes illustrates that the algorithms outperforms well towards efficiency and scalability than ECLAT, STNR and MAFIA approaches.

  2. Hierarchical data security in a Query-By-Example interface for a shared database.

    PubMed

    Taylor, Merwyn

    2002-06-01

    Whenever a shared database resource, containing critical patient data, is created, protecting the contents of the database is a high priority goal. This goal can be achieved by developing a Query-By-Example (QBE) interface, designed to access a shared database, and embedding within the QBE a hierarchical security module that limits access to the data. The security module ensures that researchers working in one clinic do not get access to data from another clinic. The security can be based on a flexible taxonomy structure that allows ordinary users to access data from individual clinics and super users to access data from all clinics. All researchers submit queries through the same interface and the security module processes the taxonomy and user identifiers to limit access. Using this system, two different users with different access rights can submit the same query and get different results thus reducing the need to create different interfaces for different clinics and access rights.

  3. A cloud-based multimodality case file for mobile devices.

    PubMed

    Balkman, Jason D; Loehfelm, Thomas W

    2014-01-01

    Recent improvements in Web and mobile technology, along with the widespread use of handheld devices in radiology education, provide unique opportunities for creating scalable, universally accessible, portable image-rich radiology case files. A cloud database and a Web-based application for radiologic images were developed to create a mobile case file with reasonable usability, download performance, and image quality for teaching purposes. A total of 75 radiology cases related to breast, thoracic, gastrointestinal, musculoskeletal, and neuroimaging subspecialties were included in the database. Breast imaging cases are the focus of this article, as they best demonstrate handheld display capabilities across a wide variety of modalities. This case subset also illustrates methods for adapting radiologic content to cloud platforms and mobile devices. Readers will gain practical knowledge about storage and retrieval of cloud-based imaging data, an awareness of techniques used to adapt scrollable and high-resolution imaging content for the Web, and an appreciation for optimizing images for handheld devices. The evaluation of this software demonstrates the feasibility of adapting images from most imaging modalities to mobile devices, even in cases of full-field digital mammograms, where high resolution is required to represent subtle pathologic features. The cloud platform allows cases to be added and modified in real time by using only a standard Web browser with no application-specific software. Challenges remain in developing efficient ways to generate, modify, and upload radiologic and supplementary teaching content to this cloud-based platform. Online supplemental material is available for this article. ©RSNA, 2014.

  4. Modular biometric system

    NASA Astrophysics Data System (ADS)

    Hsu, Charles; Viazanko, Michael; O'Looney, Jimmy; Szu, Harold

    2009-04-01

    Modularity Biometric System (MBS) is an approach to support AiTR of the cooperated and/or non-cooperated standoff biometric in an area persistent surveillance. Advanced active and passive EOIR and RF sensor suite is not considered here. Neither will we consider the ROC, PD vs. FAR, versus the standoff POT in this paper. Our goal is to catch the "most wanted (MW)" two dozens, separately furthermore ad hoc woman MW class from man MW class, given their archrivals sparse front face data basis, by means of various new instantaneous input called probing faces. We present an advanced algorithm: mini-Max classifier, a sparse sample realization of Cramer-Rao Fisher bound of the Maximum Likelihood classifier that minimize the dispersions among the same woman classes and maximize the separation among different man-woman classes, based on the simple feature space of MIT Petland eigen-faces. The original aspect consists of a modular structured design approach at the system-level with multi-level architectures, multiple computing paradigms, and adaptable/evolvable techniques to allow for achieving a scalable structure in terms of biometric algorithms, identification quality, sensors, database complexity, database integration, and component heterogenity. MBS consist of a number of biometric technologies including fingerprints, vein maps, voice and face recognitions with innovative DSP algorithm, and their hardware implementations such as using Field Programmable Gate arrays (FPGAs). Biometric technologies and the composed modularity biometric system are significant for governmental agencies, enterprises, banks and all other organizations to protect people or control access to critical resources.

  5. Projecting 2D gene expression data into 3D and 4D space.

    PubMed

    Gerth, Victor E; Katsuyama, Kaori; Snyder, Kevin A; Bowes, Jeff B; Kitayama, Atsushi; Ueno, Naoto; Vize, Peter D

    2007-04-01

    Video games typically generate virtual 3D objects by texture mapping an image onto a 3D polygonal frame. The feeling of movement is then achieved by mathematically simulating camera movement relative to the polygonal frame. We have built customized scripts that adapt video game authoring software to texture mapping images of gene expression data onto b-spline based embryo models. This approach, known as UV mapping, associates two-dimensional (U and V) coordinates within images to the three dimensions (X, Y, and Z) of a b-spline model. B-spline model frameworks were built either from confocal data or de novo extracted from 2D images, once again using video game authoring approaches. This system was then used to build 3D models of 182 genes expressed in developing Xenopus embryos and to implement these in a web-accessible database. Models can be viewed via simple Internet browsers and utilize openGL hardware acceleration via a Shockwave plugin. Not only does this database display static data in a dynamic and scalable manner, the UV mapping system also serves as a method to align different images to a common framework, an approach that may make high-throughput automated comparisons of gene expression patterns possible. Finally, video game systems also have elegant methods for handling movement, allowing biomechanical algorithms to drive the animation of models. With further development, these biomechanical techniques offer practical methods for generating virtual embryos that recapitulate morphogenesis.

  6. TAMEE: data management and analysis for tissue microarrays.

    PubMed

    Thallinger, Gerhard G; Baumgartner, Kerstin; Pirklbauer, Martin; Uray, Martina; Pauritsch, Elke; Mehes, Gabor; Buck, Charles R; Zatloukal, Kurt; Trajanoski, Zlatko

    2007-03-07

    With the introduction of tissue microarrays (TMAs) researchers can investigate gene and protein expression in tissues on a high-throughput scale. TMAs generate a wealth of data calling for extended, high level data management. Enhanced data analysis and systematic data management are required for traceability and reproducibility of experiments and provision of results in a timely and reliable fashion. Robust and scalable applications have to be utilized, which allow secure data access, manipulation and evaluation for researchers from different laboratories. TAMEE (Tissue Array Management and Evaluation Environment) is a web-based database application for the management and analysis of data resulting from the production and application of TMAs. It facilitates storage of production and experimental parameters, of images generated throughout the TMA workflow, and of results from core evaluation. Database content consistency is achieved using structured classifications of parameters. This allows the extraction of high quality results for subsequent biologically-relevant data analyses. Tissue cores in the images of stained tissue sections are automatically located and extracted and can be evaluated using a set of predefined analysis algorithms. Additional evaluation algorithms can be easily integrated into the application via a plug-in interface. Downstream analysis of results is facilitated via a flexible query generator. We have developed an integrated system tailored to the specific needs of research projects using high density TMAs. It covers the complete workflow of TMA production, experimental use and subsequent analysis. The system is freely available for academic and non-profit institutions from http://genome.tugraz.at/Software/TAMEE.

  7. NCBI2RDF: Enabling Full RDF-Based Access to NCBI Databases

    PubMed Central

    Anguita, Alberto; García-Remesal, Miguel; de la Iglesia, Diana; Maojo, Victor

    2013-01-01

    RDF has become the standard technology for enabling interoperability among heterogeneous biomedical databases. The NCBI provides access to a large set of life sciences databases through a common interface called Entrez. However, the latter does not provide RDF-based access to such databases, and, therefore, they cannot be integrated with other RDF-compliant databases and accessed via SPARQL query interfaces. This paper presents the NCBI2RDF system, aimed at providing RDF-based access to the complete NCBI data repository. This API creates a virtual endpoint for servicing SPARQL queries over different NCBI repositories and presenting to users the query results in SPARQL results format, thus enabling this data to be integrated and/or stored with other RDF-compliant repositories. SPARQL queries are dynamically resolved, decomposed, and forwarded to the NCBI-provided E-utilities programmatic interface to access the NCBI data. Furthermore, we show how our approach increases the expressiveness of the native NCBI querying system, allowing several databases to be accessed simultaneously. This feature significantly boosts productivity when working with complex queries and saves time and effort to biomedical researchers. Our approach has been validated with a large number of SPARQL queries, thus proving its reliability and enhanced capabilities in biomedical environments. PMID:23984425

  8. An Open-source Toolbox for Analysing and Processing PhysioNet Databases in MATLAB and Octave.

    PubMed

    Silva, Ikaro; Moody, George B

    The WaveForm DataBase (WFDB) Toolbox for MATLAB/Octave enables integrated access to PhysioNet's software and databases. Using the WFDB Toolbox for MATLAB/Octave, users have access to over 50 physiological databases in PhysioNet. The toolbox provides access over 4 TB of biomedical signals including ECG, EEG, EMG, and PLETH. Additionally, most signals are accompanied by metadata such as medical annotations of clinical events: arrhythmias, sleep stages, seizures, hypotensive episodes, etc. Users of this toolbox should easily be able to reproduce, validate, and compare results published based on PhysioNet's software and databases.

  9. Learning Optimized Local Difference Binaries for Scalable Augmented Reality on Mobile Devices.

    PubMed

    Xin Yang; Kwang-Ting Cheng

    2014-06-01

    The efficiency, robustness and distinctiveness of a feature descriptor are critical to the user experience and scalability of a mobile augmented reality (AR) system. However, existing descriptors are either too computationally expensive to achieve real-time performance on a mobile device such as a smartphone or tablet, or not sufficiently robust and distinctive to identify correct matches from a large database. As a result, current mobile AR systems still only have limited capabilities, which greatly restrict their deployment in practice. In this paper, we propose a highly efficient, robust and distinctive binary descriptor, called Learning-based Local Difference Binary (LLDB). LLDB directly computes a binary string for an image patch using simple intensity and gradient difference tests on pairwise grid cells within the patch. To select an optimized set of grid cell pairs, we densely sample grid cells from an image patch and then leverage a modified AdaBoost algorithm to automatically extract a small set of critical ones with the goal of maximizing the Hamming distance between mismatches while minimizing it between matches. Experimental results demonstrate that LLDB is extremely fast to compute and to match against a large database due to its high robustness and distinctiveness. Compared to the state-of-the-art binary descriptors, primarily designed for speed, LLDB has similar efficiency for descriptor construction, while achieving a greater accuracy and faster matching speed when matching over a large database with 2.3M descriptors on mobile devices.

  10. A Comparison of Different Database Technologies for the CMS AsyncStageOut Transfer Database

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ciangottini, D.; Balcas, J.; Mascheroni, M.

    AsyncStageOut (ASO) is the component of the CMS distributed data analysis system (CRAB) that manages users transfers in a centrally controlled way using the File Transfer System (FTS3) at CERN. It addresses a major weakness of the previous, decentralized model, namely that the transfer of the user’s output data to a single remote site was part of the job execution, resulting in inefficient use of job slots and an unacceptable failure rate. Currently ASO manages up to 600k files of various sizes per day from more than 500 users per month, spread over more than 100 sites. ASO uses amore » NoSQL database (CouchDB) as internal bookkeeping and as way to communicate with other CRAB components. Since ASO/CRAB were put in production in 2014, the number of transfers constantly increased up to a point where the pressure to the central CouchDB instance became critical, creating new challenges for the system scalability, performance, and monitoring. This forced a re-engineering of the ASO application to increase its scalability and lowering its operational effort. In this contribution we present a comparison of the performance of the current NoSQL implementation and a new SQL implementation, and how their different strengths and features influenced the design choices and operational experience. We also discuss other architectural changes introduced in the system to handle the increasing load and latency in delivering output to the user.« less

  11. A comparison of different database technologies for the CMS AsyncStageOut transfer database

    NASA Astrophysics Data System (ADS)

    Ciangottini, D.; Balcas, J.; Mascheroni, M.; Rupeika, E. A.; Vaandering, E.; Riahi, H.; Silva, J. M. D.; Hernandez, J. M.; Belforte, S.; Ivanov, T. T.

    2017-10-01

    AsyncStageOut (ASO) is the component of the CMS distributed data analysis system (CRAB) that manages users transfers in a centrally controlled way using the File Transfer System (FTS3) at CERN. It addresses a major weakness of the previous, decentralized model, namely that the transfer of the user’s output data to a single remote site was part of the job execution, resulting in inefficient use of job slots and an unacceptable failure rate. Currently ASO manages up to 600k files of various sizes per day from more than 500 users per month, spread over more than 100 sites. ASO uses a NoSQL database (CouchDB) as internal bookkeeping and as way to communicate with other CRAB components. Since ASO/CRAB were put in production in 2014, the number of transfers constantly increased up to a point where the pressure to the central CouchDB instance became critical, creating new challenges for the system scalability, performance, and monitoring. This forced a re-engineering of the ASO application to increase its scalability and lowering its operational effort. In this contribution we present a comparison of the performance of the current NoSQL implementation and a new SQL implementation, and how their different strengths and features influenced the design choices and operational experience. We also discuss other architectural changes introduced in the system to handle the increasing load and latency in delivering output to the user.

  12. Using linked administrative and disease-specific databases to study end-of-life care on a population level.

    PubMed

    Maetens, Arno; De Schreye, Robrecht; Faes, Kristof; Houttekier, Dirk; Deliens, Luc; Gielen, Birgit; De Gendt, Cindy; Lusyne, Patrick; Annemans, Lieven; Cohen, Joachim

    2016-10-18

    The use of full-population databases is under-explored to study the use, quality and costs of end-of-life care. Using the case of Belgium, we explored: (1) which full-population databases provide valid information about end-of-life care, (2) what procedures are there to use these databases, and (3) what is needed to integrate separate databases. Technical and privacy-related aspects of linking and accessing Belgian administrative databases and disease registries were assessed in cooperation with the database administrators and privacy commission bodies. For all relevant databases, we followed procedures in cooperation with database administrators to link the databases and to access the data. We identified several databases as fitting for end-of-life care research in Belgium: the InterMutualistic Agency's national registry of health care claims data, the Belgian Cancer Registry including data on incidence of cancer, and databases administrated by Statistics Belgium including data from the death certificate database, the socio-economic survey and fiscal data. To obtain access to the data, approval was required from all database administrators, supervisory bodies and two separate national privacy bodies. Two Trusted Third Parties linked the databases via a deterministic matching procedure using multiple encrypted social security numbers. In this article we describe how various routinely collected population-level databases and disease registries can be accessed and linked to study patterns in the use, quality and costs of end-of-life care in the full population and in specific diagnostic groups.

  13. Exploiting NASA's Cumulus Earth Science Cloud Archive with Services and Computation

    NASA Astrophysics Data System (ADS)

    Pilone, D.; Quinn, P.; Jazayeri, A.; Schuler, I.; Plofchan, P.; Baynes, K.; Ramachandran, R.

    2017-12-01

    NASA's Earth Observing System Data and Information System (EOSDIS) houses nearly 30PBs of critical Earth Science data and with upcoming missions is expected to balloon to between 200PBs-300PBs over the next seven years. In addition to the massive increase in data collected, researchers and application developers want more and faster access - enabling complex visualizations, long time-series analysis, and cross dataset research without needing to copy and manage massive amounts of data locally. NASA has started prototyping with commercial cloud providers to make this data available in elastic cloud compute environments, allowing application developers direct access to the massive EOSDIS holdings. In this talk we'll explain the principles behind the archive architecture and share our experience of dealing with large amounts of data with serverless architectures including AWS Lambda, the Elastic Container Service (ECS) for long running jobs, and why we dropped thousands of lines of code for AWS Step Functions. We'll discuss best practices and patterns for accessing and using data available in a shared object store (S3) and leveraging events and message passing for sophisticated and highly scalable processing and analysis workflows. Finally we'll share capabilities NASA and cloud services are making available on the archives to enable massively scalable analysis and computation in a variety of formats and tools.

  14. Database of Geoscientific References Through 2007 for Afghanistan, Version 2

    USGS Publications Warehouse

    Eppinger, Robert G.; Sipeki, Julianna; Scofield, M.L. Sco

    2007-01-01

    This report describes an accompanying database of geoscientific references for the country of Afghanistan. Included is an accompanying Microsoft? Access 2003 database of geoscientific references for the country of Afghanistan. The reference compilation is part of a larger joint study of Afghanistan's energy, mineral, and water resources, and geologic hazards, currently underway by the U.S. Geological Survey, the British Geological Survey, and the Afghanistan Geological Survey. The database includes both published (n = 2,462) and unpublished (n = 174) references compiled through September, 2007. The references comprise two separate tables in the Access database. The reference database includes a user-friendly, keyword-searchable, interface and only minimum knowledge of the use of Microsoft? Access is required.

  15. Querying and Computing with BioCyc Databases

    PubMed Central

    Krummenacker, Markus; Paley, Suzanne; Mueller, Lukas; Yan, Thomas; Karp, Peter D.

    2006-01-01

    Summary We describe multiple methods for accessing and querying the complex and integrated cellular data in the BioCyc family of databases: access through multiple file formats, access through Application Program Interfaces (APIs) for LISP, Perl and Java, and SQL access through the BioWarehouse relational database. Availability The Pathway Tools software and 20 BioCyc DBs in Tiers 1 and 2 are freely available to academic users; fees apply to some types of commercial use. For download instructions see http://BioCyc.org/download.shtml PMID:15961440

  16. Accessing Cloud Properties and Satellite Imagery: A tool for visualization and data mining

    NASA Astrophysics Data System (ADS)

    Chee, T.; Nguyen, L.; Minnis, P.; Spangenberg, D.; Palikonda, R.

    2016-12-01

    Providing public access to imagery of cloud macro and microphysical properties and the underlying satellite imagery is a key concern for the NASA Langley Research Center Cloud and Radiation Group. This work describes a tool and system that allows end users to easily browse cloud information and satellite imagery that is otherwise difficult to acquire and manipulate. The tool has two uses, one to visualize the data and the other to access the data directly. It uses a widely used access protocol, the Open Geospatial Consortium's Web Map and Processing Services, to encourage user to access the data we produce. Internally, we leverage our practical experience with large, scalable application practices to develop a system that has the largest potential for scalability as well as the ability to be deployed on the cloud. One goal of the tool is to provide a demonstration of the back end capability to end users so that they can use the dynamically generated imagery and data as an input to their own work flows or to set up data mining constraints. We build upon NASA Langley Cloud and Radiation Group's experience with making real-time and historical satellite cloud product information and satellite imagery accessible and easily searchable. Increasingly, information is used in a "mash-up" form where multiple sources of information are combined to add value to disparate but related information. In support of NASA strategic goals, our group aims to make as much cutting edge scientific knowledge, observations and products available to the citizen science, research and interested communities for these kinds of "mash-ups" as well as provide a means for automated systems to data mine our information. This tool and access method provides a valuable research tool to a wide audience both as a standalone research tool and also as an easily accessed data source that can easily be mined or used with existing tools.

  17. X-ray Photoelectron Spectroscopy Database (Version 4.1)

    National Institute of Standards and Technology Data Gateway

    SRD 20 X-ray Photoelectron Spectroscopy Database (Version 4.1) (Web, free access)   The NIST XPS Database gives access to energies of many photoelectron and Auger-electron spectral lines. The database contains over 22,000 line positions, chemical shifts, doublet splittings, and energy separations of photoelectron and Auger-electron lines.

  18. Scalable High Performance Computing: Direct and Large-Eddy Turbulent Flow Simulations Using Massively Parallel Computers

    NASA Technical Reports Server (NTRS)

    Morgan, Philip E.

    2004-01-01

    This final report contains reports of research related to the tasks "Scalable High Performance Computing: Direct and Lark-Eddy Turbulent FLow Simulations Using Massively Parallel Computers" and "Devleop High-Performance Time-Domain Computational Electromagnetics Capability for RCS Prediction, Wave Propagation in Dispersive Media, and Dual-Use Applications. The discussion of Scalable High Performance Computing reports on three objectives: validate, access scalability, and apply two parallel flow solvers for three-dimensional Navier-Stokes flows; develop and validate a high-order parallel solver for Direct Numerical Simulations (DNS) and Large Eddy Simulation (LES) problems; and Investigate and develop a high-order Reynolds averaged Navier-Stokes turbulence model. The discussion of High-Performance Time-Domain Computational Electromagnetics reports on five objectives: enhancement of an electromagnetics code (CHARGE) to be able to effectively model antenna problems; utilize lessons learned in high-order/spectral solution of swirling 3D jets to apply to solving electromagnetics project; transition a high-order fluids code, FDL3DI, to be able to solve Maxwell's Equations using compact-differencing; develop and demonstrate improved radiation absorbing boundary conditions for high-order CEM; and extend high-order CEM solver to address variable material properties. The report also contains a review of work done by the systems engineer.

  19. Update on NASA Space Shuttle Earth Observations Photography on the laser videodisc for rapid image access

    NASA Technical Reports Server (NTRS)

    Lulla, Kamlesh

    1994-01-01

    There have been many significant improvements in the public access to the Space Shuttle Earth Observations Photography Database. New information is provided for the user community on the recently released videodisc of this database. Topics covered included the following: earlier attempts; our first laser videodisc in 1992; the new laser videodisc in 1994; and electronic database access.

  20. Joint Experimentation on Scalable Parallel Processors (JESPP)

    DTIC Science & Technology

    2006-04-01

    made use of local embedded relational databases, implemented using sqlite on each node of an SPP to execute queries and return results via an ad hoc ...rl.af.mil 12a. DISTRIBUTION / AVAILABILITY STATEENT APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED. 12b. DISTRIBUTION CODE 13. ABSTRACT...Experimentation Directorate (J9) required expansion of its joint semi-automated forces (JSAF) code capabilities; including number of entities, behavior complexity

  1. Quadruple multi-wavelength conversion for access network scalability based on cross-phase modulation in an SOA-MZI

    NASA Astrophysics Data System (ADS)

    Ab-Rahman, Mohammad Syuhaimi; Swedan, Abdulhameed Almabrok

    2017-12-01

    The emergence of new services and data exchange applications has increased the demand for bandwidth among individuals and commercial business users at the access area. Thus, vendors of optical access networks should achieve a high-capacity system. This study demonstrates the performance of an integrated configuration of one to four multi-wavelength conversions at 10 Gb/s based on cross-phase modulation using semiconductor optical amplifier integrated with Mach-Zehnder interferometer. The Opti System simulation tool is used to simulate and demonstrate one to four wavelength conversions using one modulated wavelength and four probes of continuous wave sources. The wavelength converter processes are confirmed through investigation of the input and output characteristics, optical signal-to-noise ratio, conversion efficiency, and extinction ratio of new modulated channels after separation by demultiplexing. The outcomes of the proposed system using single channel indicate that the capacity can increase from 10 Gb/s to 50 Gb/s with a maximum number of access points increasing from 64 to 320 (each point with 156.25 Mb/s bandwidth). The splitting ratio of 1:16 provides each client with 625 Mb/s for the total number of 80 users. The Q-factor and bit error rate curves are investigated to confirm and validate the modified scheme and prove the system performance of the full topology of 25 km with 1/64 splitter. The outcomes are within the acceptable range to provide the system scalability.

  2. Development, deployment and operations of ATLAS databases

    NASA Astrophysics Data System (ADS)

    Vaniachine, A. V.; Schmitt, J. G. v. d.

    2008-07-01

    In preparation for ATLAS data taking, a coordinated shift from development towards operations has occurred in ATLAS database activities. In addition to development and commissioning activities in databases, ATLAS is active in the development and deployment (in collaboration with the WLCG 3D project) of the tools that allow the worldwide distribution and installation of databases and related datasets, as well as the actual operation of this system on ATLAS multi-grid infrastructure. We describe development and commissioning of major ATLAS database applications for online and offline. We present the first scalability test results and ramp-up schedule over the initial LHC years of operations towards the nominal year of ATLAS running, when the database storage volumes are expected to reach 6.1 TB for the Tag DB and 1.0 TB for the Conditions DB. ATLAS database applications require robust operational infrastructure for data replication between online and offline at Tier-0, and for the distribution of the offline data to Tier-1 and Tier-2 computing centers. We describe ATLAS experience with Oracle Streams and other technologies for coordinated replication of databases in the framework of the WLCG 3D services.

  3. Security middleware infrastructure for DICOM images in health information systems.

    PubMed

    Kallepalli, Vijay N V; Ehikioya, Sylvanus A; Camorlinga, Sergio; Rueda, Jose A

    2003-12-01

    In health care, it is mandatory to maintain the privacy and confidentiality of medical data. To achieve this, a fine-grained access control and an access log for accessing medical images are two important aspects that need to be considered in health care systems. Fine-grained access control provides access to medical data only to authorized persons based on priority, location, and content. A log captures each attempt to access medical data. This article describes an overall middleware infrastructure required for secure access to Digital Imaging and Communication in Medicine (DICOM) images, with an emphasis on access control and log maintenance. We introduce a hybrid access control model that combines the properties of two existing models. A trust relationship between hospitals is used to make the hybrid access control model scalable across hospitals. We also discuss events that have to be logged and where the log has to be maintained. A prototype of security middleware infrastructure is implemented.

  4. A customizable, scalable scheduling and reporting system.

    PubMed

    Wood, Jody L; Whitman, Beverly J; Mackley, Lisa A; Armstrong, Robert; Shotto, Robert T

    2014-06-01

    Scheduling is essential for running a facility smoothly and for summarizing activities in use reports. The Penn State Hershey Clinical Simulation Center has developed a scheduling interface that uses off-the-shelf components, with customizations that adapt to each institution's data collection and reporting needs. The system is designed using programs within the Microsoft Office 2010 suite. Outlook provides the scheduling component, while the reporting is performed using Access or Excel. An account with a calendar is created for the main schedule, with separate resource accounts created for each room within the center. The Outlook appointment form's 2 default tabs are used, in addition to a customized third tab. The data are then copied from the calendar into either a database table or a spreadsheet, where the reports are generated.Incorporating this system into an institution-wide structure allows integration of personnel lists and potentially enables all users to check the schedule from their desktop. Outlook also has a Web-based application for viewing the basic schedule from outside the institution, although customized data cannot be accessed. The scheduling and reporting functions have been used for a year at the Penn State Hershey Clinical Simulation Center. The schedule has increased workflow efficiency, improved the quality of recorded information, and provided more accurate reporting. The Penn State Hershey Clinical Simulation Center's scheduling and reporting system can be adapted easily to most simulation centers and can expand and change to meet future growth with little or no expense to the center.

  5. Integrated Management and Visualization of Electronic Tag Data with Tagbase

    PubMed Central

    Lam, Chi Hin; Tsontos, Vardis M.

    2011-01-01

    Electronic tags have been used widely for more than a decade in studies of diverse marine species. However, despite significant investment in tagging programs and hardware, data management aspects have received insufficient attention, leaving researchers without a comprehensive toolset to manage their data easily. The growing volume of these data holdings, the large diversity of tag types and data formats, and the general lack of data management resources are not only complicating integration and synthesis of electronic tagging data in support of resource management applications but potentially threatening the integrity and longer-term access to these valuable datasets. To address this critical gap, Tagbase has been developed as a well-rounded, yet accessible data management solution for electronic tagging applications. It is based on a unified relational model that accommodates a suite of manufacturer tag data formats in addition to deployment metadata and reprocessed geopositions. Tagbase includes an integrated set of tools for importing tag datasets into the system effortlessly, and provides reporting utilities to interactively view standard outputs in graphical and tabular form. Data from the system can also be easily exported or dynamically coupled to GIS and other analysis packages. Tagbase is scalable and has been ported to a range of database management systems to support the needs of the tagging community, from individual investigators to large scale tagging programs. Tagbase represents a mature initiative with users at several institutions involved in marine electronic tagging research. PMID:21750734

  6. Heterogeneous distributed query processing: The DAVID system

    NASA Technical Reports Server (NTRS)

    Jacobs, Barry E.

    1985-01-01

    The objective of the Distributed Access View Integrated Database (DAVID) project is the development of an easy to use computer system with which NASA scientists, engineers and administrators can uniformly access distributed heterogeneous databases. Basically, DAVID will be a database management system that sits alongside already existing database and file management systems. Its function is to enable users to access the data in other languages and file systems without having to learn the data manipulation languages. Given here is an outline of a talk on the DAVID project and several charts.

  7. An Open and Scalable Learning Infrastructure for Food Safety

    ERIC Educational Resources Information Center

    Manouselis, Nikos; Thanopoulos, Charalampos; Vignare, Karen; Geith, Christine

    2013-01-01

    In the last several years, a variety of approaches and tools have been developed for giving access to open educational resources (OER) related to food safety, security, and food standards, as well to various targeted audiences (e.g., farmers, agronomists). The aim of this paper is to present a technology infrastructure currently in demonstration…

  8. Coordinating Decentralized Learning and Conflict Resolution across Agent Boundaries

    ERIC Educational Resources Information Center

    Cheng, Shanjun

    2012-01-01

    It is crucial for embedded systems to adapt to the dynamics of open environments. This adaptation process becomes especially challenging in the context of multiagent systems because of scalability, partial information accessibility and complex interaction of agents. It is a challenge for agents to learn good policies, when they need to plan and…

  9. Parent Adoption of School Communications Technology: A 12-School Experiment of Default Enrollment Policies

    ERIC Educational Resources Information Center

    Bergman, Peter; Rogers, Todd

    2016-01-01

    Providing parents access to their child's grades, missing assignment information, or personalized teacher suggestions has been shown to improve parental engagement and increase student achievement for low-income, minority students. This project aims to test an automated, scalable text-message system designed to improve outcomes for low-income…

  10. Three-Dimensional Space to Assess Cloud Interoperability

    DTIC Science & Technology

    2013-03-01

    12 1. Portability and Mobility ...collection of network-enabled services that guarantees to provide a scalable, easy accessible, reliable, and personalized computing infrastructure , based on...are used in research to describe cloud models, such as SaaS (Software as a Service), PaaS (Platform as a service), IaaS ( Infrastructure as a Service

  11. Evidence generation from healthcare databases: recommendations for managing change.

    PubMed

    Bourke, Alison; Bate, Andrew; Sauer, Brian C; Brown, Jeffrey S; Hall, Gillian C

    2016-07-01

    There is an increasing reliance on databases of healthcare records for pharmacoepidemiology and other medical research, and such resources are often accessed over a long period of time so it is vital to consider the impact of changes in data, access methodology and the environment. The authors discuss change in communication and management, and provide a checklist of issues to consider for both database providers and users. The scope of the paper is database research, and changes are considered in relation to the three main components of database research: the data content itself, how it is accessed, and the support and tools needed to use the database. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.

  12. The NOAO Data Lab PHAT Photometry Database

    NASA Astrophysics Data System (ADS)

    Olsen, Knut; Williams, Ben; Fitzpatrick, Michael; PHAT Team

    2018-01-01

    We present a database containing both the combined photometric object catalog and the single epoch measurements from the Panchromatic Hubble Andromeda Treasury (PHAT). This database is hosted by the NOAO Data Lab (http://datalab.noao.edu), and as such exposes a number of data services to the PHAT photometry, including access through a Table Access Protocol (TAP) service, direct PostgreSQL queries, web-based and programmatic query interfaces, remote storage space for personal database tables and files, and a JupyterHub-based Notebook analysis environment, as well as image access through a Simple Image Access (SIA) service. We show how the Data Lab database and Jupyter Notebook environment allow for straightforward and efficient analyses of PHAT catalog data, including maps of object density, depth, and color, extraction of light curves of variable objects, and proper motion exploration.

  13. High-performance metadata indexing and search in petascale data storage systems

    NASA Astrophysics Data System (ADS)

    Leung, A. W.; Shao, M.; Bisson, T.; Pasupathy, S.; Miller, E. L.

    2008-07-01

    Large-scale storage systems used for scientific applications can store petabytes of data and billions of files, making the organization and management of data in these systems a difficult, time-consuming task. The ability to search file metadata in a storage system can address this problem by allowing scientists to quickly navigate experiment data and code while allowing storage administrators to gather the information they need to properly manage the system. In this paper, we present Spyglass, a file metadata search system that achieves scalability by exploiting storage system properties, providing the scalability that existing file metadata search tools lack. In doing so, Spyglass can achieve search performance up to several thousand times faster than existing database solutions. We show that Spyglass enables important functionality that can aid data management for scientists and storage administrators.

  14. Active in-database processing to support ambient assisted living systems.

    PubMed

    de Morais, Wagner O; Lundström, Jens; Wickström, Nicholas

    2014-08-12

    As an alternative to the existing software architectures that underpin the development of smart homes and ambient assisted living (AAL) systems, this work presents a database-centric architecture that takes advantage of active databases and in-database processing. Current platforms supporting AAL systems use database management systems (DBMSs) exclusively for data storage. Active databases employ database triggers to detect and react to events taking place inside or outside of the database. DBMSs can be extended with stored procedures and functions that enable in-database processing. This means that the data processing is integrated and performed within the DBMS. The feasibility and flexibility of the proposed approach were demonstrated with the implementation of three distinct AAL services. The active database was used to detect bed-exits and to discover common room transitions and deviations during the night. In-database machine learning methods were used to model early night behaviors. Consequently, active in-database processing avoids transferring sensitive data outside the database, and this improves performance, security and privacy. Furthermore, centralizing the computation into the DBMS facilitates code reuse, adaptation and maintenance. These are important system properties that take into account the evolving heterogeneity of users, their needs and the devices that are characteristic of smart homes and AAL systems. Therefore, DBMSs can provide capabilities to address requirements for scalability, security, privacy, dependability and personalization in applications of smart environments in healthcare.

  15. Active In-Database Processing to Support Ambient Assisted Living Systems

    PubMed Central

    de Morais, Wagner O.; Lundström, Jens; Wickström, Nicholas

    2014-01-01

    As an alternative to the existing software architectures that underpin the development of smart homes and ambient assisted living (AAL) systems, this work presents a database-centric architecture that takes advantage of active databases and in-database processing. Current platforms supporting AAL systems use database management systems (DBMSs) exclusively for data storage. Active databases employ database triggers to detect and react to events taking place inside or outside of the database. DBMSs can be extended with stored procedures and functions that enable in-database processing. This means that the data processing is integrated and performed within the DBMS. The feasibility and flexibility of the proposed approach were demonstrated with the implementation of three distinct AAL services. The active database was used to detect bed-exits and to discover common room transitions and deviations during the night. In-database machine learning methods were used to model early night behaviors. Consequently, active in-database processing avoids transferring sensitive data outside the database, and this improves performance, security and privacy. Furthermore, centralizing the computation into the DBMS facilitates code reuse, adaptation and maintenance. These are important system properties that take into account the evolving heterogeneity of users, their needs and the devices that are characteristic of smart homes and AAL systems. Therefore, DBMSs can provide capabilities to address requirements for scalability, security, privacy, dependability and personalization in applications of smart environments in healthcare. PMID:25120164

  16. A Model Based Mars Climate Database for the Mission Design

    NASA Technical Reports Server (NTRS)

    2005-01-01

    A viewgraph presentation on a model based climate database is shown. The topics include: 1) Why a model based climate database?; 2) Mars Climate Database v3.1 Who uses it ? (approx. 60 users!); 3) The new Mars Climate database MCD v4.0; 4) MCD v4.0: what's new ? 5) Simulation of Water ice clouds; 6) Simulation of Water ice cycle; 7) A new tool for surface pressure prediction; 8) Acces to the database MCD 4.0; 9) How to access the database; and 10) New web access

  17. DSSTOX WEBSITE LAUNCH: IMPROVING PUBLIC ACCESS TO DATABASES FOR BUILDING STRUCTURE-TOXICITY PREDICTION MODELS

    EPA Science Inventory

    DSSTox Website Launch: Improving Public Access to Databases for Building Structure-Toxicity Prediction Models
    Ann M. Richard
    US Environmental Protection Agency, Research Triangle Park, NC, USA

    Distributed: Decentralized set of standardized, field-delimited databases,...

  18. Software Engineering Laboratory (SEL) database organization and user's guide, revision 2

    NASA Technical Reports Server (NTRS)

    Morusiewicz, Linda; Bristow, John

    1992-01-01

    The organization of the Software Engineering Laboratory (SEL) database is presented. Included are definitions and detailed descriptions of the database tables and views, the SEL data, and system support data. The mapping from the SEL and system support data to the base table is described. In addition, techniques for accessing the database through the Database Access Manager for the SEL (DAMSEL) system and via the ORACLE structured query language (SQL) are discussed.

  19. Software Engineering Laboratory (SEL) database organization and user's guide

    NASA Technical Reports Server (NTRS)

    So, Maria; Heller, Gerard; Steinberg, Sandra; Spiegel, Douglas

    1989-01-01

    The organization of the Software Engineering Laboratory (SEL) database is presented. Included are definitions and detailed descriptions of the database tables and views, the SEL data, and system support data. The mapping from the SEL and system support data to the base tables is described. In addition, techniques for accessing the database, through the Database Access Manager for the SEL (DAMSEL) system and via the ORACLE structured query language (SQL), are discussed.

  20. AsyncStageOut: Distributed user data management for CMS Analysis

    NASA Astrophysics Data System (ADS)

    Riahi, H.; Wildish, T.; Ciangottini, D.; Hernández, J. M.; Andreeva, J.; Balcas, J.; Karavakis, E.; Mascheroni, M.; Tanasijczuk, A. J.; Vaandering, E. W.

    2015-12-01

    AsyncStageOut (ASO) is a new component of the distributed data analysis system of CMS, CRAB, designed for managing users' data. It addresses a major weakness of the previous model, namely that mass storage of output data was part of the job execution resulting in inefficient use of job slots and an unacceptable failure rate at the end of the jobs. ASO foresees the management of up to 400k files per day of various sizes, spread worldwide across more than 60 sites. It must handle up to 1000 individual users per month, and work with minimal delay. This creates challenging requirements for system scalability, performance and monitoring. ASO uses FTS to schedule and execute the transfers between the storage elements of the source and destination sites. It has evolved from a limited prototype to a highly adaptable service, which manages and monitors the user file placement and bookkeeping. To ensure system scalability and data monitoring, it employs new technologies such as a NoSQL database and re-uses existing components of PhEDEx and the FTS Dashboard. We present the asynchronous stage-out strategy and the architecture of the solution we implemented to deal with those issues and challenges. The deployment model for the high availability and scalability of the service is discussed. The performance of the system during the commissioning and the first phase of production are also shown, along with results from simulations designed to explore the limits of scalability.

  1. AsyncStageOut: Distributed User Data Management for CMS Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Riahi, H.; Wildish, T.; Ciangottini, D.

    2015-12-23

    AsyncStageOut (ASO) is a new component of the distributed data analysis system of CMS, CRAB, designed for managing users' data. It addresses a major weakness of the previous model, namely that mass storage of output data was part of the job execution resulting in inefficient use of job slots and an unacceptable failure rate at the end of the jobs. ASO foresees the management of up to 400k files per day of various sizes, spread worldwide across more than 60 sites. It must handle up to 1000 individual users per month, and work with minimal delay. This creates challenging requirementsmore » for system scalability, performance and monitoring. ASO uses FTS to schedule and execute the transfers between the storage elements of the source and destination sites. It has evolved from a limited prototype to a highly adaptable service, which manages and monitors the user file placement and bookkeeping. To ensure system scalability and data monitoring, it employs new technologies such as a NoSQL database and re-uses existing components of PhEDEx and the FTS Dashboard. We present the asynchronous stage-out strategy and the architecture of the solution we implemented to deal with those issues and challenges. The deployment model for the high availability and scalability of the service is discussed. The performance of the system during the commissioning and the first phase of production are also shown, along with results from simulations designed to explore the limits of scalability.« less

  2. Full-Text Linking: Affiliated versus Nonaffiliated Access in a Free Database.

    ERIC Educational Resources Information Center

    Grogg, Jill E.; Andreadis, Debra K.; Kirk, Rachel A.

    2002-01-01

    Presents a comparison of access to full-text articles from a free bibliographic database (PubSCIENCE) for affiliated and unaffiliated users. Found that affiliated users had access to more full-text articles than unaffiliated users had, and that both types of users could increase their level of access through additional searching and greater…

  3. A User-Friendly, Keyword-Searchable Database of Geoscientific References Through 2007 for Afghanistan

    USGS Publications Warehouse

    Eppinger, Robert G.; Sipeki, Julianna; Scofield, M.L. Sco

    2008-01-01

    This report includes a document and accompanying Microsoft Access 2003 database of geoscientific references for the country of Afghanistan. The reference compilation is part of a larger joint study of Afghanistan?s energy, mineral, and water resources, and geologic hazards currently underway by the U.S. Geological Survey, the British Geological Survey, and the Afghanistan Geological Survey. The database includes both published (n = 2,489) and unpublished (n = 176) references compiled through calendar year 2007. The references comprise two separate tables in the Access database. The reference database includes a user-friendly, keyword-searchable interface and only minimum knowledge of the use of Microsoft Access is required.

  4. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Forslund, D.W.; Cook, J.L.

    One of the most powerful tools available for telemedicine is a multimedia medical record accessible over a wide area and simultaneously editable by multiple physicians. The ability to do this through an intuitive interface linking multiple distributed data repositories while maintaining full data integrity is a fundamental enabling technology in healthcare. The authors discuss the role of distributed object technology using Java and CORBA in providing this capability including an example of such a system (TeleMed) which can be accessed through the World Wide Web. Issues of security, scalability, data integrity, and usability are emphasized.

  5. The role of CORBA in enabling telemedicine

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Forslund, D.W.

    1997-07-01

    One of the most powerful tools available for telemedicine is a multimedia medical record accessible over a wide area and simultaneously editable by multiple physicians. The ability to do this through an intuitive interface linking multiple distributed data repositories while maintaining full data integrity is a fundamental enabling technology in healthcare. The author discusses the role of distributed object technology using CORBA in providing this capability including an example of such a system (TeleMed) which can be accessed through the World Wide Web. Issues of security, scalability, data integrity, and useability are emphasized.

  6. Second-Tier Database for Ecosystem Focus, 2002-2003 Annual Report.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Van Holmes, Chris; Muongchanh, Christine; Anderson, James J.

    2003-11-01

    The Second-Tier Database for Ecosystem Focus (Contract 00004124) provides direct and timely public access to Columbia Basin environmental, operational, fishery and riverine data resources for federal, state, public and private entities. The Second-Tier Database known as Data Access in Realtime (DART) integrates public data for effective access, consideration and application. DART also provides analysis tools and performance measures helpful in evaluating the condition of Columbia Basin salmonid stocks.

  7. A service-oriented data access control model

    NASA Astrophysics Data System (ADS)

    Meng, Wei; Li, Fengmin; Pan, Juchen; Song, Song; Bian, Jiali

    2017-01-01

    The development of mobile computing, cloud computing and distributed computing meets the growing individual service needs. Facing with complex application system, it's an urgent problem to ensure real-time, dynamic, and fine-grained data access control. By analyzing common data access control models, on the basis of mandatory access control model, the paper proposes a service-oriented access control model. By regarding system services as subject and data of databases as object, the model defines access levels and access identification of subject and object, and ensures system services securely to access databases.

  8. ATAC-see reveals the accessible genome by transposase-mediated imaging and sequencing.

    PubMed

    Chen, Xingqi; Shen, Ying; Draper, Will; Buenrostro, Jason D; Litzenburger, Ulrike; Cho, Seung Woo; Satpathy, Ansuman T; Carter, Ava C; Ghosh, Rajarshi P; East-Seletsky, Alexandra; Doudna, Jennifer A; Greenleaf, William J; Liphardt, Jan T; Chang, Howard Y

    2016-12-01

    Spatial organization of the genome plays a central role in gene expression, DNA replication, and repair. But current epigenomic approaches largely map DNA regulatory elements outside of the native context of the nucleus. Here we report assay of transposase-accessible chromatin with visualization (ATAC-see), a transposase-mediated imaging technology that employs direct imaging of the accessible genome in situ, cell sorting, and deep sequencing to reveal the identity of the imaged elements. ATAC-see revealed the cell-type-specific spatial organization of the accessible genome and the coordinated process of neutrophil chromatin extrusion, termed NETosis. Integration of ATAC-see with flow cytometry enables automated quantitation and prospective cell isolation as a function of chromatin accessibility, and it reveals a cell-cycle dependence of chromatin accessibility that is especially dynamic in G1 phase. The integration of imaging and epigenomics provides a general and scalable approach for deciphering the spatiotemporal architecture of gene control.

  9. The unified database for the fixed target experiment BM@N

    NASA Astrophysics Data System (ADS)

    Gertsenberger, K. V.

    2016-09-01

    The article describes the developed database designed as comprehensive data storage of the fixed target experiment BM@N [1] at Joint Institute for Nuclear Research (JINR) in Dubna. The structure and purposes of the BM@N facility will be briefly presented. The scheme of the unified database and its parameters will be described in detail. The use of the BM@N database implemented on the PostgreSQL database management system (DBMS) allows one to provide user access to the actual information of the experiment. Also the interfaces developed for the access to the database will be presented. One was implemented as the set of C++ classes to access the data without SQL statements, the other-Web-interface being available on the Web page of the BM@N experiment.

  10. Access to digital library databases in higher education: design problems and infrastructural gaps.

    PubMed

    Oswal, Sushil K

    2014-01-01

    After defining accessibility and usability, the author offers a broad survey of the research studies on digital content databases which have thus far primarily depended on data drawn from studies conducted by sighted researchers with non-disabled users employing screen readers and low vision devices. This article aims at producing a detailed description of the difficulties confronted by blind screen reader users with online library databases which now hold most of the academic, peer-reviewed journal and periodical content essential for research and teaching in higher education. The approach taken here is borrowed from descriptive ethnography which allows the author to create a complete picture of the accessibility and usability problems faced by an experienced academic user of digital library databases and screen readers. The author provides a detailed analysis of the different aspects of accessibility issues in digital databases under several headers with a special focus on full-text PDF files. The author emphasizes that long-term studies with actual, blind screen reader users employing both qualitative and computerized research tools can yield meaningful data for the designers and developers to improve these databases to a level that they begin to provide an equal access to the blind.

  11. Atomic Spectra Database (ASD)

    National Institute of Standards and Technology Data Gateway

    SRD 78 NIST Atomic Spectra Database (ASD) (Web, free access)   This database provides access and search capability for NIST critically evaluated data on atomic energy levels, wavelengths, and transition probabilities that are reasonably up-to-date. The NIST Atomic Spectroscopy Data Center has carried out these critical compilations.

  12. Integrating geo web services for a user driven exploratory analysis

    NASA Astrophysics Data System (ADS)

    Moncrieff, Simon; Turdukulov, Ulanbek; Gulland, Elizabeth-Kate

    2016-04-01

    In data exploration, several online data sources may need to be dynamically aggregated or summarised over spatial region, time interval, or set of attributes. With respect to thematic data, web services are mainly used to present results leading to a supplier driven service model limiting the exploration of the data. In this paper we propose a user need driven service model based on geo web processing services. The aim of the framework is to provide a method for the scalable and interactive access to various geographic data sources on the web. The architecture combines a data query, processing technique and visualisation methodology to rapidly integrate and visually summarise properties of a dataset. We illustrate the environment on a health related use case that derives Age Standardised Rate - a dynamic index that needs integration of the existing interoperable web services of demographic data in conjunction with standalone non-spatial secure database servers used in health research. Although the example is specific to the health field, the architecture and the proposed approach are relevant and applicable to other fields that require integration and visualisation of geo datasets from various web services and thus, we believe is generic in its approach.

  13. Interoperability of GADU in using heterogeneous Grid resources for bioinformatics applications.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sulakhe, D.; Rodriguez, A.; Wilde, M.

    2008-03-01

    Bioinformatics tools used for efficient and computationally intensive analysis of genetic sequences require large-scale computational resources to accommodate the growing data. Grid computational resources such as the Open Science Grid and TeraGrid have proved useful for scientific discovery. The genome analysis and database update system (GADU) is a high-throughput computational system developed to automate the steps involved in accessing the Grid resources for running bioinformatics applications. This paper describes the requirements for building an automated scalable system such as GADU that can run jobs on different Grids. The paper describes the resource-independent configuration of GADU using the Pegasus-based virtual datamore » system that makes high-throughput computational tools interoperable on heterogeneous Grid resources. The paper also highlights the features implemented to make GADU a gateway to computationally intensive bioinformatics applications on the Grid. The paper will not go into the details of problems involved or the lessons learned in using individual Grid resources as it has already been published in our paper on genome analysis research environment (GNARE) and will focus primarily on the architecture that makes GADU resource independent and interoperable across heterogeneous Grid resources.« less

  14. A scalable architecture for extracting, aligning, linking, and visualizing multi-Int data

    NASA Astrophysics Data System (ADS)

    Knoblock, Craig A.; Szekely, Pedro

    2015-05-01

    An analyst today has a tremendous amount of data available, but each of the various data sources typically exists in their own silos, so an analyst has limited ability to see an integrated view of the data and has little or no access to contextual information that could help in understanding the data. We have developed the Domain-Insight Graph (DIG) system, an innovative architecture for extracting, aligning, linking, and visualizing massive amounts of domain-specific content from unstructured sources. Under the DARPA Memex program we have already successfully applied this architecture to multiple application domains, including the enormous international problem of human trafficking, where we extracted, aligned and linked data from 50 million online Web pages. DIG builds on our Karma data integration toolkit, which makes it easy to rapidly integrate structured data from a variety of sources, including databases, spreadsheets, XML, JSON, and Web services. The ability to integrate Web services allows Karma to pull in live data from the various social media sites, such as Twitter, Instagram, and OpenStreetMaps. DIG then indexes the integrated data and provides an easy to use interface for query, visualization, and analysis.

  15. Scalable population estimates using spatial-stream-network (SSN) models, fish density surveys, and national geospatial database frameworks for streams

    Treesearch

    Daniel J. Isaak; Jay M. Ver Hoef; Erin E. Peterson; Dona L. Horan; David E. Nagel

    2017-01-01

    Population size estimates for stream fishes are important for conservation and management, but sampling costs limit the extent of most estimates to small portions of river networks that encompass 100s–10 000s of linear kilometres. However, the advent of large fish density data sets, spatial-stream-network (SSN) models that benefit from nonindependence among samples,...

  16. Secure transport and adaptation of MC-EZBC video utilizing H.264-based transport protocols☆

    PubMed Central

    Hellwagner, Hermann; Hofbauer, Heinz; Kuschnig, Robert; Stütz, Thomas; Uhl, Andreas

    2012-01-01

    Universal Multimedia Access (UMA) calls for solutions where content is created once and subsequently adapted to given requirements. With regard to UMA and scalability, which is required often due to a wide variety of end clients, the best suited codecs are wavelet based (like the MC-EZBC) due to their inherent high number of scaling options. However, most transport technologies for delivering videos to end clients are targeted toward the H.264/AVC standard or, if scalability is required, the H.264/SVC. In this paper we will introduce a mapping of the MC-EZBC bitstream to existing H.264/SVC based streaming and scaling protocols. This enables the use of highly scalable wavelet based codecs on the one hand and the utilization of already existing network technologies without accruing high implementation costs on the other hand. Furthermore, we will evaluate different scaling options in order to choose the best option for given requirements. Additionally, we will evaluate different encryption options based on transport and bitstream encryption for use cases where digital rights management is required. PMID:26869746

  17. Electronic Reference Library: Silverplatter's Database Networking Solution.

    ERIC Educational Resources Information Center

    Millea, Megan

    Silverplatter's Electronic Reference Library (ERL) provides wide area network access to its databases using TCP/IP communications and client-server architecture. ERL has two main components: The ERL clients (retrieval interface) and the ERL server (search engines). ERL clients provide patrons with seamless access to multiple databases on multiple…

  18. Development and applications of the EntomopathogenID MLSA database for use in agricultural systems

    USDA-ARS?s Scientific Manuscript database

    The current study reports the development and application of a publicly accessible, curated database of Hypocrealean entomopathogenic fungi sequence data. The goal was to provide a platform for users to easily access sequence data from reference strains. The database can be used to accurately identi...

  19. 48 CFR 504.602-71 - Federal Procurement Data System-Public access to data.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... Procurement Data System—Public access to data. (a) The FPDS database. The General Services Administration awarded a contract for creation and operation of the Federal Procurement Data System (FPDS) database. That database includes information reported by departments and agencies as required by Federal Acquisition...

  20. 48 CFR 504.602-71 - Federal Procurement Data System-Public access to data.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... Procurement Data System—Public access to data. (a) The FPDS database. The General Services Administration awarded a contract for creation and operation of the Federal Procurement Data System (FPDS) database. That database includes information reported by departments and agencies as required by Federal Acquisition...

  1. An SQL query generator for CLIPS

    NASA Technical Reports Server (NTRS)

    Snyder, James; Chirica, Laurian

    1990-01-01

    As expert systems become more widely used, their access to large amounts of external information becomes increasingly important. This information exists in several forms such as statistical, tabular data, knowledge gained by experts and large databases of information maintained by companies. Because many expert systems, including CLIPS, do not provide access to this external information, much of the usefulness of expert systems is left untapped. The scope of this paper is to describe a database extension for the CLIPS expert system shell. The current industry standard database language is SQL. Due to SQL standardization, large amounts of information stored on various computers, potentially at different locations, will be more easily accessible. Expert systems should be able to directly access these existing databases rather than requiring information to be re-entered into the expert system environment. The ORACLE relational database management system (RDBMS) was used to provide a database connection within the CLIPS environment. To facilitate relational database access a query generation system was developed as a CLIPS user function. The queries are entered in a CLlPS-like syntax and are passed to the query generator, which constructs and submits for execution, an SQL query to the ORACLE RDBMS. The query results are asserted as CLIPS facts. The query generator was developed primarily for use within the ICADS project (Intelligent Computer Aided Design System) currently being developed by the CAD Research Unit in the California Polytechnic State University (Cal Poly). In ICADS, there are several parallel or distributed expert systems accessing a common knowledge base of facts. Expert system has a narrow domain of interest and therefore needs only certain portions of the information. The query generator provides a common method of accessing this information and allows the expert system to specify what data is needed without specifying how to retrieve it.

  2. Web client and ODBC access to legacy database information: a low cost approach.

    PubMed Central

    Sanders, N. W.; Mann, N. H.; Spengler, D. M.

    1997-01-01

    A new method has been developed for the Department of Orthopaedics of Vanderbilt University Medical Center to access departmental clinical data. Previously this data was stored only in the medical center's mainframe DB2 database, it is now additionally stored in a departmental SQL database. Access to this data is available via any ODBC compliant front-end or a web client. With a small budget and no full time staff, we were able to give our department on-line access to many years worth of patient data that was previously inaccessible. PMID:9357735

  3. High-Performance Secure Database Access Technologies for HEP Grids

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Matthew Vranicar; John Weicher

    2006-04-17

    The Large Hadron Collider (LHC) at the CERN Laboratory will become the largest scientific instrument in the world when it starts operations in 2007. Large Scale Analysis Computer Systems (computational grids) are required to extract rare signals of new physics from petabytes of LHC detector data. In addition to file-based event data, LHC data processing applications require access to large amounts of data in relational databases: detector conditions, calibrations, etc. U.S. high energy physicists demand efficient performance of grid computing applications in LHC physics research where world-wide remote participation is vital to their success. To empower physicists with data-intensive analysismore » capabilities a whole hyperinfrastructure of distributed databases cross-cuts a multi-tier hierarchy of computational grids. The crosscutting allows separation of concerns across both the global environment of a federation of computational grids and the local environment of a physicist’s computer used for analysis. Very few efforts are on-going in the area of database and grid integration research. Most of these are outside of the U.S. and rely on traditional approaches to secure database access via an extraneous security layer separate from the database system core, preventing efficient data transfers. Our findings are shared by the Database Access and Integration Services Working Group of the Global Grid Forum, who states that "Research and development activities relating to the Grid have generally focused on applications where data is stored in files. However, in many scientific and commercial domains, database management systems have a central role in data storage, access, organization, authorization, etc, for numerous applications.” There is a clear opportunity for a technological breakthrough, requiring innovative steps to provide high-performance secure database access technologies for grid computing. We believe that an innovative database architecture where the secure authorization is pushed into the database engine will eliminate inefficient data transfer bottlenecks. Furthermore, traditionally separated database and security layers provide an extra vulnerability, leaving a weak clear-text password authorization as the only protection on the database core systems. Due to the legacy limitations of the systems’ security models, the allowed passwords often can not even comply with the DOE password guideline requirements. We see an opportunity for the tight integration of the secure authorization layer with the database server engine resulting in both improved performance and improved security. Phase I has focused on the development of a proof-of-concept prototype using Argonne National Laboratory’s (ANL) Argonne Tandem-Linac Accelerator System (ATLAS) project as a test scenario. By developing a grid-security enabled version of the ATLAS project’s current relation database solution, MySQL, PIOCON Technologies aims to offer a more efficient solution to secure database access.« less

  4. Scalable Video Transmission Over Multi-Rate Multiple Access Channels

    DTIC Science & Technology

    2007-06-01

    Rate - compatible punctured convolutional codes (RCPC codes ) and their ap- plications,” IEEE...source encoded using the MPEG-4 video codec. The source encoded bitstream is then channel encoded with Rate Compatible Punctured Convolutional (RCPC...Clark, and J. M. Geist, “ Punctured convolutional codes or rate (n-1)/n and simplified maximum likelihood decoding,” IEEE Transactions on

  5. A Scalable Model for Channel Access Protocols in Multihop Ad Hoc Networks

    DTIC Science & Technology

    2004-01-01

    among the nodes. Gitman [28] published what is arguably the first paper that actually dealt with a multihop system. Gitman con- sidered a two-hop...Wireless Information Networks, vol. 9, no. 3, pp. 191–199, July 2002. [28] I. Gitman , “On the capacity of slotted ALOHA networks and some desigh

  6. Factors Associated with Choice of Web or Print Intervention Materials in the Healthy Directions 2 Study

    ERIC Educational Resources Information Center

    Greaney, Mary L.; Puleo, Elaine; Bennett, Gary G.; Haines, Jess; Viswanath, K.; Gillman, Matthew W.; Sprunck-Harrild, Kim; Coeling, Molly; Rusinak, Donna; Emmons, Karen M.

    2014-01-01

    Background: Many U.S. adults have multiple behavioral risk factors, and effective, scalable interventions are needed to promote population-level health. In the health care setting, interventions are often provided in print, although accessible to nearly everyone, are brief (e.g., pamphlets), are not interactive, and can require some logistics…

  7. Scalable Earth-observation Analytics for Geoscientists: Spacetime Extensions to the Array Database SciDB

    NASA Astrophysics Data System (ADS)

    Appel, Marius; Lahn, Florian; Pebesma, Edzer; Buytaert, Wouter; Moulds, Simon

    2016-04-01

    Today's amount of freely available data requires scientists to spend large parts of their work on data management. This is especially true in environmental sciences when working with large remote sensing datasets, such as obtained from earth-observation satellites like the Sentinel fleet. Many frameworks like SpatialHadoop or Apache Spark address the scalability but target programmers rather than data analysts, and are not dedicated to imagery or array data. In this work, we use the open-source data management and analytics system SciDB to bring large earth-observation datasets closer to analysts. Its underlying data representation as multidimensional arrays fits naturally to earth-observation datasets, distributes storage and computational load over multiple instances by multidimensional chunking, and also enables efficient time-series based analyses, which is usually difficult using file- or tile-based approaches. Existing interfaces to R and Python furthermore allow for scalable analytics with relatively little learning effort. However, interfacing SciDB and file-based earth-observation datasets that come as tiled temporal snapshots requires a lot of manual bookkeeping during ingestion, and SciDB natively only supports loading data from CSV-like and custom binary formatted files, which currently limits its practical use in earth-observation analytics. To make it easier to work with large multi-temporal datasets in SciDB, we developed software tools that enrich SciDB with earth observation metadata and allow working with commonly used file formats: (i) the SciDB extension library scidb4geo simplifies working with spatiotemporal arrays by adding relevant metadata to the database and (ii) the Geospatial Data Abstraction Library (GDAL) driver implementation scidb4gdal allows to ingest and export remote sensing imagery from and to a large number of file formats. Using added metadata on temporal resolution and coverage, the GDAL driver supports time-based ingestion of imagery to existing multi-temporal SciDB arrays. While our SciDB plugin works directly in the database, the GDAL driver has been specifically developed using a minimum amount of external dependencies (i.e. CURL). Source code for both tools is available from github [1]. We present these tools in a case-study that demonstrates the ingestion of multi-temporal tiled earth-observation data to SciDB, followed by a time-series analysis using R and SciDBR. Through the exclusive use of open-source software, our approach supports reproducibility in scalable large-scale earth-observation analytics. In the future, these tools can be used in an automated way to let scientists only work on ready-to-use SciDB arrays to significantly reduce the data management workload for domain scientists. [1] https://github.com/mappl/scidb4geo} and \\url{https://github.com/mappl/scidb4gdal

  8. Multi-views storage model and access methods of conversation history in converged IP messaging system

    NASA Astrophysics Data System (ADS)

    Lu, Meilian; Yang, Dong; Zhou, Xing

    2013-03-01

    Based on the analysis of the requirements of conversation history storage in CPM (Converged IP Messaging) system, a Multi-views storage model and access methods of conversation history are proposed. The storage model separates logical views from physical storage and divides the storage into system managed region and user managed region. It simultaneously supports conversation view, system pre-defined view and user-defined view of storage. The rationality and feasibility of multi-view presentation, the physical storage model and access methods are validated through the implemented prototype. It proves that, this proposal has good scalability, which will help to optimize the physical data storage structure and improve storage performance.

  9. Broadband and scalable mobile satellite communication system for future access networks

    NASA Astrophysics Data System (ADS)

    Ohata, Kohei; Kobayashi, Kiyoshi; Nakahira, Katsuya; Ueba, Masazumi

    2005-07-01

    Due to the recent market trends, NTT has begun research into next generation satellite communication systems, such as broadband and scalable mobile communication systems. One service application objective is to provide broadband Internet access for transportation systems, temporal broadband access networks and telemetries to remote areas. While these are niche markets the total amount of capacity should be significant. We set a 1-Gb/s total transmission capacity as our goal. Our key concern is the system cost, which means that the system should be unified system with diversified services and not tailored for each application. As satellites account for a large portion of the total system cost, we set the target satellite size as a small, one-ton class dry mass with a 2-kW class payload power. In addition to the payload power and weight, the mobile satellite's frequency band is extremely limited. Therefore, we need to develop innovative technologies that will reduce the weight and maximize spectrum and power efficiency. Another challenge is the need for the system to handle up to 50 dB and a wide data rate range of other applications. This paper describes the key communication system technologies; the frequency reuse strategy, multiplexing scheme, resource allocation scheme, and QoS management algorithm to ensure excellent spectrum efficiency and support a variety of services and quality requirements in the mobile environment.

  10. Streamlining Collaboration for the Gravitational-wave Astronomy Community

    NASA Astrophysics Data System (ADS)

    Koranda, S.

    2016-12-01

    In the morning hours of September 14, 2015 the LaserInterferometer Gravitational-wave Observatory (LIGO) directlydetected gravitational waves from inspiraling and coalescingblack holes, confirming a major prediction of AlbertEinstein's general theory of relativity and beginning the eraof gravitational-wave astronomy. With the LIGO detectors in the United States, the Virgo andGEO detectors in Europe, and the KAGRA detector in Japan thegravitational-wave astrononmy community is opening a newwindow on our Universe. Realizing the full science potentialof LIGO and the other interferometers requires globalcollaboration not only within the gravitational-wave astronomycommunity but also with the astronomers and astrophysicists acrossmultipe disciplines working to realize and leverage the powerof multi-messenger astronomy. Enabling thousands of researchers from around the world andacross multiple projects to efficiently collaborate, share,and analyze data and provide streamlined access to services,computing, and tools requires new and scalable approaches toidentity and access management (IAM). We will discuss LIGO'sIAM journey that began in 2007 and how today LIGO leveragesinternal identity federations like InCommon and eduGAIN toprovide scalable and managed access for the gravitational-waveastronomy community. We will discuss the steps both largeand small research organizations and projects take as theirIAM infrastructure matures from ad-hoc silos of independent services to fully integrated and federated services thatstreamline collaboration so that scientists can focus onresearch and not managing passwords.

  11. The Efficiency and the Scalability of an Explicit Operator on an IBM POWER4 System

    NASA Technical Reports Server (NTRS)

    Frumkin, Michael; Biegel, Bryan A. (Technical Monitor)

    2002-01-01

    We present an evaluation of the efficiency and the scalability of an explicit CFD operator on an IBM POWER4 system. The POWER4 architecture exhibits a common trend in HPC architectures: boosting CPU processing power by increasing the number of functional units, while hiding the latency of memory access by increasing the depth of the memory hierarchy. The overall machine performance depends on the ability of the caches-buses-fabric-memory to feed the functional units with the data to be processed. In this study we evaluate the efficiency and scalability of one explicit CFD operator on an IBM POWER4. This operator performs computations at the points of a Cartesian grid and involves a few dozen floating point numbers and on the order of 100 floating point operations per grid point. The computations in all grid points are independent. Specifically, we estimate the efficiency of the RHS operator (SP of NPB) on a single processor as the observed/peak performance ratio. Then we estimate the scalability of the operator on a single chip (2 CPUs), a single MCM (8 CPUs), 16 CPUs, and the whole machine (32 CPUs). Then we perform the same measurements for a chache-optimized version of the RHS operator. For our measurements we use the HPM (Hardware Performance Monitor) counters available on the POWER4. These counters allow us to analyze the obtained performance results.

  12. Understanding the patient perspective on research access to national health records databases for conduct of randomized registry trials.

    PubMed

    Avram, Robert; Marquis-Gravel, Guillaume; Simard, François; Pacheco, Christine; Couture, Étienne; Tremblay-Gravel, Maxime; Desplantie, Olivier; Malhamé, Isabelle; Bibas, Lior; Mansour, Samer; Parent, Marie-Claude; Farand, Paul; Harvey, Luc; Lessard, Marie-Gabrielle; Ly, Hung; Liu, Geoffrey; Hay, Annette E; Marc Jolicoeur, E

    2018-07-01

    Use of health administrative databases is proposed for screening and monitoring of participants in randomized registry trials. However, access to these databases raises privacy concerns. We assessed patient's preferences regarding use of personal information to link their research records with national health databases, as part of a hypothetical randomized registry trial. Cardiology patients were invited to complete an anonymous self-reported survey that ascertained preferences related to the concept of accessing government health databases for research, the type of personal identifiers to be shared and the type of follow-up preferred as participants in a hypothetical trial. A total of 590 responders completed the survey (90% response rate), the majority of which were Caucasians (90.4%), male (70.0%) with a median age of 65years (interquartile range, 8). The majority responders (80.3%) would grant researchers access to health administrative databases for screening and follow-up. To this end, responders endorsed the recording of their personal identifiers by researchers for future record linkage, including their name (90%), and health insurance number (83.9%), but fewer responders agreed with the recording of their social security number (61.4%, p<0.05 with date of birth as reference). Prior participation in a trial predicted agreement for granting researchers access to the administrative databases (OR: 1.69, 95% confidence interval: 1.03-2.90; p=0.04). The majority of Cardiology patients surveyed were supportive of use of their personal identifiers to access administrative health databases and conduct long-term monitoring in the context of a randomized registry trial. Copyright © 2018 Elsevier Ireland Ltd. All rights reserved.

  13. Performance analysis of data intensive cloud systems based on data management and replication: a survey

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Malik, Saif Ur Rehman; Khan, Samee U.; Ewen, Sam J.

    2015-03-14

    As we delve deeper into the ‘Digital Age’, we witness an explosive growth in the volume, velocity, and variety of the data available on the Internet. For example, in 2012 about 2.5 quintillion bytes of data was created on a daily basis that originated from myriad of sources and applications including mobiledevices, sensors, individual archives, social networks, Internet of Things, enterprises, cameras, software logs, etc. Such ‘Data Explosions’ has led to one of the most challenging research issues of the current Information and Communication Technology era: how to optimally manage (e.g., store, replicated, filter, and the like) such large amountmore » of data and identify new ways to analyze large amounts of data for unlocking information. It is clear that such large data streams cannot be managed by setting up on-premises enterprise database systems as it leads to a large up-front cost in buying and administering the hardware and software systems. Therefore, next generation data management systems must be deployed on cloud. The cloud computing paradigm provides scalable and elastic resources, such as data and services accessible over the Internet Every Cloud Service Provider must assure that data is efficiently processed and distributed in a way that does not compromise end-users’ Quality of Service (QoS) in terms of data availability, data search delay, data analysis delay, and the like. In the aforementioned perspective, data replication is used in the cloud for improving the performance (e.g., read and write delay) of applications that access data. Through replication a data intensive application or system can achieve high availability, better fault tolerance, and data recovery. In this paper, we survey data management and replication approaches (from 2007 to 2011) that are developed by both industrial and research communities. The focus of the survey is to discuss and characterize the existing approaches of data replication and management that tackle the resource usage and QoS provisioning with different levels of efficiencies. Moreover, the breakdown of both influential expressions (data replication and management) to provide different QoS attributes is deliberated. Furthermore, the performance advantages and disadvantages of data replication and management approaches in the cloud computing environments are analyzed. Open issues and future challenges related to data consistency, scalability, load balancing, processing and placement are also reported.« less

  14. The PMDB Protein Model Database

    PubMed Central

    Castrignanò, Tiziana; De Meo, Paolo D'Onorio; Cozzetto, Domenico; Talamo, Ivano Giuseppe; Tramontano, Anna

    2006-01-01

    The Protein Model Database (PMDB) is a public resource aimed at storing manually built 3D models of proteins. The database is designed to provide access to models published in the scientific literature, together with validating experimental data. It is a relational database and it currently contains >74 000 models for ∼240 proteins. The system is accessible at and allows predictors to submit models along with related supporting evidence and users to download them through a simple and intuitive interface. Users can navigate in the database and retrieve models referring to the same target protein or to different regions of the same protein. Each model is assigned a unique identifier that allows interested users to directly access the data. PMID:16381873

  15. Pace: Privacy-Protection for Access Control Enforcement in P2P Networks

    NASA Astrophysics Data System (ADS)

    Sánchez-Artigas, Marc; García-López, Pedro

    In open environments such as peer-to-peer (P2P) systems, the decision to collaborate with multiple users — e.g., by granting access to a resource — is hard to achieve in practice due to extreme decentralization and the lack of trusted third parties. The literature contains a plethora of applications in which a scalable solution for distributed access control is crucial. This fact motivates us to propose a protocol to enforce access control, applicable to networks consisting entirely of untrusted nodes. The main feature of our protocol is that it protects both sensitive permissions and sensitive policies, and does not rely on any centralized authority. We analyze the efficiency (computational effort and communication overhead) as well as the security of our protocol.

  16. [The database server for the medical bibliography database at Charles University].

    PubMed

    Vejvalka, J; Rojíková, V; Ulrych, O; Vorísek, M

    1998-01-01

    In the medical community, bibliographic databases are widely accepted as a most important source of information both for theoretical and clinical disciplines. To improve access to medical bibliographic databases at Charles University, a database server (ERL by Silver Platter) was set up at the 2nd Faculty of Medicine in Prague. The server, accessible by Internet 24 hours/7 days, hosts now 14 years' MEDLINE and 10 years' EMBASE Paediatrics. Two different strategies are available for connecting to the server: a specialized client program that communicates over the Internet (suitable for professional searching) and a web-based access that requires no specialized software (except the WWW browser) on the client side. The server is now offered to academic community to host further databases, possibly subscribed by consortia whose individual members would not subscribe them by themselves.

  17. Database citation in supplementary data linked to Europe PubMed Central full text biomedical articles.

    PubMed

    Kafkas, Şenay; Kim, Jee-Hyub; Pi, Xingjun; McEntyre, Johanna R

    2015-01-01

    In this study, we present an analysis of data citation practices in full text research articles and their corresponding supplementary data files, made available in the Open Access set of articles from Europe PubMed Central. Our aim is to investigate whether supplementary data files should be considered as a source of information for integrating the literature with biomolecular databases. Using text-mining methods to identify and extract a variety of core biological database accession numbers, we found that the supplemental data files contain many more database citations than the body of the article, and that those citations often take the form of a relatively small number of articles citing large collections of accession numbers in text-based files. Moreover, citation of value-added databases derived from submission databases (such as Pfam, UniProt or Ensembl) is common, demonstrating the reuse of these resources as datasets in themselves. All the database accession numbers extracted from the supplementary data are publicly accessible from http://dx.doi.org/10.5281/zenodo.11771. Our study suggests that supplementary data should be considered when linking articles with data, in curation pipelines, and in information retrieval tasks in order to make full use of the entire research article. These observations highlight the need to improve the management of supplemental data in general, in order to make this information more discoverable and useful.

  18. 17 CFR 162.3 - Affiliate marketing opt out and exceptions.

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... places that information into a common database that the covered affiliate may access. (3) Service... maintains or accesses a common database that the covered affiliate may access) receives eligibility... the notice and opt-out provisions under other privacy rules under the FCRA, the GLB Act or the CEA. ...

  19. 17 CFR 162.3 - Affiliate marketing opt out and exceptions.

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... places that information into a common database that the covered affiliate may access. (3) Service... maintains or accesses a common database that the covered affiliate may access) receives eligibility... the notice and opt-out provisions under other privacy rules under the FCRA, the GLB Act or the CEA. ...

  20. 17 CFR 162.3 - Affiliate marketing opt out and exceptions.

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... places that information into a common database that the covered affiliate may access. (3) Service... maintains or accesses a common database that the covered affiliate may access) receives eligibility... the notice and opt-out provisions under other privacy rules under the FCRA, the GLB Act or the CEA. ...

  1. EROS Main Image File: A Picture Perfect Database for Landsat Imagery and Aerial Photography.

    ERIC Educational Resources Information Center

    Jack, Robert F.

    1984-01-01

    Describes Earth Resources Observation System online database, which provides access to computerized images of Earth obtained via satellite. Highlights include retrieval system and commands, types of images, search strategies, other online functions, and interpretation of accessions. Satellite information, sources and samples of accessions, and…

  2. Forest Stand Segmentation Using Airborne LIDAR Data and Very High Resolution Multispectral Imagery

    NASA Astrophysics Data System (ADS)

    Dechesne, Clément; Mallet, Clément; Le Bris, Arnaud; Gouet, Valérie; Hervieu, Alexandre

    2016-06-01

    Forest stands are the basic units for forest inventory and mapping. Stands are large forested areas (e.g., ≥ 2 ha) of homogeneous tree species composition. The accurate delineation of forest stands is usually performed by visual analysis of human operators on very high resolution (VHR) optical images. This work is highly time consuming and should be automated for scalability purposes. In this paper, a method based on the fusion of airborne laser scanning data (or lidar) and very high resolution multispectral imagery for automatic forest stand delineation and forest land-cover database update is proposed. The multispectral images give access to the tree species whereas 3D lidar point clouds provide geometric information on the trees. Therefore, multi-modal features are computed, both at pixel and object levels. The objects are individual trees extracted from lidar data. A supervised classification is performed at the object level on the computed features in order to coarsely discriminate the existing tree species in the area of interest. The analysis at tree level is particularly relevant since it significantly improves the tree species classification. A probability map is generated through the tree species classification and inserted with the pixel-based features map in an energetical framework. The proposed energy is then minimized using a standard graph-cut method (namely QPBO with α-expansion) in order to produce a segmentation map with a controlled level of details. Comparison with an existing forest land cover database shows that our method provides satisfactory results both in terms of stand labelling and delineation (matching ranges between 94% and 99%).

  3. Development of noSQL data storage for the ATLAS PanDA Monitoring System

    NASA Astrophysics Data System (ADS)

    Potekhin, M.; ATLAS Collaboration

    2012-06-01

    For several years the PanDA Workload Management System has been the basis for distributed production and analysis for the ATLAS experiment at the LHC. Since the start of data taking PanDA usage has ramped up steadily, typically exceeding 500k completed jobs/day by June 2011. The associated monitoring data volume has been rising as well, to levels that present a new set of challenges in the areas of database scalability and monitoring system performance and efficiency. These challenges are being met with a R&D effort aimed at implementing a scalable and efficient monitoring data storage based on a noSQL solution (Cassandra). We present our motivations for using this technology, as well as data design and the techniques used for efficient indexing of the data. We also discuss the hardware requirements as they were determined by testing with actual data and realistic loads.

  4. The BioIntelligence Framework: a new computational platform for biomedical knowledge computing.

    PubMed

    Farley, Toni; Kiefer, Jeff; Lee, Preston; Von Hoff, Daniel; Trent, Jeffrey M; Colbourn, Charles; Mousses, Spyro

    2013-01-01

    Breakthroughs in molecular profiling technologies are enabling a new data-intensive approach to biomedical research, with the potential to revolutionize how we study, manage, and treat complex diseases. The next great challenge for clinical applications of these innovations will be to create scalable computational solutions for intelligently linking complex biomedical patient data to clinically actionable knowledge. Traditional database management systems (DBMS) are not well suited to representing complex syntactic and semantic relationships in unstructured biomedical information, introducing barriers to realizing such solutions. We propose a scalable computational framework for addressing this need, which leverages a hypergraph-based data model and query language that may be better suited for representing complex multi-lateral, multi-scalar, and multi-dimensional relationships. We also discuss how this framework can be used to create rapid learning knowledge base systems to intelligently capture and relate complex patient data to biomedical knowledge in order to automate the recovery of clinically actionable information.

  5. Microcomputer-Based Access to Machine-Readable Numeric Databases.

    ERIC Educational Resources Information Center

    Wenzel, Patrick

    1988-01-01

    Describes the use of microcomputers and relational database management systems to improve access to numeric databases by the Data and Program Library Service at the University of Wisconsin. The internal records management system, in-house reference tools, and plans to extend these tools to the entire campus are discussed. (3 references) (CLB)

  6. Ionic Liquids Database- (ILThermo)

    National Institute of Standards and Technology Data Gateway

    SRD 147 NIST Ionic Liquids Database- (ILThermo) (Web, free access)   IUPAC Ionic Liquids Database, ILThermo, is a free web research tool that allows users worldwide to access an up-to-date data collection from the publications on experimental investigations of thermodynamic, and transport properties of ionic liquids as well as binary and ternary mixtures containing ionic liquids.

  7. FirstSearch and NetFirst--Web and Dial-up Access: Plus Ca Change, Plus C'est la Meme Chose?

    ERIC Educational Resources Information Center

    Koehler, Wallace; Mincey, Danielle

    1996-01-01

    Compares and evaluates the differences between OCLC's dial-up and World Wide Web FirstSearch access methods and their interfaces with the underlying databases. Also examines NetFirst, OCLC's new Internet catalog, the only Internet tracking database from a "traditional" database service. (Author/PEN)

  8. 48 CFR 504.605-70 - Federal Procurement Data System-Public access to data.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... Procurement Data System—Public access to data. (a) The FPDS database. The General Services Administration awarded a contract for creation and operation of the Federal Procurement Data System (FPDS) database. That database includes information reported by departments and agencies as required by FAR subpart 4.6. One of...

  9. 48 CFR 504.605-70 - Federal Procurement Data System-Public access to data.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... Procurement Data System—Public access to data. (a) The FPDS database. The General Services Administration awarded a contract for creation and operation of the Federal Procurement Data System (FPDS) database. That database includes information reported by departments and agencies as required by FAR subpart 4.6. One of...

  10. 48 CFR 504.605-70 - Federal Procurement Data System-Public access to data.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... Procurement Data System—Public access to data. (a) The FPDS database. The General Services Administration awarded a contract for creation and operation of the Federal Procurement Data System (FPDS) database. That database includes information reported by departments and agencies as required by FAR subpart 4.6. One of...

  11. Tao of Gateway: Providing Internet Access to Licensed Databases.

    ERIC Educational Resources Information Center

    McClellan, Gregory A.; Garrison, William V.

    1997-01-01

    Illustrates an approach for providing networked access to licensed databases over the Internet by positioning the library between patron and vendor. Describes how the gateway systems and database connection servers work and discusses how treatment of security has evolved with the introduction of the World Wide Web. Outlines plans to reimplement…

  12. Web Database Development: Implications for Academic Publishing.

    ERIC Educational Resources Information Center

    Fernekes, Bob

    This paper discusses the preliminary planning, design, and development of a pilot project to create an Internet accessible database and search tool for locating and distributing company data and scholarly work. Team members established four project objectives: (1) to develop a Web accessible database and decision tool that creates Web pages on the…

  13. Ocean Drilling Program: Web Site Access Statistics

    Science.gov Websites

    and products Drilling services and tools Online Janus database Search the ODP/TAMU web site ODP's main See statistics for JOIDES members. See statistics for Janus database. 1997 October November December accessible only on www-odp.tamu.edu. ** End of ODP, start of IODP. Privacy Policy ODP | Search | Database

  14. Windows on the brain: the emerging role of atlases and databases in neuroscience

    NASA Technical Reports Server (NTRS)

    Van Essen, David C.; VanEssen, D. C. (Principal Investigator)

    2002-01-01

    Brain atlases and associated databases have great potential as gateways for navigating, accessing, and visualizing a wide range of neuroscientific data. Recent progress towards realizing this potential includes the establishment of probabilistic atlases, surface-based atlases and associated databases, combined with improvements in visualization capabilities and internet access.

  15. Pan European Phenological database (PEP725): a single point of access for European data.

    PubMed

    Templ, Barbara; Koch, Elisabeth; Bolmgren, Kjell; Ungersböck, Markus; Paul, Anita; Scheifinger, Helfried; Rutishauser, This; Busto, Montserrat; Chmielewski, Frank-M; Hájková, Lenka; Hodzić, Sabina; Kaspar, Frank; Pietragalla, Barbara; Romero-Fresneda, Ramiro; Tolvanen, Anne; Vučetič, Višnja; Zimmermann, Kirsten; Zust, Ana

    2018-06-01

    The Pan European Phenology (PEP) project is a European infrastructure to promote and facilitate phenological research, education, and environmental monitoring. The main objective is to maintain and develop a Pan European Phenological database (PEP725) with an open, unrestricted data access for science and education. PEP725 is the successor of the database developed through the COST action 725 "Establishing a European phenological data platform for climatological applications" working as a single access point for European-wide plant phenological data. So far, 32 European meteorological services and project partners from across Europe have joined and supplied data collected by volunteers from 1868 to the present for the PEP725 database. Most of the partners actively provide data on a regular basis. The database presently holds almost 12 million records, about 46 growing stages and 265 plant species (including cultivars), and can be accessed via http://www.pep725.eu/ . Users of the PEP725 database have studied a diversity of topics ranging from climate change impact, plant physiological question, phenological modeling, and remote sensing of vegetation to ecosystem productivity.

  16. Pan European Phenological database (PEP725): a single point of access for European data

    NASA Astrophysics Data System (ADS)

    Templ, Barbara; Koch, Elisabeth; Bolmgren, Kjell; Ungersböck, Markus; Paul, Anita; Scheifinger, Helfried; Rutishauser, This; Busto, Montserrat; Chmielewski, Frank-M.; Hájková, Lenka; Hodzić, Sabina; Kaspar, Frank; Pietragalla, Barbara; Romero-Fresneda, Ramiro; Tolvanen, Anne; Vučetič, Višnja; Zimmermann, Kirsten; Zust, Ana

    2018-02-01

    The Pan European Phenology (PEP) project is a European infrastructure to promote and facilitate phenological research, education, and environmental monitoring. The main objective is to maintain and develop a Pan European Phenological database (PEP725) with an open, unrestricted data access for science and education. PEP725 is the successor of the database developed through the COST action 725 "Establishing a European phenological data platform for climatological applications" working as a single access point for European-wide plant phenological data. So far, 32 European meteorological services and project partners from across Europe have joined and supplied data collected by volunteers from 1868 to the present for the PEP725 database. Most of the partners actively provide data on a regular basis. The database presently holds almost 12 million records, about 46 growing stages and 265 plant species (including cultivars), and can be accessed via http://www.pep725.eu/. Users of the PEP725 database have studied a diversity of topics ranging from climate change impact, plant physiological question, phenological modeling, and remote sensing of vegetation to ecosystem productivity.

  17. The CoFactor database: organic cofactors in enzyme catalysis.

    PubMed

    Fischer, Julia D; Holliday, Gemma L; Thornton, Janet M

    2010-10-01

    Organic enzyme cofactors are involved in many enzyme reactions. Therefore, the analysis of cofactors is crucial to gain a better understanding of enzyme catalysis. To aid this, we have created the CoFactor database. CoFactor provides a web interface to access hand-curated data extracted from the literature on organic enzyme cofactors in biocatalysis, as well as automatically collected information. CoFactor includes information on the conformational and solvent accessibility variation of the enzyme-bound cofactors, as well as mechanistic and structural information about the hosting enzymes. The database is publicly available and can be accessed at http://www.ebi.ac.uk/thornton-srv/databases/CoFactor.

  18. Foot and Ankle Fellowship Websites: An Assessment of Accessibility and Quality.

    PubMed

    Hinds, Richard M; Danna, Natalie R; Capo, John T; Mroczek, Kenneth J

    2017-08-01

    The Internet has been reported to be the first informational resource for many fellowship applicants. The objective of this study was to assess the accessibility of orthopaedic foot and ankle fellowship websites and to evaluate the quality of information provided via program websites. The American Orthopaedic Foot and Ankle Society (AOFAS) and the Fellowship and Residency Electronic Interactive Database (FREIDA) fellowship databases were accessed to generate a comprehensive list of orthopaedic foot and ankle fellowship programs. The databases were reviewed for links to fellowship program websites and compared with program websites accessed from a Google search. Accessible fellowship websites were then analyzed for the quality of recruitment and educational content pertinent to fellowship applicants. Forty-seven orthopaedic foot and ankle fellowship programs were identified. The AOFAS database featured direct links to 7 (15%) fellowship websites with the independent Google search yielding direct links to 29 (62%) websites. No direct website links were provided in the FREIDA database. Thirty-six accessible websites were analyzed for content. Program websites featured a mean 44% (range = 5% to 75%) of the total assessed content. The most commonly presented recruitment and educational content was a program description (94%) and description of fellow operative experience (83%), respectively. There is substantial variability in the accessibility and quality of orthopaedic foot and ankle fellowship websites. Recognition of deficits in accessibility and content quality may assist foot and ankle fellowships in improving program information online. Level IV.

  19. Experience with ATLAS MySQL PanDA database service

    NASA Astrophysics Data System (ADS)

    Smirnov, Y.; Wlodek, T.; De, K.; Hover, J.; Ozturk, N.; Smith, J.; Wenaus, T.; Yu, D.

    2010-04-01

    The PanDA distributed production and analysis system has been in production use for ATLAS data processing and analysis since late 2005 in the US, and globally throughout ATLAS since early 2008. Its core architecture is based on a set of stateless web services served by Apache and backed by a suite of MySQL databases that are the repository for all PanDA information: active and archival job queues, dataset and file catalogs, site configuration information, monitoring information, system control parameters, and so on. This database system is one of the most critical components of PanDA, and has successfully delivered the functional and scaling performance required by PanDA, currently operating at a scale of half a million jobs per week, with much growth still to come. In this paper we describe the design and implementation of the PanDA database system, its architecture of MySQL servers deployed at BNL and CERN, backup strategy and monitoring tools. The system has been developed, thoroughly tested, and brought to production to provide highly reliable, scalable, flexible and available database services for ATLAS Monte Carlo production, reconstruction and physics analysis.

  20. Scalable total synthesis and comprehensive structure-activity relationship studies of the phytotoxin coronatine.

    PubMed

    Littleson, Mairi M; Baker, Christopher M; Dalençon, Anne J; Frye, Elizabeth C; Jamieson, Craig; Kennedy, Alan R; Ling, Kenneth B; McLachlan, Matthew M; Montgomery, Mark G; Russell, Claire J; Watson, Allan J B

    2018-03-16

    Natural phytotoxins are valuable starting points for agrochemical design. Acting as a jasmonate agonist, coronatine represents an attractive herbicidal lead with novel mode of action, and has been an important synthetic target for agrochemical development. However, both restricted access to quantities of coronatine and a lack of a suitably scalable and flexible synthetic approach to its constituent natural product components, coronafacic and coronamic acids, has frustrated development of this target. Here, we report gram-scale production of coronafacic acid that allows a comprehensive structure-activity relationship study of this target. Biological assessment of a >120 member library combined with computational studies have revealed the key determinants of potency, rationalising hypotheses held for decades, and allowing future rational design of new herbicidal leads based on this template.

  1. A complexity-scalable software-based MPEG-2 video encoder.

    PubMed

    Chen, Guo-bin; Lu, Xin-ning; Wang, Xing-guo; Liu, Ji-lin

    2004-05-01

    With the development of general-purpose processors (GPP) and video signal processing algorithms, it is possible to implement a software-based real-time video encoder on GPP, and its low cost and easy upgrade attract developers' interests to transfer video encoding from specialized hardware to more flexible software. In this paper, the encoding structure is set up first to support complexity scalability; then a lot of high performance algorithms are used on the key time-consuming modules in coding process; finally, at programming level, processor characteristics are considered to improve data access efficiency and processing parallelism. Other programming methods such as lookup table are adopted to reduce the computational complexity. Simulation results showed that these ideas could not only improve the global performance of video coding, but also provide great flexibility in complexity regulation.

  2. Doing One Thing Well: Leveraging Microservices for NASA Earth Science Discovery and Access Across Heterogenous Data Sources

    NASA Astrophysics Data System (ADS)

    Baynes, K.; Gilman, J.; Pilone, D.; Mitchell, A. E.

    2015-12-01

    The NASA EOSDIS (Earth Observing System Data and Information System) Common Metadata Repository (CMR) is a continuously evolving metadata system that merges all existing capabilities and metadata from EOS ClearingHOuse (ECHO) and the Global Change Master Directory (GCMD) systems. This flagship catalog has been developed with several key requirements: fast search and ingest performance ability to integrate heterogenous external inputs and outputs high availability and resiliency scalability evolvability and expandability This talk will focus on the advantages and potential challenges of tackling these requirements using a microservices architecture, which decomposes system functionality into smaller, loosely-coupled, individually-scalable elements that communicate via well-defined APIs. In addition, time will be spent examining specific elements of the CMR architecture and identifying opportunities for future integrations.

  3. A Generic Ground Framework for Image Expertise Centres and Small-Sized Production Centres

    NASA Astrophysics Data System (ADS)

    Sellé, A.

    2009-05-01

    Initiated by the Pleiadas Earth Observation Program, the CNES (French Space Agency) has developed a generic collaborative framework for its image quality centre, highly customisable for any upcoming expertise centre. This collaborative framework has been design to be used by a group of experts or scientists that want to share data and processings and manage interfaces with external entities. Its flexible and scalable architecture complies with the core requirements: defining a user data model with no impact on the software (generic access data), integrating user processings with a GUI builder and built-in APIs, and offering a scalable architecture to fit any preformance requirement and accompany growing projects. The CNES jas given licensing grants for two software companies that will be able to redistribute this framework to any customer.

  4. Erectile Dysfunction Herbs: A Natural Treatment for ED?

    MedlinePlus

    ... Analysis. 2015;102:476. DHEA. Natural Medicines Comprehensive Database. http://www.naturaldatabase.com. Accessed Nov. 1, 2015. L-arginine. Natural Medicines Comprehensive Database. http://www.naturaldatabase.com. Accessed Nov. 1, 2015. ...

  5. ERMes: Open Source Simplicity for Your E-Resource Management

    ERIC Educational Resources Information Center

    Doering, William; Chilton, Galadriel

    2009-01-01

    ERMes, the latest version of electronic resource management system (ERM), is a relational database; content in different tables connects to, and works with, content in other tables. ERMes requires Access 2007 (Windows) or Access 2008 (Mac) to operate as the database utilizes functionality not available in previous versions of Microsoft Access. The…

  6. Databases and Electronic Resources - Betty Petersen Memorial Library

    Science.gov Websites

    of NOAA-Wide and Open Access Databases on the NOAA Central Library website. American Meteorological to a nonfederal website. Open Science Directory Open Science Directory contains collections of Open Access Journals (e.g. Directory of Open Access Journals) and journals in the special programs (Hinari

  7. A method to implement fine-grained access control for personal health records through standard relational database queries.

    PubMed

    Sujansky, Walter V; Faus, Sam A; Stone, Ethan; Brennan, Patricia Flatley

    2010-10-01

    Online personal health records (PHRs) enable patients to access, manage, and share certain of their own health information electronically. This capability creates the need for precise access-controls mechanisms that restrict the sharing of data to that intended by the patient. The authors describe the design and implementation of an access-control mechanism for PHR repositories that is modeled on the eXtensible Access Control Markup Language (XACML) standard, but intended to reduce the cognitive and computational complexity of XACML. The authors implemented the mechanism entirely in a relational database system using ANSI-standard SQL statements. Based on a set of access-control rules encoded as relational table rows, the mechanism determines via a single SQL query whether a user who accesses patient data from a specific application is authorized to perform a requested operation on a specified data object. Testing of this query on a moderately large database has demonstrated execution times consistently below 100ms. The authors include the details of the implementation, including algorithms, examples, and a test database as Supplementary materials. Copyright © 2010 Elsevier Inc. All rights reserved.

  8. Crystallography Open Database – an open-access collection of crystal structures

    PubMed Central

    Gražulis, Saulius; Chateigner, Daniel; Downs, Robert T.; Yokochi, A. F. T.; Quirós, Miguel; Lutterotti, Luca; Manakova, Elena; Butkus, Justas; Moeck, Peter; Le Bail, Armel

    2009-01-01

    The Crystallography Open Database (COD), which is a project that aims to gather all available inorganic, metal–organic and small organic molecule structural data in one database, is described. The database adopts an open-access model. The COD currently contains ∼80 000 entries in crystallographic information file format, with nearly full coverage of the International Union of Crystallography publications, and is growing in size and quality. PMID:22477773

  9. DynAstVO : a Europlanet database of NEA orbits

    NASA Astrophysics Data System (ADS)

    Desmars, J.; Thuillot, W.; Hestroffer, D.; David, P.; Le Sidaner, P.

    2017-09-01

    DynAstVO is a new orbital database developed within the Europlanet 2020 RI and the Virtual European Solar and Planetary Access (VESPA) frameworks. The database is dedicated to Near-Earth asteroids and provide parameters related to orbits: osculating elements, observational information, ephemeris through SPICE kernel, and in particular, orbit uncertainty and associated covariance matrix. DynAstVO is daily updated on a automatic process of orbit determination on the basis of the Minor Planet Electronic Circulars that reports new observations or the discover of a new asteroid. This database conforms to EPN-TAP environment and is accessible through VO protocols and on the VESPA portal web access (http://vespa.obspm.fr/). A comparison with other classical databases such as Astorb, MPCORB, NEODyS and JPL is also presented.

  10. Slushie World: An In-Class Access Database Tutorial

    ERIC Educational Resources Information Center

    Wynn, Donald E., Jr.; Pratt, Renée M. E.

    2015-01-01

    The Slushie World case study is designed to teach the basics of Microsoft Access and database management over a series of three 75-minute class sessions. Students are asked to build a basic database to track sales and inventory for a small business. Skills to be learned include table creation, data entry and importing, form and report design,…

  11. NATIONAL URBAN DATABASE AND ACCESS PORTAL TOOL (NUDAPT): FACILITATING ADVANCEMENTS IN URBAN METEOROLOGY AND CLIMATE MODELING WITH COMMUNITY-BASED URBAN DATABASES

    EPA Science Inventory

    We discuss the initial design and application of the National Urban Database and Access Portal Tool (NUDAPT). This new project is sponsored by the USEPA and involves collaborations and contributions from many groups from federal and state agencies, and from private and academic i...

  12. Global Mapping of Cyber Attacks

    DTIC Science & Technology

    2014-01-01

    from any 10 particular vendor is wrong. Moreover, researchers often use anti-virus labels as ground-truth for evaluating new approaches [5, 17, 36...lnstrusions and Defenses (RAID), September 2007. [SJ U. Bayer, P. M. Comparetti, C. Hlauschek, C. Kruegel, and E. Kirda. Scalable, behavior -based...International Development and Confiict Management International crisis behavior project http: I /www. cidcm. umd.edu/icb/. Last accessed: December 2011

  13. Scalable PGAS Metadata Management on Extreme Scale Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chavarría-Miranda, Daniel; Agarwal, Khushbu; Straatsma, TP

    Programming models intended to run on exascale systems have a number of challenges to overcome, specially the sheer size of the system as measured by the number of concurrent software entities created and managed by the underlying runtime. It is clear from the size of these systems that any state maintained by the programming model has to be strictly sub-linear in size, in order not to overwhelm memory usage with pure overhead. A principal feature of Partitioned Global Address Space (PGAS) models is providing easy access to global-view distributed data structures. In order to provide efficient access to these distributedmore » data structures, PGAS models must keep track of metadata such as where array sections are located with respect to processes/threads running on the HPC system. As PGAS models and applications become ubiquitous on very large transpetascale systems, a key component to their performance and scalability will be efficient and judicious use of memory for model overhead (metadata) compared to application data. We present an evaluation of several strategies to manage PGAS metadata that exhibit different space/time tradeoffs. We use two real-world PGAS applications to capture metadata usage patterns and gain insight into their communication behavior.« less

  14. Scalable Parallel Density-based Clustering and Applications

    NASA Astrophysics Data System (ADS)

    Patwary, Mostofa Ali

    2014-04-01

    Recently, density-based clustering algorithms (DBSCAN and OPTICS) have gotten significant attention of the scientific community due to their unique capability of discovering arbitrary shaped clusters and eliminating noise data. These algorithms have several applications, which require high performance computing, including finding halos and subhalos (clusters) from massive cosmology data in astrophysics, analyzing satellite images, X-ray crystallography, and anomaly detection. However, parallelization of these algorithms are extremely challenging as they exhibit inherent sequential data access order, unbalanced workload resulting in low parallel efficiency. To break the data access sequentiality and to achieve high parallelism, we develop new parallel algorithms, both for DBSCAN and OPTICS, designed using graph algorithmic techniques. For example, our parallel DBSCAN algorithm exploits the similarities between DBSCAN and computing connected components. Using datasets containing up to a billion floating point numbers, we show that our parallel density-based clustering algorithms significantly outperform the existing algorithms, achieving speedups up to 27.5 on 40 cores on shared memory architecture and speedups up to 5,765 using 8,192 cores on distributed memory architecture. In our experiments, we found that while achieving the scalability, our algorithms produce clustering results with comparable quality to the classical algorithms.

  15. A systematic literature review on security and privacy of electronic health record systems: technical perspectives.

    PubMed

    Rezaeibagha, Fatemeh; Win, Khin Than; Susilo, Willy

    Even though many safeguards and policies for electronic health record (EHR) security have been implemented, barriers to the privacy and security protection of EHR systems persist. This article presents the results of a systematic literature review regarding frequently adopted security and privacy technical features of EHR systems. Our inclusion criteria were full articles that dealt with the security and privacy of technical implementations of EHR systems published in English in peer-reviewed journals and conference proceedings between 1998 and 2013; 55 selected studies were reviewed in detail. We analysed the review results using two International Organization for Standardization (ISO) standards (29100 and 27002) in order to consolidate the study findings. Using this process, we identified 13 features that are essential to security and privacy in EHRs. These included system and application access control, compliance with security requirements, interoperability, integration and sharing, consent and choice mechanism, policies and regulation, applicability and scalability and cryptography techniques. This review highlights the importance of technical features, including mandated access control policies and consent mechanisms, to provide patients' consent, scalability through proper architecture and frameworks, and interoperability of health information systems, to EHR security and privacy requirements.

  16. rCAD: A Novel Database Schema for the Comparative Analysis of RNA.

    PubMed

    Ozer, Stuart; Doshi, Kishore J; Xu, Weijia; Gutell, Robin R

    2011-12-31

    Beyond its direct involvement in protein synthesis with mRNA, tRNA, and rRNA, RNA is now being appreciated for its significance in the overall metabolism and regulation of the cell. Comparative analysis has been very effective in the identification and characterization of RNA molecules, including the accurate prediction of their secondary structure. We are developing an integrative scalable data management and analysis system, the RNA Comparative Analysis Database (rCAD), implemented with SQL Server to support RNA comparative analysis. The platformagnostic database schema of rCAD captures the essential relationships between the different dimensions of information for RNA comparative analysis datasets. The rCAD implementation enables a variety of comparative analysis manipulations with multiple integrated data dimensions for advanced RNA comparative analysis workflows. In this paper, we describe details of the rCAD schema design and illustrate its usefulness with two usage scenarios.

  17. rCAD: A Novel Database Schema for the Comparative Analysis of RNA

    PubMed Central

    Ozer, Stuart; Doshi, Kishore J.; Xu, Weijia; Gutell, Robin R.

    2013-01-01

    Beyond its direct involvement in protein synthesis with mRNA, tRNA, and rRNA, RNA is now being appreciated for its significance in the overall metabolism and regulation of the cell. Comparative analysis has been very effective in the identification and characterization of RNA molecules, including the accurate prediction of their secondary structure. We are developing an integrative scalable data management and analysis system, the RNA Comparative Analysis Database (rCAD), implemented with SQL Server to support RNA comparative analysis. The platformagnostic database schema of rCAD captures the essential relationships between the different dimensions of information for RNA comparative analysis datasets. The rCAD implementation enables a variety of comparative analysis manipulations with multiple integrated data dimensions for advanced RNA comparative analysis workflows. In this paper, we describe details of the rCAD schema design and illustrate its usefulness with two usage scenarios. PMID:24772454

  18. LHCb experience with LFC replication

    NASA Astrophysics Data System (ADS)

    Bonifazi, F.; Carbone, A.; Perez, E. D.; D'Apice, A.; dell'Agnello, L.; Duellmann, D.; Girone, M.; Re, G. L.; Martelli, B.; Peco, G.; Ricci, P. P.; Sapunenko, V.; Vagnoni, V.; Vitlacil, D.

    2008-07-01

    Database replication is a key topic in the framework of the LHC Computing Grid to allow processing of data in a distributed environment. In particular, the LHCb computing model relies on the LHC File Catalog, i.e. a database which stores information about files spread across the GRID, their logical names and the physical locations of all the replicas. The LHCb computing model requires the LFC to be replicated at Tier-1s. The LCG 3D project deals with the database replication issue and provides a replication service based on Oracle Streams technology. This paper describes the deployment of the LHC File Catalog replication to the INFN National Center for Telematics and Informatics (CNAF) and to other LHCb Tier-1 sites. We performed stress tests designed to evaluate any delay in the propagation of the streams and the scalability of the system. The tests show the robustness of the replica implementation with performance going much beyond the LHCb requirements.

  19. Search and Graph Database Technologies for Biomedical Semantic Indexing: Experimental Analysis.

    PubMed

    Segura Bedmar, Isabel; Martínez, Paloma; Carruana Martín, Adrián

    2017-12-01

    Biomedical semantic indexing is a very useful support tool for human curators in their efforts for indexing and cataloging the biomedical literature. The aim of this study was to describe a system to automatically assign Medical Subject Headings (MeSH) to biomedical articles from MEDLINE. Our approach relies on the assumption that similar documents should be classified by similar MeSH terms. Although previous work has already exploited the document similarity by using a k-nearest neighbors algorithm, we represent documents as document vectors by search engine indexing and then compute the similarity between documents using cosine similarity. Once the most similar documents for a given input document are retrieved, we rank their MeSH terms to choose the most suitable set for the input document. To do this, we define a scoring function that takes into account the frequency of the term into the set of retrieved documents and the similarity between the input document and each retrieved document. In addition, we implement guidelines proposed by human curators to annotate MEDLINE articles; in particular, the heuristic that says if 3 MeSH terms are proposed to classify an article and they share the same ancestor, they should be replaced by this ancestor. The representation of the MeSH thesaurus as a graph database allows us to employ graph search algorithms to quickly and easily capture hierarchical relationships such as the lowest common ancestor between terms. Our experiments show promising results with an F1 of 69% on the test dataset. To the best of our knowledge, this is the first work that combines search and graph database technologies for the task of biomedical semantic indexing. Due to its horizontal scalability, ElasticSearch becomes a real solution to index large collections of documents (such as the bibliographic database MEDLINE). Moreover, the use of graph search algorithms for accessing MeSH information could provide a support tool for cataloging MEDLINE abstracts in real time. ©Isabel Segura Bedmar, Paloma Martínez, Adrián Carruana Martín. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 01.12.2017.

  20. CERES Search and Subset Tool

    Atmospheric Science Data Center

    2016-06-24

    ... data granules using a high resolution spatial metadata database and directly accessing the archived data granules. Subset results are ... data granules using a high resolution spatial metadata database and directly accessing the archived data granules. Subset results are ...

  1. Publishing Data on Physical Samples Using the GeoLink Ontology and Linked Data Platforms

    NASA Astrophysics Data System (ADS)

    Ji, P.; Arko, R. A.; Lehnert, K. A.; Song, L.; Carter, M. R.; Hsu, L.

    2015-12-01

    Interdisciplinary Earth Data Alliance (IEDA), one of partners in EarthCube GeoLink project, seeks to explore the extent to which the use of GeoLink reusable Ontology Design Patterns (ODPs) and linked data platforms in IEDA data infrastructure can make research data more easily accessible and valuable. Linked data for the System for Earth Sample Registration (SESAR) is the first effort of IEDA to show how linked data enhance the presentation of IEDA data system architecture. SESAR Linked Data maps each table and column in SESAR database to RDF class and property based on GeoLink view, which build on the top of GeoLink ODPs. Then, uses D2RQ dumping the contents of SESAR database into RDF triples on the basis of mapping results. And, the dumped RDF triples is loaded into GRAPHDB, an RDF graph database, as permanent data in the form of atomic facts expressed as subjects, predicates and objects which provide support for semantic interoperability between IEDA and other GeoLink partners. Finally, an integrated browsing and searching interface build on Callimachus, a highly scalable platform for publishing linked data, is introduced to make sense of data stored in triplestore. Drill down and through features are built in the interface to help users locating content efficiently. The drill down feature enables users to explore beyond the summary information in the instance list of a specific class and into the detail from the specific instance page. The drill through feature enables users to jump from one instance to another one by simply clicking the link of the latter nested in the former region. Additionally, OpenLayers map is embedded into the interface to enhance the attractiveness of the presentation of instance which has geospatial information. Furthermore, by linking instances in the SESAR datasets to matching or corresponding instances in external sets, the presentation has been enriched with additional information about related classes like person, cruise, etc.

  2. KA-SB: from data integration to large scale reasoning

    PubMed Central

    Roldán-García, María del Mar; Navas-Delgado, Ismael; Kerzazi, Amine; Chniber, Othmane; Molina-Castro, Joaquín; Aldana-Montes, José F

    2009-01-01

    Background The analysis of information in the biological domain is usually focused on the analysis of data from single on-line data sources. Unfortunately, studying a biological process requires having access to disperse, heterogeneous, autonomous data sources. In this context, an analysis of the information is not possible without the integration of such data. Methods KA-SB is a querying and analysis system for final users based on combining a data integration solution with a reasoner. Thus, the tool has been created with a process divided into two steps: 1) KOMF, the Khaos Ontology-based Mediator Framework, is used to retrieve information from heterogeneous and distributed databases; 2) the integrated information is crystallized in a (persistent and high performance) reasoner (DBOWL). This information could be further analyzed later (by means of querying and reasoning). Results In this paper we present a novel system that combines the use of a mediation system with the reasoning capabilities of a large scale reasoner to provide a way of finding new knowledge and of analyzing the integrated information from different databases, which is retrieved as a set of ontology instances. This tool uses a graphical query interface to build user queries easily, which shows a graphical representation of the ontology and allows users o build queries by clicking on the ontology concepts. Conclusion These kinds of systems (based on KOMF) will provide users with very large amounts of information (interpreted as ontology instances once retrieved), which cannot be managed using traditional main memory-based reasoners. We propose a process for creating persistent and scalable knowledgebases from sets of OWL instances obtained by integrating heterogeneous data sources with KOMF. This process has been applied to develop a demo tool , which uses the BioPax Level 3 ontology as the integration schema, and integrates UNIPROT, KEGG, CHEBI, BRENDA and SABIORK databases. PMID:19796402

  3. Oceans 2.0: Interactive tools for the Visualization of Multi-dimensional Ocean Sensor Data

    NASA Astrophysics Data System (ADS)

    Biffard, B.; Valenzuela, M.; Conley, P.; MacArthur, M.; Tredger, S.; Guillemot, E.; Pirenne, B.

    2016-12-01

    Ocean Networks Canada (ONC) operates ocean observatories on all three of Canada's coasts. The instruments produce 280 gigabytes of data per day with 1/2 petabyte archived so far. In 2015, 13 terabytes were downloaded by over 500 users from across the world. ONC's data management system is referred to as "Oceans 2.0" owing to its interactive, participative features. A key element of Oceans 2.0 is real time data acquisition and processing: custom device drivers implement the input-output protocol of each instrument. Automatic parsing and calibration takes place on the fly, followed by event detection and quality control. All raw data are stored in a file archive, while the processed data are copied to fast databases. Interactive access to processed data is provided through data download and visualization/quick look features that are adapted to diverse data types (scalar, acoustic, video, multi-dimensional, etc). Data may be post or re-processed to add features, analysis or correct errors, update calibrations, etc. A robust storage structure has been developed consisting of an extensive file system and a no-SQL database (Cassandra). Cassandra is a node-based open source distributed database management system. It is scalable and offers improved performance for big data. A key feature is data summarization. The system has also been integrated with web services and an ERDDAP OPeNDAP server, capable of serving scalar and multidimensional data from Cassandra for fixed or mobile devices.A complex data viewer has been developed making use of the big data capability to interactively display live or historic echo sounder and acoustic Doppler current profiler data, where users can scroll, apply processing filters and zoom through gigabytes of data with simple interactions. This new technology brings scientists one step closer to a comprehensive, web-based data analysis environment in which visual assessment, filtering, event detection and annotation can be integrated.

  4. Automatic initialization and quality control of large-scale cardiac MRI segmentations.

    PubMed

    Albà, Xènia; Lekadir, Karim; Pereañez, Marco; Medrano-Gracia, Pau; Young, Alistair A; Frangi, Alejandro F

    2018-01-01

    Continuous advances in imaging technologies enable ever more comprehensive phenotyping of human anatomy and physiology. Concomitant reduction of imaging costs has resulted in widespread use of imaging in large clinical trials and population imaging studies. Magnetic Resonance Imaging (MRI), in particular, offers one-stop-shop multidimensional biomarkers of cardiovascular physiology and pathology. A wide range of analysis methods offer sophisticated cardiac image assessment and quantification for clinical and research studies. However, most methods have only been evaluated on relatively small databases often not accessible for open and fair benchmarking. Consequently, published performance indices are not directly comparable across studies and their translation and scalability to large clinical trials or population imaging cohorts is uncertain. Most existing techniques still rely on considerable manual intervention for the initialization and quality control of the segmentation process, becoming prohibitive when dealing with thousands of images. The contributions of this paper are three-fold. First, we propose a fully automatic method for initializing cardiac MRI segmentation, by using image features and random forests regression to predict an initial position of the heart and key anatomical landmarks in an MRI volume. In processing a full imaging database, the technique predicts the optimal corrective displacements and positions in relation to the initial rough intersections of the long and short axis images. Second, we introduce for the first time a quality control measure capable of identifying incorrect cardiac segmentations with no visual assessment. The method uses statistical, pattern and fractal descriptors in a random forest classifier to detect failures to be corrected or removed from subsequent statistical analysis. Finally, we validate these new techniques within a full pipeline for cardiac segmentation applicable to large-scale cardiac MRI databases. The results obtained based on over 1200 cases from the Cardiac Atlas Project show the promise of fully automatic initialization and quality control for population studies. Copyright © 2017 Elsevier B.V. All rights reserved.

  5. Automatic provisioning, deployment and orchestration for load-balancing THREDDS instances

    NASA Astrophysics Data System (ADS)

    Cofino, A. S.; Fernández-Tejería, S.; Kershaw, P.; Cimadevilla, E.; Petri, R.; Pryor, M.; Stephens, A.; Herrera, S.

    2017-12-01

    THREDDS is a widely used web server to provide to different scientific communities with data access and discovery. Due to THREDDS's lack of horizontal scalability and automatic configuration management and deployment, this service usually deals with service downtimes and time consuming configuration tasks, mainly when an intensive use is done as is usual within the scientific community (e.g. climate). Instead of the typical installation and configuration of a single or multiple independent THREDDS servers, manually configured, this work presents an automatic provisioning, deployment and orchestration cluster of THREDDS servers. This solution it's based on Ansible playbooks, used to control automatically the deployment and configuration setup on a infrastructure and to manage the datasets available in THREDDS instances. The playbooks are based on modules (or roles) of different backends and frontends load-balancing setups and solutions. The frontend load-balancing system enables horizontal scalability by delegating requests to backend workers, consisting in a variable number of instances for the THREDDS server. This implementation allows to configure different infrastructure and deployment scenario setups, as more workers are easily added to the cluster by simply declaring them as Ansible variables and executing the playbooks, and also provides fault-tolerance and better reliability since if any of the workers fail another instance of the cluster can take over it. In order to test the solution proposed, two real scenarios are analyzed in this contribution: The JASMIN Group Workspaces at CEDA and the User Data Gateway (UDG) at the Data Climate Service from the University of Cantabria. On the one hand, the proposed configuration has provided CEDA with a higher level and more scalable Group Workspaces (GWS) service than the previous one based on Unix permissions, improving also the data discovery and data access experience. On the other hand, the UDG has improved its scalability by allowing requests to be distributed to the backend workers instead of being served by a unique THREDDS worker. As a conclusion the proposed configuration supposes a significant improvement with respect to configurations based on non-collaborative THREDDS' instances.

  6. BAMSI: a multi-cloud service for scalable distributed filtering of massive genome data.

    PubMed

    Ausmees, Kristiina; John, Aji; Toor, Salman Z; Hellander, Andreas; Nettelblad, Carl

    2018-06-26

    The advent of next-generation sequencing (NGS) has made whole-genome sequencing of cohorts of individuals a reality. Primary datasets of raw or aligned reads of this sort can get very large. For scientific questions where curated called variants are not sufficient, the sheer size of the datasets makes analysis prohibitively expensive. In order to make re-analysis of such data feasible without the need to have access to a large-scale computing facility, we have developed a highly scalable, storage-agnostic framework, an associated API and an easy-to-use web user interface to execute custom filters on large genomic datasets. We present BAMSI, a Software as-a Service (SaaS) solution for filtering of the 1000 Genomes phase 3 set of aligned reads, with the possibility of extension and customization to other sets of files. Unique to our solution is the capability of simultaneously utilizing many different mirrors of the data to increase the speed of the analysis. In particular, if the data is available in private or public clouds - an increasingly common scenario for both academic and commercial cloud providers - our framework allows for seamless deployment of filtering workers close to data. We show results indicating that such a setup improves the horizontal scalability of the system, and present a possible use case of the framework by performing an analysis of structural variation in the 1000 Genomes data set. BAMSI constitutes a framework for efficient filtering of large genomic data sets that is flexible in the use of compute as well as storage resources. The data resulting from the filter is assumed to be greatly reduced in size, and can easily be downloaded or routed into e.g. a Hadoop cluster for subsequent interactive analysis using Hive, Spark or similar tools. In this respect, our framework also suggests a general model for making very large datasets of high scientific value more accessible by offering the possibility for organizations to share the cost of hosting data on hot storage, without compromising the scalability of downstream analysis.

  7. WaveNet: A Web-Based Metocean Data Access, Processing, and Analysis Tool. Part 3 - CDIP Database

    DTIC Science & Technology

    2014-06-01

    and Analysis Tool; Part 3 – CDIP Database by Zeki Demirbilek, Lihwa Lin, and Derek Wilson PURPOSE: This Coastal and Hydraulics Engineering...Technical Note (CHETN) describes coupling of the Coastal Data Information Program ( CDIP ) database to WaveNet, the first module of MetOcnDat (Meteorological...provides a step-by-step procedure to access, process, and analyze wave and wind data from the CDIP database. BACKGROUND: WaveNet addresses a basic

  8. Development and evaluation of a low-cost and high-capacity DICOM image data storage system for research.

    PubMed

    Yakami, Masahiro; Ishizu, Koichi; Kubo, Takeshi; Okada, Tomohisa; Togashi, Kaori

    2011-04-01

    Thin-slice CT data, useful for clinical diagnosis and research, is now widely available but is typically discarded in many institutions, after a short period of time due to data storage capacity limitations. We designed and built a low-cost high-capacity Digital Imaging and COmmunication in Medicine (DICOM) storage system able to store thin-slice image data for years, using off-the-shelf consumer hardware components, such as a Macintosh computer, a Windows PC, and network-attached storage units. "Ordinary" hierarchical file systems, instead of a centralized data management system such as relational database, were adopted to manage patient DICOM files by arranging them in directories enabling quick and easy access to the DICOM files of each study by following the directory trees with Windows Explorer via study date and patient ID. Software used for this system was open-source OsiriX and additional programs we developed ourselves, both of which were freely available via the Internet. The initial cost of this system was about $3,600 with an incremental storage cost of about $900 per 1 terabyte (TB). This system has been running since 7th Feb 2008 with the data stored increasing at the rate of about 1.3 TB per month. Total data stored was 21.3 TB on 23rd June 2009. The maintenance workload was found to be about 30 to 60 min once every 2 weeks. In conclusion, this newly developed DICOM storage system is useful for research due to its cost-effectiveness, enormous capacity, high scalability, sufficient reliability, and easy data access.

  9. Central Satellite Data Repository Supporting Research and Development

    NASA Astrophysics Data System (ADS)

    Han, W.; Brust, J.

    2015-12-01

    Near real-time satellite data is critical to many research and development activities of atmosphere, land, and ocean processes. Acquiring and managing huge volumes of satellite data without (or with less) latency in an organization is always a challenge in the big data age. An organization level data repository is a practical solution to meeting this challenge. The STAR (Center for Satellite Applications and Research of NOAA) Central Data Repository (SCDR) is a scalable, stable, and reliable repository to acquire, manipulate, and disseminate various types of satellite data in an effective and efficient manner. SCDR collects more than 200 data products, which are commonly used by multiple groups in STAR, from NOAA, GOES, Metop, Suomi NPP, Sentinel, Himawari, and other satellites. The processes of acquisition, recording, retrieval, organization, and dissemination are performed in parallel. Multiple data access interfaces, like FTP, FTPS, HTTP, HTTPS, and RESTful, are supported in the SCDR to obtain satellite data from their providers through high speed internet. The original satellite data in various raster formats can be parsed in the respective adapter to retrieve data information. The data information is ingested to the corresponding partitioned tables in the central database. All files are distributed equally on the Network File System (NFS) disks to balance the disk load. SCDR provides consistent interfaces (including Perl utility, portal, and RESTful Web service) to locate files of interest easily and quickly and access them directly by over 200 compute servers via NFS. SCDR greatly improves collection and integration of near real-time satellite data, addresses satellite data requirements of scientists and researchers, and facilitates their primary research and development activities.

  10. The SuperCOSMOS Science Archive

    NASA Astrophysics Data System (ADS)

    Hambly, N.; Read, M.; Mann, R.; Sutorius, E.; Bond, I.; MacGillivray, H.; Williams, P.; Lawrence, A.

    2004-07-01

    The SuperCOSMOS Sky Survey (SSS {http://www-wfau.roe.ac.uk/sss}; Hambly et al., 2001) consists of digitised scans of Schmidt photographic survey material in a multi-colour (BRI), multi-epoch, uniformly calibrated product. It covers the whole southern hemisphere, with an extension into the north currently underway. Public online access to the 2 Tbytes of SSS pixel data and object catalogues has been available for some time; data are being downloaded at a rate of several gigabytes per week, and many new science results are emerging from community use of the data. In this poster we describe the terabyte-scale SuperCOSMOS Science Archive {http://thoth.roe.ac.uk/ssa} (SSA), which is a recasting of the SSS object catalogue system from flat files into an RDBMS, with an enhanced user interface. We describe some aspects of the hardware and schema design of the SSA, which aims to produce a high performance, VO-compatible database, suitable for data mining by `power users', while maintaining the ease of use praised in the old SSS system. Initially, the SSA will allow access through web forms and a flexible SQL interface. It acts as the prototype for the next generation survey archives to be hosted by the University of Edinburgh's Wide Field Astronomy Unit, such as the WFCAM Science Archive of infrared sky survey data, as well as being a scalability testbed for use by AstroGrid, the UK's Virtual Observatory project. As a result of these roles, it will display subsequently an expanding functionality, as web - and later, Grid - services are deployed on it.

  11. MIPS: a database for protein sequences, homology data and yeast genome information.

    PubMed Central

    Mewes, H W; Albermann, K; Heumann, K; Liebl, S; Pfeiffer, F

    1997-01-01

    The MIPS group (Martinsried Institute for Protein Sequences) at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, collects, processes and distributes protein sequence data within the framework of the tripartite association of the PIR-International Protein Sequence Database (,). MIPS contributes nearly 50% of the data input to the PIR-International Protein Sequence Database. The database is distributed on CD-ROM together with PATCHX, an exhaustive supplement of unique, unverified protein sequences from external sources compiled by MIPS. Through its WWW server (http://www.mips.biochem.mpg.de/ ) MIPS permits internet access to sequence databases, homology data and to yeast genome information. (i) Sequence similarity results from the FASTA program () are stored in the FASTA database for all proteins from PIR-International and PATCHX. The database is dynamically maintained and permits instant access to FASTA results. (ii) Starting with FASTA database queries, proteins have been classified into families and superfamilies (PROT-FAM). (iii) The HPT (hashed position tree) data structure () developed at MIPS is a new approach for rapid sequence and pattern searching. (iv) MIPS provides access to the sequence and annotation of the complete yeast genome (), the functional classification of yeast genes (FunCat) and its graphical display, the 'Genome Browser' (). A CD-ROM based on the JAVA programming language providing dynamic interactive access to the yeast genome and the related protein sequences has been compiled and is available on request. PMID:9016498

  12. NNDC Data Services

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tuli, J.K.; Sonzogni,A.

    The National Nuclear Data Center has provided remote access to some of its resources since 1986. The major databases and other resources available currently through NNDC Web site are summarized. The National Nuclear Data Center (NNDC) has provided remote access to the nuclear physics databases it maintains and to other resources since 1986. With considerable innovation access is now mostly through the Web. The NNDC Web pages have been modernized to provide a consistent state-of-the-art style. The improved database services and other resources available from the NNOC site at www.nndc.bnl.govwill be described.

  13. Secure, web-accessible call rosters for academic radiology departments.

    PubMed

    Nguyen, A V; Tellis, W M; Avrin, D E

    2000-05-01

    Traditionally, radiology department call rosters have been posted via paper and bulletin boards. Frequently, changes to these lists are made by multiple people independently, but often not synchronized, resulting in confusion among the house staff and technical staff as to who is on call and when. In addition, multiple and disparate copies exist in different sections of the department, and changes made would not be propagated to all the schedules. To eliminate such difficulties, a paperless call scheduling application was developed. Our call scheduling program allowed Java-enabled web access to a database by designated personnel from each radiology section who have privileges to make the necessary changes. Once a person made a change, everyone accessing the database would see the modification. This eliminates the chaos resulting from people swapping shifts at the last minute and not having the time to record or broadcast the change. Furthermore, all changes to the database were logged. Users are given a log-in name and password and can only edit their section; however, all personnel have access to all sections' schedules. Our applet was written in Java 2 using the latest technology in database access. We access our Interbase database through the DataExpress and DB Swing (Borland, Scotts Valley, CA) components. The result is secure access to the call rosters via the web. There are many advantages to the web-enabled access, mainly the ability for people to make changes and have the changes recorded and propagated in a single virtual location and available to all who need to know.

  14. Nencki Genomics Database--Ensembl funcgen enhanced with intersections, user data and genome-wide TFBS motifs.

    PubMed

    Krystkowiak, Izabella; Lenart, Jakub; Debski, Konrad; Kuterba, Piotr; Petas, Michal; Kaminska, Bozena; Dabrowski, Michal

    2013-01-01

    We present the Nencki Genomics Database, which extends the functionality of Ensembl Regulatory Build (funcgen) for the three species: human, mouse and rat. The key enhancements over Ensembl funcgen include the following: (i) a user can add private data, analyze them alongside the public data and manage access rights; (ii) inside the database, we provide efficient procedures for computing intersections between regulatory features and for mapping them to the genes. To Ensembl funcgen-derived data, which include data from ENCODE, we add information on conserved non-coding (putative regulatory) sequences, and on genome-wide occurrence of transcription factor binding site motifs from the current versions of two major motif libraries, namely, Jaspar and Transfac. The intersections and mapping to the genes are pre-computed for the public data, and the result of any procedure run on the data added by the users is stored back into the database, thus incrementally increasing the body of pre-computed data. As the Ensembl funcgen schema for the rat is currently not populated, our database is the first database of regulatory features for this frequently used laboratory animal. The database is accessible without registration using the mysql client: mysql -h database.nencki-genomics.org -u public. Registration is required only to add or access private data. A WSDL webservice provides access to the database from any SOAP client, including the Taverna Workbench with a graphical user interface.

  15. Nencki Genomics Database—Ensembl funcgen enhanced with intersections, user data and genome-wide TFBS motifs

    PubMed Central

    Krystkowiak, Izabella; Lenart, Jakub; Debski, Konrad; Kuterba, Piotr; Petas, Michal; Kaminska, Bozena; Dabrowski, Michal

    2013-01-01

    We present the Nencki Genomics Database, which extends the functionality of Ensembl Regulatory Build (funcgen) for the three species: human, mouse and rat. The key enhancements over Ensembl funcgen include the following: (i) a user can add private data, analyze them alongside the public data and manage access rights; (ii) inside the database, we provide efficient procedures for computing intersections between regulatory features and for mapping them to the genes. To Ensembl funcgen-derived data, which include data from ENCODE, we add information on conserved non-coding (putative regulatory) sequences, and on genome-wide occurrence of transcription factor binding site motifs from the current versions of two major motif libraries, namely, Jaspar and Transfac. The intersections and mapping to the genes are pre-computed for the public data, and the result of any procedure run on the data added by the users is stored back into the database, thus incrementally increasing the body of pre-computed data. As the Ensembl funcgen schema for the rat is currently not populated, our database is the first database of regulatory features for this frequently used laboratory animal. The database is accessible without registration using the mysql client: mysql –h database.nencki-genomics.org –u public. Registration is required only to add or access private data. A WSDL webservice provides access to the database from any SOAP client, including the Taverna Workbench with a graphical user interface. Database URL: http://www.nencki-genomics.org. PMID:24089456

  16. ForTrilinos

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Evans, Katherine J; Johnson, Seth R; Prokopenko, Andrey V

    'ForTrilinos' is related to The Trilinos Project, which contains a large and growing collection of solver capabilities that can utilize next-generation platforms, in particular scalable multicore, manycore, accelerator and heterogeneous systems. Trilinos is primarily written in C++, including its user interfaces. While C++ is advantageous for gaining access to the latest programming environments, it limits Trilinos usage via Fortran. Sever ad hoc translation interfaces exist to enable Fortran usage of Trilinos, but none of these interfaces is general-purpose or written for reusable and sustainable external use. 'ForTrilinos' provides a seamless pathway for large and complex Fortran-based codes to access Trilinosmore » without C/C++ interface code. This access includes Fortran versions of Kokkos abstractions for code execution and data management.« less

  17. Making Spatial Statistics Service Accessible On Cloud Platform

    NASA Astrophysics Data System (ADS)

    Mu, X.; Wu, J.; Li, T.; Zhong, Y.; Gao, X.

    2014-04-01

    Web service can bring together applications running on diverse platforms, users can access and share various data, information and models more effectively and conveniently from certain web service platform. Cloud computing emerges as a paradigm of Internet computing in which dynamical, scalable and often virtualized resources are provided as services. With the rampant growth of massive data and restriction of net, traditional web services platforms have some prominent problems existing in development such as calculation efficiency, maintenance cost and data security. In this paper, we offer a spatial statistics service based on Microsoft cloud. An experiment was carried out to evaluate the availability and efficiency of this service. The results show that this spatial statistics service is accessible for the public conveniently with high processing efficiency.

  18. Convergent optical wired and wireless long-reach access network using high spectral-efficient modulation.

    PubMed

    Chow, C W; Lin, Y H

    2012-04-09

    To provide broadband services in a single and low cost perform, the convergent optical wired and wireless access network is promising. Here, we propose and demonstrate a convergent optical wired and wireless long-reach access networks based on orthogonal wavelength division multiplexing (WDM). Both the baseband signal and the radio-over-fiber (ROF) signal are multiplexed and de-multiplexed in optical domain, hence it is simple and the operation speed is not limited by the electronic bottleneck caused by the digital signal processing (DSP). Error-free de-multiplexing and down-conversion can be achieved for all the signals after 60 km (long-reach) fiber transmission. The scalability of the system for higher bit-rate (60 GHz) is also simulated and discussed.

  19. Improved Information Retrieval Performance on SQL Database Using Data Adapter

    NASA Astrophysics Data System (ADS)

    Husni, M.; Djanali, S.; Ciptaningtyas, H. T.; Wicaksana, I. G. N. A.

    2018-02-01

    The NoSQL databases, short for Not Only SQL, are increasingly being used as the number of big data applications increases. Most systems still use relational databases (RDBs), but as the number of data increases each year, the system handles big data with NoSQL databases to analyze and access data more quickly. NoSQL emerged as a result of the exponential growth of the internet and the development of web applications. The query syntax in the NoSQL database differs from the SQL database, therefore requiring code changes in the application. Data adapter allow applications to not change their SQL query syntax. Data adapters provide methods that can synchronize SQL databases with NotSQL databases. In addition, the data adapter provides an interface which is application can access to run SQL queries. Hence, this research applied data adapter system to synchronize data between MySQL database and Apache HBase using direct access query approach, where system allows application to accept query while synchronization process in progress. From the test performed using data adapter, the results obtained that the data adapter can synchronize between SQL databases, MySQL, and NoSQL database, Apache HBase. This system spends the percentage of memory resources in the range of 40% to 60%, and the percentage of processor moving from 10% to 90%. In addition, from this system also obtained the performance of database NoSQL better than SQL database.

  20. Are There Any Natural Remedies That Reduce Chronic Fatigue Associated with Chronic Fatigue Syndrome?

    MedlinePlus

    ... management of chronic fatigue syndrome. Natural Medicines Comprehensive Database. http://www.naturaldatabase.com. Accessed Feb. 23, 2015. Magnesium. Natural Medicines Comprehensive Database. http://www.naturaldatabase.com. Accessed Feb. 24, 2015. ...

  1. Pan European Phenological database (PEP725): a single point of access for European data

    NASA Astrophysics Data System (ADS)

    Templ, Barbara; Koch, Elisabeth; Bolmgren, Kjell; Ungersböck, Markus; Paul, Anita; Scheifinger, Helfried; Rutishauser, This; Busto, Montserrat; Chmielewski, Frank-M.; Hájková, Lenka; Hodzić, Sabina; Kaspar, Frank; Pietragalla, Barbara; Romero-Fresneda, Ramiro; Tolvanen, Anne; Vučetič, Višnja; Zimmermann, Kirsten; Zust, Ana

    2018-06-01

    The Pan European Phenology (PEP) project is a European infrastructure to promote and facilitate phenological research, education, and environmental monitoring. The main objective is to maintain and develop a Pan European Phenological database (PEP725) with an open, unrestricted data access for science and education. PEP725 is the successor of the database developed through the COST action 725 "Establishing a European phenological data platform for climatological applications" working as a single access point for European-wide plant phenological data. So far, 32 European meteorological services and project partners from across Europe have joined and supplied data collected by volunteers from 1868 to the present for the PEP725 database. Most of the partners actively provide data on a regular basis. The database presently holds almost 12 million records, about 46 growing stages and 265 plant species (including cultivars), and can be accessed via http://www.pep725.eu/ . Users of the PEP725 database have studied a diversity of topics ranging from climate change impact, plant physiological question, phenological modeling, and remote sensing of vegetation to ecosystem productivity.

  2. SkyDOT: a publicly accessible variability database, containing multiple sky surveys and real-time data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Starr, D. L.; Wozniak, P. R.; Vestrand, W. T.

    2002-01-01

    SkyDOT (Sky Database for Objects in Time-Domain) is a Virtual Observatory currently comprised of data from the RAPTOR, ROTSE I, and OGLE I1 survey projects. This makes it a very large time domain database. In addition, the RAPTOR project provides SkyDOT with real-time variability data as well as stereoscopic information. With its web interface, we believe SkyDOT will be a very useful tool for both astronomers, and the public. Our main task has been to construct an efficient relational database containing all existing data, while handling a real-time inflow of data. We also provide a useful web interface allowing easymore » access to both astronomers and the public. Initially, this server will allow common searches, specific queries, and access to light curves. In the future we will include machine learning classification tools and access to spectral information.« less

  3. Intelligent Access to Sequence and Structure Databases (IASSD) - an interface for accessing information from major web databases.

    PubMed

    Ganguli, Sayak; Gupta, Manoj Kumar; Basu, Protip; Banik, Rahul; Singh, Pankaj Kumar; Vishal, Vineet; Bera, Abhisek Ranjan; Chakraborty, Hirak Jyoti; Das, Sasti Gopal

    2014-01-01

    With the advent of age of big data and advances in high throughput technology accessing data has become one of the most important step in the entire knowledge discovery process. Most users are not able to decipher the query result that is obtained when non specific keywords or a combination of keywords are used. Intelligent access to sequence and structure databases (IASSD) is a desktop application for windows operating system. It is written in Java and utilizes the web service description language (wsdl) files and Jar files of E-utilities of various databases such as National Centre for Biotechnology Information (NCBI) and Protein Data Bank (PDB). Apart from that IASSD allows the user to view protein structure using a JMOL application which supports conditional editing. The Jar file is freely available through e-mail from the corresponding author.

  4. Migration of legacy mumps applications to relational database servers.

    PubMed

    O'Kane, K C

    2001-07-01

    An extended implementation of the Mumps language is described that facilitates vendor neutral migration of legacy Mumps applications to SQL-based relational database servers. Implemented as a compiler, this system translates Mumps programs to operating system independent, standard C code for subsequent compilation to fully stand-alone, binary executables. Added built-in functions and support modules extend the native hierarchical Mumps database with access to industry standard, networked, relational database management servers (RDBMS) thus freeing Mumps applications from dependence upon vendor specific, proprietary, unstandardized database models. Unlike Mumps systems that have added captive, proprietary RDMBS access, the programs generated by this development environment can be used with any RDBMS system that supports common network access protocols. Additional features include a built-in web server interface and the ability to interoperate directly with programs and functions written in other languages.

  5. Correspondence: World Wide Web access to the British Universities Human Embryo Database

    PubMed Central

    AITON, JAMES F.; MCDONOUGH, ARIANA; MCLACHLAN, JOHN C.; SMART, STEVEN D.; WHITEN, SUSAN C.

    1997-01-01

    The British Universities Human Embryo Database has been created by merging information from the Walmsley Collection of Human Embryos at the School of Biological and Medical Sciences, University of St Andrews and from the Boyd Collection of Human Embryos at the Department of Anatomy, University of Cambridge. The database has been made available electronically on the Internet and World Wide Web browsers can be used to implement interactive access to the information stored in the British Universities Human Embryo Database. The database can, therefore, be accessed and searched from remote sites and specific embryos can be identified in terms of their location, age, developmental stage, plane of section, staining technique, and other parameters. It is intended to add information from other similar collections in the UK as it becomes available. PMID:9034891

  6. Scalable and Resilient Middleware to Handle Information Exchange during Environment Crisis

    NASA Astrophysics Data System (ADS)

    Tao, R.; Poslad, S.; Moßgraber, J.; Middleton, S.; Hammitzsch, M.

    2012-04-01

    The EU FP7 TRIDEC project focuses on enabling real-time, intelligent, information management of collaborative, complex, critical decision processes for earth management. A key challenge is to promote a communication infrastructure to facilitate interoperable environment information services during environment events and crises such as tsunamis and drilling, during which increasing volumes and dimensionality of disparate information sources, including sensor-based and human-based ones, can result, and need to be managed. Such a system needs to support: scalable, distributed messaging; asynchronous messaging; open messaging to handling changing clients such as new and retired automated system and human information sources becoming online or offline; flexible data filtering, and heterogeneous access networks (e.g., GSM, WLAN and LAN). In addition, the system needs to be resilient to handle the ICT system failures, e.g. failure, degradation and overloads, during environment events. There are several system middleware choices for TRIDEC based upon a Service-oriented-architecture (SOA), Event-driven-Architecture (EDA), Cloud Computing, and Enterprise Service Bus (ESB). In an SOA, everything is a service (e.g. data access, processing and exchange); clients can request on demand or subscribe to services registered by providers; more often interaction is synchronous. In an EDA system, events that represent significant changes in state can be processed simply, or as streams or more complexly. Cloud computing is a virtualization, interoperable and elastic resource allocation model. An ESB, a fundamental component for enterprise messaging, supports synchronous and asynchronous message exchange models and has inbuilt resilience against ICT failure. Our middleware proposal is an ESB based hybrid architecture model: an SOA extension supports more synchronous workflows; EDA assists the ESB to handle more complex event processing; Cloud computing can be used to increase and decrease the ESB resources on demand. To reify this hybrid ESB centric architecture, we will adopt two complementary approaches: an open source one for scalability and resilience improvement while a commercial one can be used for ultra-speed messaging, whilst we can bridge between these two to support interoperability. In TRIDEC, to manage such a hybrid messaging system, overlay and underlay management techniques will be adopted. The managers (both global and local) will collect, store and update status information (e.g. CPU utilization, free space, number of clients) and balance the usage, throughput, and delays to improve resilience and scalability. The expected resilience improvement includes dynamic failover, self-healing, pre-emptive load balancing, and bottleneck prediction while the expected improvement for scalability includes capacity estimation, Http Bridge, and automatic configuration and reconfiguration (e.g. add or delete clients and servers).

  7. Volumetric Medical Image Coding: An Object-based, Lossy-to-lossless and Fully Scalable Approach

    PubMed Central

    Danyali, Habibiollah; Mertins, Alfred

    2011-01-01

    In this article, an object-based, highly scalable, lossy-to-lossless 3D wavelet coding approach for volumetric medical image data (e.g., magnetic resonance (MR) and computed tomography (CT)) is proposed. The new method, called 3DOBHS-SPIHT, is based on the well-known set partitioning in the hierarchical trees (SPIHT) algorithm and supports both quality and resolution scalability. The 3D input data is grouped into groups of slices (GOS) and each GOS is encoded and decoded as a separate unit. The symmetric tree definition of the original 3DSPIHT is improved by introducing a new asymmetric tree structure. While preserving the compression efficiency, the new tree structure allows for a small size of each GOS, which not only reduces memory consumption during the encoding and decoding processes, but also facilitates more efficient random access to certain segments of slices. To achieve more compression efficiency, the algorithm only encodes the main object of interest in each 3D data set, which can have any arbitrary shape, and ignores the unnecessary background. The experimental results on some MR data sets show the good performance of the 3DOBHS-SPIHT algorithm for multi-resolution lossy-to-lossless coding. The compression efficiency, full scalability, and object-based features of the proposed approach, beside its lossy-to-lossless coding support, make it a very attractive candidate for volumetric medical image information archiving and transmission applications. PMID:22606653

  8. A survey of commercial object-oriented database management systems

    NASA Technical Reports Server (NTRS)

    Atkins, John

    1992-01-01

    The object-oriented data model is the culmination of over thirty years of database research. Initially, database research focused on the need to provide information in a consistent and efficient manner to the business community. Early data models such as the hierarchical model and the network model met the goal of consistent and efficient access to data and were substantial improvements over simple file mechanisms for storing and accessing data. However, these models required highly skilled programmers to provide access to the data. Consequently, in the early 70's E.F. Codd, an IBM research computer scientists, proposed a new data model based on the simple mathematical notion of the relation. This model is known as the Relational Model. In the relational model, data is represented in flat tables (or relations) which have no physical or internal links between them. The simplicity of this model fostered the development of powerful but relatively simple query languages that now made data directly accessible to the general database user. Except for large, multi-user database systems, a database professional was in general no longer necessary. Database professionals found that traditional data in the form of character data, dates, and numeric data were easily represented and managed via the relational model. Commercial relational database management systems proliferated and performance of relational databases improved dramatically. However, there was a growing community of potential database users whose needs were not met by the relational model. These users needed to store data with data types not available in the relational model and who required a far richer modelling environment than that provided by the relational model. Indeed, the complexity of the objects to be represented in the model mandated a new approach to database technology. The Object-Oriented Model was the result.

  9. Malware detection and analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chiang, Ken; Lloyd, Levi; Crussell, Jonathan

    Embodiments of the invention describe systems and methods for malicious software detection and analysis. A binary executable comprising obfuscated malware on a host device may be received, and incident data indicating a time when the binary executable was received and identifying processes operating on the host device may be recorded. The binary executable is analyzed via a scalable plurality of execution environments, including one or more non-virtual execution environments and one or more virtual execution environments, to generate runtime data and deobfuscation data attributable to the binary executable. At least some of the runtime data and deobfuscation data attributable tomore » the binary executable is stored in a shared database, while at least some of the incident data is stored in a private, non-shared database.« less

  10. The Data Acquisition System of the Stockholm Educational Air Shower Array

    NASA Astrophysics Data System (ADS)

    Hofverberg, P.; Johansson, H.; Pearce, M.; Rydstrom, S.; Wikstrom, C.

    2005-12-01

    The Stockholm Educational Air Shower Array (SEASA) project is deploying an array of plastic scintillator detector stations on school roofs in the Stockholm area. Signals from GPS satellites are used to time synchronise signals from the widely separated detector stations, allowing cosmic ray air showers to be identified and studied. A low-cost and highly scalable data acquisition system has been produced using embedded Linux processors which communicate station data to a central server running a MySQL database. Air shower data can be visualised in real-time using a Java-applet client. It is also possible to query the database and manage detector stations from the client. In this paper, the design and performance of the system are described

  11. Performance of a Bounce-Averaged Global Model of Super-Thermal Electron Transport in the Earth's Magnetic Field

    NASA Technical Reports Server (NTRS)

    McGuire, Tim

    1998-01-01

    In this paper, we report the results of our recent research on the application of a multiprocessor Cray T916 supercomputer in modeling super-thermal electron transport in the earth's magnetic field. In general, this mathematical model requires numerical solution of a system of partial differential equations. The code we use for this model is moderately vectorized. By using Amdahl's Law for vector processors, it can be verified that the code is about 60% vectorized on a Cray computer. Speedup factors on the order of 2.5 were obtained compared to the unvectorized code. In the following sections, we discuss the methodology of improving the code. In addition to our goal of optimizing the code for solution on the Cray computer, we had the goal of scalability in mind. Scalability combines the concepts of portabilty with near-linear speedup. Specifically, a scalable program is one whose performance is portable across many different architectures with differing numbers of processors for many different problem sizes. Though we have access to a Cray at this time, the goal was to also have code which would run well on a variety of architectures.

  12. Wafer-scalable high-performance CVD graphene devices and analog circuits

    NASA Astrophysics Data System (ADS)

    Tao, Li; Lee, Jongho; Li, Huifeng; Piner, Richard; Ruoff, Rodney; Akinwande, Deji

    2013-03-01

    Graphene field effect transistors (GFETs) will serve as an essential component for functional modules like amplifier and frequency doublers in analog circuits. The performance of these modules is directly related to the mobility of charge carriers in GFETs, which per this study has been greatly improved. Low-field electrostatic measurements show field mobility values up to 12k cm2/Vs at ambient conditions with our newly developed scalable CVD graphene. For both hole and electron transport, fabricated GFETs offer substantial amplification for small and large signals at quasi-static frequencies limited only by external capacitances at high-frequencies. GFETs biased at the peak transconductance point featured high small-signal gain with eventual output power compression similar to conventional transistor amplifiers. GFETs operating around the Dirac voltage afforded positive conversion gain for the first time, to our knowledge, in experimental graphene frequency doublers. This work suggests a realistic prospect for high performance linear and non-linear analog circuits based on the unique electron-hole symmetry and fast transport now accessible in wafer-scalable CVD graphene. *Support from NSF CAREER award (ECCS-1150034) and the W. M. Keck Foundation are appreicated.

  13. Critical Care Health Informatics Collaborative (CCHIC): Data, tools and methods for reproducible research: A multi-centre UK intensive care database.

    PubMed

    Harris, Steve; Shi, Sinan; Brealey, David; MacCallum, Niall S; Denaxas, Spiros; Perez-Suarez, David; Ercole, Ari; Watkinson, Peter; Jones, Andrew; Ashworth, Simon; Beale, Richard; Young, Duncan; Brett, Stephen; Singer, Mervyn

    2018-04-01

    To build and curate a linkable multi-centre database of high resolution longitudinal electronic health records (EHR) from adult Intensive Care Units (ICU). To develop a set of open-source tools to make these data 'research ready' while protecting patient's privacy with a particular focus on anonymisation. We developed a scalable EHR processing pipeline for extracting, linking, normalising and curating and anonymising EHR data. Patient and public involvement was sought from the outset, and approval to hold these data was granted by the NHS Health Research Authority's Confidentiality Advisory Group (CAG). The data are held in a certified Data Safe Haven. We followed sustainable software development principles throughout, and defined and populated a common data model that links to other clinical areas. Longitudinal EHR data were loaded into the CCHIC database from eleven adult ICUs at 5 UK teaching hospitals. From January 2014 to January 2017, this amounted to 21,930 and admissions (18,074 unique patients). Typical admissions have 70 data-items pertaining to admission and discharge, and a median of 1030 (IQR 481-2335) time-varying measures. Training datasets were made available through virtual machine images emulating the data processing environment. An open source R package, cleanEHR, was developed and released that transforms the data into a square table readily analysable by most statistical packages. A simple language agnostic configuration file will allow the user to select and clean variables, and impute missing data. An audit trail makes clear the provenance of the data at all times. Making health care data available for research is problematic. CCHIC is a unique multi-centre longitudinal and linkable resource that prioritises patient privacy through the highest standards of data security, but also provides tools to clean, organise, and anonymise the data. We believe the development of such tools are essential if we are to meet the twin requirements of respecting patient privacy and working for patient benefit. The CCHIC database is now in use by health care researchers from academia and industry. The 'research ready' suite of data preparation tools have facilitated access, and linkage to national databases of secondary care is underway. Copyright © 2018 Elsevier B.V. All rights reserved.

  14. 78 FR 48177 - Submission for OMB Review; 30-day Comment Request: National Institute of Mental Health Data...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-08-07

    ... Access Request and Use Certification (previously National Database for Autism Research Data Access... approval for use of the National Database for Autism Research (NDAR) Data Use Certification (DUC) Form...

  15. Cloud computing applications for biomedical science: A perspective.

    PubMed

    Navale, Vivek; Bourne, Philip E

    2018-06-01

    Biomedical research has become a digital data-intensive endeavor, relying on secure and scalable computing, storage, and network infrastructure, which has traditionally been purchased, supported, and maintained locally. For certain types of biomedical applications, cloud computing has emerged as an alternative to locally maintained traditional computing approaches. Cloud computing offers users pay-as-you-go access to services such as hardware infrastructure, platforms, and software for solving common biomedical computational problems. Cloud computing services offer secure on-demand storage and analysis and are differentiated from traditional high-performance computing by their rapid availability and scalability of services. As such, cloud services are engineered to address big data problems and enhance the likelihood of data and analytics sharing, reproducibility, and reuse. Here, we provide an introductory perspective on cloud computing to help the reader determine its value to their own research.

  16. Cloud computing applications for biomedical science: A perspective

    PubMed Central

    2018-01-01

    Biomedical research has become a digital data–intensive endeavor, relying on secure and scalable computing, storage, and network infrastructure, which has traditionally been purchased, supported, and maintained locally. For certain types of biomedical applications, cloud computing has emerged as an alternative to locally maintained traditional computing approaches. Cloud computing offers users pay-as-you-go access to services such as hardware infrastructure, platforms, and software for solving common biomedical computational problems. Cloud computing services offer secure on-demand storage and analysis and are differentiated from traditional high-performance computing by their rapid availability and scalability of services. As such, cloud services are engineered to address big data problems and enhance the likelihood of data and analytics sharing, reproducibility, and reuse. Here, we provide an introductory perspective on cloud computing to help the reader determine its value to their own research. PMID:29902176

  17. Hadoop-BAM: directly manipulating next generation sequencing data in the cloud.

    PubMed

    Niemenmaa, Matti; Kallio, Aleksi; Schumacher, André; Klemelä, Petri; Korpelainen, Eija; Heljanko, Keijo

    2012-03-15

    Hadoop-BAM is a novel library for the scalable manipulation of aligned next-generation sequencing data in the Hadoop distributed computing framework. It acts as an integration layer between analysis applications and BAM files that are processed using Hadoop. Hadoop-BAM solves the issues related to BAM data access by presenting a convenient API for implementing map and reduce functions that can directly operate on BAM records. It builds on top of the Picard SAM JDK, so tools that rely on the Picard API are expected to be easily convertible to support large-scale distributed processing. In this article we demonstrate the use of Hadoop-BAM by building a coverage summarizing tool for the Chipster genome browser. Our results show that Hadoop offers good scalability, and one should avoid moving data in and out of Hadoop between analysis steps.

  18. Scalable randomized benchmarking of non-Clifford gates

    NASA Astrophysics Data System (ADS)

    Cross, Andrew; Magesan, Easwar; Bishop, Lev; Smolin, John; Gambetta, Jay

    Randomized benchmarking is a widely used experimental technique to characterize the average error of quantum operations. Benchmarking procedures that scale to enable characterization of n-qubit circuits rely on efficient procedures for manipulating those circuits and, as such, have been limited to subgroups of the Clifford group. However, universal quantum computers require additional, non-Clifford gates to approximate arbitrary unitary transformations. We define a scalable randomized benchmarking procedure over n-qubit unitary matrices that correspond to protected non-Clifford gates for a class of stabilizer codes. We present efficient methods for representing and composing group elements, sampling them uniformly, and synthesizing corresponding poly (n) -sized circuits. The procedure provides experimental access to two independent parameters that together characterize the average gate fidelity of a group element. We acknowledge support from ARO under Contract W911NF-14-1-0124.

  19. Interactive, Automated Management of Icing Data

    NASA Technical Reports Server (NTRS)

    Levinson, Laurie H.

    2009-01-01

    IceVal DatAssistant is software (see figure) that provides an automated, interactive solution for the management of data from research on aircraft icing. This software consists primarily of (1) a relational database component used to store ice shape and airfoil coordinates and associated data on operational and environmental test conditions and (2) a graphically oriented database access utility, used to upload, download, process, and/or display data selected by the user. The relational database component consists of a Microsoft Access 2003 database file with nine tables containing data of different types. Included in the database are the data for all publicly releasable ice tracings with complete and verifiable test conditions from experiments conducted to date in the Glenn Research Center Icing Research Tunnel. Ice shapes from computational simulations with the correspond ing conditions performed utilizing the latest version of the LEWICE ice shape prediction code are likewise included, and are linked to the equivalent experimental runs. The database access component includes ten Microsoft Visual Basic 6.0 (VB) form modules and three VB support modules. Together, these modules enable uploading, downloading, processing, and display of all data contained in the database. This component also affords the capability to perform various database maintenance functions for example, compacting the database or creating a new, fully initialized but empty database file.

  20. The "GeneTrustee": a universal identification system that ensures privacy and confidentiality for human genetic databases.

    PubMed

    Burnett, Leslie; Barlow-Stewart, Kris; Proos, Anné L; Aizenberg, Harry

    2003-05-01

    This article describes a generic model for access to samples and information in human genetic databases. The model utilises a "GeneTrustee", a third-party intermediary independent of the subjects and of the investigators or database custodians. The GeneTrustee model has been implemented successfully in various community genetics screening programs and has facilitated research access to genetic databases while protecting the privacy and confidentiality of research subjects. The GeneTrustee model could also be applied to various types of non-conventional genetic databases, including neonatal screening Guthrie card collections, and to forensic DNA samples.

  1. An Introduction to Database Structure and Database Machines.

    ERIC Educational Resources Information Center

    Detweiler, Karen

    1984-01-01

    Enumerates principal management objectives of database management systems (data independence, quality, security, multiuser access, central control) and criteria for comparison (response time, size, flexibility, other features). Conventional database management systems, relational databases, and database machines used for backend processing are…

  2. New Trends in E-Science: Machine Learning and Knowledge Discovery in Databases

    NASA Astrophysics Data System (ADS)

    Brescia, Massimo

    2012-11-01

    Data mining, or Knowledge Discovery in Databases (KDD), while being the main methodology to extract the scientific information contained in Massive Data Sets (MDS), needs to tackle crucial problems since it has to orchestrate complex challenges posed by transparent access to different computing environments, scalability of algorithms, reusability of resources. To achieve a leap forward for the progress of e-science in the data avalanche era, the community needs to implement an infrastructure capable of performing data access, processing and mining in a distributed but integrated context. The increasing complexity of modern technologies carried out a huge production of data, whose related warehouse management and the need to optimize analysis and mining procedures lead to a change in concept on modern science. Classical data exploration, based on local user own data storage and limited computing infrastructures, is no more efficient in the case of MDS, worldwide spread over inhomogeneous data centres and requiring teraflop processing power. In this context modern experimental and observational science requires a good understanding of computer science, network infrastructures, Data Mining, etc. i.e. of all those techniques which fall into the domain of the so called e-science (recently assessed also by the Fourth Paradigm of Science). Such understanding is almost completely absent in the older generations of scientists and this reflects in the inadequacy of most academic and research programs. A paradigm shift is needed: statistical pattern recognition, object oriented programming, distributed computing, parallel programming need to become an essential part of scientific background. A possible practical solution is to provide the research community with easy-to understand, easy-to-use tools, based on the Web 2.0 technologies and Machine Learning methodology. Tools where almost all the complexity is hidden to the final user, but which are still flexible and able to produce efficient and reliable scientific results. All these considerations will be described in the detail in the chapter. Moreover, examples of modern applications offering to a wide variety of e-science communities a large spectrum of computational facilities to exploit the wealth of available massive data sets and powerful machine learning and statistical algorithms will be also introduced.

  3. Querying XML Data with SPARQL

    NASA Astrophysics Data System (ADS)

    Bikakis, Nikos; Gioldasis, Nektarios; Tsinaraki, Chrisa; Christodoulakis, Stavros

    SPARQL is today the standard access language for Semantic Web data. In the recent years XML databases have also acquired industrial importance due to the widespread applicability of XML in the Web. In this paper we present a framework that bridges the heterogeneity gap and creates an interoperable environment where SPARQL queries are used to access XML databases. Our approach assumes that fairly generic mappings between ontology constructs and XML Schema constructs have been automatically derived or manually specified. The mappings are used to automatically translate SPARQL queries to semantically equivalent XQuery queries which are used to access the XML databases. We present the algorithms and the implementation of SPARQL2XQuery framework, which is used for answering SPARQL queries over XML databases.

  4. Second-Tier Database for Ecosystem Focus, 2003-2004 Annual Report.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    University of Washington, Columbia Basin Research, DART Project Staff,

    2004-12-01

    The Second-Tier Database for Ecosystem Focus (Contract 00004124) provides direct and timely public access to Columbia Basin environmental, operational, fishery and riverine data resources for federal, state, public and private entities essential to sound operational and resource management. The database also assists with juvenile and adult mainstem passage modeling supporting federal decisions affecting the operation of the FCRPS. The Second-Tier Database known as Data Access in Real Time (DART) integrates public data for effective access, consideration and application. DART also provides analysis tools and performance measures for evaluating the condition of Columbia Basin salmonid stocks. These services are critical tomore » BPA's implementation of its fish and wildlife responsibilities under the Endangered Species Act (ESA).« less

  5. Scalable Trust of Next-Generation Management (STRONGMAN)

    DTIC Science & Technology

    2004-10-01

    remote logins might be policy controlled to allow only strongly encrypted IPSec tunnels to log in remotely, to access selected files, etc. The...and Angelos D. Keromytis. Drop-in Security for Distributed and Portable Computing Elements. Emerald Journal of Internet Research. Electronic...Security and Privacy, pp. 17-31, May 1999. [2] S. M. Bellovin. Distributed Firewalls. ; login : magazine, special issue on security, November 1999. [3] M

  6. Information Security: Governmentwide Guidance Needed to Assist Agencies in Implementing Cloud Computing

    DTIC Science & Technology

    2010-07-01

    Cloud computing , an emerging form of computing in which users have access to scalable, on-demand capabilities that are provided through Internet... cloud computing , (2) the information security implications of using cloud computing services in the Federal Government, and (3) federal guidance and...efforts to address information security when using cloud computing . The complete report is titled Information Security: Federal Guidance Needed to

  7. Nanotechnology: A Policy Primer

    DTIC Science & Technology

    2008-05-20

    of oil.3 ! Universal access to clean water. Nanotechnology water desalination and filtration systems may offer affordable, scalable, and portable...Order Code RL34511 Nanotechnology : A Policy Primer May 20, 2008 John F. Sargent Specialist in Science and Technology Policy Resources, Science, and...REPORT DATE 20 MAY 2008 2. REPORT TYPE 3. DATES COVERED 00-00-2008 to 00-00-2008 4. TITLE AND SUBTITLE Nanotechnology : A Policy Primer 5a

  8. Comparative-effectiveness research in distributed health data networks.

    PubMed

    Toh, S; Platt, R; Steiner, J F; Brown, J S

    2011-12-01

    Comparative-effectiveness research (CER) can be conducted within a distributed health data network. Such networks allow secure access to separate data sets from different data partners and overcome many practical obstacles related to patient privacy, data security, and proprietary concerns. A scalable network architecture supports a wide range of CER activities and meets the data infrastructure needs envisioned by the Federal Coordinating Council for Comparative Effectiveness Research.

  9. Evaluation of NoSQL databases for DIRAC monitoring and beyond

    NASA Astrophysics Data System (ADS)

    Mathe, Z.; Casajus Ramo, A.; Stagni, F.; Tomassetti, L.

    2015-12-01

    Nowadays, many database systems are available but they may not be optimized for storing time series data. Monitoring DIRAC jobs would be better done using a database optimised for storing time series data. So far it was done using a MySQL database, which is not well suited for such an application. Therefore alternatives have been investigated. Choosing an appropriate database for storing huge amounts of time series data is not trivial as one must take into account different aspects such as manageability, scalability and extensibility. We compared the performance of Elasticsearch, OpenTSDB (based on HBase) and InfluxDB NoSQL databases, using the same set of machines and the same data. We also evaluated the effort required for maintaining them. Using the LHCb Workload Management System (WMS), based on DIRAC as a use case we set up a new monitoring system, in parallel with the current MySQL system, and we stored the same data into the databases under test. We evaluated Grafana (for OpenTSDB) and Kibana (for ElasticSearch) metrics and graph editors for creating dashboards, in order to have a clear picture on the usability of each candidate. In this paper we present the results of this study and the performance of the selected technology. We also give an outlook of other potential applications of NoSQL databases within the DIRAC project.

  10. A Database as a Service for the Healthcare System to Store Physiological Signal Data.

    PubMed

    Chang, Hsien-Tsung; Lin, Tsai-Huei

    2016-01-01

    Wearable devices that measure physiological signals to help develop self-health management habits have become increasingly popular in recent years. These records are conducive for follow-up health and medical care. In this study, based on the characteristics of the observed physiological signal records- 1) a large number of users, 2) a large amount of data, 3) low information variability, 4) data privacy authorization, and 5) data access by designated users-we wish to resolve physiological signal record-relevant issues utilizing the advantages of the Database as a Service (DaaS) model. Storing a large amount of data using file patterns can reduce database load, allowing users to access data efficiently; the privacy control settings allow users to store data securely. The results of the experiment show that the proposed system has better database access performance than a traditional relational database, with a small difference in database volume, thus proving that the proposed system can improve data storage performance.

  11. A Database as a Service for the Healthcare System to Store Physiological Signal Data

    PubMed Central

    Lin, Tsai-Huei

    2016-01-01

    Wearable devices that measure physiological signals to help develop self-health management habits have become increasingly popular in recent years. These records are conducive for follow-up health and medical care. In this study, based on the characteristics of the observed physiological signal records– 1) a large number of users, 2) a large amount of data, 3) low information variability, 4) data privacy authorization, and 5) data access by designated users—we wish to resolve physiological signal record-relevant issues utilizing the advantages of the Database as a Service (DaaS) model. Storing a large amount of data using file patterns can reduce database load, allowing users to access data efficiently; the privacy control settings allow users to store data securely. The results of the experiment show that the proposed system has better database access performance than a traditional relational database, with a small difference in database volume, thus proving that the proposed system can improve data storage performance. PMID:28033415

  12. Comparison of Online Agricultural Information Services.

    ERIC Educational Resources Information Center

    Reneau, Fred; Patterson, Richard

    1984-01-01

    Outlines major online agricultural information services--agricultural databases, databases with agricultural services, educational databases in agriculture--noting services provided, access to the database, and costs. Benefits of online agricultural database sources (availability of agricultural marketing, weather, commodity prices, management…

  13. Enhanced DIII-D Data Management Through a Relational Database

    NASA Astrophysics Data System (ADS)

    Burruss, J. R.; Peng, Q.; Schachter, J.; Schissel, D. P.; Terpstra, T. B.

    2000-10-01

    A relational database is being used to serve data about DIII-D experiments. The database is optimized for queries across multiple shots, allowing for rapid data mining by SQL-literate researchers. The relational database relates different experiments and datasets, thus providing a big picture of DIII-D operations. Users are encouraged to add their own tables to the database. Summary physics quantities about DIII-D discharges are collected and stored in the database automatically. Meta-data about code runs, MDSplus usage, and visualization tool usage are collected, stored in the database, and later analyzed to improve computing. Documentation on the database may be accessed through programming languages such as C, Java, and IDL, or through ODBC compliant applications such as Excel and Access. A database-driven web page also provides a convenient means for viewing database quantities through the World Wide Web. Demonstrations will be given at the poster.

  14. NATIONAL URBAN DATABASE AND ACCESS PROTAL TOOL

    EPA Science Inventory

    Current mesoscale weather prediction and microscale dispersion models are limited in their ability to perform accurate assessments in urban areas. A project called the National Urban Database with Access Portal Tool (NUDAPT) is beginning to provide urban data and improve the para...

  15. VO Access to BASECOL Database

    NASA Astrophysics Data System (ADS)

    Moreau, N.; Dubernet, M. L.

    2006-07-01

    Basecol is a combination of a website (using PHP and HTML) and a MySQL database concerning molecular ro-vibrational transitions induced by collisions with atoms or molecules. This database has been created in view of the scientific preparation of the Heterodyne Instrument for the Far-Infrared on board the Herschel Space Observatory (HSO). Basecol offers an access to numerical and bibliographic data through various output methods such as ASCII, HTML or VOTable (which is a first step towards a VO compliant system). A web service using Apache Axis has been developed in order to provide a direct access to data for external applications.

  16. Designing a Multi-Petabyte Database for LSST

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Becla, Jacek; Hanushevsky, Andrew; Nikolaev, Sergei

    2007-01-10

    The 3.2 giga-pixel LSST camera will produce approximately half a petabyte of archive images every month. These data need to be reduced in under a minute to produce real-time transient alerts, and then added to the cumulative catalog for further analysis. The catalog is expected to grow about three hundred terabytes per year. The data volume, the real-time transient alerting requirements of the LSST, and its spatio-temporal aspects require innovative techniques to build an efficient data access system at reasonable cost. As currently envisioned, the system will rely on a database for catalogs and metadata. Several database systems are beingmore » evaluated to understand how they perform at these data rates, data volumes, and access patterns. This paper describes the LSST requirements, the challenges they impose, the data access philosophy, results to date from evaluating available database technologies against LSST requirements, and the proposed database architecture to meet the data challenges.« less

  17. Database Reports Over the Internet

    NASA Technical Reports Server (NTRS)

    Smith, Dean Lance

    2002-01-01

    Most of the summer was spent developing software that would permit existing test report forms to be printed over the web on a printer that is supported by Adobe Acrobat Reader. The data is stored in a DBMS (Data Base Management System). The client asks for the information from the database using an HTML (Hyper Text Markup Language) form in a web browser. JavaScript is used with the forms to assist the user and verify the integrity of the entered data. Queries to a database are made in SQL (Sequential Query Language), a widely supported standard for making queries to databases. Java servlets, programs written in the Java programming language running under the control of network server software, interrogate the database and complete a PDF form template kept in a file. The completed report is sent to the browser requesting the report. Some errors are sent to the browser in an HTML web page, others are reported to the server. Access to the databases was restricted since the data are being transported to new DBMS software that will run on new hardware. However, the SQL queries were made to Microsoft Access, a DBMS that is available on most PCs (Personal Computers). Access does support the SQL commands that were used, and a database was created with Access that contained typical data for the report forms. Some of the problems and features are discussed below.

  18. For 481 biomedical open access journals, articles are not searchable in the Directory of Open Access Journals nor in conventional biomedical databases.

    PubMed

    Liljekvist, Mads Svane; Andresen, Kristoffer; Pommergaard, Hans-Christian; Rosenberg, Jacob

    2015-01-01

    Background. Open access (OA) journals allows access to research papers free of charge to the reader. Traditionally, biomedical researchers use databases like MEDLINE and EMBASE to discover new advances. However, biomedical OA journals might not fulfill such databases' criteria, hindering dissemination. The Directory of Open Access Journals (DOAJ) is a database exclusively listing OA journals. The aim of this study was to investigate DOAJ's coverage of biomedical OA journals compared with the conventional biomedical databases. Methods. Information on all journals listed in four conventional biomedical databases (MEDLINE, PubMed Central, EMBASE and SCOPUS) and DOAJ were gathered. Journals were included if they were (1) actively publishing, (2) full OA, (3) prospectively indexed in one or more database, and (4) of biomedical subject. Impact factor and journal language were also collected. DOAJ was compared with conventional databases regarding the proportion of journals covered, along with their impact factor and publishing language. The proportion of journals with articles indexed by DOAJ was determined. Results. In total, 3,236 biomedical OA journals were included in the study. Of the included journals, 86.7% were listed in DOAJ. Combined, the conventional biomedical databases listed 75.0% of the journals; 18.7% in MEDLINE; 36.5% in PubMed Central; 51.5% in SCOPUS and 50.6% in EMBASE. Of the journals in DOAJ, 88.7% published in English and 20.6% had received impact factor for 2012 compared with 93.5% and 26.0%, respectively, for journals in the conventional biomedical databases. A subset of 51.1% and 48.5% of the journals in DOAJ had articles indexed from 2012 and 2013, respectively. Of journals exclusively listed in DOAJ, one journal had received an impact factor for 2012, and 59.6% of the journals had no content from 2013 indexed in DOAJ. Conclusions. DOAJ is the most complete registry of biomedical OA journals compared with five conventional biomedical databases. However, DOAJ only indexes articles for half of the biomedical journals listed, making it an incomplete source for biomedical research papers in general.

  19. Alternative treatment technology information center computer database system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sullivan, D.

    1995-10-01

    The Alternative Treatment Technology Information Center (ATTIC) computer database system was developed pursuant to the 1986 Superfund law amendments. It provides up-to-date information on innovative treatment technologies to clean up hazardous waste sites. ATTIC v2.0 provides access to several independent databases as well as a mechanism for retrieving full-text documents of key literature. It can be accessed with a personal computer and modem 24 hours a day, and there are no user fees. ATTIC provides {open_quotes}one-stop shopping{close_quotes} for information on alternative treatment options by accessing several databases: (1) treatment technology database; this contains abstracts from the literature on all typesmore » of treatment technologies, including biological, chemical, physical, and thermal methods. The best literature as viewed by experts is highlighted. (2) treatability study database; this provides performance information on technologies to remove contaminants from wastewaters and soils. It is derived from treatability studies. This database is available through ATTIC or separately as a disk that can be mailed to you. (3) underground storage tank database; this presents information on underground storage tank corrective actions, surface spills, emergency response, and remedial actions. (4) oil/chemical spill database; this provides abstracts on treatment and disposal of spilled oil and chemicals. In addition to these separate databases, ATTIC allows immediate access to other disk-based systems such as the Vendor Information System for Innovative Treatment Technologies (VISITT) and the Bioremediation in the Field Search System (BFSS). The user may download these programs to their own PC via a high-speed modem. Also via modem, users are able to download entire documents through the ATTIC system. Currently, about fifty publications are available, including Superfund Innovative Technology Evaluation (SITE) program documents.« less

  20. Real-Time Station Grouping under Dynamic Traffic for IEEE 802.11ah

    PubMed Central

    Tian, Le; Latré, Steven

    2017-01-01

    IEEE 802.11ah, marketed as Wi-Fi HaLow, extends Wi-Fi to the sub-1 GHz spectrum. Through a number of physical layer (PHY) and media access control (MAC) optimizations, it aims to bring greatly increased range, energy-efficiency, and scalability. This makes 802.11ah the perfect candidate for providing connectivity to Internet of Things (IoT) devices. One of these new features, referred to as the Restricted Access Window (RAW), focuses on improving scalability in highly dense deployments. RAW divides stations into groups and reduces contention and collisions by only allowing channel access to one group at a time. However, the standard does not dictate how to determine the optimal RAW grouping parameters. The optimal parameters depend on the current network conditions, and it has been shown that incorrect configuration severely impacts throughput, latency and energy efficiency. In this paper, we propose a traffic-adaptive RAW optimization algorithm (TAROA) to adapt the RAW parameters in real time based on the current traffic conditions, optimized for sensor networks in which each sensor transmits packets with a certain (predictable) frequency and may change the transmission frequency over time. The TAROA algorithm is executed at each target beacon transmission time (TBTT), and it first estimates the packet transmission interval of each station only based on packet transmission information obtained by access point (AP) during the last beacon interval. Then, TAROA determines the RAW parameters and assigns stations to RAW slots based on this estimated transmission frequency. The simulation results show that, compared to enhanced distributed channel access/distributed coordination function (EDCA/DCF), the TAROA algorithm can highly improve the performance of IEEE 802.11ah dense networks in terms of throughput, especially when hidden nodes exist, although it does not always achieve better latency performance. This paper contributes with a practical approach to optimizing RAW grouping under dynamic traffic in real time, which is a major leap towards applying RAW mechanism in real-life IoT networks. PMID:28677617

  1. Real-Time Station Grouping under Dynamic Traffic for IEEE 802.11ah.

    PubMed

    Tian, Le; Khorov, Evgeny; Latré, Steven; Famaey, Jeroen

    2017-07-04

    IEEE 802.11ah, marketed as Wi-Fi HaLow, extends Wi-Fi to the sub-1 GHz spectrum. Through a number of physical layer (PHY) and media access control (MAC) optimizations, it aims to bring greatly increased range, energy-efficiency, and scalability. This makes 802.11ah the perfect candidate for providing connectivity to Internet of Things (IoT) devices. One of these new features, referred to as the Restricted Access Window (RAW), focuses on improving scalability in highly dense deployments. RAW divides stations into groups and reduces contention and collisions by only allowing channel access to one group at a time. However, the standard does not dictate how to determine the optimal RAW grouping parameters. The optimal parameters depend on the current network conditions, and it has been shown that incorrect configuration severely impacts throughput, latency and energy efficiency. In this paper, we propose a traffic-adaptive RAW optimization algorithm (TAROA) to adapt the RAW parameters in real time based on the current traffic conditions, optimized for sensor networks in which each sensor transmits packets with a certain (predictable) frequency and may change the transmission frequency over time. The TAROA algorithm is executed at each target beacon transmission time (TBTT), and it first estimates the packet transmission interval of each station only based on packet transmission information obtained by access point (AP) during the last beacon interval. Then, TAROA determines the RAW parameters and assigns stations to RAW slots based on this estimated transmission frequency. The simulation results show that, compared to enhanced distributed channel access/distributed coordination function (EDCA/DCF), the TAROA algorithm can highly improve the performance of IEEE 802.11ah dense networks in terms of throughput, especially when hidden nodes exist, although it does not always achieve better latency performance. This paper contributes with a practical approach to optimizing RAW grouping under dynamic traffic in real time, which is a major leap towards applying RAW mechanism in real-life IoT networks.

  2. PATIKA: an integrated visual environment for collaborative construction and analysis of cellular pathways.

    PubMed

    Demir, E; Babur, O; Dogrusoz, U; Gursoy, A; Nisanci, G; Cetin-Atalay, R; Ozturk, M

    2002-07-01

    Availability of the sequences of entire genomes shifts the scientific curiosity towards the identification of function of the genomes in large scale as in genome studies. In the near future, data produced about cellular processes at molecular level will accumulate with an accelerating rate as a result of proteomics studies. In this regard, it is essential to develop tools for storing, integrating, accessing, and analyzing this data effectively. We define an ontology for a comprehensive representation of cellular events. The ontology presented here enables integration of fragmented or incomplete pathway information and supports manipulation and incorporation of the stored data, as well as multiple levels of abstraction. Based on this ontology, we present the architecture of an integrated environment named Patika (Pathway Analysis Tool for Integration and Knowledge Acquisition). Patika is composed of a server-side, scalable, object-oriented database and client-side editors to provide an integrated, multi-user environment for visualizing and manipulating network of cellular events. This tool features automated pathway layout, functional computation support, advanced querying and a user-friendly graphical interface. We expect that Patika will be a valuable tool for rapid knowledge acquisition, microarray generated large-scale data interpretation, disease gene identification, and drug development. A prototype of Patika is available upon request from the authors.

  3. A cloud-based information repository for bridge monitoring applications

    NASA Astrophysics Data System (ADS)

    Jeong, Seongwoon; Zhang, Yilan; Hou, Rui; Lynch, Jerome P.; Sohn, Hoon; Law, Kincho H.

    2016-04-01

    This paper describes an information repository to support bridge monitoring applications on a cloud computing platform. Bridge monitoring, with instrumentation of sensors in particular, collects significant amount of data. In addition to sensor data, a wide variety of information such as bridge geometry, analysis model and sensor description need to be stored. Data management plays an important role to facilitate data utilization and data sharing. While bridge information modeling (BrIM) technologies and standards have been proposed and they provide a means to enable integration and facilitate interoperability, current BrIM standards support mostly the information about bridge geometry. In this study, we extend the BrIM schema to include analysis models and sensor information. Specifically, using the OpenBrIM standards as the base, we draw on CSI Bridge, a commercial software widely used for bridge analysis and design, and SensorML, a standard schema for sensor definition, to define the data entities necessary for bridge monitoring applications. NoSQL database systems are employed for data repository. Cloud service infrastructure is deployed to enhance scalability, flexibility and accessibility of the data management system. The data model and systems are tested using the bridge model and the sensor data collected at the Telegraph Road Bridge, Monroe, Michigan.

  4. Using AberOWL for fast and scalable reasoning over BioPortal ontologies.

    PubMed

    Slater, Luke; Gkoutos, Georgios V; Schofield, Paul N; Hoehndorf, Robert

    2016-08-08

    Reasoning over biomedical ontologies using their OWL semantics has traditionally been a challenging task due to the high theoretical complexity of OWL-based automated reasoning. As a consequence, ontology repositories, as well as most other tools utilizing ontologies, either provide access to ontologies without use of automated reasoning, or limit the number of ontologies for which automated reasoning-based access is provided. We apply the AberOWL infrastructure to provide automated reasoning-based access to all accessible and consistent ontologies in BioPortal (368 ontologies). We perform an extensive performance evaluation to determine query times, both for queries of different complexity and for queries that are performed in parallel over the ontologies. We demonstrate that, with the exception of a few ontologies, even complex and parallel queries can now be answered in milliseconds, therefore allowing automated reasoning to be used on a large scale, to run in parallel, and with rapid response times.

  5. The Protein Disease Database of human body fluids: II. Computer methods and data issues.

    PubMed

    Lemkin, P F; Orr, G A; Goldstein, M P; Creed, G J; Myrick, J E; Merril, C R

    1995-01-01

    The Protein Disease Database (PDD) is a relational database of proteins and diseases. With this database it is possible to screen for quantitative protein abnormalities associated with disease states. These quantitative relationships use data drawn from the peer-reviewed biomedical literature. Assays may also include those observed in high-resolution electrophoretic gels that offer the potential to quantitate many proteins in a single test as well as data gathered by enzymatic or immunologic assays. We are using the Internet World Wide Web (WWW) and the Web browser paradigm as an access method for wide distribution and querying of the Protein Disease Database. The WWW hypertext transfer protocol and its Common Gateway Interface make it possible to build powerful graphical user interfaces that can support easy-to-use data retrieval using query specification forms or images. The details of these interactions are totally transparent to the users of these forms. Using a client-server SQL relational database, user query access, initial data entry and database maintenance are all performed over the Internet with a Web browser. We discuss the underlying design issues, mapping mechanisms and assumptions that we used in constructing the system, data entry, access to the database server, security, and synthesis of derived two-dimensional gel image maps and hypertext documents resulting from SQL database searches.

  6. KOJAK Group Finder: Scalable Group Detection via Integrated Knowledge-Based and Statistical Reasoning

    DTIC Science & Technology

    2006-09-01

    STELLA and PowerLoomn. These modules comunicate with a knowledge basec using KIF and stan(lardl relational database systelnis using either standard...groups ontology as well as a rule that infers additional seed members based on joint participation in a terrorism event. EDB schema files are a special... terrorism links from the Ali Baba EDB. Our interpretation of such links is that they KOJAK Manual E-42 encode that two people committed an act of

  7. A Spatially-Registered, Massively Parallelised Data Structure for Interacting with Large, Integrated Geodatasets

    NASA Astrophysics Data System (ADS)

    Irving, D. H.; Rasheed, M.; O'Doherty, N.

    2010-12-01

    The efficient storage, retrieval and interactive use of subsurface data present great challenges in geodata management. Data volumes are typically massive, complex and poorly indexed with inadequate metadata. Derived geomodels and interpretations are often tightly bound in application-centric and proprietary formats; open standards for long-term stewardship are poorly developed. Consequently current data storage is a combination of: complex Logical Data Models (LDMs) based on file storage formats; 2D GIS tree-based indexing of spatial data; and translations of serialised memory-based storage techniques into disk-based storage. Whilst adequate for working at the mesoscale over a short timeframes, these approaches all possess technical and operational shortcomings: data model complexity; anisotropy of access; scalability to large and complex datasets; and weak implementation and integration of metadata. High performance hardware such as parallelised storage and Relational Database Management System (RDBMS) have long been exploited in many solutions but the underlying data structure must provide commensurate efficiencies to allow multi-user, multi-application and near-realtime data interaction. We present an open Spatially-Registered Data Structure (SRDS) built on Massively Parallel Processing (MPP) database architecture implemented by a ANSI SQL 2008 compliant RDBMS. We propose a LDM comprising a 3D Earth model that is decomposed such that each increasing Level of Detail (LoD) is achieved by recursively halving the bin size until it is less than the error in each spatial dimension for that data point. The value of an attribute at that point is stored as a property of that point and at that LoD. It is key to the numerical efficiency of the SRDS that it is under-pinned by a power-of-two relationship thus precluding the need for computationally intensive floating point arithmetic. Our approach employed a tightly clustered MPP array with small clusters of storage, processors and memory communicating over a high-speed network inter-connect. This is a shared-nothing architecture where resources are managed within each cluster unlike most other RDBMSs. Data are accessed on this architecture by their primary index values which utilises the hashing algorithm for point-to-point access. The hashing algorithm’s main role is the efficient distribution of data across the clusters based on the primary index. In this study we used 3D seismic volumes, 2D seismic profiles and borehole logs to demonstrate application in both (x,y,TWT) and (x,y,z)-space. In the SRDS the primary index is a composite column index of (x,y) to avoid invoking time-consuming full table scans as is the case in tree-based systems. This means that data access is isotropic. A query for data in a specified spatial range permits retrieval recursively by point-to-point queries within each nested LoD yielding true linear performance up to the Petabyte scale with hardware scaling presenting the primary limiting factor. Our architecture and LDM promotes: realtime interaction with massive data volumes; streaming of result sets and server-rendered 2D/3D imagery; rigorous workflow control and auditing; and in-database algorithms run directly against data as a HPC cloud service.

  8. libChEBI: an API for accessing the ChEBI database.

    PubMed

    Swainston, Neil; Hastings, Janna; Dekker, Adriano; Muthukrishnan, Venkatesh; May, John; Steinbeck, Christoph; Mendes, Pedro

    2016-01-01

    ChEBI is a database and ontology of chemical entities of biological interest. It is widely used as a source of identifiers to facilitate unambiguous reference to chemical entities within biological models, databases, ontologies and literature. ChEBI contains a wealth of chemical data, covering over 46,500 distinct chemical entities, and related data such as chemical formula, charge, molecular mass, structure, synonyms and links to external databases. Furthermore, ChEBI is an ontology, and thus provides meaningful links between chemical entities. Unlike many other resources, ChEBI is fully human-curated, providing a reliable, non-redundant collection of chemical entities and related data. While ChEBI is supported by a web service for programmatic access and a number of download files, it does not have an API library to facilitate the use of ChEBI and its data in cheminformatics software. To provide this missing functionality, libChEBI, a comprehensive API library for accessing ChEBI data, is introduced. libChEBI is available in Java, Python and MATLAB versions from http://github.com/libChEBI, and provides full programmatic access to all data held within the ChEBI database through a simple and documented API. libChEBI is reliant upon the (automated) download and regular update of flat files that are held locally. As such, libChEBI can be embedded in both on- and off-line software applications. libChEBI allows better support of ChEBI and its data in the development of new cheminformatics software. Covering three key programming languages, it allows for the entirety of the ChEBI database to be accessed easily and quickly through a simple API. All code is open access and freely available.

  9. Ferroelectric tunneling element and memory applications which utilize the tunneling element

    DOEpatents

    Kalinin, Sergei V [Knoxville, TN; Christen, Hans M [Knoxville, TN; Baddorf, Arthur P [Knoxville, TN; Meunier, Vincent [Knoxville, TN; Lee, Ho Nyung [Oak Ridge, TN

    2010-07-20

    A tunneling element includes a thin film layer of ferroelectric material and a pair of dissimilar electrically-conductive layers disposed on opposite sides of the ferroelectric layer. Because of the dissimilarity in composition or construction between the electrically-conductive layers, the electron transport behavior of the electrically-conductive layers is polarization dependent when the tunneling element is below the Curie temperature of the layer of ferroelectric material. The element can be used as a basis of compact 1R type non-volatile random access memory (RAM). The advantages include extremely simple architecture, ultimate scalability and fast access times generic for all ferroelectric memories.

  10. Work Coordination Engine

    NASA Technical Reports Server (NTRS)

    Zendejas, Silvino; Bui, Tung; Bui, Bach; Malhotra, Shantanu; Chen, Fannie; Kim, Rachel; Allen, Christopher; Luong, Ivy; Chang, George; Sadaqathulla, Syed

    2009-01-01

    The Work Coordination Engine (WCE) is a Java application integrated into the Service Management Database (SMDB), which coordinates the dispatching and monitoring of a work order system. WCE de-queues work orders from SMDB and orchestrates the dispatching of work to a registered set of software worker applications distributed over a set of local, or remote, heterogeneous computing systems. WCE monitors the execution of work orders once dispatched, and accepts the results of the work order by storing to the SMDB persistent store. The software leverages the use of a relational database, Java Messaging System (JMS), and Web Services using Simple Object Access Protocol (SOAP) technologies to implement an efficient work-order dispatching mechanism capable of coordinating the work of multiple computer servers on various platforms working concurrently on different, or similar, types of data or algorithmic processing. Existing (legacy) applications can be wrapped with a proxy object so that no changes to the application are needed to make them available for integration into the work order system as "workers." WCE automatically reschedules work orders that fail to be executed by one server to a different server if available. From initiation to completion, the system manages the execution state of work orders and workers via a well-defined set of events, states, and actions. It allows for configurable work-order execution timeouts by work-order type. This innovation eliminates a current processing bottleneck by providing a highly scalable, distributed work-order system used to quickly generate products needed by the Deep Space Network (DSN) to support space flight operations. WCE is driven by asynchronous messages delivered via JMS indicating the availability of new work or workers. It runs completely unattended in support of the lights-out operations concept in the DSN.

  11. Identification of Leishmania by Matrix-Assisted Laser Desorption Ionization-Time of Flight (MALDI-TOF) Mass Spectrometry Using a Free Web-Based Application and a Dedicated Mass-Spectral Library.

    PubMed

    Lachaud, Laurence; Fernández-Arévalo, Anna; Normand, Anne-Cécile; Lami, Patrick; Nabet, Cécile; Donnadieu, Jean Luc; Piarroux, Martine; Djenad, Farid; Cassagne, Carole; Ravel, Christophe; Tebar, Silvia; Llovet, Teresa; Blanchet, Denis; Demar, Magalie; Harrat, Zoubir; Aoun, Karim; Bastien, Patrick; Muñoz, Carmen; Gállego, Montserrat; Piarroux, Renaud

    2017-10-01

    Human leishmaniases are widespread diseases with different clinical forms caused by about 20 species within the Leishmania genus. Leishmania species identification is relevant for therapeutic management and prognosis, especially for cutaneous and mucocutaneous forms. Several methods are available to identify Leishmania species from culture, but they have not been standardized for the majority of the currently described species, with the exception of multilocus enzyme electrophoresis. Moreover, these techniques are expensive, time-consuming, and not available in all laboratories. Within the last decade, mass spectrometry (MS) has been adapted for the identification of microorganisms, including Leishmania However, no commercial reference mass-spectral database is available. In this study, a reference mass-spectral library (MSL) for Leishmania isolates, accessible through a free Web-based application (mass-spectral identification [MSI]), was constructed and tested. It includes mass-spectral data for 33 different Leishmania species, including species that infect humans, animals, and phlebotomine vectors. Four laboratories on two continents evaluated the performance of MSI using 268 samples, 231 of which were Leishmania strains. All Leishmania strains, but one, were correctly identified at least to the complex level. A risk of species misidentification within the Leishmania donovani , L. guyanensis , and L. braziliensis complexes was observed, as previously reported for other techniques. The tested application was reliable, with identification results being comparable to those obtained with reference methods but with a more favorable cost-efficiency ratio. This free online identification system relies on a scalable database and can be implemented directly in users' computers. Copyright © 2017 American Society for Microbiology.

  12. Glocal clinical registries: pacemaker registry design and implementation for global and local integration--methodology and case study.

    PubMed

    da Silva, Kátia Regina; Costa, Roberto; Crevelari, Elizabeth Sartori; Lacerda, Marianna Sobral; de Moraes Albertini, Caio Marcos; Filho, Martino Martinelli; Santana, José Eduardo; Vissoci, João Ricardo Nickenig; Pietrobon, Ricardo; Barros, Jacson V

    2013-01-01

    The ability to apply standard and interoperable solutions for implementing and managing medical registries as well as aggregate, reproduce, and access data sets from legacy formats and platforms to advanced standard formats and operating systems are crucial for both clinical healthcare and biomedical research settings. Our study describes a reproducible, highly scalable, standard framework for a device registry implementation addressing both local data quality components and global linking problems. We developed a device registry framework involving the following steps: (1) Data standards definition and representation of the research workflow, (2) Development of electronic case report forms using REDCap (Research Electronic Data Capture), (3) Data collection according to the clinical research workflow and, (4) Data augmentation by enriching the registry database with local electronic health records, governmental database and linked open data collections, (5) Data quality control and (6) Data dissemination through the registry Web site. Our registry adopted all applicable standardized data elements proposed by American College Cardiology / American Heart Association Clinical Data Standards, as well as variables derived from cardiac devices randomized trials and Clinical Data Interchange Standards Consortium. Local interoperability was performed between REDCap and data derived from Electronic Health Record system. The original data set was also augmented by incorporating the reimbursed values paid by the Brazilian government during a hospitalization for pacemaker implantation. By linking our registry to the open data collection repository Linked Clinical Trials (LinkedCT) we found 130 clinical trials which are potentially correlated with our pacemaker registry. This study demonstrates how standard and reproducible solutions can be applied in the implementation of medical registries to constitute a re-usable framework. Such approach has the potential to facilitate data integration between healthcare and research settings, also being a useful framework to be used in other biomedical registries.

  13. Canada in 3D - Toward a Sustainable 3D Model for Canadian Geology from Diverse Data Sources

    NASA Astrophysics Data System (ADS)

    Brodaric, B.; Pilkington, M.; Snyder, D. B.; St-Onge, M. R.; Russell, H.

    2015-12-01

    Many big science issues span large areas and require data from multiple heterogeneous sources, for example climate change, resource management, and hazard mitigation. Solutions to these issues can significantly benefit from access to a consistent and integrated geological model that would serve as a framework. However, such a model is absent for most large countries including Canada, due to the size of the landmass and the fragmentation of the source data into institutional and disciplinary silos. To overcome these barriers, the "Canada in 3D" (C3D) pilot project was recently launched by the Geological Survey of Canada. C3D is designed to be evergreen, multi-resolution, and inter-disciplinary: (a) it is to be updated regularly upon acquisition of new data; (b) portions vary in resolution and will initially consist of four layers (surficial, sedimentary, crystalline, and mantle) with intermediary patches of higher-resolution fill; and (c) a variety of independently managed data sources are providing inputs, such as geophysical, 3D and 2D geological models, drill logs, and others. Notably, scalability concerns dictate a decentralized and interoperable approach, such that only key control objects, denoting anchors for the modeling process, are imported into the C3D database while retaining provenance links to original sources. The resultant model is managed in the database, contains full modeling provenance as well as links to detailed information on rock units, and is to be visualized in desktop and online environments. It is anticipated that C3D will become the authoritative state of knowledge for the geology of Canada at a national scale.

  14. Surviving the Glut: The Management of Event Streams in Cyberphysical Systems

    NASA Astrophysics Data System (ADS)

    Buchmann, Alejandro

    Alejandro Buchmann is Professor in the Department of Computer Science, Technische Universität Darmstadt, where he heads the Databases and Distributed Systems Group. He received his MS (1977) and PhD (1980) from the University of Texas at Austin. He was an Assistant/Associate Professor at the Institute for Applied Mathematics and Systems IIMAS/UNAM in Mexico, doing research on databases for CAD, geographic information systems, and objectoriented databases. At Computer Corporation of America (later Xerox Advanced Information Systems) in Cambridge, Mass., he worked in the areas of active databases and real-time databases, and at GTE Laboratories, Waltham, in the areas of distributed object systems and the integration of heterogeneous legacy systems. 1991 he returned to academia and joined T.U. Darmstadt. His current research interests are at the intersection of middleware, databases, eventbased distributed systems, ubiquitous computing, and very large distributed systems (P2P, WSN). Much of the current research is concerned with guaranteeing quality of service and reliability properties in these systems, for example, scalability, performance, transactional behaviour, consistency, and end-to-end security. Many research projects imply collaboration with industry and cover a broad spectrum of application domains. Further information can be found at http://www.dvs.tu-darmstadt.de

  15. The Research Tools of the Virtual Astronomical Observatory

    NASA Astrophysics Data System (ADS)

    Hanisch, Robert J.; Berriman, G. B.; Lazio, T. J.; Project, VAO

    2013-01-01

    Astronomy is being transformed by the vast quantities of data, models, and simulations that are becoming available to astronomers at an ever-accelerating rate. The U.S. Virtual Astronomical Observatory (VAO) has been funded to provide an operational facility that is intended to be a resource for discovery and access of data, and to provide science services that use these data. Over the course of the past year, the VAO has been developing and releasing for community use five science tools: 1) "Iris", for dynamically building and analyzing spectral energy distributions, 2) a web-based data discovery tool that allows astronomers to identify and retrieve catalog, image, and spectral data on sources of interest, 3) a scalable cross-comparison service that allows astronomers to conduct pair-wise positional matches between very large catalogs stored remotely as well as between remote and local catalogs, 4) time series tools that allow astronomers to compute periodograms of the public data held at the NASA Star and Exoplanet Database (NStED) and the Harvard Time Series Center, and 5) A VO-aware release of the Image Reduction and Analysis Facility (IRAF) that provides transparent access to VO-available data collections and is SAMP-enabled, so that IRAF users can easily use tools such as Aladin and Topcat in conjuction with IRAF tasks. Additional VAO services will be built to make it easy for researchers to provide access to their data in VO-compliant ways, to build VO-enabled custom applications in Python, and to respond generally to the growing size and complexity of astronomy data. Acknowledgements: The Virtual Astronomical Observatory (VAO) is managed by the VAO, LLC, a non-profit company established as a partnership of the Associated Universities, Inc. and the Association of Universities for Research in Astronomy, Inc. The VAO is sponsored by the National Science Foundation and the National Aeronautics and Space Administration.

  16. Open-access databases as unprecedented resources and drivers of cultural change in fisheries science

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McManamay, Ryan A; Utz, Ryan

    2014-01-01

    Open-access databases with utility in fisheries science have grown exponentially in quantity and scope over the past decade, with profound impacts to our discipline. The management, distillation, and sharing of an exponentially growing stream of open-access data represents several fundamental challenges in fisheries science. Many of the currently available open-access resources may not be universally known among fisheries scientists. We therefore introduce many national- and global-scale open-access databases with applications in fisheries science and provide an example of how they can be harnessed to perform valuable analyses without additional field efforts. We also discuss how the development, maintenance, and utilizationmore » of open-access data are likely to pose technical, financial, and educational challenges to fisheries scientists. Such cultural implications that will coincide with the rapidly increasing availability of free data should compel the American Fisheries Society to actively address these problems now to help ease the forthcoming cultural transition.« less

  17. SORTEZ: a relational translator for NCBI's ASN.1 database.

    PubMed

    Hart, K W; Searls, D B; Overton, G C

    1994-07-01

    The National Center for Biotechnology Information (NCBI) has created a database collection that includes several protein and nucleic acid sequence databases, a biosequence-specific subset of MEDLINE, as well as value-added information such as links between similar sequences. Information in the NCBI database is modeled in Abstract Syntax Notation 1 (ASN.1) an Open Systems Interconnection protocol designed for the purpose of exchanging structured data between software applications rather than as a data model for database systems. While the NCBI database is distributed with an easy-to-use information retrieval system, ENTREZ, the ASN.1 data model currently lacks an ad hoc query language for general-purpose data access. For that reason, we have developed a software package, SORTEZ, that transforms the ASN.1 database (or other databases with nested data structures) to a relational data model and subsequently to a relational database management system (Sybase) where information can be accessed through the relational query language, SQL. Because the need to transform data from one data model and schema to another arises naturally in several important contexts, including efficient execution of specific applications, access to multiple databases and adaptation to database evolution this work also serves as a practical study of the issues involved in the various stages of database transformation. We show that transformation from the ASN.1 data model to a relational data model can be largely automated, but that schema transformation and data conversion require considerable domain expertise and would greatly benefit from additional support tools.

  18. [Status of libraries and databases for natural products at abroad].

    PubMed

    Zhao, Li-Mei; Tan, Ning-Hua

    2015-01-01

    For natural products are one of the important sources for drug discovery, libraries and databases of natural products are significant for the development and research of natural products. At present, most of compound libraries at abroad are synthetic or combinatorial synthetic molecules, resulting to access natural products difficult; for information of natural products are scattered with different standards, it is difficult to construct convenient, comprehensive and large-scale databases for natural products. This paper reviewed the status of current accessing libraries and databases for natural products at abroad and provided some important information for the development of libraries and database for natural products.

  19. FRED, a Front End for Databases.

    ERIC Educational Resources Information Center

    Crystal, Maurice I.; Jakobson, Gabriel E.

    1982-01-01

    FRED (a Front End for Databases) was conceived to alleviate data access difficulties posed by the heterogeneous nature of online databases. A hardware/software layer interposed between users and databases, it consists of three subsystems: user-interface, database-interface, and knowledge base. Architectural alternatives for this database machine…

  20. Entomopathogen ID: a curated sequence resource for entomopathogenic fungi

    USDA-ARS?s Scientific Manuscript database

    We report the development of a publicly accessible, curated database of Hypocrealean entomopathogenic fungi sequence data. The goal is to provide a platform for users to easily access sequence data from reference strains. The database can be used to accurately identify unknown entomopathogenic fungi...

  1. Universal Index System

    NASA Technical Reports Server (NTRS)

    Kelley, Steve; Roussopoulos, Nick; Sellis, Timos; Wallace, Sarah

    1993-01-01

    The Universal Index System (UIS) is an index management system that uses a uniform interface to solve the heterogeneity problem among database management systems. UIS provides an easy-to-use common interface to access all underlying data, but also allows different underlying database management systems, storage representations, and access methods.

  2. Village Green Project: Web-accessible Database

    EPA Science Inventory

    The purpose of this web-accessible database is for the public to be able to view instantaneous readings from a solar-powered air monitoring station located in a public location (prototype pilot test is outside of a library in Durham County, NC). The data are wirelessly transmitte...

  3. 47 CFR 64.623 - Administrator requirements.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... administrator of the TRS User Registration Database, the administrator of the VRS Access Technology Reference... parties with a vested interest in the outcome of TRS-related numbering administration and activities. (4) None of the administrator of the TRS User Registration Database, the administrator of the VRS Access...

  4. 47 CFR 64.623 - Administrator requirements.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... administrator of the TRS User Registration Database, the administrator of the VRS Access Technology Reference... parties with a vested interest in the outcome of TRS-related numbering administration and activities. (4) None of the administrator of the TRS User Registration Database, the administrator of the VRS Access...

  5. Designing a multi-petabyte database for LSST

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Becla, J; Hanushevsky, A

    2005-12-21

    The 3.2 giga-pixel LSST camera will produce over half a petabyte of raw images every month. This data needs to be reduced in under a minute to produce real-time transient alerts, and then cataloged and indexed to allow efficient access and simplify further analysis. The indexed catalogs alone are expected to grow at a speed of about 600 terabytes per year. The sheer volume of data, the real-time transient alerting requirements of the LSST, and its spatio-temporal aspects require cutting-edge techniques to build an efficient data access system at reasonable cost. As currently envisioned, the system will rely on amore » database for catalogs and metadata. Several database systems are being evaluated to understand how they will scale and perform at these data volumes in anticipated LSST access patterns. This paper describes the LSST requirements, the challenges they impose, the data access philosophy, and the database architecture that is expected to be adopted in order to meet the data challenges.« less

  6. Low Cost, Scalable Proteomics Data Analysis Using Amazon's Cloud Computing Services and Open Source Search Algorithms

    PubMed Central

    Halligan, Brian D.; Geiger, Joey F.; Vallejos, Andrew K.; Greene, Andrew S.; Twigger, Simon N.

    2009-01-01

    One of the major difficulties for many laboratories setting up proteomics programs has been obtaining and maintaining the computational infrastructure required for the analysis of the large flow of proteomics data. We describe a system that combines distributed cloud computing and open source software to allow laboratories to set up scalable virtual proteomics analysis clusters without the investment in computational hardware or software licensing fees. Additionally, the pricing structure of distributed computing providers, such as Amazon Web Services, allows laboratories or even individuals to have large-scale computational resources at their disposal at a very low cost per run. We provide detailed step by step instructions on how to implement the virtual proteomics analysis clusters as well as a list of current available preconfigured Amazon machine images containing the OMSSA and X!Tandem search algorithms and sequence databases on the Medical College of Wisconsin Proteomics Center website (http://proteomics.mcw.edu/vipdac). PMID:19358578

  7. The BioIntelligence Framework: a new computational platform for biomedical knowledge computing

    PubMed Central

    Farley, Toni; Kiefer, Jeff; Lee, Preston; Von Hoff, Daniel; Trent, Jeffrey M; Colbourn, Charles

    2013-01-01

    Breakthroughs in molecular profiling technologies are enabling a new data-intensive approach to biomedical research, with the potential to revolutionize how we study, manage, and treat complex diseases. The next great challenge for clinical applications of these innovations will be to create scalable computational solutions for intelligently linking complex biomedical patient data to clinically actionable knowledge. Traditional database management systems (DBMS) are not well suited to representing complex syntactic and semantic relationships in unstructured biomedical information, introducing barriers to realizing such solutions. We propose a scalable computational framework for addressing this need, which leverages a hypergraph-based data model and query language that may be better suited for representing complex multi-lateral, multi-scalar, and multi-dimensional relationships. We also discuss how this framework can be used to create rapid learning knowledge base systems to intelligently capture and relate complex patient data to biomedical knowledge in order to automate the recovery of clinically actionable information. PMID:22859646

  8. Low cost, scalable proteomics data analysis using Amazon's cloud computing services and open source search algorithms.

    PubMed

    Halligan, Brian D; Geiger, Joey F; Vallejos, Andrew K; Greene, Andrew S; Twigger, Simon N

    2009-06-01

    One of the major difficulties for many laboratories setting up proteomics programs has been obtaining and maintaining the computational infrastructure required for the analysis of the large flow of proteomics data. We describe a system that combines distributed cloud computing and open source software to allow laboratories to set up scalable virtual proteomics analysis clusters without the investment in computational hardware or software licensing fees. Additionally, the pricing structure of distributed computing providers, such as Amazon Web Services, allows laboratories or even individuals to have large-scale computational resources at their disposal at a very low cost per run. We provide detailed step-by-step instructions on how to implement the virtual proteomics analysis clusters as well as a list of current available preconfigured Amazon machine images containing the OMSSA and X!Tandem search algorithms and sequence databases on the Medical College of Wisconsin Proteomics Center Web site ( http://proteomics.mcw.edu/vipdac ).

  9. Assesment of access to bibliographic databases and telemetry databases in Astronomy: A groundswell for development.

    NASA Astrophysics Data System (ADS)

    Diaz-Merced, Wanda Liz; Casado, Johanna; Garcia, Beatriz; Aarnio, Alicia; Knierman, Karen; Monkiewicz, Jacqueline; Alicia Aarnio.

    2018-01-01

    Big Data" is a subject that has taken special relevance today, particularly in Astrophysics, where continuous advances in technology are leading to ever larger data sets. A multimodal approach in perception of astronomical data data (achieved through sonification used for the processing of data) increases the detection of signals in very low signal-to-noise ratio limits and is of special importance to achieve greater inclusion in the field of Astronomy. In the last ten years, different software tools have been developed that perform the sonification of astronomical data from tables or databases, among them the best known and in multiplatform development are Sonification Sandbox, MathTrack, and xSonify.In order to determine the accessibility of software we propose to start carrying out a conformity analysis of ISO (International Standard Organization) 9241-171171: 2008. This standard establishes the general guidelines that must be taken into account for accessibility in software design, and it is applied to software used in work, public places, and at home. To analyze the accessibility of web databases, we take into account the "Web Content Content Accessibility Guidelines (WCAG) 2.0", accepted and published by ISO in the ISO / IEC 40500: 2012 standard.In this poster, we present a User Centered Design (UCD), Human Computer Interaction (HCI), and User Experience (UX) framework to address a non-segregational provision of access to bibliographic databases and telemetry databases in Astronomy. Our framework is based on an ISO evaluation on a selection of data bases such as ADS, Simbad and SDSS. The WCAG 2.0 and ISO 9241-171171: 2008 should not be taken as absolute accessibility standards: these guidelines are very general, are not absolute, and do not address particularities. They are not to be taken as a substitute for UCD, HCI, UX design and evaluation. Based on our results, this research presents the framework for a focus group and qualitative data analysis aimed to lay the foundations for the employment of UCD functionalities on astronomical databases.

  10. Recent improvements in the NASA technical report server

    NASA Technical Reports Server (NTRS)

    Maa, Ming-Hokng; Nelson, Michael L.

    1995-01-01

    The NASA Technical Report Server (NTRS), a World Wide Web (WWW) report distribution service, has been modified to allow parallel database queries, significantly decreasing user access time by an average factor of 2.3, access from clients behind firewalls and/or proxies which truncate excessively long Uniform Resource Locators (URL's), access to non-Wide Area Information Server (WAIS) databases, and compatibility with the Z39-50.3 protocol.

  11. A case study of the Secure Anonymous Information Linkage (SAIL) Gateway: a privacy-protecting remote access system for health-related research and evaluation.

    PubMed

    Jones, Kerina H; Ford, David V; Jones, Chris; Dsilva, Rohan; Thompson, Simon; Brooks, Caroline J; Heaven, Martin L; Thayer, Daniel S; McNerney, Cynthia L; Lyons, Ronan A

    2014-08-01

    With the current expansion of data linkage research, the challenge is to find the balance between preserving the privacy of person-level data whilst making these data accessible for use to their full potential. We describe a privacy-protecting safe haven and secure remote access system, referred to as the Secure Anonymised Information Linkage (SAIL) Gateway. The Gateway provides data users with a familiar Windows interface and their usual toolsets to access approved anonymously-linked datasets for research and evaluation. We outline the principles and operating model of the Gateway, the features provided to users within the secure environment, and how we are approaching the challenges of making data safely accessible to increasing numbers of research users. The Gateway represents a powerful analytical environment and has been designed to be scalable and adaptable to meet the needs of the rapidly growing data linkage community. Copyright © 2014 The Aurthors. Published by Elsevier Inc. All rights reserved.

  12. A case study of the Secure Anonymous Information Linkage (SAIL) Gateway: A privacy-protecting remote access system for health-related research and evaluation☆

    PubMed Central

    Jones, Kerina H.; Ford, David V.; Jones, Chris; Dsilva, Rohan; Thompson, Simon; Brooks, Caroline J.; Heaven, Martin L.; Thayer, Daniel S.; McNerney, Cynthia L.; Lyons, Ronan A.

    2014-01-01

    With the current expansion of data linkage research, the challenge is to find the balance between preserving the privacy of person-level data whilst making these data accessible for use to their full potential. We describe a privacy-protecting safe haven and secure remote access system, referred to as the Secure Anonymised Information Linkage (SAIL) Gateway. The Gateway provides data users with a familiar Windows interface and their usual toolsets to access approved anonymously-linked datasets for research and evaluation. We outline the principles and operating model of the Gateway, the features provided to users within the secure environment, and how we are approaching the challenges of making data safely accessible to increasing numbers of research users. The Gateway represents a powerful analytical environment and has been designed to be scalable and adaptable to meet the needs of the rapidly growing data linkage community. PMID:24440148

  13. Accessing the SEED genome databases via Web services API: tools for programmers.

    PubMed

    Disz, Terry; Akhter, Sajia; Cuevas, Daniel; Olson, Robert; Overbeek, Ross; Vonstein, Veronika; Stevens, Rick; Edwards, Robert A

    2010-06-14

    The SEED integrates many publicly available genome sequences into a single resource. The database contains accurate and up-to-date annotations based on the subsystems concept that leverages clustering between genomes and other clues to accurately and efficiently annotate microbial genomes. The backend is used as the foundation for many genome annotation tools, such as the Rapid Annotation using Subsystems Technology (RAST) server for whole genome annotation, the metagenomics RAST server for random community genome annotations, and the annotation clearinghouse for exchanging annotations from different resources. In addition to a web user interface, the SEED also provides Web services based API for programmatic access to the data in the SEED, allowing the development of third-party tools and mash-ups. The currently exposed Web services encompass over forty different methods for accessing data related to microbial genome annotations. The Web services provide comprehensive access to the database back end, allowing any programmer access to the most consistent and accurate genome annotations available. The Web services are deployed using a platform independent service-oriented approach that allows the user to choose the most suitable programming platform for their application. Example code demonstrate that Web services can be used to access the SEED using common bioinformatics programming languages such as Perl, Python, and Java. We present a novel approach to access the SEED database. Using Web services, a robust API for access to genomics data is provided, without requiring large volume downloads all at once. The API ensures timely access to the most current datasets available, including the new genomes as soon as they come online.

  14. Taiwan Biobank: making cross-database convergence possible in the Big Data era

    PubMed Central

    Lin, Jui-Chu; Fan, Chien-Te; Liao, Chia-Cheng; Chen, Yao-Sheng

    2018-01-01

    Abstract The Taiwan Biobank (TWB) is a biomedical research database of biopsy data from 200 000 participants. Access to this database has been granted to research communities taking part in the development of precision medicines; however, this has raised issues surrounding TWB’s access to electronic medical records (EMRs). The Personal Data Protection Act of Taiwan restricts access to EMRs for purposes not covered by patients’ original consent. This commentary explores possible legal solutions to help ensure that the access TWB has to EMR abides with legal obligations, and with governance frameworks associated with ethical, legal, and social implications. We suggest utilizing “hash function” algorithms to create nonretrospective, anonymized data for the purpose of cross-transmission and/or linkage with EMR. PMID:29149267

  15. Spindle

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    2013-04-04

    Spindle is software infrastructure that solves file system scalabiltiy problems associated with starting dynamically linked applications in HPC environments. When an HPC applications starts up thousands of pricesses at once, and those processes simultaneously access a shared file system to look for shared libraries, it can cause significant performance problems for both the application and other users. Spindle scalably coordinates the distribution of shared libraries to an application to avoid hammering the shared file system.

  16. Optimum Antenna Configuration for Maximizing Access Point Range of an IEEE 802.11 Wireless Mesh Network in Support of Multi-Mission Operations Relative to Hastily Formed Scalable Deployments

    DTIC Science & Technology

    2007-09-01

    Configuration Consideration ...........................54 C. MAE NGAT DAM, CHIANG MAI , THAILAND, FIELD EXPERIMENT...2006 802.11 Network Topology Mae Ngat Dam, Chiang Mai , Thailand.......................39 Figure 31. View of COASTS 2006 802.11 Topology...Requirements (Background From Google Earth).....62 Figure 44. Mae Ngat Dam, Chiang Mai , Thailand (From Google Earth

  17. Simulation of Physical and Media Access Control (MAC) for Resilient and Scalable Wireless Sensor Networks

    DTIC Science & Technology

    2006-03-01

    schedules uniform wakeup for all the nodes. When data is transmitted, only specific destination nodes need to remain awake. Other nodes not participating...of Carrier Sense ....................................................................30 3. Synchronous vs Asynchronous Wakeup Mechanisms...proposal is to have each pair of node schedule their “wake-up” or “beacons” independent of the IEEE 802.11 PSM beacon. In the ideal scenario, where

  18. For 481 biomedical open access journals, articles are not searchable in the Directory of Open Access Journals nor in conventional biomedical databases

    PubMed Central

    Andresen, Kristoffer; Pommergaard, Hans-Christian; Rosenberg, Jacob

    2015-01-01

    Background. Open access (OA) journals allows access to research papers free of charge to the reader. Traditionally, biomedical researchers use databases like MEDLINE and EMBASE to discover new advances. However, biomedical OA journals might not fulfill such databases’ criteria, hindering dissemination. The Directory of Open Access Journals (DOAJ) is a database exclusively listing OA journals. The aim of this study was to investigate DOAJ’s coverage of biomedical OA journals compared with the conventional biomedical databases. Methods. Information on all journals listed in four conventional biomedical databases (MEDLINE, PubMed Central, EMBASE and SCOPUS) and DOAJ were gathered. Journals were included if they were (1) actively publishing, (2) full OA, (3) prospectively indexed in one or more database, and (4) of biomedical subject. Impact factor and journal language were also collected. DOAJ was compared with conventional databases regarding the proportion of journals covered, along with their impact factor and publishing language. The proportion of journals with articles indexed by DOAJ was determined. Results. In total, 3,236 biomedical OA journals were included in the study. Of the included journals, 86.7% were listed in DOAJ. Combined, the conventional biomedical databases listed 75.0% of the journals; 18.7% in MEDLINE; 36.5% in PubMed Central; 51.5% in SCOPUS and 50.6% in EMBASE. Of the journals in DOAJ, 88.7% published in English and 20.6% had received impact factor for 2012 compared with 93.5% and 26.0%, respectively, for journals in the conventional biomedical databases. A subset of 51.1% and 48.5% of the journals in DOAJ had articles indexed from 2012 and 2013, respectively. Of journals exclusively listed in DOAJ, one journal had received an impact factor for 2012, and 59.6% of the journals had no content from 2013 indexed in DOAJ. Conclusions. DOAJ is the most complete registry of biomedical OA journals compared with five conventional biomedical databases. However, DOAJ only indexes articles for half of the biomedical journals listed, making it an incomplete source for biomedical research papers in general. PMID:26038727

  19. Java Web Simulation (JWS); a web based database of kinetic models.

    PubMed

    Snoep, J L; Olivier, B G

    2002-01-01

    Software to make a database of kinetic models accessible via the internet has been developed and a core database has been set up at http://jjj.biochem.sun.ac.za/. This repository of models, available to everyone with internet access, opens a whole new way in which we can make our models public. Via the database, a user can change enzyme parameters and run time simulations or steady state analyses. The interface is user friendly and no additional software is necessary. The database currently contains 10 models, but since the generation of the program code to include new models has largely been automated the addition of new models is straightforward and people are invited to submit their models to be included in the database.

  20. DataSpread: Unifying Databases and Spreadsheets.

    PubMed

    Bendre, Mangesh; Sun, Bofan; Zhang, Ding; Zhou, Xinyan; Chang, Kevin ChenChuan; Parameswaran, Aditya

    2015-08-01

    Spreadsheet software is often the tool of choice for ad-hoc tabular data management, processing, and visualization, especially on tiny data sets. On the other hand, relational database systems offer significant power, expressivity, and efficiency over spreadsheet software for data management, while lacking in the ease of use and ad-hoc analysis capabilities. We demonstrate DataSpread, a data exploration tool that holistically unifies databases and spreadsheets. It continues to offer a Microsoft Excel-based spreadsheet front-end, while in parallel managing all the data in a back-end database, specifically, PostgreSQL. DataSpread retains all the advantages of spreadsheets, including ease of use, ad-hoc analysis and visualization capabilities, and a schema-free nature, while also adding the advantages of traditional relational databases, such as scalability and the ability to use arbitrary SQL to import, filter, or join external or internal tables and have the results appear in the spreadsheet. DataSpread needs to reason about and reconcile differences in the notions of schema, addressing of cells and tuples, and the current "pane" (which exists in spreadsheets but not in traditional databases), and support data modifications at both the front-end and the back-end. Our demonstration will center on our first and early prototype of the DataSpread, and will give the attendees a sense for the enormous data exploration capabilities offered by unifying spreadsheets and databases.

  1. DataSpread: Unifying Databases and Spreadsheets

    PubMed Central

    Bendre, Mangesh; Sun, Bofan; Zhang, Ding; Zhou, Xinyan; Chang, Kevin ChenChuan; Parameswaran, Aditya

    2015-01-01

    Spreadsheet software is often the tool of choice for ad-hoc tabular data management, processing, and visualization, especially on tiny data sets. On the other hand, relational database systems offer significant power, expressivity, and efficiency over spreadsheet software for data management, while lacking in the ease of use and ad-hoc analysis capabilities. We demonstrate DataSpread, a data exploration tool that holistically unifies databases and spreadsheets. It continues to offer a Microsoft Excel-based spreadsheet front-end, while in parallel managing all the data in a back-end database, specifically, PostgreSQL. DataSpread retains all the advantages of spreadsheets, including ease of use, ad-hoc analysis and visualization capabilities, and a schema-free nature, while also adding the advantages of traditional relational databases, such as scalability and the ability to use arbitrary SQL to import, filter, or join external or internal tables and have the results appear in the spreadsheet. DataSpread needs to reason about and reconcile differences in the notions of schema, addressing of cells and tuples, and the current “pane” (which exists in spreadsheets but not in traditional databases), and support data modifications at both the front-end and the back-end. Our demonstration will center on our first and early prototype of the DataSpread, and will give the attendees a sense for the enormous data exploration capabilities offered by unifying spreadsheets and databases. PMID:26900487

  2. Mobile object retrieval in server-based image databases

    NASA Astrophysics Data System (ADS)

    Manger, D.; Pagel, F.; Widak, H.

    2013-05-01

    The increasing number of mobile phones equipped with powerful cameras leads to huge collections of user-generated images. To utilize the information of the images on site, image retrieval systems are becoming more and more popular to search for similar objects in an own image database. As the computational performance and the memory capacity of mobile devices are constantly increasing, this search can often be performed on the device itself. This is feasible, for example, if the images are represented with global image features or if the search is done using EXIF or textual metadata. However, for larger image databases, if multiple users are meant to contribute to a growing image database or if powerful content-based image retrieval methods with local features are required, a server-based image retrieval backend is needed. In this work, we present a content-based image retrieval system with a client server architecture working with local features. On the server side, the scalability to large image databases is addressed with the popular bag-of-word model with state-of-the-art extensions. The client end of the system focuses on a lightweight user interface presenting the most similar images of the database highlighting the visual information which is common with the query image. Additionally, new images can be added to the database making it a powerful and interactive tool for mobile contentbased image retrieval.

  3. Toward an open-access global database for mapping, control, and surveillance of neglected tropical diseases.

    PubMed

    Hürlimann, Eveline; Schur, Nadine; Boutsika, Konstantina; Stensgaard, Anna-Sofie; Laserna de Himpsl, Maiti; Ziegelbauer, Kathrin; Laizer, Nassor; Camenzind, Lukas; Di Pasquale, Aurelio; Ekpo, Uwem F; Simoonga, Christopher; Mushinge, Gabriel; Saarnak, Christopher F L; Utzinger, Jürg; Kristensen, Thomas K; Vounatsou, Penelope

    2011-12-01

    After many years of general neglect, interest has grown and efforts came under way for the mapping, control, surveillance, and eventual elimination of neglected tropical diseases (NTDs). Disease risk estimates are a key feature to target control interventions, and serve as a benchmark for monitoring and evaluation. What is currently missing is a georeferenced global database for NTDs providing open-access to the available survey data that is constantly updated and can be utilized by researchers and disease control managers to support other relevant stakeholders. We describe the steps taken toward the development of such a database that can be employed for spatial disease risk modeling and control of NTDs. With an emphasis on schistosomiasis in Africa, we systematically searched the literature (peer-reviewed journals and 'grey literature'), contacted Ministries of Health and research institutions in schistosomiasis-endemic countries for location-specific prevalence data and survey details (e.g., study population, year of survey and diagnostic techniques). The data were extracted, georeferenced, and stored in a MySQL database with a web interface allowing free database access and data management. At the beginning of 2011, our database contained more than 12,000 georeferenced schistosomiasis survey locations from 35 African countries available under http://www.gntd.org. Currently, the database is expanded to a global repository, including a host of other NTDs, e.g. soil-transmitted helminthiasis and leishmaniasis. An open-access, spatially explicit NTD database offers unique opportunities for disease risk modeling, targeting control interventions, disease monitoring, and surveillance. Moreover, it allows for detailed geostatistical analyses of disease distribution in space and time. With an initial focus on schistosomiasis in Africa, we demonstrate the proof-of-concept that the establishment and running of a global NTD database is feasible and should be expanded without delay.

  4. Evaluation of the Huawei UDS cloud storage system for CERN specific data

    NASA Astrophysics Data System (ADS)

    Zotes Resines, M.; Heikkila, S. S.; Duellmann, D.; Adde, G.; Toebbicke, R.; Hughes, J.; Wang, L.

    2014-06-01

    Cloud storage is an emerging architecture aiming to provide increased scalability and access performance, compared to more traditional solutions. CERN is evaluating this promise using Huawei UDS and OpenStack SWIFT storage deployments, focusing on the needs of high-energy physics. Both deployed setups implement S3, one of the protocols that are emerging as a standard in the cloud storage market. A set of client machines is used to generate I/O load patterns to evaluate the storage system performance. The presented read and write test results indicate scalability both in metadata and data perspectives. Futher the Huawei UDS cloud storage is shown to be able to recover from a major failure of losing 16 disks. Both cloud storages are finally demonstrated to function as back-end storage systems to a filesystem, which is used to deliver high energy physics software.

  5. Deterministic binary vectors for efficient automated indexing of MEDLINE/PubMed abstracts.

    PubMed

    Wahle, Manuel; Widdows, Dominic; Herskovic, Jorge R; Bernstam, Elmer V; Cohen, Trevor

    2012-01-01

    The need to maintain accessibility of the biomedical literature has led to development of methods to assist human indexers by recommending index terms for newly encountered articles. Given the rapid expansion of this literature, it is essential that these methods be scalable. Document vector representations are commonly used for automated indexing, and Random Indexing (RI) provides the means to generate them efficiently. However, RI is difficult to implement in real-world indexing systems, as (1) efficient nearest-neighbor search requires retaining all document vectors in RAM, and (2) it is necessary to maintain a store of randomly generated term vectors to index future documents. Motivated by these concerns, this paper documents the development and evaluation of a deterministic binary variant of RI. The increased capacity demonstrated by binary vectors has implications for information retrieval, and the elimination of the need to retain term vectors facilitates distributed implementations, enhancing the scalability of RI.

  6. Deterministic Binary Vectors for Efficient Automated Indexing of MEDLINE/PubMed Abstracts

    PubMed Central

    Wahle, Manuel; Widdows, Dominic; Herskovic, Jorge R.; Bernstam, Elmer V.; Cohen, Trevor

    2012-01-01

    The need to maintain accessibility of the biomedical literature has led to development of methods to assist human indexers by recommending index terms for newly encountered articles. Given the rapid expansion of this literature, it is essential that these methods be scalable. Document vector representations are commonly used for automated indexing, and Random Indexing (RI) provides the means to generate them efficiently. However, RI is difficult to implement in real-world indexing systems, as (1) efficient nearest-neighbor search requires retaining all document vectors in RAM, and (2) it is necessary to maintain a store of randomly generated term vectors to index future documents. Motivated by these concerns, this paper documents the development and evaluation of a deterministic binary variant of RI. The increased capacity demonstrated by binary vectors has implications for information retrieval, and the elimination of the need to retain term vectors facilitates distributed implementations, enhancing the scalability of RI. PMID:23304369

  7. Strategic redox relay enables a scalable synthesis of ouabagenin, a bioactive cardenolide.

    PubMed

    Renata, Hans; Zhou, Qianghui; Baran, Phil S

    2013-01-04

    Here, we report on a scalable route to the polyhydroxylated steroid ouabagenin with an unusual take on the age-old practice of steroid semisynthesis. The incorporation of both redox and stereochemical relays during the design of this synthesis resulted in efficient access to more than 500 milligrams of a key precursor toward ouabagenin-and ultimately ouabagenin itself-and the discovery of innovative methods for carbon-hydrogen (C-H) and carbon-carbon activation and carbon-oxygen bond homolysis. Given the medicinal relevance of the cardenolides in the treatment of congestive heart failure, a variety of ouabagenin analogs could potentially be generated from the key intermediate as a means of addressing the narrow therapeutic index of these molecules. This synthesis also showcases an approach to bypass the historically challenging problem of selective C-H oxidation of saturated carbon centers in a controlled fashion.

  8. Recent advancements towards green optical networks

    NASA Astrophysics Data System (ADS)

    Davidson, Alan; Glesk, Ivan; Buis, Adrianus; Wang, Junjia; Chen, Lawrence

    2014-12-01

    Recent years have seen a rapid growth in demand for ultra high speed data transmission with end users expecting fast, high bandwidth network access. With this rapid growth in demand, data centres are under pressure to provide ever increasing data rates through their networks and at the same time improve the quality of data handling in terms of reduced latency, increased scalability and improved channel speed for users. However as data rates increase, present technology based on well-established CMOS technology is becoming increasingly difficult to scale and consequently data networks are struggling to satisfy current network demand. In this paper the interrelated issues of electronic scalability, power consumption, limited copper interconnect bandwidth and the limited speed of CMOS electronics will be explored alongside the tremendous bandwidth potential of optical fibre based photonic networks. Some applications of photonics to help alleviate the speed and latency in data networks will be discussed.

  9. xQTL workbench: a scalable web environment for multi-level QTL analysis.

    PubMed

    Arends, Danny; van der Velde, K Joeri; Prins, Pjotr; Broman, Karl W; Möller, Steffen; Jansen, Ritsert C; Swertz, Morris A

    2012-04-01

    xQTL workbench is a scalable web platform for the mapping of quantitative trait loci (QTLs) at multiple levels: for example gene expression (eQTL), protein abundance (pQTL), metabolite abundance (mQTL) and phenotype (phQTL) data. Popular QTL mapping methods for model organism and human populations are accessible via the web user interface. Large calculations scale easily on to multi-core computers, clusters and Cloud. All data involved can be uploaded and queried online: markers, genotypes, microarrays, NGS, LC-MS, GC-MS, NMR, etc. When new data types come available, xQTL workbench is quickly customized using the Molgenis software generator. xQTL workbench runs on all common platforms, including Linux, Mac OS X and Windows. An online demo system, installation guide, tutorials, software and source code are available under the LGPL3 license from http://www.xqtl.org. m.a.swertz@rug.nl.

  10. xQTL workbench: a scalable web environment for multi-level QTL analysis

    PubMed Central

    Arends, Danny; van der Velde, K. Joeri; Prins, Pjotr; Broman, Karl W.; Möller, Steffen; Jansen, Ritsert C.; Swertz, Morris A.

    2012-01-01

    Summary: xQTL workbench is a scalable web platform for the mapping of quantitative trait loci (QTLs) at multiple levels: for example gene expression (eQTL), protein abundance (pQTL), metabolite abundance (mQTL) and phenotype (phQTL) data. Popular QTL mapping methods for model organism and human populations are accessible via the web user interface. Large calculations scale easily on to multi-core computers, clusters and Cloud. All data involved can be uploaded and queried online: markers, genotypes, microarrays, NGS, LC-MS, GC-MS, NMR, etc. When new data types come available, xQTL workbench is quickly customized using the Molgenis software generator. Availability: xQTL workbench runs on all common platforms, including Linux, Mac OS X and Windows. An online demo system, installation guide, tutorials, software and source code are available under the LGPL3 license from http://www.xqtl.org. Contact: m.a.swertz@rug.nl PMID:22308096

  11. Development of a scalable mental healthcare plan for a rural district in Ethiopia

    PubMed Central

    Fekadu, Abebaw; Hanlon, Charlotte; Medhin, Girmay; Alem, Atalay; Selamu, Medhin; Giorgis, Tedla W.; Shibre, Teshome; Teferra, Solomon; Tegegn, Teketel; Breuer, Erica; Patel, Vikram; Tomlinson, Mark; Thornicroft, Graham; Prince, Martin; Lund, Crick

    2016-01-01

    Background Developing evidence for the implementation and scaling up of mental healthcare in low- and middle-income countries (LMIC) like Ethiopia is an urgent priority. Aims To outline a mental healthcare plan (MHCP), as a scalable template for the implementation of mental healthcare in rural Ethiopia. Method A mixed methods approach was used to develop the MHCP for the three levels of the district health system (community, health facility and healthcare organisation). Results The community packages were community case detection, community reintegration and community inclusion. The facility packages included capacity building, decision support and staff well-being. Organisational packages were programme management, supervision and sustainability. Conclusions The MHCP focused on improving demand and access at the community level, inclusive care at the facility level and sustainability at the organisation level. The MHCP represented an essential framework for the provision of integrated care and may be a useful template for similar LMIC. PMID:26447174

  12. High-frequency self-aligned graphene transistors with transferred gate stacks.

    PubMed

    Cheng, Rui; Bai, Jingwei; Liao, Lei; Zhou, Hailong; Chen, Yu; Liu, Lixin; Lin, Yung-Chen; Jiang, Shan; Huang, Yu; Duan, Xiangfeng

    2012-07-17

    Graphene has attracted enormous attention for radio-frequency transistor applications because of its exceptional high carrier mobility, high carrier saturation velocity, and large critical current density. Herein we report a new approach for the scalable fabrication of high-performance graphene transistors with transferred gate stacks. Specifically, arrays of gate stacks are first patterned on a sacrificial substrate, and then transferred onto arbitrary substrates with graphene on top. A self-aligned process, enabled by the unique structure of the transferred gate stacks, is then used to position precisely the source and drain electrodes with minimized access resistance or parasitic capacitance. This process has therefore enabled scalable fabrication of self-aligned graphene transistors with unprecedented performance including a record-high cutoff frequency up to 427 GHz. Our study defines a unique pathway to large-scale fabrication of high-performance graphene transistors, and holds significant potential for future application of graphene-based devices in ultra-high-frequency circuits.

  13. HEDS - EPA DATABASE SYSTEM FOR PUBLIC ACCESS TO HUMAN EXPOSURE DATA

    EPA Science Inventory

    Human Exposure Database System (HEDS) is an Internet-based system developed to provide public access to human-exposure-related data from studies conducted by EPA's National Exposure Research Laboratory (NERL). HEDS was designed to work with the EPA Office of Research and Devel...

  14. Mining Quality Phrases from Massive Text Corpora

    PubMed Central

    Liu, Jialu; Shang, Jingbo; Wang, Chi; Ren, Xiang; Han, Jiawei

    2015-01-01

    Text data are ubiquitous and play an essential role in big data applications. However, text data are mostly unstructured. Transforming unstructured text into structured units (e.g., semantically meaningful phrases) will substantially reduce semantic ambiguity and enhance the power and efficiency at manipulating such data using database technology. Thus mining quality phrases is a critical research problem in the field of databases. In this paper, we propose a new framework that extracts quality phrases from text corpora integrated with phrasal segmentation. The framework requires only limited training but the quality of phrases so generated is close to human judgment. Moreover, the method is scalable: both computation time and required space grow linearly as corpus size increases. Our experiments on large text corpora demonstrate the quality and efficiency of the new method. PMID:26705375

  15. Vanderbilt University Institute of Imaging Science Center for Computational Imaging XNAT: A multimodal data archive and processing environment.

    PubMed

    Harrigan, Robert L; Yvernault, Benjamin C; Boyd, Brian D; Damon, Stephen M; Gibney, Kyla David; Conrad, Benjamin N; Phillips, Nicholas S; Rogers, Baxter P; Gao, Yurui; Landman, Bennett A

    2016-01-01

    The Vanderbilt University Institute for Imaging Science (VUIIS) Center for Computational Imaging (CCI) has developed a database built on XNAT housing over a quarter of a million scans. The database provides framework for (1) rapid prototyping, (2) large scale batch processing of images and (3) scalable project management. The system uses the web-based interfaces of XNAT and REDCap to allow for graphical interaction. A python middleware layer, the Distributed Automation for XNAT (DAX) package, distributes computation across the Vanderbilt Advanced Computing Center for Research and Education high performance computing center. All software are made available in open source for use in combining portable batch scripting (PBS) grids and XNAT servers. Copyright © 2015 Elsevier Inc. All rights reserved.

  16. Albany/FELIX: A parallel, scalable and robust, finite element, first-order Stokes approximation ice sheet solver built for advanced analysis

    DOE PAGES

    Tezaur, I. K.; Perego, M.; Salinger, A. G.; ...

    2015-04-27

    This paper describes a new parallel, scalable and robust finite element based solver for the first-order Stokes momentum balance equations for ice flow. The solver, known as Albany/FELIX, is constructed using the component-based approach to building application codes, in which mature, modular libraries developed as a part of the Trilinos project are combined using abstract interfaces and template-based generic programming, resulting in a final code with access to dozens of algorithmic and advanced analysis capabilities. Following an overview of the relevant partial differential equations and boundary conditions, the numerical methods chosen to discretize the ice flow equations are described, alongmore » with their implementation. The results of several verification studies of the model accuracy are presented using (1) new test cases for simplified two-dimensional (2-D) versions of the governing equations derived using the method of manufactured solutions, and (2) canonical ice sheet modeling benchmarks. Model accuracy and convergence with respect to mesh resolution are then studied on problems involving a realistic Greenland ice sheet geometry discretized using hexahedral and tetrahedral meshes. Also explored as a part of this study is the effect of vertical mesh resolution on the solution accuracy and solver performance. The robustness and scalability of our solver on these problems is demonstrated. Lastly, we show that good scalability can be achieved by preconditioning the iterative linear solver using a new algebraic multilevel preconditioner, constructed based on the idea of semi-coarsening.« less

  17. [Open access to academic scholarship as a public policy resource: a study of the Capes database on Brazilian theses and dissertations].

    PubMed

    da Silva Rosa, Teresa; Carneiro, Maria José

    2010-12-01

    Access to scientific knowledge is a valuable resource than can inform and validate positions taken in formulating public policy. But access to this knowledge can be challenging, given the diversity and breadth of available scholarship. Communication between the fields of science and of politics requires the dissemination of scholarship and access to it. We conducted a study using an open-access search tool in order to map existent knowledge on a specific topic: agricultural contributions to the preservation of biodiversity. The present article offers a critical view of access to the information available through the Capes database on Brazilian theses and dissertations.

  18. Platform for efficient switching between multiple devices in the intensive care unit.

    PubMed

    De Backere, F; Vanhove, T; Dejonghe, E; Feys, M; Herinckx, T; Vankelecom, J; Decruyenaere, J; De Turck, F

    2015-01-01

    This article is part of the Focus Theme of METHODS of Information in Medicine on "Managing Interoperability and Complexity in Health Systems". Handheld computers, such as tablets and smartphones, are becoming more and more accessible in the clinical care setting and in Intensive Care Units (ICUs). By making the most useful and appropriate data available on multiple devices and facilitate the switching between those devices, staff members can efficiently integrate them in their workflow, allowing for faster and more accurate decisions. This paper addresses the design of a platform for the efficient switching between multiple devices in the ICU. The key functionalities of the platform are the integration of the platform into the workflow of the medical staff and providing tailored and dynamic information at the point of care. The platform is designed based on a 3-tier architecture with a focus on extensibility, scalability and an optimal user experience. After identification to a device using Near Field Communication (NFC), the appropriate medical information will be shown on the selected device. The visualization of the data is adapted to the type of the device. A web-centric approach was used to enable extensibility and portability. A prototype of the platform was thoroughly evaluated. The scalability, performance and user experience were evaluated. Performance tests show that the response time of the system scales linearly with the amount of data. Measurements with up to 20 devices have shown no performance loss due to the concurrent use of multiple devices. The platform provides a scalable and responsive solution to enable the efficient switching between multiple devices. Due to the web-centric approach new devices can easily be integrated. The performance and scalability of the platform have been evaluated and it was shown that the response time and scalability of the platform was within an acceptable range.

  19. Generation of comprehensive thoracic oncology database--tool for translational research.

    PubMed

    Surati, Mosmi; Robinson, Matthew; Nandi, Suvobroto; Faoro, Leonardo; Demchuk, Carley; Kanteti, Rajani; Ferguson, Benjamin; Gangadhar, Tara; Hensing, Thomas; Hasina, Rifat; Husain, Aliya; Ferguson, Mark; Karrison, Theodore; Salgia, Ravi

    2011-01-22

    The Thoracic Oncology Program Database Project was created to serve as a comprehensive, verified, and accessible repository for well-annotated cancer specimens and clinical data to be available to researchers within the Thoracic Oncology Research Program. This database also captures a large volume of genomic and proteomic data obtained from various tumor tissue studies. A team of clinical and basic science researchers, a biostatistician, and a bioinformatics expert was convened to design the database. Variables of interest were clearly defined and their descriptions were written within a standard operating manual to ensure consistency of data annotation. Using a protocol for prospective tissue banking and another protocol for retrospective banking, tumor and normal tissue samples from patients consented to these protocols were collected. Clinical information such as demographics, cancer characterization, and treatment plans for these patients were abstracted and entered into an Access database. Proteomic and genomic data have been included in the database and have been linked to clinical information for patients described within the database. The data from each table were linked using the relationships function in Microsoft Access to allow the database manager to connect clinical and laboratory information during a query. The queried data can then be exported for statistical analysis and hypothesis generation.

  20. Fine-grained policy control in U.S. Army Research Laboratory (ARL) multimodal signatures database

    NASA Astrophysics Data System (ADS)

    Bennett, Kelly; Grueneberg, Keith; Wood, David; Calo, Seraphin

    2014-06-01

    The U.S. Army Research Laboratory (ARL) Multimodal Signatures Database (MMSDB) consists of a number of colocated relational databases representing a collection of data from various sensors. Role-based access to this data is granted to external organizations such as DoD contractors and other government agencies through a client Web portal. In the current MMSDB system, access control is only at the database and firewall level. In order to offer finer grained security, changes to existing user profile schemas and authentication mechanisms are usually needed. In this paper, we describe a software middleware architecture and implementation that allows fine-grained access control to the MMSDB at a dataset, table, and row level. Result sets from MMSDB queries issued in the client portal are filtered with the use of a policy enforcement proxy, with minimal changes to the existing client software and database. Before resulting data is returned to the client, policies are evaluated to determine if the user or role is authorized to access the data. Policies can be authored to filter data at the row, table or column level of a result set. The system uses various technologies developed in the International Technology Alliance in Network and Information Science (ITA) for policy-controlled information sharing and dissemination1. Use of the Policy Management Library provides a mechanism for the management and evaluation of policies to support finer grained access to the data in the MMSDB system. The GaianDB is a policy-enabled, federated database that acts as a proxy between the client application and the MMSDB system.

Top