Sample records for elist database instance

  1. Introduction to the enhanced logistics intratheater support tool (ELIST) mission application and its segments: global data segment version 8.1.0.0, database instance segment version 8.1.0.0, database fill segment version 8.1.0.0, database segment versio

    DOT National Transportation Integrated Search

    2002-02-26

    This document, the Introduction to the Enhanced Logistics Intratheater Support Tool (ELIST) Mission Application and its Segments, satisfies the following objectives: It identifies the mission application, known in brief as ELIST, and all seven ...

  2. Electronic Communities: a Forum for Supporting Women Professionals and Students in Technical and Scientific Fields

    NASA Astrophysics Data System (ADS)

    Single, Peg Boyle; Muller, Carol B.; Cunningham, Christine M.; Single, Richard M.

    In this article, we report on electronic discussion lists (e-lists) sponsored by MentorNet, the National Electronic Industrial Mentoring Network for Women in Engineering and Science. Using the Internet, the MentorNet program connects students in engineering and science with mentors working in industry. These e-lists are a feature of MentorNet's larger electronic mentoring program and were sponsored to foster the establishment of community among women engineering and science students and men and women professionals in those fields. This research supports the hypothesis that electronic communications can be used to develop community among engineering and science students and professionals and identifies factors influencing the emergence of electronic communities (e-communities). The e-lists that emerged into self-sustaining e-communities were focused on topic-based themes, such as balancing personal and work life, issues pertaining to women in engineering and science, and job searching. These e-communities were perceived to be safe places, embraced a diversity of opinions and experiences, and sanctioned personal and meaningful postings on the part of the participants. The e-communities maintained three to four simultaneous threaded discussions and were sustained by professionals who served as facilitators by seeding the e-lists with discussion topics. The e-lists were sponsored to provide women students participating in MentorNet with access to groups of technical and scientific professionals. In addition to providing benefits to the students, the e-lists also provided the professionals with opportunities to engage in peer mentoring with other, mostly female, technical and scientific professionals. We discuss the implications of our findings for developing e-communities and for serving the needs of women in technical and scientific fields.

  3. Logistics Process Analysis Tool

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    2008-03-31

    LPAT is the integrated system resulting from the ANL-developed Enhanced Logistics Intra Theater Support Tool (ELIST), sponsored by SDDC-TEA, and the Fort Future Virtual Installation Tool, sponsored by CERL. The Fort Future Simulation Engine, an application written in the ANL Repast Simphony framework, served as the basis for the Process Analysis Tool (PAT), which evolved into a stand-alone tool for detailed process analysis at a location. Combined with ELIST, an inter-installation logistics component was added to enable users to define large logistical agent-based models without having to program.

  4. Using ontology databases for scalable query answering, inconsistency detection, and data integration

    PubMed Central

    Dou, Dejing

    2011-01-01

    An ontology database is a basic relational database management system that models an ontology plus its instances. To reason over the transitive closure of instances in the subsumption hierarchy, for example, an ontology database can either unfold views at query time or propagate assertions using triggers at load time. In this paper, we use existing benchmarks to evaluate our method—using triggers—and we demonstrate that by forward computing inferences, we not only improve query time, but the improvement appears to cost only more space (not time). However, we go on to show that the true penalties were simply opaque to the benchmark, i.e., the benchmark inadequately captures load-time costs. We have applied our methods to two case studies in biomedicine, using ontologies and data from genetics and neuroscience to illustrate two important applications: first, ontology databases answer ontology-based queries effectively; second, using triggers, ontology databases detect instance-based inconsistencies—something not possible using views. Finally, we demonstrate how to extend our methods to perform data integration across multiple, distributed ontology databases. PMID:22163378
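
    The trigger-based alternative to view unfolding that this abstract contrasts can be illustrated in a few lines. The sketch below is not from the paper; it uses SQLite and an invented two-class hierarchy (dog IS-A animal) to show how a load-time trigger materializes a subsumption inference so the superclass query needs no reasoning at query time.

    ```python
    # Minimal sketch of load-time trigger propagation in SQLite, assuming a toy
    # hierarchy (dog IS-A animal). Table and trigger names are invented here,
    # not taken from the paper.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.executescript("""
    CREATE TABLE animal (id TEXT PRIMARY KEY);
    CREATE TABLE dog    (id TEXT PRIMARY KEY);

    -- Forward-compute the inference at load time: asserting a dog also
    -- asserts an animal, so the subsumption hierarchy is materialized
    -- eagerly instead of being unfolded by views at query time.
    CREATE TRIGGER dog_isa_animal AFTER INSERT ON dog
    BEGIN
        INSERT OR IGNORE INTO animal (id) VALUES (NEW.id);
    END;
    """)

    cur.execute("INSERT INTO dog VALUES ('fido')")
    conn.commit()

    # The superclass query needs no joins or view unfolding.
    print(cur.execute("SELECT id FROM animal").fetchall())  # [('fido',)]
    ```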

  5. [A relational database to store Poison Centers calls].

    PubMed

    Barelli, Alessandro; Biondi, Immacolata; Tafani, Chiara; Pellegrini, Aristide; Soave, Maurizio; Gaspari, Rita; Annetta, Maria Giuseppina

    2006-01-01

    Italian Poison Centers answer approximately 100,000 calls per year. Potentially, this activity is a huge source of data for toxicovigilance and for syndromic surveillance. During the last decade, surveillance systems for early detection of outbreaks have drawn the attention of public health institutions due to the threat of terrorism and high-profile disease outbreaks. Poisoning surveillance needs the ongoing, systematic collection, analysis, interpretation, and dissemination of harmonised data about poisonings from all Poison Centers for use in public health action to reduce morbidity and mortality and to improve health. The entity-relationship model for a Poison Center relational database is extremely complex and has not been studied in detail. For this reason, data collection is not harmonised among Italian Poison Centers. Entities are recognizable concepts, either concrete or abstract, such as patients and poisons, or events which have relevance to the database, such as calls. Connectivity and cardinality of relationships are complex as well. A one-to-many relationship exists between calls and patients: for one instance of the entity calls, there are zero, one, or many instances of the entity patients. At the same time, a one-to-many relationship exists between patients and poisons: for one instance of the entity patients, there are zero, one, or many instances of the entity poisons. This paper shows a relational model for a Poison Center database which allows the harmonised collection of data on Poison Center calls.
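
    A minimal sketch of the one-to-many chain described above (calls to patients to poisons), using SQLite from Python; every table and column name below is illustrative, not taken from the paper's model.

    ```python
    # Illustrative schema for the calls -> patients -> poisons chain; all
    # names are invented for the example.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE calls (
        call_id     INTEGER PRIMARY KEY,
        received_at TEXT NOT NULL            -- timestamp of the call
    );
    CREATE TABLE patients (
        patient_id INTEGER PRIMARY KEY,
        call_id    INTEGER NOT NULL REFERENCES calls(call_id),
        age_years  REAL                      -- nullable: often unknown
    );
    CREATE TABLE poisons (
        poison_id  INTEGER PRIMARY KEY,
        patient_id INTEGER NOT NULL REFERENCES patients(patient_id),
        substance  TEXT NOT NULL
    );
    """)
    # One call may involve several patients, each exposed to several poisons.
    conn.execute("INSERT INTO calls VALUES (1, '2006-01-01T10:30:00')")
    conn.executemany("INSERT INTO patients VALUES (?, 1, ?)", [(1, 34.0), (2, 4.0)])
    conn.executemany("INSERT INTO poisons VALUES (?, ?, ?)",
                     [(1, 1, 'paracetamol'), (2, 2, 'bleach')])
    ```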

  6. Enhanced Logistics Intra-theater Support Tool

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Van Groningen, Charles N.; Braun, Mary Duffy; Widing, Mary Ann

    2004-01-27

    Developed for use by Department of Defense deployment analysts to perform detailed Reception, Staging, Onward movement and Integration (RSO&I) analyses. ELIST requires: (1) vehicle characteristics for ships, planes, trucks, railcars, buses, and helicopters; (2) network (physical) characteristics defining the airport, seaport, road, rail, waterway and pipeline infrastructure available in a theater of operations; (3) assets available for moving the personnel, equipment and supplies over the infrastructure network; and (4) a movement requirements plan defining the deployment requirements of a military force, including each unit, its cargo (at various levels of resolution), where it must move from and to, what modes it is required to travel by, and when it must be delivered through each phase of deployment.

  7. Information System through ANIS at CeSAM

    NASA Astrophysics Data System (ADS)

    Moreau, C.; Agneray, F.; Gimenez, S.

    2015-09-01

    ANIS (AstroNomical Information System) is a generic web tool developed at CeSAM to facilitate and standardize the implementation of astronomical data of various kinds through private and/or public dedicated Information Systems. The architecture of ANIS is composed of a database server which contains the project data; a web user interface template which provides high-level services (search, extraction and display of imaging and spectroscopic data using a combination of criteria, an object list, a SQL query module, or a cone search interface); a framework composed of several packages; and a metadata database managed by a web administration entity. The process to implement a new ANIS instance at CeSAM is easy and fast: the scientific project submits data or secure access to its data, the CeSAM team installs the new instance (web interface template and metadata database), and the project administrator configures the instance with the ANIS web administration entity. Currently, CeSAM offers through ANIS web access to VO-compliant Information Systems for different projects (HeDaM, HST-COSMOS, CFHTLS-ZPhots, ExoDAT, ...).

  8. Appendix A. Borderlands Site Database

    Treesearch

    A.C. MacWilliams

    2006-01-01

    The database includes modified components of the Arizona State Museum Site Recording System (Arizona State Museum 1993) and the New Mexico NMCRIS User's Guide (State of New Mexico 1993). When sites contain more than one recorded component, these instances were entered separately with the result that many sites have multiple entries. Information for this database...

  9. Publishing Data on Physical Samples Using the GeoLink Ontology and Linked Data Platforms

    NASA Astrophysics Data System (ADS)

    Ji, P.; Arko, R. A.; Lehnert, K. A.; Song, L.; Carter, M. R.; Hsu, L.

    2015-12-01

    The Interdisciplinary Earth Data Alliance (IEDA), one of the partners in the EarthCube GeoLink project, seeks to explore the extent to which the use of GeoLink's reusable Ontology Design Patterns (ODPs) and linked data platforms in the IEDA data infrastructure can make research data more easily accessible and valuable. Linked data for the System for Earth Sample Registration (SESAR) is IEDA's first effort to show how linked data enhance the presentation of the IEDA data system architecture. SESAR Linked Data maps each table and column in the SESAR database to an RDF class and property based on the GeoLink view, which builds on top of the GeoLink ODPs. D2RQ is then used to dump the contents of the SESAR database into RDF triples on the basis of the mapping results, and the dumped triples are loaded into GraphDB, an RDF graph database, as permanent data in the form of atomic facts expressed as subjects, predicates and objects, which provides support for semantic interoperability between IEDA and other GeoLink partners. Finally, an integrated browsing and searching interface built on Callimachus, a highly scalable platform for publishing linked data, is introduced to make sense of the data stored in the triplestore. Drill-down and drill-through features are built into the interface to help users locate content efficiently. The drill-down feature enables users to explore beyond the summary information in the instance list of a specific class and into the detail on the specific instance page. The drill-through feature enables users to jump from one instance to another by clicking a link to the latter nested in the former's page. Additionally, an OpenLayers map is embedded into the interface to enhance the presentation of instances that have geospatial information. Furthermore, by linking instances in the SESAR datasets to matching or corresponding instances in external datasets, the presentation has been enriched with additional information about related classes such as person, cruise, etc.

  10. Automatic classification and detection of clinically relevant images for diabetic retinopathy

    NASA Astrophysics Data System (ADS)

    Xu, Xinyu; Li, Baoxin

    2008-03-01

    We propose a novel approach to automatic classification of Diabetic Retinopathy (DR) images and retrieval of clinically-relevant DR images from a database. Given a query image, our approach first classifies the image into one of three categories: microaneurysm (MA), neovascularization (NV) and normal, and then it retrieves DR images that are clinically relevant to the query image from an archival image database. In the classification stage, query DR images are classified by the Multi-class Multiple-Instance Learning (McMIL) approach, where images are viewed as bags, each of which contains a number of instances corresponding to non-overlapping blocks, and each block is characterized by low-level features including color, texture, histogram of edge directions, and shape. McMIL first learns a collection of instance prototypes for each class that maximizes the Diverse Density function using the Expectation-Maximization algorithm. A nonlinear mapping is then defined using the instance prototypes, mapping every bag to a point in a new multi-class bag feature space. Finally, a multi-class Support Vector Machine is trained in the multi-class bag feature space. In the retrieval stage, we retrieve images from the archival database that bear the same label as the query image and that are the top K nearest neighbors of the query image in terms of similarity in the multi-class bag feature space. The classification approach achieves high classification accuracy, and the retrieval of clinically-relevant images not only facilitates utilization of the vast amount of hidden diagnostic knowledge in the database, but also improves the efficiency and accuracy of DR lesion diagnosis and assessment.
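
    The bag-embedding stage of McMIL can be sketched as follows. This is a simplified stand-in, not the authors' code: the instance prototypes below are random placeholders, whereas the paper learns them by maximizing the Diverse Density function with EM; the mapping and the multiclass SVM stage follow the description above.

    ```python
    # Sketch of the bag-embedding stage: each image is a bag of block-level
    # feature vectors; its k-th coordinate is the best match of any block to
    # instance prototype k. Prototypes are random placeholders here.
    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    prototypes = rng.normal(size=(6, 16))      # 6 prototypes over 16-D block features

    def embed(bag):
        """Map a (n_blocks, 16) bag to a 6-D vector of prototype similarities."""
        d = ((bag[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
        return np.exp(-d).max(axis=0)          # closest block per prototype

    # Toy bags for three classes (MA / NV / normal, say), with varying bag sizes.
    bags = [rng.normal(loc=c, size=(rng.integers(5, 20), 16))
            for c in (0.0, 0.5, 1.0) for _ in range(30)]
    labels = np.repeat([0, 1, 2], 30)

    X = np.stack([embed(b) for b in bags])     # the multi-class bag feature space
    clf = SVC(kernel="rbf").fit(X, labels)     # multiclass SVM (one-vs-one)
    ```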

  11. SenseLab

    PubMed Central

    Crasto, Chiquito J.; Marenco, Luis N.; Liu, Nian; Morse, Thomas M.; Cheung, Kei-Hoi; Lai, Peter C.; Bahl, Gautam; Masiar, Peter; Lam, Hugo Y.K.; Lim, Ernest; Chen, Huajin; Nadkarni, Prakash; Migliore, Michele; Miller, Perry L.; Shepherd, Gordon M.

    2009-01-01

    This article presents the latest developments in neuroscience information dissemination through the SenseLab suite of databases: NeuronDB, CellPropDB, ORDB, OdorDB, OdorMapDB, ModelDB and BrainPharm. These databases include information related to: (i) neuronal membrane properties and neuronal models, and (ii) genetics, genomics, proteomics and imaging studies of the olfactory system. We describe here: the new features for each database, the evolution of SenseLab’s unifying database architecture and instances of SenseLab database interoperation with other neuroscience online resources. PMID:17510162

  12. Redis database administration tool

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Martinez, J. J.

    2013-02-13

    MyRedis is a product of the Lorenz subproject under the ASC Scientific Data Management effort. MyRedis is a web-based utility designed to allow easy administration of instances of Redis databases. It can be used to view and manipulate data as well as run commands directly against a variety of different Redis hosts.
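
    MyRedis itself is not publicly distributed, but the kind of inspection and manipulation it wraps can be sketched with the redis-py client against an assumed local instance; the host, port, and keys below are illustrative.

    ```python
    # Minimal sketch of Redis administration via redis-py (not MyRedis itself).
    import redis

    r = redis.Redis(host="localhost", port=6379, db=0)
    print(r.info("memory")["used_memory_human"])   # inspect an instance
    r.set("job:42:status", "queued")               # manipulate data
    print(r.get("job:42:status"))                  # b'queued'
    ```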

  13. Object instance recognition using motion cues and instance specific appearance models

    NASA Astrophysics Data System (ADS)

    Schumann, Arne

    2014-03-01

    In this paper we present an object instance retrieval approach. The baseline approach consists of a pool of image features which are computed on the bounding boxes of a query object track and compared to a database of tracks in order to find additional appearances of the same object instance. We improve over this simple baseline approach in multiple ways: 1) we include motion cues to achieve improved robustness to viewpoint and rotation changes, 2) we include operator feedback to iteratively re-rank the resulting retrieval lists and 3) we use operator feedback and location constraints to train classifiers and learn an instance specific appearance model. We use these classifiers to further improve the retrieval results. The approach is evaluated on two popular public datasets for two different applications. We evaluate person re-identification on the CAVIAR shopping mall surveillance dataset and vehicle instance recognition on the VIVID aerial dataset and achieve significant improvements over our baseline results.

  14. DataBase on Demand

    NASA Astrophysics Data System (ADS)

    Gaspar Aparicio, R.; Gomez, D.; Coterillo Coz, I.; Wojcik, D.

    2012-12-01

    At CERN a number of key database applications run on user-managed MySQL database services. The Database on Demand project was born out of an idea to provide the CERN user community with an environment to develop and run database services outside of the centralised Oracle-based database services. Database on Demand (DBoD) empowers users to perform certain actions that had traditionally been done by database administrators (DBAs), providing an enterprise platform for database applications. It also allows the CERN user community to run different database engines, presently the open community version of MySQL and single-instance Oracle database servers. This article describes the technology approach taken to face this challenge, the service level agreement (SLA) that the project provides, and an evolution of possible scenarios.

  15. Application GIS on university planning: building a spatial database aided spatial decision

    NASA Astrophysics Data System (ADS)

    Miao, Lei; Wu, Xiaofang; Wang, Kun; Nong, Yu

    2007-06-01

    As a university develops and grows in size, many kinds of resources urgently need effective management. A spatial database is the right tool to assist administrators' spatial decisions, and it is ready for the digital campus when integrated with the existing OMS. Campus planning is first examined in detail. Then, taking South China Agricultural University as an instance, we demonstrate how to build a geographic database of campus buildings and houses to support administrators' spatial decisions.

  16. Identifying Crucial Parameter Correlations Maintaining Bursting Activity

    PubMed Central

    Doloc-Mihu, Anca; Calabrese, Ronald L.

    2014-01-01

    Recent experimental and computational studies suggest that linearly correlated sets of parameters (intrinsic and synaptic properties of neurons) allow central pattern-generating networks to produce and maintain their rhythmic activity regardless of changing internal and external conditions. To determine the role of correlated conductances in the robust maintenance of functional bursting activity, we used our existing database of half-center oscillator (HCO) model instances of the leech heartbeat CPG. From the database, we identified functional activity groups of burster (isolated neuron) and half-center oscillator model instances and realistic subgroups of each that showed burst characteristics (principally period and spike frequency) similar to the animal. To find linear correlations among the conductance parameters maintaining functional leech bursting activity, we applied Principal Component Analysis (PCA) to each of these four groups. PCA identified a set of three maximal conductances (leak current, Leak; a persistent K current, K2; and a persistent Na+ current, P) that correlate linearly for the two groups of burster instances but not for the HCO groups. Visualizations of HCO instances in a reduced space suggested that there might be non-linear relationships between these parameters for these instances. Experimental studies have shown that period is a key attribute influenced by modulatory inputs and temperature variations in heart interneurons. Thus, we explored the sensitivity of period to changes in the maximal conductances of Leak, K2, and P, and we found that for our realistic bursters the effect of these parameters on period could not be assessed, because bursting activity was not maintained when they were varied individually. PMID:24945358
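
    The PCA step described above can be sketched as follows; the conductance matrix here is synthetic (three columns standing in for Leak, K2, and P, driven by one latent factor), not data from HCO-db. A dominant leading component flags a linearly correlated conductance set, and its loadings give the direction of co-variation.

    ```python
    # PCA over a synthetic conductance matrix: 500 model instances, three
    # maximal conductances driven by one latent factor (illustrative stand-ins).
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(1)
    t = rng.uniform(size=500)                  # latent factor along the correlation
    G = np.column_stack([
        2.0 + 1.0 * t + 0.05 * rng.normal(size=500),   # "g_Leak"
        5.0 + 2.0 * t + 0.05 * rng.normal(size=500),   # "g_K2"
        1.0 + 0.5 * t + 0.05 * rng.normal(size=500),   # "g_P"
    ])

    pca = PCA(n_components=3).fit(G)
    print(pca.explained_variance_ratio_)   # ~[0.99, ...]: one dominant component
    print(pca.components_[0])              # loadings of the correlated set
    ```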

  17. Retrieving clinically relevant diabetic retinopathy images using a multi-class multiple-instance framework

    NASA Astrophysics Data System (ADS)

    Chandakkar, Parag S.; Venkatesan, Ragav; Li, Baoxin

    2013-02-01

    Diabetic retinopathy (DR) is a vision-threatening complication from diabetes mellitus, a medical condition that is rising globally. Unfortunately, many patients are unaware of this complication because of the absence of symptoms. Regular screening of DR is necessary to detect the condition for timely treatment. Content-based image retrieval, using archived and diagnosed fundus (retinal) camera DR images, can improve the screening efficiency of DR. This content-based image retrieval study focuses on two DR clinical findings, microaneurysm and neovascularization, which are clinical signs of non-proliferative and proliferative diabetic retinopathy. The authors propose a multi-class multiple-instance image retrieval framework which deploys a modified color correlogram and statistics of steerable Gaussian filter responses for retrieving clinically relevant images from a database of DR fundus images.

  18. Nonparametric Bayesian Modeling for Automated Database Schema Matching

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ferragut, Erik M; Laska, Jason A

    2015-01-01

    The problem of merging databases arises in many government and commercial applications. Schema matching, a common first step, identifies equivalent fields between databases. We introduce a schema matching framework that builds nonparametric Bayesian models for each field and compares them by computing the probability that a single model could have generated both fields. Our experiments show that our method is more accurate and faster than the existing instance-based matching algorithms in part because of the use of nonparametric Bayesian models.
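
    The core scoring idea above, the probability that a single model could have generated both fields, can be illustrated with a simplified stand-in for the paper's nonparametric models: a Dirichlet-multinomial marginal likelihood for categorical fields. A positive log-score favors merging the two fields under one model; the prior strength and toy field values below are assumptions.

    ```python
    # Simplified stand-in: score two categorical fields by the log Bayes factor
    # of "one shared distribution" versus "two separate distributions", using
    # Dirichlet-multinomial marginal likelihoods.
    import numpy as np
    from collections import Counter
    from scipy.special import gammaln

    def log_evidence(counts, alpha=1.0):
        """Log marginal likelihood of a categorical sample under a Dirichlet prior."""
        c = np.asarray(counts, dtype=float)
        a = np.full_like(c, alpha)
        return (gammaln(a.sum()) - gammaln((a + c).sum())
                + gammaln(a + c).sum() - gammaln(a).sum())

    def match_score(field_a, field_b):
        cats = sorted(set(field_a) | set(field_b))
        ca = np.array([Counter(field_a)[c] for c in cats])
        cb = np.array([Counter(field_b)[c] for c in cats])
        # log P(a, b | one model) - log P(a | own model) - log P(b | own model)
        return log_evidence(ca + cb) - log_evidence(ca) - log_evidence(cb)

    print(match_score(["M", "F", "M"], ["F", "M", "F"]))      # > 0: likely a match
    print(match_score(["M", "F"], ["WA", "OR", "WA", "CA"]))  # < 0: different fields
    ```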

  19. Final report for DOE Award # DE- SC0010039*: Carbon dynamics of forest recovery under a changing climate: Forcings, feedbacks, and implications for earth system modeling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Anderson-Teixeira, Kristina J.; DeLucia, Evan H.; Duval, Benjamin D.

    2015-10-29

    To advance understanding of C dynamics of forests globally, we compiled a new database, the Forest C database (ForC-db), which contains data on ground-based measurements of ecosystem-level C stocks and annual fluxes along with disturbance history. This database currently contains 18,791 records from 2009 sites, making it the largest and most comprehensive database of C stocks and flows in forest ecosystems globally. The tropical component of the database will be published in conjunction with a manuscript that is currently under review (Anderson-Teixeira et al., in review). Database development continues, and we hope to maintain a dynamic instance of the entire (global) database.

  20. A Database to Support Ecosystems Services Research in Lakes of the Northeastern United States

    EPA Science Inventory

    Northeastern lakes provide valuable ecosystem services that benefit residents and visitors and are increasingly important for provisioning of recreational opportunities and amenities. Concurrently, however, population growth threatens lakes by, for instance, increasing nutrient...

  1. Neuroimaging Data Sharing on the Neuroinformatics Database Platform

    PubMed Central

    Book, Gregory A; Stevens, Michael; Assaf, Michal; Glahn, David; Pearlson, Godfrey D

    2015-01-01

    We describe the Neuroinformatics Database (NiDB), an open-source database platform for archiving, analysis, and sharing of neuroimaging data. Data from the multi-site projects Autism Brain Imaging Data Exchange (ABIDE), Bipolar-Schizophrenia Network on Intermediate Phenotypes parts one and two (B-SNIP1, B-SNIP2), and Monetary Incentive Delay task (MID) are available for download from the public instance of NiDB, with more projects sharing data as it becomes available. As demonstrated by making several large datasets available, NiDB is an extensible platform appropriately suited to archive and distribute shared neuroimaging data. PMID:25888923

  2. Analysis of Family Structures Reveals Robustness or Sensitivity of Bursting Activity to Parameter Variations in a Half-Center Oscillator (HCO) Model.

    PubMed

    Doloc-Mihu, Anca; Calabrese, Ronald L

    2016-01-01

    The underlying mechanisms that support robustness in neuronal networks are as yet unknown. However, recent studies provide evidence that neuronal networks are robust to natural variations, modulation, and environmental perturbations of parameters, such as maximal conductances of intrinsic membrane and synaptic currents. Here we sought a method for assessing robustness, which might easily be applied to large brute-force databases of model instances. Starting with groups of instances with appropriate activity (e.g., tonic spiking), our method classifies instances into much smaller subgroups, called families, in which all members vary only by the one parameter that defines the family. By analyzing the structures of families, we developed measures of robustness for activity type. Then, we applied these measures to our previously developed model database, HCO-db, of a two-neuron half-center oscillator (HCO), a neuronal microcircuit from the leech heartbeat central pattern generator where the appropriate activity type is alternating bursting. In HCO-db, the maximal conductances of five intrinsic and two synaptic currents were varied over eight values (leak reversal potential also varied, five values). We focused on how variations of particular conductance parameters maintain normal alternating bursting activity while still allowing for functional modulation of period and spike frequency. We explored the trade-off between robustness of activity type and desirable change in activity characteristics when intrinsic conductances are altered and identified the hyperpolarization-activated (h) current as an ideal target for modulation. We also identified ensembles of model instances that closely approximate physiological activity and can be used in future modeling studies.

  3. Ibmdbpy-spatial : An Open-source implementation of in-database geospatial analytics in Python

    NASA Astrophysics Data System (ADS)

    Roy, Avipsa; Fouché, Edouard; Rodriguez Morales, Rafael; Moehler, Gregor

    2017-04-01

    As the amount of spatial data acquired from several geodetic sources has grown over the years and as data infrastructure has become more powerful, the need for adoption of in-database analytic technology within the geosciences has grown rapidly. In-database analytics on spatial data stored in a traditional enterprise data warehouse enables much faster retrieval and analysis for making better predictions about risks and opportunities, identifying trends and spotting anomalies. Although there are a number of open-source spatial analysis libraries like geopandas and shapely available today, most of them have been restricted to manipulation and analysis of geometric objects, with a dependency on GEOS and similar libraries. We present an open-source software package, written in Python, to fill the gap between spatial analysis and in-database analytics. Ibmdbpy-spatial provides a geospatial extension to the ibmdbpy package, implemented in 2015. It provides an interface for spatial data manipulation and access to in-database algorithms in IBM dashDB, a data warehouse platform with a spatial extender that runs as a service on IBM's cloud platform, Bluemix. Working in-database reduces the network overload, as the complete data need not be replicated into the user's local system and only a subset of the entire dataset need be fetched into memory in a single instance. Ibmdbpy-spatial accelerates Python analytics by seamlessly pushing operations written in Python into the underlying database for execution using the dashDB spatial extender, thereby benefiting from in-database performance-enhancing features such as columnar storage and parallel processing. The package is currently supported on Python versions from 2.7 up to 3.4. The basic architecture of the package consists of three main components: 1) a connection to dashDB represented by an IdaDataBase instance, which uses a middleware API (pypyodbc or jaydebeapi) to establish the database connection via ODBC or JDBC respectively; 2) an instance representing the spatial data stored in the database as a dataframe in Python, called the IdaGeoDataFrame, with a specific geometry attribute which recognises a planar geometry column in dashDB; and 3) Python wrappers for spatial functions like within, distance, area, buffer and more, which dashDB currently supports, to make the querying process from Python much simpler for users. The spatial functions translate well-known geopandas-like syntax into SQL queries, utilising the database connection to perform spatial operations in-database, and can operate on single geometries as well as on two different geometries from different IdaGeoDataFrames. The in-database queries strictly follow the standards of the OpenGIS Implementation Specification for Geographic information - Simple feature access for SQL. The results of the operations can thereby be accessed dynamically via interactive Jupyter notebooks from any system which supports Python, without any additional dependencies, and can also be combined with other open-source libraries such as matplotlib and folium within Jupyter notebooks for visualization purposes. We built a use case to analyse crime hotspots in New York City to validate our implementation and visualized the results as a choropleth map for each borough.
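
    A usage sketch assembled from the component names given above (IdaDataBase, IdaGeoDataFrame, set_geometry, area); the DSN, table name, and exact method signatures are assumptions rather than verified API, and running it requires a reachable dashDB instance.

    ```python
    # Usage sketch based on the names in the abstract; signatures are assumed.
    from ibmdbpy import IdaDataBase, IdaGeoDataFrame

    idadb = IdaDataBase(dsn="DASHDB")          # ODBC/JDBC connection (assumed DSN)
    counties = IdaGeoDataFrame(idadb, "SAMPLES.GEO_COUNTY", indexer="OBJECTID")
    counties.set_geometry("SHAPE")             # planar geometry column in dashDB

    # A geopandas-like call translated to spatial SQL and executed in-database;
    # only the results travel back to the client.
    areas = counties.area(unit="STATUTE MILE")
    print(areas.head())
    idadb.close()
    ```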

  4. Stylistic Variations in Science Lectures: Teaching Vocabulary.

    ERIC Educational Resources Information Center

    Jackson, Jane; Bilton, Linda

    1994-01-01

    Twenty lectures by native speaker geology lecturers to nonnative speaker students were transcribed, and 921 instances of vocabulary elaboration were coded into a computer database according to 20 linguistic features. Analysis revealed noticeable variation among lecturers in language range/technicality, vocabulary elaboration, signalling, and use…

  5. Loop-Extended Symbolic Execution on Binary Programs

    DTIC Science & Technology

    2009-03-02

    1434. Based on its specification [35], one valid message format contains 2 fields: a header byte of value 4, followed by a string giving a database ... potentially become expensive. For instance, the polyhedron technique [16] requires costly conversion operations on a multi-dimensional abstract representation

  6. SVM-RFE based feature selection and Taguchi parameters optimization for multiclass SVM classifier.

    PubMed

    Huang, Mei-Ling; Hung, Yung-Hsiang; Lee, W M; Li, R K; Jiang, Bo-Ru

    2014-01-01

    Recently, the support vector machine (SVM) has shown excellent performance in classification and prediction, and is widely used in disease diagnosis and medical assistance. However, SVM functions well only on two-group classification problems. This study combines feature selection and SVM recursive feature elimination (SVM-RFE) to investigate the classification accuracy of multiclass problems for the Dermatology and Zoo databases. The Dermatology dataset contains 33 feature variables, 1 class variable, and 366 testing instances; the Zoo dataset contains 16 feature variables, 1 class variable, and 101 testing instances. The feature variables in the two datasets were sorted in descending order by explanatory power, and different feature sets were selected by SVM-RFE to explore classification accuracy. Meanwhile, the Taguchi method was combined with the SVM classifier in order to optimize the parameters C and γ and increase classification accuracy for multiclass classification. The experimental results show that classification accuracy can exceed 95% after SVM-RFE feature selection and Taguchi parameter optimization for the Dermatology and Zoo databases.
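
    The SVM-RFE stage maps directly onto scikit-learn, shown here on a small multiclass dataset as a stand-in for the Dermatology/Zoo data; the Taguchi search over C and γ is elided, with fixed values standing in for the optimized ones.

    ```python
    # SVM-RFE feature ranking plus a multiclass SVM on the reduced feature set.
    from sklearn.datasets import load_iris
    from sklearn.feature_selection import RFE
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    # RFE ranks features by repeatedly dropping the weakest SVM coefficients,
    # hence the linear kernel at this stage.
    rfe = RFE(SVC(kernel="linear", C=1.0), n_features_to_select=2).fit(X, y)
    print(rfe.ranking_)                        # 1 marks the selected features

    # RBF SVM (C, gamma fixed in lieu of the Taguchi optimization).
    clf = SVC(kernel="rbf", C=10.0, gamma=0.1)
    print(cross_val_score(clf, X[:, rfe.support_], y, cv=5).mean())
    ```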

  7. SVM-RFE Based Feature Selection and Taguchi Parameters Optimization for Multiclass SVM Classifier

    PubMed Central

    Huang, Mei-Ling; Hung, Yung-Hsiang; Lee, W. M.; Li, R. K.; Jiang, Bo-Ru

    2014-01-01

    Recently, the support vector machine (SVM) has shown excellent performance in classification and prediction, and is widely used in disease diagnosis and medical assistance. However, SVM functions well only on two-group classification problems. This study combines feature selection and SVM recursive feature elimination (SVM-RFE) to investigate the classification accuracy of multiclass problems for the Dermatology and Zoo databases. The Dermatology dataset contains 33 feature variables, 1 class variable, and 366 testing instances; the Zoo dataset contains 16 feature variables, 1 class variable, and 101 testing instances. The feature variables in the two datasets were sorted in descending order by explanatory power, and different feature sets were selected by SVM-RFE to explore classification accuracy. Meanwhile, the Taguchi method was combined with the SVM classifier in order to optimize the parameters C and γ and increase classification accuracy for multiclass classification. The experimental results show that classification accuracy can exceed 95% after SVM-RFE feature selection and Taguchi parameter optimization for the Dermatology and Zoo databases. PMID:25295306

  8. Nuclear Forensics Analysis with Missing and Uncertain Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Langan, Roisin T.; Archibald, Richard K.; Lamberti, Vincent

    We have applied a new imputation-based method for analyzing incomplete data, called Monte Carlo Bayesian Database Generation (MCBDG), to the Spent Fuel Isotopic Composition (SFCOMPO) database. About 60% of the entries are absent for SFCOMPO. The method estimates missing values of a property from a probability distribution created from the existing data for the property, and then generates multiple instances of the completed database for training a machine learning algorithm. Uncertainty in the data is represented by an empirical or an assumed error distribution. The method makes few assumptions about the underlying data, and compares favorably against results obtained by replacing missing information with constant values.

  9. Learning to Rapidly Re-Contact the Lost Plume in Chemical Plume Tracing

    PubMed Central

    Cao, Meng-Li; Meng, Qing-Hao; Wang, Jia-Ying; Luo, Bing; Jing, Ya-Qi; Ma, Shu-Gen

    2015-01-01

    Maintaining contact between the robot and plume is significant in chemical plume tracing (CPT). In the time immediately following the loss of chemical detection during the process of CPT, Track-Out activities bias the robot heading relative to the upwind direction, expecting to rapidly re-contact the plume. To determine the bias angle used in the Track-Out activity, we propose an online instance-based reinforcement learning method, namely virtual trail following (VTF). In VTF, action-value is generalized from recently stored instances of successful Track-Out activities. We also propose a collaborative VTF (cVTF) method, in which multiple robots store their own instances, and learn from the stored instances, in the same database. The proposed VTF and cVTF methods are compared with biased upwind surge (BUS) method, in which all Track-Out activities utilize an offline optimized universal bias angle, in an indoor environment with three different airflow fields. With respect to our experimental conditions, VTF and cVTF show stronger adaptability to different airflow environments than BUS, and furthermore, cVTF yields higher success rates and time-efficiencies than VTF. PMID:25825974
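
    The instance-based learning scheme can be sketched as follows: successful Track-Out activities are stored as (context, bias angle, reward) instances, and the next bias angle is generalized as a kernel-weighted average over recent instances. The context features, kernel, and constants below are illustrative, not the paper's; in the collaborative variant (cVTF), several robots would simply share one instance store.

    ```python
    # Illustrative instance-based learner for Track-Out bias angles.
    import numpy as np

    class VirtualTrailFollower:
        def __init__(self, bandwidth=0.5, capacity=50):
            self.instances = []                # (context, bias_angle, reward) triples
            self.bandwidth = bandwidth
            self.capacity = capacity

        def record(self, context, bias_angle, reward):
            self.instances.append((np.asarray(context, float), bias_angle, reward))
            self.instances = self.instances[-self.capacity:]   # keep recent instances

        def choose_bias(self, context, default=0.35):
            if not self.instances:
                return default                 # fall back to a fixed upwind-surge bias
            ctx = np.asarray(context, float)
            w = np.array([r * np.exp(-((c - ctx) ** 2).sum() / self.bandwidth ** 2)
                          for c, _, r in self.instances])
            if w.sum() <= 0:
                return default
            angles = np.array([a for _, a, _ in self.instances])
            return float(np.average(angles, weights=w))

    vtf = VirtualTrailFollower()
    vtf.record(context=[1.2, 0.3], bias_angle=0.4, reward=1.0)  # a successful Track-Out
    print(vtf.choose_bias(context=[1.1, 0.35]))
    ```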

  10. Database Development for Ocean Impacts: Imaging, Outreach and Rapid Response

    DTIC Science & Technology

    2011-09-30

    evaluate otolith structure and relationships to the swimbladder. • Oil samples from the Deepwater Horizon spill (C Reddy, Marine Chemistry...scanner has also been used in the last year to assist with “cold cases” for several law enforcement agencies. In these instances, ultra high

  11. Cross-Matching Source Observations from the Palomar Transient Factory (PTF)

    NASA Astrophysics Data System (ADS)

    Laher, Russ; Grillmair, C.; Surace, J.; Monkewitz, S.; Jackson, E.

    2009-01-01

    Over the four-year lifetime of the PTF project, approximately 40 billion instances of astronomical-source observations will be extracted from the image data. The instances will correspond to the same astronomical objects being observed at roughly 25-50 different times, and so a very large catalog containing important object-variability information will be the chief PTF product. Organizing astronomical-source catalogs is conventionally done by dividing the catalog into declination zones and sorting by right ascension within each zone (e.g., the USNO-A star catalog), in order to facilitate catalog searches. This method was reincarnated as the "zones" algorithm in a SQL-Server database implementation (Szalay et al., MSR-TR-2004-32), with corrections given by Gray et al. (MSR-TR-2006-52). The primary advantage of this implementation is that all of the work is done entirely on the database server and client/server communication is eliminated. We implemented the methods outlined in Gray et al. for a PostgreSQL database. We programmed the methods as database functions in the PL/pgSQL procedural language. The cross-matching is currently based on source positions, but we intend to extend it to use both positions and positional uncertainties to form a chi-square statistic for optimal thresholding. The database design includes three main tables, plus a handful of internal tables. The Sources table stores the SExtractor source extractions taken at various times; the MergedSources table stores statistics about the astronomical objects, which are the result of cross-matching records in the Sources table; and the Merges table associates cross-matched primary keys in the Sources table with primary keys in the MergedSources table. Besides judicious database indexing, we have also internally partitioned the Sources table by declination zone, in order to speed up the population of Sources records and make the database more manageable. The catalog will be accessible to the public after the proprietary period through IRSA (irsa.ipac.caltech.edu).
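
    The logic of the zones algorithm is easy to sketch outside the database (the production version runs as PL/pgSQL functions on the server, as described above): bucket sources into fixed-height declination zones, then compare a candidate only against sources in its own and neighboring zones. The zone height and match radius below are illustrative.

    ```python
    # Client-side sketch of the zones cross-match algorithm.
    from collections import defaultdict
    import math

    ZONE_HEIGHT = 30.0 / 3600.0                # 30-arcsec zones, in degrees

    def zone_of(dec):
        return int(math.floor(dec / ZONE_HEIGHT))

    def build_index(sources):                  # sources: [(ra_deg, dec_deg), ...]
        zones = defaultdict(list)
        for ra, dec in sources:
            zones[zone_of(dec)].append((ra, dec))
        return zones

    def crossmatch(ra, dec, zones, radius_deg=2.0 / 3600.0):
        matches = []
        for z in (zone_of(dec) - 1, zone_of(dec), zone_of(dec) + 1):
            for ra2, dec2 in zones.get(z, []):
                dra = (ra - ra2) * math.cos(math.radians(dec))  # RA scaled by cos(dec)
                if dra * dra + (dec - dec2) ** 2 <= radius_deg ** 2:
                    matches.append((ra2, dec2))
        return matches

    zones = build_index([(150.0001, 2.20001), (150.5, 2.5)])
    print(crossmatch(150.0, 2.2, zones))       # finds only the nearby source
    ```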

  12. Systems and methods for predicting materials properties

    DOEpatents

    Ceder, Gerbrand; Fischer, Chris; Tibbetts, Kevin; Morgan, Dane; Curtarolo, Stefano

    2007-11-06

    Systems and methods for predicting features of materials of interest. Reference data are analyzed to deduce relationships between the input data sets and output data sets. Reference data includes measured values and/or computed values. The deduced relationships can be specified as equations, correspondences, and/or algorithmic processes that produce appropriate output data when suitable input data is used. In some instances, the output data set is a subset of the input data set, and computational results may be refined by optionally iterating the computational procedure. To deduce features of a new material of interest, a computed or measured input property of the material is provided to an equation, correspondence, or algorithmic procedure previously deduced, and an output is obtained. In some instances, the output is iteratively refined. In some instances, new features deduced for the material of interest are added to a database of input and output data for known materials.

  13. Teaching Structured Design of Network Algorithms in Enhanced Versions of SQL

    ERIC Educational Resources Information Center

    de Brock, Bert

    2004-01-01

    From time to time developers of (database) applications will encounter, explicitly or implicitly, structures such as trees, graphs, and networks. Such applications can, for instance, relate to bills of material, organization charts, networks of (rail)roads, networks of conduit pipes (e.g., plumbing, electricity), telecom networks, and data…

  14. Simple re-instantiation of small databases using cloud computing.

    PubMed

    Tan, Tin Wee; Xie, Chao; De Silva, Mark; Lim, Kuan Siong; Patro, C Pawan K; Lim, Shen Jean; Govindarajan, Kunde Ramamoorthy; Tong, Joo Chuan; Choo, Khar Heng; Ranganathan, Shoba; Khan, Asif M

    2013-01-01

    Small bioinformatics databases, unlike institutionally funded large databases, are vulnerable to discontinuation and many reported in publications are no longer accessible. This leads to irreproducible scientific work and redundant effort, impeding the pace of scientific progress. We describe a Web-accessible system, available online at http://biodb100.apbionet.org, for archival and future on demand re-instantiation of small databases within minutes. Depositors can rebuild their databases by downloading a Linux live operating system (http://www.bioslax.com), preinstalled with bioinformatics and UNIX tools. The database and its dependencies can be compressed into an ".lzm" file for deposition. End-users can search for archived databases and activate them on dynamically re-instantiated BioSlax instances, run as virtual machines over the two popular full virtualization standard cloud-computing platforms, Xen Hypervisor or vSphere. The system is adaptable to increasing demand for disk storage or computational load and allows database developers to use the re-instantiated databases for integration and development of new databases. Herein, we demonstrate that a relatively inexpensive solution can be implemented for archival of bioinformatics databases and their rapid re-instantiation should the live databases disappear.

  15. Simple re-instantiation of small databases using cloud computing

    PubMed Central

    2013-01-01

    Background Small bioinformatics databases, unlike institutionally funded large databases, are vulnerable to discontinuation and many reported in publications are no longer accessible. This leads to irreproducible scientific work and redundant effort, impeding the pace of scientific progress. Results We describe a Web-accessible system, available online at http://biodb100.apbionet.org, for archival and future on demand re-instantiation of small databases within minutes. Depositors can rebuild their databases by downloading a Linux live operating system (http://www.bioslax.com), preinstalled with bioinformatics and UNIX tools. The database and its dependencies can be compressed into an ".lzm" file for deposition. End-users can search for archived databases and activate them on dynamically re-instantiated BioSlax instances, run as virtual machines over the two popular full virtualization standard cloud-computing platforms, Xen Hypervisor or vSphere. The system is adaptable to increasing demand for disk storage or computational load and allows database developers to use the re-instantiated databases for integration and development of new databases. Conclusions Herein, we demonstrate that a relatively inexpensive solution can be implemented for archival of bioinformatics databases and their rapid re-instantiation should the live databases disappear. PMID:24564380

  16. Nuclear Forensics Analysis with Missing and Uncertain Data

    DOE PAGES

    Langan, Roisin T.; Archibald, Richard K.; Lamberti, Vincent

    2015-10-05

    We have applied a new imputation-based method for analyzing incomplete data, called Monte Carlo Bayesian Database Generation (MCBDG), to the Spent Fuel Isotopic Composition (SFCOMPO) database. About 60% of the entries are absent for SFCOMPO. The method estimates missing values of a property from a probability distribution created from the existing data for the property, and then generates multiple instances of the completed database for training a machine learning algorithm. Uncertainty in the data is represented by an empirical or an assumed error distribution. The method makes few assumptions about the underlying data, and compares favorably against results obtained by replacing missing information with constant values.
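
    The generation step of MCBDG can be caricatured in a few lines: draw each missing entry from the empirical distribution of its column's observed values, and repeat the draw to produce multiple completed instances of the table for training. This simplified sketch omits the method's error-distribution modeling, and assumes every column has at least one observed value.

    ```python
    # Simplified multiple-imputation sketch (not the MCBDG implementation).
    import numpy as np

    def generate_completed_instances(table, n_instances=5, rng=None):
        """table: 2-D float array with np.nan marking missing entries."""
        rng = rng or np.random.default_rng()
        completed = []
        for _ in range(n_instances):
            filled = table.copy()
            for j in range(table.shape[1]):
                col = table[:, j]
                observed = col[~np.isnan(col)]
                missing = np.isnan(col)
                filled[missing, j] = rng.choice(observed, size=missing.sum())
            completed.append(filled)
        return completed

    t = np.array([[1.0, np.nan], [2.0, 5.0], [np.nan, 7.0]])
    for inst in generate_completed_instances(t, n_instances=3):
        print(inst)        # each completed instance can train one model in an ensemble
    ```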

  17. 32 CFR 293.6 - Procedures.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... database, and initiates the record search. If a final response cannot be made to the FOIA requester within... FOIA and the Privacy Act. Not all requesters will be knowledgeable of the appropriate act to cite when requesting records or access to records. In some instances, either the FOIA or the Privacy Act may be cited...

  18. 32 CFR 293.6 - Procedures.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... database, and initiates the record search. If a final response cannot be made to the FOIA requester within... FOIA and the Privacy Act. Not all requesters will be knowledgeable of the appropriate act to cite when requesting records or access to records. In some instances, either the FOIA or the Privacy Act may be cited...

  19. 32 CFR 293.6 - Procedures.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... database, and initiates the record search. If a final response cannot be made to the FOIA requester within... FOIA and the Privacy Act. Not all requesters will be knowledgeable of the appropriate act to cite when requesting records or access to records. In some instances, either the FOIA or the Privacy Act may be cited...

  20. Oracle Applications Patch Administration Tool (PAT) Beta Version

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    2002-01-04

    PAT is a Patch Administration Tool that provides analysis, tracking, and management of Oracle Application patches. This includes the capabilities outlined below. Administration: patch data maintenance -- track which Oracle Application patches were applied to which database instance and machine. Patch analysis: capture text files (readme.txt and driver files); form comparison detail; report comparison detail; PL/SQL package comparison detail; SQL script detail; JSP module comparison detail; parse and load the current applptch.txt (10.7) or load patch data from Oracle Application database patch tables (11i). Display analysis: compare the patch to be applied with the currently installed Oracle Application appl_top code versions; patch detail and module comparison detail; analyze and display one Oracle Application module patch. Patch management -- automatic queueing and execution of patches. Administration: parameter maintenance -- settings for the directory structure of the Oracle Application appl_top; validation data maintenance -- machine names and instances to patch. Operation: patch data maintenance; schedule a patch (queue for later execution); run a patch (queue for immediate execution); review the patch logs. Patch management reports.

  1. Learning to segment mouse embryo cells

    NASA Astrophysics Data System (ADS)

    León, Juan; Pardo, Alejandro; Arbeláez, Pablo

    2017-11-01

    Recent advances in microscopy enable the capture of temporal sequences during cell development stages. However, the study of such sequences is a complex and time-consuming task. In this paper we propose an automatic strategy to address the problem of semantic and instance segmentation of mouse embryos using NYU's Mouse Embryo Tracking Database. We obtain our instance proposals as refined predictions from the generalized Hough transform, using prior knowledge of the embryos' locations and their current cell stage. We use two main approaches to learn the priors: hand-crafted features and automatically learned features. Our strategy increases the baseline Jaccard index from 0.12 to 0.24 using hand-crafted features, and to 0.28 using automatically learned ones.

  2. Architectural Implications for Spatial Object Association Algorithms*

    PubMed Central

    Kumar, Vijay S.; Kurc, Tahsin; Saltz, Joel; Abdulla, Ghaleb; Kohn, Scott R.; Matarazzo, Celeste

    2013-01-01

    Spatial object association, also referred to as crossmatch of spatial datasets, is the problem of identifying and comparing objects in two or more datasets based on their positions in a common spatial coordinate system. In this work, we evaluate two crossmatch algorithms that are used for astronomical sky surveys, on the following database system architecture configurations: (1) Netezza Performance Server®, a parallel database system with active disk style processing capabilities, (2) MySQL Cluster, a high-throughput network database system, and (3) a hybrid configuration consisting of a collection of independent database system instances with data replication support. Our evaluation provides insights about how architectural characteristics of these systems affect the performance of the spatial crossmatch algorithms. We conducted our study using real use-case scenarios borrowed from a large-scale astronomy application known as the Large Synoptic Survey Telescope (LSST). PMID:25692244

  3. KA-SB: from data integration to large scale reasoning

    PubMed Central

    Roldán-García, María del Mar; Navas-Delgado, Ismael; Kerzazi, Amine; Chniber, Othmane; Molina-Castro, Joaquín; Aldana-Montes, José F

    2009-01-01

    Background: The analysis of information in the biological domain is usually focused on the analysis of data from single on-line data sources. Unfortunately, studying a biological process requires having access to disperse, heterogeneous, autonomous data sources. In this context, an analysis of the information is not possible without the integration of such data. Methods: KA-SB is a querying and analysis system for final users based on combining a data integration solution with a reasoner. Thus, the tool has been created with a process divided into two steps: 1) KOMF, the Khaos Ontology-based Mediator Framework, is used to retrieve information from heterogeneous and distributed databases; 2) the integrated information is crystallized in a (persistent and high-performance) reasoner (DBOWL). This information can then be further analyzed (by means of querying and reasoning). Results: In this paper we present a novel system that combines the use of a mediation system with the reasoning capabilities of a large-scale reasoner to provide a way of finding new knowledge and of analyzing the integrated information from different databases, which is retrieved as a set of ontology instances. The tool uses a graphical query interface that shows a graphical representation of the ontology and allows users to build queries easily by clicking on the ontology concepts. Conclusion: These kinds of systems (based on KOMF) will provide users with very large amounts of information (interpreted as ontology instances once retrieved), which cannot be managed using traditional main-memory-based reasoners. We propose a process for creating persistent and scalable knowledge bases from sets of OWL instances obtained by integrating heterogeneous data sources with KOMF. This process has been applied to develop a demo tool, which uses the BioPax Level 3 ontology as the integration schema, and integrates the UNIPROT, KEGG, CHEBI, BRENDA and SABIORK databases. PMID:19796402

  4. Observational Mishaps - a Database

    NASA Astrophysics Data System (ADS)

    von Braun, K.; Chiboucas, K.; Hurley-Keller, D.

    1999-05-01

    We present a World-Wide-Web-accessible database of astronomical images which suffer from a variety of observational problems. These problems range from common phenomena, such as dust grains on filters and/or dewar window, to more exotic cases like, for instance, deflated support airbags underneath the primary mirror. The purpose of this database is to enable astronomers at telescopes to save telescope time by discovering the nature of the trouble they might be experiencing with the help of this online catalog. Every observational mishap contained in this collection is presented in the form of a GIF image, a brief explanation of the problem, and, to the extent possible, a suggestion of what might be done to solve the problem and improve the image quality.

  5. Optimal tree increment models for the Northeastern United States

    Treesearch

    Don C. Bragg

    2003-01-01

    I used the potential relative increment (PRI) methodology to develop optimal tree diameter growth models for the Northeastern United States. Thirty species from the Eastwide Forest Inventory Database yielded 69,676 individuals, which were then reduced to fast-growing subsets for PRI analysis. For instance, only 14 individuals from the greater than 6,300-tree eastern...

  6. Optimal Tree Increment Models for the Northeastern United States

    Treesearch

    Don C. Bragg

    2005-01-01

    I used the potential relative increment (PRI) methodology to develop optimal tree diameter growth models for the Northeastern United States. Thirty species from the Eastwide Forest Inventory Database yielded 69,676 individuals, which were then reduced to fast-growing subsets for PRI analysis. For instance, only 14 individuals from the greater than 6,300-tree eastern...

  7. The Hazard Notification System (HANS)

    NASA Astrophysics Data System (ADS)

    Snedigar, S. F.; Venezky, D. Y.

    2009-12-01

    The Volcano Hazards Program (VHP) has developed a Hazard Notification System (HANS) for distributing volcanic activity information collected by scientists to airlines, emergency services, and the general public. In the past year, data from HANS have been used by airlines to make decisions about diverting or canceling flights during the eruption of Mount Redoubt. HANS was developed to provide a single system that each of the five U.S. volcano observatories could use for communicating and storing volcanic information about the 160+ potentially active U.S. volcanoes. The data, which cover ten tables and nearly 100 fields, are now stored in similar formats, and the information can be released in styles requested by our agency partners, such as the International Civil Aviation Organization (ICAO). Currently, HANS has about 4500 reports stored; on average, two to three reports are added daily. HANS (in its most basic form) consists of a user interface for entering data into one of many release types (Daily Status Reports, Weekly Updates, Volcano Activity Notifications, etc.); a database holding previous releases as well as observatory information such as email address lists and volcano boilerplates; and a transmission system for formatting releases and sending them out by email or other web-related systems. The user interface to HANS is completely web-based, providing access to our observatory scientists from any online PC. The underlying database stores the observatory information and drives the observatory and program websites' dynamic updates and archived information releases. HANS also runs scripts for generating several different feeds, including the program home page Volcano Status Map. Each observatory has the capability of running an instance of HANS. There are currently three instances of HANS, and each instance is synchronized to all other instances using a master-slave environment. Information can be entered on any node; slave nodes transmit data to the master node, and the master retransmits those data to all slave nodes. All data transfer between instances uses the Simple Object Access Protocol (SOAP) as the envelope in which data are transmitted between nodes. The HANS data synchronization not only works as a backup feature, but also acts as a simple fault-tolerant system. Information from any observatory can be entered on any instance and still be transmitted to the specified observatory's distribution list, which provides added flexibility if there is a disruption in access from an area that needs to send an update. Additionally, having the same information available on our multiple websites is necessary for communicating our scientists' most up-to-date information.

  8. Architectural Implications for Spatial Object Association Algorithms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kumar, V S; Kurc, T; Saltz, J

    2009-01-29

    Spatial object association, also referred to as cross-match of spatial datasets, is the problem of identifying and comparing objects in two or more datasets based on their positions in a common spatial coordinate system. In this work, we evaluate two crossmatch algorithms that are used for astronomical sky surveys, on the following database system architecture configurations: (1) Netezza Performance Server®, a parallel database system with active disk style processing capabilities, (2) MySQL Cluster, a high-throughput network database system, and (3) a hybrid configuration consisting of a collection of independent database system instances with data replication support. Our evaluation provides insights about how architectural characteristics of these systems affect the performance of the spatial crossmatch algorithms. We conducted our study using real use-case scenarios borrowed from a large-scale astronomy application known as the Large Synoptic Survey Telescope (LSST).

  9. Modernization and multiscale databases at the U.S. geological survey

    USGS Publications Warehouse

    Morrison, J.L.

    1992-01-01

    The U.S. Geological Survey (USGS) has begun a digital cartographic modernization program. Keys to that program are the creation of a multiscale database, a feature-based file structure that is derived from a spatial data model, and a series of "templates" or rules that specify the relationships between instances of entities in reality and features in the database. The database will initially hold data collected from the USGS standard map products at scales of 1:24,000, 1:100,000, and 1:2,000,000. The spatial data model is called the digital line graph-enhanced model, and the comprehensive rule set consists of collection rules, product generation rules, and conflict resolution rules. This modernization program will affect the USGS mapmaking process because both digital and graphic products will be created from the database. In addition, non-USGS map users will have more flexibility in uses of the databases. These remarks are those of the session discussant made in response to the six papers and the keynote address given in the session. © 1992.

  10. Highlights of the HITRAN2016 database

    NASA Astrophysics Data System (ADS)

    Gordon, I.; Rothman, L. S.; Hill, C.; Kochanov, R. V.; Tan, Y.

    2016-12-01

    The HITRAN2016 database will be released just before the AGU meeting. It is a titanic effort of worldwide collaboration between experimentalists, theoreticians and atmospheric scientists, who measure, calculate and validate the HITRAN data. The line-by-line lists for almost all of the HITRAN molecules were updated in comparison with the previous compilation, HITRAN2012 [1], which has been in use, along with some intermediate updates, since 2012. The extent of the updates ranges from updating a few lines of certain molecules to complete replacements of the lists and the introduction of additional isotopologues. Many more vibrational bands were added to the database, extending the spectral coverage and completeness of the datasets. For several molecules, including H2O, CO2 and CH4, the extent of the updates is so complex that separate task groups were assembled to make strategic decisions about the choices of sources for various parameters in different spectral regions. The number of parameters has also increased significantly, now incorporating, for instance, non-Voigt line profiles [2]; broadening by gases other than air and "self" [3]; and other phenomena, including line mixing. In addition, the number of cross-sectional datasets in the database has increased dramatically and includes many recent experiments as well as adaptations of existing databases that were not previously in HITRAN (for instance, the PNNL database [4]). The HITRAN2016 edition takes full advantage of the new structure and interface available at www.hitran.org [5] and the HITRAN Application Programming Interface [6]. This poster will provide a summary of the updates, emphasizing details of some of the most important or dramatic improvements. The users of the database will have an opportunity to discuss the updates relevant to their research and request a demonstration of how to work with the database. This work is supported by the NASA PATM (NNX13AI59G), PDART (NNX16AG51G) and AURA (NNX14AI55G) programs. References [1] L.S. Rothman et al., JQSRT 130, 4 (2013). [2] P. Wcisło et al., JQSRT 177, 75 (2016). [3] J.S. Wilzewski et al., JQSRT 168, 193 (2016). [4] S.W. Sharpe et al., Appl Spectrosc 58, 1452 (2004). [5] C. Hill et al., JQSRT 177, 4 (2016). [6] R.V. Kochanov et al., JQSRT 177, 15 (2016).

  11. Testing in Service-Oriented Environments

    DTIC Science & Technology

    2010-03-01

    software releases (versions, service packs, vulnerability patches) for one common ESB during the 13-month period from January 1, 2008 through...impact on quality of service: Unlike traditional software components, a single instance of a web service can be used by multiple consumers. Since the...distributed, with heterogeneous hardware and software (SOA infrastructure, services, operating systems, and databases). Because of cost and security, it

  12. Protein binding hot spots prediction from sequence only by a new ensemble learning method.

    PubMed

    Hu, Shan-Shan; Chen, Peng; Wang, Bing; Li, Jinyan

    2017-10-01

    Hot spots are interfacial core areas of binding proteins, which have been applied as targets in drug design. Experimental methods for locating hot spot areas are costly in both time and expense. Recently, in silico computational methods have been widely used for hot spot prediction through sequence or structure characterization. As protein structures are not always solved, hot spot identification from amino acid sequences alone is more useful for real-life applications. This work proposes a new sequence-based model that combines physicochemical features with the relative accessible surface area of amino acid sequences for hot spot prediction. The model consists of 83 classifiers involving the IBk (instance-based k-nearest neighbour) algorithm, where instances are encoded by important properties extracted from a total of 544 properties in the AAindex1 (Amino Acid Index) database. Top-performing classifiers are then selected to form an ensemble by a majority voting technique. The ensemble classifier outperforms the state-of-the-art computational methods, yielding an F1 score of 0.80 on the benchmark binding interface database (BID) test set. The method is available at http://www2.ahu.edu.cn/pchen/web/HotspotEC.htm.
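
    As a rough sketch of the ensemble scheme (not the authors' code), the following trains several instance-based k-nearest-neighbour classifiers on different feature subsets and combines their predictions by majority vote; the data, feature subsets, and ensemble size are synthetic stand-ins for the AAindex-derived encodings and the 83 classifiers in the paper.

```python
# Majority-vote ensembling over instance-based (k-NN) classifiers.
# Features and labels are synthetic stand-ins for sequence encodings.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 10))                       # 10 invented features
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)  # 1 = hot spot
X_test = rng.normal(size=(20, 10))

# Train one k-NN per feature subset; keep the "top performers" (here: all).
members = []
for subset in ([0, 1, 2], [1, 3, 4], [0, 5, 6]):
    clf = KNeighborsClassifier(n_neighbors=5).fit(X_train[:, subset], y_train)
    members.append((subset, clf))

# Majority vote across ensemble members.
votes = np.stack([clf.predict(X_test[:, s]) for s, clf in members])
y_pred = (votes.sum(axis=0) > len(members) / 2).astype(int)
print(y_pred)
```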

  13. Variant Alleles, Triallelic Patterns, and Point Mutations Observed in Nuclear Short Tandem Repeat Typing of Populations in Bosnia and Serbia

    PubMed Central

    Huel, René L. M.; Bašić, Lara; Madacki-Todorović, Kamelija; Smajlović, Lejla; Eminović, Izet; Berbić, Irfan; Miloš, Ana; Parsons, Thomas J.

    2007-01-01

    Aim To present a compendium of off-ladder alleles and other genotyping irregularities relating to rare/unexpected population genetic variation, observed in a large short tandem repeat (STR) database from Bosnia and Serbia. Methods DNA was extracted from blood stain cards relating to reference samples from a population of 32 800 individuals from Bosnia and Serbia, and typed using Promega’s PowerPlex®16 STR kit. Results Thirty-one distinct off-ladder alleles were observed in 10 of the 15 STR loci amplified with the PowerPlex®16 STR kit. Of these 31 alleles, 3 have not been previously reported. Furthermore, 16 instances of triallelic patterns were observed in 9 of the 15 loci. Primer binding site mismatches that affected amplification were observed in two loci, D5S818 and D8S1179. Conclusion Instances of deviations from the manufacturer’s allelic ladders should be expected, and caution should be taken to properly designate the correct alleles in large DNA databases. Particular care should be taken in kinship matching or paternity cases, as incorrect designation of any of these deviations from allelic ladders could lead to false exclusions. PMID:17696304

  14. Can genetic algorithms help virus writers reshape their creations and avoid detection?

    NASA Astrophysics Data System (ADS)

    Abu Doush, Iyad; Al-Saleh, Mohammed I.

    2017-11-01

    Different attack and defence techniques have evolved over time as actions and reactions between the black-hat and white-hat communities. Encryption, polymorphism, metamorphism and obfuscation are among the techniques used by attackers to bypass security controls; pattern matching, algorithmic scanning, emulation and heuristics are used by the defence team. The Antivirus (AV) is a vital security control used against a variety of threats. The AV mainly scans data against its database of virus signatures and claims a virus if a match is found. This paper seeks to find the minimal possible changes that can be made to a virus so that it appears normal when scanned by the AV. Brute-force search through all possible changes can be a computationally expensive task; alternatively, this paper applies a Genetic Algorithm to the problem. Our proposed algorithm is tested on seven different malware instances. The results show that in all the tested malware instances only a small change in each instance was enough to bypass the AV.

  15. Selective 4D modelling framework for spatial-temporal land information management system

    NASA Astrophysics Data System (ADS)

    Doulamis, Anastasios; Soile, Sofia; Doulamis, Nikolaos; Chrisouli, Christina; Grammalidis, Nikos; Dimitropoulos, Kosmas; Manesis, Charalambos; Potsiou, Chryssy; Ioannidis, Charalabos

    2015-06-01

    This paper introduces a predictive (selective) 4D modelling framework in which only the spatial 3D differences are modelled at forthcoming time instances, while regions with no significant spatial-temporal alterations remain intact. To accomplish this, spatial-temporal analysis is first applied between 3D digital models captured at different time instances, enabling the creation of dynamic change history maps. Change history maps indicate the spatial probability that a region will need further 3D modelling at forthcoming instances; they therefore support a predictive assessment, that is, localizing surfaces within the objects where a high-accuracy reconstruction process needs to be activated at the forthcoming time instances. The proposed 4D Land Information Management System (LIMS) is implemented using open interoperable standards based on the CityGML framework. CityGML allows the description of the semantic metadata information and the rights of the land resources. Visualization aspects are also supported to allow easy manipulation, interaction and representation of the 4D LIMS digital parcels and the respective semantic information. The open-source 3DCityDB, incorporating a PostgreSQL geo-database, is used to manage and manipulate 3D data and their semantics. An application is made to detect change through time in a 3D block of plots in an urban area of Athens, Greece. Starting with an accurate 3D model of the buildings in 1983, a change history map is created using automated dense image matching on aerial photos from 2010. For both time instances, meshes are created, and through their comparison the changes are detected.

  16. Exploring Short Linear Motifs Using the ELM Database and Tools.

    PubMed

    Gouw, Marc; Sámano-Sánchez, Hugo; Van Roey, Kim; Diella, Francesca; Gibson, Toby J; Dinkel, Holger

    2017-06-27

    The Eukaryotic Linear Motif (ELM) resource is dedicated to the characterization and prediction of short linear motifs (SLiMs). SLiMs are compact, degenerate peptide segments found in many proteins and essential to almost all cellular processes. However, despite their abundance, SLiMs remain largely uncharacterized. The ELM database is a collection of manually annotated SLiM instances curated from experimental literature. In this article we illustrate how to browse and search the database for curated SLiM data, and cover the different types of data integrated in the resource. We also cover how to use this resource in order to predict SLiMs in known as well as novel proteins, and how to interpret the results generated by the ELM prediction pipeline. The ELM database is a very rich resource, and in the following protocols we give helpful examples to demonstrate how this knowledge can be used to improve your own research. © 2017 by John Wiley & Sons, Inc.

  17. Guidelines for the Effective Use of Entity-Attribute-Value Modeling for Biomedical Databases

    PubMed Central

    Dinu, Valentin; Nadkarni, Prakash

    2007-01-01

    Purpose To introduce the goals of EAV database modeling, to describe the situations where Entity-Attribute-Value (EAV) modeling is a useful alternative to conventional relational methods of database modeling, and to describe the fine points of implementation in production systems. Methods We analyze the following circumstances: 1) data are sparse and have a large number of applicable attributes, but only a small fraction will apply to a given entity; 2) numerous classes of data need to be represented, each class has a limited number of attributes, but the number of instances of each class is very small. We also consider situations calling for a mixed approach where both conventional and EAV design are used for appropriate data classes. Results and Conclusions In robust production systems, EAV-modeled databases trade a modest data sub-schema for a complex metadata sub-schema. The need to design the metadata effectively makes EAV design potentially more challenging than conventional design. PMID:17098467
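
    A minimal sketch of the EAV layout may help; the tables, attributes, and patients below are invented, and a production biomedical system would carry a much richer metadata sub-schema than the single lookup table shown here.

```python
# Minimal EAV layout in SQLite: one narrow data table plus the metadata
# needed to interpret it. Table and attribute names are illustrative.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE attribute_meta (          -- metadata sub-schema
        attr TEXT PRIMARY KEY, datatype TEXT, units TEXT);
    CREATE TABLE eav (                     -- data sub-schema: one row per fact
        entity_id INTEGER, attr TEXT REFERENCES attribute_meta(attr),
        value TEXT);
""")
con.executemany("INSERT INTO attribute_meta VALUES (?,?,?)",
                [("heart_rate", "int", "bpm"), ("allergy", "text", None)])
con.executemany("INSERT INTO eav VALUES (?,?,?)",
                [(1, "heart_rate", "72"), (1, "allergy", "penicillin"),
                 (2, "heart_rate", "88")])  # sparse: patient 2 has no allergy row

# Pivoting sparse EAV rows back into a conventional row-modeled view:
for row in con.execute("""
    SELECT entity_id,
           MAX(CASE WHEN attr='heart_rate' THEN value END) AS heart_rate,
           MAX(CASE WHEN attr='allergy'    THEN value END) AS allergy
    FROM eav GROUP BY entity_id"""):
    print(row)   # (1, '72', 'penicillin') / (2, '88', None)
```

    The sparsity argument is visible directly: missing attributes simply have no row, instead of forcing a mostly NULL column onto every entity.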

  18. Handwriting generates variable visual output to facilitate symbol learning.

    PubMed

    Li, Julia X; James, Karin H

    2016-03-01

    Recent research has demonstrated that handwriting practice facilitates letter categorization in young children. The present experiments investigated why handwriting practice facilitates visual categorization by comparing 2 hypotheses: that handwriting exerts its facilitative effect because of the visual-motor production of forms, resulting in a direct link between motor and perceptual systems, or because handwriting produces variable visual instances of a named category in the environment that then changes neural systems. We addressed these issues by measuring performance of 5-year-old children on a categorization task involving novel, Greek symbols across 6 different types of learning conditions: 3 involving visual-motor practice (copying typed symbols independently, tracing typed symbols, tracing handwritten symbols) and 3 involving visual-auditory practice (seeing and saying typed symbols of a single typed font, of variable typed fonts, and of handwritten examples). We could therefore compare visual-motor production with visual perception both of variable and similar forms. Comparisons across the 6 conditions (N = 72) demonstrated that all conditions that involved studying highly variable instances of a symbol facilitated symbol categorization relative to conditions where similar instances of a symbol were learned, regardless of visual-motor production. Therefore, learning perceptually variable instances of a category enhanced performance, suggesting that handwriting facilitates symbol understanding by virtue of its environmental output: supporting the notion of developmental change through brain-body-environment interactions. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  19. Middle Ground on Gun Control

    DTIC Science & Technology

    2016-12-01

    make it into the NICS. The Department of Health and Human Services acknowledges that in some instances the wording of state laws will need to be...statics. This thesis examines the efficacy of adding more mental health information to the FBI’s database of persons who are prohibited from gun...This research finds that mental health information on clinical depression and schizophrenia can be a strong predictor of suicidal tendencies, and

  20. Using distant supervised learning to identify protein subcellular localizations from full-text scientific articles.

    PubMed

    Zheng, Wu; Blake, Catherine

    2015-10-01

    Databases of curated biomedical knowledge, such as the protein-locations reflected in the UniProtKB database, provide an accurate and useful resource to researchers and decision makers. Our goal is to augment the manual efforts currently used to curate knowledge bases with automated approaches that leverage the increased availability of full-text scientific articles. This paper describes experiments that use distant supervised learning to identify protein subcellular localizations, which are important to understand protein function and to identify candidate drug targets. Experiments consider Swiss-Prot, the manually annotated subset of the UniProtKB protein knowledge base, and 43,000 full-text articles from the Journal of Biological Chemistry that contain just under 11.5 million sentences. The system achieves 0.81 precision and 0.49 recall at sentence level and an accuracy of 57% on held-out instances in a test set. Moreover, the approach identifies 8210 instances that are not in the UniProtKB knowledge base. Manual inspection of the 50 most likely relations showed that 41 (82%) were valid. These results have immediate benefit to researchers interested in protein function, and suggest that distant supervision should be explored to complement other manual data curation efforts. Copyright © 2015 Elsevier Inc. All rights reserved.

  1. openBIS ELN-LIMS: an open-source database for academic laboratories.

    PubMed

    Barillari, Caterina; Ottoz, Diana S M; Fuentes-Serna, Juan Mariano; Ramakrishnan, Chandrasekhar; Rinn, Bernd; Rudolf, Fabian

    2016-02-15

    The open-source platform openBIS (open Biology Information System) offers an Electronic Laboratory Notebook and a Laboratory Information Management System (ELN-LIMS) solution suitable for academic life science laboratories. openBIS ELN-LIMS allows researchers to efficiently document their work, to describe materials and methods and to collect raw and analyzed data. The system comes with a user-friendly web interface where data can be added, edited, browsed and searched. The openBIS software, a user guide and a demo instance are available at https://openbis-eln-lims.ethz.ch. The demo instance contains some data from our laboratory as an example to demonstrate the possibilities of the ELN-LIMS (Ottoz et al., 2014). For rapid local testing, a VirtualBox image of the ELN-LIMS is also available. © The Author 2015. Published by Oxford University Press.

  2. Overcoming Species Boundaries in Peptide Identification with Bayesian Information Criterion-driven Error-tolerant Peptide Search (BICEPS)*

    PubMed Central

    Renard, Bernhard Y.; Xu, Buote; Kirchner, Marc; Zickmann, Franziska; Winter, Dominic; Korten, Simone; Brattig, Norbert W.; Tzur, Amit; Hamprecht, Fred A.; Steen, Hanno

    2012-01-01

    Currently, the reliable identification of peptides and proteins is only feasible when thoroughly annotated sequence databases are available. Although sequencing capacities continue to grow, many organisms remain without reliable, fully annotated reference genomes required for proteomic analyses. Standard database search algorithms fail to identify peptides that are not exactly contained in a protein database. De novo searches are generally hindered by their restricted reliability, and current error-tolerant search strategies are limited by global, heuristic tradeoffs between database and spectral information. We propose a Bayesian information criterion-driven error-tolerant peptide search (BICEPS) and offer an open source implementation based on this statistical criterion to automatically balance the information of each single spectrum and the database, while limiting the run time. We show that BICEPS performs as well as current database search algorithms when such algorithms are applied to sequenced organisms, whereas BICEPS only uses a remotely related organism database. For instance, we use a chicken instead of a human database corresponding to an evolutionary distance of more than 300 million years (International Chicken Genome Sequencing Consortium (2004) Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature 432, 695–716). We demonstrate the successful application to cross-species proteomics with a 33% increase in the number of identified proteins for a filarial nematode sample of Litomosoides sigmodontis. PMID:22493179

  3. Spatial cyberinfrastructures, ontologies, and the humanities.

    PubMed

    Sieber, Renee E; Wellen, Christopher C; Jin, Yuan

    2011-04-05

    We report on research into building a cyberinfrastructure for Chinese biographical and geographic data. Our cyberinfrastructure contains (i) the McGill-Harvard-Yenching Library Ming Qing Women's Writings database (MQWW), the only online database on historical Chinese women's writings, (ii) the China Biographical Database, the authority for Chinese historical people, and (iii) the China Historical Geographical Information System, one of the first historical geographic information systems. Key to this integration is that linked databases retain separate identities as bases of knowledge, while they possess sufficient semantic interoperability to allow for multidatabase concepts and to support cross-database queries on an ad hoc basis. Computational ontologies create underlying semantics for database access. This paper focuses on the spatial component in a humanities cyberinfrastructure, which includes issues of conflicting data, heterogeneous data models, disambiguation, and geographic scale. First, we describe the methodology for integrating the databases. Then we detail the system architecture, which includes a tier of ontologies and schema. We describe the user interface and applications that allow for cross-database queries. For instance, users should be able to analyze the data, examine hypotheses on spatial and temporal relationships, and generate historical maps with datasets from MQWW for research, teaching, and publication on Chinese women writers, their familial relations, publishing venues, and the literary and social communities. Last, we discuss the social side of cyberinfrastructure development, as people are considered to be as critical as the technical components for its success.

  4. Monitoring of IaaS and scientific applications on the Cloud using the Elasticsearch ecosystem

    NASA Astrophysics Data System (ADS)

    Bagnasco, S.; Berzano, D.; Guarise, A.; Lusso, S.; Masera, M.; Vallero, S.

    2015-05-01

    The private Cloud at the Torino INFN computing centre offers IaaS services to different scientific computing applications. The infrastructure is managed with the OpenNebula cloud controller. The main stakeholders of the facility are a grid Tier-2 site for the ALICE collaboration at LHC, an interactive analysis facility for the same experiment and a grid Tier-2 site for the BES-III collaboration, plus an increasing number of other small tenants. Besides keeping track of usage, automating the dynamic allocation of resources to tenants requires detailed monitoring and accounting of resource usage. As a first investigation towards this, we set up a monitoring system to inspect the site activities both in terms of IaaS and of applications running on the hosted virtual instances. For this purpose we used the Elasticsearch, Logstash and Kibana stack. In the current implementation, the heterogeneous accounting information is fed to different MySQL databases and sent to Elasticsearch via a custom Logstash plugin. For IaaS metering, we developed sensors for the OpenNebula API. The IaaS-level information gathered through the API is sent to the MySQL database through an ad hoc RESTful web service, which is also used for other accounting purposes. Concerning the application level, we used the Root plugin TProofMonSenderSQL to collect accounting data from the interactive analysis facility. The BES-III virtual instances used to be monitored with Zabbix; as a proof of concept, we also retrieve the information contained in the Zabbix database. Each of these three cases is indexed separately in Elasticsearch. We are now considering dismissing the intermediate level provided by the SQL database and evaluating a NoSQL option as a single central database for all the monitoring information. We set up Kibana dashboards with pre-defined queries in order to monitor the relevant information in each case. In this way we have achieved a uniform monitoring interface for both the IaaS and the scientific applications, mostly leveraging off-the-shelf tools.
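
    As an illustration of the indexing step (a hedged sketch, not the site's custom Logstash plugin), the following assumes a local Elasticsearch node and an invented iaas-accounting index with invented document fields; the document= keyword matches elasticsearch-py 8.x, while older clients use body=.

```python
# Ship one accounting record to Elasticsearch, then run a Kibana-style
# aggregation. Index name and fields are invented for illustration.
from datetime import datetime, timezone
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

record = {                      # one IaaS metering sample, e.g. from OpenNebula
    "tenant": "alice-tier2",
    "vm_id": 4242,
    "cpu_hours": 0.25,
    "timestamp": datetime.now(timezone.utc).isoformat(),
}
es.index(index="iaas-accounting", document=record)

# Total CPU hours per tenant (assumes default dynamic mapping with .keyword).
resp = es.search(index="iaas-accounting", size=0, aggregations={
    "per_tenant": {"terms": {"field": "tenant.keyword"},
                   "aggs": {"cpu": {"sum": {"field": "cpu_hours"}}}}})
for bucket in resp["aggregations"]["per_tenant"]["buckets"]:
    print(bucket["key"], bucket["cpu"]["value"])
```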

  5. Image-based query-by-example for big databases of galaxy images

    NASA Astrophysics Data System (ADS)

    Shamir, Lior; Kuminski, Evan

    2017-01-01

    Very large astronomical databases containing millions or even billions of galaxy images have become increasingly important tools in astronomy research. However, in many cases their very large size makes it difficult to analyze these data manually, reinforcing the need for computer algorithms that can automate the data analysis process. An example of such a task is the identification of galaxies of a certain morphology of interest. For instance, if a rare galaxy is identified, it is reasonable to expect that more galaxies of similar morphology exist in the database, but it is virtually impossible to search these databases manually to identify such galaxies. Here we describe computer vision and pattern recognition methodology that receives a galaxy image as input and automatically searches a large dataset of galaxies to return a list of galaxies that are visually similar to the query galaxy. The returned list is not necessarily complete or clean, but it provides a substantial reduction of the original database into a smaller dataset in which the frequency of objects visually similar to the query galaxy is much higher. Experimental results show that the algorithm can identify rare galaxies such as ring galaxies among datasets of 10,000 astronomical objects.
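
    The query-by-example idea reduces to nearest-neighbour search over numeric image descriptors. The sketch below assumes the descriptor extraction (texture, morphology, and so on) has already happened and uses random stand-in vectors.

```python
# Generic image query-by-example: rank a database of descriptor vectors by
# distance to the query descriptor. Descriptors here are random stand-ins.
import numpy as np

rng = np.random.default_rng(1)
database = rng.normal(size=(10_000, 128))      # 10,000 galaxies x 128 features
query = database[1234] + rng.normal(scale=0.01, size=128)  # a near-duplicate

# Euclidean distance of the query to every database descriptor.
dists = np.linalg.norm(database - query, axis=1)
top = np.argsort(dists)[:50]                   # 50 most visually similar
print(top[0])                                  # -> 1234, the planted neighbor
```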

  6. Corruption of genomic databases with anomalous sequence.

    PubMed

    Lamperti, E D; Kittelberger, J M; Smith, T F; Villa-Komaroff, L

    1992-06-11

    We describe evidence that DNA sequences from vectors used for cloning and sequencing have been incorporated accidentally into eukaryotic entries in the GenBank database. These incorporations were not restricted to one type of vector or to a single mechanism. Many minor instances may have been the result of simple editing errors, but some entries contained large blocks of vector sequence that had been incorporated by contamination or other accidents during cloning. Some cases involved unusual rearrangements and areas of vector distant from the normal insertion sites. Matches to vector were found in 0.23% of 20,000 sequences analyzed in GenBank Release 63. Although the possibility of anomalous sequence incorporation has been recognized since the inception of GenBank and should be easy to avoid, recent evidence suggests that this problem is increasing more quickly than the database itself. The presence of anomalous sequence may have serious consequences for the interpretation and use of database entries, and will have an impact on issues of database management. The incorporated vector fragments described here may also be useful for a crude estimate of the fidelity of sequence information in the database. In alignments with well-defined ends, the matching sequences showed 96.8% identity to vector; when poorer matches with arbitrary limits were included, the aggregate identity to vector sequence was 94.8%.

  7. A digital version of the 1970 U.S. Geological Survey topographic map of the San Francisco Bay region, three sheets, 1:125,000

    USGS Publications Warehouse

    Aitken, Douglas S.

    1997-01-01

    This Open-File report is a digital topographic map database. It contains a digital version of the 1970 U.S. Geological Survey topographic map of the San Francisco Bay Region (3 sheets), at a scale of 1:125,000. These ARC/INFO coverages are in vector format. The vectorization process has distorted characters representing letters and numbers, as well as some road and other symbols, making them difficult to read in some instances. This pamphlet serves to introduce and describe the digital data. There is no paper map included in the Open-File report. The content and character of the database and methods of obtaining it are described herein.

  8. The design and implementation of EPL: An event pattern language for active databases

    NASA Technical Reports Server (NTRS)

    Giuffrida, G.; Zaniolo, C.

    1994-01-01

    The growing demand for intelligent information systems requires closer coupling of rule-based reasoning engines, such as CLIPS, with advanced database management systems (DBMS). For instance, several commercial DBMS now support the notion of triggers that monitor events and transactions occurring in the database and fire induced actions. These actions perform a variety of critical functions, including safeguarding the integrity of data, monitoring access, and recording volatile information needed by administrators, analysts, and expert systems to perform assorted tasks; examples of these tasks include security enforcement, market studies, knowledge discovery, and link analysis. At UCLA, we designed and implemented the event pattern language (EPL), which is capable of detecting and acting upon complex patterns of events that are temporally related to each other. For instance, a plant manager should be notified when a certain pattern of overheating repeats itself over time in a chemical process; likewise, proper notification is required when a suspicious sequence of bank transactions is executed within a certain time limit. The EPL prototype is built in CLIPS to operate on top of Sybase, a commercial relational DBMS, where actions can be triggered by events such as simple database updates, insertions, and deletions. The rule-based syntax of EPL allows the sequences of goals in rules to be interpreted as sequences of temporal events; each goal can correspond to either (1) a simple event, (2) a (possibly negated) event/condition predicate, or (3) a complex event defined as the disjunction and repetition of other events. Various extensions have been added to CLIPS to tailor the interface with Sybase and its open client/server architecture.
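
    A toy Python illustration (not EPL's CLIPS-based syntax) of the kind of temporal pattern EPL expresses, such as the repeated-overheating example: fire an action when k events of a given kind fall within a sliding time window.

```python
# Fire when `k` "overheat" events occur within a sliding time window.
from collections import deque

def monitor(events, k=3, window=60.0):
    """events: iterable of (timestamp_seconds, kind); yields firing times."""
    recent = deque()
    for t, kind in events:
        if kind != "overheat":
            continue
        recent.append(t)
        while recent and t - recent[0] > window:  # drop events outside window
            recent.popleft()
        if len(recent) >= k:
            yield t                               # pattern detected: notify

stream = [(0, "overheat"), (10, "update"), (25, "overheat"), (50, "overheat")]
print(list(monitor(stream)))  # -> [50]: three overheats within 60 seconds
```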

  9. [Utility of axial images in an early Alzheimer disease diagnosis support system (VSRAD)].

    PubMed

    Goto, Masami; Aoki, Shigeki; Abe, Osamu; Masumoto, Tomohiko; Watanabe, Yasushi; Satake, Yoshiroh; Nishida, Katsuji; Ino, Kenji; Yano, Keiichi; Iida, Kyohhito; Mima, Kazuo; Ohtomo, Kuni

    2006-09-20

    In recent years, voxel-based morphometry (VBM) has become a popular tool for the early diagnosis of Alzheimer disease. The Voxel-Based Specific Regional Analysis System for Alzheimer's Disease (VSRAD), a VBM system that uses MRI, has been reported to be clinically useful. The able-bodied person database (DB) of VSRAD, which employs sagittal plane imaging, is not suitable for analysis by axial plane imaging. However, axial plane imaging is useful for avoiding motion artifacts from the eyeball. Therefore, we created an able-bodied person DB by axial plane imaging and examined its utility. We also analyzed groups of able-bodied persons and persons with dementia by axial plane imaging and reviewed the validity. After using the DB of axial plane imaging, the Z-score of the intrahippocampal region improved by 8 in 13 instances. In all brains, the Z-score improved by 13 in all instances.

  10. Evaluation of the performance of open-source RDBMS and triplestores for storing medical data over a web service.

    PubMed

    Kilintzis, Vassilis; Beredimas, Nikolaos; Chouvarda, Ioanna

    2014-01-01

    An integral part of a system that manages medical data is the persistent storage engine. For almost twenty-five years, Relational Database Management Systems (RDBMS) were considered the obvious choice, yet today new technologies have emerged that deserve attention as possible alternatives. Triplestores store information in terms of RDF triples without necessarily binding to a specific predefined structural model. In this paper we compare the performance of the Apache JENA-Fuseki and Virtuoso Universal Server 6 triplestores with that of the MySQL 5.6 RDBMS for storing and retrieving medical information that is communicated as RDF/XML ontology instances over a RESTful web service. The results show that the performance, calculated as the average time to store and retrieve instances, was significantly better with Virtuoso Server, while MySQL performed better than Fuseki.

  11. Columba: an integrated database of proteins, structures, and annotations.

    PubMed

    Trissl, Silke; Rother, Kristian; Müller, Heiko; Steinke, Thomas; Koch, Ina; Preissner, Robert; Frömmel, Cornelius; Leser, Ulf

    2005-03-31

    Structural and functional research often requires the computation of sets of protein structures based on certain properties of the proteins, such as sequence features, fold classification, or functional annotation. Compiling such sets using current web resources is tedious because the necessary data are spread over many different databases. To facilitate this task, we have created COLUMBA, an integrated database of annotations of protein structures. COLUMBA currently integrates twelve different databases, including PDB, KEGG, Swiss-Prot, CATH, SCOP, the Gene Ontology, and ENZYME. The database can be searched using either keyword search or data source-specific web forms. Users can thus quickly select and download PDB entries that, for instance, participate in a particular pathway, are classified as containing a certain CATH architecture, are annotated as having a certain molecular function in the Gene Ontology, and whose structures have a resolution under a defined threshold. The results of queries are provided in both machine-readable extensible markup language and human-readable format. The structures themselves can be viewed interactively on the web. The COLUMBA database facilitates the creation of protein structure datasets for many structure-based studies and allows queries to be combined across a number of structure-related databases not covered by other projects at present. Thus, information on both many and few protein structures can be used efficiently. The web interface for COLUMBA is available at http://www.columba-db.de.

  12. Reuse of the Cloud Analytics and Collaboration Environment within Tactical Applications (TacApps): A Feasibility Analysis

    DTIC Science & Technology

    2016-03-01

    Representational state transfer; Java messaging service; Java application programming interface (API); Internet relay chat (IRC)/extensible messaging and...JBoss application server or an Apache Tomcat servlet container instance. The relational database management system can be either PostgreSQL or MySQL... Java library called direct web remoting. This library has been part of the core CACE architecture for quite some time; however, there have not been

  13. Digital plagiarism - The web giveth and the web shall taketh

    PubMed Central

    Presti, David E

    2000-01-01

    Publishing students' and researchers' papers on the World Wide Web (WWW) facilitates the sharing of information within and between academic communities. However, the ease of copying and transporting digital information leaves these authors' ideas open to plagiarism. Using tools such as the Plagiarism.org database, which compares submissions to reports and papers available on the Internet, could discover instances of plagiarism, revolutionize the peer review process, and raise the quality of published research everywhere. PMID:11720925

  14. Digital plagiarism--the Web giveth and the Web shall taketh.

    PubMed

    Barrie, J M; Presti, D E

    2000-01-01

    Publishing students' and researchers' papers on the World Wide Web (WWW) facilitates the sharing of information within and between academic communities. However, the ease of copying and transporting digital information leaves these authors' ideas open to plagiarism. Using tools such as the Plagiarism.org database, which compares submissions to reports and papers available on the Internet, could discover instances of plagiarism, revolutionize the peer review process, and raise the quality of published research everywhere.

  15. Spatial cyberinfrastructures, ontologies, and the humanities

    PubMed Central

    Sieber, Renee E.; Wellen, Christopher C.; Jin, Yuan

    2011-01-01

    We report on research into building a cyberinfrastructure for Chinese biographical and geographic data. Our cyberinfrastructure contains (i) the McGill-Harvard-Yenching Library Ming Qing Women's Writings database (MQWW), the only online database on historical Chinese women's writings, (ii) the China Biographical Database, the authority for Chinese historical people, and (iii) the China Historical Geographical Information System, one of the first historical geographic information systems. Key to this integration is that linked databases retain separate identities as bases of knowledge, while they possess sufficient semantic interoperability to allow for multidatabase concepts and to support cross-database queries on an ad hoc basis. Computational ontologies create underlying semantics for database access. This paper focuses on the spatial component in a humanities cyberinfrastructure, which includes issues of conflicting data, heterogeneous data models, disambiguation, and geographic scale. First, we describe the methodology for integrating the databases. Then we detail the system architecture, which includes a tier of ontologies and schema. We describe the user interface and applications that allow for cross-database queries. For instance, users should be able to analyze the data, examine hypotheses on spatial and temporal relationships, and generate historical maps with datasets from MQWW for research, teaching, and publication on Chinese women writers, their familial relations, publishing venues, and the literary and social communities. Last, we discuss the social side of cyberinfrastructure development, as people are considered to be as critical as the technical components for its success. PMID:21444819

  16. Biomedical question answering using semantic relations.

    PubMed

    Hristovski, Dimitar; Dinevski, Dejan; Kastrin, Andrej; Rindflesch, Thomas C

    2015-01-16

    The proliferation of the scientific literature in the field of biomedicine makes it difficult to keep abreast of current knowledge, even for domain experts. While general Web search engines and specialized information retrieval (IR) systems have made important strides in recent decades, the problem of accurate knowledge extraction from the biomedical literature is far from solved. Classical IR systems usually return a list of documents that have to be read by the user to extract relevant information. This tedious and time-consuming work can be lessened with automatic Question Answering (QA) systems, which aim to provide users with direct and precise answers to their questions. In this work we propose a novel methodology for QA based on semantic relations extracted from the biomedical literature. We extracted semantic relations with the SemRep natural language processing system from 122,421,765 sentences, which came from 21,014,382 MEDLINE citations (i.e., the complete MEDLINE distribution up to the end of 2012). A total of 58,879,300 semantic relation instances were extracted and organized in a relational database. The QA process is implemented as a search in this database, which is accessed through a Web-based application, called SemBT (available at http://sembt.mf.uni-lj.si). We conducted an extensive evaluation of the proposed methodology in order to estimate the accuracy of extracting a particular semantic relation from a particular sentence. Evaluation was performed by 80 domain experts. In total 7,510 semantic relation instances belonging to 2,675 distinct relations were evaluated 12,083 times. The instances were evaluated as correct 8,228 times (68%). In this work we propose an innovative methodology for biomedical QA. The system is implemented as a Web-based application that is able to provide precise answers to a wide range of questions. A typical question is answered within a few seconds. The tool has some extensions that make it especially useful for interpretation of DNA microarray results.

  17. Cloud-Based NoSQL Open Database of Pulmonary Nodules for Computer-Aided Lung Cancer Diagnosis and Reproducible Research.

    PubMed

    Ferreira Junior, José Raniery; Oliveira, Marcelo Costa; de Azevedo-Marques, Paulo Mazzoncini

    2016-12-01

    Lung cancer is the leading cause of cancer-related deaths in the world, and its main manifestation is pulmonary nodules. Detection and classification of pulmonary nodules are challenging tasks that must be done by qualified specialists, but image interpretation errors make those tasks difficult. In order to aid radiologists in those hard tasks, it is important to integrate computer-based tools with the lesion detection, pathology diagnosis, and image interpretation processes. However, computer-aided diagnosis research faces the problem of not having enough shared medical reference data for the development, testing, and evaluation of computational methods for diagnosis. In order to minimize this problem, this paper presents a public nonrelational, document-oriented, cloud-based database of pulmonary nodules characterized by 3D texture attributes, identified by experienced radiologists and classified in nine different subjective characteristics by the same specialists. Our goal with the development of this database is to improve computer-aided lung cancer diagnosis and pulmonary nodule detection and classification research through the deployment of this database in a cloud Database as a Service framework. Pulmonary nodule data were provided by the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI), image descriptors were acquired by volumetric texture analysis, and the database schema was developed using a document-oriented Not only Structured Query Language (NoSQL) approach. The proposed database currently contains 379 exams, 838 nodules, and 8237 images, of which 4029 are CT scans and 4208 are manually segmented nodules, and it is hosted in a MongoDB instance on a cloud infrastructure.
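
    A hypothetical sketch of what one nodule document might look like in such a collection follows; the field names are invented for illustration and are not the published schema.

```python
# One pulmonary-nodule document in a MongoDB collection, plus a query that
# needs no JOINs. All field names are invented for illustration.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
nodules = client["lidc_idri"]["nodules"]

nodules.insert_one({
    "exam_id": "LIDC-0001",
    "nodule_id": 1,
    "subjective_ratings": {"malignancy": 4, "spiculation": 2},  # 9 in total
    "texture_attributes": [0.83, 0.12, 0.47],   # 3D texture descriptor
    "images": [{"kind": "ct_scan", "uri": "..."},
               {"kind": "segmented_nodule", "uri": "..."}],
})

# Schema-less query: find highly malignant nodules.
for doc in nodules.find({"subjective_ratings.malignancy": {"$gte": 4}}):
    print(doc["exam_id"], doc["nodule_id"])
```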

  18. Genetics and Forensics: Making the National DNA Database

    PubMed Central

    Johnson, Paul; Williams, Robin; Martin, Paul

    2005-01-01

    This paper is based on a current study of the growing police use of the epistemic authority of molecular biology for the identification of criminal suspects in support of crime investigation. It discusses the development of DNA profiling and the establishment and development of the UK National DNA Database (NDNAD) as an instance of the ‘scientification of police work’ (Ericson and Shearing 1986) in which the police uses of science and technology have a recursive effect on their future development. The NDNAD, owned by the Association of Chief Police Officers of England and Wales, is the first of its kind in the world and currently contains the genetic profiles of more than 2 million people. The paper provides a framework for the examination of this socio-technical innovation, begins to tease out the dense and compact history of the database and accounts for the way in which changes and developments across disparate scientific, governmental and policing contexts, have all contributed to the range of uses to which it is put. PMID:16467921

  19. Interactive Radiology teaching file system: the development of a MIRC-compliant and user-centered e-learning resource.

    PubMed

    dos-Santos, M; Fujino, A

    2012-01-01

    Radiology teaching usually employs a systematic and comprehensive set of medical images and related information. Databases with representative radiological images and documents are highly desirable and widely used in Radiology teaching programs, and computer-based teaching file systems are now widely used in Medicine and Radiology teaching as an educational resource. This work addresses a user-centered radiology electronic teaching file system as an instance of a MIRC-compliant medical image database. As in a digital library, the clinical cases are accessible through a web browser. The system has offered Radiology residents valuable opportunities to interact with experts; this has been achieved by applying user-centered techniques and creating usage-context-based tools to make an interactive system available.

  20. Automated Data Aggregation for Time-Series Analysis: Study Case on Anaesthesia Data Warehouse.

    PubMed

    Lamer, Antoine; Jeanne, Mathieu; Ficheur, Grégoire; Marcilly, Romaric

    2016-01-01

    Data stored in operational databases are not directly reusable. Aggregation modules are necessary to facilitate secondary use: they decrease the volume of data while increasing the amount of available information. In this paper, we present four automated aggregation engines integrated into an anaesthesia data warehouse. Four instances of clinical questions illustrate the use of these engines for various improvements in quality of care: duration of procedure, drug administration, and assessment of hypotension and its related treatment.
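
    As a generic illustration of such an aggregation engine (a sketch under invented signal names and rates, not the warehouse's actual engines), high-frequency vitals can be collapsed into per-minute summaries and screened for hypotensive episodes:

```python
# Collapse a 1 Hz vital-signs series into per-minute summaries.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
idx = pd.date_range("2016-01-01 08:00", periods=1800, freq="s")  # 30 min @ 1 Hz
bp = pd.Series(80 + rng.normal(scale=8, size=idx.size), index=idx,
               name="mean_arterial_pressure")

per_min = bp.resample("1min").agg(["mean", "min"])  # 1800 rows -> 30 rows
hypotensive = per_min[per_min["min"] < 65]          # flag candidate episodes
print(len(per_min), len(hypotensive))
```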

  1. Variability sensitivity of dynamic texture based recognition in clinical CT data

    NASA Astrophysics Data System (ADS)

    Kwitt, Roland; Razzaque, Sharif; Lowell, Jeffrey; Aylward, Stephen

    2014-03-01

    Dynamic texture recognition using a database of template models has recently shown promising results for the task of localizing anatomical structures in Ultrasound video. In order to understand its clinical value, it is imperative to study its sensitivity with respect to inter-patient variability as well as to acquisition parameters such as Ultrasound probe angle. Fully addressing patient and acquisition variability issues, however, would require a large database of clinical Ultrasound from many patients, acquired in a multitude of controlled conditions, e.g., using a tracked transducer. Since such data are not readily attainable, we advocate an alternative evaluation strategy using abdominal CT data as a surrogate. In this paper, we describe how to replicate Ultrasound variabilities by extracting subvolumes from CT and interpreting the image material as an ordered sequence of video frames. Utilizing this technique, and based on a database of abdominal CT from 45 patients, we report results on an organ (kidney) recognition task, in which we try to discriminate kidney subvolumes/videos from a collection of randomly sampled negative instances. We demonstrate that (1) dynamic texture recognition is relatively insensitive to inter-patient variation, while (2) viewing angle variability needs to be accounted for in the template database. Since naively extending the template database to counteract variability issues can lead to impractical database sizes, we propose an alternative strategy based on automated identification of a small set of representative models.

  2. Palaeo sea-level and ice-sheet databases: problems, strategies and perspectives

    NASA Astrophysics Data System (ADS)

    Rovere, Alessio; Düsterhus, André; Carlson, Anders; Barlow, Natasha; Bradwell, Tom; Dutton, Andrea; Gehrels, Roland; Hibbert, Fiona; Hijma, Marc; Horton, Benjamin; Klemann, Volker; Kopp, Robert; Sivan, Dorit; Tarasov, Lev; Törnqvist, Torbjorn

    2016-04-01

    Databases of palaeoclimate data have driven many major developments in understanding the Earth system. The measurement and interpretation of the palaeo sea-level and ice-sheet data that form such databases pose considerable challenges to the scientific communities that use them for further analyses. In this paper, we build on the experience of the PALSEA (PALeo constraints on SEA level rise) community, a working group within the PAGES (Past Global Changes) project, to describe the challenges and best strategies for building a self-consistent and standardised database of geological and geochemical data related to palaeo sea levels and ice sheets. Our aim is to identify key points that need attention, and subsequent funding, when undertaking the task of database creation. We conclude that any sea-level or ice-sheet database must be divided into three instances: i) measurement; ii) interpretation; iii) database creation. Measurement should include position, age, description of geological features, and quantification of uncertainties, all described as objectively as possible. Interpretation can be subjective, but it should always include uncertainties and all possible interpretations, without unjustified a priori exclusions. We propose that the creation of a database adopt an approach based on Accessibility, Transparency, Trust, Availability, Continued updating, Completeness and Communication of content (ATTAC3). It is also essential to consider the community structure that creates and benefits from a database. We conclude that funding sources should address not only the creation of original data in specific research-question-oriented projects, but also allow part of the funding to be used for IT-related and database-creation tasks, which are essential to guarantee the accessibility and maintenance of the collected data.

  3. Integrating forensic information in a crime intelligence database.

    PubMed

    Rossy, Quentin; Ioset, Sylvain; Dessimoz, Damien; Ribaux, Olivier

    2013-07-10

    Since 2008, intelligence units of six states of the western part of Switzerland have been sharing a common database for the analysis of high-volume crimes. On a daily basis, events reported to the police are analysed, filtered and classified to detect crime repetitions and interpret the crime environment. Several forensic outcomes are integrated in the system, such as matches of traces with persons, and links between scenes detected by the comparison of forensic case data. Systematic procedures have been established to integrate links assumed mainly through DNA profiles, shoemark patterns and images. A statistical outlook on a retrospective dataset of series from 2009 to 2011 informs, for instance, on the number of repetitions detected or confirmed and augmented by forensic case data. The time needed to obtain forensic intelligence, which depends on the type of marks treated, is seen as a critical issue. Furthermore, the underlying integration of forensic intelligence into the crime intelligence database raised several difficulties regarding the acquisition of data and the models used in the forensic databases. The solutions found and the operational procedures adopted are described and discussed. This process forms the basis for many other research efforts aimed at developing forensic intelligence models. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  4. Longitudinal data for interdisciplinary ageing research. Design of the Linnaeus Database.

    PubMed

    Malmberg, Gunnar; Nilsson, Lars-Göran; Weinehall, Lars

    2010-11-01

    To allow for interdisciplinary research on the relations between socioeconomic conditions and health in the ageing population, a new anonymized longitudinal database - the Linnaeus Database - has been developed at the Centre for Population Studies at Umeå University. This paper presents the database and its research potential. Using Swedish personal identity numbers, the researchers have, in collaboration with Statistics Sweden and the National Board for Health and Welfare, linked individual records from Swedish register data on death causes, hospitalization and various socioeconomic conditions with two databases - Betula and VIP (Västerbottens Intervention Programme) - previously developed by researchers at Umeå University. Whereas Betula includes rich information about, e.g., cognitive functions, VIP contains information about, e.g., lifestyle and health indicators. The Linnaeus Database includes annually updated socioeconomic information from Statistics Sweden registers for all registered residents of Sweden for the period 1990 to 2006, 12,066,478 individuals in total. The information from Betula covers 4,500 participants from the city of Umeå, and VIP includes data for almost 90,000 participants. Both datasets include cross-sectional as well as longitudinal information. Owing to its coverage and rich information, the Linnaeus Database allows for a variety of longitudinal studies on the relations between, for instance, socioeconomic conditions, health, lifestyle, cognition, family networks, migration and working conditions in ageing cohorts. By joining various datasets developed in different disciplinary traditions, new possibilities for interdisciplinary research on ageing emerge.

  5. JEnsembl: a version-aware Java API to Ensembl data systems.

    PubMed

    Paterson, Trevor; Law, Andy

    2012-11-01

    The Ensembl Project provides release-specific Perl APIs for efficient high-level programmatic access to data stored in various Ensembl database schema. Although Perl scripts are perfectly suited for processing large volumes of text-based data, Perl is not ideal for developing large-scale software applications nor embedding in graphical interfaces. The provision of a novel Java API would facilitate type-safe, modular, object-orientated development of new Bioinformatics tools with which to access, analyse and visualize Ensembl data. The JEnsembl API implementation provides basic data retrieval and manipulation functionality from the Core, Compara and Variation databases for all species in Ensembl and EnsemblGenomes and is a platform for the development of a richer API to Ensembl datasources. The JEnsembl architecture uses a text-based configuration module to provide evolving, versioned mappings from database schema to code objects. A single installation of the JEnsembl API can therefore simultaneously and transparently connect to current and previous database instances (such as those in the public archive) thus facilitating better analysis repeatability and allowing 'through time' comparative analyses to be performed. Project development, released code libraries, Maven repository and documentation are hosted at SourceForge (http://jensembl.sourceforge.net).

  6. An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system.

    PubMed

    AlQuraishi, Mohammed; Tang, Shengdong; Xia, Xide

    2015-11-19

    Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. We have developed an integrated affinity-structure database in which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis. This database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.
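
    The position weight matrix component can be illustrated independently of the database. The sketch below builds a PWM from a handful of invented binding sites and scores candidate sequences against a uniform background.

```python
# Build a position weight matrix (PWM) from aligned binding-site sequences;
# the sites below are made up. Log-odds use a uniform background.
import numpy as np

sites = ["TGACTA", "TGACTC", "TGAGTA", "TTACTA"]
alphabet = "ACGT"
counts = np.zeros((4, len(sites[0])))
for site in sites:
    for pos, base in enumerate(site):
        counts[alphabet.index(base), pos] += 1

freqs = (counts + 0.25) / (len(sites) + 1)   # pseudocount-smoothed frequencies
pwm = np.log2(freqs / 0.25)                  # log-odds vs uniform background

def score(seq):
    """Sum per-position log-odds; higher = closer to the binding consensus."""
    return sum(pwm[alphabet.index(b), i] for i, b in enumerate(seq))

print(round(score("TGACTA"), 2), round(score("AAAAAA"), 2))
```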

  7. Distributed Operations Planning

    NASA Technical Reports Server (NTRS)

    Fox, Jason; Norris, Jeffrey; Powell, Mark; Rabe, Kenneth; Shams, Khawaja

    2007-01-01

    Maestro software provides a secure and distributed mission planning system for long-term missions in general, and the Mars Exploration Rover Mission (MER) specifically. Maestro, the successor to the Science Activity Planner, has a heavy emphasis on portability and distributed operations, and requires no data replication or expensive hardware, instead relying on a set of services functioning on JPL institutional servers. Maestro works on most current computers with network connections, including laptops. When browsing down-link data from a spacecraft, Maestro functions similarly to being on a Web browser. After authenticating the user, it connects to a database server to query an index of data products. It then contacts a Web server to download and display the actual data products. The software also includes collaboration support based upon a highly reliable messaging system. Modifications made to targets in one instance are quickly and securely transmitted to other instances of Maestro. The back end that has been developed for Maestro could benefit many future missions by reducing the cost of centralized operations system architecture.

  8. Over 20 years of reaction access systems from MDL: a novel reaction substructure search algorithm.

    PubMed

    Chen, Lingran; Nourse, James G; Christie, Bradley D; Leland, Burton A; Grier, David L

    2002-01-01

    From REACCS, to MDL ISIS/Host Reaction Gateway, and most recently to MDL Relational Chemistry Server, a new product based on Oracle data cartridge technology, MDL's reaction database management and retrieval systems have undergone great changes. The evolution of the system architecture is briefly discussed. The evolution of MDL reaction substructure search (RSS) algorithms is detailed. This article mainly describes a novel RSS algorithm. This algorithm is based on a depth-first search approach and is able to fully and prospectively use reaction specific information, such as reacting center and atom-atom mapping (AAM) information. The new algorithm has been used in the recently released MDL Relational Chemistry Server and allows the user to precisely find reaction instances in databases while minimizing unrelated hits. Finally, the existing and new RSS algorithms are compared with several examples.

  9. On selecting evidence to test hypotheses: A theory of selection tasks.

    PubMed

    Ragni, Marco; Kola, Ilir; Johnson-Laird, Philip N

    2018-05-21

How individuals choose evidence to test hypotheses is a long-standing puzzle. According to an algorithmic theory that we present, it is based on dual processes: individuals' intuitions depending on mental models of the hypothesis yield selections of evidence matching instances of the hypothesis, but their deliberations yield selections of potential counterexamples to the hypothesis. The results of 228 experiments using Wason's selection task corroborated the theory's predictions. Participants made dependent choices of items of evidence: the selections in 99 experiments were significantly more redundant (using Shannon's measure) than those of 10,000 simulations of each experiment based on independent selections. Participants tended to select evidence corresponding to instances of hypotheses, or to their counterexamples, or to both. Given certain contents, instructions, or framings of the task, they were more likely to select potential counterexamples to the hypothesis. When participants received feedback about their selections in the "repeated" selection task, they switched from selecting instances of the hypothesis to selecting potential counterexamples. These results eliminated most of the 15 alternative theories of selecting evidence. In a meta-analysis, the model theory yielded a better fit of the results of 228 experiments than the one remaining theory, which is based on reasoning rather than meaning. We discuss the implications of the model theory for hypothesis testing and for a well-known paradox of confirmation. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
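
    The redundancy comparison described above can be made concrete with a short simulation: each participant's choice is coded as a 4-bit pattern over the cards (P, not-P, Q, not-Q), and the Shannon redundancy of the observed patterns is compared against simulations that select each card independently at its marginal rate. The observed patterns and rates below are invented for illustration.

        # Hedged sketch of the Shannon-redundancy test; toy data.
        import math, random
        from collections import Counter

        def redundancy(patterns):
            """Shannon redundancy 1 - H/H_max over selection patterns (H_max = 4 bits)."""
            n = len(patterns)
            h = -sum((c / n) * math.log2(c / n) for c in Counter(patterns).values())
            return 1 - h / 4.0

        observed = [(1,0,1,0)]*40 + [(1,0,0,1)]*30 + [(1,1,1,1)]*10 + [(1,0,0,0)]*20
        rates = [sum(p[i] for p in observed) / len(observed) for i in range(4)]

        random.seed(0)
        sims = []
        for _ in range(10000):              # independent-selection simulations
            sample = [tuple(int(random.random() < r) for r in rates)
                      for _ in range(len(observed))]
            sims.append(redundancy(sample))

        obs_r = redundancy(observed)
        print(obs_r, sum(s >= obs_r for s in sims) / len(sims))  # one-sided p-value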

  10. SSTAR, a Stand-Alone Easy-To-Use Antimicrobial Resistance Gene Predictor.

    PubMed

    de Man, Tom J B; Limbago, Brandi M

    2016-01-01

We present the easy-to-use Sequence Search Tool for Antimicrobial Resistance, SSTAR. It combines a locally executed BLASTN search against a customizable database with an intuitive graphical user interface for identifying antimicrobial resistance (AR) genes from genomic data. Although the database is initially populated from a public repository of acquired resistance determinants (i.e., ARG-ANNOT), it can be customized for particular pathogen groups and resistance mechanisms. For instance, outer membrane porin sequences associated with carbapenem resistance phenotypes can be added, and known intrinsic mechanisms can be included. A unique feature of this tool is its ability to easily detect putative new alleles and truncated versions of existing AR genes. Variants and potential new alleles are brought to the attention of the user for further investigation. For instance, SSTAR is able to identify modified or truncated versions of porins, which may be of great importance in carbapenemase-negative carbapenem-resistant Enterobacteriaceae. SSTAR is written in Java and is therefore platform independent and compatible with both Windows and Unix operating systems. SSTAR and its manual, which includes a simple installation guide, are freely available from https://github.com/tomdeman-bio/Sequence-Search-Tool-for-Antimicrobial-Resistance-SSTAR-. IMPORTANCE Whole-genome sequencing (WGS) is quickly becoming a routine method for identifying genes associated with antimicrobial resistance (AR). However, for many microbiologists, the use and analysis of WGS data present a substantial challenge. We developed SSTAR, software with a graphical user interface that enables the identification of known AR genes from WGS and has the unique capacity to easily detect new variants of known AR genes, including truncated protein variants. Current software solutions do not notify the user when genes are truncated and, therefore, likely nonfunctional, which makes phenotype predictions less accurate. SSTAR users can apply any AR database of interest as a reference comparator and can manually add genes that impact resistance, even if such genes are not resistance determinants per se (e.g., porins and efflux pumps).
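
    A minimal sketch of the kind of flagging logic described above, assuming BLASTN-style hits reduced to identity, alignment length, and reference length; the thresholds and field names are illustrative, not SSTAR's actual parameters.

        # Hedged sketch: classify hits as exact, new allele, or truncated.
        from dataclasses import dataclass

        @dataclass
        class Hit:
            gene: str          # reference AR gene name
            identity: float    # percent identity of the alignment
            align_len: int     # alignment length in bp
            ref_len: int       # full length of the reference gene

        def classify(hit, min_identity=95.0, min_coverage=0.9):
            coverage = hit.align_len / hit.ref_len
            if hit.identity < min_identity:
                return "no call"
            if coverage < min_coverage:
                return "putative truncated variant"    # flag for manual review
            if hit.identity < 100.0:
                return "putative new allele"
            return "exact match"

        hits = [Hit("blaKPC-2", 100.0, 882, 882),
                Hit("ompK36", 99.1, 650, 1100)]        # porin with a large deletion
        for h in hits:
            print(h.gene, "->", classify(h))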

  11. A framework for cross-observatory volcanological database management

    NASA Astrophysics Data System (ADS)

    Aliotta, Marco Antonio; Amore, Mauro; Cannavò, Flavio; Cassisi, Carmelo; D'Agostino, Marcello; Dolce, Mario; Mastrolia, Andrea; Mangiagli, Salvatore; Messina, Giuseppe; Montalto, Placido; Fabio Pisciotta, Antonino; Prestifilippo, Michele; Rossi, Massimo; Scarpato, Giovanni; Torrisi, Orazio

    2017-04-01

In recent years it has become clear that a multiparametric approach is the winning strategy for investigating the complex dynamics of volcanic systems. This involves the use of different sensor networks, each one dedicated to the acquisition of particular data useful for research and monitoring. The increasing interest devoted to the study of volcanological phenomena has led to the establishment of different research organizations and observatories, sometimes covering the same volcanoes, which acquire large amounts of data from sensor networks for multiparametric monitoring. At INGV we developed a framework, hereinafter called TSDSystem (Time Series Database System), which acquires data streams from several geophysical and geochemical permanent sensor networks (also represented by different data sources such as ASCII, ODBC, URL etc.), located on the main volcanic areas of Southern Italy, and relates them within a relational database management system. Furthermore, spatial data related to different datasets are managed using a GIS module for sharing and visualization purposes. This standardization provides the ability to perform operations, such as query and visualization, on many measures, synchronizing them on a common space and time scale. In order to share data between INGV observatories, and also with Civil Protection, whose activity covers the same volcanic districts, we designed a "Master View" system that, starting from a number of instances of the TSDSystem framework (one for each observatory), makes it possible to jointly query data, both temporal and spatial, on instances located in different observatories, through the use of web services technology (RESTful, SOAP). Similarly, it provides metadata for equipment using standard schemas (such as FDSN StationXML). The "Master View" is also responsible for managing the data policy through a "who owns what" system, which allows viewing/download permissions for particular spatial or time intervals to be assigned to specific users or groups.

  12. Validation of a common data model for active safety surveillance research

    PubMed Central

    Ryan, Patrick B; Reich, Christian G; Hartzema, Abraham G; Stang, Paul E

    2011-01-01

    Objective Systematic analysis of observational medical databases for active safety surveillance is hindered by the variation in data models and coding systems. Data analysts often find robust clinical data models difficult to understand and ill suited to support their analytic approaches. Further, some models do not facilitate the computations required for systematic analysis across many interventions and outcomes for large datasets. Translating the data from these idiosyncratic data models to a common data model (CDM) could facilitate both the analysts' understanding and the suitability for large-scale systematic analysis. In addition to facilitating analysis, a suitable CDM has to faithfully represent the source observational database. Before beginning to use the Observational Medical Outcomes Partnership (OMOP) CDM and a related dictionary of standardized terminologies for a study of large-scale systematic active safety surveillance, the authors validated the model's suitability for this use by example. Validation by example To validate the OMOP CDM, the model was instantiated into a relational database, data from 10 different observational healthcare databases were loaded into separate instances, a comprehensive array of analytic methods that operate on the data model was created, and these methods were executed against the databases to measure performance. Conclusion There was acceptable representation of the data from 10 observational databases in the OMOP CDM using the standardized terminologies selected, and a range of analytic methods was developed and executed with sufficient performance to be useful for active safety surveillance. PMID:22037893

  13. MIPS PlantsDB: a database framework for comparative plant genome research.

    PubMed

    Nussbaumer, Thomas; Martis, Mihaela M; Roessner, Stephan K; Pfeifer, Matthias; Bader, Kai C; Sharma, Sapna; Gundlach, Heidrun; Spannagl, Manuel

    2013-01-01

    The rapidly increasing amount of plant genome (sequence) data enables powerful comparative analyses and integrative approaches and also requires structured and comprehensive information resources. Databases are needed for both model and crop plant organisms and both intuitive search/browse views and comparative genomics tools should communicate the data to researchers and help them interpret it. MIPS PlantsDB (http://mips.helmholtz-muenchen.de/plant/genomes.jsp) was initially described in NAR in 2007 [Spannagl,M., Noubibou,O., Haase,D., Yang,L., Gundlach,H., Hindemitt, T., Klee,K., Haberer,G., Schoof,H. and Mayer,K.F. (2007) MIPSPlantsDB-plant database resource for integrative and comparative plant genome research. Nucleic Acids Res., 35, D834-D840] and was set up from the start to provide data and information resources for individual plant species as well as a framework for integrative and comparative plant genome research. PlantsDB comprises database instances for tomato, Medicago, Arabidopsis, Brachypodium, Sorghum, maize, rice, barley and wheat. Building up on that, state-of-the-art comparative genomics tools such as CrowsNest are integrated to visualize and investigate syntenic relationships between monocot genomes. Results from novel genome analysis strategies targeting the complex and repetitive genomes of triticeae species (wheat and barley) are provided and cross-linked with model species. The MIPS Repeat Element Database (mips-REdat) and Catalog (mips-REcat) as well as tight connections to other databases, e.g. via web services, are further important components of PlantsDB.

  14. MIPS PlantsDB: a database framework for comparative plant genome research

    PubMed Central

    Nussbaumer, Thomas; Martis, Mihaela M.; Roessner, Stephan K.; Pfeifer, Matthias; Bader, Kai C.; Sharma, Sapna; Gundlach, Heidrun; Spannagl, Manuel

    2013-01-01

    The rapidly increasing amount of plant genome (sequence) data enables powerful comparative analyses and integrative approaches and also requires structured and comprehensive information resources. Databases are needed for both model and crop plant organisms and both intuitive search/browse views and comparative genomics tools should communicate the data to researchers and help them interpret it. MIPS PlantsDB (http://mips.helmholtz-muenchen.de/plant/genomes.jsp) was initially described in NAR in 2007 [Spannagl,M., Noubibou,O., Haase,D., Yang,L., Gundlach,H., Hindemitt, T., Klee,K., Haberer,G., Schoof,H. and Mayer,K.F. (2007) MIPSPlantsDB–plant database resource for integrative and comparative plant genome research. Nucleic Acids Res., 35, D834–D840] and was set up from the start to provide data and information resources for individual plant species as well as a framework for integrative and comparative plant genome research. PlantsDB comprises database instances for tomato, Medicago, Arabidopsis, Brachypodium, Sorghum, maize, rice, barley and wheat. Building up on that, state-of-the-art comparative genomics tools such as CrowsNest are integrated to visualize and investigate syntenic relationships between monocot genomes. Results from novel genome analysis strategies targeting the complex and repetitive genomes of triticeae species (wheat and barley) are provided and cross-linked with model species. The MIPS Repeat Element Database (mips-REdat) and Catalog (mips-REcat) as well as tight connections to other databases, e.g. via web services, are further important components of PlantsDB. PMID:23203886

  15. Evaluation of MALDI-TOF mass spectrometry for identification of environmental yeasts and development of supplementary database.

    PubMed

    Agustini, Bruna Carla; Silva, Luciano Paulino; Bloch, Carlos; Bonfim, Tania M B; da Silva, Gildo Almeida

    2014-06-01

Yeast identification using traditional methods that employ morphological, physiological, and biochemical characteristics is a demanding task: it requires experienced microbiologists and rigorous control of culture conditions, since variations can lead to different outcomes. In both clinical and industrial applications, the demand for fast and accurate identification of microorganisms is growing. Hence, molecular biology approaches have been extensively used and, more recently, protein profiling using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) has proved to be an even more efficient tool for taxonomic purposes. Nonetheless, where mass spectrometry is concerned, the data available for differentiating yeast species of industrial interest are limited, and commercially available reference databases comprise almost exclusively clinical microorganisms. In this context, studies focusing on environmental isolates are required to extend the existing databases. The aims of this study are the development of a supplementary database and the assessment of a commercial database for taxonomic identification of environmental yeasts. We used MALDI-TOF MS to create protein profiles for 845 yeast strains isolated from grape must; 67.7% of the strains were successfully identified using the previously available manufacturer database. The remaining 32.3% of strains were not identified due to the absence of a reference spectrum. After matching the correct taxon for these strains by molecular biology approaches, the spectra of the missing species were added to a supplementary database. This new library was able to accurately identify, at the first attempt, species that had previously been unassigned by MALDI-TOF MS, proving it is a powerful tool for the identification of environmental yeasts.
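
    A minimal sketch of the spectrum-library matching that such a supplementary database enables: a query mass spectrum is binned into a fixed-length intensity vector and compared to reference profiles by cosine similarity. The peak lists are invented, and production systems use more elaborate scoring.

        # Hedged sketch: toy peak lists, simple cosine matching.
        import numpy as np

        def binned(peaks, lo=2000, hi=20000, width=10):
            """Turn (m/z, intensity) pairs into a normalized intensity vector."""
            vec = np.zeros((hi - lo) // width)
            for mz, inten in peaks:
                if lo <= mz < hi:
                    vec[int((mz - lo) // width)] += inten
            n = np.linalg.norm(vec)
            return vec / n if n else vec

        library = {
            "S. cerevisiae": binned([(3448, 80), (4103, 60), (6632, 100)]),
            "H. uvarum":     binned([(3290, 70), (5120, 90), (7055, 40)]),
        }
        query = binned([(3445, 75), (4100, 65), (6630, 95)])  # noisy S. cerevisiae

        scores = {sp: float(query @ ref) for sp, ref in library.items()}
        print(max(scores, key=scores.get), scores)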

  16. Specialized microbial databases for inductive exploration of microbial genome sequences

    PubMed Central

    Fang, Gang; Ho, Christine; Qiu, Yaowu; Cubas, Virginie; Yu, Zhou; Cabau, Cédric; Cheung, Frankie; Moszer, Ivan; Danchin, Antoine

    2005-01-01

Background The enormous amount of genome sequence data calls for user-oriented databases to manage sequences and annotations. Queries must include search tools permitting function identification through exploration of related objects. Methods The GenoList package for collecting and mining microbial genome databases has been rewritten using MySQL as the database management system. Functions that were not available in MySQL, such as nested subqueries, have been implemented. Results Inductive reasoning in the study of genomes starts from "islands of knowledge", centered around genes with some known background. With this concept of "neighborhood" in mind, a modified version of the GenoList structure has been used for organizing sequence data from prokaryotic genomes of particular interest in China. GenoChore, a set of 17 specialized end-user-oriented microbial databases (including one instance of Microsporidia, Encephalitozoon cuniculi, a member of Eukarya) has been made publicly available. These databases allow the user to browse genome sequence and annotation data using standard queries. In addition, they provide a weekly update of searches against the worldwide protein sequence data libraries, allowing one to monitor annotation updates on genes of interest. Finally, they allow users to search for patterns in DNA or protein sequences, taking into account a clustering of genes into formal operons, as well as providing extra facilities to query sequences using predefined sequence patterns. Conclusion This growing set of specialized microbial databases organizes data created by the first Chinese bacterial genome programs (ThermaList, Thermoanaerobacter tengcongensis; LeptoList, with two different genomes of Leptospira interrogans; and SepiList, Staphylococcus epidermidis) together with related organisms for comparison. PMID:15698474

  17. The Center for Integrated Molecular Brain Imaging (Cimbi) database.

    PubMed

    Knudsen, Gitte M; Jensen, Peter S; Erritzoe, David; Baaré, William F C; Ettrup, Anders; Fisher, Patrick M; Gillings, Nic; Hansen, Hanne D; Hansen, Lars Kai; Hasselbalch, Steen G; Henningsson, Susanne; Herth, Matthias M; Holst, Klaus K; Iversen, Pernille; Kessing, Lars V; Macoveanu, Julian; Madsen, Kathrine Skak; Mortensen, Erik L; Nielsen, Finn Årup; Paulson, Olaf B; Siebner, Hartwig R; Stenbæk, Dea S; Svarer, Claus; Jernigan, Terry L; Strother, Stephen C; Frokjaer, Vibe G

    2016-01-01

We here describe a multimodality neuroimaging database containing data from healthy volunteers and patients, acquired within the Lundbeck Foundation Center for Integrated Molecular Brain Imaging (Cimbi) in Copenhagen, Denmark. The data is of particular relevance for neurobiological research questions related to the serotonergic transmitter system with its normative data on the serotonergic subtype receptors 5-HT1A, 5-HT1B, 5-HT2A, and 5-HT4 and the 5-HT transporter (5-HTT), but can easily serve other purposes. The Cimbi database and Cimbi biobank were formally established in 2008 with the purpose to store the wealth of Cimbi-acquired data in a highly structured and standardized manner in accordance with the regulations issued by the Danish Data Protection Agency as well as to provide a quality-controlled resource for future hypothesis-generating and hypothesis-driven studies. The Cimbi database currently comprises a total of 1100 PET and 1000 structural and functional MRI scans and it holds a multitude of additional data, such as genetic and biochemical data, and scores from 17 self-reported questionnaires and from 11 neuropsychological paper/computer tests. The database-associated Cimbi biobank currently contains blood and in some instances saliva samples from about 500 healthy volunteers and 300 patients with e.g., major depression, dementia, substance abuse, obesity, and impulsive aggression. Data continue to be added to the Cimbi database and biobank. Copyright © 2015. Published by Elsevier Inc.

  18. Challenges of the information age: the impact of false discovery on pathway identification.

    PubMed

    Rog, Colin J; Chekuri, Srinivasa C; Edgerton, Mary E

    2012-11-21

Pathways with members that have known relevance to a disease are used to support hypotheses generated from analyses of gene expression and proteomic studies. Using cancer as an example, the pitfalls of searching pathways databases as support for genes and proteins that could represent false discoveries are explored. The frequency with which networks could be generated from 100 instances each of randomly selected five- and ten-gene sets as input to MetaCore, a commercial pathways database, was measured. A PubMed search enumerated cancer-related literature published for any gene in the networks. Using a maximum of three, two, or one intervening step between input genes to populate the network, networks were generated with frequencies of 97%, 77%, and 7% using ten-gene sets and 73%, 27%, and 1% using five-gene sets. PubMed reported an average of 4225 cancer-related articles per network gene. This can be attributed to the richly populated pathways databases and the interest in the molecular basis of cancer. As information sources become enriched, they are more likely to generate plausible mechanisms for false discoveries.
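
    The simulation design can be sketched as follows, with a random graph standing in for the commercial pathways knowledge base and a simplified "network forms" criterion (some pair of input genes linked by at most k intervening nodes); all numbers are illustrative.

        # Hedged sketch of the false-discovery simulation; toy random graph.
        import random
        import networkx as nx

        random.seed(1)
        kb = nx.gnm_random_graph(2000, 20000)        # toy interaction knowledge base

        def network_forms(gene_ids, max_intervening):
            """True if some pair of input genes is linked by <= max_intervening nodes."""
            cutoff = max_intervening + 1             # path length in edges
            for i, a in enumerate(gene_ids):
                near = nx.single_source_shortest_path_length(kb, a, cutoff=cutoff)
                if any(b in near for b in gene_ids[i + 1:]):
                    return True
            return False

        for set_size in (5, 10):
            for k in (1, 2, 3):
                hits = sum(network_forms(random.sample(range(2000), set_size), k)
                           for _ in range(100))
                print(f"{set_size}-gene sets, max {k} intervening: {hits}/100")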

  19. Component Database for the APS Upgrade

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Veseli, S.; Arnold, N. D.; Jarosz, D. P.

The Advanced Photon Source Upgrade (APS-U) project will replace the existing APS storage ring with a multi-bend achromat (MBA) lattice to provide extreme transverse coherence and extreme brightness x-rays to its users. As the time to replace the existing storage ring accelerator is of critical concern, an aggressive one-year removal/installation/testing period is being planned. To aid in the management of the thousands of components to be installed in such a short time, the Component Database (CDB) application is being developed with the purpose to identify, document, track, locate, and organize components in a central database. Three major domains are being addressed: Component definitions (which together make up an exhaustive "Component Catalog"), Designs (groupings of components to create subsystems), and Component Instances ("Inventory"). Relationships between the major domains offer additional "system knowledge" to be captured that will be leveraged with future tools and applications. It is imperative to provide sub-system engineers with a functional application early in the machine design cycle. Topics discussed in this paper include the initial design and deployment of CDB, as well as future development plans.

  20. A k-Vector Approach to Sampling, Interpolation, and Approximation

    NASA Astrophysics Data System (ADS)

    Mortari, Daniele; Rogers, Jonathan

    2013-12-01

    The k-vector search technique is a method designed to perform extremely fast range searching of large databases at computational cost independent of the size of the database. k-vector search algorithms have historically found application in satellite star-tracker navigation systems which index very large star catalogues repeatedly in the process of attitude estimation. Recently, the k-vector search algorithm has been applied to numerous other problem areas including non-uniform random variate sampling, interpolation of 1-D or 2-D tables, nonlinear function inversion, and solution of systems of nonlinear equations. This paper presents algorithms in which the k-vector search technique is used to solve each of these problems in a computationally-efficient manner. In instances where these tasks must be performed repeatedly on a static (or nearly-static) data set, the proposed k-vector-based algorithms offer an extremely fast solution technique that outperforms standard methods.
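
    A minimal sketch of the k-vector range search itself, following the usual sorted-array-plus-reference-line construction: the k-vector gives an O(1) candidate slice whose ends are then corrected with a few local comparisons.

        # Hedged sketch of k-vector range searching; toy data.
        import numpy as np

        def build_kvector(values):
            s = np.sort(values)
            n = len(s)
            m = (s[-1] - s[0]) / (n - 1)             # slope of the reference line
            q = s[0]                                 # intercept: z(j) = m*j + q
            # k[j] = number of sorted elements strictly below the line height z(j)
            k = np.searchsorted(s, m * np.arange(n) + q, side="left")
            return s, k, m, q

        def range_search(s, k, m, q, lo, hi):
            n = len(s)
            jb = int(np.clip(np.floor((lo - q) / m), 0, n - 1))
            jt = int(np.clip(np.ceil((hi - q) / m), 0, n - 1))
            start, end = int(k[jb]), int(k[jt])      # O(1) candidate slice
            while start < n and s[start] < lo:       # local trim at the bottom
                start += 1
            while end > start and s[end - 1] > hi:   # local trim at the top
                end -= 1
            while end < n and s[end] <= hi:          # extend if the estimate fell short
                end += 1
            return s[start:end]

        rng = np.random.default_rng(0)
        data = rng.uniform(0.0, 100.0, 100000)
        s, k, m, q = build_kvector(data)
        hits = range_search(s, k, m, q, 25.0, 25.5)
        print(len(hits), np.count_nonzero((data >= 25.0) & (data <= 25.5)))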

  1. Jobs within a 30-minute transit ride - Service

    EPA Pesticide Factsheets

This mapping service summarizes the total number of jobs that can be reached within 30 minutes by transit. EPA modeled accessibility via transit by calculating total travel time between block group centroids inclusive of walking to/from transit stops, wait times, and transfers. Block groups that can be accessed in 30 minutes or less from the origin block group are considered accessible. Values reflect public transit service in December 2012 and employment counts in 2010. Coverage is limited to census block groups within metropolitan regions served by transit agencies who share their service data in a standardized format called GTFS. All variable names refer to variables in EPA's Smart Location Database. For instance, EmpTot10_sum summarizes total employment (EmpTot10) in block groups that are reachable within a 30-minute transit and walking commute. See Smart Location Database User Guide for full variable descriptions.
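
    The EmpTot10_sum aggregation reduces to a thresholded matrix product once the door-to-door travel times are known; the travel-time matrix and job counts below are invented for illustration.

        # Hedged sketch of the 30-minute accessibility sum; toy numbers.
        import numpy as np

        travel_min = np.array([[ 0, 12, 28, 45],
                               [12,  0, 22, 38],
                               [28, 22,  0, 31],
                               [45, 38, 31,  0]])    # minutes between 4 block groups
        emp_tot10 = np.array([500, 1200, 800, 300])  # jobs per destination block group

        reachable = travel_min <= 30                 # 30-minute threshold
        emp_tot10_sum = reachable.astype(int) @ emp_tot10
        print(emp_tot10_sum)                         # origin 0 -> 500 + 1200 + 800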

  2. Jobs within a 30-minute transit ride - Download

    EPA Pesticide Factsheets

A collection of performance indicators for consistently comparing neighborhoods (census block groups) across the US in regards to their accessibility to jobs or workers via public transit service. Accessibility was modeled by calculating total travel time between block group centroids inclusive of walking to/from transit stops, wait times, and transfers. Block groups that can be accessed in 30 minutes or less from the origin block group are considered accessible. Indicators reflect public transit service in December 2012 and employment/worker counts in 2010. Coverage is limited to census block groups within metropolitan regions served by transit agencies who share their service data in a standardized format called GTFS. All variable names refer to variables in EPA's Smart Location Database. For instance, EmpTot10_sum summarizes total employment (EmpTot10) in block groups that are reachable within a 30-minute transit and walking commute. See Smart Location Database User Guide for full variable descriptions.

  3. Data mining with unsupervised clustering using photonic micro-ring resonators

    NASA Astrophysics Data System (ADS)

    McAulay, Alastair D.

    2013-09-01

Data is commonly moved through optical fiber in modern data centers and may be stored optically. We propose an optical method of data mining for future data centers to enhance performance. For example, in clustering, a form of unsupervised learning, we propose that parameters corresponding to information in a database are converted from analog values to frequencies, as in the brain's neurons, where similar data will have close frequencies. We describe the Wilson-Cowan model for oscillating neurons. In optics we implement the frequencies with micro-ring resonators. Due to the influence of weak coupling, a group of resonators will form clusters of similar frequencies, indicating which of the desired parameters are closely related. Fewer clusters are formed as clustering proceeds, which allows the creation of a tree showing topics of importance and their relationships in the database. The tree can be used, for instance, to target advertising and for planning.
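
    The clustering dynamic can be sketched with weakly coupled phase oscillators; the code below uses a Kuramoto-style phase model as a stand-in for the full Wilson-Cowan equations (and for the micro-ring resonators), mapping data values to natural frequencies so that similar records entrain to a common cluster frequency.

        # Hedged sketch: Kuramoto-style stand-in, not the Wilson-Cowan model.
        import numpy as np

        rng = np.random.default_rng(0)
        data = np.concatenate([rng.normal(1.0, 0.05, 10),
                               rng.normal(3.0, 0.05, 10)])   # two latent clusters
        omega = data.copy()                                  # value -> natural frequency
        theta = rng.uniform(0, 2 * np.pi, data.size)
        K, dt = 0.4, 0.01

        for _ in range(20000):                               # Euler integration
            pull = np.sin(theta[None, :] - theta[:, None]).sum(axis=1)
            theta = theta + dt * (omega + (K / data.size) * pull)

        # instantaneous frequencies after entrainment: two groups emerge
        pull = np.sin(theta[None, :] - theta[:, None]).sum(axis=1)
        print(np.round(omega + (K / data.size) * pull, 2))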

  4. rAvis: an R-package for downloading information stored in Proyecto AVIS, a citizen science bird project.

    PubMed

    Varela, Sara; González-Hernández, Javier; Casabella, Eduardo; Barrientos, Rafael

    2014-01-01

    Citizen science projects store an enormous amount of information about species distribution, diversity and characteristics. Researchers are now beginning to make use of this rich collection of data. However, access to these databases is not always straightforward. Apart from the largest and international projects, citizen science repositories often lack specific Application Programming Interfaces (APIs) to connect them to the scientific environments. Thus, it is necessary to develop simple routines to allow researchers to take advantage of the information collected by smaller citizen science projects, for instance, programming specific packages to connect them to popular scientific environments (like R). Here, we present rAvis, an R-package to connect R-users with Proyecto AVIS (http://proyectoavis.com), a Spanish citizen science project with more than 82,000 bird observation records. We develop several functions to explore the database, to plot the geographic distribution of the species occurrences, and to generate personal queries to the database about species occurrences (number of individuals, distribution, etc.) and birdwatcher observations (number of species recorded by each collaborator, UTMs visited, etc.). This new R-package will allow scientists to access this database and to exploit the information generated by Spanish birdwatchers over the last 40 years.

  5. Ontology-based geospatial data query and integration

    USGS Publications Warehouse

    Zhao, T.; Zhang, C.; Wei, M.; Peng, Z.-R.

    2008-01-01

Geospatial data sharing is an increasingly important subject as large amounts of data are produced by a variety of sources, stored in incompatible formats, and accessible through different GIS applications. Past efforts to enable sharing have produced standardized data formats such as GML and data access protocols such as Web Feature Service (WFS). While these standards help enable client applications to gain access to heterogeneous data stored in different formats from diverse sources, the usability of the access is limited due to the lack of data semantics encoded in the WFS feature types. Past research has used ontology languages to describe the semantics of geospatial data, but ontology-based queries cannot be applied directly to legacy data stored in databases or shapefiles, or to feature data in WFS services. This paper presents a method to enable ontology queries on spatial data available from WFS services and on data stored in databases. We do not create ontology instances explicitly and thus avoid the problems of data replication. Instead, user queries are rewritten into WFS getFeature requests and SQL queries to databases. The method also has the benefit of being able to utilize existing tools of databases, WFS, and GML while enabling query based on ontology semantics. © 2008 Springer-Verlag Berlin Heidelberg.
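
    The rewriting step can be sketched with a hand-written mapping from ontology classes and properties to tables and columns; the ontology terms, table, and column names below are hypothetical.

        # Hedged sketch of ontology-to-SQL query rewriting; hypothetical mapping.
        MAPPING = {
            "hydro:River": {
                "table": "rivers",
                "properties": {"hydro:name": "river_name",
                               "hydro:lengthKm": "length_km"},
            },
        }

        def rewrite(onto_class, onto_prop, op, value):
            m = MAPPING[onto_class]
            col = m["properties"][onto_prop]
            # parameterized SQL keeps the rewrite safe from injection
            return f"SELECT * FROM {m['table']} WHERE {col} {op} %s", (value,)

        sql, params = rewrite("hydro:River", "hydro:lengthKm", ">", 100)
        print(sql, params)   # SELECT * FROM rivers WHERE length_km > %s (100,)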

  6. JEnsembl: a version-aware Java API to Ensembl data systems

    PubMed Central

    Paterson, Trevor; Law, Andy

    2012-01-01

Motivation: The Ensembl Project provides release-specific Perl APIs for efficient high-level programmatic access to data stored in various Ensembl database schema. Although Perl scripts are perfectly suited for processing large volumes of text-based data, Perl is not ideal for developing large-scale software applications nor for embedding in graphical interfaces. The provision of a novel Java API would facilitate type-safe, modular, object-orientated development of new Bioinformatics tools with which to access, analyse and visualize Ensembl data. Results: The JEnsembl API implementation provides basic data retrieval and manipulation functionality from the Core, Compara and Variation databases for all species in Ensembl and EnsemblGenomes and is a platform for the development of a richer API to Ensembl datasources. The JEnsembl architecture uses a text-based configuration module to provide evolving, versioned mappings from database schema to code objects. A single installation of the JEnsembl API can therefore simultaneously and transparently connect to current and previous database instances (such as those in the public archive) thus facilitating better analysis repeatability and allowing ‘through time’ comparative analyses to be performed. Availability: Project development, released code libraries, Maven repository and documentation are hosted at SourceForge (http://jensembl.sourceforge.net). Contact: jensembl-develop@lists.sf.net, andy.law@roslin.ed.ac.uk, trevor.paterson@roslin.ed.ac.uk PMID:22945789

  7. rAvis: An R-Package for Downloading Information Stored in Proyecto AVIS, a Citizen Science Bird Project

    PubMed Central

    Varela, Sara; González-Hernández, Javier; Casabella, Eduardo; Barrientos, Rafael

    2014-01-01

    Citizen science projects store an enormous amount of information about species distribution, diversity and characteristics. Researchers are now beginning to make use of this rich collection of data. However, access to these databases is not always straightforward. Apart from the largest and international projects, citizen science repositories often lack specific Application Programming Interfaces (APIs) to connect them to the scientific environments. Thus, it is necessary to develop simple routines to allow researchers to take advantage of the information collected by smaller citizen science projects, for instance, programming specific packages to connect them to popular scientific environments (like R). Here, we present rAvis, an R-package to connect R-users with Proyecto AVIS (http://proyectoavis.com), a Spanish citizen science project with more than 82,000 bird observation records. We develop several functions to explore the database, to plot the geographic distribution of the species occurrences, and to generate personal queries to the database about species occurrences (number of individuals, distribution, etc.) and birdwatcher observations (number of species recorded by each collaborator, UTMs visited, etc.). This new R-package will allow scientists to access this database and to exploit the information generated by Spanish birdwatchers over the last 40 years. PMID:24626233

  8. EuPathDB: the eukaryotic pathogen genomics database resource

    PubMed Central

    Aurrecoechea, Cristina; Barreto, Ana; Basenko, Evelina Y.; Brestelli, John; Brunk, Brian P.; Cade, Shon; Crouch, Kathryn; Doherty, Ryan; Falke, Dave; Fischer, Steve; Gajria, Bindu; Harb, Omar S.; Heiges, Mark; Hertz-Fowler, Christiane; Hu, Sufen; Iodice, John; Kissinger, Jessica C.; Lawrence, Cris; Li, Wei; Pinney, Deborah F.; Pulman, Jane A.; Roos, David S.; Shanmugasundram, Achchuthan; Silva-Franco, Fatima; Steinbiss, Sascha; Stoeckert, Christian J.; Spruill, Drew; Wang, Haiming; Warrenfeltz, Susanne; Zheng, Jie

    2017-01-01

    The Eukaryotic Pathogen Genomics Database Resource (EuPathDB, http://eupathdb.org) is a collection of databases covering 170+ eukaryotic pathogens (protists & fungi), along with relevant free-living and non-pathogenic species, and select pathogen hosts. To facilitate the discovery of meaningful biological relationships, the databases couple preconfigured searches with visualization and analysis tools for comprehensive data mining via intuitive graphical interfaces and APIs. All data are analyzed with the same workflows, including creation of gene orthology profiles, so data are easily compared across data sets, data types and organisms. EuPathDB is updated with numerous new analysis tools, features, data sets and data types. New tools include GO, metabolic pathway and word enrichment analyses plus an online workspace for analysis of personal, non-public, large-scale data. Expanded data content is mostly genomic and functional genomic data while new data types include protein microarray, metabolic pathways, compounds, quantitative proteomics, copy number variation, and polysomal transcriptomics. New features include consistent categorization of searches, data sets and genome browser tracks; redesigned gene pages; effective integration of alternative transcripts; and a EuPathDB Galaxy instance for private analyses of a user's data. Forthcoming upgrades include user workspaces for private integration of data with existing EuPathDB data and improved integration and presentation of host–pathogen interactions. PMID:27903906

  9. The LncRNA Connectivity Map: Using LncRNA Signatures to Connect Small Molecules, LncRNAs, and Diseases.

    PubMed

    Yang, Haixiu; Shang, Desi; Xu, Yanjun; Zhang, Chunlong; Feng, Li; Sun, Zeguo; Shi, Xinrui; Zhang, Yunpeng; Han, Junwei; Su, Fei; Li, Chunquan; Li, Xia

    2017-07-27

Well-characterized connections among diseases, long non-coding RNAs (lncRNAs) and drugs are important for elucidating the key roles of lncRNAs in biological mechanisms across various biological states. In this study, we constructed a database called LNCmap (LncRNA Connectivity Map), available at http://www.bio-bigdata.com/LNCmap/, to establish the correlations among diseases, physiological processes, and the action of small molecule therapeutics by attempting to describe all biological states in terms of lncRNA signatures. By reannotating the microarray data from the Connectivity Map database, the LNCmap obtained 237 lncRNA signatures of 5916 instances corresponding to 1262 small molecular drugs. We provided a user-friendly interface for the convenient browsing, retrieval and download of the database, including detailed information and the associations of drugs and corresponding affected lncRNAs. Additionally, we developed two enrichment analysis methods for users to identify candidate drugs for a particular disease by inputting the corresponding lncRNA expression profiles or an associated lncRNA list and then comparing them to the lncRNA signatures in our database. Overall, LNCmap could significantly improve our understanding of the biological roles of lncRNAs and provide a unique resource to reveal the connections among drugs, lncRNAs and diseases.
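
    A minimal sketch of the signature-matching idea behind a connectivity map: each instance is a ranked lncRNA list, and a query up/down signature is scored by where its members fall in that ranking. The identifiers are invented, and this simple rank-difference score stands in for LNCmap's actual enrichment statistics.

        # Hedged sketch of a connectivity-style score; toy identifiers.
        def connectivity_score(ranked, up, down):
            """Positive when up-lncRNAs sit near the top and down-lncRNAs near the bottom."""
            pos = {g: i / (len(ranked) - 1) for i, g in enumerate(ranked)}  # 0=top
            up_mean = sum(pos[g] for g in up) / len(up)
            down_mean = sum(pos[g] for g in down) / len(down)
            return down_mean - up_mean               # in [-1, 1]

        instance = ["lnc7", "lnc2", "lnc9", "lnc4", "lnc1",
                    "lnc3", "lnc8", "lnc5", "lnc6"]  # one drug instance's ranking
        query_up, query_down = ["lnc7", "lnc9"], ["lnc5", "lnc6"]
        print(round(connectivity_score(instance, query_up, query_down), 2))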

  10. GenoQuery: a new querying module for functional annotation in a genomic warehouse

    PubMed Central

    Lemoine, Frédéric; Labedan, Bernard; Froidevaux, Christine

    2008-01-01

    Motivation: We have to cope with both a deluge of new genome sequences and a huge amount of data produced by high-throughput approaches used to exploit these genomic features. Crossing and comparing such heterogeneous and disparate data will help improving functional annotation of genomes. This requires designing elaborate integration systems such as warehouses for storing and querying these data. Results: We have designed a relational genomic warehouse with an original multi-layer architecture made of a databases layer and an entities layer. We describe a new querying module, GenoQuery, which is based on this architecture. We use the entities layer to define mixed queries. These mixed queries allow searching for instances of biological entities and their properties in the different databases, without specifying in which database they should be found. Accordingly, we further introduce the central notion of alternative queries. Such queries have the same meaning as the original mixed queries, while exploiting complementarities yielded by the various integrated databases of the warehouse. We explain how GenoQuery computes all the alternative queries of a given mixed query. We illustrate how useful this querying module is by means of a thorough example. Availability: http://www.lri.fr/~lemoine/GenoQuery/ Contact: chris@lri.fr, lemoine@lri.fr PMID:18586731

  11. Graphical tool for navigation within the semantic network of the UMLS metathesaurus on a locally installed database.

    PubMed

    Frankewitsch, T; Prokosch, H U

    2000-01-01

Knowledge in the environment of information technologies is bound to structured vocabularies. Medical data dictionaries are necessary for uniquely describing findings such as diagnoses, procedures or functions. We therefore decided to locally install a version of the Unified Medical Language System (UMLS) of the U.S. National Library of Medicine as a repository for defining entries of a medical multimedia database. Because of the requirement to extend the vocabulary with new concepts and with relations between existing concepts, a graphical tool for appending new items to the database has been developed. Although the database is an instance of a semantic network, focusing on a single entry offers the opportunity to reduce the net to a tree within this local view. Based on graph theory, nodes of concepts and nodes of knowledge are defined. The UMLS additionally offers the specification of sub-relations, which can be represented too. Using this view it is possible to manage these 1:n relations in a simple tree view. On this background, an explorer-like graphical user interface has been realised to add new concepts and to define new relationships between these and existing entries, adapting the UMLS for specific purposes such as describing medical multimedia objects.
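
    The net-to-tree reduction can be sketched as a breadth-first traversal from the focus concept, keeping each concept once so that cycles in the semantic network disappear from the local view; the concepts and relations below are invented.

        # Hedged sketch of reducing a semantic net to a tree; toy concepts.
        from collections import deque

        RELATIONS = {                      # concept -> [(relation, concept), ...]
            "Pneumonia": [("isa", "Lung Disease"), ("finding_site", "Lung")],
            "Lung Disease": [("isa", "Respiratory Disease")],
            "Lung": [("part_of", "Respiratory System")],
        }

        def to_tree(focus):
            """BFS from the focus concept; each node is kept once, so cycles vanish."""
            tree, seen, queue = {}, {focus}, deque([focus])
            while queue:
                node = queue.popleft()
                children = []
                for rel, target in RELATIONS.get(node, []):
                    if target not in seen:
                        seen.add(target)
                        children.append((rel, target))
                        queue.append(target)
                tree[node] = children
            return tree

        for parent, children in to_tree("Pneumonia").items():
            print(parent, "->", children)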

  12. Enhancing navigation in biomedical databases by community voting and database-driven text classification

    PubMed Central

    Duchrow, Timo; Shtatland, Timur; Guettler, Daniel; Pivovarov, Misha; Kramer, Stefan; Weissleder, Ralph

    2009-01-01

    Background The breadth of biological databases and their information content continues to increase exponentially. Unfortunately, our ability to query such sources is still often suboptimal. Here, we introduce and apply community voting, database-driven text classification, and visual aids as a means to incorporate distributed expert knowledge, to automatically classify database entries and to efficiently retrieve them. Results Using a previously developed peptide database as an example, we compared several machine learning algorithms in their ability to classify abstracts of published literature results into categories relevant to peptide research, such as related or not related to cancer, angiogenesis, molecular imaging, etc. Ensembles of bagged decision trees met the requirements of our application best. No other algorithm consistently performed better in comparative testing. Moreover, we show that the algorithm produces meaningful class probability estimates, which can be used to visualize the confidence of automatic classification during the retrieval process. To allow viewing long lists of search results enriched by automatic classifications, we added a dynamic heat map to the web interface. We take advantage of community knowledge by enabling users to cast votes in Web 2.0 style in order to correct automated classification errors, which triggers reclassification of all entries. We used a novel framework in which the database "drives" the entire vote aggregation and reclassification process to increase speed while conserving computational resources and keeping the method scalable. In our experiments, we simulate community voting by adding various levels of noise to nearly perfectly labelled instances, and show that, under such conditions, classification can be improved significantly. Conclusion Using PepBank as a model database, we show how to build a classification-aided retrieval system that gathers training data from the community, is completely controlled by the database, scales well with concurrent change events, and can be adapted to add text classification capability to other biomedical databases. The system can be accessed at . PMID:19799796
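
    A minimal sketch of the classification component, assuming documents already reduced to feature vectors: bagged decision trees supply the class-probability estimates that drive the confidence heat map, and label flipping mimics noisy community votes. Uses scikit-learn; the synthetic features stand in for real text vectors.

        # Hedged sketch: bagged trees with probability estimates under label noise.
        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.ensemble import BaggingClassifier
        from sklearn.tree import DecisionTreeClassifier

        X, y = make_classification(n_samples=600, n_features=50, random_state=0)
        rng = np.random.default_rng(0)
        noisy = y.copy()
        flip = rng.random(y.size) < 0.1          # simulate 10% mislabelled votes
        noisy[flip] = 1 - noisy[flip]

        clf = BaggingClassifier(DecisionTreeClassifier(),
                                n_estimators=50, random_state=0)
        clf.fit(X[:500], noisy[:500])
        proba = clf.predict_proba(X[500:])[:, 1]  # confidence for the heat map
        acc = ((proba > 0.5).astype(int) == y[500:]).mean()
        print(round(acc, 2))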

  13. Towards Monitoring-as-a-service for Scientific Computing Cloud applications using the ElasticSearch ecosystem

    NASA Astrophysics Data System (ADS)

    Bagnasco, S.; Berzano, D.; Guarise, A.; Lusso, S.; Masera, M.; Vallero, S.

    2015-12-01

The INFN computing centre in Torino hosts a private Cloud, which is managed with the OpenNebula cloud controller. The infrastructure offers Infrastructure-as-a-Service (IaaS) and Platform-as-a-Service (PaaS) services to different scientific computing applications. The main stakeholders of the facility are a grid Tier-2 site for the ALICE collaboration at LHC, an interactive analysis facility for the same experiment and a grid Tier-2 site for the BESIII collaboration, plus an increasing number of other small tenants. The dynamic allocation of resources to tenants is partially automated. This feature requires detailed monitoring and accounting of the resource usage. We set up a monitoring framework to inspect the site activities both in terms of IaaS and applications running on the hosted virtual instances. For this purpose we used the ElasticSearch, Logstash and Kibana (ELK) stack. The infrastructure relies on a MySQL database back-end for data preservation and to ensure flexibility to choose a different monitoring solution if needed. The heterogeneous accounting information is transferred from the database to the ElasticSearch engine via a custom Logstash plugin. Each use-case is indexed separately in ElasticSearch and we set up a set of Kibana dashboards with pre-defined queries in order to monitor the relevant information in each case. For the IaaS metering, we developed sensors for the OpenNebula API. The IaaS level information gathered through the API is sent to the MySQL database through an ad hoc RESTful web service. Moreover, we have developed a billing system for our private Cloud, which relies on the RabbitMQ message queue for asynchronous communication to the database and on the ELK stack for its graphical interface. The Italian Grid accounting framework is also migrating to a similar set-up. Concerning the application level, we used the Root plugin TProofMonSenderSQL to collect accounting data from the interactive analysis facility. The BESIII virtual instances used to be monitored with Zabbix; as a proof of concept, we also retrieve the information contained in the Zabbix database. In this way we have achieved a uniform monitoring interface for both the IaaS and the scientific applications, mostly leveraging off-the-shelf tools. At present, we are working to define a model for monitoring-as-a-service, based on the tools described above, which the Cloud tenants can easily configure to suit their specific needs.

  14. Representing sentence information

    NASA Astrophysics Data System (ADS)

    Perkins, Walton A., III

    1991-03-01

This paper describes a computer-oriented representation for sentence information. Whereas many Artificial Intelligence (AI) natural language systems start with a syntactic parse of a sentence into the linguist's components: noun, verb, adjective, preposition, etc., we argue that it is better to parse the input sentence into 'meaning' components: attribute, attribute value, object class, object instance, and relation. AI systems need a representation that will allow rapid storage and retrieval of information and convenient reasoning with that information. The attribute-of-object representation has proven useful for handling information in relational databases (which are well known for their efficiency in storage and retrieval) and for reasoning in knowledge-based systems. On the other hand, the linguist's syntactic representation of the words in sentences has not been shown to be useful for information handling and reasoning. We think it is an unnecessary and misleading intermediate form. Our sentence representation is semantics-based, in terms of attribute, attribute value, object class, object instance, and relation. Every sentence is segmented into one or more components with the form: 'attribute' of 'object' 'relation' 'attribute value'. Using only one format for all information gives the system simplicity and good performance, as a RISC architecture does for hardware. The attribute-of-object representation is not new; it is used extensively in relational databases and knowledge-based systems. However, we will show that it can be used as a meaning representation for natural language sentences with minor extensions. In this paper we describe how a computer system can parse English sentences into this representation and generate English sentences from this representation. Much of this has been tested with a computer implementation.
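
    A deliberately tiny sketch of segmenting a sentence into the 'attribute' of 'object' 'relation' 'attribute value' form described above; the single pattern is illustrative, not the paper's parser.

        # Hedged sketch: one pattern, one relation, toy coverage only.
        import re

        PATTERN = re.compile(r"^The (\w+) of (?:the )?(\w+) is (\w+)\.?$",
                             re.IGNORECASE)

        def parse(sentence):
            m = PATTERN.match(sentence.strip())
            if not m:
                return None
            attribute, obj, value = m.groups()
            return {"attribute": attribute.lower(), "object": obj,
                    "relation": "is", "value": value}

        print(parse("The color of the car is red."))
        # {'attribute': 'color', 'object': 'car', 'relation': 'is', 'value': 'red'}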

  15. Geodata Modeling and Query in Geographic Information Systems

    NASA Technical Reports Server (NTRS)

    Adam, Nabil

    1996-01-01

Geographic information systems (GIS) deal with collecting, modeling, managing, analyzing, and integrating spatial (locational) and non-spatial (attribute) data required for geographic applications. Examples of spatial data are digital maps, administrative boundaries, road networks, and those of non-spatial data are census counts, land elevations and soil characteristics. GIS shares common areas with a number of other disciplines such as computer-aided design, computer cartography, database management, and remote sensing. None of these disciplines, however, can by itself fully meet the requirements of a GIS application. Examples of such requirements include: the ability to use locational data to produce high quality plots, perform complex operations such as network analysis, enable spatial searching and overlay operations, support spatial analysis and modeling, and provide data management functions such as efficient storage, retrieval, and modification of large datasets; independence, integrity, and security of data; and concurrent access to multiple users. It is to the data management issues that we devote our discussion in this monograph. Traditionally, database management technology has been developed for business applications. Such applications require, among other things, capturing the data requirements of high-level business functions and developing machine-level implementations; supporting multiple views of data and yet providing integration that would minimize redundancy and maintain data integrity and security; providing a high-level language for data definition and manipulation; allowing concurrent access to multiple users; and processing user transactions in an efficient manner. The demands on database management systems have been for speed, reliability, efficiency, cost effectiveness, and user-friendliness. Significant progress has been made in all of these areas over the last two decades, to the point that many generalized database platforms are now available for developing data-intensive applications that run in real time. While continuous improvement is still being made at a very fast-paced and competitive rate, new application areas such as computer-aided design, image processing, VLSI design, and GIS have been identified by many as the next generation of database applications. These new application areas pose serious challenges to the currently available database technology. At the core of these challenges is the nature of the data that is manipulated. In traditional database applications, the database objects do not have any spatial dimension, and as such, can be thought of as point data in a multi-dimensional space. For example, each instance of an entity EMPLOYEE will have a unique value corresponding to every attribute such as employee id, employee name, employee address and so on. Thus, every EMPLOYEE instance can be thought of as a point in a multi-dimensional space where each dimension is represented by an attribute. Furthermore, all operations on such data are one-dimensional. Thus, users may retrieve all entities satisfying one or more constraints. Examples of such constraints include employees with addresses in a certain area code, or salaries within a certain range. Even though constraints can be specified on multiple attributes (dimensions), the search for such data is essentially orthogonal across these dimensions.

  16. DOMMINO 2.0: integrating structurally resolved protein-, RNA-, and DNA-mediated macromolecular interactions

    PubMed Central

    Kuang, Xingyan; Dhroso, Andi; Han, Jing Ginger; Shyu, Chi-Ren; Korkin, Dmitry

    2016-01-01

Macromolecular interactions are formed between proteins, DNA and RNA molecules. Being a principal building block in macromolecular assemblies and pathways, the interactions underlie most cellular functions. Malfunctioning of macromolecular interactions is also linked to a number of diseases. Structural knowledge of the macromolecular interaction allows one to understand the interaction’s mechanism, determine its functional implications and characterize the effects of genetic variations, such as single nucleotide polymorphisms, on the interaction. Unfortunately, until now the interactions mediated by different types of macromolecules, e.g. protein–protein interactions or protein–DNA interactions, have been collected into individual and unrelated structural databases. This presents a significant obstacle in the analysis of macromolecular interactions. For instance, the homogeneous structural interaction databases prevent scientists from studying structural interactions of different types that occur in the same macromolecular complex. Here, we introduce DOMMINO 2.0, a structural Database Of Macro-Molecular INteractiOns. Compared to DOMMINO 1.0, a comprehensive database on protein-protein interactions, DOMMINO 2.0 includes the interactions between all three basic types of macromolecules extracted from PDB files. DOMMINO 2.0 is automatically updated on a weekly basis. It currently includes ∼1 040 000 interactions between two polypeptide subunits (e.g. domains, peptides, termini and interdomain linkers), ∼43 000 RNA-mediated interactions, and ∼12 000 DNA-mediated interactions. All protein structures in the database are annotated using SCOP and SUPERFAMILY family annotation. As a result, protein-mediated interactions involving protein domains, interdomain linkers, C- and N- termini, and peptides are identified. Our database provides an intuitive web interface, allowing one to investigate interactions at three different resolution levels: whole subunit network, binary interaction and interaction interface. Database URL: http://dommino.org PMID:26827237

  17. Morphology-based Query for Galaxy Image Databases

    NASA Astrophysics Data System (ADS)

    Shamir, Lior

    2017-02-01

    Galaxies of rare morphology are of paramount scientific interest, as they carry important information about the past, present, and future Universe. Once a rare galaxy is identified, studying it more effectively requires a set of galaxies of similar morphology, allowing generalization and statistical analysis that cannot be done when N=1. Databases generated by digital sky surveys can contain a very large number of galaxy images, and therefore once a rare galaxy of interest is identified it is possible that more instances of the same morphology are also present in the database. However, when a researcher identifies a certain galaxy of rare morphology in the database, it is virtually impossible to mine the database manually in the search for galaxies of similar morphology. Here we propose a computer method that can automatically search databases of galaxy images and identify galaxies that are morphologically similar to a certain user-defined query galaxy. That is, the researcher provides an image of a galaxy of interest, and the pattern recognition system automatically returns a list of galaxies that are visually similar to the target galaxy. The algorithm uses a comprehensive set of descriptors, allowing it to support different types of galaxies, and it is not limited to a finite set of known morphologies. While the list of returned galaxies is neither clean nor complete, it contains a far higher frequency of galaxies of the morphology of interest, providing a substantial reduction of the data. Such algorithms can be integrated into data management systems of autonomous digital sky surveys such as the Large Synoptic Survey Telescope (LSST), where the number of galaxies in the database is extremely large. The source code of the method is available at http://vfacstaff.ltu.edu/lshamir/downloads/udat.
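
    The query loop can be sketched as nearest-neighbour search over descriptor vectors: the researcher supplies one galaxy, and the database returns the morphologically closest entries. The two-feature descriptors below are invented stand-ins for the method's full descriptor set.

        # Hedged sketch of descriptor-based morphology querying; toy vectors.
        import numpy as np

        descriptors = np.array([[0.82, 0.10],   # 0: smooth elliptical
                                [0.30, 0.75],   # 1: tight spiral (the query)
                                [0.28, 0.80],   # 2: tight spiral
                                [0.05, 0.20]])  # 3: irregular

        def query(db, target_idx, k=2):
            """Rank all galaxies by Euclidean distance to the query descriptor."""
            d = np.linalg.norm(db - db[target_idx], axis=1)
            order = np.argsort(d)
            return [i for i in order if i != target_idx][:k]

        print(query(descriptors, 1))   # -> [2, ...]: the other tight spiral first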

  18. Enhancing Geoscience Research Discovery Through the Semantic Web

    NASA Astrophysics Data System (ADS)

    Rowan, Linda R.; Gross, M. Benjamin; Mayernik, Matthew; Khan, Huda; Boler, Frances; Maull, Keith; Stott, Don; Williams, Steve; Corson-Rikert, Jon; Johns, Erica M.; Daniels, Michael; Krafft, Dean B.; Meertens, Charles

    2016-04-01

    UNAVCO, UCAR, and Cornell University are working together to leverage semantic web technologies to enable discovery of people, datasets, publications and other research products, as well as the connections between them. The EarthCollab project, a U.S. National Science Foundation EarthCube Building Block, is enhancing an existing open-source semantic web application, VIVO, to enhance connectivity across distributed networks of researchers and resources related to the following two geoscience-based communities: (1) the Bering Sea Project, an interdisciplinary field program whose data archive is hosted by NCAR's Earth Observing Laboratory (EOL), and (2) UNAVCO, a geodetic facility and consortium that supports diverse research projects informed by geodesy. People, publications, datasets and grant information have been mapped to an extended version of the VIVO-ISF ontology and ingested into VIVO's database. Much of the VIVO ontology was built for the life sciences, so we have added some components of existing geoscience-based ontologies and a few terms from a local ontology that we created. The UNAVCO VIVO instance, connect.unavco.org, utilizes persistent identifiers whenever possible; for example using ORCIDs for people, publication DOIs, data DOIs and unique NSF grant numbers. Data is ingested using a custom set of scripts that include the ability to perform basic automated and curated disambiguation. VIVO can display a page for every object ingested, including connections to other objects in the VIVO database. A dataset page, for example, includes the dataset type, time interval, DOI, related publications, and authors. The dataset type field provides a connection to all other datasets of the same type. The author's page shows, among other information, related datasets and co-authors. Information previously spread across several unconnected databases is now stored in a single location. In addition to VIVO's default display, the new database can be queried using SPARQL, a query language for semantic data. EarthCollab is extending the VIVO web application. One such extension is the ability to cross-link separate VIVO instances across institutions, allowing local display of externally curated information. For example, Cornell's VIVO faculty pages will display UNAVCO's dataset information and UNAVCO's VIVO will display Cornell faculty member contact and position information. About half of UNAVCO's membership is international and we hope to connect our data to institutions in other countries with a similar approach. Additional extensions, including enhanced geospatial capabilities, will be developed based on task-centered usability testing.
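
    A sketch of the kind of SPARQL query the new database supports; the endpoint URL and the use of vivo:Dataset follow common VIVO/VIVO-ISF conventions but should be read as assumptions rather than the actual connect.unavco.org schema.

        # Hedged sketch: hypothetical endpoint path and ontology terms.
        from SPARQLWrapper import SPARQLWrapper, JSON

        sparql = SPARQLWrapper("https://connect.unavco.org/vivo/api/sparqlQuery")
        sparql.setQuery("""
            PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
            PREFIX vivo: <http://vivoweb.org/ontology/core#>
            SELECT ?dataset ?label WHERE {
              ?dataset a vivo:Dataset ;
                       rdfs:label ?label .
            } LIMIT 10
        """)
        sparql.setReturnFormat(JSON)
        for row in sparql.query().convert()["results"]["bindings"]:
            print(row["dataset"]["value"], row["label"]["value"])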

  19. More evidence for non-maternal inheritance of mitochondrial DNA?

    PubMed

    Bandelt, H-J; Kong, Q-P; Parson, W; Salas, A

    2005-12-01

    A single case of paternal co-transmission of mitochondrial DNA (mtDNA) in humans has been reported so far. The aim of this study was to find potential instances of non-maternal inheritance of mtDNA. Published medical case studies (of single patients) were searched for irregular mtDNA patterns by comparing the given haplotype information for different clones or tissues with the worldwide mtDNA database as known to date, a method that has proved robust and reliable for the detection of flawed mtDNA sequence data. More than 20 studies were found reporting clear-cut instances with mtDNAs of different ancestries in single individuals. As examples, cases are reviewed from recent published reports which, at face value, may be taken as evidence for paternal inheritance of mtDNA or recombination. Multiple types (or recombinant types) of quite dissimilar mitochondrial DNA from different parts of the known mtDNA phylogeny are often reported in single individuals. From re-analyses and corrigenda of forensic mtDNA data, it is apparent that the phenomenon of mixed or mosaic mtDNA can be ascribed solely to contamination and sample mix-up.

  20. Automatic analysis of online image data for law enforcement agencies by concept detection and instance search

    NASA Astrophysics Data System (ADS)

    de Boer, Maaike H. T.; Bouma, Henri; Kruithof, Maarten C.; ter Haar, Frank B.; Fischer, Noëlle M.; Hagendoorn, Laurens K.; Joosten, Bart; Raaijmakers, Stephan

    2017-10-01

    The information available online and offline, from open as well as private sources, is growing at an exponential rate and places an increasing demand on the limited resources of Law Enforcement Agencies (LEAs). The absence of appropriate tools and techniques to collect, process, and analyze these volumes of complex and heterogeneous data has created a severe information overload. If a solution is not found, the impact on law enforcement will be dramatic: important evidence may be missed, or investigations may take too long. Furthermore, the capability to deal with large volumes of complex and heterogeneous data from multiple open and private sources is uneven across national LEAs in the EU, which hinders cooperation and information sharing. Consequently, there is a pertinent need to develop tools, systems and processes that expedite online investigations. In this paper, we describe a suite of analysis tools to identify and localize generic concepts, instances of objects, and logos in images, which constitute a significant portion of everyday law enforcement data. We describe how incremental learning based on only a few examples and large-scale indexing are addressed in both concept detection and instance search. Our search technology allows querying of the database by visual examples and by keywords. Our tools are packaged in a Docker container to guarantee easy deployment on a system, and they exploit possibilities provided by open source toolboxes, contributing to the technical autonomy of LEAs.

  1. An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    AlQuraishi, Mohammed; Tang, Shengdong; Xia, Xide

    Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. We have developed an integrated affinity-structure database in which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis. Lastly, this database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.

  2. An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system

    DOE PAGES

    AlQuraishi, Mohammed; Tang, Shengdong; Xia, Xide

    2015-11-19

    Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. We have developed an integrated affinity-structure database in which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis. Lastly, this database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.

  3. An alternative database approach for management of SNOMED CT and improved patient data queries.

    PubMed

    Campbell, W Scott; Pedersen, Jay; McClay, James C; Rao, Praveen; Bastola, Dhundy; Campbell, James R

    2015-10-01

    SNOMED CT is the international lingua franca of terminologies for human health. Based in Description Logics (DL), the terminology enables data queries that incorporate inferred as well as explicitly stated relationships between data elements. However, the ontologic and polyhierarchical nature of the SNOMED CT concept model makes it difficult to implement in its entirety within electronic health record systems that largely employ object-oriented or relational database architectures. The result is a reduction of data richness, limitation of query capability, and increased systems overhead. The hypothesis of this research was that a graph database (graph DB) architecture using SNOMED CT as the basis for the data model, and subsequently modeling patient data upon the semantic core of SNOMED CT, could exploit the full value of the terminology to enrich and support advanced querying of patient data sets. The hypothesis was tested by instantiating a graph DB with the fully classified SNOMED CT concept model. The graph DB instance was tested for integrity by calculating the transitive closure table for the SNOMED CT hierarchy and comparing the results with transitive closure tables created using current, validated methods. The graph DB was then populated with 461,171 anonymized patient record fragments and over 2.1 million associated SNOMED CT clinical findings. Queries, including concept negation and disjunction, were then run against the graph database and against an enterprise Oracle relational database (RDBMS) holding the same patient data sets. The graph DB was then populated with laboratory data encoded using LOINC and medication data encoded with RxNorm, and complex queries combining LOINC, RxNorm and SNOMED CT were performed to identify uniquely described patient populations. A graph database instance was successfully created for two international releases of SNOMED CT and two US SNOMED CT editions. Transitive closure tables and descriptive statistics generated using the graph database were identical to those produced using validated methods. Patient queries produced patient counts identical to those from the Oracle RDBMS in comparable times. Database queries involving the defining attributes of SNOMED CT concepts were possible with the graph DB; the same queries could not be performed directly against the Oracle RDBMS representation of the patient data and required the creation and use of external terminology services. Further, queries of undefined depth successfully identified unknown relationships between patient cohorts. The results of this study supported the hypothesis that a patient database built upon and around the semantic model of SNOMED CT is possible. The model supported queries that leverage all aspects of the SNOMED CT logical model to produce clinically relevant results: logical disjunction and negation queries were possible, as were queries that extend beyond the structural IS_A hierarchy of SNOMED CT to employ the defining attribute-values of SNOMED CT concepts as search parameters. As medical terminologies such as SNOMED CT continue to expand, they will become more complex, and model consistency will be more difficult to assure. Simultaneously, consumers of data will increasingly demand improved query functionality that accommodates additional granularity of clinical concepts without sacrificing speed. This line of research provides an alternative approach to instantiating and querying patient data represented using advanced computable clinical terminologies. Copyright © 2015 Elsevier Inc. All rights reserved.
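
    As a concrete illustration of the integrity check mentioned above, the sketch below computes a transitive closure over a toy IS_A hierarchy and compares it with an expected closure table; the concept identifiers and pairs are placeholders, not a real SNOMED CT release.

```python
# Compute the transitive closure of a (toy) IS_A hierarchy and compare it
# against a closure table assumed to come from a validated method.
import networkx as nx

isa_edges = [
    ("22298006", "57809008"),  # toy: myocardial infarction IS_A myocardium disorder
    ("57809008", "56265001"),  # toy: myocardium disorder IS_A heart disease
]
g = nx.DiGraph(isa_edges)
closure_table = set(nx.transitive_closure(g).edges())

expected = {
    ("22298006", "57809008"),
    ("57809008", "56265001"),
    ("22298006", "56265001"),  # inferred ancestor pair
}
assert closure_table == expected  # integrity check passes
```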

  4. Recognizing vocal emotions in Mandarin Chinese: a validated database of Chinese vocal emotional stimuli.

    PubMed

    Liu, Pan; Pell, Marc D

    2012-12-01

    To establish a valid database of vocal emotional stimuli in Mandarin Chinese, a set of Chinese pseudosentences (i.e., semantically meaningless sentences that resembled real Chinese) was produced by four native Mandarin speakers to express seven emotional meanings: anger, disgust, fear, sadness, happiness, pleasant surprise, and neutrality. These expressions were identified by a group of native Mandarin listeners in a seven-alternative forced-choice task, and items reaching a recognition rate of at least three times chance performance in the seven-choice task were selected as the valid database and then subjected to acoustic analysis. The results demonstrated expected variations in both perceptual and acoustic patterns of the seven vocal emotions in Mandarin. For instance, fear, anger, sadness, and neutrality were associated with relatively high recognition, whereas happiness, disgust, and pleasant surprise were recognized less accurately. Acoustically, anger and pleasant surprise exhibited relatively high mean f0 values and large variation in f0 and amplitude; in contrast, sadness, disgust, fear, and neutrality exhibited relatively low mean f0 values and small amplitude variations, and happiness exhibited a moderate mean f0 value and f0 variation. Emotional expressions varied systematically in speech rate and harmonics-to-noise ratio values as well. This validated database is available to the research community and will contribute to future studies of emotional prosody for a number of purposes. To access the database, please contact pan.liu@mail.mcgill.ca.
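
    The selection rule described above is easy to state precisely: chance performance in a seven-alternative task is 1/7, so the inclusion threshold is 3/7, roughly 42.9%. A small sketch with invented recognition rates:

```python
# Keep only items recognized at >= 3x chance in a 7-alternative task.
CHANCE = 1.0 / 7.0       # ~14.3%
THRESHOLD = 3 * CHANCE   # ~42.9%

recognition = {"anger_01": 0.91, "happy_07": 0.38, "fear_03": 0.66}  # invented
valid = {item: r for item, r in recognition.items() if r >= THRESHOLD}
print(valid)  # {'anger_01': 0.91, 'fear_03': 0.66}
```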

  5. PhyloExplorer: a web server to validate, explore and query phylogenetic trees

    PubMed Central

    Ranwez, Vincent; Clairon, Nicolas; Delsuc, Frédéric; Pourali, Saeed; Auberval, Nicolas; Diser, Sorel; Berry, Vincent

    2009-01-01

    Background Many important problems in evolutionary biology require molecular phylogenies to be reconstructed. Phylogenetic trees must then be manipulated for subsequent inclusion in publications or analyses such as supertree inference and tree comparisons. However, no tool is currently available to facilitate the management of tree collections providing, for instance: standardisation of taxon names among trees with respect to a reference taxonomy; selection of relevant subsets of trees or sub-trees according to a taxonomic query; or simply computation of descriptive statistics on the collection. Moreover, although several databases of phylogenetic trees exist, there is currently no easy way to find trees that are both relevant and complementary to a given collection of trees. Results We propose a tool to facilitate assessment and management of phylogenetic tree collections. Given an input collection of rooted trees, PhyloExplorer provides facilities for obtaining statistics describing the collection, correcting invalid taxon names, extracting taxonomically relevant parts of the collection using a dedicated query language, and identifying related trees in the TreeBASE database. Conclusion PhyloExplorer is a simple and interactive website implemented through underlying Python libraries and MySQL databases. It is available at: and the source code can be downloaded from: . PMID:19450253

  6. Revisiting the Canadian English vowel space

    NASA Astrophysics Data System (ADS)

    Hagiwara, Robert

    2005-04-01

    In order to fill a need for experimental-acoustic baseline measurements of Canadian English vowels, a database is currently being constructed in Winnipeg, Manitoba. The database derives from multiple repetitions of fifteen English vowels (eleven standard monophthongs, syllabic /r/, and three standard diphthongs) in /hVd/ and /hVt/ contexts, as spoken by multiple speakers. Frequencies of the first four formants are taken at three timepoints in every vowel token (25, 50, and 75% of vowel duration). Preliminary results (from five men and five women) confirm some features characteristic of Canadian English, but call others into question. For instance, the merger of low back vowels appears to be complete for these speakers, but the result is a lower-mid and probably rounded vowel rather than the low back unround vowel often described. With these data, Canadian Raising can be quantified as an average 200 Hz or 1.5 Bark downward shift in the frequency of F1 before voiceless /t/. Analysis of the database will lead to a more accurate picture of the Canadian English vowel system, as well as provide a practical and up-to-date point of reference for further phonetic and sociophonetic comparisons.
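
    The measurement protocol described above (F1-F4 sampled at 25, 50, and 75% of vowel duration) can be sketched with the praat-parselmouth package; the filename is hypothetical, and formant-tracking settings would need tuning per speaker.

```python
# Sample the first four formants at three timepoints in a vowel token.
import parselmouth

snd = parselmouth.Sound("hVd_token.wav")  # one /hVd/ production (hypothetical)
formants = snd.to_formant_burg()          # Burg-method formant tracks

for frac in (0.25, 0.50, 0.75):
    t = frac * snd.duration
    f1_to_f4 = [formants.get_value_at_time(n, t) for n in range(1, 5)]
    print(f"{int(frac * 100)}% of duration:", f1_to_f4)
```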

  7. Computational prediction of new auxetic materials.

    PubMed

    Dagdelen, John; Montoya, Joseph; de Jong, Maarten; Persson, Kristin

    2017-08-22

    Auxetics comprise a rare family of materials that manifest a negative Poisson's ratio, which causes expansion instead of contraction under tension. Most known homogeneously auxetic materials are porous foams or artificial macrostructures, and there are few examples of inorganic materials that exhibit this behavior as polycrystalline solids. It is now possible to accelerate the discovery of materials with target properties, such as auxetics, using high-throughput computations, open databases, and efficient search algorithms. Candidates exhibiting features correlating with auxetic behavior were chosen from the set of more than 67 000 materials in the Materials Project database. Poisson's ratios were derived from the calculated elastic tensor of each material in this reduced set of compounds. We report that this strategy results in the prediction of three previously unidentified homogeneously auxetic materials, as well as a number of compounds with a near-zero homogeneous Poisson's ratio, which are here denoted "anepirretic materials". There are very few inorganic materials with an auxetic homogeneous Poisson's ratio in polycrystalline form. Here the authors develop an approach to screening materials databases for target properties such as negative Poisson's ratio by using stability and structural motifs to predict new instances of homogeneous auxetic behavior, as well as a number of materials with near-zero Poisson's ratio.
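
    The screening quantity involved can be made concrete: for an isotropic polycrystal, Poisson's ratio follows from the Voigt-Reuss-Hill bulk modulus K and shear modulus G, both derivable from a calculated elastic tensor. A worked sketch with illustrative moduli (not Materials Project data):

```python
# Homogeneous Poisson's ratio from isotropic elastic moduli:
#   nu = (3K - 2G) / (2(3K + G)); nu < 0 indicates auxetic behavior.
def homogeneous_poisson(k: float, g: float) -> float:
    return (3 * k - 2 * g) / (2 * (3 * k + g))

print(homogeneous_poisson(50.0, 30.0))  # 0.25 -> ordinary material
print(homogeneous_poisson(10.0, 40.0))  # ~-0.36 -> auxetic candidate
```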

  8. A cloud-based system for measuring radiation treatment plan similarity

    NASA Astrophysics Data System (ADS)

    Andrea, Jennifer

    PURPOSE: Radiation therapy is used to treat cancer using carefully designed plans that maximize the radiation dose delivered to the target and minimize damage to healthy tissue, with the dose administered over multiple occasions. Creating treatment plans is a laborious process, which presents an obstacle to more frequent replanning, and this remains an unsolved problem. In the interval between new plans being created, however, the patient's anatomy can change due to multiple factors, including reduction in tumor size and loss of weight, which results in poorer patient outcomes. Cloud computing is a newer technology that is slowly being adopted for medical applications with promising results. The objective of this work was to design and build a system that could analyze a database of previously created treatment plans, which are stored with their associated anatomical information in studies, to find the one with the anatomy most similar to a new patient. The analyses would be performed in parallel on the cloud to decrease the computation time needed to find this plan. METHODS: The system used SlicerRT, a radiation therapy toolkit for the open-source platform 3D Slicer, for its tools to perform the similarity analysis algorithm. Amazon Web Services was used for the cloud instances on which the analyses were performed, as well as for storage of the radiation therapy studies and messaging between the instances and a master local computer. A module was built in SlicerRT to provide the user with an interface to direct the system on the cloud, as well as to perform other related tasks. RESULTS: The cloud-based system outperformed previous methods of conducting the similarity analyses in terms of time, analyzing 100 studies in approximately 13 minutes while producing the same similarity values as those methods. It also scaled to larger numbers of studies in the database with only a small increase in computation time (just over 2 minutes). CONCLUSION: This system successfully analyzes a large database of radiation therapy studies and finds the one that is most similar to a new patient, which represents a potential step forward in achieving feasible adaptive radiation therapy replanning.

  9. Legacy2Drupal - Conversion of an existing oceanographic relational database to a semantically enabled Drupal content management system

    NASA Astrophysics Data System (ADS)

    Maffei, A. R.; Chandler, C. L.; Work, T.; Allen, J.; Groman, R. C.; Fox, P. A.

    2009-12-01

    Content Management Systems (CMSs) provide powerful features that can be of use to oceanographic (and other geoscience) data managers. However, in many instances, geoscience data management offices have previously designed customized schemas for their metadata. The WHOI Ocean Informatics initiative and the NSF-funded Biological and Chemical Oceanography Data Management Office (BCO-DMO) have jointly sponsored a project to port an existing relational database containing oceanographic metadata, along with an existing interface coded in ColdFusion middleware, to a Drupal6 Content Management System. The goal was to translate all the existing database tables, input forms, website reports, and other features present in the existing system to employ Drupal CMS features. The replacement features include Drupal content types, CCK node-reference fields, themes, RDB, SPARQL, workflow, and a number of other supporting modules. Strategic use of some Drupal6 CMS features enables three separate but complementary interfaces that provide access to oceanographic research metadata via the MySQL database: 1) a Drupal6-powered front-end; 2) a standard SQL port (used to provide a MapServer interface to the metadata and data); and 3) a SPARQL port (feeding a new faceted search capability under development). Future plans include the creation of science ontologies, by scientist/technologist teams, that will drive the semantically enabled faceted search capabilities planned for the site. Incorporation of the semantic technologies included in the future Drupal 7 core release is also anticipated. Using a public-domain CMS as opposed to proprietary middleware, and taking advantage of the many features of Drupal 6 that are designed to support semantically enabled interfaces, will help prepare the BCO-DMO database for interoperability with other ecosystem databases.

  10. The EXOSAT database and archive

    NASA Technical Reports Server (NTRS)

    Reynolds, A. P.; Parmar, A. N.

    1992-01-01

    The EXOSAT database provides on-line access to the results and data products (spectra, images, and lightcurves) from the EXOSAT mission, as well as access to data and logs from a number of other missions (such as EINSTEIN, COS-B, ROSAT, and IRAS). In addition, a number of familiar optical, infrared, and X-ray catalogs, including the Hubble Space Telescope (HST) guide star catalog, are available. The complete database is located at the EXOSAT observatory at ESTEC in the Netherlands and is accessible remotely via a captive account. The database management system was specifically developed to efficiently access the database and to allow the user to perform statistical studies on large samples of astronomical objects, as well as to retrieve scientific and bibliographic information on single sources. The system was designed to be mission independent and includes timing, image processing, and spectral analysis packages, as well as software to allow the easy transfer of analysis results and products to the user's own institute. The archive at ESTEC comprises a subset of the EXOSAT observations, stored on magnetic tape. Observations of particular interest were copied in compressed format to an optical jukebox, allowing users to retrieve and analyze selected raw data entirely from their terminals. Such analysis may be necessary if the user's needs are not accommodated by the products contained in the database (in terms of time resolution, spectral range, and the finesse of the background subtraction, for instance). Long-term archiving of the full final observation data is taking place at ESRIN in Italy as part of the ESIS program, again using optical media, and ESRIN has now assumed responsibility for distributing the data to the community. Tests showed that raw observational data (typically several tens of megabytes for a single target) can be transferred via the existing networks in reasonable time.

  11. CLSI-based transference of the CALIPER database of pediatric reference intervals from Abbott to Beckman, Ortho, Roche and Siemens Clinical Chemistry Assays: direct validation using reference samples from the CALIPER cohort.

    PubMed

    Estey, Mathew P; Cohen, Ashley H; Colantonio, David A; Chan, Man Khun; Marvasti, Tina Binesh; Randell, Edward; Delvin, Edgard; Cousineau, Jocelyne; Grey, Vijaylaxmi; Greenway, Donald; Meng, Qing H; Jung, Benjamin; Bhuiyan, Jalaluddin; Seccombe, David; Adeli, Khosrow

    2013-09-01

    The CALIPER program recently established a comprehensive database of age- and sex-stratified pediatric reference intervals for 40 biochemical markers. However, this database was only directly applicable to Abbott ARCHITECT assays. We therefore sought to expand the scope of this database to biochemical assays from other major manufacturers, allowing for a much wider application of the CALIPER database. Based on CLSI C28-A3 and EP9-A2 guidelines, CALIPER reference intervals were transferred (using specific statistical criteria) to assays performed on four other commonly used clinical chemistry platforms: Beckman Coulter DxC800, Ortho Vitros 5600, Roche Cobas 6000, and Siemens Vista 1500. The resulting reference intervals were subjected to a thorough validation using 100 reference specimens (healthy community children and adolescents) from the CALIPER bio-bank, and all testing centers participated in an external quality assessment (EQA) evaluation. In general, the transferred pediatric reference intervals were similar to those established in our previous study. However, assay-specific differences in reference limits were observed for many analytes, and in some instances were considerable. The results of the EQA evaluation generally mimicked the similarities and differences in reference limits among the five manufacturers' assays. In addition, the majority of transferred reference intervals were validated through the analysis of CALIPER reference samples. This study greatly extends the utility of the CALIPER reference interval database, which is now directly applicable to assays performed on five major analytical platforms in clinical use, and should permit the worldwide application of CALIPER pediatric reference intervals. Copyright © 2013 The Canadian Society of Clinical Chemists. Published by Elsevier Inc. All rights reserved.
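
    The direct validation step described above can be sketched as a simple check: count how many reference-sample results fall outside the transferred interval. CLSI C28-A3-style validation tolerates roughly 10% of results outside the interval; the analyte values below are invented.

```python
# Validate a transferred reference interval against reference samples.
def validate_interval(results, lower, upper, max_outside_frac=0.10):
    outside = sum(1 for x in results if x < lower or x > upper)
    return outside / len(results) <= max_outside_frac

alt_u_per_l = [22, 25, 17, 30, 28, 35, 19, 24, 26, 31]  # toy ALT results (U/L)
print(validate_interval(alt_u_per_l, lower=5, upper=45))  # True -> validated
```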

  12. A Dashboard for the Italian Computing in ALICE

    NASA Astrophysics Data System (ADS)

    Elia, D.; Vino, G.; Bagnasco, S.; Crescente, A.; Donvito, G.; Franco, A.; Lusso, S.; Mura, D.; Piano, S.; Platania, G.; ALICE Collaboration

    2017-10-01

    A dashboard devoted to computing at the Italian sites of the ALICE experiment at the LHC has been deployed. A combination of different complementary monitoring tools is typically used in most Tier-2 sites, which makes it difficult to assess the status of a site at a glance and to compare information extracted from different sources for debugging purposes. To overcome these limitations, a dedicated ALICE dashboard has been designed and implemented at each of the ALICE Tier-2 sites in Italy: in particular, it provides a single, interactive and easily customizable graphical interface where heterogeneous data are presented. The dashboard is based on two main ingredients: an open-source time-series database and a dashboard builder tool for visualizing time-series metrics. Various sensors, able to collect data from the multiple data sources, have also been written. A first version of a national computing dashboard has been implemented using a specific instance of the builder to gather data from all the local databases.
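
    The abstract does not name the specific tools, so the sketch below assumes an InfluxDB 1.x time-series database as a stand-in backend; the host, database, and measurement names are hypothetical. A "sensor" would push one site metric like this:

```python
# Push a single monitoring sample into a time-series database.
from influxdb import InfluxDBClient

client = InfluxDBClient(host="localhost", port=8086, database="alice_t2")
client.write_points([{
    "measurement": "running_jobs",
    "tags": {"site": "INFN-BARI", "vo": "alice"},
    "fields": {"value": 1423},
}])
# The dashboard builder then charts, e.g.:
#   SELECT mean("value") FROM "running_jobs" GROUP BY time(5m), "site"
```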

  13. Information Network Model Query Processing

    NASA Astrophysics Data System (ADS)

    Song, Xiaopu

    The Information Network Model (INM) [31] is a novel database model for managing real-world objects and relationships. It naturally and directly supports various kinds of static and dynamic relationships between objects. In INM, objects are networked through various natural and complex relationships. The INM Query Language (INM-QL) [30] is designed to explore such information networks; retrieve information about schemas, instances, their attributes, relationships, and context-dependent information; and process query results in a user-specified form. An INM database management system has been implemented using Berkeley DB, and it supports INM-QL. This thesis is mainly focused on the implementation of the subsystem that effectively and efficiently processes INM-QL. The subsystem provides a lexical and syntactic analyzer for INM-QL, and it is able to choose appropriate evaluation strategies and index mechanisms to process INM-QL queries without user intervention. It also uses an intermediate result structure to hold partial query results, along with other auxiliary structures, to reduce the complexity of query processing.

  14. Usability flaws of medication-related alerting functions: A systematic qualitative review.

    PubMed

    Marcilly, Romaric; Ammenwerth, Elske; Vasseur, Francis; Roehrer, Erin; Beuscart-Zéphir, Marie-Catherine

    2015-06-01

    Medication-related alerting functions may include usability flaws that limit their optimal use. A first step toward preventing usability flaws is to understand their characteristics. This systematic qualitative review aims to analyze the types of usability flaws found in medication-related alerting functions. Papers were searched via the PubMed, Scopus and Ergonomics Abstracts databases, along with reference lists. Paper selection, data extraction and data analysis were performed by two to three Human Factors experts. Meaningful semantic units representing instances of usability flaws were the main data extracted. They were analyzed through qualitative methods: categorization following general usability heuristics, and an inductive process for the flaws specific to medication-related alerting functions. From the 6380 papers initially identified, 26 met all eligibility criteria. The analysis of the papers identified a total of 168 instances of usability flaws that could be classified into 13 categories, representing either violations of general usability principles (i.e. flaws that could be found in any system, e.g. guidance and workload issues) or infractions specific to medication-related alerting functions. The latter refer to issues of low signal-to-noise ratio, incomplete alert content, transparency, presentation mode and timing, missing alert features, and task and control distribution. The list of 168 instances of usability flaws of medication-related alerting functions provides a source of knowledge for checking the usability of medication-related alerting functions during their design and evaluation process, and ultimately for constructing evidence-based usability design principles for these functions. Copyright © 2015 Elsevier Inc. All rights reserved.

  15. The influence of negative training set size on machine learning-based virtual screening.

    PubMed

    Kurczab, Rafał; Smusz, Sabina; Bojarski, Andrzej J

    2014-01-01

    The paper presents a thorough analysis of the influence of the number of negative training examples on the performance of machine learning methods. The impact of this rather neglected aspect of machine learning applications was examined for sets containing a fixed number of positive examples and a varying number of negative examples randomly selected from the ZINC database. An increase in the ratio of positive to negative training instances was found to greatly influence most of the investigated evaluation parameters of ML methods in simulated virtual screening experiments. In a majority of cases, substantial increases in precision and MCC were observed, in conjunction with some decreases in hit recall. The analysis of the dynamics of those variations allowed us to recommend an optimal composition of training data. The study was performed on several protein targets, 5 machine learning algorithms (SMO, Naïve Bayes, IBk, J48 and Random Forest) and 2 types of molecular fingerprints (MACCS and CDK FP). The most effective classification was provided by the combination of CDK FP with the SMO or Random Forest algorithms. The Naïve Bayes models appeared to be hardly sensitive to changes in the number of negative instances in the training set. In conclusion, the ratio of positive to negative training instances should be taken into account during the preparation of machine learning experiments, as it might significantly influence the performance of a particular classifier. What is more, the optimization of negative training set size can be applied as a boosting-like approach in machine learning-based virtual screening.
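
    The experimental setup lends itself to a compact sketch: hold the positives fixed, vary the number of randomly drawn negatives, and track precision, recall, and MCC. The synthetic binary fingerprints below merely stand in for ZINC-derived ones.

```python
# Vary the negative:positive ratio and watch evaluation metrics shift.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import matthews_corrcoef, precision_score, recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
pos = (rng.random((200, 166)) > 0.6).astype(float)   # stand-in "actives"
neg = (rng.random((5000, 166)) > 0.5).astype(float)  # stand-in "decoys"

for n_neg in (200, 1000, 5000):
    X = np.vstack([pos, neg[:n_neg]])
    y = np.array([1] * len(pos) + [0] * n_neg)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
    pred = RandomForestClassifier(random_state=0).fit(X_tr, y_tr).predict(X_te)
    print(n_neg, precision_score(y_te, pred, zero_division=0),
          recall_score(y_te, pred), matthews_corrcoef(y_te, pred))
```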

  16. The influence of negative training set size on machine learning-based virtual screening

    PubMed Central

    2014-01-01

    Background The paper presents a thorough analysis of the influence of the number of negative training examples on the performance of machine learning methods. Results The impact of this rather neglected aspect of machine learning applications was examined for sets containing a fixed number of positive examples and a varying number of negative examples randomly selected from the ZINC database. An increase in the ratio of positive to negative training instances was found to greatly influence most of the investigated evaluation parameters of ML methods in simulated virtual screening experiments. In a majority of cases, substantial increases in precision and MCC were observed, in conjunction with some decreases in hit recall. The analysis of the dynamics of those variations allowed us to recommend an optimal composition of training data. The study was performed on several protein targets, 5 machine learning algorithms (SMO, Naïve Bayes, IBk, J48 and Random Forest) and 2 types of molecular fingerprints (MACCS and CDK FP). The most effective classification was provided by the combination of CDK FP with the SMO or Random Forest algorithms. The Naïve Bayes models appeared to be hardly sensitive to changes in the number of negative instances in the training set. Conclusions The ratio of positive to negative training instances should be taken into account during the preparation of machine learning experiments, as it might significantly influence the performance of a particular classifier. What is more, the optimization of negative training set size can be applied as a boosting-like approach in machine learning-based virtual screening. PMID:24976867

  17. LISTA, LISTA-HOP and LISTA-HON: a comprehensive compilation of protein encoding sequences and its associated homology databases from the yeast Saccharomyces.

    PubMed Central

    Dölz, R; Mossé, M O; Slonimski, P P; Bairoch, A; Linder, P

    1996-01-01

    We continued our effort to make a comprehensive database (LISTA) for the yeast Saccharomyces cerevisiae. As in previous editions, the genetic names are consistently associated with each sequence having a known and confirmed ORF. If necessary, synonyms are given in the case of allelic duplicated sequences. Although the first publication of a sequence gives, according to our rules, the genetic name of a gene, in some instances more commonly used names are given to avoid nomenclature problems and the use of ancient designations that are no longer used. In these cases the old designation is given as a synonym. Thus sequences can be found either by name or by the synonyms given in LISTA. Each entry contains the genetic name, the mnemonic from the EMBL data bank, the codon bias, the reference of the publication of the sequence, the chromosomal location as far as known, and SWISSPROT and EMBL accession numbers. New entries will also contain the name from the systematic sequencing efforts. Since the release of LISTA4.1 we update the database continuously. To obtain more information on the included sequences, each entry has been screened against non-redundant nucleotide and protein data bank collections, resulting in LISTA-HON and LISTA-HOP. This release includes reports from full Smith and Waterman peptide-level searches against a non-redundant protein sequence database. The LISTA database can be linked to the associated data sets or to nucleotide and protein banks by the Sequence Retrieval System (SRS). The database is available by FTP and on the World Wide Web. PMID:8594599

  18. PDB ligand conformational energies calculated quantum-mechanically.

    PubMed

    Sitzmann, Markus; Weidlich, Iwona E; Filippov, Igor V; Liao, Chenzhong; Peach, Megan L; Ihlenfeldt, Wolf-Dietrich; Karki, Rajeshri G; Borodina, Yulia V; Cachau, Raul E; Nicklaus, Marc C

    2012-03-26

    We present here a greatly updated version of an earlier study on the conformational energies of protein-ligand complexes in the Protein Data Bank (PDB) [Nicklaus et al. Bioorg. Med. Chem. 1995, 3, 411-428], with the goal of improving on all possible aspects, such as the number and selection of ligand instances, the energy calculations performed, and the additional analyses conducted. Starting from about 357,000 ligand instances deposited in the 2008 version of the Ligand Expo database of the experimental 3D coordinates of all small-molecule instances in the PDB, we created a "high-quality" subset of ligand instances by various filtering steps, including application of crystallographic quality criteria and structural unambiguousness. Submission of 640 Gaussian 03 jobs yielded a set of about 415 successfully concluded runs. We used a stepwise optimization of internal degrees of freedom at the DFT level of theory with the B3LYP/6-31G(d) basis set and a single-point energy calculation at B3LYP/6-311++G(3df,2p) after each round of (partial) optimization to separate energy changes due to bond length stretches vs bond angle changes vs torsion changes. Even for the most "conservative" choice among the possible conformational energies (the energy difference between the conformation in which all internal degrees of freedom except torsions have been optimized and the fully optimized conformer), significant energy values were found. The range of 0 to ~25 kcal/mol was populated quite evenly and independently of the crystallographic resolution. A smaller number of "outliers" of yet higher energies were seen only at resolutions above 1.3 Å. The energies showed some correlation with molecular size and flexibility, but not with crystallographic quality metrics such as the Cruickshank diffraction-component precision index (DPI) and R(free)-R, or with ligand instance-specific metrics such as occupancy-weighted B-factor (OWAB), real-space R factor (RSR), and real-space correlation coefficient (RSCC). We repeated these calculations with the solvent model IEFPCM, which yielded energy differences that were generally somewhat lower than the corresponding vacuum results but did not produce a qualitatively different picture. Torsional sampling around the crystal conformation at the molecular mechanics level using the MMFF94s force field typically led to an increase in energy. © 2012 American Chemical Society
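
    The bookkeeping behind the reported energies reduces to a unit conversion: the conformational energy is the difference of two single-point energies in hartree, multiplied by ~627.5 kcal/mol per hartree. The two energies below are invented for illustration.

```python
# Conformational energy from two single-point energies (hartree -> kcal/mol).
HARTREE_TO_KCAL = 627.509

e_torsions_frozen = -1234.567890  # everything optimized except torsions
e_fully_relaxed = -1234.601234    # fully optimized conformer

strain = (e_torsions_frozen - e_fully_relaxed) * HARTREE_TO_KCAL
print(f"conformational energy: {strain:.1f} kcal/mol")  # ~20.9 kcal/mol
```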

  19. Turbulence Modeling: Progress and Future Outlook

    NASA Technical Reports Server (NTRS)

    Marvin, Joseph G.; Huang, George P.

    1996-01-01

    Progress in the development of the hierarchy of turbulence models for Reynolds-averaged Navier-Stokes codes used in aerodynamic applications is reviewed. Steady progress is demonstrated, but transfer of the modeling technology has not kept pace with the development and demands of the computational fluid dynamics (CFD) tools. An examination of the process of model development leads to recommendations for a mid-course correction involving close coordination between modelers, CFD developers, and application engineers. In instances where the old process is changed and cooperation enhanced, timely transfer is realized. A turbulence modeling information database is proposed to refine the process and open it to greater participation among modeling and CFD practitioners.

  20. Obedience lite.

    PubMed

    Elms, Alan C

    2009-01-01

    Jerry M. Burger's partial replication of Stanley Milgram's (1963, 1965, 1974) classic experiments on obedience to authority is considered from the viewpoint of a contributor and witness to the original obedience experiments. Although Burger's replication succeeded in terms of gaining the approval of his local institutional review board, it did so by removing a large portion of the stressful circumstances that made Milgram's findings so psychologically interesting and so broadly applicable to instances of real-world destructive obedience. However, Burger has provided an initial demonstration that his "obedience lite" procedures can be used to extend the study of certain situational and personality variables beyond those examined by Milgram. PsycINFO Database Record 2009 APA.

  1. Technology solutions to support supervisory activities and also to provide information access to society

    NASA Astrophysics Data System (ADS)

    Paladini, D.; Mello, A. B.

    2016-07-01

    Inmetro's data about the conformity of certified products, processes and services are usually stored in fragmented databases that are difficult to access for several reasons, for instance the lack of computational solutions that give users this kind of access. A discussion of technological solutions to support supervisory activities by the appropriate regulatory bodies, and also to provide information access to society in general, is herein presented, along with a theoretical explanation of the pros and cons of such technologies, leading to the conclusion that a mobile platform seems to be the best tool for Inmetro's requirements.

  2. Exploration of large, rare copy number variants associated with psychiatric and neurodevelopmental disorders in individuals with anorexia nervosa.

    PubMed

    Yilmaz, Zeynep; Szatkiewicz, Jin P; Crowley, James J; Ancalade, NaEshia; Brandys, Marek K; van Elburg, Annemarie; de Kovel, Carolien G F; Adan, Roger A H; Hinney, Anke; Hebebrand, Johannes; Gratacos, Monica; Fernandez-Aranda, Fernando; Escaramis, Georgia; Gonzalez, Juan R; Estivill, Xavier; Zeggini, Eleftheria; Sullivan, Patrick F; Bulik, Cynthia M

    2017-08-01

    Anorexia nervosa (AN) is a serious and heritable psychiatric disorder. To date, studies of copy number variants (CNVs) have been limited and inconclusive because of small sample sizes. We conducted a case-only genome-wide CNV survey in 1983 female AN cases included in the Genetic Consortium for Anorexia Nervosa. Following stringent quality control procedures, we investigated whether pathogenic CNVs in regions previously implicated in psychiatric and neurodevelopmental disorders were present in AN cases. We observed two instances of well-established pathogenic CNVs in AN cases. In addition, one case had a deletion in the 13q12 region, overlapping with a deletion reported previously in two AN cases. As a secondary aim, we also examined our sample for CNVs over 1 Mbp in size. Of the 40 instances of such large CNVs that had not previously been implicated in AN or neuropsychiatric phenotypes, two contained genes with previous neuropsychiatric associations, and only five had no associated reports in public CNV databases. Although ours is the largest study of its kind in AN, larger datasets are needed to comprehensively assess the role of CNVs in the etiology of AN.

  3. An expectancy-value model of emotion regulation: implications for motivation, emotional experience, and decision making.

    PubMed

    Tamir, Maya; Bigman, Yochanan E; Rhodes, Emily; Salerno, James; Schreier, Jenna

    2015-02-01

    According to expectancy-value models of self-regulation, people are motivated to act in ways they expect to be useful to them. For instance, people are motivated to run when they believe running is useful, even when they have nothing to run away from. Similarly, we propose an expectancy-value model of emotion regulation, according to which people are motivated to emote in ways they expect to be useful to them, regardless of immediate contextual demands. For instance, people may be motivated to get angry when they believe anger is useful, even when there is nothing to be angry about. In 5 studies, we demonstrate that leading people to expect an emotion to be useful increased their motivation to experience that emotion (Studies 1-5), led them to up-regulate the experience of that emotion (Studies 3-4), and led to emotion-consistent behavior (Study 4). Our hypotheses were supported when we manipulated the expected value of anxiety (Study 1) and anger (Studies 2-5), both consciously (Studies 1-4) and unconsciously (Study 5). We discuss the theoretical and pragmatic implications of the proposed model. PsycINFO Database Record (c) 2015 APA, all rights reserved.

  4. Time-varying BRDFs.

    PubMed

    Sun, Bo; Sunkavalli, Kalyan; Ramamoorthi, Ravi; Belhumeur, Peter N; Nayar, Shree K

    2007-01-01

    The properties of virtually all real-world materials change with time, causing their bidirectional reflectance distribution functions (BRDFs) to be time varying. However, none of the existing BRDF models and databases take time variation into consideration; they represent the appearance of a material at a single time instance. In this paper, we address the acquisition, analysis, modeling, and rendering of a wide range of time-varying BRDFs (TVBRDFs). We have developed an acquisition system that is capable of sampling a material's BRDF at multiple time instances, with each time sample acquired within 36 sec. We have used this acquisition system to measure the BRDFs of a wide range of time-varying phenomena, which include the drying of various types of paints (watercolor, spray, and oil), the drying of wet rough surfaces (cement, plaster, and fabrics), the accumulation of dusts (household and joint compound) on surfaces, and the melting of materials (chocolate). Analytic BRDF functions are fit to these measurements and the model parameters' variations with time are analyzed. Each category exhibits interesting and sometimes nonintuitive parameter trends. These parameter trends are then used to develop analytic TVBRDF models. The analytic TVBRDF models enable us to apply effects such as paint drying and dust accumulation to arbitrary surfaces and novel materials.

  5. The Human Transcript Database: A Catalogue of Full Length cDNA Inserts

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bouck, John; McLeod, Michael; Worley, Kim

    1999-09-10

    The BCM Search Launcher provided improved access to web-based sequence analysis services during the granting period and beyond. The Search Launcher web site grouped analysis procedures by function and provided default parameters that gave reasonable search results for most applications. For instance, most queries were automatically masked for repeat sequences prior to sequence database searches to avoid spurious matches. In addition to web-based access and arrangements that made the functions easier to use, the BCM Search Launcher provided unique value-added applications like the BEAUTY sequence database search tool, which combined information about protein domains with sequence database search results to give an enhanced, more complete picture of the reliability and relative value of the information reported. This enhanced search tool made evaluating search results more straightforward and consistent. Some of the favorite features of the web site are the sequence utilities and the batch client functionality that allows processing of multiple samples from the command-line interface. One measure of the success of the BCM Search Launcher is the number of sites that have adopted the models first developed on the site. The graphic display on the BLAST search from the NCBI web site is one such outgrowth, as are the display of protein domain search results within BLAST search results and the design of the Biology Workbench application. The logs of usage and comments from users confirm the great utility of this resource.

  6. ARIADNE: a Tracking System for Relationships in LHCb Metadata

    NASA Astrophysics Data System (ADS)

    Shapoval, I.; Clemencic, M.; Cattaneo, M.

    2014-06-01

    The data processing model of the LHCb experiment implies handling of an evolving set of heterogeneous metadata entities and relationships between them. The entities range from software and database states to architecture specifications and software/data deployment locations. For instance, there is an important relationship between the LHCb Conditions Database (CondDB), which provides versioned, time-dependent geometry and conditions data, and the LHCb software, i.e. the data processing applications used for simulation, high-level triggering, reconstruction and analysis of physics data. The evolution of CondDB and of the LHCb applications is a weakly-homomorphic process: relationships between a CondDB state and an LHCb application state may not be preserved across different database and application generations. These issues may lead to various kinds of problems in LHCb production, varying from unexpected application crashes to incorrect data processing results. In this paper we present Ariadne, a generic metadata relationship tracking system based on the Neo4j NoSQL graph database. Its aim is to track and analyze many thousands of evolving relationships for cases such as the one described above, and several others, which would otherwise remain unmanaged and potentially harmful. The highlights of the paper include the system's implementation and management details, the infrastructure needed for running it, security issues, first experience of usage in LHCb production, and the potential of the system to be applied to a wider set of LHCb tasks.
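
    A hedged sketch of how such a tracker might record a relationship in Neo4j from Python follows; the node labels, relationship type, and credentials are assumptions, not Ariadne's actual schema.

```python
# Record a compatibility edge between a CondDB state and an application version.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))
with driver.session() as session:
    session.run(
        """
        MERGE (db:CondDBState {tag: $tag})
        MERGE (app:Application {name: $app, version: $version})
        MERGE (app)-[:COMPATIBLE_WITH]->(db)
        """,
        tag="cond-20140604", app="Brunel", version="v45r0",  # hypothetical values
    )
driver.close()
```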

  7. PDB-Ligand: a ligand database based on PDB for the automated and customized classification of ligand-binding structures.

    PubMed

    Shin, Jae-Min; Cho, Doo-Ho

    2005-01-01

    PDB-Ligand (http://www.idrtech.com/PDB-Ligand/) is a three-dimensional structure database of small molecular ligands that are bound to larger biomolecules deposited in the Protein Data Bank (PDB). It is also a database tool that allows one to browse, classify, superimpose and visualize these structures. As of May 2004, there are about 4870 types of small molecular ligands, experimentally determined as a complex with protein or DNA in the PDB. The proteins that a given ligand binds are often homologous and present the same binding structure to the ligand. However, there are also many instances wherein a given ligand binds to two or more unrelated proteins, or to the same or homologous protein in different binding environments. PDB-Ligand serves as an interactive structural analysis and clustering tool for all the ligand-binding structures in the PDB. PDB-Ligand also provides an easier way to obtain a number of different structure alignments of many related ligand-binding structures based on a simple and flexible ligand clustering method. PDB-Ligand will be a good resource for both a better interpretation of ligand-binding structures and the development of better scoring functions to be used in many drug discovery applications.

  8. mirPub: a database for searching microRNA publications.

    PubMed

    Vergoulis, Thanasis; Kanellos, Ilias; Kostoulas, Nikos; Georgakilas, Georgios; Sellis, Timos; Hatzigeorgiou, Artemis; Dalamagas, Theodore

    2015-05-01

    Identifying, among the millions of publications available in MEDLINE, those that are relevant to specific microRNAs (miRNAs) of interest via keyword search faces major obstacles. References to miRNA names in the literature often deviate from standard nomenclature for various reasons, since even the official nomenclature evolves. For instance, a single miRNA name may identify two completely different molecules, or two different names may refer to the same molecule. mirPub is a database with a powerful and intuitive interface, which facilitates searching for miRNA literature and addresses the aforementioned issues. To provide effective search services, mirPub applies text mining techniques to MEDLINE, integrates data from several curated databases, and exploits data from its user community following a crowdsourcing approach. Other key features include an interactive visualization service that intuitively illustrates the evolution of miRNA data, tag clouds summarizing the relevance of publications to particular diseases, cell types or tissues, and access to TarBase 6.0 data to oversee genes related to miRNA publications. mirPub is freely available at http://www.microrna.gr/mirpub/. Contact: vergoulis@imis.athena-innovation.gr or dalamag@imis.athena-innovation.gr. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
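
    One ingredient such text mining needs is tolerance for miRNA naming variants. A minimal sketch of a matching pattern follows; it is illustrative only, not mirPub's actual pattern.

```python
# Match common miRNA name variants, e.g. "hsa-miR-21-5p", "let-7a", "miR-155".
import re

MIRNA = re.compile(
    r"\b(?:[a-z]{3}-)?(?:miR|mir|let)-\d+[a-z]?(?:-\d+)?(?:-[35]p)?\b"
)

text = "Expression of hsa-miR-21-5p and let-7a was measured; miR-155 was unchanged."
print(MIRNA.findall(text))  # ['hsa-miR-21-5p', 'let-7a', 'miR-155']
```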

  9. The European Southern Observatory-MIDAS table file system

    NASA Technical Reports Server (NTRS)

    Peron, M.; Grosbol, P.

    1992-01-01

    The new and substantially upgraded version of the Table File System (TFS) in MIDAS is presented as a scientific database system. MIDAS applications for performing database operations on tables are discussed, for instance the exchange of data to and from the TFS, the selection of objects, the uncertainty joins across tables, and the graphical representation of data. This upgraded version of the TFS is a full implementation of the binary table extension of the FITS format; in addition, it also supports arrays of strings. Different storage strategies for optimal access to very large data sets are implemented and are addressed in detail. As a simple relational database, the TFS may be used for the management of personal data files. This opens the way to intelligent pipeline processing of large amounts of data. One of the key features of the Table File System is to provide an extensive set of tools for the analysis of the final results of a reduction process. Column operations using standard and special mathematical functions, as well as statistical distributions, can be carried out; commands for linear regression and model fitting using nonlinear least-squares methods and user-defined functions are available. Finally, statistical tests of hypotheses and multivariate methods can also operate on tables.

  10. MM-MDS: a multidimensional scaling database with similarity ratings for 240 object categories from the Massive Memory picture database.

    PubMed

    Hout, Michael C; Goldinger, Stephen D; Brady, Kyle J

    2014-01-01

    Cognitive theories in visual attention and perception, categorization, and memory often critically rely on concepts of similarity among objects, and empirically require measures of "sameness" among their stimuli. For instance, a researcher may require similarity estimates among multiple exemplars of a target category in visual search, or between targets and lures in recognition memory. Quantifying similarity, however, is challenging when everyday items are the desired stimulus set, particularly when researchers require several different pictures from the same category. In this article, we document a new multidimensional scaling database with similarity ratings for 240 categories, each containing color photographs of 16-17 exemplar objects. We collected similarity ratings using the spatial arrangement method. Reports include the multidimensional scaling solutions for each category (up to five dimensions), stress and fit measures, coordinate locations for each stimulus, and two new classifications. For each picture, we categorized the item's prototypicality, indexed by its proximity to other items in the space. We also classified pairs of images along a continuum of similarity, by assessing the overall arrangement of each MDS space. These similarity ratings will be useful to any researcher who wishes to control the similarity of experimental stimuli according to an objective quantification of "sameness."
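
    For context, the kind of output the database reports (coordinate locations plus stress, per category) can be reproduced on toy data with a standard MDS implementation; the 4x4 dissimilarity matrix below is invented, not MM-MDS ratings.

```python
# Derive a 2-D MDS solution and its stress from pairwise dissimilarities.
import numpy as np
from sklearn.manifold import MDS

dissim = np.array([[0.0, 1.0, 3.0, 4.0],
                   [1.0, 0.0, 2.0, 3.5],
                   [3.0, 2.0, 0.0, 1.5],
                   [4.0, 3.5, 1.5, 0.0]])

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dissim)  # one (x, y) location per exemplar
print(coords)
print("stress:", mds.stress_)       # badness-of-fit of the 2-D solution
```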

  11. Multi-instance learning based on instance consistency for image retrieval

    NASA Astrophysics Data System (ADS)

    Zhang, Miao; Wu, Zhize; Wan, Shouhong; Yue, Lihua; Yin, Bangjie

    2017-07-01

    Multiple-instance learning (MIL) has been successfully utilized in image retrieval. Existing approaches, however, cannot reliably select positive instances from positive bags, which may result in low accuracy. In this paper, we propose a new image retrieval approach called multiple-instance learning based on instance consistency (MILIC) to mitigate this issue. First, we select potential positive instances effectively in each positive bag by ranking the instance-consistency (IC) values of its instances. Then, we design a feature representation scheme based on the potential positive instances, which can represent the relationship among bags and instances, to convert a bag into a single instance. Finally, we can use a standard single-instance learning strategy, such as the support vector machine, to perform object-based image retrieval. Experimental results on two challenging data sets show the effectiveness of our proposal in terms of accuracy and run time.
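
    A simplified Python sketch of this pipeline follows; the instance-consistency score used here (mean similarity of an instance to its nearest neighbor in every other positive bag) is a stand-in for the paper's definition, and the data are synthetic.

        # Simplified MILIC-style pipeline: rank instances, select prototypes,
        # convert bags to single vectors, then train a standard SVM.
        import numpy as np
        from sklearn.svm import SVC

        rng = np.random.default_rng(1)
        pos_bags = [rng.normal(0.0, 1.0, (8, 16)) for _ in range(10)]
        neg_bags = [rng.normal(2.0, 1.0, (8, 16)) for _ in range(10)]

        def ic_scores(bag, other_bags):
            # Mean over other bags of the best similarity (negative distance)
            # between each instance and any instance in that bag.
            return np.array([np.mean([np.max(-np.linalg.norm(ob - x, axis=1))
                                      for ob in other_bags]) for x in bag])

        def bag_to_vector(bag, prototypes):
            # Represent a bag by its minimum distance to each prototype.
            return np.array([np.min(np.linalg.norm(bag - p, axis=1))
                             for p in prototypes])

        # Step 1: pick the most consistent instance of each positive bag.
        prototypes = [bag[np.argmax(ic_scores(bag, [b for b in pos_bags
                                                    if b is not bag]))]
                      for bag in pos_bags]

        # Step 2: convert every bag into a single feature vector.
        X = np.array([bag_to_vector(b, prototypes) for b in pos_bags + neg_bags])
        y = np.array([1] * len(pos_bags) + [0] * len(neg_bags))

        # Step 3: standard single-instance learner.
        clf = SVC(kernel="rbf").fit(X, y)
        print("training accuracy:", clf.score(X, y))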

  12. KaBOB: ontology-based semantic integration of biomedical databases.

    PubMed

    Livingston, Kevin M; Bada, Michael; Baumgartner, William A; Hunter, Lawrence E

    2015-04-23

    The ability to query many independent biological databases using a common ontology-based semantic model would facilitate deeper integration and more effective utilization of these diverse and rapidly growing resources. Despite ongoing work moving toward shared data formats and linked identifiers, significant problems persist in establishing shared identity and shared meaning across heterogeneous biomedical data sources. We present five processes for semantic data integration that, when applied collectively, solve seven key problems. These processes include making explicit the differences between biomedical concepts and database records, aggregating sets of identifiers denoting the same biomedical concepts across data sources, and using declaratively represented forward-chaining rules to take information that is variably represented in source databases and integrate it into a consistent biomedical representation. We demonstrate these processes and solutions by presenting KaBOB (the Knowledge Base Of Biomedicine), a knowledge base of semantically integrated data from 18 prominent biomedical databases using common representations grounded in Open Biomedical Ontologies. An instance of KaBOB with data about humans and seven major model organisms can be built using on the order of 500 million RDF triples. All source code for building KaBOB is available under an open-source license. KaBOB is an integrated knowledge base of biomedical data representationally based in prominent, actively maintained Open Biomedical Ontologies, thus enabling queries of the underlying data in terms of biomedical concepts (e.g., genes and gene products, interactions and processes) rather than features of source-specific data schemas or file formats. KaBOB resolves many of the issues that routinely plague biomedical researchers intending to work with data from multiple data sources and provides a platform for ongoing data integration and development and for formal reasoning over a wealth of integrated biomedical data.
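
    The toy rdflib sketch below illustrates the KaBOB pattern of keeping database records and biomedical concepts distinct and using a declaratively represented forward-chaining rule (here a SPARQL CONSTRUCT) to lift record-level assertions to the concept level; all names are invented.

        # Record/concept separation plus a forward-chaining rule in rdflib.
        from rdflib import Graph, Namespace

        EX = Namespace("http://example.org/")
        g = Graph()

        # Record-level triples: source-database records denote gene concepts.
        g.add((EX.record_uniprot_P04637, EX.denotes, EX.gene_TP53))
        g.add((EX.record_uniprot_P38398, EX.denotes, EX.gene_BRCA1))
        g.add((EX.record_uniprot_P04637, EX.interactsWithRecord,
               EX.record_uniprot_P38398))

        # Rule as a SPARQL CONSTRUCT: lift record-level interactions to
        # concept-level interactions.
        rule = """
        PREFIX ex: <http://example.org/>
        CONSTRUCT { ?c1 ex:interactsWith ?c2 }
        WHERE {
          ?r1 ex:interactsWithRecord ?r2 .
          ?r1 ex:denotes ?c1 .
          ?r2 ex:denotes ?c2 .
        }
        """
        for triple in g.query(rule):
            g.add(triple)   # materialise the inferred concept-level edges

        print(list(g.triples((None, EX.interactsWith, None))))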

  13. Effects of animal-assisted therapy on concentration and attention span in patients with acquired brain injury: A randomized controlled trial.

    PubMed

    Gocheva, Vanya; Hund-Georgiadis, Margret; Hediger, Karin

    2018-01-01

    Previous studies have reported that brain-injured patients frequently suffer from cognitive impairments such as attention and concentration deficits. Numerous rehabilitation clinics offer animal-assisted therapy (AAT) to address these difficulties. The authors' aim was to investigate the immediate effects of AAT on the concentration and attention span of brain-injured patients. Nineteen patients with acquired brain injury were included in a randomized, controlled, within-subject trial. The patients alternately received 12 standard therapy sessions (speech therapy, physiotherapy, occupational therapy) and 12 paralleled AAT sessions with comparable content. A total of 429 therapy sessions were analyzed, consisting of 214 AAT and 215 control sessions. Attention span and instances of distraction were assessed via video coding in Noldus Observer. The Mehrdimensionaler Befindlichkeitsbogen ([Multidimensional Affect Rating Scale] MDBF questionnaire; Steyer, Schwenkmezger, Notz, & Eid, 1997) was used to measure the patients' self-rated alertness. Concentration was assessed through a Visual Analogue Scale (VAS) via self-assessment and therapists' ratings. The patients' attention span did not differ whether an animal was present or not. However, patients displayed more instances of distraction during AAT. Moreover, patients rated themselves as more concentrated and alert during AAT sessions. Further, the therapists' evaluation of patients' concentration indicated that patients were more concentrated in AAT compared with the control condition. Although the patients displayed more instances of distraction while in the presence of an animal, this did not have a negative impact on their attention span. In addition, patients reported being more alert and concentrated when an animal was present. Future studies should examine other attentional processes such as divided attention and include neurobiological correlates of attention. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  14. Canadian Open Genetics Repository (COGR): a unified clinical genomics database as a community resource for standardising and sharing genetic interpretations.

    PubMed

    Lerner-Ellis, Jordan; Wang, Marina; White, Shana; Lebo, Matthew S

    2015-07-01

    The Canadian Open Genetics Repository is a collaborative effort for the collection, storage, sharing and robust analysis of variants reported by medical diagnostics laboratories across Canada. As clinical laboratories adopt modern genomics technologies, the need for this type of collaborative framework is increasingly important. A survey to assess existing protocols for variant classification and reporting was delivered to clinical genetics laboratories across Canada. Based on feedback from this survey, a variant assessment tool was made available to all laboratories. Each participating laboratory was provided with an instance of GeneInsight, software featuring versioning and approval processes for variant assessments and interpretations and allowing variant data to be shared between instances. Guidelines were established for sharing data among clinical laboratories, and in the final outreach phase data will be made readily available to patient advocacy groups for general use. The survey demonstrated the need for improved standardisation and data sharing across the country. A variant assessment template was made available to the community to aid with standardisation. Instances of the GeneInsight tool were provided to clinical diagnostic laboratories across Canada for the purpose of uploading, transferring, accessing and sharing variant data. As an ongoing endeavour and a permanent resource, the Canadian Open Genetics Repository aims to serve as a focal point for the collaboration of Canadian laboratories with other countries in the development of tools that take full advantage of laboratory data in diagnosing, managing and treating genetic diseases. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.

  15. Linguistic measures of chemical diversity and the "keywords" of molecular collections.

    PubMed

    Woźniak, Michał; Wołos, Agnieszka; Modrzyk, Urszula; Górski, Rafał L; Winkowski, Jan; Bajczyk, Michał; Szymkuć, Sara; Grzybowski, Bartosz A; Eder, Maciej

    2018-05-15

    Computerized linguistic analyses have proven of immense value in comparing and searching through large text collections ("corpora"), including those deposited on the Internet - indeed, it would nowadays be hard to imagine browsing the Web without, for instance, search algorithms extracting the most appropriate keywords from documents. This paper describes how such corpus-linguistic concepts can be extended to chemistry, based on characteristic "chemical words" that span more than traditional functional groups and, instead, look at common structural fragments that molecules share. Using these words, it is possible to quantify the diversity of chemical collections/databases in new ways and to define molecular "keywords" by which such collections are best characterized and annotated.
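
    As a rough illustration of the analogy (not the authors' actual method), the sketch below treats structural fragments as words and ranks each collection's "keywords" by TF-IDF; the fragment identifiers are invented stand-ins for real substructure keys.

        # TF-IDF "keywords" of molecular collections, by analogy with corpora.
        from sklearn.feature_extraction.text import TfidfVectorizer

        # Each "document" is one collection written as a bag of fragments.
        collections = [
            "benzene amide ester benzene pyridine",     # collection A
            "benzene ether alkene alkyne benzene",      # collection B
            "amide peptide_bond amide ester pyridine",  # collection C
        ]

        vec = TfidfVectorizer()
        tfidf = vec.fit_transform(collections)
        terms = vec.get_feature_names_out()

        for i, name in enumerate("ABC"):
            row = tfidf[i].toarray().ravel()
            top = row.argsort()[::-1][:2]   # two highest-weighted "keywords"
            print(f"collection {name} keywords:", [terms[j] for j in top])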

  16. Interactive knowledge discovery and data mining on genomic expression data with numeric formal concept analysis.

    PubMed

    González-Calabozo, Jose M; Valverde-Albacete, Francisco J; Peláez-Moreno, Carmen

    2016-09-15

    Gene Expression Data (GED) analysis poses a great challenge to the scientific community that can be framed into the Knowledge Discovery in Databases (KDD) and Data Mining (DM) paradigm. Biclustering has emerged as the machine learning method of choice to solve this task, but its unsupervised nature makes result assessment problematic. This is often addressed by means of Gene Set Enrichment Analysis (GSEA). We put forward a framework in which GED analysis is understood as an Exploratory Data Analysis (EDA) process where we provide support for continuous human interaction with data, aiming at improving the step of hypothesis abduction and assessment. We focus on adapting the interpretation and visualization of the output of EDA to human cognition. First, we give a proper theoretical background to biclustering using lattice theory and provide a set of analysis tools revolving around K-Formal Concept Analysis (K-FCA), a lattice-theoretic unsupervised learning technique for real-valued matrices. By using different kinds of cost structures to quantify expression, we obtain different sequences of hierarchical biclusterings for gene under- and over-expression using thresholds. Consequently, we provide a method with interleaved analysis steps and visualization devices so that the sequences of lattices for a particular experiment summarize the researcher's vision of the data. This also allows us to define measures of persistence and robustness of biclusters to assess them. Second, the resulting biclusters are used to index external omics databases, for instance the Gene Ontology (GO), thus offering a new way of accessing publicly available resources. This provides different flavors of gene set enrichment against which to assess the biclusters, by obtaining their p-values according to the terminology of those resources. We illustrate the exploration procedure on a real data example, confirming previously published results. The GED analysis problem is thus transformed into the exploration of a sequence of lattices, enabling the visualization of the hierarchical structure of the biclusters at a chosen degree of granularity. The ability of FCA-based biclustering methods to index external databases such as GO allows us to obtain a quality measure of the biclusters, to observe the evolution of a gene throughout the different biclusters it appears in, to look for relevant biclusters by observing their genes and their persistence, and to infer, for instance, hypotheses on their function.
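
    For intuition, the sketch below runs ordinary binary FCA on a thresholded expression matrix, enumerating the formal concepts, i.e. the maximal biclusters; the paper's K-FCA generalises this to real-valued data, and the matrix here is synthetic.

        # Binary FCA on a thresholded expression matrix: every pair
        # (extent(intent(A)), intent(A)) is a formal concept / maximal bicluster.
        import numpy as np
        from itertools import combinations

        expr = np.array([[5.1, 0.2, 4.8],
                         [4.9, 0.1, 5.2],
                         [0.3, 6.0, 0.2]])
        ctx = expr > 1.0     # genes x conditions incidence after thresholding

        def intent(rows):    # conditions shared by all genes in `rows`
            return frozenset(j for j in range(ctx.shape[1])
                             if all(ctx[i, j] for i in rows))

        def extent(cols):    # genes having all conditions in `cols`
            return frozenset(i for i in range(ctx.shape[0])
                             if all(ctx[i, j] for j in cols))

        concepts = set()
        for r in range(ctx.shape[0] + 1):
            for rows in combinations(range(ctx.shape[0]), r):
                cols = intent(rows)
                concepts.add((extent(cols), cols))   # closure: a formal concept

        for ext, inten in sorted(concepts, key=lambda c: len(c[0])):
            print("genes", sorted(ext), "<->", "conditions", sorted(inten))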

  17. A Semantic Transformation Methodology for the Secondary Use of Observational Healthcare Data in Postmarketing Safety Studies.

    PubMed

    Pacaci, Anil; Gonul, Suat; Sinaci, A Anil; Yuksel, Mustafa; Laleci Erturkmen, Gokce B

    2018-01-01

    Background: Utilization of the available observational healthcare datasets is key to complementing and strengthening postmarketing safety studies. The use of common data models (CDM) is the predominant approach to enable large-scale systematic analyses on disparate data models and vocabularies. Current CDM transformation practices depend on proprietarily developed Extract-Transform-Load (ETL) procedures, which require knowledge of both the semantics and the technical characteristics of the source datasets and target CDM. Purpose: In this study, our aim is to develop a modular but coordinated transformation approach that separates the semantic and technical steps of transformation processes, which do not have a strict separation in traditional ETL approaches. Such an approach would discretize the operations to extract data from source electronic health record systems, the alignment of the source and target models on the semantic level, and the operations to populate target common data repositories. Approach: In order to separate the activities required to transform heterogeneous data sources to a target CDM, we introduce a semantic transformation approach composed of three steps: (1) transformation of source datasets to Resource Description Framework (RDF) format, (2) application of semantic conversion rules to obtain the data as instances of an ontological model of the target CDM, and (3) population of repositories that comply with the specifications of the CDM by processing the RDF instances from step 2. The proposed approach has been implemented in real healthcare settings, where the Observational Medical Outcomes Partnership (OMOP) CDM was chosen as the common data model, and a comprehensive comparative analysis between the native and transformed data has been conducted. Results: Health records of ~1 million patients have been successfully transformed from the source database to an OMOP CDM based database. Descriptive statistics obtained from the source and target databases present analogous and consistent results. Discussion and Conclusion: Our method goes beyond traditional ETL approaches by being more declarative and rigorous. It is declarative because the use of RDF-based mapping rules makes each mapping more transparent and understandable to humans while retaining logic-based computability, and rigorous because the mappings are based on computer-readable semantics which are amenable to validation through logic-based inference methods.
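
    A toy walk-through of the three steps, with invented vocabulary and a single record: (1) lift a source record to RDF, (2) apply a semantic conversion rule phrased as a SPARQL CONSTRUCT, and (3) populate a CDM-shaped table from the resulting RDF instances.

        # Three-step semantic transformation sketch (names are invented).
        import sqlite3
        from rdflib import Graph, Literal, Namespace

        SRC = Namespace("http://example.org/src/")
        CDM = Namespace("http://example.org/omop/")

        # Step 1: source EHR record lifted to RDF.
        g = Graph()
        g.add((SRC.rec1, SRC.hasDiagnosisCode, Literal("I10")))

        # Step 2: semantic conversion rule mapping the source code to a
        # target-CDM concept (identifiers invented for the example).
        rule = """
        PREFIX src: <http://example.org/src/>
        PREFIX cdm: <http://example.org/omop/>
        CONSTRUCT { ?r cdm:condition_concept_id 316866 }
        WHERE    { ?r src:hasDiagnosisCode "I10" }
        """
        for t in g.query(rule):
            g.add(t)

        # Step 3: populate a CDM-shaped repository from the RDF instances.
        db = sqlite3.connect(":memory:")
        db.execute("CREATE TABLE condition_occurrence (record TEXT, concept_id INTEGER)")
        for r, _, cid in g.triples((None, CDM.condition_concept_id, None)):
            db.execute("INSERT INTO condition_occurrence VALUES (?, ?)",
                       (str(r), int(cid)))
        print(db.execute("SELECT * FROM condition_occurrence").fetchall())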

  18. High temporal resolution aberrometry in a 50-eye population and implications for adaptive optics error budget.

    PubMed

    Jarosz, Jessica; Mecê, Pedro; Conan, Jean-Marc; Petit, Cyril; Paques, Michel; Meimon, Serge

    2017-04-01

    We formed a database gathering the wavefront aberrations of 50 healthy eyes measured with an original custom-built Shack-Hartmann aberrometer at a temporal frequency of 236 Hz, with 22 lenslets across a 7-mm diameter pupil, for a duration of 20 s. With this database, we draw statistics on the spatial and temporal behavior of the dynamic aberrations of the eye. Dynamic aberrations were studied on a 5-mm diameter pupil and on a 3.4 s sequence between blinks. We noted that, on average, temporal wavefront variance exhibits an n^(-2) power-law with radial order n and temporal spectra follow an f^(-1.5) power-law with temporal frequency f. From these statistics, we then extract guidelines for designing an adaptive optics system. For instance, we show the residual wavefront error evolution as a function of the number of corrected modes and of the adaptive optics loop frame rate. In particular, we infer that adaptive optics performance rapidly increases with the loop frequency up to 50 Hz, with gains being more limited at higher rates.

  19. In silico design of porous polymer networks: high-throughput screening for methane storage materials.

    PubMed

    Martin, Richard L; Simon, Cory M; Smit, Berend; Haranczyk, Maciej

    2014-04-02

    Porous polymer networks (PPNs) are a class of advanced porous materials that combine the advantages of cheap and stable polymers with the high surface areas and tunable chemistry of metal-organic frameworks. They are of particular interest for gas separation and storage, for instance as methane adsorbents for a vehicular natural gas tank or other portable applications. PPNs are self-assembled from distinct building units; here, we utilize commercially available chemical fragments and two experimentally known synthetic routes to design in silico a large database of synthetically realistic PPN materials. All structures from our database of 18,000 materials have been relaxed with semiempirical electronic structure methods and characterized with grand-canonical Monte Carlo simulations for methane uptake and deliverable (working) capacity. A number of novel structure-property relationships that govern methane storage performance were identified. These relationships are translated into experimental guidelines to realize the ideal PPN structure. We found that cooperative methane-methane attractions were present in all of the best-performing materials, highlighting the importance of guest interactions in the design of optimal materials for methane storage.

  20. Construction of a century solar chromosphere data set for solar activity related research

    NASA Astrophysics Data System (ADS)

    Lin, Ganghua; Wang, Xiao Fan; Yang, Xiao; Liu, Suo; Zhang, Mei; Wang, Haimin; Liu, Chang; Xu, Yan; Tlatov, Andrey; Demidov, Mihail; Borovik, Aleksandr; Golovko, Aleksey

    2017-06-01

    This article introduces our ongoing project "Construction of a Century Solar Chromosphere Data Set for Solar Activity Related Research". Solar activities are the major sources of space weather that affect human lives. Serious space-weather consequences include, for instance, the interruption of space communication and navigation, compromised safety of astronauts and satellites, and damage to power grids. Therefore, solar activity research has both scientific and social impact. The major database is built up from digitized and standardized film data obtained by several observatories around the world and covers a time span of more than 100 years. After careful calibration, we will develop feature extraction and data mining tools and provide them, together with the comprehensive database, to the astronomical community. Our final goal is to address several physical issues: filament behavior in solar cycles, the abnormal behavior of solar cycle 24, large-scale solar eruptions, and sympathetic remote brightenings. Significant progress is expected in data mining algorithms and software development, which will benefit scientific analysis and eventually advance our understanding of solar cycles.

  1. High temporal resolution aberrometry in a 50-eye population and implications for adaptive optics error budget

    PubMed Central

    Jarosz, Jessica; Mecê, Pedro; Conan, Jean-Marc; Petit, Cyril; Paques, Michel; Meimon, Serge

    2017-01-01

    We formed a database gathering the wavefront aberrations of 50 healthy eyes measured with an original custom-built Shack-Hartmann aberrometer at a temporal frequency of 236 Hz, with 22 lenslets across a 7-mm diameter pupil, for a duration of 20 s. With this database, we draw statistics on the spatial and temporal behavior of the dynamic aberrations of the eye. Dynamic aberrations were studied on a 5-mm diameter pupil and on a 3.4 s sequence between blinks. We noted that, on average, temporal wavefront variance exhibits an n^(-2) power-law with radial order n and temporal spectra follow an f^(-1.5) power-law with temporal frequency f. From these statistics, we then extract guidelines for designing an adaptive optics system. For instance, we show the residual wavefront error evolution as a function of the number of corrected modes and of the adaptive optics loop frame rate. In particular, we infer that adaptive optics performance rapidly increases with the loop frequency up to 50 Hz, with gain being more limited at higher rates. PMID:28736657

  2. Conditions Database for the Belle II Experiment

    NASA Astrophysics Data System (ADS)

    Wood, L.; Elsethagen, T.; Schram, M.; Stephan, E.

    2017-10-01

    The Belle II experiment at KEK is preparing for first collisions in 2017. Processing the large amounts of data that will be produced will require conditions data to be readily available to systems worldwide in a fast and efficient manner that is straightforward for both the user and maintainer. The Belle II conditions database was designed with a straightforward goal: make it as easily maintainable as possible. To this end, HEP-specific software tools were avoided as much as possible and industry standard tools used instead. HTTP REST services were selected as the application interface, which provide a high-level interface to users through the use of standard libraries such as curl. The application interface itself is written in Java and runs in an embedded Payara-Micro Java EE application server. Scalability at the application interface is provided by use of Hazelcast, an open source In-Memory Data Grid (IMDG) providing distributed in-memory computing and supporting the creation and clustering of new application interface instances as demand increases. The IMDG provides fast and efficient access to conditions data via in-memory caching.
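
    From the client side, consuming such a plain HTTP REST conditions service needs nothing beyond a standard HTTP library; the Python sketch below shows the general shape, though the endpoint path and parameters are invented rather than Belle II's actual interface.

        # Sketch of a client for an HTTP REST conditions service.
        import requests

        BASE = "https://conditions.example.org/rest/v2"   # invented endpoint

        def get_payload(global_tag: str, name: str, run: int) -> bytes:
            # Resolve and download the conditions payload valid for a run;
            # plain HTTP, so curl or any standard library works equally well.
            r = requests.get(f"{BASE}/globalTag/{global_tag}/payload",
                             params={"name": name, "run": run}, timeout=10)
            r.raise_for_status()
            return r.content

        blob = get_payload("release-2017", "BeamParameters", run=42)
        print(f"received {len(blob)} bytes")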

  3. The Development of Two Science Investigator-led Processing Systems (SIPS) for NASA's Earth Observation System (EOS)

    NASA Technical Reports Server (NTRS)

    Tilmes, Curt

    2004-01-01

    In 2001, NASA Goddard Space Flight Center's Laboratory for Terrestrial Physics (LTP) started the construction of a Science Investigator-led Processing System (SIPS) for processing data from the Ozone Monitoring Instrument (OMI), which will launch on the Aura platform in mid 2004. OMI is a contribution of the Netherlands Agency for Aerospace Programs (NIVR), in collaboration with the Finnish Meteorological Institute (FMI), to the Earth Observing System (EOS) Aura mission. It will continue the Total Ozone Mapping Spectrometer (TOMS) record for total ozone and other atmospheric parameters related to ozone chemistry and climate. OMI measurements will be highly synergistic with the other instruments on the EOS Aura platform. The LTP previously developed the Moderate Resolution Imaging Spectrometer (MODIS) Data Processing System (MODAPS), which has been in full operation since the launches of the Terra and Aqua spacecraft in December 1999 and May 2002, respectively. During that time, it has continually evolved to better support the needs of the MODIS team. We now run multiple instances of the system, managing faster-than-real-time reprocessing of the data as well as continuing forward processing. The new OMI Data Processing System (OMIDAPS) was adapted from the MODAPS. It will ingest raw data from the satellite ground station and process it to produce calibrated, geolocated higher-level data products. These data products will be transmitted to the Goddard Distributed Active Archive Center (GDAAC) instance of the EOS Data and Information System (EOSDIS) for long-term archive and distribution to the public. The OMIDAPS will also provide data distribution to the OMI Science Team for quality assessment, algorithm improvement, calibration, etc. We have taken advantage of lessons learned from the MODIS experience and software already developed for MODIS. We made some changes in the hardware system organization, database, and software to adapt the system for OMI. We replaced the fundamental database system, Sybase, with an open-source RDBMS called PostgreSQL, and based the entire OMIDAPS on a cluster of Linux-based commodity computers rather than the large SGI servers that MODAPS uses. Rather than relying on a central I/O server host, the new system distributes its data archive among multiple server hosts in the cluster. OMI is also customizing the graphical user interfaces and reporting structure to more closely meet the needs of the OMI Science Team. Prior to 2003, simulated OMI data and the science algorithms were not ready for production testing. We initially constructed a prototype system and tested it using a 25-year dataset of Total Ozone Mapping Spectrometer (TOMS) and Solar Backscatter Ultraviolet Instrument (SBUV) data. This prototype system provided a platform to support the adaptation of the algorithms for OMI, and provided reprocessing of the historical data, aiding in its analysis. In a recent reanalysis of the TOMS data, the OMIDAPS processed 108,000 full orbits of data through 4 processing steps per orbit, producing about 800,000 files (400 GiB) of level 2 and greater data. More recently, we have installed two instances of the OMIDAPS for integration and testing of OMI science processes as they are delivered from the Science Team. A test instance of the OMIDAPS has also supported a series of "Interface Confidence Tests" (ICTs) and end-to-end ground system tests to ensure the launch readiness of the system. This paper will discuss the high-level hardware, software, and database organization of the OMIDAPS and how it builds on the MODAPS heritage system. It will also provide an overview of the testing and implementation of the production OMIDAPS.

  4. Textual appropriation in engineering master's theses: a preliminary study.

    PubMed

    Eckel, Edward J

    2011-09-01

    In the thesis literature review, an engineering graduate student is expected to place original research in the context of previous work by other researchers. However, for some students, particularly those for whom English is a second language, the literature review may be a mixture of original writing and verbatim source text appropriated without quotations. Such problematic use of source material leaves students vulnerable to an accusation of plagiarism, which carries severe consequences. Is such textual appropriation common in engineering master's writing? Furthermore, what, if anything, can be concluded when two texts have been found to have textual material in common? Do existing definitions of plagiarism provide a sufficient framework for determining if an instance of copying is transgressive or not? In a preliminary attempt to answer these questions, text strings from a random sample of 100 engineering master's theses from the ProQuest Dissertations and Theses database were searched for appropriated verbatim source text using the Google search engine. The results suggest that textual borrowing may indeed be a common feature of the master's engineering literature review, raising questions about the ability of graduate students to synthesize the literature. The study also illustrates the difficulties of making a determination of plagiarism based on simple textual similarity. A context-specific approach is recommended when dealing with any instance of apparent copying.

  5. Establishing a database of Canadian feline mitotypes for forensic use.

    PubMed

    Arcieri, M; Agostinelli, G; Gray, Z; Spadaro, A; Lyons, L A; Webb, K M

    2016-05-01

    Hair shed by pet animals is often found and collected as evidence from crime scenes. Due to limitations such as small amounts and low quality, mitochondrial DNA (mtDNA) is often the only type of DNA that can be used for linking the hair to a potential contributor. mtDNA has lower discriminatory power than nuclear DNA because multiple, unrelated individuals within a population can have the same mtDNA sequence, or mitotype. Therefore, to determine the evidentiary value of a match between crime scene evidence and a suspected contributor, the frequency of the mitotype within the regional population must be known. While mitotype frequencies have been determined for the United States' cat population, the frequencies are unknown for the Canadian cat population. Given the countries' close proximity and similar human settlement patterns, these populations may be homogeneous, meaning a single regional database might be used for estimating cat population mitotype frequencies. Here we determined the mitotype frequencies of the Canadian cat population and compared them to the United States' cat population. The two cat populations are statistically homogeneous; however, mitotype B6 was found at high frequency in Canada and at extremely low frequency in the United States, meaning a single database would not be appropriate for North America. Furthermore, this work calls attention to such local spikes in the frequency of otherwise rare mitotypes, instances of which exist around the world and have the potential to misrepresent the evidentiary value of matches compared to a regional database. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  6. HMDB 4.0: the human metabolome database for 2018

    PubMed Central

    Feunang, Yannick Djoumbou; Marcu, Ana; Guo, An Chi; Liang, Kevin; Vázquez-Fresno, Rosa; Sajed, Tanvir; Johnson, Daniel; Li, Carin; Karu, Naama; Sayeeda, Zinat; Lo, Elvis; Assempour, Nazanin; Berjanskii, Mark; Singhal, Sandeep; Arndt, David; Liang, Yonjie; Badran, Hasan; Grant, Jason; Serra-Cayuela, Arnau; Liu, Yifeng; Mandal, Rupa; Neveu, Vanessa; Pon, Allison; Knox, Craig; Wilson, Michael; Manach, Claudine; Scalbert, Augustin

    2018-01-01

    The Human Metabolome Database or HMDB (www.hmdb.ca) is a web-enabled metabolomic database containing comprehensive information about human metabolites along with their biological roles, physiological concentrations, disease associations, chemical reactions, metabolic pathways, and reference spectra. First described in 2007, the HMDB is now considered the standard metabolomic resource for human metabolic studies. Over the past decade the HMDB has continued to grow and evolve in response to emerging needs for metabolomics researchers and continuing changes in web standards. This year's update, HMDB 4.0, represents the most significant upgrade to the database in its history. For instance, the number of fully annotated metabolites has increased by nearly threefold, the number of experimental spectra has grown by almost fourfold and the number of illustrated metabolic pathways has grown by a factor of almost 60. Significant improvements have also been made to the HMDB's chemical taxonomy, chemical ontology, spectral viewing, and spectral/text searching tools. A great deal of brand new data has also been added to HMDB 4.0. This includes large quantities of predicted MS/MS and GC-MS reference spectral data as well as predicted (physiologically feasible) metabolite structures to facilitate novel metabolite identification. Additional information on metabolite-SNP interactions and the influence of drugs on metabolite levels (pharmacometabolomics) has also been added. Many other important improvements in the content, the interface, and the performance of the HMDB website have been made and these should greatly enhance its ease of use and its potential applications in nutrition, biochemistry, clinical chemistry, clinical genetics, medicine, and metabolomics science. PMID:29140435

  7. Gee Fu: a sequence version and web-services database tool for genomic assembly, genome feature and NGS data.

    PubMed

    Ramirez-Gonzalez, Ricardo; Caccamo, Mario; MacLean, Daniel

    2011-10-01

    Scientists now use high-throughput sequencing technologies and short-read assembly methods to create draft genome assemblies in just days. Assembly tools and workflow management environments make it easy for a non-specialist to implement complicated pipelines to produce genome assemblies and annotations very quickly. Such accessibility results in a proliferation of assemblies and associated files, often for many organisms. These assemblies are used as working references by many different workers, from bioinformaticians doing gene prediction to bench scientists designing primers for PCR. Here we describe Gee Fu, a database tool for genomic assembly and feature data, including next-generation sequence alignments. Gee Fu is an instance of a Ruby-on-Rails web application on a feature database that provides web and console interfaces for input, visualization of feature data via AnnoJ, access to data through a web-service interface, an API for direct data access by Ruby scripts, and access to feature data stored in BAM files. Gee Fu provides a platform for storing and sharing different versions of an assembly and associated features that can be accessed and updated by bench biologists and bioinformaticians in ways that are easy and useful for each. http://tinyurl.com/geefu dan.maclean@tsl.ac.uk.

  8. Hardware based redundant multi-threading inside a GPU for improved reliability

    DOEpatents

    Sridharan, Vilas; Gurumurthi, Sudhanva

    2015-05-05

    A system and method for verifying computation output using computer hardware are provided. Instances of computation are generated and processed on hardware-based processors. As instances of computation are processed, each instance of computation receives a load accessible to other instances of computation. Instances of output are generated by processing the instances of computation. The instances of output are verified against each other in a hardware based processor to ensure accuracy of the output.

  9. Some insights on hard quadratic assignment problem instances

    NASA Astrophysics Data System (ADS)

    Hussin, Mohamed Saifullah

    2017-11-01

    Since the formal introduction of metaheuristics, a huge number of Quadratic Assignment Problem (QAP) instances have been introduced. Those instances, however, are loosely structured, which makes it difficult to perform any systematic analysis. The QAPLIB, for example, is a library that contains a large number of QAP benchmark instances of different sizes and structures, but with very limited availability of each instance type. This prevents researchers from performing organized studies on those instances, such as parameter tuning and testing. In this paper, we discuss several hard instances that have been introduced over the years, and algorithms that have been used for solving them.
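
    For reference, the sketch below states the QAP objective these instances encode and applies a simple 2-swap descent to a random instance; it is a generic illustration, not one of the surveyed algorithms.

        # QAP: assign n facilities to n locations minimising the
        # flow-weighted sum of distances; random instance, 2-swap descent.
        import numpy as np

        rng = np.random.default_rng(0)
        n = 12
        F = rng.integers(0, 10, (n, n))   # flow between facilities
        D = rng.integers(0, 10, (n, n))   # distance between locations

        def qap_cost(perm):
            # cost(pi) = sum_{i,j} F[i, j] * D[pi(i), pi(j)]
            return int((F * D[np.ix_(perm, perm)]).sum())

        perm = rng.permutation(n)
        best = qap_cost(perm)
        for _ in range(5000):             # simple 2-swap local search
            i, j = rng.integers(0, n, 2)
            perm[i], perm[j] = perm[j], perm[i]
            cost = qap_cost(perm)
            if cost <= best:
                best = cost
            else:
                perm[i], perm[j] = perm[j], perm[i]   # undo worsening swap
        print("best cost found:", best)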

  10. Dental insurance: A systematic review.

    PubMed

    Garla, Bharath Kumar; Satish, G; Divya, K T

    2014-12-01

    To review the uses of finance in dentistry. A search of 25 electronic databases and the World Wide Web was conducted. Relevant journals were hand searched and further information was requested from authors. Inclusion criteria were a predefined hierarchy of evidence and objectives. Study validity was assessed with checklists. Two reviewers independently screened sources, extracted data, and assessed validity. Insurance has come of age and has become the mainstay of payment in many developed countries, so much so that the alternative forms of payment, which originated as alternatives to fee-for-service, now depend on insurance at one point or another. Fee-for-service is still the major form of payment in many developing countries, including India. It is preferred in many instances since payment is made immediately.

  11. Improvements to the National Transport Code Collaboration Data Server

    NASA Astrophysics Data System (ADS)

    Alexander, David A.

    2001-10-01

    The data server of the National Transport Code Collaboration Project provides a universal network interface to interpolated or raw transport data accessible by a universal set of names. Data can be acquired from a local copy of the International Multi-Tokamak (ITER) profile database as well as from TRANSP trees of MDSplus data systems on the net. Data are provided to the user's network client via a CORBA interface, thus providing stateful data server instances, which have the advantage of remembering the desired interpolation, data set, etc. This paper will review the status and discuss the recent improvements made to the data server, such as the modularization of the data server and the addition of HDF5 and MDSplus data file writing capability.

  12. Application Program Interface for the Orion Aerodynamics Database

    NASA Technical Reports Server (NTRS)

    Robinson, Philip E.; Thompson, James

    2013-01-01

    The Application Programming Interface (API) for the Crew Exploration Vehicle (CEV) Aerodynamic Database has been developed to provide the developers of software an easily implemented, fully self-contained method of accessing the CEV Aerodynamic Database for use in their analysis and simulation tools. The API is programmed in C and provides a series of functions to interact with the database, such as initialization, selecting various options, and calculating the aerodynamic data. No special functions (file read/write, table lookup) are required on the host system other than those included with a standard ANSI C installation. It reads one or more files of aero data tables. Previous releases of aerodynamic databases for space vehicles have only included data tables and a document of the algorithm and equations to combine them for the total aerodynamic forces and moments. This process required each software tool to have a unique implementation of the database code. Errors or omissions in the documentation, or errors in the implementation, led to a lengthy and burdensome process of having to debug each instance of the code. Additionally, input file formats differ for each space vehicle simulation tool, requiring the aero database tables to be reformatted to meet the tool's input file structure requirements. Finally, the capabilities of built-in table lookup routines vary for each simulation tool. Implementation of a new database may require an update to and verification of the table lookup routines. This may be required if the number of dimensions of a data table exceeds the capability of the simulation tool's built-in lookup routines. A single software solution was created to provide an aerodynamics software model that could be integrated into other simulation and analysis tools. The highly complex Orion aerodynamics model can then be quickly included in a wide variety of tools. The API code is written in ANSI C for ease of portability to a wide variety of systems. The input data files are in standard formatted ASCII, also for improved portability. The API contains its own implementation of multidimensional table reading and lookup routines. The same aerodynamics input file can be used without modification on all implementations. The turnaround time from aerodynamics model release to a working implementation is significantly reduced.
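
    The sketch below mirrors the described pattern (in Python rather than the API's ANSI C): initialise from table data, then evaluate a coefficient via the library's own multidimensional table lookup; class, function, and table names are illustrative, not the real API.

        # Self-contained multidimensional table lookup, mirroring the API's
        # design goal of needing no host-system table routines.
        import numpy as np
        from scipy.interpolate import RegularGridInterpolator

        class AeroDatabase:
            def __init__(self, mach_grid, alpha_grid, cl_table):
                # Build the interpolator once at initialization time.
                self._cl = RegularGridInterpolator((mach_grid, alpha_grid),
                                                   cl_table)

            def lift_coefficient(self, mach, alpha_deg):
                return float(self._cl([[mach, alpha_deg]])[0])

        # Stand-in for data parsed from a formatted-ASCII table file.
        mach = np.array([0.3, 0.6, 0.9, 1.2])
        alpha = np.array([-5.0, 0.0, 5.0, 10.0])
        cl = np.outer(np.ones_like(mach), 0.08 * alpha)   # toy CL(alpha) table

        db = AeroDatabase(mach, alpha, cl)
        print("CL at M=0.75, alpha=2.5 deg:", db.lift_coefficient(0.75, 2.5))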

  13. DrugBank 5.0: a major update to the DrugBank database for 2018.

    PubMed

    Wishart, David S; Feunang, Yannick D; Guo, An C; Lo, Elvis J; Marcu, Ana; Grant, Jason R; Sajed, Tanvir; Johnson, Daniel; Li, Carin; Sayeeda, Zinat; Assempour, Nazanin; Iynkkaran, Ithayavani; Liu, Yifeng; Maciejewski, Adam; Gale, Nicola; Wilson, Alex; Chin, Lucy; Cummings, Ryan; Le, Diana; Pon, Allison; Knox, Craig; Wilson, Michael

    2018-01-04

    DrugBank (www.drugbank.ca) is a web-enabled database containing comprehensive molecular information about drugs, their mechanisms, their interactions and their targets. First described in 2006, DrugBank has continued to evolve over the past 12 years in response to marked improvements to web standards and changing needs for drug research and development. This year's update, DrugBank 5.0, represents the most significant upgrade to the database in more than 10 years. In many cases, existing data content has grown by 100% or more since the last update. For instance, the total number of investigational drugs in the database has grown by almost 300%, the number of drug-drug interactions has grown by nearly 600% and the number of SNP-associated drug effects has grown by more than 3000%. Significant improvements have been made to the quantity, quality and consistency of drug indications and drug binding data, as well as drug-drug and drug-food interactions. A great deal of brand new data have also been added to DrugBank 5.0. This includes information on the influence of hundreds of drugs on metabolite levels (pharmacometabolomics), gene expression levels (pharmacotranscriptomics) and protein expression levels (pharmacoproteomics). New data have also been added on the status of hundreds of new drug clinical trials and existing drug repurposing trials. Many other important improvements in the content, interface and performance of the DrugBank website have been made and these should greatly enhance its ease of use, utility and potential applications in many areas of pharmacological research, pharmaceutical science and drug education. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  14. Monitoring of small laboratory animal experiments by a designated web-based database.

    PubMed

    Frenzel, T; Grohmann, C; Schumacher, U; Krüll, A

    2015-10-01

    Multiple-parametric small animal experiments require, by their very nature, a sufficient number of animals, which may need to be large to obtain statistically significant results.(1) For this reason, database-related systems are required to collect the experimental data as well as to support the later (re-)analysis of the information gained during the experiments. In particular, the monitoring of animal welfare is simplified by the inclusion of warning signals (for instance, a loss in body weight of >20%). Digital patient charts have been developed for human patients but are usually not able to fulfill the specific needs of animal experimentation. To address this problem, a unique web-based monitoring system using standard MySQL, PHP, and nginx has been created. PHP was used to create the HTML-based user interface and outputs in a variety of proprietary file formats, namely portable document format (PDF) or spreadsheet files. This article demonstrates its fundamental features and the easy and secure access it offers to the data from any place using a web browser. This information will help other researchers create their own individual databases in a similar way. The use of QR codes plays an important role in the stress-free use of the database. We demonstrate a way to easily identify all animals, samples, and data collected during the experiments. Specific ways to record animal irradiations and chemotherapy applications are shown. This new analysis tool allows the effective and detailed analysis of the huge amounts of data collected through small animal experiments. It supports proper statistical evaluation of the data and provides excellent retrievable data storage. © The Author(s) 2015.
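
    A minimal sketch of the welfare warning signal described above, flagging any animal whose weight has dropped more than 20% below baseline; the table and column names are invented, not the authors' schema.

        # Weight-loss warning query against an invented schema.
        import sqlite3

        db = sqlite3.connect(":memory:")
        db.executescript("""
        CREATE TABLE weights (animal_id TEXT, day INTEGER, grams REAL);
        INSERT INTO weights VALUES ('A1', 0, 25.0), ('A1', 7, 19.5),
                                   ('A2', 0, 24.0), ('A2', 7, 23.0);
        """)

        # Flag animals whose day-7 weight is below 80% of their baseline.
        warnings = db.execute("""
        SELECT w.animal_id, b.grams, w.grams
        FROM weights w
        JOIN weights b ON b.animal_id = w.animal_id AND b.day = 0
        WHERE w.day = 7 AND w.grams < 0.8 * b.grams
        """).fetchall()

        for animal, baseline, current in warnings:
            print(f"WARNING {animal}: {baseline} g -> {current} g (>20% loss)")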

  15. Historical analysis of US pipeline accidents triggered by natural hazards

    NASA Astrophysics Data System (ADS)

    Girgin, Serkan; Krausmann, Elisabeth

    2015-04-01

    Natural hazards, such as earthquakes, floods, landslides, or lightning, can initiate accidents in oil and gas pipelines with potentially major consequences on the population or the environment due to toxic releases, fires and explosions. Accidents of this type are also referred to as Natech events. Many major accidents highlight the risk associated with natural-hazard impact on pipelines transporting dangerous substances. For instance, in the USA in 1994, flooding of the San Jacinto River caused the rupture of 8 and the undermining of 29 pipelines by the floodwaters. About 5.5 million litres of petroleum and related products were spilled into the river and ignited. As a result, 547 people were injured and significant environmental damage occurred. Post-incident analysis is a valuable tool for better understanding the causes, dynamics and impacts of pipeline Natech accidents in support of future accident prevention and mitigation. Therefore, data on onshore hazardous-liquid pipeline accidents collected by the US Pipeline and Hazardous Materials Safety Administration (PHMSA) was analysed. For this purpose, a database-driven incident data analysis system was developed to aid the rapid review and categorization of PHMSA incident reports. Using an automated data-mining process followed by a peer review of the incident records and supported by natural hazard databases and external information sources, the pipeline Natechs were identified. As a by-product of the data-collection process, the database now includes over 800,000 incidents from all causes in industrial and transportation activities, which are automatically classified in the same way as the PHMSA records. This presentation describes the data collection and reviewing steps conducted during the study, provides information on the developed database and data analysis tools, and reports the findings of a statistical analysis of the identified hazardous liquid pipeline incidents in terms of accident dynamics and consequences.

  16. HMDB 4.0: the human metabolome database for 2018.

    PubMed

    Wishart, David S; Feunang, Yannick Djoumbou; Marcu, Ana; Guo, An Chi; Liang, Kevin; Vázquez-Fresno, Rosa; Sajed, Tanvir; Johnson, Daniel; Li, Carin; Karu, Naama; Sayeeda, Zinat; Lo, Elvis; Assempour, Nazanin; Berjanskii, Mark; Singhal, Sandeep; Arndt, David; Liang, Yonjie; Badran, Hasan; Grant, Jason; Serra-Cayuela, Arnau; Liu, Yifeng; Mandal, Rupa; Neveu, Vanessa; Pon, Allison; Knox, Craig; Wilson, Michael; Manach, Claudine; Scalbert, Augustin

    2018-01-04

    The Human Metabolome Database or HMDB (www.hmdb.ca) is a web-enabled metabolomic database containing comprehensive information about human metabolites along with their biological roles, physiological concentrations, disease associations, chemical reactions, metabolic pathways, and reference spectra. First described in 2007, the HMDB is now considered the standard metabolomic resource for human metabolic studies. Over the past decade the HMDB has continued to grow and evolve in response to emerging needs for metabolomics researchers and continuing changes in web standards. This year's update, HMDB 4.0, represents the most significant upgrade to the database in its history. For instance, the number of fully annotated metabolites has increased by nearly threefold, the number of experimental spectra has grown by almost fourfold and the number of illustrated metabolic pathways has grown by a factor of almost 60. Significant improvements have also been made to the HMDB's chemical taxonomy, chemical ontology, spectral viewing, and spectral/text searching tools. A great deal of brand new data has also been added to HMDB 4.0. This includes large quantities of predicted MS/MS and GC-MS reference spectral data as well as predicted (physiologically feasible) metabolite structures to facilitate novel metabolite identification. Additional information on metabolite-SNP interactions and the influence of drugs on metabolite levels (pharmacometabolomics) has also been added. Many other important improvements in the content, the interface, and the performance of the HMDB website have been made and these should greatly enhance its ease of use and its potential applications in nutrition, biochemistry, clinical chemistry, clinical genetics, medicine, and metabolomics science. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  17. A Comparative Study of Frequent and Maximal Periodic Pattern Mining Algorithms in Spatiotemporal Databases

    NASA Astrophysics Data System (ADS)

    Obulesu, O.; Rama Mohan Reddy, A., Dr; Mahendra, M.

    2017-08-01

    Detecting regular cyclic models efficiently is a demanding activity for data analysts due to the unstructured, dynamic and enormous raw information produced from the web. Many existing approaches generate large numbers of candidate patterns when databases are huge and complex. In this work, two novel algorithms are proposed and a comparative examination is performed considering scalability and performance parameters. The first algorithm, EFPMA (Extended Regular Model Detection Algorithm), is used to find frequent sequential patterns from spatiotemporal datasets, and the second, ETMA (Enhanced Tree-based Mining Algorithm), detects effective cyclic models with a symbolic database representation. EFPMA grows patterns from both ends (prefixes and suffixes) of detected patterns, which results in faster pattern growth because of fewer levels of database projection compared to existing approaches such as PrefixSpan and SPADE. ETMA uses distinct notions, such as segments, sequences and individual symbols, to store and manage transaction data horizontally. ETMA exploits a partition-and-conquer method to find maximal patterns using symbolic notations. Using this algorithm, we can mine cyclic models in full-series sequential patterns, including subsection series. ETMA reduces memory consumption and makes use of efficient symbolic operations. Furthermore, ETMA records time-series instances dynamically, in terms of character, series and section approaches, respectively. Proving the efficiency of the reduction and retrieval techniques on synthetic and actual datasets remains an open and challenging mining problem. These techniques are useful in data streams, traffic risk analysis, medical diagnosis, DNA sequence mining, and earthquake prediction applications. Extensive experimental outcomes illustrate that the algorithms outperform the ECLAT, STNR and MAFIA approaches in efficiency and scalability.

  18. HITRAN2016: new and improved data and tools towards studies of planetary atmospheres

    NASA Astrophysics Data System (ADS)

    Gordon, Iouli; Rothman, Laurence S.; Wilzewski, Jonas S.; Kochanov, Roman V.; Hill, Christian; Tan, Yan; Wcislo, Piotr

    2016-10-01

    The HITRAN2016 molecular spectroscopic database is scheduled to be released this year. It will replace the current edition, HITRAN2012 [1], which has been in use, along with some intermediate updates, since 2012. We have added, revised, and improved many transitions and bands of molecular species and their isotopologues. The number of parameters has also been significantly increased, now incorporating, for instance, broadening by He, H2 and CO2, which are dominant in different planetary atmospheres [2]; non-Voigt line profiles [3]; and other phenomena. This poster will provide a summary of the updates, emphasizing details of some of the most important or drastic improvements or additions. To allow flexible incorporation of the new parameters and improve the efficiency of database usage, the whole database has been reorganized into a relational database structure and presented to the user by means of a very powerful, easy-to-use internet program called HITRANonline [4] accessible at . This interface allows the user to make many queries in standard and user-defined formats. In addition, a powerful application called HAPI (HITRAN Application Programming Interface) [5] was developed. HAPI is a set of Python libraries that allows much more functionality for the user. A demonstration of the power of the new tools will also be offered. This work is supported by the NASA PATM (NNX13AI59G), PDART (NNX16AG51G) and AURA (NNX14AI55G) programs.
    References:
    [1] L.S. Rothman et al., JQSRT 130, 4 (2013)
    [2] J.S. Wilzewski et al., JQSRT 168, 193 (2016)
    [3] P. Wcislo et al., JQSRT 177, 75 (2016)
    [4] C. Hill et al., JQSRT 177, 4 (2016)
    [5] R.V. Kochanov et al., JQSRT 177, 15 (2016)
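
    For illustration, a short sketch of the basic HAPI workflow in Python: download line-by-line data into a local table, then compute an absorption-coefficient spectrum. It follows the library's documented basic usage, but function defaults may differ between versions.

        # Requires the HITRAN Application Programming Interface package
        # ("hapi") and network access to the HITRAN servers.
        from hapi import db_begin, fetch, absorptionCoefficient_Lorentz

        db_begin("hitran_data")          # local directory for downloaded tables
        fetch("H2O", 1, 1, 3400, 3500)   # molecule 1 (H2O), isotopologue 1,
                                         # wavenumber range in cm-1
        nu, coef = absorptionCoefficient_Lorentz(SourceTables="H2O")
        print(f"computed {len(nu)} spectral points")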

  19. Estimates of the absolute error and a scheme for an approximate solution to scheduling problems

    NASA Astrophysics Data System (ADS)

    Lazarev, A. A.

    2009-02-01

    An approach is proposed for estimating absolute errors and finding approximate solutions to classical NP-hard scheduling problems, such as minimizing the maximum lateness on one or many machines, or minimizing the makespan. The concept of a metric (distance) between instances of the problem is introduced. The idea behind the approach is, given a problem instance, to construct another instance, at the minimum distance from the initial one in the introduced metric, for which an optimal or approximate solution can be found. Instead of solving the original problem (instance), a set of approximating polynomially/pseudopolynomially solvable problems (instances) is considered, an instance at the minimum distance from the given one is chosen, and the resulting schedule is then applied to the original instance.
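
    One way to make the error bound precise (a generic formulation in our notation; the paper defines its metric on the instance parameters themselves) is the following.

        % Distance between instances A and B over a common schedule set S:
        \[
          \rho(A,B) \;=\; \max_{s \in S} \bigl| f_A(s) - f_B(s) \bigr|,
        \]
        % where f_X(s) is the objective value of schedule s on instance X.
        % If s_B^* is optimal for B and s_A^* is optimal for A, then
        \[
          f_A(s_B^{*}) - f_A(s_A^{*}) \;\le\; 2\,\rho(A,B),
        \]
        % since f_A(s_B^*) <= f_B(s_B^*) + rho <= f_B(s_A^*) + rho
        % <= f_A(s_A^*) + 2*rho. Solving a nearby tractable instance thus
        % bounds the absolute error by twice the distance.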

  20. Atlas - a data warehouse for integrative bioinformatics.

    PubMed

    Shah, Sohrab P; Huang, Yong; Xu, Tao; Yuen, Macaire M S; Ling, John; Ouellette, B F Francis

    2005-02-21

    We present a biological data warehouse called Atlas that locally stores and integrates biological sequences, molecular interactions, homology information, functional annotations of genes, and biological ontologies. The goal of the system is to provide data, as well as a software infrastructure for bioinformatics research and development. The Atlas system is based on relational data models that we developed for each of the source data types. Data stored within these relational models are managed through Structured Query Language (SQL) calls that are implemented in a set of Application Programming Interfaces (APIs). The APIs include three languages: C++, Java, and Perl. The methods in these API libraries are used to construct a set of loader applications, which parse and load the source datasets into the Atlas database, and a set of toolbox applications which facilitate data retrieval. Atlas stores and integrates local instances of GenBank, RefSeq, UniProt, Human Protein Reference Database (HPRD), Biomolecular Interaction Network Database (BIND), Database of Interacting Proteins (DIP), Molecular Interactions Database (MINT), IntAct, NCBI Taxonomy, Gene Ontology (GO), Online Mendelian Inheritance in Man (OMIM), LocusLink, Entrez Gene and HomoloGene. The retrieval APIs and toolbox applications are critical components that offer end-users flexible, easy, integrated access to this data. We present use cases that use Atlas to integrate these sources for genome annotation, inference of molecular interactions across species, and gene-disease associations. The Atlas biological data warehouse serves as data infrastructure for bioinformatics research and development. It forms the backbone of the research activities in our laboratory and facilitates the integration of disparate, heterogeneous biological sources of data enabling new scientific inferences. Atlas achieves integration of diverse data sets at two levels. First, Atlas stores data of similar types using common data models, enforcing the relationships between data types. Second, integration is achieved through a combination of APIs, ontology, and tools. The Atlas software is freely available under the GNU General Public License at: http://bioinformatics.ubc.ca/atlas/
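
    The toy sketch below illustrates the two-level integration idea: common relational models for similar data types, then a single SQL query spanning records loaded from several source databases; the schema and values are invented for illustration.

        # One SQL query across records drawn from multiple source databases.
        import sqlite3

        db = sqlite3.connect(":memory:")
        db.executescript("""
        CREATE TABLE gene        (gene_id TEXT, symbol TEXT, source TEXT);
        CREATE TABLE interaction (gene_a TEXT, gene_b TEXT, source TEXT);
        INSERT INTO gene VALUES ('G1', 'TP53', 'RefSeq'),
                                ('G2', 'MDM2', 'UniProt');
        INSERT INTO interaction VALUES ('G1', 'G2', 'BIND');
        """)

        rows = db.execute("""
        SELECT a.symbol, b.symbol, i.source
        FROM interaction i
        JOIN gene a ON a.gene_id = i.gene_a
        JOIN gene b ON b.gene_id = i.gene_b
        """).fetchall()
        print(rows)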

  1. PEP725 Pan European Phenological Database

    NASA Astrophysics Data System (ADS)

    Koch, E.; Adler, S.; Lipa, W.; Ungersböck, M.; Zach-Hermann, S.

    2010-09-01

    Europe is in the fortunate situation that it has a long tradition in phenological networking: the history of collecting phenological data and using them in climatology has its starting point in 1751, when Carl von Linné outlined in his work Philosophia Botanica methods for compiling annual plant calendars of leaf opening, flowering, fruiting and leaf fall together with climatological observations "so as to show how areas differ". In most European countries, phenological observations have been carried out routinely for more than 50 years by different governmental and non-governmental organisations, following different observation guidelines, with the data stored in different places and formats. This has hampered pan-European studies, as one has to contact many network operators for access to the data before one can begin to bring them into a uniform format. From 2004 to 2009, the COST action 725 established a Europe-wide data set of phenological observations. The deliverables of this COST action were not only the common phenological database and common observation guidelines; COST725 also helped trigger the revival of some old networks and the establishment of new ones, for instance in Sweden. At the end of the COST action in 2009, the database comprised about 8 million records in total from 15 European countries, plus the data from the International Phenological Gardens (IPG). In January 2010, PEP725 began its work as a follow-up project with funding from EUMETNET, the network of European meteorological services, and from ZAMG, the Austrian national meteorological service. PEP725 will not only maintain and update the COST725 database, but also bring in phenological data from before 1951, develop better quality-checking procedures and ensure open access to the database. An attractive webpage will make phenology and climate impacts on vegetation more visible to the public, enabling the monitoring of vegetation development.

  2. Abductive Equivalential Translation and its application to Natural Language Database Interfacing

    NASA Astrophysics Data System (ADS)

    Rayner, Manny

    1994-05-01

    The thesis describes a logical formalization of natural-language database interfacing. We assume the existence of a ``natural language engine'' capable of mediating between surface linguistic strings and their representations as ``literal'' logical forms: the focus of interest will be the question of relating ``literal'' logical forms to representations in terms of primitives meaningful to the underlying database engine. We begin by describing the nature of the problem, and show how a variety of interface functionalities can be considered as instances of a type of formal inference task which we call ``Abductive Equivalential Translation'' (AET); functionalities which can be reduced to this form include answering questions, responding to commands, reasoning about the completeness of answers, answering meta-questions of type ``Do you know...'', and generating assertions and questions. In each case, a ``linguistic domain theory'' (LDT) Γ and an input formula F are given, and the goal is to construct a formula with certain properties which is equivalent to F, given Γ and a set of permitted assumptions. If the LDT is of a certain specified type, whose formulas are either conditional equivalences or Horn-clauses, we show that the AET problem can be reduced to a goal-directed inference method. We present an abstract description of this method, and sketch its realization in Prolog. The relationship between AET and several problems previously discussed in the literature is examined. In particular, we show how AET can provide a simple and elegant solution to the so-called ``Doctor on Board'' problem, and in effect allows a ``relativization'' of the Closed World Assumption. The ideas in the thesis have all been implemented concretely within the SRI CLARE project, using a real projects and payments database. The LDT for the example database is described in detail, and examples of the types of functionality that can be achieved within the example domain are presented.

  3. Atlas – a data warehouse for integrative bioinformatics

    PubMed Central

    Shah, Sohrab P; Huang, Yong; Xu, Tao; Yuen, Macaire MS; Ling, John; Ouellette, BF Francis

    2005-01-01

    Background We present a biological data warehouse called Atlas that locally stores and integrates biological sequences, molecular interactions, homology information, functional annotations of genes, and biological ontologies. The goal of the system is to provide data, as well as a software infrastructure for bioinformatics research and development. Description The Atlas system is based on relational data models that we developed for each of the source data types. Data stored within these relational models are managed through Structured Query Language (SQL) calls that are implemented in a set of Application Programming Interfaces (APIs). The APIs include three languages: C++, Java, and Perl. The methods in these API libraries are used to construct a set of loader applications, which parse and load the source datasets into the Atlas database, and a set of toolbox applications which facilitate data retrieval. Atlas stores and integrates local instances of GenBank, RefSeq, UniProt, Human Protein Reference Database (HPRD), Biomolecular Interaction Network Database (BIND), Database of Interacting Proteins (DIP), Molecular Interactions Database (MINT), IntAct, NCBI Taxonomy, Gene Ontology (GO), Online Mendelian Inheritance in Man (OMIM), LocusLink, Entrez Gene and HomoloGene. The retrieval APIs and toolbox applications are critical components that offer end-users flexible, easy, integrated access to this data. We present use cases that use Atlas to integrate these sources for genome annotation, inference of molecular interactions across species, and gene-disease associations. Conclusion The Atlas biological data warehouse serves as data infrastructure for bioinformatics research and development. It forms the backbone of the research activities in our laboratory and facilitates the integration of disparate, heterogeneous biological sources of data enabling new scientific inferences. Atlas achieves integration of diverse data sets at two levels. First, Atlas stores data of similar types using common data models, enforcing the relationships between data types. Second, integration is achieved through a combination of APIs, ontology, and tools. The Atlas software is freely available under the GNU General Public License at: PMID:15723693
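
    As an illustration of the kind of retrieval such a warehouse enables, the following minimal Python sketch runs an Atlas-style join over a toy schema. The table and column names are invented here for illustration; Atlas itself exposes C++, Java, and Perl APIs over its own relational models, and this sketch uses the stdlib sqlite3 module purely for self-containment.

      # Illustrative only: a toy schema standing in for Atlas-style retrieval,
      # where sequence and interaction records are joined through common IDs.
      import sqlite3

      conn = sqlite3.connect(":memory:")
      conn.executescript("""
      CREATE TABLE protein (id INTEGER PRIMARY KEY, accession TEXT, taxon_id INTEGER);
      CREATE TABLE interaction (a INTEGER, b INTEGER, source TEXT);
      INSERT INTO protein VALUES (1, 'P04637', 9606), (2, 'Q00987', 9606);
      INSERT INTO interaction VALUES (1, 2, 'BIND');
      """)

      # Retrieve all interaction partners of a given accession, as a toolbox
      # application built on the retrieval API might.
      rows = conn.execute("""
          SELECT p2.accession, i.source
          FROM protein p1
          JOIN interaction i ON i.a = p1.id
          JOIN protein p2 ON p2.id = i.b
          WHERE p1.accession = ?
      """, ("P04637",)).fetchall()
      print(rows)   # [('Q00987', 'BIND')]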

  4. AT_CHLORO, a comprehensive chloroplast proteome database with subplastidial localization and curated information on envelope proteins.

    PubMed

    Ferro, Myriam; Brugière, Sabine; Salvi, Daniel; Seigneurin-Berny, Daphné; Court, Magali; Moyet, Lucas; Ramus, Claire; Miras, Stéphane; Mellal, Mourad; Le Gall, Sophie; Kieffer-Jaquinod, Sylvie; Bruley, Christophe; Garin, Jérôme; Joyard, Jacques; Masselon, Christophe; Rolland, Norbert

    2010-06-01

    Recent advances in the proteomics field have allowed a series of high throughput experiments to be conducted on chloroplast samples, and the data are available in several public databases. However, the accurate localization of many chloroplast proteins often remains hypothetical. This is especially true for envelope proteins. We took a step further into the knowledge of the chloroplast proteome by focusing, in the same set of experiments, on the localization of proteins in the stroma, the thylakoids, and envelope membranes. LC-MS/MS-based analyses first allowed the building of the AT_CHLORO database (http://www.grenoble.prabi.fr/protehome/grenoble-plant-proteomics/), a comprehensive repertoire of the 1323 proteins, identified by 10,654 unique peptide sequences, present in highly purified chloroplasts and their subfractions prepared from Arabidopsis thaliana leaves. This database also provides extensive proteomics information (peptide sequences and molecular weight, chromatographic retention times, MS/MS spectra, and spectral count) for a unique chloroplast protein accurate mass and time tag database gathering identified peptides with their respective precise analytical coordinates, molecular weight, and retention time. We assessed the partitioning of each protein in the three chloroplast compartments by using a semiquantitative proteomics approach (spectral count). These data, together with an in-depth investigation of the literature, were compiled to provide accurate subplastidial localization of previously known and newly identified proteins. A unique knowledge base containing extensive information on the proteins identified in envelope fractions was thus obtained, allowing new insights into this membrane system to be revealed. Altogether, the data we obtained provide unexpected information about the plastidial or subplastidial localization of some proteins that were not suspected to be associated with this membrane system. The spectral counting-based strategy was further validated as the compartmentation of well-known pathways (for instance, photosynthesis and amino acid, fatty acid, or glycerolipid biosynthesis) within chloroplasts could be dissected. It also allowed revisiting the compartmentation of chloroplast metabolism and functions.
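
    A toy illustration of the semiquantitative partitioning idea, with spectral counts invented for one hypothetical protein: each compartment's share of the total counts suggests its dominant localization.

      # Hypothetical spectral counts for one protein in the three chloroplast
      # fractions; the partitioning is each compartment's share of the total.
      counts = {"envelope": 42, "stroma": 3, "thylakoid": 5}
      total = sum(counts.values())
      partition = {c: n / total for c, n in counts.items()}
      for compartment, share in sorted(partition.items(), key=lambda kv: -kv[1]):
          print(f"{compartment}: {share:.0%}")
      # envelope: 84% -> the protein would be assigned an envelope localization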

  5. MotifNet: a web-server for network motif analysis.

    PubMed

    Smoly, Ilan Y; Lerman, Eugene; Ziv-Ukelson, Michal; Yeger-Lotem, Esti

    2017-06-15

    Network motifs are small topological patterns that recur in a network significantly more often than expected by chance. Their identification emerged as a powerful approach for uncovering the design principles underlying complex networks. However, available tools for network motif analysis typically require download and execution of computationally intensive software on a local computer. We present MotifNet, the first open-access web-server for network motif analysis. MotifNet allows researchers to analyze integrated networks, where nodes and edges may be labeled, and to search for motifs of up to eight nodes. The output motifs are presented graphically, and the user can interactively filter them by their significance, number of instances, node and edge labels, and node identities, and view their instances. MotifNet also allows the user to distinguish between motifs that are centered on specific nodes and motifs that recur in distinct parts of the network. MotifNet is freely available at http://netbio.bgu.ac.il/motifnet. The website was implemented using ReactJs and supports all major browsers. The server interface was implemented in Python, with data stored in a MySQL database. estiyl@bgu.ac.il or michaluz@cs.bgu.ac.il. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
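
    For illustration, a minimal Python sketch of motif over-representation (not MotifNet's algorithm): count feed-forward loops in a small directed graph and compare against a simple Erdos-Renyi null model of the same size.

      # Count feed-forward loops (a->b, a->c, b->c) and gauge significance
      # with a z-score against size-matched random directed graphs.
      import itertools, statistics, networkx as nx

      def count_ffl(g):
          return sum(1 for a, b, c in itertools.permutations(g, 3)
                     if g.has_edge(a, b) and g.has_edge(a, c) and g.has_edge(b, c))

      g = nx.DiGraph([(1, 2), (1, 3), (2, 3), (3, 4), (1, 4), (2, 4)])
      observed = count_ffl(g)

      null = [count_ffl(nx.gnm_random_graph(g.number_of_nodes(),
                                            g.number_of_edges(), directed=True))
              for _ in range(200)]
      z = (observed - statistics.mean(null)) / (statistics.stdev(null) or 1.0)
      print(f"observed={observed}, z-score vs. random graphs={z:.1f}")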

  6. A cyber infrastructure for the SKA Telescope Manager

    NASA Astrophysics Data System (ADS)

    Barbosa, Domingos; Barraca, João. P.; Carvalho, Bruno; Maia, Dalmiro; Gupta, Yashwant; Natarajan, Swaminathan; Le Roux, Gerhard; Swart, Paul

    2016-07-01

    The Square Kilometre Array Telescope Manager (SKA TM) will be responsible for assisting the SKA Operations and Observation Management, carrying out system diagnosis and collecting Monitoring and Control data from the SKA subsystems and components. To provide adequate compute resources, scalability, operation continuity and high availability, as well as strict Quality of Service, the TM cyber-infrastructure (embodied in the Local Infrastructure - LINFRA) consists of COTS hardware and infrastructural software (for example: server monitoring software, host operating system, virtualization software, device firmware), providing a specially tailored Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) solution. The TM infrastructure provides services in the form of computational power, software defined networking, power, storage abstractions, and high-level, state-of-the-art IaaS and PaaS management interfaces. This cyber platform will be tailored to each of the two SKA Phase 1 telescope instances (SKA_MID in South Africa and SKA_LOW in Australia), each presenting different computational and storage infrastructures and conditioned by location. This cyber platform will provide a compute model enabling TM to manage the deployment and execution of its multiple components (observation scheduler, proposal submission tools, M&C components, forensic tools, several databases, etc.). In this sense, the TM LINFRA is primarily focused on the provision of isolated instances, mostly resorting to virtualization technologies, while defaulting to bare hardware if specifically required due to performance, security, availability, or other requirements.

  7. A sampling model of social judgment.

    PubMed

    Galesic, Mirta; Olsson, Henrik; Rieskamp, Jörg

    2018-04-01

    Studies of social judgments have demonstrated a number of diverse phenomena that have so far been difficult to explain within a single theoretical framework. Prominent examples are false consensus and false uniqueness, as well as self-enhancement and self-depreciation. Here we show that these seemingly complex phenomena can be a product of an interplay between basic cognitive processes and the structure of social and task environments. We propose and test a new process model of social judgment, the social sampling model (SSM), which provides a parsimonious quantitative account of different types of social judgments. In the SSM, judgments about characteristics of broader social environments are based on sampling of social instances from memory, where instances receive activation if they belong to a target reference class and have a particular characteristic. These sampling processes interact with the properties of social and task environments, including homophily, shapes of frequency distributions, and question formats. For example, in line with the model's predictions, we found that whether false consensus or false uniqueness will occur depends on the level of homophily in people's social circles and on the way questions are asked. The model also explains some previously unaccounted-for patterns of self-enhancement and self-depreciation. People seem to be well informed about many characteristics of their immediate social circles, which in turn influence how they evaluate broader social environments and their position within them. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
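
    A toy Python simulation of the sampling idea (parameters invented, not the authors' fitted model): when social circles are homophilous, prevalence estimates built by sampling one's own contacts drift toward one's own group, producing false consensus.

      import random

      random.seed(1)
      population_rate = 0.3            # true prevalence of some characteristic
      homophily = 0.8                  # chance a sampled contact shares your trait

      def estimate(self_has_trait, n_samples=20):
          hits = 0
          for _ in range(n_samples):
              if random.random() < homophily:      # sample resembles the judge
                  hits += self_has_trait
              else:                                # sample drawn from population
                  hits += random.random() < population_rate
          return hits / n_samples

      print("trait holder estimates:", estimate(True))   # well above 0.3
      print("non-holder estimates:  ", estimate(False))  # well below 0.3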

  8. Lawsuit lead time prediction: Comparison of data mining techniques based on categorical response variable.

    PubMed

    Gruginskie, Lúcia Adriana Dos Santos; Vaccaro, Guilherme Luís Roehe

    2018-01-01

    The quality of the judicial system of a country can be verified by the overall length of lawsuits, or the lead time. When the lead time is excessive, a country's economy can be affected, leading to the adoption of measures such as the creation of the Saturn Center in Europe. Although there are performance indicators to measure the lead time of lawsuits, the analysis and fitting of prediction models are still underdeveloped themes in the literature. To contribute to this subject, this article compares different prediction models according to their accuracy, sensitivity, specificity, precision, and F1 measure. The database used was from TRF4 (the Tribunal Regional Federal da 4a Região), a federal court in southern Brazil, and corresponds to the 2nd Instance civil lawsuits completed in 2016. The models were fitted using support vector machine, naive Bayes, random forest, and neural network approaches with categorical predictor variables. The lead time of the 2nd Instance judgment was selected as the response variable, measured in days and categorized in bands. The comparison among the models showed that the support vector machine and random forest approaches produced measurements superior to those of the other models. The models were evaluated using k-fold cross-validation.
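
    A schematic of such a comparison with scikit-learn, on synthetic data standing in for the TRF4 records (the class label plays the categorized lead-time band):

      from sklearn.datasets import make_classification
      from sklearn.model_selection import cross_val_score
      from sklearn.svm import SVC
      from sklearn.naive_bayes import GaussianNB
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.neural_network import MLPClassifier

      X, y = make_classification(n_samples=500, n_classes=3, n_informative=6,
                                 random_state=0)   # y stands in for the band
      models = {"SVM": SVC(), "Naive Bayes": GaussianNB(),
                "Random forest": RandomForestClassifier(random_state=0),
                "Neural network": MLPClassifier(max_iter=1000, random_state=0)}
      for name, model in models.items():
          scores = cross_val_score(model, X, y, cv=5)   # 5-fold cross-validation
          print(f"{name}: mean accuracy {scores.mean():.3f}")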

  9. NiCd cell reliability in the mission environment

    NASA Technical Reports Server (NTRS)

    Denson, William K.; Klein, Glenn C.

    1993-01-01

    This paper summarizes an effort by Gates Aerospace Batteries (GAB) and the Reliability Analysis Center (RAC) to analyze survivability data for both General Electric and GAB NiCd cells utilized in various spacecraft. For simplicity's sake, all mission environments are described as either low Earth orbit (LEO) or geosynchronous Earth orbit (GEO). 'Extreme value statistical methods' are applied to this database because of the longevity of the numerous missions while encountering relatively few failures. Every attempt was made to include all known instances of cell-induced failures of the battery and to exclude battery-induced failures of the cell. While this distinction may be somewhat limited due to availability of in-flight data, we have accepted the learned opinion of the specific customer contacts to ensure integrity of the common databases. This paper advances the preliminary analysis reported upon at the 1991 NASA Battery Workshop. That prior analysis was concerned with an estimated 278 million cell-hours of operation encompassing 183 satellites. The paper also cited 'no reported failures to date.' This analysis reports on 428 million cell-hours of operation encompassing 212 satellites. This analysis also reports on seven 'cell-induced failures.'

  10. Chronic Rhinosinusitis Associated with Erectile Dysfunction: A Population-Based Study.

    PubMed

    Tai, Shu-Yu; Wang, Ling-Feng; Tai, Chih-Feng; Huang, Yu-Ting; Chien, Chen-Yu

    2016-08-31

    Few studies have investigated the relationship between chronic rhinosinusitis (CRS) and erectile dysfunction (ED). This case-control study aimed to investigate the association between CRS and the risk of ED in a large national sample. Tapping Taiwan's National Health Insurance Research Database, we identified people 30 years or older with a new primary diagnosis of CRS between 1996 and 2007. The cases were compared with sex- and age-matched controls. We identified 14,039 cases and recruited 140,387 matched controls. Both groups were followed up in the same database until the end of 2007 for instances of ED. Of those with CRS, 294 (2.1%) developed ED during a mean (SD) follow-up of 3.20 (2.33) years, while 1,661 (1.2%) of the matched controls developed ED, with a mean follow-up of 2.97 (2.39) years. Cox regression analyses were performed adjusting for sex, age, insurance premium, residence, hypertension, hyperlipidemia, diabetes, obesity, coronary heart disease, chronic kidney disease, chronic obstructive pulmonary disease, asthma, allergic rhinitis, arrhythmia, ischemic stroke, intracerebral hemorrhage, and medications. CRS was revealed to be an independent predictor of ED in the fully adjusted model (HR = 1.51; 95% CI = 1.33-1.73; P < 0.0001).
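
    The design of such an analysis can be sketched with the lifelines package on made-up records (a hedged illustration, not the study's code): 'crs' is the exposure, follow-up time is in years, and ED is the event indicator.

      import pandas as pd
      from lifelines import CoxPHFitter

      df = pd.DataFrame({
          "time": [3.2, 1.1, 4.0, 2.5, 0.8, 3.9, 2.2, 1.7],
          "ed":   [1,   0,   1,   1,   0,   0,   1,   0],
          "crs":  [1,   1,   0,   1,   0,   0,   1,   0],
          "age":  [52,  44,  61,  58,  39,  47,  55,  43],
      })
      cph = CoxPHFitter()
      cph.fit(df, duration_col="time", event_col="ed")  # adjusts for crs and age
      cph.print_summary()  # exp(coef) for 'crs' is the adjusted hazard ratio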

  11. The Function Biomedical Informatics Research Network Data Repository

    PubMed Central

    Keator, David B.; van Erp, Theo G.M.; Turner, Jessica A.; Glover, Gary H.; Mueller, Bryon A.; Liu, Thomas T.; Voyvodic, James T.; Rasmussen, Jerod; Calhoun, Vince D.; Lee, Hyo Jong; Toga, Arthur W.; McEwen, Sarah; Ford, Judith M.; Mathalon, Daniel H.; Diaz, Michele; O’Leary, Daniel S.; Bockholt, H. Jeremy; Gadde, Syam; Preda, Adrian; Wible, Cynthia G.; Stern, Hal S.; Belger, Aysenil; McCarthy, Gregory; Ozyurt, Burak; Potkin, Steven G.

    2015-01-01

    The Function Biomedical Informatics Research Network (FBIRN) developed methods and tools for conducting multi-scanner functional magnetic resonance imaging (fMRI) studies. Method and tool development were based on two major goals: 1) to assess the major sources of variation in fMRI studies conducted across scanners, including instrumentation, acquisition protocols, challenge tasks, and analysis methods, and 2) to provide a distributed network infrastructure and an associated federated database to host and query large, multi-site, fMRI and clinical datasets. In the process of achieving these goals the FBIRN test bed generated several multi-scanner brain imaging data sets to be shared with the wider scientific community via the BIRN Data Repository (BDR). The FBIRN Phase 1 dataset consists of a traveling subject study of 5 healthy subjects, each scanned on 10 different 1.5 to 4 Tesla scanners. The FBIRN Phase 2 and Phase 3 datasets consist of subjects with schizophrenia or schizoaffective disorder along with healthy comparison subjects scanned at multiple sites. In this paper, we provide concise descriptions of FBIRN’s multi-scanner brain imaging data sets and details about the BIRN Data Repository instance of the Human Imaging Database (HID) used to publicly share the data. PMID:26364863

  12. Considerations to improve functional annotations in biological databases.

    PubMed

    Benítez-Páez, Alfonso

    2009-12-01

    Despite the great effort to design efficient systems allowing the electronic indexation of information concerning genes, proteins, structures, and interactions published daily in scientific journals, some problems are still observed in specific tasks such as functional annotation. The annotation of function is a critical issue for bioinformatic routines, such as functional genomics and the further prediction of unknown protein function, which are highly dependent on the quality of existing annotations. Some information management systems evolve to efficiently incorporate information from large-scale projects, but often the annotation of single records from the literature is difficult and slow. In this short report, functional characterizations of a representative sample of the entire set of uncharacterized proteins from Escherichia coli K12 were compiled from Swiss-Prot, PubMed, and EcoCyc, demonstrating a functional annotation deficit in biological databases. Some issues are postulated as causes of the lack of annotation, and different solutions are evaluated and proposed to avoid them. The hope is that, as a consequence of these observations, there will be new impetus to improve the speed and quality of functional annotation and ultimately provide updated, reliable information to the scientific community.

  13. Interrater reliability of Violence Risk Appraisal Guide scores provided in Canadian criminal proceedings.

    PubMed

    Edens, John F; Penson, Brittany N; Ruchensky, Jared R; Cox, Jennifer; Smith, Shannon Toney

    2016-12-01

    Published research suggests that most violence risk assessment tools have relatively high levels of interrater reliability, but recent evidence of inconsistent scores among forensic examiners in adversarial settings raises concerns about the "field reliability" of such measures. This study specifically examined the reliability of Violence Risk Appraisal Guide (VRAG) scores in Canadian criminal cases identified in the legal database, LexisNexis. Over 250 reported cases were located that made mention of the VRAG, with 42 of these cases containing 2 or more scores that could be submitted to interrater reliability analyses. Overall, scores were skewed toward higher risk categories. The intraclass correlation, ICC(A,1), was .66, with pairs of forensic examiners placing defendants into the same VRAG risk "bin" in 68% of the cases. For categorical risk statements (i.e., low, moderate, high), examiners provided converging assessment results in most instances (86%). In terms of potential predictors of rater disagreement, there was no evidence for adversarial allegiance in our sample. Rater disagreement in the scoring of 1 VRAG item (Psychopathy Checklist-Revised; Hare, 2003), however, strongly predicted rater disagreement in the scoring of the VRAG (r = .58). (PsycINFO Database Record (c) 2016 APA, all rights reserved).
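
    For illustration, a field-reliability check of this kind can be computed with the pingouin package on toy paired scores (not the study's data): the ICC is taken over defendants ("targets") rated by two examiners.

      import pandas as pd, pingouin as pg

      df = pd.DataFrame({
          "case":     [1, 1, 2, 2, 3, 3, 4, 4],
          "examiner": ["A", "B"] * 4,
          "vrag":     [12, 15, 4, 6, 21, 14, 9, 9],
      })
      icc = pg.intraclass_corr(data=df, targets="case", raters="examiner",
                               ratings="vrag")
      print(icc[["Type", "ICC"]])  # ICC2 is the absolute-agreement ICC(A,1)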

  14. Discovery of candidate KEN-box motifs using cell cycle keyword enrichment combined with native disorder prediction and motif conservation.

    PubMed

    Michael, Sushama; Travé, Gilles; Ramu, Chenna; Chica, Claudia; Gibson, Toby J

    2008-02-15

    KEN-box-mediated target selection is one of the mechanisms used in the proteasomal destruction of mitotic cell cycle proteins via the APC/C complex. While annotating the Eukaryotic Linear Motif resource (ELM, http://elm.eu.org/), we found that KEN motifs were significantly enriched in human protein entries with cell cycle keywords in the UniProt/Swiss-Prot database, implying that KEN-boxes might be more common than reported. Matches to short linear motifs in protein database searches are not, per se, significant. KEN-box enrichment with cell cycle Gene Ontology terms suggests that collectively these motifs are functional but does not prove that any given instance is so. Candidates were surveyed for native disorder prediction using GlobPlot and IUPred and for motif conservation in homologues. Among >25 strong new candidates, the most notable are human HIPK2, CHFR, CDC27, Dab2, Upf2, kinesin Eg5, DNA Topoisomerase 1 and yeast Cdc5 and Swi5. A similar number of weaker candidates were present. These proteins have yet to be tested for APC/C targeted destruction, providing potential new avenues of research.
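
    A minimal sketch of the first filtering step, scanning sequences for the KEN tripeptide (sequences invented; in the study, each hit would then be screened for native disorder and for conservation in homologues):

      import re

      sequences = {
          "protA": "MSTKENLLRAGQKENDD",
          "protB": "MAAAGGGLLLKDE",
      }
      for name, seq in sequences.items():
          for m in re.finditer("KEN", seq):
              print(f"{name}: KEN-box candidate at position {m.start() + 1}")
      # protA yields two candidates; each would then be scored for disorder
      # (e.g. with IUPred) and for conservation across homologues.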

  15. OPserver: opacities and radiative accelerations on demand

    NASA Astrophysics Data System (ADS)

    Mendoza, C.; González, J.; Seaton, M. J.; Buerger, P.; Bellorín, A.; Meléndez, M.; Rodríguez, L. S.; Delahaye, F.; Zeippen, C. J.; Palacios, E.; Pradhan, A. K.

    2009-05-01

    We report on developments carried out within the Opacity Project (OP) to upgrade atomic database services to comply with e-infrastructure requirements. We give a detailed description of an interactive, online server for astrophysical opacities, referred to as OPserver, to be used in sophisticated stellar modelling where Rosseland mean opacities and radiative accelerations are computed at every depth point and each evolution cycle. This is crucial, for instance, in chemically peculiar stars and in the exploitation of the new asteroseismological data. OPserver, downloadable with the new OPCD_3.0 release from the Centre de Données Astronomiques de Strasbourg, France, computes mean opacities and radiative data for arbitrary chemical mixtures from the OP monochromatic opacities. It is essentially a client-server network restructuring and optimization of the suite of codes included in the earlier OPCD_2.0 release. The server can be installed locally or, alternatively, accessed remotely from the Ohio Supercomputer Center, Columbus, Ohio, USA. The client is an interactive web page or a subroutine library that can be linked to the user code. The suitability of this scheme in grid computing environments is emphasized, and its extension to other atomic database services for astrophysical purposes is discussed.

  16. Implementing Connected Component Labeling as a User Defined Operator for SciDB

    NASA Technical Reports Server (NTRS)

    Oloso, Amidu; Kuo, Kwo-Sen; Clune, Thomas; Brown, Paul; Poliakov, Alex; Yu, Hongfeng

    2016-01-01

    We have implemented a flexible User Defined Operator (UDO) for labeling connected components of a binary mask expressed as an array in SciDB, a parallel distributed database management system based on the array data model. This UDO is able to process very large multidimensional arrays by exploiting SciDB's memory management mechanism that efficiently manipulates arrays whose memory requirements far exceed available physical memory. The UDO takes as primary inputs a binary mask array and a binary stencil array that specifies the connectivity of a given cell to its neighbors. The UDO returns an array of the same shape as the input mask array with each foreground cell containing the label of the component it belongs to. By default, dimensions are treated as non-periodic, but the UDO also accepts optional input parameters to specify periodicity in any of the array dimensions. The UDO requires four stages to completely label connected components. In the first stage, labels are computed for each subarray or chunk of the mask array in parallel across SciDB instances using the weighted quick union (WQU) with half-path compression algorithm. In the second stage, labels around chunk boundaries from the first stage are stored in a temporary SciDB array that is then replicated across all SciDB instances. Equivalences are resolved by again applying the WQU algorithm to these boundary labels. In the third stage, relabeling is done for each chunk using the resolved equivalences. In the fourth stage, the resolved labels, which so far are "flattened" coordinates of the original binary mask array, are renamed with sequential integers for legibility. The UDO is demonstrated on a 3-D mask of O(10^11) elements, with O(10^8) foreground cells and O(10^6) connected components. The operator completes in 19 minutes using 84 SciDB instances.
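
    The union-find core named above, weighted quick union with half-path compression, can be illustrated on a 1-D mask (a sketch only; the UDO applies the same idea per chunk of an n-D SciDB array, with the stencil defining neighbor connectivity):

      def find(parent, i):
          while parent[i] != i:
              parent[i] = parent[parent[i]]   # half-path compression
              i = parent[i]
          return i

      def union(parent, size, a, b):
          ra, rb = find(parent, a), find(parent, b)
          if ra == rb:
              return
          if size[ra] < size[rb]:             # weighting: small tree under large
              ra, rb = rb, ra
          parent[rb] = ra
          size[ra] += size[rb]

      mask = [1, 1, 0, 1, 1, 1, 0, 1]
      parent = list(range(len(mask)))
      size = [1] * len(mask)
      for i in range(1, len(mask)):
          if mask[i] and mask[i - 1]:         # connectivity stencil: left neighbor
              union(parent, size, i, i - 1)
      labels = [find(parent, i) if v else None for i, v in enumerate(mask)]
      print(labels)   # [1, 1, None, 4, 4, 4, None, 7]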

  17. 5D Modelling: An Efficient Approach for Creating Spatiotemporal Predictive 3D Maps of Large-Scale Cultural Resources

    NASA Astrophysics Data System (ADS)

    Doulamis, A.; Doulamis, N.; Ioannidis, C.; Chrysouli, C.; Grammalidis, N.; Dimitropoulos, K.; Potsiou, C.; Stathopoulou, E.-K.; Ioannides, M.

    2015-08-01

    Outdoor large-scale cultural sites are mostly sensitive to environmental, natural and human made factors, implying an imminent need for a spatio-temporal assessment to identify regions of potential cultural interest (material degradation, structuring, conservation). On the other hand, in Cultural Heritage research quite different actors are involved (archaeologists, curators, conservators, simple users) each of diverse needs. All these statements advocate that a 5D modelling (3D geometry plus time plus levels of details) is ideally required for preservation and assessment of outdoor large scale cultural sites, which is currently implemented as a simple aggregation of 3D digital models at different time and levels of details. The main bottleneck of such an approach is its complexity, making 5D modelling impossible to be validated in real life conditions. In this paper, a cost effective and affordable framework for 5D modelling is proposed based on a spatial-temporal dependent aggregation of 3D digital models, by incorporating a predictive assessment procedure to indicate which regions (surfaces) of an object should be reconstructed at higher levels of details at next time instances and which at lower ones. In this way, dynamic change history maps are created, indicating spatial probabilities of regions needed further 3D modelling at forthcoming instances. Using these maps, predictive assessment can be made, that is, to localize surfaces within the objects where a high accuracy reconstruction process needs to be activated at the forthcoming time instances. The proposed 5D Digital Cultural Heritage Model (5D-DCHM) is implemented using open interoperable standards based on the CityGML framework, which also allows the description of additional semantic metadata information. Visualization aspects are also supported to allow easy manipulation, interaction and representation of the 5D-DCHM geometry and the respective semantic information. The open source 3DCityDB incorporating a PostgreSQL geo-database is used to manage and manipulate 3D data and their semantics.

  18. Integration of the stratigraphic aspects of very large sea-floor databases using information processing

    USGS Publications Warehouse

    Jenkins, Clinton N.; Flocks, J.; Kulp, M.; ,

    2006-01-01

    Information-processing methods are described that integrate the stratigraphic aspects of large and diverse collections of sea-floor sample data. They efficiently convert common types of sea-floor data into database and GIS (geographical information system) tables, visual core logs, stratigraphic fence diagrams and sophisticated stratigraphic statistics. The input data are held in structured documents, essentially written core logs that are particularly efficient to create from raw input datasets. Techniques are described that permit efficient construction of regional databases consisting of hundreds of cores. The sedimentological observations in each core are located by their downhole depths (metres below sea floor - mbsf) and also by a verbal term that describes the sample 'situation' - a special fraction of the sediment or position in the core. The main processing creates a separate output event for each instance of top, bottom and situation, assigning top-base mbsf values from numeric or, where possible, from word-based relative locational information such as 'core catcher' in reference to sampler device, and recovery or penetration length. The processing outputs represent the sub-bottom as a sparse matrix of over 20 sediment properties of interest, such as grain size, porosity and colour. They can be plotted in a range of core-log programs including an in-built facility that better suits the requirements of sea-floor data. Finally, a suite of stratigraphic statistics are computed, including volumetric grades, overburdens, thicknesses and degrees of layering. © The Geological Society of London 2006.
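
    A toy version of the event extraction in Python (log format invented for illustration): each observation line yields an output event with top/base depths in mbsf, with word-based locations left for later depth assignment.

      import re

      log = """
      0.00-0.35 mbsf: sand, grain size 2.1 phi
      0.35-1.20 mbsf: silty clay, porosity 62%
      core catcher: shell hash
      """
      events = []
      for line in log.strip().splitlines():
          m = re.match(r"([\d.]+)-([\d.]+) mbsf: (.+)", line.strip())
          if m:
              top, base, desc = float(m.group(1)), float(m.group(2)), m.group(3)
          else:              # word-based location, e.g. 'core catcher':
              top, base, desc = None, None, line.split(":", 1)[1].strip()
              # depth would be assigned from the sampler penetration length
          events.append({"top_mbsf": top, "base_mbsf": base, "observation": desc})
      print(events)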

  19. Instrument Failures for the da Vinci Surgical System: a Food and Drug Administration MAUDE Database Study.

    PubMed

    Friedman, Diana C W; Lendvay, Thomas S; Hannaford, Blake

    2013-05-01

    Our goal was to analyze reported instances of the da Vinci robotic surgical system instrument failures using the FDA's MAUDE (Manufacturer and User Facility Device Experience) database. From these data we identified some root causes of failures as well as trends that may assist surgeons and users of the robotic technology. We conducted a survey of the MAUDE database and tallied robotic instrument failures that occurred between January 2009 and December 2010. We categorized failures into five main groups (cautery, shaft, wrist or tool tip, cable, and control housing) based on technical differences in instrument design and function. A total of 565 instrument failures were documented through 528 reports. The majority of failures (285) were of the instrument's wrist or tool tip. Cautery problems comprised 174 failures, 76 were shaft failures, 29 were cable failures, and 7 were control housing failures. Of the reports, 10 had no discernible failure mode and 49 exhibited multiple failures. The data show that a number of robotic instrument failures occurred in a short period of time. In reality, many instrument failures may go unreported, thus a true failure rate cannot be determined from these data. However, education of hospital administrators, operating room staff, surgeons, and patients should be incorporated into discussions regarding the introduction and utilization of robotic technology. We recommend institutions incorporate standard failure reporting policies so that the community of robotic surgery companies and surgeons can improve on existing technologies for optimal patient safety and outcomes.
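
    The categorization itself amounts to a simple tally over reports; a sketch with invented reports, where a report contributes one count per failure mode it exhibits:

      from collections import Counter

      reports = [["wrist/tip"], ["cautery"], ["wrist/tip", "cable"], ["shaft"],
                 ["cautery"], [], ["wrist/tip"]]
      tally = Counter(mode for report in reports for mode in report)
      print(tally.most_common())        # wrist/tip failures dominate
      print("no discernible mode:", sum(1 for r in reports if not r))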

  20. A Comparison of Different Database Technologies for the CMS AsyncStageOut Transfer Database

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ciangottini, D.; Balcas, J.; Mascheroni, M.

    AsyncStageOut (ASO) is the component of the CMS distributed data analysis system (CRAB) that manages users' transfers in a centrally controlled way using the File Transfer System (FTS3) at CERN. It addresses a major weakness of the previous, decentralized model, namely that the transfer of the user's output data to a single remote site was part of the job execution, resulting in inefficient use of job slots and an unacceptable failure rate. Currently ASO manages up to 600k files of various sizes per day from more than 500 users per month, spread over more than 100 sites. ASO uses a NoSQL database (CouchDB) as internal bookkeeping and as a way to communicate with other CRAB components. Since ASO/CRAB were put in production in 2014, the number of transfers constantly increased up to a point where the pressure on the central CouchDB instance became critical, creating new challenges for the system scalability, performance, and monitoring. This forced a re-engineering of the ASO application to increase its scalability and lower its operational effort. In this contribution we present a comparison of the performance of the current NoSQL implementation and a new SQL implementation, and how their different strengths and features influenced the design choices and operational experience. We also discuss other architectural changes introduced in the system to handle the increasing load and latency in delivering output to the user.

  1. Collaborative Resource Allocation

    NASA Technical Reports Server (NTRS)

    Wang, Yeou-Fang; Wax, Allan; Lam, Raymond; Baldwin, John; Borden, Chester

    2007-01-01

    Collaborative Resource Allocation Networking Environment (CRANE) Version 0.5 is a prototype created to prove the newest concept of using a distributed environment to schedule Deep Space Network (DSN) antenna times in a collaborative fashion. This program is for all space-flight and terrestrial science project users and DSN schedulers to perform scheduling activities and conflict resolution, both synchronously and asynchronously. Project schedulers can, for the first time, participate directly in scheduling their tracking times into the official DSN schedule, and negotiate directly with other projects in an integrated scheduling system. A master schedule covers long-range, mid-range, near-real-time, and real-time scheduling time frames all in one, rather than the current method of separate functions that are supported by different processes and tools. CRANE also provides private workspaces (both dynamic and static), data sharing, scenario management, user control, rapid messaging (based on Java Message Service), data/time synchronization, workflow management, notification (including emails), conflict checking, and a linkage to a schedule generation engine. The data structure with corresponding database design combines object trees with multiple associated mortal instances and a relational database to provide unprecedented traceability and simplify the existing DSN XML schedule representation. These technologies are used to provide traceability, schedule negotiation, conflict resolution, and load forecasting from real-time operations to long-range loading analysis up to 20 years in the future. CRANE includes a database, a stored procedure layer, an agent-based middle tier, a Web service wrapper, a Windows Integrated Analysis Environment (IAE), a Java application, and a Web page interface.

  2. 'Isotopo' a database application for facile analysis and management of mass isotopomer data.

    PubMed

    Ahmed, Zeeshan; Zeeshan, Saman; Huber, Claudia; Hensel, Michael; Schomburg, Dietmar; Münch, Richard; Eylert, Eva; Eisenreich, Wolfgang; Dandekar, Thomas

    2014-01-01

    The composition of stable-isotope labelled isotopologues/isotopomers in metabolic products can be measured by mass spectrometry and supports the analysis of pathways and fluxes. As a prerequisite, the original mass spectra have to be processed, managed and stored to rapidly calculate, analyse and compare isotopomer enrichments to study, for instance, bacterial metabolism in infection. For such applications, we provide here the database application 'Isotopo'. This software package includes (i) a database to store and process isotopomer data, (ii) a parser to upload and translate different data formats for such data and (iii) an improved application to process and convert signal intensities from mass spectra of (13)C-labelled metabolites such as tert-butyldimethylsilyl derivatives of amino acids. Relative mass intensities and isotopomer distributions are calculated applying a partial least squares method with iterative refinement for high precision data. The data output includes formats such as graphs for overall enrichments in amino acids. The package is user-friendly for easy and robust data management of multiple experiments. The 'Isotopo' software is available at the following web link (section Download): http://spp1316.uni-wuerzburg.de/bioinformatics/isotopo/. The package contains three additional files: software executable setup (installer), one data set file (discussed in this article) and one Excel file (which can be used to convert data from Excel to '.iso' format). The 'Isotopo' software is compatible only with the Microsoft Windows operating system. http://spp1316.uni-wuerzburg.de/bioinformatics/isotopo/. © The Author(s) 2014. Published by Oxford University Press.

  3. A comparison of different database technologies for the CMS AsyncStageOut transfer database

    NASA Astrophysics Data System (ADS)

    Ciangottini, D.; Balcas, J.; Mascheroni, M.; Rupeika, E. A.; Vaandering, E.; Riahi, H.; Silva, J. M. D.; Hernandez, J. M.; Belforte, S.; Ivanov, T. T.

    2017-10-01

    AsyncStageOut (ASO) is the component of the CMS distributed data analysis system (CRAB) that manages users' transfers in a centrally controlled way using the File Transfer System (FTS3) at CERN. It addresses a major weakness of the previous, decentralized model, namely that the transfer of the user's output data to a single remote site was part of the job execution, resulting in inefficient use of job slots and an unacceptable failure rate. Currently ASO manages up to 600k files of various sizes per day from more than 500 users per month, spread over more than 100 sites. ASO uses a NoSQL database (CouchDB) as internal bookkeeping and as a way to communicate with other CRAB components. Since ASO/CRAB were put in production in 2014, the number of transfers constantly increased up to a point where the pressure on the central CouchDB instance became critical, creating new challenges for the system scalability, performance, and monitoring. This forced a re-engineering of the ASO application to increase its scalability and lower its operational effort. In this contribution we present a comparison of the performance of the current NoSQL implementation and a new SQL implementation, and how their different strengths and features influenced the design choices and operational experience. We also discuss other architectural changes introduced in the system to handle the increasing load and latency in delivering output to the user.

  4. dbSUPER: a database of super-enhancers in mouse and human genome

    PubMed Central

    Khan, Aziz; Zhang, Xuegong

    2016-01-01

    Super-enhancers are clusters of transcriptional enhancers that drive cell-type-specific gene expression and are crucial to cell identity. Many disease-associated sequence variations are enriched in super-enhancer regions of disease-relevant cell types. Thus, super-enhancers can be used as potential biomarkers for disease diagnosis and therapeutics. Current studies have identified super-enhancers in more than 100 cell types and demonstrated their functional importance. However, a centralized resource to integrate all these findings is not currently available. We developed dbSUPER (http://bioinfo.au.tsinghua.edu.cn/dbsuper/), the first integrated and interactive database of super-enhancers, with the primary goal of providing a resource for assistance in further studies related to transcriptional control of cell identity and disease. dbSUPER provides a responsive and user-friendly web interface to facilitate efficient and comprehensive search and browsing. The data can be easily sent to Galaxy instances, GREAT and Cistrome web-servers for downstream analysis, and can also be visualized in the UCSC genome browser where custom tracks can be added automatically. The data can be downloaded and exported in a variety of formats. Furthermore, dbSUPER lists genes associated with super-enhancers and also links to external databases such as GeneCards, UniProt and Entrez. dbSUPER also provides an overlap analysis tool to annotate user-defined regions. We believe dbSUPER is a valuable resource for the biology and genetic research communities. PMID:26438538

  5. Biomine: predicting links between biological entities using network models of heterogeneous databases.

    PubMed

    Eronen, Lauri; Toivonen, Hannu

    2012-06-06

    Biological databases contain large amounts of data concerning the functions and associations of genes and proteins. Integration of data from several such databases into a single repository can aid the discovery of previously unknown connections spanning multiple types of relationships and databases. Biomine is a system that integrates cross-references from several biological databases into a graph model with multiple types of edges, such as protein interactions, gene-disease associations and gene ontology annotations. Edges are weighted based on their type, reliability, and informativeness. We present Biomine and evaluate its performance in link prediction, where the goal is to predict pairs of nodes that will be connected in the future, based on current data. In particular, we formulate protein interaction prediction and disease gene prioritization tasks as instances of link prediction. The predictions are based on a proximity measure computed on the integrated graph. We consider and experiment with several such measures, and perform a parameter optimization procedure where different edge types are weighted to optimize link prediction accuracy. We also propose a novel method for disease-gene prioritization, defined as finding a subset of candidate genes that cluster together in the graph. We experimentally evaluate Biomine by predicting future annotations in the source databases and prioritizing lists of putative disease genes. The experimental results show that Biomine has strong potential for predicting links when a set of selected candidate links is available. The predictions obtained using the entire Biomine dataset are shown to clearly outperform ones obtained using any single source of data alone, when different types of links are suitably weighted. In the gene prioritization task, an established reference set of disease-associated genes is useful, but the results show that under favorable conditions, Biomine can also perform well when no such information is available. The Biomine system is a proof of concept. Its current version contains 1.1 million entities and 8.1 million relations between them, with focus on human genetics. Some of its functionalities are available in a public query interface at http://biomine.cs.helsinki.fi, allowing searching for and visualizing connections between given biological entities.
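
    One common proximity measure of this kind can be sketched as the best-path probability over reliability-weighted edges (an illustration, not necessarily Biomine's exact scoring; edge weights and node names invented):

      import math, networkx as nx

      g = nx.Graph()
      g.add_edge("geneA", "proteinA", w=0.9)     # e.g. codes-for
      g.add_edge("proteinA", "proteinB", w=0.7)  # e.g. interacts-with
      g.add_edge("proteinB", "diseaseX", w=0.6)  # e.g. associated-with

      # Turn reliabilities in (0,1] into additive costs so that the cheapest
      # path is the most probable chain of evidence.
      for u, v, d in g.edges(data=True):
          d["cost"] = -math.log(d["w"])

      cost = nx.shortest_path_length(g, "geneA", "diseaseX", weight="cost")
      print(f"link score geneA-diseaseX: {math.exp(-cost):.3f}")  # 0.9*0.7*0.6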

  6. Wireless access to a pharmaceutical database: a demonstrator for data driven Wireless Application Protocol (WAP) applications in medical information processing.

    PubMed

    Schacht Hansen, M; Dørup, J

    2001-01-01

    The Wireless Application Protocol technology implemented in newer mobile phones has built-in facilities for handling much of the information processing needed in clinical work. To test a practical approach we ported a relational database of the Danish pharmaceutical catalogue to Wireless Application Protocol using open source freeware at all steps. We used Apache 1.3 web software on a Linux server. Data containing the Danish pharmaceutical catalogue were imported from an ASCII file into a MySQL 3.22.32 database using a Practical Extraction and Report Language script for easy update of the database. Data were distributed in 35 interrelated tables. Each pharmaceutical brand name was given its own card with links to general information about the drug, active substances, contraindications etc. Access was available through 1) browsing therapeutic groups and 2) searching for a brand name. The database interface was programmed in the server-side scripting language PHP3. A free, open source Wireless Application Protocol gateway to a pharmaceutical catalogue was established to allow dial-in access independent of commercial Wireless Application Protocol service providers. The application was tested on the Nokia 7110 and Ericsson R320s cellular phones. We have demonstrated that Wireless Application Protocol-based access to a dynamic clinical database can be established using open source freeware. The project opens perspectives for a further integration of Wireless Application Protocol phone functions in clinical information processing: Global System for Mobile communication telephony for bilateral communication, asynchronous unilateral communication via e-mail and Short Message Service, built-in calculator, calendar, personal organizer, phone number catalogue and Dictaphone function via answering machine technology. An independent Wireless Application Protocol gateway may be placed within hospital firewalls, which may be an advantage with respect to security. However, if Wireless Application Protocol phones are to become effective tools for physicians, special attention must be paid to the limitations of the devices. Input tools of Wireless Application Protocol phones should be improved, for instance by increased use of speech control.

  7. Wireless access to a pharmaceutical database: A demonstrator for data driven Wireless Application Protocol applications in medical information processing

    PubMed Central

    Hansen, Michael Schacht

    2001-01-01

    Background The Wireless Application Protocol technology implemented in newer mobile phones has built-in facilities for handling much of the information processing needed in clinical work. Objectives To test a practical approach we ported a relational database of the Danish pharmaceutical catalogue to Wireless Application Protocol using open source freeware at all steps. Methods We used Apache 1.3 web software on a Linux server. Data containing the Danish pharmaceutical catalogue were imported from an ASCII file into a MySQL 3.22.32 database using a Practical Extraction and Report Language script for easy update of the database. Data were distributed in 35 interrelated tables. Each pharmaceutical brand name was given its own card with links to general information about the drug, active substances, contraindications etc. Access was available through 1) browsing therapeutic groups and 2) searching for a brand name. The database interface was programmed in the server-side scripting language PHP3. Results A free, open source Wireless Application Protocol gateway to a pharmaceutical catalogue was established to allow dial-in access independent of commercial Wireless Application Protocol service providers. The application was tested on the Nokia 7110 and Ericsson R320s cellular phones. Conclusions We have demonstrated that Wireless Application Protocol-based access to a dynamic clinical database can be established using open source freeware. The project opens perspectives for a further integration of Wireless Application Protocol phone functions in clinical information processing: Global System for Mobile communication telephony for bilateral communication, asynchronous unilateral communication via e-mail and Short Message Service, built-in calculator, calendar, personal organizer, phone number catalogue and Dictaphone function via answering machine technology. An independent Wireless Application Protocol gateway may be placed within hospital firewalls, which may be an advantage with respect to security. However, if Wireless Application Protocol phones are to become effective tools for physicians, special attention must be paid to the limitations of the devices. Input tools of Wireless Application Protocol phones should be improved, for instance by increased use of speech control. PMID:11720946
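
    The core lookup behind such an interface can be sketched in Python, with the built-in sqlite3 module standing in for the MySQL/PHP3 stack described above (toy rows only; the real catalogue spans 35 interrelated tables):

      import sqlite3

      db = sqlite3.connect(":memory:")
      db.execute("CREATE TABLE brand (name TEXT, substance TEXT, grp TEXT)")
      db.executemany("INSERT INTO brand VALUES (?, ?, ?)",
                     [("Panodil", "paracetamol", "analgesics"),
                      ("Ibumetin", "ibuprofen", "analgesics")])

      # 1) browse a therapeutic group; 2) search for a brand-name prefix
      print(db.execute("SELECT name FROM brand WHERE grp = ?",
                       ("analgesics",)).fetchall())
      print(db.execute("SELECT name, substance FROM brand WHERE name LIKE ?",
                       ("Pano%",)).fetchall())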

  8. Statistical inference of static analysis rules

    NASA Technical Reports Server (NTRS)

    Engler, Dawson Richards (Inventor)

    2009-01-01

    Various apparatus and methods are disclosed for identifying errors in program code. Respective numbers of observances of at least one correctness rule by different code instances that relate to the at least one correctness rule are counted in the program code. Each code instance has an associated counted number of observances of the correctness rule by the code instance. Also counted are respective numbers of violations of the correctness rule by different code instances that relate to the correctness rule. Each code instance has an associated counted number of violations of the correctness rule by the code instance. A respective likelihood of validity is determined for each code instance as a function of the counted number of observances and the counted number of violations. The likelihood of validity indicates a relative likelihood that a related code instance is required to observe the correctness rule. The violations may be output in order of the likelihood of validity of a violated correctness rule.
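
    One plausible way to turn these counts into a ranking, consistent with the abstract but not necessarily the patented formula, is a z-like score over each instance's observance rate (counts invented for illustration):

      import math

      # (observances, violations) per rule-related code instance
      counts = {"lock_a": (100, 2), "lock_b": (3, 2), "lock_c": (40, 0)}

      def likelihood_of_validity(obs, viol):
          n = obs + viol
          p = obs / n                  # observed rate of rule observance
          # z-like score: confidence the rule truly applies to this instance
          return (p - 0.5) * math.sqrt(n)

      for name, (obs, viol) in sorted(counts.items(),
              key=lambda kv: -likelihood_of_validity(*kv[1])):
          print(name, round(likelihood_of_validity(obs, viol), 2))
      # high score with nonzero violations (lock_a) -> likely real errors;
      # low score (lock_b) -> the 'rule' may not apply there at all.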

  9. Surgical scheduling: a lean approach to process improvement.

    PubMed

    Simon, Ross William; Canacari, Elena G

    2014-01-01

    A large teaching hospital in the northeast United States had an inefficient, paper-based process for scheduling orthopedic surgery that caused delays and contributed to site/side discrepancies. The hospital's leaders formed a team with the goals of developing a safe, effective, patient-centered, timely, efficient, and accurate orthopedic scheduling process; smoothing the schedule so that block time was allocated more evenly; and ensuring correct site/side. Under the resulting process, real-time patient information is entered into a database during the patient's preoperative visit in the surgeon's office. The team found the new process reduced the occurrence of site/side discrepancies to zero, reduced instances of changing the sequence of orthopedic procedures by 70%, and increased patient satisfaction. Copyright © 2014 AORN, Inc. Published by Elsevier Inc. All rights reserved.

  10. Incomplete Data in Smart Grid: Treatment of Values in Electric Vehicle Charging Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Majipour, Mostafa; Chu, Peter; Gadh, Rajit

    2014-11-03

    In this paper, five imputation methods, namely Constant (zero), Mean, Median, Maximum Likelihood, and Multiple Imputation, have been applied to compensate for missing values in Electric Vehicle (EV) charging data. The outcome of each of these methods has been used as the input to a prediction algorithm to forecast the EV load in the next 24 hours at each individual outlet. The data are real-world data at the outlet level from the UCLA campus parking lots. Given the sparsity of the data, both Median and Constant (zero) imputations improved the prediction results. Since in most missing-value cases in our database all values of that instance are missing, the multivariate imputation methods did not improve the results significantly compared to univariate approaches.
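
    The univariate treatments, plus a model-based stand-in for the latter two, can be sketched with pandas and scikit-learn (toy series; IterativeImputer is used here as a proxy for maximum-likelihood/multiple imputation, and the covariate is invented):

      import numpy as np, pandas as pd
      from sklearn.experimental import enable_iterative_imputer  # noqa: F401
      from sklearn.impute import IterativeImputer

      load = pd.Series([0.0, 1.2, np.nan, 3.3, np.nan, 0.4])  # hourly kWh
      print(load.fillna(0.0))            # constant (zero)
      print(load.fillna(load.mean()))    # mean
      print(load.fillna(load.median()))  # median

      # Model-based imputation needs covariates; invent one feature here.
      X = pd.DataFrame({"hour": [0, 1, 2, 3, 4, 5], "load": load})
      print(IterativeImputer(random_state=0).fit_transform(X))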

  11. Design and implementation of a health data interoperability mediator.

    PubMed

    Kuo, Mu-Hsing; Kushniruk, Andre William; Borycki, Elizabeth Marie

    2010-01-01

    The objective of this study is to design and implement a common-gateway oriented mediator to solve the health data interoperability problems that exist among heterogeneous health information systems. The proposed mediator has three main components: (1) a Synonym Dictionary (SD) that stores a set of global metadata and terminologies to serve as the mapping intermediary, (2) a Semantic Mapping Engine (SME) that can be used to map metadata and instance semantics, and (3) a DB-to-XML module that translates source health data stored in a database into XML format and back. A routine admission notification data exchange scenario is used to test the efficiency and feasibility of the proposed mediator. The study results show that the proposed mediator can make health information exchange more efficient.
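
    A minimal sketch of the two mediator steps, semantic mapping through a synonym dictionary followed by DB-to-XML translation (field and element names invented here, not the study's actual metadata):

      import xml.etree.ElementTree as ET

      synonym_dictionary = {"pat_name": "PatientName", "adm_dt": "AdmissionDate"}
      source_row = {"pat_name": "Jane Doe", "adm_dt": "2010-01-05"}

      msg = ET.Element("AdmissionNotification")
      for local_field, value in source_row.items():
          global_field = synonym_dictionary[local_field]   # semantic mapping
          ET.SubElement(msg, global_field).text = value
      print(ET.tostring(msg, encoding="unicode"))
      # <AdmissionNotification><PatientName>Jane Doe</PatientName>...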

  12. New spectroscopy in the HITRAN2016 database and its impact on atmospheric retrievals

    NASA Astrophysics Data System (ADS)

    Gordon, I.; Rothman, L. S.; Kochanov, R. V.; Tan, Y.; Toon, G. C.

    2017-12-01

    The HITRAN spectroscopic database is a backbone of the interpretation of atmospheric spectral retrievals and is an important input to radiative transfer codes. The database has served the atmospheric community for nearly half a century, with a new edition released every four years. The most recent release of the database is HITRAN2016 [1]. It consists of line-by-line lists, experimental absorption cross-sections, collision-induced absorption data and aerosol indices of refraction. This presentation will stress the importance of using the most recent edition of the database in radiative transfer codes. The line-by-line lists for most of the HITRAN molecules were updated (and two new molecules added) in comparison with the previous compilation, HITRAN2012 [2], which had been in use, along with some intermediate updates, since 2012. The extent of the updates ranges from updating a few lines of certain molecules to complete replacements of the lists and the introduction of additional isotopologues. In addition, the number of molecules in the cross-section part of the database has increased dramatically from nearly 50 to over 300. The molecules covered by the HITRAN database are important in planetary remote sensing, environmental monitoring (in particular, biomass burning detection), climate applications, industrial pollution tracking, astrophysics, and more. Taking advantage of the new structure and interface available at www.hitran.org [3] and of the HITRAN Application Programming Interface [4], the number of parameters has also been significantly increased, now incorporating, for instance, non-Voigt line profiles [5]; broadening by gases other than air and "self" [6]; and other phenomena, including line mixing. This is a very important novelty that needs to be properly introduced into radiative transfer codes in order to advance accurate interpretation of remote sensing retrievals. This work is supported by the NASA PDART (NNX16AG51G) and AURA (NNX17AI78G) programs. References: [1] I.E. Gordon et al., JQSRT in press (2017), http://doi.org/10.1016/j.jqsrt.2017.06.038. [2] L.S. Rothman et al., JQSRT 130, 4 (2013). [3] C. Hill et al., JQSRT 177, 4 (2016). [4] R.V. Kochanov et al., JQSRT 177, 15 (2016). [5] P. Wcisło et al., JQSRT 177, 75 (2016). [6] J.S. Wilzewski et al., JQSRT 168, 193 (2016).
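
    Line data from the current edition can be pulled programmatically through HAPI, the interface cited as ref. [4]; the following short sketch uses HAPI's documented entry points under the assumption that network access is available (molecule and wavenumber range chosen arbitrarily):

      from hapi import db_begin, fetch, absorptionCoefficient_Lorentz

      db_begin("hitran_data")          # local cache directory for line lists
      fetch("CO2", 2, 1, 2200, 2400)   # molecule 2, isotopologue 1, cm-1 range
      nu, coef = absorptionCoefficient_Lorentz(SourceTables="CO2")
      print(len(nu), "spectral points computed from the downloaded lines")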

  13. Experimental Matching of Instances to Heuristics for Constraint Satisfaction Problems.

    PubMed

    Moreno-Scott, Jorge Humberto; Ortiz-Bayliss, José Carlos; Terashima-Marín, Hugo; Conant-Pablos, Santiago Enrique

    2016-01-01

    Constraint satisfaction problems are of special interest for the artificial intelligence and operations research community due to their many applications. Although heuristics involved in solving these problems have largely been studied in the past, little is known about the relation between instances and the respective performance of the heuristics used to solve them. This paper focuses on both the exploration of the instance space to identify relations between instances and good performing heuristics and how to use such relations to improve the search. Firstly, the document describes a methodology to explore the instance space of constraint satisfaction problems and evaluate the corresponding performance of six variable ordering heuristics for such instances in order to find regions on the instance space where some heuristics outperform the others. Analyzing such regions favors the understanding of how these heuristics work and contribute to their improvement. Secondly, we use the information gathered from the first stage to predict the most suitable heuristic to use according to the features of the instance currently being solved. This approach proved to be competitive when compared against the heuristics applied in isolation on both randomly generated and structured instances of constraint satisfaction problems.
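
    A hypothetical sketch of the second stage described above: learning to predict the best of several variable-ordering heuristics from instance features. The feature names and training data are invented for illustration.

    ```python
    # Predict the most suitable heuristic for an unseen CSP instance using a
    # nearest-neighbour rule over instance features.
    from sklearn.neighbors import KNeighborsClassifier

    # Each row: instance features (n_variables, constraint density, tightness);
    # label: index of the heuristic that performed best on that instance.
    X_train = [[50, 0.2, 0.3], [50, 0.8, 0.6], [200, 0.1, 0.5], [200, 0.7, 0.4]]
    y_train = [0, 2, 1, 3]   # best-observed heuristic per training instance

    selector = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)

    new_instance = [[120, 0.5, 0.45]]
    print('use heuristic', selector.predict(new_instance)[0])
    ```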

  14. Experimental Matching of Instances to Heuristics for Constraint Satisfaction Problems

    PubMed Central

    Moreno-Scott, Jorge Humberto; Ortiz-Bayliss, José Carlos; Terashima-Marín, Hugo; Conant-Pablos, Santiago Enrique

    2016-01-01

    Constraint satisfaction problems are of special interest for the artificial intelligence and operations research community due to their many applications. Although heuristics involved in solving these problems have largely been studied in the past, little is known about the relation between instances and the respective performance of the heuristics used to solve them. This paper focuses on both the exploration of the instance space to identify relations between instances and good performing heuristics and how to use such relations to improve the search. Firstly, the document describes a methodology to explore the instance space of constraint satisfaction problems and evaluate the corresponding performance of six variable ordering heuristics for such instances in order to find regions on the instance space where some heuristics outperform the others. Analyzing such regions favors the understanding of how these heuristics work and contribute to their improvement. Secondly, we use the information gathered from the first stage to predict the most suitable heuristic to use according to the features of the instance currently being solved. This approach proved to be competitive when compared against the heuristics applied in isolation on both randomly generated and structured instances of constraint satisfaction problems. PMID:26949383

  15. Wnt pathway curation using automated natural language processing: combining statistical methods with partial and full parse for knowledge extraction.

    PubMed

    Santos, Carlos; Eggle, Daniela; States, David J

    2005-04-15

    Wnt signaling is a very active area of research with highly relevant publications appearing at a rate of more than one per day. Building and maintaining databases describing signal transduction networks is a time-consuming and demanding task that requires careful literature analysis and extensive domain-specific knowledge. For instance, more than 50 factors involved in Wnt signal transduction have been identified as of late 2003. In this work we describe a natural language processing (NLP) system that is able to identify references to biological interaction networks in free text and automatically assemble a protein association and interaction map. A 'gold standard' set of names and assertions was derived by manual scanning of the Wnt genes website (http://www.stanford.edu/~rnusse/wntwindow.html), including 53 interactions involved in Wnt signaling. This system was used to analyze a corpus of peer-reviewed articles related to Wnt signaling, including 3369 PubMed records and 1230 full-text papers. Names for key Wnt-pathway associated proteins and biological entities are identified using a chi-squared analysis of noun phrases over-represented in the Wnt literature as compared to the general signal transduction literature. Interestingly, we identified several instances where generic terms were used on the website when more specific terms occur in the literature, and one typographic error on the Wnt canonical pathway. Using the named entity list and performing an exhaustive assertion extraction of the corpus, 34 of the 53 interactions in the 'gold standard' Wnt signaling set were successfully identified (64% recall). In addition, the automated extraction found several interactions involving key Wnt-related molecules which were missing or different from those in the canonical diagram, and these were confirmed by manual review of the text. These results suggest that a combination of NLP techniques for information extraction can form a useful first-pass tool for assisting human annotation and maintenance of signal pathway databases. The pipeline software components are freely available on request to the authors. dstates@umich.edu http://stateslab.bioinformatics.med.umich.edu/software.html.
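
    A sketch of the named-entity discovery step: scoring a noun phrase by how over-represented it is in the Wnt corpus versus a background signal-transduction corpus, via a 2x2 chi-squared test. The counts below are invented.

    ```python
    # Chi-squared over-representation test for a candidate term.
    from scipy.stats import chi2_contingency

    def overrepresentation(term_wnt, docs_wnt, term_bg, docs_bg):
        """Chi-squared statistic for a term's document frequency in the Wnt
        corpus relative to the background literature."""
        table = [[term_wnt, docs_wnt - term_wnt],
                 [term_bg,  docs_bg - term_bg]]
        chi2, p, _, _ = chi2_contingency(table)
        return chi2, p

    # e.g. a term in 900 of 3369 Wnt papers but only 400 of 30000 background papers
    chi2, p = overrepresentation(900, 3369, 400, 30000)
    print(f'chi2={chi2:.1f}, p={p:.2e}')
    ```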

  16. Toward An Unstructured Mesh Database

    NASA Astrophysics Data System (ADS)

    Rezaei Mahdiraji, Alireza; Baumann, Peter

    2014-05-01

    Unstructured meshes are used in several application domains such as earth sciences (e.g., seismology), medicine, oceanography, climate modeling, and GIS as approximate representations of physical objects. Meshes subdivide a domain into smaller geometric elements (called cells) which are glued together by incidence relationships. The subdivision of a domain allows computational manipulation of complicated physical structures. For instance, seismologists model earthquakes using elastic wave propagation solvers on hexahedral meshes. Such hexahedral meshes contain several hundred million grid points and millions of hexahedral cells, and each vertex node stores a multitude of data fields. To run simulations on such meshes, one needs to iterate over all the cells, iterate over the cells incident to a given cell, retrieve coordinates of cells, assign data values to cells, etc. Although meshes are used in many application domains, to the best of our knowledge there is no database vendor that supports unstructured mesh features. Currently, the main tools for querying and manipulating unstructured meshes are mesh libraries, e.g., CGAL and GRAL. Mesh libraries are dedicated libraries which include mesh algorithms and can be run on mesh representations. These libraries do not scale with dataset size, do not have a declarative query language, and need deep C++ knowledge for query implementations. Furthermore, due to high coupling between the implementations and the input file structure, the implementations are less reusable and costly to maintain. A dedicated mesh database offers the following advantages: 1) declarative querying, 2) ease of maintenance, 3) hiding the mesh storage structure from applications, and 4) transparent query optimization. To design a mesh database, the first challenge is to define a suitable generic data model for unstructured meshes. We proposed the ImG-Complexes data model as a generic topological mesh data model which extends the incidence graph model to multi-incidence relationships. We instrument the ImG model with sets of optional and application-specific constraints which can be used to check the validity of meshes for a specific class of objects such as manifold, pseudo-manifold, and simplicial manifold. We conducted experiments to measure the performance of a graph database solution in processing mesh queries and compared it with the GrAL mesh library and the PostgreSQL database on synthetic and real mesh datasets. The experiments show that each system performs well on specific types of mesh queries, e.g., graph databases perform well on global path-intensive queries. In the future, we will investigate database operations for the ImG model and design a mesh query language.
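
    A toy illustration of the incidence-graph idea underlying the ImG model: cells linked by incidence relationships, with a traversal query over them. The classes and the query are ours, not the ImG implementation.

    ```python
    # Minimal incidence-graph mesh: cells point to incident lower-dim cells.
    from collections import defaultdict

    class Mesh:
        def __init__(self):
            self.incidence = defaultdict(set)   # cell -> incident lower-dim cells

        def add_incidence(self, higher, lower):
            self.incidence[higher].add(lower)

        def vertices_of(self, cell):
            """Iterate the vertices (cells with no lower-dim cells) under a cell."""
            stack, seen = [cell], set()
            while stack:
                c = stack.pop()
                if c in seen:
                    continue
                seen.add(c)
                kids = self.incidence[c]
                if not kids:        # no lower-dimensional cells: treat as vertex
                    yield c
                stack.extend(kids)

    mesh = Mesh()
    mesh.add_incidence('hex0', 'face0')
    mesh.add_incidence('face0', 'v0')
    mesh.add_incidence('face0', 'v1')
    print(sorted(mesh.vertices_of('hex0')))   # -> ['v0', 'v1']
    ```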

  17. Maximum margin multiple instance clustering with applications to image and text clustering.

    PubMed

    Zhang, Dan; Wang, Fei; Si, Luo; Li, Tao

    2011-05-01

    In multiple instance learning problems, patterns are given as bags, and each bag consists of some instances. Most existing research in the area focuses on multiple instance classification and multiple instance regression, while very limited work has been conducted on multiple instance clustering (MIC). This paper formulates a novel framework, maximum margin multiple instance clustering (M(3)IC), for MIC. However, it is impractical to directly solve the optimization problem of M(3)IC. Therefore, M(3)IC is relaxed in this paper to enable an efficient optimization solution based on a combination of the constrained concave-convex procedure and the cutting plane method. Furthermore, this paper presents some important properties of the proposed method and discusses its relationship with other related methods. An extensive set of empirical results is presented to demonstrate the advantages of the proposed method over existing research in both effectiveness and efficiency.
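
    The sketch below is not the M(3)IC optimization itself; it merely illustrates the bag-of-instances setting with a naive baseline, clustering bags by the mean of their instances. Data are invented.

    ```python
    # Toy bag clustering: embed each bag as its mean instance, then k-means.
    import numpy as np
    from sklearn.cluster import KMeans

    bags = [np.array([[0.1, 0.2], [0.0, 0.3]]),    # bag 1: two instances
            np.array([[0.9, 1.1]]),                # bag 2: one instance
            np.array([[1.0, 0.9], [1.2, 1.0]])]    # bag 3: two instances

    embeddings = np.vstack([b.mean(axis=0) for b in bags])
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(embeddings)
    print(labels)   # e.g. [0 1 1]: bags 2 and 3 grouped together
    ```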

  18. Toward a Cognitive Task Analysis for Biomedical Query Mediation

    PubMed Central

    Hruby, Gregory W.; Cimino, James J.; Patel, Vimla; Weng, Chunhua

    2014-01-01

    In many institutions, data analysts use a Biomedical Query Mediation (BQM) process to facilitate data access for medical researchers. However, understanding of the BQM process is limited in the literature. To bridge this gap, we performed the initial steps of a cognitive task analysis using 31 BQM instances conducted between one analyst and 22 researchers in one academic department. We identified five top-level tasks, i.e., clarify research statement, explain clinical process, identify related data elements, locate EHR data element, and end BQM with either a database query or unmet, infeasible information needs, and 10 sub-tasks. We evaluated the BQM task model with seven data analysts from different clinical research institutions. Evaluators found all the tasks completely or semi-valid. This study contributes initial knowledge towards the development of a generalizable cognitive task representation for BQM. PMID:25954589

  19. Toward a cognitive task analysis for biomedical query mediation.

    PubMed

    Hruby, Gregory W; Cimino, James J; Patel, Vimla; Weng, Chunhua

    2014-01-01

    In many institutions, data analysts use a Biomedical Query Mediation (BQM) process to facilitate data access for medical researchers. However, understanding of the BQM process is limited in the literature. To bridge this gap, we performed the initial steps of a cognitive task analysis using 31 BQM instances conducted between one analyst and 22 researchers in one academic department. We identified five top-level tasks, i.e., clarify research statement, explain clinical process, identify related data elements, locate EHR data element, and end BQM with either a database query or unmet, infeasible information needs, and 10 sub-tasks. We evaluated the BQM task model with seven data analysts from different clinical research institutions. Evaluators found all the tasks completely or semi-valid. This study contributes initial knowledge towards the development of a generalizable cognitive task representation for BQM.

  20. Toward Computational Cumulative Biology by Combining Models of Biological Datasets

    PubMed Central

    Faisal, Ali; Peltonen, Jaakko; Georgii, Elisabeth; Rung, Johan; Kaski, Samuel

    2014-01-01

    A main challenge of data-driven sciences is how to make maximal use of the progressively expanding databases of experimental datasets in order to keep research cumulative. We introduce the idea of a modeling-based dataset retrieval engine designed for relating a researcher's experimental dataset to earlier work in the field. The search is (i) data-driven to enable new findings, going beyond the state of the art of keyword searches in annotations, (ii) modeling-driven, to include both biological knowledge and insights learned from data, and (iii) scalable, as it is accomplished without building one unified grand model of all data. Assuming each dataset has been modeled beforehand, by the researchers or automatically by database managers, we apply a rapidly computable and optimizable combination model to decompose a new dataset into contributions from earlier relevant models. By using the data-driven decomposition, we identify a network of interrelated datasets from a large annotated human gene expression atlas. While tissue type and disease were major driving forces for determining relevant datasets, the found relationships were richer, and the model-based search was more accurate than the keyword search; moreover, it recovered biologically meaningful relationships that are not straightforwardly visible from annotations—for instance, between cells in different developmental stages such as thymocytes and T-cells. Data-driven links and citations matched to a large extent; the data-driven links even uncovered corrections to the publication data, as two of the most linked datasets were not highly cited and turned out to have wrong publication entries in the database. PMID:25427176
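
    A sketch of the decomposition idea, with scipy's non-negative least squares standing in for the paper's probabilistic combination model: a new dataset is expressed as a nonnegative mix of signatures from earlier, archived models. All data here are synthetic.

    ```python
    # Decompose a new dataset into contributions from earlier dataset models.
    import numpy as np
    from scipy.optimize import nnls

    rng = np.random.default_rng(0)
    earlier_models = rng.random((20, 5))        # 5 archived signatures, 20 features each
    new_dataset = 0.7 * earlier_models[:, 1] + 0.3 * earlier_models[:, 4]

    weights, residual = nnls(earlier_models, new_dataset)
    print(np.round(weights, 2))   # large weights flag the most relevant earlier datasets
    ```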

  1. Toward computational cumulative biology by combining models of biological datasets.

    PubMed

    Faisal, Ali; Peltonen, Jaakko; Georgii, Elisabeth; Rung, Johan; Kaski, Samuel

    2014-01-01

    A main challenge of data-driven sciences is how to make maximal use of the progressively expanding databases of experimental datasets in order to keep research cumulative. We introduce the idea of a modeling-based dataset retrieval engine designed for relating a researcher's experimental dataset to earlier work in the field. The search is (i) data-driven to enable new findings, going beyond the state of the art of keyword searches in annotations, (ii) modeling-driven, to include both biological knowledge and insights learned from data, and (iii) scalable, as it is accomplished without building one unified grand model of all data. Assuming each dataset has been modeled beforehand, by the researchers or automatically by database managers, we apply a rapidly computable and optimizable combination model to decompose a new dataset into contributions from earlier relevant models. By using the data-driven decomposition, we identify a network of interrelated datasets from a large annotated human gene expression atlas. While tissue type and disease were major driving forces for determining relevant datasets, the found relationships were richer, and the model-based search was more accurate than the keyword search; moreover, it recovered biologically meaningful relationships that are not straightforwardly visible from annotations-for instance, between cells in different developmental stages such as thymocytes and T-cells. Data-driven links and citations matched to a large extent; the data-driven links even uncovered corrections to the publication data, as two of the most linked datasets were not highly cited and turned out to have wrong publication entries in the database.

  2. PEP725 Pan European Phenological Database

    NASA Astrophysics Data System (ADS)

    Koch, E.; Lipa, W.; Ungersböck, M.; Zach-Hermann, S.

    2012-04-01

    PEP725 is a five-year project whose main objective is to promote and facilitate phenological research by delivering a pan-European phenological database with open, unrestricted data access for science, research and education. PEP725 is funded by EUMETNET (the network of European meteorological services), ZAMG and the Austrian ministry for science & research bm:w_f. So far 16 European national meteorological services and 7 partners from different national phenological network operators have joined PEP725. Data access is straightforward via the homepage www.pep725.eu. Having accepted the PEP725 data policy and registered, users can download data by various criteria, for instance all records for a specific plant or all data from one country. At present more than 300 000 new records are available in the PEP725 database, coming from 31 European countries and 8150 stations. For a further 154 stations, metadata (location and data holder) are provided. Links to the network operators and data owners are also given on the webpage for more sophisticated questions about the data. Another objective of PEP725 is to bring together network operators and scientists by organizing workshops. In April 2012 the second of these workshops will take place on the premises of ZAMG, where invited speakers will give presentations spanning the whole study area of phenology, from observations to modelling. Quality checking is also a major issue; at the moment we are reviewing the literature to find appropriate methods.

  3. Using remote sensing to predict earthquake impacts

    NASA Astrophysics Data System (ADS)

    Fylaktos, Asimakis; Yfantidou, Anastasia

    2017-09-01

    Natural hazards like earthquakes can result in enormous property damage and human casualties in mountainous areas. Italy has always been exposed to numerous earthquakes, mostly concentrated in its central and southern regions. In 2016, two seismic events occurred near Norcia (central Italy), leading to substantial loss of life and extensive damage to property, infrastructure and cultural heritage. This research utilizes remote sensing products and GIS software to provide a database of information. We used both SAR images from Sentinel-1A and optical imagery from Landsat 8 to examine topographic differences with the aid of the multi-temporal monitoring technique, which is suited to observing any surface deformation. The database clusters information on the consequences of the earthquakes into groups such as property and infrastructure damage, regional rifts, cultivation loss, landslides and surface deformations, all mapped in GIS software. Relevant organizations can use these data to calculate the financial impact of such earthquakes. In the future, we can enrich this database with more regions and broaden its applications. For instance, we could predict the future impacts of any type of earthquake in several areas and design a preliminary emergency model for immediate evacuation and quick recovery response. It is important to know how the surface moves, particularly in geographical regions like Italy, Cyprus and Greece, where earthquakes are so frequent. We are not able to predict earthquakes, but using data from this research we may assess the damage that could be caused in the future.

  4. A real-time ECG data compression and transmission algorithm for an e-health device.

    PubMed

    Lee, SangJoon; Kim, Jungkuk; Lee, Myoungho

    2011-09-01

    This paper introduces a real-time data compression and transmission algorithm between e-health terminals for a periodic ECG signal. The proposed algorithm consists of five compression procedures and four reconstruction procedures. In order to evaluate its performance, the algorithm was applied to all 48 recordings of the MIT-BIH arrhythmia database, and the compression ratio (CR), percent root mean square difference (PRD), normalized percent root mean square difference (PRDN), RMS, SNR, and quality score (QS) values were obtained. The results showed that the CR was 27.9:1 and the PRD was 2.93 on average for all 48 data instances with a 15% window size. In addition, the performance of the algorithm was compared to those of similar algorithms introduced recently by others. The proposed algorithm showed clearly superior performance on all 48 data instances at compression ratios lower than 15:1, whereas it showed similar or slightly inferior PRD performance at compression ratios higher than 20:1. In light of the fact that similarity with the original data becomes meaningless when the PRD is higher than 2, the proposed algorithm shows significantly better performance than the other algorithms. Moreover, because the algorithm can compress and transmit data in real time, it can serve as an optimal biosignal data transmission method for limited-bandwidth communication between e-health devices.
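
    A short sketch of the two headline metrics, using their common definitions (CR as original bits over compressed bits; PRD in percent; PRDN would additionally subtract the signal mean in the denominator). The signal and bit counts below are invented.

    ```python
    # Compression ratio and percent root-mean-square difference (PRD).
    import numpy as np

    def compression_ratio(original_bits: int, compressed_bits: int) -> float:
        return original_bits / compressed_bits

    def prd(original: np.ndarray, reconstructed: np.ndarray) -> float:
        """Percent RMS difference between original and reconstruction."""
        return 100.0 * np.sqrt(np.sum((original - reconstructed) ** 2)
                               / np.sum(original ** 2))

    x = np.sin(np.linspace(0, 2 * np.pi, 360))
    x_hat = x + np.random.default_rng(0).normal(0, 0.01, x.size)
    print(f'CR={compression_ratio(12 * 360, 155):.1f}:1, PRD={prd(x, x_hat):.2f}')
    ```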

  5. An adaptable architecture for patient cohort identification from diverse data sources.

    PubMed

    Bache, Richard; Miles, Simon; Taweel, Adel

    2013-12-01

    We define and validate an architecture for systems that identify patient cohorts for clinical trials from multiple heterogeneous data sources. This architecture has an explicit query model capable of supporting temporal reasoning and expressing eligibility criteria independently of the representation of the data used to evaluate them. A key feature of the architecture is that queries defined according to the query model are both pre- and post-processed, which is used to address both structural and semantic heterogeneity. The process of extracting the relevant clinical facts is separated from the process of reasoning about them. A specific instance of the query model is then defined and implemented, and we show that it has wide applicability. We then describe how it is used to access three diverse data warehouses to determine patient counts. Although the proposed architecture requires greater effort to implement the query model than would be the case for using just SQL and accessing a database management system directly, this effort is justified because it supports both temporal reasoning and heterogeneous data sources. The query model only needs to be implemented once, no matter how many data sources are accessed; each additional source requires only the implementation of a lightweight adaptor. The architecture has been used to implement a specific query model that can express complex eligibility criteria and access three diverse data warehouses, thus demonstrating the feasibility of this approach in dealing with temporal reasoning and data heterogeneity.
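
    An architectural sketch of the pattern described above: one source-independent query model, one lightweight adaptor per warehouse. The class and method names are ours, and the criteria dictionary is a placeholder for the paper's richer query model.

    ```python
    # One query model, many lightweight per-source adaptors.
    from abc import ABC, abstractmethod

    class SourceAdaptor(ABC):
        """Translates a source-independent eligibility query for one warehouse."""
        @abstractmethod
        def count_patients(self, criteria: dict) -> int: ...

    class WarehouseA(SourceAdaptor):
        def count_patients(self, criteria: dict) -> int:
            # Pre-process: map generic criteria onto this source's schema,
            # run the native query, post-process the result into a count.
            return 42   # placeholder for a real SQL round trip

    def cohort_counts(adaptors, criteria):
        return {type(a).__name__: a.count_patients(criteria) for a in adaptors}

    print(cohort_counts([WarehouseA()], {"diagnosis": "E11", "age_min": 40}))
    ```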

  6. The Function Biomedical Informatics Research Network Data Repository.

    PubMed

    Keator, David B; van Erp, Theo G M; Turner, Jessica A; Glover, Gary H; Mueller, Bryon A; Liu, Thomas T; Voyvodic, James T; Rasmussen, Jerod; Calhoun, Vince D; Lee, Hyo Jong; Toga, Arthur W; McEwen, Sarah; Ford, Judith M; Mathalon, Daniel H; Diaz, Michele; O'Leary, Daniel S; Jeremy Bockholt, H; Gadde, Syam; Preda, Adrian; Wible, Cynthia G; Stern, Hal S; Belger, Aysenil; McCarthy, Gregory; Ozyurt, Burak; Potkin, Steven G

    2016-01-01

    The Function Biomedical Informatics Research Network (FBIRN) developed methods and tools for conducting multi-scanner functional magnetic resonance imaging (fMRI) studies. Method and tool development were based on two major goals: 1) to assess the major sources of variation in fMRI studies conducted across scanners, including instrumentation, acquisition protocols, challenge tasks, and analysis methods, and 2) to provide a distributed network infrastructure and an associated federated database to host and query large, multi-site, fMRI and clinical data sets. In the process of achieving these goals the FBIRN test bed generated several multi-scanner brain imaging data sets to be shared with the wider scientific community via the BIRN Data Repository (BDR). The FBIRN Phase 1 data set consists of a traveling subject study of 5 healthy subjects, each scanned on 10 different 1.5 to 4 T scanners. The FBIRN Phase 2 and Phase 3 data sets consist of subjects with schizophrenia or schizoaffective disorder along with healthy comparison subjects scanned at multiple sites. In this paper, we provide concise descriptions of FBIRN's multi-scanner brain imaging data sets and details about the BIRN Data Repository instance of the Human Imaging Database (HID) used to publicly share the data. Copyright © 2015 Elsevier Inc. All rights reserved.

  7. An infrastructure to mine molecular descriptors for ligand selection on virtual screening.

    PubMed

    Seus, Vinicius Rosa; Perazzo, Giovanni Xavier; Winck, Ana T; Werhli, Adriano V; Machado, Karina S

    2014-01-01

    The evaluation of receptor-ligand interactions is an important step in rational drug design. The databases that provide the structures of ligands are growing on a daily basis, which makes it impossible to test all the ligands against a target receptor. Hence, ligand selection prior to testing is needed. One possible approach is to evaluate a set of molecular descriptors. With the aim of describing the characteristics of promising compounds for a specific receptor, we introduce a data-warehouse-based infrastructure to mine molecular descriptors for virtual screening (VS). We performed experiments that consider the receptor HIV-1 protease as target, together with different compounds for this protein. A set of 9 molecular descriptors is taken as the predictive attributes and the free energy of binding (FEB) is taken as the target attribute. By applying the J48 algorithm over the data we obtained decision tree models that achieved up to 84% accuracy. The models indicate which molecular descriptors, and which of their value ranges, are relevant to good FEB results. Using their rules we performed ligand selection on the ZINC database. Our results show an important reduction in the number of ligands to be used in VS experiments; for instance, the best selection model picked only 0.21% of the total number of drug-like ligands.
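
    An illustrative stand-in for the J48 workflow, using scikit-learn's CART decision tree (J48 is Weka's C4.5 implementation, so this is an analogue, not the authors' setup); descriptor values and labels are invented.

    ```python
    # Train a small decision tree on molecular descriptors and print its rules.
    from sklearn.tree import DecisionTreeClassifier, export_text

    # rows: [molecular_weight, logP, h_bond_donors]; labels: good/poor FEB
    X = [[300, 2.1, 1], [450, 4.0, 3], [320, 1.8, 2], [500, 5.2, 4]]
    y = ['good', 'poor', 'good', 'poor']

    tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
    print(export_text(tree, feature_names=['MW', 'logP', 'HBD']))
    ```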

  8. In silico search for functionally similar proteins involved in meiosis and recombination in evolutionarily distant organisms.

    PubMed

    Bogdanov, Yuri F; Dadashev, Sergei Y; Grishaeva, Tatiana M

    2003-01-01

    Evolutionarily distant organisms have not only orthologs, but also nonhomologous proteins that build functionally similar subcellular structures. For instance, this is true of the protein components of the synaptonemal complex (SC), a universal ultrastructure that ensures the successful pairing and recombination of homologous chromosomes during meiosis. We aimed to develop a method to search databases for genes that code for such nonhomologous but functionally analogous proteins. Advantage was taken of the ultrastructural parameters of the SC and the conformation of the SC proteins responsible for them. Proteins of the SC central space are known to be similar in secondary structure. Using published data, we found a highly significant correlation between the width of the SC central space and the length of the rod-shaped central domain of the mammalian and yeast intermediate proteins forming transversal filaments in the SC central space. Based on this, we suggested a method for searching genome databases of distant organisms for genes whose virtual proteins meet the above correlation requirement. Our recent finding of the Drosophila melanogaster CG17604 gene, coding for a synaptonemal complex transversal filament protein, received experimental support from another lab. With the same strategy, we showed that the Arabidopsis thaliana and Caenorhabditis elegans genomes contain unique genes coding for such proteins.

  9. Quantitative Evaluation of Compliance with Recommendation for Sulfonylurea Dose Co-Administered with DPP-4 Inhibitors in Japan

    PubMed Central

    Kimura, Tomomi; Shiosakai, Kazuhito; Takeda, Yasuaki; Takahashi, Shinji; Kobayashi, Masahiko; Sakaguchi, Motonobu

    2012-01-01

    After the launch of dipeptidyl peptidase-4 (DPP-4) inhibitors, a new class of oral hypoglycemic drug (OHD), in December 2009, severe hypoglycemia cases were reported in Japan. Although the definite cause was unknown, co-administration with sulfonylureas (SU) was suspected as one of the potential risk factors. The Japan Association for Diabetes Education and Care (JADEC) released a recommendation in April 2010 to lower the dose of three major SUs (glimepiride, glibenclamide, and gliclazide) when adding a DPP-4 inhibitor. To evaluate the effectiveness of this risk minimization action along with labeling changes, dispensing records for 114,263 patients prescribed OHDs between December 2008 and December 2010 were identified in the Nihon-Chouzai pharmacy claims database. The adherence to the recommended dosing of SU co-prescribed with DPP-4 inhibitors increased from 46.3% before to 63.8% after the JADEC recommendation (p < 0.01 by time-series analysis), while no change was found in those for SU monotherapy and SU with other OHD co-prescriptions. The adherence was significantly worse for those receiving a glibenclamide prescription. The JADEC recommendation, along with labeling changes, appeared to have a favorable effect on the risk minimization action in Japan. In these instances, a pharmacy claims database can be a useful tool to evaluate risk minimization actions. PMID:24300302

  10. The role of sensory perception in the development and targeting of tobacco products.

    PubMed

    Carpenter, Carrie M; Wayne, Geoffrey Ferris; Connolly, Gregory N

    2007-01-01

    To examine tobacco industry research on smoking-related sensory effects, including differences in sensory perception across smoker groups, and to determine whether this research informed targeted product development and impacted the development of commercial tobacco products. We searched previously secret internal tobacco industry documents available online through document databases housed at Tobacco Documents Online, the British American Tobacco Document Archive and the Legacy Tobacco Documents Library. We identified relevant documents using a snowball sampling method to first search the databases using an initial set of key words and to then establish further search terms. Sensory research is a priority within the tobacco industry directly impacting commercial markets both in the United States and internationally. Sensory factors contribute to smoker satisfaction and product acceptance, and play an important role in controlling puffing behavior. Cigarette manufacturers have capitalized on distinct sensory preferences across gender, age and ethnic groups by tailoring products for specific populations. Regulation of tobacco products is needed to address product changes that are used to reinforce or contribute to tobacco dependence; for instance, the incorporation of additives that target attributes such as smoothness, harshness and aftertaste. Greater understanding of the role of sensory effects on smoking behavior may also help to inform the development of tobacco treatment options that support long-term tobacco abstinence.

  11. Instance annotation for multi-instance multi-label learning

    Treesearch

    F. Briggs; X.Z. Fern; R. Raich; Q. Lou

    2013-01-01

    Multi-instance multi-label learning (MIML) is a framework for supervised classification where the objects to be classified are bags of instances associated with multiple labels. For example, an image can be represented as a bag of segments and associated with a list of objects it contains. Prior work on MIML has focused on predicting label sets for previously unseen...
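
    A toy representation matching the MIML description above: each object is a bag of instance feature vectors plus a set of labels. The types are ours, for illustration only.

    ```python
    # MIML example: a bag of instances associated with multiple labels.
    from dataclasses import dataclass
    from typing import List, Set

    @dataclass
    class MIMLExample:
        bag: List[List[float]]   # instances, e.g. feature vectors of image segments
        labels: Set[str]         # the label set attached to the whole bag

    image = MIMLExample(
        bag=[[0.12, 0.80], [0.55, 0.30], [0.91, 0.05]],   # three segments
        labels={"tree", "sky", "dog"},
    )
    print(len(image.bag), "instances,", len(image.labels), "labels")
    ```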

  12. Multi-user investigation organizer

    NASA Technical Reports Server (NTRS)

    Panontin, Tina L. (Inventor); Williams, James F. (Inventor); Carvalho, Robert E. (Inventor); Sturken, Ian (Inventor); Wolfe, Shawn R. (Inventor); Gawdiak, Yuri O. (Inventor); Keller, Richard M. (Inventor)

    2009-01-01

    A system that allows a team of geographically dispersed users to collaboratively analyze a mishap event. The system includes a reconfigurable ontology, including instances that are related to and characterize the mishap, a semantic network that receives, indexes and stores, for retrieval, viewing and editing, the instances and links between the instances, a network browser interface for retrieving and viewing screens that present the instances and links to other instances and that allow editing thereof, and a rule-based inference engine, including a collection of rules associated with establishment of links between the instances. A possible conclusion arising from analysis of the mishap event may be characterized as one or more of: not a credible conclusion; an unlikely conclusion; a credible conclusion; conclusion needs analysis; conclusion needs supporting data; conclusion proposed to be closed; and an un-reviewed conclusion.
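
    A hypothetical sketch of one rule in the spirit of the rule-based inference engine described above; the statuses paraphrase the list in the abstract, while the rule logic and all names are invented.

    ```python
    # Toy rule: classify a conclusion from its evidence links.
    from enum import Enum

    class Status(Enum):
        NOT_CREDIBLE = 1
        UNLIKELY = 2
        CREDIBLE = 3
        NEEDS_ANALYSIS = 4
        NEEDS_DATA = 5
        PROPOSED_CLOSED = 6
        UNREVIEWED = 7

    def classify(conclusion, links):
        """Rule: a conclusion with no supporting-evidence links needs data;
        with links but no completed analysis, it needs analysis."""
        support = [l for l in links if l == ('supports', conclusion)]
        if not support:
            return Status.NEEDS_DATA
        return Status.NEEDS_ANALYSIS

    print(classify('hypothesis-1', [('supports', 'hypothesis-2')]))
    ```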

  13. An ethnobotanical survey of medicinal plants used in the East Sepik province of Papua New Guinea.

    PubMed

    Koch, Michael; Kehop, Dickson Andrew; Kinminja, Boniface; Sabak, Malcolm; Wavimbukie, Graham; Barrows, Katherine M; Matainaho, Teatulohi K; Barrows, Louis R; Rai, Prem P

    2015-11-14

    Rapid modernization in the East Sepik (ES) Province of Papua New Guinea (PNG) is resulting in a decrease in individuals knowledgeable in medicinal plant use. Here we report a synthesis and comparison of traditional medicinal plant use from four ethnically distinct locations in the ES Province, and furthermore compare them to two previous reports of traditional plant use from different provinces of PNG. This manuscript is based on an annotated combination of four Traditional Medicines (TM) survey reports generated by University of Papua New Guinea (UPNG) trainees. The surveys utilized a questionnaire titled "Information sheet on traditional herbal preparations and medicinal plants of PNG", administered in the context of the TM survey project, which is supported by WHO, US NIH and PNG governmental health care initiatives and funding. Regional and transregional comparison of medicinal plant utilization was facilitated by using existing plant databases, the UPNG TM Database and the PNG Plant Database (PNG Plants), with Bayesian statistical analysis. Comparison of medicinal plant use across the four distinct dialect study areas in the ES Province showed that only a small fraction of plants were used in more than one area, and even these typically involved different plant parts, different preparations, and different medical conditions. Several instances of previously unreported medicinal plants could be located. Medicinally under- and over-utilized plants were found both in the regional reports and in a transregional analysis, showing that utilization frequencies differ between provinces. Documentation of consistent plant use argues for efficacy and is particularly important since established and effective herbal medicinal interventions are sorely needed in the rural areas of PNG, where clinical validation is unfortunately often lacking. Despite the existence of a large corpus of medical annotation of plants for PNG, previously unknown medical uses of plants can be uncovered. Furthermore, comparisons of medicinal plant utilization are possible if databases are reformatted for consistency to allow comparisons. A concerted effort in building easily comparable databases could dramatically facilitate ethnopharmacological analysis of the existing plant diversity.

  14. What can we learn from national-scale geodata describing soil erosion?

    NASA Astrophysics Data System (ADS)

    Benaud, Pia; Anderson, Karen; Carvalho, Jason; Evans, Martin; Glendell, Miriam; James, Mike; Lark, Murray; Quine, Timothy; Quinton, John; Rawlins, Barry; Rickson, Jane; Truckell, Ian; Brazier, Richard

    2017-04-01

    The United Kingdom has a rich dataset of soil erosion observations, which have been collected using a wide range of methodologies across various spatial and temporal scales. Yet, while observations of soil erosion have been carried out alongside agricultural development and intensification, whether or not the UK has a soil erosion problem remains an open question. Furthermore, although good reviews of existing soil erosion rates exist, there is no single resource that brings all of this work together. Therefore, the primary aim of this research was to build a picture of why attempts to quantify erosion rates across the UK empirically have fallen short, through: (1) collating all available, UK-based and empirically derived soil erosion datasets into a spatially explicit and open-access database, (2) developing an understanding of observed magnitudes of erosion in the UK, (3) evaluating the impact of non-environmental controls on erosion observations, i.e. study methodologies, and (4) exploring trends between environmental controls and erosion rates. To date, the database holds over 1500 records, which include results from both experimental and natural conditions across arable, grassland and upland environments. Of the studies contained in the database, erosion was observed in ca. 40% of instances, ranging from <0.01 t.ha-1.yr-1 to 143 t.ha-1.yr-1. However, preliminary analysis has highlighted that over 90% of the studies included in the database only quantify soil loss via visible erosion features, such as rills or gullies, through volumetric assessments. Furthermore, there has been an inherent bias in the UK towards quantifying soil erosion in locations with either a known history or a high probability of erosion occurrence. As a consequence, we conclude that such databases may not be used to make a statistically unbiased assessment of national-scale erosion rates; however, they can highlight maximum likely rates under a wide range of soil, topography and land use conditions. Finally, this work suggests there is a strong argument for a replicable and statistically robust national soil erosion monitoring program to be carried out alongside the proposed sustainable intensification of agriculture.

  15. Analysis of national and regional landslide inventories in Europe

    NASA Astrophysics Data System (ADS)

    Hervás, J.; Van Den Eeckhaut, M.

    2012-04-01

    A landslide inventory can be defined as a detailed register of the distribution and characteristics of past landslides in an area. Today most landslide inventories take the form of digital databases including landslide distribution maps and associated alphanumeric information for each landslide. While landslide inventories are of the utmost importance for land use planning and risk management through the generation of landslide zonation (susceptibility, hazard and risk) maps, landslide databases are thought to differ greatly from one country to another and often also within the same country. This hampers the generation of comparable, harmonised landslide zonation maps at national and continental scales, which are needed for policy and decision making at EU level, as envisaged for instance in the INSPIRE Directive and the Thematic Strategy for Soil Protection. In order to gain a clear understanding of the landslide inventories available in Europe and their potential to produce landslide zonation maps, as well as to draw recommendations to improve harmonisation and interoperability between landslide databases, we have surveyed 37 countries. In total, information has been collected and analysed for 24 national databases in 22 countries (Albania, Andorra, Austria, Bosnia and Herzegovina, Bulgaria, Czech Republic, Former Yugoslav Republic of Macedonia, France, Greece, Hungary, Iceland, Ireland, Italy, Norway, Poland, Portugal, Slovakia, Slovenia, Spain, Sweden, Switzerland and UK) and 22 regional databases in 10 countries. At the moment, over 633,000 landslides are recorded in national databases, representing on average less than 50% of the landslides estimated to have occurred in these countries. The sample of regional databases included over 103,000 landslides, with an estimated completeness substantially higher than that of national databases, as more attention can be paid to data collection over smaller regions. Yet, for both national and regional coverage, the data collection methods only occasionally included advanced technologies such as remote sensing. With regard to the inventory maps of most databases, the analysis illustrates the high variability of scales (between 1:10 000 and 1:1 M for national inventories, and from 1:10 000 to 1:25 000 for regional inventories), landslide classification systems and representation symbology. It also shows the difficulty of precisely locating landslides referred to only in historical documents. In addition, information on landslide magnitude, geometrical characteristics and age reported in national and regional databases greatly differs, even within the same database, as it strongly depends on the objectives of the database, the data collection methods used, the resources employed and the remaining landslide expression. In particular, landslide initiation and/or reactivation dates are estimated in less than 25% of records, thus making hazard and hence risk assessment difficult. In most databases, scarce information on landslide impact (damage and casualties) further hinders risk assessment at regional and national scales. Estimated landslide activity, which is very relevant to early warning and emergency management, is only included in half of the national databases and restricted to part of the landslides registered. Moreover, the availability of this information is not substantially higher in regional databases than in national ones.
Most landslide databases further included information on geo-environmental characteristics at the landslide site, which is very important for modelling landslide zoning. Although a number of national and regional agencies provide free web-GIS visualisation services, the potential of existing landslide databases is often not fully exploited as, in many cases, access by the general public and external researchers is restricted. Additionally, the availability of information only in the national or local language is common to most national and regional databases, thus hampering consultation for most foreigners. Finally, some suggestions for a minimum set of attributes to be collected and made available by European countries for building up a continental landslide database in support of EU policies are presented. This study has been conducted in the framework of the EU-FP7 project SafeLand (Grant Agreement 22647).

  16. Compilation of Disruptions to Airports by Volcanic Activity (Version 1.0, 1944-2006)

    USGS Publications Warehouse

    Guffanti, Marianne; Mayberry, Gari C.; Casadevall, Thomas J.; Wunderman, Richard

    2008-01-01

    Volcanic activity has caused significant hazards to numerous airports worldwide, with local to far-ranging effects on travelers and commerce. To more fully characterize the nature and scope of volcanic hazards to airports, we collected data on incidents at airports throughout the world that have been affected by volcanic activity, beginning in 1944 with the first documented instance of damage to modern aircraft and facilities in Naples, Italy, and extending through 2006. Information was gleaned from various sources, including news outlets, volcanological reports (particularly the Smithsonian Institution's Bulletin of the Global Volcanism Network), and previous publications on the topic. This report presents the full compilation of the data collected. For each incident, information about the affected airport and the volcanic source has been compiled as a record in a Microsoft Access database. The database is incomplete insofar as incidents may not have been reported or documented, but it does present a good sample from diverse parts of the world. Not included are en-route diversions to avoid airborne ash clouds at cruise altitudes. The database has been converted to a Microsoft Excel spreadsheet. To make the PDF version of table 1 in this open-file report resemble the spreadsheet, order the PDF pages as 12, 17, 22; 13, 18, 23; 14, 19, 24; 15, 20, 25; and 16, 21, 26. Analysis of the database reveals that, at a minimum, 101 airports in 28 countries were impacted on 171 occasions from 1944 through 2006 by eruptions at 46 volcanoes. The number of affected airports (101) probably is better constrained than the number of incidents (171) because recurring disruptions at a given airport may have been lumped together or not reported by news agencies, whereas the initial disruption likely is noticed and reported and thus the airport correctly counted.

  17. [Development of an analyzing system for soil parameters based on NIR spectroscopy].

    PubMed

    Zheng, Li-Hua; Li, Min-Zan; Sun, Hong

    2009-10-01

    A rapid estimation system for soil parameters based on spectral analysis was developed using object-oriented (OO) technology. A SOIL class was designed; an instance of the SOIL class represents a soil sample of a particular type, with specific physical properties and spectral characteristics. By extracting the effective information from the spectral data of the modeling soil objects, a mapping model was established between soil parameters and spectral data, and the mapping model parameters could be saved in the model database. When forecasting any soil parameter, the prediction model matching the sample's soil type and similar physical properties can be selected; after the target soil sample object is passed into the prediction model and processed by the system, an accurate forecast of its parameter content is obtained. The system includes modules for file operations, spectra pretreatment, sample analysis, calibration and validation, and sample content forecasting. The system was designed to run independently of the measurement equipment. Parameter and spectral data files (*.xls) for known soil samples can be imported into the system; after selecting the data pretreatment appropriate to the concrete conditions, the predicted contents are displayed and the forecasting model can be stored in the model database. The system reads the prediction models and their parameters from the model database through the module interface, and the data of the tested samples are then passed into the selected model, so that the content of soil parameters can be predicted. The system was programmed with Visual C++ 6.0 and Matlab 7.0, and Access XP was used to create and manage the model database.
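
    A schematic of the OO design described above, transposed to Python for brevity; the class, attribute and function names are illustrative, not the authors' Visual C++/Matlab code, and the linear mapping model is a placeholder for their calibration.

    ```python
    # SOIL-class sketch: samples carry type, properties and a spectrum; a model
    # database keyed by (soil type, parameter) stores mapping-model parameters.
    from dataclasses import dataclass

    @dataclass
    class Soil:
        soil_type: str
        physical_properties: dict
        spectrum: list                     # reflectance values by wavelength

    MODEL_DB = {}                          # (soil_type, parameter) -> coefficients

    def calibrate(samples, parameter, coeffs):
        key = (samples[0].soil_type, parameter)
        MODEL_DB[key] = coeffs             # store mapping-model parameters

    def predict(sample, parameter):
        """Pick the stored model matching the sample's soil type, then apply it."""
        intercept, *weights = MODEL_DB[(sample.soil_type, parameter)]
        return intercept + sum(w * x for w, x in zip(weights, sample.spectrum))

    s = Soil('loam', {'moisture': 0.12}, [0.31, 0.28, 0.40])
    calibrate([s], 'organic_matter', [1.0, 0.5, -0.2, 0.1])
    print(predict(s, 'organic_matter'))
    ```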

  18. Pulsotype Diversity of Clostridium botulinum Strains Containing Serotypes A and/or B Genes

    PubMed Central

    Halpin, Jessica L.; Joseph, Lavin; Dykes, Janet K.; McCroskey, Loretta; Smith, Elise; Toney, Denise; Stroika, Steven; Hise, Kelley; Maslanka, Susan; Lúquez, Carolina

    2017-01-01

    Clostridium botulinum strains are prevalent in the environment and produce a potent neurotoxin that causes botulism, a rare but serious paralytic disease. In 2010, a national PulseNet database was established to curate C. botulinum pulsotypes and facilitate epidemiological investigations, particularly for serotype A and B strains frequently associated with botulism cases in the United States. Between 2010 and 2014 we performed pulsed-field gel electrophoresis (PFGE) using a PulseNet protocol, uploaded the resulting PFGE patterns into a national database, and analyzed data according to PulseNet criteria (UPGMA clustering, Dice coefficient, 1.5% position tolerance, and 1.5% optimization). A retrospective data analysis was undertaken on 349 entries comprising type A and B strains isolated from foodborne and infant cases to determine epidemiological relevance, the resolution of the method, and the diversity of the database. Most previous studies of C. botulinum pulsotype diversity have encompassed very small sets of isolates; this study, with over 300 isolates, is more comprehensive than any published to date. Epidemiologically linked isolates had indistinguishable patterns except in four instances, and there were no obvious geographic trends. Simpson's Index of Diversity (D) has historically been used to demonstrate species diversity and abundance within a group, and is considered a standard descriptor for PFGE databases. Simpson's Index was calculated for each restriction endonuclease (SmaI, XhoI), for the pattern combination SmaI-XhoI, and for each toxin serotype. The D values indicate that both enzymes provided better resolution for serotype B isolates than for serotype A. XhoI as the secondary enzyme provided little additional discrimination for C. botulinum. SmaI patterns can be used to exclude unrelated isolates during a foodborne outbreak, but pulsotypes should always be considered together with available epidemiological data. PMID:28692343
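
    For reference, Simpson's Index of Diversity in its standard form is D = 1 - sum(n_i * (n_i - 1)) / (N * (N - 1)), where n_i is the number of isolates with pulsotype i and N the total. A minimal sketch, with invented counts:

    ```python
    # Simpson's Index of Diversity for a set of pulsotype counts.
    def simpsons_d(counts):
        n_total = sum(counts)
        return 1.0 - sum(n * (n - 1) for n in counts) / (n_total * (n_total - 1))

    # e.g. 4 pulsotypes observed 10, 5, 3 and 2 times among 20 isolates
    print(round(simpsons_d([10, 5, 3, 2]), 3))   # -> 0.689
    ```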

  19. Progress connecting multi-disciplinary geoscience communities through the VIVO semantic web application

    NASA Astrophysics Data System (ADS)

    Gross, M. B.; Mayernik, M. S.; Rowan, L. R.; Khan, H.; Boler, F. M.; Maull, K. E.; Stott, D.; Williams, S.; Corson-Rikert, J.; Johns, E. M.; Daniels, M. D.; Krafft, D. B.

    2015-12-01

    UNAVCO, UCAR, and Cornell University are working together to leverage semantic web technologies to enable discovery of people, datasets, publications and other research products, as well as the connections between them. The EarthCollab project, an EarthCube Building Block, is enhancing an existing open-source semantic web application, VIVO, to address connectivity gaps across distributed networks of researchers and resources related to the following two geoscience-based communities: (1) the Bering Sea Project, an interdisciplinary field program whose data archive is hosted by NCAR's Earth Observing Laboratory (EOL), and (2) UNAVCO, a geodetic facility and consortium that supports diverse research projects informed by geodesy. People, publications, datasets and grant information have been mapped to an extended version of the VIVO-ISF ontology and ingested into VIVO's database. Data is ingested using a custom set of scripts that include the ability to perform basic automated and curated disambiguation. VIVO can display a page for every object ingested, including connections to other objects in the VIVO database. A dataset page, for example, includes the dataset type, time interval, DOI, related publications, and authors. The dataset type field provides a connection to all other datasets of the same type. The author's page will show, among other information, related datasets and co-authors. Information previously spread across several unconnected databases is now stored in a single location. In addition to VIVO's default display, the new database can also be queried using SPARQL, a query language for semantic data. EarthCollab will also extend the VIVO web application. One such extension is the ability to cross-link separate VIVO instances across institutions, allowing local display of externally curated information. For example, Cornell's VIVO faculty pages will display UNAVCO's dataset information and UNAVCO's VIVO will display Cornell faculty member contact and position information. Additional extensions, including enhanced geospatial capabilities, will be developed following task-centered usability testing.
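
    A sketch of querying a VIVO-style RDF store with SPARQL, assuming the rdflib package; the vivo: namespace and the triples below are simplified placeholders, not actual VIVO-ISF ontology terms.

    ```python
    # Load two toy triples and run a SPARQL query over them.
    from rdflib import Graph

    g = Graph()
    g.parse(data="""
    @prefix vivo: <http://example.org/vivo#> .
    <http://example.org/ds1> vivo:title "Bering Sea CTD casts" ;
                             vivo:hasAuthor <http://example.org/alice> .
    """, format="turtle")

    q = """
    PREFIX vivo: <http://example.org/vivo#>
    SELECT ?title ?author WHERE {
      ?ds vivo:title ?title ; vivo:hasAuthor ?author .
    }"""
    for title, author in g.query(q):
        print(title, author)
    ```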

  20. Deep Convolutional Neural Networks for breast cancer screening.

    PubMed

    Chougrad, Hiba; Zouaki, Hamid; Alheyane, Omar

    2018-04-01

    Radiologists often have a hard time classifying mammography mass lesions, which leads to unnecessary breast biopsies to rule out suspicions, adding exorbitant expenses to an already burdened patient and health-care system. In this paper we developed a Computer-aided Diagnosis (CAD) system based on deep Convolutional Neural Networks (CNN) that aims to help the radiologist classify mammography mass lesions. Deep learning usually requires large datasets to train networks of a certain depth from scratch. Transfer learning is an effective method to deal with relatively small datasets, as in the case of medical images, although it can be tricky as we can easily start overfitting. In this work, we explore the importance of transfer learning and we experimentally determine the best fine-tuning strategy to adopt when training a CNN model. We were able to successfully fine-tune some of the recent, most powerful CNNs and achieved better results compared to other state-of-the-art methods which classified the same public datasets. For instance, we achieved 97.35% accuracy and 0.98 AUC on the DDSM database, 95.50% accuracy and 0.97 AUC on the INbreast database and 96.67% accuracy and 0.96 AUC on the BCDR database. Furthermore, after pre-processing and normalizing all the extracted Regions of Interest (ROIs) from the full mammograms, we merged all the datasets to build one large set of images and used it to fine-tune our CNNs. The CNN model which achieved the best results, with 98.94% accuracy, was used as a baseline to build the Breast Cancer Screening Framework. To evaluate the proposed CAD system and its efficiency in classifying new images, we tested it on an independent database (MIAS) and obtained 98.23% accuracy and 0.99 AUC. The results obtained demonstrate that the proposed framework is performant and can indeed be used to predict whether mass lesions are benign or malignant. Copyright © 2018 Elsevier B.V. All rights reserved.
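
    A generic transfer-learning sketch in Keras, illustrating the freeze-then-fine-tune strategy discussed above; the backbone, layer cut-off and head are illustrative choices, not the paper's exact architecture or hyper-parameters.

    ```python
    # Freeze most of a pretrained backbone, retrain the top for binary output.
    from tensorflow.keras.applications import VGG16
    from tensorflow.keras import layers, models

    base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
    for layer in base.layers[:-4]:          # freeze all but the last conv block
        layer.trainable = False

    model = models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(256, activation='relu'),
        layers.Dense(1, activation='sigmoid'),   # benign vs malignant
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy',
                  metrics=['accuracy'])
    model.summary()
    ```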

  1. Smoking and The Simpsons.

    PubMed

    Eslick, Guy D; Eslick, Marielle G

    2009-06-01

    To determine the frequency of smoking on The Simpsons television show, and the relationship with the sex and age groups of characters shown smoking, and with positive, negative and neutral connotations associated with instances of smoking. Content analysis (performed from January to October 2008) of instances of smoking that appeared in the first 18 seasons of The Simpsons television show, which aired from 1989 to 2007. Frequency, impact (positive, negative, neutral) of instances of smoking; and frequency associated with age (child or adolescent versus adult characters), sex and types of characters on the show. There were 795 instances of smoking in the 400 episodes observed. Most (498; 63%) involved male characters. Only 8% of instances of smoking (63) involved child or adolescent characters. Just over a third of instances of smoking (275; 35%) reflected smoking in a negative way, compared with the majority, which reflected smoking in a neutral way (504; 63%) and the minority, which reflected smoking in a positive way (16; 2%). Child and adolescent characters were much more likely to be involved in instances of smoking reflected in a negative way compared with adult characters (odds ratio, 44.93; 95% CI, 16.15-172.18). There are a large number of instances of smoking in The Simpsons television show. Child and adolescent characters are much more likely to be portrayed in instances of smoking reflected in a negative way than adult characters. Viewing The Simpsons characters smoking may prompt children to consider smoking at an early age.

  2. Collaborative Data Publication Utilizing the Open Data Repository's (ODR) Data Publisher

    NASA Technical Reports Server (NTRS)

    Stone, N.; Lafuente, B.; Bristow, T.; Keller, R. M.; Downs, R. T.; Blake, D.; Fonda, M.; Dateo, C.; Pires, A.

    2017-01-01

    Introduction: For small communities in diverse fields such as astrobiology, publishing and sharing data can be a difficult challenge. While large, homogeneous fields often have repositories and existing data standards, small groups of independent researchers have few options for publishing standards and data that can be utilized within their community. In conjunction with teams at NASA Ames and the University of Arizona, the Open Data Repository's (ODR) Data Publisher has been conducting ongoing pilots to assess the needs of diverse research groups and to develop software to allow them to publish and share their data collaboratively. Objectives: The ODR's Data Publisher aims to provide an easy-to-use, easy-to-implement software tool that will allow researchers to create and publish database templates and related data. The end product will facilitate both human-readable interfaces (web-based with embedded images, files, and charts) and machine-readable interfaces utilizing semantic standards. Characteristics: The Data Publisher software runs on the standard LAMP (Linux, Apache, MySQL, PHP) stack to provide the widest server base available. The software is based on Symfony (www.symfony.com), which provides a robust framework for creating extensible, object-oriented software in PHP. The software interface consists of a template designer where individual or master database templates can be created. A master database template can be shared by many researchers to provide a common metadata standard that will set a compatibility standard for all derivative databases. Individual researchers can then extend their instance of the template with custom fields, file storage, or visualizations that may be unique to their studies. This allows groups to create compatible databases for data discovery and sharing purposes while still providing the flexibility needed to meet the needs of scientists in rapidly evolving areas of research. Research: As part of this effort, a number of ongoing pilot and test projects are currently in progress. The Astrobiology Habitable Environments Database Working Group is developing a shared database standard using the ODR's Data Publisher and has a number of example databases where astrobiology data are shared. Soon these databases will be integrated via the template-based standard. Work with this group helps determine what data researchers in these diverse fields need to share and archive. Additionally, this pilot helps determine what standards are viable for sharing these types of data, from internally developed standards to existing open standards such as the Dublin Core (http://dublincore.org) and Darwin Core (http://rs.tdwg.org) metadata standards. Further studies are ongoing with the University of Arizona Department of Geosciences, where a number of mineralogy databases are being constructed within the ODR Data Publisher system. Conclusions: Through the ongoing pilots and discussions with individual researchers and small research teams, a definition of the tools desired by these groups is coming into focus. As the software development moves forward, the goal is to meet the publication and collaboration needs of these scientists in an unobtrusive and functional way.

  3. Size reduction techniques for vital compliant VHDL simulation models

    DOEpatents

    Rich, Marvin J.; Misra, Ashutosh

    2006-08-01

    A method and system select delay values from a VHDL standard delay file that correspond to an instance of a logic gate in a logic model. The system then collects all the delay values of the selected instance and builds super generics for the rise-time and the fall-time of the selected instance. The system repeats this process for every delay value in the standard delay file (310) that corresponds to an instance of a logic gate in the logic model. The system then outputs a reduced-size standard delay file (314) containing the super generics for every instance of every logic gate in the logic model.
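
    In rough outline, the reduction groups a delay file's per-instance delay entries and collapses each group into a single rise/fall pair. A simplified sketch, with an invented record format rather than real SDF IOPATH syntax:

```python
from collections import defaultdict

# Hypothetical, simplified SDF-like records: (instance, rise_delay, fall_delay).
# A real SDF file has per-pin IOPATH entries; here each instance just
# contributes several rise/fall delay pairs that get collapsed into one
# "super generic" per instance, mirroring the reduction described above.
delays = [
    ("u1.nand0", 0.12, 0.10), ("u1.nand0", 0.15, 0.11),
    ("u2.inv3",  0.05, 0.06), ("u2.inv3",  0.07, 0.05),
]

by_instance = defaultdict(list)
for inst, rise, fall in delays:
    by_instance[inst].append((rise, fall))

reduced = {}
for inst, pairs in by_instance.items():
    rises = [r for r, _ in pairs]
    falls = [f for _, f in pairs]
    # One aggregated rise/fall value per instance (here: worst case)
    reduced[inst] = {"rise": max(rises), "fall": max(falls)}

for inst, g in sorted(reduced.items()):
    print(inst, g)
```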

  4. CHISSL: A Human-Machine Collaboration Space for Unsupervised Learning

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Arendt, Dustin L.; Komurlu, Caner; Blaha, Leslie M.

    We developed CHISSL, a human-machine interface that utilizes supervised machine learning in an unsupervised context to help the user group unlabeled instances by her own mental model. The user primarily interacts via correction (moving a misplaced instance into its correct group) or confirmation (accepting that an instance is placed in its correct group). Concurrent with the user's interactions, CHISSL trains a classification model guided by the user's grouping of the data. It then predicts the group of unlabeled instances and arranges some of these alongside the instances manually organized by the user. We hypothesize that this mode of human and machine collaboration is more effective than Active Learning, wherein the machine decides for itself which instances should be labeled by the user. We found supporting evidence for this hypothesis in a pilot study where we applied CHISSL to organize a collection of handwritten digits.
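
    A minimal sketch of the interaction loop described above, using a nearest-centroid classifier as a stand-in (the abstract does not specify CHISSL's actual model):

```python
import numpy as np

# Toy instances in 2-D feature space
X = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9], [0.5, 0.5]])
labels = {0: "A", 2: "B"}  # user has grouped instances 0 and 2 so far

def predict(X, labels):
    """Assign every unlabeled instance to the nearest user-defined group."""
    groups = sorted(set(labels.values()))
    centroids = {g: X[[i for i, l in labels.items() if l == g]].mean(axis=0)
                 for g in groups}
    pred = {}
    for i in range(len(X)):
        if i in labels:
            pred[i] = labels[i]
        else:
            pred[i] = min(groups, key=lambda g: np.linalg.norm(X[i] - centroids[g]))
    return pred

print(predict(X, labels))      # machine's proposed grouping
labels[4] = "A"                # user "corrects" instance 4 into group A
print(predict(X, labels))      # model retrains instantly on the feedback
```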

  5. Calyx{trademark} EA implementation at AECB

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    NONE

    1997-12-31

    This report describes a project to examine the applicability of a knowledge-based decision support software for environmental assessment (Calyx) to assist the Atomic Energy Control Board in environmental screenings, assessment, management, and database searches. The report begins with background on the Calyx software and then reviews activities with regard to modification of the Calyx knowledge base for application to the nuclear sector. This is followed by lists of standard activities handled by the software and activities specific to the Board; the hierarchy of environmental components developed for the Board; details of impact rules that describe the conditions under which environmental impacts will occur (the bulk of the report); information on mitigation and monitoring rules and on instance data; and considerations for future work on implementing Calyx at the Board. Appendices include an introduction to expert systems and an overview of the Calyx knowledge base structure.

  6. Therapeutic self-disclosure in integrative psychotherapy: When is this a clinical error?

    PubMed

    Ziv-Beiman, Sharon; Shahar, Golan

    2016-09-01

    Ascending to prominence in virtually all forms of psychotherapy, therapist self-disclosure (TSD) has recently been identified as a primarily integrative intervention (Ziv-Beiman, 2013). In the present article, we discuss various instances in which using TSD in integrative psychotherapy might constitute a clinical error. First, we briefly review extant theory and empirical research on TSD, followed by our preferred version of integrative psychotherapy (i.e., a version of Wachtel's Cyclical Psychodynamics [Wachtel, 1977, 1997, 2014]), which we title cognitive existential psychodynamics. Next, we provide and discuss three examples in which implementing TSD constitutes a clinical error. In essence, we submit that using TSD constitutes an error when patients, constrained by their representational structures (object relations), experience the subjectivity of the other as impinging, which propels them to "react" instead of "emerge." (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  7. Collectivism and the meaning of suffering.

    PubMed

    Sullivan, Daniel; Landau, Mark J; Kay, Aaron C; Rothschild, Zachary K

    2012-12-01

    People need to understand why an instance of suffering occurred and what purpose it might have. One widespread account of suffering is a repressive suffering construal (RSC): interpreting suffering as occurring because people deviate from social norms and as having the purpose of reinforcing the social order. Based on the theorizing of Emile Durkheim and others, we propose that RSC is associated with social morality-the belief that society dictates morality-and is encouraged by collectivist (as opposed to individualist) sentiments. Study 1 showed that dispositional collectivism predicts both social morality and RSC. Studies 2-4 showed that priming collectivist (vs. individualist) self-construal increases RSC of various types of suffering and that this effect is mediated by increased social morality (Study 4). Study 5 examined behavioral intentions, demonstrating that parents primed with a collectivist self-construal interpreted children's suffering more repressively and showed greater support for corporal punishment of children. (PsycINFO Database Record (c) 2012 APA, all rights reserved).

  8. The dorsal anterior cingulate cortex is selective for pain: Results from large-scale reverse inference

    PubMed Central

    Lieberman, Matthew D.; Eisenberger, Naomi I.

    2015-01-01

    Dorsal anterior cingulate cortex (dACC) activation is commonly observed in studies of pain, executive control, conflict monitoring, and salience processing, making it difficult to interpret the dACC's specific psychological function. Using Neurosynth, an automated brain-mapping database of over 10,000 functional MRI (fMRI) studies, we performed quantitative reverse inference analyses to identify the best general psychological account of dACC function, P(Ψ process|dACC activity). Results clearly indicated that the best psychological description of dACC function was related to pain processing—not executive, conflict, or salience processing. We conclude by considering that physical pain may be an instance of a broader class of survival-relevant goals monitored by the dACC, in contrast to more arbitrary temporary goals, which may be monitored by the supplementary motor area. PMID:26582792
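
    The reverse-inference step is Bayes' rule applied over the study database: the commonly reported P(dACC activity | process) is inverted into P(process | dACC activity). A toy illustration with hypothetical study counts, not Neurosynth's actual data:

```python
# Hypothetical study counts, standing in for Neurosynth's database
n_pain, n_pain_dacc = 400, 320        # pain studies; those reporting dACC
n_exec, n_exec_dacc = 3000, 900       # executive-control studies; with dACC

p_dacc_given_pain = n_pain_dacc / n_pain      # forward inference: 0.80
p_dacc_given_exec = n_exec_dacc / n_exec      # forward inference: 0.30

# Reverse inference P(process | dACC) via Bayes' rule, restricted to
# these two candidate processes for illustration
p_pain = n_pain / (n_pain + n_exec)
p_exec = 1 - p_pain
evidence = p_dacc_given_pain * p_pain + p_dacc_given_exec * p_exec
print("P(pain | dACC) =", p_dacc_given_pain * p_pain / evidence)
print("P(exec | dACC) =", p_dacc_given_exec * p_exec / evidence)
```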

  9. Cryptic diversity in Australian stick insects (Insecta; Phasmida) uncovered by the DNA barcoding approach.

    PubMed

    Velonà, A; Brock, P D; Hasenpusch, J; Mantovani, B

    2015-05-18

    The barcoding approach was applied to analyze 16 Australian morphospecies of the order Phasmida, with the aim of testing whether it is suitable as a tool for phasmid species identification and whether its discrimination power allows cryptic diversity to be uncovered. Both goals were reached. Eighty-two specimens representing twelve morphospecies (Sipyloidea sp. A, Candovia annulata, Candovia sp. A, Candovia sp. B, Candovia sp. C, Denhama austrocarinata, Xeroderus kirbii, Parapodacanthus hasenpuschorum, Tropidoderus childrenii, Cigarrophasma tessellatum, Acrophylla wuelfingi, Eurycantha calcarata) were correctly recovered as clades through the molecular approach, their sequences forming monophyletic and well-supported clusters. In four instances, Neighbor-Joining tree and barcoding gap analyses supported either a specific (Austrocarausius mercurius, Anchiale briareus) or a subspecific (Anchiale austrotessulata, Extatosoma tiaratum) level of divergence within the analyzed morphospecies. The lack of an appropriate database of homologous coxI sequences prevented more detailed identification of undescribed taxa.

  10. MISFITS: evaluating the goodness of fit between a phylogenetic model and an alignment.

    PubMed

    Nguyen, Minh Anh Thi; Klaere, Steffen; von Haeseler, Arndt

    2011-01-01

    As models of sequence evolution become more and more complicated, many criteria for model selection have been proposed, and tools are available to select the best model for an alignment under a particular criterion. However, in many instances the selected model fails to explain the data adequately as reflected by large deviations between observed pattern frequencies and the corresponding expectation. We present MISFITS, an approach to evaluate the goodness of fit (http://www.cibiv.at/software/misfits). MISFITS introduces a minimum number of "extra substitutions" on the inferred tree to provide a biologically motivated explanation why the alignment may deviate from expectation. These extra substitutions plus the evolutionary model then fully explain the alignment. We illustrate the method on several examples and then give a survey about the goodness of fit of the selected models to the alignments in the PANDIT database.

  11. Parent, family, and child characteristics: associations with mother- and father-reported emotion socialization practices.

    PubMed

    Wong, Maria S; McElwain, Nancy L; Halberstadt, Amy G

    2009-08-01

    The present research examined parental beliefs about children's negative emotions, parent-reported marital conflict/ambivalence, and child negative emotionality and gender as predictors of mothers' and fathers' reported reactions to their kindergarten children's negative emotions and self-expressiveness in the family (N = 55, two-parent families). Models predicting parents' nonsupportive reactions and negative expressiveness were significant. For both mothers and fathers, more accepting beliefs about children's negative emotions were associated with fewer nonsupportive reactions, and greater marital conflict/ambivalence was associated with more negative expressiveness. Furthermore, interactions between child negative emotionality and parental resources (e.g., marital conflict/ambivalence; accepting beliefs) emerged for fathers' nonsupportive reactions and mothers' negative expressiveness. In some instances, child gender acted as a moderator such that associations between parental beliefs about emotions and the emotion socialization outcomes emerged when child and parent gender were concordant. (PsycINFO Database Record (c) 2009 APA, all rights reserved).

  12. Redrawing the map of Great Britain from a network of human interactions.

    PubMed

    Ratti, Carlo; Sobolevsky, Stanislav; Calabrese, Francesco; Andris, Clio; Reades, Jonathan; Martino, Mauro; Claxton, Rob; Strogatz, Steven H

    2010-12-08

    Do regional boundaries defined by governments respect the more natural ways that people interact across space? This paper proposes a novel, fine-grained approach to regional delineation, based on analyzing networks of billions of individual human transactions. Given a geographical area and some measure of the strength of links between its inhabitants, we show how to partition the area into smaller, non-overlapping regions while minimizing the disruption to each person's links. We tested our method on the largest non-Internet human network, inferred from a large telecommunications database in Great Britain. Our partitioning algorithm yields geographically cohesive regions that correspond remarkably well with administrative regions, while unveiling unexpected spatial structures that had previously only been hypothesized in the literature. We also quantify the effects of partitioning, showing for instance that a possible secession of Wales from Great Britain would be twice as disruptive to the human network as that of Scotland.
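
    The core computation, partitioning a weighted interaction graph so that strong links stay within regions, can be approximated with off-the-shelf modularity maximization. A toy sketch using networkx's greedy modularity communities as a stand-in for the paper's own algorithm:

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Toy "human interaction" network: two tight groups joined by one weak tie,
# standing in for the billions of telecom links analyzed in the paper
G = nx.Graph()
G.add_weighted_edges_from([
    ("a", "b", 9), ("b", "c", 8), ("a", "c", 7),   # region 1
    ("x", "y", 9), ("y", "z", 8), ("x", "z", 7),   # region 2
    ("c", "x", 1),                                  # weak inter-region link
])

# Partition so that strong links stay inside regions (modularity maximization)
regions = greedy_modularity_communities(G, weight="weight")
print([sorted(r) for r in regions])

# "Disruption" of a partition: total weight of the links it cuts
cut = sum(d["weight"] for u, v, d in G.edges(data=True)
          if not any(u in r and v in r for r in regions))
print("cut weight:", cut)
```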

  13. Rational assignment of key motifs for function guides in silico enzyme identification.

    PubMed

    Höhne, Matthias; Schätzle, Sebastian; Jochens, Helge; Robins, Karen; Bornscheuer, Uwe T

    2010-11-01

    Biocatalysis has emerged as a powerful alternative to traditional chemistry, especially for asymmetric synthesis. One key requirement during process development is the discovery of a biocatalyst with an appropriate enantiopreference and enantioselectivity, which can be achieved, for instance, by protein engineering or screening of metagenome libraries. We have developed an in silico strategy for a sequence-based prediction of substrate specificity and enantiopreference. First, we used rational protein design to predict key amino acid substitutions that indicate the desired activity. Then, we searched protein databases for proteins already carrying these mutations instead of constructing the corresponding mutants in the laboratory. This methodology exploits the fact that naturally evolved proteins have undergone selection over millions of years, which has resulted in highly optimized catalysts. Using this in silico approach, we have discovered 17 (R)-selective amine transaminases, which catalyzed the synthesis of several (R)-amines with excellent optical purity up to >99% enantiomeric excess.
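
    The database-search step reduces to checking whether candidate sequences already carry the predicted key substitutions. A minimal sketch; the positions and residues below are invented placeholders, not the substitutions identified in the study:

```python
# Hypothetical key positions and required residues, e.g. "position 95 must be
# tyrosine and position 283 arginine"; the real positions come from the
# rational design step and are not given in the abstract.
KEY_RESIDUES = {94: "Y", 282: "R"}   # 0-based indices, illustrative only

database = {
    "seqA": "M" + "A" * 93 + "Y" + "A" * 187 + "R" + "A" * 20,
    "seqB": "M" + "A" * 400,
}

def carries_key_residues(seq, key_residues):
    """True if the sequence already has every predicted key substitution."""
    return all(len(seq) > pos and seq[pos] == aa
               for pos, aa in key_residues.items())

hits = [name for name, seq in database.items()
        if carries_key_residues(seq, KEY_RESIDUES)]
print(hits)  # candidate (R)-selective transaminases to test in the lab
```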

  14. Archaeal Viruses: Diversity, Replication, and Structure.

    PubMed

    Dellas, Nikki; Snyder, Jamie C; Bolduc, Benjamin; Young, Mark J

    2014-11-01

    The Archaea, and their viruses, remain the most enigmatic of life's three domains. Once thought to inhabit only extreme environments, archaea are now known to inhabit diverse environments. Even though the first archaeal virus was described over 40 years ago, only 117 archaeal viruses have been discovered to date. Despite this small number, these viruses have painted a portrait of enormous morphological and genetic diversity. For example, research centered around the various steps of the archaeal virus life cycle has led to the discovery of unique mechanisms employed by archaeal viruses during replication, maturation, and virion release. In many instances, archaeal virus proteins display very low levels of sequence homology to other proteins listed in public databases, and therefore structural characterization of these proteins has played an integral role in functional assignment. These structural studies have not only provided insights into structure-function relationships but have also identified links between viruses across all three domains of life.

  15. The Chandra Source Catalog 2.0: Building The Catalog

    NASA Astrophysics Data System (ADS)

    Grier, John D.; Plummer, David A.; Allen, Christopher E.; Anderson, Craig S.; Budynkiewicz, Jamie A.; Burke, Douglas; Chen, Judy C.; Civano, Francesca Maria; D'Abrusco, Raffaele; Doe, Stephen M.; Evans, Ian N.; Evans, Janet D.; Fabbiano, Giuseppina; Gibbs, Danny G., II; Glotfelty, Kenny J.; Graessle, Dale E.; Hain, Roger; Hall, Diane M.; Harbo, Peter N.; Houck, John C.; Lauer, Jennifer L.; Laurino, Omar; Lee, Nicholas P.; Martínez-Galarza, Juan Rafael; McCollough, Michael L.; McDowell, Jonathan C.; Miller, Joseph; McLaughlin, Warren; Morgan, Douglas L.; Mossman, Amy E.; Nguyen, Dan T.; Nichols, Joy S.; Nowak, Michael A.; Paxson, Charles; Primini, Francis Anthony; Rots, Arnold H.; Siemiginowska, Aneta; Sundheim, Beth A.; Tibbetts, Michael; Van Stone, David W.; Zografou, Panagoula

    2018-01-01

    To build release 2.0 of the Chandra Source Catalog (CSC2), we require scientific software tools and processing pipelines to evaluate and analyze the data. Additionally, software and hardware infrastructure is needed to coordinate and distribute pipeline execution, manage data I/O, and handle data for Quality Assurance (QA) intervention. We also provide data product staging for archive ingestion. Release 2 utilizes a database-driven system for integration and production. Included are four distinct instances of the Automatic Processing (AP) system (Source Detection, Master Match, Source Properties, and Convex Hulls) and a high-performance computing (HPC) cluster that is managed to provide efficient catalog processing. In this poster we highlight the internal systems developed to meet the CSC2 challenge. This work has been supported by NASA under contract NAS 8-03060 to the Smithsonian Astrophysical Observatory for operation of the Chandra X-ray Center.

  16. Automatic Detection of Frontal Face Midline by Chain-coded Merlin-Farber Hough Transform

    NASA Astrophysics Data System (ADS)

    Okamoto, Daichi; Ohyama, Wataru; Wakabayashi, Tetsushi; Kimura, Fumitaka

    We propose a novel approach for detecting the facial midline (facial symmetry axis) in a frontal face image. The facial midline has several applications, for instance reducing the computational cost of facial feature extraction (FFE) and supporting postoperative assessment for cosmetic or dental surgery. The proposed method detects the facial midline of a frontal face in an edge image as the symmetry axis, using the Merlin-Farber Hough transform (MFHT). A new performance-improvement scheme for midline detection by MFHT is also presented. The main concept of the proposed scheme is suppression of redundant votes in the Hough parameter space by introducing a chain-code representation of the binary edge image. Experimental results on an image dataset containing 2409 images from the FERET database indicate that the proposed algorithm improves the accuracy of midline detection from 89.9% to 95.1% for face images with different scales and rotations.
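
    The Merlin-Farber idea can be sketched compactly: every pair of edge points votes for its perpendicular bisector as a candidate symmetry axis, and the most-voted axis wins. A toy illustration, without the chain-code vote-suppression refinement the paper adds:

```python
import math
from collections import Counter
from itertools import combinations

# Toy edge points of a (roughly) left-right symmetric face contour
points = [(2, 1), (8, 1), (3, 4), (7, 4), (4, 7), (6, 7)]

votes = Counter()
for (x1, y1), (x2, y2) in combinations(points, 2):
    # Perpendicular bisector of the pair = candidate symmetry axis.
    mx, my = (x1 + x2) / 2, (y1 + y2) / 2
    theta = math.atan2(y2 - y1, x2 - x1)        # direction of the pair
    # the axis normal lies along the pair direction; rho = projected midpoint
    rho = mx * math.cos(theta) + my * math.sin(theta)
    votes[(round(theta, 2), round(rho, 1))] += 1

(theta, rho), n = votes.most_common(1)[0]
print(f"axis: x*cos({theta}) + y*sin({theta}) = {rho} ({n} votes)")
# For these points the winner is theta=0, rho=5: the vertical line x = 5
```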

  17. PDBFlex: exploring flexibility in protein structures

    PubMed Central

    Hrabe, Thomas; Li, Zhanwen; Sedova, Mayya; Rotkiewicz, Piotr; Jaroszewski, Lukasz; Godzik, Adam

    2016-01-01

    The PDBFlex database, available freely and with no login requirements at http://pdbflex.org, provides information on the flexibility of protein structures as revealed by analysis of variations between depositions of different structural models of the same protein in the Protein Data Bank (PDB). PDBFlex collects information on all instances of such depositions, identifying them by a 95% sequence identity threshold, analyzes their structural differences, and clusters them according to their structural similarities for easy analysis. PDBFlex contains tools and viewers enabling in-depth examination of structural variability, including: 2D-scaling visualization of RMSD distances between structures of the same protein; graphs of average local RMSD in the aligned structures of protein chains; graphical presentation of differences in secondary structure and observed structural disorder (unresolved residues); difference distance maps between all sets of coordinates; and 3D views of individual structures and simulated transitions between different conformations, the latter displayed using the JSmol visualization software. PMID:26615193

  18. It's a sentence, not a word: insights from a keyword analysis in cancer communication.

    PubMed

    Taylor, Kimberly; Thorne, Sally; Oliffe, John L

    2015-01-01

    Keyword analysis has been championed as a methodological option for expanding the insights that can be extracted from qualitative datasets using various properties available in qualitative software. Intrigued by the pioneering applications of Clive Seale and his colleagues in this regard, we conducted keyword analyses for word frequency and "keyness" on a qualitative database of interview transcripts from a study on cancer communication. We then subjected the results from these operations to an in-depth contextual inquiry by resituating word instances within their original speech contexts, finding that most of what had initially appeared as group variations broke down under close analysis. In this article, we illustrate the various threads of analysis, and explain how they unraveled under closer scrutiny. On the basis of this tentative exercise, we conclude that a healthy skepticism for the benefits of keyword analysis within a qualitative investigative process seems warranted. © The Author(s) 2014.
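
    For reference, the "keyness" of a word is commonly computed as a Dunning log-likelihood statistic comparing its frequencies in two corpora. A minimal sketch with illustrative counts:

```python
import math

def keyness(freq_a, total_a, freq_b, total_b):
    """Dunning log-likelihood 'keyness' of a word in corpus A vs corpus B."""
    e_a = total_a * (freq_a + freq_b) / (total_a + total_b)  # expected counts
    e_b = total_b * (freq_a + freq_b) / (total_a + total_b)
    ll = 0.0
    if freq_a:
        ll += freq_a * math.log(freq_a / e_a)
    if freq_b:
        ll += freq_b * math.log(freq_b / e_b)
    return 2 * ll

# e.g. "worry" appears 120 times in 50k words of patient talk but only
# 15 times in 60k words of clinician talk (illustrative numbers)
print(round(keyness(120, 50_000, 15, 60_000), 1))
```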

  19. KUTE-BASE: storing, downloading and exporting MIAME-compliant microarray experiments in minutes rather than hours.

    PubMed

    Draghici, Sorin; Tarca, Adi L; Yu, Longfei; Ethier, Stephen; Romero, Roberto

    2008-03-01

    The BioArray Software Environment (BASE) is a very popular MIAME-compliant, web-based microarray data repository. However, in BASE, as in most other microarray data repositories, experiment annotation and raw data uploading can be very time-consuming, especially for large microarray experiments. We developed KUTE (Karmanos Universal daTabase for microarray Experiments) as a plug-in for BASE 2.0 that addresses these issues. KUTE provides an automatic experiment annotation feature and a completely redesigned data workflow that dramatically reduce the human-computer interaction time. For instance, in BASE 2.0 a typical Affymetrix experiment involving 100 arrays required 4 h 30 min of user interaction time for experiment annotation, and 45 min for data upload/download. In contrast, for the same experiment, KUTE required only 28 min of user interaction time for experiment annotation, and 3.3 min for data upload/download. http://vortex.cs.wayne.edu/kute/index.html.

  20. Ordering actions for visibility. [distributed computing based on idea of atomic actions operating on data]

    NASA Technical Reports Server (NTRS)

    Mckendry, M. S.

    1985-01-01

    The notion of 'atomic actions' has been considered in recent work on data integrity and reliability. It has been found that the standard database operations of 'read' and 'write' carry with them severe performance limitations. For this reason, systems are now being designed in which actions operate on 'objects' through operations with more-or-less arbitrary semantics. An object (i.e., an instance of an abstract data type) comprises data, a set of operations (procedures) to manipulate the data, and a set of invariants. An 'action' is a unit of work. It appears to be primitive to its surrounding environment, and 'atomic' to other actions. Attention is given to the conventional model of nested actions, ordering requirements, the maximum possible visibility (full visibility) for items which must be controlled by ordering constraints, item management paradigms, and requirements for blocking mechanisms which provide the required visibility.

  1. Efficient privacy-preserving string search and an application in genomics.

    PubMed

    Shimizu, Kana; Nuida, Koji; Rätsch, Gunnar

    2016-06-01

    Personal genomes carry inherent privacy risks and protecting privacy poses major social and technological challenges. We consider the case where a user searches for genetic information (e.g. an allele) on a server that stores a large genomic database and aims to receive allele-associated information. The user would like to keep the query and result private and the server the database. We propose a novel approach that combines efficient string data structures such as the Burrows-Wheeler transform with cryptographic techniques based on additive homomorphic encryption. We assume that the sequence data is searchable in efficient iterative query operations over a large indexed dictionary, for instance, from large genome collections and employing the (positional) Burrows-Wheeler transform. We use a technique called oblivious transfer that is based on additive homomorphic encryption to conceal the sequence query and the genomic region of interest in positional queries. We designed and implemented an efficient algorithm for searching sequences of SNPs in large genome databases. During search, the user can only identify the longest match while the server does not learn which sequence of SNPs the user queried. In an experiment based on 2184 aligned haploid genomes from the 1000 Genomes Project, our algorithm was able to perform typical queries within ≈4.6 s and ≈10.8 s for client and server side, respectively, on laptop computers. The presented algorithm is at least one order of magnitude faster than an exhaustive baseline algorithm. Availability and implementation: https://github.com/iskana/PBWT-sec and https://github.com/ratschlab/PBWT-sec. Contact: shimizu-kana@aist.go.jp or Gunnar.Ratsch@ratschlab.org. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  2. Efficient privacy-preserving string search and an application in genomics

    PubMed Central

    Shimizu, Kana; Nuida, Koji; Rätsch, Gunnar

    2016-01-01

    Motivation: Personal genomes carry inherent privacy risks and protecting privacy poses major social and technological challenges. We consider the case where a user searches for genetic information (e.g. an allele) on a server that stores a large genomic database and aims to receive allele-associated information. The user would like to keep the query and result private and the server the database. Approach: We propose a novel approach that combines efficient string data structures such as the Burrows–Wheeler transform with cryptographic techniques based on additive homomorphic encryption. We assume that the sequence data is searchable in efficient iterative query operations over a large indexed dictionary, for instance, from large genome collections and employing the (positional) Burrows–Wheeler transform. We use a technique called oblivious transfer that is based on additive homomorphic encryption to conceal the sequence query and the genomic region of interest in positional queries. Results: We designed and implemented an efficient algorithm for searching sequences of SNPs in large genome databases. During search, the user can only identify the longest match while the server does not learn which sequence of SNPs the user queried. In an experiment based on 2184 aligned haploid genomes from the 1000 Genomes Project, our algorithm was able to perform typical queries within ≈ 4.6 s and ≈ 10.8 s for client and server side, respectively, on laptop computers. The presented algorithm is at least one order of magnitude faster than an exhaustive baseline algorithm. Availability and implementation: https://github.com/iskana/PBWT-sec and https://github.com/ratschlab/PBWT-sec. Contacts: shimizu-kana@aist.go.jp or Gunnar.Ratsch@ratschlab.org Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153731
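
    The additively homomorphic building block used above to conceal the query can be illustrated with a toy Paillier cryptosystem: the client encrypts a 0/1 selection vector, and the server combines it with its records without learning which entry was selected. The sketch below uses tiny primes and is not secure; it illustrates the oblivious-selection idea only, not the PBWT-sec protocol itself:

```python
import random
from math import gcd

# Toy Paillier cryptosystem (tiny primes; illustration only, not secure).
p, q = 1789, 1867
n, n2 = p * q, (p * q) ** 2
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)   # lcm(p-1, q-1)
g = n + 1

def enc(m):
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def dec(c):
    # m = L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) // n
    u = pow(c, lam, n2)
    mu = pow((pow(g, lam, n2) - 1) // n, -1, n)
    return ((u - 1) // n) * mu % n

# Additive homomorphism: Enc(a) * Enc(b) = Enc(a + b), Enc(a)^k = Enc(k*a).
# Oblivious selection: the client encrypts a 0/1 selection vector; the
# server combines it with its database without learning the query index.
database = [42, 7, 99, 13]          # server's (toy) allele-associated values
query_index = 2                      # client wants database[2], kept private

encrypted_query = [enc(1 if i == query_index else 0)
                   for i in range(len(database))]       # sent to server

result = 1
for c, d in zip(encrypted_query, database):
    result = (result * pow(c, d, n2)) % n2              # Enc(sum e_i * d_i)

print(dec(result))   # client decrypts: 99; server never saw the index
```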

  3. Characteristics of Retractions from Korean Medical Journals in the KoreaMed Database: A Bibliometric Analysis

    PubMed Central

    Cho, Hye-Min

    2016-01-01

    Background: Flawed or misleading articles may be retracted because of either honest scientific errors or scientific misconduct. This study explored the characteristics of retractions in medical journals published in Korea through the KoreaMed database. Methods: We retrieved retraction articles indexed in the KoreaMed database from January 1990 to January 2016. Three authors each reviewed the details of the retractions, including the reason for retraction, adherence to retraction guidelines, and appropriateness of retraction. Points of disagreement were reconciled by discussion among the three. Results: Out of 217,839 articles in KoreaMed published from 1990 to January 2016, the publication type of 111 articles was retraction (0.051%). Of the 111 articles (addressing the retraction of 114 papers), 58.8% were issued by the authors, 17.5% were jointly issued (author, editor, and publisher), 15.8% came from editors, and 4.4% were dispatched by institutions; in 5.3% of the instances, the issuer was unstated. The reasons for retraction included duplicate publication (57.0%), plagiarism (8.8%), scientific error (4.4%), author dispute (3.5%), and other (5.3%); the reasons were unstated or unclear in 20.2%. The degree of adherence to COPE’s retraction guidelines varied (79.8%–100%), and some retractions were inappropriate by COPE standards. These were categorized as follows: retraction of the first published article in the case of duplicate publication (69.2%), authorship dispute (15.4%), errata (7.7%), and other (7.7%). Conclusion: The major reason for retraction in Korean medical journals is duplicate publication. Some retractions resulted from overreaction by the editors. Therefore, editors of Korean medical journals should take careful note of the COPE retraction guidelines and should undergo training on appropriate retraction practices. PMID:27706245

  4. A review of the volatiles from the healthy human body.

    PubMed

    de Lacy Costello, B; Amann, A; Al-Kateb, H; Flynn, C; Filipiak, W; Khalid, T; Osborne, D; Ratcliffe, N M

    2014-03-01

    A compendium of all the volatile organic compounds (VOCs) emanating from the human body (the volatolome) is reported for the first time. 1840 VOCs have been assigned from breath (872), saliva (359), blood (154), milk (256), skin secretions (532), urine (279), and faeces (381) in apparently healthy individuals. Compounds were assigned CAS registry numbers and named according to a common convention where possible. The compounds have been grouped into tables according to their chemical class or functionality to permit easy comparison. Some clear differences are observed, for instance, a lack of esters in urine with a high number in faeces. Careful use of the database is needed. The numbers may not be a true reflection of the actual VOCs present from each bodily excretion. The lack of a compound could be due to the techniques used or reflect the intensity of effort, e.g. there are few publications on VOCs from blood compared to a large number on VOCs in breath. The large number of volatiles reported from skin is partly due to the methodologies used, e.g. collecting excretions on glass beads and then heating to desorb VOCs. All compounds have been included as reported (unless there was a clear discrepancy between name and chemical structure), but there may be some mistaken assignations arising from the original publications, particularly for isomers. It is the authors' intention that this database will not only be a useful record of the VOCs listed in the literature, but will also stimulate further study of VOCs from healthy individuals. Establishing a list of volatiles emanating from healthy individuals and increased understanding of VOC metabolic pathways is an important step for differentiating between diseases using VOCs.

  5. Short tandem repeat profiling: part of an overall strategy for reducing the frequency of cell misidentification.

    PubMed

    Nims, Raymond W; Sykes, Greg; Cottrill, Karin; Ikonomi, Pranvera; Elmore, Eugene

    2010-12-01

    The role of cell authentication in biomedical science has received considerable attention, especially within the past decade. This quality control attribute is now beginning to be given the emphasis it deserves by granting agencies and by scientific journals. Short tandem repeat (STR) profiling, one of a few DNA profiling technologies now available, is being proposed for routine identification (authentication) of human cell lines, stem cells, and tissues. The advantage of this technique over methods such as isoenzyme analysis, karyotyping, human leukocyte antigen typing, etc., is that STR profiling can establish identity to the individual level, provided that the appropriate number and types of loci are evaluated. To best employ this technology, a standardized protocol and a data-driven, quality-controlled, and publicly searchable database will be necessary. This public STR database (currently under development) will enable investigators to rapidly authenticate human-based cultures to the individual from whom the cells were sourced. Use of similar approaches for non-human animal cells will require developing other suitable loci sets. While implementing STR analysis on a more routine basis should significantly reduce the frequency of cell misidentification, additional technologies may be needed as part of an overall authentication paradigm. For instance, isoenzyme analysis, PCR-based DNA amplification, and sequence-based barcoding methods enable rapid confirmation of a cell line's species of origin while screening against cross-contaminations, especially when the cells present are not recognized by the species-specific STR method. Karyotyping may also be needed as a supporting tool during establishment of an STR database. Finally, good cell culture practices must always remain a major component of any effort to reduce the frequency of cell misidentification.

  6. Filipino DNA variation at 12 X-chromosome short tandem repeat markers.

    PubMed

    Salvador, Jazelyn M; Apaga, Dame Loveliness T; Delfin, Frederick C; Calacal, Gayvelline C; Dennis, Sheila Estacio; De Ungria, Maria Corazon A

    2018-06-08

    Demands for solving complex kinship scenarios where only distant relatives are available for testing have risen in recent years. In these instances, other genetic markers such as X-chromosome short tandem repeat (X-STR) markers are employed to supplement autosomal and Y-chromosomal STR DNA typing. However, prior to use, the degree of STR polymorphism in the population requires evaluation through generation of an allele or haplotype frequency population database. This population database is also used for statistical evaluation of DNA typing results. Here, we report X-STR data from 143 unrelated Filipino male individuals who were genotyped via conventional polymerase chain reaction-capillary electrophoresis (PCR-CE) using the 12 X-STR loci included in the Investigator® Argus X-12 kit (Qiagen) and via massively parallel sequencing (MPS) of seven X-STR loci included in the ForenSeq™ DNA Signature Prep kit of the MiSeq® FGx™ Forensic Genomics System (Illumina). Allele calls between PCR-CE and MPS systems were consistent (100% concordance) across seven overlapping X-STRs. Allele and haplotype frequencies and other parameters of forensic interest were calculated based on length (PCR-CE, 12 X-STRs) and sequence (MPS, seven X-STRs) variations observed in the population. Results of our study indicate that the 12 X-STRs in the PCR-CE system are highly informative for the Filipino population. MPS of seven X-STR loci identified 73 X-STR alleles compared with 55 X-STR alleles that were identified solely by length via PCR-CE. Of the 73 sequence-based alleles observed, six alleles have not been reported in the literature. The population data presented here may serve as a reference Philippine frequency database of X-STRs for forensic casework applications. Copyright © 2018 Elsevier B.V. All rights reserved.

  7. Automated classification of RNA 3D motifs and the RNA 3D Motif Atlas

    PubMed Central

    Petrov, Anton I.; Zirbel, Craig L.; Leontis, Neocles B.

    2013-01-01

    The analysis of atomic-resolution RNA three-dimensional (3D) structures reveals that many internal and hairpin loops are modular, recurrent, and structured by conserved non-Watson–Crick base pairs. Structurally similar loops define RNA 3D motifs that are conserved in homologous RNA molecules, but can also occur at nonhomologous sites in diverse RNAs, and which often vary in sequence. To further our understanding of RNA motif structure and sequence variability and to provide a useful resource for structure modeling and prediction, we present a new method for automated classification of internal and hairpin loop RNA 3D motifs and a new online database called the RNA 3D Motif Atlas. To classify the motif instances, a representative set of internal and hairpin loops is automatically extracted from a nonredundant list of RNA-containing PDB files. Their structures are compared geometrically, all-against-all, using the FR3D program suite. The loops are clustered into motif groups, taking into account geometric similarity and structural annotations and making allowance for a variable number of bulged bases. The automated procedure that we have implemented identifies all hairpin and internal loop motifs previously described in the literature. All motif instances and motif groups are assigned unique and stable identifiers and are made available in the RNA 3D Motif Atlas (http://rna.bgsu.edu/motifs), which is automatically updated every four weeks. The RNA 3D Motif Atlas provides an interactive user interface for exploring motif diversity and tools for programmatic data access. PMID:23970545

  8. Beyond Essentialism: Cultural Differences in Emotions Revisited.

    PubMed

    Boiger, Michael; Ceulemans, Eva; De Leersnyder, Jozefien; Uchida, Yukiko; Norasakkunkit, Vinai; Mesquita, Batja

    2018-02-01

    The current research offers an alternative to essentialism for studying cultural variation in emotional experience. Rather than assuming that individuals always experience an emotion in the same way, our starting point was that the experience of an emotion like anger or shame may vary from one instance to another. We expected to find different anger and shame experience types, that is, groups of people who differ in the instances of anger and shame that they experience. We proposed that studying cultural differences in emotional experience means studying differences in the distribution of these types across cultural contexts: There should be systematic differences in the types that are most common in each culture. Students from the United States, Japan, and Belgium (N = 928) indicated their emotional experiences in terms of appraisals and action tendencies in response to 15 hypothetical anger or shame situations. Using an inductive clustering approach, we identified anger and shame types who were characterized by different patterns of anger and shame experience. As expected, we found that the distribution of these types differed across the three cultural contexts: Of the two anger types, one was common in Japan and one in the United States and Belgium; the three shame types were each most prevalent in a different cultural context. Participants' anger and shame types were primarily predicted by their culture of origin (with an accuracy of 72.3% for anger and 74.0% for shame) and not, or much less, by their ethnic origin, socioeconomic status, gender, self-construal, or personality. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  9. An adaptable architecture for patient cohort identification from diverse data sources

    PubMed Central

    Bache, Richard; Miles, Simon; Taweel, Adel

    2013-01-01

    Objective: We define and validate an architecture for systems that identify patient cohorts for clinical trials from multiple heterogeneous data sources. This architecture has an explicit query model capable of supporting temporal reasoning and expressing eligibility criteria independently of the representation of the data used to evaluate them. Method: The architecture has the key feature that queries defined according to the query model are both pre- and post-processed, and this is used to address both structural and semantic heterogeneity. The process of extracting the relevant clinical facts is separated from the process of reasoning about them. A specific instance of the query model is then defined and implemented. Results: We show that the specific instance of the query model has wide applicability. We then describe how it is used to access three diverse data warehouses to determine patient counts. Discussion: Although the proposed architecture requires greater effort to implement the query model than would be the case for using just SQL and accessing a database management system directly, this effort is justified because it supports both temporal reasoning and heterogeneous data sources. The query model only needs to be implemented once, no matter how many data sources are accessed. Each additional source requires only the implementation of a lightweight adaptor. Conclusions: The architecture has been used to implement a specific query model that can express complex eligibility criteria and access three diverse data warehouses, thus demonstrating the feasibility of this approach in dealing with temporal reasoning and data heterogeneity. PMID:24064442
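
    The separation described above (extract facts once per source, reason about eligibility once overall) is essentially an adaptor pattern. A minimal sketch with hypothetical warehouse classes and a deliberately simplified eligibility rule:

```python
from abc import ABC, abstractmethod
from datetime import date

# Hypothetical names throughout: the paper's query model is richer (temporal
# operators etc.); this only sketches the "one query model, one lightweight
# adaptor per source" structure.
class SourceAdaptor(ABC):
    @abstractmethod
    def extract_facts(self, patient_id: str) -> dict:
        """Pull raw clinical facts out of one heterogeneous source."""

class WarehouseA(SourceAdaptor):
    def extract_facts(self, patient_id):
        return {"birth_date": date(1970, 5, 1), "diagnoses": ["E11"]}

class WarehouseB(SourceAdaptor):
    def extract_facts(self, patient_id):
        # Different native representation, same normalized fact dictionary
        return {"birth_date": date(1952, 2, 9), "diagnoses": ["I10", "E11"]}

def eligible(facts, min_age, required_code, today=date(2013, 1, 1)):
    """Eligibility reasoning lives once, independent of the data source."""
    age = (today - facts["birth_date"]).days // 365
    return age >= min_age and required_code in facts["diagnoses"]

for adaptor in (WarehouseA(), WarehouseB()):
    facts = adaptor.extract_facts("p001")
    print(type(adaptor).__name__, eligible(facts, min_age=50, required_code="E11"))
```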

  10. Uninformative contexts support word learning for high-skill spellers.

    PubMed

    Eskenazi, Michael A; Swischuk, Natascha K; Folk, Jocelyn R; Abraham, Ashley N

    2018-04-30

    The current study investigated how high-skill spellers and low-skill spellers incidentally learn words during reading. The purpose of the study was to determine whether readers can use uninformative contexts to support word learning after forming a lexical representation for a novel word, consistent with instance-based resonance processes. Previous research has found that uninformative contexts damage word learning; however, there may have been insufficient exposure to informative contexts (only one) prior to exposure to uninformative contexts (Webb, 2007; Webb, 2008). In Experiment 1, participants read sentences with one novel word (i.e., blaph, clurge) embedded in them in three different conditions: Informative (six informative contexts to support word learning), Mixed (three informative contexts followed by three uninformative contexts), and Uninformative (six uninformative contexts). Experiment 2 added a new condition with only three informative contexts to further clarify the conclusions of Experiment 1. Results indicated that uninformative contexts can support word learning, but only for high-skill spellers. Further, when participants learned the spelling of the novel word, they were more likely to learn the meaning of that word. This effect was much larger for high-skill spellers than for low-skill spellers. Results are consistent with the Lexical Quality Hypothesis (LQH) in that high-skill spellers form stronger orthographic representations which support word learning (Perfetti, 2007). Results also support an instance-based resonance process of word learning in that prior informative contexts can be reactivated to support word learning in future contexts (Bolger, Balass, Landen, & Perfetti, 2008; Balass, Nelson, & Perfetti, 2010; Reichle & Perfetti, 2003). (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  11. Federated Giovanni: A Distributed Web Service for Analysis and Visualization of Remote Sensing Data

    NASA Technical Reports Server (NTRS)

    Lynnes, Chris

    2014-01-01

    The Geospatial Interactive Online Visualization and Analysis Interface (Giovanni) is a popular tool for users of the Goddard Earth Sciences Data and Information Services Center (GES DISC) and has been in use for over a decade. It provides a wide variety of algorithms and visualizations to explore large remote sensing datasets without having to download the data and without having to write readers and visualizers for it. Giovanni is now being extended to enable its capabilities at other data centers within the Earth Observing System Data and Information System (EOSDIS). This Federated Giovanni will allow four other data centers to add and maintain their data within Giovanni on behalf of their user community. Those data centers are the Physical Oceanography Distributed Active Archive Center (PO.DAAC), MODIS Adaptive Processing System (MODAPS), Ocean Biology Processing Group (OBPG), and Land Processes Distributed Active Archive Center (LP DAAC). Three tiers are supported: Tier 1 (GES DISC-hosted) gives the remote data center a data management interface to add and maintain data, which are provided through the Giovanni instance at the GES DISC. Tier 2 packages Giovanni up as a virtual machine for distribution to and deployment by the other data centers. Data variables are shared among data centers by sharing documents from the Solr database that underpins Giovanni's data management capabilities. However, each data center maintains their own instance of Giovanni, exposing the variables of most interest to their user community. Tier 3 is a Shared Source model, in which the data centers cooperate to extend the infrastructure by contributing source code.

  12. Failed magmatic eruptions: Late-stage cessation of magma ascent

    USGS Publications Warehouse

    Moran, S.C.; Newhall, C.; Roman, D.C.

    2011-01-01

    When a volcano becomes restless, a primary question is whether the unrest will lead to an eruption. Here we recognize four possible outcomes of a magmatic intrusion: "deep intrusion", "shallow intrusion", "sluggish/viscous magmatic eruption", and "rapid, often explosive magmatic eruption". We define "failed eruptions" as instances in which magma reaches but does not pass the "shallow intrusion" stage, i.e., when magma gets close to, but does not reach, the surface. Competing factors act to promote or hinder the eventual eruption of a magma intrusion. Fresh intrusion from depth, high magma gas content, rapid ascent rates that leave little time for en route degassing, opening of pathways, and sudden decompression near the surface all act to promote eruption, whereas decreased magma supply from depth, slow ascent, significant en route degassing and associated increases in viscosity, and impingement on structural barriers all act to hinder eruption. All of these factors interact in complex ways with variable results, but often cause magma to stall at some depth before reaching the surface. Although certain precursory phenomena, such as rapidly escalating seismic swarms or rates of degassing or deformation, are good indicators that an eruption is likely, such phenomena have also been observed in association with intrusions that have ultimately failed to erupt. A perpetual difficulty with quantifying the probability of eruption is a lack of data, particularly on instances of failed eruptions. This difficulty is being addressed in part through the WOVOdat database. Papers in this volume will be an additional resource for scientists grappling with the issue of whether or not an episode of unrest will lead to a magmatic eruption.

  13. Population Density Modeling for Diverse Land Use Classes: Creating a National Dasymetric Worker Population Model

    NASA Astrophysics Data System (ADS)

    Trombley, N.; Weber, E.; Moehl, J.

    2017-12-01

    Many studies invoke dasymetric mapping to make more accurate depictions of population distribution by spatially restricting populations to inhabited/inhabitable portions of observational units (e.g., census blocks) and/or by varying population density among different land classes. LandScan USA uses this approach by restricting particular population components (such as residents or workers) to building area detected from remotely sensed imagery, but also goes a step further by classifying each cell of building area in accordance with ancillary land use information from national parcel data (CoreLogic, Inc.'s ParcelPoint database). Modeling population density according to land use is critical. For instance, office buildings would have a higher density of workers than warehouses, even though the latter would likely have more cells of detection. This paper presents a modeling approach by which different land uses are assigned different densities to more accurately distribute populations within them. For parts of the country where the parcel data is insufficient, an alternate methodology is developed that uses National Land Cover Database (NLCD) data to define the land use type of building detection. Furthermore, LiDAR data is incorporated for many of the largest cities across the US, allowing the independent variables to be updated from two-dimensional building detection area to total building floor space. In the end, four different regression models are created to explain the effect of different land uses on worker distribution: (1) a two-dimensional model using land use types from the parcel data; (2) a three-dimensional model using land use types from the parcel data; (3) a two-dimensional model using land use types from the NLCD data; and (4) a three-dimensional model using land use types from the NLCD data. By and large, the resultant coefficients follow intuition but, importantly, allow the relationships between different land uses to be quantified. For instance, in the model using two-dimensional building area, commercial building area had a density 2.5 times greater than public building area and 4 times greater than industrial building area. These coefficients can be applied to define the ratios at which population is distributed to building cells. Finally, possible avenues for refining the methodology are presented.
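
    Given such coefficients, distributing a block's workers to building cells is a weighted proportional allocation. A small worked example; the weights follow the quoted 2.5x and 4x ratios, while the absolute values and counts are illustrative:

```python
# Distribute a block's worker count to building cells using land-use weights.
# The weights mirror the ratios quoted above (commercial 2.5x public, 4x
# industrial, in the two-dimensional model); absolute values are illustrative.
WEIGHTS = {"commercial": 1.0, "public": 0.4, "industrial": 0.25}

block_workers = 1000
cells = [("c1", "commercial"), ("c2", "commercial"),
         ("c3", "public"), ("c4", "industrial")]

total_weight = sum(WEIGHTS[use] for _, use in cells)
for cell_id, use in cells:
    share = WEIGHTS[use] / total_weight
    print(cell_id, use, round(block_workers * share, 1))
```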

  14. An Investigation of Automatic Change Detection for Topographic Map Updating

    NASA Astrophysics Data System (ADS)

    Duncan, P.; Smit, J.

    2012-08-01

    Changes to the landscape are constantly occurring, and it is essential for geospatial and mapping organisations that these changes are regularly detected and captured so that map databases can be updated to reflect the current status of the landscape. The Chief Directorate of National Geospatial Information (CD: NGI), South Africa's national mapping agency, currently relies on manual methods of detecting and capturing these changes. These manual methods are time-consuming and labour-intensive, and rely on the skills and interpretation of the operator. It is therefore necessary to move towards more automated methods in the production process at CD: NGI. The aim of this research is to investigate a methodology for automatic or semi-automatic change detection for the purpose of updating topographic databases. The method investigated detects changes through image classification as well as spatial analysis, and is focussed on urban landscapes. The major data inputs into this study are high-resolution aerial imagery and existing topographic vector data. Initial results indicate that traditional pixel-based image classification approaches are unsatisfactory for large-scale land-use mapping and that object-oriented approaches hold more promise. Even with object-oriented image classification, however, generalization of techniques on a broad scale has provided inconsistent results. A solution may lie in a hybrid approach of pixel-based and object-oriented techniques.

  15. Small-molecule inhibitors of hepatitis C virus (HCV) non-structural protein 5A (NS5A): a patent review (2010-2015).

    PubMed

    Ivanenkov, Yan A; Aladinskiy, Vladimir A; Bushkov, Nikolay A; Ayginin, Andrey A; Majouga, Alexander G; Ivachtchenko, Alexandre V

    2017-04-01

    Non-structural 5A (NS5A) protein has attracted considerable attention as a target for the treatment of hepatitis C virus (HCV) infection. A number of novel NS5A inhibitors have been reported to date, and several drugs with favorable ADME properties and mild side effects have been launched onto the pharmaceutical market. For instance, daclatasvir was launched in 2014, elbasvir is currently undergoing registration, and ledipasvir was launched in 2014 as a fixed-dose combination with sofosbuvir (an NS5B inhibitor). Areas covered: The Thomson Reuters Integrity and SciFinder databases were used as sources to collect the patents on small-molecule NS5A inhibitors. All the structures were ranked by date of priority. The patent holder and antiviral activity for each claimed scaffold are summarized and presented in a convenient manner. A particular focus was placed on the best-in-class bis-pyrrolidine-containing NS5A inhibitors. Expert opinion: Several first-generation NS5A inhibitors have recently progressed into advanced clinical trials and showed superior efficacy in reducing viral load in infected subjects. Therapy schemes that use these agents in combination with other established antiviral drugs with complementary mechanisms of action can address the emergence of resistance and the poor therapeutic outcomes frequently attributed to antiviral drugs.

  16. Arabic handwritten: pre-processing and segmentation

    NASA Astrophysics Data System (ADS)

    Maliki, Makki; Jassim, Sabah; Al-Jawad, Naseer; Sellahewa, Harin

    2012-06-01

    This paper is concerned with the pre-processing and segmentation tasks that influence the performance of Optical Character Recognition (OCR) systems and handwritten/printed text recognition. In Arabic, these tasks are adversely affected by the fact that many words are made up of sub-words; many sub-words have one or more associated diacritics that are not connected to the sub-word's body; and there can be multiple instances of sub-word overlap. To overcome these problems we investigate and develop segmentation techniques that first segment a document into sub-words, link the diacritics with their sub-words, and remove possible overlaps between words and sub-words. We also investigate two approaches to the pre-processing tasks of estimating sub-word baselines and determining parameters that yield appropriate slope correction and slant removal. We investigate the use of linear regression on sub-word pixels to determine their central x and y coordinates, as well as their high-density part. We also develop a new incremental rotation procedure, performed on sub-words, that determines the best rotation angle needed to realign baselines. We demonstrate the benefits of these proposals by conducting extensive experiments on publicly available and in-house databases. These algorithms help improve character segmentation accuracy by transforming handwritten Arabic text into a form that can benefit from analysis of printed text.
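
    The baseline-estimation step amounts to a least-squares line fit over a sub-word's pixel coordinates, followed by a rotation that cancels the fitted slope. A minimal sketch with invented pixel coordinates:

```python
import numpy as np

# Hypothetical black-pixel coordinates of one sub-word (x, y); in practice
# these come from the binarized image of the segmented sub-word.
xs = np.array([0, 1, 2, 3, 4, 5, 6, 7], dtype=float)
ys = np.array([5.1, 5.0, 5.3, 5.2, 5.4, 5.6, 5.5, 5.7])

# Least-squares fit y = a*x + b over the sub-word's pixels
a, b = np.polyfit(xs, ys, deg=1)
skew_deg = np.degrees(np.arctan(a))
print(f"baseline slope {a:.3f} -> rotate by {-skew_deg:.1f} degrees")

# Rotating the pixel cloud by -skew realigns the baseline horizontally
theta = -np.arctan(a)
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
aligned = rot @ np.vstack([xs, ys])
print(np.round(aligned[1], 2))   # y-coordinates are now nearly constant
```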

  17. Object Recognition and Localization: The Role of Tactile Sensors

    PubMed Central

    Aggarwal, Achint; Kirchner, Frank

    2014-01-01

    Tactile sensors, because of their intrinsic insensitivity to lighting conditions and water turbidity, provide promising opportunities for augmenting the capabilities of vision sensors in applications involving object recognition and localization. This paper presents two approaches for haptic object recognition and localization in ground and underwater environments. The first approach, called Batch RANSAC and Iterative Closest Point augmented Particle Filter (BRICPPF), is based on an innovative combination of particle filters, the Iterative Closest Point algorithm, and a feature-based Random Sample Consensus (RANSAC) algorithm for database matching. It can handle a large database of 3D objects of complex shapes and performs a complete six-degree-of-freedom localization of static objects. The algorithms are validated by experimentation in ground and underwater environments using real hardware. To our knowledge, this is the first instance of haptic object recognition and localization in underwater environments. The second approach is biologically inspired and provides a close integration between exploration and recognition. An edge-following exploration strategy is developed that receives feedback from the current state of recognition. A recognition-by-parts approach is developed that uses the BRICPPF for object sub-part recognition. Object exploration is either directed to explore a part until it is successfully recognized, or is directed towards new parts to endorse the current recognition belief. This approach is validated by simulation experiments. PMID:24553087

  18. The Functional Human C-Terminome

    PubMed Central

    Hedden, Michael; Lyon, Kenneth F.; Brooks, Steven B.; David, Roxanne P.; Limtong, Justin; Newsome, Jacklyn M.; Novakovic, Nemanja; Rajasekaran, Sanguthevar; Thapar, Vishal; Williams, Sean R.; Schiller, Martin R.

    2016-01-01

    All translated proteins end with a carboxylic acid commonly called the C-terminus. Many short functional sequences (minimotifs) are located on or immediately proximal to the C-terminus. However, information about the function of protein C-termini has not been consolidated into a single source. Here, we built a new “C-terminome” database and web system focused on human proteins. Approximately 3,600 C-termini in the human proteome have a minimotif with an established molecular function. To help evaluate the function of the remaining C-termini in the human proteome, we inferred minimotifs identified by experimentation in rodent cells, predicted minimotifs based upon consensus sequence matches, and predicted novel highly repetitive sequences in C-termini. Predictions can be ranked by enrichment scores or Genomic Evolutionary Rate Profiling (GERP) scores, a measurement of evolutionary constraint. By searching for new anchored sequences on the last 10 amino acids of proteins in the human proteome, with lengths between 3 and 10 residues and up to 5 degenerate positions in the consensus sequences, we have identified new consensus sequences that predict instances in the majority of human genes. All of this information is consolidated into a database that can be accessed through a C-terminome web system with search and browse functions for minimotifs and human proteins. A known consensus sequence-based predicted function is assigned to nearly half the proteins in the human proteome. Weblink: http://cterminome.bio-toolkit.com. PMID:27050421
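
    Searching for anchored C-terminal consensus sequences with degenerate positions maps naturally onto regular expressions anchored at the sequence end. A minimal sketch; the motif and sequences are illustrative, not entries from the database:

```python
import re

# Hypothetical consensus: a class II PDZ-binding-like C-terminal motif,
# written as a regex anchored to the protein's end ("$"). "." marks a
# degenerate position of the kind the search described above allows.
CONSENSUS = re.compile(r"[ST].[VIL]$")

proteome = {
    "GeneA": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQASTEV",   # ends ...STEV -> hit
    "GeneB": "MSDNGPQNQRNAPRITFGGP",                      # no match
}

for gene, seq in proteome.items():
    c_terminus = seq[-10:]                 # only the last 10 residues matter
    if CONSENSUS.search(c_terminus):
        print(gene, "matches on", c_terminus)
```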

  19. sscMap: an extensible Java application for connecting small-molecule drugs using gene-expression signatures.

    PubMed

    Zhang, Shu-Dong; Gant, Timothy W

    2009-07-31

    Connectivity mapping is a process for recognizing novel pharmacological and toxicological properties in small molecules by comparing their gene-expression signatures with others in a database. A simple and robust method for connectivity mapping with increased specificity and sensitivity was recently developed, and its utility was demonstrated using experimentally derived gene signatures. This paper introduces sscMap (statistically significant connections' map), a Java application designed to undertake connectivity-mapping tasks using the recently published method. The software is bundled with a default collection of reference gene-expression profiles based on the publicly available dataset from the Broad Institute Connectivity Map 02, which includes data from over 7000 Affymetrix microarrays, for over 1000 small-molecule compounds, and 6100 treatment instances in 5 human cell lines. In addition, the application allows users to add their own custom collections of reference profiles and is applicable to a wide range of other 'omics technologies. The utility of sscMap is twofold. First, it serves to make statistically significant connections between a user-supplied gene signature and the 6100 core reference profiles based on the Broad Institute expanded dataset. Second, it allows users to apply the same improved method to custom-built reference profiles which can be added to the database for future referencing. The software can be freely downloaded from http://purl.oclc.org/NET/sscMap.
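
    For illustration, here is a toy signed-rank connection score between a query gene signature and one ranked reference profile. It conveys the flavour of connectivity scoring, but it is not necessarily the exact sscMap statistic, and the gene names are hypothetical.

    ```python
    def connection_score(reference_ranks, signature):
        """reference_ranks: gene -> rank (1 = most up-regulated in the reference).
        signature: list of (gene, +1/-1) for up-/down-regulated query genes.
        Up-regulated query genes near the top and down-regulated ones near the
        bottom both push the score positive; the result is normalised to [-1, 1]."""
        n = len(reference_ranks)
        score = sum(sign * (n + 1 - 2 * reference_ranks[g]) for g, sign in signature)
        best = sum(sorted((abs(n + 1 - 2 * r) for r in range(1, n + 1)),
                          reverse=True)[:len(signature)])
        return score / best

    ranks = {"G1": 1, "G2": 2, "G3": 3, "G4": 4, "G5": 5, "G6": 6}
    signature = [("G1", +1), ("G6", -1)]   # up gene at top, down gene at bottom
    print(connection_score(ranks, signature))   # 1.0: maximal positive connection
    ```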

  20. Is Routine Pathologic Evaluation of Sebaceous Cysts Necessary?: A 15-Year Retrospective Review of a Single Institution.

    PubMed

    Gargya, Vipul; Lucas, Heather D; Wendel Spiczka, Amy J; Mahabir, Raman Chaos

    2017-02-01

    A question arose in our practice of whether all cysts considered sebaceous should be sent for pathologic evaluation. To address this controversial topic, we performed a retrospective study of our single institution's histopathology database. A natural-language search of the institution's CoPath database was undertaken using the diagnoses sebaceous cyst, epidermal cyst, epidermoid cyst, epithelial cyst, infundibular cyst, pilar cyst, trichilemmal cyst, and steatocystoma. A surgical pathologic review of all specimens with one of these preexcision diagnoses was included in the 15-year retrospective study covering 1998 to 2013. All slides were confirmed to have undergone histopathologic review, and the preexcision diagnoses were compared with the postexcision diagnoses. Chart review was undertaken in instances of a diagnosis of malignancy. A total of 13,746 samples were identified. Forty-eight specimens had a histopathologic diagnosis of malignancy, an incidence of 0.3%, with squamous cell carcinoma the most common malignancy. Chart review showed that in all such cases the surgeons had reported uncertainty about the diagnosis because of history, physical characteristics, or both. In addition, a comprehensive literature review showed results consistent with our data, with 19 cases reported during the past 10 years, most of them squamous cell carcinoma. We therefore recommend routine pathologic evaluation of sebaceous cysts only when clinical suspicion exists.

  1. Dfam: a database of repetitive DNA based on profile hidden Markov models.

    PubMed

    Wheeler, Travis J; Clements, Jody; Eddy, Sean R; Hubley, Robert; Jones, Thomas A; Jurka, Jerzy; Smit, Arian F A; Finn, Robert D

    2013-01-01

    We present a database of repetitive DNA elements, called Dfam (http://dfam.janelia.org). Many genomes contain a large fraction of repetitive DNA, much of which is made up of remnants of transposable elements (TEs). Accurate annotation of TEs enables research into their biology and can shed light on the evolutionary processes that shape genomes. Identification and masking of TEs can also greatly simplify many downstream genome annotation and sequence analysis tasks. The commonly used TE annotation tools RepeatMasker and Censor depend on sequence homology search tools such as cross_match and BLAST variants, as well as on Repbase, a collection of known TE families each represented by a single consensus sequence. Dfam contains entries corresponding to all Repbase TE entries for which instances have been found in the human genome. Each Dfam entry is represented by a profile hidden Markov model, built from alignments generated using RepeatMasker and Repbase. When used in conjunction with the hidden Markov model search tool nhmmer, Dfam produces a 2.9% increase in coverage over consensus-sequence search methods on a large human benchmark while maintaining low false discovery rates; coverage of the full human genome is 54.5%. The website provides a collection of tools and data views to support improved TE curation and annotation efforts. Dfam is also available for download in flat-file format or in the form of MySQL table dumps.
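
    A minimal sketch of the workflow the entry describes, scanning a sequence file against profile HMMs with HMMER's nhmmer via a small Python wrapper. The file names are hypothetical; --tblout is nhmmer's documented tabular-output option, but treat the exact invocation as an assumption to check against your HMMER version.

    ```python
    import subprocess

    def run_nhmmer(hmm_file, genome_fasta, table_out="hits.tbl"):
        # Assumed invocation: nhmmer --tblout <table> <hmmfile> <seqfile>
        subprocess.run(["nhmmer", "--tblout", table_out, hmm_file, genome_fasta],
                       check=True)
        hits = []
        with open(table_out) as fh:
            for line in fh:
                if not line.startswith("#"):          # skip comment/header lines
                    hits.append(line.split())         # target, query, coords, E-value...
        return hits

    # For example (hypothetical files): run_nhmmer("Dfam.hmm", "chr21.fa")
    ```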

  2. Securely and Flexibly Sharing a Biomedical Data Management System

    PubMed Central

    Wang, Fusheng; Hussels, Phillip; Liu, Peiya

    2011-01-01

    Biomedical database systems need not only to address the issues of managing complex data, but also to provide data security and access control. These include not only system-level security, but also instance-level access control, such as access to documents, schemas, or aggregations of information. The latter is becoming more important as multiple users can share a single scientific data management system to conduct their research, while data have to be protected before they are published or while they remain IP-protected. This problem is challenging because users' needs for data security vary dramatically from one application to another, in terms of whom to share with, which resources are shared, and at what access level. We develop a comprehensive data access framework for the biomedical data management system SciPort. SciPort provides fine-grained, multi-level, space-based access control of resources not only at the object level (documents and schemas), but also at the space level (resource sets aggregated hierarchically). Furthermore, to simplify the management of users and privileges, a customizable role-based user model is developed. The access control is implemented efficiently by integrating access privileges into the backend XML database, so that efficient queries are supported. The secure access approach we take makes it possible for multiple users to share the same biomedical data management system with flexible access management and high data security. PMID:21625285
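
    A generic sketch of space-based access control as described above (not SciPort's actual implementation): a privilege granted on a space applies to everything aggregated beneath it in the hierarchy. The space names, users, and grants are hypothetical.

    ```python
    SPACE_PARENT = {"lab-A/project-1/doc-3": "lab-A/project-1",
                    "lab-A/project-1": "lab-A",
                    "lab-A": None}

    GRANTS = {("alice", "lab-A"): "write",            # role-derived privileges
              ("bob", "lab-A/project-1"): "read"}

    def can_access(user, resource, needed="read"):
        """Walk up the space hierarchy until a grant covers the request."""
        order = {"read": 0, "write": 1}
        node = resource
        while node is not None:
            level = GRANTS.get((user, node))
            if level is not None and order[level] >= order[needed]:
                return True
            node = SPACE_PARENT.get(node)
        return False

    print(can_access("alice", "lab-A/project-1/doc-3", "write"))  # True
    print(can_access("bob", "lab-A/project-1/doc-3", "write"))    # False
    ```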

  3. Overview of EVE - the event visualization environment of ROOT

    NASA Astrophysics Data System (ADS)

    Tadel, Matevž

    2010-04-01

    EVE is a high-level visualization library using ROOT's data-processing, GUI and OpenGL interfaces. It is designed as a framework for object management, offering hierarchical data organization, object interaction, and visualization via GUI and OpenGL representations. Automatic creation of 2D projected views is also supported. On the other hand, it can serve as an event-visualization toolkit satisfying most HEP requirements: visualization of geometry, and of simulated and reconstructed data such as hits, clusters, tracks and calorimeter information. Special classes are available for visualization of raw data. The object-interaction layer allows for easy selection and highlighting of objects and their derived representations (projections) across several views (3D, Rho-Z, R-Phi). Object-specific tooltips are provided in both GUI and GL views. The visual-configuration layer of EVE is built around a database of template objects that can be applied to specific instances of visualization objects to ensure consistent object presentation. The database can be retrieved from a file, edited during framework operation, and stored back to file. The EVE prototype was developed within the ALICE collaboration and was included in ROOT in December 2007. Since then all EVE components have reached maturity. EVE is used as the base of the AliEve visualization framework in ALICE, the Fireworks physics-oriented event display in CMS, and as the visualization engine of FairRoot in FAIR.

  4. True Dopers or Negligent Athletes? An Analysis of Anti-Doping Rule Violations Reported to the World Anti-Doping Agency 2010-2012.

    PubMed

    de Hon, Olivier; van Bottenburg, Maarten

    2017-12-06

    The sanction that an athlete receives when an anti-doping rule violation has been committed depends on the specific circumstances of the case. Anti-doping tribunals decide on the final sanction, following the rules of the World Anti-Doping Code. Our objective was to assess athletes' degree of fault, based on the length of the sanctions imposed on them, in order to feed policy-related discussions. We analysed data from the results-management database of the World Anti-Doping Agency, containing anonymous information on anti-doping rule violations in eight selected sports and covering the years 2010-2012. Four out of ten athletes who committed an anti-doping rule violation received a suspension that was lower than the standard. This is an indication that tribunals in many instances are not convinced that the athletes concerned were completely at fault, that mitigating circumstances were applicable, or that full responsibility for the suspected violation should not be held against them. Anabolic agents, peptide hormones, and hormone modulators lead to higher sanctions, as do combinations of several anti-doping rule violations. This first analysis of information from the World Anti-Doping Agency's results-management database indicates that a large proportion of the athletes who commit anti-doping rule violations may have done so unintentionally. Anti-doping professionals should strive to improve this situation in various ways.

  5. Supervised orthogonal discriminant subspace projects learning for face recognition.

    PubMed

    Chen, Yu; Xu, Xiao-Hong

    2014-02-01

    In this paper, a new linear dimension-reduction method called supervised orthogonal discriminant subspace projection (SODSP) is proposed, which addresses the high dimensionality of data and the small-sample-size problem. More specifically, given a set of data points in the ambient space, a novel weight matrix that describes the relationship between the data points is first built. In order to model the manifold structure, class information is incorporated into the weight matrix. Based on this weight matrix, local and non-local scatter matrices are defined such that the neighborhood structure can be preserved. To enhance recognition ability, we impose an orthogonality constraint on a graph-based maximum-margin analysis, seeking a projection that maximizes the difference, rather than the ratio, between the non-local scatter and the local scatter. In this way, SODSP naturally avoids the singularity problem. Further, we develop an efficient and stable algorithm for implementing SODSP, especially on high-dimensional data sets. Moreover, theoretical analysis shows that locality preserving projections (LPP) is a special instance of SODSP obtained by imposing certain constraints. Experiments on the ORL, Yale, Extended Yale B, and FERET face databases are performed to test and evaluate the proposed algorithm. The results demonstrate the effectiveness of SODSP. Copyright © 2013 Elsevier Ltd. All rights reserved.
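
    A numerical sketch of the core idea (not the authors' full SODSP): find an orthogonal projection maximizing the difference between non-local and local scatter. Eigenvectors of the symmetric difference matrix are mutually orthogonal, so the constraint comes for free. The data and the crude locality weights are stand-ins.

    ```python
    import numpy as np
    from scipy.spatial.distance import pdist, squareform

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 20))                 # 100 samples, 20 features
    X = X - X.mean(axis=0)

    # Hypothetical neighbourhood weights: nearby points count as "local".
    D = squareform(pdist(X))
    W_local = (D < np.median(D)).astype(float)
    W_nonlocal = 1.0 - W_local

    def scatter(X, W):
        """Weighted scatter sum_ij W_ij (x_i - x_j)(x_i - x_j)^T,
        computed (up to a factor of 2) via the graph Laplacian L = D - W."""
        L = np.diag(W.sum(axis=1)) - W
        return X.T @ L @ X

    S_diff = scatter(X, W_nonlocal) - scatter(X, W_local)
    vals, vecs = np.linalg.eigh(S_diff)            # symmetric -> orthogonal vecs
    P = vecs[:, -5:]                               # top-5 discriminant directions
    Z = X @ P                                      # reduced representation
    ```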

  6. Parsing GML data based on integrative GML syntactic and semantic schemas database

    NASA Astrophysics Data System (ADS)

    Miao, Lizhi; Zhang, Shuliang; Lu, Guonian; Gao, Xiaoli; Jiao, Donglai; Gan, Jiayan

    2007-06-01

    This paper proposes a new method for parsing various application schemas of the Geography Markup Language (GML) in order to understand the syntax and semantics of their elements and types, so that the same GML instance data can be interpreted uniformly by diverse users. The proposed method generates an Integrative GML Syntactic and Semantic Schemas Database (IGSSSDB) from the GML 3.1 core schemas and the corresponding application schema. GML data are then parsed against the IGSSSDB, which holds the syntactic and semantic information, nesting information, and mapping rules of the GML core schemas and application schemas. Three kinds of relational tables are designed for storing schema information when constructing the IGSSSDB: tables for the schemas included and namespaces imported by application schemas, tables for information related to the schemas themselves, and catalog tables for the core schemas. Within these tables, we propose using regular expressions to describe the content models of elements and complex types in the schemas, which keeps the models complete and readable. On top of the IGSSSDB we design and develop a set of APIs that implement GML data parsing and can process the syntactic and semantic information of GML data from diverse fields and users. In the latter part of this paper, a test study shows that the proposed method is feasible and appropriate for parsing GML data. It also lays a good foundation for future GML data studies such as storage, indexing and querying.
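
    A small sketch of the IGSSSDB idea: a relational table records, for each GML element, its type and a regular expression describing its content model, against which instance elements can be validated. The table layout, element entry, and regex are hypothetical, for illustration only.

    ```python
    import re
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE element_model (
                      ns TEXT, element TEXT, type TEXT, content_regex TEXT)""")
    db.execute("INSERT INTO element_model VALUES (?, ?, ?, ?)",
               ("gml", "Point", "PointType", r"^(pos)(,name)?$"))

    def validate_children(ns, element, children):
        """Check an instance element's children against the stored content model."""
        row = db.execute("SELECT content_regex FROM element_model "
                         "WHERE ns = ? AND element = ?", (ns, element)).fetchone()
        return row is not None and re.match(row[0], ",".join(children)) is not None

    print(validate_children("gml", "Point", ["pos"]))          # True
    print(validate_children("gml", "Point", ["coordinates"]))  # False under this toy model
    ```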

  7. Environmental drivers of sapwood and heartwood proportions

    NASA Astrophysics Data System (ADS)

    Thurner, Martin; Beer, Christian

    2017-04-01

    Recent advances combining information on stem volume from remote sensing with allometric relationships derived from forest inventory databases have led to spatially continuous estimates of stem, branch, root and foliage biomass in northern boreal and temperate forests. However, the separation of stem biomass into sapwood and heartwood mass has remained unsolved, despite their important differences in biogeochemical function, for instance concerning their contribution to tree respiratory costs. Although relationships between sapwood cross-sectional area and supported leaf area are well established, less is known about the relations between sapwood or heartwood mass and other traits (e.g. stem mass), since these biomass compartments are more difficult to measure in practice. Here we investigate the variability in sapwood and heartwood proportions and the environmental factors that determine them. For this task we explore an available biomass and allometry database (BAAD) and study relative sapwood and heartwood area, volume, mass and density as functions of tree species, age and climate. First, a theoretical framework for estimating sapwood and heartwood mass from stem mass is developed. Subsequently, the underlying assumptions and relationships are explored with the help of the BAAD. The established relationships can be used to derive spatially continuous sapwood and heartwood mass estimates by applying them to remote-sensing-based stem volume products. This would be a fundamental step toward a data-driven estimate of autotrophic respiration.

  8. Using mixed methods when researching communities.

    PubMed

    Ochieng, Bertha M N; Meetoo, Danny

    2015-09-01

    This paper argues for the use of mixed methods when researching communities. Although research involving minority communities is now well advanced, not enough effort has been made to forge methodological links between qualitative and quantitative methods in most studies. For instance, the quantitative approaches used by epidemiologists and others in examining the wellbeing of communities are usually empirical. While the rationale for this is sound, quantitative findings can be expanded with data from in-depth qualitative approaches, such as interviews or observations, which are likely to provide insights into the experiences of people in those communities and their relationships with their wellbeing. Academic databases including The Cochrane Library, MEDLINE, CINAHL, AMED, INTERNURSE, Science Direct, Web of Knowledge and PubMed were comprehensively searched, using an iterative process to identify eligible literature. Using mixed-methods approaches is likely to address the potential drawbacks of individual methods by exploiting the strengths of each at the various stages of research. Combining methods can provide additional ways of looking at a complex problem and improve the understanding of a community's experiences. However, it is important for researchers to use the different methods interactively during their research. The use of qualitative and quantitative methods is likely to enrich our understanding of the interrelationship between wellbeing and the experiences of communities. This should help researchers to explore socio-cultural factors and experiences of health and healthcare practice more effectively.

  9. Applicability of computational systems biology in toxicology.

    PubMed

    Kongsbak, Kristine; Hadrup, Niels; Audouze, Karine; Vinggaard, Anne Marie

    2014-07-01

    Systems biology as a research field has emerged within the last few decades. Systems biology, often defined as the antithesis of the reductionist approach, integrates information about individual components of a biological system. In integrative systems biology, large data sets from various sources and databases are used to model and predict effects of chemicals on, for instance, human health. In toxicology, computational systems biology enables identification of important pathways and molecules from large data sets; tasks that can be extremely laborious when performed by a classical literature search. However, computational systems biology offers more advantages than providing a high-throughput literature search; it may form the basis for establishment of hypotheses on potential links between environmental chemicals and human diseases, which would be very difficult to establish experimentally. This is possible due to the existence of comprehensive databases containing information on networks of human protein-protein interactions and protein-disease associations. Experimentally determined targets of the specific chemical of interest can be fed into these networks to obtain additional information that can be used to establish hypotheses on links between the chemical and human diseases. Such information can also be applied for designing more intelligent animal/cell experiments that can test the established hypotheses. Here, we describe how and why to apply an integrative systems biology method in the hypothesis-generating phase of toxicological research. © 2014 Nordic Association for the Publication of BCPT (former Nordic Pharmacological Society).

  10. The ESA FELYX High Resolution Diagnostic Data Set System Design and Implementation

    NASA Astrophysics Data System (ADS)

    Taberner, M.; Shutler, J.; Walker, P.; Poulter, D.; Piolle, J.-F.; Donlon, C.; Guidetti, V.

    2013-10-01

    Felyx is currently under development and is the latest evolution of a generalised High Resolution Diagnostic Data Set system funded by ESA. It draws on previous prototype developments and experience in the GHRSST, Medspiration, GlobColour and GlobWave projects. In this paper, we outline the design and implementation of the system, and illustrate it using the Ocean Colour demonstration activities. Felyx is fundamentally a tool to facilitate the analysis of EO data; it is being developed by IFREMER, PML and Pelamis. It will be free software written in Python and JavaScript. The aim is to provide Earth Observation data producers and users with an open-source, flexible and reusable tool that allows the quality and performance of data streams from satellite, in situ and model sources to be easily monitored and studied. New to this project is the ability to establish and incorporate multi-sensor match-up database capabilities. The system will be deployable anywhere and will even include interaction mechanisms between the deployed instances. The primary concept of Felyx is to work as an extraction tool. It allows for the extraction of subsets of source data over predefined target areas (which can be static or moving). These data subsets, and associated metrics, can then be accessed by users or client applications either as raw files or through automatic alerts, and can be used to generate periodic reports or for statistical analysis and visualisation through a flexible web interface. Felyx can be used for subsetting, the generation of statistics, the generation of reports or warnings/alerts, and in-depth analyses, to name a few. There are many potential applications, but important foreseen uses are:

    * monitoring and assessing the quality of Earth observations (e.g. satellite products and time series) through statistical analysis and/or comparison with other data sources
    * assessing and inter-comparing geophysical inversion algorithms
    * observing a given phenomenon, collecting and accumulating various parameters over a defined area
    * crossing different sources of data for synergy applications

    The services provided by felyx will be generic, deployable at users' own premises, and flexible, allowing the integration and development of any kind of parameters. Users will be able to operate their own felyx instance at any location, on datasets and parameters of their own interest, and the various instances will be able to interact with each other, creating a web of felyx systems enabling aggregation and cross-comparison of miniProds and metrics from multiple sources. Initially, two instances will be operated simultaneously during a 6-month demonstration phase: at IFREMER, on sea surface temperature and ocean wave datasets, and at PML, on ocean colour.

  11. Connecting Instances to Promote Children's Relational Reasoning

    ERIC Educational Resources Information Center

    Son, Ji Y.; Smith, Linda B.; Goldstone, Robert L.

    2011-01-01

    The practice of learning from multiple instances seems to allow children to learn about relational structure. The experiments reported here focused on two issues regarding relational learning from multiple instances: (a) what kind of perceptual situations foster such learning and (b) how particular object properties, such as complexity and…

  12. An Experimental Comparison of CLOS and C++ Implementations of An Object- Oriented Graphical Simulation of Walking Robot Kinematics

    DTIC Science & Technology

    1993-03-25

    (attachment-angle :accessor leg-attachment-angle)
    (link0 :initform (make-instance 'link0) :accessor link0)
    (link1 :initform (make-instance 'link1) :accessor link1)
    (link2 :initform (make-instance 'link2) :accessor link2)
    (link3 :initform (make-instance 'link3) :accessor link3)
    (motion-complete-flag ...
    (setf (inboard-link (link1 leg)) (link0 leg))
    (setf (inboard-link (link2 leg)) (link1 leg))
    (setf (inboard-link (link3 leg)) (link2 leg))
    (rotate-link (link0 ...

  13. OntoPop: An Ontology Population System for the Semantic Web

    NASA Astrophysics Data System (ADS)

    Thongkrau, Theerayut; Lalitrojwong, Pattarachai

    The development of an ontology at the instance level requires extracting the terms that define the instances from various data sources. These instances are then linked to the concepts of the ontology, and relationships are created between the instances in the next step. However, before establishing links among data, ontology engineers must classify terms or instances from a web document into an ontology concept. The tool that helps ontology engineers in this task is called an ontology population system. Existing approaches are not well suited to ontology-development applications, suffering from long processing times or difficulty analyzing large or noisy data sets. The OntoPop system introduces a methodology to solve these problems, which comprises two parts. First, we select meaningful features from syntactic relations, which produces more significant features than other methods. Second, we differentiate feature meanings and reduce noise based on latent semantic analysis. Experimental evaluation demonstrates that OntoPop works well, achieving an accuracy of 49.64%, a learning accuracy of 76.93%, and an execution time of 5.46 seconds per instance.
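
    A sketch of the second OntoPop step as described above, reducing feature noise with latent semantic analysis via scikit-learn's truncated SVD, then assigning instances to the nearest concept. The instance/feature matrix and concept centroids are random stand-in data, not OntoPop's.

    ```python
    import numpy as np
    from sklearn.decomposition import TruncatedSVD
    from sklearn.metrics.pairwise import cosine_similarity

    rng = np.random.default_rng(0)
    X = rng.poisson(0.3, size=(200, 1000)).astype(float)   # instances x syntactic features
    concepts = rng.poisson(0.3, size=(10, 1000)).astype(float)  # 10 concept centroids

    lsa = TruncatedSVD(n_components=50, random_state=0)
    X_lsa = lsa.fit_transform(X)                 # denoised instance vectors
    C_lsa = lsa.transform(concepts)              # concepts in the same latent space

    assignments = cosine_similarity(X_lsa, C_lsa).argmax(axis=1)
    print(assignments[:10])   # ontology concept index for the first 10 instances
    ```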

  14. Redox shuttle additives for lithium-ion batteries

    DOEpatents

    Zhang, Lu; Zhang, Zhengcheng; Amine, Khalil

    2017-03-21

    An electrolyte includes a compound of Formula I or IA: where each instance of R¹ is independently H, alkyl, alkoxy, alkenyl, aryl, heteroaryl, or cycloalkyl; each instance of R² is independently H, alkyl, alkoxy, alkenyl, aryl, heteroaryl, or cycloalkyl; each instance of R³ is independently H, alkyl, alkenyl, aryl, or cycloalkyl; each instance of R⁴ is independently H, halogen, CN, NO₂, phosphate, alkyl, alkenyl, aryl, heteroaryl, or cycloalkyl; x is 1, 2, 3, 4, or 5; y is 1 or 2; and z is 0, 1, 2, 3, or 4. ##STR00001##

  15. An Entropy Approach to Disclosure Risk Assessment: Lessons from Real Applications and Simulated Domains

    PubMed Central

    Airoldi, Edoardo M.; Bai, Xue; Malin, Bradley A.

    2011-01-01

    We live in an increasingly mobile world, which leads to the duplication of information across domains. Though organizations attempt to obscure the identities of their constituents when sharing information for worthwhile purposes, such as basic research, the uncoordinated nature of such environments can lead to privacy vulnerabilities. For instance, disparate healthcare providers can collect information on the same patient. Federal policy requires that such providers share “de-identified” sensitive data, such as biomedical (e.g., clinical and genomic) records. At the same time, such providers can share identified information, devoid of sensitive biomedical data, for administrative functions. On a provider-by-provider basis, the biomedical and identified records appear unrelated; however, links can be established when multiple providers’ databases are studied jointly. The problem, known as trail disclosure, is a generalized phenomenon and occurs because an individual’s location-access pattern can be matched across the shared databases. Due to technical and legal constraints, it is often difficult to coordinate between providers, and thus it is critical to assess the disclosure risk in distributed environments so that we can develop techniques to mitigate such risks. Research on privacy protection has so far focused on developing technologies to suppress or encrypt identifiers associated with sensitive information. There is a growing body of work on the formal assessment of the disclosure risk of database entries in publicly shared databases, but less attention has been paid to the distributed setting. In this research, we review the trail disclosure problem in several domains with known vulnerabilities and show that disclosure risk is influenced by the distribution of how people visit service providers. Based on empirical evidence, we propose an entropy metric for assessing such risk in shared databases prior to their release. This metric assesses risk by leveraging the statistical characteristics of a visit distribution, as opposed to person-level data. It is computationally efficient and superior to existing risk-assessment methods, which rely on ad hoc assessments that are often computationally expensive and unreliable. We evaluate our approach on a range of location-access patterns in simulated environments. Our results demonstrate that the approach is effective at estimating trail disclosure risks and that the amount of self-information contained in a distributed system is one of the main driving factors. PMID:21647242
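
    A minimal sketch of an entropy-style indicator for a visit distribution: the more evenly visits spread across providers, the higher the entropy and, in this simplified model, the harder it is to match trails across databases. The exact metric in the paper may differ; the counts below are hypothetical.

    ```python
    import math

    def shannon_entropy(visit_counts):
        """Entropy (in bits) of the distribution of visits across providers."""
        total = sum(visit_counts)
        probs = [c / total for c in visit_counts if c > 0]
        return -sum(p * math.log2(p) for p in probs)

    print(shannon_entropy([100, 100, 100, 100]))  # 2.0 bits: even spread
    print(shannon_entropy([397, 1, 1, 1]))        # ~0.08 bits: highly skewed
    ```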

  16. CloudMC: a cloud computing application for Monte Carlo simulation.

    PubMed

    Miras, H; Jiménez, R; Miras, C; Gomà, C

    2013-04-21

    This work presents CloudMC, a cloud computing application, developed in Windows Azure® (the platform of the Microsoft® cloud), for the parallelization of Monte Carlo simulations in a dynamic virtual cluster. CloudMC is a web application designed to be independent of the Monte Carlo code on which the simulations are based; the simulations just need to be of the form: input files → executable → output files. To study the performance of CloudMC in Windows Azure®, Monte Carlo simulations with penelope were performed on different instance (virtual machine) sizes and for different numbers of instances. The instance size was found to have no effect on the simulation runtime. It was also found that the decrease in time with the number of instances followed Amdahl's law, with a slight deviation due to the increase in the fraction of non-parallelizable time with an increasing number of instances. A simulation that would have required 30 h of CPU on a single instance was completed in 48.6 min when executed on 64 instances in parallel (a speedup of 37×). Furthermore, the use of cloud computing for parallel computing offers some advantages over conventional clusters: high accessibility, scalability and pay-per-usage. Therefore, it is strongly believed that cloud computing will play an important role in making Monte Carlo dose calculation a reality in future clinical practice.
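
    A quick check of the Amdahl's-law behaviour reported above: from the observed 37× speedup on 64 instances one can back out the parallelizable fraction f, since S(N) = 1 / ((1 - f) + f / N). The extrapolation to 128 instances is an illustration, not a figure from the paper.

    ```python
    def amdahl_speedup(f, n):
        return 1.0 / ((1.0 - f) + f / n)

    # Solve 37 = 1 / ((1 - f) + f / 64) for f:
    s, n = 1800.0 / 48.6, 64          # 30 h of CPU -> 48.6 min on 64 instances
    f = (1.0 / s - 1.0) / (1.0 / n - 1.0)
    print(f)                          # ~0.988: about 1.2% non-parallelizable time
    print(amdahl_speedup(f, 128))     # diminishing returns: ~52x, not 74x
    ```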

  17. Strategies for generating multiple instances of common and ad hoc categories.

    PubMed

    Vallée-Tourangeau, F; Anthony, S H; Austin, N G

    1998-09-01

    In a free-emission procedure participants were asked to generate instances of a given category and to report, retrospectively, the strategies that they were aware of using in retrieving instances. In two studies reported here, participants generated instances for common categories (e.g. fruit) and for ad hoc categories (e.g., things people keep in their pockets) for 90 seconds and for each category described how they had proceeded in doing so. Analysis of the protocols identified three broad classes of strategy: (1) experiential, where memories of specific or generic personal experiences involving interactions with the category instances acted as cues; (2) semantic, where a consideration of abstract conceptual characteristics of a category were employed to retrieve category exemplars; (3) unmediated, where instances were effortlessly retrieved without mediating cognitions of which subjects were aware. Experiential strategies outnumbered semantic strategies (on average 4 to 1) not only for ad hoc categories but also for common categories. This pattern was noticeably reversed for ad hoc categories that subjects were unlikely to have experienced personally (e.g. things sold on the black market in Russia). Whereas more traditional accounts of semantic memory have favoured decontextualised abstract representations of category knowledge, to the extent that mode of access informs us of knowledge structures, our data suggest that category knowledge is significantly grounded in terms of everyday contexts where category instances are encountered.

  18. Instances selection algorithm by ensemble margin

    NASA Astrophysics Data System (ADS)

    Saidi, Meryem; Bechar, Mohammed El Amine; Settouti, Nesma; Chikh, Mohamed Amine

    2018-05-01

    The main limitation of data mining algorithms is their inability to deal with the huge amount of available data in a reasonable processing time. One way of producing fast and accurate results is instance and feature selection. This process eliminates noisy or redundant data in order to reduce storage and computational cost without degrading performance. In this paper, a new instance selection approach called the Ensemble Margin Instance Selection (EMIS) algorithm is proposed. This approach is based on the ensemble margin. To evaluate our approach, we have conducted several experiments on different real-world classification problems from the UCI Machine Learning Repository. Pixel-based image segmentation is a field in which the storage requirements and computational cost of the applied model become high. To address these limitations, we conducted a study applying EMIS and other instance selection techniques to the segmentation and automatic recognition of white blood cells (WBC; nucleus and cytoplasm) in cytological images.
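
    A sketch of the ensemble-margin idea behind EMIS-style selection (not the authors' exact algorithm): instances on which the ensemble's vote is narrow carry the most information, while instances with near-unanimous votes are cheap to drop. The vote vector is hypothetical.

    ```python
    import numpy as np

    def ensemble_margin(votes, true_label):
        """votes: 1-D array of class labels predicted by each ensemble member.
        Margin = (votes for true class - votes for best other class) / members."""
        labels, counts = np.unique(votes, return_counts=True)
        tally = dict(zip(labels.tolist(), counts.tolist()))
        v_true = tally.get(true_label, 0)
        v_other = max((c for l, c in tally.items() if l != true_label), default=0)
        return (v_true - v_other) / len(votes)

    votes = np.array([0, 0, 0, 1, 1, 0, 2, 0, 0, 0])
    print(ensemble_margin(votes, 0))   # 0.5: moderately confident ensemble
    # Selection keeps low-margin (informative) instances, drops margin ~ 1.0 ones.
    ```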

  19. Learning Instance-Specific Predictive Models

    PubMed Central

    Visweswaran, Shyam; Cooper, Gregory F.

    2013-01-01

    This paper introduces a Bayesian algorithm for constructing predictive models from data that are optimized to predict a target variable well for a particular instance. This algorithm learns Markov blanket models, carries out Bayesian model averaging over a set of models to predict a target variable of the instance at hand, and employs an instance-specific heuristic to locate a set of suitable models to average over. We call this method the instance-specific Markov blanket (ISMB) algorithm. The ISMB algorithm was evaluated on 21 UCI data sets using five different performance measures, and its performance was compared to that of several commonly used predictive algorithms, including naïve Bayes, C4.5 decision tree, logistic regression, neural networks, k-Nearest Neighbor, Lazy Bayesian Rules, and AdaBoost. Over all the data sets, the ISMB algorithm performed better on average, on all performance measures, than all the comparison algorithms. PMID:25045325
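
    A minimal sketch of the model-averaging step described above (not the full ISMB algorithm): predictions of several candidate models are combined, weighted by each model's posterior probability. The weights and probabilities below are hypothetical.

    ```python
    def bma_predict(models):
        """models: list of (posterior_weight, p_target_given_instance)."""
        z = sum(w for w, _ in models)
        return sum(w * p for w, p in models) / z

    candidates = [(0.5, 0.90), (0.3, 0.70), (0.2, 0.40)]
    print(bma_predict(candidates))   # 0.74: a weighted consensus over models
    ```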

  20. PEP725: real time monitoring of phenological events in Austria, Germany, Sweden and Switzerland

    NASA Astrophysics Data System (ADS)

    Ungersboeck, Markus; Bolmgren, Kjell; Huebner, Thomas; Kaspar, Frank; Langvall, Ola; Paul, Anita; Pietragalla, Barbara; Scheifinger, Helfried; Koch, Elisabeth

    2017-04-01

    The main objective of PEP725 (the Pan European Phenological database; http://www.pep725.eu/) is to promote and facilitate phenological research by delivering a pan-European phenological database with open, unrestricted data access for science, research and education. The first datasets in PEP725 date back to 1868; however, only a few observations are available before 1950. From 1951 onwards, phenological networks all over Europe developed rapidly. So far, more than 11 923 489 observations of 121 different plants are available in the PEP725 database. Approximately 40% of all data are flowering records, 10% are fruit-ripeness observations and another 10% are leaf-unfolding observations. The PEP725 database is updated annually. Deutscher Wetterdienst and MeteoSwiss have recently begun offering their observers the option of uploading observations via the web in real time; ZAMG introduced this web-based feature already in 2007 (phenowatch.at), and the observers of SWE-NPN (the Swedish National Phenology Network) have been able to submit their observations through the web application naturenskalender.se since its start in 2008. Since spring 2016, a real-time animated monitoring tool has shown how the "green wave" in spring moves from 46° northern latitude up to the Arctic Circle, and the "brown wave" in autumn in the opposite direction. In 2015 the "green wave" sped up from approximately 4.4 days/degree latitude for hazel flowering to 2.9 days/degree latitude for willow flowering and 2.25 days/degree latitude for birch leaf unfolding. Other European countries, for instance Italy, the Netherlands and the UK, have been producing real-time visualizations of ground phenology for some years, but these efforts always end at the national borders. PEP725 is funded by ZAMG, the Austrian ministry of science, research and economy, and EUMETNET, the network of European meteorological services. So far, 21 European meteorological services and 7 partners from different phenological network operators have joined PEP725.

  1. Signature-based store checking buffer

    DOEpatents

    Sridharan, Vilas; Gurumurthi, Sudhanva

    2015-06-02

    A system and method for optimizing redundant output verification are provided. A hardware-based store fingerprint buffer receives multiple instances of output from multiple instances of computation. The store fingerprint buffer generates a signature from the content included in the multiple instances of output. When a barrier is reached, the store fingerprint buffer uses the signature to verify that the content is error-free.
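
    A software analogy for the hardware mechanism described above: the outputs of redundant computation instances are folded into compact signatures, and only the signatures are compared at the barrier. The details are illustrative, not the patented design.

    ```python
    import hashlib

    def signature(outputs):
        """Fold a stream of (address, value) stores into one fingerprint."""
        h = hashlib.sha256()
        for addr, value in outputs:
            h.update(addr.to_bytes(8, "little"))
            h.update(value.to_bytes(8, "little"))
        return h.hexdigest()

    run_a = [(0x1000, 42), (0x1008, 7)]
    run_b = [(0x1000, 42), (0x1008, 7)]           # redundant instance, same stores
    assert signature(run_a) == signature(run_b)   # barrier check: error-free
    ```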

  2. What Children Recall about a Repeated Event When One Instance Is Different from the Others

    ERIC Educational Resources Information Center

    Connolly, Deborah A.; Gordon, Heidi M.; Woiwod, Dayna M.; Price, Heather L.

    2016-01-01

    This research examined whether a memorable and unexpected change (deviation details) presented during 1 instance of a repeated event facilitated children's memory for that instance and whether a repeated event facilitated children's memory for deviation details. In Experiments 1 and 2, 8-year-olds (N = 167) watched 1 or 4 live magic shows.…

  3. Online ranking by projecting.

    PubMed

    Crammer, Koby; Singer, Yoram

    2005-01-01

    We discuss the problem of ranking instances. In our framework, each instance is associated with a rank or a rating, which is an integer from 1 to k. Our goal is to find a rank-prediction rule that assigns each instance a rank that is as close as possible to the instance's true rank. We discuss a group of closely related online algorithms, analyze their performance in the mistake-bound model, and prove their correctness. We describe two sets of experiments, with synthetic data and with the EachMovie data set for collaborative filtering. In the experiments we performed, our algorithms outperform online algorithms for regression and classification applied to ranking.
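
    A sketch of an online rank-prediction rule in the spirit of this family of algorithms (a PRank-style update): a weight vector plus k-1 ordered thresholds, where the predicted rank is determined by which thresholds the score clears. The data are synthetic; this is an illustration, not the paper's exact algorithm.

    ```python
    import numpy as np

    def prank_fit(X, y, k, epochs=5):
        w = np.zeros(X.shape[1])
        b = np.zeros(k - 1)                      # thresholds b_1 <= ... <= b_{k-1}
        for _ in range(epochs):
            for x, rank in zip(X, y):
                s = w @ x
                pred = 1 + int(np.sum(s - b >= 0))   # rank = thresholds cleared + 1
                if pred != rank:                     # mistake-driven update
                    yr = np.where(np.arange(1, k) < rank, 1, -1)
                    tau = np.where((s - b) * yr <= 0, yr, 0)
                    w += tau.sum() * x
                    b -= tau
        return w, b

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = np.digitize(X @ np.array([1.0, -0.5, 0.2]), [-1, 0, 1]) + 1   # ranks 1..4
    w, b = prank_fit(X, y, k=4)
    print(b)    # learned thresholds stay ordered under this style of update
    ```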

  4. Stalling Tropical Cyclones over the Atlantic Basin

    NASA Astrophysics Data System (ADS)

    Nielsen-Gammon, J. W.; Emanuel, K.

    2017-12-01

    Hurricane Harvey produced massive amounts of rain over southeast Texas and southwest Louisiana. Average storm total rainfall amounts over a 10,000 square mile (26,000 square km) area exceeded 30 inches (750 mm). An important aspect of the storm that contributed to the large rainfall totals was its unusual motion. The storm stalled shortly after making landfall, then moved back offshore before once again making landfall five days later. This storm motion permitted heavy rainfall to occur in the same general area for an extended period of time. The unusual nature of this event motivates an investigation into the characteristics and potential climate change influences on stalled tropical cyclones in the Atlantic basin using the HURDAT 2 storm track database for 1866-2016 and downscaled tropical cyclones driven by simulations of present and future climate. The motion of cyclones is quantified as the size of a circle circumscribing all storm locations during a given length of time. For a three-day period, Harvey remained inside a circle with a radius of 123 km. This ranks within the top 0.6% of slowest-moving historical storm instances. Among the 2% of slowest-moving storm instances prior to Harvey, only 13 involved storms that stalled near the continental United States coast, where they may have produced substantial rainfall onshore while tapping into marine moisture. Only two such storms stalled in the month of September, in contrast to 20 September stalls out of the 36 storms that stalled over the nearby open Atlantic. Just four of the stalled coastal storms were hurricanes, implying a return frequency for such storms of much less than once per decade. The synoptic setting of these storms is examined for common features, and historical and projected trends in occurrences of stalled storms near the coast and farther offshore are investigated.
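
    A rough sketch of the stall metric described above: the radius of a circle containing all track fixes in a 72-hour window, approximated here by half the maximum pairwise great-circle distance (a lower bound on the true circumscribing radius). The track points are hypothetical, not HURDAT 2 fixes.

    ```python
    import math
    from itertools import combinations

    def haversine_km(p, q):
        (la1, lo1), (la2, lo2) = [(math.radians(a), math.radians(b)) for a, b in (p, q)]
        a = (math.sin((la2 - la1) / 2) ** 2
             + math.cos(la1) * math.cos(la2) * math.sin((lo2 - lo1) / 2) ** 2)
        return 6371.0 * 2.0 * math.asin(math.sqrt(a))

    def stall_radius_km(fixes):
        """fixes: (lat, lon) positions within one 72-h window."""
        if len(fixes) < 2:
            return 0.0
        return max(haversine_km(p, q) for p, q in combinations(fixes, 2)) / 2.0

    track = [(28.0, -96.6), (28.5, -96.3), (28.9, -96.0), (28.7, -95.6)]
    print(stall_radius_km(track))   # a small radius indicates a Harvey-like stall
    ```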

  5. Virtual machine provisioning, code management, and data movement design for the Fermilab HEPCloud Facility

    NASA Astrophysics Data System (ADS)

    Timm, S.; Cooper, G.; Fuess, S.; Garzoglio, G.; Holzman, B.; Kennedy, R.; Grassano, D.; Tiradani, A.; Krishnamurthy, R.; Vinayagam, S.; Raicu, I.; Wu, H.; Ren, S.; Noh, S.-Y.

    2017-10-01

    The Fermilab HEPCloud Facility Project has as its goal to extend the current Fermilab facility interface to provide transparent access to disparate resources including commercial and community clouds, grid federations, and HPC centers. This facility enables experiments to perform the full spectrum of computing tasks, including data-intensive simulation and reconstruction. We have evaluated the use of the commercial cloud to provide elasticity to respond to peaks of demand without overprovisioning local resources. Full scale data-intensive workflows have been successfully completed on Amazon Web Services for two High Energy Physics Experiments, CMS and NOνA, at the scale of 58000 simultaneous cores. This paper describes the significant improvements that were made to the virtual machine provisioning system, code caching system, and data movement system to accomplish this work. The virtual image provisioning and contextualization service was extended to multiple AWS regions, and to support experiment-specific data configurations. A prototype Decision Engine was written to determine the optimal availability zone and instance type to run on, minimizing cost and job interruptions. We have deployed a scalable on-demand caching service to deliver code and database information to jobs running on the commercial cloud. It uses the frontiersquid server and CERN VM File System (CVMFS) clients on EC2 instances and utilizes various services provided by AWS to build the infrastructure (stack). We discuss the architecture and load testing benchmarks on the squid servers. We also describe various approaches that were evaluated to transport experimental data to and from the cloud, and the optimal solutions that were used for the bulk of the data transport. Finally, we summarize lessons learned from this scale test, and our future plans to expand and improve the Fermilab HEP Cloud Facility.
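
    A toy version of the choice the Decision Engine is described as making: pick the availability zone and instance type minimizing expected cost, where an interruption adds re-run cost. The prices, interruption probabilities, and penalty model are hypothetical.

    ```python
    def expected_cost(price_per_hour, hours, p_interrupt, rerun_penalty_hours):
        return price_per_hour * (hours + p_interrupt * rerun_penalty_hours)

    offers = [
        {"zone": "us-east-1a", "type": "c4.8xlarge", "price": 0.50, "p_int": 0.15},
        {"zone": "us-west-2b", "type": "c4.8xlarge", "price": 0.55, "p_int": 0.02},
    ]
    best = min(offers, key=lambda o: expected_cost(o["price"], 8.0, o["p_int"], 8.0))
    print(best["zone"], best["type"])   # the cheaper-after-risk option wins
    ```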

  6. Isosteric And Non-Isosteric Base Pairs In RNA Motifs: Molecular Dynamics And Bioinformatics Study Of The Sarcin-Ricin Internal Loop

    PubMed Central

    Havrila, Marek; Réblová, Kamila; Zirbel, Craig L.; Leontis, Neocles B.; Šponer, Jiří

    2013-01-01

    The sarcin-ricin RNA motif (SR motif) is one of the most prominent recurrent RNA building blocks; it occurs in many different RNA contexts and folds autonomously, i.e., in a context-independent manner. In this study, we combined bioinformatics analysis with explicit-solvent molecular dynamics (MD) simulations to better understand the relation between the RNA sequence and the evolutionary patterns of the SR motif. A SHAPE probing experiment was also performed to confirm the fidelity of the MD simulations. We identified 57 instances of the SR motif in a non-redundant subset of the RNA X-ray structure database and analyzed their base-pairing, base-phosphate, and backbone-backbone interactions. We extracted sequences aligned to these instances from large ribosomal RNA alignments to determine the frequency of occurrence of different sequence variants. We then used a simple scoring scheme based on isostericity to suggest 10 sequence variants with a highly variable expected degree of compatibility with the SR motif 3D structure. We carried out MD simulations of SR motifs with these base substitutions. Non-isosteric base substitutions led to unstable structures, but so did isosteric substitutions that were unable to make key base-phosphate interactions. The MD technique explains why some potentially isosteric SR motifs are not realized during evolution. We also found that an inability to form a stable cWW geometry is an important factor in the case of the first base pair of the flexible region of the SR motif. Comparison of structural, bioinformatics, SHAPE probing and MD simulation data reveals that explicit-solvent MD simulations neatly reflect the viability of different sequence variants of the SR motif. Thus, MD simulations can efficiently complement bioinformatics tools in studies of conservation patterns of RNA motifs and provide atomistic insight into the role of their different signature interactions. PMID:24144333

  7. Grid Computing Application for Brain Magnetic Resonance Image Processing

    NASA Astrophysics Data System (ADS)

    Valdivia, F.; Crépeault, B.; Duchesne, S.

    2012-02-01

    This work emphasizes the use of grid computing and web technology for the automatic post-processing of brain magnetic resonance images (MRI) in the context of neuropsychiatric (Alzheimer's disease) research. Post-acquisition image processing is achieved through the interconnection of several individual processes into pipelines. Each process has input and output data ports, options and execution parameters, and performs a single task such as: a) extracting individual image attributes (e.g. dimensions, orientation, center of mass), b) performing image transformations (e.g. scaling, rotation, skewing, intensity standardization, linear and non-linear registration), c) performing image statistical analyses, and d) producing the necessary quality-control images and/or files for user review. The pipelines are built to perform specific sequences of tasks on the alphanumeric data and MRIs contained in our database. The web application is coded in PHP and allows the creation of scripts to create, store and execute pipelines and their instances either on our local cluster or on high-performance computing platforms. To run an instance on an external cluster, the web application opens a communication tunnel through which it copies the necessary files, submits the execution commands and collects the results. We present results of system tests for the processing of a set of 821 brain MRIs from the Alzheimer's Disease Neuroimaging Initiative study via a nonlinear registration pipeline composed of 10 processes. Our results show successful execution on both local and external clusters, and a 4-fold increase in performance when using the external cluster. However, the latter's performance does not scale linearly, as queue waiting times and execution overhead increase with the number of tasks to be executed.
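
    A bare-bones sketch of the pipeline idea described above: single-task processes with input/output ports chained into a sequence, each output feeding the next input. The process names and payloads are hypothetical; the real system wires such steps to MRI files and clusters.

    ```python
    class Process:
        def __init__(self, name, fn):
            self.name, self.fn = name, fn
        def run(self, data):
            out = self.fn(data)
            print(f"{self.name}: {data!r} -> {out!r}")   # quality-control trace
            return out

    pipeline = [
        Process("extract-attributes", lambda d: {**d, "dims": (182, 218, 182)}),
        Process("normalize-intensity", lambda d: {**d, "normalized": True}),
        Process("register-nonlinear", lambda d: {**d, "registered": True}),
    ]

    result = {"subject": "subj-0001"}
    for proc in pipeline:                   # each output feeds the next input
        result = proc.run(result)
    ```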

  8. PlanWorks: A Debugging Environment for Constraint Based Planning Systems

    NASA Technical Reports Server (NTRS)

    Daley, Patrick; Frank, Jeremy; Iatauro, Michael; McGann, Conor; Taylor, Will

    2005-01-01

    Numerous planning and scheduling systems employ underlying constraint reasoning systems. Debugging such systems involves the search for errors in model rules, constraint reasoning algorithms, search heuristics, and the problem instance (initial state and goals). In order to effectively find such problems, users must see why each state or action is in a plan by tracking causal chains back to part of the initial problem instance. They must be able to visualize complex relationships among many different entities and distinguish between those entities easily. For example, a variable can be in the scope of several constraints, as well as part of a state or activity in a plan; the activity can arise as a consequence of another activity and a model rule. Finally, they must be able to track each logical inference made during planning. We have developed PlanWorks, a comprehensive system for debugging constraint-based planning and scheduling systems. PlanWorks assumes a strong transaction model of the entire planning process, including adding and removing parts of the constraint network, variable assignment, and constraint propagation. A planner logs all transactions to a relational database that is tailored to support the queries of specialized views displaying different forms of data (e.g. constraints, activities, resources, and causal links). PlanWorks was specifically developed for the Extensible Universal Remote Operations Planning Architecture (EUROPA(sub 2)) developed at NASA, but the underlying principles behind PlanWorks make it useful for many constraint-based planning systems. The paper is organized as follows. We first describe some fundamentals of EUROPA(sub 2). We then describe PlanWorks' principal components, discuss each component in detail, and describe inter-component navigation features. We close with a discussion of how PlanWorks is used to find model flaws.

  9. GIS Technologies For The New Planetary Science Archive (PSA)

    NASA Astrophysics Data System (ADS)

    Docasal, R.; Barbarisi, I.; Rios, C.; Macfarlane, A. J.; Gonzalez, J.; Arviset, C.; De Marchi, G.; Martinez, S.; Grotheer, E.; Lim, T.; Besse, S.; Heather, D.; Fraga, D.; Barthelemy, M.

    2015-12-01

    Geographical information systems (GIS) are becoming increasingly used in planetary science. GIS are computerised systems for the storage, retrieval, manipulation, analysis, and display of geographically referenced data. Some data stored in the Planetary Science Archive (PSA), for instance a set of Mars Express/Venus Express data, have spatial metadata associated with them. To help users handle and visualise spatial data in GIS applications, the new PSA should support interoperability with interfaces implementing the standards approved by the Open Geospatial Consortium (OGC). These standards are followed in order to develop open interfaces and encodings that allow data to be exchanged with GIS client applications, well-known examples of which are Google Earth and NASA World Wind, as well as open-source tools such as OpenLayers. The technology to store searchable geometrical data already exists within PostgreSQL databases in the form of the PostGIS extension. GeoServer is an existing open-source map server; the instance deployed for the new PSA uses the OGC standards to allow, among other things, the sharing, processing and editing of spatial data through the Web Feature Service (WFS) standard, as well as serving georeferenced map images through the Web Map Service (WMS). The final goal of the new PSA, being developed by the European Space Astronomy Centre (ESAC) Science Data Centre (ESDC), is to create an archive that enables science exploitation of the datasets from ESA's planetary missions. This can be facilitated through the GIS framework, which offers interfaces (both a web GUI and scriptable APIs) that can be used more easily and scientifically by the community, and that will also enable the community to build added-value services on top of the PSA.
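
    A minimal sketch of requesting a georeferenced map image through the OGC WMS GetMap interface that the text says GeoServer exposes. The host and layer name are hypothetical; the query parameters are the standard WMS 1.3.0 set (note that EPSG:4326 in WMS 1.3.0 uses latitude,longitude axis order in BBOX).

    ```python
    from urllib.parse import urlencode

    params = {
        "SERVICE": "WMS", "VERSION": "1.3.0", "REQUEST": "GetMap",
        "LAYERS": "psa:mars_express_footprints",   # hypothetical layer name
        "CRS": "EPSG:4326",
        "BBOX": "-90,-180,90,180",                 # lat/lon order for EPSG:4326
        "WIDTH": "1024", "HEIGHT": "512",
        "FORMAT": "image/png",
    }
    url = "https://example.org/geoserver/wms?" + urlencode(params)
    print(url)   # fetch with any HTTP client to retrieve the rendered map
    ```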

  10. Virtual Machine Provisioning, Code Management, and Data Movement Design for the Fermilab HEPCloud Facility

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Timm, S.; Cooper, G.; Fuess, S.

    The Fermilab HEPCloud Facility Project has as its goal to extend the current Fermilab facility interface to provide transparent access to disparate resources including commercial and community clouds, grid federations, and HPC centers. This facility enables experiments to perform the full spectrum of computing tasks, including data-intensive simulation and reconstruction. We have evaluated the use of the commercial cloud to provide elasticity to respond to peaks of demand without overprovisioning local resources. Full scale data-intensive workflows have been successfully completed on Amazon Web Services for two High Energy Physics Experiments, CMS and NOνA, at the scale of 58000 simultaneous cores. This paper describes the significant improvements that were made to the virtual machine provisioning system, code caching system, and data movement system to accomplish this work. The virtual image provisioning and contextualization service was extended to multiple AWS regions, and to support experiment-specific data configurations. A prototype Decision Engine was written to determine the optimal availability zone and instance type to run on, minimizing cost and job interruptions. We have deployed a scalable on-demand caching service to deliver code and database information to jobs running on the commercial cloud. It uses the frontiersquid server and CERN VM File System (CVMFS) clients on EC2 instances and utilizes various services provided by AWS to build the infrastructure (stack). We discuss the architecture and load testing benchmarks on the squid servers. We also describe various approaches that were evaluated to transport experimental data to and from the cloud, and the optimal solutions that were used for the bulk of the data transport. Finally, we summarize lessons learned from this scale test, and our future plans to expand and improve the Fermilab HEP Cloud Facility.

  11. Application of structure from motion to digitized historical airphotos to document geomorphic change over the past century

    NASA Astrophysics Data System (ADS)

    Roberti, Gioachino; Ward, Brent; van Wyk de Vries, Benjamin; Perotti, Luigi; Giardino, Marco; Friele, Pierre; Clague, John

    2017-04-01

    Topographic modeling is becoming more accessible due to the development of structure-from-motion (SFM) and multi-view stereo (MVS) image-matching algorithms in digital photogrammetry. Many studies utilize SFM-MVS with either UAV or hand-held consumer-grade digital cameras. However, little work has been done using SFM-MVS with digitized historical air photos. Large databases of historical air photos are available in university, public, and government libraries, commonly as paper copies. In many instances, the photos are in poor condition (i.e. deformed by humidity, scratched, or annotated). In addition, the negatives, as well as metadata on the camera and the flight mission, may be missing. Processing such photos using classic stereo-photogrammetry is difficult and in many instances impossible. Yet these photos can provide a valuable archive of geomorphic changes. In this study, we digitized over 1000 vertical air photos of the Mount Meager massif (British Columbia, Canada), acquired during flights between 1947 and 2006. We processed the scans using the commercial SFM-MVS software package PhotoScan. PhotoScan provided high-quality orthophotos (0.42-1.13 m/pixel) and DTMs (1-5 m/pixel). We used the orthophotos to document glacier retreat and deep-seated gravitational deformation over the 60-year photo period. Notably, we reconstructed the geomorphic changes that led to the very large (~50 × 10⁶ m³) 2010 failure of the south flank of Meager Peak, and also documented other unstable areas that might fail catastrophically in the future. This technique can be applied to other photo sets to provide rapid, high-quality cartographic products that allow researchers to track landscape changes over large areas over the past century.

  12. Use of Low-Value Pediatric Services Among the Commercially Insured

    PubMed Central

    Schwartz, Aaron L.; Volerman, Anna; Conti, Rena M.; Huang, Elbert S.

    2016-01-01

    BACKGROUND: Claims-based measures of “low-value” pediatric services could facilitate the implementation of interventions to reduce the provision of potentially harmful services to children. However, few such measures have been developed. METHODS: We developed claims-based measures of 20 services that typically do not improve child health according to evidence-based guidelines (eg, cough and cold medicines). Using these measures and claims from 4.4 million commercially insured US children in the 2014 Truven MarketScan Commercial Claims and Encounters database, we calculated the proportion of children who received at least 1 low-value pediatric service during the year, as well as total and out-of-pocket spending on these services. We report estimates based on "narrow" measures designed to only capture instances of service use that were low-value. To assess the sensitivity of results to measure specification, we also reported estimates based on "broad measures" designed to capture most instances of service use that were low-value. RESULTS: According to the narrow measures, 9.6% of children in our sample received at least 1 of the 20 low-value services during the year, resulting in $27.0 million in spending, of which $9.2 million was paid out-of-pocket (33.9%). According to the broad measures, 14.0% of children in our sample received at least 1 of the 20 low-value services during the year. CONCLUSIONS: According to a novel set of claims-based measures, at least 1 in 10 children in our sample received low-value pediatric services during 2014. Estimates of low-value pediatric service use may vary substantially with measure specification. PMID:27940698

  13. Mutual benefits in academic-service partnership: An integrative review.

    PubMed

    Sadeghnezhad, Maliheh; Heshmati Nabavi, Fatemeh; Najafi, Fereshteh; Kareshki, Hossein; Esmaily, Habibollah

    2018-05-30

    Academic and service institutions face many challenges. Partnership programs are a golden opportunity to achieve mutual benefits and overcome these challenges. Identifying mutual benefits is the cornerstone of forming a successful partnership and a guarantee of its continuity. There are definitions and instances of mutual benefits in the literature on partnership programs, but there is no coherent evidence or clear picture of these benefits. This study was conducted to identify the mutual benefits of academic-service partnerships by analyzing the definitions and instances of such benefits in the literature. An integrative review of key papers regarding mutual benefits in academic-service partnerships was undertaken, guided by the framework described by Whittemore and Knafl. The following databases were searched: MEDLINE, ERIC, Google Scholar, Emerald Insight and Science Direct. The search terms were mutual benefits, mutual gains, mutual interest, mutual expectations, mutual goals, mutual demand, partnership, collaboration, academic-service partnership and academic-service collaboration. Cooper's five-stage integrative review method was used. The quality of the included articles was evaluated, and data were abstracted from them. The analysis was based on the qualitative content analysis of literature suggested by Zhang and Wildemuth. Twenty-eight articles were included in this review. Mutual benefits fall into four categories: synergy in the training and empowerment of human resources; educational improvement; access to shared resources; and facilitation of the production and application of beneficial knowledge in practice. Mutual benefits in an academic-service partnership comprise a range of goals, interests, expectations, and needs of the partner organizations that are achievable and measurable through joint planning and collaboration. We suggest that academic and service policymakers consider these benefits when planning and evaluating partnership programs. Copyright © 2018 Elsevier Ltd. All rights reserved.

  14. Factors influencing the implementation of clinical guidelines for health care professionals: a systematic meta-review.

    PubMed

    Francke, Anneke L; Smit, Marieke C; de Veer, Anke J E; Mistiaen, Patriek

    2008-09-12

    Nowadays more and more clinical guidelines for health care professionals are being developed. However, this does not automatically mean that these guidelines are actually implemented. The aim of this meta-review is twofold: firstly, to gain a better understanding of which factors affect the implementation of guidelines, and secondly, to provide insight into the "state-of-the-art" regarding research within this field. A search of five literature databases and one website was performed to find relevant existing systematic reviews or meta-reviews. Subsequently, a two-step inclusion process was conducted: (1) screening on the basis of references and abstracts and (2) screening based on full-text papers. After that, relevant data from the included reviews were extracted and the methodological quality of the reviews was assessed using the Quality Assessment Checklist for Reviews. Twelve systematic reviews met our inclusion criteria. No previous systematic meta-reviews meeting all our inclusion criteria were found. Two of the twelve reviews scored high on the checklist used, indicating only "minimal" or "minor" flaws. The other ten reviews scored in the lowest or middle ranges, indicating "extensive" or "major" flaws. A substantial proportion (although not all) of the reviews indicates that effective strategies often have multiple components and that the use of one single strategy, such as reminders only or an educational intervention, is less effective. Moreover, characteristics of the guidelines themselves affect actual use. For instance, guidelines that are easy to understand, can easily be tried out, and do not require specific resources have a greater chance of implementation. In addition, characteristics of professionals - e.g., awareness of the existence of the guideline and familiarity with its content - likewise affect implementation. Furthermore, patient characteristics appear to exert influence: for instance, co-morbidity reduces the chance that guidelines are followed. Finally, environmental characteristics may influence guideline implementation. For example, a lack of support from peers or superiors, as well as insufficient staff and time, appear to be the main impediments. Existing reviews describe various factors that influence whether guidelines are actually used. However, the evidence base is still thin, and sound future research - for instance comparing combinations of implementation strategies versus single strategies - is needed.

  15. Application of an artificial intelligence program to therapy of high-risk surgical patients.

    PubMed

    Patil, R S; Adibi, J; Shoemaker, W C

    1996-11-01

    We developed an artificial intelligence program from a large computerized database of hemodynamic and oxygen transport measurements together with prior studies defining survivors' values, outcome predictors, and a branched-chain decision tree. The artificial intelligence program was then tested on the data of 100 survivors and 100 nonsurvivors not used for the development of the program or other analyses. Using the predictor as a surrogate outcome measure, the therapy recommended by the program improved the predicted outcome by 3.16% per therapeutic intervention, while the actual therapy given increased predicted outcome by 1.86% in surviving patients; the recommended therapy improved predicted outcome by 7.9% in nonsurvivors, while the actual therapy given changed predicted outcome by -0.29% in nonsurvivors (p < .05). There were fewer patients whose predicted outcome decreased after recommended treatment (14%) than after the actual therapy given (37%). Review of therapy recommended by the program did not reveal instances of inappropriate or potentially harmful recommendations.

  16. Archive interoperability in the Virtual Observatory

    NASA Astrophysics Data System (ADS)

    Genova, Françoise

    2003-02-01

    The main goals of Virtual Observatory projects are to build interoperability between astronomical on-line services, observatory archives, databases and results published in journals, and to develop tools permitting the best scientific use of the very large data sets stored in observatory archives and produced by large surveys. The different Virtual Observatory projects collaborate to define common exchange standards, which are the key to a truly International Virtual Observatory: for instance, their first common milestone has been a standard allowing exchange of tabular data, called VOTable. The Interoperability Work Area of the European Astrophysical Virtual Observatory project aims at networking European archives, by building a prototype using the CDS VizieR and Aladin tools, and at defining basic rules to help archive providers with interoperability implementation. The prototype is accessible for scientific usage, to get user feedback (and science results!) at an early stage of the project. The ISO archive participates very actively in this endeavour, and more generally in information networking. The ongoing inclusion of the ISO log in SIMBAD will allow higher-level links for users.
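
    For readers unfamiliar with VOTable, the sketch below builds and parses a minimal table with Python's standard library; real VOTable documents carry XML namespaces and far richer metadata, so this is only a schematic of the format's shape.

    ```python
    # A minimal, hand-written VOTable and a bare-bones parse; illustrative only.
    import xml.etree.ElementTree as ET

    votable = """<VOTABLE version="1.3">
      <RESOURCE>
        <TABLE name="demo">
          <FIELD name="ra"  datatype="double" unit="deg"/>
          <FIELD name="dec" datatype="double" unit="deg"/>
          <DATA><TABLEDATA>
            <TR><TD>10.68</TD><TD>41.27</TD></TR>
            <TR><TD>83.82</TD><TD>-5.39</TD></TR>
          </TABLEDATA></DATA>
        </TABLE>
      </RESOURCE>
    </VOTABLE>"""

    root = ET.fromstring(votable)
    names = [f.get("name") for f in root.iter("FIELD")]
    rows = [[float(td.text) for td in tr] for tr in root.iter("TR")]
    print(names, rows)  # ['ra', 'dec'] [[10.68, 41.27], [83.82, -5.39]]
    ```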

  17. A Large-Sample Test of a Semi-Automated Clavicle Search Engine to Assist Skeletal Identification by Radiograph Comparison.

    PubMed

    D'Alonzo, Susan S; Guyomarc'h, Pierre; Byrd, John E; Stephan, Carl N

    2017-01-01

    In 2014, a morphometric capability to search chest radiograph databases by quantified clavicle shape was published to assist skeletal identification. Here, we extend the validation tests conducted by increasing the search universe 18-fold, from 409 to 7361 individuals, to determine whether there is any associated decrease in performance under these more challenging circumstances. The numbers of trials and analysts were also increased, from 17 to 30 skeletons and from two to four examiners, respectively. Elliptical Fourier analysis was conducted on clavicles from each skeleton by each analyst (shadowgrams trimmed from scratch in every instance) and compared to the search universe. Correctly matching individuals were found in shortlists of 10% of the sample 70% of the time. This rate is similar to, although slightly lower than, rates previously found for much smaller samples (80%). Accuracy and reliability are thereby maintained, even when the comparison system is challenged by much larger search universes. © 2016 American Academy of Forensic Sciences.
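
    A toy version of the shortlist search: each clavicle is reduced to a vector of elliptical Fourier coefficients (assumed precomputed here from the trimmed shadowgrams) and the universe is ranked by Euclidean distance to the query; the published tool's normalisation and matching steps are more sophisticated.

    ```python
    # Invented data: rank a search universe of shape descriptors by distance.
    import numpy as np

    rng = np.random.default_rng(0)
    universe = rng.normal(size=(7361, 40))   # e.g., 10 harmonics x 4 coefficients
    query = universe[123] + rng.normal(scale=0.05, size=40)  # noisy re-trace

    d = np.linalg.norm(universe - query, axis=1)
    shortlist = np.argsort(d)[: int(0.10 * len(universe))]  # top-10% shortlist
    print(123 in shortlist)  # True when the correct match lands in the shortlist
    ```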

  18. Redrawing the Map of Great Britain from a Network of Human Interactions

    PubMed Central

    Ratti, Carlo; Sobolevsky, Stanislav; Calabrese, Francesco; Andris, Clio; Reades, Jonathan; Martino, Mauro; Claxton, Rob; Strogatz, Steven H.

    2010-01-01

    Do regional boundaries defined by governments respect the more natural ways that people interact across space? This paper proposes a novel, fine-grained approach to regional delineation, based on analyzing networks of billions of individual human transactions. Given a geographical area and some measure of the strength of links between its inhabitants, we show how to partition the area into smaller, non-overlapping regions while minimizing the disruption to each person's links. We tested our method on the largest non-Internet human network, inferred from a large telecommunications database in Great Britain. Our partitioning algorithm yields geographically cohesive regions that correspond remarkably well with administrative regions, while unveiling unexpected spatial structures that had previously only been hypothesized in the literature. We also quantify the effects of partitioning, showing for instance that a possible secession of Wales from Great Britain would be twice as disruptive to the human network as that of Scotland. PMID:21170390
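
    The paper uses its own partitioning algorithm; as a stand-in, the sketch below applies networkx's greedy modularity communities to a small weighted interaction graph to illustrate the idea of regions that minimise cut links (the place names and weights are invented).

    ```python
    # Stand-in for the paper's method: modularity-based community detection.
    import networkx as nx
    from networkx.algorithms import community

    G = nx.Graph()
    G.add_weighted_edges_from([
        ("Cardiff", "Swansea", 90), ("Cardiff", "Newport", 80),
        ("Swansea", "Newport", 70), ("London", "Reading", 95),
        ("London", "Oxford", 85), ("Reading", "Oxford", 75),
        ("Newport", "Reading", 5),   # weak cross-region tie
    ])

    regions = community.greedy_modularity_communities(G, weight="weight")
    for i, region in enumerate(regions):
        print(i, sorted(region))     # two cohesive regions; the weak tie is cut
    ```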

  19. Does the evidence make a difference in consumer behavior? Sales of supplements before and after publication of negative research results.

    PubMed

    Tilburt, Jon C; Emanuel, Ezekiel J; Miller, Franklin G

    2008-09-01

    To determine if the public consumption of herbs, vitamins, and supplements changes in light of emerging negative evidence, we describe trends in annual US sales of five major supplements in temporal relationship with publication of research from three top US general medical journals published from 2001 through early 2006, together with the number of news citations associated with each publication in the LexisNexis database. In four of five supplements (St. John's wort, echinacea, saw palmetto, and glucosamine), there was little or no change in sales trends after publication of research results. In one instance, however, dramatic changes in sales occurred following publication of data suggesting harm from high doses of vitamin E. Results reporting harm may have a greater impact on supplement consumption than those demonstrating lack of efficacy. In order for clinical trial evidence to influence public behavior, there needs to be a better understanding of the factors that influence the translation of evidence for the public.

  20. Women are underrepresented in computational biology: An analysis of the scholarly literature in biology, computer science and computational biology.

    PubMed

    Bonham, Kevin S; Stefan, Melanie I

    2017-10-01

    While women are generally underrepresented in STEM fields, there are noticeable differences between fields. For instance, the gender ratio in biology is more balanced than in computer science. We were interested in how this difference is reflected in the interdisciplinary field of computational/quantitative biology. To this end, we examined the proportion of female authors in publications from the PubMed and arXiv databases. There are fewer female authors on research papers in computational biology, as compared to biology in general. This is true across authorship position, year, and journal impact factor. A comparison with arXiv shows that quantitative biology papers have a higher ratio of female authors than computer science papers, placing computational biology in between its two parent fields in terms of gender representation. Both in biology and in computational biology, a female last author increases the probability of other authors on the paper being female, pointing to a potential role of female PIs in influencing the gender balance.
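
    The core tabulation reduces to grouped proportions; the toy sketch below computes the share of female authors by authorship position from made-up records (the study inferred gender from author names, a step omitted here).

    ```python
    # Toy tabulation with an already-labelled gender column; invented records.
    import pandas as pd

    authors = pd.DataFrame({
        "position": ["first", "middle", "last", "first", "last", "middle"],
        "female":   [1, 0, 0, 1, 0, 1],
    })
    print(authors.groupby("position")["female"].mean())  # share female by position
    ```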

  1. Designing Domain-Specific HUMS Architectures: An Automated Approach

    NASA Technical Reports Server (NTRS)

    Mukkamala, Ravi; Agarwal, Neha; Kumar, Pramod; Sundaram, Parthiban

    2004-01-01

    The HUMS automation system automates the design of HUMS architectures. The automated design process involves selection of solutions from a large space of designs as well as pure synthesis of designs. Hence the whole objective is to efficiently search for or synthesize designs or parts of designs in the database and to integrate them to form the entire system design. The automation system adopts two approaches in order to produce the designs: (a) a bottom-up approach and (b) a top-down approach. Both approaches are endowed with a suite of qualitative and quantitative techniques that enable a) the selection of matching component instances, b) the determination of design parameters, c) the evaluation of candidate designs at component level and at system level, d) the performance of cost-benefit analyses, e) the performance of trade-off analyses, etc. In short, the automation system attempts to capitalize on the knowledge developed from years of experience in engineering, system design and operation of HUMS systems in order to economically produce optimal, domain-specific designs.

  2. A fatigue monitoring system based on time-domain and frequency-domain analysis of pulse data

    NASA Astrophysics Data System (ADS)

    Shen, Jiaai

    2018-04-01

    Fatigue is a problem that nearly everyone faces and that everyone dislikes. If people's fatigue could be detected and they could be alerted to their tiredness, dangers such as traffic accidents and sudden death could be effectively reduced and the operation of equipment while fatigued avoided; people could also keep track of their own and others' physical condition in time to alternate work with rest. This article develops a wearable bracelet based on FFT pulse frequency spectrum analysis and on the standard deviation and range of the inter-beat interval, using people's heart rate (BPM) and inter-beat interval (IBI) measured while tired and while alert. The hardware is based on an Arduino, a pulse rate sensor, and a Bluetooth module, and the software relies on a networked micro-database and an app. By running sample experiments to obtain more accurate reference values for judging tiredness, we show that people's fatigue condition can be judged from heart rate (BPM) and inter-beat interval (IBI).
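
    A minimal sketch of the signal statistics involved, on synthetic data: inter-beat intervals from beat timestamps, BPM, the IBI standard deviation and range, and an FFT spectrum of a pulse waveform.

    ```python
    import numpy as np

    beats = np.array([0.00, 0.82, 1.66, 2.46, 3.30, 4.10])   # beat times, s
    ibi = np.diff(beats)                                     # inter-beat intervals
    bpm = 60.0 / ibi.mean()
    print(f"BPM={bpm:.1f}  IBI sd={ibi.std():.3f} s  range={np.ptp(ibi):.3f} s")

    fs = 50.0                                  # pulse sensor sampling rate, Hz
    t = np.arange(0, 10, 1 / fs)
    pulse = np.sin(2 * np.pi * 1.2 * t)        # synthetic ~72 bpm pulse wave
    spectrum = np.abs(np.fft.rfft(pulse))
    freqs = np.fft.rfftfreq(len(pulse), 1 / fs)
    print(f"dominant frequency: {freqs[spectrum.argmax()]:.2f} Hz")
    ```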

  3. RNA 3D Structural Motifs: Definition, Identification, Annotation, and Database Searching

    NASA Astrophysics Data System (ADS)

    Nasalean, Lorena; Stombaugh, Jesse; Zirbel, Craig L.; Leontis, Neocles B.

    Structured RNA molecules resemble proteins in the hierarchical organization of their global structures, folding, and broad range of functions. Structured RNAs are composed of recurrent modular motifs that play specific functional roles. Some motifs direct the folding of the RNA or stabilize the folded structure through tertiary interactions. Others bind ligands or proteins or catalyze chemical reactions. Therefore, it is desirable, starting from the RNA sequence, to be able to predict the locations of recurrent motifs in RNA molecules. Conversely, the potential occurrence of one or more known 3D RNA motifs may indicate that a genomic sequence codes for a structured RNA molecule. To identify known RNA structural motifs in new RNA sequences, precise structure-based definitions are needed that specify the core nucleotides of each motif and their conserved interactions. By comparing instances of each recurrent motif and applying base pair isostericity relations, one can identify neutral mutations that preserve its structure and function in the contexts in which it occurs.

  4. Correlation analysis between the occurrence of ionospheric scintillation at the magnetic equator and at the southern peak of the Equatorial Ionization Anomaly

    NASA Astrophysics Data System (ADS)

    de Lima, G. R. T.; Stephany, S.; de Paula, E. R.; Batista, I. S.; Abdu, M. A.; Rezende, L. F. C.; Aquino, M. G. S.; Dutra, A. P. S.

    2014-06-01

    Ionospheric scintillation refers to amplitude and phase fluctuations in radio signals due to electron density irregularities associated with structures known as ionospheric plasma bubbles. The phenomenon is more pronounced around the magnetic equator where, after sunset, plasma bubbles of varying sizes and density depletions are generated by plasma instability mechanisms. The bubble depletions are aligned along Earth's magnetic field lines, and they develop vertically upward over the magnetic equator so that their extremities extend in latitude to the north and south of the dip equator. Over Brazil, developing bubbles can extend to the southern peak of the Equatorial Ionization Anomaly, where high levels of ionospheric scintillation are common. Scintillation may seriously affect satellite navigation systems, such as the Global Navigation Satellite Systems. However, its effects may be mitigated by using a predictive model derived from a collection of extended databases on scintillation and its associated variables. This work proposes the use of a classification and regression decision tree to perform a study of the correlation between the occurrence of scintillation at the magnetic equator and that at the southern peak of the equatorial anomaly. Due to the limited size of the original database, a novel resampling heuristic was applied to generate new training instances from the original ones in order to improve the accuracy of the decision tree. The correlation analysis presented in this work may serve as a starting point for the eventual development of a predictive model suitable for operational use.
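
    The resampling heuristic itself is the paper's own; the sketch below shows the general shape of the approach with a stand-in: augmenting a small training set with jittered copies of its instances before fitting a classification tree. All data are synthetic.

    ```python
    # Stand-in resampling: jittered copies of scarce training instances.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(1)
    X = rng.normal(size=(60, 3))                    # small original database
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # scintillation yes/no

    # generate 5 jittered copies of each training instance
    X_aug = np.vstack([X + rng.normal(scale=0.05, size=X.shape) for _ in range(5)])
    y_aug = np.tile(y, 5)

    clf = DecisionTreeClassifier(max_depth=4, random_state=0)
    clf.fit(np.vstack([X, X_aug]), np.concatenate([y, y_aug]))
    print(clf.score(X, y))
    ```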

  5. Exact and approximate graph matching using random walks.

    PubMed

    Gori, Marco; Maggini, Marco; Sarti, Lorenzo

    2005-07-01

    In this paper, we propose a general framework for graph matching which is suitable for different problems of pattern recognition. The pattern representation we assume is at the same time highly structured, as in classic syntactic and structural approaches, and of a subsymbolic nature with real-valued features, as in connectionist and statistical approaches. We show that random walk based models, inspired by Google's PageRank, give rise to a spectral theory that nicely enhances the graph topological features at node level. As a straightforward consequence, we derive a polynomial algorithm for the classic graph isomorphism problem, under the restriction of dealing with Markovian spectrally distinguishable graphs (MSD), a class of graphs that does not seem to be easily reducible to others proposed in the literature. The experimental results that we found on different test-beds of the TC-15 graph database show that the defined MSD class "almost always" covers the database, and that the proposed algorithm is significantly more efficient than the top-scoring VF algorithm on the same data. Most interestingly, the proposed approach is very well suited to partial and approximate graph matching problems, derived for instance from image retrieval tasks. We consider the objects of the COIL-100 visual collection and provide a graph-based representation whose node labels contain appropriate visual features. We show that the adoption of classic bipartite graph matching algorithms offers a straightforward generalization of the algorithm given for graph isomorphism and, finally, we report very promising experimental results on the COIL-100 visual collection.
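
    A toy rendering of the random-walk idea (not the paper's full algorithm): compute the stationary distribution of a PageRank-style walk on each graph; for isomorphic graphs the sorted node scores coincide, and matching scores suggest node correspondences.

    ```python
    import numpy as np

    def stationary(adj, damping=0.85, iters=200):
        """Stationary distribution of a PageRank-style random walk."""
        n = adj.shape[0]
        P = adj / adj.sum(axis=1, keepdims=True)   # row-stochastic transitions
        pi = np.full(n, 1.0 / n)
        for _ in range(iters):
            pi = damping * pi @ P + (1 - damping) / n
        return pi

    A = np.array([[0, 1, 1, 0], [1, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 0]], float)
    perm = np.array([2, 0, 3, 1])
    B = A[np.ix_(perm, perm)]                      # an isomorphic copy of A

    pa, pb = stationary(A), stationary(B)
    print(np.allclose(np.sort(pa), np.sort(pb)))   # True: node scores match
    ```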

  6. High-Performance Computational Analysis of Glioblastoma Pathology Images with Database Support Identifies Molecular and Survival Correlates.

    PubMed

    Kong, Jun; Wang, Fusheng; Teodoro, George; Cooper, Lee; Moreno, Carlos S; Kurc, Tahsin; Pan, Tony; Saltz, Joel; Brat, Daniel

    2013-12-01

    In this paper, we present a novel framework for microscopic image analysis of nuclei, data management, and high performance computation to support translational research involving nuclear morphometry features, molecular data, and clinical outcomes. Our image analysis pipeline consists of nuclei segmentation and feature computation facilitated by high performance computing with coordinated execution on multi-core CPUs and Graphics Processing Units (GPUs). All data derived from image analysis are managed in a spatial relational database supporting highly efficient scientific queries. We applied our image analysis workflow to 159 glioblastomas (GBM) from The Cancer Genome Atlas dataset. With integrative studies, we found that statistics of four specific nuclear features were significantly associated with patient survival. Additionally, we correlated nuclear features with molecular data and found interesting results that support pathologic domain knowledge. We found that Proneural subtype GBMs had the smallest mean nuclear Eccentricity and the largest mean nuclear Extent and MinorAxisLength. We also found that gene expression of the stem cell marker MYC and the cell proliferation marker MKI67 was correlated with nuclear features. To complement and inform pathologists of relevant diagnostic features, we queried the most representative nuclear instances from each patient population based on genetic and transcriptional classes. Our results demonstrate that specific nuclear features carry prognostic significance and associations with transcriptional and genetic classes, highlighting the potential of high throughput pathology image analysis as a complementary approach to human-based review and translational research.

  7. When overweight is the normal weight: an examination of obesity using a social media internet database.

    PubMed

    Kuebler, Meghan; Yom-Tov, Elad; Pelleg, Dan; Puhl, Rebecca M; Muennig, Peter

    2013-01-01

    Using a large social media database, Yahoo Answers, we explored postings to an online forum in which posters asked whether their height and weight qualify them as "skinny," "thin," "fat," or "obese," over time and across forum topics. We used these data to better understand whether a higher-than-average body mass index (BMI) in one's county might, in some ways, be protective for one's mental and physical health. For instance, we explored whether higher proportions of obese people in one's county predict lower levels of bullying or of "am I fat?" questions from those whose actual BMI is normal. Most women asking whether they were themselves fat/obese were not actually fat/obese. Both men and women who were actually overweight/obese were significantly more likely in the future to ask for advice about bullying than thinner individuals. Moreover, as mean county-level BMI increased, bullying decreased and then increased again (in a U-shaped curve). Regardless of where they lived, posters who asked "am I fat?" and had a BMI in the healthy range were more likely than other posters to subsequently post on health problems, but the proportions of such posters also declined greatly as county-level BMI increased. Our findings suggest that obese people residing in counties with higher levels of BMI may have better physical and mental health than obese people living in counties with lower levels of BMI by some measures, but these improvements are modest.

  8. Beyond the ridge pattern: multi-informative analysis of latent fingermarks by MALDI mass spectrometry.

    PubMed

    Francese, S; Bradshaw, R; Ferguson, L S; Wolstenholme, R; Clench, M R; Bleay, S

    2013-08-07

    After over a century, fingerprints are still one of the most powerful means of biometric identification. The conventional forensic workflow for suspect identification consists of (i) recovering latent marks from crime scenes using the appropriate enhancement technique and (ii) obtaining an image of the mark to compare against known suspect prints and/or to search in a fingerprint database. The suspect is identified through matching the ridge pattern and local characteristics of the ridge pattern (minutiae). However successful, this process may fail in a number of scenarios; these include the recovery of partial, distorted or smudged marks, poor image quality resulting from inadequacy of the enhancement technique applied, extensive scarring/abrasion of the fingertips, or absence of the suspect's fingerprint records in the database. In all of these instances it would be very desirable to have a technology able to provide additional information from a fingermark by exploiting its endogenous and exogenous chemical content. This opportunity could potentially provide new investigative leads, especially when the fingermark comparison and match process fails. We have demonstrated that Matrix Assisted Laser Desorption Ionisation Mass Spectrometry and Mass Spectrometry Imaging (MALDI MSI) can provide multiple images of the same fingermark in one analysis, simultaneously yielding additional intelligence. Here, a review of the pioneering use and development of MALDI MSI for the analysis of latent fingermarks is presented, along with the latest achievements in the forensic intelligence it can retrieve.

  9. Upper Pleistocene uplifted shorelines as tracers of (local rather than global) subduction dynamics

    NASA Astrophysics Data System (ADS)

    Henry, Hadrien; Regard, Vincent; Pedoja, Kevin; Husson, Laurent; Martinod, Joseph; Witt, Cesar; Heuret, Arnauld

    2014-08-01

    Past studies have shown that high coastal uplift rates are restricted to active areas, especially in a subduction context. The origin of coastal uplift in subduction zones, however, has not yet been globally investigated. Quaternary shorelines correlated to the last interglacial maximum (MIS 5e) were defined as a global tectonic benchmark (Pedoja et al., 2011). In order to investigate the relationships between vertical motion and subduction dynamic parameters, we cross-linked this coastal uplift database with the “geodynamical” databases from Heuret (2005), Conrad and Husson (2009) and Müller et al. (2008). Our statistical study shows that: (1) the parameters one might most intuitively hold responsible for coastal uplift (e.g., subduction obliquity, trench motion, oceanic crust age, interplate friction and force, convergence variation, dynamic topography, overriding and subducted plate velocity) are not related to the uplift (or its magnitude); (2) the only intuitive parameter is the distance to the trench, with uplift in specific areas decreasing away from the trench up to a distance of ~300 km; (3) the slab dip (especially the deep slab dip), the position along the trench and the overriding plate tectonic regime are correlated with the coastal uplift, probably reflecting transient changes in subduction parameters. Finally, we conclude that the first-order parameter explaining coastal uplift is small-scale heterogeneity of the subducting plate, for instance subducting aseismic ridges. The influence of the large-scale geodynamic setting of subduction zones is secondary.

  10. Patient safety in palliative care: A mixed-methods study of reports to a national database of serious incidents.

    PubMed

    Yardley, Iain; Yardley, Sarah; Williams, Huw; Carson-Stevens, Andrew; Donaldson, Liam J

    2018-06-01

    Patients receiving palliative care are vulnerable to patient safety incidents, but little is known about the extent of harm caused or the origins of unsafe care in this population. To quantify and qualitatively analyse serious incident reports in order to understand the causes and impact of unsafe care in a population receiving palliative care. A mixed-methods approach was used. Following quantification of the types of incidents and their locations, a qualitative analysis using a modified framework method was used to interpret themes in reports and examine the underlying causes and the nature of resultant harms. Reports were drawn from a national database of 'serious incidents requiring investigation' involving patients receiving palliative care in the National Health Service (NHS) in England during the 12-year period April 2002 to March 2014. A total of 475 reports were identified: 266 related to pressure ulcers, 91 to medication errors, 46 to falls, 21 to healthcare-associated infections (HCAIs), 18 were other instances of disturbed dying, 14 were allegations against health professionals, 8 transfer incidents, 6 suicides and 5 other concerns. The frequency of report types differed according to the care setting. Underlying causes included lack of palliative care experience, under-resourcing and poor service coordination. Resultant harms included worsened symptoms, disrupted dying, serious injury and hastened death. Unsafe care presents a risk of significant harm to patients receiving palliative care. Improvements in the coordination of care delivery, alongside wider availability of specialist palliative care support, may reduce this risk.

  11. Derivation and analysis of cross relations of photosynthesis and respiration across FLUXNET sites for model improvement

    NASA Astrophysics Data System (ADS)

    Lasslop, G.; Reichstein, M.; Papale, D.; Richardson, A. D.

    2009-12-01

    The FLUXNET database provides measurements of the net ecosystem exchange (NEE) of carbon across vegetation types and climate regions. To simplify the interpretation in terms of processes, the net exchange is frequently split into its two main components: gross primary production (GPP) and ecosystem respiration (Reco). A strong relation between these two fluxes derived from eddy covariance data was found across temporal scales and is to be expected, as variation in recent photosynthesis is known to be correlated with root respiration; plants use energy from photosynthesis to drive their metabolism. At long time scales, substrate availability (constrained by past productivity) limits whole-ecosystem respiration. Previous studies exploring this relationship relied on GPP and Reco estimates derived from the same data, which may lead to spurious correlation that must not be interpreted ecologically. In this study we use two estimates derived from disjunct datasets, one based on daytime data and the other on nighttime data, and explore the reliability and robustness of this relationship. We find distinct relationships between the two fluxes, varying between vegetation types but also across temporal and spatial scales. We also infer that the spatial and temporal variability of net ecosystem exchange is driven by GPP in many cases; exceptions to this rule include, for example, disturbed sites. We advocate that for model calibration and evaluation not only the fluxes themselves but also robust patterns between fluxes that can be extracted from the database, for instance between the flux components, should be considered.
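
    In the spirit of standard nighttime-based partitioning (a stand-in, not this study's exact method), the sketch fits a simple exponential temperature response to synthetic nighttime NEE, where NEE equals Reco, then extrapolates Reco to daytime and takes GPP = Reco - NEE.

    ```python
    # Stand-in flux partitioning on synthetic data.
    import numpy as np
    from scipy.optimize import curve_fit

    def reco_model(T, rb, b):
        return rb * np.exp(b * T)            # simple exponential T response

    rng = np.random.default_rng(3)
    T_night = rng.uniform(2, 15, 200)
    nee_night = reco_model(T_night, 2.0, 0.08) + rng.normal(0, 0.2, 200)

    (rb, b), _ = curve_fit(reco_model, T_night, nee_night, p0=(1.0, 0.05))

    T_day, nee_day = 20.0, -12.0             # midday observation (uptake < 0)
    reco_day = reco_model(T_day, rb, b)      # respiration extrapolated to daytime
    gpp_day = reco_day - nee_day             # GPP from the disjoint estimate
    print(f"Reco={reco_day:.2f}, GPP={gpp_day:.2f} (umol m-2 s-1)")
    ```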

  12. Challenges in the Development and Evolution of Secure Open Architecture Command and Control Systems (Briefing Charts)

    DTIC Science & Technology

    2013-06-01

    Design-time architecture for an OA (open architecture) system: browser, email, widget, DB, OS. Example instance architectures of functionally similar components compatible with the OA system design: Chrome, Gmail and Google applications; Firefox, AbiWord, Evolution and Fedora (GPL); Firefox with Google Calendar and Google Docs on Fedora.

  13. Spin Glass Patch Planting

    NASA Technical Reports Server (NTRS)

    Wang, Wenlong; Mandra, Salvatore; Katzgraber, Helmut G.

    2016-01-01

    In this paper, we propose a patch planting method for creating arbitrarily large spin glass instances with known ground states. The scaling of the computational complexity of these instances with various block numbers and sizes is investigated and compared with random instances using population annealing Monte Carlo and the quantum annealing DW2X machine. The method can be useful for benchmarking tests for future generation quantum annealing machines, classical and quantum mechanical optimization algorithms.
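
    For intuition only, here is a far simpler planting than the paper's patch scheme (a Mattis-style construction): choosing couplings J_ij = s*_i s*_j makes the chosen configuration s* an exact ground state, which a brute-force scan confirms on a tiny instance.

    ```python
    import numpy as np

    rng = np.random.default_rng(7)
    n = 8
    s_star = rng.choice([-1, 1], size=n)             # planted configuration
    J = np.triu(np.outer(s_star, s_star), k=1)       # couplings on i<j pairs

    def energy(s):
        return -np.sum(J * np.outer(s, s))           # H = -sum_{i<j} J_ij s_i s_j

    # brute-force check over all 2^n spin configurations
    best = min(
        energy(np.array([1 if c == "1" else -1 for c in np.binary_repr(k, n)]))
        for k in range(2 ** n)
    )
    print(energy(s_star), best)                      # planted state attains the minimum
    ```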

  14. Reducing Annotation Effort Using Generalized Expectation Criteria

    DTIC Science & Technology

    2007-11-30

    constraints additionally consider input variables. Active learning is a related problem in which the learner can choose the particular instances to be labeled. In pool-based active learning [Cohn et al., 1994], the learner has access to a set of unlabeled instances and can choose the instance that has the highest expected utility according to some metric. A standard pool-based active learning method is uncertainty sampling [Lewis and Catlett, 1994].
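
    A minimal pool-based uncertainty sampling loop in the sense of Lewis and Catlett, on synthetic data: repeatedly query the pool instance whose predicted probability is closest to 0.5.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(4)
    X_pool = rng.normal(size=(500, 2))
    y_pool = (X_pool[:, 0] + X_pool[:, 1] > 0).astype(int)   # labeling oracle

    # seed set with both classes present
    labeled = list(np.where(y_pool == 0)[0][:5]) + list(np.where(y_pool == 1)[0][:5])
    for _ in range(20):                        # 20 active-learning queries
        clf = LogisticRegression().fit(X_pool[labeled], y_pool[labeled])
        proba = clf.predict_proba(X_pool)[:, 1]
        uncertainty = -np.abs(proba - 0.5)     # higher = closer to the boundary
        uncertainty[labeled] = -np.inf         # never re-query labeled points
        labeled.append(int(uncertainty.argmax()))

    print(clf.score(X_pool, y_pool))
    ```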

  15. Maritime Analytics Prototype: Final Development Report

    DTIC Science & Technology

    2014-04-01

    ... the open source authentication and access management platform OpenAM, support for multiple instances of the same type of widget, and support for installation-specific configuration files.

  16. 2012 Workplace and Gender Relations Survey of Reserve Component Members. Tabulation of Responses

    DTIC Science & Technology

    2013-05-08

    Sexual coercion can be defined as classic quid pro quo: instances of special treatment or favoritism conditional on sexual cooperation, or of job benefits or losses conditioned on sexual cooperation. Respondents were asked to indicate how often they had experienced such behaviors (constructed from Q56k-l and o-p).

  17. Phenotype Instance Verification and Evaluation Tool (PIVET): A Scaled Phenotype Evidence Generation Framework Using Web-Based Medical Literature.

    PubMed

    Henderson, Jette; Ke, Junyuan; Ho, Joyce C; Ghosh, Joydeep; Wallace, Byron C

    2018-05-04

    Researchers are developing methods to automatically extract clinically relevant and useful patient characteristics from raw healthcare datasets. These characteristics, often capturing essential properties of patients with common medical conditions, are called computational phenotypes. Being generated by automated or semiautomated, data-driven methods, such potential phenotypes need to be validated as clinically meaningful (or not) before they are acceptable for use in decision making. The objective of this study was to present Phenotype Instance Verification and Evaluation Tool (PIVET), a framework that uses co-occurrence analysis on an online corpus of publicly available medical journal articles to build clinical relevance evidence sets for user-supplied phenotypes. PIVET adopts a conceptual framework similar to the pioneering prototype tool PheKnow-Cloud that was developed for the phenotype validation task. PIVET completely refactors each part of the PheKnow-Cloud pipeline to deliver vast improvements in speed without sacrificing the quality of the insights PheKnow-Cloud achieved. PIVET leverages indexing in NoSQL databases to efficiently generate evidence sets. Specifically, PIVET uses a succinct representation of the phenotypes that corresponds to the index on the corpus database and an optimized co-occurrence algorithm inspired by the Aho-Corasick algorithm. We compare PIVET's phenotype representation with PheKnow-Cloud's by using PheKnow-Cloud's experimental setup. In PIVET's framework, we also introduce a statistical model trained on domain expert-verified phenotypes to automatically classify phenotypes as clinically relevant or not. Additionally, we show how the classification model can be used to examine user-supplied phenotypes in an online, rather than batch, manner. PIVET maintains the discriminative power of PheKnow-Cloud in terms of identifying clinically relevant phenotypes for the same corpus with which PheKnow-Cloud was originally developed, but PIVET's analysis is an order of magnitude faster than that of PheKnow-Cloud. Not only is PIVET much faster, it can be scaled to a larger corpus and still retain speed. We evaluated multiple classification models on top of the PIVET framework and found ridge regression to perform best, realizing an average F1 score of 0.91 when predicting clinically relevant phenotypes. Our study shows that PIVET improves on the most notable existing computational tool for phenotype validation in terms of speed and automation and is comparable in terms of accuracy. ©Jette Henderson, Junyuan Ke, Joyce C Ho, Joydeep Ghosh, Byron C Wallace. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 04.05.2018.
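
    The co-occurrence step can be pictured with the third-party pyahocorasick package (an assumption for illustration, not necessarily PIVET's implementation): build an automaton over phenotype terms and scan article text in a single pass.

    ```python
    import ahocorasick  # pip install pyahocorasick

    phenotype_terms = ["hypertension", "ace inhibitor", "renal failure"]

    automaton = ahocorasick.Automaton()
    for term in phenotype_terms:
        automaton.add_word(term, term)
    automaton.make_automaton()

    article = "patients with hypertension on an ace inhibitor were followed"
    hits = {term for _, term in automaton.iter(article.lower())}
    print(hits)  # terms found together in one article -> co-occurrence evidence
    ```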

  18. Phenotype Instance Verification and Evaluation Tool (PIVET): A Scaled Phenotype Evidence Generation Framework Using Web-Based Medical Literature

    PubMed Central

    Ke, Junyuan; Ho, Joyce C; Ghosh, Joydeep; Wallace, Byron C

    2018-01-01

    Background Researchers are developing methods to automatically extract clinically relevant and useful patient characteristics from raw healthcare datasets. These characteristics, often capturing essential properties of patients with common medical conditions, are called computational phenotypes. Being generated by automated or semiautomated, data-driven methods, such potential phenotypes need to be validated as clinically meaningful (or not) before they are acceptable for use in decision making. Objective The objective of this study was to present Phenotype Instance Verification and Evaluation Tool (PIVET), a framework that uses co-occurrence analysis on an online corpus of publicly available medical journal articles to build clinical relevance evidence sets for user-supplied phenotypes. PIVET adopts a conceptual framework similar to the pioneering prototype tool PheKnow-Cloud that was developed for the phenotype validation task. PIVET completely refactors each part of the PheKnow-Cloud pipeline to deliver vast improvements in speed without sacrificing the quality of the insights PheKnow-Cloud achieved. Methods PIVET leverages indexing in NoSQL databases to efficiently generate evidence sets. Specifically, PIVET uses a succinct representation of the phenotypes that corresponds to the index on the corpus database and an optimized co-occurrence algorithm inspired by the Aho-Corasick algorithm. We compare PIVET’s phenotype representation with PheKnow-Cloud’s by using PheKnow-Cloud’s experimental setup. In PIVET’s framework, we also introduce a statistical model trained on domain expert–verified phenotypes to automatically classify phenotypes as clinically relevant or not. Additionally, we show how the classification model can be used to examine user-supplied phenotypes in an online, rather than batch, manner. Results PIVET maintains the discriminative power of PheKnow-Cloud in terms of identifying clinically relevant phenotypes for the same corpus with which PheKnow-Cloud was originally developed, but PIVET’s analysis is an order of magnitude faster than that of PheKnow-Cloud. Not only is PIVET much faster, it can be scaled to a larger corpus and still retain speed. We evaluated multiple classification models on top of the PIVET framework and found ridge regression to perform best, realizing an average F1 score of 0.91 when predicting clinically relevant phenotypes. Conclusions Our study shows that PIVET improves on the most notable existing computational tool for phenotype validation in terms of speed and automation and is comparable in terms of accuracy. PMID:29728351

  19. DMET-analyzer: automatic analysis of Affymetrix DMET data.

    PubMed

    Guzzi, Pietro Hiram; Agapito, Giuseppe; Di Martino, Maria Teresa; Arbitrio, Mariamena; Tassone, Pierfrancesco; Tagliaferri, Pierosandro; Cannataro, Mario

    2012-10-05

    Clinical bioinformatics is currently growing and is based on the integration of clinical and omics data, aiming at the development of personalized medicine. Thus the introduction of novel technologies able to investigate the relationship between clinical states and biological machinery may help the development of this field. For instance, the Affymetrix DMET platform (drug metabolism enzymes and transporters) is able to study the relationship between variation in patients' genomes and drug metabolism, detecting SNPs (Single Nucleotide Polymorphisms) on genes related to drug metabolism. This may make it possible, for instance, to find genetic variants in patients who present different drug responses, in pharmacogenomics and clinical studies. Despite this, there is currently a lack of open-source algorithms and tools for the analysis of DMET data. Existing software tools for DMET data generally allow only the preprocessing of binary data (e.g. the DMET-Console provided by Affymetrix) and simple data analysis operations, but do not allow testing of the association between the presence of SNPs and the response to drugs. We developed DMET-Analyzer, a tool for the automatic association analysis between variation in patient genomes and the clinical conditions of patients, i.e. the different responses to drugs. The proposed system allows: (i) automation of the workflow of analysis of DMET-SNP data, avoiding the use of multiple tools; (ii) automatic annotation of DMET-SNP data and search in existing databases of SNPs (e.g. dbSNP); (iii) association of SNPs with pathways through search in PharmGKB, a major knowledge base for pharmacogenomic studies. DMET-Analyzer has a simple graphical user interface that allows users (doctors/biologists) to upload and analyse DMET files produced by the Affymetrix DMET-Console in an interactive way. The effectiveness and ease of use of DMET-Analyzer is demonstrated through different case studies regarding the analysis of clinical datasets produced in the University Hospital of Catanzaro, Italy. DMET-Analyzer is a novel tool able to automatically analyse data produced by the DMET platform in case-control association studies. Using this tool, users may avoid wasting time on the manual execution of multiple statistical tests, avoiding possible errors and reducing the amount of time needed for a whole experiment. Moreover, annotations and direct links to external databases may increase the biological knowledge extracted. The system is freely available for academic purposes at: https://sourceforge.net/projects/dmetanalyzer/files/
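
    The kind of single association test that DMET-Analyzer automates across many SNPs can be written by hand in a few lines; the 2x2 counts below are invented.

    ```python
    # One SNP-vs-response association test; invented contingency counts.
    from scipy.stats import fisher_exact

    #                 responder  non-responder
    table = [[30, 10],   # variant present
             [15, 25]]   # variant absent
    odds_ratio, p_value = fisher_exact(table)
    print(f"OR={odds_ratio:.2f}, p={p_value:.4f}")
    ```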

  20. Development of stable Grid service at the next generation system of KEKCC

    NASA Astrophysics Data System (ADS)

    Nakamura, T.; Iwai, G.; Matsunaga, H.; Murakami, K.; Sasaki, T.; Suzuki, S.; Takase, W.

    2017-10-01

    Many experiments in the field of accelerator-based science are actively running at the High Energy Accelerator Research Organization (KEK) using the SuperKEKB and J-PARC accelerators in Japan. At KEK, the computing demand from the various experiments for data processing, analysis, and MC simulation is steadily increasing. This is not only the case for high-energy experiments: the computing requirements of the hadron and neutrino experiments and of some astro-particle physics projects are also rapidly increasing due to their very high precision measurements. Under this situation, several projects supported by KEK - the Belle II, T2K, ILC and KAGRA experiments - are going to utilize the Grid computing infrastructure as their main computing resource. The Grid system and services at KEK, which are already in production, were upgraded for more stable operation at the same time as the full-scale hardware replacement of the KEK Central Computer System (KEKCC). The next-generation KEKCC system started operation at the beginning of September 2016. The basic Grid services, e.g. BDII, VOMS, LFC, the CREAM computing element and the StoRM storage element, run on a more robust hardware configuration. Since raw data transfer is one of the most important tasks for the KEKCC, two redundant GridFTP servers are attached to the StoRM service instances with 40 Gbps network bandwidth on the LHCONE routing. These are dedicated to Belle II raw data transfer to other sites, apart from the servers used for data transfer by the other VOs. Additionally, we prepared a redundant configuration for database-oriented services like LFC and AMGA using LifeKeeper. The LFC servers comprise two read/write servers and two read-only servers for the Belle II experiment, each with an individual database for load balancing. The FTS3 service is newly deployed as a service for Belle II data distribution. A CVMFS stratum-0 service was started for the Belle II software repository, and a stratum-1 service is provided for the other VOs. In this way, many upgrades have been made to the production Grid service at the KEK Computing Research Center. In this paper, we introduce the detailed hardware configuration of the Grid instances and several mechanisms used to construct a robust Grid system in the next generation of KEKCC.

  1. Secure count query on encrypted genomic data.

    PubMed

    Hasan, Mohammad Zahidul; Mahdi, Md Safiur Rahman; Sadat, Md Nazmus; Mohammed, Noman

    2018-05-01

    Human genomic information can yield more effective healthcare by guiding medical decisions. Therefore, genomics research is gaining popularity, as it can identify potential correlations between a disease and a certain gene, which improves the safety and efficacy of drug treatment and can also support more effective prevention strategies [1]. To reduce sampling error and to increase the statistical accuracy of this type of research project, data from different sources need to be brought together, since a single organization does not necessarily possess the required amount of data. In this case, data sharing among multiple organizations must satisfy strict policies (for instance, HIPAA and PIPEDA) that have been enforced to regulate privacy-sensitive data sharing. Storage and computation on the shared data can be outsourced to a third-party cloud service provider equipped with enormous storage and computation resources. However, outsourcing data to a third party is associated with a potential risk of privacy violation for the participants whose genomic sequences or clinical profiles are used in these studies. In this article, we propose a method for secure sharing and computation on genomic data in a semi-honest cloud server. In particular, there are two main contributions. Firstly, the proposed method can handle biomedical data containing both genotypes and phenotypes. Secondly, our proposed index tree scheme reduces the computational overhead significantly for executing secure count query operations. In our proposed method, the confidentiality of shared data is ensured through encryption, while the entire computation process is kept efficient and scalable for cutting-edge biomedical applications. We evaluated our proposed method in terms of efficiency on a database of Single-Nucleotide Polymorphism (SNP) sequences, and experimental results demonstrate that the execution time for a query of 50 SNPs in a database of 50,000 records is approximately 5 s, where each record contains 500 SNPs. Executing the query on the same database including phenotypes as well requires 69.7 s. Copyright © 2018 Elsevier Inc. All rights reserved.
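
    The homomorphic-count idea behind a secure count query can be illustrated with the python-paillier package (an assumption for illustration; the paper's index-tree scheme is far more efficient): clients encrypt 0/1 match indicators, the server adds ciphertexts blindly, and only the key holder decrypts the count.

    ```python
    from phe import paillier  # pip install phe

    public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)

    genotypes = ["AG", "GG", "AG", "AA"]             # records held client-side
    query_match = [1 if g == "AG" else 0 for g in genotypes]
    encrypted = [public_key.encrypt(m) for m in query_match]

    # server side: homomorphic addition of ciphertexts, no decryption possible
    encrypted_count = encrypted[0]
    for c in encrypted[1:]:
        encrypted_count = encrypted_count + c

    print(private_key.decrypt(encrypted_count))      # 2
    ```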

  2. Instances of erroneous DNA barcoding of metazoan invertebrates: Are universal cox1 gene primers too "universal"?

    PubMed

    Mioduchowska, Monika; Czyż, Michał Jan; Gołdyn, Bartłomiej; Kur, Jarosław; Sell, Jerzy

    2018-01-01

    The cytochrome c oxidase subunit I (cox1) gene is the main mitochondrial molecular marker, playing a pivotal role in phylogenetic research, and is a crucial barcode sequence. Folmer's "universal" primers, designed to amplify this gene in metazoan invertebrates, allowed quick and easy barcode and phylogenetic analysis. On the other hand, the increase in the number of studies on barcoding leads to more frequent publishing of incorrect sequences, due to amplification of non-target taxa and insufficient analysis of the obtained sequences. Consequently, some sequences deposited in genetic databases are incorrectly described as obtained from invertebrates, while being in fact bacterial sequences. In our study, in which we used Folmer's primers to amplify COI sequences of the crustacean fairy shrimp Branchipus schaefferi (Fischer 1834), we also obtained COI sequences of microbial contaminants from Aeromonas sp. However, when we searched the GenBank database for sequences closely matching these contaminants, we found entries described as representatives of Gastrotricha and Mollusca. When these entries were compared with other sequences bearing the same names in the database, the genetic distance between the incorrect and correct sequences amplified from the same species was ca. 65%. Although the responsibility for the correct molecular identification of species rests on researchers, the errors found in already published sequence data have not been re-evaluated so far. On the basis of the standard sampling technique, we have estimated with 95% probability that the chances of finding incorrectly described metazoan sequences in GenBank depend on the systematic group and vary from less than 1% (Mollusca and Arthropoda) up to 6.9% (Gastrotricha). Consequently, the increasing popularity of DNA barcoding and metabarcoding analysis may lead to overestimation of species diversity. Finally, the study also discusses the sources of the problems with amplification of non-target sequences.

  3. Hum-mPLoc: an ensemble classifier for large-scale human protein subcellular location prediction by incorporating samples with multiple sites.

    PubMed

    Shen, Hong-Bin; Chou, Kuo-Chen

    2007-04-20

    Proteins may simultaneously exist at, or move between, two or more different subcellular locations. Proteins with multiple locations or dynamic features of this kind are particularly interesting because they may have some very special biological functions intriguing to investigators in both basic research and drug discovery. For instance, among the 6408 human protein entries that have experimentally observed subcellular location annotations in the Swiss-Prot database (version 50.7, released 19-Sept-2006), 973 (approximately 15%) have multiple location sites. The number of total human protein entries (except those annotated with "fragment" or those with less than 50 amino acids) in the same database is 14,370, meaning a gap of (14,370-6408)=7962 entries for which no knowledge is available about their subcellular locations. Although one can use the computational approach to predict the desired information for the gap, so far all the existing methods for predicting human protein subcellular localization are limited to the case of a single location site. To overcome this barrier, a new ensemble classifier, named Hum-mPLoc, was developed that can deal with the case of multiple location sites as well. Hum-mPLoc is freely accessible to the public as a web server at http://202.120.37.186/bioinf/hum-multi. Meanwhile, for the convenience of people working in the relevant areas, Hum-mPLoc has been used to identify all human protein entries in the Swiss-Prot database that do not have subcellular location annotations or are annotated as being uncertain. The large-scale results thus obtained have been deposited in a downloadable file prepared with Microsoft Excel and named "Tab_Hum-mPLoc.xls". This file is available at the same website and will be updated twice a year to include new entries of human proteins and reflect the continuous development of Hum-mPLoc.

  4. System hazards in managing laboratory test requests and results in primary care: medical protection database analysis and conceptual model.

    PubMed

    Bowie, Paul; Price, Julie; Hepworth, Neil; Dinwoodie, Mark; McKay, John

    2015-11-27

    To analyse a medical protection organisation's database to identify hazards related to general practice systems for ordering laboratory tests, managing test results and communicating test result outcomes to patients, and to integrate these data with other published evidence sources to inform the design of a systems-based conceptual model of related hazards. A retrospective database analysis. General practices in the UK and Ireland. 778 UK and Ireland general practices participating in a medical protection organisation's clinical risk self-assessment (CRSA) programme from January 2008 to December 2014. Proportion of practices with system risks; categorisation of identified hazards; most frequently occurring hazards; development of a conceptual model of hazards; and potential impacts on health, well-being and organisational performance. CRSA visits were undertaken to 778 UK and Ireland general practices, and a range of system hazards was recorded across the laboratory test ordering and results management systems in 647 practices (83.2%). A total of 45 discrete hazard categories was identified, with a mean of 3.6 per practice (SD=1.94). The most frequently occurring hazard was an inadequate process for matching test requests and results received (n=350, 54.1%). Of the 1604 instances where hazards were recorded, the most frequent was at the 'postanalytical test stage' (n=702, 43.8%), followed closely by 'communication outcomes issues' (n=628, 39.1%). Based on arguably the largest data set currently available on the subject matter, our study findings shed new light on the scale and nature of hazards related to test results handling systems, which can inform future efforts to research and improve the design and reliability of these systems. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/

  5. A new database sub-system for grain-size analysis

    NASA Astrophysics Data System (ADS)

    Suckow, Axel

    2013-04-01

    Detailed grain-size analyses of large depth profiles for palaeoclimate studies create large amounts of data. For instance, Novothny et al. (2011) presented a depth profile of grain-size analyses with 2 cm resolution and a total depth of more than 15 m, where each sample was measured with 5 repetitions on a Beckman Coulter LS13320 with 116 channels. This adds up to a total of more than four million numbers. Such amounts of data are not easily post-processed by spreadsheets or standard software; MS Access databases would also face serious performance problems. The poster describes a database sub-system dedicated to grain-size analyses. It expands the LabData database and laboratory management system published by Suckow and Dumke (2001). Compatibility with this very flexible database system makes it easy to import the grain-size data, provides the overall infrastructure for storing geographic context, and gives the ability to organize content, such as grouping several samples into one set or project. It also allows easy export and direct plot generation of final data in MS Excel. The sub-system allows automated import of raw data from the Beckman Coulter LS13320 Laser Diffraction Particle Size Analyzer. During post-processing MS Excel is used as a data display, but no number crunching is implemented in Excel. Raw grain-size spectra can be exported and checked as number, surface and volume fractions, while single spectra can be locked for further post-processing. From the spectra the usual statistical values (e.g. mean, median) can be computed, as well as fractions larger than a grain size, smaller than a grain size, fractions between any two grain sizes, or any ratio of such values. These deduced values can be easily exported into Excel for one or more depth profiles. Such reprocessing of large amounts of data also allows new display possibilities: normally, depth profiles of grain-size data are displayed only with summarized parameters like the clay content, sand content, etc., which always shows only part of the available information at each depth; alternatively, full spectra are displayed at a single depth. The new software now allows display of the whole grain-size spectrum at each depth in a three-dimensional view. LabData and the grain-size subsystem are based on MS Access as front-end and MS SQL Server as back-end database systems. The SQL code for the data model, SQL Server procedures and triggers, and the MS Access basic code for the front end are public domain code, published under the GNU GPL license agreement and available free of charge. References: Novothny, Á., Frechen, M., Horváth, E., Wacha, L., Rolf, C., 2011. Investigating the penultimate and last glacial cycles of the Süttő loess section (Hungary) using luminescence dating, high-resolution grain size, and magnetic susceptibility data. Quaternary International 234, 75-85. Suckow, A., Dumke, I., 2001. A database system for geochemical, isotope hydrological and geochronological laboratories. Radiocarbon 43, 325-337.
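
    The deduced values described above follow directly from the binned spectra; the sketch below (invented five-bin data in place of the instrument's 116 channels, with plain linear interpolation where practice often works in log-size) computes a fraction between two grain sizes and the median d50.

    ```python
    import numpy as np

    edges = np.array([0.4, 2.0, 20.0, 63.0, 200.0, 2000.0])   # channel edges, um
    vol_frac = np.array([0.10, 0.25, 0.40, 0.20, 0.05])       # sums to 1

    def fraction_between(lo, hi):
        # overlap of [lo, hi] with each bin, proportional within a bin
        left = np.clip(lo, edges[:-1], edges[1:])
        right = np.clip(hi, edges[:-1], edges[1:])
        return float(np.sum(vol_frac * (right - left) / np.diff(edges)))

    cum = np.concatenate([[0.0], np.cumsum(vol_frac)])         # cumulative curve
    median = float(np.interp(0.5, cum, edges))                 # d50
    print(f"silt 2-63 um: {fraction_between(2.0, 63.0):.2f}, d50 = {median:.1f} um")
    ```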

  6. Methods and pitfalls in searching drug safety databases utilising the Medical Dictionary for Regulatory Activities (MedDRA).

    PubMed

    Brown, Elliot G

    2003-01-01

    The Medical Dictionary for Regulatory Activities (MedDRA) is a unified standard terminology for recording and reporting adverse drug event data. Its introduction is widely seen as a significant improvement on the previous situation, where a multitude of terminologies of widely varying scope and quality were in use. However, there are some complexities that may cause difficulties, and these will form the focus for this paper. Two methods of searching MedDRA-coded databases are described: searching based on term selection from all of MedDRA and searching based on terms in the safety database. There are several potential traps for the unwary in safety searches. There may be multiple locations of relevant terms within a system organ class (SOC) and lack of recognition of appropriate group terms; the user may think that group terms are more inclusive than is the case. MedDRA may distribute terms relevant to one medical condition across several primary SOCs. If the database supports the MedDRA model, it is possible to perform multiaxial searching: while this may help find terms that might have been missed, it is still necessary to consider the entire contents of the SOCs to find all relevant terms and there are many instances of incomplete secondary linkages. It is important to adjust for multiaxiality if data are presented using primary and secondary locations. Other sources for errors in searching are non-intuitive placement and the selection of terms as preferred terms (PTs) that may not be widely recognised. Some MedDRA rules could also result in errors in data retrieval if the individual is unaware of these: in particular, the lack of multiaxial linkages for the Investigations SOC, Social circumstances SOC and Surgical and medical procedures SOC and the requirement that a PT may only be present under one High Level Term (HLT) and one High Level Group Term (HLGT) within any single SOC. Special Search Categories (collections of PTs assembled from various SOCs by searching all of MedDRA) are limited by the small number available and by lack of clarity about criteria applied in their construction. Difficulties in database searching may be addressed by suitable user training and experience, and by central reporting of detected deficiencies in MedDRA. Other remedies may include regulatory guidance on implementation and use of MedDRA. Further systematic review of MedDRA is needed and generation of standardised searches that may be used 'off the shelf' will help, particularly where the same search is performed repeatedly on multiple data sets. Until these enhancements are widely available, MedDRA users should take great care when searching a safety database to ensure that cases are not inadvertently missed.
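
    The multiaxial pitfall can be made concrete with an invented mini-terminology (not real MedDRA content): a PT has one primary SOC and optional secondary links, so a primary-location-only search of a SOC can miss relevant PTs, and some SOCs carry no secondary links at all.

    ```python
    # Invented mini-terminology for illustration; not actual MedDRA data.
    pt_links = {
        "Myocardial infarction":  {"primary": "Cardiac disorders",
                                   "secondary": ["Vascular disorders"]},
        "Troponin increased":     {"primary": "Investigations",
                                   "secondary": []},   # Investigations: no links
        "Coronary artery bypass": {"primary": "Surgical and medical procedures",
                                   "secondary": []},
    }

    def pts_in_soc(soc, multiaxial=True):
        return [pt for pt, links in pt_links.items()
                if links["primary"] == soc or (multiaxial and soc in links["secondary"])]

    print(pts_in_soc("Vascular disorders", multiaxial=False))  # []
    print(pts_in_soc("Vascular disorders", multiaxial=True))   # ['Myocardial infarction']
    ```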

  7. SIPHER: Scalable Implementation of Primitives for Homomorphic Encryption

    DTIC Science & Technology

    2015-11-01

    [Ajt96] M. Ajtai. Generating hard instances of lattice problems. Quaderni di Matematica, 13:1–32, 2004. Preliminary version in STOC 1996.

  8. On the importance of looking back: the role of recursive remindings in recency judgments and cued recall.

    PubMed

    Jacoby, Larry L; Wahlheim, Christopher N

    2013-07-01

    Suppose that you were asked which of two movies you had most recently seen. The results of the experiments reported here suggest that your answer would be more accurate if, when viewing the later movie, you were reminded of the earlier one. In the present experiments, we investigated the role of remindings in recency judgments and cued-recall performance. We did this by presenting a list composed of two instances from each of several different categories and later asking participants to select (Exp. 1) or to recall (Exp. 2) the more recently presented instance. Reminding was manipulated by varying instructions to look back over memory of earlier instances during the presentation of later instances. As compared to a control condition, cued-recall performance revealed facilitation effects when remindings occurred and were later recollected, but interference effects in their absence. The effects of reminding on recency judgments paralleled those on cued recall of more recently presented instances. We interpret these results as showing that reminding produces a recursive representation that embeds memory for an earlier-presented category instance into that of a later-presented one and, thereby, preserves their temporal order. Large individual differences in the probabilities of remindings and of their later recollection were observed. The widespread importance of recursive reminding for theory and for applied purposes is discussed.

  9. Stretching

    MedlinePlus

    ... your particular sport. For instance, if you play baseball, you might focus on your shoulder for throwing. ...

  10. Multi-Instance Metric Transfer Learning for Genome-Wide Protein Function Prediction.

    PubMed

    Xu, Yonghui; Min, Huaqing; Wu, Qingyao; Song, Hengjie; Ye, Bicui

    2017-02-06

    Multi-Instance (MI) learning has been proven effective for genome-wide protein function prediction problems in which each training example is associated with multiple instances. Many studies in this literature have attempted to find an appropriate Multi-Instance Learning (MIL) method for genome-wide protein function prediction under the usual assumption that the underlying distribution of the testing data (target domain, TD) is the same as that of the training data (source domain, SD). However, this assumption may be violated in practice. To tackle this problem, we propose a Multi-Instance Metric Transfer Learning (MIMTL) approach for genome-wide protein function prediction. In MIMTL, we first transfer the source domain distribution toward the target domain distribution by utilizing bag weights. Then, we construct a distance metric learning method with the reweighted bags. Finally, we develop an alternating optimization scheme for MIMTL. Comprehensive experimental evidence on seven real-world organisms verifies the effectiveness and efficiency of the proposed MIMTL approach over several state-of-the-art methods.
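
    The two ingredients named above, bag weighting followed by metric learning on the reweighted bags, can be sketched as follows. Everything here is a stand-in for illustration: bags are summarised by their instance means, the weights come from a logistic-regression density ratio rather than MIMTL's estimator, and the "metric" is a weighted Mahalanobis matrix rather than the paper's learned metric.

      import numpy as np
      from sklearn.linear_model import LogisticRegression

      rng = np.random.default_rng(3)
      source_bags = [rng.normal(0.0, 1.0, size=(rng.integers(3, 8), 5)) for _ in range(40)]
      target_bags = [rng.normal(0.5, 1.2, size=(rng.integers(3, 8), 5)) for _ in range(40)]

      S = np.array([b.mean(axis=0) for b in source_bags])  # bag-level features
      T = np.array([b.mean(axis=0) for b in target_bags])

      # 1) bag weights: density ratio p_T(x)/p_S(x) from a domain classifier
      domain = LogisticRegression(max_iter=1000).fit(
          np.vstack([S, T]), np.r_[np.zeros(len(S)), np.ones(len(T))])
      p = domain.predict_proba(S)[:, 1]
      w = p / np.clip(1.0 - p, 1e-6, None)                 # reweight source bags

      # 2) metric from the reweighted bags: weighted covariance -> Mahalanobis
      mu = np.average(S, axis=0, weights=w)
      cov = np.cov((S - mu).T, aweights=w) + 1e-6 * np.eye(S.shape[1])
      M = np.linalg.inv(cov)

      def bag_distance(a, b):
          d = a.mean(axis=0) - b.mean(axis=0)
          return float(d @ M @ d)

      print(bag_distance(source_bags[0], target_bags[0]))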

  11. Quantum computing. Defining and detecting quantum speedup.

    PubMed

    Rønnow, Troels F; Wang, Zhihui; Job, Joshua; Boixo, Sergio; Isakov, Sergei V; Wecker, David; Martinis, John M; Lidar, Daniel A; Troyer, Matthias

    2014-07-25

    The development of small-scale quantum devices raises the question of how to fairly assess and detect quantum speedup. Here, we show how to define and measure quantum speedup and how to avoid pitfalls that might mask or fake such a speedup. We illustrate our discussion with data from tests run on a D-Wave Two device with up to 503 qubits. By using random spin glass instances as a benchmark, we found no evidence of quantum speedup when the entire data set is considered and obtained inconclusive results when comparing subsets of instances on an instance-by-instance basis. Our results do not rule out the possibility of speedup for other classes of problems and illustrate the subtle nature of the quantum speedup question. Copyright © 2014, American Association for the Advancement of Science.
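
    The ratio-of-scaling notion of speedup at issue can be rendered schematically as follows; the symbols are ours, not the paper's notation.

      % T_C(N), T_Q(N): time for the classical and the quantum device, respectively,
      % to solve instances of size N (e.g. to a fixed success probability).
      \[
        S(N) \;=\; \frac{T_{\mathrm{C}}(N)}{T_{\mathrm{Q}}(N)},
        \qquad
        \text{quantum speedup} \;\Longleftrightarrow\; S(N) \to \infty \ \text{as } N \to \infty .
      \]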

  12. Technique for information retrieval using enhanced latent semantic analysis generating rank approximation matrix by factorizing the weighted morpheme-by-document matrix

    DOEpatents

    Chew, Peter A; Bader, Brett W

    2012-10-16

    A technique for information retrieval includes parsing a corpus to identify a number of wordform instances within each document of the corpus. A weighted morpheme-by-document matrix is generated based at least in part on the number of wordform instances within each document of the corpus and based at least in part on a weighting function. The weighted morpheme-by-document matrix separately enumerates instances of stems and affixes. Additionally or alternatively, a term-by-term alignment matrix may be generated based at least in part on the number of wordform instances within each document of the corpus. At least one lower rank approximation matrix is generated by factorizing the weighted morpheme-by-document matrix and/or the term-by-term alignment matrix.
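
    The generic pipeline the patent builds on can be sketched as follows: weight a term-by-document count matrix, then factorize it into a lower-rank approximation. The log-entropy weighting below is one common choice of weighting function, and the morpheme-level bookkeeping (separate stem and affix rows) is omitted; this is an illustration, not the patented method.

      import numpy as np

      def log_entropy_weight(counts):
          """counts: (terms x documents) raw frequency matrix."""
          p = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1e-12)
          with np.errstate(divide="ignore", invalid="ignore"):
              plogp = np.where(p > 0, p * np.log(p), 0.0)
          n_docs = counts.shape[1]
          global_w = 1.0 + plogp.sum(axis=1) / np.log(n_docs)  # entropy weight per term
          return np.log1p(counts) * global_w[:, None]          # local x global weighting

      def rank_k_approx(matrix, k):
          u, s, vt = np.linalg.svd(matrix, full_matrices=False)
          return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]         # best rank-k approximation

      counts = np.random.default_rng(1).poisson(1.0, size=(50, 20)).astype(float)
      low_rank = rank_k_approx(log_entropy_weight(counts), k=5)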

  13. On the Complexity of the Metric TSP under Stability Considerations

    NASA Astrophysics Data System (ADS)

    Mihalák, Matúš; Schöngens, Marcel; Šrámek, Rastislav; Widmayer, Peter

    We consider the metric Traveling Salesman Problem (Δ-TSP for short) and study how stability (as defined by Bilu and Linial [3]) influences the complexity of the problem. On an intuitive level, an instance of Δ-TSP is γ-stable (γ > 1) if there is a unique optimum Hamiltonian tour and any perturbation of the edge weights by at most a factor of γ does not change the edge set of the optimal solution (i.e., there is a significant gap between the optimum tour and all other tours). We show that for γ ≥ 1.8 a simple greedy algorithm (resembling Prim's algorithm for constructing a minimum spanning tree) computes the optimum Hamiltonian tour for every γ-stable instance of the Δ-TSP, whereas a simple local search algorithm can fail to find the optimum even if γ is arbitrarily large. We further show that there are γ-stable instances of the Δ-TSP for every 1 < γ < 2. These results provide a different view on the hardness of the Δ-TSP and give rise to a new class of problem instances which are substantially easier to solve than instances of the general Δ-TSP.
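
    For flavour, here is a generic greedy tour construction of the sort analysed in such stability results: edges are scanned in order of increasing weight and kept unless they would create a vertex of degree three or close a subtour prematurely. This sketches the algorithmic style only; the paper's Prim-like greedy differs in its details.

      from itertools import combinations

      def greedy_tour(dist):
          """dist: symmetric n x n matrix of edge weights; returns a tour edge list."""
          n = len(dist)
          parent = list(range(n))                 # union-find for subtour detection

          def find(x):
              while parent[x] != x:
                  parent[x] = parent[parent[x]]
                  x = parent[x]
              return x

          degree = [0] * n
          tour = []
          for i, j in sorted(combinations(range(n), 2), key=lambda e: dist[e[0]][e[1]]):
              if degree[i] == 2 or degree[j] == 2:
                  continue                        # would create a degree-3 vertex
              ri, rj = find(i), find(j)
              if ri == rj and len(tour) < n - 1:
                  continue                        # would close a cycle too soon
              parent[ri] = rj
              degree[i] += 1
              degree[j] += 1
              tour.append((i, j))
              if len(tour) == n:
                  break
          return tour

      pts = [(0, 0), (0, 3), (4, 0), (4, 3), (2, 5)]
      dist = [[((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5 for bx, by in pts] for ax, ay in pts]
      print(greedy_tour(dist))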

  14. Multistrategy learning: A case study

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Domingos, P.

    1996-12-31

    Two of the most popular approaches to induction are instance-based learning (IBL) and rule generation. Their strengths and weaknesses are largely complementary. IBL methods are able to identify small details in the instance space, but have trouble with attributes that are relevant in some parts of the space but not others. Conversely, rule induction methods may overlook small exception regions, but are able to select different attributes in different parts of the instance space. The two methods have been unified in the RISE algorithm. RISE views instances as maximally specific rules, forms more general rules by gradually clustering instances of the same class, and classifies a test example by letting the nearest rule win. This approach potentially combines the advantages of rule induction and IBL, and has indeed been observed to be more accurate than each on a large number of benchmark datasets. However, it is important to determine whether this performance is indeed due to the hypothesized advantages, and to define the situations in which RISE's bias will and will not be preferable to those of the individual approaches. This abstract reports experiments to this end in artificial domains.
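
    A minimal numeric-only sketch of the RISE idea reads as follows: every training instance starts as a maximally specific rule (a degenerate box), rules are generalized toward their nearest uncovered same-class instance whenever global accuracy does not drop, and a test example is classified by the nearest rule. The real algorithm also handles symbolic attributes and uses leave-one-out accuracy; this toy version uses plain training accuracy.

      import numpy as np

      class TinyRise:
          """Numeric-only sketch of RISE: rules are axis-aligned boxes."""

          def fit(self, X, y):
              self.lo, self.hi, self.labels = X.copy(), X.copy(), y.copy()
              acc = self._accuracy(X, y)
              improved = True
              while improved:
                  improved = False
                  for r in range(len(self.labels)):
                      d = self._dist_to_rule(r, X)
                      cand = (y == self.labels[r]) & (d > 0)   # same class, uncovered
                      if not cand.any():
                          continue
                      t = int(np.argmin(np.where(cand, d, np.inf)))
                      old_lo, old_hi = self.lo[r].copy(), self.hi[r].copy()
                      self.lo[r] = np.minimum(self.lo[r], X[t])  # generalize toward t
                      self.hi[r] = np.maximum(self.hi[r], X[t])
                      new_acc = self._accuracy(X, y)
                      if new_acc >= acc:                         # keep if no worse
                          acc, improved = new_acc, True
                      else:
                          self.lo[r], self.hi[r] = old_lo, old_hi
              return self

          def _dist_to_rule(self, r, X):
              # distance from each instance to box r (0 inside the box)
              gap = np.maximum(self.lo[r] - X, 0) + np.maximum(X - self.hi[r], 0)
              return gap.sum(axis=1)

          def predict(self, X):
              d = np.stack([self._dist_to_rule(r, X) for r in range(len(self.labels))])
              return self.labels[np.argmin(d, axis=0)]           # nearest rule wins

          def _accuracy(self, X, y):
              return float((self.predict(X) == y).mean())

      X = np.array([[0., 0.], [0., 1.], [5., 5.], [5., 6.]])
      y = np.array([0, 0, 1, 1])
      print(TinyRise().fit(X, y).predict(np.array([[0.2, 0.5], [4.8, 5.5]])))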

  15. Outlier Removal in Model-Based Missing Value Imputation for Medical Datasets.

    PubMed

    Huang, Min-Wei; Lin, Wei-Chao; Tsai, Chih-Fong

    2018-01-01

    Many real-world medical datasets contain some proportion of missing (attribute) values. In general, missing value imputation can be performed to solve this problem, that is, to provide estimations of the missing values by a reasoning process based on the (complete) observed data. However, if the observed data contain noisy information or outliers, the estimations of the missing values may not be reliable, or may even be quite different from the real values. The aim of this paper is to examine whether a combination of instance selection from the observed data and missing value imputation offers better performance than performing missing value imputation alone. In particular, three instance selection algorithms, DROP3, GA, and IB3, and three imputation algorithms, KNNI, MLP, and SVM, are used in order to find the best combination. The experimental results show that performing instance selection can have a positive impact on missing value imputation for the numerical data type of medical datasets, and that specific combinations of instance selection and imputation methods can improve the imputation results for the mixed data type of medical datasets. However, instance selection does not have a definitely positive impact on the imputation result for categorical medical datasets.
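
    Under stated substitutions, the combination studied above can be sketched in a few lines: an edited-nearest-neighbour selector stands in for DROP3/GA/IB3, and scikit-learn's KNNImputer stands in for KNNI. Numeric data are assumed.

      import numpy as np
      from sklearn.impute import KNNImputer
      from sklearn.neighbors import KNeighborsClassifier

      def edited_nn_select(X, y, k=3):
          """Keep instances whose class agrees with their k nearest neighbours."""
          keep = np.zeros(len(y), dtype=bool)
          for i in range(len(y)):
              mask = np.ones(len(y), dtype=bool)
              mask[i] = False
              clf = KNeighborsClassifier(n_neighbors=k).fit(X[mask], y[mask])
              keep[i] = clf.predict(X[i:i + 1])[0] == y[i]
          return keep

      rng = np.random.default_rng(2)
      X_obs = rng.normal(size=(100, 4))               # complete observed data
      y_obs = (X_obs[:, 0] > 0).astype(int)
      keep = edited_nn_select(X_obs, y_obs)           # instance selection first

      X_missing = rng.normal(size=(10, 4))
      X_missing[rng.random((10, 4)) < 0.2] = np.nan
      imputer = KNNImputer(n_neighbors=5).fit(X_obs[keep])  # fit on filtered data
      X_filled = imputer.transform(X_missing)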

  16. Legacy2Drupal: Conversion of an existing relational oceanographic database to a Drupal 7 CMS

    NASA Astrophysics Data System (ADS)

    Work, T. T.; Maffei, A. R.; Chandler, C. L.; Groman, R. C.

    2011-12-01

    Content Management Systems (CMSs) such as Drupal provide powerful features that can be of use to oceanographic (and other geo-science) data managers. However, in many instances, geo-science data management offices have already designed and implemented customized schemas for their metadata. The NSF-funded Biological and Chemical Oceanography Data Management Office (BCO-DMO) has ported an existing relational database containing oceanographic metadata, along with an existing interface coded in ColdFusion middleware, to a Drupal 7 Content Management System. This is an update on an effort described as a proof-of-concept in poster IN21B-1051, presented at AGU 2009. The BCO-DMO project has translated all the existing database tables, input forms, website reports, and other features present in the existing system into Drupal CMS features. The replacement features are made possible by the use of Drupal content types, CCK node-reference fields, a custom theme, and a number of other supporting modules. This presentation describes the process used to migrate content in the original BCO-DMO metadata database to Drupal 7, some problems encountered during migration, and the modules used to migrate the content successfully. Strategic use of Drupal 7 CMS features that enable three separate but complementary interfaces to oceanographic research metadata will also be covered: 1) a Drupal 7-powered user front-end; 2) REST-ful JSON web services (providing a MapServer interface to the metadata and data); and 3) a SPARQL interface to a semantic representation of the repository metadata (feeding a new faceted search capability currently under development). The existing BCO-DMO ontology, developed in collaboration with Rensselaer Polytechnic Institute's Tetherless World Constellation, makes strategic use of pre-existing ontologies and will be used to drive the semantically-enabled faceted search capabilities planned for the site. At this point, the use of semantic technologies included in the Drupal 7 core is anticipated. Using a public-domain CMS as opposed to proprietary middleware, and taking advantage of the many features of Drupal 7 that are designed to support semantically-enabled interfaces, will help prepare the BCO-DMO and other science data repositories for interoperability between systems that serve ecosystem research data.

  17. Functional & phylogenetic diversity of copepod communities

    NASA Astrophysics Data System (ADS)

    Benedetti, F.; Ayata, S. D.; Blanco-Bercial, L.; Cornils, A.; Guilhaumon, F.

    2016-02-01

    The diversity of natural communities is classically estimated through species identification (taxonomic diversity) but can also be estimated from the ecological functions performed by the species (functional diversity), or from the phylogenetic relationships among them (phylogenetic diversity). Estimating functional diversity requires the definition of specific functional traits, i.e., phenotypic characteristics that impact fitness and are relevant to ecosystem functioning. Estimating phylogenetic diversity requires the description of phylogenetic relationships, for instance by using molecular tools. In the present study, we focused on the functional and phylogenetic diversity of copepod surface communities in the Mediterranean Sea. First, we implemented a specific trait database for the most commonly-sampled and abundant copepod species of the Mediterranean Sea. Our database includes 191 species, described by seven traits encompassing diverse ecological functions: minimal and maximal body length, trophic group, feeding type, spawning strategy, diel vertical migration and vertical habitat. Clustering analysis in the functional trait space revealed that Mediterranean copepods can be gathered into groups that have different ecological roles. Second, we reconstructed a phylogenetic tree using the available sequences of 18S rRNA. Our tree included 154 of the analyzed Mediterranean copepod species. We used these two datasets to describe the functional and phylogenetic diversity of copepod surface communities in the Mediterranean Sea. The replacement component (turn-over) and the species richness difference component (nestedness) of the beta diversity indices were identified. Finally, by comparing various and complementary aspects of plankton diversity (taxonomic, functional, and phylogenetic diversity) we were able to gain a better understanding of the relationships among the zooplankton community, biodiversity, ecosystem function, and environmental forcing.
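
    The trait-space clustering step might look as follows in miniature; the four species and their trait values are hypothetical placeholders (the actual database covers 191 species and seven traits), and one-hot encoding plus Ward clustering is one simple way to handle mixed traits.

      import pandas as pd
      from scipy.cluster.hierarchy import fcluster, linkage

      traits = pd.DataFrame({
          "min_length_mm": [0.5, 0.6, 2.1, 1.8],
          "max_length_mm": [1.0, 1.2, 3.5, 3.0],
          "trophic_group": ["herbivore", "herbivore", "carnivore", "omnivore"],
          "vertical_migration": [True, False, True, True],
      }, index=["sp_A", "sp_B", "sp_C", "sp_D"])

      X = pd.get_dummies(traits).astype(float)      # one-hot encode categorical traits
      X = (X - X.mean()) / X.std(ddof=0)            # standardize all columns
      Z = linkage(X.to_numpy(), method="ward")      # hierarchical clustering
      groups = fcluster(Z, t=2, criterion="maxclust")
      print(dict(zip(traits.index, groups)))        # functional group per species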

  18. Evaluation Methodology for UML and GML Application Schemas Quality

    NASA Astrophysics Data System (ADS)

    Chojka, Agnieszka

    2014-05-01

    INSPIRE Directive implementation in Poland has caused a significant increase of interest in making spatial data and services available, particularly among public administration and private institutions. This has entailed a series of initiatives that aim to harmonise different spatial data sets so as to ensure their internal logical and semantic coherence. Harmonisation makes it possible to achieve interoperability of spatial databases, which among other things enables joining them together. The process of harmonisation requires either working out new data structures or adjusting the existing data structures of spatial databases to INSPIRE guidelines and recommendations. Data structures are described with the use of UML and GML application schemas. Working out accurate and correct application schemas is not an easy task, however. Many issues must be considered, for instance the recommendations of the ISO 19100 series of Geographic Information Standards, the regulations appropriate to a given problem or topic, and production opportunities and limitations (software, tools). In addition, a GML application schema is deeply connected with its UML application schema; it should be its translation. Not everything that can be expressed in UML, though, can be directly expressed in GML, and this can have a significant influence on the interoperability of spatial data sets, and thereby on the ability to exchange data validly. For these reasons, the capability to examine and estimate the quality of UML and GML application schemas, including the capability to explore their entropy, would be very important. The principal subject of this research is to propose an evaluation methodology for the quality of UML and GML application schemas prepared in the Head Office of Geodesy and Cartography in Poland within the INSPIRE Directive implementation works.

  19. Determinants of seat belt use behaviour: a protocol for a systematic review

    PubMed Central

    Ghaffari, Mohtasham; Armoon, Bahram; Rakhshanderou, Sakineh; Mehrabi, Yadollah; Soori, Hamid; Simsekoghlu, Ozelem; Harooni, Javad

    2018-01-01

    Introduction The use of seat belts could prevent severe collision damage to people in vehicle accidents and keep passengers safe from sustaining serious injuries; for instance, it could prevent passengers from being thrown out of a vehicle after the collision. The current systematic review will identify and analyse the determinants of seat belt use behaviour. Methods and analysis We will include qualitative, quantitative and mixed methods studies reporting the acquired data from passengers aged more than 12 years and drivers, from both commercial and personal vehicles. Online databases including MEDLINE/PubMed, Scopus, Web of Science, Embase, Cochrane Database of Systematic Reviews and PsycINFO will be investigated in the current study. Published and available articles will be evaluated according to their titles and abstracts. Published papers conforming to the inclusion criteria will be organised for a complete review. Next, the full text of the remaining articles will be studied independently for eligibility by two authors. The quality of the selected studies will be assessed with appropriate tools. Based on the information obtained from the data extraction, the type of determinants of seat belt use will be classified. Ethics and dissemination Ethics approval is not required, because this is a protocol for a systematic review and no primary data will be collected. The authors will ensure to maintain the rights of the used and included articles in the present systematic review. The findings of this review will be published in a relevant peer-reviewed journal. PROSPERO registration number CRD42017067511. PMID:29724739

  20. Quality Analysis of Open Street Map Data

    NASA Astrophysics Data System (ADS)

    Wang, M.; Li, Q.; Hu, Q.; Zhou, M.

    2013-05-01

    Crowd-sourced geographic data are open geographic data contributed by large numbers of non-professionals and provided to the public. Typical crowd-sourced geographic data include GPS track data such as OpenStreetMap, collaborative map data such as Wikimapia, social websites such as Twitter and Facebook, POIs tagged by Jiepang users, and so on. After processing, these data provide canonical geographic information to the public. Compared with conventional geographic data collection and update methods, crowd-sourced geographic data from non-professionals have the advantages of large data volume, high currency, abundant information, and low cost, and they have become a research hotspot in international geographic information science in recent years. Large-volume, highly current crowd-sourced geographic data provide a new solution for geospatial database updating, although the quality problem of data obtained from non-professionals must first be solved. In this paper, a quality analysis model for OpenStreetMap (OSM) crowd-sourced geographic data is proposed. Firstly, a quality analysis framework is designed based on an analysis of the characteristics of OSM data. Secondly, a quality assessment model for OSM data is presented using three quality elements: completeness, thematic accuracy, and positional accuracy. Finally, taking the OSM data of Wuhan as an example, the paper analyses and assesses the quality of OSM data against a 2011 navigation map for reference. The results show that the high-level roads and urban traffic network of the OSM data have high positional accuracy and completeness, so these OSM data can be used for updating an urban road network database.
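
    One standard positional-accuracy check, the share of OSM centreline length lying within a buffer of the reference network, can be sketched with shapely; the geometries below are toy stand-ins for projected (metre-based) road data.

      from shapely.geometry import LineString
      from shapely.ops import unary_union

      osm_roads = [LineString([(0, 0), (100, 2)]), LineString([(0, 50), (80, 55)])]
      ref_roads = [LineString([(0, 0), (100, 0)]), LineString([(0, 52), (80, 52)])]

      buffer_m = 5.0
      ref_zone = unary_union([r.buffer(buffer_m) for r in ref_roads])

      total = sum(r.length for r in osm_roads)                       # OSM length overall
      inside = sum(r.intersection(ref_zone).length for r in osm_roads)
      print(f"positional agreement within {buffer_m} m: {inside / total:.1%}")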

  1. When Overweight Is the Normal Weight: An Examination of Obesity Using a Social Media Internet Database

    PubMed Central

    Kuebler, Meghan; Yom-Tov, Elad; Pelleg, Dan; Puhl, Rebecca M.; Muennig, Peter

    2013-01-01

    Using a large social media database, Yahoo Answers, we explored postings to an online forum in which posters asked whether their height and weight qualified them as “skinny,” “thin,” “fat,” or “obese,” over time and across forum topics. We used these data to better understand whether a higher-than-average body mass index (BMI) in one’s county might, in some ways, be protective for one’s mental and physical health. For instance, we explored whether a higher proportion of obese people in one’s county predicts lower levels of bullying or of “am I fat?” questions from those with a normal BMI relative to their actual BMI. Most women asking whether they were fat/obese were not actually fat/obese. Both men and women who were actually overweight/obese were significantly more likely in the future to ask for advice about bullying than thinner individuals. Moreover, as mean county-level BMI increased, bullying decreased and then increased again (in a U-shaped curve). Regardless of where they lived, posters who asked “am I fat?” despite having a BMI in the healthy range were more likely than other posters to subsequently post on health problems, but the proportion of such posters also declined greatly as county-level BMI increased. Our findings suggest that obese people residing in counties with higher levels of BMI may have better physical and mental health by some measures than obese people living in counties with lower levels of BMI, but these improvements are modest. PMID:24058478

  2. Web Application Software for Ground Operations Planning Database (GOPDb) Management

    NASA Technical Reports Server (NTRS)

    Lanham, Clifton; Kallner, Shawn; Gernand, Jeffrey

    2013-01-01

    A Web application facilitates collaborative development of the ground operations planning document. This will reduce costs and development time for new programs by incorporating the data governance, access control, and revision tracking of the ground operations planning data. Ground Operations Planning requires the creation and maintenance of detailed timelines and documentation. The GOPDb Web application was created using state-of-the-art Web 2.0 technologies, and was deployed as SaaS (Software as a Service), with an emphasis on data governance and security needs. Application access is managed using two-factor authentication, with data write permissions tied to user roles and responsibilities. Multiple instances of the application can be deployed on a Web server to meet the robust needs for multiple, future programs with minimal additional cost. This innovation features high availability and scalability, with no additional software that needs to be bought or installed. For data governance and security (data quality, management, business process management, and risk management for data handling), the software uses NAMS. No local copy/cloning of data is permitted. Data change log/tracking is addressed, as well as collaboration, work flow, and process standardization. The software provides on-line documentation and detailed Web-based help. There are multiple ways that this software can be deployed on a Web server to meet ground operations planning needs for future programs. The software could be used to support commercial crew ground operations planning, as well as commercial payload/satellite ground operations planning. The application source code and database schema are owned by NASA.

  3. New drug candidates for liposomal delivery identified by computer modeling of liposomes' remote loading and leakage.

    PubMed

    Cern, Ahuva; Marcus, David; Tropsha, Alexander; Barenholz, Yechezkel; Goldblum, Amiram

    2017-04-28

    Remote drug loading into nano-liposomes is in most cases the best method for achieving high concentrations of active pharmaceutical ingredients (API) per nano-liposome that enable therapeutically viable API-loaded nano-liposomes, referred to as nano-drugs. This approach also enables controlled drug release. Recently, we constructed computational models to identify APIs that can achieve the desired high concentrations in nano-liposomes by remote loading. While those previous models included a broad spectrum of experimental conditions and dealt only with loading, here we reduced the scope to the molecular characteristics alone. We model and predict API suitability for nano-liposomal delivery by fixing the main experimental conditions: liposome lipid composition and size to be similar to those of Doxil® liposomes. On that basis, we add a prediction of drug leakage from the nano-liposomes during storage. The latter is critical for having pharmaceutically viable nano-drugs. The "load and leak" models were used to screen two large molecular databases in search of candidate APIs for delivery by nano-liposomes. The distribution of positive instances in both loading and leakage models was similar in the two databases screened. The screening process identified 667 molecules that were positives by both loading and leakage models (i.e., both high-loading and stable). Among them, 318 molecules received a high score in both properties and of these, 67 are FDA-approved drugs. This group of molecules, having diverse pharmacological activities, may be the basis for future liposomal drug development. Copyright © 2017 Elsevier B.V. All rights reserved.

  4. netherland hydrological modeling instrument

    NASA Astrophysics Data System (ADS)

    Hoogewoud, J. C.; de Lange, W. J.; Veldhuizen, A.; Prinsen, G.

    2012-04-01

    The Netherlands Hydrological Modeling Instrument (NHI) is the centre point of a framework of models used to coherently model the hydrological system and the multitude of functions it supports. The Dutch hydrological institutes Deltares, Alterra, the Netherlands Environmental Assessment Agency, RWS Waterdienst, STOWA and Vewin are cooperating in enhancing the NHI for adequate decision support. The instrument is used by three different ministries involved in national water policy matters, for instance the WFD, drought management, manure policy and climate change issues. The basis of the modeling instrument is a state-of-the-art on-line coupling of the groundwater system (MODFLOW), the unsaturated zone (metaSWAP) and the surface water system (MOZART-DM). It brings together hydro(geo)logical processes from the column to the basin scale, ranging from 250x250 m plots to the river Rhine, and includes salt water flow. The NHI has been validated with an eight-year run (1998-2006) covering dry and wet periods. For this run, different parts of the hydrology have been compared with measurements, for instance water demands in dry periods (e.g. for irrigation), discharges at outlets, groundwater levels and evaporation. A validation alone is not enough to get support from stakeholders; involvement of stakeholders in the modeling process is needed. Therefore, to gain sufficient support and trust in the instrument on different (policy) levels, a number of actions have been taken: 1. a transparent evaluation of modeling results has been set up; 2. an extensive programme is running to cooperate with regional water boards and suppliers of drinking water in improving the NHI; 3. (hydrological) data are shared via a newly set up Modeling Database for local and national models; 4. the NHI is being enhanced with "local" information. The NHI is and has been used for many decision-support studies and evaluations. The main focus of the instrument is operational drought management and evaluating adaptive measures for different climate scenarios. It has also been used as a basis to evaluate the water quality of WFD water bodies and measures, to assess nutrient leaching, and to describe WFD groundwater bodies. A toolkit translates the hydrological NHI results into values for different water users: for instance, with the NHI results agricultural yields can be calculated, as well as effects on groundwater-dependent ecosystems, subsidence, shipping, and drinking water supply. This makes the NHI a valuable decision support system in Dutch water management.

  5. Evidence for the Concerted Evolution between Short Linear Protein Motifs and Their Flanking Regions

    PubMed Central

    Chica, Claudia; Diella, Francesca; Gibson, Toby J.

    2009-01-01

    Background Linear motifs are short modules of protein sequences that play a crucial role in mediating and regulating many protein–protein interactions. The function of linear motifs strongly depends on the context, e.g. functional instances mainly occur inside flexible regions that are accessible for interaction. Sometimes linear motifs appear as isolated islands of conservation in multiple sequence alignments. However, they also occur in larger blocks of sequence conservation, suggesting an active role for the neighbouring amino acids. Results The evolution of regions flanking 116 functional linear motif instances was studied. The conservation of the amino acid sequence and order/disorder tendency of those regions was related to presence/absence of the instance. For the majority of the analysed instances, the pairs of sequences conserving the linear motif were also observed to maintain a similar local structural tendency and/or to have higher local sequence conservation when compared to pairs of sequences where one is missing the linear motif. Furthermore, those instances have a higher chance to co–evolve with the neighbouring residues in comparison to the distant ones. Those findings are supported by examples where the regulation of the linear motif–mediated interaction has been shown to depend on the modifications (e.g. phosphorylation) at neighbouring positions or is thought to benefit from the binding versatility of disordered regions. Conclusion The results suggest that flanking regions are relevant for linear motif–mediated interactions, both at the structural and sequence level. More interestingly, they indicate that the prediction of linear motif instances can be enriched with contextual information by performing a sequence analysis similar to the one presented here. This can facilitate the understanding of the role of these predicted instances in determining the protein function inside the broader context of the cellular network where they arise. PMID:19584925

  6. 49 CFR 33.44 - Instances where assistance may not be provided.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... AND ALLOCATION SYSTEM Special Priorities Assistance § 33.44 Instances where assistance may not be... attempting to: (a) Secure a price advantage; (b) Obtain delivery prior to the time required to fill a rated...

  7. 49 CFR 33.44 - Instances where assistance may not be provided.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... AND ALLOCATION SYSTEM Special Priorities Assistance § 33.44 Instances where assistance may not be... attempting to: (a) Secure a price advantage; (b) Obtain delivery prior to the time required to fill a rated...

  8. 49 CFR 33.44 - Instances where assistance may not be provided.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... AND ALLOCATION SYSTEM Special Priorities Assistance § 33.44 Instances where assistance may not be... attempting to: (a) Secure a price advantage; (b) Obtain delivery prior to the time required to fill a rated...

  9. Automatic Generation of Heuristics for Scheduling

    NASA Technical Reports Server (NTRS)

    Morris, Robert A.; Bresina, John L.; Rodgers, Stuart M.

    1997-01-01

    This paper presents a technique, called GenH, that automatically generates search heuristics for scheduling problems. The impetus for developing this technique is the growing consensus that heuristics encode advice that is, at best, useful in solving most, or typical, problem instances, and, at worst, useful in solving only a narrowly defined set of instances. In either case, heuristic problem solvers, to be broadly applicable, should have a means of automatically adjusting to the idiosyncrasies of each problem instance. GenH generates a search heuristic for a given problem instance by hill-climbing in the space of possible multi-attribute heuristics, where the evaluation of a candidate heuristic is based on the quality of the solution found under its guidance. We present empirical results obtained by applying GenH to the real-world problem of telescope observation scheduling. These results demonstrate that GenH is a simple and effective way of improving the performance of a heuristic scheduler.
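
    A sketch of the hill-climbing loop described above: a heuristic is a weight vector over task attributes, a candidate schedule is built greedily under that heuristic, and neighbours are single-weight perturbations. The task attributes and the evaluate() objective are hypothetical stand-ins, not GenH's actual attributes or quality measure.

      import random

      def build_schedule(tasks, weights):
          """Greedy scheduling: order tasks by their weighted attribute score."""
          score = lambda t: sum(w * t[a] for a, w in weights.items())
          return sorted(tasks, key=score, reverse=True)

      def evaluate(schedule):
          # stand-in objective: prefer high-priority tasks early in the schedule
          return sum(t["priority"] / (pos + 1) for pos, t in enumerate(schedule))

      def genh_hill_climb(tasks, attrs, steps=200, delta=0.1, seed=0):
          rng = random.Random(seed)
          weights = {a: 1.0 for a in attrs}
          best = evaluate(build_schedule(tasks, weights))
          for _ in range(steps):
              candidate = dict(weights)
              candidate[rng.choice(attrs)] += rng.choice((-delta, delta))
              quality = evaluate(build_schedule(tasks, candidate))
              if quality > best:                  # keep improving neighbours only
                  weights, best = candidate, quality
          return weights, best

      tasks = [{"priority": random.Random(i).random(), "duration": i % 5 + 1}
               for i in range(20)]
      print(genh_hill_climb(tasks, attrs=["priority", "duration"]))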

  10. Multiple-instance ensemble learning for hyperspectral images

    NASA Astrophysics Data System (ADS)

    Ergul, Ugur; Bilgin, Gokhan

    2017-10-01

    An ensemble framework for multiple-instance (MI) learning (MIL) is introduced for use with hyperspectral images (HSIs), inspired by the bagging (bootstrap aggregation) method in ensemble learning. Ensemble-based bagging is performed with a small percentage of the training samples, and MI bags are formed by a local windowing process with variable window sizes on selected instances. In addition to bootstrap aggregation, random subspace selection is another method used to diversify the base classifiers. The proposed method is implemented using four MIL classification algorithms. The classifier model learning phase is carried out with MI bags, and the estimation phase is performed on single test instances. In the experimental part of the study, two different HSIs that have ground-truth information are used, and comparative results are demonstrated against state-of-the-art classification methods. In general, the MI ensemble approach produces more compact results in terms of both diversity and error compared to equipollent non-MIL algorithms.

  11. Integrating instance selection, instance weighting, and feature weighting for nearest neighbor classifiers by coevolutionary algorithms.

    PubMed

    Derrac, Joaquín; Triguero, Isaac; Garcia, Salvador; Herrera, Francisco

    2012-10-01

    Cooperative coevolution is a successful trend of evolutionary computation which allows us to define partitions of the domain of a given problem, or to integrate several related techniques into one, by the use of evolutionary algorithms. It is possible to apply it to the development of advanced classification methods, which integrate several machine learning techniques into a single proposal. A novel approach integrating instance selection, instance weighting, and feature weighting into the framework of a coevolutionary model is presented in this paper. We compare it with a wide range of evolutionary and nonevolutionary related methods, in order to show the benefits of the employment of coevolution to apply the techniques considered simultaneously. The results obtained, contrasted through nonparametric statistical tests, show that our proposal outperforms other methods in the comparison, thus becoming a suitable tool in the task of enhancing the nearest neighbor classifier.

  12. Resource Planning for Massive Number of Process Instances

    NASA Astrophysics Data System (ADS)

    Xu, Jiajie; Liu, Chengfei; Zhao, Xiaohui

    Resource allocation has been recognised as an important topic for business process execution. In this paper, we focus on planning resources for a massive number of process instances, to meet process requirements and cater for the rational utilisation of resources before execution. After a motivating example, we present a model for planning resources for process instances. We then design a set of heuristic rules that take both optimised planning at build time and instance dependencies at run time into account. Based on these rules we propose two strategies for resource planning, one called holistic and the other batched. Both strategies target a lower cost; however, the holistic strategy can achieve an earlier deadline, while the batched strategy aims at rational use of resources. We discuss in the paper how to strike a balance between them, with a comprehensive experimental study of the two approaches.

  13. Boosting instance prototypes to detect local dermoscopic features.

    PubMed

    Situ, Ning; Yuan, Xiaojing; Zouridakis, George

    2010-01-01

    Local dermoscopic features are useful in many dermoscopic criteria for skin cancer detection. We address the problem of detecting local dermoscopic features from epiluminescence (ELM) microscopy skin lesion images. We formulate the recognition of local dermoscopic features as a multi-instance learning (MIL) problem. We employ the method of diverse density (DD) and evidence confidence (EC) function to convert MIL to a single-instance learning (SIL) problem. We apply Adaboost to improve the classification performance with support vector machines (SVMs) as the base classifier. We also propose to boost the selection of instance prototypes through changing the data weights in the DD function. We validate the methods on detecting ten local dermoscopic features from a dataset with 360 images. We compare the performance of the MIL approach, its boosting version, and a baseline method without using MIL. Our results show that boosting can provide performance improvement compared to the other two methods.
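
    The boosting step, AdaBoost over SVM base classifiers, maps directly onto scikit-learn (version 1.2 or later for the estimator keyword); the diverse-density prototype selection is omitted, so this shows only the single-instance classification stage on synthetic data.

      from sklearn.datasets import make_classification
      from sklearn.ensemble import AdaBoostClassifier
      from sklearn.model_selection import cross_val_score
      from sklearn.svm import SVC

      X, y = make_classification(n_samples=300, n_features=10, random_state=0)

      boosted_svm = AdaBoostClassifier(
          estimator=SVC(kernel="rbf", C=1.0, probability=True),  # SVM base classifier
          n_estimators=25,
          random_state=0,
      )
      print(cross_val_score(boosted_svm, X, y, cv=5).mean())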

  14. On solving three-dimensional open-dimension rectangular packing problems

    NASA Astrophysics Data System (ADS)

    Junqueira, Leonardo; Morabito, Reinaldo

    2017-05-01

    In this article, a recently proposed three-dimensional open-dimension rectangular packing problem is considered, in which the objective is to find a minimal-volume rectangular container that packs a given set of rectangular boxes. The literature has tackled small-sized instances of this problem by means of optimization solvers, position-free mixed-integer programming (MIP) formulations and piecewise linearization approaches. In this study, the problem is alternatively addressed by means of grid-based position MIP formulations, while still considering optimization solvers and the same piecewise linearization techniques. A comparison of the computational performance of both models is then presented, when tested with benchmark problem instances and with new instances, and it is shown that the grid-based position MIP formulation can be competitive, depending on the characteristics of the instances. The grid-based position MIP formulation is also extended with real-world practical constraints, such as cargo stability, and results are additionally presented.
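
    Schematically, a position-free model of the kind cited above has open container dimensions and pairwise non-overlap disjunctions; in our notation (not the article's):

      % Box i has dimensions (l_i, w_i, h_i) and position (x_i, y_i, z_i); the
      % container dimensions L, W, H are decision variables, M is a big constant,
      % and e^a_{ij} = 1 when box i precedes box j along axis a.
      \begin{align*}
        \min\ & L W H \quad \text{(nonlinear; handled by piecewise linearization)} \\
        \text{s.t.}\ & x_i + l_i \le L, \quad y_i + w_i \le W, \quad z_i + h_i \le H && \forall i,\\
        & x_i + l_i \le x_j + M\,(1 - e^{x}_{ij}), \ \text{and similarly for } y, z && \forall i \ne j,\\
        & e^{x}_{ij} + e^{x}_{ji} + e^{y}_{ij} + e^{y}_{ji} + e^{z}_{ij} + e^{z}_{ji} \ge 1 && \forall i < j.
      \end{align*}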

  15. A Firefly-Inspired Method for Protein Structure Prediction in Lattice Models

    PubMed Central

    Maher, Brian; Albrecht, Andreas A.; Loomes, Martin; Yang, Xin-She; Steinhöfel, Kathleen

    2014-01-01

    We introduce a Firefly-inspired algorithmic approach for protein structure prediction over two different lattice models in three-dimensional space. In particular, we consider three-dimensional cubic and three-dimensional face-centred-cubic (FCC) lattices. The underlying energy models are the Hydrophobic-Polar (H-P) model, the Miyazawa–Jernigan (M-J) model and a related matrix model. The implementation of our approach is tested on ten H-P benchmark problems of a length of 48 and ten M-J benchmark problems of lengths ranging from 48 to 61. The key complexity parameter we investigate is the total number of objective function evaluations required to achieve the optimum energy values for the H-P model, or competitive results in comparison to published values for the M-J model. For H-P instances and cubic lattices, where data for comparison are available, we obtain an average speed-up over eight instances of 2.1, leaving out two extreme values (otherwise, 8.8). For six M-J instances, data for comparison are available for cubic lattices and runs with a population size of 100, where, a priori, the minimum free energy is a termination criterion. The average speed-up over four instances is 1.2 (leaving out two extreme values, otherwise 1.1), which is achieved for a population size of only eight. The present study is a test case with initial results for ad hoc parameter settings, with the aim of justifying future research on larger instances within lattice model settings, eventually leading to the ultimate goal of implementations for off-lattice models. PMID:24970205

  16. A firefly-inspired method for protein structure prediction in lattice models.

    PubMed

    Maher, Brian; Albrecht, Andreas A; Loomes, Martin; Yang, Xin-She; Steinhöfel, Kathleen

    2014-01-07

    We introduce a Firefly-inspired algorithmic approach for protein structure prediction over two different lattice models in three-dimensional space. In particular, we consider three-dimensional cubic and three-dimensional face-centred-cubic (FCC) lattices. The underlying energy models are the Hydrophobic-Polar (H-P) model, the Miyazawa-Jernigan (M-J) model and a related matrix model. The implementation of our approach is tested on ten H-P benchmark problems of a length of 48 and ten M-J benchmark problems of lengths ranging from 48 to 61. The key complexity parameter we investigate is the total number of objective function evaluations required to achieve the optimum energy values for the H-P model, or competitive results in comparison to published values for the M-J model. For H-P instances and cubic lattices, where data for comparison are available, we obtain an average speed-up over eight instances of 2.1, leaving out two extreme values (otherwise, 8.8). For six M-J instances, data for comparison are available for cubic lattices and runs with a population size of 100, where, a priori, the minimum free energy is a termination criterion. The average speed-up over four instances is 1.2 (leaving out two extreme values, otherwise 1.1), which is achieved for a population size of only eight. The present study is a test case with initial results for ad hoc parameter settings, with the aim of justifying future research on larger instances within lattice model settings, eventually leading to the ultimate goal of implementations for off-lattice models.

  17. 75 FR 27318 - Privacy Act of 1974; System of Records; Correction

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-05-14

    ... that the system ID number was missing a zero in one instance (page 21251). This notice corrects that... a notice announcing its intent to amend an existing Privacy Act system of records. In one instance...

  18. A Systematic Search for Short-term Variability of EGRET Sources

    NASA Technical Reports Server (NTRS)

    Wallace, P. M.; Griffis, N. J.; Bertsch, D. L.; Hartman, R. C.; Thompson, D. J.; Kniffen, D. A.; Bloom, S. D.

    2000-01-01

    The 3rd EGRET Catalog of High-energy Gamma-ray Sources contains 170 unidentified sources, and there is great interest in the nature of these sources. One means of determining source class is the study of flux variability on time scales of days; pulsars are believed to be stable on these time scales, while blazars are known to be highly variable. In addition, previous work has demonstrated that 3EG J0241-6103 and 3EG J1837-0606 are candidates for a new gamma-ray source class. These sources near the Galactic plane display transient behavior but cannot be associated with any known blazars. Although many instances of flaring AGN have been reported, the EGRET database has not been systematically searched for occurrences of short-timescale (approximately 1 day) variability. These considerations have led us to conduct a systematic search for short-term variability in EGRET data, covering all viewing periods through proposal cycle 4. Six 3EG catalog sources are reported here to display variability on short time scales; four of them are unidentified. In addition, three non-catalog variable sources are discussed.

  19. Beyond passivity: Dependency as a risk factor for intimate partner violence.

    PubMed

    Kane, Fallon A; Bornstein, Robert F

    2016-02-01

    Interpersonal dependency in male perpetrators of intimate partner violence (IPV) is an understudied phenomenon, but one with noteworthy clinical implications. The present investigation used meta-analytic techniques to quantify the dependency-IPV link in all extant studies examining this relationship (n of studies = 17). Studies were gathered via an extensive literature search using relevant dependency/IPV search terms in the PsycINFO, MEDLINE and Google Scholar databases. Results revealed a small but statistically significant relationship between dependency and perpetration of IPV in men (r = 0.150, combined Z = 4.25, p < 0.0001), with the magnitude of the dependency-IPV link becoming stronger (r = 0.365, combined Z = 6.00, p < 0.0001) when studies using measures of dependent personality disorder symptoms were omitted. Other moderators of the dependency-IPV effect size included the IPV measure, the type of sample and perpetrator age. These findings illuminate the underlying dynamics and interpersonal processes involved in some instances of IPV and may aid in understanding how to identify and treat male perpetrators of domestic violence. Copyright © 2015 John Wiley & Sons, Ltd.

  20. The Ethics of Ironic Science in Its Search for Spoof.

    PubMed

    Ronagh, Maryam; Souder, Lawrence

    2015-12-01

    The goal of most scientific research published in peer-reviewed journals is to discover and report the truth. However, the research record includes tongue-in-cheek papers written in the conventional form and style of a research paper. Although these papers were intended to be taken ironically, bibliographic database searches show that many have been subsequently cited as valid research, some in prestigious journals. We attempt to understand why so many readers cited such ironic science seriously. We draw from the literature on error propagation in research publication for ways to categorize citations. We adopt the concept of irony from the fields of literary and rhetorical criticism to detect, characterize, and analyze the interpretations in the more than 60 published research papers that cite an instance of ironic science. We find a variety of interpretations: some citing authors interpret the research as valid and accept it, some contradict or reject it, and some acknowledge its ironic nature. We conclude that publishing ironic science in a research journal can lead to the same troubles posed by retracted research, and we recommend relevant changes to publication guidelines.

  1. Exploration in free word association networks: models and experiment.

    PubMed

    Ludueña, Guillermo A; Behzad, Mehran Djalali; Gros, Claudius

    2014-05-01

    Free association is a task that requires a subject to express the first word to come to their mind when presented with a certain cue. It is a task which can be used to expose the basic mechanisms by which humans connect memories. In this work, we have made use of a publicly available database of free associations to model the exploration of the averaged network of associations using a statistical model and the adaptive control of thought-rational (ACT-R) model. In addition, we performed an online experiment asking participants to navigate the averaged network using their individual preferences for word associations. We have investigated the statistics of word repetitions in this guided association task. We find that the considered models mimic some of the statistical properties of the experiment, viz. the probability of word repetitions, the distance between repetitions, and the distribution of association-chain lengths, with the ACT-R model showing a particularly good fit to the experimental data for the more intricate properties, for instance the ratio of repetitions per length of association chain.
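
    The simplest model in this family, a weighted random walk over the association network, already produces repetition statistics one can compare against such data; the miniature network below is hypothetical, not the published norms.

      import random
      from collections import Counter

      # hypothetical miniature of a free-association network: cue -> {associate: strength}
      NETWORK = {
          "dog": {"cat": 0.6, "bone": 0.3, "leash": 0.1},
          "cat": {"dog": 0.5, "milk": 0.3, "mouse": 0.2},
          "bone": {"dog": 0.7, "skeleton": 0.3},
          "milk": {"cat": 0.4, "cow": 0.6},
          "mouse": {"cat": 0.8, "cheese": 0.2},
          "leash": {"dog": 1.0},
          "skeleton": {"bone": 1.0},
          "cow": {"milk": 1.0},
          "cheese": {"mouse": 1.0},
      }

      def association_chain(start, length, seed=0):
          """Walk the network, sampling the next word by association strength."""
          rng = random.Random(seed)
          word, chain = start, [start]
          for _ in range(length - 1):
              word = rng.choices(*zip(*NETWORK[word].items()))[0]
              chain.append(word)
          return chain

      chain = association_chain("dog", 30)
      repeats = {w: c for w, c in Counter(chain).items() if c > 1}
      print(chain, repeats)                  # word repetitions along the chain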

  2. Cost estimation and analysis using the Sherpa Automated Mine Cost Engineering System

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stebbins, P.E.

    1993-09-01

    The Sherpa Automated Mine Cost Engineering System is a menu-driven software package designed to estimate capital and operating costs for proposed surface mining operations. The program is engineering-based (as opposed to statistically based), meaning that all equipment, manpower, and supply requirements are determined from deposit geology, project design and mine production information using standard engineering techniques. These requirements are used in conjunction with equipment, supply, and labor cost databases internal to the program to estimate all associated costs. Because virtually all on-site cost parameters are interrelated within the program, Sherpa provides an efficient means of examining the impact of changes in the equipment mix on total capital and operating costs. If any aspect of the operation is changed, Sherpa immediately adjusts all related aspects as necessary. For instance, if the user wishes to examine the cost ramifications of selecting larger trucks, the program not only considers truck purchase and operation costs, it also automatically and immediately adjusts excavator requirements, operator and mechanic needs, repair facility size, haul road construction and maintenance costs, and ancillary equipment specifications.

  3. Acting on observed social exclusion: Developmental perspectives on punishment of excluders and compensation of victims.

    PubMed

    Will, Geert-Jan; Crone, Eveline A; van den Bos, Wouter; Güroğlu, Berna

    2013-12-01

    This study examined punishment of excluders and compensation of victims after observing an instance of social exclusion at various phases of adolescent development. Participants (n = 183; ages 9 to 22 years) were first included in a virtual ball-tossing game, Cyberball, and then observed the exclusion of a peer. Subsequently, they played economic games in which they divided money between themselves and the including players, the excluders, and the victim. The results demonstrate a gradual age-related increase in money given to the victim from age 9 to 22 and a gradual decrease in money allocated to the excluders from age 9 to 16, with an increase in 22-year-olds. Affective perspective-taking predicted both compensation of the victim and punishment of the excluders. Taken together, these results show that across adolescence individuals sacrifice an increasingly large share of their own resources to punish excluders and to compensate victims, and that taking the perspective of the victim enhances these decisions. PsycINFO Database Record (c) 2013 APA, all rights reserved.

  4. A software communication tool for the tele-ICU.

    PubMed

    Pimintel, Denise M; Wei, Shang Heng; Odor, Alberto

    2013-01-01

    The Tele Intensive Care Unit (tele-ICU) supports a high-volume, high-acuity population of patients. There is a high volume of incoming and outgoing calls through the tele-ICU hubs, especially during the evening and night hours. The tele-ICU clinicians must be able to communicate effectively with team members in order to support the care of complex and critically ill patients while maintaining a standard that improves time to intervention. This study describes a software communication tool designed to improve time to intervention compared with the paper-driven communication format presently used in the tele-ICU. The software provides a multi-relational database of message instances from which information can be mined for evaluation and quality improvement for all entities that touch the tele-ICU. The software design incorporates years of critical care and software design experience combined with new skills acquired in an applied Health Informatics program. This software tool will function in the tele-ICU environment and perform as a front-end application that gathers, routes, and displays internal communication messages for intervention by priority and provider.

  5. Hepatic Cryotherapy and Subsequent Hepatic Arterial Chemotherapy for Colorectal Metastases to the Liver

    PubMed Central

    Alwan, Majeed H.; Booth, Michael W. C.

    1998-01-01

    This paper presents an experience of thirty consecutive patients with hepatic colorectal metastases who were treated with hepatic cryotherapy and subsequent hepatic arterial infusion (HAI) chemotherapy using 5FU. Patients with colorectal metastases confined to the liver but not suitable for resection, and with liver involvement of less than 50%, were offered the treatment. Prospective documentation of all patients was undertaken, with data being recorded on a computerised database. Patients had a median of 6 (2–15) lesions, with sizes ranging from 1–12 cm. There was no 30-day mortality. Postoperative complications developed in 8 patients but were followed by full recovery in all instances. Side effects from chemotherapy occurred in 23% of cycles. Twenty-seven patients have died. Median survival from the time of cryotherapy was 18.2 months (7–34), or 23 months (9–44) from diagnosis of the liver lesions. Hepatic cryotherapy with subsequent arterial chemotherapy is safe and well tolerated. The results suggest that survival of patients with colorectal hepatic metastases can be improved by the use of this modality of treatment. PMID:9893239

  6. A trans-Atlantic examination of haddock Melanogrammus aeglefinus food habits.

    PubMed

    Tam, J C; Link, J S; Large, S I; Bogstad, B; Bundy, A; Cook, A M; Dingsør, G E; Dolgov, A V; Howell, D; Kempf, A; Pinnegar, J K; Rindorf, A; Schückel, S; Sell, A F; Smith, B E

    2016-06-01

    The food habits of Melanogrammus aeglefinus were explored and contrasted across multiple north-eastern and north-western Atlantic Ocean ecosystems, using databases that span multiple decades. The results show that among all ecosystems, echinoderms are a consistent part of M. aeglefinus diet, but patterns emerge regarding where and when M. aeglefinus primarily eat fishes v. echinoderms. Melanogrammus aeglefinus does not regularly exhibit the increase in piscivory with ontogeny that other gadoids often show, and in several ecosystems there is a lower occurrence of piscivory. There is an apparent inverse relationship between the consumption of fishes and echinoderms in M. aeglefinus over time, where certain years show high levels of one prey item and low levels of the other. This apparent binary choice can be viewed as part of a gradient of prey options, contingent upon a suite of factors external to M. aeglefinus dynamics. The energetic consequences of this prey choice are discussed, noting that in some instances it may not be a choice at all. © 2016 The Fisheries Society of the British Isles.

  7. Does the Evidence Make a Difference in Consumer Behavior? Sales of Supplements Before and After Publication of Negative Research Results

    PubMed Central

    Emanuel, Ezekiel J.; Miller, Franklin G.

    2008-01-01

    Objective To determine if the public consumption of herbs, vitamins, and supplements changes in light of emerging negative evidence. Methods We describe trends in annual US sales of five major supplements in temporal relationship with publication of research from three top US general medical journals published from 2001 through early 2006, and the number of news citations associated with each publication using the LexisNexis database. Results In four of five supplements (St. John’s wort, echinacea, saw palmetto, and glucosamine), there was little or no change in sales trends after publication of research results. In one instance, however, dramatic changes in sales occurred following publication of data suggesting harm from high doses of vitamin E. Conclusion Results reporting harm may have a greater impact on supplement consumption than those demonstrating lack of efficacy. In order for clinical trial evidence to influence public behavior, there needs to be a better understanding of the factors that influence the translation of evidence to the public. PMID:18618194

  8. Evolution and dynamics of shear-layer structures in near-wall turbulence

    NASA Technical Reports Server (NTRS)

    Johansson, Arne V.; Alfredsson, P. H.; Kim, John

    1991-01-01

    Near-wall flow structures in turbulent shear flows are analyzed, with particular emphasis on the study of their space-time evolution and connection to turbulence production. The results are obtained from investigation of a database generated from direct numerical simulation of turbulent channel flow at a Reynolds number of 180 based on half-channel width and friction velocity. New light is shed on problems associated with conditional sampling techniques, together with methods to improve these techniques, for use both in physical and numerical experiments. The results clearly indicate that earlier conceptual models of the processes associated with near-wall turbulence production, based on flow visualization and probe measurements, need to be modified. For instance, the development of asymmetry in the spanwise direction seems to be an important element in the evolution of near-wall structures in general, and for shear layers in particular. The inhibition of spanwise motion of the near-wall streaky pattern may be the primary reason for the ability of small longitudinal riblets to reduce turbulent skin friction below the value for a flat surface.

  9. A concept analysis of moral resilience.

    PubMed

    Young, Peter D; Rushton, Cynda Hylton

    Nurses experience moral distress, which has led to emotional distress, frustration, anger, and nurse attrition. Overcoming moral distress has become a significant focus in nursing research. The continued focus on moral distress has not produced sustainable solutions within the nursing profession. Since positive language may alter the outcomes of morally distressing situations, we look to better understand one such positive phrase, moral resilience. We explored moral resilience through a literature search using 11 databases to identify instances of the phrase. Occurrences of moral resilience were then divided into three distinct categories: antecedents, attributes, and consequences, and following this, major themes within each category were identified. There is a dearth of scholarship on moral resilience, and additionally, there is currently no unifying definition. Despite this, our analysis offers promising direction in refining the concept. This concept analysis reveals differences in how moral resilience is understood. More conceptual work is needed to refine the definition of moral resilience and understand how the concept is useful in mitigating the negative consequences of moral distress and other types of moral adversity. Copyright © 2017 Elsevier Inc. All rights reserved.

  10. Value-based medicine and ophthalmology: an appraisal of cost-utility analyses.

    PubMed

    Brown, Gary C; Brown, Melissa M; Sharma, Sanjay; Brown, Heidi; Smithen, Lindsay; Leeser, David B; Beauchamp, George

    2004-01-01

    To ascertain the extent to which ophthalmologic interventions have been evaluated in value-based medicine format. Retrospective literature review. Papers in the healthcare literature utilizing cost-utility analysis were reviewed by researchers at the Center for Value-Based Medicine, Flourtown, Pennsylvania. A literature review of papers addressing the cost-utility analysis of ophthalmologic procedures in the United States over a 12-year period from 1992 to 2003 was undertaken using the National Library of Medicine and EMBASE databases. The cost-utility of ophthalmologic interventions in inflation-adjusted (real) year 2003 US dollars expended per quality-adjusted life-year (dollars/QALY) was ascertained in all instances. A total of 19 papers were found, including a total of 25 interventions. The median cost-utility of ophthalmologic interventions was 5,219 dollars/QALY, with a range from 746 dollars/QALY to 6.5 million dollars/QALY. The majority of ophthalmologic interventions are especially cost-effective by conventional standards. This is because of the substantial value that ophthalmologic interventions confer to patients with eye diseases for the resources expended.

  11. 3D Modelling and Visualization Based on the Unity Game Engine - Advantages and Challenges

    NASA Astrophysics Data System (ADS)

    Buyuksalih, I.; Bayburt, S.; Buyuksalih, G.; Baskaraca, A. P.; Karim, H.; Rahman, A. A.

    2017-11-01

    3D city modelling is increasingly popular and is becoming a valuable tool in managing big cities. Urban and energy planning, landscape, noise and sewage modelling, underground mapping and navigation are among the applications/fields that depend on 3D modelling for effective operation. Several research areas and implementation projects have been carried out to provide the most reliable 3D data format for sharing and functionality, as well as a visualization and analysis platform. For instance, the BIMTAS company has recently completed a project to estimate potential solar energy on 3D buildings for the whole of Istanbul and is now focusing on 3D underground utility mapping for a pilot case study. The research and implementation standard in the 3D city model domain (3D data sharing and visualization schema) is based on CityGML schema version 2.0. However, there are some limitations and issues in the implementation phase for large datasets. Most of the limitations were due to the visualization, database integration and analysis platform (the Unity3D game engine), as highlighted in this paper.

  12. Mobile tele-echography: user interface design.

    PubMed

    Cañero, Cristina; Thomos, Nikolaos; Triantafyllidis, George A; Litos, George C; Strintzis, Michael Gerassimos

    2005-03-01

    Ultrasound imaging allows evaluation of the degree of urgency of a patient's condition. However, in some instances, a well-trained sonographer is unavailable to perform such an echography. To cope with this issue, the Mobile Tele-Echography Using an Ultralight Robot (OTELO) project aims to develop a fully integrated end-to-end mobile tele-echography system using an ultralight remote-controlled robot for population groups that are not served locally by medical experts. This paper focuses on the user interface of the OTELO system, consisting of the following parts: an ultrasound video transmission system providing real-time images of the scanned area, an audio/video conference to communicate with the paramedical assistant and with the patient, and a virtual-reality environment providing visual and haptic feedback to the expert while capturing the expert's hand movements. These movements are reproduced by the robot at the patient site while holding the ultrasound probe against the patient's skin. In addition, the user interface includes an image-processing facility for enhancing the received images and the possibility of including them in a database.

  13. Spontaneous CRISPR loci generation in vivo by non-canonical spacer integration

    PubMed Central

    Nivala, Jeff; Shipman, Seth L.; Church, George M.

    2018-01-01

    The adaptation phase of CRISPR-Cas immunity depends on the precise integration of short segments of foreign DNA (spacers) into a specific genomic location within the CRISPR locus by the Cas1-Cas2 integration complex. Although off-target spacer integration outside of canonical CRISPR arrays has been described in vitro, no evidence of non-specific integration activity has been found in vivo. Here, we show that non-canonical off-target integrations can occur within bacterial chromosomes at locations that resemble the native CRISPR locus by characterizing hundreds of off-target integration locations within Escherichia coli. Considering whether such promiscuous Cas1-Cas2 activity could have an evolutionary role through the genesis of neo-CRISPR loci, we combed existing CRISPR databases and available genomes for evidence of off-target integration activity. This search uncovered several putative instances of naturally occurring off-target spacer integration events within the genomes of Yersinia pestis and Sulfolobus islandicus. These results are important in understanding alternative routes to CRISPR array genesis and evolution, as well as in the use of spacer acquisition in technological applications. PMID:29379209

  14. FTIR spectroscopy supported by statistical techniques for the structural characterization of plastic debris in the marine environment: Application to monitoring studies.

    PubMed

    Mecozzi, Mauro; Pietroletti, Marco; Monakhova, Yulia B

    2016-05-15

    We inserted 190 FTIR spectra of plastic samples into a digital database and submitted them to Independent Component Analysis (ICA) to extract the "pure" plastic polymers present. The identified plastics were polypropylene (PP), high density polyethylene (HDPE), low density polyethylene (LDPE), high density polyethylene terephthalate (HDPET), low density polyethylene terephthalate (LDPET), polystyrene (PS), Nylon (NL), polyethylene oxide (OPE), and Teflon (TEF), and they were used to establish the similarity with unknown plastics using the correlation coefficient (r) and the cross-correlation function (CC). For samples with r<0.8 we determined the Mahalanobis distance (MD) as an additional tool of identification. For instance, of the four plastic fragments found in the Caretta caretta, one plastic sample was assigned to OPE due to its r=0.87; for the other three plastic samples, with r values ranging between 0.83 and 0.70, the support of MD suggested LDPET and OPE as co-polymer constituents. Copyright © 2016 Elsevier Ltd. All rights reserved.
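
    A minimal sketch of the matching logic described above, assuming hypothetical NumPy arrays for the unknown spectrum and the reference library; the function name and threshold handling are illustrative, not the authors' code:

```python
import numpy as np

def match_polymer(spectrum, references, r_threshold=0.8):
    """Rank reference polymer spectra by Pearson correlation r with an
    unknown FTIR spectrum; matches with r below the threshold are flagged
    for the additional Mahalanobis-distance check described in the paper."""
    scores = {name: np.corrcoef(spectrum, ref)[0, 1]
              for name, ref in references.items()}
    best, r = max(scores.items(), key=lambda kv: kv[1])
    return best, r, r < r_threshold   # (polymer, r, needs MD check)

# references = {"PP": pp_spectrum, "HDPE": hdpe_spectrum, ...}, each a
# 1-D array sampled on the same wavenumber grid as `spectrum`.
```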

  15. Detection of complex cyber attacks

    NASA Astrophysics Data System (ADS)

    Gregorio-de Souza, Ian; Berk, Vincent H.; Giani, Annarita; Bakos, George; Bates, Marion; Cybenko, George; Madory, Doug

    2006-05-01

    One significant drawback to currently available security products is their inability to correlate diverse sensor input. For instance, by only using network intrusion detection data, a root kit installed through a weak username-password combination may go unnoticed. Similarly, an administrator may never make the link between deteriorating response times from the database server and an attacker exfiltrating trusted data, if these facts aren't presented together. Current Security Information Management Systems (SIMS) can collect and represent diverse data but lack sufficient correlation algorithms. By using a Process Query System, we were able to quickly bring together data flowing from many sources, including NIDS, HIDS, server logs, CPU load and memory usage, etc. We constructed PQS models that describe the dynamic behavior of complicated attacks and failures, allowing us to detect and differentiate simultaneous sophisticated attacks on a target network. In this paper, we discuss the benefits of implementing such a multistage cyber attack detection system using PQS. We focus on how data from multiple sources can be combined and used to detect and track comprehensive network security events that go unnoticed using conventional tools.

  16. Women are underrepresented in computational biology: An analysis of the scholarly literature in biology, computer science and computational biology

    PubMed Central

    2017-01-01

    While women are generally underrepresented in STEM fields, there are noticeable differences between fields. For instance, the gender ratio in biology is more balanced than in computer science. We were interested in how this difference is reflected in the interdisciplinary field of computational/quantitative biology. To this end, we examined the proportion of female authors in publications from the PubMed and arXiv databases. There are fewer female authors on research papers in computational biology, as compared to biology in general. This is true across authorship position, year, and journal impact factor. A comparison with arXiv shows that quantitative biology papers have a higher ratio of female authors than computer science papers, placing computational biology in between its two parent fields in terms of gender representation. Both in biology and in computational biology, a female last author increases the probability of other authors on the paper being female, pointing to a potential role of female PIs in influencing the gender balance. PMID:29023441

  17. CDPP activities: Promoting research and education in space physics

    NASA Astrophysics Data System (ADS)

    Genot, V. N.; Andre, N.; Cecconi, B.; Gangloff, M.; Bouchemit, M.; Dufourg, N.; Pitout, F.; Budnik, E.; Lavraud, B.; Rouillard, A. P.; Heulet, D.; Bellucci, A.; Durand, J.; Delmas, D.; Alexandrova, O.; Briand, C.; Biegun, A.

    2015-12-01

    The French Plasma Physics Data Centre (CDPP, http://cdpp.eu/) has addressed, for more than 15 years, all issues pertaining to natural plasma data distribution and valorization. Initially established by CNES and CNRS on the foundation of a solid data archive, CDPP activities diversified with the advent of broader networks and interoperability standards, and through fruitful collaborations (e.g. with NASA/PDS): providing access to remote data and designing and building science-driven analysis tools then came to the forefront of CDPP developments. For instance, today AMDA helps scientists all over the world access and analyze data from ancient to very recent missions (from Voyager, Galileo, Geotail, ... to Maven, Rosetta, MMS, ...) as well as results from models and numerical simulations. Other tools, like the Propagation Tool or 3DView, allow users to put their data in context and interconnect with other databases (CDAWeb, MEDOC) and tools (Topcat). This presentation will briefly review this evolution, show technical and science use cases, and finally put CDPP activities in the perspective of ongoing collaborative projects (Europlanet H2020, HELCATS, ...) and future missions (BepiColombo, Solar Orbiter, ...).

  18. VALUE-BASED MEDICINE AND OPHTHALMOLOGY: AN APPRAISAL OF COST-UTILITY ANALYSES

    PubMed Central

    Brown, Gary C; Brown, Melissa M; Sharma, Sanjay; Brown, Heidi; Smithen, Lindsay; Leeser, David B; Beauchamp, George

    2004-01-01

    Purpose To ascertain the extent to which ophthalmologic interventions have been evaluated in value-based medicine format. Methods Retrospective literature review. Papers in the healthcare literature utilizing cost-utility analysis were reviewed by researchers at the Center for Value-Based Medicine, Flourtown, Pennsylvania. A literature review of papers addressing the cost-utility analysis of ophthalmologic procedures in the United States over a 12-year period from 1992 to 2003 was undertaken using the National Library of Medicine and EMBASE databases. The cost-utility of ophthalmologic interventions in inflation-adjusted (real) year 2003 US dollars expended per quality-adjusted life-year ($/QALY) was ascertained in all instances. Results A total of 19 papers were found, including a total of 25 interventions. The median cost-utility of ophthalmologic interventions was $5,219/QALY, with a range from $746/QALY to $6.5 million/QALY. Conclusions The majority of ophthalmologic interventions are especially cost-effective by conventional standards. This is because of the substantial value that ophthalmologic interventions confer to patients with eye diseases for the resources expended. PMID:15747756

  19. New insights into diversification of hyper-heuristics.

    PubMed

    Ren, Zhilei; Jiang, He; Xuan, Jifeng; Hu, Yan; Luo, Zhongxuan

    2014-10-01

    There has been a growing research trend of applying hyper-heuristics to problem solving, due to their ability to balance intensification and diversification with low-level heuristics. Traditionally, the diversification mechanism is mostly realized by perturbing the incumbent solutions to escape from local optima. In this paper, we report our attempt toward providing a new diversification mechanism based on the concept of instance perturbation. In contrast to existing approaches, the proposed mechanism achieves diversification by perturbing the instance being solved, rather than the solutions. To tackle the challenge of incorporating instance perturbation into hyper-heuristics, we also design a new hyper-heuristic framework, HIP-HOP (a recursive acronym: HIP-HOP is an instance perturbation-based hyper-heuristic optimization procedure), which employs a grammar-guided high-level strategy to manipulate the low-level heuristics. With the expressive power of the grammar, constraints such as the feasibility of the output solution can be easily satisfied. Numerical results and statistical tests over both the Ising spin glass problem and the p-median problem instances show that HIP-HOP is able to achieve promising performance. Furthermore, runtime distribution analysis reveals that, although relatively slow at the beginning, HIP-HOP is able to achieve competitive solutions once given sufficient time.
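
    The core idea of instance perturbation, as opposed to solution perturbation, can be sketched in a few lines; this is a generic illustration under assumed `perturb`, `solve`, and `evaluate` callables, not the authors' grammar-guided HIP-HOP framework:

```python
def instance_perturbation_search(instance, perturb, solve, evaluate, iters=100):
    """Diversify by perturbing the problem instance rather than the incumbent
    solution: solve a perturbed copy, then score the result on the ORIGINAL
    instance. A generic illustration of the idea, not HIP-HOP's grammar-guided
    high-level strategy."""
    best = solve(instance)
    best_cost = evaluate(instance, best)
    for _ in range(iters):
        noisy = perturb(instance)               # e.g. jitter weights or costs
        candidate = solve(noisy)                # low-level heuristic on the copy
        cost = evaluate(instance, candidate)    # always evaluate on the original
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best
```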

  20. Multi-instance multi-label distance metric learning for genome-wide protein function prediction.

    PubMed

    Xu, Yonghui; Min, Huaqing; Song, Hengjie; Wu, Qingyao

    2016-08-01

    Multi-instance multi-label (MIML) learning has been proven to be effective for the genome-wide protein function prediction problems where each training example is associated with not only multiple instances but also multiple class labels. To find an appropriate MIML learning method for genome-wide protein function prediction, many studies in the literature attempted to optimize objective functions in which dissimilarity between instances is measured using the Euclidean distance. But in many real applications, Euclidean distance may be unable to capture the intrinsic similarity/dissimilarity in feature space and label space. Unlike other previous approaches, in this paper, we propose to learn a multi-instance multi-label distance metric learning framework (MIMLDML) for genome-wide protein function prediction. Specifically, we learn a Mahalanobis distance to preserve and utilize the intrinsic geometric information of both feature space and label space for MIML learning. In addition, we try to deal with the sparsely labeled data by giving weight to the labeled data. Extensive experiments on seven real-world organisms covering the biological three-domain system (i.e., archaea, bacteria, and eukaryote; Woese et al., 1990) show that the MIMLDML algorithm is superior to most state-of-the-art MIML learning algorithms. Copyright © 2016 Elsevier Ltd. All rights reserved.
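
    For reference, the distance at the heart of such metric-learning formulations: with M equal to the identity matrix it reduces to the Euclidean distance the authors argue cannot capture intrinsic structure. A small NumPy sketch, where the matrix M is a made-up stand-in for a learned metric:

```python
import numpy as np

def mahalanobis_dist(x, y, M):
    """Distance under a learned metric M (symmetric positive semi-definite):
    d_M(x, y) = sqrt((x - y)^T M (x - y)). With M = I this is Euclidean."""
    d = np.asarray(x) - np.asarray(y)
    return float(np.sqrt(d @ M @ d))

# Hypothetical 2-feature example with a made-up learned metric:
M = np.array([[2.0, 0.5],
              [0.5, 1.0]])
print(mahalanobis_dist([1.0, 2.0], [2.0, 0.0], M))
```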

  1. Design, implementation and reporting strategies to reduce the instance and impact of missing patient-reported outcome (PRO) data: a systematic review

    PubMed Central

    Mercieca-Bebber, Rebecca; Palmer, Michael J; Brundage, Michael; Stockler, Martin R; King, Madeleine T

    2016-01-01

    Objectives Patient-reported outcomes (PROs) provide important information about the impact of treatment from the patients' perspective. However, missing PRO data may compromise the interpretability and value of the findings. We aimed to report: (1) a non-technical summary of problems caused by missing PRO data; and (2) a systematic review by collating strategies to: (A) minimise rates of missing PRO data, and (B) facilitate transparent interpretation and reporting of missing PRO data in clinical research. Our systematic review does not address statistical handling of missing PRO data. Data sources MEDLINE and Cumulative Index to Nursing and Allied Health Literature (CINAHL) databases (inception to 31 March 2015), and citing articles and reference lists from relevant sources. Eligibility criteria English articles providing recommendations for reducing missing PRO data rates, or strategies to facilitate transparent interpretation and reporting of missing PRO data were included. Methods 2 reviewers independently screened articles against eligibility criteria. Discrepancies were resolved with the research team. Recommendations were extracted and coded according to framework synthesis. Results 117 sources (55% discussion papers, 26% original research) met the eligibility criteria. Design and methodological strategies for reducing rates of missing PRO data included: incorporating PRO-specific information into the protocol; carefully designing PRO assessment schedules and defining termination rules; minimising patient burden; appointing a PRO coordinator; PRO-specific training for staff; ensuring PRO studies are adequately resourced; and continuous quality assurance. Strategies for transparent interpretation and reporting of missing PRO data include utilising auxiliary data to inform analysis; transparently reporting baseline PRO scores, rates and reasons for missing data; and methods for handling missing PRO data. Conclusions The instance of missing PRO data and its potential to bias clinical research can be minimised by implementing thoughtful design, rigorous methodology and transparent reporting strategies. All members of the research team have a responsibility in implementing such strategies. PMID:27311907

  2. MicroRNA based Pan-Cancer Diagnosis and Treatment Recommendation.

    PubMed

    Cheerla, Nikhil; Gevaert, Olivier

    2017-01-13

    The current state of the art in cancer diagnosis and treatment is not ideal; diagnostic tests are accurate but invasive, and treatments are "one-size-fits-all" instead of being personalized. Recently, miRNAs have garnered significant attention as cancer biomarkers, owing to their ease of access (circulating miRNA in the blood) and stability. There have been many studies showing the effectiveness of miRNA data in diagnosing specific cancer types, but few studies explore the role of miRNA in predicting treatment outcome. Here we go a step further, using tissue miRNA and clinical data across 21 cancers from 'The Cancer Genome Atlas' (TCGA) database. We use machine learning techniques to create an accurate pan-cancer diagnosis system and a prediction model for treatment outcomes. Finally, using these models, we create a web-based tool that diagnoses cancer and recommends the best treatment options. We achieved 97.2% accuracy for classification using a support vector machine classifier with a radial basis function kernel. The accuracies improved to 99.9-100% when climbing up the embryonic tree and classifying cancers at different stages. We define accuracy as the ratio of the total number of instances correctly classified to the total number of instances. The classifier also performed well, achieving greater than 80% sensitivity for many cancer types on independent validation datasets. Many miRNAs selected by our feature selection algorithm had strong previous associations with various cancers and tumor progression. Then, using miRNA, clinical and treatment data and encoding it in a machine-learning-readable format, we built a prognosis predictor model to predict the outcome of treatment with 85% accuracy. We used this model to create a tool that recommends personalized treatment regimens. Both the diagnosis and prognosis models, incorporating semi-supervised learning techniques to improve their accuracies with repeated use, were uploaded online for easy access. Our research is a step towards the final goal of diagnosing cancer and predicting treatment recommendations using non-invasive blood tests.
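
    A minimal sketch of the kind of classifier the study reports, a support vector machine with a radial basis function kernel, using random stand-in data in place of the TCGA miRNA profiles; scores on such data are at chance level and purely illustrative:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Stand-in data: rows = tissue samples, columns = miRNA expression features;
# y = cancer-type labels. Random values, so the score is chance level.
rng = np.random.default_rng(0)
X = rng.random((200, 50))
y = rng.integers(0, 5, size=200)

clf = SVC(kernel="rbf", C=1.0, gamma="scale")  # radial basis function kernel
print(cross_val_score(clf, X, y, cv=5).mean())
```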

  3. Arctic Ocean sea ice drift origin derived from artificial radionuclides.

    PubMed

    Cámara-Mor, P; Masqué, P; Garcia-Orellana, J; Cochran, J K; Mas, J L; Chamizo, E; Hanfland, C

    2010-07-15

    Since the 1950s, nuclear weapon testing and releases from the nuclear industry have introduced anthropogenic radionuclides into the sea, and in many instances their ultimate fate is the bottom sediments. The Arctic Ocean is one of the most polluted in this respect because, in addition to global fallout, it is impacted by regional fallout from nuclear weapon testing, and indirectly by releases from nuclear reprocessing facilities and nuclear accidents. Sea ice formed over the shallow continental shelves incorporates sediments with variable concentrations of anthropogenic radionuclides that are transported through the Arctic Ocean and finally released in the melting areas. In this work, we present the results of anthropogenic radionuclide analyses of sea-ice sediments (SIS) collected on five cruises from different Arctic regions and combine them with a database including prior measurements of these radionuclides in SIS. The distribution of (137)Cs and (239,240)Pu activities and the (240)Pu/(239)Pu atom ratio in SIS showed geographical differences, in agreement with the two main sea-ice drift patterns derived from the mean field of sea-ice motion, the Transpolar Drift and the Beaufort Gyre, with the Fram Strait as the main ablation area. A direct comparison of data measured in SIS samples against those reported for the potential source regions permits identification of the regions from which sea ice incorporates sediments. The (240)Pu/(239)Pu atom ratio in SIS may be used to discern the origin of sea ice from the Kara-Laptev Sea and the Alaskan shelf. However, if the (240)Pu/(239)Pu atom ratio is similar to global fallout, it does not provide a unique diagnostic indicator of the source area, and in such cases the source of SIS can be constrained with a combination of the (137)Cs and (239,240)Pu activities. Therefore, these anthropogenic radionuclides can be used in many instances to determine the geographical source area of sea ice. Copyright 2010 Elsevier B.V. All rights reserved.

  4. Comparative analysis of genomics and proteomics in Bacillus thuringiensis 4.0718.

    PubMed

    Rang, Jie; He, Hao; Wang, Ting; Ding, Xuezhi; Zuo, Mingxing; Quan, Meifang; Sun, Yunjun; Yu, Ziquan; Hu, Shengbiao; Xia, Liqiu

    2015-01-01

    Bacillus thuringiensis is a widely used biopesticide that produces various insecticidally active substances during its life cycle. Separation and purification of the numerous active substances have been difficult because of their relatively short half-lives. On the other hand, substances can be synthesized at different times during development, so samples at different stages have to be studied, further complicating the analysis. A dual genomic and proteomic approach would enhance our ability to identify such substances, particularly when using mass spectrometry-based proteomic methods. The comparative analysis of genomic and proteomic data showed that not all of the products deduced from the annotated genome could be identified among the proteomic data. For instance, genome annotation showed that 39 coding sequences in the whole genome were related to insect pathogenicity, including five cry genes. However, Cry2Ab, Cry1Ia, Cytotoxin K, Bacteriocin, Exoenzyme C3 and Alveolysin could not be detected in the proteomic data obtained. The sporulation-related proteins were also comparatively analyzed; the results showed that the great majority of sporulation-related proteins could be detected by mass spectrometry. This analysis revealed that Spo0A~P, SigF, SigE(+), SigK(+) and SigG(+), all known to play an important role in the spore-formation regulatory network, were also present in the proteomic data. Through the comparison of the two data sets, it was possible to infer that some genes were silenced or were expressed at very low levels. For instance, we found that cry2Ab seems to lack a functional promoter, while cry1Ia may not be expressed due to the presence of transposons. With this comparative study a relatively complete database can be constructed and used to transform hereditary material, thereby promoting the high expression of toxic proteins. A theoretical basis is provided for constructing highly virulent engineered bacteria and for promoting the application of proteogenomics in the life sciences.

  5. Computational framework to support integration of biomolecular and clinical data within a translational approach.

    PubMed

    Miyoshi, Newton Shydeo Brandão; Pinheiro, Daniel Guariz; Silva, Wilson Araújo; Felipe, Joaquim Cezar

    2013-06-06

    The use of the knowledge produced by the sciences to promote human health is the main goal of translational medicine. To make this feasible, we need computational methods to handle the large amount of information that arises from bench to bedside, and to deal with its heterogeneity. A computational challenge that must be faced is to promote the integration of clinical, socio-demographic and biological data. In this effort, ontologies play an essential role as a powerful artifact for knowledge representation. Chado is a modular ontology-oriented database model that gained popularity due to its robustness and flexibility as a generic platform to store biological data; however, it lacks support for representing clinical and socio-demographic information. We have implemented an extension of Chado - the Clinical Module - to allow the representation of this kind of information. Our approach consists of a framework for data integration through the use of a common reference ontology. The design of this framework has four levels: the data level, to store the data; the semantic level, to integrate and standardize the data by the use of ontologies; the application level, to manage clinical databases, ontologies and the data integration process; and the web interface level, to allow interaction between the user and the system. The Clinical Module was built based on the Entity-Attribute-Value (EAV) model. We also proposed a methodology to migrate data from legacy clinical databases to the integrative framework. A Chado instance was initialized using a relational database management system. The Clinical Module was implemented and the framework was loaded using data from a factual clinical research database. Clinical and demographic data as well as biomaterial data were obtained from patients with tumors of the head and neck. We implemented the IPTrans tool, a complete environment for data migration, which comprises: the construction of a model to describe the legacy clinical data, based on an ontology; the Extraction, Transformation and Load (ETL) process to extract the data from the source clinical database and load it into the Clinical Module of Chado; and the development of a web tool and a Bridge Layer to adapt the web tool to Chado, as well as other applications. Open-source computational solutions currently available for translational science do not have a model to represent biomolecular information and are not integrated with existing bioinformatics tools. On the other hand, existing genomic data models do not represent clinical patient data. A framework was developed to support translational research by integrating biomolecular information coming from different "omics" technologies with patients' clinical and socio-demographic data. This framework should present some features: flexibility, compression and robustness. The experiments performed on a use case demonstrated that the proposed system meets the requirements of flexibility and robustness, leading to the desired integration. The Clinical Module can be accessed at http://dcm.ffclrp.usp.br/caib/pg=iptrans.
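
    A minimal illustration of the Entity-Attribute-Value layout the Clinical Module is built on, using SQLite from Python; the table and column names here are illustrative, not Chado's actual schema:

```python
import sqlite3

# Illustrative EAV layout: one row per patient/attribute/value triple
# instead of one column per attribute, so new attributes need no schema change.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE attribute (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE patient_value (
    patient_id   INTEGER,
    attribute_id INTEGER REFERENCES attribute(id),
    value        TEXT
);
""")
con.execute("INSERT INTO attribute (id, name) VALUES (1, 'tumor_site')")
con.execute("INSERT INTO patient_value VALUES (42, 1, 'larynx')")
for row in con.execute("""
        SELECT p.patient_id, a.name, p.value
        FROM patient_value p JOIN attribute a ON a.id = p.attribute_id"""):
    print(row)  # -> (42, 'tumor_site', 'larynx')
```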

  6. The Evolving Private Military Sector: A Survey

    DTIC Science & Technology

    2009-08-05

    enough evidence (for instance, the diversification of commercial security firms into the PM sector) and enough theory (for instance, institutional theory about how new fields are created out of old ones, and how legitimacy is co-opted) to suggest this perspective might warrant further investigation

  7. Supporting Information Linking and Discovery Across Organizations Using the VIVO Semantic Web Software Suite

    NASA Astrophysics Data System (ADS)

    Mayernik, M. S.; Daniels, M. D.; Maull, K. E.; Khan, H.; Krafft, D. B.; Gross, M. B.; Rowan, L. R.

    2016-12-01

    Geosciences research is often conducted using distributed networks of researchers and resources. To better enable the discovery of the research output from the scientists and resources used within these organizations, UCAR, Cornell University, and UNAVCO are collaborating on the EarthCollab (http://earthcube.org/group/earthcollab) project, which seeks to leverage semantic technologies to manage and link scientific data. As part of this effort, we have been exploring how to leverage information distributed across multiple research organizations. EarthCollab is using the VIVO semantic software suite to look up and display Semantic Web information across our project partners. Our presentation will include a demonstration of linking between VIVO instances, discussing how to create linkages between entities in different VIVO instances where both entities describe the same person or resource. This discussion will explore how we designate the equivalence of these entities using "same as" assertions between identifiers representing these entities, including URIs and ORCID IDs, and how we have extended the base VIVO architecture to support looking up which entities in separate VIVO instances may be equivalent and then displaying information from external linked entities. We will also discuss how these extensions can support other linked data lookups and sources of information. This VIVO cross-linking mechanism helps bring information from multiple VIVO instances together and helps users navigate information spread out across multiple VIVO instances. Challenges and open questions for this approach relate to how to display the information obtained from an external VIVO instance, both in order to preserve the brands of the internal and external systems and to handle discrepancies between ontologies, content, and/or VIVO versions.
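
    A minimal sketch of the kind of cross-instance equivalence assertion discussed above, using rdflib and owl:sameAs; both entity URIs are hypothetical:

```python
from rdflib import Graph, URIRef
from rdflib.namespace import OWL

# Hypothetical URIs for the same person in two different VIVO instances.
ucar_uri = URIRef("https://vivo.example-ucar.org/individual/n123")
cornell_uri = URIRef("https://vivo.example-cornell.org/individual/n456")

g = Graph()
g.add((ucar_uri, OWL.sameAs, cornell_uri))  # assert cross-instance equivalence
print(g.serialize(format="turtle"))
```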

  8. Bayes classifiers for imbalanced traffic accidents datasets.

    PubMed

    Mujalli, Randa Oqab; López, Griselda; Garach, Laura

    2016-03-01

    Traffic accident data sets are usually imbalanced, where the number of instances classified under the killed or severe injuries class (minority) is much lower than those classified under the slight injuries class (majority). This, however, poses a challenging problem for classification algorithms and may yield a model that covers the slight injuries instances well while frequently misclassifying the killed or severe injuries instances. Based on traffic accident data collected on urban and suburban roads in Jordan for three years (2009-2011), three different data balancing techniques were used: under-sampling, which removes some instances of the majority class; oversampling, which creates new instances of the minority class; and a mixed technique that combines both. In addition, different Bayes classifiers were compared on the imbalanced and balanced data sets: Averaged One-Dependence Estimators, Weightily Averaged One-Dependence Estimators, and Bayesian networks, in order to identify factors that affect the severity of an accident. The results indicated that using the balanced data sets, especially those created using oversampling techniques, with Bayesian networks improved the classification of a traffic accident according to its severity and reduced the misclassification of killed and severe injuries instances. On the other hand, the following variables were found to contribute to the occurrence of a killed casualty or a severe injury in a traffic accident: number of vehicles involved, accident pattern, number of directions, accident type, lighting, surface condition, and speed limit. This work, to the knowledge of the authors, is the first that aims at analyzing historical data records for traffic accidents occurring in Jordan and the first to apply balancing techniques to analyze the injury severity of traffic accidents. Copyright © 2015 Elsevier Ltd. All rights reserved.
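
    A minimal sketch of the random oversampling step; `X` and `y` stand in for the accident records and severity labels, and a Gaussian naive Bayes classifier would stand in for the Bayes classifiers the paper actually compares (AODE, WAODE, Bayesian networks):

```python
import numpy as np
from sklearn.utils import resample

def random_oversample(X, y):
    """Duplicate minority-class instances until every class matches the
    majority-class count (plain random oversampling; the paper also tests
    under-sampling and a mix of both)."""
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    Xs, ys = [], []
    for c in classes:
        Xc = X[y == c]
        Xs.append(resample(Xc, replace=True, n_samples=n_max, random_state=0))
        ys.append(np.full(n_max, c))
    return np.vstack(Xs), np.concatenate(ys)

# e.g. GaussianNB().fit(*random_oversample(X, y)), after
# `from sklearn.naive_bayes import GaussianNB`.
```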

  9. 49 CFR 240.309 - Railroad oversight responsibilities.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... reported train accidents attributed to poor safety performance by locomotive engineers; (3) The number and... and analysis concerning the administration of its program for responding to detected instances of poor... analysis shall involve: (1) The number and nature of the instances of detected poor safety conduct...

  10. Optimal Linking Design for Response Model Parameters

    ERIC Educational Resources Information Center

    Barrett, Michelle D.; van der Linden, Wim J.

    2017-01-01

    Linking functions adjust for differences between identifiability restrictions used in different instances of the estimation of item response model parameters. These adjustments are necessary when results from those instances are to be compared. As linking functions are derived from estimated item response model parameters, parameter estimation…

  11. Category Representation for Classification and Feature Inference

    ERIC Educational Resources Information Center

    Johansen, Mark K.; Kruschke, John K.

    2005-01-01

    This research's purpose was to contrast the representations resulting from learning of the same categories by either classifying instances or inferring instance features. Prior inference learning research, particularly T. Yamauchi and A. B. Markman (1998), has suggested that feature inference learning fosters prototype representation, whereas…

  12. A Hyper-Heuristic Ensemble Method for Static Job-Shop Scheduling.

    PubMed

    Hart, Emma; Sim, Kevin

    2016-01-01

    We describe a new hyper-heuristic method NELLI-GP for solving job-shop scheduling problems (JSSP) that evolves an ensemble of heuristics. The ensemble adopts a divide-and-conquer approach in which each heuristic solves a unique subset of the instance set considered. NELLI-GP extends an existing ensemble method called NELLI by introducing a novel heuristic generator that evolves heuristics composed of linear sequences of dispatching rules: each rule is represented using a tree structure and is itself evolved. Following a training period, the ensemble is shown to outperform both existing dispatching rules and a standard genetic programming algorithm on a large set of new test instances. In addition, it obtains superior results on a set of 210 benchmark problems from the literature when compared to two state-of-the-art hyper-heuristic approaches. Further analysis of the relationship between heuristics in the evolved ensemble and the instances each solves provides new insights into features that might describe similar instances.

  13. Multi-View Multi-Instance Learning Based on Joint Sparse Representation and Multi-View Dictionary Learning.

    PubMed

    Li, Bing; Yuan, Chunfeng; Xiong, Weihua; Hu, Weiming; Peng, Houwen; Ding, Xinmiao; Maybank, Steve

    2017-12-01

    In multi-instance learning (MIL), the relations among instances in a bag convey important contextual information in many applications. Previous studies on MIL either ignore such relations or simply model them with a fixed graph structure, so that the overall performance inevitably degrades in complex environments. To address this problem, this paper proposes a novel multi-view multi-instance learning algorithm (M2IL) that combines multiple context structures in a bag into a unified framework. The novel aspects are: (i) we propose a sparse ε-graph model that can generate different graphs with different parameters to represent various context relations in a bag, (ii) we propose a multi-view joint sparse representation that integrates these graphs into a unified framework for bag classification, and (iii) we propose a multi-view dictionary learning algorithm to obtain a multi-view graph dictionary that considers cues from all views simultaneously to improve the discrimination of the M2IL. Experiments and analyses in many practical applications prove the effectiveness of the M2IL.

  14. Building Diversified Multiple Trees for classification in high dimensional noisy biomedical data.

    PubMed

    Li, Jiuyong; Liu, Lin; Liu, Jixue; Green, Ryan

    2017-12-01

    It is common that a trained classification model is applied to operating data that deviates from the training data because of noise. This paper tests an ensemble method, Diversified Multiple Trees (DMT), on its capability to classify instances from a new laboratory using a classifier built on the instances of another laboratory. DMT is tested on three real-world biomedical data sets from different laboratories, in comparison with four benchmark ensemble methods: AdaBoost, Bagging, Random Forests, and Random Trees. Experiments have also been conducted to study the limitations of DMT and its possible variations. Experimental results show that DMT is significantly more accurate than the other benchmark ensemble classifiers at classifying new instances from a laboratory different from the one whose instances were used to build the classifier. This paper demonstrates that the ensemble classifier DMT is more robust in classifying noisy data than other widely used ensemble methods. DMT works on data sets that support multiple simple trees.

  15. Drug-related webpages classification based on multi-modal local decision fusion

    NASA Astrophysics Data System (ADS)

    Hu, Ruiguang; Su, Xiaojing; Liu, Yanxin

    2018-03-01

    In this paper, multi-modal local decision fusion is used for the classification of drug-related webpages. First, meaningful text is extracted through HTML parsing, and effective images are chosen by the FOCARSS algorithm. Second, six SVM classifiers are trained for six kinds of drug-taking instruments, which are represented by PHOG. One SVM classifier is trained for cannabis, which is represented by the mid-level feature of the BOW model. For each instance in a webpage, the seven SVMs give seven labels for its image, and another seven labels are given by searching for the names of drug-taking instruments and cannabis in its related text. Concatenating the seven image labels and the seven text labels generates the representation of the instances in webpages. Finally, Multi-Instance Learning is used to classify the drug-related webpages. Experimental results demonstrate that the classification accuracy of multi-instance learning with multi-modal local decision fusion is much higher than that of single-modal classification.

  16. Multiple-Instance Regression with Structured Data

    NASA Technical Reports Server (NTRS)

    Wagstaff, Kiri L.; Lane, Terran; Roper, Alex

    2008-01-01

    We present a multiple-instance regression algorithm that models internal bag structure to identify the items most relevant to the bag labels. Multiple-instance regression (MIR) operates on a set of bags with real-valued labels, each containing a set of unlabeled items, in which the relevance of each item to its bag label is unknown. The goal is to predict the labels of new bags from their contents. Unlike previous MIR methods, MI-ClusterRegress can operate on bags that are structured in that they contain items drawn from a number of distinct (but unknown) distributions. MI-ClusterRegress simultaneously learns a model of the bag's internal structure, the relevance of each item, and a regression model that accurately predicts labels for new bags. We evaluated this approach on the challenging MIR problem of crop yield prediction from remote sensing data. MI-ClusterRegress provided predictions that were more accurate than those obtained with non-multiple-instance approaches or MIR methods that do not model the bag structure.

  17. Induced subgraph searching for geometric model fitting

    NASA Astrophysics Data System (ADS)

    Xiao, Fan; Xiao, Guobao; Yan, Yan; Wang, Xing; Wang, Hanzi

    2017-11-01

    In this paper, we propose a novel graph-based model fitting method to fit and segment multiple-structure data. In the graph constructed on the data, each model instance is represented as an induced subgraph. Following the idea of pursuing the maximum consensus, the multiple geometric model fitting problem is formulated as searching for a set of induced subgraphs including the maximum union set of vertices. After the generation and refinement of the induced subgraphs that represent the model hypotheses, the searching process is conducted on the "qualified" subgraphs. Multiple model instances can be simultaneously estimated by solving a converted problem. Then, we introduce an energy evaluation function to determine the number of model instances in the data. The proposed method is able to effectively estimate the number and the parameters of model instances in data severely corrupted by outliers and noise. Experimental results on synthetic data and real images validate the favorable performance of the proposed method compared with several state-of-the-art fitting methods.

  18. Configuring Airspace Sectors with Approximate Dynamic Programming

    NASA Technical Reports Server (NTRS)

    Bloem, Michael; Gupta, Pramod

    2010-01-01

    In response to changing traffic and staffing conditions, supervisors dynamically configure airspace sectors by assigning them to control positions. A finite horizon airspace sector configuration problem models this supervisor decision. The problem is to select an airspace configuration at each time step while considering a workload cost, a reconfiguration cost, and a constraint on the number of control positions at each time step. Three algorithms for this problem are proposed and evaluated: a myopic heuristic, an exact dynamic programming algorithm, and a rollouts approximate dynamic programming algorithm. On problem instances from current operations with only dozens of possible configurations, an exact dynamic programming solution gives the optimal cost value. The rollouts algorithm achieves costs within 2% of optimal for these instances, on average. For larger problem instances that are representative of future operations and have thousands of possible configurations, excessive computation time prohibits the use of exact dynamic programming. On such problem instances, the rollouts algorithm reduces the cost achieved by the heuristic by more than 15% on average with an acceptable computation time.
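
    The rollouts idea can be sketched generically: score each candidate configuration by its immediate cost plus the cost of letting the base heuristic finish the horizon, then take the cheapest. A sketch under assumed `actions`, `step`, and `heuristic_policy` callables, not the paper's exact model:

```python
def rollout_policy(state, actions, step, heuristic_policy, horizon):
    """One-step rollout: score each candidate configuration by its immediate
    (workload + reconfiguration) cost plus the cost of letting the myopic
    heuristic finish the horizon; pick the cheapest."""
    def simulate(s, steps_left):
        cost = 0.0
        for _ in range(steps_left):
            s, c = step(s, heuristic_policy(s))  # follow the base heuristic
            cost += c
        return cost

    best_action, best_cost = None, float("inf")
    for a in actions(state):
        s1, c1 = step(state, a)                  # try configuration a now
        total = c1 + simulate(s1, horizon - 1)   # heuristic completes the horizon
        if total < best_cost:
            best_action, best_cost = a, total
    return best_action
```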

  19. Systems engineering interfaces: A model based approach

    NASA Astrophysics Data System (ADS)

    Fosse, E.; Delp, C. L.

    The engineering of interfaces is a critical function of the discipline of Systems Engineering. Included in interface engineering are instances of interaction. Interfaces provide the specifications of the relevant properties of a system or component that can be connected to other systems or components while instances of interaction are identified in order to specify the actual integration to other systems or components. Current Systems Engineering practices rely on a variety of documents and diagrams to describe interface specifications and instances of interaction. The SysML[1] specification provides a precise model based representation for interfaces and interface instance integration. This paper will describe interface engineering as implemented by the Operations Revitalization Task using SysML, starting with a generic case and culminating with a focus on a Flight System to Ground Interaction. The reusability of the interface engineering approach presented as well as its extensibility to more complex interfaces and interactions will be shown. Model-derived tables will support the case studies shown and are examples of model-based documentation products.

  20. Active Learning by Querying Informative and Representative Examples.

    PubMed

    Huang, Sheng-Jun; Jin, Rong; Zhou, Zhi-Hua

    2014-10-01

    Active learning reduces the labeling cost by iteratively selecting the most valuable data to query their labels. It has attracted a lot of interests given the abundance of unlabeled data and the high cost of labeling. Most active learning approaches select either informative or representative unlabeled instances to query their labels, which could significantly limit their performance. Although several active learning algorithms were proposed to combine the two query selection criteria, they are usually ad hoc in finding unlabeled instances that are both informative and representative. We address this limitation by developing a principled approach, termed QUIRE, based on the min-max view of active learning. The proposed approach provides a systematic way for measuring and combining the informativeness and representativeness of an unlabeled instance. Further, by incorporating the correlation among labels, we extend the QUIRE approach to multi-label learning by actively querying instance-label pairs. Extensive experimental results show that the proposed QUIRE approach outperforms several state-of-the-art active learning approaches in both single-label and multi-label learning.
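
    For contrast with QUIRE's combined criterion, a few lines of plain informativeness-only uncertainty sampling, the kind of single-criterion selection the paper argues is limiting; the margin computation assumes any scikit-learn-style classifier exposing `predict_proba`:

```python
import numpy as np

def uncertainty_query(model, X_pool):
    """Select the pool instance the current model is least sure about.
    This is informativeness-only uncertainty sampling, a single-criterion
    baseline that QUIRE improves on by also weighing how representative
    an instance is of the unlabeled pool."""
    proba = np.sort(model.predict_proba(X_pool), axis=1)
    margin = proba[:, -1] - proba[:, -2]   # gap between top two class probabilities
    return int(np.argmin(margin))          # smallest margin = most uncertain

# Loop: fit on the labeled set, query one index, reveal its label, repeat.
```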

  1. United Space Alliance, LLC Windchill As-Built Highlights

    NASA Technical Reports Server (NTRS)

    Richmond, Dena M.

    2011-01-01

    Work Order execution data from the Solumina As-Worked BOM (As-Built) in Solumina (Slide 2) is sent to Windchill. Windchill receives the As-Worked BOM data and 'marries' the Solumina execution data to the Part Instance/Product Structure/As-Design (Slide 3, As-Design; Slides 4 & 5, As-Built). As-Built to As-Design deltas are managed via the interface and Part Instance Attributes (Slide 6), including Work Order data and any CM comments. An As-Built to As-Design report is produced that also includes all Part Instance Attribute data (no screenshot, as this is still in development). Windchill and Solumina 'out of the box' functionality is utilized.

  2. Learned Helplessness: Theory and Evidence

    ERIC Educational Resources Information Center

    Maier, Steven F.; Seligman, Martin E. P.

    1976-01-01

    The authors believe that three phenomena are all instances of "learned helplessness": instances in which an organism has learned that outcomes are uncontrollable by its responses and is seriously debilitated by this knowledge. This article explores the evidence for the phenomenon of learned helplessness and discusses a variety of theoretical…

  3. INFORMS Section on Location Analysis Dissertation Award Submission

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Waddell, Lucas

    This research effort can be summarized by two main thrusts, each of which has a chapter of the dissertation dedicated to it. First, I pose a novel polyhedral approach for identifying polynomially solvable instances of the QAP based on an application of the reformulation-linearization technique (RLT), a general procedure for constructing mixed 0-1 linear reformulations of 0-1 programs. The feasible region of the continuous relaxation of the level-1 RLT form is a polytope having a highly specialized structure. Every binary solution to the QAP is associated with an extreme point of this polytope, and the objective function value is preserved at each such point. However, there exist extreme points that do not correspond to binary solutions. The key insight is a previously unnoticed and unexpected relationship between the polyhedral structure of the continuous relaxation of the level-1 RLT representation and various classes of readily solvable instances. Specifically, we show that a variety of apparently unrelated solvable cases of the QAP can all be categorized in the following sense: each such case has an objective function which ensures that an optimal solution to the continuous relaxation of the level-1 RLT form occurs at a binary extreme point. Interestingly, there exist instances that are solvable by the level-1 RLT form which do not satisfy the conditions of these cases, so that the level-1 form theoretically identifies a richer family of solvable instances. Second, I focus on instances of the QAP known in the literature as linearizable. An instance of the QAP is defined to be linearizable if and only if the problem can be equivalently written as a linear assignment problem that preserves the objective function value at all feasible solutions. I provide an entirely new polyhedral-based perspective on the concept of linearizability by showing that an instance of the QAP is linearizable if and only if a relaxed version of the continuous relaxation of the level-1 RLT form is bounded. I also show that the level-1 RLT form can identify a richer family of solvable instances than those deemed linearizable by demonstrating that the continuous relaxation of the level-1 RLT form can have an optimal binary solution for instances that are not linearizable. As a byproduct, I use this theoretical framework to explicitly characterize, in closed form, the dimensions of the level-1 RLT form and various other problem relaxations.
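
    For reference, the standard Koopmans-Beckmann form of the QAP, together with the linearizability condition as defined in the abstract, in LaTeX:

```latex
% Standard Koopmans-Beckmann form of the QAP: assign n facilities with
% flows f_{ij} to n locations with distances d_{kl} via a permutation \pi:
\min_{\pi \in S_n} \; \sum_{i=1}^{n} \sum_{j=1}^{n} f_{ij}\, d_{\pi(i)\pi(j)}

% Linearizability, as defined in the abstract: there exists a cost matrix
% (c_{ik}) such that, for every permutation \pi,
\sum_{i=1}^{n} \sum_{j=1}^{n} f_{ij}\, d_{\pi(i)\pi(j)} = \sum_{i=1}^{n} c_{i\pi(i)} .
```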

  4. A Pareto-based Ensemble with Feature and Instance Selection for Learning from Multi-Class Imbalanced Datasets.

    PubMed

    Fernández, Alberto; Carmona, Cristobal José; José Del Jesus, María; Herrera, Francisco

    2017-09-01

    Imbalanced classification is related to problems that have an uneven distribution among classes. In addition, when instances are located in the overlapped areas, correct modeling of the problem becomes harder. Current solutions for both issues are often focused on the binary case study, as multi-class datasets require an additional effort to be addressed. In this research, we overcome these problems by combining feature and instance selection. Feature selection simplifies the overlapping areas, easing the generation of rules to distinguish among the classes. Selection of instances from all classes addresses the imbalance itself by finding the most appropriate class distribution for the learning task, as well as possibly removing noise and difficult borderline examples. For the sake of obtaining an optimal joint set of features and instances, we embedded the search for both parameters in a Multi-Objective Evolutionary Algorithm, using the C4.5 decision tree as the baseline classifier in this wrapper approach. The multi-objective scheme offers a double advantage: the search space becomes broader, and we may provide a set of different solutions in order to build an ensemble of classifiers. This proposal has been contrasted against several state-of-the-art solutions for imbalanced classification, showing excellent results in both binary and multi-class problems.

  5. Exact and conceptual repetition dissociate conceptual memory tests: problems for transfer appropriate processing theory.

    PubMed

    McDermott, K B; Roediger, H L

    1996-03-01

    Three experiments examined whether a conceptual implicit memory test (specifically, category instance generation) would exhibit repetition effects similar to those found in free recall. The transfer appropriate processing account of dissociations among memory tests led us to predict that the tests would show parallel effects; this prediction was based upon the theory's assumption that conceptual tests will behave similarly as a function of various independent variables. In Experiment 1, conceptual repetition (i.e., following a target word [e.g., puzzles] with an associate [e.g., jigsaw]) did not enhance priming on the instance generation test relative to the condition of simply presenting the target word once, although this manipulation did affect free recall. In Experiment 2, conceptual repetition was achieved by following a picture with its corresponding word (or vice versa). In this case, there was an effect of conceptual repetition on free recall but no reliable effect on category instance generation or category cued recall. In addition, we obtained a picture superiority effect in free recall but not in category instance generation. In the third experiment, when the same study sequence was used as in Experiment 1, but with instructions that encouraged relational processing, priming on the category instance generation task was enhanced by conceptual repetition. Results demonstrate that conceptual memory tests can be dissociated and present problems for Roediger's (1990) transfer appropriate processing account of dissociations between explicit and implicit tests.

  6. A Semantically Enabled Metadata Repository for Solar Irradiance Data Products

    NASA Astrophysics Data System (ADS)

    Wilson, A.; Cox, M.; Lindholm, D. M.; Nadiadi, I.; Traver, T.

    2014-12-01

    The Laboratory for Atmospheric and Space Physics, LASP, has been conducting research in atmospheric and space science for over 60 years, and providing the associated data products to the public. LASP has a long history, in particular, of making space-based measurements of the solar irradiance, which serves as crucial input to several areas of scientific research, including solar-terrestrial interactions, atmospheric science, and climate. LISIRD, the LASP Interactive Solar Irradiance Data Center, serves these datasets to the public, including solar spectral irradiance (SSI) and total solar irradiance (TSI) data. The LASP extended metadata repository, LEMR, is a database of information about the datasets served by LASP, such as parameters, uncertainties, temporal and spectral ranges, current version, alerts, etc. It serves as the definitive, single source of truth for that information. The database is populated with information garnered via web forms and automated processes. Dataset owners keep the information current and verified for datasets under their purview. This information can be pulled dynamically for many purposes. Web sites such as LISIRD can include this information in web page content as it is rendered, ensuring users get current, accurate information. It can also be pulled to create metadata records in various metadata formats, such as SPASE (for heliophysics) and ISO 19115. Once these records are made available to the appropriate registries, our data will be discoverable by users coming in via those organizations. The database is implemented as an RDF triplestore, a collection of instances of subject-object-predicate data entities identifiable with a URI. This capability, coupled with SPARQL-over-HTTP read access, enables semantic queries over the repository contents. To create the repository we leveraged VIVO, an open source semantic web application, to manage and create new ontologies and populate repository content. A variety of ontologies were used in creating the triplestore, including ontologies that come with VIVO, such as FOAF. Also, the W3C DCAT ontology was integrated and extended to describe properties of our data products that we needed to capture, such as spectral range. The presentation will describe the architecture, ontology issues, and tools used to create LEMR, and plans for its evolution.
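
    A minimal sketch of a SPARQL-over-HTTP read against such a repository; the endpoint URL is hypothetical, and the query assumes the W3C DCAT vocabulary mentioned above:

```python
import requests

# Hypothetical endpoint; any triplestore speaking the SPARQL protocol works alike.
ENDPOINT = "https://lasp.example.edu/lemr/sparql"

QUERY = """
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct:  <http://purl.org/dc/terms/>
SELECT ?dataset ?title WHERE {
  ?dataset a dcat:Dataset ;
           dct:title ?title .
} LIMIT 10
"""

resp = requests.get(ENDPOINT, params={"query": QUERY},
                    headers={"Accept": "application/sparql-results+json"})
for b in resp.json()["results"]["bindings"]:
    print(b["dataset"]["value"], b["title"]["value"])
```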

  7. Validation of temporal and spatial consistency of facility- and speed-specific vehicle-specific power distributions for emission estimation: A case study in Beijing, China.

    PubMed

    Zhai, Zhiqiang; Song, Guohua; Lu, Hongyu; He, Weinan; Yu, Lei

    2017-09-01

    Vehicle-specific power (VSP) has been found to be highly correlated with vehicle emissions. It is used in many studies on emission modeling, such as the MOVES (Motor Vehicle Emissions Simulator) model. The existing studies develop specific VSP distributions (or OpMode distributions in MOVES) for different road types and various average speeds to represent vehicle operating modes on the road. However, it is still not clear whether the facility- and speed-specific VSP distributions are consistent temporally and spatially. For instance, is it necessary to update the database of VSP distributions in the emission model periodically? Are the VSP distributions developed in the city central business district (CBD) area applicable to its suburban areas? In this context, this study examined the temporal and spatial consistency of the facility- and speed-specific VSP distributions in Beijing. The VSP distributions in different years and in different areas were developed based on real-world vehicle activity data. The root mean square error (RMSE) is employed to quantify the difference between VSP distributions. The maximum differences of the VSP distributions between different years and between different areas are approximately 20% of that between different road types. The analysis of the carbon dioxide (CO2) emission factor indicates that the temporal and spatial differences of the VSP distributions have no significant impact on vehicle emission estimation, with a relative error of less than 3%. The temporal and spatial differences therefore have no significant impact on the development of facility- and speed-specific VSP distributions for vehicle emission estimation, and the database of specific VSP distributions in VSP-based emission models can be maintained over time. Thus, it is unnecessary to update the database regularly, and historical vehicle activity data can reliably be used to forecast future emissions. Within one city, areas with less data can still develop accurate VSP distributions based on better data from other areas.
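
    A minimal sketch of the RMSE comparison used above, assuming each VSP distribution is represented as a vector of time fractions over a common set of VSP bins; the bin values are invented:

    ```python
    import numpy as np

    # Two hypothetical facility- and speed-specific VSP distributions:
    # fractions of operating time falling in each VSP bin (each sums to 1).
    vsp_year_a = np.array([0.05, 0.12, 0.20, 0.25, 0.18, 0.12, 0.08])
    vsp_year_b = np.array([0.06, 0.11, 0.22, 0.24, 0.17, 0.12, 0.08])

    def rmse(p, q):
        """Root mean square error between two binned distributions."""
        p, q = np.asarray(p, float), np.asarray(q, float)
        return float(np.sqrt(np.mean((p - q) ** 2)))

    print(f"RMSE between years: {rmse(vsp_year_a, vsp_year_b):.4f}")
    ```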

  8. Data Collection, Collaboration, Analysis, and Publication Using the Open Data Repository's (ODR) Data Publisher

    NASA Astrophysics Data System (ADS)

    Lafuente, B.; Stone, N.; Bristow, T.; Keller, R. M.; Blake, D. F.; Downs, R. T.; Pires, A.; Dateo, C. E.; Fonda, M.

    2017-12-01

    In development for nearly four years, the Open Data Repository's (ODR) Data Publisher software has become a useful tool for researchers' data needs. Data Publisher facilitates the creation of customized databases with flexible permission sets that allow researchers to share data collaboratively while improving data discovery and maintaining ownership rights. The open source software provides an end-to-end solution from collection to final repository publication. A web-based interface allows researchers to enter data, view data, and conduct analysis using any programming language supported by JupyterHub (http://www.jupyterhub.org). This toolset makes it possible for researchers to store and manipulate their data in the cloud from any internet-capable device. Data can be embargoed in the system until a date selected by the researcher. For instance, open publication can be set to a date that coincides with publication of the data analysis in a third-party journal. In conjunction with teams at NASA Ames and the University of Arizona, a number of pilot studies are being conducted to guide the software development so that it allows researchers to publish and share their data. These pilots include (1) the Astrobiology Habitable Environments Database (AHED), a central searchable repository designed to promote and facilitate the integration and sharing of all the data generated by the diverse disciplines in astrobiology; (2) a database containing the raw and derived data products from the CheMin instrument on the MSL rover Curiosity (http://odr.io/CheMin), featuring a versatile graphing system, instructions and analytical tools to process the data, and a capability to download data in different formats; and (3) the Mineral Evolution project, which, by correlating the diversity of mineral species with their ages, localities, and other measurable properties, aims to understand how the episodes of planetary accretion and differentiation, plate tectonics, and the origin of life led to a selective evolution of mineral species through changes in temperature, pressure, and composition. Ongoing development will complete integration of third-party metadata standards and publishing of data to the semantic web. This project is supported by the Science-Enabling Research Activity (SERA) and NASA NNX11AP82A, MSL.

  9. Extraterrestrial cold chemistry. A need for a specific database.

    NASA Astrophysics Data System (ADS)

    Pernot, P.; Carrasco, N.; Dobrijevic, M.; Hébrard, E.; Plessis, S.; Wakelam, V.

    2008-09-01

    The major resource databases for building chemical models of photochemistry in cold environments are mainly based on those designed for Earth atmospheric chemistry or combustion, in which reaction rates are reported for temperatures typically above 300 K [1,2]. Kinetic data measured at low temperatures are very sparse; for instance, in state-of-the-art photochemical models of Titan's atmosphere, less than 10% of the rates have been measured in the relevant temperature range (100-200 K) [3-5]. In consequence, photochemical models rely mostly on low-temperature extrapolations by Arrhenius-type laws. There is more and more evidence that this is often inappropriate [6], and low-temperature extrapolations are hindered by very high uncertainty [3] (Fig. 1). The predictions of models based on those extrapolations are expected to be very inaccurate [4,7]. We argue that there is not much sense in increasing the complexity of the present models as long as this predictivity issue has not been resolved. Fig. 1: Uncertainty of low-temperature extrapolation for the N(2D) + C2H4 reaction rate, from measurements in the range 225-292 K [10], assuming an Arrhenius law (blue line). The sample of rate laws is generated by Monte Carlo uncertainty propagation after a Bayesian Data reAnalysis (BDA) of the experimental data. A dialogue between modellers and experimentalists is necessary to improve this situation. Considering the heavy costs of low-temperature reaction kinetics experiments, the identification of key reactions has to be based on an optimal strategy to improve the predictivity of photochemical models. This can be achieved by global sensitivity analysis, as illustrated on Titan atmospheric chemistry [8]. The main difficulty of this scheme is that it requires a lot of inputs, mainly the evaluation of uncertainty for extrapolated reaction rates. Although a large part has already been achieved by Hébrard et al. [3], extension and validation require a group of experts. A new generation of collaborative kinetic databases is needed to implement this scheme efficiently. The KIDA project [9], initiated by V. Wakelam for astrochemistry, has been joined by planetologists with similar prospects. EuroPlaNet will contribute to this effort through the organization of committees of experts on specific processes in atmospheric photochemistry.
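
    To make the extrapolation issue concrete, the following is a minimal sketch of Monte Carlo uncertainty propagation through an Arrhenius law, with invented fit parameters (not those of the N(2D) + C2H4 analysis); the spread of the sampled rate constants widens as the evaluation temperature drops below the measured range:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical Arrhenius fit k(T) = A * exp(-Ea / (R * T)) from
    # measurements at 225-292 K, with assumed, uncorrelated 1-sigma
    # uncertainties on ln(A) and Ea (all values invented).
    R = 8.314                       # J mol^-1 K^-1
    lnA_mu, lnA_sd = -23.0, 0.3     # ln of pre-exponential factor (assumed)
    Ea_mu, Ea_sd = 1500.0, 400.0    # activation energy in J/mol (assumed)

    def k_samples(T, n=10_000):
        lnA = rng.normal(lnA_mu, lnA_sd, n)
        Ea = rng.normal(Ea_mu, Ea_sd, n)
        return np.exp(lnA - Ea / (R * T))

    for T in (292.0, 225.0, 150.0):  # inside vs. below the measured range
        k = k_samples(T)
        lo, hi = np.percentile(k, [2.5, 97.5])
        print(f"T = {T:5.1f} K: 95% interval spans a factor of {hi / lo:.1f}")
    ```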

  10. Spatio-structural granularity of biological material entities

    PubMed Central

    2010-01-01

    Background: With the continuously increasing demands on knowledge and data management that databases have to meet, ontologies and the theories of granularity they use become more and more important. Unfortunately, currently used theories and schemes of granularity unnecessarily limit the performance of ontologies due to two shortcomings: (i) they do not allow the integration of multiple granularity perspectives into one granularity framework; (ii) they are not applicable to cumulative-constitutively organized material entities, which cover most of the biomedical material entities. Results: The above-mentioned shortcomings are responsible for the major inconsistencies in currently used spatio-structural granularity schemes. By using the Basic Formal Ontology (BFO) as a top-level ontology and Keet's general theory of granularity, a granularity framework is presented that is applicable to cumulative-constitutively organized material entities. It provides a scheme for granulating complex material entities into their constitutive and regional parts by integrating various compositional and spatial granularity perspectives. Within a scale-dependent resolution perspective, it even allows distinguishing different types of representations of the same material entity. Within other scale-dependent perspectives, which are based on specific types of measurements (e.g. weight, volume, etc.), it also provides the possibility of organizing instances of material entities independently of their parthood relations and only according to increasing measures. All granularity perspectives are connected to one another through overcrossing granularity levels, together forming an integrated whole that uses the compositional object perspective as an integrating backbone. This granularity framework allows structural granularity values to be assigned consistently to all different types of material entities. Conclusions: The framework presented here provides a spatio-structural granularity framework for all domain reference ontologies that model cumulative-constitutively organized material entities. With its multi-perspective approach, it allows an ontology stored in a database to be queried at one's desired level of detail: the contents of a database can be organized according to diverse granularity perspectives, which in turn provide different views of its content (i.e. data, knowledge), each organized into different levels of detail. PMID:20509878

  11. Are your Spectroscopic Data Being Used?

    NASA Astrophysics Data System (ADS)

    Gordon, Iouli E.; Rothman, Laurence S.; Wilzewski, Jonas

    2014-06-01

    Spectroscopy is an established and indispensable tool in science, industry, agriculture, medicine, surveillance, etc. A potential user of spectral data that are not available in HITRAN or other databases searches the spectroscopy publications. After finding the desired publication, the user very often encounters the following problems: 1) They cannot find the data described in the paper. There can be many reasons for this: nothing is provided in the paper itself or its supplementary material; the authors are not responding to any requests; the web links provided in the paper have long been broken; etc. 2) The data are presented in a reduced form, for instance through fitted spectroscopic constants. While this is a long-standing practice among spectroscopists, there are numerous serious problems with it, such as users getting different energy and intensity values because of different representations of the solution to the Hamiltonian, or even just despairing of trying to generate usable line lists from the published constants. Properly providing the data benefits not only users but also the authors of the spectroscopic research. We will show that this increases citations to the spectroscopy papers and the visibility of the research groups. We will also address the quite common situation in which researchers obtain data but do not feel that they have the time, interest, or resources to write an article describing them. There are modern tools that allow one to make these data available to potential users and still get credit for it. However, this is a worst-case recommendation; publishing the data in a peer-reviewed journal is still the preferred way. L. S. Rothman, I. E. Gordon, et al. "The HITRAN 2012 molecular spectroscopic database," JQSRT 130, 4-50 (2013).

  12. A data mining framework for time series estimation.

    PubMed

    Hu, Xiao; Xu, Peng; Wu, Shaozhi; Asgari, Shadnaz; Bergsneider, Marvin

    2010-04-01

    Time series estimation techniques are usually employed in biomedical research to derive variables that are less accessible from a set of related, more accessible variables. These techniques are traditionally built from systems-modeling approaches including simulation, blind deconvolution, and state estimation. In this work, we define the target time series (TTS) and its related time series (RTS) as the output and input of a time series estimation process, respectively. We then propose a novel data mining framework for time series estimation when TTS and RTS represent different sets of observed variables from the same dynamic system. This is made possible by mining a database of instances of TTS, its simultaneously recorded RTS, and the input/output dynamic models between them. The key mining strategy is to formulate a mapping function for each TTS-RTS pair in the database that translates a feature vector extracted from RTS to the dissimilarity between the true TTS and its estimate from the dynamic model associated with the same TTS-RTS pair. At run time, a feature vector is extracted from an inquiry RTS and supplied to the mapping function associated with each TTS-RTS pair to calculate a dissimilarity measure. An optimal TTS-RTS pair is then selected by analyzing these dissimilarity measures. The associated input/output model of the selected TTS-RTS pair is then used to simulate the TTS given the inquiry RTS as an input. An exemplary implementation was built to address the biomedical problem of noninvasive intracranial pressure assessment. The performance of the proposed method was superior to that of a simple training-free approach of finding the optimal TTS-RTS pair by a conventional similarity-based search on RTS features. 2009 Elsevier Inc. All rights reserved.
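
    A minimal structural sketch of the run-time selection step described above; the feature extractor, mapping functions, and dynamic models are illustrative stand-ins, not the models used for intracranial pressure assessment:

    ```python
    import math
    import numpy as np

    def extract_features(rts):
        """Hypothetical RTS features: mean, std, lag-1 autocorrelation."""
        rts = np.asarray(rts, float)
        ac1 = np.corrcoef(rts[:-1], rts[1:])[0, 1]
        return np.array([rts.mean(), rts.std(), ac1])

    class DatabaseEntry:
        def __init__(self, mapping_fn, dynamic_model):
            self.mapping_fn = mapping_fn        # features -> predicted dissimilarity
            self.dynamic_model = dynamic_model  # RTS -> estimated TTS

    def estimate_tts(inquiry_rts, database):
        feats = extract_features(inquiry_rts)
        # Pick the TTS-RTS pair whose mapping function predicts the smallest
        # dissimilarity, then simulate the TTS with that pair's model.
        best = min(database, key=lambda e: e.mapping_fn(feats))
        return best.dynamic_model(inquiry_rts)

    if __name__ == "__main__":
        db = [
            DatabaseEntry(lambda f: abs(f[1] - 1.0),  # toy mapping on RTS std
                          lambda r: [2 * x for x in r]),
            DatabaseEntry(lambda f: abs(f[1] - 5.0),
                          lambda r: [x + 1 for x in r]),
        ]
        rts = [math.sin(0.3 * t) for t in range(50)]
        print(estimate_tts(rts, db)[:5])
    ```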

  13. High-accuracy and robust face recognition system based on optical parallel correlator using a temporal image sequence

    NASA Astrophysics Data System (ADS)

    Watanabe, Eriko; Ishikawa, Mami; Ohta, Maiko; Kodate, Kashiko

    2005-09-01

    Face recognition is used in a wide range of security systems, such as monitoring credit card use, searching for individuals with street cameras via the Internet, and maintaining immigration control. There are still many technical subjects under study. For instance, the number of images that can be stored is limited under the current system, and the rate of recognition must be improved to account for photo shots taken at different angles under various conditions. We implemented a fully automatic Fast Face Recognition Optical Correlator (FARCO) system by using a 1000 frame/s optical parallel correlator designed and assembled by us. Operational speed for the 1:N identification experiment (i.e. matching a pair of images among N, where N refers to the number of images in the database; here 4000 face images) amounts to less than 1.5 seconds, including the pre/post processing. From trial 1:N identification experiments using FARCO, we acquired low error rates: a 2.6% false reject rate and a 1.3% false accept rate. By making the most of the high-speed data-processing capability of this system, much more robustness can be achieved for various recognition conditions when large-category data are registered for a single person. We propose a face recognition algorithm for the FARCO that employs a temporal sequence of moving images. Applying this algorithm to natural postures, we scored a recognition rate twice as high as that of our conventional system. The system has high potential for future use in a variety of applications, such as searching for criminal suspects with street and airport video cameras, registering newborns at hospitals, or handling very large numbers of images in a database.

  14. Determinants of seat belt use behaviour: a protocol for a systematic review.

    PubMed

    Ghaffari, Mohtasham; Armoon, Bahram; Rakhshanderou, Sakineh; Mehrabi, Yadollah; Soori, Hamid; Simsekoghlu, Ozelem; Harooni, Javad

    2018-05-03

    The use of seat belts can prevent severe collision damage to people in vehicle accidents and keep passengers from sustaining serious injuries; for instance, it can prevent passengers from being thrown out of a vehicle after a collision. The current systematic review will identify and analyse the determinants of seat belt use behaviour. We will include qualitative, quantitative and mixed-methods studies reporting data acquired from passengers aged more than 12 years and drivers, from both commercial and personal vehicles. Online databases including MEDLINE/PubMed, Scopus, Web of Science, Embase, the Cochrane Database of Systematic Reviews and PsycINFO will be searched in the current study. Published and available articles will be screened according to their titles and abstracts. Published papers conforming to the inclusion criteria will be organised for a complete review. Next, the full text of the remaining articles will be assessed independently for eligibility by two authors. The quality of the selected studies will be assessed with appropriate tools. Based on the information obtained from the data extraction, the types of determinants of seat belt use will be classified. Ethics approval is not required, because this is a protocol for a systematic review and no primary data will be collected. The authors will ensure that the rights of the articles used and included in the present systematic review are maintained. The findings of this review will be published in a relevant peer-reviewed journal. CRD42017067511. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  15. The UK medical education database (UKMED) what is it? Why and how might you use it?

    PubMed

    Dowell, Jon; Cleland, Jennifer; Fitzpatrick, Siobhan; McManus, Chris; Nicholson, Sandra; Oppé, Thomas; Petty-Saphon, Katie; King, Olga Sierocinska; Smith, Daniel; Thornton, Steve; White, Kirsty

    2018-01-05

    Educating doctors is expensive, and poor performance by future graduates can literally cost lives. Whilst the practice of medicine is highly evidence based, medical education is much less so. Research on medical school selection, undergraduate progression, Fitness to Practise (FtP) and postgraduate careers has been hampered across the globe by the challenges of uniting the data required. This paper describes the creation, structure and access arrangements for the first UK-wide attempt to do so. A collaborative approach has created a research database commencing with all entrants to UK medical schools in 2007 and 2008 (UKMED Phase 1). Here the content is outlined, governance arrangements are considered, system access is explained, and the potential implications of this new resource are discussed. The data currently include achievements prior to medical school entry, admissions tests, graduation point information and also all subsequent data collected by the General Medical Council, including FtP, career progression, annual National Training Survey (NTS) responses, career choice and postgraduate exam performance data. UKMED has grown since the pilot phase, with additional datasets, all subsequent years of students/trainees, and stronger governance processes. The inclusion of future cohorts and additional information such as admissions scores or bespoke surveys or assessments is now being piloted. Thus, for instance, new scrutiny can be applied to selection techniques and the effectiveness of educational interventions. Data are available free of charge for approved studies from suitable research groups worldwide. It is anticipated that UKMED will continue on a rolling basis. This has the potential to radically change the volume and types of research that can be envisaged and, therefore, to improve standards, facilitate workforce planning and support the regulation of medical education and training. This paper aspires to encourage proposals to utilise this exciting resource.

  16. TOOLKIT, Version 2. 0

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schroeder, E.; Bagot, B.; McNeill, R.L.

    1990-05-09

    The purpose of this User's Guide is to show by example many of the features of Toolkit II. Some examples will be copies of screens as they appear while running the Toolkit. Other examples will show what the user should enter in various situations; in these instances, what the computer asserts will be in boldface and what the user responds will be in regular type. The User's Guide is divided into four sections. The first section, "FOCUS Databases", will give a broad overview of the Focus administrative databases that are available on the VAX; easy-to-use reports are available for most of them in the Toolkit. The second section, "Getting Started", will cover the steps necessary to log onto the Computer Center VAX cluster and how to start Focus and the Toolkit. The third section, "Using the Toolkit", will discuss some of the features in the Toolkit -- the available reports and how to access them, as well as some utilities. The fourth section, "Helpful Hints", will cover some useful facts about the VAX and Focus as well as some of the more common problems that can occur. The Toolkit is not set in concrete but is continually being revised and improved. If you have any opinions as to changes that you would like to see made to the Toolkit or new features that you would like included, please let us know. Since we do try to respond to the needs of the user and make periodic improvements to the Toolkit, this User's Guide may not correspond exactly to what is available in the computer. In general, changes are made to provide new options or features; rarely is an existing feature deleted.

  17. Information Retrieval Performance of Probabilistically Generated, Problem-Specific Computerized Provider Order Entry Pick-Lists: A Pilot Study

    PubMed Central

    Rothschild, Adam S.; Lehmann, Harold P.

    2005-01-01

    Objective: The aim of this study was to preliminarily determine the feasibility of probabilistically generating problem-specific computerized provider order entry (CPOE) pick-lists from a database of explicitly linked orders and problems from actual clinical cases. Design: In a pilot retrospective validation, physicians reviewed internal medicine cases consisting of the admission history and physical examination and orders placed using CPOE during the first 24 hours after admission. They created coded problem lists and linked orders from individual cases to the problem for which they were most indicated. Problem-specific order pick-lists were generated by including a given order in a pick-list if the probability of linkage of order and problem (PLOP) equaled or exceeded a specified threshold. PLOP for a given linked order-problem pair was computed as its prevalence among the other cases in the experiment with the given problem. The orders that the reviewer linked to a given problem instance served as the reference standard to evaluate its system-generated pick-list. Measurements: Recall, precision, and length of the pick-lists. Results: Average recall reached a maximum of .67 with a precision of .17 and pick-list length of 31.22 at a PLOP threshold of 0. Average precision reached a maximum of .73 with a recall of .09 and pick-list length of .42 at a PLOP threshold of .9. Recall varied inversely with precision in classic information retrieval behavior. Conclusion: We preliminarily conclude that it is feasible to generate problem-specific CPOE pick-lists probabilistically from a database of explicitly linked orders and problems. Further research is necessary to determine the usefulness of this approach in real-world settings. PMID:15684134
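
    A minimal sketch of the pick-list construction described above, with invented problems and orders; for simplicity, PLOP is estimated here over all cases with the given problem, whereas the study used the other cases (a leave-one-out estimate):

    ```python
    from collections import Counter

    def pick_list(problem, cases, threshold):
        """cases: list of dicts mapping a problem to its set of linked orders.

        An order enters the pick-list when its probability of linkage
        (PLOP), estimated as its prevalence among cases with the problem,
        meets the threshold.
        """
        relevant = [c[problem] for c in cases if problem in c]
        counts = Counter(order for orders in relevant for order in orders)
        n = len(relevant)
        return sorted(o for o, k in counts.items() if k / n >= threshold)

    # Invented example cases with explicitly linked order-problem pairs.
    cases = [
        {"pneumonia": {"chest x-ray", "blood culture", "ceftriaxone"}},
        {"pneumonia": {"chest x-ray", "ceftriaxone"}},
        {"pneumonia": {"chest x-ray", "sputum culture"}},
    ]
    print(pick_list("pneumonia", cases, threshold=0.6))
    # -> ['ceftriaxone', 'chest x-ray']
    ```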

  18. CDSbank: taxonomy-aware extraction, selection, renaming and formatting of protein-coding DNA or amino acid sequences.

    PubMed

    Hazes, Bart

    2014-02-28

    Protein-coding DNA sequences and their corresponding amino acid sequences are routinely used to study relationships between sequence, structure, function, and evolution. The rapidly growing size of sequence databases increases the power of such comparative analyses, but it makes it more challenging to prepare high-quality sequence data sets with control over redundancy, quality, completeness, formatting, and labeling. Software tools for some individual steps in this process exist, but manual intervention remains a common and time-consuming necessity. CDSbank is a database that stores both the protein-coding DNA sequence (CDS) and amino acid sequence for each protein annotated in GenBank. CDSbank also stores GenBank feature annotation, a flag to indicate incomplete 5' and 3' ends, full taxonomic data, and a heuristic to rank the scientific interest of each species. This rich information allows fully automated data set preparation with a level of sophistication that aims to meet or exceed manual processing. Defaults ensure ease of use for typical scenarios while allowing great flexibility when needed. Access is via a free web server at http://hazeslab.med.ualberta.ca/CDSbank/. CDSbank presents a user-friendly web server to download, filter, format, and name large sequence data sets. Common usage scenarios can be accessed via pre-programmed default choices, while optional sections give full control over the processing pipeline. Particular strengths are: extracting protein-coding DNA sequences just as easily as amino acid sequences, full access to taxonomy for labeling and filtering, awareness of incomplete sequences, and the ability to take one protein sequence and extract all synonymous CDS or identical protein sequences in other species. Finally, CDSbank can also create labeled property files to, for instance, annotate or re-label phylogenetic trees.

  19. Exploring Spanish health social media for detecting drug effects.

    PubMed

    Segura-Bedmar, Isabel; Martínez, Paloma; Revert, Ricardo; Moreno-Schneider, Julián

    2015-01-01

    Adverse drug reactions (ADRs) cause a high number of deaths among hospitalized patients in developed countries. Major drug agencies have taken a great interest in the early detection of ADRs due to their high incidence and increasing health care costs. Reporting systems are available for both healthcare professionals and patients to give alerts about possible ADRs. However, several studies have shown that these adverse events are underestimated. Our hypothesis is that health social networks could be a significant information source for the early detection of ADRs as well as of new drug indications. In this work we present a system for detecting drug effects (which include both adverse drug reactions and drug indications) from user posts extracted from a Spanish health forum. Texts were processed using MeaningCloud, a multilingual text analysis engine, to identify drugs and effects. In addition, we developed the first Spanish database storing drugs as well as their effects, automatically built from drug package inserts gathered from online websites. We then applied a distant-supervision method using the database on a collection of 84,000 messages in order to extract the relations between drugs and their effects. To classify the relation instances, we used a kernel method based only on shallow linguistic information from the sentences. For relation extraction of drugs and their effects, the distant-supervision approach achieved a recall of 0.59 and a precision of 0.48. The task of extracting relations between drugs and their effects from social media is a complex challenge due to the characteristics of social media texts. These texts, typically posts or tweets, usually contain many grammatical errors and spelling mistakes. Moreover, patients use lay terminology to refer to diseases, symptoms and indications, terminology that is not usually included in lexical resources in languages other than English.
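
    A minimal sketch of the distant-supervision labeling idea, with invented database entries and an invented sentence; the real system operates on MeaningCloud annotations of Spanish forum posts:

    ```python
    # Distant supervision: sentences mentioning a drug together with an
    # effect listed for it in the database become positive training
    # instances. The drug-effect pairs below are invented examples.
    known_effects = {
        ("ibuprofeno", "dolor de cabeza"),
        ("omeprazol", "acidez"),
    }

    def label_candidates(sentence, drugs, effects):
        """Pair every drug/effect mention; positive if the pair is known."""
        instances = []
        for d in drugs:
            for e in effects:
                label = (d, e) in known_effects
                instances.append((d, e, sentence, label))
        return instances

    sent = "Tomo ibuprofeno y me quita el dolor de cabeza."
    print(label_candidates(sent, ["ibuprofeno"], ["dolor de cabeza"]))
    ```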

  20. The Nordic Obstetric Surveillance Study: a study of complete uterine rupture, abnormally invasive placenta, peripartum hysterectomy, and severe blood loss at delivery.

    PubMed

    Colmorn, Lotte B; Petersen, Kathrine B; Jakobsson, Maija; Lindqvist, Pelle G; Klungsoyr, Kari; Källen, Karin; Bjarnadottir, Ragnheidur I; Tapper, Anna-Maija; Børdahl, Per E; Gottvall, Karin; Thurn, Lars; Gissler, Mika; Krebs, Lone; Langhoff-Roos, Jens

    2015-07-01

    To assess the rates and characteristics of women with complete uterine rupture, abnormally invasive placenta, peripartum hysterectomy, and severe blood loss at delivery in the Nordic countries. Prospective, Nordic collaboration. The Nordic Obstetric Surveillance Study (NOSS) collected cases of severe obstetric complications in the Nordic countries from April 2009 to August 2012. Cases were reported by clinicians at the Nordic maternity units and retrieved from medical birth registers, hospital discharge registers, and transfusion databases by using International Classification of Diseases, 10th revision, codes on diagnoses and the Nordic Medico-Statistical Committee Classification of Surgical Procedures codes. Rates of the studied complications and possible risk factors among parturients in the Nordic countries. The studied complications were reported in 1019 instances among 605,362 deliveries during the study period. The reported rate of severe blood loss at delivery was 11.6/10,000 deliveries, complete uterine rupture was 5.6/10,000 deliveries, abnormally invasive placenta was 4.6/10,000 deliveries, and peripartum hysterectomy was 3.5/10,000 deliveries. Of the women, 25% had two or more complications. Compared with the total population, women with complications were more often >35 years old, overweight, of higher parity, and had a history of cesarean delivery. The studied obstetric complications are rare. Uniform definitions and valid reporting are essential for international comparisons. The main risk factors include previous cesarean section. The detailed information collected in the NOSS database provides a basis for epidemiologic studies, audits, and educational activities. © 2015 Nordic Federation of Societies of Obstetrics and Gynecology.

  1. Analysis of drug-drug interactions among patients receiving antiretroviral regimens using data from a large open-source prescription database.

    PubMed

    Patel, Nimish; Borg, Peter; Haubrich, Richard; McNicholl, Ian

    2018-06-14

    Results of a study of contraindicated concomitant medication use among recipients of preferred antiretroviral therapy (ART) regimens are reported. A retrospective study was conducted to evaluate concomitant medication use in a cohort of previously treatment-naive, human immunodeficiency virus (HIV)-infected U.S. patients prescribed preferred ART regimens during the period April 2014-March 2015. Data were obtained from a proprietary longitudinal prescription database; elements retrieved included age, sex, and prescription data. The outcome of interest was the frequency of drug-drug interactions (DDIs) associated with concomitant use of contraindicated medications. Data on 25,919 unique treatment-naive patients who used a preferred ART regimen were collected. Overall, there were 384 instances in which a contraindicated medication was dispensed for concurrent use with a recommended ART regimen. Rates of contraindicated concomitant medication use differed significantly by ART regimen; the highest rate (3.2%) was for darunavir plus ritonavir plus emtricitabine-tenofovir disoproxil fumarate (DRV plus RTV plus FTC/TDF), followed by elvitegravir-cobicistat-emtricitabine-tenofovir disoproxil fumarate (EVG/c/FTC/TDF) (2.8%). The highest frequencies of DDIs were associated with ART regimens that included a pharmacoenhancing agent: DRV plus RTV plus FTC/TDF (3.2%) and EVG/c/FTC/TDF (2.8%). In a large population of treatment-naive HIV-infected patients, ART regimens that contained a pharmacoenhancing agent were involved most frequently in contraindicated medication-related DDIs. All of the DDIs could have been avoided by using therapeutic alternatives within the same class not associated with a DDI. Copyright © 2018 by the American Society of Health-System Pharmacists, Inc. All rights reserved.

  2. Tachydysrhythmia treatment and adverse events in patients with Wolff-Parkinson-White syndrome.

    PubMed

    Siegelman, Jeffrey N; Marill, Keith A; Adler, Jonathan N

    2014-09-01

    Current guidelines recommend avoiding atrioventricular-nodal blocking agents (AVNBs) when treating tachydysrhythmias in Wolff-Parkinson-White syndrome (WPW) patients. We investigated medications selected and resulting outcomes for patients with tachydysrhythmias and WPW. In this single-center retrospective cohort study, we searched a hospital-wide database for the following inclusion criteria: WPW, tachycardia, and intravenous antidysrhythmics. The composite outcome of adverse events comprised acceleration of tachycardia, new hypotension, new malignant dysrhythmia, and cardioversion. The difference in binomial proportions of patients meeting the composite outcome after AVNB or non-AVNB (NAVNB) treatment was calculated after dividing the groups by QRS duration. A random-effects mixed linear analysis was performed to analyze the vital sign response. The initial database search yielded 1158 patient visits, with 60 meeting inclusion criteria. Patients' median age was 52.5 years; 53% were male, and 43% presented in wide complex tachycardia (WCT), with 75% in atrial fibrillation (AF) or flutter. AVNBs were administered in 42 (70%) patient visits. For those patients with WCT in AF, the difference in proportions of patients meeting the composite outcome after AVNB vs. NAVNB treatment was an increase of 3% (95% confidence interval [CI], -39% to 49%), and for those with narrow complex AF it was a decrease of 13% (95% CI, -37% to 81%). No instances of malignant dysrhythmia occurred. Mixed linear analysis showed no statistically significant effects on heart rate, though it suggested a trend toward increasing heart rate after AVNB in wide complex AF. In this sample of WPW-associated tachydysrhythmia patients, many were treated with AVNBs. The composite outcome was similarly met after use of either AVNB or NAVNB, and no malignant dysrhythmias were observed. Copyright © 2014 Elsevier Inc. All rights reserved.

  3. How Young Children Learn from Examples: Descriptive and Inferential Problems

    ERIC Educational Resources Information Center

    Kalish, Charles W.; Kim, Sunae; Young, Andrew G.

    2012-01-01

    Three experiments with preschool- and young school-aged children (N = 75 and 53) explored the kinds of relations children detect in samples of instances (descriptive problem) and how they generalize those relations to new instances (inferential problem). Each experiment initially presented a perfect biconditional relation between two features…

  4. Advanced Homomorphic Encryption its Applications and Derivatives (AHEAD)

    DTIC Science & Technology

    2013-09-01

    lattice problems. Quaderni di Matematica, 13:1–32, 2004. Preliminary version in STOC 1996. [Ajt99] M. Ajtai. Generating hard instances of the short...search-to-decision reduction of [17]. References: [1] M. Ajtai. Generating hard instances of lattice problems. Quaderni di Matematica, 13:1–32, 2004

  5. Visual Representations of Academic Misconduct: Enhancing Information Literacy Skills

    ERIC Educational Resources Information Center

    Ivancic, Sonia R.; Hosek, Angela M.

    2017-01-01

    Courses: This unit activity is suited for courses with research and source citation components, such as the Basic Communication; Interpersonal, and Organizational Communication courses. Objectives: Students will (a) visually interpret and analyze instances of plagiarism; (b) revise their work to use proper citations and reduce instances of…

  6. Some Implications of a Diversifying Workforce for Governance and Management

    ERIC Educational Resources Information Center

    Whitchurch, Celia; Gordon, George

    2011-01-01

    This paper suggests that as university missions have adapted to accommodate major developments associated with, for instance, mass higher education and internationalisation agendas, university workforces have diversified. They now, for instance, incorporate practitioners in areas such as health and social care, and professional staff who support…

  7. 15 CFR 700.54 - Instances where assistance will not be provided.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 15 Commerce and Foreign Trade 2 2010-01-01 2010-01-01 false Instances where assistance will not be provided. 700.54 Section 700.54 Commerce and Foreign Trade Regulations Relating to Commerce and Foreign Trade (Continued) BUREAU OF INDUSTRY AND SECURITY, DEPARTMENT OF COMMERCE NATIONAL SECURITY INDUSTRIAL...

  8. 15 CFR 700.54 - Instances where assistance will not be provided.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 15 Commerce and Foreign Trade 2 2013-01-01 2013-01-01 false Instances where assistance will not be provided. 700.54 Section 700.54 Commerce and Foreign Trade Regulations Relating to Commerce and Foreign Trade (Continued) BUREAU OF INDUSTRY AND SECURITY, DEPARTMENT OF COMMERCE NATIONAL SECURITY INDUSTRIAL...

  9. Labels Facilitate Infants' Comparison of Action Goals

    ERIC Educational Resources Information Center

    Gerson, Sarah A.; Woodward, Amanda L.

    2014-01-01

    Understanding the actions of others depends on the insight that these actions are structured by intentional relations. In a number of conceptual domains, comparison with familiar instances has been shown to support children's and adults' ability to discern the relational structure of novel instances. Recent evidence suggests that this process…

  10. Number Partitioning via Quantum Adiabatic Computation

    NASA Technical Reports Server (NTRS)

    Smelyanskiy, Vadim N.; Toussaint, Udo

    2002-01-01

    We study both analytically and numerically the complexity of the adiabatic quantum evolution algorithm applied to random instances of combinatorial optimization problems. We use as an example the NP-complete set partition problem and obtain an asymptotic expression for the minimal gap separating the ground and excited states of a system during the execution of the algorithm. We show that for computationally hard problem instances the size of the minimal gap scales exponentially with the problem size. This result is in qualitative agreement with direct numerical simulation of the algorithm for small instances of the set partition problem. We describe the statistical properties of the optimization problem that are responsible for the exponential behavior of the algorithm.
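
    For readers unfamiliar with it, set partition asks for a split of a set of numbers into two subsets whose sums are as equal as possible. A brute-force classical sketch on an invented instance makes the scaling tangible: the search space doubles with every added number, which is why instance size matters for both classical and adiabatic approaches:

    ```python
    from itertools import product

    def best_partition(numbers):
        """Exhaustively minimize |sum(S) - sum(complement)| over 2^n splits."""
        best_residue, best_signs = float("inf"), None
        for signs in product((1, -1), repeat=len(numbers)):
            residue = abs(sum(s * x for s, x in zip(signs, numbers)))
            if residue < best_residue:
                best_residue, best_signs = residue, signs
        return best_residue, best_signs

    # Invented instance: residue 0 with {4, 5, 6} vs. {7, 8}.
    print(best_partition([4, 5, 6, 7, 8]))
    ```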

  11. Felyx : A Free Open Software Solution for the Analysis of Large Earth Observation Datasets

    NASA Astrophysics Data System (ADS)

    Piolle, Jean-Francois; Shutler, Jamie; Poulter, David; Guidetti, Veronica; Donlon, Craig

    2014-05-01

    The GHRSST project, by assembling large collections of earth observation data from various sources and agencies, has also raised the need to provide the user community with tools to inter-compare them and to assess and monitor their quality. The ESA/Medspiration project, which implemented the first operating node of the GHRSST system for Europe, also paved the way successfully towards such generic analytics tools by developing the High Resolution Diagnostic Dataset System (HR-DDS) and satellite-to-in-situ multi-sensor match-up databases. Building on this heritage, ESA is now funding the development by IFREMER, PML and Pelamis of felyx, a web tool merging the two capabilities into a single software solution. It will be a free, open-source solution, written in Python and JavaScript, whose aim is to provide Earth Observation data producers and users with a flexible and reusable tool to allow the quality and performance of data streams (satellite, in situ and model) to be easily monitored and studied. The primary concept of felyx is to work as an extraction tool, subsetting source data over predefined target areas (which can be static or moving): these data subsets, and associated metrics, can then be accessed by users or client applications either as raw files, as automatic alerts and reports generated periodically, or through a flexible web interface enabling statistical analysis and visualization. Felyx presents itself as an open-source suite of tools enabling: * subsetting large local or remote collections of Earth Observation data over predefined sites (geographical boxes) or moving targets (ship, buoy, hurricane), storing locally the extracted data (referred to as miniProds). These miniProds constitute a much smaller representative subset of the original collection on which one can perform any kind of processing or assessment without having to cope with heavy volumes of data. * computing statistical metrics over these miniProds, using for instance a set of usual statistical operators (mean, median, rms, ...), fully extensible and applicable to any variable of a dataset. These metrics are stored in a fast search engine, queryable by humans and automated applications. * reporting or alerting, based on user-defined inference rules, through various media (emails, twitter feeds, ...) and devices (phones, tablets). * analysing miniProds and metrics through a web interface allowing users to dig into this base of information and extract useful knowledge through multidimensional interactive display functions (time series, scatterplots, histograms, maps). The services provided by felyx will be generic, deployable at users' own premises and adaptable enough to integrate any kind of parameters. Users will be able to operate their own felyx instance at any location, on datasets and parameters of their own interest, and the various instances will be able to interact with each other, creating a web of felyx systems enabling aggregation and cross-comparison of miniProds and metrics from multiple sources. Initially two instances will be operated simultaneously during a 6-month demonstration phase: at IFREMER, on sea surface temperature (for the GHRSST community) and ocean wave datasets, and at PML, on ocean colour. We will present results from the felyx project, demonstrate how the GHRSST community can exploit felyx, and show how the wider community can make use of GHRSST data within felyx.
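
    A minimal sketch of the kind of per-site metric computation described above, with an invented miniProd array; the real tool stores such metrics in a search engine rather than printing them:

    ```python
    import numpy as np

    # Hypothetical miniProd: sea surface temperatures (K) extracted over
    # one predefined site, with a missing pixel represented as NaN.
    miniprod_sst = np.array([291.2, 291.5, np.nan, 292.0, 291.8])

    metrics = {
        "mean": np.nanmean(miniprod_sst),
        "median": np.nanmedian(miniprod_sst),
        "rms": np.sqrt(np.nanmean(miniprod_sst ** 2)),
    }
    print({name: round(float(v), 2) for name, v in metrics.items()})
    ```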

  12. Sentence Comprehension in Children with Specific Language Impairment: Effects of Input Rate and Phonological Working Memory

    ERIC Educational Resources Information Center

    Montgomery, James W.

    2004-01-01

    Many children with specific language impairment (SLI) exhibit sentence comprehension difficulties. In some instances, these difficulties appear to be related to poor linguistic knowledge and, in other instances, to inferior general processing abilities. Two processing deficiencies evidenced by these children include reduced linguistic processing…

  13. Understanding Positive and Negative Communication Instances between Special Educators and Parents of High School Students with EBD

    ERIC Educational Resources Information Center

    Mires, Carolyn B.

    2015-01-01

    Using a multiple case study methodology, interviews were conducted to examine current practices and perceptions of the communication practices of teachers working with high school students with emotional and behavioral disorders (EBD). These interviews involved questions about general communication instances which occurred each week, communication…

  14. Refactoring a CS0 Course for Engineering Students to Use Active Learning

    ERIC Educational Resources Information Center

    Lokkila, Erno; Kaila, Erkki; Lindén, Rolf; Laakso, Mikko-Jussi; Sutinen, Erkki

    2017-01-01

    Purpose: The purpose of this paper was to determine whether applying e-learning material to a course leads to consistently improved student performance. Design/methodology/approach: This paper analyzes grade data from seven instances of the course. The first three instances were performed traditionally. After an intervention, in the form of…

  15. First observed instance of polygyny in Flammulated Owls

    Treesearch

    Brian D. Linkhart; Erin M. Evers; Julie D. Megler; Eric C. Palm; Catherine M. Salipante; Scott W. Yanco

    2008-01-01

    We document the first observed instance of polygyny in Flammulated Owls (Otus flammeolus) and the first among insectivorous raptors. Chronologies of the male's two nests, which were 510 m apart, were separated by nearly 2 weeks. Each brood initially consisted of three owlets, similar to the mean brood size in monogamous pairs. The male delivered...

  16. Beyond Physics: A Case for Far Transfer

    ERIC Educational Resources Information Center

    Forsyth, Benjamin Robert

    2012-01-01

    This is a case study of a physics undergraduate who claimed that he "uses physics to understand other subjects." This statement suggested that this student could describe issues concerning the transfer of learning and especially instances of far transfer. Detailed instances of far transfer have been difficult to replicate in lab settings.…

  17. 78 FR 56150 - Airworthiness Directives; Piper Aircraft, Inc. Airplanes

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-09-12

    ..., PA-46-350P, PA-46R-350T, and PA-46-500TP airplanes. There is an incorrect reference to a paragraph designation, four instances of an incorrect reference to the paragraph in the service bulletin that references... instances of an incorrect reference to the paragraph in the service bulletin that references an...

  18. 10 CFR 217.44 - Instances where assistance may not be provided.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 10 Energy 3 2013-01-01 2013-01-01 false Instances where assistance may not be provided. 217.44 Section 217.44 Energy DEPARTMENT OF ENERGY OIL ENERGY PRIORITIES AND ALLOCATIONS SYSTEM Special Priorities... where assistance may not be provided include situations when a person is attempting to: (a) Secure a...

  19. 10 CFR 217.44 - Instances where assistance may not be provided.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 10 Energy 3 2014-01-01 2014-01-01 false Instances where assistance may not be provided. 217.44 Section 217.44 Energy DEPARTMENT OF ENERGY OIL ENERGY PRIORITIES AND ALLOCATIONS SYSTEM Special Priorities... where assistance may not be provided include situations when a person is attempting to: (a) Secure a...

  20. 10 CFR 217.44 - Instances where assistance may not be provided.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 10 Energy 3 2012-01-01 2012-01-01 false Instances where assistance may not be provided. 217.44 Section 217.44 Energy DEPARTMENT OF ENERGY OIL ENERGY PRIORITIES AND ALLOCATIONS SYSTEM Special Priorities... where assistance may not be provided include situations when a person is attempting to: (a) Secure a...

  1. 28 CFR 51.46 - Reconsideration of objection at the instance of the Attorney General.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 28 Judicial Administration 2 2010-07-01 2010-07-01 false Reconsideration of objection at the instance of the Attorney General. 51.46 Section 51.46 Judicial Administration DEPARTMENT OF JUSTICE (CONTINUED) PROCEDURES FOR THE ADMINISTRATION OF SECTION 5 OF THE VOTING RIGHTS ACT OF 1965, AS AMENDED...

  2. 28 CFR 51.46 - Reconsideration of objection at the instance of the Attorney General.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 28 Judicial Administration 2 2011-07-01 2011-07-01 false Reconsideration of objection at the instance of the Attorney General. 51.46 Section 51.46 Judicial Administration DEPARTMENT OF JUSTICE (CONTINUED) PROCEDURES FOR THE ADMINISTRATION OF SECTION 5 OF THE VOTING RIGHTS ACT OF 1965, AS AMENDED...

  3. Developing Formal Object-oriented Requirements Specifications: A Model, Tool and Technique.

    ERIC Educational Resources Information Center

    Jackson, Robert B.; And Others

    1995-01-01

    Presents a formal object-oriented specification model (OSS) for computer software system development that is supported by a tool that automatically generates a prototype from an object-oriented analysis model (OSA) instance, lets the user examine the prototype, and permits the user to refine the OSA model instance to generate a requirements…

  4. 2008 Service Academy Gender Relations Survey

    DTIC Science & Technology

    2008-12-01

    quid pro quo harassment. Three component measures of sexual harassment are derived from Q17. The...classic quid pro quo instances of specific treatment or favoritism conditioned on sexual cooperation. The measurement of these behaviors is derived from...relationship; • Sexual coercion: classic quid pro quo instances of specific treatment or favoritism conditioned on sexual cooperation.

  5. 78 FR 68911 - Public Company Accounting Oversight Board; Notice of Filing of Proposed Rules on Attestation...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-11-15

    ... determining whether the controls have been implemented. c. Obtain an understanding of instances of non... control include: The nature of the financial responsibility rule; The risk associated with non-compliance... identified instance of non-compliance or an identified Deficiency in Internal Control Over Compliance is an...

  6. In "Other" Words: Some Thoughts on the Transferability of Collocations

    ERIC Educational Resources Information Center

    Odlin, Terence

    2011-01-01

    In discussions of cross-linguistic influence (also known as language transfer), the focus is usually on the influence of a particular structure in a particular instance of language contact, for instance, the negative transfer of serial verbs by Vietnamese learners of English: "She has managed to rise the kite fly over the tallest…

  7. Developmental Changes in Visual Object Recognition between 18 and 24 Months of Age

    ERIC Educational Resources Information Center

    Pereira, Alfredo F.; Smith, Linda B.

    2009-01-01

    Two experiments examined developmental changes in children's visual recognition of common objects during the period of 18 to 24 months. Experiment 1 examined children's ability to recognize common category instances that presented three different kinds of information: (1) richly detailed and prototypical instances that presented both local and…

  8. Hybrid value foraging: How the value of targets shapes human foraging behavior.

    PubMed

    Wolfe, Jeremy M; Cain, Matthew S; Alaoui-Soce, Abla

    2018-04-01

    In hybrid foraging, observers search visual displays for multiple instances of multiple target types. In previous hybrid foraging experiments, although there were multiple types of target, all instances of all targets had the same value. Under such conditions, behavior was well described by the marginal value theorem (MVT). Foragers left the current "patch" for the next patch when the instantaneous rate of collection dropped below their average rate of collection. An observer's specific target selections were shaped by previous target selections. Observers were biased toward picking another instance of the same target. In the present work, observers forage for instances of four target types whose value and prevalence can vary. If value is kept constant and prevalence manipulated, participants consistently show a preference for the most common targets. Patch-leaving behavior follows MVT. When value is manipulated, observers favor more valuable targets, though individual foraging strategies become more diverse, with some observers favoring the most valuable target types very strongly, sometimes moving to the next patch without collecting any of the less valuable targets.
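
    A minimal sketch of the marginal value theorem rule that the patch-leaving behavior follows, with invented numbers; actual analyses estimate the instantaneous rate from recent inter-collection intervals rather than a single gap:

    ```python
    def should_leave(recent_interval, total_collected, total_time):
        """MVT rule: leave when the instantaneous collection rate drops
        below the running average collection rate."""
        instantaneous_rate = 1.0 / recent_interval   # targets per second
        average_rate = total_collected / total_time
        return instantaneous_rate < average_rate

    # After 20 targets in 60 s (average 0.33/s), a 5 s gap since the last
    # pick (0.2/s) triggers a move to the next patch.
    print(should_leave(recent_interval=5.0, total_collected=20,
                       total_time=60.0))
    ```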

  9. A practical approximation algorithm for solving massive instances of hybridization number for binary and nonbinary trees.

    PubMed

    van Iersel, Leo; Kelk, Steven; Lekić, Nela; Scornavacca, Celine

    2014-05-05

    Reticulate events play an important role in determining evolutionary relationships. The problem of computing the minimum number of such events to explain discordance between two phylogenetic trees is a hard computational problem. Even for binary trees, exact solvers struggle to solve instances with reticulation number larger than 40-50. Here we present CycleKiller and NonbinaryCycleKiller, the first methods to produce solutions verifiably close to optimality for instances with hundreds or even thousands of reticulations. Using simulations, we demonstrate that these algorithms run quickly for large and difficult instances, producing solutions that are very close to optimality. As a spin-off from our simulations we also present TerminusEst, which is the fastest exact method currently available that can handle nonbinary trees: this is used to measure the accuracy of the NonbinaryCycleKiller algorithm. All three methods are based on extensions of previous theoretical work (SIDMA 26(4):1635-1656, TCBB 10(1):18-25, SIDMA 28(1):49-66) and are publicly available. We also apply our methods to real data.

  10. Computer Program Recognizes Patterns in Time-Series Data

    NASA Technical Reports Server (NTRS)

    Hand, Charles

    2003-01-01

    A computer program recognizes selected patterns in time-series data, such as digitized samples of seismic or electrophysiological signals. The program implements an artificial neural network (ANN) and a set of N clocks for the purpose of determining whether N or more instances of a certain waveform, W, occur within a given time interval, T. The ANN must be trained to recognize W in the incoming stream of data. The first time the ANN recognizes W, it sets clock 1 to count down from T to zero; the second time it recognizes W, it sets clock 2 to count down from T to zero, and so forth through the Nth instance. On the (N+1)st instance, the cycle is repeated, starting with clock 1. If any clock has not reached zero when it is reset, then N instances of W have been detected within time T, and the program so indicates. The program can readily be encoded in a field-programmable gate array or an application-specific integrated circuit that could be used, for example, to detect electroencephalographic or electrocardiographic waveforms indicative of epileptic seizures or heart attacks, respectively.
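
    A minimal sketch of the clock bookkeeping described above, with the trained ANN replaced by a precomputed list of detection times; N, T, and the event times are invented:

    ```python
    def detect_bursts(event_times, N, T):
        """Yield times at which the described N-clock scheme fires: a clock
        that is still counting down when its turn comes around again means
        the required number of detections of W fell within T."""
        clocks = [None] * N   # start time of the countdown in each slot
        for i, t in enumerate(event_times):
            slot = i % N      # cycle through clocks 1..N, then repeat
            prev = clocks[slot]
            if prev is not None and t - prev < T:
                yield t       # the slot's clock had not reached zero
            clocks[slot] = t  # reset this clock to count down from T

    # Detections of W at these times; N = 3 clocks, window T = 10 units.
    print(list(detect_bursts([0, 4, 7, 9, 30, 31, 33], N=3, T=10)))
    # -> [9]: the early cluster of detections falls within the window.
    ```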

  11. Predicting MHC-II binding affinity using multiple instance regression

    PubMed Central

    EL-Manzalawy, Yasser; Dobbs, Drena; Honavar, Vasant

    2011-01-01

    Reliably predicting the ability of antigen peptides to bind to major histocompatibility complex class II (MHC-II) molecules is an essential step in developing new vaccines. Uncovering the amino acid sequence correlates of the binding affinity of MHC-II binding peptides is important for understanding pathogenesis and immune response. The task of predicting MHC-II binding peptides is complicated by the significant variability in their length. Most existing computational methods for predicting MHC-II binding peptides focus on identifying a nine amino acids core region in each binding peptide. We formulate the problems of qualitatively and quantitatively predicting flexible length MHC-II peptides as multiple instance learning and multiple instance regression problems, respectively. Based on this formulation, we introduce MHCMIR, a novel method for predicting MHC-II binding affinity using multiple instance regression. We present results of experiments using several benchmark datasets that show that MHCMIR is competitive with the state-of-the-art methods for predicting MHC-II binding peptides. An online web server that implements the MHCMIR method for MHC-II binding affinity prediction is freely accessible at http://ailab.cs.iastate.edu/mhcmir. PMID:20855923
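
    A minimal sketch of the multiple-instance view of the problem: each flexible-length peptide is a bag of candidate 9-mer cores, and the bag-level affinity is taken from the best-scoring instance. The toy scoring function is an invented placeholder, not the MHCMIR regression model:

    ```python
    def nine_mer_instances(peptide):
        """Enumerate the candidate 9-mer binding cores of a peptide."""
        return [peptide[i:i + 9] for i in range(len(peptide) - 8)]

    def predict_bag_affinity(peptide, score_instance):
        # Common MIL assumption: the bag label is driven by its best
        # instance (the true binding core among the candidate 9-mers).
        return max(score_instance(core) for core in nine_mer_instances(peptide))

    # Toy instance scorer: fraction of hydrophobic residues in the core.
    HYDROPHOBIC = set("AILMFWVY")
    toy_score = lambda core: sum(aa in HYDROPHOBIC for aa in core) / 9.0

    print(predict_bag_affinity("GIKKVVFAALSDDQTK", toy_score))
    ```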

  12. Systems and methods to control multiple peripherals with a single-peripheral application code

    DOEpatents

    Ransom, Ray M.

    2013-06-11

    Methods and apparatus are provided for enhancing the BIOS of a hardware peripheral device to manage multiple peripheral devices simultaneously without modifying the application software of the peripheral device. The apparatus comprises a logic control unit and a memory in communication with the logic control unit. The memory is partitioned into a plurality of ranges, each range comprising one or more blocks of memory, one range being associated with each instance of the peripheral application and one range being reserved for storage of a data pointer related to each peripheral application of the plurality. The logic control unit is configured to operate multiple instances of the control application by duplicating one instance of the peripheral application for each peripheral device of the plurality and partitioning a memory device into partitions comprising one or more blocks of memory, one partition being associated with each instance of the peripheral application. The method then reserves a range of memory addresses for storage of a data pointer related to each peripheral device of the plurality, and initializes each of the plurality of peripheral devices.
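
    A minimal sketch, in Python for readability, of the memory layout the abstract describes: one range of blocks per duplicated application instance plus a reserved range of data pointers. Block size, range size, and pointer width are invented, and a real implementation would reside in the device BIOS rather than application code:

    ```python
    BLOCK = 256          # bytes per block (assumed)
    BLOCKS_PER_APP = 16  # blocks in each instance's range (assumed)
    PTR_BYTES = 4        # width of each stored data pointer (assumed)

    def layout(num_peripherals):
        """Partition memory: one range per peripheral application instance,
        then one reserved range holding a data pointer per device."""
        ranges, addr = {}, 0
        for dev in range(num_peripherals):
            ranges[f"app_instance_{dev}"] = (addr,
                                             addr + BLOCKS_PER_APP * BLOCK)
            addr += BLOCKS_PER_APP * BLOCK
        ranges["data_pointers"] = (addr, addr + PTR_BYTES * num_peripherals)
        return ranges

    for name, (lo, hi) in layout(3).items():
        print(f"{name}: 0x{lo:04X}-0x{hi:04X}")
    ```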

  13. An effective hybrid immune algorithm for solving the distributed permutation flow-shop scheduling problem

    NASA Astrophysics Data System (ADS)

    Xu, Ye; Wang, Ling; Wang, Shengyao; Liu, Min

    2014-09-01

    In this article, an effective hybrid immune algorithm (HIA) is presented to solve the distributed permutation flow-shop scheduling problem (DPFSP). First, a decoding method is proposed to transform a job permutation sequence into a feasible schedule considering both factory dispatching and job sequencing. Second, a local search with four search operators is presented based on the characteristics of the problem. Third, a special crossover operator is designed for the DPFSP, and mutation and vaccination operators are also applied within the framework of the HIA to perform an immune search. The influence of parameter setting on the HIA is investigated based on the Taguchi design-of-experiments method. Extensive numerical testing results based on 420 small-sized instances and 720 large-sized instances are provided. The effectiveness of the HIA is demonstrated by comparison with some existing heuristic algorithms and the variable neighbourhood descent methods. New best known solutions are obtained by the HIA for 17 out of 420 small-sized instances and 585 out of 720 large-sized instances.
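
    For readers unfamiliar with DPFSP decoding, the sketch below shows one widely used rule from the literature (assigning each job, in permutation order, to the factory where it would finish earliest); the paper's own decoding method may differ in its details.

      def decode(permutation, proc, n_factories):
          """proc[j][m] = processing time of job j on machine m."""
          n_machines = len(proc[0])
          # completion[f][m] = completion time of factory f's last job on machine m.
          completion = [[0] * n_machines for _ in range(n_factories)]
          assignment = [[] for _ in range(n_factories)]

          def finish_times(comp, job):
              # Flow-shop recursion: C[m] = max(C[m-1], previous C[m]) + p[job][m].
              c, times = 0, []
              for m in range(n_machines):
                  c = max(c, comp[m]) + proc[job][m]
                  times.append(c)
              return times

          for job in permutation:
              best = min(range(n_factories),
                         key=lambda f: finish_times(completion[f], job)[-1])
              completion[best] = finish_times(completion[best], job)
              assignment[best].append(job)
          return assignment, max(c[-1] for c in completion)  # schedules, makespan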

  14. Performance Analysis of Continuous Black-Box Optimization Algorithms via Footprints in Instance Space.

    PubMed

    Muñoz, Mario A; Smith-Miles, Kate A

    2017-01-01

    This article presents a method for the objective assessment of an algorithm's strengths and weaknesses. Instead of examining the performance of only one or more algorithms on a benchmark set, or generating custom problems that maximize the performance difference between two algorithms, our method quantifies both the nature of the test instances and the algorithm performance. Our aim is to gather information about possible phase transitions in performance, that is, the points at which a small change in problem structure produces algorithm failure. The method is based on the accurate estimation and characterization of the algorithm footprints, that is, the regions of instance space in which good or exceptional performance is expected from an algorithm. A footprint can be estimated for each algorithm and for the overall portfolio. To this end, we select a set of features to generate a common instance space, which we validate by constructing a sufficiently accurate prediction model. We characterize the footprints by their area and density. Our method identifies complementary performance between algorithms, quantifies the common features of hard problems, and locates regions where a phase transition may lie.
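
    A drastically simplified reading of the footprint idea, for orientation only (the paper's estimation procedure is more careful): project instances into a 2-D feature space, span the instances on which the algorithm performed well, and report the area and density of that region.

      import numpy as np
      from scipy.spatial import ConvexHull

      def footprint(features, good):
          """features: (n, 2) instance coordinates; good: boolean mask marking
          instances where the algorithm achieved good performance."""
          pts = features[good]           # needs at least 3 non-collinear points
          hull = ConvexHull(pts)
          area = hull.volume             # for 2-D input, .volume is the area
          density = good.sum() / area    # good instances per unit area
          return area, density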

  15. The iMars WebGIS - Spatio-Temporal Data Queries and Single Image Map Web Services

    NASA Astrophysics Data System (ADS)

    Walter, Sebastian; Steikert, Ralf; Schreiner, Bjoern; Muller, Jan-Peter; van Gasselt, Stephan; Sidiropoulos, Panagiotis; Lanz-Kroechert, Julia

    2017-04-01

    Introduction: Web-based planetary image dissemination platforms usually show outline coverages of the data and offer querying for metadata as well as preview and download, e.g. the HRSC Mapserver (Walter & van Gasselt, 2014). Here we introduce a new approach for a system dedicated to change detection by simultaneous visualisation of single-image time series in a multi-temporal context. While the usual form of presenting multi-orbit datasets is the merge of the data into a larger mosaic, we want to stay with the single image as an important snapshot of the planetary surface at a specific time. In the context of the EU FP-7 iMars project we process and ingest vast amounts of automatically co-registered (ACRO) images. The basis of the co-registration is the set of high-precision HRSC multi-orbit quadrangle image mosaics, which are based on bundle-block-adjusted multi-orbit HRSC DTMs. Additionally we make use of the existing bundle-adjusted HRSC single images available at the PDS archives. A prototype demonstrating the presented features is available at http://imars.planet.fu-berlin.de. Multi-temporal database: In order to locate multiple coverage of images and select images based on spatio-temporal queries, we consolidate available coverage catalogs for various NASA imaging missions into a relational database management system with geometry support. We harvest available metadata entries during our processing pipeline using the Integrated Software for Imagers and Spectrometers (ISIS) software. Currently, this database contains image outlines from the MGS/MOC, MRO/CTX and the MO/THEMIS instruments with imaging dates ranging from 1996 to the present. For the MEx/HRSC data, we already maintain a database which we automatically update with custom software based on the VICAR environment. Web Map Service with time support: The MapServer software is connected to the database and provides Web Map Services (WMS) with time support based on the START_TIME image attribute. It allows temporal WMS GetMap requests by setting additional TIME parameter values in the request. The values for the parameter represent an interval defined by its lower and upper bounds. As the WMS time standard only supports one time variable, only the start times of the images are considered. If no time values are submitted with the request, the full time range of all images is assumed as the default. Dynamic single image WMS: To compare images from different acquisition times at sites of multiple coverage, we have to load every image as a single WMS layer. Due to the vast amount of single images we need a way to set up the layers in a dynamic way - the map server does not know the images to be served beforehand. We use the MapScript interface to dynamically access MapServer's objects and configure the file name and path of the requested image in the map configuration. The layers are created on-the-fly, each representing only one single image. On the frontend side, the vendor-specific WMS request parameter (PRODUCTID) has to be appended to the regular set of WMS parameters. The request is then passed on to the MapScript instance. Web Map Tile Cache: In order to speed up access to the WMS requests, a MapCache instance has been integrated in the pipeline. As it is not aware of the available PDS product IDs which will be queried, the PRODUCTID parameter is configured as an additional dimension of the cache. The WMS request is received by the Apache webserver configured with the MapCache module.
If the tile is available in the tile cache, it is immediately delivered to the client. If not available, the tile request is forwarded to Apache and the MapScript module. The Python script intercepts the WMS request and extracts the product ID from the parameter chain. It loads the layer object from the map file and appends the file name and path of the requested image. After some possible further image processing inside the script (stretching, color matching), the request is submitted to the MapServer backend which in turn delivers the response back to the MapCache instance. Web frontend: We have implemented a web-GIS frontend based on various OpenLayers components. The basemap is a global color-hillshaded HRSC bundle-adjusted DTM mosaic with a resolution of 50 m per pixel. The new bundle-block-adjusted quadrangle mosaics of the MC-11 quadrangle, both image and DTM, are included with opacity slider options. The layer user interface has been adapted from the ol3-layerswitcher and extended by foldable and switchable groups, layer sorting (by resolution, by time and alphabetically) and reordering (drag-and-drop). A collapsible time panel accommodates a time slider interface where the user can filter the visible data by a range of Mars or Earth dates and/or by solar longitudes. The visualisation of time-series of single images is controlled by a specific toolbar enabling the workflow of image selection (by point or bounding box), dynamic image loading and playback of single images in a video player-like environment. During a stress-test campaign we demonstrated that the system is capable of serving up to 10 simultaneous users on its current lightweight development hardware. It is planned to relocate the software to more powerful hardware by the time of this conference. Conclusions/Outlook: The iMars webGIS is an expert tool for the detection and visualization of surface changes. We demonstrate a technique to dynamically retrieve and display single images based on the time-series structure of the data. Together with the multi-temporal database and its MapServer/MapCache backend it provides a stable and high-performance environment for the dissemination of the various iMars products. Acknowledgements: This research has received funding from the EU's FP7 Programme under iMars 607379 and by the German Space Agency (DLR Bonn), grant 50 QM 1301 (HRSC on Mars Express).
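
    The dynamic single-image mechanism can be illustrated with a short MapScript sketch. This is a hedged reconstruction from the description above, not the project's actual code; the map file name, layer name, and lookup function are assumptions, while PRODUCTID is the vendor parameter named in the text.

      import mapscript

      def handle_request(query_string, lookup_path):
          req = mapscript.OWSRequest()
          req.loadParamsFromURL(query_string)           # parse incoming WMS request
          product_id = req.getValueByName("PRODUCTID")  # vendor-specific parameter
          mapfile = mapscript.mapObj("imars.map")       # assumed map file
          layer = mapfile.getLayerByName("single_image")  # assumed template layer
          layer.data = lookup_path(product_id)          # point the layer at the image
          mapscript.msIO_installStdoutToBuffer()
          mapfile.OWSDispatch(req)                      # let MapServer render it
          return mapscript.msIO_getStdoutBufferBytes()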

  16. CLOUDCLOUD: general-purpose instrument monitoring and data managing software

    NASA Astrophysics Data System (ADS)

    Dias, António; Amorim, António; Tomé, António

    2016-04-01

    An effective experiment depends on the ability to store and deliver data and information to all participating parties, regardless of their degree of involvement in the specific parts that make the experiment a whole. Having fast, efficient and ubiquitous access to data will increase visibility and discussion, such that the outcome will have already been reviewed several times, strengthening the conclusions. The CLOUD project aims at providing users with a general-purpose data acquisition, management and instrument monitoring platform that is fast, easy to use, lightweight and accessible to all participants of an experiment. This work is now implemented in the CLOUD experiment at CERN and will be fully integrated with the experiment as of 2016. Despite being used in an experiment of the scale of CLOUD, this software can also be used in experiments or monitoring stations of any size, from single computers to large networks of computers, to monitor any sort of instrument output without influencing the individual instrument's DAQ. Instrument data and metadata are stored and accessed via a specially designed database architecture, and any type of instrument output is accepted using our continuously growing parsing application. Multiple databases can be used to separate different data-taking periods, or a single database can be used if, for instance, an experiment is continuous. A simple web-based application gives the user total control over the monitored instruments and their data, allowing data visualization and download, upload of processed data and the ability to edit existing instruments or add new instruments to the experiment. When in a network, new computers are immediately recognized and added to the system and are able to monitor instruments connected to them. Automatic computer integration is achieved by a locally running python-based parsing agent that communicates with a main server application, guaranteeing that all instruments assigned to that computer are monitored with parsing intervals as fast as milliseconds. This software (server+agents+interface+database) comes in easy, ready-to-use packages that can be installed in any operating system, including Android and iOS systems. This software is ideal for use in modular experiments or monitoring stations with large variability in instruments and measuring methods, or in large collaborations where data require homogenization in order to be effectively transmitted to all involved parties. This work presents the software and provides performance comparison with previously used monitoring systems in the CLOUD experiment at CERN.
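
    A parsing agent of the kind described might look like the following minimal sketch (the server endpoint, payload format, and instrument source are assumptions for illustration, not the CLOUD project's actual API).

      import time
      import requests

      SERVER = "http://main-server.example/api/measurements"  # hypothetical endpoint

      def parse_line(line):
          # Instrument-specific parsing would live here; assume "timestamp,value".
          ts, value = line.strip().split(",")
          return {"timestamp": ts, "value": float(value)}

      def run_agent(instrument_file, interval_s=0.001):  # millisecond-scale polling
          with open(instrument_file) as f:
              while True:
                  line = f.readline()
                  if line:
                      requests.post(SERVER, json=parse_line(line), timeout=5)
                  else:
                      time.sleep(interval_s)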

  17. Error reduction in EMG signal decomposition

    PubMed Central

    Kline, Joshua C.

    2014-01-01

    Decomposition of the electromyographic (EMG) signal into constituent action potentials and the identification of individual firing instances of each motor unit in the presence of ambient noise are inherently probabilistic processes, whether performed manually or with automated algorithms. Consequently, they are subject to errors. We set out to classify and reduce these errors by analyzing 1,061 motor-unit action-potential trains (MUAPTs), obtained by decomposing surface EMG (sEMG) signals recorded during human voluntary contractions. Decomposition errors were classified into two general categories: location errors representing variability in the temporal localization of each motor-unit firing instance and identification errors consisting of falsely detected or missed firing instances. To mitigate these errors, we developed an error-reduction algorithm that combines multiple decomposition estimates to determine a more probable estimate of motor-unit firing instances with fewer errors. The performance of the algorithm is governed by a trade-off between the yield of MUAPTs obtained above a given accuracy level and the time required to perform the decomposition. When applied to a set of sEMG signals synthesized from real MUAPTs, the identification error was reduced by an average of 1.78%, improving the accuracy to 97.0%, and the location error was reduced by an average of 1.66 ms. The error-reduction algorithm in this study is not limited to any specific decomposition strategy. Rather, we propose it be used for other decomposition methods, especially when analyzing precise motor-unit firing instances, as occurs when measuring synchronization. PMID:25210159
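
    The idea of combining multiple decomposition estimates can be sketched as a simple consensus over firing times (illustrative only; the authors' algorithm may align and weight estimates differently): firings reported by a majority of runs within a small tolerance window are accepted, and their mean is taken as the localization.

      def consensus_firings(estimates, tol_ms=2.0, min_votes=None):
          """estimates: list of sorted firing-time lists (ms), one per run."""
          min_votes = min_votes or (len(estimates) // 2 + 1)
          all_firings = sorted(t for est in estimates for t in est)
          accepted, cluster = [], []
          for t in all_firings:
              if cluster and t - cluster[0] > tol_ms:
                  if len(cluster) >= min_votes:       # majority support
                      accepted.append(sum(cluster) / len(cluster))
                  cluster = []
              cluster.append(t)
          if len(cluster) >= min_votes:
              accepted.append(sum(cluster) / len(cluster))
          return accepted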

  18. Practical optimization of Steiner trees via the cavity method

    NASA Astrophysics Data System (ADS)

    Braunstein, Alfredo; Muntoni, Anna

    2016-07-01

    The optimization version of the cavity method for single instances, called Max-Sum, has been applied in the past to the minimum Steiner tree problem on graphs and variants. Max-Sum has been shown experimentally to give asymptotically optimal results on certain types of weighted random graphs, and to give good solutions in short computation times for some types of real networks. However, the hypotheses behind the formulation and the cavity method itself substantially limit the class of instances on which the approach gives good results (or even converges). Moreover, in the standard model formulation, the diameter of the tree solution is limited by a predefined bound that affects both computation time and convergence properties. In this work we describe two main enhancements to the Max-Sum equations to be able to cope with optimization of real-world instances. First, we develop an alternative ‘flat’ model formulation that allows the relevant configuration space to be reduced substantially, making the approach feasible on instances with large solution diameter, in particular when the number of terminal nodes is small. Second, we propose an integration between Max-Sum and three greedy heuristics. This integration allows Max-Sum to be transformed into a highly competitive self-contained algorithm, in which a feasible solution is given at each step of the iterative procedure. Part of this development participated in the 2014 DIMACS Challenge on Steiner problems, and we report the results here. The performance on the challenge of the proposed approach was highly satisfactory: it maintained a small gap to the best bound in most cases, and obtained the best results on several instances in two different categories. We also present several improvements with respect to the version of the algorithm that participated in the competition, including new best solutions for some of the instances of the challenge.

  19. Toward an optimal solver for time-spectral fluid-dynamic and aeroelastic solutions on unstructured meshes

    NASA Astrophysics Data System (ADS)

    Mundis, Nathan L.; Mavriplis, Dimitri J.

    2017-09-01

    The time-spectral method applied to the Euler and coupled aeroelastic equations theoretically offers significant computational savings for purely periodic problems when compared to standard time-implicit methods. However, attaining superior efficiency with time-spectral methods over traditional time-implicit methods hinges on the ability to rapidly solve the large non-linear system resulting from time-spectral discretizations, which becomes larger and stiffer as more time instances are employed or the period of the flow becomes especially short (i.e. the maximum resolvable wave-number increases). In order to increase the efficiency of these solvers, and to improve robustness, particularly for large numbers of time instances, the Generalized Minimal Residual Method (GMRES) is used to solve the implicit linear system over all coupled time instances. The use of GMRES as the linear solver makes time-spectral methods more robust, allows them to be applied to a far greater subset of time-accurate problems, including those with a broad range of harmonic content, and vastly improves the efficiency of time-spectral methods. In previous work, a wave-number independent preconditioner that mitigates the increased stiffness of the time-spectral method when applied to problems with large resolvable wave numbers has been developed. This preconditioner, however, directly inverts a large matrix whose size increases in proportion to the number of time instances. As a result, the computational time of this method scales as the cube of the number of time instances. In the present work, this preconditioner has been reworked to take advantage of an approximate-factorization approach that effectively decouples the spatial and temporal systems. Once decoupled, the time-spectral matrix can be inverted in frequency space, where it has entries only on the main diagonal and therefore can be inverted quite efficiently. This new GMRES/preconditioner combination is shown to be over an order of magnitude more efficient than the previous wave-number independent preconditioner for problems with large numbers of time instances and/or large reduced frequencies.
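
    The claim that the time-spectral matrix "has entries only on the main diagonal" in frequency space reflects a standard fact: for a periodic problem the temporal coupling matrix is circulant, and circulant matrices are diagonalized by the DFT. A small numpy check of this property (illustrative, not the authors' solver):

      import numpy as np

      n = 8                                  # number of time instances
      c = np.random.rand(n) + 1.0            # first column of a circulant matrix D
      D = np.array([np.roll(c, i) for i in range(n)]).T
      lam = np.fft.fft(c)                    # eigenvalues of D under the DFT

      b = np.random.rand(n)
      x = np.fft.ifft(np.fft.fft(b) / lam).real  # solve D x = b in O(n log n)
      assert np.allclose(D @ x, b)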

  20. Detecting overlapping instances in microscopy images using extremal region trees.

    PubMed

    Arteta, Carlos; Lempitsky, Victor; Noble, J Alison; Zisserman, Andrew

    2016-01-01

    In many microscopy applications the images may contain both regions of low and high cell densities corresponding to different tissues or colonies at different stages of growth. This poses a challenge to most previously developed automated cell detection and counting methods, which are designed to handle either the low-density scenario (through cell detection) or the high-density scenario (through density estimation or texture analysis). The objective of this work is to detect all the instances of an object of interest in microscopy images. The instances may be partially overlapping and clustered. To this end we introduce a tree-structured discrete graphical model that is used to select and label a set of non-overlapping regions in the image by a global optimization of a classification score. Each region is labeled with the number of instances it contains - for example, regions can be selected that contain two or three object instances, by defining separate classes for tuples of objects in the detection process. We show that this formulation can be learned within the structured output SVM framework and that the inference in such a model can be accomplished using dynamic programming on a tree structured region graph. Furthermore, the learning only requires weak annotations - a dot on each instance. The candidate regions for the selection are obtained as extremal regions of a surface computed from the microscopy image, and we show that the performance of the model can be improved by considering a proxy problem for learning the surface that allows better selection of the extremal regions. Furthermore, we consider a number of variations for the loss function used in the structured output learning. The model is applied and evaluated over six quite disparate data sets of images covering: fluorescence microscopy, weak-fluorescence molecular images, phase contrast microscopy and histopathology images, and is shown to exceed the state of the art in performance. Copyright © 2015 Elsevier B.V. All rights reserved.

  1. Future aircraft networks and schedules

    NASA Astrophysics Data System (ADS)

    Shu, Yan

    2011-07-01

    Because of the importance of air transportation scheduling, the emergence of small aircraft and the vision of future fuel-efficient aircraft, this thesis has focused on the study of aircraft scheduling and network design involving multiple types of aircraft and flight services. It develops models and solution algorithms for the schedule design problem and analyzes the computational results. First, based on the current development of small aircraft and on-demand flight services, this thesis expands a business model for integrating on-demand flight services with the traditional scheduled flight services. This thesis proposes a three-step approach to the design of aircraft schedules and networks from scratch under the model. In the first step, both a frequency assignment model for scheduled flights that incorporates a passenger path choice model and a frequency assignment model for on-demand flights that incorporates a passenger mode choice model are created. In the second step, a rough fleet assignment model that determines a set of flight legs, each of which is assigned an aircraft type and a rough departure time, is constructed. In the third step, a timetable model that determines an exact departure time for each flight leg is developed. Based on the models proposed in the three steps, this thesis creates schedule design instances that involve almost all the major airports and markets in the United States. The instances of the frequency assignment model created in this thesis are large-scale non-convex mixed-integer programming problems, and this dissertation develops an overall network structure and proposes iterative algorithms for solving these instances. The instances of both the rough fleet assignment model and the timetable model created in this thesis are large-scale mixed-integer programming problems, and this dissertation develops subproblem schemes for solving these instances. Based on these solution algorithms, this dissertation also presents computational results of these large-scale instances. To validate the models and solution algorithms developed, this thesis also compares the daily flight schedules that it designs with the schedules of the existing airlines. Furthermore, it creates instances that represent different economic and fuel-price conditions and derives schedules under these different conditions. In addition, it discusses the implications of using new aircraft in the future flight schedules. Finally, future research in three areas (model, computational method, and simulation for validation) is proposed.

  2. Instance, Cue, and Dimension Learning in Concept Shift Task.

    ERIC Educational Resources Information Center

    Prentice, Joan L.; Panda, Kailas C.

    Experiment I was designed to demonstrate that young children fail to abstract the positive cue as the relevant stimulus event in a restricted concept-learning task. Sixteen kindergarten and 16 fourth grade subjects were trained to criterion on a Kendler-type task, whereupon each subject was presented a pair of new instances which contrasted only…

  3. Attributes of Instances of Student Mathematical Thinking That Are Worth Building on in Whole-Class Discussion

    ERIC Educational Resources Information Center

    Van Zoest, Laura R.; Stockero, Shari L.; Leatham, Keith R.; Peterson, Blake E.; Atanga, Napthalin A.; Ochieng, Mary A.

    2017-01-01

    This study investigated attributes of 278 instances of student mathematical thinking during whole-class interactions that were identified as having high potential, if made the object of discussion, to foster learners' understanding of important mathematical ideas. Attributes included the form of the thinking (e.g., question vs. declarative…

  4. The Myth of Social Class and Criminality: An Empirical Assessment of the Empirical Evidence.

    ERIC Educational Resources Information Center

    Tittle, Charles R.; And Others

    1978-01-01

    Thirty-five studies examining the relationship between social class and crime/delinquency are reduced to comparable statistics using as units of analysis instances where the relationship was studied for specific categories of age, sex, race, place of residence, data type, or offense. Findings from 363 instances are summarized and patterns are…

  5. Community Values and Unconventional Teacher Behavior: A National Canadian Study (1945-1985).

    ERIC Educational Resources Information Center

    Manley-Casimir, Michael; And Others

    This study focuses on the tension between community norms/values and teacher behavior in Canada. A series of instances where teachers in public or denominational schools are accused of misconduct are presented, and these instances are traced from their beginnings to their ends (that is, to the resolution or acceptance of the social conflict). Part…

  6. 78 FR 15790 - Self-Regulatory Organizations; Financial Industry Regulatory Authority, Inc.; Order Granting...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-03-12

    ... Security ("ABS") (except ABS traded To Be Announced ("TBA")), in the limited instances when members... Factor in the limited instances when the member effects a transaction in an ABS (except a TBA transaction...-conforming Factor"). As a result of the proposed rule change, when an ABS transaction (except for a TBA...

  7. Acoustic classification of multiple simultaneous bird species: a multi-instance multi-label approach

    Treesearch

    F. Briggs; B. Lakshminarayanan; L. Neal; X.Z. Fern; R. Raich; S.F. Hadley; A.S. Hadley; M.G. Betts

    2012-01-01

    Although field-collected recordings typically contain multiple simultaneously vocalizing birds of different species, acoustic species classification in this setting has received little study so far. This work formulates the problem of classifying the set of species present in an audio recording using the multi-instance multi-label (MIML) framework for machine learning...

  8. 9 CFR 310.3 - Carcasses and parts in certain instances to be retained.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 9 Animals and Animal Products, 2010-01-01: Carcasses and parts in certain instances to be retained. Section 310.3, Animals and Animal Products, FOOD SAFETY AND INSPECTION SERVICE... AND VOLUNTARY INSPECTION AND CERTIFICATION, POST-MORTEM INSPECTION § 310.3 Carcasses and parts in...

  9. 2010 Workplace and Gender Relations Survey of Active Duty Members. Tabulations of Responses

    DTIC Science & Technology

    2011-04-01

    Sexual Coercion can be defined as classic quid pro quo, instances of special treatment or favoritism conditional on sexual cooperation. Percent Responding... marked as happening to you, do you consider to have been sexual harassment?

  10. 2006 Gender Relations of Active-Duty Members

    DTIC Science & Technology

    2008-03-01

    sexual harassment (e.g., behaviors that might lead to a hostile work environment, or represent quid pro quo harassment). Hostile work environments...attempts to establish a sexual relationship), and sexual coercion (classic quid pro quo instances of specific treatment or favoritism conditioned on... sexual relationship. Sexual coercion includes classic quid pro quo, instances of specific treatment or

  11. Security Meets Real-World Computing. Building Digital Libraries

    ERIC Educational Resources Information Center

    Huwe, Terence K.

    2005-01-01

    The author of this column describes several instances in which secure data on computers were compromised. In each of these instances, a different route was involved in gaining access to the secure data--one by office-based theft, one by hacking, and one by burglary. It is proposed that the most difficult factor to guarantee in the protection of…

  12. Systems Thinking Evidence from Colleges of Business and Their Universities

    ERIC Educational Resources Information Center

    Seiler, John H.; Kowalsky, Michelle

    2011-01-01

    This study investigated instances of the term "systems thinking" among the websites of the Top 25 business schools as ranked by "U. S. News and World Report" in 2010. Since a greater number of instances of the term and its variants in a university's web documents may indicate an increased interest of the institution in the…

  13. Text-Based Negotiated Interaction of NNS-NNS and NNS-NS Dyads on Facebook

    ERIC Educational Resources Information Center

    Liu, Sarah Hsueh-Jui

    2017-01-01

    This study sought to determine the difference in text-based negotiated interaction between non-native speakers of English (NNS-NNS) and between non-native and native speakers (NNS-NS) in terms of the frequency of negotiated instances, successfully resolved instances, and interactional strategy use when the dyads collaborated on Facebook. It involved 10…

  14. Staff and Student Attitudes to Plagiarism at University College Northampton

    ERIC Educational Resources Information Center

    Pickard, Jill

    2006-01-01

    University College Northampton (UCN) provides undergraduate and postgraduate courses in a wide range of subjects. In the past, instances of plagiarism were considered rare and were dealt with by academic staff on a case-by-case basis. However, the increase in instances detected by staff has led to a need to address the issue more consistently. The…

  15. Instance-based categorization: automatic versus intentional forms of retrieval.

    PubMed

    Neal, A; Hesketh, B; Andrews, S

    1995-03-01

    Two experiments are reported which attempt to disentangle the relative contribution of intentional and automatic forms of retrieval to instance-based categorization. A financial decision-making task was used in which subjects had to decide whether a bank would approve loans for a series of applicants. Experiment 1 found that categorization was sensitive to instance-specific knowledge, even when subjects had practiced using a simple rule. L. L. Jacoby's (1991) process-dissociation procedure was adapted for use in Experiment 2 to infer the relative contribution of intentional and automatic retrieval processes to categorization decisions. The results provided (1) strong evidence that intentional retrieval processes influence categorization, and (2) some preliminary evidence suggesting that automatic retrieval processes may also contribute to categorization decisions.

  16. Influence of a non-hospital medical care facility on antimicrobial resistance in wastewater.

    PubMed

    Bäumlisberger, Mathias; Youssar, Loubna; Schilhabel, Markus B; Jonas, Daniel

    2015-01-01

    The global widespread use of antimicrobials and the accompanying increase in resistant bacterial strains are of major public health concern. Wastewater systems and wastewater treatment plants are considered a niche for antibiotic resistance genes (ARGs), with diverse microbial communities facilitating ARG transfer via mobile genetic elements (MGEs). In contrast to hospital sewage, wastewater from other health care facilities is still poorly investigated. Focusing on a nursing home located in south-west Germany, the present study used shotgun metagenomics to investigate the facility's impact on wastewater, based on samples collected upstream and downstream in different seasons. Microbial composition, ARGs and MGEs were analyzed using different annotation approaches with various databases, including Antibiotic Resistance Ontologies (ARO), integrons and plasmids. Our analysis identified seasonal differences in microbial communities and in the abundance of ARGs and MGEs between samples from different seasons. However, no obvious differences were detected between up- and downstream samples. The results suggest that, in contrast to hospitals, sewage from the nursing home does not have a major impact on ARGs or MGEs in wastewater, presumably due to much less intense antimicrobial usage. Possible limitations of metagenomic studies using high-throughput sequencing for detection of genes that seemingly confer antibiotic resistance are discussed.

  17. Attitudes toward unauthorized immigrants, authorized immigrants, and refugees.

    PubMed

    Murray, Kate E; Marx, David M

    2013-07-01

    Rates of human migration are steadily rising and have resulted in significant sociopolitical debates over how to best respond to increasing cultural diversity and changing migration patterns. Research on prejudicial attitudes toward immigrants has focused on the attitudes and beliefs that individuals in the receiving country hold about immigrants. The current study enhances this literature by examining how young adults view authorized and unauthorized immigrants and refugees. Using a between-groups design of 191 undergraduates, we found that participants consistently reported more prejudicial attitudes, greater perceived realistic threats, and greater intergroup anxiety when responding to questions about unauthorized compared with authorized immigrants. Additionally, there were differences in attitudes depending on participants' generational status, with older-generation participants reporting greater perceived realistic and symbolic threat, prejudice, and anxiety than newer-generation students. In some instances, these effects were moderated by participant race/ethnicity and whether they were evaluating authorized or unauthorized immigrants. Lastly, perceived realistic threat, symbolic threat, and intergroup anxiety were significant predictors of prejudicial attitudes. Overall, participants reported positive attitudes toward refugees and resettlement programs in the United States. These findings have implications for future research and interventions focused on immigration and prejudice toward migrant groups. PsycINFO Database Record (c) 2013 APA, all rights reserved.

  18. Composite Bloom Filters for Secure Record Linkage.

    PubMed

    Durham, Elizabeth Ashley; Kantarcioglu, Murat; Xue, Yuan; Toth, Csaba; Kuzu, Mehmet; Malin, Bradley

    2014-12-01

    The process of record linkage seeks to integrate instances that correspond to the same entity. Record linkage has traditionally been performed through the comparison of identifying field values (e.g., Surname); however, when databases are maintained by disparate organizations, the disclosure of such information can breach the privacy of the corresponding individuals. Various private record linkage (PRL) methods have been developed to obscure such identifiers, but they vary widely in their ability to balance the competing goals of accuracy, efficiency and security. The tokenization and hashing of field values into Bloom filters (BF) enables greater linkage accuracy and efficiency than other PRL methods, but the encodings may be compromised through frequency-based cryptanalysis. Our objective is to adapt a BF encoding technique to mitigate such attacks with minimal sacrifices in accuracy and efficiency. To accomplish these goals, we introduce a statistically-informed method to generate BF encodings that integrate bits from multiple fields, the frequencies of which are provably associated with a minimum number of fields. Our method enables a user-specified tradeoff between security and accuracy. We compare our encoding method with other techniques using a public dataset of voter registration records and demonstrate that the increases in security come with only minor losses to accuracy.
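
    The field-level Bloom filter encoding that the composite method builds on can be sketched briefly (illustrative parameters; the paper's contribution is the statistically informed sampling of bits across several such field-level filters): each field value is split into bigrams, and each bigram sets k positions in a bit array via seeded hashes.

      import hashlib

      def bloom_encode(value, m=100, k=4):
          bits = [0] * m
          bigrams = [value[i:i + 2] for i in range(len(value) - 1)]
          for gram in bigrams:
              for seed in range(k):              # k seeded hash functions
                  h = hashlib.sha1(f"{seed}:{gram}".encode()).hexdigest()
                  bits[int(h, 16) % m] = 1
          return bits

      # Similar values yield similar encodings, which enables approximate matching
      # and is also what frequency-based cryptanalysis exploits.
      a, b = bloom_encode("SMITH"), bloom_encode("SMYTH")
      dice = 2 * sum(x & y for x, y in zip(a, b)) / (sum(a) + sum(b))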

  19. Representing and querying now-relative relational medical data.

    PubMed

    Anselma, Luca; Piovesan, Luca; Stantic, Bela; Terenziani, Paolo

    2018-03-01

    Temporal information plays a crucial role in medicine. Patients' clinical records are intrinsically temporal. Thus, in Medical Informatics there is an increasing need to store, support and query temporal data (particularly in relational databases), in order, for instance, to supplement decision-support systems. In this paper, we show that current approaches to relational data have remarkable limitations in the treatment of "now-relative" data (i.e., data holding true at the current time). This can severely compromise their applicability in general, and specifically in the medical context, where "now-relative" data are essential to assess the current status of the patients. We propose a theoretically grounded and application-independent relational approach to cope with now-relative data (which can be paired, e.g., with different decision support systems) overcoming such limitations. We propose a new temporal relational representation, which is the first relational model coping with the temporal indeterminacy intrinsic in now-relative data. We also propose new temporal algebraic operators to query them, supporting the distinction between possible and necessary time, and Allen's temporal relations between data. We exemplify the impact of our approach, and study the theoretical and computational properties of the new representation and algebra. Copyright © 2018 Elsevier B.V. All rights reserved.
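
    The possible/necessary distinction for now-relative data can be illustrated with a toy example (this is not the paper's algebra, just the underlying intuition): a tuple valid from some start time "until now" has an end time that is indeterminate between now and the end of time, so overlap with a query interval may hold necessarily, possibly, or not at all.

      def overlap_status(start, now, q_start, q_end):
          """Tuple valid over [start, e] with indeterminate end e in [now, inf);
          query interval is [q_start, q_end]."""
          if start > q_end:
              return "none"        # tuple begins after the query ends
          if q_start <= now:
              return "necessary"   # overlap holds for every possible end e
          return "possible"        # overlap holds only for sufficiently late e

      print(overlap_status(start=5, now=10, q_start=8, q_end=12))   # necessary
      print(overlap_status(start=5, now=10, q_start=11, q_end=12))  # possible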

  20. Expanding the boundaries of evaluative learning research: How intersecting regularities shape our likes and dislikes.

    PubMed

    Hughes, Sean; De Houwer, Jan; Perugini, Marco

    2016-06-01

    Over the last 30 years, researchers have identified several types of procedures through which novel preferences may be formed and existing ones altered. For instance, regularities in the presence of a single stimulus (as in the case of mere exposure) or 2 or more stimuli (as in the case of evaluative conditioning) have been shown to influence liking. We propose that intersections between regularities represent a previously unrecognized class of procedures for changing liking. Across 4 related studies, we found strong support for the hypothesis that when environmental regularities intersect with one another (i.e., share elements or have elements that share relations with other elements), the evaluative properties of the elements of those regularities can change. These changes in liking were observed across a range of stimuli and procedures and were evident when self-report measures, implicit measures, and behavioral choice measures of liking were employed. Functional and mental explanations of this phenomenon are offered followed by a discussion of how this new type of evaluative learning effect can accelerate theoretical, methodological, and empirical development in attitude research. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
