2013-01-01
This chapter describes our research towards building an efficient and scalable storage platform for Truthy, and compares IndexedHBase with Riak, a widely adopted commercial NoSQL database system. The results show that IndexedHBase provides a data loading speed that is 6 times faster than Riak. Many existing NoSQL databases…
ERIC Educational Resources Information Center
Fitzgibbons, Megan; Meert, Deborah
2010-01-01
The use of bibliographic management software and its internal search interfaces is now pervasive among researchers. This study compares the results between searches conducted in academic databases' search interfaces versus the EndNote search interface. The results show mixed search reliability, depending on the database and type of search…
Evaluating the Impact of Database Heterogeneity on Observational Study Results
Madigan, David; Ryan, Patrick B.; Schuemie, Martijn; Stang, Paul E.; Overhage, J. Marc; Hartzema, Abraham G.; Suchard, Marc A.; DuMouchel, William; Berlin, Jesse A.
2013-01-01
Clinical studies that use observational databases to evaluate the effects of medical products have become commonplace. Such studies begin by selecting a particular database, a decision that published papers invariably report but do not discuss. Studies of the same issue in different databases, however, can and do generate different results, sometimes with strikingly different clinical implications. In this paper, we systematically study heterogeneity among databases, holding other study methods constant, by exploring relative risk estimates for 53 drug-outcome pairs and 2 widely used study designs (cohort studies and self-controlled case series) across 10 observational databases. When holding the study design constant, our analysis shows that estimated relative risks range from a statistically significant decreased risk to a statistically significant increased risk in 11 of 53 (21%) drug-outcome pairs under a cohort design and in 19 of 53 (36%) under a self-controlled case series design. This exceeds the proportion of pairs that were consistent across databases in both direction and statistical significance, which was 9 of 53 (17%) for cohort studies and 5 of 53 (9%) for self-controlled case series. Our findings show that clinical studies that use observational databases can be sensitive to the choice of database. More attention is needed to consider how the choice of data source may be affecting results. PMID:23648805
[Validation of interaction databases in psychopharmacotherapy].
Hahn, M; Roll, S C
2018-03-01
Drug-drug interaction databases are an important tool for increasing drug safety in polypharmacy. Several drug interaction databases are available, but it is unclear which one performs best and therefore offers the most safety to database users and patients. So far, there has been no validation of German drug interaction databases. The aim of this study was to validate German drug interaction databases with respect to the number of hits, mechanisms of drug interaction, references, clinical advice, and severity of the interaction. A total of 36 drug interactions published in the last 3-5 years were checked in 5 different databases. Besides the number of hits, it was also documented whether the mechanism was correct, clinical advice was given, primary literature was cited, and the severity level of the drug-drug interaction was given. All databases showed weaknesses in the hit rate for the tested drug interactions, with a maximum of 67.7% hits. The highest score in this validation was achieved by MediQ with 104 out of 180 points. PsiacOnline achieved 83 points, arznei-telegramm® 58, ifap index® 54, and the ABDA-database 49 points. Based on this validation, MediQ appears to be the most suitable database for the field of psychopharmacotherapy. MediQ achieved the best results in this comparison, but the database also needs improvement in its hit rate so that users can rely on the results and thereby increase drug therapy safety.
Duchrow, Timo; Shtatland, Timur; Guettler, Daniel; Pivovarov, Misha; Kramer, Stefan; Weissleder, Ralph
2009-01-01
Background The breadth of biological databases and their information content continues to increase exponentially. Unfortunately, our ability to query such sources is still often suboptimal. Here, we introduce and apply community voting, database-driven text classification, and visual aids as a means to incorporate distributed expert knowledge, to automatically classify database entries and to efficiently retrieve them. Results Using a previously developed peptide database as an example, we compared several machine learning algorithms in their ability to classify abstracts of published literature results into categories relevant to peptide research, such as related or not related to cancer, angiogenesis, molecular imaging, etc. Ensembles of bagged decision trees met the requirements of our application best. No other algorithm consistently performed better in comparative testing. Moreover, we show that the algorithm produces meaningful class probability estimates, which can be used to visualize the confidence of automatic classification during the retrieval process. To allow viewing long lists of search results enriched by automatic classifications, we added a dynamic heat map to the web interface. We take advantage of community knowledge by enabling users to cast votes in Web 2.0 style in order to correct automated classification errors, which triggers reclassification of all entries. We used a novel framework in which the database "drives" the entire vote aggregation and reclassification process to increase speed while conserving computational resources and keeping the method scalable. In our experiments, we simulate community voting by adding various levels of noise to nearly perfectly labelled instances, and show that, under such conditions, classification can be improved significantly. 
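The class-probability mechanism the abstract relies on can be illustrated with a toy bagging ensemble. This is a sketch under stated assumptions, not the paper's method: the paper used ensembles of bagged decision trees on text features, while here a 1-nearest-neighbour learner on invented 1-D data stands in to keep the example short.

```python
import random

def nn_classify(train, x):
    """1-nearest-neighbour base learner on 1-D points: the label of the
    closest training point (a stand-in for a decision tree)."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def bagged_probability(train, x, n_bags=25, seed=0):
    """Fraction of bootstrap-trained learners voting class 1 -- usable as a
    class-probability estimate for ranking and heat-map display."""
    rng = random.Random(seed)
    votes = 0
    for _ in range(n_bags):
        bag = [rng.choice(train) for _ in train]  # bootstrap resample
        votes += nn_classify(bag, x) == 1
    return votes / n_bags

# toy 1-D training set: (feature, label)
train = [(0.0, 0), (0.2, 0), (0.8, 1), (1.0, 1)]
print(bagged_probability(train, 0.9))  # close to 1: confident class 1
print(bagged_probability(train, 0.1))  # close to 0: confident class 0
```

The vote fraction is exactly the kind of confidence score that can drive a heat-map display, and community votes can be folded in as corrected labels before the next reclassification pass.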
Conclusion Using PepBank as a model database, we show how to build a classification-aided retrieval system that gathers training data from the community, is completely controlled by the database, scales well with concurrent change events, and can be adapted to add text classification capability to other biomedical databases. The system can be accessed at . PMID:19799796
Partitioning medical image databases for content-based queries on a Grid.
Montagnat, J; Breton, V; Magnin, I E
2005-01-01
In this paper we study the impact of executing a medical image database query application on the grid. For lowering the total computation time, the image database is partitioned into subsets to be processed on different grid nodes. A theoretical model of the application complexity and estimates of the grid execution overhead are used to efficiently partition the database. We show results demonstrating that smart partitioning of the database can lead to significant improvements in terms of total computation time. Grids are promising for content-based image retrieval in medical databases.
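The partitioning trade-off described above can be sketched with a toy makespan model: more grid nodes cut per-node compute time but add submission and startup overhead. The cost constants and the model itself are illustrative assumptions, not the paper's theoretical model.

```python
import math

def total_time(n_images, n_nodes, t_image=1.0, t_submit=5.0, t_startup=30.0):
    """Makespan model: sequential job submission, plus fixed node startup,
    plus the largest per-node share of the image-processing work."""
    per_node = math.ceil(n_images / n_nodes)
    return t_submit * n_nodes + t_startup + per_node * t_image

def best_partition(n_images, max_nodes):
    """Smallest node count achieving the minimal modeled makespan."""
    return min(range(1, max_nodes + 1), key=lambda k: total_time(n_images, k))

print(best_partition(1000, 64))
```

With these toy constants the optimum lies well below the maximum node count: beyond it, per-node overhead outweighs the shrinking compute share, which is the "smart partitioning" effect the abstract reports.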
DOE Office of Scientific and Technical Information (OSTI.GOV)
Park, Yubin; Shankar, Mallikarjun; Park, Byung H.
Designing a database system for both efficient data management and data services has been one of the enduring challenges in the healthcare domain. In many healthcare systems, data services and data management are often viewed as two orthogonal tasks; data services refer to retrieval and analytic queries such as search, joins, statistical data extraction, and simple data mining algorithms, while data management refers to building error-tolerant and non-redundant database systems. The gap between service and management has resulted in rigid database systems and schemas that do not support effective analytics. We compose a rich graph structure from an abstracted healthcare RDBMS to illustrate how we can fill this gap in practice. We show how a healthcare graph can be automatically constructed from a normalized relational database using the proposed 3NF Equivalent Graph (3EG) transformation. We discuss a set of real world graph queries such as finding self-referrals, shared providers, and collaborative filtering, and evaluate their performance over a relational database and its 3EG-transformed graph. Experimental results show that the graph representation serves as multiple de-normalized tables, thus reducing complexity in a database and enhancing data accessibility of users. Based on this finding, we propose an ensemble framework of databases for healthcare applications.
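The flavor of a relational-to-graph flattening like 3EG can be sketched in a few lines: each row becomes a node and each foreign-key value becomes an adjacent node connected by an edge. The table and column names below are invented for illustration; the real 3EG transformation operates on full 3NF schemas.

```python
# rows of a normalized 'visits' table; names and values are invented
visits = [
    {"vid": 10, "pid": 1, "provider": "dr_x"},
    {"vid": 11, "pid": 2, "provider": "dr_x"},
    {"vid": 12, "pid": 2, "provider": "dr_y"},
]

def to_graph(rows, node_key, fk_cols):
    """Each row becomes a node; each foreign-key column value becomes an
    adjacent node connected by an edge."""
    edges = []
    for row in rows:
        node = (node_key, row[node_key])
        for col in fk_cols:
            edges.append((node, (col, row[col])))
    return edges

edges = to_graph(visits, "vid", ["pid", "provider"])

def visits_of_provider(edges, provider):
    """A 'shared provider' lookup becomes a neighborhood scan, not a join."""
    return sorted(src[1] for src, dst in edges if dst == ("provider", provider))

print(visits_of_provider(edges, "dr_x"))
```

Queries such as shared providers or self-referrals then reduce to traversals over this edge set, which is why the graph behaves like a set of pre-joined, de-normalized tables.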
NASA Aeronautics and Space Database for bibliometric analysis
NASA Technical Reports Server (NTRS)
Powers, R.; Rudman, R.
2004-01-01
The authors use the NASA Aeronautics and Space Database to perform bibliometric analysis of citations. This paper explains their research methodology and gives some sample results showing collaboration trends between NASA Centers and other institutions.
Vail, Paris J; Morris, Brian; van Kan, Aric; Burdett, Brianna C; Moyes, Kelsey; Theisen, Aaron; Kerr, Iain D; Wenstrup, Richard J; Eggington, Julie M
2015-10-01
Genetic variants of uncertain clinical significance (VUSs) are a common outcome of clinical genetic testing. Locus-specific variant databases (LSDBs) have been established for numerous disease-associated genes as a research tool for the interpretation of genetic sequence variants to facilitate variant interpretation via aggregated data. If LSDBs are to be used for clinical practice, consistent and transparent criteria regarding the deposition and interpretation of variants are vital, as variant classifications are often used to make important and irreversible clinical decisions. In this study, we performed a retrospective analysis of 2017 consecutive BRCA1 and BRCA2 genetic variants identified from 24,650 consecutive patient samples referred to our laboratory to establish an unbiased dataset representative of the types of variants seen in the US patient population, submitted by clinicians and researchers for BRCA1 and BRCA2 testing. We compared the clinical classifications of these variants among five publicly accessible BRCA1 and BRCA2 variant databases: BIC, ClinVar, HGMD (paid version), LOVD, and the UMD databases. Our results show substantial disparity of variant classifications among publicly accessible databases. Furthermore, it appears that discrepant classifications are not the result of a single outlier but widespread disagreement among databases. This study also shows that databases sometimes favor a clinical classification when current best practice guidelines (ACMG/AMP/CAP) would suggest an uncertain classification. Although LSDBs have been well established for research applications, our results suggest several challenges preclude their wider use in clinical practice.
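The cross-database concordance check at the heart of the study can be sketched mechanically: for each variant, compare the classification assigned by each database and flag disagreement. The variant names and calls below are invented for illustration, not data from the paper.

```python
# invented variant names and per-database classifications
calls = {
    "BRCA1:c.1A>G": {"BIC": "pathogenic", "ClinVar": "pathogenic", "LOVD": "pathogenic"},
    "BRCA2:c.2T>C": {"BIC": "benign",     "ClinVar": "VUS",        "LOVD": "pathogenic"},
    "BRCA2:c.9G>A": {"BIC": "VUS",        "ClinVar": "VUS",        "LOVD": "VUS"},
}

def discordant(calls):
    """Variants whose databases do not all agree on the classification."""
    return sorted(v for v, per_db in calls.items()
                  if len(set(per_db.values())) > 1)

def concordance_rate(calls):
    """Fraction of variants classified identically by every database."""
    return 1 - len(discordant(calls)) / len(calls)

print(discordant(calls))
print(concordance_rate(calls))
```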
Arrhythmia Evaluation in Wearable ECG Devices
Sadrawi, Muammar; Lin, Chien-Hung; Hsieh, Yita; Kuo, Chia-Chun; Chien, Jen Chien; Haraikawa, Koichi; Abbod, Maysam F.; Shieh, Jiann-Shing
2017-01-01
This study evaluates four databases from PhysioNet: The American Heart Association database (AHADB), Creighton University Ventricular Tachyarrhythmia database (CUDB), MIT-BIH Arrhythmia database (MITDB), and MIT-BIH Noise Stress Test database (NSTDB). The ANSI/AAMI EC57:2012 is used for the evaluation of the algorithms for the supraventricular ectopic beat (SVEB), ventricular ectopic beat (VEB), atrial fibrillation (AF), and ventricular fibrillation (VF) via the evaluation of the sensitivity, positive predictivity and false positive rate. Sample entropy, fast Fourier transform (FFT), and a multilayer perceptron neural network with backpropagation training are selected for the integrated detection algorithms. For this study, the results for SVEB show some improvement over a previous study that also utilized ANSI/AAMI EC57. Furthermore, gross VEB sensitivity and positive predictivity are greater than 80%, except for the positive predictivity on the NSTDB. For the gross AF evaluation on the MITDB, the results show very good classification, excluding the episode sensitivity. For the gross VF evaluation, episode sensitivity and positive predictivity for the AHADB, MITDB, and CUDB are greater than 80%, except for the MITDB episode positive predictivity, which is 75%. The achieved results show that the proposed integrated SVEB, VEB, AF, and VF detection algorithm classifies accurately according to ANSI/AAMI EC57:2012. In conclusion, the proposed integrated detection algorithm achieves good accuracy in comparison with other previous studies. Furthermore, more advanced algorithms and hardware devices should be developed in the future for arrhythmia detection and evaluation. PMID:29068369
Identifying relevant data for a biological database: handcrafted rules versus machine learning.
Sehgal, Aditya Kumar; Das, Sanmay; Noto, Keith; Saier, Milton H; Elkan, Charles
2011-01-01
With well over 1,000 specialized biological databases in use today, the task of automatically identifying novel, relevant data for such databases is increasingly important. In this paper, we describe practical machine learning approaches for identifying MEDLINE documents and Swiss-Prot/TrEMBL protein records, for incorporation into a specialized biological database of transport proteins named TCDB. We show that both learning approaches outperform rules created by hand by a human expert. As one of the first case studies involving two different approaches to updating a deployed database, both the methods compared and the results will be of interest to curators of many specialized databases.
On patterns and re-use in bioinformatics databases
Bell, Michael J.; Lord, Phillip
2017-01-01
Motivation: As the quantity of data being deposited into biological databases continues to increase, it becomes ever more vital to develop methods that enable us to understand this data and ensure that the knowledge is correct. It is widely held that data percolates between different databases, which causes particular concerns for data correctness; if this percolation occurs, incorrect data in one database may eventually affect many others while, conversely, corrections in one database may fail to percolate to others. In this paper, we test this widely held belief by directly looking for sentence reuse both within and between databases. Further, we investigate patterns of how sentences are reused over time. Finally, we consider the limitations of this form of analysis and the implications that this may have for bioinformatics database design. Results: We show that reuse of annotation is common within many different databases, and that there is also a detectable level of reuse between databases. In addition, we show that there are patterns of reuse that have previously been shown to be associated with percolation errors. Availability and implementation: Analytical software is available on request. Contact: phillip.lord@newcastle.ac.uk PMID:28525546
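The kind of sentence-reuse detection described above can be sketched with exact sentence matching over toy annotation databases. The entry names and annotation texts are invented; a real analysis would normalize text far more carefully before comparing.

```python
import re

def sentences(text):
    """Crude sentence splitter; illustrative only."""
    return {s.strip() for s in re.split(r"[.!?]", text) if s.strip()}

# toy annotation databases; entries and texts are invented
db_a = {"P1": "Binds DNA. Involved in repair.",
        "P2": "Involved in repair. Membrane protein."}
db_b = {"Q1": "Involved in repair. Uncharacterized protein."}

def reuse_within(db):
    """Sentences that appear in more than one entry of the same database."""
    first_seen, reused = {}, set()
    for entry, text in db.items():
        for s in sentences(text):
            if s in first_seen and first_seen[s] != entry:
                reused.add(s)
            first_seen.setdefault(s, entry)
    return reused

def reuse_between(db1, db2):
    """Sentences shared by two databases: candidate percolated annotation."""
    s1 = set().union(*(sentences(t) for t in db1.values()))
    s2 = set().union(*(sentences(t) for t in db2.values()))
    return s1 & s2

print(reuse_within(db_a))
print(reuse_between(db_a, db_b))
```

Tracking when each shared sentence first appears in each database would give the temporal reuse patterns the paper investigates.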
Distribution Characteristics of Air-Bone Gaps – Evidence of Bias in Manual Audiometry
Margolis, Robert H.; Wilson, Richard H.; Popelka, Gerald R.; Eikelboom, Robert H.; Swanepoel, De Wet; Saly, George L.
2015-01-01
Objective Five databases were mined to examine distributions of air-bone gaps obtained by automated and manual audiometry. Differences in distribution characteristics were examined for evidence of influences unrelated to the audibility of test signals. Design The databases provided air- and bone-conduction thresholds that permitted examination of air-bone gap distributions that were free of ceiling and floor effects. Cases with conductive hearing loss were eliminated based on air-bone gaps, tympanometry, and otoscopy, when available. The analysis is based on 2,378,921 threshold determinations from 721,831 subjects from five databases. Results Automated audiometry produced air-bone gaps that were normally distributed suggesting that air- and bone-conduction thresholds are normally distributed. Manual audiometry produced air-bone gaps that were not normally distributed and show evidence of biasing effects of assumptions of expected results. In one database, the form of the distributions showed evidence of inclusion of conductive hearing losses. Conclusions Thresholds obtained by manual audiometry show tester bias effects from assumptions of the patient’s hearing loss characteristics. Tester bias artificially reduces the variance of bone-conduction thresholds and the resulting air-bone gaps. Because the automated method is free of bias from assumptions of expected results, these distributions are hypothesized to reflect the true variability of air- and bone-conduction thresholds and the resulting air-bone gaps. PMID:26627469
Average probability that a "cold hit" in a DNA database search results in an erroneous attribution.
Song, Yun S; Patil, Anand; Murphy, Erin E; Slatkin, Montgomery
2009-01-01
We consider a hypothetical series of cases in which the DNA profile of a crime-scene sample is found to match a known profile in a DNA database (i.e., a "cold hit"), resulting in the identification of a suspect based only on genetic evidence. We show that the average probability that there is another person in the population whose profile matches the crime-scene sample but who is not in the database is approximately 2(N - d)p(A), where N is the number of individuals in the population, d is the number of profiles in the database, and p(A) is the average match probability (AMP) for the population. The AMP is estimated by computing the average of the probabilities that two individuals in the population have the same profile. We show further that if a priori each individual in the population is equally likely to have left the crime-scene sample, then the average probability that the database search attributes the crime-scene sample to a wrong person is (N - d)p(A).
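The two closed-form expressions above translate directly into code. In the sketch below the population size, database size, and profile frequencies are invented illustrative numbers, not values from the paper.

```python
def average_match_probability(profile_freqs):
    """AMP: the chance that two random individuals share a profile,
    i.e. the sum of squared profile frequencies."""
    return sum(f * f for f in profile_freqs)

def p_unseen_match(N, d, amp):
    """Average probability that someone outside the database also matches
    the crime-scene sample: approximately 2 * (N - d) * AMP."""
    return 2 * (N - d) * amp

def p_wrong_attribution(N, d, amp):
    """Average probability that the cold hit names the wrong person, given
    that a priori everyone is equally likely to have left the sample:
    (N - d) * AMP."""
    return (N - d) * amp

# toy numbers: population of 1M, 100k profiles on file, AMP of 1e-9
print(p_unseen_match(1_000_000, 100_000, 1e-9))
print(p_wrong_attribution(1_000_000, 100_000, 1e-9))
```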
Hegedűs, Tamás; Chaubey, Pururawa Mayank; Várady, György; Szabó, Edit; Sarankó, Hajnalka; Hofstetter, Lia; Roschitzki, Bernd; Sarkadi, Balázs
2015-01-01
Based on recent results, the determination of the easily accessible red blood cell (RBC) membrane proteins may provide new diagnostic possibilities for assessing mutations, polymorphisms or regulatory alterations in diseases. However, the analysis of the current mass spectrometry-based proteomics datasets and other major databases indicates inconsistencies—the results show large scattering and only a limited overlap for the identified RBC membrane proteins. Here, we applied membrane-specific proteomics studies in human RBC, compared these results with the data in the literature, and generated a comprehensive and expandable database using all available data sources. The integrated web database now refers to proteomic, genetic and medical databases as well, and contains an unexpected large number of validated membrane proteins previously thought to be specific for other tissues and/or related to major human diseases. Since the determination of protein expression in RBC provides a method to indicate pathological alterations, our database should facilitate the development of RBC membrane biomarker platforms and provide a unique resource to aid related further research and diagnostics. Database URL: http://rbcc.hegelab.org PMID:26078478
Upgrade Summer Severe Weather Tool
NASA Technical Reports Server (NTRS)
Watson, Leela
2011-01-01
The goal of this task was to upgrade the existing severe weather database by adding observations from the 2010 warm season, update the verification dataset with results from the 2010 warm season, apply statistical logistic regression analysis to the database, and develop a new forecast tool. The AMU analyzed 7 stability parameters that showed the possibility of providing guidance in forecasting severe weather, calculated verification statistics for the Total Threat Score (TTS), and calculated warm season verification statistics for the 2010 season. The AMU also performed statistical logistic regression analysis on the 22-year severe weather database. The results indicated that the logistic regression equation did not show an increase in skill over the previously developed TTS. The equation showed less accuracy than TTS at predicting severe weather, little ability to distinguish between severe and non-severe weather days, and worse standard categorical accuracy measures and skill scores than TTS.
Sánchez-de-Madariaga, Ricardo; Muñoz, Adolfo; Lozano-Rubí, Raimundo; Serrano-Balazote, Pablo; Castro, Antonio L; Moreno, Oscar; Pascual, Mario
2017-08-18
The objective of this research is to compare the relational and non-relational (NoSQL) database systems approaches in order to store, recover, query and persist standardized medical information in the form of ISO/EN 13606 normalized Electronic Health Record XML extracts, both in isolation and concurrently. NoSQL database systems have recently attracted much attention, but few studies in the literature address their direct comparison with relational databases when applied to build the persistence layer of a standardized medical information system. One relational and two NoSQL databases (one document-based and one native XML database) of three different sizes have been created in order to evaluate and compare the response times (algorithmic complexity) of six queries of increasing complexity, which have been performed on them. Similar appropriate results available in the literature have also been considered. Relational and non-relational NoSQL database systems show almost linear algorithmic complexity in query execution. However, they show very different linear slopes, with the relational slope much steeper than the two NoSQL ones. Document-based NoSQL databases perform better in concurrency than in isolation, and also better than relational databases in concurrency. Non-relational NoSQL databases seem to be more appropriate than standard relational SQL databases when database size is extremely high (secondary use, research applications). Document-based NoSQL databases perform in general better than native XML NoSQL databases. EHR extract visualization and editing are also document-based tasks better suited to NoSQL database systems. However, the appropriate database solution depends greatly on each particular situation and specific problem.
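The benchmarking idea above (time the same query against databases of increasing size and inspect how response time grows) can be sketched with stdlib tools. SQLite here is only a stand-in for the MySQL/MongoDB/eXist systems the study compares, and the sizes echo the doubling-size design.

```python
import sqlite3
import time

def build_db(n_rows):
    """Create an in-memory table with n_rows toy 'extract' rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE extracts (id INTEGER PRIMARY KEY, payload TEXT)")
    con.executemany("INSERT INTO extracts (payload) VALUES (?)",
                    ((f"ehr-{i}",) for i in range(n_rows)))
    return con

def time_query(con, sql):
    """Wall-clock response time for one query, in seconds."""
    t0 = time.perf_counter()
    con.execute(sql).fetchall()
    return time.perf_counter() - t0

# doubling-size databases, as in the study's protocol
for size in (5000, 10000, 20000):
    con = build_db(size)
    t = time_query(con, "SELECT * FROM extracts WHERE payload LIKE '%ehr-42%'")
    print(size, round(t, 4))
```

Plotting response time against database size and fitting a line gives the slopes the abstract compares across DBMSs; averaging repeated runs would be needed for stable numbers.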
Sánchez-de-Madariaga, Ricardo; Muñoz, Adolfo; Castro, Antonio L; Moreno, Oscar; Pascual, Mario
2018-01-01
This research shows a protocol to assess the computational complexity of querying relational and non-relational (NoSQL (not only Structured Query Language)) standardized electronic health record (EHR) medical information database systems (DBMS). It uses a set of three doubling-sized databases, i.e. databases storing 5000, 10,000 and 20,000 realistic standardized EHR extracts, in three different database management systems (DBMS): relational MySQL object-relational mapping (ORM), document-based NoSQL MongoDB, and native extensible markup language (XML) NoSQL eXist. The average response times to six complexity-increasing queries were computed, and the results showed a linear behavior in the NoSQL cases. In the NoSQL field, MongoDB presents a much flatter linear slope than eXist. NoSQL systems may also be more appropriate to maintain standardized medical information systems due to the special nature of the updating policies of medical information, which should not affect the consistency and efficiency of the data stored in NoSQL databases. One limitation of this protocol is the lack of direct results of improved relational systems such as archetype relational mapping (ARM) with the same data. However, the interpolation of doubling-size database results to those presented in the literature and other published results suggests that NoSQL systems might be more appropriate in many specific scenarios and problems to be solved. For example, NoSQL may be appropriate for document-based tasks such as EHR extracts used in clinical practice, or edition and visualization, or situations where the aim is not only to query medical information, but also to restore the EHR in exactly its original form. PMID:29608174
Yoo, Do Hyeon; Shin, Wook-Geun; Lee, Jaekook; Yeom, Yeon Soo; Kim, Chan Hyeong; Chang, Byung-Uck; Min, Chul Hee
2017-11-01
After the Fukushima accident in Japan, the Korean Government implemented the "Act on Protective Action Guidelines Against Radiation in the Natural Environment" to regulate unnecessary radiation exposure to the public. However, despite the law coming into effect in July 2012, an appropriate method to evaluate the equivalent and effective doses from naturally occurring radioactive material (NORM) in consumer products is not available. The aim of the present study is to develop and validate an effective dose coefficient database enabling the simple and correct evaluation of the effective dose due to the usage of NORM-added consumer products. To construct the database, we used a skin source method with a computational human phantom and Monte Carlo (MC) simulation. For validation, the effective dose obtained from the database using an interpolation method was compared with the original MC method. Our results showed similar equivalent doses across the 26 organs, with the corresponding average doses from the database and the MC calculations differing by less than 5%. The differences in the effective doses were even smaller, and the results generally show that equivalent and effective doses can be quickly calculated from the database with sufficient accuracy. Copyright © 2017 Elsevier Ltd. All rights reserved.
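The database-plus-interpolation approach above can be sketched as a piecewise-linear lookup: coefficients are precomputed at a few tabulated points and intermediate values are interpolated. The energies and coefficients below are invented for illustration, not values from the study's database.

```python
# invented table of (source energy in MeV, effective dose coefficient)
table = [(0.1, 2.0e-3), (0.5, 5.0e-3), (1.0, 7.0e-3)]

def dose_coefficient(energy):
    """Piecewise-linear interpolation over the tabulated coefficients."""
    for (e0, d0), (e1, d1) in zip(table, table[1:]):
        if e0 <= energy <= e1:
            return d0 + (d1 - d0) * (energy - e0) / (e1 - e0)
    raise ValueError("energy outside tabulated range")

print(dose_coefficient(0.3))
```

Validating such a database amounts to comparing interpolated values like this against fresh Monte Carlo runs at off-grid points, which is how the study quotes its < 5% agreement.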
Performance analysis of different database in new internet mapping system
NASA Astrophysics Data System (ADS)
Yao, Xing; Su, Wei; Gao, Shuai
2017-03-01
In the Mapping System of New Internet, massive numbers of mapping entries between AID and RID need to be stored, added, updated, and deleted. To better handle large volumes of mapping-entry update and query requests, the Mapping System of New Internet must use a high-performance database. In this paper, we focus on the performance of three typical databases, Redis, SQLite, and MySQL, and the results show that Mapping Systems based on different databases can adapt to different needs according to the actual situation.
NASA Astrophysics Data System (ADS)
Ritter, Nils C.; Sowa, Roman; Schauer, Jan C.; Gruber, Daniel; Goehler, Thomas; Rettig, Ralf; Povoden-Karadeniz, Erwin; Koerner, Carolin; Singer, Robert F.
2018-06-01
We prepared 41 different superalloy compositions by an arc melting, casting, and heat treatment process. Alloy solid solution strengthening elements were added in graded amounts, and we measured the solidus, liquidus, and γ'-solvus temperatures of the samples by DSC. The γ'-phase fraction increased as the W, Mo, and Re contents were increased, and W showed the most pronounced effect. Ru decreased the γ'-phase fraction. Melting temperatures (i.e., solidus and liquidus) were increased by addition of Re, W, and Ru (the effect increased in that order). Addition of Mo decreased the melting temperature. W was effective as a strengthening element because it acted as a solid solution strengthener and increased the fraction of fine γ'-precipitates, thus improving precipitation strengthening. Experimentally determined values were compared with calculated values based on the CALPHAD software tools Thermo-Calc (databases: TTNI8 and TCNI6) and MatCalc (database ME-NI). The ME-NI database, which was specially adapted for the present investigation, showed good agreement, and TTNI8 also showed good results. The TCNI6 database is intended for the computational design of complex nickel-based superalloys; however, a large deviation remained between the experimental results and calculations based on this database, and it erroneously predicted γ'-phase separations and failed to describe the Ru effect on transition temperatures.
Jagtap, Pratik; Goslinga, Jill; Kooren, Joel A; McGowan, Thomas; Wroblewski, Matthew S; Seymour, Sean L; Griffin, Timothy J
2013-04-01
Large databases (>10^6 sequences) used in metaproteomic and proteogenomic studies present challenges in matching peptide sequences to MS/MS data using database-search programs. Most notably, strict filtering to avoid false-positive matches leads to more false negatives, thus constraining the number of peptide matches. To address this challenge, we developed a two-step method wherein matches derived from a primary search against a large database were used to create a smaller subset database. The second search was performed against a target-decoy version of this subset database merged with a host database. High confidence peptide sequence matches were then used to infer protein identities. Applying our two-step method for both metaproteomic and proteogenomic analysis resulted in twice the number of high confidence peptide sequence matches in each case, as compared to the conventional one-step method. The two-step method captured almost all of the same peptides matched by the one-step method, with a majority of the additional matches being false negatives from the one-step method. Furthermore, the two-step method improved results regardless of the database search program used. Our results show that our two-step method maximizes the peptide matching sensitivity for applications requiring large databases, especially valuable for proteogenomics and metaproteomics studies. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
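The two-step strategy above can be sketched with a toy "search engine": naive substring matching stands in for a real MS/MS database-search program, and the sequences are invented. The point is the control flow, not the matching itself.

```python
def search(peptides, database):
    """Stand-in 'search engine': a peptide matches a protein if it occurs
    as a substring of the protein sequence."""
    hits = {}
    for pep in peptides:
        for name, seq in database.items():
            if pep in seq:
                hits.setdefault(name, []).append(pep)
    return hits

def two_step_search(peptides, large_db, host_db):
    # Step 1: search the full large database.
    primary = search(peptides, large_db)
    # Build a small subset database from the primary hits and merge it
    # with the host database.
    subset = {name: large_db[name] for name in primary}
    subset.update(host_db)
    # Step 2: search only the merged subset database; in a real pipeline
    # this is where target-decoy filtering and confidence estimation occur.
    return search(peptides, subset)

# invented toy sequences
large_db = {"prot1": "MKVLAQTT", "prot2": "GGHHEEDD", "prot3": "AAKVLAQG"}
host_db = {"host1": "PPPQQQ"}
print(two_step_search(["KVLA"], large_db, host_db))
```

Because step 2 scores against a database that is orders of magnitude smaller, the same false-discovery threshold admits more true matches, which is the sensitivity gain the abstract reports.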
Applying cognitive load theory to the redesign of a conventional database systems course
NASA Astrophysics Data System (ADS)
Mason, Raina; Seton, Carolyn; Cooper, Graham
2016-01-01
Cognitive load theory (CLT) was used to redesign a Database Systems course for Information Technology students. The redesign was intended to address poor student performance and low satisfaction, and to provide a more relevant foundation in database design and use for subsequent studies and industry. The original course followed the conventional structure for a database course, covering database design first, then database development. Analysis showed the conventional course content was appropriate but the instructional materials used were too complex, especially for novice students. The redesign of instructional materials applied CLT to remove split attention and redundancy effects, to provide suitable worked examples and sub-goals, and included an extensive re-sequencing of content. The approach was primarily directed towards mid- to lower performing students and results showed a significant improvement for this cohort with the exam failure rate reducing by 34% after the redesign on identical final exams. Student satisfaction also increased and feedback from subsequent study was very positive. The application of CLT to the design of instructional materials is discussed for delivery of technical courses.
Ryan, Patrick B.; Schuemie, Martijn
2013-01-01
Background: Clinical studies that use observational databases, such as administrative claims and electronic health records, to evaluate the effects of medical products have become commonplace. These studies begin by selecting a particular study design, such as a case control, cohort, or self-controlled design, and different authors can and do choose different designs for the same clinical question. Furthermore, published papers invariably report the study design but do not discuss the rationale for the specific choice. Studies of the same clinical question with different designs, however, can generate different results, sometimes with strikingly different implications. Even within a specific study design, authors make many different analytic choices and these too can profoundly impact results. In this paper, we systematically study heterogeneity due to the type of study design and due to analytic choices within study design. Methods and findings: We conducted our analysis in 10 observational healthcare databases but mostly present our results in the context of the GE Centricity EMR database, an electronic health record database containing data for 11.2 million lives. We considered the impact of three different study design choices on estimates of associations between bisphosphonates and four particular health outcomes for which there is no evidence of an association. We show that applying alternative study designs can yield discrepant results, in terms of direction and significance of association. We also highlight that while traditional univariate sensitivity analysis may not show substantial variation, systematic assessment of all analytical choices within a study design can yield inconsistent results ranging from statistically significant decreased risk to statistically significant increased risk. 
Our findings show that clinical studies using observational databases can be sensitive both to study design choices and to specific analytic choices within study design. Conclusion: More attention is needed to consider how design choices may be impacting results and, when possible, investigators should examine a wide array of possible choices to confirm that significant findings are consistently identified. PMID:25083251
Khan, Aihab; Husain, Syed Afaq
2013-01-01
We put forward a fragile zero watermarking scheme to detect and characterize malicious modifications made to a database relation. Most existing watermarking schemes for relational databases introduce intentional errors or permanent distortions as marks into the original database content. These distortions inevitably degrade data quality and usability, as the integrity of the relational database is violated. Moreover, these fragile schemes can detect malicious data modifications but do not characterize the tampering attack, that is, the nature of the tampering. The proposed fragile scheme is based on a zero watermarking approach to detect malicious modifications made to a database relation. In zero watermarking, the watermark is generated (constructed) from the contents of the original data rather than by introducing permanent distortions into the data as marks. As a result, the proposed scheme is distortion-free; thus, it also resolves the inherent conflict between security and imperceptibility. The proposed scheme also characterizes malicious data modifications to quantify the nature of tampering attacks. Experimental results show that even minor malicious modifications made to a database relation can be detected and characterized successfully.
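The core idea of zero watermarking, as described above, is that the "watermark" is derived from the relation's own content and registered externally, so the stored data is never modified. A minimal sketch, assuming per-row digests as the constructed watermark (the tuple layout and function names are invented, not the authors' scheme):

```python
import hashlib

def generate_watermark(rows):
    """Build a per-row digest table from the original relation's content."""
    return {i: hashlib.sha256(repr(row).encode()).hexdigest()
            for i, row in enumerate(rows)}

def detect_tampering(rows, watermark):
    """Return indices of rows whose content no longer matches the watermark."""
    return [i for i, row in enumerate(rows)
            if hashlib.sha256(repr(row).encode()).hexdigest() != watermark[i]]

original = [("alice", 42), ("bob", 17)]
wm = generate_watermark(original)        # registered, e.g., with a third party
tampered = [("alice", 42), ("bob", 99)]  # a value modified maliciously
print(detect_tampering(tampered, wm))    # [1]
```

Because nothing is embedded in the data itself, detection is distortion-free; a full scheme would additionally classify the modification (insertion, deletion, or value alteration) rather than only locating it.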
Projections for fast protein structure retrieval
Bhattacharya, Sourangshu; Bhattacharyya, Chiranjib; Chandra, Nagasuma R
2006-01-01
Background In recent times, there has been an exponential rise in the number of protein structures in databases such as the PDB. The design of fast algorithms capable of querying such databases is therefore becoming an increasingly important research issue. This paper reports an algorithm, motivated by spectral graph matching techniques, for retrieving protein structures similar to a query structure from a large protein structure database. Each protein structure is specified by the 3D coordinates of the residues of the protein. The algorithm is based on a novel characterization of the residues, called projections, leading to a similarity measure between the residues of two proteins. This measure is exploited to efficiently compute the optimal equivalences. Results Experimental results show that the current algorithm outperforms the state of the art on benchmark datasets in terms of speed without losing accuracy. Search results on the SCOP 95% nonredundant database, for fold similarity with 5 proteins from different SCOP classes, show that the current method performs competitively with the standard algorithm CE. The algorithm is also capable of detecting non-topological similarities between two proteins, which is not possible with most state-of-the-art tools such as Dali. PMID:17254310
Kokol, Peter; Vošner, Helena Blažun
2018-01-01
The overall aim of the present study was to compare the coverage of existing research funding information for articles indexed in Scopus, Web of Science, and PubMed databases. The numbers of articles with funding information published in 2015 were identified in the three selected databases and compared using bibliometric analysis of a sample of twenty-eight prestigious medical journals. Frequency analysis of the number of articles with funding information showed statistically significant differences between Scopus, Web of Science, and PubMed databases. The largest proportion of articles with funding information was found in Web of Science (29.0%), followed by PubMed (14.6%) and Scopus (7.7%). The results show that coverage of funding information differs significantly among Scopus, Web of Science, and PubMed databases in a sample of the same medical journals. Moreover, we found that, currently, funding data in PubMed is more difficult to obtain and analyze compared with that in the other two databases.
Fusion of Dependent and Independent Biometric Information Sources
2005-03-01
palmprint, DNA, ECG, signature, etc. The comparison of various biometric techniques is given in [13] and is presented in Table 1. Since each...theory. Experimental studies on the M2VTS database [32] showed a reduction in error rates of up to about 40%. Four combination strategies are...taken from the CEDAR benchmark database. The word recognition results were the highest (91%) among published results for handwritten words (before 2001
QED's School Market Trends: Teacher Buying Behavior & Attitudes, 2001-2002. Research Report.
ERIC Educational Resources Information Center
Quality Education Data, Inc., Denver, CO.
This study examined teachers' classroom material buying behaviors and trends. Data came from Quality Education Data's National Education Database, which includes U.S. K-12 public, private, and Catholic schools and districts. Researchers surveyed K-8 teachers randomly selected from QED's National Education Database. Results show that teachers spend…
A multi-center ring trial for the identification of anaerobic bacteria using MALDI-TOF MS.
Veloo, A C M; Jean-Pierre, H; Justesen, U S; Morris, T; Urban, E; Wybo, I; Shah, H N; Friedrich, A W; Morris, T; Shah, H N; Jean-Pierre, H; Justesen, U S; Nagy, E; Urban, E; Kostrzewa, M; Veloo, A; Friedrich, A W
2017-12-01
Inter-laboratory reproducibility of Matrix-Assisted Laser Desorption/Ionization Time-of-Flight Mass Spectrometry (MALDI-TOF MS) identification of anaerobic bacteria has not been shown before. Therefore, ten anonymized anaerobic strains were sent to seven participating laboratories, an initiative of the European Network for the Rapid Identification of Anaerobes (ENRIA). On arrival, the strains were cultured and identified using MALDI-TOF MS. The derived spectra were compared with two different Biotyper MALDI-TOF MS databases, db5627 and db6903. The results obtained using db5627 show reasonable variation between the different laboratories. However, when a more optimized database is used, the variation is less pronounced. In this study we show that an optimized database not only results in a higher number of strains that can be identified using MALDI-TOF MS, but also corrects for differences in performance between laboratories. Copyright © 2017 Elsevier Ltd. All rights reserved.
Comparing features sets for content-based image retrieval in a medical-case database
NASA Astrophysics Data System (ADS)
Muller, Henning; Rosset, Antoine; Vallee, Jean-Paul; Geissbuhler, Antoine
2004-04-01
Content-based image retrieval systems (CBIRSs) have frequently been proposed for use in medical image databases and PACS. Still, only a few systems have been developed and used in a real clinical environment. It rather seems that medical professionals define their needs and computer scientists develop systems based on data sets they receive, with little or no interaction between the two groups. A first study on the diagnostic use of medical image retrieval also shows an improvement in diagnostics when using CBIRSs, which underlines the potential importance of this technique. This article explains the use of an open source image retrieval system (GIFT - GNU Image Finding Tool) for the retrieval of medical images in the medical case database system CasImage, which is used in daily clinical routine in the university hospitals of Geneva. Although the base system of GIFT shows unsatisfactory performance, even small changes in the feature space significantly improve the retrieval results. The performance of variations in feature space with respect to color (gray level) quantizations and changes in texture analysis (Gabor filters) is compared. Whereas stock photography relies mainly on colors for retrieval, medical images need a large number of gray levels for successful retrieval, especially when executing feedback queries. The results also show that too fine a granularity in the gray levels lowers the retrieval quality, especially with single-image queries. For the evaluation of the retrieval performance, a subset of 3752 images is taken from the entire case database of more than 40,000 images. Ground truth was generated by a user who defined the expected query result of a perfect system by selecting images relevant to a given query image. The results show that a smaller number of gray levels (32 - 64) leads to better retrieval performance, especially when using relevance feedback.
The use of more scales and directions for the Gabor filters in the texture analysis also leads to improved results, but response time increases correspondingly due to the larger feature space. CBIRSs can be of great use in managing large medical image databases. They make it possible to find images that might otherwise be lost for research and publications. They also give students the possibility to navigate within large image repositories. In the future, CBIR might also become more important in case-based reasoning and evidence-based medicine to support diagnostics, because first studies show good results.
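The gray-level quantization the abstract discusses can be sketched in a few lines: pixel intensities are binned into a chosen number of gray levels and images are compared by histogram intersection. This is a minimal stand-in for illustration only; GIFT's actual feature space is far richer (it also includes Gabor texture features), and the pixel values here are invented.

```python
def gray_histogram(pixels, levels):
    """Quantize 8-bit gray values into `levels` bins and normalize."""
    hist = [0] * levels
    for p in pixels:
        hist[p * levels // 256] += 1
    total = len(pixels)
    return [h / total for h in hist]

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Two "images" (flattened gray pixels) that differ only slightly.
img_a = [10, 12, 200, 210, 130, 131]
img_b = [11, 13, 205, 212, 128, 129]
h_a = gray_histogram(img_a, 32)   # 32 levels: the range found to work well
h_b = gray_histogram(img_b, 32)
print(round(histogram_intersection(h_a, h_b), 2))   # 1.0
```

With coarse bins (32 levels) the small pixel differences vanish and the images match perfectly; with 256 levels the same pair would fall into different bins, which mirrors the finding that too fine a gray-level granularity lowers retrieval quality.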
BioBenchmark Toyama 2012: an evaluation of the performance of triple stores on biological data
2014-01-01
Background Biological databases vary enormously in size and data complexity, from small databases that contain a few million Resource Description Framework (RDF) triples to large databases that contain billions of triples. In this paper, we evaluate whether RDF native stores can be used to meet the needs of a biological database provider. Prior evaluations have used synthetic data with a limited database size. For example, the largest BSBM benchmark uses 1 billion synthetic e-commerce knowledge RDF triples on a single node. However, real-world biological data differs considerably from such simple synthetic data, and it is difficult to determine whether benchmarks based on synthetic e-commerce data adequately represent biological databases. Therefore, for this evaluation, we used five real data sets from biological databases. Results We evaluated five triple stores, 4store, Bigdata, Mulgara, Virtuoso, and OWLIM-SE, with five biological data sets, Cell Cycle Ontology, Allie, PDBj, UniProt, and DDBJ, ranging in size from approximately 10 million to 8 billion triples. For each database, we loaded all the data onto our single node and prepared the database for use in a classical data warehouse scenario. Then, we ran a series of SPARQL queries against each endpoint and recorded the execution time and the accuracy of the query responses. Conclusions Our paper shows that, with appropriate configuration, Virtuoso and OWLIM-SE can satisfy the basic requirements for loading and querying biological data of up to roughly 8 billion triples on a single node, under simultaneous access by 64 clients. OWLIM-SE performs best for databases with approximately 11 million triples; for the data sets that contain 94 million and 590 million triples, OWLIM-SE and Virtuoso perform best, without an overwhelming advantage over each other; for data sets of over 4 billion triples, Virtuoso works best.
4store performs well on small data sets with limited features when the number of triples is less than 100 million, but our test shows its scalability is poor; Bigdata demonstrates average performance and is a good open source triple store for mid-sized (around 500 million triples) data sets; Mulgara shows some fragility. PMID:25089180
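The benchmark loop described above, running each SPARQL query against an endpoint and recording execution time and response, can be sketched generically. The `run_query` callable here is a toy stand-in; a real harness would POST the query text to each store's SPARQL endpoint, and the query and rows below are invented.

```python
import time

def benchmark(queries, run_query):
    """Time each named query and record how many rows came back."""
    results = {}
    for name, q in queries.items():
        start = time.perf_counter()
        rows = run_query(q)
        results[name] = {"seconds": time.perf_counter() - start,
                         "rows": len(rows)}
    return results

# Toy stand-in for an endpoint: always returns two triples.
report = benchmark({"q1": "SELECT * WHERE { ?s ?p ?o } LIMIT 2"},
                   lambda q: [("s1", "p1", "o1"), ("s2", "p2", "o2")])
print(report["q1"]["rows"])   # 2
```

Recording row counts alongside timings is what lets a harness like this check both the speed and the accuracy of each store's responses, as the evaluation above does.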
Collaborative WiFi Fingerprinting Using Sensor-Based Navigation on Smartphones.
Zhang, Peng; Zhao, Qile; Li, You; Niu, Xiaoji; Zhuang, Yuan; Liu, Jingnan
2015-07-20
This paper presents a method that trains the WiFi fingerprint database using sensor-based navigation solutions. Since micro-electromechanical systems (MEMS) sensors provide only short-term accuracy and suffer from accuracy degradation over time, we restrict the time length of the available indoor navigation trajectories and conduct post-processing to improve the sensor-based navigation solution. Different middle-term navigation trajectories that move in and out of an indoor area are combined to make up the database. Furthermore, we evaluate the effect of WiFi database shifts on WiFi fingerprinting using the database generated by the proposed method. Results show that the fingerprinting errors do not increase linearly with database (DB) errors in smartphone-based WiFi fingerprinting applications.
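Once such a fingerprint database has been trained, locating a user reduces to matching an observed received-signal-strength (RSS) scan against the stored fingerprints. A minimal nearest-neighbor sketch is shown below; the positions, access-point names, and RSS values are invented, and real systems typically use k-nearest-neighbor or probabilistic matching rather than this bare single-neighbor version.

```python
def locate(observed, database):
    """Return the database position whose RSS vector is closest
    (Euclidean distance in signal space) to the observed scan."""
    def dist(fingerprint):
        return sum((observed[ap] - fingerprint[ap]) ** 2
                   for ap in observed) ** 0.5
    return min(database, key=lambda pos: dist(database[pos]))

# Fingerprint database: position (x, y) in meters -> RSS per access point (dBm).
fingerprints = {
    (0.0, 0.0): {"ap1": -40, "ap2": -70},
    (5.0, 0.0): {"ap1": -60, "ap2": -50},
}
scan = {"ap1": -42, "ap2": -68}          # a new observation near the origin
print(locate(scan, fingerprints))        # (0.0, 0.0)
```

Errors in the trained database shift the stored RSS vectors; because matching picks the nearest vector rather than interpolating, positioning error need not grow linearly with database error, consistent with the result reported above.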
Generation of an Aerothermal Data Base for the X-33 Spacecraft
NASA Technical Reports Server (NTRS)
Roberts, Cathy; Huynh, Loc
1998-01-01
The X-33 experimental program is a cooperative program between industry and NASA, managed by Lockheed-Martin Skunk Works to develop an experimental vehicle to demonstrate new technologies for a single-stage-to-orbit, fully reusable launch vehicle (RLV). One of the new technologies to be demonstrated is an advanced Thermal Protection System (TPS) being designed by BF Goodrich (formerly Rohr, Inc.) with support from NASA. The calculation of an aerothermal database is crucial to identifying the critical design environment data for the TPS. The NASA Ames X-33 team has generated such a database using Computational Fluid Dynamics (CFD) analyses, engineering analysis methods and various programs to compare and interpolate the results from the CFD and the engineering analyses. This database, along with a program used to query the database, is used extensively by several X-33 team members to help them in designing the X-33. This paper will describe the methods used to generate this database, the program used to query the database, and will show some of the aerothermal analysis results for the X-33 aircraft.
Analysis of Landslide Hazard Impact Using the Landslide Database for Germany
NASA Astrophysics Data System (ADS)
Klose, M.; Damm, B.
2014-12-01
The Federal Republic of Germany has long been among the few European countries that lacked a national landslide database. Systematic collection and inventorying of landslide data has a long research history in Germany, but one focused on the development of databases with only local or regional coverage. This has changed in recent years with the launch of a database initiative aimed at closing the data gap existing at the national level. The present contribution reports on this project, which is based on a landslide database that has evolved over the last 15 years into a database covering large parts of Germany. A strategy of systematic retrieval, extraction, and fusion of landslide data is at the heart of the methodology, providing the basis for a database with broad application potential. The database offers a data pool of more than 4,200 landslide data sets with over 13,000 single data files and dates back to the 12th century. All types of landslides are covered by the database, which stores not only core attributes but also various complementary data, including data on landslide causes, impacts, and mitigation. The current database migration to PostgreSQL/PostGIS is focused on unlocking the full scientific potential of the database, while enabling data sharing and knowledge transfer via a web GIS platform. In this contribution, the goals and the research strategy of the database project are highlighted first, with a summary of best practices in database development providing perspective. Next, the focus is on key aspects of the methodology, followed by the results of different case studies in the German Central Uplands. The case study results exemplify database application in the analysis of vulnerability to landslides, impact statistics, and hazard or cost modeling.
Lee, Ken Ka-Yin; Tang, Wai-Choi; Choi, Kup-Sze
2013-04-01
Clinical data are dynamic in nature, often arranged hierarchically, and stored as free text and numbers. Effective management of clinical data and the transformation of the data into a structured format for data analysis are therefore challenging issues in electronic health records development. Despite the popularity of relational databases, the scalability of the NoSQL database model and the document-centric data structure of XML databases appear to be promising features for effective clinical data management. In this paper, three database approaches--NoSQL, XML-enabled and native XML--are investigated to evaluate their suitability for structured clinical data. The database query performance is reported, together with our experience in developing the databases. The results show that the NoSQL database is the best choice for query speed, whereas XML databases are advantageous in terms of scalability, flexibility and extensibility, which are essential to cope with the characteristics of clinical data. While NoSQL and XML technologies are relatively new compared to the conventional relational database, both demonstrate the potential to become a key database technology for clinical data management as the technology further advances. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
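The document-centric layout that makes NoSQL and XML stores attractive here can be illustrated with a toy record: a hierarchical clinical encounter kept as one document rather than split across several relational tables. All field names and values below are invented for illustration (JSON is used as a stand-in for either a NoSQL document or an XML fragment).

```python
import json

# One patient's hierarchical record as a single self-contained document.
record = {
    "patient_id": "P001",
    "encounters": [
        {"date": "2012-03-01",
         "notes": "free-text observation",            # free text coexists...
         "labs": [{"test": "HbA1c", "value": 6.8}]},  # ...with numbers
    ],
}

doc = json.dumps(record)          # what a document store would persist
restored = json.loads(doc)        # retrieval needs no multi-table join
print(restored["encounters"][0]["labs"][0]["test"])   # HbA1c
```

In a relational design the same data would span patient, encounter, and lab tables joined at query time; the document model trades that join cost for flexibility when the record's structure evolves, which is the extensibility advantage the abstract attributes to XML databases.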
Performance assessment of EMR systems based on post-relational database.
Yu, Hai-Yan; Li, Jing-Song; Zhang, Xiao-Guang; Tian, Yu; Suzuki, Muneou; Araki, Kenji
2012-08-01
Post-relational databases provide high performance and are currently widely used in American hospitals. As few hospital information systems (HIS) in either China or Japan are based on post-relational databases, here we introduce a new-generation electronic medical records (EMR) system called Hygeia, which was developed with the post-relational database Caché and the latest platform Ensemble. Utilizing the benefits of a post-relational database, Hygeia is equipped with an "integration" feature that allows all system users to access data, with a fast response time, anywhere and at any time. Performance tests of the databases in the EMR systems were implemented in both China and Japan. First, a comparison test was conducted between the post-relational database Caché and the relational database Oracle, embedded in the EMR systems of a medium-sized first-class hospital in China. Second, a user terminal test was done on the EMR system Izanami, which is based on the same database, Caché, and operates efficiently at the Miyazaki University Hospital in Japan. The results proved that the post-relational database Caché works faster than the relational database Oracle and showed perfect performance in the real-time EMR system.
NASA Astrophysics Data System (ADS)
Henderson, B. H.; Akhtar, F.; Pye, H. O. T.; Napelenok, S. L.; Hutzell, W. T.
2013-09-01
Transported air pollutants receive increasing attention as regulations tighten and global concentrations increase. The need to represent international transport in regional air quality assessments requires improved representation of boundary concentrations. Currently available observations are too sparse vertically to provide boundary information, particularly for ozone precursors, but global simulations can be used to generate spatially and temporally varying Lateral Boundary Conditions (LBC). This study presents a public database of global simulations designed and evaluated for use as LBC for air quality models (AQMs). The database covers the contiguous United States (CONUS) for the years 2000-2010 and contains hourly varying concentrations of ozone, aerosols, and their precursors. The database is complemented by a tool for configuring the global results as inputs to regional-scale models (e.g., Community Multiscale Air Quality or Comprehensive Air quality Model with extensions). This study also presents an example application based on the CONUS domain, which is evaluated against satellite-retrieved ozone vertical profiles. The results show performance is largely within uncertainty estimates for the Tropospheric Emission Spectrometer (TES), with some exceptions. The major difference is a high bias in the upper troposphere along the southern boundary in January. This publication documents the global simulation database, the tool for conversion to LBC, and the fidelity of concentrations on the boundaries. This documentation is intended to support applications that require representation of long-range transport of air pollutants.
Long-range (fractal) correlations in the LEDA database.
NASA Astrophysics Data System (ADS)
di Nella, H.; Montuori, M.; Paturel, G.; Pietronero, L.; Sylos Labini, F.
1996-04-01
All the recent redshift surveys show highly irregular patterns of galaxies on scales of hundreds of megaparsecs, such as chains, walls and cells. One of the most powerful catalogs of galaxies is the LEDA database, which contains more than 36,000 galaxies with redshifts. We study the correlation properties of this sample, finding that the galaxy distribution shows a well-defined fractal nature up to R_s ~ 150 h^-1 Mpc with fractal dimension D ~ 2. We test the consistency of these results against the incompleteness of the sample.
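The fractal dimension D used in this analysis can be estimated from the scaling of pair counts with separation, N(<r) ~ r^D. The toy below applies this to 1-D points for brevity (a uniform set of points on a line has D ~ 1); it is only a sketch of the counting idea, not the correlation estimator applied to the LEDA sample.

```python
import math

def correlation_dimension(points, r1, r2):
    """Estimate D from the ratio of pair counts within two radii,
    using N(<r) ~ r^D  =>  D = log(N2/N1) / log(r2/r1)."""
    def pairs_within(r):
        return sum(1 for i, p in enumerate(points)
                   for q in points[i + 1:] if abs(p - q) < r)
    return math.log(pairs_within(r2) / pairs_within(r1)) / math.log(r2 / r1)

# Uniformly spaced points on a line: dimension should come out near 1.
points = list(range(1000))
d = correlation_dimension(points, 50, 200)
print(round(d, 1))   # 1.0
```

A homogeneous 3-D galaxy distribution would give D = 3 by the same counting; the D ~ 2 reported above is what marks the distribution as fractal rather than homogeneous on those scales.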
Niu, Heng; Yang, Jingyu; Yang, Kunxian; Huang, Yingze
2017-11-01
DNA promoter methylation can suppress gene expression and plays an important role in the biological functions of Ras association domain family 1A (RASSF1A). Many studies have been performed to elucidate the role of RASSF1A promoter methylation in thyroid carcinoma, but the results were conflicting and heterogeneous. Here, we analyzed data from several databases to determine the relationship between RASSF1A promoter methylation and thyroid carcinoma. We used data from 14 cancer-normal studies and the Gene Expression Omnibus (GEO) database to analyze RASSF1A promoter methylation in thyroid carcinoma susceptibility. Data from The Cancer Genome Atlas (TCGA) database were used to analyze the relationship between RASSF1A promoter methylation and thyroid carcinoma susceptibility, clinical characteristics, and prognosis. Odds ratios were estimated for thyroid carcinoma susceptibility and hazard ratios were estimated for thyroid carcinoma prognosis. The heterogeneity between studies in the meta-analysis was explored using H and I² values and meta-regression. We adopted quality criteria to classify the studies in the meta-analysis. Subgroup analyses were done for thyroid carcinoma susceptibility according to ethnicity, methods, and primers. The result of the meta-analysis indicated that RASSF1A promoter methylation is associated with higher susceptibility to thyroid carcinoma, with small heterogeneity. Similarly, the result from the GEO database also showed a significant association between RASSF1A gene promoter methylation and thyroid carcinoma susceptibility. From the TCGA database, we found that RASSF1A promoter methylation is associated with susceptibility and poor disease-free survival (DFS) of thyroid carcinoma. In addition, we found a close association between RASSF1A promoter methylation and patient tumor stage and age, but no association with patient gender.
The methylation status of the RASSF1A promoter is strongly associated with thyroid carcinoma susceptibility and DFS. The RASSF1A promoter methylation test can be applied in the clinical diagnosis of thyroid carcinoma.
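The odds ratio, the effect measure estimated for susceptibility in the meta-analysis above, comes from a simple 2x2 cross-product. A minimal sketch with invented counts (these are not figures from the study):

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table:
    a, b = methylated / unmethylated among cases,
    c, d = methylated / unmethylated among controls."""
    return (a * d) / (b * c)

# Hypothetical counts: 30/10 methylated/unmethylated in tumors,
# 15/45 in normal tissue.
or_value = odds_ratio(30, 10, 15, 45)
print(or_value)   # 9.0
```

An odds ratio above 1, as in this invented example, is the pattern that supports an association between promoter methylation and disease susceptibility; the meta-analysis pools such ratios across the 14 cancer-normal studies.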
Generalized entropies and the similarity of texts
NASA Astrophysics Data System (ADS)
Altmann, Eduardo G.; Dias, Laércio; Gerlach, Martin
2017-01-01
We show how generalized Gibbs-Shannon entropies can provide new insights on the statistical properties of texts. The universal distribution of word frequencies (Zipf's law) implies that the generalized entropies, computed at the word level, are dominated by words in a specific range of frequencies. Here we show that this is the case not only for the generalized entropies but also for the generalized (Jensen-Shannon) divergences used to compute the similarity between different texts. This finding allows us to identify the contribution of specific words (and word frequencies) to the different generalized entropies and also to estimate the size of the databases needed to obtain a reliable estimation of the divergences. We test our results in large databases of books (from the Google n-gram database) and scientific papers (indexed by Web of Science).
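The standard (alpha = 1) Jensen-Shannon divergence between the word-frequency distributions of two texts can be computed in a few lines; the generalized versions studied above replace the Shannon entropy below with generalized entropies. This is a textbook sketch, not the authors' code, and the example texts are invented.

```python
import math
from collections import Counter

def entropy(p):
    """Shannon entropy (bits) of a distribution given as {word: prob}."""
    return -sum(x * math.log2(x) for x in p.values() if x > 0)

def word_dist(text):
    """Normalized word-frequency distribution of a text."""
    counts = Counter(text.split())
    n = sum(counts.values())
    return {w: c / n for w, c in counts.items()}

def jsd(p, q):
    """Jensen-Shannon divergence: entropy of the mixture minus the
    average entropy of the two distributions."""
    m = {w: 0.5 * (p.get(w, 0) + q.get(w, 0)) for w in set(p) | set(q)}
    return entropy(m) - 0.5 * entropy(p) - 0.5 * entropy(q)

a = word_dist("the cat sat on the mat")
b = word_dist("the cat sat on the mat")
print(abs(jsd(a, b)) < 1e-12)   # True: identical texts have zero divergence
```

Because the divergence is a sum of per-word terms, one can read off exactly which frequency range of words dominates it, which is the decomposition the abstract exploits.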
Gradishar, William; Johnson, KariAnne; Brown, Krystal; Mundt, Erin; Manley, Susan
2017-07-01
There is a growing move to consult public databases following receipt of a genetic test result from a clinical laboratory; however, the well-documented limitations of these databases call into question how often clinicians will encounter discordant variant classifications that may introduce uncertainty into patient management. Here, we evaluate discordance in BRCA1 and BRCA2 variant classifications between a single commercial testing laboratory and a public database commonly consulted in clinical practice. BRCA1 and BRCA2 variant classifications were obtained from ClinVar and compared with the classifications from a reference laboratory. Full concordance and discordance were determined for variants whose ClinVar entries were of the same pathogenicity (pathogenic, benign, or uncertain). Variants with conflicting ClinVar classifications were considered partially concordant if ≥1 of the listed classifications agreed with the reference laboratory classification. Four thousand two hundred and fifty unique BRCA1 and BRCA2 variants were available for analysis. Overall, 73.2% of classifications were fully concordant and 12.3% were partially concordant. The remaining 14.5% of variants had discordant classifications, most of which had a definitive classification (pathogenic or benign) from the reference laboratory compared with an uncertain classification in ClinVar (14.0%). Here, we show that discrepant classifications between a public database and single reference laboratory potentially account for 26.7% of variants in BRCA1 and BRCA2. The time and expertise required of clinicians to research these discordant classifications call into question the practicality of checking all test results against a database and suggest that discordant classifications should be interpreted with these limitations in mind. With the increasing use of clinical genetic testing for hereditary cancer risk, accurate variant classification is vital to ensuring appropriate medical management.
There is a growing move to consult public databases following receipt of a genetic test result from a clinical laboratory; however, we show that up to 26.7% of variants in BRCA1 and BRCA2 have discordant classifications between ClinVar and a reference laboratory. The findings presented in this paper serve as a note of caution regarding the utility of database consultation. © AlphaMed Press 2017.
Molecular scaffold analysis of natural products databases in the public domain.
Yongye, Austin B; Waddell, Jacob; Medina-Franco, José L
2012-11-01
Natural products represent important sources of bioactive compounds in drug discovery efforts. In this work, we compiled five natural products databases available in the public domain and performed a comprehensive chemoinformatic analysis focused on the content and diversity of the scaffolds with an overview of the diversity based on molecular fingerprints. The natural products databases were compared with each other and with a set of molecules obtained from in-house combinatorial libraries, and with a general screening commercial library. It was found that publicly available natural products databases have different scaffold diversity. In contrast to the common concept that larger libraries have the largest scaffold diversity, the largest natural products collection analyzed in this work was not the most diverse. The general screening library showed, overall, the highest scaffold diversity. However, considering the most frequent scaffolds, the general reference library was the least diverse. In general, natural products databases in the public domain showed low molecule overlap. In addition to benzene and acyclic compounds, flavones, coumarins, and flavanones were identified as the most frequent molecular scaffolds across the different natural products collections. The results of this work have direct implications in the computational and experimental screening of natural product databases for drug discovery. © 2012 John Wiley & Sons A/S.
Comparison of LEWICE and GlennICE in the SLD Regime
NASA Technical Reports Server (NTRS)
Wright, William B.; Potapczuk, Mark G.; Levinson, Laurie H.
2008-01-01
A research project is underway at the NASA Glenn Research Center (GRC) to produce computer software that can accurately predict ice growth under any meteorological conditions for any aircraft surface. This report will present results from two different computer programs. The first program, LEWICE version 3.2.2, has been reported on previously. The second program is GlennICE version 0.1. An extensive comparison of the results in a quantifiable manner against the database of ice shapes that have been generated in the GRC Icing Research Tunnel (IRT) has also been performed, including additional data taken to extend the database in the Super-cooled Large Drop (SLD) regime. This paper will show the differences in ice shape between LEWICE 3.2.2, GlennICE, and experimental data. This report will also provide a description of both programs. Comparisons are then made to recent additions to the SLD database and selected previous cases. Quantitative comparisons are shown for horn height, horn angle, icing limit, area, and leading edge thickness. The results show that the predicted results for both programs are within the accuracy limits of the experimental data for the majority of cases.
Makadia, Rupa; Matcho, Amy; Ma, Qianli; Knoll, Chris; Schuemie, Martijn; DeFalco, Frank J; Londhe, Ajit; Zhu, Vivienne; Ryan, Patrick B
2015-01-01
Objectives To evaluate the utility of applying the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) across multiple observational databases within an organization and to apply standardized analytics tools for conducting observational research. Materials and methods Six deidentified patient-level datasets were transformed to the OMOP CDM. We evaluated the extent of information loss that occurred through the standardization process. We developed a standardized analytic tool to replicate the cohort construction process from a published epidemiology protocol and applied the analysis to all 6 databases to assess time-to-execution and comparability of results. Results Transformation to the CDM resulted in minimal information loss across all 6 databases. Patients and observations were excluded due to identified data quality issues in the source system; 96% to 99% of condition records and 90% to 99% of drug records were successfully mapped into the CDM using the standard vocabulary. The full cohort replication and descriptive baseline summary was executed for 2 cohorts in 6 databases in less than 1 hour. Discussion The standardization process improved data quality, increased efficiency, and facilitated cross-database comparisons to support a more systematic approach to observational research. Comparisons across data sources showed consistency in the impact of inclusion criteria applied using the protocol, and identified differences in patient characteristics and coding practices across databases. Conclusion Standardizing data structure (through a CDM), content (through a standard vocabulary with source code mappings), and analytics can enable an institution to apply a network-based approach to observational research across multiple, disparate observational health databases. PMID:25670757
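The vocabulary-mapping step reported above (96% to 99% of condition records mapped) amounts to translating source codes to standard concepts and tracking the mapping rate. A minimal sketch, with invented codes and concept identifiers:

```python
# Hypothetical source-to-standard concept mapping; the codes and concept
# IDs below are invented for the illustration.
source_to_standard = {
    "ICD9:250.00": "SNOMED:44054006",   # type 2 diabetes mellitus
    "ICD9:401.9": "SNOMED:38341003",    # essential hypertension
}

def map_records(records, mapping):
    """Map (patient, source code) pairs to standard concepts; return the
    mapped records and the fraction successfully mapped."""
    mapped = [(pid, mapping[code]) for pid, code in records if code in mapping]
    rate = len(mapped) / len(records) if records else 0.0
    return mapped, rate

records = [("p1", "ICD9:250.00"), ("p2", "ICD9:401.9"), ("p3", "ICD9:V99.9")]
mapped, rate = map_records(records, source_to_standard)
print(f"{rate:.0%} of condition records mapped")  # 67% of condition records mapped
```

Unmapped records, like the third one here, are what the paper audits as potential information loss from standardization.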
NCBI2RDF: enabling full RDF-based access to NCBI databases.
Anguita, Alberto; García-Remesal, Miguel; de la Iglesia, Diana; Maojo, Victor
2013-01-01
RDF has become the standard technology for enabling interoperability among heterogeneous biomedical databases. The NCBI provides access to a large set of life sciences databases through a common interface called Entrez. However, the latter does not provide RDF-based access to such databases, and, therefore, they cannot be integrated with other RDF-compliant databases and accessed via SPARQL query interfaces. This paper presents the NCBI2RDF system, aimed at providing RDF-based access to the complete NCBI data repository. This API creates a virtual endpoint for servicing SPARQL queries over different NCBI repositories and presenting to users the query results in SPARQL results format, thus enabling this data to be integrated and/or stored with other RDF-compliant repositories. SPARQL queries are dynamically resolved, decomposed, and forwarded to the NCBI-provided E-utilities programmatic interface to access the NCBI data. Furthermore, we show how our approach increases the expressiveness of the native NCBI querying system, allowing several databases to be accessed simultaneously. This feature significantly boosts productivity when working with complex queries and saves biomedical researchers time and effort. Our approach has been validated with a large number of SPARQL queries, thus proving its reliability and enhanced capabilities in biomedical environments.
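The decomposition described above bottoms out in calls to NCBI's E-utilities REST interface. As a minimal sketch (no network access is made here; the choice of databases and query term is illustrative), one SPARQL query touching two NCBI databases would be split into separate esearch calls:

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def esearch_url(db, term, retmax=20):
    """Build an E-utilities esearch URL for one decomposed sub-query.
    A SPARQL front end like NCBI2RDF would generate one such call per
    triple pattern that touches a given NCBI database."""
    params = urlencode({"db": db, "term": term,
                        "retmode": "json", "retmax": retmax})
    return f"{EUTILS_BASE}/esearch.fcgi?{params}"

# One SPARQL query spanning two databases decomposes into two REST calls:
print(esearch_url("pubmed", "BRCA1"))
print(esearch_url("protein", "BRCA1"))
```

The mediator's remaining work, which this sketch omits, is merging the per-database result sets back into a single SPARQL results document.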
Database for propagation models
NASA Astrophysics Data System (ADS)
Kantak, Anil V.
1991-07-01
A propagation researcher or a systems engineer who intends to use the results of a propagation experiment is generally faced with various database tasks, such as selecting the computer software and hardware and writing the programs to pass the data through the models of interest. This task is repeated every time a new experiment is conducted, or the same experiment is carried out at a different location and generates different data. Thus the users of these data have to spend a considerable portion of their time learning how to implement the computer hardware and software towards the desired end. This situation could be eased considerably by creating an easily accessible propagation database containing all the accepted (standardized) propagation phenomena models approved by the propagation research community; the handling of data would also become easier for the user. Such a database can stimulate the growth of propagation research only if it is available to all researchers, so that the results of an experiment conducted by one researcher can be examined independently by another without different hardware and software being used. The database may be made flexible so that researchers need not be confined to its contents. The database would also spare researchers from having to document the software and hardware tools used in their research, since the propagation research community would already know the database. The following sections show a possible database construction, as well as properties of the database for propagation research.
The ChEMBL database as linked open data
2013-01-01
Background Making data available as Linked Data using Resource Description Framework (RDF) promotes integration with other web resources. RDF documents can natively link to related data, and others can link back using Uniform Resource Identifiers (URIs). RDF makes the data machine-readable and uses extensible vocabularies for additional information, making it easier to scale up inference and data analysis. Results This paper describes recent developments in an ongoing project converting data from the ChEMBL database into RDF triples. Relative to earlier versions, this updated version of ChEMBL-RDF uses recently introduced ontologies, including CHEMINF and CiTO; exposes more information from the database; and is now available as dereferencable, linked data. To demonstrate these new features, we present novel use cases showing further integration with other web resources, including Bio2RDF, Chem2Bio2RDF, and ChemSpider, and showing the use of standard ontologies for querying. Conclusions We have illustrated the advantages of using open standards and ontologies to link the ChEMBL database to other databases. Using those links and the knowledge encoded in standards and ontologies, the ChEMBL-RDF resource creates a foundation for integrated semantic web cheminformatics applications, such as the presented decision support. PMID:23657106
Handwritten word preprocessing for database adaptation
NASA Astrophysics Data System (ADS)
Oprean, Cristina; Likforman-Sulem, Laurence; Mokbel, Chafic
2013-01-01
Handwriting recognition systems are typically trained using publicly available databases, where data have been collected in controlled conditions (image resolution, paper background, noise level, etc.). Since this is often not the case in real-world scenarios, classification performance can be affected when novel data are presented to the word recognition system. To overcome this problem, we present in this paper a new approach called database adaptation. It consists of processing one set (training or test) in order to adapt it to the other set (test or training, respectively). Specifically, two kinds of preprocessing, namely stroke thickness normalization and pixel intensity normalization, are considered. The advantage of such an approach is that we can re-use the existing recognition system trained on controlled data. We conduct several experiments with the Rimes 2011 word database and with a real-world database, adapting either the test set or the training set. Results show that training set adaptation achieves better results than test set adaptation, at the cost of a second training stage on the adapted data. Accuracy with data set adaptation is increased by 2% to 3%, in absolute value, over no adaptation.
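Of the two preprocessing steps named above, pixel intensity normalization is the simpler to sketch: shift and scale one set's grayscale statistics to match the other's. The sketch below is a minimal version under that assumption (stroke-thickness normalization would additionally need morphological operations, which are omitted).

```python
import numpy as np

def normalize_intensity(img, ref_mean, ref_std):
    """Shift/scale a grayscale word image so its pixel statistics match a
    reference set, adapting e.g. a real-world test image toward the
    controlled-condition training distribution."""
    img = img.astype(float)
    std = img.std() or 1.0          # guard against a constant image
    return (img - img.mean()) / std * ref_std + ref_mean

# Synthetic "real-world" word image with darker, noisier pixels:
rng = np.random.default_rng(0)
noisy = rng.normal(100, 40, size=(32, 128))

# Adapt it toward hypothetical training-set statistics (mean 200, std 30):
adapted = normalize_intensity(noisy, ref_mean=200, ref_std=30)
print(round(adapted.mean()), round(adapted.std()))  # 200 30
```

In the paper's terms, applying this to the test set is test-set adaptation; applying it to the training set instead requires retraining on the adapted data, which is the trade-off the results describe.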
A literature search tool for intelligent extraction of disease-associated genes.
Jung, Jae-Yoon; DeLuca, Todd F; Nelson, Tristan H; Wall, Dennis P
2014-01-01
To extract disorder-associated genes from the scientific literature in PubMed with greater sensitivity for literature-based support than existing methods. We developed a PubMed query to retrieve disorder-related, original research articles. Then we applied a rule-based text-mining algorithm with keyword matching to extract target disorders, genes with significant results, and the type of study described by the article. We compared our resulting candidate disorder genes and supporting references with existing databases. We demonstrated that our candidate gene set covers nearly all genes in manually curated databases, and that the references supporting the disorder-gene link are more extensive and accurate than those in other general-purpose gene-to-disorder association databases. We implemented a novel publication search tool to find target articles, specifically focused on links between disorders and genotypes. Through comparison against gold-standard, manually updated gene-disorder databases and comparison with automated databases of similar functionality, we show that our tool can search through the entirety of PubMed to extract the main gene findings for human diseases rapidly and accurately.
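A rule-based, keyword-matching extractor of the kind described can be sketched in a few lines. Everything below is a toy: the gene-symbol regex, the disorder list, and the significance keywords are stand-ins for the paper's actual rules.

```python
import re

GENE = re.compile(r"\b[A-Z][A-Z0-9]{2,}\b")          # crude gene-symbol pattern
DISORDERS = {"autism", "schizophrenia"}              # toy target-disorder list
POSITIVE = re.compile(r"significant(ly)? associat", re.I)

def extract(sentence):
    """Return (disorders, genes) for sentences that report a significant
    association, or None when the rule does not fire."""
    if not POSITIVE.search(sentence):
        return None
    disorders = [d for d in DISORDERS if d in sentence.lower()]
    genes = GENE.findall(sentence)
    return (disorders, genes) if disorders and genes else None

s = "SHANK3 variants were significantly associated with autism risk."
print(extract(s))  # (['autism'], ['SHANK3'])
```

Real pipelines add many refinements (symbol dictionaries, negation handling, study-type classification), but the fire-a-rule, emit-a-candidate structure is the same.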
Age estimation using cortical surface pattern combining thickness with curvatures
Wang, Jieqiong; Li, Wenjing; Miao, Wen; Dai, Dai; Hua, Jing; He, Huiguang
2014-01-01
Brain development and healthy aging have been shown to follow specific patterns, which, in turn, can be applied to help doctors diagnose mental diseases. In this paper, we design a cortical surface pattern (CSP) combining cortical thickness with curvatures, which constructs an accurate human age estimation model with relevance vector regression. We test our model with two public databases: the IXI database (360 healthy subjects aged 20 to 82 years were selected) and the INDI database (303 subjects aged 7 to 22 years were selected). The results show that our model can achieve a deviation as small as 4.57 years in the IXI database and 1.38 years in the INDI database. Furthermore, we apply this surface pattern to age-group classification and obtain a remarkably high accuracy (97.77%) and a significantly high sensitivity/specificity (97.30%/98.10%). These results suggest that our designed CSP combining thickness with curvatures is stable and sensitive to brain development, and that it is much more powerful than the voxel-based morphometry used in previous methods for age estimation. PMID:24395657
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yu, Haoyu S.; Zhang, Wenjing; Verma, Pragya
2015-01-01
The goal of this work is to develop a gradient approximation to the exchange–correlation functional of Kohn–Sham density functional theory for treating molecular problems with a special emphasis on the prediction of quantities important for homogeneous catalysis and other molecular energetics. Our training and validation of exchange–correlation functionals is organized in terms of databases and subdatabases. The key properties required for homogeneous catalysis are main group bond energies (database MGBE137), transition metal bond energies (database TMBE32), reaction barrier heights (database BH76), and molecular structures (database MS10). We also consider 26 other databases, most of which are subdatabases of a newly extended broad database called Database 2015, which is presented in the present article and in its ESI. Based on the mathematical form of a nonseparable gradient approximation (NGA), as first employed in the N12 functional, we design a new functional by using Database 2015 and by adding smoothness constraints to the optimization of the functional. The resulting functional is called the gradient approximation for molecules, or GAM. The GAM functional gives better results for MGBE137, TMBE32, and BH76 than any available generalized gradient approximation (GGA) or than N12. The GAM functional also gives reasonable results for MS10 with an MUE of 0.018 Å. The GAM functional provides good results both within the training sets and outside the training sets. The convergence tests and the smooth curves of exchange–correlation enhancement factor as a function of the reduced density gradient show that the GAM functional is a smooth functional that should not lead to extra expense or instability in optimizations. NGAs, like GGAs, have the advantage over meta-GGAs and hybrid GGAs of respectively smaller grid-size requirements for integrations and lower costs for extended systems.
These computational advantages combined with the relatively high accuracy for all the key properties needed for molecular catalysis make the GAM functional very promising for future applications.
A Database as a Service for the Healthcare System to Store Physiological Signal Data.
Chang, Hsien-Tsung; Lin, Tsai-Huei
2016-01-01
Wearable devices that measure physiological signals to help develop self-health management habits have become increasingly popular in recent years. These records are useful for follow-up health and medical care. In this study, based on the characteristics of the observed physiological signal records, namely (1) a large number of users, (2) a large amount of data, (3) low information variability, (4) data privacy authorization, and (5) data access by designated users, we wish to resolve physiological signal record-relevant issues utilizing the advantages of the Database as a Service (DaaS) model. Storing a large amount of data using file patterns can reduce database load, allowing users to access data efficiently; the privacy control settings allow users to store data securely. The results of the experiment show that the proposed system has better database access performance than a traditional relational database, with a small difference in database volume, thus proving that the proposed system can improve data storage performance.
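The file-pattern idea described above can be sketched as: bulky signal samples go to flat files while the database keeps only metadata and a file path, so queries stay cheap. SQLite stands in for the service's database here, and the table and column names are invented for the illustration.

```python
import os
import sqlite3
import tempfile

root = tempfile.mkdtemp()                     # flat-file storage area
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE signal_meta (user_id TEXT, kind TEXT, path TEXT)")

def store_signal(user_id, kind, samples):
    """Write samples to a flat file; record only metadata in the database."""
    path = os.path.join(root, f"{user_id}_{kind}.csv")
    with open(path, "w") as f:
        f.write(",".join(map(str, samples)))
    db.execute("INSERT INTO signal_meta VALUES (?, ?, ?)",
               (user_id, kind, path))
    return path

def load_signal(user_id, kind):
    """Look up the file path in the database, then read the samples."""
    (path,) = db.execute(
        "SELECT path FROM signal_meta WHERE user_id=? AND kind=?",
        (user_id, kind)).fetchone()
    with open(path) as f:
        return [float(x) for x in f.read().split(",")]

store_signal("u1", "heart_rate", [72, 75, 71])
print(load_signal("u1", "heart_rate"))  # [72.0, 75.0, 71.0]
```

The per-user privacy controls the abstract mentions would sit on top of this lookup, filtering which metadata rows a requester may resolve.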
Aguilera-Mendoza, Longendri; Marrero-Ponce, Yovani; Tellez-Ibarra, Roberto; Llorente-Quesada, Monica T; Salgado, Jesús; Barigye, Stephen J; Liu, Jun
2015-08-01
The large variety of antimicrobial peptide (AMP) databases developed to date are characterized by a substantial overlap of data and similarity of sequences. Our goals are to analyze the levels of redundancy for all available AMP databases and use this information to build a new non-redundant sequence database. For this purpose, a new software tool is introduced. A comparative study of 25 AMP databases reveals the overlap and diversity among them and the internal diversity within each database. The overlap analysis shows that only one database (Peptaibol) contains exclusive data, not present in any other, whereas all sequences in the LAMP_Patent database are included in CAMP_Patent. However, the majority of databases have their own set of unique sequences, as well as some overlap with other databases. The complete set of non-duplicate sequences comprises 16 990 cases, which is almost half of the total number of reported peptides. On the other hand, the diversity analysis identifies the most and least diverse databases and proves that all databases exhibit some level of redundancy. Finally, we present a new parallel-free software, named Dover Analyzer, developed to compute the overlap and diversity between any number of databases and compile a set of non-redundant sequences. These results are useful for selecting or building a suitable representative set of AMPs, according to specific needs. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
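The overlap analysis described above, which database holds sequences found nowhere else, and what the global non-redundant set is, reduces to set operations over the sequence collections. The peptide strings below are arbitrary stand-ins:

```python
def overlap_report(databases):
    """For each named database (a set of peptide sequences), count the
    sequences that appear in no other database, and build the global
    non-redundant set, the two quantities analyzed in the study."""
    non_redundant = set().union(*databases.values())
    exclusive = {}
    for name, seqs in databases.items():
        others = set().union(*(s for n, s in databases.items() if n != name))
        exclusive[name] = len(seqs - others)
    return non_redundant, exclusive

dbs = {
    "A": {"GIGKFLHSAK", "KWKLFKKIEK"},
    "B": {"KWKLFKKIEK", "FLPIIAKLLG"},
    "C": {"KWKLFKKIEK"},            # fully contained in A and B
}
nr, excl = overlap_report(dbs)
print(len(nr), excl)  # 3 {'A': 1, 'B': 1, 'C': 0}
```

Database "C" here mirrors the paper's LAMP_Patent/CAMP_Patent finding: every one of its entries is covered elsewhere, so it contributes nothing exclusive to the non-redundant set.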
Kyparissiadis, Antonios; van Heuven, Walter J. B.; Pitchford, Nicola J.; Ledgeway, Timothy
2017-01-01
Databases containing lexical properties on any given orthography are crucial for psycholinguistic research. In the last ten years, a number of lexical databases have been developed for Greek. However, these lack important part-of-speech information. Furthermore, the need for alternative procedures for calculating syllabic measurements and stress information, as well as combination of several metrics to investigate linguistic properties of the Greek language are highlighted. To address these issues, we present a new extensive lexical database of Modern Greek (GreekLex 2) with part-of-speech information for each word and accurate syllabification and orthographic information predictive of stress, as well as several measurements of word similarity and phonetic information. The addition of detailed statistical information about Greek part-of-speech, syllabification, and stress neighbourhood allowed novel analyses of stress distribution within different grammatical categories and syllabic lengths to be carried out. Results showed that the statistical preponderance of stress position on the pre-final syllable that is reported for Greek language is dependent upon grammatical category. Additionally, analyses showed that a proportion higher than 90% of the tokens in the database would be stressed correctly solely by relying on stress neighbourhood information. The database and the scripts for orthographic and phonological syllabification as well as phonetic transcription are available at http://www.psychology.nottingham.ac.uk/greeklex/. PMID:28231303
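The stress-neighbourhood result above (over 90% of tokens stressed correctly from neighbourhood information alone) rests on a simple heuristic: predict a word's stress position as the majority position among its orthographic neighbours. A toy version, with invented neighbour data:

```python
from collections import Counter

def predict_stress(neighbour_stresses):
    """Predict a word's stressed syllable (counted from the word's end:
    1 = final, 2 = penultimate, 3 = antepenultimate) as the majority
    stress position among its orthographic neighbours."""
    return Counter(neighbour_stresses).most_common(1)[0][0]

# Hypothetical neighbourhood: most neighbours stress the penultimate syllable,
# so the prediction is 2.
print(predict_stress([2, 2, 3, 2, 1]))  # 2
```

Evaluating this rule over every token in the database, weighted by frequency, yields the accuracy figure the abstract reports.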
Updated folate data in the Dutch Food Composition Database and implications for intake estimates
Westenbrink, Susanne; Jansen-van der Vliet, Martine; van Rossum, Caroline
2012-01-01
Background and objective Nutrient values are influenced by the analytical method used. Food folate measured by high performance liquid chromatography (HPLC) or by microbiological assay (MA) yields different results, with in general higher results from MA than from HPLC. This raises the question of how to deal with different analytical methods in compiling standardised and internationally comparable food composition databases. A recent inventory of folate in European food composition databases indicated that currently MA is more widely used than HPLC. Since older Dutch values were produced by HPLC and newer values by MA, analytical methods and procedures for compiling folate data in the Dutch Food Composition Database (NEVO) were reconsidered and folate values were updated. This article describes the impact of this revision of folate values in the NEVO database as well as the expected impact on the folate intake assessment in the Dutch National Food Consumption Survey (DNFCS). Design The folate values were revised by replacing HPLC with MA values from recent Dutch analyses. Previously, MA folate values taken from foreign food composition tables had been recalculated to the HPLC level, assuming a 27% lower value from HPLC analyses. These recalculated values were replaced by the original MA values. Dutch HPLC and MA values were compared to each other. Folate intake was assessed for a subgroup within the DNFCS to estimate the impact of the update. Results In the updated NEVO database nearly all folate values were produced by MA or derived from MA values, which resulted in an average increase of 24%. The median habitual folate intake in young children was increased by 11–15% using the updated folate values. Conclusion The current approach for folate in NEVO resulted in more transparency in data production and documentation and higher comparability among European databases.
Results of food consumption surveys are expected to show higher folate intakes when using the updated values. PMID:22481900
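The recalculation step described above can be made concrete. Under the stated assumption that HPLC reads 27% lower than MA, undoing the earlier HPLC-level recalculation means dividing by 0.73; the example value below is invented.

```python
def hplc_to_ma(hplc_value, hplc_deficit=0.27):
    """Convert a folate value on the HPLC scale back to the
    microbiological-assay (MA) scale, assuming HPLC reads
    `hplc_deficit` (27% in the article) lower than MA:
    MA = HPLC / (1 - deficit)."""
    return hplc_value / (1 - hplc_deficit)

# A food listed at 73 ug/100 g on the HPLC scale corresponds to
# 100 ug/100 g on the MA scale:
print(round(hplc_to_ma(73.0)))  # 100
```

Note the asymmetry: a 27% reduction going one way becomes a roughly 37% increase coming back, which is why the database-wide average increase (24%) depends on the mix of values that needed reverting.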
Computerized database management system for breast cancer patients.
Sim, Kok Swee; Chong, Sze Siang; Tso, Chih Ping; Nia, Mohsen Esmaeili; Chong, Aun Kee; Abbas, Siti Fathimah
2014-01-01
Data analysis based on breast cancer risk factors such as age, race, breastfeeding, hormone replacement therapy, family history, and obesity was conducted on breast cancer patients using a new enhanced computerized database management system. MySQL is selected as the database management system to store the patient data collected from hospitals in Malaysia. An automatic calculation tool is embedded in this system to assist the data analysis. The results are plotted automatically, and a user-friendly graphical user interface that can control the MySQL database is developed. Case studies show that the breast cancer incidence rate is highest among Malay women, followed by Chinese and Indian women. The peak age for breast cancer incidence is from 50 to 59 years. Results suggest that the chance of developing breast cancer is increased in older women and reduced with breastfeeding practice. Weight status might affect breast cancer risk differently. Additional studies are needed to confirm these findings.
Effective spatial database support for acquiring spatial information from remote sensing images
NASA Astrophysics Data System (ADS)
Jin, Peiquan; Wan, Shouhong; Yue, Lihua
2009-12-01
In this paper, a new approach to maintaining spatial information acquired from remote-sensing images is presented, based on an Object-Relational DBMS (ORDBMS). With this approach, the results of target detection and recognition are stored in an ORDBMS-based spatial database system where they can be further accessed, and users can query the spatial information using the standard SQL interface. This approach differs from the traditional ArcSDE-based method because the spatial information management module is totally integrated into the DBMS and becomes one of its core modules. We focus on three issues, namely the general framework for the ORDBMS-based spatial database system, the definitions of the add-in spatial data types and operators, and the process of developing a spatial DataBlade on Informix. The results show that ORDBMS-based spatial database support for image-based target detection and recognition is easy and practical to implement.
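The add-in-operator idea can be illustrated with SQLite standing in for the ORDBMS: `create_function` registers a custom operator callable from plain SQL, much as a DataBlade adds spatial operators inside Informix. The schema and target data below are invented for the sketch.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")

# Register an add-in spatial operator usable from standard SQL,
# analogous to a DataBlade-defined distance operator.
conn.create_function("geo_dist", 4,
                     lambda x1, y1, x2, y2: math.hypot(x2 - x1, y2 - y1))

conn.execute("CREATE TABLE target (name TEXT, x REAL, y REAL)")
conn.executemany("INSERT INTO target VALUES (?, ?, ?)",
                 [("runway", 0.0, 0.0), ("bridge", 3.0, 4.0)])

# Standard SQL plus the add-in operator: rank detected targets by
# distance from the origin.
rows = conn.execute(
    "SELECT name, geo_dist(x, y, 0, 0) AS d FROM target ORDER BY d"
).fetchall()
print(rows)  # [('runway', 0.0), ('bridge', 5.0)]
```

A real ORDBMS DataBlade goes further, defining storage for the new spatial types and spatial indexes, but the query-side experience is exactly this: ordinary SQL with extra operators.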
Conformational flexibility of two RNA trimers explored by computational tools and database search.
Fadrná, Eva; Koca, Jaroslav
2003-04-01
Two RNA sequences, AAA and AUG, were studied by the conformational search program CICADA and by molecular dynamics (MD) in the framework of the AMBER force field, and also via a thorough PDB database search. CICADA was used to provide detailed information about conformers and conformational interconversions on the energy surfaces of the above molecules. Several conformational families were found for both sequences. Analysis of the results shows differences, especially in the energies of the individual families, and also in flexibility and concerted conformational movement. Therefore, several MD trajectories (altogether 16 ns) were run to obtain more details about both the stability of conformers belonging to different conformational families and the dynamics of the two systems. Results show that the trajectories strongly depend on the starting structure. When the MD starts from the global minimum found by CICADA, it provides a stable run, while MD starting from another conformational family generates a trajectory in which several different conformational families are visited. The results obtained by theoretical methods are compared with the thorough database search data. It is concluded that all but the highest-energy conformational families found in the theoretical results also appear in the experimental data. Registry numbers: adenylyl-(3' --> 5')-adenylyl-(3' --> 5')-adenosine [917-44-2]; adenylyl-(3' --> 5')-uridylyl-(3' --> 5')-guanosine [3494-35-7].
Loss-tolerant measurement-device-independent quantum private queries.
Zhao, Liang-Yuan; Yin, Zhen-Qiang; Chen, Wei; Qian, Yong-Jun; Zhang, Chun-Mei; Guo, Guang-Can; Han, Zheng-Fu
2017-01-04
Quantum private queries (QPQ) is an important cryptographic protocol aiming to protect both the user's and the database's privacy when the database is queried privately. Recently, a variety of practical QPQ protocols based on quantum key distribution (QKD) have been proposed. However, for QKD-based QPQ the user's imperfect detectors can be subjected to detector-side-channel attacks launched by a dishonest owner of the database. Here, we present a simple example that shows how the detector-blinding attack can completely compromise the security of QKD-based QPQ. To remove all the known and unknown detector side channels, we propose a solution of measurement-device-independent QPQ (MDI-QPQ) with single-photon sources. The security of the proposed protocol has been analyzed under some typical attacks. Moreover, we prove that its security is completely loss independent. The results show that practical QPQ will retain the same degree of privacy as before even with seriously uncharacterized detectors.
Competitive code-based fast palmprint identification using a set of cover trees
NASA Astrophysics Data System (ADS)
Yue, Feng; Zuo, Wangmeng; Zhang, David; Wang, Kuanquan
2009-06-01
A palmprint identification system recognizes a query palmprint image by searching for its nearest neighbor from among all the templates in a database. When applied to a large-scale identification system, it is often necessary to speed up the nearest-neighbor searching process. We use competitive code, which has very fast feature extraction and matching speeds, for palmprint identification. To speed up the identification process, we extend the cover tree method and propose to use a set of cover trees to facilitate fast and accurate nearest-neighbor searching. We can use the cover tree method because, as we show, the angular distance used in competitive code can be decomposed into a set of metrics. Using the Hong Kong PolyU palmprint database (version 2) and a large-scale palmprint database, our experimental results show that the proposed method searches for nearest neighbors faster than brute-force search.
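The key observation above, that the angular distance decomposes into a set of metrics, can be illustrated numerically. Competitive code assigns each point one of a small number of orientations; a common formulation uses a cyclic distance per position, summed over positions (the code length and vectors below are toy values).

```python
def ring_dist(a, b, n=6):
    """Cyclic distance between two competitive-code orientations (0..n-1).
    This per-position term is itself a metric on the ring of orientations."""
    d = abs(a - b) % n
    return min(d, n - d)

def code_dist(u, v):
    """Angular distance between two code vectors: a sum of per-position
    metrics, hence itself a metric. This is the property that makes
    metric indexes such as cover trees applicable."""
    return sum(ring_dist(a, b) for a, b in zip(u, v))

u, v, w = [0, 1, 5], [2, 1, 0], [3, 3, 3]
print(code_dist(u, v))  # 3
# Triangle inequality holds, as required of a metric:
print(code_dist(u, w) <= code_dist(u, v) + code_dist(v, w))  # True
```

Because each summand obeys the triangle inequality, so does the sum, which is what licenses the pruning bounds a cover tree relies on during nearest-neighbor search.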
NASA Astrophysics Data System (ADS)
Julius, Musa, Admiral; Pribadi, Sugeng; Muzli, Muzli
2018-03-01
Sulawesi, one of the largest islands in Indonesia, is located at the convergence of two macro-plates, the Eurasian and the Pacific. The NOAA and Novosibirsk Tsunami Laboratory databases record more than 20 tsunamis in Sulawesi since 1820. Based on these data, correlations between tsunami and earthquake parameters need to be established to verify past events. The complete data on magnitudes, fault sizes, and tsunami heights in this study are sourced from the NOAA and Novosibirsk tsunami databases, supplemented with the Pacific Tsunami Warning Center (PTWC) catalog. This study aims to find the correlation between moment magnitude, fault size, and tsunami height by simple regression. The steps of this research are data collection, processing, and regression analysis. The results show that moment magnitude, fault size, and tsunami height are strongly correlated. This analysis is sufficient to confirm the accuracy of the historical tsunami databases for Sulawesi held by NOAA, the Novosibirsk Tsunami Laboratory, and the PTWC.
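The simple regression described above reduces to an ordinary least-squares fit of tsunami height against moment magnitude. The (magnitude, height) pairs below are invented stand-ins for the catalog entries; heights are fit on a log scale, a common choice since runup grows roughly exponentially with magnitude.

```python
import numpy as np

# Hypothetical (Mw, tsunami height in m) pairs standing in for
# NOAA/Novosibirsk/PTWC catalog entries.
mw = np.array([6.5, 7.0, 7.4, 7.8, 8.1])
height = np.array([0.4, 1.1, 2.0, 3.6, 5.0])

# Least-squares fit of log10(height) on Mw, plus the correlation coefficient
# used to judge how strongly the parameters are related.
slope, intercept = np.polyfit(mw, np.log10(height), 1)
r = np.corrcoef(mw, np.log10(height))[0, 1]
print(f"log10(H) = {slope:.2f}*Mw + {intercept:.2f}, r = {r:.2f}")
```

A high correlation coefficient on the real catalog is what the study reads as evidence that the historical database entries are internally consistent.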
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rupcich, Franco; Badal, Andreu; Kyprianou, Iacovos
Purpose: The purpose of this study was to develop a database for estimating organ dose in a voxelized patient model for coronary angiography and brain perfusion CT acquisitions with any spectra and angular tube current modulation setting. The database enables organ dose estimation for existing and novel acquisition techniques without requiring Monte Carlo simulations. Methods: The study simulated transport of monoenergetic photons between 5 and 150 keV for 1000 projections over 360° through anthropomorphic voxelized female chest and head (0° and 30° tilt) phantoms and standard head and body CTDI dosimetry cylinders. The simulations resulted in tables of normalized dose deposition for several radiosensitive organs quantifying the organ dose per emitted photon for each incident photon energy and projection angle for coronary angiography and brain perfusion acquisitions. The values in a table can be multiplied by an incident spectrum and number of photons at each projection angle and then summed across all energies and angles to estimate total organ dose. Scanner-specific organ dose may be approximated by normalizing the database-estimated organ dose by the database-estimated CTDIvol and multiplying by a physical CTDIvol measurement. Two examples are provided demonstrating how to use the tables to estimate relative organ dose. In the first, the change in breast and lung dose during coronary angiography CT scans is calculated for reduced kVp, angular tube current modulation, and partial angle scanning protocols relative to a reference protocol. In the second example, the change in dose to the eye lens is calculated for a brain perfusion CT acquisition in which the gantry is tilted 30° relative to a nontilted scan. Results: Our database provides tables of normalized dose deposition for several radiosensitive organs irradiated during coronary angiography and brain perfusion CT scans.
Validation results indicate total organ doses calculated using our database are within 1% of those calculated using Monte Carlo simulations with the same geometry and scan parameters for all organs except red bone marrow (within 6%), and within 23% of published estimates for different voxelized phantoms. Results from the example of using the database to estimate organ dose for coronary angiography CT acquisitions show 2.1%, 1.1%, and -32% change in breast dose and 2.1%, -0.74%, and 4.7% change in lung dose for reduced kVp, tube current modulated, and partial angle protocols, respectively, relative to the reference protocol. Results show -19.2% difference in dose to eye lens for a tilted scan relative to a nontilted scan. The reported relative changes in organ doses are presented without quantification of image quality and are for the sole purpose of demonstrating the use of the proposed database. Conclusions: The proposed database and calculation method enable the estimation of organ dose for coronary angiography and brain perfusion CT scans utilizing any spectral shape and angular tube current modulation scheme by taking advantage of the precalculated Monte Carlo simulation results. The database can be used in conjunction with image quality studies to develop optimized acquisition techniques and may be particularly beneficial for optimizing dual kVp acquisitions for which numerous kV, mA, and filtration combinations may be investigated.
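The table-based dose calculation described above (table values weighted by the incident spectrum and per-angle photon counts, then summed over all energies and angles) can be sketched as follows. The array sizes, random table values, and flat tube-current profile are hypothetical stand-ins, not the published database.

```python
import numpy as np

# Hypothetical normalized dose table: dose per emitted photon,
# indexed by [photon energy bin, projection angle].
n_energies, n_angles = 146, 1000          # 5-150 keV, 1000 projections
rng = np.random.default_rng(0)
dose_table = rng.random((n_energies, n_angles)) * 1e-15  # Gy/photon

# Hypothetical incident spectrum (relative photons per energy bin)
spectrum = rng.random(n_energies)
spectrum /= spectrum.sum()

# Photons emitted at each projection angle (a flat profile here;
# angular tube current modulation would vary this per angle)
photons_per_angle = np.full(n_angles, 1e9)

# Total organ dose: weight the table by spectrum and photon count,
# then sum over all energies and angles.
organ_dose = np.einsum("e,a,ea->", spectrum, photons_per_angle, dose_table)
print(organ_dose)
```

The same `einsum` pattern also covers partial-angle protocols (zero photons outside the scanned arc) and kVp changes (a different `spectrum`).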
Evaluation of low wind modeling approaches for two tall-stack databases.
Paine, Robert; Samani, Olga; Kaplan, Mary; Knipping, Eladio; Kumar, Naresh
2015-11-01
The performance of the AERMOD air dispersion model under low wind speed conditions, especially for applications with only one level of meteorological data and no direct turbulence measurements or vertical temperature gradient observations, is the focus of this study. The analysis documented in this paper addresses evaluations for low wind conditions involving tall stack releases for which multiple years of concurrent emissions, meteorological data, and monitoring data are available. AERMOD was tested on two field-study databases, each involving several SO2 monitors and hourly emissions data and having sub-hourly meteorological data (e.g., 10-min averages) available, using several technical options: the default mode, various low wind speed beta options, and the available sub-hourly meteorological data. These field study databases included (1) Mercer County, a North Dakota database featuring five SO2 monitors within 10 km of the Dakota Gasification Company's plant and the Antelope Valley Station power plant in an area of both flat and elevated terrain, and (2) a flat-terrain setting database with four SO2 monitors within 6 km of the Gibson Generating Station in southwest Indiana. Both sites featured regionally representative 10-m meteorological databases, with no significant terrain obstacles between the meteorological site and the emission sources. The low wind beta options show improvement in model performance, helping to reduce some of the over-prediction biases currently present in AERMOD when run with regulatory default options. The overall findings with the low wind speed testing on these tall stack field-study databases indicate that AERMOD low wind speed options have a minor effect for flat terrain locations, but can have a significant effect for elevated terrain locations. The performance of AERMOD using low wind speed options leads to improved consistency of meteorological conditions associated with the highest observed and predicted concentration events.
The available sub-hourly modeling results using the Sub-Hourly AERMOD Run Procedure (SHARP) are relatively unbiased and show that this alternative approach should be seriously considered to address situations dominated by low-wind meander conditions.
Folks, Russell D; Savir-Baruch, Bital; Garcia, Ernest V; Verdes, Liudmila; Taylor, Andrew T
2012-12-01
Our objective was to design and implement a clinical history database capable of linking to our database of quantitative results from (99m)Tc-mercaptoacetyltriglycine (MAG3) renal scans and of exporting a data summary for physicians or our software decision support system. For database development, we used a commercial program. Additional software was developed in Interactive Data Language. MAG3 studies were processed using an in-house enhancement of a commercial program. The relational database has 3 parts: a list of all renal scans (the RENAL database), a set of patients with quantitative processing results (the Q2 database), and a subset of patients from Q2 containing clinical data manually transcribed from the hospital information system (the CLINICAL database). To test interobserver variability, a second physician transcriber reviewed 50 randomly selected patients in the hospital information system and tabulated 2 clinical data items: hydronephrosis and presence of a current stent. The CLINICAL database was developed in stages and contains 342 fields comprising demographic information, clinical history, and findings from up to 11 radiologic procedures. A scripted algorithm is used to reliably match records present in both Q2 and CLINICAL. An Interactive Data Language program then combines data from the 2 databases into an XML (extensible markup language) file for use by the decision support system. A text file is constructed and saved for review by physicians. RENAL contains 2,222 records, Q2 contains 456 records, and CLINICAL contains 152 records. The interobserver variability testing found a 95% match between the 2 observers for presence or absence of ureteral stent (κ = 0.52), a 75% match for hydronephrosis based on narrative summaries of hospitalizations and clinical visits (κ = 0.41), and a 92% match for hydronephrosis based on the imaging report (κ = 0.84).
We have developed a relational database system to integrate the quantitative results of MAG3 image processing with clinical records obtained from the hospital information system. We also have developed a methodology for formatting clinical history for review by physicians and export to a decision support system. We identified several pitfalls, including the fact that important textual information extracted from the hospital information system by knowledgeable transcribers can show substantial interobserver variation, particularly when record retrieval is based on the narrative clinical records.
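A minimal sketch of the record-matching and XML-export step, using an in-memory SQLite database. The table layouts and field names (`accession`, `split_function`, `hydronephrosis`) are hypothetical simplifications of the Q2 and CLINICAL databases, not the study's actual schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical miniature versions of the Q2 (quantitative) and
# CLINICAL tables, matched on a shared accession number.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE q2 (accession TEXT PRIMARY KEY, split_function REAL);
CREATE TABLE clinical (accession TEXT PRIMARY KEY, hydronephrosis TEXT);
INSERT INTO q2 VALUES ('A001', 48.0), ('A002', 52.5);
INSERT INTO clinical VALUES ('A001', 'yes');
""")

# Join the two tables: only patients present in both are exported.
rows = con.execute("""
    SELECT q2.accession, q2.split_function, clinical.hydronephrosis
    FROM q2 JOIN clinical USING (accession)
""").fetchall()

# Combine the matched records into an XML document for the
# decision support system.
root = ET.Element("patients")
for accession, split_function, hydro in rows:
    p = ET.SubElement(root, "patient", accession=accession)
    ET.SubElement(p, "split_function").text = str(split_function)
    ET.SubElement(p, "hydronephrosis").text = hydro

xml_out = ET.tostring(root, encoding="unicode")
print(xml_out)
```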
Analyzing a multimodal biometric system using real and virtual users
NASA Astrophysics Data System (ADS)
Scheidat, Tobias; Vielhauer, Claus
2007-02-01
Three main topics of recent research on multimodal biometric systems are addressed in this article: the lack of sufficiently large multimodal test data sets, the influence of cultural aspects, and data protection issues of multimodal biometric data. In this contribution, different possibilities are presented to extend multimodal databases by generating so-called virtual users, which are created by combining single biometric modality data of different users. Comparative tests on databases containing real and virtual users, based on a multimodal system using handwriting and speech, are presented to study the degree to which virtual multimodal databases allow conclusions about recognition accuracy comparable to those drawn from real multimodal data. All tests have been carried out on databases created from donations from three different nationality groups. This allows the experimental results to be reviewed both in general and in the context of cultural origin. The results show that in most cases the usage of virtual persons leads to lower accuracy than the usage of real users in terms of the measurement applied: the Equal Error Rate. Finally, this article addresses the general question of how the concept of virtual users may influence the data protection requirements for multimodal evaluation databases in the future.
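Accuracy in the study is reported as the Equal Error Rate (EER), the operating point where the false accept and false reject rates coincide. A minimal sketch of computing an EER from score distributions follows; the synthetic genuine/impostor scores are hypothetical stand-ins for real handwriting or speech match scores.

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Find the threshold where false accept rate ~= false reject rate
    and return the EER as the average of the two rates there."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    best = (1.0, None)
    for t in thresholds:
        far = np.mean(impostor >= t)   # impostors wrongly accepted
        frr = np.mean(genuine < t)     # genuine users wrongly rejected
        if abs(far - frr) < best[0]:
            best = (abs(far - frr), (far + frr) / 2)
    return best[1]

# Hypothetical similarity scores (higher = more likely genuine)
rng = np.random.default_rng(1)
genuine = rng.normal(0.7, 0.1, 500)
impostor = rng.normal(0.4, 0.1, 500)
eer = equal_error_rate(genuine, impostor)
print(f"EER: {eer:.3f}")
```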
Aerodynamic Analysis of Simulated Heat Shield Recession for the Orion Command Module
NASA Technical Reports Server (NTRS)
Bibb, Karen L.; Alter, Stephen J.; Mcdaniel, Ryan D.
2008-01-01
The aerodynamic effects of the recession of the ablative thermal protection system for the Orion Command Module of the Crew Exploration Vehicle are important for the vehicle guidance. At the present time, the aerodynamic effects of recession are handled within the Orion aerodynamic database indirectly, with an additional safety factor placed on the uncertainty bounds. This study is an initial attempt to quantify the effects for a particular set of recessed geometry shapes, in order to provide more rigorous analysis for managing recession effects within the aerodynamic database. The aerodynamic forces and moments for the baseline and recessed geometries were computed at several trajectory points using multiple CFD codes, both viscous and inviscid. The resulting aerodynamics for the baseline and recessed geometries were compared. The forces (lift, drag) show negligible differences between baseline and recessed geometries. Generally, the moments show a difference between baseline and recessed geometries that correlates with the maximum amount of recession of the geometry. The difference between the pitching moments for the baseline and recessed geometries increases as Mach number decreases (and the recession is greater), reaching a value of -0.0026 at the lowest Mach number. The change in trim angle of attack increases from approximately 0.5° at M = 28.7 to approximately 1.3° at M = 6, and is consistent with a previous analysis with a lower fidelity engineering tool. This correlation of the present results with the engineering tool results supports the continued use of the engineering tool for future work. The present analysis suggests there does not need to be an uncertainty due to recession in the Orion aerodynamic database for the force quantities. The magnitude of the change in pitching moment due to recession is large enough to warrant inclusion in the aerodynamic database.
An increment in the uncertainty for pitching moment could be calculated from these results and included in the development of the aerodynamic database uncertainty for pitching moment.
A retrieval algorithm of hydrometeor profiles for a submillimeter-wave radiometer
NASA Astrophysics Data System (ADS)
Liu, Yuli; Buehler, Stefan; Liu, Heguang
2017-04-01
Vertical profiles of particle microphysics play a vital role in the estimation of climate feedbacks. This paper proposes a new algorithm to retrieve profiles of hydrometeor parameters (i.e., ice, snow, rain, liquid cloud, graupel) based on passive submillimeter-wave measurements. These parameters include water content and particle size. The first part of the algorithm builds the database and retrieves the integrated quantities. The database is built with the Atmospheric Radiative Transfer Simulator (ARTS), which uses atmospheric data to simulate the corresponding brightness temperatures. A neural network, trained on the precalculated database, is developed to retrieve the water path for each type of particle. The second part of the algorithm analyses the statistical relationship between water paths and vertical parameter profiles. Based on the strong dependence between vertical layers in the profiles, the Principal Component Analysis (PCA) technique is applied. The third part of the algorithm uses the forward model explicitly to retrieve the hydrometeor profiles. A cost function is calculated in each iteration, and the Differential Evolution (DE) algorithm is used to adjust the parameter values during the evolutionary process. The performance of this algorithm is planned to be verified on both the simulation database and measurement data, by comparing retrieved profiles with the initial ones. Results show that this algorithm has the ability to retrieve the hydrometeor profiles efficiently. The combination of ARTS and an optimization algorithm yields much better results than the commonly used database approach. Meanwhile, the concept that ARTS can be used explicitly in the retrieval process shows great potential for solving other retrieval problems.
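The third stage (iterative cost-function minimization with Differential Evolution) can be sketched as below. The forward model here is a made-up two-parameter linear stand-in for ARTS, and SciPy's DE implementation is used rather than the authors' own; parameter names and bounds are illustrative.

```python
import numpy as np
from scipy.optimize import differential_evolution

# Toy stand-in for the ARTS forward model: maps hydrometeor
# parameters (water content, particle size) to simulated
# brightness temperatures. A real retrieval would call ARTS here.
def forward_model(params):
    water_content, particle_size = params
    return np.array([250 - 30 * water_content,
                     240 - 20 * water_content * particle_size])

observed_tb = forward_model([0.5, 1.2])  # synthetic "measurement"

# Cost: misfit between simulated and observed brightness temperatures.
def cost(params):
    return np.sum((forward_model(params) - observed_tb) ** 2)

result = differential_evolution(cost, bounds=[(0.0, 2.0), (0.1, 3.0)],
                                seed=0)
print(result.x)  # should land near the true parameters [0.5, 1.2]
```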
Ng, Kevin Kit Siong; Lee, Soon Leong; Tnah, Lee Hong; Nurul-Farhanah, Zakaria; Ng, Chin Hong; Lee, Chai Ting; Tani, Naoki; Diway, Bibian; Lai, Pei Sing; Khoo, Eyen
2016-07-01
Illegal logging and smuggling of Gonystylus bancanus (Thymelaeaceae) poses a serious threat to this fragile valuable peat swamp timber species. Using G. bancanus as a case study, DNA markers were used to develop identification databases at the species, population and individual level. The species-level database for Gonystylus comprised an rDNA (ITS2) marker and two cpDNA (trnH-psbA and trnL) markers, based on a database of 20 Gonystylus species. When the markers were concatenated, taxonomic species recognition was achieved with a resolution of 90% (18 out of the 20 species). In addition, based on 17 natural populations of G. bancanus throughout West (Peninsular Malaysia) and East (Sabah and Sarawak) Malaysia, population and individual identification databases were developed using cpDNA and STR markers respectively. A haplotype distribution map for Malaysia was generated using six cpDNA markers, resulting in 12 unique multilocus haplotypes from 24 informative intraspecific variable sites. These unique haplotypes suggest a clear genetic structuring of the West and East regions. A simulation procedure based on the composition of the samples was used to test whether a suspected sample conformed to a given regional origin. Overall, the observed type I and II errors of the databases showed good concordance with the predicted 5% threshold, which indicates that the databases were useful in revealing provenance and establishing conformity of samples from West and East Malaysia. Sixteen STRs were used to develop the DNA profiling databases for individual identification. Bayesian clustering analyses divided the 17 populations into two main genetic clusters, corresponding to the regions of West and East Malaysia. Population substructuring (K=2) was observed within each region. After removal of bias resulting from sampling effects and population subdivision, conservativeness tests showed that the West and East Malaysia databases were conservative.
This suggests that both databases can be used independently for random match probability estimation within respective regions. The reliability of the databases was further determined by independent self-assignment tests based on the likelihood of each individual's multilocus genotype occurring in each identified population, genetic cluster and region with an average percentage of correctly assigned individuals of 54.80%, 99.60% and 100% respectively. Thus, after appropriate validation, the genetic identification databases developed for G. bancanus in this study could support forensic applications and help safeguard this valuable species into the future. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
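The random match probability mentioned above is conventionally computed with the product rule over Hardy-Weinberg genotype frequencies at independent STR loci. A sketch with entirely hypothetical allele frequencies (not the G. bancanus data) follows.

```python
from functools import reduce

# Hypothetical allele frequency tables for three STR loci in one
# regional database (e.g., West or East Malaysia in the study).
allele_freqs = {
    "locus1": {"A": 0.30, "B": 0.70},
    "locus2": {"A": 0.15, "B": 0.45, "C": 0.40},
    "locus3": {"A": 0.50, "B": 0.50},
}

def genotype_prob(freqs, a1, a2):
    """Hardy-Weinberg genotype probability: p^2 (homozygote)
    or 2pq (heterozygote)."""
    p, q = freqs[a1], freqs[a2]
    return p * p if a1 == a2 else 2 * p * q

def random_match_probability(profile):
    """Product rule: multiply genotype probabilities across
    independent loci."""
    probs = [genotype_prob(allele_freqs[locus], a1, a2)
             for locus, (a1, a2) in profile.items()]
    return reduce(lambda x, y: x * y, probs)

profile = {"locus1": ("A", "B"), "locus2": ("B", "B"), "locus3": ("A", "A")}
rmp = random_match_probability(profile)
print(f"random match probability: {rmp:.4g}")
```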
Searching fee and non-fee toxicology information resources: an overview of selected databases.
Wright, L L
2001-01-12
Toxicology profiles organize information by broad subjects, the first of which affirms identity of the agent studied. Studies here show two non-fee databases (ChemFinder and ChemIDplus) verify the identity of compounds with high efficiency (63% and 73% respectively) with the fee-based Chemical Abstracts Registry file serving well to fill data gaps (100%). Continued searching proceeds using knowledge of structure, scope and content to select databases. Valuable sources for information are factual databases that collect data and facts in special subject areas organized in formats available for analysis or use. Some sources representative of factual files are RTECS, CCRIS, HSDB, GENE-TOX and IRIS. Numerous factual databases offer a wealth of reliable information; however, exhaustive searches probe information published in journal articles and/or technical reports with records residing in bibliographic databases such as BIOSIS, EMBASE, MEDLINE, TOXLINE and Web of Science. Listed with descriptions are numerous factual and bibliographic databases supplied by 11 producers. Given the multitude of options and resources, it is often necessary to seek service desk assistance. Questions were posed by telephone and e-mail to service desks at DIALOG, ISI, MEDLARS, Micromedex and STN International. Results of the survey are reported.
An Adaptive and Time-Efficient ECG R-Peak Detection Algorithm.
Qin, Qin; Li, Jianqing; Yue, Yinggao; Liu, Chengyu
2017-01-01
R-peak detection is crucial in electrocardiogram (ECG) signal analysis. This study proposed an adaptive and time-efficient R-peak detection algorithm for ECG processing. First, wavelet multiresolution analysis was applied to enhance the ECG signal representation. Then, the ECG was mirrored to convert large negative R-peaks to positive ones. After that, local maxima were calculated by the first-order forward difference approach and were screened by the amplitude and time interval thresholds to locate the R-peaks. The algorithm performances, including detection accuracy and time consumption, were tested on the MIT-BIH arrhythmia database and the QT database. Experimental results showed that the proposed algorithm achieved mean sensitivity of 99.39%, positive predictivity of 99.49%, and accuracy of 98.89% on the MIT-BIH arrhythmia database and 99.83%, 99.90%, and 99.73%, respectively, on the QT database. By processing one ECG record, the mean time consumptions were 0.872 s and 0.763 s for the MIT-BIH arrhythmia database and QT database, respectively, yielding 30.6% and 32.9% time reductions compared to the traditional Pan-Tompkins method.
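The mirroring, local-maxima, and thresholding stages can be sketched on a synthetic signal as below. The wavelet enhancement step is omitted, and all parameter values (amplitude fraction, refractory period) are illustrative assumptions, not the paper's.

```python
import numpy as np

def detect_r_peaks(ecg, fs, amp_frac=0.6, refractory_s=0.25):
    """Simplified threshold-based R-peak detector: mirror negative
    peaks, find local maxima via first-order differences, then apply
    amplitude and time-interval thresholds."""
    sig = np.abs(ecg - np.median(ecg))      # "mirror" negative R-peaks
    d = np.diff(sig)
    # Local maxima: difference changes from positive to non-positive.
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    amp_thr = amp_frac * sig[maxima].max()
    peaks, last = [], -np.inf
    for i in maxima:
        if sig[i] >= amp_thr and (i - last) / fs >= refractory_s:
            peaks.append(i)
            last = i
    return np.array(peaks)

# Synthetic ECG: one spike per second on a noisy baseline, fs = 250 Hz
fs = 250
t = np.arange(0, 10, 1 / fs)
rng = np.random.default_rng(0)
ecg = 0.05 * rng.standard_normal(t.size)
ecg[fs // 2::fs] += 1.0                     # 10 "R-peaks"
peaks = detect_r_peaks(ecg, fs)
print(len(peaks))  # 10 detected beats
```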
The ClinicalTrials.gov results database--update and key issues.
Zarin, Deborah A; Tse, Tony; Williams, Rebecca J; Califf, Robert M; Ide, Nicholas C
2011-03-03
The ClinicalTrials.gov trial registry was expanded in 2008 to include a database for reporting summary results. We summarize the structure and contents of the results database, provide an update of relevant policies, and show how the data can be used to gain insight into the state of clinical research. We analyzed ClinicalTrials.gov data that were publicly available between September 2009 and September 2010. As of September 27, 2010, ClinicalTrials.gov received approximately 330 new and 2000 revised registrations each week, along with 30 new and 80 revised results submissions. We characterized the 79,413 registry records and 2178 results records available as of September 2010. From a sample cohort of results records, 78 of 150 (52%) had associated publications within 2 years after posting. Of results records available publicly, 20% reported more than two primary outcome measures and 5% reported more than five. Of a sample of 100 registry record outcome measures, 61% lacked specificity in describing the metric used in the planned analysis. In a sample of 700 results records, the mean number of different analysis populations per study group was 2.5 (median, 1; range, 1 to 25). Of these trials, 24% reported results for 90% or less of their participants. ClinicalTrials.gov provides access to study results not otherwise available to the public. Although the database allows examination of various aspects of ongoing and completed clinical trials, its ultimate usefulness depends on the research community submitting accurate, informative data.
Enabling search over encrypted multimedia databases
NASA Astrophysics Data System (ADS)
Lu, Wenjun; Swaminathan, Ashwin; Varna, Avinash L.; Wu, Min
2009-02-01
Performing information retrieval tasks while preserving data confidentiality is a desirable capability when a database is stored on a server maintained by a third-party service provider. This paper addresses the problem of enabling content-based retrieval over encrypted multimedia databases. Search indexes, along with multimedia documents, are first encrypted by the content owner and then stored onto the server. Through jointly applying cryptographic techniques, such as order preserving encryption and randomized hash functions, with image processing and information retrieval techniques, secure indexing schemes are designed to provide both privacy protection and rank-ordered search capability. Retrieval results on an encrypted color image database and security analysis of the secure indexing schemes under different attack models show that data confidentiality can be preserved while retaining very good retrieval performance. This work has promising applications in secure multimedia management.
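A toy illustration of one ingredient, a keyed randomized-hash index: the server stores only HMACs of quantized feature vectors, so exact bucket matches can be found without the server learning the underlying feature values. The order-preserving encryption used for rank-ordered search is not shown, and the key, bucket size, and feature vectors are all hypothetical.

```python
import hmac
import hashlib

SECRET_KEY = b"content-owner-key"   # held by the owner, never the server

def secure_index(features, bucket_size=0.25):
    """Quantize features into buckets and return a keyed hash of the
    bucket tuple; equal buckets give equal hashes, nothing else leaks."""
    buckets = tuple(int(f / bucket_size) for f in features)
    return hmac.new(SECRET_KEY, repr(buckets).encode(),
                    hashlib.sha256).hexdigest()

# The owner indexes the database, then later issues a hashed query.
database = {
    "img1": secure_index([0.10, 0.80, 0.55]),
    "img2": secure_index([0.90, 0.20, 0.35]),
}
query = secure_index([0.12, 0.78, 0.51])   # falls in the same buckets as img1
matches = [name for name, idx in database.items() if idx == query]
print(matches)  # ['img1']
```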
Adaptive Neuro-Fuzzy Modeling of UH-60A Pilot Vibration
NASA Technical Reports Server (NTRS)
Kottapalli, Sesi; Malki, Heidar A.; Langari, Reza
2003-01-01
Adaptive neuro-fuzzy relationships have been developed to model the UH-60A Black Hawk pilot floor vertical vibration. A 200-point database that approximates the entire UH-60A helicopter flight envelope is used for training and testing purposes. The NASA/Army Airloads Program flight test database was the source of the 200-point database. The present study is conducted in two parts. The first part involves level flight conditions and the second part involves the entire (200-point) database including maneuver conditions. The results show that a neuro-fuzzy model can successfully predict the pilot vibration. Also, it is found that the training phase of this neuro-fuzzy model takes only two or three iterations to converge for most cases. Thus, the proposed approach produces a potentially viable model for real-time implementation.
Palm-Vein Classification Based on Principal Orientation Features
Zhou, Yujia; Liu, Yaqin; Feng, Qianjin; Yang, Feng; Huang, Jing; Nie, Yixiao
2014-01-01
Personal recognition using palm-vein patterns has emerged as a promising alternative for human recognition because of its uniqueness, stability, live body identification, flexibility, and resistance to counterfeiting. With the expanding application of palm-vein pattern recognition, the corresponding growth of the database has resulted in a long response time. To shorten the response time of identification, this paper proposes a simple and useful classification for palm-vein identification based on principal direction features. In the registration process, the Gaussian-Radon transform is adopted to extract the orientation matrix and then compute the principal direction of a palm-vein image based on the orientation matrix. The database can be classified into six bins based on the value of the principal direction. In the identification process, the principal direction of the test sample is first extracted to ascertain the corresponding bin. One-by-one matching with the training samples is then performed in the bin. To improve recognition efficiency while maintaining better recognition accuracy, two neighborhood bins of the corresponding bin are continuously searched to identify the input palm-vein image. Evaluation experiments are conducted on three different databases, namely, PolyU, CASIA, and the database of this study. Experimental results show that the searching range of one test sample in PolyU, CASIA and our database by the proposed method for palm-vein identification can be reduced to 14.29%, 14.50%, and 14.28%, with retrieval accuracy of 96.67%, 96.00%, and 97.71%, respectively. With 10,000 training samples in the database, the execution time of the identification process by the traditional method is 18.56 s, while that by the proposed approach is 3.16 s. The experimental results confirm that the proposed approach is more efficient than the traditional method, especially for a large database. PMID:25383715
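The bin-plus-two-neighbours search can be sketched as follows. The six-bin quantization over 180° of principal direction and the sample angles are illustrative assumptions; the real system derives the angle from the Gaussian-Radon orientation matrix.

```python
N_BINS = 6  # principal directions quantized into six bins over 180 degrees

def direction_bin(angle_deg):
    """Map a principal direction in [0, 180) to one of six bins."""
    return int(angle_deg // (180 / N_BINS)) % N_BINS

def candidate_bins(bin_idx):
    """The corresponding bin plus its two neighbours (with wrap-around)."""
    return {(bin_idx - 1) % N_BINS, bin_idx, (bin_idx + 1) % N_BINS}

# Hypothetical registered database: image id -> principal direction
registered = {"p1": 12.0, "p2": 44.0, "p3": 95.0, "p4": 170.0}
bins = {}
for name, angle in registered.items():
    bins.setdefault(direction_bin(angle), []).append(name)

# Identification: only the test sample's bin and its neighbours are
# searched, instead of matching against the whole database.
test_angle = 40.0
search = candidate_bins(direction_bin(test_angle))
candidates = [n for b in sorted(search) for n in bins.get(b, [])]
print(candidates)  # ['p1', 'p2'] -- p3 and p4 are never compared
```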
77 FR 28244 - Amendment of Class D and E Airspace; Baltimore, MD
Federal Register 2010, 2011, 2012, 2013, 2014
2012-05-14
... Baltimore VORTAC are being adjusted to coincide with the FAA's aeronautical database, which show the correct... Baltimore VORTAC, Baltimore, MD, to be in concert with the FAA's aeronautical database, which shows the... with the FAA's Aeronautical Products database. The FAA has determined that this regulation only...
77 FR 16668 - Amendment of Class D and E Airspace; Brooksville, FL
Federal Register 2010, 2011, 2012, 2013, 2014
2012-03-22
... the airport are being adjusted to coincide with the FAA's aeronautical database, which shows the... Hernando County Airport, Brooksville, FL, to be in concert with the FAA's aeronautical database, which shows... the FAA's Aeronautical Products database. The FAA has determined that this regulation only involves an...
CRETACEOUS CLIMATE SENSITIVITY STUDY USING DINOSAUR & PLANT PALEOBIOGEOGRAPHY
NASA Astrophysics Data System (ADS)
Goswami, A.; Main, D. J.; Noto, C. R.; Moore, T. L.; Scotese, C.
2009-12-01
The Early Cretaceous was characterized by cool poles and moderate global temperatures (~16 °C). During the mid and late Cretaceous, long-term global warming (~20-22 °C) was driven by increasing levels of CO2, rising sea level (lowering albedo) and the continuing breakup of Pangea. Paleoclimatic reconstructions for four time intervals during the Cretaceous are presented here: Middle Campanian (80 Ma), Cenomanian/Turonian (90 Ma), Early Albian (110 Ma) and Barremian-Hauterivian (130 Ma). These paleoclimate simulations were prepared using the Fast Ocean and Atmosphere Model (FOAM). The simulated results show the pattern of the pole-to-Equator temperature gradients, rainfall, surface run-off, and the locations of major rivers and deltas. In order to investigate the effect of potential dispersal routes on paleobiogeographic patterns, a time-slice series of maps from the Early to Late Cretaceous was produced showing plots of dinosaur and plant fossil distributions. These maps were created utilizing: 1) plant fossil localities from the GEON and Paleobiology (PBDB) databases; and 2) dinosaur fossil localities from an updated version of the Dinosauria (Weishampel, 2004) database. These results are compared to two different types of datasets: 1) a paleotemperature database for the Cretaceous, and 2) locality data obtained from the GEON, PBDB and Dinosauria databases. Global latitudinal mean temperatures from both the model and the paleotemperature database were plotted on a series of latitudinal graphs along with the distributions of fossil plants and dinosaurs. It was found that most dinosaur localities through the Cretaceous tend to cluster within specific climate belts, or envelopes. Also, these Cretaceous maps show variance in biogeographic zonation of both plants and dinosaurs that is commensurate with reconstructed climate patterns and geography.
These data are particularly useful for understanding the response of late Mesozoic ecosystems to geographic and climatic conditions that differed markedly from the present. Studies of past biotas and their changes may elucidate the role of climatic and geographic factors in driving changes in species distributions, ecosystem organization, and evolutionary dynamics over time.
Chen, Qingyu; Zobel, Justin; Verspoor, Karin
2017-01-01
GenBank, the EMBL European Nucleotide Archive and the DNA DataBank of Japan, known collectively as the International Nucleotide Sequence Database Collaboration or INSDC, are the three most significant nucleotide sequence databases. Their records are derived from laboratory work undertaken by different individuals, by different teams, with a range of technologies and assumptions and over a period of decades. As a consequence, they contain a great many duplicates, redundancies and inconsistencies, but neither the prevalence nor the characteristics of various types of duplicates have been rigorously assessed. Existing duplicate detection methods in bioinformatics only address specific duplicate types, with inconsistent assumptions; and the impact of duplicates in bioinformatics databases has not been carefully assessed, making it difficult to judge the value of such methods. Our goal is to assess the scale, kinds and impact of duplicates in bioinformatics databases, through a retrospective analysis of merged groups in INSDC databases. Our outcomes are threefold: (1) We analyse a benchmark dataset consisting of duplicates manually identified in INSDC (a dataset of 67 888 merged groups with 111 823 duplicate pairs across 21 organisms from INSDC databases) in terms of the prevalence, types and impacts of duplicates. (2) We categorize duplicates at both sequence and annotation level, with supporting quantitative statistics, showing that different organisms have different prevalence of distinct kinds of duplicate. (3) We show that the presence of duplicates has practical impact via a simple case study on duplicates, in terms of GC content and melting temperature. We demonstrate that duplicates not only introduce redundancy, but can lead to inconsistent results for certain tasks.
Our findings lead to a better understanding of the problem of duplication in biological databases.Database URL: the merged records are available at https://cloudstor.aarnet.edu.au/plus/index.php/s/Xef2fvsebBEAv9w. © The Author(s) 2017. Published by Oxford University Press.
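The GC content / melting temperature case study can be illustrated with a sketch like the following: two near-identical "duplicate" records yield different derived values. The sequences are hypothetical, and the Wallace rule is one common short-oligo Tm approximation; the paper's exact formula is not specified here.

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def wallace_tm(seq):
    """Wallace rule for short oligos: Tm = 2(A+T) + 4(G+C), in Celsius."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

# Two hypothetical "duplicate" records whose sequences differ only
# slightly: near-identical entries can still yield different derived
# values, illustrating the inconsistency problem.
rec_a = "ATGCGCGT"
rec_b = "ATGCGCGTAT"   # same fragment with two extra bases
print(gc_content(rec_a), gc_content(rec_b))   # 0.625 vs 0.5
print(wallace_tm(rec_a), wallace_tm(rec_b))   # 26 vs 30
```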
Chen, Qingyu; Zobel, Justin; Verspoor, Karin
2017-01-01
GenBank, the EMBL European Nucleotide Archive and the DNA DataBank of Japan, known collectively as the International Nucleotide Sequence Database Collaboration or INSDC, are the three most significant nucleotide sequence databases. Their records are derived from laboratory work undertaken by different individuals, by different teams, with a range of technologies and assumptions and over a period of decades. As a consequence, they contain a great many duplicates, redundancies and inconsistencies, but neither the prevalence nor the characteristics of various types of duplicates have been rigorously assessed. Existing duplicate detection methods in bioinformatics only address specific duplicate types, with inconsistent assumptions; and the impact of duplicates in bioinformatics databases has not been carefully assessed, making it difficult to judge the value of such methods. Our goal is to assess the scale, kinds and impact of duplicates in bioinformatics databases, through a retrospective analysis of merged groups in INSDC databases. Our outcomes are threefold: (1) We analyse a benchmark dataset consisting of duplicates manually identified in INSDC—a dataset of 67 888 merged groups with 111 823 duplicate pairs across 21 organisms from INSDC databases – in terms of the prevalence, types and impacts of duplicates. (2) We categorize duplicates at both sequence and annotation level, with supporting quantitative statistics, showing that different organisms have different prevalence of distinct kinds of duplicate. (3) We show that the presence of duplicates has practical impact via a simple case study on duplicates, in terms of GC content and melting temperature. We demonstrate that duplicates not only introduce redundancy, but can lead to inconsistent results for certain tasks. Our findings lead to a better understanding of the problem of duplication in biological databases. 
Database URL: the merged records are available at https://cloudstor.aarnet.edu.au/plus/index.php/s/Xef2fvsebBEAv9w PMID:28077566
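The case study mentioned above, on GC content and melting temperature, can be illustrated with a minimal sketch. The two sequences below are invented stand-ins for a pair of near-duplicate records of the same locus, and the Wallace rule used for melting temperature is a deliberately simple approximation:

```python
def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def wallace_tm(seq):
    """Wallace-rule melting temperature for short oligos:
    Tm = 2*(A+T) + 4*(G+C), in degrees Celsius."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

# Hypothetical near-duplicate records: same submission, one with
# extra trailing bases. The derived properties disagree, which is
# the kind of inconsistency the abstract describes.
rec_a = "ATGCGCATTA"
rec_b = "ATGCGCATTAGGC"

print(gc_content(rec_a), gc_content(rec_b))  # differ
print(wallace_tm(rec_a), wallace_tm(rec_b))  # differ
```

Even this toy pair shows that treating duplicates as interchangeable can change downstream results.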
Applying manifold learning techniques to the CAESAR database
NASA Astrophysics Data System (ADS)
Mendoza-Schrock, Olga; Patrick, James; Arnold, Gregory; Ferrara, Matthew
2010-04-01
Understanding and organizing data is the first step toward exploiting sensor phenomenology for dismount tracking. What image features are good for distinguishing people, and what measurements, or combinations of measurements, can be used to classify the dataset by demographics including gender, age, and race? A particular technique, Diffusion Maps, has demonstrated the potential to extract features that intuitively make sense [1]. We want to develop an understanding of this tool by validating existing results on the Civilian American and European Surface Anthropometry Resource (CAESAR) database. This database, provided by the Air Force Research Laboratory (AFRL) Human Effectiveness Directorate and SAE International, is a rich dataset that includes 40 traditional anthropometric measurements of 4400 human subjects. If we can identify the defining features for classification from this database, the future question will be to determine a subset of these features that can be measured from imagery. This paper briefly describes the Diffusion Map technique, shows its potential for dimension reduction of the CAESAR database, and describes interesting problems to be further explored.
Lhermitte, L; Mejstrikova, E; van der Sluijs-Gelling, A J; Grigore, G E; Sedek, L; Bras, A E; Gaipa, G; Sobral da Costa, E; Novakova, M; Sonneveld, E; Buracchi, C; de Sá Bacelar, T; te Marvelde, J G; Trinquand, A; Asnafi, V; Szczepanski, T; Matarraz, S; Lopez, A; Vidriales, B; Bulsa, J; Hrusak, O; Kalina, T; Lecrevisse, Q; Martin Ayuso, M; Brüggemann, M; Verde, J; Fernandez, P; Burgos, L; Paiva, B; Pedreira, C E; van Dongen, J J M; Orfao, A; van der Velden, V H J
2018-01-01
Precise classification of acute leukemia (AL) is crucial for adequate treatment. EuroFlow has previously designed an AL orientation tube (ALOT) to guide towards the relevant classification panel (T-cell acute lymphoblastic leukemia (T-ALL), B-cell precursor (BCP)-ALL and/or acute myeloid leukemia (AML)) and final diagnosis. We have now built a reference database with 656 typical AL samples (145 T-ALL, 377 BCP-ALL, 134 AML), processed and analyzed via standardized protocols. Using principal component analysis (PCA)-based plots and automated classification algorithms for direct comparison of single cells from individual patients against the database, another 783 cases were subsequently evaluated. Depending on the database-guided results, patients were categorized as: (i) typical T, B or Myeloid without, or (ii) with, a transitional component to another lineage; (iii) atypical; or (iv) mixed-lineage. Using this automated algorithm, in 781/783 cases (99.7%) the right panel was selected, and data comparable to the final WHO diagnosis were already provided in >93% of cases (85% T-ALL, 97% BCP-ALL, 95% AML and 87% mixed-phenotype AL patients), even without data on the full-characterization panels. Our results show that database-guided analysis facilitates standardized interpretation of ALOT results and allows accurate selection of the relevant classification panels, hence providing a solid basis for designing future WHO AL classifications. PMID:29089646
Results from a new 193nm die-to-database reticle inspection platform
NASA Astrophysics Data System (ADS)
Broadbent, William H.; Alles, David S.; Giusti, Michael T.; Kvamme, Damon F.; Shi, Rui-fang; Sousa, Weston L.; Walsh, Robert; Xiong, Yalin
2010-05-01
A new 193nm wavelength high resolution reticle defect inspection platform has been developed for both die-to-database and die-to-die inspection modes. In its initial configuration, this innovative platform has been designed to meet the reticle qualification requirements of the IC industry for the 22nm logic and 3xhp memory generations (and shrinks) with planned extensions to the next generation. The 22nm/3xhp IC generation includes advanced 193nm optical lithography using conventional RET, advanced computational lithography, and double patterning. Further, EUV pilot line lithography is beginning. This advanced 193nm inspection platform has world-class performance and the capability to meet these diverse needs in optical and EUV lithography. The architecture of the new 193nm inspection platform is described. Die-to-database inspection results are shown on a variety of reticles from industry sources; these reticles include standard programmed defect test reticles, as well as advanced optical and EUV product and product-like reticles. Results show high sensitivity and low false and nuisance detections on complex optical reticle designs and small feature size EUV reticles. A direct comparison with the existing industry standard 257nm wavelength inspection system shows measurable sensitivity improvement for small feature sizes.
Results from a new die-to-database reticle inspection platform
NASA Astrophysics Data System (ADS)
Broadbent, William; Xiong, Yalin; Giusti, Michael; Walsh, Robert; Dayal, Aditya
2007-03-01
A new die-to-database high-resolution reticle defect inspection system has been developed for the 45nm logic node and is extendable to the 32nm node (also the comparable memory nodes). These nodes will predominantly use 193nm immersion lithography, although EUV may also be used. According to recent surveys, the predominant reticle types for the 45nm node are 6% simple tri-tone and COG. Other advanced reticle types may also be used for these nodes, including dark field alternating, Mask Enhancer, complex tri-tone, high transmission, CPL, and EUV. Finally, aggressive model-based OPC will typically be used, which will include many small structures such as jogs, serifs, and SRAF (sub-resolution assist features) with accompanying very small gaps between adjacent structures. The current generation of inspection systems is inadequate to meet these requirements. The architecture and performance of a new die-to-database inspection system is described. This new system is designed to inspect the aforementioned reticle types in die-to-database and die-to-die modes. Recent results from internal testing of the prototype systems are shown. The results include standard programmed defect test reticles and advanced 45nm and 32nm node reticles from industry sources. The results show that high sensitivity and low false-detection rates are achieved.
Graphics-based intelligent search and abstracting using Data Modeling
NASA Astrophysics Data System (ADS)
Jaenisch, Holger M.; Handley, James W.; Case, Carl T.; Songy, Claude G.
2002-11-01
This paper presents an autonomous text- and context-mining algorithm that converts text documents into point clouds for visual search cues. This algorithm is applied to the task of data-mining a scriptural database comprising the Old and New Testaments from the Bible and the Book of Mormon, Doctrine and Covenants, and the Pearl of Great Price. Results graphically show the scripture that represents the average concept of the database and the mining of the documents down to the verse level.
NASA Astrophysics Data System (ADS)
Maurer, Joshua; Rupper, Summer
2015-10-01
Declassified historical imagery from the Hexagon spy satellite database has near-global coverage, yet remains a largely untapped resource for geomorphic change studies. Unavailable satellite ephemeris data make DEM (digital elevation model) extraction difficult in terms of time and accuracy. A new fully-automated pipeline for DEM extraction and image orthorectification is presented which yields accurate results and greatly increases efficiency over traditional photogrammetric methods, making the Hexagon image database much more appealing and accessible. A 1980 Hexagon DEM is extracted and geomorphic change computed for the Thistle Creek Landslide region in the Wasatch Range of North America to demonstrate an application of the new method. Surface elevation changes resulting from the landslide show an average elevation decrease of 14.4 ± 4.3 m in the source area, an increase of 17.6 ± 4.7 m in the deposition area, and a decrease of 30.2 ± 5.1 m resulting from a new roadcut. Two additional applications of the method include volume estimates of material excavated during the Mount St. Helens volcanic eruption and the volume of net ice loss over a 34-year period for glaciers in the Bhutanese Himalayas. These results show the value of Hexagon imagery in detecting and quantifying historical geomorphic change, especially in regions where other data sources are limited.
Long Duration Exposure Facility (LDEF) optical systems SIG summary and database
NASA Astrophysics Data System (ADS)
Bohnhoff-Hlavacek, Gail
1992-09-01
The main objectives of the Long Duration Exposure Facility (LDEF) Optical Systems Special Investigative Group (SIG) Discipline are to develop a database of experimental findings on LDEF optical systems and elements hardware, and to provide an optical system overview. Unlike the electrical and mechanical disciplines, the optics effort relies primarily on the testing of hardware at the various principal investigators' laboratories, since minimal testing of optical hardware was done at Boeing. This is because all space-exposed optics hardware are part of other individual experiments. At this time, testing of all optical systems and elements by experiment investigator teams is not complete, and in some cases has hardly begun. Most experiment results to date document observations and measurements that 'show what happened'. Still to come from many principal investigators is a critical analysis to explain 'why it happened' and the future design implications. The original optical system related concerns and the lessons learned at a preliminary stage of the Optical Systems Investigations are summarized. The design of the Optical Experiments Database and how to acquire and use the database to review the LDEF results are described.
Long Duration Exposure Facility (LDEF) optical systems SIG summary and database
NASA Technical Reports Server (NTRS)
Bohnhoff-Hlavacek, Gail
1992-01-01
The main objectives of the Long Duration Exposure Facility (LDEF) Optical Systems Special Investigative Group (SIG) Discipline are to develop a database of experimental findings on LDEF optical systems and elements hardware, and to provide an optical system overview. Unlike the electrical and mechanical disciplines, the optics effort relies primarily on the testing of hardware at the various principal investigators' laboratories, since minimal testing of optical hardware was done at Boeing. This is because all space-exposed optics hardware are part of other individual experiments. At this time, testing of all optical systems and elements by experiment investigator teams is not complete, and in some cases has hardly begun. Most experiment results to date document observations and measurements that 'show what happened'. Still to come from many principal investigators is a critical analysis to explain 'why it happened' and the future design implications. The original optical system related concerns and the lessons learned at a preliminary stage of the Optical Systems Investigations are summarized. The design of the Optical Experiments Database and how to acquire and use the database to review the LDEF results are described.
Developing an A Priori Database for Passive Microwave Snow Water Retrievals Over Ocean
NASA Astrophysics Data System (ADS)
Yin, Mengtao; Liu, Guosheng
2017-12-01
A physically optimized a priori database is developed for Global Precipitation Measurement Microwave Imager (GMI) snow water retrievals over ocean. The initial snow water content profiles are derived from CloudSat Cloud Profiling Radar (CPR) measurements. A radiative transfer model in which the single-scattering properties of nonspherical snowflakes are based on discrete dipole approximation results is employed to simulate brightness temperatures and their gradients. Snow water content profiles are then optimized through a one-dimensional variational (1D-Var) method. After the 1D-Var optimization, the standard deviations of the difference between observed and simulated brightness temperatures are of a magnitude similar to the observation errors defined in the observation error covariance matrix, indicating that the variational method is successful. This optimized database is applied in a Bayesian snow water retrieval algorithm. The retrieval results indicate that the 1D-Var approach has a positive impact on the GMI-retrieved snow water content profiles by improving the physical consistency between snow water content profiles and observed brightness temperatures. The global distribution of snow water contents retrieved from the a priori database is compared with CloudSat CPR estimates. The results show that the two estimates have similar global distribution patterns, and the difference between their global means is small. In addition, we investigate the impact of using physical parameters to subset the database on snow water retrievals. It is shown that using total precipitable water to subset the database with 1D-Var optimization is beneficial for snow water retrievals.
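As a rough illustration of the 1D-Var step, the sketch below gives the closed-form minimizer for a scalar toy problem with a linear observation operator. The real retrieval optimizes full snow water content profiles against GMI brightness temperatures through a radiative transfer model; all values here are invented:

```python
def one_dvar_scalar(xb, sigma_b, y, sigma_o, H=1.0):
    """Scalar 1D-Var analysis: minimize the cost function
    J(x) = (x - xb)^2 / sigma_b^2 + (y - H*x)^2 / sigma_o^2,
    where xb is the background (a priori) state, y the observation,
    and H a linear observation operator. The minimizer has the
    closed form below, equivalent to a Kalman update with gain K."""
    K = (sigma_b**2 * H) / (sigma_o**2 + H**2 * sigma_b**2)
    return xb + K * (y - H * xb)

# Equal background and observation errors: analysis lands midway.
print(one_dvar_scalar(xb=0.0, sigma_b=1.0, y=2.0, sigma_o=1.0))  # 1.0
```

With a very large observation error the analysis stays close to the background, which is the behavior the error covariance matrices are meant to control.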
Vivar, Juan C; Pemu, Priscilla; McPherson, Ruth; Ghosh, Sujoy
2013-08-01
Unparalleled technological advances have fueled an explosive growth in the scope and scale of biological data and have propelled the life sciences into the realm of "Big Data" that cannot be managed or analyzed by conventional approaches. Big Data in the life sciences are driven primarily via a diverse collection of 'omics'-based technologies, including genomics, proteomics, metabolomics, transcriptomics, metagenomics, and lipidomics. Gene-set enrichment analysis is a powerful approach for interrogating large 'omics' datasets, leading to the identification of biological mechanisms associated with observed outcomes. While several factors influence the results from such analyses, the impact of the contents of pathway databases is often under-appreciated. Pathway databases often contain variously named pathways that overlap with one another to varying degrees. Ignoring such redundancies during pathway analysis can lead to several pathways being designated as significant due to high content similarity, rather than truly independent biological mechanisms. Statistically, such dependencies also result in correlated p values and overdispersion, leading to biased results. We investigated the level of redundancy in multiple pathway databases and observed large discrepancies in the nature and extent of pathway overlap. This prompted us to develop the application ReCiPa (Redundancy Control in Pathway Databases) to control redundancies in pathway databases based on user-defined thresholds. Analysis of genomic and genetic datasets, using ReCiPa-generated overlap-controlled versions of KEGG and Reactome pathways, led to a reduction in redundancy among the top-scoring gene-sets and allowed for the inclusion of additional gene-sets representing possibly novel biological mechanisms.
Using obesity as an example, bioinformatic analysis further demonstrated that gene-sets identified from overlap-controlled pathway databases show stronger evidence of prior association to obesity compared to pathways identified from the original databases.
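ReCiPa's exact merging rules are not spelled out in the abstract; the sketch below shows the general idea of threshold-based redundancy control using an overlap coefficient between gene-sets. Pathway names, gene identifiers, and the threshold are all illustrative:

```python
def overlap(a, b):
    """Overlap coefficient: shared genes over the smaller set."""
    a, b = set(a), set(b)
    return len(a & b) / min(len(a), len(b))

def control_redundancy(pathways, threshold=0.7):
    """Greedily fold each pathway into the first already-kept
    pathway whose gene overlap meets the user-defined threshold
    (a ReCiPa-style scheme; the actual tool's rules may differ)."""
    merged = {}
    for name, genes in pathways.items():
        for kept in list(merged):
            if overlap(genes, merged[kept]) >= threshold:
                merged[kept] = merged[kept] | set(genes)
                break
        else:
            merged[name] = set(genes)
    return merged

pw = {"A": {"g1", "g2", "g3"},
      "B": {"g1", "g2", "g3", "g4"},   # near-duplicate of A
      "C": {"g9"}}
print(control_redundancy(pw))  # B is absorbed into A; C kept
```

Collapsing near-duplicate gene-sets this way is what prevents one biological signal from surfacing as several "independent" significant pathways.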
Cheng, Ching-Wu; Leu, Sou-Sen; Cheng, Ying-Mei; Wu, Tsung-Chih; Lin, Chen-Chung
2012-09-01
Construction accident research involves the systematic sorting, classification, and encoding of comprehensive databases of injuries and fatalities. The present study explores the causes and distribution of occupational accidents in the Taiwan construction industry by analyzing such a database using the data mining method known as classification and regression tree (CART). Utilizing a database of 1542 accident cases from the period 2000-2009, the study seeks to establish potential cause-and-effect relationships regarding serious occupational accidents in the industry. The results show that occurrence rules for falls and collapses, in both public and private construction projects, are key factors for predicting occupational injuries. The results provide a framework for improving the safety practices and training programs that are essential to protecting construction workers from occasional or unexpected accidents. Copyright © 2011 Elsevier Ltd. All rights reserved.
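CART's core mechanism is choosing, at each node, the split that most reduces impurity. A hedged sketch of that splitting criterion is below, over a toy accident table; the feature values and labels are invented, and real CART implementations add recursion, pruning, and regression support:

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(rows, labels):
    """Pick the (feature, value) equality split minimizing the
    weighted Gini impurity of the two child nodes, as CART does."""
    best = (None, None, float("inf"))
    n = len(rows)
    for f in range(len(rows[0])):
        for v in {r[f] for r in rows}:
            left = [l for r, l in zip(rows, labels) if r[f] == v]
            right = [l for r, l in zip(rows, labels) if r[f] != v]
            if not left or not right:
                continue
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if score < best[2]:
                best = (f, v, score)
    return best

# Toy accident records: (accident type, project sector) -> severity
rows = [("fall", "public"), ("fall", "private"), ("collapse", "public")]
labels = ["injury", "injury", "fatality"]
print(best_split(rows, labels))  # splits on accident type (feature 0)
```

Here accident type separates the outcomes perfectly (weighted impurity 0), so it is chosen over project sector, mirroring how falls and collapses surface as key predictive factors.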
Data-based fault-tolerant control for affine nonlinear systems with actuator faults.
Xie, Chun-Hua; Yang, Guang-Hong
2016-09-01
This paper investigates the fault-tolerant control (FTC) problem for unknown nonlinear systems with actuator faults including stuck, outage, bias and loss-of-effectiveness faults. The upper bounds of the stuck, bias and loss-of-effectiveness faults are unknown. A new data-based FTC scheme is proposed. It consists of online estimations of the bounds and a state-dependent function. The estimations are adjusted online to automatically compensate for the actuator faults. The state-dependent function, solved using real system data, helps stabilize the system. Furthermore, all signals in the resulting closed-loop system are uniformly bounded and the states converge asymptotically to zero. Compared with existing results, the proposed approach is data-based. Finally, two simulation examples are provided to show the effectiveness of the proposed approach. Copyright © 2016 ISA. Published by Elsevier Ltd. All rights reserved.
Gao, Xiang; Lin, Huaiying; Revanna, Kashi; Dong, Qunfeng
2017-05-10
Species-level classification for 16S rRNA gene sequences remains a serious challenge for microbiome researchers, because existing taxonomic classification tools for 16S rRNA gene sequences either do not provide species-level classification, or their classification results are unreliable. The unreliable results stem from limitations of the existing methods, which either lack solid probabilistic criteria to evaluate the confidence of their taxonomic assignments, or use nucleotide k-mer frequency as a proxy for sequence similarity measurement. We have developed a method that shows significantly improved species-level classification results over existing methods. Our method calculates true sequence similarity between query sequences and database hits using pairwise sequence alignment. Taxonomic classifications are assigned from the species to the phylum levels based on the lowest common ancestors of multiple database hits for each query sequence, and classification reliabilities are further evaluated by bootstrap confidence scores. The novelty of our method is that the contribution of each database hit to the taxonomic assignment of the query sequence is weighted by a Bayesian posterior probability based upon the degree of sequence similarity of the database hit to the query sequence. Our method does not need any training datasets specific to different taxonomic groups. Instead, only a reference database is required for aligning to the query sequences, making our method easily applicable to different regions of the 16S rRNA gene or other phylogenetic marker genes. Reliable species-level classification for 16S rRNA or other phylogenetic marker genes is critical for microbiome research. Our software shows significantly higher classification accuracy than the existing tools, and we provide probabilistic confidence scores to evaluate the reliability of our taxonomic classification assignments based on multiple database matches to query sequences.
Despite its higher computational costs, our method is still suitable for analyzing large-scale microbiome datasets for practical purposes. Furthermore, our method can be applied for taxonomic classification of any phylogenetic marker gene sequences. Our software, called BLCA, is freely available at https://github.com/qunfengdong/BLCA .
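The weighting idea can be sketched as follows. The weights here are plain similarity-derived scores standing in for BLCA's Bayesian posterior probabilities, the three-level taxonomy is a simplification of the full species-to-phylum lineage, and the taxa are illustrative:

```python
def weighted_taxonomy(hits):
    """Aggregate taxonomic votes from database hits, each weighted
    by a similarity-derived score (standing in for BLCA's Bayesian
    posterior; the real tool also aligns sequences and bootstraps).
    Returns, per level, the winning taxon and its share of the
    total weight as a confidence in [0, 1]."""
    levels = ["phylum", "genus", "species"]
    assignment = {}
    for level in levels:
        votes = {}
        for taxon_path, weight in hits:
            name = taxon_path[level]
            votes[name] = votes.get(name, 0.0) + weight
        total = sum(votes.values())
        name, score = max(votes.items(), key=lambda kv: kv[1])
        assignment[level] = (name, score / total)
    return assignment

hits = [
    ({"phylum": "Firmicutes", "genus": "Bacillus",
      "species": "B. subtilis"}, 0.9),
    ({"phylum": "Firmicutes", "genus": "Bacillus",
      "species": "B. licheniformis"}, 0.1),
]
print(weighted_taxonomy(hits))
```

Note how disagreement among hits lowers confidence only at the level where they diverge: the phylum call is unanimous while the species call carries the 0.9 weight of the best hit.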
Voss, Erica A; Makadia, Rupa; Matcho, Amy; Ma, Qianli; Knoll, Chris; Schuemie, Martijn; DeFalco, Frank J; Londhe, Ajit; Zhu, Vivienne; Ryan, Patrick B
2015-05-01
To evaluate the utility of applying the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) across multiple observational databases within an organization and to apply standardized analytics tools for conducting observational research. Six deidentified patient-level datasets were transformed to the OMOP CDM. We evaluated the extent of information loss that occurred through the standardization process. We developed a standardized analytic tool to replicate the cohort construction process from a published epidemiology protocol and applied the analysis to all 6 databases to assess time-to-execution and comparability of results. Transformation to the CDM resulted in minimal information loss across all 6 databases. The patients and observations excluded were due to identified data quality issues in the source systems; 96% to 99% of condition records and 90% to 99% of drug records were successfully mapped into the CDM using the standard vocabulary. The full cohort replication and descriptive baseline summary was executed for 2 cohorts in 6 databases in less than 1 hour. The standardization process improved data quality, increased efficiency, and facilitated cross-database comparisons to support a more systematic approach to observational research. Comparisons across data sources showed consistency in the impact of inclusion criteria using the protocol, and identified differences in patient characteristics and coding practices across databases. Standardizing data structure (through a CDM), content (through a standard vocabulary with source code mappings), and analytics can enable an institution to apply a network-based approach to observational research across multiple, disparate observational health databases. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association.
Bouadjenek, Mohamed Reda; Verspoor, Karin; Zobel, Justin
2017-07-01
We investigate and analyse the data quality of nucleotide sequence databases with the objective of automatic detection of data anomalies and suspicious records. Specifically, we demonstrate that the published literature associated with each data record can be used to automatically evaluate its quality, by cross-checking the consistency of the key content of the database record with the referenced publications. Focusing on GenBank, we describe a set of quality indicators based on the relevance paradigm of information retrieval (IR). We then use these quality indicators to train an anomaly detection algorithm to classify records as "confident" or "suspicious". Our experiments on the PubMed Central collection show that assessing the coherence between the literature and database records, through our algorithms, is an effective mechanism for assisting curators to perform data cleansing. Although fewer than 0.25% of the records in our data set are known to be faulty, we would expect that there are many more in GenBank that have not yet been identified. By automated comparison with the literature, they can be identified with a precision of up to 10% and a recall of up to 30%, while strongly outperforming several baselines. While these results leave substantial room for improvement, they reflect both the very imbalanced nature of the data and the limited explicitly labelled data that is available. Overall, the obtained results show promise for the development of a new kind of approach to detecting low-quality and suspicious sequence records based on literature analysis and consistency. From a practical point of view, this will greatly help curators identify inconsistent records in large-scale sequence databases by highlighting records that are likely to be inconsistent with the literature. Copyright © 2017 Elsevier Inc. All rights reserved.
Udo, Renate; Tcherny-Lessenot, Stéphanie; Brauer, Ruth; Dolin, Paul; Irvine, David; Wang, Yunxun; Auclert, Laurent; Juhaeri, Juhaeri; Kurz, Xavier; Abenhaim, Lucien; Grimaldi, Lamiae; De Bruin, Marie L
2016-03-01
To examine the robustness of findings of case-control studies on the association between acute liver injury (ALI) and antibiotic use in the following different situations: (i) replication of a protocol in different databases, with different data types, as well as replication in the same database but performed by a different research team; (ii) varying algorithms to identify cases, with and without manual case validation; and (iii) different exposure windows for time at risk. Five case-control studies in four different databases were performed with a common study protocol as a starting point to harmonize study outcome definitions, exposure definitions and statistical analyses. All five studies showed an increased risk of ALI associated with antibiotic use, ranging from OR 2.6 (95% CI 1.3-5.4) to 7.7 (95% CI 2.0-29.3). Comparable trends could be observed in the five studies: (i) without manual validation, the use of the narrowest definition for ALI showed higher risk estimates; (ii) narrow and broad algorithm definitions followed by manual validation of cases resulted in similar risk estimates; and (iii) the use of a larger window (30 days vs 14 days) to define time at risk led to a decrease in risk estimates. Reproduction of a study using a predefined protocol in different database settings is feasible, although assumptions had to be made and amendments to the protocol were inevitable. Despite differences, the strength of association was comparable between the studies. In addition, the impact of varying outcome definitions and time windows showed similar trends within the data sources. Copyright © 2015 John Wiley & Sons, Ltd.
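For reference, odds ratios and confidence intervals like those quoted above come from standard 2x2 case-control arithmetic; a minimal sketch follows, with invented counts (this is textbook Wald-interval math, not the studies' actual analysis):

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI from a 2x2 case-control table:
       a = exposed cases,    b = unexposed cases,
       c = exposed controls, d = unexposed controls."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

# Invented counts: 10/100 cases exposed vs 5/100 controls exposed.
print(odds_ratio_ci(10, 90, 5, 95))
```

The wide intervals in the abstract (e.g. 2.0-29.3) reflect small exposed-case counts, which inflate the standard error of the log odds ratio.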
Informatics in radiology: use of CouchDB for document-based storage of DICOM objects.
Rascovsky, Simón J; Delgado, Jorge A; Sanz, Alexander; Calvo, Víctor D; Castrillón, Gabriel
2012-01-01
Picture archiving and communication systems traditionally have depended on schema-based Structured Query Language (SQL) databases for imaging data management. To optimize database size and performance, many such systems store a reduced set of Digital Imaging and Communications in Medicine (DICOM) metadata, discarding informational content that might be needed in the future. As an alternative to traditional database systems, document-based key-value stores recently have gained popularity. These systems store documents containing key-value pairs that facilitate data searches without predefined schemas. Document-based key-value stores are especially suited to archive DICOM objects because DICOM metadata are highly heterogeneous collections of tag-value pairs conveying specific information about imaging modalities, acquisition protocols, and vendor-supported postprocessing options. The authors used an open-source document-based database management system (Apache CouchDB) to create and test two such databases; CouchDB was selected for its overall ease of use, capability for managing attachments, and reliance on HTTP and Representational State Transfer standards for accessing and retrieving data. A large database was created first in which the DICOM metadata from 5880 anonymized magnetic resonance imaging studies (1,949,753 images) were loaded by using a Ruby script. To provide the usual DICOM query functionality, several predefined "views" (standard queries) were created by using JavaScript. For performance comparison, the same queries were executed in both the CouchDB database and a SQL-based DICOM archive. The capabilities of CouchDB for attachment management and database replication were separately assessed in tests of a similar, smaller database. 
Results showed that CouchDB allowed efficient storage and interrogation of all DICOM objects; with the use of information retrieval algorithms such as map-reduce, all the DICOM metadata stored in the large database were searchable with only a minimal increase in retrieval time over that with the traditional database management system. Results also indicated possible uses for document-based databases in data mining applications such as dose monitoring, quality assurance, and protocol optimization. RSNA, 2012
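A CouchDB view pairs a map function with an optional reduce. The sketch below emulates that pattern in Python rather than CouchDB's usual JavaScript, over invented DICOM-like documents, to show how predefined views answer tag-value queries without a fixed schema:

```python
def map_modality(doc):
    """CouchDB-style map function (written here in Python purely
    for illustration): emit one (key, value) row per document that
    carries the Modality tag."""
    if "Modality" in doc:
        yield doc["Modality"], 1

def reduce_count(values):
    """CouchDB-style reduce: sum the emitted values per key."""
    return sum(values)

def query_view(docs, map_fn, reduce_fn):
    """Minimal emulation of querying a grouped map/reduce view."""
    grouped = {}
    for doc in docs:
        for key, value in map_fn(doc):
            grouped.setdefault(key, []).append(value)
    return {k: reduce_fn(v) for k, v in grouped.items()}

# Invented DICOM-like metadata documents (heterogeneous tag sets).
docs = [{"Modality": "MR"}, {"Modality": "MR"},
        {"Modality": "CT"}, {"PatientID": "anon-001"}]
print(query_view(docs, map_modality, reduce_count))  # {'MR': 2, 'CT': 1}
```

Documents missing a tag simply emit nothing, which is why this model suits DICOM's heterogeneous tag-value metadata better than a rigid relational schema.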
Corwin, John; Silberschatz, Avi; Miller, Perry L; Marenco, Luis
2007-01-01
Data sparsity and schema evolution issues affecting clinical informatics and bioinformatics communities have led to the adoption of vertical or object-attribute-value-based database schemas to overcome limitations posed when using conventional relational database technology. This paper explores these issues and discusses why biomedical data are difficult to model using conventional relational techniques. The authors propose a solution to these obstacles based on a relational database engine using a sparse, column-store architecture. The authors provide benchmarks comparing the performance of queries and schema-modification operations using three different strategies: (1) the standard conventional relational design; (2) past approaches used by biomedical informatics researchers; and (3) their sparse, column-store architecture. The performance results show that their architecture is a promising technique for storing and processing many types of data that are not handled well by the other two semantic data models.
A Blind Reversible Robust Watermarking Scheme for Relational Databases
Chang, Chin-Chen; Nguyen, Thai-Son; Lin, Chia-Chen
2013-01-01
Protecting the ownership and controlling the copies of digital data have become very important issues in Internet-based applications. Reversible watermark technology allows the distortion-free recovery of relational databases after the embedded watermark data are detected or verified. In this paper, we propose a new, blind, reversible, robust watermarking scheme that can be used to provide proof of ownership for the owner of a relational database. In the proposed scheme, a reversible data-embedding algorithm, which is referred to as “histogram shifting of adjacent pixel difference” (APD), is used to obtain reversibility. The proposed scheme can successfully detect 100% of the embedded watermark data, even if as much as 80% of the watermarked relational database is altered. Our extensive analysis and experimental results show that the proposed scheme is robust against a variety of data attacks, for example, alteration attacks, deletion attacks, mix-match attacks, and sorting attacks. PMID:24223033
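The abstract does not spell out the APD algorithm, but classic histogram shifting on a difference sequence works as sketched below. Here the differences are treated as a plain integer list; the actual scheme derives them from adjacent values in the database and folds the shifted differences back into tuples:

```python
def embed(diffs, bits, peak):
    """Reversible histogram-shifting embedding on a difference
    sequence: differences above the peak are shifted up by 1 to
    open a gap, and each peak-valued difference carries one bit.
    Requires len(bits) == number of peak-valued differences."""
    out, b = [], iter(bits)
    for d in diffs:
        if d > peak:
            out.append(d + 1)          # shift to make room
        elif d == peak:
            out.append(d + next(b))    # embed one bit at the peak bin
        else:
            out.append(d)
    return out

def extract(diffs, peak):
    """Recover the embedded bits and restore the original
    differences exactly (the reversibility property)."""
    bits, restored = [], []
    for d in diffs:
        if d == peak:
            bits.append(0); restored.append(peak)
        elif d == peak + 1:
            bits.append(1); restored.append(peak)
        elif d > peak + 1:
            restored.append(d - 1)     # undo the shift
        else:
            restored.append(d)
    return bits, restored

wm = embed([0, 1, 2, 1, 5], bits=[1, 0], peak=1)
print(wm)                      # [0, 2, 3, 1, 6]
print(extract(wm, peak=1))     # ([1, 0], [0, 1, 2, 1, 5])
```

Because extraction restores the difference sequence exactly, the watermarked database can be returned to its original state after verification, which is the "reversible" property the scheme relies on.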
A blind reversible robust watermarking scheme for relational databases.
Chang, Chin-Chen; Nguyen, Thai-Son; Lin, Chia-Chen
2013-01-01
Protecting the ownership and controlling the copies of digital data have become very important issues in Internet-based applications. Reversible watermark technology allows the distortion-free recovery of relational databases after the embedded watermark data are detected or verified. In this paper, we propose a new, blind, reversible, robust watermarking scheme that can be used to provide proof of ownership for the owner of a relational database. In the proposed scheme, a reversible data-embedding algorithm, which is referred to as "histogram shifting of adjacent pixel difference" (APD), is used to obtain reversibility. The proposed scheme can successfully detect 100% of the embedded watermark data, even if as much as 80% of the watermarked relational database is altered. Our extensive analysis and experimental results show that the proposed scheme is robust against a variety of data attacks, for example, alteration attacks, deletion attacks, mix-match attacks, and sorting attacks.
Producing approximate answers to database queries
NASA Technical Reports Server (NTRS)
Vrbsky, Susan V.; Liu, Jane W. S.
1993-01-01
We have designed and implemented a query processor, called APPROXIMATE, that makes approximate answers available if part of the database is unavailable or if there is not enough time to produce an exact answer. The accuracy of the approximate answers produced improves monotonically with the amount of data retrieved to produce the result. The exact answer is produced if all of the needed data are available and query processing is allowed to continue until completion. The monotone query processing algorithm of APPROXIMATE works within the standard relational algebra framework and can be implemented on a relational database system with little change to the relational architecture. We describe here the approximation semantics of APPROXIMATE that serves as the basis for meaningful approximations of both set-valued and single-valued queries. We show how APPROXIMATE is implemented to make effective use of semantic information, provided by an object-oriented view of the database, and describe the additional overhead required by APPROXIMATE.
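The monotone behavior described above can be sketched with subset/superset bounds on a selection query. This illustrates the semantics only, under the assumption that an approximate answer is a pair (certain, possible); it is not the paper's relational-algebra implementation:

```python
def approximate_select(blocks, predicate, universe):
    """Monotone approximate query processing in the spirit of
    APPROXIMATE: after each available data block, yield a pair
    (certain, possible). `certain` only grows and `possible` only
    shrinks; once all blocks are processed, certain == possible ==
    the exact answer."""
    certain = set()
    unseen = set(universe)
    for block in blocks:
        for row in block:
            unseen.discard(row)
            if predicate(row):
                certain.add(row)
        # Any unseen row might still satisfy the predicate.
        yield set(certain), certain | unseen

# Toy relation delivered in two blocks; query selects even rows.
blocks = [[1, 2, 3], [4, 5, 6]]
for certain, possible in approximate_select(blocks, lambda x: x % 2 == 0,
                                            universe=range(1, 7)):
    print(certain, possible)
```

If processing stops early, the last yielded pair is still a meaningful answer: everything in `certain` definitely qualifies, and nothing outside `possible` can.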
The ClinicalTrials.gov Results Database — Update and Key Issues
Zarin, Deborah A.; Tse, Tony; Williams, Rebecca J.; Califf, Robert M.; Ide, Nicholas C.
2011-01-01
BACKGROUND The ClinicalTrials.gov trial registry was expanded in 2008 to include a database for reporting summary results. We summarize the structure and contents of the results database, provide an update of relevant policies, and show how the data can be used to gain insight into the state of clinical research. METHODS We analyzed ClinicalTrials.gov data that were publicly available between September 2009 and September 2010. RESULTS As of September 27, 2010, ClinicalTrials.gov received approximately 330 new and 2000 revised registrations each week, along with 30 new and 80 revised results submissions. We characterized the 79,413 registry records and 2178 results records available as of September 2010. From a sample cohort of results records, 78 of 150 (52%) had associated publications within 2 years after posting. Of results records available publicly, 20% reported more than two primary outcome measures and 5% reported more than five. Of a sample of 100 registry record outcome measures, 61% lacked specificity in describing the metric used in the planned analysis. In a sample of 700 results records, the mean number of different analysis populations per study group was 2.5 (median, 1; range, 1 to 25). Of these trials, 24% reported results for 90% or less of their participants. CONCLUSIONS ClinicalTrials.gov provides access to study results not otherwise available to the public. Although the database allows examination of various aspects of ongoing and completed clinical trials, its ultimate usefulness depends on the research community submitting accurate, informative data. PMID:21366476
[Trauma and accident documentation in Germany compared with elsewhere in Europe].
Probst, C; Richter, M; Haasper, C; Lefering, R; Otte, D; Oestern, H J; Krettek, C; Hüfner, T
2008-07-01
The role of trauma documentation has grown continuously since the 1970s. Prevention and management of injuries were adapted according to the results of many analyses. Since 1993 there have been two different trauma databases in Germany: the German trauma registry (TR) and the database of the Accident Research Unit (UFO). Modern computer applications improved the data processing. Our study analysed the pros and cons of each system and compared them with those of our European neighbours. We compared the TR and the UFO databases with respect to aims and goals, advantages and disadvantages, and current status. Results were reported as means +/- standard errors of the mean. The level of significance was set at P<0.05. There were differences between the two databases concerning number and types of items, aims and goals, and demographics. The TR documents care for severely injured patients and the clinical course of different types of accidents. The UFO describes traffic accidents, accident conditions, and interrelations. The German and British systems are similar, and the French system shows interesting differences. The German trauma documentation systems focus on different points. Therefore both can be used for substantiated analyses of different hypotheses. Certain intersections of both databases may help to answer very special questions in the future.
Recent Advances in the GLIMS Glacier Database
NASA Astrophysics Data System (ADS)
Raup, Bruce; Cogley, Graham; Zemp, Michael; Glaus, Ladina
2017-04-01
Glaciers are shrinking almost without exception. Glacier losses have impacts on local water availability and hazards, and contribute to sea level rise. To understand these impacts and the processes behind them, it is crucial to monitor glaciers through time by mapping their areal extent, changes in volume, elevation distribution, snow lines, ice flow velocities, and changes to associated water bodies. The glacier database of the Global Land Ice Measurements from Space (GLIMS) initiative is the only multi-temporal glacier database capable of tracking all these glacier measurements and providing them to the scientific community and broader public. Here we present recent results in 1) expansion of the geographic and temporal coverage of the GLIMS Glacier Database by drawing on the Randolph Glacier Inventory (RGI) and other new data sets; 2) improved tools for visualizing and downloading GLIMS data in a choice of formats and data models; and 3) a new data model for handling multiple glacier records through time while avoiding double-counting of glacier number or area. The result of this work is a more complete glacier data repository that shows not only the current state of glaciers on Earth, but how they have changed in recent decades. The database is useful for tracking changes in water resources, hazards, and mass budgets of the world's glaciers.
Stereoselective virtual screening of the ZINC database using atom pair 3D-fingerprints.
Awale, Mahendra; Jin, Xian; Reymond, Jean-Louis
2015-01-01
Tools to explore large compound databases in search for analogs of query molecules provide a strategically important support in drug discovery to help identify available analogs of any given reference or hit compound by ligand based virtual screening (LBVS). We recently showed that large databases can be formatted for very fast searching with various 2D-fingerprints using the city-block distance as similarity measure, in particular a 2D-atom pair fingerprint (APfp) and the related category extended atom pair fingerprint (Xfp) which efficiently encode molecular shape and pharmacophores, but do not perceive stereochemistry. Here we investigated related 3D-atom pair fingerprints to enable rapid stereoselective searches in the ZINC database (23.2 million 3D structures). Molecular fingerprints counting atom pairs at increasing through-space distance intervals were designed using either all atoms (16-bit 3DAPfp) or different atom categories (80-bit 3DXfp). These 3D-fingerprints retrieved molecular shape and pharmacophore analogs (defined by OpenEye ROCS scoring functions) of 110,000 compounds from the Cambridge Structural Database with equal or better accuracy than the 2D-fingerprints APfp and Xfp, and showed comparable performance in recovering actives from decoys in the DUD database. LBVS by 3DXfp or 3DAPfp similarity was stereoselective and gave very different analogs when starting from different diastereomers of the same chiral drug. Results were also different from LBVS with the parent 2D-fingerprints Xfp or APfp. 3D- and 2D-fingerprints also gave very different results in LBVS of folded molecules where through-space distances between atom pairs are much shorter than topological distances. 3DAPfp and 3DXfp are suitable for stereoselective searches for shape and pharmacophore analogs of query molecules in large databases. 
Web-browsers for searching ZINC by 3DAPfp and 3DXfp similarity are accessible at www.gdb.unibe.ch and should provide useful assistance to drug discovery projects. Graphical abstractAtom pair fingerprints based on through-space distances (3DAPfp) provide better shape encoding than atom pair fingerprints based on topological distances (APfp) as measured by the recovery of ROCS shape analogs by fp similarity.
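The core search primitive the paper relies on, ranking count fingerprints by city-block (L1) distance, can be sketched as follows. Fingerprint generation itself is omitted; the vectors and molecule names below are invented for illustration.

```python
def city_block(a, b):
    """City-block (L1, Manhattan) distance between two count fingerprints."""
    return sum(abs(x - y) for x, y in zip(a, b))

def nearest(query, database, k=3):
    """Return the k database entries whose fingerprints are closest to the
    query fingerprint; `database` is a list of (name, fingerprint) pairs."""
    return sorted(database, key=lambda item: city_block(query, item[1]))[:k]
```

Because the distance is a simple sum of absolute count differences, it can be computed extremely fast over millions of precomputed fingerprint vectors, which is what makes searches of a 23.2-million-structure database practical.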
Gene and protein nomenclature in public databases
Fundel, Katrin; Zimmer, Ralf
2006-01-01
Background Frequently, several alternative names are in use for biological objects such as genes and proteins. Applications like manual literature search, automated text-mining, named entity identification, gene/protein annotation, and linking of knowledge from different information sources require the knowledge of all used names referring to a given gene or protein. Various organism-specific or general public databases aim at organizing knowledge about genes and proteins. These databases can be used for deriving gene and protein name dictionaries. So far, little is known about the differences between databases in terms of size, ambiguities and overlap. Results We compiled five gene and protein name dictionaries for each of the five model organisms (yeast, fly, mouse, rat, and human) from different organism-specific and general public databases. We analyzed the degree of ambiguity of gene and protein names within and between dictionaries, to a lexicon of common English words and domain-related non-gene terms, and we compared different data sources in terms of size of extracted dictionaries and overlap of synonyms between those. The study shows that the number of genes/proteins and synonyms covered in individual databases varies significantly for a given organism, and that the degree of ambiguity of synonyms varies significantly between different organisms. Furthermore, it shows that, despite considerable efforts of co-curation, the overlap of synonyms in different data sources is rather moderate and that the degree of ambiguity of gene names with common English words and domain-related non-gene terms varies depending on the considered organism. Conclusion In conclusion, these results indicate that the combination of data contained in different databases allows the generation of gene and protein name dictionaries that contain significantly more used names than dictionaries obtained from individual data sources. 
Furthermore, curation of combined dictionaries considerably increases size and decreases ambiguity. The entries of the curated synonym dictionary are available for manual querying, editing, and PubMed- or Google-search via the ProThesaurus-wiki. For automated querying via custom software, we offer a web service and an exemplary client application. PMID:16899134
Casuso-Holgado, María Jesús; Martín-Valero, Rocío; Carazo, Ana F; Medrano-Sánchez, Esther M; Cortés-Vega, M Dolores; Montero-Bancalero, Francisco José
2018-04-01
To evaluate the evidence for the use of virtual reality to treat balance and gait impairments in multiple sclerosis rehabilitation. Systematic review and meta-analysis of randomized controlled trials and quasi-randomized clinical trials. An electronic search was conducted using the following databases: MEDLINE (PubMed), Physiotherapy Evidence Database (PEDro), Cochrane Database of Systematic Reviews (CDSR) and CINAHL. A quality assessment was performed using the PEDro scale. The data were pooled and a meta-analysis was completed. This systematic review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement. It was registered in the PROSPERO database (CRD42016049360). A total of 11 studies were included. The data were pooled, allowing meta-analysis of seven outcomes of interest. A total of 466 participants clinically diagnosed with multiple sclerosis were analysed. Results showed that virtual reality balance training is more effective than no intervention for postural control improvement (standardized mean difference (SMD) = -0.64; 95% confidence interval (CI) = -1.05, -0.24; P = 0.002). However, no significant overall effect was shown when compared with conventional training (SMD = -0.04; 95% CI = -0.70, 0.62; P = 0.90). Inconclusive results were also observed for gait rehabilitation. Virtual reality training could be considered at least as effective as conventional training and more effective than no intervention to treat balance and gait impairments in multiple sclerosis rehabilitation.
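For readers unfamiliar with the pooling step behind figures such as SMD = -0.64 (95% CI -1.05, -0.24), a fixed-effect inverse-variance pooling sketch is shown below. This is background only: the review may well have used a random-effects model, and the per-study effects and standard errors here are invented.

```python
import math

def pooled_smd(effects, ses):
    """Fixed-effect inverse-variance pooling: each study's standardized
    mean difference is weighted by 1/SE^2, and the pooled standard error
    is sqrt(1 / sum of weights). Returns (estimate, (ci_low, ci_high))."""
    weights = [1.0 / se ** 2 for se in ses]
    est = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return est, (est - 1.96 * se, est + 1.96 * se)
```

A confidence interval that excludes zero (as for the no-intervention comparison above) indicates a significant pooled effect; one that straddles zero (as for the conventional-training comparison) does not.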
GHAZI MIRSAEID, Seyed Javad; MOTAMEDI, Nadia; RAMEZAN GHORBANI, Nahid
2015-01-01
Background: In this study, the impact of self-citation (journal and author) on the impact factor of Iranian English-language medical journals in two international citation databases, Web of Science (WoS) and the Islamic World Science Citation Center (ISC), was compared by citation analysis. Methods: Twelve journals in the WoS and 26 journals in the ISC database indexed between 2006 and 2009 were selected and compared. For comparison of self-citation rates in the two databases, we used Wilcoxon and Mann-Whitney tests. We used the Pearson test for the correlation of self-citation and IF in WoS, and Spearman's correlation coefficient for the ISC database. Covariance analysis was used for comparison of the two correlation tests. The significance level was set at 0.05 for all tests. Results: There was no significant difference between self-citation rates in the two databases (P>0.05). Findings also showed no significant difference between the correlation of journal self-citation and impact factor in the two databases (P=0.526); however, there was a significant difference between author self-citation and impact factor in these databases (P<0.001). Conclusion: The impact of author self-citation on the impact factor was higher in WoS than in the ISC. PMID:26587498
Kupferberg, Natalie; Hartel, Lynda Jones
2004-01-01
Objectives: The purpose of this study is to assess the usefulness of five full-text drug databases as evaluated by medical librarians, pharmacy faculty, and pharmacy students at an academic health center. Study findings and recommendations are offered as guidance to librarians responsible for purchasing decisions. Methods: Four pharmacy students, four pharmacy faculty members, and four medical librarians answered ten drug information questions using the databases AHFS Drug Information (STAT!Ref); DRUGDEX (Micromedex); eFacts (Drug Facts and Comparisons); Lexi-Drugs Online (Lexi-Comp); and the PDR Electronic Library (Micromedex). Participants noted whether each database contained answers to the questions and evaluated each database on ease of navigation, screen readability, overall satisfaction, and product recommendation. Results: While each study group found that DRUGDEX provided the most direct answers to the ten questions, faculty members gave Lexi-Drugs the highest overall rating. Students favored eFacts. The faculty and students found the PDR least useful. Librarians ranked DRUGDEX the highest and AHFS the lowest. The comments of pharmacy faculty and students show that these groups preferred concise, easy-to-use sources; librarians focused on the comprehensiveness, layout, and supporting references of the databases. Conclusion: This study demonstrates the importance of consulting with primary clientele before purchasing databases. Although there are many online drug databases to consider, present findings offer strong support for eFacts, Lexi-Drugs, and DRUGDEX. PMID:14762464
Adverse Events Associated with Prolonged Antibiotic Use
Meropol, Sharon B.; Chan, K. Arnold; Chen, Zhen; Finkelstein, Jonathan A.; Hennessy, Sean; Lautenbach, Ebbing; Platt, Richard; Schech, Stephanie D.; Shatin, Deborah; Metlay, Joshua P.
2014-01-01
Purpose The Infectious Diseases Society of America and US CDC recommend 60 days of ciprofloxacin, doxycycline or amoxicillin for anthrax prophylaxis. It is not possible to determine severe adverse drug event (ADE) risks from the few people thus far exposed to anthrax prophylaxis. This study's objective was to estimate risks of severe ADEs associated with long-term ciprofloxacin, doxycycline and amoxicillin exposure using 3 large databases: one electronic medical record (General Practice Research Database) and two claims databases (UnitedHealthcare, HMO Research Network). Methods We included office visit, hospital admission and prescription data for 1/1/1999–6/30/2001. The exposure variable was oral antibiotic person-days (pds). The primary outcome was hospitalization during exposure with ADE diagnoses: anaphylaxis, phototoxicity, hepatotoxicity, nephrotoxicity, seizures, ventricular arrhythmia or infectious colitis. Results We randomly sampled 999,773, 1,047,496 and 1,819,004 patients from Databases A, B and C respectively. 33,183 amoxicillin, 15,250 ciprofloxacin and 50,171 doxycycline prescriptions continued ≥30 days. ADE hospitalizations during long-term exposure were not observed in Database A. ADEs during long-term amoxicillin were seen only in Database C with 5 ADEs or 1.2(0.4–2.7) ADEs/100,000 pds exposure. Long-term ciprofloxacin showed 3 and 4 ADEs with 5.7(1.2–16.6) and 3.5(1.0–9.0) ADEs/100,000 pds in Databases B and C, respectively. Only Database B had ADEs during long-term doxycycline with 3 ADEs or 0.9(0.2–2.6) ADEs/100,000 pds. For most events, the incidence rate ratio, comparing >28 vs. 1–28 pds exposure, was <1, showing limited evidence for cumulative dose-related ADEs from long-term exposure. Conclusions Long-term amoxicillin, ciprofloxacin and doxycycline appear safe, supporting use of these medications if needed for large-scale post-exposure anthrax prophylaxis. PMID:18215001
A Picture Database for Verbs and Nouns with Different Action Content in Turkish.
Bayram, Ece; Aydin, Özgür; Ergenc, Hacer Iclal; Akbostanci, Muhittin Cenk
2017-08-01
In this study we present a picture database of 160 nouns and 160 verbs. All verbs and nouns are divided into two groups as action and non-action words. Age of acquisition, familiarity, imageability, name agreement and complexity norms are reported alongside frequency, word length and morpheme count for each word. Data were collected from 600 native Turkish adults in total. The results show that although several measures have weak correlations with each other, only age of acquisition had moderate negative relationships with familiarity and frequency, while familiarity and frequency themselves had a rather strong positive correlation with each other. The norms and the picture database are available as supplemental materials for use in psycholinguistic studies in Turkish.
Strickler, Jeffery C; Lopiano, Kenneth K
2016-11-01
This study profiles an innovative approach to capturing patient satisfaction data from emergency department (ED) patients by implementing an electronic survey method, comparing responders with nonresponders. Our hypothesis was that the cohort of survey respondents would be similar to nonresponders in terms of the key characteristics of age, gender, race, ethnicity, ED disposition, and payor status. The study uses a cross-sectional design with secondary data from the survey database, allowing univariate analysis of the key characteristics for each group. The data elements were abstracted from the database and compared with the same key characteristics from a similar sample of nonresponders to the ED satisfaction survey. Age showed a statistically significant difference between responders and nonresponders. Comparison by disposition status showed no substantial difference between responders and nonresponders. Gender distribution showed a greater number of female than male responders. Race distribution showed a greater number of responses by white and Asian patients as compared with African Americans. A review of ethnicity showed that fewer Hispanics responded. An evaluation by payor classification showed a greater number and response rate among those with a commercial or Workers Comp payor source. The response rate by Medicare recipients was stronger than expected; however, the response rates by Medicaid recipients and self-pay patients could be a concern for underrepresentation of lower socioeconomic groups. Finally, the evaluation of the method of notification showed that notification by both e-mail and text substantially improved response rates. The evaluation of key characteristics showed no difference related to disposition, but differences related to age, gender, race, ethnicity, and payor classification. These results point to a potential concern for underrepresentation of lower socioeconomic groups.
Kreaden, Usha S.; Gabbert, Jessica; Thomas, Raju
2014-01-01
Abstract Introduction: The primary aims of this study were to assess the learning curve effect of robot-assisted radical prostatectomy (RARP) in a large administrative database consisting of multiple U.S. hospitals and surgeons, and to compare the results of RARP with open radical prostatectomy (ORP) from the same settings. Materials and Methods: The patient population of study was from the Premier Perspective Database (Premier, Inc., Charlotte, NC) and consisted of 71,312 radical prostatectomies performed at more than 300 U.S. hospitals by up to 3739 surgeons by open or robotic techniques from 2004 to 2010. The key endpoints were surgery time, inpatient length of stay, and overall complications. We compared open versus robotic, results by year of procedures, results by case volume of specific surgeons, and results of open surgery in hospitals with and without a robotic system. Results: The mean surgery time was longer for RARP (4.4 hours, standard deviation [SD] 1.7) compared with ORP (3.4 hours, SD 1.5) in the same hospitals (p<0.0001). Inpatient stay was shorter for RARP (2.2 days, SD 1.9) compared with ORP (3.2 days, SD 2.7) in the same hospitals (p<0.0001). The overall complications were less for RARP (10.6%) compared with ORP (15.8%) in the same hospitals, as were transfusion rates. ORP results in hospitals without a robot were not better than ORP with a robot, and pretreatment co-morbidity profiles were similar in all cohorts. Trending of results by year of procedure showed no differences in the three cohorts, but trending of RARP results by surgeon experience showed improvements in surgery time, hospital stay, conversion rates, and complication rates. Conclusions: During the initial 7 years of RARP development, outcomes showed decreased hospital stay, complications, and transfusion rates. Learning curve trends for RARP were evident for these endpoints when grouped by surgeon experience, but not by year of surgery. PMID:24350787
Comparison of Aircraft Icing Growth Assessment Software
NASA Technical Reports Server (NTRS)
Wright, William; Potapczuk, Mark G.; Levinson, Laurie H.
2011-01-01
A research project is underway to produce computer software that can accurately predict ice growth under any meteorological conditions for any aircraft surface. An extensive comparison of the results in a quantifiable manner against the database of ice shapes that have been generated in the NASA Glenn Icing Research Tunnel (IRT) has been performed, including additional data taken to extend the database in the Super-cooled Large Drop (SLD) regime. The project shows the differences in ice shape between LEWICE 3.2.2, GlennICE, and experimental data. The project addresses the validation of the software against a recent set of ice-shape data in the SLD regime. This validation effort mirrors a similar effort undertaken for previous validations of LEWICE. Those reports quantified the ice accretion prediction capabilities of the LEWICE software. Several ice geometry features were proposed for comparing ice shapes in a quantitative manner. The resulting analysis showed that LEWICE compared well to the available experimental data.
Exploring Discretization Error in Simulation-Based Aerodynamic Databases
NASA Technical Reports Server (NTRS)
Aftosmis, Michael J.; Nemec, Marian
2010-01-01
This work examines the level of discretization error in simulation-based aerodynamic databases and introduces strategies for error control. Simulations are performed using a parallel, multi-level Euler solver on embedded-boundary Cartesian meshes. Discretization errors in user-selected outputs are estimated using the method of adjoint-weighted residuals, and adaptive mesh refinement is used to reduce these errors to specified tolerances. Using this framework, we examine the behavior of discretization error throughout a token database computed for a NACA 0012 airfoil consisting of 120 cases. We compare the cost and accuracy of two approaches for aerodynamic database generation. In the first approach, mesh adaptation is used to compute all cases in the database to a prescribed level of accuracy. The second approach conducts all simulations using the same computational mesh without adaptation. We quantitatively assess the error landscape and computational costs in both databases. This investigation highlights sensitivities of the database under a variety of conditions. The presence of transonic shocks or the stiffness in the governing equations near the incompressible limit are shown to dramatically increase discretization error, requiring additional mesh resolution to control. Results show that such pathologies lead to error levels that vary by over a factor of 40 when using a fixed mesh throughout the database. Alternatively, controlling this sensitivity through mesh adaptation leads to mesh sizes which span two orders of magnitude. We propose strategies to minimize simulation cost in sensitive regions and discuss the role of error estimation in database quality.
Use of Graph Database for the Integration of Heterogeneous Biological Data.
Yoon, Byoung-Ha; Kim, Seon-Kyu; Kim, Seon-Young
2017-03-01
Understanding complex relationships among heterogeneous biological data is one of the fundamental goals in biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and then infer relationships by multiple-join statements. Recently, a new type of database, called the graph-based database, was developed to natively represent various kinds of complex relationships, and it is widely used among computer science communities and IT industries. Here, we demonstrate the feasibility of using a graph-based database for complex biological relationships by comparing the performance between MySQL and Neo4j, one of the most widely used graph databases. We collected various biological data (protein-protein interaction, drug-target, gene-disease, etc.) from several existing sources, removed duplicate and redundant data, and finally constructed a graph database containing 114,550 nodes and 82,674,321 relationships. When we tested the query execution performance of MySQL versus Neo4j, we found that Neo4j outperformed MySQL in all cases. While Neo4j exhibited a very fast response for various queries, MySQL exhibited latent or unfinished responses for complex queries with multiple-join statements. These results show that using graph-based databases, such as Neo4j, is an efficient way to store complex biological relationships. Moreover, querying a graph database in diverse ways has the potential to reveal novel relationships among heterogeneous biological data.
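The contrast between multiple-join queries and native graph traversal can be illustrated in plain Python. This is a toy: an adjacency dictionary stands in for Neo4j's graph storage, a nested-loop join stands in for MySQL, and the drug/gene/disease rows are invented.

```python
# Two "tables" of relationships, as a relational database would store them.
drug_target = [("aspirin", "PTGS2"), ("gefitinib", "EGFR")]
gene_disease = [("PTGS2", "inflammation"), ("EGFR", "lung cancer")]

def diseases_via_join(drug):
    """Relational style: a two-hop question (drug -> gene -> disease)
    requires chaining a join on the shared gene key."""
    return [dis for dr, gene in drug_target if dr == drug
                for g, dis in gene_disease if g == gene]

# Graph style: index the edges once as an adjacency structure...
graph = {}
for src, dst in drug_target + gene_disease:
    graph.setdefault(src, []).append(dst)

def diseases_via_walk(drug):
    """...then the same two-hop question is a direct neighbor walk,
    with no join bookkeeping, and each extra hop is just another step."""
    return [dis for gene in graph.get(drug, []) for dis in graph.get(gene, [])]
```

Both functions return the same answers on this toy data; the point is that the traversal cost of the graph form depends on the neighborhood visited rather than on table sizes, which is why the paper observed Neo4j outperforming MySQL as queries grew to multiple joins.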
Interactive Exploration for Continuously Expanding Neuron Databases.
Li, Zhongyu; Metaxas, Dimitris N; Lu, Aidong; Zhang, Shaoting
2017-02-15
This paper proposes a novel framework to help biologists explore and analyze neurons based on retrieval of data from neuron morphological databases. In recent years, the continuously expanding neuron databases have provided a rich source of information for associating neuronal morphologies with their functional properties. We design a coarse-to-fine framework for efficient and effective data retrieval from large-scale neuron databases. At the coarse level, for efficiency at large scale, we employ a binary coding method to compress morphological features into binary codes of tens of bits. Short binary codes allow for real-time similarity searching in Hamming space. Because the neuron databases are continuously expanding, it is inefficient to re-train the binary coding model from scratch when adding new neurons. To solve this problem, we extend binary coding with online updating schemes, which consider only the newly added neurons and update the model on-the-fly, without accessing the whole neuron database. At the fine-grained level, we introduce domain experts/users into the framework, who can give relevance feedback on the binary-coding-based retrieval results. This interactive strategy can improve retrieval performance by re-ranking the coarse results, where we design a new similarity measure and take the feedback into account. Our framework is validated on more than 17,000 neuron cells, showing promising retrieval accuracy and efficiency. Moreover, we demonstrate its use case in assisting biologists to identify and explore unknown neurons.
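The coarse retrieval level described above, comparing short binary codes by Hamming distance, can be sketched as follows. This is an assumed illustration rather than the authors' code (and large systems often avoid the full sort shown here), but it shows why tens-of-bits codes make similarity search cheap.

```python
def hamming(a, b):
    """Hamming distance between two binary codes stored as integers:
    the number of set bits in their XOR."""
    return bin(a ^ b).count("1")

def retrieve(query_code, database_codes, k=2):
    """Coarse level: rank every stored code by Hamming distance to the
    query and return the k nearest. Each comparison is one XOR plus a
    popcount, so millions of codes can be scanned in real time."""
    return sorted(database_codes, key=lambda c: hamming(query_code, c))[:k]
```

The fine-grained level of the paper would then re-rank this short candidate list using relevance feedback from the user.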
Acupuncture for neurogenesis in experimental ischemic stroke: a systematic review and meta-analysis.
Lu, Lin; Zhang, Xiao-guang; Zhong, Linda L D; Chen, Zi-xian; Li, Yan; Zheng, Guo-qing; Bian, Zhao-xiang
2016-01-20
Acupuncture has been used for patients with stroke and post-stroke rehabilitation for thousands of years. Previous studies reported that acupuncture enhanced stroke recovery through neurogenesis. Hence, we conducted a systematic review and meta-analysis of preclinical studies to assess the current evidence for the effect of acupuncture on neurogenesis in treating ischemic stroke. Studies were obtained from six databases, including PubMed, EMBASE, the Cochrane Library, the Chinese National Knowledge Infrastructure, the VIP information database, and the Chinese Biomedical Literature Database. Ultimately, 34 studies containing 1617 animals were identified. The neurogenesis markers BrdU, Nestin, PSA-NCAM, NeuN and GFAP were selected as major outcomes. The pooled results of 15 studies using BrdU showed significant effects of acupuncture on improving proliferation when compared with control groups (P < 0.01); 13 studies using Nestin showed significant effects of acupuncture on increasing proliferation when compared with control groups (P < 0.01); 4 studies using PSA-NCAM showed significant effects of acupuncture on enhancing migration when compared with control groups (P < 0.01); and 4 studies using NeuN showed significant effects of acupuncture on stimulating differentiation when compared with control groups (P < 0.01). The findings suggest that acupuncture is a prospective therapy targeting neurogenesis for ischemic stroke.
Performance of Stratified and Subgrouped Disproportionality Analyses in Spontaneous Databases.
Seabroke, Suzie; Candore, Gianmario; Juhlin, Kristina; Quarcoo, Naashika; Wisniewski, Antoni; Arani, Ramin; Painter, Jeffery; Tregunno, Philip; Norén, G Niklas; Slattery, Jim
2016-04-01
Disproportionality analyses are used in many organisations to identify adverse drug reactions (ADRs) from spontaneous report data. Reporting patterns vary over time, with patient demographics, and between different geographical regions, and therefore subgroup analyses or adjustment by stratification may be beneficial. The objective of this study was to evaluate the performance of subgroup and stratified disproportionality analyses for a number of key covariates within spontaneous report databases of differing sizes and characteristics. Using a reference set of established ADRs, signal detection performance (sensitivity and precision) was compared for stratified, subgroup and crude (unadjusted) analyses within five spontaneous report databases (two company, one national and two international databases). Analyses were repeated for a range of covariates: age, sex, country/region of origin, calendar time period, event seriousness, vaccine/non-vaccine, reporter qualification and report source. Subgroup analyses consistently performed better than stratified analyses in all databases. Subgroup analyses also showed benefits in both sensitivity and precision over crude analyses for the larger international databases, whilst for the smaller databases a gain in precision tended to result in some loss of sensitivity. Additionally, stratified analyses did not increase sensitivity or precision beyond that associated with analytical artefacts of the analysis. The most promising subgroup covariates were age and region/country of origin, although this varied between databases. Subgroup analyses perform better than stratified analyses and should be considered over the latter in routine first-pass signal detection. Subgroup analyses are also clearly beneficial over crude analyses for larger databases, but further validation is required for smaller databases.
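As background, a disproportionality statistic such as the proportional reporting ratio (PRR) and its subgroup variant can be sketched as follows. The 2x2 counts below are invented for illustration; the study itself compares exactly this kind of subgroup analysis against stratified and crude variants over real spontaneous-report data.

```python
def prr(a, b, c, d):
    """Proportional reporting ratio from a 2x2 table of report counts:
    a = drug of interest + event of interest
    b = drug of interest + all other events
    c = all other drugs + event of interest
    d = all other drugs + all other events
    PRR >> 1 suggests the event is reported disproportionately often
    with the drug, i.e. a potential ADR signal."""
    return (a / (a + b)) / (c / (c + d))

# Crude (unadjusted) analysis over the whole database:
crude = prr(20, 980, 200, 98800)

# Subgroup analysis: recompute the PRR within one covariate level
# (say, patients aged >= 65) and signal on the subgroup's own value,
# rather than pooling strata into a single adjusted estimate.
elderly = prr(15, 285, 60, 29640)
```

In this toy example the subgroup PRR is far larger than the crude PRR, showing how a covariate-restricted analysis can surface a signal that the whole-database analysis dilutes.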
Jia, Jianping; Sun, Xiaohui; Dai, Song; Sang, Yuehong
2016-01-01
Benign paroxysmal positional vertigo (BPPV) is a common vestibular disorder that causes vertigo, and research on BPPV has progressed rapidly in recent years. To analyse the growth of the BPPV literature, we searched the international databases PubMed, ScienceDirect and WILEY for publications up to 2014, and the domestic databases CNKI, VIP and Wanfang Data for publications up to 2015, using "benign paroxysmal positional vertigo" as the keyword. We then carried out regression analysis on the retrieved counts to determine the growth pattern of the literature and the main factors likely to affect the future development of BPPV research, and analysed BPPV papers published in domestic and international journals. PubMed contained 808 articles, ScienceDirect 177 and WILEY 46, for a total of 1,038 international articles; CNKI contained 440, VIP 580 and Wanfang Data 449, for a total of 1,469 domestic articles. The cumulative number of BPPV publications is rising, and the scatter plot shows an exponential trend: slow growth in the early years and rapid growth recently. The international literature suggests three stages of development: an exploration period (before 1985), a breakthrough period (1986-1998) and a deepening period (after 1998). The Chinese literature likewise shows three stages: a blank period (before 1982), an enlightenment period (1982-2004) and a deepening period (after 2004). Throughout this progress, many outstanding scholars have played an important role in BPPV research and have had influence worldwide.
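The regression of literature counts against year, assuming the exponential growth model the abstract describes, can be sketched as a log-linear least-squares fit. The yearly counts below are invented for illustration, not the study's data.

```python
import math

# Invented cumulative publication counts per year (not the study's data)
years = [1995, 2000, 2005, 2010, 2014]
counts = [40, 110, 300, 820, 2200]

# Fit log(count) = a + b * year by ordinary least squares, i.e. an
# exponential growth model: count ≈ exp(a) * exp(b * year).
n = len(years)
ys = [math.log(c) for c in counts]
xbar, ybar = sum(years) / n, sum(ys) / n
b = sum((x - xbar) * (y - ybar) for x, y in zip(years, ys)) \
    / sum((x - xbar) ** 2 for x in years)
a = ybar - b * xbar

def predict(year):
    """Expected literature count for a given year under the fitted model."""
    return math.exp(a + b * year)
```

The fitted slope b is the annual growth rate on the log scale; exp(b) is the year-over-year multiplier of the literature volume.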
DOE Office of Scientific and Technical Information (OSTI.GOV)
Malyapa, Robert; Lowe, Matthew; Christie Medical Physics and Engineering, The Christie NHS Foundation Trust, Manchester
Purpose: To evaluate the robustness of head and neck plans for treatment with intensity modulated proton therapy to range and setup errors, and to establish robustness parameters for the planning of future head and neck treatments. Methods and Materials: Ten patients previously treated were evaluated in terms of robustness to range and setup errors. Error bar dose distributions were generated for each plan, from which several metrics were extracted and used to define a robustness database of acceptable parameters over all analyzed plans. The patients were treated in sequentially delivered series, and plans were evaluated both for the first series and for the combined error over the whole treatment. To demonstrate the application of such a database in the head and neck, an alternative treatment plan was generated for 1 patient using a simultaneous integrated boost (SIB) approach, along with plans with differing numbers of fields. Results: The robustness database for the treatment of head and neck patients is presented. In an example case, comparison of single and multiple field plans against the database shows clear improvements in robustness from using multiple fields. A comparison of sequentially delivered series and an SIB approach for this patient shows both to be of comparable robustness, although the SIB approach is slightly more sensitive to uncertainties. Conclusions: A robustness database was created for the treatment of head and neck patients with intensity modulated proton therapy, based on previous clinical experience. This will allow the identification of future plans that may benefit from alternative planning approaches to improve robustness.
Historical hydrology and database on flood events (Apulia, southern Italy)
NASA Astrophysics Data System (ADS)
Lonigro, Teresa; Basso, Alessia; Gentile, Francesco; Polemio, Maurizio
2014-05-01
Historical data about floods are an important tool for understanding hydrological processes, estimating hazard scenarios for Civil Protection purposes, and supporting rational land-use management, especially in karstic areas where time series of river flows are not available and river drainage is rare. This research shows the importance of improving an existing flood database with a historical approach, i.e. collecting records of past floods in order to better assess the occurrence trend of flooding, in this case for the Apulia region (southern Italy). The main source of flood records for Apulia was the AVI database (the acronym stands for Italian damaged areas), an existing Italian database that collects data on damaging floods from 1918 to 1996. The database was first extended to cover 1996 to 2006 by consulting newspapers, publications and technical reports. To widen the temporal range further, the archives of regional libraries were searched: about 700 potentially useful news items from 17 local newspapers were found for the period 1876-1951. After critical analysis, only 437 of these items proved usable for the Apulia database; their screening documented about 122 flood events across the region. The district of Bari, the regional capital, is the area where the largest number of events occurred, and the historical analysis confirms it as flood-prone. There is an overlap period (from 1918 to 1952) between the old AVI database and the new historical dataset obtained from newspapers; for this period, the historical research revealed flood events not reported in the existing AVI database and added detail to events already recorded. This study shows that the database is a dynamic instrument that allows continuous, even real-time, addition of data.
More details on previous results of this research activity have been published (Polemio, 2010; Basso et al., 2012; Lonigro et al., 2013).
References:
Basso A., Lonigro T. and Polemio M. (2012) "The improvement of historical database on damaging hydrogeological events in the case of Apulia (Southern Italy)". Rendiconti Online della Società Geologica Italiana, 21: 379-380.
Lonigro T., Basso A. and Polemio M. (2013) "Historical database on damaging hydrogeological events in Apulia region (Southern Italy)". Rendiconti Online della Società Geologica Italiana, 24: 196-198.
Polemio M. (2010) "Historical floods and a recent extreme rainfall event in the Murgia karstic environment (Southern Italy)". Zeitschrift für Geomorphologie, 54(2): 195-219.
Critical assessment of human metabolic pathway databases: a stepping stone for future integration
2011-01-01
Background Multiple pathway databases are available that describe the human metabolic network, and they have proven their usefulness in many applications, ranging from the analysis and interpretation of high-throughput data to their use as a reference repository. However, so far the various human metabolic networks described by these databases have not been systematically compared and contrasted, nor has the extent to which they differ been quantified. For a researcher using these databases for particular analyses of human metabolism, it is crucial to know the extent of the differences in content and their underlying causes. Moreover, the outcomes of such a comparison are important for ongoing integration efforts. Results We compared the genes, EC numbers and reactions of five frequently used human metabolic pathway databases. The overlap is surprisingly low, especially at the reaction level, where the databases agree on only 3% of their combined total of 6,968 reactions. Even for the well-established tricarboxylic acid cycle, the databases agree on only 5 of the 30 reactions in total. We identified the main causes of the lack of overlap. Importantly, the databases are partly complementary. Other explanations include the number of steps in which a conversion is described and the number of possible alternative substrates listed. Missing metabolite identifiers and ambiguous metabolite names also affect the comparison. Conclusions Our results show that each of the five networks compared provides a valuable piece of the puzzle of the complete reconstruction of the human metabolic network. To enable integration of the networks, in addition to standardizing metabolite names and identifiers, the conceptual differences between the databases should be resolved. Considerable manual intervention is required to reach the ultimate goal of a unified and biologically accurate model for studying the systems biology of human metabolism.
Our comparison provides a stepping stone for such an endeavor. PMID:21999653
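The headline overlap figure above (reactions shared by every database, over the combined total) reduces to simple set arithmetic once reactions carry comparable identifiers. The database names and reaction identifiers below are invented for illustration.

```python
# Invented reaction identifiers per database (illustrative only)
databases = {
    "db_A": {"R1", "R2", "R3", "R4"},
    "db_B": {"R2", "R3", "R5"},
    "db_C": {"R3", "R6"},
}

union = set().union(*databases.values())      # combined reaction set
core = set.intersection(*databases.values())  # reactions all databases share

# The abstract's headline figure is this ratio: reactions every
# database agrees on, over the combined total.
agreement = len(core) / len(union)

def jaccard(a, b):
    """Pairwise Jaccard similarity between two reaction sets."""
    return len(a & b) / len(a | b)
```

In practice the hard part is exactly what the abstract highlights: mapping ambiguous metabolite names and missing identifiers so that "the same" reaction actually hashes to the same element in each set.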
The effect of care pathways for hip fractures: a systematic review.
Leigheb, Fabrizio; Vanhaecht, Kris; Sermeus, Walter; Lodewijckx, Cathy; Deneckere, Svin; Boonen, Steven; Boto, Paulo Alexandre Faria; Mendes, Rita Veloso; Panella, Massimiliano
2012-07-01
We performed a systematic review for primary studies on care pathways (CPs) for hip fracture (HF). The online databases MEDLINE-PubMed, Ovid-EMBASE, CINAHL-EBSCO host, and The Cochrane Library (Cochrane Central Register of Clinical Trials, Health Technology Assessment Database, NHS Economic Evaluation Database) were searched. Two researchers reviewed the literature independently. Primary studies that met predefined inclusion criteria were assessed for their methodological quality. A total of 15 publications were included: 15 primary studies corresponding with 12 main investigations. Primary studies were evaluated for clinical outcomes, process outcomes, and economic outcomes. The studies assessed a wide range of outcome measures. While a number of divergent clinical outcomes were reported, most studies showed positive results of process management and health-services utilization. In terms of mortality, the results provided evidence for a positive impact of CPs on in-hospital mortality. Most studies also showed a significantly reduced risk of complications, including medical complications, wound infections, and pressure sores. Moreover, time-span process measures showed that an improvement in the organization of care was achieved through the use of CPs. Conflicting results were observed with regard to functional recovery and mobility between patients treated with CPs compared to usual care. Although our review suggests that CPs can have positive effects in patients with HF, the available evidence is insufficient for formal recommendations. There is a need for more research on CPs with selected process and outcome indicators, for in-hospital and postdischarge management of HF, with an emphasis on well-designed randomized trials.
2010-01-01
Background A plant-based diet protects against chronic oxidative stress-related diseases. Dietary plants contain variable chemical families and amounts of antioxidants. It has been hypothesized that plant antioxidants may contribute to the beneficial health effects of dietary plants. Our objective was to develop a comprehensive food database consisting of the total antioxidant content of typical foods as well as other dietary items such as traditional medicine plants, herbs and spices and dietary supplements. This database is intended for use in a wide range of nutritional research, from in vitro and cell and animal studies, to clinical trials and nutritional epidemiological studies. Methods We procured samples from countries worldwide and assayed the samples for their total antioxidant content using a modified version of the FRAP assay. Results and sample information (such as country of origin, product and/or brand name) were registered for each individual food sample and constitute the Antioxidant Food Table. Results The results demonstrate that there are several thousand-fold differences in antioxidant content of foods. Spices, herbs and supplements include the most antioxidant rich products in our study, some exceptionally high. Berries, fruits, nuts, chocolate, vegetables and products thereof constitute common foods and beverages with high antioxidant values. Conclusions This database is to our best knowledge the most comprehensive Antioxidant Food Database published and it shows that plant-based foods introduce significantly more antioxidants into human diet than non-plant foods. Because of the large variations observed between otherwise comparable food samples the study emphasizes the importance of using a comprehensive database combined with a detailed system for food registration in clinical and epidemiological studies. 
The present antioxidant database is therefore an essential research tool to further elucidate the potential health effects of phytochemical antioxidants in diet. PMID:20096093
WMC Database Evaluation. Case Study Report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Palounek, Andrea P. T
The WMC Database is ultimately envisioned to hold a collection of experimental data, design information, and information from computational models. This project was a first attempt at using the Database to access experimental data and extract information from it. This evaluation shows that the Database concept is sound and robust, and that the Database, once fully populated, should remain eminently usable for future researchers.
Wang, Shuai; Wei, Yan-Zhao; Yang, Jian-Hong; Zhou, Yu-Ming; Cheng, Yu-Hang; Yang, Chao; Zheng, Yi
2017-08-01
The aim was to evaluate the efficacy and safety of aripiprazole for tic disorders (TDs) in children and adolescents. We searched PubMed, Embase, PsycINFO and the Cochrane database, as well as the Chinese databases CNKI, VIP, CBM and Wanfang, from database inception to October 2016; 17 full-text studies (N=1305) were included. A meta-analysis of 10 studies (N=817) showed no significant difference in the reduction of total YGTSS score between aripiprazole and other drugs, and a meta-analysis of 7 studies (N=324) that used tic symptom control ≥30% as the outcome measure showed no significant difference between aripiprazole and other treatments. The most common adverse events (AEs) of aripiprazole were drowsiness, nausea/vomiting and increased appetite, and a meta-analysis that used the TESS scale as the outcome measure showed a significant difference between aripiprazole and haloperidol. In conclusion, these data provide moderate-quality evidence that aripiprazole could be an effective and safe treatment option for TDs; results from further trials are needed to extend this evidence base. Copyright © 2017 Elsevier Ireland Ltd. All rights reserved.
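The pooled comparisons above rest on standard meta-analytic machinery. A minimal fixed-effect (inverse-variance) pooling sketch follows, with invented per-study estimates rather than the review's data; a real analysis would also assess heterogeneity and consider a random-effects model.

```python
import math

# Invented per-study effect estimates (mean difference) and standard errors
studies = [(-1.2, 0.8), (0.4, 0.5), (-0.3, 0.6)]

# Inverse-variance fixed-effect pooling: weight each study by 1 / SE^2.
weights = [1 / se ** 2 for _, se in studies]
pooled = sum(w * est for (est, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

# 95% confidence interval; if it spans 0, the pooled difference is not
# significant, mirroring the "no significant difference" findings above.
ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
significant = not (ci[0] <= 0 <= ci[1])
```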
Effects of pilates on patients with chronic non-specific low back pain: a systematic review
Lin, Hui-Ting; Hung, Wei-Ching; Hung, Jia-Ling; Wu, Pei-Shan; Liaw, Li-Jin; Chang, Jia-Hao
2016-01-01
[Purpose] To evaluate the effects of Pilates on patients with chronic low back pain through a systematic review of high-quality articles on randomized controlled trials. [Subjects and Methods] Keywords and synonyms for “Pilates” and “Chronic low back pain” were used in database searches. The databases included PubMed, Physiotherapy Evidence Database (PEDro), Medline, and the Cochrane Library. Articles involving randomized controlled trials with higher than 5 points on the PEDro scale were reviewed for suitability and inclusion. The methodological quality of the included randomized controlled trials was evaluated using the PEDro scale. Relevant information was extracted by 3 reviewers. [Results] Eight randomized controlled trial articles were included. Patients with chronic low back pain showed statistically significant improvement in pain relief and functional ability compared to patients who only performed usual or routine health care. However, other forms of exercise were similar to Pilates in the improvement of pain relief and functional capacity. [Conclusion] In patients with chronic low back pain, Pilates showed significant improvement in pain relief and functional enhancement. Other exercises showed effects similar to those of Pilates, if waist or torso movement was included and the exercises were performed for 20 cumulative hours. PMID:27821970
NASA Astrophysics Data System (ADS)
Skotniczny, Zbigniew
1989-12-01
The Query by Forms (QbF) system is a user-oriented interactive tool for querying large relational databases with minimal query-definition cost. The system was developed under the assumption that the user's time and effort in defining queries is the most severe bottleneck. It can be applied to any Rdb/VMS database system and is recommended for project-specific information systems in which end-user queries cannot be foreseen. The tool is dedicated to specialists in an application domain who need to analyse the data maintained in a database from any required point of view but who do not need to know commercial database languages. The paper presents the system as a compromise between functionality and usability. User-system communication via a menu-driven, tree-like structure of screen forms, which produces a query definition and its execution, is discussed in detail. Output of query results (printed reports and graphics) is also discussed. Finally, the paper shows one application of QbF to a HERA project.
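At its core, a query-by-forms front end translates filled-in screen forms into a database query, so the end user never writes query-language text. A minimal sketch of that translation step follows, with a hypothetical table and columns; the original QbF targeted Rdb/VMS and its own forms machinery, not the generic parameterized SQL shown here.

```python
def build_query(table, filters):
    """Translate filled-in form fields into a parameterized SQL query —
    the translation step a query-by-forms front end performs (hypothetical
    schema; the original QbF targeted Rdb/VMS rather than generic SQL)."""
    clauses, params = [], []
    for column, value in filters.items():
        clauses.append(f"{column} = ?")
        params.append(value)
    sql = f"SELECT * FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params

# A user filling two form fields yields one parameterized query
sql, params = build_query("experiments", {"detector": "HERA", "year": 1989})
```

Parameterizing the values (rather than splicing them into the string) is what keeps such generated queries safe and plannable by the database engine.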
Authentication Based on Pole-zero Models of Signature Velocity
Rashidi, Saeid; Fallah, Ali; Towhidkhah, Farzad
2013-01-01
With the growth of communication and financial transactions over the internet, online signature verification is an accepted biometric technology for access control and plays a significant role in authentication and authorization in modern society. Fast and precise algorithms for signature verification are therefore very attractive. The goal of this paper is to model the velocity signal, whose pattern and properties are stable for each person. Using pole-zero models based on the discrete cosine transform, a precise modeling method is proposed, and features are then extracted from strokes. Using linear, Parzen-window and support vector machine classifiers, the signature verification technique was tested on a large number of authentic and forged signatures and demonstrated good potential. The signatures were collected from three different databases: a proprietary database, and the SVC2004 and Sabanci University (SUSIG) benchmark signature databases. Experimental results on the Persian, SVC2004 and SUSIG databases show that our method achieves equal error rates of 5.91%, 5.62% and 3.91% on skilled forgeries, respectively. PMID:24696797
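The paper's pole-zero modeling is more involved, but the role of the discrete cosine transform on the velocity signal can be illustrated directly: compute the pen's tangential velocity from sampled coordinates, then keep a few low-order DCT coefficients as a compact stroke descriptor. The trajectory values below are invented.

```python
import math

def dct2(signal):
    """Type-II discrete cosine transform (unnormalized), computed directly."""
    n = len(signal)
    return [sum(x * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i, x in enumerate(signal)) for k in range(n)]

def velocity(xs, ys):
    """Tangential pen velocity from successive (x, y) samples."""
    return [math.hypot(x1 - x0, y1 - y0)
            for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:]))]

# Invented pen trajectory; the first few DCT coefficients of the velocity
# signal serve as a compact feature vector for a stroke.
xs = [0, 1, 3, 6, 8, 9, 9, 8]
ys = [0, 2, 3, 3, 2, 0, -2, -3]
feat = dct2(velocity(xs, ys))[:4]
```

Truncating the DCT keeps the slowly varying shape of the velocity profile (which tends to be stable per writer) while discarding high-frequency jitter.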
Minervino, Aline Costa; Duarte, Elisabeth Carmen
2016-03-01
This article outlines the results of a descriptive study that analyses loss and damage caused by hydrometeorological disasters in Brazil between 2010 and 2014 using the EM-DAT (global) and S2iD (national) databases. The analysis shows major differences between the databases in the total number of disaster events recorded (EM-DAT = 36; S2iD = 4,070) and in the estimated costs of loss and damage (EM-DAT = R$9.2 billion; S2iD = R$331.4 billion). The analysis also shows that the five states most affected by these events are Santa Catarina, Rio Grande do Sul, Minas Gerais, São Paulo and Paraná, in Brazil's South and Southeast regions, and that these results are consistent with the findings of other studies. The costs of disasters were highest for housing, public infrastructure works, collectively used public facilities, other public service facilities, and state health and education facilities. The costs associated with public health facilities were also high. Despite their limitations, both databases demonstrated their usefulness for determining seasonal and long-term trends, patterns and risk areas, and thus for assisting decision makers in identifying the areas most affected by and vulnerable to natural disasters.
Cricket: A Mapped, Persistent Object Store
NASA Technical Reports Server (NTRS)
Shekita, Eugene; Zwilling, Michael
1996-01-01
This paper describes Cricket, a new database storage system that is intended to be used as a platform for design environments and persistent programming languages. Cricket uses the memory management primitives of the Mach operating system to provide the abstraction of a shared, transactional single-level store that can be directly accessed by user applications. In this paper, we present the design and motivation for Cricket. We also present some initial performance results which show that, for its intended applications, Cricket can provide better performance than a general-purpose database storage system.
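The single-level-store idea behind Cricket — persistent data accessed directly as ordinary memory — can be approximated in user space with `mmap`. The sketch below uses a hypothetical file name and plain POSIX-style mapping; it is not Cricket's Mach-based implementation, and it omits the transactional machinery (logging, page protection) a real store needs.

```python
import mmap
import os
import struct

PATH = "store.db"  # hypothetical backing file
SIZE = 4096

# Create a fixed-size backing file, then map it so the application reads
# and writes persistent records as if they were ordinary memory — the
# "single-level store" idea behind mapped object stores.
with open(PATH, "wb") as f:
    f.truncate(SIZE)

with open(PATH, "r+b") as f:
    mem = mmap.mmap(f.fileno(), SIZE)
    struct.pack_into("ii", mem, 0, 42, 7)   # update an object in place
    mem.flush()                             # force the page back to disk
    a, b = struct.unpack_from("ii", mem, 0)
    mem.close()

os.remove(PATH)
```

The attraction, as in Cricket, is that applications dereference into the store without per-access buffer-manager calls; the cost is that durability and recovery must be layered on top of the raw mapping.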
A humming retrieval system based on music fingerprint
NASA Astrophysics Data System (ADS)
Han, Xingkai; Cao, Baiyu
2011-10-01
In this paper, we propose an improved music information retrieval method based on music fingerprints. The goal of this method is to represent music by compressed musical information. Using selected MIDI files, generated automatically as our target music database, we evaluate the accuracy, effectiveness, and efficiency of the method. In this research we not only extract a feature sequence that effectively represents each file from both the query and the melody database, but also make it possible to retrieve results in an innovative way. We also investigate the influence of noise on the performance of our system. Experimental results show a retrieval accuracy of up to 91% in the noise-free case.
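A music fingerprint can be any compressed representation of a melody. One classic, simple choice (an illustration, not this paper's actual fingerprint) is the pitch contour — the up/down/repeat direction between successive notes — which also makes humming queries transposition-invariant. The melody data below is invented.

```python
def contour(pitches):
    """Pitch contour fingerprint: direction of movement between notes
    (U = up, D = down, R = repeat) — one simple compressed representation."""
    return "".join("U" if b > a else "D" if b < a else "R"
                   for a, b in zip(pitches, pitches[1:]))

# Hypothetical melody database keyed by MIDI note-number sequences
melodies = {
    "tune_a": [60, 62, 64, 62, 60],
    "tune_b": [60, 60, 67, 67, 69, 69, 67],
}
index = {name: contour(p) for name, p in melodies.items()}

def retrieve(query_pitches):
    """Return melodies whose fingerprint contains the hummed query's."""
    q = contour(query_pitches)
    return [name for name, fp in index.items() if q in fp]

hits = retrieve([62, 64, 62])  # hummed fragment; works at any transposition
```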
An evaluation of information retrieval accuracy with simulated OCR output
DOE Office of Scientific and Technical Information (OSTI.GOV)
Croft, W.B.; Harding, S.M.; Taghva, K.
Optical Character Recognition (OCR) is a critical part of many text-based applications. Although some commercial systems use the output from OCR devices to index documents without editing, there is very little quantitative data on the impact of OCR errors on the accuracy of a text retrieval system. Because of the difficulty of constructing test collections to obtain this data, we have carried out evaluation using simulated OCR output on a variety of databases. The results show that high quality OCR devices have little effect on the accuracy of retrieval, but low quality devices used with databases of short documents can result in significant degradation.
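Simulated OCR output of the kind used in such evaluations can be produced by injecting character-level errors at a chosen rate. The confusion table below is an invented illustration, not the study's error model; real simulations are usually calibrated against measured device error distributions.

```python
import random

def simulate_ocr(text, error_rate, seed=0):
    """Corrupt text with character-level substitutions or deletions at a
    given rate — one way to produce simulated OCR output (the confusion
    table is an invented illustration, not the study's error model)."""
    rng = random.Random(seed)
    confusions = {"l": "1", "O": "0", "e": "c", "m": "rn"}
    out = []
    for ch in text:
        if rng.random() < error_rate:
            out.append(confusions.get(ch, ""))  # substitute, or drop the char
        else:
            out.append(ch)
    return "".join(out)

clean = "retrieval accuracy is robust to moderate OCR noise"
noisy = simulate_ocr(clean, error_rate=0.1)
```

Indexing the corrupted corpus and re-running the same queries then quantifies how retrieval accuracy degrades as the simulated error rate rises.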
Construction of databases: advances and significance in clinical research.
Long, Erping; Huang, Bingjie; Wang, Liming; Lin, Xiaoyu; Lin, Haotian
2015-12-01
Widely used in clinical research, the database is an automated data-management technology and the most efficient tool for managing data. In this article, we first explain basic concepts such as the definition, classification, and establishment of databases. We then present the workflow for establishing databases, inputting data, verifying data, and managing databases. By discussing the application of databases in clinical research, we illustrate their important role in clinical research practice. Lastly, we introduce the reanalysis of randomized controlled trials (RCTs) and cloud computing techniques, showing the most recent advances in the use of databases in clinical research.
Expediting topology data gathering for the TOPDB database.
Dobson, László; Langó, Tamás; Reményi, István; Tusnády, Gábor E
2015-01-01
The Topology Data Bank of Transmembrane Proteins (TOPDB, http://topdb.enzim.ttk.mta.hu) contains experimentally determined topology data of transmembrane proteins. Recently, we have updated TOPDB from several sources and utilized a newly developed topology prediction algorithm to determine the most reliable topology using the results of experiments as constraints. In addition to collecting the experimentally determined topology data published in the last couple of years, we gathered topographies defined by the TMDET algorithm using 3D structures from the PDBTM. Results of global topology analysis of various organisms as well as topology data generated by high throughput techniques, like the sequential positions of N- or O-glycosylations were incorporated into the TOPDB database. Moreover, a new algorithm was developed to integrate scattered topology data from various publicly available databases and a new method was introduced to measure the reliability of predicted topologies. We show that reliability values highly correlate with the per protein topology accuracy of the utilized prediction method. Altogether, more than 52,000 new topology data and more than 2600 new transmembrane proteins have been collected since the last public release of the TOPDB database. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
SING: Subgraph search In Non-homogeneous Graphs
2010-01-01
Background Finding the subgraphs of a graph database that are isomorphic to a given query graph has practical applications in several fields, from cheminformatics to image understanding. Since subgraph isomorphism is a computationally hard problem, indexing techniques have been intensively exploited to speed up the process. Such systems filter out those graphs which cannot contain the query, and apply a subgraph isomorphism algorithm to each residual candidate graph. The applicability of such systems is limited to databases of small graphs, because their filtering power degrades on large graphs. Results In this paper, SING (Subgraph search In Non-homogeneous Graphs), a novel indexing system able to cope with large graphs, is presented. The method uses the notion of feature, which can be a small subgraph, subtree or path. Each graph in the database is annotated with the set of all its features. The key point is to make use of feature locality information. This idea is used to both improve the filtering performance and speed up the subgraph isomorphism task. Conclusions Extensive tests on chemical compounds, biological networks and synthetic graphs show that the proposed system outperforms the most popular systems in query time over databases of medium and large graphs. Other specific tests show that the proposed system is effective for single large graphs. PMID:20170516
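The filter-and-verify strategy described above can be sketched with a trivial feature index: each graph is annotated with its feature set (here just labeled edges, standing in for the paths, subtrees and small subgraphs SING enumerates), and a graph survives filtering only if its features cover the query's. Graphs and labels below are invented.

```python
def edge_features(edges):
    """Feature set = labeled edges; a stand-in for the paths, subtrees and
    small subgraphs a real index such as SING enumerates."""
    return {tuple(sorted(e)) for e in edges}

# Toy database of graphs given as edge lists over labeled nodes
db = {
    "g1": [("A", "B"), ("B", "C"), ("C", "D")],
    "g2": [("A", "C"), ("C", "D")],
    "g3": [("A", "B"), ("B", "C")],
}
index = {name: edge_features(edges) for name, edges in db.items()}

def candidates(query_edges):
    """Filtering step: keep only graphs whose feature set covers the
    query's. A full system then runs subgraph isomorphism on survivors."""
    qf = edge_features(query_edges)
    return [name for name, feats in index.items() if qf <= feats]

hits = candidates([("A", "B"), ("B", "C")])
```

The filtering is sound (no true match is discarded) but not complete: survivors may still fail the expensive isomorphism check, which is why richer, locality-aware features pay off on large graphs.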
DOE Office of Scientific and Technical Information (OSTI.GOV)
Muñoz-Jaramillo, Andrés; Windmueller, John C.; Amouzou, Ernest C.
2015-02-10
In this work, we take advantage of 11 different sunspot group, sunspot, and active region databases to characterize the area and flux distributions of photospheric magnetic structures. We find that, when taken separately, different databases are better fitted by different distributions (as has been reported previously in the literature). However, we find that all our databases can be reconciled by the simple application of a proportionality constant, and that, in reality, different databases are sampling different parts of a composite distribution. This composite distribution is made up of a linear combination of Weibull and log-normal distributions, where a pure Weibull (log-normal) characterizes the distribution of structures with fluxes below (above) 10^21 Mx (10^22 Mx). Additionally, we demonstrate that the Weibull distribution shows the expected linear behavior of a power-law distribution (when extended to smaller fluxes), making our results compatible with the results of Parnell et al. We propose that this is evidence of two separate mechanisms giving rise to visible structures on the photosphere: one directly connected to the global component of the dynamo (and the generation of bipolar active regions), and the other to the small-scale component of the dynamo (and the fragmentation of magnetic structures due to their interaction with turbulent convection).
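The composite distribution — a weighted Weibull plus log-normal — can be written down directly. The weight, shape and scale parameters below are illustrative placeholders, not the paper's fitted values; they are chosen only so the Weibull term dominates at small fluxes and the log-normal at large fluxes, as the abstract describes.

```python
import math

def weibull_pdf(x, k, lam):
    """Weibull probability density with shape k and scale lam."""
    return (k / lam) * (x / lam) ** (k - 1) * math.exp(-((x / lam) ** k))

def lognorm_pdf(x, mu, sigma):
    """Log-normal probability density."""
    return math.exp(-((math.log(x) - mu) ** 2) / (2 * sigma ** 2)) / (
        x * sigma * math.sqrt(2 * math.pi))

def composite_pdf(x, w=0.7, k=0.6, lam=1e20, mu=math.log(3e22), sigma=1.0):
    """Linear combination of a Weibull (dominant at small fluxes) and a
    log-normal (dominant at large fluxes). All parameter values here are
    illustrative placeholders, not the paper's fitted values."""
    return w * weibull_pdf(x, k, lam) + (1 - w) * lognorm_pdf(x, mu, sigma)
```

With a shape parameter k < 1, the Weibull density behaves like a power law x^(k-1) at small x, which is the linear log-log behavior the abstract invokes for compatibility with Parnell et al.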
Computer-Assisted Classification Patterns in Autoimmune Diagnostics: The AIDA Project
Benammar Elgaaied, Amel; Cascio, Donato; Bruno, Salvatore; Ciaccio, Maria Cristina; Cipolla, Marco; Fauci, Alessandro; Morgante, Rossella; Taormina, Vincenzo; Gorgi, Yousr; Marrakchi Triki, Raja; Ben Ahmed, Melika; Louzir, Hechmi; Yalaoui, Sadok; Imene, Sfar; Issaoui, Yassine; Abidi, Ahmed; Ammar, Myriam; Bedhiafi, Walid; Ben Fraj, Oussama; Bouhaha, Rym; Hamdi, Khouloud; Soumaya, Koudhi; Neili, Bilel; Asma, Gati; Lucchese, Mariano; Catanzaro, Maria; Barbara, Vincenza; Brusca, Ignazio; Fregapane, Maria; Amato, Gaetano; Friscia, Giuseppe; Neila, Trai; Turkia, Souayeh; Youssra, Haouami; Rekik, Raja; Bouokez, Hayet; Vasile Simone, Maria; Fauci, Francesco; Raso, Giuseppe
2016-01-01
Antinuclear antibodies (ANAs) are significant biomarkers in the diagnosis of human autoimmune diseases, detected by means of the indirect immunofluorescence (IIF) method and assessed by analyzing fluorescence patterns and intensity. This paper introduces the AIDA project (autoimmunity: diagnosis assisted by computer), developed in the framework of an Italy-Tunisia cross-border cooperation, and its preliminary results. A database of interpreted IIF images is being collected through image exchange and double reporting, and a gold standard database containing around 1,000 double-reported images has been established. The gold standard database is used to optimize a CAD (computer-aided detection) system and to assess its added value, so that it can be applied alongside an immunologist as a second reader in the detection of autoantibodies. The CAD system identifies both the fluorescence intensity and the fluorescence pattern in IIF images. Preliminary results show that the CAD, used as a second reader, performed better than junior immunologists and hence may significantly improve their efficacy: compared with two junior immunologists, the CAD system showed higher intensity accuracy (85.5% versus 66.0% and 66.0%), higher pattern accuracy (79.3% versus 48.0% and 66.2%), and higher mean class accuracy (79.4% versus 56.7% and 64.2%). PMID:27042658
[Relational database for urinary stone ambulatory consultation. Assessment of initial outcomes].
Sáenz Medina, J; Páez Borda, A; Crespo Martinez, L; Gómez Dos Santos, V; Barrado, C; Durán Poveda, M
2010-05-01
To create a relational database for monitoring lithiasis patients, we describe its architecture and the initial results of a statistical analysis. Microsoft Access 2002 was used as the platform. Four tables were constructed to gather demographic data (table 1), clinical and laboratory findings (table 2), stone features (table 3), and therapeutic approach (table 4). For a reliability analysis of the database, the number of correctly stored data items was counted. To evaluate its performance, a prospective analysis was conducted from May 2004 to August 2009 on 171 patients who were stone-free after treatment (ESWL, surgery or medical therapy), out of a total of 511 patients stored in the database. Lithiasis status (stone-free or stone relapse) was the primary end point, while demographic factors (age, gender), lithiasis history, upper urinary tract alterations, and stone characteristics (side, location, composition and size) were considered as predictive factors. A univariate analysis was conducted with the chi-square test and supplemented by Kaplan-Meier estimates of time to stone recurrence; a multiple Cox proportional hazards regression model was then generated to jointly assess the prognostic value of the demographic factors and the predictive value of stone characteristics. For the reliability analysis, 22,084 data items were available, corresponding to 702 consultations on 511 patients. The analysis showed a recurrence rate of 85.4% (146/171; median time to recurrence 608 days, range 70-1758). In the univariate and multivariate analyses, none of the factors considered had a significant effect on the recurrence rate (p = ns). The relational database is useful for monitoring patients with urolithiasis: it allows easy control and updating, as well as data storage for later use. The analysis conducted for its evaluation showed no influence of demographic factors or stone features on stone recurrence.
Uddin, Md Jamal; Groenwold, Rolf H H; de Boer, Anthonius; Gardarsdottir, Helga; Martin, Elisa; Candore, Gianmario; Belitser, Svetlana V; Hoes, Arno W; Roes, Kit C B; Klungel, Olaf H
2016-03-01
Instrumental variable (IV) analysis can control for unmeasured confounding, yet it has not been widely used in pharmacoepidemiology. We aimed to assess the performance of IV analysis using different IVs in multiple databases in a study of antidepressant use and hip fracture. Information on adults with at least one prescription of a selective serotonin reuptake inhibitor (SSRI) or tricyclic antidepressant (TCA) during 2001-2009 was extracted from the THIN (UK), BIFAP (Spain), and Mondriaan (Netherlands) databases. IVs were created using the proportion of SSRI prescriptions per practice or using the one, five, or ten previous prescriptions by a physician. Data were analysed using conventional Cox regression and two-stage IV models. In the conventional analysis, SSRI (vs. TCA) was associated with an increased risk of hip fracture, which was consistently found across databases: the adjusted hazard ratio (HR) was approximately 1.35 for time-fixed and 1.50 to 2.49 for time-varying SSRI use, while the IV analysis based on the IVs that appeared to satisfy the IV assumptions showed conflicting results, e.g. the adjusted HRs ranged from 0.55 to 2.75 for time-fixed exposure. IVs for time-varying exposure violated at least one IV assumption and were therefore invalid. This multiple database study shows that the performance of IV analysis varied across the databases for time-fixed and time-varying exposures and strongly depends on the definition of IVs. It remains challenging to obtain valid IVs in pharmacoepidemiological studies, particularly for time-varying exposure, and IV analysis should therefore be interpreted cautiously. Copyright © 2016 John Wiley & Sons, Ltd.
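As a toy illustration of the two-stage IV idea discussed above (plain least squares on simulated data, not the Cox models of the study), the instrument recovers a true exposure effect that a naive regression misses when a confounder is unmeasured. All numbers here are simulated:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
z = rng.normal(size=n)                    # instrument (e.g. prescribing preference)
u = rng.normal(size=n)                    # unmeasured confounder
x = 0.8 * z + u + rng.normal(size=n)      # exposure driven by IV and confounder
y = 0.5 * x + u + rng.normal(size=n)      # outcome; true exposure effect = 0.5

def ols(design, target):
    """Least-squares coefficients for a single regressor plus intercept."""
    X = np.column_stack([np.ones(len(target)), design])
    return np.linalg.lstsq(X, target, rcond=None)[0]

# Naive regression is biased upward by the shared confounder u.
naive = ols(x, y)[1]

# Two-stage least squares: stage 1 predicts exposure from the instrument,
# stage 2 regresses the outcome on the predicted exposure.
x_hat = np.column_stack([np.ones(n), z]) @ ols(z, x)
iv_est = ols(x_hat, y)[1]

print(round(naive, 2), round(iv_est, 2))  # naive ≈ 0.88, IV estimate ≈ 0.5
```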
Database interfaces on NASA's heterogeneous distributed database system
NASA Technical Reports Server (NTRS)
Huang, Shou-Hsuan Stephen
1987-01-01
The purpose of the Distributed Access View Integrated Database (DAVID) interface module (Module 9: Resident Primitive Processing Package) is to provide data transfer between local DAVID systems and resident Database Management Systems (DBMSs). The results of the current research are summarized, and a detailed description of the interface module is provided. Several Pascal templates were constructed. The Resident Processor program was also developed. Even though it is designed for the Pascal templates, it can be modified for templates in other languages, such as C, without much difficulty. The Resident Processor itself can be written in any programming language. Since the Module 5 routines are not ready yet, there is no way to test the interface module. However, simulation shows that the database access programs produced by the Resident Processor do work according to the specifications.
Searching mixed DNA profiles directly against profile databases.
Bright, Jo-Anne; Taylor, Duncan; Curran, James; Buckleton, John
2014-03-01
DNA databases have revolutionised forensic science. They are a powerful investigative tool as they have the potential to identify persons of interest in criminal investigations. Routinely, a DNA profile generated from a crime sample could only be searched for in a database of individuals if the stain was from a single contributor (single source) or if a contributor could unambiguously be determined from a mixed DNA profile. This meant that a significant number of samples were unsuitable for database searching. The advent of continuous methods for the interpretation of DNA profiles offers an advanced way to draw inferential power from the considerable investment made in DNA databases. Using these methods, each profile on the database may be considered a possible contributor to a mixture and a likelihood ratio (LR) can be formed. Those profiles which produce a sufficiently large LR can serve as an investigative lead. In this paper, empirical studies are described to determine what constitutes a large LR. We investigate the effect on a database search of complex mixed DNA profiles with contributors in equal proportions with dropout as a consideration, and also the effect of an incorrect assignment of the number of contributors to a profile. In addition, we give, as a demonstration of the method, the results using two crime samples that were previously unsuitable for database comparison. We show that effective management of the selection of samples for searching and the interpretation of the output can be highly informative. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
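As a much simpler stand-in for the continuous method described above, an inclusion-style screen illustrates the idea of scoring every database profile against a mixture: a profile consistent with the mixture at every locus gets LR = 1 / P(a random profile is consistent), others get 0. This toy ignores dropout and peak heights; the loci, alleles, and frequencies are invented:

```python
# Invented allele frequencies for two loci, and an invented mixture
# (the set of alleles observed in the crime-scene profile per locus).
freqs = {
    "D3":   {"14": 0.1, "15": 0.3, "16": 0.25, "17": 0.35},
    "TH01": {"6": 0.2, "7": 0.15, "9": 0.3, "9.3": 0.35},
}
mixture = {"D3": {"14", "15", "16"}, "TH01": {"6", "9.3"}}

def lr(profile):
    """Inclusion-style LR: 1 over the chance that a random genotype is
    consistent with the mixture at every locus; 0 if excluded anywhere."""
    value = 1.0
    for locus, (a, b) in profile.items():
        if not {a, b} <= mixture[locus]:
            return 0.0            # allele outside the mixture: excluded
        p = sum(freqs[locus][al] for al in mixture[locus])
        value /= p * p            # chance a random allele pair fits
    return value

database = {
    "s1": {"D3": ("14", "15"), "TH01": ("6", "9.3")},   # consistent
    "s2": {"D3": ("14", "17"), "TH01": ("6", "9.3")},   # excluded at D3
}
for name, prof in database.items():
    print(name, round(lr(prof), 1))
```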
Teaching Advanced SQL Skills: Text Bulk Loading
ERIC Educational Resources Information Center
Olsen, David; Hauser, Karina
2007-01-01
Studies show that advanced database skills are important for students to be prepared for today's highly competitive job market. A common task for database administrators is to insert a large amount of data into a database. This paper illustrates how an up-to-date, advanced database topic, namely bulk insert, can be incorporated into a database…
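A minimal sketch of the bulk-loading pattern the abstract refers to, shown with Python's `sqlite3` (the paper itself targets SQL-level bulk insert; the table and data here are invented): parse a block of CSV text and insert all rows in one batch inside a single transaction.

```python
import csv
import io
import sqlite3

# Toy "text bulk load": 10,000 rows of CSV text inserted in one batch
# inside a single transaction, the pattern bulk-insert commands optimize.
csv_text = "id,name\n" + "\n".join(f"{i},item{i}" for i in range(10_000))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")

rows = list(csv.reader(io.StringIO(csv_text)))[1:]   # skip the header row
with conn:                                           # one transaction
    conn.executemany("INSERT INTO item VALUES (?, ?)", rows)

print(conn.execute("SELECT COUNT(*) FROM item").fetchone()[0])  # → 10000
```

Batching inside one transaction avoids a commit per row, which is typically the dominant cost when loading large datasets.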
The Space Systems Environmental Test Facility Database (SSETFD), Website Development Status
NASA Technical Reports Server (NTRS)
Snyder, James M.
2008-01-01
The Aerospace Corporation has been developing a database of U.S. environmental test laboratory capabilities utilized by the space systems hardware development community. To date, 19 sites have been visited by The Aerospace Corporation and verbal agreements reached to include their capability descriptions in the database. A website is being developed to make this database accessible to all interested government, civil, university and industry personnel. The website will be accessible to all those interested in learning more about the extensive collective capability that the US-based space industry has to offer. The Environments, Test & Assessment Department within The Aerospace Corporation will be responsible for overall coordination and maintenance of the database. Several US government agencies are interested in utilizing this database to assist in the source selection process for future spacecraft programs. This paper introduces the website by providing an overview of its development, location and search capabilities. It will show how the aerospace community can apply this new tool as a way to increase the utilization of existing lab facilities, and as a starting point for capital expenditure/upgrade trade studies. The long term result is expected to be increased utilization of existing laboratory capability and reduced overall development cost of space systems hardware. Finally, the paper will present the process for adding new participants, and how the database will be maintained.
Maintaining Multimedia Data in a Geospatial Database
2012-09-01
… excelled given multiple conditions. A different look at PostgreSQL and MySQL as spatial databases was offered. Given their results, as each database produced result sets from zero to 100,000, it was …
On patterns and re-use in bioinformatics databases.
Bell, Michael J; Lord, Phillip
2017-09-01
As the quantity of data being deposited into biological databases continues to increase, it becomes ever more vital to develop methods that enable us to understand this data and ensure that the knowledge is correct. It is widely held that data percolates between different databases, which causes particular concerns for data correctness; if this percolation occurs, incorrect data in one database may eventually affect many others while, conversely, corrections in one database may fail to percolate to others. In this paper, we test this widely held belief by directly looking for sentence reuse both within and between databases. Further, we investigate patterns of how sentences are reused over time. Finally, we consider the limitations of this form of analysis and the implications that this may have for bioinformatics database design. We show that reuse of annotation is common within many different databases, and that there is also a detectable level of reuse between databases. In addition, we show that there are patterns of reuse that have previously been shown to be associated with percolation errors. Analytical software is available on request. phillip.lord@newcastle.ac.uk. © The Author(s) 2017. Published by Oxford University Press.
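The sentence-reuse detection described above can be sketched as a simple normalize-and-compare scan; shared normalized sentences across sources indicate cross-database reuse (the database names and annotation sentences below are invented):

```python
from collections import defaultdict

# Toy reuse scan: normalize each annotation sentence and record which
# databases contain it; keys seen in more than one database are reused.
databases = {
    "db_A": ["Binds calcium ions.", "May play a role in signal transduction."],
    "db_B": ["May play a role in signal transduction.", "Involved in apoptosis."],
}

seen = defaultdict(set)
for db, sentences in databases.items():
    for s in sentences:
        seen[s.lower().strip()].add(db)   # crude normalization

reused = [s for s, dbs in seen.items() if len(dbs) > 1]
print(reused)  # → ['may play a role in signal transduction.']
```

A production scan would normalize more aggressively (punctuation, whitespace, case) and hash sentences to keep memory bounded, but the logic is the same.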
NASA Astrophysics Data System (ADS)
Cui, Chen; Asari, Vijayan K.
2014-03-01
Biometric features such as fingerprints, iris patterns, and face features help to identify people and restrict access to secure areas by performing advanced pattern analysis and matching. Face recognition is one of the most promising biometric methodologies for human identification in a non-cooperative security environment. However, the recognition results obtained by face recognition systems are affected by several variations that may happen to the patterns in an unrestricted environment. As a result, several algorithms have been developed for extracting different facial features for face recognition. Due to the various possible challenges of data captured at different lighting conditions, viewing angles, facial expressions, and partial occlusions in natural environmental conditions, automatic facial recognition still remains a difficult issue that needs to be resolved. In this paper, we propose a novel approach to tackling some of these issues by analyzing the local textural descriptions for facial feature representation. The textural information is extracted by an enhanced local binary pattern (ELBP) description of all the local regions of the face. The relationship of each pixel with respect to its neighborhood is extracted and employed to calculate the new representation. ELBP reconstructs a much better textural feature extraction vector from an original gray level image in different lighting conditions. The dimensionality of the texture image is reduced by principal component analysis performed on each local face region. Each low dimensional vector representing a local region is then weighted based on the significance of the sub-region. The weight of each sub-region is determined by employing the local variance estimate of the respective region, which represents the significance of the region. The final facial textural feature vector is obtained by concatenating the reduced dimensional weight sets of all the modules (sub-regions) of the face image. 
Experiments conducted on various popular face databases show promising performance of the proposed algorithm in varying lighting, expression, and partial occlusion conditions. Four databases were used for testing the performance of the proposed system: Yale Face database, Extended Yale Face database B, Japanese Female Facial Expression database, and CMU AMP Facial Expression database. The experimental results in all four databases show the effectiveness of the proposed system. Also, the computation cost is lower because of the simplified calculation steps. Research work is progressing to investigate the effectiveness of the proposed face recognition method on pose-varying conditions as well. It is envisaged that a multilane approach of trained frameworks at different pose bins and an appropriate voting strategy would lead to a good recognition rate in such situations.
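The textural description underlying ELBP builds on the classic 8-neighbour local binary pattern; a minimal sketch of that base operator (not the enhanced variant itself) on a single 3×3 patch:

```python
import numpy as np

def lbp(image):
    """Classic 8-neighbour local binary pattern on interior pixels:
    each neighbour contributes a bit set when it is >= the centre."""
    img = np.asarray(image, dtype=np.int32)
    center = img[1:-1, 1:-1]
    # Neighbour offsets clockwise from top-left; bit weights 1, 2, ... 128.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neigh = img[1 + dy : img.shape[0] - 1 + dy,
                    1 + dx : img.shape[1] - 1 + dx]
        code |= (neigh >= center).astype(np.int32) << bit
    return code

patch = [[5, 4, 3],
         [4, 4, 1],
         [2, 0, 6]]
print(lbp(patch))  # → [[147]]
```

Histograms of these per-pixel codes over local face regions form the texture descriptor that the paper's enhanced variant refines.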
IUEAGN: A database of ultraviolet spectra of active galactic nuclei
NASA Technical Reports Server (NTRS)
Pike, G.; Edelson, R.; Shull, J. M.; Saken, J.
1993-01-01
In 13 years of operation, IUE has gathered approximately 5000 spectra of almost 600 Active Galactic Nuclei (AGN). In order to undertake AGN studies which require large amounts of data, we are consistently reducing this entire archive and creating a homogeneous, easy-to-use database. First, the spectra are extracted using the Optimal extraction algorithm. Continuum fluxes are then measured across predefined bands, and line fluxes are measured with a multi-component fit. These results, along with source information such as redshifts and positions, are placed in the IUEAGN relational database. Analysis algorithms, statistical tests, and plotting packages run within the structure, and this flexible database can accommodate future data when they are released. This archival approach has already been used to survey line and continuum variability in six bright Seyfert 1s and rapid continuum variability in 14 blazars. Among the results that could only be obtained using a large archival study is evidence that blazars show a positive correlation between degree of variability and apparent luminosity, while Seyfert 1s show an anti-correlation. This suggests that beaming dominates the ultraviolet properties for blazars, while thermal emission from an accretion disk dominates for Seyfert 1s. Our future plans include a survey of line ratios in Seyfert 1s, to be fitted with photoionization models to test the models and determine the range of temperatures, densities and ionization parameters. We will also include data from IRAS, Einstein, EXOSAT, and ground-based telescopes to measure multi-wavelength correlations and broadband spectral energy distributions.
Fangerau, H
2004-01-01
Objectives: In this study the author aimed to provide information for researchers to help them with the selection of suitable databases for finding medical ethics literature. The quantity of medical ethical literature that is indexed in different existing electronic bibliographies was ascertained. Method: Using the international journal index Ulrich's Periodicals Directory, journals on medical ethics were identified. The electronic bibliographies indexing these journals were analysed. In an additional analysis documentalists indexing bioethical literature were asked to name European journals on medical ethics. The bibliographies indexing these journals were examined. Results: Of 290 journals on medical ethics 173 were indexed in at least one bibliography. Current Contents showed the highest coverage with 66 (22.8%) journals indexed followed by MEDLINE (22.1%). By a combined search in the top ten bibliographies with the highest coverage, a maximum coverage of 45.2% of all journals could be reached. All the bibliographies showed a tendency to index more North American than European literature. This result was verified by the supplementary analysis of a sample of continental European journals. Here EMBASE covered the highest number of journals (20.6%) followed by the Russian Academy of Sciences Bibliographies (19.2%). Conclusion: A medical ethics literature search has to be carried out in several databases in order to reach an adequate collection of literature. The databases one wishes to combine should be carefully chosen. There seems to be a regional bias in the most popular databases, favouring North American periodicals compared with European literature on medical ethics. PMID:15173367
Application of kernel functions for accurate similarity search in large chemical databases.
Wang, Xiaohong; Huan, Jun; Smalter, Aaron; Lushington, Gerald H
2010-04-29
Similarity search in chemical structure databases is an important problem with many applications in chemical genomics, drug design, and efficient chemical probe screening among others. It is widely believed that structure-based methods provide an efficient way to do the query. Recently, various graph kernel functions have been designed to capture the intrinsic similarity of graphs. Though successful in constructing accurate predictive and classification models, graph kernel functions cannot be applied to large chemical compound databases due to the high computational complexity and the difficulties in indexing similarity search for large databases. To bridge graph kernel functions and similarity search in chemical databases, we applied a novel kernel-based similarity measurement, developed by our team, to measure similarity of graph-represented chemicals. In our method, we utilize a hash table to support the new graph kernel function definition, efficient storage and fast search. We have applied our method, named G-hash, to large chemical databases. Our results show that the G-hash method achieves state-of-the-art performance for k-nearest neighbor (k-NN) classification. Moreover, the similarity measurement and the index structure are scalable to large chemical databases with smaller indexing size and faster query processing time as compared to state-of-the-art indexing methods such as Daylight fingerprints, C-tree and GraphGrep. Efficient similarity query processing for large chemical databases is challenging since we need to balance running time efficiency and similarity search accuracy. Our similarity search method, G-hash, provides a new way to perform similarity search in chemical databases. Experimental study validates the utility of G-hash in chemical databases.
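The hash-table idea behind such indexing can be caricatured in a few lines: compounds are reduced to feature sets, features hash to buckets, and a query only scores compounds sharing at least one bucket. The feature names and the Tanimoto scoring below are invented stand-ins, not the paper's actual kernel:

```python
from collections import defaultdict

# Invented substructure features per compound (stand-ins for the
# node-feature vectors a real graph kernel would use).
compounds = {
    "c1": {"C-C", "C=O", "C-N"},
    "c2": {"C-C", "C-O"},
    "c3": {"N=N", "S-S"},
}

# Inverted index: feature -> compounds containing it (the "hash table").
index = defaultdict(set)
for cid, feats in compounds.items():
    for f in feats:
        index[f].add(cid)

def knn(query_feats, k=2):
    """Score only compounds sharing a bucket with the query, then rank."""
    candidates = set().union(*(index[f] for f in query_feats if f in index))
    def tanimoto(cid):
        feats = compounds[cid]
        return len(feats & query_feats) / len(feats | query_feats)
    return sorted(candidates, key=tanimoto, reverse=True)[:k]

print(knn({"C-C", "C=O"}))  # → ['c1', 'c2']
```

Compound `c3` shares no feature bucket with the query and is never scored, which is where the speedup over exhaustive comparison comes from.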
Bohl, Daniel D; Russo, Glenn S; Basques, Bryce A; Golinvaux, Nicholas S; Fu, Michael C; Long, William D; Grauer, Jonathan N
2014-12-03
There has been an increasing use of national databases to conduct orthopaedic research. Questions regarding the validity and consistency of these studies have not been fully addressed. The purpose of this study was to test for similarity in reported measures between two national databases commonly used for orthopaedic research. A retrospective cohort study of patients undergoing lumbar spinal fusion procedures during 2009 to 2011 was performed in two national databases: the Nationwide Inpatient Sample and the National Surgical Quality Improvement Program. Demographic characteristics, comorbidities, and inpatient adverse events were directly compared between databases. The total numbers of patients included were 144,098 from the Nationwide Inpatient Sample and 8434 from the National Surgical Quality Improvement Program. There were only small differences in demographic characteristics between the two databases. There were large differences between databases in the rates at which specific comorbidities were documented. Non-morbid obesity was documented at rates of 9.33% in the Nationwide Inpatient Sample and 36.93% in the National Surgical Quality Improvement Program (relative risk, 0.25; p < 0.05). Peripheral vascular disease was documented at rates of 2.35% in the Nationwide Inpatient Sample and 0.60% in the National Surgical Quality Improvement Program (relative risk, 3.89; p < 0.05). Similarly, there were large differences between databases in the rates at which specific inpatient adverse events were documented. Sepsis was documented at rates of 0.38% in the Nationwide Inpatient Sample and 0.81% in the National Surgical Quality Improvement Program (relative risk, 0.47; p < 0.05). Acute kidney injury was documented at rates of 1.79% in the Nationwide Inpatient Sample and 0.21% in the National Surgical Quality Improvement Program (relative risk, 8.54; p < 0.05). 
As database studies become more prevalent in orthopaedic surgery, authors, reviewers, and readers should view these studies with caution. This study shows that two commonly used databases can identify demographically similar patients undergoing a common orthopaedic procedure; however, the databases document markedly different rates of comorbidities and inpatient adverse events. The differences are likely the result of the very different mechanisms through which the databases collect their comorbidity and adverse event data. Findings highlight concerns regarding the validity of orthopaedic database research. Copyright © 2014 by The Journal of Bone and Joint Surgery, Incorporated.
Initial Flight Test Evaluation of the F-15 ACTIVE Axisymmetric Vectoring Nozzle Performance
NASA Technical Reports Server (NTRS)
Orme, John S.; Hathaway, Ross; Ferguson, Michael D.
1998-01-01
A full envelope database of thrust-vectoring axisymmetric nozzle performance for the Pratt & Whitney Pitch/Yaw Balance Beam Nozzle (P/YBBN) is being developed using the F-15 Advanced Control Technology for Integrated Vehicles (ACTIVE) aircraft. At this time, flight research has been completed for steady-state pitch vector angles up to 20° at an altitude of 30,000 ft from low power settings to maximum afterburner power. The nozzle performance database includes vector forces, internal nozzle pressures, and temperatures, all of which can be used for regression analysis modeling. The database was used to substantiate a set of nozzle performance data from wind tunnel testing and computational fluid dynamic analyses. Findings from initial flight research at Mach 0.9 and 1.2 are presented in this paper. The results show that vector efficiency is strongly influenced by power setting. A significant discrepancy in nozzle performance has been discovered between predicted and measured results during vectoring.
EMR Database Upgrade from MUMPS to CACHE: Lessons Learned.
Alotaibi, Abduallah; Emshary, Mshary; Househ, Mowafa
2014-01-01
Over the past few years, Saudi hospitals have been implementing and upgrading Electronic Medical Record Systems (EMRs) to ensure secure data transfer and exchange between EMRs. This paper focuses on the process and lessons learned in upgrading the MUMPS database to the newer Caché database to ensure the integrity of electronic data transfer within a local Saudi hospital. This paper examines the steps taken by the departments concerned, their action plans and how the change process was managed. Results show that user satisfaction was achieved after the upgrade was completed. The system was stable and offered better healthcare quality to patients as a result of the data exchange. Hardware infrastructure upgrades improved scalability and software upgrades to Caché improved stability. The overall performance was enhanced and new functions were added (CPOE) during the upgrades. The lessons learned were: 1) Involve higher management; 2) Research multiple solutions available in the market; 3) Plan for a variety of implementation scenarios.
Crowdsourcing-Assisted Radio Environment Database for V2V Communication.
Katagiri, Keita; Sato, Koya; Fujii, Takeo
2018-04-12
In order to realize reliable Vehicle-to-Vehicle (V2V) communication systems for autonomous driving, the recognition of radio propagation becomes an important technology. However, in the current wireless distributed network systems, it is difficult to accurately estimate the radio propagation characteristics because of the locality of the radio propagation caused by surrounding buildings and geographical features. In this paper, we propose a measurement-based radio environment database for improving the accuracy of the radio environment estimation in the V2V communication systems. The database first gathers measurement datasets of the received signal strength indicator (RSSI) related to the transmission/reception locations from V2V systems. By using the datasets, the average received power maps linked with transmitter and receiver locations are generated. We have performed measurement campaigns of V2V communications in the real environment to observe RSSI for the database construction. Our results show that the proposed method has higher accuracy of the radio propagation estimation than the conventional path loss model-based estimation.
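The construction of the average received-power map keyed by transmitter/receiver locations could be sketched as follows (the mesh size, coordinates, and RSSI readings are invented):

```python
from collections import defaultdict

# Toy radio-environment map: RSSI samples are binned by quantized
# transmitter/receiver positions (10 m mesh assumed) and averaged per bin.
MESH = 10.0  # metres

def bin_key(tx, rx):
    """Quantize both endpoints of a link onto the mesh grid."""
    return (tuple(round(c // MESH) for c in tx),
            tuple(round(c // MESH) for c in rx))

samples = [  # (tx_xy, rx_xy, rssi_dbm) — invented measurements
    ((3.0, 4.0), (52.0, 48.0), -71.0),
    ((5.0, 6.0), (55.0, 41.0), -75.0),   # same grid bin as the first sample
    ((95.0, 4.0), (52.0, 48.0), -90.0),
]

sums = defaultdict(lambda: [0.0, 0])
for tx, rx, rssi in samples:
    acc = sums[bin_key(tx, rx)]
    acc[0] += rssi
    acc[1] += 1

power_map = {key: total / count for key, (total, count) in sums.items()}
print(power_map[((0, 0), (5, 4))])  # → -73.0 (average of -71 and -75)
```

A query for an expected link quality then reduces to a lookup of the bin for the two vehicle positions, falling back to a path-loss model where no measurements exist.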
Providing R-Tree Support for Mongodb
NASA Astrophysics Data System (ADS)
Xiang, Longgang; Shao, Xiaotian; Wang, Dehao
2016-06-01
Supporting large amounts of spatial data is a significant characteristic of modern databases. However, unlike some mature relational databases, such as Oracle and PostgreSQL, most of current burgeoning NoSQL databases are not well designed for storing geospatial data, which is becoming increasingly important in various fields. In this paper, we propose a novel method to provide R-tree index, as well as corresponding spatial range query and nearest neighbour query functions, for MongoDB, one of the most prevalent NoSQL databases. First, after in-depth analysis of MongoDB's features, we devise an efficient tabular document structure which flattens R-tree index into MongoDB collections. Further, relevant mechanisms of R-tree operations are issued, and then we discuss in detail how to integrate R-tree into MongoDB. Finally, we present the experimental results which show that our proposed method out-performs the built-in spatial index of MongoDB. Our research will greatly facilitate big data management issues with MongoDB in a variety of geospatial information applications.
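Flattening an R-tree into id-keyed documents, in the spirit described above, can be sketched with plain dictionaries standing in for a MongoDB collection (the node layout and MBRs are invented; real MongoDB access would go through a driver such as PyMongo):

```python
# Each "document" holds its minimum bounding rectangle (MBR) and child
# ids; a range query walks only nodes whose MBRs intersect the window.
nodes = {
    "root": {"mbr": (0, 0, 100, 100),   "children": ["n1", "n2"], "leaf": False},
    "n1":   {"mbr": (0, 0, 50, 50),     "children": ["p1", "p2"], "leaf": True},
    "n2":   {"mbr": (50, 50, 100, 100), "children": ["p3"],       "leaf": True},
    "p1":   {"mbr": (10, 10, 12, 12)},
    "p2":   {"mbr": (40, 45, 42, 47)},
    "p3":   {"mbr": (60, 70, 61, 71)},
}

def intersects(a, b):
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    return ax1 <= bx2 and bx1 <= ax2 and ay1 <= by2 and by1 <= ay2

def range_query(window, node_id="root"):
    node = nodes[node_id]            # one "document" fetch per node visited
    if not intersects(node["mbr"], window):
        return []                    # whole subtree pruned
    if node.get("leaf"):
        return [c for c in node["children"]
                if intersects(nodes[c]["mbr"], window)]
    return [hit for c in node["children"] for hit in range_query(window, c)]

print(range_query((0, 0, 45, 50)))  # → ['p1', 'p2']
```

The point is that each tree node maps to one id-keyed document fetch, so the hierarchical index survives flattening into a document collection.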
NASA Astrophysics Data System (ADS)
Cinzia Marra, Anna; Casella, Daniele; Martins Costa do Amaral, Lia; Sanò, Paolo; Dietrich, Stefano; Panegrossi, Giulia
2017-04-01
Two new precipitation retrieval algorithms for the Advanced Microwave Scanning Radiometer 2 (AMSR2) and for the GPM Microwave Imager (GMI) are presented. The algorithms are based on the Cloud Dynamics and Radiation Database (CDRD) Bayesian approach and represent an evolution of the previous version applied to Special Sensor Microwave Imager/Sounder (SSMIS) observations, and used operationally within the EUMETSAT Satellite Application Facility on support to Operational Hydrology and Water Management (H-SAF). These new products present as main innovation the use of an extended, entirely empirical database derived from coincident radar and radiometer observations from the NASA/JAXA Global Precipitation Measurement Core Observatory (GPM-CO) (Dual-frequency Precipitation Radar-DPR and GMI). The other new aspects are: 1) a new rain-no-rain screening approach; 2) the use of Empirical Orthogonal Functions (EOF) and Canonical Correlation Analysis (CCA), both in the screening approach and in the Bayesian algorithm; 3) the use of new meteorological and environmental ancillary variables to categorize the database and mitigate the problem of non-uniqueness of the retrieval solution; 4) the development and implementation of specific modules for computational time minimization. The CDRD algorithms for AMSR2 and GMI are able to handle an extremely large observational database available from GPM-CO and provide the rainfall estimate with minimum latency, making them suitable for near-real-time hydrological and operational applications. As for CDRD for AMSR2, a verification study over Italy using ground-based radar data and over the MSG full disk area using coincident GPM-CO/AMSR2 observations has been carried out. Results show remarkable AMSR2 capabilities for rainfall rate (RR) retrieval over ocean (for RR > 0.25 mm/h), good capabilities over vegetated land (for RR > 1 mm/h), while for coastal areas the results are less certain. 
Comparisons with NASA GPM products, and with ground-based radar data, show that CDRD for AMSR2 is able to depict very well the areas of high precipitation over all surface types. Similarly, preliminary results of the application of CDRD for GMI are also shown and discussed, highlighting the advantage of the availability of high frequency channels (> 90 GHz) for precipitation retrieval over land and coastal areas.
Kong, Xiangxing; Li, Jun; Cai, Yibo; Tian, Yu; Chi, Shengqiang; Tong, Danyang; Hu, Yeting; Yang, Qi; Li, Jingsong; Poston, Graeme; Yuan, Ying; Ding, Kefeng
2018-01-08
To revise the American Joint Committee on Cancer TNM staging system for colorectal cancer (CRC) based on a nomogram analysis of the Surveillance, Epidemiology, and End Results (SEER) database, and to prove the rationality of enhancing T stage's weighting in our previously proposed T-plus staging system. A total of 115,377 non-metastatic CRC patients from SEER were randomly split 1:1 into training and testing sets. The Nomo-staging system was established via three nomograms based on 1-year, 2-year and 3-year disease-specific survival (DSS) logistic regression analysis of the training set. The predictive value of the Nomo-staging system for the testing set was evaluated by concordance index (c-index), likelihood ratio (L.R.) and Akaike information criterion (AIC) for 1-year, 2-year, 3-year overall survival (OS) and DSS. Kaplan-Meier survival curves were used to evaluate discrimination and gradient monotonicity. An external validation was performed on the database from the Second Affiliated Hospital of Zhejiang University (SAHZU). Patients with T1-2 N1 and T1N2a were classified into stage II while T4 N0 patients were classified into stage III in the Nomo-staging system. Kaplan-Meier survival curves of OS and DSS in the testing set showed the Nomo-staging system performed better in discrimination and gradient monotonicity, and the external validation in the SAHZU database also showed distinctly better discrimination. The Nomo-staging system showed higher values in L.R. and c-index, and lower values in AIC when predicting OS and DSS in the testing set. The Nomo-staging system showed better performance in prognosis prediction, and the weight of lymph node status in prognosis prediction should be cautiously reconsidered.
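The concordance index (c-index) used above to compare staging systems can be computed with a straightforward pairwise count; a minimal sketch on invented survival data:

```python
def c_index(times, events, risks):
    """Among comparable patient pairs, the fraction where the higher
    predicted risk has the earlier observed event (ties count 0.5)."""
    concordant = comparable = 0.0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Pair is comparable if i has an observed event before j's time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1
                elif risks[i] == risks[j]:
                    concordant += 0.5   # conventional tie handling
    return concordant / comparable

# Invented toy data: risk scores perfectly ordered against survival times.
times  = [2, 5, 7, 9]
events = [1, 1, 0, 1]   # 0 = censored
risks  = [0.9, 0.6, 0.5, 0.2]
print(c_index(times, events, risks))  # → 1.0
```

A c-index of 0.5 corresponds to random ordering and 1.0 to perfect discrimination; production implementations (e.g. in lifelines or scikit-survival) add efficient pair enumeration and confidence intervals.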
Corruption of genomic databases with anomalous sequence.
Lamperti, E D; Kittelberger, J M; Smith, T F; Villa-Komaroff, L
1992-06-11
We describe evidence that DNA sequences from vectors used for cloning and sequencing have been incorporated accidentally into eukaryotic entries in the GenBank database. These incorporations were not restricted to one type of vector or to a single mechanism. Many minor instances may have been the result of simple editing errors, but some entries contained large blocks of vector sequence that had been incorporated by contamination or other accidents during cloning. Some cases involved unusual rearrangements and areas of vector distant from the normal insertion sites. Matches to vector were found in 0.23% of 20,000 sequences analyzed in GenBank Release 63. Although the possibility of anomalous sequence incorporation has been recognized since the inception of GenBank and should be easy to avoid, recent evidence suggests that this problem is increasing more quickly than the database itself. The presence of anomalous sequence may have serious consequences for the interpretation and use of database entries, and will have an impact on issues of database management. The incorporated vector fragments described here may also be useful for a crude estimate of the fidelity of sequence information in the database. In alignments with well-defined ends, the matching sequences showed 96.8% identity to vector; when poorer matches with arbitrary limits were included, the aggregate identity to vector sequence was 94.8%.
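The kind of vector-contamination screen described above can be sketched as an exact k-mer scan (the sequences are shortened and invented for illustration; real screens such as NCBI's VecScreen use alignment rather than exact matching):

```python
# Toy contamination screen: flag entries sharing a long exact substring
# with a known vector sequence. VECTOR is an invented stand-in for a
# cloning-vector region; WINDOW is the minimum match length.
VECTOR = "GAATTCGAGCTCGGTACCCGGGGATCC"
WINDOW = 12

# All 12-mers of the vector, for O(1) membership tests.
vector_kmers = {VECTOR[i:i + WINDOW] for i in range(len(VECTOR) - WINDOW + 1)}

entries = {
    "clean":        "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGA",
    "contaminated": "ATGGCC" + VECTOR[3:20] + "TGAAAGGGTG",
}

flagged = [name for name, seq in entries.items()
           if any(seq[i:i + WINDOW] in vector_kmers
                  for i in range(len(seq) - WINDOW + 1))]
print(flagged)  # → ['contaminated']
```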
Biocuration workflows and text mining: overview of the BioCreative 2012 Workshop Track II.
Lu, Zhiyong; Hirschman, Lynette
2012-01-01
Manual curation of data from the biomedical literature is a rate-limiting factor for many expert curated databases. Despite the continuing advances in biomedical text mining and the pressing needs of biocurators for better tools, few existing text-mining tools have been successfully integrated into production literature curation systems such as those used by the expert curated databases. To close this gap and better understand all aspects of literature curation, we invited submissions of written descriptions of curation workflows from expert curated databases for the BioCreative 2012 Workshop Track II. We received seven qualified contributions, primarily from model organism databases. Based on these descriptions, we identified commonalities and differences across the workflows, the common ontologies and controlled vocabularies used and the current and desired uses of text mining for biocuration. Compared to a survey done in 2009, our 2012 results show that many more databases are now using text mining in parts of their curation workflows. In addition, the workshop participants identified text-mining aids for finding gene names and symbols (gene indexing), prioritization of documents for curation (document triage) and ontology concept assignment as those most desired by the biocurators. DATABASE URL: http://www.biocreative.org/tasks/bc-workshop-2012/workflow/.
NASA Astrophysics Data System (ADS)
Oh, Hyun-Joo; Lee, Saro; Chotikasathien, Wisut; Kim, Chang Hwan; Kwon, Ju Hyoung
2009-04-01
For predictive landslide susceptibility mapping, this study applied and verified a probability model (frequency ratio) and a statistical model (logistic regression) at Pechabun, Thailand, using a geographic information system (GIS) and remote sensing. Landslide locations were identified in the study area from interpretation of aerial photographs and field surveys, and maps of the topography, geology and land cover were compiled into a spatial database. Factors that influence landslide occurrence, such as slope gradient, slope aspect, curvature of topography, and distance from drainage, were calculated from the topographic database. Lithology and distance from faults were extracted and calculated from the geology database, and land cover was classified from a Landsat TM satellite image. The frequency ratios and logistic regression coefficients were used as each factor's ratings and overlaid to produce the landslide susceptibility map, which was then verified against the existing landslide locations. In the verification, the frequency ratio model showed 76.39% prediction accuracy and the logistic regression model 70.42%. The method can be used to reduce hazards associated with landslides and to plan land cover.
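The frequency-ratio rating used above has a simple definition: for each factor class, it is the class's share of landslide cells divided by its share of total study-area cells (FR > 1 indicates higher-than-average susceptibility). A minimal sketch with invented class labels and cell counts:

```python
# Frequency ratio per factor class:
#   FR = (landslides_in_class / total_landslides) / (class_area / total_area)

def frequency_ratio(landslide_counts, area_counts):
    """Return the FR value for every class in area_counts."""
    total_slides = sum(landslide_counts.values())
    total_area = sum(area_counts.values())
    fr = {}
    for cls, area in area_counts.items():
        slide_share = landslide_counts.get(cls, 0) / total_slides
        area_share = area / total_area
        fr[cls] = slide_share / area_share
    return fr

# Slope-gradient classes (cell counts), purely illustrative:
area = {"0-15deg": 5000, "15-30deg": 3000, "30-45deg": 2000}
slides = {"0-15deg": 10, "15-30deg": 40, "30-45deg": 50}
print(frequency_ratio(slides, area))
```

The susceptibility index for a cell is then typically the sum of the FR ratings of that cell's classes across all factor maps.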
Eronen, Lauri; Toivonen, Hannu
2012-06-06
Biological databases contain large amounts of data concerning the functions and associations of genes and proteins. Integration of data from several such databases into a single repository can aid the discovery of previously unknown connections spanning multiple types of relationships and databases. Biomine is a system that integrates cross-references from several biological databases into a graph model with multiple types of edges, such as protein interactions, gene-disease associations and gene ontology annotations. Edges are weighted based on their type, reliability, and informativeness. We present Biomine and evaluate its performance in link prediction, where the goal is to predict pairs of nodes that will be connected in the future, based on current data. In particular, we formulate protein interaction prediction and disease gene prioritization tasks as instances of link prediction. The predictions are based on a proximity measure computed on the integrated graph. We consider and experiment with several such measures, and perform a parameter optimization procedure where different edge types are weighted to optimize link prediction accuracy. We also propose a novel method for disease-gene prioritization, defined as finding a subset of candidate genes that cluster together in the graph. We experimentally evaluate Biomine by predicting future annotations in the source databases and prioritizing lists of putative disease genes. The experimental results show that Biomine has strong potential for predicting links when a set of selected candidate links is available. The predictions obtained using the entire Biomine dataset are shown to clearly outperform ones obtained using any single source of data alone, when different types of links are suitably weighted. 
In the gene prioritization task, an established reference set of disease-associated genes is useful, but the results show that under favorable conditions, Biomine can also perform well when no such information is available. The Biomine system is a proof of concept. Its current version contains 1.1 million entities and 8.1 million relations between them, with a focus on human genetics. Some of its functionalities are available in a public query interface at http://biomine.cs.helsinki.fi, allowing users to search for and visualize connections between given biological entities.
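One common graph proximity measure of the kind discussed above is best-path probability: each edge carries a weight in (0, 1] (reflecting type, reliability and informativeness), and the proximity of two nodes is the maximum over paths of the product of edge weights. A self-contained sketch (the graph, node names and weights are invented, and this is one candidate measure, not necessarily Biomine's exact choice):

```python
import heapq

def best_path_proximity(graph, source, target):
    """Max-product path proximity between two nodes. Edge weights lie in
    (0, 1], so a Dijkstra-style search maximizing the running product of
    weights finds the best path."""
    best = {source: 1.0}
    heap = [(-1.0, source)]
    while heap:
        neg_p, node = heapq.heappop(heap)
        p = -neg_p
        if node == target:
            return p
        if p < best.get(node, 0.0):
            continue  # stale heap entry
        for nbr, w in graph.get(node, []):
            q = p * w
            if q > best.get(nbr, 0.0):
                best[nbr] = q
                heapq.heappush(heap, (-q, nbr))
    return 0.0

# Toy graph: gene -> protein interaction -> disease association
graph = {
    "geneA": [("protA", 0.9)],
    "protA": [("protB", 0.8)],
    "protB": [("disease1", 0.5)],
}
print(best_path_proximity(graph, "geneA", "disease1"))  # 0.9*0.8*0.5 ~ 0.36
```

Link prediction then amounts to ranking candidate node pairs by this proximity score.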
Database on Demand: insight how to build your own DBaaS
NASA Astrophysics Data System (ADS)
Gaspar Aparicio, Ruben; Coterillo Coz, Ignacio
2015-12-01
At CERN, a number of key database applications run on user-managed MySQL, PostgreSQL and Oracle database services. The Database on Demand (DBoD) project was born out of an idea to provide the CERN user community with an environment to develop and run database services as a complement to the central Oracle-based database service. Database on Demand empowers users to perform certain actions that had traditionally been done by database administrators, providing an enterprise platform for database applications. It also allows the CERN user community to run different database engines; at present, engines from three major RDBMS (relational database management system) vendors are offered. In this article we show the status of the service after almost three years of operation, give some insight into our redesigned software engineering, and describe its near-future evolution.
Improved Bond Equations for Fiber-Reinforced Polymer Bars in Concrete.
Pour, Sadaf Moallemi; Alam, M Shahria; Milani, Abbas S
2016-08-30
This paper explores a set of new equations to predict the bond strength between fiber-reinforced polymer (FRP) rebar and concrete. The proposed equations are based on a comprehensive statistical analysis of existing experimental results in the literature. First, the parameters most affecting the bond behavior of FRP-reinforced concrete were identified by applying a factorial analysis to part of the available database. The database, which contains 250 pullout tests, was then divided into four groups based on concrete compressive strength and rebar surface. Afterward, nonlinear regression analysis was performed for each group to determine the bond equations. The results show that the proposed equations predict bond strengths more accurately than previously reported models.
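The paper's specific equations are not reproduced here, but the regression step can be illustrated. Bond-strength models are often power laws in concrete compressive strength, and such a model can be fitted by least squares on log-transformed variables. A hypothetical sketch (the functional form tau = a * fc**b and the data points are invented, not taken from the paper's 250-test database):

```python
import math

def fit_power_law(x, y):
    """Fit y = a * x**b via ordinary least squares on (log x, log y)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(x)
    mx, my = sum(lx) / n, sum(ly) / n
    b = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / \
        sum((u - mx) ** 2 for u in lx)
    a = math.exp(my - b * mx)
    return a, b

# Synthetic, noise-free data generated from tau = 2.0 * fc**0.5 (MPa):
fc = [20.0, 30.0, 40.0, 60.0]
tau = [2.0 * f ** 0.5 for f in fc]
a, b = fit_power_law(fc, tau)
print(round(a, 3), round(b, 3))  # recovers a ~ 2.0, b ~ 0.5
```

On real pullout data the fit is performed separately per group (as in the paper's four strength/surface groups), and goodness of fit is compared across candidate forms.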
Estimating Missing Features to Improve Multimedia Information Retrieval
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bagherjeiran, A; Love, N S; Kamath, C
Retrieval in a multimedia database usually involves combining information from different modalities of data, such as text and images. However, not all modalities of the data may be available to form the query, and the retrieval results from such a partial query are often less than satisfactory. In this paper, we present an approach to complete a partial query by estimating the missing features in the query. Our experiments with a database of images and their associated captions show that, with an initial text-only query, our completion method has similar performance to a full query with both image and text features. In addition, when we use relevance feedback, our approach outperforms the results obtained using a full query.
2005-01-01
Précis The rapid implementation and continuing expansion of forensic DNA databases around the world has been supported by claims about their effectiveness in criminal investigations and challenged by assertions of the resulting intrusiveness into individual privacy. These two competing perspectives provide the basis for ongoing considerations about the categories of persons who should be subject to nonconsensual DNA sampling and profile retention as well as the uses to which such profiles should be put. This paper uses the example of the current arrangements for forensic DNA databasing in England & Wales to discuss the ways in which the legislative and operational basis for police DNA databasing is reliant upon continuous deliberations over these and other matters by a range of key stakeholders. We also assess the effects of the recent innovative use of DNA databasing for ‘familial searching’ in this jurisdiction in order to show how agreed understandings about the appropriate uses of DNA can become unsettled and reformulated even where their investigative effectiveness is uncontested. We conclude by making some observations about the future of what is recognised to be the largest forensic DNA database in the world. PMID:16240734
Neural Network Modeling of UH-60A Pilot Vibration
NASA Technical Reports Server (NTRS)
Kottapalli, Sesi
2003-01-01
Full-scale flight-test pilot floor vibration is modeled using neural networks and full-scale wind tunnel test data for low-speed level flight conditions. Neural network connections between the wind tunnel test data and the three flight-test pilot vibration components (vertical, lateral, and longitudinal) are studied. Two full-scale UH-60A Black Hawk databases are used. The first is the NASA/Army UH-60A Airloads Program flight test database. The second is the UH-60A rotor-only wind tunnel database acquired in the NASA Ames 80- by 120-Foot Wind Tunnel with the Large Rotor Test Apparatus (LRTA). Using neural networks, the flight-test pilot vibration is modeled using the wind tunnel rotating-system hub accelerations and, separately, the hub loads. The results show that the wind tunnel rotating-system hub accelerations together with the operating parameters can represent the flight-test pilot vibration. The six components of the wind tunnel N/rev balance-system hub loads together with the operating parameters can also represent the flight-test pilot vibration. The present neural network connections can significantly increase the value of wind tunnel testing.
RECOVIR Software for Identifying Viruses
NASA Technical Reports Server (NTRS)
Chakravarty, Sugoto; Fox, George E.; Zhu, Dianhui
2013-01-01
Most single-stranded RNA (ssRNA) viruses mutate rapidly to generate a large number of strains with highly divergent capsid sequences. Determining the capsid residues or nucleotides that uniquely characterize these strains is critical in understanding the strain diversity of these viruses. RECOVIR (an acronym for "recognize viruses") software predicts the strains of some ssRNA viruses from their limited sequence data. Novel phylogenetic-tree-based databases of protein or nucleic acid residues that uniquely characterize these virus strains are created. Strains of input virus sequences (partial or complete) are predicted through residue-wise comparisons with the databases. RECOVIR uses unique characterizing residues to identify automatically strains of partial or complete capsid sequences of picorna and caliciviruses, two of the most highly diverse ssRNA virus families. Partition-wise comparisons of the database residues with the corresponding residues of more than 300 complete and partial sequences of these viruses resulted in correct strain identification for all of these sequences. This study shows the feasibility of creating databases of hitherto unknown residues uniquely characterizing the capsid sequences of two of the most highly divergent ssRNA virus families. These databases enable automated strain identification from partial or complete capsid sequences of these human and animal pathogens.
Image-based query-by-example for big databases of galaxy images
NASA Astrophysics Data System (ADS)
Shamir, Lior; Kuminski, Evan
2017-01-01
Very large astronomical databases containing millions or even billions of galaxy images have been becoming increasingly important tools in astronomy research. However, in many cases the very large size makes it more difficult to analyze these data manually, reinforcing the need for computer algorithms that can automate the data analysis process. An example of such task is the identification of galaxies of a certain morphology of interest. For instance, if a rare galaxy is identified it is reasonable to expect that more galaxies of similar morphology exist in the database, but it is virtually impossible to manually search these databases to identify such galaxies. Here we describe computer vision and pattern recognition methodology that receives a galaxy image as an input, and searches automatically a large dataset of galaxies to return a list of galaxies that are visually similar to the query galaxy. The returned list is not necessarily complete or clean, but it provides a substantial reduction of the original database into a smaller dataset, in which the frequency of objects visually similar to the query galaxy is much higher. Experimental results show that the algorithm can identify rare galaxies such as ring galaxies among datasets of 10,000 astronomical objects.
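The query-by-example workflow above reduces each image to a numeric feature vector and ranks the database by distance to the query's vector. A minimal sketch of that ranking step (the paper's actual image descriptors are far richer; the IDs and feature values here are invented):

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def query_by_example(query_vec, database, k=3):
    """Return the k database IDs whose feature vectors are closest
    to the query image's feature vector."""
    ranked = sorted(database, key=lambda item: euclidean(query_vec, item[1]))
    return [obj_id for obj_id, _ in ranked[:k]]

database = [
    ("gal_001", [0.90, 0.10, 0.30]),
    ("gal_002", [0.20, 0.80, 0.50]),
    ("gal_003", [0.88, 0.12, 0.31]),  # visually similar to the query
    ("gal_004", [0.10, 0.10, 0.90]),
]
print(query_by_example([0.90, 0.10, 0.30], database, k=2))
```

As the abstract notes, the returned short list is neither complete nor clean; its purpose is to concentrate visually similar objects so that manual inspection becomes feasible.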
Development of Vision Based Multiview Gait Recognition System with MMUGait Database
Ng, Hu; Tan, Wooi-Haw; Tong, Hau-Lee
2014-01-01
This paper describes the acquisition setup and development of a new gait database, MMUGait. This database consists of 82 subjects walking under normal conditions and 19 subjects walking with 11 covariate factors, captured under two views. This paper also proposes a multiview model-based gait recognition system with a joint detection approach that performs well under different walking trajectories and covariate factors, including self-occluded or externally occluded silhouettes. In the proposed system, the process begins by enhancing the human silhouette to remove artifacts. Next, the width and height of the body are obtained. Subsequently, the joint angular trajectories are determined once the body joints are automatically detected. Lastly, the crotch height and step size of the walking subject are determined. The extracted features are smoothed by a Gaussian filter to eliminate the effect of outliers, then normalized with linear scaling, followed by feature selection prior to the classification process. The classification experiments carried out on the MMUGait database were benchmarked against the SOTON Small DB from the University of Southampton. Results showed correct classification rates above 90% for all the databases. The proposed approach is found to outperform other approaches on SOTON Small DB in most cases. PMID:25143972
The carbon cycle and hurricanes in the United States between 1900 and 2011.
Dahal, Devendra; Liu, Shuguang; Oeding, Jennifer
2014-06-06
Hurricanes cause severe impacts on forest ecosystems in the United States. These events can substantially alter the carbon biogeochemical cycle at local to regional scales. We selected all tropical storms and more severe events that made U.S. landfall between 1900 and 2011 and used the hurricane best-track database, a meteorological model (HURRECON), the National Land Cover Database (NLCD), a U.S. Department of Agriculture Forest Service biomass dataset, and pre- and post-event MODIS data to quantify biomass mortality per event and per year. Our estimates show an average of 18.2 TgC/yr of live biomass mortality for 1900-2011 in the US, with strong spatial and inter-annual variability. Hurricane Camille in 1969 caused the highest aboveground biomass mortality of any single event, at 59.5 TgC, while 1954 had the highest annual mortality attributed to landfalling hurricanes, at 68.4 TgC. The results are useful for further investigating historical events, and the methods outlined are potentially beneficial for quantifying biomass loss in future events.
A Full Snow Season in Yellowstone: A Database of Restored Aqua Band 6
NASA Technical Reports Server (NTRS)
Gladkova, Irina; Grossberg, Michael; Bonev, George; Romanov, Peter; Riggs, George; Hall, Dorothy
2013-01-01
The algorithms for estimating snow extent from the Moderate Resolution Imaging Spectroradiometer (MODIS) optimally use the 1.6-μm channel (band 6), which is unavailable for MODIS on Aqua due to detector damage. As a test bed to demonstrate that Aqua band 6 can be restored, we chose the area surrounding Yellowstone and Grand Teton national parks. In such rugged and difficult-to-access terrain, satellite images are particularly important for estimating snow-cover extent. For the full 2010-2011 snow season covering the Yellowstone region, we used quantitative image restoration to create a database of restored Aqua band 6. The database includes restored radiances, the normalized vegetation index, the normalized snow index, thermal data, and band-6-based snow-map products. The restored Aqua-band-6 data have also been regridded and combined with Terra data to produce a snow-cover map that utilizes both Terra and Aqua snow maps. Using this database, we show that snow-cover extent based on the restored Aqua band 6 performs comparably, with respect to ground stations, to that based on Terra. With band 6 restored on Aqua, an additional band-6 image of the Yellowstone region is available each day, which can be used to mitigate cloud occlusion using the same algorithms used for band 6 on Terra. We show an application of this database of restored band-6 images to illustrate the value of cloud-gap filling using the National Aeronautics and Space Administration's operational cloud masks and data from both Aqua and Terra.
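The reason band 6 matters for snow mapping is the normalized-difference snow index (NDSI): snow is bright in the green band but dark at 1.6 μm. An illustrative computation (the 0.4 threshold is the commonly cited snow cutoff; the reflectance values below are invented, and operational snow maps apply additional screens):

```python
# NDSI = (green - swir) / (green + swir), with MODIS using band 4 (green)
# and band 6 (1.6-um SWIR) reflectances.

def ndsi(green, swir):
    return (green - swir) / (green + swir)

def is_snow(green, swir, threshold=0.4):
    """Basic NDSI snow test; operational products add further screens
    (e.g., NDVI and thermal tests)."""
    return ndsi(green, swir) > threshold

print(round(ndsi(0.70, 0.15), 3))  # snow: bright in green, dark in SWIR
print(is_snow(0.70, 0.15), is_snow(0.30, 0.35))
```

This dependence on the 1.6-μm channel is why restoring Aqua band 6, rather than substituting another band, preserves the standard snow-mapping algorithm.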
Palmprint Recognition Across Different Devices.
Jia, Wei; Hu, Rong-Xiang; Gui, Jie; Zhao, Yang; Ren, Xiao-Ming
2012-01-01
In this paper, the problem of Palmprint Recognition Across Different Devices (PRADD) is investigated, which has not been well studied so far. Since there is no publicly available PRADD image database, we created a non-contact PRADD image database containing 12,000 grayscale images captured from 100 subjects using three devices, i.e., one digital camera and two smartphones. Due to the non-contact image acquisition used, rotation and scale changes between different images captured from the same palm are inevitable. We propose a robust method to calculate the palm width, which can be effectively used for scale normalization of palmprints. On this PRADD image database, we evaluate the recognition performance of three different methods, i.e., a subspace learning method, a correlation method, and an orientation-coding-based method. Experimental results show that orientation-coding-based methods achieved promising recognition performance for PRADD.
BioBenchmark Toyama 2012: an evaluation of the performance of triple stores on biological data.
Wu, Hongyan; Fujiwara, Toyofumi; Yamamoto, Yasunori; Bolleman, Jerven; Yamaguchi, Atsuko
2014-01-01
Biological databases vary enormously in size and data complexity, from small databases that contain a few million Resource Description Framework (RDF) triples to large databases that contain billions. In this paper, we evaluate whether RDF native stores can meet the needs of a biological database provider. Prior evaluations have used synthetic data of limited size; for example, the largest BSBM benchmark uses 1 billion synthetic e-commerce RDF triples on a single node. Real-world biological data, however, differs considerably from such simple synthetic data, and it is difficult to determine whether synthetic e-commerce data adequately represents biological databases. For this evaluation, we therefore used five real data sets from biological databases. We evaluated five triple stores, 4store, Bigdata, Mulgara, Virtuoso, and OWLIM-SE, with five biological data sets, Cell Cycle Ontology, Allie, PDBj, UniProt, and DDBJ, ranging in size from approximately 10 million to 8 billion triples. For each database, we loaded all the data into a single node and prepared the database for use in a classical data-warehouse scenario. We then ran a series of SPARQL queries against each endpoint and recorded the execution time and the accuracy of the query response. Our paper shows that, with appropriate configuration, Virtuoso and OWLIM-SE can satisfy the basic requirements for loading and querying biological data of up to roughly 8 billion triples on a single node, under simultaneous access by 64 clients. OWLIM-SE performs best for databases with approximately 11 million triples; for data sets of 94 million and 590 million triples, OWLIM-SE and Virtuoso perform best, without an overwhelming advantage over each other; for data sets over 4 billion triples, Virtuoso works best.
4store performs well on small data sets with limited features when the number of triples is below 100 million, but our test shows its scalability is poor; Bigdata demonstrates average performance and is a good open-source triple store for middle-sized (around 500 million triples) data sets; Mulgara shows some fragility.
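The measurement loop described above, running each query against each endpoint and recording execution time plus response accuracy, can be sketched generically. In this self-contained illustration `run_query` is a stand-in for a real client call (e.g., an HTTP SPARQL request to the store's endpoint); the endpoint name, query strings, and expected answers are all invented:

```python
import time

def benchmark(endpoints, queries, run_query):
    """Return {(endpoint, query_id): (seconds, correct)} for every pair,
    timing each call and comparing the answer to the expected result."""
    results = {}
    for ep in endpoints:
        for qid, (query, expected) in queries.items():
            t0 = time.perf_counter()
            answer = run_query(ep, query)
            elapsed = time.perf_counter() - t0
            results[(ep, qid)] = (elapsed, answer == expected)
    return results

# Simulated endpoint: answers correctly except for one query.
def fake_run_query(endpoint, query):
    return 42 if "count" in query else None

queries = {"q1": ("count triples", 42), "q2": ("list graphs", ["g1"])}
res = benchmark(["storeA"], queries, fake_run_query)
print(sorted((k, ok) for k, (sec, ok) in res.items()))
```

Recording correctness alongside timing matters because, as the paper notes, stores can return fast but inaccurate responses under load.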
Harju, Inka; Lange, Christoph; Kostrzewa, Markus; Maier, Thomas; Rantakokko-Jalava, Kaisu; Haanperä, Marjo
2017-03-01
Reliable distinction of Streptococcus pneumoniae from viridans group streptococci is important because of the different pathogenic properties of these organisms. Differentiation between S. pneumoniae and the closely related Streptococcus mitis species group streptococci has always been challenging, even with such modern methods as 16S rRNA gene sequencing or matrix-assisted laser desorption ionization-time of flight (MALDI-TOF) mass spectrometry. In this study, a novel algorithm combined with an enhanced database was evaluated for differentiation between S. pneumoniae and S. mitis species group streptococci. One hundred one clinical S. mitis species group streptococcal strains and 188 clinical S. pneumoniae strains were identified using both the standard MALDI Biotyper database alone and the database combined with the novel algorithm. The database update from 4,613 strains to 5,627 strains drastically improved the differentiation of S. pneumoniae and S. mitis species group streptococci: with the new 5,627-strain database version, only one of the 101 S. mitis species group isolates was misidentified as S. pneumoniae, whereas 66 of them were misidentified as S. pneumoniae with the earlier 4,613-strain MALDI Biotyper database version. The updated MALDI Biotyper database combined with the novel algorithm showed even better performance, producing no misidentifications of S. mitis species group strains as S. pneumoniae. All S. pneumoniae strains were correctly identified as S. pneumoniae both with the standard MALDI Biotyper database and with the database combined with the novel algorithm. This new algorithm thus enables reliable differentiation between pneumococci and other S. mitis species group streptococci with the MALDI Biotyper. Copyright © 2017 American Society for Microbiology.
Detection of alternative splice variants at the proteome level in Aspergillus flavus.
Chang, Kung-Yen; Georgianna, D Ryan; Heber, Steffen; Payne, Gary A; Muddiman, David C
2010-03-05
Identification of proteins from proteolytic peptides or intact proteins plays an essential role in proteomics. Researchers use search engines to match acquired peptide sequences to target proteins; however, search engines depend on protein databases to provide candidates for consideration. Alternative splicing (AS), the mechanism by which exons of pre-mRNAs can be spliced and rearranged to generate distinct mRNA and therefore protein variants, enables higher eukaryotic organisms, with only a limited number of genes, to achieve the requisite complexity and diversity at the proteome level. Multiple alternative isoforms from one gene often share common segments of sequence. However, many protein databases include only a limited number of isoforms to minimize redundancy. As a result, a database search might not identify a target protein even with high-quality tandem MS data and an accurate intact precursor ion mass. We computationally predicted an exhaustive list of putative isoforms of Aspergillus flavus proteins from 20,371 expressed sequence tags to investigate whether an alternative splicing protein database could assign a greater proportion of mass spectrometry data. The newly constructed AS database provided 9,807 new alternatively spliced variants in addition to 12,832 previously annotated proteins. Searches of the existing tandem MS spectra data set using the AS database identified 29 new proteins encoded by 26 genes. Nine fungal genes appeared to have multiple protein isoforms. In addition to the discovery of splice variants, the AS database also showed potential to improve genome annotation. In summary, the introduction of an alternative splicing database helps identify more proteins and unveils more information about a proteome.
Identification of Bearing Failure Using Signal Vibrations
NASA Astrophysics Data System (ADS)
Yani, Irsyadi; Resti, Yulia; Burlian, Firmansyah
2018-04-01
Vibration analysis can be used to identify damage to mechanical systems such as journal bearings. Failure can be identified by observing the vibration spectrum obtained by measuring the vibration signal occurring in a mechanical system. A bearing is one of the machine elements most commonly used in mechanical systems. The main purpose of this research is to monitor the bearing condition and to identify bearing failure in a mechanical system by observing the resulting vibration. In this study, data were collected by recording the sound caused by the vibration of the mechanical system; a database of bearing failures was then built from these vibration-induced sound recordings. The next step was to group the bearing damage by type based on the database obtained. The results show a 98% success rate in identifying bearing damage.
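The core of spectrum-based identification is locating the dominant frequency in a recorded vibration signal, which can then be compared against characteristic bearing defect frequencies. A minimal sketch with a synthetic signal (a direct DFT is used for clarity; real monitoring code would use an FFT library, and the tone frequency here is invented):

```python
import cmath
import math

def dominant_frequency(signal, sample_rate):
    """Return the frequency (Hz) of the largest non-DC spectral peak
    via a direct discrete Fourier transform."""
    n = len(signal)
    mags = []
    for k in range(1, n // 2):  # skip k=0 (DC) and mirrored bins
        s = sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
        mags.append(abs(s))
    k_peak = mags.index(max(mags)) + 1
    return k_peak * sample_rate / n

# Synthetic 5 Hz tone sampled at 64 Hz for one second:
fs = 64
sig = [math.sin(2 * math.pi * 5 * t / fs) for t in range(fs)]
print(dominant_frequency(sig, fs))  # 5.0
```

Classifying damage by type then reduces to matching peaks like this one against the defect frequencies implied by the bearing geometry and shaft speed.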
Integration and management of massive remote-sensing data based on GeoSOT subdivision model
NASA Astrophysics Data System (ADS)
Li, Shuang; Cheng, Chengqi; Chen, Bo; Meng, Li
2016-07-01
Owing to the rapid development of earth observation technology, the volume of spatial information is growing rapidly; improving query and retrieval speed over large, rich data sources is therefore an urgent need for remote-sensing data management systems. A global subdivision model, the geographic coordinate subdivision grid with one-dimension integer coding on a 2n-tree, which we propose as a solution, has been used by data management organizations. However, because a spatial object may cover several grid cells, considerable data redundancy occurs when such data are stored in relational databases. To solve this redundancy problem, we combined the subdivision model with a spatial array database containing an inverted index, and we propose an improved approach for integrating and managing massive remote-sensing data. By adding a spatial-code column in array format to the database, spatial information in remote-sensing metadata can be stored and logically subdivided. We implemented our method on a Kingbase Enterprise Server database system and compared the results with the Oracle platform by simulating worldwide image data. Experimental results showed that our approach performed better than Oracle in terms of data integration and time and space efficiency. Our approach also offers an efficient storage management scheme for existing storage centers and management systems.
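One-dimension integer coding of a 2n-tree grid cell rests on the same idea as a Morton (Z-order) code: interleave the bits of a cell's row and column indices so that a single integer both identifies the cell and preserves spatial locality. A generic illustration (this is the textbook Morton scheme, not the exact GeoSOT code layout):

```python
def morton_encode(row, col, bits):
    """Interleave `bits` bits of row and col into one integer
    (row bit above col bit at each subdivision level)."""
    code = 0
    for i in range(bits):
        code |= ((row >> i) & 1) << (2 * i + 1)
        code |= ((col >> i) & 1) << (2 * i)
    return code

def morton_decode(code, bits):
    """Invert morton_encode: split the interleaved bits back out."""
    row = col = 0
    for i in range(bits):
        row |= ((code >> (2 * i + 1)) & 1) << i
        col |= ((code >> (2 * i)) & 1) << i
    return row, col

print(morton_encode(0b11, 0b01, 2))              # interleaved cell code
print(morton_decode(morton_encode(5, 9, 4), 4))  # round-trips to (5, 9)
```

Because nearby cells share code prefixes, a one-dimensional B-tree index over such codes can serve windowed spatial queries efficiently, which is what makes a spatial-code column useful in a relational or array database.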
Vullo, Carlos M; Romero, Magdalena; Catelli, Laura; Šakić, Mustafa; Saragoni, Victor G; Jimenez Pleguezuelos, María Jose; Romanini, Carola; Anjos Porto, Maria João; Puente Prieto, Jorge; Bofarull Castro, Alicia; Hernandez, Alexis; Farfán, María José; Prieto, Victoria; Alvarez, David; Penacino, Gustavo; Zabalza, Santiago; Hernández Bolaños, Alejandro; Miguel Manterola, Irati; Prieto, Lourdes; Parsons, Thomas
2016-03-01
The GHEP-ISFG Working Group has recognized the importance of assisting DNA laboratories in gaining expertise in handling DVI or missing persons identification (MPI) projects, which involve large-scale genetic profile comparisons. Eleven laboratories participated in a DNA matching exercise to identify victims from a hypothetical conflict with 193 missing persons. The post mortem database comprised 87 skeletal-remains profiles from a secondary mass grave representing a minimum of 58 individuals with evidence of commingling. The reference database consisted of 286 family reference profiles with diverse pedigrees. The goal of the exercise was to correctly discover re-associations and family matches. The results of direct matching for commingled-remains re-associations were correct and fully concordant among all laboratories. However, the kinship analysis for missing persons identifications showed variable results among the participants. One group of laboratories produced correct, concordant results, but nearly half of the others showed discrepant results, in some cases exhibiting likelihood ratio differences of several orders of magnitude. Three main errors were detected: (a) some laboratories did not use the complete reference family genetic data to report the match with the remains, (b) the identity and/or non-identity hypotheses were sometimes wrongly expressed in the likelihood ratio calculations, and (c) many laboratories did not properly evaluate the prior odds for the event. The results suggest that large-scale profile comparison for DVI or MPI is a challenge for forensic genetics laboratories, and that the statistical treatment of DNA matching and the Bayesian framework should be better standardized among laboratories. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Névéol, Aurélie; Wilbur, W John; Lu, Zhiyong
2012-01-01
High-throughput experiments and bioinformatics techniques are creating an exploding volume of data that are becoming overwhelming to keep track of for biologists and researchers who need to access, analyze and process existing data. Much of the available data are being deposited in specialized databases, such as the Gene Expression Omnibus (GEO) for microarrays or the Protein Data Bank (PDB) for protein structures and coordinates. Data sets are also being described by their authors in publications archived in literature databases such as MEDLINE and PubMed Central. Currently, the curation of links between biological databases and the literature mainly relies on manual labour, which makes it a time-consuming and daunting task. Herein, we analysed the current state of link curation between GEO, PDB and MEDLINE. We found that the link curation is heterogeneous depending on the sources and databases involved, and that overlap between sources is low, <50% for PDB and GEO. Furthermore, we showed that text-mining tools can automatically provide valuable evidence to help curators broaden the scope of articles and database entries that they review. As a result, we made recommendations to improve the coverage of curated links, as well as the consistency of information available from different databases while maintaining high-quality curation. Database URLs: http://www.ncbi.nlm.nih.gov/PubMed, http://www.ncbi.nlm.nih.gov/geo/, http://www.rcsb.org/pdb/
Measuring use patterns of online journals and databases
De Groote, Sandra L.; Dorsch, Josephine L.
2003-01-01
Purpose: This research sought to determine use of online biomedical journals and databases and to assess current user characteristics associated with the use of online resources in an academic health sciences center. Setting: The Library of the Health Sciences–Peoria is a regional site of the University of Illinois at Chicago (UIC) Library with 350 print journals, more than 4,000 online journals, and multiple online databases. Methodology: A survey was designed to assess online journal use, print journal use, database use, computer literacy levels, and other library user characteristics. A survey was sent through campus mail to all (471) UIC Peoria faculty, residents, and students. Results: Forty-one percent (188) of the surveys were returned. Ninety-eight percent of the students, faculty, and residents reported having convenient access to a computer connected to the Internet. While 53% of the users indicated they searched MEDLINE at least once a week, other databases showed much lower usage. Overall, 71% of respondents indicated a preference for online over print journals when possible. Conclusions: Users prefer online resources to print, and many choose to access these online resources remotely. Convenience and full-text availability appear to play roles in selecting online resources. The findings of this study suggest that databases without links to full text and online journal collections without links from bibliographic databases will have lower use. These findings have implications for collection development, promotion of library resources, and end-user training. PMID:12883574
Névéol, Aurélie; Wilbur, W. John; Lu, Zhiyong
2012-01-01
High-throughput experiments and bioinformatics techniques are creating an exploding volume of data that are becoming overwhelming to keep track of for biologists and researchers who need to access, analyze and process existing data. Much of the available data are being deposited in specialized databases, such as the Gene Expression Omnibus (GEO) for microarrays or the Protein Data Bank (PDB) for protein structures and coordinates. Data sets are also being described by their authors in publications archived in literature databases such as MEDLINE and PubMed Central. Currently, the curation of links between biological databases and the literature mainly relies on manual labour, which makes it a time-consuming and daunting task. Herein, we analysed the current state of link curation between GEO, PDB and MEDLINE. We found that the link curation is heterogeneous depending on the sources and databases involved, and that overlap between sources is low, <50% for PDB and GEO. Furthermore, we showed that text-mining tools can automatically provide valuable evidence to help curators broaden the scope of articles and database entries that they review. As a result, we made recommendations to improve the coverage of curated links, as well as the consistency of information available from different databases while maintaining high-quality curation. Database URLs: http://www.ncbi.nlm.nih.gov/PubMed, http://www.ncbi.nlm.nih.gov/geo/, http://www.rcsb.org/pdb/ PMID:22685160
G-Bean: an ontology-graph based web tool for biomedical literature retrieval
2014-01-01
Background Currently, most people use NCBI's PubMed to search the MEDLINE database, an important bibliographical information source for life science and biomedical information. However, PubMed has some drawbacks that make it difficult to find relevant publications pertaining to users' individual intentions, especially for non-expert users. To ameliorate the disadvantages of PubMed, we developed G-Bean, a graph-based biomedical search engine, to search biomedical articles in the MEDLINE database more efficiently. Methods G-Bean addresses PubMed's limitations with three innovations: (1) Parallel document index creation: a multithreaded index creation strategy is employed to generate the document index for G-Bean in parallel; (2) Ontology-graph based query expansion: an ontology graph is constructed by merging four major UMLS (Version 2013AA) vocabularies, MeSH, SNOMEDCT, CSP, and AOD, to cover all concepts in the National Library of Medicine (NLM) database; a Personalized PageRank algorithm is used to compute concept relevance in this ontology graph, and the Term Frequency-Inverse Document Frequency (TF-IDF) weighting scheme is used to re-rank the concepts. The top 500 ranked concepts are selected for expanding the initial query to retrieve more accurate and relevant information; (3) Retrieval and re-ranking of documents based on the user's search intention: after the user selects any article from the existing search results, G-Bean analyzes the user's selections to determine his/her true search intention and then uses more relevant and more specific terms to retrieve additional related articles. The new articles are presented to the user in the order of their relevance to the already selected articles. Results Performance evaluation with 106 OHSUMED benchmark queries shows that G-Bean returns more relevant results than PubMed does when using these queries to search the MEDLINE database. PubMed returned no results at all for some OHSUMED queries because it failed to form the appropriate Boolean query statement automatically from the natural-language query strings. G-Bean is available at http://bioinformatics.clemson.edu/G-Bean/index.php. Conclusions G-Bean addresses PubMed's limitations with ontology-graph based query expansion, automatic document indexing, and user search intention discovery. It shows significant advantages in finding relevant articles from the MEDLINE database to meet the information needs of the user. PMID:25474588
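The TF-IDF weighting step mentioned above can be sketched as follows. The toy corpus and scores are illustrative assumptions, not G-Bean's actual index:

```python
import math
from collections import Counter

def tf_idf(term, doc, corpus):
    """Score one term in one document: term frequency in the document
    times the inverse document frequency over the whole corpus."""
    tf = Counter(doc)[term] / len(doc)
    df = sum(1 for d in corpus if term in d)       # documents containing the term
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf

# Hypothetical three-document corpus of tokenized concepts
docs = [["myocardial", "infarction", "risk"],
        ["infarction", "treatment"],
        ["risk", "factor", "risk"]]

# "factor" occurs in only 1 of 3 documents, so each occurrence
# is weighted more heavily than the more common "infarction"
score = tf_idf("factor", docs[2], docs)
```

In a concept re-ranking setting like the one described, this weight would be combined with the Personalized PageRank relevance score before selecting the top concepts for query expansion.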
NASA Astrophysics Data System (ADS)
Dabiru, L.; O'Hara, C. G.; Shaw, D.; Katragadda, S.; Anderson, D.; Kim, S.; Shrestha, B.; Aanstoos, J.; Frisbie, T.; Policelli, F.; Keblawi, N.
2006-12-01
The Research Project Knowledge Base (RPKB) is currently being designed and will be implemented in a manner that is fully compatible and interoperable with the enterprise architecture tools developed to support NASA's Applied Sciences Program. Through user needs assessment and collaboration with Stennis Space Center, Goddard Space Flight Center, and NASA DEVELOP staff, insight into information needs for the RPKB was gathered from across NASA scientific communities of practice. To enable efficient, consistent, standard, structured, and managed data entry and research results compilation, a prototype RPKB has been designed and fully integrated with the existing NASA Earth Science Systems Components database. The RPKB will compile research project and keyword information of relevance to the six major science focus areas, 12 national applications, and the Global Change Master Directory (GCMD). The RPKB will include information about projects awarded from NASA research solicitations, project investigator information, research publications, NASA data products employed, and model or decision support tools used or developed, as well as new data product information. The RPKB will be developed in a multi-tier architecture that will include a SQL Server relational database backend, middleware, and front-end client interfaces for data entry. The purpose of this project is to intelligently harvest the results of research sponsored by the NASA Applied Sciences Program and related research programs. We present various approaches for a wide spectrum of knowledge discovery of research results, publications, projects, etc., from the NASA Systems Components database and global information systems, and show how this is implemented in a SQL Server database. The application of knowledge discovery is useful for intelligent query answering and multiple-layered database construction.
Using advanced EA tools such as the Earth Science Architecture Tool (ESAT), RPKB will enable NASA and partner agencies to efficiently identify significant results and will enable principal investigators to formulate experiment directions for new proposals.
Database Search Engines: Paradigms, Challenges and Solutions.
Verheggen, Kenneth; Martens, Lennart; Berven, Frode S; Barsnes, Harald; Vaudel, Marc
2016-01-01
The first step in identifying proteins from mass spectrometry-based shotgun proteomics data is to infer peptides from tandem mass spectra, a task generally achieved using database search engines. In this chapter, the basic principles of database search engines are introduced with a focus on open-source software, and the use of database search engines is demonstrated using the freely available SearchGUI interface. This chapter also discusses how to tackle general issues related to sequence database searching and shows how to minimize their impact.
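The core matching idea behind such search engines — scoring an observed spectrum against the theoretical fragment masses of candidate peptides — can be sketched as a naive shared-peak count. All m/z values and the tolerance below are hypothetical; real engines (including those wrapped by SearchGUI) use far more sophisticated probabilistic or cross-correlation scores:

```python
def count_matched_peaks(observed_mz, theoretical_mz, tol=0.5):
    """Naive shared-peak count: how many theoretical fragment m/z values
    fall within +/- tol of some observed peak. Illustrative only."""
    return sum(
        any(abs(o - t) <= tol for o in observed_mz)
        for t in theoretical_mz
    )

observed = [147.1, 260.2, 389.3, 502.4]          # hypothetical spectrum peaks
candidate_a = [147.11, 260.18, 389.25, 650.0]    # theoretical fragments, peptide A
candidate_b = [100.0, 200.0, 300.0, 400.0]       # theoretical fragments, peptide B

score_a = count_matched_peaks(observed, candidate_a)   # 3 peaks match
score_b = count_matched_peaks(observed, candidate_b)   # no peaks match
```

The peptide whose theoretical fragments best explain the observed peaks (here, peptide A) would be reported as the peptide-spectrum match.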
Optimal Embedding for Shape Indexing in Medical Image Databases
Qian, Xiaoning; Tagare, Hemant D.; Fulbright, Robert K.; Long, Rodney; Antani, Sameer
2010-01-01
This paper addresses the problem of indexing shapes in medical image databases. Shapes of organs are often indicative of disease, making shape similarity queries important in medical image databases. Mathematically, shapes with landmarks belong to shape spaces which are curved manifolds with a well defined metric. The challenge in shape indexing is to index data in such curved spaces. One natural indexing scheme is to use metric trees, but metric trees are prone to inefficiency. This paper proposes a more efficient alternative. We show that it is possible to optimally embed finite sets of shapes in shape space into a Euclidean space. After embedding, classical coordinate-based trees can be used for efficient shape retrieval. The embedding proposed in the paper is optimal in the sense that it least distorts the partial Procrustes shape distance. The proposed indexing technique is used to retrieve images by vertebral shape from the NHANES II database of cervical and lumbar spine x-ray images maintained at the National Library of Medicine. Vertebral shape strongly correlates with the presence of osteophytes, and shape similarity retrieval is proposed as a tool for retrieval by osteophyte presence and severity. Experimental results included in the paper evaluate (1) the usefulness of shape-similarity as a proxy for osteophytes, (2) the computational and disk access efficiency of the new indexing scheme, (3) the relative performance of indexing with embedding to the performance of indexing without embedding, and (4) the computational cost of indexing using the proposed embedding versus the cost of an alternate embedding. The experimental results clearly show the relevance of shape indexing and the advantage of using the proposed embedding. PMID:20163981
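A minimal sketch of the shape-space machinery involved — the partial Procrustes distance between two landmark configurations, which the embedding is designed to least distort — might look like this. The landmark coordinates are hypothetical; this is not the paper's indexing code:

```python
import math

def partial_procrustes_distance(x, y):
    """Partial Procrustes distance between two 2-D landmark shapes given
    as lists of (x, y) pairs. Each shape is represented as a complex
    vector, centred, and scaled to unit size; after optimal rotation the
    distance is sqrt(2 - 2*cos(rho)), where rho is the geodesic distance
    on the shape manifold. Illustrative sketch only."""
    zx = [complex(a, b) for a, b in x]
    zy = [complex(a, b) for a, b in y]
    mx = sum(zx) / len(zx); my = sum(zy) / len(zy)
    zx = [z - mx for z in zx]; zy = [z - my for z in zy]       # centre
    nx = math.sqrt(sum(abs(z) ** 2 for z in zx))
    ny = math.sqrt(sum(abs(z) ** 2 for z in zy))
    zx = [z / nx for z in zx]; zy = [z / ny for z in zy]       # unit scale
    inner = abs(sum(a.conjugate() * b for a, b in zip(zx, zy)))
    rho = math.acos(min(1.0, inner))                           # optimal rotation
    return math.sqrt(2.0 - 2.0 * math.cos(rho))

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
rotated = [(0, 0), (0, 1), (-1, 1), (-1, 0)]   # same square, rotated 90 degrees
d = partial_procrustes_distance(square, rotated)   # ~0: shape is unchanged
```

Because translation, scale, and rotation are removed, two configurations of the same shape are at distance zero; a Euclidean embedding that approximately preserves this metric is what allows classical coordinate-based trees to be used.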
XTCE (XML Telemetric and Command Exchange) Standard Making It Work at NASA. Can It Work For You?
NASA Technical Reports Server (NTRS)
Munoz-Fernandez, Michela; Smith, Danford S.; Rice, James K.; Jones, Ronald A.
2017-01-01
The XML Telemetric and Command Exchange (XTCE) standard is intended as a way to describe telemetry and command databases to be exchanged across centers and space agencies. XTCE usage has the potential to lead to consolidation of the Mission Operations Center (MOC) monitor and control displays for mission cross-support, reducing equipment and configuration costs, as well as decreasing the turnaround time for telemetry and command modifications during all mission phases. The adoption of XTCE will reduce software maintenance costs by reducing the variation between our existing mission dictionaries. The main objective of this poster is to show how powerful XTCE is in terms of interoperability across centers and missions. We provide results for a use case in which two centers use their local tools to process and display the same mission telemetry in their MOCs independently of one another. In our use case we first quantified the ability of XTCE to capture the telemetry definitions of the mission using our suite of support tools (conversion, validation, and compliance measurement). The next step was to show processing and monitoring of the same telemetry in two mission centers. Once the database was converted to XTCE using our tool, the XTCE file became our primary database; it was shared among the various tool chains through their XTCE importers and ultimately configured to ingest the telemetry stream and display or capture the telemetered information in similar ways. Summary results include the ability to take a real mission database and real mission telemetry and display them on various tools from two centers, using freely available commercial off-the-shelf (COTS) software.
Nomura, Kaori; Takahashi, Kunihiko; Hinomura, Yasushi; Kawaguchi, Genta; Matsushita, Yasuyuki; Marui, Hiroko; Anzai, Tatsuhiko; Hashiguchi, Masayuki; Mochizuki, Mayumi
2015-01-01
Background The use of a statistical approach to analyze cumulative adverse event (AE) reports has been encouraged by regulatory authorities. However, data variations affect statistical analyses (eg, signal detection). Further, differences in regulations, social issues, and health care systems can cause variations in AE data. The present study examined similarities and differences between two publicly available databases, ie, the Japanese Adverse Drug Event Report (JADER) database and the US Food and Drug Administration Adverse Event Reporting System (FAERS), and how they affect signal detection. Methods Two AE data sources from 2010 were examined, ie, JADER cases (JP) and Japanese cases extracted from the FAERS (FAERS-JP). Three methods for signals of disproportionate reporting, ie, the reporting odds ratio, Bayesian confidence propagation neural network, and Gamma Poisson Shrinker (GPS), were used on drug-event combinations for three substances frequently recorded in both systems. Results The two databases showed similar elements of AE reports, but no option was provided for a shareable case identifier. The average number of AEs per case was 1.6±1.3 (maximum 37) in the JP and 3.3±3.5 (maximum 62) in the FAERS-JP. Between 5% and 57% of all AEs were signaled by three quantitative methods for etanercept, infliximab, and paroxetine. Signals identified by GPS for the JP and FAERS-JP, as referenced by Japanese labeling, showed higher positive sensitivity than was expected. Conclusion The FAERS-JP was different from the JADER. Signals derived from both datasets identified different results, but shared certain signals. Discrepancies in type of AEs, drugs reported, and average number of AEs per case were potential contributing factors. This study will help those concerned with pharmacovigilance better understand the use and pitfalls of using spontaneous AE data. PMID:26109846
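The reporting odds ratio used as one of the three signal-detection methods can be sketched as follows. The 2×2 counts are hypothetical, not values from JADER or FAERS:

```python
import math

def reporting_odds_ratio(a, b, c, d):
    """Reporting odds ratio with 95% CI from a 2x2 disproportionality table:
       a = target drug & target event,  b = target drug & other events,
       c = other drugs & target event,  d = other drugs & other events.
    Counts are hypothetical, not values from the JADER/FAERS study."""
    ror = (a / b) / (c / d)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of ln(ROR)
    lo = math.exp(math.log(ror) - 1.96 * se)
    hi = math.exp(math.log(ror) + 1.96 * se)
    return ror, lo, hi

ror, lo, hi = reporting_odds_ratio(a=20, b=480, c=100, d=9400)

# A commonly used signal criterion: lower 95% CI bound > 1
# with at least 2 (often 3) reports of the drug-event pair
signal = lo > 1.0 and 20 >= 2
```

Because the ROR is computed entirely from report counts, differences in reporting patterns between databases (as found for JADER vs. FAERS-JP) translate directly into different signals.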
Development of a System Model for Non-Invasive Quantification of Bilirubin in Jaundice Patients
NASA Astrophysics Data System (ADS)
Alla, Suresh K.
Neonatal jaundice is a medical condition which occurs in newborns as a result of an imbalance between the production and elimination of bilirubin. Excess bilirubin in the blood stream diffuses into the surrounding tissue, leading to a yellowing of the skin. An optical system integrated with a signal processing system is used as a platform to noninvasively quantify bilirubin concentration through the measurement of diffuse skin reflectance. Initial studies have led to the generation of a clinical analytical model for neonatal jaundice which generates spectral reflectance data for jaundiced skin with varying levels of bilirubin concentration in the tissue. The spectral database built using the clinical analytical model is then used as a test database to validate the signal processing system in real time. This evaluation forms the basis for understanding the translation of this research to human trials. The clinical analytical model and signal processing system have been successfully validated on three spectral databases. The first spectral database was constructed using a porcine model as a surrogate for neonatal skin tissue. Samples of pig skin were soaked in bilirubin solutions of varying concentrations to simulate jaundiced skin conditions. The resulting skin samples were analyzed with our skin reflectance systems, producing bilirubin concentration values that show a high correlation (R^2 = 0.94) with the concentration of the bilirubin solution in which each porcine tissue sample was soaked. The second spectral database consists of spectral measurements collected on human volunteers to quantify the different chromophores and other physical properties of the tissue, such as hematocrit and hemoglobin. The third spectral database is the spectral data collected at different time periods from the moment a bruise is induced.
The EXOSAT database and archive
NASA Technical Reports Server (NTRS)
Reynolds, A. P.; Parmar, A. N.
1992-01-01
The EXOSAT database provides on-line access to the results and data products (spectra, images, and lightcurves) from the EXOSAT mission as well as access to data and logs from a number of other missions (such as EINSTEIN, COS-B, ROSAT, and IRAS). In addition, a number of familiar optical, infrared, and X-ray catalogs, including the Hubble Space Telescope (HST) guide star catalog, are available. The complete database is located at the EXOSAT observatory at ESTEC in the Netherlands and is accessible remotely via a captive account. The database management system was specifically developed to efficiently access the database and to allow the user to perform statistical studies on large samples of astronomical objects as well as to retrieve scientific and bibliographic information on single sources. The system was designed to be mission independent and includes timing, image processing, and spectral analysis packages as well as software to allow the easy transfer of analysis results and products to the user's own institute. The archive at ESTEC comprises a subset of the EXOSAT observations, stored on magnetic tape. Observations of particular interest were copied in compressed format to an optical jukebox, allowing users to retrieve and analyze selected raw data entirely from their terminals. Such analysis may be necessary if the user's needs are not accommodated by the products contained in the database (in terms of time resolution, spectral range, and the finesse of the background subtraction, for instance). Long-term archiving of the full final observation data is taking place at ESRIN in Italy as part of the ESIS program, again using optical media, and ESRIN has now assumed responsibility for distributing the data to the community. Tests showed that raw observational data (typically several tens of megabytes for a single target) can be transferred via the existing networks in reasonable time.
The electric dipole moment of DNA-binding HU protein calculated by the use of an NMR database.
Takashima, S; Yamaoka, K
1999-08-30
Electric birefringence measurements indicated the presence of a large permanent dipole moment in the HU protein-DNA complex. In order to substantiate this observation, numerical computation of the dipole moment of the HU protein homodimer was carried out using NMR protein databases. The dipole moments of globular proteins have hitherto been calculated with X-ray databases; NMR data had never been used before. The advantages of NMR databases are: (a) NMR data are obtained, unlike X-ray databases, using protein solutions. Accordingly, this method eliminates the bothersome question as to the possible alteration of the protein structure due to the transition from the crystalline state to the solution state. This question is particularly important for proteins such as HU protein, which has some degree of internal flexibility; (b) the three-dimensional coordinates of hydrogen atoms in protein molecules can be determined with sufficient resolution, and this enables the N-H as well as C=O bond moments to be calculated. Since the NMR database of HU protein from Bacillus stearothermophilus consists of 25 models, the surface charge as well as the core dipole moments were computed for each of these structures. The results of these calculations show that the net permanent dipole moment of the HU protein homodimer is approximately 500-530 D (1 D = 3.33 × 10^-30 C·m) at pH 7.5 and 600-630 D at the isoelectric point (pH 10.5). These permanent dipole moments are unusually large for a small protein of 19.5 kDa. Nevertheless, the result of the numerical calculations is compatible with the electro-optical observation, confirming a very large dipole moment in this protein.
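The underlying point-charge calculation — the net dipole moment as the charge-weighted sum of atomic coordinates — can be sketched as follows. This is a toy two-charge system, not the 25-model HU protein computation:

```python
import math

E = 1.602176634e-19   # elementary charge, C
DEBYE = 3.33564e-30   # 1 D in C*m

def dipole_moment_debye(charges, coords):
    """Net dipole moment mu = sum(q_i * r_i) for a set of point charges,
    returned in debye. Charges are in units of e, coordinates in metres.
    As written this is origin-independent only for net-neutral systems."""
    mu = [sum(q * E * r[k] for q, r in zip(charges, coords)) for k in range(3)]
    return math.sqrt(sum(c * c for c in mu)) / DEBYE

# +1 e and -1 e separated by 1 angstrom along x gives roughly 4.8 D
mu = dipole_moment_debye([1.0, -1.0], [(1e-10, 0.0, 0.0), (0.0, 0.0, 0.0)])
```

Scaling this up to every partial charge in a 19.5 kDa homodimer is what yields the 500-630 D values reported above; the per-model repetition over the 25 NMR structures simply re-runs the same sum with different coordinates.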
Optimized volume models of earthquake-triggered landslides
Xu, Chong; Xu, Xiwei; Shen, Lingling; Yao, Qi; Tan, Xibin; Kang, Wenjun; Ma, Siyuan; Wu, Xiyan; Cai, Juntao; Gao, Mingxing; Li, Kang
2016-01-01
In this study, we proposed three optimized models for calculating the total volume of landslides triggered by the 2008 Wenchuan, China Mw 7.9 earthquake. First, we calculated the volume of each deposit of 1,415 landslides triggered by the quake based on pre- and post-quake DEMs at 20 m resolution. The samples were used to fit the conventional landslide "volume-area" power-law relationship and the 3 optimized models we proposed, respectively. Two data-fitting methods, i.e. log-transformed linear and original-data-based nonlinear least squares, were applied to the 4 models. Results show that original-data-based nonlinear least squares combined with an optimized model considering length, width, height, lithology, slope, peak ground acceleration, and slope aspect gives the best performance. This model was subsequently applied to the database of landslides triggered by the quake, except for the two largest ones with known volumes. It indicates that the total volume of the 196,007 landslides is about 1.2 × 10^10 m^3 in deposit materials and 1 × 10^10 m^3 in source areas, respectively. The estimate from the published relationship between quake magnitude and total landslide volume for individual earthquakes is much lower than that from this study, which suggests the need to update the power-law relationship. PMID:27404212
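The log-transformed linear fitting strategy compared in the study can be sketched for the conventional volume-area power law V = a·A^b. The data below are synthetic and noise-free, not the Wenchuan inventory:

```python
import math

def fit_power_law(areas, volumes):
    """Fit V = a * A**b by linear least squares on log-transformed data,
    one of the two fitting strategies mentioned above. Synthetic data only."""
    xs = [math.log(a) for a in areas]
    ys = [math.log(v) for v in volumes]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)       # slope in log-log space
    a = math.exp(my - b * mx)                # intercept back-transformed
    return a, b

# Synthetic landslides generated from V = 0.05 * A**1.4 (no noise)
areas = [100.0, 500.0, 2000.0, 10000.0, 50000.0]
vols = [0.05 * A ** 1.4 for A in areas]
a, b = fit_power_law(areas, vols)   # recovers a ~ 0.05, b ~ 1.4
```

With noisy real data the log-transformed and original-data nonlinear fits generally disagree, because log-transformation changes the error weighting, which is exactly the comparison the study makes.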
An expression database for roots of the model legume Medicago truncatula under salt stress
Li, Daofeng; Su, Zhen; Dong, Jiangli; Wang, Tao
2009-11-11
Background Medicago truncatula is a model legume whose genome is currently being sequenced by an international consortium. Abiotic stresses such as salt stress limit plant growth and crop productivity, including those of legumes. We anticipate that studies on M. truncatula will shed light on other economically important legumes across the world. Here, we report the development of a database called MtED that contains gene expression profiles of the roots of M. truncatula based on time-course salt stress experiments using the Affymetrix Medicago GeneChip. Our hope is that MtED will provide information to assist in improving abiotic stress resistance in legumes. Description The results of our microarray experiment with roots of M. truncatula under 180 mM sodium chloride were deposited in the MtED database. Additionally, sequence and annotation information regarding microarray probe sets were included. MtED provides functional category analysis based on Gene and GeneBins Ontology, and other Web-based tools for querying and retrieving query results, browsing pathways and transcription factor families, showing metabolic maps, and comparing and visualizing expression profiles. Utilities such as mapping probe sets to the genome of M. truncatula and in-silico PCR were implemented with the BLAT software suite and are also available through the MtED database. Conclusion MtED was built in the PHP script language and as a MySQL relational database system on a Linux server. It has an integrated Web interface, which facilitates ready examination and interpretation of the results of microarray experiments. It is intended to help in selecting gene markers to improve abiotic stress resistance in legumes. MtED is available at http://bioinformatics.cau.edu.cn/MtED/. PMID:19906315
A Brief Review of RNA–Protein Interaction Database Resources
Yi, Ying; Zhao, Yue; Huang, Yan; Wang, Dong
2017-01-01
RNA–Protein interactions play critical roles in various biological processes. By collecting and analyzing RNA–Protein interactions and binding sites from experiments and predictions, RNA–Protein interaction databases have become an essential resource for exploring the transcriptional and post-transcriptional regulatory network. Here, we briefly review several widely used RNA–Protein interaction database resources developed in recent years to provide a guide to these databases. The content and major functions of each database are presented. These brief descriptions help users quickly choose the database containing the information they are interested in. In short, these RNA–Protein interaction database resources are continually updated, and their current state reflects the ongoing effort to identify and analyze the large number of RNA–Protein interactions. PMID:29657278
NLTE4 Plasma Population Kinetics Database
National Institute of Standards and Technology Data Gateway
SRD 159 NLTE4 Plasma Population Kinetics Database (Web database for purchase) This database contains benchmark results for simulation of plasma population kinetics and emission spectra. The data were contributed by the participants of the 4th Non-LTE Code Comparison Workshop who have unrestricted access to the database. The only limitation for other users is in hidden labeling of the output results. Guest users can proceed to the database entry page without entering userid and password.
Mugisa, Dana J; Katimbo, Abia; Sempiira, John E; Kisaalita, William S
2016-05-01
Sub-Saharan African women on small-acreage farms carry a disproportionately high labor burden, which is one of the main reasons they are unable to produce for both home and the market and realize higher incomes. Labor-saving interventions such as hand-tools are needed to save time and/or increase productivity in, for example, land preparation for crop and animal agriculture, post-harvest processing, and meeting daily energy and water needs. Development of such tools requires comprehensive and context-specific anthropometric data (body dimensions), and existing databases based on Western women may be less relevant. We conducted measurements on 89 women to provide preliminary results toward answering two questions: first, how well existing databases apply to the design of hand-tools for sub-Saharan African women; second, how universal body dimension predictive models are among ethnic groups. Our results show that body dimensions differ between the Bantu and Nilotic ethnolinguistic groups, and that both differ from those of American women. These results strongly support the need for establishing anthropometric databases for sub-Saharan African women for hand-tool design. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.
The Emotional Movie Database (EMDB): a self-report and psychophysiological study.
Carvalho, Sandra; Leite, Jorge; Galdo-Álvarez, Santiago; Gonçalves, Oscar F
2012-12-01
Film clips are an important tool for evoking emotional responses in the laboratory. When compared with other emotionally potent visual stimuli (e.g., pictures), film clips seem to be more effective in eliciting emotions for longer periods of time at both the subjective and physiological levels. The main objective of the present study was to develop a new database of affective film clips without auditory content, based on a dimensional approach to emotional stimuli (valence, arousal and dominance). The study had three phases: (1) the pre-selection and editing of 52 film clips; (2) the self-report rating of these film clips by a sample of 113 participants; and (3) psychophysiological assessment [skin conductance level (SCL) and heart rate (HR)] of 32 volunteers. Film clips from different categories were selected to elicit emotional states from different quadrants of affective space. The results also showed that sustained exposure to the affective film clips produced a pattern of SCL increase and HR deceleration in high-arousal conditions (i.e., the horror and erotic conditions). The resulting emotional movie database can reliably be used in research requiring the presentation of non-auditory film clips with different ratings of valence, arousal and dominance.
An effective model for store and retrieve big health data in cloud computing.
Goli-Malekabadi, Zohreh; Sargolzaei-Javan, Morteza; Akbari, Mohammad Kazem
2016-08-01
The volume of healthcare data, including different and variable text types, sounds, and images, is increasing day by day. The storage and processing of these data is therefore a necessary and challenging issue. Relational databases are generally used for storing health data, but they are unable to handle its massive and diverse nature. This study aimed at presenting a model based on NoSQL databases for the storage of healthcare data. Among the different types of NoSQL databases, document-based DBs were selected after a survey of the nature of health data. The presented model was implemented in a Cloud environment to gain its distribution properties. The data were then distributed across the database by sharding. The efficiency of the model was evaluated against the previous data model, a Relational Database, considering query time, data preparation, flexibility, and extensibility. The results showed that the presented model performed approximately the same as SQL Server for "read" queries while acting more efficiently than SQL Server for "write" queries. The performance of the presented model was also better than SQL Server in terms of flexibility, data preparation and extensibility. Based on these observations, the proposed model is more effective than Relational Databases for handling health data. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
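The document model described above can be illustrated with a minimal, dependency-free sketch: heterogeneous "documents" (notes, imaging metadata, audio metadata) are appended without a fixed schema and filtered on whatever fields they happen to carry. All record fields and helper names here are hypothetical, not the study's actual implementation (which used a document DB with sharding in a Cloud environment).

```python
import json

# Hypothetical heterogeneous health records: each document carries only
# the fields relevant to it, with no fixed relational schema.
records = [
    {"patient_id": 1, "type": "note", "text": "routine checkup"},
    {"patient_id": 1, "type": "image", "modality": "MRI", "size_mb": 48.2},
    {"patient_id": 2, "type": "audio", "duration_s": 12.5},
]

def write(store, doc):
    """A 'write' is an append of a schemaless JSON document."""
    # Serialize round-trip, as a document DB would store/retrieve JSON.
    store.append(json.loads(json.dumps(doc)))

def read(store, **criteria):
    """A 'read' filters documents on whatever fields they define."""
    return [d for d in store if all(d.get(k) == v for k, v in criteria.items())]

store = []
for r in records:
    write(store, r)

print(len(read(store, patient_id=1)))           # -> 2
print(read(store, type="audio")[0]["duration_s"])  # -> 12.5
```

Adding a new record type requires no schema migration, which is the flexibility/extensibility advantage the abstract reports over the relational baseline.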
Keeping Track of Our Treasures: Managing Historical Data with Relational Database Software.
ERIC Educational Resources Information Center
Gutmann, Myron P.; And Others
1989-01-01
Describes the way a relational database management system manages a large historical data collection project. Shows that such databases are practical to construct. States that the programing tasks involved are not for beginners, but the rewards of having data organized are worthwhile. (GG)
Efficient Single-Pass Index Construction for Text Databases.
ERIC Educational Resources Information Center
Heinz, Steffen; Zobel, Justin
2003-01-01
Discusses index construction for text collections, reviews principal approaches to inverted indexes, analyzes their theoretical cost, and presents experimental results of the use of a single-pass inversion method on Web document collections. Shows that the single-pass approach is faster and does not require the complete vocabulary of the indexed…
Variogram-based feature extraction for neural network recognition of logos
NASA Astrophysics Data System (ADS)
Pham, Tuan D.
2003-03-01
This paper presents a new approach for extracting spatial features of images based on the theory of regionalized variables. These features can be effectively used for automatic recognition of logo images using neural networks. Experimental results on a public-domain logo database show the effectiveness of the proposed approach.
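A minimal sketch of the underlying idea, assuming the simplest case: the empirical semivariogram gamma(h) = 0.5 * E[(Z(x+h) - Z(x))^2] computed along image rows for a toy binary "logo". The real system derives richer regionalized-variable features and feeds them to a neural network; this shows only the variogram computation itself.

```python
import numpy as np

def empirical_variogram(img, max_lag):
    """Semivariance gamma(h) = 0.5 * mean((Z(x+h) - Z(x))**2) along the row axis."""
    img = np.asarray(img, dtype=float)
    gammas = []
    for h in range(1, max_lag + 1):
        diffs = img[:, h:] - img[:, :-h]   # all horizontal pairs at lag h
        gammas.append(0.5 * np.mean(diffs ** 2))
    return np.array(gammas)

# Toy "logo": a vertical edge; semivariance grows with lag for this pattern,
# which is the kind of spatial signature usable as a feature vector.
logo = np.zeros((8, 8))
logo[:, 4:] = 1.0
g = empirical_variogram(logo, max_lag=4)
print(g)
```

The vector `g` (one value per lag) would be the spatial feature fed to the classifier.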
A Machine Reading System for Assembling Synthetic Paleontological Databases
Peters, Shanan E.; Zhang, Ce; Livny, Miron; Ré, Christopher
2014-01-01
Many aspects of macroevolutionary theory and our understanding of biotic responses to global environmental change derive from literature-based compilations of paleontological data. Existing manually assembled databases are, however, incomplete and difficult to assess and enhance with new data types. Here, we develop and validate the quality of a machine reading system, PaleoDeepDive, that automatically locates and extracts data from heterogeneous text, tables, and figures in publications. PaleoDeepDive performs comparably to humans in several complex data extraction and inference tasks and generates congruent synthetic results that describe the geological history of taxonomic diversity and genus-level rates of origination and extinction. Unlike traditional databases, PaleoDeepDive produces a probabilistic database that systematically improves as information is added. We show that the system can readily accommodate sophisticated data types, such as morphological data in biological illustrations and associated textual descriptions. Our machine reading approach to scientific data integration and synthesis brings within reach many questions that are currently underdetermined and does so in ways that may stimulate entirely new modes of inquiry. PMID:25436610
Visual Attention Modeling for Stereoscopic Video: A Benchmark and Computational Model.
Fang, Yuming; Zhang, Chi; Li, Jing; Lei, Jianjun; Perreira Da Silva, Matthieu; Le Callet, Patrick
2017-10-01
In this paper, we investigate the visual attention modeling for stereoscopic video from the following two aspects. First, we build one large-scale eye tracking database as the benchmark of visual attention modeling for stereoscopic video. The database includes 47 video sequences and their corresponding eye fixation data. Second, we propose a novel computational model of visual attention for stereoscopic video based on Gestalt theory. In the proposed model, we extract the low-level features, including luminance, color, texture, and depth, from discrete cosine transform coefficients, which are used to calculate feature contrast for the spatial saliency computation. The temporal saliency is calculated by the motion contrast from the planar and depth motion features in the stereoscopic video sequences. The final saliency is estimated by fusing the spatial and temporal saliency with uncertainty weighting, which is estimated by the laws of proximity, continuity, and common fate in Gestalt theory. Experimental results show that the proposed method outperforms the state-of-the-art stereoscopic video saliency detection models on our built large-scale eye tracking database and one other database (DML-ITRACK-3D).
Long-term cycles in the history of life: periodic biodiversity in the paleobiology database.
Melott, Adrian L
2008-01-01
Time series analysis of fossil biodiversity of marine invertebrates in the Paleobiology Database (PBDB) shows a significant periodicity at approximately 63 My, in agreement with previous analyses based on the Sepkoski database. I discuss how this result did not appear in a previous analysis of the PBDB. The existence of the 63 My periodicity, despite very different treatment of systematic error in both PBDB and Sepkoski databases strongly argues for consideration of its reality in the fossil record. Cross-spectral analysis of the two datasets finds that a 62 My periodicity coincides in phase by 1.6 My, equivalent to better than the errors in either measurement. Consequently, the two data sets not only contain the same strong periodicity, but its peaks and valleys closely correspond in time. Two other spectral peaks appear in the PBDB analysis, but appear to be artifacts associated with detrending and with the increased interval length. Sampling-standardization procedures implemented by the PBDB collaboration suggest that the signal is not an artifact of sampling bias. Further work should focus on finding the cause of the 62 My periodicity.
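The kind of periodicity detection described can be sketched with an ordinary periodogram on synthetic data: a roughly 62 My sinusoid plus noise, sampled every 1 My. This illustrates spectral peak-finding only, not the paper's actual detrending and significance methodology.

```python
import numpy as np

# Synthetic detrended "diversity" series over 500 My with an injected
# ~62 My cycle plus noise (illustrative only, not PBDB data).
rng = np.random.default_rng(0)
t = np.arange(0, 500, 1.0)                 # time in My
signal = np.sin(2 * np.pi * t / 62.0) + 0.3 * rng.standard_normal(t.size)

# Periodogram: power at each Fourier frequency; the peak gives the
# dominant period (DC term excluded).
power = np.abs(np.fft.rfft(signal - signal.mean())) ** 2
freqs = np.fft.rfftfreq(t.size, d=1.0)     # cycles per My
dominant_period = 1.0 / freqs[1:][np.argmax(power[1:])]
print(round(dominant_period, 1))           # close to the injected 62 My
```

Note the frequency grid is discrete (k/500 cycles per My), so the recovered period lands on the nearest grid point rather than exactly 62.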
Extraction of land cover change information from ENVISAT-ASAR data in Chengdu Plain
NASA Astrophysics Data System (ADS)
Xu, Wenbo; Fan, Jinlong; Huang, Jianxi; Tian, Yichen; Zhang, Yong
2006-10-01
Land cover data are essential to most global change research objectives, including the assessment of current environmental conditions and the simulation of future environmental scenarios that ultimately lead to public policy development. The Chinese Academy of Sciences generated a nationwide land cover database in order to quantify and spatially characterize land use/cover changes (LUCC) in the 1990s. To keep the database reliable, it must be updated regularly, but obtaining remote sensing data for extracting land cover change information at large scale is difficult; in particular, optical remote sensing data are hard to acquire over the Chengdu plain. The objective of this research was therefore to evaluate multitemporal ENVISAT advanced synthetic aperture radar (ASAR) data for extracting land cover change information. Based on fieldwork and the nationwide 1:100000 land cover database, the paper assesses several land cover changes in the Chengdu plain, for example: crop to buildings, forest to buildings, and forest to bare land. The results show that ENVISAT ASAR data have great potential for extracting land cover change information.
Results from prototype die-to-database reticle inspection system
NASA Astrophysics Data System (ADS)
Mu, Bo; Dayal, Aditya; Broadbent, Bill; Lim, Phillip; Goonesekera, Arosha; Chen, Chunlin; Yeung, Kevin; Pinto, Becky
2009-03-01
A prototype die-to-database high-resolution reticle defect inspection system has been developed for 32nm and below logic reticles, and 4X Half Pitch (HP) production and 3X HP development memory reticles. These nodes will use predominantly 193nm immersion lithography (with some layers double patterned), although EUV may also be used. Many different reticle types may be used for these generations including: binary (COG, EAPSM), simple tritone, complex tritone, high transmission, dark field alternating (APSM), mask enhancer, CPL, and EUV. Finally, aggressive model based OPC is typically used, which includes many small structures such as jogs, serifs, and SRAF (sub-resolution assist features), accompanied by very small gaps between adjacent structures. The architecture and performance of the prototype inspection system is described. This system is designed to inspect the aforementioned reticle types in die-to-database mode. Die-to-database inspection results are shown on standard programmed defect test reticles, as well as advanced 32nm logic, and 4X HP and 3X HP memory reticles from industry sources. Direct comparisons with current-generation inspection systems show measurable sensitivity improvement and a reduction in false detections.
Garraín, Daniel; Fazio, Simone; de la Rúa, Cristina; Recchioni, Marco; Lechón, Yolanda; Mathieux, Fabrice
2015-01-01
The aim of this study is to identify areas of potential improvement of the European Reference Life Cycle Database (ELCD) fuel datasets. The revision is based on the data quality indicators described by the ILCD Handbook, applied on a sectorial basis. These indicators evaluate the technological, geographical and time-related representativeness of the dataset and the appropriateness in terms of completeness, precision and methodology. Results show that ELCD fuel datasets have a very good quality in general terms; nevertheless, some findings and recommendations for improving the quality of Life-Cycle Inventories have been derived. Moreover, these results confirm the quality of the fuel-related datasets for any LCA practitioner, and provide insights into the limitations and assumptions underlying the dataset modelling. Given this information, the LCA practitioner will be able to decide whether the use of the ELCD fuel datasets is appropriate based on the goal and scope of the analysis to be conducted. The methodological approach would also be useful for dataset developers and reviewers, in order to improve the overall DQR of databases.
Wang, Chuan; Li, Yafeng; Gao, Shoucui; Cheng, Daxin; Zhao, Sihai; Liu, Enqi
2015-01-01
To evaluate the beneficial and adverse effects of breviscapine injection in combination with Western medicine on the treatment of patients with angina pectoris. The Cochrane Central Register of Controlled Trials, Medline, Science Citation Index, EMBASE, the China National Knowledge Infrastructure, the Wanfang Database, the Chongqing VIP Information Database and the China Biomedical Database were searched to identify randomized clinical trials (RCTs) that evaluated the effects of Western medicine compared to breviscapine injection plus Western medicine on angina pectoris patients. The included studies were analyzed using RevMan 5.1.0 software. The literature search yielded 460 studies, wherein 16 studies matched the selection criteria. The results showed that combined therapy using Breviscapine plus Western medicine was superior to Western medicine alone for improving angina pectoris symptoms (OR=3.77, 95% Cl: 2.76~5.15) and also resulted in increased electrocardiogram (ECG) improvement (OR=2.77, 95% Cl: 2.16~3.53). The current evidence suggests that Breviscapine plus Western medicine achieved a superior therapeutic effect compared to Western medicine alone. PMID:26052709
Provenance of whitefish in the Gulf of Bothnia determined by elemental analysis of otolith cores
NASA Astrophysics Data System (ADS)
Lill, J.-O.; Finnäs, V.; Slotte, J. M. K.; Jokikokko, E.; Heimbrand, Y.; Hägerstrand, H.
2018-02-01
The strontium concentration in the core of otoliths was used to determine the provenance of whitefish found in the Gulf of Bothnia, Baltic Sea. To that end, a database of strontium concentrations in fish otoliths representing different habitats (sea, river and fresh water) had to be built. Otoliths from juvenile whitefish were therefore collected from freshwater ponds at 5 hatcheries, from adult whitefish from 6 spawning sites at sea along the Finnish west coast, and from adult whitefish ascending to spawn in the Torne River, in total 67 otoliths. PIXE was applied to determine the elemental concentrations in these otoliths. While otoliths from the juveniles raised in the freshwater ponds showed low but varying strontium concentrations (194-1664 μg/g), otoliths from sea-spawning fish showed uniformly high strontium levels (3720-4333 μg/g). The otolith core analysis of whitefish from the Torne River showed large variations in strontium concentration (1525-3650 μg/g). These otolith data form a database to be used for provenance studies of wild adult whitefish caught at sea. The applicability of the database was evaluated by analyzing the core of polished otoliths from 11 whitefish from a test site at sea in the Larsmo archipelago. Our results show that by analyzing strontium in the otolith core, we can differentiate between hatchery-origin and wild-origin whitefish, but not always between river- and sea-spawning whitefish.
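The habitat ranges reported above suggest a simple rule-based classifier; the thresholds below are illustrative cut-offs between the reported freshwater (194-1664 μg/g) and sea (3720-4333 μg/g) ranges, not values from the study.

```python
def classify_origin(sr_core_ppm):
    """Rule-of-thumb classification of otolith-core Sr (ug/g), using
    illustrative thresholds between the habitat ranges in the abstract."""
    if sr_core_ppm < 1700:
        return "hatchery (freshwater pond)"
    if sr_core_ppm > 3700:
        return "sea-spawned"
    # River-origin cores (1525-3650 ug/g) overlap both directions.
    return "ambiguous (river or sea)"

print(classify_origin(800))    # within the freshwater range
print(classify_origin(4000))   # within the sea range
print(classify_origin(2500))   # within the overlapping river range
```

The ambiguous middle band mirrors the study's finding that hatchery vs. wild origin is separable, but river vs. sea spawning often is not.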
Improved Bond Equations for Fiber-Reinforced Polymer Bars in Concrete
Pour, Sadaf Moallemi; Alam, M. Shahria; Milani, Abbas S.
2016-01-01
This paper explores a set of new equations to predict the bond strength between fiber reinforced polymer (FRP) rebar and concrete. The proposed equations are based on a comprehensive statistical analysis of existing experimental results in the literature. Namely, the parameters with the greatest effect on the bond behavior of FRP-reinforced concrete were first identified by applying a factorial analysis to part of the available database. Then the database, which contains 250 pullout tests, was divided into four groups based on the concrete compressive strength and the rebar surface. Afterward, nonlinear regression analysis was performed for each study group in order to determine the bond equations. The results show that the proposed equations predict bond strengths more accurately than the other previously reported models. PMID:28773859
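The fitting step can be sketched as follows, assuming a hypothetical power-law bond model tau = a * fc**b linearized by logarithms and solved by least squares on synthetic data; the paper's actual equations and parameter sets are more elaborate.

```python
import numpy as np

# Synthetic pullout "data": bond strength tau generated from a power law
# in concrete compressive strength fc, with multiplicative noise.
rng = np.random.default_rng(1)
fc = rng.uniform(20, 60, 50)                                    # MPa
tau = 2.5 * fc ** 0.5 * np.exp(0.05 * rng.standard_normal(50))  # MPa

# Linearize log(tau) = log(a) + b*log(fc) and solve by least squares.
A = np.column_stack([np.ones_like(fc), np.log(fc)])
(log_a, b), *_ = np.linalg.lstsq(A, np.log(tau), rcond=None)
a = np.exp(log_a)
print(round(a, 2), round(b, 2))  # recovers roughly a ~ 2.5, b ~ 0.5
```

Splitting the data into groups (as the paper does by strength class and rebar surface) and fitting each group separately is a direct extension of this loop.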
[Changes in nursing administration in supporting transplantation in Brazil].
Cintra, Vivian; Sanna, Maria Cristina
2005-01-01
This historical and bibliographic study aimed to understand how Nursing was organized to support care in transplantation. The HISA, LILACS, BDENF, PERIENF and DEDALUS databases were consulted, and thirteen references were found, ten of which were scientific articles, two were master's dissertations and one was a doctoral thesis. The span of time chosen for study ranges from the date of the first kidney transplant in Brazil (1965), to the date of publication of the last scientific article found in the databases mentioned above (2003). After reading these articles, the ones that were similar in topic were grouped together, thus creating the thematic axis for the presentation of the results. The results showed that the Nursing profession has played an important and active role in transplants ever since the first procedure in 1965.
Dynamic programming re-ranking for PPI interactor and pair extraction in full-text articles
2011-01-01
Background: Experimentally verified protein-protein interactions (PPIs) cannot be easily retrieved by researchers unless they are stored in PPI databases. The curation of such databases can be facilitated by employing text-mining systems to identify genes which play the interactor role in PPIs and to map these genes to unique database identifiers (interactor normalization task or INT) and then to return a list of interaction pairs for each article (interaction pair task or IPT). These two tasks are evaluated in terms of the area under curve of the interpolated precision/recall (AUC iP/R) score because the order of identifiers in the output list is important for ease of curation. Results: Our INT system developed for the BioCreAtIvE II.5 INT challenge achieved a promising AUC iP/R of 43.5% by using a support vector machine (SVM)-based ranking procedure. Using our new re-ranking algorithm, we have been able to improve system performance (AUC iP/R) by 1.84%. Our experimental results also show that with the re-ranked INT results, our unsupervised IPT system can achieve a competitive AUC iP/R of 23.86%, which outperforms the best BC II.5 INT system by 1.64%. Compared to using only SVM ranked INT results, using re-ranked INT results boosts AUC iP/R by 7.84%. Statistical significance t-test results show that our INT/IPT system with re-ranking outperforms that without re-ranking by a statistically significant difference. Conclusions: In this paper, we present a new re-ranking algorithm that considers co-occurrence among identifiers in an article to improve INT and IPT ranking results. Combining the re-ranked INT results with an unsupervised approach to find associations among interactors, the proposed method can boost the IPT performance. We also implement score computation using dynamic programming, which is faster and more efficient than traditional approaches. PMID:21342534
Wang, Weiqi; Wang, Yanbo Justin; Bañares-Alcántara, René; Coenen, Frans; Cui, Zhanfeng
2009-12-01
In this paper, data mining is used to analyze the data on the differentiation of mammalian Mesenchymal Stem Cells (MSCs), aiming at discovering known and hidden rules governing MSC differentiation, following the establishment of a web-based public database containing experimental data on the MSC proliferation and differentiation. To this effect, a web-based public interactive database comprising the key parameters which influence the fate and destiny of mammalian MSCs has been constructed and analyzed using Classification Association Rule Mining (CARM) as a data-mining technique. The results show that the proposed approach is technically feasible and performs well with respect to the accuracy of (classification) prediction. Key rules mined from the constructed MSC database are consistent with experimental observations, indicating the validity of the method developed and the first step in the application of data mining to the study of MSCs.
A comparative study of satellite estimation for solar insolation in Albania with ground measurements
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mitrushi, Driada, E-mail: driadamitrushi@yahoo.com; Berberi, Pëllumb, E-mail: pellumb.berberi@gmail.com; Muda, Valbona, E-mail: vmuda@hotmail.com
The main objective of this study is to compare data provided by the NASA database with available ground data for regions covered by the national meteorological network. NASA estimates that their measurements of average daily solar radiation have a root-mean-square deviation (RMSD) error of 35 W/m² (roughly 20% inaccuracy). Unfortunately, valid data from meteorological stations for regions of interest are quite rare in Albania. In these cases, use of the Solar Radiation Database of NASA would be a satisfactory solution for different case studies. Using a statistical method allows determining the most probable margins between the two sources of data. Comparison of mean insolation data provided by NASA with ground data on mean insolation provided by meteorological stations shows that ground data for mean insolation are, in all cases, underestimated compared with data provided by the NASA database. The converting factor is 1.149.
Combined use of computational chemistry and chemoinformatics methods for chemical discovery
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sugimoto, Manabu, E-mail: sugimoto@kumamoto-u.ac.jp; Institute for Molecular Science, 38 Nishigo-Naka, Myodaiji, Okazaki 444-8585; CREST, Japan Science and Technology Agency, 4-1-8 Honcho, Kawaguchi, Saitama 332-0012
2015-12-31
Data analysis of numerical results from computational chemistry calculations is carried out to obtain knowledge information about molecules. A molecular database is developed to systematically store chemical, electronic-structure, and knowledge-based information. The database is used to find molecules related to the keyword "cancer". Then electronic-structure calculations are performed to quantitatively evaluate the quantum chemical similarity of the molecules. Among the 377 compounds registered in the database, 24 molecules are found to be "cancer"-related. This set of molecules includes both carcinogens and anticancer drugs. The quantum chemical similarity analysis, which is carried out using numerical results of density-functional theory calculations, shows that, when some energy spectra are referred to, carcinogens are reasonably distinguished from the anticancer drugs. These spectral properties are therefore regarded as important measures for classification.
Expert Knowledge-Based Automatic Sleep Stage Determination by Multi-Valued Decision Making Method
NASA Astrophysics Data System (ADS)
Wang, Bei; Sugi, Takenao; Kawana, Fusae; Wang, Xingyu; Nakamura, Masatoshi
In this study, an expert knowledge-based automatic sleep stage determination system working on a multi-valued decision making method is developed. Visual inspection by a qualified clinician is adopted to obtain the expert knowledge database. The expert knowledge database consists of probability density functions of parameters for the various sleep stages, and sleep stages are determined automatically according to the conditional probability. In total, four subjects participated. The automatic sleep stage determination results showed close agreement with visual inspection for the stages awake, REM (rapid eye movement), light sleep and deep sleep. The constructed expert knowledge database reflects the distributions of characteristic parameters and can be adapted to the variable sleep data encountered in hospitals. The developed automatic determination technique, based on expert knowledge from visual inspection, can be an assistant tool enabling further inspection of sleep disorder cases in clinical practice.
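A minimal sketch of the decision rule, assuming a single hypothetical feature and one Gaussian density per stage: each stage's probability density stands in for the expert-knowledge database, and the stage with the highest conditional probability (equal priors assumed) wins. Real systems use many parameters per stage.

```python
import math

# Stand-in "expert knowledge database": per-stage density of one
# hypothetical feature, as (mean, std). Values are illustrative.
stage_models = {
    "awake": (10.0, 3.0),
    "light": (25.0, 5.0),
    "deep":  (60.0, 10.0),
}

def gaussian_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def determine_stage(x):
    """Pick the stage maximizing P(stage | x); with equal priors the
    likelihood from the density database alone decides."""
    return max(stage_models, key=lambda s: gaussian_pdf(x, *stage_models[s]))

print(determine_stage(9.0))   # -> awake
print(determine_stage(55.0))  # -> deep
```

Replacing the single Gaussian per stage with the empirically estimated density functions from clinician-scored records gives the multi-valued decision scheme the abstract describes.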
NASA Astrophysics Data System (ADS)
Velazquez, Enrique Israel
Improvements in medical and genomic technologies have dramatically increased the production of electronic data over the last decade. As a result, data management is rapidly becoming a major determinant, and urgent challenge, for the development of Precision Medicine. Although successful data management is achievable using Relational Database Management Systems (RDBMS), exponential data growth is a significant contributor to failure scenarios. Growing amounts of data can also be observed in other sectors, such as economics and business, which, together with the previous facts, suggests that alternate database approaches (NoSQL) may soon be required for efficient storage and management of big databases. However, this hypothesis has been difficult to test in the Precision Medicine field, since alternate database architectures are complex to assess and means to integrate heterogeneous electronic health records (EHR) with dynamic genomic data are not easily available. In this dissertation, we present a novel set of experiments for identifying NoSQL database approaches that enable effective data storage and management in Precision Medicine using patients' clinical and genomic information from the cancer genome atlas (TCGA). The first experiment draws on performance and scalability from biologically meaningful queries with differing complexity and database sizes. The second experiment measures performance and scalability in database updates without schema changes. The third experiment assesses performance and scalability in database updates with schema modifications due to dynamic data. We have identified two NoSQL approaches, based on Cassandra and Redis, which seem to be ideal database management systems for our precision medicine queries in terms of performance and scalability. We present these NoSQL approaches and show how they can be used to manage clinical and genomic big data.
Our research is relevant to public health since we focus on one of the main challenges to the development of Precision Medicine and, consequently, investigate a potential solution to the progressively increasing demands on health care.
The FRUITY database on AGB stars: past, present and future
NASA Astrophysics Data System (ADS)
Cristallo, S.; Piersanti, L.; Straniero, O.
2016-01-01
We present and show the features of the FRUITY database, an interactive web-based interface devoted to the nucleosynthesis in AGB stars. We describe the current available set of AGB models (largely expanded with respect to the original one) with masses in the range 1.3≤M/M⊙≤3.0 and metallicities -2.15≤[Fe/H]≤+0.15. We illustrate the details of our s-process surface distributions and we compare our results to observations. Moreover, we introduce a new set of models where the effects of rotation are taken into account. Finally, we shortly describe the next planned upgrades.
A new phase-correlation-based iris matching for degraded images.
Krichen, Emine; Garcia-Salicetti, Sonia; Dorizzi, Bernadette
2009-08-01
In this paper, we present a new phase-correlation-based iris matching approach in order to deal with degradations in iris images due to unconstrained acquisition procedures. Our matching system is a fusion of global and local Gabor phase-correlation schemes. The main originality of our local approach is that we do not only consider the correlation peak amplitudes but also their locations in different regions of the images. Results on several degraded databases, namely, the CASIA-BIOSECURE and Iris Challenge Evaluation 2005 databases, show the improvement of our method compared to two available reference systems, Masek and Open Source for Iris (OSRIS), in verification mode.
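The global phase-correlation component can be sketched as follows: normalize the cross-power spectrum of two images so that a true match yields a sharp peak whose location encodes the translation. The paper's local scheme additionally scores peak locations per region; this shows only the basic mechanism, on synthetic data rather than iris codes.

```python
import numpy as np

def phase_correlation(a, b):
    """Peak amplitude and location of the normalized cross-power spectrum;
    a clean match gives a sharp peak at the (row, col) translation."""
    A = np.fft.fft2(a)
    B = np.fft.fft2(b)
    R = A * np.conj(B)
    R /= np.abs(R) + 1e-12          # keep only phase information
    corr = np.real(np.fft.ifft2(R))
    peak = tuple(int(i) for i in np.unravel_index(np.argmax(corr), corr.shape))
    return corr.max(), peak

rng = np.random.default_rng(2)
img = rng.random((32, 32))
shifted = np.roll(np.roll(img, 3, axis=0), 5, axis=1)  # known translation
amp, loc = phase_correlation(shifted, img)
print(amp, loc)  # peak near 1.0, located at the (3, 5) shift
```

Degradations (blur, occlusion) lower the peak amplitude, which is why the paper uses both amplitude and location as matching evidence.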
USGS launches online database: Lichens in National Parks
Bennett, Jim
2005-01-01
If you are interested in lichens and National Parks, now you can query a lichen database that combines these two elements. Using pull-down menus you can: search by park, specifying either a species list or the references used for that area; search by species (a report will show the parks in which a species is found); and search by reference codes, which are available from the first query. The reference code search allows you to obtain the complete citation for each lichen species listed in a National Park. The result pages from these queries can be printed directly from the web browser, or can be copied and pasted into a word processor.
An ensemble rank learning approach for gene prioritization.
Lee, Po-Feng; Soo, Von-Wun
2013-01-01
Several different computational approaches have been developed to solve the gene prioritization problem. We intend to use the ensemble boosting learning techniques to combine variant computational approaches for gene prioritization in order to improve the overall performance. In particular we add a heuristic weighting function to the Rankboost algorithm according to: 1) the absolute ranks generated by the adopted methods for a certain gene, and 2) the ranking relationship between all gene-pairs from each prioritization result. We select 13 known prostate cancer genes in OMIM database as training set and protein coding gene data in HGNC database as test set. We adopt the leave-one-out strategy for the ensemble rank boosting learning. The experimental results show that our ensemble learning approach outperforms the four gene-prioritization methods in ToppGene suite in the ranking results of the 13 known genes in terms of mean average precision, ROC and AUC measures.
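The flavor of weighted rank aggregation and its evaluation by average precision can be sketched as below; the weighting scheme (reciprocal rank) and the gene names are illustrative and do not reproduce the paper's Rankboost-based heuristic or its OMIM training set.

```python
# Hypothetical sketch: each method outputs a ranked gene list; a gene's
# aggregate score weights its reciprocal rank in each list.
def aggregate(rankings, weights):
    scores = {}
    for ranking, w in zip(rankings, weights):
        for pos, gene in enumerate(ranking, start=1):
            scores[gene] = scores.get(gene, 0.0) + w / pos
    return sorted(scores, key=scores.get, reverse=True)

def average_precision(ranked, relevant):
    """Mean of precision values at each relevant hit (the MAP component)."""
    hits, total = 0, 0.0
    for i, g in enumerate(ranked, start=1):
        if g in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant)

m1 = ["BRCA2", "TP53", "EGFR", "AR"]          # method 1's ranking
m2 = ["AR", "BRCA2", "KRAS", "TP53"]          # method 2's ranking
combined = aggregate([m1, m2], weights=[0.6, 0.4])
print(combined[0])
print(average_precision(combined, {"AR", "BRCA2"}))
```

Leave-one-out evaluation, as in the paper, would repeat this with each known disease gene held out and report the mean average precision over folds.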
Prototype of web-based database of surface wave investigation results for site classification
NASA Astrophysics Data System (ADS)
Hayashi, K.; Cakir, R.; Martin, A. J.; Craig, M. S.; Lorenzo, J. M.
2016-12-01
As active and passive surface wave methods become popular for evaluating the site response of earthquake ground motion, demand for a database of investigation results is also increasing. Seismic ground motion depends not only on the 1D velocity structure but also on 2D and 3D structures, so spatial information on S-wave velocity must be considered in ground motion prediction; such a database can support the construction of 2D and 3D underground models. Inversion in surface wave processing is essentially non-unique, so other information must be combined into the processing, and a database of existing geophysical, geological and geotechnical investigation results can provide indispensable information to improve the accuracy and reliability of investigations. Most investigations, however, are carried out by individual organizations, and their results are rarely stored in a unified and organized database. To study and discuss an appropriate database and digital standard format for surface wave investigations, we developed a prototype web-based database to store observed data and processing results of surface wave investigations that we have performed at more than 400 sites in the U.S. and Japan. The database was constructed on a web server using MySQL and PHP so that users can access it through the internet from anywhere with any device. All data are registered in the database with location, and users can search geophysical data through Google Map. The database stores dispersion curves, horizontal-to-vertical spectral ratios and S-wave velocity profiles at each site, saved as digital data in XML files so that users can review and reuse them. The database also stores a published 3D deep basin and crustal structure that users can refer to while processing surface wave data.
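The storage idea, site location plus serialized velocity profiles queryable by a map front end, can be sketched with SQLite standing in for the MySQL/PHP stack; the table and column names here are illustrative, not the project's actual schema.

```python
import sqlite3, json

# Minimal single-table sketch of a surface-wave site database.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE sw_site (
    id INTEGER PRIMARY KEY,
    name TEXT, lat REAL, lon REAL,
    vs_profile TEXT  -- serialized S-wave velocity profile (depth m, Vs m/s)
)""")

profile = [{"depth": 0, "vs": 180}, {"depth": 10, "vs": 320}, {"depth": 30, "vs": 760}]
con.execute("INSERT INTO sw_site (name, lat, lon, vs_profile) VALUES (?, ?, ?, ?)",
            ("Test Site A", 47.6, -122.3, json.dumps(profile)))

# A map front end would query by bounding box and deserialize the profile.
row = con.execute(
    "SELECT name, vs_profile FROM sw_site WHERE lat BETWEEN 45 AND 50").fetchone()
print(row[0], json.loads(row[1])[-1]["vs"])  # -> Test Site A 760
```

Storing each profile as a serialized document (the project uses XML; JSON here) keeps the schema stable while profile layering varies from site to site.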
Alving, Berit Elisabeth; Christensen, Janne Buck; Thrysøe, Lars
2018-03-01
The purpose of this literature review is to provide an overview of the information retrieval behaviour of clinical nurses, in terms of the databases and other information resources they use and their frequency of use. Systematic searches in five databases, together with handsearching, were used to identify studies from 2010 to 2016, with a populations, exposures and outcomes (PEO) search strategy focusing on the question: in which databases or other information resources do hospital nurses search for evidence-based information, and how often? Of 5272 titles retrieved by the search strategy, only nine studies fulfilled the inclusion criteria. The studies are from the United States, Canada, Taiwan and Nigeria. The results show that hospital nurses' primary sources of evidence-based information are Google and peers, while bibliographic databases such as PubMed are secondary choices. Data on frequency are included in only four of the studies, and the data are heterogeneous. The reasons for choosing Google and peers are primarily lack of time, lack of information, lack of retrieval skills, or lack of training in database searching. Only a few studies have been published on clinical nurses' retrieval behaviours, and more studies are needed from Europe and Australia. © 2018 Health Libraries Group.
NASA Astrophysics Data System (ADS)
Poinsot, Audrey; Yang, Fan; Brost, Vincent
2011-02-01
Including multiple sources of information in personal identity recognition and verification offers the opportunity to greatly improve performance. We propose a contactless biometric system that combines two modalities: palmprint and face. Hardware implementations are proposed on Texas Instruments digital signal processor (DSP) and Xilinx field-programmable gate array (FPGA) platforms. The algorithmic chain consists of preprocessing (which includes palm extraction from hand images), Gabor feature extraction, comparison by Hamming distance, and score fusion. Fusion possibilities are discussed and tested, first using a bimodal database of 130 subjects that we designed (the uB database) and then two widely used public biometric databases (AR for face and PolyU for palmprint). High performance has been obtained for both recognition and verification purposes: a recognition rate of 97.49% with the AR-PolyU database and an equal error rate of 1.10% on the uB database using only two training samples per subject. Hardware results demonstrate that preprocessing can easily be performed during the acquisition phase, and multimodal biometric recognition can be performed almost instantly (0.4 ms on the FPGA). We show the feasibility of a robust and efficient multimodal hardware biometric system that offers several advantages, such as user-friendliness and flexibility.
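As an illustration of the comparison and fusion steps (not the authors' implementation), binarized Gabor codes can be compared by normalized Hamming distance and the two modality scores combined by a weighted sum; the equal weighting below is an assumption.

```python
# Sketch of Hamming-distance matching plus score-level fusion for a bimodal
# (face + palmprint) system. Bit lists stand in for binarized Gabor codes.
def hamming_distance(bits_a, bits_b):
    """Fraction of positions where the two binary codes differ (0.0 = identical)."""
    assert len(bits_a) == len(bits_b)
    return sum(a != b for a, b in zip(bits_a, bits_b)) / len(bits_a)

def fused_score(face_a, face_b, palm_a, palm_b, w=0.5):
    """Weighted-sum fusion of the two modality distances; lower is a better match."""
    return w * hamming_distance(face_a, face_b) + (1 - w) * hamming_distance(palm_a, palm_b)

# Identical face codes, palm codes differing in 1 of 4 bits:
s = fused_score([1, 0, 1, 1], [1, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 1])
```

A verification decision would then compare the fused score against a threshold tuned on a development set.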
The construction of an EST database for Bombyx mori and its application
Mita, Kazuei; Morimyo, Mitsuoki; Okano, Kazuhiro; Koike, Yoshiko; Nohata, Junko; Kawasaki, Hideki; Kadono-Okuda, Keiko; Yamamoto, Kimiko; Suzuki, Masataka G.; Shimada, Toru; Goldsmith, Marian R.; Maeda, Susumu
2003-01-01
To build a foundation for the complete genome analysis of Bombyx mori, we have constructed an EST database. Because gene expression patterns deeply depend on tissues as well as developmental stages, we analyzed many cDNA libraries prepared from various tissues and different developmental stages to cover the entire set of Bombyx genes. So far, the Bombyx EST database contains 35,000 ESTs from 36 cDNA libraries, which are grouped into ≈11,000 nonredundant ESTs with the average length of 1.25 kb. The comparison with FlyBase suggests that the present EST database, SilkBase, covers >55% of all genes of Bombyx. The fraction of library-specific ESTs in each cDNA library indicates that we have not yet reached saturation, showing the validity of our strategy for constructing an EST database to cover all genes. To tackle the coming saturation problem, we have checked two methods, subtraction and normalization, to increase coverage and decrease the number of housekeeping genes, resulting in a 5–11% increase of library-specific ESTs. The identification of a number of genes and comprehensive cloning of gene families have already emerged from the SilkBase search. Direct links of SilkBase with FlyBase and WormBase provide ready identification of candidate Lepidoptera-specific genes. PMID:14614147
Scalable Indoor Localization via Mobile Crowdsourcing and Gaussian Process
Chang, Qiang; Li, Qun; Shi, Zesen; Chen, Wei; Wang, Weiping
2016-01-01
Indoor localization using Received Signal Strength Indication (RSSI) fingerprinting has been extensively studied for decades. The positioning accuracy is highly dependent on the density of the signal database; in areas without calibration data, however, the algorithm breaks down. Building and updating a dense signal database is labor intensive, expensive, and even impossible in some areas. Researchers are continually searching for better algorithms to create and update dense databases more efficiently. In this paper, we propose a scalable indoor positioning algorithm that works in both surveyed and unsurveyed areas. We first propose the Minimum Inverse Distance (MID) algorithm to build a virtual database with uniformly distributed virtual Reference Points (RPs). The area covered by the virtual RPs can be larger than the surveyed area. A Local Gaussian Process (LGP) is then applied to estimate the virtual RPs' RSSI values based on the crowdsourced training data. Finally, we improve the Bayesian algorithm to estimate the user's location using the virtual database. All the parameters are optimized by simulations, and the new algorithm is tested in real-world scenarios. The results show that the new algorithm improves the accuracy by 25.5% in the surveyed area, with an average positioning error below 2.2 m for 80% of the cases. Moreover, the proposed algorithm can localize users in the neighboring unsurveyed area. PMID:26999139
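The database-densification idea can be sketched as follows: lay out a uniform grid of virtual reference points and estimate each virtual point's RSSI from crowdsourced samples. Plain inverse-distance weighting stands in here for the paper's MID/LGP machinery, and the grid spacing and sample values are invented.

```python
# Toy sketch: build a grid of virtual reference points (RPs) and fill in each
# one's RSSI from nearby crowdsourced measurements.
def virtual_rp_grid(x0, x1, y0, y1, step):
    """Uniformly spaced virtual RP coordinates over a rectangle (integer grid)."""
    return [(x, y) for x in range(x0, x1 + 1, step)
                   for y in range(y0, y1 + 1, step)]

def estimate_rssi(vrp, samples, eps=1e-6):
    """Inverse-distance-weighted RSSI estimate at a virtual RP.

    samples: list of ((x, y), rssi) crowdsourced measurements.
    """
    num = den = 0.0
    for (sx, sy), rssi in samples:
        dist = ((vrp[0] - sx) ** 2 + (vrp[1] - sy) ** 2) ** 0.5
        w = 1.0 / (dist + eps)          # closer samples dominate
        num += w * rssi
        den += w
    return num / den

samples = [((0, 0), -40.0), ((10, 0), -60.0)]   # invented crowdsourced data
grid = virtual_rp_grid(0, 10, 0, 0, 5)          # three virtual RPs along a line
db = {vrp: estimate_rssi(vrp, samples) for vrp in grid}
```

The resulting `db` is the dense virtual fingerprint database against which a Bayesian position estimator would match live RSSI readings.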
NASA Astrophysics Data System (ADS)
Shi, Congming; Wang, Feng; Deng, Hui; Liu, Yingbo; Liu, Cuiyin; Wei, Shoulin
2017-08-01
As a dedicated synthetic aperture radio interferometer in China, the MingantU SpEctral Radioheliograph (MUSER), initially known as the Chinese Spectral RadioHeliograph (CSRH), has entered the stage of routine observation. More than 23 million data records per day must be managed effectively to provide high-performance data query and retrieval for scientific data reduction. In light of the massive amounts of data generated by the MUSER, this paper proposes a novel data management technique called the negative database (ND) and uses it to implement a data management system for the MUSER. Built on a key-value database, the ND technique makes full use of the complement set of the observational data to derive the requisite information. Experimental results showed that the proposed ND can significantly reduce storage volume in comparison with a relational database management system (RDBMS). Even when the time needed to derive absent records is taken into account, its overall performance, including querying and deriving the data, is comparable with that of an RDBMS. The ND technique effectively solves the problem of massive data storage for the MUSER and is a valuable reference for the massive data management required by next-generation telescopes.
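A toy sketch of the negative-database idea, assuming a known finite universe of record keys (the key names and universe here are invented; the real system is built on a key-value store): store the complement of what was observed, and derive the positive records by set difference on demand.

```python
# Negative-database toy: persist only the keys that were NOT observed.
universe = {f"frame_{i:03d}" for i in range(10)}      # all possible record keys
observed = {"frame_001", "frame_002", "frame_005"}    # what was actually recorded

negative_db = universe - observed                     # this is what gets stored

def is_present(key):
    """Membership test against the stored complement."""
    return key in universe and key not in negative_db

def derive_observed():
    """Reconstruct the positive record set by set difference."""
    return universe - negative_db

present = is_present("frame_002")
recovered = derive_observed()
```

Note that storing the complement saves space only when most of the universe has been observed; the abstract's storage-reduction claim presumably reflects that regime for MUSER's near-continuous record stream.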
An intercomparison of tropical cyclone best-track products for the southwest Pacific
NASA Astrophysics Data System (ADS)
Magee, Andrew D.; Verdon-Kidd, Danielle C.; Kiem, Anthony S.
2016-06-01
Recent efforts to understand tropical cyclone (TC) activity in the southwest Pacific (SWP) have led to the development of numerous TC databases. The methods used to compile each database vary and are based on data from different meteorological centres, standalone TC databases and archived synoptic charts. Therefore, the aims of this study are to (i) provide a spatio-temporal comparison of three TC best-track (BT) databases and explore any differences between them (and any associated implications) and (ii) investigate whether there are any spatial, temporal or statistical differences between pre-satellite (1945-1969), post-satellite (1970-2011) and post-geostationary satellite (1982-2011) era TC data, given the changing observational technologies over time. To achieve this, we compare three best-track TC databases for the SWP region (0-35° S, 135° E-120° W) from 1945 to 2011: the Joint Typhoon Warning Center (JTWC), the International Best Track Archive for Climate Stewardship (IBTrACS) and the Southwest Pacific Enhanced Archive of Tropical Cyclones (SPEArTC). The results of this study suggest that SPEArTC is the most complete repository of TCs for the SWP region. In particular, we show that the SPEArTC database includes a number of additional TCs not included in either the JTWC or IBTrACS database. These SPEArTC events do occur under environmental conditions conducive to tropical cyclogenesis (TC genesis), including anomalously negative 700 hPa vorticity (VORT), anomalously negative vertical shear of zonal winds (VSZW), anomalously negative 700 hPa geopotential height (GPH), cyclonic (absolute) 700 hPa winds and low values of absolute vertical wind shear (EVWS). Further, while changes in observational technologies since 1945 have undoubtedly improved our ability to detect and monitor TCs, we show that the number of TCs detected prior to the satellite era (1945-1969) is not statistically different from the number detected in the post-satellite era (post-1970).
Although data from pre-satellite and pre-geostationary satellite periods are currently inadequate for investigating TC intensity, this study suggests that SPEArTC data (from 1945) may be used to investigate long-term variability of TC counts and TC genesis locations.
Preliminary surficial geologic map of the Newberry Springs 30' x 60' quadrangle, California
Phelps, G.A.; Bedford, D.R.; Lidke, D.J.; Miller, D.M.; Schmidt, K.M.
2012-01-01
The Newberry Springs 30' x 60' quadrangle is located in the central Mojave Desert of southern California. It is split approximately into northern and southern halves by I-40, with the city of Barstow at its western edge and the town of Ludlow near its eastern edge. The map area spans lat 34°30' to 35° N. and long 116° to 117° W. and covers over 1,000 km². We integrate the results of surficial geologic mapping conducted during 2002-2005 with compilations of previous surficial mapping and bedrock geologic mapping. Quaternary units are subdivided in detail on the map to distinguish variations in age, process of formation, pedogenesis, lithology, and spatial interdependency, whereas pre-Quaternary bedrock units are grouped into generalized assemblages that emphasize their attributes as hillslope-forming materials and sources of parent material for the Quaternary units. The spatial information in this publication is presented in two forms: a spatial database and a geologic map. The geologic map is a view (the display of an extracted subset of the database at a given time) of the spatial database; it highlights key aspects of the database and necessarily does not show all of the data contained therein. The database contains detailed information about Quaternary geologic unit composition, authorship, and notes regarding geologic units, faults, contacts, and local vegetation. The amount of information contained in the database is too large to show on a single map, so a restricted subset of the information was chosen to summarize the overall nature of the geology. Refer to the database for additional information. Accompanying the spatial data are the map documentation and spatial metadata. The map documentation (this document) describes the geologic setting and history of the Newberry Springs map sheet, summarizes the age and physical character of each map unit, and describes principal faults and folds.
The Federal Geographic Data Committee (FGDC) compliant metadata provides detailed information about the digital files and file structure of the spatial data.
Dynamic programming re-ranking for PPI interactor and pair extraction in full-text articles.
Tsai, Richard Tzong-Han; Lai, Po-Ting
2011-02-23
Experimentally verified protein-protein interactions (PPIs) cannot be easily retrieved by researchers unless they are stored in PPI databases. The curation of such databases can be facilitated by employing text-mining systems to identify genes that play the interactor role in PPIs and to map these genes to unique database identifiers (the interactor normalization task, or INT), and then to return a list of interaction pairs for each article (the interaction pair task, or IPT). These two tasks are evaluated in terms of the area under the curve of the interpolated precision/recall (AUC iP/R) score, because the order of identifiers in the output list is important for ease of curation. Our INT system developed for the BioCreAtIvE II.5 INT challenge achieved a promising AUC iP/R of 43.5% using a support vector machine (SVM)-based ranking procedure. Using our new re-ranking algorithm, we have been able to improve system performance (AUC iP/R) by 1.84%. Our experimental results also show that with the re-ranked INT results, our unsupervised IPT system can achieve a competitive AUC iP/R of 23.86%, which outperforms the best BC II.5 INT system by 1.64%. Compared to using only SVM-ranked INT results, using re-ranked INT results boosts AUC iP/R by 7.84%. T-tests show that our INT/IPT system with re-ranking outperforms the system without re-ranking by a statistically significant margin. In this paper, we present a new re-ranking algorithm that considers co-occurrence among identifiers in an article to improve INT and IPT ranking results. Combining the re-ranked INT results with an unsupervised approach to finding associations among interactors, the proposed method can boost IPT performance. We also implement score computation using dynamic programming, which is faster and more efficient than traditional approaches.
Raharimalala, F N; Andrianinarivomanana, T M; Rakotondrasoa, A; Collard, J M; Boyer, S
2017-09-01
Arthropod-borne diseases are important causes of morbidity and mortality. The identification of vector species relies mainly on morphological features and/or molecular biology tools. The first method requires specific technical skills and may result in misidentifications, and the second method is time-consuming and expensive. The aim of the present study is to assess the usefulness and accuracy of matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) as a supplementary tool with which to identify mosquito vector species and to invest in the creation of an international database. A total of 89 specimens belonging to 10 mosquito species were selected for the extraction of proteins from legs and for the establishment of a reference database. A blind test with 123 mosquitoes was performed to validate the MS method. Results showed that: (a) the spectra obtained in the study with a given species differed from the spectra of the same species collected in another country, which highlights the need for an international database; (b) MALDI-TOF MS is an accurate method for the rapid identification of mosquito species that are referenced in a database; (c) MALDI-TOF MS allows the separation of groups or complex species, and (d) laboratory specimens undergo a loss of proteins compared with those isolated in the field. In conclusion, MALDI-TOF MS is a useful supplementary tool for mosquito identification and can help inform vector control. © 2017 The Royal Entomological Society.
NASA Astrophysics Data System (ADS)
Henderson, B. H.; Akhtar, F.; Pye, H. O. T.; Napelenok, S. L.; Hutzell, W. T.
2014-02-01
Transported air pollutants receive increasing attention as regulations tighten and global concentrations increase. The need to represent international transport in regional air quality assessments requires improved representation of boundary concentrations. Currently available observations are too sparse vertically to provide boundary information, particularly for ozone precursors, but global simulations can be used to generate spatially and temporally varying lateral boundary conditions (LBC). This study presents a public database of global simulations designed and evaluated for use as LBC for air quality models (AQMs). The database covers the contiguous United States (CONUS) for the years 2001-2010 and contains hourly varying concentrations of ozone, aerosols, and their precursors. The database is complemented by a tool for configuring the global results as inputs to regional-scale models (e.g., the Community Multiscale Air Quality model or the Comprehensive Air quality Model with extensions). This study also presents an example application based on the CONUS domain, which is evaluated against satellite-retrieved ozone and carbon monoxide vertical profiles. The results show performance is largely within uncertainty estimates for ozone from the Ozone Monitoring Instrument and carbon monoxide from the Measurements Of Pollution In The Troposphere (MOPITT), but there were some notable biases compared with Tropospheric Emission Spectrometer (TES) ozone. Compared with TES, our ozone predictions are biased high in the upper troposphere, particularly in the south during January. This publication documents the global simulation database, the tool for conversion to LBC, and the evaluation of concentrations on the boundaries. This documentation is intended to support applications that require representation of long-range transport of air pollutants.
Metal oxide based multisensor array and portable database for field analysis of antioxidants
Sharpe, Erica; Bradley, Ryan; Frasco, Thalia; Jayathilaka, Dilhani; Marsh, Amanda; Andreescu, Silvana
2014-01-01
We report a novel chemical sensing array based on metal oxide nanoparticles as a portable and inexpensive paper-based colorimetric method for polyphenol detection and field characterization of antioxidant containing samples. Multiple metal oxide nanoparticles with various polyphenol binding properties were used as active sensing materials to develop the sensor array and establish a database of polyphenol standards that include epigallocatechin gallate, gallic acid, resveratrol, and Trolox among others. Unique charge-transfer complexes are formed between each polyphenol and each metal oxide on the surface of individual sensors in the array, creating distinct optically detectable signals which have been quantified and logged into a reference database for polyphenol identification. The field-portable Pantone/X-Rite© CapSure® color reader was used to create this database and to facilitate rapid colorimetric analysis. The use of multiple metal-oxide sensors allows for cross-validation of results and increases accuracy of analysis. The database has enabled successful identification and quantification of antioxidant constituents within real botanical extractions including green tea. Formation of charge-transfer complexes is also correlated with antioxidant activity exhibiting electron transfer capabilities of each polyphenol. The antioxidant activity of each sample was calculated and validated against the oxygen radical absorbance capacity (ORAC) assay showing good comparability. The results indicate that this method can be successfully used for a more comprehensive analysis of antioxidant containing samples as compared to conventional methods. This technology can greatly simplify investigations into plant phenolics and make possible the on-site determination of antioxidant composition and activity in remote locations. PMID:24610993
Benschop, Corina C G; van der Beek, Cornelis P; Meiland, Hugo C; van Gorp, Ankie G M; Westen, Antoinette A; Sijen, Titia
2011-08-01
To analyze DNA samples with very low DNA concentrations, various methods have been developed that sensitize short tandem repeat (STR) typing. Sensitized DNA typing is accompanied by stochastic amplification effects, such as allele drop-outs and drop-ins. Therefore low template (LT) DNA profiles are interpreted with care. One can either try to infer the genotype by a consensus method that uses alleles confirmed in replicate analyses, or one can use a statistical model to evaluate the strength of the evidence in a direct comparison with a known DNA profile. In this study we focused on the first strategy and we show that the procedure by which the consensus profile is assembled will affect genotyping reliability. In order to gain insight in the roles of replicate number and requested level of reproducibility, we generated six independent amplifications of samples of known donors. The LT methods included both increased cycling and enhanced capillary electrophoresis (CE) injection [1]. Consensus profiles were assembled from two to six of the replications using four methods: composite (include all alleles), n-1 (include alleles detected in all but one replicate), n/2 (include alleles detected in at least half of the replicates) and 2× (include alleles detected twice). We compared the consensus DNA profiles with the DNA profile of the known donor, studied the stochastic amplification effects and examined the effect of the consensus procedure on DNA database search results. From all these analyses we conclude that the accuracy of LT DNA typing and the efficiency of database searching improve when the number of replicates is increased and the consensus method is n/2. The most functional number of replicates within this n/2 method is four (although a replicate number of three suffices for samples showing >25% of the alleles in standard STR typing). 
This approach was also the optimal strategy for the analysis of 2-person mixtures, although modified search strategies may be needed to retrieve the minor component in database searches. From the database searches follows the recommendation to specifically mark LT DNA profiles when entering them into the DNA database. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
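The four consensus rules compared above can be written as threshold filters over replicate allele calls. The allele values below are invented for illustration; each rule keeps an allele detected in at least a rule-specific number of the n replicates.

```python
from collections import Counter

def consensus(replicates, rule):
    """Assemble a consensus allele set from replicate STR typing results.

    replicates: list of allele sets, one per replicate amplification.
    rule: "composite" (any replicate), "2x" (at least two replicates),
          "n/2" (at least half of the replicates), or "n-1" (all but one).
    """
    n = len(replicates)
    counts = Counter(a for rep in replicates for a in set(rep))
    threshold = {"composite": 1,
                 "2x": 2,
                 "n/2": (n + 1) // 2,   # "at least half", rounded up
                 "n-1": n - 1}[rule]
    return {allele for allele, c in counts.items() if c >= threshold}

# Four invented replicate profiles for one locus:
reps = [{"12", "14"}, {"12", "14", "15"}, {"12"}, {"12", "14"}]
```

With these data, the composite rule retains the likely drop-in allele "15", while the n/2, n-1 and 2x rules all exclude it, mirroring the study's finding that stricter rules suppress stochastic artifacts.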
Walsh, Gregory J.
2016-08-16
This report consists of sheets 1 and 2 as well as an online geographic information systems database that includes contacts of bedrock geologic units, faults, outcrops, structural geologic information, and photographs. Sheet 2 of this report shows three cross sections, a tectonic map, and two brittle features maps that show measured outcrop-scale strike and dip results with summary stereonets and rose diagrams.
USDA-ARS's Scientific Manuscript database
Epidemiologic studies show inverse associations between flavonoid intake and chronic disease risk. However, a lack of comprehensive databases of the flavonoid content of foods has hindered efforts to fully characterize population intake. Using a newly released database of flavonoid values, we soug...
8 CFR 338.11 - Execution and issuance of certificate of naturalization by clerk of court.
Code of Federal Regulations, 2010 CFR
2010-01-01
... the petitioner. If the court maintains naturalization records on an electronic database then only the... and maintained in the court's electronic database. (b) The certificate shall show under “former..., or if using automation equipment, ensure it is part of the electronic database record. The clerk of...
8 CFR 338.11 - Execution and issuance of certificate of naturalization by clerk of court.
Code of Federal Regulations, 2011 CFR
2011-01-01
... the petitioner. If the court maintains naturalization records on an electronic database then only the... and maintained in the court's electronic database. (b) The certificate shall show under “former..., or if using automation equipment, ensure it is part of the electronic database record. The clerk of...
Global Soil Respiration: Interaction with Environmental Variables and Response to Climate Change
NASA Astrophysics Data System (ADS)
Jian, J.; Steele, M.
2016-12-01
Background, methods, objectives: Terrestrial ecosystems take up around 1.7 Pg C per year; however, their role may shift from carbon sink to carbon source by 2050 as a result of the positive feedback of soil respiration to global warming. Nevertheless, limited evidence shows that soil carbon is decreasing and the role of terrestrial ecosystems is changing under warming. One possibility is that the positive feedback may slow due to acclimation of soil respiration, as temperature sensitivity (Q10) decreases with warming. To verify and quantify the uncertainty in soil carbon cycling and its feedbacks to climate change, we assembled soil respiration observations from 1961 to 2014, drawn from 724 publications, into a monthly global soil respiration database (MSRDB), which includes 13,482 soil respiration measurements together with 38 other ancillary measurements from 538 sites. Using this database we examined macroscale variation in the relationship between soil respiration and air temperature, precipitation, leaf area index and soil properties. We also quantified global soil respiration, its sources of uncertainty, and its feedback to warming based on climate-region-oriented models with a variable Q10 function. Results and conclusions: Our results showed substantial heterogeneity in the relationship between soil respiration and environmental factors across climate regions. For example, soil respiration was strongly related to vegetation (via leaf area index) in colder regions, but not in the tropical region. Only in tropical and arid regions did soil properties explain any variation in soil respiration. Global annual mean soil respiration from 1961 to 2014 was estimated at 72.41 Pg C yr-1 based on the monthly database, 25 Pg lower than estimates based on a yearly soil respiration database.
Using the variable-Q10 models, we estimated that global soil respiration increased at a rate of 0.03 Pg C yr-1 from 1961 to 2014, smaller than previous estimates (~0.1 Pg C yr-1). The substantial variation in these relationships suggests that the regional scale is important for understanding and predicting global carbon cycling and its response to climate change.
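The variable-Q10 models referred to above build on the standard Q10 relation, in which respiration scales by a factor of Q10 for every 10 °C of warming relative to a reference temperature. The parameter values in this sketch are illustrative only.

```python
def soil_respiration(temp_c, r_ref=1.0, q10=2.0, t_ref=10.0):
    """Respiration rate at temp_c, given rate r_ref at reference temperature t_ref.

    Standard Q10 form: R(T) = R_ref * Q10 ** ((T - t_ref) / 10).
    """
    return r_ref * q10 ** ((temp_c - t_ref) / 10.0)

r20 = soil_respiration(20.0)   # one full Q10 step (10 degC) above the reference
```

A "variable Q10" model in the sense used above would let `q10` itself depend on temperature or climate region rather than stay fixed.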
Sousa Nanji, Liliana; Torres Cardoso, André; Costa, João; Vaz-Carneiro, António
2015-01-01
Impairment of the upper limbs is quite frequent after stroke, making rehabilitation an essential step towards clinical recovery and patient empowerment. This review aimed to synthesize existing evidence regarding interventions to improve upper limb function after stroke and to assess which bring some benefit. The Cochrane Database of Systematic Reviews, the Database of Reviews of Effects and the PROSPERO database were searched up to June 2013, and 40 reviews were included, covering 503 studies, 18,078 participants and 18 interventions, as well as different doses and settings of interventions. The main results were: 1- the information currently available is insufficient to assess the effectiveness of each intervention and to enable comparison between interventions; 2- transcranial direct current stimulation brings no benefit for outcomes of activities of daily living (ADLs); 3- moderate-quality evidence showed a beneficial effect of constraint-induced movement therapy, mental practice, mirror therapy, interventions for sensory impairment, virtual reality and repetitive task practice; 4- unilateral arm training may be more effective than bilateral arm training; 5- moderate-quality evidence showed a beneficial effect of robotics on measures of impairment and ADLs; 6- there is no evidence of benefit or harm for techniques such as repetitive transcranial magnetic stimulation, music therapy, pharmacological interventions, electrical stimulation and other therapies. Currently available evidence is insufficient and of low quality, and does not support clear clinical decisions. High-quality studies are still needed.
NASA Astrophysics Data System (ADS)
Jia, Jia; Cheng, Shuiyuan; Yao, Sen; Xu, Tiebing; Zhang, Tingting; Ma, Yuetao; Wang, Hongliang; Duan, Wenjiao
2018-06-01
As one of the most energy-intensive and polluting industries, the iron and steel industry is regarded as a major source of particulate matter emissions. In this study, the chemical components of size-segregated particulate matter (PM) emitted from different manufacturing units in the iron and steel industry were sampled with a comprehensive sampling system. Results showed that the average particle mass concentration was highest in the sintering process, followed by the puddling, steelmaking and rolling processes. PM samples were divided into eight size fractions for chemical analysis: SO42- and NH4+ were distributed more into fine particles, while most of the Ca2+ was concentrated in coarse particles; the size distribution of mineral elements depended on the raw materials used. Moreover, a local database of PM chemical source profiles for the iron and steel industry was built and applied in CMAQ modeling to simulate SO42- and NO3- concentrations; the results showed that the accuracy of the model simulation improved with local chemical source profiles compared to the SPECIATE database. The results of this study are expected to be helpful for understanding the components of PM in the iron and steel industry and to contribute to source apportionment research.
Statistical organelle dissection of Arabidopsis guard cells using image database LIPS.
Higaki, Takumi; Kutsuna, Natsumaro; Hosokawa, Yoichiroh; Akita, Kae; Ebine, Kazuo; Ueda, Takashi; Kondo, Noriaki; Hasezawa, Seiichiro
2012-01-01
To comprehensively grasp cell biological events in plant stomatal movement, we have captured microscopic images of guard cells with various organelle markers. The 28,530 serial optical sections of 930 pairs of Arabidopsis guard cells have been released as a new image database, named Live Images of Plant Stomata (LIPS). We visualized the average organellar distributions in guard cells using probabilistic mapping and image clustering techniques. The results indicated that actin microfilaments and the endoplasmic reticulum (ER) are mainly localized to the dorsal side and connection regions of guard cells. Subtractive images of open and closed stomata showed distribution changes in intracellular structures, including the ER, during stomatal movement. Time-lapse imaging showed that similar ER distribution changes occurred during stomatal opening induced by light irradiation or by femtosecond laser shots on neighboring epidermal cells, indicating that our image analysis approach has identified a novel ER relocation in stomatal opening.
RF-Based Location Using Interpolation Functions to Reduce Fingerprint Mapping
Ezpeleta, Santiago; Claver, José M.; Pérez-Solano, Juan J.; Martí, José V.
2015-01-01
Indoor RF-based localization using fingerprint mapping requires an initial training step, which is a time-consuming process. This localization methodology needs a database of RSSI (Received Signal Strength Indicator) measurements from the communication transceivers, taken at specific locations within the localization area. However, real-world localization environments are dynamic, and the fingerprint database must be rebuilt whenever environmental changes occur. This paper explores the use of different interpolation functions to complete the fingerprint map needed to achieve the sought accuracy, thereby reducing the effort in the training step. Different distributions of test maps and reference points have also been evaluated, showing the validity of this proposal and the necessary trade-offs. The results show that the same or similar localization accuracy can be achieved even when only 50% of the initial fingerprint reference points are taken. PMID:26516862
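A minimal sketch of the interpolation idea, assuming a one-dimensional corridor of reference points (real deployments would use 2-D interpolants, and the positions and RSSI values here are invented): RSSI at an unsurveyed point is interpolated linearly from the two bracketing surveyed reference points.

```python
def interpolate_rssi(x, fingerprints):
    """Linear interpolation of RSSI at position x from surveyed fingerprints.

    fingerprints: list of (position, rssi) reference points along one axis.
    """
    pts = sorted(fingerprints)
    for (x0, r0), (x1, r1) in zip(pts, pts[1:]):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)      # fractional distance between the pair
            return r0 + t * (r1 - r0)
    raise ValueError("x outside surveyed range")

# Halfway between two invented reference points:
rssi_mid = interpolate_rssi(5.0, [(0.0, -40.0), (10.0, -60.0)])
```

Filling the map this way is what lets the training survey keep only a fraction (the paper reports down to 50%) of the original reference points.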
Loss-tolerant measurement-device-independent quantum private queries
NASA Astrophysics Data System (ADS)
Zhao, Liang-Yuan; Yin, Zhen-Qiang; Chen, Wei; Qian, Yong-Jun; Zhang, Chun-Mei; Guo, Guang-Can; Han, Zheng-Fu
2017-01-01
Quantum private queries (QPQ) is an important cryptographic protocol aiming to protect both the user's and the database's privacy when the database is queried privately. Recently, a variety of practical QPQ protocols based on quantum key distribution (QKD) have been proposed. However, for QKD-based QPQ the user's imperfect detectors can be subjected to detector-side-channel attacks launched by the dishonest owner of the database. Here, we present a simple example that shows how the detector-blinding attack can completely undermine the security of QKD-based QPQ. To remove all the known and unknown detector side channels, we propose a solution of measurement-device-independent QPQ (MDI-QPQ) with single-photon sources. The security of the proposed protocol has been analyzed under some typical attacks. Moreover, we prove that its security is completely loss independent. The results show that practical QPQ will retain the same degree of privacy as before even with seriously uncharacterized detectors.
Selfie-Takers Prefer Left Cheeks: Converging Evidence from the (Extended) selfiecity Database
Manovich, Lev; Ferrari, Vera; Bruno, Nicola
2017-01-01
According to previous reports, selfie takers in widely different cultural contexts prefer poses showing the left cheek more than the right cheek. This posing bias may be interpreted as evidence for a right-hemispheric specialization for the expression of facial emotions. However, earlier studies analyzed selfie poses as categorized by human raters, which raises methodological issues in relation to the distinction between frontal and three-quarter poses. Here, we provide converging evidence by analyzing the (extended) selfiecity database, which includes automatic assessments of head rotation and of emotional expression. We confirm a culture- and sex-independent left-cheek bias and report stronger expression of negative emotions in selfies showing the left cheek. These results are generally consistent with a psychobiological account of a left-cheek bias in self-portraits but reveal potentially unexpected aspects of the relation between side bias and lateralization of emotional expression. PMID:28928683
Benchmarking Using Basic DBMS Operations
NASA Astrophysics Data System (ADS)
Crolotte, Alain; Ghazal, Ahmad
The TPC-H benchmark proved to be successful in the decision support area. Many commercial database vendors and their related hardware vendors used this benchmark to show the superiority and competitive edge of their products. However, over time, TPC-H became less representative of industry trends as vendors kept tuning their databases to this benchmark-specific workload. In this paper, we present XMarq, a simple benchmark framework that can be used to compare various software/hardware combinations. Our benchmark model is currently composed of 25 queries that measure the performance of basic operations such as scans, aggregations, joins and index access. This benchmark model is based on the TPC-H data model due to its maturity and well-understood data generation capability. We also propose metrics to evaluate single-system performance and compare two systems. Finally, we illustrate the effectiveness of this model by showing experimental results comparing two systems under different conditions.
Kamali, Parisa; Zettervall, Sara L; Wu, Winona; Ibrahim, Ahmed M S; Medin, Caroline; Rakhorst, Hinne A; Schermerhorn, Marc L; Lee, Bernard T; Lin, Samuel J
2017-04-01
Research derived from large-volume databases plays an increasing role in the development of clinical guidelines and health policy. In breast cancer research, the Surveillance, Epidemiology and End Results, National Surgical Quality Improvement Program, and Nationwide Inpatient Sample databases are widely used. This study aims to compare the trends in immediate breast reconstruction and identify the drawbacks and benefits of each database. Patients with invasive breast cancer and ductal carcinoma in situ were identified from each database (2005-2012). Trends of immediate breast reconstruction over time were evaluated. Patient demographics and comorbidities were compared. Subgroup analysis of immediate breast reconstruction use per race was conducted. Within the three databases, 1.2 million patients were studied. Immediate breast reconstruction in invasive breast cancer patients increased significantly over time in all databases. A similar significant upward trend was seen in ductal carcinoma in situ patients. Significant differences in immediate breast reconstruction rates were seen among races; and the disparity differed among the three databases. Rates of comorbidities were similar among the three databases. There has been a significant increase in immediate breast reconstruction; however, the extent of the reporting of overall immediate breast reconstruction rates and of racial disparities differs significantly among databases. The Nationwide Inpatient Sample and the National Surgical Quality Improvement Program report similar findings, with the Surveillance, Epidemiology and End Results database reporting results significantly lower in several categories. These findings suggest that use of the Surveillance, Epidemiology and End Results database may not be universally generalizable to the entire U.S.
A spatio-temporal landslide inventory for the NW of Spain: BAPA database
NASA Astrophysics Data System (ADS)
Valenzuela, Pablo; Domínguez-Cuesta, María José; Mora García, Manuel Antonio; Jiménez-Sánchez, Montserrat
2017-09-01
A landslide database has been created for the Principality of Asturias, NW Spain: the BAPA (Base de datos de Argayos del Principado de Asturias - Principality of Asturias Landslide Database). Data collection is mainly performed through searching local newspaper archives. Moreover, a BAPA App and a BAPA website (http://geol.uniovi.es/BAPA) have been developed to obtain additional information from citizens and institutions. Presently, the dataset covers the period 1980-2015, recording 2063 individual landslides. The use of free cartographic servers, such as Google Maps, Google Street View and Iberpix (Government of Spain), combined with the spatial descriptions and pictures contained in the press news, makes it possible to assess different levels of spatial accuracy. In the database, 59% of the records show an exact spatial location, and 51% provide accurate dates, showing the usefulness of press archives as temporal records. Thus, 32% of the landslides show the highest spatial and temporal accuracy levels. The database also gathers information about the type and characteristics of the landslides, the triggering factors and the damage and costs caused. Field work was conducted to validate the methodology used in assessing the spatial location, temporal occurrence and characteristics of the landslides.
Building Inventory Database on the Urban Scale Using GIS for Earthquake Risk Assessment
NASA Astrophysics Data System (ADS)
Kaplan, O.; Avdan, U.; Guney, Y.; Helvaci, C.
2016-12-01
The majority of the existing buildings are not safe against earthquakes in most developing countries. Before a devastating earthquake, existing buildings need to be assessed and the vulnerable ones must be identified. Determining the seismic performance of existing buildings, which usually involves collecting the attributes of existing buildings, performing the analysis and the necessary queries, and producing the result maps, is a hard and complicated procedure that can be simplified with a Geographic Information System (GIS). The aim of this study is to produce a building inventory database using GIS for assessing the earthquake risk of existing buildings. In this paper, a building inventory database for 310 buildings, located in Eskisehir, Turkey, was produced in order to assess the earthquake risk of the buildings. The results from this study show that 26% of the buildings have high earthquake risk, 33% have medium earthquake risk and 41% have low earthquake risk. The produced building inventory database can be very useful, especially for governments, in dealing with the problem of identifying seismically vulnerable buildings in large existing building stocks. With the help of such methods, identifying the buildings that may collapse and cause life and property loss during a possible future earthquake will be quick, cheap and reliable.
Yang, Li-Hua; Du, Shi-Zheng; Sun, Jin-Fang; Mei, Si-Juan; Wang, Xiao-Qing; Zhang, Yuan-Yuan
2014-01-01
Objectives: To assess the clinical evidence of auriculotherapy for constipation treatment and to identify the efficacy of groups using Semen vaccariae or magnetic pellets as taped objects in managing constipation. Methods: Databases were searched, including five English-language databases (the Cochrane Library, PubMed, Embase, CINAHL, and AMED) and four Chinese medical databases. Only randomized controlled trials were included in the review process. Critical appraisal was conducted using the Cochrane risk of bias tool. Results: Seventeen randomized, controlled trials (RCTs) met the inclusion criteria, of which 2 had low risk of bias. The primary outcome measures were the improvement rate and total effective rate. A meta-analysis of 15 RCTs showed a moderate, significant effect of auriculotherapy in managing constipation compared with controls (relative risk [RR], 2.06; 95% confidence interval [CI], 1.52–2.79; p<0.00001). The 15 RCTs also showed a moderate, significant effect of auriculotherapy in relieving constipation (RR, 1.28; 95% CI, 1.13–1.44; p<0.0001). For other symptoms associated with constipation, such as abdominal distension or anorexia, results of the meta-analyses showed no statistical significance. Subgroup analysis revealed that use of S. vaccariae and use of magnetic pellets were both statistically favored over the control in relieving constipation. Conclusions: Current evidence illustrated that auriculotherapy, a relatively safe strategy, is probably beneficial in managing constipation. However, most of the eligible RCTs had a high risk of bias, and all were conducted in China. No definitive conclusion can be made because of cultural and geographic differences. Further rigorous RCTs from around the world are warranted to confirm the effect and safety of auriculotherapy for constipation. PMID:25020089
G-Bean: an ontology-graph based web tool for biomedical literature retrieval.
Wang, James Z; Zhang, Yuanyuan; Dong, Liang; Li, Lin; Srimani, Pradip K; Yu, Philip S
2014-01-01
Currently, most people use NCBI's PubMed to search the MEDLINE database, an important bibliographic information source for life science and biomedical information. However, PubMed has some drawbacks that make it difficult to find relevant publications pertaining to users' individual intentions, especially for non-expert users. To ameliorate the disadvantages of PubMed, we developed G-Bean, a graph-based biomedical search engine, to search biomedical articles in the MEDLINE database more efficiently. G-Bean addresses PubMed's limitations with three innovations: (1) Parallel document index creation: a multithreaded index creation strategy is employed to generate the document index for G-Bean in parallel; (2) Ontology-graph based query expansion: an ontology graph is constructed by merging four major UMLS (Version 2013AA) vocabularies, MeSH, SNOMEDCT, CSP and AOD, to cover all concepts in the National Library of Medicine (NLM) database; a Personalized PageRank algorithm is used to compute concept relevance in this ontology graph and the Term Frequency-Inverse Document Frequency (TF-IDF) weighting scheme is used to re-rank the concepts. The top 500 ranked concepts are selected for expanding the initial query to retrieve more accurate and relevant information; (3) Retrieval and re-ranking of documents based on the user's search intention: after the user selects any article from the existing search results, G-Bean analyzes the user's selections to determine his/her true search intention and then uses more relevant and more specific terms to retrieve additional related articles. The new articles are presented to the user in the order of their relevance to the already selected articles. Performance evaluation with 106 OHSUMED benchmark queries shows that G-Bean returns more relevant results than PubMed does when using these queries to search the MEDLINE database.
PubMed could not even return any search result for some OHSUMED queries because it failed to form the appropriate Boolean query statement automatically from the natural language query strings. G-Bean is available at http://bioinformatics.clemson.edu/G-Bean/index.php. G-Bean addresses PubMed's limitations with ontology-graph based query expansion, automatic document indexing, and user search intention discovery. It shows significant advantages in finding relevant articles from the MEDLINE database to meet the information need of the user.
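The ontology-graph ranking step described above can be sketched with a toy personalized PageRank: the random walk over a tiny concept graph is biased toward the concept mentioned in the query, and the stationary scores rank related concepts. The graph, concept names, damping factor and iteration count here are hypothetical stand-ins for G-Bean's UMLS-derived graph, and the subsequent TF-IDF re-ranking is omitted:

```python
import numpy as np

# Toy concept graph (hypothetical MeSH-like terms), adjacency lists
graph = {
    "neoplasms": ["carcinoma", "therapy"],
    "carcinoma": ["neoplasms", "biopsy"],
    "therapy":   ["neoplasms"],
    "biopsy":    ["carcinoma"],
}
nodes = list(graph)
idx = {n: i for i, n in enumerate(nodes)}

# Column-stochastic transition matrix: column j spreads mass to j's neighbours
M = np.zeros((len(nodes), len(nodes)))
for src, outs in graph.items():
    for dst in outs:
        M[idx[dst], idx[src]] = 1.0 / len(outs)

# Personalization vector: the query mentions only "carcinoma"
p = np.zeros(len(nodes))
p[idx["carcinoma"]] = 1.0

# Power iteration for personalized PageRank with damping 0.85
r = np.full(len(nodes), 1.0 / len(nodes))
for _ in range(100):
    r = 0.85 * (M @ r) + 0.15 * p

ranked = sorted(zip(nodes, r), key=lambda kv: -kv[1])
print([n for n, _ in ranked])  # concepts ordered by relevance to the query
```

The top-ranked concepts would then be the candidates for expanding the user's query, which is the role the Personalized PageRank scores play in G-Bean's pipeline.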
FindIt@Flinders: User Experiences of the Primo Discovery Search Solution
ERIC Educational Resources Information Center
Jarrett, Kylie
2012-01-01
In September 2011, Flinders University Library launched FindIt@Flinders, the Primo discovery layer search to provide simultaneous results from the Library's collections and subscription databases. This research project was an exploratory case study which aimed to show whether students were finding relevant information for their course learning and…
Parental Numeric Language Input to Mandarin Chinese and English Speaking Preschool Children
ERIC Educational Resources Information Center
Chang, Alicia; Sandhofer, Catherine M.; Adelchanow, Lauren; Rottman, Benjamin
2011-01-01
The present study examined the number-specific parental language input to Mandarin- and English-speaking preschool-aged children. Mandarin and English transcripts from the CHILDES database were examined for amount of numeric speech, specific types of numeric speech and syntactic frames in which numeric speech appeared. The results showed that…
Can Money Buy Happiness? A Statistical Analysis of Predictors for User Satisfaction
ERIC Educational Resources Information Center
Hunter, Ben; Perret, Robert
2011-01-01
2007 data from LibQUAL+[TM] and the ACRL Library Trends and Statistics database were analyzed to determine if there is a statistically significant correlation between library expenditures and usage statistics and library patron satisfaction across 73 universities. The results show that users of larger, better funded libraries have higher…
A Survey of Computer Use by Undergraduate Psychology Departments in Virginia.
ERIC Educational Resources Information Center
Stoloff, Michael L.; Couch, James V.
1987-01-01
Reports a survey of computer use in psychology departments in Virginia's four-year colleges. Results showed that faculty, students, and clerical staff used word processing, statistical analysis, and database management most frequently. The three most numerous computer brands were the Apple II family, IBM PCs, and the Apple Macintosh. (Author/JDH)
The Carbon Cycle and Hurricanes in the United States between 1900 and 2011
Dahal, Devendra; Liu, Shuguang; Oeding, Jennifer
2014-01-01
Hurricanes cause severe impacts on forest ecosystems in the United States. These events can substantially alter the carbon biogeochemical cycle at local to regional scales. We selected all tropical storms and more severe events that made U.S. landfall between 1900 and 2011 and used the hurricane best track database, a meteorological model (HURRECON), the National Land Cover Database (NLCD), a U.S. Department of Agriculture Forest Service biomass dataset, and pre- and post-MODIS data to quantify individual event and annual biomass mortality. Our estimates show an average of 18.2 TgC/yr of live biomass mortality for 1900–2011 in the US with strong spatial and inter-annual variability. Results show Hurricane Camille in 1969 caused the highest aboveground biomass mortality with 59.5 TgC. Similarly, 1954 had the highest annual mortality with 68.4 TgC attributed to landfalling hurricanes. The results presented are deemed useful for further investigation of historical events, and the methods outlined are potentially beneficial for quantifying biomass loss in future events. PMID:24903486
Ontological interpretation of biomedical database content.
Santana da Silva, Filipe; Jansen, Ludger; Freitas, Fred; Schulz, Stefan
2017-06-26
Biological databases store data about laboratory experiments, together with semantic annotations, in order to support data aggregation and retrieval. The exact meaning of such annotations in the context of a database record is often ambiguous. We address this problem by grounding implicit and explicit database content in a formal-ontological framework. By using a typical extract from the databases UniProt and Ensembl, annotated with content from GO, PR, ChEBI and NCBI Taxonomy, we created four ontological models (in OWL), which generate explicit, distinct interpretations under the BioTopLite2 (BTL2) upper-level ontology. The first three models interpret database entries as individuals (IND), defined classes (SUBC), and classes with dispositions (DISP), respectively; the fourth model (HYBR) is a combination of SUBC and DISP. For the evaluation of these four models, we consider (i) database content retrieval, using ontologies as query vocabulary; (ii) information completeness; and, (iii) DL complexity and decidability. The models were tested under these criteria against four competency questions (CQs). IND does not raise any ontological claim, besides asserting the existence of sample individuals and relations among them. Modelling patterns have to be created for each type of annotation referent. SUBC is interpreted regarding maximally fine-grained defined subclasses under the classes referred to by the data. DISP attempts to extract truly ontological statements from the database records, claiming the existence of dispositions. HYBR is a hybrid of SUBC and DISP and is more parsimonious regarding expressiveness and query answering complexity. For each of the four models, the four CQs were submitted as DL queries. This shows the ability to retrieve individuals with IND, and classes in SUBC and HYBR. DISP does not retrieve anything because the axioms with disposition are embedded in General Class Inclusion (GCI) statements. 
Ambiguity of biological database content is addressed by a method that identifies implicit knowledge behind semantic annotations in biological databases and grounds it in an expressive upper-level ontology. The result is a seamless representation of database structure, content and annotations as OWL models.
Su, Xiaoquan; Xu, Jian; Ning, Kang
2012-10-01
It has long intrigued scientists to effectively compare different microbial communities (also referred to as 'metagenomic samples' here) on a large scale: given a set of unknown samples, find similar metagenomic samples from a large repository and examine how similar these samples are. With the metagenomic samples accumulated so far, it is possible to build a database of metagenomic samples of interest. Any metagenomic sample could then be searched against this database to find the most similar metagenomic sample(s). However, on one hand, current databases with a large number of metagenomic samples mostly serve as data repositories that offer few functionalities for analysis; and on the other hand, methods to measure the similarity of metagenomic data work well only for small sets of samples by pairwise comparison. It is not yet clear how to efficiently search for metagenomic samples against a large metagenomic database. In this study, we have proposed a novel method, Meta-Storms, that could systematically and efficiently organize and search metagenomic data. It includes the following components: (i) creating a database of metagenomic samples based on their taxonomical annotations, (ii) efficient indexing of samples in the database based on a hierarchical taxonomy indexing strategy, (iii) searching for a metagenomic sample against the database by a fast scoring function based on quantitative phylogeny and (iv) managing the database by index export, index import, data insertion, data deletion and database merging. We have collected more than 1300 metagenomic datasets from the public domain and in-house facilities, and tested the Meta-Storms method on these datasets. Our experimental results show that Meta-Storms is capable of database creation and effective searching for a large number of metagenomic samples, and it could achieve similar accuracies compared with the current popular significance testing-based methods.
The Meta-Storms method would serve as a suitable database management and search system to quickly identify similar metagenomic samples from a large pool of samples.
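The scoring idea (crediting similarity by how low in the taxonomy two samples' taxa first meet) can be sketched as follows. This is not the actual Meta-Storms scoring function; the taxa, lineages, abundances and decay factor are hypothetical, and the real method additionally relies on hierarchical indexing for speed:

```python
# Hypothetical taxon -> ancestor lineage (immediate parent first, root last)
taxonomy = {
    "E.coli":     ["Escherichia", "Enterobacteriaceae", "Bacteria"],
    "S.enterica": ["Salmonella", "Enterobacteriaceae", "Bacteria"],
    "B.subtilis": ["Bacillus", "Bacillaceae", "Bacteria"],
}

def lineage(taxon):
    return [taxon] + taxonomy[taxon]

def similarity(a, b, decay=0.5):
    """Credit shared abundance, discounted by how far up the taxonomy
    two taxa first meet (a match at the leaf scores full credit)."""
    score = 0.0
    for ta, wa in a.items():
        for tb, wb in b.items():
            la, lb = lineage(ta), lineage(tb)
            common = set(la) & set(lb)
            if not common:
                continue
            depth = min(la.index(c) for c in common)  # 0 = identical taxon
            score += min(wa, wb) * decay ** depth
    return score

s1 = {"E.coli": 0.6, "B.subtilis": 0.4}     # hypothetical sample abundances
s2 = {"S.enterica": 0.5, "B.subtilis": 0.5}
print(similarity(s1, s1), similarity(s1, s2))
```

A query sample would be scored this way against every candidate in the repository, with the highest-scoring samples returned as the most similar ones.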
Fragment virtual screening based on Bayesian categorization for discovering novel VEGFR-2 scaffolds.
Zhang, Yanmin; Jiao, Yu; Xiong, Xiao; Liu, Haichun; Ran, Ting; Xu, Jinxing; Lu, Shuai; Xu, Anyang; Pan, Jing; Qiao, Xin; Shi, Zhihao; Lu, Tao; Chen, Yadong
2015-11-01
The discovery of novel scaffolds against a specific target has long been one of the most significant but challenging goals in discovering lead compounds. A scaffold that binds in important regions of the active pocket is more favorable as a starting point because scaffolds generally possess greater optimization possibilities. However, due to the lack of sufficient chemical space diversity in the databases and the ineffectiveness of the screening methods, it still remains a great challenge to discover novel active scaffolds. Given the strengths and weaknesses of both fragment-based drug design and traditional virtual screening (VS), we proposed a fragment VS concept based on Bayesian categorization for the discovery of novel scaffolds. This work investigated the proposal through an application to the VEGFR-2 target. First, the scaffold and structural diversity of the chemical space of 10 compound databases were explicitly evaluated. Simultaneously, a robust Bayesian classification model was constructed for screening not only compound databases but also their corresponding fragment databases. Although analysis of the scaffold diversity demonstrated a very uneven distribution of scaffolds over molecules, results showed that our Bayesian model behaved better in screening fragments than molecules. Through a retrospective literature search, several generated fragments with relatively high Bayesian scores indeed exhibit VEGFR-2 biological activity, which strongly proved the effectiveness of fragment VS based on Bayesian categorization models. This investigation of Bayesian-based fragment VS further emphasizes the necessity of enriching the compound databases employed in lead discovery by amplifying their diversity with novel structures.
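In the spirit of the Bayesian categorization step (though not the authors' actual model, descriptors, or training data), a minimal Bernoulli naive-Bayes scorer over hypothetical fragment fingerprint bits looks like this:

```python
import math

# Hypothetical training fragments: (fingerprint bits, 1 = active on target)
train = [
    ({"aromatic_ring", "h_bond_donor"}, 1),
    ({"aromatic_ring", "amide"}, 1),
    ({"alkyl_chain"}, 0),
    ({"alkyl_chain", "h_bond_donor"}, 0),
]
features = sorted({bit for fp, _ in train for bit in fp})

def bayes_score(fp):
    """Log-likelihood ratio of 'active' vs 'inactive', Laplace-smoothed."""
    score = 0.0
    for sign, label in ((+1, 1), (-1, 0)):
        fps = [f for f, y in train if y == label]
        for feat in features:
            p = (sum(feat in f for f in fps) + 1) / (len(fps) + 2)
            score += sign * math.log(p if feat in fp else 1 - p)
    return score

# Positive score -> fragment looks 'active'; negative -> 'inactive'
print(bayes_score({"aromatic_ring", "h_bond_donor"}), bayes_score({"alkyl_chain"}))
```

Ranking a fragment library by such scores and inspecting the top hits is the screening step the abstract describes; the real model is trained on far larger compound and fragment databases.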
Progress towards a Spacecraft-Associated Microbial Meta-database (SAMM)
NASA Astrophysics Data System (ADS)
Mogul, Rakesh; Keagy, Laura; Nava, Argelia; Zerehi, Farah
The microbial inventories within the assembly facilities for spacecraft represent the primary pool of forward contaminants that may compromise life-detection missions. Accordingly, we are constructing a meta-database of these microorganisms for the purpose of building a bioinformatic resource for planetary protection and astrobiology-related endeavors. Using student-led efforts, the meta-database is being constructed from literature reports and is inclusive of both isolated microorganisms and those solely detected through DNA-based techniques. The Spacecraft-Associated Microbial Meta-database (SAMM) currently includes over 800 entries that are organized using 32 meta-tags involving taxonomy, location of isolation (facility and component), category of characterization (culture and/or genetic), types of characterizations (e.g., culture, 16s rDNA, phylochip, FAME, and DNA hybridization), growth conditions, Gram stain, and general physiological traits (e.g., sporulation, extremotolerance, and respiration properties). Queries of the database show that the cleanrooms at Kennedy Space Center (KSC) harbor ~2-fold greater diversity of bacterial genera than those at the Jet Propulsion Laboratory (JPL), and that bacteria related to water, plant, and human environments are more often associated with the KSC-specific genera. These results parallel those reported in the literature, and hence serve as benchmarks demonstrating the bioinformatic potential of this meta-database. The ultimate plans for SAMM include public availability, expansion through crowdsourcing efforts, and potential use as a companion resource to the culture collections assembled by DSMZ and JPL.
Alternative Databases for Anthropology Searching.
ERIC Educational Resources Information Center
Brody, Fern; Lambert, Maureen
1984-01-01
Examines online search results of sample questions in several databases covering linguistics, cultural anthropology, and physical anthropology in order to determine if and where any overlap in results might occur, and which files have the greatest number of relevant hits. Search results by database are given for each subject area. (EJS)
Hartung, Daniel M; Zarin, Deborah A; Guise, Jeanne-Marie; McDonagh, Marian; Paynter, Robin; Helfand, Mark
2014-04-01
ClinicalTrials.gov requires reporting of result summaries for many drug and device trials. To evaluate the consistency of reporting of trials that are registered in the ClinicalTrials.gov results database and published in the literature, we used the ClinicalTrials.gov results database and matched publications identified through ClinicalTrials.gov and a manual search of 2 electronic databases. A 10% random sample was taken of phase 3 or 4 trials with results in the ClinicalTrials.gov results database, completed before 1 January 2009, with 2 or more groups. One reviewer extracted data about trial design and results from the results database and matching publications. A subsample was independently verified. Of 110 trials with results, most were industry-sponsored, parallel-design drug studies. The most common inconsistency was the number of secondary outcome measures reported (80%). Sixteen trials (15%) reported the primary outcome description inconsistently, and 22 (20%) reported the primary outcome value inconsistently. Thirty-eight trials inconsistently reported the number of individuals with a serious adverse event (SAE); of these, 33 (87%) reported more SAEs in ClinicalTrials.gov. Among the 84 trials that reported SAEs in ClinicalTrials.gov, 11 publications did not mention SAEs, 5 reported them as zero or not occurring, and 21 reported a different number of SAEs. Among 29 trials that reported deaths in ClinicalTrials.gov, 28% differed from the matched publication. A limitation is the small sample, which included the earliest results posted to the database. Reporting discrepancies between the ClinicalTrials.gov results database and matching publications are common. Which source contains the more accurate account of results is unclear, although ClinicalTrials.gov may provide a more comprehensive description of adverse events than the publication. Funding: Agency for Healthcare Research and Quality.
Methods for automatic detection of artifacts in microelectrode recordings.
Bakštein, Eduard; Sieger, Tomáš; Wild, Jiří; Novák, Daniel; Schneider, Jakub; Vostatek, Pavel; Urgošík, Dušan; Jech, Robert
2017-10-01
Extracellular microelectrode recording (MER) is a prominent technique for studies of extracellular single-unit neuronal activity. In order to achieve robust results in more complex analysis pipelines, it is necessary to have high-quality input data with a low amount of artifacts. We show that noise (mainly electromagnetic interference and motion artifacts) may affect more than 25% of the recording length in a clinical MER database. We present several methods for automatic detection of noise in MER signals, based on (i) unsupervised detection of stationary segments, (ii) large peaks in the power spectral density, and (iii) a classifier based on multiple time- and frequency-domain features. We evaluate the proposed methods on a manually annotated database of 5735 ten-second MER signals from 58 Parkinson's disease patients. The existing methods for artifact detection in single-channel MER that have been rigorously tested are based on unsupervised change-point detection. We show on an extensive real MER database that the presented techniques are better suited for the task of artifact identification and achieve much better results. The best-performing classifiers (bagging and decision tree) achieved artifact classification accuracy of up to 89% on an unseen test set and outperformed the unsupervised techniques by 5-10%. This was close to the level of agreement among raters using manual annotation (93.5%). We conclude that the proposed methods are suitable for automatic MER denoising and may help in the efficient elimination of undesirable signal artifacts.
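Criterion (ii), flagging large peaks in the power spectral density, can be sketched on synthetic data: a segment contaminated by narrow-band interference (e.g., 50 Hz mains pickup) has one PSD bin that dwarfs the median level, whereas a noise-like segment does not. The sampling rate, amplitudes, and threshold below are hypothetical, not the values used in the study:

```python
import numpy as np

fs = 1000                        # hypothetical sampling rate, Hz
t = np.arange(0, 1.0, 1.0 / fs)
rng = np.random.default_rng(0)

clean = rng.normal(0.0, 1.0, t.size)              # noise-like background activity
noisy = clean + 5.0 * np.sin(2 * np.pi * 50 * t)  # plus 50 Hz mains interference

def has_psd_peak(x, threshold=20.0):
    """Flag a segment whose largest PSD bin dominates the median PSD level."""
    psd = np.abs(np.fft.rfft(x)) ** 2 / x.size
    return psd.max() / np.median(psd) > threshold

print(has_psd_peak(clean), has_psd_peak(noisy))
```

In practice a detector like this would run over consecutive short segments, with flagged segments excluded before single-unit analysis; the study's classifier combines this kind of spectral cue with further time- and frequency-domain features.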
The PyCASSO database: spatially resolved stellar population properties for CALIFA galaxies
NASA Astrophysics Data System (ADS)
de Amorim, A. L.; García-Benito, R.; Cid Fernandes, R.; Cortijo-Ferrero, C.; González Delgado, R. M.; Lacerda, E. A. D.; López Fernández, R.; Pérez, E.; Vale Asari, N.
2017-11-01
The Calar Alto Legacy Integral Field Area (CALIFA) survey, a pioneer in integral field spectroscopy legacy projects, has fostered many studies exploring the information encoded in the spatially resolved data on gaseous and stellar features in the optical range of galaxies. We describe a value-added catalogue of stellar population properties for CALIFA galaxies analysed with the spectral synthesis code starlight and processed with the pycasso platform. Our public database (http://pycasso.ufsc.br/, mirror at http://pycasso.iaa.es/) comprises 445 galaxies from the CALIFA Data Release 3 with COMBO data. The catalogue provides maps for the stellar mass surface density, mean stellar ages and metallicities, stellar dust attenuation, star formation rates, and kinematics. Example applications both for individual galaxies and for statistical studies are presented to illustrate the power of this data set. We revisit and update a few of our own results on mass density radial profiles and on the local mass-metallicity relation. We also show how to employ the catalogue for new investigations, presenting a pseudo Schmidt-Kennicutt relation made entirely with information extracted from the stellar continuum. Combinations with other databases are also illustrated. Among other results, we find a very good agreement between star formation rate surface densities derived from the stellar continuum and from the H α emission. This public catalogue joins the scientific community's effort towards transparency and reproducibility, and will be useful for researchers focusing on (or complementing their studies with) stellar properties of CALIFA galaxies.
Databases in the Central Government : State-of-the-art and the Future
NASA Astrophysics Data System (ADS)
Ohashi, Tomohiro
The Management and Coordination Agency of the Prime Minister's Office conducted a questionnaire survey of all Japanese Ministries and Agencies in November 1985 on the present status of databases produced, or planned to be produced, by the central government. According to the results, 132 databases have been produced in 19 Ministries and Agencies. Many of these databases are held by the Defence Agency, the Ministry of Construction, the Ministry of Agriculture, Forestry & Fisheries, and the Ministry of International Trade & Industry, and fall in the fields of architecture & civil engineering, science & technology, R & D, agriculture, forestry, and fishery. However, only 39 percent of the produced databases are available to other Ministries and Agencies; the remaining 60 percent are unavailable, largely because they are in-house databases. This paper outlines the survey results and introduces the databases produced by the central government under the items of (1) databases commonly used by all Ministries and Agencies, (2) integrated databases, (3) statistical databases, and (4) bibliographic databases. Future problems are also described from the viewpoints of technology development and mutual use of databases.
Patterns of Undergraduates' Use of Scholarly Databases in a Large Research University
ERIC Educational Resources Information Center
Mbabu, Loyd Gitari; Bertram, Albert; Varnum, Ken
2013-01-01
Authentication data was utilized to explore undergraduate usage of subscription electronic databases. These usage patterns were linked to the information literacy curriculum of the library. The data showed that out of the 26,208 enrolled undergraduate students, 42% of them accessed a scholarly database at least once in the course of the entire…
Knowledge representation in metabolic pathway databases.
Stobbe, Miranda D; Jansen, Gerbert A; Moerland, Perry D; van Kampen, Antoine H C
2014-05-01
The accurate representation of all aspects of a metabolic network in a structured format, such that it can be used for a wide variety of computational analyses, is a challenge faced by a growing number of researchers. Analysis of five major metabolic pathway databases reveals that each database has made widely different choices to address this challenge, including how to deal with knowledge that is uncertain or missing. In concise overviews, we show how concepts such as compartments, enzymatic complexes and the direction of reactions are represented in each database. Importantly, concepts that a database does not represent are also described. Which aspects of the metabolic network need to be available in a structured format, and in what detail, differs per application. For example, for in silico phenotype prediction, a detailed representation of gene-protein-reaction relations and the compartmentalization of the network is essential. Our analysis also shows that current databases are still limited in capturing all details of the biology of the metabolic network, as further illustrated with a detailed analysis of three metabolic processes. Finally, we conclude that the conceptual differences between the databases, which make knowledge exchange and integration a challenge, have so far not been resolved by the exchange formats in which knowledge representation is standardized.
FARME DB: a functional antibiotic resistance element database
Wallace, James C.; Port, Jesse A.; Smith, Marissa N.; Faustman, Elaine M.
2017-01-01
Antibiotic resistance (AR) is a major global public health threat but few resources exist that catalog AR genes outside of a clinical context. Current AR sequence databases are assembled almost exclusively from genomic sequences derived from clinical bacterial isolates and thus do not include many microbial sequences derived from environmental samples that confer resistance in functional metagenomic studies. These environmental metagenomic sequences often show little or no similarity to AR sequences from clinical isolates using standard classification criteria. In addition, existing AR databases provide no information about flanking sequences containing regulatory or mobile genetic elements. To help address this issue, we created an annotated database of DNA and protein sequences derived exclusively from environmental metagenomic sequences showing AR in laboratory experiments. Our Functional Antibiotic Resistant Metagenomic Element (FARME) database is a compilation of publicly available DNA sequences and predicted protein sequences conferring AR as well as regulatory elements, mobile genetic elements and predicted proteins flanking antibiotic resistance genes. FARME is the first database to focus on functional metagenomic AR gene elements and provides a resource to better understand AR in the 99% of bacteria which cannot be cultured and the relationship between environmental AR sequences and antibiotic resistance genes derived from cultured isolates. Database URL: http://staff.washington.edu/jwallace/farme PMID:28077567
Impacts of European drought events: insights from an international database of text-based reports
NASA Astrophysics Data System (ADS)
Stahl, K.; Kohn, I.; Blauhut, V.; Urquijo, J.; De Stefano, L.; Acacio, V.; Dias, S.; Stagge, J. H.; Tallaksen, L. M.; Kampragou, E.; Van Loon, A. F.; Barker, L. J.; Melsen, L. A.; Bifulco, C.; Musolino, D.; de Carli, A.; Massarutto, A.; Assimacopoulos, D.; Van Lanen, H. A. J.
2015-09-01
Drought is a natural hazard that can cause a wide range of impacts affecting the environment, society, and the economy. Assessing and reducing vulnerability to these impacts for regions beyond the local scale, spanning political and sectoral boundaries, requires systematic and detailed data regarding impacts. This study presents an assessment of the diversity of drought impacts across Europe based on the European Drought Impact report Inventory (EDII), a unique research database that has collected close to 5000 impact reports from 33 European countries. The reported drought impacts were classified into major impact categories, each of which had a number of subtypes. The distribution of these categories and types was then analyzed over time, by country, across Europe and for particular drought events. The results show that impacts on agriculture and public water supply dominate the collection of drought impact reports for most countries and for all major drought events since the 1970s, while the number and relative fractions of reported impacts in other sectors can vary regionally and from event to event. The data also shows that reported impacts have increased over time as more media and website information has become available and environmental awareness has increased. Even though the distribution of impact categories is relatively consistent across Europe, the details of the reports show some differences. They confirm severe impacts in southern regions (particularly on agriculture and public water supply) and sector-specific impacts in central and northern regions (e.g. on forestry or energy production). As a text-based database, the EDII presents a new challenge for quantitative analysis; however, the EDII provides a new and more comprehensive view on drought impacts. Related studies have already developed statistical techniques to evaluate the link between drought indices and impacts using the EDII. 
The EDII is a living database and a promising source for further research on drought impacts, vulnerabilities, and risks across Europe. A key result is the documentation of the extensive variety of impacts found across Europe; this data coverage may help drought policy planning at national to international levels.
NASA Astrophysics Data System (ADS)
Zhou, Xiangrong; Morita, Syoichi; Zhou, Xinxin; Chen, Huayue; Hara, Takeshi; Yokoyama, Ryujiro; Kanematsu, Masayuki; Hoshi, Hiroaki; Fujita, Hiroshi
2015-03-01
This paper describes an automatic approach to anatomy partitioning on three-dimensional (3D) computed tomography (CT) images that divides the human torso into several volume-of-interest (VOI) images based on anatomical definitions. The proposed approach combines several individual organ-location detections with a groupwise organ-location calibration and correction to achieve automatic and robust multiple-organ localization. The essence of the proposed method is to jointly detect the 3D minimum bounding box for each type of organ shown on CT images based on intra-organ image textures and inter-organ spatial relationships in the anatomy. Machine-learning-based template matching and generalized Hough transform-based point-distribution estimation are used in the detection and calibration processes. We apply this approach to the automatic partitioning of a torso region on CT images into 35 VOIs presenting the major organ regions and tissues required by routine diagnosis in clinical medicine. A database containing 4,300 patient cases of high-resolution 3D torso CT images is used for training and performance evaluations. We confirmed that the proposed method was successful in target organ localization in more than 95% of CT cases. Only two organs (gallbladder and pancreas) showed a lower success rate: 71% and 78%, respectively. In addition, we applied this approach to another database that included 287 patient cases of whole-body CT images scanned for positron emission tomography (PET) studies, for additional performance evaluation. The experimental results showed no significant difference between the anatomy partitioning results from the two databases except for the spleen. All experimental results showed that the proposed approach is efficient and useful in accomplishing localization tasks for major organs and tissues on CT images scanned using different protocols.
Post-Inpatient Brain Injury Rehabilitation Outcomes: Report from the National OutcomeInfo Database.
Malec, James F; Kean, Jacob
2016-07-15
This study examined outcomes for intensive residential and outpatient/community-based post-inpatient brain injury rehabilitation (PBIR) programs compared with supported living programs. The goal of supported living programs was stable functioning (no change). Data were obtained for a large cohort of adults with acquired brain injury (ABI) from the OutcomeInfo national database, a web-based database system developed through National Institutes of Health (NIH) Small Business Technology Transfer (STTR) funding for monitoring progress and outcomes in PBIR programs, primarily with the Mayo-Portland Adaptability Inventory (MPAI-4). Rasch-derived MPAI-4 measures for cases from 2008 to 2014 from 9 provider organizations offering programs in 23 facilities throughout the United States were examined. Controlling for age at injury, time in program, and time since injury on admission (chronicity), both intensive residential (n = 205) and outpatient/community-based (n = 2781) programs resulted in significant (approximately 1 standard deviation [SD]) functional improvement on the MPAI-4 Total Score compared with supported living (n = 101) programs (F = 18.184, p < 0.001). Intensive outpatient/community-based programs showed greater improvements on MPAI-4 Ability (F = 14.135, p < 0.001), Adjustment (F = 12.939, p < 0.001), and Participation (F = 16.679, p < 0.001) indices than supported living programs, whereas intensive residential programs showed improvement primarily in Adjustment and Participation. Age at injury and time in program had small effects on outcome; the effect of chronicity was small to moderate. Examination of more chronic cases (>1 year post-injury) showed significant, but smaller (approximately 0.5 SD), change on the MPAI-4 relative to supported living programs (F = 17.562, p < 0.001). Results indicate that intensive residential and outpatient/community-based PBIR programs result in substantial positive functional changes moderated by chronicity.
Messay, Temesguen; Hardie, Russell C; Tuinstra, Timothy R
2015-05-01
We present new pulmonary nodule segmentation algorithms for computed tomography (CT). These include a fully-automated (FA) system, a semi-automated (SA) system, and a hybrid system. Like most traditional systems, the new FA system requires only a single user-supplied cue point. On the other hand, the SA system represents a new algorithm class requiring 8 user-supplied control points. This does increase the burden on the user, but we show that the resulting system is highly robust and can handle a variety of challenging cases. The proposed hybrid system starts with the FA system. If improved segmentation results are needed, the SA system is then deployed. The FA segmentation engine has 2 free parameters, and the SA system has 3. These parameters are adaptively determined for each nodule in a search process guided by a regression neural network (RNN). The RNN uses a number of features computed for each candidate segmentation. We train and test our systems using the new Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) data. To the best of our knowledge, this is one of the first nodule-specific performance benchmarks using the new LIDC-IDRI dataset. We also compare the performance of the proposed methods with several previously reported results on the same data used by those other methods. Our results suggest that the proposed FA system improves upon the state-of-the-art, and the SA system offers a considerable boost over the FA system. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.
Rafiei, Atefeh; Sleno, Lekha
2015-01-15
Data analysis is a key step in mass spectrometry-based untargeted metabolomics, starting with the generation of generic peak lists from raw liquid chromatography/mass spectrometry (LC/MS) data. Because different workflows use different algorithms, the results of different peak-picking strategies often differ widely. Raw LC/HRMS data from two types of biological samples (bile and urine), as well as a standard mixture of 84 metabolites, were processed with four peak-picking software tools: Peakview®, Markerview™, MetabolitePilot™ and XCMS Online. The overlaps between the results of each peak-generating method were then investigated. To gauge the relevance of the peak lists, a search of the METLIN online database was performed to determine which features had accurate masses matching known metabolites, followed by a secondary filtering based on MS/MS spectral matching. In this study, only a small proportion of all peaks (less than 10%) were common to all four software programs. Comparison of database searching results showed that peaks found uniquely by one workflow have less chance of being found in the METLIN metabolomics database and are even less likely to be confirmed by MS/MS. It was shown that the performance of peak-generating workflows has a direct impact on untargeted metabolomics results. As peaks found by more than one peak detection workflow have higher potential to be identified by accurate mass as well as MS/MS spectrum matching, it is suggested to use the overlap of different peak-picking workflows as the preliminary peak list for more robust statistical analysis in global metabolomics investigations. Copyright © 2014 John Wiley & Sons, Ltd.
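Overlap between peak lists of this kind is normally computed with a mass tolerance rather than exact equality. The sketch below uses hypothetical m/z values and a 10 ppm tolerance, both illustrative assumptions rather than parameters from the study:

```python
import numpy as np

def match_peaks(list_a, list_b, tol_ppm=10.0):
    """Count peaks in list_a that have an accurate-mass match in list_b
    within tol_ppm (parts per million)."""
    b = np.sort(np.asarray(list_b, dtype=float))
    matched = 0
    for mz in list_a:
        idx = np.searchsorted(b, mz)
        # The nearest neighbor is one of the two values around the insertion point.
        candidates = b[max(idx - 1, 0): idx + 1]
        if candidates.size and np.min(np.abs(candidates - mz)) <= mz * tol_ppm * 1e-6:
            matched += 1
    return matched

# Hypothetical m/z lists from two peak-picking workflows
xcms_peaks = [118.0865, 132.1019, 166.0863, 524.3711]
other_peaks = [118.0864, 166.0861, 301.1405]
print(match_peaks(xcms_peaks, other_peaks))  # prints 2: two peaks found by both
```

The same intersection logic, applied pairwise across all four workflows, yields the overlap fractions the abstract discusses.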
Modeling the Surface Energy Balance of the Core of an Old Mediterranean City: Marseille.
NASA Astrophysics Data System (ADS)
Lemonsu, A.; Grimmond, C. S. B.; Masson, V.
2004-02-01
The Town Energy Balance (TEB) model, which parameterizes the local-scale energy and water exchanges between urban surfaces and the atmosphere by treating the urban area as a series of urban canyons, coupled to the Interactions between Soil, Biosphere, and Atmosphere (ISBA) scheme, was run in offline mode for Marseille, France. TEB's performance is evaluated with observations of surface temperatures and surface energy balance fluxes collected during the field experiments to constrain models of atmospheric pollution and transport of emissions (ESCOMPTE) urban boundary layer (UBL) campaign. Particular attention was directed to the influence of different surface databases, used for input parameters, on model predictions. Comparison of simulated canyon temperatures with observations resulted in improvements to TEB parameterizations by increasing the ventilation. Evaluation of the model with wall, road, and roof surface temperatures gave good results. The model succeeds in simulating a sensible heat flux larger than heat storage, as observed. A sensitivity comparison using generic dense city parameters, derived from the Coordination of Information on the Environment (CORINE) land cover database, and those from a surface database developed specifically for the Marseille city center shows the importance of correctly documenting the urban surface. Overall, the TEB scheme is shown to be fairly robust, consistent with results from previous studies.
Hurst, Dominic
2012-06-01
The Medline, Cochrane CENTRAL, Biomed Central, Database of Open Access Journals (DOAJ), OpenJ-Gate, Bibliografia Brasileira de Odontologia (BBO), LILACS, IndMed, Sabinet, Scielo, Scirus (Medicine), OpenSIGLE and Google Scholar databases were searched. Hand searching was performed for journals not indexed in the databases. References of included trials were checked. Prospective clinical trials with test and control groups and a follow-up of at least one year were included. Data abstraction was conducted independently, and clinically and methodologically homogeneous data were pooled using a fixed-effects model. Eighteen trials were included. From these, 32 individual dichotomous datasets were extracted and analysed. The majority of the results show no differences between the two types of intervention. A high risk of selection, performance, detection, and attrition bias was identified. Existing research gaps are mainly due to a lack of trials and small sample sizes. The current evidence indicates that the failure rate of high-viscosity GIC/ART restorations is not higher than, but similar to, that of conventional amalgam fillings after periods longer than one year. These results are in line with the conclusions drawn during the original systematic review. There is a high risk that these results are affected by bias, and thus confirmation by further trials with suitably high numbers of participants is needed.
NASA Astrophysics Data System (ADS)
Setiawan Abdullah, Atje; Nurani Ruchjana, Budi; Rejito, Juli; Rosadi, Rudi; Candra Permana, Fahmi
2017-10-01
The national exam at each level of schooling is implemented by the Ministry of Education and Culture for the development of education in Indonesia. The national examinations are evaluated centrally by the National Education Standards Agency, and the implementation of the national exams is expected to describe the success of education at the district, municipal, provincial, or national level. In this study, we evaluate, analyze, and explore the database of 2014 national exam results for Junior High Schools, with the Junior High School (SMP/MTs) as the smallest unit of analysis at the district level. The method used in this study is a data mining approach following the Knowledge Discovery in Databases (KDD) methodology, using descriptive analysis and spatial mapping of national examinations. The classification results of the data mining process applied to the 2014 Junior High School national exams, using data from 6,878 SMP/MTs in West Java, showed that 81.01% were at a moderate level, while the spatial mapping for SMP/MTs in West Java showed that 36.99% were at an unfavorable level. The evaluation results are visualized graphically using ArcGIS to provide information on the quality of education at the municipal, provincial, or national level. The results of this study can be used by management to make decisions to improve educational services based on the national exam database in West Java. Keywords: KDD, spatial mapping, national exam.
NASA Technical Reports Server (NTRS)
Bohnhoff-Hlavacek, Gail
1992-01-01
One of the objectives of the team supporting the LDEF Systems and Materials Special Investigative Groups is to develop databases of experimental findings. These databases identify the hardware flown, summarize results and conclusions, and provide a system for acknowledging investigators, tracing sources of data, and recording future design suggestions. To date, databases covering the optical experiments and thermal control materials (chromic acid anodized aluminum, silverized Teflon blankets, and paints) have been developed at Boeing. We used the FileMaker Pro software, the database manager for the Macintosh computer produced by the Claris Corporation. It is a flat, text-retrievable database that provides access to the data via an intuitive user interface, without tedious programming. Though this software is available only for the Macintosh computer at this time, copies of the databases can be saved to a format that is readable on a personal computer as well. Further, the data can be exported to more powerful relational databases. This report describes the capabilities and use of the LDEF databases and how to get copies of the database for your own research.
NASA Astrophysics Data System (ADS)
Thanos, Konstantinos-Georgios; Thomopoulos, Stelios C. A.
2014-06-01
The study in this paper belongs to a more general research effort on discovering facial sub-clusters in face databases of different ethnicities. These new sub-clusters, along with other metadata (such as race, sex, etc.), lead to a vector for each face in the database in which each component represents the likelihood that a given face belongs to each cluster. This vector is then used as a feature vector in a human identification and tracking system based on face and other biometrics. The first stage in this system involves a clustering method which evaluates and compares the clustering results of five different clustering algorithms (average-, complete-, and single-linkage hierarchical clustering, k-means, and DIGNET) and selects the best strategy for each data collection. In this paper we present the comparative performance of the clustering results of DIGNET and the four other clustering algorithms on fabricated 2D and 3D samples, and on actual face images from various databases, using four different standard metrics. These metrics are the silhouette figure, the mean silhouette coefficient, the Hubert test Γ coefficient, and the classification accuracy of each clustering result. The results showed that, in general, DIGNET gives more trustworthy results than the other algorithms when the metric values are above a specific acceptance threshold. However, when the evaluation metric values are lower than the acceptance threshold but not too low (too low corresponds to ambiguous or false results), the clustering results must be verified by the other algorithms.
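The mean silhouette coefficient used as one of the four evaluation metrics can be computed directly from its definition: s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from point i to its own cluster and b(i) the smallest mean distance to another cluster. The pure-NumPy sketch below uses fabricated 2D blobs (an assumption standing in for the paper's samples) and shows the metric ranking a correct labeling above a random one:

```python
import numpy as np

def mean_silhouette(X, labels):
    """Mean silhouette coefficient computed from its definition."""
    X = np.asarray(X, float)
    labels = np.asarray(labels)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    scores = []
    for i, li in enumerate(labels):
        same = (labels == li)
        same[i] = False
        if not same.any():
            continue  # singleton clusters are skipped in this sketch
        a = dists[i, same].mean()
        b = min(dists[i, labels == lj].mean() for lj in set(labels) if lj != li)
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

# Two fabricated 2D blobs; a correct labeling scores near 1, a random one near 0.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.2, (20, 2)), rng.normal(5, 0.2, (20, 2))])
good = np.repeat([0, 1], 20)
bad = rng.integers(0, 2, 40)
print(mean_silhouette(X, good) > mean_silhouette(X, bad))  # prints True
```

Running the same metric over the labelings produced by each of the five algorithms, and comparing against an acceptance threshold, mirrors the selection strategy the abstract describes.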
[Effect of 3D printing technology on pelvic fractures:a Meta-analysis].
Zhang, Yu-Dong; Wu, Ren-Yuan; Xie, Ding-Ding; Zhang, Lei; He, Yi; Zhang, Hong
2018-05-25
To evaluate the effect of 3D printing technology applied in the surgical treatment of pelvic fractures through the published literature by Meta-analysis. The PubMed database, EMCC database, CBM database, CNKI database, VIP database and Wanfang database were searched from the date of database foundation to August 2017 to collect controlled clinical trials in which 3D printing technology was applied in the preoperative planning of pelvic fracture surgery. The retrieved literature was screened according to predefined inclusion and exclusion criteria, and quality evaluation was performed. Then the available data were extracted and analyzed with the RevMan 5.3 software. In total, 9 controlled clinical trials including 638 cases were chosen. Among them, 279 cases were assigned to the 3D printing technology group and 359 cases to the conventional group. The Meta-analysis results showed that the operative time [SMD=-2.81, 95%CI(-3.76, -1.85)], intraoperative blood loss [SMD=-3.28, 95%CI(-4.72, -1.85)] and the rate of complications [OR=0.47, 95%CI(0.25, 0.87)] in the 3D printing technology group were all lower than those in the conventional group; the excellent and good rate of pelvic fracture reduction [OR=2.09, 95%CI(1.32, 3.30)] and postoperative pelvic functional restoration [OR=1.94, 95%CI(1.15, 3.28)] in the 3D printing technology group were superior to those in the conventional group. 3D printing technology applied in the surgical treatment of pelvic fractures has the advantages of shorter operative time, less intraoperative blood loss and a lower rate of complications, and can improve the quality of pelvic fracture reduction and the recovery of postoperative pelvic function. Copyright© 2018 by the China Journal of Orthopaedics and Traumatology Press.
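Pooled odds ratios like the complication estimate above (OR=0.47) are typically produced in RevMan with fixed-effect inverse-variance weighting of per-study log odds ratios. A minimal sketch of that computation, using invented 2x2 tables rather than data from the included trials:

```python
import math

def pooled_or(studies):
    """Fixed-effect inverse-variance pooling of per-study odds ratios.
    Each study is a 2x2 table (a, b, c, d): events/non-events in the
    treatment and control arms. Returns the pooled OR and its 95% CI."""
    num = den = 0.0
    for a, b, c, d in studies:
        log_or = math.log((a * d) / (b * c))
        var = 1 / a + 1 / b + 1 / c + 1 / d   # Woolf variance of the log OR
        w = 1 / var                            # inverse-variance weight
        num += w * log_or
        den += w
    pooled = num / den
    se = math.sqrt(1 / den)
    ci = (math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se))
    return math.exp(pooled), ci

# Two hypothetical trials: (complications, no complications) per arm.
or_hat, (lo, hi) = pooled_or([(5, 25, 12, 20), (4, 30, 9, 26)])
print(round(or_hat, 2), round(lo, 2), round(hi, 2))
```

An OR below 1 with a confidence interval excluding 1 would indicate fewer complications in the treatment arm, the same reading the abstract applies to its pooled estimates.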
How to locate and appraise qualitative research in complementary and alternative medicine
2013-01-01
Background The aim of this publication is to present a case study of how to locate and appraise qualitative studies for the conduct of a meta-ethnography in the field of complementary and alternative medicine (CAM). CAM is commonly associated with individualized medicine. However, one established scientific approach to the individual, qualitative research, has thus far been used explicitly very rarely. This article demonstrates a case example of how qualitative research in the field of CAM studies was identified and critically appraised. Methods Several search terms and techniques were tested for the identification and appraisal of qualitative CAM research in the conduct of a meta-ethnography. Sixty-seven electronic databases were searched for the identification of qualitative CAM trials, including CAM databases, nursing, nutrition, psychological, social, and medical databases, the Cochrane Library and DIMDI. Results 9578 citations were screened, 223 articles met the pre-specified inclusion criteria, 63 full text publications were reviewed, 38 articles were appraised qualitatively and 30 articles were included. The search began with PubMed, which yielded 87% of the included publications across all databases, with few additional relevant findings in the specialized databases. CINAHL and DIMDI also revealed a high number of precise hits. Although CAMbase and CAM-QUEST® focus on CAM research only, almost no qualitative trials were found there. Searching with broad text terms was the most effective search strategy in all databases. Conclusions This publication presents a case study on how to locate and appraise qualitative studies in the field of CAM. The example shows that the literature search for qualitative studies in the field of CAM is most effective when the search begins in PubMed followed by CINAHL or DIMDI using broad text terms. Exclusive CAM databases delivered no additional findings for locating qualitative CAM studies. PMID:23731997
NASA Astrophysics Data System (ADS)
Elnasir, Selma; Shamsuddin, Siti Mariyam; Farokhi, Sajad
2015-01-01
Palm vein recognition (PVR) is a promising new biometric that has been applied successfully as a method of access control by many organizations and has even further potential in the field of forensics. The palm vein pattern has highly discriminative features that are difficult to forge because of its subcutaneous position in the palm. Despite considerable progress, a few practical issues remain, and providing accurate palm vein readings is still an unsolved problem in biometrics. We propose a robust and more accurate PVR method based on the combination of wavelet scattering (WS) with spectral regression kernel discriminant analysis (SRKDA). As the dimension of the WS-generated features is quite large, SRKDA is required to reduce the extracted features and enhance the discrimination. The results on two public databases, the PolyU Hyper Spectral Palmprint database and the PolyU Multi Spectral Palmprint database, show the high performance of the proposed scheme in comparison with state-of-the-art methods. The proposed approach scored a 99.44% identification rate and a 99.90% verification rate [equal error rate (EER)=0.1%] on the hyperspectral database, and a 99.97% identification rate and a 99.98% verification rate (EER=0.019%) on the multispectral database.
Hierarchical Data Distribution Scheme for Peer-to-Peer Networks
NASA Astrophysics Data System (ADS)
Bhushan, Shashi; Dave, M.; Patel, R. B.
2010-11-01
In the past few years, peer-to-peer (P2P) networks have become an extremely popular mechanism for large-scale content sharing. P2P systems have focused on specific application domains (e.g. music files, video files) or on providing file-system-like capabilities. P2P is a powerful paradigm that provides a large-scale and cost-effective mechanism for data sharing, and a P2P system may be used for storing data globally. Can a conventional database be implemented on a P2P system? Successful implementations of conventional databases on P2P systems have yet to be reported. In this paper we present a mathematical model for the replication of partitions and a hierarchical data distribution scheme for P2P networks. We also analyze the resource utilization and throughput of the P2P system with respect to availability when a conventional database is implemented over the P2P system with a variable query rate. Simulation results show that database partitions placed on peers with a higher availability factor perform better. Degradation index, throughput, and resource utilization are the parameters evaluated with respect to the availability factor.
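The role of the availability factor follows from a standard reliability argument: with independent peer failures, a replicated partition is reachable unless every peer holding a replica is down. A minimal sketch (the independence of peer failures is a simplifying assumption, not a claim from the paper):

```python
def partition_availability(peer_availability, replicas):
    """Probability that at least one replica of a database partition is
    reachable, assuming independent peer up/down states."""
    return 1 - (1 - peer_availability) ** replicas

# A partition replicated on peers that are each up 70% of the time:
for k in (1, 2, 3):
    print(k, round(partition_availability(0.7, k), 3))  # 0.7, 0.91, 0.973
```

This captures why placing partitions on peers with a higher availability factor, or adding replicas, improves throughput in the simulations: queries fail only when no replica is reachable.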
Peptide Identification by Database Search of Mixture Tandem Mass Spectra
Wang, Jian; Bourne, Philip E.; Bandeira, Nuno
2011-01-01
In high-throughput proteomics, the development of computational methods and of novel experimental strategies often rely on each other. In certain areas, mass spectrometry methods for data acquisition are ahead of the computational methods needed to interpret the resulting tandem mass spectra. In particular, although there are numerous situations in which a mixture tandem mass spectrum can contain fragment ions from two or more peptides, nearly all database search tools still assume that each tandem mass spectrum comes from one peptide. Common examples include mixture spectra from co-eluting peptides in complex samples, spectra generated by data-independent acquisition methods, and spectra from peptides with complex post-translational modifications. We propose a new database search tool (MixDB) that is able to identify mixture tandem mass spectra from more than one peptide. We show that peptides can be reliably identified with up to 95% accuracy from mixture spectra while considering only 0.01% of all possible peptide pairs (a four-orders-of-magnitude speedup). Comparison with current database search methods indicates that our approach has better or comparable sensitivity and precision at identifying single-peptide spectra while simultaneously being able to identify 38% more peptides from mixture spectra at significantly higher precision. PMID:21862760
Image ratio features for facial expression recognition application.
Song, Mingli; Tao, Dacheng; Liu, Zicheng; Li, Xuelong; Zhou, Mengchu
2010-06-01
Video-based facial expression recognition is a challenging problem in computer vision and human-computer interaction. To address this problem, texture features have been extracted and widely used, because they can capture image intensity changes caused by skin deformation. However, existing texture features encounter problems with albedo and lighting variations. To solve both problems, we propose a new texture feature called image ratio features. Compared with previously proposed texture features, e.g., high-gradient-component features, image ratio features are more robust to albedo and lighting variations. In addition, to further improve facial expression recognition accuracy based on image ratio features, we combine image ratio features with facial animation parameters (FAPs), which describe the geometric motions of facial feature points. The performance evaluation is based on the Carnegie Mellon University Cohn-Kanade database, our own database, and the Japanese Female Facial Expression database. Experimental results show that the proposed image ratio feature is more robust to albedo and lighting variations, and that the combination of image ratio features and FAPs outperforms each feature alone. In addition, we study asymmetric facial expressions based on our own facial expression database and demonstrate the superior performance of our combined expression recognition system.
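The albedo robustness of ratio features follows from the Lambertian imaging model, in which each pixel is (roughly) albedo times shading: dividing an expression frame by a neutral frame of the same subject cancels the per-pixel albedo, leaving only the shading change caused by the deformation. A toy numpy illustration of that cancellation (not the authors' feature pipeline; the albedo and shading values are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
albedo = rng.uniform(0.2, 1.0, (4, 4))       # per-pixel skin albedo
shading_neutral = np.full((4, 4), 0.8)        # shading of the neutral face
shading_expr = shading_neutral * 1.25         # shading changed by deformation

neutral = albedo * shading_neutral            # neutral-face image
expr = albedo * shading_expr                  # expression-face image

ratio = expr / np.maximum(neutral, 1e-8)      # albedo cancels out
print(np.allclose(ratio, 1.25))               # only the shading change remains
```

The same division applied to two images with different albedos would not cancel, which is why the ratio is taken between frames of the same subject.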
NASA Astrophysics Data System (ADS)
Bulan, Orhan; Bernal, Edgar A.; Loce, Robert P.; Wu, Wencheng
2013-03-01
Video cameras are widely deployed along city streets, interstate highways, traffic lights, stop signs and toll booths by entities that perform traffic monitoring and law enforcement. The videos captured by these cameras are typically compressed and stored in large databases. A rapid search for a specific vehicle within a large database of compressed videos is often required and can be time-critical, even a matter of life or death. In this paper, we propose video compression and decompression algorithms that enable fast and efficient vehicle or, more generally, event searches in large video databases. While compressing a video sequence, the proposed algorithm selects reference frames (i.e., I-frames) based on a vehicle having been detected at a specified position within the scene being monitored. A search for a specific vehicle in the compressed video stream is then performed across the reference frames only, which does not require decompression of the full video sequence as in traditional search algorithms. Our experimental results on videos captured on a local road show that the proposed algorithm significantly reduces the search space (thus reducing time and computational resources) in vehicle search tasks within compressed video streams, particularly those captured in light traffic conditions.
High-precision positioning system of four-quadrant detector based on the database query
NASA Astrophysics Data System (ADS)
Zhang, Xin; Deng, Xiao-guo; Su, Xiu-qin; Zheng, Xiao-qiang
2015-02-01
The fine-pointing mechanism of the Acquisition, Pointing and Tracking (APT) system in free-space laser communication usually uses a four-quadrant detector (QD) to point and track the laser beam accurately. The positioning precision of the QD is one of the key factors in the pointing accuracy of the APT system. In this paper a positioning system is designed based on an FPGA and a DSP, which realizes the A/D sampling, the positioning algorithm, and the control of the fast swing mirror. Starting from the working principle of the QD, we analyze the positioning error of the spot center calculated by the universal algorithm when the spot energy obeys a Gaussian distribution. A database is built by calculation and simulation with MATLAB, in which the spot center calculated by the universal algorithm is paired with the true center of the Gaussian beam, and the database is stored in two pieces of E2PROM serving as the external memory of the DSP. The spot center of the Gaussian beam is then looked up in the database on the basis of the center calculated by the universal algorithm in the DSP. The experimental results show that the positioning accuracy of the high-precision positioning system is much better than that obtained with the universal algorithm alone.
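The "universal algorithm" referred to above is the standard sum-difference estimate of the spot centre from the four quadrant photocurrents; the database then maps this biased estimate back to the true centre of the Gaussian spot. A sketch of the estimate itself (the quadrant labelling is my assumption, not the paper's notation):

```python
def qd_center(a, b, c, d):
    """Sum-difference spot-centre estimate of a four-quadrant detector.

    a, b, c, d are the photocurrents of the upper-right, upper-left,
    lower-left and lower-right quadrants, respectively.  The result is
    a dimensionless offset in [-1, 1] along each axis."""
    total = a + b + c + d
    x = ((a + d) - (b + c)) / total   # right half minus left half
    y = ((a + b) - (c + d)) / total   # upper half minus lower half
    return x, y

# A spot centred on the detector illuminates all quadrants equally.
print(qd_center(1.0, 1.0, 1.0, 1.0))   # (0.0, 0.0)
```

For a Gaussian spot this estimate is nonlinear in the true offset, which is exactly the bias the paper's lookup database corrects.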
NASA Astrophysics Data System (ADS)
Regazzoni, C.; Payraudeau, S.
2012-04-01
Runoff and associated erosion represent a primary mode of mobilization and transfer of pesticides from agricultural lands to watercourses and groundwater. The toxicity of pesticides is potentially higher at the headwater-catchment scale. These catchments are usually ungauged and characterized by temporary streams. Several mitigation strategies and management practices are currently used to mitigate pesticide mixtures in agro-ecosystems. Among those practices, Stormwater Wetlands (SW) can be implemented to store surface runoff and to mitigate pesticide loads. The implementation of New Potential Stormwater Wetlands (NPSW) requires a diagnosis of intermittent runoff at the headwater-catchment scale. The main difficulty in performing this diagnosis at that scale is characterizing the landscape components spatially with enough accuracy. Indeed, fields and field margins enhance or decrease runoff and determine the pathways of Hortonian overland flow. Land-use, soil, and Digital Elevation Model databases are systematically used, yet the respective weight of each of these databases in the uncertainty of the diagnostic results is rarely analyzed at the headwater-catchment scale. Therefore, this work focused on (i) the uncertainties of each of these databases and their propagation through the Hortonian overland-flow modelling, (ii) methods to improve the accuracy of each database, (iii) the propagation of database uncertainties through intermittent-runoff modelling, and (iv) the impact of modelling cell size on the diagnosis. The model developed was a raster implementation of the SCS-CN method integrating re-infiltration processes. The uncertainty propagation was analyzed on the Briançon vineyard catchment (Gard, France, 1400 ha). Based on this study site, the results showed that the geographic and thematic accuracies of the regional soil database (1:250 000) were insufficient to correctly simulate the Hortonian overland flow.
These results have to be weighted according to the soil heterogeneity. Conversely, the regional land-use database (1:50 000) provided an acceptable diagnosis when combined with an accurate soil database (1:15 000). Moreover, the regional land-use quality can be improved by integrating the road and river networks usually available at the national scale. Finally, a 5 m modelling cell size appeared to be an optimum to correctly describe the landscape components and to assess the Hortonian overland flow. A wrong assessment of the Hortonian overland flow leads to a misinterpretation of the results and affects effective decision-making, e.g. the number and location of the NPSW. This uncertainty analysis and the improvement methods developed on this study site can be adapted to other headwater catchments characterized by intermittent surface runoff.
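The SCS-CN method at the core of the raster model computes direct runoff Q from rainfall P via a tabulated curve number CN. In the common SI formulation (shown here as background; the paper's model adds cell-to-cell re-infiltration on top of it), Q = (P - Ia)^2 / (P - Ia + S) for P > Ia, with retention S = 25400/CN - 254 mm and initial abstraction Ia = 0.2 S:

```python
def scs_cn_runoff(p_mm, cn):
    """Direct runoff depth (mm) from the SCS curve-number method (SI units)."""
    s = 25400.0 / cn - 254.0      # potential maximum retention (mm)
    ia = 0.2 * s                  # initial abstraction (mm)
    if p_mm <= ia:
        return 0.0                # all rainfall abstracted, no runoff
    return (p_mm - ia) ** 2 / (p_mm - ia + s)

# 50 mm storm on a moderately runoff-prone surface (CN = 80).
print(round(scs_cn_runoff(50.0, 80), 1))
```

Because CN depends on both land use and soil hydrologic group, errors in either database shift CN and hence Q, which is why the soil and land-use accuracies dominate the uncertainty analysis.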
Southan, Christopher; Várkonyi, Péter; Muresan, Sorel
2009-07-06
Since 2004, public cheminformatic databases and their collective functionality for exploring relationships between compounds, protein sequences, literature and assay data have advanced dramatically. In parallel, commercial sources that extract and curate such relationships from journals and patents have also been expanding. This work updates a previous comparative study of databases chosen for their bioactive content, availability of downloads and facility to select informative subsets. Where they could be calculated, compounds extracted per journal article were in the range of 12 to 19, while compounds-per-protein counts increased with document numbers. Chemical structure filtration to facilitate standardised comparisons typically reduced source counts by between 5% and 30%. The pair-wise overlaps between 23 databases and subsets were determined, as well as changes between 2006 and 2008. While all compound sets have increased, PubChem has doubled to 14.2 million. The 2008 comparison matrix shows not only overlap but also unique content across all sources. Many of the detailed differences could be attributed to individual strategies for data selection and extraction. While there was a large increase in patent-derived structures entering PubChem after 2006, GVKBIO contains over 0.8 million unique structures from this source. Venn diagrams showed extensive overlap between compounds independently extracted by expert curation from journals by GVKBIO, WOMBAT (both commercial) and BindingDB (public), but each included unique content. In contrast, the approved drug collections from GVKBIO, MDDR (commercial) and DrugBank (public) showed surprisingly low overlap. Aggregating all commercial sources established that while 1 million compounds overlapped with PubChem, 1.2 million did not. On the basis of chemical structure content per se, public sources have covered an increasing proportion of commercial databases over the last two years.
However, commercial products included in this study provide links between compounds and information from patents and journals at a larger scale than current public efforts. They also continue to capture a significant proportion of unique content. Our results thus demonstrate not only an encouraging overall expansion of data-supported bioactive chemical space but also that both commercial and public sources are complementary for its exploration.
Sana, Theodore R; Roark, Joseph C; Li, Xiangdong; Waddell, Keith; Fischer, Steven M
2008-09-01
In an effort to simplify and streamline compound identification from metabolomics data generated by liquid chromatography/time-of-flight mass spectrometry, we have created software for constructing Personalized Metabolite Databases with content from over 15,000 compounds drawn from the public METLIN database (http://metlin.scripps.edu/). Moreover, we have added extra functionality to the database that (a) permits the addition of user-defined retention times as an orthogonal searchable parameter complementing accurate mass data; and (b) allows interfacing to separate software, a Molecular Formula Generator (MFG), that facilitates reliable interpretation of any database matches from the accurate mass spectral data. To test the utility of this identification strategy, we added retention times to a subset of masses in this database, representing a mixture of 78 synthetic urine standards. The synthetic mixture was analyzed and screened against this METLIN urine database, resulting in 46 accurate-mass and retention-time matches. Human urine samples were subsequently analyzed under the same analytical conditions and screened against this database. A total of 1387 ions were detected in human urine; 16 of these matched both the accurate mass and the retention time of the 78 urine standards in the database. Another 374 had only an accurate-mass match to the database, with 163 of those masses also having the highest MFG score. Furthermore, the MFG calculated a formula for a further 849 ions that had no match to the database. Taken together, these results suggest that the METLIN Personal Metabolite Database and MFG software offer a robust strategy for confirming the formula of database matches; in the event of no database match, the MFG also suggests possible formulas that may be helpful in interpreting the experimental results.
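The two-parameter screen described above (accurate mass within a ppm tolerance, plus a retention-time window where a retention time has been recorded) can be sketched as follows; the tolerances, compound names and values in the tiny database are illustrative, not the authors' settings or data:

```python
def match_feature(mz, rt, db, ppm_tol=10.0, rt_tol=0.5):
    """Return database entries matching an observed ion by accurate mass
    (ppm tolerance) and, when the entry has one, retention time (minutes)."""
    hits = []
    for name, db_mz, db_rt in db:
        ppm = abs(mz - db_mz) / db_mz * 1e6
        if ppm > ppm_tol:
            continue                      # mass outside tolerance
        if db_rt is not None and abs(rt - db_rt) > rt_tol:
            continue                      # retention time outside window
        hits.append(name)
    return hits

db = [("creatinine", 113.0589, 1.2),
      ("urea", 60.0324, None),           # no retention time recorded
      ("hippurate", 179.0582, 5.9)]
print(match_feature(113.0590, 1.3, db))  # ['creatinine']
```

Adding the retention-time dimension is what separates the 16 two-parameter matches from the 374 mass-only matches reported in the abstract.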
Image correlation method for DNA sequence alignment.
Curilem Saldías, Millaray; Villarroel Sassarini, Felipe; Muñoz Poblete, Carlos; Vargas Vásquez, Asticio; Maureira Butler, Iván
2012-01-01
The complexity of searches and the volume of genomic data make sequence alignment one of the most active research areas in bioinformatics. New alignment approaches have incorporated digital signal processing techniques; among these, correlation methods are highly sensitive. This paper proposes a novel sequence alignment method based on 2-dimensional images, where each nucleic acid base is represented as a pixel of fixed gray intensity. Query and known database sequences are coded into their pixel representations, and sequence alignment is handled as an object-recognition-in-a-scene problem: the query and the database become object and scene, respectively. An image correlation process is carried out to search for the best match between them. Given that this procedure can be implemented in an optical correlator, the correlation could eventually be accomplished at light speed. This paper describes an initial research stage where results were obtained "digitally" by simulating an optical correlation of DNA sequences represented as images. A total of 303 queries (with variable lengths from 50 to 4500 base pairs) and 100 scenes, each represented by a 100 x 100 image (a one-million-base-pair database in total), were considered for the image correlation analysis. The results showed that correlation reached very high sensitivity (99.01%) and specificity (98.99%) and outperformed BLAST when mutation numbers increased. However, digital correlation was a hundred times slower than BLAST. We are currently starting an initiative to evaluate the correlation speed of a real experimental optical correlator; by doing this, we expect to fully exploit the light-speed properties of optical correlation. As the optical correlator works jointly with the computer, the digital algorithms should also be optimized. The results presented in this paper are encouraging and support the study of image correlation methods for sequence alignment.
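A one-dimensional toy version of the idea (a sketch, not the authors' 2-D optical setup; the gray-level assignments are arbitrary): encode each base as a fixed gray level and locate a query within a longer sequence by normalized cross-correlation.

```python
import numpy as np

GRAY = {"A": 0.25, "C": 0.5, "G": 0.75, "T": 1.0}

def encode(seq):
    """Map a DNA string to its fixed gray-intensity representation."""
    return np.array([GRAY[b] for b in seq])

def best_alignment(query, scene):
    """Slide the encoded query over the encoded scene and return the
    offset with the highest normalized correlation."""
    q = encode(query) - np.mean(encode(query))
    s = encode(scene)
    scores = []
    for i in range(len(s) - len(q) + 1):
        w = s[i:i + len(q)] - np.mean(s[i:i + len(q)])
        denom = np.linalg.norm(q) * np.linalg.norm(w)
        scores.append(np.dot(q, w) / denom if denom else 0.0)
    return int(np.argmax(scores))

scene = "GGGGACGTACGTGGGG"
print(best_alignment("ACGTACGT", scene))   # 4: exact match starts at index 4
```

An optical correlator evaluates all offsets of the 2-D analogue of this loop in parallel, which is the source of the hoped-for speedup.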
Health Outcomes of Sarcopenia: A Systematic Review and Meta-Analysis
Beaudart, Charlotte; Zaaria, Myriam; Pasleau, Françoise; Reginster, Jean-Yves; Bruyère, Olivier
2017-01-01
Objective The purpose of this study was to perform a systematic review to assess the short-, middle- and long-term consequences of sarcopenia. Methods Prospective studies assessing the consequences of sarcopenia were searched across different electronic databases (MEDLINE, EMBASE, EBM Reviews, Cochrane Database of Systematic Reviews, EBM Reviews ACP Journal Club, EBM Reviews DARE and AMED). Only studies that used the definition of the European Working Group on Sarcopenia in Older People to diagnose sarcopenia were included. Study selection and data extraction were performed by two independent reviewers. For outcomes reported by three or more studies, a meta-analysis was performed. The study results are expressed as odds ratios (OR) with 95% CI. Results Of the 772 references identified through the database search, 17 were included in this systematic review. The number of participants in the included studies ranged from 99 to 6658, and the duration of follow-up varied from 3 months to 9.8 years. Eleven out of 12 studies assessed the impact of sarcopenia on mortality. The results showed a higher rate of mortality among sarcopenic subjects (pooled OR of 3.596 (95% CI 2.96–4.37)). The effect was higher in people aged 79 years or older compared with younger subjects (p = 0.02). Sarcopenia is also associated with functional decline (pooled OR of 6 studies 3.03 (95% CI 1.80–5.12)), a higher rate of falls (2/2 studies found a significant association) and a higher incidence of hospitalizations (1/1 study). The impact of sarcopenia on the incidence of fractures and the length of hospital stay was less clear (only 1/2 studies showed an association for both outcomes). Conclusion Sarcopenia is associated with several harmful outcomes, making this geriatric syndrome a real public health burden. PMID:28095426
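Pooled odds ratios like those above are typically obtained by inverse-variance weighting of the study-level log odds ratios; a minimal fixed-effect sketch with made-up study values (not the review's data, which would also require a random-effects model when heterogeneity is present):

```python
import math

def pooled_or(ors, cis):
    """Fixed-effect inverse-variance pooling of odds ratios.

    `ors` are study odds ratios; `cis` are their (lower, upper) 95% CIs,
    from which each study's standard error on the log scale is recovered."""
    weights, weighted = 0.0, 0.0
    for or_, (lo, hi) in zip(ors, cis):
        se = (math.log(hi) - math.log(lo)) / (2 * 1.96)
        w = 1.0 / se ** 2           # weight = inverse variance of log OR
        weights += w
        weighted += w * math.log(or_)
    return math.exp(weighted / weights)

# Two hypothetical studies with similar effects pool between them.
print(round(pooled_or([3.0, 4.0], [(2.0, 4.5), (2.5, 6.4)]), 2))
```

The pooled estimate always lies between the individual odds ratios on the log scale, with more precise (narrower-CI) studies pulling it harder.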
Garraín, Daniel; Fazio, Simone; de la Rúa, Cristina; Recchioni, Marco; Lechón, Yolanda; Mathieux, Fabrice
2015-01-01
The aim of this paper is to identify areas of potential improvement of the European Reference Life Cycle Database (ELCD) electricity datasets. The revision is based on the data quality indicators described by the International Life Cycle Data system (ILCD) Handbook, applied on a sectorial basis. These indicators evaluate the technological, geographical and time-related representativeness of the dataset and its appropriateness in terms of completeness, precision and methodology. Results show that the ELCD electricity datasets are of very good quality in general terms; nevertheless, some findings and recommendations for improving the quality of the Life-Cycle Inventories have been derived. Moreover, these results attest to the quality of the electricity-related datasets for any LCA practitioner and provide insights into the limitations and assumptions underlying the dataset modelling. Given this information, the LCA practitioner will be able to decide whether the use of the ELCD electricity datasets is appropriate for the goal and scope of the analysis to be conducted. The methodological approach would also be useful for dataset developers and reviewers seeking to improve the overall Data Quality Requirements of databases.
Understanding the Influence of Environment on Adults' Walking Experiences: A Meta-Synthesis Study.
Dadpour, Sara; Pakzad, Jahanshah; Khankeh, Hamidreza
2016-07-20
The environment has an important impact on physical activity, especially walking. The relationship between the environment and walking is not the same as for other types of physical activity. This study seeks to comprehensively identify the environmental factors influencing walking and to show how those environmental factors impact on walking using the experiences of adults between the ages of 18 and 65. The current study is a meta-synthesis based on a systematic review. Seven databases of related disciplines were searched, including health, transportation, physical activity, architecture, and interdisciplinary databases. In addition to the databases, two journals were searched. Of the 11,777 papers identified, 10 met the eligibility criteria and quality for selection. Qualitative content analysis was used for analysis of the results. The four themes identified as influencing walking were "safety and security", "environmental aesthetics", "social relations", and "convenience and efficiency". "Convenience and efficiency" and "environmental aesthetics" could enhance the impact of "social relations" on walking in some aspects. In addition, "environmental aesthetics" and "social relations" could hinder the influence of "convenience and efficiency" on walking in some aspects. Given the results of the study, strategies are proposed to enhance the walking experience.
NASA Astrophysics Data System (ADS)
Yu, Li-Juan; Wan, Wenchao; Karton, Amir
2016-11-01
We evaluate the performance of standard and modified MPn procedures for a wide set of thermochemical and kinetic properties, including atomization energies, structural isomerization energies, conformational energies, and reaction barrier heights. The reference data are obtained at the CCSD(T)/CBS level by means of the Wn thermochemical protocols. We find that none of the MPn-based procedures show acceptable performance for the challenging W4-11 and BH76 databases. For the other thermochemical/kinetic databases, the MP2.5 and MP3.5 procedures provide the most attractive accuracy-to-computational cost ratios. The MP2.5 procedure results in a weighted-total-root-mean-square deviation (WTRMSD) of 3.4 kJ/mol, whilst the computationally more expensive MP3.5 procedure results in a WTRMSD of 1.9 kJ/mol (the same WTRMSD obtained for the CCSD(T) method in conjunction with a triple-zeta basis set). We also assess the performance of the computationally economical CCSD(T)/CBS(MP2) method, which provides the best overall performance for all the considered databases, including W4-11 and BH76.
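The half-order procedures named above are commonly defined by scaling the highest-order correction by one half, which amounts to averaging adjacent MPn energies, e.g. E(MP2.5) = E(MP2) + (1/2)[E(MP3) - E(MP2)]. A trivial numeric sketch with hypothetical total energies (not values from the paper):

```python
def mp_half_order(e_lower, e_higher):
    """Interpolated MPn.5 energy: the average of two adjacent MPn energies,
    equivalent to scaling the highest-order correction by one half."""
    return e_lower + 0.5 * (e_higher - e_lower)

# Hypothetical total energies (hartree) at MP2 and MP3.
e_mp2, e_mp3 = -76.342, -76.356
print(round(mp_half_order(e_mp2, e_mp3), 3))
```

The appeal, reflected in the WTRMSD figures above, is that this interpolation damps the well-known oscillation of the MPn series between even and odd orders at no extra computational cost beyond the higher-order calculation.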
Dvornyk, Volodymyr; Long, Ji-Rong; Xiong, Dong-Hai; Liu, Peng-Yuan; Zhao, Lan-Juan; Shen, Hui; Zhang, Yuan-Yuan; Liu, Yong-Jun; Rocha-Sanchez, Sonia; Xiao, Peng; Recker, Robert R; Deng, Hong-Wen
2004-01-01
Background Public SNP databases are frequently used to choose SNPs for candidate genes in the association and linkage studies of complex disorders. However, their utility for such studies of diseases with ethnic-dependent background has never been evaluated. Results To estimate the accuracy and completeness of SNP public databases, we analyzed the allele frequencies of 41 SNPs in 10 candidate genes for obesity and/or osteoporosis in a large American-Caucasian sample (1,873 individuals from 405 nuclear families) by PCR-invader assay. We compared our results with those from the databases and other published studies. Of the 41 SNPs, 8 were monomorphic in our sample. Twelve were reported for the first time for Caucasians and the other 29 SNPs in our sample essentially confirmed the respective allele frequencies for Caucasians in the databases and previous studies. The comparison of our data with other ethnic groups showed significant differentiation between the three major world ethnic groups at some SNPs (Caucasians and Africans differed at 3 of the 18 shared SNPs, and Caucasians and Asians differed at 13 of the 22 shared SNPs). This genetic differentiation may have an important implication for studying the well-known ethnic differences in the prevalence of obesity and osteoporosis, and complex disorders in general. Conclusion A comparative analysis of the SNP data of the candidate genes obtained in the present study, as well as those retrieved from the public domain, suggests that the databases may currently have serious limitations for studying complex disorders with an ethnic-dependent background due to the incomplete and uneven representation of the candidate SNPs in the databases for the major ethnic groups. This conclusion attests to the imperative necessity of large-scale and accurate characterization of these SNPs in different ethnic groups. PMID:15113403
Using SQL Databases for Sequence Similarity Searching and Analysis.
Pearson, William R; Mackey, Aaron J
2017-09-13
Relational databases can integrate diverse types of information and manage large sets of similarity search results, greatly simplifying genome-scale analyses. By focusing on taxonomic subsets of sequences, relational databases can reduce the size and redundancy of sequence libraries and improve the statistical significance of homologs. In addition, by loading similarity search results into a relational database, it becomes possible to explore and summarize the relationships between all of the proteins in an organism and those in other biological kingdoms. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It also describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. The unit also introduces search_demo, a database that stores sequence similarity search results. The search_demo database is then used to explore the evolutionary relationships between E. coli proteins and proteins in other organisms in a large-scale comparative genomic analysis. © 2017 by John Wiley & Sons, Inc.
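The pattern the unit teaches (load similarity-search hits into a relational database, then summarize them with SQL) can be sketched with Python's built-in sqlite3; the table layout and the toy hits below are illustrative, not the schema of seqdb_demo or search_demo:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE hits (
    query TEXT, subject TEXT, taxon TEXT, evalue REAL)""")
con.executemany(
    "INSERT INTO hits VALUES (?, ?, ?, ?)",
    [("recA", "P0A7G6", "E. coli", 1e-180),
     ("recA", "Q9X1X4", "T. maritima", 1e-90),
     ("lexA", "P0A7C2", "E. coli", 1e-120)])

# Summarize: best (lowest) E-value per query across all taxa.
rows = con.execute(
    "SELECT query, MIN(evalue) FROM hits GROUP BY query ORDER BY query"
).fetchall()
print(rows)   # [('lexA', 1e-120), ('recA', 1e-180)]
```

Once the hits live in a table, questions like "which queries have homologs only within one kingdom?" become single GROUP BY queries instead of bespoke parsing scripts.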
Akbari, Hamed; Bilello, Michel; Da, Xiao; Davatzikos, Christos
2015-01-01
Evaluating algorithms for the inter-subject registration of brain magnetic resonance images (MRI) is an important problem receiving growing attention. Existing studies evaluated image registration algorithms in specific tasks or using specific databases (e.g., only for skull-stripped images, only for single-site images, etc.). Consequently, the choice of registration algorithm seems task- and usage/parameter-dependent. Nevertheless, recent large-scale, often multi-institutional imaging studies create the need, and raise the question, whether some registration algorithms can 1) apply generally to various tasks/databases posing various challenges; 2) perform consistently well; and, while doing so, 3) require minimal or ideally no parameter tuning. In seeking answers to this question, we evaluated 12 general-purpose registration algorithms for their generality, accuracy and robustness. We fixed their parameters at values suggested by the algorithm developers as reported in the literature. We tested the algorithms on 7 databases/tasks, which present one or more of 4 commonly encountered challenges: 1) inter-subject anatomical variability in skull-stripped images; 2) intensity inhomogeneity, noise and large structural differences in raw images; 3) imaging protocol and field-of-view (FOV) differences in multi-site data; and 4) missing correspondences in pathology-bearing images. In total, 7,562 registrations were performed. Registration accuracies were measured by (multi-)expert-annotated landmarks or regions of interest (ROIs). To ensure reproducibility, we used public software tools, public databases (whenever possible), and we fully disclose the parameter settings. We show the evaluation results and discuss the performances in light of the algorithms' similarity metrics, transformation models and optimization strategies. We also discuss future directions for algorithm development and evaluation. PMID:24951685
Fast fingerprint database maintenance for indoor positioning based on UGV SLAM.
Tang, Jian; Chen, Yuwei; Chen, Liang; Liu, Jingbin; Hyyppä, Juha; Kukko, Antero; Kaartinen, Harri; Hyyppä, Hannu; Chen, Ruizhi
2015-03-04
Indoor positioning technology has become more and more important in the last two decades. Utilizing Received Signal Strength Indicator (RSSI) fingerprints of Signals of Opportunity (SOP) is a promising alternative navigation solution. However, as RSSIs vary during operation due to their physical nature and are easily affected by environmental change, one challenge of the indoor fingerprinting method is maintaining the RSSI fingerprint database in a timely and effective manner. In this paper, a solution for rapidly updating the fingerprint database is presented, based on a self-developed unmanned ground vehicle (UGV) platform, NAVIS. Several SOP sensors were installed on NAVIS for collecting indoor fingerprint information: a digital compass collecting magnetic field intensity, a light sensor collecting light intensity, and a smartphone collecting the number of access points and the RSSIs of the pre-installed WiFi network. The NAVIS platform generates a map of the indoor environment and collects the SOPs during mapping, and the SOP fingerprint database is then interpolated and updated in real time. Field tests were carried out to evaluate the effectiveness and efficiency of the proposed method. The results showed that the fingerprint databases can be created and updated quickly, with a higher sampling frequency (5 Hz) and denser reference points than traditional methods, and that the indoor map can be generated without prior information. Moreover, environmental changes can also be detected quickly for fingerprint indoor positioning.
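Once such a fingerprint database exists, online positioning is commonly done by nearest-neighbor matching of the observed RSSI vector against the stored reference points. A minimal sketch of that matching step (illustrative only; the paper itself focuses on building and updating the database with the SLAM-equipped UGV, and the coordinates and RSSI values below are made up):

```python
import math

def locate(observation, fingerprints):
    """Return the (x, y) of the reference point whose stored RSSI vector
    is closest (Euclidean distance) to the observed one.

    `observation` maps access-point id -> RSSI (dBm);
    `fingerprints` maps (x, y) -> {access-point id: RSSI (dBm)}."""
    def dist(stored):
        shared = observation.keys() & stored.keys()
        return math.sqrt(sum((observation[ap] - stored[ap]) ** 2
                             for ap in shared))
    return min(fingerprints, key=lambda xy: dist(fingerprints[xy]))

db = {(0, 0): {"ap1": -40, "ap2": -70},
      (5, 0): {"ap1": -70, "ap2": -40},
      (0, 5): {"ap1": -55, "ap2": -55}}
print(locate({"ap1": -42, "ap2": -68}, db))   # (0, 0)
```

The denser and fresher the reference points (which is what the UGV-based updating delivers), the smaller the quantization error of this lookup.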
Song, Sun Ok; Jung, Chang Hee; Song, Young Duk; Park, Cheol-Young; Kwon, Hyuk-Sang; Cha, Bong Soo; Park, Joong-Yeol; Lee, Ki-Up
2014-01-01
Background The National Health Insurance Service (NHIS) recently signed an agreement to provide limited open access to its databases within the Korean Diabetes Association for the benefit of Korean subjects with diabetes. Here, we present the history, structure, and contents of the Korean National Health Insurance (NHI) system's databases, and how Korean researchers can procure the data. Methods The NHIS in Korea is a single-payer program and is mandatory for all residents in Korea. The three main healthcare programs of the NHI, Medical Aid, and long-term care insurance (LTCI) provide 100% coverage of the Korean population. The NHIS in Korea has adopted a fee-for-service system to pay health providers. Researchers can obtain health information from the four databases of the insured, which contain data on health insurance claims, health check-ups and LTCI. Results Metabolic diseases, as chronic diseases, are increasing as society ages. Because NHIS data are mandatory and collected serially for the whole population, they can show the time course of a disease and help predict its progression, and can also be used, after data mining, in the primary and secondary prevention of disease. Conclusion The NHIS database represents the entire Korean population and can be used as a population-based database. The integrated information technology of the NHIS database makes it a world-leading platform for population-based epidemiology and disease research. PMID:25349827
[Scientometrics and bibliometrics of biomedical engineering periodicals and papers].
Zhao, Ping; Xu, Ping; Li, Bingyan; Wang, Zhengrong
2003-09-01
This investigation was carried out to reveal the current status, research trends and research level of biomedical engineering in mainland China by means of scientometrics, and to assess the quality of four domestic publications by bibliometrics. We identified all articles of the four related publications by searching Chinese and foreign databases from 1997 to 2001. All articles collected or cited by these databases were retrieved and statistically analyzed to determine the relevant distributions, including databases, years, authors, institutions, subject headings and subheadings. The sources of supporting funds and the related articles were also analyzed. The results showed that two journals were cited by two foreign databases and five Chinese databases simultaneously. The output of the Journal of Biomedical Engineering was the highest; the number of its original papers cited by EI and CA and the total number of its papers sponsored by funds were higher than those of the others, but the number and yearly percentage of its biomedical articles cited by EI decreased overall. A core group of mainland authors and institutions has emerged in the field of biomedical engineering. Their research topics were mainly concentrated on a small set of subject headings, including biocompatible materials, computer-assisted signal processing, electrocardiography, computer-assisted image processing, biomechanics, algorithms, electroencephalography, automatic data processing, mechanical stress, hemodynamics, mathematical computing, microcomputers, and theoretical models. The main subheadings were concentrated on instrumentation, physiopathology, diagnosis, therapy, ultrasonography, physiology, analysis, surgery, pathology, and methods.
The Impact of Internet Trading on the UK Antiquarian and Second-Hand Bookselling Industry.
ERIC Educational Resources Information Center
Whewell, Jane A.; Souitaris, Vangelis
2001-01-01
Investigates the impact of the Internet on the UK (United Kingdom) second-hand and antiquarian book trade. Results from questionnaires and interviews showed that, overall, electronic commerce presents an opportunity rather than a threat to this traditional retailing sector, partly due to pre-existing database management and distribution skills.…
Special Consideration in Post-Secondary Institutions: Trends at a Canadian University
ERIC Educational Resources Information Center
Zimmermann, Joelle; Kamenetsky, Stuart B.; Pongracic, Syb
2015-01-01
This study examined trends in the practice of granting special consideration for missed tests and late papers in colleges and universities. We analyzed a database of 4,183 special consideration requests at a large Canadian university between 1998 and 2008. Results show a growing rate of requests per enrolment between 2001 and 2007. Although…
ERIC Educational Resources Information Center
Volkwein, J. Fredericks; And Others
A study of campus crime trends from 1974 to 1990 examines the relationships between campus crime and college characteristics. The research drew on merged national databases containing federal crime statistics, community demographic data, and campus characteristics. The results show that campus rates of both violent crime and property crime are…
Bar Association Database Continues To Grow. Technical Assistance Bulletin No. 10.
ERIC Educational Resources Information Center
Koprowski-Moisant, Jane
As part of the American Bar Association's Special Committee on Youth Education for Citizenship's efforts to assist in the establishment and maintenance of law related education (LRE) projects in every state and local bar association, surveys were mailed to the associations. The results of the survey showed that 49 state bar associations and 133…
Bowden, Peter; Beavis, Ron; Marshall, John
2009-11-02
A goodness of fit test may be used to assign tandem mass spectra of peptides to amino acid sequences and to directly calculate the expected probability of mis-identification. The product of the peptide expectation values directly yields the probability that the parent protein has been mis-identified. A relational database can capture the mass spectral data and the best-fit results, and permit subsequent calculations by a general statistical analysis system. The many files of the HUPO blood protein data correlated by X!TANDEM against the proteins of ENSEMBL were collected into a relational database. A redundant set of 247,077 proteins and peptides correlated by X!TANDEM was collapsed to a set of 34,956 peptides from 13,379 distinct proteins. About 6875 distinct proteins were represented by only a single distinct peptide, 2866 proteins showed 2 distinct peptides, and 3454 proteins showed at least three distinct peptides by X!TANDEM. More than 99% of the peptides were associated with proteins that had cumulative expectation values, i.e. probabilities of false positive identification, of one in one hundred or less. The distribution of peptides per protein from X!TANDEM was significantly different from that expected from random assignment of peptides.
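The protein-level confidence calculation described above is simple to reproduce: assuming independent peptide matches, the product of the individual peptide expectation values gives the probability that the parent protein is mis-identified. A minimal sketch (the function name is illustrative, not from the paper):

```python
import math

def protein_misid_probability(peptide_expectations):
    """Probability that a protein is mis-identified, taken as the product
    of its peptides' expectation values (assuming independent matches)."""
    # Multiply in log space to avoid underflow for proteins with many peptides.
    log_p = sum(math.log(e) for e in peptide_expectations)
    return math.exp(log_p)

# Three peptides, each with an expectation value of 0.01 (a 1-in-100 chance
# of a random match), give a combined mis-identification probability of 1e-6.
```

In a relational database this amounts to aggregating a per-peptide expectation column by protein, which is why the authors could defer such calculations to a general statistical analysis system.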
Quantitative evaluation of Iranian radiology papers and its comparison with selected countries.
Ghafoori, Mahyar; Emami, Hasan; Sedaghat, Abdolrasoul; Ghiasi, Mohammad; Shakiba, Madjid; Alavi, Manijeh
2014-01-01
Recent technological developments in medicine, including modern radiology, have increased the impact of scientific research on social life. Scientific outputs such as articles and patents are products that reflect scientists' efforts toward these achievements. In the current study, we evaluated the situation of Iranian scientists in the field of radiology and compared it with selected countries in terms of scientific papers. For this purpose, we used scientometric tools to quantitatively assess scientific papers in the field of radiology. Radiology papers were evaluated in the context of a medical field audit using a retrospective model. We used the related biomedical databases to extract articles related to radiology. In the next step, the country's scientific output in radiology was compared with that of the regional countries under study. The results showed that Iranian papers constituted 0.19% of the papers published in the PubMed database in 2009, and 0.29% of the Scopus scientific database in the same year. The proportion of Iranian papers within the region under study was 7.6%. To diminish the gap between Iranian scientific radiology papers and those of competitor countries in the region, and to achieve the goals of the 2025 vision document, a manifold effort by the radiology community is necessary.
Impact of database quality in knowledge-based treatment planning for prostate cancer.
Wall, Phillip D H; Carver, Robert L; Fontenot, Jonas D
2018-03-13
This article investigates dose-volume prediction improvements in a common knowledge-based planning (KBP) method using a Pareto plan database compared with using a conventional, clinical plan database. Two plan databases were created using retrospective, anonymized data of 124 volumetric modulated arc therapy (VMAT) prostate cancer patients. The clinical plan database (CPD) contained planning data from each patient's clinically treated VMAT plan, which were manually optimized by various planners. The multicriteria optimization database (MCOD) contained Pareto-optimal plan data from VMAT plans created using a standardized multicriteria optimization protocol. Overlap volume histograms, incorporating fractional organ at risk volumes only within the treatment fields, were computed for each patient and used to match new patient anatomy to similar database patients. For each database patient, CPD and MCOD KBP predictions were generated for D10, D30, D50, D65, and D80 of the bladder and rectum in a leave-one-out manner. Prediction achievability was evaluated through a replanning study on a subset of 31 randomly selected database patients using the best KBP predictions, regardless of plan database origin, as planning goals. MCOD predictions were significantly lower than CPD predictions for all 5 bladder dose-volumes and rectum D50 (P = .004) and D65 (P < .001), whereas CPD predictions for rectum D10 (P = .005) and D30 (P < .001) were significantly less than MCOD predictions. KBP predictions were statistically achievable in the replans for all predicted dose-volumes, excluding D10 of bladder (P = .03) and rectum (P = .04). Compared with clinical plans, replans showed significant average reductions in Dmean for bladder (7.8 Gy; P < .001) and rectum (9.4 Gy; P < .001), while maintaining statistically similar planning target volume, femoral head, and penile bulb dose.
KBP dose-volume predictions derived from Pareto plans were more optimal overall than those resulting from manually optimized clinical plans, which significantly improved KBP-assisted plan quality. This work investigates how the plan quality of knowledge databases affects the performance and achievability of dose-volume predictions from a common knowledge-based planning approach for prostate cancer. Bladder and rectum dose-volume predictions derived from a database of standardized Pareto-optimal plans were compared with those derived from clinical plans manually designed by various planners. Dose-volume predictions from the Pareto plan database were significantly lower overall than those from the clinical plan database, without compromising achievability. Copyright © 2018 Elsevier Inc. All rights reserved.
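The anatomy-matching step of a KBP pipeline like this one can be sketched as a nearest-neighbor search over overlap volume histograms (OVHs). The code below is an illustrative sketch, not the authors' implementation; it assumes each OVH is sampled at the same distance bins so histograms can be compared element-wise.

```python
def match_database_patient(new_ovh, database_ovhs):
    """Return the index of the database patient whose overlap volume
    histogram is closest (Euclidean distance) to the new patient's OVH."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min(range(len(database_ovhs)),
               key=lambda i: distance(new_ovh, database_ovhs[i]))
```

In the leave-one-out evaluation described above, each patient would be removed from the database before being matched against the remaining patients.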
A Chinese Character Teaching System Using Structure Theory and Morphing Technology
Sun, Linjia; Liu, Min; Hu, Jiajia; Liang, Xiaohui
2014-01-01
This paper proposes a Chinese character teaching system using Chinese character structure theory and 2D contour morphing technology. The system, comprising an offline phase and an online phase, automatically generates animations of the same Chinese character across different writing stages to intuitively show the evolution of shape and topology during Chinese character teaching. The offline phase builds a database of component models for each script and a database of component correspondences between scripts. Given two or more different scripts of the same Chinese character, the online phase first divides the characters into components through Chinese character parsing, and then generates the evolution animation through Chinese character morphing. Finally, two writing stages of Chinese characters, seal script and clerical script, are used in an experiment to demonstrate the capability of the system. The results of the user experience study show that the system can successfully guide students in learning Chinese characters, and users agree that the system is interesting and motivates them to learn. PMID:24978171
Liu, Jie; Zhang, Fu-Dong; Teng, Fei; Li, Jun; Wang, Zhi-Hong
2014-10-01
In order to detect the oil yield of oil shale in situ, based on portable near-infrared spectroscopy, modeling and analysis methods for in-situ detection were studied with 66 rock core samples from well No. 2 of the Fuyu oil shale base in Jilin. With the portable spectrometer developed, spectra in 3 data formats (reflectance, absorbance, and K-M function) were acquired. Modeling and analysis experiments were performed with 4 different modeling-data optimization methods: principal component analysis-Mahalanobis distance (PCA-MD) for eliminating abnormal samples, uninformative variable elimination (UVE) for wavelength selection, and their combinations PCA-MD + UVE and UVE + PCA-MD; with 2 modeling methods: partial least squares (PLS) and back-propagation artificial neural network (BPANN); and with the same data pre-processing, in order to determine the optimum analysis model and method. The results show that the data format, the modeling-data optimization method, and the modeling method all affect the analysis precision of the model. Whether or not an optimization method is used, reflectance or K-M function is the proper spectrum format of the modeling database for both modeling methods. Using the two modeling methods and the four data optimization methods, the model precisions for the same modeling database differ. For the PLS modeling method, the PCA-MD and UVE + PCA-MD optimization methods can improve the modeling precision of a database using the K-M function spectrum format. For the BPANN modeling method, the UVE, UVE + PCA-MD, and PCA-MD + UVE optimization methods can improve the modeling precision of a database using any of the 3 spectrum formats. Except when using reflectance spectra with the PCA-MD optimization method, modeling precision by the BPANN method is better than that by the PLS method. Modeling with reflectance spectra, the UVE optimization method, and the BPANN modeling method gives the highest analysis precision: the correlation coefficient (Rp) is 0.92 and the standard error of prediction (SEP) is 0.69%.
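The reported figures of merit, the correlation coefficient Rp and the standard error of prediction SEP, can be computed from predicted versus measured oil yields roughly as follows (a common bias-corrected definition of SEP is assumed; conventions vary between authors):

```python
import math

def rp_and_sep(measured, predicted):
    """Correlation coefficient (Rp) and bias-corrected standard error of
    prediction (SEP) over a set of validation samples."""
    n = len(measured)
    bias = sum(p - m for m, p in zip(measured, predicted)) / n
    sep = math.sqrt(sum((p - m - bias) ** 2
                        for m, p in zip(measured, predicted)) / (n - 1))
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    rp = cov / math.sqrt(var_m * var_p)
    return rp, sep
```

A model whose predictions track the measured yields closely yields Rp near 1 and a small SEP, as with the best reflectance/UVE/BPANN model above.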
Paulekuhn, G Steffen; Dressman, Jennifer B; Saal, Christoph
2007-12-27
The Orange Book database published by the U.S. Food and Drug Administration (FDA) was analyzed for the frequency of occurrence of different counterions used in the formation of pharmaceutical salts. The data obtained from the present analysis of the Orange Book are compared with reviews of the Cambridge Structural Database (CSD) and of Martindale: The Extra Pharmacopoeia. As well as showing overall distributions of counterion usage, results are broken down into 5-year increments to identify trends in counterion selection. Chloride continues to be the most frequently utilized anionic counterion for the formation of salts as active pharmaceutical ingredients (APIs), while sodium is most widely utilized for the formation of salts starting from acidic molecules. A strong trend toward a wider variety of counterions over the past decade is observed. This trend can be explained by a stronger need to improve the physicochemical properties of research and development compounds.
Aerodynamic Database Development for Mars Smart Lander Vehicle Configurations
NASA Technical Reports Server (NTRS)
Bobskill, Glenn J.; Parikh, Paresh C.; Prabhu, Ramadas K.; Tyler, Erik D.
2002-01-01
An aerodynamic database has been generated for the Mars Smart Lander Shelf-All configuration using computational fluid dynamics (CFD) simulations. Three different CFD codes were used: USM3D and FELISA, which are based on unstructured grid technology, and LAURA, an established and validated structured-grid CFD code. As part of this database development, the results for the Mars continuum were validated with experimental data and comparisons were made where applicable. The validation of USM3D and LAURA against the Unitary experimental data, the use of intermediate LAURA check analyses, and the validation of FELISA against the Mach 6 CF(sub 4) experimental data provided higher confidence in the ability of CFD to provide aerodynamic data for determining the static trim characteristics for longitudinal stability. Analyses of the noncontinuum regime showed the existence of multiple trim angles of attack, which can be stable or unstable trim points. This information is needed to design the guidance controller throughout the trajectory.
Zhang, Jiaqi; Zhang, Hongyue; Kan, Laidi; Zhang, Chi; Wang, Pu
2016-09-01
[Purpose] To review and assess the effectiveness of whole body vibration therapy on the physical function of patients with type 2 diabetes mellitus. [Subjects and Methods] A computerized database search was performed through PubMed, Medline, EMBASE, the Cochrane Central Register of Controlled Trials, the Physiotherapy Evidence Database, and the reference lists of all relevant articles. Methodological quality was evaluated using the Physiotherapy Evidence Database scale. [Results] Five articles (four studies) with a combined study population of 154 patients with type 2 diabetes met the inclusion criteria. Our review shows that whole body vibration therapy may have a positive impact on the muscle strength and balance of people with type 2 diabetes mellitus, whereas its effect on mobility is still under discussion. [Conclusion] There was insufficient evidence to support the premise that whole body vibration therapy is beneficial for the physical function of people with type 2 diabetes. Larger and higher-quality trials are needed.
[Establishment of Oncomelania hupensis snail database based on smartphone and Google Earth].
Wang, Wei-chun; Zhan, Ti; Zhu, Ying-fu
2015-02-01
To establish an Oncomelania hupensis snail database based on a smartphone and Google Earth, the HEAD GPS software was first loaded on the smartphone, and the GPS data of the snails were collected with it. The original data were exported to a computer in KML/KMZ format and then converted into Excel format with appropriate software. Finally, the laboratory results were entered and the digital snail database was established. The data were converted back into KML and displayed visually in Google Earth. The snail data of a 5 hm2 beach along the Yangtze River were collected and the distribution of the snails was obtained in Google Earth. The database of the snails was built, and a query function was implemented for the numbers of total snails, living snails, and schistosome-infected snails in each survey frame. Digital management of snail data is realized by using a smartphone and Google Earth.
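The KML-to-spreadsheet conversion step described above can be sketched with the Python standard library alone (the field names and the example placemark are illustrative, not from the paper):

```python
import xml.etree.ElementTree as ET

KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

def kml_to_rows(kml_text):
    """Extract placemark names and longitude/latitude pairs from a KML
    export (e.g. from a GPS phone app) into rows ready for a spreadsheet."""
    root = ET.fromstring(kml_text)
    rows = []
    for pm in root.iter("{http://www.opengis.net/kml/2.2}Placemark"):
        name = pm.findtext("kml:name", default="", namespaces=KML_NS)
        coords = pm.findtext(".//kml:coordinates", default="",
                             namespaces=KML_NS).strip()
        lon, lat = coords.split(",")[:2]  # KML order is lon,lat[,alt]
        rows.append({"frame": name, "lon": float(lon), "lat": float(lat)})
    return rows
```

The resulting rows can be written out with the `csv` module and re-imported, or converted back to KML for visual display in Google Earth.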
[Developmental status and prospect of musical electroacupuncture].
Wang, Fan; Xu, Chun-Lan; Dong, Gui-Rong; Dong, Hong-Sheng
2014-12-01
Through searches of domestic and foreign medical journals in the CNKI, Wanfang, VIP, and PubMed databases from January 2003 to November 2013, 39 articles regarding musical electroacupuncture (MEA) were analyzed. The results showed that MEA is used clinically to treat neurological and psychiatric disorders; because it is combined with music therapy and overcomes acupuncture tolerance, MEA was superior to traditional electroacupuncture. However, problems remain, such as low research efficiency and study designs that do not reveal the mechanism of MEA's superiority or the specificity of the music. In the future, large-sample multi-center RCTs should be performed to clarify the clinical efficacy of MEA. With modern science and technology, optimized study designs, and guidance from the five-element theory of TCM, research on different musical elements and the characteristics of the musical pulse current, as well as on MEA's correlation with meridians and organs, should be conducted, so as to further explore MEA's mechanisms and broaden the range of its clinical application.
Haytowitz, David B; Pehrsson, Pamela R
2018-01-01
For nearly 20 years, the National Food and Nutrient Analysis Program (NFNAP) has expanded and improved the quantity and quality of data in the US Department of Agriculture's (USDA) food composition databases (FCDB) through the collection and analysis of nationally representative food samples. NFNAP employs statistically valid sampling plans, the Key Foods approach to identify and prioritize foods and nutrients, comprehensive quality control protocols, and analytical oversight to generate new and updated analytical data for food components. NFNAP has allowed the Nutrient Data Laboratory to keep up with the dynamic US food supply and emerging scientific research. Recently generated results for nationally representative food samples show marked changes from previous database values for selected nutrients. Monitoring changes in the composition of foods is critical to keeping FCDB up to date, so that they remain a vital tool for assessing the nutrient intake of national populations and for providing dietary advice. Published by Elsevier Ltd.
NASA Astrophysics Data System (ADS)
WANG, Qingrong; ZHU, Changfeng
2017-06-01
Integration of distributed heterogeneous data sources is a key issue in big data applications. In this paper, the strategy of variable precision is introduced into the concept lattice, and a one-to-one mapping between the variable precision concept lattice and the ontology concept lattice is constructed, producing a local ontology by building a variable precision concept lattice for each subsystem. A distributed generation algorithm for variable precision concept lattices over an ontology-based heterogeneous database is then proposed, drawing on the close relationship between concept lattices and ontology construction. Finally, taking the main concept lattice generated from the existing heterogeneous database as the standard, a case study was carried out to test the feasibility and validity of this algorithm, and the differences between the main concept lattice and the standard concept lattice were compared. The analysis results show that the algorithm can automatically construct a distributed concept lattice from heterogeneous data sources.
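For readers unfamiliar with concept lattices, the classical (non-variable-precision) construction enumerates formal concepts, pairs (extent, intent) closed under the two derivation operators, from a binary object-attribute table. A brute-force sketch for a small context follows; the variable precision variant would relax the `all(...)` conditions to a fractional threshold.

```python
from itertools import combinations

def formal_concepts(objects, attributes, incidence):
    """Enumerate the formal concepts (extent, intent) of a binary context;
    these are the nodes of its concept lattice. Brute force, so suitable
    only for small contexts."""
    def intent(objs):
        return frozenset(a for a in attributes
                         if all((o, a) in incidence for o in objs))
    def extent(attrs):
        return frozenset(o for o in objects
                         if all((o, a) in incidence for a in attrs))
    concepts = set()
    for r in range(len(objects) + 1):
        for objs in combinations(objects, r):
            i = intent(frozenset(objs))  # close each object subset
            concepts.add((extent(i), i))
    return concepts
```

Ordering concepts by inclusion of extents yields the lattice structure that the paper's distributed algorithm builds per subsystem before merging.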
Virtanen, Mikko J; Sane, Jussi; Mustonen, Pekka; Kaila, Minna; Helve, Otto
2017-01-01
Background People using the Internet to find information on health issues, such as specific diseases, usually start their search from a general search engine, for example, Google. Internet searches such as these may yield results and data of questionable quality and reliability. Health Library is a free-of-charge medical portal on the Internet providing medical information for the general public. Physician’s Databases, an Internet evidence-based medicine source, provides medical information for health care professionals (HCPs) to support their clinical practice. Both databases are available throughout Finland, but the latter is used only by health professionals and pharmacies. Little is known about how the general public seeks medical information from medical sources on the Internet, how this behavior differs from HCPs’ queries, and what causes possible differences in behavior. Objective The aim of our study was to evaluate how the general public’s and HCPs’ information-seeking trends from Internet medical databases differ seasonally and temporally. In addition, we aimed to evaluate whether the general public’s information-seeking trends could be utilized for disease surveillance and whether media coverage could affect these seeking trends. Methods Lyme disease, serving as a well-defined disease model with distinct seasonal variation, was chosen as a case study. Two Internet medical databases, Health Library and Physician’s Databases, were used. We compared the general public’s article openings on Lyme disease from Health Library to HCPs’ article openings on Lyme disease from Physician’s Databases seasonally across Finland from 2011 to 2015. Additionally, media publications related to Lyme disease were searched from the largest and most popular media websites in Finland. Results Both databases, Health Library and Physician’s Databases, show visually similar patterns in temporal variations of article openings on Lyme disease in Finland from 2011 to 2015. 
However, Health Library openings show not only an increasing trend over time but also greater fluctuations, especially during peak opening seasons. Outside these seasons, publications in the media coincide with Health Library article openings only occasionally. Conclusions Lyme disease–related information-seeking behaviors of the general public and HCPs on Internet medical portals share similar temporal variations, consistent with the trend seen in epidemiological data. Therefore, the general public's article openings could be used as a supplementary source of information for disease surveillance. The fluctuations in article openings were stronger among the general public, suggesting that factors such as media coverage affect the information-seeking behavior of the public more than that of professionals. However, media coverage may also influence HCPs. Not every publication was associated with an increase in openings, but the higher the media coverage, the greater the general public's access to Health Library. PMID:29109071
Use of a German longitudinal prescription database (LRx) in pharmacoepidemiology.
Richter, Hartmut; Dombrowski, Silvia; Hamer, Hajo; Hadji, Peyman; Kostev, Karel
2015-01-01
Large epidemiological databases are often used to examine matters pertaining to drug utilization, health services, and drug safety. The major strength of such databases is their large sample sizes, which allow precise estimates to be made. The IMS® LRx database has in recent years been used as a data source for epidemiological research. The aim of this paper is to review a number of recent studies published with the aid of this database and to compare them with the results of similar studies using independent data published in the literature. Although somewhat limited to studies for which comparative independent results were available, the review covers a wide range of possible uses of the LRx database in a variety of therapeutic fields: prevalence/incidence rate determination (diabetes, epilepsy), persistence analyses (diabetes, osteoporosis), use of comedication (diabetes), drug utilization (G-CSF market), and treatment costs (diabetes, G-CSF market). In general, the results of the LRx studies were clearly in line with previously published reports. In some cases, noticeable discrepancies between the LRx results and the literature data were found (e.g. prevalence in epilepsy, persistence in osteoporosis); these are discussed and possible reasons presented. Overall, it was concluded that the IMS® LRx database is a suitable data source for pharmacoepidemiological studies.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Van Duren, Jeroen K; Koch, Carl; Luo, Alan
The primary limitation of today’s lightweight structural alloys is that specific yield strengths (SYS) higher than 200 MPa x cc/g (a typical value for titanium alloys) are extremely difficult to achieve. This holds true especially at a cost lower than $5/kg (a typical value for magnesium alloys). Recently, high-entropy alloys (HEA) have shown promising SYS, yet the large composition space of HEA makes screening compositions complex and time-consuming. Over the course of this 2-year project we started from 150 billion compositions and reduced the number of potential low-density (<5 g/cc), low-cost (<$5/kg) high-entropy alloy (LDHEA) candidates that are single-phase, disordered, solid-solution (SPSS) to a few thousand compositions. This was accomplished by means of machine learning to guide design for SPSS LDHEA, based on a combination of recursive partitioning, an extensive experimental HEA database compiled from 24 literature sources, and 91 calculated parameters serving as phenomenological selection rules. Machine learning shows an accuracy of 82% in identifying which compositions of a separate, smaller, experimental HEA database are SPSS HEA. Calculation of Phase Diagrams (CALPHAD) shows an accuracy of 71-77% for the alloys supported by the CALPHAD database, where 30% of the compiled HEA database is not supported by CALPHAD. In addition to machine learning and CALPHAD, a third tool was developed to aid design of SPSS LDHEA: phase diagrams were calculated by constructing the Gibbs free energy convex hull based on easily accessible enthalpy and entropy terms. Surprisingly, accuracy was 78%. Pursuing these LDHEA candidates by high-throughput experimental methods resulted in SPSS LDHEA composed of transition metals (e.g. Cr, Mn, Fe, Ni, Cu) alloyed with Al, yet the high concentration of Al necessary to bring the mass density below 5.0 g/cc makes these materials hard and brittle, body-centered-cubic (BCC) alloys.
A related, yet multi-phase BCC alloy based on Al-Cr-Fe-Ni shows compressive strain >10% and a specific compressive yield strength of 229 MPa x cc/g, yet does not show ductility in tensile tests due to cleavage. When Cr in Al-Cr-Fe-based 4- and 5-element LDHEA is replaced with Mn, hardness drops 2x. Combined with compression test results, including those on the ternaries Al-Cr-Fe and Al-Mn-Fe, this suggests that Al-Mn-Fe-based LDHEA are still worth pursuing. These initial results represent only one compressive stress-strain curve per composition, without any property optimization; as such, reproducibility needs to be followed by optimization to show their full potential. When Li, Mg, and Zn are included, a single-phase Li-Mg-Al-Ti-Zn LDHEA has been found with a specific ultimate compressive strength of 289 MPa x cc/g. Al-Ti-Mn-Zn showed a specific ultimate compressive strength of 73 MPa x cc/g. These initial results after hot isostatic pressing (HIP) of the ball-milled powders represent the lower end of what is possible, since no secondary processing (e.g. extrusion) has been performed to optimize strength and ductility. Compositions for multi-phase (e.g. dual-phase) LDHEA were identified largely by automated searches through CALPHAD databases, screening for large face-centered-cubic (FCC) volume fractions, followed by experimental verification. This resulted in several new alloys. Li-Mg-Al-Mn-Fe and Mg-Mn-Fe-Co ball-milled powders upon HIP show specific ultimate compressive strengths of 198 MPa x cc/g and 45 MPa x cc/g, respectively. Several malleable quaternary Al-Zn-based alloys have been found upon arc/induction melting, yet with limited specific compressive yield strength (<75 MPa x cc/g). These initial results are all without any optimization for strength and/or ductility.
High-throughput experimentation allowed us to triple the existing experimental HEA database, as published over the past 10 years, in less than 2 years, at a rate 10x higher than previous methods. Furthermore, we showed that high-throughput thin-film combinatorial methods can be used to gain insight into isothermal phase diagram slices. Although it is straightforward to map hardness as a function of composition for sputtered thin-film compositional gradients by nano-indentation and compare the results to micro-indentation on bulk samples, the simultaneous impact of composition, roughness, film density, and microstructure on hardness requires monitoring all of these properties as a function of location on the compositional gradient, including dissecting the impact of these 4 factors on the hardness map. These additional efforts impact throughput significantly. This work shows that much progress has been made over the years in predicting phase formation to aid the discovery of new alloys, yet much work remains to predict phases more accurately for LDHEA, whether by CALPHAD or by other means. More importantly, more work needs to be done to predict the mechanical properties of novel alloys, such as yield strength and ductility. Furthermore, this work shows the need for an empirical alloy database covering strategic points in a multi-dimensional composition space to allow faster and more accurate predictive interpolations, identifying the oasis in the desert more quickly. Finally, this work suggests that it is worth pursuing a ductile alloy with a SYS > 300 MPa x cc/g in a mass density range of 6-7 g/cc, since the chances for a single-phase or majority-phase FCC increase significantly. Today’s lightweight steels are in this density range.
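One of the simplest phenomenological parameters used in HEA screens of this kind is the ideal configurational entropy of mixing, ΔS_mix = -R Σ c_i ln c_i, with "high entropy" conventionally taken as ΔS_mix ≥ 1.5R. The sketch below shows that common threshold convention only, not this project's full 91-parameter screen:

```python
import math

R = 8.314  # gas constant, J/(mol K)

def mixing_entropy(mole_fractions):
    """Ideal configurational entropy of mixing of an alloy,
    Delta_S = -R * sum(c_i * ln c_i)."""
    assert abs(sum(mole_fractions) - 1.0) < 1e-9
    return -R * sum(c * math.log(c) for c in mole_fractions if c > 0)

def is_high_entropy(mole_fractions, threshold=1.5 * R):
    """Common screening convention: Delta_S_mix >= 1.5 R."""
    return mixing_entropy(mole_fractions) >= threshold

# An equiatomic 5-element alloy has Delta_S = R * ln 5, about 13.4 J/(mol K).
```

Entropy alone is far from sufficient (hence the 91 parameters and the machine-learned partitioning), but it illustrates why equiatomic multi-element compositions dominate the candidate space.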
Content based information retrieval in forensic image databases.
Geradts, Zeno; Bijhold, Jurrien
2002-03-01
This paper gives an overview of the various available image databases and of ways to search these databases by image content. Developments in research groups working on image database search are evaluated and compared with existing forensic databases. Forensic image databases of fingerprints, faces, shoeprints, handwriting, cartridge cases, drug tablets, and tool marks are described. Developments in these fields appear to be valuable for forensic databases, especially the MPEG-7 framework, in which searching in image databases is standardized. In the future, combining these databases (including DNA databases) can result in stronger forensic evidence.
Ishihara, Masaru; Onoguchi, Masahisa; Taniguchi, Yasuyo; Shibutani, Takayuki
2017-12-01
The aim of this study was to clarify the differences in thallium-201-chloride (thallium-201) myocardial perfusion imaging (MPI) scans evaluated by conventional Anger-type single-photon emission computed tomography (conventional SPECT) versus cadmium-zinc-telluride SPECT (CZT SPECT) imaging, using normal databases for different ethnic groups. MPI scans from 81 consecutive Japanese patients were examined using conventional SPECT and CZT SPECT and analyzed with the pre-installed quantitative perfusion SPECT (QPS) software. We compared the summed stress score (SSS), summed rest score (SRS), and summed difference score (SDS) for the two SPECT devices. As a normal MPI reference, we usually use the Japanese databases for MPI created by the Japanese Society of Nuclear Medicine, which can be used with conventional SPECT but not with CZT SPECT. In this study, we used new Japanese normal databases constructed in our institution to compare conventional and CZT SPECT. Compared with conventional SPECT, CZT SPECT showed lower SSS (p < 0.001), SRS (p = 0.001), and SDS (p = 0.189) using the pre-installed SPECT database. In contrast, CZT SPECT showed no significant difference from conventional SPECT in QPS analysis using the normal databases from our institution. Myocardial perfusion analyses by CZT SPECT should be evaluated using normal databases based on the ethnic group being evaluated.
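The three summary scores compared in this study follow a fixed relationship: each myocardial segment is scored 0 (normal uptake) to 4 (absent uptake) at stress and at rest, SSS and SRS are the per-segment totals, and SDS is their difference. A sketch using the conventional 17-segment model (the segment values in the test are illustrative):

```python
def summed_scores(stress, rest):
    """Summed stress (SSS), rest (SRS), and difference (SDS) scores from
    per-segment perfusion scores (0 = normal .. 4 = absent uptake)."""
    assert len(stress) == len(rest)
    sss = sum(stress)
    srs = sum(rest)
    return sss, srs, sss - srs
```

A positive SDS indicates stress-induced (reversible) perfusion defects, which is why the score is compared across devices and normal databases.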
NASA Astrophysics Data System (ADS)
Armante, Raymond; Scott, Noelle; Crevoisier, Cyril; Capelle, Virginie; Crepeau, Laurent; Jacquinet, Nicole; Chédin, Alain
2016-09-01
The quality of the spectroscopic parameters that serve as input to forward radiative transfer models is essential to fully exploit remote sensing of the Earth's atmosphere. However, updating spectroscopic databases so that users are provided with a database that ensures an optimal characterization of the spectral properties of molecular absorption for radiative transfer modeling is challenging. Evaluating the database content and the underlying choices made by the managing team is thus a crucial step. Here, we introduce an original and powerful approach for evaluating spectroscopic parameters: the Spectroscopic Parameters And Radiative Transfer Evaluation (SPARTE) chain. The SPARTE chain relies on the comparison between forward radiative transfer simulations made with the 4A radiative transfer model and observed spectra collocated over several thousand well-characterized atmospheric situations. Averaging the resulting 'calculated minus observed' spectral residuals minimizes the random errors coming from both the radiometric noise of the instruments and the imperfect description of the atmospheric state. The SPARTE chain can be used to evaluate any spectroscopic database, from the visible to the microwave, using any type of remote sensing observation (ground-based, airborne or space-borne). We show that comparing the shapes of the residuals enables: (i) identifying incorrect line parameters (line position, intensity, width, pressure shift, etc.), even for molecules for which interferences between lines have to be taken into account; (ii) proposing revised values, in cooperation with contributing teams; and (iii) validating the final updated parameters. In particular, we show that the simultaneous availability of two databases such as GEISA and HITRAN helps identify remaining issues in each database.
The SPARTE chain is applied here to the validation of the GEISA-2015 update in two spectral regions of particular interest for several currently exploited or planned Earth space missions: the thermal infrared domain and the short-wave infrared domain, for which observations from the space-borne IASI instrument and from the ground-based FTS instrument at the Park Falls TCCON site are used, respectively. Main results include: (i) the validation of the positions and intensities of line parameters, with overall significantly lower residuals for GEISA-2015 than for GEISA-2011; and (ii) the validation of choices made for parameters (such as pressure shift and air-broadened width) that were not supplied by the provider and were completed by ourselves. For example, comparisons between residuals obtained with GEISA-2015 and HITRAN-2012 highlighted a specific issue with some HWHM values in the latter that can be clearly identified in the 'calculated minus observed' residuals.
Comparison of the NCI open database with seven large chemical structural databases.
Voigt, J H; Bienfait, B; Wang, S; Nicklaus, M C
2001-01-01
Eight large chemical databases have been analyzed and compared to each other. Central to this comparison is the open National Cancer Institute (NCI) database, consisting of approximately 250 000 structures. The other databases analyzed are the Available Chemicals Directory ("ACD," from MDL, release 1.99, 3D version); the ChemACX ("ACX," from CamSoft, Version 4.5); the Maybridge Catalog and the Asinex database (both as distributed by CamSoft as part of ChemInfo 4.5); the Sigma-Aldrich Catalog (CD-ROM, 1999 Version); the World Drug Index ("WDI," Derwent, version 1999.03); and the organic part of the Cambridge Crystallographic Database ("CSD," from Cambridge Crystallographic Data Center, 1999 Version 5.18). The database properties analyzed are internal duplication rates; compounds unique to each database; cumulative occurrence of compounds in an increasing number of databases; overlap of identical compounds between two databases; similarity overlap; diversity; and others. The crystallographic database CSD and the WDI show somewhat less overlap with the other databases than the others do with each other. In particular, the collections of commercial compounds and compilations of vendor catalogs have a substantial degree of overlap with each other. Still, no database is completely a subset of any other, and each appears to have its own niche and thus its "raison d'être". The NCI database has by far the highest number of compounds that are unique to it: approximately 200 000 of the NCI structures were not found in any of the other analyzed databases.
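The duplication and overlap measures used in this comparison reduce to simple set operations once each compound is represented by a canonical structure key. A minimal sketch in Python, using invented keys for illustration (the real analysis worked on the databases' own structure representations):

```python
# Toy canonical structure keys (something like InChIKeys); the data below
# is invented for illustration, not actual database contents.
def duplication_rate(records):
    """Fraction of records that duplicate an earlier record."""
    return (len(records) - len(set(records))) / len(records)

def overlap(db_a, db_b):
    """Number of distinct structures shared by two databases."""
    return len(set(db_a) & set(db_b))

nci = ["AAA", "BBB", "CCC", "CCC", "DDD"]  # one internal duplicate
acd = ["BBB", "CCC", "EEE"]

print(duplication_rate(nci))        # 0.2
print(overlap(nci, acd))            # 2
print(sorted(set(nci) - set(acd)))  # structures unique to the first database
```

Compounds "unique to a database" are then just the set difference against the union of all the others.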
Petherick, Emily S; Pickett, Kate E; Cullum, Nicky A
2015-08-01
Primary care databases from the UK have been widely used to produce evidence on the epidemiology and health service usage of a wide range of conditions. To date there have been few evaluations of the comparability of estimates between different sources of these data. To estimate the comparability of two widely used primary care databases, the Health Improvement Network (THIN) database and the General Practice Research Database (GPRD), using venous leg ulceration as an exemplar condition. Cross prospective cohort comparison. The GPRD and THIN databases, using data from 1998 to 2006. A data set was extracted from both databases containing all persons aged 20 years or older with a database diagnosis of venous leg ulceration recorded for the period 1998-2006. Annual rates of incidence and prevalence of venous leg ulceration were calculated within each database, standardized to the European standard population, and compared using standardized rate ratios. Comparable estimates of venous leg ulcer incidence from the GPRD and THIN databases could be obtained using data from 2000 to 2006, and of prevalence using data from 2001 to 2006. Recent data collected by these two databases are more likely to produce comparable results on the burden of venous leg ulceration. These results require confirmation in other disease areas to give researchers confidence in the comparability of findings from these two widely used primary care research resources. © The Author 2015. Published by Oxford University Press. All rights reserved.
Efficient frequent pattern mining algorithm based on node sets in cloud computing environment
NASA Astrophysics Data System (ADS)
Billa, V. N. Vinay Kumar; Lakshmanna, K.; Rajesh, K.; Reddy, M. Praveen Kumar; Nagaraja, G.; Sudheer, K.
2017-11-01
The ultimate goal of data mining is to discover hidden information useful for decision making in the large databases collected by an organization. Data mining involves many tasks; mining frequent itemsets is one of the most important in the case of transactional databases. These databases hold data at very large scale, and mining them consumes physical memory and time in proportion to the size of the database. A frequent pattern mining algorithm is efficient only if it consumes little memory and time to mine the frequent itemsets from a given large database. With these points in mind, we propose a system that mines frequent itemsets in a way optimized for memory and time, using cloud computing to parallelize the process and provide the application as a service. The framework uses FIN, a proven efficient algorithm that operates on Nodesets and a pre-order coding (POC) tree. To evaluate the performance of the system, we conducted experiments comparing the efficiency of the same algorithm applied in a standalone manner and in a cloud computing environment, on a real data set of traffic accidents. The results show that the memory consumption and execution time of the proposed system are much lower than those of the standalone system.
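As a point of reference for the task being parallelized, the definition of itemset support can be sketched with a brute-force counter in Python. This is not the FIN/Nodeset algorithm from the paper, only the baseline computation whose cost FIN reduces:

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force support counting: enumerate candidate itemsets of each
    transaction up to max_size and keep those meeting min_support.
    FIN computes the same answer far more efficiently via Nodesets."""
    counts = {}
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[combo] = counts.get(combo, 0) + 1
    return {s: c for s, c in counts.items() if c >= min_support}

txns = [["a", "b", "c"], ["a", "c"], ["a", "d"], ["b", "c"]]
print(frequent_itemsets(txns, min_support=2))
# {('a',): 3, ('b',): 2, ('c',): 3, ('a', 'c'): 2, ('b', 'c'): 2}
```

On a large transactional database this enumeration is exactly the memory- and time-proportional cost the paper distributes across cloud workers.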
Rosato, Stefano; D'Errigo, Paola; Badoni, Gabriella; Fusco, Danilo; Perucci, Carlo A; Seccareccia, Fulvia
2008-08-01
The availability of two contemporary sources of information about coronary artery bypass graft (CABG) interventions made it possible 1) to verify the feasibility of performing outcome evaluation studies using administrative data sources, and 2) to compare hospital performance obtained using the CABG Project clinical database with hospital performance derived from current administrative data. Interventions recorded in the CABG Project were linked to the hospital discharge record (HDR) administrative database. Only the linked records were considered for subsequent analyses (46% of the total CABG Project). A new selected population, "clinical card-HDR", was then defined. Two independent risk-adjustment models were applied, each using information derived from one of the two sources. HDR information was then supplemented with some patient preoperative conditions from the CABG clinical database. The two models were compared in terms of their fit to the data, and the hospital performances that each model identified as significantly different from the mean were compared. In only 4 of the 13 hospitals considered for analysis did the results obtained using the HDR model not completely overlap with those obtained with the CABG model. When comparing the statistical parameters of the HDR model and the HDR model plus patient preoperative conditions, the latter showed the better fit to the data. In this "clinical card-HDR" population, hospital performance assessment based on the clinical database is similar to that derived from current administrative data. However, when risk-adjustment models built on administrative databases are supplemented with a few clinical variables, their statistical parameters improve and hospital performance assessment becomes more accurate.
Zhao, Lei; Guo, Yi; Wang, Wei; Yan, Li-juan
2011-08-01
To evaluate the effectiveness of acupuncture as a treatment for neurovascular headache and to analyze the current state of research on acupuncture treatment. The PubMed database (1966-2010), EMBASE database (1986-2010), Cochrane Library (Issue 1, 2010), Chinese Biomedical Literature Database (1979-2010), China National Knowledge Infrastructure (CNKI) database (1979-2010), VIP Journals Database (1989-2010), and Wanfang database (1998-2010) were searched. Randomized or quasi-randomized controlled studies were included, with priority given to high-quality randomized controlled trials. Statistical outcome indicators were analyzed using RevMan 5.0.20 software. A total of 16 articles and 1535 cases were included. Meta-analysis showed a significant difference between acupuncture therapy and Western medicine therapy [combined RR (random-effects model) = 1.46, 95% CI (1.21, 1.75), Z = 3.96, P < 0.0001], indicating a clearly superior effect of acupuncture; a significant difference also existed between comprehensive acupuncture therapy and acupuncture therapy alone [combined RR (fixed-effects model) = 3.35, 95% CI (1.92, 5.82), Z = 4.28, P < 0.0001], indicating that acupuncture combined with other therapies, such as point injection, scalp acupuncture, auricular acupuncture, etc., was superior to conventional body acupuncture alone. The included clinical studies, though limited, support the efficacy of acupuncture in the treatment of neurovascular headache. Although acupuncture and its combined therapies show certain advantages, most clinical studies have small sample sizes. Large, randomized, controlled trials are needed for more definitive results.
Development of a 2001 National Land Cover Database for the United States
Homer, Collin G.; Huang, Chengquan; Yang, Limin; Wylie, Bruce K.; Coan, Michael
2004-01-01
Multi-Resolution Land Characterization 2001 (MRLC 2001) is a second-generation Federal consortium designed to create an updated pool of nation-wide Landsat 5 and 7 imagery and derive a second-generation National Land Cover Database (NLCD 2001). The objectives of this multi-layer, multi-source database are twofold: first, to provide consistent land cover for all 50 States, and second, to provide a data framework that allows flexibility in developing and applying each independent data component to a wide variety of other applications. Components in the database include: (1) normalized imagery for three time periods per path/row; (2) ancillary data, including a 30 m Digital Elevation Model (DEM) from which slope, aspect and slope position are derived; (3) per-pixel estimates of percent imperviousness and percent tree canopy; (4) 29 classes of land cover data derived from the imagery, ancillary data, and derivatives; and (5) classification rules, confidence estimates, and metadata from the land cover classification. The database is being developed using a mapping-zone approach, with 66 zones in the continental United States and 23 zones in Alaska. Results from three initial mapping zones show single-pixel land cover accuracies ranging from 73 to 77 percent, imperviousness accuracies ranging from 83 to 91 percent, tree canopy accuracies ranging from 78 to 93 percent, and an estimated 50 percent increase in mapping efficiency over previous methods. The database has now entered the production phase and is being created with extensive partnering across the Federal government, with planned completion by 2006.
Oliveira, S R M; Almeida, G V; Souza, K R R; Rodrigues, D N; Kuser-Falcão, P R; Yamagishi, M E B; Santos, E H; Vieira, F D; Jardine, J G; Neshich, G
2007-10-05
An effective strategy for managing protein databases is to provide mechanisms that transform raw data into consistent, accurate and reliable information. Such mechanisms greatly reduce operational inefficiencies and improve one's ability to pursue scientific objectives and interpret research results. To achieve this challenging goal for the STING project, we introduce Sting_RDB, a relational database of structural parameters for protein analysis with support for data warehousing and data mining. In this article, we highlight the main features of Sting_RDB and show how a user can explore it with efficient and biologically relevant queries. Considering its importance for molecular biologists, effort has been made to advance Sting_RDB toward data quality assessment. To the best of our knowledge, Sting_RDB is one of the most comprehensive data repositories for protein analysis, now also capable of providing its users with a data quality indicator. This paper differs from our previous study in several aspects. First, we introduce Sting_RDB, a relational database with mechanisms for efficient and relevant queries using SQL; Sting_RDB evolved from the earlier, text (flat-file)-based database, in which data consistency and integrity were not guaranteed. Second, we provide support for data warehousing and mining. Third, the data quality indicator was introduced. Finally, and probably most importantly, complex queries that could not be posed on a text-based database are now easily implemented. Further details are accessible at the Sting_RDB demo web page: http://www.cbi.cnptia.embrapa.br/StingRDB.
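The kind of relational query such a database enables can be illustrated with an in-memory SQLite sketch. The table and column names below are invented for illustration; they are not the actual Sting_RDB schema:

```python
import sqlite3

# Invented table and columns -- an illustration of relational queries over
# per-residue structural parameters, not the actual Sting_RDB schema.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE residue_params (
                   pdb_id TEXT, chain TEXT, resnum INTEGER,
                   accessibility REAL, conservation REAL)""")
con.executemany("INSERT INTO residue_params VALUES (?,?,?,?,?)",
                [("1ABC", "A", 10,  5.2, 0.91),
                 ("1ABC", "A", 11, 80.0, 0.20),
                 ("2XYZ", "B",  7,  3.1, 0.88)])

# Buried, highly conserved residues -- a typical functional-site filter
# that a flat-file database could only answer with custom parsing code.
hits = con.execute("""SELECT pdb_id, chain, resnum
                      FROM residue_params
                      WHERE accessibility < 10 AND conservation > 0.8
                      ORDER BY pdb_id""").fetchall()
print(hits)  # [('1ABC', 'A', 10), ('2XYZ', 'B', 7)]
```

This is the sense in which "complex queries that could not be posed on a text-based database" become one SQL statement in a relational one.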
Attenuation relation for strong motion in Eastern Java based on appropriate database and method
NASA Astrophysics Data System (ADS)
Mahendra, Rian; Rohadi, Supriyanto; Rudyanto, Ariska
2017-07-01
The selection and determination of attenuation relations has become important for seismic hazard assessment in active seismic regions. This research first constructs an appropriate strong motion database, including site condition and earthquake type. The data set consists of a large number of earthquakes with 5 ≤ Mw ≤ 9 and distances less than 500 km that occurred around Java from 2009 to 2016. Earthquake locations and depths were relocated using the double-difference method to improve the quality of the database. Strong motion data from twelve BMKG accelerographs located in East Java were used. Site conditions were characterized using the dominant period and Vs30. Earthquake types were classified into crustal, interface, and intraslab events based on slab geometry analysis. A total of 10 ground motion prediction equations (GMPEs) were tested against the database using the likelihood method (Scherbaum et al., 2004) and the Euclidean distance ranking method (Kale and Akkar, 2012). The evaluation led to a set of GMPEs applicable to seismic hazard in East Java, where the strong motion data were collected. Because high deviations remained for some GMPEs, these were modified using an inversion method. Validation was performed by analyzing the attenuation curves of the selected GMPEs against observation data from 2015 to 2016. The results show that the selected GMPE is suitable for estimating PGA values in East Java.
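The likelihood (LH) evaluation attributed to Scherbaum et al. (2004) scores each normalized residual by the probability that a standard normal variate exceeds it in absolute value; residuals consistent with the GMPE's assumed distribution yield LH values spread uniformly on [0, 1], with a median near 0.5. A minimal sketch in Python, with hypothetical residuals:

```python
import math

def lh_score(z):
    """LH value in the spirit of Scherbaum et al. (2004): the probability
    that a standard normal variate exceeds |z|. For residuals that truly
    follow the GMPE's assumed distribution, LH is uniform on [0, 1]."""
    return 1.0 - math.erf(abs(z) / math.sqrt(2.0))

# Hypothetical normalized (total-sigma) residuals for one candidate GMPE:
residuals = [0.1, -0.4, 1.9, 0.7]
scores = sorted(lh_score(z) for z in residuals)
median_lh = (scores[1] + scores[2]) / 2  # a median near 0.5 indicates a good fit
print(round(median_lh, 2))
```

Ranking candidate GMPEs by how close their median LH sits to 0.5 (together with the residual mean and spread) is the kind of selection step the abstract describes.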
Garrido-Martín, Diego; Pazos, Florencio
2018-02-27
The exponential accumulation of new sequences in public databases is expected to improve the performance of all approaches for predicting protein structural and functional features. Nevertheless, this has never been assessed or quantified for some widely used methodologies, such as those aimed at detecting functional sites and functional subfamilies in protein multiple sequence alignments. Using raw protein sequences as the only input, these approaches can detect fully conserved positions, as well as those with a family-dependent conservation pattern. Both types of residues are routinely used as predictors of functional sites and, consequently, understanding how the sequence content of the databases affects them is relevant and timely. In this work we evaluate how the growth and change over time in the content of sequence databases affect five sequence-based approaches for detecting functional sites and subfamilies. We do this by recreating historical versions of the multiple sequence alignments that would have been obtained in the past based on the database contents at different time points, covering a period of 20 years. Applying the methods to these historical alignments allows quantifying the temporal variation in their performance. Our results show that the number of families to which these methods can be applied sharply increases with time, while their ability to detect potentially functional residues remains almost constant. These results are informative for the methods' developers and end users, and may have implications for the design of new sequencing initiatives.
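The simplest form of the conservation signal these methods exploit can be sketched as a per-column majority fraction over a multiple sequence alignment. Real methods additionally weight sequences and model family-dependent patterns; this toy version only illustrates the idea:

```python
from collections import Counter

def column_conservation(alignment):
    """Fraction of sequences sharing the majority residue at each column
    of a gapless, equal-length multiple sequence alignment. Fully
    conserved columns score 1.0; real methods also weight sequences and
    detect family-dependent (subfamily-specific) patterns."""
    nseq = len(alignment)
    scores = []
    for column in zip(*alignment):
        top_count = Counter(column).most_common(1)[0][1]
        scores.append(top_count / nseq)
    return scores

aln = ["ACDE",
       "ACDQ",
       "ACNE"]  # toy alignment: columns 1-2 conserved, 3-4 variable
print(column_conservation(aln))  # [1.0, 1.0, 0.666..., 0.666...]
```

Rebuilding `aln` from the database contents at each historical time point and re-scoring it is, in miniature, the experiment the paper performs.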
Bailey, Sarah F; Scheible, Melissa K; Williams, Christopher; Silva, Deborah S B S; Hoggan, Marina; Eichman, Christopher; Faith, Seth A
2017-11-01
Next-generation Sequencing (NGS) is a rapidly evolving technology with demonstrated benefits for forensic genetic applications, and the strategies to analyze and manage the massive NGS datasets are currently in development. Here, the computing, data storage, connectivity, and security resources of the Cloud were evaluated as a model for forensic laboratory systems that produce NGS data. A complete front-to-end Cloud system was developed to upload, process, and interpret raw NGS data using a web browser dashboard. The system was extensible, demonstrating analysis capabilities of autosomal and Y-STRs from a variety of NGS instrumentation (Illumina MiniSeq and MiSeq, and Oxford Nanopore MinION). NGS data for STRs were concordant with standard reference materials previously characterized with capillary electrophoresis and Sanger sequencing. The computing power of the Cloud was implemented with on-demand auto-scaling to allow multiple file analysis in tandem. The system was designed to store resulting data in a relational database, amenable to downstream sample interpretations and databasing applications following the most recent guidelines in nomenclature for sequenced alleles. Lastly, a multi-layered Cloud security architecture was tested and showed that industry standards for securing data and computing resources were readily applied to the NGS system without disadvantageous effects for bioinformatic analysis, connectivity or data storage/retrieval. The results of this study demonstrate the feasibility of using Cloud-based systems for secured NGS data analysis, storage, databasing, and multi-user distributed connectivity. Copyright © 2017 Elsevier B.V. All rights reserved.
A Utility Maximizing and Privacy Preserving Approach for Protecting Kinship in Genomic Databases.
Kale, Gulce; Ayday, Erman; Tastan, Oznur
2017-09-12
Rapid and low-cost sequencing of genomes has enabled widespread use of genomic data in research studies and personalized customer applications, where genomic data are shared in public databases. Although the identities of participants are anonymized in these databases, sensitive information about individuals can still be inferred. One such piece of information is kinship. We define two routes through which kinship privacy can leak and propose a technique to protect kinship privacy against these risks while maximizing the utility of shared data. The method involves systematic identification of minimal portions of genomic data to mask as new participants are added to the database. Choosing the proper positions to hide is cast as an optimization problem in which the number of positions to mask is minimized subject to privacy constraints that ensure the familial relationships are not revealed. We evaluate the proposed technique on real genomic data. Results indicate that concurrent sharing of data pertaining to a parent and an offspring carries high risks to kinship privacy, whereas sharing data from more distant relatives together is often safer. We also show that the arrival order of family members has a high impact on the level of privacy risk and on the utility of sharing data. Availability: https://github.com/tastanlab/Kinship-Privacy. Contact: erman@cs.bilkent.edu.tr or oznur.tastan@cs.bilkent.edu.tr. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved.
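The kinship signal the method aims to conceal can be illustrated with a crude identity-by-state similarity over genotype vectors (0/1/2 alternate-allele counts); close relatives score systematically higher than unrelated pairs. This is an illustrative proxy with invented genotypes, not the authors' actual inference or masking procedure:

```python
def ibs_similarity(g1, g2):
    """Mean identity-by-state between two genotype vectors, where each
    entry counts copies of the alternate allele (0, 1 or 2). Relatives
    share more alleles by descent and so score higher. A crude proxy for
    the kinship signal, not the paper's method."""
    shared = sum(2 - abs(a - b) for a, b in zip(g1, g2))
    return shared / (2 * len(g1))

parent   = [0, 1, 2, 1, 0, 2]   # hypothetical genotypes
child    = [1, 1, 2, 0, 0, 1]
stranger = [2, 0, 0, 2, 2, 0]

print(ibs_similarity(parent, child))     # 0.75
print(ibs_similarity(parent, stranger))  # ~0.17
```

Masking a few well-chosen positions pushes the relative-pair score down toward the unrelated baseline, which is the intuition behind the paper's optimization.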
Databases as policy instruments. About extending networks as evidence-based policy.
de Bont, Antoinette; Stoevelaar, Herman; Bal, Roland
2007-12-07
This article seeks to identify the role of databases in health policy. Access to information and communication technologies has changed traditional relationships between the state and professionals, creating new systems of surveillance and control. As a result, databases may have a profound effect on controlling clinical practice. We conducted three case studies to reconstruct the development and use of databases as policy instruments. Each database was intended to control the use of one particular pharmaceutical in the Netherlands (growth hormone, antiretroviral drugs for HIV, and Taxol, respectively). We studied the archives of the Dutch Health Insurance Board, conducted in-depth interviews with key informants and organized two focus groups, all focused on the use of databases both in policy circles and in clinical practice. Our results demonstrate that policy makers hardly used the databases, either for cost control or for quality assurance. Further analysis revealed that these databases facilitated self-regulation and quality assurance by (national) bodies of professionals, resulting in restrictive prescription behavior among physicians. The databases thus fulfill control functions that were formerly located within the policy realm. They facilitate collaboration between policy makers and physicians, since they enable quality assurance by professionals. Delegating regulatory authority downward into a network of physicians who control the use of pharmaceuticals seems to be a good alternative to centralized control on the basis of monitoring data.
NASA Astrophysics Data System (ADS)
Burns, D. T.; Kessler, C.; Büermann, L.; Ketelhut, S.
2018-01-01
A key comparison has been made between the absorbed dose to water standards of the PTB, Germany, and the BIPM in the medium-energy x-ray range. The results show the standards to be in general agreement at the level of the standard uncertainty of the comparison of 9 to 11 parts in 10³. The results are combined with those of a EURAMET comparison and presented in terms of degrees of equivalence for entry in the BIPM key comparison database (kcdb.bipm.org). The final report has been peer-reviewed and approved for publication by the CCRI, according to the provisions of the CIPM Mutual Recognition Arrangement (CIPM MRA).
An Automated Ab Initio Framework for Identifying New Ferroelectrics
NASA Astrophysics Data System (ADS)
Smidt, Tess; Reyes-Lillo, Sebastian E.; Jain, Anubhav; Neaton, Jeffrey B.
Ferroelectric materials have a wide range of technological applications, including non-volatile RAM and optoelectronics. In this work, we present an automated first-principles search for ferroelectrics. We integrate density functional theory, crystal structure databases, symmetry tools, workflow software, and a custom analysis toolkit to build a library of known and proposed ferroelectrics. We screen thousands of candidates using symmetry relations between nonpolar and polar structure pairs. We use two search strategies: 1) polar-nonpolar pairs with the same composition, and 2) polar-nonpolar structure-type pairs. Results are automatically parsed, stored in a database, and made accessible via a web interface showing distortion animations and plots of polarization and total energy as a function of distortion. We benchmark our results against experimental data, present new ferroelectric candidates found through our search, and discuss future work on extending this search methodology to other material classes such as antiferroelectrics and multiferroics.
An ECG storage and retrieval system embedded in client server HIS utilizing object-oriented DB.
Wang, C; Ohe, K; Sakurai, T; Nagase, T; Kaihara, S
1996-02-01
In the University of Tokyo Hospital, the improved client-server HIS has been applied to clinical practice: physicians can directly order prescriptions, laboratory examinations, ECG examinations, radiographic examinations, etc., and read the results of these examinations, except medical signal waves, schemata and images, on UNIX workstations. Recently, we designed and developed an ECG storage and retrieval system embedded in the client-server HIS, utilizing an object-oriented database, as a first step toward handling digitized signal, schema and image data and showing waves, graphics, and images directly to physicians through the client-server HIS. The system was developed based on object-oriented analysis and design, and implemented with an object-oriented database management system (OODBMS) and the C++ programming language. In this paper, we describe the ECG data model, the functions of the storage and retrieval system, the features of the user interface, and the results of its implementation in the HIS.
Comparative homology agreement search: An effective combination of homology-search methods
Alam, Intikhab; Dress, Andreas; Rehmsmeier, Marc; Fuellen, Georg
2004-01-01
Many methods have been developed to search for homologous members of a protein family in databases, and the reliability of results and conclusions may be compromised if only one method is used, neglecting the others. Here we introduce a general scheme for combining such methods. Based on this scheme, we implemented a tool called comparative homology agreement search (chase) that integrates different search strategies to obtain a combined “E value.” Our results show that a consensus method integrating distinct strategies easily outperforms any of its component algorithms. More specifically, an evaluation based on the Structural Classification of Proteins database reveals that, on average, a coverage of 47% can be obtained in searches for distantly related homologues (i.e., members of the same superfamily but not the same family, which is a very difficult task), accepting only 10 false positives, whereas the individual methods obtain a coverage of 28–38%. PMID:15367730
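One simple way to combine per-method E values into a single consensus score, in the spirit of (though not identical to) the chase combination, is a geometric mean in log space. A hedged sketch with hypothetical E values:

```python
import math

def combined_evalue(evalues):
    """Geometric mean of per-method E values -- a naive consensus that
    rewards agreement between independent search methods. The actual
    chase combination scheme is more sophisticated than this."""
    mean_log = sum(math.log10(e) for e in evalues) / len(evalues)
    return 10 ** mean_log

# A hit reported with differing confidence by three hypothetical methods:
print(combined_evalue([1e-8, 1e-5, 1e-6]))  # falls between 1e-7 and 1e-6
```

A hit that only one method finds (with a poor E value from the others) is penalized, which is the intuition behind a consensus outperforming its component algorithms.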
Rainfall statistics, stationarity, and climate change.
Sun, Fubao; Roderick, Michael L; Farquhar, Graham D
2018-03-06
There is a growing research interest in the detection of changes in hydrologic and climatic time series. Stationarity can be assessed using the autocorrelation function, but this is not yet common practice in hydrology and climate. Here, we use a global land-based gridded annual precipitation (hereafter P ) database (1940-2009) and find that the lag 1 autocorrelation coefficient is statistically significant at around 14% of the global land surface, implying nonstationary behavior (90% confidence). In contrast, around 76% of the global land surface shows little or no change, implying stationary behavior. We use these results to assess change in the observed P over the most recent decade of the database. We find that the changes for most (84%) grid boxes are within the plausible bounds of no significant change at the 90% CI. The results emphasize the importance of adequately accounting for natural variability when assessing change. Copyright © 2018 the Author(s). Published by PNAS.
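The lag-1 autocorrelation test described here is straightforward to compute. A minimal sketch, including the approximate two-sided 90% significance bound for a 70-year series such as 1940-2009 (toy series, not the actual precipitation data):

```python
import math

def lag1_autocorrelation(x):
    """Lag-1 autocorrelation coefficient of an annual series."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

# A trending (nonstationary-looking) series gives r1 near 1; a rapidly
# alternating one gives r1 near -1; white noise stays near 0.
trend = list(range(30))
print(round(lag1_autocorrelation(trend), 2))  # 0.9

# Approximate two-sided 90% significance bound for n = 70 years:
bound = 1.645 / math.sqrt(70)
print(round(bound, 2))  # 0.2
```

Grid boxes whose |r1| exceeds roughly this bound are the ones the study flags as behaving nonstationarily at 90% confidence.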
NASA Astrophysics Data System (ADS)
Ringerud, S.; Skofronick Jackson, G.; Kulie, M.; Randel, D.
2016-12-01
NASA's Global Precipitation Measurement Mission (GPM) provides a wealth of both active and passive microwave observations aimed at furthering understanding of global precipitation and the hydrologic cycle. Employing a constellation of passive microwave radiometers increases global coverage and sampling, while the core satellite acts as a transfer standard, enabling consistent retrievals across individual constellation members. The transfer standard is applied in the form of a physically based a priori database constructed for use in Bayesian retrieval algorithms for each radiometer. The database is constructed using hydrometeor profiles optimized for the best fit to simultaneous active/passive core satellite measurements via the GPM Combined Algorithm. Initial validation of GPM rainfall products using the combined database suggests high retrieval errors for convective precipitation over land and at high latitudes. In such regimes, the signal from ice scattering observed at the higher microwave frequencies becomes particularly important for detecting and retrieving precipitation. For cross-track sounders such as MHS and SAPHIR, this signal is crucial. It is therefore important that the scattering signals associated with precipitation are accurately represented and modeled in the retrieval database. In the current GPM combined retrieval and constellation databases, ice hydrometeors are represented as "fluffy spheres", with assumed density and scattering parameters calculated using Mie theory. Resulting simulated Tb agree reasonably well at frequencies up to 89 GHz, but show significant biases at higher frequencies. In this work the database is recreated using an ensemble of non-spherical ice particles with single scattering properties calculated using discrete dipole approximation. Simulated Tb agreement is significantly improved across the high frequencies, decreasing biases by an order of magnitude in several of the channels. 
The new database is applied to a sample of GPM constellation retrievals, and the retrieved precipitation rates are compared to demonstrate the regimes where the use of more complex ice particles has the greatest effect on the final retrievals.
NASA Astrophysics Data System (ADS)
Lin, Zhongmin S.; Avinash, Gopal; Yan, Litao; McMillan, Kathryn
2014-03-01
Age-related cortical thinning has been studied by many researchers using quantitative MR images over the past three decades, and vastly differing results have been reported. Although some studies have reported statistically significant age-related cortical thickening in elderly cohorts in certain brain regions under certain conditions, cortical thinning in the elderly requires further systematic investigation. This paper leverages our previously reported brain surface intensity model (BSIM)-based technique for measuring cortical thickness to study cortical changes due to normal aging. We measured the cortical thickness of cognitively normal persons aged 60 to 89 years using Australian Imaging Biomarkers and Lifestyle Study (AIBL) data. MRI brains of 56 healthy people, including 29 women and 27 men, were selected. We measured the average cortical thickness of each individual in eight brain regions: parietal, frontal, temporal, occipital, visual, sensory motor, medial frontal and medial parietal. Unlike previously published studies, our results showed consistent age-related thinning of the cerebral cortex in all brain regions. The parietal, medial frontal and medial parietal regions showed the fastest thinning rates of 0.14, 0.12 and 0.10 mm/decade respectively, while the visual region showed the slowest thinning rate of 0.05 mm/decade. In the sensorimotor and parietal areas, women showed faster thinning (0.09 and 0.16 mm/decade) than men, while in all other regions men showed faster thinning than women. We also created high-resolution cortical thinning rate maps of the cohort and compared them to typical patterns of PET metabolic reduction in moderate AD and frontotemporal dementia (FTD). The results seemed to indicate vulnerable areas of cortical deterioration that may lead to brain dementia. These results validate our cortical thickness measurement technique by demonstrating the consistency of the cortical thinning and prediction of the cortical deterioration trend with the AIBL database.
Wei, Wei; Ji, Zhanglong; He, Yupeng; Zhang, Kai; Ha, Yuanchi; Li, Qi; Ohno-Machado, Lucila
2018-01-01
The number and diversity of biomedical datasets grew rapidly in the last decade. A large number of datasets are stored in various repositories, with different formats. Existing dataset retrieval systems lack the capability of cross-repository search. As a result, users spend time searching datasets in known repositories, and they typically do not find new repositories. The biomedical and healthcare data discovery index ecosystem (bioCADDIE) team organized a challenge to solicit new indexing and searching strategies for retrieving biomedical datasets across repositories. We describe the work of one team that built a retrieval pipeline and examined its performance. The pipeline used online resources to supplement dataset metadata, automatically generated queries from users’ free-text questions, produced high-quality retrieval results and achieved the highest inferred Normalized Discounted Cumulative Gain among competitors. The results showed that it is a promising solution for cross-database, cross-domain and cross-repository biomedical dataset retrieval. Database URL: https://github.com/w2wei/dataset_retrieval_pipeline PMID:29688374
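The pipeline above was ranked by inferred Normalized Discounted Cumulative Gain. As a point of reference, here is a minimal sketch of the standard NDCG computation; the challenge's "inferred" variant additionally estimates the metric from incomplete relevance judgments, which is not shown here:

```python
import math

def dcg(relevances):
    # Discounted cumulative gain: graded relevance discounted by log2 of rank.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(ranked_relevances, k=None):
    # Normalize by the DCG of the ideal (descending) ordering of the same set.
    r = ranked_relevances if k is None else ranked_relevances[:k]
    ideal = sorted(ranked_relevances, reverse=True)[:len(r)]
    best = dcg(ideal)
    return dcg(r) / best if best > 0 else 0.0
```

A perfectly ordered ranking scores 1.0; pushing the only relevant item to the bottom of a three-item list halves the score.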
DOE Office of Scientific and Technical Information (OSTI.GOV)
Osses de Eicker, Margarita, E-mail: Margarita.Osses@empa.c; Hischier, Roland, E-mail: Roland.Hischier@empa.c; Hurni, Hans, E-mail: Hans.Hurni@cde.unibe.c
2010-04-15
Nine non-local databases were evaluated with respect to their suitability for the environmental assessment of industrial activities in Latin America. Three assessment methods were considered, namely Life Cycle Assessment (LCA), Environmental Impact Assessment (EIA) and air emission inventories. The analysis focused on data availability in the databases and the applicability of their international data to Latin American industry. The study showed that the European EMEP/EEA Guidebook and the U.S. EPA AP-42 database are the most suitable ones for air emission inventories, whereas the LCI database Ecoinvent is the most suitable one for LCA and EIA. Due to the data coverage in the databases, air emission inventories are easier to develop than LCA or EIA, which require more comprehensive information. One strategy to overcome the limitations of non-local databases for Latin American industry is the combination of validated data from international databases with newly developed local datasets.
Integration of air traffic databases : a case study
DOT National Transportation Integrated Search
1995-03-01
This report describes a case study to show the benefits from maximum utilization of existing air traffic databases. The study demonstrates the utility of integrating available data through developing and demonstrating a methodology addressing the iss...
Difficulties in diagnosing Marfan syndrome using current FBN1 databases.
Groth, Kristian A; Gaustadnes, Mette; Thorsen, Kasper; Østergaard, John R; Jensen, Uffe Birk; Gravholt, Claus H; Andersen, Niels H
2016-01-01
The diagnostic criteria of Marfan syndrome (MFS) highlight the importance of a FBN1 mutation test in diagnosing MFS. As genetic sequencing becomes better, cheaper, and more accessible, the number of genetic tests will increase, producing numerous genetic variants that must be evaluated for disease-causing effects based on database information. The aim of this study was to evaluate genetic variants in four databases and review the relevant literature. We assessed background data on 23 common variants registered in ESP6500 and classified as causing MFS in the Human Gene Mutation Database (HGMD). We evaluated data in four variant databases (HGMD, UMD-FBN1, ClinVar, and UniProt) according to the diagnostic criteria for MFS and compared the results with the classification of each variant in the four databases. None of the 23 variants was clearly associated with MFS, even though all classifications in the databases stated otherwise. A genetic diagnosis of MFS cannot reliably be based on current variant databases because they contain incorrectly interpreted conclusions on variants. Variants must be evaluated by time-consuming review of the background material in the databases and by combining these data with expert knowledge on MFS. This is a major problem because we expect even more genetic test results in the near future as a result of the reduced cost and process time of next-generation sequencing. Genet Med 2016;18(1):98-102.
The Optical Gravitational Lensing Experiment OGLE-II Results
NASA Astrophysics Data System (ADS)
Żebruń, K.; Udalski, A.; Szymański, M.; Kubiak, M.; Pietrzyński, G.; Soszyński, I.; Woźniak, P.
2002-12-01
We present results of a search for microlensing events in the OGLE-II database of observations of stars in the Galactic Bulge (GB). Our main result is the Catalog of Microlensing Events in the GB, containing data on 214 cases of microlensing in 1997-1999. We also present the distribution of the normalized number of microlensing events along 24 lines of sight. Our results show that the majority of lenses are located in the Galactic Bar rather than in the Galactic disk. Details and the Catalog are available from the OGLE internet archive.
Combining Digital Watermarking and Fingerprinting Techniques to Identify Copyrights for Color Images
Hsieh, Shang-Lin; Chen, Chun-Che; Shen, Wen-Shan
2014-01-01
This paper presents a copyright identification scheme for color images that takes advantage of the complementary nature of watermarking and fingerprinting. It utilizes an authentication logo and the extracted features of the host image to generate a fingerprint, which is then stored in a database and also embedded in the host image to produce a watermarked image. When a dispute over the copyright of a suspect image occurs, the image is first processed by watermarking. If the watermark can be retrieved from the suspect image, the copyright can then be confirmed; otherwise, the watermark then serves as the fingerprint and is processed by fingerprinting. If a match in the fingerprint database is found, then the suspect image will be considered a duplicated one. Because the proposed scheme utilizes both watermarking and fingerprinting, it is more robust than those that only adopt watermarking, and it can also obtain the preliminary result more quickly than those that only utilize fingerprinting. The experimental results show that when the watermarked image suffers slight attacks, watermarking alone is enough to identify the copyright. The results also show that when the watermarked image suffers heavy attacks that render watermarking incompetent, fingerprinting can successfully identify the copyright, hence demonstrating the effectiveness of the proposed scheme. PMID:25114966
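The two-stage decision flow described in this scheme (watermark extraction first, fingerprint lookup as a fallback) can be sketched as follows; `extract_watermark` and `match_fingerprint` are hypothetical callables standing in for the scheme's actual extraction and database-matching routines:

```python
def identify_copyright(suspect_image, extract_watermark, match_fingerprint):
    # Stage 1: try to retrieve the embedded watermark (the fast path that
    # suffices under slight attacks).
    watermark = extract_watermark(suspect_image)
    if watermark is not None:
        return "copyright confirmed by watermark"
    # Stage 2: under heavy attacks the watermark is lost, so the extracted
    # features serve as a fingerprint matched against the database.
    if match_fingerprint(suspect_image):
        return "duplicate confirmed by fingerprint"
    return "no copyright claim supported"
```

The design rationale from the abstract is visible here: the cheap check runs first, and the robust-but-slower check only runs when the cheap one fails.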
Application of an adaptive neuro-fuzzy inference system to ground subsidence hazard mapping
NASA Astrophysics Data System (ADS)
Park, Inhye; Choi, Jaewon; Jin Lee, Moung; Lee, Saro
2012-11-01
We constructed hazard maps of ground subsidence around abandoned underground coal mines (AUCMs) in Samcheok City, Korea, using an adaptive neuro-fuzzy inference system (ANFIS) and a geographical information system (GIS). To evaluate the factors related to ground subsidence, a spatial database was constructed from topographic, geologic, mine tunnel, land use, and ground subsidence maps. An attribute database was also constructed from field investigations and reports on existing ground subsidence areas at the study site. Five major factors causing ground subsidence were extracted: (1) depth of drift; (2) distance from drift; (3) slope gradient; (4) geology; and (5) land use. The ANFIS model with different types of membership functions (MFs) was then applied for ground subsidence hazard mapping in the study area. Two ground subsidence hazard maps were prepared using the different MFs. Finally, the resulting ground subsidence hazard maps were validated using ground subsidence test data that were not used for training the ANFIS. The validation showed 95.12% accuracy using the generalized bell-shaped MF model and 94.94% accuracy using the Sigmoidal2 MF model. These results show that an ANFIS can be an effective tool in ground subsidence hazard mapping. Analysis of ground subsidence with the ANFIS model suggests that quantitative analysis of ground subsidence near AUCMs is possible.
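The generalized bell-shaped membership function that gave the best validation accuracy has a standard closed form; a quick sketch, with parameters a, b, c following the usual ANFIS convention (width, shoulder steepness, center):

```python
def gbell_mf(x, a, b, c):
    # Generalized bell membership function: equals 1 at the center c,
    # exactly 0.5 at c +/- a; larger b gives steeper shoulders.
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))
```

In an ANFIS, one such function per input fuzzy set is fitted in the first layer, and its parameters are tuned during training.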
NASA Technical Reports Server (NTRS)
Brunstrom, Anna; Leutenegger, Scott T.; Simha, Rahul
1995-01-01
Traditionally, allocation of data in distributed database management systems has been determined by off-line analysis and optimization. This technique works well for static database access patterns, but is often inadequate for frequently changing workloads. In this paper we address how to dynamically reallocate data for partitionable distributed databases with changing access patterns. Rather than relying on complicated and expensive optimization algorithms, a simple heuristic is presented and shown, via an implementation study, to improve system throughput by 30 percent in a local area network based system. Based on artificial wide area network delays, we show that dynamic reallocation can improve system throughput by a factor of two and a half for wide area networks. We also show that individual site load must be taken into consideration when reallocating data, and provide a simple policy that incorporates load in the reallocation decision.
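The abstract does not spell out the heuristic, but a load-aware greedy reallocation of the kind it describes might look like this sketch; the `load_cap` threshold and the data structures are assumptions for illustration, not the paper's actual policy:

```python
def reallocate(current_site, access_counts, site_load, load_cap=0.8):
    # For each partition, find the site that accesses it most often; migrate
    # only if that site differs from the current owner and is not already
    # overloaded (incorporating site load in the reallocation decision).
    moves = {}
    for partition, counts in access_counts.items():
        hottest = max(counts, key=counts.get)
        if hottest != current_site[partition] and site_load.get(hottest, 0.0) < load_cap:
            moves[partition] = hottest
    return moves
```

Run periodically, such a policy chases shifting access patterns without solving a global optimization problem.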
Geo-database use to promote dengue infection prevention and control.
Wongbutdee, Jaruwan; Chaikoolvatana, Anun; Saengnill, Wacharapong; Krasuaythong, Nantaya; Phuphak, Surajit
2010-07-01
Dengue infection (DI) is a major health problem in Thailand and is especially prevalent in Ubon Ratchathani Province. The objectives of the project were: (1) to develop a geo-database system for DI prevention and control, (2) to perform an Aedes aegypti larval vector survey for DI prevention and control in Ubon Ratchathani Province, (3) to study the behavior and perceptions regarding DI prevention among the target population in Ubon Ratchathani Province. Ten villages with high incidences of DI over a 3 year period from 2005 to 2007 were selected. The survey was divided into 2 periods, a pre-outbreak period (February-April 2008) and an outbreak period (June-August 2008). The data were collected in April and June 2008. The households in each village were purposively sampled. Water containers inside and outside the houses were surveyed using the World Health Organization's house index (HI), container index (CI), and Breteau index (BI). The location of each household was recorded using the global positioning system (GPS). Data regarding people's perceptions and behaviors concerning DI prevention were collected during interviews of 383 families in March 2008. A database for DI was developed using ArcView version 9.2. The results showed that during the pre-outbreak period, Non Jig, Non Sawang, and Huai Teeneu villages had the highest risk level (BI > or =50). During the outbreak period, Non Jig and Huai Teeneu villages had the highest risk level (BI > or =50). The interviews showed that the target population had high levels of perception regarding DI. DI preventive behavior was found in 50.9%.
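The three WHO larval indices used in the survey are simple ratios, computed as in this minimal sketch; the BI >= 50 cutoff is the high-risk threshold applied above:

```python
def larval_indices(houses_surveyed, houses_positive,
                   containers_inspected, containers_positive):
    # WHO Aedes aegypti larval indices:
    #   HI: percentage of surveyed houses with at least one positive container
    #   CI: percentage of inspected water containers that are positive
    #   BI: number of positive containers per 100 houses surveyed
    hi = 100.0 * houses_positive / houses_surveyed
    ci = 100.0 * containers_positive / containers_inspected
    bi = 100.0 * containers_positive / houses_surveyed
    return hi, ci, bi
```

For example, 110 positive containers found across 200 surveyed houses gives BI = 55, which would flag a village as highest risk under the BI >= 50 rule.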
Peng, Jiale; Li, Yaping; Zhou, Yeheng; Zhang, Li; Liu, Xingyong; Zuo, Zhili
2018-05-29
Gout is a common inflammatory arthritis caused by the deposition of urate crystals within joints. Its prevalence has increased over the past few decades, as shown by epidemiological survey results. Xanthine oxidase (XO) is a key enzyme that converts hypoxanthine and xanthine to uric acid, whose overproduction leads to gout. Therefore, inhibiting the activity of xanthine oxidase is an important way to reduce the production of urate. In this study, in order to identify potential natural products targeting XO, pharmacophore modeling was employed to filter databases. Two methods, a ligand-based pharmacophore and a receptor-ligand pharmacophore, were constructed with Discovery Studio. GOLD was then used to refine the potential compounds with higher fitness scores. Finally, molecular docking and dynamics simulations were employed to analyze the interactions between compounds and the protein. The best hypotheses were set as 3D queries to screen the database, returning 785 and 297 compounds respectively. A merged set of these 1082 molecules was subjected to molecular docking, which returned 144 hits with high fitness scores. These molecules clustered into four main classes with different backbones. Moreover, molecular docking showed that the representative compounds established key interactions with the amino acid residues of the protein, and the RMSD and RMSF values from the molecular dynamics runs showed that these compounds can stabilize the protein. The information presented in this study confirms previous reports, and it may assist in discovering and designing new backbones as potential XO inhibitors based on natural products.
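The RMSD used to judge stability in the dynamics runs is the root-mean-square deviation over paired atomic coordinates; a bare-bones sketch (structure superposition, which production MD tools perform before computing RMSD, is omitted here):

```python
import math

def rmsd(coords_a, coords_b):
    # Root-mean-square deviation between two conformations given as paired
    # 3D atomic coordinates (same atoms, same order).
    assert len(coords_a) == len(coords_b), "conformations must pair atom-for-atom"
    squared = sum((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
                  for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))
```

A flat RMSD trace over a trajectory, relative to the starting pose, is the usual signal that a bound ligand keeps the complex stable.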
Optics survivability support, volume 2
NASA Astrophysics Data System (ADS)
Wild, N.; Simpson, T.; Busdeker, A.; Doft, F.
1993-01-01
This volume of the Optics Survivability Support Final Report contains plots of all the data contained in the computerized Optical Glasses Database. All of these plots are accessible through the Database, but are included here as a convenient reference. The first three pages summarize the types of glass included with a description of the radiation source, test date, and the original data reference. This information is included in the database as a macro button labeled 'LLNL DATABASE'. Following this summary is an Abbe chart showing which glasses are included and where they lie as a function of ν_d and n_d. This chart is also callable through the database as a macro button labeled 'ABBEC'.
EST databases and web tools for EST projects.
Shen, Yao-Qing; O'Brien, Emmet; Koski, Liisa; Lang, B Franz; Burger, Gertraud
2009-01-01
This chapter outlines key considerations for constructing and implementing an EST database. Rather than presenting technological details step by step, we emphasize the design of an EST database suited to the specific needs of EST projects and how to choose the most suitable tools. Using TBestDB as an example, we illustrate the essential factors to be considered in database construction and the steps for data population and annotation. This process employs technologies such as PostgreSQL, Perl, and PHP to build the database and interface, and tools such as AutoFACT for data processing and annotation. We discuss these in comparison to other available technologies and tools, and explain the reasons for our choices.
Improved Information Retrieval Performance on SQL Database Using Data Adapter
NASA Astrophysics Data System (ADS)
Husni, M.; Djanali, S.; Ciptaningtyas, H. T.; Wicaksana, I. G. N. A.
2018-02-01
NoSQL databases, short for Not Only SQL, are increasingly being used as the number of big data applications grows. Most systems still use relational databases (RDBs), but as data volumes increase each year, systems handle big data with NoSQL databases to analyze and access data more quickly. NoSQL emerged as a result of the exponential growth of the internet and the development of web applications. Query syntax in a NoSQL database differs from that of an SQL database, normally requiring code changes in the application. A data adapter allows applications to keep their SQL query syntax unchanged: it provides methods that synchronize SQL databases with NoSQL databases, as well as an interface through which applications can run SQL queries. This research applied a data adapter system to synchronize data between a MySQL database and Apache HBase using a direct-access query approach, in which the system allows the application to accept queries while the synchronization process is in progress. Tests showed that the data adapter can synchronize between the SQL database, MySQL, and the NoSQL database, Apache HBase. The system's memory usage stayed in the range of 40% to 60%, and its processor usage varied from 10% to 90%. In addition, the NoSQL database outperformed the SQL database.
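A minimal sketch of the adapter idea: the application keeps issuing plain SQL, writes are mirrored to the NoSQL store, and writes arriving during a bulk synchronization are deferred and replayed afterwards (the direct-access behavior described above). The class below uses sqlite3 as a stand-in for MySQL and a plain callback for the HBase side; both substitutions are assumptions for illustration:

```python
import sqlite3

class DataAdapter:
    def __init__(self, sql_conn, replicate_to_nosql):
        self.sql = sql_conn                  # relational side (sqlite3 stand-in)
        self.replicate = replicate_to_nosql  # callback for the NoSQL side
        self.syncing = False
        self.pending = []

    def execute(self, query, params=()):
        # The application runs unchanged SQL; non-SELECT statements are
        # additionally mirrored to the NoSQL store.
        cur = self.sql.execute(query, params)
        if not query.lstrip().upper().startswith("SELECT"):
            if self.syncing:
                self.pending.append((query, params))  # defer during bulk sync
            else:
                self.replicate(query, params)
        return cur

    def begin_sync(self):
        self.syncing = True

    def finish_sync(self):
        # Replay writes deferred while the bulk synchronization was running.
        self.syncing = False
        for query, params in self.pending:
            self.replicate(query, params)
        self.pending.clear()
```

The key property is that `execute` never blocks the application on the NoSQL side, matching the paper's requirement that queries are accepted while synchronization is in progress.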
The improved Apriori algorithm based on matrix pruning and weight analysis
NASA Astrophysics Data System (ADS)
Lang, Zhenhong
2018-04-01
Drawing on matrix compression and weight analysis algorithms, this paper proposes an improved Apriori algorithm based on matrix pruning and weight analysis. After the transactional database is scanned only once, the algorithm constructs a boolean transaction matrix. By counting the ones in the rows and columns of the matrix, infrequent itemsets are pruned and a new candidate itemset is formed. Then, the item weights, the transaction weights, and the weighted support for items are calculated, yielding the frequent itemsets. Experimental results show that the improved Apriori algorithm not only reduces the number of repeated scans of the database, but also improves the efficiency of association rule mining.
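A minimal sketch of the matrix step under the standard boolean-matrix formulation: one database scan builds the matrix, column counts prune infrequent items, and pairwise column ANDs count 2-itemset support without rescanning. The weighting stage is omitted, and only itemsets up to size two are generated:

```python
def frequent_itemsets(transactions, items, min_support):
    # Build the boolean transaction matrix in a single database scan:
    # rows = transactions, columns = items.
    matrix = [[1 if item in t else 0 for item in items] for t in transactions]
    # Prune infrequent single items by counting the ones in each column.
    col_counts = [sum(row[j] for row in matrix) for j in range(len(items))]
    kept = [j for j, c in enumerate(col_counts) if c >= min_support]
    frequents = {(items[j],): col_counts[j] for j in kept}
    # Candidate 2-itemsets: AND the surviving columns pairwise, no rescan.
    for i, j in enumerate(kept):
        for k in kept[i + 1:]:
            support = sum(row[j] & row[k] for row in matrix)
            if support >= min_support:
                frequents[(items[j], items[k])] = support
    return frequents
```

Extending to size-k itemsets repeats the AND-and-count step on surviving candidates, still without touching the original database.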
Nuclear Data Matters - The obvious case of a bad mixing ratio for 58Co
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hoffman, R. D.; Nesaraja, Caroline D.; Mattoon, Caleb
We present results of modeled cross sections for neutron- and proton-induced reactions leading to the final product nucleus 58Co. In each case the gamma-cascade branching ratios given in the ENSDF database circa 2014 predict modeled nuclear cross sections leading to the ground and first excited metastable state that are incompatible with measured cross sections found in the NNDC experimental cross section database EXFOR. We show that exploring the uncertainty in the mixing ratio used to calculate the gamma-cascade branching ratios for the 53.15 keV 2nd excited state leads to changes in the predicted partial cross sections by amounts that give good agreement with measured data.
Constructing a Geology Ontology Using a Relational Database
NASA Astrophysics Data System (ADS)
Hou, W.; Yang, L.; Yin, S.; Ye, J.; Clarke, K.
2013-12-01
In the geology community, the creation of a common geology ontology has become a useful means of solving problems of data integration, knowledge transformation and the interoperation of multi-source, heterogeneous and multi-scale geological data. Currently, human-computer interaction methods and relational database-based methods are the primary ontology construction methods. Human-computer interaction methods such as the Geo-rule based method, the ontology life cycle method and the module design method have been proposed for applied geological ontologies. Essentially, the relational database-based method is a reverse engineering of abstracted semantic information from an existing database; the key is to construct rules for the transformation of database entities into the ontology. Relative to human-computer interaction methods, relational database-based methods can use existing resources and the stated semantic relationships among geological entities. However, two problems challenge their development and application. One is the transformation of multiple inheritance and nested relationships and their representation in an ontology. The other is that most of these methods do not measure the semantic retention of the transformation process. In this study, we focused on constructing a rule set to convert the semantics in a geological database into a geological ontology. According to the relational schema of a geological database, a conversion approach is presented to convert a geological spatial database to an OWL-based geological ontology, based on identifying semantics such as entities, relationships, inheritance relationships, nested relationships and cluster relationships. The semantic integrity of the transformation was verified using an inverse mapping process.
In the geological ontology, inheritance and union operations between superclasses and subclasses were used to represent the nested relationships in a geochronology and the multiple inheritance relationships. Based on a Quaternary database of the downtown area of Foshan City, Guangdong Province, in Southern China, a geological ontology was constructed using the proposed method. To measure how well semantics were maintained in the conversion process and its results, an inverse mapping from the ontology to a relational database was tested based on a proposed conversion rule. The comparison of schemas and entities and the reduction of tables between the inverse database and the original database illustrated that the proposed method retains the semantic information well during the conversion process. An application for abstracting sandstone information showed that semantic relationships among concepts in the geological database were successfully reorganized in the constructed ontology. Key words: geological ontology; geological spatial database; multiple inheritance; OWL Acknowledgement: This research is jointly funded by the Specialized Research Fund for the Doctoral Program of Higher Education of China (RFDP) (20100171120001), NSFC (41102207) and the Fundamental Research Funds for the Central Universities (12lgpy19).
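As an illustration of the kind of conversion rule involved, the fragment below maps a table to an OWL class in Turtle syntax, its columns to datatype properties, and a foreign-key parent to a subclass axiom. This is a simplified stand-in for the paper's rule set, which also handles nested, multiple-inheritance and cluster relationships; the input dictionary shape is an assumption:

```python
def table_to_owl(table, parent=None):
    # Map a relational table to OWL in Turtle syntax: the table name becomes
    # a class, each column a datatype property with the class as its domain,
    # and a foreign-key parent a subclass axiom.
    name, columns = table['name'], table['columns']
    lines = [f":{name} rdf:type owl:Class ."]
    if parent:
        lines.append(f":{name} rdfs:subClassOf :{parent} .")
    for column, xsd_type in columns.items():
        lines += [f":{column} rdf:type owl:DatatypeProperty .",
                  f":{column} rdfs:domain :{name} .",
                  f":{column} rdfs:range xsd:{xsd_type} ."]
    return "\n".join(lines)
```

The inverse mapping used for the semantic-integrity check would walk these axioms back into a schema: classes to tables, datatype properties to columns, subclass axioms to foreign keys.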
Ascertainment of acute liver injury in two European primary care databases.
Ruigómez, A; Brauer, R; Rodríguez, L A García; Huerta, C; Requena, G; Gil, M; de Abajo, Francisco; Downey, G; Bate, A; Tepie, M Feudjo; de Groot, M; Schlienger, R; Reynolds, R; Klungel, O
2014-10-01
The purpose of this study was to ascertain acute liver injury (ALI) in primary care databases using different computer algorithms, and to study and compare the incidence of ALI in different primary care databases under different definitions of ALI. The Clinical Practice Research Datalink (CPRD) in the UK and the Spanish "Base de datos para la Investigación Farmacoepidemiológica en Atención Primaria" (BIFAP) were used. Both are primary care databases, from which we selected individuals of all ages registered between January 2004 and December 2009. We developed two case definitions of idiopathic ALI using computer algorithms: (i) a restrictive definition (definite cases) and (ii) a broad definition (definite and probable cases). Patients presenting with prior liver conditions were excluded. Manual review of potential cases was performed to confirm the diagnosis, in a sample in CPRD (21%) and in all potential cases in BIFAP. Incidence rates of ALI by age, sex and calendar year were calculated. In BIFAP, all cases considered definite after manual review had been detected by the computer algorithm as potential cases, and none came from the non-cases group. The restrictive definition of ALI had a low sensitivity but a very high specificity (95% in BIFAP) and showed higher rates of agreement between computer search and manual review than the broad definition. Higher incidence rates of definite ALI in 2008 were observed in BIFAP (3.01 (95% confidence interval (CI) 2.13-4.25) per 100,000 person-years) than in CPRD (1.35 (95% CI 1.03-1.78)). This study shows that it is feasible to identify ALI cases if restrictive selection criteria are used and additional information can be reviewed to rule out differential diagnoses. Our results confirm that idiopathic ALI is a very rare disease in the general population. Finally, the construction of a standard definition with predefined criteria facilitates timely comparison across databases.
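Rates like the 3.01 (95% CI 2.13-4.25) per 100,000 person-years above are typically computed as crude rates with a Poisson-based interval; the sketch below uses the common log-normal approximation. The study's exact CI method is not stated, so this is illustrative only:

```python
import math

def incidence_rate_ci(cases, person_years, per=100_000, z=1.96):
    # Crude incidence rate with an approximate 95% CI on the log scale,
    # treating the case count as Poisson: SE of log(rate) = 1/sqrt(cases).
    rate = cases / person_years * per
    se_log = 1.0 / math.sqrt(cases)
    return rate, rate * math.exp(-z * se_log), rate * math.exp(z * se_log)
```

For example, 30 cases over one million person-years gives a rate of 3.0 per 100,000 person-years with an approximate CI of about 2.1 to 4.3, of the same order as the BIFAP figure above.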
Large-scale annotation of small-molecule libraries using public databases.
Zhou, Yingyao; Zhou, Bin; Chen, Kaisheng; Yan, S Frank; King, Frederick J; Jiang, Shumei; Winzeler, Elizabeth A
2007-01-01
While many large publicly accessible databases provide excellent annotation for biological macromolecules, the same is not true for small chemical compounds. Commercial data sources also fail to provide an annotation interface for large numbers of compounds and tend to be too cost-prohibitive to be widely available to biomedical researchers. Therefore, using annotation information for the selection of lead compounds from a modern day high-throughput screening (HTS) campaign presently occurs only on a very limited scale. The recent rapid expansion of the NIH PubChem database provides an opportunity to link existing biological databases with compound catalogs and provides relevant information that could potentially improve the information garnered from large-scale screening efforts. Using the 2.5 million compound collection at the Genomics Institute of the Novartis Research Foundation (GNF) as a model, we determined that approximately 4% of the library contained compounds with potential annotation in such databases as PubChem and the World Drug Index (WDI) as well as related databases such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) and ChemIDplus. Furthermore, exact structure match analysis showed that 32% of GNF compounds can be linked to third party databases via PubChem. We also showed that annotations such as MeSH (medical subject headings) terms can be applied to in-house HTS databases to identify signature biological inhibition profiles of interest as well as to expedite the assay validation process. The automated annotation of thousands of screening hits in batch is becoming feasible and has the potential to play an essential role in the hit-to-lead decision making process.
Geospatial Database for Strata Objects Based on Land Administration Domain Model (ladm)
NASA Astrophysics Data System (ADS)
Nasorudin, N. N.; Hassan, M. I.; Zulkifli, N. A.; Rahman, A. Abdul
2016-09-01
Recently in our country, the construction of buildings has become more complex, and a strata objects database has become more important for registering the real world, as people now own and use multiple levels of space. Furthermore, strata titles are increasingly important and need to be well managed. LADM, also known as ISO 19152, is a standard model for land administration that allows integrated 2D and 3D representation of spatial units. The aim of this paper is to develop a strata objects database using LADM. The paper discusses the current 2D geospatial database and the need for a 3D geospatial database in the future, develops a strata objects database using the standard data model (LADM), and analyzes the result. The current cadastre system in Malaysia, including strata titles, is discussed, the problems in the 2D geospatial database are listed, and the need for a future 3D geospatial database is considered. Designing a strata objects database involves conceptual, logical and physical database design. The strata objects database will allow us to find both non-spatial and spatial strata title information and thus show the location of each strata unit. This development may help in handling strata titles and their information.
A New Paradigm to Analyze Data Completeness of Patient Data
Nasir, Ayan; Liu, Xinliang
2016-01-01
Background: There is a need to develop a tool that will measure the data completeness of patient records using sophisticated statistical metrics. Patient data integrity is important in providing timely and appropriate care. Completeness is an important first step, with an emphasis on understanding the complex relationships between data fields and their relative importance in delivering care. Such a tool will not only help locate data problems but also help uncover the underlying issues behind them. Objectives: Develop a tool that can be used alongside a variety of health care database software packages to determine the completeness of individual patient records as well as aggregate patient records across health care centers and subpopulations. Methods: The methodology of this project is encapsulated in the Data Completeness Analysis Package (DCAP) tool, whose major components include concept mapping, CSV parsing, and statistical analysis. Results: Tests of DCAP with Healthcare Cost and Utilization Project (HCUP) State Inpatient Database (SID) data show that the tool successfully identifies relative data completeness at the patient, subpopulation, and database levels. These results also underline the need for further analysis and call for hypothesis-driven research into the underlying causes of data incompleteness. Conclusion: DCAP examines patient records and generates statistics that can be used to determine the completeness of individual patient data as well as the general thoroughness of record keeping in a medical database. DCAP uses a component that is customized to the settings of the software package used for storing patient data, as well as a Comma Separated Values (CSV) file parser, to compute the appropriate measurements. DCAP itself is assessed through a proof-of-concept exercise using hypothetical data as well as available HCUP SID patient data. PMID:27484918
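A toy version of the completeness statistics such a tool computes might look like the following; the field weights are hypothetical, standing in for DCAP's concept-mapping of relative field importance, and the CSV shape is illustrative:

```python
import csv
import io

def field_completeness(csv_text):
    # Per-field fraction of non-empty values across all records
    # (database/subpopulation-level view).
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return {}
    return {field: sum(1 for r in rows if (r[field] or '').strip()) / len(rows)
            for field in rows[0]}

def record_completeness(row, weights):
    # Weighted completeness of a single patient record; heavier weights mark
    # fields judged more important for care delivery (weights hypothetical).
    filled = sum(w for f, w in weights.items() if (row.get(f) or '').strip())
    return filled / sum(weights.values())
```

Aggregating `record_completeness` over patients, and `field_completeness` over files, gives the patient-, subpopulation-, and database-level statistics described above.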
Consistency Analysis of Genome-Scale Models of Bacterial Metabolism: A Metamodel Approach
Ponce-de-Leon, Miguel; Calle-Espinosa, Jorge; Peretó, Juli; Montero, Francisco
2015-01-01
Genome-scale metabolic models usually contain inconsistencies that manifest as blocked reactions and gap metabolites. To detect recurrent inconsistencies in metabolic models, a large-scale analysis was performed using a previously published dataset of 130 genome-scale models. The results showed that a large number of reactions (~22%) are blocked in all the models in which they are present. To unravel the nature of such inconsistencies, a metamodel was constructed by joining the 130 models into a single network. This metamodel was manually curated using the unconnected-modules approach and then used as a reference network to perform gap-filling on each individual genome-scale model. Finally, a set of 36 models that had not been considered during the construction of the metamodel was used, as a proof of concept, to extend the metamodel with new biochemical information and to assess its impact on gap-filling results. The analysis performed on the metamodel allowed us to conclude that: 1) the recurrent inconsistencies found in the models were already present in the metabolic database used during the reconstruction process; 2) inconsistencies in a metabolic database can propagate to the reconstructed models; 3) there are reactions not manifested as blocked that are active only as a consequence of certain classes of artifacts; and 4) the results of automatic gap-filling are highly dependent on the consistency and completeness of the metamodel or metabolic database used as the reference network. In conclusion, consistency analysis should be applied to metabolic databases in order to detect and fill gaps as well as to detect and remove artifacts and redundant information. PMID:26629901
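A coarse sketch of blocked-reaction detection by forward propagation from seed metabolites; real consistency pipelines use flux-based methods such as flux variability analysis, and this simplification ignores reversibility and stoichiometry:

```python
def blocked_reactions(reactions, seed_metabolites):
    # Forward-propagate producibility: a metabolite is producible if some
    # reaction whose substrates are all producible yields it. Reactions left
    # with an unproducible substrate are reported as blocked.
    producible = set(seed_metabolites)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if set(substrates) <= producible and not set(products) <= producible:
                producible |= set(products)
                changed = True
    return {name for name, (substrates, _) in reactions.items()
            if not set(substrates) <= producible}
```

Gap-filling, in this simplified picture, amounts to importing reactions from the reference network (the metamodel) until previously blocked reactions become producible.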
ERIC Educational Resources Information Center
Zhou, Ping; Wang, Qinwen; Yang, Jie; Li, Jingqiu; Guo, Junming; Gong, Zhaohui
2015-01-01
This study aimed to investigate the status of the publishing and usage of college biochemistry textbooks in China. A textbook database was constructed, and statistical analysis was used to evaluate the textbooks. The results showed that there were 945 (~57%) books for theory teaching, 379 (~23%) books for experiment teaching and 331 (~20%)…
ERIC Educational Resources Information Center
Hauck, Roslin V.; Weisband, Suzanne
2002-01-01
Describes two database systems in a law enforcement agency: one is a legacy, text-based system with cumbersome navigation; the newer system is a graphical user interface with simplified navigation. Discusses results of two user studies that showed personnel preferred the older more familiar system and considers implications for system design and…
ERIC Educational Resources Information Center
Borgman, Christine L.
1996-01-01
Reports on a survey of 70 research libraries in Croatia, Czech Republic, Hungary, Poland, Slovakia, and Slovenia. Results show that libraries are rapidly acquiring automated processing systems, CD-ROM databases, and connections to computer networks. Discusses specific data on system implementation and network services by country and by type of…
Navigating through the Jungle of Allergens: Features and Applications of Allergen Databases.
Radauer, Christian
2017-01-01
The increasing number of available data on allergenic proteins demanded the establishment of structured, freely accessible allergen databases. In this review article, features and applications of 6 of the most widely used allergen databases are discussed. The WHO/IUIS Allergen Nomenclature Database is the official resource of allergen designations. Allergome is the most comprehensive collection of data on allergens and allergen sources. AllergenOnline is aimed at providing a peer-reviewed database of allergen sequences for prediction of allergenicity of proteins, such as those planned to be inserted into genetically modified crops. The Structural Database of Allergenic Proteins (SDAP) provides a database of allergen sequences, structures, and epitopes linked to bioinformatics tools for sequence analysis and comparison. The Immune Epitope Database (IEDB) is the largest repository of T-cell, B-cell, and major histocompatibility complex protein epitopes including epitopes of allergens. AllFam classifies allergens into families of evolutionarily related proteins using definitions from the Pfam protein family database. These databases contain mostly overlapping data, but also show differences in terms of their targeted users, the criteria for including allergens, data shown for each allergen, and the availability of bioinformatics tools. © 2017 S. Karger AG, Basel.
Effects of exposure to malathion on blood glucose concentration: a meta-analysis.
Ramirez-Vargas, Marco Antonio; Flores-Alfaro, Eugenia; Uriostegui-Acosta, Mayrut; Alvarez-Fitz, Patricia; Parra-Rojas, Isela; Moreno-Godinez, Ma Elena
2018-02-01
Exposure to malathion (an organophosphate pesticide widely used around the world) has been associated with alterations in blood glucose concentration in animal models. However, the results are inconsistent. The aim of this meta-analysis was to evaluate whether malathion exposure can disturb the concentrations of blood glucose in exposed rats. We performed a literature search of online databases including PubMed, EBSCO, and Google Scholar and reviewed original articles that analyzed the relation between malathion exposure and glucose levels in animal models. The selection of articles was based on inclusion and exclusion criteria. The database search identified thirty-five possible articles, but only eight fulfilled our inclusion criteria, and these studies were included in the meta-analysis. The effect of malathion on blood glucose concentration showed a non-monotonic dose-response curve. In addition, pooled analysis showed that blood glucose concentrations were 3.3-fold higher in exposed rats than in the control group (95% CI, 2-5; Z = 3.9; p < 0.0001) in a random-effect model. This result suggested that alteration of glucose homeostasis is a possible mechanism of toxicity associated with exposure to malathion.
NASA Astrophysics Data System (ADS)
Zhan, Aibin; Bao, Zhenmin; Wang, Mingling; Chang, Dan; Yuan, Jian; Wang, Xiaolong; Hu, Xiaoli; Liang, Chengzhu; Hu, Jingjie
2008-05-01
The EST database of the Pacific abalone (Haliotis discus) was mined for developing microsatellite markers. A total of 1476 EST sequences were registered in GenBank when data mining was performed. Fifty sequences (approximately 3.4%) were found to contain one or more microsatellites. Based on the length and GC content of the flanking regions, cluster analysis and BLASTN, 13 microsatellite-containing ESTs were selected for PCR primer design. The results showed that 10 out of 13 primer pairs could amplify scorable PCR products and showed polymorphism. The number of alleles ranged from 2 to 13, and the values of Ho and He varied from 0.1222 to 0.8611 and from 0.2449 to 0.9311, respectively. No significant linkage disequilibrium (LD) between any pair of these loci was found, and 6 of the 10 loci conformed to the Hardy-Weinberg equilibrium (HWE). These EST-SSRs are therefore potential tools for studies of intraspecies variation and hybrid identification.
Genetic analysis of duck circovirus in Pekin ducks from South Korea.
Cha, S-Y; Kang, M; Cho, J-G; Jang, H-K
2013-11-01
The genetic organization of the 24 duck circovirus (DuCV) strains detected in commercial Pekin ducks from South Korea between 2011 and 2012 is described in this study. Multiple sequence alignment and phylogenetic analyses were performed on the 24 viral genome sequences as well as on 45 genome sequences available from the GenBank database. Phylogenetic analyses based on the genomic and open reading frame 2/cap sequences demonstrated that all DuCV strains belonged to genotype 1 and were designated in a subcluster under genotype 1. Analysis of the capsid protein amino acid sequences of the 24 Korean DuCV strains showed 10 substitutions compared with that of other genotype 1 strains. Our analysis showed that genotype 1 is predominant and circulating in South Korea. These present results serve as incentive to add more data to the DuCV database and provide insight to conduct further intensive study on the geographic relationships among these virus strains.
Loss-tolerant measurement-device-independent quantum private queries
Zhao, Liang-Yuan; Yin, Zhen-Qiang; Chen, Wei; Qian, Yong-Jun; Zhang, Chun-Mei; Guo, Guang-Can; Han, Zheng-Fu
2017-01-01
Quantum private queries (QPQ) is an important cryptographic protocol aiming to protect both the user’s and the database’s privacy when the database is queried privately. Recently, a variety of practical QPQ protocols based on quantum key distribution (QKD) have been proposed. However, for QKD-based QPQ the user’s imperfect detectors can be subjected to detector-side-channel attacks launched by a dishonest owner of the database. Here, we present a simple example that shows how the detector-blinding attack can completely compromise the security of QKD-based QPQ. To remove all known and unknown detector side channels, we propose a solution of measurement-device-independent QPQ (MDI-QPQ) with single-photon sources. The security of the proposed protocol has been analyzed under some typical attacks. Moreover, we prove that its security is completely loss independent. The results show that practical QPQ will retain the same degree of privacy as before even with seriously uncharacterized detectors. PMID:28051101
Creative self-efficacy development and creative performance over time.
Tierney, Pamela; Farmer, Steven M
2011-03-01
Building from an established framework of self-efficacy development, this study provides a longitudinal examination of the development of creative self-efficacy in an ongoing work context. Results show that increases in employee creative role identity and perceived creative expectation from supervisors over a 6-month time period were associated with enhanced sense of employee capacity for creative work. Contrary to what was expected, employees who experienced increased requirements for creativity in their jobs actually reported a decreased sense of efficaciousness for creative work. Results show that increases in creative self-efficacy corresponded with increases in creative performance as well. PsycINFO Database Record (c) 2011 APA, all rights reserved.
Macagno, Eduardo R; Gaasterland, Terry; Edsall, Lee; Bafna, Vineet; Soares, Marcelo B; Scheetz, Todd; Casavant, Thomas; Da Silva, Corinne; Wincker, Patrick; Tasiemski, Aurélie; Salzet, Michel
2010-06-25
The medicinal leech, Hirudo medicinalis, is an important model system for the study of nervous system structure, function, development, regeneration and repair. It is also a unique species in being presently approved for use in medical procedures, such as clearing of pooled blood following certain surgical procedures. It is a current, and potentially also future, source of medically useful molecular factors, such as anticoagulants and antibacterial peptides, which may have evolved as a result of its parasitizing large mammals, including humans. Despite the broad focus of research on this system, little has been done at the genomic or transcriptomic levels and there is a paucity of openly available sequence data. To begin to address this problem, we constructed whole embryo and adult central nervous system (CNS) EST libraries and created a clustered sequence database of the Hirudo transcriptome that is available to the scientific community. A total of approximately 133,000 EST clones from two directionally-cloned cDNA libraries, one constructed from mRNA derived from whole embryos at several developmental stages and the other from adult CNS cords, were sequenced in one or both directions by three different groups: Genoscope (French National Sequencing Center), the University of Iowa Sequencing Facility and the DOE Joint Genome Institute. These were assembled using the phrap software package into 31,232 unique contigs and singletons, with an average length of 827 nt. The assembled transcripts were then translated in all six frames and compared to proteins in NCBI's non-redundant (NR) and to the Gene Ontology (GO) protein sequence databases, resulting in 15,565 matches to 11,236 proteins in NR and 13,935 matches to 8,073 proteins in GO. 
Searching the database for transcripts of genes homologous to those thought to be involved in the innate immune responses of vertebrates and other invertebrates yielded a set of nearly one hundred evolutionarily conserved sequences, representing all known pathways involved in these important functions. The sequences obtained for Hirudo transcripts represent the first major database of genes expressed in this important model system. Comparison of translated open reading frames (ORFs) with the other openly available leech datasets, the genome and transcriptome of Helobdella robusta, shows an average identity at the amino acid level of 58% in matched sequences. Interestingly, comparison with other available Lophotrochozoans shows similar high levels of amino acid identity, where sequences match, for example, 64% with Capitella capitata (a polychaete) and 56% with Aplysia californica (a mollusk), as well as 58% with Schistosoma mansoni (a platyhelminth). Phylogenetic comparisons of putative Hirudo innate immune response genes present within the Hirudo transcriptome database herein described show a strong resemblance to the corresponding mammalian genes, indicating that this important physiological response may have older origins than what has been previously proposed.
Regional early flood warning system: design and implementation
NASA Astrophysics Data System (ADS)
Chang, L. C.; Yang, S. N.; Kuo, C. L.; Wang, Y. F.
2017-12-01
This study proposes a prototype of a regional early flood inundation warning system for Tainan City, Taiwan. AI technology is used to forecast multi-step-ahead regional flood inundation maps during storm events. The computing time is only a few seconds, which enables real-time regional flood inundation forecasting. A database is built to organize the data and information needed for building the real-time forecasting models, maintaining the relations of forecasted points, and displaying forecasted results, while real-time data acquisition is another key task, as the model requires immediate access to rain-gauge information to provide forecast services. All database-related programs are built on Microsoft SQL Server using Visual C# to extract real-time hydrological data, manage data, store the forecasted data, and supply information to the visual map-based display. The regional early flood inundation warning system uses up-to-date web technologies, driven by the database and real-time data acquisition, to display the online forecasted flood inundation depths in the study area. The user-friendly interface sequentially shows the inundated area on Google Maps together with the maximum inundation depth and its location, and provides a KMZ file download of the results for viewing in Google Earth. The developed system can provide all the relevant information and online forecast results, helping city authorities make decisions during typhoon events and take action to mitigate losses.
Combining Evidence of Preferential Gene-Tissue Relationships from Multiple Sources
Guo, Jing; Hammar, Mårten; Öberg, Lisa; Padmanabhuni, Shanmukha S.; Bjäreland, Marcus; Dalevi, Daniel
2013-01-01
An important challenge in drug discovery and disease prognosis is to predict genes that are preferentially expressed in one or a few tissues, i.e. showing a considerably higher expression in one tissue(s) compared to the others. Although several data sources and methods have been published explicitly for this purpose, they often disagree and it is not evident how to retrieve these genes and how to distinguish true biological findings from those that are due to choice-of-method and/or experimental settings. In this work we have developed a computational approach that combines results from multiple methods and datasets with the aim to eliminate method/study-specific biases and to improve the predictability of preferentially expressed human genes. A rule-based score is used to merge and assign support to the results. Five sets of genes with known tissue specificity were used for parameter pruning and cross-validation. In total we identify 3434 tissue-specific genes. We compare the genes of highest scores with the public databases: PaGenBase (microarray), TiGER (EST) and HPA (protein expression data). The results have 85% overlap to PaGenBase, 71% to TiGER and only 28% to HPA. 99% of our predictions have support from at least one of these databases. Our approach also performs better than any of the databases on identifying drug targets and biomarkers with known tissue-specificity. PMID:23950964
Building An Integrated Neurodegenerative Disease Database At An Academic Health Center
Xie, Sharon X.; Baek, Young; Grossman, Murray; Arnold, Steven E.; Karlawish, Jason; Siderowf, Andrew; Hurtig, Howard; Elman, Lauren; McCluskey, Leo; Van Deerlin, Vivianna; Lee, Virginia M.-Y.; Trojanowski, John Q.
2010-01-01
Background It is becoming increasingly important to study common and distinct etiologies, clinical and pathological features, and mechanisms related to neurodegenerative diseases such as Alzheimer’s disease (AD), Parkinson’s disease (PD), amyotrophic lateral sclerosis (ALS), and frontotemporal lobar degeneration (FTLD). These comparative studies rely on powerful database tools to quickly generate data sets matching the diverse and complementary criteria set by the studies. Methods In this paper, we present a novel Integrated NeuroDegenerative Disease (INDD) database developed at the University of Pennsylvania (Penn) through a consortium of Penn investigators. Since these investigators work on AD, PD, ALS, and FTLD, this allowed us to achieve the goal of developing an INDD database for these major neurodegenerative disorders. We used Microsoft SQL Server as the platform, with built-in backwards compatibility allowing Microsoft Access to serve as a front-end client to interface with the database. We used the PHP Hypertext Preprocessor to create the front-end web interface and then integrated the individual neurodegenerative disease databases using a master lookup table. We also present methods of data entry, database security, database backups, and database audit trails for this INDD database. Results We compare the results of a biomarker study using the INDD database with those obtained by the alternative approach of querying each individual database separately. Conclusions We have demonstrated that the Penn INDD database has the ability to query multiple database tables from a single console with high accuracy and reliability. The INDD database provides a powerful tool for generating data sets in comparative studies across several neurodegenerative diseases. PMID:21784346
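The master-lookup-table integration described in this abstract can be sketched in miniature. The table and column names below are hypothetical, not the actual Penn INDD schema; the point is only that one lookup table mapping global IDs to disease-specific records lets a single query span all disease databases.

```python
import sqlite3

# In-memory stand-in for the integrated database; the schema is illustrative,
# not the INDD schema. Two disease-specific tables plus one master lookup.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE ad_patients  (local_id INTEGER, csf_tau REAL);
CREATE TABLE pd_patients  (local_id INTEGER, csf_tau REAL);
-- Master lookup table: maps each disease-specific record to one global ID.
CREATE TABLE master_lookup (global_id INTEGER, source TEXT, local_id INTEGER);
INSERT INTO ad_patients   VALUES (1, 410.0), (2, 250.0);
INSERT INTO pd_patients   VALUES (1, 180.0);
INSERT INTO master_lookup VALUES (100,'AD',1),(101,'AD',2),(102,'PD',1);
""")

# One query over the lookup table pulls a biomarker from every disease table,
# instead of querying each disease database separately.
rows = cur.execute("""
SELECT m.global_id, m.source, t.csf_tau
  FROM master_lookup m JOIN ad_patients t
    ON m.source = 'AD' AND m.local_id = t.local_id
UNION ALL
SELECT m.global_id, m.source, t.csf_tau
  FROM master_lookup m JOIN pd_patients t
    ON m.source = 'PD' AND m.local_id = t.local_id
ORDER BY 1
""").fetchall()
print(rows)  # [(100, 'AD', 410.0), (101, 'AD', 250.0), (102, 'PD', 180.0)]
```

The same pattern generalizes to any number of disease tables, which is what makes a single-console cross-disease biomarker query possible.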
Toward unification of taxonomy databases in a distributed computer environment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kitakami, Hajime; Tateno, Yoshio; Gojobori, Takashi
1994-12-31
All the taxonomy databases constructed with the DNA databases of the international DNA data banks are powerful electronic dictionaries which aid biological research by computer. The taxonomy databases are, however, not consistently unified in a relational format. If we can achieve consistent unification of the taxonomy databases, it will be useful for comparing many research results and for identifying future research directions from existing results. In particular, it will be useful for comparing relationships between phylogenetic trees inferred from molecular data and those constructed from morphological data. The goal of the present study is to unify the existing taxonomy databases and eliminate the inconsistencies (errors) that are present in them. Inconsistencies arise particularly in the restructuring of the existing taxonomy databases, since the classification rules for constructing the taxonomy have changed rapidly with biological advances. A repair system is needed to remove inconsistencies within each data bank and mismatches among data banks. This paper describes a new methodology for removing both inconsistencies and mismatches from the databases in a distributed computer environment. The methodology is implemented in a relational database management system, SYBASE.
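The cross-bank mismatch detection that such a repair system performs can be sketched as follows. The two "banks" and their lineages are fabricated examples, not actual data-bank content, and the comparison is deliberately simplified to exact lineage equality.

```python
# Compare the lineage recorded for each taxon across two hypothetical data
# banks and report taxa whose classifications disagree -- the "mismatches"
# a repair system would flag for curation.
bank_a = {
    "Escherichia coli": ("Bacteria", "Proteobacteria", "Gammaproteobacteria"),
    "Homo sapiens":     ("Eukaryota", "Chordata", "Mammalia"),
}
bank_b = {
    "Escherichia coli": ("Bacteria", "Pseudomonadota", "Gammaproteobacteria"),
    "Homo sapiens":     ("Eukaryota", "Chordata", "Mammalia"),
}

def find_mismatches(a, b):
    """Taxa present in both banks but classified differently."""
    return sorted(t for t in a.keys() & b.keys() if a[t] != b[t])

print(find_mismatches(bank_a, bank_b))  # ['Escherichia coli']
```

In a relational setting the same check is a join on taxon name with a filter on unequal lineage columns; the Python version just makes the set logic explicit.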
DANESHPARVAR, Afrooz; MOWLAVI, Gholamreza; MIRJALALI, Hamed; HAJJARAN, Homa; MOBEDI, Iraj; NADDAF, Saeed Reza; SHIDFAR, Mohammadreza; SADAT MAKKI, Mahsa
2017-01-01
Background: Demodicosis is one of the most prevalent skin diseases resulting from infestation by Demodex mites. This parasite usually inhabits in follicular infundibulum or sebaceous duct and transmits through close contact with an infested host. Methods: This study was carried from September 2014 to January 2016 at Tehran University of Medical Sciences, Tehran, Iran. DNA extraction and amplification of 16S ribosomal RNA was performed on four isolates, already obtained from four different patients and identified morphologically though clearing with 10% Potassium hydroxide (KOH) and microscopical examination. Amplified fragments from the isolates were compared with GeneBank database and phylogenetic analysis was carried out using MEGA6 software. Results: A 390 bp fragment of 16S rDNA was obtained in all isolates and analysis of generated sequences showed high similarity with those submitted to GenBank, previously. Intra-species similarity and distance also showed 99.983% and 0.017, respectively, for the studied isolates. Multiple alignments of the isolates showed Single Nucleotide Polymorphisms (SNPs) in 16S rRNA fragment. Phylogenetic analysis revealed that all 4 isolates clustered with other D. folliculorum, recovered from GenBank database. Our accession numbers KF875587 and KF875589 showed more similarity together in comparison with two other studied isolates. Conclusion: Mitochondrial 16S rDNA is one of the most suitable molecular barcodes for identification D. folliculorum and this fragment can use for intra-species characterization of the most human-infected mites. PMID:28761482
3D multi-view convolutional neural networks for lung nodule classification
Kang, Guixia; Hou, Beibei; Zhang, Ningbo
2017-01-01
The 3D convolutional neural network (CNN) is able to make full use of the spatial 3D context information of lung nodules, and the multi-view strategy has been shown to be useful for improving the performance of 2D CNN in classifying lung nodules. In this paper, we explore the classification of lung nodules using the 3D multi-view convolutional neural networks (MV-CNN) with both chain architecture and directed acyclic graph architecture, including 3D Inception and 3D Inception-ResNet. All networks employ the multi-view-one-network strategy. We conduct a binary classification (benign and malignant) and a ternary classification (benign, primary malignant and metastatic malignant) on Computed Tomography (CT) images from Lung Image Database Consortium and Image Database Resource Initiative database (LIDC-IDRI). All results are obtained via 10-fold cross validation. As regards the MV-CNN with chain architecture, results show that the performance of 3D MV-CNN surpasses that of 2D MV-CNN by a significant margin. Finally, a 3D Inception network achieved an error rate of 4.59% for the binary classification and 7.70% for the ternary classification, both of which represent superior results for the corresponding task. We compare the multi-view-one-network strategy with the one-view-one-network strategy. The results reveal that the multi-view-one-network strategy can achieve a lower error rate than the one-view-one-network strategy. PMID:29145492
Database trial impact on graduate nursing comprehensive exams
Pionke, Katharine; Huckstadt, Alicia
2015-01-01
While the authors were conducting a trial period of several databases, the question arose of whether databases affect the outcomes of graduate nursing comprehensive examinations. This study explored that question using citation analysis of exams taken during the database trial and exams that were not. The findings showed no difference in examination pass/fail rates. While the pass/fail rates did not change, a great deal was learned about citation accuracy and the types of materials that students used, leading to discussions about changing how citation and plagiarism awareness were taught. PMID:26512218
Fuentes-Márquez, Pedro; Cabrera-Martos, Irene; Valenza, Marie Carmen
2018-05-14
To summarize the available scientific evidence on physiotherapy interventions in the management of chronic pelvic pain (CPP), a systematic review of randomized controlled trials was performed. An electronic search of the MEDLINE, CINAHL, and Web of Science databases was performed to identify relevant randomized trials from 2010-2016. Manuscripts were included if at least one of the comparison groups received a physiotherapy intervention. Studies were assessed in duplicate for data extraction and risk of bias using the Physiotherapy Evidence Database (PEDro) scale. Eight of the studies screened met the inclusion criteria. Four manuscripts studied the effects of electrotherapy, including intravaginal electrical stimulation, short-wave diathermy, respiratory-gated auricular vagal afferent nerve stimulation, percutaneous tibial nerve stimulation, and sono-electro-magnetic therapy, with positive results. Three studies focused on manual therapy, assessing the efficacy of myofascial release versus massage therapy in two of them and ischemic compression of trigger points in the third. Although physiotherapy interventions show some beneficial effects, the evidence cannot yet support firm conclusions. Heterogeneity in population phenotype, methodological quality, interpretation of results, and operational definitions results in little overall evidence to guide treatment.
Li, Honglan; Joh, Yoon Sung; Kim, Hyunwoo; Paek, Eunok; Lee, Sang-Won; Hwang, Kyu-Baek
2016-12-22
Proteogenomics is a promising approach for various tasks ranging from gene annotation to cancer research. Databases for proteogenomic searches are often constructed by adding peptide sequences inferred from genomic or transcriptomic evidence to reference protein sequences. Such inflation of databases has the potential to identify novel peptides. However, it also raises concerns about sensitive and reliable peptide identification. Spurious peptides included in target databases may result in an underestimated false discovery rate (FDR). On the other hand, inflation of decoy databases could decrease the sensitivity of peptide identification due to the increased number of high-scoring random hits. Although several studies have addressed these issues, widely applicable guidelines for sensitive and reliable proteogenomic searching have been scarce. To systematically evaluate the effect of database inflation in proteogenomic searches, we constructed a variety of real and simulated proteogenomic databases for yeast and human tandem mass spectrometry (MS/MS) data, respectively. Against these databases, we tested two popular database search tools with various approaches to search-result validation: the target-decoy search strategy (with and without a refined scoring metric) and a mixture model-based method. The effect of separate filtering of known and novel peptides was also examined. The results from real and simulated proteogenomic searches confirmed that separate filtering increases the sensitivity and reliability of proteogenomic searches. However, no single method consistently identified the largest (or the smallest) number of novel peptides in real proteogenomic searches. We propose using a set of search-result validation methods with separate filtering for sensitive and reliable identification of peptides in proteogenomic searches.
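The separate-filtering idea can be illustrated with a toy target-decoy FDR computation. The scores below are fabricated and the 5% threshold is arbitrary; this is a sketch of filtering known and novel peptide sets independently, not the authors' actual pipeline or scoring metric.

```python
def fdr_filter(psms, threshold=0.05):
    """Target-decoy FDR control: walk down the score-sorted list and keep
    target hits up to the deepest cutoff where decoys/targets <= threshold.
    psms: list of (score, is_decoy) tuples."""
    kept, best, targets, decoys = [], [], 0, 0
    for score, is_decoy in sorted(psms, reverse=True):
        decoys += is_decoy
        targets += not is_decoy
        kept.append((score, is_decoy))
        if targets and decoys / targets <= threshold:
            best = list(kept)  # deepest cutoff satisfying the FDR bound
    return [s for s, d in best if not d]

# Separate filtering: known and novel peptides get independent FDR control,
# so spurious novel entries cannot ride along with well-scoring known hits.
known = [(9.1, False), (8.7, False), (8.2, False), (3.0, True)]
novel = [(7.5, False), (4.1, True), (3.9, False)]
known_ids = fdr_filter(known)   # [9.1, 8.7, 8.2]
novel_ids = fdr_filter(novel)   # [7.5] -- the decoy hit caps the novel list
```

Pooling both lists before filtering would let the strong known hits absorb the decoy and admit the weak 3.9 novel hit; filtering the lists separately is exactly what prevents that.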
National Institute of Standards and Technology Data Gateway
SRD 17 NIST Chemical Kinetics Database (Web, free access) The NIST Chemical Kinetics Database includes essentially all reported kinetics results for thermal gas-phase chemical reactions. The database is designed to be searched for kinetics data based on the specific reactants involved, for reactions resulting in specified products, for all the reactions of a particular species, or for various combinations of these. In addition, the bibliography can be searched by author name or combination of names. The database contains in excess of 38,000 separate reaction records for over 11,700 distinct reactant pairs. These data have been abstracted from over 12,000 papers with literature coverage through early 2000.
Flexible Decision Support in Device-Saturated Environments
2003-10-01
also output tuples to a remote MySQL or Postgres database. 3.3 GUI The GUI allows the user to pose queries using SQL and to display query... DatabaseConnection.java – handles connections to an external database (such as MySQL or Postgres). • Debug.java – contains the code for printing out Debug messages... also provided. It is possible to output the results of queries to a MySQL or Postgres database for archival, and the GUI can query those results
Diway, Bibian; Khoo, Eyen
2017-01-01
The development of timber tracking methods based on genetic markers can provide scientific evidence to verify the origin of timber products and fulfill the growing requirement for sustainable forestry practices. In this study, the origin of an important Dark Red Meranti wood, Shorea platyclados, was studied by using a combination of seven chloroplast DNA markers and 15 short tandem repeats (STRs). A total of 27 natural populations of S. platyclados were sampled throughout Malaysia to establish population-level and individual-level identification databases. A haplotype map was generated from chloroplast DNA sequencing for population identification, resulting in 29 multilocus haplotypes based on 39 informative intraspecific variable sites. Subsequently, a DNA profiling database was developed from the 15 STRs, allowing for individual identification in Malaysia. Cluster analysis divided the 27 populations into two genetic clusters, corresponding to the regions of Eastern and Western Malaysia. The conservativeness tests showed that the Malaysia database is conservative after removal of bias from population subdivision and sampling effects. Independent self-assignment tests correctly assigned individuals to the database in an overall 60.60−94.95% of cases for identified populations, and in 98.99−99.23% of cases for identified regions. Both the chloroplast DNA database and the STRs appear to be useful for tracking timber originating in Malaysia. Hence, this DNA-based method could serve as an effective additional tool in the existing forensic timber identification system, helping to ensure the sustainable management of this species into the future. PMID:28430826
Fast Fingerprint Database Maintenance for Indoor Positioning Based on UGV SLAM
Tang, Jian; Chen, Yuwei; Chen, Liang; Liu, Jingbin; Hyyppä, Juha; Kukko, Antero; Kaartinen, Harri; Hyyppä, Hannu; Chen, Ruizhi
2015-01-01
Indoor positioning technology has become more and more important in the last two decades. Utilizing Received Signal Strength Indicator (RSSI) fingerprints of Signals of OPportunity (SOP) is a promising alternative navigation solution. However, as the RSSIs vary during operation due to their physical nature and are easily affected by environmental change, one challenge of the indoor fingerprinting method is maintaining the RSSI fingerprint database in a timely and effective manner. In this paper, a solution for rapidly updating the fingerprint database is presented, based on a self-developed Unmanned Ground Vehicle (UGV) platform, NAVIS. Several SOP sensors were installed on NAVIS for collecting indoor fingerprint information, including a digital compass collecting magnetic field intensity, a light sensor collecting light intensity, and a smartphone collecting the access point number and RSSIs of the pre-installed WiFi network. The NAVIS platform generates a map of the indoor environment and collects the SOPs during mapping, and the SOP fingerprint database is then interpolated and updated in real time. Field tests were carried out to evaluate the effectiveness and efficiency of the proposed method. The results showed that the fingerprint databases can be quickly created and updated with a higher sampling frequency (5 Hz) and denser reference points compared with traditional methods, and the indoor map can be generated without prior information. Moreover, environmental changes could also be detected quickly for fingerprint indoor positioning. PMID:25746096
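Building a fingerprint database from a mapping pass can be sketched as follows. The grid spacing, survey samples, and nearest-reference-point binning with per-point averaging are fabricated simplifications, not the NAVIS platform's actual interpolation scheme.

```python
from collections import defaultdict

# Bin each survey observation to its nearest reference point on a 1 m grid
# and store the mean RSSI per (reference point, access point) pair -- a
# simple stand-in for the real-time interpolation described in the paper.
def nearest_rp(x, y, spacing=1.0):
    """Snap a survey coordinate to the nearest grid reference point."""
    return (round(x / spacing) * spacing, round(y / spacing) * spacing)

def build_fingerprint_db(samples):
    """samples: iterable of (x, y, ap_id, rssi) from the survey platform."""
    acc = defaultdict(list)
    for x, y, ap, rssi in samples:
        acc[(nearest_rp(x, y), ap)].append(rssi)
    return {key: sum(v) / len(v) for key, v in acc.items()}

survey = [
    (0.1, 0.0, "AP1", -40), (0.2, 0.1, "AP1", -42),  # both bin to (0.0, 0.0)
    (1.1, 0.0, "AP1", -55),                          # bins to (1.0, 0.0)
]
db = build_fingerprint_db(survey)
print(db[((0.0, 0.0), "AP1")])  # -41.0, the mean of -40 and -42
```

Because the whole database is rebuilt from one pass of the platform, re-surveying after an environmental change simply replaces the affected entries, which is the maintenance advantage the abstract highlights.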
A Benchmark and Comparative Study of Video-Based Face Recognition on COX Face Database.
Huang, Zhiwu; Shan, Shiguang; Wang, Ruiping; Zhang, Haihong; Lao, Shihong; Kuerban, Alifu; Chen, Xilin
2015-12-01
Face recognition with still face images has been widely studied, while research on video-based face recognition is relatively inadequate, especially in terms of benchmark datasets and comparisons. Real-world video-based face recognition applications require techniques for three distinct scenarios: 1) Video-to-Still (V2S); 2) Still-to-Video (S2V); and 3) Video-to-Video (V2V), respectively taking video or still images as query or target. To the best of our knowledge, few datasets and evaluation protocols have been benchmarked for all three scenarios. In order to facilitate the study of this specific topic, this paper contributes a benchmarking and comparative study based on a newly collected still/video face database, named COX Face DB. Specifically, we make three contributions. First, we collect and release a large-scale still/video face database to simulate video surveillance with three different video-based face recognition scenarios (i.e., V2S, S2V, and V2V). Second, for benchmarking the three scenarios designed on our database, we review and experimentally compare a number of existing set-based methods. Third, we further propose a novel Point-to-Set Correlation Learning (PSCL) method, and experimentally show that it can be used as a promising baseline method for V2S/S2V face recognition on COX Face DB. Extensive experimental results clearly demonstrate that video-based face recognition needs more effort, and our COX Face DB is a good benchmark database for evaluation.
Yu, Yue; Liu, Hongwei; Tu, Maolin; Qiao, Meiling; Wang, Zhenyu; Du, Ming
2017-12-01
Ruditapes philippinarum is nutrient-rich and widely distributed, but little attention has been paid to the identification and characterization of the bioactive peptides in this bivalve. In the present study, we evaluated peptides of R. philippinarum obtained by enzymatic hydrolysis with trypsin, using a combination of ultra-performance liquid chromatography separation and electrospray ionization quadrupole time-of-flight tandem mass spectrometry, followed by data processing and sequence-similarity database searching. The potential allergenicity of the peptides was assessed in silico. The hydrolysis was performed under the following conditions: E:S 3:100 (w/w), pH 9.0, 45 °C for 4 h. After separation and detection, the Swiss-Prot database and a Ruditapes philippinarum sequence database were used: 966 unique peptides were identified by non-error-tolerant database searching; 173 peptides matching 55 precursor proteins comprised highly conserved cytoskeleton proteins. The remaining 793 peptides were identified from the R. philippinarum sequence database. The results showed that 510 peptides were labeled as allergens and 31 peptides were potential allergens; 425 peptides were predicted to be nonallergenic. This abundant peptide information contributes to further investigations of the structure and potential function of R. philippinarum peptides. Additional in vitro studies are required to demonstrate and ensure the correct production of the hydrolysates for use in the food industry. © 2017 Society of Chemical Industry.
Cognetti, Daniel; Keeny, Heather M; Samdani, Amer F; Pahys, Joshua M; Hanson, Darrell S; Blanke, Kathy; Hwang, Steven W
2017-10-01
OBJECTIVE Postoperative complications are one of the most significant concerns in surgeries of the spine, especially in higher-risk cases such as neuromuscular scoliosis. Neuromuscular scoliosis is a classification of multiple diseases affecting the neuromotor system or musculature of patients leading to severe degrees of spinal deformation, disability, and comorbidity, all likely contributing to higher rates of postoperative complications. The objective of this study was to evaluate deformity correction of patients with neuromuscular scoliosis over a 12-year period (2004-2015) by looking at changes in postsurgical complications and management. METHODS The authors queried the Scoliosis Research Society (SRS) Morbidity and Mortality (M&M) database for neuromuscular scoliosis cases from 2004 to 2015. The SRS M&M database is an international database with thousands of self-reported cases by fellowship-trained surgeons. The database has previously been validated, but reorganization in 2008 created less-robust data sets from 2008 to 2011. Consequently, the majority of analysis in this report was performed using cohorts that bookend the 12-year period (2004-2007 and 2012-2015). Of the 312 individual fields recorded per patient, demographic analysis was completed for age, sex, diagnosis, and preoperative curvature. Analysis of complications included infection, bleeding, mortality, respiratory, neurological deficit, and management practices. RESULTS From 2004 to 2015, a total of 29,019 cases of neuromuscular scoliosis were reported with 1385 complications, equating to a 6.3% complication rate when excluding the less-robust data from 2008 to 2011. This study shows a 3.5-fold decrease in overall complication rates from 2004 to 2015. A closer look at complications shows a significant decrease in wound infections (superficial and deep), respiratory complications, and implant-associated complications. 
The overall complication rate decreased by approximately 10% from 2004-2007 to 2012-2015. CONCLUSIONS This study demonstrates a substantial decrease in complication rates from 2004 to 2015 for patients with neuromuscular scoliosis undergoing spine surgery. Decreases in specific complications, such as surgical site infection, allow us to gauge our progress while observing how trends in management affect outcomes. Further study is needed to validate this report, but these results are encouraging, helping to reinforce efforts toward continual improvement in patient care.
Management system for the SND experiments
NASA Astrophysics Data System (ADS)
Pugachev, K.; Korol, A.
2017-09-01
A new management system for the SND detector experiments (at the VEPP-2000 collider in Novosibirsk) has been developed. We describe here the interaction between a user and the SND databases, which contain experiment configuration, conditions and metadata. The new system is designed in a client-server architecture, with several logical layers corresponding to the users' roles. A new template engine has been created, and a web application has been implemented using the Node.js framework. At present, the application provides viewing and editing of the configuration, viewing of experiment metadata and the experiment conditions data index, and viewing of the SND log (prototype).
Wang, Shur-Jen; Laulederkind, Stanley J F; Hayman, G Thomas; Petri, Victoria; Smith, Jennifer R; Tutaj, Marek; Nigam, Rajni; Dwinell, Melinda R; Shimoyama, Mary
2016-08-01
Cardiovascular diseases are complex diseases caused by a combination of genetic and environmental factors. To facilitate progress in complex disease research, the Rat Genome Database (RGD) provides the community with a disease portal where genome objects and biological data related to cardiovascular diseases are systematically organized. The purpose of this study is to present biocuration at RGD, including disease, genetic, and pathway data. The RGD curation team uses controlled vocabularies/ontologies to organize data curated from the published literature or imported from disease and pathway databases. These organized annotations are associated with genes, strains, and quantitative trait loci (QTLs), thus linking functional annotations to genome objects. Screen shots from the web pages are used to demonstrate the organization of annotations at RGD. The human cardiovascular disease genes identified by annotations were grouped according to data sources and their annotation profiles were compared by in-house tools and other enrichment tools available to the public. The analysis results show that the imported cardiovascular disease genes from ClinVar and OMIM are functionally different from the RGD manually curated genes in terms of pathway and Gene Ontology annotations. The inclusion of disease genes from other databases enriches the collection of disease genes not only in quantity but also in quality. Copyright © 2016 the American Physiological Society.
QBIC project: querying images by content, using color, texture, and shape
NASA Astrophysics Data System (ADS)
Niblack, Carlton W.; Barber, Ron; Equitz, Will; Flickner, Myron D.; Glasman, Eduardo H.; Petkovic, Dragutin; Yanker, Peter; Faloutsos, Christos; Taubin, Gabriel
1993-04-01
In the query by image content (QBIC) project we are studying methods to query large on-line image databases using the images' content as the basis of the queries. Examples of the content we use include color, texture, and shape of image objects and regions. Potential applications include medical ('Give me other images that contain a tumor with a texture like this one'), photo-journalism ('Give me images that have blue at the top and red at the bottom'), and many others in art, fashion, cataloging, retailing, and industry. Key issues include derivation and computation of attributes of images and objects that provide useful query functionality, retrieval methods based on similarity as opposed to exact match, query by image example or user-drawn image, the user interfaces, query refinement and navigation, high-dimensional database indexing, and automatic and semi-automatic database population. We currently have a prototype system written in X/Motif and C running on an RS/6000 that allows a variety of queries, and a test database of over 1000 images and 1000 objects populated from commercially available photo clip art images. In this paper we present the main algorithms for the color, texture, shape and sketch queries that we use, show example query results, and discuss future directions.
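The similarity-as-opposed-to-exact-match retrieval described in this abstract can be illustrated with histogram intersection, a classic colour-matching measure from this era of content-based retrieval. This is a generic sketch, not QBIC's actual feature pipeline; the 4-bin histograms below are invented toy data:

```python
# Histogram intersection: compare images by how much of the query's colour
# mass each candidate shares, rather than requiring an exact match.
def intersection(h1, h2):
    """Return the fraction of h1's mass covered by h2 (1.0 = identical)."""
    return sum(min(a, b) for a, b in zip(h1, h2)) / sum(h1)

query  = [10, 40, 30, 20]   # invented counts per colour bin
image1 = [12, 38, 28, 22]   # similar colour distribution
image2 = [40, 10, 20, 30]   # dissimilar colour distribution
print(intersection(query, image1) > intersection(query, image2))   # → True
```

Candidates can then be ranked by this score, which is how "give me images like this one" queries return a similarity-ordered result list rather than a yes/no match.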
Verification of road databases using multiple road models
NASA Astrophysics Data System (ADS)
Ziems, Marcel; Rottensteiner, Franz; Heipke, Christian
2017-08-01
In this paper a new approach for automatic road database verification based on remote sensing images is presented. In contrast to existing methods, the applicability of the new approach is not restricted to specific road types, context areas or geographic regions. This is achieved by combining several state-of-the-art road detection and road verification approaches that work well under different circumstances. Each one serves as an independent module representing a unique road model and a specific processing strategy. All modules provide independent solutions to the verification problem for each road object stored in the database, in the form of two probability distributions: one for the state of the database object (correct or incorrect), and one for the state of the underlying road model (applicable or not applicable). In accordance with Dempster-Shafer theory, both distributions are mapped to a new state space comprising the classes correct, incorrect and unknown. Statistical reasoning is applied to obtain the optimal state of a road object. A comparison with state-of-the-art road detection approaches using benchmark datasets shows that, in general, the proposed approach provides results with greater completeness. Additional experiments reveal that the proposed method can serve as the basis of a highly reliable semi-automatic approach for road database verification.
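The two-distribution output and the mapping into the {correct, incorrect, unknown} state space can be sketched in a few lines. This is an illustrative reading under assumed semantics, not the paper's exact formulation: the function names, the simple three-class frame, and all numbers below are invented.

```python
# Each module reports p_correct (state of the database object) and
# p_applicable (state of its road model). Probability mass not backed by an
# applicable model is assigned to "unknown", and two module outputs are
# fused with the standard Dempster combination rule for this simple frame.
def to_masses(p_correct, p_applicable):
    return {"correct": p_correct * p_applicable,
            "incorrect": (1 - p_correct) * p_applicable,
            "unknown": 1 - p_applicable}

def combine(m1, m2):
    # "unknown" plays the role of the full frame, so it conflicts with nothing.
    conflict = m1["correct"] * m2["incorrect"] + m1["incorrect"] * m2["correct"]
    k = 1 - conflict  # normalization after discarding conflicting mass
    return {
        "correct": (m1["correct"] * m2["correct"] + m1["correct"] * m2["unknown"]
                    + m1["unknown"] * m2["correct"]) / k,
        "incorrect": (m1["incorrect"] * m2["incorrect"] + m1["incorrect"] * m2["unknown"]
                      + m1["unknown"] * m2["incorrect"]) / k,
        "unknown": m1["unknown"] * m2["unknown"] / k,
    }

m = combine(to_masses(0.9, 0.8), to_masses(0.6, 0.3))
print(max(m, key=m.get))   # → correct
```

A module whose road model does not fit the scene contributes mostly "unknown" mass, so it weakens neither the correct nor the incorrect verdict of the other modules.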
ERIC Educational Resources Information Center
Criscuolo, Chiara; Martin, Ralf
2004-01-01
The main objective of this Working Paper is to show a set of indicators on the knowledge-based economy for China, mainly compiled from databases within EAS, although data from databases maintained by other parts of the OECD are included as well. These indicators are put in context by comparison with data for the United States, Japan and the EU (or…
Hydrogen Leak Detection Sensor Database
NASA Technical Reports Server (NTRS)
Baker, Barton D.
2010-01-01
This slide presentation reviews the characteristics of the Hydrogen Sensor database. The database is the result of NASA's continuing interest in and improvement of its ability to detect and assess gas leaks in space applications. The database specifics and a snapshot of an entry in the database are reviewed. Attempts were made to determine the applicability of each of the 65 sensors for ground and/or vehicle use.
Simple Logic for Big Problems: An Inside Look at Relational Databases.
ERIC Educational Resources Information Center
Seba, Douglas B.; Smith, Pat
1982-01-01
Discusses database design concept termed "normalization" (process replacing associations between data with associations in two-dimensional tabular form) which results in formation of relational databases (they are to computers what dictionaries are to spoken languages). Applications of the database in serials control and complex systems…
NASA Technical Reports Server (NTRS)
Baldwin, John; Zendejas, Silvino; Gutheinz, Sandy; Borden, Chester; Wang, Yeou-Fang
2009-01-01
Mission and Assets Database (MADB) Version 1.0 is an SQL database system with a Web user interface to centralize information. The database stores flight project support resource requirements, view periods, antenna information, schedule, and forecast results for use in mid-range and long-term planning of Deep Space Network (DSN) assets.
NASA Astrophysics Data System (ADS)
Yin, Lucy; Andrews, Jennifer; Heaton, Thomas
2018-05-01
Earthquake parameter estimation using nearest-neighbor searching in a large database of observations can produce reliable predictions. However, in the real-time application of Earthquake Early Warning (EEW) systems, accurate prediction using a large database is penalized by a significant delay in processing time. We propose using a multidimensional binary search tree (KD tree) data structure to organize large seismic databases and reduce the processing time of nearest-neighbor searches for prediction. We evaluated the performance of the KD tree on the Gutenberg Algorithm, a database-searching algorithm for EEW. We constructed an offline test to predict peak ground motions using a database with feature sets of waveform filter-bank characteristics, and compared the results with the observed seismic parameters. We concluded that a large database provides more accurate predictions of ground motion information, such as peak ground acceleration, velocity, and displacement (PGA, PGV, PGD), than of source parameters, such as hypocenter distance. Applying the KD tree search to organize the database reduced the average search time by 85% relative to the exhaustive method, making the method feasible for real-time implementation. The algorithm is straightforward, and the results will reduce the overall warning delivery time for EEW.
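The speed-up from organizing the database as a KD tree can be illustrated with a minimal 2-D nearest-neighbour search. This is a generic textbook sketch, not the Gutenberg Algorithm itself; the paper's feature vectors are waveform filter-bank characteristics, whereas the points below are invented toy data:

```python
# Minimal k-d tree: build splits on alternating axes; nearest() descends the
# near side first and only visits the far side when the splitting plane is
# closer than the best distance found so far.
import math

def build(points, depth=0):
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid], build(points[:mid], depth + 1),
            build(points[mid + 1:], depth + 1), axis)

def nearest(node, target, best=None):
    if node is None:
        return best
    point, left, right, axis = node
    d = math.dist(point, target)
    if best is None or d < best[1]:
        best = (point, d)
    diff = target[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, best)
    if abs(diff) < best[1]:  # hypersphere crosses the splitting plane
        best = nearest(far, target, best)
    return best

tree = build([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
print(nearest(tree, (9, 2))[0])   # → (8, 1)
```

Because whole half-spaces are pruned whenever the splitting plane lies farther away than the current best distance, expected query cost drops from linear scanning toward logarithmic for low-dimensional data, which is the kind of saving behind the reported 85% reduction in search time.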
Tiered Human Integrated Sequence Search Databases for Shotgun Proteomics.
Deutsch, Eric W; Sun, Zhi; Campbell, David S; Binz, Pierre-Alain; Farrah, Terry; Shteynberg, David; Mendoza, Luis; Omenn, Gilbert S; Moritz, Robert L
2016-11-04
The results of analysis of shotgun proteomics mass spectrometry data can be greatly affected by the selection of the reference protein sequence database against which the spectra are matched. For many species there are multiple sources from which somewhat different sequence sets can be obtained. This can lead to confusion about which database is best in which circumstances, a problem especially acute in human sample analysis. All sequence databases are genome-based, with sequences for the predicted genes and their protein translation products compiled. Our goal is to create a set of primary sequence databases that comprise the union of sequences from many of the different available sources and make the result easily available to the community. We have compiled a set of four sequence databases of varying sizes, from a small database consisting of only the ∼20,000 primary isoforms plus contaminants to a very large database that includes almost all nonredundant protein sequences from several sources. This set of tiered, increasingly complete human protein sequence databases suitable for mass spectrometry proteomics sequence database searching is called the Tiered Human Integrated Search Proteome set. In order to evaluate the utility of these databases, we have analyzed two different data sets, one from the HeLa cell line and the other from normal human liver tissue, with each of the four tiers of database complexity. The result is that approximately 0.8%, 1.1%, and 1.5% additional peptides can be identified for Tiers 2, 3, and 4, respectively, as compared with the Tier 1 database, at substantially increasing computational cost. This increase in computational cost may be worth bearing if the identification of sequence variants or the discovery of sequences that are not present in the reviewed knowledge base entries is an important goal of the study.
We find that it is useful to search a data set against a simpler database, and then check the uniqueness of the discovered peptides against a more complex database. We have set up an automated system that downloads all the source databases on the first of each month and automatically generates a new set of search databases and makes them available for download at http://www.peptideatlas.org/thisp/ .
Hahn, Lars; Leimeister, Chris-André; Ounit, Rachid; Lonardi, Stefano; Morgenstern, Burkhard
2016-10-01
Many algorithms for sequence analysis rely on word matching or word statistics. Often, these approaches can be improved if binary patterns representing match and don't-care positions are used as a filter, such that only those positions of words are considered that correspond to the match positions of the patterns. The performance of these approaches, however, depends on the underlying patterns. Herein, we show that the overlap complexity of a pattern set that was introduced by Ilie and Ilie is closely related to the variance of the number of matches between two evolutionarily related sequences with respect to this pattern set. We propose a modified hill-climbing algorithm to optimize pattern sets for database searching, read mapping and alignment-free sequence comparison of nucleic-acid sequences; our implementation of this algorithm is called rasbhari. Depending on the application at hand, rasbhari can either minimize the overlap complexity of pattern sets, maximize their sensitivity in database searching or minimize the variance of the number of pattern-based matches in alignment-free sequence comparison. We show that, for database searching, rasbhari generates pattern sets with slightly higher sensitivity than existing approaches. In our Spaced Words approach to alignment-free sequence comparison, pattern sets calculated with rasbhari led to more accurate estimates of phylogenetic distances than the randomly generated pattern sets that we previously used. Finally, we used rasbhari to generate patterns for short read classification with CLARK-S. Here too, the sensitivity of the results could be improved, compared to the default patterns of the program. We integrated rasbhari into Spaced Words; the source code of rasbhari is freely available at http://rasbhari.gobics.de/.
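The pattern-based word matching that rasbhari optimizes can be illustrated with a tiny spaced-word extractor. The 4-position pattern below is invented for illustration, not a rasbhari-optimized pattern set:

```python
# Spaced-word matching: only the '1' (match) positions of each window are
# compared; '0' positions are don't-care.
pattern = "1101"
care = [i for i, c in enumerate(pattern) if c == "1"]   # offsets [0, 1, 3]

def spaced_words(seq):
    """Set of spaced words obtained by sliding the pattern over seq."""
    return {tuple(seq[i + j] for j in care)
            for i in range(len(seq) - len(pattern) + 1)}

a, b = "ACGTACGA", "ACCTACGA"   # differ at the third position
shared = spaced_words(a) & spaced_words(b)
print(len(shared))   # → 3
```

The single mismatch between the two sequences falls on a don't-care position of the first window, so that spaced word still matches even though the corresponding contiguous 4-mer would not; this tolerance to substitutions is what makes the choice of pattern set matter for sensitivity and match-count variance.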
Bujak-Pietrek, Stella; Mikołajczyk, Urszula; Szadkowska-Stańczyk, Irena; Stroszejn-Mrowca, Grazyna
2008-01-01
To evaluate occupational exposure to dusts, the Nofer Institute of Occupational Medicine in Łódź, in collaboration with the Chief Sanitary Inspectorate, has developed a national database storing the results of routine dust exposure measurements performed by occupational hygiene and environmental protection laboratories in Poland in the years 2001-2005. It was assumed that the collected information would be useful for analyzing workers' exposure to dusts containing free crystalline silica (WKK) in Poland, identifying exceeded hygiene standards, and showing relevant trends that illustrate the dynamics of exposure over the years under study. Inhalable and respirable dust measurements using personal dosimetry were performed according to the Polish standards PN-91/Z-04030/05 and PN-91/Z-04030/06. In total, 148,638 measurement records, provided by sanitary inspection services from all over Poland, were entered into the database. The database enables the estimation of occupational exposure to dust by sector of the national economy, according to the Polish Classification of Activity (PKD), and by kind of dust. The highest exposure levels to inhalable and respirable dusts were found in coal mining; in this sector, almost 60% of surveys also showed exceeded current hygiene standards. High concentrations of both dust fractions (inhalable and respirable) and a considerable percentage of measurements exceeding hygiene standards were found in the manufacture of transport equipment (except cars), as well as in the chemical, mining (rock, sand, gravel and clay mines) and construction industries. The highest percentages of surveys (inhalable and respirable dust) showing exceeded hygiene standards were observed for coal dust with varying content of crystalline silica, organic dust containing more than 10% SiO2, and highly fibrogenic dust containing more than 50% SiO2.
A Global Geospatial Database of 5000+ Historic Flood Event Extents
NASA Astrophysics Data System (ADS)
Tellman, B.; Sullivan, J.; Doyle, C.; Kettner, A.; Brakenridge, G. R.; Erickson, T.; Slayback, D. A.
2017-12-01
A key dataset that is missing for global flood model validation and for understanding historic spatial flood vulnerability is a global geo-database of historical flood event extents. Decades of Earth-observing satellites and cloud computing now make it possible not only to detect floods in near real time, but to run these water detection algorithms back in time to capture the spatial extent of large numbers of specific events. This talk will show results from the largest global historical flood database developed to date. We use the Dartmouth Flood Observatory flood catalogue to map over 5000 floods (from 1985-2017) using the MODIS, Landsat, and Sentinel-1 satellites. All events are available for public download via the Earth Engine Catalogue and via a website that allows the user to query floods by area or date, assess population exposure trends over time, and download flood extents in geospatial format. In this talk, we will highlight major trends in global flood exposure per continent, land use type, and eco-region. We will also make suggestions on how to use this dataset in conjunction with other global datasets to i) validate global flood models, ii) assess the potential role of climatic change in flood exposure, iii) understand how urbanization and other land change processes may influence spatial flood exposure, iv) assess how innovative flood interventions (e.g. wetland restoration) influence flood patterns, v) control for event magnitude to assess the role of social vulnerability and damage assessment, and vi) aid in rapid probabilistic risk assessment to enable microinsurance markets. The authors are already using the database for the latter three applications and will show examples of wetland intervention analysis in Argentina, social vulnerability analysis in the USA, and microinsurance in India.
Zhu, Y B; Xie, X Q; Li, Z Y; Bai, H; Dong, L; Dong, Z P; Dong, J G
2014-08-28
Nucleotide-binding site (NBS) disease-resistance genes are the largest category of plant disease-resistance gene analogs. The complete set of candidate disease-resistance genes encoding NBS sequences was identified in the genomes of two varieties of foxtail millet (Yugu1 and 'Zhang gu'). This study investigated a number of characteristics of the putative NBS genes, such as structural diversity and phylogenetic relationships. A total of 269 and 281 NBS-coding sequences were identified in Yugu1 and 'Zhang gu', respectively. When the two databases were compared, 72 genes were found to be identical and 164 genes showed more than 90% similarity. Physical positioning and gene family analysis of the NBS disease-resistance genes in the genome revealed that the number of genes on each chromosome was similar in both varieties: the eighth chromosome contained the largest number of genes and the ninth chromosome the smallest. Exactly 34 gene clusters containing 161 genes were found in the Yugu1 genome, with each cluster containing 4.7 genes on average. In comparison, the 'Zhang gu' genome possessed 28 gene clusters comprising 151 genes, with an average of 5.4 genes per cluster. The largest gene cluster, located on the eighth chromosome, contained 12 genes in the Yugu1 database and 16 genes in the 'Zhang gu' database. The classification results showed that CC-NBS-LRR genes made up the largest part of each chromosome in the two databases. Two TIR-NBS genes were also found in the Yugu1 genome.
Mackey, Aaron J; Pearson, William R
2004-10-01
Relational databases are designed to integrate diverse types of information and manage large sets of search results, greatly simplifying genome-scale analyses. They are essential for the management and analysis of large-scale sequence analyses, and can also be used to improve the statistical significance of similarity searches by focusing on subsets of sequence libraries most likely to contain homologs. This unit describes using relational databases to improve the efficiency of sequence similarity searching and to demonstrate various large-scale genomic analyses of homology-related data. It covers the installation and use of a simple protein sequence database, seqdb_demo, which serves as the basis for the other protocols: basic use of the database to generate a novel sequence library subset, extending and using seqdb_demo for the storage of sequence similarity search results, and making use of various kinds of stored search results to address aspects of comparative genomic analysis.
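The library-subsetting idea can be sketched with Python's built-in sqlite3 module. The table name, columns, and data below are invented stand-ins, not the real seqdb_demo schema described in the unit:

```python
# Store protein records relationally, then pull a taxon-restricted subset to
# search against, focusing the similarity search on likely homologs.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE protein (acc TEXT PRIMARY KEY, taxon TEXT, seq TEXT)")
con.executemany("INSERT INTO protein VALUES (?, ?, ?)", [
    ("P1", "E. coli", "MKV..."),      # invented accessions and sequences
    ("P2", "H. sapiens", "MAD..."),
    ("P3", "E. coli", "MST..."),
])
rows = con.execute(
    "SELECT acc FROM protein WHERE taxon = ? ORDER BY acc", ("E. coli",)
).fetchall()
subset = [acc for (acc,) in rows]
print(subset)   # → ['P1', 'P3']
```

The same pattern extends naturally to a second table of search hits keyed by accession, so that stored similarity results can be joined back to the sequence records for genome-scale comparisons.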
NASA Astrophysics Data System (ADS)
Hargitai, Henrik
2016-10-01
We have created a metacatalog, or catalog of catalogs, of surface features of Mars that also includes the actual data in the catalogs listed. The goal is to make mesoscale surface feature databases available in one place, in a GIS-ready format. The databases can be imported directly into ArcGIS or other GIS platforms, such as Google Mars; some of the catalogs in our database are also ingested into the JMARS platform. All catalogs have been previously published in a peer-reviewed journal, but they may contain updates of the published catalogs. Many of the catalogs are "integrated", i.e., they merge databases or information from various papers on the same topic, including references to each individual feature listed. Where available, we have included shapefiles with polygon or linear features; however, most of the catalogs contain only point data for feature center points together with morphological data. One unexpected result of the planetary feature metacatalog is that some features have been described by several papers under different, conflicting designations. This shows the need for an identification system suitable for mesoscale (hundreds of meters to kilometers in size) features that tracks papers and thus prevents multiple naming of the same feature. The feature database can be used for multicriteria analysis of a terrain, enabling easy distribution-pattern analysis and correlation of the distributions of different landforms and features on Mars. Such a catalog makes the scientific evaluation of potential landing sites easier and more effective during the selection process, and also supports automated landing site selection. The catalog is accessible at https://planetarydatabase.wordpress.com/.
Adaptive Data-based Predictive Control for Short Take-off and Landing (STOL) Aircraft
NASA Technical Reports Server (NTRS)
Barlow, Jonathan Spencer; Acosta, Diana Michelle; Phan, Minh Q.
2010-01-01
Data-based Predictive Control is an emerging control method that stems from Model Predictive Control (MPC). MPC computes current control action based on a prediction of the system output a number of time steps into the future and is generally derived from a known model of the system. Data-based predictive control has the advantage of deriving predictive models and controller gains from input-output data. Thus, a controller can be designed from the outputs of complex simulation code or a physical system where no explicit model exists. If the output data happens to be corrupted by periodic disturbances, the designed controller will also have the built-in ability to reject these disturbances without the need to know them. When data-based predictive control is implemented online, it becomes a version of adaptive control. The characteristics of adaptive data-based predictive control are particularly appropriate for the control of nonlinear and time-varying systems, such as Short Take-off and Landing (STOL) aircraft. STOL is a capability of interest to NASA because conceptual Cruise Efficient Short Take-off and Landing (CESTOL) transport aircraft offer the ability to reduce congestion in the terminal area by utilizing existing shorter runways at airports, as well as to lower community noise by flying steep approach and climb-out patterns that reduce the noise footprint of the aircraft. In this study, adaptive data-based predictive control is implemented as an integrated flight-propulsion controller for the outer-loop control of a CESTOL-type aircraft. Results show that the controller successfully tracks velocity while attempting to maintain a constant flight path angle, using longitudinal command, thrust and flap setting as the control inputs.
Lugardon, Stephanie; Desboeuf, Karine; Fernet, Pierre; Montastruc, Jean-Louis; Lapeyre-Mestre, Maryse
2006-01-01
Aims There is evidence that the different methods used to identify and quantify adverse drug reactions (ADRs) in hospitals (spontaneous reporting or computerized medical databases) are not exhaustive. Combining these different sources of data could improve knowledge about ADR frequency in hospitals. The aim of this study was to estimate the incidence of serious ADRs handled in medical wards of a French university hospital using data from the Programme de Médicalisation des Systèmes d'Information (PMSI) and spontaneous reports recorded in the French Pharmacovigilance Database. Methods The study period was the first semester of 2001. From the PMSI, all hospitalization summaries including an ICD-10 code related to a potential ADR were selected. From the French Pharmacovigilance Database, all serious ADRs that occurred during the study period and were reported by physicians working in the university hospital were collected. After identification of common cases, the capture-recapture method was applied in order to estimate the real number of ADRs occurring during the first semester of 2001. Results From the PMSI, we identified 274 different hospital stays related to an ADR. Of 241 reports selected from the French Pharmacovigilance Database, we retained 151 ADRs for analysis. Fifty-two ADRs were common to the two databases, giving an estimated number of serious ADRs of 796 [95% confidence interval (CI) 638, 954], corresponding to 2.9% of inpatients (95% CI 2.3, 3.5). Conclusions This study shows the lack of exhaustiveness of ADR reporting whatever the source of data, and underlines the value of merging data from different databases to identify the real impact of ADRs in hospitals. PMID:16842398
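The capture-recapture arithmetic behind the estimate can be checked directly with the Lincoln-Petersen estimator, using the counts given in the abstract:

```python
# Lincoln-Petersen capture-recapture estimate from the abstract's counts:
# n1 = 274 PMSI stays, n2 = 151 pharmacovigilance reports, m = 52 in common.
n1, n2, m = 274, 151, 52
estimate = n1 * n2 / m   # total cases = n1 * n2 / overlap
print(round(estimate))   # → 796, matching the reported figure
```

The estimator assumes the two sources capture cases independently; the gap between 796 estimated cases and the 274 + 151 - 52 = 373 actually observed is what the authors mean by the lack of exhaustiveness of each source.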
Quantification of the Uncertainties for the Ares I A106 Ascent Aerodynamic Database
NASA Technical Reports Server (NTRS)
Houlden, Heather P.; Favaregh, Amber L.
2010-01-01
A detailed description of the quantification of uncertainties for the Ares I ascent aero 6-DOF wind tunnel database is presented. The database was constructed from wind tunnel test data and CFD results. The experimental data came from tests conducted in the Boeing Polysonic Wind Tunnel in St. Louis and the Unitary Plan Wind Tunnel at NASA Langley Research Center. The major sources of error for this database were: experimental error (repeatability), database modeling errors, and database interpolation errors.
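When error sources like the three named above are treated as independent, they are commonly combined in root-sum-square (RSS) fashion. A minimal sketch; the component magnitudes below are invented for illustration, not values from the Ares I database.

```python
# Hedged sketch: combining independent uncertainty components by
# root-sum-square, a common practice for wind tunnel databases.
# All component values are illustrative placeholders.
import math

def rss(*components):
    return math.sqrt(sum(c * c for c in components))

u_experiment = 0.010   # repeatability (illustrative)
u_modeling   = 0.020   # database modeling error (illustrative)
u_interp     = 0.005   # database interpolation error (illustrative)

total = rss(u_experiment, u_modeling, u_interp)
print(round(total, 4))  # 0.0229
```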
Real-time terrain storage generation from multiple sensors towards mobile robot operation interface.
Song, Wei; Cho, Seoungjae; Xi, Yulong; Cho, Kyungeun; Um, Kyhyun
2014-01-01
A mobile robot mounted with multiple sensors is used to rapidly collect 3D point clouds and video images so as to allow accurate terrain modeling. In this study, we develop a real-time terrain storage generation and representation system including a nonground point database (PDB), ground mesh database (MDB), and texture database (TDB). A voxel-based flag map is proposed for incrementally registering large-scale point clouds in a terrain model in real time. We quantize the 3D point clouds into 3D grids of the flag map as a comparative table in order to remove the redundant points. We integrate the large-scale 3D point clouds into a nonground PDB and a node-based terrain mesh using the CPU. Subsequently, we program a graphics processing unit (GPU) to generate the TDB by mapping the triangles in the terrain mesh onto the captured video images. Finally, we produce a nonground voxel map and a ground textured mesh as a terrain reconstruction result. Our proposed methods were tested in an outdoor environment. Our results show that the proposed system was able to rapidly generate terrain storage and provide high resolution terrain representation for mobile mapping services and a graphical user interface between remote operators and mobile robots.
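The flag-map registration step can be sketched as follows: quantize each incoming point to a grid cell and keep only the first point per cell, so redundant points from overlapping scans are discarded in constant time per point. The cell size and point format are assumptions; the actual system runs the texture stage on the GPU as described above.

```python
# Sketch of a voxel-based flag map for incremental point cloud
# registration: one occupancy flag per quantized 3D grid cell.
def register_points(points, flag_map, cell=0.1):
    kept = []
    for x, y, z in points:
        key = (int(x // cell), int(y // cell), int(z // cell))
        if key not in flag_map:        # cell not yet occupied
            flag_map[key] = True
            kept.append((x, y, z))
    return kept

flag_map = {}
scan1 = [(0.01, 0.02, 0.0), (0.52, 0.0, 0.0)]
scan2 = [(0.03, 0.01, 0.0), (1.01, 0.0, 0.0)]  # first point redundant
kept1 = register_points(scan1, flag_map)
kept2 = register_points(scan2, flag_map)
print(len(kept1), len(kept2))  # 2 1
```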
Jeddi, Fatemeh Rangraz; Farzandipoor, Mehrdad; Arabfard, Masoud; Hosseini, Azam Haj Mohammad
2016-01-01
Objective: The purpose of this study was to investigate the current situation and present a conceptual model for a clinical governance information system, using UML, in two sample hospitals. Background: Although the use of information is one of the fundamental components of clinical governance, information management receives little attention in practice. Material and Methods: A cross-sectional study was conducted from October 2012 to May 2013. Data were gathered through questionnaires and interviews in two sample hospitals. The face and content validity of the questionnaire was confirmed by experts. Data were first collected from a pilot hospital, revisions were made, and the final questionnaire was prepared. Data were analyzed with descriptive statistics using SPSS 16. Results: From the scenarios derived from the questionnaires, UML diagrams were produced using Rational Rose 7. The results showed that only 32.14 percent of the required indicators were calculated by the hospitals. No database had been designed, and 100 percent of the hospitals' clinical governance units required one. Conclusion: The hospitals' clinical governance units do not have access to all the indicators needed to perform their mission. Defining processes, drawing models, and creating a database are essential steps in designing such an information system. PMID:27147804
Human emotion detector based on genetic algorithm using lip features
NASA Astrophysics Data System (ADS)
Brown, Terrence; Fetanat, Gholamreza; Homaifar, Abdollah; Tsou, Brian; Mendoza-Schrock, Olga
2010-04-01
We predicted human emotion using a Genetic Algorithm (GA)-based lip feature extractor from facial images to classify all seven universal emotions: fear, happiness, dislike, surprise, anger, sadness and neutrality. First, we isolated the mouth from the input images using standard preprocessing steps such as Region of Interest (ROI) acquisition, grayscaling, histogram equalization, filtering, and edge detection. Next, the GA determined the optimal or near-optimal parameters of two ellipses that circumscribe the mouth and separate it into upper and lower lips. The fitness of the two ellipses was then evaluated, followed by training on a database of Japanese women's faces expressing all seven emotions. Finally, our proposed algorithm was tested using a published database containing emotions from several persons. The final results were then presented as confusion matrices. Our results showed an accuracy that varies from 20% to 60% across the seven emotions. The errors were mainly due to inaccuracies in the classification and to the different expressions in the given emotion database. Detailed analysis of these errors points to the limitation of detecting emotion from lip features alone. Whereas similar work [1] in the literature detected emotion in only one person, we have successfully extended our GA-based solution to several subjects.
Understanding the Influence of Environment on Adults’ Walking Experiences: A Meta-Synthesis Study
Dadpour, Sara; Pakzad, Jahanshah; Khankeh, Hamidreza
2016-01-01
The environment has an important impact on physical activity, especially walking. The relationship between the environment and walking is not the same as for other types of physical activity. This study seeks to comprehensively identify the environmental factors influencing walking and to show how those environmental factors impact on walking using the experiences of adults between the ages of 18 and 65. The current study is a meta-synthesis based on a systematic review. Seven databases of related disciplines were searched, including health, transportation, physical activity, architecture, and interdisciplinary databases. In addition to the databases, two journals were searched. Of the 11,777 papers identified, 10 met the eligibility criteria and quality for selection. Qualitative content analysis was used for analysis of the results. The four themes identified as influencing walking were “safety and security”, “environmental aesthetics”, “social relations”, and “convenience and efficiency”. “Convenience and efficiency” and “environmental aesthetics” could enhance the impact of “social relations” on walking in some aspects. In addition, “environmental aesthetics” and “social relations” could hinder the influence of “convenience and efficiency” on walking in some aspects. Given the results of the study, strategies are proposed to enhance the walking experience. PMID:27447660
Identification of Functionally Related Enzymes by Learning-to-Rank Methods.
Stock, Michiel; Fober, Thomas; Hüllermeier, Eyke; Glinca, Serghei; Klebe, Gerhard; Pahikkala, Tapio; Airola, Antti; De Baets, Bernard; Waegeman, Willem
2014-01-01
Enzyme sequences and structures are routinely used in the biological sciences as queries to search for functionally related enzymes in online databases. To this end, one usually departs from some notion of similarity, comparing two enzymes by looking for correspondences in their sequences, structures or surfaces. For a given query, the search operation results in a ranking of the enzymes in the database, from very similar to dissimilar enzymes, while information about the biological function of annotated database enzymes is ignored. In this work, we show that rankings of that kind can be substantially improved by applying kernel-based learning algorithms. This approach enables the detection of statistical dependencies between similarities of the active cleft and the biological function of annotated enzymes. This is in contrast to search-based approaches, which do not take annotated training data into account. Similarity measures based on the active cleft are known to outperform sequence-based or structure-based measures under certain conditions. We consider the Enzyme Commission (EC) classification hierarchy for obtaining annotated enzymes during the training phase. The results of a set of sizeable experiments indicate a consistent and significant improvement for a set of similarity measures that exploit information about small cavities in the surface of enzymes.
Face liveness detection using shearlet-based feature descriptors
NASA Astrophysics Data System (ADS)
Feng, Litong; Po, Lai-Man; Li, Yuming; Yuan, Fang
2016-07-01
Face recognition is a widely used biometric technology due to its convenience but it is vulnerable to spoofing attacks made by nonreal faces such as photographs or videos of valid users. The antispoof problem must be well resolved before widely applying face recognition in our daily life. Face liveness detection is a core technology to make sure that the input face is a live person. However, this is still very challenging using conventional liveness detection approaches of texture analysis and motion detection. The aim of this paper is to propose a feature descriptor and an efficient framework that can be used to effectively deal with the face liveness detection problem. In this framework, new feature descriptors are defined using a multiscale directional transform (shearlet transform). Then, stacked autoencoders and a softmax classifier are concatenated to detect face liveness. We evaluated this approach using the CASIA Face antispoofing database and replay-attack database. The experimental results show that our approach performs better than the state-of-the-art techniques following the provided protocols of these databases, and it is possible to significantly enhance the security of the face recognition biometric system. In addition, the experimental results also demonstrate that this framework can be easily extended to classify different spoofing attacks.
Nursing students' satisfaction with bilingual teaching in nursing courses in China: A meta-analysis.
Cai, Chunlian; Zhang, Chunmei; Wang, Yan; Xiong, Lina; Jin, Yanfei; Jin, Changde
2016-09-01
The aim of this meta-analysis is to systematically evaluate nursing students' satisfaction with the textbooks, teachers, teaching methods and overall teaching result in bilingual nursing teaching in China. Relevant cross-sectional studies were retrieved from multiple electronic databases, including PubMed, Web of Science, Chinese BioMed Database (CBM), China National Knowledge Infrastructure (CNKI) and WanFang Database, from inception to August 2015. Studies that measured students' satisfaction with textbooks, teachers, teaching methods, or overall teaching result in bilingual nursing teaching in China as outcomes were included. The data were independently extracted using a standardized form and analyzed with STATA (version 12.0). A total of thirty-four studies, including 3533 nursing students, were eligible for inclusion in the review. Meta-analyses revealed the following satisfaction rates: textbooks 64%, 95% CI (46%, 82%); teachers' teaching attitude 88%, 95% CI (84%, 92%); teachers' oral expression 60%, 95% CI (38%, 81%); teachers' pronunciation 90%, 95% CI (86%, 94%); teachers' teaching ability 71%, 95% CI (60%, 82%); teaching methods 69%, 95% CI (52%, 86%); and overall teaching result 80%, 95% CI (68%, 92%). Our results show that nursing students' satisfaction with the textbooks, teachers, teaching methods and overall teaching result in bilingual nursing teaching in China is not high. These findings suggest that future directions for improving bilingual teaching in China include establishing suitable bilingual teaching materials, training teaching faculty members and adopting proper teaching methods. Copyright © 2016 Elsevier Ltd. All rights reserved.
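The pooling behind such satisfaction rates can be sketched with a fixed-effect inverse-variance meta-analysis of proportions. The per-study counts below are invented for illustration, and the paper's actual model (fixed vs. random effects) is not stated in the abstract.

```python
# Hedged sketch: fixed-effect inverse-variance pooling of
# proportions across studies. Study counts are illustrative.
import math

def pool_proportions(studies):
    # studies: list of (events, n) pairs
    wsum = pwsum = 0.0
    for events, n in studies:
        p = events / n
        var = p * (1 - p) / n        # binomial variance of p
        w = 1.0 / var                # inverse-variance weight
        wsum += w
        pwsum += w * p
    pooled = pwsum / wsum
    se = math.sqrt(1.0 / wsum)
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se

pooled, lo, hi = pool_proportions([(60, 100), (140, 200), (45, 80)])
print(round(pooled, 3))  # 0.649
```

Larger studies and more extreme proportions get larger weights; random-effects models add a between-study variance term to each weight.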
Release of (and lessons learned from mining) a pioneering large toxicogenomics database.
Sandhu, Komal S; Veeramachaneni, Vamsi; Yao, Xiang; Nie, Alex; Lord, Peter; Amaratunga, Dhammika; McMillian, Michael K; Verheyen, Geert R
2015-07-01
We release the Janssen Toxicogenomics database. This rat liver gene-expression database was generated using Codelink microarrays, and has been used over the past years within Janssen to derive signatures for multiple end points and to classify proprietary compounds. The release consists of gene-expression responses to 124 compounds, selected to give a broad coverage of liver-active compounds. A selection of the compounds were also analyzed on Affymetrix microarrays. The release includes results of an in-house reannotation pipeline to Entrez gene annotations, to classify probes into different confidence classes. High confidence unambiguously annotated probes were used to create gene-level data which served as starting point for cross-platform comparisons. Connectivity map-based similarity methods show excellent agreement between Codelink and Affymetrix runs of the same samples. We also compared our dataset with the Japanese Toxicogenomics Project and observed reasonable agreement, especially for compounds with stronger gene signatures. We describe an R-package containing the gene-level data and show how it can be used for expression-based similarity searches. Comparing the same biological samples run on the Affymetrix and the Codelink platform, good correspondence is observed using connectivity mapping approaches. As expected, this correspondence is smaller when the data are compared with an independent dataset such as TG-GATE. We hope that this collection of gene-expression profiles will be incorporated in toxicogenomics pipelines of users.
Wang, Haibin; Jiang, Jiafu; Chen, Sumei; Qi, Xiangyu; Peng, Hui; Li, Pirui; Song, Aiping; Guan, Zhiyong; Fang, Weimin; Liao, Yuan; Chen, Fadi
2013-01-01
Background Simple sequence repeats (SSRs) are ubiquitous in eukaryotic genomes. Chrysanthemum is one of the largest genera in the Asteraceae family. Only a few Chrysanthemum expressed sequence tag (EST) sequences have been acquired to date, so the number of available EST-SSR markers is very low. Methodology/Principal Findings Illumina paired-end sequencing technology produced over 53 million sequencing reads from C. nankingense mRNA. The subsequent de novo assembly yielded 70,895 unigenes, of which 45,789 (64.59%) showed similarity to sequences in the NCBI database. Of these 45,789 sequences, 107 had hits in the Chrysanthemum Nr protein database, while 679 and 277 sequences had hits in the databases of Helianthus and Lactuca species, respectively. MISA software identified a large number of putative EST-SSRs, allowing 1,788 primer pairs to be designed from the de novo transcriptome sequence and a further 363 from archival EST sequence. Of 100 randomly chosen primer pairs, 81 produced amplicons and 20 were polymorphic in genotype analyses of Chrysanthemum. The results showed that most (but not all) of the assays were transferable across species and that they exposed a significant amount of allelic diversity. Conclusions/Significance SSR markers acquired by transcriptome sequencing are potentially useful for marker-assisted breeding and genetic analysis in the genus Chrysanthemum and its related genera. PMID:23626799
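An SSR search of the kind MISA performs can be sketched with a backreference regex that finds 2-6 bp motifs tandemly repeated. The repeat threshold of five used here is illustrative, not MISA's per-motif defaults.

```python
# Minimal microsatellite (SSR) finder: report (position, motif,
# repeat count) for 2-6 bp motifs repeated at least 5 times.
import re

SSR = re.compile(r'(([ACGT]{2,6}?)\2{4,})')

def find_ssrs(seq):
    return [(m.start(), m.group(2), len(m.group(1)) // len(m.group(2)))
            for m in SSR.finditer(seq)]

seq = "GGATATATATATATCCGCAGCAGCAGCAGCAGTT"
print(find_ssrs(seq))  # [(2, 'AT', 6), (16, 'GCA', 5)]
```

Note that a tandem repeat can be reported under any rotation of its motif ('GCA' here rather than 'CAG'); dedicated tools normalize motifs to a canonical rotation.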
Risk of injurious road traffic crash after prescription of antidepressants.
Orriols, Ludivine; Queinec, Raphaëlle; Philip, Pierre; Gadegbeku, Blandine; Delorme, Bernard; Moore, Nicholas; Suissa, Samy; Lagarde, Emmanuel
2012-08-01
To estimate the risk of road traffic crash associated with prescription of antidepressants. Data were extracted and matched from 3 French national databases: the national health care insurance database, police reports, and the national police database of injurious crashes. A case-control analysis comparing 34,896 responsible versus 37,789 nonresponsible drivers was conducted. Case-crossover analysis was performed to investigate the acute effect of medicine exposure. 72,685 drivers, identified by their national health care number, involved in an injurious crash in France from July 2005 to May 2008 were included. 2,936 drivers (4.0%) were exposed to at least 1 antidepressant on the day of the crash. The results showed a significant association between the risk of being responsible for a crash and prescription of antidepressants (odds ratio [OR] = 1.34; 95% CI, 1.22-1.47). The case-crossover analysis showed no association with treatment prescription, but the risk of road traffic crash increased after an initiation of antidepressant treatment (OR = 1.49; 95% CI, 1.24-1.79) and after a change in antidepressant treatment (OR = 1.32; 95% CI, 1.09-1.60). Patients and prescribers should be warned about the risk of crash during periods of treatment with antidepressant medication and about particularly high vulnerability periods such as those when a treatment is initiated or modified. © Copyright 2012 Physicians Postgraduate Press, Inc.
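The odds ratios quoted above come from 2x2 exposure-by-outcome tables. A minimal sketch of an odds ratio with a Woolf 95% confidence interval; the cell counts below are hypothetical, since the abstract does not give the underlying table.

```python
# Odds ratio with a Woolf (log-scale) 95% CI from a 2x2 table.
# Cell counts are invented for illustration.
import math

def odds_ratio(a, b, c, d):
    # a: exposed cases,    b: unexposed cases,
    # c: exposed controls, d: unexposed controls
    or_ = (a * d) / (b * c)
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)   # SE of ln(OR)
    lo = math.exp(math.log(or_) - 1.96 * se)
    hi = math.exp(math.log(or_) + 1.96 * se)
    return or_, lo, hi

or_, lo, hi = odds_ratio(120, 880, 90, 910)
print(round(or_, 2), round(lo, 2), round(hi, 2))  # 1.38 1.03 1.84
```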
LeishCyc: a guide to building a metabolic pathway database and visualization of metabolomic data.
Saunders, Eleanor C; MacRae, James I; Naderer, Thomas; Ng, Milica; McConville, Malcolm J; Likić, Vladimir A
2012-01-01
The complexity of the metabolic networks in even the simplest organisms has raised new challenges in organizing metabolic information. To address this, specialized computer frameworks have been developed to capture, manage, and visualize metabolic knowledge. The leading databases of metabolic information are those organized under the umbrella of the BioCyc project, which consists of the reference database MetaCyc, and a number of pathway/genome databases (PGDBs) each focussed on a specific organism. A number of PGDBs have been developed for bacterial, fungal, and protozoan pathogens, greatly facilitating dissection of the metabolic potential of these organisms and the identification of new drug targets. Leishmania are protozoan parasites belonging to the family Trypanosomatidae that cause a broad spectrum of diseases in humans. In this work we use the LeishCyc database, the BioCyc database for Leishmania major, to describe how to build a BioCyc database from genomic sequences and associated annotations. By using metabolomic data generated in our group, we show how such databases can be utilized to elucidate specific changes in parasite metabolism.
Selection of examples in case-based computer-aided decision systems
Mazurowski, Maciej A.; Zurada, Jacek M.; Tourassi, Georgia D.
2013-01-01
Case-based computer-aided decision (CB-CAD) systems rely on a database of previously stored, known examples when classifying new, incoming queries. Such systems can be particularly useful since they do not need retraining every time a new example is deposited in the case base. The adaptive nature of case-based systems is well suited to the current trend of continuously expanding digital databases in the medical domain. To maintain efficiency, however, such systems need sophisticated strategies to effectively manage the available evidence database. In this paper, we discuss the general problem of building an evidence database by selecting the most useful examples to store while satisfying existing storage requirements. We evaluate three intelligent techniques for this purpose: genetic algorithm-based selection, greedy selection and random mutation hill climbing. These techniques are compared to a random selection strategy used as the baseline. The study is performed with a previously presented CB-CAD system applied for false positive reduction in screening mammograms. The experimental evaluation shows that when the development goal is to maximize the system’s diagnostic performance, the intelligent techniques are able to reduce the size of the evidence database to 37% of the original database by eliminating superfluous and/or detrimental examples while at the same time significantly improving the CAD system’s performance. Furthermore, if the case-base size is a main concern, the total number of examples stored in the system can be reduced to only 2–4% of the original database without a decrease in the diagnostic performance. Comparison of the techniques shows that random mutation hill climbing provides the best balance between the diagnostic performance and computational efficiency when building the evidence database of the CB-CAD system. PMID:18854606
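Random mutation hill climbing over a fixed-size case subset can be sketched as below: swap one example in and one out per step, keeping the change only if the objective does not worsen. The additive "usefulness" score is a stand-in for the CAD system's actual diagnostic performance measure, which would require re-evaluating the classifier.

```python
# Sketch of random mutation hill climbing (RMHC) for selecting a
# fixed-size evidence subset that maximizes a score.
import random

def rmhc_select(scores, k, iters=2000, seed=0):
    rng = random.Random(seed)
    n = len(scores)
    current = set(rng.sample(range(n), k))
    best = sum(scores[i] for i in current)
    for _ in range(iters):
        out_i = rng.choice(sorted(current))                  # drop one
        in_i = rng.choice([i for i in range(n) if i not in current])
        cand = (current - {out_i}) | {in_i}                  # add one
        cand_score = sum(scores[i] for i in cand)
        if cand_score >= best:        # keep non-worsening mutations
            current, best = cand, cand_score
    return current, best

scores = [0.1, 0.9, 0.4, 0.8, 0.2, 0.7]   # per-example usefulness (toy)
subset, score = rmhc_select(scores, k=3)
print(sorted(subset))  # converges on the three most useful examples
```

With a non-additive objective (as in the paper, where examples interact through the classifier), RMHC no longer has this guarantee of reaching the optimum, which is why it is compared against GA and greedy selection.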
Risch, John S [Kennewick, WA; Dowson, Scott T [West Richland, WA; Hart, Michelle L [Richland, WA; Hatley, Wes L [Kennewick, WA
2008-05-13
A method of displaying correlations among information objects comprises receiving a query against a database; obtaining a query result set; and generating a visualization representing the components of the result set, the visualization including one of a plane and line to represent a data field, nodes representing data values, and links showing correlations among fields and values. Other visualization methods and apparatus are disclosed.
Risch, John S [Kennewick, WA; Dowson, Scott T [West Richland, WA
2012-03-06
A method of displaying correlations among information objects includes receiving a query against a database; obtaining a query result set; and generating a visualization representing the components of the result set, the visualization including one of a plane and line to represent a data field, nodes representing data values, and links showing correlations among fields and values. Other visualization methods and apparatus are disclosed.
A data model and database for high-resolution pathology analytical image informatics.
Wang, Fusheng; Kong, Jun; Cooper, Lee; Pan, Tony; Kurc, Tahsin; Chen, Wenjin; Sharma, Ashish; Niedermayr, Cristobal; Oh, Tae W; Brat, Daniel; Farris, Alton B; Foran, David J; Saltz, Joel
2011-01-01
The systematic analysis of imaged pathology specimens often results in a vast amount of morphological information at both the cellular and sub-cellular scales. While microscopy scanners and computerized analysis are capable of capturing and analyzing data rapidly, microscopy image data remain underutilized in research and clinical settings. One major obstacle which tends to reduce wider adoption of these new technologies throughout the clinical and scientific communities is the challenge of managing, querying, and integrating the vast amounts of data resulting from the analysis of large digital pathology datasets. This paper presents a data model, which addresses these challenges, and demonstrates its implementation in a relational database system. This paper describes a data model, referred to as Pathology Analytic Imaging Standards (PAIS), and a database implementation, which are designed to support the data management and query requirements of detailed characterization of micro-anatomic morphology through many interrelated analysis pipelines on whole-slide images and tissue microarrays (TMAs). (1) Development of a data model capable of efficiently representing and storing virtual slide related image, annotation, markup, and feature information. (2) Development of a database, based on the data model, capable of supporting queries for data retrieval based on analysis and image metadata, queries for comparison of results from different analyses, and spatial queries on segmented regions, features, and classified objects. The work described in this paper is motivated by the challenges associated with characterization of micro-scale features for comparative and correlative analyses involving whole-slide tissue images and TMAs. Technologies for digitizing tissues have advanced significantly in the past decade. Slide scanners are capable of producing high-magnification, high-resolution images from whole slides and TMAs within several minutes.
Hence, it is becoming increasingly feasible for basic, clinical, and translational research studies to produce thousands of whole-slide images. Systematic analysis of these large datasets requires efficient data management support for representing and indexing results from hundreds of interrelated analyses generating very large volumes of quantifications such as shape and texture and of classifications of the quantified features. We have designed a data model and a database to address the data management requirements of detailed characterization of micro-anatomic morphology through many interrelated analysis pipelines. The data model represents virtual slide related image, annotation, markup and feature information. The database supports a wide range of metadata and spatial queries on images, annotations, markups, and features. We currently have three databases running on a Dell PowerEdge T410 server with CentOS 5.5 Linux operating system. The database server is IBM DB2 Enterprise Edition 9.7.2. The set of databases consists of 1) a TMA database containing image analysis results from 4740 cases of breast cancer, with 641 MB storage size; 2) an algorithm validation database, which stores markups and annotations from two segmentation algorithms and two parameter sets on 18 selected slides, with 66 GB storage size; and 3) an in silico brain tumor study database comprising results from 307 TCGA slides, with 365 GB storage size. The latter two databases also contain human-generated annotations and markups for regions and nuclei. Modeling and managing pathology image analysis results in a database provide immediate benefits on the value and usability of data in a research study. The database provides powerful query capabilities, which are otherwise difficult or cumbersome to support by other approaches such as programming languages. 
Standardized, semantic annotated data representation and interfaces also make it possible to more efficiently share image data and analysis results.
The role of perceptual load in object recognition.
Lavie, Nilli; Lin, Zhicheng; Zokaei, Nahid; Thoma, Volker
2009-10-01
Predictions from perceptual load theory (Lavie, 1995, 2005) regarding object recognition across the same or different viewpoints were tested. Results showed that high perceptual load reduces distracter recognition levels despite always presenting distracter objects from the same view. They also showed that the levels of distracter recognition were unaffected by a change in the distracter object view under conditions of low perceptual load. These results were found both with repetition priming measures of distracter recognition and with performance on a surprise recognition memory test. The results support load theory proposals that distracter recognition critically depends on the level of perceptual load. The implications for the role of attention in object recognition theories are discussed. PsycINFO Database Record (c) 2009 APA, all rights reserved.
Guardado Yordi, E; Matos, M J; Pérez Martínez, A; Tornes, A C; Santana, L; Molina, E; Uriarte, E
2017-08-01
Coumarins are a group of phytochemicals that may be beneficial or harmful to health depending on their type and dosage and the matrix that contains them. Some of these compounds have been proven to display pro-oxidant and clastogenic activities. Therefore, in the current work, we have studied the coumarins that are present in food sources extracted from the Phenol-Explorer database in order to predict their clastogenic activity and identify the structure-activity relationships and genotoxic structural alerts using alternative methods in the field of computational toxicology. It was necessary to compile information on the type and amount of coumarins in different food sources through the analysis of databases of food composition available online. A virtual screening using a clastogenic model and different software, such as MODESLAB, ChemDraw and STATISTIC, was performed. As a result, a table of food composition was prepared and qualitative information from this data was extracted. The virtual screening showed that the esterified substituents inactivate molecules, while the methoxyl and hydroxyl substituents contribute to their activity and constitute, together with the basic structures of the studied subclasses, clastogenic structural alerts. Chemical subclasses of simple coumarins and furocoumarins were classified as active (xanthotoxin, isopimpinellin, esculin, scopoletin, scopolin and bergapten). In silico genotoxicity was mainly predicted for coumarins found in beer, sherry, dried parsley, fresh parsley and raw celery stalks. The results obtained can be interesting for the future design of functional foods and dietary supplements. These studies constitute a reference for the genotoxic chemoinformatic analysis of bioactive compounds present in databases of food composition.
Shier, Medhat K; Iles, James C; El-Wetidy, Mohammad S; Ali, Hebatallah H; Al Qattan, Mohammad M
2017-01-01
The source of HCV transmission in Saudi Arabia is unknown. This study aimed to determine HCV genotypes in a representative sample of chronically infected patients in Saudi Arabia. All HCV isolates were genotyped and subtyped by sequencing of the HCV core region and 54 new HCV isolates were identified. Three sets of primers targeting the core region were used for both amplification and sequencing of all isolates resulting in a 326 bp fragment. Most HCV isolates were genotype 4 (85%), whereas only a few isolates were recognized as genotype 1 (15%). With the assistance of Genbank database and BLAST, subtyping results showed that most of genotype 4 isolates were 4d whereas most of genotype 1 isolates were 1b. Nucleotide conservation and variation rates of HCV core sequences showed that 4a and 1b have the highest levels of variation. Phylogenetic analysis of sequences by Maximum Likelihood and Bayesian Coalescent methods was used to explore the source of HCV transmission by investigating the relationship between Saudi Arabia and other countries in the Middle East and Africa. Coalescent analysis showed that transmissions of HCV from Egypt to Saudi Arabia are estimated to have occurred in three major clusters: 4d was introduced into the country before 1900, the major 4a clade's MRCA was introduced between 1900 and 1920, and the remaining lineages were introduced between 1940 and 1960 from Egypt and Middle Africa. Results showed that no lineages seem to have crossed from Egypt to Saudi Arabia in the last 15 years. Finally, sequencing and characterization of new HCV isolates from Saudi Arabia will enrich the HCV database and help further studies related to treatment and management of the virus.
The Effect of Impurities on the Processing of Aluminum Alloys
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zi-Kui Liu; Shengjun Zhang; Qingyou Han
2007-04-23
For this Aluminum Industry of the Future (IOF) project, the effect of impurities on the processing of aluminum alloys was systematically investigated. The work was carried out as a collaborative effort between the Pennsylvania State University and Oak Ridge National Laboratory. Industrial support was provided by ALCOA and ThermoCalc, Inc. The achievements described below were made. A method that combines first-principles calculation and calculation of phase diagrams (CALPHAD) was used to develop the multicomponent database Al-Ca-K-Li-Mg-Na. This method was extensively used in this project for the development of a thermodynamic database. The first-principles approach provided some thermodynamic property data that are not available in the open literature. These calculated results were used in the thermodynamic modeling as experimental data. Some of the thermodynamic property data are difficult, if not impossible, to measure. The method developed and used in this project allows the estimation of these data for thermodynamic database development. The multicomponent database Al-Ca-K-Li-Mg-Na was developed. Elements such as Ca, Li, Na, and K are impurities that strongly affect the formability and corrosion behavior of aluminum alloys. However, these impurity elements are not included in the commercial aluminum alloy database. The process of thermodynamic modeling began from Al-Na, Ca-Li, Li-Na, K-Na, and Li-K sub-binary systems. Then ternary and higher systems were extrapolated because of the lack of experimental information. Databases for five binary alloy systems and two ternary systems were developed. Along with other existing binary and ternary databases, the full database of the multicomponent Al-Ca-K-Li-Mg-Na system was completed in this project. The methodology in integrating with commercial or other aluminum alloy databases can be developed. The mechanism of sodium-induced high-temperature embrittlement (HTE) of Al-Mg is now understood.
Using the thermodynamic database developed in this project, thermodynamic simulations were carried out to investigate the effect of sodium on the HTE of Al-Mg alloys. The simulation results indicated that the liquid miscibility gap resulting from the dissolved sodium in the molten material plays an important role in HTE. A liquid phase forms from the solid face-centered cubic (fcc) phase (most likely at grain boundaries) during cooling, resulting in the occurrence of HTE. Comparison of the thermodynamic simulation results with experimental measurements on the high-temperature ductility of an Al-5Mg-Na alloy shows that HTE occurs in the temperature range at which the liquid phase exists. Based on this fundamental understanding of the HTE mechanism during processing of aluminum alloy, an HTE sensitive zone and a hot-rolling safe zone of the Al-Mg-Na alloys are defined as functions of processing temperature and alloy composition. The tendency of HTE was evaluated based on thermodynamic simulations of the fraction of the intergranular sodium-rich liquid phase. Methods of avoiding HTE during rolling/extrusion of Al-Mg-based alloys were suggested. Energy and environmental benefits from the results of this project could occur through a number of avenues: (1) energy benefits accruing from reduced rejection rates of the aluminum sheet and bar, (2) reduced dross formation during the remelting of the aluminum rejects, and (3) reduced CO2 emission related to the energy savings. The sheet and extruded bar quantities produced in the United States during 2000 were 10,822 and 4,546 million pounds, respectively. It is assumed that 50% of the sheet and 10% of the bar will be affected by implementing the results of this project. With the current process, the rejection rate of sheet and bar is estimated at 5%. 
Assuming that at least half of the 5% rejection of sheet and bar will be eliminated by using the results of this project and that 4% of the aluminum will be lost through dross (Al2O3) during remelting of the rejects, the full-scale industrial implementation of the project results would lead to energy savings in excess of 6.2 trillion Btu/year and cost savings of $42.7 million by 2020.
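The liquid miscibility gap behind the HTE mechanism can be illustrated with the textbook regular-solution model (a pedagogical sketch, not the project's actual multicomponent database): a gap opens wherever the mixing free energy becomes non-convex, i.e. below the critical temperature T_c = Ω/(2R).

```python
import math

R = 8.314  # J/(mol*K), gas constant

def mixing_free_energy(x, omega, T):
    """Regular-solution molar Gibbs free energy of mixing (J/mol)."""
    return omega * x * (1 - x) + R * T * (x * math.log(x) + (1 - x) * math.log(1 - x))

def has_miscibility_gap(omega, T, n=2001):
    """A gap exists when G(x) is non-convex somewhere, i.e. d2G/dx2 < 0.
    For the regular solution, d2G/dx2 = -2*omega + R*T/(x*(1-x))."""
    for i in range(1, n):
        x = i / n
        if -2 * omega + R * T / (x * (1 - x)) < 0:
            return True
    return False

# Hypothetical interaction parameter; the critical point is T_c = omega/(2R) ~ 1203 K.
omega = 20000.0  # J/mol
print(has_miscibility_gap(omega, T=900))   # below T_c -> True (melt demixes)
print(has_miscibility_gap(omega, T=1400))  # above T_c -> False
```

The interaction parameter here is made up purely to show the qualitative behavior; real CALPHAD assessments use composition- and temperature-dependent parameters fitted to data.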
Ontology based heterogeneous materials database integration and semantic query
NASA Astrophysics Data System (ADS)
Zhao, Shuai; Qian, Quan
2017-10-01
Materials digital data, high-throughput experiments and high-throughput computations are regarded as the three key pillars of materials genome initiatives. With the fast growth of materials data, their integration and sharing have become urgent needs and a hot topic in materials informatics. Due to the lack of semantic description, it is difficult to integrate data deeply at the semantic level when adopting conventional heterogeneous database integration approaches such as federated databases or data warehouses. In this paper, a semantic integration method is proposed that creates a semantic ontology by extracting the database schema semi-automatically. Other heterogeneous databases are integrated into the ontology by means of relational algebra and the rooted graph. Based on the integrated ontology, semantic queries can be issued using SPARQL. In our experiments, two well-known first-principles computation databases, OQMD and the Materials Project, were used as integration targets, demonstrating the availability and effectiveness of our method.
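The SPARQL-style triple-pattern matching that such semantic queries rest on can be sketched with a tiny in-memory triple store; the ontology terms and material entries below are hypothetical, not drawn from OQMD or the Materials Project.

```python
# Minimal triple store illustrating the triple-pattern matching at the core
# of SPARQL; all entity and property names here are made up for illustration.
triples = [
    ("mat:Fe2O3", "rdf:type",       "onto:Material"),
    ("mat:Fe2O3", "onto:bandGapEV", "2.1"),
    ("mat:Fe2O3", "onto:sourceDB",  "OQMD"),
    ("mat:TiO2",  "rdf:type",       "onto:Material"),
    ("mat:TiO2",  "onto:bandGapEV", "3.0"),
    ("mat:TiO2",  "onto:sourceDB",  "Materials Project"),
]

def match(pattern):
    """Yield triples matching a pattern; None in a slot acts like a SPARQL variable."""
    for s, p, o in triples:
        if all(q is None or q == v for q, v in zip(pattern, (s, p, o))):
            yield (s, p, o)

# "SELECT ?m WHERE { ?m onto:sourceDB 'OQMD' }" expressed as one pattern:
oqmd_materials = [s for s, _, _ in match((None, "onto:sourceDB", "OQMD"))]
print(oqmd_materials)  # ['mat:Fe2O3']
```

A real deployment would use an RDF engine with a SPARQL endpoint; this sketch only shows why a shared ontology makes records from different source databases queryable with one pattern language.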
ERIC Educational Resources Information Center
Stewart, Endya B.
2008-01-01
This research examines the extent to which individual-level and school structural variables are predictors of academic achievement among a sample of 10th grade students abstracted from the National Educational Longitudinal Study database. A secondary analysis of the data produced the following findings. The study results show that individual-level…
"Mr. Database" : Jim Gray and the History of Database Technologies.
Hanwahr, Nils C
2017-12-01
Although the widespread use of the term "Big Data" is comparatively recent, it invokes a phenomenon in the developments of database technology with distinct historical contexts. The database engineer Jim Gray, known as "Mr. Database" in Silicon Valley before his disappearance at sea in 2007, was involved in many of the crucial developments since the 1970s that constitute the foundation of exceedingly large and distributed databases. Jim Gray was involved in the development of relational database systems based on the concepts of Edgar F. Codd at IBM in the 1970s before he went on to develop principles of Transaction Processing that enable the parallel and highly distributed performance of databases today. He was also involved in creating forums for discourse between academia and industry, which influenced industry performance standards as well as database research agendas. As a co-founder of the San Francisco branch of Microsoft Research, Gray increasingly turned toward scientific applications of database technologies, e. g. leading the TerraServer project, an online database of satellite images. Inspired by Vannevar Bush's idea of the memex, Gray laid out his vision of a Personal Memex as well as a World Memex, eventually postulating a new era of data-based scientific discovery termed "Fourth Paradigm Science". This article gives an overview of Gray's contributions to the development of database technology as well as his research agendas and shows that central notions of Big Data have been occupying database engineers for much longer than the actual term has been in use.
Reactome graph database: Efficient access to complex pathway data
Korninger, Florian; Viteri, Guilherme; Marin-Garcia, Pablo; Ping, Peipei; Wu, Guanming; Stein, Lincoln; D’Eustachio, Peter
2018-01-01
Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. One of its main priorities is to provide easy and efficient access to its high quality curated data. At present, biological pathway databases typically store their contents in relational databases. This limits access efficiency because there are performance issues associated with queries traversing highly interconnected data. The same data in a graph database can be queried more efficiently. Here we present the rationale behind the adoption of a graph database (Neo4j) as well as the new ContentService (REST API) that provides access to these data. The Neo4j graph database and its query language, Cypher, provide efficient access to the complex Reactome data model, facilitating easy traversal and knowledge discovery. The adoption of this technology greatly improved query efficiency, reducing the average query time by 93%. The web service built on top of the graph database provides programmatic access to Reactome data by object oriented queries, but also supports more complex queries that take advantage of the new underlying graph-based data storage. By adopting graph database technology we are providing a high performance pathway data resource to the community. The Reactome graph database use case shows the power of NoSQL database engines for complex biological data types. PMID:29377902
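The efficiency argument (multi-hop traversals that need recursive self-joins in a relational store become simple walks in a graph) can be sketched with a toy pathway hierarchy; the node names are illustrative and do not reproduce Reactome's actual schema.

```python
from collections import deque

# Toy pathway hierarchy stored as adjacency lists; in Cypher this traversal
# would be roughly MATCH (p {name: $root})-[:hasEvent*]->(e) RETURN e.
has_event = {
    "Signal Transduction": ["MAPK cascade", "PI3K signaling"],
    "MAPK cascade": ["RAF activation"],
    "PI3K signaling": [],
    "RAF activation": [],
}

def descendant_events(root):
    """Breadth-first walk over hasEvent edges - the kind of multi-hop query
    that needs recursive self-joins in SQL but is a single graph walk here."""
    seen, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        for child in has_event.get(node, []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen

print(descendant_events("Signal Transduction"))
# ['MAPK cascade', 'PI3K signaling', 'RAF activation']
```

In a relational schema the same query cost grows with every join hop; in the graph each hop is a constant-time edge lookup, which is the source of the reported query-time reduction.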
NASA Astrophysics Data System (ADS)
Magi, B. I.; Marlon, J. R.; Mouillot, F.; Daniau, A. L.; Bartlein, P. J.; Schaefer, A.
2017-12-01
Fire is intertwined with climate variability and human activities in terms of both its causes and consequences, and the most complete understanding will require a multidisciplinary approach. The focus of this study is to compare data-based records of variability in climate and human activities with fire and land-cover change records over the past 250 years in North America and Europe. The past 250 years is a critical period for contextualizing the present-day impact of human activities on climate. Data are from the Global Charcoal Database (GCD) and from historical reconstructions of past burning. The GCD comprises sediment records of charcoal accumulation rates collected around the world by dozens of researchers and facilitated by the PAGES Global Paleofire Working Group. The historical reconstruction extends back to 1750 CE and is based on literature and government records when available, completed with non-charcoal proxies, including tree-ring scars or storylines, when data are missing. The key data sets are independent records, and the methods and results are independent of any climate- or fire-model simulations. Results are presented for Europe and for subsets of North America. Analysis of fire trends from the GCD and the historical reconstruction shows broad agreement, with some regional variations as expected. The western USA and North America in general show the best agreement, with present-day departures between the GCD and historical-reconstruction fire trends that may reflect limits in the data themselves. Eastern North America shows agreement, with an increase in fire from 1750 to 1900 and a strong decreasing trend thereafter. We present ideas for why the trends agree and disagree relative to historical events and to the sequence of land-cover change in the regions of interest.
Together with careful consideration of uncertainties in the data, these results can be used to constrain Earth System Model simulations of both past fire, which explicitly incorporate historical fire emissions, and the pathways of future fire on a warmer planet.
The performance of disk arrays in shared-memory database machines
NASA Technical Reports Server (NTRS)
Katz, Randy H.; Hong, Wei
1993-01-01
In this paper, we examine how disk arrays and shared memory multiprocessors lead to an effective method for constructing database machines for general-purpose complex query processing. We show that disk arrays can lead to cost-effective storage systems if they are configured from suitably small formfactor disk drives. We introduce the storage system metric data temperature as a way to evaluate how well a disk configuration can sustain its workload, and we show that disk arrays can sustain the same data temperature as a more expensive mirrored-disk configuration. We use the metric to evaluate the performance of disk arrays in XPRS, an operational shared-memory multiprocessor database system being developed at the University of California, Berkeley.
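Data temperature is commonly defined as the access rate a configuration can sustain per unit of stored data; a minimal sketch under that assumption, with purely hypothetical drive figures (not the XPRS measurements):

```python
def data_temperature(io_per_second, capacity_gb):
    """Data temperature: sustainable accesses per second per gigabyte stored."""
    return io_per_second / capacity_gb

# Hypothetical figures for illustration: an array of eight small-form-factor
# 2 GB drives at 60 IO/s each, versus a mirrored pair of 9 GB drives at
# 90 IO/s each (mirroring stores one copy, but both arms can serve reads).
array_temp = data_temperature(8 * 60, 8 * 2)   # 480 IO/s over 16 GB
mirror_temp = data_temperature(2 * 90, 9)      # 180 IO/s over 9 GB usable
print(array_temp, mirror_temp)  # 30.0 20.0
```

The point of the metric is visible even in this toy comparison: many small spindles raise the sustainable IO rate per gigabyte, which is why suitably small form-factor drives make arrays cost-effective.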
Information Retrieval in Telemedicine: a Comparative Study on Bibliographic Databases
Ahmadi, Maryam; Sarabi, Roghayeh Ershad; Orak, Roohangiz Jamshidi; Bahaadinbeigy, Kambiz
2015-01-01
Background and Aims: The first step in each systematic review is selection of the most valid database, i.e. the one that can provide the highest number of relevant references. This study was carried out to determine the most suitable database for information retrieval in the telemedicine field. Methods: The CINAHL, PubMed, Web of Science and Scopus databases were searched for telemedicine in combination with education, cost-benefit and patient satisfaction. After analysis of the obtained results, the accuracy coefficient, sensitivity, uniqueness and overlap of the databases were calculated. Results: The studied databases differed in the number of retrieved articles. PubMed was identified as the most suitable database for retrieving information on the selected topics, with accuracy and sensitivity ratios of 50.7% and 61.4% respectively. The uniqueness percentage of retrieved articles ranged from 38% for PubMed to 3.0% for CINAHL. The highest overlap rate (18.6%) was found between PubMed and Web of Science. Less than 1% of articles had been indexed in all searched databases. Conclusion: PubMed is suggested as the most suitable database for starting a search in telemedicine; after PubMed, Scopus and Web of Science can retrieve about 90% of the relevant articles. PMID:26236086
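The accuracy coefficient and sensitivity reported above are precision- and recall-style ratios; a minimal sketch, with hypothetical counts chosen only to echo the reported percentages (they are not the study's actual retrieval counts):

```python
def accuracy_coefficient(relevant_retrieved, total_retrieved):
    """Precision-style ratio: share of retrieved records that are relevant."""
    return relevant_retrieved / total_retrieved

def sensitivity(relevant_retrieved, total_relevant):
    """Recall-style ratio: share of all relevant records that were retrieved."""
    return relevant_retrieved / total_relevant

# Hypothetical counts for one database search:
print(accuracy_coefficient(71, 140))  # ~ 0.507
print(sensitivity(71, 115))           # ~ 0.617
```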
ARACHNID: A prototype object-oriented database tool for distributed systems
NASA Technical Reports Server (NTRS)
Younger, Herbert; Oreilly, John; Frogner, Bjorn
1994-01-01
This paper discusses the results of a Phase 2 SBIR project sponsored by NASA and performed by MIMD Systems, Inc. A major objective of this project was to develop specific concepts for improved performance in accessing large databases. An object-oriented and distributed approach was used for the general design, while a geographical decomposition was used as a specific solution. The resulting software framework is called ARACHNID. The Faint Source Catalog developed by NASA was the initial database testbed. This is a database of many giga-bytes, where an order of magnitude improvement in query speed is being sought. This database contains faint infrared point sources obtained from telescope measurements of the sky. A geographical decomposition of this database is an attractive approach to dividing it into pieces. Each piece can then be searched on individual processors with only a weak data linkage between the processors being required. As a further demonstration of the concepts implemented in ARACHNID, a tourist information system is discussed. This version of ARACHNID is the commercial result of the project. It is a distributed, networked, database application where speed, maintenance, and reliability are important considerations. This paper focuses on the design concepts and technologies that form the basis for ARACHNID.
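The geographical-decomposition idea can be sketched as bucketing point sources by sky region so that each bucket can be searched independently (e.g. on its own processor) with no cross-bucket data linkage; the coordinates and names below are made up, not Faint Source Catalog entries.

```python
# Hypothetical infrared point sources with right ascension (ra) and
# declination (dec) in degrees.
sources = [
    {"name": "S1", "ra": 12.3,  "dec": -5.0},
    {"name": "S2", "ra": 98.7,  "dec": 40.2},
    {"name": "S3", "ra": 101.5, "dec": 41.0},
]

def partition_by_ra(points, band_deg=30):
    """Assign each point to a fixed-width RA band; each band is one partition."""
    buckets = {}
    for p in points:
        buckets.setdefault(int(p["ra"] // band_deg), []).append(p)
    return buckets

def query(buckets, ra_min, ra_max, band_deg=30):
    """Only the partitions overlapping the query window are scanned."""
    hits = []
    for b in range(int(ra_min // band_deg), int(ra_max // band_deg) + 1):
        hits += [p for p in buckets.get(b, []) if ra_min <= p["ra"] <= ra_max]
    return [p["name"] for p in hits]

buckets = partition_by_ra(sources)
print(query(buckets, 90.0, 110.0))  # ['S2', 'S3']
```

Because a query touches only the bands it overlaps, each band can live on a separate processor and be searched in parallel, which is the speedup mechanism the decomposition targets.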
NASA Astrophysics Data System (ADS)
Bashev, A.
2012-04-01
Currently there is an enormous number of geoscience databases. Unfortunately, the only users of most of these databases are their creators. There are several reasons for this: incompatibility, the specificity of tasks and objects, and so on. However, the main obstacles to wide usage of geoscience databases are their complexity for developers and their complication for users. Complex architecture leads to high costs that block public access, and complication prevents users from understanding when and how to use a database. Only databases associated with GoogleMaps avoid these drawbacks, but they could hardly be called "geoscience" databases. Nevertheless, an open and simple geoscience database is necessary, at least for educational purposes (see our abstract for ESSI20/EOS12). We have developed a database and a web interface to work with it, now accessible at maps.sch192.ru. In this database, a result is the value of a parameter (no matter which) at a station with a certain position, associated with metadata: the date when the result was obtained, the type of station (lake, soil, etc.) and the contributor that sent the result. Each contributor has their own profile, which allows the reliability of the data to be estimated. The results can be represented on a GoogleMaps space image as points at their positions, coloured according to the value of the parameter. There are default colour scales, and each registered user can create their own. The results can also be extracted as a *.csv file. For both types of representation one can select the data by date, object type, parameter type, area and contributor. The data are uploaded in *.csv format: Name of the station; Latitude (dd.dddddd); Longitude (ddd.dddddd); Station type; Parameter type; Parameter value; Date (yyyy-mm-dd). The contributor is recognised on login. This is the minimal set of features required to connect the value of a parameter with a position and see the results.
All complicated data treatment can be conducted in other programs after extracting the filtered data into a *.csv file, which makes the database understandable for non-experts. The database employs an open data format (*.csv) and widespread tools: PHP as the programming language, MySQL as the database management system, JavaScript for interaction with GoogleMaps and JQueryUI to create the user interface. The database is multilingual: association tables connect translations with elements of the database. In total, the development required about 150 hours. The database still has several problems. The main one is the reliability of the data: properly addressing it would require an expert system for estimating reliability, but elaborating such a system would take more resources than the database itself. The second is the problem of stream selection: how to select stations that are connected with each other (for example, belonging to one water stream) and indicate their sequence. Some problems we have already solved. For example, the "same station" problem (sometimes the distance between stations is smaller than the positional error): when a new station is added, our application automatically finds existing stations near that place. We have also solved the problem of object and parameter types (how to regard "EC" and "electrical conductivity" as the same parameter) using associative tables. Currently the interface is in English and Russian, but it can easily be translated into your language: just contact us and we will send you the list of terms and phrases to translate. The main advantage of the database is that it is totally open: everybody can see and extract the data and use them for non-commercial purposes free of charge. Registered users can contribute to the database without getting paid.
We hope that it will be widely used, first of all for educational purposes, but professional scientists could also find it useful.
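The semicolon-delimited upload format described above parses directly with standard CSV tooling; a minimal sketch with a made-up record:

```python
import csv
import io

# One record in the upload format described above (semicolon-delimited);
# the station name and values are invented for illustration.
upload = "Lake A;55.123456;037.654321;lake;EC;340;2012-03-15\n"

fields = ["name", "latitude", "longitude", "station_type",
          "parameter", "value", "date"]
reader = csv.reader(io.StringIO(upload), delimiter=";")
records = [dict(zip(fields, row)) for row in reader]
print(records[0]["parameter"], records[0]["value"])  # EC 340
```

A server-side importer would add validation (coordinate ranges, date format) and the "same station" proximity check before inserting into MySQL; this sketch only shows the parse step.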
GPCALMA: A Tool For Mammography With A GRID-Connected Distributed Database
NASA Astrophysics Data System (ADS)
Bottigli, U.; Cerello, P.; Cheran, S.; Delogu, P.; Fantacci, M. E.; Fauci, F.; Golosio, B.; Lauria, A.; Lopez Torres, E.; Magro, R.; Masala, G. L.; Oliva, P.; Palmiero, R.; Raso, G.; Retico, A.; Stumbo, S.; Tangaro, S.
2003-09-01
The GPCALMA (Grid Platform for Computer Assisted Library for MAmmography) collaboration involves several departments of physics, INFN (National Institute of Nuclear Physics) sections, and Italian hospitals. The aim of this collaboration is to develop a tool that can help radiologists in the early detection of breast cancer. GPCALMA has built a large distributed database of digitised mammographic images (about 5500 images corresponding to 1650 patients) and developed CAD (Computer Aided Detection) software which is integrated in a station that can also be used to acquire new images, serve as an archive and perform statistical analyses. The images (18×24 cm², digitised by a CCD linear scanner with an 85 μm pitch and 4096 gray levels) are completely described: pathological ones have a characterization consistent with the radiologist's diagnosis and histological data, and non-pathological ones correspond to patients with a follow-up of at least three years. The distributed database is realized through the connection of all the hospitals and research centers using GRID technology. In each hospital, local patients' digital images are stored in the local database. Using the GRID connection, GPCALMA allows each node to work on distributed database data as well as on local data. Using its database, the GPCALMA tools perform several analyses. Texture analysis, i.e. automated classification into adipose, dense or glandular texture, can be provided by the system. The GPCALMA software also allows classification of pathological features, in particular analysis of massive lesions (both opacities and spiculated lesions) and of microcalcification clusters. The detection of pathological features is performed by neural network software that selects areas showing a given "suspicion level" of lesion occurrence. The performance of the GPCALMA system will be presented in terms of ROC (Receiver Operating Characteristic) curves. The results of the GPCALMA system used as a "second reader" will also be presented.
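ROC curves of the kind used to report CAD performance are traced by sweeping a decision threshold over the detector's suspicion levels; a minimal sketch with hypothetical scores, not GPCALMA output:

```python
def roc_points(scores, labels, thresholds):
    """(FPR, TPR) pairs for a scored binary classifier - the points that
    trace a ROC curve as the decision threshold is swept."""
    pos = sum(labels)
    neg = len(labels) - pos
    pts = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        pts.append((fp / neg, tp / pos))
    return pts

# Hypothetical "suspicion levels" for 4 lesion regions and 4 normal regions.
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
labels = [1,   1,   1,   1,   0,   0,   0,   0]
print(roc_points(scores, labels, thresholds=[0.5, 0.25]))
# [(0.25, 0.75), (0.5, 1.0)]
```

Lowering the threshold moves along the curve toward higher sensitivity at the cost of more false positives, which is exactly the trade-off a "second reader" configuration must balance.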
Ukrainian Database and Atlas of Light Curves of Artificial Space Objects
NASA Astrophysics Data System (ADS)
Koshkin, N.; Savanevich, V.; Pohorelov, A.; Shakun, L.; Zhukov, V.; Korobeynikova, E.; Strakhova, S.; Moskalenko, S.; Kashuba, V.; Krasnoshchokov, A.
This paper describes the Ukrainian database of long-term photometric observations of resident space objects (RSOs). To make this database usable for outer space monitoring and space situational awareness (SSA), an open internet resource has been developed. The paper shows examples of using the Atlas of RSO light curves to analyze the rotation about the center of mass of several active and non-functioning satellites in orbit.
Ambiguity and variability of database and software names in bioinformatics.
Duck, Geraint; Kovacevic, Aleksandar; Robertson, David L; Stevens, Robert; Nenadic, Goran
2015-01-01
There are numerous options available to achieve various tasks in bioinformatics, but until recently, there were no tools that could systematically identify mentions of databases and tools within the literature. In this paper we explore the variability and ambiguity of database and software name mentions and compare dictionary and machine learning approaches to their identification. Through the development and analysis of a corpus of 60 full-text documents manually annotated at the mention level, we report high variability and ambiguity in database and software mentions. On a test set of 25 full-text documents, a baseline dictionary look-up achieved an F-score of 46 %, highlighting not only variability and ambiguity but also the extensive number of new resources introduced. A machine learning approach achieved an F-score of 63 % (with precision of 74 %) and 70 % (with precision of 83 %) for strict and lenient matching respectively. We characterise the issues with various mention types and propose potential ways of capturing additional database and software mentions in the literature. Our analyses show that identification of mentions of databases and tools is a challenging task that cannot be achieved by relying on current manually-curated resource repositories. Although machine learning shows improvement and promise (primarily in precision), more contextual information needs to be taken into account to achieve a good degree of accuracy.
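A baseline dictionary look-up and its strict-match F-score can be sketched as follows; the dictionary and sentence are toy examples, not the paper's curated resources or corpus.

```python
def f_score(predicted, gold):
    """Strict-match precision, recall and F1 over sets of (offset, name) mentions."""
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Baseline dictionary look-up over a toy sentence; "FooDB" stands in for a
# newly introduced resource the dictionary cannot know about.
dictionary = {"BLAST", "GenBank"}
text = "We searched GenBank with BLAST and the new FooDB resource."
predicted = {(text.find(w), w) for w in dictionary if w in text}
gold = {(text.find("GenBank"), "GenBank"), (text.find("BLAST"), "BLAST"),
        (text.find("FooDB"), "FooDB")}
p, r, f = f_score(predicted, gold)
print(round(p, 2), round(r, 2), round(f, 2))  # 1.0 0.67 0.8
```

The toy result mirrors the paper's finding: dictionary look-up can be precise on known names but misses newly introduced resources, capping recall and hence F-score.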
Scholarly Online Database Use in Higher Education: A Faculty Survey.
ERIC Educational Resources Information Center
Piotrowski, Chris; Perdue, Bob; Armstrong, Terry
2005-01-01
The present study reports the results of a survey conducted at the University of West Florida concerning faculty usage and views toward online databases. Most respondents (N=46) felt quite satisfied with scholarly database availability through the university library. However, some faculty suggested that databases such as Current Contents and…
University Faculty Use of Computerized Databases: An Assessment of Needs and Resources.
ERIC Educational Resources Information Center
Borgman, Christine L.; And Others
1985-01-01
Results of survey indicate that: academic faculty are unaware of range of databases available; few recognize need for databases in research; most delegate searching to librarian or assistant, rather than perform searching themselves; and 39 database guides identified tended to be descriptive rather than evaluative. A comparison of the guides is…
The Effectiveness of Aromatherapy for Depressive Symptoms: A Systematic Review.
Sánchez-Vidaña, Dalinda Isabel; Ngai, Shirley Pui-Ching; He, Wanjia; Chow, Jason Ka-Wing; Lau, Benson Wui-Man; Tsang, Hector Wing-Hong
2017-01-01
Background. Depression is one of the greatest health concerns, affecting 350 million people globally. Aromatherapy is a popular CAM intervention chosen by people with depression. Due to the growing popularity of aromatherapy for alleviating depressive symptoms, in-depth evaluation of the evidence-based clinical efficacy of aromatherapy is urgently needed. Purpose. This systematic review aims to provide an analysis of the clinical evidence on the efficacy of aromatherapy for depressive symptoms in any type of patient. Methods. A systematic database search was carried out using predefined search terms in 5 databases: AMED, CINAHL, CCRCT, MEDLINE, and PsycINFO. Outcome measures included scales measuring the level of depressive symptoms. Results. Twelve randomized controlled trials were included, covering two administration methods for the aromatherapy intervention: inhaled aromatherapy (5 studies) and massage aromatherapy (7 studies). Seven studies showed improvement in depressive symptoms. Limitations. The quality of half of the included studies is low, the administration protocols varied considerably among the studies, and different assessment tools were employed. Conclusions. Aromatherapy showed potential as an effective therapeutic option for the relief of depressive symptoms in a wide variety of subjects. In particular, aromatherapy massage appeared to have more beneficial effects than inhalation aromatherapy.
Huang, Charles Lung-Cheng; Hsiao, Sigmund; Hwu, Hai-Gwo; Howng, Shen-Long
2012-12-30
The Chinese Facial Emotion Recognition Database (CFERD), a computer-generated three-dimensional (3D) paradigm, was developed to measure the recognition of facial emotional expressions at different intensities. The stimuli consisted of 3D colour photographic images of six basic facial emotional expressions (happiness, sadness, disgust, fear, anger and surprise) and neutral Chinese faces. The purpose of the present study is to describe the development and validation of the CFERD with nonclinical healthy participants (N=100; 50 men; age ranging between 18 and 50 years), and to generate a normative data set. The results showed that the sensitivity index d' [d'=Z(hit rate)-Z(false alarm rate), where Z(p), p∈[0,1], is the inverse of the cumulative standard normal distribution function]…
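The d' index defined above can be computed directly with the inverse standard normal CDF from Python's standard library; the rates below are illustrative, not CFERD norms.

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index d' = Z(hit rate) - Z(false-alarm rate), where Z is
    the inverse of the cumulative standard normal distribution function."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)

# Illustrative rates: 84% hits vs 16% false alarms gives d' of about 1.99;
# equal rates give d' = 0 (no sensitivity beyond chance).
print(round(d_prime(0.84, 0.16), 2))
```

In practice, hit and false-alarm rates of exactly 0 or 1 are first adjusted (e.g. by the 1/(2N) correction), since Z is undefined at the endpoints.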
Research on Improved Deep Belief Network-Based Prediction of Cardiovascular Diseases
Zhang, Hongpo
2018-01-01
Quantitative analysis and prediction can help to reduce the risk of cardiovascular disease. Quantitative prediction based on traditional models has low accuracy, and the variance of predictions from shallow neural networks is large. In this paper, a cardiovascular disease prediction model based on an improved deep belief network (DBN) is proposed. Using the reconstruction error, the network depth is determined autonomously, and unsupervised training and supervised fine-tuning are combined, ensuring prediction accuracy while guaranteeing stability. Thirty experiments were performed independently on the Statlog (Heart) and Heart Disease Database data sets from the UCI repository. Experimental results showed mean prediction accuracies of 91.26% and 89.78%, with variances of 5.78 and 4.46, respectively. PMID:29854369
Cloning and expression of N-glycosylation-related glucosidase from Glaciozyma antarctica
NASA Astrophysics Data System (ADS)
Yajit, Noor Liana Mat; Kamaruddin, Shazilah; Hashim, Noor Haza Fazlin; Bakar, Farah Diba Abu; Murad, Abd. Munir Abd.; Mahadi, Nor Muhammad; Mackeen, Mukram Mohamed
2016-11-01
The need for functional oligosaccharides in various fields is ever growing. The enzymatic approach to oligosaccharide synthesis is advantageous over traditional chemical synthesis because of the regio- and stereoselectivity that can be achieved without the need for protection chemistry. In this study, the α-glucosidase I protein sequence from Saccharomyces cerevisiae (UniProt database) was compared against the Glaciozyma antarctica genome database using the Basic Local Alignment Search Tool (BLAST). The results showed 33% identity and an E-value of 1 × 10^-125 for α-glucosidase I. The gene was amplified, cloned into the pPICZα C vector and used to transform Pichia pastoris X-33 cells. Soluble expression of α-glucosidase I (~91 kDa) was achieved at 28 °C with 1.0% methanol.
Expert system for generating initial layouts of zoom systems with multiple moving lens groups
NASA Astrophysics Data System (ADS)
Cheng, Xuemin; Wang, Yongtian; Hao, Qun; Sasián, José M.
2005-01-01
An expert system is developed for the automatic generation of initial layouts for the design of zoom systems with multiple moving lens groups. The Gaussian parameters of the zoom system are optimized using the damped-least-squares method to achieve smooth zoom cam curves, with the f-number of each lens group in the zoom system constrained to a rational value. Then each lens group is selected automatically from a database according to its range of f-number, field of view, and magnification ratio as it is used in the zoom system. The lens group database is established from the results of analyzing thousands of zoom lens patents. Design examples are given, which show that the scheme is a practical approach to generate starting points for zoom lens design.
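The damped-least-squares optimization mentioned above can be sketched in its generic form: one update solves (J^T J + lam*I) delta = -J^T r for the parameter step. The following is a 2-parameter toy, not the authors' zoom merit function:

```python
def dls_step(J, r, lam):
    """One damped-least-squares (Levenberg-Marquardt) update for a
    2-parameter problem: solve (J^T J + lam*I) delta = -J^T r,
    where J is the Jacobian (rows = residuals) and r the residual vector."""
    # Normal-equation matrix A = J^T J + lam*I and right-hand side b = -J^T r
    a11 = sum(row[0] * row[0] for row in J) + lam
    a12 = sum(row[0] * row[1] for row in J)
    a22 = sum(row[1] * row[1] for row in J) + lam
    b1 = -sum(row[0] * ri for row, ri in zip(J, r))
    b2 = -sum(row[1] * ri for row, ri in zip(J, r))
    # Solve the 2x2 system by Cramer's rule
    det = a11 * a22 - a12 * a12
    return ((b1 * a22 - b2 * a12) / det, (b2 * a11 - b1 * a12) / det)
```

For J equal to the identity and r = (2, 4), a damping of 0 gives the step (-2, -4), while increasing the damping shrinks the step toward zero; that shrinkage is what keeps iterates stable when fitting smooth cam curves.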
Evaluation of a vortex-based subgrid stress model using DNS databases
NASA Technical Reports Server (NTRS)
Misra, Ashish; Lund, Thomas S.
1996-01-01
The performance of a SubGrid Stress (SGS) model for Large-Eddy Simulation (LES) developed by Misra & Pullin (1996) is studied for forced and decaying isotropic turbulence on a 32^3 grid. The physical viability of the model assumptions is tested using DNS databases. The results from LES of forced turbulence at Taylor Reynolds number R_lambda ≈ 90 are compared with filtered DNS fields. Probability density functions (pdfs) of the subgrid energy transfer, total dissipation, and the stretch of the subgrid vorticity by the resolved velocity-gradient tensor show reasonable agreement with the DNS data. The model is also tested in LES of decaying isotropic turbulence, where it correctly predicts the decay rate and energy spectra measured by Comte-Bellot & Corrsin (1971).
Experts' perceptions on the entrepreneurial framework conditions
NASA Astrophysics Data System (ADS)
Correia, Aldina; e Silva, Eliana Costa; Lopes, I. Cristina; Braga, Alexandra; Braga, Vitor
2017-11-01
The Global Entrepreneurship Monitor is a large-scale database for internationally comparative entrepreneurship. This database includes information on more than 100 countries concerning several aspects of entrepreneurship activities, perceptions, conditions, and national and regional policy, among others, in two main sources of primary data: the Adult Population Survey and the National Expert Survey. In the present work the National Expert Survey datasets for 2011, 2012 and 2013 are analyzed with the purpose of studying the effects of different types of entrepreneurship expert specialization on the perceptions of the Entrepreneurial Framework Conditions (EFCs). The results of the multivariate analysis of variance for the 2013 data show significant differences among the entrepreneurship experts compared with the 2011 and 2012 surveys. For the 2013 data, entrepreneur experts are less favorable than most of the other experts towards the EFCs.
Preclinical tests of an android based dietary logging application.
Kósa, István; Vassányi, István; Pintér, Balázs; Nemes, Márta; Kámánné, Krisztina; Kohut, László
2014-01-01
The paper describes the first, preclinical evaluation of a dietary logging application developed at the University of Pannonia, Hungary. The mobile user interface is briefly introduced. The three evaluation phases examined the completeness and contents of the dietary database and the time expenditure of the mobile-based diet logging procedure. The results show that although there are substantial individual differences between various dietary databases, the expected difference with respect to nutrient contents is below 10% on a typical institutional menu list. Another important finding is that the time needed to record the meals can be reduced to about 3 minutes daily, especially if the user uses set-based search. A well-designed user interface on a mobile device is a viable and reliable way to deliver a personalized lifestyle support service.
Calculation of Phase Equilibria in the Y2O3-Yb2O3-ZrO2 System
NASA Technical Reports Server (NTRS)
Jacobson, Nathan S.; Liu, Zi-Kui; Kaufman, Larry; Zhang, Fan
2001-01-01
Rare earth oxide stabilized zirconias find a wide range of applications, and an understanding of phase equilibria is essential to all of them. In this study, the available phase boundary data and thermodynamic data are collected and assessed. Calphad-type databases are developed to completely describe the Y2O3-ZrO2, Yb2O3-ZrO2, and Y2O3-Yb2O3 systems. The oxide units are treated as components, and regular and subregular solution models are used. The resulting calculated phase diagrams show good agreement with the experimental data. The binaries are then combined to form the database for the Y2O3-Yb2O3-ZrO2 pseudo-ternary.
Encryption Characteristics of Two USB-based Personal Health Record Devices
Wright, Adam; Sittig, Dean F.
2007-01-01
Personal health records (PHRs) hold great promise for empowering patients and increasing the accuracy and completeness of health information. We reviewed two small USB-based PHR devices that allow a patient to easily store and transport their personal health information. Both devices offer password protection and encryption features. Analysis of the devices shows that they store their data in a Microsoft Access database. Due to a flaw in the encryption of this database, recovering the user’s password can be accomplished with minimal effort. Our analysis also showed that, rather than encrypting health information with the password chosen by the user, the devices stored the user’s password as a string in the database and then encrypted that database with a common password set by the manufacturer. This is another serious vulnerability. This article describes the weaknesses we discovered, outlines three critical flaws with the security model used by the devices, and recommends four guidelines for improving the security of similar devices. PMID:17460132
Validation of multi-mission satellite altimetry for the Baltic Sea region
NASA Astrophysics Data System (ADS)
Kudryavtseva, Nadia; Soomere, Tarmo; Giudici, Andrea
2016-04-01
Currently, three sources of wave data are available to the research community: buoys, modelling, and satellite altimetry. Buoy measurements provide high-quality time series of wave properties, but buoys are deployed in only a few locations. Wave modelling covers large domains and provides good results in open-sea conditions; its limitation is that the results depend on the quality of the wind forcing and the assumptions put into the model. Satellite altimetry on many occasions provides homogeneous data over large sea areas with an appreciable spatial and temporal resolution, but its use is problematic in coastal areas and partially ice-covered water bodies. These limitations can be circumvented by careful analysis of the geometry of the basin, ice conditions and the spatial coverage of each altimetry snapshot. In this poster, for the first time, we discuss a validation of 30 years of multi-mission altimetry covering the whole Baltic Sea. We analysed data from the RADS database (Scharroo et al. 2013) spanning 1985 to 2015. To assess the limitations of the satellite altimeter data quality, the data were cross-matched with available wave measurements from buoys of the Swedish Meteorological and Hydrological Institute and the Finnish Meteorological Institute. The altimeter-measured significant wave heights showed a very good correspondence with the wave buoys. We show that data with backscatter coefficients greater than 13.5 and high errors in significant wave height and range should be excluded. We also examined the effect of ice cover and distance from land on satellite altimetry measurements. The analysis of cross-matches between the satellite altimetry data and buoy measurements shows that the data are corrupted only in the nearshore domain, within 0.2 degrees of the coast. The statistical analysis showed a significant decrease in wave heights for sea areas with ice concentration greater than 30 percent.
We also checked and corrected the data for biases between different missions. This analysis provides a unique uniform database of satellite altimetry measurements over the whole Baltic Sea, which can be further used for finding biases in wave modelling and studies of wave climatology. The database is available upon request.
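The three rejection criteria reported above (high backscatter, nearshore location, heavy ice cover) can be sketched as a simple quality-control filter; the thresholds come from the text, while the record structure and field names below are hypothetical:

```python
# Hypothetical altimetry records: sigma0 = backscatter coefficient,
# coast_deg = distance to coast (degrees), ice_pct = ice concentration (%)
records = [
    {"swh": 1.8, "sigma0": 11.0, "coast_deg": 0.5, "ice_pct": 0},
    {"swh": 2.4, "sigma0": 14.2, "coast_deg": 1.0, "ice_pct": 0},   # high backscatter
    {"swh": 0.9, "sigma0": 10.5, "coast_deg": 0.1, "ice_pct": 0},   # too close to coast
    {"swh": 1.1, "sigma0": 12.0, "coast_deg": 0.8, "ice_pct": 45},  # heavy ice cover
]

def passes_qc(rec):
    """Keep a record only if it avoids all three rejection criteria
    reported in the abstract."""
    return (rec["sigma0"] <= 13.5
            and rec["coast_deg"] >= 0.2
            and rec["ice_pct"] <= 30)

clean = [r for r in records if passes_qc(r)]
print(len(clean))  # → 1: only the first record survives
```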
NASA Astrophysics Data System (ADS)
Pecoraro, Gaetano; Calvello, Michele
2017-04-01
In Italy rainfall-induced landslides pose a significant and widespread hazard, resulting in a large number of casualties and enormous economic damage. Mitigation of such a diffuse risk cannot be attained with structural measures only. With respect to the risk to life, early warning systems represent a viable and useful tool for landslide risk mitigation over wide areas. Inventories of rainfall-induced landslides are critical for investigating where and when landslides have happened and may occur in the future, i.e. for establishing reliable correlations between rainfall characteristics and landslide occurrence. In this work a parametric study has been conducted to evaluate the performance of correlation models between rainfall and landslides over the Italian territory using the "FraneItalia" database, an inventory of landslides retrieved from online Italian journalistic news. The information reported for each record of this database always includes the site of occurrence of the landslides, the date of occurrence, and the source of the news. Multiple landslides occurring on the same date, within the same province or region, are inventoried together in a single record of the database, which in this case also reports the number of landslides of the event. Each record of the database may also include, if the related information is available: hour of occurrence; typology, volume and material of the landslide; activity phase; and effects on people, structures, infrastructure, cars or other elements. The database currently contains six complete years of data (2010-2015), including more than 4000 landslide reports, most of them triggered by rainfall.
For the purposes of this study, different rainfall-landslide correlation models have been tested by analysing the reported landslides, within all 144 zones identified by the national civil protection authority for weather-related warnings in Italy, in relation to satellite-based precipitation estimates from the Global Precipitation Measurement (GPM) NASA mission. This remote sensing database contains gridded precipitation and precipitation-error estimates, with a half-hour temporal resolution and a 0.10-degree spatial resolution, covering most of the Earth from 2014 onwards. It is well known that satellite estimates of rainfall have some limitations in resolving specific rainfall features (e.g., shallow orographic events and short-duration, high-intensity events), yet the temporal and spatial accuracy of the GPM data may be considered adequate in relation to the scale of the analysis and the size of the warning zones used for this study. The results of the parametric analysis conducted herein, although providing some indications on the most relevant rainfall conditions leading to widespread landsliding over a warning zone, must be considered preliminary, as they show a very heterogeneous behaviour of the employed rainfall-based warning models over the Italian territory. Nevertheless, they clearly show the strong potential of the continuous multi-year landslide records available from the "FraneItalia" database as an important source of information for evaluating the performance of warning models at regional scale throughout Italy.
A statistical analysis of the global historical volcanic fatalities record
Auker, Melanie Rose; Sparks, Robert Stephen John; Siebert, Lee; Crosweller, H. S.; Ewert, John W.
2013-01-01
A new database of volcanic fatalities is presented and analysed, covering the period 1600 to 2010 AD. Data are from four sources: the Smithsonian Institution, Witham (2005), CRED EM-DAT and Munich RE. The data were combined and formatted, with a weighted average fatality figure used where more than one source reports an event; the former two databases were weighted twice as strongly as the latter two. More fatal incidents are contained within our database than in similar previous works; approximately 46% of the fatal incidents are listed in only one of the four sources, and fewer than 10% are in all four. 278,880 fatalities are recorded in the database, resulting from 533 fatal incidents. The fatality count is dominated by a handful of disasters, though the majority of fatal incidents have caused fewer than ten fatalities. Number and empirical probability of fatalities are broadly correlated with VEI, but are more strongly influenced by population density around volcanoes and the occurrence and extent of lahars (mudflows) and pyroclastic density currents, which have caused 50% of fatalities. Indonesia, the Philippines, and the West Indies dominate the spatial distribution of fatalities, and there is some negative correlation between regional development and number of fatalities. With the largest disasters removed, over 90% of fatalities occurred between 5 km and 30 km from volcanoes, though the most devastating eruptions impacted far beyond these distances. A new measure, the Volcano Fatality Index, is defined to explore temporal changes in societal vulnerability to volcanic hazards. The measure incorporates population growth and recording improvements with the fatality data, and shows prima facie evidence that vulnerability to volcanic hazards has fallen during the last two centuries. Results and interpretations are limited in scope by the underlying fatalities data, which are affected by under-recording, uncertainty, and bias.
Attempts have been made to estimate the extent of these issues, and to remove their effects where possible. The data analysed here are provided as supplementary material. An updated version of the Smithsonian fatality database fully integrated with this database will be publicly available in the near future and will subsequently incorporate new data.
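The weighting rule described above, with the first two sources counted twice as strongly as the latter two, can be sketched as follows (the event counts in the example are illustrative, not values from the database):

```python
# Weights from the abstract: Smithsonian and Witham (2005) count double
# relative to CRED EM-DAT and Munich RE.
WEIGHTS = {"smithsonian": 2, "witham": 2, "em_dat": 1, "munich_re": 1}

def weighted_fatalities(reports):
    """Weighted average fatality count for one event, taken over whichever
    sources report it. `reports` maps source name -> reported count."""
    total_w = sum(WEIGHTS[s] for s in reports)
    return sum(WEIGHTS[s] * n for s, n in reports.items()) / total_w

# Hypothetical event reported by two sources with different counts:
print(weighted_fatalities({"smithsonian": 100, "em_dat": 70}))  # (2*100 + 70) / 3 = 90.0
```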
A Free Database of Auto-detected Full-sun Coronal Hole Maps
NASA Astrophysics Data System (ADS)
Caplan, R. M.; Downs, C.; Linker, J.
2016-12-01
We present a 4-yr (06/10/2010 to 08/18/14 at 6-hr cadence) database of full-sun synchronic EUV and coronal hole (CH) maps made available on a dedicated web site (http://www.predsci.com/chd). The maps are generated using STEREO/EUVI A&B 195Å and SDO/AIA 193Å images through an automated pipeline (Caplan et al. (2016) Ap.J. 823, 53). Specifically, the original data are preprocessed with PSF deconvolution, a nonlinear limb-brightening correction, and a nonlinear inter-instrument intensity normalization. Coronal holes are then detected in the preprocessed images using a GPU-accelerated region-growing segmentation algorithm. The final results from all three instruments are then merged and projected to form full-sun sine-latitude maps. All the software used in processing the maps is provided, and can easily be adapted for use with other instruments and channels. We describe the data pipeline and show examples from the database. We also detail recent CH-detection validation experiments using synthetic EUV emission images produced from global thermodynamic MHD simulations.
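The region-growing segmentation step of such a pipeline can be sketched as a plain-Python toy on a 2D intensity grid (the production code is GPU-accelerated and works on preprocessed EUV images; the threshold and image below are illustrative only):

```python
def grow_region(img, seed, threshold):
    """Flood-fill style region growing: starting from `seed`, collect
    4-connected pixels whose intensity stays below `threshold`
    (coronal holes appear dark in EUV images)."""
    rows, cols = len(img), len(img[0])
    region, stack = set(), [seed]
    while stack:
        r, c = stack.pop()
        if (r, c) in region or not (0 <= r < rows and 0 <= c < cols):
            continue
        if img[r][c] >= threshold:
            continue  # bright pixel: stop growing in this direction
        region.add((r, c))
        stack.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])
    return region

img = [
    [9, 9, 9, 9],
    [9, 2, 3, 9],
    [9, 2, 9, 9],
    [9, 9, 9, 1],  # dark pixel not connected to the seed region
]
print(len(grow_region(img, (1, 1), 5)))  # → 3
```

Note that the isolated dark pixel at the corner is not collected: region growing only labels pixels connected to the seed, which is what distinguishes it from plain thresholding.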
Radiative transfer and spectroscopic databases: A line-sampling Monte Carlo approach
NASA Astrophysics Data System (ADS)
Galtier, Mathieu; Blanco, Stéphane; Dauchet, Jérémi; El Hafi, Mouna; Eymet, Vincent; Fournier, Richard; Roger, Maxime; Spiesser, Christophe; Terrée, Guillaume
2016-03-01
Dealing with molecular-state transitions for radiative transfer purposes involves two successive steps that both reach the complexity level at which physicists start thinking about statistical approaches: (1) constructing line-shaped absorption spectra as the result of very numerous state-transitions, (2) integrating over optical-path domains. For the first time, we show here how these steps can be addressed simultaneously using the null-collision concept. This opens the door to the design of Monte Carlo codes directly estimating radiative transfer observables from spectroscopic databases. The intermediate step of producing accurate high-resolution absorption spectra is no longer required. A Monte Carlo algorithm is proposed and applied to six one-dimensional test cases. It allows the computation of spectrally integrated intensities (over 25 cm^-1 bands or the full IR range) in a few seconds, regardless of the retained database and line model. But free parameters need to be selected and they impact the convergence. A first possible selection is provided in full detail. We observe that this selection is highly satisfactory for quite distinct atmospheric and combustion configurations, but a more systematic exploration is still in progress.
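The null-collision concept at the heart of this approach can be illustrated with a toy Woodcock-tracking transmission estimate in a heterogeneous 1D slab: tentative collisions are sampled at a constant majorant rate, and each is accepted as real with probability k(x)/k_max. This is an absorption-only sketch under assumed parameters, not the authors' full line-sampling algorithm:

```python
import math
import random

def transmission_null_collision(k, k_max, length, n_samples=20000, rng=None):
    """Estimate exp(-integral of k over [0, length]) by sampling tentative
    collisions at the majorant rate k_max and accepting each as real with
    probability k(x)/k_max (null-collision / Woodcock tracking)."""
    rng = rng or random.Random(0)
    transmitted = 0
    for _ in range(n_samples):
        x = 0.0
        while True:
            x += -math.log(1.0 - rng.random()) / k_max  # tentative free path
            if x > length:                # escaped the slab
                transmitted += 1
                break
            if rng.random() < k(x) / k_max:  # real (non-null) collision
                break
    return transmitted / n_samples

# Linearly varying absorption coefficient k(x) = x/2 on [0, 2], majorant 1.0:
est = transmission_null_collision(lambda x: x / 2.0, k_max=1.0, length=2.0)
# Analytic transmission: exp(-integral_0^2 x/2 dx) = exp(-1) ≈ 0.368
```

The key property, as in the abstract, is that k(x) is only ever evaluated pointwise at sampled positions; no precomputed high-resolution spectrum or cumulative optical depth is needed.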
Tensor discriminant color space for face recognition.
Wang, Su-Jing; Yang, Jian; Zhang, Na; Zhou, Chun-Guang
2011-09-01
Recent research efforts reveal that color may provide useful information for face recognition. For different visual tasks, the choice of a color space is generally different. How can a color space be sought for the specific face recognition problem? To address this problem, this paper represents a color image as a third-order tensor and presents the tensor discriminant color space (TDCS) model. The model can keep the underlying spatial structure of color images. With the definition of n-mode between-class scatter matrices and within-class scatter matrices, TDCS constructs an iterative procedure to obtain one color space transformation matrix and two discriminant projection matrices by maximizing the ratio of these two scatter matrices. The experiments are conducted on two color face databases, the AR and Georgia Tech face databases, and the results show that both the performance and the efficiency of the proposed method are better than those of the state-of-the-art color image discriminant model, which involves one color space transformation matrix and one discriminant projection matrix, especially on a complicated face database with various pose variations.
3D facial expression recognition using maximum relevance minimum redundancy geometrical features
NASA Astrophysics Data System (ADS)
Rabiu, Habibu; Saripan, M. Iqbal; Mashohor, Syamsiah; Marhaban, Mohd Hamiruce
2012-12-01
In recent years, facial expression recognition (FER) has become an attractive research area which, besides the fundamental challenges it poses, finds application in areas such as human-computer interaction, clinical psychology, lie detection, pain assessment, and neurology. Generally the approaches to FER consist of three main steps: face detection, feature extraction and expression recognition. The recognition accuracy of FER hinges immensely on the relevance of the selected features in representing the target expressions. In this article, we present a person- and gender-independent 3D facial expression recognition method using maximum relevance minimum redundancy geometrical features. The aim is to detect a compact set of features that sufficiently represents the most discriminative features between the target classes. A multi-class one-against-one SVM classifier was employed to recognize the seven facial expressions: neutral, happy, sad, angry, fear, disgust, and surprise. An average recognition accuracy of 92.2% was recorded. Furthermore, inter-database homogeneity was investigated between two independent databases, BU-3DFE and UPM-3DFE; the results showed a strong homogeneity between the two databases.
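The maximum relevance minimum redundancy (mRMR) criterion can be sketched as a greedy selection loop. This is a toy version: absolute Pearson correlation stands in for the mutual-information scores normally used in mRMR, and the data are invented, not the paper's geometrical features:

```python
def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a) ** 0.5
    vb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (va * vb)

def mrmr_select(features, labels, k):
    """Greedy mRMR: at each step pick the feature maximizing
    relevance(feature, labels) minus mean redundancy with the
    already-selected features. `features` is a list of value lists."""
    selected = []
    candidates = list(range(len(features)))
    while candidates and len(selected) < k:
        def score(i):
            rel = abs(pearson(features[i], labels))
            red = (sum(abs(pearson(features[i], features[j])) for j in selected)
                   / len(selected)) if selected else 0.0
            return rel - red
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

On a toy set where one feature exactly duplicates the first pick, the second pick skips the duplicate in favor of a less redundant feature, which is precisely the compactness the abstract aims for.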
Wang, Shi-Yuan; Cheng, Xiao-Hua; Li, Jing-Xin; Li, Xi-Yan; Zhu, Feng-Cai; Liu, Pei
2015-01-01
Japanese encephalitis virus (JEV), a leading cause of Japanese encephalitis (JE) in children and adults, is a major public health problem in Asian countries. This study reports a meta-analysis of the immunogenicity and safety of vaccines used to protect infants or children from JE. Three types of JE vaccine were examined, namely, Japanese encephalitis live-attenuated vaccine (JEV-L), Japanese encephalitis inactivated vaccine (Vero cell) (JEV-I(Vero)), and Japanese encephalitis inactivated vaccine (primary hamster kidney cell) (JEV-I(PHK)). These vaccines are used to induce fundamental immunity against JE; however, few studies have compared their immunogenicity and safety in infants and young children less than 2 years of age. Data were obtained by searching 5 databases: Web of Science, PubMed, China National Knowledge Infrastructure, the China Wanfang database, and the Cochrane database. Fifteen articles were identified and scored using the Jadad score for inclusion in the meta-analysis. Random effect models were used to calculate the pooled seroconversion rate and adverse reaction rate when tests for heterogeneity were significant. The results showed that the pooled seroconversion rate for JEV-I(PHK) (62.23%) was lower than that for JEV-I(Vero) (86.49%) and JEV-L (83.52%), and that the pooled adverse reaction rate for JEV-L (18.09%) was higher than that for JEV-I(PHK) (10.08%) and JEV-I(Vero) (12.49%). The pooled relative risk was then calculated to compare the seroconversion and adverse reaction rates. The results showed that JEV-I(Vero) and JEV-L were more suitable than JEV-I(PHK) for inducing fundamental immunity to JE in infants and children less than 2 years of age.
Barneh, Farnaz; Jafari, Mohieddin; Mirzaie, Mehdi
2016-11-01
Network pharmacology elucidates the relationship between drugs and targets. As the number of identified targets for each drug increases, the corresponding drug-target network (DTN) evolves from a mere reflection of pharmaceutical industry trends into a portrait of polypharmacology. The aim of this study was to evaluate the potential of the DrugBank database in advancing systems pharmacology. We constructed and analyzed a DTN from drug-target associations in the DrugBank 4.0 database. Our results showed that in the bipartite DTN, an increased ratio of identified targets per drug augmented the density and connectivity of drugs and targets and decreased the modular structure. To clarify the details of the network structure, the DTN was projected into two networks, namely a drug similarity network (DSN) and a target similarity network (TSN). In the DSN, various classes of Food and Drug Administration-approved drugs with distinct therapeutic categories were linked together based on shared targets. The projected TSN also showed complexity because of the promiscuity of the drugs. By including investigational drugs that are currently being tested in clinical trials, the networks manifested more connectivity and pictured the upcoming pharmacological space of the coming years. Diverse biological processes and protein-protein interactions were manipulated by new drugs, which can extend possible target combinations. We conclude that network-based organization of DrugBank 4.0 data not only reveals the potential for repurposing of existing drugs, but also allows generating novel predictions about drug off-targets, drug-drug interactions and their side effects. Our results also encourage further effort for high-throughput identification of targets to build networks that can be integrated into disease networks. © The Author 2015. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
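The projection of the bipartite DTN onto a drug similarity network can be sketched as follows (the drug and target identifiers below are hypothetical, not DrugBank records):

```python
from itertools import combinations

# Hypothetical drug -> targets associations (DrugBank-style mapping)
drug_targets = {
    "drugA": {"T1", "T2"},
    "drugB": {"T2", "T3"},
    "drugC": {"T4"},
}

def project_dsn(drug_targets):
    """Project the bipartite drug-target network onto drugs: two drugs
    are linked iff they share at least one target, with the number of
    shared targets as edge weight."""
    edges = {}
    for d1, d2 in combinations(sorted(drug_targets), 2):
        shared = drug_targets[d1] & drug_targets[d2]
        if shared:
            edges[(d1, d2)] = len(shared)
    return edges

print(project_dsn(drug_targets))  # → {('drugA', 'drugB'): 1}
```

Inverting the mapping to target -> drugs and reusing the same function yields the TSN projection, where targets are linked by the promiscuous drugs they share.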
Characterization of the Proteome of Theobroma cacao Beans by Nano-UHPLC-ESI MS/MS.
Scollo, Emanuele; Neville, David; Oruna-Concha, M Jose; Trotin, Martine; Cramer, Rainer
2018-02-01
Cocoa seed storage proteins play an important role in flavour development as aroma precursors are formed from their degradation during fermentation. Major proteins in the beans of Theobroma cacao are the storage proteins belonging to the vicilin and albumin classes. Although both these classes of proteins have been extensively characterized, there is still limited information on the expression and abundance of other proteins present in cocoa beans. This work is the first attempt to characterize the whole cocoa bean proteome by nano-UHPLC-ESI MS/MS analysis using tryptic digests of cocoa bean protein extracts. The results of this analysis show that >1000 proteins could be identified using a species-specific Theobroma cacao database. The majority of the identified proteins were involved with metabolism and energy. Additionally, a significant number of the identified proteins were linked to protein synthesis and processing. Several proteins were also involved with plant response to stress conditions and defence. Albumin and vicilin storage proteins showed the highest intensity values among all detected proteins, although only seven entries were identified as storage proteins. A comparison of MS/MS data searches carried out against larger non-specific databases confirmed that using a species-specific database can increase the number of identified proteins, and at the same time reduce the number of false positives. The results of this work will be useful in developing tools that can allow the comparison of the proteomic profile of cocoa beans from different genotypes and geographic origins. Data are available via ProteomeXchange with identifier PXD005586. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
SORTEZ: a relational translator for NCBI's ASN.1 database.
Hart, K W; Searls, D B; Overton, G C
1994-07-01
The National Center for Biotechnology Information (NCBI) has created a database collection that includes several protein and nucleic acid sequence databases, a biosequence-specific subset of MEDLINE, as well as value-added information such as links between similar sequences. Information in the NCBI database is modeled in Abstract Syntax Notation 1 (ASN.1), an Open Systems Interconnection protocol designed for exchanging structured data between software applications rather than as a data model for database systems. While the NCBI database is distributed with an easy-to-use information retrieval system, ENTREZ, the ASN.1 data model currently lacks an ad hoc query language for general-purpose data access. For that reason, we have developed a software package, SORTEZ, that transforms the ASN.1 database (or other databases with nested data structures) to a relational data model and subsequently to a relational database management system (Sybase) where information can be accessed through the relational query language, SQL. Because the need to transform data from one data model and schema to another arises naturally in several important contexts, including efficient execution of specific applications, access to multiple databases, and adaptation to database evolution, this work also serves as a practical study of the issues involved in the various stages of database transformation. We show that transformation from the ASN.1 data model to a relational data model can be largely automated, but that schema transformation and data conversion require considerable domain expertise and would greatly benefit from additional support tools.
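The nested-to-relational transformation that SORTEZ automates can be illustrated with a minimal flattening sketch: a nested list inside each record is split out into a child table linked back to its parent by a generated key. The record layout and field names below are invented for illustration, not the actual ASN.1 schema:

```python
def flatten(records, parent_table, child_key, child_table):
    """Split records holding a nested list under `child_key` into two
    relational tables, linking children to parents by a generated id."""
    parents, children = [], []
    for pid, rec in enumerate(records):
        row = {k: v for k, v in rec.items() if k != child_key}
        row["id"] = pid
        parents.append(row)
        for child in rec.get(child_key, []):
            children.append({parent_table + "_id": pid, **child})
    return {parent_table: parents, child_table: children}

seqs = [{"accession": "U00001", "features": [{"type": "CDS"}, {"type": "gene"}]}]
tables = flatten(seqs, "sequence", "features", "feature")
# tables["feature"] → [{'sequence_id': 0, 'type': 'CDS'}, {'sequence_id': 0, 'type': 'gene'}]
```

The resulting flat tables can be bulk-loaded into a relational system and queried with an SQL join on the generated foreign key, which is the kind of access the ASN.1 form lacks.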
Normative Databases for Imaging Instrumentation
Realini, Tony; Zangwill, Linda; Flanagan, John; Garway-Heath, David; Patella, Vincent Michael; Johnson, Chris; Artes, Paul; Ben Gaddie, I.; Fingeret, Murray
2015-01-01
Purpose To describe the process by which imaging devices undergo reference database development and regulatory clearance. The limitations and potential improvements of reference (normative) data sets for ophthalmic imaging devices will be discussed. Methods A symposium was held in July 2013 in which a series of speakers discussed issues related to the development of reference databases for imaging devices. Results Automated imaging has become widely accepted and used in glaucoma management. The ability of such instruments to discriminate healthy from glaucomatous optic nerves, and to detect glaucomatous progression over time is limited by the quality of reference databases associated with the available commercial devices. In the absence of standardized rules governing the development of reference databases, each manufacturer’s database differs in size, eligibility criteria, and ethnic make-up, among other key features. Conclusions The process for development of imaging reference databases may be improved by standardizing eligibility requirements and data collection protocols. Such standardization may also improve the degree to which results may be compared between commercial instruments. PMID:25265003
Organization and dissemination of multimedia medical databases on the WWW.
Todorovski, L; Ribaric, S; Dimec, J; Hudomalj, E; Lunder, T
1999-01-01
In the paper, we focus on the problem of building and disseminating multimedia medical databases on the World Wide Web (WWW). The current results of the ongoing project of building a prototype dermatology image database and its WWW presentation are presented. The dermatology database is part of an ambitious plan concerning the organization of a network of medical institutions building distributed and federated multimedia databases on a much wider scale.
ANDERSSON, SONIA; MINTS, MIRIAM; WILANDER, ERIK
2013-01-01
The incidence rates of cervical adenocarcinoma have been increasing over the last two decades, contrary to those of squamous cell carcinoma. This trend is particularly evident among females aged <40 years and has occurred despite extensive cytology-based screening programs. The aim of the present retrospective database study was to investigate adenocarcinoma in situ (AIS) with respect to previous cytological results, high-risk (HR) human papillomavirus (HPV) infections and histological results from AIS-adjacent squamous mucosa. Databases were used to identify 32 female patients with AIS treated for various conditions between 2009 and 2012 at the Department of Gynecology, Uppsala University Hospital (Uppsala, Sweden) and previous cytological, HPV and histological results. Of the individuals in the study, 64.3% had a previously recorded cytological result showing squamous cell abnormalities; five had glandular cell abnormalities (18%) and two had AIS (7.1%). Among the patients with available HPV results, 95% were HR-HPV-positive; HPV18/45 predominated (77%), followed by HPV16 (27%). The patients with multiple HPV infections were aged ≤32 years, while patients aged ≥38 years were only infected with HPV18/45. All but three patients had cervical intraepithelial neoplasia (CIN) in the AIS-adjacent squamous mucosa, 79% of which was CIN2 or worse. The present retrospective database study suggests that AIS is detected at screening mainly due to simultaneous squamous precursor lesions and that HPV18/45 infection is an increasing cofactor for AIS in older patients. HPV analyses of glandular precursor lesions aid in the identification of female individuals at risk of progression to invasive disease, and thus have a favorable effect on adenocarcinoma prevention, together with vaccination. PMID:23946807
NASA Astrophysics Data System (ADS)
Bennett, K. E.; Bronaugh, D.; Rodenhuis, D.
2008-12-01
Observational databases of snow water equivalent (SWE) have been collected from Alaska, the western US states, and the Canadian provinces of British Columbia, Alberta, and Saskatchewan and the territories of the NWT and the Yukon. These databases were initially validated to remove inconsistencies and errors in the station records, dates, or geographic co-ordinates of the stations. The cleaned data were then analysed for historical (1950 to 2006) trends using emerging techniques for trend detection based on first-of-the-month estimates for January to June. Analysis of SWE showed spatial variability in the count of records across the six-month period, and this study illustrated differences between Canadian and US (or northern and southern) collection. Two different data sets (one gridded and one station-based) were then used to analyse April 1st records, for which there was the greatest spatial spread of station records for analysis with climate information. Initial results show spatial variability (in both magnitude and direction) of trends, and climate correlations and principal components indicate different drivers of change in SWE across the western US, Canada, and north to Alaska. These results will be used to validate future predictions of SWE being undertaken with the Canadian Regional Climate Model (CRCM) for western North America and the Variable Infiltration Capacity (VIC) hydrologic model for British Columbia.
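The abstract leaves its trend-detection technique unnamed; the Mann-Kendall test is a standard nonparametric choice for snow and hydrologic trend work, so a minimal sketch is given here under that assumption (the function name is ours, and the variance formula shown ignores tie corrections):

```python
import math

def mann_kendall(x):
    """Nonparametric Mann-Kendall trend test (sketch).

    Returns (S, Z): the MK statistic S and its normal-approximation
    Z score. Positive S indicates an increasing trend.
    """
    n = len(x)
    # S counts concordant minus discordant pairs.
    s = sum(
        (x[j] > x[i]) - (x[j] < x[i])
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    # Variance of S for a series without ties (simplified).
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z

# A strictly increasing series gives the maximum S = n*(n-1)/2.
s, z = mann_kendall([1, 2, 3, 4, 5, 6])
print(s)  # 15
```

For real SWE records, a tie-corrected variance and a field significance test across stations would normally be added.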
Bao, Yong; Xie, Qing; Xu, Yang; Zhang, Junmei
2017-01-01
Background Constraint-induced aphasia therapy (CIAT) has been widely used in post-stroke aphasia rehabilitation, and an increasing number of controlled clinical trials have investigated its efficacy. Purpose To systematically review the randomized controlled trials (RCTs) concerning the effect of the CIAT in post-stroke patients with aphasia, and to identify the useful components of CIAT in post-stroke aphasia rehabilitation. Methods A computerized search was performed through five databases (Pubmed, EMbase, Medline, ScienceDirect and the Cochrane Library). Cochrane handbook domains were used to evaluate the methodological quality of the included RCTs. Results Eight RCTs met the inclusion criteria. Based on three RCTs, results were inconsistent when comparing the CIAT with conventional therapies containing no CIAT component. Five RCTs showed that the CIAT performed as well as other intensive aphasia therapies in terms of improving language performance. One RCT showed that therapies embedded with social interaction were likely to enhance the efficacy of the CIAT. Conclusion CIAT may be useful for improving chronic post-stroke aphasia; however, there is limited evidence to support its superiority over other aphasia therapies. Massed practice is likely to be a useful component of CIAT, while the role of “constraint” needs further exploration. CIAT embedded with social interaction may yield additional benefits. PMID:28846724
The landslide database for Germany: Closing the gap at national level
NASA Astrophysics Data System (ADS)
Damm, Bodo; Klose, Martin
2015-11-01
The Federal Republic of Germany has long been among the few European countries that lack a national landslide database. Systematic collection and inventorying of landslide data has a long research history in Germany, but one focussed on the development of databases with local or regional coverage. This has changed in recent years with the launch of a database initiative aimed at closing the data gap at national level. The present paper reports on this project, which is based on a landslide database that evolved over the last 15 years into a database covering large parts of Germany. A strategy of systematic retrieval, extraction, and fusion of landslide data is at the heart of the methodology, providing the basis for a database with a broad potential of application. The database offers a data pool of more than 4,200 landslide data sets with over 13,000 single data files and dates back to the 12th century. All types of landslides are covered by the database, which stores not only core attributes but also various complementary data, including data on landslide causes, impacts, and mitigation. The current database migration to PostgreSQL/PostGIS is focused on unlocking the full scientific potential of the database while enabling data sharing and knowledge transfer via a web GIS platform. In this paper, the goals and the research strategy of the database project are highlighted first, with a summary of best practices in database development providing perspective. Next, the focus is on key aspects of the methodology, followed by the results of three case studies in the German Central Uplands. The case study results exemplify database application in the analysis of landslide frequency and causes, impact statistics, and landslide susceptibility modeling. Using the example of these case studies, strengths and weaknesses of the database are discussed in detail. The paper concludes with a summary of the database project with regard to previous achievements and the strategic roadmap.
Lupiañez-Barbero, Ascension; González Blanco, Cintia; de Leiva Hidalgo, Alberto
2018-05-23
Food composition tables and databases (FCTs or FCDBs) provide the information needed to estimate intake of nutrients and other food components. In Spain, the lack of a reference database has resulted in the use of different FCTs/FCDBs in nutritional surveys and research studies, as well as in the development of dietetic software for diet analysis. As a result, biased, non-comparable results are obtained, and healthcare professionals are rarely aware of these limitations. AECOSAN and the BEDCA association developed an FCDB following European standards, the Spanish Food Composition Database Network (RedBEDCA). The current database has a limited number of foods and food components and barely contains processed foods, which limits its use in epidemiological studies and in the daily practice of healthcare professionals. Copyright © 2018 SEEN y SED. Publicado por Elsevier España, S.L.U. All rights reserved.
Culhane, D P; Gollub, E; Kuhn, R; Shpaner, M
2001-07-01
Administrative databases from the City of Philadelphia that track public shelter utilisation (n=44 337) and AIDS case reporting (n=7749) were merged to identify rates and risk factors for co-occurring homelessness and AIDS. Multiple-decrement life table analyses were conducted, and logistic regression analyses were used to identify risk factors associated with AIDS among the homeless, and homelessness among people with AIDS, in the City of Philadelphia, Pennsylvania, USA. People admitted to public shelters had a three year rate of subsequent AIDS diagnosis of 1.8 per 100 person years; nine times the rate for the general population of Philadelphia. Logistic regression results show that substance abuse history (OR = 3.14), male gender (OR = 2.05), and a history of serious mental disorder (OR = 1.62) were significantly related to the risk of AIDS diagnosis among shelter users. Among people with AIDS, results show a three year rate of subsequent shelter admission of 6.9 per 100 person years, and a three year rate of prior shelter admission of 9%, three times the three year rate of shelter admission for the general population. Logistic regression results show that a history of intravenous drug use (OR = 3.14); no private insurance (OR = 2.93); black race (OR = 2.82); pulmonary or extra-pulmonary TB (OR = 1.43); and pneumocystis pneumonia (OR = 0.56) were all related to the risk of shelter admission. Homelessness prevention programmes should target people with HIV risk factors, and HIV prevention programmes should be targeted to homeless persons, as these populations intersect significantly. Reasons and implications for this intersection are discussed.
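As a sketch of where odds ratios like those reported above come from, the following computes an OR directly from a 2x2 table; the function and all counts are hypothetical illustrations, not the study's data:

```python
def odds_ratio(exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases):
    """Odds ratio from a 2x2 table: (a/b) / (c/d) = a*d / (b*c)."""
    return (exposed_cases * unexposed_noncases) / (exposed_noncases * unexposed_cases)

# Hypothetical counts: 25 of 100 exposed subjects have the outcome
# versus 10 of 100 unexposed subjects.
or_est = odds_ratio(25, 75, 10, 90)
print(round(or_est, 2))  # 3.0
```

In a fitted logistic regression, the same quantity appears as exp(beta) for the exposure coefficient, which is how the adjusted ORs in the abstract were obtained.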
A Study of Memory Effects in a Chess Database.
Schaigorodsky, Ana L; Perotti, Juan I; Billoni, Orlando V
2016-01-01
A series of recent works studying a database of chronologically sorted chess games (containing 1.4 million games played by humans between 1998 and 2007) has shown that the popularity distribution of chess game-lines follows a Zipf's law, and that time series inferred from the sequences of those game-lines exhibit long-range memory effects. The presence of Zipf's law together with long-range memory effects has been observed in several systems; however, the simultaneous emergence of these two phenomena had always been studied separately up to now. In this work, by making use of a variant of the Yule-Simon preferential growth model introduced by Cattuto et al., we provide an explanation for the simultaneous emergence of Zipf's law and long-range memory effects in a chess database. We find that Cattuto's Model (CM) is able to reproduce both Zipf's law and the long-range correlations, including the size-dependent scaling of the Hurst exponent for the corresponding time series. CM allows an explanation of the simultaneous emergence of these two phenomena via a preferential growth dynamics, including a memory kernel, in the popularity distribution of chess game-lines. This mechanism results in an aging process in the chess game-line choice as the database grows. Moreover, we find burstiness in the activity of subsets of the most active players, although the aggregated activity of the pool of players displays inter-event times without burstiness. We show that CM is not able to produce time series with bursty behavior, providing evidence that burstiness is not required for the explanation of the long-range correlation effects in the chess database. Our results provide further evidence favoring the hypothesis that long-range correlation effects are a consequence of the aging of game-lines and not of burstiness, and shed light on the mechanism that operates in the simultaneous emergence of Zipf's law and long-range correlations in a community of chess players.
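A minimal simulation can make the model concrete. The sketch below implements a Yule-Simon-style stream with a hyperbolic memory kernel in the spirit of the Cattuto variant; the parameter names (p, tau) and their defaults are our own illustrative choices, not values from the paper:

```python
import random

def cattuto_model(steps, p=0.1, tau=20.0, seed=42):
    """Sketch of a Yule-Simon model with a memory kernel.

    With probability p a brand-new item (game-line) enters the stream;
    otherwise an earlier position is copied, chosen with weight
    proportional to the hyperbolic memory kernel 1 / (age + tau).
    Returns the generated sequence of item labels.
    """
    rng = random.Random(seed)
    seq = [0]          # item labels in order of appearance
    next_label = 1
    for t in range(1, steps):
        if rng.random() < p:
            seq.append(next_label)   # innovation: new game-line
            next_label += 1
        else:
            # Copy from the past with recency-biased (hyperbolic) weights.
            weights = [1.0 / (t - i + tau) for i in range(t)]
            seq.append(rng.choices(seq[:t], weights=weights)[0])
    return seq

seq = cattuto_model(2000)
print(len(set(seq)) < len(seq))  # True: popular game-lines recur
```

The memory kernel makes recently used items more likely to be copied, which is what produces the aging and long-range correlations the paper analyzes; with tau large the model reduces toward plain preferential growth.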
Physiology-based face recognition in the thermal infrared spectrum.
Buddharaju, Pradeep; Pavlidis, Ioannis T; Tsiamyrtzis, Panagiotis; Bazakos, Mike
2007-04-01
The current dominant approaches to face recognition rely on facial characteristics that are on or over the skin. Some of these characteristics have low permanency or can be altered, and their phenomenology varies significantly with environmental factors (e.g., lighting). Many methodologies have been developed to address these problems to various degrees. However, the current framework of face recognition research has a potential weakness due to its very nature. We present a novel framework for face recognition based on physiological information. The motivation behind this effort is to capitalize on the permanency of innate characteristics that are under the skin. To establish feasibility, we propose a specific methodology to capture facial physiological patterns using the bioheat information contained in thermal imagery. First, the algorithm delineates the human face from the background using a Bayesian framework. Then, it localizes the superficial blood vessel network using image morphology. The extracted vascular network produces contour shapes that are characteristic of each individual. The branching points of the skeletonized vascular network are referred to as Thermal Minutia Points (TMPs) and constitute the feature database. To render the method robust to facial pose variations, we collect for each subject five different pose images (center, midleft profile, left profile, midright profile, and right profile) to be stored in the database. During the classification stage, the algorithm first estimates the pose of the test image. Then, it matches the local and global TMP structures extracted from the test image with those of the corresponding pose images in the database. We have conducted experiments on a multipose database of thermal facial images collected in our laboratory, as well as on the time-gap database of the University of Notre Dame.
The experimental results show that the proposed methodology has merit, especially with respect to the problem of low permanence over time. More importantly, the results demonstrate the feasibility of the physiological framework in face recognition and open the way for further methodological and experimental research in the area.
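The TMP idea can be illustrated on a toy binary skeleton: a branching point is a skeleton pixel with three or more skeleton neighbours. This ASCII-grid sketch is only an analogy for the paper's morphological pipeline, which operates on real thermal imagery:

```python
def branch_points(skel):
    """Find branching (minutia-like) points of a 1-pixel-wide binary
    skeleton, given as a list of strings ('#' = vessel, '.' = background).
    A skeleton pixel with 3 or more skeleton neighbours (8-connectivity)
    is taken as a branching point, analogous to a Thermal Minutia Point.
    """
    rows, cols = len(skel), len(skel[0])
    pts = []
    for r in range(rows):
        for c in range(cols):
            if skel[r][c] != '#':
                continue
            nbrs = sum(
                1
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr or dc)
                and 0 <= r + dr < rows
                and 0 <= c + dc < cols
                and skel[r + dr][c + dc] == '#'
            )
            if nbrs >= 3:
                pts.append((r, c))
    return pts

# A 'Y'-shaped skeleton: the junction pixel is the only branch point.
skel = [
    "#...#",
    ".#.#.",
    "..#..",
    "..#..",
    "..#..",
]
print(branch_points(skel))  # [(2, 2)]
```

Matching two faces then reduces to comparing the relative geometry of such point sets, which is what the local and global TMP matching stage does.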
Shirdel, Elize A.; Xie, Wing; Mak, Tak W.; Jurisica, Igor
2011-01-01
Background MicroRNAs are a class of small RNAs known to regulate gene expression at the transcript level, the protein level, or both. Since microRNA binding is sequence-based but possibly structure-specific, work in this area has resulted in multiple databases storing predicted microRNA:target relationships computed using diverse algorithms. We integrate prediction databases, compare predictions to in vitro data, and use cross-database predictions to model the microRNA:transcript interactome – referred to as the micronome – to study microRNA involvement in well-known signalling pathways as well as associations with disease. We make this data freely available with a flexible user interface as our microRNA Data Integration Portal — mirDIP (http://ophid.utoronto.ca/mirDIP). Results mirDIP integrates prediction databases to elucidate accurate microRNA:target relationships. Using NAViGaTOR to produce interaction networks implicating microRNAs in literature-based, KEGG-based and Reactome-based pathways, we find these signalling pathway networks have significantly more microRNA involvement compared to chance (p<0.05), suggesting microRNAs co-target many genes in a given pathway. Further examination of the micronome shows two distinct classes of microRNAs; universe microRNAs, which are involved in many signalling pathways; and intra-pathway microRNAs, which target multiple genes within one signalling pathway. We find universe microRNAs to have more targets (p<0.0001), to be more studied (p<0.0002), and to have higher degree in the KEGG cancer pathway (p<0.0001), compared to intra-pathway microRNAs. Conclusions Our pathway-based analysis of mirDIP data suggests microRNAs are involved in intra-pathway signalling. We identify two distinct classes of microRNAs, suggesting a hierarchical organization of microRNAs co-targeting genes both within and between pathways, and implying differential involvement of universe and intra-pathway microRNAs at the disease level. PMID:21364759
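A minimal sketch of the integration step: count, for each (microRNA, gene) pair, how many prediction databases agree, and keep well-supported pairs. The database names and pairs below are illustrative, not actual mirDIP contents:

```python
def consensus_targets(predictions, min_sources=2):
    """Integrate several microRNA-target prediction databases by
    counting how many sources support each (miRNA, gene) pair, keeping
    pairs seen in at least `min_sources` databases.
    """
    support = {}
    for source, pairs in predictions.items():
        for pair in pairs:
            support.setdefault(pair, set()).add(source)
    return {pair: sorted(srcs) for pair, srcs in support.items()
            if len(srcs) >= min_sources}

# Illustrative predictions from three hypothetical databases.
preds = {
    "dbA": {("miR-21", "PTEN"), ("miR-21", "PDCD4")},
    "dbB": {("miR-21", "PTEN"), ("miR-155", "SOCS1")},
    "dbC": {("miR-21", "PTEN")},
}
print(consensus_targets(preds))
# {('miR-21', 'PTEN'): ['dbA', 'dbB', 'dbC']}
```

Cross-database agreement of this kind is one simple way to favor accurate microRNA:target relationships over single-algorithm predictions.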
Stalder, Hanspeter; Hug, Corinne; Zanoni, Reto; Vogt, Hans-Rudolf; Peterhans, Ernst; Schweizer, Matthias; Bachofen, Claudia
2016-06-15
Pestiviruses infect a wide variety of animals of the order Artiodactyla, with bovine viral diarrhea virus (BVDV) being an economically important pathogen of livestock globally. BVDV is maintained in the cattle population by infecting fetuses early in gestation and, thus, by generating persistently infected (PI) animals that efficiently transmit the virus throughout their lifetime. In 2008, Switzerland started a national control campaign with the aim of eradicating BVDV from all bovines in the country by searching for and eliminating all PI cattle. Different from previous eradication programs, all animals of the entire population were tested for virus within one year, followed by testing each newborn calf in the subsequent four years. Overall, 3,855,814 animals were tested from 2008 through 2011, 20,553 of which returned an initial BVDV-positive result. We were able to obtain samples from at least 36% of all animals that initially tested positive. We sequenced the 5' untranslated region (UTR) of more than 7400 pestiviral strains and compiled the sequence data in a database together with an array of information on the PI animals, among others, the location of the farm in which they were born, their dams, and the locations where the animals had lived. To our knowledge, this is the largest database combining viral sequences with animal data of an endemic viral disease. Using unique identification tags, the different datasets within the database were connected to run diverse molecular epidemiological analyses. The large sets of animal and sequence data made it possible to run analyses in both directions, i.e., starting from a likely epidemiological link, or starting from related sequences. We present the results of three epidemiological investigations in detail and a compilation of 122 individual investigations that show the usefulness of such a database in a country-wide BVD eradication program. Copyright © 2015 Elsevier B.V. All rights reserved.
Surgical research using national databases
Leland, Hyuma; Heckmann, Nathanael
2016-01-01
Recent changes in healthcare and advances in technology have increased the use of large-volume national databases in surgical research. These databases have been used to develop perioperative risk stratification tools, assess postoperative complications, calculate costs, and investigate numerous other topics across multiple surgical specialties. The results of these studies contain variable information but are subject to unique limitations. The use of large-volume national databases is increasing in popularity, and thorough understanding of these databases will allow for a more sophisticated and better educated interpretation of studies that utilize such databases. This review will highlight the composition, strengths, and weaknesses of commonly used national databases in surgical research. PMID:27867945
G-Hash: Towards Fast Kernel-based Similarity Search in Large Graph Databases.
Wang, Xiaohong; Smalter, Aaron; Huan, Jun; Lushington, Gerald H
2009-01-01
Structured data, including sets, sequences, trees and graphs, pose significant challenges to fundamental aspects of data management such as efficient storage, indexing, and similarity search. With the fast accumulation of graph databases, similarity search in graph databases has emerged as an important research topic. Graph similarity search has applications in a wide range of domains including cheminformatics, bioinformatics, sensor network management, social network management, and XML documents, among others. Most of the current graph indexing methods focus on subgraph query processing, i.e. determining the set of database graphs that contain the query graph, and hence do not directly support similarity search. In data mining and machine learning, various graph kernel functions have been designed to capture the intrinsic similarity of graphs. Though successful in constructing accurate predictive and classification models for supervised learning, graph kernel functions have (i) high computational complexity and (ii) non-trivial difficulty being indexed in a graph database. Our objective is to bridge graph kernel functions and similarity search in graph databases by proposing (i) a novel kernel-based similarity measurement and (ii) an efficient indexing structure for graph data management. Our method of similarity measurement builds upon local features extracted from each node and its neighboring nodes in graphs. A hash table is utilized to support efficient storage and fast search of the extracted local features. Using the hash table, a graph kernel function is defined to capture the intrinsic similarity of graphs and to support fast similarity query processing. We have implemented our method, which we have named G-hash, and have demonstrated its utility on large chemical graph databases. Our results show that the G-hash method achieves state-of-the-art performance for k-nearest neighbor (k-NN) classification. Most importantly, the new similarity measurement and the index structure are scalable to large databases, with smaller indexing size, faster index construction time, and faster query processing time as compared to state-of-the-art indexing methods such as C-tree, gIndex, and GraphGrep.
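The kernel idea can be sketched as follows: extract a local feature for each node (its own label plus its sorted neighbour labels), hash the features into a common table, and score two graphs by how often identical features co-occur. This is a simplification of G-hash, shown on hypothetical toy graphs:

```python
from collections import Counter

def node_features(graph):
    """Local feature per node: (own label, sorted neighbour labels).
    `graph` maps node -> (label, [neighbour nodes]).
    """
    feats = []
    for node, (label, nbrs) in graph.items():
        nbr_labels = tuple(sorted(graph[n][0] for n in nbrs))
        feats.append((label, nbr_labels))
    return Counter(feats)

def hash_kernel(g1, g2):
    """Kernel value: co-occurrence count of identical local features,
    found by hashing each feature into a common table (a Counter here).
    """
    f1, f2 = node_features(g1), node_features(g2)
    return sum(cnt * f2[feat] for feat, cnt in f1.items())

# Two tiny molecule-like graphs sharing a terminal-carbon pattern.
g1 = {1: ("C", [2]), 2: ("C", [1, 3]), 3: ("C", [2])}
g2 = {1: ("C", [2]), 2: ("C", [1, 3]), 3: ("O", [2])}
print(hash_kernel(g1, g1) >= hash_kernel(g1, g2))  # True
```

Because features are hashed once per database graph, query-time similarity reduces to cheap table lookups, which is the source of the indexing speedup the abstract reports.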
SS-Wrapper: a package of wrapper applications for similarity searches on Linux clusters
Wang, Chunlin; Lefkowitz, Elliot J
2004-01-01
Background Large-scale sequence comparison is a powerful tool for biological inference in modern molecular biology. Comparing new sequences to those in annotated databases is a useful source of functional and structural information about these sequences. Using software such as the basic local alignment search tool (BLAST) or HMMPFAM to identify statistically significant matches between newly sequenced segments of genetic material and those in databases is an important task for most molecular biologists. Searching algorithms are intrinsically slow and data-intensive, especially in light of the rapid growth of biological sequence databases due to the emergence of high throughput DNA sequencing techniques. Thus, traditional bioinformatics tools are impractical on PCs and even on dedicated UNIX servers. To take advantage of larger databases and more reliable methods, high performance computation becomes necessary. Results We describe the implementation of SS-Wrapper (Similarity Search Wrapper), a package of wrapper applications that can parallelize similarity search applications on a Linux cluster. Our wrapper utilizes a query segmentation-search (QS-search) approach to parallelize sequence database search applications. It takes into consideration load balancing between each node on the cluster to maximize resource usage. QS-search is designed to wrap many different search tools, such as BLAST and HMMPFAM using the same interface. This implementation does not alter the original program, so newly obtained programs and program updates should be accommodated easily. Benchmark experiments using QS-search to optimize BLAST and HMMPFAM showed that QS-search accelerated the performance of these programs almost linearly in proportion to the number of CPUs used. 
We have also implemented a wrapper that utilizes a database segmentation approach (DS-BLAST) that provides a complementary solution for BLAST searches when the database is too large to fit into the memory of a single node. Conclusions Used together, QS-search and DS-BLAST provide a flexible solution to adapt sequential similarity searching applications in high performance computing environments. Their ease of use and their ability to wrap a variety of database search programs provide an analytical architecture to assist both the seasoned bioinformaticist and the wet-bench biologist. PMID:15511296
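The query-segmentation idea can be sketched as a load-balancing step: assign query sequences to cluster nodes so that total sequence length per node is roughly even. The greedy longest-first rule below is an illustrative choice, not necessarily SS-Wrapper's exact algorithm:

```python
def segment_queries(seqs, n_nodes):
    """Split query sequences into n_nodes chunks, balancing load by
    total sequence length (longest-first greedy assignment), in the
    spirit of the QS-search approach.
    """
    chunks = [[] for _ in range(n_nodes)]
    loads = [0] * n_nodes
    for name, seq in sorted(seqs.items(), key=lambda kv: -len(kv[1])):
        i = loads.index(min(loads))   # least-loaded node so far
        chunks[i].append(name)
        loads[i] += len(seq)
    return chunks, loads

# Hypothetical query set: sequence length stands in for search cost.
seqs = {"q1": "A" * 900, "q2": "A" * 500, "q3": "A" * 400, "q4": "A" * 100}
chunks, loads = segment_queries(seqs, 2)
print(loads)  # [1000, 900]
```

Each chunk would then be written to its own FASTA file and searched on one node with the unmodified BLAST or HMMPFAM binary, with the per-chunk outputs concatenated afterwards.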
EQUIP: A European Survey of Quality Criteria for the Evaluation of Databases.
ERIC Educational Resources Information Center
Wilson, T. D.
1998-01-01
Reports on two stages of an investigation into the perceived quality of online databases. Presents data from 989 questionnaires from 600 database users in 12 European and Scandinavian countries and results of a test of the SERVQUAL methodology for identifying user expectations about database services. Lists statements used in the SERVQUAL survey.…