NASA Astrophysics Data System (ADS)
Yin, Lucy; Andrews, Jennifer; Heaton, Thomas
2018-05-01
Earthquake parameter estimation using nearest-neighbor searching over a large database of observations can produce reliable predictions. However, in the real-time application of Earthquake Early Warning (EEW) systems, accurate prediction using a large database is penalized by a significant delay in processing time. We propose using a multidimensional binary search tree (KD tree) data structure to organize large seismic databases and thereby reduce the processing time of the nearest-neighbor search. We evaluated the performance of the KD tree on the Gutenberg Algorithm, a database-searching algorithm for EEW. We constructed an offline test to predict peak ground motions using a database whose feature sets are waveform filter-bank characteristics, and compared the results with the observed seismic parameters. We concluded that a large database provides more accurate predictions of ground-motion information, such as peak ground acceleration, velocity, and displacement (PGA, PGV, PGD), than of source parameters, such as hypocentral distance. Organizing the database with the KD tree reduced the average search time by 85% relative to the exhaustive method, making the approach feasible for real-time implementation. The algorithm is straightforward, and the results will reduce the overall time of warning delivery for EEW.
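A minimal sketch of the idea, assuming a SciPy KD tree over toy filter-bank features with invented variable names (not the Gutenberg Algorithm's actual interface): the tree is built once offline, and each real-time query then costs roughly O(log N) instead of a full scan.

```python
# Minimal sketch of KD-tree nearest-neighbor lookup for ground-motion
# prediction. Feature matrix, PGA table, and k are illustrative stand-ins.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
features = rng.random((100_000, 9))   # 9 filter-bank channels per record
pga = rng.lognormal(size=100_000)     # observed peak ground acceleration

tree = cKDTree(features)              # built once, offline

def predict_pga(query, k=30):
    """Predict PGA as the mean over the k nearest database records."""
    _, idx = tree.query(query, k=k)   # O(log N) per lookup vs O(N) exhaustive
    return pga[idx].mean()

print(predict_pga(rng.random(9)))
```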
Osteoporosis therapies: evidence from health-care databases and observational population studies.
Silverman, Stuart L
2010-11-01
Osteoporosis is a well-recognized disease with severe consequences if left untreated. Randomized controlled trials are the most rigorous method for determining the efficacy and safety of therapies. Nevertheless, randomized controlled trials underrepresent the real-world patient population and are costly in both time and money. Modern technology has enabled researchers to use information gathered from large health-care or medical-claims databases to assess the practical utilization of available therapies in appropriate patients. Observational database studies lack randomization but, if carefully designed and successfully completed, can provide valuable information that complements results obtained from randomized controlled trials and extends our knowledge to real-world clinical patients. Randomized controlled trials comparing fracture outcomes among osteoporosis therapies are difficult to perform. In this regard, large observational database studies could be useful in identifying clinically important differences among therapeutic options. Database studies can also provide important information with regard to osteoporosis prevalence, health economics, and compliance and persistence with treatment. This article describes the strengths and limitations of both randomized controlled trials and observational database studies, discusses considerations for observational study design, and reviews a wealth of information generated by database studies in the field of osteoporosis.
Validation of a common data model for active safety surveillance research
Ryan, Patrick B; Reich, Christian G; Hartzema, Abraham G; Stang, Paul E
2011-01-01
Objective: Systematic analysis of observational medical databases for active safety surveillance is hindered by the variation in data models and coding systems. Data analysts often find robust clinical data models difficult to understand and ill suited to support their analytic approaches. Further, some models do not facilitate the computations required for systematic analysis across many interventions and outcomes for large datasets. Translating the data from these idiosyncratic data models to a common data model (CDM) could facilitate both the analysts' understanding and the suitability for large-scale systematic analysis. In addition to facilitating analysis, a suitable CDM has to faithfully represent the source observational database. Before beginning to use the Observational Medical Outcomes Partnership (OMOP) CDM and a related dictionary of standardized terminologies for a study of large-scale systematic active safety surveillance, the authors validated the model's suitability for this use by example. Validation by example: To validate the OMOP CDM, the model was instantiated into a relational database, data from 10 different observational healthcare databases were loaded into separate instances, a comprehensive array of analytic methods that operate on the data model was created, and these methods were executed against the databases to measure performance. Conclusion: There was acceptable representation of the data from 10 observational databases in the OMOP CDM using the standardized terminologies selected, and a range of analytic methods was developed and executed with sufficient performance to be useful for active safety surveillance. PMID:22037893
NVST Data Archiving System Based On FastBit NoSQL Database
NASA Astrophysics Data System (ADS)
Liu, Ying-bo; Wang, Feng; Ji, Kai-fan; Deng, Hui; Dai, Wei; Liang, Bo
2014-06-01
The New Vacuum Solar Telescope (NVST) is a 1-meter vacuum solar telescope that aims to observe the fine structures of active regions on the Sun. The main tasks of the NVST are high-resolution imaging and spectral observations, including measurements of the solar magnetic field. The NVST has collected more than 20 million FITS files since it began routine observations in 2012 and produces up to 120 thousand observational files in a day. Given the large number of files, their effective archiving and retrieval becomes a critical and urgent problem. In this study, we implement a new data archiving system for the NVST based on the FastBit Not Only Structured Query Language (NoSQL) database. Compared to a relational database (i.e., MySQL; My Structured Query Language), the FastBit database shows distinct advantages in indexing and querying performance. In a large-scale database of 40 million records, multi-field combined queries on the FastBit database run about 15 times faster and fully meet the requirements of the NVST. Our study offers a new approach to massive astronomical data archiving and can contribute to the design of data management systems for other astronomical telescopes.
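The speedup FastBit reports comes from bitmap indexing, where each predicate is a precomputed bit vector and a multi-field combined query reduces to bitwise ANDs. A toy Python illustration of that principle (not FastBit's actual API):

```python
# Toy bitmap-index illustration: each condition of interest is a
# precomputed bit vector, so a combined query is a cheap bitwise AND.
import numpy as np

n = 40_000_000 // 1000            # scaled down for the example
rng = np.random.default_rng(1)
instrument = rng.integers(0, 4, n)   # e.g., camera/channel id per record
exposure = rng.random(n)             # e.g., exposure time per record

# "Index": one bit vector per predicate.
bits_channel2 = instrument == 2
bits_short = exposure < 0.1

hits = np.flatnonzero(bits_channel2 & bits_short)   # combined query
print(len(hits), "matching records")
```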
A Database as a Service for the Healthcare System to Store Physiological Signal Data.
Chang, Hsien-Tsung; Lin, Tsai-Huei
2016-01-01
Wearable devices that measure physiological signals to help develop self-health management habits have become increasingly popular in recent years. These records are conducive for follow-up health and medical care. In this study, based on the characteristics of the observed physiological signal records, namely (1) a large number of users, (2) a large amount of data, (3) low information variability, (4) data privacy authorization, and (5) data access by designated users, we wish to resolve physiological signal record-relevant issues utilizing the advantages of the Database as a Service (DaaS) model. Storing a large amount of data using file patterns can reduce database load, allowing users to access data efficiently; the privacy control settings allow users to store data securely. The results of the experiment show that the proposed system has better database access performance than a traditional relational database, with a small difference in database volume, thus proving that the proposed system can improve data storage performance. PMID:28033415
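A hedged sketch of the described file-pattern design, with an invented schema: bulk signal samples live in flat files, while the database keeps only per-user metadata and the privacy-control list.

```python
# Sketch only: bulky samples -> files; metadata + access list -> database.
import json, sqlite3, pathlib

db = sqlite3.connect("daas_meta.db")
db.execute("""CREATE TABLE IF NOT EXISTS records
              (user_id TEXT, kind TEXT, path TEXT, authorized TEXT)""")

def store_signal(user_id, kind, samples, authorized_users):
    path = pathlib.Path(f"{user_id}_{kind}.json")
    path.write_text(json.dumps(samples))          # bulk data -> file
    db.execute("INSERT INTO records VALUES (?,?,?,?)",
               (user_id, kind, str(path), ",".join(authorized_users)))
    db.commit()

def fetch_signal(requester, user_id, kind):
    row = db.execute("SELECT path, authorized FROM records "
                     "WHERE user_id=? AND kind=?", (user_id, kind)).fetchone()
    if row and requester in row[1].split(","):    # privacy check
        return json.loads(pathlib.Path(row[0]).read_text())
    raise PermissionError("not an authorized user")

store_signal("u1", "ecg", [0.10, 0.20, 0.15], ["u1", "dr_lee"])
print(fetch_signal("dr_lee", "u1", "ecg"))
```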
A Framework for Cloudy Model Optimization and Database Storage
NASA Astrophysics Data System (ADS)
Calvén, Emilia; Helton, Andrew; Sankrit, Ravi
2018-01-01
We present a framework for producing Cloudy photoionization models of the nebular emission from novae ejecta and storing a subset of the results in an SQL database for later use. The database can be searched for the models that best fit observed spectral line ratios. Additionally, the framework includes an optimization feature that can be used in tandem with the database to search for and improve on models by creating new Cloudy models while varying the parameters. The database search and optimization can be used to explore the structures of nebulae by deriving their properties from the best-fit models. The goal is to provide the community with a large database of Cloudy photoionization models, generated from parameters reflecting conditions within novae ejecta, that can be easily fitted to observed spectral lines, either by directly accessing the database using the framework code or through a website specifically made for this purpose.
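One plausible form of the database-search step, assuming an illustrative table layout (not the framework's actual schema): pick the stored model whose predicted line ratios minimize a chi-square distance to the observed ones.

```python
# Sketch of a best-fit lookup over stored line ratios; table, columns,
# observed values, and uncertainties are all invented for illustration.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE models (id INTEGER, ratio_oiii_hb REAL, ratio_nii_ha REAL)")
db.executemany("INSERT INTO models VALUES (?,?,?)",
               [(1, 3.1, 0.4), (2, 5.2, 0.9), (3, 1.8, 0.2)])

obs = {"oiii_hb": 3.0, "nii_ha": 0.5}      # observed line ratios
sig = {"oiii_hb": 0.3, "nii_ha": 0.1}      # measurement uncertainties

best = min(db.execute("SELECT id, ratio_oiii_hb, ratio_nii_ha FROM models"),
           key=lambda r: ((r[1] - obs["oiii_hb"]) / sig["oiii_hb"]) ** 2
                       + ((r[2] - obs["nii_ha"]) / sig["nii_ha"]) ** 2)
print("best-fitting model id:", best[0])
```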
NASA Technical Reports Server (NTRS)
Benson, Robert F.; Fainberg, Joseph; Osherovich, Vladimir A.; Truhlik, Vladimir; Wang, Yongli; Bilitza, Dieter; Fung, Shing F.
2015-01-01
Large magnetic-storm-induced changes have been detected in high-latitude topside vertical electron-density profiles Ne(h). The investigation was based on the large database of topside Ne(h) profiles and digital topside ionograms from the International Satellites for Ionospheric Studies (ISIS) program available from the NASA Space Physics Data Facility (SPDF) at http://spdf.gsfc.nasa.gov/isis/isis-status.html. This large database enabled Ne(h) profiles to be obtained when an ISIS satellite passed through nearly the same region of space before, during, and after a major magnetic storm. A major goal was to relate the magnetic-storm-induced high-latitude Ne(h) profile changes to solar-wind parameters. Thus an additional data constraint was to consider only storms for which solar-wind data were available from the NASA/SPDF OMNIWeb database. Ten large magnetic storms (with Dst less than -100 nT) were identified that satisfied both the Ne(h) profile and the solar-wind data constraints. During five of these storms, topside ionospheric Ne(h) profiles were available in the high-latitude northern hemisphere, and during the other five storms similar ionospheric data were available in the southern hemisphere. Large Ne(h) changes were observed during each of these storms. This paper concentrates on the northern hemisphere, where the data coverage was best for winter. Here, Ne(h) profile enhancements were always observed when the magnetic local time (MLT) was between 00 and 03, and Ne(h) profile depletions were always observed between 08 and 10 MLT. The observed Ne(h) deviations were compared with solar-wind parameters, with appropriate time shifts, for four storms.
Massive parallelization of serial inference algorithms for a complex generalized linear model
Suchard, Marc A.; Simpson, Shawn E.; Zorych, Ivan; Ryan, Patrick; Madigan, David
2014-01-01
Following a series of high-profile drug-safety disasters in recent years, many countries are redoubling their efforts to ensure the safety of licensed medical products. Large-scale observational databases such as claims databases or electronic health record systems are attracting particular attention in this regard, but present significant methodological and computational concerns. In this paper we show how high-performance statistical computation, including graphics processing units (GPUs), relatively inexpensive highly parallel computing devices, can enable complex methods in large databases. We focus on optimization and massive parallelization of cyclic coordinate descent approaches to fit a conditioned generalized linear model involving tens of millions of observations and thousands of predictors in a Bayesian context. We find orders-of-magnitude improvement in overall run-time. Coordinate descent approaches are ubiquitous in high-dimensional statistics, and the algorithms we propose open up exciting new methodological possibilities with the potential to significantly improve drug safety. PMID:25328363
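The core algorithm being parallelized is cyclic coordinate descent. A serial Python sketch for a ridge-penalized logistic model (a simplified stand-in for the paper's Bayesian conditioned GLM, not their implementation) shows where the per-coordinate dot products over all observations appear; those reductions are exactly the work a GPU can absorb.

```python
import numpy as np

def ccd_logistic(X, y, lam=1.0, sweeps=50):
    """Cyclic coordinate descent: one Newton step per coordinate per sweep.
    Each coordinate's work is a dot product over all n observations."""
    n, p = X.shape
    beta = np.zeros(p)
    eta = np.zeros(n)                  # X @ beta, maintained incrementally
    for _ in range(sweeps):
        for j in range(p):
            mu = 1.0 / (1.0 + np.exp(-eta))            # fitted probabilities
            g = X[:, j] @ (mu - y) + lam * beta[j]     # gradient in beta_j
            h = X[:, j] ** 2 @ (mu * (1 - mu)) + lam   # curvature in beta_j
            step = g / h
            beta[j] -= step
            eta -= step * X[:, j]      # cheap update instead of full X @ beta
    return beta

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 20))
y = (rng.random(1000) < 1 / (1 + np.exp(-X[:, 0]))).astype(float)
print(ccd_logistic(X, y)[:3])
```

The L2 penalty `lam` plays the role of a simple Gaussian prior; the paper's model and conditioning are richer, but the data-access pattern is the same.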
Suchard, Marc A; Zorych, Ivan; Simpson, Shawn E; Schuemie, Martijn J; Ryan, Patrick B; Madigan, David
2013-10-01
The self-controlled case series (SCCS) offers potential as a statistical method for risk identification involving medical products from large-scale observational healthcare data. However, analytic design choices remain in encoding longitudinal health records into the SCCS framework, and its risk-identification performance across real-world databases is unknown. Our aim was to evaluate the performance of SCCS and its design choices as a tool for risk identification in observational healthcare data. We examined the risk-identification performance of SCCS across five design choices using 399 drug-health outcome pairs in five real observational databases (four administrative claims and one electronic health records). In these databases, the pairs involve 165 positive controls and 234 negative controls. We also considered several synthetic databases with known relative risks between drug-outcome pairs. We evaluated risk-identification performance by estimating the area under the receiver-operator characteristic curve (AUC), as well as bias and coverage probability in the synthetic examples. The SCCS achieves strong predictive performance: twelve of the twenty health outcome-database scenarios return AUCs >0.75 across all drugs. Including all adverse events instead of just the first per patient and applying a multivariate adjustment for concomitant drug use are the most important design choices. However, the SCCS as applied here returns relative-risk point estimates biased towards the null value of 1 with low coverage probability. The SCCS, recently extended to apply a multivariate adjustment for concomitant drug use, offers promise as a statistical tool for risk identification in large-scale observational healthcare databases. Poor estimator calibration dampens enthusiasm, but ongoing work should correct this shortcoming.
Batista Rodríguez, Gabriela; Balla, Andrea; Fernández-Ananín, Sonia; Balagué, Carmen; Targarona, Eduard M
2018-05-01
The term big data refers to databases that include large amounts of information used in various areas of knowledge. Currently, there are large databases that allow the evaluation of postoperative evolution, such as the American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP), the Healthcare Cost and Utilization Project (HCUP) National Inpatient Sample (NIS), and the National Cancer Database (NCDB). The aim of this review was to evaluate the clinical impact of information obtained from these registries regarding gastroesophageal surgery. A systematic review using the Meta-analysis of Observational Studies in Epidemiology guidelines was performed. The search was carried out using the PubMed database, identifying 251 articles. All outcomes related to gastroesophageal surgery were analyzed. A total of 34 articles published between January 2007 and July 2017 were included, for a total of 345,697 patients. Studies were analyzed and divided according to the type of surgery and main theme into (1) esophageal surgery and (2) gastric surgery. The information provided by these databases is an effective way to obtain levels of evidence not obtainable by conventional methods. Furthermore, this information is useful for the external validation of previous studies, to establish benchmarks that allow comparisons between centers, and it has a positive impact on the quality of care.
NASA Astrophysics Data System (ADS)
Hueso, R.; Juaristi, J.; Legarreta, J.; Sánchez-Lavega, A.; Rojas, J. F.; Erard, S.; Cecconi, B.; Le Sidaner, Pierre
2018-01-01
Since 2003, the Planetary Virtual Observatory and Laboratory (PVOL) has been storing, and serving publicly through its website, a large database of amateur observations of the giant planets (Hueso et al., 2010a). These images are used for scientific research of the atmospheric dynamics and cloud structure on these planets and constitute a powerful resource to address time-variable phenomena in their atmospheres. Advances over the last decade in observation techniques, and a wider recognition by professional astronomers of the quality of amateur observations, have resulted in the need to upgrade this database. Here we present major advances in the PVOL database, which has evolved into a full virtual planetary observatory also encompassing observations of Mercury, Venus, Mars, the Moon, and the Galilean satellites. Besides the new objects, the images can be tagged, and the database allows simple and complex searches over the data. The new web service, PVOL2, is available online at http://pvol2.ehu.eus/.
Hirano, Yoko; Asami, Yuko; Kuribayashi, Kazuhiko; Kitazaki, Shigeru; Yamamoto, Yuji; Fujimoto, Yoko
2018-05-01
Many pharmacoepidemiologic studies using large-scale databases have recently been utilized to evaluate the safety and effectiveness of drugs in Western countries. In Japan, however, conventional methodology has been applied to postmarketing surveillance (PMS) to collect safety and effectiveness information on new drugs to meet regulatory requirements. Conventional PMS entails enormous costs and resources despite being an uncontrolled observational study method. This study aimed to examine the possibility of database research as a more efficient pharmacovigilance approach by comparing a health care claims database and PMS with regard to the characteristics and safety profiles of sertraline-prescribed patients. The characteristics of sertraline-prescribed patients recorded in a large-scale Japanese health insurance claims database developed by MinaCare Co. Ltd. were scanned and compared with the PMS results. We also explored the possibility of detecting signals indicative of adverse reactions based on the claims database by using sequence symmetry analysis. Diabetes mellitus, hyperlipidemia, and hyperthyroidism served as exploratory events, and their detection criteria for the claims database were reported by the Pharmaceuticals and Medical Devices Agency in Japan. Most of the characteristics of sertraline-prescribed patients in the claims database did not differ markedly from those in the PMS. There was no tendency for higher risks of the exploratory events after exposure to sertraline, and this was consistent with sertraline's known safety profile. Our results support the concept of using database research as a cost-effective pharmacovigilance tool that is free of selection bias. Further investigation using database research is required to confirm our preliminary observations.
[Benefits of large healthcare databases for drug risk research].
Garbe, Edeltraut; Pigeot, Iris
2015-08-01
Large electronic healthcare databases have become an important worldwide data resource for drug safety research after approval. Signal generation methods and drug safety studies based on these data facilitate the prospective monitoring of drug safety after approval, as has recently been required by EU law and the German Medicines Act. Despite its large size, a single healthcare database may include too few patients for the study of rarely used drugs or the investigation of very rare drug risks. For that reason, efforts have been made in the United States to develop models that link data from different electronic healthcare databases for monitoring the safety of medicines after authorization, in (i) the Sentinel Initiative and (ii) the Observational Medical Outcomes Partnership (OMOP). In July 2014, the pilot project Mini-Sentinel included a total of 178 million people from 18 different US databases. The merging of the data is based on a distributed data network with a common data model. In the European Network of Centres for Pharmacoepidemiology and Pharmacovigilance (ENCePP) there has been no comparable merging of data from different databases; however, first experiences have been gained in various EU drug safety projects. In Germany, the data of the statutory health insurance providers constitute the most important resource for establishing a large healthcare database. Their use for this purpose has so far been severely restricted by the Code of Social Law (Section 75, Book 10). Therefore, a reform of this section is absolutely necessary.
Applications of Precipitation Feature Databases from GPM core and constellation Satellites
NASA Astrophysics Data System (ADS)
Liu, C.
2017-12-01
Using observations from the Global Precipitation Measurement (GPM) mission core and constellation satellites, global precipitation is quantitatively described from the perspective of precipitation systems and their properties. This presentation will introduce the development of precipitation feature databases and several scientific questions that have been tackled using these databases, including the topics of global snow precipitation, extremely intense convection, hail storms, extreme precipitation, and microphysical properties derived with dual-frequency radars at the top of convective cores. As more and more observations from constellation satellites become available, it is anticipated that the precipitation-feature approach will help to address a large variety of scientific questions in the future. For anyone who is interested, all the current precipitation feature databases are freely open to the public at: http://atmos.tamucc.edu/trmm/.
Medication safety research by observational study design.
Lao, Kim S J; Chui, Celine S L; Man, Kenneth K C; Lau, Wallis C Y; Chan, Esther W; Wong, Ian C K
2016-06-01
Observational studies have been recognised as essential for investigating the safety profiles of medications. Numerous observational studies have been conducted on the platform of large population databases, which provide adequate sample sizes and follow-up lengths to detect infrequent and/or delayed clinical outcomes. Cohort and case-control designs are well-accepted traditional methodologies for hypothesis testing, while within-individual study designs are developing and evolving, addressing previously known methodological limitations to reduce confounding and bias. Examples of observational studies using medical databases are shown for each study design. The methodological characteristics, study assumptions, strengths, and weaknesses of each method are discussed in this review.
The electric dipole moment of DNA-binding HU protein calculated by the use of an NMR database.
Takashima, S; Yamaoka, K
1999-08-30
Electric birefringence measurements indicated the presence of a large permanent dipole moment in the HU protein-DNA complex. In order to substantiate this observation, numerical computation of the dipole moment of the HU protein homodimer was carried out using NMR protein databases. The dipole moments of globular proteins have hitherto been calculated with X-ray databases, and NMR data have never been used before. The advantages of NMR databases are: (a) NMR data are obtained, unlike X-ray databases, using protein solutions. Accordingly, this method eliminates the bothersome question as to the possible alteration of the protein structure in the transition from the crystalline state to the solution state. This question is particularly important for proteins such as HU protein, which has some degree of internal flexibility; (b) the three-dimensional coordinates of hydrogen atoms in protein molecules can be determined with sufficient resolution, which enables the N-H as well as C=O bond moments to be calculated. Since the NMR database of HU protein from Bacillus stearothermophilus consists of 25 models, the surface-charge as well as the core dipole moments were computed for each of these structures. The results of these calculations show that the net permanent dipole moment of the HU protein homodimer is approximately 500-530 D (1 D = 3.33 × 10⁻³⁰ C m) at pH 7.5 and 600-630 D at the isoelectric point (pH 10.5). These permanent dipole moments are unusually large for a small protein of 19.5 kDa. Nevertheless, the result of the numerical calculations is compatible with the electro-optical observation, confirming a very large dipole moment in this protein.
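The underlying computation is the point-charge dipole sum p = Σᵢ qᵢ rᵢ, averaged over the 25 NMR models. A small worked sketch with placeholder charges and coordinates (not the actual HU-protein structures):

```python
# Dipole moment of a point-charge distribution: p = sum_i q_i * r_i.
# With q in elementary charges and r in angstroms, 1 e*angstrom = 4.803 D.
# Charges/coordinates below are invented; a net-neutral set keeps the
# result independent of the choice of origin.
import numpy as np

E_ANGSTROM_TO_DEBYE = 4.803

charges = np.array([+1.0, -1.0, +0.5, -0.5])         # partial charges [e]
coords = np.array([[0.0, 0.0, 0.0],                  # positions [angstrom]
                   [1.5, 0.0, 0.0],
                   [0.0, 2.0, 0.0],
                   [0.0, 0.0, 2.5]])

p_vec = (charges[:, None] * coords).sum(axis=0) * E_ANGSTROM_TO_DEBYE
print("dipole magnitude:", np.linalg.norm(p_vec), "D")
```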
Wise, Gregory R; Schwartz, Brian P; Dittoe, Nathaniel; Safar, Ammar; Sherman, Steven; Bowdy, Bruce; Hahn, Harvey S
2012-06-01
Percutaneous coronary intervention (PCI) is the most commonly used procedure for coronary revascularization. There are multiple adjuvant anticoagulation strategies available. In this era of cost containment, we performed a comparative-effectiveness analysis of the clinical outcomes and cost of the major anticoagulant strategies across all types of PCI procedures in a large observational database. A retrospective comparative-effectiveness analysis of the Premier observational database was conducted to determine the impact of anticoagulant treatment on outcomes. Multiple linear regression and logistic regression models were used to assess the association of initial antithrombotic treatment with outcomes while controlling for other factors. A total of 458,448 inpatient PCI procedures with known antithrombotic regimen from 299 hospitals between January 1, 2004 and March 31, 2008 were identified. Compared to patients treated with heparin plus a glycoprotein IIb/IIIa inhibitor (GPI), bivalirudin was associated with a 41% relative risk reduction (RRR) for inpatient mortality, a 44% RRR for clinically apparent bleeding, and a 37% RRR for any transfusion. Furthermore, treatment with bivalirudin alone resulted in a cost savings of $976 per case. Similar results were seen between bivalirudin and heparin in all endpoints. Combined use of both bivalirudin and a GPI substantially attenuated the cost benefits demonstrated with bivalirudin alone. Bivalirudin use was associated with both improved clinical outcomes and decreased hospital costs in this large "real-world" database. To our knowledge, this study is the first to demonstrate the ideal comparative-effectiveness endpoint of both improved clinical outcomes and decreased costs in PCI.
A Review of Stellar Abundance Databases and the Hypatia Catalog Database
NASA Astrophysics Data System (ADS)
Hinkel, Natalie Rose
2018-01-01
The astronomical community is interested in elements from lithium to thorium, from solar twins to peculiarities of stellar evolution, because they give insight into different regimes of star formation and evolution. However, while some trends between elements and other stellar or planetary properties are well known, many other trends are not as obvious and are a point of conflict. For example, stars that host giant planets are found to be consistently enriched in iron, but the same cannot be definitively said for any other element. Therefore, it is time to take advantage of large stellar abundance databases in order to better understand not only the large-scale patterns, but also the more subtle, small-scale trends within the data. In this overview to the special session, I will present a review of large stellar abundance databases that are both currently available (e.g., RAVE, APOGEE) and soon to come online (e.g., Gaia-ESO, GALAH). Additionally, I will discuss the Hypatia Catalog Database (www.hypatiacatalog.com), which includes abundances from individual literature sources that observed stars within 150 pc. The Hypatia Catalog currently contains 72 elements as measured within ~6000 stars, with a total of ~240,000 unique abundance determinations. The online database offers a variety of solar normalizations, stellar properties, and planetary properties (where applicable) that can all be viewed through multiple interactive plotting interfaces as well as in a tabular format. By analyzing stellar abundances for large populations of stars and from a variety of different perspectives, a wealth of information can be revealed on both large and small scales.
Smeets, Hugo M; de Wit, Niek J; Hoes, Arno W
2011-04-01
Observational studies performed within routine health care databases have the advantage of their large size and, when the aim is to assess the effect of interventions, can complement randomized controlled trials, which usually involve small samples in experimental settings. Institutional Health Insurance Databases (HIDs) are attractive for research because of their large size, their longitudinal perspective, and their practice-based information. As they are based on financial reimbursement, the information is generally reliable. The database of one of the major insurance companies in the Netherlands, the Agis Health Database (AHD), is described in detail. Whether the AHD data sets meet the specific requirements for conducting several types of clinical studies is discussed according to the classification of the four different types of clinical research, that is, diagnostic, etiologic, prognostic, and intervention research. The potential of the AHD for these various types of research is illustrated using examples of studies recently conducted in the AHD. HIDs such as the AHD offer large potential for several types of clinical research, in particular etiologic and intervention studies, but at present the lack of detailed clinical information is an important limitation.
NASA Astrophysics Data System (ADS)
Rainer, M.; Poretti, E.; Mistò, A.; Panzera, M. R.; Molinaro, M.; Cepparo, F.; Roth, M.; Michel, E.; Monteiro, M. J. P. F. G.
2016-12-01
We created a large database of physical parameters and variability indicators by fully reducing and analyzing the large number of spectra taken to complement the asteroseismic observations of the COnvection, ROtation and planetary Transits (CoRoT) satellite. 7103 spectra of 261 stars obtained with the ESO echelle spectrograph HARPS have been stored in the VO-compliant database Spectroscopic Indicators in a SeisMic Archive (SISMA), along with the CoRoT photometric data of the 72 CoRoT asteroseismic targets. The remaining stars belong to the same variable classes as the CoRoT targets and were observed to better characterize the properties of those classes. Several useful variability indicators (mean line profiles; indices of differential rotation, activity, and emission lines) together with v sin i and radial-velocity measurements have been extracted from the spectra. The atmospheric parameters Teff, log g, and [Fe/H] have been computed following a homogeneous procedure. As a result, we fully characterize a sample of new and known variable stars by computing several spectroscopic indicators, also providing some cases of simultaneous photometry and spectroscopy.
Aegerter, Philippe; Bendersky, Noelle; Tran, Thi-Chien; Ropers, Jacques; Taright, Namik; Chatellier, Gilles
2014-01-01
Recruitment of large samples of patients is crucial for the evidence level and efficacy of clinical trials (CTs). Clinical Trial Recruitment Support Systems (CTRSSs) used to estimate patient recruitment are generally specific to particular hospital information systems, and few have been evaluated on a large number of trials. Our aim was to assess, for a large number of CTs, the usefulness of commonly available data such as Diagnosis Related Group (DRG) databases in order to estimate potential recruitment. We used the DRG database of a large French multicenter medical institution (1.2 million inpatient stays and 400 new trials each year). Eligibility criteria of protocols were broken down into atomic entities (diagnosis, procedures, treatments...) and then translated into codes and operators recorded in a standardized form. A program parsed the forms and generated requests on the DRG database. A large majority of selection criteria could be coded, and the final estimates of the number of eligible patients were close to the observed ones (median difference = 25). Such a system could be part of the feasibility evaluation and center-selection process before the start of a clinical trial.
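A minimal sketch of that parsing step, with an invented stay table and criteria form (the paper's actual codes and operators are richer):

```python
# Sketch: atomic criteria as (field, operator, value) triples from the
# standardized form, assembled into a count query on a DRG-like table.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE stays (age INT, icd10 TEXT, proc TEXT)")
db.executemany("INSERT INTO stays VALUES (?,?,?)",
               [(72, "I21", "PCI"), (45, "I21", "CABG"), (80, "I50", "PCI")])

criteria = [("age", ">=", 65), ("icd10", "=", "I21")]   # from the form

where = " AND ".join(f"{field} {op} ?" for field, op, _ in criteria)
params = [value for _, _, value in criteria]
n_eligible, = db.execute(
    f"SELECT COUNT(*) FROM stays WHERE {where}", params).fetchone()
print("estimated eligible patients:", n_eligible)
```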
NoSQL for Storage and Retrieval of Large LiDAR Data Collections
NASA Astrophysics Data System (ADS)
Boehm, J.; Liu, K.
2015-08-01
Developments in LiDAR technology over the past decades have made LiDAR a mature and widely accepted source of geospatial information. This in turn has led to an enormous growth in data volume. The central idea of a file-centric storage of LiDAR point clouds is the observation that large collections of LiDAR data are typically delivered as large collections of files, rather than as single files of terabyte size. This split of the dataset, commonly referred to as tiling, was usually done to accommodate a specific processing pipeline, so it makes sense to preserve it. A document-oriented NoSQL database can easily emulate this data partitioning by representing each tile (file) as a separate document. The document stores the metadata of the tile, while the actual files are stored in a distributed file system emulated by the NoSQL database. We demonstrate the use of MongoDB, a highly scalable document-oriented NoSQL database, for storing large LiDAR files. MongoDB, like any NoSQL database, allows queries on the attributes of the document; notably, it also supports spatial queries, so we can perform spatial queries on the bounding boxes of the LiDAR tiles. Inserting and retrieving files on a cloud-based database is compared to native file-system and cloud-storage transfer speeds.
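A short pymongo sketch of this layout, assuming a local MongoDB server and our own invented document fields (not the paper's code): one document per tile carrying a GeoJSON footprint for spatial queries, with the raw bytes in GridFS.

```python
# One document per LAS tile (metadata + footprint); raw file -> GridFS.
# 2dsphere indexes expect lon/lat coordinates, used here for simplicity.
from pymongo import MongoClient, GEOSPHERE
import gridfs

db = MongoClient()["lidar"]
fs = gridfs.GridFS(db)
db.tiles.create_index([("bbox", GEOSPHERE)])

def ingest_tile(path, min_xy, max_xy, n_points):
    (x0, y0), (x1, y1) = min_xy, max_xy
    with open(path, "rb") as f:
        file_id = fs.put(f, filename=path)        # raw tile bytes
    db.tiles.insert_one({
        "filename": path, "points": n_points, "file_id": file_id,
        "bbox": {"type": "Polygon",               # tile footprint
                 "coordinates": [[[x0, y0], [x1, y0], [x1, y1],
                                  [x0, y1], [x0, y0]]]}})

# Spatial query: which tiles intersect a region of interest?
roi = {"type": "Polygon", "coordinates": [[[0, 0], [1, 0], [1, 1],
                                           [0, 1], [0, 0]]]}
for tile in db.tiles.find({"bbox": {"$geoIntersects": {"$geometry": roi}}}):
    print(tile["filename"], tile["points"])
```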
Bianca N. I. Eskelson; Hailemariam Temesgen; Valerie Lemay; Tara M. Barrett; Nicholas L. Crookston; Andrew T. Hudak
2009-01-01
Almost universally, forest inventory and monitoring databases are incomplete, ranging from missing data for only a few records and a few variables, common for small land areas, to missing data for many observations and many variables, common for large land areas. For a wide variety of applications, nearest neighbor (NN) imputation methods have been developed to fill in...
Batista Rodríguez, Gabriela; Balla, Andrea; Corradetti, Santiago; Martinez, Carmen; Hernández, Pilar; Bollo, Jesús; Targarona, Eduard M
2018-06-01
"Big data" refers to large amount of dataset. Those large databases are useful in many areas, including healthcare. The American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP) and the National Inpatient Sample (NIS) are big databases that were developed in the USA in order to record surgical outcomes. The aim of the present systematic review is to evaluate the type and clinical impact of the information retrieved through NISQP and NIS big database articles focused on laparoscopic colorectal surgery. A systematic review was conducted using The Meta-Analysis Of Observational Studies in Epidemiology (MOOSE) guidelines. The research was carried out on PubMed database and revealed 350 published papers. Outcomes of articles in which laparoscopic colorectal surgery was the primary aim were analyzed. Fifty-five studies, published between 2007 and February 2017, were included. Articles included were categorized in groups according to the main topic as: outcomes related to surgical technique comparisons, morbidity and perioperatory results, specific disease-related outcomes, sociodemographic disparities, and academic training impact. NSQIP and NIS databases are just the tip of the iceberg for the potential application of Big Data technology and analysis in MIS. Information obtained through big data is useful and could be considered as external validation in those situations where a significant evidence-based medicine exists; also, those databases establish benchmarks to measure the quality of patient care. Data retrieved helps to inform decision-making and improve healthcare delivery.
Tropical Cyclone Information System
NASA Technical Reports Server (NTRS)
Li, P. Peggy; Knosp, Brian W.; Vu, Quoc A.; Yi, Chao; Hristova-Veleva, Svetla M.
2009-01-01
The JPL Tropical Cyclone Information System (TCIS) is a Web portal (http://tropicalcyclone.jpl.nasa.gov) that provides researchers with an extensive set of observed hurricane parameters together with large-scale and convection-resolving model outputs. It provides a comprehensive set of high-resolution satellite, airborne, and in-situ observations in both image and data formats. Large-scale datasets depict the surrounding environmental parameters such as SST (Sea Surface Temperature) and aerosol loading. Model outputs and analysis tools are provided to evaluate model performance and compare observations from different platforms. The system pertains to the thermodynamic and microphysical structure of the storm, the air-sea interaction processes, and the larger-scale environment as depicted by ocean heat content and the aerosol loading of the environment. Currently, the TCIS is populated with satellite observations of all tropical cyclones observed globally during 2005. There is a plan to extend the database both forward in time to the present and backward to 1998. The portal is powered by a MySQL database and an Apache/Tomcat Web server on a Linux system. The interactive graphical user interface is provided by Google Maps.
A comparison of database systems for XML-type data.
Risse, Judith E; Leunissen, Jack A M
2010-01-01
In the field of bioinformatics, interchangeable data formats based on XML are widely used. XML-type data is also at the core of most web services. With the increasing amount of data stored in XML comes the need for storing and accessing the data. In this paper we analyse the suitability of different database systems for storing and querying large datasets in general and Medline in particular. All reviewed database systems perform well when tested with small to medium sized datasets; however, when the full Medline dataset is queried, a large variation in query times is observed. There is no one system that is vastly superior to the others in this comparison and, depending on the database size and the query requirements, different systems are most suitable. The best all-round solution is the Oracle 11g database system using the new binary storage option. Alias-i's LingPipe is a more lightweight, customizable and sufficiently fast solution. It does, however, require more initial configuration steps. For data with a changing XML structure, Sedna and BaseX as native XML database systems, or MySQL with an XML-type column, are suitable.
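For flavor, a standard-library sketch of the kind of lookup such benchmarks exercise, run here on simplified Medline-like XML (element names are our assumptions, and a real benchmark would query millions of citations):

```python
# Illustrative query: find an article title by PMID in Medline-style XML.
import xml.etree.ElementTree as ET

medline = """<MedlineCitationSet>
  <MedlineCitation><PMID>111</PMID>
    <Article><ArticleTitle>XML storage benchmark</ArticleTitle></Article>
  </MedlineCitation>
  <MedlineCitation><PMID>222</PMID>
    <Article><ArticleTitle>Native XML databases</ArticleTitle></Article>
  </MedlineCitation>
</MedlineCitationSet>"""

root = ET.fromstring(medline)
for cit in root.findall("MedlineCitation"):
    if cit.findtext("PMID") == "222":
        print(cit.findtext("Article/ArticleTitle"))
```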
Menditto, Enrica; Bolufer De Gea, Angela; Cahir, Caitriona; Marengoni, Alessandra; Riegler, Salvatore; Fico, Giuseppe; Costa, Elisio; Monaco, Alessandro; Pecorelli, Sergio; Pani, Luca; Prados-Torres, Alexandra
2016-01-01
Computerized health care databases have been widely described as an excellent opportunity for research. The availability of "big data" has brought about a wave of innovation in projects when conducting health services research. Most of the available secondary data sources are restricted to the geographical scope of a given country and present heterogeneous structure and content. Under the umbrella of the European Innovation Partnership on Active and Healthy Ageing, collaborative work conducted by the partners of the group on "adherence to prescription and medical plans" identified the use of observational and large-population databases to monitor medication-taking behavior in the elderly. This article describes the methodology used to gather the information from available databases among the Adherence Action Group partners with the aim of improving data sharing on a European level. A total of six databases belonging to three different European countries (Spain, Republic of Ireland, and Italy) were included in the analysis. Preliminary results suggest that there are some similarities. However, these results should be applied in different contexts and European countries, supporting the idea that large European studies should be designed in order to get the most out of already available databases. PMID:27358570
Yoo, Seong Yeon; Cho, Nam Soo; Park, Myung Jin; Seong, Ki Min; Hwang, Jung Ho; Song, Seok Bean; Han, Myun Soo; Lee, Won Tae; Chung, Ki Wha
2011-01-01
Genotyping of highly polymorphic short tandem repeat (STR) markers is widely used for the genetic identification of individuals in forensic DNA analyses and in paternity disputes. The National DNA Profile Databank recently established by the DNA Identification Act in Korea contains the computerized STR DNA profiles of individuals convicted of crimes. For the establishment of a large autosomal STR loci population database, 1805 samples were obtained at random from Korean individuals and 15 autosomal STR markers were analyzed using the AmpFlSTR Identifiler PCR Amplification kit. For the 15 autosomal STR markers, no deviations from the Hardy-Weinberg equilibrium were observed. The most informative locus in our data set was the D2S1338 with a discrimination power of 0.9699. The combined matching probability was 1.521 × 10⁻¹⁷. This large STR profile dataset including atypical alleles will be important for the establishment of the Korean DNA database and for forensic applications. PMID:21597912
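The reported statistics follow from per-locus genotype frequencies: the match probability at a locus is the sum of squared genotype frequencies, its complement is the power of discrimination, and the combined matching probability is the product across independent loci. A worked toy example with made-up frequencies:

```python
# Toy per-locus genotype frequencies (truncated, invented distributions).
locus_genotype_freqs = {
    "D2S1338": [0.05, 0.04, 0.03, 0.02],
    "TH01":    [0.20, 0.15, 0.10],
}

combined_pm = 1.0
for locus, freqs in locus_genotype_freqs.items():
    pm = sum(f * f for f in freqs)       # match probability at this locus
    print(locus, "power of discrimination =", 1 - pm)
    combined_pm *= pm                    # product across independent loci

print("combined matching probability:", combined_pm)
```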
van Staa, T-P; Klungel, O; Smeeth, L
2014-06-01
A solid foundation of evidence of the effects of an intervention is a prerequisite of evidence-based medicine. The best source of such evidence is considered to be randomized trials, which are able to avoid confounding. However, they may not always estimate effectiveness in clinical practice. Databases that collate anonymized electronic health records (EHRs) from different clinical centres have been widely used for many years in observational studies. Randomized point-of-care trials have been initiated recently to recruit and follow patients using the data from EHR databases. In this review, we describe how EHR databases can be used for conducting large-scale simple trials and discuss the advantages and disadvantages of their use.
BAO Plate Archive Project: Digitization, Electronic Database and Research Programmes
NASA Astrophysics Data System (ADS)
Mickaelian, A. M.; Abrahamyan, H. V.; Andreasyan, H. R.; Azatyan, N. M.; Farmanyan, S. V.; Gigoyan, K. S.; Gyulzadyan, M. V.; Khachatryan, K. G.; Knyazyan, A. V.; Kostandyan, G. R.; Mikayelyan, G. A.; Nikoghosyan, E. H.; Paronyan, G. M.; Vardanyan, A. V.
2016-06-01
The most important part of the astronomical observational heritage is the astronomical plate archives created on the basis of numerous observations at many observatories. The Byurakan Astrophysical Observatory (BAO) plate archive consists of 37,000 photographic plates and films obtained with the 2.6m telescope, the 1m and 0.5m Schmidt-type telescopes, and other smaller telescopes during 1947-1991. In 2002-2005, the 1874 plates of the famous Markarian Survey (also called the First Byurakan Survey, FBS) were digitized and the Digitized FBS (DFBS) was created. New science projects have been conducted based on this low-dispersion spectroscopic material. A large project on the digitization of the whole BAO Plate Archive, the creation of an electronic database, and its scientific usage was started in 2015. A Science Program Board has been created to evaluate the observing material, investigate new possibilities, and propose new projects based on the combined usage of these observations together with other world databases. The Executing Team consists of 11 astronomers and 2 computer scientists and will use 2 EPSON Perfection V750 Pro scanners for the digitization; the Armenian Virtual Observatory (ArVO) database will be used to accommodate all new data. The project will run during 3 years in 2015-2017, and the final result will be an electronic database and an online interactive sky map to be used for further research projects, mainly including high-proper-motion stars, variable objects, and Solar System bodies.
Vivar, Juan C; Pemu, Priscilla; McPherson, Ruth; Ghosh, Sujoy
2013-08-01
Unparalleled technological advances have fueled an explosive growth in the scope and scale of biological data and have propelled life sciences into the realm of "Big Data" that cannot be managed or analyzed by conventional approaches. Big Data in the life sciences are driven primarily via a diverse collection of 'omics'-based technologies, including genomics, proteomics, metabolomics, transcriptomics, metagenomics, and lipidomics. Gene-set enrichment analysis is a powerful approach for interrogating large 'omics' datasets, leading to the identification of biological mechanisms associated with observed outcomes. While several factors influence the results from such analysis, the impact from the contents of pathway databases is often under-appreciated. Pathway databases often contain variously named pathways that overlap with one another to varying degrees. Ignoring such redundancies during pathway analysis can lead to the designation of several pathways as being significant due to high content-similarity, rather than truly independent biological mechanisms. Statistically, such dependencies also result in correlated p values and overdispersion, leading to biased results. We investigated the level of redundancies in multiple pathway databases and observed large discrepancies in the nature and extent of pathway overlap. This prompted us to develop the application, ReCiPa (Redundancy Control in Pathway Databases), to control redundancies in pathway databases based on user-defined thresholds. Analysis of genomic and genetic datasets, using ReCiPa-generated overlap-controlled versions of KEGG and Reactome pathways, led to a reduction in redundancy among the top-scoring gene-sets and allowed for the inclusion of additional gene-sets representing possibly novel biological mechanisms. Using obesity as an example, bioinformatic analysis further demonstrated that gene-sets identified from overlap-controlled pathway databases show stronger evidence of prior association to obesity compared to pathways identified from the original databases.
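A minimal sketch of the kind of redundancy control described, assuming overlap is measured on shared genes and that pathways above a user-defined threshold are merged (invented pathway contents; not ReCiPa's actual code):

```python
def overlap(a, b):
    """Containment-style overlap: shared genes over the smaller set."""
    return len(a & b) / min(len(a), len(b))

def merge_redundant(pathways, threshold=0.8):
    merged = {}
    for name, genes in pathways.items():
        for mname, mgenes in merged.items():
            if overlap(set(genes), mgenes) >= threshold:
                merged[mname] = mgenes | set(genes)   # absorb redundant set
                break
        else:
            merged[name] = set(genes)
    return merged

kegg_like = {"Glycolysis": {"HK1", "PFKM", "PKM", "ALDOA"},
             "Glycolysis/Gluconeogenesis": {"HK1", "PFKM", "PKM", "FBP1"},
             "TCA cycle": {"CS", "IDH1", "SDHA"}}
print(list(merge_redundant(kegg_like)))   # two of the three survive
```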
[On the detection of exoplanets in the ASAS-3 database]
NASA Astrophysics Data System (ADS)
Huemmerich, Stefan; Bernhard, Klaus
2015-02-01
Under favourable circumstances, transits of known exoplanets with large amplitudes like WASP-18 b can be observed in the ASAS-3 database. An attempt to search for exoplanets using ASAS-3 data is discussed.
An Evaluation of Database Solutions to Spatial Object Association
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kumar, V S; Kurc, T; Saltz, J
2008-06-24
Object association is a common problem encountered in many applications. Spatial object association, also referred to as crossmatch of spatial datasets, is the problem of identifying and comparing objects in two datasets based on their positions in a common spatial coordinate system; one of the datasets may correspond to a catalog of objects observed over time in a multi-dimensional domain, while the other dataset may consist of objects observed in a snapshot of the domain at a time point. The use of database management systems to solve the object association problem provides portability across different platforms and also greater flexibility. Increasing dataset sizes in today's applications, however, have made object association a data/compute-intensive problem that requires targeted optimizations for efficient execution. In this work, we investigate how database-based crossmatch algorithms can be deployed on different database system architectures and evaluate the deployments to understand the impact of architectural choices on crossmatch performance and associated trade-offs. We investigate the execution of two crossmatch algorithms on (1) a parallel database system with active disk style processing capabilities, (2) a high-throughput network database (MySQL Cluster), and (3) shared-nothing databases with replication. We have conducted our study in the context of a large-scale astronomy application with real use-case scenarios.
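A compact sketch of a database crossmatch, using an illustrative schema (not the paper's algorithms): a coarse SQL box join prefilters candidate pairs, then an exact small-angle distance test confirms matches within an assumed 1-arcsecond radius.

```python
# Coarse SQL prefilter on RA/Dec boxes, exact angular test in Python.
import math, sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE cat_a (id INT, ra REAL, dec REAL)")
db.execute("CREATE TABLE cat_b (id INT, ra REAL, dec REAL)")
db.executemany("INSERT INTO cat_a VALUES (?,?,?)",
               [(1, 150.0001, 2.0001), (2, 151.0, 2.5)])
db.executemany("INSERT INTO cat_b VALUES (?,?,?)",
               [(10, 150.0, 2.0), (11, 152.0, 3.0)])

r = 1.0 / 3600.0                      # 1 arcsec in degrees
pairs = db.execute("""
    SELECT a.id, b.id, a.ra, a.dec, b.ra, b.dec
    FROM cat_a a JOIN cat_b b
      ON b.dec BETWEEN a.dec - :r AND a.dec + :r
     AND b.ra  BETWEEN a.ra  - :r AND a.ra  + :r""",
    {"r": 2 * r}).fetchall()          # generous box absorbs cos(dec)

for aid, bid, ra1, d1, ra2, d2 in pairs:       # exact small-angle distance
    dra = (ra1 - ra2) * math.cos(math.radians(d1))
    if math.hypot(dra, d1 - d2) <= r:
        print("match:", aid, "<->", bid)
```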
Huel, René L. M.; Bašić, Lara; Madacki-Todorović, Kamelija; Smajlović, Lejla; Eminović, Izet; Berbić, Irfan; Miloš, Ana; Parsons, Thomas J.
2007-01-01
Aim: To present a compendium of off-ladder alleles and other genotyping irregularities relating to rare/unexpected population genetic variation, observed in a large short tandem repeat (STR) database from Bosnia and Serbia. Methods: DNA was extracted from blood stain cards relating to reference samples from a population of 32,800 individuals from Bosnia and Serbia, and typed using Promega's PowerPlex®16 STR kit. Results: Thirty-one distinct off-ladder alleles were observed in 10 of the 15 STR loci amplified with the PowerPlex®16 STR kit. Of these 31 alleles, 3 have not been previously reported. Furthermore, 16 instances of triallelic patterns were observed in 9 of the 15 loci. Primer-binding-site mismatches that affected amplification were observed in two loci, D5S818 and D8S1179. Conclusion: Instances of deviations from the manufacturer's allelic ladders should be expected, and caution should be taken to properly designate the correct alleles in large DNA databases. Particular care should be taken in kinship-matching or paternity cases, as incorrect designation of any of these deviations from allelic ladders could lead to false exclusions. PMID:17696304
NASA Astrophysics Data System (ADS)
Appel, Marius; Lahn, Florian; Pebesma, Edzer; Buytaert, Wouter; Moulds, Simon
2016-04-01
Today's amount of freely available data requires scientists to spend large parts of their work on data management. This is especially true in environmental sciences when working with large remote sensing datasets, such as those obtained from earth-observation satellites like the Sentinel fleet. Many frameworks like SpatialHadoop or Apache Spark address the scalability but target programmers rather than data analysts, and are not dedicated to imagery or array data. In this work, we use the open-source data management and analytics system SciDB to bring large earth-observation datasets closer to analysts. Its underlying data representation as multidimensional arrays fits naturally to earth-observation datasets, distributes storage and computational load over multiple instances by multidimensional chunking, and also enables efficient time-series-based analyses, which are usually difficult using file- or tile-based approaches. Existing interfaces to R and Python furthermore allow for scalable analytics with relatively little learning effort. However, interfacing SciDB and file-based earth-observation datasets that come as tiled temporal snapshots requires a lot of manual bookkeeping during ingestion, and SciDB natively only supports loading data from CSV-like and custom binary formatted files, which currently limits its practical use in earth-observation analytics. To make it easier to work with large multi-temporal datasets in SciDB, we developed software tools that enrich SciDB with earth observation metadata and allow working with commonly used file formats: (i) the SciDB extension library scidb4geo simplifies working with spatiotemporal arrays by adding relevant metadata to the database, and (ii) the Geospatial Data Abstraction Library (GDAL) driver implementation scidb4gdal allows to ingest and export remote sensing imagery from and to a large number of file formats. Using added metadata on temporal resolution and coverage, the GDAL driver supports time-based ingestion of imagery to existing multi-temporal SciDB arrays. While our SciDB plugin works directly in the database, the GDAL driver has been specifically developed using a minimum amount of external dependencies (i.e., cURL). Source code for both tools is available from GitHub [1]. We present these tools in a case study that demonstrates the ingestion of multi-temporal tiled earth-observation data into SciDB, followed by a time-series analysis using R and SciDBR. Through the exclusive use of open-source software, our approach supports reproducibility in scalable large-scale earth-observation analytics. In the future, these tools can be used in an automated way to let scientists work only on ready-to-use SciDB arrays, significantly reducing the data management workload for domain scientists. [1] https://github.com/mappl/scidb4geo and https://github.com/mappl/scidb4gdal
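A hedged sketch of the intended per-tile ingestion call; the "SCIDB:" destination string below is our assumption for illustration and may differ from the syntax scidb4gdal actually accepts, and running it requires a GDAL build that includes the driver.

```python
# Hypothetical ingest of one georeferenced tile into a space-time array.
from osgeo import gdal

src = gdal.Open("landsat_tile_2016_04_12.tif")       # one tiled snapshot

# Assumed target description: array name plus acquisition date, so the
# driver can place the tile in the correct temporal slice of the array.
dst = "SCIDB:array=landsat_cube t=2016-04-12"

gdal.Translate(dst, src)   # one call per tile; metadata does the rest
```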
Surveying the Sky at Low Frequencies with the Commensal VLITE System
NASA Astrophysics Data System (ADS)
Clarke, Tracy; Kassim, Namir E.; Richards, Emily; Peters, Wendy; Polisensky, Emil
2017-05-01
We present details of a new commensal observing program on NRAO's Karl G. Jansky Very Large Array (VLA). The VLA Low-band Ionosphere and Transient Experiment (VLITE) provides a simultaneous sub-GHz data stream during all Cassegrain (1-50 GHz) observations. This unique low-frequency opportunity opens up over 6000 hours per year of VLA observing time to the low-frequency community. In the first 2 1/4 years of operation, VLITE-processed images cover regions containing 2,322 unique exoplanets in 62,000 individual scans. VLITE observations provide a large database for observing samples of nearby stellar systems, enabling a powerful means of monitoring these systems for stellar activity as well as emission from exoplanets.
A new improved database to support spanish phenological observations
NASA Astrophysics Data System (ADS)
Romero-Fresneda, Ramiro; Martínez-Núñez, Lourdes; Botey-Fullat, Roser; Gallego-Abaroa, Teresa; De Cara-García, Juan Antonio; Rodríguez-Ballesteros, César
2017-04-01
Over the last 30 years, phenology has regained scientific interest as the most reported biological indicator of anthropogenic climate change. AEMET (the Spanish National Meteorological Agency) has long records in the field of phenological observations, dating back to the 1940s. However, a large variety of paper records still needs to be digitized. It has also been necessary to adapt our methods to the World Meteorological Organization (WMO) guidelines (BBCH code, data documentation/metadata, etc.) and to standardize phenological stages and species in order to provide information to PEP725 (the Pan European Phenology Database). Consequently, AEMET is developing a long-term, multi-taxa phenological database to support research and scientific studies on climate, its variability, and its influence on natural ecosystems, agriculture, etc. This paper presents the steps that are being carried out in order to achieve this goal.
You, Seng Chan; Lee, Seongwon; Cho, Soo-Yeon; Park, Hojun; Jung, Sungjae; Cho, Jaehyeong; Yoon, Dukyong; Park, Rae Woong
2017-01-01
It is increasingly necessary to generate medical evidence applicable to Asian people, as most existing evidence comes from Western countries. Observational Health Data Sciences and Informatics (OHDSI) is an international collaborative that aims to facilitate the generation of high-quality evidence by creating and applying open-source data analytic solutions to a large network of health databases across countries. We aimed to incorporate Korean nationwide cohort data into the OHDSI network by converting the national sample cohort into the Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM). The data of 1.13 million subjects were converted to the OMOP-CDM, with an average conversion rate of 99.1%. ACHILLES, an open-source OMOP-CDM-based data profiling tool, was run on the converted database to visualize data-driven characterizations and assess data quality. The OMOP-CDM version of the National Health Insurance Service-National Sample Cohort (NHIS-NSC) can be a valuable tool for multiple aspects of medical research through its incorporation into the OHDSI research network.
Static versus dynamic sampling for data mining
DOE Office of Scientific and Technical Information (OSTI.GOV)
John, G.H.; Langley, P.
1996-12-31
As data warehouses grow to the point where one hundred gigabytes is considered small, the computational efficiency of data-mining algorithms on large databases becomes increasingly important. Using a sample from the database can speed up the data-mining process, but this is only acceptable if it does not reduce the quality of the mined knowledge. To this end, we introduce the "Probably Close Enough" criterion to describe the desired properties of a sample. Sampling usually refers to the use of static statistical tests to decide whether a sample is sufficiently similar to the large database, in the absence of any knowledge of the tools the data miner intends to use. We discuss dynamic sampling methods, which take into account the mining tool being used and can thus give better samples. We describe dynamic schemes that observe a mining tool's performance on training samples of increasing size and use these results to determine when a sample is sufficiently large. We evaluate these sampling methods on data from the UCI repository and conclude that dynamic sampling is preferable.
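A minimal sketch of the dynamic-sampling idea follows (illustrative only, not the authors' exact scheme; the growth factor and stopping threshold are invented): the mining tool is trained on samples of increasing size, and sampling stops once additional data yields negligible improvement.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def dynamic_sample_size(X, y, model, start=100, growth=2, eps=0.005):
    """Grow the training sample until the mining tool's accuracy plateaus."""
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    n, prev_acc = start, 0.0
    while n <= len(X_train):
        model.fit(X_train[:n], y_train[:n])
        acc = model.score(X_test, y_test)
        if acc - prev_acc < eps:   # "probably close enough": little gain from more data
            return n, acc
        prev_acc, n = acc, n * growth
    return len(X_train), prev_acc

X, y = load_digits(return_X_y=True)
print(dynamic_sample_size(X, y, DecisionTreeClassifier(random_state=0)))
```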
VizieR Online Data Catalog: The orbits of Jupiter's irregular satellites (Brozovic+, 2017)
NASA Astrophysics Data System (ADS)
Brozovic, M.; Jacobson, R. A.
2018-05-01
The large majority of astrometric observations originate from Earth-based telescopes, although there are a handful of observations of Himalia and Callirrhoe from the New Horizons spacecraft flyby of Jupiter. The modern astrometry, based on the Hipparcos Catalog (Perryman et al., 1997A&A...323L..49P), is reported as positions in the ICRF, and we converted the older measurements to ICRF positions. The references to optical observations up to the year 2000 are documented in Jacobson (2000AJ....120.2679J), and we continued to use the Jacobson (2000AJ....120.2679J) observational biases for the early measurements. We have since extended the data set with observations published in the Minor Planet Electronic Circulars (MPEC), the International Astronomical Union Circulars (IAUC), the Natural Satellites Data Center (NSDC) database (Arlot & Emelyanov 2009A&A...503..631A), the United States Naval Observatory Flagstaff Station catalog, and the Pulkovo Observatory database. (5 data files).
Integrated database for rapid mass movements in Norway
NASA Astrophysics Data System (ADS)
Jaedicke, C.; Lied, K.; Kronholm, K.
2009-03-01
Rapid gravitational slope mass movements include all kinds of short-term relocation of geological material, snow, or ice. Traditionally, information about such events is collected separately in different databases covering selected geographical regions and types of movement. In Norway the terrain is susceptible to all types of rapid gravitational slope mass movements, ranging from single rocks hitting roads and houses to large snow avalanches and rock slides where entire mountainsides collapse into fjords, creating flood waves and endangering large areas. In addition, quick clay slides occur in desalinated marine sediments in South Eastern and Mid Norway. For the authorities and inhabitants of endangered areas, the type of threat is of minor importance, and mitigation measures have to consider several types of rapid mass movements simultaneously. An integrated national database for all types of rapid mass movements, built around individual events, has been established. Only three data entries are mandatory: time, location, and type of movement; a minimal schema reflecting this design is sketched below. The remaining optional parameters enable recording of detailed information about the terrain, materials involved, and damages caused. Pictures, movies, and other documentation can be uploaded into the database. A web-based graphical user interface has been developed that allows new events to be entered, as well as editing and querying of all events. An integration of the database into a GIS system is currently under development. Datasets from various national sources, such as the road authorities and the Geological Survey of Norway, were imported into the database. Today, the database contains 33,000 rapid mass movement events from the last five hundred years, covering the entire country. A first analysis of the data shows that the most frequent types of recorded rapid mass movements are rock slides and snow avalanches, followed by debris slides. Most events are recorded in the steep fjord terrain of the Norwegian west coast, but major events are recorded all over the country. Snow avalanches account for most fatalities, while large rock slides causing flood waves and huge quick clay slides are the most damaging individual events in terms of damage to infrastructure and property and of causing multiple fatalities. The quality of the data is strongly influenced by the personal engagement of local observers and varying observation routines. This database is a unique source for statistical analysis, including risk analysis and the relation between rapid mass movements and climate. The database of rapid mass movement events will also facilitate validation of national hazard and risk maps.
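The schema sketch below is hypothetical (table and column names are invented; the abstract does not publish the actual schema), illustrating the design principle of three mandatory fields plus optional detail fields and attached documentation.

```python
import sqlite3

# Hypothetical schema illustrating the design described above: only time,
# location, and movement type are mandatory; terrain, materials, damages,
# and attached documentation are optional.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mass_movement_event (
    event_id   INTEGER PRIMARY KEY,
    event_time TEXT NOT NULL,   -- mandatory: when it happened
    latitude   REAL NOT NULL,   -- mandatory: where it happened
    longitude  REAL NOT NULL,
    movement   TEXT NOT NULL,   -- mandatory: e.g. 'snow avalanche'
    terrain    TEXT,            -- optional details
    materials  TEXT,
    damages    TEXT,
    doc_url    TEXT             -- pictures, movies, reports
);
""")
conn.execute(
    "INSERT INTO mass_movement_event (event_time, latitude, longitude, movement)"
    " VALUES (?, ?, ?, ?)",
    ("1936-04-07", 62.11, 6.85, "rock slide"),
)
print(conn.execute("SELECT COUNT(*) FROM mass_movement_event").fetchone())
```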
Determinants of Post-fire Water Quality in the Western United States
NASA Astrophysics Data System (ADS)
Rust, A.; Saxe, S.; Dolan, F.; Hogue, T. S.; McCray, J. E.
2015-12-01
Large wildfires are becoming increasingly common in the Western United States. Wildfires that consume greater than twenty percent of the watershed impact river water quality. The surface waters of the arid West are limited and in demand by aquatic ecosystems, irrigated agriculture, and the region's growing human population. A range of studies, typically focused on individual fires, have observed mobilization of contaminants, nutrients (including nitrates), and sediments into receiving streams. Post-fire metal concentrations have also been observed to increase in streams when fires were located close to urban centers. The objective of this work was to assemble an extensive historical water quality database through data mining from federal, state, and local agencies into a fire-database. Data from previous studies on individual fires by the co-authors were also included. The fire-database includes observations of water quality, discharge, and geospatial and land characteristics from over 200 fire-impacted watersheds in the western U.S. since 1985. Water quality data from burn-impacted watersheds were examined for trends in water quality response using statistical analysis. Watersheds where there was no change in water quality after fire were also examined to determine characteristics of the watershed that make it more resilient to fire. The ultimate goal is to evaluate trends in post-fire water quality response and identify key drivers of resiliency and post-fire response. The fire-database will eventually be publicly available.
Electronic Catalog Of Extragalactic Objects
NASA Technical Reports Server (NTRS)
Helou, George; Madore, Barry F.
1993-01-01
The NASA/IPAC Extragalactic Database (NED) is a publicly accessible computerized catalog of published information about extragalactic observations. It was developed to accommodate increasingly large sets of survey data, an exponentially growing literature, and the trend among astronomers to take a multispectral approach to astrophysical problems. NED is accessible to researchers and librarians.
Surface Observation Climatic Summaries for Nellis AFB, Nevada
1992-05-01
Distribution of this document is authorized to the public at large, or by the Defense Technical Information Center (DTIC) to the National Technical Information Service (NTIS). These summaries replace the documents formerly known as the Revised Uniform Summary of Surface Observations (RUSSWO) and the Limited Surface Observations Climatic Summary (LISOCS). Statistics cover the station's period of record (POR); summary-of-day (SOD) information is summarized from all available data in the OL-A, USAFETAC climatic database.
The EXOSAT database and archive
NASA Technical Reports Server (NTRS)
Reynolds, A. P.; Parmar, A. N.
1992-01-01
The EXOSAT database provides on-line access to the results and data products (spectra, images, and lightcurves) from the EXOSAT mission, as well as access to data and logs from a number of other missions (such as EINSTEIN, COS-B, ROSAT, and IRAS). In addition, a number of familiar optical, infrared, and X-ray catalogs, including the Hubble Space Telescope (HST) guide star catalog, are available. The complete database is located at the EXOSAT observatory at ESTEC in the Netherlands and is accessible remotely via a captive account. The database management system was specifically developed to efficiently access the database and to allow the user to perform statistical studies on large samples of astronomical objects, as well as to retrieve scientific and bibliographic information on single sources. The system was designed to be mission independent and includes timing, image processing, and spectral analysis packages, as well as software to allow the easy transfer of analysis results and products to the user's own institute. The archive at ESTEC comprises a subset of the EXOSAT observations, stored on magnetic tape. Observations of particular interest were copied in compressed format to an optical jukebox, allowing users to retrieve and analyze selected raw data entirely from their terminals. Such analysis may be necessary if the user's needs are not accommodated by the products contained in the database (in terms of time resolution, spectral range, and the finesse of the background subtraction, for instance). Long-term archiving of the full final observation data is taking place at ESRIN in Italy as part of the ESIS program, again using optical media, and ESRIN has now assumed responsibility for distributing the data to the community. Tests showed that raw observational data (typically several tens of megabytes for a single target) can be transferred via the existing networks in reasonable time.
DEXTER: Disease-Expression Relation Extraction from Text.
Gupta, Samir; Dingerdissen, Hayley; Ross, Karen E; Hu, Yu; Wu, Cathy H; Mazumder, Raja; Vijay-Shanker, K
2018-01-01
Gene expression levels affect biological processes and play a key role in many diseases. Characterizing expression profiles is useful for clinical research, diagnostics, and prognostics of diseases. There are currently several high-quality databases that capture gene expression information, obtained mostly from large-scale studies, such as microarray and next-generation sequencing technologies, in the context of disease. The scientific literature is another rich source of information on gene expression-disease relationships that not only have been captured from large-scale studies but have also been observed in thousands of small-scale studies. Expression information obtained from literature through manual curation can extend expression databases. While many of the existing databases include information from literature, they are limited by the time-consuming nature of manual curation and have difficulty keeping up with the explosion of publications in the biomedical field. In this work, we describe an automated text-mining tool, Disease-Expression Relation Extraction from Text (DEXTER), to extract information from literature on gene and microRNA expression in the context of disease. One of the motivations in developing DEXTER was to extend the BioXpress database, a cancer-focused gene expression database that includes data derived from large-scale experiments and manual curation of publications. The literature-based portion of BioXpress lags significantly behind the expression information obtained from large-scale studies and can benefit from our text-mined results. We conducted two different evaluations to measure the accuracy of our text-mining tool and achieved average F-scores of 88.51% and 81.81% for the two evaluations, respectively. Also, to demonstrate the ability to extract rich expression information in different disease-related scenarios, we used DEXTER to extract differential expression information for 2024 genes in lung cancer, 115 glycosyltransferases in 62 cancers, and 826 microRNAs in 171 cancers. All extractions using DEXTER are integrated in the literature-based portion of BioXpress. Database URL: http://biotm.cis.udel.edu/DEXTER.
NASA Astrophysics Data System (ADS)
Fontaine, Alain; Sauvage, Bastien; Pétetin, Hervé; Auby, Antoine; Boulanger, Damien; Thouret, Valerie
2016-04-01
Since 1994, the IAGOS program (In-Service Aircraft for a Global Observing System, http://www.iagos.org) and its predecessor MOZAIC have produced in-situ measurements of atmospheric composition during more than 46000 commercial aircraft flights. To help analyze these observations and further understand the processes driving their evolution, we developed SOFT-IO, a modelling tool that quantifies their source/receptor link. We improved the methodology of Stohl et al. (2003), based on the FLEXPART plume dispersion model, to simulate the contributions of anthropogenic and biomass burning emissions from the ECCAD database (http://eccad.aeris-data.fr) to the measured carbon monoxide mixing ratio along each IAGOS flight. Thanks to automated processes, contributions are simulated for the 20 days preceding each observation, separating the individual contributions of the different source regions. The main goal is to supply added-value products to the IAGOS database showing the geographical origin and emission type of the pollutants. Using this information, it may be possible to link trends in atmospheric composition to changes in transport pathways and to the evolution of emissions. This tool could be used for statistical validation as well as for inter-comparisons of emission inventories using large amounts of data, as Lagrangian models are able to bring global-scale emissions down to a smaller scale, where they can be directly compared to the in-situ observations from the IAGOS database.
Design considerations, architecture, and use of the Mini-Sentinel distributed data system.
Curtis, Lesley H; Weiner, Mark G; Boudreau, Denise M; Cooper, William O; Daniel, Gregory W; Nair, Vinit P; Raebel, Marsha A; Beaulieu, Nicolas U; Rosofsky, Robert; Woodworth, Tiffany S; Brown, Jeffrey S
2012-01-01
We describe the design, implementation, and use of a large, multiorganizational distributed database developed to support the Mini-Sentinel Pilot Program of the US Food and Drug Administration (FDA). As envisioned by the US FDA, this implementation will inform and facilitate the development of an active surveillance system for monitoring the safety of medical products (drugs, biologics, and devices) in the USA. A common data model was designed to address the priorities of the Mini-Sentinel Pilot and to leverage the experience and data of participating organizations and data partners. A review of existing common data models informed the process. Each participating organization designed a process to extract, transform, and load its source data, applying the common data model to create the Mini-Sentinel Distributed Database. Transformed data were characterized and evaluated using a series of programs developed centrally and executed locally by participating organizations. A secure communications portal was designed to facilitate queries of the Mini-Sentinel Distributed Database and transfer of confidential data, analytic tools were developed to facilitate rapid response to common questions, and distributed querying software was implemented to facilitate rapid querying of summary data. As of July 2011, information on 99,260,976 health plan members was included in the Mini-Sentinel Distributed Database. The database includes 316,009,067 person-years of observation time, with members contributing, on average, 27.0 months of observation time. All data partners have successfully executed distributed code and returned findings to the Mini-Sentinel Operations Center. This work demonstrates the feasibility of building a large, multiorganizational distributed data system in which organizations retain possession of their data that are used in an active surveillance system. Copyright © 2012 John Wiley & Sons, Ltd.
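The distributed-querying pattern described above lends itself to a small illustration. The sketch below is a hypothetical, minimal rendering of the idea (class names and fields are invented, not Mini-Sentinel code): each data partner retains its row-level data, executes centrally distributed code locally, and returns only aggregate summaries to the coordinating center.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class MemberRecord:
    member_id: int
    months_observed: int

class DataPartner:
    """A site that holds its own data and runs distributed code locally."""
    def __init__(self, name: str, records: List[MemberRecord]):
        self.name, self.records = name, records

    def run_summary_query(self) -> dict:
        # Only aggregates leave the site; row-level data stay in place.
        months = [r.months_observed for r in self.records]
        return {"partner": self.name,
                "members": len(months),
                "person_months": sum(months)}

partners = [
    DataPartner("site_a", [MemberRecord(1, 24), MemberRecord(2, 30)]),
    DataPartner("site_b", [MemberRecord(3, 27)]),
]
results = [p.run_summary_query() for p in partners]   # executed at each site
total = sum(r["person_months"] for r in results)      # pooled at the center
print(results, total / 12.0, "person-years")
```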
NASA Technical Reports Server (NTRS)
Kidd, Chris; Matsui, Toshi; Chern, Jiundar; Mohr, Karen; Kummerow, Christian; Randel, Dave
2015-01-01
The estimation of precipitation across the globe from satellite sensors provides a key resource in the observation and understanding of our climate system. Estimates from all pertinent satellite observations are critical in providing the necessary temporal sampling. However, consistency in these estimates from instruments with different frequencies and resolutions is critical. This paper details the physically based retrieval scheme to estimate precipitation from cross-track (XT) passive microwave (PM) sensors on board the constellation satellites of the Global Precipitation Measurement (GPM) mission. Here the Goddard profiling algorithm (GPROF), a physically based Bayesian scheme developed for conically scanning (CS) sensors, is adapted for use with XT PM sensors. The present XT GPROF scheme utilizes a model-generated database to overcome issues encountered with an observational database as used by the CS scheme. The model database ensures greater consistency across meteorological regimes and surface types by providing a more comprehensive set of precipitation profiles. The database is corrected for bias against the CS database to ensure consistency in the final product. Statistical comparisons over western Europe and the United States show that the XT GPROF estimates are comparable with those from the CS scheme. Indeed, the XT estimates have higher correlations against surface radar data, while maintaining similar root-mean-square errors. Latitudinal profiles of precipitation show the XT estimates are generally comparable with the CS estimates, although in the southern midlatitudes the peak precipitation is shifted equatorward while over the Arctic large differences are seen between the XT and the CS retrievals.
LOINC, a universal standard for identifying laboratory observations: a 5-year update.
McDonald, Clement J; Huff, Stanley M; Suico, Jeffrey G; Hill, Gilbert; Leavelle, Dennis; Aller, Raymond; Forrey, Arden; Mercer, Kathy; DeMoor, Georges; Hook, John; Williams, Warren; Case, James; Maloney, Pat
2003-04-01
The Logical Observation Identifier Names and Codes (LOINC) database provides a universal code system for reporting laboratory and other clinical observations. Its purpose is to identify observations in electronic messages such as Health Level Seven (HL7) observation messages, so that when hospitals, health maintenance organizations, pharmaceutical manufacturers, researchers, and public health departments receive such messages from multiple sources, they can automatically file the results in the right slots of their medical records, research, and/or public health systems. For each observation, the database includes a code (25 000 of the codes are for laboratory test observations), a long formal name, a "short" 30-character name, and synonyms. The database comes with a mapping program called Regenstrief LOINC Mapping Assistant (RELMA(TM)) to assist the mapping of local test codes to LOINC codes and to facilitate browsing of the LOINC results. Both LOINC and RELMA are available at no cost from http://www.regenstrief.org/loinc/. The LOINC medical database carries records for >30 000 different observations. LOINC codes are being used by large reference laboratories and federal agencies, e.g., the CDC and the Department of Veterans Affairs, and are part of the Health Insurance Portability and Accountability Act (HIPAA) attachment proposal. Internationally, they have been adopted in Switzerland, Hong Kong, Australia, and Canada, and by the German national standards organization, the Deutsches Institut für Normung. Laboratories should include LOINC codes in their outbound HL7 messages so that clinical and research clients can easily integrate these results into their clinical and research repositories. Laboratories should also encourage instrument vendors to deliver LOINC codes in their instrument outputs and demand LOINC codes in HL7 messages they get from reference laboratories to avoid the need to lump so many referral tests under the "send out lab" code.
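As an illustration of the recommendation above, the sketch below builds an HL7 v2 OBX observation segment that identifies a laboratory result by its LOINC code. The helper function and the patient values are invented for the example; LOINC 2345-7 is the code for glucose in serum or plasma.

```python
# Minimal sketch: reporting a lab result with a LOINC code in an HL7 v2 OBX
# segment. Field layout follows the common OBX pattern; values are invented.
FIELD_SEP = "|"

def make_obx(set_id, loinc_code, name, value, units, ref_range):
    fields = ["OBX", str(set_id), "NM",
              f"{loinc_code}^{name}^LN",   # observation identified by LOINC
              "", str(value), units, ref_range, "N", "", "", "F"]
    return FIELD_SEP.join(fields)

segment = make_obx(1, "2345-7", "Glucose", 95, "mg/dL", "70-99")
print(segment)
# OBX|1|NM|2345-7^Glucose^LN||95|mg/dL|70-99|N|||F
```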
Capturing the Petermann Ice Island Flux With the CI2D3 Database
NASA Astrophysics Data System (ADS)
Crawford, A. J.; Crocker, G.; Mueller, D.; Saper, R.; Desjardins, L.; Carrieres, T.
2017-12-01
The Petermann Glacier ice tongue lost >460 km2 of areal extent (~38 Gt of mass) due to three large calving events in 2008, 2010 and 2012, as well as three previously unrecorded events in 2011 and 2012. Hundreds of ice islands subsequently drifted south between Hall Basin and Newfoundland's Grand Banks, but no systematic data collection or analysis had been conducted for the full flux of fragments prior to the present study. To accomplish this, the Canadian Ice Service's extensive RADARSAT-1 and -2 synthetic aperture radar image archive was mined to create the Canadian Ice Island Drift, Deterioration and Detection (CI2D3) Database. Over 15,000 fragments have been digitized in GIS software from 3,200 SAR scenes. A unique characteristic of the database is the inclusion of the lineage (i.e., connecting repeat observations or mother-daughter fragments) for all tracked fragments with areas >0.25 km2. This genealogical information was used to isolate ice islands that were about to fracture in order to assess the environmental conditions and morphological characteristics that influence this deterioration mechanism. Fracture counts showed a significant relationship with sea ice concentration (r = -0.56). However, variations in relative thickness played a large role in fracturing likelihood regardless of sea ice conditions. The exceedance probability of the daughter fragment length was calculated, as is often done for offshore industry hazard assessments. Grounded ice islands, which are hazards to seafloor installations and disturb benthic ecology, were recognized from their negligible drift speeds, and two grounding hot-spots were identified along the Coburg Island and eastern Baffin Island coasts. Petermann ice islands have been noted to drift along specific isobaths due to the influence of bathymetry on ocean currents; 50% of observations occurred between the 100 and 300 m isobaths, and smaller ice islands were observed more frequently in deeper regions. The CI2D3 Database can be utilized for the development of operational models and remote sensing tools for ice island detection, as well as for assessing the distribution of Greenland Ice Sheet freshwater. The database will contribute to the study of these large, tabular icebergs, which are anticipated to continue calving in both polar regions, including at the Petermann Glacier.
Six and Three-Hourly Meteorological Observations From 223 Former U.S.S.R. Stations (NDP-048)
Razuvaev, V. N. [All-Russian Research Institute of Hydrometeorological Information, World Data Center, Russia; Apasova, E. B. [All-Russian Research Institute of Hydrometeorological Information, World Data Center, Russia; Martuganov, R. A. [All-Russian Research Institute of Hydrometeorological Information, World Data Center, Russia; Kaiser, D. P. [CDIAC, Oak Ridge National Laboratory; Marino, G. P. [CDIAC, Oak Ridge National Laboratory
2007-11-01
This database contains 6- and 3-hourly meteorological observations from a 223-station network of the former Soviet Union. These data have been made available through cooperation between the two principal climate data centers of the United States and Russia: the National Climatic Data Center (NCDC), in Asheville, North Carolina, and the All-Russian Research Institute of Hydrometeorological Information-World Data Centre (RIHMI-WDC) in Obninsk, Russia. The first version of this database extended through the mid-1980s (ending year dependent upon station) and was made available in 1995 by the Carbon Dioxide Information Analysis Center (CDIAC) as NDP-048. A second version of the database extended the data records through 1990. This third and current version of the database includes data through 2000 for over half of the stations (mainly for Russia), whereas the remainder of the stations have records extending through various years of the 1990s. Because of the breakup of the Soviet Union in 1991, and since RIHMI-WDC is a Russian institution, only Russian stations are generally available through 2000. The non-Russian station records in this database typically extend through 1991. Station records consist of 6- and 3-hourly observations of some 24 meteorological variables including temperature, past and present weather type, precipitation amount, cloud amount and type, sea level pressure, relative humidity, and wind direction and speed. The 6-hourly observations extend from 1936 through 1965; the 3-hourly observations extend from 1966 through 2000 (or through the latest year available). These data have undergone extensive quality assurance checks by RIHMI-WDC, NCDC, and CDIAC. The database represents a wealth of meteorological information for a large and climatologically important portion of the earth's land area, and should prove extremely useful for a wide variety of regional climate change studies.
Matching CCD images to a stellar catalog using locality-sensitive hashing
NASA Astrophysics Data System (ADS)
Liu, Bo; Yu, Jia-Zong; Peng, Qing-Yu
2018-02-01
Matching a subset of stars observed in a CCD image to their counterparts in a stellar catalog is an important issue in astronomical research. Subgraph isomorphism-based algorithms are the most widely used methods in star catalog matching: when more subgraph features are provided, CCD images are recognized better. However, when the navigation feature database is large, these methods require more time to match the observed model. To solve this problem, this study investigates and improves subgraph isomorphism matching algorithms. We present an algorithm based on a locality-sensitive hashing technique, which allocates the quadrilateral models in the navigation feature database to different hash buckets and reduces the search range to the bucket in which the observed quadrilateral model is located. Experimental results indicate the effectiveness of our method.
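A minimal sketch of this bucketing idea follows (illustrative only, not the authors' exact hash): each four-star pattern is reduced to a scale- and rotation-invariant feature vector, the vector is coarsely quantized into a hash key, and matching then searches only the catalog models sharing the query's bucket.

```python
import itertools
import numpy as np
from collections import defaultdict

def quad_features(pts):
    """Sorted pairwise distances, normalized by the largest (invariant code)."""
    d = sorted(np.linalg.norm(a - b) for a, b in itertools.combinations(pts, 2))
    return np.array(d) / d[-1]

def hash_key(pts, resolution=0.05):
    # Coarse quantization makes nearby patterns collide in the same bucket.
    return tuple(np.round(quad_features(pts) / resolution).astype(int))

rng = np.random.default_rng(0)
catalog = [rng.uniform(0, 1, size=(4, 2)) for _ in range(10000)]

buckets = defaultdict(list)          # the "navigation feature database"
for i, quad in enumerate(catalog):
    buckets[hash_key(quad)].append(i)

query = catalog[42] * 3.0 + 7.0      # same quad, scaled and translated
candidates = buckets[hash_key(query)]
print(42 in candidates, len(candidates))  # search restricted to one bucket
```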
ECG-ViEW II, a freely accessible electrocardiogram database
Park, Man Young; Lee, Sukhoon; Jeon, Min Seok; Yoon, Dukyong; Park, Rae Woong
2017-01-01
The Electrocardiogram Vigilance with Electronic data Warehouse II (ECG-ViEW II) is a large, single-center database comprising numeric parameter data of the surface electrocardiograms of all patients who underwent testing from 1 June 1994 to 31 July 2013. The electrocardiographic data include the test date, clinical department, RR interval, PR interval, QRS duration, QT interval, QTc interval, P axis, QRS axis, and T axis. These data are connected with patient age, sex, ethnicity, comorbidities, age-adjusted Charlson comorbidity index, prescribed drugs, and electrolyte levels. This longitudinal observational database contains 979,273 electrocardiograms from 461,178 patients over a 19-year study period. This database can provide an opportunity to study electrocardiographic changes caused by medications, disease, or other demographic variables. ECG-ViEW II is freely available at http://www.ecgview.org. PMID:28437484
VizieR Online Data Catalog: Galaxies in Hercules-Bootes region (Karachentsev+, 2017)
NASA Astrophysics Data System (ADS)
Karachentsev, I. D.; Kashibadze, O. G.; Karachentseva, V. E.
2017-04-01
The table contains original observational data on 412 galaxies in the Hercules-Bootes region with radial velocities of VLG<2500km/s. The main source of data is the NASA Extragalactic Database (NED) with additions from the HyperLEDA Database. Each object with a radial velocity estimate was visually inspected, and a large number of false "galaxies" with radial velocities of around zero was discarded. For many galaxies, we have refined the morphological types and integral B-magnitudes. The resulting sample includes 181 galaxies with individual distance estimates. (1 data file).
NASA Astrophysics Data System (ADS)
Endres, Christian P.; Schlemmer, Stephan; Schilke, Peter; Stutzki, Jürgen; Müller, Holger S. P.
2016-09-01
The Cologne Database for Molecular Spectroscopy, CDMS, was founded in 1998 to provide in its catalog section line lists of mostly molecular species which are or may be observed in various astronomical sources (usually) by radio astronomical means. The line lists contain transition frequencies with qualified accuracies, intensities, quantum numbers, as well as further auxiliary information. They have been generated from critically evaluated experimental line lists, mostly from laboratory experiments, employing established Hamiltonian models. Separate entries exist for different isotopic species and usually also for different vibrational states. As of December 2015, the number of entries is 792. They are available online as ascii tables with additional files documenting information on the entries. The Virtual Atomic and Molecular Data Centre, VAMDC, was founded more than 5 years ago as a common platform for atomic and molecular data. This platform facilitates exchange not only between spectroscopic databases related to astrophysics or astrochemistry, but also with collisional and kinetic databases. A dedicated infrastructure was developed to provide a common data format in the various databases, enabling queries to a large variety of databases on atomic and molecular data at once. For CDMS, the incorporation into VAMDC was combined with several modifications to the generation of CDMS catalog entries. Here we introduce the related changes to the data structure and the data content of the CDMS. The new data scheme allows us to incorporate all previous data entries and, in addition, to include entries based on new theoretical descriptions. Moreover, the CDMS entries have been transferred into a MySQL database format. These developments within the VAMDC framework have in part been driven by the needs of the astronomical community to be able to deal efficiently with large data sets obtained with the Herschel Space Observatory or, more recently, with the Atacama Large Millimeter Array.
The TREAT-NMD DMD Global Database: Analysis of More than 7,000 Duchenne Muscular Dystrophy Mutations
Bladen, Catherine L; Salgado, David; Monges, Soledad; Foncuberta, Maria E; Kekou, Kyriaki; Kosma, Konstantina; Dawkins, Hugh; Lamont, Leanne; Roy, Anna J; Chamova, Teodora; Guergueltcheva, Velina; Chan, Sophelia; Korngut, Lawrence; Campbell, Craig; Dai, Yi; Wang, Jen; Barišić, Nina; Brabec, Petr; Lahdetie, Jaana; Walter, Maggie C; Schreiber-Katz, Olivia; Karcagi, Veronika; Garami, Marta; Viswanathan, Venkatarman; Bayat, Farhad; Buccella, Filippo; Kimura, En; Koeks, Zaïda; van den Bergen, Janneke C; Rodrigues, Miriam; Roxburgh, Richard; Lusakowska, Anna; Kostera-Pruszczyk, Anna; Zimowski, Janusz; Santos, Rosário; Neagu, Elena; Artemieva, Svetlana; Rasic, Vedrana Milic; Vojinovic, Dina; Posada, Manuel; Bloetzer, Clemens; Jeannet, Pierre-Yves; Joncourt, Franziska; Díaz-Manera, Jordi; Gallardo, Eduard; Karaduman, A Ayşe; Topaloğlu, Haluk; El Sherif, Rasha; Stringer, Angela; Shatillo, Andriy V; Martin, Ann S; Peay, Holly L; Bellgard, Matthew I; Kirschner, Jan; Flanigan, Kevin M; Straub, Volker; Bushby, Kate; Verschuuren, Jan; Aartsma-Rus, Annemieke; Béroud, Christophe; Lochmüller, Hanns
2015-01-01
Analyzing the type and frequency of patient-specific mutations that give rise to Duchenne muscular dystrophy (DMD) is an invaluable tool for diagnostics, basic scientific research, trial planning, and improved clinical care. Locus-specific databases allow for the collection, organization, storage, and analysis of genetic variants of disease. Here, we describe the development and analysis of the TREAT-NMD DMD Global database (http://umd.be/TREAT_DMD/). We analyzed genetic data for 7,149 DMD mutations held within the database. A total of 5,682 large mutations were observed (80% of total mutations), of which 4,894 (86%) were deletions (1 exon or larger) and 784 (14%) were duplications (1 exon or larger). There were 1,445 small mutations (smaller than 1 exon, 20% of all mutations), of which 358 (25%) were small deletions and 132 (9%) small insertions and 199 (14%) affected the splice sites. Point mutations totalled 756 (52% of small mutations) with 726 (50%) nonsense mutations and 30 (2%) missense mutations. Finally, 22 (0.3%) mid-intronic mutations were observed. In addition, mutations were identified within the database that would potentially benefit from novel genetic therapies for DMD including stop codon read-through therapies (10% of total mutations) and exon skipping therapy (80% of deletions and 55% of total mutations). PMID:25604253
Akiyama, Kenji; Kurotani, Atsushi; Iida, Kei; Kuromori, Takashi; Shinozaki, Kazuo; Sakurai, Tetsuya
2014-01-01
Arabidopsis thaliana is one of the most popular experimental plants. However, only 40% of its genes have at least one experimental Gene Ontology (GO) annotation assigned. Systematic observation of mutant phenotypes is an important technique for elucidating gene functions. Indeed, several large-scale phenotypic analyses have been performed and have generated phenotypic data sets from many Arabidopsis mutant lines and overexpressing lines, which are freely available online. Since each Arabidopsis mutant line database describes phenotypes in its own way, the differences in the structured term sets used by each database make it difficult to compare data sets and impossible to search across databases. Therefore, we obtained publicly available information for a total of 66,209 Arabidopsis mutant lines, including loss-of-function (RATM and TARAPPER) and gain-of-function (AtFOX and OsFOX) lines, and integrated the phenotype data by mapping the descriptions onto Plant Ontology (PO) and Phenotypic Quality Ontology (PATO) terms. This approach made it possible to manage the four different phenotype databases as one large data set. Here, we report a publicly accessible web-based database, the RIKEN Arabidopsis Genome Encyclopedia II (RARGE II; http://rarge-v2.psc.riken.jp/), in which all of the data described in this study are included. Using the database, we demonstrated consistency (in terms of protein function) with a previous study and identified the presumed function of an unknown gene. We provide examples of AT1G21600, which is a subunit of the plastid-encoded RNA polymerase complex, and AT5G56980, which is related to the jasmonic acid signaling pathway.
Advances in Satellite Microwave Precipitation Retrieval Algorithms Over Land
NASA Astrophysics Data System (ADS)
Wang, N. Y.; You, Y.; Ferraro, R. R.
2015-12-01
Precipitation plays a key role in the earth's climate system, particularly in its water and energy balance. Satellite microwave (MW) observations of precipitation provide a viable means of achieving global measurement of precipitation with sufficient sampling density and accuracy. However, obtaining accurate precipitation information over land from satellite MW observations is a challenging problem. The Goddard Profiling Algorithm (GPROF) for the Global Precipitation Measurement (GPM) mission is built around the Bayesian formulation (Evans et al., 1995; Kummerow et al., 1996). GPROF uses the likelihood function and the prior probability distribution function to calculate the expected value of the precipitation rate, given the observed brightness temperatures; a worked sketch of this retrieval is given below. It is particularly convenient to draw samples for the prior PDF from a predefined database of observations or models. The GPROF algorithm does not search all database entries but only the subset thought to correspond to the actual observation. The GPM GPROF V1 database focuses on stratification by surface emissivity class, land surface temperature, and total precipitable water. However, there is much uncertainty as to what is the optimal information needed to subset the database for different conditions. To this end, we conducted a database stratification study using National Mosaic and Multi-Sensor Quantitative Precipitation Estimation data, Special Sensor Microwave Imager/Sounder (SSMIS) and Advanced Technology Microwave Sounder (ATMS) observations, and reanalysis data from the Modern-Era Retrospective Analysis for Research and Applications (MERRA). Our database study (You et al., 2015) shows that environmental factors such as surface elevation, relative humidity, storm vertical structure and height, and ice thickness can help in stratifying a single large database into smaller and more homogeneous subsets, in which the surface conditions and precipitation vertical profiles are similar. It is found that the probability of detection (POD) increases by about 8% and 12% when using stratified databases for rainfall and snowfall detection, respectively. In addition, by considering the relative humidity in the lower troposphere and the vertical velocity at 700 hPa in the precipitation detection process, the POD for snowfall detection is further increased by 20.4 percentage points, from 56.0% to 76.4%.
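To make the Bayesian formulation concrete, here is a minimal numerical sketch of database retrieval (synthetic data; Gaussian, uncorrelated channel errors assumed; an illustration of the principle, not the operational GPROF code): the retrieved rain rate is the database average weighted by how well each stored brightness-temperature vector matches the observation.

```python
import numpy as np

rng = np.random.default_rng(1)
n_entries, n_channels = 5000, 9
db_tb = rng.normal(250.0, 15.0, size=(n_entries, n_channels))  # a priori Tb vectors
db_rain = rng.gamma(shape=0.5, scale=2.0, size=n_entries)      # matching rain rates
obs_tb = db_tb[123] + rng.normal(0.0, 1.0, size=n_channels)    # observed Tb vector
sigma = 2.0                                                    # assumed channel error (K)

# Likelihood of each database entry given the observation
chi2 = np.sum((db_tb - obs_tb) ** 2, axis=1) / sigma**2
weights = np.exp(-0.5 * chi2)

# Expected value of the precipitation rate under the posterior
expected_rain = np.sum(weights * db_rain) / np.sum(weights)
print(expected_rain, db_rain[123])   # the matching entry dominates the estimate
```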
NASA Astrophysics Data System (ADS)
Nakagawa, Y.; Kawahara, S.; Araki, F.; Matsuoka, D.; Ishikawa, Y.; Fujita, M.; Sugimoto, S.; Okada, Y.; Kawazoe, S.; Watanabe, S.; Ishii, M.; Mizuta, R.; Murata, A.; Kawase, H.
2017-12-01
Analyses of large ensemble data are quite useful for producing probabilistic projections of climate change effects. Ensemble data from "+2K future climate simulations" are currently produced by the Japanese national project "Social Implementation Program on Climate Change Adaptation Technology (SI-CAT)" as part of the database for Policy Decision making for Future climate change (d4PDF; Mizuta et al. 2016) produced by the Program for Risk Information on Climate Change. Those data consist of global warming simulations and regional downscaling simulations. Considering that the data volumes are too large (a few petabytes) to download to a user's local computer, a user-friendly system is required to search and download only the data that satisfy the user's requests. Under SI-CAT, we are developing "a database system for near-future climate change projections" that provides functions for finding the necessary data. The system mainly consists of a relational database, a data download function, and a user interface. The relational database, using PostgreSQL, is the key component among them: temporally and spatially compressed data are registered in the relational database. As a first step, we developed the relational database for precipitation, temperature, and typhoon track data according to requests by SI-CAT members. The data download function, using the Open-source Project for a Network Data Access Protocol (OPeNDAP), provides a way to download temporally and spatially extracted data based on search results obtained from the relational database; a usage sketch is given below. We also developed a web-based user interface for using the relational database and the data download function. A prototype of the system is currently in operational testing on our local server. The database system for near-future climate change projections will be released on the Data Integration and Analysis System Program (DIAS) in fiscal year 2017. The techniques behind this system might also be quite useful for simulation and observational data in other research fields. We report the current status of development and some case studies of the system.
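The following sketch shows what an OPeNDAP-style extraction of a small spatio-temporal subset could look like from the client side. The endpoint URL and variable names are hypothetical placeholders; with OPeNDAP, the slicing below is translated into a server-side constraint so only the requested subset is transferred.

```python
from netCDF4 import Dataset
import numpy as np

URL = "http://example.org/opendap/d4pdf/precipitation.nc"  # hypothetical endpoint

ds = Dataset(URL)                       # open the remote dataset lazily
pr = ds.variables["precipitation"]      # hypothetical variable name
lat = ds.variables["lat"][:]
lon = ds.variables["lon"][:]

# Spatial/temporal subsetting: one 90-day window over a small lat/lon box
ilat = np.where((lat >= 34.0) & (lat <= 37.0))[0]
ilon = np.where((lon >= 135.0) & (lon <= 140.0))[0]
subset = pr[150:240, ilat[0]:ilat[-1] + 1, ilon[0]:ilon[-1] + 1]
print(subset.shape, float(subset.mean()))
ds.close()
```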
Very large database of lipids: rationale and design.
Martin, Seth S; Blaha, Michael J; Toth, Peter P; Joshi, Parag H; McEvoy, John W; Ahmed, Haitham M; Elshazly, Mohamed B; Swiger, Kristopher J; Michos, Erin D; Kwiterovich, Peter O; Kulkarni, Krishnaji R; Chimera, Joseph; Cannon, Christopher P; Blumenthal, Roger S; Jones, Steven R
2013-11-01
Blood lipids have major cardiovascular and public health implications. Lipid-lowering drugs are prescribed based in part on categorization of patients into normal or abnormal lipid metabolism, yet relatively little emphasis has been placed on: (1) the accuracy of current lipid measures used in clinical practice, (2) the reliability of current categorizations of dyslipidemia states, and (3) the relationship of advanced lipid characterization to other cardiovascular disease biomarkers. To these ends, we developed the Very Large Database of Lipids (NCT01698489), an ongoing database protocol that harnesses deidentified data from the daily operations of a commercial lipid laboratory. The database includes individuals who were referred for clinical purposes for a Vertical Auto Profile (Atherotech Inc., Birmingham, AL), which directly measures cholesterol concentrations of low-density lipoprotein, very low-density lipoprotein, intermediate-density lipoprotein, high-density lipoprotein, their subclasses, and lipoprotein(a). Individual Very Large Database of Lipids studies, ranging from studies of measurement accuracy, to dyslipidemia categorization, to biomarker associations, to characterization of rare lipid disorders, are investigator-initiated and utilize peer-reviewed statistical analysis plans to address a priori hypotheses/aims. In the first database harvest (Very Large Database of Lipids 1.0) from 2009 to 2011, there were 1 340 614 adult and 10 294 pediatric patients; the adult sample had a median age of 59 years (interquartile range, 49-70 years) with even representation by sex. Lipid distributions closely matched those from the population-representative National Health and Nutrition Examination Survey. The second harvest of the database (Very Large Database of Lipids 2.0) is underway. Overall, the Very Large Database of Lipids database provides an opportunity for collaboration and new knowledge generation through careful examination of granular lipid data on a large scale. © 2013 Wiley Periodicals, Inc.
NASA Technical Reports Server (NTRS)
Sobue, Shin-ichi; Yoshida, Fumiyoshi; Ochiai, Osamu
1996-01-01
NASDA's new Advanced Earth Observing Satellite (ADEOS) is scheduled for launch in August 1996. ADEOS carries 8 sensors to observe earth environmental phenomena and sends their data to NASDA, NASA, and other foreign ground stations around the world. The downlink data bit rate for ADEOS is 126 MB/s, and the total volume of data is about 100 GB per day. To archive and manage such a large quantity of data with high reliability and easy accessibility, it was necessary to develop a new mass storage system with a catalogue information database using advanced database management technology. The data will be archived and maintained in the Master Data Storage Subsystem (MDSS), one subsystem of NASDA's new Earth Observation data and Information System (EOIS). The MDSS is based on a SONY ID1 digital tape robotics system. This paper provides an overview of the EOIS system, with a focus on the Master Data Storage Subsystem and the NASDA Earth Observation Center (EOC) archive policy for earth observation satellite data.
Dankar, Fida K; Ptitsyn, Andrey; Dankar, Samar K
2018-04-10
Contemporary biomedical databases include a wide range of information types from various observational and instrumental sources. Among the most important features that unite biomedical databases across the field are high volume of information and high potential to cause damage through data corruption, loss of performance, and loss of patient privacy. Thus, issues of data governance and privacy protection are essential for the construction of data depositories for biomedical research and healthcare. In this paper, we discuss various challenges of data governance in the context of population genome projects. The various challenges along with best practices and current research efforts are discussed through the steps of data collection, storage, sharing, analysis, and knowledge dissemination.
A BRDF-BPDF database for the analysis of Earth target reflectances
NASA Astrophysics Data System (ADS)
Breon, Francois-Marie; Maignan, Fabienne
2017-01-01
Land surface reflectance is not isotropic. It varies with the observation geometry, which is defined by the sun and view zenith angles and the relative azimuth. In addition, the reflectance is linearly polarized. The reflectance anisotropy is quantified by the bidirectional reflectance distribution function (BRDF), while its polarization properties are defined by the bidirectional polarization distribution function (BPDF). The POLDER radiometer that flew onboard the PARASOL microsatellite remains the only space instrument that has measured numerous samples of the BRDF and BPDF of Earth targets. Here, we describe a database of representative BRDFs and BPDFs derived from the POLDER measurements. From the huge number of data acquired by the spaceborne instrument over a period of 7 years, we selected a set of targets with high-quality observations. The selection aimed for a large number of observations, free of significant cloud or aerosol contamination, acquired in diverse observation geometries with a focus on the backscatter direction that shows the specific hot spot signature. The targets are sorted according to the 16-class International Geosphere-Biosphere Programme (IGBP) land cover classification system, and the target selection aims at spatial representativeness within each class. The database thus provides a set of high-quality BRDF and BPDF samples that can be used to assess the typical variability of natural surface reflectances or to evaluate models. It is available freely from the PANGAEA website (doi:10.1594/PANGAEA.864090). In addition to the database, we provide a visualization and analysis tool based on the Interactive Data Language (IDL). It allows an interactive analysis of the measurements and a comparison against various BRDF and BPDF analytical models. The present paper describes the input data, the selection principles, the database format, and the analysis tool.
A Qualitative Study of Resident Learning in Ambulatory Clinic
ERIC Educational Resources Information Center
Smith, C. Scott; Morris, Magdalena; Francovich, Chris; Hill, William; Gieselman, Janet
2004-01-01
Qualitative analysis of a large ethnographic database from observations of a resident teaching clinic revealed three important findings. The first finding was that breakdown, a situation where an "actor" (such as a person or the group) is not achieving expected effectiveness, was the most important category because of its frequency and explanatory…
Twenty Years of Work with Janet Mattei on Cataclysmic Variables
NASA Astrophysics Data System (ADS)
Szkody, P.
2005-08-01
Janet Mattei and the AAVSO database have had a large impact on the field of cataclysmic variables, especially in the areas of outburst light curves of dwarf novae and ground-based support of space observations. A summary of some of the major results from AAVSO data during the last 20 years is presented.
A Naive Bayes machine learning approach to risk prediction using censored, time-to-event data.
Wolfson, Julian; Bandyopadhyay, Sunayan; Elidrisi, Mohamed; Vazquez-Benitez, Gabriela; Vock, David M; Musgrove, Donald; Adomavicius, Gediminas; Johnson, Paul E; O'Connor, Patrick J
2015-09-20
Predicting an individual's risk of experiencing a future clinical outcome is a statistical task with important consequences for both practicing clinicians and public health experts. Modern observational databases such as electronic health records provide an alternative to the longitudinal cohort studies traditionally used to construct risk models, bringing with them both opportunities and challenges. Large sample sizes and detailed covariate histories enable the use of sophisticated machine learning techniques to uncover complex associations and interactions, but observational databases are often 'messy', with high levels of missing data and incomplete patient follow-up. In this paper, we propose an adaptation of the well-known Naive Bayes machine learning approach to time-to-event outcomes subject to censoring. We compare the predictive performance of our method with the Cox proportional hazards model which is commonly used for risk prediction in healthcare populations, and illustrate its application to prediction of cardiovascular risk using an electronic health record dataset from a large Midwest integrated healthcare system. Copyright © 2015 John Wiley & Sons, Ltd.
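The adaptation to censoring can be illustrated with a toy sketch. The following is a crude, self-contained illustration of the general idea (explicitly not the authors' exact method): a Gaussian Naive Bayes classifier for "event by time tau", with subjects reweighted by an estimate of the censoring survival function so that censoring before tau does not bias the class estimates. All data are simulated.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
x = rng.normal(size=n)                          # one risk factor
t_event = rng.exponential(scale=np.exp(-0.8 * x) * 10.0)
t_cens = rng.exponential(scale=12.0, size=n)
time = np.minimum(t_event, t_cens)
event = t_event <= t_cens
tau = 5.0

# Crude Kaplan-Meier-style estimate of the censoring survival function
order = np.argsort(time)
g = np.ones(n)
at_risk, surv = n, 1.0
for i in order:
    if not event[i]:
        surv *= (at_risk - 1) / at_risk
    at_risk -= 1
    g[i] = max(surv, 1e-3)

usable = (time >= tau) | event                  # subjects whose status at tau is known
y = (time < tau) & event                        # had the event by tau
w = 1.0 / g                                     # inverse-probability-of-censoring weights

def gaussian_logpdf(v, mean, std):
    return -0.5 * ((v - mean) / std) ** 2 - np.log(std)

params = {}
for label in (True, False):
    m = usable & (y == label)
    mu = np.average(x[m], weights=w[m])
    sd = np.sqrt(np.average((x[m] - mu) ** 2, weights=w[m]))
    prior = w[m].sum() / w[usable].sum()
    params[label] = (mu, sd, prior)

def risk(v):
    """Posterior P(event by tau | x = v) from the weighted Naive Bayes fit."""
    mu1, sd1, p1 = params[True]
    mu0, sd0, p0 = params[False]
    l1 = np.exp(gaussian_logpdf(v, mu1, sd1)) * p1
    l0 = np.exp(gaussian_logpdf(v, mu0, sd0)) * p0
    return l1 / (l1 + l0)

print(risk(-1.0), risk(1.0))   # higher x should mean higher predicted risk
```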
DOE Office of Scientific and Technical Information (OSTI.GOV)
Setyawan, Wahyu; Nandipati, Giridhar; Roche, Kenneth J.
Molecular dynamics simulations have been used to generate a comprehensive database of surviving defects due to displacement cascades in bulk tungsten. Twenty-one data points of primary knock-on atom (PKA) energies ranging from 100 eV (sub-threshold energy) to 100 keV (~780 × Ed, where Ed = 128 eV is the average displacement threshold energy) have been completed at 300 K, 1025 K and 2050 K. Within this range of PKA energies, two regimes of power-law energy-dependence of the defect production are observed. A distinct power-law exponent characterizes the number of Frenkel pairs produced within each regime. The two regimes intersect at a transition energy which occurs at approximately 250 × Ed. The transition energy also marks the onset of the formation of large self-interstitial atom (SIA) clusters (size 14 or more). The observed defect clustering behavior is asymmetric, with SIA clustering increasing with temperature, while vacancy clustering decreases. This asymmetry increases with temperature such that at 2050 K (~0.5 Tm) practically no large vacancy clusters are formed, while large SIA clusters appear in all simulations. The implication of such asymmetry for long-term defect survival and damage accumulation is discussed. In addition, <100> {110} SIA loops are observed to form directly in the highest energy cascades, while <100> vacancy loops are observed to form at the lowest temperature and highest PKA energies, although the appearance of both vacancy and SIA loops with Burgers vector of <100> type is relatively rare.
The History and Legacy of BATSE
NASA Technical Reports Server (NTRS)
Fishman, Gerald J.
2012-01-01
The BATSE experiment on the Compton Gamma Ray Observatory was the first large detector system specifically designed for the study of gamma-ray bursts. The eight large-area detectors allowed full-sky coverage and were optimized to operate in the energy region of the peak emission of most GRBs. BATSE provided detailed observations of the temporal and spectral characteristics of large samples of GRBs, and it was the first experiment to provide rapid notifications of the coarse locations of many of them. It also provided strong evidence for the cosmological distances of GRBs through observation of the sky distribution and intensity distribution of numerous GRBs. The large number of GRBs observed with the high-sensitivity BATSE detectors continues to provide a database of GRB spectral and temporal properties in the primary energy range of GRB emission that will likely not be exceeded for at least another decade. The origin and development of the BATSE experiment, some highlights from the mission, and its continuing legacy are described in this paper.
Video quality pooling adaptive to perceptual distortion severity.
Park, Jincheol; Seshadrinathan, Kalpana; Lee, Sanghoon; Bovik, Alan Conrad
2013-02-01
It is generally recognized that severe video distortions that are transient in space and/or time have a large effect on overall perceived video quality. In order to understand this phenomenon, we study the distribution of spatio-temporally local quality scores obtained from several video quality assessment (VQA) algorithms on videos suffering from compression and lossy transmission over communication channels. We propose a content-adaptive spatial and temporal pooling strategy based on the observed distribution. Our method adaptively emphasizes the "worst" scores along both the spatial and temporal dimensions of a video sequence and also considers the perceptual effect of large-area cohesive motion flow such as egomotion. We demonstrate the efficacy of the method by testing it using three different VQA algorithms on the LIVE Video Quality Database and the EPFL-PoliMI video quality database.
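A minimal sketch of worst-case-emphasizing pooling follows (the percentile choices are illustrative; the paper's actual weighting is content-adaptive and more elaborate): local scores are pooled by averaging only the lowest-scoring fraction, first over space within each frame, then over time across frames.

```python
import numpy as np

def worst_percent_mean(scores, percent):
    """Mean of the lowest `percent` fraction of the scores."""
    flat = np.sort(np.ravel(scores))
    k = max(1, int(len(flat) * percent))
    return flat[:k].mean()

def pool_video(local_scores, p_space=0.05, p_time=0.3):
    """local_scores: (frames, rows, cols) map of local quality scores."""
    per_frame = np.array([worst_percent_mean(f, p_space) for f in local_scores])
    return worst_percent_mean(per_frame, p_time)

rng = np.random.default_rng(3)
video = rng.uniform(0.7, 1.0, size=(120, 36, 64))   # mostly high quality
video[40:48] *= 0.3                                 # transient severe distortion
print(pool_video(video), video.mean())  # pooled score reflects the worst burst
```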
Database Dictionary for Ethiopian National Ground-Water DAtabase (ENGDA) Data Fields
Kuniansky, Eve L.; Litke, David W.; Tucci, Patrick
2007-01-01
Introduction: This document describes the data fields that are used for both field forms and the Ethiopian National Ground-water Database (ENGDA) tables associated with information stored about production wells, springs, test holes, test wells, and water-level or water-quality observation wells. Several different words are used in this database dictionary and in the ENGDA database to describe a narrow shaft constructed in the ground. The most general term is borehole, which is applicable to any type of hole. A well is a borehole specifically constructed to extract water from the ground; however, for this data dictionary and for the ENGDA database, the words well and borehole are used interchangeably. A production well is defined as any well used for water supply and includes hand-dug wells, small-diameter bored wells equipped with hand pumps, or large-diameter bored wells equipped with large-capacity motorized pumps. Test holes are borings made to collect information about the subsurface with continuous core or non-continuous core and/or where geophysical logs are collected. Test holes are not converted into wells. A test well is a well constructed for hydraulic testing of an aquifer in order to plan a larger ground-water production system. A water-level or water-quality observation well is a well that is used to collect information about an aquifer and is not used for water supply. A spring is any naturally flowing, local, ground-water discharge site. The database dictionary is designed to help define all fields on both the field data collection forms (provided in attachment 2 of this report) and the ENGDA software screen entry forms (described in Litke, 2007). The data entered into each screen entry field are stored in relational database tables within the computer database. The organization of the database dictionary is based on field data collection and the field forms, because this is what the majority of people will use. After each field, however, the ENGDA database field name and relational database table are designated, along with the ENGDA screen entry form(s) and the ENGDA field form (attachment 2). The database dictionary is separated into sections. The first section, Basic Site Data Fields, describes the basic site information that is similar for all of the different types of sites. The remaining sections may be applicable to only one type of site; for example, the Well Drilling and Construction Data Fields and Lithologic Description Data Fields are applicable to boreholes and not to springs. Attachment 1 contains a table for conversion from English to metric units. Attachment 2 contains selected field forms used in conjunction with ENGDA. A separate document, 'Users Reference Manual for the Ethiopian National Ground-Water DAtabase (ENGDA),' by David W. Litke, was developed as a users guide for the computer database and screen entry. This database dictionary serves as a reference for both the field forms and the computer database. Every effort has been made to have identical field names between the field forms and the screen entry forms in order to avoid confusion.
Cross-Matching Source Observations from the Palomar Transient Factory (PTF)
NASA Astrophysics Data System (ADS)
Laher, Russ; Grillmair, C.; Surace, J.; Monkewitz, S.; Jackson, E.
2009-01-01
Over the four-year lifetime of the PTF project, approximately 40 billion instances of astronomical-source observations will be extracted from the image data. The instances will correspond to the same astronomical objects being observed at roughly 25-50 different times, and so a very large catalog containing important object-variability information will be the chief PTF product. Organizing astronomical-source catalogs is conventionally done by dividing the catalog into declination zones and sorting by right ascension within each zone (e.g., the USNO-A star catalog), in order to facilitate catalog searches. This method was reincarnated as the "zones" algorithm in a SQL-Server database implementation (Szalay et al., MSR-TR-2004-32), with corrections given by Gray et al. (MSR-TR-2006-52). The primary advantage of this implementation is that all of the work is done entirely on the database server and client/server communication is eliminated. We implemented the methods outlined in Gray et al. for a PostgreSQL database, programming them as database functions in the PL/pgSQL procedural language. The cross-matching is currently based on source positions, but we intend to extend it to use both positions and positional uncertainties to form a chi-square statistic for optimal thresholding. The database design includes three main tables, plus a handful of internal tables: the Sources table stores the SExtractor source extractions taken at various times; the MergedSources table stores statistics about the astronomical objects, which are the result of cross-matching records in the Sources table; and the Merges table associates cross-matched primary keys in the Sources table with primary keys in the MergedSources table. Besides judicious database indexing, we have also internally partitioned the Sources table by declination zone, in order to speed up the population of Sources records and make the database more manageable. The catalog will be accessible to the public after the proprietary period through IRSA (irsa.ipac.caltech.edu).
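For illustration, a minimal Python rendition of the zones idea (the authors implemented it in PL/pgSQL; all names here are invented, the small-angle distance test is approximate, and RA wraparound at 0/360 degrees is ignored for brevity):

```python
import math
from collections import defaultdict

ZONE_HEIGHT = 30.0 / 3600.0   # 30-arcsecond declination zones, in degrees

def zone_of(dec):
    return int(math.floor(dec / ZONE_HEIGHT))

def build_zone_index(sources):
    """sources: list of (source_id, ra_deg, dec_deg). Returns a map
    zone -> list sorted by RA, mimicking the zones table of Szalay/Gray."""
    zones = defaultdict(list)
    for sid, ra, dec in sources:
        zones[zone_of(dec)].append((ra, dec, sid))
    for z in zones.values():
        z.sort()                       # sort by RA within each zone
    return zones

def cross_match(zones, ra, dec, radius_deg):
    """Return ids of indexed sources within radius of (ra, dec).
    Only the zones overlapping the search circle are scanned."""
    matches = []
    for z in range(zone_of(dec - radius_deg), zone_of(dec + radius_deg) + 1):
        # RA window widened by 1/cos(dec) to stay conservative near the poles.
        dra = radius_deg / max(math.cos(math.radians(dec)), 1e-6)
        for cra, cdec, sid in zones.get(z, []):
            if cra < ra - dra:
                continue
            if cra > ra + dra:
                break                  # zone list is RA-sorted
            # Flat-sky angular distance test, adequate for small radii.
            if (cra - ra) ** 2 * math.cos(math.radians(dec)) ** 2 \
               + (cdec - dec) ** 2 <= radius_deg ** 2:
                matches.append(sid)
    return matches

zones = build_zone_index([(1, 10.000, -5.0000), (2, 10.001, -5.0001)])
print(cross_match(zones, 10.0005, -5.00005, 2.0 / 3600.0))  # both nearby ids
```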
NASA Astrophysics Data System (ADS)
Gebhardt, Steffen; Wehrmann, Thilo; Klinger, Verena; Schettler, Ingo; Huth, Juliane; Künzer, Claudia; Dech, Stefan
2010-10-01
The German-Vietnamese water-related information system for the Mekong Delta (WISDOM) project supports business processes in Integrated Water Resources Management in Vietnam. Multiple disciplines bring together earth- and ground-based observation themes, such as environmental monitoring, water management, demographics, economy, information technology, and infrastructural systems. This paper introduces the components of the web-based WISDOM system, including its data, logic, and presentation tiers. It focuses on the data models upon which the database management system is built, including techniques for tagging or linking metadata with the stored information. The model also uses ordered groupings of spatial, thematic, and temporal reference objects to semantically tag datasets and enable fast data retrieval, such as finding all data in a specific administrative unit belonging to a specific theme. The PostgreSQL database employs a spatial database extension, chosen over a plain relational design so that spatial objects can be tagged to tabular data, improving the retrieval of census and observational data at regional, provincial, and local levels. While the spatial database is less suited to processing raster data, a work-around was built into WISDOM to permit efficient management of both raster and vector data. The data model also incorporates styling aspects of the spatial datasets through styled layer descriptors (SLD) and web map service (WMS) layer specifications, allowing retrieval of rendered maps. Metadata elements of the spatial data are based on the ISO 19115 standard. XML-structured information for the SLD and metadata is stored in an XML database. The data models and the data management system are robust for managing the large quantity of spatial objects, sensor observations, census and document data. The operational WISDOM information system prototype contains modules for data management, automatic data integration, and web services for data retrieval, analysis, and distribution. The graphical user interfaces facilitate metadata cataloguing, data warehousing, web sensor data analysis, and thematic mapping.
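A toy sketch of the tagging idea, with invented dataset names and tag values (the WISDOM implementation lives in PostgreSQL, not in application code): each dataset carries spatial, thematic, and temporal reference tags, and per-dimension inverted indexes answer combined queries quickly.

```python
from collections import defaultdict

# Every dataset is tagged with spatial, thematic and temporal reference
# objects; an inverted index per dimension supports fast combined lookups.
datasets = {
    "lc_2008_cantho": {"spatial": "Can Tho", "theme": "land cover", "year": 2008},
    "census_2009_cantho": {"spatial": "Can Tho", "theme": "demographics", "year": 2009},
    "lc_2008_angiang": {"spatial": "An Giang", "theme": "land cover", "year": 2008},
}

index = {dim: defaultdict(set) for dim in ("spatial", "theme", "year")}
for name, tags in datasets.items():
    for dim, value in tags.items():
        index[dim][value].add(name)

def query(**criteria):
    """All datasets matching every given tag, e.g.
    query(spatial='Can Tho', theme='land cover')."""
    hits = None
    for dim, value in criteria.items():
        found = index[dim][value]
        hits = found if hits is None else hits & found
    return sorted(hits or [])

print(query(spatial="Can Tho", theme="land cover"))  # -> ['lc_2008_cantho']
```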
BrEPS 2.0: Optimization of sequence pattern prediction for enzyme annotation.
Dudek, Christian-Alexander; Dannheim, Henning; Schomburg, Dietmar
2017-01-01
The prediction of gene functions is crucial for a large number of different life science areas. Faster high-throughput sequencing techniques generate more and larger datasets, for which manual annotation by classical wet-lab experiments is not feasible. We showed earlier that the automatic sequence pattern-based BrEPS protocol, based on manually curated sequences, can be used for the prediction of enzymatic functions of genes. The growing sequence databases provide the opportunity for more reliable patterns, but are also a challenge for the implementation of automatic protocols. We reimplemented and optimized the BrEPS pattern generation to be applicable to larger datasets on an acceptable timescale. The primary improvement of the new BrEPS protocol is the enhanced data selection step. Manually curated annotations from Swiss-Prot are used as a reliable source for function prediction of enzymes observed at the protein level. The pool of sequences is extended by highly similar sequences from TrEMBL and Swiss-Prot. This allows us to restrict the selection of Swiss-Prot entries without losing the diversity of sequences needed to generate significant patterns. Additionally, a supporting pattern type was introduced by extending the patterns at semi-conserved positions with highly similar amino acids. Extended patterns have an increased complexity, increasing the chance of matching more sequences without losing the essential structural information of the pattern. To enhance the usability of the database, we introduced enzyme function prediction based on consensus EC numbers and the IUBMB enzyme nomenclature. BrEPS is part of the Braunschweig Enzyme Database (BRENDA) and is available on a completely redesigned website and as a download. The database can be downloaded and used with the BrEPScmd command line tool for large-scale sequence analysis. The BrEPS website and downloads for the database creation tool, command line tool, and database are freely accessible at http://breps.tu-bs.de.
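The pattern-extension step can be illustrated with a toy sketch (the similarity table, block data, and thresholds below are invented stand-ins, not BrEPS's own rules): fully conserved alignment columns become fixed residues, and semi-conserved columns become character classes widened with chemically similar amino acids.

```python
# Toy residue-similarity table (not the grouping BrEPS itself uses).
SIMILAR = {"I": "ILV", "L": "ILV", "V": "ILV", "D": "DE", "E": "DE",
           "K": "KR", "R": "KR", "S": "ST", "T": "ST"}

def build_pattern(aligned, max_class_size=4):
    """Derive a PROSITE-like pattern from equal-length aligned blocks."""
    pattern = []
    for column in zip(*aligned):
        residues = set(column)
        if len(residues) == 1:                     # fully conserved
            pattern.append(column[0])
        elif len(residues) <= max_class_size:      # semi-conserved: extend
            extended = set()
            for r in residues:
                extended.update(SIMILAR.get(r, r))
            pattern.append("[" + "".join(sorted(extended)) + "]")
        else:
            pattern.append(".")                    # unconstrained position
    return "".join(pattern)

block = ["GDSIAGK", "GDSLAGK", "GDSVSGK"]
print(build_pattern(block))   # -> GDS[ILV][AST]GK
```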
VizieR Online Data Catalog: Mira stars discovered in LAMOST DR4 (Yao+, 2017)
NASA Astrophysics Data System (ADS)
Yao, Y.; Liu, C.; Deng, L.; de Grijs, R.; Matsunaga, N.
2017-10-01
By the end of 2016 March, the wide-field Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) DR4 catalog had accumulated 7,681,185 spectra (R=1800), of which 6,898,298 were of stars. We compiled a photometrically confirmed sample of Mira variables from the Kiso Wide-Field Camera (KWFC) Intensive Survey of the Galactic Plane (KISOGP; Matsunaga 2017, arXiv:1705.08567), the American Association of Variable Star Observers (AAVSO) International Variable Star Index (VSX; Watson 2006, B/vsx, version 2017-05-02; we selected stars of variability type "M"), and the SIMBAD Astronomical Database. We first cross-matched the KISOGP and VSX Miras with the LAMOST DR4 catalog. Finally, we cross-matched the DR4 catalog with the SIMBAD database. See section 2. (1 data file).
Olier, Ivan; Springate, David A; Ashcroft, Darren M; Doran, Tim; Reeves, David; Planner, Claire; Reilly, Siobhan; Kontopantelis, Evangelos
2016-01-01
The use of Electronic Health Records databases for medical research has become mainstream. In the UK, increasing use of Primary Care Databases is largely driven by almost complete computerisation and uniform standards within the National Health Service. Electronic Health Records research often begins with the development of a list of clinical codes with which to identify cases with a specific condition. We present a methodology and accompanying Stata and R commands (pcdsearch/Rpcdsearch) to help researchers in this task, using severe mental illness (SMI) as an example. We used the Clinical Practice Research Datalink, a UK Primary Care Database in which clinical information is largely organised using Read codes, a hierarchical clinical coding system. Pcdsearch is used to identify potentially relevant clinical codes and/or product codes from word-stubs and code-stubs suggested by clinicians. The returned code-lists are reviewed and codes relevant to the condition of interest are selected. The final code-list is then used to identify patients. We identified 270 Read codes linked to SMI and used them to identify cases in the database. We observed that our approach identified cases that would have been missed with a simpler approach using SMI registers defined within the UK Quality and Outcomes Framework. We described a framework for researchers of Electronic Health Records databases for identifying patients with a particular condition or matching certain clinical criteria. The method is invariant to coding system or database and can be used with SNOMED CT, ICD or other medical classification code-lists.
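A minimal sketch of the stub-search idea (Python rather than the published Stata/R commands; the code dictionary and stubs below are illustrative only):

```python
import re

# Toy clinical code dictionary: (code, description) pairs. The real
# commands scan full CPRD lookup files.
code_dictionary = [
    ("E10..", "Schizophrenic disorders"),
    ("E11..", "Affective psychoses"),
    ("E112.", "Bipolar affective disorder, current episode manic"),
    ("F23..", "Acute and transient psychotic disorders"),
    ("H33..", "Asthma"),
]

def search_codes(word_stubs=(), code_stubs=()):
    """Return entries whose code starts with a code-stub or whose
    description matches a word-stub (case-insensitive)."""
    hits = []
    for code, description in code_dictionary:
        if any(code.startswith(stub) for stub in code_stubs) or \
           any(re.search(stub, description, re.IGNORECASE) for stub in word_stubs):
            hits.append((code, description))
    return hits

# Candidate SMI code-list for clinician review:
for code, desc in search_codes(word_stubs=["psycho", "schizo", "bipolar"],
                               code_stubs=["E11"]):
    print(code, desc)
```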
STEREO Observations of Waves in the Ramp Regions of Interplanetary Shocks
NASA Astrophysics Data System (ADS)
Cohen, Z.; Breneman, A. W.; Cattell, C. A.; Davis, L.; Grul, P.; Kersten, K.; Wilson, L. B., III
2017-12-01
Determining the role of plasma waves in providing energy dissipation at shock waves is of long-standing interest. Interplanetary (IP) shocks serve as a large database of low Mach number shocks. We examine electric field waveforms captured by the Time Domain Sampler (TDS) on the STEREO spacecraft during the ramps of IP shocks, with emphasis on captures lasting 2.1 seconds. Previous work has used captures of shorter duration (66 and 131 ms on STEREO, and 17 ms on WIND), which allowed for observation of waves with maximum (minimum) frequencies of 125 kHz (15 Hz), 62.5 kHz (8 Hz), and 60 kHz (59 Hz), respectively. The maximum frequencies are comparable to 2-8 times the plasma frequency in the solar wind, enabling observation of Langmuir, ion acoustic, and some whistler-mode waves. The 2.1-second captures resolve lower frequencies (a few Hz), which allows us to analyze the packet structure of the whistler-mode waves and some ion acoustic waves. The longer capture time also improves the resolvability of simultaneous wave modes and of waves with frequencies on the order of tens of Hz. Langmuir waves, however, cannot be identified at this sampling rate, since the plasma frequency is usually higher than 3.9 kHz. IP shocks are identified from multiple databases (the Helsinki heliospheric shock database at http://ipshocks.fi, and the STEREO level 3 shock database at ftp://stereoftp.nascom.nasa.gov/pub/ins_data/impact/level3/). Our analysis focuses on TDS captures in shock ramp regions, with ramp durations determined from magnetic field data taken at 8 Hz. Software is used to identify multiple wave modes in any given capture and classify waves as Langmuir, ion acoustic, whistler, lower hybrid, electron cyclotron drift instability, or electrostatic solitary waves. Relevant frequencies are determined from density and magnetic field data collected in situ. Preliminary results suggest that large-amplitude (≥ 5 mV/m) ion acoustic waves are most prevalent in the ramp, in agreement with Wilson et al. Other modes are also observed. Statistical results will be presented and compared with previous studies and theoretical predictions.
On the connection of gamma-ray bursts and X-ray flashes in the BATSE and RHESSI databases
NASA Astrophysics Data System (ADS)
Řípa, J.; Mészáros, A.
2016-12-01
Classification of gamma-ray bursts (GRBs) into groups has been intensively studied with various statistical tests in previous years. It has been suggested that there is a distinct group of GRBs, beyond the long and short ones, with intermediate durations; however, such a group has not been securely confirmed yet. Strangely, concerning spectral hardness, observations from the Swift and RHESSI satellites give different results. For the Swift/BAT database it is found that the intermediate-duration bursts might well be related to so-called X-ray flashes (XRFs). On the other hand, for the RHESSI dataset the intermediate-duration bursts seem to be spectrally too hard to be given by XRFs. The connection between the intermediate-duration bursts and XRFs for the BATSE database is also unclear. The purpose of this article is to check the relation between XRFs and GRBs for the BATSE and RHESSI databases, respectively. We use an empirical definition of XRFs introduced by other authors earlier. For the RHESSI database we also use a transformation between the detected counts and the fluences based on the simulated detector response function, in order to compare the hardnesses of GRBs with the definition of XRFs. There is a 1.3-4.2 % fraction of XRFs in the whole BATSE database. The vast majority of the BATSE short bursts are not XRFs, because only 0.7-5.7 % of the short bursts can be given by XRFs. However, there is a large uncertainty in the fraction of XRFs among the intermediate-duration bursts: between 1 % and 85 % of the BATSE intermediate-duration bursts can be related to XRFs. For the long bursts this fraction is between 1.0 % and 3.4 %. Although the uncertainties in these fractions are large, it can be claimed that not all BATSE intermediate-duration bursts can be given by XRFs. At least 79 % of RHESSI short bursts, at least 53 % of RHESSI intermediate-duration bursts, and at least 45 % of RHESSI long bursts should not be given by XRFs. A simulation of XRFs observed by HETE-2 and Swift has shown that RHESSI would detect, and in fact detected, only one long-duration XRF out of the 26 observed by those two satellites. We arrive at the conclusion that the intermediate-duration bursts in the BATSE database may be partly populated by XRFs, but the RHESSI intermediate-duration bursts are most likely not given by XRFs. The results claiming that the Swift/BAT intermediate-duration bursts are closely related to XRFs do not hold for the BATSE and RHESSI databases.
Using Large Diabetes Databases for Research.
Wild, Sarah; Fischbacher, Colin; McKnight, John
2016-09-01
There are an increasing number of clinical, administrative and trial databases that can be used for research. These are particularly valuable if there are opportunities for linkage to other databases. This paper describes examples of the use of large diabetes databases for research. It reviews the advantages and disadvantages of using large diabetes databases for research and suggests solutions for some challenges. Large, high-quality databases offer potential sources of information for research at relatively low cost. Fundamental issues for using databases for research are the completeness of capture of cases within the population and time period of interest and the accuracy of the diagnosis of diabetes and outcomes of interest. The extent to which people included in the database are representative should be considered if the database is not population-based and there is an intention to extrapolate findings to the wider diabetes population. Information on key variables such as date of diagnosis or duration of diabetes may not be available at all, may be inaccurate, or may contain a large amount of missing data. Information on key confounding factors is rarely available for the nondiabetic or general population, limiting comparisons with the population of people with diabetes. However, comparisons that allow for differences in the distribution of important demographic factors may be feasible using data for the whole population or a matched cohort study design. In summary, diabetes databases can be used to address important research questions. Understanding the strengths and limitations of this approach is crucial to interpret the findings appropriately. © 2016 Diabetes Technology Society.
NASA Astrophysics Data System (ADS)
Yamada, A.; Saitoh, N.; Nonogaki, R.; Imasu, R.; Shiomi, K.; Kuze, A.
2016-12-01
The thermal infrared (TIR) band of the Thermal and Near-infrared Sensor for Carbon Observation Fourier Transform Spectrometer (TANSO-FTS) onboard the Greenhouse Gases Observing Satellite (GOSAT) observes the CH4 profile in the wavenumber range from 1210 cm^-1 to 1360 cm^-1, which includes the CH4 ν4 band. The current retrieval algorithm (V1.0) uses LBLRTM V12.1 with the AER V3.1 line database to calculate optical depth. LBLRTM V12.1 includes the MT_CKD 2.5.2 model to calculate continuum absorption. The continuum absorption has large uncertainty, especially in the temperature-dependent coefficient, between the BPS and MT_CKD models in the wavenumber region of 1210-1250 cm^-1 (Paynter and Ramaswamy, 2014). The purpose of this study is to assess the impact on CH4 retrieval of the choice of line parameter database and of the uncertainty in continuum absorption. We used the AER V1.0, HITRAN2004, HITRAN2008, AER V3.2, and HITRAN2012 databases (Rothman et al. 2005, 2009, and 2013; Clough et al., 2005). The AER V1.0 database is based on HITRAN2000. The CH4 line parameters of the AER V3.1 and V3.2 databases are derived from HITRAN2008, including updates until May 2009, with line-mixing parameters. We compared the retrieved CH4 with the HIPPO CH4 observations (Wofsy et al., 2012). The difference for AER V3.2 was the smallest, 24.1 ± 45.9 ppbv. The differences for AER V1.0, HITRAN2004, HITRAN2008, and HITRAN2012 were 35.6 ± 46.5 ppbv, 37.6 ± 46.3 ppbv, 32.1 ± 46.1 ppbv, and 35.2 ± 46.0 ppbv, respectively. Comparing the AER V3.2 case to the HITRAN2008 case, the line-coupling effect reduced the difference by 8.0 ppbv. Median values of the residual difference from HITRAN2008 to AER V1.0, HITRAN2004, AER V3.2, and HITRAN2012 were 0.6 K, 0.1 K, -0.08 K, and 0.08 K, respectively, while median values of the transmittance difference were less than 0.0003 and the transmittance differences showed little wavenumber dependence. We also discuss the retrieval error arising from the uncertainty in the continuum absorption, the test of a full grid configuration for retrieval, and retrieval results using GOSAT TIR L1B V203203, a sample product for evaluating the next level 1B algorithm.
NASA Astrophysics Data System (ADS)
Jones, A. S.; Horsburgh, J. S.; Matos, M.; Caraballo, J.
2015-12-01
Networks conducting long-term monitoring with in situ sensors need the functionality to track physical equipment as well as deployments, calibrations, and other actions related to site and equipment maintenance. The observational data being generated by sensors are enhanced if direct linkages to equipment details and actions can be made. This type of information is typically recorded in field notebooks or in static files, which are rarely linked to observations in a way that could be used to interpret results, even though the record of field activities is often relevant to analysis or post-processing of the observational data. We have developed an underlying database schema and deployed a web interface for recording and retrieving information on physical infrastructure and related actions for observational networks. The database schema for equipment was designed as an extension to the Observations Data Model 2 (ODM2), a community-developed information model for spatially discrete, feature-based Earth observations. The core entities of ODM2 describe the location, observed variable, and timing of observations, and the equipment extension contains entities that provide additional metadata specific to the inventory of physical infrastructure and associated actions. The schema is implemented in a relational database system for storage and management, with an associated web interface. We designed the web-based tools for technicians to enter and query information on physical equipment and actions such as site visits, equipment deployments, maintenance, and calibrations. These tools were implemented for the iUTAH (innovative Urban Transitions and Aridregion Hydrosustainability) ecohydrologic observatory, and we anticipate that they will be useful for similar large-scale monitoring networks desiring to link observing infrastructure to observational data to increase the quality of sensor-based data products.
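A condensed sketch of how such an equipment extension can hang off core ODM2-style entities (the table and column names below are simplified stand-ins, not the actual ODM2 DDL):

```python
import sqlite3

# Equipment records plus actions (deployment, calibration, site visit)
# linking equipment to sampling features and time periods.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE SamplingFeatures (
    SamplingFeatureID INTEGER PRIMARY KEY,
    Code TEXT, Name TEXT);
CREATE TABLE Equipment (
    EquipmentID INTEGER PRIMARY KEY,
    Code TEXT, SerialNumber TEXT, ModelName TEXT);
CREATE TABLE EquipmentActions (
    ActionID INTEGER PRIMARY KEY,
    EquipmentID INTEGER REFERENCES Equipment,
    SamplingFeatureID INTEGER REFERENCES SamplingFeatures,
    ActionType TEXT,          -- 'Deployment', 'Calibration', ...
    BeginDateTime TEXT, EndDateTime TEXT);
""")
con.execute("INSERT INTO SamplingFeatures VALUES (1, 'LR_AA', 'Logan River site')")
con.execute("INSERT INTO Equipment VALUES (1, 'CTD-1', 'SN-0042', 'Hydrolab MS5')")
con.execute("INSERT INTO EquipmentActions VALUES (1, 1, 1, 'Deployment', '2015-06-01', NULL)")

# Which instruments are currently deployed at each site?
for row in con.execute("""
    SELECT sf.Name, eq.ModelName, ea.BeginDateTime
    FROM EquipmentActions ea
    JOIN Equipment eq USING (EquipmentID)
    JOIN SamplingFeatures sf USING (SamplingFeatureID)
    WHERE ea.ActionType = 'Deployment' AND ea.EndDateTime IS NULL"""):
    print(row)
```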
NASA Astrophysics Data System (ADS)
Tatge, C. B.; Slater, S. J.; Slater, T. F.; Schleigh, S.; McKinnon, D.
2016-12-01
Historically, an important part of the scientific research cycle is to situate any research project within the landscape of the existing scientific literature. In the field of discipline-based astronomy education research, grappling with the existing literature base has proven difficult because of the difficulty in obtaining research reports from around the world, particularly early ones. In order to better survey and efficiently utilize the wide and fragmented range of astronomy education research methods and results, the iSTAR (international Study of Astronomical Reasoning) database project was initiated. The project aims to host a living, online repository of dissertations, theses, journal articles, and grey literature resources to serve the world's discipline-based astronomy education research community. The first domain of research artifacts ingested into the iSTAR database was doctoral dissertations. To the authors' great surprise, nearly 300 astronomy education research dissertations were found from the last 100 years. Few, if any, of the literature reviews from recent astronomy education dissertations come close to summarizing this many dissertations, most of which have not been published in traditional journals, since re-publishing one's dissertation research as a journal article was not a widespread custom in the education research community until recently. A survey of the iSTAR database dissertations reveals that the vast majority of the work was largely quantitative in nature until the last decade. We also observe that modern-era astronomy education research writing reaches as far back as 1923 and that the majority of dissertations come from the same eight institutions. Moreover, most of the astronomy education research has covered learners' grasp of broad astronomy knowledge rather than delving into specific learning targets, which has been more in vogue during the last two decades. The surprisingly wide breadth of largely unknown research revealed in the iSTAR database motivates us to begin to synthesize the research and look for broader themes using widely accepted meta-analysis techniques.
Application of kernel functions for accurate similarity search in large chemical databases.
Wang, Xiaohong; Huan, Jun; Smalter, Aaron; Lushington, Gerald H
2010-04-29
Similarity search in chemical structure databases is an important problem with many applications in chemical genomics, drug design, and efficient chemical probe screening, among others. It is widely believed that structure-based methods provide an efficient way to perform such queries. Recently, various graph kernel functions have been designed to capture the intrinsic similarity of graphs. Though successful in constructing accurate predictive and classification models, graph kernel functions cannot be applied to large chemical compound databases, due to their high computational complexity and the difficulty of indexing similarity search for large databases. To bridge graph kernel functions and similarity search in chemical databases, we applied a novel kernel-based similarity measurement, developed by our team, to measure the similarity of graph-represented chemicals. In our method, we utilize a hash table to support the new graph kernel function definition, efficient storage, and fast search. We have applied our method, named G-hash, to large chemical databases. Our results show that the G-hash method achieves state-of-the-art performance for k-nearest neighbor (k-NN) classification. Moreover, the similarity measurement and the index structure are scalable to large chemical databases, with smaller index size and faster query processing time compared to state-of-the-art indexing methods such as Daylight fingerprints, C-tree and GraphGrep. Efficient similarity query processing for large chemical databases is challenging, since we need to balance running-time efficiency and similarity search accuracy. Our similarity search method, G-hash, provides a new way to perform similarity search in chemical databases, and our experimental study validates its utility.
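The flavor of the approach, hashing local node descriptions and comparing bucket counts, can be sketched as follows (a toy approximation; the features and kernel of the published G-hash differ):

```python
from collections import Counter

def node_features(graph):
    """graph: {node: (label, [neighbors])}. Feature string = own atom
    label plus sorted neighbor labels; these strings act as hash keys."""
    feats = []
    for node, (label, nbrs) in graph.items():
        nbr_labels = sorted(graph[n][0] for n in nbrs)
        feats.append(label + "|" + ",".join(nbr_labels))
    return Counter(feats)

def kernel(g1, g2):
    """Histogram-intersection kernel over hashed node features."""
    f1, f2 = node_features(g1), node_features(g2)
    return sum(min(f1[k], f2[k]) for k in f1.keys() & f2.keys())

# Two tiny molecular graphs: ethanol (C-C-O) and methanol (C-O).
ethanol = {1: ("C", [2]), 2: ("C", [1, 3]), 3: ("O", [2])}
methanol = {1: ("C", [2]), 2: ("O", [1])}
print(kernel(ethanol, methanol))  # shared "O bonded to one C" feature -> 1
```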
Systematic observations of the slip pulse properties of large earthquake ruptures
Melgar, Diego; Hayes, Gavin
2017-01-01
In earthquake dynamics there are two end-member models of rupture: propagating cracks and self-healing pulses. These arise from different fault properties and have implications for seismic hazard, because rupture mode controls near-field strong ground motions. Past studies favor the pulse-like mode of rupture; however, due to a variety of limitations, it has proven difficult to systematically establish the kinematic properties of such pulses. Here we synthesize observations from a database of >150 rupture models of earthquakes spanning M7–M9, processed in a uniform manner, and show that the magnitude scaling properties of these slip pulses indicate self-similarity. Further, we find that large and very large events are statistically distinguishable relatively early (at ~15 s) in the rupture process. This suggests that, with dense regional geophysical networks, strong ground motions from a large rupture can be identified before their onset across the source region.
Spatial distribution of citizen science casuistic observations for different taxonomic groups.
Tiago, Patrícia; Ceia-Hasse, Ana; Marques, Tiago A; Capinha, César; Pereira, Henrique M
2017-10-16
Opportunistic citizen science databases are becoming an important way of gathering information on species distributions. These data are temporally and spatially dispersed and may suffer from biases in the distribution of observations in space and/or time. In this work, we test the influence of landscape variables on the distribution of citizen science observations for eight taxonomic groups, using data collected through a Portuguese citizen science database (biodiversity4all.org). We use a zero-inflated negative binomial regression to model the distribution of observations as a function of a set of variables representing the landscape features plausibly influencing the spatial distribution of the records. Results suggest that the density of paths is the most important variable, having a statistically significant positive relationship with the number of observations for seven of the eight taxa considered. Wetland coverage was also identified as having a significant positive relationship for birds, amphibians and reptiles, and mammals. Our results highlight that the distribution of species observations in citizen science projects is spatially biased: higher frequency of observations is driven largely by accessibility and by the presence of water bodies. We conclude that efforts are required to increase the spatial evenness of sampling effort from volunteers.
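For readers unfamiliar with the model class, a minimal sketch using simulated data is shown below, assuming the ZeroInflatedNegativeBinomialP class available in recent versions of statsmodels; the covariate names and settings are illustrative, not the study's actual specification.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedNegativeBinomialP

# Simulated stand-in for the study's setup: counts of observations per
# grid cell as a function of landscape covariates, with excess zeros.
rng = np.random.default_rng(1)
n = 500
path_density = rng.uniform(0, 1, n)
wetland_cover = rng.uniform(0, 1, n)
lam = np.exp(0.5 + 1.2 * path_density + 0.8 * wetland_cover)
counts = rng.poisson(lam) * (rng.uniform(size=n) > 0.3)  # ~30% structural zeros

X = sm.add_constant(np.column_stack([path_density, wetland_cover]))
# Constant-only inflation part; p=2 gives the standard NB2 variance form.
model = ZeroInflatedNegativeBinomialP(counts, X, exog_infl=np.ones((n, 1)), p=2)
result = model.fit(maxiter=500, disp=False)
print(result.summary())
```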
Frequency Analysis of the RRc Variables of the MACHO Database for the LMC
NASA Astrophysics Data System (ADS)
Kovács, G.; Alcock, C.; Allsman, R.; Alves, D.; Axelrod, T.; Becker, A.; Bennett, D.; Clement, C.; Cook, K. H.; Drake, A.; Freeman, K.; Geha, M.; Griest, K.; Kurtz, D. W.; Lehner, M.; Marshall, S.; Minniti, D.; Nelson, C.; Peterson, B.; Popowski, P.; Pratt, M.; Quinn, P.; Rodgers, A.; Rowe, J.; Stubbs, C.; Sutherland, W.; Tomaney, A.; Vandehei, T.; Welch, D. L.; MACHO Collaboration
We present the first massive frequency analysis of the 1200 first-overtone RR Lyrae stars in the Large Magellanic Cloud observed during the first 4.3 yr of the MACHO project. Besides the many new double-mode variables, we also discovered stars with closely spaced frequencies. These variables are most probably nonradial pulsators.
Pattern-based, multi-scale segmentation and regionalization of EOSD land cover
NASA Astrophysics Data System (ADS)
Niesterowicz, Jacek; Stepinski, Tomasz F.
2017-10-01
The Earth Observation for Sustainable Development of Forests (EOSD) map is a 25 m resolution thematic map of Canadian forests. Because of its large spatial extent and relatively high resolution, the EOSD is difficult to analyze using standard GIS methods. In this paper we propose multi-scale segmentation and regionalization of the EOSD as new methods for analyzing it on large spatial scales. Segments, which we refer to as forest land units (FLUs), are delineated as tracts of forest characterized by cohesive patterns of EOSD categories; we delineated from 727 to 91,885 FLUs within the spatial extent of the EOSD, depending on the selected scale of a pattern. The pattern of EOSD categories within each FLU is described by 1037 landscape metrics. A shapefile containing the boundaries of all FLUs, together with an attribute table listing the landscape metrics, makes up an SQL-searchable spatial database providing detailed information on the composition and pattern of land cover types in Canadian forests. The shapefile format and the extensive attribute table covering the entire legend of the EOSD are designed to facilitate a broad range of investigations in which assessment of the composition and pattern of forest over large areas is needed. We calculated four such databases using different spatial scales of pattern. We illustrate the use of the FLU database by producing forest regionalization maps of two Canadian provinces, Quebec and Ontario. Such maps capture the broad-scale variability of forest at the spatial scale of the entire province. We also demonstrate how the FLU database can be used to map the variability of landscape metrics, and thus the character of the landscape, over the whole of Canada.
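A small sketch of the kind of attribute-table query the FLU database is meant to support (illustrative column names and values; the real table has 1037 metrics and would normally be read from the shapefile with GIS tooling):

```python
import pandas as pd

# Hypothetical mini attribute table: one row per forest land unit,
# columns are landscape metrics.
flus = pd.DataFrame({
    "flu_id":       [1, 2, 3, 4],
    "province":     ["Quebec", "Quebec", "Ontario", "Ontario"],
    "pct_conifer":  [72.0, 15.5, 48.2, 81.3],
    "edge_density": [38.1, 92.4, 55.0, 21.7],   # m per ha, illustrative
})

# SQL-style selection: conifer-dominated, low-fragmentation units.
dense_conifer = flus.query("pct_conifer > 60 and edge_density < 40")
print(dense_conifer[["flu_id", "province"]])
```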
Adverse Events Associated with Prolonged Antibiotic Use
Meropol, Sharon B.; Chan, K. Arnold; Chen, Zhen; Finkelstein, Jonathan A.; Hennessy, Sean; Lautenbach, Ebbing; Platt, Richard; Schech, Stephanie D.; Shatin, Deborah; Metlay, Joshua P.
2014-01-01
Purpose The Infectious Diseases Society of America and the US CDC recommend 60 days of ciprofloxacin, doxycycline or amoxicillin for anthrax prophylaxis. It is not possible to determine severe adverse drug event (ADE) risks from the few people thus far exposed to anthrax prophylaxis. This study's objective was to estimate the risks of severe ADEs associated with long-term ciprofloxacin, doxycycline and amoxicillin exposure using 3 large databases: one electronic medical record (General Practice Research Database) and two claims databases (UnitedHealthcare, HMO Research Network). Methods We include office visit, hospital admission and prescription data for 1/1/1999–6/30/2001. The exposure variable was oral antibiotic person-days (pds). The primary outcome was hospitalization during exposure with ADE diagnoses: anaphylaxis, phototoxicity, hepatotoxicity, nephrotoxicity, seizures, ventricular arrhythmia or infectious colitis. Results We randomly sampled 999,773, 1,047,496 and 1,819,004 patients from Databases A, B and C, respectively. 33,183 amoxicillin, 15,250 ciprofloxacin and 50,171 doxycycline prescriptions continued ≥30 days. ADE hospitalizations during long-term exposure were not observed in Database A. ADEs during long-term amoxicillin were seen only in Database C, with 5 ADEs, or 1.2 (0.4–2.7) ADEs/100,000 pds of exposure. Long-term ciprofloxacin showed 3 and 4 ADEs, with 5.7 (1.2–16.6) and 3.5 (1.0–9.0) ADEs/100,000 pds in Databases B and C, respectively. Only Database B had ADEs during long-term doxycycline, with 3 ADEs, or 0.9 (0.2–2.6) ADEs/100,000 pds. For most events, the incidence rate ratio comparing >28 vs. 1–28 pds of exposure was <1, showing limited evidence for cumulative dose-related ADEs from long-term exposure. Conclusions Long-term amoxicillin, ciprofloxacin and doxycycline appear safe, supporting use of these medications if needed for large-scale post-exposure anthrax prophylaxis. PMID:18215001
Comparative effectiveness research in hand surgery.
Johnson, Shepard P; Chung, Kevin C
2014-08-01
Comparative effectiveness research (CER) is a concept initiated by the Institute of Medicine and financially supported by the federal government. The primary objective of CER is to improve decision making in medicine. This research is intended to evaluate the effectiveness, benefits, and harmful effects of alternative interventions. CER studies are commonly large, simple, observational, and conducted using electronic databases. To date, there is little comparative effectiveness evidence within hand surgery to guide therapeutic decisions. To draw conclusions on effectiveness through electronic health records, databases must contain clinical information and outcomes relevant to hand surgery interventions, such as patient-reported outcomes. Copyright © 2014 Elsevier Inc. All rights reserved.
The FRUITY database on AGB stars: past, present and future
NASA Astrophysics Data System (ADS)
Cristallo, S.; Piersanti, L.; Straniero, O.
2016-01-01
We present the features of the FRUITY database, an interactive web-based interface devoted to nucleosynthesis in AGB stars. We describe the currently available set of AGB models (largely expanded with respect to the original one), with masses in the range 1.3≤M/M⊙≤3.0 and metallicities -2.15≤[Fe/H]≤+0.15. We illustrate the details of our s-process surface distributions and compare our results to observations. Moreover, we introduce a new set of models in which the effects of rotation are taken into account. Finally, we briefly describe the next planned upgrades.
National Databases for Neurosurgical Outcomes Research: Options, Strengths, and Limitations.
Karhade, Aditya V; Larsen, Alexandra M G; Cote, David J; Dubois, Heloise M; Smith, Timothy R
2017-08-05
Quality improvement, value-based care delivery, and personalized patient care depend on robust clinical, financial, and demographic data streams of neurosurgical outcomes, yet the neurosurgical literature lacks a comprehensive review of large national databases. The objective of this review was to assess the strengths and limitations of various resources for outcomes research in neurosurgery. A review of the literature was conducted to identify surgical outcomes studies using national data sets. The databases were assessed for the availability of patient demographics and clinical variables, longitudinal follow-up of patients, strengths, and limitations. The number of unique patients contained within each data set ranged from thousands (Quality Outcomes Database [QOD]) to hundreds of millions (MarketScan). Databases with both clinical and financial data included PearlDiver, the Premier Healthcare Database, the Vizient Clinical Data Base and Resource Manager, and the National Inpatient Sample. Outcomes collected by databases included patient-reported outcomes (QOD); 30-day morbidity, readmissions, and reoperations (National Surgical Quality Improvement Program); and disease incidence and disease-specific survival (Surveillance, Epidemiology, and End Results-Medicare). The strengths of large databases included large numbers of rare pathologies and multi-institutional, nationally representative sampling; the limitations included variable data veracity, variable data completeness, and missing disease-specific variables. The improvement of existing large national databases and the establishment of new registries will be crucial to the future of neurosurgical outcomes research. Copyright © 2017 by the Congress of Neurological Surgeons
Using All-Sky Imaging to Improve Telescope Scheduling (Abstract)
NASA Astrophysics Data System (ADS)
Cole, G. M.
2017-12-01
(Abstract only) Automated scheduling makes it possible for a small telescope to observe a large number of targets in a single night. But when used in areas with less-than-perfect sky conditions, such automation can lead to large numbers of observations of clouds and haze. This paper describes the development of a "sky-aware" telescope automation system that integrates the data flow from an SBIG AllSky340c camera with an enhanced dispatch scheduler to make optimum use of the available observing conditions for two highly instrumented backyard telescopes. Using the minute-by-minute time-series image stream and a self-maintained reference database, the software maintains a file of sky brightness, transparency, stability, and forecasted visibility at several hundred grid positions. The scheduling software uses this information in real time to exclude targets obscured by clouds and select the best observing task, taking into account the requirements and limits of each instrument.
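A toy dispatch step in the spirit of such a sky-aware scheduler (grid cells, field names, and thresholds are invented):

```python
# (az_bin, alt_bin) -> transparency in [0, 1] from the all-sky camera grid.
sky_grid = {
    (0, 1): 0.95, (0, 2): 0.20, (1, 1): 0.85, (1, 2): 0.90,
}

targets = [
    {"name": "V0553 Cam", "cell": (0, 2), "priority": 3, "airmass": 1.2},
    {"name": "SS Cyg",    "cell": (1, 2), "priority": 2, "airmass": 1.5},
    {"name": "RZ Cas",    "cell": (1, 1), "priority": 1, "airmass": 1.1},
]

def best_target(targets, sky_grid, min_transparency=0.5):
    """Veto targets whose sky cell is cloudy, then dispatch by priority,
    breaking ties in favor of lower airmass."""
    usable = [t for t in targets
              if sky_grid.get(t["cell"], 0.0) >= min_transparency]
    return max(usable, key=lambda t: (t["priority"], -t["airmass"]), default=None)

choice = best_target(targets, sky_grid)
print(choice["name"] if choice else "stand down")  # SS Cyg; V0553 Cam is clouded out
```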
NASA Astrophysics Data System (ADS)
Gasser, Deta; Viola, Giulio; Bingen, Bernard
2016-04-01
Since 2010, the Geological Survey of Norway has been implementing and continuously developing a digital workflow for geological bedrock mapping in Norway, from fieldwork to final product. Our workflow is based on the ESRI ArcGIS platform, and we use rugged Windows computers in the field. Three different hardware solutions have been tested over the past five years (2010-2015): (1) the Panasonic Toughbook CF-19 (2.3 kg), (2) the Panasonic Toughbook CF-H2 Field (1.6 kg), and (3) the Motion F5t tablet (1.5 kg). For the collection of point observations in the field we mainly use the SIGMA Mobile application for ESRI ArcGIS developed by the British Geological Survey (BGS), which allows mappers to store georeferenced comments, structural measurements, sample information, photographs, sketches, log information, etc. in a Microsoft Access database. The application is freely downloadable from the BGS website. For line and polygon work we use our in-house database, which is currently under revision. Our line database consists of three feature classes: (1) bedrock boundaries, (2) bedrock lineaments, and (3) bedrock lines, with each feature class having up to 24 different attribute fields. Our polygon database consists of one feature class with 38 attribute fields, enabling storage of various information concerning lithology, stratigraphic order, age, metamorphic grade and tectonic subdivision. The polygon and line databases are coupled via topology in ESRI ArcGIS, which allows us to edit them simultaneously. This approach has been applied in two large-scale 1:50 000 bedrock mapping projects, one in the Kongsberg domain of the Sveconorwegian orogen, and the other in the greater Trondheim area (Orkanger) in the Caledonian belt. The mapping projects combined the collection of high-resolution geophysical data, digital acquisition of field data, and collection of geochronological, geochemical and petrological data. During the Kongsberg project, some 25,000 field observation points were collected by eight geologists; for the Orkanger project, some 2,100 field observation points were collected by three geologists. Several advantages of the digital approach became clear during these projects: (1) the systematic collection of geological field data in a common format allows easy access and exchange of data among different geologists; (2) background information such as geophysics and DEMs is more easily accessible in the field; (3) the workflow from field data collection to final map product is faster. Obvious disadvantages include: (1) heavy(ish) and expensive hardware; (2) battery life and other technical issues in the field; (3) the need for central in-house storage of field observation points (large amounts of data!); and (4) the need for acceptance of, and training in, a common workflow by all involved geologists.
Cross-checking of Large Evaluated and Experimental Nuclear Reaction Databases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zeydina, O.; Koning, A.J.; Soppera, N.
2014-06-15
Automated methods are presented for the verification of large experimental and evaluated nuclear reaction databases (e.g. EXFOR, JEFF, TENDL). These methods allow an assessment of the overall consistency of the data and detect aberrant values in both evaluated and experimental databases.
Mackey, Aaron J; Pearson, William R
2004-10-01
Relational databases are designed to integrate diverse types of information and to manage large sets of search results, greatly simplifying genome-scale analyses. They are essential for the management and analysis of large-scale sequence analyses, and can also be used to improve the statistical significance of similarity searches by focusing on subsets of sequence libraries most likely to contain homologs. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It covers the installation and use of a simple protein sequence database, seqdb_demo, which is used as the basis for the other protocols, including basic use of the database to generate a novel sequence library subset, extending and using seqdb_demo for the storage of sequence similarity search results, and making use of various kinds of stored search results to address aspects of comparative genomic analysis.
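A toy version of the library-subset idea (the table layout is invented, not the seqdb_demo schema): store annotated sequences relationally, then emit a FASTA subset for the search program.

```python
import sqlite3
import textwrap

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE proteins (acc TEXT, taxon TEXT, seq TEXT)")
con.executemany("INSERT INTO proteins VALUES (?, ?, ?)", [
    ("P00001", "E. coli",    "MKTAYIAKQR"),
    ("P00002", "H. sapiens", "MEEPQSDPSV"),
    ("P00003", "E. coli",    "MSKGEELFTG"),
])

# Subset library: E. coli entries only, written as FASTA for the search tool.
for acc, seq in con.execute(
        "SELECT acc, seq FROM proteins WHERE taxon = 'E. coli'"):
    print(f">{acc}\n" + textwrap.fill(seq, 60))
```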
Surgical research using national databases.
Alluri, Ram K; Leland, Hyuma; Heckmann, Nathanael
2016-10-01
Recent changes in healthcare and advances in technology have increased the use of large-volume national databases in surgical research. These databases have been used to develop perioperative risk stratification tools, assess postoperative complications, calculate costs, and investigate numerous other topics across multiple surgical specialties. The results of these studies contain variable information but are subject to unique limitations. The use of large-volume national databases is increasing in popularity, and thorough understanding of these databases will allow for a more sophisticated and better educated interpretation of studies that utilize such databases. This review will highlight the composition, strengths, and weaknesses of commonly used national databases in surgical research.
Strabo: An App and Database for Structural Geology and Tectonics Data
NASA Astrophysics Data System (ADS)
Newman, J.; Williams, R. T.; Tikoff, B.; Walker, J. D.; Good, J.; Michels, Z. D.; Ash, J.
2016-12-01
Strabo is a data system designed to facilitate digital storage and sharing of structural geology and tectonics data. The data system allows researchers to store and share field and laboratory data as well as construct new multi-disciplinary data sets. Strabo is built on graph database technology, as opposed to a relational database, which provides the flexibility to define relationships between objects of any type. This framework allows observations to be linked in complex and hierarchical ways that are not possible in traditional database topologies; the advantage of the Strabo data structure is the ability of graph databases to link objects in both numerous and complex ways, in a manner that more accurately reflects the realities of collecting and organizing geological data sets. The data system is accessible via a mobile interface (iOS and Android devices) that allows these data to be stored, visualized, and shared during primary collection in the field or the laboratory. The Strabo data system is built around the concept of a "Spot," which we define as any observation that characterizes a specific area. This can be anything from a strike-and-dip measurement of bedding to cross-cutting relationships between faults in complexly dissected terrains. Each Spot can contain other Spots and/or measurements (e.g., lithology, slickenlines, displacement magnitude); hence, the Spot concept is applicable to all relationships and observation sets. Strabo is therefore capable of quantifying and digitally storing large spatial variations and complex geometries of naturally deformed rocks within hierarchically related maps and images. These approaches provide an observational fidelity comparable to a traditional field book, but with the added benefits of digital data storage, processing, and ease of sharing, allowing Strabo to integrate seamlessly into the workflow of most geologists. Future efforts will focus on extending Strabo to other sub-disciplines, as well as on developing a desktop system for enhanced collection and organization of microstructural data.
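A minimal sketch of the Spot idea using a general-purpose graph library (the actual Strabo backend and schema differ; all observation values are invented):

```python
import networkx as nx

# Every observation is a node; edges carry arbitrary, typed relationships
# such as containment or cross-cutting.
g = nx.MultiDiGraph()
g.add_node("outcrop_7", kind="spot", extent_m=25)
g.add_node("bedding_12", kind="spot", strike=220, dip=35)
g.add_node("fault_3", kind="spot", slickenline_trend=140)
g.add_edge("outcrop_7", "bedding_12", rel="contains")
g.add_edge("outcrop_7", "fault_3", rel="contains")
g.add_edge("fault_3", "bedding_12", rel="cross_cuts")

# Traverse the hierarchy: everything observed within outcrop_7.
for _, child, data in g.out_edges("outcrop_7", data=True):
    if data["rel"] == "contains":
        print(child, g.nodes[child])
```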
NASA Astrophysics Data System (ADS)
Do, Hong; Gudmundsson, Lukas; Leonard, Michael; Westra, Seth; Seneviratne, Sonia
2017-04-01
In-situ observations of daily streamflow with global coverage are a crucial asset for understanding large-scale freshwater resources, which are an essential component of the Earth system and a prerequisite for societal development. Here we present the Global Streamflow Indices and Metadata archive (G-SIM), a collection of indices derived from more than 20,000 daily streamflow time series across the globe. These indices are designed to support global assessments of change in wet and dry extremes, and have been compiled from 12 free-to-access online databases (seven national databases and five international collections). The G-SIM archive also includes significant metadata to support detailed understanding of streamflow dynamics, including drainage-area shapefiles and many essential catchment properties such as land cover type and soil and topographic characteristics. The automated procedures for data handling and quality control make G-SIM a reproducible, extendible archive that can be utilised for many purposes in large-scale hydrology. Potential applications include the identification of observational trends in hydrological extremes, the assessment of climate change impacts on streamflow regimes, and the validation of global hydrological models.
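The kind of index extraction G-SIM performs can be sketched on a synthetic daily discharge series (index names and definitions here are simplified; G-SIM's own definitions and quality control are documented with the archive):

```python
import numpy as np
import pandas as pd

days = pd.date_range("1980-01-01", "1989-12-31", freq="D")
rng = np.random.default_rng(7)
q = pd.Series(rng.gamma(2.0, 5.0, len(days)), index=days, name="discharge")

annual = q.groupby(q.index.year)
indices = pd.DataFrame({
    "mean_flow": annual.mean(),
    "max_flow": annual.max(),               # wet-extreme indicator
    "q95_low_flow": annual.quantile(0.05),  # flow exceeded 95% of the time
    "n_days": annual.count(),               # completeness check for QC
})
print(indices.head())
```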
Some Reliability Issues in Very Large Databases.
ERIC Educational Resources Information Center
Lynch, Clifford A.
1988-01-01
Describes the unique reliability problems of very large databases that necessitate specialized techniques for hardware problem management. The discussion covers the use of controlled partial redundancy to improve reliability, issues in operating systems and database management systems design, and the impact of disk technology on very large…
Use of large healthcare databases for rheumatology clinical research.
Desai, Rishi J; Solomon, Daniel H
2017-03-01
Large healthcare databases, which contain data collected during routinely delivered healthcare, can serve as a valuable resource for generating actionable evidence to assist medical and healthcare policy decision-making. In this review, we summarize the use of large healthcare databases in rheumatology clinical research. Large healthcare data are critical for evaluating medication safety and effectiveness in patients with rheumatologic conditions. The three major sources of large healthcare data are electronic medical records, health insurance claims, and patient registries. Each of these sources offers unique advantages, but also has some inherent limitations. To address some of these limitations and maximize the utility of these data sources for evidence generation, recent efforts have focused on linking different data sources. Innovations such as randomized registry trials, which aim to facilitate the design of low-cost randomized controlled trials built on the existing infrastructure provided by large healthcare databases, are likely to make clinical research more efficient in coming years. Harnessing the power of the information contained in large healthcare databases, while paying close attention to their inherent limitations, is critical to generating a rigorous evidence base for medical decision-making and ultimately enhancing patient care.
Fast Multivariate Search on Large Aviation Datasets
NASA Technical Reports Server (NTRS)
Bhaduri, Kanishka; Zhu, Qiang; Oza, Nikunj C.; Srivastava, Ashok N.
2010-01-01
Multivariate Time-Series (MTS) are ubiquitous, and are generated in areas as disparate as sensor recordings in aerospace systems, music and video streams, medical monitoring, and financial systems. Domain experts are often interested in searching for interesting multivariate patterns in these MTS databases, which can contain up to several gigabytes of data. Surprisingly, research on MTS search is very limited. Most existing work supports only queries with the same data length, or queries on a fixed set of variables. In this paper, we propose an efficient and flexible subsequence search framework for massive MTS databases that, for the first time, enables querying on any subset of variables with arbitrary time delays between them. We propose two provably correct algorithms to solve this problem: (1) an R-tree Based Search (RBS), which uses Minimum Bounding Rectangles (MBRs) to organize the subsequences, and (2) a List Based Search (LBS) algorithm, which uses sorted lists for indexing. We demonstrate the performance of these algorithms using two large MTS databases from the aviation domain, each containing several million observations. Both tests show that our algorithms have very high prune rates (>95%) and thus need to examine only a small fraction of the data.
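The pruning idea behind the R-tree based search can be sketched as follows (a simplified single-point query; the published RBS indexes subsequences and supports variable subsets and time lags):

```python
import numpy as np

def mbr(window):
    """window: (time x variables) array -> per-variable (mins, maxs)."""
    return window.min(axis=0), window.max(axis=0)

def lower_bound_dist(query, bounds):
    """Distance from query to the nearest point inside the MBR; never
    overestimates the true distance, so pruning on it is safe."""
    mins, maxs = bounds
    nearest = np.clip(query, mins, maxs)
    return np.linalg.norm(query - nearest)

rng = np.random.default_rng(3)
series = rng.normal(size=(10_000, 4))          # long 4-variable MTS
windows = [series[i:i + 50] for i in range(0, 9_950, 50)]
index = [mbr(w) for w in windows]              # one MBR per window

query = rng.normal(size=4)                     # a single multivariate point
threshold = 1.0
survivors = [i for i, b in enumerate(index)
             if lower_bound_dist(query, b) <= threshold]
print(f"pruned {1 - len(survivors) / len(windows):.0%} of windows")
```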
NASA Astrophysics Data System (ADS)
Shchepashchenko, D.; Chave, J.; Phillips, O. L.; Davies, S. J.; Lewis, S. L.; Perger, C.; Dresel, C.; Fritz, S.; Scipal, K.
2017-12-01
Forest monitoring is high on the scientific and political agenda. Global measurements of forest height and biomass, and of how they change with time, are urgently needed as essential climate and ecosystem variables. The Forest Observation System (FOS, http://forest-observation-system.net/) is an international cooperation to establish a global in-situ forest biomass database to support earth observation and to encourage investment in relevant field-based observations and science. FOS aims to link the remote sensing (RS) community with the ecologists who measure forest biomass and estimate biodiversity in the field, for common benefit. The benefit of FOS for the RS community is the partnering of the most established teams and networks that manage permanent forest plots globally, overcoming data-sharing issues and introducing a standard biomass data flow from tree-level measurement to plot-level aggregation, served in the form most suitable for the RS community. Ecologists benefit from FOS through improved access to global biomass information, data standards, gap identification, and potentially improved funding opportunities to address the known gaps and deficiencies in the data. FOS collaborates closely with the Center for Tropical Forest Science (CTFS-ForestGEO), ForestPlots.net (incl. RAINFOR, AfriTRON and T-FORCES), AusCover, the Tropical managed Forests Observatory, and the IIASA network. FOS is an open initiative, and other networks and teams are most welcome to join. The online database provides open access to both metadata (e.g., who conducted the measurements, where, and which parameters) and actual data for a subset of plots where the authors have granted access. A minimum set of database values includes: principal investigator and institution, plot coordinates, number of trees, forest type and tree species composition, wood density, canopy height, and above-ground biomass of trees. Plot size is 0.25 ha or larger. The database will be essential for validating and calibrating satellite observations and various models.
An algorithm of discovering signatures from DNA databases on a computer cluster.
Lee, Hsiao Ping; Sheu, Tzu-Fang
2014-10-05
Signatures are short sequences that are unique and not similar to any other sequence in a database, and they can be used as the basis for identifying different species. Even though several signature discovery algorithms have been proposed in the past, these algorithms require the entire database to be loaded into memory, restricting the amount of data they can process and making them unable to handle large databases. Those algorithms also use sequential models and have slower discovery speeds, meaning that their efficiency can be improved. In this research, we introduce a divide-and-conquer strategy to signature discovery and propose a parallel signature discovery algorithm for a computer cluster. The algorithm applies the divide-and-conquer strategy to overcome the existing algorithms' inability to process large databases, and uses a parallel computing mechanism to effectively improve the efficiency of signature discovery. Even when run with just the memory of regular personal computers, the algorithm can still process large databases, such as the human whole-genome EST database, that the existing algorithms could not. The algorithm proposed in this research is not limited by the amount of usable memory and can rapidly find signatures in large databases, making it useful in applications such as Next Generation Sequencing and other large-scale database analysis and processing. The implementation of the proposed algorithm is available at http://www.cs.pu.edu.tw/~fang/DDCSDPrograms/DDCSD.htm.
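A toy rendition of the divide-and-conquer pass (exact-match k-mers only; the published algorithm's signature definition, which includes a mismatch tolerance, is not reproduced):

```python
from collections import Counter
from multiprocessing import Pool

K = 8  # signature length for this toy example

def count_kmers(sequences):
    """Count every K-mer occurring in a slice of the database."""
    c = Counter()
    for seq in sequences:
        for i in range(len(seq) - K + 1):
            c[seq[i:i + K]] += 1
    return c

def find_unique_kmers(database, workers=4):
    """Divide the database among workers, count K-mers in parallel,
    then merge counts; K-mers seen exactly once are candidate signatures."""
    chunks = [database[i::workers] for i in range(workers)]
    with Pool(workers) as pool:
        partial_counts = pool.map(count_kmers, chunks)
    total = Counter()
    for c in partial_counts:
        total.update(c)
    return {kmer for kmer, n in total.items() if n == 1}

if __name__ == "__main__":
    db = ["ACGTACGTGGCTA", "TTGACGTACGTAA", "CCCCGGGGAAAATT"]
    # "ACGTACGT" occurs in two sequences, so it is excluded.
    print(sorted(find_unique_kmers(db))[:5])
```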
Fermi-LAT Gamma-Ray Bursts and Insights from Swift
NASA Technical Reports Server (NTRS)
Racusin, Judith L.
2010-01-01
A new revolution in Gamma-ray Burst (GRB) observations and theory has begun over the last two years since the launch of the Fermi Gamma-ray Space Telescope. The new window into high-energy gamma-rays opened by the Fermi Large Area Telescope (LAT) is providing insight into prompt emission mechanisms and possibly also afterglow physics. The LAT-detected GRBs appear to be a new, unique subset of extremely energetic and bright bursts compared to the large sample detected by Swift over the last 6 years. In this talk, I will discuss the context and recent discoveries from these LAT GRBs and the large database of broadband observations collected by the Swift X-ray Telescope (XRT) and UV/Optical Telescope (UVOT). Through comparisons between the GRBs detected by Swift-BAT, GBM, and LAT, we can learn about the unique characteristics, physical differences, and the relationships between each population. These population characteristics provide insight into the different physical parameters that contribute to the diversity of observational GRB properties.
Stang, Paul E; Ryan, Patrick B; Overhage, J Marc; Schuemie, Martijn J; Hartzema, Abraham G; Welebob, Emily
2013-10-01
Researchers using observational data to understand drug effects must make a number of analytic design choices that suit the characteristics of the data and the subject of the study. Review of the published literature suggests that there is a lack of consistency even when addressing the same research question in the same database. Our objective was to characterize the degree of similarity or difference in the method and analysis choices made by observational database research experts when presented with research study scenarios. We used an online survey presenting research scenarios on drug-effect studies to capture method selection and analysis choices, with dependency branching based on responses to key questions. Participants were volunteers experienced in epidemiological study design, solicited through registration on the Observational Medical Outcomes Partnership website, membership in particular professional organizations, or links in relevant newsletters. The outcome measure was the description (proportion) of respondents selecting particular methods and making specific analysis choices for individual drug-outcome scenario pairs. The number of questions/decisions differed based on stem questions of study design, time-at-risk, outcome definition, and comparator. There was little consistency across scenarios, by drug or by outcome of interest, in the decisions made for design and analyses in scenarios using large healthcare databases. The most consistent choice was the cohort study design, but variability in the other critical decisions was common. There is great variation among epidemiologists in the design and analytical choices that they make when implementing analyses in observational healthcare databases. These findings confirm that it will be important to generate empirical evidence to inform these decisions and to promote a better understanding of the impact of standardization on research implementation.
NASA Astrophysics Data System (ADS)
Hansen, Akio; Ament, Felix; Lammert, Andrea
2017-04-01
Large-eddy simulations (LES) have been performed for several decades, but due to computational limits most studies were restricted to small domains or idealized initial and boundary conditions. Within the High Definition Clouds and Precipitation for advancing Climate Prediction (HD(CP)2) project, realistic weather-forecast-like LES runs were performed with the newly developed ICON LES model for several days. The domain covers central Europe with a horizontal resolution down to 156 m; the setup consists of more than 3 billion grid cells, so a single 3D dump requires roughly 500 GB. A newly developed online evaluation toolbox was created to check instantaneously whether the model simulations are realistic. The toolbox automatically combines model results with observations and generates quicklooks for various variables; so far temperature and humidity profiles, cloud cover, integrated water vapour, precipitation, and many more are included. All kinds of observations, such as aircraft observations, soundings, and precipitation radar networks, are used. For each dataset a specific module is created, which allows easy handling and extension of the toolbox. Most of the observations are downloaded automatically from the Standardized Atmospheric Measurement Database (SAMD). The evaluation tool supports scientists in monitoring computationally costly model simulations and gives a first overview of model performance. The structure of the toolbox as well as the SAMD database are presented. Furthermore, the toolbox was applied to an ICON LES sensitivity study, for which example results are shown.
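One way to picture the toolbox's modular design is the plugin pattern sketched below: one module class per observation dataset, a registry decorator, and a loop that renders model-versus-observation quicklooks. All class and function names here are our own illustrative assumptions; the actual HD(CP)2 toolbox interface is not described in this abstract.

```python
import matplotlib
matplotlib.use("Agg")            # headless: quicklooks are written to disk
import matplotlib.pyplot as plt

class ObservationModule:
    """One module per observation dataset (sounding, radar network, ...)."""
    name = "base"
    def load(self, start, end):
        """Fetch observations (e.g. from a SAMD-like archive) for a window."""
        raise NotImplementedError
    def model_equivalent(self, model_output):
        """Extract the matching variable from the model output."""
        raise NotImplementedError

MODULES = {}

def register(cls):
    """Decorator: new datasets plug in without touching the core loop."""
    MODULES[cls.name] = cls()
    return cls

def make_quicklooks(model_output, start, end, outdir="."):
    for name, mod in MODULES.items():
        times, obs = mod.load(start, end)
        mtimes, mvals = mod.model_equivalent(model_output)
        fig, ax = plt.subplots()
        ax.plot(mtimes, mvals, label="model")
        ax.plot(times, obs, ".", label="observation")
        ax.set_title(name)
        ax.legend()
        fig.savefig(f"{outdir}/{name}_quicklook.png")
        plt.close(fig)
```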
ON THE ENHANCED CORONAL MASS EJECTION DETECTION RATE SINCE THE SOLAR CYCLE 23 POLAR FIELD REVERSAL
DOE Office of Scientific and Technical Information (OSTI.GOV)
Petrie, G. J. D.
2015-10-10
Compared to cycle 23, coronal mass ejections (CMEs) with angular widths >30° have been observed to occur at a higher rate during solar cycle 24, per sunspot number. This result is supported by data from three independent databases constructed using Large Angle and Spectrometric Coronagraph Experiment coronagraph images, two employing automated detection techniques and one compiled manually by human observers. According to the two databases that cover a larger field of view, the enhanced CME rate actually began shortly after the cycle 23 polar field reversal, in 2004, when the polar fields returned with a 40% reduction in strength and the interplanetary radial magnetic field became ≈30% weaker. This result is consistent with the link between anomalous CME expansion and the heliospheric total pressure decrease recently reported by Gopalswamy et al.
Does filler database size influence identification accuracy?
Bergold, Amanda N; Heaton, Paul
2018-06-01
Police departments increasingly use large photo databases to select lineup fillers using facial recognition software, but this technological shift's implications have been largely unexplored in eyewitness research. Database use, particularly if coupled with facial matching software, could enable lineup constructors to increase filler-suspect similarity and thus enhance eyewitness accuracy (Fitzgerald, Oriet, Price, & Charman, 2013). However, with a large pool of potential fillers, such technologies might theoretically produce lineup fillers too similar to the suspect (Fitzgerald, Oriet, & Price, 2015; Luus & Wells, 1991; Wells, Rydell, & Seelau, 1993). This research proposes a new factor, filler database size, as a lineup feature affecting eyewitness accuracy. In a facial recognition experiment, we select lineup fillers in a legally realistic manner using facial matching software applied to filler databases of 5,000, 25,000, and 125,000 photos, and find that larger databases are associated with a higher objective similarity rating between suspects and fillers and lower overall identification accuracy. In target-present lineups, witnesses viewing lineups created from the larger databases were less likely to make correct identifications and more likely to select known-innocent fillers. When the target was absent, database size was associated with a lower rate of correct rejections and a higher rate of filler identifications. Higher algorithmic similarity ratings were also associated with decreases in eyewitness identification accuracy. The results suggest that using facial matching software to select fillers from large photograph databases may reduce identification accuracy, and provide support for filler database size as a meaningful system variable. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Su, Xiaoquan; Xu, Jian; Ning, Kang
2012-10-01
Scientists have long sought to compare different microbial communities (also referred to as 'metagenomic samples' here) effectively and at a large scale: given a set of unknown samples, find similar metagenomic samples in a large repository and examine how similar they are. With the metagenomic samples accumulated to date, it is possible to build a database of metagenomic samples of interest; any metagenomic sample could then be searched against this database to find the most similar sample(s). However, on one hand, current databases with large numbers of metagenomic samples mostly serve as data repositories that offer few functionalities for analysis; on the other hand, methods that measure the similarity of metagenomic data work well only for small sets of samples by pairwise comparison. It is not yet clear how to efficiently search metagenomic samples against a large metagenomic database. In this study, we propose a novel method, Meta-Storms, that can systematically and efficiently organize and search metagenomic data. It includes the following components: (i) creation of a database of metagenomic samples based on their taxonomic annotations, (ii) efficient indexing of samples in the database based on a hierarchical taxonomy indexing strategy, (iii) searching for a metagenomic sample against the database with a fast scoring function based on quantitative phylogeny and (iv) database management via index export, index import, data insertion, data deletion and database merging. We have collected more than 1300 metagenomic datasets from the public domain and in-house facilities, and tested the Meta-Storms method on them. Our experimental results show that Meta-Storms is capable of database creation and effective searching for a large number of metagenomic samples, and it achieves accuracies similar to the currently popular significance-testing-based methods. Meta-Storms would serve as a suitable database management and search system to quickly identify similar metagenomic samples from a large pool of samples. ningkang@qibebt.ac.cn Supplementary data are available at Bioinformatics online.
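To make the indexing-plus-search idea concrete, here is a minimal sketch under strong simplifications: each sample is a taxon-abundance dictionary, the hierarchical taxonomy index is reduced to bucketing by the most abundant top-rank taxon, and the quantitative-phylogeny scoring function is replaced by plain cosine similarity. None of this reproduces the actual Meta-Storms implementation.

```python
from collections import defaultdict
import math

def cosine(a, b):
    """Similarity of two taxon->abundance dictionaries."""
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in set(a) | set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def dominant_taxon(profile, rank=0):
    """Index key: top-rank part of the most abundant lineage string,
    e.g. 'Bacteroidetes' from 'Bacteroidetes;Bacteroidia'."""
    return max(profile, key=profile.get).split(";")[rank]

class SampleIndex:
    def __init__(self):
        self.buckets = defaultdict(dict)   # taxon -> {sample_id: profile}

    def insert(self, sample_id, profile):
        self.buckets[dominant_taxon(profile)][sample_id] = profile

    def delete(self, sample_id, profile):
        self.buckets[dominant_taxon(profile)].pop(sample_id, None)

    def search(self, query, top_n=5):
        """Score only the bucket sharing the query's dominant taxon."""
        bucket = self.buckets[dominant_taxon(query)]
        scored = [(sid, cosine(query, p)) for sid, p in bucket.items()]
        return sorted(scored, key=lambda x: -x[1])[:top_n]
```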
NASA Astrophysics Data System (ADS)
Cinzia Marra, Anna; Casella, Daniele; Martins Costa do Amaral, Lia; Sanò, Paolo; Dietrich, Stefano; Panegrossi, Giulia
2017-04-01
Two new precipitation retrieval algorithms, for the Advanced Microwave Scanning Radiometer 2 (AMSR2) and for the GPM Microwave Imager (GMI), are presented. The algorithms are based on the Cloud Dynamics and Radiation Database (CDRD) Bayesian approach and represent an evolution of the previous version applied to Special Sensor Microwave Imager/Sounder (SSMIS) observations and used operationally within the EUMETSAT Satellite Application Facility on Support to Operational Hydrology and Water Management (H-SAF). The main innovation of these new products is the use of an extended, entirely empirical database derived from coincident radar and radiometer observations from the NASA/JAXA Global Precipitation Measurement Core Observatory (GPM-CO) (Dual-frequency Precipitation Radar-DPR and GMI). The other new aspects are: 1) a new rain-no-rain screening approach; 2) the use of Empirical Orthogonal Functions (EOF) and Canonical Correlation Analysis (CCA) both in the screening approach and in the Bayesian algorithm; 3) the use of new meteorological and environmental ancillary variables to categorize the database and mitigate the problem of non-uniqueness of the retrieval solution; and 4) the development and implementation of specific modules for minimizing computation time. The CDRD algorithms for AMSR2 and GMI are able to handle an extremely large observational database available from GPM-CO and provide the rainfall estimate with minimum latency, making them suitable for near-real-time hydrological and operational applications. For CDRD for AMSR2, a verification study over Italy using ground-based radar data, and over the MSG full-disk area using coincident GPM-CO/AMSR2 observations, has been carried out. Results show remarkable AMSR2 capabilities for rainfall rate (RR) retrieval over ocean (for RR > 0.25 mm/h) and good capabilities over vegetated land (for RR > 1 mm/h), while for coastal areas the results are less certain. Comparisons with NASA GPM products, and with ground-based radar data, show that CDRD for AMSR2 is able to depict very well the areas of high precipitation over all surface types. Similarly, preliminary results of the application of CDRD for GMI are shown and discussed, highlighting the advantage of the availability of high-frequency channels (> 90 GHz) for precipitation retrieval over land and coastal areas.
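The core of any CDRD-style Bayesian retrieval is a likelihood-weighted average over database entries. The sketch below shows only that step, assuming a diagonal Gaussian observation error; the real algorithms add EOF/CCA compression, rain screening, and ancillary-variable stratification of the database, none of which is reproduced here.

```python
import numpy as np

def bayesian_rain_rate(obs_tb, db_tb, db_rr, sigma=2.0):
    """Minimal Bayesian database retrieval.
    obs_tb: (n_chan,) observed brightness temperatures, K
    db_tb:  (n_entries, n_chan) database brightness temperatures, K
    db_rr:  (n_entries,) rain rates attached to the database entries, mm/h
    sigma:  assumed per-channel observation error, K (illustrative)"""
    resid = db_tb - obs_tb                       # channel residuals
    logw = -0.5 * np.sum((resid / sigma) ** 2, axis=1)
    w = np.exp(logw - logw.max())                # numerically stable weights
    w /= w.sum()
    return float(np.sum(w * db_rr))              # posterior-mean rain rate
```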
Using SQL Databases for Sequence Similarity Searching and Analysis.
Pearson, William R; Mackey, Aaron J
2017-09-13
Relational databases can integrate diverse types of information and manage large sets of similarity search results, greatly simplifying genome-scale analyses. By focusing on taxonomic subsets of sequences, relational databases can reduce the size and redundancy of sequence libraries and improve the statistical significance of homologs. In addition, by loading similarity search results into a relational database, it becomes possible to explore and summarize the relationships between all of the proteins in an organism and those in other biological kingdoms. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It also describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. The unit also introduces search_demo, a database that stores sequence similarity search results. The search_demo database is then used to explore the evolutionary relationships between E. coli proteins and proteins in other organisms in a large-scale comparative genomic analysis. © 2017 by John Wiley & Sons, Inc.
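The pattern of loading parsed search output into tables and then asking genome-scale questions in SQL can be illustrated with Python's built-in sqlite3. The two-table schema and the example rows below are our own simplification and do not reproduce the actual seqdb_demo/search_demo schemas.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE protein (acc TEXT PRIMARY KEY, organism TEXT, length INTEGER);
CREATE TABLE hit (
    query_acc   TEXT REFERENCES protein(acc),
    subject_acc TEXT REFERENCES protein(acc),
    evalue      REAL,
    identity    REAL
);
""")

# Load parsed similarity-search output (query, subject, E-value, %identity).
con.executemany("INSERT INTO protein VALUES (?, ?, ?)",
                [("P0A7G6", "Escherichia coli", 206),
                 ("Q9X0E6", "Thermotoga maritima", 210)])
con.executemany("INSERT INTO hit VALUES (?, ?, ?, ?)",
                [("P0A7G6", "Q9X0E6", 1e-42, 47.5)])   # illustrative row

# How many distinct queries have a significant homolog in each organism?
for organism, n in con.execute("""
        SELECT p.organism, COUNT(DISTINCT h.query_acc)
        FROM hit h JOIN protein p ON p.acc = h.subject_acc
        WHERE h.evalue < 1e-6
        GROUP BY p.organism ORDER BY 2 DESC"""):
    print(organism, n)
```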
Splendore, Alessandra; Fanganiello, Roberto D; Masotti, Cibele; Morganti, Lucas S C; Passos-Bueno, M Rita
2005-05-01
Recently, a novel exon was described in TCOF1 that, although alternatively spliced, is included in the major protein isoform. In addition, most published mutations in this gene do not conform to current mutation nomenclature guidelines. Given these observations, we developed an online database of TCOF1 mutations in which all the reported mutations are renamed according to standard recommendations and in reference to the genomic and novel cDNA reference sequences (www.genoma.ib.usp.br/TCOF1_database). We also report in this work: 1) results of the first screening for large deletions in TCOF1 by Southern blot in patients without mutation detected by direct sequencing; 2) the identification of the first pathogenic mutation in the newly described exon 6A; and 3) statistical analysis of pathogenic mutations and polymorphism distribution throughout the gene.
NASA Astrophysics Data System (ADS)
Cohen, Z.; Breneman, A. W.; Cattell, C. A.; Davis, L.; Grul, P.; Kersten, K.; Wilson, L. B., III
2017-12-01
Determining the role of plasma waves in providing energy dissipation at shock waves is of long-standing interest. Interplanetary (IP) shocks serve as a large database of low Mach number shocks. We examine electric field waveforms captured by the Time Domain Sampler (TDS) on the STEREO spacecraft during the ramps of IP shocks, with emphasis on captures lasting 2.1 seconds. Previous work has used captures of shorter duration (66 and 131 ms on STEREO, and 17 ms on WIND), which allowed for observation of waves with maximum (minimum) frequencies of 125 kHz (15 Hz), 62.5 kHz (8 Hz), and 60 kHz (59 Hz), respectively. The maximum frequencies are comparable to 2-8 times the plasma frequency in the solar wind, enabling observation of Langmuir waves, ion acoustic waves, and some whistler-mode waves. The 2-second captures resolve lower frequencies (a few Hz), which allows us to analyze the packet structure of the whistler-mode waves and some ion acoustic waves. The longer capture time also improves the resolvability of simultaneous wave modes and of waves with frequencies on the order of tens of Hz. Langmuir waves, however, cannot be identified at this sampling rate, since the plasma frequency is usually higher than 3.9 kHz. IP shocks are identified from multiple databases (the Helsinki heliospheric shock database at http://ipshocks.fi, and the STEREO level 3 shock database at ftp://stereoftp.nascom.nasa.gov/pub/ins_data/impact/level3/). Our analysis focuses on TDS captures in shock ramp regions, with ramp durations determined from magnetic field data taken at 8 Hz. Software is used to identify multiple wave modes in any given capture and classify waves as Langmuir, ion acoustic, whistler, lower hybrid, electron cyclotron drift instability, or electrostatic solitary waves. Relevant frequencies are determined from density and magnetic field data collected in situ. Preliminary results suggest that large-amplitude (∼5 mV/m) ion acoustic waves are most prevalent in the ramp, in agreement with Wilson et al. Other modes are also observed. Statistical results will be presented and compared with previous studies and theoretical predictions.
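A greatly simplified version of such mode classification can be sketched by comparing a capture's dominant spectral peak with the characteristic frequencies computed from in-situ magnetic field and density. The thresholds below are illustrative assumptions only; real classification also uses polarization, Doppler shifts, and waveform shape, and does not reduce to a single peak frequency.

```python
import numpy as np

E_CHARGE, M_E, M_P = 1.602e-19, 9.109e-31, 1.673e-27

def char_freqs(B_nT, n_cm3):
    """Characteristic frequencies (Hz) from in-situ B (nT) and density (cm^-3)."""
    B = B_nT * 1e-9
    fce = E_CHARGE * B / (2 * np.pi * M_E)   # electron cyclotron
    fci = fce * M_E / M_P                    # proton cyclotron
    flh = np.sqrt(fce * fci)                 # lower hybrid (high-density limit)
    fpe = 8980.0 * np.sqrt(n_cm3)            # electron plasma frequency
    return dict(fce=fce, fci=fci, flh=flh, fpe=fpe)

def classify(waveform, dt, B_nT, n_cm3):
    """Label the dominant spectral peak of one capture (very simplified)."""
    f = char_freqs(B_nT, n_cm3)
    spec = np.abs(np.fft.rfft(waveform * np.hanning(len(waveform))))
    freq = np.fft.rfftfreq(len(waveform), dt)
    fpk = freq[spec[1:].argmax() + 1]        # skip the DC bin
    if fpk >= 0.8 * f["fpe"]:
        return "Langmuir"
    if 3 * f["flh"] < fpk < f["fce"]:
        return "whistler"
    if abs(fpk - f["flh"]) < 0.5 * f["flh"]:
        return "lower hybrid"
    return "ion acoustic / other"
```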
Sequencing artifacts in the type A influenza databases and attempts to correct them.
Suarez, David L; Chester, Nikki; Hatfield, Jason
2014-07-01
There are over 276 000 influenza gene sequences in public databases, with the quality of the sequences determined by the contributor. As part of a high school class project, influenza sequences with possible errors were identified in the public databases based on the gene being longer than expected, with the hypothesis that these sequences would contain an error. Students contacted sequence submitters to alert them to the possible sequence issue(s) and requested that the suspect sequence(s) be corrected as appropriate. Type A influenza viruses were screened, and gene segments longer than the accepted size were identified for further analysis. Attention was placed on sequences with additional nucleotides upstream or downstream of the highly conserved non-coding ends of the viral segments. A total of 1081 sequences were identified that met this criterion. Three types of errors were commonly observed: non-influenza primer sequence was not removed from the sequence; the PCR product was cloned and plasmid sequence was included in the sequence; and Taq polymerase added an adenine at the end of the PCR product. Internal insertions of nucleotide sequence were also commonly observed, but in many cases it was unclear whether the sequence was correct or actually contained an error. A total of 215 sequences, or 22.8% of the suspect sequences, were corrected in the public databases in the first year of the student project. Unfortunately, 138 additional sequences with possible errors were added to the databases in the second year. Additional awareness of the need for data integrity of sequences submitted to public databases is needed to fully reap the benefits of these large data sets. © 2014 The Authors. Influenza and Other Respiratory Viruses Published by John Wiley & Sons Ltd.
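A screen of this kind is straightforward to automate. The sketch below flags over-length records and classifies the three common artifacts; the conserved terminal motifs used here are the commonly cited influenza A segment ends in positive sense (an assumption worth re-checking against the literature), and the expected segment lengths must be supplied by the user.

```python
import re

# Assumed conserved termini of influenza A segments (positive sense).
START = re.compile("AGC[GA]AAAGCAGG")
END = re.compile("CCTTGTTTCTACT")

def diagnose(seq, expected_len):
    """Flag the artifact types most often observed in over-long records."""
    seq = seq.upper()
    issues = []
    if len(seq) <= expected_len:
        return issues                        # screen only over-long sequences
    s, e = START.search(seq), END.search(seq)
    if s and s.start() > 0:
        issues.append(f"{s.start()} nt upstream of conserved 5' end "
                      "(possible primer or vector sequence)")
    if e and e.end() < len(seq):
        tail = seq[e.end():]
        if tail == "A":
            issues.append("single 3' A (Taq polymerase addition)")
        else:
            issues.append(f"{len(tail)} nt downstream of conserved 3' end "
                          "(possible plasmid sequence)")
    if not (s or e):
        issues.append("conserved termini not found; possible internal insertion")
    return issues
```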
NASA Technical Reports Server (NTRS)
Barbre, Robert E., Jr.
2012-01-01
This paper presents the process used by the Marshall Space Flight Center Natural Environments Branch (EV44) to quality control (QC) data from the Kennedy Space Center's 50-MHz Doppler Radar Wind Profiler (DRWP) for use in vehicle wind loads and steering commands. The database has been built to mitigate limitations of the currently archived databases from weather balloons. The DRWP database contains wind measurements from approximately 2.7-18.6 km altitude at roughly five-minute intervals for the August 1997 to December 2009 period of record, and the extensive QC process was designed to remove spurious data arising from various atmospheric and non-atmospheric artifacts. The QC process is largely based on DRWP literature, but two new algorithms have been developed to remove data contaminated by convection and by excessive first-guess propagations from the Median Filter First Guess Algorithm. In addition to describing the automated and manual QC process in detail, this paper describes the extent of the data retained. Roughly 58% of all possible wind observations exist in the database, with approximately 100 times as many complete profile sets existing relative to the EV44 balloon databases. This increased sample of near-continuous wind profile measurements may help increase launch availability by reducing the uncertainty of wind changes during launch countdown.
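The median-filter-first-guess idea generalizes to a simple gate-by-gate check: compare each measurement with a running median and reject large departures. The window length and rejection threshold below are illustrative assumptions, not EV44's operational settings, and the real process adds convection screening and manual review.

```python
import numpy as np

def qc_profile(u, window=5, max_dev=10.0):
    """Reject wind samples that deviate strongly from a running-median guess.
    u: one wind component vs altitude gate (m/s, NaN where missing)
    window: number of gates in the median filter (illustrative)
    max_dev: allowed departure from the first guess (m/s, illustrative)"""
    u = np.asarray(u, float)
    n, half = len(u), window // 2
    first_guess = np.full(n, np.nan)
    for i in range(n):
        vals = u[max(0, i - half):min(n, i + half + 1)]
        vals = vals[np.isfinite(vals)]
        if vals.size:
            first_guess[i] = np.median(vals)
    good = np.isfinite(u) & (np.abs(u - first_guess) <= max_dev)
    return np.where(good, u, np.nan), first_guess
```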
Rasdaman for Big Spatial Raster Data
NASA Astrophysics Data System (ADS)
Hu, F.; Huang, Q.; Scheele, C. J.; Yang, C. P.; Yu, M.; Liu, K.
2015-12-01
Spatial raster data have grown exponentially over the past decade. Recent advancements in data acquisition technology, such as remote sensing, have allowed us to collect massive observation data of various spatial resolutions and domain coverages. The volume, velocity, and variety of such spatial data, along with the computationally intensive nature of spatial queries, pose grand challenges to storage technologies for effective big data management. While high-performance computing platforms (e.g., cloud computing) can be used to solve the computing-intensive issues in big data analysis, the data have to be managed in a way that is suitable for distributed parallel processing. Recently, rasdaman (raster data manager) has emerged as a scalable and cost-effective database solution to store and retrieve massive multi-dimensional arrays, such as sensor, image, and statistics data. Within this paper, the pros and cons of using rasdaman to manage and query spatial raster data will be examined and compared with other common approaches, including file-based systems, relational databases (e.g., PostgreSQL/PostGIS), and NoSQL databases (e.g., MongoDB and Hive). Earth Observing System (EOS) data collected from NASA's Atmospheric Science Data Center (ASDC) will be used and stored in these selected database systems, and a set of spatial and non-spatial queries will be designed to benchmark their performance on retrieving large-scale, multi-dimensional arrays of EOS data. Lessons learnt from using rasdaman will be discussed as well.
Kafkas, Şenay; Kim, Jee-Hyub; Pi, Xingjun; McEntyre, Johanna R
2015-01-01
In this study, we present an analysis of data citation practices in full text research articles and their corresponding supplementary data files, made available in the Open Access set of articles from Europe PubMed Central. Our aim is to investigate whether supplementary data files should be considered as a source of information for integrating the literature with biomolecular databases. Using text-mining methods to identify and extract a variety of core biological database accession numbers, we found that the supplemental data files contain many more database citations than the body of the article, and that those citations often take the form of a relatively small number of articles citing large collections of accession numbers in text-based files. Moreover, citation of value-added databases derived from submission databases (such as Pfam, UniProt or Ensembl) is common, demonstrating the reuse of these resources as datasets in themselves. All the database accession numbers extracted from the supplementary data are publicly accessible from http://dx.doi.org/10.5281/zenodo.11771. Our study suggests that supplementary data should be considered when linking articles with data, in curation pipelines, and in information retrieval tasks in order to make full use of the entire research article. These observations highlight the need to improve the management of supplemental data in general, in order to make this information more discoverable and useful.
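The extraction step described above can be pictured as a battery of database-specific regular expressions run over article and supplementary text. The patterns below are illustrative simplifications of a few accession families (real pipelines use curated, validated expressions and context rules to suppress false positives), and the file names are placeholders.

```python
import re

# Illustrative patterns only; e.g. the short PDB pattern is very prone to
# false positives without additional context checks.
PATTERNS = {
    "GenBank": r"\b[A-Z]{1,2}\d{5,8}\b",
    "UniProt": r"\b[OPQ][0-9][A-Z0-9]{3}[0-9]\b",
    "Pfam":    r"\bPF\d{5}\b",
    "Ensembl": r"\bENS[A-Z]*[GTP]\d{11}\b",
}

def extract_accessions(text):
    """Return {database: sorted accession list} for all pattern matches."""
    hits = {}
    for db, pat in PATTERNS.items():
        found = set(re.findall(pat, text))
        if found:
            hits[db] = sorted(found)
    return hits

text = "Domains PF00069 and PF07714 were assigned; see also UniProt P0DP23."
print(extract_accessions(text))
# Comparing counts from body text vs. supplementary files reproduces the
# kind of contrast reported above, with far more citations in the latter.
```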
Auchincloss, Amy H; Moore, Kari A B; Moore, Latetia V; Diez Roux, Ana V
2012-11-01
Access to healthy foods has received increasing attention due to the growing prevalence of obesity and diet-related health conditions, yet there are major obstacles in characterizing the local food environment. This study developed a method to retrospectively characterize supermarkets for a single historic year, 2005, in 19 counties in 6 states in the USA using a supermarket chain-name list and two business databases. Data preparation, merging, overlaps, the added value of the various approaches, and differences by census-tract socio-demographic characteristics are described. Agreement between the two food store databases was modest: 63%. Only 55% of the final list of supermarkets were identified by a single business database together with selection criteria that included industry classification codes and sales revenue ≥$2 million. The added value of using a supermarket chain-name list and a second business database was the identification of an additional 14% and 30% of supermarkets, respectively. These methods are particularly useful for retrospectively characterizing access to supermarkets during a historic period, when field observations are not feasible and business databases must be used. Copyright © 2012 Elsevier Ltd. All rights reserved.
Design and deployment of a large brain-image database for clinical and nonclinical research
NASA Astrophysics Data System (ADS)
Yang, Guo Liang; Lim, Choie Cheio Tchoyoson; Banukumar, Narayanaswami; Aziz, Aamer; Hui, Francis; Nowinski, Wieslaw L.
2004-04-01
An efficient database is an essential component of organizing diverse information on image metadata and patient information for research in medical imaging. This paper describes the design, development and deployment of a large database system serving as a brain image repository that can be used across different platforms in various medical research projects. It forms the infrastructure that links hospitals and institutions together and shares data among them. The database contains patient-, pathology-, image-, research- and management-specific data. The functionalities of the database system include image uploading, storage, indexing, downloading and sharing as well as database querying and management, with security and data anonymization concerns well taken care of. The structure of the database is a multi-tier client-server architecture comprising a Relational Database Management System, a Security Layer, an Application Layer and a User Interface. An image source adapter has been developed to handle most of the popular image formats. The database has a web-browser-based user interface and is easy to handle. We used the Java programming language for its platform independence and vast function libraries. The brain image database can sort data according to clinically relevant information, which can be used effectively in research from the clinicians' point of view. The database is suitable for validation of algorithms on large populations of cases. Medical images for processing can be identified and organized based on information in the image metadata. Clinical research in various pathologies can thus be performed with greater efficiency, and large image repositories can be managed more effectively. The prototype of the system has been installed in a few hospitals and is working to the satisfaction of the clinicians.
NASA Astrophysics Data System (ADS)
Sefton-Nash, E.; Williams, J.-P.; Greenhagen, B. T.; Aye, K.-M.; Paige, D. A.
2017-12-01
An approach is presented to efficiently produce high-quality gridded data records from the large, global point-based dataset returned by the Diviner Lunar Radiometer Experiment aboard NASA's Lunar Reconnaissance Orbiter. The need to minimize data volume and processing time in the production of science-ready map products is increasingly important given the growth in data volume of planetary datasets. Diviner makes on average >1400 observations per second of radiance that is reflected and emitted from the lunar surface, using 189 detectors divided into 9 spectral channels. Data management and processing bottlenecks are amplified by modeling every observation as a probability distribution function over the field of view, which can increase the required processing time by 2-3 orders of magnitude. Geometric corrections, such as projection of data points onto a digital elevation model, are numerically intensive, and it is therefore desirable to perform them only once. Our approach reduces bottlenecks through parallel binning and efficient storage of a pre-processed database of observations. Database construction is via subdivision of a geodesic icosahedral grid, with a spatial resolution that can be tailored to suit the field of view of the observing instrument. Global geodesic grids with high spatial resolution are normally impractically memory intensive. We therefore demonstrate a minimum-storage and highly parallel method to bin very large numbers of data points onto such a grid. A database of the pre-processed and binned points is then used for production of mapped data products significantly faster than if unprocessed points were used. We explore quality controls in the production of gridded data records by conditional interpolation, allowed only where data density is sufficient. The resultant effects on the spatial continuity and uncertainty in maps of lunar brightness temperatures are illustrated. We identify four binning regimes based on trade-offs between the spatial resolution of the grid, the size of the FOV and the on-target spacing of observations. Our approach may be applicable and beneficial for many existing and future point-based planetary datasets.
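The parallel binning step can be pictured with the sketch below, which substitutes a plain latitude/longitude grid for the geodesic icosahedral grid and applies a data-density threshold in the spirit of the conditional interpolation described above. Because per-chunk sums and counts are simple arrays, chunks can be binned independently and the arrays added, which is what makes the approach parallel-friendly; the grid choice and thresholds here are illustrative assumptions.

```python
import numpy as np

def bin_observations(lat, lon, val, nlat=1800, min_count=3):
    """Grid point observations and keep only well-sampled cells.
    lat, lon in degrees; val is the observed quantity (e.g. brightness T)."""
    nlon = 2 * nlat
    i = np.clip(((lat + 90.0) / 180.0 * nlat).astype(int), 0, nlat - 1)
    j = np.clip(((lon + 180.0) / 360.0 * nlon).astype(int), 0, nlon - 1)
    flat = i * nlon + j
    sums = np.bincount(flat, weights=val, minlength=nlat * nlon)
    cnts = np.bincount(flat, minlength=nlat * nlon)
    grid = np.full(nlat * nlon, np.nan)
    ok = cnts >= min_count            # density threshold: fill only where sampled
    grid[ok] = sums[ok] / cnts[ok]
    return grid.reshape(nlat, nlon), cnts.reshape(nlat, nlon)

# Chunks of the point database can be binned separately and the
# (sums, counts) arrays summed before the final division.
```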
Burton, Tanya; Le Nestour, Elisabeth; Neary, Maureen; Ludlam, William H
2016-04-01
This study aimed to develop an algorithm to identify patients with Cushing's disease (CD) and to quantify the clinical and economic burden that patients with CD face compared to CD-free controls. A retrospective cohort study of CD patients was conducted in a large US commercial health plan database between 1/1/2007 and 12/31/2011. A control group with no evidence of CD during the same time was matched 1:3 based on demographics. Comorbidity rates were compared using Poisson regression, and health care costs were compared using robust variance estimation. A case-finding algorithm identified 877 CD patients, who were matched to 2631 CD-free controls. The age and sex distribution of the selected population matched the known epidemiology of CD. CD patients were found to have comorbidity rates two to five times higher, and health care costs four to seven times higher, than CD-free controls. An algorithm based on eight pituitary conditions and procedures appeared to identify CD patients in a claims database without a unique diagnosis code. Young CD patients had high rates of comorbidities that are more commonly observed in an older population (e.g., diabetes, hypertension, and cardiovascular disease). Observed health care costs were also high for CD patients compared to CD-free controls, but may have been even higher if the sample had included healthier controls with no health care use as well.
Source attribution using FLEXPART and carbon monoxide emission inventories: SOFT-IO version 1.0
NASA Astrophysics Data System (ADS)
Sauvage, Bastien; Fontaine, Alain; Eckhardt, Sabine; Auby, Antoine; Boulanger, Damien; Petetin, Hervé; Paugam, Ronan; Athier, Gilles; Cousin, Jean-Marc; Darras, Sabine; Nédélec, Philippe; Stohl, Andreas; Turquety, Solène; Cammas, Jean-Pierre; Thouret, Valérie
2017-12-01
Since 1994, the In-service Aircraft for a Global Observing System (IAGOS) program has produced in situ measurements of atmospheric composition during more than 51 000 commercial flights. In order to help analyze these observations and understand the processes driving the observed concentration distribution and variability, we developed the SOFT-IO tool to quantify source-receptor links for all measured data. Based on the FLEXPART particle dispersion model (Stohl et al., 2005), SOFT-IO simulates the contributions of anthropogenic and biomass burning emissions from the ECCAD emission inventory database for all locations and times corresponding to the measured carbon monoxide mixing ratios along each IAGOS flight. Contributions are simulated from emissions occurring during the last 20 days before an observation, separating the individual contributions from the different source regions. The main goal is to supply added-value products to the IAGOS database by identifying the geographical origin and emission sources driving the CO enhancements observed in the troposphere and lower stratosphere. This requires a good match between observed and modeled CO enhancements. Indeed, SOFT-IO detects more than 95 % of the observed CO anomalies over most of the regions sampled by IAGOS in the troposphere. In the majority of cases, SOFT-IO simulates CO pollution plumes with biases lower than 10-15 ppbv. Differences between the model and observations are larger for very low or very high observed CO values. The added-value products will help in understanding trace-gas distribution and seasonal variability. They are available in the IAGOS database via http://www.iagos.org. The SOFT-IO tool could also be applied to similar data sets of CO observations (e.g., ground-based measurements, satellite observations). SOFT-IO could also be used for statistical validation as well as for intercomparisons of emission inventories using large amounts of data.
A Data Analysis Expert System For Large Established Distributed Databases
NASA Astrophysics Data System (ADS)
Gnacek, Anne-Marie; An, Y. Kim; Ryan, J. Patrick
1987-05-01
The purpose of this work is to analyze the applicability of artificial intelligence techniques for developing a user-friendly, parallel interface to large, isolated, incompatible NASA databases for the purpose of assisting the management decision process. To carry out this work, a survey was conducted to establish the data access requirements of several key NASA user groups. In addition, current NASA database access methods were evaluated. The results of this work are presented in the form of a design for a natural language database interface system, called the Deductively Augmented NASA Management Decision Support System (DANMDS). This design is feasible principally because of recently announced commercial hardware and software product developments which allow cross-vendor compatibility. The goal of the DANMDS system addresses the central dilemma confronting most large companies and institutions in America: the retrieval of information from large, established, incompatible database systems. The DANMDS system implementation would represent a significant first step toward this problem's resolution.
NASA Astrophysics Data System (ADS)
Baru, Chaitan; Nandigam, Viswanath; Krishnan, Sriram
2010-05-01
Increasingly, the geoscience user community expects modern IT capabilities to be available in service of their research and education activities, including the ability to easily access and process large remote sensing datasets via online portals such as GEON (www.geongrid.org) and OpenTopography (opentopography.org). However, serving such datasets via online data portals presents a number of challenges. In this talk, we will evaluate the pros and cons of alternative storage strategies for management and processing of such datasets, using binary large object (BLOB) implementations in database systems versus implementation in Hadoop files using the Hadoop Distributed File System (HDFS). The storage and I/O requirements for providing online access to large datasets dictate the need for declustering data across multiple disks, for capacity as well as bandwidth and response-time performance. This requires partitioning larger files into sets of smaller files, and is accompanied by the concomitant requirement of managing large numbers of files. Storing these sub-files as BLOBs in a shared-nothing database implemented across a cluster provides the advantage that all the distributed storage management is done by the DBMS. Furthermore, subsetting and processing routines can be implemented as user-defined functions (UDFs) on these BLOBs and would run in parallel across the set of nodes in the cluster. On the other hand, there are both storage overheads and constraints, and software licensing dependencies created by such an implementation. Another approach is to store the files in an external filesystem with pointers to them from within database tables. The filesystem may be a regular UNIX filesystem, a parallel filesystem, or HDFS. In the HDFS case, HDFS would provide the file management capability, while the subsetting and processing routines would be implemented as Hadoop programs using the MapReduce model. Hadoop and its related software libraries are freely available. Another consideration is the strategy used for partitioning large data collections, and large datasets within collections, using round-robin, hash, or range partitioning methods. Each has different characteristics in terms of spatial locality of data and the resultant degree of declustering of the computations on the data. Furthermore, we have observed that, in practice, there can be large variations in the frequency of access to different parts of a large data collection and/or dataset, thereby creating "hotspots" in the data. We will evaluate the ability of different approaches to deal effectively with such hotspots and alternative strategies for handling them.
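The three partitioning strategies mentioned above differ only in how a tile's key maps to a node. A minimal sketch follows; the function names and example boundaries are our own illustrations.

```python
import hashlib

def round_robin(tile_index, n_nodes):
    """Even spread with no locality: tile i goes to node i mod N."""
    return tile_index % n_nodes

def hash_partition(key, n_nodes):
    """Deterministic spread by key; destroys spatial locality but
    naturally dilutes access hotspots."""
    h = hashlib.md5(str(key).encode()).hexdigest()
    return int(h, 16) % n_nodes

def range_partition(x, boundaries):
    """Preserves spatial locality (good for range queries), but a popular
    sub-region can overload a single node: the hotspot problem."""
    for node, upper in enumerate(boundaries):
        if x < upper:
            return node
    return len(boundaries)

# e.g. tiles of a large raster keyed by west-edge longitude:
print([range_partition(lon, [-90.0, 0.0, 90.0]) for lon in (-120, -45, 30, 100)])
```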
NASA Astrophysics Data System (ADS)
Knapp, Wilfried
2018-01-01
Visual observation of double stars is an anachronistic passion, especially attractive for amateurs looking for sky objects suitable for visual observation even in light-polluted areas. Session planning then requires a basic idea of which objects might be suitable for a given piece of equipment; this question is a long-term issue for visual double star observers and is obviously not easy to answer, especially for components of unequal brightness. Based on a reasonably large database of limited-aperture observations (done with variable-aperture equipment: iris diaphragm or aperture masks), a heuristic approach is used to derive a statistically well-founded Rule of Thumb formula.
Comparative effectiveness research in cancer with observational data.
Giordano, Sharon H
2015-01-01
Observational studies are increasingly being used for comparative effectiveness research. These studies can have the greatest impact when randomized trials are not feasible or when randomized studies have not included the population or outcomes of interest. However, careful attention must be paid to study design to minimize the likelihood of selection biases. Analytic techniques, such as multivariable regression modeling, propensity score analysis, and instrumental variable analysis, can also be used to help address confounding. Oncology has many existing large and clinically rich observational databases that can be used for comparative effectiveness research. With careful study design, observational studies can produce valid results to assess the benefits and harms of a treatment or intervention in representative real-world populations.
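Of the analytic techniques listed, propensity score matching is the most mechanical to illustrate. Below is a minimal sketch assuming scikit-learn is available, using greedy 1:1 nearest-neighbor matching with a caliper; a real analysis would justify the model, matching scheme, and caliper explicitly and check covariate balance afterward.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ps_match(X, treated, caliper=0.05):
    """Greedy 1:1 nearest-neighbor propensity-score matching, no replacement.
    X: (n, p) confounder matrix; treated: (n,) 0/1 treatment indicator."""
    ps = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]
    t_idx = np.where(treated == 1)[0]
    c_idx = list(np.where(treated == 0)[0])
    pairs = []
    for i in t_idx:
        if not c_idx:
            break
        j = min(c_idx, key=lambda k: abs(ps[k] - ps[i]))  # nearest control
        if abs(ps[j] - ps[i]) <= caliper:                 # enforce caliper
            pairs.append((i, j))
            c_idx.remove(j)
    return pairs, ps
```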
Implementation of the CUAHSI information system for regional hydrological research and workflow
NASA Astrophysics Data System (ADS)
Bugaets, Andrey; Gartsman, Boris; Bugaets, Nadezhda; Krasnopeyev, Sergey; Krasnopeyeva, Tatyana; Sokolov, Oleg; Gonchukov, Leonid
2013-04-01
Environmental research and education have become increasingly data-intensive as a result of the proliferation of digital technologies, instrumentation, and pervasive networks through which data are collected, generated, shared, and analyzed. Over the next decade, it is likely that science and engineering research will produce more scientific data than has been created over the whole of human history (Cox et al., 2006). Successfully using these data to achieve new scientific breakthroughs depends on the ability to access, organize, integrate, and analyze these large datasets. The new project of PGI FEB RAS (http://tig.dvo.ru), FERHRI (www.ferhri.org) and Primgidromet (www.primgidromet.ru) is focused on the creation of an open, unified hydrological information system conforming to international standards to support hydrological investigation, water management and forecast systems. Within the hydrologic science community, the Consortium of Universities for the Advancement of Hydrologic Science, Inc. (http://his.cuahsi.org) has been developing a distributed network of data sources and functions that are integrated using web services and that provide access to data, tools, and models that enable synthesis, visualization, and evaluation of hydrologic system behavior. On top of the CUAHSI technologies, the first two template databases were developed for primary datasets of special observations on experimental basins in the Far East Region of Russia. The first database contains data from special observations performed at the former (1957-1994) Primorskaya Water-Balance Station (1500 km2). Measurements were carried out at 20 hydrological and 40 rain gauging stations and were published as special series, but only as hardcopy books. The database provides raw data from loggers with hourly and daily time support. The second database, called «FarEastHydro», provides published standard daily measurements performed at the Roshydromet observation network (200 hydrological and meteorological stations) for the period from 1930 through 1990. Both data resources are maintained in a test mode at the project site http://gis.dvo.ru:81/, which is continually updated. After this first success, the decision was made to use the CUAHSI technology as a basis for the development of a hydrological information system to support data publishing and the workflow of Primgidromet, the regional office of the Federal State Hydrometeorological Agency. At the moment, the Primgidromet observation network is equipped with 34 automatic SEBA hydrological pneumatic pressure-sensor gauges (PS-Light-2) and 36 automatic SEBA weather stations. Large datasets generated by sensor networks are organized and stored within a central ODM database, which makes it possible to interpret the data unambiguously with sufficient metadata and provides a traceable lineage from raw measurements to usable information. Organization of the data within a central CUAHSI ODM database was the most critical step, with several important implications. This technology is widespread and well documented, and it ensures that all datasets are publicly available and readily usable by other investigators and developers to support additional analyses and hydrological modeling. Implementation of ODM within a Relational Database Management System eliminates potential data-manipulation errors and intermediate data-processing steps. Wrapping the CUAHSI WaterOneFlow web service into an OpenMI 2.0 linkable component (www.openmi.org) allows seamless integration with well-known hydrological modeling systems.
Use of Patient Registries and Administrative Datasets for the Study of Pediatric Cancer
Rice, Henry E.; Englum, Brian R.; Gulack, Brian C.; Adibe, Obinna O.; Tracy, Elizabeth T.; Kreissman, Susan G.; Routh, Jonathan C.
2015-01-01
Analysis of data from large administrative databases and patient registries is increasingly being used to study childhood cancer care, although the value of these data sources remains unclear to many clinicians. Interpretation of large databases requires a thorough understanding of how the dataset was designed, how data were collected, and how to assess data quality. This review will detail the role of administrative databases and registry databases for the study of childhood cancer, tools to maximize information from these datasets, and recommendations to improve the use of these databases for the study of pediatric oncology. PMID:25807938
Creating databases for biological information: an introduction.
Stein, Lincoln
2013-06-01
The essence of bioinformatics is dealing with large quantities of information. Whether it be sequencing data, microarray data files, mass spectrometric data (e.g., fingerprints), the catalog of strains arising from an insertional mutagenesis project, or even large numbers of PDF files, there inevitably comes a time when the information can simply no longer be managed with files and directories. This is where databases come into play. This unit briefly reviews the characteristics of several database management systems, including flat file, indexed file, relational databases, and NoSQL databases. It compares their strengths and weaknesses and offers some general guidelines for selecting an appropriate database management system. Copyright 2013 by John Wiley & Sons, Inc.
Observations of HF backscatter decay rates from HAARP generated FAI
NASA Astrophysics Data System (ADS)
Bristow, William; Hysell, David
2016-07-01
Suitable experiments at the High-frequency Active Auroral Research Program (HAARP) facilities in Gakona, Alaska, create a region of ionospheric Field-Aligned Irregularities (FAI) that produces strong radar backscatter observed by the SuperDARN radar on Kodiak Island, Alaska. The creation of FAI in HF ionospheric modification experiments has been studied by a number of authors, who have developed a rich theoretical background. The decay of the irregularities, however, has not been so widely studied, yet it has the potential to provide estimates of the parameters of natural irregularity diffusion, which are difficult to measure by other means. Hysell et al. [1996] demonstrated using the decay of radar scatter above the Sura heating facility to estimate irregularity diffusion. A large database of radar backscatter from HAARP-generated FAI has been collected over the years. Experiments often cycled the heater power on and off in a way that allowed estimates of the FAI decay rate. The database has been examined to extract decay time estimates and diffusion rates over a range of ionospheric conditions. This presentation will summarize the database and the estimated diffusion rates, and will discuss the potential for targeted experiments for aeronomy measurements. Hysell, D. L., M. C. Kelley, Y. M. Yampolski, V. S. Beley, A. V. Koloskov, P. V. Ponomarenko, and O. F. Tyrnov, HF radar observations of decaying artificial field aligned irregularities, J. Geophys. Res., 101, 26,981, 1996.
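Extracting a decay time from heater-off backscatter reduces to fitting an exponential to the post-turn-off power. A minimal sketch follows; the diffusion relation in the final comment assumes a pure cross-field diffusion model and should be checked against Hysell et al. [1996].

```python
import numpy as np

def decay_time(t, power, t_off):
    """Fit P(t) = P0 * exp(-(t - t_off)/tau) to backscatter power after the
    heater switches off, via linear least squares on log power."""
    sel = (t >= t_off) & (power > 0)
    A = np.vstack([np.ones(sel.sum()), t[sel] - t_off]).T
    (lnP0, slope), *_ = np.linalg.lstsq(A, np.log(power[sel]), rcond=None)
    tau = -1.0 / slope
    # Assumption: under pure diffusion the power decays as
    # exp(-2 * kperp**2 * D * t), so D ~ 1 / (2 * kperp**2 * tau),
    # with kperp set by the radar Bragg wavenumber.
    return tau, np.exp(lnP0)
```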
Observations of HF backscatter decay rates from HAARP generated FAI
NASA Astrophysics Data System (ADS)
Bristow, W. A.; Hysell, D. L.
2016-12-01
Suitable experiments at the High-frequency Active Auroral Research Program (HAARP) facilities in Gakona, Alaska, create a region of ionospheric Field-Aligned Irregularities (FAI) that produces strong radar backscatter observed by the SuperDARN radar on Kodiak Island, Alaska. The creation of FAI in HF ionospheric modification experiments has been studied by a number of authors, who have developed a rich theoretical background. The decay of the irregularities, however, has not been so widely studied, yet it has the potential to provide estimates of the parameters of natural irregularity diffusion, which are difficult to measure by other means. Hysell et al. [1996] demonstrated using the decay of radar scatter above the Sura heating facility to estimate irregularity diffusion. A large database of radar backscatter from HAARP-generated FAI has been collected over the years. Experiments often cycled the heater power on and off in a way that allowed estimates of the FAI decay rate. The database has been examined to extract decay time estimates and diffusion rates over a range of ionospheric conditions. This presentation will summarize the database and the estimated diffusion rates, and will discuss the potential for targeted experiments for aeronomy measurements. Hysell, D. L., M. C. Kelley, Y. M. Yampolski, V. S. Beley, A. V. Koloskov, P. V. Ponomarenko, and O. F. Tyrnov, HF radar observations of decaying artificial field aligned irregularities, J. Geophys. Res., 101, 26,981, 1996.
NASA Astrophysics Data System (ADS)
Sun, Xiujun; Li, Dongming; Liu, Zhihong; Zhou, Liqing; Wu, Biao; Yang, Aiguo
2017-10-01
The pen shell (Atrina pectinata) is a large wedge-shaped bivalve belonging to the family Pinnidae. Due to its large and nutritious adductor muscle, it is a popular seafood with high commercial value in Asia-Pacific countries. However, limited genomic and transcriptomic data have hampered genetic investigations of the species. In this study, the transcriptome of A. pectinata was deeply sequenced using Illumina paired-end sequencing technology. After assembly, a total of 127263 unigenes were obtained. Functional annotation indicated that the highest percentage of unigenes (18.60%) was annotated in the GO database, followed by 18.44% in the PFAM database and 17.04% in the NR database. There were 270 biological pathways matched with those in the KEGG database. Furthermore, a total of 23452 potential simple sequence repeats (SSRs) were identified, of which the most abundant type was mono-nucleotide repeats (12902, 55.01%), followed by di-nucleotide (8132, 34.68%), tri-nucleotide (2010, 8.57%), tetra-nucleotide (401, 1.71%), and penta-nucleotide (7, 0.03%) repeats. Sixty SSRs were selected for validating and developing genic SSR markers, of which 23 showed polymorphism in a cultured population, with average observed and expected heterozygosities of 0.412 and 0.579, respectively. In this study, we established the first comprehensive transcript dataset of A. pectinata genes. Our results demonstrate that RNA-Seq is a fast and cost-effective method for genic SSR development in non-model species.
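SSR identification of the kind counted above is essentially a repeat-motif scan. A minimal sketch using backreference regular expressions follows; the minimum-repeat thresholds are common defaults (MISA-style) and are an assumption, not necessarily those used in this study.

```python
import re

# Minimum repeat counts per motif length (illustrative defaults).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

def find_ssrs(seq):
    """Return (start, motif, n_repeats) tuples for simple sequence repeats."""
    seq, hits = seq.upper(), []
    for mlen, minrep in MIN_REPEATS.items():
        pat = re.compile(r"([ACGT]{%d})\1{%d,}" % (mlen, minrep - 1))
        for m in pat.finditer(seq):
            motif = m.group(1)
            if mlen > 1 and len(set(motif)) == 1:
                continue                  # e.g. skip 'AA' counted as a di-repeat
            hits.append((m.start(), motif, len(m.group(0)) // mlen))
    return hits

print(find_ssrs("TTGACACACACACACACAGT"))  # one (AC)n di-nucleotide repeat
```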
Lottig, Noah R.; Wagner, Tyler; Henry, Emily N.; Cheruvelil, Kendra Spence; Webster, Katherine E.; Downing, John A.; Stow, Craig A.
2014-01-01
We compiled a lake-water clarity database using publicly available, citizen volunteer observations made between 1938 and 2012 across eight states in the Upper Midwest, USA. Our objectives were to determine (1) whether temporal trends in lake-water clarity existed across this large geographic area and (2) whether trends were related to the lake-specific characteristics of latitude, lake size, or the time period in which the lake was monitored. Our database consisted of >140,000 individual Secchi observations from 3,251 lakes that we summarized per lake-year, resulting in 21,020 summer averages. Using Bayesian hierarchical modeling, we found approximately a 1% per year increase in water clarity (quantified as Secchi depth) for the entire population of lakes. On an individual lake basis, 7% of lakes showed increased water clarity and 4% showed decreased clarity. Trend direction and strength were related to latitude and median sample date. Lakes in the southern part of our study region had lower average annual summer water clarity, more negative long-term trends, and greater inter-annual variability in water clarity compared to northern lakes. Increasing trends were strongest for lakes with median sample dates earlier in the period of record (1938–2012). Our ability to identify specific mechanisms for these trends is currently hampered by the lack of a large, multi-thematic database of variables that drive water clarity (e.g., climate, land use/cover). Our results demonstrate, however, that citizen science can provide the critical monitoring data needed to address environmental questions at large spatial and long temporal scales. Collaborations among citizens, research scientists, and government agencies may be important for developing the data sources and analytical tools necessary to move toward an understanding of the factors influencing macro-scale patterns such as those shown here for lake water clarity.
Lottig, Noah R.; Wagner, Tyler; Norton Henry, Emily; Spence Cheruvelil, Kendra; Webster, Katherine E.; Downing, John A.; Stow, Craig A.
2014-01-01
We compiled a lake-water clarity database using publicly available, citizen volunteer observations made between 1938 and 2012 across eight states in the Upper Midwest, USA. Our objectives were to determine (1) whether temporal trends in lake-water clarity existed across this large geographic area and (2) whether trends were related to the lake-specific characteristics of latitude, lake size, or the time period in which the lake was monitored. Our database consisted of >140,000 individual Secchi observations from 3,251 lakes that we summarized per lake-year, resulting in 21,020 summer averages. Using Bayesian hierarchical modeling, we found approximately a 1% per year increase in water clarity (quantified as Secchi depth) for the entire population of lakes. On an individual lake basis, 7% of lakes showed increased water clarity and 4% showed decreased clarity. Trend direction and strength were related to latitude and median sample date. Lakes in the southern part of our study region had lower average annual summer water clarity, more negative long-term trends, and greater inter-annual variability in water clarity compared to northern lakes. Increasing trends were strongest for lakes with median sample dates earlier in the period of record (1938–2012). Our ability to identify specific mechanisms for these trends is currently hampered by the lack of a large, multi-thematic database of variables that drive water clarity (e.g., climate, land use/cover). Our results demonstrate, however, that citizen science can provide the critical monitoring data needed to address environmental questions at large spatial and long temporal scales. Collaborations among citizens, research scientists, and government agencies may be important for developing the data sources and analytical tools necessary to move toward an understanding of the factors influencing macro-scale patterns such as those shown here for lake water clarity. PMID:24788722
Very Large Data Volumes Analysis of Collaborative Systems with Finite Number of States
ERIC Educational Resources Information Center
Ivan, Ion; Ciurea, Cristian; Pavel, Sorin
2010-01-01
A collaborative system with a finite number of states is defined. A very large database is structured. Operations on large databases are identified. Repetitive procedures for collaborative systems operations are derived. The efficiency of such procedures is analyzed. (Contains 6 tables, 5 footnotes and 3 figures.)
NASA Astrophysics Data System (ADS)
O'Keeffe, Brendon; Johnson, Michael; Murphy Williams, Rosa Nina
2018-06-01
The WestRock Observatory at Columbus State University provides laboratory and research opportunities to earth and space science students specializing in astrophysics and planetary geology. Through continuing improvements, the observatory has been expanding the types of research carried out by undergraduates. Photometric measurements are an essential tool for observational research, especially for objects of variable brightness. Using the American Association of Variable Star Observers (AAVSO) database, students choose variable star targets for observation. Students then perform observations to develop the ability to properly record, calibrate, and interpret the data. Results are then submitted to the AAVSO's large database of observations. Standardized observation procedures will be developed in the form of manuals and instructional videos specific to the equipment housed in the WestRock Observatory. These procedures will be used by students conducting laboratory exercises and undergraduate research projects that utilize photometry. Such hands-on, direct observational experience will help familiarize students with observational techniques and contribute to an active dataset, which in turn will prepare them for future research in their field. In addition, this set of procedures and the data resulting from them will be used in the wider outreach programs of the WestRock Observatory, so that students and the interested public nationwide can learn about both the process and the importance of photometry in astronomical research.
Leaf optical properties shed light on foliar trait variability at individual to global scales
NASA Astrophysics Data System (ADS)
Shiklomanov, A. N.; Serbin, S.; Dietze, M.
2016-12-01
Recent syntheses of large trait databases have contributed immensely to our understanding of the drivers of plant function at the global scale. However, the global trade-offs revealed by such syntheses, such as the trade-off between leaf productivity and resilience (i.e., the "leaf economics spectrum"), are often absent at smaller scales and fail to correlate with actual functional limitations. An improved understanding of how traits vary within communities, species, and individuals is critical to accurate representations of vegetation ecophysiology and ecological dynamics in ecosystem models. Spectral data from both field observations and remote sensing platforms present a potentially rich and widely available source of information on plant traits. In particular, the inversion of physically based radiative transfer models (RTMs) is an effective and general method for estimating plant traits from spectral measurements. Here, we apply Bayesian inversion of the PROSPECT leaf RTM to a large database of field spectra and plant traits spanning tropical, temperate, and boreal forests, agricultural plots, arid shrublands, and tundra to identify dominant sources of variability and characterize trade-offs in plant functional traits. By leveraging such a large and diverse dataset, we re-calibrate the empirical absorption coefficients underlying the PROSPECT model and expand its scope to include additional leaf biochemical components, namely leaf nitrogen content. Our work provides a key methodological contribution, a physically based retrieval of leaf nitrogen from remote sensing observations, and offers substantial insights into trait trade-offs related to plant acclimation, adaptation, and community assembly.
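Bayesian RTM inversion of this kind is, at its core, posterior sampling over leaf traits with the RTM as the forward model. The sketch below shows a generic Metropolis sampler with a placeholder forward function standing in for PROSPECT (which is not reproduced here); the positivity prior, step sizes, and noise model are all illustrative assumptions.

```python
import numpy as np

def metropolis_invert(observed, forward_model, theta0, step,
                      n_iter=20000, noise_sd=0.01, seed=0):
    """Sample p(theta | observed spectrum) with a random-walk Metropolis chain.
    `forward_model(theta)` maps a trait vector to a reflectance spectrum."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, float)

    def log_post(th):
        if np.any(th <= 0):                       # crude positivity prior
            return -np.inf
        resid = observed - forward_model(th)
        return -0.5 * np.sum((resid / noise_sd) ** 2)

    lp, samples = log_post(theta), []
    for _ in range(n_iter):
        prop = theta + rng.normal(0, step, size=theta.size)
        lp_prop = log_post(prop)
        if np.log(rng.random()) < lp_prop - lp:   # accept/reject
            theta, lp = prop, lp_prop
        samples.append(theta.copy())
    return np.array(samples)
```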
Teaching Case: Adapting the Access Northwind Database to Support a Database Course
ERIC Educational Resources Information Center
Dyer, John N.; Rogers, Camille
2015-01-01
A common problem encountered when teaching database courses is that few large illustrative databases exist to support teaching and learning. Most database textbooks have small "toy" databases that are chapter objective specific, and thus do not support application over the complete domain of design, implementation and management concepts…
Large-Scale 1:1 Computing Initiatives: An Open Access Database
ERIC Educational Resources Information Center
Richardson, Jayson W.; McLeod, Scott; Flora, Kevin; Sauers, Nick J.; Kannan, Sathiamoorthy; Sincar, Mehmet
2013-01-01
This article details the spread and scope of large-scale 1:1 computing initiatives around the world. What follows is a review of the existing literature around 1:1 programs followed by a description of the large-scale 1:1 database. Main findings include: 1) the XO and the Classmate PC dominate large-scale 1:1 initiatives; 2) if professional…
Exploring Large-Scale Cross-Correlation for Teleseismic and Regional Seismic Event Characterization
NASA Astrophysics Data System (ADS)
Dodge, Doug; Walter, William; Myers, Steve; Ford, Sean; Harris, Dave; Ruppert, Stan; Buttler, Dave; Hauk, Terri
2013-04-01
The decrease in costs of both digital storage space and computation power invites new methods of seismic data processing. At Lawrence Livermore National Laboratory (LLNL) we operate a growing research database of seismic events and waveforms for nuclear explosion monitoring and other applications. Currently the LLNL database contains several million events associated with tens of millions of waveforms at thousands of stations. We are making use of this database to explore the power of seismic waveform correlation to quantify signal similarities, to discover new events not in catalogs, and to more accurately locate events and identify source types. Building on the very efficient correlation methodologies of Harris and Dodge (2011), we computed the waveform correlation for event pairs in the LLNL database in two ways. First we performed entire waveform cross-correlation over seven distinct frequency bands. The correlation coefficient exceeds 0.6 for more than 40 million waveform pairs for several hundred thousand events at more than a thousand stations. These correlations reveal clusters of mining events and aftershock sequences, which can be used to readily identify and locate events. Second we determine relative pick times by correlating signals in time windows for distinct seismic phases. These correlated picks are then used to perform very high accuracy event relocations. We are examining the percentage of events that correlate as a function of magnitude and observing station distance in selected high seismicity regions. Combining these empirical results and those using synthetic data, we are working to quantify relationships between correlation and event pair separation (in epicenter and depth) as well as mechanism differences. Our exploration of these techniques on a large seismic database is in process and we will report on our findings in more detail at the meeting.
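A minimal sketch of the normalized waveform cross-correlation underlying this kind of study, using the 0.6 threshold quoted in the abstract; the synthetic traces below stand in for real band-passed waveforms.

```python
import numpy as np

def max_normalized_cc(x, y):
    """Peak normalized cross-correlation between two traces; values near 1
    indicate nearly identical waveforms up to a time shift and scale."""
    x = x - x.mean()
    y = y - y.mean()
    cc = np.correlate(x, y, mode="full")  # correlation at every lag
    return cc.max() / (np.linalg.norm(x) * np.linalg.norm(y))

rng = np.random.default_rng(0)
master = rng.standard_normal(500)                       # stand-in for a band-passed trace
candidate = np.roll(master, 25) + 0.3 * rng.standard_normal(500)
print(max_normalized_cc(master, candidate) > 0.6)       # threshold used in the study
```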
George, Jaiben; Newman, Jared M; Ramanathan, Deepak; Klika, Alison K; Higuera, Carlos A; Barsoum, Wael K
2017-09-01
Research using large administrative databases has substantially increased in recent years, and the accuracy with which comorbidities are represented in these databases has been questioned. The purpose of this study was to evaluate the extent of errors in obesity coding and its impact on arthroplasty research. Eighteen thousand thirty primary total knee arthroplasties (TKAs) and 10,475 total hip arthroplasties (THAs) performed at a single healthcare system from 2004-2014 were included. Patients were classified as obese or nonobese using 2 methods: (1) body mass index (BMI) ≥30 kg/m² and (2) International Classification of Diseases, 9th revision (ICD-9) codes. Length of stay, operative time, and 90-day complications were collected. The effect of obesity on various outcomes was analyzed separately for both BMI- and coding-based obesity. From 2004 to 2014, the prevalence of BMI-based obesity increased from 54% to 63% and 40% to 45% in TKA and THA, respectively. The prevalence of coding-based obesity increased from 15% to 28% and 8% to 17% in TKA and THA, respectively. Coding overestimated the growth of obesity in TKA and THA by 5.6 and 8.4 times, respectively. When obesity was defined by coding, obesity was falsely shown to be a significant risk factor for deep vein thrombosis (TKA), pulmonary embolism (THA), and longer hospital stay (TKA and THA). The growth in obesity observed in administrative databases may be an artifact of improvements in coding over the years. Obesity defined by coding can overestimate the actual effect of obesity on complications after arthroplasty. Therefore, studies using large databases should be interpreted with caution, especially when variables prone to coding errors are involved. Copyright © 2017 Elsevier Inc. All rights reserved.
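The core comparison in the study, classifying the same patients as obese by measured BMI versus by diagnosis coding and quantifying the disagreement, can be sketched as follows; the records and column names are invented.

```python
import pandas as pd

# Invented arthroplasty records; the study compared BMI >= 30 kg/m^2
# against ICD-9 obesity codes for the same patients.
cases = pd.DataFrame({
    "bmi": [34.2, 28.5, 31.0, 41.7, 26.9],
    "icd9_obesity_code": [True, False, False, True, False],
})
cases["obese_bmi"] = cases["bmi"] >= 30            # definition 1: measured BMI
cases["obese_coded"] = cases["icd9_obesity_code"]  # definition 2: diagnosis coding

# The disagreement rate is the coding error that can bias database studies.
print((cases["obese_bmi"] != cases["obese_coded"]).mean())
```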
A Web-based Tool for SDSS and 2MASS Database Searches
NASA Astrophysics Data System (ADS)
Hendrickson, M. A.; Uomoto, A.; Golimowski, D. A.
We have developed a web site using HTML, PHP, Python, and MySQL that extracts, processes, and displays data from the Sloan Digital Sky Survey (SDSS) and the Two-Micron All-Sky Survey (2MASS). The goal is to locate brown dwarf candidates in the SDSS database by looking at color cuts; however, this site could also be useful for targeted searches of other databases. MySQL databases are created from broad searches of SDSS and 2MASS data. Broad queries on the SDSS and 2MASS database servers are run weekly so that observers have the most up-to-date information from which to select candidates for observation. Observers can look at detailed information about specific objects, including finding charts, images, and available spectra. In addition, updates from previous observations can be added by any collaborators; this format makes observational collaboration simple. Observers can also restrict the database search, just before or during an observing run, to select objects of special interest.
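The color-cut selection such a site performs can be illustrated with a self-contained query; sqlite3 stands in for the site's MySQL backend, and the table, columns, and the specific i-z cut are illustrative rather than the actual schema.

```python
import sqlite3

# Self-contained stand-in for the site's MySQL tables; names and the
# particular color cut are invented for illustration.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sdss_sources (objid INTEGER, i_mag REAL, z_mag REAL)")
db.executemany("INSERT INTO sdss_sources VALUES (?, ?, ?)",
               [(1, 19.8, 17.9), (2, 18.2, 17.9), (3, 21.0, 19.2)])

# A red i-z color cut of the kind used to flag brown dwarf candidates.
rows = db.execute(
    "SELECT objid, i_mag - z_mag AS iz FROM sdss_sources WHERE i_mag - z_mag > 1.5"
).fetchall()
print(rows)  # objects 1 and 3 pass the cut
```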
Nelson, Jennifer Clark; Marsh, Tracey; Lumley, Thomas; Larson, Eric B; Jackson, Lisa A; Jackson, Michael L
2013-08-01
Estimates of treatment effectiveness in epidemiologic studies using large observational health care databases may be biased owing to inaccurate or incomplete information on important confounders. Study methods that collect and incorporate more comprehensive confounder data on a validation cohort may reduce confounding bias. We applied two such methods, namely imputation and reweighting, to Group Health administrative data (full sample) supplemented by more detailed confounder data from the Adult Changes in Thought study (validation sample). We used influenza vaccination effectiveness (with an unexposed comparator group) as an example and evaluated each method's ability to reduce bias using the control time period before influenza circulation. Both methods reduced, but did not completely eliminate, the bias compared with traditional effectiveness estimates that do not use the validation sample confounders. Although these results support the use of validation sampling methods to improve the accuracy of comparative effectiveness findings from health care database studies, they also illustrate that the success of such methods depends on many factors, including the ability to measure important confounders in a representative and large enough validation sample, the comparability of the full sample and validation sample, and the accuracy with which the data can be imputed or reweighted using the additional validation sample information. Copyright © 2013 Elsevier Inc. All rights reserved.
Nelson, Jennifer C.; Marsh, Tracey; Lumley, Thomas; Larson, Eric B.; Jackson, Lisa A.; Jackson, Michael
2014-01-01
Objective Estimates of treatment effectiveness in epidemiologic studies using large observational health care databases may be biased due to inaccurate or incomplete information on important confounders. Study methods that collect and incorporate more comprehensive confounder data on a validation cohort may reduce confounding bias. Study Design and Setting We applied two such methods, imputation and reweighting, to Group Health administrative data (full sample) supplemented by more detailed confounder data from the Adult Changes in Thought study (validation sample). We used influenza vaccination effectiveness (with an unexposed comparator group) as an example and evaluated each method’s ability to reduce bias using the control time period prior to influenza circulation. Results Both methods reduced, but did not completely eliminate, the bias compared with traditional effectiveness estimates that do not utilize the validation sample confounders. Conclusion Although these results support the use of validation sampling methods to improve the accuracy of comparative effectiveness findings from healthcare database studies, they also illustrate that the success of such methods depends on many factors, including the ability to measure important confounders in a representative and large enough validation sample, the comparability of the full sample and validation sample, and the accuracy with which data can be imputed or reweighted using the additional validation sample information. PMID:23849144
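Of the two methods, reweighting is the easier to sketch: validation-sample subjects, who have the extra confounder measured, are up-weighted by the inverse of their sampling fraction within strata of variables shared with the full sample. All numbers and variable names below are invented.

```python
import pandas as pd

# Toy full cohort (administrative data only) and validation sample
# (which also has an extra confounder, here "frailty", measured).
full = pd.DataFrame({"age_group": ["65-74"] * 600 + ["75+"] * 400})
valid = pd.DataFrame({
    "age_group": ["65-74"] * 90 + ["75+"] * 60,
    "frailty":   [0] * 70 + [1] * 20 + [0] * 30 + [1] * 30,
})

# Weight = stratum size in full sample / stratum size in validation sample.
n_full = full["age_group"].value_counts()
n_valid = valid["age_group"].value_counts()
valid["weight"] = valid["age_group"].map(n_full / n_valid)

# The weighted validation sample now stands in for the full cohort in an
# effectiveness model that can additionally adjust for frailty.
print(valid.groupby("frailty")["weight"].sum())
```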
Design and implementation of a distributed large-scale spatial database system based on J2EE
NASA Astrophysics Data System (ADS)
Gong, Jianya; Chen, Nengcheng; Zhu, Xinyan; Zhang, Xia
2003-03-01
With the increasing maturity of distributed object technology, CORBA, .NET and EJB are widely used in the traditional IT field. However, the theory and practice of distributed spatial databases need further improvement because of the contradictions between large-scale spatial data and limited network bandwidth, and between short-lived sessions and long transaction processing. Differences and trends among CORBA, .NET and EJB are discussed in detail; afterwards the concept, architecture and characteristics of a distributed large-scale seamless spatial database system based on J2EE are presented, comprising a GIS client application, web server, GIS application server and spatial data server. Moreover, the design and implementation of the components are explained: the GIS client application based on JavaBeans, the GIS engine based on servlets, and the GIS application server based on GIS Enterprise JavaBeans (containing session beans and entity beans). In addition, experiments on the relation between spatial data volume and response time under different conditions were conducted, which prove that a distributed spatial database system based on J2EE can be used to manage, distribute and share large-scale spatial data on the Internet. Lastly, a distributed large-scale seamless image database based on the Internet is presented.
TabSQL: a MySQL tool to facilitate mapping user data to public databases.
Xia, Xiao-Qin; McClelland, Michael; Wang, Yipeng
2010-06-23
With advances in high-throughput genomics and proteomics, it is challenging for biologists to deal with large data files and to map their data to annotations in public databases. We developed TabSQL, a MySQL-based application tool, for viewing, filtering and querying data files with large numbers of rows. TabSQL provides functions for downloading and installing table files from public databases including the Gene Ontology database (GO), the Ensembl databases, and genome databases from the UCSC genome bioinformatics site. Any other database that provides tab-delimited flat files can also be imported. The downloaded gene annotation tables can be queried together with users' data in TabSQL using either a graphic interface or command line. TabSQL allows queries across the user's data and public databases without programming. It is a convenient tool for biologists to annotate and enrich their data.
TabSQL: a MySQL tool to facilitate mapping user data to public databases
2010-01-01
Background With advances in high-throughput genomics and proteomics, it is challenging for biologists to deal with large data files and to map their data to annotations in public databases. Results We developed TabSQL, a MySQL-based application tool, for viewing, filtering and querying data files with large numbers of rows. TabSQL provides functions for downloading and installing table files from public databases including the Gene Ontology database (GO), the Ensembl databases, and genome databases from the UCSC genome bioinformatics site. Any other database that provides tab-delimited flat files can also be imported. The downloaded gene annotation tables can be queried together with users' data in TabSQL using either a graphic interface or command line. Conclusions TabSQL allows queries across the user's data and public databases without programming. It is a convenient tool for biologists to annotate and enrich their data. PMID:20573251
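The kind of cross-database query TabSQL enables, a user's gene list joined against a downloaded annotation table, looks roughly like this; sqlite3 is used to keep the sketch self-contained (TabSQL itself is MySQL-based), and the table and column names are illustrative.

```python
import sqlite3

# Stand-ins for a user's results table and a downloaded GO annotation table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE user_hits (gene TEXT, fold_change REAL)")
db.execute("CREATE TABLE go_annotation (gene TEXT, go_term TEXT)")
db.executemany("INSERT INTO user_hits VALUES (?, ?)",
               [("TP53", 2.4), ("BRCA1", 1.9)])
db.executemany("INSERT INTO go_annotation VALUES (?, ?)",
               [("TP53", "GO:0006915"), ("BRCA1", "GO:0006281")])

# Annotate the user's hits without any programming beyond a join.
for row in db.execute("""SELECT u.gene, u.fold_change, g.go_term
                         FROM user_hits u
                         JOIN go_annotation g ON u.gene = g.gene"""):
    print(row)
```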
Creating databases for biological information: an introduction.
Stein, Lincoln
2002-08-01
The essence of bioinformatics is dealing with large quantities of information. Whether it be sequencing data, microarray data files, mass spectrometric data (e.g., fingerprints), the catalog of strains arising from an insertional mutagenesis project, or even large numbers of PDF files, there inevitably comes a time when the information can simply no longer be managed with files and directories. This is where databases come into play. This unit briefly reviews the characteristics of several database management systems, including flat file, indexed file, and relational databases, as well as ACeDB. It compares their strengths and weaknesses and offers some general guidelines for selecting an appropriate database management system.
NASA Technical Reports Server (NTRS)
Touch, Joseph D.
1994-01-01
Future NASA earth science missions, including the Earth Observing System (EOS), will be generating vast amounts of data that must be processed and stored at various locations around the world. Here we present a stepwise-refinement of the intelligent database management (IDM) of the distributed active archive center (DAAC - one of seven regionally-located EOSDIS archive sites) architecture, to showcase the telecommunications issues involved. We develop this architecture into a general overall design. We show that the current evolution of protocols is sufficient to support IDM at Gbps rates over large distances. We also show that network design can accommodate a flexible data ingestion storage pipeline and a user extraction and visualization engine, without interference between the two.
QuakeML - An XML Schema for Seismology
NASA Astrophysics Data System (ADS)
Wyss, A.; Schorlemmer, D.; Maraini, S.; Baer, M.; Wiemer, S.
2004-12-01
We propose an extensible format definition for seismic data (QuakeML). Sharing data and seismic information efficiently is one of the most important issues for research and observational seismology in the future. The eXtensible Markup Language (XML) is playing an increasingly important role in the exchange of a variety of data. Due to its extensible definition capabilities, its wide acceptance, and the existing large number of utilities and libraries for XML, a structured representation of various types of seismological data should, in our opinion, be developed by defining a 'QuakeML' standard. Here we present the QuakeML definitions for parameter databases and further efforts, e.g. a central QuakeML catalog database and a web portal for exchanging codes and stylesheets.
[Privacy and public benefit in using large scale health databases].
Yamamoto, Ryuichi
2014-01-01
In Japan, large-scale health databases have been constructed within a few years, such as the national claims and health checkup database (NDB) and the Japanese Sentinel project, but there are legal issues in striking an adequate balance between privacy and the public benefit of using such databases. The NDB operates under the act on health care for elderly persons, yet this act says nothing about using the database for general public benefit. Researchers who use it are therefore forced to devote considerable attention to anonymization and information security, which can hamper the research itself. The Japanese Sentinel project is a national project for detecting adverse drug reactions using large-scale distributed clinical databases of large hospitals. Although patients give broad consent for such use for the public good, the use of insufficiently anonymized data is still under discussion. Generally speaking, research conducted for public benefit will not infringe patients' privacy, but vague and complex legislative requirements for personal data protection can obstruct such research. Medical science does not progress without using clinical information; adequate legislation that is simple and clear for both researchers and patients is therefore strongly needed. In Japan, a specific act balancing privacy and public benefit is now under discussion. The author recommends that researchers, including those in the field of pharmacology, pay attention to, participate in the discussion of, and make suggestions for such acts and regulations.
Impact of line parameter database and continuum absorption on GOSAT TIR methane retrieval
NASA Astrophysics Data System (ADS)
Yamada, A.; Saitoh, N.; Nonogaki, R.; Imasu, R.; Shiomi, K.; Kuze, A.
2017-12-01
The current methane retrieval algorithm (V1) for the thermal infrared (TIR) band of the Thermal and Near-infrared Sensor for Carbon Observation Fourier Transform Spectrometer (TANSO-FTS) onboard the Greenhouse Gases Observing Satellite (GOSAT), covering the wavenumber range from 1210 cm⁻¹ to 1360 cm⁻¹ including the CH₄ ν₄ band, uses LBLRTM V12.1 with the AER V3.1 line database and the MT_CKD 2.5.2 continuum absorption model to calculate optical depth. Since line parameter databases have been updated and the continuum absorption may have large uncertainty, the purpose of this study is to assess the impact on CH₄ retrieval of the choice of line parameter database and of the uncertainty in continuum absorption. We retrieved CH₄ profiles with the line parameter database replaced by AER V1.0, HITRAN 2004, HITRAN 2008, AER V3.2, or HITRAN 2012 (Rothman et al. 2005, 2009, and 2013; Clough et al., 2005). We also assumed 10% larger continuum absorption coefficients and a 50% larger temperature-dependence coefficient of the continuum absorption, based on the report by Paynter and Ramaswamy (2014). We compared the retrieved CH₄ with the HIPPO CH₄ observations (Wofsy et al., 2012). The difference from the HIPPO observations was smallest for AER V3.2, at 24.1 ± 45.9 ppbv. The differences for AER V1.0, HITRAN 2004, HITRAN 2008, and HITRAN 2012 were 35.6 ± 46.5 ppbv, 37.6 ± 46.3 ppbv, 32.1 ± 46.1 ppbv, and 35.2 ± 46.0 ppbv, respectively. The maximum CH₄ retrieval difference was -0.4 ppbv at the 314 hPa layer when we used 10% larger absorption coefficients for the H₂O foreign continuum. Comparing the AER V3.2 case to the HITRAN 2008 case, the line coupling effect reduced the difference by 8.0 ppbv. Line coupling effects are important for GOSAT TIR CH₄ retrieval, whereas effects from the uncertainty of continuum absorption were negligibly small.
A database of microwave and sub-millimetre ice particle single scattering properties
NASA Astrophysics Data System (ADS)
Ekelund, Robin; Eriksson, Patrick
2016-04-01
Ice crystal particles are today a large contributing factor to why cold clouds such as cirrus remain a major uncertainty in global climate models and measurements. The reason is the complex and varied morphology in which ice particles appear, as compared to liquid droplets with their generally spheroidal shape, which makes the description of the electromagnetic properties of ice particles more complicated. Single scattering properties of frozen hydrometeors have traditionally been approximated by representing the particles as spheres using Mie theory. While such practices may work well in radio applications, where the size parameter of the particles is generally low, comparisons with measurements and simulations show that this assumption is insufficient when observing tropospheric cloud ice in the microwave or sub-millimetre regions. In order to assist the radiative transfer and remote sensing communities, a database of single scattering properties of semi-realistic particles is being produced. The data are generated using DDA (Discrete Dipole Approximation) code, which can treat arbitrarily shaped particles, and T-matrix code for simpler shapes when found sufficiently accurate. The aim has been mainly to cover frequencies used by the upcoming ICI (Ice Cloud Imager) mission, with launch in 2022. Examples of particles to be included are columns, plates, bullet rosettes, sector snowflakes and aggregates. The idea is to treat particles with good average optical properties with respect to the multitude of particle and aggregate types appearing in nature. The database will initially only cover macroscopically isotropic orientation, but will eventually also include horizontally aligned particles. Databases of DDA particles do already exist with varying accessibility; the goal of this database is to complement existing data. Regarding distribution, the plan is for the database to be available in conjunction with the ARTS (Atmospheric Radiative Transfer Simulator) project.
High Performance Semantic Factoring of Giga-Scale Semantic Graph Databases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Joslyn, Cliff A.; Adolf, Robert D.; Al-Saffar, Sinan
2010-10-04
As semantic graph database technology grows to address components ranging from extant large triple stores to SPARQL endpoints over SQL-structured relational databases, it will become increasingly important to be able to bring high performance computational resources to bear on their analysis, interpretation, and visualization, especially with respect to their innate semantic structure. Our research group built a novel high performance hybrid system comprising computational capability for semantic graph database processing utilizing the large multithreaded architecture of the Cray XMT platform, conventional clusters, and large data stores. In this paper we describe that architecture, and present the results of our deploying that for the analysis of the Billion Triple dataset with respect to its semantic factors.
Development of a reference database for assessing dietary nitrate in vegetables.
Blekkenhorst, Lauren C; Prince, Richard L; Ward, Natalie C; Croft, Kevin D; Lewis, Joshua R; Devine, Amanda; Shinde, Sujata; Woodman, Richard J; Hodgson, Jonathan M; Bondonno, Catherine P
2017-08-01
Nitrate from vegetables improves vascular health with short-term intake. Whether this translates into improved long-term health outcomes has yet to be investigated. To enable reliable analysis of nitrate intake from food records, there is a strong need for a comprehensive database of the nitrate content of vegetables. A systematic literature search (1980-2016) was performed using the Medline, Agricola and Commonwealth Agricultural Bureaux abstracts databases. The resulting database contains 4237 records from 255 publications with data on 178 vegetables and 22 herbs and spices. The nitrate content of individual vegetables ranged from Chinese flat cabbage (median; range: 4240; 3004-6310 mg/kg FW) to corn (median; range: 12; 5-1091 mg/kg FW). The database was applied to estimate vegetable nitrate intake using 24-h dietary recalls (24-HDRs) and food frequency questionnaires (FFQs). Significant correlations were observed between urinary nitrate excretion and 24-HDR (r = 0.4, P = 0.013), between 24-HDR and 12 month FFQs (r = 0.5, P < 0.001), as well as between two 4 week FFQs administered 8 weeks apart (r = 0.86, P < 0.001). This comprehensive nitrate database allows quantification of dietary nitrate from a large variety of vegetables. It can be applied to dietary records to explore the associations between nitrate intake and health outcomes in human studies. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
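Applying such a database to a food record reduces to a weighted sum; the two median values below are quoted in the abstract, while the portion sizes are hypothetical.

```python
# Estimating dietary nitrate from a food record with the database's median
# values (mg nitrate per kg fresh weight).
nitrate_mg_per_kg = {"chinese_flat_cabbage": 4240, "corn": 12}  # medians from the abstract
portions_g = {"chinese_flat_cabbage": 80, "corn": 150}          # hypothetical portions

intake_mg = sum(nitrate_mg_per_kg[v] * portions_g[v] / 1000 for v in portions_g)
print(f"estimated nitrate intake: {intake_mg:.1f} mg")  # 339.2 + 1.8 = 341.0 mg
```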
An experimental investigation of masking in the US FDA adverse event reporting system database.
Wang, Hsin-wei; Hochberg, Alan M; Pearson, Ronald K; Hauben, Manfred
2010-12-01
A phenomenon of 'masking' or 'cloaking' in pharmacovigilance data mining has been described, which can potentially cause signals of disproportionate reporting (SDRs) to be missed, particularly in pharmaceutical company databases. Masking has been predicted theoretically, observed anecdotally or studied to a limited extent in both pharmaceutical company and health authority databases, but no previous publication systematically assesses its occurrence in a large health authority database. To explore the nature, extent and possible consequences of masking in the US FDA Adverse Event Reporting System (AERS) database by applying various experimental unmasking protocols to a set of drugs and events representing realistic pharmacovigilance analysis conditions. This study employed AERS data from 2001 through 2005. For a set of 63 Medical Dictionary for Regulatory Activities (MedDRA®) Preferred Terms (PTs), disproportionality analysis was carried out with respect to all drugs included in the AERS database, using a previously described urn-model-based algorithm. We specifically sought masking in which drug removal induced an increase in the statistical representation of a drug-event combination (DEC) that resulted in the emergence of a new SDR. We performed a series of unmasking experiments selecting drugs for removal using rational statistical decision rules based on the requirement of a reporting ratio (RR) >1, top-ranked statistical unexpectedness (SU) and relatedness as reflected in the WHO Anatomical Therapeutic Chemical level 4 (ATC4) grouping. In order to assess the possible extent of residual masking we performed two supplemental purely empirical analyses on a limited subset of data. This entailed testing every drug and drug group to determine which was most influential in uncovering masked SDRs. We assessed the strength of external evidence for a causal association for a small number of masked SDRs involving a subset of 29 drugs for which level of evidence adjudication was available from a previous study. The original disproportionality analysis identified 8719 SDRs for the 63 PTs. The SU-based unmasking protocols generated variable numbers of masked SDRs ranging from 38 to 156, representing a 0.43-1.8% increase over the number of baseline SDRs. A significant number of baseline SDRs were also lost in the course of our experiments. The trend in the number of gained SDRs per report removed was inversely related to the number of lost SDRs per protocol. Both the number and nature of the reports removed influenced the number of gained SDRs observed. The purely empirical protocols unmasked up to ten times as many SDRs. None of the masked SDRs had strong external evidence supporting a causal association. Most involved associations for which there was no external supporting evidence or were in the original product label. For two masked SDRs, there was external evidence of a possible causal association. We documented masking in the FDA AERS database. Attempts at unmasking SDRs using practically implementable protocols produced only small changes in the output of SDRs in our analysis. This is undoubtedly related to the large size and diversity of the database, but the complex interdependencies between drugs and events in authentic spontaneous reporting system (SRS) databases, and the impact of measures of statistical variability that are typically used in real-world disproportionality analysis, may be additional factors that constrain the discovery of masked SDRs and which may also operate in pharmaceutical company databases. Empirical determination of the most influential drugs may uncover significantly more SDRs than protocols based on predetermined statistical selection rules but are impractical except possibly for evaluating specific events. Routine global exercises to elicit masking, especially in large health authority databases, are not justified based on results available to date. Exercises to elicit unmasking should be driven by prior knowledge or obvious data imbalances.
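The masking mechanism can be seen in a toy disproportionality calculation: removing a drug that dominates an event's reports lowers the expected count for other drugs and can push a drug-event combination over the signaling threshold. The counts below are invented, and RR is computed simply as observed over expected under independence.

```python
# Minimal sketch of masking in disproportionality analysis; all counts invented.
def reporting_ratio(n_de, n_d, n_e, n_total):
    """Observed drug-event reports over the count expected if drug and
    event occurred independently in the database."""
    expected = n_d * n_e / n_total
    return n_de / expected

# Event E is dominated by reports for drug A, diluting drug B's signal.
print(reporting_ratio(n_de=20, n_d=5_000, n_e=2_000, n_total=1_000_000))  # ~2.0

# Remove drug A's reports (1,500 event-E reports out of its 40,000 total):
# drug B's RR for the same 20 reports rises sharply, i.e. it was masked.
print(reporting_ratio(n_de=20, n_d=5_000, n_e=500, n_total=960_000))      # ~7.7
```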
Jagtap, Pratik; Goslinga, Jill; Kooren, Joel A; McGowan, Thomas; Wroblewski, Matthew S; Seymour, Sean L; Griffin, Timothy J
2013-04-01
Large databases (>10⁶ sequences) used in metaproteomic and proteogenomic studies present challenges in matching peptide sequences to MS/MS data using database-search programs. Most notably, strict filtering to avoid false-positive matches leads to more false negatives, thus constraining the number of peptide matches. To address this challenge, we developed a two-step method wherein matches derived from a primary search against a large database were used to create a smaller subset database. The second search was performed against a target-decoy version of this subset database merged with a host database. High confidence peptide sequence matches were then used to infer protein identities. Applying our two-step method for both metaproteomic and proteogenomic analysis resulted in twice the number of high confidence peptide sequence matches in each case, as compared to the conventional one-step method. The two-step method captured almost all of the same peptides matched by the one-step method, with a majority of the additional matches being false negatives from the one-step method. Furthermore, the two-step method improved results regardless of the database search program used. Our results show that our two-step method maximizes the peptide matching sensitivity for applications requiring large databases, especially valuable for proteogenomics and metaproteomics studies. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
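A toy, self-contained illustration of the two-step control flow: a permissive first pass against the large database selects a subset, and the confident second pass runs only against that subset. Real searches use MS/MS engines plus target-decoy filtering; here "matching" is naive substring lookup purely to show the workflow.

```python
# Toy sequence database and observed peptides; all strings invented.
large_db = {
    "protA": "MKTAYIAKQRQISFVK",
    "protB": "GGLNDIFEAQKIEWHE",
    "protC": "MSSHHHHHHSSGLVPR",
}
observed_peptides = ["AYIAK", "IFEAQ", "ZZZZZ"]

# Step 1: permissive search of the full database builds a subset database
# from every protein with at least one candidate match.
subset_db = {pid: seq for pid, seq in large_db.items()
             if any(p in seq for p in observed_peptides)}

# Step 2: strict search against the much smaller subset (in the real method,
# merged with a host database and decoy sequences before FDR filtering).
confident = [(p, pid) for p in observed_peptides
             for pid, seq in subset_db.items() if p in seq]
print(sorted(subset_db), confident)
```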
Spectroscopic Investigations of Fragment Species in the Coma
NASA Technical Reports Server (NTRS)
Feldman, Paul D.; Cochran, Anita L.; Combi, Michael R.
2004-01-01
The content of the gaseous coma of a comet is dominated by fragment species produced by photolysis of the parent molecules issuing directly from the icy nucleus of the comet. Spectroscopy of these species provides complementary information on the physical state of the coma to that obtained from observations of the parent species. Extraction of physical parameters requires detailed molecular and atomic data together with reliable high-resolution spectra and absolute fluxes of the primary source of excitation, the Sun. The large database of observations, dating back more than a century, provides a means to assess the chemical and evolutionary diversity of comets.
Discovering H-bonding rules in crystals with inductive logic programming.
Ando, Howard Y; Dehaspe, Luc; Luyten, Walter; Van Craenenbroeck, Elke; Vandecasteele, Henk; Van Meervelt, Luc
2006-01-01
In the domain of crystal engineering, various schemes have been proposed for the classification of hydrogen bonding (H-bonding) patterns observed in 3D crystal structures. In this study, the aim is to complement these schemes with rules that predict H-bonding in crystals from 2D structural information only. Modern computational power and the advances in inductive logic programming (ILP) can now provide computational chemistry with the opportunity for extracting structure-specific rules from large databases that can be incorporated into expert systems. ILP technology is here applied to H-bonding in crystals to develop a self-extracting expert system utilizing data in the Cambridge Structural Database of small molecule crystal structures. A clear increase in performance was observed when the ILP system DMax was allowed to refer to the local structural environment of the possible H-bond donor/acceptor pairs. This ability distinguishes ILP from more traditional approaches that build rules on the basis of global molecular properties.
The thermodynamic scale of inorganic crystalline metastability
Sun, Wenhao; Dacek, Stephen T.; Ong, Shyue Ping; Hautier, Geoffroy; Jain, Anubhav; Richards, William D.; Gamst, Anthony C.; Persson, Kristin A.; Ceder, Gerbrand
2016-01-01
The space of metastable materials offers promising new design opportunities for next-generation technological materials, such as complex oxides, semiconductors, pharmaceuticals, steels, and beyond. Although metastable phases are ubiquitous in both nature and technology, only a heuristic understanding of their underlying thermodynamics exists. We report a large-scale data-mining study of the Materials Project, a high-throughput database of density functional theory–calculated energetics of Inorganic Crystal Structure Database structures, to explicitly quantify the thermodynamic scale of metastability for 29,902 observed inorganic crystalline phases. We reveal the influence of chemistry and composition on the accessible thermodynamic range of crystalline metastability for polymorphic and phase-separating compounds, yielding new physical insights that can guide the design of novel metastable materials. We further assert that not all low-energy metastable compounds can necessarily be synthesized, and propose a principle of ‘remnant metastability’—that observable metastable crystalline phases are generally remnants of thermodynamic conditions where they were once the lowest free-energy phase. PMID:28138514
Mapel, D; Pearson, M
2002-08-01
Healthcare payers make decisions on funding for treatments for diseases, such as chronic obstructive pulmonary disease (COPD), on a population level, and so require evidence of treatment success in appropriate populations, using usual routine care as the comparison for alternative management approaches. Such health outcomes evidence can be obtained from a number of sources. The 'gold standard' method for obtaining evidence of treatment success is usually taken to be the randomized controlled prospective clinical trial. Yet the value of such studies in providing evidence for decision-makers can be questioned owing to restricted entry criteria limiting the ability to generalize to real-life populations, a narrow focus on individual parameters, the use of placebo for comparison rather than usual therapy, and unrealistically intense monitoring of patients. Evidence obtained from retrospective and observational studies can supplement that from randomized clinical trials, provided that care is taken to guard against bias and confounders. However, very large numbers of patients must be investigated if small differences between drugs and treatment approaches are to be detected. Administrative databases from healthcare systems provide an opportunity to obtain observational data on large numbers of patients. Such databases have shown that high healthcare costs in patients with COPD are associated with comorbid conditions and current smoking status. Analysis of an administrative database has also shown that elderly patients with COPD who received inhaled corticosteroids within 90 days of discharge from hospital had 24% fewer repeat hospitalizations for COPD and were 29% less likely to die during the 1-year follow-up period. In conclusion, there are a number of sources of meaningful evidence of the health outcomes arising from different therapeutic approaches that should be of value to healthcare payers making decisions on resource allocation.
Orthographic and Phonological Neighborhood Databases across Multiple Languages.
Marian, Viorica
2017-01-01
The increased globalization of science and technology and the growing number of bilinguals and multilinguals in the world have made research with multiple languages a mainstay for scholars who study human function and especially those who focus on language, cognition, and the brain. Such research can benefit from large-scale databases and online resources that describe and measure lexical, phonological, orthographic, and semantic information. The present paper discusses currently-available resources and underscores the need for tools that enable measurements both within and across multiple languages. A general review of language databases is followed by a targeted introduction to databases of orthographic and phonological neighborhoods. A specific focus on CLEARPOND illustrates how databases can be used to assess and compare neighborhood information across languages, to develop research materials, and to provide insight into broad questions about language. As an example of how using large-scale databases can answer questions about language, a closer look at neighborhood effects on lexical access reveals that not only orthographic, but also phonological neighborhoods can influence visual lexical access both within and across languages. We conclude that capitalizing upon large-scale linguistic databases can advance, refine, and accelerate scientific discoveries about the human linguistic capacity.
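As a concrete example of the neighborhood measures such databases tabulate, here is a sketch of an orthographic neighborhood in the substitution-only (Coltheart) sense; the mini-lexicon is illustrative.

```python
# Orthographic neighbors: words of the same length differing from the
# target by exactly one letter substitution.
def orthographic_neighbors(word, lexicon):
    return [w for w in lexicon
            if len(w) == len(word) and w != word
            and sum(a != b for a, b in zip(w, word)) == 1]

lexicon = ["cat", "cot", "cut", "car", "bat", "dog", "cats"]
print(orthographic_neighbors("cat", lexicon))  # ['cot', 'cut', 'car', 'bat']
```

The neighborhood *size* (here 4) is the kind of per-word statistic a resource like CLEARPOND reports, within one language or across a bilingual's two lexicons.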
Large Scale Landslide Database System Established for the Reservoirs in Southern Taiwan
NASA Astrophysics Data System (ADS)
Tsai, Tsai-Tsung; Tsai, Kuang-Jung; Shieh, Chjeng-Lun
2017-04-01
Typhoon Morakot's severe impact on southern Taiwan awakened public awareness of large-scale landslide disasters, which produce large quantities of sediment that degrade the operating functions of reservoirs. In order to reduce the risk of these disasters within the study area, the establishment of a database for hazard mitigation and disaster prevention is necessary. Real-time data and extensive archives of engineering data, environmental information, photographs, and video not only help people make appropriate decisions, but are also of great value when processed and enriched. This study defined basic data formats and standards for the various types of data collected about these reservoirs and then provided a management platform based on these formats and standards. Meanwhile, for practicality and convenience, the large-scale landslide disaster database system was built with the ability both to provide and to receive information, so that users can operate it on different types of devices. Because information technology progresses extremely quickly, even the most modern system may soon be out of date; in order to provide long-term service, the system therefore reserves the possibility of user-defined data formats/standards and a user-defined system structure. The system established by this study was based on the HTML5 standard and responsive web design technology, which makes the large-scale landslide disaster database system easy for users to handle and to develop further.
NASA Astrophysics Data System (ADS)
Cramer, C. H.; Kutliroff, J.; Dangkua, D.
2010-12-01
A five-year Next Generation Attenuation (NGA) East project to develop new ground motion prediction equations for stable continental regions (SCRs), including eastern North America (ENA), has begun at the Pacific Earthquake Engineering Research (PEER) Center, funded by the Nuclear Regulatory Commission (NRC), the U.S. Geological Survey (USGS), the Electric Power Research Institute (EPRI), and the Department of Energy (DOE). The initial effort focused on database design and collection of appropriate M>4 ENA broadband and accelerograph records to populate the database. Ongoing work has focused on adding records from smaller ENA earthquakes and from other SCRs such as Europe, Australia, and India. Currently, over 6500 horizontal and vertical component records from 60 ENA earthquakes have been collected and prepared (instrument response removed, filtering to the acceptable-signal band, determining peak and spectral parameter values, quality assurance, etc.) for the database. Geological Survey of Canada (GSC) strong motion recordings, previously not available, have also been added to the NGA East database. The additional earthquakes increase the number of ground motion recordings in the 10-100 km range, particularly from the 2008 M5.2 Mt. Carmel, IL event, and the 2005 M4.7 Riviere du Loup and 2010 M5.0 Val des Bois earthquakes in Quebec, Canada. The goal is to complete the ENA database and make it available in 2011, followed by an SCR database in 2012. Comparisons of ground motion observations from four recent M5 ENA earthquakes with current ENA ground motion prediction equations (GMPEs) suggest that current GMPEs, as a group, reasonably agree with M5 observations at short periods, particularly at distances less than 200 km. However, at one second, current GMPEs overpredict M5 ground motion observations. The 2001 M7.6 Bhuj, India, earthquake provides some constraint at large magnitudes, as its geology and regional attenuation are analogous to ENA's. Cramer and Kumar (2003) have shown that ENA GMPEs generally agree with the Bhuj dataset within 300 km at short and long periods. But the Bhuj earthquake does not exhibit the intermediate-period spectral sag (Atkinson, 1993) of larger ENA earthquakes, and thus the Bhuj ground motions may be larger than what could be expected at one second for M7s in ENA.
Large-scale annotation of small-molecule libraries using public databases.
Zhou, Yingyao; Zhou, Bin; Chen, Kaisheng; Yan, S Frank; King, Frederick J; Jiang, Shumei; Winzeler, Elizabeth A
2007-01-01
While many large publicly accessible databases provide excellent annotation for biological macromolecules, the same is not true for small chemical compounds. Commercial data sources also fail to encompass an annotation interface for large numbers of compounds and tend to be too costly to be widely available to biomedical researchers. Therefore, using annotation information for the selection of lead compounds from a modern-day high-throughput screening (HTS) campaign presently occurs only on a very limited scale. The recent rapid expansion of the NIH PubChem database provides an opportunity to link existing biological databases with compound catalogs and provides relevant information that could potentially improve the information garnered from large-scale screening efforts. Using the 2.5 million compound collection at the Genomics Institute of the Novartis Research Foundation (GNF) as a model, we determined that approximately 4% of the library contained compounds with potential annotation in such databases as PubChem and the World Drug Index (WDI) as well as related databases such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) and ChemIDplus. Furthermore, the exact structure match analysis showed 32% of GNF compounds can be linked to third party databases via PubChem. We also showed annotations such as MeSH (medical subject headings) terms can be applied to in-house HTS databases in identifying signature biological inhibition profiles of interest as well as expediting the assay validation process. The automated annotation of thousands of screening hits in batch is becoming feasible and has the potential to play an essential role in the hit-to-lead decision making process.
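The batch annotation step reduces to an exact-structure lookup once compounds are keyed on a canonical identifier; the sketch below assumes InChIKeys as the match key and invents a tiny annotation mapping (the study itself matched structures via PubChem).

```python
# Invented annotation mapping keyed on canonical structure identifiers.
annotations = {
    "BSYNRYMUTXBXSQ-UHFFFAOYSA-N": {"name": "aspirin",
                                    "mesh": ["Anti-Inflammatory Agents"]},
}
screening_hits = [
    "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",   # exact match found
    "XXXXXXXXXXXXXX-UHFFFAOYSA-N",   # novel compound, no annotation
]

for key in screening_hits:
    # In the study, ~96% of the library had no annotation in public sources.
    print(key, "->", annotations.get(key, "no annotation"))
```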
Comparison of the Frontier Distributed Database Caching System to NoSQL Databases
NASA Astrophysics Data System (ADS)
Dykstra, Dave
2012-12-01
One of the main attractions of non-relational “NoSQL” databases is their ability to scale to large numbers of readers, including readers spread over a wide area. The Frontier distributed database caching system, used in production by the Large Hadron Collider CMS and ATLAS detector projects for Conditions data, is based on traditional SQL databases but also adds high scalability and the ability to be distributed over a wide-area for an important subset of applications. This paper compares the major characteristics of the two different approaches and identifies the criteria for choosing which approach to prefer over the other. It also compares in some detail the NoSQL databases used by CMS and ATLAS: MongoDB, CouchDB, HBase, and Cassandra.
NASA Astrophysics Data System (ADS)
Paiva, L. M. S.; Bodstein, G. C. R.; Pimentel, L. C. G.
2014-08-01
Large-eddy simulations are performed using the Advanced Regional Prediction System (ARPS) code at horizontal grid resolutions as fine as 300 m to assess the influence of detailed and updated surface databases on the modeling of local atmospheric circulation systems of urban areas with complex terrain. Applications to air pollution and wind energy are sought. These databases are comprised of 3 arc-sec topographic data from the Shuttle Radar Topography Mission, 10 arc-sec vegetation-type data from the European Space Agency (ESA) GlobCover project, and 30 arc-sec leaf area index and fraction of absorbed photosynthetically active radiation data from the ESA GlobCarbon project. Simulations are carried out for the metropolitan area of Rio de Janeiro using six one-way nested-grid domains that allow the choice of distinct parametric models and vertical resolutions associated with each grid. ARPS is initialized using the Global Forecast System with 0.5°-resolution data from the National Center of Environmental Prediction, which is also used every 3 h as lateral boundary condition. Topographic shading is turned on and two soil layers are used to compute the soil temperature and moisture budgets in all runs. Results for two simulated runs covering three periods of time are compared to surface and upper-air observational data to explore the dependence of the simulations on initial and boundary conditions, grid resolution, and topographic and land-use databases. Our comparisons show overall good agreement between simulated and observational data, mainly for the potential temperature and the wind speed fields, and clearly indicate that the use of high-resolution databases improves significantly our ability to predict the local atmospheric circulation.
Forecasting Safe or Dangerous Space Weather from HMI Magnetograms
NASA Technical Reports Server (NTRS)
Falconer, David; Barghouty, Abdulnasser F.; Khazanov, Igor; Moore, Ron
2011-01-01
We have developed a space-weather forecasting tool using an active-region free-energy proxy that was measured from MDI line-of-sight magnetograms. To develop this forecasting tool (Falconer et al. 2011, Space Weather Journal, in press), we used a database of 40,000 MDI magnetograms of 1300 active regions observed by MDI during the previous solar cycle (cycle 23). From each magnetogram we measured our free-energy proxy, and for each active region we determined its history of major flare, CME and Solar Particle Event (SPE) production. This database determines, from the value of an active region's free-energy proxy, the active region's expected rate of production of 1) major flares, 2) CMEs, 3) fast CMEs, and 4) SPEs during the next few days. This tool was delivered to NASA/SRAG in 2010. With MDI observations ending, we have to be able to use HMI magnetograms instead of MDI magnetograms. One of the difficulties is that the measured value of the free-energy proxy is sensitive to the spatial resolution of the measured magnetogram: the 0.5″/pixel resolution of HMI gives a different value for the free-energy proxy than the 2″/pixel resolution of MDI. To use our MDI-database forecasting curves until a comparably large HMI database is accumulated, we smooth HMI line-of-sight magnetograms to MDI resolution, so that we can use HMI to find the value of the free-energy proxy that MDI would have measured, and then use the forecasting curves given by the MDI database. The new version for use with HMI magnetograms was delivered to NASA/SRAG (March 2011). It can also use GONG magnetograms as a backup.
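The HMI-to-MDI smoothing step can be approximated by block-averaging the 0.5″/pixel HMI magnetogram down to the 2″/pixel MDI scale (a factor of 4 in each dimension); the actual pipeline's smoothing kernel may differ, and the toy field below is random.

```python
import numpy as np

def block_average(img, factor=4):
    """Degrade resolution by averaging non-overlapping factor x factor blocks."""
    h = (img.shape[0] // factor) * factor
    w = (img.shape[1] // factor) * factor
    return img[:h, :w].reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

# Toy stand-in for an HMI line-of-sight magnetogram (values in gauss).
hmi = np.random.default_rng(1).normal(0.0, 100.0, size=(1024, 1024))
mdi_like = block_average(hmi)   # 0.5"/pixel -> 2"/pixel
print(mdi_like.shape)           # (256, 256)
```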
An Emerging Role for Polystores in Precision Medicine
DOE Office of Scientific and Technical Information (OSTI.GOV)
Begoli, Edmon; Christian, J. Blair; Gadepally, Vijay
Medical data is organically heterogeneous, and it usually varies significantly in both size and composition. Yet this data is also key to the recent and promising field of precision medicine, which focuses on identifying and tailoring appropriate medical treatments for the needs of individual patients, based on their specific conditions, their medical history, lifestyle, genetic, and other individual factors. As we, and the database community at large, recognize that a "one size does not fit all" solution is required to work with such data, we present in this paper our observations based on our experiences and applications in the field of precision medicine. Finally, we make the case for the use of a polystore architecture and how it applies to precision medicine; we discuss the reference architecture, describe some of its critical components (an array database), and discuss the specific types of analysis that directly benefit from this database architecture and the ways it serves the data.
Monitoring product safety in the postmarketing environment.
Sharrar, Robert G; Dieck, Gretchen S
2013-10-01
The safety profile of a medicinal product may change in the postmarketing environment. Safety issues not identified in clinical development may be seen and need to be evaluated. Methods of evaluating spontaneous adverse experience reports and identifying new safety risks include a review of individual reports, a review of a frequency distribution of a list of the adverse experiences, the development and analysis of a case series, and various ways of examining the database for signals of disproportionality, which may suggest a possible association. Regulatory agencies monitor product safety through a variety of mechanisms including signal detection of the adverse experience safety reports in databases and by requiring and monitoring risk management plans, periodic safety update reports and postauthorization safety studies. The United States Food and Drug Administration is working with public, academic and private entities to develop methods for using large electronic databases to actively monitor product safety. Important identified risks will have to be evaluated through observational studies and registries.
Speisky, Hernan; López-Alarcón, Camilo; Gómez, Maritza; Fuentes, Jocelyn; Sandoval-Acuña, Cristian
2012-09-12
This paper reports the first database on antioxidants contained in fruits produced and consumed within the south Andes region of South America. The database (www.portalantioxidantes.com) contains over 500 total phenolics (TP) and ORAC values for more than 120 species/varieties of fruits. All analyses were conducted by a single ISO/IEC 17025-certified laboratory. The characterization comprised native berries such as maqui (Aristotelia chilensis), murtilla (Ugni molinae), and calafate (Berberis microphylla), which largely outscored all other studied fruits. Major differences in TP and ORAC were observed as a function of the fruit variety in berries, avocado, cherries, and apples. In fruits such as pears, apples, apricots, and peaches, a significant part of the TP and ORAC was accounted for by the antioxidants present in the peel. These data should be useful to estimate the fruit-based intake of TP and, through the ORAC data, their antioxidant-related contribution to the diet of south Andes populations.
From randomized controlled trials to observational studies.
Silverman, Stuart L
2009-02-01
Randomized controlled trials are considered the gold standard in the hierarchy of research designs for evaluating the efficacy and safety of a treatment intervention. However, their results can have limited applicability to patients in clinical settings. Observational studies using large health care databases can complement findings from randomized controlled trials by assessing treatment effectiveness in patients encountered in day-to-day clinical practice. Results from these designs can expand upon outcomes of randomized controlled trials because of the use of larger and more diverse patient populations with common comorbidities and longer follow-up periods. Furthermore, well-designed observational studies can identify clinically important differences among therapeutic options and provide data on long-term drug effectiveness and safety.
DaVIE: Database for the Visualization and Integration of Epigenetic data
Fejes, Anthony P.; Jones, Meaghan J.; Kobor, Michael S.
2014-01-01
One of the challenges in the analysis of large data sets, particularly in a population-based setting, is the ability to perform comparisons across projects. This has to be done in such a way that the integrity of each individual project is maintained, while ensuring that the data are comparable across projects. These issues are beginning to be observed in human DNA methylation studies, as the Illumina 450k platform and next generation sequencing-based assays grow in popularity and decrease in price. This increase in productivity is enabling new insights into epigenetics, but also requires the development of pipelines and software capable of handling the large volumes of data. The specific problems inherent in creating a platform for the storage, comparison, integration, and visualization of DNA methylation data include data storage, algorithm efficiency, and the ability to interpret the results to derive biological meaning from them. Databases provide a ready-made solution to these issues, but as yet no tools exist that leverage these advantages while providing an intuitive user interface for interpreting results in a genomic context. We have addressed this void by integrating a database to store DNA methylation data with a web interface to query and visualize the database, and a set of libraries for more complex analysis. The resulting platform is called DaVIE: Database for the Visualization and Integration of Epigenetics data. DaVIE can use data culled from a variety of sources, and the web interface includes the ability to group samples by sub-type, compare multiple projects, and visualize genomic features in relation to sites of interest. We have used DaVIE to identify patterns of DNA methylation in specific projects and across different projects, identify outlier samples, and cross-check differentially methylated CpG sites identified in specific projects across large numbers of samples. A demonstration server has been set up using GEO data at http://echelon.cmmt.ubc.ca/dbaccess/, with login “guest” and password “guest.” Groups may download and install their own version of the server following the instructions on the project's wiki. PMID:25278960
Mobile Source Observation Database (MSOD)
The Mobile Source Observation Database (MSOD) is a relational database developed by the Assessment and Standards Division (ASD) of the U.S. EPA Office of Transportation and Air Quality (formerly the Office of Mobile Sources).
High performance semantic factoring of giga-scale semantic graph databases.
DOE Office of Scientific and Technical Information (OSTI.GOV)
al-Saffar, Sinan; Adolf, Bob; Haglin, David
2010-10-01
As semantic graph database technology grows to address components ranging from extant large triple stores to SPARQL endpoints over SQL-structured relational databases, it will become increasingly important to be able to bring high performance computational resources to bear on their analysis, interpretation, and visualization, especially with respect to their innate semantic structure. Our research group built a novel high performance hybrid system comprising computational capability for semantic graph database processing utilizing the large multithreaded architecture of the Cray XMT platform, conventional clusters, and large data stores. In this paper we describe that architecture and present the results of deploying it for the analysis of the Billion Triple dataset with respect to its semantic factors, including basic properties, connected components, namespace interaction, and typed paths.
Advancing the LSST Operations Simulator
NASA Astrophysics Data System (ADS)
Saha, Abhijit; Ridgway, S. T.; Cook, K. H.; Delgado, F.; Chandrasekharan, S.; Petry, C. E.; Operations Simulator Group
2013-01-01
The Operations Simulator for the Large Synoptic Survey Telescope (LSST; http://lsst.org) allows the planning of LSST observations that obey explicit science driven observing specifications, patterns, schema, and priorities, while optimizing against the constraints placed by design-specific opto-mechanical system performance of the telescope facility, site specific conditions (including weather and seeing), as well as additional scheduled and unscheduled downtime. A simulation run records the characteristics of all observations (e.g., epoch, sky position, seeing, sky brightness) in a MySQL database, which can be queried for any desired purpose. Derivative information digests of the observing history database are made with an analysis package called Simulation Survey Tools for Analysis and Reporting (SSTAR). Merit functions and metrics have been designed to examine how suitable a specific simulation run is for several different science applications. This poster reports recent work, which has focused on an architectural restructuring of the code that will allow us to a) use "look-ahead" strategies that avoid cadence sequences that cannot be completed due to observing constraints; and b) examine alternate optimization strategies, so that the most efficient scheduling algorithm(s) can be identified and used: even few-percent efficiency gains will create substantive scientific opportunity. The enhanced simulator will be used to assess the feasibility of desired observing cadences, study the impact of changing science program priorities, and assist with performance margin investigations of the LSST system.
Distance correlation methods for discovering associations in large astrophysical databases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Martínez-Gómez, Elizabeth; Richards, Mercedes T.; Richards, Donald St. P., E-mail: elizabeth.martinez@itam.mx, E-mail: mrichards@astro.psu.edu, E-mail: richards@stat.psu.edu
2014-01-20
High-dimensional, large-sample astrophysical databases of galaxy clusters, such as the Chandra Deep Field South COMBO-17 database, provide measurements on many variables for thousands of galaxies and a range of redshifts. Current understanding of galaxy formation and evolution rests sensitively on relationships between different astrophysical variables; hence an ability to detect and verify associations or correlations between variables is important in astrophysical research. In this paper, we apply a recently defined statistical measure called the distance correlation coefficient, which can be used to identify new associations and correlations between astrophysical variables. The distance correlation coefficient applies to variables of any dimension, can be used to determine smaller sets of variables that provide equivalent astrophysical information, is zero only when variables are independent, and is capable of detecting nonlinear associations that are undetectable by the classical Pearson correlation coefficient. Hence, the distance correlation coefficient provides more information than the Pearson coefficient. We analyze numerous pairs of variables in the COMBO-17 database with the distance correlation method and with the maximal information coefficient. We show that the Pearson coefficient can be estimated with higher accuracy from the corresponding distance correlation coefficient than from the maximal information coefficient. For given values of the Pearson coefficient, the distance correlation method has a greater ability than the maximal information coefficient to resolve astrophysical data into highly concentrated horseshoe- or V-shapes, which enhances classification and pattern identification. These results are observed over a range of redshifts beyond the local universe and for galaxies from elliptical to spiral.
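The empirical distance correlation is simple to compute directly from its definition via double-centered pairwise distance matrices. The following Python sketch is a generic implementation of that estimator, not the authors' code; the sample data at the end are hypothetical.

```python
import numpy as np

def distance_correlation(x, y):
    """Empirical distance correlation between two equal-length samples."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    # Pairwise Euclidean distance matrices
    a = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    b = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1)
    # Double-center: subtract row and column means, add back the grand mean
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2 = max((A * B).mean(), 0.0)          # squared distance covariance
    dvar_x, dvar_y = (A * A).mean(), (B * B).mean()
    if dvar_x * dvar_y == 0.0:
        return 0.0                            # a constant variable is independent
    return np.sqrt(dcov2 / np.sqrt(dvar_x * dvar_y))

# A nonlinear association that the Pearson coefficient misses almost entirely
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 500)
y = x ** 2 + 0.05 * rng.normal(size=500)
print(np.corrcoef(x, y)[0, 1])                # near 0
print(distance_correlation(x, y))             # substantially above 0
```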
[Pharmacovigilance in Germany: It is about time].
Douros, A; Schaefer, C; Kreutz, R; Garbe, E
2016-06-01
Pharmacovigilance is defined as the activities relating to the detection, assessment, and prevention of adverse drug reactions (ADRs). Although its beginnings in Germany date back more than 50 years, a stagnation in this field has been observed lately. Different tools of pharmacovigilance will be illustrated and the reasons for its stagnation in Germany will be elucidated. Spontaneous reporting systems are an important tool in pharmacovigilance and are based on reports of ADRs from treating physicians, other healthcare professionals, or patients. Due to several weaknesses of spontaneous reporting systems, such as underreporting, media bias, confounding by comorbidity or comedication, and the limited quality of the reports, the development of electronic healthcare databases for pharmacovigilance research was publicly funded in recent years. In the US, different electronic healthcare databases were merged in a publicly funded project, resulting in a combined population of more than 193 million individuals. In Germany, the establishment of large longitudinal databases was never conceived as a public duty and has not been implemented so far. Further attempts to use administrative healthcare data for pharmacovigilance purposes are severely restricted by the Code of Social Law (Section 75, Book 10). This situation has led to a stagnation in pharmacovigilance research in Germany. Without publicly funded large longitudinal healthcare databases and an amendment of Section 75, Book 10, of the Code of Social Law, the use of healthcare data in pharmacovigilance research in Germany will remain a rarity. This could have negative effects on the medical care of the general population.
Problem of Mistakes in Databases, Processing and Interpretation of Observations of the Sun. I.
NASA Astrophysics Data System (ADS)
Lozitska, N. I.
Unnoticed mistakes and misprints can enter observational databases at any stage of observation, preparation, and processing. Detecting such errors is complicated by the fact that the work of the observer, the database compiler, and the researcher is divided among different people. Data acquisition from a spacecraft requires more researchers than ground-based observation, so the probability of errors increases. Because tracking errors at each stage is very difficult, we use cross-comparison of data from different sources. We revealed several misprints in the typographic and digital records of sunspot group area measurements.
Visualizing the semantic content of large text databases using text maps
NASA Technical Reports Server (NTRS)
Combs, Nathan
1993-01-01
A methodology for generating text map representations of the semantic content of text databases is presented. Text maps provide a graphical metaphor for conceptualizing and visualizing the contents and data interrelationships of large text databases. Described are a set of experiments conducted against the TIPSTER corpora of Wall Street Journal articles. These experiments provide an introduction to current work in the representation and visualization of documents by way of their semantic content.
Makadia, Rupa; Matcho, Amy; Ma, Qianli; Knoll, Chris; Schuemie, Martijn; DeFalco, Frank J; Londhe, Ajit; Zhu, Vivienne; Ryan, Patrick B
2015-01-01
Objectives To evaluate the utility of applying the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) across multiple observational databases within an organization and to apply standardized analytics tools for conducting observational research. Materials and methods Six deidentified patient-level datasets were transformed to the OMOP CDM. We evaluated the extent of information loss that occurred through the standardization process. We developed a standardized analytic tool to replicate the cohort construction process from a published epidemiology protocol and applied the analysis to all 6 databases to assess time-to-execution and comparability of results. Results Transformation to the CDM resulted in minimal information loss across all 6 databases. Patients and observations excluded were due to identified data quality issues in the source system; 96% to 99% of condition records and 90% to 99% of drug records were successfully mapped into the CDM using the standard vocabulary. The full cohort replication and descriptive baseline summary was executed for 2 cohorts in 6 databases in less than 1 hour. Discussion The standardization process improved data quality, increased efficiency, and facilitated cross-database comparisons to support a more systematic approach to observational research. Comparisons across data sources showed consistency in the impact of the protocol's inclusion criteria, and identified differences in patient characteristics and coding practices across databases. Conclusion Standardizing data structure (through a CDM), content (through a standard vocabulary with source code mappings), and analytics can enable an institution to apply a network-based approach to observational research across multiple, disparate observational health databases. PMID:25670757
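As a toy illustration of the vocabulary-mapping step whose success rates are quoted above (96% to 99% of condition records, 90% to 99% of drug records), the Python sketch below maps source drug codes to standard concepts and tallies what fails to map. The codes, concept IDs, and lookup tables are hypothetical stand-ins, not actual OMOP vocabulary content.

```python
# Hypothetical source records as (source_vocabulary, source_code) pairs
source_drug_records = [("NDC", "00093-7146"), ("NDC", "99999-0000"),
                       ("GPI", "5810001000")]

# Hypothetical source-code -> standard-concept lookups (stand-ins for the
# standard vocabulary mapping tables)
ndc_to_concept = {"00093-7146": 19019073}
gpi_to_concept = {"5810001000": 1125315}

mapped, unmapped = [], []
for vocab, code in source_drug_records:
    lookup = ndc_to_concept if vocab == "NDC" else gpi_to_concept
    (mapped if code in lookup else unmapped).append((vocab, code))

# The fraction mapped is the quantity such a study reports per record type
print(f"mapped {len(mapped)}/{len(source_drug_records)} drug records")
print("unmapped (information loss):", unmapped)
```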
The EpiSLI Database: A Publicly Available Database on Speech and Language
ERIC Educational Resources Information Center
Tomblin, J. Bruce
2010-01-01
Purpose: This article describes a database that was created in the process of conducting a large-scale epidemiologic study of specific language impairment (SLI). As such, this database will be referred to as the EpiSLI database. Children with SLI have unexpected and unexplained difficulties learning and using spoken language. Although there is no…
Deployment and Evaluation of an Observations Data Model
NASA Astrophysics Data System (ADS)
Horsburgh, J. S.; Tarboton, D. G.; Zaslavsky, I.; Maidment, D. R.; Valentine, D.
2007-12-01
Environmental observations are fundamental to hydrology and water resources, and the way these data are organized and manipulated either enables or inhibits the analyses that can be performed. The CUAHSI Hydrologic Information System project is developing information technology infrastructure to support hydrologic science. This includes an Observations Data Model (ODM) that provides a new and consistent format for the storage and retrieval of environmental observations in a relational database designed to facilitate integrated analysis of large datasets collected by multiple investigators. Within this data model, observations are stored with sufficient ancillary information (metadata) about the observations to allow them to be unambiguously interpreted and used, and to provide traceable heritage from raw measurements to useable information. The design is based upon a relational database model that exposes each single observation as a record, taking advantage of the capability in relational database systems for querying based upon data values and enabling cross dimension data retrieval and analysis. This data model has been deployed, as part of the HIS Server, at the WATERS Network test bed observatories across the U.S., where it serves as a repository for real time data in the observatory information system. The ODM holds the data that are then made available to investigators and the public through web services and the Data Access System for Hydrology (DASH) map-based interface. In the WATERS Network test bed settings the ODM has been used to ingest, analyze and publish data from a variety of sources and disciplines. This paper will present an evaluation of the effectiveness of this initial deployment and the revisions that are being instituted to address shortcomings. The ODM represents a new, systematic way for hydrologists, scientists, and engineers to organize and share their data and thereby facilitate a fuller integrated understanding of water resources based on more extensive and fully specified information.
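A minimal sketch of the one-record-per-observation relational pattern the abstract describes, using SQLite for self-containment; the table and column names here are illustrative assumptions, not the actual CUAHSI ODM schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sites (
    site_id   INTEGER PRIMARY KEY,
    site_name TEXT, latitude REAL, longitude REAL
);
CREATE TABLE variables (
    variable_id INTEGER PRIMARY KEY,
    name TEXT, units TEXT, method TEXT
);
-- Each single observation is one record, carrying enough metadata to be
-- unambiguously interpreted and traced from raw measurement to information.
CREATE TABLE observations (
    value_id    INTEGER PRIMARY KEY,
    site_id     INTEGER REFERENCES sites(site_id),
    variable_id INTEGER REFERENCES variables(variable_id),
    obs_time    TEXT,
    value       REAL,
    qc_level    TEXT
);
""")

# Because each value is a row, retrieval can be conditioned on the data values
# themselves, e.g. all quality-controlled discharge values above a threshold:
rows = conn.execute("""
    SELECT s.site_name, v.name, o.obs_time, o.value
    FROM observations o
    JOIN sites s     ON s.site_id = o.site_id
    JOIN variables v ON v.variable_id = o.variable_id
    WHERE v.name = 'discharge' AND o.qc_level = '1' AND o.value > 100.0
""").fetchall()
```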
Paquet, Agnes C; Solberg, Owen D; Napolitano, Laura A; Volpe, Joseph M; Walworth, Charles; Whitcomb, Jeannette M; Petropoulos, Christos J; Haddad, Mojgan
2014-01-01
Drug resistance testing and co-receptor tropism determination are key components of the management of antiretroviral therapy for HIV-1-infected individuals. The purpose of this study was to examine trends of HIV-1 resistance and viral evolution in the past decade by surveying a large commercial patient testing database. Temporal trends of drug resistance, viral fitness and co-receptor usage among samples submitted for routine phenotypic and genotypic resistance testing to protease inhibitors (PIs), nucleoside reverse transcriptase inhibitors (NRTIs) and non-nucleoside reverse transcriptase inhibitors (NNRTIs), as well as for tropism determination were investigated. Among 62,397 resistant viruses reported from 2003 to 2012, we observed a decreasing trend in the prevalence of three-class resistance (from 25% to 9%) driven by decreased resistance to PIs (43% to 21%) and NRTIs (79% to 57%), while observing a slight increase in NNRTI resistance (68% to 75%). The prevalence of CXCR4-mediated entry among tropism testing samples (n=52,945) declined over time from 47% in 2007 to 40% in 2012. A higher proportion of CXCR4-tropic viruses was observed within samples with three-class resistance (50%) compared with the group with no resistance (36%). Decreased prevalence of three-class resistance and increased prevalence of one-class resistance was observed within samples reported between 2003 and 2012. The fraction of CXCR4-tropic viruses has decreased over time; however, CXCR4 usage was more prevalent among multi-class-resistant samples, which may be due to the more advanced disease stage of treatment-experienced patients. These trends have important implications for clinical practice and future drug discovery and development.
Search and retrieval of medical images for improved diagnosis of neurodegenerative diseases
NASA Astrophysics Data System (ADS)
Ekin, Ahmet; Jasinschi, Radu; Turan, Erman; Engbers, Rene; van der Grond, Jeroen; van Buchem, Mark A.
2007-01-01
In the medical world, the accuracy of diagnosis is mainly affected by either a lack of sufficient understanding of some diseases or the inter- and/or intra-observer variability of the diagnoses. The former requires understanding the progress of diseases at much earlier stages, extraction of important information from ever growing amounts of data, and finally finding correlations with certain features and complications that will illuminate the disease progression. The latter (inter- and intra-observer variability) is caused by the differences in the experience levels of different medical experts (inter-observer variability) or by mental and physical tiredness of one expert (intra-observer variability). We believe that the use of large databases can help improve the current status of disease understanding and decision making. By comparing large numbers of patients, some of the otherwise hidden relations can be revealed, which results in better understanding; patients with similar complications can be found, and the diagnosis and treatment can be compared so that the medical expert can make a better diagnosis. To this effect, this paper introduces a search and retrieval system for brain MR databases and shows that brain iron accumulation shape provides additional information to the shape-insensitive features, such as the total brain iron load, that are commonly used in the clinics. We propose to use Kendall's correlation value to automatically compare various returns to a query. We also describe a fully automated and fast brain MR image analysis system to detect degenerative iron accumulation in brain, as is the case in Alzheimer's and Parkinson's. The system is composed of several novel image processing algorithms and has been extensively tested at Leiden University Medical Center, so far on more than 600 patients.
USDA-ARS?s Scientific Manuscript database
Tomato Functional Genomics Database (TFGD; http://ted.bti.cornell.edu) provides a comprehensive systems biology resource to store, mine, analyze, visualize and integrate large-scale tomato functional genomics datasets. The database is expanded from the previously described Tomato Expression Database...
Evaluating the Cassandra NoSQL Database Approach for Genomic Data Persistency.
Aniceto, Rodrigo; Xavier, Rene; Guimarães, Valeria; Hondo, Fernanda; Holanda, Maristela; Walter, Maria Emilia; Lifschitz, Sérgio
2015-01-01
Rapid advances in high-throughput sequencing techniques have created interesting computational challenges in bioinformatics. One of them refers to management of massive amounts of data generated by automatic sequencers. We need to deal with the persistency of genomic data, particularly storing and analyzing these large-scale processed data. To find an alternative to the frequently considered relational database model becomes a compelling task. Other data models may be more effective when dealing with a very large amount of nonconventional data, especially for writing and retrieving operations. In this paper, we discuss the Cassandra NoSQL database approach for storing genomic data. We perform an analysis of persistency and I/O operations with real data, using the Cassandra database system. We also compare the results obtained with a classical relational database system and another NoSQL database approach, MongoDB.
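A minimal sketch of the write path such a persistency study would exercise, using the DataStax Python driver for Cassandra. The keyspace, table layout, and partition key below are illustrative assumptions, not the schema used in the paper.

```python
from cassandra.cluster import Cluster  # DataStax driver; assumes a local node

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS genomics
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
# Partition by (sample, chromosome) so data for one region stay together;
# clustering by position keeps rows ordered for range scans.
session.execute("""
    CREATE TABLE IF NOT EXISTS genomics.reads (
        sample_id text, chrom text, pos bigint,
        read_seq text, quality text,
        PRIMARY KEY ((sample_id, chrom), pos)
    )
""")

insert = session.prepare(
    "INSERT INTO genomics.reads (sample_id, chrom, pos, read_seq, quality) "
    "VALUES (?, ?, ?, ?, ?)")
session.execute(insert, ("S001", "chr1", 12345, "ACGTACGT", "IIIIHHHH"))
```

Write-heavy workloads such as sequencer output are where a log-structured store like Cassandra tends to shine relative to a conventional relational engine, which is the comparison the paper sets up.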
Zhang, Yaoyang; Xu, Tao; Shan, Bing; Hart, Jonathan; Aslanian, Aaron; Han, Xuemei; Zong, Nobel; Li, Haomin; Choi, Howard; Wang, Dong; Acharya, Lipi; Du, Lisa; Vogt, Peter K; Ping, Peipei; Yates, John R
2015-11-03
Shotgun proteomics generates valuable information from large-scale and target protein characterizations, including protein expression, protein quantification, protein post-translational modifications (PTMs), protein localization, and protein-protein interactions. Typically, peptides derived from proteolytic digestion, rather than intact proteins, are analyzed by mass spectrometers because peptides are more readily separated, ionized and fragmented. The amino acid sequences of peptides can be interpreted by matching the observed tandem mass spectra to theoretical spectra derived from a protein sequence database. Identified peptides serve as surrogates for their proteins and are often used to establish what proteins were present in the original mixture and to quantify protein abundance. Two major issues exist for assigning peptides to their originating protein. The first issue is maintaining a desired false discovery rate (FDR) when comparing or combining multiple large datasets generated by shotgun analysis and the second issue is properly assigning peptides to proteins when homologous proteins are present in the database. Herein we demonstrate a new computational tool, ProteinInferencer, which can be used for protein inference with both small- or large-scale data sets to produce a well-controlled protein FDR. In addition, ProteinInferencer introduces confidence scoring for individual proteins, which makes protein identifications evaluable. This article is part of a Special Issue entitled: Computational Proteomics. Copyright © 2015. Published by Elsevier B.V.
Wang, Shirley V; Schneeweiss, Sebastian; Berger, Marc L; Brown, Jeffrey; de Vries, Frank; Douglas, Ian; Gagne, Joshua J; Gini, Rosa; Klungel, Olaf; Mullins, C Daniel; Nguyen, Michael D; Rassen, Jeremy A; Smeeth, Liam; Sturkenboom, Miriam
2017-09-01
Defining a study population and creating an analytic dataset from longitudinal healthcare databases involves many decisions. Our objective was to catalogue scientific decisions underpinning study execution that should be reported to facilitate replication and enable assessment of validity of studies conducted in large healthcare databases. We reviewed key investigator decisions required to operate a sample of macros and software tools designed to create and analyze analytic cohorts from longitudinal streams of healthcare data. A panel of academic, regulatory, and industry experts in healthcare database analytics discussed and added to this list. Evidence generated from large healthcare encounter and reimbursement databases is increasingly being sought by decision-makers. Varied terminology is used around the world for the same concepts. Agreeing on terminology and which parameters from a large catalogue are the most essential to report for replicable research would improve transparency and facilitate assessment of validity. At a minimum, reporting for a database study should provide clarity regarding operational definitions for key temporal anchors and their relation to each other when creating the analytic dataset, accompanied by an attrition table and a design diagram. A substantial improvement in reproducibility, rigor and confidence in real world evidence generated from healthcare databases could be achieved with greater transparency about operational study parameters used to create analytic datasets from longitudinal healthcare databases. © 2017 The Authors. Pharmacoepidemiology & Drug Safety Published by John Wiley & Sons Ltd.
Jantzen, Rodolphe; Rance, Bastien; Katsahian, Sandrine; Burgun, Anita; Looten, Vincent
2018-01-01
Open data, made largely available to the general public and journalists with minimal constraints, are needed to help rebuild trust between citizens and the health system. By opening data, we can expect to increase democratic accountability and the self-empowerment of citizens. This article aims at assessing the quality and reusability of the Transparency - Health database (Transp-db) with regard to the FAIR principles. More specifically, we examine the quality of the identity data for French medical doctors in the Transp-db. This study shows that the quality of the data in the Transp-db does not allow those who benefit from an advantage or remuneration to be identified with certainty, noticeably reducing the impact of the open data effort.
Automating the Generation of the Cassini Tour Atlas Database
NASA Technical Reports Server (NTRS)
Grazier, Kevin R.; Roumeliotis, Chris; Lange, Robert D.
2010-01-01
The Tour Atlas is a large database of geometrical tables, plots, and graphics used by Cassini science planning engineers and scientists primarily for science observation planning. Over time, as the contents of the Tour Atlas grew, the amount of time it took to recreate the Tour Atlas similarly grew--to the point that it took one person a week of effort. When Cassini tour designers estimated that they were going to create approximately 30 candidate Extended Mission trajectories--which needed to be analyzed for science return in a short amount of time--it became a necessity to automate. We report on the automation methodology that reduced the amount of time it took one person to (re)generate a Tour Atlas from a week to, literally, one UNIX command.
Using XMM-OM UV Data to Study Cluster Galaxy Evolution
NASA Astrophysics Data System (ADS)
Miller, Neal A.; O'Steen, R.
2010-01-01
The XMM-Newton satellite includes an Optical Monitor (XMM-OM) for the simultaneous observation of its X-ray targets at UV and optical wavelengths. On account of XMM's excellent characteristics for the observation of the hot intracluster medium, a large number of galaxy clusters have been observed by XMM and there is consequently a large and virtually unused database of XMM-OM UV data for galaxies in the cores of these clusters. We have begun a program to capitalize on such data, and describe here our efforts on a subsample of ten nearby clusters having XMM-OM, GALEX, and SDSS data. We present our methods for photometry and calibration of the XMM-OM UV data, and briefly present some applications including galaxy color magnitude diagrams (and identification of the red sequence, blue cloud, and green valley) and SED fitting (and galaxy stellar masses and star formation histories). Support for this work is provided by NASA Award Number NNX09AC76G.
AgeFactDB--the JenAge Ageing Factor Database--towards data integration in ageing research.
Hühne, Rolf; Thalheim, Torsten; Sühnel, Jürgen
2014-01-01
AgeFactDB (http://agefactdb.jenage.de) is a database aimed at the collection and integration of ageing phenotype data including lifespan information. Ageing factors are considered to be genes, chemical compounds or other factors such as dietary restriction, whose action results in a changed lifespan or another ageing phenotype. Any information related to the effects of ageing factors is called an observation and is presented on observation pages. To provide concise access to the complete information for a particular ageing factor, corresponding observations are also summarized on ageing factor pages. In a first step, ageing-related data were primarily taken from existing databases such as the Ageing Gene Database--GenAge, the Lifespan Observations Database and the Dietary Restriction Gene Database--GenDR. In addition, we have started to include new ageing-related information. Based on homology data taken from the HomoloGene Database, AgeFactDB also provides observation and ageing factor pages of genes that are homologous to known ageing-related genes. These homologues are considered as candidate or putative ageing-related genes. AgeFactDB offers a variety of search and browse options, and also allows the download of ageing factor or observation lists in TSV, CSV and XML formats.
Adsorption structures and energetics of molecules on metal surfaces: Bridging experiment and theory
NASA Astrophysics Data System (ADS)
Maurer, Reinhard J.; Ruiz, Victor G.; Camarillo-Cisneros, Javier; Liu, Wei; Ferri, Nicola; Reuter, Karsten; Tkatchenko, Alexandre
2016-05-01
Adsorption geometry and stability of organic molecules on surfaces are key parameters that determine the observable properties and functions of hybrid inorganic/organic systems (HIOSs). Despite many recent advances in precise experimental characterization and improvements in first-principles electronic structure methods, reliable databases of structures and energetics for large adsorbed molecules are largely absent. In this review, we present such a database for a range of molecules adsorbed on metal single-crystal surfaces. The systems we analyze include noble-gas atoms, conjugated aromatic molecules, carbon nanostructures, and heteroaromatic compounds adsorbed on five different metal surfaces. The overall objective is to establish a diverse benchmark dataset that enables an assessment of current and future electronic structure methods, and motivates further experimental studies that provide ever more reliable data. Specifically, the benchmark structures and energetics from experiment are here compared with the recently developed van der Waals (vdW) inclusive density-functional theory (DFT) method, DFT + vdWsurf. In comparison to 23 adsorption heights and 17 adsorption energies from experiment we find a mean average deviation of 0.06 Å and 0.16 eV, respectively. This confirms the DFT + vdWsurf method as an accurate and efficient approach to treat HIOSs. A detailed discussion identifies remaining challenges to be addressed in future development of electronic structure methods, for which the here presented benchmark database may serve as an important reference.
Robinson, William P
2017-12-01
Ruptured abdominal aortic aneurysm is one of the most difficult clinical problems in surgical practice, with extraordinarily high morbidity and mortality. During the past 23 years, the literature has become replete with reports regarding ruptured endovascular aneurysm repair. A variety of study designs and databases have been utilized to compare ruptured endovascular aneurysm repair and open surgical repair for ruptured abdominal aortic aneurysm and studies of various designs from different databases have yielded vastly different conclusions. It therefore remains controversial whether ruptured endovascular aneurysm repair improves outcomes after ruptured abdominal aortic aneurysm in comparison to open surgical repair. The purpose of this article is to review the best available evidence comparing ruptured endovascular aneurysm repair and open surgical repair of ruptured abdominal aortic aneurysm, including single institution and multi-institutional retrospective observational studies, large national population-based studies, large national registries of prospectively collected data, and randomized controlled clinical trials. This article will analyze the study designs and databases utilized with their attendant strengths and weaknesses to understand the sometimes vastly different conclusions the studies have reached. This article will attempt to integrate the data to distill some of the lessons that have been learned regarding ruptured endovascular aneurysm repair and identify ongoing needs in this field. Copyright © 2017 Elsevier Inc. All rights reserved.
Development and Operation of a Database Machine for Online Access and Update of a Large Database.
ERIC Educational Resources Information Center
Rush, James E.
1980-01-01
Reviews the development of a fault tolerant database processor system which replaced OCLC's conventional file system. A general introduction to database management systems and the operating environment is followed by a description of the hardware selection, software processes, and system characteristics. (SW)
Hypersonic and Supersonic Flow Roadmaps Using Bibliometrics and Database Tomography.
ERIC Educational Resources Information Center
Kostoff, R. N.; Eberhart, Henry J.; Toothman, Darrell Ray
1999-01-01
Database Tomography (DT) is a textual database-analysis system consisting of algorithms for extracting multiword phrase frequencies and proximities from a large textual database, to augment interpretative capabilities of the expert human analyst. Describes use of the DT process, supplemented by literature bibliometric analyses, to derive technical…
Generation of Fine Scale Wind and Wave Climatologies
NASA Astrophysics Data System (ADS)
Vandenberghe, F. C.; Filipot, J.; Mouche, A.
2013-12-01
A tool to generate 'on demand' large databases of atmospheric parameters at high resolution has been developed for defense applications. The approach takes advantage of the zooming and relocation capabilities of the embedded domains that can be found in regional models like the community Weather Research and Forecast model (WRF). The WRF model is applied to dynamically downscale NNRP, CFSR and ERA40 global analyses and to generate long records, up to 30 years, of hourly gridded data over 200 km² domains at 3 km grid increment. To ensure accuracy, observational data from the NCAR ADP historical database are used in combination with Four-Dimensional Data Assimilation (FDDA) techniques to constantly nudge the model analysis toward observations. The atmospheric model is coupled to secondary applications such as NOAA's Wave Watch III model and the Navy's APM electromagnetic propagation model, allowing the creation of high-resolution climatologies of surface winds, waves and electromagnetic propagation parameters. The system was applied at several coastal locations of the Mediterranean Sea where SAR wind and wave observations were available during the entire year of 2008. Statistical comparisons between the model output and SAR observations are presented. Issues related to the global input data and the model drift, as well as the impact of the wind biases on wave simulations, will be discussed.
Comparison of the NCI open database with seven large chemical structural databases.
Voigt, J H; Bienfait, B; Wang, S; Nicklaus, M C
2001-01-01
Eight large chemical databases have been analyzed and compared to each other. Central to this comparison is the open National Cancer Institute (NCI) database, consisting of approximately 250 000 structures. The other databases analyzed are the Available Chemicals Directory ("ACD," from MDL, release 1.99, 3D-version); the ChemACX ("ACX," from CamSoft, Version 4.5); the Maybridge Catalog and the Asinex database (both as distributed by CamSoft as part of ChemInfo 4.5); the Sigma-Aldrich Catalog (CD-ROM, 1999 Version); the World Drug Index ("WDI," Derwent, version 1999.03); and the organic part of the Cambridge Crystallographic Database ("CSD," from Cambridge Crystallographic Data Center, 1999 Version 5.18). The database properties analyzed are internal duplication rates; compounds unique to each database; cumulative occurrence of compounds in an increasing number of databases; overlap of identical compounds between two databases; similarity overlap; diversity; and others. The crystallographic database CSD and the WDI show somewhat less overlap with the other databases than those with each other. In particular the collections of commercial compounds and compilations of vendor catalogs have a substantial degree of overlap among each other. Still, no database is completely a subset of any other, and each appears to have its own niche and thus "raison d'être". The NCI database has by far the highest number of compounds that are unique to it. Approximately 200 000 of the NCI structures were not found in any of the other analyzed databases.
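Once structures are canonicalized to comparable identifiers, the comparison metrics named above (internal duplication rate, pairwise overlap, compounds unique to one database) reduce to set arithmetic. Here is a minimal Python sketch over toy inputs; the identifiers are hypothetical stand-ins for canonical structure keys.

```python
from itertools import combinations

databases = {                               # toy identifier lists per database
    "NCI": ["AAA", "BBB", "CCC", "CCC"],    # note one internal duplicate
    "ACD": ["BBB", "DDD"],
    "WDI": ["CCC", "EEE"],
}
unique_sets = {name: set(ids) for name, ids in databases.items()}

for name, ids in databases.items():
    dup_rate = 1 - len(unique_sets[name]) / len(ids)   # internal duplication
    others = set().union(*(s for n, s in unique_sets.items() if n != name))
    only_here = unique_sets[name] - others             # unique to this database
    print(f"{name}: duplication {dup_rate:.0%}, unique compounds {len(only_here)}")

for a, b in combinations(unique_sets, 2):              # pairwise overlap
    print(a, b, "overlap:", len(unique_sets[a] & unique_sets[b]))
```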
Leaf optical properties shed light on foliar trait variability at individual to global scales
NASA Astrophysics Data System (ADS)
Shiklomanov, A. N.; Serbin, S.; Dietze, M.
2017-12-01
Recent syntheses of large trait databases have contributed immensely to our understanding of drivers of plant function at the global scale. However, the global trade-offs revealed by such syntheses, such as the trade-off between leaf productivity and resilience (i.e. "leaf economics spectrum"), are often absent at smaller scales and fail to correlate with actual functional limitations. An improved understanding of how traits vary among communities, species, and individuals is critical to accurate representations of vegetation ecophysiology and ecological dynamics in ecosystem models. Spectral data from both field observations and remote sensing platforms present a rich and widely available source of information on plant traits. Here, we apply Bayesian inversion of the PROSPECT leaf radiative transfer model to a large global database of over 60,000 field spectra and plant traits to (1) comprehensively assess the accuracy of leaf trait estimation using PROSPECT spectral inversion; (2) investigate the correlations between optical traits estimable from PROSPECT and other important foliar traits such as nitrogen and lignin concentrations; and (3) identify dominant sources of variability and characterize trade-offs in optical and non-optical foliar traits. Our work provides a key methodological contribution by validating physically-based retrieval of plant traits from remote sensing observations, and provides insights about trait trade-offs related to plant acclimation, adaptation, and community assembly.
A DNA Barcoding Approach to Characterize Pollen Collected by Honeybees
Bruni, Ilaria; Scaccabarozzi, Daniela; Sandionigi, Anna; Barbuto, Michela; Casiraghi, Maurizio; Labra, Massimo
2014-01-01
In the present study, we investigated DNA barcoding effectiveness to characterize honeybee pollen pellets, a food supplement largely used for human nutrition due to its therapeutic properties. We collected pollen pellets using modified beehives placed in three zones within an alpine protected area (Grigna Settentrionale Regional Park, Italy). A DNA barcoding reference database, including rbcL and trnH-psbA sequences from 693 plant species (104 sequenced in this study) was assembled. The database was used to identify pollen collected from the hives. Fifty-two plant species were identified at the molecular level. Results suggested rbcL alone could not distinguish among congeneric plants; however, psbA-trnH identified most of the pollen samples at the species level. Substantial variability in pollen composition was observed between the highest elevation locality (Alpe Moconodeno), characterized by arid grasslands and a rocky substrate, and the other two sites (Cornisella and Ortanella) at lower altitudes. Pollen from Ortanella and Cornisella showed the presence of typical deciduous forest species; however in samples collected at Ortanella, pollen of the invasive Lonicera japonica, and the ornamental Pelargonium x hortorum were observed. Our results indicated pollen composition was largely influenced by floristic local biodiversity, plant phenology, and the presence of alien flowering species. Therefore, pollen molecular characterization based on DNA barcoding might serve useful to beekeepers in obtaining honeybee products with specific nutritional or therapeutic characteristics desired by food market demands. PMID:25296114
Observation of Markarian 421 in TeV Gamma Rays Over a 14-Year Time Span
NASA Technical Reports Server (NTRS)
Acciari, V. A.; Arlen, T.; Aune, T.; Benbow, W.; Bird, R.; Bouvier, A.; Bradbury, S. M.; Buckley, J. H.; Bugaev, V.; McEnery, Julie E.
2013-01-01
The variability of the blazar Markarian 421 in TeV gamma rays over a 14-year time period has been explored with the Whipple 10 m telescope. It is shown that the dynamic range of its flux variations is large and similar to that in X-rays. A correlation between the X-ray and TeV energy bands is observed during some bright flares and when the complete data sets are binned on long timescales. The main database consists of 878.4 hours of observation with the Whipple telescope, spread over 783 nights. The peak energy response of the telescope was 400 GeV with 20% uncertainty. This is the largest database of any TeV-emitting active galactic nucleus (AGN) and hence was used to explore the variability profile of Markarian 421. The time-averaged flux from Markarian 421 over this period was 0.446 ± 0.008 Crab flux units. The flux exceeded 10 Crab flux units on three separate occasions. For the 2000-2001 season the average flux reached 1.86 Crab units, while in the 1996-1997 season the average flux was only 0.23 Crab units.
Asteroids Search Results in Large Photographic Sky Surveys
NASA Astrophysics Data System (ADS)
Shatokhina, S. V.; Kazantseva, L. V.; Yizhakevych, O. M.; Eglitis, I.; Andruk, V. M.
Photographic observations of the XX century contain numerous and varied information about all objects and events of the Universe fixed on plates. Original and interesting observations of small bodies of the Solar system from previous years can be selected and used for various scientific tasks, and existing databases and online services make such selection easy and fast. Chronologically earlier positions and photometric evaluations of brightness over long periods of time allow the orbits of asteroids to be refined and various non-stationary behavior to be identified. Photographic observations of the Northern Sky Survey project and observations of clusters in UBVR bands were used for a global search for small bodies of the Solar system. In total we found 2,486 positions of asteroids and 13 positions of comets. All positions were compared with ephemerides. It was found that 80 positions of asteroids have a moment of observation preceding their discovery, and 19 of them are chronologically the earliest observations of these asteroids in the world.
A Database of Supercooled Large Droplet Ice Accretions [Supplement
NASA Technical Reports Server (NTRS)
VanZante, Judith Foss
2007-01-01
A unique, publicly available database regarding supercooled large droplet (SLD) ice accretions has been developed in NASA Glenn's Icing Research Tunnel. Identical cloud and flight conditions were generated for five different airfoil models. The models chosen represent a variety of aircraft types from the horizontal stabilizer of a large transport aircraft to the wings of regional, business, and general aviation aircraft. In addition to the standard documentation methods of 2D ice shape tracing and imagery, ice mass measurements were also taken. This database will also be used to validate and verify the extension of the ice accretion code, LEWICE, into the SLD realm.
A Database of Supercooled Large Droplet Ice Accretions
NASA Technical Reports Server (NTRS)
VanZante, Judith Foss
2007-01-01
A unique, publicly available database regarding supercooled large droplet ice accretions has been developed in NASA Glenn's Icing Research Tunnel. Identical cloud and flight conditions were generated for five different airfoil models. The models chosen represent a variety of aircraft types from the horizontal stabilizer of a large transport aircraft to the wings of regional, business, and general aviation aircraft. In addition to the standard documentation methods of 2D ice shape tracing and imagery, ice mass measurements were also taken. This database will also be used to validate and verify the extension of the ice accretion code, LEWICE, into the SLD realm.
Voss, Erica A; Makadia, Rupa; Matcho, Amy; Ma, Qianli; Knoll, Chris; Schuemie, Martijn; DeFalco, Frank J; Londhe, Ajit; Zhu, Vivienne; Ryan, Patrick B
2015-05-01
To evaluate the utility of applying the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) across multiple observational databases within an organization and to apply standardized analytics tools for conducting observational research. Six deidentified patient-level datasets were transformed to the OMOP CDM. We evaluated the extent of information loss that occurred through the standardization process. We developed a standardized analytic tool to replicate the cohort construction process from a published epidemiology protocol and applied the analysis to all 6 databases to assess time-to-execution and comparability of results. Transformation to the CDM resulted in minimal information loss across all 6 databases. Patients and observations excluded were due to identified data quality issues in the source system; 96% to 99% of condition records and 90% to 99% of drug records were successfully mapped into the CDM using the standard vocabulary. The full cohort replication and descriptive baseline summary was executed for 2 cohorts in 6 databases in less than 1 hour. The standardization process improved data quality, increased efficiency, and facilitated cross-database comparisons to support a more systematic approach to observational research. Comparisons across data sources showed consistency in the impact of the protocol's inclusion criteria, and identified differences in patient characteristics and coding practices across databases. Standardizing data structure (through a CDM), content (through a standard vocabulary with source code mappings), and analytics can enable an institution to apply a network-based approach to observational research across multiple, disparate observational health databases. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association.
NASA Technical Reports Server (NTRS)
Handley, Thomas H., Jr.; Collins, Donald J.; Doyle, Richard J.; Jacobson, Allan S.
1991-01-01
Viewgraphs on DataHub knowledge-based assistance for science visualization and analysis using large distributed databases. Topics covered include: DataHub functional architecture; data representation; logical access methods; preliminary software architecture; LinkWinds; data knowledge issues; expert systems; and data management.
Risk model of valve surgery in Japan using the Japan Adult Cardiovascular Surgery Database.
Motomura, Noboru; Miyata, Hiroaki; Tsukihara, Hiroyuki; Takamoto, Shinichi
2010-11-01
Risk models of cardiac valve surgery using a large database are useful for improving surgical quality. In order to obtain accurate, high-quality assessments of surgical outcome, each geographic area should maintain its own database. The study aim was to collect Japanese data and to prepare a risk stratification of cardiac valve procedures, using the Japan Adult Cardiovascular Surgery Database (JACVSD). A total of 6562 valve procedure records from 97 participating sites throughout Japan was analyzed, using a data entry form with 255 variables that was sent to the JACVSD office from a web-based data collection system. The statistical model was constructed using multiple logistic regression. Model discrimination was tested using the area under the receiver operating characteristic curve (C-index). The model calibration was tested using the Hosmer-Lemeshow (H-L) test. Among 6562 operated cases, 15% had diabetes mellitus, 5% were urgent, and 12% involved preoperative renal failure. The observed 30-day and operative mortality rates were 2.9% and 4.0%, respectively. Significant variables with high odds ratios included emergent or salvage status (3.83), reoperation (3.43), and left ventricular dysfunction (3.01). The H-L test and C-index values for 30-day mortality were satisfactory (0.44 and 0.80, respectively). The results obtained in Japan were at least as good as those reported elsewhere. The performance of this risk model also matched that of the STS National Adult Cardiac Database and the European Society Database.
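The two validation statistics named above are standard and easy to reproduce. The Python sketch below fits a logistic model on simulated data and computes the C-index and the Hosmer-Lemeshow statistic over deciles of predicted risk; the data and the use of scikit-learn/SciPy are assumptions for illustration, not the JACVSD model itself.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Simulated stand-in data: 5 risk factors, roughly 5-10% mortality
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
logit = X @ np.array([0.8, 0.5, 0.3, 0.0, 0.0]) - 3.0
y = (rng.random(1000) < 1 / (1 + np.exp(-logit))).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)
p = model.predict_proba(X)[:, 1]

c_index = roc_auc_score(y, p)       # discrimination: area under the ROC curve

# Hosmer-Lemeshow: observed vs expected events within deciles of risk
groups = np.digitize(p, np.percentile(p, np.arange(10, 100, 10)))
hl = 0.0
for g in range(10):
    mask = groups == g
    n_g, obs, exp_ = mask.sum(), y[mask].sum(), p[mask].sum()
    hl += (obs - exp_) ** 2 / (exp_ * (1 - exp_ / n_g))
p_value = chi2.sf(hl, df=8)         # calibration: high p = no evidence of misfit
print(f"C-index {c_index:.2f}, H-L p {p_value:.2f}")
```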
Evaluating the Cassandra NoSQL Database Approach for Genomic Data Persistency
Aniceto, Rodrigo; Xavier, Rene; Guimarães, Valeria; Hondo, Fernanda; Holanda, Maristela; Walter, Maria Emilia; Lifschitz, Sérgio
2015-01-01
Rapid advances in high-throughput sequencing techniques have created interesting computational challenges in bioinformatics. One of them refers to management of massive amounts of data generated by automatic sequencers. We need to deal with the persistency of genomic data, particularly storing and analyzing these large-scale processed data. To find an alternative to the frequently considered relational database model becomes a compelling task. Other data models may be more effective when dealing with a very large amount of nonconventional data, especially for writing and retrieving operations. In this paper, we discuss the Cassandra NoSQL database approach for storing genomic data. We perform an analysis of persistency and I/O operations with real data, using the Cassandra database system. We also compare the results obtained with a classical relational database system and another NoSQL database approach, MongoDB. PMID:26558254
NASA's MERBoard: An Interactive Collaborative Workspace Platform. Chapter 4
NASA Technical Reports Server (NTRS)
Trimble, Jay; Wales, Roxana; Gossweiler, Rich
2003-01-01
This chapter describes the ongoing process by which a multidisciplinary group at NASA's Ames Research Center is designing and implementing a large interactive work surface called the MERBoard Collaborative Workspace. A MERBoard system involves several distributed, large, touch-enabled, plasma display systems with custom MERBoard software. A centralized server and database back the system. We are continually tuning MERBoard to support over two hundred scientists and engineers during the surface operations of the Mars Exploration Rover Missions. These scientists and engineers come from various disciplines and are working both in small and large groups over a span of space and time. We describe the multidisciplinary, human-centered process by which this MERBoard system is being designed, the usage patterns and social interactions that we have observed, and issues we are currently facing.
Digitizing Olin Eggen's Card Database
NASA Astrophysics Data System (ADS)
Crast, J.; Silvis, G.
2017-06-01
The goal of the Eggen Card Database Project is to recover as many of the photometric observations from Olin Eggen's Card Database as possible and preserve these observations in digital forms that are accessible by anyone. Any observations of interest to the AAVSO will be added to the AAVSO International Database (AID). Given to the AAVSO on long-term loan by the Cerro Tololo Inter-American Observatory, the database is a collection of over 78,000 index cards holding all Eggen's observations made between 1960 and 1990. The cards were electronically scanned and the resulting 108,000 card images have been published as a series of 2,216 PDF files, which are available from the AAVSO web site. The same images are also stored in an AAVSO online database where they are indexed by star name and card content. These images can be viewed using the Eggen card portal online tool. Eggen made observations using filter bands from five different photometric systems. He documented these observations using 15 different data recording formats. Each format represents a combination of filter magnitudes and color indexes. These observations are being transcribed onto spreadsheets, from which observations of value to the AAVSO are added to the AID. A total of 506 U, B, V, R, and I observations were added to the AID for the variable stars S Car and l Car. We encourage the reader to search the card database using the Eggen card portal for stars of particular interest. If such stars are found and retrieval of the observations is desired, e-mail the authors, and we will be happy to help retrieve those data.
Process evaluation distributed system
NASA Technical Reports Server (NTRS)
Moffatt, Christopher L. (Inventor)
2006-01-01
The distributed system includes a database server, an administration module, a process evaluation module, and a data display module. The administration module is in communication with the database server for providing observation criteria information to the database server. The process evaluation module is in communication with the database server for obtaining the observation criteria information from the database server and collecting process data based on the observation criteria information. The process evaluation module utilizes a personal digital assistant (PDA). A data display module in communication with the database server, including a website for viewing collected process data in a desired metrics form, the data display module also for providing desired editing and modification of the collected process data. The connectivity established by the database server to the administration module, the process evaluation module, and the data display module, minimizes the requirement for manual input of the collected process data.
Blumenfeld, Olga O
2002-04-01
Recent advances in molecular biology and technology have provided evidence, at a molecular level, for long-known observations that the human genome is not unique but is characterized by individual sequence variation. At the present time, documentation of genetic variation occurring in a large number of genes is increasing exponentially. The characterization of alleles that encode a variety of blood group antigens has been particularly fruitful for transfusion medicine. Phenotypic variation, as identified by the serologic study of blood group variants, is required to identify the presence of a variant allele. Many of the other alleles currently recorded have been selected and identified on the basis of inherited disease traits. New approaches document single nucleotide polymorphisms that occur throughout the genome and best show how the DNA sequence varies in the human population. The primary data dealing with variant alleles or more general genomic variation are scattered throughout the scientific literature and only within the last few years has information begun to be organized into databases. This article provides guidance on how to access those databases online as a source of information about genetic variation for purposes of molecular, clinical, and diagnostic medicine, research, and teaching. The attributes of the sites are described. A more detailed view of the database dealing specifically with alleles of genes encoding the blood group antigens includes a brief preliminary analysis of the molecular basis for observed polymorphisms. Other online sites that may be particularly useful to the transfusion medicine readership as well as a brief historical account are also presented. Copyright 2002, Elsevier Science (USA). All rights reserved.
Brief Report: The Negev Hospital-University-Based (HUB) Autism Database
ERIC Educational Resources Information Center
Meiri, Gal; Dinstein, Ilan; Michaelowski, Analya; Flusser, Hagit; Ilan, Michal; Faroy, Michal; Bar-Sinai, Asif; Manelis, Liora; Stolowicz, Dana; Yosef, Lili Lea; Davidovitch, Nadav; Golan, Hava; Arbelle, Shosh; Menashe, Idan
2017-01-01
Elucidating the heterogeneous etiologies of autism will require investment in comprehensive longitudinal data acquisition from large community based cohorts. With this in mind, we have established a hospital-university-based (HUB) database of autism which incorporates prospective and retrospective data from a large and ethnically diverse…
ERIC Educational Resources Information Center
Johnson, Doug
2004-01-01
Schools gather, store and use an increasingly large amount of data. Keeping track of everything from bus routes to building access codes to test scores to sports equipment is done with the help of electronic database programs. Large databases designed for budgeting and student record keeping have long been an integral part of the educational…
NASA Technical Reports Server (NTRS)
Meegan, Charles A.
2004-01-01
The Gamma Ray Large Area Space Telescope (GLAST) observatory, scheduled for launch in 2007, comprises the Large Area Telescope (LAT) and the GLAST Burst Monitor (GBM). The GBM is a collaboration between the NASA Marshall Space Flight Center, the University of Alabama in Huntsville, and the Max Planck Institute for Extraterrestrial Physics. It consists of an array of NaI and BGO scintillation detectors operating in the 10 keV to 25 MeV range, a range suited to following the spectral changes that are known to occur within GRBs. The field of view includes the entire unocculted sky when the observatory is pointing close to the zenith. The GBM will enhance LAT observations of GRBs by extending the spectral coverage into the range of current GRB databases, and will provide a trigger for reorienting the spacecraft to observe delayed emission from bursts outside the LAT field of view. GBM is expected to trigger on about 200 bursts per year, and will provide on-board locations of strong bursts accurate to better than 10 degrees.
Teaching Advanced SQL Skills: Text Bulk Loading
ERIC Educational Resources Information Center
Olsen, David; Hauser, Karina
2007-01-01
Studies show that advanced database skills are important for students to be prepared for today's highly competitive job market. A common task for database administrators is to insert a large amount of data into a database. This paper illustrates how an up-to-date, advanced database topic, namely bulk insert, can be incorporated into a database…
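As a runnable sketch of the bulk-loading pattern the article teaches, the snippet below issues a T-SQL BULK INSERT from Python via pyodbc. The connection string, table, and file path are assumptions (a reachable SQL Server instance with ODBC Driver 17), not details from the article.

```python
import pyodbc  # assumes SQL Server and a suitable ODBC driver are available

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
    "DATABASE=teaching;Trusted_Connection=yes;")
cur = conn.cursor()

cur.execute("""
    IF OBJECT_ID('dbo.scores') IS NULL
        CREATE TABLE dbo.scores (student_id INT, score FLOAT)
""")

# Server-side bulk load of a delimited text file (hypothetical path);
# one statement loads the whole file instead of row-at-a-time INSERTs.
cur.execute(r"""
    BULK INSERT dbo.scores
    FROM 'C:\data\scores.txt'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n')
""")
conn.commit()
```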
Sagace: A web-based search engine for biomedical databases in Japan
2012-01-01
Background In the big data era, biomedical research continues to generate a large amount of data, and the generated information is often stored in a database and made publicly available. Although combining data from multiple databases should accelerate further studies, the current number of life sciences databases is too large for researchers to grasp the features and contents of each one. Findings We have developed Sagace, a web-based search engine that enables users to retrieve information from a range of biological databases (such as gene expression profiles and proteomics data) and biological resource banks (such as mouse models of disease and cell lines). With Sagace, users can search more than 300 databases in Japan. Sagace offers features tailored to biomedical research, including manually tuned ranking, faceted navigation to refine search results, and rich snippets constructed with retrieved metadata for each database entry. Conclusions Sagace will be valuable for experts who are involved in biomedical research and drug development in both academia and industry. Sagace is freely available at http://sagace.nibio.go.jp/en/. PMID:23110816
A high performance, ad-hoc, fuzzy query processing system for relational databases
NASA Technical Reports Server (NTRS)
Mansfield, William H., Jr.; Fleischman, Robert M.
1992-01-01
Database queries involving imprecise or fuzzy predicates are currently an evolving area of academic and industrial research. Such queries place severe stress on the indexing and I/O subsystems of conventional database environments since they involve the search of large numbers of records. The Datacycle architecture and research prototype is a database environment that uses filtering technology to perform an efficient, exhaustive search of an entire database. It has recently been modified to include fuzzy predicates in its query processing. The approach obviates the need for complex index structures, provides unlimited query throughput, permits the use of ad-hoc fuzzy membership functions, and provides a deterministic response time largely independent of query complexity and load. This paper describes the Datacycle prototype implementation of fuzzy queries and some recent performance results.
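The core ingredient of such queries is a membership function that maps an attribute value to a grade in [0, 1], so that records are ranked by grade rather than filtered by a crisp predicate. The sketch below illustrates this on an in-memory record set; the function shape and thresholds are invented, and the Datacycle filtering hardware is of course not reproduced:

```python
# Illustrative ad-hoc fuzzy predicate over a toy record set.
def cheap(price, full=20.0, zero=60.0):
    """Membership in the fuzzy set 'cheap': 1 below `full`, 0 above
    `zero`, and linear in between."""
    if price <= full:
        return 1.0
    if price >= zero:
        return 0.0
    return (zero - price) / (zero - full)

records = [("A", 15.0), ("B", 35.0), ("C", 75.0)]

# A fuzzy query returns records ranked by membership grade.
results = sorted(((name, cheap(price)) for name, price in records),
                 key=lambda t: t[1], reverse=True)
print([r for r in results if r[1] > 0])  # [('A', 1.0), ('B', 0.625)]
```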
Integration of NASA/GSFC and USGS Rock Magnetic Databases.
NASA Astrophysics Data System (ADS)
Nazarova, K. A.; Glen, J. M.
2004-05-01
A global Magnetic Petrology Database (MPDB) was developed and continues to be updated at NASA/Goddard Space Flight Center. The purpose of this database is to provide the geomagnetic community with a comprehensive and user-friendly method of accessing magnetic petrology data via the Internet for a more realistic interpretation of satellite (as well as aeromagnetic and ground) lithospheric magnetic anomalies. The MPDB contains data on rocks from localities around the world (about 19,000 samples), including the Ukrainian and Baltic Shields, Kamchatka, Iceland, the Ural Mountains, etc. The MPDB is designed, managed and presented on the web as a research-oriented database. Several database applications have been specifically developed for data manipulation and analysis of the MPDB. The geophysics unit at the USGS in Menlo Park has over 17,000 rock-property records, largely from sites within the western U.S. This database contains rock-density and rock-magnetic parameters collected for use in gravity and magnetic field modeling, and paleomagnetic studies. Most of these data were taken from surface outcrops, and together they span a broad range of rock types. Measurements were made either in-situ at the outcrop, or in the laboratory on hand samples and paleomagnetic cores acquired in the field. The USGS and NASA/GSFC data will be integrated as part of an effort to provide public access to a single, uniformly maintained database. Due to the large number of data and the very large area sampled, the database can yield rock-property statistics on a broad range of rock types; it is thus applicable to study areas beyond the geographic scope of the database. The intent of this effort is to provide incentive for others to further contribute to the database, and a tool with which the geophysical community can entertain studies formerly precluded.
NASA Astrophysics Data System (ADS)
Templeton, Matthew R.
2009-08-01
Nova Ophiuchi 2009 was discovered by Koichi Itagaki, Teppo-Cho, Yamagata, Japan, at unfiltered CCD magnitude 10.1 on August 16.515 UT, and confirmed by him on Aug. 16.526. After posting to the CBET Unconfirmed Observations page, the object was confirmed independently by several observers. The discovery and confirmatory information were initially reported in CBET 1910, CBET 1911, and AAVSO Special Notice #166. The nova, located in a very crowded field within the Milky Way, is reported by T. Kato (vsnet-alert 11399) to have a large B-V (+1.6), indicating it is highly reddened. N Oph 2009 has been assigned the identifiers VSX J173819.7-264413 and the AUID 000-BJP-605. Please submit observations to the AAVSO International Database using the name N OPH 2009.
NASA Astrophysics Data System (ADS)
Henderson, B. H.; Akhtar, F.; Pye, H. O. T.; Napelenok, S. L.; Hutzell, W. T.
2013-09-01
Transported air pollutants receive increasing attention as regulations tighten and global concentrations increase. The need to represent international transport in regional air quality assessments requires improved representation of boundary concentrations. Currently available observations are too sparse vertically to provide boundary information, particularly for ozone precursors, but global simulations can be used to generate spatially and temporally varying Lateral Boundary Conditions (LBC). This study presents a public database of global simulations designed and evaluated for use as LBC for air quality models (AQMs). The database covers the contiguous United States (CONUS) for the years 2000-2010 and contains hourly varying concentrations of ozone, aerosols, and their precursors. The database is complemented by a tool for configuring the global results as inputs to regional-scale models (e.g., Community Multiscale Air Quality or Comprehensive Air quality Model with extensions). This study also presents an example application based on the CONUS domain, which is evaluated against satellite-retrieved ozone vertical profiles. The results show performance is largely within uncertainty estimates for the Tropospheric Emission Spectrometer (TES), with some exceptions. The major difference is a high bias in the upper troposphere along the southern boundary in January. This publication documents the global simulation database, the tool for conversion to LBC, and the fidelity of concentrations on the boundaries. This documentation is intended to support applications that require representation of long-range transport of air pollutants.
Compressing DNA sequence databases with coil.
White, W Timothy J; Hendy, Michael D
2008-05-20
Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression - an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression - the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental additions to a sequence database. coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work.
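For context, the gzip (Lempel-Ziv) baseline that coil is measured against is easy to reproduce. The sketch below computes a compression ratio with Python's zlib on a hypothetical FASTA file; it is independent of the coil package itself:

```python
# Measure the Lempel-Ziv baseline compression ratio on a sequence file;
# "est.fa" is a placeholder for a FASTA file of EST records.
import zlib

with open("est.fa", "rb") as f:
    raw = f.read()

compressed = zlib.compress(raw, 9)  # level 9 = maximum effort
print("compression ratio: %.2f : 1" % (len(raw) / len(compressed)))
```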
BioMart: a data federation framework for large collaborative projects.
Zhang, Junjun; Haider, Syed; Baran, Joachim; Cros, Anthony; Guberman, Jonathan M; Hsu, Jack; Liang, Yong; Yao, Long; Kasprzyk, Arek
2011-01-01
BioMart is a freely available, open source, federated database system that provides a unified access to disparate, geographically distributed data sources. It is designed to be data agnostic and platform independent, such that existing databases can easily be incorporated into the BioMart framework. BioMart allows databases hosted on different servers to be presented seamlessly to users, facilitating collaborative projects between different research groups. BioMart contains several levels of query optimization to efficiently manage large data sets and offers a diverse selection of graphical user interfaces and application programming interfaces to ensure that queries can be performed in whatever manner is most convenient for the user. The software has now been adopted by a large number of different biological databases spanning a wide range of data types and providing a rich source of annotation available to bioinformaticians and biologists alike.
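As a hedged illustration of what a federated BioMart query looks like from a client, the sketch below sends a BioMart XML query document over HTTP. The endpoint, dataset, filter, and attribute names are assumptions modeled on the public Ensembl BioMart deployment, not details taken from this paper:

```python
# Hypothetical BioMart MartService call; URL and names are assumptions.
import urllib.parse
import urllib.request

query = """<?xml version="1.0" encoding="UTF-8"?>
<Query virtualSchemaName="default" formatter="TSV" header="0" uniqueRows="1">
  <Dataset name="hsapiens_gene_ensembl">
    <Filter name="chromosome_name" value="21"/>
    <Attribute name="ensembl_gene_id"/>
    <Attribute name="external_gene_name"/>
  </Dataset>
</Query>"""

url = ("https://www.ensembl.org/biomart/martservice?"
       + urllib.parse.urlencode({"query": query}))
with urllib.request.urlopen(url) as resp:
    print(resp.read().decode()[:500])  # first rows of the TSV result
```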
Characterising droughts in Central America with uncertain hydro-meteorological data
NASA Astrophysics Data System (ADS)
Quesada Montano, B.; Westerberg, I.; Wetterhall, F.; Hidalgo, H. G.; Halldin, S.
2015-12-01
Drought studies are scarce in Central America, a region frequently affected by droughts that cause significant socio-economic and environmental problems. Drought characterisation is important for water management and planning and can be done with the help of drought indices. Many indices have been developed in recent decades, but their ability to suitably characterise droughts depends on the region of application. In Central America, comprehensive and high-quality observational networks of meteorological and hydrological data are not available. This limits the choice of drought indices and underlines the need to evaluate the quality of the data used in their calculation. This paper aimed to find which combination(s) of drought index and meteorological database are most suitable for characterising droughts in Central America. The drought indices evaluated were the standardised precipitation index (SPI), deciles (DI), the standardised precipitation evapotranspiration index (SPEI) and the effective drought index (EDI). These were calculated using precipitation data from the Climate Hazards Group Infra-Red Precipitation with station (CHIRPS), CRN073, the Climate Research Unit (CRU), ERA-Interim and station databases, and temperature data from the CRU database. All the indices were calculated at 1-, 3-, 6-, 9- and 12-month accumulation times. As a first step, the large-scale meteorological precipitation datasets were compared to obtain an overview of the level of agreement between them and to find possible quality problems. Then, the performance of all combinations of drought indices and meteorological datasets was evaluated against independent river discharge data, in the form of the standardised streamflow index (SSI). Results revealed large disagreement between the precipitation datasets; we found the selection of database to be more important than the selection of drought index. We found that the best combinations of meteorological drought index and database were obtained using the SPI and DI, calculated with CHIRPS and station data.
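Of the indices compared, the SPI is the simplest to reproduce: fit a distribution to the accumulated precipitation series and convert the fitted cumulative probabilities to standard normal quantiles. The sketch below shows that transformation with an illustrative gamma fit on invented data; refinements used in practice, such as handling the probability mass at zero rainfall, are omitted:

```python
# Minimal SPI sketch; `precip` stands in for monthly precipitation
# totals already accumulated over the chosen window (e.g., 3 months).
import numpy as np
from scipy import stats

precip = np.array([55.0, 80.2, 12.5, 140.3, 95.1, 30.8,
                   60.7, 110.0, 75.4, 20.1, 88.8, 45.6])

# Fit a gamma distribution with location fixed at zero, as is
# conventional for precipitation amounts.
shape, loc, scale = stats.gamma.fit(precip, floc=0)

# SPI = standard normal quantile of the fitted cumulative probability;
# values below about -1 indicate moderate or worse drought.
spi = stats.norm.ppf(stats.gamma.cdf(precip, shape, loc=loc, scale=scale))
print(np.round(spi, 2))
```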
Evaluating Land-Atmosphere Interactions with the North American Soil Moisture Database
NASA Astrophysics Data System (ADS)
Giles, S. M.; Quiring, S. M.; Ford, T.; Chavez, N.; Galvan, J.
2015-12-01
The North American Soil Moisture Database (NASMD) is a high-quality observational soil moisture database that was developed to study land-atmosphere interactions. It includes over 1,800 monitoring stations in the United States, Canada and Mexico. Soil moisture data are collected from multiple sources, quality controlled and integrated into an online database (soilmoisture.tamu.edu). The period of record varies substantially, and only a few of these stations have an observation record extending back into the 1990s. Daily soil moisture observations have been quality controlled using the North American Soil Moisture Database QAQC algorithm. The database is designed to facilitate observationally-driven investigations of land-atmosphere interactions, validation of the accuracy of soil moisture simulations in global land surface models, satellite calibration/validation for SMOS and SMAP, and an improved understanding of how soil moisture influences climate on seasonal to interannual timescales. This paper provides some examples of how the NASMD has been utilized to enhance understanding of land-atmosphere interactions in the U.S. Great Plains.
A Community Data Model for Hydrologic Observations
NASA Astrophysics Data System (ADS)
Tarboton, D. G.; Horsburgh, J. S.; Zaslavsky, I.; Maidment, D. R.; Valentine, D.; Jennings, B.
2006-12-01
The CUAHSI Hydrologic Information System project is developing information technology infrastructure to support hydrologic science. Hydrologic information science involves the description of hydrologic environments in a consistent way, using data models for information integration. This includes a hydrologic observations data model for the storage and retrieval of hydrologic observations in a relational database, designed to facilitate data retrieval for integrated analysis of information collected by multiple investigators. It is intended to provide a standard format to facilitate the effective sharing of information between investigators and to facilitate analysis of information within a single study area or hydrologic observatory, or across hydrologic observatories and regions. The observations data model is designed to store hydrologic observations and sufficient ancillary information (metadata) about the observations to allow them to be unambiguously interpreted and used, and to provide traceable heritage from raw measurements to usable information. The design is based on the premise that a relational database at the single-observation level is most effective for providing querying capability and cross-dimension data retrieval and analysis. This premise is being tested through the implementation of a prototype hydrologic observations database and the development of web services for the retrieval of data from, and ingestion of data into, the database. These web services, hosted by the San Diego Supercomputer Center, make data in the database accessible both through a Hydrologic Data Access System portal and directly from applications software such as Excel, Matlab and ArcGIS that have Simple Object Access Protocol (SOAP) capability. This paper will (1) describe the data model; (2) demonstrate the capability for representing diverse data in the same database; and (3) demonstrate the use of the database from applications software for performing hydrologic analysis across different observation types.
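The single-observation-level premise can be illustrated with a toy relational schema: one long, narrow table of observation values joined to site and variable metadata. The column names below are illustrative only; they are not the published CUAHSI schema:

```python
# Toy single-observation-per-row schema; names are illustrative.
import sqlite3

conn = sqlite3.connect("observations.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS sites (
    site_id   INTEGER PRIMARY KEY,
    site_name TEXT, latitude REAL, longitude REAL);

CREATE TABLE IF NOT EXISTS variables (
    variable_id INTEGER PRIMARY KEY,
    name TEXT, units TEXT);  -- e.g., 'discharge', 'm^3/s'

-- One row per observation keeps queries and cross-dimension
-- retrieval simple, at the cost of a long, narrow table.
CREATE TABLE IF NOT EXISTS observations (
    value        REAL NOT NULL,
    datetime_utc TEXT NOT NULL,
    site_id      INTEGER REFERENCES sites(site_id),
    variable_id  INTEGER REFERENCES variables(variable_id),
    quality_code TEXT);
""")
conn.commit()
conn.close()
```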
Gagnon, Alain; Smith, Ken R; Tremblay, Marc; Vézina, Hélène; Paré, Paul-Philippe; Desjardins, Bertrand
2009-01-01
Frontier populations provide exceptional opportunities to test the hypothesis of a trade-off between fertility and longevity. In such populations, mechanisms favoring reproduction usually find fertile ground, and if these mechanisms reduce longevity, demographers should observe higher postreproductive mortality among highly fertile women. We test this hypothesis using complete female reproductive histories from three large demographic databases: the Registre de la population du Québec ancien (Université de Montréal), which covers the first centuries of settlement in Quebec; the BALSAC database (Université du Québec à Chicoutimi), including comprehensive records for the Saguenay-Lac-St-Jean (SLSJ) in Quebec in the nineteenth and twentieth centuries; and the Utah Population Database (University of Utah), including all individuals who experienced a vital event on the Mormon Trail and their descendants. Together, the three samples allow for comparisons over time and space, and represent one of the largest sets of natural fertility cohorts used to simultaneously assess reproduction and longevity. Using survival analyses, we found a negative influence of parity and a positive influence of age at last child on postreproductive survival in the three populations, as well as a significant interaction between these two variables. The effect sizes of all these parameters were remarkably similar in the three samples. However, we found little evidence that early fertility affects postreproductive survival. The use of Heckman's procedure assessing the impact of mortality selection during reproductive ages did not appreciably alter these results. We conclude our empirical investigation by discussing the advantages of comparative approaches. © 2009 Wiley-Liss, Inc.
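Survival analyses of this kind are commonly implemented as proportional hazards regressions. The sketch below shows the general shape of such a model using the lifelines package on simulated data whose effect signs mirror the reported findings (higher parity harmful, later last birth protective); the variable names and effect sizes are invented, not taken from the three databases:

```python
# Proportional hazards sketch on simulated postreproductive lifetimes.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 200
parity = rng.integers(1, 15, n)
age_last = rng.normal(38, 4, n)

# Simulate survival in which higher parity raises the hazard and a
# later age at last child lowers it (the signs reported above).
hazard = np.exp(0.05 * parity - 0.03 * (age_last - 38))
years = rng.exponential(30.0 / hazard)

df = pd.DataFrame({"years_postreproductive": years, "died": 1,
                   "parity": parity, "age_at_last_child": age_last})

cph = CoxPHFitter()
cph.fit(df, duration_col="years_postreproductive", event_col="died")
print(cph.summary[["coef", "p"]])
```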
Application of Large-Scale Database-Based Online Modeling to Plant State Long-Term Estimation
NASA Astrophysics Data System (ADS)
Ogawa, Masatoshi; Ogai, Harutoshi
Recently, attention has been drawn to local modeling techniques based on a new idea called “Just-In-Time (JIT) modeling”. To apply JIT modeling online to large databases, “Large-scale database-based Online Modeling (LOM)” has been proposed. LOM is a technique that makes the retrieval of neighboring data more efficient by using both “stepwise selection” and quantization. In order to predict the long-term state of the plant without using future data of manipulated variables, an Extended Sequential Prediction method of LOM (ESP-LOM) has been proposed. In this paper, the LOM and the ESP-LOM are introduced.
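Although the stepwise selection and quantization internals of LOM are not detailed above, the underlying JIT idea, retrieving the neighbors of the current query from a large stored database and fitting a small local model on demand, can be sketched as follows; the data and the KD-tree index are illustrative stand-ins for LOM's retrieval machinery:

```python
# Just-In-Time (local) modeling sketch: neighbor retrieval + local fit.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (5000, 3))          # stored plant-state vectors
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2   # stored outputs

tree = cKDTree(X)  # index the database once

def jit_predict(x_query, k=25):
    """Fit a local linear model on the k nearest stored samples."""
    _, idx = tree.query(x_query, k=k)
    A = np.column_stack([X[idx], np.ones(k)])
    coef, *_ = np.linalg.lstsq(A, y[idx], rcond=None)
    return np.append(x_query, 1.0) @ coef

print(jit_predict(np.array([0.2, -0.3, 0.5])))
```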
Challenges in the automated classification of variable stars in large databases
NASA Astrophysics Data System (ADS)
Graham, Matthew; Drake, Andrew; Djorgovski, S. G.; Mahabal, Ashish; Donalek, Ciro
2017-09-01
With ever-increasing numbers of astrophysical transient surveys, new facilities and archives of astronomical time series, time domain astronomy is emerging as a mainstream discipline. However, the sheer volume of data alone - hundreds of observations for hundreds of millions of sources - necessitates advanced statistical and machine learning methodologies for scientific discovery: characterization, categorization, and classification. Whilst these techniques are slowly entering the astronomer's toolkit, their application to astronomical problems is not without its issues. In this paper, we will review some of the challenges posed by trying to identify variable stars in large data collections, including appropriate feature representations, dealing with uncertainties, establishing ground truths, and simple discrete classes.
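A typical pipeline reduces each light curve to a feature vector and trains a supervised classifier on labeled examples. The sketch below shows the scikit-learn skeleton of such an experiment with invented features and labels; a real application would add the feature engineering, uncertainty handling, and ground-truth curation discussed above:

```python
# Feature-based classification skeleton with invented data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n = 600
# Placeholder features: period (days), amplitude (mag), light-curve skew.
X = np.column_stack([rng.lognormal(0.0, 1.0, n),
                     rng.uniform(0.05, 2.0, n),
                     rng.normal(0.0, 1.0, n)])
labels = rng.choice(["RR Lyrae", "Cepheid", "eclipsing"], n)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
# With random labels, as here, accuracy should hover near chance (~0.33);
# informative features would lift it well above that.
print(cross_val_score(clf, X, labels, cv=5).mean())
```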
Evolution of the Tropical Cyclone Integrated Data Exchange And Analysis System (TC-IDEAS)
NASA Technical Reports Server (NTRS)
Turk, J.; Chao, Y.; Haddad, Z.; Hristova-Veleva, S.; Knosp, B.; Lambrigtsen, B.; Li, P.; Licata, S.; Poulsen, W.; Su, H.;
2010-01-01
The Tropical Cyclone Integrated Data Exchange and Analysis System (TC-IDEAS) is being jointly developed by the Jet Propulsion Laboratory (JPL) and the Marshall Space Flight Center (MSFC) as part of NASA's Hurricane Science Research Program. The long-term goal is to create a comprehensive tropical cyclone database of satellite and airborne observations, in-situ measurements and model simulations containing parameters that pertain to the thermodynamic and microphysical structure of the storms; the air-sea interaction processes; and the large-scale environment.
Mining the Galaxy Zoo Database: Machine Learning Applications
NASA Astrophysics Data System (ADS)
Borne, Kirk D.; Wallin, J.; Vedachalam, A.; Baehr, S.; Lintott, C.; Darg, D.; Smith, A.; Fortson, L.
2010-01-01
The new Zooniverse initiative is addressing the data flood in the sciences through a transformative partnership between professional scientists, volunteer citizen scientists, and machines. As part of this project, we are exploring the application of machine learning techniques to data mining problems associated with the large and growing database of volunteer science results gathered by the Galaxy Zoo citizen science project. We will describe the basic challenge, some machine learning approaches, and early results. One of the motivators for this study is the acquisition (through the Galaxy Zoo results database) of approximately 100 million classification labels for roughly one million galaxies, yielding a tremendously large and rich set of training examples for improving automated galaxy morphological classification algorithms. In our first case study, the goal is to learn which morphological and photometric features in the Sloan Digital Sky Survey (SDSS) database correlate most strongly with user-selected galaxy morphological class. As a corollary to this study, we are also aiming to identify which galaxy parameters in the SDSS database correspond to galaxies that have been the most difficult to classify (based upon large dispersion in their volunteer-provided classifications). Our second case study will focus on similar data mining analyses and machine learning algorithms applied to the Galaxy Zoo catalog of merging and interacting galaxies. The outcomes of this project will have applications in future large sky surveys, such as the LSST (Large Synoptic Survey Telescope) project, which will generate a catalog of 20 billion galaxies and will produce an additional astronomical alert database of approximately 100 thousand events each night for 10 years -- the capabilities and algorithms that we are exploring will assist in the rapid characterization and classification of such massive data streams. This research has been supported in part through NSF award #0941610.
Release of (and lessons learned from mining) a pioneering large toxicogenomics database.
Sandhu, Komal S; Veeramachaneni, Vamsi; Yao, Xiang; Nie, Alex; Lord, Peter; Amaratunga, Dhammika; McMillian, Michael K; Verheyen, Geert R
2015-07-01
We release the Janssen Toxicogenomics database. This rat liver gene-expression database was generated using Codelink microarrays, and has been used over the past years within Janssen to derive signatures for multiple end points and to classify proprietary compounds. The release consists of gene-expression responses to 124 compounds, selected to give a broad coverage of liver-active compounds. A selection of the compounds was also analyzed on Affymetrix microarrays. The release includes results of an in-house reannotation pipeline to Entrez gene annotations, used to classify probes into different confidence classes. High-confidence, unambiguously annotated probes were used to create gene-level data, which served as a starting point for cross-platform comparisons. Connectivity-map-based similarity methods show excellent agreement between Codelink and Affymetrix runs of the same samples. We also compared our dataset with the Japanese Toxicogenomics Project and observed reasonable agreement, especially for compounds with stronger gene signatures. We describe an R package containing the gene-level data and show how it can be used for expression-based similarity searches. When the same biological samples are run on the Affymetrix and Codelink platforms, good correspondence is observed using connectivity mapping approaches. As expected, this correspondence is smaller when the data are compared with an independent dataset such as TG-GATE. We hope that this collection of gene-expression profiles will be incorporated into toxicogenomics pipelines of users.
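The cross-platform comparisons described rest on rank-based similarity between expression signatures. A bare-bones version is a rank correlation between two profiles of the same samples, sketched below on simulated data; the full connectivity-map scoring with up/down tag sets and Kolmogorov-Smirnov statistics is omitted:

```python
# Rank-based similarity between two platforms' expression profiles.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(3)
codelink = rng.normal(size=1000)  # simulated log-ratios, platform A
affymetrix = codelink + rng.normal(scale=0.3, size=1000)  # same samples, platform B

rho, p = spearmanr(codelink, affymetrix)
print(f"rank correlation between platforms: {rho:.2f} (p = {p:.1e})")
```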
Dione's resurfacing history as determined from a global impact crater database
NASA Astrophysics Data System (ADS)
Kirchoff, Michelle R.; Schenk, Paul
2015-08-01
Saturn's moon Dione has an interesting and unique resurfacing history recorded by the impact craters on its surface. In order to further resolve this history, we compile a crater database that is nearly global for diameters (D) equal to and larger than 4 km, using standard techniques and Cassini Imaging Science Subsystem images. From this database, spatial crater density maps for different diameter ranges are generated. These maps, along with the observed surface morphology, have been used to define seven terrain units for Dione, including refinement of the smooth and "wispy" (or faulted) units from Voyager observations. Analysis of the terrains' crater size-frequency distributions (SFDs) indicates that: (1) removal of D ≈ 4-50 km craters in the "wispy" terrain was most likely by the formation of D ≳ 50 km craters, not faulting, and likely occurred over a couple of billion years; (2) resurfacing of the smooth plains was most likely by cryovolcanism at ∼2 Ga; (3) most of Dione's largest craters (D ⩾ 100 km), including Evander (D = 350 km), may have formed quite recently (<2 Ga), but are still relaxed, indicating Dione has been thermally active for at least half its history; and (4) the variation in crater SFDs at D ≈ 4-15 km is plausibly due to different levels of minor resurfacing (mostly subsequent large impacts) within each terrain.
Evaluating the Impact of Database Heterogeneity on Observational Study Results
Madigan, David; Ryan, Patrick B.; Schuemie, Martijn; Stang, Paul E.; Overhage, J. Marc; Hartzema, Abraham G.; Suchard, Marc A.; DuMouchel, William; Berlin, Jesse A.
2013-01-01
Clinical studies that use observational databases to evaluate the effects of medical products have become commonplace. Such studies begin by selecting a particular database, a decision that published papers invariably report but do not discuss. Studies of the same issue in different databases, however, can and do generate different results, sometimes with strikingly different clinical implications. In this paper, we systematically study heterogeneity among databases, holding other study methods constant, by exploring relative risk estimates for 53 drug-outcome pairs and 2 widely used study designs (cohort studies and self-controlled case series) across 10 observational databases. When holding the study design constant, our analysis shows that estimated relative risks range from a statistically significant decreased risk to a statistically significant increased risk in 11 of 53 (21%) of drug-outcome pairs that use a cohort design and 19 of 53 (36%) of drug-outcome pairs that use a self-controlled case series design. This exceeds the proportion of pairs that were consistent across databases in both direction and statistical significance, which was 9 of 53 (17%) for cohort studies and 5 of 53 (9%) for self-controlled case series. Our findings show that clinical studies that use observational databases can be sensitive to the choice of database. More attention is needed to consider how the choice of data source may be affecting results. PMID:23648805
Using Large-Scale Databases in Evaluation: Advances, Opportunities, and Challenges
ERIC Educational Resources Information Center
Penuel, William R.; Means, Barbara
2011-01-01
Major advances in the number, capabilities, and quality of state, national, and transnational databases have opened up new opportunities for evaluators. Both large-scale data sets collected for administrative purposes and those collected by other researchers can provide data for a variety of evaluation-related activities. These include (a)…
Improving the Scalability of an Exact Approach for Frequent Item Set Hiding
ERIC Educational Resources Information Center
LaMacchia, Carolyn
2013-01-01
Technological advances have led to the generation of large databases of organizational data recognized as an information-rich, strategic asset for internal analysis and sharing with trading partners. Data mining techniques can discover patterns in large databases including relationships considered strategically relevant to the owner of the data.…
Reflections on CD-ROM: Bridging the Gap between Technology and Purpose.
ERIC Educational Resources Information Center
Saviers, Shannon Smith
1987-01-01
Provides a technological overview of CD-ROM (Compact Disc-Read Only Memory), an optically-based medium for data storage offering large storage capacity, computer-based delivery system, read-only medium, and economic mass production. CD-ROM database attributes appropriate for information delivery are also reviewed, including large database size,…
ERIC Educational Resources Information Center
Rice, Michael; Gladstone, William; Weir, Michael
2004-01-01
We discuss how relational databases constitute an ideal framework for representing and analyzing large-scale genomic data sets in biology. As a case study, we describe a Drosophila splice-site database that we recently developed at Wesleyan University for use in research and teaching. The database stores data about splice sites computed by a…
Cost and cost-effectiveness studies in urologic oncology using large administrative databases.
Wang, Ye; Mossanen, Matthew; Chang, Steven L
2018-04-01
Urologic cancers are not only among the most common types of cancers, but also among the most expensive cancers to treat in the United States. This study aimed to review the use of cost-effectiveness analyses (CEAs) and other cost analyses in urologic oncology using large databases to better understand the value of management strategies for these cancers. We performed a literature review of CEAs and other cost analyses in urologic oncology using large databases. The options for and costs of diagnosing, treating, and following patients with urologic cancers can be expected to rise in the coming years. There are numerous opportunities in each urologic cancer to use CEAs to both lower costs and provide high-quality services. Improved cancer care must balance the integration of novelty with ensuring reasonable costs to patients and the health care system. With the increasing focus on cost containment, appreciating the value of competing strategies in caring for our patients is pivotal. Leveraging methods such as CEAs and harnessing large databases may help evaluate the merit of established or emerging strategies. Copyright © 2018 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Castillo, Richard; Castillo, Edward; Fuentes, David; Ahmad, Moiz; Wood, Abbie M.; Ludwig, Michelle S.; Guerrero, Thomas
2013-05-01
Landmark point-pairs provide a strategy to assess deformable image registration (DIR) accuracy in terms of the spatial registration of the underlying anatomy depicted in medical images. In this study, we propose to augment a publicly available database (www.dir-lab.com) of medical images with large sets of manually identified anatomic feature pairs between breath-hold computed tomography (BH-CT) images for DIR spatial accuracy evaluation. Ten BH-CT image pairs were randomly selected from the COPDgene study cases. Each patient had received CT imaging of the entire thorax in the supine position at one-fourth dose normal expiration and maximum effort full dose inspiration. Using dedicated in-house software, an imaging expert manually identified large sets of anatomic feature pairs between images. Estimates of inter- and intra-observer spatial variation in feature localization were determined by repeat measurements of multiple observers over subsets of randomly selected features. In total, 7298 anatomic landmark features were manually paired between the 10 sets of images. The number of feature pairs per case ranged from 447 to 1172. Average 3D Euclidean landmark displacements varied substantially among cases, ranging from 12.29 (SD: 6.39) to 30.90 (SD: 14.05) mm. Repeat registration of uniformly sampled subsets of 150 landmarks for each case yielded estimates of observer localization error, which ranged on average from 0.58 (SD: 0.87) to 1.06 (SD: 2.38) mm per case. The additions to the online web database (www.dir-lab.com) described in this work will broaden the applicability of the reference data, providing a freely available common dataset for targeted critical evaluation of DIR spatial accuracy performance in multiple clinical settings. Estimates of observer variance in feature localization suggest consistent spatial accuracy for all observers across both four-dimensional CT and COPDgene patient cohorts.
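The displacement statistics quoted above follow directly from paired landmark coordinates. The sketch below shows the computation with numpy on invented coordinates shaped like one of the cases (447 landmark pairs):

```python
# Mean and SD of 3D Euclidean landmark displacements (invented data).
import numpy as np

rng = np.random.default_rng(4)
expiration = rng.uniform(0, 300, (447, 3))                # x, y, z in mm
inspiration = expiration + rng.normal(15, 6, (447, 3))    # displaced pairs

disp = np.linalg.norm(inspiration - expiration, axis=1)
print(f"mean: {disp.mean():.2f} mm (SD: {disp.std(ddof=1):.2f})")
```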
NASA Astrophysics Data System (ADS)
Benson, Robert F.; Fainberg, Joseph; Osherovich, Vladimir A.; Truhlik, Vladimir; Wang, Yongli; Bilitza, Dieter; Fung, Shing F.
2016-05-01
Large magnetic-storm-induced changes were detected in high-latitude topside vertical electron density profiles Ne(h) in a database of profiles and digital topside ionograms, from the International Satellites for Ionospheric Studies (ISIS) program, that enabled Ne(h) profiles to be obtained in nearly the same region of space before, during, and after a major magnetic storm (Dst < -100 nT). Storms where Ne(h) profiles were available in the high-latitude Northern Hemisphere had better coverage of solar wind parameters than storms with available Ne(h) profiles in the high-latitude Southern Hemisphere. Large Ne(h) changes were observed during all storms, with enhancements and depletions sometimes near a factor of 10 and 0.1, respectively, but with substantial differences in the responses in the two hemispheres. Large spatial and/or temporal Ne(h) changes were often observed during Dst minimum and during the storm recovery phase. The storm-induced Ne(h) changes were the most pronounced and consistent in the Northern Hemisphere in that large enhancements were observed during winter nighttime and large depletions during winter and spring daytime. The limited available cases suggested that these Northern Hemisphere enhancements increased with increases of the time-shifted solar wind velocity v, magnetic field B, and with more negative values of the B components except for the highest common altitude (1100 km) of the profiles. There was also some evidence suggesting that the Northern Hemisphere depletions were related to changes in the solar wind parameters. Southern Hemisphere storm-induced enhancements and depletions were typically considerably less with depletions observed during summer nighttime conditions and enhancements during summer daytime and fall nighttime conditions.
LSD: Large Survey Database framework
NASA Astrophysics Data System (ADS)
Juric, Mario
2012-09-01
The Large Survey Database (LSD) is a Python framework and DBMS for distributed storage, cross-matching and querying of large survey catalogs (>10^9 rows, >1 TB). The primary driver behind its development is the analysis of Pan-STARRS PS1 data. It is specifically optimized for fast queries and parallel sweeps of positionally and temporally indexed datasets. It transparently scales to more than 10^2 nodes, and can be made to function in "shared nothing" architectures.
Footprint Database and web services for the Herschel space observatory
NASA Astrophysics Data System (ADS)
Verebélyi, Erika; Dobos, László; Kiss, Csaba
2015-08-01
Using all telemetry and observational meta-data, we created a searchable database of Herschel observation footprints. Data from the Herschel space observatory is freely available for everyone but no uniformly processed catalog of all observations has been published yet. As a first step, we unified the data model for all three Herschel instruments in all observation modes and compiled a database of sky coverage information. As opposed to methods using a pixellation of the sphere, in our database, sky coverage is stored in exact geometric form allowing for precise area calculations. Indexing of the footprints allows for very fast search among observations based on pointing, time, sky coverage overlap and meta-data. This enables us, for example, to find moving objects easily in Herschel fields. The database is accessible via a web site and also as a set of REST web service functions which makes it usable from program clients like Python or IDL scripts. Data is available in various formats including Virtual Observatory standards.
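As a purely hypothetical illustration of the REST access pattern described, the sketch below issues a cone-search request from a Python client; the base URL, path, and parameter names are placeholders, not the service's actual API:

```python
# Hypothetical footprint cone search; endpoint and parameters invented.
import json
import urllib.parse
import urllib.request

params = {"ra": 83.822, "dec": -5.391, "radius": 0.5, "format": "json"}
url = "http://herschel.vo.example/ws/search?" + urllib.parse.urlencode(params)
with urllib.request.urlopen(url) as resp:
    footprints = json.load(resp)
print(len(footprints), "observation footprints overlap the search cone")
```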
Systematic Observations of the Slip-pulse Properties of Large Earthquake Ruptures
NASA Astrophysics Data System (ADS)
Melgar, D.; Hayes, G. P.
2017-12-01
In earthquake dynamics there are two end-member models of rupture: propagating cracks and self-healing pulses. These arise from different properties of ruptures and have implications for seismic hazard; rupture mode controls near-field strong ground motions. Past studies favor the pulse-like mode of rupture; however, due to a variety of limitations, it has proven difficult to systematically establish their kinematic properties. Here we synthesize observations from a database of >150 rupture models of earthquakes spanning M7-M9, processed in a uniform manner, and show that the magnitude scaling properties (rise time, pulse width, and peak slip rate) of these slip pulses indicate self-similarity. Self-similarity suggests a weak form of rupture determinism, where early on in the source process broader, higher-amplitude slip pulses distinguish between events of increasing magnitude. Indeed, we find by analyzing the moment rate functions that large and very large events are statistically distinguishable relatively early (at 15 seconds) in the rupture process. This suggests that with dense regional geophysical networks, strong ground motions from a large rupture can be identified before their onset across the source region.
TAPAS, a VO archive at the IRAM 30-m telescope
NASA Astrophysics Data System (ADS)
Leon, Stephane; Espigares, Victor; Ruíz, José Enrique; Verdes-Montenegro, Lourdes; Mauersberger, Rainer; Brunswig, Walter; Kramer, Carsten; Santander-Vela, Juan de Dios; Wiesemeyer, Helmut
2012-07-01
Astronomical observatories today generate increasingly large volumes of data. For efficient use of these data, databases have been built following the standards proposed by the International Virtual Observatory Alliance (IVOA), providing a common protocol to query them and make them interoperable. The IRAM 30-m radio telescope, located in Sierra Nevada (Granada, Spain), is a millimeter-wavelength telescope with a constantly renewed, extensive choice of instruments, capable of covering the frequency range between 80 and 370 GHz. It continuously produces a large amount of data thanks to the more than 200 scientific projects observed each year. The TAPAS archive at the IRAM 30-m telescope aims to provide public access to the headers describing the observations performed with the telescope, according to a defined data policy, while also making the technical data available to IRAM staff members. Special emphasis has been placed on making it Virtual Observatory (VO) compliant, and on offering a VO-compliant web interface that makes the information available to the scientific community. TAPAS is built using the Django Python framework on top of a relational MySQL database, and is fully integrated with the telescope control system. The TAPAS data model (DM) is based on the Radio Astronomical DAta Model for Single dish radio telescopes (RADAMS), to allow for easy integration into the VO infrastructure. A metadata modeling layer is used by the data-filler to allow an implementation free from assumptions about the control system and the underlying database. TAPAS and its public web interface (
Designing Reliable Cohorts of Cardiac Patients across MIMIC and eICU
Chronaki, Catherine; Shahin, Abdullah; Mark, Roger
2016-01-01
The design of the patient cohort is an essential and fundamental part of any clinical patient study. Knowledge of the Electronic Health Records, the underlying Database Management System, and the relevant clinical workflows is central to an effective cohort design. However, with technical, semantic, and organizational interoperability limitations, the database queries associated with a patient cohort may need to be reconfigured in every participating site. i2b2 and SHRINE advance the notion of patient cohorts as first-class objects to be shared, aggregated, and recruited for research purposes across clinical sites. This paper reports on initial efforts to assess the integration of the Medical Information Mart for Intensive Care (MIMIC) and Philips eICU, two large-scale anonymized intensive care unit (ICU) databases, using standard terminologies, i.e. LOINC, ICD9-CM and SNOMED-CT. The focus of this work is on lab and microbiology observations and key demographics for patients with a primary cardiovascular ICD9-CM diagnosis. Results and discussion, reflecting on reference core terminology standards, offer insights on efforts to combine detailed intensive care data from multiple ICUs worldwide. PMID:27774488
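As a hedged illustration of such a cohort definition, the sketch below selects admissions whose primary diagnosis falls in the ICD-9-CM circulatory chapter (codes 390-459). The table and column names mimic MIMIC-III's public schema (diagnoses_icd with subject_id, hadm_id, seq_num, icd9_code) but should be verified against the actual release, and the string range test is a simplification of proper ICD-9 code handling:

```python
# Cohort selection sketch; assumes MIMIC-like tables are already
# loaded into mimic_like.db (a placeholder database file).
import sqlite3

conn = sqlite3.connect("mimic_like.db")
cohort = conn.execute("""
    SELECT subject_id, hadm_id, icd9_code
    FROM diagnoses_icd
    WHERE seq_num = 1                          -- primary diagnosis only
      AND icd9_code BETWEEN '390' AND '459'    -- simplified string range
""").fetchall()
print(len(cohort), "admissions in the cardiovascular cohort")
```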
Australia's continental-scale acoustic tracking database and its automated quality control process
NASA Astrophysics Data System (ADS)
Hoenner, Xavier; Huveneers, Charlie; Steckenreuter, Andre; Simpfendorfer, Colin; Tattersall, Katherine; Jaine, Fabrice; Atkins, Natalia; Babcock, Russ; Brodie, Stephanie; Burgess, Jonathan; Campbell, Hamish; Heupel, Michelle; Pasquer, Benedicte; Proctor, Roger; Taylor, Matthew D.; Udyawer, Vinay; Harcourt, Robert
2018-01-01
Our ability to predict species responses to environmental changes relies on accurate records of animal movement patterns. Continental-scale acoustic telemetry networks are increasingly being established worldwide, producing large volumes of information-rich geospatial data. During the last decade, the Integrated Marine Observing System's Animal Tracking Facility (IMOS ATF) established a permanent array of acoustic receivers around Australia. Simultaneously, IMOS developed a centralised national database to foster collaborative research across the user community and quantify individual behaviour across a broad range of taxa. Here we present the database and quality control procedures developed to collate 49.6 million valid detections from 1891 receiving stations. This dataset consists of detections for 3,777 tags deployed on 117 marine species, with distances travelled ranging from a few to thousands of kilometres. Connectivity between regions was only made possible by the joint contribution of IMOS infrastructure and researcher-funded receivers. This dataset constitutes a valuable resource facilitating meta-analysis of animal movement, distributions, and habitat use, and is important for relating species distribution shifts with environmental covariates.
The dye-sensitized solar cell database.
Venkatraman, Vishwesh; Raju, Rajesh; Oikonomopoulos, Solon P; Alsberg, Bjørn K
2018-04-03
Dye-sensitized solar cells (DSSCs) have garnered a lot of attention in recent years. The solar energy to power conversion efficiency of a DSSC is influenced by various components of the cell, such as the dye, electrolyte, electrodes and additives, among others, leading to varying experimental configurations. A large number of metal-based and metal-free dye sensitizers have now been reported, and tools using such data to indicate new directions for design and development are on the rise. DSSCDB, the first of its kind dye-sensitized solar cell database, aims to provide users with up-to-date information from publications on the molecular structures of the dyes, experimental details and reported measurements (efficiencies and spectral properties), and thereby facilitate a comprehensive and critical evaluation of the data. Currently, the DSSCDB contains over 4000 experimental observations spanning multiple dye classes, such as triphenylamines, carbazoles, coumarins, phenothiazines, ruthenium and porphyrins. The DSSCDB offers a web-based, comprehensive source of property data for dye-sensitized solar cells. Access to the database is available through the following URL: www.dyedb.com.
Affective norms for 720 French words rated by children and adolescents (FANchild).
Monnier, Catherine; Syssau, Arielle
2017-10-01
FANchild (French Affective Norms for Children) provides norms of valence and arousal for a large corpus of French words (N = 720) rated by 908 French children and adolescents (ages 7, 9, 11, and 13). The ratings were made using the Self-Assessment Manikin (Lang, 1980). Because it combines evaluations of arousal and valence and includes ratings provided by 7-, 9-, 11-, and 13-year-olds, this database complements and extends existing French-language databases. Good response reliability was observed in each of the four age groups. Despite a significant level of consensus, we found age differences in both the valence and arousal ratings: Seven- and 9-year-old children gave higher mean valence and arousal ratings than did the other age groups. Moreover, the tendency to judge words positively (i.e., positive bias) decreased with age. This age- and sex-related database will enable French-speaking researchers to study how the emotional character of words influences their cognitive processing, and how this influence evolves with age. FANchild is available at https://www.researchgate.net/profile/Catherine_Monnier/contributions.
The Arctic Observing Viewer: A Web-mapping Application for U.S. Arctic Observing Activities
NASA Astrophysics Data System (ADS)
Cody, R. P.; Manley, W. F.; Gaylord, A. G.; Kassin, A.; Villarreal, S.; Barba, M.; Dover, M.; Escarzaga, S. M.; Habermann, T.; Kozimor, J.; Score, R.; Tweedie, C. E.
2015-12-01
Although a great deal of progress has been made with various arctic observing efforts, it can be difficult to assess such progress when so many agencies, organizations, research groups and others are making such rapid progress over such a large expanse of the Arctic. To help meet the strategic needs of the U.S. SEARCH-AON program and facilitate the development of SAON and other related initiatives, the Arctic Observing Viewer (AOV; http://ArcticObservingViewer.org) has been developed. This web mapping application compiles detailed information pertaining to U.S. Arctic observing efforts. Contributing partners include the U.S. NSF, USGS, ACADIS, ADIwg, AOOS, a2dc, AON, ARMAP, BAID, IASOA, INTERACT, and others. Over 7700 observation sites are currently in the AOV database, and the application allows users to visualize, navigate, select, perform advanced searches, draw, print, and more. During 2015, the web mapping application was enhanced by the addition of a query builder that allows users to create rich and complex queries. AOV is founded on principles of software and data interoperability and includes an emerging "Project" metadata standard, which uses ISO 19115-1 and compatible web services. Substantial efforts have focused on maintaining and centralizing all database information. In order to keep up with emerging technologies, the AOV data set has been structured and centralized within a relational database, and the application front-end has been ported to HTML5 to enable mobile access. Other application enhancements include an embedded Apache Solr search platform, which provides users with the capability to perform advanced searches, and a web-based administrative data management system that allows administrators to add, update, and delete information in real time. We encourage all collaborators to use AOV tools and services for their own purposes and to help us extend the impact of our efforts and ensure AOV complements other cyber-resources. Reinforcing dispersed but interoperable resources in this way will help to ensure improved capacities for conducting activities such as assessing the status of arctic observing efforts, optimizing logistic operations, and quickly accessing external and project-focused web resources for more detailed information and access to scientific data and derived products.
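The advanced searches mentioned are served by the embedded Solr index, which exposes the standard Solr select API with q, fq, rows, and wt parameters. The sketch below shows such a query from Python; the host, core name, and field names are hypothetical:

```python
# Querying a Solr core over HTTP; host, core, and fields are invented.
import json
import urllib.parse
import urllib.request

params = {
    "q": "observation_type:permafrost",                 # fielded query
    "fq": "region:Alaska AND start_year:[2010 TO *]",   # filter queries
    "rows": 20,
    "wt": "json",
}
url = "http://localhost:8983/solr/aov/select?" + urllib.parse.urlencode(params)
with urllib.request.urlopen(url) as resp:
    docs = json.load(resp)["response"]["docs"]
print(len(docs), "matching observation sites")
```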
2013-01-01
This evidence-based analysis reviews relational and management continuity of care. Relational continuity refers to the duration and quality of the relationship between the care provider and the patient. Management continuity ensures that patients receive coherent, complementary, and timely care. There are 4 components of continuity of care: duration, density, dispersion, and sequence. The objective of this evidence-based analysis was to determine if continuity of care is associated with decreased health resource utilization, improved patient outcomes, and patient satisfaction. MEDLINE, EMBASE, CINAHL, the Cochrane Library, and the Centre for Reviews and Dissemination database were searched for studies on continuity of care and chronic disease published from January 2002 until December 2011. Systematic reviews, randomized controlled trials, and observational studies were eligible if they assessed continuity of care in adults and reported health resource utilization, patient outcomes, or patient satisfaction. Eight systematic reviews and 13 observational studies were identified. The reviews concluded that there is an association between continuity of care and outcomes; however, the literature base is weak. The observational studies found that higher continuity of care was frequently associated with fewer hospitalizations and emergency department visits. Three systematic reviews reported that higher continuity of care is associated with improved patient satisfaction, especially among patients with chronic conditions. Most of the studies were retrospective cross-sectional studies of large administrative databases. The databases do not capture information on trust and confidence in the provider, which is a critical component of relational continuity of care. The definitions for the selection of patients from the databases varied across studies. In summary: there is low-quality evidence that higher continuity of care is associated with decreased health service utilization; there is insufficient evidence on the relationship of continuity of care with disease-specific outcomes; and there is an association between high continuity of care and patient satisfaction, particularly among patients with chronic diseases.
Assignment to database industry
NASA Astrophysics Data System (ADS)
Abe, Kohichiroh
Various kinds of databases are considered an essential part of future large-scale systems. Information provision by databases alone is also expected to grow as the market matures. This paper discusses how these circumstances have arisen and how they will develop from now on.
The MPI Emotional Body Expressions Database for Narrative Scenarios
Volkova, Ekaterina; de la Rosa, Stephan; Bülthoff, Heinrich H.; Mohler, Betty
2014-01-01
Emotion expression in human-human interaction takes place via various types of information, including body motion. Research on the perceptual-cognitive mechanisms underlying the processing of natural emotional body language can benefit greatly from datasets of natural emotional body expressions that facilitate stimulus manipulation and analysis. The existing databases have so far focused on few emotion categories which display predominantly prototypical, exaggerated emotion expressions. Moreover, many of these databases consist of video recordings which limit the ability to manipulate and analyse the physical properties of these stimuli. We present a new database consisting of a large set (over 1400) of natural emotional body expressions typical of monologues. To achieve close-to-natural emotional body expressions, amateur actors were narrating coherent stories while their body movements were recorded with motion capture technology. The resulting 3-dimensional motion data recorded at a high frame rate (120 frames per second) provides fine-grained information about body movements and allows the manipulation of movement on a body joint basis. For each expression it gives the positions and orientations in space of 23 body joints for every frame. We report the results of physical motion properties analysis and of an emotion categorisation study. The reactions of observers from the emotion categorisation study are included in the database. Moreover, we recorded the intended emotion expression for each motion sequence from the actor to allow for investigations regarding the link between intended and perceived emotions. The motion sequences along with the accompanying information are made available in a searchable MPI Emotional Body Expression Database. We hope that this database will enable researchers to study expression and perception of naturally occurring emotional body expressions in greater depth. PMID:25461382
The statistical power to detect cross-scale interactions at macroscales
Wagner, Tyler; Fergus, C. Emi; Stow, Craig A.; Cheruvelil, Kendra S.; Soranno, Patricia A.
2016-01-01
Macroscale studies of ecological phenomena are increasingly common because stressors such as climate and land-use change operate at large spatial and temporal scales. Cross-scale interactions (CSIs), where ecological processes operating at one spatial or temporal scale interact with processes operating at another scale, have been documented in a variety of ecosystems and contribute to complex system dynamics. However, studies investigating CSIs are often dependent on compiling multiple data sets from different sources to create multithematic, multiscaled data sets, which results in structurally complex, and sometimes incomplete data sets. The statistical power to detect CSIs needs to be evaluated because of their importance and the challenge of quantifying CSIs using data sets with complex structures and missing observations. We studied this problem using a spatially hierarchical model that measures CSIs between regional agriculture and its effects on the relationship between lake nutrients and lake productivity. We used an existing large multithematic, multiscaled database, the LAke multiscaled GeOSpatial and temporal database (LAGOS), to parameterize the power analysis simulations. We found that the power to detect CSIs was more strongly related to the number of regions in the study rather than the number of lakes nested within each region. CSI power analyses will not only help ecologists design large-scale studies aimed at detecting CSIs, but will also focus attention on CSI effect sizes and the degree to which they are ecologically relevant and detectable with large data sets.
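The design question studied, regions versus lakes nested within regions, can be explored with a small simulation: generate data containing a known cross-scale interaction, fit a multilevel model, and record how often the interaction is detected. The sketch below shows one replicate with invented variable names and effect sizes; a full power analysis would loop this over many replicates and design configurations:

```python
# One replicate of a CSI power simulation (invented effect sizes).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n_regions, lakes_per_region = 40, 25
region = np.repeat(np.arange(n_regions), lakes_per_region)
agri = np.repeat(rng.uniform(0, 1, n_regions), lakes_per_region)  # regional covariate
nutrients = rng.normal(size=region.size)                          # lake covariate

# The CSI: regional agriculture steepens the lake-level nutrient slope.
productivity = (1.0 + 0.5 * nutrients + 0.3 * agri
                + 0.4 * nutrients * agri
                + np.repeat(rng.normal(0, 0.3, n_regions), lakes_per_region)
                + rng.normal(0, 0.5, region.size))

df = pd.DataFrame({"region": region, "agri": agri,
                   "nutrients": nutrients, "productivity": productivity})
fit = smf.mixedlm("productivity ~ nutrients * agri", df, groups="region").fit()
# Power = fraction of replicates in which this p-value falls below 0.05.
print(fit.pvalues["nutrients:agri"])
```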
Astronomical database and VO-tools of Nikolaev Astronomical Observatory
NASA Astrophysics Data System (ADS)
Mazhaev, A. E.; Protsyuk, Yu. I.
2010-05-01
Results of work in 2006-2009 on the creation of astronomical databases, aimed at the development of the Nikolaev Virtual Observatory (NVO), are presented. The databases include the results of observations, and of their reduction, obtained throughout the history of the Nikolaev Astronomical Observatory (NAO). The databases may be considered a basis for the construction of a data centre. Images of different regions of the celestial sphere have been stored at NAO since 1929. About 8000 photographic plates were obtained during observations in the 20th century. CCD observations have been carried out since 1996. Annually, the telescopes of NAO, using CCD cameras, produce several tens of gigabytes (GB) of CCD images and up to 100 GB of video records. At the end of 2008, the volume of accumulated CCD images was about 300 GB. Problems of data volume growth are common to astronomy, nuclear physics and bioinformatics; the astronomical community therefore needs archives, databases and distributed grid computing to cope with this problem. The International Virtual Observatory Alliance (IVOA) was formed in June 2002 with a mission to "enable the international utilization of astronomical archives..." The NVO was created at the NAO website in 2008 and consists of three main parts. The first part contains 27 astrometric stellar catalogues with short descriptions. The catalogue files were compiled in the standard VOTable format using eXtensible Markup Language (XML), and they are available for downloading. This is an example of a so-called science-ready product. The VOTable format was developed by the IVOA for the exchange of tabular data. A user may download these catalogues and open them using any standalone application that supports IVOA standards. There are several directions of development for such applications, for example: search of catalogues and images, search and visualisation of spectra, spectral energy distribution (SED) building, search for cross-correlations between objects in different catalogues, statistical processing of large data volumes, etc. The second part includes the database of observations accumulated at NAO, with access via a browser. The database has a common interface for searching textual and graphical information concerning photographic and CCD observations. The database contains textual information about 7437 plates as well as 2700 preview images in JPEG format with a resolution of 300 DPI (dots per inch), and textual information about 16660 CCD frames as well as 1100 preview images in JPEG format. Missing preview images will be added to the database as they become ready after plate scanning and CCD frame processing. The user has to define the equatorial coordinates of the search centre, a search radius and a period of observations. He or she may also specify additional filters, such as any combination of objects given separately for plates and CCD frames, output parameters for plates, and telescope names for CCD observations. Search results are generated in the form of two tables, for photographic and CCD observations. To obtain access to the source images in FITS format with support of the World Coordinate System (WCS), the user has to fill in and submit the electronic form provided after the tables.
The third part includes the database of observations with access via a standalone application such as Aladin, developed by the Strasbourg Astronomical Data Centre. To obtain access to the database, the user has to perform a series of simple actions, which are described on the corresponding site page. He or she may then access the database via the Aladin server selector, which offers a menu with a wide range of image and catalogue servers located worldwide, including two menu items for the photographic and CCD observations of the NVO image server. The user has to define the equatorial coordinates of the search centre and a search radius. The search results are output to the main window of Aladin in textual and graphical form using XML and the Simple Object Access Protocol (SOAP). In this way, the NVO image server is integrated with other astronomical servers using a special configuration file. The user may conveniently request information from many servers using the same Aladin server selector, although the servers are located in different countries. Aladin has a wide range of special tools for data analysis and handling, including connections to other standalone applications. In conclusion, we note that the research team of a data centre, which provides the infrastructure for data output to the internet, is responsible for the creation of the corresponding archives. Therefore, each observatory or data centre has to provide access to its archives in accordance with the IVOA standards and Resolution B.1 adopted by the IAU XXV General Assembly, titled "Public Access to Astronomical Archives". The research team of NAO copes successfully with this task and continues to develop the NVO. Using our databases and VO-tools, we also take part in the development of the Ukrainian Virtual Observatory (UkrVO). All three main parts of the NVO are used as prototypes for the UkrVO. Informational resources provided by other astronomical institutions from Ukraine will be included in the corresponding databases and VO interfaces.
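For readers unfamiliar with the VOTable format mentioned above, the following minimal sketch shows how a catalogue downloaded from a VO service could be opened with astropy. The file name and the `mag` column are placeholders, not actual NVO products.

```python
# Minimal sketch: reading a VOTable catalogue (the IVOA XML tabular
# format) with astropy. "nao_catalogue.vot" is a hypothetical file
# downloaded from a VO service such as the NVO.
from astropy.io.votable import parse_single_table

table = parse_single_table("nao_catalogue.vot").to_table()  # astropy Table
print(table.colnames)                 # e.g. RA, Dec, magnitude columns
bright = table[table["mag"] < 12.0]   # assumes a 'mag' column exists
print(len(bright), "bright stars")
```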
Integration and management of massive remote-sensing data based on GeoSOT subdivision model
NASA Astrophysics Data System (ADS)
Li, Shuang; Cheng, Chengqi; Chen, Bo; Meng, Li
2016-07-01
Owing to the rapid development of earth observation technology, the volume of spatial information is growing rapidly; therefore, improving query retrieval speed from large, rich data sources for remote-sensing data management systems is quite urgent. A global subdivision model, the geographic coordinate subdivision grid with one-dimensional integer coding on a 2n-tree (GeoSOT), which we propose as a solution, has been used by data management organizations. However, because a spatial object may cover several grids, considerable data redundancy occurs when the data are stored in relational databases. To solve this redundancy problem, we combined the subdivision model with a spatial array database containing an inverted index, and propose an improved approach for integrating and managing massive remote-sensing data. By adding a spatial code column in an array format to a database, spatial information in remote-sensing metadata can be stored and logically subdivided. We implemented our method in a Kingbase Enterprise Server database system and compared the results with the Oracle platform by simulating worldwide image data. Experimental results showed that our approach performed better than Oracle in terms of data integration and time and space efficiency. Our approach also offers an efficient storage management system for existing storage centers and management systems.
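A simplified sketch of the core idea, under stated assumptions: store, for each image, the array of grid codes its footprint covers, and keep an inverted index from grid code to image ids for fast spatial queries. A Morton (Z-order) code on a fixed quadtree level stands in for the GeoSOT code here; the level, step size, and scene ids are invented.

```python
# Grid-code indexing sketch: footprints -> arrays of cell codes ->
# inverted index (code -> image ids). Not the paper's implementation.
from collections import defaultdict

LEVEL = 8  # quadtree depth; 2^8 x 2^8 cells over the lon/lat domain

def cell_code(lon, lat, level=LEVEL):
    """Interleave the bits of the column/row indices into one integer."""
    n = 1 << level
    col = min(int((lon + 180.0) / 360.0 * n), n - 1)
    row = min(int((lat + 90.0) / 180.0 * n), n - 1)
    code = 0
    for i in range(level):
        code |= ((col >> i) & 1) << (2 * i) | ((row >> i) & 1) << (2 * i + 1)
    return code

def footprint_codes(lon_min, lat_min, lon_max, lat_max, step=0.5):
    """Codes covered by a bounding box, sampled on a coarse grid."""
    codes, lon = set(), lon_min
    while lon <= lon_max:
        lat = lat_min
        while lat <= lat_max:
            codes.add(cell_code(lon, lat))
            lat += step
        lon += step
    return codes

inverted = defaultdict(set)                # grid code -> image ids

def index_image(image_id, bbox):
    for c in footprint_codes(*bbox):
        inverted[c].add(image_id)

index_image("scene_001", (116.0, 39.5, 117.5, 40.5))
print(inverted[cell_code(116.4, 39.9)])    # -> {'scene_001'}
```

Storing the code set as one array column per image, as the abstract suggests, avoids duplicating the metadata row once per covered grid cell.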
BRCA Share: A Collection of Clinical BRCA Gene Variants.
Béroud, Christophe; Letovsky, Stanley I; Braastad, Corey D; Caputo, Sandrine M; Beaudoux, Olivia; Bignon, Yves Jean; Bressac-De Paillerets, Brigitte; Bronner, Myriam; Buell, Crystal M; Collod-Béroud, Gwenaëlle; Coulet, Florence; Derive, Nicolas; Divincenzo, Christina; Elzinga, Christopher D; Garrec, Céline; Houdayer, Claude; Karbassi, Izabela; Lizard, Sarab; Love, Angela; Muller, Danièle; Nagan, Narasimhan; Nery, Camille R; Rai, Ghadi; Revillion, Françoise; Salgado, David; Sévenet, Nicolas; Sinilnikova, Olga; Sobol, Hagay; Stoppa-Lyonnet, Dominique; Toulas, Christine; Trautman, Edwin; Vaur, Dominique; Vilquin, Paul; Weymouth, Katelyn S; Willis, Alecia; Eisenberg, Marcia; Strom, Charles M
2016-12-01
As next-generation sequencing increases access to human genetic variation, the challenge of determining clinical significance of variants becomes ever more acute. Germline variants in the BRCA1 and BRCA2 genes can confer substantial lifetime risk of breast and ovarian cancer. Assessment of variant pathogenicity is a vital part of clinical genetic testing for these genes. A database of clinical observations of BRCA variants is a critical resource in that process. This article describes BRCA Share™, a database created by a unique international alliance of academic centers and commercial testing laboratories. By integrating the content of the Universal Mutation Database generated by the French Unicancer Genetic Group with the testing results of two large commercial laboratories, Quest Diagnostics and Laboratory Corporation of America (LabCorp), BRCA Share™ has assembled one of the largest publicly accessible collections of BRCA variants currently available. Although access is available to academic researchers without charge, commercial participants in the project are required to pay a support fee and contribute their data. The fees fund the ongoing curation effort, as well as planned experiments to functionally characterize variants of uncertain significance. BRCA Share™ databases can therefore be considered as models of successful data sharing between private companies and the academic world. © 2016 WILEY PERIODICALS, INC.
Array Processing in the Cloud: the rasdaman Approach
NASA Astrophysics Data System (ADS)
Merticariu, Vlad; Dumitru, Alex
2015-04-01
The multi-dimensional array data model is gaining more and more attention when dealing with Big Data challenges in a variety of domains such as climate simulations, geographic information systems, medical imaging or astronomical observations. Solutions provided by classical Big Data tools such as key-value stores and MapReduce, as well as traditional relational databases, have proved to be limited in domains associated with multi-dimensional data. This problem has been addressed by the field of array databases, in which systems provide database services for raster data without imposing limitations on the number of dimensions that a dataset can have. Examples of datasets commonly handled by array databases include 1-dimensional sensor data, 2-D satellite imagery, 3-D x/y/t image time series as well as x/y/z geophysical voxel data, and 4-D x/y/z/t weather data. In astrophysics, such datasets can grow as large as simulations of the whole universe. rasdaman is a well-established array database, which implements many optimizations for dealing with large data volumes and operation complexity. The latest of these is intra-query parallelization support: a network of machines collaborates to answer a single array database query by dividing it into independent sub-queries sent to different servers. This enables massive processing speed-ups, which promise solutions to research challenges on multi-Petabyte data cubes. Several correlated factors influence the speedup that intra-query parallelisation brings: the number of servers, the capabilities of each server, the quality of the network, the availability of the data to the server that needs it in order to compute the result, and more. In the effort of adapting the engine to cloud processing patterns, two main components have been identified: one that handles communication and gathers information about the arrays sitting on every server, and a processing unit responsible for dividing work among available nodes and executing operations on local data. The federation daemon collects and stores statistics from the other network nodes and provides real-time updates about local changes. Information exchanged includes available datasets, CPU load and memory usage per host. The processing component is represented by the rasdaman server. Using information from the federation daemon, it breaks queries into subqueries to be executed on peer nodes, ships them, and assembles the intermediate results. Thus, we define a rasdaman network node as a pair of a federation daemon and a rasdaman server. Any node can receive a query and will subsequently act as this query's dispatcher, so all peers are at the same level and there is no single point of failure. Should a node become inaccessible, its peers recognize this and no longer consider it for distribution. Conversely, a peer can join the network at any time. To assess the feasibility of our approach, we deployed a rasdaman network in the Amazon Elastic Cloud environment on 1001 nodes, and observed that this feature can greatly increase the performance and scalability of the system, offering a large throughput of processed data.
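A toy sketch of intra-query parallelisation as described above: a dispatcher splits one array query into independent sub-queries over tiles, "ships" them to workers, and assembles the partial results. Threads stand in for the peer rasdaman servers, and the aggregation is a simple global maximum; none of this is rasdaman's actual code.

```python
# Dispatcher splits an array aggregation into per-tile sub-queries,
# workers (stand-ins for peer servers) answer them, dispatcher merges.
import numpy as np
from concurrent.futures import ThreadPoolExecutor

array = np.random.rand(4096, 4096)     # the (conceptually distributed) raster

def subquery_max(tile_slice):
    """What a peer node would run against its local tile."""
    return array[tile_slice].max()

def dispatch_max(n_workers=4):
    rows = np.array_split(np.arange(array.shape[0]), n_workers)
    tiles = [np.s_[r[0]:r[-1] + 1, :] for r in rows]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(subquery_max, tiles))
    return max(partials)               # assemble intermediate results

assert np.isclose(dispatch_max(), array.max())
```

Because any node can play the dispatcher role, the scheme has no single point of failure, as the abstract notes.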
Hegedűs, Tamás; Chaubey, Pururawa Mayank; Várady, György; Szabó, Edit; Sarankó, Hajnalka; Hofstetter, Lia; Roschitzki, Bernd; Sarkadi, Balázs
2015-01-01
Based on recent results, the determination of the easily accessible red blood cell (RBC) membrane proteins may provide new diagnostic possibilities for assessing mutations, polymorphisms or regulatory alterations in diseases. However, the analysis of the current mass spectrometry-based proteomics datasets and other major databases indicates inconsistencies—the results show large scattering and only a limited overlap for the identified RBC membrane proteins. Here, we applied membrane-specific proteomics studies in human RBC, compared these results with the data in the literature, and generated a comprehensive and expandable database using all available data sources. The integrated web database now refers to proteomic, genetic and medical databases as well, and contains an unexpected large number of validated membrane proteins previously thought to be specific for other tissues and/or related to major human diseases. Since the determination of protein expression in RBC provides a method to indicate pathological alterations, our database should facilitate the development of RBC membrane biomarker platforms and provide a unique resource to aid related further research and diagnostics. Database URL: http://rbcc.hegelab.org PMID:26078478
A blue carbon soil database: Tidal wetland stocks for the US National Greenhouse Gas Inventory
NASA Astrophysics Data System (ADS)
Feagin, R. A.; Eriksson, M.; Hinson, A.; Najjar, R. G.; Kroeger, K. D.; Herrmann, M.; Holmquist, J. R.; Windham-Myers, L.; MacDonald, G. M.; Brown, L. N.; Bianchi, T. S.
2015-12-01
Coastal wetlands contain large reservoirs of carbon, and in 2015 the US National Greenhouse Gas Inventory began the work of placing blue carbon within the national regulatory context. The potential value of a wetland carbon stock, in relation to its location, could soon be influential in determining governmental policy and management activities, or in stimulating market-based CO2 sequestration projects. To meet the national need for high-resolution maps, a blue carbon stock database was developed linking National Wetlands Inventory datasets with the USDA Soil Survey Geographic Database. Users of the database can identify the economic potential for carbon conservation or restoration projects within specific estuarine basins, states, wetland types, physical parameters, and land management activities. The database is geared towards both national-level assessments and local-level inquiries. Spatial analysis of the stocks shows high variance within individual estuarine basins, largely dependent on geomorphic position on the landscape, though there are continental-scale trends in the carbon distribution as well. Future plans include linking this database with a sedimentary accretion database to predict carbon flux in US tidal wetlands.
The Einstein database of IPC x-ray observations of optically selected and radio-selected quasars, 1.
NASA Technical Reports Server (NTRS)
Wilkes, Belinda J.; Tananbaum, Harvey; Worrall, D. M.; Avni, Yoram; Oey, M. S.; Flanagan, Joan
1994-01-01
We present the first volume of the Einstein quasar database. The database includes estimates of the X-ray count rates, fluxes, and luminosities for 514 quasars and Seyfert 1 galaxies observed with the Imaging Proportional Counter (IPC) aboard the Einstein Observatory. All were previously known optically selected or radio-selected objects, and most were the targets of the X-ray observations. The X-ray properties of the Active Galactic Nuclei (AGNs) have been derived by reanalyzing the IPC data in a systematic manner to provide a uniform database for general use by the astronomical community. We use the database to extend earlier quasar luminosity studies which were made using only a subset of the currently available data. The database can be accessed on the internet via the SAO Einstein on-line system ('Einline') and is available in ASCII format on magnetic tape and DOS diskette.
ERIC Educational Resources Information Center
Gruner, Richard; Heron, Carol E.
1984-01-01
Examines usefulness of DIALOG as legal research tool through use of DIALOG's DIALINDEX database to identify those databases among almost 200 available that contain large numbers of records related to federal securities regulation. Eight databases selected for further study are detailed. Twenty-six footnotes, database statistics, and samples are…
Recent advances on terrain database correlation testing
NASA Astrophysics Data System (ADS)
Sakude, Milton T.; Schiavone, Guy A.; Morelos-Borja, Hector; Martin, Glenn; Cortes, Art
1998-08-01
Terrain database correlation is a major requirement for interoperability in distributed simulation. There are numerous situations in which terrain database correlation problems can occur that, in turn, lead to a lack of interoperability in distributed training simulations. Examples are the use of different run-time terrain databases derived from inconsistent source data, the use of different resolutions, and the use of different data models between databases for both terrain and culture data. IST has been developing a suite of software tools, named ZCAP, to address terrain database interoperability issues. In this paper we discuss recent enhancements made to this suite, including improved algorithms for sampling and calculating line-of-sight, an improved method for measuring terrain roughness, and the application of a sparse matrix method to the terrain remediation solution developed at the Visual Systems Lab of the Institute for Simulation and Training. We review the application of some of these new algorithms to the terrain correlation measurement processes. The new algorithms improve our support for very large terrain databases and provide the capability for performing test replications to estimate the sampling error of the tests. With this set of tools, a user can quantitatively assess the degree of correlation between large terrain databases.
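To make the notion of "correlation measures" concrete, here is an illustrative sketch (not the ZCAP implementation) of two simple metrics one might compute between two co-registered terrain grids: an RMS elevation difference, and a crude roughness measure based on slope variability. The grid sizes, cell size, and noise level are invented.

```python
# Illustrative terrain-correlation metrics between two DEM grids.
import numpy as np

def rms_difference(dem_a, dem_b):
    """RMS elevation disparity between two co-registered terrain grids."""
    return float(np.sqrt(np.mean((dem_a - dem_b) ** 2)))

def roughness(dem, cell_size=30.0):
    """Std. dev. of slope magnitude as a crude terrain-roughness metric."""
    gy, gx = np.gradient(dem, cell_size)
    return float(np.std(np.hypot(gx, gy)))

dem_a = np.random.rand(512, 512) * 100.0
dem_b = dem_a + np.random.normal(0, 1.5, dem_a.shape)  # a derived database
print("RMS diff [m]:", rms_difference(dem_a, dem_b))
print("roughness a vs b:", roughness(dem_a), roughness(dem_b))
```

Replicating such tests over many random sample patches, as the abstract describes, yields an estimate of the sampling error of the correlation measure.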
NASA Astrophysics Data System (ADS)
Penteado, Paulo F.; Trilling, David; Szalay, Alexander; Budavári, Tamás; Fuentes, César
2014-11-01
We are building the first system that will allow efficient data mining of the astronomical archives for observations of Solar System bodies. While the Virtual Observatory has enabled data-intensive research making use of large collections of observations across multiple archives, Planetary Science has largely been denied this opportunity: most astronomical data services are built around fixed sky positions, and moving objects are often filtered out. To identify serendipitous observations of Solar System objects, we ingest the archive metadata. The coverage of each image in an archive is a volume in a 3D space (RA, Dec, time), which we can represent efficiently through a hierarchical triangular mesh (HTM) for the spatial dimensions, plus a contiguous time interval. In this space, an asteroid occupies a curve, which we determine by integrating its orbit into the past. Thus, when an asteroid trajectory intercepts the volume of an archived image, we have a possible observation of that body. Our pipeline then looks in the archive's catalog for a source with the corresponding coordinates, to retrieve its photometry. All these matches are stored in a database, which can be queried by object identifier. This database consists of archived observations of known Solar System objects. This means that it grows not only from the ingestion of new images, but also from the growth in the number of known objects. As new bodies are discovered, our pipeline can find archived observations in which they could have been recorded, providing colors for these newly found objects. This growth becomes more relevant with the new generation of wide-field surveys, particularly LSST. We also present one use case of our prototype archive: after ingesting the metadata for SDSS, 2MASS and GALEX, we were able to identify serendipitous observations of Solar System bodies in these 3 archives. Cross-matching these occurrences provided us with colors from the UV to the IR, a much wider spectral range than that commonly used for asteroid taxonomy. We present here archive-derived spectrophotometry from searching for 440 thousand asteroids, from 0.3 to 3 µm. In the future we will expand to other archives, including HST, Spitzer, WISE and Pan-STARRS.
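A condensed sketch of the matching idea, with coarse sky cells plus time buckets standing in for the hierarchical triangular mesh: each archived image occupies a region of (RA, Dec, time), and an asteroid's integrated orbit is a curve of (RA, Dec, t) points looked up against that index. Cell sizes, frame ids, and the toy ephemeris below are invented.

```python
# Spatio-temporal bucket index: (sky cell, time bucket) -> image ids.
# An ephemeris point that lands in an occupied bucket is a candidate
# serendipitous observation, to be confirmed against the source catalog.
from collections import defaultdict

CELL = 1.0      # degrees per sky cell (HTM trixels in the real system)
BUCKET = 1.0    # days per time bucket

def key(ra, dec, t):
    return (int(ra // CELL), int(dec // CELL), int(t // BUCKET))

index = defaultdict(list)

def ingest_image(image_id, ra, dec, t):
    index[key(ra, dec, t)].append(image_id)

def possible_observations(ephemeris):
    """Ephemeris: iterable of (ra, dec, t) from the orbit integration."""
    hits = set()
    for ra, dec, t in ephemeris:
        hits.update(index[key(ra, dec, t)])
    return hits

ingest_image("sdss_frame_42", ra=150.3, dec=2.1, t=5230.5)
orbit = [(150.1 + 0.1 * i, 2.05 + 0.02 * i, 5230.0 + 0.2 * i)
         for i in range(10)]
print(possible_observations(orbit))   # -> {'sdss_frame_42'}
```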
Multiresource inventories incorporating GIS, GPS, and database management systems
Loukas G. Arvanitis; Balaji Ramachandran; Daniel P. Brackett; Hesham Abd-El Rasol; Xuesong Du
2000-01-01
Large-scale natural resource inventories generate enormous data sets. Their effective handling requires a sophisticated database management system. Such a system must be robust enough to efficiently store large amounts of data and flexible enough to allow users to manipulate a wide variety of information. In a pilot project related to a multiresource inventory of the...
Data-driven indexing mechanism for the recognition of polyhedral objects
NASA Astrophysics Data System (ADS)
McLean, Stewart; Horan, Peter; Caelli, Terry M.
1992-02-01
This paper is concerned with the problem of searching large model databases. To date, most object recognition systems have concentrated on the problem of matching using simple searching algorithms. This is quite acceptable when the number of object models is small. However, in the future, general purpose computer vision systems will be required to recognize hundreds or perhaps thousands of objects and, in such circumstances, efficient searching algorithms will be needed. The problem of searching a large model database is one which must be addressed if future computer vision systems are to be at all effective. In this paper we present a method we call data-driven feature-indexed hypothesis generation as one solution to the problem of searching large model databases.
Fujiwara, Masakazu; Kawasaki, Yohei; Yamada, Hiroshi
2016-01-01
Background Rapid dissemination of information regarding adverse drug reactions is a key aspect of improving pharmacovigilance. There is a possibility that unknown adverse drug reactions will become apparent through post-marketing administration. Currently, although there have been studies evaluating the relationships between a drug and adverse drug reactions using the JADER database, which collects reported spontaneous adverse drug reactions, an efficient approach to assess the association between adverse drug reactions of drugs with the same indications, as well as the influence of demographics (e.g. gender), has not been proposed. Methods and Findings We utilized the REAC and DEMO tables from the May 2015 version of JADER for patients taking antidepressant drugs (SSRI, SNRI, and NaSSA). We evaluated the associations using association analyses with an apriori algorithm. Support, confidence, lift, and conviction were used as indicators for associations. The highest score in adverse drug reactions for SSRIs was obtained for "aspartate aminotransferase increased", "alanine aminotransferase increased", with values of 0.0059, 0.93, 135.5, and 13.9 for support, confidence, lift and conviction, respectively. For SNRIs, "international normalized ratio increased", "drug interaction" were observed with 0.0064, 1.00, 71.9, and NA. For NaSSAs, "anxiety", "irritability" were observed with 0.0058, 0.80, 49.9, and 4.9. For females taking SSRIs, the highest support scores were observed for "twenties", "suicide attempt", whereas "thirties", "neuroleptic malignant syndrome" were observed for males. Second, for SNRIs, "eighties", "inappropriate antidiuretic hormone secretion" were observed for females, whereas "interstitial lung disease" and "hepatitis fulminant" were observed for males. Finally, for NaSSAs, "suicidal ideation" was observed for females, and "rhabdomyolysis" for males. Conclusions Different combinations of adverse drug reactions were noted between the antidepressants. In addition, the reported adverse drug reactions differed by gender. This approach of using a large database to examine the associations can improve safety monitoring during the post-marketing phase. PMID:27119382
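The four indicators this study reports are standard association-rule metrics. The sketch below computes them for a rule A -> B over a toy set of adverse-event reports; the transactions are invented examples, not JADER records. Note that conviction is undefined (infinite) when confidence equals 1, which is why the abstract reports "NA" for the SNRI rule with confidence 1.00.

```python
# Support, confidence, lift, and conviction for a rule A -> B.
def rule_metrics(transactions, a, b):
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= t)        # A is a subset of t
    n_b = sum(1 for t in transactions if b <= t)
    n_ab = sum(1 for t in transactions if (a | b) <= t)
    support = n_ab / n
    confidence = n_ab / n_a if n_a else 0.0
    lift = confidence / (n_b / n) if n_b else float("nan")
    conviction = ((1 - n_b / n) / (1 - confidence)
                  if confidence < 1 else float("inf"))  # reported as NA
    return support, confidence, lift, conviction

reports = [
    {"SSRI", "AST increased", "ALT increased"},
    {"SSRI", "nausea"},
    {"SNRI", "INR increased", "drug interaction"},
    {"SSRI", "AST increased", "ALT increased"},
]
print(rule_metrics(reports, {"AST increased"}, {"ALT increased"}))
```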
Jenkins, Clinton N.; Flocks, J.; Kulp, M.
2006-01-01
Information-processing methods are described that integrate the stratigraphic aspects of large and diverse collections of sea-floor sample data. They efficiently convert common types of sea-floor data into database and GIS (geographical information system) tables, visual core logs, stratigraphic fence diagrams and sophisticated stratigraphic statistics. The input data are held in structured documents, essentially written core logs, that are particularly efficient to create from raw input datasets. Techniques are described that permit efficient construction of regional databases consisting of hundreds of cores. The sedimentological observations in each core are located by their downhole depths (metres below sea floor - mbsf) and also by a verbal term that describes the sample 'situation' - a special fraction of the sediment or position in the core. The main processing creates a separate output event for each instance of top, bottom and situation, assigning top-base mbsf values from numeric or, where possible, from word-based relative locational information such as 'core catcher' in reference to the sampler device, and recovery or penetration length. The processing outputs represent the sub-bottom as a sparse matrix of over 20 sediment properties of interest, such as grain size, porosity and colour. They can be plotted in a range of core-log programs, including an in-built facility that better suits the requirements of sea-floor data. Finally, a suite of stratigraphic statistics is computed, including volumetric grades, overburdens, thicknesses and degrees of layering. © The Geological Society of London 2006.
A Database of Young Star Clusters for Five Hundred Galaxies
NASA Astrophysics Data System (ADS)
Evans, Jessica; Whitmore, B. C.; Lindsay, K.; Chandar, R.; Larsen, S.
2009-01-01
The study of young massive stellar clusters has faced a series of observational challenges, such as the use of inconsistent data sets and low number statistics. To rectify these shortcomings, this project will use the source lists developed as part of the Hubble Legacy Archive to obtain a large, uniform database of super star clusters in nearby star-forming galaxies in order to address two fundamental astronomical questions: 1) To what degree is the cluster luminosity (and mass) function of star clusters universal? 2) What fraction of super star clusters are "missing" in optical studies (i.e., are hidden by dust)? The archive's recent data release (Data Release 2, September 2008) will help us achieve the necessary large sample (N ~ 50 galaxies for multi-wavelength data, N ~ 500 galaxies for ACS F814W). The uniform data set will comprise ACS, WFPC2, and NICMOS data, with DAOphot used for object detection. This database will also support comparisons with new Monte-Carlo simulations that have been developed independently in the past few years, and will be used to test the Whitmore, Chandar, Fall (2007) framework designed to understand the demographics of star clusters in all star-forming galaxies. The catalogs will increase the number of galaxies with measured mass and luminosity functions by an order of magnitude, and will provide a powerful new tool for comparative studies, both ours and the community's. The poster will describe our preliminary investigation of the first 30 galaxies in the sample.
NASA Astrophysics Data System (ADS)
Ryan, E. M.; Brucker, L.; Forman, B. A.
2015-12-01
During the winter months, the occurrence of rain-on-snow (ROS) events can impact snow stratigraphy via the generation of large-scale ice crusts, e.g., on or within the snowpack. The formation of such layers significantly alters the electromagnetic response of the snowpack, which can be witnessed using space-based microwave radiometers. In addition, ROS layers can hinder the ability of wildlife to burrow in the snow for vegetation, which limits their foraging capability. A prime example occurred on 23 October 2003 on Banks Island, Canada, where an ROS event is believed to have caused the deaths of over 20,000 musk oxen. Through the use of passive microwave remote sensing, ROS events can be detected by utilizing observed brightness temperatures (Tb) from AMSR-E. Tb observed at different microwave frequencies and polarizations depends on snow properties. A wet snowpack formed from an ROS event yields a larger Tb than a typical dry snowpack would. This phenomenon makes observed Tb useful for detecting ROS events. Using data retrieved from AMSR-E, in conjunction with observations from ground-based weather station networks, a database of estimated ROS events over the past twelve years was generated. Using this database, changes in measured Tb following the ROS events were also observed. This study adds to the growing knowledge of ROS events and has the potential to help inform passive microwave snow water equivalent (SWE) retrievals or snow cover properties in polar regions.
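As a hedged illustration of the physical idea (wet snow raises emissivity, so Tb jumps relative to a dry-snow baseline), the sketch below flags days where a brightness-temperature series exceeds a trailing baseline. The channel, threshold, window, and values are my own stand-ins, not the study's actual retrieval algorithm.

```python
# Toy Tb screen for rain-on-snow: flag jumps above a trailing dry baseline.
import pandas as pd

def flag_ros(tb_37v: pd.Series, window_days: int = 5, jump_k: float = 10.0):
    """Flag days where 37 GHz V-pol Tb jumps above a trailing median."""
    baseline = tb_37v.rolling(window_days, min_periods=1).median().shift(1)
    return (tb_37v - baseline) > jump_k

days = pd.date_range("2003-10-18", periods=10, freq="D")
tb = pd.Series([221, 220, 222, 219, 221, 248, 245, 226, 223, 222.0],
               index=days)
print(tb[flag_ros(tb)])   # the 23-24 Oct jump is flagged
```

In practice such a screen would be cross-checked against ground-based weather stations, as the abstract describes, to separate ROS from other melt events.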
Joseph, Gabby B.; McCulloch, Charles E.; Nevitt, Michael C.; Heilmeier, Ursula; Nardo, Lorenzo; Lynch, John A.; Liu, Felix; Baum, Thomas; Link, Thomas M.
2015-01-01
Objective The purpose of this study was 1) to establish a gender- and BMI-specific reference database of cartilage T2 values, and 2) to assess the associations between cartilage T2 values and gender, age, and BMI in knees without radiographic osteoarthritis or MRI-based (WORMS 0/1) evidence of cartilage degeneration. Design 481 subjects aged 45-65 years with Kellgren-Lawrence scores 0/1 in the study knee were selected from the Osteoarthritis Initiative database. Baseline morphologic cartilage 3T MRI readings (WORMS scoring) and T2 measurements (resolution = 0.313 mm x 0.446 mm) were performed in the medial femur, lateral femur, medial tibia, lateral tibia, and patella compartments. In order to create a reference database, a logarithmic transformation was applied to the data to obtain the 5th-95th percentile values for T2. Results Significant differences in mean cartilage T2 values were observed between joint compartments. Although females had slightly higher T2 values than males in a majority of compartments, the differences were only significant in the medial femur (p<0.0001). A weak positive association was seen between age and T2 in all compartments, and was most pronounced in the patella (3.27% increase in median T2 per 10 years, p=0.009). Significant associations between BMI and T2 were observed, and were most pronounced in the lateral tibia (5.33% increase in median T2 per 5 kg/m2 increase in BMI, p<0.0001) and medial tibia (4.81% increase in median T2 per 5 kg/m2 increase in BMI, p<0.0001). Conclusions This study established the first reference database of T2 values in a large sample of morphologically normal cartilage plates in knees without radiographic knee osteoarthritis. While cartilage T2 values were weakly associated with age and gender, they had the highest correlations with BMI. PMID:25680652
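The reference-database step described here (log-transform, then the 5th-95th percentile band per stratum) can be sketched as follows. The column names and synthetic values are assumptions for illustration, not OAI data.

```python
# Stratified 5th-95th percentile reference band on log-transformed T2.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "gender": rng.choice(["F", "M"], 481),
    "bmi_group": rng.choice(["normal", "overweight", "obese"], 481),
    "t2_ms": rng.lognormal(mean=np.log(34.0), sigma=0.08, size=481),
})

def reference_band(t2):
    log_t2 = np.log(t2)
    lo, hi = np.percentile(log_t2, [5, 95])
    return pd.Series({"t2_p5": np.exp(lo), "t2_p95": np.exp(hi)})

ref = df.groupby(["gender", "bmi_group"])["t2_ms"].apply(reference_band)
print(ref.unstack())   # per-stratum 5th-95th percentile T2 reference values
```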
NASA Astrophysics Data System (ADS)
Paiva, L. M. S.; Bodstein, G. C. R.; Pimentel, L. C. G.
2013-12-01
Large-eddy simulations are performed using the Advanced Regional Prediction System (ARPS) code at horizontal grid resolutions as fine as 300 m to assess the influence of detailed and updated surface databases on the modeling of local atmospheric circulation systems of urban areas with complex terrain. Applications to air pollution and wind energy are sought. These databases comprise 3 arc-sec topographic data from the Shuttle Radar Topography Mission, 10 arc-sec vegetation type data from the European Space Agency (ESA) GlobCover Project, and 30 arc-sec Leaf Area Index and Fraction of Absorbed Photosynthetically Active Radiation data from the ESA GlobCarbon Project. Simulations are carried out for the Metropolitan Area of Rio de Janeiro using six one-way nested-grid domains that allow the choice of distinct parametric models and vertical resolutions associated with each grid. ARPS is initialized using the Global Forecasting System with 0.5°-resolution data from the National Center of Environmental Prediction, which is also used every 3 h as the lateral boundary condition. Topographic shading is turned on, and two soil layers with depths of 0.01 and 1.0 m are used to compute the soil temperature and moisture budgets in all runs. Results for two simulated runs covering the period from 6 to 7 September 2007 are compared to surface and upper-air observational data to explore the dependence of the simulations on initial and boundary conditions, topographic and land-use databases, and grid resolution. Our comparisons show overall good agreement between simulated and observed data, and also indicate that the low resolution of the 30 arc-sec soil database from the United States Geological Survey, the soil moisture and skin temperature initial conditions assimilated from the GFS analyses, and the synoptic forcing on the lateral boundaries of the finer grids may compromise an adequate spatial description of the meteorological variables.
NASA Astrophysics Data System (ADS)
Penton, Steven Victor
1999-05-01
A database of all active galactic nuclei (AGN) observed with the International Ultraviolet Explorer (IUE, 1976-1995) was created to determine the brightest UV (1250 Å) extragalactic sources. Combined spectra and continuum lightcurves are available for ~700 AGN. Fifteen targets were selected from this database for observation of the low-z Lyα forest with the Hubble Space Telescope. These observations were taken with the Goddard High Resolution Spectrograph and the G160M grating (1991-1997). In total, 111 Lyα absorbers with significance level >3σ were detected in the redshift range 0.002 < z < 0.069. This thesis evaluates the physical properties of these Lyα absorbers and compares them to their high-z counterparts. In addition, we use large galaxy catalogs (i.e. the CfA Redshift Survey) to study the relationship between known galaxies and the low-z Lyα forest. We find that the low-z absorbers are similar in physical characteristics and density to those detected at high-z. Some of these clouds appear to be primordial matter, owing to the lack of detected metallicity. A comparison to the known galaxy distribution indicates that the low-z Lyα forest clusters less than galaxies, but more than random. This suggests that at least a fraction of the absorbers are associated with the gas in galaxy associations (i.e. filaments), while a second population is distributed more uniformly. Over equal pathlengths (cΔz ~ 60,000 km s^-1 each) of galaxy-rich and galaxy-poor environments (voids), we determine that 80% of Lyα absorbers are near large-scale galactic structures (i.e. filaments), while 20% are in galaxy voids.
Observational Search for Cometary Aging Processes
NASA Technical Reports Server (NTRS)
Meech, Karen J.
1997-01-01
The scientific objectives of this study were (i) to search for physical differences in the behavior of the dynamically new comets (those which are entering the solar system for the first time from the Oort cloud) and the periodic comets, and (ii) to interpret these differences, if any, in terms of the physical and chemical nature of the comets and the evolutionary histories of the two comet groups. Because outer solar system comets may be direct remnants of the planetary formation processes, it is clear that the understanding of both the physical characteristics of these bodies at the edge of the planet-forming zone and of their activity at large heliocentric distances, r, will ultimately provide constraints on the planetary formation process both in our Solar System and in extra-solar planetary systems. A combination of new solar system models which suggest that the protoplanetary disk was relatively massive and as a consequence comets could form at large distances from the sun (e.g. from the Uranus-Neptune region to the vicinity of the Kuiper belt), observations of activity in comets at large r, and laboratory experiments on low temperature volatile condensation, are dramatically changing our understanding of the chemical and physical conditions in the early solar nebula. In order to understand the physical processes driving the apparent large-r activity, and to address the question of possible physical and chemical differences between periodic, non-periodic and Oort comets, the PI has been undertaking a long-term study of the behavior of a significant sample of these comets (approximately 50) over a wide range of r to watch the development, disappearance and changing morphology of the dust coma. The ultimate goal is to search for systematic physical differences between the comet classes by modelling the coma growth in terms of volatile-driven activity. The systematic observations for this have been ongoing since 1986, and have been obtained over the course of approximately 300 nights using the telescopes on Mauna Kea, Kitt Peak, Cerro Tololo, the European Southern Observatory, and several other large-aperture facilities. A greater than 2 TB database of broad-band comet images has been obtained which follows the systematic development and fading of the cometary coma for the comets in the database. The results to date indicate that there is a substantial difference in the brightness and the amount of dust as a function of r between the two comet classes. In addition to this major finding, the program has been responsible for several exciting discoveries, including: the P/Halley outburst at r = 14.3 AU, the discovery of Chiron's coma and modelling and observations of the gravitationally bound component, and observational evidence that activity continues out beyond r = 17 AU for many dynamically new comets.
Medical data mining: knowledge discovery in a clinical data warehouse.
Prather, J. C.; Lobach, D. F.; Goodwin, L. K.; Hales, J. W.; Hage, M. L.; Hammond, W. E.
1997-01-01
Clinical databases have accumulated large quantities of information about patients and their medical conditions. Relationships and patterns within this data could provide new medical knowledge. Unfortunately, few methodologies have been developed and applied to discover this hidden knowledge. In this study, the techniques of data mining (also known as Knowledge Discovery in Databases) were used to search for relationships in a large clinical database. Specifically, data accumulated on 3,902 obstetrical patients were evaluated for factors potentially contributing to preterm birth using exploratory factor analysis. Three factors were identified by the investigators for further exploration. This paper describes the processes involved in mining a clinical database including data warehousing, data query and cleaning, and data analysis. PMID:9357597
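A minimal sketch of the exploratory-factor-analysis step this paper describes, using scikit-learn on a synthetic stand-in for the obstetrical data set; the variable names and the number of factors are invented for illustration.

```python
# Exploratory factor analysis on a synthetic clinical table.
import numpy as np
import pandas as pd
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = pd.DataFrame(rng.normal(size=(3902, 6)),
                 columns=["age", "parity", "prior_preterm", "bmi",
                          "visit_count", "infection_flag"])

fa = FactorAnalysis(n_components=3, random_state=0)
scores = fa.fit_transform(StandardScaler().fit_transform(X))

loadings = pd.DataFrame(fa.components_.T, index=X.columns,
                        columns=["factor1", "factor2", "factor3"])
print(loadings.round(2))   # inspect which variables load on which factor
```

In a real clinical warehouse the cleaning and query steps the paper emphasizes would precede this analysis, and the identified factors would be reviewed by domain experts rather than taken at face value.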
Fortier, Isabel; Doiron, Dany; Little, Julian; Ferretti, Vincent; L’Heureux, François; Stolk, Ronald P; Knoppers, Bartha M; Hudson, Thomas J; Burton, Paul R
2011-01-01
Background Proper understanding of the roles of, and interactions between genetic, lifestyle, environmental and psycho-social factors in determining the risk of development and/or progression of chronic diseases requires access to very large high-quality databases. Because of the financial, technical and time burdens related to developing and maintaining very large studies, the scientific community is increasingly synthesizing data from multiple studies to construct large databases. However, the data items collected by individual studies must be inferentially equivalent to be meaningfully synthesized. The DataSchema and Harmonization Platform for Epidemiological Research (DataSHaPER; http://www.datashaper.org) was developed to enable the rigorous assessment of the inferential equivalence, i.e. the potential for harmonization, of selected information from individual studies. Methods This article examines the value of using the DataSHaPER for retrospective harmonization of established studies. Using the DataSHaPER approach, the potential to generate 148 harmonized variables from the questionnaires and physical measures collected in 53 large population-based studies (6.9 million participants) was assessed. Variable and study characteristics that might influence the potential for data synthesis were also explored. Results Out of all assessment items evaluated (148 variables for each of the 53 studies), 38% could be harmonized. Certain characteristics of variables (i.e. relative importance, individual targeted, reference period) and of studies (i.e. observational units, data collection start date and mode of questionnaire administration) were associated with the potential for harmonization. For example, for variables deemed to be essential, 62% of assessment items paired could be harmonized. Conclusion The current article shows that the DataSHaPER provides an effective and flexible approach for the retrospective harmonization of information across studies. To implement data synthesis, some additional scientific, ethico-legal and technical considerations must be addressed. The success of the DataSHaPER as a harmonization approach will depend on its continuing development and on the rigour and extent of its use. The DataSHaPER has the potential to take us closer to a truly collaborative epidemiology and offers the promise of enhanced research potential generated through synthesized databases. PMID:21804097
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bond-Lamberty, Benjamin; Bunn, Andrew G.; Thomson, Allison M.
High-latitude northern ecosystems are experiencing rapid climate changes, and represent a large potential climate feedback because of their high soil carbon densities and shifting disturbance regimes. A significant carbon flow from these ecosystems is soil respiration (RS, the flow of carbon dioxide, generated by plant roots and soil fauna, from the soil surface to the atmosphere), and any change in the high-latitude carbon cycle might thus be reflected in RS observed in the field. This study used two variants of a machine-learning algorithm and least squares regression to examine how remotely-sensed canopy greenness (NDVI), climate, and other variables are coupled to annual RS, based on 105 observations from 64 circumpolar sites in a global database. The addition of NDVI roughly doubled model performance, with the best-performing models explaining ~62% of observed RS variability.
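The kind of comparison reported here (does adding NDVI to climate predictors improve the explained variance in annual RS?) can be sketched as below. The data are synthetic placeholders and the model is a generic random forest; the study's actual algorithms and database are not reproduced.

```python
# Compare cross-validated R^2 with and without NDVI as a predictor.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
n = 105
mat = rng.normal(0, 5, n)                       # mean annual temperature
prec = rng.normal(700, 200, n)                  # annual precipitation
ndvi = 0.4 + 0.05 * mat + rng.normal(0, 0.1, n)
rs = 300 + 40 * mat + 0.2 * prec + 800 * ndvi + rng.normal(0, 80, n)

def r2(features):
    model = RandomForestRegressor(n_estimators=300, random_state=0)
    return cross_val_score(model, np.column_stack(features), rs,
                           cv=5, scoring="r2").mean()

print("climate only:   R^2 =", round(r2([mat, prec]), 2))
print("climate + NDVI: R^2 =", round(r2([mat, prec, ndvi]), 2))
```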
A VO-Driven Astronomical Data Grid in China
NASA Astrophysics Data System (ADS)
Cui, C.; He, B.; Yang, Y.; Zhao, Y.
2010-12-01
With the implementation of many ambitious observation projects, including LAMOST, FAST, and the Antarctic observatory at Dome A, observational astronomy in China is stepping into a brand new era with an emerging data avalanche. In the era of e-Science, both these cutting-edge projects and traditional astronomy research need much more powerful data management, sharing and interoperability. Based on the data-grid concept, and taking advantage of the IVOA interoperability technologies, China-VO is developing a VO-driven astronomical data grid environment to enable multi-wavelength science and large database science. In this paper, the latest progress and the data flow of LAMOST, the architecture of the data grid, and its support for the VO are discussed.
Keeping Track of Our Treasures: Managing Historical Data with Relational Database Software.
ERIC Educational Resources Information Center
Gutmann, Myron P.; And Others
1989-01-01
Describes the way a relational database management system manages a large historical data collection project. Shows that such databases are practical to construct. States that the programming tasks involved are not for beginners, but the rewards of having data organized are worthwhile. (GG)
The Chemical Aquatic Fate and Effects (CAFE) database, developed by NOAA’s Emergency Response Division (ERD), is a centralized data repository that allows for unrestricted access to fate and effects data. While this database was originally designed to help support decisions...
High Performance Descriptive Semantic Analysis of Semantic Graph Databases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Joslyn, Cliff A.; Adolf, Robert D.; al-Saffar, Sinan
As semantic graph database technology grows to address components ranging from extant large triple stores to SPARQL endpoints over SQL-structured relational databases, it will become increasingly important to be able to understand their inherent semantic structure, whether codified in explicit ontologies or not. Our group is researching novel methods for what we call descriptive semantic analysis of RDF triplestores, to serve purposes of analysis, interpretation, visualization, and optimization. But data size and computational complexity make it increasingly necessary to bring high performance computational resources to bear on this task. Our research group built a novel high performance hybrid system comprising computational capability for semantic graph database processing utilizing the large multi-threaded architecture of the Cray XMT platform, conventional servers, and large data stores. In this paper we describe that architecture and our methods, and present the results of our analyses of basic properties, connected components, namespace interaction, and typed paths for the Billion Triple Challenge 2010 dataset.
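Two of the descriptive analyses named above, run on a toy RDF-like triple set with networkx, might look like the following; the triples are invented examples, and the real system runs at a vastly larger scale on the Cray XMT.

```python
# Connected components and namespace-interaction counts on toy triples.
from collections import Counter
import networkx as nx

triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", '"Bob"'),
    ("dbpedia:Berlin", "rdf:type", "dbpedia:City"),
]

g = nx.Graph()
for s, p, o in triples:
    g.add_edge(s, o, predicate=p)

print("components:", [sorted(c) for c in nx.connected_components(g)])

def namespace(term):
    """Prefix before ':' is taken as the namespace; quoted terms are literals."""
    if term.startswith('"') or ":" not in term:
        return "literal"
    return term.split(":", 1)[0]

pairs = Counter((namespace(s), namespace(o)) for s, _, o in triples)
print("namespace interaction:", dict(pairs))
```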
Image-based query-by-example for big databases of galaxy images
NASA Astrophysics Data System (ADS)
Shamir, Lior; Kuminski, Evan
2017-01-01
Very large astronomical databases containing millions or even billions of galaxy images have become increasingly important tools in astronomy research. However, in many cases their very large size makes it difficult to analyze these data manually, reinforcing the need for computer algorithms that can automate the data analysis process. An example of such a task is the identification of galaxies of a certain morphology of interest. For instance, if a rare galaxy is identified, it is reasonable to expect that more galaxies of similar morphology exist in the database, but it is virtually impossible to search these databases manually to identify such galaxies. Here we describe a computer vision and pattern recognition methodology that receives a galaxy image as an input, and automatically searches a large dataset of galaxies to return a list of galaxies that are visually similar to the query galaxy. The returned list is not necessarily complete or clean, but it provides a substantial reduction of the original database into a smaller dataset in which the frequency of objects visually similar to the query galaxy is much higher. Experimental results show that the algorithm can identify rare galaxies such as ring galaxies among datasets of 10,000 astronomical objects.
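The query-by-example flow described here reduces, at its core, to nearest-neighbour search over per-image feature vectors. In the sketch below, random vectors stand in for real morphology descriptors; the descriptor length, dataset size, and ids are invented.

```python
# Query-by-example over galaxy feature vectors with nearest neighbours.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(3)
features = rng.normal(size=(10_000, 64))     # one descriptor row per galaxy
ids = np.array([f"galaxy_{i:05d}" for i in range(10_000)])

nn = NearestNeighbors(n_neighbors=50, metric="euclidean").fit(features)

def query_by_example(query_vector):
    """Return ids of galaxies visually similar to the query image."""
    _, idx = nn.kneighbors(query_vector.reshape(1, -1))
    return ids[idx[0]]

print(query_by_example(features[123])[:5])   # the query itself ranks first
```

As the abstract notes, the returned list is a candidate set for human inspection rather than a clean or complete answer.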
DBMap: a TreeMap-based framework for data navigation and visualization of brain research registry
NASA Astrophysics Data System (ADS)
Zhang, Ming; Zhang, Hong; Tjandra, Donny; Wong, Stephen T. C.
2003-05-01
The purpose of this study is to investigate and apply a new, intuitive and space-conscious visualization framework to facilitate efficient data presentation and exploration of large-scale data warehouses. We have implemented the DBMap framework for the UCSF Brain Research Registry. Such a utility would help medical specialists and clinical researchers better explore and evaluate the many attributes organized in the brain research registry. The current UCSF Brain Research Registry consists of a federation of disease-oriented database modules, including Epilepsy, Brain Tumor, Intracerebral Hemorrhage, and CJD (Creutzfeldt-Jakob disease). These database modules organize large volumes of imaging and non-imaging data to support Web-based clinical research. While the data warehouse supports general information retrieval and analysis, it lacks an effective way to visualize and present the voluminous and complex data stored. This study investigates whether the TreeMap algorithm can be adapted to display and navigate a categorical biomedical data warehouse or registry. TreeMap is a space-constrained graphical representation of large hierarchical data sets, mapped to a matrix of rectangles whose size and color represent database fields of interest. It allows the display of a large amount of numerical and categorical information within the limited real estate of a computer screen, with an intuitive user interface. The paper describes DBMap, the proposed new data visualization framework for large biomedical databases. Built upon XML, Java and JDBC technologies, the prototype system includes a set of software modules that reside in the application server tier and provide interfaces to the backend database tier and the front-end Web tier of the brain registry.
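The core of any TreeMap-style view is the layout step: each node gets a rectangle whose area is proportional to its value. Below is a minimal slice-and-dice layout sketch (one of several classic treemap algorithms, not necessarily the variant DBMap uses); the registry values are invented.

```python
# Minimal slice-and-dice treemap layout: alternate split direction by depth.
def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    out = out if out is not None else []
    out.append((node["name"], round(x, 1), round(y, 1),
                round(w, 1), round(h, 1)))
    children = node.get("children", [])
    total = sum(c["value"] for c in children)
    offset = 0.0
    for c in children:
        frac = c["value"] / total
        if depth % 2 == 0:   # split horizontally at even depths
            slice_and_dice(c, x + offset * w, y, w * frac, h, depth + 1, out)
        else:                # split vertically at odd depths
            slice_and_dice(c, x, y + offset * h, w, h * frac, depth + 1, out)
        offset += frac
    return out

registry = {"name": "registry", "value": 100, "children": [
    {"name": "Epilepsy", "value": 40},
    {"name": "Brain Tumor", "value": 35},
    {"name": "ICH", "value": 15},
    {"name": "CJD", "value": 10},
]}
for rect in slice_and_dice(registry, 0, 0, 800, 600):
    print(rect)   # (name, x, y, width, height) for each rectangle
```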
NASA Astrophysics Data System (ADS)
Krämer, Martina
2016-04-01
Numerous airborne field campaigns have been performed in recent decades to record the microphysical properties of cirrus clouds. Besides improving the understanding of the processes of cirrus formation and evolution, an additional motivation for those studies is to provide a database for evaluating the representation of cirrus clouds in global climate models. This is important for improving the certainty of climate predictions, which are affected by the poor understanding of the microphysical processes of ice clouds (IPCC, 2013). To this end, the observations should ideally cover the complete respective parameter range and not be influenced by instrumental artifacts. However, due to the difficulties of measuring cirrus properties on fast-flying, high-altitude aircraft, some issues with respect to the measurements have arisen. In particular, the relative humidity in and around cirrus clouds and the ice crystal number concentrations were under discussion: excessively high ice supersaturations as well as ice crystal number concentrations were often reported. These issues have made it more challenging to compile a large database using data from a suite of different instruments deployed on different campaigns. In this study, we have addressed these challenges and compiled a large data set of cirrus clouds, sampled during eighteen field campaigns between 75°N and 25°S, representing measurements that fulfil the above-mentioned requirements. The most recent campaigns were performed in 2014, namely the ATTREX campaign with the research aircraft Global Hawk and the ML-CIRRUS and ACRIDICON campaigns with HALO. The observations include ice water content (IWC: 130 hours of observations), ice crystal numbers (N_ice: 83 hours), ice crystal mean mass size (R_ice: 83 hours) and relative humidity (RH_ice) inside and outside of cirrus clouds (78 and 140 hours). We will present the parameters as PDFs versus temperature and derive medians and core ranges (including the most frequent observations) for each parameter. The new large data sets confirm the earlier results presented by Schiller et al. (JGR, 2008), Krämer et al. (ACP, 2009) and Luebke et al. (ACP, 2013), which are all based on much smaller datasets. Further, we will show the geographical and altitude distribution of IWC, N_ice, R_ice and RH_ice.
Developing a Large Lexical Database for Information Retrieval, Parsing, and Text Generation Systems.
ERIC Educational Resources Information Center
Conlon, Sumali Pin-Ngern; And Others
1993-01-01
Important characteristics of lexical databases and their applications in information retrieval and natural language processing are explained. An ongoing project using various machine-readable sources to build a lexical database is described, and detailed designs of individual entries with examples are included. (Contains 66 references.) (EAM)
NREL Opens Large Database of Inorganic Thin-Film Materials
April 3, 2018 - An extensive experimental database of inorganic thin-film materials developed by the National Renewable Energy Laboratory (NREL) is now publicly available: the High Throughput Experimental Materials (HTEM) database.
Weng, W; Liang, Y; Kimball, E S; Hobbs, T; Kong, S; Sakurada, B; Bouchard, J
2016-07-01
Objective To explore trends in demographics, comorbidities, anti-diabetic drug usage, and healthcare utilization costs in patients with newly-diagnosed type 2 diabetes mellitus (T2DM) using a large US claims database. Methods For the years 2007 and 2012, Truven Health Marketscan Research Databases were used to identify adults with newly-diagnosed T2DM and continuous 12-month enrollment with prescription benefits. Variables examined included patient demographics, comorbidities, inpatient utilization patterns, healthcare costs (inpatient and outpatient), drug costs, and diabetes drug claim patterns. Results Despite an increase in the overall database population between 2007-2012, the incidence of newly-diagnosed T2DM decreased from 1.1% (2007) to 0.65% (2012). Hyperlipidemia and hypertension were the most common comorbidities and increased in prevalence from 2007 to 2012. In 2007, 48.3% of newly-diagnosed T2DM patients had no claims for diabetes medications, compared with 36.2% of patients in 2012. The use of a single oral anti-diabetic drug (OAD) was the most common diabetes medication-related claim (46.2% of patients in 2007; 56.7% of patients in 2012). Among OAD monotherapy users, metformin was the most commonly used and increased from 2007 (74.7% of OAD monotherapy users) to 2012 (90.8%). Decreases were observed for sulfonylureas (14.1% to 6.2%) and thiazolidinediones (7.3% to 0.6%). Insulin, predominantly basal insulin, was used by 3.9% of patients in 2007 and 5.3% of patients in 2012. Mean total annual healthcare costs increased from $13,744 in 2007 to $15,175 in 2012, driven largely by outpatient services, although costs in all individual categories of healthcare services (inpatient and outpatient) increased. Conversely, total drug costs per patient were lower in 2012 compared with 2007. Conclusions Despite a drop in the rate of newly-diagnosed T2DM from 2007 to 2012 in the US, increased total medical costs and comorbidities per individual patient suggest that the clinical and economic trends for T2DM are not declining.
Active Exploration of Large 3D Model Repositories.
Gao, Lin; Cao, Yan-Pei; Lai, Yu-Kun; Huang, Hao-Zhi; Kobbelt, Leif; Hu, Shi-Min
2015-12-01
With broader availability of large-scale 3D model repositories, the need for efficient and effective exploration becomes more and more urgent. Existing model retrieval techniques do not scale well with the size of the database, since often a large number of very similar objects are returned for a query, and the possibilities to refine the search are quite limited. We propose an interactive approach where the user feeds an active learning procedure by labeling either entire models or parts of them as "like" or "dislike" such that the system can automatically update an active set of recommended models. To provide an intuitive user interface, candidate models are presented based on their estimated relevance for the current query. From the methodological point of view, our main contribution is to exploit not only the similarity between a query and the database models but also the similarities among the database models themselves. We achieve this by an offline pre-processing stage, where global and local shape descriptors are computed for each model and a sparse distance metric is derived that can be evaluated efficiently even for very large databases. We demonstrate the effectiveness of our method by interactively exploring a repository containing over 100K models.
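The re-ranking loop at the heart of such a system is easy to prototype. Below is a minimal Python sketch of like/dislike relevance feedback over precomputed shape descriptors; it uses a simple Rocchio-style query update rather than the paper's actual learning procedure, and all names are illustrative.

```python
# Sketch: relevance-feedback re-ranking over precomputed descriptors.
import numpy as np

def recommend(descriptors, liked, disliked, k=10):
    """Rank database models by closeness to the liked set, pushed
    away from the disliked set (a Rocchio-style query update)."""
    query = descriptors[liked].mean(axis=0)
    if disliked:
        query -= 0.5 * descriptors[disliked].mean(axis=0)
    dists = np.linalg.norm(descriptors - query, axis=1)
    order = np.argsort(dists)              # closest models first
    labeled = set(liked) | set(disliked)
    return [i for i in order if i not in labeled][:k]

db = np.random.rand(5000, 128)             # offline-computed descriptors
print(recommend(db, liked=[3, 17], disliked=[42]))
```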
NASA Astrophysics Data System (ADS)
Bulan, Orhan; Bernal, Edgar A.; Loce, Robert P.; Wu, Wencheng
2013-03-01
Video cameras are widely deployed along city streets, interstate highways, traffic lights, stop signs and toll booths by entities that perform traffic monitoring and law enforcement. The videos captured by these cameras are typically compressed and stored in large databases. Performing a rapid search for a specific vehicle within a large database of compressed videos is often required and can be a time-critical, life-or-death matter. In this paper, we propose video compression and decompression algorithms that enable fast and efficient vehicle or, more generally, event searches in large video databases. The proposed algorithm selects reference frames (i.e., I-frames) based on a vehicle having been detected at a specified position within the scene being monitored while compressing a video sequence. A search for a specific vehicle in the compressed video stream is performed across the reference frames only, which does not require decompression of the full video sequence as in traditional search algorithms. Our experimental results on videos captured on a local road show that the proposed algorithm significantly reduces the search space (thus reducing time and computational resources) in vehicle search tasks within compressed video streams, particularly those captured in light traffic volume conditions.
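The core idea, choosing reference frames on detection events so that a search only ever touches I-frames, can be sketched in a few lines. The detector below is a stub, and `max_gop` is an assumed periodic-refresh bound, not a parameter from the paper.

```python
# Conceptual sketch: mark a frame as an I-frame when a vehicle is
# detected at the trigger position (or after max_gop frames, so the
# stream stays decodable). A search later decodes only these frames.
def choose_iframes(frames, detect_vehicle, max_gop=250):
    iframes, since_last = [], max_gop      # force an initial I-frame
    for idx, frame in enumerate(frames):
        if detect_vehicle(frame) or since_last >= max_gop:
            iframes.append(idx)            # encode as reference frame
            since_last = 0
        else:
            since_last += 1                # encode as P/B-frame
    return iframes

# A vehicle search then scans only [decode(i) for i in iframes]
# (decode() is hypothetical) instead of the full video sequence.
```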
Crystallography Open Databases and Preservation: a World-wide Initiative
NASA Astrophysics Data System (ADS)
Chateigner, Daniel
In 2003, an international team of crystallographers proposed the Crystallography Open Database (COD), a fully free collection of crystal structure data, with the aim of ensuring their preservation. With nearly 250,000 entries, this database represents a large open set of data for crystallographers, academia and industry, located at five different places worldwide and indexed in Thomson Reuters' ISI. As a large step towards data preservation, raw data can now be uploaded along with «digested» structure files, and COD can be queried by most crystallography-related industrial software. The COD initiative has also inspired several other open developments.
Iris indexing based on local intensity order pattern
NASA Astrophysics Data System (ADS)
Emerich, Simina; Malutan, Raul; Crisan, Septimiu; Lefkovits, Laszlo
2017-03-01
In recent years, iris biometric systems have increased in popularity and have proven capable of handling large-scale databases. The main advantages of these systems are accuracy and reliability. Proper classification of iris patterns is expected to reduce the matching time in huge databases. This paper presents an iris indexing technique based on the Local Intensity Order Pattern. The performance of the present approach is evaluated on the UPOL database and compared with other recent systems designed for iris indexing. The results illustrate the potential of the proposed method for large-scale iris identification.
Object classification and outliers analysis in the forthcoming Gaia mission
NASA Astrophysics Data System (ADS)
Ordóñez-Blanco, D.; Arcay, B.; Dafonte, C.; Manteiga, M.; Ulla, A.
2010-12-01
Astrophysics is evolving towards the rational optimization of costly observational material through the intelligent exploitation of large astronomical databases from both ground-based telescopes and space mission archives. However, there has been relatively little advance in the development of the highly scalable data exploitation and analysis tools needed to generate scientific returns from these large and expensively obtained datasets. Among the upcoming projects of astronomical instrumentation, Gaia is the next cornerstone ESA mission. The Gaia survey foresees the creation of a data archive and its future exploitation with automated or semi-automated analysis tools. This work reviews some of the work being developed by the Gaia Data Processing and Analysis Consortium for object classification and the analysis of outliers in the forthcoming mission.
NASA Technical Reports Server (NTRS)
See, Thomas H.; Hoerz, Friedrich; Zolensky, Michael E.; Allbrooks, Martha K.; Atkinson, Dale R.; Simon, Charles G.
1992-01-01
All craters greater than or equal to 500 microns and penetration holes greater than or equal to 300 microns in diameter on the entire Long Duration Exposure Facility (LDEF) were documented. Summarized here are the observations on the LDEF frame, which exposed aluminum 6061-T6 in 26 specific directions relative to LDEF's velocity vector. In addition, the opportunity arose to characterize the penetration holes in the A0178 thermal blankets, which pointed in nine directions. For each of the 26 directions, LDEF provided time-area products that approach those afforded by all previous space-retrieved materials combined. The objective here is to provide a factual database pertaining to the largest collisional events on the entire LDEF spacecraft with a minimum of interpretation. This database may serve to encourage and guide more interpretative efforts and modeling attempts.
FRASS: the web-server for RNA structural comparison
2010-01-01
Background The impressive increase in novel RNA structures during the past few years demands automated methods for structure comparison. While many algorithms handle only small motifs, a few techniques developed in recent years (ARTS, DIAL, SARA, SARSA, and LaJolla) are available for the structural comparison of large and intact RNA molecules. Results The FRASS web-server represents an RNA chain with its Gauss integrals and allows one to compare structures of RNA chains and to find similar entries in a database derived from the Protein Data Bank. We observed that FRASS scores correlate well with the ARTS and LaJolla similarity scores. Moreover, the web-server can also satisfactorily reproduce the DARTS classification of RNA 3D structures and the classification of SCOR functions that was obtained by the SARA method. Conclusions The FRASS web-server can be easily used to detect relationships among RNA molecules and to scan the rapidly enlarging structural databases efficiently. PMID:20553602
Kuhn, Jens H.; Andersen, Kristian G.; Baize, Sylvain; Bào, Yīmíng; Bavari, Sina; Berthet, Nicolas; Blinkova, Olga; Brister, J. Rodney; Clawson, Anna N.; Fair, Joseph; Gabriel, Martin; Garry, Robert F.; Gire, Stephen K.; Goba, Augustine; Gonzalez, Jean-Paul; Günther, Stephan; Happi, Christian T.; Jahrling, Peter B.; Kapetshi, Jimmy; Kobinger, Gary; Kugelman, Jeffrey R.; Leroy, Eric M.; Maganga, Gael Darren; Mbala, Placide K.; Moses, Lina M.; Muyembe-Tamfum, Jean-Jacques; N’Faly, Magassouba; Nichol, Stuart T.; Omilabu, Sunday A.; Palacios, Gustavo; Park, Daniel J.; Paweska, Janusz T.; Radoshitzky, Sheli R.; Rossi, Cynthia A.; Sabeti, Pardis C.; Schieffelin, John S.; Schoepp, Randal J.; Sealfon, Rachel; Swanepoel, Robert; Towner, Jonathan S.; Wada, Jiro; Wauquier, Nadia; Yozwiak, Nathan L.; Formenty, Pierre
2014-01-01
In 2014, Ebola virus (EBOV) was identified as the etiological agent of a large and still expanding outbreak of Ebola virus disease (EVD) in West Africa and a much more confined EVD outbreak in Middle Africa. Epidemiological and evolutionary analyses confirmed that all cases of both outbreaks are connected to a single introduction each of EBOV into human populations and that both outbreaks are not directly connected. Coding-complete genomic sequence analyses of isolates revealed that the two outbreaks were caused by two novel EBOV variants, and initial clinical observations suggest that neither of them should be considered strains. Here we present consensus decisions on naming for both variants (West Africa: “Makona”, Middle Africa: “Lomela”) and provide database-compatible full, shortened, and abbreviated names that are in line with recently established filovirus sub-species nomenclatures. PMID:25421896
Recently active traces of the Bartlett Springs Fault, California: a digital database
Lienkaemper, James J.
2010-01-01
The purpose of this map is to show the location of and evidence for recent movement on active fault traces within the Bartlett Springs Fault Zone, California. The location and recency of the mapped traces are primarily based on geomorphic expression of the fault as interpreted from large-scale aerial photography. In a few places, evidence of fault creep and offset Holocene strata in trenches and natural exposures has confirmed the activity of some of these traces. This publication is formatted both as a digital database for use within a geographic information system (GIS) and, for broader public access, as map images that may be browsed online or downloaded as a summary map. The report text describes the types of scientific observations used to make the map, gives references pertaining to the fault and the evidence of faulting, and provides guidance on the use and limitations of the map.
A VBA Desktop Database for Proposal Processing at National Optical Astronomy Observatories
NASA Astrophysics Data System (ADS)
Brown, Christa L.
National Optical Astronomy Observatories (NOAO) has developed a relational Microsoft Windows desktop database using Microsoft Access and the Microsoft Office programming language, Visual Basic for Applications (VBA). The database is used to track data relating to observing proposals from original receipt through the review process, scheduling, observing, and final statistical reporting. The database has automated proposal processing and distribution of information. It allows NOAO to collect and archive data so as to query and analyze information about our science programs in new ways.
Mobile Source Observation Database (MSOD)
The Mobile Source Observation Database (MSOD) is a relational database being developed by the Assessment and Standards Division (ASD) of the US Environmental Protection Agency Office of Transportation and Air Quality (formerly the Office of Mobile Sources). The MSOD contains emission test data from in-use mobile air-pollution sources such as cars, trucks, and engines from trucks and nonroad vehicles. Data in the database have been collected from 1982 to the present and are intended to be representative of in-use vehicle emissions in the United States.
Big Data and Total Hip Arthroplasty: How Do Large Databases Compare?
Bedard, Nicholas A; Pugely, Andrew J; McHugh, Michael A; Lux, Nathan R; Bozic, Kevin J; Callaghan, John J
2018-01-01
Use of large databases for orthopedic research has become extremely popular in recent years. Each database varies in the methods used to capture data and the population it represents. The purpose of this study was to evaluate how these databases differ in reported demographics, comorbidities, and postoperative complications for primary total hip arthroplasty (THA) patients. Primary THA patients were identified within the National Surgical Quality Improvement Program (NSQIP), the Nationwide Inpatient Sample (NIS), Medicare Standard Analytic Files (MED), and the Humana administrative claims database (HAC). NSQIP definitions for comorbidities and complications were matched to corresponding International Classification of Diseases, 9th Revision/Current Procedural Terminology codes to query the other databases. Demographics, comorbidities, and postoperative complications were compared. The number of patients from each database was 22,644 in HAC, 371,715 in MED, 188,779 in NIS, and 27,818 in NSQIP. Age and gender distributions were clinically similar. Overall, there was variation in the prevalence of comorbidities and the rates of postoperative complications between databases. As an example, NSQIP recorded more than twice the prevalence of obesity of NIS, and HAC and MED recorded more than twice the prevalence of diabetes of NSQIP. Rates of deep infection and stroke 30 days after THA differed more than 2-fold between databases. Among databases commonly used in orthopedic research, there is considerable variation in complication rates following THA depending upon the database used for analysis. It is important to consider these differences when critically evaluating database research. Additionally, with the advent of bundled payments, these differences must be considered in risk adjustment models. Copyright © 2017 Elsevier Inc. All rights reserved.
Kamali, Parisa; Zettervall, Sara L; Wu, Winona; Ibrahim, Ahmed M S; Medin, Caroline; Rakhorst, Hinne A; Schermerhorn, Marc L; Lee, Bernard T; Lin, Samuel J
2017-04-01
Research derived from large-volume databases plays an increasing role in the development of clinical guidelines and health policy. In breast cancer research, the Surveillance, Epidemiology and End Results, National Surgical Quality Improvement Program, and Nationwide Inpatient Sample databases are widely used. This study aims to compare trends in immediate breast reconstruction and to identify the drawbacks and benefits of each database. Patients with invasive breast cancer and ductal carcinoma in situ were identified from each database (2005-2012). Trends in immediate breast reconstruction over time were evaluated. Patient demographics and comorbidities were compared. A subgroup analysis of immediate breast reconstruction use by race was conducted. Within the three databases, 1.2 million patients were studied. Immediate breast reconstruction in invasive breast cancer patients increased significantly over time in all databases. A similar significant upward trend was seen in ductal carcinoma in situ patients. Significant differences in immediate breast reconstruction rates were seen among races, and the disparity differed among the three databases. Rates of comorbidities were similar among the three databases. There has been a significant increase in immediate breast reconstruction; however, the extent of the reporting of overall immediate breast reconstruction rates and of racial disparities differs significantly among databases. The Nationwide Inpatient Sample and the National Surgical Quality Improvement Program report similar findings, with the Surveillance, Epidemiology and End Results database reporting significantly lower results in several categories. These findings suggest that results from the Surveillance, Epidemiology and End Results database may not be generalizable to the entire U.S. population.
Mean velocity and turbulence measurements in a 90 deg curved duct with thin inlet boundary layer
NASA Technical Reports Server (NTRS)
Crawford, R. A.; Peters, C. E.; Steinhoff, J.; Hornkohl, J. O.; Nourinejad, J.; Ramachandran, K.
1985-01-01
The experimental database established by this investigation of the flow in a large rectangular turning duct is of benchmark quality. The experimental Reynolds numbers, Dean numbers, and boundary layer characteristics are significantly different from those of previous benchmark curved-duct experiments. This investigation extends the experimental database to higher Reynolds numbers and thinner entrance boundary layers. The 5% to 10% thick boundary layers, based on duct half-width, result in a large region of near-potential flow in the duct core surrounded by developing boundary layers with large crossflows. The turbulent entrance boundary layer case at Re_d = 328,000 provides an incompressible flowfield which approaches real turbine blade cascade characteristics. The results of this investigation provide a challenging benchmark database for computational fluid dynamics code development.
The Odense University Pharmacoepidemiological Database (OPED)
The Odense University Pharmacoepidemiological Database is one of two large prescription registries in Denmark and covers a stable population that is representative of the Danish population as a whole.
Considerations for observational research using large data sets in radiation oncology.
Jagsi, Reshma; Bekelman, Justin E; Chen, Aileen; Chen, Ronald C; Hoffman, Karen; Shih, Ya-Chen Tina; Smith, Benjamin D; Yu, James B
2014-09-01
The radiation oncology community has witnessed growing interest in observational research conducted using large-scale data sources such as registries and claims-based data sets. With the growing emphasis on observational analyses in health care, the radiation oncology community must possess a sophisticated understanding of the methodological considerations of such studies in order to evaluate evidence appropriately to guide practice and policy. Because observational research has unique features that distinguish it from clinical trials and other forms of traditional radiation oncology research, the International Journal of Radiation Oncology, Biology, Physics assembled a panel of experts in health services research to provide a concise and well-referenced review, intended to be informative for the lay reader, as well as for scholars who wish to embark on such research without prior experience. This review begins by discussing the types of research questions relevant to radiation oncology that large-scale databases may help illuminate. It then describes major potential data sources for such endeavors, including information regarding access and insights regarding the strengths and limitations of each. Finally, it provides guidance regarding the analytical challenges that observational studies must confront, along with discussion of the techniques that have been developed to help minimize the impact of certain common analytical issues in observational analysis. Features characterizing a well-designed observational study include clearly defined research questions, careful selection of an appropriate data source, consultation with investigators with relevant methodological expertise, inclusion of sensitivity analyses, caution not to overinterpret small but significant differences, and recognition of limitations when trying to evaluate causality. This review concludes that carefully designed and executed studies using observational data that possess these qualities hold substantial promise for advancing our understanding of many unanswered questions of importance to the field of radiation oncology. Copyright © 2014 Elsevier Inc. All rights reserved.
Content Is King: Databases Preserve the Collective Information of Science.
Yates, John R
2018-04-01
Databases store experimentally gathered sequence information to create resources that further science. In the last 20 years, databases have become critical components of fields like proteomics, where they provide the basis for large-scale and high-throughput proteomic informatics. Amos Bairoch, winner of the Association of Biomolecular Resource Facilities Frederick Sanger Award, has created some of the important databases that proteomic research depends upon for accurate interpretation of data.
Use of a German longitudinal prescription database (LRx) in pharmacoepidemiology.
Richter, Hartmut; Dombrowski, Silvia; Hamer, Hajo; Hadji, Peyman; Kostev, Karel
2015-01-01
Large epidemiological databases are often used to examine matters pertaining to drug utilization, health services, and drug safety. The major strength of such databases is that they include large sample sizes, which allow precise estimates to be made. The IMS® LRx database has in recent years been used as a data source for epidemiological research. The aim of this paper is to review a number of recent studies published with the aid of this database and to compare them with the results of similar studies using independent data published in the literature. Although somewhat limited to studies for which comparative independent results were available, the review covers a wide range of possible uses of the LRx database in a variety of therapeutic fields: prevalence/incidence rate determination (diabetes, epilepsy), persistence analyses (diabetes, osteoporosis), use of comedication (diabetes), drug utilization (G-CSF market), and treatment costs (diabetes, G-CSF market). In general, the results of the LRx studies were found to be clearly in line with previously published reports. In some cases, noticeable discrepancies between the LRx results and the literature data were found (e.g. prevalence in epilepsy, persistence in osteoporosis); these are discussed and possible reasons presented. Overall, it was concluded that the IMS® LRx database forms a suitable data source for pharmacoepidemiological studies.
NASA Astrophysics Data System (ADS)
2011-04-01
Metallic asteroid 216 Kleopatra is shaped like a dog's bone and has two tiny moons - which came from the asteroid itself - according to a team of astronomers from France and the US, who also measured its surprisingly low density and concluded that it is a collection of rubble. The recent solar minimum was longer and lower than expected, with a low polar field and an unusually large number of days with no sunspots visible. Models of the magnetic field and plasma flow within the Sun suggest that fast, then slow meridional flow could account for this pattern. Variable stars are a significant scientific target for amateur astronomers. The American Association of Variable Star Observers runs the world's largest database of variable star observations, from volunteers, and reached 20 million observations in February.
A Catalog of Eclipsing Binaries and Variable Stars Observed with ASTEP 400 from Dome C, Antarctica
NASA Astrophysics Data System (ADS)
Chapellier, E.; Mékarnia, D.; Abe, L.; Guillot, T.; Agabi, K.; Rivet, J.-P.; Schmider, F.-X.; Crouzet, N.; Aristidi, E.
2016-10-01
We used the large photometric database of the ASTEP program, whose primary goal was to detect exoplanets in the southern hemisphere from Antarctica, to search for eclipsing binaries (EcBs) and variable stars. 673 EcBs and 1166 variable stars were detected, including 31 previously known stars. The resulting online catalogs give the identification, the classification, the period, and the depth or semi-amplitude of each star. Data and light curves for each object are available at http://astep-vo.oca.eu.
Aging assessment of large electric motors in nuclear power plants
DOE Office of Scientific and Technical Information (OSTI.GOV)
Villaran, M.; Subudhi, M.
1996-03-01
Large electric motors serve as the prime movers to drive high capacity pumps, fans, compressors, and generators in a variety of nuclear plant systems. This study examined the stressors that cause degradation and aging in large electric motors operating in various plant locations and environments. The operating history of these machines in nuclear plant service was studied by review and analysis of failure reports in the NPRDS and LER databases. This was supplemented by a review of motor designs, and their nuclear and balance of plant applications, in order to characterize the failure mechanisms that cause degradation, aging, and failure in large electric motors. A generic failure modes and effects analysis for large squirrel cage induction motors was performed to identify the degradation and aging mechanisms affecting various components of these large motors, the failure modes that result, and their effects upon the function of the motor. The effects of large motor failures upon the systems in which they are operating, and on the plant as a whole, were analyzed from failure reports in the databases. The effectiveness of the industry's large motor maintenance programs was assessed based upon the failure reports in the databases and reviews of plant maintenance procedures and programs.
Techniques for Efficiently Managing Large Geosciences Data Sets
NASA Astrophysics Data System (ADS)
Kruger, A.; Krajewski, W. F.; Bradley, A. A.; Smith, J. A.; Baeck, M. L.; Steiner, M.; Lawrence, R. E.; Ramamurthy, M. K.; Weber, J.; Delgreco, S. A.; Domaszczynski, P.; Seo, B.; Gunyon, C. A.
2007-12-01
We have developed techniques and software tools for efficiently managing large geosciences data sets. While the techniques were developed as part of an NSF-funded ITR project that focuses on making NEXRAD weather data and rainfall products available to hydrologists and other scientists, they are relevant to other geosciences disciplines that deal with large data sets. Metadata, relational databases, data compression, and networking are central to our methodology. Data and derived products are stored on file servers in a compressed format. URLs to, and metadata about, the data and derived products are managed in a PostgreSQL database. Virtually all access to the data and products is through this database. Geosciences data normally require a number of processing steps to transform the raw data into useful products: data quality assurance, coordinate transformations and georeferencing, applying calibration information, and many more. We have developed the concept of crawlers that manage this scientific workflow. Crawlers are unattended processes that run indefinitely and at set intervals query the database for their next assignment. A database table functions as a roster for the crawlers. Crawlers perform well-defined tasks that are, except perhaps for sequencing, largely independent of other crawlers. Once a crawler is done with its current assignment, it updates the database roster table and gets its next assignment by querying the database. We have developed a library that enables one to quickly add crawlers; the library provides hooks to external (i.e., C-language) compiled codes, so that developers can work and contribute independently. Processes called ingesters inject data into the system. The bulk of the data comes from a real-time feed using UCAR/Unidata's IDD/LDM software. An exciting recent development is the establishment of a Unidata HYDRO feed that delivers value-added metadata over the IDD/LDM; ingesters grab this metadata and populate the PostgreSQL tables. These and other concepts we have developed have enabled us to efficiently manage a 70 TB (and growing) weather radar data set.
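As a concrete illustration of the crawler/roster pattern described above, here is a minimal Python sketch using psycopg2; the `tasks` table and its columns are hypothetical stand-ins for the project's actual roster schema.

```python
# Sketch of an unattended "crawler" worker polling a roster table.
import time
import psycopg2

def crawl(task_type, handler, dsn="dbname=radar", interval=30):
    conn = psycopg2.connect(dsn)
    conn.autocommit = True
    while True:                            # crawlers run indefinitely
        with conn.cursor() as cur:
            # Claim the next pending assignment for this crawler type.
            cur.execute(
                """UPDATE tasks SET status = 'running'
                   WHERE id = (SELECT id FROM tasks
                               WHERE task_type = %s AND status = 'pending'
                               ORDER BY id LIMIT 1)
                   RETURNING id, payload""",
                (task_type,))
            row = cur.fetchone()
            if row is None:
                time.sleep(interval)       # nothing to do; poll again
                continue
            task_id, payload = row
            handler(payload)               # e.g. QC, georeferencing
            cur.execute("UPDATE tasks SET status = 'done' WHERE id = %s",
                        (task_id,))
```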
NASA Astrophysics Data System (ADS)
Lee, Sangho; Suh, Jangwon; Park, Hyeong-Dong
2015-03-01
Boring logs are widely used in geological field studies since the data describe various attributes of underground and surface environments. However, it is difficult to manage multiple boring logs in the field, as conventional management and visualization methods are not suitable for integrating and combining large data sets. We developed an iPad application that enables its user to search boring logs rapidly and visualize them using the augmented reality (AR) technique. For the development of the application, a standard borehole database appropriate for a mobile-based borehole database management system was designed. The application consists of three modules: an AR module, a map module, and a database module. The AR module superimposes borehole data on camera imagery as viewed by the user and provides intuitive visualization of borehole locations. The map module shows the locations of corresponding borehole data on a 2D map with additional map layers. The database module provides data management functions for large borehole databases for the other modules. A field survey was also carried out using more than 100,000 borehole records.
Angermeier, Paul L.; Frimpong, Emmanuel A.
2009-01-01
The need for integrated and widely accessible sources of species traits data to facilitate studies of ecology, conservation, and management has motivated development of traits databases for various taxa. In spite of the increasing number of traits-based analyses of freshwater fishes in the United States, no consolidated database of traits of this group exists publicly, and much useful information on these species is documented only in obscure sources. The largely inaccessible and unconsolidated traits information makes large-scale analysis involving many fishes and/or traits particularly challenging. FishTraits is a database of >100 traits for 809 (731 native and 78 exotic) fish species found in freshwaters of the conterminous United States, including 37 native families and 145 native genera. The database contains information on four major categories of traits: (1) trophic ecology, (2) body size and reproductive ecology (life history), (3) habitat associations, and (4) salinity and temperature tolerances. Information on geographic distribution and conservation status is also included. Together, we refer to the traits, distribution, and conservation status information as attributes. Many sources were consulted to compile attributes, including state and regional species accounts and other databases.
Active browsing using similarity pyramids
NASA Astrophysics Data System (ADS)
Chen, Jau-Yuen; Bouman, Charles A.; Dalton, John C.
1998-12-01
In this paper, we describe a new approach to managing large image databases, which we call active browsing. Active browsing integrates relevance feedback into the browsing environment, so that users can modify the database's organization to suit the desired task. Our method is based on a similarity pyramid data structure, which hierarchically organizes the database, so that it can be efficiently browsed. At coarse levels, the similarity pyramid allows users to view the database as large clusters of similar images. Alternatively, users can 'zoom into' finer levels to view individual images. We discuss relevance feedback for the browsing process, and argue that it is fundamentally different from relevance feedback for more traditional search-by-query tasks. We propose two fundamental operations for active browsing: pruning and reorganization. Both of these operations depend on a user-defined relevance set, which represents the image or set of images desired by the user. We present statistical methods for accurately pruning the database, and we propose a new 'worm hole' distance metric for reorganizing the database, so that members of the relevance set are grouped together.
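The pyramid itself can be prototyped with off-the-shelf hierarchical clustering. The sketch below, assuming generic image feature vectors and illustrative level sizes, groups a database into nested clusters so that coarse levels show large clusters of similar images; it is not the authors' exact construction.

```python
# Sketch: a similarity pyramid via agglomerative clustering.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def build_pyramid(features, levels=(4, 16, 64)):
    """Map each pyramid level to a cluster label per image: coarse
    levels hold large clusters, fine levels approach single images."""
    Z = linkage(features, method="average")   # agglomerative tree
    return {k: fcluster(Z, t=k, criterion="maxclust") for k in levels}

rng = np.random.default_rng(0)
feats = rng.normal(size=(500, 32))            # e.g. color/texture descriptors
pyramid = build_pyramid(feats)
print({k: len(set(v)) for k, v in pyramid.items()})
```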
A practical approach for inexpensive searches of radiology report databases.
Desjardins, Benoit; Hamilton, R Curtis
2007-06-01
We present a method to perform full-text searches of radiology reports for the large number of departments that do not have this ability as part of their radiology or hospital information system. A tool written in Microsoft Access (front-end) has been designed to search a server (back-end) containing an indexed weekly secondary backup copy of the full relational database extracted from a radiology information system (RIS). This front-end/back-end approach has been implemented in a large academic radiology department and is used for teaching, research, and administrative purposes. The weekly secondary backup of the 80 GB, 4 million record RIS database takes 2 hours; further indexing of the exported radiology reports takes 6 hours. Individual searches typically take less than 1 minute on the indexed database and 30-60 minutes on the nonindexed database. Guidelines to properly address privacy and institutional review board issues are closely followed by all users. This method has the potential to improve teaching, research, and administrative programs within radiology departments that cannot afford more expensive technology.
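The same front-end/back-end idea can be reproduced with any full-text engine. Here is a minimal sketch using Python's built-in sqlite3 with the FTS5 extension; the table and column names are hypothetical, and this is an analogue of the approach, not the authors' Access implementation.

```python
# Sketch: indexed full-text search over a backup copy of report text.
import sqlite3

conn = sqlite3.connect("reports_backup.db")   # assumes FTS5-enabled build
conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS reports "
             "USING fts5(accession, report_text)")
conn.execute("INSERT INTO reports VALUES (?, ?)",
             ("A123", "No evidence of pulmonary embolism."))
conn.commit()

# An indexed MATCH query returns in seconds even on millions of rows,
# versus a slow LIKE scan over an unindexed table.
hits = conn.execute(
    "SELECT accession FROM reports WHERE reports MATCH ?",
    ("pulmonary AND embolism",)).fetchall()
print(hits)
```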
Cardiovascular safety of biologic therapies for the treatment of RA.
Greenberg, Jeffrey D; Furer, Victoria; Farkouh, Michael E
2011-11-15
Cardiovascular disease represents a major source of extra-articular comorbidity in patients with rheumatoid arthritis (RA). A combination of traditional cardiovascular risk factors and RA-related factors accounts for the excess risk in RA. Among RA-related factors, chronic systemic inflammation has been implicated in the pathogenesis and progression of atherosclerosis. A growing body of evidence, mainly derived from observational databases and registries, suggests that specific RA therapies, including methotrexate and anti-TNF biologic agents, can reduce the risk of future cardiovascular events in patients with RA. The cardiovascular profile of other biologic therapies for the treatment of RA has not been adequately studied, including that of investigational drugs that improve systemic inflammation but alter traditional cardiovascular risk factors. In the absence of large clinical trials adequately powered to detect differences in cardiovascular events between biologic drugs in RA, deriving firm conclusions on cardiovascular safety is challenging. Nevertheless, observational research using large registries has emerged as a promising approach to studying the cardiovascular risk of emerging RA biologic therapies.
2010-01-01
Background A plant-based diet protects against chronic oxidative stress-related diseases. Dietary plants contain variable chemical families and amounts of antioxidants. It has been hypothesized that plant antioxidants may contribute to the beneficial health effects of dietary plants. Our objective was to develop a comprehensive food database consisting of the total antioxidant content of typical foods as well as other dietary items such as traditional medicine plants, herbs and spices, and dietary supplements. This database is intended for use in a wide range of nutritional research, from in vitro, cell, and animal studies to clinical trials and nutritional epidemiological studies. Methods We procured samples from countries worldwide and assayed the samples for their total antioxidant content using a modified version of the FRAP assay. Results and sample information (such as country of origin, product and/or brand name) were registered for each individual food sample and constitute the Antioxidant Food Table. Results The results demonstrate that there are several thousand-fold differences in the antioxidant content of foods. Spices, herbs, and supplements include the most antioxidant-rich products in our study, some exceptionally high. Berries, fruits, nuts, chocolate, vegetables, and products thereof constitute common foods and beverages with high antioxidant values. Conclusions This database is, to the best of our knowledge, the most comprehensive antioxidant food database published, and it shows that plant-based foods introduce significantly more antioxidants into the human diet than non-plant foods. Because of the large variations observed between otherwise comparable food samples, the study emphasizes the importance of using a comprehensive database combined with a detailed system for food registration in clinical and epidemiological studies. The present antioxidant database is therefore an essential research tool to further elucidate the potential health effects of phytochemical antioxidants in the diet. PMID:20096093
Development of an Integrated Biospecimen Database among the Regional Biobanks in Korea.
Park, Hyun Sang; Cho, Hune; Kim, Hwa Sun
2016-04-01
This study developed an integrated database for 15 regional biobanks that provides large quantities of high-quality bio-data to researchers, to be used for the prevention of disease, the development of personalized medicines, and genetics studies. For database modeling, we collected the raw data managed independently by the 15 regional biobanks and analyzed and defined the metadata of the items. We also built a three-step (high, middle, and low) classification system for classifying the item concepts based on the metadata. To give the items clear meanings, clinical items were defined using the Systematized Nomenclature of Medicine Clinical Terms, and specimen items were defined using the Logical Observation Identifiers Names and Codes. To optimize database performance, we set up a multi-column index based on the classification system and the international standard codes. As a result of subdividing the 7,197,252 raw data items collected, we refined the metadata into 1,796 clinical items and 1,792 specimen items. The classification system consists of 15 high, 163 middle, and 3,588 low class items. International standard codes were linked to 69.9% of the clinical items and 71.7% of the specimen items. The database consists of 18 tables based on MySQL Server 5.6. In the performance evaluation, the multi-column index shortened query time by as much as nine times. The database developed was based on an international standard terminology system, providing an infrastructure that can integrate the 7,197,252 raw data items managed by the 15 regional biobanks. In particular, it resolved the inevitable interoperability issues in the exchange of information among the biobanks and provided a solution to the synonym problem, which arises when the same concept is expressed in a variety of ways.
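As an illustration of the indexing choice, the sketch below creates a multi-column index over the three classification levels plus a standard code, in the spirit of the paper's design; the table and column names are hypothetical, and it uses the mysql-connector-python package.

```python
# Sketch: a multi-column index keyed on classification + standard code.
import mysql.connector

conn = mysql.connector.connect(user="biobank", database="specimens")
cur = conn.cursor()
# Queries that filter on the leading columns of this index (high,
# then middle, then low class) can be served without a full scan.
cur.execute(
    "CREATE INDEX idx_class_code ON clinical_item "
    "(high_class, middle_class, low_class, standard_code)")
conn.commit()
conn.close()
```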
The impact of large-scale, long-term optical surveys on pulsating star research
NASA Astrophysics Data System (ADS)
Soszyński, Igor
2017-09-01
The era of large-scale photometric variability surveys began a quarter of a century ago, when three microlensing projects - EROS, MACHO, and OGLE - started their operation. These surveys initiated a revolution in the field of variable stars and in the next years they inspired many new observational projects. Large-scale optical surveys multiplied the number of variable stars known in the Universe. The huge, homogeneous and complete catalogs of pulsating stars, such as Cepheids, RR Lyrae stars, or long-period variables, offer an unprecedented opportunity to calibrate and test the accuracy of various distance indicators, to trace the three-dimensional structure of the Milky Way and other galaxies, to discover exotic types of intrinsically variable stars, or to study previously unknown features and behaviors of pulsators. We present historical and recent findings on various types of pulsating stars obtained from the optical large-scale surveys, with particular emphasis on the OGLE project which currently offers the largest photometric database among surveys for stellar variability.
Considerations to improve functional annotations in biological databases.
Benítez-Páez, Alfonso
2009-12-01
Despite the great effort to design efficient systems allowing the electronic indexation of information concerning genes, proteins, structures, and interactions published daily in scientific journals, some problems are still observed in specific tasks such as functional annotation. The annotation of function is a critical issue for bioinformatic routines, for instance in functional genomics and the prediction of unknown protein function, which are highly dependent on the quality of existing annotations. Some information management systems have evolved to efficiently incorporate information from large-scale projects, but annotation of single records from the literature is often difficult and slow. In this short report, functional characterizations of a representative sample of the entire set of uncharacterized proteins from Escherichia coli K12 were compiled from Swiss-Prot, PubMed, and EcoCyc, demonstrating a functional annotation deficit in biological databases. Several issues are postulated as causes of the lack of annotation, and different solutions are evaluated and proposed to avoid them. The hope is that, as a consequence of these observations, there will be new impetus to improve the speed and quality of functional annotation and ultimately provide updated, reliable information to the scientific community.
The HARPS-N archive through a Cassandra, NoSQL database suite?
NASA Astrophysics Data System (ADS)
Molinari, Emilio; Guerra, Jose; Harutyunyan, Avet; Lodi, Marcello; Martin, Adrian
2016-07-01
The TNG-INAF is developing the science archive for the WEAVE instrument. The underlying architecture of the archive is based on a non-relational database, more precisely on an Apache Cassandra cluster, which uses NoSQL technology. In order to test and validate this architecture, we created a local archive which we populated with all the HARPS-N spectra collected at the TNG since the instrument's start of operations in mid-2012, and developed tools for the analysis of this data set. The HARPS-N data set is two orders of magnitude smaller than WEAVE, but we want to demonstrate the ability to walk through a complete data set and produce scientific output as valuable as that produced by an ordinary pipeline, though without accessing the FITS files directly. The analytics are done with Apache Solr and Spark and on a relational PostgreSQL database. As an example, we produce observables such as metallicity indexes for the targets in the archive and compare the results with those coming from the HARPS-N regular data reduction software. The aim of this experiment is to explore the viability of a high-availability cluster and distributed NoSQL database as a platform for complex scientific analytics on a large data set, which will then be ported to the WEAVE Archive System (WAS) that we are developing for the WEAVE multi-object fiber spectrograph.
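For flavor, here is a minimal sketch of what querying such a Cassandra-backed archive looks like with the DataStax Python driver; the keyspace, table, and column names are hypothetical, not those of the actual TNG archive.

```python
# Sketch: reading per-target spectra metadata from a Cassandra cluster.
from cassandra.cluster import Cluster

cluster = Cluster(["archive-node1", "archive-node2"])
session = cluster.connect("harpsn")

# Assuming the table is partitioned by target, per-object scans stay
# cheap; ad hoc cross-archive queries go through the analytics layer.
rows = session.execute(
    "SELECT obs_date, snr FROM spectra WHERE target = %s",
    ("HD209458",))
for row in rows:
    print(row.obs_date, row.snr)
```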
Large-scale feature searches of collections of medical imagery
NASA Astrophysics Data System (ADS)
Hedgcock, Marcus W.; Karshat, Walter B.; Levitt, Tod S.; Vosky, D. N.
1993-09-01
Large-scale feature searches of accumulated collections of medical imagery are required for multiple purposes, including clinical studies, administrative planning, epidemiology, teaching, quality improvement, and research. To perform a feature search of large collections of medical imagery, one can search either text descriptors of the imagery in the collection (usually the interpretation) or, if the imagery is in digital format, the imagery itself. At our institution, text interpretations of medical imagery are all available in our VA Hospital Information System; these are downloaded daily into an off-line computer. The text descriptors of most medical imagery are usually formatted as free text, and so require a user-friendly database search tool to make searches quick and easy for any user to design and execute. We are tailoring such a database search tool (Liveview), developed by one of the authors (Karshat). To further facilitate search construction, we are building, from our accumulated interpretation data, a dictionary of medical and radiological terms and synonyms. If the imagery database is digital, the imagery which the search discovers is easily retrieved from the computer archive. We describe our database search user interface, with examples, and compare the efficacy of computer-assisted imagery searches from a clinical text database with manual searches. Our initial work on direct feature searches of digital medical imagery is outlined.
Open source database of images DEIMOS: extension for large-scale subjective image quality assessment
NASA Astrophysics Data System (ADS)
Vítek, Stanislav
2014-09-01
DEIMOS (Database of Images: Open Source) is an open-source database of images and video sequences for testing, verification, and comparison of various image and/or video processing techniques such as compression, reconstruction, and enhancement. This paper deals with an extension of the database that allows performing large-scale, web-based subjective image quality assessment. The extension implements both an administrative and a client interface. The proposed system is aimed mainly at mobile communication devices and takes advantage of HTML5 technology: participants do not need to install any application, and assessment can be performed using a web browser. The assessment campaign administrator can select images from the large database and then apply rules defined by various test procedure recommendations. The standard test procedures may be fully customized and saved as a template. Alternatively, the administrator can define a custom test using images from the pool and other components, such as evaluation forms and ongoing questionnaires. The image sequence is delivered to the online client, e.g. a smartphone or tablet, as a fully automated assessment sequence, or the viewer can decide on the timing of the assessment if required. Environmental data and viewing conditions (e.g. illumination, vibrations, GPS coordinates, etc.) may be collected and subsequently analyzed.
NASA Technical Reports Server (NTRS)
Lulla, Kamlesh
1994-01-01
There have been many significant improvements in the public access to the Space Shuttle Earth Observations Photography Database. New information is provided for the user community on the recently released videodisc of this database. Topics covered included the following: earlier attempts; our first laser videodisc in 1992; the new laser videodisc in 1994; and electronic database access.
A Brief Review of RNA–Protein Interaction Database Resources
Yi, Ying; Zhao, Yue; Huang, Yan; Wang, Dong
2017-01-01
RNA–Protein interactions play critical roles in various biological processes. By collecting and analyzing RNA–Protein interactions and binding sites from experiments and predictions, RNA–Protein interaction databases have become an essential resource for exploring the transcriptional and post-transcriptional regulatory network. Here, we briefly review several widely used RNA–Protein interaction database resources developed in recent years to provide a guide to these databases. The content and major functions of each database are presented. The brief descriptions help users quickly choose the database containing the information they are interested in. In short, these RNA–Protein interaction database resources are continually updated, and their current state reflects ongoing efforts to identify and analyze the large number of RNA–Protein interactions. PMID:29657278
Remote visual analysis of large turbulence databases at multiple scales
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pulido, Jesus; Livescu, Daniel; Kanov, Kalin
The remote analysis and visualization of raw large turbulence datasets is challenging. Current accurate direct numerical simulations (DNS) of turbulent flows generate datasets with billions of points per time-step and several thousand time-steps per simulation. Until recently, the analysis and visualization of such datasets was restricted to scientists with access to large supercomputers. The public Johns Hopkins Turbulence database simplifies access to multi-terabyte turbulence datasets and facilitates the computation of statistics and extraction of features through the use of commodity hardware. In this paper, we present a framework designed around wavelet-based compression for high-speed visualization of large datasets and methods supporting multi-resolution analysis of turbulence. By integrating common technologies, this framework enables remote access to tools available on supercomputers and over 230 terabytes of DNS data over the Web. Finally, the database toolset is expanded by providing access to exploratory data analysis tools, such as wavelet decomposition capabilities and coherent feature extraction.
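A toy version of the wavelet-compression step is easy to write with PyWavelets; the thresholding policy below is illustrative only, not the paper's actual coder.

```python
# Sketch: multi-resolution wavelet compression of a turbulence field.
import numpy as np
import pywt

field = np.random.rand(64, 64, 64)             # stand-in velocity field
coeffs = pywt.wavedecn(field, "db4", level=3)  # multi-resolution transform
arr, slices = pywt.coeffs_to_array(coeffs)

thresh = np.quantile(np.abs(arr), 0.95)        # keep largest 5% of coeffs
arr[np.abs(arr) < thresh] = 0.0                # sparsify for compression

recon = pywt.waverecn(
    pywt.array_to_coeffs(arr, slices, output_format="wavedecn"), "db4")
recon = recon[tuple(slice(s) for s in field.shape)]  # trim edge padding
print("relative error:",
      np.linalg.norm(recon - field) / np.linalg.norm(field))
```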
Testing AGN unification via inference from large catalogs
NASA Astrophysics Data System (ADS)
Nikutta, Robert; Ivezic, Zeljko; Elitzur, Moshe; Nenkova, Maia
2018-01-01
Source orientation and clumpiness of the central dust are the main factors in AGN classification. Type-1 QSOs are easy to observe, and large samples are available (e.g. in SDSS), but obscured type-2 AGN are dimmer and redder as our line of sight is more obscured, making it difficult to obtain a complete sample. WISE has found up to a million QSOs. With only 4 bands and a relatively small aperture, the analysis of individual sources is challenging, but the large sample allows inference of bulk properties at a very significant level.CLUMPY (www.clumpy.org) is arguably the most popular database of AGN torus SEDs. We model the ensemble properties of the entire WISE AGN content using regularized linear regression, with orientation-dependent CLUMPY color-color-magnitude (CCM) tracks as basis functions. We can reproduce the observed number counts per CCM bin with percent-level accuracy, and simultaneously infer the probability distributions of all torus parameters, redshifts, and additional SED components, and identify type-1/2 AGN populations through their IR properties alone. We increase the statistical power of our AGN unification tests even further by adding other datasets as axes in the regression problem. To this end, we make use of the NOAO Data Lab (datalab.noao.edu), which hosts several large high-level datasets and provides very powerful tools for handling large data, e.g. cross-matched catalogs and fast remote queries.
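The regression setup can be illustrated in a few lines: observed counts per CCM bin are fit as a regularized linear combination of model tracks. The sketch below uses synthetic arrays and plain Tikhonov-regularized least squares, a simplification of the actual analysis.

```python
# Sketch: fit number counts per CCM bin as a regularized combination
# of model tracks (all arrays here are synthetic placeholders).
import numpy as np

rng = np.random.default_rng(1)
n_bins, n_models = 1000, 50
basis = rng.random((n_bins, n_models))   # CCM histogram of each model
counts = basis @ rng.random(n_models)    # stand-in observed counts

lam = 1.0                                # regularization strength
A = np.vstack([basis, np.sqrt(lam) * np.eye(n_models)])
b = np.concatenate([counts, np.zeros(n_models)])
weights, *_ = np.linalg.lstsq(A, b, rcond=None)  # ridge via augmentation

resid = np.linalg.norm(basis @ weights - counts) / np.linalg.norm(counts)
print(f"relative residual: {resid:.3e}")
```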
Daniell, Nathan; Fraysse, François; Paul, Gunther
2012-01-01
Anthropometry has long been used for a range of ergonomic applications and product design. Although products are often designed for specific cohorts, anthropometric data are typically sourced from large-scale surveys representative of the general population. Additionally, few data are available for emerging markets like China and India. This study measured 80 Chinese males who were representative of a specific cohort targeted for the design of a new product. Thirteen anthropometric measurements were recorded and compared to two large databases that represent general populations: a Chinese database and a Western database. Substantial differences were identified between the Chinese males measured in this study and both databases. The subjects were substantially taller, heavier, and broader than subjects in the older Chinese database, yet still substantially smaller, lighter, and thinner than Western males. Data from current Western anthropometric surveys are therefore unlikely to accurately represent the target population for product designers and manufacturers in emerging markets like China.
Mars Global Digital Dune Database: MC2-MC29
Hayward, Rosalyn K.; Mullins, Kevin F.; Fenton, L.K.; Hare, T.M.; Titus, T.N.; Bourke, M.C.; Colaprete, Anthony; Christensen, P.R.
2007-01-01
Introduction: The Mars Global Digital Dune Database presents data and describes the methodology used in creating the database. The database provides a comprehensive and quantitative view of the geographic distribution of moderate- to large-size dune fields from 65° N to 65° S latitude and encompasses ~550 dune fields. The database will be expanded to cover the entire planet in later versions. Although we have attempted to include all dune fields between 65° N and 65° S, some have likely been excluded for two reasons: 1) incomplete THEMIS IR (daytime) coverage may have caused us to exclude some moderate- to large-size dune fields, or 2) the resolution of THEMIS IR coverage (100 m/pixel) certainly caused us to exclude smaller dune fields. The smallest dune fields in the database are ~1 km² in area. While the moderate to large dune fields are likely to constitute the largest compilation of sediment on the planet, smaller stores of dune sediment are likely to be found elsewhere via higher-resolution data. Thus, it should be noted that our database excludes all small dune fields and some moderate to large dune fields as well. Therefore the absence of mapped dune fields does not mean that such dune fields do not exist and is not intended to imply a lack of saltating sand in other areas. Where availability and quality of THEMIS visible (VIS) or Mars Orbiter Camera narrow angle (MOC NA) images allowed, we classified dunes and included dune slipface measurements, which were derived from gross dune morphology and represent the prevailing wind direction at the last time of significant dune modification. For dunes located within craters, the azimuth from crater centroid to dune field centroid was calculated. Output from a general circulation model (GCM) is also included. In addition to polygons locating dune fields, the database includes over 1800 selected Thermal Emission Imaging System (THEMIS) infrared (IR), THEMIS visible (VIS) and Mars Orbiter Camera Narrow Angle (MOC NA) images that were used to build the database. The database is presented in a variety of formats. It is presented as a series of ArcReader projects which can be opened using the free ArcReader software. The latest version of ArcReader can be downloaded at http://www.esri.com/software/arcgis/arcreader/download.html. The database is also presented in ArcMap projects. The ArcMap projects allow fuller use of the data, but require ESRI ArcMap software. Multiple projects were required to accommodate the large number of images needed. A fuller description of the projects can be found in the Dunes_ReadMe file and the ReadMe_GIS file in the Documentation folder. For users who prefer to create their own projects, the data is available in ESRI shapefile and geodatabase formats, as well as the open Geography Markup Language (GML) format. A printable map of the dunes and craters in the database is available as a Portable Document Format (PDF) document. The map is also included as a JPEG file. ReadMe files are available in PDF and ASCII (.txt) files. Tables are available in both Excel (.xls) and ASCII formats.
Varrone, Andrea; Dickson, John C; Tossici-Bolt, Livia; Sera, Terez; Asenbaum, Susanne; Booij, Jan; Kapucu, Ozlem L; Kluge, Andreas; Knudsen, Gitte M; Koulibaly, Pierre Malick; Nobili, Flavio; Pagani, Marco; Sabri, Osama; Vander Borght, Thierry; Van Laere, Koen; Tatsch, Klaus
2013-01-01
Dopamine transporter (DAT) imaging with [(123)I]FP-CIT (DaTSCAN) is an established diagnostic tool in parkinsonism and dementia. Although qualitative assessment criteria are available, DAT quantification is important for research and for completion of a diagnostic evaluation. One critical aspect of quantification is the availability of normative data, considering possible age and gender effects on DAT availability. The aim of the European Normal Control Database of DaTSCAN (ENC-DAT) study was to generate a large database of [(123)I]FP-CIT SPECT scans in healthy controls. SPECT data from 139 healthy controls (74 men, 65 women; age range 20-83 years, mean 53 years) acquired in 13 different centres were included. Images were reconstructed using the ordered-subset expectation-maximization algorithm without correction (NOACSC), with attenuation correction (AC), and with both attenuation and scatter correction using the triple-energy window method (ACSC). Region-of-interest analysis was performed using the BRASS software (caudate and putamen), and the Southampton method (striatum). The outcome measure was the specific binding ratio (SBR). A significant effect of age on SBR was found for all data. Gender had a significant effect on SBR in the caudate and putamen for the NOACSC and AC data, and only in the left caudate for the ACSC data (BRASS method). Significant effects of age and gender on striatal SBR were observed for all data analysed with the Southampton method. Overall, there was a significant age-related decline in SBR of between 4 % and 6.7 % per decade. This study provides a large database of [(123)I]FP-CIT SPECT scans in healthy controls across a wide age range and with balanced gender representation. Higher DAT availability was found in women than in men. An average age-related decline in DAT availability of 5.5 % per decade was found for both genders, in agreement with previous reports. The data collected in this study may serve as a reference database for nuclear medicine centres and for clinical trials using [(123)I]FP-CIT SPECT as the imaging marker.
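For reference, the specific binding ratio used as the outcome measure above is conventionally computed from mean counts in a target region and a nonspecific reference region; the exact region definitions differ between the BRASS and Southampton methods, so the following is only the generic form:

```latex
\[
\mathrm{SBR} \;=\; \frac{C_{\mathrm{striatal\ ROI}} - C_{\mathrm{reference}}}{C_{\mathrm{reference}}}
\]
```

Here \(C_{\mathrm{striatal\ ROI}}\) is the mean count density in the caudate, putamen, or whole striatum, and \(C_{\mathrm{reference}}\) the mean count density in a region assumed free of specific binding.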
Durham, Erin-Elizabeth A; Yu, Xiaxia; Harrison, Robert W
2014-12-01
Effective machine-learning handles large datasets efficiently. One key feature of handling large data is the use of databases such as MySQL. The freeware fuzzy decision tree induction tool, FDT, is a scalable supervised-classification software tool implementing fuzzy decision trees. It is based on an optimized fuzzy ID3 (FID3) algorithm. FDT 2.0 improves upon FDT 1.0 by bridging the gap between data science and data engineering: it combines a robust decisioning tool with data retention for future decisions, so that the tool does not need to be recalibrated from scratch every time a new decision is required. In this paper we briefly review the analytical capabilities of the freeware FDT tool and its major features and functionalities; examples of large biological datasets from HIV, microRNAs and sRNAs are included. This work shows how to integrate fuzzy decision algorithms with modern database technology. In addition, we show that integrating the fuzzy decision tree induction tool with database storage allows for optimal user satisfaction in today's Data Analytics world.
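A toy sketch of the fuzzy-entropy computation at the heart of a fuzzy ID3-style algorithm is shown below; the membership values and labels are illustrative, not FDT's actual internals.

```python
# Toy fuzzy entropy: in fuzzy ID3, each example carries a membership degree
# in a node rather than a hard count, so class proportions are weighted sums.
import numpy as np

def fuzzy_entropy(memberships, labels):
    """Entropy of a node where each example has a membership weight in [0, 1]."""
    total = memberships.sum()
    ent = 0.0
    for c in np.unique(labels):
        p = memberships[labels == c].sum() / total   # fuzzy class proportion
        if p > 0:
            ent -= p * np.log2(p)
    return ent

mu = np.array([0.9, 0.7, 0.2, 1.0, 0.4])   # memberships of 5 examples in this node
y = np.array([0, 0, 1, 1, 1])              # class labels
print(fuzzy_entropy(mu, y))
```

Persisting such node statistics in a database, as FDT 2.0 does, is what lets the tree be extended with new data instead of being recalibrated from scratch.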
Digital geomorphological landslide hazard mapping of the Alpago area, Italy
NASA Astrophysics Data System (ADS)
van Westen, Cees J.; Soeters, Rob; Sijmons, Koert
Large-scale geomorphological maps of mountainous areas are traditionally made using complex symbol-based legends. They can serve as excellent "geomorphological databases", from which an experienced geomorphologist can extract a large amount of information for hazard mapping. However, these maps are not designed to be used in combination with a GIS, due to their complex cartographic structure. In this paper, two methods are presented for digital geomorphological mapping at large scales using GIS and digital cartographic software. The methods are applied to an area with a complex geomorphological setting in the Borsoia catchment, located in the Alpago region near Belluno in the Italian Alps. The GIS database set-up is presented, with an overview of the data layers that have been generated and how they are interrelated. The GIS database was also converted into a paper map using a digital cartographic package. The resulting large-scale geomorphological hazard map is attached. The resulting GIS database and cartographic product can be used to analyse the hazard type and hazard degree for each polygon, and to find the reasons for the hazard classification.
HC Forum®: a web site based on an international human cytogenetic database
Cohen, Olivier; Mermet, Marie-Ange; Demongeot, Jacques
2001-01-01
Familial structural rearrangements of chromosomes represent a factor of malformation risk that can vary over a large range, making genetic counseling difficult. However, they also represent a powerful tool for increasing knowledge of the genome, particularly by studying breakpoints and viable imbalances of the genome. We have developed a collaborative database that now includes data on more than 4100 families, from which we have developed a web site called HC Forum® (http://HCForum.imag.fr). It offers geneticists assistance in diagnosis and in genetic counseling by assessing the malformation risk with statistical models. For researchers, interactive interfaces display the distribution of chromosomal breakpoints and of the genome regions observed at birth in trisomy or in monosomy. Dedicated tools, including an interactive pedigree, allow electronic submission of data, which is anonymously shown in a forum for discussion. After validation, data are definitively registered in the database with the email of the sender, allowing direct location of biological material. HC Forum® thus constitutes a link between diagnostic laboratories and genome research centers; after 1 year, it already has more than 700 users from about 40 different countries. PMID:11125121
Sandfort, Veit; Johnson, Alistair E W; Kunz, Lauren M; Vargas, Jose D; Rosing, Douglas R
2018-01-01
We sought to evaluate the association of prolonged elevated heart rate (peHR) with survival in acutely ill patients. We used a large observational intensive care unit (ICU) database (Multiparameter Intelligent Monitoring in Intensive Care III [MIMIC-III]), where frequent heart rate measurements were available. The peHR was defined as a heart rate >100 beats/min in 11 of 12 consecutive hours. The outcome was survival status at 90 days. We collected heart rates, disease severity (simplified acute physiology scores [SAPS II]), comorbidities (Charlson scores), and International Classification of Diseases (ICD) diagnosis information in 31,513 patients from the MIMIC-III ICU database. Propensity score (PS) methods followed by inverse probability weighting based on the PS were used to balance the 2 groups (the presence/absence of peHR). Multivariable weighted logistic regression was used to assess the association of peHR with survival at 90 days, adjusting for additional covariates. The mean age was 64 years, and the most frequent main disease category was circulatory disease (41%). The mean SAPS II score was 35, and the mean Charlson comorbidity score was 2.3. Overall survival of the cohort at 90 days was 82%. Adjusted logistic regression showed a significantly increased risk of death within 90 days in patients with an episode of peHR (P < .001; odds ratio for death 1.79; confidence interval, 1.69-1.88). This finding was independent of median heart rate. We found a significant association of peHR with decreased survival in a large and heterogeneous cohort of ICU patients.
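The peHR criterion is easy to operationalize on an hourly series. A minimal sketch, reading the definition as "more than 100 beats/min in at least 11 of any 12 consecutive hourly measurements" (an interpretation assumed here), with illustrative data:

```python
# Sketch of the peHR flag: HR >100 beats/min in >=11 of 12 consecutive hours.
import numpy as np
import pandas as pd

hr = pd.Series(np.random.randint(80, 130, size=72))   # hourly HR for one ICU stay

tachy = (hr > 100).astype(int)                        # 1 where tachycardic
pehr_episode = tachy.rolling(window=12).sum() >= 11   # any 12-hour window
print(bool(pehr_episode.any()))                       # patient-level peHR flag
```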
Fermi-LAT Gamma-ray Bursts and Insight from Swift
NASA Technical Reports Server (NTRS)
Racusin, Judith L.
2011-01-01
A new revolution in GRB observation and theory has begun over the last 3 years since the launch of the Fermi Gamma-ray Space Telescope. The new window into high-energy gamma-rays opened by the Fermi-LAT is providing insight into prompt emission mechanisms and possibly also afterglow physics. The LAT-detected GRBs appear to be a new, unique subset of extremely energetic and bright bursts. In this talk I will discuss the context and recent discoveries from these LAT GRBs and the large database of broadband observations collected by Swift over the last 7 years, and how, through comparisons between the Swift, GBM, and LAT GRB samples, we can learn about the unique characteristics of each population and the relationships between them.
Spin Observables in η Meson Photoproduction on the Proton
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tucker, Ross
2016-05-01
A series of experiments using a polarized beam incident on a polarized frozen spin target (FROST) was conducted at Jefferson Lab in 2010. Results presented here were taken during the second running period with the FROST target using the CEBAF Large Acceptance Spectrometer (CLAS) detector at Jefferson Lab, which used transversely-polarized protons in a butanol target and a circularly-polarized incident tagged photon beam with energies between 0.62 and 2.93 GeV. Data are presented for the F and T polarization observables for η meson photoproduction on the proton from W = 1.55 GeV to 1.80 GeV. The data presented here will improve the world database and refine theoretical approaches to nucleon structure.
The Footprint Database and Web Services of the Herschel Space Observatory
NASA Astrophysics Data System (ADS)
Dobos, László; Varga-Verebélyi, Erika; Verdugo, Eva; Teyssier, David; Exter, Katrina; Valtchanov, Ivan; Budavári, Tamás; Kiss, Csaba
2016-10-01
Data from the Herschel Space Observatory is freely available to the public but no uniformly processed catalogue of the observations has been published so far. To date, the Herschel Science Archive does not contain the exact sky coverage (footprint) of individual observations and supports search for measurements based on bounding circles only. Drawing on previous experience in implementing footprint databases, we built the Herschel Footprint Database and Web Services for the Herschel Space Observatory to provide efficient search capabilities for typical astronomical queries. The database was designed with the following main goals in mind: (a) provide a unified data model for meta-data of all instruments and observational modes, (b) quickly find observations covering a selected object and its neighbourhood, (c) quickly find every observation in a larger area of the sky, (d) allow for finding solar system objects crossing observation fields. As a first step, we developed a unified data model of observations of all three Herschel instruments for all pointing and instrument modes. Then, using telescope pointing information and observational meta-data, we compiled a database of footprints. As opposed to methods using pixellation of the sphere, we represent sky coverage in an exact geometric form allowing for precise area calculations. For easier handling of Herschel observation footprints with rather complex shapes, two algorithms were implemented to reduce the outline. Furthermore, a new visualisation tool to plot footprints with various spherical projections was developed. Indexing of the footprints using Hierarchical Triangular Mesh makes it possible to quickly find observations based on sky coverage, time and meta-data. The database is accessible via a web site http://herschel.vo.elte.hu and also as a set of REST web service functions, which makes it readily usable from programming environments such as Python or IDL. The web service allows downloading footprint data in various formats including Virtual Observatory standards.
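Since the service is advertised as usable from Python, a query might look like the sketch below. The endpoint path, parameter names, and units are invented for illustration; only the base URL comes from the text, so consult the site's documentation for the real interface.

```python
# Hypothetical call to the footprint web service. Everything after the base
# URL (path, parameters, units) is an assumption, not the documented API.
import requests

base = 'http://herschel.vo.elte.hu'
params = {'ra': 83.822, 'dec': -5.391, 'radius': 0.5}   # degrees (assumed)
resp = requests.get(base + '/Search/Footprint',          # hypothetical path
                    params=params, timeout=30)
resp.raise_for_status()
print(resp.text[:200])                                   # inspect the payload
```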
Early ICU Standardized Rehabilitation Therapy for the Critically Injured Burn Patient
2017-10-01
The original study was deemed phase I and closed. The second phase proposed to examine medical records within a large national hospital database to identify optimal care delivery patterns.
Knowledge Quality Functions for Rule Discovery
1994-09-01
Managers in many organizations finding themselves in the possession of large and rapidly growing databases are beginning to suspect the information in their...missing values (Smyth and Goodman, 1992, p. 303). Decision trees "tend to grow very large for realistic applications and are thus difficult to interpret...by humans" (Holsheimer, 1994, p. 42). Decision trees also grow excessively complicated in the presence of noisy databases (Dhar and Tuzhilin, 1993, p
Leveraging Cognitive Context for Object Recognition
2014-06-01
Context is most often viewed as a static concept, learned from large image databases. We build upon this concept by exploring cognitive context, demonstrating how rich dynamic context provided by...context that people rely upon as they perceive the world. Context in ACT-R/E takes the form of associations between related concepts that are learned...and accuracy of object recognition.
Reference Material Kydex®-100 Test Data Message for Flammability Testing
NASA Technical Reports Server (NTRS)
Engel, Carl D.; Richardson, Erin; Davis, Eddie
2003-01-01
The Marshall Space Flight Center (MSFC) Materials and Processes Technical Information System (MAPTIS) database contains, as an engineering resource, a large amount of material test data carefully obtained and recorded over a number of years. Flammability test data obtained using Test 1 of NASA-STD-6001 is a significant component of this database. NASA-STD-6001 recommends that Kydex 100 be used as a reference material for testing certification and for comparison between test facilities in the round-robin certification testing that occurs every 2 years. As a result of these regular activities, a large volume of test data is recorded within the MAPTIS database. The activity described in this technical report was undertaken to mine the database, recover flammability (Test 1) Kydex 100 data, and review the lessons learned from analysis of these data.
An ab initio electronic transport database for inorganic materials.
Ricci, Francesco; Chen, Wei; Aydemir, Umut; Snyder, G Jeffrey; Rignanese, Gian-Marco; Jain, Anubhav; Hautier, Geoffroy
2017-07-04
Electronic transport in materials is governed by a series of tensorial properties such as conductivity, Seebeck coefficient, and effective mass. These quantities are paramount to the understanding of materials in many fields from thermoelectrics to electronics and photovoltaics. Transport properties can be calculated from a material's band structure using the Boltzmann transport theory framework. We present here the largest computational database of electronic transport properties based on a large set of 48,000 materials originating from the Materials Project database. Our results were obtained through the interpolation approach developed in the BoltzTraP software, assuming a constant relaxation time. We present the workflow to generate the data, the data validation procedure, and the database structure. Our aim is to target the large community of scientists developing materials selection strategies and performing studies involving transport properties.
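For context, the constant-relaxation-time quantities tabulated in such a database follow the standard Boltzmann transport expressions (notation as in the BoltzTraP literature; τ is the assumed-constant relaxation time, Ω the cell volume, N the number of k-points):

```latex
\[
\sigma_{\alpha\beta}(T,\mu) = \frac{1}{\Omega}\int \bar{\sigma}_{\alpha\beta}(\varepsilon)
\left[-\frac{\partial f_{\mu}(T,\varepsilon)}{\partial \varepsilon}\right]\mathrm{d}\varepsilon,
\qquad
\bar{\sigma}_{\alpha\beta}(\varepsilon) = \frac{e^{2}\tau}{N}\sum_{i,\mathbf{k}}
v_{\alpha}(i,\mathbf{k})\,v_{\beta}(i,\mathbf{k})\,\delta(\varepsilon-\varepsilon_{i,\mathbf{k}}),
\]
\[
S = \sigma^{-1}\nu,
\qquad
\nu_{\alpha\beta}(T,\mu) = \frac{1}{eT\Omega}\int \bar{\sigma}_{\alpha\beta}(\varepsilon)\,
(\varepsilon-\mu)\left[-\frac{\partial f_{\mu}(T,\varepsilon)}{\partial \varepsilon}\right]\mathrm{d}\varepsilon.
\]
```

Here \(v_{\alpha}(i,\mathbf{k})\) are band velocities from the interpolated band structure, so conductivity, Seebeck coefficient, and related tensors all derive from a single transport distribution \(\bar{\sigma}_{\alpha\beta}(\varepsilon)\).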
Architectural Implications for Spatial Object Association Algorithms*
Kumar, Vijay S.; Kurc, Tahsin; Saltz, Joel; Abdulla, Ghaleb; Kohn, Scott R.; Matarazzo, Celeste
2013-01-01
Spatial object association, also referred to as crossmatch of spatial datasets, is the problem of identifying and comparing objects in two or more datasets based on their positions in a common spatial coordinate system. In this work, we evaluate two crossmatch algorithms that are used for astronomical sky surveys, on the following database system architecture configurations: (1) Netezza Performance Server®, a parallel database system with active disk style processing capabilities, (2) MySQL Cluster, a high-throughput network database system, and (3) a hybrid configuration consisting of a collection of independent database system instances with data replication support. Our evaluation provides insights about how architectural characteristics of these systems affect the performance of the spatial crossmatch algorithms. We conducted our study using real use-case scenarios borrowed from a large-scale astronomy application known as the Large Synoptic Survey Telescope (LSST). PMID:25692244
Arntzen, Magnus Ø.; Thiede, Bernd
2012-01-01
Apoptosis is the most commonly described form of programmed cell death, and dysfunction is implicated in a large number of human diseases. Many quantitative proteome analyses of apoptosis have been performed to gain insight in proteins involved in the process. This resulted in large and complex data sets that are difficult to evaluate. Therefore, we developed the ApoptoProteomics database for storage, browsing, and analysis of the outcome of large scale proteome analyses of apoptosis derived from human, mouse, and rat. The proteomics data of 52 publications were integrated and unified with protein annotations from UniProt-KB, the caspase substrate database homepage (CASBAH), and gene ontology. Currently, more than 2300 records of more than 1500 unique proteins were included, covering a large proportion of the core signaling pathways of apoptosis. Analysis of the data set revealed a high level of agreement between the reported changes in directionality reported in proteomics studies and expected apoptosis-related function and may disclose proteins without a current recognized involvement in apoptosis based on gene ontology. Comparison between induction of apoptosis by the intrinsic and the extrinsic apoptotic signaling pathway revealed slight differences. Furthermore, proteomics has significantly contributed to the field of apoptosis in identifying hundreds of caspase substrates. The database is available at http://apoptoproteomics.uio.no. PMID:22067098
SING: Subgraph search In Non-homogeneous Graphs
2010-01-01
Background: Finding the subgraphs of a graph database that are isomorphic to a given query graph has practical applications in several fields, from cheminformatics to image understanding. Since subgraph isomorphism is a computationally hard problem, indexing techniques have been intensively exploited to speed up the process. Such systems filter out those graphs which cannot contain the query, and apply a subgraph isomorphism algorithm to each residual candidate graph. The applicability of such systems is limited to databases of small graphs, because their filtering power degrades on large graphs. Results: In this paper, SING (Subgraph search In Non-homogeneous Graphs), a novel indexing system able to cope with large graphs, is presented. The method uses the notion of feature, which can be a small subgraph, subtree or path. Each graph in the database is annotated with the set of all its features. The key point is to make use of feature locality information. This idea is used to both improve the filtering performance and speed up the subgraph isomorphism task. Conclusions: Extensive tests on chemical compounds, biological networks and synthetic graphs show that the proposed system outperforms the most popular systems in query time over databases of medium and large graphs. Other specific tests show that the proposed system is effective for single large graphs. PMID:20170516
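The filter-and-verify pattern described above can be sketched compactly. The features below (node labels and labeled edges) are far simpler than SING's paths and subtrees, so this is a schematic of the approach, not of SING itself:

```python
# Filter-and-verify subgraph search: cheap feature-set containment test first,
# exact (node-induced) subgraph isomorphism on the survivors.
import networkx as nx
from networkx.algorithms import isomorphism

def features(g):
    """Toy feature set: node labels plus sorted label pairs of edges."""
    nodes = {g.nodes[n]['label'] for n in g}
    edges = {tuple(sorted((g.nodes[u]['label'], g.nodes[v]['label'])))
             for u, v in g.edges}
    return nodes | edges

def search(database, query):
    qf = features(query)
    hits = []
    for g in database:
        if not qf <= features(g):            # filter: query features must all appear
            continue
        gm = isomorphism.GraphMatcher(       # verify: exact matching
            g, query,
            node_match=lambda a, b: a['label'] == b['label'])
        if gm.subgraph_is_isomorphic():
            hits.append(g)
    return hits
```

The filter is sound (a true match always passes) but not complete, which is exactly why the residual candidates still need the expensive verification step.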
Patterns of Undergraduates' Use of Scholarly Databases in a Large Research University
ERIC Educational Resources Information Center
Mbabu, Loyd Gitari; Bertram, Albert; Varnum, Ken
2013-01-01
Authentication data was utilized to explore undergraduate usage of subscription electronic databases. These usage patterns were linked to the information literacy curriculum of the library. The data showed that out of the 26,208 enrolled undergraduate students, 42% of them accessed a scholarly database at least once in the course of the entire…
The Chandra Source Catalog : Automated Source Correlation
NASA Astrophysics Data System (ADS)
Hain, Roger; Evans, I. N.; Evans, J. D.; Glotfelty, K. J.; Anderson, C. S.; Bonaventura, N. R.; Chen, J. C.; Davis, J. E.; Doe, S. M.; Fabbiano, G.; Galle, E.; Gibbs, D. G.; Grier, J. D.; Hall, D. M.; Harbo, P. N.; He, X.; Houck, J. C.; Karovska, M.; Lauer, J.; McCollough, M. L.; McDowell, J. C.; Miller, J. B.; Mitschang, A. W.; Morgan, D. L.; Nichols, J. S.; Nowak, M. A.; Plummer, D. A.; Primini, F. A.; Refsdal, B. L.; Rots, A. H.; Siemiginowska, A. L.; Sundheim, B. A.; Tibbetts, M. S.; Van Stone, D. W.; Winkelman, S. L.; Zografou, P.
2009-01-01
Chandra Source Catalog (CSC) master source pipeline processing seeks to automatically detect sources and compute their properties. Since Chandra is a pointed mission and not a sky survey, different sky regions are observed for a different number of times at varying orientations, resolutions, and other heterogeneous conditions. While this provides an opportunity to collect data from a potentially large number of observing passes, it also creates challenges in determining the best way to combine different detection results for the most accurate characterization of the detected sources. The CSC master source pipeline correlates data from multiple observations by updating existing cataloged source information with new data from the same sky region as they become available. This process sometimes leads to relatively straightforward conclusions, such as when single sources from two observations are similar in size and position. Other observation results require more logic to combine, such as one observation finding a single, large source and another identifying multiple, smaller sources at the same position. We present examples of different overlapping source detections processed in the current version of the CSC master source pipeline. We explain how they are resolved into entries in the master source database, and examine the challenges of computing source properties for the same source detected multiple times. Future enhancements are also discussed. This work is supported by NASA contract NAS8-03060 (CXC).
Macy, Michelle L; Stanley, Rachel M; Sasson, Comilla; Gebremariam, Achamyeleh; Davis, Matthew M
2010-09-01
Pediatric observation units provide an alternative to traditional hospitalization. The extent to which observation units could replace inpatient care for asthmatic children is unknown. The objective was to describe brief inpatient ("high-turnover," HTO) stays for US children hospitalized with a principal discharge diagnosis of asthma, to characterize cases that may be appropriate for observation. We analyzed the 2006 Kids' Inpatient Database, a nationally representative sample of hospital discharges, considering discharges among children aged 2 to 20 years with a principal discharge diagnosis of asthma; the main outcome measures were HTO stays and total charges. HTO stays were defined as hospitalizations of 0 or 1 night in duration. We conducted descriptive statistics and case-mix adjusted, sample-weighted regression analysis of HTO stays and associated hospital charges. Overall, 34,592 (34%) pediatric asthma hospitalizations were HTO, accounting for 66,278 hospital days in 2006. HTO stays were associated with younger age, uncomplicated asthma, and private insurance. Freestanding children's hospitals had the highest proportion of HTO stays, 38% (95% CI: 34%-42%), compared with 32% (95% CI: 28%-36%) for children's units and 33% (95% CI: 31%-34%) for general hospitals. In multivariate regression analyses, charges were significantly higher across hospital types when HTO stays began in the emergency department. The presence of a large number of HTO stays for children hospitalized for asthma suggests the need to explore opportunities to restructure care for this condition, perhaps through the development of physically or operationally distinct observation units.
Constraints on global oceanic emissions of N2O from observations and models
NASA Astrophysics Data System (ADS)
Buitenhuis, Erik T.; Suntharalingam, Parvadha; Le Quéré, Corinne
2018-04-01
We estimate the global ocean N2O flux to the atmosphere and its confidence interval using a statistical method based on model perturbation simulations and their fit to a database of ΔpN2O (n = 6136). We evaluate two submodels of N2O production. The first submodel splits N2O production into oxic and hypoxic pathways following previous publications. The second submodel explicitly represents the redox transformations of N that lead to N2O production (nitrification and hypoxic denitrification) and N2O consumption (suboxic denitrification), and is presented here for the first time. We perturb both submodels by modifying the key parameters of the N2O cycling pathways (nitrification rates; NH4+ uptake; N2O yields under oxic, hypoxic and suboxic conditions) and determine a set of optimal model parameters by minimisation of a cost function against four databases of N cycle observations. Our estimate of the global oceanic N2O flux resulting from this cost function minimisation derived from observed and model ΔpN2O concentrations is 2.4 ± 0.8 and 2.5 ± 0.8 Tg N yr⁻¹ for the two N2O submodels. These estimates suggest that the currently available observational data of surface ΔpN2O constrain the global N2O flux to a narrower range relative to the large range of results presented in the latest IPCC report.
NASA Astrophysics Data System (ADS)
Herper, H. C.; Ahmed, T.; Wills, J. M.; Di Marco, I.; Björkman, T.; Iuşan, D.; Balatsky, A. V.; Eriksson, O.
2017-08-01
Recent progress in materials informatics has opened up the possibility of a new approach to accessing properties of materials, in which one assays the aggregate properties of a large set of materials within the same class in addition to a detailed investigation of each compound in that class. Here we present a large-scale investigation of electronic properties and correlated magnetism in Ce-based compounds, systematically studying the electronic structure and 4f-hybridization function of a large body of Ce compounds with the goal of elucidating the nature of the 4f states and their interrelation with the measured Kondo energy in these compounds. The hybridization function has been analyzed for more than 350 data sets (part of the IMS database) of cubic Ce compounds using electronic structure theory that relies on a full-potential approach. We demonstrate that the strength of the hybridization function, evaluated in this way, allows us to draw precise conclusions about the degree of localization of the 4f states in these compounds. The theoretical results are entirely consistent with all experimental information relevant to the degree of 4f localization for all investigated materials. Furthermore, a more detailed analysis of the electronic structure and the hybridization function allows us to make precise statements about Kondo correlations in these systems. The calculated hybridization functions, together with the corresponding density of states, reproduce the expected exponential behavior of the observed Kondo temperatures and prove a consistent trend in real materials. This trend allows us to predict which systems may be correctly identified as Kondo systems. A strong anticorrelation between the size of the hybridization function and the volume of the systems has been observed. The information entropy for this set of systems is about 0.42. Our approach demonstrates the predictive power of materials informatics when a large number of materials is used to establish significant trends. This predictive power can be used to design new materials with desired properties. The applicability of this approach to other correlated electron systems is discussed.
Integrated Primary Care Information Database (IPCI)
The Integrated Primary Care Information Database is a longitudinal observational database that was created specifically for pharmacoepidemiological and pharmacoeconomic studies, including data from computer-based patient records supplied voluntarily by general practitioners.
The database design of LAMOST based on MYSQL/LINUX
NASA Astrophysics Data System (ADS)
Li, Hui-Xian; Sang, Jian; Wang, Sha; Luo, A.-Li
2006-03-01
The Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) will be set up in the coming years. A fully automated software system for reducing and analyzing the spectra has to be developed along with the telescope, and the database system is an important part of it. This paper describes the requirements for the LAMOST database, the design of the database system based on MySQL/Linux, and performance tests of this system.
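A spectra-metadata table of the kind such a system needs might look like the sketch below. The schema, table, and column names are assumptions for illustration, and sqlite3 is used here only as a self-contained stand-in for the MySQL server the paper describes:

```python
# Illustrative spectra-metadata schema; names are invented, sqlite3 stands in
# for MySQL so the sketch runs without a database server.
import sqlite3

con = sqlite3.connect(':memory:')
con.execute("""
    CREATE TABLE spectrum (
        spec_id   INTEGER PRIMARY KEY,
        ra_deg    REAL NOT NULL,   -- right ascension (degrees)
        dec_deg   REAL NOT NULL,   -- declination (degrees)
        mjd       INTEGER,         -- observation date
        fiber_id  INTEGER,         -- fiber on the focal plane
        filepath  TEXT             -- location of the FITS file
    )""")
con.execute("CREATE INDEX idx_pos ON spectrum (ra_deg, dec_deg)")
con.execute("INSERT INTO spectrum VALUES (1, 180.0, 30.0, 53000, 42, '/data/s1.fits')")
rows = con.execute(
    "SELECT spec_id FROM spectrum WHERE ra_deg BETWEEN 179 AND 181").fetchall()
print(rows)
```

Indexing on position is the design choice that keeps cone-style searches fast as the number of stored spectra grows.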
Building a Database for a Quantitative Model
NASA Technical Reports Server (NTRS)
Kahn, C. Joseph; Kleinhammer, Roger
2014-01-01
A database can greatly benefit a quantitative analysis. The defining characteristic of a quantitative risk, or reliability, model is the use of failure estimate data. Models can easily contain a thousand Basic Events, relying on hundreds of individual data sources. Obviously, entering so much data by hand will eventually lead to errors. Less obviously, entering data this way does not aid in linking the Basic Events to the data sources. The best way to organize large amounts of data on a computer is with a database. But a model does not require a large, enterprise-level database with dedicated developers and administrators. A database built in Excel can be quite sufficient. A simple spreadsheet database can link every Basic Event to the individual data source selected for it. This database can also contain the manipulations appropriate for how the data are used in the model. These manipulations include stressing factors based on use and maintenance cycles, dormancy, unique failure modes, the modeling of multiple items as a single "super component" Basic Event, and Bayesian updating based on flight and testing experience. A simple, unique metadata field in both the model and database provides a link from any Basic Event in the model to its data source and all relevant calculations. The credibility of the entire model often rests on the credibility and traceability of the data.
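The Bayesian updating step mentioned above is commonly done with a conjugate gamma-Poisson model for failure rates. A minimal sketch, with prior parameters and the observed record as illustrative assumptions:

```python
# Conjugate Bayesian update of a failure rate: with a Gamma(alpha, beta) prior
# (rate parameterization) and k failures observed in T hours of exposure,
# the posterior is Gamma(alpha + k, beta + T).

def gamma_poisson_update(alpha, beta, failures, exposure_hours):
    """Return posterior (alpha, beta) after observing the operating record."""
    return alpha + failures, beta + exposure_hours

alpha0, beta0 = 0.5, 1.0e5     # generic prior (illustrative, Jeffreys-like)
a, b = gamma_poisson_update(alpha0, beta0, failures=2, exposure_hours=4.0e4)
print('posterior mean failure rate per hour:', a / b)
```

Keeping the prior, the flight/test record, and the update formula in the same database row is what makes each Basic Event's estimate traceable.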
ARCPHdb: A comprehensive protein database for SF1 and SF2 helicase from archaea.
Moukhtar, Mirna; Chaar, Wafi; Abdel-Razzak, Ziad; Khalil, Mohamad; Taha, Samir; Chamieh, Hala
2017-01-01
Superfamily 1 (SF1) and Superfamily 2 (SF2) helicases, two of the largest helicase protein families, play vital roles in many biological processes including replication, transcription and translation. Study of helicase proteins in the model microorganisms of archaea has largely contributed to the understanding of their function, architecture and assembly. Based on a large phylogenomics approach, we have identified and classified all SF1 and SF2 protein families in ninety-five sequenced archaeal genomes. Here we developed an online webserver linked to a specialized protein database named ARCPHdb to provide access to SF1 and SF2 helicase families from archaea. ARCPHdb was implemented using the MySQL relational database. Web interfaces were developed using NetBeans. Data were stored according to UniProt accession numbers, NCBI RefSeq IDs, PDB IDs and Entrez databases. A user-friendly interactive web interface has been developed to browse, search and download archaeal helicase protein sequences, their available 3D structure models, and related documentation available in the literature provided by ARCPHdb. The database provides direct links to matching external databases. The ARCPHdb is the first online database to compile all protein information on SF1 and SF2 helicases from archaea in one platform. This database provides an essential resource for all researchers interested in the field.
NASA Astrophysics Data System (ADS)
Henderson, B. H.; Akhtar, F.; Pye, H. O. T.; Napelenok, S. L.; Hutzell, W. T.
2014-02-01
Transported air pollutants receive increasing attention as regulations tighten and global concentrations increase. The need to represent international transport in regional air quality assessments requires improved representation of boundary concentrations. Currently available observations are too sparse vertically to provide boundary information, particularly for ozone precursors, but global simulations can be used to generate spatially and temporally varying lateral boundary conditions (LBC). This study presents a public database of global simulations designed and evaluated for use as LBC for air quality models (AQMs). The database covers the contiguous United States (CONUS) for the years 2001-2010 and contains hourly varying concentrations of ozone, aerosols, and their precursors. The database is complemented by a tool for configuring the global results as inputs to regional scale models (e.g., Community Multiscale Air Quality or Comprehensive Air quality Model with extensions). This study also presents an example application based on the CONUS domain, which is evaluated against satellite retrieved ozone and carbon monoxide vertical profiles. The results show performance is largely within uncertainty estimates for ozone from the Ozone Monitoring Instrument and carbon monoxide from the Measurements Of Pollution In The Troposphere (MOPITT), but there were some notable biases compared with Tropospheric Emission Spectrometer (TES) ozone. Compared with TES, our ozone predictions are high-biased in the upper troposphere, particularly in the south during January. This publication documents the global simulation database, the tool for conversion to LBC, and the evaluation of concentrations on the boundaries. This documentation is intended to support applications that require representation of long-range transport of air pollutants.
NASA Technical Reports Server (NTRS)
Benson, Robert F.; Truhlik, Vladimir; Huang, Xueqin; Wang, Yongli; Bilitza, Dieter
2012-01-01
The topside sounders of the International Satellites for Ionospheric Studies (ISIS) program were designed as analog systems. The resulting ionograms were displayed on 35 mm film for analysis by visual inspection. Each of these satellites, launched between 1962 and 1971, produced data for 10 to 20 years. A number of the original telemetry tapes from this large data set have been converted directly into digital records. Software, known as the Topside Ionogram Scalar With True-Height (TOPIST) algorithm, has been produced and used for the automatic inversion of the ionogram reflection traces on more than 100,000 ISIS-2 digital topside ionograms into topside vertical electron density profiles Ne(h). Here we present some topside ionospheric solar cycle variations deduced from the TOPIST database to illustrate the scientific benefit of improving and expanding the topside ionospheric Ne(h) database. The profile improvements will be based on improvements in the TOPIST software motivated by direct comparisons between TOPIST profiles and profiles produced by manual scaling in the early days of the ISIS program. The database expansion will be based on new software designed to overcome limitations in the original digital topside ionogram database caused by difficulties encountered during the analog-to-digital conversion process in the detection of the ionogram frame sync pulse and/or the frequency markers. This improved and expanded TOPIST topside Ne(h) database will greatly enhance investigations into both short- and long-term ionospheric changes, e.g., the observed topside ionospheric responses to magnetic storms, induced by interplanetary magnetic clouds, and solar cycle variations, respectively.
High-Performance Secure Database Access Technologies for HEP Grids
DOE Office of Scientific and Technical Information (OSTI.GOV)
Matthew Vranicar; John Weicher
2006-04-17
The Large Hadron Collider (LHC) at the CERN Laboratory will become the largest scientific instrument in the world when it starts operations in 2007. Large Scale Analysis Computer Systems (computational grids) are required to extract rare signals of new physics from petabytes of LHC detector data. In addition to file-based event data, LHC data processing applications require access to large amounts of data in relational databases: detector conditions, calibrations, etc. U.S. high energy physicists demand efficient performance of grid computing applications in LHC physics research, where world-wide remote participation is vital to their success. To empower physicists with data-intensive analysis capabilities, a whole hyperinfrastructure of distributed databases cross-cuts a multi-tier hierarchy of computational grids. The crosscutting allows separation of concerns across both the global environment of a federation of computational grids and the local environment of a physicist's computer used for analysis. Very few efforts are on-going in the area of database and grid integration research. Most of these are outside of the U.S. and rely on traditional approaches to secure database access via an extraneous security layer separate from the database system core, preventing efficient data transfers. Our findings are shared by the Database Access and Integration Services Working Group of the Global Grid Forum, which states that "Research and development activities relating to the Grid have generally focused on applications where data is stored in files. However, in many scientific and commercial domains, database management systems have a central role in data storage, access, organization, authorization, etc, for numerous applications." There is a clear opportunity for a technological breakthrough, requiring innovative steps to provide high-performance secure database access technologies for grid computing. We believe that an innovative database architecture in which the secure authorization is pushed into the database engine will eliminate inefficient data transfer bottlenecks. Furthermore, traditionally separated database and security layers provide an extra vulnerability, leaving weak clear-text password authorization as the only protection on the database core systems. Due to the legacy limitations of the systems' security models, the allowed passwords often cannot even comply with the DOE password guideline requirements. We see an opportunity for the tight integration of the secure authorization layer with the database server engine, resulting in both improved performance and improved security. Phase I focused on the development of a proof-of-concept prototype using Argonne National Laboratory's (ANL) Argonne Tandem-Linac Accelerator System (ATLAS) project as a test scenario. By developing a grid-security enabled version of the ATLAS project's current relational database solution, MySQL, PIOCON Technologies aims to offer a more efficient solution to secure database access.
GIS applications for military operations in coastal zones
Fleming, S.; Jordan, T.; Madden, M.; Usery, E.L.; Welch, R.
2009-01-01
In order to successfully support current and future US military operations in coastal zones, geospatial information must be rapidly integrated and analyzed to meet ongoing force structure evolution and new mission directives. Coastal zones in a military-operational environment are complex regions that include sea, land and air features that demand high-volume databases of extreme detail within relatively narrow geographic corridors. Static products in the form of analog maps at varying scales traditionally have been used by military commanders and their operational planners. The rapidly changing battlefield of 21st Century warfare, however, demands dynamic mapping solutions. Commercial geographic information system (GIS) software for military-specific applications is now being developed and employed with digital databases to provide customized digital maps of variable scale, content and symbolization tailored to unique demands of military units. Research conducted by the Center for Remote Sensing and Mapping Science at the University of Georgia demonstrated the utility of GIS-based analysis and digital map creation when developing large-scale (1:10,000) products from littoral warfare databases. The methodology employed-selection of data sources (including high resolution commercial images and Lidar), establishment of analysis/modeling parameters, conduct of vehicle mobility analysis, development of models and generation of products (such as a continuous sea-land DEM and geo-visualization of changing shorelines with tidal levels)-is discussed. Based on observations and identified needs from the National Geospatial-Intelligence Agency, formerly the National Imagery and Mapping Agency, and the Department of Defense, prototype GIS models for military operations in sea, land and air environments were created from multiple data sets of a study area at US Marine Corps Base Camp Lejeune, North Carolina. Results of these models, along with methodologies for developing large-scale littoral warfare databases, aid the National Geospatial-Intelligence Agency in meeting littoral warfare analysis, modeling and map generation requirements for US military organizations.
Morphology-based Query for Galaxy Image Databases
NASA Astrophysics Data System (ADS)
Shamir, Lior
2017-02-01
Galaxies of rare morphology are of paramount scientific interest, as they carry important information about the past, present, and future Universe. Once a rare galaxy is identified, studying it more effectively requires a set of galaxies of similar morphology, allowing generalization and statistical analysis that cannot be done when N=1. Databases generated by digital sky surveys can contain a very large number of galaxy images, and therefore once a rare galaxy of interest is identified it is possible that more instances of the same morphology are also present in the database. However, when a researcher identifies a certain galaxy of rare morphology in the database, it is virtually impossible to mine the database manually in the search for galaxies of similar morphology. Here we propose a computer method that can automatically search databases of galaxy images and identify galaxies that are morphologically similar to a certain user-defined query galaxy. That is, the researcher provides an image of a galaxy of interest, and the pattern recognition system automatically returns a list of galaxies that are visually similar to the target galaxy. The algorithm uses a comprehensive set of descriptors, allowing it to support different types of galaxies, and it is not limited to a finite set of known morphologies. While the list of returned galaxies is neither clean nor complete, it contains a far higher frequency of galaxies of the morphology of interest, providing a substantial reduction of the data. Such algorithms can be integrated into data management systems of autonomous digital sky surveys such as the Large Synoptic Survey Telescope (LSST), where the number of galaxies in the database is extremely large. The source code of the method is available at http://vfacstaff.ltu.edu/lshamir/downloads/udat.
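The retrieval step reduces to nearest-neighbour search in a descriptor space. A minimal sketch, with random vectors standing in for the method's actual image descriptors:

```python
# Morphology-based retrieval sketch: each galaxy image is reduced to a numeric
# descriptor vector, and a query returns its nearest neighbours. The random
# descriptors below are stand-ins for real image features.
import numpy as np
from sklearn.neighbors import NearestNeighbors

descriptors = np.random.rand(10000, 64)    # one row per catalogued galaxy
index = NearestNeighbors(n_neighbors=50).fit(descriptors)

query = np.random.rand(1, 64)              # descriptor of the user's galaxy
dist, idx = index.kneighbors(query)
print(idx[0][:10])                         # candidate look-alikes, ranked
```

As the abstract notes, such a result list is neither clean nor complete; its value is that it concentrates galaxies of the target morphology enough to make manual inspection feasible.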
An SQL query generator for CLIPS
NASA Technical Reports Server (NTRS)
Snyder, James; Chirica, Laurian
1990-01-01
As expert systems become more widely used, their access to large amounts of external information becomes increasingly important. This information exists in several forms, such as statistical tabular data, knowledge gained by experts, and large databases of information maintained by companies. Because many expert systems, including CLIPS, do not provide access to this external information, much of the usefulness of expert systems is left untapped. The scope of this paper is to describe a database extension for the CLIPS expert system shell. The current industry-standard database language is SQL. Due to SQL standardization, large amounts of information stored on various computers, potentially at different locations, will be more easily accessible. Expert systems should be able to directly access these existing databases rather than requiring information to be re-entered into the expert system environment. The ORACLE relational database management system (RDBMS) was used to provide a database connection within the CLIPS environment. To facilitate relational database access, a query generation system was developed as a CLIPS user function. Queries are entered in a CLIPS-like syntax and are passed to the query generator, which constructs an SQL query and submits it for execution to the ORACLE RDBMS. The query results are asserted as CLIPS facts. The query generator was developed primarily for use within the ICADS project (Intelligent Computer Aided Design System) currently being developed by the CAD Research Unit at California Polytechnic State University (Cal Poly). In ICADS, there are several parallel or distributed expert systems accessing a common knowledge base of facts. Each expert system has a narrow domain of interest and therefore needs only certain portions of the information. The query generator provides a common method of accessing this information and allows the expert system to specify what data is needed without specifying how to retrieve it.
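A toy version of the query-generation idea is sketched below. The pattern representation is invented for illustration and is not the ICADS generator's actual grammar; a production system would also parameterize values rather than interpolate them.

```python
# Toy query generator: a parsed CLIPS-like fact pattern (here a dict of
# column -> value constraints) is translated into an SQL string.

def to_sql(table, constraints):
    """Build a SELECT for one table from equality constraints."""
    where = ' AND '.join(f"{col} = '{val}'" for col, val in constraints.items())
    return f"SELECT * FROM {table}" + (f" WHERE {where}" if where else "")

# (query room (type office) (floor 2))  -->  parsed form below
print(to_sql('room', {'type': 'office', 'floor': 2}))
# SELECT * FROM room WHERE type = 'office' AND floor = '2'
```

The rows returned by such a query would then be asserted back into working memory as facts, which is the declarative "what, not how" division of labor the paper describes.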
Large-eddy simulation of the urban boundary layer in the MEGAPOLI Paris Plume experiment
NASA Astrophysics Data System (ADS)
Esau, Igor
2010-05-01
This study presents results from a dedicated large-eddy simulation study of the urban boundary layer in the MEGAPOLI Paris Plume field campaign. We used the LESNIC and PALM codes, the MEGAPOLI city morphology database, nudging to the meteorological conditions observed during the Paris Plume campaign, and some concentration measurements from that campaign to simulate and better understand the nature of the urban boundary layer on scales larger than the street canyon scales. Primary attention was paid to turbulence self-organization and structure-to-surface interaction. The study aimed to demonstrate feasibility and to estimate the resources required for such research. Therefore, at this stage we neither compare the simulations with other relevant studies nor formulate theoretical conclusions.
Cyclic subway networks are less risky in metropolises
NASA Astrophysics Data System (ADS)
Xiao, Ying; Zhang, Hai-Tao; Xu, Bowen; Zhu, Tao; Chen, Guanrong; Chen, Duxin
2018-02-01
Subways are crucial in modern transportation systems of metropolises. To quantitatively evaluate the potential risks of subway networks suffering from natural disasters or deliberate attacks, real data from seven Chinese subway systems are collected and their population distributions and anti-risk capabilities are analyzed. Counterintuitively, it is found that when subway networks are attacked, transfer stations with large numbers of connections are not the most crucial; rather, the stations and lines with large betweenness centrality are essential. It is also found that cycles reduce such correlations due to the existence of alternative paths. To simulate the data-based observations, a network model is proposed to characterize the dynamics of subway systems under various intensities of attacks on stations and lines. This study sheds some light onto the risk assessment of subway networks in metropolitan cities.
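The centrality measure behind the finding is straightforward to compute. A minimal sketch on a toy network (illustrative, not one of the seven studied systems):

```python
# Betweenness centrality ranks stations by how many shortest paths pass
# through them; high values mark the critical nodes under targeted attack.
import networkx as nx

g = nx.Graph()
g.add_edges_from([('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'A'),  # a cycle
                  ('B', 'E'), ('E', 'F')])                          # a branch line

bc = nx.betweenness_centrality(g)
print(sorted(bc.items(), key=lambda kv: -kv[1])[:3])   # most critical stations
```

Note how the cycle spreads centrality across its stations while the branch concentrates it on the cut vertices, which mirrors the paper's point that cyclic topologies are less risky.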
Statistical Downscaling in Multi-dimensional Wave Climate Forecast
NASA Astrophysics Data System (ADS)
Camus, P.; Méndez, F. J.; Medina, R.; Losada, I. J.; Cofiño, A. S.; Gutiérrez, J. M.
2009-04-01
Wave climate at a particular site is defined by the statistical distribution of sea state parameters, such as significant wave height, mean wave period, mean wave direction, wind velocity, wind direction and storm surge. Nowadays, long-term time series of these parameters are available from reanalysis databases obtained by numerical models. The Self-Organizing Map (SOM) technique is applied to characterize the multi-dimensional wave climate, obtaining the relevant "wave types" spanning the historical variability. This technique summarizes the multiple dimensions of wave climate in terms of a set of clusters projected onto a low-dimensional lattice with a spatial organization, providing Probability Density Functions (PDFs) on the lattice. On the other hand, wind and storm surge depend on the instantaneous local large-scale sea level pressure (SLP) fields, while waves depend on the recent history of these fields (say, 1 to 5 days). Thus, these variables are associated with large-scale atmospheric circulation patterns. In this work, a nearest-neighbors analog method is used to predict monthly multi-dimensional wave climate. This method establishes relationships between the large-scale atmospheric circulation patterns from numerical models (SLP fields as predictors) and local wave databases of observations (monthly wave climate SOM PDFs as predictands) to set up statistical models. A wave reanalysis database, developed by Puertos del Estado (Ministerio de Fomento), is considered as the historical time series of local variables. The simultaneous SLP fields calculated by the NCEP atmospheric reanalysis are used as predictors. Several applications with different sizes of the sea level pressure grid and different temporal domain resolutions are compared to obtain the optimal statistical model that best represents the monthly wave climate at a particular site. In this work we examine the potential skill of this downscaling approach considering perfect-model conditions, but we also analyze the suitability of this methodology for seasonal forecast and for long-term climate change scenario projections of wave climate.
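The nearest-neighbors analog step can be sketched in a few lines. Array shapes and the number of analogs are illustrative assumptions, and the real method predicts SOM PDFs rather than the single scalar used here for brevity:

```python
# Analog downscaling sketch: find the historical SLP fields closest to a new
# field and reuse the local wave statistics of those analog months.
import numpy as np

slp_hist = np.random.rand(5000, 40 * 60)   # flattened historical SLP fields
wave_hist = np.random.rand(5000)           # e.g., monthly significant wave height

slp_new = np.random.rand(40 * 60)          # SLP field for the month to predict
d = np.linalg.norm(slp_hist - slp_new, axis=1)   # distance to each candidate analog
nearest = np.argsort(d)[:25]                     # k = 25 analogs (assumed)
print(wave_hist[nearest].mean())                 # analog-based estimate
```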
NASA Astrophysics Data System (ADS)
Guenther, A. B.; Duhl, T.
2011-12-01
Increasing computational resources have enabled a steady improvement in the spatial resolution used for earth system models. Land surface models and landcover distributions have kept ahead by providing higher spatial resolution than is typically used in these models. Satellite observations have played a major role in providing high-resolution landcover distributions over large regions or the entire earth surface, but ground observations are needed to calibrate these data and provide accurate inputs for models. As our ability to resolve individual landscape components improves, it is important to consider what scale is sufficient for providing inputs to earth system models. The required spatial scale depends on the processes being represented and the scientific questions being addressed. This presentation will describe the development of a contiguous U.S. landcover database using high resolution imagery (1 to 1000 meters) and surface observations of species composition and other landcover characteristics. The database includes plant functional types and species composition and is suitable for driving land surface models (CLM and MEGAN) that predict land surface exchange of carbon, water, energy and biogenic reactive gases (e.g., isoprene, sesquiterpenes, and NO). We investigate the sensitivity of model results to landcover distributions with spatial scales ranging over six orders of magnitude (1 meter to 1,000,000 meters). The implications for predictions of regional climate and air quality will be discussed, along with recommendations for regional and global earth system modeling.
HBLAST: Parallelised sequence similarity--A Hadoop MapReducable basic local alignment search tool.
O'Driscoll, Aisling; Belogrudov, Vladislav; Carroll, John; Kropp, Kai; Walsh, Paul; Ghazal, Peter; Sleator, Roy D
2015-04-01
The recent exponential growth of genomic databases has resulted in the common task of sequence alignment becoming one of the major bottlenecks in the field of computational biology. It is typical for these large datasets and complex computations to require cost-prohibitive High Performance Computing (HPC) to function. As such, parallelised solutions have been proposed, but many exhibit scalability limitations and are incapable of effectively processing "Big Data" - the name attributed to datasets that are extremely large, complex and require rapid processing. The Hadoop framework, comprised of distributed storage and a parallelised programming framework known as MapReduce, is specifically designed to work with such datasets, but it is not trivial to efficiently redesign and implement bioinformatics algorithms according to this paradigm. The parallelisation strategy of "divide and conquer" for alignment algorithms can be applied to both datasets and input query sequences. However, scalability is still an issue due to memory constraints or large databases, with very large database segmentation leading to additional performance decline. Herein, we present Hadoop Blast (HBlast), a parallelised BLAST algorithm that proposes a flexible method to partition both databases and input query sequences using "virtual partitioning". HBlast presents improved scalability over existing solutions and a well-balanced computational workload while keeping database segmentation and recompilation to a minimum. Enhanced BLAST search performance on cheap, memory-constrained hardware has significant implications for in-field clinical diagnostic testing, enabling faster and more accurate identification of pathogenic DNA in human blood or tissue samples. Copyright © 2015 Elsevier Inc. All rights reserved.
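The "virtual partitioning" idea can be sketched as follows: rather than physically segmenting the database, each worker scans the same shared database but is assigned a distinct (database-range, query-range) pair of the full cross product. This is an illustrative Python sketch of the partitioning logic only, not the HBlast implementation; all names are invented:

```python
from itertools import product

def virtual_partitions(n_db_records, n_queries, db_parts, query_parts):
    """Yield (db_slice, query_slice) work units covering the full cross product.

    Every worker reads the shared database but only aligns its assigned
    record range against its assigned query range, so no physical
    segmentation or recompilation of the database is needed.
    """
    def slices(total, parts):
        step = -(-total // parts)  # ceiling division
        return [slice(i, min(i + step, total)) for i in range(0, total, step)]

    yield from product(slices(n_db_records, db_parts),
                       slices(n_queries, query_parts))

# e.g. 4 database partitions x 2 query partitions -> 8 map tasks
for db_slice, q_slice in virtual_partitions(1_000_000, 10_000, 4, 2):
    pass  # each (db_slice, q_slice) pair would become one MapReduce task
```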
Large scale database scrubbing using object oriented software components.
Herting, R L; Barnes, M R
1998-01-01
Now that case managers, quality improvement teams, and researchers use medical databases extensively, the ability to share and disseminate such databases while maintaining patient confidentiality is paramount. A process called scrubbing addresses this problem by removing personally identifying information while keeping the integrity of the medical information intact. Scrubbing entire databases, containing multiple tables, requires that the implicit relationships between data elements in different tables of the database be maintained. To address this issue we developed DBScrub, a Java program that interfaces with any JDBC compliant database and scrubs the database while maintaining the implicit relationships within it. DBScrub uses a small number of highly configurable object-oriented software components to carry out the scrubbing. We describe the structure of these software components and how they maintain the implicit relationships within the database.
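One common way to preserve implicit cross-table relationships while scrubbing, as DBScrub must, is to replace each identifier with a deterministic pseudonym, so that the same patient ID maps to the same token in every table and joins keep working. A minimal Python sketch of that idea (not DBScrub's actual components; the key and table layout are invented):

```python
import hashlib
import hmac

SECRET_KEY = b"rotate-me"  # hypothetical scrubbing key, never shipped with output

def pseudonym(value: str) -> str:
    """Deterministically map an identifier to an opaque token.

    A keyed HMAC (rather than a plain hash) resists dictionary attacks
    while still yielding the same token for the same input, so implicit
    relationships between scrubbed tables are maintained.
    """
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]

patients = [{"mrn": "12345", "name": "Doe, Jane"}]
visits = [{"mrn": "12345", "diagnosis": "osteoporosis"}]

scrubbed_patients = [{"mrn": pseudonym(p["mrn"])} for p in patients]  # name dropped
scrubbed_visits = [{"mrn": pseudonym(v["mrn"]), "diagnosis": v["diagnosis"]}
                   for v in visits]
assert scrubbed_patients[0]["mrn"] == scrubbed_visits[0]["mrn"]  # join preserved
```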
Robino, C; Ralf, A; Pasino, S; De Marchi, M R; Ballantyne, K N; Barbaro, A; Bini, C; Carnevali, E; Casarino, L; Di Gaetano, C; Fabbri, M; Ferri, G; Giardina, E; Gonzalez, A; Matullo, G; Nutini, A L; Onofri, V; Piccinini, A; Piglionica, M; Ponzano, E; Previderè, C; Resta, N; Scarnicci, F; Seidita, G; Sorçaburu-Cigliero, S; Turrina, S; Verzeletti, A; Kayser, M
2015-03-01
Recently introduced rapidly mutating Y-chromosomal short tandem repeat (RM Y-STR) loci, displaying a multiple-fold higher mutation rate relative to any other Y-STRs, including those conventionally used in forensic casework, have been demonstrated to improve the resolution of male lineage differentiation and to allow male relative separation usually impossible with standard Y-STRs. However, large and geographically detailed frequency haplotype databases are required to estimate the statistical weight of RM Y-STR haplotype matches observed in forensic casework. With this in mind, the Italian Working Group (GEFI) of the International Society for Forensic Genetics launched a collaborative exercise aimed at generating an Italian quality-controlled forensic RM Y-STR haplotype database. Overall, 1509 male individuals from 13 regional populations covering northern, central and southern areas of the Italian peninsula plus Sicily were collected, including both "rural" and "urban" samples classified according to population density in the sampling area. A subset of individuals was additionally genotyped for the Y-STR loci included in the Yfiler and PowerPlex Y23 (PPY23) systems (75% and 62%, respectively), allowing the comparison of RM and conventional Y-STRs. Considering the whole set of 13 RM Y-STRs, 1501 unique haplotypes were observed among the 1509 sampled Italian men, with a haplotype diversity of 0.999996, largely superior to Yfiler and PPY23 with 0.999914 and 0.999950, respectively. AMOVA indicated that 99.996% of the haplotype variation was within populations, confirming that genetic-geographic structure is almost undetectable with RM Y-STRs. Haplotype sharing among regional Italian populations was not observed at all with the complete set of 13 RM Y-STRs. Haplotype sharing within Italian populations was very rare (0.27% non-unique haplotypes), and lower in urban (0.22%) than rural (0.29%) areas. Additionally, 422 father-son pairs were investigated, and 20.1% of them could be discriminated by the whole set of 13 RM Y-STRs, which is very close to the theoretically expected estimate of 19.5% given the mutation rates of the markers used. Results obtained from a high-coverage Italian haplotype dataset confirm on the regional scale the exceptional ability of RM Y-STRs to resolve male lineages previously observed globally, and attest to the unsurpassed value of RM Y-STRs for male-relative differentiation purposes. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Rahardianto, Trias; Saputra, Aditya; Gomez, Christopher
2017-07-01
Research on landslide susceptibility has evolved rapidly over the last few decades thanks to the availability of large databases. Landslide research used to focus on discrete events, but the use of large inventory datasets has become a central pillar of landslide susceptibility, hazard, and risk assessment. Indeed, extracting meaningful information from large databases is now at the forefront of geoscientific research, following the big-data trend: the more comprehensive the information on past landslides in a particular area, the better the produced map will be at supporting effective decision making, planning, and engineering practice. Landslide inventory data that are freely accessible online give many researchers and decision makers an opportunity to prevent casualties and economic loss caused by future landslides; such data are especially advantageous for areas with poor historical landslide records. Since the construction criteria for landslide inventory maps and their quality evaluation remain poorly defined, an assessment of the reliability of open-source landslide inventory maps is required. The present contribution aims to assess the reliability of open-source landslide inventory data based on the particular topographical setting of the observed area in Niigata prefecture, Japan. A Geographic Information System (GIS) platform and a statistical approach are applied to analyze the data, and the frequency ratio method is used to model and assess the landslide map. The generated model showed unsatisfactory results, with an AUC value of 0.603 indicating low prediction accuracy and an unreliable model.
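The frequency ratio method used above is simple enough to sketch: for each class of a conditioning factor (say, a slope bin), it is the proportion of landslide cells in that class divided by the proportion of all cells in that class, with FR > 1 marking classes over-represented among landslides. A hedged Python illustration with invented class counts, not the study's data:

```python
import numpy as np

def frequency_ratio(landslide_counts, class_counts):
    """FR per class = (landslide share of class) / (area share of class)."""
    landslide_counts = np.asarray(landslide_counts, dtype=float)
    class_counts = np.asarray(class_counts, dtype=float)
    ls_share = landslide_counts / landslide_counts.sum()
    area_share = class_counts / class_counts.sum()
    return ls_share / area_share

# slope classes: 0-10, 10-20, 20-30, >30 degrees (illustrative numbers)
fr = frequency_ratio([5, 40, 120, 35], [50_000, 30_000, 15_000, 5_000])
print(fr.round(2))  # values > 1 indicate landslide-prone classes
```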
NUCFRG2: An evaluation of the semiempirical nuclear fragmentation database
NASA Technical Reports Server (NTRS)
Wilson, J. W.; Tripathi, R. K.; Cucinotta, F. A.; Shinn, J. L.; Badavi, F. F.; Chun, S. Y.; Norbury, J. W.; Zeitlin, C. J.; Heilbronn, L.; Miller, J.
1995-01-01
A semiempirical abrasion-ablation model has been successful in generating a large nuclear database for the study of high charge and energy (HZE) ion beams, radiation physics, and galactic cosmic ray shielding. The cross sections that are generated are compared with measured HZE fragmentation data from various experimental groups. A research program for improvement of the database generator is also discussed.
NCBI2RDF: enabling full RDF-based access to NCBI databases.
Anguita, Alberto; García-Remesal, Miguel; de la Iglesia, Diana; Maojo, Victor
2013-01-01
RDF has become the standard technology for enabling interoperability among heterogeneous biomedical databases. The NCBI provides access to a large set of life sciences databases through a common interface called Entrez. However, the latter does not provide RDF-based access to such databases, and, therefore, they cannot be integrated with other RDF-compliant databases and accessed via SPARQL query interfaces. This paper presents the NCBI2RDF system, aimed at providing RDF-based access to the complete NCBI data repository. This API creates a virtual endpoint that services SPARQL queries over different NCBI repositories and presents the query results to users in the SPARQL results format, thus enabling these data to be integrated and/or stored with other RDF-compliant repositories. SPARQL queries are dynamically resolved, decomposed, and forwarded to the NCBI-provided E-utilities programmatic interface to access the NCBI data. Furthermore, we show how our approach increases the expressiveness of the native NCBI querying system, allowing several databases to be accessed simultaneously. This feature significantly boosts productivity when working with complex queries and saves time and effort for biomedical researchers. Our approach has been validated with a large number of SPARQL queries, thus proving its reliability and enhanced capabilities in biomedical environments.
Angermeier, Paul L.; Frimpong, Emmanuel A.
2011-01-01
The need for integrated and widely accessible sources of species traits data to facilitate studies of ecology, conservation, and management has motivated development of traits databases for various taxa. In spite of the increasing number of traits-based analyses of freshwater fishes in the United States, no consolidated database of traits of this group exists publicly, and much useful information on these species is documented only in obscure sources. The largely inaccessible and unconsolidated traits information makes large-scale analysis involving many fishes and/or traits particularly challenging. We have compiled a database of more than 100 traits for 809 (731 native and 78 nonnative) fish species found in freshwaters of the conterminous United States, including 37 native families and 145 native genera. The database, named Fish Traits, contains information on four major categories of traits: (1) trophic ecology; (2) body size, reproductive ecology, and life history; (3) habitat preferences; and (4) salinity and temperature tolerances. Information on geographic distribution and conservation status was also compiled. The database creates many opportunities for conducting research on fish species traits and constitutes the first step toward establishing a central repository for a continually expanding set of traits of North American fishes.
Evolution of the use of relational and NoSQL databases in the ATLAS experiment
NASA Astrophysics Data System (ADS)
Barberis, D.
2016-09-01
For many years the ATLAS experiment has used a large database infrastructure based on Oracle to store several different types of non-event data: time-dependent detector configuration and conditions data, calibrations and alignments, configurations of Grid sites, catalogues for data management tools, job records for distributed workload management tools, and run and event metadata. The rapid development of "NoSQL" databases (structured storage services) in the last five years has allowed an extended and complementary usage of traditional relational databases and new structured storage tools, in order to improve the performance of existing applications and to extend their functionalities using the possibilities offered by modern storage systems. The trend is towards using the best tool for each kind of data: separating, for example, intrinsically relational metadata from payload storage, and frequently updated records that benefit from transactions from archived information. Access to all components has to be orchestrated by specialised services that run on front-end machines and shield the user from the complexity of the data storage infrastructure. This paper describes this technology evolution in the ATLAS database infrastructure and presents a few examples of large database applications that benefit from it.
NASA Astrophysics Data System (ADS)
Lanckman, Jean-Pierre; Elger, Kirsten; Karlsson, Ævar Karl; Johannsson, Halldór; Lantuit, Hugues
2013-04-01
Permafrost is a direct indicator of climate change and has been identified as an Essential Climate Variable (ECV) by the global observing community. The monitoring of permafrost temperatures, active-layer thicknesses and other parameters has been performed for several decades, but it was brought together within the Global Terrestrial Network for Permafrost (GTN-P) only in the 1990s, including the development of measurement protocols to provide standardized data. GTN-P is the primary international observing network for permafrost, sponsored by the Global Climate Observing System (GCOS) and the Global Terrestrial Observing System (GTOS), and managed by the International Permafrost Association (IPA). All GTN-P data are covered by an open data policy, with free data access via the World Wide Web. The existing data, however, are far from homogeneous: they are not yet optimized for databases, there is no framework for data reporting or archival, and data documentation is incomplete. As a result, and despite the utmost relevance of permafrost in the Earth's climate system, the data have not been used by as many researchers as the initiators of the programs intended. While the monitoring of many other ECVs has been tackled by organized international networks (e.g. FLUXNET), there is still no central database for all permafrost-related parameters. The European Union project PAGE21 created the opportunity to develop this central database for the permafrost monitoring parameters of GTN-P during the project and beyond. The database aims to be the one location where a researcher can find data, metadata, and information on all relevant parameters for a specific site. Each component of the Data Management System (DMS), including parameters, data levels and metadata formats, was developed in cooperation with the GTN-P and the IPA. The general framework of the GTN-P DMS is based on an object-oriented model (OOM), open to as many parameters as possible, and implemented in a spatial database. To ensure interoperability and enable potential inter-database searches, field names follow international metadata standards and are based on a controlled vocabulary registry. Tools are developed to provide data processing, analysis capability, and quality control. Our system aims to be a reference model, improvable and reusable. It allows a maximum of top-down and bottom-up data flow, giving scientists one globally searchable data and metadata repository, the public full access to scientific data, and the policy maker a powerful cartographic and statistical tool. To engage the international community in GTN-P, it was essential to develop an online interface for data upload that is easy to use and allows data input with a minimum of technical and personal effort. In addition, substantial effort will be required to query, visualize and retrieve information across many platforms and types of measurements. Ultimately, it is not the individual layer that matters, but rather the relationships that these information layers maintain with each other.
Stojanović, Emilija; Ristić, Vladimir; McMaster, Daniel Travis; Milanović, Zoran
2017-05-01
Plyometric training is an effective method to prevent knee injuries in female athletes; however, the effects of plyometric training on jump performance in female athletes are unclear. The aim of this systematic review and meta-analysis was to determine the effectiveness of plyometric training on vertical jump (VJ) performance of amateur, collegiate and elite female athletes. Six electronic databases were searched (PubMed, MEDLINE, ERIC, Google Scholar, SCIndex and ScienceDirect). The included studies were coded for the following criteria: training status, training modality and type of outcome measures. The methodological quality of each study was assessed using the Physiotherapy Evidence Database (PEDro) scale. The effects of plyometric training on VJ performance were based on the following standardised pre-post testing effect size (ES) thresholds: trivial (<0.20), small (0.21-0.60), moderate (0.61-1.20), large (1.21-2.00), very large (2.01-4.00) and extremely large (>4.00). A total of 16 studies met the inclusion criteria. The meta-analysis revealed that plyometric training had a most likely moderate effect on countermovement jump (CMJ) height performance (ES = 1.09; 95% confidence interval [CI] 0.57-1.61; I2 = 75.60%). Plyometric training interventions of less than 10 weeks in duration had a most likely small effect on CMJ height performance (ES = 0.58; 95% CI 0.25-0.91). In contrast, plyometric training durations greater than 10 weeks had a most likely large effect on CMJ height (ES = 1.87; 95% CI 0.73-3.01). The effect of plyometric training on concentric-only squat jump (SJ) height was likely small (ES = 0.44; 95% CI -0.09 to 0.97). Similar effects were observed on SJ height after 6 weeks of plyometric training in amateur (ES = 0.35) and young (ES = 0.49) athletes. The effect of plyometric training on CMJ height with the arm swing was likely large (ES = 1.31; 95% CI -0.04 to 2.65). The largest plyometric training effects were observed in drop jump (DJ) height performance (ES = 3.59; 95% CI -3.04 to 10.23). Most likely extremely large plyometric training effects on DJ height performance (ES = 7.07; 95% CI 4.71-9.43) were observed following 12 weeks of plyometric training. In contrast, a possibly small positive training effect (ES = 0.30; 95% CI -0.63 to 1.23) was observed following 6 weeks of plyometric training. Plyometric training is an effective form of training to improve VJ performance (e.g. CMJ, SJ and DJ) in female athletes. The benefits of plyometric training on VJ performance are greater for interventions of longer duration (≥10 weeks).
NASA Astrophysics Data System (ADS)
Hashimoto, Shoji; Nanko, Kazuki; Ťupek, Boris; Lehtonen, Aleksi
2017-03-01
Future climate change will dramatically change the carbon balance in the soil, and this change will affect the terrestrial carbon stock and the climate itself. Earth system models (ESMs) are used to understand the current climate and to project future climate conditions, but the soil organic carbon (SOC) stocks simulated by ESMs and those in observational databases are not well correlated when the two are compared at fine grid scales. The specific key processes and factors, as well as the relationships among the factors that govern the SOC stock, remain unclear; the inclusion of such missing information would improve the agreement between modeled and observational data. In this study, we sought to identify the influential factors that govern global SOC distribution in observational databases, as well as those simulated by ESMs. We used a data-mining (machine-learning) scheme, boosted regression trees (BRT), to identify the factors affecting the SOC stock. We applied the BRT scheme to three observational databases and 15 ESM outputs from the fifth phase of the Coupled Model Intercomparison Project (CMIP5) and examined the effects of 13 variables/factors categorized into five groups (climate, soil property, topography, vegetation, and land-use history). Globally, the contributions of mean annual temperature, clay content, carbon-to-nitrogen (CN) ratio, wetland ratio, and land cover were high in the observational databases, whereas the contributions of mean annual temperature, land cover, and net primary productivity (NPP) were predominant in the SOC distribution of the ESMs. A comparison of the influential factors at a global scale revealed that the most distinct differences between the SOC from the observational databases and that from the ESMs were the low contributions of clay content and CN ratio, and the high contribution of NPP, in the ESMs. The results of this study will aid in identifying the causes of the current mismatches between observational SOC databases and ESM outputs and in improving the modeling of terrestrial carbon dynamics in ESMs. This study also shows how a data-mining algorithm can be used to assess model outputs.
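As an illustration of the kind of attribution done here, a boosted-regression-tree model can rank driver variables by their contribution to the predicted SOC stock. A minimal sketch with scikit-learn's gradient boosting on synthetic data; the variable names and the data-generating process are invented for illustration, not taken from the study:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(42)
n = 2000
# synthetic grid cells: [mean annual temp, clay %, C:N ratio, wetland frac, NPP]
X = rng.normal(size=(n, 5))
# toy "truth": SOC driven mostly by temperature and clay content
y = -1.5 * X[:, 0] + 0.8 * X[:, 1] + 0.2 * rng.normal(size=n)

brt = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05,
                                max_depth=3).fit(X, y)

names = ["MAT", "clay", "CN_ratio", "wetland", "NPP"]
for name, imp in sorted(zip(names, brt.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name:8s} {imp:.2f}")  # relative contribution of each factor
```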
Querying databases of trajectories of differential equations: Data structures for trajectories
NASA Technical Reports Server (NTRS)
Grossman, Robert
1989-01-01
One approach to qualitative reasoning about dynamical systems is to extract qualitative information by searching or making queries on databases containing very large numbers of trajectories. The efficiency of such queries depends crucially upon finding an appropriate data structure for trajectories of dynamical systems. Suppose that a large number of parameterized trajectories γ of a dynamical system evolving in R^N are stored in a database. Let η ⊂ R^N denote a parameterized path in Euclidean space, and let ||·|| denote a norm on the space of paths. A data structure is defined to represent trajectories of dynamical systems, and an algorithm is sketched which answers queries.
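A toy version of such a structure: store each trajectory as uniformly resampled points so that the discrete L2 distance between point arrays approximates a norm on the space of paths, then answer "which stored trajectory is closest to a query path η" by linear scan (a KD-tree over the flattened arrays would accelerate this). Purely illustrative; the class, names, and resampling choice are assumptions, not the report's design:

```python
import numpy as np

class TrajectoryDB:
    """Trajectories stored as (n_samples, N) arrays resampled to a common grid,
    so the discrete L2 distance approximates a norm on the space of paths."""

    def __init__(self, n_samples=64):
        self.n_samples = n_samples
        self.flat = []  # flattened resampled trajectories

    def _resample(self, traj):
        traj = np.asarray(traj, dtype=float)        # (T, N) points along the path
        t_old = np.linspace(0.0, 1.0, len(traj))
        t_new = np.linspace(0.0, 1.0, self.n_samples)
        cols = [np.interp(t_new, t_old, traj[:, j]) for j in range(traj.shape[1])]
        return np.stack(cols, axis=1).ravel()

    def insert(self, traj):
        self.flat.append(self._resample(traj))

    def nearest(self, path):
        """Index of the stored trajectory closest to `path` in discrete L2 norm."""
        q = self._resample(path)
        dists = [np.linalg.norm(f - q) for f in self.flat]
        return int(np.argmin(dists))

db = TrajectoryDB()
t = np.linspace(0, 2 * np.pi, 200)
db.insert(np.c_[np.cos(t), np.sin(t)])         # unit circle in R^2
db.insert(np.c_[t / 6, np.zeros_like(t)])      # straight segment
print(db.nearest(np.c_[1.1 * np.cos(t), 1.1 * np.sin(t)]))  # -> 0
```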
Ice Accretion Test Results for Three Large-Scale Swept-Wing Models in the NASA Icing Research Tunnel
NASA Technical Reports Server (NTRS)
Broeren, Andy; Potapczuk, Mark; Lee, Sam; Malone, Adam; Paul, Ben; Woodard, Brian
2016-01-01
The design and certification of modern transport airplanes for flight in icing conditions increasingly relies on three-dimensional numerical simulation tools for ice accretion prediction. There is currently no publicly available, high-quality ice accretion database upon which to evaluate the performance of icing simulation tools for large-scale swept wings that are representative of modern commercial transport airplanes. The purpose of this presentation is to present the results of a series of icing wind tunnel test campaigns whose aim was to provide an ice accretion database for large-scale swept wings.
An ab initio electronic transport database for inorganic materials
Ricci, Francesco; Chen, Wei; Aydemir, Umut; ...
2017-07-04
Electronic transport in materials is governed by a series of tensorial properties such as conductivity, Seebeck coefficient, and effective mass. These quantities are paramount to the understanding of materials in many fields from thermoelectrics to electronics and photovoltaics. Transport properties can be calculated from a material's band structure using the Boltzmann transport theory framework. We present here the largest computational database of electronic transport properties, based on a large set of 48,000 materials originating from the Materials Project database. Our results were obtained through the interpolation approach developed in the BoltzTraP software, assuming a constant relaxation time. We present the workflow to generate the data, the data validation procedure, and the database structure. In conclusion, our aim is to serve the large community of scientists developing materials selection strategies and performing studies involving transport properties.
Architectural Implications for Spatial Object Association Algorithms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kumar, V S; Kurc, T; Saltz, J
2009-01-29
Spatial object association, also referred to as cross-match of spatial datasets, is the problem of identifying and comparing objects in two or more datasets based on their positions in a common spatial coordinate system. In this work, we evaluate two cross-match algorithms that are used for astronomical sky surveys on the following database system architecture configurations: (1) Netezza Performance Server, a parallel database system with active-disk-style processing capabilities, (2) MySQL Cluster, a high-throughput network database system, and (3) a hybrid configuration consisting of a collection of independent database system instances with data replication support. Our evaluation provides insights into how the architectural characteristics of these systems affect the performance of the spatial cross-match algorithms. We conducted our study using real use-case scenarios borrowed from a large-scale astronomy application known as the Large Synoptic Survey Telescope (LSST).
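The core of a positional cross-match can be sketched with a simple grid index: bin one catalog by position, then for each object in the other catalog probe only the neighboring bins within the match radius. A hedged Python illustration using a flat small-angle approximation (real surveys use spherical geometry and HTM/HEALPix-style indexes; all names here are invented):

```python
import numpy as np
from collections import defaultdict

def crossmatch(cat_a, cat_b, radius):
    """Match points of cat_a (n,2) to cat_b (m,2) within `radius` (same units).

    A uniform grid with cell size = radius guarantees every match lies in
    the 3x3 block of cells around a query point.
    """
    grid = defaultdict(list)
    for j, (x, y) in enumerate(cat_b):
        grid[(int(x // radius), int(y // radius))].append(j)

    matches = []
    for i, (x, y) in enumerate(cat_a):
        cx, cy = int(x // radius), int(y // radius)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), ()):
                    if np.hypot(*(cat_a[i] - cat_b[j])) <= radius:
                        matches.append((i, j))
    return matches

rng = np.random.default_rng(1)
a = rng.uniform(0, 10, size=(1000, 2))
b = a + rng.normal(scale=1e-3, size=a.shape)  # jittered copy of catalog a
print(len(crossmatch(a, b, radius=0.01)))     # ~1000 matched pairs
```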
The future of medical diagnostics: large digitized databases.
Kerr, Wesley T; Lau, Edward P; Owens, Gwen E; Trefler, Aaron
2012-09-01
The electronic health record mandate within the American Recovery and Reinvestment Act of 2009 will have a far-reaching effect on medicine. In this article, we provide an in-depth analysis of how this mandate is expected to stimulate the production of large-scale, digitized databases of patient information. There is evidence to suggest that millions of patients and the National Institutes of Health will fully support the mining of such databases to better understand the process of diagnosing patients. This data mining will likely reaffirm and quantify known risk factors for many diagnoses. This quantification may be leveraged to further develop computer-aided diagnostic tools that weigh risk factors and provide decision support for health care providers. We expect that the creation of these databases will stimulate the development of computer-aided diagnostic support tools that will become an integral part of modern medicine.
Kaulard, Kathrin; Cunningham, Douglas W.; Bülthoff, Heinrich H.; Wallraven, Christian
2012-01-01
The ability to communicate is one of the core aspects of human life. For this, we use not only verbal but also nonverbal signals of remarkable complexity. Among the latter, facial expressions are among the most important information channels. Despite the large variety of facial expressions we use in daily life, research on facial expressions has so far mostly focused on the emotional aspect. Consequently, most databases of facial expressions available to the research community also include only emotional expressions, neglecting the largely unexplored aspect of conversational expressions. To fill this gap, we present the MPI facial expression database, which contains a large variety of natural emotional and conversational expressions. The database contains 55 different facial expressions performed by 19 German participants. Expressions were elicited with the help of a method-acting protocol, which guarantees both well-defined and natural facial expressions. The method-acting protocol was based on everyday scenarios, which are used to define the necessary context information for each expression. All facial expressions are available in three repetitions, in two intensities, as well as from three different camera angles. A detailed frame annotation is provided, from which a dynamic and a static version of the database have been created. In addition to describing the database in detail, we also present the results of an experiment with two conditions that serve to validate the context scenarios as well as the naturalness and recognizability of the video sequences. Our results provide clear evidence that conversational expressions can be recognized surprisingly well from visual information alone. The MPI facial expression database will enable researchers from different research fields (including the perceptual and cognitive sciences, but also affective computing, as well as computer vision) to investigate the processing of a wider range of natural facial expressions. PMID:22438875
Biological Databases for Behavioral Neurobiology
Baker, Erich J.
2014-01-01
Databases are, at their core, abstractions of data and their intentionally derived relationships. They serve as a central organizing metaphor and repository, supporting or augmenting nearly all bioinformatics. Behavioral domains provide a unique stage for contemporary databases, as research in this area spans diverse data types, locations, and data relationships. This chapter provides foundational information on the diversity and prevalence of databases and on how data structures support the various needs of behavioral neuroscience analysis and interpretation. The focus is on the classes of databases, data curation, and advanced applications in bioinformatics, using examples largely drawn from research efforts in behavioral neuroscience. PMID:23195119
Roche, Nicolas; Reddel, Helen; Martin, Richard; Brusselle, Guy; Papi, Alberto; Thomas, Mike; Postma, Dirjke; Thomas, Vicky; Rand, Cynthia; Chisholm, Alison; Price, David
2014-02-01
Real-world research can use observational or clinical trial designs, in both cases putting emphasis on high external validity, to complement the classical efficacy randomized controlled trials (RCTs) with high internal validity. Real-world research is made necessary by the variety of factors that can play an important role in modulating effectiveness in real life but are often tightly controlled in RCTs, such as comorbidities and concomitant treatments, adherence, inhalation technique, access to care, strength of doctor-caregiver communication, and socio-economic and other organizational factors. Real-world studies belong to two main categories: pragmatic trials and observational studies, which can be prospective or retrospective. Focusing on comparative database observational studies, the process aimed at ensuring high-quality research can be divided into three parts: preparation of research, analyses and reporting, and discussion of results. Key points include a priori planning of data collection and analyses, identification of appropriate database(s), proper outcome definitions, study registration with commitment to publish, bias minimization through matching and adjustment processes accounting for potential confounders, and sensitivity analyses testing the robustness of results. When these conditions are met, observational database studies can reach a sufficient level of evidence to help create guidelines (i.e., clinical and regulatory decision-making).
Kelly Elder; Don Cline; Angus Goodbody; Paul Houser; Glen E. Liston; Larry Mahrt; Nick Rutter
2009-01-01
A short-term meteorological database has been developed for the Cold Land Processes Experiment (CLPX). This database includes meteorological observations from stations designed and deployed exclusively for CLPX, as well as observations available from other sources located in the small regional study area (SRSA) in north-central Colorado. The measured weather parameters...
Students as Ground Observers for Satellite Cloud Retrieval Validation
NASA Technical Reports Server (NTRS)
Chambers, Lin H.; Costulis, P. Kay; Young, David F.; Rogerson, Tina M.
2004-01-01
The Students' Cloud Observations On-Line (S'COOL) Project was initiated in 1997 to obtain student observations of clouds coinciding with the overpass of the Clouds and the Earth's Radiant Energy System (CERES) instruments on NASA's Earth Observing System satellites. Over the past seven years we have accumulated more than 9,000 cases worldwide where student observations are available within 15 minutes of a CERES observation. This paper reports on comparisons between the student and satellite data as one facet of the validation of the CERES cloud retrievals. Available comparisons include cloud cover, cloud height, cloud layering, and cloud visual opacity. The large volume of comparisons allows some assessment of the impact of surface cover, such as snow and ice, reported by the students. The S'COOL observation database, accessible via the Internet at http://scool.larc.nasa.gov, contains over 32,000 student observations and is growing by over 700 observations each month. Some of these observations may be useful for assessment of other satellite cloud products. In particular, some observing sites have been making hourly observations of clouds during the school day to learn about the diurnal cycle of cloudiness.
Fong, Mackenzie; Caterson, Ian D; Madigan, Claire D
2017-10-01
There are suggestions that large evening meals are associated with greater BMI. This study reviewed systematically the association between evening energy intake and weight in adults and aimed to determine whether reducing evening intake achieves weight loss. Databases searched were MEDLINE, PubMed, Cinahl, Web of Science, Cochrane Library of Clinical Trials, EMBASE and SCOPUS. Eligible observational studies investigated the relationship between BMI and evening energy intake. Eligible intervention trials compared weight change between groups where the proportion of evening intake was manipulated. Evening intake was defined as energy consumed during a certain time - for example 18.00-21.00 hours - or self-defined meal slots - that is 'dinner'. The search yielded 121 full texts that were reviewed for eligibility by two independent reviewers. In all, ten observational studies and eight clinical trials were included in the systematic review with four and five included in the meta-analyses, respectively. Four observational studies showed a positive association between large evening intake and BMI, five showed no association and one showed an inverse relationship. The meta-analysis of observational studies showed a non-significant trend between BMI and evening intake (P=0·06). The meta-analysis of intervention trials showed no difference in weight change between small and large dinner groups (-0·89 kg; 95 % CI -2·52, 0·75, P=0·29). This analysis was limited by significant heterogeneity, and many trials had an unknown or high risk of bias. Recommendations to reduce evening intake for weight loss cannot be substantiated by clinical evidence, and more well-controlled intervention trials are needed.
BioCarian: search engine for exploratory searches in heterogeneous biological databases.
Zaki, Nazar; Tennakoon, Chandana
2017-10-02
There are a large number of biological databases publicly available to scientists on the web, as well as many private databases generated in the course of research projects. These databases exist in a wide variety of formats. Web standards have evolved in recent times, and semantic web technologies are now available to interconnect diverse and heterogeneous sources of data. Therefore, the integration and querying of biological databases can be facilitated by techniques used in the semantic web. Heterogeneous databases can be converted into Resource Description Framework (RDF) and queried using the SPARQL language. Searching for exact queries in these databases is trivial; however, exploratory searches need customized solutions, especially when multiple databases are involved. This process is cumbersome and time consuming for those without a sufficient background in computer science. In this context, a search engine facilitating exploratory searches of databases would be of great help to the scientific community. We present BioCarian, an efficient and user-friendly search engine for performing exploratory searches on biological databases. The search engine is an interface for SPARQL queries over RDF databases. We note that many of the databases can be converted to tabular form, so we first convert the tabular databases to RDF. The search engine provides a graphical interface based on facets to explore the converted databases. The facet interface is more advanced than conventional facets: it allows complex queries to be constructed and has additional features, like ranking facet values by several criteria, visually indicating the relevance of a facet value, and presenting the most important facet values when a large number of choices are available. For advanced users, SPARQL queries can be run directly on the databases; using this feature, users can incorporate federated searches of SPARQL endpoints. We used the search engine to do an exploratory search on previously published viral integration data and were able to deduce the main conclusions of the original publication. BioCarian is accessible via http://www.biocarian.com . We have developed a search engine to explore RDF databases that can be used by both novice and advanced users.
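For the advanced-user path described above, a SPARQL query can be run against such an endpoint programmatically. A minimal sketch using the SPARQLWrapper package; the endpoint URL and the ex: vocabulary are placeholders, not BioCarian's actual schema:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# hypothetical RDF endpoint exposing a converted tabular database
endpoint = SPARQLWrapper("http://example.org/sparql")
endpoint.setQuery("""
    PREFIX ex: <http://example.org/schema/>
    SELECT ?gene ?site (COUNT(?event) AS ?n)
    WHERE {
        ?event ex:integrationGene ?gene ;
               ex:integrationSite ?site .
    }
    GROUP BY ?gene ?site
    ORDER BY DESC(?n)
    LIMIT 10
""")
endpoint.setReturnFormat(JSON)

results = endpoint.query().convert()
for row in results["results"]["bindings"]:
    print(row["gene"]["value"], row["site"]["value"], row["n"]["value"])
```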
A Study of the Efficiency of Spatial Indexing Methods Applied to Large Astronomical Databases
NASA Astrophysics Data System (ADS)
Donaldson, Tom; Berriman, G. Bruce; Good, John; Shiao, Bernie
2018-01-01
Spatial indexing of astronomical databases generally uses quadrature methods, which partition the sky into cells used to create an index (usually a B-tree) written as a database column. We report the results of a study comparing the performance of two common indexing methods, HTM and HEALPix, on Solaris and Windows database servers installed with a PostgreSQL database, and on a Windows server installed with MS SQL Server. The indexing was applied to the 2MASS All-Sky Catalog and to the Hubble Source Catalog. On each server, the study compared indexing performance by submitting 1 million queries at each index level, with random sky positions and random cone-search radii computed on a logarithmic scale between 1 arcsec and 1 degree, and measuring the time to complete each query and write the output. These simulated queries, intended to model realistic use patterns, were run in a uniform way on many combinations of indexing method and indexing level. The query times in all simulations are strongly I/O-bound and are linear with the number of records returned for large numbers of sources. There are, however, considerable differences between simulations, which reveal that hardware I/O throughput is a more important factor in managing the performance of a DBMS than the choice of indexing scheme. The choice of index itself is relatively unimportant: for comparable index levels, the performance is consistent within the scatter of the timings. At small index levels (large cells; e.g. level 4, cell size 3.7 deg), there is large scatter in the timings because of wide variations in the number of sources found in the cells. At larger index levels, performance improves and scatter decreases, but the improvement at level 8 (cell size 14 arcmin) and higher is masked to some extent by the timing scatter caused by the range of query sizes. At very high levels (20; 0.0004 arcsec), the granularity of the cells becomes so high that a large number of extraneous empty cells begin to degrade performance. Thus, for the use patterns studied here, database performance is not critically dependent on the exact choice of index or level.
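The indexing scheme itself amounts to computing one extra integer column per source. A hedged sketch of the HEALPix variant using the healpy package (column names and the NSIDE choice are illustrative assumptions, not the study's configuration): a cone search then reduces to an indexed lookup of the cell IDs returned by query_disc, followed by an exact distance filter.

```python
import numpy as np
import healpy as hp

nside = 2 ** 8  # HEALPix resolution parameter; higher nside = smaller cells

# assign each source an index cell from its sky position (degrees)
ra = np.array([10.68, 10.70, 187.70])
dec = np.array([41.27, 41.26, 12.39])
cell = hp.ang2pix(nside, ra, dec, lonlat=True)  # -> integer column to index

# cone search: which cells can contain matches within 30 arcsec of a target?
center = hp.ang2vec(10.69, 41.27, lonlat=True)
radius = np.radians(30.0 / 3600.0)
candidate_cells = hp.query_disc(nside, center, radius, inclusive=True)

# the candidate cells would drive an indexed WHERE ... IN (...) query;
# survivors are then filtered by exact angular distance
hits = np.isin(cell, candidate_cells)
print(cell, hits)
```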
CVD2014-A Database for Evaluating No-Reference Video Quality Assessment Algorithms.
Nuutinen, Mikko; Virtanen, Toni; Vaahteranoksa, Mikko; Vuori, Tero; Oittinen, Pirkko; Hakkinen, Jukka
2016-07-01
In this paper, we present a new video database: CVD2014-Camera Video Database. In contrast to previous video databases, this database uses real cameras rather than introducing distortions via post-processing, which results in a complex distortion space in regard to the video acquisition process. CVD2014 contains a total of 234 videos that are recorded using 78 different cameras. Moreover, this database contains the observer-specific quality evaluation scores rather than only providing mean opinion scores. We have also collected open-ended quality descriptions that are provided by the observers. These descriptions were used to define the quality dimensions for the videos in CVD2014. The dimensions included sharpness, graininess, color balance, darkness, and jerkiness. At the end of this paper, a performance study of image and video quality algorithms for predicting the subjective video quality is reported. For this performance study, we proposed a new performance measure that accounts for observer variance. The performance study revealed that there is room for improvement regarding the video quality assessment algorithms. The CVD2014 video database has been made publicly available for the research community. All video sequences and corresponding subjective ratings can be obtained from the CVD2014 project page (http://www.helsinki.fi/psychology/groups/visualcognition/).
Quantitative approach for optimizing e-beam condition of photoresist inspection and measurement
NASA Astrophysics Data System (ADS)
Lin, Chia-Jen; Teng, Chia-Hao; Cheng, Po-Chung; Sato, Yoshishige; Huang, Shang-Chieh; Chen, Chu-En; Maruyama, Kotaro; Yamazaki, Yuichiro
2018-03-01
The tight process margins in advanced semiconductor technology nodes are controlled by e-beam metrology and e-beam inspection systems based on scanning electron microscopy (SEM) images. With SEM, large-area images of high quality are required to collect massive amounts of data for metrology and to detect defects over a large area for inspection. Although photoresist is one of the critical processes in semiconductor device manufacturing, observing photoresist patterns in SEM images is crucial yet troublesome, especially for large images. The charging effect caused by e-beam irradiation of photoresist patterns degrades image quality, causes CD variation in metrology, and makes it difficult to continue defect inspection for a long time over a large area. In this study, we established a quantitative approach for optimizing the e-beam condition with the "Die to Database" algorithm of NGR3500 on photoresist patterns to minimize the charging effect, and we enhanced measurement and inspection performance on photoresist patterns by using the optimized e-beam condition. NGR3500 is a geometry verification system based on a "Die to Database" algorithm that compares SEM images with design data [1]. By comparing SEM images with design data, key performance indicators (KPIs) of the SEM image, such as sharpness, S/N, gray-level variation in the FOV, and image shift, can be retrieved. These KPIs were analyzed under different e-beam conditions, consisting of landing energy, probe current, scanning speed and scanning method, and the best e-beam condition could be achieved with maximum image quality, maximum scanning speed and minimum image shift. Through this quantitative approach to optimizing the e-beam condition, we could observe the dependence of photoresist charging on the SEM condition. Using the optimized e-beam condition, measurements could be continued stably on photoresist patterns for over 24 hours, and the KPIs of the SEM images confirmed that image quality remained sufficiently stable during measurement and inspection.
Shibata, Natsumi; Kimura, Shinya; Hoshino, Takahiro; Takeuchi, Masato; Urushihara, Hisashi
2018-05-11
To date, few large-scale comparative effectiveness studies of influenza vaccination have been conducted in Japan, since marketing authorization for influenza vaccines in Japan has been granted based only on seroconversion and safety results in small populations during the clinical trial phases, not on vaccine effectiveness. We evaluated the clinical effectiveness of influenza vaccination for children aged 1-15 years in Japan throughout four influenza seasons from 2010 to 2014 in a real-world setting. We conducted a cohort study using a large-scale claims database for employee health care insurance plans covering more than 3 million people, including enrollees and their dependents. Vaccination status was identified using plan records of influenza vaccination subsidies. The effectiveness of influenza vaccination in preventing influenza and its complications was evaluated. To control confounding related to influenza vaccination, odds ratios (ORs) were calculated by applying a doubly robust method using the propensity score for vaccination. The total study population throughout the four consecutive influenza seasons was over 116,000. The vaccination rate was higher in younger children and in the recent influenza seasons. Throughout the four seasons, the estimated ORs for influenza onset were statistically significant and ranged from 0.797 to 0.894 after doubly robust adjustment. On age stratification, significant ORs were observed in younger children. Additionally, ORs for influenza complication outcomes, such as pneumonia, hospitalization with influenza and respiratory tract diseases, were significantly reduced, except for hospitalization with influenza in the 2010/2011 and 2012/2013 seasons. We confirmed the clinical effectiveness of influenza vaccination in children aged 1-15 years from the 2010/2011 to 2013/2014 influenza seasons. Influenza vaccination significantly prevented the onset of influenza and was effective in reducing its secondary complications. Copyright © 2018 Elsevier Ltd. All rights reserved.
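The doubly robust adjustment mentioned above combines a propensity model for vaccination with an outcome model, so the effect estimate remains consistent if either model is correctly specified. A schematic AIPW (augmented inverse-probability weighting) sketch on synthetic data with scikit-learn; this mirrors the general technique, not the study's exact estimator, and all variable names and data are invented:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
n = 20_000
X = rng.normal(size=(n, 3))                       # confounders (age, season, ...)
ps_true = 1 / (1 + np.exp(-X[:, 0]))              # vaccination depends on X
t = rng.binomial(1, ps_true)                      # treatment: vaccinated or not
p_out = 1 / (1 + np.exp(-(-1.0 + 0.5 * X[:, 1] - 0.4 * t)))
y = rng.binomial(1, p_out)                        # outcome: influenza onset

# model 1: propensity of vaccination given confounders
ps = LogisticRegression().fit(X, t).predict_proba(X)[:, 1]
# model 2: outcome regressions under treatment and control
out = LogisticRegression().fit(np.c_[X, t], y)
mu1 = out.predict_proba(np.c_[X, np.ones(n)])[:, 1]
mu0 = out.predict_proba(np.c_[X, np.zeros(n)])[:, 1]

# AIPW estimate of the average effect of vaccination on influenza risk
aipw1 = mu1 + t * (y - mu1) / ps
aipw0 = mu0 + (1 - t) * (y - mu0) / (1 - ps)
print(f"risk difference (vaccinated - unvaccinated): {np.mean(aipw1 - aipw0):+.3f}")
```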
FOUNTAIN: A JAVA open-source package to assist large sequencing projects
Buerstedde, Jean-Marie; Prill, Florian
2001-01-01
Background Better automation, lower cost per reaction and a heightened interest in comparative genomics have led to a dramatic increase in DNA sequencing activities. Although the large sequencing projects of specialized centers are supported by in-house bioinformatics groups, many smaller laboratories face difficulties managing the appropriate processing and storage of their sequencing output. The challenges include documentation of clones, templates and sequencing reactions, and the storage, annotation and analysis of the large number of generated sequences. Results We describe here a new program, named FOUNTAIN, for the management of large sequencing projects. FOUNTAIN uses the JAVA computer language and data storage in a relational database. Starting with a collection of sequencing objects (clones), the program generates and stores information related to the different stages of the sequencing project, using a web browser interface for user input. The generated sequences are subsequently imported and annotated based on BLAST searches against the public databases. In addition, simple algorithms to cluster sequences and determine putative polymorphic positions are implemented. Conclusions A simple but flexible and scalable software package is presented to facilitate data generation and storage for large sequencing projects. Open source and largely platform- and database-independent, we wish FOUNTAIN to be improved and extended in a community effort. PMID:11591214
Chess databases as a research vehicle in psychology: Modeling large data.
Vaci, Nemanja; Bilalić, Merim
2017-08-01
The game of chess has often been used for psychological investigations, particularly in cognitive science. The clear-cut rules and well-defined environment of chess provide a model for investigations of basic cognitive processes, such as perception, memory, and problem solving, while the precise rating system for the measurement of skill has enabled investigations of individual differences and expertise-related effects. In the present study, we focus on another appealing feature of chess, namely, the large archival databases associated with the game. The German national chess database presented in this study represents fruitful ground for the investigation of multiple longitudinal research questions, since it collects the data of over 130,000 players and spans over 25 years. The German chess database collects the data of all players, including hobby players, and all tournaments played. This results in a rich and complete collection of the skill, age, and activity of the whole population of chess players in Germany. The database therefore complements the commonly used expertise approach in cognitive science by opening up new possibilities for the investigation of multiple factors that underlie expertise and skill acquisition. Since large datasets are not common in psychology, their introduction also raises the question of optimal and efficient statistical analysis. We offer the database for download and illustrate how it can be used by providing concrete examples and a step-by-step tutorial using different statistical analyses on a range of topics, including skill development over the lifetime, birth cohort effects, effects of activity and inactivity on skill, and gender differences.
Efficient frequent pattern mining algorithm based on node sets in cloud computing environment
NASA Astrophysics Data System (ADS)
Billa, V. N. Vinay Kumar; Lakshmanna, K.; Rajesh, K.; Reddy, M. Praveen Kumar; Nagaraja, G.; Sudheer, K.
2017-11-01
The ultimate goal of data mining is to determine the hidden, decision-relevant information contained in the large databases collected by an organization. Data mining involves many tasks that are performed during the process; mining frequent itemsets is one of the most important tasks in the case of transactional databases. These transactional databases contain data on a very large scale, and mining them consumes physical memory and time in proportion to the size of the database. A frequent pattern mining algorithm is said to be efficient only if it consumes little memory and time to mine the frequent itemsets from a given large database. With these points in mind, in this thesis we propose a system that mines frequent itemsets in a way that is optimized in terms of memory and time, using cloud computing to parallelize the process and providing the application as a service. The complete framework uses a proven efficient algorithm called the FIN algorithm, which works on Nodesets and a pre-order coding (POC) tree. To evaluate the performance of the system, we conducted experiments comparing the efficiency of the same algorithm applied in a standalone manner and in a cloud computing environment on a real-world dataset of traffic accidents. The results show that the memory consumption and execution time of the proposed system are much lower than those of the standalone system.
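For readers unfamiliar with the task itself, frequent itemset mining finds all item combinations whose support (the fraction of transactions containing them) meets a threshold. A tiny Apriori-style Python sketch for illustration only; FIN's Nodeset/POC-tree machinery achieves the same result far more efficiently:

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    n = len(transactions)
    tx = [frozenset(t) for t in transactions]
    items = {i for t in tx for i in t}
    current = {frozenset([i]) for i in items}  # candidate 1-itemsets
    result = {}
    while current:
        counts = {c: sum(c <= t for t in tx) for c in current}
        kept = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        result.update(kept)
        # candidate (k+1)-itemsets from unions of surviving k-itemsets
        survivors = list(kept)
        current = {a | b for a, b in combinations(survivors, 2)
                   if len(a | b) == len(a) + 1}
    return result

tx = [{"helmet", "bike"}, {"helmet", "car"}, {"helmet", "bike", "lock"}]
for itemset, support in frequent_itemsets(tx, 0.6).items():
    print(set(itemset), round(support, 2))
```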
NASA Astrophysics Data System (ADS)
Aires, Filipe; Miolane, Léo; Prigent, Catherine; Pham Duc, Binh; Papa, Fabrice; Fluet-Chouinard, Etienne; Lehner, Bernhard
2017-04-01
The Global Inundation Extent from Multi-Satellites (GIEMS) dataset provides multi-year monthly variations of the global surface water extent at 25 km x 25 km resolution, derived from multiple satellite observations. Its spatial resolution is usually compatible with climate model outputs and with global land surface model grids, but it is clearly not adequate for local applications that require the characterization of small individual water bodies. There is today a strong demand for high-resolution inundation extent datasets for a large variety of applications, such as water management, regional hydrological modeling, or the analysis of mosquito-related diseases. A new procedure is introduced to downscale the GIEMS low-spatial-resolution inundation data to a 3 arc-second (90 m) dataset. The methodology is based on topography and hydrography information from the HydroSHEDS database. A new floodability index is adopted, and an innovative smoothing procedure is developed to ensure a smooth transition, in the high-resolution maps, between the low-resolution boxes from GIEMS. Topography information is relevant for natural hydrological environments controlled by elevation, but is more limited in human-modified basins. However, the proposed downscaling approach is compatible with a forthcoming fusion with other, more pertinent satellite information in these difficult regions. The resulting GIEMS-D3 database is the only high-spatial-resolution inundation database available globally at the monthly time scale over the 1993-2007 period. GIEMS-D3 is assessed by analyzing its spatial and temporal variability, and evaluated by comparisons with other independent satellite observations from the visible (Google Earth and Landsat), infrared (MODIS) and active microwave (SAR) domains.
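The essence of such topography-based downscaling can be sketched simply: within each low-resolution cell, flag the fine pixels in decreasing order of a floodability index until the coarse-cell inundated fraction is met. A toy Python version; the actual GIEMS-D3 procedure adds the inter-cell smoothing described above, and the index here is random stand-in data:

```python
import numpy as np

def downscale_cell(flood_index, inundated_fraction):
    """Flag the most floodable fine pixels of one coarse cell as inundated.

    flood_index        : (k, k) floodability index of the fine pixels
    inundated_fraction : scalar in [0, 1] from the coarse product (e.g. GIEMS)
    """
    n_wet = int(round(inundated_fraction * flood_index.size))
    order = np.argsort(flood_index, axis=None)[::-1]  # most floodable first
    mask = np.zeros(flood_index.size, dtype=bool)
    mask[order[:n_wet]] = True
    return mask.reshape(flood_index.shape)

rng = np.random.default_rng(3)
index = rng.random((278, 278))       # ~90 m pixels inside one 25 km cell
wet = downscale_cell(index, 0.12)    # coarse cell reported 12% inundated
print(wet.mean())                    # ~0.12 of fine pixels flagged wet
```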
Chousterman, Benjamin G; Pirracchio, Romain; Guidet, Bertrand; Aegerter, Philippe; Mentec, Hervé
2016-01-01
The impact of resident rotation on patient outcomes in the intensive care unit (ICU) has been poorly studied. The aim of this study was to address this question using a large ICU database. We retrospectively analyzed the French CUB-REA database. French residents rotate every six months. Two periods were compared: the first (POST) and fifth (PRE) months of the rotation. The primary endpoint was ICU mortality. The secondary endpoints were the length of ICU stay (LOS), the number of organ supports, and the duration of mechanical ventilation (DMV). The impact of resident rotation was explored using multivariate regression, classification tree and random forest models. 262,772 patients were included in the database between 1996 and 2010. Patient characteristics were similar between the PRE (n = 44,431) and POST (n = 49,979) periods. Multivariate analysis did not reveal any impact of resident rotation on ICU mortality (OR = 1.01, 95% CI = 0.94-1.07, p = 0.91). Based on the classification trees, the SAPS II and the number of organ failures were the strongest predictors of ICU mortality. In the less severe patients (SAPS II < 24), the POST period was associated with increased mortality (OR = 1.65, 95% CI = 1.17-2.33, p = 0.004). After adjustment, no significant association was observed between the rotation period and the LOS, the number of organ supports, or the DMV. Resident rotation exerts no impact on overall ICU mortality at French teaching hospitals but might affect the prognosis of less severe ICU patients. Surveillance should be reinforced when treating those patients.
Ball-Damerow, Joan E.; Oboyski, Peter T.; Resh, Vincent H.
2015-01-01
The recently completed Odonata database for California consists of specimen records from the major entomology collections of the state, large Odonata collections outside of the state, previous literature, historical and recent field surveys, and enthusiast group observations. The database includes 32,025 total records and 19,000 unique records for 106 species of dragonflies and damselflies, with records spanning 1879–2013. Records have been geographically referenced using the point-radius method to assign coordinates and an uncertainty radius to specimen locations. In addition to describing techniques used in data acquisition, georeferencing, and quality control, we present assessments of the temporal, spatial, and taxonomic distribution of records. We use this information to identify biases in the data, and to determine changes in species prevalence, latitudinal ranges, and elevation ranges when comparing records before 1976 and after 1979. The average latitude of records shifted northward by 78 km between these two periods. While average elevation did not change significantly, the average minimum elevation across species declined by 108 m. Odonata distributions may be shifting generally northward as temperatures warm, and to lower minimum elevations in response to increased summer water availability in low-elevation agricultural regions. The unexpected decline in elevation may also be partially the result of bias in recent collections towards centers of human population, which tend to occur at lower elevations. This study emphasizes the need to address temporal, spatial, and taxonomic biases in museum and observational records in order to produce reliable conclusions from such data. PMID:25709531
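For readers unfamiliar with point-radius georeferencing, the toy function below shows the general shape of the method, a coordinate pair plus a single radius that bounds the locational uncertainty; the error components and the way they are combined here are invented for illustration and are not the published georeferencing rules.

```python
def point_radius(lat, lon, extent_km, gps_error_km=0.1, datum_error_km=0.5):
    """Toy point-radius georeference: one best-guess coordinate pair and
    one uncertainty radius intended to bound all error sources. The
    components and their simple summation are illustrative only."""
    radius_km = extent_km + gps_error_km + datum_error_km
    return {"lat": lat, "lon": lon, "uncertainty_km": radius_km}

# Hypothetical locality: a named site whose mapped extent spans ~2 km.
print(point_radius(38.54, -121.74, extent_km=2.0))
```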
NASA Astrophysics Data System (ADS)
Barbieux, Marie; Uitz, Julia; Bricaud, Annick; Organelli, Emanuele; Poteau, Antoine; Schmechtig, Catherine; Gentili, Bernard; Obolensky, Grigor; Leymarie, Edouard; Penkerc'h, Christophe; D'Ortenzio, Fabrizio; Claustre, Hervé
2018-02-01
Characterizing phytoplankton distribution and dynamics in the world's open oceans requires in situ observations over a broad range of space and time scales. In addition to temperature/salinity measurements, Biogeochemical-Argo (BGC-Argo) profiling floats are capable of autonomously observing at high frequency bio-optical properties such as chlorophyll fluorescence, a proxy of the chlorophyll a concentration (Chla); the particulate backscattering coefficient (bbp), a proxy of the stock of particulate organic carbon; and the light available for photosynthesis. We analyzed an unprecedented BGC-Argo database of more than 8,500 multivariable profiles collected in various oceanic conditions, from subpolar waters to subtropical gyres. Our objective is to refine previously established Chla versus bbp relationships and gain insights into the sources of vertical, seasonal, and regional variability in this relationship. Despite some regional, seasonal, and vertical variations, a general covariation occurs at the global scale. We distinguish two main contrasting situations: (1) concomitant changes in Chla and bbp that correspond to actual variations in phytoplankton biomass, e.g., in subpolar regimes; and (2) a decoupling between the two variables attributed to photoacclimation or changes in the relative abundance of nonalgal particles, e.g., in subtropical regimes. The variability in the bbp:Chla ratio in the surface layer appears to be essentially influenced by the type of particles and by photoacclimation processes. The large BGC-Argo database helps identify the spatial and temporal scales at which this ratio is predominantly driven by one or the other of these two factors.
Krischer, Jeffrey P; Gopal-Srivastava, Rashmi; Groft, Stephen C; Eckstein, David J
2014-08-01
Established in 2003 by the Office of Rare Diseases Research (ORDR), in collaboration with several National Institutes of Health (NIH) Institutes/Centers, the Rare Diseases Clinical Research Network (RDCRN) consists of multiple clinical consortia conducting research in more than 200 rare diseases. The RDCRN supports longitudinal or natural history, pilot, Phase I, II, and III, case-control, cross-sectional, chart review, physician survey, bio-repository, and RDCRN Contact Registry (CR) studies. To date, there have been 24,684 participants enrolled on 120 studies from 446 sites worldwide. An additional 11,533 individuals participate in the CR. Through a central data management and coordinating center (DMCC), the RDCRN's platform for the conduct of observational research encompasses electronic case report forms, federated databases, and an online CR for epidemiological and survey research. An ORDR-governed data repository (through dbGaP, a database for genotype and phenotype information from the National Library of Medicine) has been created. DMCC coordinates with ORDR to register and upload study data to dbGaP for data sharing with the scientific community. The platform provided by the RDCRN DMCC has supported 128 studies, six of which were successfully conducted through the online CR, with 2,352 individuals accrued and a median enrollment time of just 2 months. The RDCRN has built a powerful suite of web-based tools that provide for integration of federated and online database support that can accommodate a large number of rare diseases on a global scale. RDCRN studies have made important advances in the diagnosis and treatment of rare diseases.
NASA Astrophysics Data System (ADS)
Garcia Menendez, F.; Afrin, S.
2017-12-01
Prescribed fires are used extensively across the Southeastern United States and are a major source of air pollutant emissions in the region. These land management projects can adversely impact local and regional air quality. However, the emissions and air pollution impacts of prescribed fires remain largely uncertain. Satellite data, commonly used to estimate fire emissions, are often unable to detect the low-intensity, short-lived prescribed fires characteristic of the region. Additionally, existing ground-based prescribed burn records are incomplete, inconsistent, and scattered. Here we present a new unified database of prescribed fire occurrence and characteristics developed from systematized digital burn permit records collected from public and private land management organizations in the Southeast. This bottom-up fire database is used to analyze the correlation between high PM2.5 concentrations measured by monitoring networks in southern states and prescribed fire occurrence at varying spatial and temporal scales. We show significant associations between ground-based records of prescribed fire activity and the observational air quality record at numerous sites by applying regression analysis and controlling for the confounding effects of meteorology. Furthermore, we demonstrate that the response of measured PM2.5 concentrations to prescribed fire estimates based on burning permits is significantly stronger than their response to satellite fire observations from MODIS (Moderate Resolution Imaging Spectroradiometer) and geostationary satellites or to prescribed fire emissions data in the National Emissions Inventory. These results show the importance of bottom-up smoke emissions estimates and reflect the need for improved ground-based fire data to advance air quality impact assessments focused on prescribed burning.
Intermediate Palomar Transient Factory: Realtime Image Subtraction Pipeline
Cao, Yi; Nugent, Peter E.; Kasliwal, Mansi M.
2016-09-28
A fast-turnaround pipeline for realtime data reduction plays an essential role in discovering, and permitting follow-up observations of, young supernovae and fast-evolving transients in modern time-domain surveys. In this paper, we present the realtime image subtraction pipeline in the intermediate Palomar Transient Factory. By using high-performance computing, efficient databases, and machine-learning algorithms, this pipeline manages to reliably deliver transient candidates within 10 minutes of images being taken. Our experience in using high-performance computing resources to process big data in astronomy serves as a trailblazer for dealing with data from large-scale time-domain facilities in the near future.
NASA Astrophysics Data System (ADS)
Hadamcik, E.; Renard, J.; Levasseur-Regourd, A. C.; Worms, J. C.
Polarimetric phase curves were obtained with the PROGRA2 instrument for different particle types: glass beads, polyhedral solids, rough particles, dense aggregates, and aggregates with porosity higher than 90%. The main purpose of these measurements is to build a large database that allows interpretation of remote sensing observations of solar system bodies. For some samples, numerical or experimental models (e.g., DDA, stochastically built particles, microwave analogues) and laboratory experiments are compared to better disentangle the physical properties involved. This paper gives the main results of the experiment and their applications to the Earth's atmosphere, comets, and asteroids.
VIEWCACHE: An incremental pointer-based access method for autonomous interoperable databases
NASA Technical Reports Server (NTRS)
Roussopoulos, N.; Sellis, Timos
1992-01-01
One of the biggest problems facing NASA today is to provide scientists efficient access to a large number of distributed databases. Our pointer-based incremental database access method, VIEWCACHE, provides such an interface for accessing distributed data sets and directories. VIEWCACHE allows database browsing and searching, performing inter-database cross-referencing with no actual data movement between database sites. This organization and processing is especially suitable for managing astrophysics databases which are physically distributed all over the world. Once the search is complete, the set of collected pointers to the desired data is cached. VIEWCACHE includes spatial access methods for accessing image data sets, which provide much easier query formulation by referring directly to the image and a very efficient search for objects contained within a two-dimensional window. We will develop and optimize a VIEWCACHE External Gateway Access to database management systems to facilitate distributed database search.
Database systems for knowledge-based discovery.
Jagarlapudi, Sarma A R P; Kishan, K V Radha
2009-01-01
Several database systems have been developed to provide valuable information, from the bench chemist to the biologist and from the medical practitioner to the pharmaceutical scientist, in a structured format. The advent of information technology and computational power enhanced the ability to access large volumes of data in the form of a database, where one could do compilation, searching, archiving, analysis, and finally knowledge derivation. Although data are of variable types, the tools used for database creation, searching, and retrieval are similar. GVK BIO has been developing databases from publicly available scientific literature in specific areas like medicinal chemistry, clinical research, and mechanism-based toxicity so that the structured databases containing vast data can be used in several areas of research. These databases are classified as reference-centric or compound-centric depending on the way the database systems were designed. Integration of these databases with knowledge derivation tools would enhance the value of these systems toward better drug design and discovery.
ERIC Educational Resources Information Center
Freeman, Carla; And Others
In order to understand how database software or online databases functioned in the overall curricula, the use of database management (DBM) systems was studied at eight elementary and middle schools through classroom observation and interviews with teachers, administrators, librarians, and students. Three overall areas were addressed:…
Changing climate shifts timing of European floods.
Blöschl, Günter; Hall, Julia; Parajka, Juraj; Perdigão, Rui A P; Merz, Bruno; Arheimer, Berit; Aronica, Giuseppe T; Bilibashi, Ardian; Bonacci, Ognjen; Borga, Marco; Čanjevac, Ivan; Castellarin, Attilio; Chirico, Giovanni B; Claps, Pierluigi; Fiala, Károly; Frolova, Natalia; Gorbachova, Liudmyla; Gül, Ali; Hannaford, Jamie; Harrigan, Shaun; Kireeva, Maria; Kiss, Andrea; Kjeldsen, Thomas R; Kohnová, Silvia; Koskela, Jarkko J; Ledvinka, Ondrej; Macdonald, Neil; Mavrova-Guirguinova, Maria; Mediero, Luis; Merz, Ralf; Molnar, Peter; Montanari, Alberto; Murphy, Conor; Osuch, Marzena; Ovcharuk, Valeryia; Radevski, Ivan; Rogger, Magdalena; Salinas, José L; Sauquet, Eric; Šraj, Mojca; Szolgay, Jan; Viglione, Alberto; Volpi, Elena; Wilson, Donna; Zaimi, Klodian; Živković, Nenad
2017-08-11
A warming climate is expected to have an impact on the magnitude and timing of river floods; however, no consistent large-scale climate change signal in observed flood magnitudes has been identified so far. We analyzed the timing of river floods in Europe over the past five decades, using a pan-European database from 4262 observational hydrometric stations, and found clear patterns of change in flood timing. Warmer temperatures have led to earlier spring snowmelt floods throughout northeastern Europe; delayed winter storms associated with polar warming have led to later winter floods around the North Sea and some sectors of the Mediterranean coast; and earlier soil moisture maxima have led to earlier winter floods in western Europe. Our results highlight the existence of a clear climate signal in flood observations at the continental scale. Copyright © 2017 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
On the Photometric Calibration of FORS2 and the Sloan Digital Sky Survey
NASA Astrophysics Data System (ADS)
Bramich, D.; Moehler, S.; Coccato, L.; Freudling, W.; Garcia-Dabó, C. E.; Müller, P.; Saviane, I.
2012-09-01
An accurate absolute calibration of photometric data to place them on a standard magnitude scale is very important for many science goals. Absolute calibration requires the observation of photometric standard stars and analysis of the observations with an appropriate photometric model including all relevant effects. In the FORS Absolute Photometry (FAP) project, we have developed a standard star observing strategy and modelling procedure that enables calibration of science target photometry to better than 3% accuracy on photometrically stable nights given sufficient signal-to-noise. In the application of this photometric modelling to large photometric databases, we have investigated the Sloan Digital Sky Survey (SDSS) and found systematic trends in the published photometric data. The amplitudes of these trends are similar to the reported typical precision (~1% and ~2%) of the SDSS photometry in the griz- and u-bands, respectively.
Verberkmoes, Nathan C; Hervey, W Judson; Shah, Manesh; Land, Miriam; Hauser, Loren; Larimer, Frank W; Van Berkel, Gary J; Goeringer, Douglas E
2005-02-01
There is currently a great need for rapid detection and positive identification of biological threat agents, as well as microbial species in general, directly from complex environmental samples. This need is most urgent in the area of homeland security, but also extends into the medical, environmental, and agricultural sciences. Mass-spectrometry-based analysis is one of the leading technologies in the field, with a diversity of different methodologies for biothreat detection. Over the past few years, "shotgun" proteomics has become one method of choice for the rapid analysis of complex protein mixtures by mass spectrometry. Recently, it was demonstrated that this methodology is capable of distinguishing a target species against a large database of background species from a single-component sample or from dual-component mixtures in which both components are present at roughly equal concentrations. Here, we examine the potential of shotgun proteomics to analyze a target species in a background of four contaminant species. We tested the capability of a common commercial mass-spectrometry-based shotgun proteomics platform for the detection of the target species (Escherichia coli) at four different concentrations and four different analysis times. We also tested the effect of database size on positive identification of the four microbes used in this study by testing a small (13-species) database and a large (261-species) database. The results clearly indicated that this technology could easily identify the target species at 20% in the background mixture at a 60, 120, 180, or 240 min analysis time with the small database. The results also indicated that the target species could easily be identified at 20% or 6%, but could not be identified at 0.6% or 0.06%, in either a 240 min analysis or a 30 h analysis with the small database. The effect of the large database was severe: detection of the target species above the background was impossible at any concentration used in this study, though the three other microbes were clearly identified above the background when analyzed with the large database. This study points to the potential application of this technology for biological threat agent detection but highlights many areas of needed research before the technology will be useful in real-world samples.
Generating Shifting Workloads to Benchmark Adaptability in Relational Database Systems
NASA Astrophysics Data System (ADS)
Rabl, Tilmann; Lang, Andreas; Hackl, Thomas; Sick, Bernhard; Kosch, Harald
A large body of research concerns the adaptability of database systems. Many commercial systems already contain autonomic processes that adapt configurations as well as data structures and data organization. Yet there is virtually no possibility for a fair measurement of the quality of such optimizations. While standard benchmarks have been developed that simulate real-world database applications very precisely, none of them considers variations in workloads produced by human factors. Today's benchmarks test the performance of database systems by measuring peak performance on homogeneous request streams. Nevertheless, in systems with user interaction, access patterns are constantly shifting. We present a benchmark that simulates a web information system with interaction of large user groups. It is based on the analysis of a real online eLearning management system with 15,000 users. The benchmark considers the temporal dependency of user interaction. Its main focus is to measure the adaptability of a database management system under shifting workloads. We will give details on our design approach, which uses sophisticated pattern analysis and data mining techniques.
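A workload of this kind can be approximated by a request stream whose "hot set" of items drifts over time. The generator below is a minimal sketch under assumed parameters, not the benchmark's actual generator, which is derived from pattern analysis of the real eLearning traces.

```python
import random
from collections import Counter

def shifting_workload(n_items=1000, n_requests=10000, hot_size=50, shift_every=2000):
    """Illustrative non-stationary request stream: most requests target a
    small hot set of items, and the hot set drifts over time, mimicking
    user groups whose interests shift. All parameters are invented."""
    items = list(range(n_items))
    hot = random.sample(items, hot_size)
    for i in range(n_requests):
        if i % shift_every == 0 and i > 0:
            # Replace half of the hot set to shift the access pattern.
            hot = hot[hot_size // 2:] + random.sample(items, hot_size // 2)
        if random.random() < 0.8:          # 80% of traffic hits the hot set
            yield random.choice(hot)
        else:
            yield random.choice(items)

print(Counter(shifting_workload()).most_common(5))
```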
Wang, Penghao; Wilson, Susan R
2013-01-01
Mass spectrometry-based protein identification is a very challenging task. The main identification approaches include de novo sequencing and database searching. Both approaches have shortcomings, so an integrative approach has been developed. The integrative approach first infers partial peptide sequences, known as tags, directly from tandem spectra through de novo sequencing, and then puts these sequences into a database search to see if a close peptide match can be found. However, the current implementation of this integrative approach has several limitations. First, simplistic de novo sequencing is applied and only very short sequence tags are used. Second, most integrative methods apply an algorithm similar to BLAST to search for exact sequence matches and do not accommodate sequence errors well. Third, with these methods the integrated de novo sequencing makes a limited contribution to the scoring model, which is still largely based on database searching. We have developed a new integrative protein identification method which can integrate de novo sequencing more efficiently into database searching. Evaluated on large real datasets, our method outperforms popular identification methods.
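The core of such an integrative step, matching a de novo tag against database sequences while tolerating sequencing errors, can be sketched as follows; the linear scan, the mismatch budget, and the toy database are illustrative choices, not the authors' implementation.

```python
def tag_search(tag, database, max_mismatches=1):
    """Illustrative tag-to-database matching step of an integrative
    identification pipeline: scan each database sequence for windows
    that match a de novo sequence tag within a mismatch budget, so
    that de novo errors do not immediately discard a true peptide."""
    hits = []
    for name, seq in database.items():
        for i in range(len(seq) - len(tag) + 1):
            window = seq[i:i + len(tag)]
            mismatches = sum(a != b for a, b in zip(tag, window))
            if mismatches <= max_mismatches:
                hits.append((name, i, mismatches))
    return hits

# Hypothetical protein database; the tag 'VNAGR' carries one de novo error
# relative to the P2 sequence but is still recovered.
db = {"P1": "MKLVVNAGRQTS", "P2": "GGAVNAGKLLRD"}
print(tag_search("VNAGR", db))
```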
2009-01-01
Background: Insertional mutagenesis is an effective method for functional genomic studies in various organisms. It can rapidly generate easily tractable mutations. A large-scale insertional mutagenesis with the piggyBac (PB) transposon is currently being performed in mice at the Institute of Developmental Biology and Molecular Medicine (IDM), Fudan University in Shanghai, China. This project is carried out via collaborations among multiple groups overseeing interconnected experimental steps and generates a large volume of experimental data continuously. Therefore, the project calls for an efficient database system for recording, management, statistical analysis, and information exchange. Results: This paper presents a database application called MP-PBmice (insertional mutation mapping system of PB Mutagenesis Information Center), which is developed to serve the on-going large-scale PB insertional mutagenesis project. A lightweight enterprise-level development framework, Struts-Spring-Hibernate, is used here to ensure constructive and flexible support to the application. The MP-PBmice database system has three major features: strict access control, efficient workflow control, and good expandability. It supports the collaboration among different groups that enter data and exchange information on a daily basis, and is capable of providing real-time progress reports for the whole project. MP-PBmice can be easily adapted for other large-scale insertional mutation mapping projects, and the source code of this software is freely available at http://www.idmshanghai.cn/PBmice. Conclusion: MP-PBmice is a web-based application for large-scale insertional mutation mapping onto the mouse genome, implemented with the widely used Struts-Spring-Hibernate framework. This system is already in use by the on-going genome-wide PB insertional mutation mapping project at IDM, Fudan University. PMID:19958505
Searching mixed DNA profiles directly against profile databases.
Bright, Jo-Anne; Taylor, Duncan; Curran, James; Buckleton, John
2014-03-01
DNA databases have revolutionised forensic science. They are a powerful investigative tool, as they have the potential to identify persons of interest in criminal investigations. Routinely, a DNA profile generated from a crime sample could only be searched for in a database of individuals if the stain was from a single contributor (single source) or if a contributor could unambiguously be determined from a mixed DNA profile. This meant that a significant number of samples were unsuitable for database searching. The advent of continuous methods for the interpretation of DNA profiles offers an advanced way to draw inferential power from the considerable investment made in DNA databases. Using these methods, each profile on the database may be considered a possible contributor to a mixture and a likelihood ratio (LR) can be formed. Those profiles which produce a sufficiently large LR can serve as an investigative lead. In this paper, empirical studies are described to determine what constitutes a large LR. We investigate the effect on a database search of complex mixed DNA profiles with contributors in equal proportions, with dropout as a consideration, and also the effect of an incorrect assignment of the number of contributors to a profile. In addition, as a demonstration of the method, we give the results for two crime samples that were previously unsuitable for database comparison. We show that effective management of the selection of samples for searching and the interpretation of the output can be highly informative. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
A Multi-Purpose Data Dissemination Infrastructure for the Marine-Earth Observations
NASA Astrophysics Data System (ADS)
Hanafusa, Y.; Saito, H.; Kayo, M.; Suzuki, H.
2015-12-01
To open up data from a variety of observations, the Japan Agency for Marine-Earth Science and Technology (JAMSTEC) has developed a multi-purpose data dissemination infrastructure. Although many observations have been made in the earth sciences, not all of the data are fully open. We think data centers can provide researchers with a universal data dissemination service which can handle various kinds of observation data with little effort. For this purpose, the JAMSTEC Data Management Office has developed the Information Catalog Infrastructure System (Catalog System). This is a catalog management system which can create, renew, and delete catalogs (= databases) and has the following features: the Catalog System does not depend on data types or the granularity of data records; by registering a new metadata schema to the system, a new database can be created on the same system without system modification; as web pages are defined by cascading style sheets, databases can have a different look and feel, and different operability; the Catalog System provides databases with basic search tools (search by text, selection from a category tree, and selection from a timeline chart); and, for domestic users, it creates Japanese and English pages at the same time and has a dictionary to control terminology and proper nouns. As of August 2015, JAMSTEC operates 7 databases on the Catalog System. We expect to transfer existing databases to this system, or to create new databases on it. In comparison with a dedicated database developed for a specific dataset, the Catalog System is suitable for the dissemination of small datasets with minimum cost. Metadata held in the catalogs may be transferred to other metadata schemas for exchange with global databases or portals. Examples: JAMSTEC Data Catalog: http://www.godac.jamstec.go.jp/catalog/data_catalog/metadataList?lang=en; JAMSTEC Document Catalog: http://www.godac.jamstec.go.jp/catalog/doc_catalog/metadataList?lang=en&tab=category; Research Information and Data Access Site of TEAMS: http://www.i-teams.jp/catalog/rias/metadataList?lang=en&tab=list
National Institute of Standards and Technology Data Gateway
SRD 115 Hydrocarbon Spectral Database (Web, free access) All of the rotational spectral lines observed and reported in the open literature for 91 hydrocarbon molecules have been tabulated. The isotopic molecular species, assigned quantum numbers, observed frequency, estimated measurement uncertainty and reference are given for each transition reported.
National Institute of Standards and Technology Data Gateway
SRD 114 Diatomic Spectral Database (Web, free access) All of the rotational spectral lines observed and reported in the open literature for 121 diatomic molecules have been tabulated. The isotopic molecular species, assigned quantum numbers, observed frequency, estimated measurement uncertainty, and reference are given for each transition reported.
National Institute of Standards and Technology Data Gateway
SRD 117 Triatomic Spectral Database (Web, free access) All of the rotational spectral lines observed and reported in the open literature for 55 triatomic molecules have been tabulated. The isotopic molecular species, assigned quantum numbers, observed frequency, estimated measurement uncertainty and reference are given for each transition reported.
Impact of derived global weather data on simulated crop yields
van Wart, Justin; Grassini, Patricio; Cassman, Kenneth G
2013-01-01
Crop simulation models can be used to estimate impact of current and future climates on crop yields and food security, but require long-term historical daily weather data to obtain robust simulations. In many regions where crops are grown, daily weather data are not available. Alternatively, gridded weather databases (GWD) with complete terrestrial coverage are available, typically derived from: (i) global circulation computer models; (ii) interpolated weather station data; or (iii) remotely sensed surface data from satellites. The present study's objective is to evaluate capacity of GWDs to simulate crop yield potential (Yp) or water-limited yield potential (Yw), which can serve as benchmarks to assess impact of climate change scenarios on crop productivity and land use change. Three GWDs (CRU, NCEP/DOE, and NASA POWER data) were evaluated for their ability to simulate Yp and Yw of rice in China, USA maize, and wheat in Germany. Simulations of Yp and Yw based on recorded daily data from well-maintained weather stations were taken as the control weather data (CWD). Agreement between simulations of Yp or Yw based on CWD and those based on GWD was poor with the latter having strong bias and large root mean square errors (RMSEs) that were 26–72% of absolute mean yield across locations and years. In contrast, simulated Yp or Yw using observed daily weather data from stations in the NOAA database combined with solar radiation from the NASA-POWER database were in much better agreement with Yp and Yw simulated with CWD (i.e. little bias and an RMSE of 12–19% of the absolute mean). We conclude that results from studies that rely on GWD to simulate agricultural productivity in current and future climates are highly uncertain. An alternative approach would impose a climate scenario on location-specific observed daily weather databases combined with an appropriate upscaling method. PMID:23801639
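The agreement metrics quoted above can be reproduced in a few lines; the function below computes bias and RMSE as a percentage of the mean control-simulation yield, with hypothetical yield numbers standing in for the study's site-years.

```python
import numpy as np

def agreement_stats(y_control, y_gridded):
    """Bias and RMSE expressed as a share of the mean control yield,
    mirroring how agreement between simulations driven by station
    (control) and gridded weather data is reported."""
    err = np.asarray(y_gridded) - np.asarray(y_control)
    bias = err.mean()
    rmse = np.sqrt((err ** 2).mean())
    mean_y = np.mean(y_control)
    return bias / mean_y * 100, rmse / mean_y * 100

# Hypothetical yields (t/ha) at five site-years:
bias_pct, rmse_pct = agreement_stats([8.1, 9.4, 7.6, 10.2, 8.8],
                                     [9.5, 8.0, 9.9, 12.0, 10.4])
print(f"bias {bias_pct:+.1f}%, RMSE {rmse_pct:.1f}% of mean yield")
```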
The dynamics of innovation through the expansion in the adjacent possible
NASA Astrophysics Data System (ADS)
Tria, F.
2016-03-01
The experience of something new is part of our daily life. At different scales, innovation is also a crucial feature of many biological, technological, and social systems. Recently, large databases witnessing human activities have allowed the observation that novelties, such as an individual listening to a song for the first time, and innovation processes, such as the fixation of new genes in a population of bacteria, share striking statistical regularities. We here indicate the expansion into the adjacent possible as a very general and powerful mechanism able to explain such regularities. Further, we identify statistical signatures of the presence of the expansion into the adjacent possible in the analyzed datasets, and we show that our modeling scheme is able to predict these observations remarkably well.
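One concrete formalization of this mechanism is a Polya urn with innovation triggering (in the spirit of the urn model of Tria and co-workers): drawing a never-seen color injects brand-new colors into the urn, so each novelty enlarges the space of reachable novelties. The sketch below, with illustrative parameters, reproduces the sublinear (Heaps-like) growth of distinct elements.

```python
import random

def urn_with_triggering(steps=5000, rho=2, nu=1, seed=1):
    """Minimal sketch of a Polya urn with innovation triggering: each
    time a color is drawn for the first time, nu+1 brand-new colors
    enter the urn, expanding the adjacent possible. Parameters are
    illustrative only."""
    random.seed(seed)
    urn = [0, 1]                 # start with two distinct colors
    next_color = 2
    seen = set()
    novelties = []
    for _ in range(steps):
        ball = random.choice(urn)
        urn.extend([ball] * rho)          # reinforcement of the drawn color
        if ball not in seen:              # a novelty...
            seen.add(ball)
            urn.extend(range(next_color, next_color + nu + 1))
            next_color += nu + 1          # ...expands the adjacent possible
        novelties.append(len(seen))
    return novelties

d = urn_with_triggering()
print(d[499], d[4999])  # distinct colors grow sublinearly (Heaps-like)
```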
Practice databases and their uses in clinical research.
Tierney, W M; McDonald, C J
1991-04-01
A few large clinical information databases have been established within larger medical information systems. Although they are smaller than claims databases, these clinical databases offer several advantages: accurate and timely data, rich clinical detail, and continuous parameters (for example, vital signs and laboratory results). However, the nature of the data varies considerably, which affects the kinds of secondary analyses that can be performed. These databases have been used to investigate clinical epidemiology, risk assessment, post-marketing surveillance of drugs, practice variation, resource use, quality assurance, and decision analysis. In addition, practice databases can be used to identify subjects for prospective studies. Further methodologic developments are necessary to deal with the prevalent problems of missing data and various forms of bias if such databases are to grow and contribute valuable clinical information.
Makhoul, Issam; Yacoub, Abdulraheem; Siegel, Eric
2016-01-01
The etiology of pancreatic cancer remains elusive. Several studies have suggested a role for diabetes mellitus, but the magnitude of its contribution remains controversial. Utilizing a large administrative database, this retrospective cohort study was designed to investigate the relationship between type 2 diabetes mellitus and pancreatic cancer. Using the Veterans Integrated Services Network 16 database, 322,614 subjects were enrolled in the study, including 110,919 with type 2 diabetes mellitus and 211,695 diabetes-free controls matched by gender, year of birth, and healthcare facility. A significantly higher incidence of pancreatic cancer was observed in patients with type 2 diabetes mellitus, with an adjusted hazard ratio (95% confidence interval) of 2.17 (1.70-2.77) compared to controls (p < 10^-9) after controlling for the matching factors. The association between type 2 diabetes mellitus and pancreatic cancer was statistically significant and may, in part, explain the rising incidence of pancreatic cancer.
Interaction of birth order, handedness, and sexual orientation in the Kinsey interview data.
Bogaert, Anthony F; Blanchard, Ray; Crosthwait, Lesley E
2007-10-01
Recent evidence indicates that two of the most consistently observed correlates of men's sexual orientation (handedness and older brothers) may be linked interactively in their prediction of men's sexual orientation. In this article, the authors studied the relationship among handedness, older brothers, and men's sexual orientation in the large and historically significant database originally compiled by Alfred C. Kinsey and his colleagues (A. C. Kinsey, W. B. Pomeroy, & C. E. Martin, 1948). The results demonstrated that handedness moderates the relationship between older brothers and sexual orientation. Specifically, older brothers increased the odds of homosexuality in right-handers only; in non-right-handers, older brothers did not affect the odds of homosexuality. These results refine the possible biological explanations reported to underlie both the handedness and older-brother relationships to men's sexual orientation. These results also suggest that biological explanations of men's sexual orientation are likely relevant across time, as the Kinsey data comprise an older cohort relative to modern samples. (PsycINFO Database Record (c) 2007 APA, all rights reserved).
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lees, R. M.; Xu, Li-Hong; Appadoo, D. R. T.
New astronomical facilities such as HIFI on the Herschel Space Observatory, the SOFIA airborne IR telescope, and the ALMA sub-mm telescope array will yield spectra from interstellar and protostellar sources with vastly increased sensitivity and frequency coverage. This creates the need for major enhancements to laboratory databases for the more prominent interstellar 'weed' species in order to model and account for their lines in observed spectra in the search for new and more exotic interstellar molecular 'flowers'. With its large-amplitude internal torsional motion, methanol has particularly rich spectra throughout the FIR and IR regions and, being very widely distributed throughout the galaxy, is perhaps the most notorious interstellar weed. Thus, we have recorded new spectra for a variety of methanol isotopic species on the high-resolution FTIR spectrometer on the CLS FIR beamline. The aim is to extend quantum number coverage of the data, improve our understanding of the energy level structure, and provide the astronomical community with better databases and models of the spectral patterns with greater predictive power for a range of astrophysical conditions.
An Improved Algorithm to Generate a Wi-Fi Fingerprint Database for Indoor Positioning
Chen, Lina; Li, Binghao; Zhao, Kai; Rizos, Chris; Zheng, Zhengqi
2013-01-01
The major problem of Wi-Fi fingerprint-based positioning technology is the signal strength fingerprint database creation and maintenance. The significant temporal variation of received signal strength (RSS) is the main factor responsible for the positioning error. A probabilistic approach can be used, but the RSS distribution is required. The Gaussian distribution or an empirically-derived distribution (histogram) is typically used. However, these distributions are either not always correct or require a large amount of data for each reference point. Double peaks of the RSS distribution have been observed in experiments at some reference points. In this paper a new algorithm based on an improved double-peak Gaussian distribution is proposed. Kurtosis testing is used to decide if this new distribution, or the normal Gaussian distribution, should be applied. Test results show that the proposed algorithm can significantly improve the positioning accuracy, as well as reduce the workload of the off-line data training phase. PMID:23966197
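The decision step of the proposed algorithm can be sketched as follows; the kurtosis threshold and the crude half-split fit of the two peaks are assumptions for illustration, not the paper's exact estimator.

```python
import numpy as np
from scipy.stats import kurtosis

def fit_rss_distribution(rss, kurtosis_threshold=-0.5):
    """Sketch of the decision step: if the RSS sample looks platykurtic
    (flat or double-peaked, i.e. negative excess kurtosis), fit a
    two-component model instead of a single Gaussian."""
    rss = np.sort(np.asarray(rss, dtype=float))
    if kurtosis(rss) < kurtosis_threshold:
        lo, hi = rss[: len(rss) // 2], rss[len(rss) // 2 :]
        return ("double-peak", (lo.mean(), lo.std()), (hi.mean(), hi.std()))
    return ("single", (rss.mean(), rss.std()))

# Synthetic bimodal RSS sample (dBm) such as the double peaks observed
# at some reference points:
sample = np.concatenate([np.random.normal(-70, 2, 200),
                         np.random.normal(-58, 2, 200)])
print(fit_rss_distribution(sample)[0])   # expected: 'double-peak'
```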
Phagonaute: A web-based interface for phage synteny browsing and protein function prediction.
Delattre, Hadrien; Souiai, Oussema; Fagoonee, Khema; Guerois, Raphaël; Petit, Marie-Agnès
2016-09-01
Distant homology search tools are of great help in predicting viral protein functions. However, due to the lack of profile databases dedicated to viruses, they can lack sensitivity. We constructed HMM profiles for more than 80,000 proteins from both phages and archaeal viruses, and performed all pairwise comparisons with the HHsearch program. The whole resulting database can be explored through a user-friendly "Phagonaute" interface to help predict functions. Results are displayed together with their genetic context, to strengthen inferences based on remote homology. Beyond function prediction, this tool permits detection of co-occurrences, often indicative of proteins completing a task together, and observation of conserved patterns across large evolutionary distances. As a test, Herpes simplex virus I was added to Phagonaute, and 25% of its proteome matched bacterial or archaeal viral protein counterparts. Phagonaute should therefore help virologists in their quest for protein functions and evolutionary relationships. Copyright © 2016 Elsevier Inc. All rights reserved.
Development of the Global Earthquake Model’s neotectonic fault database
Christophersen, Annemarie; Litchfield, Nicola; Berryman, Kelvin; Thomas, Richard; Basili, Roberto; Wallace, Laura; Ries, William; Hayes, Gavin P.; Haller, Kathleen M.; Yoshioka, Toshikazu; Koehler, Richard D.; Clark, Dan; Wolfson-Schwehr, Monica; Boettcher, Margaret S.; Villamor, Pilar; Horspool, Nick; Ornthammarath, Teraphan; Zuñiga, Ramon; Langridge, Robert M.; Stirling, Mark W.; Goded, Tatiana; Costa, Carlos; Yeats, Robert
2015-01-01
The Global Earthquake Model (GEM) aims to develop uniform, openly available standards, datasets, and tools for worldwide seismic risk assessment through global collaboration, transparent communication, and adaptation of state-of-the-art science. GEM Faulted Earth (GFE) is one of GEM's global hazard module projects. This paper describes GFE's development of a modern neotectonic fault database and a unique graphical interface for the compilation of new fault data. A key design principle is that of an electronic field notebook for capturing the observations a geologist would make about a fault. The database is designed to accommodate abundant as well as sparse fault observations. It features two layers, one for capturing neotectonic fault and fold observations, and the other for calculating potential earthquake fault sources from the observations. In order to test the flexibility of the database structure and to start a global compilation, five preexisting databases have been uploaded to the first layer and two to the second. In addition, the GFE project has characterised the world's approximately 55,000 km of subduction interfaces in a globally consistent manner as a basis for generating earthquake event sets for inclusion in earthquake hazard and risk modelling. Following the subduction interface fault schema and including the trace attributes of the GFE database schema, the 2500-km-long frontal thrust fault system of the Himalaya has also been characterised. We propose that the database structure be used widely, so that neotectonic fault data can make a more complete and beneficial contribution to seismic hazard and risk characterisation globally.
DynAstVO : a Europlanet database of NEA orbits
NASA Astrophysics Data System (ADS)
Desmars, J.; Thuillot, W.; Hestroffer, D.; David, P.; Le Sidaner, P.
2017-09-01
DynAstVO is a new orbital database developed within the Europlanet 2020 RI and the Virtual European Solar and Planetary Access (VESPA) frameworks. The database is dedicated to near-Earth asteroids and provides parameters related to orbits: osculating elements, observational information, ephemerides through SPICE kernels, and, in particular, orbit uncertainty and the associated covariance matrix. DynAstVO is updated daily by an automatic orbit-determination process based on the Minor Planet Electronic Circulars that report new observations or the discovery of new asteroids. This database conforms to the EPN-TAP environment and is accessible through VO protocols and on the VESPA portal web access (http://vespa.obspm.fr/). A comparison with other classical databases such as Astorb, MPCORB, NEODyS, and JPL is also presented.
Attenuation relation for strong motion in Eastern Java based on appropriate database and method
NASA Astrophysics Data System (ADS)
Mahendra, Rian; Rohadi, Supriyanto; Rudyanto, Ariska
2017-07-01
The selection and determination of attenuation relations has become important for seismic hazard assessment in active seismic regions. This research initially constructs an appropriate strong motion database, including site condition and earthquake type. The data set consists of a large number of earthquakes with 5 ≤ Mw ≤ 9 and distances less than 500 km that occurred around Java from 2009 until 2016. Earthquake locations and depths were relocated using the double-difference method to improve the quality of the database. Strong motion data from twelve BMKG accelerographs located in East Java were used. Site conditions were characterized using the dominant period and Vs30. Earthquakes were classified into crustal, interface, and intraslab events based on slab geometry analysis. A total of 10 ground motion prediction equations (GMPEs) were tested against the database using the likelihood method (Scherbaum et al., 2004) and the Euclidean distance ranking method (Kale and Akkar, 2012). The evaluation leads to a set of GMPEs that can be applied for seismic hazard in East Java, where the strong motion data were collected. Because these methods still showed high deviations for the GMPEs, some GMPEs were modified using an inversion method. Validation was performed by analysing the attenuation curves of the selected GMPEs against observation data over the period 2015 to 2016. The results show that the selected GMPE is suitable for estimating PGA values in East Java.
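As a sketch of the likelihood-based ranking step, the function below scores one GMPE by the median of Scherbaum-style LH values computed from normalized log-residuals; the observation and residual numbers are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def median_lh(obs_ln_pga, pred_ln_pga, sigma_ln):
    """Likelihood-style goodness-of-fit score for one GMPE: normalized
    residuals of observed vs predicted ln(PGA) are mapped to
    LH = 2*(1 - Phi(|z|)); a median LH near 0.5 suggests the data are
    consistent with the model's mean and sigma."""
    z = (np.asarray(obs_ln_pga) - np.asarray(pred_ln_pga)) / sigma_ln
    lh = 2.0 * (1.0 - norm.cdf(np.abs(z)))
    return np.median(lh)

# Hypothetical residual test of a candidate GMPE on five records (PGA in g):
obs = np.log([0.12, 0.30, 0.08, 0.22, 0.15])
pred = obs + np.random.normal(0, 0.5, 5)   # stand-in GMPE predictions
print(round(median_lh(obs, pred, sigma_ln=0.6), 3))
```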
Public Release of Pan-STARRS Data
NASA Astrophysics Data System (ADS)
Flewelling, Heather; Consortium, panstarrs
2015-08-01
Pan-STARRS 1 is a 1.8 meter survey telescope, located on Haleakala, Hawaii, with a 1.4 Gigapixel camera, a 7 square degree field of view, and 5 filters (g,r,i,z,y). The public release of data, which is available to everyone, consists of 4 years of data taken between May 2010 and April 2014. Two of the surveys available in the public release are the 3pi survey and the Medium Deep (MD) survey. The 3pi survey has roughly 60 epochs (12 per filter) covering 3/4 of the sky (everything north of -30 degrees declination). The MD survey consists of 10 fields, each observed in a couple of filters per night, usually 8 exposures per filter per field, for about 4000 epochs per MD field. The available data products are accessed through the “Postage Stamp Server” and through the Published Science Products Subsystem (PSPS), both of which are available through the Pan-STARRS Science Interface (PSI). The Postage Stamp Server provides images and catalogs for different stages of processing on single exposures, stack images, difference images, and forced photometry. The PSPS is a SQL Server database that can be queried via script or web interface, with a database for each MD field and a large database for the 3pi survey. This database holds relative photometry and astrometry and object associations, making it easy to do searches across the entire sky, as well as tools to generate light curves of individual objects as a function of time.
Current databases on biological variation: pros, cons and progress.
Ricós, C; Alvarez, V; Cava, F; García-Lario, J V; Hernández, A; Jiménez, C V; Minchinela, J; Perich, C; Simón, M
1999-11-01
A database with reliable information to derive definitive analytical quality specifications for a large number of clinical laboratory tests was prepared in this work. This was achieved by comparing and correlating descriptive data and relevant observations with the biological variation information, an approach that had not been used in the previous efforts of this type. The material compiled in the database was obtained from published articles referenced in BIOS, CURRENT CONTENTS, EMBASE and MEDLINE using "biological variation & laboratory medicine" as key words, as well as books and doctoral theses provided by their authors. The database covers 316 quantities and reviews 191 articles, fewer than 10 of which had to be rejected. The within- and between-subject coefficients of variation and the subsequent desirable quality specifications for precision, bias and total error for all the quantities accepted are presented. Sex-related stratification of results was justified for only four quantities and, in these cases, quality specifications were derived from the group with lower within-subject variation. For certain quantities, biological variation in pathological states was higher than in the healthy state. In these cases, quality specifications were derived only from the healthy population (most stringent). Several quantities (particularly hormones) have been treated in very few articles and the results found are highly discrepant. Therefore, professionals in laboratory medicine should be strongly encouraged to study the quantities for which results are discrepant, the 90 quantities described in only one paper and the numerous quantities that have not been the subject of study.
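The quality specifications mentioned are conventionally derived from the within-subject (CVI) and between-subject (CVG) coefficients of variation; a minimal sketch of the standard (Fraser-style) formulas follows, with illustrative cholesterol values.

```python
import math

def desirable_specs(cv_within, cv_between):
    """Widely used desirable analytical quality specifications derived
    from biological variation: imprecision, bias, and total allowable
    error, all in percent."""
    cv_a = 0.5 * cv_within
    bias = 0.25 * math.sqrt(cv_within ** 2 + cv_between ** 2)
    total_error = 1.65 * cv_a + bias
    return cv_a, bias, total_error

# Example: serum cholesterol, CVI ~ 6.0%, CVG ~ 15.2% (illustrative values)
print(["%.1f%%" % v for v in desirable_specs(6.0, 15.2)])
```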
Digital hand atlas for web-based bone age assessment: system design and implementation
NASA Astrophysics Data System (ADS)
Cao, Fei; Huang, H. K.; Pietka, Ewa; Gilsanz, Vicente
2000-04-01
A frequently used method of skeletal age assessment is atlas matching, in which a radiological examination of a hand image is compared against a small set of Greulich-Pyle patterns of normal standards. The method, however, can lead to significant deviations in age assessment, owing to the variety of observers with different levels of training. The Greulich-Pyle atlas, based on middle- and upper-class white populations of the 1950s, is also not fully applicable to children of today, especially regarding skeletal development in other racial groups. In this paper, we present our system design and initial implementation of a digital hand atlas and computer-aided diagnostic (CAD) system for Web-based bone age assessment. The digital atlas will remove the disadvantages of the currently out-of-date one and allow bone age assessment to be computerized and done conveniently via the Web. The system consists of a hand atlas database, a CAD module, and a Java-based Web user interface. The atlas database is based on a large set of clinically normal hand images of diverse ethnic groups. The Java-based Web user interface allows users to interact with the hand image database from browsers. Users can use a Web browser to push a clinical hand image to the CAD server for a bone age assessment. Quantitative features on the examined image, which reflect skeletal maturity, are then extracted and compared with patterns from the atlas database to assess the bone age.
Zhivotovsky, Lev A; Malyarchuk, Boris A; Derenko, Miroslava V; Wozniak, Marcin; Grzybowski, Tomasz
2009-09-01
Developing a forensic DNA database on a population that consists of local ethnic groups separated by physical and cultural barriers is questionable, as the population can be genetically subdivided. On the other hand, the small sizes of ethnic groups, especially in alpine regions where they are further sub-structured into small villages, prevent collecting a large sample from each ethnic group. For such situations, we suggest obtaining both a total population database on allele frequencies across ethnic groups and a list of theta-values between the groups and the total data. We genotyped 558 individuals from the native population of South Siberia, consisting of nine ethnic groups, at 17 autosomal STR loci of the AmpFlSTR SGM Plus and AmpFlSTR Profiler Plus kit packages. The groups differentiate from each other with average theta-values of around 1.1%, and some reach up to three to four percent at certain loci. Between-village differentiation exists as well. Therefore, a database for the population of South Siberia is composed of data on allele frequencies in the pool of ethnic groups and data on theta-values that indicate variation in allele frequencies across the groups. Comparison to additional data on northeastern Asia (the Chukchi and Koryak) shows that differentiation in allele frequencies among small groups that are separated by large geographic distances can be even greater. In contrast, populations of Russians that live in large cities of the European part of Russia are homogeneous in allele frequencies, despite the large geographic distances between them, and thus can be described by a database on allele frequencies alone, without any specific information on theta-values.
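The theta-values in question quantify between-group variance in allele frequencies; a toy Wright-style fixation index for one biallelic locus, computed from hypothetical group frequencies, is sketched below (forensic practice uses more careful estimators, but the quantity captured is the same).

```python
def fst_from_freqs(p_groups):
    """Toy Wright-style fixation index for one biallelic locus: the
    reduction of expected heterozygosity within subpopulations relative
    to the pooled population."""
    p_bar = sum(p_groups) / len(p_groups)
    h_total = 2 * p_bar * (1 - p_bar)
    h_sub = sum(2 * p * (1 - p) for p in p_groups) / len(p_groups)
    return (h_total - h_sub) / h_total

# Frequencies of one allele in three hypothetical ethnic groups:
print(round(fst_from_freqs([0.10, 0.14, 0.22]), 4))   # ~0.019, i.e. ~1.9%
```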
Design and Implementation of an Environmental Mercury Database for Northeastern North America
NASA Astrophysics Data System (ADS)
Clair, T. A.; Evers, D.; Smith, T.; Goodale, W.; Bernier, M.
2002-12-01
An important issue faced when attempting to interpret geochemical variability studies across large regions is the accumulation, access, and consistent display of data from a large number of sources. We were given the opportunity to provide a regional assessment of mercury distribution in surface waters, sediments, invertebrates, fish, and birds in a region extending from New York State to the Island of Newfoundland. We received over 20 individual databases from State, Provincial, and Federal governments, as well as university researchers from both Canada and the United States. These databases came in a variety of formats and sizes. Our challenge was to find a way of accumulating and presenting the large amounts of acquired data in a consistent, easily accessible fashion, which could then be more easily interpreted. Moreover, the database had to be portable and easily distributable to the large number of study participants. We developed a static database structure using a web-based approach, which we were able to mount on a server accessible to all project participants. The site also contained all the necessary documentation related to the data, its acquisition, and the methods used in its analysis and interpretation. We then copied the complete web site onto CD-ROMs, which we distributed to all project participants, funding agencies, and other interested parties. The CD-ROM formed a permanent record of the project and was issued ISSN and ISBN numbers so that the information remains accessible to researchers in perpetuity. Here we present an overview of the CD-ROM and data structures, of the information accumulated over the first year of the study, and initial interpretations of the results.
Design and implementation of the NPOI database and website
NASA Astrophysics Data System (ADS)
Newman, K.; Jorgensen, A. M.; Landavazo, M.; Sun, B.; Hutter, D. J.; Armstrong, J. T.; Mozurkewich, David; Elias, N.; van Belle, G. T.; Schmitt, H. R.; Baines, E. K.
2014-07-01
The Navy Precision Optical Interferometer (NPOI) has been recording astronomical observations for nearly two decades, at this point with hundreds of thousands of individual observations recorded for a total data volume of many terabytes. To make maximum use of the NPOI data it is necessary to organize them in an easily searchable manner and to be able to extract essential diagnostic information from the data to allow users to quickly gauge data quality and suitability for a specific science investigation. This sets the motivation for creating a comprehensive database of observation metadata as well as, at least, reduced data products. The NPOI database is implemented in MySQL using standard database tools and interfaces. The use of standard database tools allows us to focus on top-level database and interface implementation and take advantage of standard features such as backup, remote access, mirroring, and complex queries which would otherwise be time-consuming to implement. A website was created in order to give scientists a user-friendly interface for searching the database. It allows users to select various metadata to search for, and also to decide how and which results are displayed. This streamlines the searches, making it easier and quicker for scientists to find the information they are looking for. The website supports multiple browsers and devices. In this paper we present the design of the NPOI database and website, and give examples of their use.
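A minimal stand-in for this kind of metadata search, using SQLite in place of MySQL and an invented table layout (the real schema and column names are not described here), might look like this:

```python
import sqlite3

# Invented observation-metadata table for illustration only.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE observations (
    obs_id INTEGER PRIMARY KEY,
    target TEXT, obs_date TEXT, n_baselines INTEGER, quality REAL)""")
con.executemany("INSERT INTO observations VALUES (?,?,?,?,?)",
    [(1, "FKV0193", "2013-03-02", 3, 0.92),
     (2, "FKV0622", "2013-03-02", 6, 0.41),
     (3, "FKV0193", "2014-11-19", 6, 0.88)])

# Typical metadata search: good-quality data for one target, by date.
rows = con.execute("""SELECT obs_id, obs_date FROM observations
                      WHERE target = ? AND quality > 0.8
                      ORDER BY obs_date""", ("FKV0193",)).fetchall()
print(rows)
```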
NASA Astrophysics Data System (ADS)
Zhao, L.; Chen, P.; Jordan, T. H.; Olsen, K. B.; Maechling, P.; Faerman, M.
2004-12-01
The Southern California Earthquake Center (SCEC) is developing a Community Modeling Environment (CME) to facilitate the computational pathways of physics-based seismic hazard analysis (Maechling et al., this meeting). Major goals are to facilitate the forward modeling of seismic wavefields in complex geologic environments, including the strong ground motions that cause earthquake damage, and the inversion of observed waveform data for improved models of Earth structure and fault rupture. Here we report on a unified approach to these coupled inverse problems that is based on the ability to generate and manipulate wavefields in densely gridded 3D Earth models. A main element of this approach is a database of receiver Green tensors (RGT) for the seismic stations, which comprises all of the spatial-temporal displacement fields produced by the three orthogonal unit impulsive point forces acting at each of the station locations. Once the RGT database is established, synthetic seismograms for any earthquake can be simply calculated by extracting a small, source-centered volume of the RGT from the database and applying the reciprocity principle. The partial derivatives needed for point- and finite-source inversions can be generated in the same way. Moreover, the RGT database can be employed in full-wave tomographic inversions launched from a 3D starting model, because the sensitivity (Fréchet) kernels for travel-time and amplitude anomalies observed at seismic stations in the database can be computed by convolving the earthquake-induced displacement field with the station RGTs. We illustrate all elements of this unified analysis with an RGT database for 33 stations of the California Integrated Seismic Network in and around the Los Angeles Basin, which we computed for the 3D SCEC Community Velocity Model (SCEC CVM3.0) using a fourth-order staggered-grid finite-difference code. For a spatial grid spacing of 200 m and a time resolution of 10 ms, the calculations took ~19,000 node-hours on the Linux cluster at USC's High-Performance Computing Center. The 33-station database with a volume of ~23.5 TB was archived in the SCEC digital library at the San Diego Supercomputer Center using the Storage Resource Broker (SRB). From a laptop, anyone with access to this SRB collection can compute synthetic seismograms for an arbitrary source in the CVM in a matter of minutes. Efficient approaches have been implemented to use this RGT database in the inversions of waveforms for centroid and finite moment tensors and tomographic inversions to improve the CVM. Our experience with these large problems suggests areas where the cyberinfrastructure currently available for geoscience computation needs to be improved.
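A heavily simplified sketch of the reciprocity step described above: once the RGT time series at the grid point nearest the source have been extracted, a synthetic seismogram reduces to a convolution with the source terms. Everything here is a toy stand-in (a point-force source rather than a moment tensor with spatial derivatives of the RGT, and invented array shapes), not the SCEC implementation.

```python
import numpy as np

def synthetic_seismogram(rgt, force):
    """rgt: (3, nt) array -- stored RGT time series at the source grid
    point, one row per unit-force direction. force: (3, nt) array --
    source time function along each direction. By source-receiver
    reciprocity, convolving the station's stored Green functions with
    the source terms yields the motion recorded at the station."""
    nt = rgt.shape[1]
    u = np.zeros(2 * nt - 1)
    for i in range(3):
        u += np.convolve(rgt[i], force[i])
    return u

# Toy example: an impulsive vertical point force at the source location.
nt, dt = 512, 0.01
rgt = np.random.randn(3, nt) * np.exp(-np.arange(nt) * dt)  # stand-in RGT
force = np.zeros((3, nt))
force[2, 0] = 1.0
u = synthetic_seismogram(rgt, force)
print(u.shape)
```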
The use of Benford's law for evaluation of quality of occupational hygiene data.
De Vocht, Frank; Kromhout, Hans
2013-04-01
Benford's law is the counterintuitive empirical observation that the digits 1-9 are not equally likely to appear as the initial digit in numbers resulting from the same phenomenon. Manipulated, unrelated, or fabricated numbers usually do not follow Benford's law, and as such the law has been used to investigate fraudulent data (for example, in accounting) and to identify errors in data sets (for example, those introduced during data transfer). We describe the use of Benford's law to screen occupational hygiene measurement data sets, using exposure data from the European rubber manufacturing industry as an illustration. Two rubber process dust measurement data sets added to the European Union ExAsRub project, initially collected by the UK Health and Safety Executive (HSE) and the British Rubber Manufacturers' Association (BRMA), and one pre-treatment and one post-treatment n-nitrosamines data set collated in the German MEGA database and also added to the ExAsRub database, were compared with the expected first-digit (1BL) and second-digit (2BL) Benford distributions. Evaluation indicated only small deviations from the expected 1BL and 2BL distributions for the data sets collated by the UK HSE and industry (BRMA), respectively, while larger deviations were observed for the MEGA data. To a large extent the latter could be attributed to imputation and replacement by a constant of n-nitrosamine measurements below the limit of detection, but further evaluation of these data to determine why other deviations from the expected 1BL and 2BL distributions exist may be beneficial. Benford's law is a straightforward and easy-to-implement analytical tool for evaluating the quality of occupational hygiene data sets, and as such can be used to detect potential problems in large data sets caused by deliberate a priori or a posteriori manipulation, and by issues such as the treatment of observations below the limit of detection, rounding, and transfer of data.
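For reference, the expected Benford distributions are closed-form, and a screening test is only a few lines. The sketch below compares the first digits of a data set against 1BL with a chi-square goodness-of-fit test; the lognormal sample is a stand-in for real exposure measurements.

```python
import numpy as np
from scipy.stats import chisquare

def benford_1bl():
    d = np.arange(1, 10)
    return np.log10(1 + 1 / d)                 # P(first digit = d)

def benford_2bl():
    # P(second digit = k), summed over all possible first digits
    return np.array([np.log10(1 + 1 / (10 * np.arange(1, 10) + k)).sum()
                     for k in range(10)])

def first_digits(x):
    x = np.abs(np.asarray(x, dtype=float))
    x = x[x > 0]
    return (x / 10 ** np.floor(np.log10(x))).astype(int)

data = np.random.lognormal(mean=1.0, sigma=2.0, size=5000)  # stand-in data
obs = np.bincount(first_digits(data), minlength=10)[1:]
stat, p = chisquare(obs, f_exp=benford_1bl() * obs.sum())
print(f"chi2 = {stat:.1f}, p = {p:.3f}")
```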
Baxter, Roger; Hansen, John; Timbol, Julius; Pool, Vitali; Greenberg, David P; Johnson, David R; Decker, Michael D
2016-11-01
An observational post-licensure (Phase IV) retrospective large-database safety study was conducted at Kaiser Permanente, a US integrated medical care organization, to assess the safety of Tetanus Toxoid, Reduced Diphtheria Toxoid and 5-Component Acellular Pertussis Vaccine (Tdap5) administered as part of routine healthcare among adolescents and adults. We evaluated incidence rates of various clinical events resulting in outpatient clinic, emergency department (ED), and hospital visits during various time intervals (windows) following Tdap5 vaccination using 2 pharmacoepidemiological methods (risk interval and historic cohort) and several screening thresholds. Plausible outcomes of interest with elevated incidence rate ratios (IRRs) were further evaluated by reviewing individual patient records to confirm the diagnosis, timing (temporal relationship), alternative etiology, and other health record details to discern possible relatedness of the health events to vaccination. Overall, 124,139 people received Tdap5 vaccine from September 2005 through mid-October 2006, and 203,154 in the comparison cohort received a tetanus and diphtheria toxoid adsorbed vaccine (and no live virus vaccine) during the year prior to initiation of this study. In the outpatient, ED and hospital databases, respectively, we identified 11/26, 179/700 and 187/700 unique health outcomes with IRRs significantly >1.0. Among the same unique health outcomes in the outpatient, ED, and hospital databases, 9, 146, and 385, respectively, had IRRs significantly <1.0. Further scrutiny of the outcomes with elevated IRRs did not reveal unexpected signals of adverse outcomes related to vaccination. In conclusion, Tdap5 vaccine was found to be safe among this large population of adolescents and adults.
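For readers unfamiliar with the screening arithmetic, the sketch below computes an incidence rate ratio with a Wald confidence interval on the log scale, the basic quantity behind the thresholds mentioned above. The counts and person-time values are invented, and the study's actual risk-interval and historic-cohort analyses involve considerably more machinery.

```python
import math

def irr_with_ci(events_risk, pt_risk, events_ref, pt_ref, z=1.96):
    """Incidence rate ratio comparing a post-vaccination risk window
    (events_risk cases over pt_risk person-time) with a comparison
    window (events_ref over pt_ref), with a Wald CI on the log scale."""
    irr = (events_risk / pt_risk) / (events_ref / pt_ref)
    se = math.sqrt(1 / events_risk + 1 / events_ref)
    lo, hi = (irr * math.exp(s * z * se) for s in (-1, 1))
    return irr, lo, hi

print(irr_with_ci(30, 12000.0, 18, 15000.0))  # made-up counts
```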
A review of the volatiles from the healthy human body.
de Lacy Costello, B; Amann, A; Al-Kateb, H; Flynn, C; Filipiak, W; Khalid, T; Osborne, D; Ratcliffe, N M
2014-03-01
A compendium of all the volatile organic compounds (VOCs) emanating from the human body (the volatolome) is reported for the first time. 1840 VOCs have been assigned from breath (872), saliva (359), blood (154), milk (256), skin secretions (532), urine (279), and faeces (381) in apparently healthy individuals. Compounds were assigned CAS registry numbers and named according to a common convention where possible. The compounds have been grouped into tables according to their chemical class or functionality to permit easy comparison. Some clear differences are observed; for instance, a lack of esters in urine contrasts with a high number in faeces. Careful use of the database is needed, as the numbers may not be a true reflection of the actual VOCs present in each bodily excretion. The absence of a compound could be due to the techniques used, or could reflect the intensity of effort; e.g., there are few publications on VOCs from blood compared to a large number on VOCs in breath. The large number of volatiles reported from skin is partly due to the methodologies used, e.g., collecting excretions on glass beads and then heating to desorb VOCs. All compounds have been included as reported (unless there was a clear discrepancy between name and chemical structure), but there may be some mistaken assignations arising from the original publications, particularly for isomers. It is the authors' intention that this compendium will not only be a useful database of VOCs listed in the literature, but will also stimulate further study of VOCs from healthy individuals. Establishing a list of volatiles emanating from healthy individuals and increasing understanding of VOC metabolic pathways are important steps for differentiating between diseases using VOCs.
Tomio, Jun; Yamana, Hayato; Matsui, Hiroki; Yamashita, Hiroyuki; Yoshiyama, Takashi; Yasunaga, Hideo
2017-11-01
Tuberculosis screening is recommended for patients with immune-mediated inflammatory diseases (IMIDs) prior to anti-tumor necrosis factor (TNF) therapy. However, adherence to the recommended practice is unknown in the current clinical setting in Japan. We used a large-scale health insurance claims database in Japan to conduct a longitudinal observational study. Of more than two million beneficiaries in the database between 2013 and 2014, we enrolled those with IMIDs aged 15-69 years who had initiated anti-TNF therapy. We defined tuberculosis screening primarily as tuberculin skin test and/or interferon-gamma release assay (TST/IGRA) within 2 months before commencing anti-TNF therapy. We analyzed the proportions of the patients who had undergone tuberculosis screening and the associations with primary disease, type of anti-TNF agent, methotrexate prescription prior to anti-TNF therapy, and treatment for latent tuberculosis infection (LTBI). Of 385 patients presumed to have initiated anti-TNF therapy, 252 (66%) had undergone tuberculosis screening by TST/IGRA (22% TST, 56% IGRA, and 12% both TST and IGRA), and 231 (60%) had undergone TST/IGRA and radiography. Patients with psoriasis tended to be more likely to undergo tuberculosis screening than those with other diseases; however, this association was not statistically significant. Treatment for LTBI was provided to 43 (11%) patients; 123 (32%) received neither TST/IGRA nor LTBI treatment. Tuberculosis screening was often not performed prior to anti-TNF therapy despite the guidelines' recommendations; thus, patients could be put at unnecessary risk of reactivation of tuberculosis. © 2017 Asia Pacific League of Associations for Rheumatology and John Wiley & Sons Australia, Ltd.
The USA-NPN Information Management System: A tool in support of phenological assessments
NASA Astrophysics Data System (ADS)
Rosemartin, A.; Vazquez, R.; Wilson, B. E.; Denny, E. G.
2009-12-01
The USA National Phenology Network (USA-NPN) serves science and society by promoting a broad understanding of plant and animal phenology and the relationships among phenological patterns and all aspects of environmental change. Data management and information sharing are central to the USA-NPN mission. The USA-NPN develops, implements, and maintains a comprehensive Information Management System (IMS) to serve the needs of the network, including the collection, storage and dissemination of phenology data, access to phenology-related information, tools for data interpretation, and communication among partners of the USA-NPN. The IMS includes components for data storage, such as the National Phenology Database (NPD), and several online user interfaces to accommodate data entry, data download, data visualization and catalog searches for phenology-related information. The IMS is governed by a set of standards to ensure security, privacy, data access, and data quality. The National Phenology Database is designed to efficiently accommodate large quantities of phenology data, to be flexible to the changing needs of the network, and to provide for quality control. The database stores phenology data from multiple sources (e.g., partner organizations, researchers and citizen observers), and provides for integration with legacy datasets. Several services will be created to provide access to the data, including reports, visualization interfaces, and web services. These services will provide integrated access to phenology and related information for scientists, decision-makers and general audiences. Phenological assessments at any scale will rely on secure and flexible information management systems for the organization and analysis of phenology data. The USA-NPN’s IMS can serve phenology assessments directly, through data management and indirectly as a model for large-scale integrated data management.
Mars Global Digital Dune Database (MGD3): Global dune distribution and wind pattern observations
Hayward, Rosalyn K.; Fenton, Lori; Titus, Timothy N.
2014-01-01
The Mars Global Digital Dune Database (MGD3) is complete and now extends from 90°N to 90°S latitude. The recently released south pole (SP) portion (MC-30) of MGD3 adds ∼60,000 km2 of medium to large-size dark dune fields and ∼15,000 km2 of sand deposits and smaller dune fields to the previously released equatorial (EQ, ∼70,000 km2), and north pole (NP, ∼845,000 km2) portions of the database, bringing the global total to ∼975,000 km2. Nearly all NP dunes are part of large sand seas, while the majority of EQ and SP dune fields are individual dune fields located in craters. Despite the differences between Mars and Earth, their dune and dune field morphologies are strikingly similar. Bullseye dune fields, named for their concentric ring pattern, are the exception, possibly owing their distinctive appearance to winds that are unique to the crater environment. Ground-based wind directions are derived from slipface (SF) orientation and dune centroid azimuth (DCA), a measure of the relative location of a dune field inside a crater. SF and DCA often preserve evidence of different wind directions, suggesting the importance of local, topographically influenced winds. In general however, ground-based wind directions are broadly consistent with expected global patterns, such as polar easterlies. Intriguingly, between 40°S and 80°S latitude both SF and DCA preserve their strongest, though different, dominant wind direction, with transport toward the west and east for SF-derived winds and toward the north and west for DCA-derived winds.
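As a small illustration of the DCA measure, the sketch below computes the azimuth from a crater's center to the centroid of the dune field it hosts, under a flat-surface approximation; the coordinates and the simplification are illustrative, not the database's procedure.

```python
import math

def dune_centroid_azimuth(crater_lat, crater_lon, dune_lat, dune_lon):
    """Azimuth in degrees clockwise from north, crater center to dune
    centroid, on a locally flat surface (inputs in degrees)."""
    d_north = math.radians(dune_lat - crater_lat)
    d_east = (math.radians(dune_lon - crater_lon)
              * math.cos(math.radians(crater_lat)))
    return math.degrees(math.atan2(d_east, d_north)) % 360.0

# A dune field sitting south-west of its crater's center:
print(dune_centroid_azimuth(-45.0, 30.0, -45.2, 29.8))  # ~215 degrees
```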
Spiegel, Paul B; Le, Phuoc; Ververs, Mija-Tesse; Salama, Peter
2007-01-01
Background The fields of expertise of natural disasters and complex emergencies (CEs) are quite distinct, with different tools for mitigation and response as well as different types of competent organizations and qualified professionals who respond. However, natural disasters and CEs can occur concurrently in the same geographic location, and epidemics can occur during or following either event. The occurrence and overlap of these three types of events have not been well studied. Methods All natural disasters, CEs and epidemics occurring within the past decade (1995–2004) that met the inclusion criteria were included. The largest 30 events in each category were based on the total number of deaths recorded. The main databases used were the Emergency Events Database for natural disasters, the Uppsala Conflict Database Program for CEs and the World Health Organization outbreaks archive for epidemics. Analysis During the past decade, 63% of the largest CEs had ≥1 epidemic compared with 23% of the largest natural disasters. Twenty-seven percent of the largest natural disasters occurred in areas with ≥1 ongoing CE while 87% of the largest CEs had ≥1 natural disaster. Conclusion Epidemics commonly occur during CEs. The data presented in this article do not support the often-repeated assertion that epidemics, especially large-scale epidemics, commonly occur following large-scale natural disasters. This observation has important policy and programmatic implications when preparing for and responding to epidemics. There is an important and previously unrecognized overlap between natural disasters and CEs. Training and tools are needed to help bridge the gap between the different types of organizations and professionals who respond to natural disasters and CEs to ensure an integrated and coordinated response. PMID:17411460
NOVAC - Network for Observation of Volcanic and Atmospheric Change: Data archiving and management
NASA Astrophysics Data System (ADS)
Lehmann, T.; Kern, C.; Vogel, L.; Platt, U.; Johansson, M.; Galle, B.
2009-12-01
The potential for volcanic risk assessment using real-time gas emissions data and the recognized power of sharing data from multiple eruptive centers were the motivation for a European Union FP6 Research Program project entitled NOVAC: Network for Observation of Volcanic and Atmospheric Change. Starting in 2005, a worldwide network of permanent scanning Differential Optical Absorption Spectroscopy (DOAS) instruments was installed at 26 volcanoes around the world. These ground-based remote sensing instruments record the characteristic absorption of volcanic gas emissions (e.g. SO2, BrO) in the ultra-violet wavelength region. A real-time DOAS retrieval was implemented to evaluate the measured spectra, thus providing the respective observatories with gas emission data which can be used for volcanic risk assessment and hazard prediction. Observatory personnel at each partner institution were trained on technical and scientific aspects of the DOAS technique, and a central database was created to allow the exchange of data and ideas between all partners. A bilateral benefit for volcano observatories as well as scientific institutions (e.g. universities and research centers) resulted. Volcano observatories were provided with leading edge technology for measuring volcanic SO2 emission fluxes, and now use this technology for monitoring and risk assessment, while the involved universities and research centers are working on global studies and characterizing the atmospheric impact of the observed gas emissions. The NOVAC database takes into account that project members use the database in a variety of different ways. Therefore, the data is structured in layers, the top of which contains basic information about each instrument. The second layer contains evaluated emission data such as SO2 column densities, SO2 emission fluxes, and BrO/SO2 ratios. The lowest layer contains all spectra measured by the individual instruments. Online since the middle of 2006, the NOVAC database currently contains 26 volcanoes, 56 instruments and more than 50 million spectra. It is scalable for up to 200 or more volcanoes, as the NOVAC project is open to outside participation. The data is archived in a MySQL Database system, storing and querying is done with PHP functions. The web interface is dynamically created based on the existing dataset and offers approx. 150 different search, display, and sorting options. Each user has a separate account and can save his personal search configuration from session to session. Search results are displayed in table form and can also be downloaded. Both evaluated data files and measured spectra can be downloaded as single files or in packages. The spectra can be plotted directly from the database, as well as several measurement values and evaluated parameters over selectable timescales. Because of the large extent of the dataset, major emphasis was placed on performance optimization.
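The layered organisation described above can be pictured as three linked tables: instrument information on top, evaluated emission data in the middle, and raw spectra at the bottom. The sketch below shows a layer-2 query in that spirit; SQLite stands in for the MySQL system, and all table, column, and serial names are invented.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE instrument (serial TEXT PRIMARY KEY, volcano TEXT);
    CREATE TABLE flux (serial TEXT, t TEXT, so2_flux_kg_s REAL);
    CREATE TABLE spectrum (serial TEXT, t TEXT, data BLOB);
""")
db.execute("INSERT INTO instrument VALUES ('I001', 'Tungurahua')")
db.execute("INSERT INTO flux VALUES ('I001', '2009-07-01T14:05', 12.4)")

# Layer-2 query: the SO2 emission flux time series for one volcano.
rows = db.execute("""
    SELECT f.t, f.so2_flux_kg_s
    FROM flux f JOIN instrument i ON i.serial = f.serial
    WHERE i.volcano = ? ORDER BY f.t""", ("Tungurahua",)).fetchall()
print(rows)
```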
Chen, Mingyang; Stott, Amanda C; Li, Shenggang; Dixon, David A
2012-04-01
A robust metadata database called the Collaborative Chemistry Database Tool (CCDBT) for massive amounts of computational chemistry raw data has been designed and implemented. It performs data synchronization and simultaneously extracts the metadata. Computational chemistry data in various formats from different computing sources, software packages, and users can be parsed into uniform metadata for storage in a MySQL database. Parsing is performed by a parsing pyramid, consisting of parsers written for the different levels of data types and data sets; the parsers are created by the parser loader after it loads the parser engines and configurations. Copyright © 2011 Elsevier Inc. All rights reserved.
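The parser-loader idea can be sketched as a registry of format-specific parsers that all emit the same uniform metadata record. The class names, the registry, and the toy Gaussian-output line below are assumptions for illustration, not CCDBT's actual API.

```python
from abc import ABC, abstractmethod

class Parser(ABC):
    @abstractmethod
    def parse(self, text: str) -> dict: ...

class GaussianParser(Parser):
    def parse(self, text):
        # Trivially grab the last SCF energy line; real parsers are layered.
        energies = [float(line.split()[4]) for line in text.splitlines()
                    if line.startswith(" SCF Done")]
        return {"package": "Gaussian", "energy_au": energies[-1]}

PARSERS = {"gaussian": GaussianParser()}      # loader's registry

def load_metadata(fmt: str, raw: str) -> dict:
    return PARSERS[fmt].parse(raw)            # uniform record -> MySQL row

print(load_metadata("gaussian", " SCF Done:  E(RB3LYP) =  -76.4  A.U."))
```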
[Status of libraries and databases for natural products abroad].
Zhao, Li-Mei; Tan, Ning-Hua
2015-01-01
Because natural products are one of the important sources for drug discovery, libraries and databases of natural products are significant for their development and research. At present, most compound libraries abroad consist of synthetic or combinatorial synthetic molecules, making access to natural products difficult; and because information on natural products is scattered and follows different standards, it is difficult to construct convenient, comprehensive, and large-scale databases of natural products. This paper reviews the status of currently accessible libraries and databases for natural products abroad and provides important information for the development of natural product libraries and databases.
Lessons Learned From Developing Reactor Pressure Vessel Steel Embrittlement Database
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Jy-An John
Materials behavior caused by neutron irradiation under fission and/or fusion environments can be little understood without practical examination. An easily accessible materials information system with a large material database running on capable computers is necessary for the design of nuclear materials and for analyses or simulations of these phenomena. The Embrittlement Data Base (EDB) developed at ORNL is such a comprehensive collection of data. The EDB contains power reactor pressure vessel surveillance data, materials test reactor data, foreign reactor data (through bilateral agreements authorized by the NRC), and fracture toughness data. The lessons learned from building the EDB program and the associated database management activity regarding Material Database Design Methodology, Architecture, and the Embedded QA Protocol are described in this report. The development of the IAEA International Database on Reactor Pressure Vessel Materials (IDRPVM) and a comparison of the EDB and IAEA IDRPVM databases are provided in the report. The recommended database QA protocol and database infrastructure are also stated in the report.
Global precipitation measurements for validating climate models
NASA Astrophysics Data System (ADS)
Tapiador, F. J.; Navarro, A.; Levizzani, V.; García-Ortega, E.; Huffman, G. J.; Kidd, C.; Kucera, P. A.; Kummerow, C. D.; Masunaga, H.; Petersen, W. A.; Roca, R.; Sánchez, J.-L.; Tao, W.-K.; Turk, F. J.
2017-11-01
The advent of global precipitation data sets with increasing temporal span has made it possible to use them for validating climate models. In order to fulfill the requirement of global coverage, existing products integrate satellite-derived retrievals from many sensors with direct ground observations (gauges, disdrometers, radars), which are used as reference for the satellites. While the resulting product can be deemed the best-available source of quality validation data, awareness of the limitations of such data sets is important to avoid drawing wrong or unsubstantiated conclusions when assessing climate model abilities. This paper provides guidance on the use of precipitation data sets for climate research, including model validation and verification for improving physical parameterizations. The strengths and limitations of the data sets for climate modeling applications are presented, and a protocol for quality assurance of both observational databases and models is discussed. The paper helps elaborate on the recent IPCC AR5 acknowledgment of large observational uncertainties in precipitation observations for climate model validation.
NASA Astrophysics Data System (ADS)
Richardson, Noel; Hardegree-Ullman, Kevin; Bjorkman, Jon Eric; Bjorkman, Karen S.; Ritter Observing Team
2017-01-01
With a 1-m telescope on the University of Toledo (OH) main campus, we have initiated a grad student-undergraduate partnership to help teach the undergraduates observational methods and introduce them to research through peer mentorship. For the last 3 years, we have trained up to 21 undergraduates (primarily physics/astronomy majors) in a given academic semester, ranging from freshman to seniors. Various projects are currently being conducted by undergraduate students with guidance from graduate student mentors, including constructing three-color images, observations of transiting exoplanets, and determination of binary star orbits from echelle spectra. This academic year we initiated a large group research project to help students learn about the databases, journal repositories, and online observing tools astronomers use for day-to-day research. We discuss early inclusion in observational astronomy and research of these students and the impact it has on departmental retention, undergraduate involvement, and academic success.
The role of Natural Flood Management in managing floods in large scale basins during extreme events
NASA Astrophysics Data System (ADS)
Quinn, Paul; Owen, Gareth; ODonnell, Greg; Nicholson, Alex; Hetherington, David
2016-04-01
There is a strong evidence base showing the negative impacts of land use intensification and soil degradation in NW European river basins on hydrological response and flood impact downstream. However, the ability to target zones of high runoff production, and the extent to which we can manage flood risk using nature-based flood management solutions, are less well known. A move to planting more trees and farming landscapes less intensively is part of the natural flood management (NFM) toolkit, and these methods suggest that flood risk can be managed in alternative and more holistic ways. So which local NFM methods should be used, where in a large-scale basin should they be deployed, and how do flows propagate to any point downstream? More generally, how much intervention is needed, and will it compromise food production systems? If we are observing record levels of rainfall and flow, for example during Storm Desmond in December 2015 in the North West of England, what other flood management options are really needed to complement our traditional defences in large basins in the future? In this paper we show examples of NFM interventions in the UK that have had an impact at local-scale sites. We demonstrate the impact of interventions at the local scale, at the sub-catchment (meso) scale, and finally at the large scale. The tools used include observations, process-based models and more generalised flood impact models. Issues of synchronisation and the design level of protection are debated. By reworking observed rainfall and discharge (runoff) for observed extreme events in the River Eden and River Tyne during Storm Desmond, we show how much flood protection is needed in large-scale basins. The research thus poses a number of key questions as to how floods may have to be managed in large-scale basins in the future. We support a method of catchment systems engineering that holds water back across the whole landscape as a major opportunity to manage water in large-scale basins in the future. The broader benefits of engineering landscapes to hold water, for pollution control, sediment-loss reduction and drought minimisation, are also shown.
Fernández, José M; Valencia, Alfonso
2004-10-12
Downloading the information stored in relational databases into XML and other flat formats is a common task in bioinformatics. This periodical dumping of information requires considerable CPU time, disk and memory resources. YAdumper has been developed as a purpose-specific tool to deal with the integral structured information download of relational databases. YAdumper is a Java application that organizes database extraction following an XML template based on an external Document Type Declaration. Compared with other non-native alternatives, YAdumper substantially reduces memory requirements and considerably improves writing performance.
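The core of such a dump is streaming rows out of a result set and writing XML elements one record at a time, which keeps memory use flat regardless of table size. The sketch below shows that shape in Python with a throwaway SQLite table; YAdumper itself is a Java application driven by a DTD-based template, so this illustrates only the idea, not its interface.

```python
import sqlite3
import sys
from xml.sax.saxutils import escape

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE protein (ac TEXT, name TEXT)")
db.executemany("INSERT INTO protein VALUES (?, ?)",
               [("P12345", "AATM_RABIT"), ("Q9H0H5", "RGAP1_HUMAN")])

def dump(table, columns, out):
    out.write(f"<{table}Set>\n")
    cur = db.execute(f"SELECT {', '.join(columns)} FROM {table}")
    for row in cur:                   # stream row by row: flat memory use
        fields = "".join(f"<{c}>{escape(str(v))}</{c}>"
                         for c, v in zip(columns, row))
        out.write(f"  <{table}>{fields}</{table}>\n")
    out.write(f"</{table}Set>\n")

dump("protein", ["ac", "name"], sys.stdout)
```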
Performance analysis of different databases in a new Internet mapping system
NASA Astrophysics Data System (ADS)
Yao, Xing; Su, Wei; Gao, Shuai
2017-03-01
In the mapping system of the New Internet, massive numbers of mapping entries between AID and RID need to be stored, added, updated, and deleted. To cope with large volumes of mapping-entry updates and query requests, the mapping system of the New Internet must use a high-performance database. In this paper, we focus on the performance of three typical databases, Redis, SQLite, and MySQL; the results show that mapping systems based on different databases can be adapted to different needs according to the actual situation.
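A benchmark of this kind reduces to timing bulk updates and random single-key look-ups of AID-to-RID mappings through a common interface. The sketch below shows that shape using only SQLite from the standard library; the Redis and MySQL clients would be exercised the same way, and the mapping table is an assumption.

```python
import random
import sqlite3
import time

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE mapping (aid TEXT PRIMARY KEY, rid TEXT)")

def bench(n=10_000):
    t0 = time.perf_counter()
    db.executemany("INSERT OR REPLACE INTO mapping VALUES (?, ?)",
                   [(f"aid{i}", f"rid{i}") for i in range(n)])
    db.commit()
    t1 = time.perf_counter()
    for _ in range(n):                      # random single-key look-ups
        aid = f"aid{random.randrange(n)}"
        db.execute("SELECT rid FROM mapping WHERE aid = ?", (aid,)).fetchone()
    t2 = time.perf_counter()
    return n / (t1 - t0), n / (t2 - t1)     # updates/s, queries/s

print("updates/s, queries/s:", bench())
```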
NASA Astrophysics Data System (ADS)
Scheers, B.; Bloemen, S.; Mühleisen, H.; Schellart, P.; van Elteren, A.; Kersten, M.; Groot, P. J.
2018-04-01
Coming high-cadence wide-field optical telescopes will image hundreds of thousands of sources per minute. Besides inspecting the near real-time data streams for transient and variability events, the accumulated data archive is a wealthy laboratory for making complementary scientific discoveries. The goal of this work is to optimise column-oriented database techniques to enable the construction of a full-source and light-curve database for large-scale surveys that is accessible by the astronomical community. We adopted LOFAR's Transients Pipeline as the baseline and modified it to enable the processing of optical images that have much higher source densities. The pipeline adds new source lists to the archive database, while cross-matching them with the known catalogued sources in order to build a full light-curve archive. We investigated several techniques of indexing and partitioning the largest tables, allowing for faster positional source look-ups in the cross-matching algorithms. We monitored all query run times in long-term pipeline runs where we processed a subset of IPHAS data that have image source density peaks over 170,000 per field of view (500,000 deg⁻²). Our analysis demonstrates that horizontal table partitions of declination widths of one degree control the query run times. Use of an index strategy where the partitions are densely sorted according to source declination yields another improvement. Most queries run in sublinear time and a few (< 20%) run in linear time, because of dependencies on input source-list and result-set size. We observed that for this logical database partitioning scheme the limiting cadence the pipeline achieved while processing IPHAS data is 25 s.
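The partitioning scheme described above can be mimicked in miniature: a partition key equal to the floor of the declination, plus a declination-sorted composite index, confines a positional look-up to a thin horizontal slice of the source table. In the sketch below, SQLite stands in for the column store the pipeline actually uses, and the names are illustrative.

```python
import math
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE source (ra REAL, dec REAL, zone INTEGER)")
db.execute("CREATE INDEX idx_zone_dec ON source (zone, dec, ra)")
db.execute("INSERT INTO source VALUES (123.456, 45.005, 45)")

def box_candidates(ra0, dec0, r):
    """Candidates within a (ra, dec) box of half-width r degrees; the
    exact great-circle distance test would follow in the cross-match."""
    zones = list(range(math.floor(dec0 - r), math.floor(dec0 + r) + 1))
    sql = (f"SELECT ra, dec FROM source "
           f"WHERE zone IN ({','.join('?' * len(zones))}) "
           f"AND dec BETWEEN ? AND ? AND ra BETWEEN ? AND ?")
    return db.execute(sql, (*zones, dec0 - r, dec0 + r,
                            ra0 - r, ra0 + r)).fetchall()

print(box_candidates(123.45, 45.0, 0.01))
```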
NASA Astrophysics Data System (ADS)
Tifafi, Marwa; Guenet, Bertrand; Hatté, Christine
2018-01-01
Soils are the major component of the terrestrial ecosystem and the largest organic carbon reservoir on Earth. However, they are a nonrenewable natural resource and especially reactive to human disturbance and climate change. Despite its importance, soil carbon dynamics is an important source of uncertainty for future climate predictions, and there is a growing need for more precise information to better understand the mechanisms controlling soil carbon dynamics and better constrain Earth system models. The aim of our work is to compare the soil organic carbon stocks given by different existing global and regional databases. We calculated global and regional soil carbon stocks at 1 m depth given by three existing databases (SoilGrids, the Harmonized World Soil Database, and the Northern Circumpolar Soil Carbon Database). We observed that the total stocks predicted by each product differ greatly: around 3,400 Pg according to SoilGrids but about 2,500 Pg according to the Harmonized World Soil Database. This difference is marked in particular for boreal regions, where differences can be related to high disparities in soil organic carbon concentration. Differences in other regions are more limited and may be related to differences in bulk density estimates. Finally, evaluation of the three data sets against ground truth data shows that (i) there is a significant difference in spatial patterns between ground truth data and the compared data sets and that (ii) the data sets underestimate the soil organic carbon stock by more than 40% compared to field data.
Pacific walrus coastal haulout database, 1852-2016: Background report
Fischbach, Anthony S.; Kochnev, Anatoly A.; Garlich-Miller, Joel L.; Jay, Chadwick V.
2016-01-01
Walruses are large benthic predators that rest out of water between foraging bouts. Coastal “haulouts” (places where walruses rest) are formed by adult males in summer and sometimes by females and young when sea ice is absent, and are often used repeatedly across seasons and years. Understanding the geography and historical use of haulouts provides a context for conservation efforts. We summarize information on Pacific walrus haulouts from available reports (n=151), interviews with coastal residents and aviators, and personal observations of the authors. We provide this in the form of a georeferenced database that can be queried and displayed with standard geographic information system and database management software. The database contains 150 records of Pacific walrus haulouts, with a summary of basic characteristics on maximum haulout aggregation size, age-sex composition, season of use, and decade of most recent use. Citations to reports are provided in the appendix and as a bibliographic database. Haulouts were distributed across the coasts of the Pacific walrus range; however, the largest (maximum >10,000 walruses) of the haulouts reported in the recent 4 decades (n=19) were concentrated on the Russian shores in regions near the Bering Strait and northward into the western Chukchi Sea (n=17). Haulouts of adult female and young walruses primarily occurred in the Bering Strait region and areas northward, with others occurring in the central Bering Sea, Gulf of Anadyr, and Saint Lawrence Island regions. The Gulf of Anadyr was the only region to contain female and young walrus haulouts, which formed after the northward spring migration and prior to autumn ice formation.
Ecological selectivity of the emerging mass extinction in the oceans.
Payne, Jonathan L; Bush, Andrew M; Heim, Noel A; Knope, Matthew L; McCauley, Douglas J
2016-09-16
To better predict the ecological and evolutionary effects of the emerging biodiversity crisis in the modern oceans, we compared the association between extinction threat and ecological traits in modern marine animals to associations observed during past extinction events using a database of 2497 marine vertebrate and mollusc genera. We find that extinction threat in the modern oceans is strongly associated with large body size, whereas past extinction events were either nonselective or preferentially removed smaller-bodied taxa. Pelagic animals were victimized more than benthic animals during previous mass extinctions but are not preferentially threatened in the modern ocean. The differential importance of large-bodied animals to ecosystem function portends greater future ecological disruption than that caused by similar levels of taxonomic loss in past mass extinction events. Copyright © 2016, American Association for the Advancement of Science.
Earth observations taken during STS-79 mission
1996-09-25
STS079-785-103 (16-26 Sept. 1996) --- In this 70mm frame from the space shuttle Atlantis, the Brazilian state of Rondonia is featured. The photograph shows some of the major settlements and the habitat fragmentation caused by large agriculture programs in Brazil. The Rondonia state in southwestern Brazil is an area of about 240,000 square kilometers (92,000 square miles). Approximately 11.5% of the tropical forests in Rondonia have been cleared since 1970. There are indicators showing that roughly 20% of the cleared land is reverting back to scrub every year due to low fertility. Space Shuttle photography of this region has documented the forest clearing since the mid-1980s. This view adds to the large database of imagery, including other satellite-based imagery, and provides a natural color view of the region.
The Use of Intensity Scales In Exploiting Tsunami Historical Databases
NASA Astrophysics Data System (ADS)
Barberopoulou, A.; Scheele, F.
2015-12-01
Post-disaster assessments for historical tsunami events (>15 years old) are either scarce or contain limited information. In this study, we assess ways to examine tsunami impacts by utilizing data from old events, but more importantly we examine how best to utilize the information contained in tsunami historical databases in order to provide meaningful products that describe the impact of an event. To this end, a tsunami intensity scale was applied to two historical events observed in New Zealand (one local and one distant), in order to utilize the largest possible number of observations in our dataset. This is especially important for countries like New Zealand, where the tsunami historical record is short, going back only to the 19th century, and where instrument recordings are available only for the most recent events. We found that despite a number of challenges in using intensities, including uncertainties partly due to limitations of historical event data, these data, with the help of GIS tools, can be used to produce hazard maps and offer an alternative way to exploit tsunami historical records. Most importantly, assigning intensities at each point of observation allows many more observations to be utilized than if one depends on physical information alone, such as water heights. We hope these results may be used towards developing a well-defined methodology for hazard assessments, refining our knowledge of past tsunami events for which the tsunami sources are largely unknown, and covering cases where physical quantities describing the tsunami (e.g., water height, flood depth, run-up) are scarce.
Methods to Secure Databases Against Vulnerabilities
2015-12-01
for several languages such as C, C++, PHP, Java and Python [16]. MySQL will work well with very large databases. The documentation references... using Eclipse and connected to each database management system using Python and Java drivers provided by MySQL, MongoDB, and Datastax (for Cassandra)... tiers in Python and Java.

Problem                  MySQL       MongoDB     Cassandra
1. Injection
   a. Tautologies        Vulnerable  Vulnerable  Not Vulnerable
   b. Illegal query      ...
Verification of the databases EXFOR and ENDF
NASA Astrophysics Data System (ADS)
Berton, Gottfried; Damart, Guillaume; Cabellos, Oscar; Beauzamy, Bernard; Soppera, Nicolas; Bossant, Manuel
2017-09-01
The objective of this work is the verification of large experimental (EXFOR) and evaluated nuclear reaction databases (JEFF, ENDF, JENDL, TENDL…). The work is applied to neutron reactions in EXFOR data, including threshold reactions, isomeric transitions, angular distributions, and data in the resonance region of both isotopes and natural elements. Finally, a comparison of the resonance integrals compiled in the EXFOR database with those derived from the evaluated libraries is also performed.
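One concrete check of the kind mentioned is recomputing a dilute resonance integral, RI = ∫ σ(E) dE/E from 0.5 eV upward, from the pointwise cross sections of an evaluated library and comparing it with the value compiled in EXFOR. The sketch below does the quadrature on a synthetic 1/v cross section (real library data would be read in instead); the analytic answer for this shape is about 4.49 b.

```python
import numpy as np

def resonance_integral(energy_ev, sigma_b, e_lo=0.5, e_hi=1.0e5):
    m = (energy_ev >= e_lo) & (energy_ev <= e_hi)
    e, s = energy_ev[m], sigma_b[m]
    return np.trapz(s / e, e)                  # barns

e = np.logspace(np.log10(0.5), 5, 20_000)
sigma = 10.0 * np.sqrt(0.0253 / e)             # 1/v shape, 10 b at 0.0253 eV
print(resonance_integral(e, sigma))            # ~4.49 b
```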
Ogishima, Soichi; Takai, Takako; Shimokawa, Kazuro; Nagaie, Satoshi; Tanaka, Hiroshi; Nakaya, Jun
2015-01-01
The Tohoku Medical Megabank project is a national project for the revitalization of the Tohoku region disaster area damaged by the Great East Japan Earthquake, and it has conducted a large-scale prospective genome-cohort study. Along with the prospective genome-cohort study, we have developed an integrated database and knowledge base, which will be a key database for realizing personalized prevention and medicine.
The androgen receptor gene mutations database.
Gottlieb, B; Trifiro, M; Lumbroso, R; Pinsky, L
1997-01-01
The current version of the androgen receptor (AR) gene mutations database is described. The total number of reported mutations has risen from 212 to 272. We have expanded the database: (i) by adding a large amount of new data on somatic mutations in prostatic cancer tissue; (ii) by defining a new constitutional phenotype, mild androgen insensitivity (MAI); (iii) by placing additional relevant information on an internet site (http://www.mcgill.ca/androgendb/). The database has allowed us to examine the contribution of CpG sites to the multiplicity of reports of the same mutation in different families. The database is also available from EMBL (ftp.ebi.ac.uk/pub/databases/androgen) or as a Macintosh FileMaker Pro or Word file (MC33@musica.mcgill.ca).
Databases for multilevel biophysiology research available at Physiome.jp.
Asai, Yoshiyuki; Abe, Takeshi; Li, Li; Oka, Hideki; Nomura, Taishin; Kitano, Hiroaki
2015-01-01
Physiome.jp (http://physiome.jp) is a portal site inaugurated in 2007 to support model-based research in physiome and systems biology. At Physiome.jp, several tools and databases are available to support construction of physiological, multi-hierarchical, large-scale models. There are three databases in Physiome.jp, housing mathematical models, morphological data, and time-series data. In late 2013, the site was fully renovated, and in May 2015, new functions were implemented to provide information infrastructure to support collaborative activities for developing models and performing simulations within the database framework. This article describes updates to the databases implemented since 2013, including cooperation among the three databases, interactive model browsing, user management, version management of models, management of parameter sets, and interoperability with applications.
Java Web Simulation (JWS); a web based database of kinetic models.
Snoep, J L; Olivier, B G
2002-01-01
Software to make a database of kinetic models accessible via the internet has been developed and a core database has been set up at http://jjj.biochem.sun.ac.za/. This repository of models, available to everyone with internet access, opens a whole new way in which we can make our models public. Via the database, a user can change enzyme parameters and run time simulations or steady state analyses. The interface is user friendly and no additional software is necessary. The database currently contains 10 models, but since the generation of the program code to include new models has largely been automated the addition of new models is straightforward and people are invited to submit their models to be included in the database.
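In miniature, the kind of model the repository serves is a set of rate equations whose parameters a user can tweak before re-running a time course. The sketch below integrates a two-step Michaelis-Menten pathway with SciPy; the parameters are generic and not tied to any specific JWS entry.

```python
from scipy.integrate import solve_ivp

def rates(t, y, vmax1, km1, vmax2, km2):
    s, p = y                                   # substrate, intermediate
    v1 = vmax1 * s / (km1 + s)                 # enzyme 1 consumes s
    v2 = vmax2 * p / (km2 + p)                 # enzyme 2 consumes p
    return [-v1, v1 - v2]

# Change the kinetic parameters here and re-run, as a JWS user would.
sol = solve_ivp(rates, (0, 100), [5.0, 0.0], args=(1.0, 0.5, 0.7, 0.3))
print(sol.y[:, -1])                            # concentrations at t = 100
```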
Detecting errors and anomalies in computerized materials control and accountability databases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Whiteson, R.; Hench, K.; Yarbro, T.
The Automated MC and A Database Assessment project is aimed at improving anomaly and error detection in materials control and accountability (MC and A) databases and increasing confidence in the data that they contain. Anomalous data resulting in poor categorization of nuclear material inventories greatly reduces the value of the database information to users. Therefore it is essential that MC and A data be assessed periodically for anomalies or errors. Anomaly detection can identify errors in databases and thus provide assurance of the integrity of data. An expert system has been developed at Los Alamos National Laboratory that examines these large databases for anomalous or erroneous data. For several years, MC and A subject matter experts at Los Alamos have been using this automated system to examine the large amounts of accountability data that the Los Alamos Plutonium Facility generates. These data are collected and managed by the Material Accountability and Safeguards System, a near-real-time computerized nuclear material accountability and safeguards system. This year they have expanded the user base, customizing the anomaly detector for the varying requirements of different groups of users. This paper describes the progress in customizing the expert systems to the needs of the users of the data and reports on their results.
Hermjakob, Henning; Montecchi-Palazzi, Luisa; Bader, Gary; Wojcik, Jérôme; Salwinski, Lukasz; Ceol, Arnaud; Moore, Susan; Orchard, Sandra; Sarkans, Ugis; von Mering, Christian; Roechert, Bernd; Poux, Sylvain; Jung, Eva; Mersch, Henning; Kersey, Paul; Lappe, Michael; Li, Yixue; Zeng, Rong; Rana, Debashis; Nikolski, Macha; Husi, Holger; Brun, Christine; Shanker, K; Grant, Seth G N; Sander, Chris; Bork, Peer; Zhu, Weimin; Pandey, Akhilesh; Brazma, Alvis; Jacq, Bernard; Vidal, Marc; Sherman, David; Legrain, Pierre; Cesareni, Gianni; Xenarios, Ioannis; Eisenberg, David; Steipe, Boris; Hogue, Chris; Apweiler, Rolf
2004-02-01
A major goal of proteomics is the complete description of the protein interaction network underlying cell physiology. A large number of small scale and, more recently, large-scale experiments have contributed to expanding our understanding of the nature of the interaction network. However, the necessary data integration across experiments is currently hampered by the fragmentation of publicly available protein interaction data, which exists in different formats in databases, on authors' websites or sometimes only in print publications. Here, we propose a community standard data model for the representation and exchange of protein interaction data. This data model has been jointly developed by members of the Proteomics Standards Initiative (PSI), a work group of the Human Proteome Organization (HUPO), and is supported by major protein interaction data providers, in particular the Biomolecular Interaction Network Database (BIND), Cellzome (Heidelberg, Germany), the Database of Interacting Proteins (DIP), Dana Farber Cancer Institute (Boston, MA, USA), the Human Protein Reference Database (HPRD), Hybrigenics (Paris, France), the European Bioinformatics Institute's (EMBL-EBI, Hinxton, UK) IntAct, the Molecular Interactions (MINT, Rome, Italy) database, the Protein-Protein Interaction Database (PPID, Edinburgh, UK) and the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING, EMBL, Heidelberg, Germany).
NCBI2RDF: Enabling Full RDF-Based Access to NCBI Databases
Anguita, Alberto; García-Remesal, Miguel; de la Iglesia, Diana; Maojo, Victor
2013-01-01
RDF has become the standard technology for enabling interoperability among heterogeneous biomedical databases. The NCBI provides access to a large set of life sciences databases through a common interface called Entrez. However, the latter does not provide RDF-based access to such databases, and, therefore, they cannot be integrated with other RDF-compliant databases and accessed via SPARQL query interfaces. This paper presents the NCBI2RDF system, aimed at providing RDF-based access to the complete NCBI data repository. This API creates a virtual endpoint for servicing SPARQL queries over different NCBI repositories and presenting to users the query results in SPARQL results format, thus enabling this data to be integrated and/or stored with other RDF-compliant repositories. SPARQL queries are dynamically resolved, decomposed, and forwarded to the NCBI-provided E-utilities programmatic interface to access the NCBI data. Furthermore, we show how our approach increases the expressiveness of the native NCBI querying system, allowing several databases to be accessed simultaneously. This feature significantly boosts productivity when working with complex queries and saves time and effort to biomedical researchers. Our approach has been validated with a large number of SPARQL queries, thus proving its reliability and enhanced capabilities in biomedical environments. PMID:23984425
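At the bottom of such a decomposition, each fragment of the SPARQL query becomes a call to the E-utilities HTTP interface. The sketch below issues one esearch call (a real NCBI endpoint; the query term is just an example, and network access is required), which is the kind of primitive the virtual endpoint composes into full SPARQL result sets.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

params = urlencode({"db": "pubmed", "term": "nrf2[Title]",
                    "retmax": 5, "retmode": "json"})
url = f"https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?{params}"
with urlopen(url) as r:
    print(r.read()[:200])     # JSON envelope with matching PubMed IDs
```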
Roman, C; Scripcariu, L; Diaconescu, Rm; Grigoriu, A
2012-01-01
Biocides for prolonging the shelf life of a large variety of materials have been used extensively over the last decades. Worldwide biocide consumption was estimated at about 12.4 billion dollars in 2011, and was expected to increase in 2012. As biocides are substances we come into contact with in our everyday lives, access to this type of information is of paramount importance in order to ensure an appropriate living environment. Consequently, a database where information may be quickly processed, sorted, and easily accessed according to different search criteria is the most desirable solution. The main aim of this work was to design and implement a relational database with complete information about biocides used in public health management to improve the quality of life. The database was designed and implemented as a relational database using the software phpMyAdmin. The result is a database that allows for the efficient collection, storage, and management of information, including the chemical properties and applications of a large number of biocides, as well as its adequate dissemination into the public health environment. The information contained in the database presented herein promotes the adequate use of biocides by means of information technologies, which in consequence may help achieve important improvements in our quality of life.
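A minimal version of such a table and one of its search criteria might look like the sketch below; SQLite stands in for the MySQL/phpMyAdmin setup, and the field names are illustrative rather than the published schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE biocide (
                  name TEXT, cas_number TEXT,
                  product_type TEXT,      -- e.g. disinfectant, preservative
                  target_organisms TEXT)""")
db.execute("INSERT INTO biocide VALUES ('benzalkonium chloride', "
           "'63449-41-2', 'disinfectant', 'bacteria')")

# Search by application area, one plausible criterion among several.
print(db.execute("SELECT name, cas_number FROM biocide "
                 "WHERE product_type = ?", ("disinfectant",)).fetchall())
```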
VIEWCACHE: An incremental pointer-based access method for autonomous interoperable databases
NASA Technical Reports Server (NTRS)
Roussopoulos, N.; Sellis, Timos
1993-01-01
One of the biggest problems facing NASA today is to provide scientists efficient access to a large number of distributed databases. Our pointer-based incremental database access method, VIEWCACHE, provides such an interface for accessing distributed datasets and directories. VIEWCACHE allows database browsing and search to perform inter-database cross-referencing with no actual data movement between database sites. This organization and processing is especially suitable for managing astrophysics databases which are physically distributed all over the world. Once the search is complete, the set of collected pointers to the desired data is cached. VIEWCACHE includes spatial access methods for accessing image datasets, which provide much easier query formulation by referring directly to the image and very efficient search for objects contained within a two-dimensional window. We will develop and optimize a VIEWCACHE External Gateway Access to database management systems to facilitate database search.
Mining a human transcriptome database for Nrf2 modulators
Nuclear factor erythroid-2 related factor 2 (Nrf2) is a key transcription factor important in the protection against oxidative stress. We developed computational procedures to enable the identification of chemical, genetic and environmental modulators of Nrf2 in a large database ...
Task-Driven Dynamic Text Summarization
ERIC Educational Resources Information Center
Workman, Terri Elizabeth
2011-01-01
The objective of this work is to examine the efficacy of natural language processing (NLP) in summarizing bibliographic text for multiple purposes. Researchers have noted the accelerating growth of bibliographic databases. Information seekers using traditional information retrieval techniques when searching large bibliographic databases are often…
Mining The Sdss-moc Database For Main-belt Asteroid Solar Phase Behavior.
NASA Astrophysics Data System (ADS)
Truong, Thien-Tin; Hicks, M. D.
2010-10-01
The 4th Release of the Sloan Digital Sky Survey Moving Object Catalog (SDSS-MOC) contains 471569 moving object detections from 519 observing runs obtained up to March 2007. Of these, 220101 observations were linked with 104449 known small bodies, with 2150 asteroids sampled at least 10 times. It is our goal to mine this database in order to extract solar phase curve information for a large number of main-belt asteroids of different dynamical and taxonomic classes. We found that a simple linear phase curve fit allowed us to reject data contaminated by intrinsic rotational lightcurves and other effects. As expected, a running mean of the solar phase coefficient is strongly correlated with orbital elements, with the inner main belt dominated by bright S-type asteroids and transitioning to darker C- and D-type asteroids with steeper solar phase slopes. We shall fit the empirical H-G model to our 2150 multi-sampled asteroids and correlate these parameters with spectral type derived from the SDSS colors and position within the asteroid belt. Our data should also allow us to constrain solar phase reddening for a variety of taxonomic classes. We shall discuss errors induced by the standard "G=0.15" assumption made in absolute magnitude determination, which may slightly affect number-size distribution models.
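The H-G model referred to above has a standard closed form with the usual two-exponential approximation for the phase functions. The sketch below evaluates it and shows how much the default slope assumption matters at moderate phase angle; the H and G values are illustrative.

```python
import numpy as np

def hg_reduced_mag(alpha_deg, H, G):
    """Reduced magnitude at solar phase angle alpha (degrees) under the
    IAU H-G system, using the common exponential approximations."""
    a = np.radians(alpha_deg)
    phi1 = np.exp(-3.33 * np.tan(a / 2) ** 0.63)
    phi2 = np.exp(-1.87 * np.tan(a / 2) ** 1.22)
    return H - 2.5 * np.log10((1 - G) * phi1 + G * phi2)

# Magnitude error at 20 deg phase if G = 0.15 is assumed but G = 0.05 holds:
print(hg_reduced_mag(20.0, 15.0, 0.15) - hg_reduced_mag(20.0, 15.0, 0.05))
```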
Syllabus Computer in Astronomy
NASA Astrophysics Data System (ADS)
Hojaev, Alisher S.
2015-08-01
One of the most important and relevant subjects and training courses in the curricula for undergraduate students at the National University of Uzbekistan is 'Computer Methods in Astronomy'. It covers two semesters and includes both lecture and practice classes. Based on long-term experience, we prepared a tutorial for students which contains a description of modern computer applications in astronomy. The main directions of computer application in the field of astronomy are, briefly, as follows: 1) automating the process of observation, data acquisition, and processing; 2) creating and storing databases (the results of observations, experiments, and theoretical calculations), including their generalization, classification, and cataloging, and working with large databases; 3) solving theoretical problems (physical modeling, mathematical modeling of astronomical objects and phenomena, derivation of model parameters to obtain solutions of the corresponding equations, numerical simulations) and creating the appropriate software; 4) use in the educational process (e-textbooks, presentations, virtual labs, remote education, testing), amateur astronomy, and popularization of the science; 5) use as a means of communication and data transfer, presentation and dissemination of research results (web journals), and the creation of virtual information systems (local and global computer networks). During the classes, special attention is given to practical training and individual work by students, including independent work.
Development of damage probability matrices based on Greek earthquake damage data
NASA Astrophysics Data System (ADS)
Eleftheriadou, Anastasia K.; Karabinis, Athanasios I.
2011-03-01
A comprehensive study is presented for the empirical seismic vulnerability assessment of typical structural types, representative of the building stock of Southern Europe, based on a large set of damage statistics. The observational database was obtained from post-earthquake surveys carried out in the area struck by the September 7, 1999 Athens earthquake. After analysis of the collected observational data, a unified damage database was created comprising 180,945 damaged buildings from the near-field area of the earthquake. The damaged buildings are classified into specific structural types according to the materials, seismic codes, and construction techniques of Southern Europe. The seismic demand is described in terms of both the regional macroseismic intensity and the ratio α_g/a_o, where α_g is the maximum peak ground acceleration (PGA) of the earthquake event and a_o is the PGA value that characterizes each municipality on the Greek seismic hazard map. The relative and cumulative frequencies of the different damage states for each structural type and each intensity level are computed in terms of damage ratio. Damage probability matrices (DPMs) and vulnerability curves are derived for the specific structural types, and a comparative analysis is carried out between the produced vulnerability models and existing ones.
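To make the DPM computation concrete, the sketch below builds relative damage-state frequencies per intensity level from classified building counts, plus the exceedance probabilities that underlie vulnerability curves. The counts, damage states, and intensity levels here are synthetic placeholders, not the Athens survey data.

```python
# Illustrative sketch (synthetic counts, not the Athens survey data):
# build a damage probability matrix P(damage state | intensity) for one
# structural type from classified building counts.
import numpy as np

damage_states = ["None", "Slight", "Moderate", "Heavy", "Collapse"]
intensities = ["VI", "VII", "VIII", "IX"]

# counts[i][j] = surveyed buildings at intensity i in damage state j
counts = np.array([
    [900,  80,  15,   4,   1],
    [700, 180,  80,  30,  10],
    [400, 250, 200, 100,  50],
    [150, 200, 250, 250, 150],
], dtype=float)

# Relative frequencies: each row sums to 1 -> the DPM for this typology.
dpm = counts / counts.sum(axis=1, keepdims=True)

# Probability of reaching *at least* each damage state, the basis for
# fragility / vulnerability curves: P(DS >= j) = 1 - sum_{k < j} p_k.
exceedance = 1.0 - np.cumsum(dpm, axis=1) + dpm

for i, lvl in enumerate(intensities):
    row = ", ".join(f"{ds}: {p:.2f}" for ds, p in zip(damage_states, dpm[i]))
    print(f"I = {lvl}  ->  {row}")
print("P(DS >= Moderate):", np.round(exceedance[:, 2], 2))
```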
Ascoli, Davide; Vacchiano, Giorgio; Turco, Marco; Conedera, Marco; Drobyshev, Igor; Maringer, Janet; Motta, Renzo; Hacket-Pain, Andrew
2017-12-20
Climate teleconnections drive highly variable and synchronous seed production (masting) over large scales. Disentangling the effect of high-frequency (inter-annual variation) from low-frequency (decadal trend) components of climate oscillations will improve our understanding of masting as an ecosystem process. Using century-long observations of masting (the MASTREE database) and data on the North Atlantic Oscillation (NAO), we show that over the last 60 years both the high-frequency summer and spring NAO and the low-frequency winter NAO components are highly correlated with continent-wide masting in European beech and Norway spruce. Relationships are weaker (non-stationary) in the early twentieth century. This finding improves our understanding of how climate variation affects the large-scale synchronization of tree masting. Moreover, it supports the connection between the proximate and ultimate causes of masting: large-scale features of atmospheric circulation coherently drive the cues and resources for masting, as well as its evolutionary drivers, such as pollination efficiency, the abundance of seed dispersers, and natural disturbance regimes.
The prevalence and clinical characteristics of punding in Parkinson's disease.
Spencer, Ashley H; Rickards, Hugh; Fasano, Alfonso; Cavanna, Andrea E
2011-03-01
Punding (the display of stereotyped, repetitive behaviors) is a relatively recently recognized feature of Parkinson's disease (PD). Little is known about the prevalence and clinical characteristics of punding in PD. In this review, four large scientific databases were comprehensively searched for literature on punding prevalence and clinical correlates in the context of PD. Prevalence was found to vary greatly (from 0.34% to 14%), although there were large disparities in study populations, assessment methods, and criteria. We observed an association between punding, dopaminergic medications, and impulse control disorders. Other characteristics that may be more common among punders include greater severity of dyskinesia, younger age at disease onset, longer disease duration, and male gender. More research in large clinical datasets is required in many areas before firm conclusions can be drawn. The pathophysiology behind the punding phenomenon is also poorly understood at present, making it difficult to develop targeted therapy. The current mainstay of treatment is a reduction in the dose of dopaminergic medications; the evidence for other suggested therapies is purely empirical.
Fast 3D shape screening of large chemical databases through alignment-recycling
Fontaine, Fabien; Bolton, Evan; Borodina, Yulia; Bryant, Stephen H
2007-01-01
Background: Large chemical databases require fast, efficient, and simple ways of looking for similar structures. Although such tasks are now fairly well resolved for graph-based similarity queries, they remain an issue for 3D approaches, particularly those based on 3D shape overlays. Inspired by a recent technique developed to compare molecular shapes, we designed a hybrid methodology, alignment-recycling, that enables efficient retrieval and alignment of structures with similar 3D shapes. Results: Using a dataset of more than one million PubChem compounds of limited size (<28 heavy atoms) and flexibility (<6 rotatable bonds), we obtained a set of a few thousand diverse structures covering the entire 3D shape space of the dataset's conformers. Transformation matrices gathered from the overlays between these diverse structures and the 3D conformer dataset allowed us to reduce drastically (100-fold) the CPU time required for shape overlay. The alignment-recycling heuristic produces results consistent with de novo alignment calculation, with better than 80% hit-list overlap on average. Conclusion: Overlay-based 3D methods are computationally demanding when searching large databases. Alignment-recycling reduces the CPU time needed for shape similarity searches by breaking the alignment problem into three steps: selection of diverse shapes to describe the database shape space; overlay of the database conformers onto the diverse shapes; and non-optimized overlay of query and database conformers using common reference shapes. The precomputation required by the first two steps is a significant cost of the method; however, once it is performed, querying is two orders of magnitude faster. Extensions and variations of this methodology, for example to handle larger and more flexible small molecules, are discussed. PMID:17880744
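The transform-recycling step can be sketched directly: if a precomputed alignment maps the query onto a reference shape, and another maps a database conformer onto the same reference, composing the first with the inverse of the second overlays the query on the conformer with no new optimization. The sketch below assumes 4x4 homogeneous transformation matrices; the overlay engine that produces them (and its coordinate conventions) is assumed, not shown, and this is not the paper's actual code.

```python
# Minimal sketch of the alignment-recycling composition, using 4x4
# homogeneous transforms. The shape-overlay engine that yields each
# T (moving a structure into the reference frame) is assumed.
import numpy as np

def compose_overlay(T_query_to_ref, T_db_to_ref):
    """Place the query onto a database conformer *without* re-optimizing:
    query -> reference (precomputed), then reference -> conformer
    (inverse of the conformer's precomputed alignment to the reference).
    """
    return np.linalg.inv(T_db_to_ref) @ T_query_to_ref

def apply_transform(T, coords):
    """Apply a 4x4 homogeneous transform to an (N, 3) coordinate array."""
    homo = np.hstack([coords, np.ones((len(coords), 1))])
    return (homo @ T.T)[:, :3]
```

Under this scheme a query needs only one optimized alignment per diverse reference shape; every database conformer sharing that reference is then overlaid by matrix composition alone, which is where the reported 100-fold CPU-time reduction would come from.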
Hydra: a scalable proteomic search engine which utilizes the Hadoop distributed computing framework.
Lewis, Steven; Csordas, Attila; Killcoyne, Sarah; Hermjakob, Henning; Hoopmann, Michael R; Moritz, Robert L; Deutsch, Eric W; Boyle, John
2012-12-05
For shotgun mass spectrometry-based proteomics, the most computationally expensive step is matching the spectra against an increasingly large database of sequences and their post-translational modifications with known masses. Each mass spectrometer can generate data at an astonishingly high rate, and the scope of what is searched for is continually increasing; therefore, solutions for improving our ability to perform these searches are needed. We present a sequence database search engine that is specifically designed to run efficiently on the Hadoop MapReduce distributed computing framework. The search engine implements the K-score algorithm, generating comparable output for the same input files as the original implementation. The scalability of the system is demonstrated, and the architecture required for the development of such distributed processing is discussed. The software is scalable in its ability to handle a large peptide database, numerous modifications, and large numbers of spectra. Performance scales with the number of processors in the cluster, allowing throughput to expand with the available resources. PMID:23216909
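The map/reduce decomposition such an engine relies on can be illustrated in miniature: map peptides and spectra to precursor-mass bins, so that each reduce group scores only mass-compatible pairs. The sketch below runs in-process for illustration only; it is not Hydra's Hadoop code, and the scoring function is a trivial stand-in for K-score.

```python
# Schematic map/reduce decomposition of spectrum-vs-peptide matching,
# simulated in-process (Hydra itself runs on Hadoop; `score` below is a
# stand-in, not the actual K-score implementation).
from collections import defaultdict

BIN_WIDTH = 1.0  # Da; the key space that routes records to reducers

def mass_bin(mass):
    return int(mass / BIN_WIDTH)

def map_records(peptides, spectra):
    """Map phase: emit (mass-bin, record), duplicating each spectrum
    into neighboring bins to allow a precursor-mass tolerance."""
    for pep in peptides:
        yield mass_bin(pep["mass"]), ("P", pep)
    for spec in spectra:
        b = mass_bin(spec["precursor_mass"])
        for key in (b - 1, b, b + 1):
            yield key, ("S", spec)

def score(spec, pep):
    # Stand-in for a real fragment-matching score such as K-score.
    return -abs(spec["precursor_mass"] - pep["mass"])

def reduce_bin(records):
    """Reduce phase: score each spectrum against every peptide that
    landed in the same mass bin; keep the best match."""
    peps = [r for tag, r in records if tag == "P"]
    specs = [r for tag, r in records if tag == "S"]
    for spec in specs:
        scored = [(score(spec, pep), pep["sequence"]) for pep in peps]
        if scored:
            yield spec["id"], max(scored)

# Shuffle/sort simulated with a dict; on Hadoop the framework does this.
peptides = [{"sequence": "PEPTIDE", "mass": 799.36}]
spectra = [{"id": "scan_001", "precursor_mass": 799.4}]
groups = defaultdict(list)
for key, rec in map_records(peptides, spectra):
    groups[key].append(rec)
for key, recs in groups.items():
    for spec_id, (s, seq) in reduce_bin(recs):
        print(spec_id, seq, f"{s:.2f}")
```

Because the mass bins partition the work, throughput grows with the number of reducers, which is consistent with the cluster-scaling behavior the abstract reports.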
Intelligent Interfaces for Mining Large-Scale RNAi-HCS Image Databases
Lin, Chen; Mak, Wayne; Hong, Pengyu; Sepp, Katharine; Perrimon, Norbert
2010-01-01
Recently, high-content screening (HCS) has been combined with RNA interference (RNAi) to become an essential image-based high-throughput method for studying genes and biological networks through RNAi-induced cellular phenotype analyses. However, a genome-wide RNAi-HCS screen typically generates tens of thousands of images, most of which remain uncategorized due to the inadequacies of existing HCS image-analysis tools. Until now, it has required highly trained scientists to browse a prohibitively large RNAi-HCS image database and produce only a handful of qualitative results regarding cellular morphological phenotypes. For this reason we have developed intelligent interfaces to facilitate the application of HCS technology in biomedical research. Our new interfaces empower biologists not only to explore large-scale RNAi-HCS image databases effectively and efficiently, but also to apply their knowledge and experience to the interactive mining of cellular phenotypes using Content-Based Image Retrieval (CBIR) with Relevance Feedback (RF) techniques. PMID:21278820
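A minimal sketch of the relevance-feedback loop named above: a classic Rocchio-style update shifts the query feature vector toward images the user marks relevant and away from those marked irrelevant, then re-ranks by cosine similarity. The feature dimensions, weights, and synthetic data are illustrative assumptions, not the system's actual CBIR/RF implementation.

```python
# Sketch of CBIR relevance feedback via a Rocchio-style update.
# Image feature vectors (e.g., cell-morphology descriptors) are
# assumed to be precomputed.
import numpy as np

def rocchio_update(query, relevant, irrelevant,
                   alpha=1.0, beta=0.75, gamma=0.25):
    """Refine a query feature vector from user feedback."""
    q = alpha * query
    if len(relevant):
        q += beta * np.mean(relevant, axis=0)
    if len(irrelevant):
        q -= gamma * np.mean(irrelevant, axis=0)
    return q

def rank(query, database):
    """Return database indices ordered by cosine similarity to query."""
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    return np.argsort(db @ q)[::-1]

# One feedback round on synthetic 16-D features for 1000 "images".
rng = np.random.default_rng(1)
features = rng.normal(size=(1000, 16))
query = features[0].copy()
top = rank(query, features)[:10]
# Suppose the user marks the first 3 hits relevant, the rest irrelevant.
query = rocchio_update(query, features[top[:3]], features[top[3:]])
print(rank(query, features)[:5])
```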
Exploring Human Cognition Using Large Image Databases.
Griffiths, Thomas L; Abbott, Joshua T; Hsu, Anne S
2016-07-01
Most cognitive psychology experiments evaluate models of human cognition using a relatively small, well-controlled set of stimuli. This approach stands in contrast to current work in neuroscience, perception, and computer vision, which has begun to focus on using large databases of natural images. We argue that natural images provide a powerful tool for characterizing the statistical environment in which people operate, for better evaluating psychological theories, and for bringing the insights of cognitive science closer to real applications. We discuss how some of the challenges of using natural images as stimuli in experiments can be addressed through increased sample sizes, representations from computer vision, and new experimental methods. Finally, we illustrate these points by summarizing recent work that uses large image databases to explore questions about human cognition in four domains: modeling subjective randomness, defining a quantitative measure of representativeness, identifying the prior knowledge used in word learning, and determining the structure of natural categories.