2017-09-01
NAVAL POSTGRADUATE SCHOOL, MONTEREY, CALIFORNIA. Thesis: Database Creation and Statistical Analysis: Finding Connections between Two or More Secondary... Approved for public release; distribution is unlimited. (Front-matter excerpt; table of contents includes 1.1 Problem and Motivation, 1.2 DOD Applicability, 1.3 Research...)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kogalovskii, M.R.
This paper reviews problems related to statistical database systems, which are widespread in various fields of activity. Statistical databases (SDBs) are databases whose contents are used for statistical analysis. Topics under consideration are: SDB peculiarities, properties of data models adequate for SDB requirements, metadata functions, null-value problems, SDB compromise protection problems, stored-data compression techniques, and statistical data representation means. The paper also examines whether present Database Management Systems (DBMSs) satisfy SDB requirements, and considers current research directions in SDB systems.
Lü, Yiran; Hao, Shuxin; Zhang, Guoqing; Liu, Jie; Liu, Yue; Xu, Dongqun
2018-01-01
This work implements an online statistical analysis function in an information system for air pollution and health impact monitoring, so that data analysis results are available in real time. Descriptive statistics, time-series analysis and multivariate regression analysis were implemented online using SQL and visual tools on top of the database software. The system generates basic statistical tables and summary tables of air pollution exposure and health impact data online, generates trend charts for each data component with interactive connection to the database, and generates export sheets that can be loaded directly into R, SAS and SPSS. With the online statistical analysis function, the information system for air pollution and health impact monitoring can provide real-time analysis results to its users.
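As a rough illustration of the kind of online summary described above (an assumed workflow only; the table name exposure_records, the pm25 column and the synthetic records are hypothetical, not the actual system), the sketch below builds a per-site summary table and a monthly trend series from a SQL-queried monitoring table and exports a sheet that R, SAS or SPSS could read directly:

```python
import sqlite3
import numpy as np
import pandas as pd

# Hypothetical stand-in for the monitoring database (the real system is not public).
rng = np.random.default_rng(0)
records = pd.DataFrame({
    "site": np.repeat(["A", "B"], 180),
    "obs_date": list(pd.date_range("2017-01-01", periods=180, freq="D")) * 2,
    "pm25": rng.gamma(shape=4.0, scale=15.0, size=360),
})
conn = sqlite3.connect(":memory:")
records.to_sql("exposure_records", conn, index=False)

df = pd.read_sql_query("SELECT site, obs_date, pm25 FROM exposure_records",
                       conn, parse_dates=["obs_date"])
summary = df.groupby("site")["pm25"].describe()              # basic statistical table
trend = (df.set_index("obs_date").groupby("site")["pm25"]
           .resample("MS").mean().unstack("site"))           # monthly tendency data
trend.to_csv("pm25_monthly_trend.csv")                       # sheet readable by R/SAS/SPSS
print(summary)
```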
Assigning statistical significance to proteotypic peptides via database searches
Alves, Gelio; Ogurtsov, Aleksey Y.; Yu, Yi-Kuo
2011-01-01
Querying MS/MS spectra against a database containing only proteotypic peptides reduces data analysis time due to the reduction of database size. Despite the speed advantage, this search strategy is challenged by issues of statistical significance and coverage. The former requires separating systematically significant identifications from less confident identifications, while the latter arises when the underlying peptide is not present, due to single amino acid polymorphisms (SAPs) or post-translational modifications (PTMs), in the proteotypic peptide libraries searched. To address both issues simultaneously, we have extended RAId’s knowledge database to include proteotypic information, utilized RAId’s statistical strategy to assign statistical significance to proteotypic peptides, and modified RAId’s programs to allow for consideration of proteotypic information during database searches. The extended database alleviates the coverage problem since all annotated modifications, even those occurring within proteotypic peptides, may be considered. Taking into account the likelihoods of observation, the statistical strategy of RAId provides accurate E-value assignments regardless of whether a candidate peptide is proteotypic or not. The advantage of including proteotypic information is evidenced by its superior retrieval performance when compared to regular database searches. PMID:21055489
[Construction and application of a spatial analysis database of geoherbs based on 3S technology].
Guo, Lan-ping; Huang, Lu-qi; Lv, Dong-mei; Shao, Ai-juan; Wang, Jian
2007-09-01
In this paper, the structure, data sources and data codes of "the spatial analysis database of geoherbs", based on 3S technology, are introduced, and the essential functions of the database, such as data management, remote sensing, spatial interpolation, spatial statistics, spatial analysis and further development, are described. Finally, two examples of database usage are given: the first is the classification and calculation of the NDVI index from remote sensing images of the geoherbal area of Atractylodes lancea, and the second is an adaptation analysis of A. lancea. These examples indicate that "the spatial analysis database of geoherbs" has bright prospects in the spatial analysis of geoherbs.
A RESEARCH DATABASE FOR IMPROVED DATA MANAGEMENT AND ANALYSIS IN LONGITUDINAL STUDIES
BIELEFELD, ROGER A.; YAMASHITA, TOYOKO S.; KEREKES, EDWARD F.; ERCANLI, EHAT; SINGER, LYNN T.
2014-01-01
We developed a research database for a five-year prospective investigation of the medical, social, and developmental correlates of chronic lung disease during the first three years of life. We used the Ingres database management system and the Statit statistical software package. The database includes records containing 1300 variables each, the results of 35 psychological tests, each repeated five times (providing longitudinal data on the child, the parents, and behavioral interactions), both raw and calculated variables, and both missing and deferred values. The four-layer menu-driven user interface incorporates automatic activation of complex functions to handle data verification, missing and deferred values, static and dynamic backup, determination of calculated values, display of database status, reports, bulk data extraction, and statistical analysis. PMID:7596250
Upgrade Summer Severe Weather Tool
NASA Technical Reports Server (NTRS)
Watson, Leela
2011-01-01
The goal of this task was to upgrade the existing severe weather database by adding observations from the 2010 warm season, update the verification dataset with results from the 2010 warm season, apply statistical logistic regression analysis to the database, and develop a new forecast tool. The AMU analyzed 7 stability parameters that showed the possibility of providing guidance in forecasting severe weather, calculated verification statistics for the Total Threat Score (TTS), and calculated warm season verification statistics for the 2010 season. The AMU also performed statistical logistic regression analysis on the 22-year severe weather database. The results indicated that the logistic regression equation did not show an increase in skill over the previously developed TTS. The equation showed less accuracy than TTS at predicting severe weather, little ability to distinguish between severe and non-severe weather days, and worse standard categorical accuracy measures and skill scores than TTS.
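The following is a minimal sketch, on synthetic data rather than the AMU's 22-year database, of the analysis pattern described: fit a logistic regression to stability parameters and score the resulting forecasts with standard categorical measures (POD, FAR, Heidke skill score):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 7))                 # 7 hypothetical stability parameters
p_true = 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1])))
y = rng.binomial(1, p_true)                   # 1 = severe weather day

model = LogisticRegression().fit(X, y)
pred = (model.predict_proba(X)[:, 1] >= 0.5).astype(int)

hits = np.sum((pred == 1) & (y == 1))
misses = np.sum((pred == 0) & (y == 1))
false_alarms = np.sum((pred == 1) & (y == 0))
correct_nulls = np.sum((pred == 0) & (y == 0))

pod = hits / (hits + misses)                  # probability of detection
far = false_alarms / (hits + false_alarms)    # false alarm ratio
n = hits + misses + false_alarms + correct_nulls
expected = ((hits + misses) * (hits + false_alarms)
            + (correct_nulls + misses) * (correct_nulls + false_alarms)) / n
hss = (hits + correct_nulls - expected) / (n - expected)   # Heidke skill score
print(f"POD={pod:.2f}  FAR={far:.2f}  HSS={hss:.2f}")
```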
A New Methodology for Systematic Exploitation of Technology Databases.
ERIC Educational Resources Information Center
Bedecarrax, Chantal; Huot, Charles
1994-01-01
Presents the theoretical aspects of a data analysis methodology that can help transform sequential raw data from a database into useful information, using the statistical analysis of patents as an example. Topics discussed include relational analysis and a technology watch approach. (Contains 17 references.) (LRW)
Valid Statistical Analysis for Logistic Regression with Multiple Sources
NASA Astrophysics Data System (ADS)
Fienberg, Stephen E.; Nardi, Yuval; Slavković, Aleksandra B.
Considerable effort has gone into understanding issues of privacy protection of individual information in single databases, and various solutions have been proposed depending on the nature of the data, the ways in which the database will be used and the precise nature of the privacy protection being offered. Once data are merged across sources, however, the nature of the problem becomes far more complex, and a number of privacy issues arise for the linked individual files that go well beyond those considered with regard to the data within individual sources. In this paper, we propose an approach that provides a full statistical analysis of the combined database without actually combining it. We focus mainly on logistic regression, but the method and tools described may be applied to essentially any other statistical model as well.
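To make the idea of analysis-without-pooling concrete, here is a simplified sketch (not the authors' privacy-preserving protocol) for the horizontally partitioned case: each source shares only its aggregate gradient and Hessian contributions, and a Newton-Raphson update reproduces the pooled logistic regression fit without any row-level records changing hands:

```python
import numpy as np

rng = np.random.default_rng(2)
sources = []
for _ in range(3):                                   # three hypothetical databases
    X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
    y = rng.binomial(1, 1 / (1 + np.exp(-(X @ np.array([-0.3, 1.0, -0.7])))))
    sources.append((X, y))

beta = np.zeros(3)
for _ in range(25):                                  # Newton-Raphson on pooled aggregates
    grad = np.zeros(3)
    hess = np.zeros((3, 3))
    for X, y in sources:                             # each site shares only these sums
        p = 1 / (1 + np.exp(-(X @ beta)))
        grad += X.T @ (y - p)
        hess += X.T @ (X * (p * (1 - p))[:, None])
    beta += np.linalg.solve(hess, grad)
print("pooled-equivalent coefficients:", np.round(beta, 3))
```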
Safety Management Information Statistics (SAMIS) - 1995 Annual Report
DOT National Transportation Integrated Search
1997-04-01
The Safety Management Information Statistics 1995 Annual Report is a compilation and analysis of transit accident, casualty and crime statistics reported under the Federal Transit Administration's National Transit Database Reporting by transit system...
Statistical Learning in Specific Language Impairment: A Meta-Analysis
ERIC Educational Resources Information Center
Lammertink, Imme; Boersma, Paul; Wijnen, Frank; Rispens, Judith
2017-01-01
Purpose: The current meta-analysis provides a quantitative overview of published and unpublished studies on statistical learning in the auditory verbal domain in people with and without specific language impairment (SLI). The database used for the meta-analysis is accessible online and open to updates (Community-Augmented Meta-Analysis), which…
NASA Astrophysics Data System (ADS)
Barette, Florian; Poppe, Sam; Smets, Benoît; Benbakkar, Mhammed; Kervyn, Matthieu
2017-10-01
We present an integrated, spatially-explicit database of existing geochemical major-element analyses available from (post-) colonial scientific reports, PhD Theses and international publications for the Virunga Volcanic Province, located in the western branch of the East African Rift System. This volcanic province is characterised by alkaline volcanism, including silica-undersaturated, alkaline and potassic lavas. The database contains a total of 908 geochemical analyses of eruptive rocks for the entire volcanic province with a localisation for most samples. A preliminary analysis of the overall consistency of the database, using statistical techniques on sets of geochemical analyses with contrasted analytical methods or dates, demonstrates that the database is consistent. We applied a principal component analysis and cluster analysis on whole-rock major element compositions included in the database to study the spatial variation of the chemical composition of eruptive products in the Virunga Volcanic Province. These statistical analyses identify spatially distributed clusters of eruptive products. The known geochemical contrasts are highlighted by the spatial analysis, such as the unique geochemical signature of Nyiragongo lavas compared to other Virunga lavas, the geochemical heterogeneity of the Bulengo area, and the trachyte flows of Karisimbi volcano. Most importantly, we identified separate clusters of eruptive products which originate from primitive magmatic sources. These lavas of primitive composition are preferentially located along NE-SW inherited rift structures, often at distance from the central Virunga volcanoes. Our results illustrate the relevance of a spatial analysis on integrated geochemical data for a volcanic province, as a complement to classical petrological investigations. This approach indeed helps to characterise geochemical variations within a complex of magmatic systems and to identify specific petrologic and geochemical investigations that should be tackled within a study area.
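The sketch below shows the general PCA-plus-clustering pattern on a stand-in table of whole-rock major-element compositions; the oxide columns and random values are placeholders, not the Virunga database:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

oxides = ["SiO2", "TiO2", "Al2O3", "FeOt", "MgO", "CaO", "Na2O", "K2O"]
rng = np.random.default_rng(3)
compositions = pd.DataFrame(rng.normal(size=(120, len(oxides))), columns=oxides)

scores = PCA(n_components=3).fit_transform(
    StandardScaler().fit_transform(compositions))
clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(scores)
compositions["cluster"] = clusters      # clusters could then be mapped spatially
print(compositions["cluster"].value_counts())
```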
Stucki, Sheldon Lee; Biss, David J.
2000-01-01
An analysis was performed using the National Automotive Sampling System Crashworthiness Data System (NASS-CDS) database to compare the injury/fatality rates of variously restrained drivers with those of unrestrained drivers, both across the total database of drivers in frontal crashes and by Delta-V. A structured search of the NASS-CDS was done using the SAS® statistical analysis software to extract the data for this analysis, and the SUDAAN software package was used to arrive at statistical significance indicators. In addition, this paper investigates different methods for presenting the results of accident database searches, including significance results; a risk versus Delta-V format for specific exposures; and a percent cumulative injury versus Delta-V format to characterize injury trends. These alternative analysis presentation methods are then discussed by example using the present study results. PMID:11558105
NASA Astrophysics Data System (ADS)
Hendikawati, P.; Arifudin, R.; Zahid, M. Z.
2018-03-01
This study aims to design an Android statistics data analysis application that can be accessed through mobile devices, making it easier for users to access statistical tools. The application covers basic statistics topics together with parametric statistical data analysis. Its output is parametric statistical analysis that can be used by students, lecturers, and other users who need the results of statistical calculations quickly and in an easily understood form. The Android application is developed in the Java programming language; the server side uses PHP with the CodeIgniter framework, and the database is MySQL. The system development methodology is the waterfall methodology, with stages of analysis, design, coding, testing, implementation and system maintenance. This statistical data analysis application is expected to support statistics lectures and make it easier for students to understand statistical analysis on mobile devices.
The forest inventory and analysis database description and users manual version 1.0
Patrick D. Miles; Gary J. Brand; Carol L. Alerich; Larry F. Bednar; Sharon W. Woudenberg; Joseph F. Glover; Edward N. Ezell
2001-01-01
Describes the structure of the Forest Inventory and Analysis Database (FIADB) and provides information on generating estimates of forest statistics from these data. The FIADB structure provides a consistent framework for storing forest inventory data across all ownerships across the entire United States. These data are available to the public.
New software for statistical analysis of Cambridge Structural Database data
Sykes, Richard A.; McCabe, Patrick; Allen, Frank H.; Battle, Gary M.; Bruno, Ian J.; Wood, Peter A.
2011-01-01
A collection of new software tools is presented for the analysis of geometrical, chemical and crystallographic data from the Cambridge Structural Database (CSD). This software supersedes the program Vista. The new functionality is integrated into the program Mercury in order to provide statistical, charting and plotting options alongside three-dimensional structural visualization and analysis. The integration also permits immediate access to other information about specific CSD entries through the Mercury framework, a common requirement in CSD data analyses. In addition, the new software includes a range of more advanced features focused towards structural analysis such as principal components analysis, cone-angle correction in hydrogen-bond analyses and the ability to deal with topological symmetry that may be exhibited in molecular search fragments. PMID:22477784
Can Money Buy Happiness? A Statistical Analysis of Predictors for User Satisfaction
ERIC Educational Resources Information Center
Hunter, Ben; Perret, Robert
2011-01-01
2007 data from LibQUAL+[TM] and the ACRL Library Trends and Statistics database were analyzed to determine if there is a statistically significant correlation between library expenditures and usage statistics and library patron satisfaction across 73 universities. The results show that users of larger, better funded libraries have higher…
NASA Astrophysics Data System (ADS)
Karpov, A. V.; Yumagulov, E. Z.
2003-05-01
We have restored and ordered the archive of meteor observations carried out with a meteor radar complex ``KGU-M5'' since 1986. A relational database has been formed under the control of the Database Management System (DBMS) Oracle 8. We also improved and tested a statistical method for studying the fine spatial structure of meteor streams with allowance for the specific features of application of the DBMS. Statistical analysis of the results of observations made it possible to obtain information about the substance distribution in the Quadrantid, Geminid, and Perseid meteor streams.
NASA Technical Reports Server (NTRS)
Herskovits, E. H.; Megalooikonomou, V.; Davatzikos, C.; Chen, A.; Bryan, R. N.; Gerring, J. P.
1999-01-01
PURPOSE: To determine whether there is an association between the spatial distribution of lesions detected at magnetic resonance (MR) imaging of the brain in children after closed-head injury and the development of secondary attention-deficit/hyperactivity disorder (ADHD). MATERIALS AND METHODS: Data obtained from 76 children without prior history of ADHD were analyzed. MR images were obtained 3 months after closed-head injury. After manual delineation of lesions, images were registered to the Talairach coordinate system. For each subject, registered images and secondary ADHD status were integrated into a brain-image database, which contains depiction (visualization) and statistical analysis software. Using this database, we assessed visually the spatial distributions of lesions and performed statistical analysis of image and clinical variables. RESULTS: Of the 76 children, 15 developed secondary ADHD. Depiction of the data suggested that children who developed secondary ADHD had more lesions in the right putamen than children who did not develop secondary ADHD; this impression was confirmed statistically. After Bonferroni correction, we could not demonstrate significant differences between secondary ADHD status and lesion burdens for the right caudate nucleus or the right globus pallidus. CONCLUSION: Closed-head injury-induced lesions in the right putamen in children are associated with subsequent development of secondary ADHD. Depiction software is useful in guiding statistical analysis of image data.
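For the kind of region-by-region association test described (lesion presence versus secondary ADHD status, with Bonferroni correction over the regions examined), a minimal sketch with invented counts might look like this:

```python
from scipy.stats import fisher_exact

n_regions_tested = 3                      # e.g. putamen, caudate, globus pallidus
# rows: lesion present / absent; columns: secondary ADHD yes / no (hypothetical counts)
table = [[8, 10],
         [7, 51]]
odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
p_bonferroni = min(1.0, p_value * n_regions_tested)
print(f"OR={odds_ratio:.2f}  raw p={p_value:.3f}  Bonferroni p={p_bonferroni:.3f}")
```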
DOT National Transportation Integrated Search
2014-12-01
The Bureau of Transportation Statistics (BTS) leads in the collection, analysis, and dissemination of transportation data. The Intermodal Passenger Connectivity Database : (ICPD) is an ongoing data collection that measures the degree of connectivity ...
Analysis of the Database of Theses and Dissertations from DME/UFSCAR about Astronomy Education
NASA Astrophysics Data System (ADS)
Rodrigues Ferreira, Orlando; Voelzke, Marcos Rincon
2013-11-01
The paper presents a brief analysis of the "Database of Theses and Dissertations about Astronomy Education" from the Department of Teaching Methodology (DME) of the Federal University of São Carlos (UFSCar). This study made it possible to develop new analyses and statistical data, as well as to rank the Brazilian institutions that produce academic work in the area.
Mallik, Saurav; Maulik, Ujjwal
2015-10-01
Gene ranking is an important problem in bioinformatics. Here, we propose a new framework for ranking biomolecules (viz., miRNAs, transcription factors/TFs and genes) in a multi-informative uterine leiomyoma dataset having both gene expression and methylation data, using a (statistical) eigenvector centrality based approach. At first, genes that are both differentially expressed and methylated are identified using the Limma statistical test. A network comprising these genes, corresponding TFs from the TRANSFAC and ITFP databases, and targeting miRNAs from the miRWalk database is then built. The biomolecules are then ranked based on eigenvector centrality. Our proposed method provides better average accuracy in hub gene and non-hub gene classification than other methods. Furthermore, pre-ranked gene set enrichment analysis is applied to the pathway database as well as the GO-term databases of the Molecular Signatures Database, using pre-ranked gene lists based on different centrality values, to compare the ranking methods. Finally, top novel potential gene markers for uterine leiomyoma are provided. Copyright © 2015 Elsevier Inc. All rights reserved.
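A toy sketch of the centrality-based ranking step (the node names and edges below are invented, not the gene/TF/miRNA network from the study):

```python
import networkx as nx

edges = [("TF1", "geneA"), ("TF1", "geneB"), ("miR-1", "geneA"),
         ("miR-2", "geneB"), ("geneA", "geneC"), ("geneB", "geneC")]
g = nx.Graph(edges)                               # undirected toy interaction network
centrality = nx.eigenvector_centrality(g, max_iter=1000)
ranking = sorted(centrality.items(), key=lambda kv: kv[1], reverse=True)
for node, score in ranking:                       # highest-centrality biomolecules first
    print(f"{node:7s} {score:.3f}")
```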
Database Performance Monitoring for the Photovoltaic Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Klise, Katherine A.
The Database Performance Monitoring (DPM) software (copyright in process) is being developed at Sandia National Laboratories to perform quality control analysis on time series data. The software loads time-indexed databases (currently csv format), performs a series of quality control tests defined by the user, and creates reports which include summary statistics, tables, and graphics. DPM can be set up to run on an automated schedule defined by the user. For example, the software can be run once per day to analyze data collected on the previous day. HTML formatted reports can be sent via email or hosted on a website. To compare performance of several databases, summary statistics and graphics can be gathered in a dashboard view which links to detailed reporting information for each database. The software can be customized for specific applications.
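A minimal sketch of the type of check such a tool performs, on a synthetic time-indexed series (the column name, limits and injected faults are assumptions, not the actual DPM configuration):

```python
import numpy as np
import pandas as pd

index = pd.date_range("2017-06-01", periods=288, freq="5min")
rng = np.random.default_rng(4)
data = pd.DataFrame({"dc_power_kw": rng.normal(50, 5, size=288)}, index=index)
data.iloc[40:45] = np.nan                      # simulated data gap
data.iloc[100] = 250                           # simulated out-of-range value

missing = data["dc_power_kw"].isna().sum()
out_of_range = ((data["dc_power_kw"] < 0) | (data["dc_power_kw"] > 120)).sum()
summary = data["dc_power_kw"].describe()       # summary statistics for the report
print(f"missing samples: {missing}, range violations: {out_of_range}")
print(summary)
```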
Evaluating the Impact of Database Heterogeneity on Observational Study Results
Madigan, David; Ryan, Patrick B.; Schuemie, Martijn; Stang, Paul E.; Overhage, J. Marc; Hartzema, Abraham G.; Suchard, Marc A.; DuMouchel, William; Berlin, Jesse A.
2013-01-01
Clinical studies that use observational databases to evaluate the effects of medical products have become commonplace. Such studies begin by selecting a particular database, a decision that published papers invariably report but do not discuss. Studies of the same issue in different databases, however, can and do generate different results, sometimes with strikingly different clinical implications. In this paper, we systematically study heterogeneity among databases, holding other study methods constant, by exploring relative risk estimates for 53 drug-outcome pairs and 2 widely used study designs (cohort studies and self-controlled case series) across 10 observational databases. When holding the study design constant, our analysis shows that estimated relative risks range from a statistically significant decreased risk to a statistically significant increased risk in 11 of 53 (21%) of drug-outcome pairs that use a cohort design and 19 of 53 (36%) of drug-outcome pairs that use a self-controlled case series design. This exceeds the proportion of pairs that were consistent across databases in both direction and statistical significance, which was 9 of 53 (17%) for cohort studies and 5 of 53 (9%) for self-controlled case series. Our findings show that clinical studies that use observational databases can be sensitive to the choice of database. More attention is needed to consider how the choice of data source may be affecting results. PMID:23648805
Titulaer, Mark K; Siccama, Ivar; Dekker, Lennard J; van Rijswijk, Angelique LCT; Heeren, Ron MA; Sillevis Smitt, Peter A; Luider, Theo M
2006-01-01
Background Statistical comparison of peptide profiles in biomarker discovery requires fast, user-friendly software for high throughput data analysis. Important features are flexibility in changing input variables and statistical analysis of peptides that are differentially expressed between patient and control groups. In addition, integration of the mass spectrometry data with the results of other experiments, such as microarray analysis, and with information from other databases requires central storage of the profile matrix, where protein IDs can be added to peptide masses of interest. Results A new database application is presented, to detect and identify significantly differentially expressed peptides in peptide profiles obtained from body fluids of patient and control groups. The presented modular software is capable of central storage of mass spectra and results in fast analysis. The software architecture consists of 4 pillars: 1) a Graphical User Interface written in Java, 2) a MySQL database, which contains all metadata, such as experiment numbers and sample codes, 3) a FTP (File Transport Protocol) server to store all raw mass spectrometry files and processed data, and 4) the software package R, which is used for modular statistical calculations, such as the Wilcoxon-Mann-Whitney rank sum test. Statistical analysis by the Wilcoxon-Mann-Whitney test in R demonstrates that peptide profiles of two patient groups, 1) breast cancer patients with leptomeningeal metastases and 2) prostate cancer patients in end stage disease, can be distinguished from those of control groups. Conclusion The database application is capable of distinguishing patient Matrix Assisted Laser Desorption Ionization (MALDI-TOF) peptide profiles from those of control groups using large size datasets. The modular architecture of the application makes it possible to adapt the application to handle also large sized data from MS/MS and Fourier Transform Ion Cyclotron Resonance (FT-ICR) mass spectrometry experiments. It is expected that the higher resolution and mass accuracy of FT-ICR mass spectrometry prevents the clustering of peaks of different peptides and allows the identification of differentially expressed proteins from the peptide profiles. PMID:16953879
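A compact sketch of the per-peak statistical comparison described, using simulated intensities rather than the MALDI-TOF profiles (the Wilcoxon-Mann-Whitney test mirrors the R step; the Bonferroni threshold is an added assumption):

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(5)
n_peaks = 200
patients = rng.lognormal(mean=0.0, sigma=1.0, size=(30, n_peaks))
controls = rng.lognormal(mean=0.0, sigma=1.0, size=(30, n_peaks))
patients[:, 0] *= 3.0                          # one artificially up-regulated peak

p_values = np.array([
    mannwhitneyu(patients[:, j], controls[:, j], alternative="two-sided").pvalue
    for j in range(n_peaks)])
# Bonferroni-style guard against testing many peaks at once
significant = np.flatnonzero(p_values < 0.05 / n_peaks)
print("differentially expressed peak indices:", significant)
```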
Aßmann, C
2016-06-01
Besides large efforts regarding field work, the provision of valid databases requires statistical and informational infrastructure to enable long-term access to longitudinal data sets on height, weight and related issues. To foster use of longitudinal data sets within the scientific community, provision of valid databases has to address data-protection regulations. It is, therefore, of major importance to prevent the identification of individuals from publicly available databases. To reach this goal, one possible strategy is to provide a synthetic database to the public, allowing for pretesting of strategies for data analysis. The synthetic databases can be established using multiple imputation tools. Once an analysis strategy is approved, verification is based on the original data. Multiple imputation by chained equations is illustrated to facilitate the provision of synthetic databases, as it allows for capturing a wide range of statistical interdependencies. Missing values, typically occurring within longitudinal databases for reasons of item non-response, can also be addressed via multiple imputation when providing databases. The provision of synthetic databases using multiple imputation techniques is one possible strategy to ensure data protection, increase visibility of longitudinal databases and enhance their analytical potential.
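A small illustration of chained-equations imputation on synthetic height/weight data; scikit-learn's IterativeImputer is used here as a stand-in for a full multiple-imputation workflow such as R's mice, which is one possible tool but is not named in the abstract:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(6)
height = rng.normal(150, 10, size=300)
weight = 0.5 * height + rng.normal(0, 5, size=300)
data = np.column_stack([height, weight])
mask = rng.random(data.shape) < 0.15            # 15% item non-response
data[mask] = np.nan

# sample_posterior=True draws from the predictive distribution, so repeated runs
# with different random states yield multiple imputed (synthetic-style) data sets
imputations = [
    IterativeImputer(sample_posterior=True, random_state=m).fit_transform(data)
    for m in range(5)]
print("completed data set shape:", imputations[0].shape)
```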
Statistical Analysis of the Uncertainty in Pre-Flight Aerodynamic Database of a Hypersonic Vehicle
NASA Astrophysics Data System (ADS)
Huh, Lynn
The objective of the present research was to develop a new method to derive the aerodynamic coefficients and the associated uncertainties for flight vehicles via post-flight inertial navigation analysis using data from the inertial measurement unit. Statistical estimates of vehicle state and aerodynamic coefficients are derived using Monte Carlo simulation. Trajectory reconstruction using the inertial navigation system (INS) is a simple and widely used method. However, deriving realistic uncertainties in the reconstructed state and any associated parameters is not so straightforward. Extended Kalman filters, batch minimum variance estimation and other approaches have been used; however, these methods generally depend on assumed physical models, assumed statistical distributions (usually Gaussian) or have convergence issues for non-linear problems. The approach here assumes no physical models, is applicable to any statistical distribution, and does not have any convergence issues. The new approach obtains the statistics directly from a sufficient number of Monte Carlo samples using only the generally well known gyro and accelerometer specifications, and it can be applied to systems of non-linear form and non-Gaussian distribution. When redundant data are available, the set of Monte Carlo simulations is constrained to satisfy the redundant data within the uncertainties specified for the additional data. The proposed method was applied to validate the uncertainty in the pre-flight aerodynamic database of the X-43A Hyper-X research vehicle. In addition to gyro and acceleration data, the actual flight data include redundant measurements of position and velocity from the global positioning system (GPS). Criteria derived from the blended GPS and INS accuracy were used to select valid trajectories for statistical analysis. The aerodynamic coefficients were derived from the selected trajectories either by a direct extraction method based on the equations of dynamics, or by interrogation of the pre-flight aerodynamic database. After the application of the proposed method to the case of the X-43A Hyper-X research vehicle, it was found that 1) there were consistent differences in the aerodynamic coefficients from the pre-flight aerodynamic database and the post-flight analysis, 2) the pre-flight estimation of the pitching moment coefficients was significantly different from the post-flight analysis, 3) the type of distribution of the states from the Monte Carlo simulation was affected by that of the perturbation parameters, 4) the uncertainties in the pre-flight model were overestimated, 5) the range where the aerodynamic coefficients from the pre-flight aerodynamic database and post-flight analysis are in closest agreement is between Mach *.* and *.*, and more data points may be needed between Mach * and ** in the pre-flight aerodynamic database, 6) the selection criterion for valid trajectories from the Monte Carlo simulations was mostly driven by the horizontal velocity error, 7) the selection criterion must be based on a reasonable model to ensure the validity of the statistics from the proposed method, and 8) the results from the proposed method applied to two different flights with identical geometry and similar flight profiles were consistent.
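A heavily simplified one-dimensional sketch of the core idea (perturb sensor errors, integrate, and keep only the Monte Carlo samples consistent with redundant GPS data); the error magnitudes, acceleration profile and acceptance bound are invented, not the X-43A values:

```python
import numpy as np

rng = np.random.default_rng(7)
dt, n_steps = 0.1, 600
true_accel = np.full(n_steps, 2.0)                       # m/s^2, hypothetical profile
gps_position, gps_sigma = 3600.0, 30.0                   # final GPS fix and 1-sigma

accepted = []
for _ in range(2000):
    bias = rng.normal(0.0, 0.02)                         # accelerometer bias, m/s^2
    noise = rng.normal(0.0, 0.05, size=n_steps)          # per-sample noise
    measured = true_accel + bias + noise
    velocity = np.cumsum(measured) * dt                  # crude Euler integration
    position = np.cumsum(velocity) * dt
    if abs(position[-1] - gps_position) < 3 * gps_sigma:  # redundant-data constraint
        accepted.append(position[-1])

accepted = np.array(accepted)
print(f"accepted {accepted.size} of 2000 trajectories, "
      f"final-position spread = {accepted.std():.1f} m")
```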
Establishment and Assessment of Plasma Disruption and Warning Databases from EAST
NASA Astrophysics Data System (ADS)
Wang, Bo; Robert, Granetz; Xiao, Bingjia; Li, Jiangang; Yang, Fei; Li, Junjun; Chen, Dalong
2016-12-01
A disruption database and a disruption warning database for the EAST tokamak have been established by the disruption research group. The disruption database, based on Structured Query Language (SQL), comprises 41 disruption parameters, which include current quench characteristics, EFIT equilibrium characteristics, kinetic parameters, halo currents, and vertical motion. Presently most disruption databases are based on plasma experiments of non-superconducting tokamak devices. The purposes of the EAST databases are to determine disruption characteristics and statistics for the fully superconducting tokamak EAST, to elucidate the physics underlying tokamak disruptions, to explore the influence of disruptions on superconducting magnets and to extrapolate toward future burning plasma devices. In order to quantitatively assess the usefulness of various plasma parameters for predicting disruptions, an SQL database similar to that of Alcator C-Mod has been created for EAST by compiling values for a number of proposed disruption-relevant parameters sampled from all plasma discharges in the 2015 campaign. Detailed statistical results and analysis of the two databases on the EAST tokamak are presented. Supported by the National Magnetic Confinement Fusion Science Program of China (No. 2014GB103000)
Developing and Refining the Taiwan Birth Cohort Study (TBCS): Five Years of Experience
ERIC Educational Resources Information Center
Lung, For-Wey; Chiang, Tung-Liang; Lin, Shio-Jean; Shu, Bih-Ching; Lee, Meng-Chih
2011-01-01
The Taiwan Birth Cohort Study (TBCS) is the first nationwide birth cohort database in Asia designed to establish national norms of children's development. Several challenges during database development and data analysis were identified. Challenges include sampling methods, instrument development and statistical approach to missing data. The…
ERIC Educational Resources Information Center
Zhou, Ping; Wang, Qinwen; Yang, Jie; Li, Jingqiu; Guo, Junming; Gong, Zhaohui
2015-01-01
This study aimed to investigate the statuses on the publishing and usage of college biochemistry textbooks in China. A textbook database was constructed and the statistical analysis was adopted to evaluate the textbooks. The results showed that there were 945 (~57%) books for theory teaching, 379 (~23%) books for experiment teaching and 331 (~20%)…
Statistical significance of trace evidence matches using independent physicochemical measurements
NASA Astrophysics Data System (ADS)
Almirall, Jose R.; Cole, Michael; Furton, Kenneth G.; Gettinby, George
1997-02-01
A statistical approach to the significance of glass evidence is proposed using independent physicochemical measurements and chemometrics. Traditional interpretation of the significance of trace evidence matches or exclusions relies on qualitative descriptors such as 'indistinguishable from,' 'consistent with,' 'similar to' etc. By performing physical and chemical measurements which are independent of one another, the significance of object exclusions or matches can be evaluated statistically. One of the problems with this approach is that the human brain is excellent at recognizing and classifying patterns and shapes but performs less well when an object is represented by a numerical list of attributes. Chemometrics can be employed to group similar objects using clustering algorithms and to provide statistical significance in a quantitative manner. This approach is enhanced when population databases exist or can be created and the data in question can be evaluated against these databases. Since the selection of the variables used and their pre-processing can greatly influence the outcome, several different methods could be employed in order to obtain a more complete picture of the information contained in the data. Presently, we report on the analysis of glass samples using refractive index measurements and the quantitative analysis of the concentrations of the metals Mg, Al, Ca, Fe, Mn, Ba, Sr, Ti and Zr. The extension of this general approach to fiber and paint comparisons is also discussed. This statistical approach should not replace the current interpretative approaches to trace evidence matches or exclusions but rather yields an additional quantitative measure. The lack of sufficient general population databases containing the needed physicochemical measurements and the potential for confusion arising from statistical analysis currently hamper this approach, and ways of overcoming these obstacles are presented.
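A schematic example of the chemometric grouping described, with invented refractive index and elemental data standing in for real casework measurements:

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(8)
variables = ["RI", "Mg", "Al", "Ca", "Fe", "Mn", "Ba", "Sr", "Ti", "Zr"]
# two hypothetical glass sources, 10 fragments each
source_a = rng.normal(loc=0.0, scale=1.0, size=(10, len(variables)))
source_b = rng.normal(loc=3.0, scale=1.0, size=(10, len(variables)))
fragments = pd.DataFrame(np.vstack([source_a, source_b]), columns=variables)

z = StandardScaler().fit_transform(fragments)                 # put variables on one scale
groups = fcluster(linkage(z, method="ward"), t=2, criterion="maxclust")
print("cluster assignment per fragment:", groups)
```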
mESAdb: microRNA Expression and Sequence Analysis Database
Kaya, Koray D.; Karakülah, Gökhan; Yakıcıer, Cengiz M.; Acar, Aybar C.; Konu, Özlen
2011-01-01
microRNA expression and sequence analysis database (http://konulab.fen.bilkent.edu.tr/mirna/) (mESAdb) is a regularly updated database for the multivariate analysis of sequences and expression of microRNAs from multiple taxa. mESAdb is modular and has a user interface implemented in PHP and JavaScript and coupled with statistical analysis and visualization packages written for the R language. The database primarily comprises mature microRNA sequences and their target data, along with selected human, mouse and zebrafish expression data sets. mESAdb analysis modules allow (i) mining of microRNA expression data sets for subsets of microRNAs selected manually or by motif; (ii) pair-wise multivariate analysis of expression data sets within and between taxa; and (iii) association of microRNA subsets with annotation databases, HUGE Navigator, KEGG and GO. The use of existing and customized R packages facilitates future addition of data sets and analysis tools. Furthermore, the ability to upload and analyze user-specified data sets makes mESAdb an interactive and expandable analysis tool for microRNA sequence and expression data. PMID:21177657
Mayo, Charles; Conners, Steve; Warren, Christopher; Miller, Robert; Court, Laurence; Popple, Richard
2013-01-01
Purpose: With the emergence of clinical outcomes databases as tools utilized routinely within institutions comes the need for software tools to support automated statistical analysis of these large data sets and intrainstitutional exchange from independent federated databases to support data pooling. In this paper, the authors present a design approach and analysis methodology that addresses both issues. Methods: A software application was constructed to automate analysis of patient outcomes data using a wide range of statistical metrics, by combining use of C#.Net and R code. The accuracy and speed of the code was evaluated using benchmark data sets. Results: The approach provides data needed to evaluate combinations of statistical measurements for their ability to identify patterns of interest in the data. Through application of the tools to a benchmark data set for dose-response threshold and to SBRT lung data sets, an algorithm was developed that uses receiver operator characteristic curves to identify a threshold value and combines use of contingency tables, Fisher exact tests, Welch t-tests, and Kolmogorov-Smirnov tests to filter the large data set to identify values demonstrating dose-response. Kullback-Leibler divergences were used to provide additional confirmation. Conclusions: The work demonstrates the viability of the design approach and the software tool for analysis of large data sets. PMID:24320426
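A condensed sketch of the threshold-finding and filtering pattern described (an ROC-derived threshold followed by a Fisher exact screen), on simulated dose/complication data rather than the benchmark or SBRT data sets:

```python
import numpy as np
from scipy.stats import fisher_exact
from sklearn.metrics import roc_curve

rng = np.random.default_rng(9)
dose = np.concatenate([rng.normal(18, 4, 80), rng.normal(26, 4, 40)])
toxicity = np.concatenate([np.zeros(80, dtype=int), np.ones(40, dtype=int)])

fpr, tpr, thresholds = roc_curve(toxicity, dose)
best = thresholds[np.argmax(tpr - fpr)]            # Youden-index threshold

above = dose >= best
table = [[np.sum(above & (toxicity == 1)), np.sum(above & (toxicity == 0))],
         [np.sum(~above & (toxicity == 1)), np.sum(~above & (toxicity == 0))]]
_, p = fisher_exact(table)
print(f"candidate threshold = {best:.1f} Gy, Fisher exact p = {p:.2e}")
```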
NASA Astrophysics Data System (ADS)
Attallah, Bilal; Serir, Amina; Chahir, Youssef; Boudjelal, Abdelwahhab
2017-11-01
Palmprint recognition systems are dependent on feature extraction. A method of feature extraction using higher discrimination information was developed to characterize palmprint images. In this method, two individual feature extraction techniques are applied to a discrete wavelet transform of a palmprint image, and their outputs are fused. The two techniques used in the fusion are the histogram of gradient and the binarized statistical image features. They are then evaluated using an extreme learning machine classifier before selecting a feature based on principal component analysis. Three palmprint databases, the Hong Kong Polytechnic University (PolyU) Multispectral Palmprint Database, Hong Kong PolyU Palmprint Database II, and the Delhi Touchless (IIDT) Palmprint Database, are used in this study. The study shows that our method effectively identifies and verifies palmprints and outperforms other methods based on feature extraction.
Zhao, Wenbo; Tu, Chongqi; Zhang, Hui; Fang, Yue; Wang, Guanglin; Liu, Lei
2014-04-01
To compare the effectiveness and safety of internal fixation and total hip arthroplasty in elderly patients with displaced femoral neck fracture through a meta-analysis. Studies comparing internal fixation and total hip arthroplasty in elderly patients with displaced femoral neck fracture were identified from the PubMed, EMBase, Cochrane Library, CMB, CNKI and MEDLINE databases. Data analyses were performed using RevMan 5.2.6 (the Cochrane Collaboration). Six published randomized controlled trials including 627 patients were suitable for the review: 286 cases in the internal fixation group and 341 cases in the total hip arthroplasty group. The meta-analysis indicated statistically significant differences between the two groups in quality of life as reflected by the Harris scale (RR = 0.82, 95%CI: 0.72-0.93, P < 0.05), the reoperation rate (RR = 5.81, 95%CI: 3.09-10.95, P < 0.05) and the major complication rate (RR = 3.60, 95%CI: 2.29-5.67, P < 0.05) postoperatively. There was no difference in mortality at 1 year and 5 years postoperatively (P > 0.05). For elderly patients with displaced femoral neck fracture, there is no statistical difference between the two groups in postoperative mortality, but quality of life and operative safety in the internal fixation group are worse than in the total hip arthroplasty group.
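For readers unfamiliar with how such pooled RR (95% CI) figures arise, here is a toy inverse-variance fixed-effect pooling on invented trial counts; it is a deliberately simple stand-in for the Mantel-Haenszel pooling RevMan uses by default:

```python
import numpy as np

# events / totals for internal fixation vs total hip arthroplasty, three fake trials
fixation = [(20, 90), (15, 80), (25, 116)]
arthroplasty = [(5, 100), (4, 95), (6, 146)]

log_rr, weights = [], []
for (e1, n1), (e2, n2) in zip(fixation, arthroplasty):
    rr = (e1 / n1) / (e2 / n2)
    var = 1 / e1 - 1 / n1 + 1 / e2 - 1 / n2        # variance of log(RR)
    log_rr.append(np.log(rr))
    weights.append(1 / var)

pooled = np.average(log_rr, weights=weights)
se = np.sqrt(1 / np.sum(weights))
low, high = np.exp(pooled - 1.96 * se), np.exp(pooled + 1.96 * se)
print(f"pooled RR = {np.exp(pooled):.2f} (95% CI {low:.2f}-{high:.2f})")
```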
ERIC Educational Resources Information Center
Wisniewski, Janusz L.
1986-01-01
Discussion of a new method of index term dictionary compression in an inverted-file-oriented database highlights a technique of word coding, which generates short fixed-length codes obtained from the index terms themselves by analysis of monogram and bigram statistical distributions. Substantial savings in communication channel utilization are…
Clinical study of the Erlanger silver catheter--data management and biometry.
Martus, P; Geis, C; Lugauer, S; Böswald, M; Guggenbichler, J P
1999-01-01
The clinical evaluation of venous catheters for catheter-induced infections must conform to a strict biometric methodology. The statistical planning of the study (target population, design, degree of blinding), data management (database design, definition of variables, coding), quality assurance (data inspection at several levels) and the biometric evaluation of the Erlanger silver catheter project are described. The three-step data flow included: 1) primary data from the hospital, 2) a relational database, and 3) files accessible for statistical evaluation. Two different statistical models were compared: analyzing only the first catheter of each patient (independent data) and analyzing several catheters from the same patient (dependent data) by means of the generalized estimating equations (GEE) method.
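A sketch of the two models on simulated catheter data (the variable names and effect sizes are invented): a first-catheter-only logistic fit versus a GEE fit over all catheters with an exchangeable within-patient correlation structure:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(10)
rows = []
for pid in range(120):
    frailty = rng.normal(0, 0.8)                       # patient-level correlation
    silver = rng.integers(0, 2)                        # 1 = silver catheter
    for order in range(rng.integers(1, 4)):            # 1-3 catheters per patient
        logit = -1.0 - 0.7 * silver + frailty
        infected = rng.binomial(1, 1 / (1 + np.exp(-logit)))
        rows.append({"patient": pid, "order": order, "silver": silver,
                     "infected": infected})
catheters = pd.DataFrame(rows)

first_only = catheters[catheters["order"] == 0]        # independent-data model
glm_fit = smf.logit("infected ~ silver", data=first_only).fit(disp=False)
gee_fit = smf.gee("infected ~ silver", groups="patient", data=catheters,
                  family=sm.families.Binomial(),
                  cov_struct=sm.cov_struct.Exchangeable()).fit()
print(glm_fit.params, gee_fit.params, sep="\n")
```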
Security of statistical data bases: invasion of privacy through attribute correlational modeling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Palley, M.A.
This study develops, defines, and applies a statistical technique for the compromise of confidential information in a statistical data base. Attribute Correlational Modeling (ACM) recognizes that the information contained in a statistical data base represents real world statistical phenomena. As such, ACM assumes correlational behavior among the database attributes. ACM proceeds to compromise confidential information through creation of a regression model, where the confidential attribute is treated as the dependent variable. The typical statistical data base may preclude the direct application of regression. In this scenario, the research introduces the notion of a synthetic data base, created through legitimate queries of the actual data base, and through proportional random variation of responses to these queries. The synthetic data base is constructed to resemble the actual data base as closely as possible in a statistical sense. ACM then applies regression analysis to the synthetic data base, and utilizes the derived model to estimate confidential information in the actual database.
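A miniature, purely illustrative version of the regression-based compromise (all records and attribute names are fabricated):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(11)
age = rng.uniform(25, 65, size=500)
years_service = age - 22 + rng.normal(0, 2, size=500)
salary = 20_000 + 1_500 * years_service + rng.normal(0, 5_000, size=500)  # confidential

# the "synthetic data base" stands in for query responses with proportional random variation
noisy_salary = salary + rng.normal(0, 2_000, size=500)
model = LinearRegression().fit(np.column_stack([age, years_service]), noisy_salary)

target = np.array([[48.0, 25.0]])             # known public attributes of one individual
print(f"inferred confidential salary: {model.predict(target)[0]:,.0f}")
```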
[A SAS macro program for batch processing of univariate Cox regression analysis for large databases].
Yang, Rendong; Xiong, Jie; Peng, Yangqin; Peng, Xiaoning; Zeng, Xiaomin
2015-02-01
To realize batch processing of univariate Cox regression analysis for large databases with a SAS macro program. We wrote a SAS macro program in SAS 9.2 that can filter and integrate results and export P values to Excel. The program was used to screen survival-correlated RNA molecules in ovarian cancer. The SAS macro program could complete the batch processing of univariate Cox regression analyses as well as the selection and export of the results. The SAS macro program has potential applications in reducing the workload of statistical analysis and providing a basis for batch processing of univariate Cox regression analysis.
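A Python workalike of the batch pattern (not the SAS macro itself), fitting a univariate Cox model per RNA on simulated survival data and exporting the p-values to a single spreadsheet-readable file:

```python
import numpy as np
import pandas as pd
from scipy.stats import norm
from statsmodels.duration.hazard_regression import PHReg

rng = np.random.default_rng(12)
n, n_genes = 200, 50
expression = pd.DataFrame(rng.normal(size=(n, n_genes)),
                          columns=[f"RNA_{i}" for i in range(n_genes)])
time = rng.exponential(scale=np.exp(-0.5 * expression["RNA_0"]))   # RNA_0 is prognostic
event = rng.binomial(1, 0.8, size=n)                                # 1 = death observed

rows = []
for gene in expression.columns:                 # one univariate Cox model per molecule
    res = PHReg(time, expression[[gene]], status=event).fit()
    z = res.params[0] / res.bse[0]
    rows.append({"gene": gene, "coef": res.params[0],
                 "p_value": 2 * norm.sf(abs(z))})
pd.DataFrame(rows).to_csv("univariate_cox_pvalues.csv", index=False)
```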
Acoustic fill factors for a 120 inch diameter fairing
NASA Technical Reports Server (NTRS)
Lee, Y. Albert
1992-01-01
Data from the acoustic test of a 120-inch diameter payload fairing were collected and an analysis of acoustic fill factors was performed. Correction factors for obtaining a weighted spatial average of the interior sound pressure level (SPL) were derived based on this database and a normalized 200-inch diameter fairing database. The weighted fill factors were determined and compared with statistical energy analysis (VAPEPS code) derived fill factors. The comparison was found to be reasonable.
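As a small worked example of the spatial averaging underlying such fill-factor corrections (the microphone readings are invented), an energy-based average of interior SPL readings can be computed as follows:

```python
import numpy as np

spl_db = np.array([128.0, 131.5, 129.2, 127.8, 130.1])      # interior microphones, dB
# convert to energy, average, convert back to decibels
spatial_average_db = 10 * np.log10(np.mean(10 ** (spl_db / 10)))
print(f"energy-averaged interior SPL: {spatial_average_db:.1f} dB")
```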
An experimental investigation of masking in the US FDA adverse event reporting system database.
Wang, Hsin-wei; Hochberg, Alan M; Pearson, Ronald K; Hauben, Manfred
2010-12-01
A phenomenon of 'masking' or 'cloaking' in pharmacovigilance data mining has been described, which can potentially cause signals of disproportionate reporting (SDRs) to be missed, particularly in pharmaceutical company databases. Masking has been predicted theoretically, observed anecdotally or studied to a limited extent in both pharmaceutical company and health authority databases, but no previous publication systematically assesses its occurrence in a large health authority database. To explore the nature, extent and possible consequences of masking in the US FDA Adverse Event Reporting System (AERS) database by applying various experimental unmasking protocols to a set of drugs and events representing realistic pharmacovigilance analysis conditions. This study employed AERS data from 2001 through 2005. For a set of 63 Medical Dictionary for Regulatory Activities (MedDRA®) Preferred Terms (PTs), disproportionality analysis was carried out with respect to all drugs included in the AERS database, using a previously described urn-model-based algorithm. We specifically sought masking in which drug removal induced an increase in the statistical representation of a drug-event combination (DEC) that resulted in the emergence of a new SDR. We performed a series of unmasking experiments selecting drugs for removal using rational statistical decision rules based on the requirement of a reporting ratio (RR) >1, top-ranked statistical unexpectedness (SU) and relatedness as reflected in the WHO Anatomical Therapeutic Chemical level 4 (ATC4) grouping. In order to assess the possible extent of residual masking we performed two supplemental purely empirical analyses on a limited subset of data. This entailed testing every drug and drug group to determine which was most influential in uncovering masked SDRs. We assessed the strength of external evidence for a causal association for a small number of masked SDRs involving a subset of 29 drugs for which level of evidence adjudication was available from a previous study. The original disproportionality analysis identified 8719 SDRs for the 63 PTs. The SU-based unmasking protocols generated variable numbers of masked SDRs ranging from 38 to 156, representing a 0.43-1.8% increase over the number of baseline SDRs. A significant number of baseline SDRs were also lost in the course of our experiments. The trend in the number of gained SDRs per report removed was inversely related to the number of lost SDRs per protocol. Both the number and nature of the reports removed influenced the number of gained SDRs observed. The purely empirical protocols unmasked up to ten times as many SDRs. None of the masked SDRs had strong external evidence supporting a causal association. Most involved associations for which there was no external supporting evidence or were in the original product label. For two masked SDRs, there was external evidence of a possible causal association. We documented masking in the FDA AERS database. Attempts at unmasking SDRs using practically implementable protocols produced only small changes in the output of SDRs in our analysis. 
This is undoubtedly related to the large size and diversity of the database, but the complex interdependencies between drugs and events in authentic spontaneous reporting system (SRS) databases, and the impact of measures of statistical variability that are typically used in real-world disproportionality analysis, may be additional factors that constrain the discovery of masked SDRs and which may also operate in pharmaceutical company databases. Empirical determination of the most influential drugs may uncover significantly more SDRs than protocols based on predetermined statistical selection rules but are impractical except possibly for evaluating specific events. Routine global exercises to elicit masking, especially in large health authority databases are not justified based on results available to date. Exercises to elicit unmasking should be driven by prior knowledge or obvious data imbalances.
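A stylized illustration of how removing a heavily reported drug can unmask a signal of disproportionate reporting; the counts are made up, and the plain reporting ratio below ignores the measures of statistical variability used in real-world disproportionality analysis:

```python
def reporting_ratio(a, drug_total, event_total, db_total):
    """Observed / expected report count for one drug-event combination."""
    expected = drug_total * event_total / db_total
    return a / expected

# baseline database counts (hypothetical)
a = 12                      # reports of drug X with event E
drug_total = 400            # all reports mentioning drug X
event_total = 50_000        # all reports mentioning event E
db_total = 1_000_000        # all reports in the database

print("RR with masking drug present:",
      round(reporting_ratio(a, drug_total, event_total, db_total), 2))

# remove a drug that contributes many event-E reports (and its share of the database)
event_total_removed = event_total - 30_000
db_total_removed = db_total - 60_000
print("RR after removal:",
      round(reporting_ratio(a, drug_total, event_total_removed, db_total_removed), 2))
```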
Egorova, K.S.; Kondakova, A.N.; Toukach, Ph.V.
2015-01-01
Carbohydrates are biological building blocks participating in diverse and crucial processes both at the cellular and organism levels. They protect individual cells, establish intracellular interactions, take part in the immune reaction and participate in many other processes. Glycosylation is considered one of the most important modifications of proteins and other biologically active molecules. Still, the data on the enzymatic machinery involved in carbohydrate synthesis and processing are scattered, and progress in its study is hindered by the vast bulk of accumulated genetic information that is not supported by any experimental evidence for the functions of the proteins encoded by these genes. In this article, we present novel instruments for statistical analysis of glycomes in taxa. These tools may be helpful for investigating carbohydrate-related enzymatic activities in various groups of organisms and for comparison of their carbohydrate content. The instruments are developed on the Carbohydrate Structure Database (CSDB) platform and are available freely on the CSDB web-site at http://csdb.glycoscience.ru. Database URL: http://csdb.glycoscience.ru PMID:26337239
EHME: a new word database for research in Basque language.
Acha, Joana; Laka, Itziar; Landa, Josu; Salaburu, Pello
2014-11-14
This article presents EHME, the frequency dictionary of Basque structure, an online program that enables researchers in psycholinguistics to extract word and nonword stimuli based on a broad range of statistics concerning the properties of Basque words. The database consists of 22.7 million tokens, and the available properties include morphological structure frequency and word-similarity measures, in addition to classical indexes: word frequency, orthographic structure, orthographic similarity, bigram and biphone frequency, and syllable-based measures. Measures are indexed at the lemma, morpheme and word level. We include reliability and validation analyses. The application is freely available and enables the user to extract words based on concrete statistical criteria, as well as to obtain statistical characteristics from a list of words.
The construction and assessment of a statistical model for the prediction of protein assay data.
Pittman, J; Sacks, J; Young, S Stanley
2002-01-01
The focus of this work is the development of a statistical model for a bioinformatics database whose distinctive structure makes model assessment an interesting and challenging problem. The key components of the statistical methodology, including a fast approximation to the singular value decomposition and the use of adaptive spline modeling and tree-based methods, are described, and preliminary results are presented. These results are shown to compare favorably to selected results achieved using comparative methods. An attempt to determine the predictive ability of the model through the use of cross-validation experiments is discussed. In conclusion, a synopsis of the results of these experiments and their implications for the analysis of bioinformatics databases in general is presented.
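A simplified sketch of the methodological combination named above (a truncated SVD as the fast low-rank approximation, a random forest standing in for the tree-based methods, and cross-validation for assessment), on random stand-in data rather than the assay database:

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(13)
X = rng.normal(size=(300, 500))                            # wide descriptor matrix
y = X[:, :5].sum(axis=1) + rng.normal(0, 0.5, size=300)    # assay response

model = make_pipeline(TruncatedSVD(n_components=20, random_state=0),
                      RandomForestRegressor(n_estimators=200, random_state=0))
scores = cross_val_score(model, X, y, cv=5, scoring="r2")  # predictive ability estimate
print("cross-validated R^2 per fold:", np.round(scores, 2))
```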
Sources of Safety Data and Statistical Strategies for Design and Analysis: Postmarket Surveillance.
Izem, Rima; Sanchez-Kam, Matilde; Ma, Haijun; Zink, Richard; Zhao, Yueqin
2018-03-01
Safety data are continuously evaluated throughout the life cycle of a medical product to accurately assess and characterize the risks associated with the product. The knowledge about a medical product's safety profile continually evolves as safety data accumulate. This paper discusses data sources and analysis considerations for safety signal detection after a medical product is approved for marketing. This manuscript is the second in a series of papers from the American Statistical Association Biopharmaceutical Section Safety Working Group. We share our recommendations for the statistical and graphical methodologies necessary to appropriately analyze, report, and interpret safety outcomes, and we discuss the advantages and disadvantages of safety data obtained from passive postmarketing surveillance systems compared to other sources. Signal detection has traditionally relied on spontaneous reporting databases that have been available worldwide for decades. However, current regulatory guidelines and ease of reporting have increased the size of these databases exponentially over the last few years. With such large databases, data-mining tools using disproportionality analysis and helpful graphics are often used to detect potential signals. Although the data sources have many limitations, analyses of these data have been successful at identifying safety signals postmarketing. Experience analyzing these dynamic data is useful in understanding the potential and limitations of analyses with new data sources such as social media, claims, or electronic medical records data.
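As a hedged illustration of the disproportionality analysis mentioned above, the following sketch computes a proportional reporting ratio (PRR) with an approximate 95% confidence interval from a 2x2 table of spontaneous reports; the drug/event counts are hypothetical, and production signal-detection pipelines typically add shrinkage estimators and decision thresholds on top of this.

```python
import math

def prr(a, b, c, d):
    """Proportional reporting ratio for a 2x2 table of spontaneous reports.

    a: reports with drug of interest AND event of interest
    b: reports with drug of interest, other events
    c: reports with other drugs AND event of interest
    d: reports with other drugs, other events
    """
    prr_value = (a / (a + b)) / (c / (c + d))
    # Approximate 95% CI on the log scale (Woolf-type standard error).
    se_log = math.sqrt(1/a - 1/(a + b) + 1/c - 1/(c + d))
    lo = math.exp(math.log(prr_value) - 1.96 * se_log)
    hi = math.exp(math.log(prr_value) + 1.96 * se_log)
    return prr_value, (lo, hi)

# Hypothetical counts: 40 reports of drug X with event Y, etc.
value, ci = prr(a=40, b=960, c=200, d=48800)
print(f"PRR = {value:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```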
Specifying the ISS Plasma Environment
NASA Technical Reports Server (NTRS)
Minow, Joseph I.; Diekmann, Anne; Neergaard, Linda; Bui, Them; Mikatarian, Ronald; Barsamian, Hagop; Koontz, Steven
2002-01-01
Quantifying the spacecraft charging risks and corresponding hazards for the International Space Station (ISS) requires a plasma environment specification describing the natural variability of ionospheric temperature (Te) and density (Ne). Empirical ionospheric specification and forecast models such as the International Reference Ionosphere (IRI) model typically only provide estimates of long-term (seasonal) mean Te and Ne values for the low Earth orbit environment. Knowledge of the Te and Ne variability as well as the likelihood of extreme deviations from the mean values is required to estimate both the magnitude and frequency of occurrence of potentially hazardous spacecraft charging environments for a given ISS construction stage and flight configuration. This paper describes the statistical analysis of historical ionospheric low Earth orbit plasma measurements used to estimate Ne and Te variability in the ISS flight environment. The statistical variability analysis of Ne and Te enables calculation of the expected frequency of occurrence of any particular values of Ne and Te, especially those that correspond to possibly hazardous spacecraft charging environments. The database used in the original analysis included measurements from the AE-C, AE-D, and DE-2 satellites. Recent work on the database has added additional satellites as well as ground-based incoherent scatter radar observations. Deviations of the data values from the IRI-estimated Ne and Te parameters for each data point provide a statistical basis for modeling the deviations of the plasma environment from the IRI model output.
Transport Statistics - Transport - UNECE
Traffic Census 2015 data are available. Two new datasets have been added to the transport statistics database: bus and coach statistics.
Compression technique for large statistical data bases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Eggers, S.J.; Olken, F.; Shoshani, A.
1981-03-01
The compression of large statistical databases is explored, and techniques are proposed for organizing the compressed data such that the time required to access the data is logarithmic. The techniques exploit special characteristics of statistical databases, namely, variation in the space required for the natural encoding of integer attributes, a prevalence of a few repeating values or constants, and the clustering of both data of the same length and constants in long, separate series. The techniques are variations of run-length encoding, in which modified run-lengths for the series are extracted from the data stream and stored in a header, which is used to form the base level of a B-tree index into the database. The run-lengths are cumulative, and therefore the access time of the data is logarithmic in the size of the header. The details of the compression scheme and its implementation are discussed, several special cases are presented, and an analysis is given of the relative performance of the various versions.
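The following is a minimal sketch of the idea behind the scheme described above: constant-dominated columns are stored as runs, and a header of cumulative run-lengths is searched (here with a binary search rather than a B-tree) so that element access is logarithmic in the number of runs. It illustrates the general technique, not the paper's exact encoding.

```python
import bisect

class RunLengthColumn:
    """Constant-dominated column stored as (value, run-length) pairs.

    A header of cumulative run-lengths lets the i-th logical element be
    located by binary search, i.e., in time logarithmic in the number of
    runs rather than linear in the number of rows.
    """

    def __init__(self, values):
        self.run_values = []       # value of each run
        self.cum_lengths = []      # cumulative end index (exclusive) of each run
        total = 0
        for v in values:
            if self.run_values and self.run_values[-1] == v:
                total += 1
                self.cum_lengths[-1] = total
            else:
                total += 1
                self.run_values.append(v)
                self.cum_lengths.append(total)

    def __getitem__(self, i):
        # Find the first run whose cumulative length exceeds i.
        run = bisect.bisect_right(self.cum_lengths, i)
        return self.run_values[run]

# Hypothetical column with long constant series, typical of statistical databases.
col = RunLengthColumn([0] * 1000 + [7] * 500 + [0] * 2500)
print(col[999], col[1000], col[1600])           # -> 0 7 0
print(len(col.cum_lengths), "runs for 4000 rows")
```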
Scabbio, Camilla; Zoccarato, Orazio; Malaspina, Simona; Lucignani, Giovanni; Del Sole, Angelo; Lecchi, Michela
2017-10-17
To evaluate the impact of non-specific normal databases on the percent summed rest score (SR%) and stress score (SS%) from simulated low-dose SPECT studies obtained by shortening the acquisition time/projection. Forty normal-weight and 40 overweight/obese patients underwent myocardial studies with a conventional gamma-camera (BrightView, Philips) using three different acquisition times/projection: 30, 15, and 8 s (100%-counts, 50%-counts, and 25%-counts scans, respectively), reconstructed using the iterative algorithm with resolution recovery (IRR) Astonish TM (Philips). Three sets of normal databases were used: (1) full-counts IRR; (2) half-counts IRR; and (3) full-counts traditional reconstruction algorithm database (TRAD). The impact of these databases and the acquired count statistics on the SR% and SS% was assessed by ANOVA analysis and the Tukey test (P < 0.05). Significantly higher SR% and SS% values (> 40%) were found for the full-counts TRAD databases with respect to the IRR databases. For overweight/obese patients, significantly higher SS% values for 25%-counts scans (+19%) were confirmed compared to those of 50%-counts scans, independently of using the half-counts or the full-counts IRR databases. Astonish TM requires the adoption of its own specific normal databases in order to prevent very high overestimation of both stress and rest perfusion scores. Conversely, the count statistics of the normal databases seem not to influence the quantification scores.
DESIGNING ENVIRONMENTAL MONITORING DATABASES FOR STATISTICAL ASSESSMENT
Databases designed for statistical analyses have characteristics that distinguish them from databases intended for general use. EMAP uses a probabilistic sampling design to collect data to produce statistical assessments of environmental conditions. In addition to supporting the ...
Diagnostic Value of Serum YKL-40 Level for Coronary Artery Disease: A Meta-Analysis.
Song, Chun-Li; Bin-Li; Diao, Hong-Ying; Wang, Jiang-Hua; Shi, Yong-fei; Lu, Yang; Wang, Guan; Guo, Zi-Yuan; Li, Yang-Xue; Liu, Jian-Gen; Wang, Jin-Peng; Zhang, Ji-Chang; Zhao, Zhuo; Liu, Yi-Hang; Li, Ying; Cai, Dan; Li, Qian
2016-01-01
This meta-analysis aimed to identify the value of serum YKL-40 level for the diagnosis of coronary artery disease (CAD). Through searching the following electronic databases: the Cochrane Library Database (Issue 12, 2013), Web of Science (1945 ∼ 2013), PubMed (1966 ∼ 2013), CINAHL (1982 ∼ 2013), EMBASE (1980 ∼ 2013), and the Chinese Biomedical Database (CBM; 1982 ∼ 2013), related articles were determined without any language restrictions. STATA statistical software (Version 12.0, Stata Corporation, College Station, TX) was chosen to deal with statistical data. Standard mean difference (SMD) and its corresponding 95% confidence interval (95% CI) were calculated. Eleven clinical case-control studies that recruited 1,175 CAD patients and 1,261 healthy controls were selected for statistical analysis. The main findings of our meta-analysis showed that serum YKL-40 level in CAD patients was significantly higher than that in control subjects (SMD = 2.79, 95% CI = 1.73 ∼ 3.85, P < 0.001). Ethnicity-stratified analysis indicated a higher serum YKL-40 level in CAD patients than control subjects among China, Korea, and Denmark populations (China: SMD = 2.97, 95% CI = 1.21 ∼ 4.74, P = 0.001; Korea: SMD = 0.66, 95% CI = 0.17 ∼ 1.15, P = 0.008; Denmark: SMD = 1.85, 95% CI = 1.42 ∼ 2.29, P < 0.001; respectively), but not in Turkey (SMD = 4.52, 95% CI = -2.87 ∼ 11.91, P = 0.231). The present meta-analysis suggests that an elevated serum YKL-40 level may be used as a promising diagnostic tool for early identification of CAD.
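As a rough, hedged sketch of the calculation behind pooled estimates like those above, the code below computes Hedges'-corrected standardized mean differences from hypothetical summary statistics and pools them with a DerSimonian-Laird random-effects model; it is illustrative only and is not the STATA workflow used in the paper.

```python
import math

def smd(n1, m1, sd1, n2, m2, sd2):
    """Hedges' g standardized mean difference and its variance for one study."""
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / sp
    j = 1 - 3 / (4 * (n1 + n2) - 9)            # small-sample correction factor
    g = j * d
    var = (n1 + n2) / (n1 * n2) + g**2 / (2 * (n1 + n2))
    return g, var

def pool_random_effects(effects):
    """DerSimonian-Laird random-effects pooling of (estimate, variance) pairs."""
    w = [1 / v for _, v in effects]
    y = [e for e, _ in effects]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)   # between-study variance
    w_star = [1 / (v + tau2) for _, v in effects]
    pooled = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)

# Hypothetical case-control studies: (n_cases, mean, SD, n_controls, mean, SD).
studies = [(100, 85.0, 30.0, 110, 55.0, 25.0),
           (80, 70.0, 28.0, 90, 52.0, 26.0),
           (120, 95.0, 40.0, 115, 60.0, 30.0)]
pooled, ci = pool_random_effects([smd(*s) for s in studies])
print(f"Pooled SMD = {pooled:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```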
Hoijemberg, Pablo A; Pelczer, István
2018-01-05
A lot of time is spent by researchers in the identification of metabolites in NMR-based metabolomic studies. The usual metabolite identification starts employing public or commercial databases to match chemical shifts thought to belong to a given compound. Statistical total correlation spectroscopy (STOCSY), in use for more than a decade, speeds the process by finding statistical correlations among peaks, being able to create a better peak list as input for the database query. However, the (normally not automated) analysis becomes challenging due to the intrinsic issue of peak overlap, where correlations of more than one compound appear in the STOCSY trace. Here we present a fully automated methodology that analyzes all STOCSY traces at once (every peak is chosen as driver peak) and overcomes the peak overlap obstacle. Peak overlap detection by clustering analysis and sorting of traces (POD-CAST) first creates an overlap matrix from the STOCSY traces, then clusters the overlap traces based on their similarity and finally calculates a cumulative overlap index (COI) to account for both strong and intermediate correlations. This information is gathered in one plot to help the user identify the groups of peaks that would belong to a single molecule and perform a more reliable database query. The simultaneous examination of all traces reduces the time of analysis, compared to viewing STOCSY traces by pairs or small groups, and condenses the redundant information in the 2D STOCSY matrix into bands containing similar traces. The COI helps in the detection of overlapping peaks, which can be added to the peak list from another cross-correlated band. POD-CAST overcomes the generally overlooked and underestimated presence of overlapping peaks and it detects them to include them in the search of all compounds contributing to the peak overlap, enabling the user to accelerate the metabolite identification process with more successful database queries and searching all tentative compounds in the sample set.
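A minimal, hedged sketch of the statistical core that POD-CAST builds on: a STOCSY trace is the vector of correlations between one driver variable and all other spectral variables across samples, so computing the full correlation matrix yields every trace at once. The toy spectra below are invented; real data would be full-resolution NMR spectra.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 50 "spectra" over 6 variables. Variables 0-2 co-vary (compound A),
# variables 3-4 co-vary (compound B), variable 5 is unrelated noise.
conc_a = rng.lognormal(size=50)
conc_b = rng.lognormal(size=50)
spectra = np.column_stack([
    conc_a * 1.0, conc_a * 0.6, conc_a * 0.3,      # peaks of compound A
    conc_b * 1.0, conc_b * 0.5,                    # peaks of compound B
    rng.normal(size=50),                           # unrelated peak
]) + 0.01 * rng.normal(size=(50, 6))

# STOCSY: correlation of every variable with every other; each row is the trace
# obtained by choosing that variable as the driver peak.
traces = np.corrcoef(spectra, rowvar=False)

np.set_printoptions(precision=2, suppress=True)
print(traces)
# High off-diagonal correlations group variables 0-2 and 3-4 into two bands,
# which is the structure POD-CAST clusters to separate overlapping compounds.
```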
A New Paradigm to Analyze Data Completeness of Patient Data.
Nasir, Ayan; Gurupur, Varadraj; Liu, Xinliang
2016-08-03
There is a need to develop a tool that will measure data completeness of patient records using sophisticated statistical metrics. Patient data integrity is important in providing timely and appropriate care. Completeness is an important step, with an emphasis on understanding the complex relationships between data fields and their relative importance in delivering care. This tool will not only help understand where data problems are but also help uncover the underlying issues behind them. Develop a tool that can be used alongside a variety of health care database software packages to determine the completeness of individual patient records as well as aggregate patient records across health care centers and subpopulations. The methodology of this project is encapsulated within the Data Completeness Analysis Package (DCAP) tool, with the major components including concept mapping, CSV parsing, and statistical analysis. The results from testing DCAP with Healthcare Cost and Utilization Project (HCUP) State Inpatient Database (SID) data show that this tool is successful in identifying relative data completeness at the patient, subpopulation, and database levels. These results also solidify a need for further analysis and call for hypothesis driven research to find underlying causes for data incompleteness. DCAP examines patient records and generates statistics that can be used to determine the completeness of individual patient data as well as the general thoroughness of record keeping in a medical database. DCAP uses a component that is customized to the settings of the software package used for storing patient data as well as a Comma Separated Values (CSV) file parser to determine the appropriate measurements. DCAP itself is assessed through a proof of concept exercise using hypothetical data as well as available HCUP SID patient data.
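As a small, hedged illustration of the completeness statistics a tool like DCAP might report (the CSV layout and missing-value tokens below are invented, not the actual HCUP SID schema or DCAP code), the sketch parses a CSV file and reports the fraction of non-missing values per field and per record.

```python
import csv
import io

def completeness(csv_text, missing_tokens=("", "NA", "NULL")):
    """Per-field and per-record completeness of a CSV dataset."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    fields = rows[0].keys()
    field_scores = {
        f: sum(r[f] not in missing_tokens for r in rows) / len(rows) for f in fields
    }
    record_scores = [
        sum(r[f] not in missing_tokens for f in fields) / len(fields) for r in rows
    ]
    return field_scores, record_scores

# Hypothetical patient records with some missing fields.
sample = """patient_id,age,diagnosis,discharge_status
1,64,I21.4,home
2,,I50.9,
3,71,,home
"""
per_field, per_record = completeness(sample)
print(per_field)    # completeness of each field across all records
print(per_record)   # completeness of each individual patient record
```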
New features added to EVALIDator: ratio estimation and county choropleth maps
Patrick D. Miles; Mark H. Hansen
2012-01-01
The EVALIDator Web application, developed in 2007, provides estimates and sampling errors for many user selected forest statistics from the Forest Inventory and Analysis Database (FIADB). Among the statistics estimated are forest area, number of trees, biomass, volume, growth, removals, and mortality. A new release of EVALIDator, developed in 2012, has an option to...
Batch reporting of forest inventory statistics using the EVALIDator
Patrick D. Miles
2015-01-01
The EVALIDator Web application, developed in 2007, provides estimates and sampling errors of forest statistics (e.g., forest area, number of trees, tree biomass) from data stored in the Forest Inventory and Analysis database. In response to user demand, new features have been added to the EVALIDator. The most recent additions are 1) the ability to generate multiple...
Bowden, Peter; Beavis, Ron; Marshall, John
2009-11-02
A goodness-of-fit test may be used to assign tandem mass spectra of peptides to amino acid sequences and to directly calculate the expected probability of mis-identification. The product of the peptide expectation values directly yields the probability that the parent protein has been mis-identified. A relational database can capture the mass spectral data and the best-fit results, and permit subsequent calculations by a general statistical analysis system. The many files of the HUPO blood protein data correlated by X!TANDEM against the proteins of ENSEMBL were collected into a relational database. A redundant set of 247,077 proteins and peptides was correlated by X!TANDEM and collapsed to a set of 34,956 peptides from 13,379 distinct proteins. About 6875 distinct proteins were represented by only a single distinct peptide, 2866 proteins showed 2 distinct peptides, and 3454 proteins showed at least three distinct peptides by X!TANDEM. More than 99% of the peptides were associated with proteins that had cumulative expectation values, i.e., probability of false positive identification, of one in one hundred or less. The distribution of peptides per protein from X!TANDEM was significantly different from that expected from random assignment of peptides.
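A hedged sketch of the protein-level calculation stated above: if each peptide-spectrum match carries an expectation value, the abstract's rule takes the product of the peptide expectation values as the probability that the parent protein was mis-identified. The numbers below are hypothetical, and real X!TANDEM results would be parsed from its output files rather than typed in.

```python
import math

def protein_log10_expectation(peptide_e_values):
    """Combine peptide expectation values into a protein-level score.

    Following the rule stated in the abstract, the product of the peptide
    expectation values is taken as the probability that the parent protein
    has been mis-identified; working in log10 avoids numerical underflow.
    """
    return sum(math.log10(e) for e in peptide_e_values)

# Hypothetical proteins with their peptide-level expectation values.
proteins = {
    "PROT_A": [1e-4, 3e-3, 5e-2],   # three distinct peptides
    "PROT_B": [8e-3],               # single-peptide identification
}
for name, e_values in proteins.items():
    log10_e = protein_log10_expectation(e_values)
    print(f"{name}: log10(protein expectation value) = {log10_e:.1f}")
```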
On the frequency-magnitude distribution of converging boundaries
NASA Astrophysics Data System (ADS)
Marzocchi, W.; Laura, S.; Heuret, A.; Funiciello, F.
2011-12-01
The occurrence of the last mega-thrust earthquake in Japan has clearly demonstrated the high risk posed to society by such events in terms of social and economic losses, even at large spatial scale. The primary component of a balanced and objective mitigation of the impact of these earthquakes is the correct forecast of where such events may occur in the future. To date, there is a wide range of opinions about where mega-thrust earthquakes can occur. Here, we present a detailed statistical analysis of a database of worldwide interplate earthquakes occurring at current subduction zones. The database has been recently published in the framework of the EURYI Project 'Convergent margins and seismogenesis: defining the risk of great earthquakes by using statistical data and modelling', and it provides a unique opportunity to explore in detail the seismogenic process in subducting lithosphere. In particular, the statistical analysis of this database allows us to explore many interesting scientific issues, such as the existence of different frequency-magnitude distributions across trenches, the quantitative characterization of subduction zones that are more likely to produce mega-thrust earthquakes, and the prominent features that characterize converging boundaries with different seismic activity. Besides their scientific importance, such issues may help improve our mega-thrust earthquake forecasting capability.
Using SQL Databases for Sequence Similarity Searching and Analysis.
Pearson, William R; Mackey, Aaron J
2017-09-13
Relational databases can integrate diverse types of information and manage large sets of similarity search results, greatly simplifying genome-scale analyses. By focusing on taxonomic subsets of sequences, relational databases can reduce the size and redundancy of sequence libraries and improve the statistical significance of homologs. In addition, by loading similarity search results into a relational database, it becomes possible to explore and summarize the relationships between all of the proteins in an organism and those in other biological kingdoms. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates various large-scale genomic analyses of homology-related data. It also describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. The unit also introduces search_demo, a database that stores sequence similarity search results. The search_demo database is then used to explore the evolutionary relationships between E. coli proteins and proteins in other organisms in a large-scale comparative genomic analysis. © 2017 by John Wiley & Sons, Inc.
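As a hedged, minimal analogue of the search_demo database described above (the table and column names here are invented, not the unit's actual schema), the sketch below loads similarity search hits into SQLite and summarizes, for each query protein, the best hit in each target organism.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hits (
    query    TEXT,    -- E. coli protein accession
    subject  TEXT,    -- matched protein accession
    organism TEXT,    -- organism of the matched protein
    evalue   REAL     -- similarity search expectation value
);
""")
# Hypothetical hits from a similarity search run.
conn.executemany(
    "INSERT INTO hits VALUES (?, ?, ?, ?)",
    [("b0002", "P00561", "Escherichia coli", 1e-180),
     ("b0002", "Q9X0A1", "Thermotoga maritima", 3e-60),
     ("b0002", "P10869", "Saccharomyces cerevisiae", 2e-35),
     ("b0003", "P00547", "Escherichia coli", 1e-150),
     ("b0003", "P50456", "Haemophilus influenzae", 4e-90)],
)

# Best (lowest E-value) hit per query and organism.
for row in conn.execute("""
    SELECT query, organism, MIN(evalue) AS best_evalue
    FROM hits
    GROUP BY query, organism
    ORDER BY query, best_evalue
"""):
    print(row)
conn.close()
```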
Trimarchi, Matteo; Lund, Valerie J; Nicolai, Piero; Pini, Massimiliano; Senna, Massimo; Howard, David J
2004-04-01
The Neoplasms of the Sinonasal Tract software package (NSNT v 1.0) implements a complete visual database for patients with sinonasal neoplasia, facilitating standardization of data and statistical analysis. The software, which is compatible with the Macintosh and Windows platforms, provides a multiuser application with a dedicated server (on Windows NT or 2000, or Macintosh OS 9 or X, and a network of clients) together with web access, if required. The system hardware consists of an Apple Power Macintosh G4 500 MHz computer with PCI bus, 256 MB of RAM and a 60 GB hard disk, or any IBM-compatible computer with a Pentium 2 processor. Image acquisition may be performed with different frame-grabber cards for analog or digital video input of different standards (PAL, SECAM, or NTSC) and levels of quality (VHS, S-VHS, Betacam, Mini DV, DV). The visual database is based on 4th Dimension by 4D Inc, and video compression is performed in real-time MPEG format. Six sections have been developed: demographics, symptoms, extent of disease, radiology, treatment, and follow-up. Acquisition of data includes computed tomography and magnetic resonance imaging, histology, and endoscopy images, allowing sequential comparison. Statistical analysis integral to the program provides Kaplan-Meier survival curves. The development of a dedicated, user-friendly database for sinonasal neoplasia facilitates a multicenter network and has obvious clinical and research benefits.
On-Line Analysis of Southern FIA Data
Michael P. Spinney; Paul C. Van Deusen; Francis A. Roesch
2006-01-01
The Southern On-Line Estimator (SOLE) is a web-based FIA database analysis tool designed with an emphasis on modularity. The Java-based user interface is simple and intuitive to use and the R-based analysis engine is fast and stable. Each component of the program (data retrieval, statistical analysis and output) can be individually modified to accommodate major...
[Electronic poison information management system].
Kabata, Piotr; Waldman, Wojciech; Kaletha, Krystian; Sein Anand, Jacek
2013-01-01
We describe the deployment of an electronic toxicological information database in the poison control center of the Pomeranian Center of Toxicology. The system was based on Google Apps technology, by Google Inc., using electronic, web-based forms and data tables. During the first 6 months after system deployment, we used it to archive 1471 poisoning cases, prepare monthly poisoning reports and facilitate statistical analysis of the data. Use of the electronic database made the Poison Center's work much easier.
Web 2.0 in the Professional LIS Literature: An Exploratory Analysis
ERIC Educational Resources Information Center
Aharony, Noa
2011-01-01
This paper presents a statistical descriptive analysis and a thorough content analysis of descriptors and journal titles extracted from the Library and Information Science Abstracts (LISA) database, focusing on the subject of Web 2.0 and its main applications: blogs, wikis, social networks and tags. The primary research questions include: whether the…
DHLAS: A web-based information system for statistical genetic analysis of HLA population data.
Thriskos, P; Zintzaras, E; Germenis, A
2007-03-01
DHLAS (database HLA system) is a user-friendly, web-based information system for the analysis of human leukocyte antigen (HLA) data from population studies. DHLAS has been developed using Java and the R system; it runs on a Java Virtual Machine, and its user interface is web-based, powered by the servlet engine Tomcat. It utilizes STRUTS, a Model-View-Controller framework, and uses several GNU packages to perform several of its tasks. The database engine it relies upon for fast access is MySQL, but others can be used as well. The system estimates metrics, performs statistical testing and produces graphs required for HLA population studies: (i) Hardy-Weinberg equilibrium (calculated using both asymptotic and exact tests), (ii) genetic distances (Euclidean or Nei), (iii) phylogenetic trees using the unweighted pair group method with averages and the neighbor-joining method, (iv) linkage disequilibrium (pairwise and overall, including variance estimations), (v) haplotype frequencies (estimated using the expectation-maximization algorithm) and (vi) discriminant analysis. The main merit of DHLAS is the incorporation of a database, thus the data can be stored and manipulated along with integrated genetic data analysis procedures. In addition, it has an open architecture allowing the inclusion of other functions and procedures.
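As a hedged illustration of one statistic DHLAS computes, the sketch below runs an asymptotic (chi-square) Hardy-Weinberg equilibrium test for a single biallelic locus from hypothetical genotype counts; DHLAS itself handles multi-allelic HLA loci and also offers exact tests, which this toy example does not.

```python
def hardy_weinberg_chi2(n_aa, n_ab, n_bb):
    """Asymptotic Hardy-Weinberg test for a biallelic locus (1 degree of freedom)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)          # frequency of allele A
    q = 1 - p
    expected = {"AA": n * p * p, "AB": 2 * n * p * q, "BB": n * q * q}
    observed = {"AA": n_aa, "AB": n_ab, "BB": n_bb}
    chi2 = sum((observed[g] - expected[g]) ** 2 / expected[g] for g in expected)
    return chi2, expected

# Hypothetical genotype counts at one locus.
chi2, expected = hardy_weinberg_chi2(n_aa=298, n_ab=489, n_bb=213)
print(f"chi-square = {chi2:.3f} (compare to 3.84 for p < 0.05 with 1 df)")
print(expected)
```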
Mackey, Aaron J; Pearson, William R
2004-10-01
Relational databases are designed to integrate diverse types of information and manage large sets of search results, greatly simplifying genome-scale analyses. Relational databases are essential for management and analysis of large-scale sequence analyses, and can also be used to improve the statistical significance of similarity searches by focusing on subsets of sequence libraries most likely to contain homologs. This unit describes using relational databases to improve the efficiency of sequence similarity searching and to demonstrate various large-scale genomic analyses of homology-related data. This unit describes the installation and use of a simple protein sequence database, seqdb_demo, which is used as a basis for the other protocols. These include basic use of the database to generate a novel sequence library subset, how to extend and use seqdb_demo for the storage of sequence similarity search results and making use of various kinds of stored search results to address aspects of comparative genomic analysis.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kruger, Albert A.; Muller, I.; Gilbo, K.
2013-11-13
The objectives of this work are aimed at the development of enhanced LAW property-composition models that expand the composition region covered by the models. The models of interest include PCT, VHT, viscosity and electrical conductivity. This is planned as a multi-year effort that will be performed in phases, with the objectives listed below for the current phase. Incorporate property-composition data from the new glasses into the database. Assess the database and identify composition spaces in the database that need augmentation. Develop statistically designed composition matrices to cover the composition regions identified in the above analysis. Prepare crucible melts of glass compositions from the statistically designed composition matrix and measure the properties of interest. Incorporate the above property-composition data into the database. Assess existing models against the complete dataset and, as necessary, start development of new models.
Extending GIS Technology to Study Karst Features of Southeastern Minnesota
NASA Astrophysics Data System (ADS)
Gao, Y.; Tipping, R. G.; Alexander, E. C.; Alexander, S. C.
2001-12-01
This paper summarizes ongoing research on karst feature distribution of southeastern Minnesota. The main goals of this interdisciplinary research are: 1) to look for large-scale patterns in the rate and distribution of sinkhole development; 2) to conduct statistical tests of hypotheses about the formation of sinkholes; 3) to create management tools for land-use managers and planners; and 4) to deliver geomorphic and hydrogeologic criteria for making scientifically valid land-use policies and ethical decisions in karst areas of southeastern Minnesota. Existing county and sub-county karst feature datasets of southeastern Minnesota have been assembled into a large GIS-based database capable of analyzing the entire data set. The central database management system (DBMS) is a relational GIS-based system interacting with three modules: GIS, statistical and hydrogeologic modules. ArcInfo and ArcView were used to generate a series of 2D and 3D maps depicting karst feature distributions in southeastern Minnesota. IRIS ExplorerTM was used to produce satisfying 3D maps and animations using data exported from GIS-based database. Nearest-neighbor analysis has been used to test sinkhole distributions in different topographic and geologic settings. All current nearest-neighbor analyses testify that sinkholes in southeastern Minnesota are not evenly distributed in this area (i.e., they tend to be clustered). More detailed statistical methods such as cluster analysis, histograms, probability estimation, correlation and regression have been used to study the spatial distributions of some mapped karst features of southeastern Minnesota. A sinkhole probability map for Goodhue County has been constructed based on sinkhole distribution, bedrock geology, depth to bedrock, GIS buffer analysis and nearest-neighbor analysis. A series of karst features for Winona County including sinkholes, springs, seeps, stream sinks and outcrop has been mapped and entered into the Karst Feature Database of Southeastern Minnesota. The Karst Feature Database of Winona County is being expanded to include all the mapped karst features of southeastern Minnesota. Air photos from 1930s to 1990s of Spring Valley Cavern Area in Fillmore County were scanned and geo-referenced into our GIS system. This technology has been proved to be very useful to identify sinkholes and study the rate of sinkhole development.
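A hedged sketch of the nearest-neighbor analysis mentioned above, in the Clark-Evans style: the ratio of the observed mean nearest-neighbor distance to the distance expected under complete spatial randomness at the same density indicates clustering when it falls well below 1. The sinkhole coordinates are invented, and a real analysis would correct for study-area boundary effects.

```python
import math
import random

def nearest_neighbor_index(points, area):
    """Clark-Evans nearest-neighbor index: <1 clustered, ~1 random, >1 dispersed."""
    def nearest(i):
        xi, yi = points[i]
        return min(math.hypot(xi - x, yi - y)
                   for j, (x, y) in enumerate(points) if j != i)

    observed = sum(nearest(i) for i in range(len(points))) / len(points)
    expected = 0.5 / math.sqrt(len(points) / area)   # expectation under randomness
    return observed / expected

# Hypothetical clustered "sinkholes": three tight clusters in a 10 km x 10 km area.
random.seed(1)
centers = [(2, 2), (7, 3), (5, 8)]
sinkholes = [(cx + random.gauss(0, 0.2), cy + random.gauss(0, 0.2))
             for cx, cy in centers for _ in range(20)]
print(f"Nearest-neighbor index: {nearest_neighbor_index(sinkholes, area=100):.2f}")
```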
Independent component analysis for automatic note extraction from musical trills
NASA Astrophysics Data System (ADS)
Brown, Judith C.; Smaragdis, Paris
2004-05-01
The method of principal component analysis, which is based on second-order statistics (or linear independence), has long been used for redundancy reduction of audio data. The more recent technique of independent component analysis, enforcing much stricter statistical criteria based on higher-order statistical independence, is introduced and shown to be far superior in separating independent musical sources. This theory has been applied to piano trills and a database of trill rates was assembled from experiments with a computer-driven piano, recordings of a professional pianist, and commercially available compact disks. The method of independent component analysis has thus been shown to be an outstanding, effective means of automatically extracting interesting musical information from a sea of redundant data.
Kent, David M; Dahabreh, Issa J; Ruthazer, Robin; Furlan, Anthony J; Weimar, Christian; Serena, Joaquín; Meier, Bernhard; Mattle, Heinrich P; Di Angelantonio, Emanuele; Paciaroni, Maurizio; Schuchlenz, Herwig; Homma, Shunichi; Lutz, Jennifer S; Thaler, David E
2015-09-14
The preferred antithrombotic strategy for secondary prevention in patients with cryptogenic stroke (CS) and patent foramen ovale (PFO) is unknown. We pooled multiple observational studies and used propensity score-based methods to estimate the comparative effectiveness of oral anticoagulation (OAC) compared with antiplatelet therapy (APT). Individual participant data from 12 databases of medically treated patients with CS and PFO were analysed with Cox regression models, to estimate database-specific hazard ratios (HRs) comparing OAC with APT, for both the primary composite outcome [recurrent stroke, transient ischaemic attack (TIA), or death] and stroke alone. Propensity scores were applied via inverse probability of treatment weighting to control for confounding. We synthesized database-specific HRs using random-effects meta-analysis models. This analysis included 2385 (OAC = 804 and APT = 1581) patients with 227 composite endpoints (stroke/TIA/death). The difference between OAC and APT was not statistically significant for the primary composite outcome [adjusted HR = 0.76, 95% confidence interval (CI) 0.52-1.12] or for the secondary outcome of stroke alone (adjusted HR = 0.75, 95% CI 0.44-1.27). Results were consistent in analyses applying alternative weighting schemes, with the exception that OAC had a statistically significant beneficial effect on the composite outcome in analyses standardized to the patient population who actually received APT (adjusted HR = 0.64, 95% CI 0.42-0.99). Subgroup analyses did not detect statistically significant heterogeneity of treatment effects across clinically important patient groups. We did not find a statistically significant difference comparing OAC with APT; our results justify randomized trials comparing different antithrombotic approaches in these patients. Published on behalf of the European Society of Cardiology. All rights reserved. © The Author 2015. For permissions please email: journals.permissions@oup.com.
The prior statistics of object colors.
Koenderink, Jan J
2010-02-01
The prior statistics of object colors is of much interest because extensive statistical investigations of reflectance spectra reveal highly non-uniform structure in color space common to several very different databases. This common structure is due to the visual system rather than to the statistics of environmental structure. Analysis involves an investigation of the proper sample space of spectral reflectance factors and of the statistical consequences of the projection of spectral reflectances on the color solid. Even in the case of reflectance statistics that are translationally invariant with respect to the wavelength dimension, the statistics of object colors is highly non-uniform. The qualitative nature of this non-uniformity is due to trichromacy.
A DICOM based radiotherapy plan database for research collaboration and reporting
NASA Astrophysics Data System (ADS)
Westberg, J.; Krogh, S.; Brink, C.; Vogelius, I. R.
2014-03-01
Purpose: To create a central radiotherapy (RT) plan database for dose analysis and reporting, capable of calculating and presenting statistics on user defined patient groups. The goal is to facilitate multi-center research studies with easy and secure access to RT plans and statistics on protocol compliance. Methods: RT institutions are able to send data to the central database using DICOM communications on a secure computer network. The central system is composed of a number of DICOM servers, an SQL database and in-house developed software services to process the incoming data. A web site within the secure network allows the user to manage their submitted data. Results: The RT plan database has been developed in Microsoft .NET and users are able to send DICOM data between RT centers in Denmark. Dose-volume histogram (DVH) calculations performed by the system are comparable to those of conventional RT software. A permission system was implemented to ensure access control and easy, yet secure, data sharing across centers. The reports contain DVH statistics for structures in user defined patient groups. The system currently contains over 2200 patients in 14 collaborations. Conclusions: A central RT plan repository for use in multi-center trials and quality assurance was created. The system provides an attractive alternative to dummy runs by enabling continuous monitoring of protocol conformity and plan metrics in a trial.
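A hedged, simplified sketch of the cumulative dose-volume histogram (DVH) calculation behind the reported statistics: given per-voxel doses for one structure, it computes the fraction of the structure volume receiving at least each dose level. Real DVH calculation works from DICOM RT Dose grids and structure contours; the dose values here are synthetic.

```python
import numpy as np

def cumulative_dvh(voxel_doses_gy, bin_width_gy=0.5):
    """Cumulative DVH: percent of structure volume receiving >= each dose level."""
    doses = np.asarray(voxel_doses_gy, dtype=float)
    edges = np.arange(0.0, doses.max() + bin_width_gy, bin_width_gy)
    volume_fraction = [(doses >= d).mean() * 100.0 for d in edges]
    return edges, np.array(volume_fraction)

# Synthetic per-voxel doses for a structure (e.g., a target covered around 60 Gy).
rng = np.random.default_rng(42)
voxels = rng.normal(loc=60.0, scale=2.0, size=10_000)

dose_axis, volume_pct = cumulative_dvh(voxels)
# Typical DVH statistics reported in protocol-compliance tables.
print(f"D_mean = {voxels.mean():.1f} Gy")
print(f"V_57Gy = {(voxels >= 57.0).mean() * 100:.1f} % of volume")
print(f"D_95   = {np.percentile(voxels, 5):.1f} Gy (dose to 95% of volume)")
```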
Introduction to the DISRUPT postprandial database: subjects, studies and methodologies.
Jackson, Kim G; Clarke, Dave T; Murray, Peter; Lovegrove, Julie A; O'Malley, Brendan; Minihane, Anne M; Williams, Christine M
2010-03-01
Dysregulation of lipid and glucose metabolism in the postprandial state are recognised as important risk factors for the development of cardiovascular disease and type 2 diabetes. Our objective was to create a comprehensive, standardised database of postprandial studies to provide insights into the physiological factors that influence postprandial lipid and glucose responses. Data were collated from subjects (n = 467) taking part in single and sequential meal postprandial studies conducted by researchers at the University of Reading, to form the DISRUPT (DIetary Studies: Reading Unilever Postprandial Trials) database. Subject attributes including age, gender, genotype, menopausal status, body mass index, blood pressure and a fasting biochemical profile, together with postprandial measurements of triacylglycerol (TAG), non-esterified fatty acids, glucose, insulin and TAG-rich lipoprotein composition are recorded. A particular strength of the studies is the frequency of blood sampling, with on average 10-13 blood samples taken during each postprandial assessment, and the fact that identical test meal protocols were used in a number of studies, allowing pooling of data to increase statistical power. The DISRUPT database is the most comprehensive postprandial metabolism database that exists worldwide and preliminary analysis of the pooled sequential meal postprandial dataset has revealed both confirmatory and novel observations with respect to the impact of gender and age on the postprandial TAG response. Further analysis of the dataset using conventional statistical techniques along with integrated mathematical models and clustering analysis will provide a unique opportunity to greatly expand current knowledge of the aetiology of inter-individual variability in postprandial lipid and glucose responses.
Rock Statistics at the Mars Pathfinder Landing Site, Roughness and Roving on Mars
NASA Technical Reports Server (NTRS)
Haldemann, A. F. C.; Bridges, N. T.; Anderson, R. C.; Golombek, M. P.
1999-01-01
Several rock counts have been carried out at the Mars Pathfinder landing site producing consistent statistics of rock coverage and size-frequency distributions. These rock statistics provide a primary element of "ground truth" for anchoring remote sensing information used to pick the Pathfinder, and future, landing sites. The observed rock population statistics should also be consistent with the emplacement and alteration processes postulated to govern the landing site landscape. The rock population databases can however be used in ways that go beyond the calculation of cumulative number and cumulative area distributions versus rock diameter and height. Since the spatial parameters measured to characterize each rock are determined with stereo image pairs, the rock database serves as a subset of the full landing site digital terrain model (DTM). Insofar as a rock count can be carried out in a speedier, albeit coarser, manner than the full DTM analysis, rock counting offers several operational and scientific products in the near term. Quantitative rock mapping adds further information to the geomorphic study of the landing site, and can also be used for rover traverse planning. Statistical analysis of the surface roughness using the rock count proxy DTM is sufficiently accurate when compared to the full DTM to compare with radar remote sensing roughness measures, and with rover traverse profiles.
1997-06-01
career success for academy graduates relative to officers commissioned from other sources. Favoritism occurs if high-ranking officers who are service... career success as a naval officer? 6 The thesis investigates several databases in an effort to paint a complete statistical picture of naval officer...including both public and private sector career success was conducted by the Standard & Poor’s Corporation with a related analysis by Professor Michael Useem
Space flight risk data collection and analysis project: Risk and reliability database
NASA Technical Reports Server (NTRS)
1994-01-01
The focus of the NASA 'Space Flight Risk Data Collection and Analysis' project was to acquire and evaluate space flight data with the express purpose of establishing a database containing measurements of specific risk assessment - reliability - availability - maintainability - supportability (RRAMS) parameters. The developed comprehensive RRAMS database will support the performance of future NASA and aerospace industry risk and reliability studies. One of the primary goals has been to acquire unprocessed information relating to the reliability and availability of launch vehicles and the subsystems and components thereof from the 45th Space Wing (formerly Eastern Space and Missile Command -ESMC) at Patrick Air Force Base. After evaluating and analyzing this information, it was encoded in terms of parameters pertinent to ascertaining reliability and availability statistics, and then assembled into an appropriate database structure.
NASA Astrophysics Data System (ADS)
Damm, Bodo; Klose, Martin
2014-05-01
This contribution presents an initiative to develop a national landslide database for the Federal Republic of Germany. It highlights the structure and contents of the landslide database, and outlines its major data sources and the strategy of information retrieval. Furthermore, the contribution exemplifies the database's potential in applied landslide impact research, including statistics of landslide damage, repair, and mitigation. Thanks to systematic regional data compilation, the landslide database offers a differentiated data pool of more than 5,000 data sets and over 13,000 single data files. It dates back to 1137 AD and covers landslide sites throughout Germany. In seven main data blocks, the landslide database stores, besides information on landslide types, dimensions, and processes, additional data on soil and bedrock properties, geomorphometry, and climatic or other major triggering events. A peculiarity of this landslide database is its storage of data sets on land use effects, damage impacts, hazard mitigation, and landslide costs. Compilation of landslide data is based on a two-tier strategy of data collection. The first step of information retrieval includes systematic web content mining and exploration of online archives of emergency agencies, fire and police departments, and news organizations. Using web and RSS feeds, and soon also a focused web crawler, this enables effective nationwide data collection for recent landslides. On the basis of this information, in-depth data mining is performed to deepen and diversify the data pool in key landslide areas. This makes it possible to gather detailed landslide information from, amongst others, agency records, geotechnical reports, climate statistics, maps, and satellite imagery. Landslide data are extracted from these information sources using a mix of methods, including statistical techniques, imagery analysis, and qualitative text interpretation. The landslide database is currently being migrated to a spatial database system running on PostgreSQL/PostGIS. This provides advanced functionality for spatial data analysis and forms the basis for future data provision and visualization using a WebGIS application. Analysis of the landslide database contents shows that in most parts of Germany landslides primarily affect transportation infrastructure. Although with distinctly lower frequency, recent landslides are also recorded to have caused serious damage to hydraulic facilities and waterways, supply and disposal infrastructure, sites of cultural heritage, as well as forest, agricultural, and mining areas. The main types of landslide damage are failure of cut and fill slopes, destruction of retaining walls, street lights, and forest stocks, burial of roads, backyards, and garden areas, as well as crack formation in foundations, sewer lines, and building walls. Landslide repair and mitigation at transportation infrastructure is dominated by simple solutions such as catch barriers or rock fall drapery. These solutions are often undersized and fail under stress. The use of costly slope stabilization or protection systems has been proven to reduce these risks effectively over longer maintenance cycles. The right balancing of landslide mitigation is thus a crucial problem in managing landslide risks. Development and analysis of such landslide databases helps to support decision-makers in finding efficient solutions that minimize landslide risks for human beings, infrastructure, and financial assets.
Cellular Consequences of Telomere Shortening in Histologically Normal Breast Tissues
2013-09-01
using the open source, Java-based image analysis software package ImageJ (http://rsb.info.nih.gov/ij/) and a custom designed plugin ("Telometer...Tabulated data were stored in a MySQL (http://www.mysql.com) database and viewed through Microsoft Access (Microsoft Corp.). Statistical Analysis For
1994-06-30
tip Opening Displacement (CTOD) Fracture Toughness Measurement". The method has found application in the elastic-plastic fracture mechanics (EPFM)...Proposed Material Property Database Format and Hierarchy...Sample Application of the Material Property Database...the E 49.05 sub-committee. The relevant quality indicators applicable to the present program are: source of data, statistical basis of data
An architecture for a brain-image database
NASA Technical Reports Server (NTRS)
Herskovits, E. H.
2000-01-01
The widespread availability of methods for noninvasive assessment of brain structure has enabled researchers to investigate neuroimaging correlates of normal aging, cerebrovascular disease, and other processes; we designate such studies as image-based clinical trials (IBCTs). We propose an architecture for a brain-image database, which integrates image processing and statistical operators, and thus supports the implementation and analysis of IBCTs. The implementation of this architecture is described and results from the analysis of image and clinical data from two IBCTs are presented. We expect that systems such as this will play a central role in the management and analysis of complex research data sets.
Ocean Drilling Program: Web Site Access Statistics
See statistics for JOIDES members. See statistics for Janus database. Monthly statistics are listed from October 1997 onward; accessible only on www-odp.tamu.edu. ** End of ODP, start of IODP.
1991-06-24
Gross Industrial Output in April [CEI Database]...Jan-Apr Statistics on Payments to Employees [CEI Database]...Jan-Apr Statistics on Labor Productivity [CEI Database]...TRANSPORTATION: Hebei Province Opens Two Air Routes [HEBEI
CAO, XIAO-LAN; ZHONG, BAO-LIANG; XIANG, YU-TAO; UNGVARI, GABOR S.; LAI, KELLY Y. C.; CHIU, HELEN F. K.; CAINE, ERIC D.
2015-01-01
Objective The objective of this meta-analysis is to estimate the pooled prevalence of suicidal ideation and suicide attempts in the general population of Mainland China. Methods A systematic literature search was conducted via the following databases: PubMed, PsycINFO, MEDLINE, China Journals Full-Text Databases, Chongqing VIP database for Chinese Technical Periodicals and Wan Fang Data. Statistical analysis used the Comprehensive Meta-Analysis program. Results Eight studies met the inclusion criteria for the analysis; five reported on the prevalence of suicidal ideation and seven on that of suicide attempts. The estimated lifetime prevalence figures of suicidal ideation and suicide attempts were 3.9% (95% Confidence interval [CI]: 2.5%–6.0%) and 0.8% (95% CI: 0.7%–0.9%), respectively. The estimated female-male ratio for lifetime prevalence of suicidal ideation and suicide attempts was 1.7 and 2.2, respectively. Only the difference of suicide attempts between the two genders was statistically significant. Conclusion This was the first meta-analysis of the prevalence of suicidal ideation and suicide attempts in the general population of Mainland China. The pooled lifetime prevalence of both suicidal ideation and suicide attempts are relatively low; however, caution is required when assessing these self-report data. Women had a modestly higher prevalence for suicide attempts than men. The frequency for suicidal ideation and suicide attempts in urban regions was similar to those in rural areas. PMID:26060259
Using statistical process control to make data-based clinical decisions.
Pfadt, A; Wheeler, D J
1995-01-01
Applied behavior analysis is based on an investigation of variability due to interrelationships among antecedents, behavior, and consequences. This permits testable hypotheses about the causes of behavior as well as for the course of treatment to be evaluated empirically. Such information provides corrective feedback for making data-based clinical decisions. This paper considers how a different approach to the analysis of variability based on the writings of Walter Shewart and W. Edwards Deming in the area of industrial quality control helps to achieve similar objectives. Statistical process control (SPC) was developed to implement a process of continual product improvement while achieving compliance with production standards and other requirements for promoting customer satisfaction. SPC involves the use of simple statistical tools, such as histograms and control charts, as well as problem-solving techniques, such as flow charts, cause-and-effect diagrams, and Pareto charts, to implement Deming's management philosophy. These data-analytic procedures can be incorporated into a human service organization to help to achieve its stated objectives in a manner that leads to continuous improvement in the functioning of the clients who are its customers. Examples are provided to illustrate how SPC procedures can be used to analyze behavioral data. Issues related to the application of these tools for making data-based clinical decisions and for creating an organizational climate that promotes their routine use in applied settings are also considered.
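As a hedged illustration of the control-chart logic described above, the sketch below builds a Shewhart individuals (X-mR) chart from hypothetical session-by-session behavior counts, deriving three-sigma limits from the average moving range and flagging out-of-control points; clinical use would add the usual supplementary run rules.

```python
def individuals_chart(values):
    """Shewhart individuals (X-mR) chart limits from the average moving range."""
    center = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    sigma = mr_bar / 1.128                    # d2 constant for subgroups of size 2
    ucl, lcl = center + 3 * sigma, center - 3 * sigma
    signals = [i for i, v in enumerate(values) if v > ucl or v < lcl]
    return center, lcl, ucl, signals

# Hypothetical counts of a target behavior across 15 observation sessions.
counts = [12, 10, 11, 13, 9, 12, 11, 10, 14, 12, 11, 13, 10, 24, 12]
center, lcl, ucl, signals = individuals_chart(counts)
print(f"center = {center:.1f}, control limits = ({lcl:.1f}, {ucl:.1f})")
print("out-of-control sessions:", signals)   # session 13 (count of 24) is flagged
```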
Computer Administering of the Psychological Investigations: Set-Relational Representation
NASA Astrophysics Data System (ADS)
Yordzhev, Krasimir
Computer administering of a psychological investigation is the computer representation of the entire procedure of psychological assessments - test construction, test implementation, results evaluation, storage and maintenance of the developed database, its statistical processing, analysis and interpretation. A mathematical description of psychological assessment with the aid of personality tests is discussed in this article. The set theory and the relational algebra are used in this description. A relational model of data, needed to design a computer system for automation of certain psychological assessments is given. Some finite sets and relation on them, which are necessary for creating a personality psychological test, are described. The described model could be used to develop real software for computer administering of any psychological test and there is full automation of the whole process: test construction, test implementation, result evaluation, storage of the developed database, statistical implementation, analysis and interpretation. A software project for computer administering personality psychological tests is suggested.
Specification of the ISS Plasma Environment Variability
NASA Technical Reports Server (NTRS)
Minow, Joseph I.; Neergaard, Linda F.; Bui, Them H.; Mikatarian, Ronald R.; Barsamian, H.; Koontz, Steven L.
2002-01-01
Quantifying the spacecraft charging risks and corresponding hazards for the International Space Station (ISS) requires a plasma environment specification describing the natural variability of ionospheric temperature (Te) and density (Ne). Empirical ionospheric specification and forecast models such as the International Reference Ionosphere (IRI) model typically only provide estimates of long-term (seasonal) mean Te and Ne values for the low Earth orbit environment. Knowledge of the Te and Ne variability as well as the likelihood of extreme deviations from the mean values is required to estimate both the magnitude and frequency of occurrence of potentially hazardous spacecraft charging environments for a given ISS construction stage and flight configuration. This paper describes the statistical analysis of historical ionospheric low Earth orbit plasma measurements used to estimate Ne and Te variability in the ISS flight environment. The statistical variability analysis of Ne and Te enables calculation of the expected frequency of occurrence of any particular values of Ne and Te, especially those that correspond to possibly hazardous spacecraft charging environments. The database used in the original analysis included measurements from the AE-C, AE-D, and DE-2 satellites. Recent work on the database has added additional satellites as well as ground-based incoherent scatter radar observations. Deviations of the data values from the IRI-estimated Ne and Te parameters for each data point provide a statistical basis for modeling the deviations of the plasma environment from the IRI model output. This technique, while developed specifically for the Space Station analysis, can also be generalized to provide ionospheric plasma environment risk specification models for low Earth orbit over an altitude range of 200 km through approximately 1000 km.
Computer-aided auditing of prescription drug claims.
Iyengar, Vijay S; Hermiz, Keith B; Natarajan, Ramesh
2014-09-01
We describe a methodology for identifying and ranking candidate audit targets from a database of prescription drug claims. The relevant audit targets may include various entities such as prescribers, patients and pharmacies, who exhibit certain statistical behavior indicative of potential fraud and abuse over the prescription claims during a specified period of interest. Our overall approach is consistent with related work in statistical methods for detection of fraud and abuse, but has a relative emphasis on three specific aspects: first, based on the assessment of domain experts, certain focus areas are selected and data elements pertinent to the audit analysis in each focus area are identified; second, specialized statistical models are developed to characterize the normalized baseline behavior in each focus area; and third, statistical hypothesis testing is used to identify entities that diverge significantly from their expected behavior according to the relevant baseline model. The application of this overall methodology to a prescription claims database from a large health plan is considered in detail.
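A hedged toy version of the hypothesis-testing step described above: each prescriber's claim count for a drug of interest is compared with a Poisson expectation proportional to that prescriber's total claims, and prescribers with improbably high counts are ranked as candidate audit targets. The counts, the baseline model, and the prescriber identifiers are all invented, not the paper's actual focus areas or models.

```python
import math

def poisson_sf(k, mu):
    """P(X >= k) for X ~ Poisson(mu), via the complement of the lower tail."""
    return 1.0 - sum(math.exp(-mu) * mu**i / math.factorial(i) for i in range(k))

# Hypothetical data: (prescriber id, claims for drug of interest, total claims).
prescribers = [("A", 4, 900), ("B", 30, 1100), ("C", 7, 650), ("D", 55, 5200)]

overall_rate = sum(x for _, x, _ in prescribers) / sum(n for _, _, n in prescribers)

# Rank prescribers by how surprising their count is under the baseline rate.
ranked = []
for pid, x, n in prescribers:
    expected = overall_rate * n
    p_value = poisson_sf(x, expected)
    ranked.append((p_value, pid, x, round(expected, 1)))

for p_value, pid, x, expected in sorted(ranked):
    print(f"prescriber {pid}: observed {x}, expected {expected}, p = {p_value:.2e}")
```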
Kokol, Peter; Vošner, Helena Blažun
2018-01-01
The overall aim of the present study was to compare the coverage of existing research funding information for articles indexed in Scopus, Web of Science, and PubMed databases. The numbers of articles with funding information published in 2015 were identified in the three selected databases and compared using bibliometric analysis of a sample of twenty-eight prestigious medical journals. Frequency analysis of the number of articles with funding information showed statistically significant differences between Scopus, Web of Science, and PubMed databases. The largest proportion of articles with funding information was found in Web of Science (29.0%), followed by PubMed (14.6%) and Scopus (7.7%). The results show that coverage of funding information differs significantly among Scopus, Web of Science, and PubMed databases in a sample of the same medical journals. Moreover, we found that, currently, funding data in PubMed is more difficult to obtain and analyze compared with that in the other two databases.
Mining Claim Activity on Federal Land in the United States
Causey, J. Douglas
2007-01-01
Several statistical compilations of mining claim activity on Federal land derived from the Bureau of Land Management's LR2000 database have previously been published by the U.S Geological Survey (USGS). The work in the 1990s did not include Arkansas or Florida. None of the previous reports included Alaska because it is stored in a separate database (Alaska Land Information System) and is in a different format. This report includes data for all states for which there are Federal mining claim records, beginning in 1976 and continuing to the present. The intent is to update the spatial and statistical data associated with this report on an annual basis, beginning with 2005 data. The statistics compiled from the databases are counts of the number of active mining claims in a section of land each year from 1976 to the present for all states within the United States. Claim statistics are subset by lode and placer types, as well as a dataset summarizing all claims including mill site and tunnel site claims. One table presents data by case type, case status, and number of claims in a section. This report includes a spatial database for each state in which mining claims were recorded, except North Dakota, which only has had two claims. A field is present that allows the statistical data to be joined to the spatial databases so that spatial displays and analysis can be done by using appropriate geographic information system (GIS) software. The data show how mining claim activity has changed in intensity, space, and time. Variations can be examined on a state, as well as a national level. The data are tied to a section of land, approximately 640 acres, which allows it to be used at regional, as well as local scale. The data only pertain to Federal land and mineral estate that was open to mining claim location at the time the claims were staked.
Hafen, G M; Hurst, C; Yearwood, J; Smith, J; Dzalilov, Z; Robinson, P J
2008-10-05
Cystic fibrosis is the most common fatal genetic disorder in the Caucasian population. Scoring systems for assessment of Cystic fibrosis disease severity have been used for almost 50 years, without being adapted to the milder phenotype of the disease in the 21st century. The aim of this current project is to develop a new scoring system using a database and employing various statistical tools. This study protocol reports the development of the statistical tools in order to create such a scoring system. The evaluation is based on the Cystic Fibrosis database from the cohort at the Royal Children's Hospital in Melbourne. Initially, unsupervised clustering of all data records was performed using a range of clustering algorithms; in particular, incremental clustering algorithms were used. The clusters obtained were characterised using rules from decision trees and the results examined by clinicians. In order to obtain a clearer definition of classes, expert opinion of each individual's clinical severity was sought. After data preparation, including expert opinion of an individual's clinical severity on a 3-point scale (mild, moderate and severe disease), two multivariate techniques were used throughout the analysis to establish a method that would have better success in feature selection and model derivation: 'Canonical Analysis of Principal Coordinates' (CAP) and 'Linear Discriminant Analysis' (DA). A 3-step procedure was performed with (1) selection of features, (2) extraction of 5 severity classes out of the 3 severity classes defined by expert opinion and (3) establishment of calibration datasets. (1) Feature selection: CAP has a more effective "modelling" focus than DA. (2) Extraction of 5 severity classes: after variables were identified as important in discriminating contiguous CF severity groups on the 3-point scale as mild/moderate and moderate/severe, Discriminant Function (DF) analysis was used to determine the new groups mild, intermediate moderate, moderate, intermediate severe and severe disease. (3) Generated confusion tables showed a misclassification rate of 19.1% for males and 16.5% for females, with a majority of misallocations into adjacent severity classes, particularly for males. Our preliminary data show that using CAP for feature selection and Linear DA to derive the actual model in a CF database might be helpful in developing a scoring system. However, there are several limitations; in particular, more data entry points are needed to finalize a score, and the statistical tools have to be further refined and validated by re-running the statistical methods on the larger dataset.
ERIC Educational Resources Information Center
Zheng, Henry Y.; Stewart, Alice A.
This study explores data envelopment analysis (DEA) as a tool for assessing and benchmarking the performance of public research universities. Using national databases such as those maintained by the National Science Foundation and the National Center for Education Statistics, DEA analysis was conducted of the research and instructional outcomes…
NASA Astrophysics Data System (ADS)
Zhou, Hui
Implementing office and departmental target responsibility systems is an inevitable outcome of higher education reform, and statistical processing of student information is an important part of student performance review within such systems. Based on an analysis of student evaluation, this paper designs a student information management database application using relational database management system software. To implement the student information management functions, the functional requirements, overall structure, data sheets and fields, data sheet associations, and software code are designed in detail.
Remote visual analysis of large turbulence databases at multiple scales
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pulido, Jesus; Livescu, Daniel; Kanov, Kalin
The remote analysis and visualization of raw large turbulence datasets is challenging. Current accurate direct numerical simulations (DNS) of turbulent flows generate datasets with billions of points per time-step and several thousand time-steps per simulation. Until recently, the analysis and visualization of such datasets was restricted to scientists with access to large supercomputers. The public Johns Hopkins Turbulence database simplifies access to multi-terabyte turbulence datasets and facilitates the computation of statistics and extraction of features through the use of commodity hardware. In this paper, we present a framework designed around wavelet-based compression for high-speed visualization of large datasets and methods supporting multi-resolution analysis of turbulence. By integrating common technologies, this framework enables remote access to tools available on supercomputers and over 230 terabytes of DNS data over the Web. Finally, the database toolset is expanded by providing access to exploratory data analysis tools, such as wavelet decomposition capabilities and coherent feature extraction.
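A rough, self-contained sketch of the wavelet-based multi-resolution idea behind this framework (illustrative only; not the project's code or data): a small synthetic 3D field is decomposed with PyWavelets, small detail coefficients are discarded, and an approximation is reconstructed, as one might do for low-bandwidth remote visualization.

```python
import numpy as np
import pywt

# Hypothetical 64^3 field standing in for one DNS velocity-component slab.
rng = np.random.default_rng(0)
field = rng.standard_normal((64, 64, 64))

# Multi-level 3D wavelet decomposition.
coeffs = pywt.wavedecn(field, wavelet="db4", level=3)

# Crude compression: zero out coefficients below a threshold.
arr, slices = pywt.coeffs_to_array(coeffs)
threshold = 0.5 * np.std(arr)
arr[np.abs(arr) < threshold] = 0.0
compressed = pywt.array_to_coeffs(arr, slices, output_format="wavedecn")

# Reconstruct an approximation suitable for quick visualization.
approx = pywt.waverecn(compressed, wavelet="db4")
print("relative L2 error:", np.linalg.norm(approx - field) / np.linalg.norm(field))
```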
Federal Register 2010, 2011, 2012, 2013, 2014
2011-12-14
... completed and validated, the hardcopy questionnaires will be discarded. Data will be imported into SPSS (Statistical Package for the Social Sciences) for analysis. The database will be maintained at the respective...
Global Statistics of Bolides in the Terrestrial Atmosphere
NASA Astrophysics Data System (ADS)
Chernogor, L. F.; Shevelyov, M. B.
2017-06-01
Purpose: Evaluation and analysis of the distribution of the number of meteoroid (mini asteroid) falls as a function of glow energy, velocity, the region of maximum glow altitude, and geographic coordinates. Design/methodology/approach: The satellite database on the glow of 693 mini asteroids, which were decelerated in the terrestrial atmosphere, has been used for evaluating basic meteoroid statistics. Findings: A rapid decrease in the number of asteroids with increasing glow energy is confirmed. The average speed of the celestial bodies is equal to about 17.9 km/s. The altitude of maximum glow most often lies within 30-40 km. The distribution of the number of meteoroids entering the terrestrial atmosphere in longitude and latitude (after excluding the component of the latitudinal dependence due to the geometry) is approximately uniform. Conclusions: Using a large enough database of measurements, the meteoroid (mini asteroid) statistics have been evaluated.
Utah Virtual Lab: JAVA interactivity for teaching science and statistics on line.
Malloy, T E; Jensen, G C
2001-05-01
The Utah on-line Virtual Lab is a JAVA program run dynamically off a database. It is embedded in StatCenter (www.psych.utah.edu/learn/statsampler.html), an on-line collection of tools and text for teaching and learning statistics. Instructors author a statistical virtual reality that simulates theories and data in a specific research focus area by defining independent, predictor, and dependent variables and the relations among them. Students work in an on-line virtual environment to discover the principles of this simulated reality: They go to a library, read theoretical overviews and scientific puzzles, and then go to a lab, design a study, collect and analyze data, and write a report. Each student's design and data analysis decisions are computer-graded and recorded in a database; the written research report can be read by the instructor or by other students in peer groups simulating scientific conventions.
No-Reference Video Quality Assessment Based on Statistical Analysis in 3D-DCT Domain.
Li, Xuelong; Guo, Qun; Lu, Xiaoqiang
2016-05-13
It is an important task to design models for universal no-reference video quality assessment (NR-VQA) in multiple video processing and computer vision applications. However, most existing NR-VQA metrics are designed for specific distortion types, which are often not known in practical applications. A further deficiency is that the spatial and temporal information of videos is hardly considered simultaneously. In this paper, we propose a new NR-VQA metric based on spatiotemporal natural video statistics (NVS) in the 3D discrete cosine transform (3D-DCT) domain. In the proposed method, a set of features is first extracted based on the statistical analysis of 3D-DCT coefficients to characterize the spatiotemporal statistics of videos in different views. These features are then used to predict perceived video quality via an efficient linear support vector regression (SVR) model. The contributions of this paper are: 1) we explore the spatiotemporal statistics of videos in the 3D-DCT domain, which has an inherent spatiotemporal encoding advantage over other widely used 2D transformations; 2) we extract a small set of simple but effective statistical features for video visual quality prediction; 3) the proposed method is universal for multiple types of distortions and robust across different databases. The proposed method is tested on four widely used video databases. Extensive experimental results demonstrate that the proposed method is competitive with state-of-the-art NR-VQA metrics and the top-performing FR-VQA and RR-VQA metrics.
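A schematic sketch of the general pipeline described above, using statistics of 3D-DCT coefficients as features and a linear SVR as the regressor (synthetic data; the feature set, block size and parameters are placeholders, not the paper's):

```python
import numpy as np
from scipy.fft import dctn
from scipy.stats import kurtosis, skew
from sklearn.svm import SVR

def block_features(video, block=8):
    """Spatiotemporal statistics of 3D-DCT coefficients over small cubes (schematic stand-in)."""
    feats = []
    t, h, w = video.shape
    for z in range(0, t - block + 1, block):
        for y in range(0, h - block + 1, block):
            for x in range(0, w - block + 1, block):
                cube = video[z:z + block, y:y + block, x:x + block]
                coefs = dctn(cube, norm="ortho").ravel()[1:]   # drop the DC term
                feats.append([coefs.std(), skew(coefs), kurtosis(coefs)])
    return np.mean(feats, axis=0)

# Hypothetical training set: (video, subjective quality score) pairs.
rng = np.random.default_rng(0)
videos = [rng.standard_normal((16, 64, 64)) for _ in range(20)]
scores = rng.uniform(0, 100, size=20)

X = np.array([block_features(v) for v in videos])
model = SVR(kernel="linear").fit(X, scores)       # linear SVR, as in the abstract
print(model.predict(X[:3]))
```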
Spectral signature verification using statistical analysis and text mining
NASA Astrophysics Data System (ADS)
DeCoster, Mallory E.; Firpi, Alexe H.; Jacobs, Samantha K.; Cone, Shelli R.; Tzeng, Nigel H.; Rodriguez, Benjamin M.
2016-05-01
In the spectral science community, numerous spectral signatures are stored in databases representing many sample materials collected from a variety of spectrometers and spectroscopists. Due to the variety and variability of the spectra that comprise many spectral databases, it is necessary to establish a metric for validating the quality of spectral signatures. This has been an area of great discussion and debate in the spectral science community. This paper discusses a method that independently validates two different aspects of a spectral signature to arrive at a final qualitative assessment: the textual meta-data and the numerical spectral data. Results associated with the spectral data stored in the Signature Database1 (SigDB) are presented. The numerical data comprising a sample material's spectrum are validated based on statistical properties derived from an ideal population set. The quality of the test spectrum is ranked based on a spectral angle mapper (SAM) comparison to the mean spectrum derived from the population set. Additionally, the contextual data of a test spectrum are qualitatively analyzed using lexical analysis text mining. This technique analyzes the syntax of the meta-data to reveal local patterns and trends within the spectral data that are indicative of the test spectrum's quality. Text mining has been successfully applied in security2 (text encryption/decryption), biomedical3, and marketing4 applications. The text mining lexical analysis algorithm is trained on the meta-data patterns of a subset of high- and low-quality spectra, in order to have a model to apply to the entire SigDB data set. The statistical and textual methods combine to assess the quality of a test spectrum existing in a database without the need for an expert user. This method has been compared to other validation methods accepted by the spectral science community, and has provided promising results when a baseline spectral signature is present for comparison. The proposed spectral validation method is described from both a practical-application and an analytical perspective.
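The spectral angle mapper (SAM) comparison mentioned above reduces to the angle between two spectra treated as vectors; a minimal sketch with hypothetical spectra might look like this.

```python
import numpy as np

def spectral_angle(test_spectrum, reference_spectrum):
    """Spectral angle mapper (SAM): angle between two spectra treated as vectors."""
    t = np.asarray(test_spectrum, dtype=float)
    r = np.asarray(reference_spectrum, dtype=float)
    cos_theta = np.dot(t, r) / (np.linalg.norm(t) * np.linalg.norm(r))
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))   # radians; smaller means more similar

# Hypothetical use: rank a test spectrum against the mean of an ideal population set.
population = np.random.default_rng(0).normal(1.0, 0.05, size=(25, 400))   # 25 spectra, 400 bands
mean_spectrum = population.mean(axis=0)
test = mean_spectrum + np.random.default_rng(1).normal(0, 0.02, size=400)
print("SAM angle (rad):", spectral_angle(test, mean_spectrum))
```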
Liang, Li-Jung; Weiss, Robert E; Redelings, Benjamin; Suchard, Marc A
2009-10-01
Statistical analyses of phylogenetic data culminate in uncertain estimates of underlying model parameters. Lack of additional data hinders the ability to reduce this uncertainty, as the original phylogenetic dataset is often complete, containing the entire gene or genome information available for the given set of taxa. Informative priors in a Bayesian analysis can reduce posterior uncertainty; however, publicly available phylogenetic software specifies vague priors for model parameters by default. We build objective and informative priors using hierarchical random effect models that combine additional datasets whose parameters are not of direct interest but are similar to the analysis of interest. We propose principled statistical methods that permit more precise parameter estimates in phylogenetic analyses by creating informative priors for parameters of interest. Using additional sequence datasets from our lab or public databases, we construct a fully Bayesian semiparametric hierarchical model to combine datasets. A dynamic iteratively reweighted Markov chain Monte Carlo algorithm conveniently recycles posterior samples from the individual analyses. We demonstrate the value of our approach by examining the insertion-deletion (indel) process in the enolase gene across the Tree of Life using the phylogenetic software BALI-PHY; we incorporate prior information about indels from 82 curated alignments downloaded from the BAliBASE database.
Using Online Databases to Determine the Correlation between Ranked Lists of Journals.
1984-12-30
...Communications Agency. The purpose of the study was to use citation analysis and statistical testing in journal selection. Bibliographic databases...sources to justify the cost of the journals selected. The research procedures used in this study included the compilation of a list of 157 technical...
Semantic Annotation of Complex Text Structures in Problem Reports
NASA Technical Reports Server (NTRS)
Malin, Jane T.; Throop, David R.; Fleming, Land D.
2011-01-01
Text analysis is important for effective information retrieval from databases where the critical information is embedded in text fields. Aerospace safety depends on effective retrieval of relevant and related problem reports for the purpose of trend analysis. The complex text syntax in problem descriptions has limited statistical text mining of problem reports. The presentation describes an intelligent tagging approach that applies syntactic and then semantic analysis to overcome this problem. The tags identify types of problems and equipment that are embedded in the text descriptions. The power of these tags is illustrated in a faceted searching and browsing interface for problem report trending that combines automatically generated tags with database code fields and temporal information.
Network-based statistical comparison of citation topology of bibliographic databases
Šubelj, Lovro; Fiala, Dalibor; Bajec, Marko
2014-01-01
Modern bibliographic databases provide the basis for scientific research and its evaluation. While their content and structure differ substantially, there exist only informal notions on their reliability. Here we compare the topological consistency of citation networks extracted from six popular bibliographic databases including Web of Science, CiteSeer and arXiv.org. The networks are assessed through a rich set of local and global graph statistics. We first reveal statistically significant inconsistencies between some of the databases with respect to individual statistics. For example, the introduced field bow-tie decomposition of DBLP Computer Science Bibliography substantially differs from the rest due to the coverage of the database, while the citation information within arXiv.org is the most exhaustive. Finally, we compare the databases over multiple graph statistics using the critical difference diagram. The citation topology of DBLP Computer Science Bibliography is the least consistent with the rest, while, not surprisingly, Web of Science is significantly more reliable from the perspective of consistency. This work can serve either as a reference for scholars in bibliometrics and scientometrics or a scientific evaluation guideline for governments and research agencies. PMID:25263231
The study of co-citation analysis and knowledge structure on healthcare domain
NASA Astrophysics Data System (ADS)
Chu, Kuo-Chung; Liu, Wen-I.; Tsai, Ming-Yu
2012-11-01
With the prevalence of the Internet and digital archives, online e-journal databases allow scholars to search the literature in a research domain, or to cross-search an interdisciplinary field, so that key literature can be traced efficiently. This study intends to build a Web-based citation analysis system consisting of four modules: (1) a literature search module; (2) a statistics module; (3) an article analysis module; and (4) a co-citation analysis module. The system focuses on the PubMed Central dataset, which has 170,000 records. Within a research domain, a specific keyword can be searched in terms of authors, journals, and core issues. In addition, we use data mining techniques for co-citation analysis. The results assist researchers in gaining an in-depth understanding of the domain knowledge, and an automated system for co-citation analysis helps to reveal changes, trends, and the knowledge structure of a research domain. To the best of our knowledge, the proposed system differs from the analysis functions of existing online electronic retrieval databases. The proposed system may become a value-added resource for the healthcare domain and a contribution to researchers.
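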
Integrated ITS capabilities in transit vehicles : human factors research needs : summary report
DOT National Transportation Integrated Search
2005-08-30
This synthesis reviews statistics and activities of public-private partnerships in transportation around the world between 1985 and 2004. The data used for the analysis are from the 2004 International Public Works Financing Projects Database...
Viability of in-house datamarting approaches for population genetics analysis of SNP genotypes
Amigo, Jorge; Phillips, Christopher; Salas, Antonio; Carracedo, Ángel
2009-01-01
Background Databases containing very large amounts of SNP (Single Nucleotide Polymorphism) data are now freely available for researchers interested in medical and/or population genetics applications. While many of these SNP repositories have implemented data retrieval tools for general-purpose mining, these alone cannot cover the broad spectrum of needs of most medical and population genetics studies. Results To address this limitation, we have built in-house customized data marts from the raw data provided by the largest public databases. In particular, for population genetics analysis based on genotypes we have built a set of data processing scripts that deal with raw data coming from the major SNP variation databases (e.g. HapMap, Perlegen), stripping them into single genotypes and then grouping them into populations, then merged with additional complementary descriptive information extracted from dbSNP. This allows not only in-house standardization and normalization of the genotyping data retrieved from different repositories, but also the calculation of statistical indices from simple allele frequency estimates to more elaborate genetic differentiation tests within populations, together with the ability to combine population samples from different databases. Conclusion The present study demonstrates the viability of implementing scripts for handling extensive datasets of SNP genotypes with low computational costs, dealing with certain complex issues that arise from the divergent nature and configuration of the most popular SNP repositories. The information contained in these databases can also be enriched with additional information obtained from other complementary databases, in order to build a dedicated data mart. Updating the data structure is straightforward, as well as permitting easy implementation of new external data and the computation of supplementary statistical indices of interest. PMID:19344481
Viability of in-house datamarting approaches for population genetics analysis of SNP genotypes.
Amigo, Jorge; Phillips, Christopher; Salas, Antonio; Carracedo, Angel
2009-03-19
Databases containing very large amounts of SNP (Single Nucleotide Polymorphism) data are now freely available for researchers interested in medical and/or population genetics applications. While many of these SNP repositories have implemented data retrieval tools for general-purpose mining, these alone cannot cover the broad spectrum of needs of most medical and population genetics studies. To address this limitation, we have built in-house customized data marts from the raw data provided by the largest public databases. In particular, for population genetics analysis based on genotypes we have built a set of data processing scripts that deal with raw data coming from the major SNP variation databases (e.g. HapMap, Perlegen), stripping them into single genotypes and then grouping them into populations, then merged with additional complementary descriptive information extracted from dbSNP. This allows not only in-house standardization and normalization of the genotyping data retrieved from different repositories, but also the calculation of statistical indices from simple allele frequency estimates to more elaborate genetic differentiation tests within populations, together with the ability to combine population samples from different databases. The present study demonstrates the viability of implementing scripts for handling extensive datasets of SNP genotypes with low computational costs, dealing with certain complex issues that arise from the divergent nature and configuration of the most popular SNP repositories. The information contained in these databases can also be enriched with additional information obtained from other complementary databases, in order to build a dedicated data mart. Updating the data structure is straightforward, as well as permitting easy implementation of new external data and the computation of supplementary statistical indices of interest.
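A minimal sketch of the kind of computation such a data mart supports, from genotype calls to allele frequencies and a simple two-population differentiation index (hypothetical genotypes; Wright's FST in its simplest form, not the project's scripts):

```python
from collections import Counter

# Hypothetical diploid genotype calls for one SNP in two population samples.
populations = {
    "POP1": ["AA", "AG", "GG", "AG", "AA", "AA", "AG", "GG"],
    "POP2": ["GG", "GG", "AG", "GG", "AG", "GG", "GG", "AG"],
}

def allele_frequencies(genotypes):
    """Per-allele frequencies from a list of diploid genotype strings."""
    alleles = Counter(a for g in genotypes for a in g)
    total = sum(alleles.values())
    return {allele: count / total for allele, count in alleles.items()}

freqs = {pop: allele_frequencies(g) for pop, g in populations.items()}
print(freqs)

# Simple two-population FST estimate for allele "A" (illustrative only).
p1, p2 = freqs["POP1"].get("A", 0.0), freqs["POP2"].get("A", 0.0)
p_bar = (p1 + p2) / 2
fst = ((p1 - p_bar) ** 2 + (p2 - p_bar) ** 2) / (2 * p_bar * (1 - p_bar))
print("FST(A):", fst)
```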
Multivariate analysis: A statistical approach for computations
NASA Astrophysics Data System (ADS)
Michu, Sachin; Kaushik, Vandana
2014-10-01
Multivariate analysis is a statistical approach commonly used in automotive diagnosis, in education, in evaluating clusters in finance, and, more recently, in the health-related professions. The objective of the paper is to provide a detailed exploratory discussion about factor analysis (FA) in image retrieval and correlation analysis (CA) of network traffic. Image retrieval methods aim to retrieve relevant images from a collected database, based on their content. The problem is made more difficult due to the high dimension of the variable space in which the images are represented. Multivariate correlation analysis proposes an anomaly detection and analysis method based on the correlation coefficient matrix. Anomalous behaviors in the network include various attacks such as DDoS attacks and network scanning.
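A small sketch of correlation-matrix-based anomaly detection in the spirit described above (synthetic traffic features; the Frobenius-norm deviation score is an illustrative choice, not the paper's exact method):

```python
import numpy as np

# Hypothetical network-traffic feature matrix: rows = time windows, columns = traffic features.
rng = np.random.default_rng(0)
cov = [[1.0, 0.8, 0.1], [0.8, 1.0, 0.1], [0.1, 0.1, 1.0]]
baseline = rng.multivariate_normal([0, 0, 0], cov, size=200)

def corr_matrix(window):
    return np.corrcoef(window, rowvar=False)

reference = corr_matrix(baseline)

def anomaly_score(window):
    """Deviation of a window's correlation-coefficient matrix from the baseline."""
    return np.linalg.norm(corr_matrix(window) - reference)

normal_window = rng.multivariate_normal([0, 0, 0], cov, size=50)
attack_window = rng.multivariate_normal([0, 0, 0], np.eye(3), size=50)  # correlations broken, e.g. scanning
print("normal:", anomaly_score(normal_window), "attack:", anomaly_score(attack_window))
```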
Data resource profile: United Nations Children's Fund (UNICEF).
Murray, Colleen; Newby, Holly
2012-12-01
The United Nations Children's Fund (UNICEF) plays a leading role in the collection, compilation, analysis and dissemination of data to inform sound policies, legislation and programmes for promoting children's rights and well-being, and for global monitoring of progress towards the Millennium Development Goals. UNICEF maintains a set of global databases representing nearly 200 countries and covering the areas of child mortality, child health, maternal health, nutrition, immunization, water and sanitation, HIV/AIDS, education and child protection. These databases consist of internationally comparable and statistically sound data, and are updated annually through a process that draws on a wealth of data provided by UNICEF's wide network of >150 field offices. The databases are composed primarily of estimates from household surveys, with data from censuses, administrative records, vital registration systems and statistical models contributing to some key indicators as well. The data are assessed for quality based on a set of objective criteria to ensure that only the most reliable nationally representative information is included. For most indicators, data are available at the global, regional and national levels, plus sub-national disaggregation by sex, urban/rural residence and household wealth. The global databases are featured in UNICEF's flagship publications, inter-agency reports, including the Secretary General's Millennium Development Goals Report and Countdown to 2015, sector-specific reports and statistical country profiles. They are also publicly available on www.childinfo.org, together with trend data and equity analyses.
Hawthorne L. Beyer; Jeff Jenness; Samuel A. Cushman
2010-01-01
Spatial information systems (SIS) is a term that describes a wide diversity of concepts, techniques, and technologies related to the capture, management, display and analysis of spatial information. It encompasses technologies such as geographic information systems (GIS), global positioning systems (GPS), remote sensing, and relational database management systems (...
Bergamino, Maurizio; Hamilton, David J; Castelletti, Lara; Barletta, Laura; Castellan, Lucio
2015-03-01
In this study, we describe the development and utilization of a relational database designed to manage the clinical and radiological data of patients with brain tumors. The Brain Tumor Database was implemented using MySQL v.5.0, while the graphical user interface was created using PHP and HTML, thus making it easily accessible through a web browser. This web-based approach allows for multiple institutions to potentially access the database. The BT Database can record brain tumor patient information (e.g. clinical features, anatomical attributes, and radiological characteristics) and be used for clinical and research purposes. Analytic tools to automatically generate statistics and different plots are provided. The BT Database is a free and powerful user-friendly tool with a wide range of possible clinical and research applications in neurology and neurosurgery. The BT Database graphical user interface source code and manual are freely available at http://tumorsdatabase.altervista.org. © The Author(s) 2013.
Ushijima, Masaru; Mashima, Tetsuo; Tomida, Akihiro; Dan, Shingo; Saito, Sakae; Furuno, Aki; Tsukahara, Satomi; Seimiya, Hiroyuki; Yamori, Takao; Matsuura, Masaaki
2013-03-01
Genome-wide transcriptional expression analysis is a powerful strategy for characterizing the biological activity of anticancer compounds. It is often instructive to identify gene sets involved in the activity of a given drug compound for comparison with different compounds. Currently, however, there is no comprehensive gene expression database and related application system that is: (i) specialized in anticancer agents; (ii) easy to use; and (iii) open to the public. To develop a public gene expression database of antitumor agents, we first examined gene expression profiles in human cancer cells after exposure to 35 compounds, including 25 clinically used anticancer agents. Gene signatures were extracted that were classified as upregulated or downregulated after exposure to the drug. Hierarchical clustering showed that drugs with similar mechanisms of action, such as genotoxic drugs, were clustered. Connectivity map analysis further revealed that our gene signature data reflected modes of action of the respective agents. Together with the database, we developed analysis programs that calculate scores for ranking changes in gene expression and for searching statistically significant pathways from the Kyoto Encyclopedia of Genes and Genomes database in order to analyze the datasets more easily. Our database and the analysis programs are available online at our website (http://scads.jfcr.or.jp/db/cs/). Using these systems, we successfully showed that proteasome inhibitors are selectively classified as endoplasmic reticulum stress inducers and induce atypical endoplasmic reticulum stress. Thus, our public access database and related analysis programs constitute a set of efficient tools to evaluate the mode of action of novel compounds and identify promising anticancer lead compounds. © 2012 Japanese Cancer Association.
Scholl, Joep H G; van Puijenbroek, Eugene P
2012-08-01
The Netherlands Pharmacovigilance Centre Lareb received reports of six cases of hearing impairment in association with oral terbinafine use. This study describes these cases and provides support for this association from the Lareb database for spontaneous adverse drug reaction (ADR) reporting and from Vigibase™, the ADR database of the WHO Collaborating Centre for International Drug Monitoring, the Uppsala Monitoring Centre. The objective of the current study was to identify whether the observed association between oral terbinafine use and hearing impairment, based on cases received by Lareb, constitutes a safety signal. Cases of hearing impairment in oral terbinafine users are described. In a case/non-case analysis, the strength of the association in Vigibase™ and the Lareb database was determined (date of analysis August 2011) by calculating the reporting odds ratios (RORs), adjusted for possible confounding by age, sex and ototoxic concomitant medication. For the purpose of this study, RORs were calculated for deafness, hypoacusis and the combination of both, defined as hearing impairment. In the Lareb database, six reports concerning individuals aged 31-82 years, who developed hearing impairment after starting oral terbinafine, were present. The use of oral terbinafine was disproportionally associated with hypoacusis in both the Lareb database (adjusted ROR 3.9; 95% CI 1.7, 9.0) and in Vigibase™ (adjusted ROR 1.7; 95% CI 1.0, 2.8). Deafness was not disproportionally present in either of the databases. Based on the described cases and the statistical analyses from both databases, a causal relationship between the use of oral terbinafine and hearing impairment is possible. The mechanism by which terbinafine could cause hearing impairment has not been elucidated yet. The pharmacological action of terbinafine is based on the inhibition of squalene epoxidase, an enzyme present in both fungal and human cells. This inhibition might result in a decrease in cholesterol levels in human cells, among which are the outer hair cells of the cochlea. It may be possible that the reduction in cochlear cholesterol levels leads to impaired cochlear function and possibly hearing impairment. In this study we describe hearing impairment as a possible ADR of oral terbinafine, based on six case reports and statistical support from Vigibase™ and the Lareb database. To our knowledge this association has not been described before.
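The crude (unadjusted) reporting odds ratio used in such case/non-case analyses can be computed from a 2x2 table of report counts. The sketch below uses purely illustrative counts, not the Lareb or Vigibase figures, and omits the adjustment for age, sex and ototoxic co-medication described in the abstract.

```python
import math

def reporting_odds_ratio(a, b, c, d):
    """Crude ROR for a drug-ADR pair from a 2x2 spontaneous-reporting table.

    a: reports with the drug and the ADR       b: reports with the drug, other ADRs
    c: reports with other drugs and the ADR    d: reports with other drugs, other ADRs
    """
    ror = (a / b) / (c / d)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ror) - 1.96 * se_log)
    upper = math.exp(math.log(ror) + 1.96 * se_log)
    return ror, (lower, upper)

# Purely illustrative counts.
print(reporting_odds_ratio(a=6, b=1500, c=900, d=900000))
```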
Global building inventory for earthquake loss estimation and risk management
Jaiswal, Kishor; Wald, David; Porter, Keith
2010-01-01
We develop a global database of building inventories using taxonomy of global building types for use in near-real-time post-earthquake loss estimation and pre-earthquake risk analysis, for the U.S. Geological Survey’s Prompt Assessment of Global Earthquakes for Response (PAGER) program. The database is available for public use, subject to peer review, scrutiny, and open enhancement. On a country-by-country level, it contains estimates of the distribution of building types categorized by material, lateral force resisting system, and occupancy type (residential or nonresidential, urban or rural). The database draws on and harmonizes numerous sources: (1) UN statistics, (2) UN Habitat’s demographic and health survey (DHS) database, (3) national housing censuses, (4) the World Housing Encyclopedia and (5) other literature.
Using statistical text classification to identify health information technology incidents
Chai, Kevin E K; Anthony, Stephen; Coiera, Enrico; Magrabi, Farah
2013-01-01
Objective To examine the feasibility of using statistical text classification to automatically identify health information technology (HIT) incidents in the USA Food and Drug Administration (FDA) Manufacturer and User Facility Device Experience (MAUDE) database. Design We used a subset of 570 272 incidents including 1534 HIT incidents reported to MAUDE between 1 January 2008 and 1 July 2010. Text classifiers using regularized logistic regression were evaluated with both ‘balanced’ (50% HIT) and ‘stratified’ (0.297% HIT) datasets for training, validation, and testing. Dataset preparation, feature extraction, feature selection, cross-validation, classification, performance evaluation, and error analysis were performed iteratively to further improve the classifiers. Feature-selection techniques such as removing short words and stop words, stemming, lemmatization, and principal component analysis were examined. Measurements κ statistic, F1 score, precision and recall. Results Classification performance was similar on both the stratified (0.954 F1 score) and balanced (0.995 F1 score) datasets. Stemming was the most effective technique, reducing the feature set size to 79% while maintaining comparable performance. Training with balanced datasets improved recall (0.989) but reduced precision (0.165). Conclusions Statistical text classification appears to be a feasible method for identifying HIT reports within large databases of incidents. Automated identification should enable more HIT problems to be detected, analyzed, and addressed in a timely manner. Semi-supervised learning may be necessary when applying machine learning to big data analysis of patient safety incidents and requires further investigation. PMID:23666777
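A minimal sketch of statistical text classification with regularized logistic regression, in the spirit of the approach described (tiny invented incident narratives; TF-IDF tokens in place of the paper's full feature-engineering and stemming pipeline):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

# Hypothetical incident narratives; 1 = health IT incident, 0 = other device incident.
texts = [
    "interface error caused medication order to display for wrong patient",
    "infusion pump battery failed during transport",
    "software upgrade dropped allergy alerts in the ehr",
    "catheter fractured during removal",
] * 25
labels = [1, 0, 1, 0] * 25

# Regularized logistic regression over token features.
clf = make_pipeline(TfidfVectorizer(stop_words="english"),
                    LogisticRegression(C=1.0, max_iter=1000))
clf.fit(texts, labels)
pred = clf.predict(texts)
print("training F1:", f1_score(labels, pred))
```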
Pradhan, A; Tincello, D G; Kearney, R
2013-01-01
To report the numbers of patients having childbirth after pelvic floor surgery in England. Retrospective analysis of Hospital Episode Statistics data. Hospital Episode Statistics database. Women, aged 20-44 years, undergoing childbirth after pelvic floor surgery between the years 2002 and 2008. Analysis of the Hospital Episode Statistics database using Office of Population, Censuses and Surveys: Classification of Interventions and Procedures, 4th Revision (OPCS-4) code at the four-character level for pelvic floor surgery and delivery, in women aged 20-44 years, between the years 2002 and 2008. Numbers of women having delivery episodes after previous pelvic floor surgery, and numbers having further pelvic floor surgery after delivery. Six hundred and three women had a delivery episode after previous pelvic floor surgery in the time period 2002-2008. In this group of 603 women, 42 had a further pelvic floor surgery episode following delivery in the same time period. The incidence of repeat surgery episode following delivery was higher in the group delivered vaginally than in those delivered by caesarean (13.6 versus 4.4%; odds ratio, 3.38; 95% confidence interval, 1.87-6.10). There were 603 women having childbirth after pelvic floor surgery in the time period 2002-2008. The incidence of further pelvic floor surgery after childbirth was lower after caesarean delivery than after vaginal delivery, and this may indicate a protective effect of abdominal delivery. © 2012 The Authors BJOG An International Journal of Obstetrics and Gynaecology © 2012 RCOG.
Irani, Morvarid; Amirian, Malihe; Sadeghi, Ramin; Lez, Justine Le; Latifnejad Roudsari, Robab
2017-08-29
To evaluate the effect of folate and folate plus zinc supplementation on endocrine parameters and sperm characteristics in subfertile men. We conducted a systematic review and meta-analysis. Electronic databases of Medline, Scopus, Google Scholar and Persian databases (SID, Iran medex, Magiran, Medlib, Iran doc) were searched from 1966 to December 2016 using a set of relevant keywords including "folate or folic acid AND (infertility, infertile, sterility)". All available randomized controlled trials (RCTs), conducted on a sample of subfertile men with semen analyses, who took oral folic acid or folate plus zinc, were included. Data collected included endocrine parameters and sperm characteristics. Statistical analyses were done by Comprehensive Meta-analysis Version 2. In total, seven studies were included. Six studies had sufficient data for meta-analysis. Sperm concentration was statistically higher in men supplemented with folate than with placebo (P < .001). However, folate supplementation alone did not seem to be more effective than the placebo on sperm morphology (P = .056) and motility (P = .652). Folate plus zinc supplementation did not show any statistically different effect on serum testosterone (P = .86), inhibin B (P = .84), FSH (P = .054), and sperm motility (P = .169) as compared to the placebo. Yet, folate plus zinc showed a statistically higher effect on sperm concentration (P < .001), morphology (P < .001), and serum folate level (P < .001) as compared to placebo. Folate plus zinc supplementation has a positive effect on sperm characteristics in subfertile men. However, these results should be interpreted with caution due to the substantial heterogeneity of the studies included in this meta-analysis. Further trials are still needed to confirm the current findings.
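For readers unfamiliar with the pooling step in such a meta-analysis, a minimal fixed-effect inverse-variance sketch with hypothetical effect sizes is shown below; the actual analysis used Comprehensive Meta-analysis Version 2, and given the heterogeneity noted above a random-effects model would likely be preferred.

```python
import math

# Hypothetical per-study effect sizes (standardized mean differences) and their variances.
effects = [0.35, 0.50, 0.10, 0.42]
variances = [0.04, 0.09, 0.05, 0.06]

# Fixed-effect inverse-variance pooling (the simplest meta-analytic combination).
weights = [1 / v for v in variances]
pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
se = math.sqrt(1 / sum(weights))
ci = (pooled - 1.96 * se, pooled + 1.96 * se)
print(f"pooled effect {pooled:.3f}, 95% CI {ci[0]:.3f} to {ci[1]:.3f}")

# Cochran's Q and I^2 as a rough heterogeneity check.
q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
i2 = max(0.0, (q - (len(effects) - 1)) / q) * 100
print(f"Q = {q:.2f}, I^2 = {i2:.1f}%")
```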
Teacher Education Faculty and Computer Competency.
ERIC Educational Resources Information Center
Barger, Robert N.; Armel, Donald
A project was introduced in the College of Education at Eastern Illinois University to assist faculty, through inservice training, to become more knowledgeable about computer applications and limitations. Practical needs of faculty included word processing, statistical analysis, database manipulation, electronic mail, file transfers, file…
Quality assurance software inspections at NASA Ames: Metrics for feedback and modification
NASA Technical Reports Server (NTRS)
Wenneson, G.
1985-01-01
Software inspections--a set of formal technical review procedures held at selected key points during software development in order to find defects in software documents--are described in terms of history, participants, tools, procedures, statistics, and database analysis.
DOT National Transportation Integrated Search
2015-11-01
One of the most efficient ways to solve the damage detection problem using the statistical pattern recognition : approach is that of exploiting the methods of outlier analysis. Cast within the pattern recognition framework, : damage detection assesse...
dbMDEGA: a database for meta-analysis of differentially expressed genes in autism spectrum disorder.
Zhang, Shuyun; Deng, Libin; Jia, Qiyue; Huang, Shaoting; Gu, Junwang; Zhou, Fankun; Gao, Meng; Sun, Xinyi; Feng, Chang; Fan, Guangqin
2017-11-16
Autism spectrum disorders (ASD) are hereditary, heterogeneous and biologically complex neurodevelopmental disorders. Individual studies on gene expression in ASD cannot provide clear consensus conclusions. Therefore, a systematic review to synthesize the current findings from brain tissues and a search tool to share the meta-analysis results are urgently needed. Here, we conducted a meta-analysis of brain gene expression profiles in the current reported human ASD expression datasets (with 84 frozen male cortex samples, 17 female cortex samples, 32 cerebellum samples and 4 formalin fixed samples) and knock-out mouse ASD model expression datasets (with 80 collective brain samples). Then, we applied R language software and developed an interactive shared and updated database (dbMDEGA) displaying the results of meta-analysis of data from ASD studies regarding differentially expressed genes (DEGs) in the brain. This database, dbMDEGA ( https://dbmdega.shinyapps.io/dbMDEGA/ ), is a publicly available web-portal for manual annotation and visualization of DEGs in the brain from data from ASD studies. This database uniquely presents meta-analysis values and homologous forest plots of DEGs in brain tissues. Gene entries are annotated with meta-values, statistical values and forest plots of DEGs in brain samples. This database aims to provide searchable meta-analysis results based on the current reported brain gene expression datasets of ASD to help detect candidate genes underlying this disorder. This new analytical tool may provide valuable assistance in the discovery of DEGs and the elucidation of the molecular pathogenicity of ASD. This database model may be replicated to study other disorders.
Li, Gaoming; Yi, Dali; Wu, Xiaojiao; Liu, Xiaoyu; Zhang, Yanqi; Liu, Ling; Yi, Dong
2015-01-01
Background Although a substantial number of studies focus on the teaching and application of medical statistics in China, few studies comprehensively evaluate the recognition of and demand for medical statistics. In addition, the results of these various studies differ and are insufficiently comprehensive and systematic. Objectives This investigation aimed to evaluate the general cognition of and demand for medical statistics by undergraduates, graduates, and medical staff in China. Methods We performed a comprehensive database search related to the cognition of and demand for medical statistics from January 2007 to July 2014 and conducted a meta-analysis of non-controlled studies with sub-group analysis for undergraduates, graduates, and medical staff. Results There are substantial differences with respect to the cognition of theory in medical statistics among undergraduates (73.5%), graduates (60.7%), and medical staff (39.6%). The demand for theory in medical statistics is high among graduates (94.6%), undergraduates (86.1%), and medical staff (88.3%). Regarding specific statistical methods, the cognition of basic statistical methods is higher than of advanced statistical methods. The demand for certain advanced statistical methods, including (but not limited to) multiple analysis of variance (ANOVA), multiple linear regression, and logistic regression, is higher than that for basic statistical methods. The use rates of the Statistical Package for the Social Sciences (SPSS) software and statistical analysis software (SAS) are only 55% and 15%, respectively. Conclusion The overall statistical competence of undergraduates, graduates, and medical staff is insufficient, and their ability to practically apply their statistical knowledge is limited, which constitutes an unsatisfactory state of affairs for medical statistics education. Because the demand for skills in this area is increasing, the need to reform medical statistics education in China has become urgent. PMID:26053876
Wu, Yazhou; Zhou, Liang; Li, Gaoming; Yi, Dali; Wu, Xiaojiao; Liu, Xiaoyu; Zhang, Yanqi; Liu, Ling; Yi, Dong
2015-01-01
Although a substantial number of studies focus on the teaching and application of medical statistics in China, few studies comprehensively evaluate the recognition of and demand for medical statistics. In addition, the results of these various studies differ and are insufficiently comprehensive and systematic. This investigation aimed to evaluate the general cognition of and demand for medical statistics by undergraduates, graduates, and medical staff in China. We performed a comprehensive database search related to the cognition of and demand for medical statistics from January 2007 to July 2014 and conducted a meta-analysis of non-controlled studies with sub-group analysis for undergraduates, graduates, and medical staff. There are substantial differences with respect to the cognition of theory in medical statistics among undergraduates (73.5%), graduates (60.7%), and medical staff (39.6%). The demand for theory in medical statistics is high among graduates (94.6%), undergraduates (86.1%), and medical staff (88.3%). Regarding specific statistical methods, the cognition of basic statistical methods is higher than of advanced statistical methods. The demand for certain advanced statistical methods, including (but not limited to) multiple analysis of variance (ANOVA), multiple linear regression, and logistic regression, is higher than that for basic statistical methods. The use rates of the Statistical Package for the Social Sciences (SPSS) software and statistical analysis software (SAS) are only 55% and 15%, respectively. The overall statistical competence of undergraduates, graduates, and medical staff is insufficient, and their ability to practically apply their statistical knowledge is limited, which constitutes an unsatisfactory state of affairs for medical statistics education. Because the demand for skills in this area is increasing, the need to reform medical statistics education in China has become urgent.
[Relational database for urinary stone ambulatory consultation. Assessment of initial outcomes].
Sáenz Medina, J; Páez Borda, A; Crespo Martinez, L; Gómez Dos Santos, V; Barrado, C; Durán Poveda, M
2010-05-01
To create a relational database for monitoring lithiasic patients. We describe the architectural details and the initial results of the statistical analysis. Microsoft Access 2002 was used as the template. Four different tables were constructed to gather demographic data (table 1), clinical and laboratory findings (table 2), stone features (table 3) and therapeutic approach (table 4). For a reliability analysis of the database, the number of correctly stored data items was recorded. To evaluate the performance of the database, a prospective analysis was conducted, from May 2004 to August 2009, on 171 stone-free patients after treatment (ESWL, surgery or medical) from a total of 511 patients stored in the database. Lithiasic status (stone-free or stone relapse) was used as the primary end point, while demographic factors (age, gender), lithiasic history, upper urinary tract alterations and characteristics of the stone (side, location, composition and size) were considered as predictive factors. A univariate analysis was conducted initially by the chi-square test and supplemented by Kaplan-Meier estimates for time to stone recurrence. A multiple Cox proportional hazards regression model was generated to jointly assess the prognostic value of the demographic factors and the predictive value of stone characteristics. For the reliability analysis, 22,084 data items were available, corresponding to 702 consultations on 511 patients. Analysis of the data showed a recurrence rate of 85.4% (146/171, median time to recurrence 608 days, range 70-1758). In the univariate and multivariate analysis, none of the factors under consideration had a significant effect on recurrence rate (p=ns). The relational database is useful for monitoring patients with urolithiasis. It allows easy control and update, as well as data storage for later use. The analysis conducted for its evaluation showed no influence of demographic factors and stone features on stone recurrence.
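A minimal sketch of the time-to-recurrence analysis described (Kaplan-Meier estimate plus a Cox proportional hazards model), using a tiny invented follow-up table and the lifelines library rather than the authors' tooling:

```python
import pandas as pd
from lifelines import CoxPHFitter, KaplanMeierFitter

# Hypothetical follow-up data: days to stone recurrence (or censoring) with a few predictors.
df = pd.DataFrame({
    "days":     [608, 150, 1200, 70, 900, 400, 1758, 300],
    "recurred": [1,   1,   0,    1,  0,   1,   0,    1],
    "age":      [45,  60,  38,   52, 41,  66,  49,   57],
    "male":     [1,   0,   1,    1,  0,   1,   0,    1],
})

# Kaplan-Meier estimate of time to recurrence.
km = KaplanMeierFitter().fit(df["days"], event_observed=df["recurred"])
print("median time to recurrence:", km.median_survival_time_)

# Cox proportional hazards model for the joint effect of the predictors.
cox = CoxPHFitter().fit(df, duration_col="days", event_col="recurred")
cox.print_summary()
```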
Fonseca, Carissa G; Backhaus, Michael; Bluemke, David A; Britten, Randall D; Chung, Jae Do; Cowan, Brett R; Dinov, Ivo D; Finn, J Paul; Hunter, Peter J; Kadish, Alan H; Lee, Daniel C; Lima, Joao A C; Medrano-Gracia, Pau; Shivkumar, Kalyanam; Suinesiaputra, Avan; Tao, Wenchao; Young, Alistair A
2011-08-15
Integrative mathematical and statistical models of cardiac anatomy and physiology can play a vital role in understanding cardiac disease phenotype and planning therapeutic strategies. However, the accuracy and predictive power of such models is dependent upon the breadth and depth of noninvasive imaging datasets. The Cardiac Atlas Project (CAP) has established a large-scale database of cardiac imaging examinations and associated clinical data in order to develop a shareable, web-accessible, structural and functional atlas of the normal and pathological heart for clinical, research and educational purposes. A goal of CAP is to facilitate collaborative statistical analysis of regional heart shape and wall motion and characterize cardiac function among and within population groups. Three main open-source software components were developed: (i) a database with web-interface; (ii) a modeling client for 3D + time visualization and parametric description of shape and motion; and (iii) open data formats for semantic characterization of models and annotations. The database was implemented using a three-tier architecture utilizing MySQL, JBoss and Dcm4chee, in compliance with the DICOM standard to provide compatibility with existing clinical networks and devices. Parts of Dcm4chee were extended to access image specific attributes as search parameters. To date, approximately 3000 de-identified cardiac imaging examinations are available in the database. All software components developed by the CAP are open source and are freely available under the Mozilla Public License Version 1.1 (http://www.mozilla.org/MPL/MPL-1.1.txt). http://www.cardiacatlas.org a.young@auckland.ac.nz Supplementary data are available at Bioinformatics online.
Joyce, Brendan; Lee, Danny; Rubio, Alex; Ogurtsov, Aleksey; Alves, Gelio; Yu, Yi-Kuo
2018-03-15
RAId is a software package that has been actively developed for the past 10 years for computationally and visually analyzing MS/MS data. Founded on rigorous statistical methods, RAId's core program computes accurate E-values for peptides and proteins identified during database searches. Making this robust tool readily accessible for the proteomics community by developing a graphical user interface (GUI) is our main goal here. We have constructed a graphical user interface to facilitate the use of RAId on users' local machines. Written in Java, RAId_GUI not only makes easy executions of RAId but also provides tools for data/spectra visualization, MS-product analysis, molecular isotopic distribution analysis, and graphing the retrieval versus the proportion of false discoveries. The results viewer displays and allows the users to download the analyses results. Both the knowledge-integrated organismal databases and the code package (containing source code, the graphical user interface, and a user manual) are available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads/raid.html .
Traumatic injury among drywall installers, 1992 to 1995.
Chiou, S S; Pan, C S; Keane, P
2000-11-01
This study examined the traumatic-injury characteristics associated with one of the high-risk occupations in the construction industry--drywall installers--through an analysis of the traumatic-injury data obtained from the Bureau of Labor Statistics. An additional objective was to demonstrate a feasible and economic approach to identify risk factors associated with a specific occupation by using an existing database. An analysis of nonfatal traumatic injuries with days away from work among wage-and-salary drywall installers was performed for 1992 through 1995 using the Occupational Injury and Illness Survey conducted by the Bureau of Labor Statistics. Results from this study indicate that drywall installers are at a high risk of overexertion and falls to a lower level. More than 40% of the injured drywall installers suffered sprains, strains, and/or tears. The most frequently injured body part was the trunk. More than one-third of the trunk injuries occurred while handling solid building materials, mainly drywall. In addition, the database analysis used in this study is valid in identifying overall risk factors for specific occupations.
NASA Astrophysics Data System (ADS)
Meneveau, Charles; Johnson, Perry; Hamilton, Stephen; Burns, Randal
2016-11-01
An intrinsic property of turbulent flows is the exponential deformation of fluid elements along Lagrangian paths. The production of enstrophy by vorticity stretching follows from a similar mechanism in the Lagrangian view, though the alignment statistics differ and viscosity prevents unbounded growth. In this paper, the stretching properties of fluid elements and vorticity along Lagrangian paths are studied in a channel flow at Reτ = 1000 and compared with prior, known results from isotropic turbulence. To track Lagrangian paths in a public database containing Direct Numerical Simulation (DNS) results, the task-parallel approach previously employed in the isotropic database is extended to the case of flow in a bounded domain. It is shown that above 100 viscous units from the wall, stretching statistics are equal to their isotropic values, in support of the local isotropy hypothesis. Normalized by dissipation rate, the stretching in the buffer layer and below is less efficient due to less favorable alignment statistics. The Cramér function characterizing cumulative Lagrangian stretching statistics shows that overall the channel flow has about half of the stretching per unit dissipation compared with isotropic turbulence. Supported by a National Science Foundation Graduate Research Fellowship Program under Grant No. DGE-1232825, and by National Science Foundation Grants CBET-1507469, ACI-1261715, OCI-1244820 and by JHU IDIES.
Improving accuracy and power with transfer learning using a meta-analytic database.
Schwartz, Yannick; Varoquaux, Gaël; Pallier, Christophe; Pinel, Philippe; Poline, Jean-Baptiste; Thirion, Bertrand
2012-01-01
Typical cohorts in brain imaging studies are not large enough for systematic testing of all the information contained in the images. To build testable working hypotheses, investigators thus rely on analysis of previous work, sometimes formalized in a so-called meta-analysis. In brain imaging, this approach underlies the specification of regions of interest (ROIs) that are usually selected on the basis of the coordinates of previously detected effects. In this paper, we propose to use a database of images, rather than coordinates, and frame the problem as transfer learning: learning a discriminant model on a reference task to apply it to a different but related new task. To facilitate statistical analysis of small cohorts, we use a sparse discriminant model that selects predictive voxels on the reference task and thus provides a principled procedure to define ROIs. The benefits of our approach are twofold. First it uses the reference database for prediction, i.e., to provide potential biomarkers in a clinical setting. Second it increases statistical power on the new task. We demonstrate on a set of 18 pairs of functional MRI experimental conditions that our approach gives good prediction. In addition, on a specific transfer situation involving different scanners at different locations, we show that voxel selection based on transfer learning leads to higher detection power on small cohorts.
"Hyperstat": an educational and working tool in epidemiology.
Nicolosi, A
1995-01-01
The work of a researcher in epidemiology is based on studying literature, planning studies, gathering data, analyzing data and writing results. The researcher therefore needs to perform more or less simple calculations, to consult or quote literature, to consult textbooks about certain issues or procedures, and to look up specific formulas. There are no programs conceived as a workstation to assist the different aspects of a researcher's work in an integrated fashion. A hypertextual system was developed which supports different stages of the epidemiologist's work. It combines database management, statistical analysis or planning, and literature searches. The software was developed on the Apple Macintosh using Hypercard 2.1 as a database and HyperTalk as a programming language. The program is structured in 7 "stacks" or files: Procedures; Statistical Tables; Graphs; References; Text; Formulas; Help. Each stack has its own management system with an automated Table of Contents. Stacks contain "cards" which make up the databases and carry executable programs. The programs are of four kinds: association; statistical procedure; formatting (input/output); database management. The system performs general statistical procedures, procedures applicable to epidemiological studies only (follow-up and case-control), and procedures for clinical trials. All commands are given by clicking the mouse on self-explanatory "buttons". In order to perform calculations, the user only needs to enter the data into the appropriate cells and then click on the selected procedure's button. The system has a hypertextual structure. The user can go from a procedure to other cards following the preferred order of succession and according to built-in associations. The user can access different levels of knowledge or information from any stack he is consulting or operating. From every card, the user can go to a selected procedure to perform statistical calculations, to the reference database management system, to the textbook in which all procedures and issues are discussed in detail, to the database of statistical formulas with automated table of contents, to statistical tables with automated table of contents, or to the help module. The program has a very user-friendly interface and leaves the user free to use the same format he would use on paper. The interface does not require special skills. It reflects the Macintosh philosophy of using windows, buttons and the mouse. This allows the user to perform complicated calculations, weigh alternatives, and run simulations without losing the "feel" of the data. This program shares many features in common with hypertexts. It has an underlying network database where the nodes consist of text, graphics, executable procedures, and combinations of these; the nodes in the database correspond to windows on the screen; the links between the nodes in the database are visible as "active" text or icons in the windows; the text is read by following links and opening new windows. The program is especially useful as an educational tool, aimed at medical and epidemiology students. The combination of computing capabilities with a textbook and databases of formulas and literature references makes the program versatile and attractive as a learning tool.
The program is also helpful in the work done at the desk, where the researcher examines results, consults literature, explores different analytic approaches, plans new studies, or writes grant proposals or scientific articles.
Moo-Young, Tricia A; Panergo, Jessel; Wang, Chih E; Patel, Subhash; Duh, Hong Yan; Winchester, David J; Prinz, Richard A; Fogelfeld, Leon
2013-11-01
Clinicopathologic variables influence the treatment and prognosis of patients with thyroid cancer. A retrospective analysis of a public hospital thyroid cancer database and the Surveillance, Epidemiology and End Results 17 database was conducted. Demographic, clinical, and pathologic data were compared across ethnic groups. Within the public hospital database, Hispanics were younger than non-Hispanic whites and had more lymph node involvement (34% vs 17%, P < .001). Median tumor size was not statistically different across ethnic groups. Similar findings were demonstrated within the Surveillance, Epidemiology and End Results database. African Americans aged <45 years had the largest tumors but were least likely to have lymph node involvement. Asians had the most stage IV disease despite having no differences in tumor size, lymph node involvement, and capsular invasion. There is considerable variability in the clinical presentation of thyroid cancer across ethnic groups. Such disparities persist within an equal-access health care system. These findings suggest that factors beyond socioeconomics may contribute to such differences. Copyright © 2013 Elsevier Inc. All rights reserved.
NASA Technical Reports Server (NTRS)
Ho, C. Y.; Li, H. H.
1989-01-01
A computerized comprehensive numerical database system on the mechanical, thermophysical, electronic, electrical, magnetic, optical, and other properties of various types of technologically important materials such as metals, alloys, composites, dielectrics, polymers, and ceramics has been established and is operational at the Center for Information and Numerical Data Analysis and Synthesis (CINDAS) of Purdue University. This is an on-line, interactive, menu-driven, user-friendly database system. Users can easily search, retrieve, and manipulate the data from the database system without learning a special query language, special commands, or standardized names of materials, properties, variables, etc. It enables both the direct mode of search/retrieval of data for specified materials, properties, independent variables, etc., and the inverted mode of search/retrieval of candidate materials that meet a set of specified requirements (computer-aided materials selection). It also enables tabular and graphical displays and on-line data manipulations, such as unit conversion, variable transformation, and statistical analysis, of the retrieved data. The development, content, and accessibility of the database system are presented and discussed.
Higher order statistical analysis of /x/ in male speech.
Orr, M C; Lithgow, B
2005-03-01
This paper presents a study of kurtosis analysis for the sound /x/ in male speech; /x/ is the sound of the 'o' at the end of words such as 'ago'. The sound analysed for this paper came from the Australian National Database of Spoken Language, more specifically from male speaker 17. The /x/ was isolated and extracted from the database by the author in a quiet booth using standard multimedia software. A 5 millisecond window was used for the analysis, as it was shown previously by the author to be the most appropriate size for speech phoneme analysis. The significance of the research presented here is shown in the results, where a majority of coefficients had a platykurtic value (kurtosis between 0 and 3), as opposed to the previously held leptokurtic (kurtosis > 3) belief.
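A minimal sketch of the windowed kurtosis computation described above; the signal, sampling rate and segmentation are assumed stand-ins, not the paper's recordings or parameters.

```python
# Minimal sketch of windowed kurtosis analysis on an assumed 16 kHz synthetic signal.
# fisher=False gives the convention where a Gaussian has kurtosis 3.
import numpy as np
from scipy.stats import kurtosis

fs = 16000                                   # assumed sampling rate (Hz)
t = np.arange(0, 0.2, 1 / fs)
signal = np.sin(2 * np.pi * 150 * t) + 0.1 * np.random.default_rng(1).normal(size=t.size)

win = int(0.005 * fs)                        # 5 millisecond analysis window
values = [kurtosis(signal[i:i + win], fisher=False)
          for i in range(0, signal.size - win, win)]

platykurtic = sum(v < 3 for v in values)
print(f"{platykurtic}/{len(values)} windows are platykurtic (kurtosis < 3)")
```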
ToxMiner Software Interface for Visualizing and Analyzing ToxCast Data
The ToxCast dataset represents a collection of assays and endpoints that will require both standard statistical approaches and customized data analysis workflows. To analyze this unique dataset, we have developed an integrated database with a Java-based interface called ToxMi...
Improved Bond Equations for Fiber-Reinforced Polymer Bars in Concrete.
Pour, Sadaf Moallemi; Alam, M Shahria; Milani, Abbas S
2016-08-30
This paper explores a set of new equations to predict the bond strength between fiber-reinforced polymer (FRP) rebar and concrete. The proposed equations are based on a comprehensive statistical analysis and existing experimental results in the literature. Namely, the parameters with the greatest effect on the bond behavior of FRP-reinforced concrete were first identified by applying a factorial analysis to a part of the available database. Then the database, which contains 250 pullout tests, was divided into four groups based on the concrete compressive strength and the rebar surface. Afterward, nonlinear regression analysis was performed for each study group in order to determine the bond equations. The results show that the proposed equations can predict bond strengths more accurately compared to the other previously reported models.
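The abstract does not give the fitted equations; the sketch below only illustrates the kind of nonlinear regression step described, fitting an assumed bond-strength form to synthetic pullout data. The functional form, parameter names and data are stand-ins, not the authors' equations or database.

```python
# Illustrative nonlinear regression sketch: fit tau = a * sqrt(f_c) * d_b**b
# (an assumed form) to synthetic pullout data with scipy's curve_fit.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(2)
f_c = rng.uniform(20, 80, size=120)          # concrete compressive strength (MPa)
d_b = rng.uniform(8, 25, size=120)           # rebar diameter (mm)
tau = 2.0 * np.sqrt(f_c) * d_b ** -0.3 + rng.normal(scale=0.5, size=120)

def bond_model(X, a, b):
    f_c, d_b = X
    return a * np.sqrt(f_c) * d_b ** b

params, _ = curve_fit(bond_model, (f_c, d_b), tau, p0=(1.0, -0.5))
print("fitted coefficients a, b:", params)
```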
Clesse, Christophe; Lighezzolo-Alnot, Joëlle; De Lavergne, Sylvie; Hamlin, Sandrine; Scheffler, Michèle
2018-06-01
The authors' purpose in this article is to identify, review and interpret all publications on episiotomy rates worldwide. Based on the criteria from the PRISMA guidelines, twenty databases were scrutinized. All studies which include national statistics related to episiotomy were selected, as well as studies presenting estimated data. Sixty-one papers were selected with publication dates between 1995 and 2016. A static and dynamic analysis of all the results was carried out. The assumed decline in the number of episiotomies is discussed and confirmed, while noting that high rates of episiotomy remain today in less industrialized countries and East Asia. Finally, our analysis aims to investigate the potential determinants which influence the apparent statistical disparities.
Hu, Yiwen; Chen, Jiahui; Hu, Guping; Yu, Jianchen; Zhu, Xun; Lin, Yongcheng; Chen, Shengping; Yuan, Jie
2015-01-07
Every year, hundreds of new compounds are discovered from the metabolites of marine organisms. Finding new and useful compounds is one of the crucial drivers for this field of research. Here we describe the statistics of bioactive compounds discovered from marine organisms from 1985 to 2012. This work is based on our database, which contains information on more than 15,000 chemical substances including 4196 bioactive marine natural products. We performed a comprehensive statistical analysis to understand the characteristics of the novel bioactive compounds and detail temporal trends, chemical structures, species distribution, and research progress. We hope this meta-analysis will provide useful information for research into the bioactivity of marine natural products and drug development.
Using Statistics for Database Management in an Academic Library.
ERIC Educational Resources Information Center
Hyland, Peter; Wright, Lynne
1996-01-01
Collecting statistical data about database usage by library patrons aids in the management of CD-ROM and database offerings, collection development, and evaluation of training programs. Two approaches to data collection are presented which should be used together: an automated or nonintrusive method which monitors search sessions while the…
Hewett, Paul; Bullock, William H
2014-01-01
For more than 20 years CSX Transportation (CSXT) has collected exposure measurements from locomotive engineers and conductors who are potentially exposed to diesel emissions. The database included measurements for elemental and total carbon, polycyclic aromatic hydrocarbons, aromatics, aldehydes, carbon monoxide, and nitrogen dioxide. This database was statistically analyzed and summarized, and the resulting statistics and exposure profiles were compared to relevant occupational exposure limits (OELs) using both parametric and non-parametric descriptive and compliance statistics. Exposure ratings, using the American Industrial Hygiene Association (AIHA) exposure categorization scheme, were determined using both the compliance statistics and Bayesian Decision Analysis (BDA). The statistical analysis of the elemental carbon data (a marker for diesel particulate) strongly suggests that the majority of levels in the cabs of the lead locomotives (n = 156) were less than the California guideline of 0.020 mg/m³. The sample 95th percentile was roughly half the guideline, resulting in an AIHA exposure rating of category 2/3 (determined using BDA). The elemental carbon (EC) levels in the trailing locomotives tended to be greater than those in the lead locomotive; however, locomotive crews rarely ride in the trailing locomotive. Lead locomotive EC levels were similar to those reported by other investigators studying locomotive crew exposures and to levels measured in urban areas. Lastly, both the EC sample mean and 95% UCL were less than the Environmental Protection Agency (EPA) reference concentration of 0.005 mg/m³. With the exception of nitrogen dioxide, the overwhelming majority of the measurements for total carbon, polycyclic aromatic hydrocarbons, aromatics, aldehydes, and combustion gases in the cabs of CSXT locomotives were either non-detects or considerably less than the working OELs for the years represented in the database. When compared to the previous American Conference of Governmental Industrial Hygienists (ACGIH) threshold limit value (TLV) of 3 ppm, the nitrogen dioxide exposure profile merits an exposure rating of AIHA exposure category 1. However, using the newly adopted TLV of 0.2 ppm, the exposure profile receives an exposure rating of category 4. Further evaluation is recommended to determine the current status of nitrogen dioxide exposures. [Supplementary materials are available for this article. Go to the publisher's online edition of the Journal of Occupational and Environmental Hygiene for the following free supplemental resources: additional text on OELs, methods, results, and additional figures and tables.]
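As a generic illustration of the parametric compliance statistics mentioned above (geometric mean, 95th percentile, exceedance fraction), the sketch below uses invented lognormal data rather than the CSXT measurements; the guideline value is only echoed from the abstract for context.

```python
# Generic illustration of lognormal compliance statistics on synthetic data.
import numpy as np

rng = np.random.default_rng(3)
ec = rng.lognormal(mean=np.log(0.008), sigma=0.6, size=156)   # synthetic elemental-carbon levels, mg/m3
guideline = 0.020                                             # California guideline referenced above

mean_log = np.mean(np.log(ec))
sd_log = np.std(np.log(ec), ddof=1)
gm = np.exp(mean_log)                          # geometric mean
gsd = np.exp(sd_log)                           # geometric standard deviation
p95 = np.exp(mean_log + 1.645 * sd_log)        # parametric 95th percentile

print(f"GM = {gm:.4f} mg/m3, GSD = {gsd:.2f}, 95th percentile = {p95:.4f} mg/m3")
print(f"exceedance fraction = {np.mean(ec > guideline):.1%} of samples above the guideline")
```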
Functional Interaction Network Construction and Analysis for Disease Discovery.
Wu, Guanming; Haw, Robin
2017-01-01
Network-based approaches project seemingly unrelated genes or proteins onto a large-scale network context, therefore providing a holistic visualization and analysis platform for genomic data generated from high-throughput experiments, reducing the dimensionality of data via using network modules and increasing the statistic analysis power. Based on the Reactome database, the most popular and comprehensive open-source biological pathway knowledgebase, we have developed a highly reliable protein functional interaction network covering around 60 % of total human genes and an app called ReactomeFIViz for Cytoscape, the most popular biological network visualization and analysis platform. In this chapter, we describe the detailed procedures on how this functional interaction network is constructed by integrating multiple external data sources, extracting functional interactions from human curated pathway databases, building a machine learning classifier called a Naïve Bayesian Classifier, predicting interactions based on the trained Naïve Bayesian Classifier, and finally constructing the functional interaction database. We also provide an example on how to use ReactomeFIViz for performing network-based data analysis for a list of genes.
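A toy sketch of the classifier-training step described above: a Naive Bayes model predicting functional interactions from binary evidence features. The feature names and training data are hypothetical, not Reactome's actual data sources.

```python
# Sketch under assumed features: Naive Bayes classification of protein pairs.
import numpy as np
from sklearn.naive_bayes import BernoulliNB

rng = np.random.default_rng(4)
n_pairs = 1000
# hypothetical evidence columns: co-expression, shared GO annotation, physical interaction
X = rng.integers(0, 2, size=(n_pairs, 3))
# pairs with more supporting evidence are more likely true functional interactions
y = (X.sum(axis=1) + rng.normal(scale=0.5, size=n_pairs)) > 1.5

clf = BernoulliNB().fit(X, y)
candidate = np.array([[1, 1, 0]])            # a pair with two lines of evidence
print("predicted interaction probability:", clf.predict_proba(candidate)[0, 1])
```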
Vivar, Juan C; Pemu, Priscilla; McPherson, Ruth; Ghosh, Sujoy
2013-08-01
Unparalleled technological advances have fueled an explosive growth in the scope and scale of biological data and have propelled life sciences into the realm of "Big Data" that cannot be managed or analyzed by conventional approaches. Big Data in the life sciences are driven primarily via a diverse collection of 'omics'-based technologies, including genomics, proteomics, metabolomics, transcriptomics, metagenomics, and lipidomics. Gene-set enrichment analysis is a powerful approach for interrogating large 'omics' datasets, leading to the identification of biological mechanisms associated with observed outcomes. While several factors influence the results from such analysis, the impact from the contents of pathway databases is often under-appreciated. Pathway databases often contain variously named pathways that overlap with one another to varying degrees. Ignoring such redundancies during pathway analysis can lead to the designation of several pathways as being significant due to high content-similarity, rather than truly independent biological mechanisms. Statistically, such dependencies also result in correlated p values and overdispersion, leading to biased results. We investigated the level of redundancies in multiple pathway databases and observed large discrepancies in the nature and extent of pathway overlap. This prompted us to develop the application, ReCiPa (Redundancy Control in Pathway Databases), to control redundancies in pathway databases based on user-defined thresholds. Analysis of genomic and genetic datasets, using ReCiPa-generated overlap-controlled versions of KEGG and Reactome pathways, led to a reduction in redundancy among the top-scoring gene-sets and allowed for the inclusion of additional gene-sets representing possibly novel biological mechanisms. Using obesity as an example, bioinformatic analysis further demonstrated that gene-sets identified from overlap-controlled pathway databases show stronger evidence of prior association to obesity compared to pathways identified from the original databases.
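A toy sketch of the redundancy-control idea (not the ReCiPa algorithm itself): pathways whose pairwise overlap exceeds a user-defined threshold are merged before enrichment analysis. The gene sets, overlap measure and threshold are invented for illustration.

```python
# Toy redundancy control: merge pathways whose overlap coefficient exceeds a threshold.
def overlap(a, b):
    """Overlap coefficient: |A ∩ B| / min(|A|, |B|)."""
    return len(a & b) / min(len(a), len(b))

pathways = {
    "glycolysis_A": {"HK1", "PFKM", "PKM", "ALDOA", "ENO1"},
    "glycolysis_B": {"HK1", "PFKM", "PKM", "ALDOA", "GAPDH"},   # highly redundant with A
    "apoptosis":    {"CASP3", "CASP9", "BAX", "BCL2"},
}
threshold = 0.8

merged = {}
for name, genes in pathways.items():
    for mname in list(merged):
        if overlap(genes, merged[mname]) >= threshold:
            merged[mname] |= genes                    # collapse into the existing set
            break
    else:
        merged[name] = set(genes)

print({k: sorted(v) for k, v in merged.items()})
```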
A global building inventory for earthquake loss estimation and risk management
Jaiswal, K.; Wald, D.; Porter, K.
2010-01-01
We develop a global database of building inventories using a taxonomy of global building types for use in near-real-time post-earthquake loss estimation and pre-earthquake risk analysis, for the U.S. Geological Survey's Prompt Assessment of Global Earthquakes for Response (PAGER) program. The database is available for public use, subject to peer review, scrutiny, and open enhancement. On a country-by-country level, it contains estimates of the distribution of building types categorized by material, lateral force resisting system, and occupancy type (residential or nonresidential, urban or rural). The database draws on and harmonizes numerous sources: (1) UN statistics, (2) UN Habitat's demographic and health survey (DHS) database, (3) national housing censuses, (4) the World Housing Encyclopedia and (5) other literature. © 2010, Earthquake Engineering Research Institute.
Development and analysis of a meteorological database, Argonne National Laboratory, Illinois
Over, Thomas M.; Price, Thomas H.; Ishii, Audrey L.
2010-01-01
A database of hourly values of air temperature, dewpoint temperature, wind speed, and solar radiation from January 1, 1948, to September 30, 2003, primarily using data collected at the Argonne National Laboratory station, was developed for use in continuous-time hydrologic modeling in northeastern Illinois. Missing and apparently erroneous data values were replaced with adjusted values from nearby stations used as 'backup'. Temporal variations in the statistical properties of the data resulting from changes in measurement and data-storage methodologies were adjusted to match the statistical properties resulting from the data-collection procedures that have been in place since January 1, 1989. The adjustments were computed based on the regressions between the primary data series from Argonne National Laboratory and the backup series using data obtained during common periods; the statistical properties of the regressions were used to assign estimated standard errors to values that were adjusted or filled from other series. Each hourly value was assigned a corresponding data-source flag that indicates the source of the value and its transformations. An analysis of the data-source flags indicates that all the series in the database except dewpoint have a similar fraction of Argonne National Laboratory data, with about 89 percent for the entire period, about 86 percent from 1949 through 1988, and about 98 percent from 1989 through 2003. The dewpoint series, for which observations at Argonne National Laboratory did not begin until 1958, has only about 71 percent Argonne National Laboratory data for the entire period, about 63 percent from 1948 through 1988, and about 93 percent from 1989 through 2003, indicating a lower reliability of the dewpoint sensor. A basic statistical analysis of the filled and adjusted data series in the database, and a series of potential evapotranspiration computed from them using the computer program LXPET (Lamoreux Potential Evapotranspiration) also was carried out. This analysis indicates annual cycles in solar radiation and potential evapotranspiration that follow the annual cycle of extraterrestrial solar radiation, whereas temperature and dewpoint annual cycles are lagged by about 1 month relative to the solar cycle. The annual cycle of wind has a late summer minimum, and spring and fall maximums. At the annual time scale, the filled and adjusted data series and computed potential evapotranspiration have significant serial correlation and possibly have significant temporal trends. The inter-annual fluctuations of temperature and dewpoint are weakest, whereas those of wind and potential evapotranspiration are strongest.
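A simplified sketch of the fill-and-adjust procedure described above: missing values in a primary series are estimated by regression on a backup-station series over a common period, and the regression's residual standard error is attached to the filled values. The data, station relationship and flag names are synthetic assumptions.

```python
# Regression-based gap filling from a backup station (synthetic data).
import numpy as np

rng = np.random.default_rng(5)
backup = rng.normal(10, 8, size=500)                         # backup-station series
primary = 1.5 + 0.95 * backup + rng.normal(scale=1.0, size=500)
primary[100:150] = np.nan                                    # a gap to be filled

common = ~np.isnan(primary)
slope, intercept = np.polyfit(backup[common], primary[common], 1)
residual_se = np.std(primary[common] - (slope * backup[common] + intercept), ddof=2)

filled = primary.copy()
gap = np.isnan(primary)
filled[gap] = slope * backup[gap] + intercept                # adjusted backup values
flags = np.where(gap, "filled_from_backup", "primary")       # data-source flag per value

print(f"filled {int(gap.sum())} hours, estimated standard error = {residual_se:.2f}")
print("flag counts:", dict(zip(*np.unique(flags, return_counts=True))))
```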
PylotDB - A Database Management, Graphing, and Analysis Tool Written in Python
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barnette, Daniel W.
2012-01-04
PylotDB, written completely in Python, provides a user interface (UI) with which to interact with, analyze, graph data from, and manage open-source databases such as MySQL. The UI spares the user from needing in-depth knowledge of the database application programming interface (API). PylotDB allows the user to generate various kinds of plots from user-selected data; generate statistical information on text as well as numerical fields; back up and restore databases; compare database tables across different databases as well as across different servers; extract information from any field to create new fields; generate, edit, and delete databases, tables, and fields; generate or read into a table CSV data; and perform similar operations. Since much of the database information is brought under the control of the Python computer language, PylotDB is not intended for huge databases, for which MySQL and Oracle, for example, are better suited. PylotDB is better suited for the smaller databases typically needed in a small research group. PylotDB can also be used as a learning tool for database applications in general.
S.I.I.A for monitoring crop evolution and anomaly detection in Andalusia by remote sensing
NASA Astrophysics Data System (ADS)
Rodriguez Perez, Antonio Jose; Louakfaoui, El Mostafa; Munoz Rastrero, Antonio; Rubio Perez, Luis Alberto; de Pablos Epalza, Carmen
2004-02-01
A new remote sensing application was developed and incorporated into the Agrarian Integrated Information System (S.I.I.A), a project that integrates the regional farming databases from a geographical point of view, adding new value and uses to the original information. The project is supported by the Studies and Statistical Service, Regional Government Ministry of Agriculture and Fisheries (CAP). The process integrates NDVI values from daily NOAA-AVHRR and monthly IRS-WIFS images with crop class location maps. Local agrarian information and meteorological information are being included in the working process to produce a synergistic effect. An updated evaluation of the crop-growing state is obtained by 10-day period, crop class, sensor type (including data fusion) and administrative geographical boundary. The crop database for the last ten years (1992-2002) has been organized according to these variables. The crop class database can be accessed through an application that helps users with crop statistical analysis. Multi-temporal and multi-geographical comparative analysis can be done by the user, not only for a single year but also from a historical point of view. Moreover, real-time crop anomalies can be detected and analyzed. Most of the output products will be available on the Internet in the near future through an on-line application.
Data Resource Profile: United Nations Children’s Fund (UNICEF)
Murray, Colleen; Newby, Holly
2012-01-01
The United Nations Children’s Fund (UNICEF) plays a leading role in the collection, compilation, analysis and dissemination of data to inform sound policies, legislation and programmes for promoting children’s rights and well-being, and for global monitoring of progress towards the Millennium Development Goals. UNICEF maintains a set of global databases representing nearly 200 countries and covering the areas of child mortality, child health, maternal health, nutrition, immunization, water and sanitation, HIV/AIDS, education and child protection. These databases consist of internationally comparable and statistically sound data, and are updated annually through a process that draws on a wealth of data provided by UNICEF’s wide network of >150 field offices. The databases are composed primarily of estimates from household surveys, with data from censuses, administrative records, vital registration systems and statistical models contributing to some key indicators as well. The data are assessed for quality based on a set of objective criteria to ensure that only the most reliable nationally representative information is included. For most indicators, data are available at the global, regional and national levels, plus sub-national disaggregation by sex, urban/rural residence and household wealth. The global databases are featured in UNICEF’s flagship publications, inter-agency reports, including the Secretary General’s Millennium Development Goals Report and Countdown to 2015, sector-specific reports and statistical country profiles. They are also publicly available on www.childinfo.org, together with trend data and equity analyses. PMID:23211414
The Cardiac Safety Research Consortium ECG database.
Kligfield, Paul; Green, Cynthia L
2012-01-01
The Cardiac Safety Research Consortium (CSRC) ECG database was initiated to foster research using anonymized, XML-formatted, digitized ECGs with corresponding descriptive variables from placebo- and positive-control arms of thorough QT studies submitted to the US Food and Drug Administration (FDA) by pharmaceutical sponsors. The database can be expanded to other data that are submitted directly to CSRC from other sources, and currently includes digitized ECGs from patients with genotyped varieties of congenital long-QT syndrome; this congenital long-QT database is also linked to ambulatory electrocardiograms stored in the Telemetric and Holter ECG Warehouse (THEW). Thorough QT data sets are available from CSRC for unblinded development of algorithms for analysis of repolarization and for blinded comparative testing of algorithms developed for the identification of moxifloxacin, as used as a positive control in thorough QT studies. Policies and procedures for access to these data sets are available from CSRC, which has developed tools for statistical analysis of blinded new algorithm performance. A recently approved CSRC project will create a data set for blinded analysis of automated ECG interval measurements, whose initial focus will include comparison of four of the major manufacturers of automated electrocardiographs in the United States. CSRC welcomes application for use of the ECG database for clinical investigation. Copyright © 2012 Elsevier Inc. All rights reserved.
Bem, Daryl; Tressoldi, Patrizio; Rabeyron, Thomas; Duggan, Michael
2015-01-01
In 2011, one of the authors (DJB) published a report of nine experiments in the Journal of Personality and Social Psychology purporting to demonstrate that an individual's cognitive and affective responses can be influenced by randomly selected stimulus events that do not occur until after his or her responses have already been made and recorded, a generalized variant of the phenomenon traditionally denoted by the term precognition. To encourage replications, all materials needed to conduct them were made available on request. We here report a meta-analysis of 90 experiments from 33 laboratories in 14 countries which yielded an overall effect greater than 6 sigma, z = 6.40, p = 1.2 × 10⁻¹⁰, with an effect size (Hedges' g) of 0.09. A Bayesian analysis yielded a Bayes Factor of 5.1 × 10⁹, greatly exceeding the criterion value of 100 for "decisive evidence" in support of the experimental hypothesis. When DJB's original experiments are excluded, the combined effect size for replications by independent investigators is 0.06, z = 4.16, p = 1.1 × 10⁻⁵, and the BF value is 3,853, again exceeding the criterion for "decisive evidence." The number of potentially unretrieved experiments required to reduce the overall effect size of the complete database to a trivial value of 0.01 is 544, and seven of eight additional statistical tests support the conclusion that the database is not significantly compromised by either selection bias or by intense "p-hacking" (the selective suppression of findings or analyses that failed to yield statistical significance). P-curve analysis, a recently introduced statistical technique, estimates the true effect size of the experiments to be 0.20 for the complete database and 0.24 for the independent replications, virtually identical to the effect size of DJB's original experiments (0.22) and the closely related "presentiment" experiments (0.21). We discuss the controversial status of precognition and other anomalous effects collectively known as psi.
[The concept "a case in outpatient treatment" in military policlinic activity].
Vinogradov, S N; Vorob'ev, E G; Shklovskiĭ, B L
2014-04-01
The article substantiates the need for military polyclinics to transition to a system of recording and evaluating their activity based on completed cases of outpatient treatment. Only the automation of medical-statistical processes can solve this problem. On the basis of an analysis of the literature, the requirements of guidance documents and observational results, the author concludes that the existing concepts of medical statistics should first be revised (formalised) from the perspective of the information environment in use, namely electronic databases. In this respect, the main features of the outpatient treatment case as a unit of medical-statistical record are specified, and its definition is formulated.
The Monitoring Erosion of Agricultural Land and spatial database of erosion events
NASA Astrophysics Data System (ADS)
Kapicka, Jiri; Zizala, Daniel
2013-04-01
In 2011, the Monitoring Erosion of Agricultural Land was initiated in the Czech Republic as a joint project of the State Land Office (SLO) and the Research Institute for Soil and Water Conservation (RISWC). The aim of the project is to collect and record information about erosion events on agricultural land and to evaluate them. The main idea is the creation of a spatial database that will serve as a source of data and information for evaluating and modeling erosion processes, for proposing preventive measures, and for measures to reduce the negative impacts of erosion events. The subject of monitoring is the manifestation of water erosion, wind erosion and slope deformation that damages agricultural land. A website, available at http://me.vumop.cz, is used as a tool for keeping and browsing information about monitored events. SLO employees carry out the record keeping. RISWC is the specialist institute in the Monitoring Erosion of Agricultural Land that maintains the spatial database, runs the website, manages the record keeping of events, analyzes the causes of events, and performs statistical evaluations of recorded events and proposed measures. Records are inserted into the database using the user interface of the website, which has a map server as a component. The website is based on the database technology PostgreSQL with the PostGIS extension and MapServer UMN. Each record in the database is spatially localized by a drawing and contains descriptive information about the character of the event (date, description of the situation, etc.), together with information about land cover and the crops grown. Part of the database is the photodocumentation taken during field reconnaissance, which is performed within two days of notification of an event. Another part of the database is information about precipitation from accessible precipitation gauges. The website allows simple spatial analyses such as area calculation, slope calculation, percentage representation of GAEC, etc. The database structure was designed on the basis of an analysis of the inputs needed by mathematical models. Mathematical models are used for detailed analysis of chosen erosion events, which includes soil analysis. By the end of 2012 the database contained 135 events. The content of the database continues to grow and provides an extensive source of data that is usable for testing mathematical models.
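The production system uses PostgreSQL/PostGIS behind a web map server; as a minimal, self-contained stand-in, the sketch below records erosion events with sqlite3 and stores the geometry as WKT text. The column names and sample record are illustrative, not the project's actual schema.

```python
# Simplified stand-in for the erosion-event spatial database (sqlite3, WKT geometry as text).
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE erosion_event (
        id INTEGER PRIMARY KEY,
        event_date TEXT,
        erosion_type TEXT,          -- water, wind, slope deformation
        land_cover TEXT,
        crop TEXT,
        description TEXT,
        geometry_wkt TEXT           -- polygon drawn by the surveyor
    )
""")
con.execute(
    "INSERT INTO erosion_event (event_date, erosion_type, land_cover, crop, description, geometry_wkt) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    ("2012-06-15", "water", "arable land", "maize", "rill erosion after storm",
     "POLYGON((14.40 50.08, 14.41 50.08, 14.41 50.09, 14.40 50.08))"),
)
for row in con.execute("SELECT erosion_type, COUNT(*) FROM erosion_event GROUP BY erosion_type"):
    print(row)
```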
Monitoring of small laboratory animal experiments by a designated web-based database.
Frenzel, T; Grohmann, C; Schumacher, U; Krüll, A
2015-10-01
Multiple-parametric small animal experiments require, by their very nature, a sufficient number of animals, which may need to be large to obtain statistically significant results.(1) For this reason, database-related systems are required to collect the experimental data as well as to support the later (re-)analysis of the information gained during the experiments. In particular, the monitoring of animal welfare is simplified by the inclusion of warning signals (for instance, loss in body weight >20%). Digital patient charts have been developed for human patients but are usually not able to fulfill the specific needs of animal experimentation. To address this problem a unique web-based monitoring system using standard MySQL, PHP, and nginx has been created. PHP was used to create the HTML-based user interface and outputs in a variety of proprietary file formats, namely portable document format (PDF) or spreadsheet files. This article demonstrates its fundamental features and the easy and secure access it offers to the data from any place using a web browser. This information will help other researchers create their own individual databases in a similar way. The use of QR codes plays an important role in the stress-free use of the database. We demonstrate a way to easily identify all animals, samples and data collected during the experiments. Specific ways to record animal irradiations and chemotherapy applications are shown. This new analysis tool allows the effective and detailed analysis of the huge amounts of data collected through small animal experiments. It supports proper statistical evaluation of the data and provides excellent retrievable data storage. © The Author(s) 2015.
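A minimal sketch of the body-weight warning signal mentioned above (loss >20% from baseline); the data layout and animal identifiers are hypothetical, since the actual system keeps these records in MySQL behind a PHP interface.

```python
# Flag animals whose body weight has dropped more than 20% from baseline (hypothetical data).
baseline = {"mouse_01": 25.0, "mouse_02": 23.5, "mouse_03": 26.1}   # grams
current  = {"mouse_01": 24.1, "mouse_02": 18.3, "mouse_03": 25.8}   # grams

for animal, w0 in baseline.items():
    loss = (w0 - current[animal]) / w0
    if loss > 0.20:
        print(f"WARNING: {animal} lost {loss:.0%} of baseline body weight")
```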
[Review of meta-analysis research on exercise in South Korea].
Song, Youngshin; Gang, Moonhee; Kim, Sun Ae; Shin, In Soo
2014-10-01
The purpose of this study was to evaluate the quality of meta-analyses regarding exercise using the Assessment of Multiple Systematic Reviews (AMSTAR) tool, as well as to compare effect sizes according to outcomes. Electronic databases, including the Korean Studies Information Service System (KISS), the National Assembly Library, DBpia, HAKJISA and RISS4U, were searched for 'meta-analysis' and 'exercise' in the fields of medicine, nursing, physical therapy and physical exercise in Korea for the dates 1990 to January 2014. AMSTAR was scored for quality assessment of the 33 articles included in the study. Data were analyzed using descriptive statistics, t-test, ANOVA and χ²-test. The mean AMSTAR score was 4.18 (SD=1.78); about 67% of the reviews were classified at the low-quality level and 30% at the moderate-quality level. Quality scores differed statistically by field of research, number of participants, number of databases, financial support and approval by an IRB. The effect sizes presented in the individual studies differed by the type of exercise applied in the intervention. This critical appraisal of meta-analyses published in various fields focusing on exercise indicates that a guideline such as the PRISMA checklist should be strongly recommended for optimal reporting of meta-analyses across research fields.
Haytowitz, David B; Pehrsson, Pamela R
2018-01-01
For nearly 20 years, the National Food and Nutrient Analysis Program (NFNAP) has expanded and improved the quantity and quality of data in the US Department of Agriculture's (USDA) food composition databases (FCDB) through the collection and analysis of nationally representative food samples. NFNAP employs statistically valid sampling plans, the Key Foods approach to identify and prioritize foods and nutrients, comprehensive quality control protocols, and analytical oversight to generate new and updated analytical data for food components. NFNAP has allowed the Nutrient Data Laboratory to keep up with the dynamic US food supply and emerging scientific research. Recently generated results for nationally representative food samples show marked changes compared to previous database values for selected nutrients. Monitoring changes in the composition of foods is critical in keeping FCDB up to date, so that they remain a vital tool in assessing the nutrient intake of national populations, as well as for providing dietary advice. Published by Elsevier Ltd.
SAFE Software and FED Database to Uncover Protein-Protein Interactions using Gene Fusion Analysis.
Tsagrasoulis, Dimosthenis; Danos, Vasilis; Kissa, Maria; Trimpalis, Philip; Koumandou, V Lila; Karagouni, Amalia D; Tsakalidis, Athanasios; Kossida, Sophia
2012-01-01
Domain Fusion Analysis takes advantage of the fact that certain proteins in a given proteome A are found to have statistically significant similarity with two separate proteins in another proteome B. In other words, the result of a fusion event between two separate proteins in proteome B is a specific full-length protein in proteome A. In such a case, it can be safely concluded that the protein pair has a common biological function or even interacts physically. In this paper, we present the Fusion Events Database (FED), a database for the maintenance and retrieval of fusion data in both prokaryotic and eukaryotic organisms, and the Software for the Analysis of Fusion Events (SAFE), a computational platform implemented for the automated detection, filtering and visualization of fusion events (both available at: http://www.bioacademy.gr/bioinformatics/projects/ProteinFusion/index.htm). Finally, we analyze the proteomes of three microorganisms using these tools in order to demonstrate their functionality.
NASA Technical Reports Server (NTRS)
Worrall, Diana M. (Editor); Biemesderfer, Chris (Editor); Barnes, Jeannette (Editor)
1992-01-01
Consideration is given to a definition of a distribution format for X-ray data, the Einstein on-line system, the NASA/IPAC extragalactic database, COBE astronomical databases, Cosmic Background Explorer astronomical databases, the ADAM software environment, the Groningen Image Processing System, the search for a common data model for astronomical data analysis systems, deconvolution for real and synthetic apertures, pitfalls in image reconstruction, a direct method for spectral and image restoration, and a description of a Poisson imagery super-resolution algorithm. Also discussed are multivariate statistics on HI and IRAS images, faint object classification using neural networks, a matched filter for improving the SNR of radio maps, automated aperture photometry of CCD images, an interactive graphics interpreter, the ROSAT extreme ultraviolet sky survey, a quantitative study of optimal extraction, an automated analysis of spectra, applications of synthetic photometry, an algorithm for extra-solar planet system detection, and data reduction facilities for the William Herschel telescope.
Analysis of Landslide Hazard Impact Using the Landslide Database for Germany
NASA Astrophysics Data System (ADS)
Klose, M.; Damm, B.
2014-12-01
The Federal Republic of Germany has long been among the few European countries that lack a national landslide database. Systematic collection and inventorying of landslide data nevertheless has a comprehensive research history in Germany, but one focused only on the development of databases with local or regional coverage. This has changed in recent years with the launch of a database initiative aimed at closing the data gap existing at the national level. The present contribution reports on this project, which is based on a landslide database that has evolved over the last 15 years into a database covering large parts of Germany. A strategy of systematic retrieval, extraction, and fusion of landslide data is at the heart of the methodology, providing the basis for a database with a broad potential of application. The database offers a data pool of more than 4,200 landslide data sets with over 13,000 single data files and dates back to the 12th century. All types of landslides are covered by the database, which stores not only core attributes, but also various complementary data, including data on landslide causes, impacts, and mitigation. The current database migration to PostgreSQL/PostGIS is focused on unlocking the full scientific potential of the database, while enabling data sharing and knowledge transfer via a web GIS platform. In this contribution, the goals and the research strategy of the database project are highlighted first, with a summary of best practices in database development providing perspective. Next, the focus is on key aspects of the methodology, which is followed by the results of different case studies in the German Central Uplands. The case study results exemplify database application in the analysis of vulnerability to landslides, impact statistics, and hazard or cost modeling.
NASA Astrophysics Data System (ADS)
Zhang, J. H.; Yang, J.; Sun, Y. S.
2015-06-01
This system combines the Mapworld platform with the informationization of disabled persons' affairs, using the basic information of disabled persons as its central frame. Based on the disabled person population database, the affairs management system and the statistical account system, the data were effectively integrated and a unified information resource database was built. Through data analysis and mining, the system provides powerful data support for decision making, affairs management and public services. It finally realizes the rationalization, normalization and scientization of the management of disabled persons' affairs. It also makes significant contributions to the great-leap-forward development of the informationization of the China Disabled Persons' Federation.
Buell, Gary R.; Wehmeyer, Loren L.; Calhoun, Daniel L.
2012-01-01
A hydrologic and landscape database was developed by the U.S. Geological Survey, in cooperation with the U.S. Fish and Wildlife Service, for the Cache River and White River National Wildlife Refuges and their contributing watersheds in Arkansas, Missouri, and Oklahoma. The database is composed of a set of ASCII files, Microsoft Access® files, Microsoft Excel® files, an Environmental Systems Research Institute (ESRI) ArcGIS® geodatabase, ESRI ArcGRID® raster datasets, and an ESRI ArcReader® published map. The database was developed as an assessment and evaluation tool to use in examining refuge-specific hydrologic patterns and trends as related to water availability for refuge ecosystems, habitats, and target species; and includes hydrologic time-series data, statistics, and hydroecological metrics that can be used to assess refuge hydrologic conditions and the availability of aquatic and riparian habitat. Landscape data that describe the refuge physiographic setting and the locations of hydrologic-data collection stations are also included in the database. Categories of landscape data include land cover, soil hydrologic characteristics, physiographic features, geographic and hydrographic boundaries, hydrographic features, regional runoff estimates, and gaging-station locations. The database geographic extent covers three hydrologic subregions—the Lower Mississippi–St Francis (0802), the Upper White (1101), and the Lower Arkansas (1111)—within which human activities, climatic variation, and hydrologic processes can potentially affect the hydrologic regime of the refuges and adjacent areas. Database construction has been automated to facilitate periodic updates with new data. The database report (1) serves as a user guide for the database, (2) describes the data-collection, data-reduction, and data-analysis methods used to construct the database, (3) provides a statistical and graphical description of the database, and (4) provides detailed information on the development of analytical techniques designed to assess water availability for ecological needs.
Using databases in medical education research: AMEE Guide No. 77.
Cleland, Jennifer; Scott, Neil; Harrild, Kirsten; Moffat, Mandy
2013-05-01
This AMEE Guide offers an introduction to the use of databases in medical education research. It is intended for those who are contemplating conducting research in medical education but are new to the field. The Guide is structured around the process of planning your research so that data collection, management and analysis are appropriate for the research question. Throughout, we consider contextual possibilities and constraints to educational research using databases, such as the resources available, and provide concrete examples of medical education research to illustrate many points. The first section of the Guide explains the difference between different types of data and the classification of data, and addresses the rationale for research using databases in medical education. We explain the difference between qualitative research and qualitative data, the difference between categorical and quantitative data, and the different types of data which fall into these categories. The Guide reviews the strengths and weaknesses of qualitative and quantitative research. The next section is structured around how to work with quantitative and qualitative databases and provides guidance on the many practicalities of setting up a database. This includes how to organise your database, including anonymising data and coding, as well as preparing and describing your data so it is ready for analysis. The critical matter of the ethics of using databases in medical education research, including using routinely collected data versus data collected for research purposes, and issues of confidentiality, is discussed. Core to the Guide is drawing out the similarities and differences in working with different types of data and different types of databases. Future AMEE Guides in the research series will address statistical analysis of data in more detail.
Asymptotically Optimal and Private Statistical Estimation
NASA Astrophysics Data System (ADS)
Smith, Adam
Differential privacy is a definition of "privacy" for statistical databases. The definition is simple, yet it implies strong semantics even in the presence of an adversary with arbitrary auxiliary information about the database.
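For reference, the standard statement of the definition alluded to above (following Dwork et al.) can be written as:

```latex
% A randomized mechanism M is \varepsilon-differentially private if, for every pair of
% databases D, D' differing in a single record and every set S of possible outputs,
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S].
```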
Collected Notes on the Workshop for Pattern Discovery in Large Databases
NASA Technical Reports Server (NTRS)
Buntine, Wray (Editor); Delalto, Martha (Editor)
1991-01-01
These collected notes are a record of material presented at the Workshop. They address core data analysis tasks that have traditionally required statistical or pattern recognition techniques. Some of the core tasks include classification, discrimination, clustering, supervised and unsupervised learning, and discovery and diagnosis, i.e., general pattern discovery.
The open-source movement: an introduction for forestry professionals
Patrick Proctor; Paul C. Van Deusen; Linda S. Heath; Jeffrey H. Gove
2005-01-01
In recent years, the open-source movement has yielded a generous and powerful suite of software and utilities that rivals those developed by many commercial software companies. Open-source programs are available for many scientific needs: operating systems, databases, statistical analysis, Geographic Information System applications, and object-oriented programming....
[Analysis of the epidemiological features of 3,258 patients with allergic rhinitis in Yichang City].
Chen, Bo; Zhang, Zhimao; Pei, Zhi; Chen, Shihan; Du, Zhimei; Lan, Yan; Han, Bei; Qi, Qi
2015-02-01
To investigate the epidemiological features of patients with allergic rhinitis (AR) in Yichang city and to put forward effective prevention and control measures. Data on allergic rhinitis in the city proper from 2010 to 2013 were collected, entered into a database and analyzed statistically. In recent years, the number of AR patients in this area has increased year by year. Spring and winter were the peak seasons of onset. The patients were predominantly young men. There were statistically significant differences by age, area and gender (P < 0.01). The history of allergy and the related diseases differed significantly in gender composition (P < 0.05). The allergens and the degree of positivity differed significantly by gender and age structure (P < 0.01). Health education should be conducted, the environment optimized, bad habits changed, and timely, standardized medical treatment sought.
VAPEPS user's reference manual, version 5.0
NASA Technical Reports Server (NTRS)
Park, D. M.
1988-01-01
This is the reference manual for the VibroAcoustic Payload Environment Prediction System (VAPEPS). The system consists of a computer program and a vibroacoustic database. The purpose of the system is to collect measurements of vibroacoustic data taken from flight events and ground tests, and to retrieve this data and provide a means of using the data to predict future payload environments. This manual describes the operating language of the program. Topics covered include database commands, Statistical Energy Analysis (SEA) prediction commands, stress prediction command, and general computational commands.
Rule-Based Statistical Calculations on a Database Abstract.
1983-06-01
... the largest item in the intersection of two sets cannot be any larger than the minimum of the maxima of the two sets for some numeric attribute. ... from "range analysis" of arbitrary numeric attributes. Suppose the length range of tankers is from 300 to 1000 feet and that of American ships 50 to ...
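A small sketch of the range-analysis rule quoted in the fragment above; the upper bound for American ship lengths is invented for illustration, since the original text is truncated at that point.

```python
# Range-intersection rule: the items common to both sets must lie between the
# larger of the two minima and the smaller of the two maxima.
tanker_length = (300, 1000)        # feet, from the text
american_ship_length = (50, 600)   # feet; upper bound is a hypothetical placeholder

lo = max(tanker_length[0], american_ship_length[0])
hi = min(tanker_length[1], american_ship_length[1])

if lo <= hi:
    print(f"lengths of American tankers must lie in [{lo}, {hi}] feet")
else:
    print("the intersection is empty")
```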
Hepatitis C infection among intravenous drug users attending therapy programs in Cyprus.
Demetriou, Victoria L; van de Vijver, David A M C; Hezka, Johana; Kostrikis, Leondios G
2010-02-01
The highest-risk population for HCV transmission worldwide today is intravenous drug users. HCV genotypes in the general population in Cyprus demonstrate a polyphyletic infection and include subtypes associated with intravenous drug users. The prevalence of HCV, HBV, and HIV infection, HCV genotypes and risk factors among intravenous drug users in Cyprus were investigated here for the first time. Blood samples and interviews were obtained from 40 consenting users in treatment centers, and the samples were tested for HCV, HBV, and HIV antibodies. On the HCV-positive samples, viral RNA extraction, RT-PCR and sequencing were performed. Phylogenetic analysis determined subtype and any relationships with database sequences, and statistical analysis determined any correlation of risk factors with HCV infection. The prevalence of HCV infection was 50%, but no HBV or HIV infections were found. Of the PCR-positive samples, eight (57%) were genotype 3a, and six (43%) were 1b. No other subtypes, recombinant strains or mixed infections were observed. The phylogenetic analysis of the injecting drug users' strains against database sequences revealed no clustering, which does not allow determination of the transmission route, possibly due to the limited number of sequences in the database. However, three clusters were discovered among the drug users' sequences, revealing small groups who possibly share injecting equipment. Statistical analysis showed that the risk factor associated with HCV infection is duration of drug use. Overall, the polyphyletic nature of HCV infection in Cyprus is confirmed, but the transmission route remains unknown. These findings highlight the need for harm-reduction strategies to reduce HCV transmission. (c) 2009 Wiley-Liss, Inc.
Shuttle Hypervelocity Impact Database
NASA Technical Reports Server (NTRS)
Hyde, James L.; Christiansen, Eric L.; Lear, Dana M.
2011-01-01
With three missions outstanding, the Shuttle Hypervelocity Impact Database has nearly 3000 entries. The data is divided into tables for crew module windows, payload bay door radiators and thermal protection system regions, with window impacts comprising just over half the records. In general, the database provides dimensions of hypervelocity impact damage, a component-level location (i.e., window number or radiator panel number) and the orbiter mission when the impact occurred. Additional detail on the type of particle that produced the damage site is provided when sampling data and definitive analysis results are available. Details and insights on the contents of the database, including examples of descriptive statistics, will be provided. Post-flight impact damage inspection and sampling techniques that were employed during the different observation campaigns will also be discussed. Potential enhancements to the database structure and availability of the data for other researchers will be addressed in the Future Work section. A related database of returned surfaces from the International Space Station will also be introduced.
A knowledge-based T2-statistic to perform pathway analysis for quantitative proteomic data
Chen, Yi-Hau
2017-01-01
Approaches to identify significant pathways from high-throughput quantitative data have been developed in recent years. Still, the analysis of proteomic data remains difficult because of limited sample size. This limitation also leads to the practice of using a competitive null as the common approach, which fundamentally treats genes or proteins as independent units. The independence assumption ignores the associations among biomolecules with similar functions or cellular localization, as well as the interactions among them manifested as changes in expression ratios. Consequently, these methods often underestimate the associations among biomolecules and cause false positives in practice. Some studies incorporate the sample covariance matrix into the calculation to address this issue. However, sample covariance may not be a precise estimate if the sample size is very limited, which is usually the case for data produced by mass spectrometry. In this study, we introduce a multivariate test under a self-contained null to perform pathway analysis for quantitative proteomic data. The covariance matrix used in the test statistic is constructed from the confidence scores retrieved from the STRING database or the HitPredict database. We also design an integrating procedure to retain pathways of sufficient evidence as a pathway group. The performance of the proposed T2-statistic is demonstrated using five published experimental datasets: the T-cell activation, the cAMP/PKA signaling, the myoblast differentiation, and the effect of dasatinib on the BCR-ABL pathway are proteomic datasets produced by mass spectrometry; and the protective effect of myocilin via the MAPK signaling pathway is a gene expression dataset of limited sample size. Compared with other popular statistics, the proposed T2-statistic yields more accurate descriptions in agreement with the discussion of the original publication. We implemented the T2-statistic into an R package T2GA, which is available at https://github.com/roqe/T2GA. PMID:28622336
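A conceptual sketch (not the T2GA implementation) of a Hotelling-type T2 statistic whose covariance matrix is assembled from knowledge-based confidence scores rather than estimated purely from the small sample; the proteins, replicate values and confidence scores are invented.

```python
# Hotelling-type T2 with a knowledge-based covariance (toy numbers).
import numpy as np

log_ratios = np.array([[0.8, 0.5, 0.3],      # replicate 1: three proteins in one pathway
                       [0.6, 0.7, 0.2],
                       [0.9, 0.4, 0.4]])
n, p = log_ratios.shape
xbar = log_ratios.mean(axis=0)

# Correlation structure taken from hypothetical interaction confidence scores
conf = np.array([[1.0, 0.9, 0.4],
                 [0.9, 1.0, 0.6],
                 [0.4, 0.6, 1.0]])
scale = np.std(log_ratios, axis=0, ddof=1)
cov = conf * np.outer(scale, scale)          # turn confidences into a covariance matrix

t2 = n * xbar @ np.linalg.inv(cov) @ xbar    # test against the self-contained null (mean = 0)
print("T2 statistic:", round(float(t2), 2))
```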
Lack, N
2001-08-01
The introduction of the modified data set for quality assurance in obstetrics (formerly the perinatal survey) in Lower Saxony and Bavaria as early as 1999 created an urgent requirement for a corresponding new statistical analysis of the revised data. The general outline of a new data reporting concept was originally presented by the Bavarian Commission for Perinatology and Neonatology at the Munich Perinatal Conference in November 1997. These ideas are germinal to the content and layout of the new quality report for obstetrics, currently in its nationwide harmonisation phase coordinated by the federal office for quality assurance in hospital care. A flexible and modular database-oriented analysis tool developed in Bavaria is now in its second year of successful operation. The functionalities of this system are described in detail.
Hu, Yiwen; Chen, Jiahui; Hu, Guping; Yu, Jianchen; Zhu, Xun; Lin, Yongcheng; Chen, Shengping; Yuan, Jie
2015-01-01
Every year, hundreds of new compounds are discovered from the metabolites of marine organisms. Finding new and useful compounds is one of the crucial drivers for this field of research. Here we describe the statistics of bioactive compounds discovered from marine organisms from 1985 to 2012. This work is based on our database, which contains information on more than 15,000 chemical substances including 4196 bioactive marine natural products. We performed a comprehensive statistical analysis to understand the characteristics of the novel bioactive compounds and detail temporal trends, chemical structures, species distribution, and research progress. We hope this meta-analysis will provide useful information for research into the bioactivity of marine natural products and drug development. PMID:25574736
Kessel, K A; Habermehl, D; Bohn, C; Jäger, A; Floca, R O; Zhang, L; Bougatf, N; Bendl, R; Debus, J; Combs, S E
2012-12-01
Especially in the field of radiation oncology, efficiently handling a large variety of voluminous datasets from various information systems in different documentation styles is crucial for patient care and research. To date, conducting retrospective clinical analyses has been rather difficult and time consuming. Using the example of patients with pancreatic cancer treated with radio-chemotherapy, we performed a therapy evaluation with an analysis system connected to a documentation system. A total of 783 patients have been documented in a professional, database-based documentation system. Information about radiation therapy, diagnostic images and dose distributions has been imported into the web-based system. For 36 patients with disease progression after neoadjuvant chemoradiation, we designed and established an analysis workflow. After an automatic registration of the radiation plans with the follow-up images, the recurrence volumes are segmented manually. Based on these volumes, the DVH (dose-volume histogram) statistic is calculated, followed by the determination of the dose applied to the region of recurrence. All results are saved in the database and included in statistical calculations. The main goal of using an automatic analysis tool is to reduce the time and effort of conducting clinical analyses, especially with large patient groups. We showed a first approach using some existing tools; however, manual interaction is still necessary. Further steps need to be taken to enhance automation. It has already become apparent that the benefits of digital data management and analysis lie in the central storage of data and the reusability of results. Therefore, we intend to adapt the analysis system to other tumor types in radiation oncology.
Analysis and preliminary design of Kunming land use and planning management information system
NASA Astrophysics Data System (ADS)
Li, Li; Chen, Zhenjie
2007-06-01
This article analyzes the Kunming land use planning and management information system in terms of its system-building objectives and requirements, and identifies the system's users, functional requirements and construction requirements. On this basis, a three-tier architecture based on C/S and B/S is defined: the user interface layer, the business logic layer and the data services layer. According to the requirements for the construction of a land use planning and management information database derived from the standards of the Ministry of Land and Resources and the construction program of the Golden Land Project, the paper divides the system databases into a planning document database, planning implementation database, working map database and system maintenance database. In the design of the system interface, various methods and data formats are used for data transmission and sharing between upper and lower levels. According to the system analysis results, the main modules of the system are designed as follows: planning data management, planning and annual plan preparation and control, day-to-day planning management, planning revision management, decision-making support, thematic query statistics, and planning public participation; in addition, the system implementation technologies are discussed with respect to the system operation mode, development platform and other aspects.
Bias-Free Chemically Diverse Test Sets from Machine Learning.
Swann, Ellen T; Fernandez, Michael; Coote, Michelle L; Barnard, Amanda S
2017-08-14
Current benchmarking methods in quantum chemistry rely on databases that are built using a chemist's intuition. It is not fully understood how diverse or representative these databases truly are. Multivariate statistical techniques like archetypal analysis and K-means clustering have previously been used to summarize large sets of nanoparticles; however, molecules are more diverse and not as easily characterized by descriptors. In this work, we compare three sets of descriptors based on the one-, two-, and three-dimensional structure of a molecule. Using data from the NIST Computational Chemistry Comparison and Benchmark Database and machine learning techniques, we demonstrate the functional relationship between these structural descriptors and the electronic energy of molecules. Archetypes and prototypes found with topological or Coulomb matrix descriptors can be used to identify smaller, statistically significant test sets that better capture the diversity of chemical space. We apply this same method to find a diverse subset of organic molecules to demonstrate how the methods can easily be reapplied to individual research projects. Finally, we use our bias-free test sets to assess the performance of density functional theory and quantum Monte Carlo methods.
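A minimal sketch of the prototype-selection idea, assuming scikit-learn and a generic descriptor matrix (random numbers stand in for Coulomb-matrix or topological descriptors; the cluster count and preprocessing are illustrative, not the authors' settings):

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    # One row per molecule; random data stands in for real descriptors.
    rng = np.random.default_rng(0)
    descriptor_matrix = rng.normal(size=(500, 20))

    X = StandardScaler().fit_transform(descriptor_matrix)
    k = 25  # desired size of the diverse test set
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)

    # Prototype selection: the molecule closest to each cluster centre.
    prototypes = [int(np.argmin(np.linalg.norm(X - centre, axis=1)))
                  for centre in km.cluster_centers_]
    print("indices of a diversity-covering test set:", sorted(prototypes))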
Creation of a virtual cutaneous tissue bank
NASA Astrophysics Data System (ADS)
LaFramboise, William A.; Shah, Sujal; Hoy, R. W.; Letbetter, D.; Petrosko, P.; Vennare, R.; Johnson, Peter C.
2000-04-01
Cellular and non-cellular constituents of skin contain fundamental morphometric features and structural patterns that correlate with tissue function. High-resolution digital image acquisition is performed using an automated system and proprietary software to assemble adjacent images and create a contiguous, lossless digital representation of individual microscope slide specimens. Serial extraction, evaluation and statistical analysis of cutaneous features is performed using an automated analysis system to derive normal cutaneous parameters for the essential structural components of skin. Automated digital cutaneous analysis allows fast extraction of microanatomic data with accuracy approximating manual measurement. The process provides rapid assessment of features both within individual specimens and across sample populations. The images, component data, and statistical analysis comprise a bioinformatics database to serve as an architectural blueprint for skin tissue engineering and as a diagnostic standard of comparison for pathologic specimens.
NASA Astrophysics Data System (ADS)
Liang, Y.; Gallaher, D. W.; Grant, G.; Lv, Q.
2011-12-01
Change over time is the central driver of climate change detection. The goal is to diagnose the underlying causes and make projections into the future. In an effort to optimize this process we have developed the Data Rod model, an object-oriented approach that provides the ability to query grid cell changes and their relationships to neighboring grid cells through time. The time series data is organized in time-centric structures called "data rods." A single data rod can be pictured as the multi-spectral data history at one grid cell: a vertical column of data through time. This resolves the long-standing problem of managing time-series data and opens new possibilities for temporal data analysis. This structure enables rapid time-centric analysis at any grid cell across multiple sensors and satellite platforms. Collections of data rods can be spatially and temporally filtered, statistically analyzed, and aggregated for use with pattern matching algorithms. Likewise, individual image pixels can be extracted to generate multi-spectral imagery at any spatial and temporal location. The Data Rods project has created a series of prototype databases to store and analyze massive datasets containing multi-modality remote sensing data. Using object-oriented technology, this method overcomes the operational limitations of traditional relational databases. To demonstrate the speed and efficiency of time-centric analysis using the Data Rods model, we have developed a sea ice detection algorithm. This application determines the concentration of sea ice in a small spatial region across a long temporal window. If performed using traditional analytical techniques, this task would typically require extensive data downloads and spatial filtering. Using Data Rods databases, the exact spatio-temporal data set is immediately available; no extraneous data is downloaded, and all data querying occurs transparently on the server side. Moreover, fundamental statistical calculations such as running averages are easily implemented against the time-centric columns of data.
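A small Python sketch of the time-centric idea, assuming an in-memory array stands in for the archive (the storage layout, array shape, and function name are illustrative assumptions, not the project's actual schema):

    import numpy as np

    # Toy multi-spectral archive: (time, band, row, col); one year of daily data.
    archive = np.random.rand(365, 4, 90, 180)

    def data_rod(archive, band, row, col):
        """Return the full time series ('rod') for one grid cell and band."""
        return archive[:, band, row, col]

    rod = data_rod(archive, band=2, row=45, col=70)

    # Fundamental time-centric statistics, e.g. a 30-step running average.
    window = 30
    running_mean = np.convolve(rod, np.ones(window) / window, mode="valid")
    print(rod.shape, running_mean.shape)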
Rosset, Saharon; Aharoni, Ehud; Neuvirth, Hani
2014-07-01
Issues of publication bias, lack of replicability, and false discovery have long plagued the genetics community. Proper utilization of public and shared data resources presents an opportunity to ameliorate these problems. We present an approach to public database management that we term Quality Preserving Database (QPD). It enables perpetual use of the database for testing statistical hypotheses while controlling false discovery and avoiding publication bias on the one hand, and maintaining testing power on the other hand. We demonstrate it on a use case of a replication server for GWAS findings, underlining its practical utility. We argue that a shift to using QPD in managing current and future biological databases will significantly enhance the community's ability to make efficient and statistically sound use of the available data resources. © 2014 WILEY PERIODICALS, INC.
Statistical Analysis of Protein Ensembles
NASA Astrophysics Data System (ADS)
Máté, Gabriell; Heermann, Dieter
2014-04-01
As 3D protein-configuration data pile up, there is an ever-increasing need for well-defined, mathematically rigorous analysis approaches, especially since the vast majority of the currently available methods rely heavily on heuristics. We propose an analysis framework which stems from topology, the field of mathematics that studies properties preserved under continuous deformations. First, we calculate a barcode representation of the molecules employing computational topology algorithms. Bars in this barcode represent different topological features. Molecules are compared through their barcodes by statistically determining the difference in the set of their topological features. As a proof-of-principle application, we analyze a dataset compiled of ensembles of different proteins, obtained from the Ensemble Protein Database. We demonstrate that our approach correctly detects the different protein groupings.
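As a rough sketch of the barcode step, assuming the ripser and persim Python packages are available (the coordinates below are synthetic and the bottleneck comparison is only one of several possible ways to quantify differences between barcodes; this is not the authors' pipeline):

    import numpy as np
    from ripser import ripser          # persistent homology (Vietoris-Rips)
    from persim import bottleneck      # distance between persistence diagrams

    rng = np.random.default_rng(1)
    # Two toy "conformations": 3D coordinates of the same 120-residue chain.
    conf_a = np.cumsum(rng.normal(size=(120, 3)), axis=0)
    conf_b = conf_a + rng.normal(scale=0.5, size=(120, 3))

    # Barcodes (persistence diagrams) in homology dimensions 0 and 1.
    dgms_a = ripser(conf_a, maxdim=1)["dgms"]
    dgms_b = ripser(conf_b, maxdim=1)["dgms"]

    # Compare the H1 features (loops); a statistical test over many ensemble
    # members would follow in a real analysis.
    print("bottleneck distance (H1):", bottleneck(dgms_a[1], dgms_b[1]))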
Turi, Christina E; Murch, Susan J
2013-07-09
Ethnobotanical research and the study of plants used for rituals, ceremonies and to connect with the spirit world have led to the discovery of many novel psychoactive compounds such as nicotine, caffeine, and cocaine. In North America, spiritual and ceremonial uses of plants are well documented and can be accessed online via the University of Michigan's Native American Ethnobotany Database. The objective of the study was to compare Residual, Bayesian, Binomial and Imprecise Dirichlet Model (IDM) analyses of ritual, ceremonial and spiritual plants in Moerman's ethnobotanical database and to identify genera that may be good candidates for the discovery of novel psychoactive compounds. The database was queried with the format "Family Name AND Ceremonial OR Spiritual" for 263 North American botanical families. The spiritual and ceremonial flora consisted of 86 families with 517 species belonging to 292 genera. Spiritual taxa were then grouped further into ceremonial medicines and items categories. Residual, Bayesian, Binomial and IDM analyses were performed to identify over- and under-utilized families. The four statistical approaches were in good agreement when identifying under-utilized families, but large families (>393 species) were underemphasized by the Binomial, Bayesian and IDM approaches for over-utilization. Residual, Binomial, and IDM analyses identified similar families as over-utilized in the medium (92-392 species) and small (<92 species) classes. The families Apiaceae, Asteraceae, Ericaceae, Pinaceae and Salicaceae were identified as significantly over-utilized as ceremonial medicines among medium and large families. Analysis of genera within the Apiaceae and Asteraceae suggests that the genera Ligusticum and Artemisia are good candidates for facilitating the discovery of novel psychoactive compounds. The four statistical approaches were not consistent in identifying over-utilized flora. Residual analysis revealed overall trends that were supported by Binomial analysis when families were separated into small, medium and large classes. The Bayesian, Binomial and IDM approaches identified different genera as potentially important. Species belonging to the genera Artemisia and Ligusticum were most consistently identified and may be valuable in future ethnopharmacological studies. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
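To make the binomial over-utilization idea concrete, the following Python sketch (using scipy >= 1.7's binomtest; all counts are illustrative, not Moerman's actual figures) tests whether one family contributes more ceremonial species than would be expected from its size alone:

    from scipy.stats import binomtest

    total_flora_species = 20000       # species available in the regional flora
    total_ceremonial    = 517         # ceremonial/spiritual species overall
    family_size         = 450         # species in the family being tested
    family_ceremonial   = 30          # ceremonial species reported in it

    # Expected proportion if ceremonial use were spread evenly across the flora.
    p_expected = total_ceremonial / total_flora_species

    res = binomtest(family_ceremonial, n=family_size, p=p_expected,
                    alternative="greater")
    print(f"observed {family_ceremonial}/{family_size}, "
          f"expected p={p_expected:.4f}, one-sided p-value={res.pvalue:.3g}")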
NASA Technical Reports Server (NTRS)
Trouve, A.; Veynante, D.; Bray, K. N. C.; Mantel, T.
1994-01-01
Current flamelet models based on a description of the flame surface dynamics require the closure of two inter-related equations: a transport equation for the mean reaction progress variable, $\tilde{c}$, and a transport equation for the flame surface density, $\Sigma$. The coupling between these two equations is investigated using direct numerical simulations (DNS) with emphasis on the correlation between the turbulent flux of $\tilde{c}$, $\overline{\rho u'' c''}$, and that of $\Sigma$, $\overline{(u'')_{s}\Sigma}$. Two different DNS databases are used in the present work: a database developed at CTR by A. Trouve and a database developed by C. J. Rutland using a different code. Both databases correspond to statistically one-dimensional premixed flames in isotropic turbulent flow. The run parameters, however, are significantly different, and the two databases correspond to different combustion regimes. It is found that in all simulated flames, the correlation between $\overline{\rho u'' c''}$ and $\overline{(u'')_{s}\Sigma}$ is always strong. The sign, however, of the turbulent flux of $\tilde{c}$ or $\Sigma$ with respect to the mean gradients, $\partial\tilde{c}/\partial x$ or $\partial\Sigma/\partial x$, is case-dependent. The CTR database is found to exhibit gradient turbulent transport of $\tilde{c}$ and $\Sigma$, whereas the Rutland DNS features counter-gradient diffusion. The two databases are analyzed and compared using various tools (a local analysis of the flow field near the flame, a classical analysis of the conservation equation for $\widetilde{u''c''}$, and a thin-flame theoretical analysis). A mechanism is then proposed to explain the discrepancies between the two databases, and a preliminary simple criterion is derived to predict the occurrence of gradient/counter-gradient turbulent diffusion.
Comparison of direct numerical simulation databases of turbulent channel flow at Reτ = 180
NASA Astrophysics Data System (ADS)
Vreman, A. W.; Kuerten, J. G. M.
2014-01-01
Direct numerical simulation (DNS) databases are compared to assess the accuracy and reproducibility of standard and non-standard turbulence statistics of incompressible plane channel flow at Reτ = 180. Two fundamentally different DNS codes are shown to produce maximum relative deviations below 0.2% for the mean flow, below 1% for the root-mean-square velocity and pressure fluctuations, and below 2% for the three components of the turbulent dissipation. Relatively fine grids and long statistical averaging times are required. An analysis of dissipation spectra demonstrates that the enhanced resolution is necessary for an accurate representation of the smallest physical scales in the turbulent dissipation. The results are related to the physics of turbulent channel flow in several ways. First, the reproducibility supports the hitherto unproven theoretical hypothesis that the statistically stationary state of turbulent channel flow is unique. Second, the peaks of dissipation spectra provide information on length scales of the small-scale turbulence. Third, the computed means and fluctuations of the convective, pressure, and viscous terms in the momentum equation show the importance of the different forces in the momentum equation relative to each other. The Galilean transformation that leads to minimum peak fluctuation of the convective term is determined. Fourth, an analysis of higher-order statistics is performed. The skewness of the longitudinal derivative of the streamwise velocity is stronger than expected (-1.5 at y+ = 30). This skewness and also the strong near-wall intermittency of the normal velocity are related to coherent structures.
Injury profiles related to mortality in patients with a low Injury Severity Score: a case-mix issue?
Joosse, Pieter; Schep, Niels W L; Goslings, J Carel
2012-07-01
Outcome prediction models are widely used to evaluate trauma care. External benchmarking provides individual institutions with a tool to compare survival with a reference dataset. However, these models do have limitations. In this study, the hypothesis was tested whether specific injuries are associated with increased mortality and whether differences in the case-mix of these injuries influence outcome comparison. A retrospective study was conducted in a Dutch trauma region. Injury profiles, based on the injuries most frequently observed in unexpected deaths, were determined. The association between these injury profiles and mortality was studied by logistic regression in patients with a low Injury Severity Score. The standardized survival of our population (Ws statistic) was compared with North American and British reference databases, with and without patients suffering from the previously defined injury profiles. In total, 14,811 patients were included. Hip fractures, minor pelvic fractures, femur fractures, and minor thoracic injuries were significantly associated with mortality, corrected for age, sex, and physiologic derangement, in patients with a low injury severity. Odds ratios ranged from 2.42 to 2.92. The Ws statistic for comparison with North American databases improved significantly after exclusion of patients with these injuries. The Ws statistic for comparison with a British reference database remained unchanged. Hip fractures, minor pelvic fractures, femur fractures, and minor thoracic wall injuries are associated with increased mortality. Comparative outcome analysis of a population with a reference database that differs in case-mix with respect to these injuries should be interpreted cautiously. Prognostic study, level II.
Rule-based statistical data mining agents for an e-commerce application
NASA Astrophysics Data System (ADS)
Qin, Yi; Zhang, Yan-Qing; King, K. N.; Sunderraman, Rajshekhar
2003-03-01
Intelligent data mining techniques have useful e-Business applications. Because an e-Commerce application is related to multiple domains such as statistical analysis, market competition, price comparison, profit improvement and personal preferences, this paper presents a hybrid knowledge-based e-Commerce system that fuses intelligent techniques, statistical data mining, and personal information to enhance the QoS (Quality of Service) of e-Commerce. A Web-based e-Commerce application software system, eDVD Web Shopping Center, was successfully implemented using Java servlets and an Oracle8i database server. Simulation results have shown that the hybrid intelligent e-Commerce system is able to make smart decisions for different customers.
T.M. Barrett
2004-01-01
During the 1990s, forest inventories for California, Oregon, and Washington were conducted by different agencies using different methods. The Pacific Northwest Research Station Forest Inventory and Analysis program recently integrated these inventories into a single database. This document briefly describes potential statistical methods for estimating population totals...
A Survey of Computer Use by Undergraduate Psychology Departments in Virginia.
ERIC Educational Resources Information Center
Stoloff, Michael L.; Couch, James V.
1987-01-01
Reports a survey of computer use in psychology departments in Virginia's four-year colleges. Results showed that faculty, students, and clerical staff used word processing, statistical analysis, and database management most frequently. The three most numerous computer brands were the Apple II family, IBM PCs, and the Apple Macintosh. (Author/JDH)
The Marshall Islands Data Management Program
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stoker, A.C.; Conrado, C.L.
1995-09-01
This report is a resource document of the methods and procedures used currently in the Data Management Program of the Marshall Islands Dose Assessment and Radioecology Project. Since 1973, over 60,000 environmental samples have been collected. Our program includes relational database design, programming and maintenance; sample and information management; sample tracking; quality control; and data entry, evaluation and reduction. The usefulness of scientific databases involves careful planning in order to fulfill the requirements of any large research program. Compilation of scientific results requires consolidation of information from several databases, and incorporation of new information as it is generated. The success in combining and organizing all radionuclide analysis, sample information and statistical results into a readily accessible form is critical to our project.
2014-01-01
Protein biomarkers offer major benefits for diagnosis and monitoring of disease processes. Recent advances in protein mass spectrometry make it feasible to use this very sensitive technology to detect and quantify proteins in blood. To explore the potential of blood biomarkers, we conducted a thorough review to evaluate the reliability of data in the literature and to determine the spectrum of proteins reported to exist in blood, with the goal of creating a Federated Database of Blood Proteins (FDBP). A unique feature of our approach is the use of a SQL database for all of the peptide data; the power of the SQL database combined with standard informatic algorithms such as BLAST and the statistical analysis system (SAS) allowed rapid annotation and analysis of the database without the need to create special programs to manage the data. Our mathematical analysis and review show that, in addition to the usual secreted proteins found in blood, there are many reports of intracellular proteins, with good agreement on transcription factors and DNA remodelling factors in addition to cellular receptors and their signal transduction enzymes. Overall, we have catalogued about 12,130 proteins identified by at least one unique peptide, and of these 3858 have 3 or more peptide correlations. The FDBP with annotations should facilitate testing blood for specific disease biomarkers. PMID:24476026
Antithrombotic drug therapy for IgA nephropathy: a meta analysis of randomized controlled trials.
Liu, Xiu-Juan; Geng, Yan-Qiu; Xin, Shao-Nan; Huang, Guo-Ming; Tu, Xiao-Wen; Ding, Zhong-Ru; Chen, Xiang-Mei
2011-01-01
Antithrombotic agents, including antiplatelet agents, anticoagulants and thrombolytic agents, have been widely used in the management of immunoglobulin A (IgA) nephropathy in Chinese and Japanese populations. To systematically evaluate the effects of antithrombotic agents for IgA nephropathy. Data sources consisted of MEDLINE, EMBASE, the Cochrane Library, the Chinese Biomedical Literature Database (CBM), the Chinese Science and Technology Periodicals Databases (CNKI) and Japana Centra Revuo Medicina (http://www.jamas.gr.jp) up to April 5, 2011. The quality of the studies was evaluated in terms of intention-to-treat analysis and allocation concealment, as well as by the Jadad method. Meta-analyses were performed on the outcomes of proteinuria and renal function. Six articles met the predetermined inclusion criteria. Antithrombotic agents showed statistically significant effects on proteinuria (p<0.0001) but not on the protection of renal function (p=0.07). The pooled risk ratio for proteinuria was 0.53 [95% confidence interval (CI): 0.41-0.68; I²=0%] and for renal function it was 0.42 (95% CI 0.17-1.06; I²=72%). Subgroup analysis showed that dipyridamole was beneficial for proteinuria (p=0.0003) but had no significant effect on protecting renal function. Urokinase had statistically significant effects both on the reduction of proteinuria (p=0.0005) and on protecting renal function (p<0.00001) when compared with the control group. Antithrombotic agents had statistically significant effects on the reduction of proteinuria but not on the protection of renal function in patients with IgAN. Urokinase had statistically significant effects both on the reduction of proteinuria and on protecting renal function. Urokinase was shown to be a promising medication and should be investigated further.
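For readers unfamiliar with how a pooled risk ratio and I² of this kind are computed, the following Python sketch performs a fixed-effect inverse-variance meta-analysis on invented per-study counts (the numbers are illustrative only and are not drawn from the trials above):

    import numpy as np

    # Illustrative 2x2 counts per trial: (events_t, n_t, events_c, n_c).
    trials = [
        (12, 40, 22, 41),
        ( 8, 35, 15, 36),
        (10, 50, 18, 48),
    ]

    log_rr, weights = [], []
    for a, n1, c, n2 in trials:
        rr = (a / n1) / (c / n2)
        var = 1/a - 1/n1 + 1/c - 1/n2          # variance of log risk ratio
        log_rr.append(np.log(rr))
        weights.append(1 / var)

    log_rr, weights = np.array(log_rr), np.array(weights)
    pooled = np.sum(weights * log_rr) / np.sum(weights)
    se = np.sqrt(1 / np.sum(weights))
    q = np.sum(weights * (log_rr - pooled) ** 2)               # Cochran's Q
    i2 = 0.0 if q == 0 else max(0.0, (q - (len(trials) - 1)) / q) * 100

    print(f"pooled RR = {np.exp(pooled):.2f} "
          f"[{np.exp(pooled - 1.96*se):.2f}, {np.exp(pooled + 1.96*se):.2f}], "
          f"I^2 = {i2:.0f}%")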
Chess databases as a research vehicle in psychology: Modeling large data.
Vaci, Nemanja; Bilalić, Merim
2017-08-01
The game of chess has often been used for psychological investigations, particularly in cognitive science. The clear-cut rules and well-defined environment of chess provide a model for investigations of basic cognitive processes, such as perception, memory, and problem solving, while the precise rating system for the measurement of skill has enabled investigations of individual differences and expertise-related effects. In the present study, we focus on another appealing feature of chess, namely the large archive databases associated with the game. The German national chess database presented in this study represents a fruitful ground for the investigation of multiple longitudinal research questions, since it collects the data of over 130,000 players and spans over 25 years. The German chess database collects the data of all players, including hobby players, and all tournaments played. This results in a rich and complete collection of the skill, age, and activity of the whole population of chess players in Germany. The database therefore complements the commonly used expertise approach in cognitive science by opening up new possibilities for the investigation of multiple factors that underlie expertise and skill acquisition. Since large datasets are not common in psychology, their introduction also raises the question of optimal and efficient statistical analysis. We offer the database for download and illustrate how it can be used by providing concrete examples and a step-by-step tutorial using different statistical analyses on a range of topics, including skill development over the lifetime, birth cohort effects, effects of activity and inactivity on skill, and gender differences.
Rosato, Stefano; D'Errigo, Paola; Badoni, Gabriella; Fusco, Danilo; Perucci, Carlo A; Seccareccia, Fulvia
2008-08-01
The availability of two contemporary sources of information about coronary artery bypass graft (CABG) interventions made it possible 1) to verify the feasibility of performing outcome evaluation studies using administrative data sources, and 2) to compare hospital performance obtained using the CABG Project clinical database with hospital performance derived from current administrative data. Interventions recorded in the CABG Project were linked to the hospital discharge record (HDR) administrative database. Only the linked records were considered for subsequent analyses (46% of the total CABG Project). A new selected population, "clinical card-HDR", was then defined. Two independent risk-adjustment models were applied, each using information derived from one of the two sources. HDR information was then supplemented with some patient preoperative conditions from the CABG clinical database. The two models were compared in terms of their adaptability to the data. Hospital performances identified by the two models as significantly different from the mean were compared. In only 4 of the 13 hospitals considered for analysis did the results obtained using the HDR model not completely overlap with those obtained with the CABG model. When comparing the statistical parameters of the HDR model and the HDR model plus patient preoperative conditions, the latter showed the best adaptability to the data. In this "clinical card-HDR" population, hospital performance assessment obtained using information from the clinical database is similar to that derived from current administrative data. However, when risk-adjustment models built on administrative databases are supplemented with a few clinical variables, their statistical parameters improve and hospital performance assessment becomes more accurate.
GPCALMA: A Tool For Mammography With A GRID-Connected Distributed Database
NASA Astrophysics Data System (ADS)
Bottigli, U.; Cerello, P.; Cheran, S.; Delogu, P.; Fantacci, M. E.; Fauci, F.; Golosio, B.; Lauria, A.; Lopez Torres, E.; Magro, R.; Masala, G. L.; Oliva, P.; Palmiero, R.; Raso, G.; Retico, A.; Stumbo, S.; Tangaro, S.
2003-09-01
The GPCALMA (Grid Platform for Computer Assisted Library for MAmmography) collaboration involves several physics departments, INFN (National Institute of Nuclear Physics) sections, and Italian hospitals. The aim of this collaboration is to develop a tool that can help radiologists in the early detection of breast cancer. GPCALMA has built a large distributed database of digitised mammographic images (about 5500 images corresponding to 1650 patients) and developed CAD (Computer Aided Detection) software which is integrated in a station that can also be used to acquire new images, as an archive, and to perform statistical analysis. The images (18×24 cm², digitised by a CCD linear scanner with an 85 μm pitch and 4096 gray levels) are completely described: pathological ones have a characterization consistent with the radiologist's diagnosis and histological data, while non-pathological ones correspond to patients with a follow-up of at least three years. The distributed database is realized through the connection of all the hospitals and research centers using GRID technology. In each hospital, local patients' digital images are stored in the local database. Using the GRID connection, GPCALMA will allow each node to work on distributed database data as well as local database data. Using its database, the GPCALMA tools perform several analyses. A texture analysis, i.e. an automated classification into adipose, dense or glandular texture, can be provided by the system. GPCALMA software also allows classification of pathological features, in particular analysis of massive lesions (both opacities and spiculated lesions) and of microcalcification clusters. The detection of pathological features is made using neural network software that provides a selection of areas showing a given "suspicion level" of lesion occurrence. The performance of the GPCALMA system will be presented in terms of ROC (Receiver Operating Characteristic) curves. The results of the GPCALMA system as a "second reader" will also be presented.
Assessment of the SFC database for analysis and modeling
NASA Technical Reports Server (NTRS)
Centeno, Martha A.
1994-01-01
SFC is one of the four clusters that make up the Integrated Work Control System (IWCS), which will integrate the shuttle processing databases at Kennedy Space Center (KSC). The IWCS framework will enable communication among the four clusters and add new data collection protocols. The Shop Floor Control (SFC) module has been operational for two and a half years; however, at this stage, automatic links to the other three modules have not yet been implemented, except for a partial link to IOS (CASPR). SFC revolves around a DB/2 database with PFORMS acting as the database management system (DBMS). PFORMS is an off-the-shelf DB/2 application that provides a set of data entry screens and query forms. The main dynamic entity in the SFC and IOS databases is a task; thus, the physical storage location and update privileges are driven by the status of the WAD. As we explored the SFC values, we realized that there was much to do before actually engaging in continuous analysis of the SFC data. Halfway into this effort, it was realized that full-scale analysis would have to be a future third phase of this effort. So, we concentrated on getting to know the contents of the database and on establishing an initial set of tools to start the continuous analysis process. Specifically, we set out to: (1) provide specific procedures for statistical models, so as to enhance the TP-OAO office analysis and modeling capabilities; (2) design a data exchange interface; (3) prototype the interface to provide inputs to SCRAM; and (4) design a modeling database. These objectives were set with the expectation that, if met, they would provide former TP-OAO engineers with tools that would help them demonstrate the importance of process-based analyses. The latter, in turn, will help them obtain the cooperation of various organizations in charting out their individual processes.
RADSS: an integration of GIS, spatial statistics, and network service for regional data mining
NASA Astrophysics Data System (ADS)
Hu, Haitang; Bao, Shuming; Lin, Hui; Zhu, Qing
2005-10-01
Regional data mining, which aims at the discovery of knowledge about spatial patterns, clusters or associations between regions, has wide applications nowadays in the social sciences, such as sociology, economics, epidemiology, and criminology. Many applications in the regional or other social sciences are more concerned with spatial relationships than with precise geographical location. Based on the spatial continuity rule derived from Tobler's first law of geography (observations at two sites tend to be more similar to each other if the sites are close together than if far apart), spatial statistics, as an important means for spatial data mining, allow users to extract interesting and useful information such as spatial pattern, spatial structure, spatial association, spatial outliers and spatial interaction from vast amounts of spatial and non-spatial data. Therefore, by integrating spatial statistical methods, geographical information systems become more powerful for gaining insight into the nature of the spatial structure of a regional system, and help researchers to be more careful when selecting appropriate models. However, the lack of such tools holds back the application of spatial data analysis techniques and the development of new methods and models (e.g., spatio-temporal models). Here, we make an attempt to develop such integrated software and apply it to the complex system analysis of the Poyang Lake Basin. This paper presents a framework for integrating GIS, spatial statistics and network services in regional data mining, as well as its implementation. After discussing the spatial statistics methods involved in regional complex system analysis, we introduce RADSS (Regional Analysis and Decision Support System), our new regional data mining tool, which integrates GIS, spatial statistics and network services. RADSS includes functions for spatial data visualization, exploratory spatial data analysis, and spatial statistics. The tool also includes some fundamental spatial and non-spatial databases on regional population and environment, which can be updated from external databases via CD or network. Using this data mining and exploratory analytical tool, users can easily and quickly analyse large amounts of interrelated regional data and better understand the spatial patterns and trends of regional development, so as to make credible and scientific decisions. Moreover, it can be used as an educational tool for spatial data analysis and environmental studies. We also present a case study on the Poyang Lake Basin as an application of the tool and of spatial data mining in complex environmental studies, and close with several concluding remarks.
Building a database for statistical characterization of ELMs on DIII-D
NASA Astrophysics Data System (ADS)
Fritch, B. J.; Marinoni, A.; Bortolon, A.
2017-10-01
Edge localized modes (ELMs) are bursty instabilities which occur in the edge region of H-mode plasmas and have the potential to damage in-vessel components of future fusion machines by exposing the divertor region to large energy and particle fluxes during each ELM event. While most ELM studies focus on average quantities (e.g. energy loss per ELM), this work investigates the statistical distributions of ELM characteristics as a function of plasma parameters. A semi-automatic algorithm is being used to create a database documenting trigger times of the tens of thousands of ELMs for DIII-D discharges in scenarios relevant to ITER, thus allowing statistically significant analysis. Probability distributions of inter-ELM periods and energy losses will be determined and related to relevant plasma parameters such as density, stored energy, and current in order to constrain models and improve estimates of the expected inter-ELM periods and sizes, both of which must be controlled in future reactors. Work supported in part by US DoE under the Science Undergraduate Laboratory Internships (SULI) program, DE-FC02-04ER54698 and DE-FG02-94ER54235.
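Once trigger times are available, the distribution of inter-ELM periods follows from simple differencing; a minimal Python sketch (the detection algorithm itself is not shown, and the synthetic trigger times below are placeholders for the DIII-D data):

    import numpy as np

    # ELM onset times (s) produced by the detection algorithm for one discharge.
    rng = np.random.default_rng(3)
    trigger_times = np.sort(rng.uniform(1.0, 5.0, size=400))

    inter_elm = np.diff(trigger_times) * 1e3            # periods in ms
    counts, edges = np.histogram(inter_elm, bins=40)
    probability = counts / counts.sum()                 # empirical distribution

    print(f"mean period {inter_elm.mean():.1f} ms, "
          f"median {np.median(inter_elm):.1f} ms, "
          f"90th percentile {np.percentile(inter_elm, 90):.1f} ms")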
StatsDB: platform-agnostic storage and understanding of next generation sequencing run metrics
Ramirez-Gonzalez, Ricardo H.; Leggett, Richard M.; Waite, Darren; Thanki, Anil; Drou, Nizar; Caccamo, Mario; Davey, Robert
2014-01-01
Modern sequencing platforms generate enormous quantities of data in ever-decreasing amounts of time. Additionally, techniques such as multiplex sequencing allow one run to contain hundreds of different samples. With such data comes a significant challenge to understand its quality and to understand how the quality and yield are changing across instruments and over time. In addition to the desire to understand historical data, sequencing centres often have a duty to provide clear summaries of individual run performance to collaborators or customers. We present StatsDB, an open-source software package for storage and analysis of next generation sequencing run metrics. The system has been designed for incorporation into a primary analysis pipeline, either at the programmatic level or via integration into existing user interfaces. Statistics are stored in an SQL database and APIs provide the ability to store and access the data while abstracting the underlying database design. This abstraction allows simpler, wider querying across multiple fields than is possible by the manual steps and calculation required to dissect individual reports, e.g. "provide metrics about nucleotide bias in libraries using adaptor barcode X, across all runs on sequencer A, within the last month". The software is supplied with modules for storage of statistics from FastQC, a commonly used tool for analysis of sequence reads, but the open nature of the database schema means it can be easily adapted to other tools. Currently at The Genome Analysis Centre (TGAC), reports are accessed through our LIMS system or through a standalone GUI tool, but the API and supplied examples make it easy to develop custom reports and to interface with other packages. PMID:24627795
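To give a flavour of the kind of cross-run query quoted above, here is a small Python/SQLite sketch; the table and column names are invented for illustration and are deliberately much simpler than StatsDB's real MySQL schema and API:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE run    (run_id INTEGER PRIMARY KEY, instrument TEXT,
                             run_date TEXT);
        CREATE TABLE metric (run_id INTEGER, barcode TEXT, name TEXT, value REAL);
    """)
    con.executemany("INSERT INTO run VALUES (?, ?, ?)",
                    [(1, "sequencer_A", "2014-02-10"),
                     (2, "sequencer_A", "2014-02-20"),
                     (3, "sequencer_B", "2014-02-21")])
    con.executemany("INSERT INTO metric VALUES (?, ?, ?, ?)",
                    [(1, "X", "nucleotide_bias", 0.12),
                     (2, "X", "nucleotide_bias", 0.15),
                     (3, "X", "nucleotide_bias", 0.30)])

    # "Metrics about nucleotide bias for barcode X on sequencer A, last month."
    rows = con.execute("""
        SELECT r.run_id, r.run_date, m.value
        FROM   metric m JOIN run r ON r.run_id = m.run_id
        WHERE  m.barcode = 'X' AND m.name = 'nucleotide_bias'
               AND r.instrument = 'sequencer_A'
               AND r.run_date >= date('2014-02-21', '-1 month')
    """).fetchall()
    print(rows)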
Template protection and its implementation in 3D face recognition systems
NASA Astrophysics Data System (ADS)
Zhou, Xuebing
2007-04-01
As biometric recognition systems are widely applied in various application areas, security and privacy risks have recently attracted the attention of the biometric community. Template protection techniques prevent stored reference data from revealing private biometric information and enhance the security of biometric systems against attacks such as identity theft and cross matching. This paper concentrates on a template protection algorithm that merges methods from cryptography, error correction coding and biometrics. The key component of the algorithm is to convert biometric templates into binary vectors. It is shown that the binary vectors should be robust, uniformly distributed, statistically independent and collision-free so that authentication performance can be optimized and information leakage can be avoided. Depending on the statistical character of the biometric template, different approaches for transforming biometric templates into compact binary vectors are presented. The proposed methods are integrated into a 3D face recognition system and tested on the 3D facial images of the FRGC database. It is shown that the resulting binary vectors provide an authentication performance that is similar to that of the original 3D face templates. A high security level is achieved with reasonable false acceptance and false rejection rates of the system, based on an efficient statistical analysis. The algorithm estimates the statistical character of biometric templates from a number of biometric samples in the enrollment database. For the FRGC 3D face database, our tests show only a small difference in robustness and discriminative power between the classification results under the assumption of uniformly distributed templates and those under the assumption of Gaussian distributed templates.
ERIC Educational Resources Information Center
Robison, M. Henry; Christophersen, Kjell A.
2008-01-01
The purpose of this volume is to present the results of the economic impact analysis in detail by gender and entry level of education. On the data entry side, gender and entry level of education are important variables that help characterize the student body profile. This profile data links to national statistical databases which are already…
The Application and Future of Big Database Studies in Cardiology: A Single-Center Experience.
Lee, Kuang-Tso; Hour, Ai-Ling; Shia, Ben-Chang; Chu, Pao-Hsien
2017-11-01
As medical research techniques and quality have improved, it has become apparent that cardiovascular problems could be better resolved by stricter experimental design. In fact, substantial time and resources must be expended to fulfill the requirements of high-quality studies. Many worthy ideas and hypotheses could not be verified or proven due to ethical or economic limitations. In recent years, new and varied applications of databases have received increasing attention. Important information regarding certain issues, such as rare cardiovascular diseases, women's heart health, post-marketing analysis of different medications, or a combination of clinical and regional cardiac features, can be obtained by the use of rigorous statistical methods. However, limitations exist among all databases. One of the keys to creating and correctly addressing this research is reliable processes for analyzing and interpreting these cardiologic databases.
Concepts and data model for a co-operative neurovascular database.
Mansmann, U; Taylor, W; Porter, P; Bernarding, J; Jäger, H R; Lasjaunias, P; Terbrugge, K; Meisel, J
2001-08-01
The problems of clinical management of neurovascular diseases are very complex. This is caused by the chronic character of the diseases, a long history of symptoms and diverse treatments. If patients are to benefit from treatment, then treatment decisions have to rely on reliable and accurate knowledge of the natural history of the disease and the various treatments. Recent developments in statistical methodology and experience from electronic patient records are used to establish an information infrastructure based on a centralized register. A protocol to collect data on neurovascular diseases, together with the technical and logistical aspects of implementing a database for neurovascular diseases, is described. The database is designed as a co-operative tool for audit and research, available to co-operating centres. When a database is linked to a systematic patient follow-up, it can be used to study prognosis. Careful analysis of patient outcomes is valuable for decision-making.
Use of a Relational Database to Support Clinical Research: Application in a Diabetes Program
Lomatch, Diane; Truax, Terry; Savage, Peter
1981-01-01
A database has been established to support the conduct of clinical research and to monitor the delivery of medical care for 1200 diabetic patients as part of the Michigan Diabetes Research and Training Center (MDRTC). Use of an intelligent microcomputer to enter and retrieve the data and use of a relational database management system (DBMS) to store and manage data have provided a flexible, efficient method of both supporting small projects and monitoring the overall activity of the Diabetes Center Unit (DCU). Simplicity of access to data, efficiency in providing data for unanticipated requests, ease of manipulation of relations, security and “logical data independence” were important factors in choosing a relational DBMS. The ability to interface with an interactive statistical program and a graphics program is a major advantage of this system. Our database currently provides support for the operation and analysis of several ongoing research projects.
Tong, Shu-Hui; Liu, Yi-Ting; Liu, Yang
2013-02-01
To investigate the association between paternal exposure to occupational electromagnetic radiation and the sex ratio of the offspring. We searched various databases, including PubMed, Embase, the Cochrane Library, OVID, Bioscience Information Service (BIOSIS), China National Knowledge Infrastructure, the VIP Database for Chinese Technical Periodicals and the Wanfang Database, for literature relevant to the association of paternal exposure to occupational electromagnetic radiation with the sex ratio of the offspring. We conducted a meta-analysis on their correlation using Stata 11.0. There was no statistically significant difference in the sex ratio between the offspring with paternal exposure to occupational electromagnetic radiation and those without (pooled OR = 1.00 [95% CI: 0.95-1.05], P = 0.875). Subgroup analysis of both case-control and cohort studies revealed no significant difference (pooled OR = 1.03 [95% CI: 0.99-1.08], P = 0.104 and pooled OR = 0.98 [95% CI: 0.99-1.08], P = 0.186, respectively). Paternal exposure to occupational electromagnetic radiation is not correlated with the sex ratio of the offspring.
Dai, Qi; Yang, Yanchun; Wang, Tianming
2008-10-15
Many proposed statistical measures can efficiently compare biological sequences in order to infer their structures, functions and evolutionary information. These measures are related in spirit because they all try to use information on k-word distributions, a Markov model, or both. Motivated by adding k-word distributions to a Markov model directly, we investigated two novel statistical measures for sequence comparison, called wre.k.r and S2.k.r. The proposed measures were tested by similarity search, evaluation on functionally related regulatory sequences, and phylogenetic analysis. This offers a systematic and quantitative experimental assessment of our measures. Moreover, we compared our achievements with those based on alignment or alignment-free methods. We grouped our experiments into two sets. The first, performed via ROC (receiver operating characteristic) analysis, aims at assessing the intrinsic ability of our statistical measures to search for similar sequences in a database and to discriminate functionally related regulatory sequences from unrelated sequences. The second aims at assessing how well our statistical measures can be used for phylogenetic analysis. The experimental assessment demonstrates that our similarity measures, which incorporate k-word distributions into a Markov model, are more efficient.
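The shared ingredient of such alignment-free measures is the k-word (k-mer) frequency distribution of each sequence. The sketch below computes these distributions and compares them with a simple cosine similarity; it illustrates the ingredient only and is not the wre.k.r or S2.k.r formula, which additionally involves a Markov model of the sequences.

    from collections import Counter
    from math import sqrt

    def kword_freqs(seq, k):
        counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
        total = sum(counts.values())
        return {w: c / total for w, c in counts.items()}

    def cosine_similarity(f1, f2):
        words = set(f1) | set(f2)
        dot = sum(f1.get(w, 0.0) * f2.get(w, 0.0) for w in words)
        n1 = sqrt(sum(v * v for v in f1.values()))
        n2 = sqrt(sum(v * v for v in f2.values()))
        return dot / (n1 * n2)

    s1 = "ACGTACGTGGCATGACGT"
    s2 = "ACGTACGTGGCGTGACGA"
    print(cosine_similarity(kword_freqs(s1, 3), kword_freqs(s2, 3)))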
A statistical analysis of the impact of advertising signs on road safety.
Yannis, George; Papadimitriou, Eleonora; Papantoniou, Panagiotis; Voulgari, Chrisoula
2013-01-01
This research aims to investigate the impact of advertising signs on road safety. An exhaustive review of the international literature was carried out on the effect of advertising signs on driver behaviour and safety. Moreover, a before-and-after statistical analysis with control groups was applied to several road sites with different characteristics in the Athens metropolitan area, Greece, in order to investigate the correlation between the placement or removal of advertising signs and the related occurrence of road accidents. Road accident data for the 'before' and 'after' periods at the test sites and the control sites were extracted from the database of the Hellenic Statistical Authority, and the selected 'before' and 'after' periods vary from 2.5 to 6 years. The statistical analysis shows no correlation between road accidents and advertising signs at any of the nine sites examined, as the confidence intervals of the estimated safety effects are non-significant at the 95% confidence level. This can be explained by the fact that, at the examined road sites, drivers are overloaded with information (traffic signs, direction signs, shop signs, pedestrians and other vehicles, etc.), so the additional information load from advertising signs may not further distract them.
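A minimal sketch of a before-and-after estimate with a control group, assuming a simple odds-ratio style effect measure with a log-scale confidence interval (the counts are invented, not the Athens data, and the paper's exact estimator may differ):

    import numpy as np

    # Accident counts: before/after at a treated site and at its control sites.
    before_t, after_t = 18, 14      # site where advertising signs were placed
    before_c, after_c = 95, 90      # control sites, same periods

    theta = (after_t / before_t) / (after_c / before_c)
    se_log = np.sqrt(1/after_t + 1/before_t + 1/after_c + 1/before_c)
    ci = (np.exp(np.log(theta) - 1.96 * se_log),
          np.exp(np.log(theta) + 1.96 * se_log))

    print(f"estimated effect {theta:.2f}, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}]")
    # A CI containing 1.0 corresponds to the 'no significant correlation'
    # conclusion reported for all nine sites.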
Marateb, Hamid Reza; Mansourian, Marjan; Adibi, Peyman; Farina, Dario
2014-01-01
Background: selecting the correct statistical test and data mining method depends highly on the measurement scale of the data, the type of variables, and the purpose of the analysis. Different measurement scales are studied in detail, and statistical comparison, modeling, and data mining methods are discussed using several medical examples. We present two ordinal-variable clustering examples, ordinal variables being more challenging to analyze, using the Wisconsin Breast Cancer Data (WBCD). Ordinal-to-interval scale conversion example: a breast cancer database of nine 10-level ordinal variables for 683 patients was analyzed by two ordinal-scale clustering methods. The performance of the clustering methods was assessed by comparison with the gold-standard groups of malignant and benign cases that had been identified by clinical tests. Results: the sensitivity and accuracy of the two clustering methods were 98% and 96%, respectively. Their specificity was comparable. Conclusion: using an appropriate clustering algorithm based on the measurement scale of the variables in the study yields high performance. Moreover, descriptive and inferential statistics, as well as the modeling approach, must be selected based on the scale of the variables. PMID:24672565
Piotrowski, T; Rodrigues, G; Bajon, T; Yartsev, S
2014-03-01
Multi-institutional collaborations allow more information to be analyzed, but data from different sources may vary in subgroup sizes and/or measurement conditions. Rigorous statistical analysis is required before pooling the data into a larger set. Careful comparison of all the components of the data acquisition is indispensable: identical conditions allow for enlargement of the database with improved statistical analysis, while clearly defined differences provide an opportunity for establishing better practice. The optimal sequence of required normality, asymptotic normality, and independence tests is proposed. An example is presented analyzing six subgroups of position corrections in three directions obtained during image guidance procedures for 216 prostate cancer patients from two institutions. Copyright © 2013 Associazione Italiana di Fisica Medica. Published by Elsevier Ltd. All rights reserved.
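A small Python sketch of the kind of pre-pooling checks involved, using standard scipy tests on synthetic couch-correction data; the specific test sequence proposed in the paper is not reproduced here, and the 0.05 threshold and test choices are illustrative assumptions.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    site_a = rng.normal(0.5, 2.0, size=120)   # lateral corrections (mm), site A
    site_b = rng.normal(0.3, 2.4, size=96)    # lateral corrections (mm), site B

    # 1) Normality within each subgroup (Shapiro-Wilk).
    p_norm_a = stats.shapiro(site_a).pvalue
    p_norm_b = stats.shapiro(site_b).pvalue

    # 2) Equality of variances (Levene) and means (Welch t-test) before pooling.
    p_var  = stats.levene(site_a, site_b).pvalue
    p_mean = stats.ttest_ind(site_a, site_b, equal_var=False).pvalue

    poolable = min(p_norm_a, p_norm_b, p_var, p_mean) > 0.05
    print(f"normality p=({p_norm_a:.2f}, {p_norm_b:.2f}), "
          f"variance p={p_var:.2f}, mean p={p_mean:.2f}, pool: {poolable}")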
GenePublisher: Automated analysis of DNA microarray data.
Knudsen, Steen; Workman, Christopher; Sicheritz-Ponten, Thomas; Friis, Carsten
2003-07-01
GenePublisher, a system for automatic analysis of data from DNA microarray experiments, has been implemented with a web interface at http://www.cbs.dtu.dk/services/GenePublisher. Raw data are uploaded to the server together with a specification of the data. The server performs normalization, statistical analysis and visualization of the data. The results are run against databases of signal transduction pathways, metabolic pathways and promoter sequences in order to extract more information. The results of the entire analysis are summarized in report form and returned to the user.
Weinreb, Jeffrey H; Yoshida, Ryu; Cote, Mark P; O'Sullivan, Michael B; Mazzocca, Augustus D
2017-01-01
The purpose of this study was to evaluate how database use has changed over time in Arthroscopy: The Journal of Arthroscopic and Related Surgery and to inform readers about available databases used in orthopaedic literature. An extensive literature search was conducted to identify databases used in Arthroscopy and other orthopaedic literature. All articles published in Arthroscopy between January 1, 2006, and December 31, 2015, were reviewed. A database was defined as a national, widely available set of individual patient encounters, applicable to multiple patient populations, used in orthopaedic research in a peer-reviewed journal, not restricted by encounter setting or visit duration, and with information available in English. Databases used in Arthroscopy included PearlDiver, the American College of Surgeons National Surgical Quality Improvement Program, the Danish Common Orthopaedic Database, the Swedish National Knee Ligament Register, the Hospital Episodes Statistics database, and the National Inpatient Sample. Database use increased significantly from 4 articles in 2013 to 11 articles in 2015 (P = .012), with no database use between January 1, 2006, and December 31, 2012. Database use increased significantly between January 1, 2006, and December 31, 2015, in Arthroscopy. Level IV, systematic review of Level II through IV studies. Copyright © 2016 Arthroscopy Association of North America. Published by Elsevier Inc. All rights reserved.
Yang, Yangfan; Zhong, Jing; Dun, Zhongjun; Liu, Xiao-an; Yu, Minbin
2015-01-01
Refractory glaucoma refers to uncontrolled intraocular pressure (IOP) despite anti-glaucoma medication and surgical treatment, and it remains a challenge to treat. The objective of this study was to evaluate and statistically compare, as a meta-analysis, the clinical efficacy of endoscopic cyclophotocoagulation (ECP) and alternative surgical techniques in the treatment of refractory glaucoma. Data sources were the China Biomedical Database (Sinomed, online version), China National Knowledge Infrastructure (CNKI), Cqvip, the Wanfang database, and PubMed. Randomized controlled trial (RCT) and case–control studies evaluating the clinical efficacy of ECP versus other surgical techniques were searched electronically in public databases. The methodological quality of the retrieved articles was evaluated according to RCT or case–control study criteria. The success rate of treatment, intraocular pressure (IOP) and visual acuity were statistically compared. RevMan 5.3 software was used for the meta-analysis. In total, 6 relevant controlled studies were selected, with a total sample of 429 cases (429 eyes), including 204 eyes in the ECP group and 225 in the non-ECP group. The meta-analysis demonstrated that clinical efficacy did not differ significantly between the two groups (P > 0.05). Postoperative IOP was dramatically reduced in both groups. However, it was difficult to evaluate the combined influence of the ECP and non-ECP therapies on IOP reduction. In conclusion, ECP and non-ECP treatment yielded almost equivalent clinical efficacy in treating refractory glaucoma. The degree of IOP lowering, safety, and incidence of complications remain to be further elucidated by RCTs with larger sample sizes. PMID:26426659
Claims-based risk model for first severe COPD exacerbation.
Stanford, Richard H; Nag, Arpita; Mapel, Douglas W; Lee, Todd A; Rosiello, Richard; Schatz, Michael; Vekeman, Francis; Gauthier-Loiselle, Marjolaine; Merrigan, J F Philip; Duh, Mei Sheng
2018-02-01
To develop and validate a predictive model for first severe chronic obstructive pulmonary disease (COPD) exacerbation using health insurance claims data, and to validate the risk measure of the controller medication to total COPD treatment (controller and rescue) ratio (CTR). A predictive model was developed and validated in 2 managed care databases: the Truven Health MarketScan database and the Reliant Medical Group database. This secondary analysis assessed risk factors, including the CTR, during the baseline period (Year 1) to predict the risk of severe exacerbation in the at-risk period (Year 2). Patients with COPD who were 40 years or older and who had at least 1 COPD medication dispensed during the year following COPD diagnosis were included. Subjects with severe exacerbations in the baseline year were excluded. Risk factors in the baseline period were included as potential predictors in the multivariate analysis. Performance was evaluated using C-statistics. The analysis included 223,824 patients. The greatest risk factors for a first severe exacerbation were advanced age, chronic oxygen therapy usage, COPD diagnosis type, dispensing of 4 or more canisters of rescue medication, and having 2 or more moderate exacerbations. A CTR of 0.3 or greater was associated with a 14% lower risk of severe exacerbation. The model performed well, with C-statistics ranging from 0.711 to 0.714. This claims-based risk model can predict the likelihood of a first severe COPD exacerbation. The CTR could also potentially be used to target populations at greatest risk for severe exacerbations. This could be relevant for providers and payers in approaches to prevent severe exacerbations and reduce costs.
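A minimal Python sketch of how such a claims-based model can be fitted and its C-statistic (area under the ROC curve) evaluated; the predictors, coefficients, and simulated data are illustrative assumptions and do not reproduce the published model.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(11)
    n = 5000
    age         = rng.integers(40, 90, size=n)
    rescue_cans = rng.poisson(2.5, size=n)     # rescue canisters in Year 1
    controller  = rng.poisson(4.0, size=n)     # controller dispensings in Year 1
    ctr = controller / np.maximum(controller + rescue_cans, 1)   # CTR measure
    moderate_exac = rng.poisson(0.6, size=n)

    # Synthetic Year-2 outcome generated from an assumed risk relationship.
    logit = (-4 + 0.03*(age - 60) + 0.25*rescue_cans
             + 0.5*(moderate_exac >= 2) - 1.0*(ctr >= 0.3))
    y = rng.uniform(size=n) < 1 / (1 + np.exp(-logit))

    X = np.column_stack([age, rescue_cans, moderate_exac, (ctr >= 0.3).astype(int)])
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"validation C-statistic (AUC): {auc:.3f}")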
NASA Astrophysics Data System (ADS)
Yan, Zheng; Mingzhong, Tian; Hengli, Wang
2010-05-01
Chinese hand-written local records originated in the first century. Generally, these local records cover the geography, historical evolution, customs, education, products, people, historical sites, and writings of an area. Thanks to such endeavors, the record of China's natural environment has had almost no "dark ages" over its 5000-year civilization. A compilation of all meaningful historical data on natural disasters that took place in Alxa, Inner Mongolia, home to the second largest desert in China, is used here to construct a 500-year high-resolution database. The database is divided into subsets according to the type of natural disaster, such as sand-dust storms, drought events, and cold waves. By applying trend, correlation, wavelet, and spectral analysis to these data, we can estimate the statistical periodicity of the different natural disasters, detect and quantify similarities and patterns among the periodicities of these records, and finally take these results in aggregate to find a strong and coherent cyclicity through the last 500 years that serves as the driving mechanism of these geological hazards. Based on the periodicity obtained from this analysis, the paper discusses the probability of forecasting natural disasters from historical records and the suitable measures to reduce disaster losses. Keywords: Chinese local records; Alxa; natural disasters; database; periodicity analysis
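As one example of the spectral step mentioned above, the sketch below estimates the dominant periodicity of an annual disaster-count series with a periodogram; the series is synthetic (a noisy cycle of roughly 11 years), not the Alxa database.

```python
# A minimal sketch of periodicity detection in an annual count series.
import numpy as np
from scipy.signal import periodogram

years = np.arange(1500, 2000)                        # a 500-year record
counts = (5 + 2*np.sin(2*np.pi*years/11.0)
          + np.random.default_rng(1).normal(0, 1, years.size))

freq, power = periodogram(counts, fs=1.0)            # fs = 1 sample per year
freq, power = freq[1:], power[1:]                    # drop the zero frequency
dominant_period = 1.0 / freq[np.argmax(power)]
print(f"dominant period ≈ {dominant_period:.1f} years")
```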
Said, Joseph I; Knapka, Joseph A; Song, Mingzhou; Zhang, Jinfa
2015-08-01
A specialized database currently containing more than 2200 QTLs has been established, which allows graphic presentation, visualization, and submission of QTLs. In cotton, quantitative trait locus (QTL) studies focus on intraspecific Gossypium hirsutum and interspecific G. hirsutum × G. barbadense populations. These two populations are commercially important for the textile industry and are evaluated for fiber quality, yield, seed quality, resistance, physiological, and morphological trait QTLs. Given the vast amount of QTL and meta-analysis data in cotton, it is beneficial to organize the data into a functional database for the cotton community. Here we provide a tool for cotton researchers to visualize previously identified QTLs and to submit their own QTLs to the CottonQTLdb database. The database provides the user with the option of selecting various QTL trait types from either the G. hirsutum or the G. hirsutum × G. barbadense population. Based on the user's trait selection, graphical representations of the chromosomes of the selected population are displayed as publication-ready images. The database also provides users with trait information on QTLs, LOD scores, and explained phenotypic variances for all QTLs selected. The CottonQTLdb database provides cotton geneticists and breeders with statistical data on previously identified cotton QTLs and a visualization tool to view QTL positions on chromosomes. Currently the database (Release 1) contains 2274 QTLs; succeeding QTL studies will be added regularly by the curators and by members of the cotton community who contribute their data to keep the database current. The database is accessible at http://www.cottonqtldb.org.
Zhao, Yan-qing; Teng, Jing
2015-03-01
To analyze the composition and medication regularities of prescriptions for treating hypochondriac pain in the Chinese journal full-text database (CNKI), based on the traditional Chinese medicine inheritance support system, in order to provide a reference for the further research and development of new traditional Chinese medicines for hypochondriac pain. The traditional Chinese medicine inheritance support platform software V2.0 was used to build a database of prescriptions for treating hypochondriac pain. The software's integrated data mining methods were used to classify the prescriptions in the database according to "four properties", "five flavors", and "meridians", and to perform frequency statistics, syndrome distribution, prescription regularity, and new prescription analyses. An analysis was made of 192 prescriptions for treating hypochondriac pain to determine the frequencies of medicines in the prescriptions and the commonly used medicine pairs and combinations, and to summarize 15 new prescriptions. This study indicated that the prescriptions for treating hypochondriac pain in the Chinese journal full-text database are mostly those for soothing liver-qi stagnation, promoting qi and activating blood, clearing heat and draining dampness, and invigorating the spleen and removing phlegm, with a cold property and bitter taste, and that they reflect the principles of "distinguishing deficiency from excess and relieving pain by unblocking the meridians" in treating hypochondriac pain.
Construction of crystal structure prototype database: methods and applications.
Su, Chuanxun; Lv, Jian; Li, Quan; Wang, Hui; Zhang, Lijun; Wang, Yanchao; Ma, Yanming
2017-04-26
Crystal structure prototype data have become a useful source of information for materials discovery in the fields of crystallography, chemistry, physics, and materials science. This work reports the development of a robust and efficient method for assessing the similarity of structures on the basis of their interatomic distances. Using this method, we proposed a simple and unambiguous definition of a crystal structure prototype based on hierarchical clustering theory, and constructed the crystal structure prototype database (CSPD) by filtering the known crystallographic structures in a database. Using a similar method, a structure prototype analysis package (SPAP) was developed to remove similar structures in CALYPSO prediction results and to extract predicted low-energy structures for a separate theoretical structure database. A series of statistics describing the distribution of crystal structure prototypes in the CSPD was compiled to provide important insight for structure prediction and high-throughput calculations. Illustrative examples of the application of the proposed database are given, including the generation of initial structures for structure prediction and the determination of prototype structures in databases. These examples demonstrate the CSPD to be a generally applicable and useful tool for materials discovery.
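A minimal illustration of the clustering idea, assuming a simple histogram-of-interatomic-distances fingerprint and an arbitrary distance threshold (neither is the CSPD's exact definition):

```python
# Group toy "structures" into prototypes by hierarchical clustering of a
# pairwise interatomic-distance fingerprint. Illustrative choices only.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

def distance_fingerprint(positions, bins=20, r_max=6.0):
    """Histogram of pairwise interatomic distances, normalised to unit sum."""
    d = pdist(positions)
    hist, _ = np.histogram(d, bins=bins, range=(0.0, r_max))
    return hist / max(hist.sum(), 1)

rng = np.random.default_rng(2)
structures = [rng.uniform(0, 4, size=(8, 3)) for _ in range(10)]  # toy atomic coordinates
fps = np.array([distance_fingerprint(s) for s in structures])

Z = linkage(fps, method="average", metric="euclidean")
labels = fcluster(Z, t=0.05, criterion="distance")   # one prototype label per cluster
print("prototype labels:", labels)
```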
Do climate extreme events foster violent civil conflicts? A coincidence analysis
NASA Astrophysics Data System (ADS)
Schleussner, Carl-Friedrich; Donges, Jonathan F.; Donner, Reik V.
2014-05-01
Civil conflicts promoted by adverse environmental conditions represent one of the most important potential feedbacks in the global socio-environmental nexus. While the role of climate extremes as a triggering factor is often discussed, no consensus has yet been reached about the cause-and-effect relation in the observed data record. Here we present results of a rigorous statistical coincidence analysis based on the Munich Re Inc. extreme events database and the Uppsala Conflict Data Program. We report evidence for statistically significant synchronicity between climate extremes with high economic impact and violent conflicts for various regions, although no coherent global signal emerges from our analysis. Our results indicate the importance of regional vulnerability and might help to identify hot-spot regions for potential climate-triggered violent social conflicts.
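A coincidence analysis of this kind can be sketched in a few lines: count the conflict-onset years that fall within a short lag window of a climate extreme and compare the count against a shuffled null distribution. The series below are synthetic placeholders, not the Munich Re or UCDP data.

```python
# A minimal sketch of event coincidence analysis with a permutation test.
import numpy as np

rng = np.random.default_rng(3)
years = 60
extremes  = rng.binomial(1, 0.25, years)      # 1 = high-impact climate extreme that year
conflicts = rng.binomial(1, 0.15, years)      # 1 = violent conflict onset that year

def coincidences(a, b, max_lag=1):
    """Count events in b preceded by (or concurrent with) an event in a."""
    hits = 0
    for t in np.flatnonzero(b):
        if a[max(0, t - max_lag):t + 1].any():
            hits += 1
    return hits

observed = coincidences(extremes, conflicts)
null = [coincidences(rng.permutation(extremes), conflicts) for _ in range(5000)]
p_value = (np.sum(np.array(null) >= observed) + 1) / (len(null) + 1)
print(f"observed coincidences = {observed}, p = {p_value:.3f}")
```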
The Molecular and Cellular Characterization of Screen-Detected Lesions - Coordinating Center and Data Management Group will provide support for the participating studies responding to RFA CA14-10. The coordinating center supports three main domains: network coordination; statistical support and computational analysis; and protocol development and database support. Support for...
A spatial database of wildfires in the United States, 1992-2011
K. C. Short
2014-01-01
The statistical analysis of wildfire activity is a critical component of national wildfire planning, operations, and research in the United States (US). However, there are multiple federal, state, and local entities with wildfire protection and reporting responsibilities in the US, and no single, unified system of wildfire record keeping exists. To conduct even the...
A spatial database of wildfires in the United States, 1992-2011 [Discussion paper]
K. C. Short
2013-01-01
The statistical analysis of wildfire activity is a critical component of national wildfire planning, operations, and research in the United States (US). However, there are multiple federal, state, and local entities with wildfire protection and reporting responsibilities in the US, and no single, unified system of wildfire record-keeping exists. To conduct even the...
Data-Mining Techniques in Detecting Factors Linked to Academic Achievement
ERIC Educational Resources Information Center
Martínez Abad, Fernando; Chaparro Caso López, Alicia A.
2017-01-01
In light of the emergence of statistical analysis techniques based on data mining in education sciences, and the potential they offer to detect non-trivial information in large databases, this paper presents a procedure used to detect factors linked to academic achievement in large-scale assessments. The study is based on a non-experimental,…
Cicconet, Marcelo; Gutwein, Michelle; Gunsalus, Kristin C; Geiger, Davi
2014-08-01
In this paper we report a database and a series of techniques related to the problem of tracking cells, and detecting their divisions, in time-lapse movies of mammalian embryos. Our contributions are (1) a method for counting embryos in a well, and cropping each individual embryo across frames, to create individual movies for cell tracking; (2) a semi-automated method for cell tracking that works up to the 8-cell stage, along with a software implementation available to the public (this software was used to build the reported database); (3) an algorithm for automatic tracking up to the 4-cell stage, based on histograms of mirror symmetry coefficients captured using wavelets; (4) a cell-tracking database containing 100 annotated examples of mammalian embryos up to the 8-cell stage; and (5) statistical analysis of various timing distributions obtained from those examples. Copyright © 2014 Elsevier Ltd. All rights reserved.
GenderMedDB: an interactive database of sex and gender-specific medical literature.
Oertelt-Prigione, Sabine; Gohlke, Björn-Oliver; Dunkel, Mathias; Preissner, Robert; Regitz-Zagrosek, Vera
2014-01-01
Searches for sex and gender-specific publications are complicated by the absence of a specific algorithm within search engines and by the lack of adequate archives to collect the retrieved results. We previously addressed this issue by initiating the first systematic archive of medical literature containing sex and/or gender-specific analyses. This initial collection has now been greatly enlarged and re-organized as a free user-friendly database with multiple functions: GenderMedDB (http://gendermeddb.charite.de). GenderMedDB retrieves the included publications from the PubMed database. Manuscripts containing sex and/or gender-specific analysis are continuously screened and the relevant findings organized systematically into disciplines and diseases. Publications are furthermore classified by research type, subject and participant numbers. More than 11,000 abstracts are currently included in the database, after screening more than 40,000 publications. The main functions of the database include searches by publication data or content analysis based on pre-defined classifications. In addition, registrants are enabled to upload relevant publications, access descriptive publication statistics and interact in an open user forum. Overall, GenderMedDB offers the advantages of a discipline-specific search engine as well as the functions of a participative tool for the gender medicine community.
Bem, Daryl; Tressoldi, Patrizio; Rabeyron, Thomas; Duggan, Michael
2016-01-01
In 2011, one of the authors (DJB) published a report of nine experiments in the Journal of Personality and Social Psychology purporting to demonstrate that an individual's cognitive and affective responses can be influenced by randomly selected stimulus events that do not occur until after his or her responses have already been made and recorded, a generalized variant of the phenomenon traditionally denoted by the term precognition. To encourage replications, all materials needed to conduct them were made available on request. We here report a meta-analysis of 90 experiments from 33 laboratories in 14 countries which yielded an overall effect greater than 6 sigma, z = 6.40, p = 1.2 × 10⁻¹⁰, with an effect size (Hedges' g) of 0.09. A Bayesian analysis yielded a Bayes factor of 5.1 × 10⁹, greatly exceeding the criterion value of 100 for "decisive evidence" in support of the experimental hypothesis. When DJB's original experiments are excluded, the combined effect size for replications by independent investigators is 0.06, z = 4.16, p = 1.1 × 10⁻⁵, and the BF value is 3,853, again exceeding the criterion for "decisive evidence." The number of potentially unretrieved experiments required to reduce the overall effect size of the complete database to a trivial value of 0.01 is 544, and seven of eight additional statistical tests support the conclusion that the database is not significantly compromised by either selection bias or by intense "p-hacking" (the selective suppression of findings or analyses that failed to yield statistical significance). P-curve analysis, a recently introduced statistical technique, estimates the true effect size of the experiments to be 0.20 for the complete database and 0.24 for the independent replications, virtually identical to the effect size of DJB's original experiments (0.22) and the closely related "presentiment" experiments (0.21). We discuss the controversial status of precognition and other anomalous effects collectively known as psi. PMID:26834996
Huang, Hui; Zhai, Zhifang; Shen, Zhu; Lin, Hui
2016-01-01
Purpose: The present study determined the clinical characteristics and prognostic factors in patients with malignant melanoma, based on a series of 82 cases treated from January 2009 to December 2014 in Southwest Hospital and a meta-analysis (including 12 articles) involving 958 patients in China. Materials and methods: The database elements included basic demographic data and prognosticators extracted from medical records. Statistical analyses of survival and multivariate analyses of factors associated with survival were performed using the Kaplan-Meier method and the Cox proportional hazards model, respectively. Literature was identified through systematic searches in PubMed, Embase, the Cochrane Library, the China National Knowledge Infrastructure (CNKI), and the Weipu (VIP) database for the period from inception to December 2015. The meta-analysis was conducted using the R 3.1.1 meta-analysis software. Results: In this series of 82 cases, the median age of the patients was 57.50 years. Melanoma was located on the foot in 79% of patients. Sixty-one patients (74.4%) were classified as stage II-III. Thirty-two patients (39.0%) had acral malignant melanoma, and 31 patients (37.8%) had nodular malignant melanoma. The clinical characteristics of melanoma were similar to those in areas outside southwest China (from the results of the meta-analysis). The median survival time was 29.50 months. The 1-year, 3-year, and 5-year survival rates were 84.1%, 39.0%, and 10.9%, respectively. Cox regression following multi-factor analysis showed that ulcer, tumor boundary, and lymph node metastasis were associated with prognosis. Conclusions: The clinical characteristics of melanoma in Chinese patients were different from those in Caucasians. Ulcer, tumor margins, and lymph node metastasis were significantly associated with prognosis. Immune therapy may prolong the median survival time of patients with acral melanoma, nodular melanoma, or stage I-III disease, although these differences were not statistically significant. PMID:27861496
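The survival workflow described (Kaplan-Meier estimates plus a Cox proportional-hazards model) can be sketched as follows with the lifelines package; the data frame and the covariate names (ulcer, node metastasis, boundary) are synthetic stand-ins for the study variables.

```python
# A minimal sketch of Kaplan-Meier and Cox proportional-hazards analysis.
import numpy as np
import pandas as pd
from lifelines import KaplanMeierFitter, CoxPHFitter

rng = np.random.default_rng(4)
n = 82
df = pd.DataFrame({
    "months":           rng.exponential(30, n).round(1),  # follow-up time
    "death":            rng.integers(0, 2, n),            # 1 = event observed
    "ulcer":            rng.integers(0, 2, n),
    "node_metastasis":  rng.integers(0, 2, n),
    "unclear_boundary": rng.integers(0, 2, n),
})

kmf = KaplanMeierFitter()
kmf.fit(df["months"], event_observed=df["death"])
print("median survival (months):", kmf.median_survival_time_)

cph = CoxPHFitter()
cph.fit(df, duration_col="months", event_col="death")
cph.print_summary()   # hazard ratios with confidence intervals and p-values
```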
Version VI of the ESTree db: an improved tool for peach transcriptome analysis
Lazzari, Barbara; Caprera, Andrea; Vecchietti, Alberto; Merelli, Ivan; Barale, Francesca; Milanesi, Luciano; Stella, Alessandra; Pozzi, Carlo
2008-01-01
Background: The ESTree database (db) is a collection of Prunus persica and Prunus dulcis EST sequences that in its current version encompasses 75,404 sequences from 3 almond and 19 peach libraries. Nine peach genotypes and four peach tissues are represented, from four fruit developmental stages. The aim of this work was to extend the already existing ESTree db by adding new sequences and analysis programs. Particular care was given to the implementation of the web interface, which allows querying each of the database features. Results: A Perl modular pipeline is the backbone of sequence analysis in the ESTree db project. Outputs obtained during the pipeline steps are automatically arrayed into the fields of a MySQL database. Apart from standard clustering and annotation analyses, version VI of the ESTree db encompasses new tools for tandem repeat identification, annotation against genomic Rosaceae sequences, and positioning on the database of oligomer sequences that were used in a peach microarray study. Furthermore, known protein patterns and motifs were identified by comparison to PROSITE. Based on data retrieved from sequence annotation against the UniProtKB database, a script was prepared to track the positions of homologous hits on the GO tree and build statistics on the distribution of ontologies across GO functional categories. EST mapping data were also integrated in the database. The PHP-based web interface was upgraded and extended. The aim of the authors was to enable querying the database according to all the biological aspects that can be investigated from the analysis of data available in the ESTree db. This is achieved by allowing multiple searches on logical subsets of sequences that represent different biological situations or features. Conclusions: Version VI of the ESTree db offers a broad overview of peach gene expression. Sequence analysis results contained in the database, extensively linked to external related resources, represent a large amount of information that can be queried via the tools offered in the web interface. The flexibility and modularity of the ESTree analysis pipeline and of the web interface allowed the authors to set up similar structures for different datasets, with limited manual intervention. PMID:18387211
Associative memory model for searching an image database by image snippet
NASA Astrophysics Data System (ADS)
Khan, Javed I.; Yun, David Y.
1994-09-01
This paper presents an associative memory called multidimensional holographic associative computing (MHAC), which can potentially be used to perform feature-based image database queries using an image snippet. MHAC has the unique capability to selectively focus on specific segments of a query frame during associative retrieval. As a result, this model can perform searches on the basis of featural significance described by a subset of the snippet pixels. This capability is critical for visual query in image databases because quite often the cognitive index features in the snippet are statistically weak. Unlike conventional artificial associative memories, MHAC uses a two-level representation and incorporates additional meta-knowledge about the reliability status of the segments of information it receives and forwards. In this paper we present an analysis of the focus characteristics of MHAC.
Conditional statistics in a turbulent premixed flame derived from direct numerical simulation
NASA Technical Reports Server (NTRS)
Mantel, Thierry; Bilger, Robert W.
1994-01-01
The objective of this paper is to briefly introduce conditional moment closure (CMC) methods for premixed systems and to derive the transport equation for the conditional species mass fraction conditioned on the progress variable based on the enthalpy. Our statistical analysis will be based on the 3-D DNS database of Trouve and Poinsot available at the Center for Turbulence Research. The initial conditions and characteristics (turbulence, thermo-diffusive properties) as well as the numerical method utilized in the DNS of Trouve and Poinsot are presented, and some details concerning our statistical analysis are also given. From the analysis of DNS results, the effects of the position in the flame brush, of the Damkoehler and Lewis numbers on the conditional mean scalar dissipation, and conditional mean velocity are presented and discussed. Information concerning unconditional turbulent fluxes are also presented. The anomaly found in previous studies of counter-gradient diffusion for the turbulent flux of the progress variable is investigated.
The Advanced Composition Explorer Shock Database and Application to Particle Acceleration Theory
NASA Technical Reports Server (NTRS)
Parker, L. Neergaard; Zank, G. P.
2015-01-01
The theory of particle acceleration via diffusive shock acceleration (DSA) has been studied in depth by Gosling et al. (1981), van Nes et al. (1984), Mason (2000), Desai et al. (2003), Zank et al. (2006), among many others. Recently, Parker and Zank (2012, 2014) and Parker et al. (2014) using the Advanced Composition Explorer (ACE) shock database at 1 AU explored two questions: does the upstream distribution alone have enough particles to account for the accelerated downstream distribution and can the slope of the downstream accelerated spectrum be explained using DSA? As was shown in this research, diffusive shock acceleration can account for a large population of the shocks. However, Parker and Zank (2012, 2014) and Parker et al. (2014) used a subset of the larger ACE database. Recently, work has successfully been completed that allows for the entire ACE database to be considered in a larger statistical analysis. We explain DSA as it applies to single and multiple shocks and the shock criteria used in this statistical analysis. We calculate the expected injection energy via diffusive shock acceleration given upstream parameters defined from the ACE Solar Wind Electron, Proton, and Alpha Monitor (SWEPAM) data to construct the theoretical upstream distribution. We show the comparison of shock strength derived from diffusive shock acceleration theory to observations in the 50 keV to 5 MeV range from an instrument on ACE. Parameters such as shock velocity, shock obliquity, particle number, and time between shocks are considered. This study is further divided into single and multiple shock categories, with an additional emphasis on forward-forward multiple shock pairs. Finally with regard to forward-forward shock pairs, results comparing injection energies of the first shock, second shock, and second shock with previous energetic population will be given.
SBCDDB: Sleeping Beauty Cancer Driver Database for gene discovery in mouse models of human cancers
Mann, Michael B
2018-01-01
Abstract Large-scale oncogenomic studies have identified few frequently mutated cancer drivers and hundreds of infrequently mutated drivers. Defining the biological context for rare driving events is fundamentally important to increasing our understanding of the druggable pathways in cancer. Sleeping Beauty (SB) insertional mutagenesis is a powerful gene discovery tool used to model human cancers in mice. Our lab and others have published a number of studies that identify cancer drivers from these models using various statistical and computational approaches. Here, we have integrated SB data from primary tumor models into an analysis and reporting framework, the Sleeping Beauty Cancer Driver DataBase (SBCDDB, http://sbcddb.moffitt.org), which identifies drivers in individual tumors or tumor populations. Unique to this effort, the SBCDDB utilizes a single, scalable, statistical analysis method that enables data to be grouped by different biological properties. This allows for SB drivers to be evaluated (and re-evaluated) under different contexts. The SBCDDB provides visual representations highlighting the spatial attributes of transposon mutagenesis and couples this functionality with analysis of gene sets, enabling users to interrogate relationships between drivers. The SBCDDB is a powerful resource for comparative oncogenomic analyses with human cancer genomics datasets for driver prioritization. PMID:29059366
Description of 'REQUEST-KYUSHU', the KYUKEICHO regional database
NASA Astrophysics Data System (ADS)
Takimoto, Shin'ichi
The Kyushu Economic Research Association (an incorporated foundation) recently initiated the regional database service 'REQUEST-Kyushu'. It is a full-scale database compiled from the information and know-how that the Association has accumulated over forty years. It comprises a regional information database of journal and newspaper articles and a statistical information database of economic statistics. The former is searched on a personal computer, and the search result (original text) is then sent by facsimile. The latter is also searched on a personal computer, where the data can be processed, edited, or downloaded. This paper describes the characteristics, content, and system outline of 'REQUEST-Kyushu'.
Ryan, Patrick B.; Schuemie, Martijn
2013-01-01
Background: Clinical studies that use observational databases, such as administrative claims and electronic health records, to evaluate the effects of medical products have become commonplace. These studies begin by selecting a particular study design, such as a case control, cohort, or self-controlled design, and different authors can and do choose different designs for the same clinical question. Furthermore, published papers invariably report the study design but do not discuss the rationale for the specific choice. Studies of the same clinical question with different designs, however, can generate different results, sometimes with strikingly different implications. Even within a specific study design, authors make many different analytic choices and these too can profoundly impact results. In this paper, we systematically study heterogeneity due to the type of study design and due to analytic choices within study design. Methods and findings: We conducted our analysis in 10 observational healthcare databases but mostly present our results in the context of the GE Centricity EMR database, an electronic health record database containing data for 11.2 million lives. We considered the impact of three different study design choices on estimates of associations between bisphosphonates and four particular health outcomes for which there is no evidence of an association. We show that applying alternative study designs can yield discrepant results, in terms of direction and significance of association. We also highlight that while traditional univariate sensitivity analysis may not show substantial variation, systematic assessment of all analytical choices within a study design can yield inconsistent results ranging from statistically significant decreased risk to statistically significant increased risk. Our findings show that clinical studies using observational databases can be sensitive both to study design choices and to specific analytic choices within study design. Conclusion: More attention is needed to consider how design choices may be impacting results and, when possible, investigators should examine a wide array of possible choices to confirm that significant findings are consistently identified. PMID:25083251
How to explain variations in sea cliff erosion rate?
NASA Astrophysics Data System (ADS)
Prémaillon, Melody; Regard, Vincent; Dewez, Thomas
2017-04-01
Every rocky coast in the world is eroding, at rates (cliff retreat rates) that vary from site to site. Erosion is caused by a complex interaction of multiple marine and weather factors. While numerous local studies exist and explain erosion processes at specific sites, global studies are lacking. We have begun to compile many of these local studies and to analyse their results from a global point of view in order to quantify the various parameters influencing erosion rates. In other words: is erosion greater in more energetic seas? Do chalk cliffs erode faster in rainy environments? etc. To this end, we built a database from the literature and from national erosion databases. It now contains 80 publications, which represent 2500 studied cliffs and more than 3500 erosion rate estimates. A statistical analysis was conducted on this database. To a first approximation, cliff lithology is the only clear signal explaining variation in erosion rate: hard lithologies erode at 1 cm/y or less, whereas unconsolidated lithologies commonly erode faster than 10 cm/y. No clear statistical relation was found between erosion rate and external parameters such as sea energy (swell, tide) or weather conditions, even for cliffs of similar lithology.
ViPAR: a software platform for the Virtual Pooling and Analysis of Research Data.
Carter, Kim W; Francis, Richard W; Carter, K W; Francis, R W; Bresnahan, M; Gissler, M; Grønborg, T K; Gross, R; Gunnes, N; Hammond, G; Hornig, M; Hultman, C M; Huttunen, J; Langridge, A; Leonard, H; Newman, S; Parner, E T; Petersson, G; Reichenberg, A; Sandin, S; Schendel, D E; Schalkwyk, L; Sourander, A; Steadman, C; Stoltenberg, C; Suominen, A; Surén, P; Susser, E; Sylvester Vethanayagam, A; Yusof, Z
2016-04-01
Research studies exploring the determinants of disease require sufficient statistical power to detect meaningful effects. Sample size is often increased through centralized pooling of disparately located datasets, though ethical, privacy and data ownership issues can often hamper this process. Methods that facilitate the sharing of research data that are sympathetic with these issues and which allow flexible and detailed statistical analyses are therefore in critical need. We have created a software platform for the Virtual Pooling and Analysis of Research data (ViPAR), which employs free and open source methods to provide researchers with a web-based platform to analyse datasets housed in disparate locations. Database federation permits controlled access to remotely located datasets from a central location. The Secure Shell protocol allows data to be securely exchanged between devices over an insecure network. ViPAR combines these free technologies into a solution that facilitates 'virtual pooling' where data can be temporarily pooled into computer memory and made available for analysis without the need for permanent central storage. Within the ViPAR infrastructure, remote sites manage their own harmonized research dataset in a database hosted at their site, while a central server hosts the data federation component and a secure analysis portal. When an analysis is initiated, requested data are retrieved from each remote site and virtually pooled at the central site. The data are then analysed by statistical software and, on completion, results of the analysis are returned to the user and the virtually pooled data are removed from memory. ViPAR is a secure, flexible and powerful analysis platform built on open source technology that is currently in use by large international consortia, and is made publicly available at [http://bioinformatics.childhealthresearch.org.au/software/vipar/]. © The Author 2015. Published by Oxford University Press on behalf of the International Epidemiological Association.
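The "virtual pooling" idea can be illustrated with a short sketch: query harmonized tables held at several remote sites, pool them in memory, analyze, and discard. The connection URLs, table, and column names below are hypothetical, and the real ViPAR platform adds secure tunnelling, access control, and a web front end on top of this pattern.

```python
# A minimal sketch of in-memory pooling of remotely held, harmonised datasets.
import pandas as pd
from sqlalchemy import create_engine

SITE_URLS = [  # hypothetical read-only connections to two participating sites
    "postgresql://readonly@site-a.example.org/cohort",
    "postgresql://readonly@site-b.example.org/cohort",
]
QUERY = "SELECT exposure, outcome, age, sex FROM harmonised_records"

frames = [pd.read_sql(QUERY, create_engine(url)) for url in SITE_URLS]
pooled = pd.concat(frames, ignore_index=True)     # pooled only in memory

# example analysis on the virtually pooled data
print(pooled.groupby("exposure")["outcome"].mean())

del pooled, frames                                # nothing persisted centrally
```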
Baseline estimation in flame's spectra by using neural networks and robust statistics
NASA Astrophysics Data System (ADS)
Garces, Hugo; Arias, Luis; Rojas, Alejandro
2014-09-01
This work presents a baseline estimation method for flame spectra based on an artificial intelligence structure, a neural network, combining robust statistics with multivariate analysis to automatically identify the measured wavelengths that belong to the continuous (baseline) feature for model adaptation, thereby removing the restriction of having to measure the target baseline for training. The main contributions of this paper are: to analyze a flame spectra database by computing Jolliffe statistics from principal component analysis, detecting wavelengths that are not correlated with most of the measured data and therefore correspond to the baseline; to systematically determine the optimal number of neurons in the hidden layers based on Akaike's final prediction error; to estimate the baseline over the full wavelength range of the sampled spectra; and to train an artificial intelligence structure, a neural network, which generalizes the relation between measured and baseline spectra. The main application of our research is to compute total radiation with baseline information, allowing the combustion process state to be diagnosed for optimization in early stages.
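One plausible reading of the PCA-based screening step is sketched below: wavelengths whose loadings on the leading principal components are small are flagged as baseline-like. The synthetic spectra, the three-component choice, and the median threshold are illustrative assumptions, not the authors' exact Jolliffe statistic.

```python
# A minimal sketch of flagging baseline-dominated wavelengths via PCA loadings.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
n_spectra, n_wl = 200, 300
wl = np.linspace(300, 900, n_wl)                                   # wavelength axis (nm)

continuum = np.outer(1 + rng.normal(0, 0.02, n_spectra), 1e-3 * wl)         # smooth baseline
emission = np.outer(rng.uniform(0, 1, n_spectra), np.exp(-(wl - 589)**2 / 20.0))  # varying line
spectra = continuum + emission + rng.normal(0, 0.005, (n_spectra, n_wl))

pca = PCA(n_components=3).fit(spectra)
loading_strength = np.abs(pca.components_).max(axis=0)             # per-wavelength loading
baseline_like = wl[loading_strength < np.percentile(loading_strength, 50)]
print(f"{baseline_like.size} of {n_wl} wavelengths flagged as baseline-like")
```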
The CSB Incident Screening Database: description, summary statistics and uses.
Gomez, Manuel R; Casper, Susan; Smith, E Allen
2008-11-15
This paper briefly describes the Chemical Incident Screening Database currently used by the CSB to identify and evaluate chemical incidents for possible investigations, and summarizes descriptive statistics from this database that can potentially help to estimate the number, character, and consequences of chemical incidents in the US. The report compares some of the information in the CSB database to roughly similar information available from databases operated by EPA and the Agency for Toxic Substances and Disease Registry (ATSDR), and explores the possible implications of these comparisons with regard to the dimension of the chemical incident problem. Finally, the report explores in a preliminary way whether a system modeled after the existing CSB screening database could be developed to serve as a national surveillance tool for chemical incidents.
Page, Grier P; Coulibaly, Issa
2008-01-01
Microarrays are a very powerful tool for quantifying the amount of RNA in samples; however, their ability to query essentially every gene in a genome, which can number in the tens of thousands, presents analytical and interpretative problems. As a result, a variety of software and web-based tools have been developed to help with these issues. This article highlights and reviews some of the tools for the first steps in the analysis of a microarray study. We have tried for a balance between free and commercial systems. We have organized the tools by topics including image processing tools (Section 2), power analysis tools (Section 3), image analysis tools (Section 4), database tools (Section 5), databases of functional information (Section 6), annotation tools (Section 7), statistical and data mining tools (Section 8), and dissemination tools (Section 9).
Suchard, Marc A; Zorych, Ivan; Simpson, Shawn E; Schuemie, Martijn J; Ryan, Patrick B; Madigan, David
2013-10-01
The self-controlled case series (SCCS) offers potential as a statistical method for risk identification involving medical products from large-scale observational healthcare data. However, analytic design choices remain in encoding the longitudinal health records into the SCCS framework, and its risk identification performance across real-world databases is unknown. To evaluate the performance of SCCS and its design choices as a tool for risk identification in observational healthcare data. We examined the risk identification performance of SCCS across five design choices using 399 drug-health outcome pairs in five real observational databases (four administrative claims and one electronic health records). In these databases, the pairs involve 165 positive controls and 234 negative controls. We also consider several synthetic databases with known relative risks between drug-outcome pairs. We evaluate risk identification performance by estimating the area under the receiver-operator characteristic curve (AUC), and bias and coverage probability in the synthetic examples. The SCCS achieves strong predictive performance. Twelve of the twenty health outcome-database scenarios return AUCs >0.75 across all drugs. Including all adverse events instead of just the first per patient and applying a multivariate adjustment for concomitant drug use are the most important design choices. However, the SCCS as applied here returns relative risk point-estimates biased towards the null value of 1 with low coverage probability. The SCCS, recently extended to apply a multivariate adjustment for concomitant drug use, offers promise as a statistical tool for risk identification in large-scale observational healthcare databases. Poor estimator calibration dampens enthusiasm, but ongoing work should correct this shortcoming.
AlQuraishi, Mohammed; Tang, Shengdong; Xia, Xide
2015-11-19
Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. We have developed an integrated affinity-structure database in which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis. This database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.
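The position weight matrices mentioned above can be built from aligned binding sites in a few lines; the sites, pseudocount, and uniform background below are toy assumptions, not entries from the database.

```python
# A minimal sketch of building and scoring a position weight matrix (PWM).
import numpy as np

sites = ["TTGACA", "TTGATA", "TTTACA", "CTGACA"]     # aligned binding sites (toy)
alphabet = "ACGT"
counts = np.zeros((4, len(sites[0])))

for site in sites:
    for pos, base in enumerate(site):
        counts[alphabet.index(base), pos] += 1

pseudocount = 0.5
freqs = (counts + pseudocount) / (counts + pseudocount).sum(axis=0)
pwm = np.log2(freqs / 0.25)                 # log-odds vs. a uniform background

def score(seq):
    """Log-odds PWM score of a candidate site."""
    return sum(pwm[alphabet.index(b), i] for i, b in enumerate(seq))

print("PWM score of TTGACA:", round(score("TTGACA"), 2))
```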
ASM Based Synthesis of Handwritten Arabic Text Pages
Dinges, Laslo; Al-Hamadi, Ayoub; Elzobi, Moftah; El-Etriby, Sherif; Ghoneim, Ahmed
2015-01-01
Document analysis tasks, such as text recognition, word spotting, or segmentation, are highly dependent on comprehensive and suitable databases for training and validation. However, their generation is expensive in terms of labor and time. As a matter of fact, there is a lack of such databases, which complicates research and development. This is especially true for Arabic handwriting recognition, which involves different preprocessing, segmentation, and recognition methods, each with individual demands on samples and ground truth. To bypass this problem, we present an efficient system that automatically turns Arabic Unicode text into synthetic images of handwritten documents with detailed ground truth. Active shape models (ASMs) based on 28046 online samples were used for character synthesis, and statistical properties were extracted from the IESK-arDB database to simulate baselines and word slant or skew. In the synthesis step, ASM-based representations are composed into words and text pages, smoothed by B-spline interpolation, and rendered considering writing speed and pen characteristics. Finally, we use the synthetic data to validate a segmentation method. An experimental comparison with the IESK-arDB database encourages training and testing document analysis methods on synthetic samples whenever sufficient natural ground-truthed data are not available. PMID:26295059
Massive Scale Cyber Traffic Analysis: A Driver for Graph Database Research
DOE Office of Scientific and Technical Information (OSTI.GOV)
Joslyn, Cliff A.; Choudhury, S.; Haglin, David J.
2013-06-19
We describe the significance and prominence of network traffic analysis (TA) as a graph- and network-theoretical domain for advancing research in graph database systems. TA involves observing and analyzing the connections between clients, servers, hosts, and actors within IP networks, both at particular times and as extended over time. Towards that end, NetFlow (or more generically, IPFLOW) data are available from routers and servers which summarize coherent groups of IP packets flowing through the network. IPFLOW databases are routinely interrogated statistically and visualized for suspicious patterns. But the ability to cast IPFLOW data as a massive graph and query it interactively, in order to, e.g., identify connectivity patterns, is less well advanced, due to a number of factors including scaling and the hybrid nature of the data, which combine graph connectivity and quantitative attributes. In this paper, we outline requirements and opportunities for graph-structured IPFLOW analytics based on our experience with real IPFLOW databases. Specifically, we describe real use cases from the security domain, cast them as graph patterns, show how to express them in two graph-oriented query languages, SPARQL and Datalog, and use these examples to motivate a new class of "hybrid" graph-relational systems.
Ramsthaler, F; Kreutz, K; Verhoff, M A
2007-11-01
It has been generally accepted in skeletal sex determination that the use of metric methods is limited due to the population dependence of the multivariate algorithms. The aim of the study was to verify the applicability of software-based sex estimation outside the reference population group for which the discriminant equations were developed. We examined 98 skulls from recent forensic cases of known age, sex, and Caucasian ancestry from cranium collections in Frankfurt and Mainz (Germany) to determine the accuracy of sex determination using the statistical software Fordisc, which derives its database and functions from the US American Forensic Database. In a comparison between metric analysis using Fordisc and morphological determination of sex, the average accuracy for both sexes was 86% vs 94%, respectively, and males were identified more accurately than females. The ratio of the true test result rate to the false test result rate was not statistically different for the two methodological approaches at a significance level of 0.05 but was statistically different at a level of 0.10 (p=0.06). Possible explanations for this difference comprise different ancestry, age distribution, and socio-economic status compared to the Fordisc reference sample. It is likely that a discriminant function analysis on the basis of more similar European reference samples will lead to more valid and reliable sexing results. The use of Fordisc as the sole method for estimating the sex of recent skeletal remains in Europe cannot be recommended without additional morphological assessment and without a built-in software update based on modern European reference samples.
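The discriminant-function approach that underlies tools such as Fordisc can be sketched as follows; the craniometric variables and their sex differences are synthetic placeholders, not the Fordisc reference data, and, as the study stresses, accuracy estimated on one reference population need not transfer to another.

```python
# A minimal sketch of discriminant-function sex estimation from craniometrics.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)
n = 200
sex = rng.integers(0, 2, n)                       # 0 = female, 1 = male (reference sample)
X = np.column_stack([
    175 + 6*sex + rng.normal(0, 5, n),            # cranial length (mm), synthetic
    135 + 4*sex + rng.normal(0, 4, n),            # cranial breadth (mm), synthetic
    95 + 5*sex + rng.normal(0, 4, n),             # bizygomatic breadth (mm), synthetic
])

lda = LinearDiscriminantAnalysis()
accuracy = cross_val_score(lda, X, sex, cv=5).mean()
print(f"cross-validated sexing accuracy ≈ {accuracy:.2f}")
```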
IPRStats: visualization of the functional potential of an InterProScan run.
Kelly, Ryan J; Vincent, David E; Friedberg, Iddo
2010-12-21
InterPro is a collection of protein signatures for the classification and automated annotation of proteins. InterProScan is a software tool that scans protein sequences against the InterPro member databases using a variety of profile-based, hidden Markov model, and position-specific scoring matrix methods. It not only combines a set of analysis tools, but also performs data look-up from various sources, as well as some redundancy removal. InterProScan is robust and scalable, able to perform on any machine from a netbook to a large cluster. However, when performing whole-genome or metagenome analysis, there is a need for a fast statistical visualization of the results to get a good initial grasp of the functional potential of the sequences in the analyzed data set. This is especially important when analyzing and comparing metagenomic or metaproteomic data sets. IPRStats is a tool for the visualization of InterProScan results. InterProScan results are parsed from the InterProScan XML or EBIXML file into an SQLite or MySQL database. The results for each signature database scan are read and displayed as pie charts or bar charts as summary statistics. A table is also provided, where each entry is a signature (e.g. a Pfam entry) accompanied by one or more Gene Ontology terms, if InterProScan was run using the Gene Ontology option. We present a platform-independent, open-source licensed tool that is useful for InterProScan users who wish to view a summary of their results in a rapid and concise fashion.
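A minimal sketch of the IPRStats workflow, loading hits into SQLite and summarizing counts per member database. For brevity it reads the tab-separated InterProScan output rather than the XML, and the assumed column positions and file name are hypothetical and should be checked against the InterProScan version in use.

```python
# A minimal sketch: parse InterProScan TSV output into SQLite and summarize.
import csv
import sqlite3

conn = sqlite3.connect("iprstats.db")
conn.execute("""CREATE TABLE IF NOT EXISTS hits
                (protein TEXT, analysis TEXT, signature TEXT, description TEXT)""")

# Assumed TSV layout: protein accession in column 1, member database ("analysis")
# in column 4, signature accession and description in columns 5-6 (1-based).
with open("interproscan_output.tsv") as fh:
    for row in csv.reader(fh, delimiter="\t"):
        conn.execute("INSERT INTO hits VALUES (?, ?, ?, ?)",
                     (row[0], row[3], row[4], row[5]))
conn.commit()

# summary statistics per member database (Pfam, PRINTS, SUPERFAMILY, ...)
for analysis, n in conn.execute(
        "SELECT analysis, COUNT(*) FROM hits GROUP BY analysis ORDER BY 2 DESC"):
    print(f"{analysis:15s} {n}")
```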
Fossil-Fuel CO2 Emissions Database and Exploration System
NASA Astrophysics Data System (ADS)
Krassovski, M.; Boden, T.
2012-04-01
The Carbon Dioxide Information Analysis Center (CDIAC) at Oak Ridge National Laboratory (ORNL) quantifies the release of carbon from fossil-fuel use and cement production each year at global, regional, and national spatial scales. These estimates are vital to climate change research given the strong evidence suggesting fossil-fuel emissions are responsible for unprecedented levels of carbon dioxide (CO2) in the atmosphere. The CDIAC fossil-fuel emissions time series are based largely on annual energy statistics published for all nations by the United Nations (UN). Publications containing historical energy statistics make it possible to estimate fossil-fuel CO2 emissions back to 1751, before the Industrial Revolution. From these core fossil-fuel CO2 emission time series, CDIAC has developed a number of additional data products to satisfy modeling needs and to address other questions aimed at improving our understanding of the global carbon cycle budget. For example, CDIAC also produces a time series of gridded fossil-fuel CO2 emission estimates and isotopic (e.g., C13) emissions estimates. The gridded data are generated using the methodology described in Andres et al. (2011) and provide monthly and annual estimates for 1751-2008 at 1° latitude by 1° longitude resolution. These gridded emission estimates are being used in the latest IPCC Scientific Assessment (AR4). Isotopic estimates are possible thanks to detailed information for individual nations regarding the carbon content of select fuels (e.g., the carbon signature of natural gas from Russia). CDIAC has recently developed a relational database to house these baseline emissions estimates and associated derived products and a web-based interface to help users worldwide query these data holdings. Users can identify, explore and download desired CDIAC fossil-fuel CO2 emissions data. This presentation introduces the architecture and design of the new relational database and web interface, summarizes the present state and functionality of the Fossil-Fuel CO2 Emissions Database and Exploration System, and highlights future plans for expansion of the relational database and interface.
Medical cost analysis: application to colorectal cancer data from the SEER Medicare database.
Bang, Heejung
2005-10-01
Incompleteness is a key feature of most survival data. Numerous well-established statistical methodologies and algorithms exist for analyzing life or failure time data. However, induced censoring invalidates the use of those standard analytic tools for some survival-type data such as medical costs. In this paper, some valid methods currently available for analyzing censored medical cost data are reviewed. Some cautionary findings under different assumptions are illustrated through application to medical costs from colorectal cancer patients. Cost analysis should be suitably planned and carefully interpreted under various meaningful scenarios, even with judiciously selected statistical methods. This approach would be of great help to policy makers who seek to prioritize health care expenditures and to assess the elements of resource use.
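One of the standard corrections reviewed in this literature weights complete (uncensored) cost observations by the inverse of the Kaplan-Meier estimate of the censoring distribution. The sketch below implements that simple weighted mean on synthetic data; it is an illustration of the general idea, not a reproduction of this paper's analysis.

```python
# A minimal sketch of inverse-probability-of-censoring weighting (IPCW) for
# mean-cost estimation under right censoring. Data are simulated.
import numpy as np
from lifelines import KaplanMeierFitter

rng = np.random.default_rng(7)
n = 500
event_time = rng.exponential(24, n)             # months to death
censor_time = rng.exponential(36, n)            # months to loss of follow-up
follow_up = np.minimum(event_time, censor_time)
complete = event_time <= censor_time            # cost fully observed?
cost = 1000 * event_time * rng.uniform(0.8, 1.2, n)   # true cumulative cost

# Kaplan-Meier for the *censoring* distribution (event/censor roles reversed)
kmf = KaplanMeierFitter().fit(follow_up, event_observed=~complete)
K = kmf.survival_function_at_times(follow_up[complete]).to_numpy()

ipcw_mean = np.sum(cost[complete] / K) / n
naive_mean = cost[complete].mean()              # biased toward cheaper, shorter survivors
print(f"naive complete-case mean = {naive_mean:,.0f}, IPCW mean = {ipcw_mean:,.0f}")
```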
[Development of Hospital Equipment Maintenance Information System].
Zhou, Zhixin
2015-11-01
Hospital equipment maintenance information systems play an important role in improving medical treatment quality and efficiency. Based on a requirements analysis of hospital equipment maintenance, the system function diagram was drawn. Through analysis of the input and output data, tables, and reports connected with the equipment maintenance process, the relationships between entities and attributes were identified, an E-R diagram was drawn, and the relational database tables were established. The software was developed to meet the actual requirements of the maintenance process, with a friendly user interface and flexible operation, and it can analyze failure causes through statistical analysis.
Generation of comprehensive thoracic oncology database--tool for translational research.
Surati, Mosmi; Robinson, Matthew; Nandi, Suvobroto; Faoro, Leonardo; Demchuk, Carley; Kanteti, Rajani; Ferguson, Benjamin; Gangadhar, Tara; Hensing, Thomas; Hasina, Rifat; Husain, Aliya; Ferguson, Mark; Karrison, Theodore; Salgia, Ravi
2011-01-22
The Thoracic Oncology Program Database Project was created to serve as a comprehensive, verified, and accessible repository for well-annotated cancer specimens and clinical data to be available to researchers within the Thoracic Oncology Research Program. This database also captures a large volume of genomic and proteomic data obtained from various tumor tissue studies. A team of clinical and basic science researchers, a biostatistician, and a bioinformatics expert was convened to design the database. Variables of interest were clearly defined, and their descriptions were written within a standard operating manual to ensure consistency of data annotation. Using a protocol for prospective tissue banking and another protocol for retrospective banking, tumor and normal tissue samples were collected from patients who consented to these protocols. Clinical information such as demographics, cancer characterization, and treatment plans for these patients was abstracted and entered into an Access database. Proteomic and genomic data have been included in the database and have been linked to clinical information for patients described within the database. The data from each table were linked using the relationships function in Microsoft Access to allow the database manager to connect clinical and laboratory information during a query. The queried data can then be exported for statistical analysis and hypothesis generation.
Schools and Data: The Educator's Guide for Using Data to Improve Decision Making
ERIC Educational Resources Information Center
Creighton, Theodore B.
2006-01-01
Since the first edition of "Schools and Data", the No Child Left Behind Act has swept the country, and data-based decision making is no longer an option for educators. Today's educational climate makes it imperative for all schools to collect data and use statistical analysis to help create clear goals and recognize strategies for…
ERIC Educational Resources Information Center
Pittayachawan, Siddhi; Macauley, Peter; Evans, Terry
2016-01-01
This article reports how statistical analyses of PhD thesis records can reveal future research capacities for disciplines beyond their primary fields. The previous research showed that most theses contributed to and/or used methodologies from more than one discipline. In Australia, there was a concern for declining mathematical teaching and…
Got Power? A Systematic Review of Sample Size Adequacy in Health Professions Education Research
ERIC Educational Resources Information Center
Cook, David A.; Hatala, Rose
2015-01-01
Many education research studies employ small samples, which in turn lowers statistical power. We re-analyzed the results of a meta-analysis of simulation-based education to determine study power across a range of effect sizes, and the smallest effect that could be plausibly excluded. We systematically searched multiple databases through May 2011,…
MutAIT: an online genetic toxicology data portal and analysis tools.
Avancini, Daniele; Menzies, Georgina E; Morgan, Claire; Wills, John; Johnson, George E; White, Paul A; Lewis, Paul D
2016-05-01
Assessment of genetic toxicity and/or carcinogenic activity is an essential element of chemical screening programs employed to protect human health. Dose-response and gene mutation data are frequently analysed by industry, academia and governmental agencies for regulatory evaluations and decision making. Over the years, a number of efforts at different institutions have led to the creation and curation of databases to house genetic toxicology data, largely, with the aim of providing public access to facilitate research and regulatory assessments. This article provides a brief introduction to a new genetic toxicology portal called Mutation Analysis Informatics Tools (MutAIT) (www.mutait.org) that provides easy access to two of the largest genetic toxicology databases, the Mammalian Gene Mutation Database (MGMD) and TransgenicDB. TransgenicDB is a comprehensive collection of transgenic rodent mutation data initially compiled and collated by Health Canada. The updated MGMD contains approximately 50 000 individual mutation spectral records from the published literature. The portal not only gives access to an enormous quantity of genetic toxicology data, but also provides statistical tools for dose-response analysis and calculation of benchmark dose. Two important R packages for dose-response analysis are provided as web-distributed applications with user-friendly graphical interfaces. The 'drsmooth' package performs dose-response shape analysis and determines various points of departure (PoD) metrics and the 'PROAST' package provides algorithms for dose-response modelling. The MutAIT statistical tools, which are currently being enhanced, provide users with an efficient and comprehensive platform to conduct quantitative dose-response analyses and determine PoD values that can then be used to calculate human exposure limits or margins of exposure. © The Author 2015. Published by Oxford University Press on behalf of the UK Environmental Mutagen Society. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
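The MutAIT portal itself exposes the R packages 'drsmooth' and 'PROAST' for dose-response and benchmark dose analysis. As a language-agnostic illustration of the benchmark dose idea only (not of those packages), the following Python sketch fits a simple exponential dose-response model and solves for the dose producing a 10% increase over background; the data and model family are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical mutation-frequency data (dose in mg/kg, response = mutant frequency x 1e-6).
dose = np.array([0.0, 5.0, 10.0, 20.0, 40.0])
resp = np.array([1.2, 1.4, 1.9, 2.8, 5.6])

# Simple exponential model f(d) = a * exp(b * d); PROAST/drsmooth fit richer model families.
def expo(d, a, b):
    return a * np.exp(b * d)

(a, b), _ = curve_fit(expo, dose, resp, p0=(1.0, 0.05))

# Benchmark dose for a 10% increase over background: a*exp(b*BMD) = 1.1*a  =>  BMD = ln(1.1)/b.
bmd10 = np.log(1.1) / b
print(f"background = {a:.2f}, slope = {b:.3f}, BMD10 = {bmd10:.2f} mg/kg")
```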
Parson, W; Gusmão, L; Hares, D R; Irwin, J A; Mayr, W R; Morling, N; Pokorak, E; Prinz, M; Salas, A; Schneider, P M; Parsons, T J
2014-11-01
The DNA Commission of the International Society of Forensic Genetics (ISFG) regularly publishes guidelines and recommendations concerning the application of DNA polymorphisms to the question of human identification. Previous recommendations published in 2000 addressed the analysis and interpretation of mitochondrial DNA (mtDNA) in forensic casework. While the foundations set forth in the earlier recommendations still apply, new approaches to the quality control, alignment and nomenclature of mitochondrial sequences, as well as the establishment of mtDNA reference population databases, have been developed. Here, we describe these developments and discuss their application to both mtDNA casework and mtDNA reference population databasing applications. While the generation of mtDNA for forensic casework has always been guided by specific standards, it is now well-established that data of the same quality are required for the mtDNA reference population data used to assess the statistical weight of the evidence. As a result, we introduce guidelines regarding sequence generation, as well as quality control measures based on the known worldwide mtDNA phylogeny, that can be applied to ensure the highest quality population data possible. For both casework and reference population databasing applications, the alignment and nomenclature of haplotypes is revised here and the phylogenetic alignment proffered as acceptable standard. In addition, the interpretation of heteroplasmy in the forensic context is updated, and the utility of alignment-free database searches for unbiased probability estimates is highlighted. Finally, we discuss statistical issues and define minimal standards for mtDNA database searches. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Disutility analysis of oil spills: graphs and trends.
Ventikos, Nikolaos P; Sotiropoulos, Foivos S
2014-04-15
This paper reports the results of an analysis of oil spill cost data assembled from a worldwide pollution database that mainly includes data from the International Oil Pollution Compensation Fund. The purpose of the study is to analyze the conditions of marine pollution accidents and the factors that impact the costs of oil spills worldwide. The accidents are classified into categories based on their characteristics, and the cases are compared using charts to show how the costs are affected under all conditions. This study can be used as a helpful reference for developing a detailed statistical model that is capable of reliably and realistically estimating the total costs of oil spills. To illustrate the differences identified by this statistical analysis, the results are compared with the results of previous studies, and the findings are discussed. Copyright © 2014 Elsevier Ltd. All rights reserved.
Meta-analysis of gene-level associations for rare variants based on single-variant statistics.
Hu, Yi-Juan; Berndt, Sonja I; Gustafsson, Stefan; Ganna, Andrea; Hirschhorn, Joel; North, Kari E; Ingelsson, Erik; Lin, Dan-Yu
2013-08-08
Meta-analysis of genome-wide association studies (GWASs) has led to the discoveries of many common variants associated with complex human diseases. There is a growing recognition that identifying "causal" rare variants also requires large-scale meta-analysis. The fact that association tests with rare variants are performed at the gene level rather than at the variant level poses unprecedented challenges in the meta-analysis. First, different studies may adopt different gene-level tests, so the results are not compatible. Second, gene-level tests require multivariate statistics (i.e., components of the test statistic and their covariance matrix), which are difficult to obtain. To overcome these challenges, we propose to perform gene-level tests for rare variants by combining the results of single-variant analysis (i.e., p values of association tests and effect estimates) from participating studies. This simple strategy is possible because of an insight that multivariate statistics can be recovered from single-variant statistics, together with the correlation matrix of the single-variant test statistics, which can be estimated from one of the participating studies or from a publicly available database. We show both theoretically and numerically that the proposed meta-analysis approach provides accurate control of the type I error and is as powerful as joint analysis of individual participant data. This approach accommodates any disease phenotype and any study design and produces all commonly used gene-level tests. An application to the GWAS summary results of the Genetic Investigation of ANthropometric Traits (GIANT) consortium reveals rare and low-frequency variants associated with human height. The relevant software is freely available. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
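To make the central insight of this abstract concrete, the following minimal Python sketch recovers a burden-style gene-level test from single-variant z-scores plus a correlation (LD) matrix estimated from a reference panel. The weights, z-scores, and matrix are hypothetical, and this is an illustration of the general principle rather than the authors' implementation.

```python
import numpy as np
from scipy.stats import norm

# Per-variant association z-scores for one gene, meta-analyzed across studies (hypothetical).
z = np.array([1.8, -0.4, 2.1, 0.9])

# Correlation matrix of the single-variant test statistics, e.g. estimated from the LD
# in one participating study or from a public reference panel (hypothetical values).
R = np.array([[1.0, 0.2, 0.1, 0.0],
              [0.2, 1.0, 0.3, 0.1],
              [0.1, 0.3, 1.0, 0.2],
              [0.0, 0.1, 0.2, 1.0]])

# Burden-style gene-level test: a weighted sum of single-variant statistics.
# Under the null, w'z ~ N(0, w'Rw), so the gene-level statistic is recoverable
# without individual-level participant data.
w = np.ones_like(z)                      # e.g., equal or MAF-based weights
z_gene = w @ z / np.sqrt(w @ R @ w)
p_gene = 2 * norm.sf(abs(z_gene))
print(f"gene-level z = {z_gene:.2f}, p = {p_gene:.3g}")
```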
Toward a perceptual image quality assessment of color quantized images
NASA Astrophysics Data System (ADS)
Frackiewicz, Mariusz; Palus, Henryk
2018-04-01
Color image quantization is an important operation in the field of color image processing. In this paper, we consider new perceptual image quality metrics for assessment of quantized images. These types of metrics, e.g., DSCSI, MDSIs, MDSIm, and HPSI, achieve the highest correlation coefficients with MOS during tests on six publicly available image databases. The research was limited to images distorted by two types of compression: JPG and JPG2K. Statistical analysis of the correlation coefficients based on the Friedman test and post-hoc procedures showed that the differences between the four new perceptual metrics are not statistically significant.
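For readers unfamiliar with the comparison procedure named here, the following Python sketch applies a Friedman test across metrics evaluated on the same set of databases, followed by a simple pairwise post-hoc step with Bonferroni correction. The scores and the choice of post-hoc test are hypothetical illustrations, not the paper's data or exact procedure.

```python
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

# Hypothetical correlation-with-MOS scores of four metrics on six image databases
# (rows = databases, columns = metrics, e.g. DSCSI, MDSIs, MDSIm, HPSI).
scores = np.array([
    [0.910, 0.931, 0.924, 0.945],
    [0.882, 0.905, 0.893, 0.912],
    [0.930, 0.942, 0.935, 0.951],
    [0.901, 0.923, 0.915, 0.928],
    [0.874, 0.896, 0.881, 0.903],
    [0.917, 0.934, 0.926, 0.944],
])

stat, p = friedmanchisquare(*scores.T)
print(f"Friedman chi2 = {stat:.2f}, p = {p:.4f}")

# A simple post-hoc: pairwise Wilcoxon signed-rank tests with Bonferroni correction.
n_metrics = scores.shape[1]
pairs = [(i, j) for i in range(n_metrics) for j in range(i + 1, n_metrics)]
for i, j in pairs:
    w, pw = wilcoxon(scores[:, i], scores[:, j])
    print(f"metric {i} vs {j}: adjusted p = {min(pw * len(pairs), 1.0):.3f}")
```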
Tolerancing aspheres based on manufacturing statistics
NASA Astrophysics Data System (ADS)
Wickenhagen, S.; Möhl, A.; Fuchs, U.
2017-11-01
A standard way of tolerancing optical elements or systems is to perform a Monte Carlo based analysis within a common optical design software package. Although different weightings and distributions are assumed, all of these approaches rely on statistics, which usually means several hundred or thousands of systems are needed for reliable results. Thus, employing these methods for small batch sizes is unreliable, especially when aspheric surfaces are involved. The extensive database of asphericon was used to investigate the correlation between the specified tolerance values and measured data sets. The resulting probability distributions of these measured data were analyzed with the aim of establishing a robust optical tolerancing process.
A Prototype System for Retrieval of Gene Functional Information
Folk, Lillian C.; Patrick, Timothy B.; Pattison, James S.; Wolfinger, Russell D.; Mitchell, Joyce A.
2003-01-01
Microarrays allow researchers to gather data about the expression patterns of thousands of genes simultaneously. Statistical analysis can reveal which genes show statistically significant results. Making biological sense of those results requires the retrieval of functional information about the genes thus identified, typically a manual gene-by-gene retrieval of information from various on-line databases. For experiments generating thousands of genes of interest, retrieval of functional information can become a significant bottleneck. To address this issue, we are currently developing a prototype system to automate the process of retrieval of functional information from multiple on-line sources. PMID:14728346
Fu, Wenjiang J.; Stromberg, Arnold J.; Viele, Kert; Carroll, Raymond J.; Wu, Guoyao
2009-01-01
Over the past two decades, there have been revolutionary developments in life science technologies characterized by high throughput, high efficiency, and rapid computation. Nutritionists now have advanced methodologies for the analysis of DNA, RNA, proteins, and low-molecular-weight metabolites, as well as access to bioinformatics databases. Statistics, which can be defined as the process of making scientific inferences from data that contain variability, has historically played an integral role in advancing nutritional sciences. Currently, in the era of systems biology, statistics has become an increasingly important tool to quantitatively analyze information about biological macromolecules. This article describes general terms used in statistical analysis of large, complex experimental data. These terms include experimental design, power analysis, sample size calculation, and experimental errors (type I and II errors) for nutritional studies at population, tissue, cellular, and molecular levels. In addition, we highlight various sources of experimental variation in studies involving microarray gene expression, real-time polymerase chain reaction, proteomics, and other bioinformatics technologies. Moreover, we provide guidelines for nutritionists and other biomedical scientists to plan and conduct studies and to analyze the complex data. Appropriate statistical analyses are expected to make an important contribution to solving major nutrition-associated problems in humans and animals (including obesity, diabetes, cardiovascular disease, cancer, ageing, and intrauterine growth retardation). PMID:20233650
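As a concrete illustration of the power analysis and sample-size calculation this article covers, the following Python sketch uses statsmodels for a two-sample t-test design; the effect size and error rates are illustrative choices, not values from the article.

```python
from statsmodels.stats.power import TTestIndPower

# Sample size per group for a two-sample t-test detecting a medium effect
# (Cohen's d = 0.5) with type I error 0.05 and power 0.80 (type II error 0.20).
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8,
                                   alternative='two-sided')
print(f"required sample size per group: {n_per_group:.1f}")

# Conversely, the power achieved with a fixed sample size of 30 per group.
power = analysis.power(effect_size=0.5, nobs1=30, alpha=0.05, ratio=1.0)
print(f"power with n = 30 per group: {power:.2f}")
```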
Yang, Li-Hua; Du, Shi-Zheng; Sun, Jin-Fang; Mei, Si-Juan; Wang, Xiao-Qing; Zhang, Yuan-Yuan
2014-01-01
Abstract Objectives: To assess the clinical evidence of auriculotherapy for constipation treatment and to identify the efficacy of groups using Semen vaccariae or magnetic pellets as taped objects in managing constipation. Methods: Databases were searched, including five English-language databases (the Cochrane Library, PubMed, Embase, CINAHL, and AMED) and four Chinese medical databases. Only randomized controlled trials were included in the review process. Critical appraisal was conducted using the Cochrane risk of bias tool. Results: Seventeen randomized, controlled trials (RCTs) met the inclusion criteria, of which 2 had low risk of bias. The primary outcome measures were the improvement rate and total effective rate. A meta-analysis of 15 RCTs showed a moderate, significant effect of auriculotherapy in managing constipation compared with controls (relative risk [RR], 2.06; 95% confidence interval [CI], 1.52– 2.79; p<0.00001). The 15 RCTs also showed a moderate, significant effect of auriculotherapy in relieving constipation (RR, 1.28; 95% CI, 1.13–1.44; p<0.0001). For other symptoms associated with constipation, such as abdominal distension or anorexia, results of the meta-analyses showed no statistical significance. Subgroup analysis revealed that use of S. vaccariae and use of magnetic pellets were both statistically favored over the control in relieving constipation. Conclusions: Current evidence illustrated that auriculotherapy, a relatively safe strategy, is probably beneficial in managing constipation. However, most of the eligible RCTs had a high risk of bias, and all were conducted in China. No definitive conclusion can be made because of cultural and geographic differences. Further rigorous RCTs from around the world are warranted to confirm the effect and safety of auriculotherapy for constipation. PMID:25020089
Yang, Li-Hua; Duan, Pei-Bei; Du, Shi-Zheng; Sun, Jin-Fang; Mei, Si-Juan; Wang, Xiao-Qing; Zhang, Yuan-Yuan
2014-08-01
To assess the clinical evidence of auriculotherapy for constipation treatment and to identify the efficacy of groups using Semen vaccariae or magnetic pellets as taped objects in managing constipation. Databases were searched, including five English-language databases (the Cochrane Library, PubMed, Embase, CINAHL, and AMED) and four Chinese medical databases. Only randomized controlled trials were included in the review process. Critical appraisal was conducted using the Cochrane risk of bias tool. Seventeen randomized, controlled trials (RCTs) met the inclusion criteria, of which 2 had low risk of bias. The primary outcome measures were the improvement rate and total effective rate. A meta-analysis of 15 RCTs showed a moderate, significant effect of auriculotherapy in managing constipation compared with controls (relative risk [RR], 2.06; 95% confidence interval [CI], 1.52- 2.79; p<0.00001). The 15 RCTs also showed a moderate, significant effect of auriculotherapy in relieving constipation (RR, 1.28; 95% CI, 1.13-1.44; p<0.0001). For other symptoms associated with constipation, such as abdominal distension or anorexia, results of the meta-analyses showed no statistical significance. Subgroup analysis revealed that use of S. vaccariae and use of magnetic pellets were both statistically favored over the control in relieving constipation. Current evidence illustrated that auriculotherapy, a relatively safe strategy, is probably beneficial in managing constipation. However, most of the eligible RCTs had a high risk of bias, and all were conducted in China. No definitive conclusion can be made because of cultural and geographic differences. Further rigorous RCTs from around the world are warranted to confirm the effect and safety of auriculotherapy for constipation.
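The pooled relative risks reported in these two records come from standard inverse-variance meta-analysis. The following Python sketch shows fixed-effect pooling on the log relative-risk scale; the per-trial numbers are hypothetical and the review itself used dedicated meta-analysis software.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical per-trial relative risks with 95% CIs (not the actual trial data).
rr    = np.array([1.9, 2.3, 1.6, 2.8])
ci_lo = np.array([1.2, 1.4, 1.0, 1.5])
ci_hi = np.array([3.0, 3.8, 2.6, 5.2])

# Work on the log scale; recover each trial's standard error from its CI width.
log_rr = np.log(rr)
se = (np.log(ci_hi) - np.log(ci_lo)) / (2 * 1.96)

# Fixed-effect (inverse-variance) pooling.
w = 1.0 / se**2
pooled = np.sum(w * log_rr) / np.sum(w)
pooled_se = np.sqrt(1.0 / np.sum(w))
z = pooled / pooled_se
p = 2 * norm.sf(abs(z))

print(f"pooled RR = {np.exp(pooled):.2f} "
      f"(95% CI {np.exp(pooled - 1.96 * pooled_se):.2f}-"
      f"{np.exp(pooled + 1.96 * pooled_se):.2f}), p = {p:.2g}")
```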
Weaver, J. Curtis; Feaster, Toby D.; Gotvald, Anthony J.
2009-01-01
Reliable estimates of the magnitude and frequency of floods are required for the economical and safe design of transportation and water-conveyance structures. A multistate approach was used to update methods for estimating the magnitude and frequency of floods in rural, ungaged basins in North Carolina, South Carolina, and Georgia that are not substantially affected by regulation, tidal fluctuations, or urban development. In North Carolina, annual peak-flow data available through September 2006 were available for 584 sites; 402 of these sites had a total of 10 or more years of systematic record that is required for at-site, flood-frequency analysis. Following data reviews and the computation of 20 physical and climatic basin characteristics for each station as well as at-site flood-frequency statistics, annual peak-flow data were identified for 363 sites in North Carolina suitable for use in this analysis. Among these 363 sites, 19 sites had records that could be divided into unregulated and regulated/ channelized annual peak discharges, which means peak-flow records were identified for a total of 382 cases in North Carolina. Considering the 382 cases, at-site flood-frequency statistics are provided for 333 unregulated cases (also used for the regression database) and 49 regulated/channelized cases. The flood-frequency statistics for the 333 unregulated sites were combined with data for sites from South Carolina, Georgia, and adjacent parts of Alabama, Florida, Tennessee, and Virginia to create a database of 943 sites considered for use in the regional regression analysis. Flood-frequency statistics were computed by fitting logarithms (base 10) of the annual peak flows to a log-Pearson Type III distribution. As part of the computation process, a new generalized skew coefficient was developed by using a Bayesian generalized least-squares regression model. Exploratory regression analyses using ordinary least-squares regression completed on the initial database of 943 sites resulted in defining five hydrologic regions for North Carolina, South Carolina, and Georgia. Stations with drainage areas less than 1 square mile were removed from the database, and a procedure to examine for basin redundancy (based on drainage area and periods of record) also resulted in the removal of some stations from the regression database. Flood-frequency estimates and basin characteristics for 828 gaged stations were combined to form the final database that was used in the regional regression analysis. Regional regression analysis, using generalized least-squares regression, was used to develop a set of predictive equations that can be used for estimating the 50-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent chance exceedance flows for rural ungaged, basins in North Carolina, South Carolina, and Georgia. The final predictive equations are all functions of drainage area and the percentage of drainage basin within each of the five hydrologic regions. Average errors of prediction for these regression equations range from 34.0 to 47.7 percent. Discharge estimates determined from the systematic records for the current study are, on average, larger in magnitude than those from a previous study for the highest percent chance exceedances (50 and 20 percent) and tend to be smaller than those from the previous study for the lower percent chance exceedances when all sites are considered as a group. 
For example, mean differences for sites in the Piedmont hydrologic region range from positive 0.5 percent for the 50-percent chance exceedance flow to negative 4.6 percent for the 0.2-percent chance exceedance flow when stations are grouped by hydrologic region. Similarly for the same hydrologic region, median differences range from positive 0.9 percent for the 50-percent chance exceedance flow to negative 7.1 percent for the 0.2-percent chance exceedance flow. However, mean and median percentage differences between the estimates from the previous and curre
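As an editorial illustration of the flood-frequency fitting described in this record, the following Python sketch fits a log-Pearson Type III distribution to annual peak flows by the method of moments and reads off percent-chance-exceedance flows. The peaks are synthetic, and the sketch omits the regional-skew weighting and low-outlier adjustments used in the study.

```python
import numpy as np
from scipy import stats

# Synthetic annual peak flows (cfs) for one gaged site.
rng = np.random.default_rng(0)
peaks = rng.lognormal(mean=7.5, sigma=0.6, size=40)

# Fit log-Pearson Type III by the method of moments on log10 of the peaks.
logq = np.log10(peaks)
mean, std = logq.mean(), logq.std(ddof=1)
skew = stats.skew(logq, bias=False)

# Flows with 50-, 10-, 1-, and 0.2-percent annual exceedance probability.
for p_exceed in [0.50, 0.10, 0.01, 0.002]:
    q_log = stats.pearson3.ppf(1.0 - p_exceed, skew, loc=mean, scale=std)
    print(f"{p_exceed * 100:>5.1f}% chance exceedance flow: {10 ** q_log:,.0f} cfs")
```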
Biermann, Martin
2014-04-01
Clinical trials aiming for regulatory approval of a therapeutic agent must be conducted according to Good Clinical Practice (GCP). Clinical Data Management Systems (CDMS) are specialized software solutions geared toward GCP trials. They are, however, less suited for data management in small non-GCP research projects. For use in researcher-initiated non-GCP studies, we developed a client-server database application based on the public domain CakePHP framework. The underlying MySQL database uses a simple data model based on only five data tables. The graphical user interface can be run in any web browser inside the hospital network. Data are validated upon entry. Data contained in external database systems can be imported interactively. Data are automatically anonymized on import, with the key lists identifying the subjects logged to a restricted part of the database. Data analysis is performed by separate statistics and analysis software connecting to the database via a generic Open Database Connectivity (ODBC) interface. Since its first pilot implementation in 2011, the solution has been applied to seven different clinical research projects covering different clinical problems in different organ systems such as cancer of the thyroid and the prostate glands. This paper shows how the adoption of a generic web application framework is a feasible, flexible, low-cost, and user-friendly way of managing multidimensional research data in researcher-initiated non-GCP clinical projects. Copyright © 2014 The Authors. Published by Elsevier Ireland Ltd. All rights reserved.
iRefWeb: interactive analysis of consolidated protein interaction data and their supporting evidence
Turner, Brian; Razick, Sabry; Turinsky, Andrei L.; Vlasblom, James; Crowdy, Edgard K.; Cho, Emerson; Morrison, Kyle; Wodak, Shoshana J.
2010-01-01
We present iRefWeb, a web interface to protein interaction data consolidated from 10 public databases: BIND, BioGRID, CORUM, DIP, IntAct, HPRD, MINT, MPact, MPPI and OPHID. iRefWeb enables users to examine aggregated interactions for a protein of interest, and presents various statistical summaries of the data across databases, such as the number of organism-specific interactions, proteins and cited publications. Through links to source databases and supporting evidence, researchers may gauge the reliability of an interaction using simple criteria, such as the detection methods, the scale of the study (high- or low-throughput) or the number of cited publications. Furthermore, iRefWeb compares the information extracted from the same publication by different databases, and offers means to follow-up possible inconsistencies. We provide an overview of the consolidated protein–protein interaction landscape and show how it can be automatically cropped to aid the generation of meaningful organism-specific interactomes. iRefWeb can be accessed at: http://wodaklab.org/iRefWeb. Database URL: http://wodaklab.org/iRefWeb/ PMID:20940177
The EXOSAT database and archive
NASA Technical Reports Server (NTRS)
Reynolds, A. P.; Parmar, A. N.
1992-01-01
The EXOSAT database provides on-line access to the results and data products (spectra, images, and lightcurves) from the EXOSAT mission as well as access to data and logs from a number of other missions (such as EINSTEIN, COS-B, ROSAT, and IRAS). In addition, a number of familiar optical, infrared, and x ray catalogs, including the Hubble Space Telescope (HST) guide star catalog are available. The complete database is located at the EXOSAT observatory at ESTEC in the Netherlands and is accessible remotely via a captive account. The database management system was specifically developed to efficiently access the database and to allow the user to perform statistical studies on large samples of astronomical objects as well as to retrieve scientific and bibliographic information on single sources. The system was designed to be mission independent and includes timing, image processing, and spectral analysis packages as well as software to allow the easy transfer of analysis results and products to the user's own institute. The archive at ESTEC comprises a subset of the EXOSAT observations, stored on magnetic tape. Observations of particular interest were copied in compressed format to an optical jukebox, allowing users to retrieve and analyze selected raw data entirely from their terminals. Such analysis may be necessary if the user's needs are not accommodated by the products contained in the database (in terms of time resolution, spectral range, and the finesse of the background subtraction, for instance). Long-term archiving of the full final observation data is taking place at ESRIN in Italy as part of the ESIS program, again using optical media, and ESRIN have now assumed responsibility for distributing the data to the community. Tests showed that raw observational data (typically several tens of megabytes for a single target) can be transferred via the existing networks in reasonable time.
The MIND PALACE: A Multi-Spectral Imaging and Spectroscopy Database for Planetary Science
NASA Astrophysics Data System (ADS)
Eshelman, E.; Doloboff, I.; Hara, E. K.; Uckert, K.; Sapers, H. M.; Abbey, W.; Beegle, L. W.; Bhartia, R.
2017-12-01
The Multi-Instrument Database (MIND) is the web-based home to a well-characterized set of analytical data collected by a suite of deep-UV fluorescence/Raman instruments built at the Jet Propulsion Laboratory (JPL). Samples derive from a growing body of planetary surface analogs, mineral and microbial standards, meteorites, spacecraft materials, and other astrobiologically relevant materials. In addition to deep-UV spectroscopy, datasets stored in MIND are obtained from a variety of analytical techniques obtained over multiple spatial and spectral scales including electron microscopy, optical microscopy, infrared spectroscopy, X-ray fluorescence, and direct fluorescence imaging. Multivariate statistical analysis techniques, primarily Principal Component Analysis (PCA), are used to guide interpretation of these large multi-analytical spectral datasets. Spatial co-referencing of integrated spectral/visual maps is performed using QGIS (geographic information system software). Georeferencing techniques transform individual instrument data maps into a layered co-registered data cube for analysis across spectral and spatial scales. The body of data in MIND is intended to serve as a permanent, reliable, and expanding database of deep-UV spectroscopy datasets generated by this unique suite of JPL-based instruments on samples of broad planetary science interest.
Richardson, J; Smith, J E; McCall, G; Richardson, A; Pilkington, K; Kirsch, I
2007-09-01
To systematically review the research evidence on the effectiveness of hypnosis for cancer chemotherapy-induced nausea and vomiting (CINV). A comprehensive search of major biomedical databases including MEDLINE, EMBASE, ClNAHL, PsycINFO and the Cochrane Library was conducted. Specialist complementary and alternative medicine databases were searched and efforts were made to identify unpublished and ongoing research. Citations were included from the databases' inception to March 2005. Randomized controlled trials (RCTs) were appraised and meta-analysis undertaken. Clinical commentaries were obtained. Six RCTs evaluating the effectiveness of hypnosis in CINV were found. In five of these studies the participants were children. Studies report positive results including statistically significant reductions in anticipatory and CINV. Meta-analysis revealed a large effect size of hypnotic treatment when compared with treatment as usual, and the effect was at least as large as that of cognitive-behavioural therapy. Meta-analysis has demonstrated that hypnosis could be a clinically valuable intervention for anticipatory and CINV in children with cancer. Further research into the effectiveness, acceptance and feasibility of hypnosis in CINV, particularly in adults, is suggested. Future studies should assess suggestibility and provide full details of the hypnotic intervention.
Analysis of model development strategies: predicting ventral hernia recurrence.
Holihan, Julie L; Li, Linda T; Askenasy, Erik P; Greenberg, Jacob A; Keith, Jerrod N; Martindale, Robert G; Roth, J Scott; Liang, Mike K
2016-11-01
There have been many attempts to identify variables associated with ventral hernia recurrence; however, it is unclear which statistical modeling approach results in models with greatest internal and external validity. We aim to assess the predictive accuracy of models developed using five common variable selection strategies to determine variables associated with hernia recurrence. Two multicenter ventral hernia databases were used. Database 1 was randomly split into "development" and "internal validation" cohorts. Database 2 was designated "external validation". The dependent variable for model development was hernia recurrence. Five variable selection strategies were used: (1) "clinical"-variables considered clinically relevant, (2) "selective stepwise"-all variables with a P value <0.20 were assessed in a step-backward model, (3) "liberal stepwise"-all variables were included and step-backward regression was performed, (4) "restrictive internal resampling," and (5) "liberal internal resampling." Variables were included with P < 0.05 for the Restrictive model and P < 0.10 for the Liberal model. A time-to-event analysis using Cox regression was performed using these strategies. The predictive accuracy of the developed models was tested on the internal and external validation cohorts using Harrell's C-statistic where C > 0.70 was considered "reasonable". The recurrence rate was 32.9% (n = 173/526; median/range follow-up, 20/1-58 mo) for the development cohort, 36.0% (n = 95/264, median/range follow-up 20/1-61 mo) for the internal validation cohort, and 12.7% (n = 155/1224, median/range follow-up 9/1-50 mo) for the external validation cohort. Internal validation demonstrated reasonable predictive accuracy (C-statistics = 0.772, 0.760, 0.767, 0.757, 0.763), while on external validation, predictive accuracy dipped precipitously (C-statistic = 0.561, 0.557, 0.562, 0.553, 0.560). Predictive accuracy was equally adequate on internal validation among models; however, on external validation, all five models failed to demonstrate utility. Future studies should report multiple variable selection techniques and demonstrate predictive accuracy on external data sets for model validation. Copyright © 2016 Elsevier Inc. All rights reserved.
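To illustrate the development/validation pattern this study evaluates, the following Python sketch fits a Cox proportional-hazards model on a development cohort and computes Harrell's C-statistic on a held-out cohort using the lifelines package. The predictors, cohorts, and simulated data are hypothetical and do not reproduce any of the five variable selection strategies.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.utils import concordance_index

# Hypothetical cohorts: follow-up time (months), recurrence indicator, two candidate predictors.
rng = np.random.default_rng(1)

def make_cohort(n):
    bmi = rng.normal(30, 5, n)
    wound_class = rng.integers(1, 4, n)
    hazard = 0.02 * np.exp(0.05 * (bmi - 30) + 0.4 * (wound_class - 1))
    time = np.minimum(rng.exponential(1.0 / hazard), 60.0)   # administrative censoring at 60 mo
    event = (time < 60.0).astype(int)
    return pd.DataFrame({"time": time, "recurrence": event,
                         "bmi": bmi, "wound_class": wound_class})

dev, val = make_cohort(500), make_cohort(250)

cph = CoxPHFitter()
cph.fit(dev, duration_col="time", event_col="recurrence")

# Harrell's C on the validation cohort; higher risk should mean shorter time to recurrence,
# hence the negated partial hazard passed as the predicted score.
risk = cph.predict_partial_hazard(val)
c = concordance_index(val["time"], -risk, val["recurrence"])
print(f"validation C-statistic: {c:.3f}")
```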
Does an Otolaryngology-Specific Database Have Added Value? A Comparative Feasibility Analysis.
Bellmunt, Angela M; Roberts, Rhonda; Lee, Walter T; Schulz, Kris; Pynnonen, Melissa A; Crowson, Matthew G; Witsell, David; Parham, Kourosh; Langman, Alan; Vambutas, Andrea; Ryan, Sheila E; Shin, Jennifer J
2016-07-01
There are multiple nationally representative databases that support epidemiologic and outcomes research, and it is unknown whether an otolaryngology-specific resource would prove indispensable or superfluous. Therefore, our objective was to determine the feasibility of analyses in the National Ambulatory Medical Care Survey (NAMCS) and National Hospital Ambulatory Medical Care Survey (NHAMCS) databases as compared with the otolaryngology-specific Creating Healthcare Excellence through Education and Research (CHEER) database. Parallel analyses in 2 data sets. Ambulatory visits in the United States. To test a fixed hypothesis that could be directly compared between data sets, we focused on a condition with expected prevalence high enough to substantiate availability in both. This query also encompassed a broad span of diagnoses to sample the breadth of available information. Specifically, we compared an assessment of suspected risk factors for sensorineural hearing loss in subjects 0 to 21 years of age, according to a predetermined protocol. We also assessed the feasibility of 6 additional diagnostic queries among all age groups. In the NAMCS/NHAMCS data set, the number of measured observations was not sufficient to support reliable numeric conclusions (percentage standard error among risk factors: 38.6-92.1). Analysis of the CHEER database demonstrated that age, sex, meningitis, and cytomegalovirus were statistically significant factors associated with pediatric sensorineural hearing loss (P < .01). Among the 6 additional diagnostic queries assessed, NAMCS/NHAMCS usage was also infeasible; the CHEER database contained 1585 to 212,521 more observations per annum. An otolaryngology-specific database has added utility when compared with already available national ambulatory databases. © American Academy of Otolaryngology—Head and Neck Surgery Foundation 2016.
NASA Astrophysics Data System (ADS)
Stanley, H. E.; Gabaix, Xavier; Gopikrishnan, Parameswaran; Plerou, Vasiliki
2007-08-01
One challenge of economics is that the systems treated by these sciences have no perfect metronome in time and no perfect spatial architecture-crystalline or otherwise. Nonetheless, as if by magic, out of nothing but randomness one finds remarkably fine-tuned processes in time. We present an overview of recent research joining practitioners of economic theory and statistical physics to try to better understand puzzles regarding economic fluctuations. One of these puzzles is how to describe outliers, phenomena that lie outside of patterns of statistical regularity. We review evidence consistent with the possibility that such outliers may not exist. This possibility is supported by recent analysis of databases containing information about each trade of every stock.
Mars Pathfinder Near-Field Rock Distribution Re-Evaluation
NASA Technical Reports Server (NTRS)
Haldemann, A. F. C.; Golombek, M. P.
2003-01-01
We have completed analysis of a new near-field rock count at the Mars Pathfinder landing site and determined that the previously published rock count, which suggested 16% cumulative fractional area (CFA) covered by rocks, is incorrect. The earlier value is not so much wrong (our new CFA is 20%) as right for the wrong reason: both the old and the new CFAs are consistent with remote-sensing data; however, the earlier determination incorrectly calculated rock coverage using apparent width rather than average diameter. Here we present details of the new rock database and the new statistics, as well as the importance of using rock average diameter for rock population statistics. The changes to the near-field data do not affect the far-field rock statistics.
A biomedical information system for retrieval and manipulation of NHANES data.
Mukherjee, Sukrit; Martins, David; Norris, Keith C; Jenders, Robert A
2013-01-01
The retrieval and manipulation of data from large public databases like the U.S. National Health and Nutrition Examination Survey (NHANES) may require sophisticated statistical software and significant expertise that may be unavailable in the university setting. In response, we have developed the Data Retrieval And Manipulation System (DReAMS), an automated information system to handle all processes of data extraction and cleaning and then joining different subsets to produce analysis-ready output. The system is a browser-based data warehouse application in which the input data from flat files or operational systems are aggregated in a structured way so that the desired data can be read, recoded, queried and extracted efficiently. The current pilot implementation of the system provides access to a limited amount of NHANES database. We plan to increase the amount of data available through the system in the near future and to extend the techniques to other large databases from CDU archive with a current holding of about 53 databases.
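The extraction-and-join step such a system automates can be sketched with pandas, merging NHANES component tables on the respondent sequence number (SEQN). The tiny in-memory tables below are illustrative stand-ins; this is not the DReAMS implementation.

```python
import pandas as pd

# In practice, NHANES components are SAS transport (.XPT) files keyed by SEQN and would be
# loaded with pd.read_sas(..., format="xport"); small in-memory stand-ins keep this runnable.
demo = pd.DataFrame({"SEQN": [1, 2, 3], "RIDAGEYR": [34, 61, 47], "RIAGENDR": [1, 2, 2]})
bpx  = pd.DataFrame({"SEQN": [1, 2, 3], "BPXSY1": [118.0, None, 134.0]})

# Join the subsets into one analysis-ready table and recode column names.
merged = demo.merge(bpx, on="SEQN", how="inner")
subset = merged.rename(columns={"RIDAGEYR": "age", "RIAGENDR": "sex", "BPXSY1": "systolic_bp"})

# Drop records missing the measurement of interest before export or analysis.
subset = subset.dropna(subset=["systolic_bp"])
print(subset)
```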
Kamitsuji, Shigeo; Matsuda, Takashi; Nishimura, Koichi; Endo, Seiko; Wada, Chisa; Watanabe, Kenji; Hasegawa, Koichi; Hishigaki, Haretsugu; Masuda, Masatoshi; Kuwahara, Yusuke; Tsuritani, Katsuki; Sugiura, Kenkichi; Kubota, Tomoko; Miyoshi, Shinji; Okada, Kinya; Nakazono, Kazuyuki; Sugaya, Yuki; Yang, Woosung; Sawamoto, Taiji; Uchida, Wataru; Shinagawa, Akira; Fujiwara, Tsutomu; Yamada, Hisaharu; Suematsu, Koji; Tsutsui, Naohisa; Kamatani, Naoyuki; Liou, Shyh-Yuh
2015-06-01
Japan Pharmacogenomics Data Science Consortium (JPDSC) has assembled a database for conducting pharmacogenomics (PGx) studies in Japanese subjects. The database contains the genotypes of 2.5 million single-nucleotide polymorphisms (SNPs) and 5 human leukocyte antigen loci from 2994 Japanese healthy volunteers, as well as 121 kinds of clinical information, including self-reports, physiological data, hematological data and biochemical data. In this article, the reliability of our data was evaluated by principal component analysis (PCA) and association analysis for hematological and biochemical traits by using genome-wide SNP data. PCA of the SNPs showed that all the samples were collected from the Japanese population and that the samples were separated into two major clusters by birthplace, Okinawa and other than Okinawa, as had been previously reported. Among 87 SNPs that have been reported to be associated with 18 hematological and biochemical traits in genome-wide association studies (GWAS), the associations of 56 SNPs were replicated using our data base. Statistical power simulations showed that the sample size of the JPDSC control database is large enough to detect genetic markers having a relatively strong association even when the case sample size is small. The JPDSC database will be useful as control data for conducting PGx studies to explore genetic markers to improve the safety and efficacy of drugs either during clinical development or in post-marketing.
[Effect of vinegar-processed Curcumae Rhizoma on bile metabolism in rats].
Gu, Wei; Lu, Tu-Lin; Li, Jin-Ci; Wang, Qiao-Han; Pan, Zi-Hao; Ji, De; Li, Lin; Zhang, Ji; Mao, Chun-Qin
2016-04-01
To explore the effect of vinegar-processed Curcumae Rhizoma on endogenous metabolites in bile by comparing the bile metabolite profiles before and after Curcumae Rhizoma was processed with vinegar. Alcohol extracts of crude and vinegar-processed Curcumae Rhizoma, as well as normal saline, were prepared and given to rats by intragastric administration; 0.5 h afterwards, common bile duct intubation and drainage were conducted to collect the rats' bile for 12 h. UPLC-TOF-MS analysis of the bile samples was performed after 1:3 acetonitrile protein precipitation; univariate statistics were combined with multivariate statistics, and PeakView software together with online database comparison was used to identify the potential biomarkers. Vinegar-processed Curcumae Rhizoma extracts had significant effects on the bile metabolite spectrum of the rats. With a threshold of P<0.05, 13 metabolites with significant differences were found between the bile of the crude and vinegar-processed Curcumae Rhizoma groups, and 8 of them were identified against the online database. Univariate t-test analysis between the administration groups and the blank group yielded 7 metabolites with significant differences, which were identified as potential biomarkers. Six of the potential biomarkers were up-regulated in the vinegar-processed group and were related to the regulation of phospholipid metabolism, fat metabolism, bile acid metabolism, and the balance of N-acylethanolamine hydrolysis, indicating the mechanism by which vinegar-processed Curcumae Rhizoma affects endogenous metabolites in rat bile. Copyright© by the Chinese Pharmaceutical Association.
RAId_DbS: Peptide Identification using Database Searches with Realistic Statistics
Alves, Gelio; Ogurtsov, Aleksey Y; Yu, Yi-Kuo
2007-01-01
Background The key to mass-spectrometry-based proteomics is peptide identification. A major challenge in peptide identification is to obtain realistic E-values when assigning statistical significance to candidate peptides. Results Using a simple scoring scheme, we propose a database search method with theoretically characterized statistics. Taking into account possible skewness in the random variable distribution and the effect of finite sampling, we provide a theoretical derivation for the tail of the score distribution. For every experimental spectrum examined, we collect the scores of peptides in the database, and find good agreement between the collected score statistics and our theoretical distribution. Using Student's t-tests, we quantify the degree of agreement between the theoretical distribution and the score statistics collected. The T-tests may be used to measure the reliability of reported statistics. When combined with reported P-value for a peptide hit using a score distribution model, this new measure prevents exaggerated statistics. Another feature of RAId_DbS is its capability of detecting multiple co-eluted peptides. The peptide identification performance and statistical accuracy of RAId_DbS are assessed and compared with several other search tools. The executables and data related to RAId_DbS are freely available upon request. PMID:17961253
Statistical analysis of DOE EML QAP data from 1982 to 1998.
Mizanur Rahman, G M; Isenhour, T L; Larget, B; Greenlaw, P D
2001-01-01
The historical database from the Environmental Measurements Laboratory's Quality Assessment Program from 1982 to 1998 has been analyzed to determine control limits for future performance evaluations of the different laboratories contracted to the U.S. Department of Energy. Seventy-three radionuclides in four different matrices (air filter, soil, vegetation, and water) were analyzed. The evaluation criteria were established based on a z-score calculation.
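The z-score evaluation used to set such control limits can be sketched in a few lines of Python. The numbers and the ±2/±3 acceptance bands below follow common proficiency-testing conventions and are illustrative; they are not the specific EML QAP criteria.

```python
# z-score evaluation of a laboratory result against a reference value,
# as commonly used in quality-assessment programs (numbers are illustrative).
reported = 12.7      # laboratory's reported activity (e.g., Bq/kg)
reference = 11.9     # known or consensus value for the QAP sample
sigma = 0.6          # target standard deviation for the matrix/nuclide

z = (reported - reference) / sigma

if abs(z) <= 2:
    rating = "acceptable"
elif abs(z) <= 3:
    rating = "warning"
else:
    rating = "not acceptable"

print(f"z = {z:.2f} -> {rating}")
```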
GeneTools--application for functional annotation and statistical hypothesis testing.
Beisvag, Vidar; Jünge, Frode K R; Bergum, Hallgeir; Jølsum, Lars; Lydersen, Stian; Günther, Clara-Cecilie; Ramampiaro, Heri; Langaas, Mette; Sandvik, Arne K; Laegreid, Astrid
2006-10-24
Modern biology has shifted from "one gene" approaches to methods for genomic-scale analysis like microarray technology, which allow simultaneous measurement of thousands of genes. This has created a need for tools facilitating interpretation of biological data in "batch" mode. However, such tools often leave the investigator with large volumes of apparently unorganized information. To meet this interpretation challenge, gene-set, or cluster testing has become a popular analytical tool. Many gene-set testing methods and software packages are now available, most of which use a variety of statistical tests to assess the genes in a set for biological information. However, the field is still evolving, and there is a great need for "integrated" solutions. GeneTools is a web-service providing access to a database that brings together information from a broad range of resources. The annotation data are updated weekly, guaranteeing that users get data most recently available. Data submitted by the user are stored in the database, where it can easily be updated, shared between users and exported in various formats. GeneTools provides three different tools: i) NMC Annotation Tool, which offers annotations from several databases like UniGene, Entrez Gene, SwissProt and GeneOntology, in both single- and batch search mode. ii) GO Annotator Tool, where users can add new gene ontology (GO) annotations to genes of interest. These user defined GO annotations can be used in further analysis or exported for public distribution. iii) eGOn, a tool for visualization and statistical hypothesis testing of GO category representation. As the first GO tool, eGOn supports hypothesis testing for three different situations (master-target situation, mutually exclusive target-target situation and intersecting target-target situation). An important additional function is an evidence-code filter that allows users, to select the GO annotations for the analysis. GeneTools is the first "all in one" annotation tool, providing users with a rapid extraction of highly relevant gene annotation data for e.g. thousands of genes or clones at once. It allows a user to define and archive new GO annotations and it supports hypothesis testing related to GO category representations. GeneTools is freely available through www.genetools.no
Bosma, Laine; Balen, Robert M; Davidson, Erin; Jewesson, Peter J
2003-01-01
The development and integration of a personal digital assistant (PDA)-based point-of-care database into an intravenous resource nurse (IVRN) consultation service for the purposes of consultation management and service characterization are described. The IVRN team provides a consultation service 7 days a week in this 1000-bed tertiary adult care teaching hospital. No simple, reliable method for documenting IVRN patient care activity and facilitating IVRN-initiated patient follow-up evaluation was available. Implementation of a PDA database with exportability of data to statistical analysis software was undertaken in July 2001. A Palm IIIXE PDA was purchased and a three-table, 13-field database was developed using HanDBase software. During the 7-month period of data collection, the IVRN team recorded 4868 consultations for 40 patient care areas. Full analysis of service characteristics was conducted using SPSS 10.0 software. Team members adopted the new technology with few problems, and the authors now can efficiently track and analyze the services provided by their IVRN team.
Interactive Multi-Instrument Database of Solar Flares (IMIDSF)
NASA Astrophysics Data System (ADS)
Sadykov, Viacheslav M.; Nita, Gelu M.; Oria, Vincent; Kosovichev, Alexander G.
2017-08-01
Solar flares represent a complicated physical phenomenon observed in a broad range of the electromagnetic spectrum, from radiowaves to gamma-rays. For a complete understanding of the flares it is necessary to perform a combined multi-wavelength analysis using observations from many satellites and ground-based observatories. For efficient data search, integration of different flare lists and representation of observational data, we have developed the Interactive Multi-Instrument Database of Solar Flares (https://solarflare.njit.edu/). The web database is fully functional and allows the user to search for uniquely-identified flare events based on their physical descriptors and availability of observations of a particular set of instruments. Currently, data from three primary flare lists (GOES, RHESSI and HEK) and a variety of other event catalogs (Hinode, Fermi GBM, Konus-Wind, OVSA flare catalogs, CACTus CME catalog, Filament eruption catalog) and observing logs (IRIS and Nobeyama coverage), are integrated. An additional set of physical descriptors (temperature and emission measure) along with observing summary, data links and multi-wavelength light curves is provided for each flare event since January 2002. Results of an initial statistical analysis will be presented.
Data exploration systems for databases
NASA Technical Reports Server (NTRS)
Greene, Richard J.; Hield, Christopher
1992-01-01
Data exploration systems apply machine learning techniques, multivariate statistical methods, information theory, and database theory to databases to identify significant relationships among the data and summarize information. The result of applying data exploration systems should be a better understanding of the structure of the data and a perspective of the data enabling an analyst to form hypotheses for interpreting the data. This paper argues that data exploration systems need a minimum amount of domain knowledge to guide both the statistical strategy and the interpretation of the resulting patterns discovered by these systems.
Statistical properties of DNA sequences
NASA Technical Reports Server (NTRS)
Peng, C. K.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Simons, M.; Stanley, H. E.
1995-01-01
We review evidence supporting the idea that the DNA sequence in genes containing non-coding regions is correlated, and that the correlation is remarkably long range--indeed, nucleotides thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the "non-stationarity" feature of the sequence of base pairs by applying a new algorithm called detrended fluctuation analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and non-coding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to every DNA sequence (33301 coding and 29453 non-coding) in the entire GenBank database. Finally, we describe briefly some recent work showing that the non-coding sequences have certain statistical features in common with natural and artificial languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts. These statistical properties of non-coding sequences support the possibility that non-coding regions of DNA may carry biological information.
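For readers unfamiliar with the DFA algorithm named in this abstract, the following minimal Python implementation computes the fluctuation function and scaling exponent for a numeric series (here a random ±1 "DNA walk"); the nucleotide mapping and window range are simplified relative to the paper.

```python
import numpy as np

def dfa(x, window_sizes):
    """Detrended fluctuation analysis: returns F(n) for each window size n."""
    y = np.cumsum(x - np.mean(x))            # integrated (profile) series
    fluctuations = []
    for n in window_sizes:
        n_windows = len(y) // n
        f2 = []
        for k in range(n_windows):
            seg = y[k * n:(k + 1) * n]
            t = np.arange(n)
            coeffs = np.polyfit(t, seg, 1)    # local linear trend
            f2.append(np.mean((seg - np.polyval(coeffs, t)) ** 2))
        fluctuations.append(np.sqrt(np.mean(f2)))
    return np.array(fluctuations)

# Example: a random +/-1 walk (e.g., purine = +1, pyrimidine = -1).
rng = np.random.default_rng(0)
walk = rng.choice([-1, 1], size=10000)

ns = np.array([8, 16, 32, 64, 128, 256])
F = dfa(walk, ns)

# The scaling exponent alpha is the slope of log F(n) versus log n
# (alpha ~ 0.5 for uncorrelated sequences, > 0.5 for long-range correlations).
alpha = np.polyfit(np.log(ns), np.log(F), 1)[0]
print(f"alpha = {alpha:.2f}")
```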
A Review of Calibration Transfer Practices and Instrument Differences in Spectroscopy.
Workman, Jerome J
2018-03-01
Calibration transfer for use with spectroscopic instruments, particularly for near-infrared, infrared, and Raman analysis, has been the subject of multiple articles, research papers, book chapters, and technical reviews. There has been a myriad of approaches published and claims made for resolving the problems associated with transferring calibrations; however, the capability of attaining identical results over time from two or more instruments using an identical calibration still eludes technologists. Calibration transfer, in a precise definition, refers to a series of analytical approaches or chemometric techniques used to attempt to apply a single spectral database, and the calibration model developed using that database, for two or more instruments, with statistically retained accuracy and precision. Ideally, one would develop a single calibration for any particular application, and move it indiscriminately across instruments and achieve identical analysis or prediction results. There are many technical aspects involved in such precision calibration transfer, related to the measuring instrument reproducibility and repeatability, the reference chemical values used for the calibration, the multivariate mathematics used for calibration, and sample presentation repeatability and reproducibility. Ideally, a multivariate model developed on a single instrument would provide a statistically identical analysis when used on other instruments following transfer. This paper reviews common calibration transfer techniques, mostly related to instrument differences, and the mathematics of the uncertainty between instruments when making spectroscopic measurements of identical samples. It does not specifically address calibration maintenance or reference laboratory differences.
Sun, Sam Z; Empie, Mark W
2007-08-01
The relationship between obesity risk and sugar-sweetened beverage (SSB) consumption was examined together with multiple lifestyle factors. Statistical analysis was performed using the population dietary survey databases of USDA CSFII 1989-1991, CSFII 1994-1996, CDC NHANES III, and combined NHANES 1999-2002. In total, 38,409 individuals, ages 20-74 years, with accompanying data on dietary intake, lifestyle factors, and anthropometrics were included in the descriptive statistics and risk analysis. Analytical results indicate that obesity risk was significantly and positively associated with gender, age, daily TV/screen watching hours, and dietary fat content, and negatively associated with smoking habit, education, and physical activity; obesity risk was not significantly associated with SSB consumption pattern, dietary saturated fat content, or total calorie intake. No elevated BMI values or increased obesity rates were observed in populations frequently consuming SSB compared to populations infrequently consuming SSB. Additionally, one-day food consumption data were found to overestimate SSB usual intake by up to 38.9% compared with data from multiple survey days. Multiple lifestyle factors and higher dietary fat intake were significantly associated with obesity risk. Populations that frequently consumed SSB, primarily HFCS-sweetened beverages, did not have a higher obesity rate or increased obesity risk than populations that consumed SSB infrequently.
Primary prevention of dental erosion by calcium and fluoride: a systematic review.
Zini, A; Krivoroutski, Y; Vered, Y
2014-02-01
Overviews of the current literature only provide summaries of existing relevant preventive strategies for dental erosion. To perform a systematic review according to the quantitative meta-analysis method of the scientific literature on prevention of dental erosion. The focused question will address primary prevention of dental erosion by calcium and fluoride. Randomized clinical trials (RCTs) regarding dental erosion prevention. The search included five databases: Embase, Cochrane database of systematic reviews, PubMed (MEDLINE), FDA publication and Berman medical library of the Hebrew University. The search included data in the English language, with effect on preventing dental erosion always presented as mean enamel loss and measured by profilometer. Statistical meta-analysis was performed by StatsDirect program and PEPI statistical software. Fixed- and random-effect models were used to analyse the data. Heterogeneity tests were employed to validate the fixed-effect model assumption. A total of 475 articles on dental erosion prevention were located. A four-stage selection process was employed, and 10 RCT articles were found to be suitable for meta-analysis. The number of studies on prevention of dental erosion maintaining standards of evidence-based dentistry remains insufficient to reach any definite conclusions. The focused questions of this review cannot be addressed according to the existing literature. © 2013 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
FTree query construction for virtual screening: a statistical analysis.
Gerlach, Christof; Broughton, Howard; Zaliani, Andrea
2008-02-01
FTrees (FT) is a known chemoinformatic tool able to condense molecular descriptions into a graph object and to search for actives in large databases using graph similarity. The query graph is classically derived from a known active molecule, or a set of actives, for which a similar compound has to be found. Recently, FT similarity has been extended to fragment space, widening its capabilities. If a user were able to build a knowledge-based FT query from information other than a known active structure, the similarity search could be combined with other, normally separate, fields like de-novo design or pharmacophore searches. With this aim in mind, we performed a comprehensive analysis of several databases in terms of FT description and provide a basic statistical analysis of the FT spaces so far at hand. Vendors' catalogue collections and MDDR as a source of potential or known "actives", respectively, have been used. With the results reported herein, a set of ranges, mean values and standard deviations for several query parameters are presented in order to set a reference guide for the users. Applications on how to use this information in FT query building are also provided, using a newly built 3D-pharmacophore from 57 5HT-1F agonists and a published one which was used for virtual screening for tRNA-guanine transglycosylase (TGT) inhibitors.
FTree query construction for virtual screening: a statistical analysis
NASA Astrophysics Data System (ADS)
Gerlach, Christof; Broughton, Howard; Zaliani, Andrea
2008-02-01
FTrees (FT) is a known chemoinformatic tool able to condense molecular descriptions into a graph object and to search for actives in large databases using graph similarity. The query graph is classically derived from a known active molecule, or a set of actives, for which a similar compound has to be found. Recently, FT similarity has been extended to fragment space, widening its capabilities. If a user were able to build a knowledge-based FT query from information other than a known active structure, the similarity search could be combined with other, normally separate, fields like de-novo design or pharmacophore searches. With this aim in mind, we performed a comprehensive analysis of several databases in terms of FT description and provide a basic statistical analysis of the FT spaces so far at hand. Vendors' catalogue collections and MDDR as a source of potential or known "actives", respectively, have been used. With the results reported herein, a set of ranges, mean values and standard deviations for several query parameters are presented in order to set a reference guide for the users. Applications on how to use this information in FT query building are also provided, using a newly built 3D-pharmacophore from 57 5HT-1F agonists and a published one which was used for virtual screening for tRNA-guanine transglycosylase (TGT) inhibitors.
Transport and Environment Database System (TRENDS): Maritime air pollutant emission modelling
NASA Astrophysics Data System (ADS)
Georgakaki, Aliki; Coffey, Robert A.; Lock, Graham; Sorenson, Spencer C.
This paper reports the development of the maritime module within the framework of the Transport and Environment Database System (TRENDS) project. A detailed database has been constructed for the calculation of energy consumption and air pollutant emissions. Based on an in-house database of commercial vessels kept at the Technical University of Denmark, relationships between the fuel consumption and size of different vessels have been developed, taking into account the fleet's age and service speed. The technical assumptions and factors incorporated in the database are presented, including changes from findings reported in Methodologies for Estimating air pollutant Emissions from Transport (MEET). The database operates on statistical data provided by Eurostat, which describe vessel and freight movements from and towards EU 15 major ports. Data are at port to Maritime Coastal Area (MCA) level, so a bottom-up approach is used. A port to MCA distance database has also been constructed for the purpose of the study. This was the first attempt to use Eurostat maritime statistics for emission modelling; and the problems encountered, since the statistical data collection was not undertaken with a view to this purpose, are mentioned. Examples of the results obtained by the database are presented. These include detailed air pollutant emission calculations for bulk carriers entering the port of Helsinki, as an example of the database operation, and aggregate results for different types of movements for France. Overall estimates of SOx and NOx emissions caused by shipping traffic between the EU 15 countries are in the area of 1 and 1.5 million tonnes, respectively.
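The bottom-up calculation described above can be illustrated with a toy sketch of a single port-to-MCA movement; the function name and all numerical factors below are hypothetical placeholders, not the TRENDS or MEET values.

```python
# Toy bottom-up estimate of fuel use and emissions for one vessel movement.
# All factors are hypothetical placeholders, not the TRENDS/MEET values.
def movement_emissions(distance_nm, service_speed_kn, fuel_tonnes_per_day, ef_kg_per_tonne_fuel):
    """Emissions (kg) for a single port-to-MCA movement."""
    hours_at_sea = distance_nm / service_speed_kn
    fuel_tonnes = fuel_tonnes_per_day * hours_at_sea / 24.0
    return {pollutant: fuel_tonnes * ef for pollutant, ef in ef_kg_per_tonne_fuel.items()}

# Example: a bulk carrier on a 1200 nm leg at 14 knots burning 30 t fuel/day
print(movement_emissions(1200, 14, 30,
                         {"SOx": 54.0, "NOx": 87.0, "CO2": 3170.0}))
```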
Evaluating the statistical methodology of randomized trials on dentin hypersensitivity management.
Matranga, Domenica; Matera, Federico; Pizzo, Giuseppe
2017-12-27
The present study aimed to evaluate the characteristics and quality of statistical methodology used in clinical studies on dentin hypersensitivity management. An electronic search was performed for data published from 2009 to 2014 by using PubMed, Ovid/MEDLINE, and Cochrane Library databases. The primary search terms were used in combination. Eligibility criteria included randomized clinical trials that evaluated the efficacy of desensitizing agents in terms of reducing dentin hypersensitivity. A total of 40 studies were considered eligible for assessment of the quality of statistical methodology. The four main concerns identified were i) use of nonparametric tests in the presence of large samples, coupled with lack of information about normality and equality of variances of the response; ii) lack of P-value adjustment for multiple comparisons; iii) failure to account for interactions between treatment and follow-up time; and iv) no information about the number of teeth examined per patient and the consequent lack of a cluster-specific approach in data analysis. Owing to these concerns, statistical methodology was judged as inappropriate in 77.1% of the 35 studies that used parametric methods. Additional studies with appropriate statistical analysis are required to obtain an appropriate assessment of the efficacy of desensitizing agents.
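Concern ii) above, the lack of P-value adjustment for multiple comparisons, can be addressed with standard corrections; a brief illustration using statsmodels on invented p-values (not values from the reviewed trials):

```python
# Illustration of adjusting p-values for multiple comparisons
# (hypothetical p-values, not taken from any of the reviewed trials).
from statsmodels.stats.multitest import multipletests

raw_p = [0.012, 0.034, 0.049, 0.20, 0.003]

for method in ("bonferroni", "holm", "fdr_bh"):
    reject, p_adj, _, _ = multipletests(raw_p, alpha=0.05, method=method)
    print(method, [round(p, 3) for p in p_adj], list(reject))
```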
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grant, C W; Lenderman, J S; Gansemer, J D
This document is an update to the 'ADIS Algorithm Evaluation Project Plan' specified in the Statement of Work for the US-VISIT Identity Matching Algorithm Evaluation Program, as deliverable II.D.1. The original plan was delivered in August 2010. This document modifies the plan to reflect modified deliverables reflecting delays in obtaining a database refresh. This document describes the revised schedule of the program deliverables. The detailed description of the processes used, the statistical analysis processes and the results of the statistical analysis will be described fully in the program deliverables. The US-VISIT Identity Matching Algorithm Evaluation Program is work performed by Lawrence Livermore National Laboratory (LLNL) under IAA HSHQVT-07-X-00002 P00004 from the Department of Homeland Security (DHS).
Potential errors and misuse of statistics in studies on leakage in endodontics.
Lucena, C; Lopez, J M; Pulgar, R; Abalos, C; Valderrama, M J
2013-04-01
To assess the quality of the statistical methodology used in studies of leakage in Endodontics, and to compare the results found using appropriate versus inappropriate inferential statistical methods. The search strategy used the descriptors 'root filling' 'microleakage', 'dye penetration', 'dye leakage', 'polymicrobial leakage' and 'fluid filtration' for the time interval 2001-2010 in journals within the categories 'Dentistry, Oral Surgery and Medicine' and 'Materials Science, Biomaterials' of the Journal Citation Report. All retrieved articles were reviewed to find potential pitfalls in statistical methodology that may be encountered during study design, data management or data analysis. The database included 209 papers. In all the studies reviewed, the statistical methods used were appropriate for the category attributed to the outcome variable, but in 41% of the cases, the chi-square test or parametric methods were inappropriately selected subsequently. In 2% of the papers, no statistical test was used. In 99% of cases, a statistically 'significant' or 'not significant' effect was reported as a main finding, whilst only 1% also presented an estimation of the magnitude of the effect. When the appropriate statistical methods were applied in the studies with originally inappropriate data analysis, the conclusions changed in 19% of the cases. Statistical deficiencies in leakage studies may affect their results and interpretation and might be one of the reasons for the poor agreement amongst the reported findings. Therefore, more effort should be made to standardize statistical methodology. © 2012 International Endodontic Journal.
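As a generic illustration of reporting an effect magnitude alongside the usual significance test (not the review's recommended analysis), the sketch below contrasts a chi-square p-value with an odds ratio and confidence interval for a hypothetical 2x2 leakage table:

```python
# Contrast of "significant / not significant" reporting with an effect-size estimate
# for a hypothetical 2x2 table of leakage (rows: materials, columns: leaked / sealed).
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[18, 12],    # material A: 18 leaked, 12 sealed
                  [ 9, 21]])   # material B:  9 leaked, 21 sealed

chi2, p, _, _ = chi2_contingency(table)

a, b = table[0]
c, d = table[1]
odds_ratio = (a * d) / (b * c)
se_log_or = np.sqrt(1/a + 1/b + 1/c + 1/d)
ci = np.exp(np.log(odds_ratio) + np.array([-1.96, 1.96]) * se_log_or)

print(f"chi-square p = {p:.3f}")
print(f"odds ratio = {odds_ratio:.2f}, 95% CI {ci[0]:.2f} to {ci[1]:.2f}")
```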
The European Southern Observatory-MIDAS table file system
NASA Technical Reports Server (NTRS)
Peron, M.; Grosbol, P.
1992-01-01
The new and substantially upgraded version of the Table File System in MIDAS is presented as a scientific database system. MIDAS applications for performing database operations on tables are discussed, for instance, the exchange of the data to and from the TFS, the selection of objects, the uncertainty joins across tables, and the graphical representation of data. This upgraded version of the TFS is a full implementation of the binary table extension of the FITS format; in addition, it also supports arrays of strings. Different storage strategies for optimal access of very large data sets are implemented and are addressed in detail. As a simple relational database, the TFS may be used for the management of personal data files. This opens the way to intelligent pipeline processing of large amounts of data. One of the key features of the Table File System is to provide also an extensive set of tools for the analysis of the final results of a reduction process. Column operations using standard and special mathematical functions as well as statistical distributions can be carried out; commands for linear regression and model fitting using nonlinear least square methods and user-defined functions are available. Finally, statistical tests of hypothesis and multivariate methods can also operate on tables.
Hadoop and friends - first experience at CERN with a new platform for high throughput analysis steps
NASA Astrophysics Data System (ADS)
Duellmann, D.; Surdy, K.; Menichetti, L.; Toebbicke, R.
2017-10-01
The statistical analysis of infrastructure metrics comes with several specific challenges, including the fairly large volume of unstructured metrics from a large set of independent data sources. Hadoop and Spark provide an ideal environment in particular for the first steps of skimming rapidly through hundreds of TB of low relevance data to find and extract the much smaller data volume that is relevant for statistical analysis and modelling. This presentation will describe the new Hadoop service at CERN and the use of several of its components for high throughput data aggregation and ad-hoc pattern searches. We will describe the hardware setup used, the service structure with a small set of decoupled clusters and the first experience with co-hosting different applications and performing software upgrades. We will further detail the common infrastructure used for data extraction and preparation from continuous monitoring and database input sources.
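A minimal PySpark sketch of the skimming-and-aggregation step described above; the paths, field names and metric are hypothetical and do not reflect the actual CERN monitoring schema.

```python
# Minimal PySpark sketch of skimming a large monitoring dataset down to the
# records relevant for later statistical analysis. Paths and field names are
# hypothetical, not the actual CERN monitoring schema.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("metric-skim").getOrCreate()

metrics = (spark.read.json("hdfs:///monitoring/raw/2017/*/*.json.gz")
           .withColumn("timestamp", F.col("timestamp").cast("timestamp")))

relevant = (metrics
            .filter(F.col("service") == "eos")          # keep one data source
            .filter(F.col("metric") == "read_latency_ms")
            .select("timestamp", "host", "value"))

hourly = (relevant
          .groupBy(F.window("timestamp", "1 hour"), "host")
          .agg(F.avg("value").alias("mean_latency"),
               F.expr("percentile_approx(value, 0.99)").alias("p99_latency")))

hourly.write.mode("overwrite").parquet("hdfs:///monitoring/skimmed/eos_latency_hourly")
```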
Sauvé, Jean-François; Beaudry, Charles; Bégin, Denis; Dion, Chantal; Gérin, Michel; Lavoué, Jérôme
2012-09-01
A quantitative determinants-of-exposure analysis of respirable crystalline silica (RCS) levels in the construction industry was performed using a database compiled from an extensive literature review. Statistical models were developed to predict work-shift exposure levels by trade. Monte Carlo simulation was used to recreate exposures derived from summarized measurements which were combined with single measurements for analysis. Modeling was performed using Tobit models within a multimodel inference framework, with year, sampling duration, type of environment, project purpose, project type, sampling strategy and use of exposure controls as potential predictors. 1346 RCS measurements were included in the analysis, of which 318 were non-detects and 228 were simulated from summary statistics. The model containing all the variables explained 22% of total variability. Apart from trade, sampling duration, year and strategy were the most influential predictors of RCS levels. The use of exposure controls was associated with an average decrease of 19% in exposure levels compared to none, and increased concentrations were found for industrial, demolition and renovation projects. Predicted geometric means for year 1999 were the highest for drilling rig operators (0.238 mg m(-3)) and tunnel construction workers (0.224 mg m(-3)), while the estimated exceedance fraction of the ACGIH TLV by trade ranged from 47% to 91%. The predicted geometric means in this study indicated important overexposure compared to the TLV. However, the low proportion of variability explained by the models suggests that the construction trade is only a moderate predictor of work-shift exposure levels. The impact of the different tasks performed during a work shift should also be assessed to provide better management and control of RCS exposure levels on construction sites.
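The Monte Carlo step described above, recreating individual exposures from summarized measurements, can be sketched as follows, assuming summaries are reported as a geometric mean and geometric standard deviation of a lognormal distribution; the values and detection limit are invented.

```python
# Minimal sketch of recreating individual exposure values from a summary
# statistic by Monte Carlo simulation, assuming a lognormal distribution
# described by a geometric mean (GM) and geometric standard deviation (GSD).
# Values are invented for illustration.
import numpy as np

rng = np.random.default_rng(42)

def simulate_from_summary(gm, gsd, n, lod=None):
    """Draw n lognormal values with the given GM/GSD; flag values below LOD."""
    draws = rng.lognormal(mean=np.log(gm), sigma=np.log(gsd), size=n)
    if lod is not None:
        censored = draws < lod
        draws = np.where(censored, lod, draws)   # report non-detects at the LOD
        return draws, censored
    return draws, np.zeros(n, dtype=bool)

values, nondetect = simulate_from_summary(gm=0.08, gsd=2.5, n=12, lod=0.02)
print(values.round(3), nondetect.sum(), "non-detects")
```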
Lee, Nathan J; Guzman, Javier Z; Kim, Jun; Skovrlj, Branko; Martin, Christopher T; Pugely, Andrew J; Gao, Yubo; Caridi, John M; Mendoza-Lattes, Sergio; Cho, Samuel K
2016-11-01
Retrospective cohort analysis. A growing number of publications have utilized the Scoliosis Research Society (SRS) Morbidity and Mortality (M&M) database, but none have compared it to other large databases. The objective of this study was to compare SRS complications with those in administrative databases. The Nationwide Inpatient Sample (NIS) and Kid's Inpatient Database (KID) captured a greater number of overall complications while the SRS M&M data provided a greater incidence of spine-related complications following adolescent idiopathic scoliosis (AIS) surgery. Chi-square was used to obtain statistical significance, with p < .05 considered significant. The SRS 2004-2007 (9,904 patients), NIS 2004-2007 (20,441 patients) and KID 2003-2006 (10,184 patients) databases were analyzed for AIS patients who underwent fusion. Comparable variables were queried in all three databases, including patient demographics, surgical variables, and complications. Patients undergoing AIS in the SRS database were slightly older (SRS 14.4 years vs. NIS 13.8 years, p < .0001; KID 13.9 years, p < .0001) and less likely to be male (SRS 18.5% vs. NIS 26.3%, p < .0001; KID 24.8%, p < .0001). Revision surgery (SRS 3.3% vs. NIS 2.4%, p < .0001; KID 0.9%, p < .0001) and osteotomy (SRS 8% vs. NIS 2.3%, p < .0001; KID 2.4%, p < .0001) were more commonly reported in the SRS database. The SRS database reported fewer overall complications (SRS 3.9% vs. NIS 7.3%, p < .0001; KID 6.6%, p < .0001). However, when respiratory complications (SRS 0.5% vs. NIS 3.7%, p < .0001; KID 4.4%, p < .0001) were excluded, medical complication rates were similar across databases. In contrast, SRS reported higher spine-specific complication rates. Mortality rates were similar between SRS versus NIS (p = .280) and SRS versus KID (p = .08) databases. There are similarities and differences between the three databases. These discrepancies are likely due to the varying data-gathering methods each organization uses to collect their morbidity data. Level IV. Copyright © 2016 Scoliosis Research Society. Published by Elsevier Inc. All rights reserved.
Multivariate analysis in thoracic research.
Mengual-Macenlle, Noemí; Marcos, Pedro J; Golpe, Rafael; González-Rivas, Diego
2015-03-01
Multivariate analysis is based on the observation and analysis of more than one statistical outcome variable at a time. In design and analysis, the technique is used to perform trade studies across multiple dimensions while taking into account the effects of all variables on the responses of interest. Multivariate methods were developed to analyze large databases and increasingly complex data. Since modeling is the best way to represent knowledge of reality, we should use multivariate statistical methods. Multivariate methods are designed to analyze data sets simultaneously, i.e., to analyze different variables for each person or object studied. Keep in mind at all times that all variables must be treated in a way that accurately reflects the reality of the problem addressed. There are different types of multivariate analysis, and each one should be employed according to the type of variables to be analyzed: dependence, interdependence and structural methods. In conclusion, multivariate methods are ideal for the analysis of large data sets and for finding cause-and-effect relationships between variables; there is a wide range of analysis types that we can use.
Renard, Bernhard Y.; Xu, Buote; Kirchner, Marc; Zickmann, Franziska; Winter, Dominic; Korten, Simone; Brattig, Norbert W.; Tzur, Amit; Hamprecht, Fred A.; Steen, Hanno
2012-01-01
Currently, the reliable identification of peptides and proteins is only feasible when thoroughly annotated sequence databases are available. Although sequencing capacities continue to grow, many organisms remain without reliable, fully annotated reference genomes required for proteomic analyses. Standard database search algorithms fail to identify peptides that are not exactly contained in a protein database. De novo searches are generally hindered by their restricted reliability, and current error-tolerant search strategies are limited by global, heuristic tradeoffs between database and spectral information. We propose a Bayesian information criterion-driven error-tolerant peptide search (BICEPS) and offer an open source implementation based on this statistical criterion to automatically balance the information of each single spectrum and the database, while limiting the run time. We show that BICEPS performs as well as current database search algorithms when such algorithms are applied to sequenced organisms, whereas BICEPS only uses a remotely related organism database. For instance, we use a chicken instead of a human database corresponding to an evolutionary distance of more than 300 million years (International Chicken Genome Sequencing Consortium (2004) Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature 432, 695–716). We demonstrate the successful application to cross-species proteomics with a 33% increase in the number of identified proteins for a filarial nematode sample of Litomosoides sigmodontis. PMID:22493179
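For reference, the Bayesian information criterion that BICEPS builds on trades goodness of fit against model complexity; the generic form is sketched below and is not the tool's internal peptide scoring.

```python
# Generic Bayesian information criterion; shown only as a reminder of the
# criterion BICEPS builds on, not the tool's internal peptide scoring.
import numpy as np

def bic(log_likelihood, n_params, n_observations):
    """BIC = k*ln(n) - 2*ln(L); lower values indicate a preferred model."""
    return n_params * np.log(n_observations) - 2.0 * log_likelihood

# Compare a candidate peptide explanation with few vs. many free modifications
print(bic(log_likelihood=-120.0, n_params=3,  n_observations=200))
print(bic(log_likelihood=-115.0, n_params=10, n_observations=200))
```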
National Online Meeting Proceedings (15th, New York, New York, May 10-12, 1994).
ERIC Educational Resources Information Center
1994
This proceedings contains 58 papers that were reviewed and selected for presentation at the 1994 National Online Meeting. The introduction, "Highlights of the Online/CD-ROM Database Industry: Implications of the Internet for Database Producers" by Martha E. Williams, provides statistics regarding databases, database records, database…
bpRNA: large-scale automated annotation and analysis of RNA secondary structure.
Danaee, Padideh; Rouches, Mason; Wiley, Michelle; Deng, Dezhong; Huang, Liang; Hendrix, David
2018-05-09
While RNA secondary structure prediction from sequence data has made remarkable progress, there is a need for improved strategies for annotating the features of RNA secondary structures. Here, we present bpRNA, a novel annotation tool capable of parsing RNA structures, including complex pseudoknot-containing RNAs, to yield an objective, precise, compact, unambiguous, easily-interpretable description of all loops, stems, and pseudoknots, along with the positions, sequence, and flanking base pairs of each such structural feature. We also introduce several new informative representations of RNA structure types to improve structure visualization and interpretation. We have further used bpRNA to generate a web-accessible meta-database, 'bpRNA-1m', of over 100 000 single-molecule, known secondary structures; this is both more fully and accurately annotated and over 20-times larger than existing databases. We use a subset of the database with highly similar (≥90% identical) sequences filtered out to report on statistical trends in sequence, flanking base pairs, and length. Both the bpRNA method and the bpRNA-1m database will be valuable resources both for specific analysis of individual RNA molecules and large-scale analyses such as are useful for updating RNA energy parameters for computational thermodynamic predictions, improving machine learning models for structure prediction, and for benchmarking structure-prediction algorithms.
Multivariate statistical analysis of wildfires in Portugal
NASA Astrophysics Data System (ADS)
Costa, Ricardo; Caramelo, Liliana; Pereira, Mário
2013-04-01
Several studies demonstrate that wildfires in Portugal present high temporal and spatial variability as well as cluster behavior (Pereira et al., 2005, 2011). This study aims to contribute to the characterization of the fire regime in Portugal with the multivariate statistical analysis of the time series of number of fires and area burned in Portugal during the 1980 - 2009 period. The data used in the analysis is an extended version of the Rural Fire Portuguese Database (PRFD) (Pereira et al., 2011), provided by the National Forest Authority (Autoridade Florestal Nacional, AFN), the Portuguese Forest Service, which includes information for more than 500,000 fire records. There are many advanced techniques for examining the relationships among multiple time series at the same time (e.g., canonical correlation analysis, principal components analysis, factor analysis, path analysis, multiple analyses of variance, clustering systems). This study compares and discusses the results obtained with these different techniques. Pereira, M.G., Trigo, R.M., DaCamara, C.C., Pereira, J.M.C., Leite, S.M., 2005: "Synoptic patterns associated with large summer forest fires in Portugal". Agricultural and Forest Meteorology. 129, 11-25. Pereira, M. G., Malamud, B. D., Trigo, R. M., and Alves, P. I.: The history and characteristics of the 1980-2005 Portuguese rural fire database, Nat. Hazards Earth Syst. Sci., 11, 3343-3358, doi:10.5194/nhess-11-3343-2011, 2011. This work is supported by European Union Funds (FEDER/COMPETE - Operational Competitiveness Programme) and by national funds (FCT - Portuguese Foundation for Science and Technology) under the project FCOMP-01-0124-FEDER-022692, the project FLAIR (PTDC/AAC-AMB/104702/2008) and the EU 7th Framework Program through FUME (contract number 243888).
Weir, Christopher J; Butcher, Isabella; Assi, Valentina; Lewis, Stephanie C; Murray, Gordon D; Langhorne, Peter; Brady, Marian C
2018-03-07
Rigorous, informative meta-analyses rely on availability of appropriate summary statistics or individual participant data. For continuous outcomes, especially those with naturally skewed distributions, summary information on the mean or variability often goes unreported. While full reporting of original trial data is the ideal, we sought to identify methods for handling unreported mean or variability summary statistics in meta-analysis. We undertook two systematic literature reviews to identify methodological approaches used to deal with missing mean or variability summary statistics. Five electronic databases were searched, in addition to the Cochrane Colloquium abstract books and the Cochrane Statistics Methods Group mailing list archive. We also conducted cited reference searching and emailed topic experts to identify recent methodological developments. Details recorded included the description of the method, the information required to implement the method, any underlying assumptions and whether the method could be readily applied in standard statistical software. We provided a summary description of the methods identified, illustrating selected methods in example meta-analysis scenarios. For missing standard deviations (SDs), following screening of 503 articles, fifteen methods were identified in addition to those reported in a previous review. These included Bayesian hierarchical modelling at the meta-analysis level; summary statistic level imputation based on observed SD values from other trials in the meta-analysis; a practical approximation based on the range; and algebraic estimation of the SD based on other summary statistics. Following screening of 1124 articles for methods estimating the mean, one approximate Bayesian computation approach and three papers based on alternative summary statistics were identified. Illustrative meta-analyses showed that when replacing a missing SD the approximation using the range minimised loss of precision and generally performed better than omitting trials. When estimating missing means, a formula using the median, lower quartile and upper quartile performed best in preserving the precision of the meta-analysis findings, although in some scenarios, omitting trials gave superior results. Methods based on summary statistics (minimum, maximum, lower quartile, upper quartile, median) reported in the literature facilitate more comprehensive inclusion of randomised controlled trials with missing mean or variability summary statistics within meta-analyses.
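Two of the simple approximations of the kind evaluated in the review are sketched below (a range-based SD approximation and a quartile-based mean approximation); several variants exist in the literature, and these particular formulas are illustrative rather than the review's preferred choices.

```python
# Two simple approximations of the kind evaluated in the review
# (several variants exist; these are illustrative, not the review's preferred choice).
def sd_from_range(minimum, maximum):
    """Rough SD approximation based on the sample range."""
    return (maximum - minimum) / 4.0

def mean_from_quartiles(q1, median, q3):
    """Mean approximation from the median and quartiles."""
    return (q1 + median + q3) / 3.0

print(sd_from_range(2.0, 14.0))            # -> 3.0
print(mean_from_quartiles(4.0, 6.0, 9.0))  # -> 6.33...
```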
NASA Astrophysics Data System (ADS)
Poppe, Sam; Barette, Florian; Smets, Benoît; Benbakkar, Mhammed; Kervyn, Matthieu
2016-04-01
The Virunga Volcanic Province (VVP) is situated within the western branch of the East-African Rift. The geochemistry and petrology of its volcanic products have been studied extensively, but in a fragmented manner. They represent a unique collection of silica-undersaturated, ultra-alkaline and ultra-potassic compositions, displaying marked geochemical variations over the area occupied by the VVP. We present a novel spatially-explicit database of existing whole-rock geochemical analyses of the VVP volcanics, compiled from international publications, (post-)colonial scientific reports and PhD theses. In the database, a total of 703 geochemical analyses of whole-rock samples collected from the 1950s until recently have been characterised with a geographical location, eruption source location, analytical results and uncertainty estimates for each of these categories. Comparative box plots and Kruskal-Wallis H tests on subsets of analyses with contrasting ages or analytical methods suggest that the overall database accuracy is consistent. We demonstrate how statistical techniques such as Principal Component Analysis (PCA) and subsequent cluster analysis allow the identification of clusters of samples with similar major-element compositions. The spatial patterns represented by the contrasting clusters show that both the historically active volcanoes represent compositional clusters which can be identified based on their contrasted silica and alkali contents. Furthermore, two sample clusters are interpreted to represent the most primitive, deep magma source within the VVP, different from the shallow magma reservoirs that feed the eight dominant large volcanoes. The samples from these two clusters systematically originate from locations which 1. are distal compared to the eight large volcanoes and 2. mostly coincide with the surface expressions of rift faults or NE-SW-oriented inherited Precambrian structures which were reactivated during rifting. The lava from the Mugogo eruption of 1957 belongs to these primitive clusters and is the only one known to have erupted outside the current rift valley in historical times. We thus infer a distributed vent-opening hazard in addition to the susceptibility associated with the main Virunga edifices. This study suggests that the statistical analysis of such a geochemical database may help to understand complex volcanic plumbing systems and the spatial distribution of volcanic hazards in active and poorly known volcanic areas such as the Virunga Volcanic Province.
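A minimal sketch of the PCA-plus-clustering workflow described above; the array below is random stand-in data with the same shape as the compiled database (703 samples, major-element oxides), not the VVP analyses themselves.

```python
# Minimal PCA + clustering sketch of the kind described for the major-element
# data; the array below is random stand-in data, not the VVP database.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# columns: SiO2, TiO2, Al2O3, FeOt, MgO, CaO, Na2O, K2O (wt%), 703 samples
X = rng.normal(size=(703, 8))

X_std = StandardScaler().fit_transform(X)
scores = PCA(n_components=3).fit_transform(X_std)
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(scores)

print(np.bincount(labels))   # cluster sizes to inspect compositional groups
```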
Li, Min; Dong, Xiang-yu; Liang, Hao; Leng, Li; Zhang, Hui; Wang, Shou-zhi; Li, Hui; Du, Zhi-Qiang
2017-05-20
Effective management and analysis of precisely recorded phenotypic traits are important components of the selection and breeding of superior livestock. Over two decades, we divergently selected chicken lines for abdominal fat content at Northeast Agricultural University (Northeast Agricultural University High and Low Fat, NEAUHLF), and collected a large volume of phenotypic data related to the investigation of the molecular genetic basis of adipose tissue deposition in broilers. To effectively and systematically store, manage and analyze phenotypic data, we built the NEAUHLF Phenome Database (NEAUHLFPD). NEAUHLFPD included the following phenotypic records: pedigree (generations 1-19) and 29 phenotypes, such as body sizes and weights, carcass traits and their corresponding rates. The design and construction strategy of NEAUHLFPD were executed as follows: (1) Framework design. We used Apache as our web server, MySQL and Navicat as database management tools, and PHP as the HTML-embedded language to create a dynamic interactive website. (2) Structural components. On the main interface, a detailed introduction to the composition, function, and the index buttons of the basic structure of the database could be found. The functional modules of NEAUHLFPD had two main components: the first module referred to the physical storage space for phenotypic data, in which functional manipulation on data can be realized, such as data indexing, filtering, range-setting, searching, etc.; the second module related to the calculation of basic descriptive statistics, where data filtered from the database can be used for the computation of basic statistical parameters and the simultaneous conditional sorting. NEAUHLFPD could be used to effectively store and manage not only phenotypic, but also genotypic and genomic data, which can facilitate further investigation on the molecular genetic basis of chicken adipose tissue growth and development, and expedite the selection and breeding of broilers with low fat content.
Splendore, Alessandra; Fanganiello, Roberto D; Masotti, Cibele; Morganti, Lucas S C; Passos-Bueno, M Rita
2005-05-01
Recently, a novel exon was described in TCOF1 that, although alternatively spliced, is included in the major protein isoform. In addition, most published mutations in this gene do not conform to current mutation nomenclature guidelines. Given these observations, we developed an online database of TCOF1 mutations in which all the reported mutations are renamed according to standard recommendations and in reference to the genomic and novel cDNA reference sequences (www.genoma.ib.usp.br/TCOF1_database). We also report in this work: 1) results of the first screening for large deletions in TCOF1 by Southern blot in patients without mutation detected by direct sequencing; 2) the identification of the first pathogenic mutation in the newly described exon 6A; and 3) statistical analysis of pathogenic mutations and polymorphism distribution throughout the gene.
SHOP: scaffold HOPping by GRID-based similarity searches.
Bergmann, Rikke; Linusson, Anna; Zamora, Ismael
2007-05-31
A new GRID-based method for scaffold hopping (SHOP) is presented. In a fully automatic manner, scaffolds were identified in a database based on three types of 3D-descriptors. SHOP's ability to recover scaffolds was assessed and validated by searching a database spiked with fragments of known ligands of three different protein targets relevant for drug discovery using a rational approach based on statistical experimental design. Five out of eight and seven out of eight thrombin scaffolds and all seven HIV protease scaffolds were recovered within the top 10 and 31 out of 31 neuraminidase scaffolds were in the 31 top-ranked scaffolds. SHOP also identified new scaffolds with substantially different chemotypes from the queries. Docking analysis indicated that the new scaffolds would have similar binding modes to those of the respective query scaffolds observed in X-ray structures. The databases contained scaffolds from published combinatorial libraries to ensure that identified scaffolds could be feasibly synthesized.
Effectiveness of propolis on oral health: a meta-analysis.
Hwu, Yueh-Juen; Lin, Feng-Yu
2014-12-01
The use of propolis mouth rinse or gel as a supplementary intervention has increased during the last decade in Taiwan. However, the effect of propolis on oral health is not well understood. The purpose of this meta-analysis was to present the best available evidence regarding the effects of propolis use on oral health, including oral infection, dental plaque, and stomatitis. Researchers searched seven electronic databases for relevant articles published between 1969 and 2012. Data were collected using inclusion and exclusion criteria. The Joanna Briggs Institute Meta Analysis of Statistics Assessment and Review Instrument was used to evaluate the quality of the identified articles. Eight trials published from 1997 to 2011 with 194 participants had extractable data. The result of the meta-analysis indicated that, although propolis had an effect on reducing dental plaque, this effect was not statistically significant. The results were not statistically significant for oral infection or stomatitis. Although there are a number of promising indications, in view of the limited number and quality of studies and the variation in results among studies, this review highlights the need for additional well-designed trials to draw conclusions that are more robust.
An Introduction to MAMA (Meta-Analysis of MicroArray data) System.
Zhang, Zhe; Fenstermacher, David
2005-01-01
Analyzing microarray data across multiple experiments has been proven advantageous. To support this kind of analysis, we are developing a software system called MAMA (Meta-Analysis of MicroArray data). MAMA utilizes a client-server architecture with a relational database on the server-side for the storage of microarray datasets collected from various resources. The client-side is an application running on the end user's computer that allows the user to manipulate microarray data and analytical results locally. MAMA implementation will integrate several analytical methods, including meta-analysis within an open-source framework offering other developers the flexibility to plug in additional statistical algorithms.
The clinical value of large neuroimaging data sets in Alzheimer's disease.
Toga, Arthur W
2012-02-01
Rapid advances in neuroimaging and cyberinfrastructure technologies have brought explosive growth in the Web-based warehousing, availability, and accessibility of imaging data on a variety of neurodegenerative and neuropsychiatric disorders and conditions. There has been a prolific development and emergence of complex computational infrastructures that serve as repositories of databases and provide critical functionalities such as sophisticated image analysis algorithm pipelines and powerful three-dimensional visualization and statistical tools. The statistical and operational advantages of collaborative, distributed team science in the form of multisite consortia push this approach in a diverse range of population-based investigations. Copyright © 2012 Elsevier Inc. All rights reserved.
Structural texture similarity metrics for image analysis and retrieval.
Zujovic, Jana; Pappas, Thrasyvoulos N; Neuhoff, David L
2013-07-01
We develop new metrics for texture similarity that account for human visual perception and the stochastic nature of textures. The metrics rely entirely on local image statistics and allow substantial point-by-point deviations between textures that according to human judgment are essentially identical. The proposed metrics extend the ideas of structural similarity and are guided by research in texture analysis-synthesis. They are implemented using a steerable filter decomposition and incorporate a concise set of subband statistics, computed globally or in sliding windows. We conduct systematic tests to investigate metric performance in the context of "known-item search," the retrieval of textures that are "identical" to the query texture. This eliminates the need for cumbersome subjective tests, thus enabling comparisons with human performance on a large database. Our experimental results indicate that the proposed metrics outperform peak signal-to-noise ratio (PSNR), structural similarity metric (SSIM) and its variations, as well as state-of-the-art texture classification metrics, using standard statistical measures.
Factorial analysis of trihalomethanes formation in drinking water.
Chowdhury, Shakhawat; Champagne, Pascale; McLellan, P James
2010-06-01
Disinfection of drinking water reduces pathogenic infection, but may pose risks to human health through the formation of disinfection byproducts. The effects of different factors on the formation of trihalomethanes were investigated using a statistically designed experimental program, and a predictive model for trihalomethanes formation was developed. Synthetic water samples with different factor levels were produced, and trihalomethanes concentrations were measured. A replicated fractional factorial design with center points was performed, and significant factors were identified through statistical analysis. A second-order trihalomethanes formation model was developed from 92 experiments, and the statistical adequacy was assessed through appropriate diagnostics. This model was validated using additional data from the Drinking Water Surveillance Program database and was applied to the Smiths Falls water supply system in Ontario, Canada. The model predictions were correlated strongly to the measured trihalomethanes, with correlations of 0.95 and 0.91, respectively. The resulting model can assist in analyzing risk-cost tradeoffs in the design and operation of water supply systems.
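A sketch of fitting a second-order (quadratic) response-surface model of the kind described, using synthetic factor levels and a synthetic THM response rather than the study's experimental data:

```python
# Sketch of fitting a second-order (quadratic) response-surface model of the
# kind described; the factor matrix and THM response below are synthetic.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
# factors, e.g. chlorine dose, TOC, pH, temperature, bromide, reaction time
X = rng.uniform(-1, 1, size=(92, 6))        # coded factor levels
y = 40 + 12*X[:, 0] + 8*X[:, 1] + 5*X[:, 0]*X[:, 1] - 4*X[:, 2]**2 \
    + rng.normal(scale=3, size=92)          # synthetic THM response (ug/L)

quad = PolynomialFeatures(degree=2, include_bias=False)
model = LinearRegression().fit(quad.fit_transform(X), y)

print("R^2 =", round(model.score(quad.transform(X), y), 3))
```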
Pike, Katie; Nash, Rachel L; Murphy, Gavin J; Reeves, Barnaby C; Rogers, Chris A
2015-02-22
The Transfusion Indication Threshold Reduction (TITRe2) trial is the largest randomized controlled trial to date to compare red blood cell transfusion strategies following cardiac surgery. This update presents the statistical analysis plan, detailing how the study will be analyzed and presented. The statistical analysis plan has been written following recommendations from the International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use, prior to database lock and the final analysis of trial data. Outlined analyses are in line with the Consolidated Standards of Reporting Trials (CONSORT). The study aims to randomize 2000 patients from 17 UK centres. Patients are randomized to either a restrictive (transfuse if haemoglobin concentration <7.5 g/dl) or liberal (transfuse if haemoglobin concentration <9 g/dl) transfusion strategy. The primary outcome is a binary composite outcome of any serious infectious or ischaemic event in the first 3 months following randomization. The statistical analysis plan details how non-adherence with the intervention, withdrawals from the study, and the study population will be derived and dealt with in the analysis. The planned analyses of the trial primary and secondary outcome measures are described in detail, including approaches taken to deal with multiple testing, model assumptions not being met and missing data. Details of planned subgroup and sensitivity analyses and pre-specified ancillary analyses are given, along with potential issues that have been identified with such analyses and possible approaches to overcome such issues. ISRCTN70923932 .
Research of facial feature extraction based on MMC
NASA Astrophysics Data System (ADS)
Xue, Donglin; Zhao, Jiufen; Tang, Qinhong; Shi, Shaokun
2017-07-01
Based on the maximum margin criterion (MMC), a new algorithm of statistically uncorrelated optimal discriminant vectors and a new algorithm of orthogonal optimal discriminant vectors for feature extraction were proposed. The purpose of the maximum margin criterion is to maximize the inter-class scatter while simultaneously minimizing the intra-class scatter after the projection. Compared with the original MMC method and the principal component analysis (PCA) method, the proposed methods are better at reducing or eliminating the statistical correlation between features and improving the recognition rate. The experimental results on the Olivetti Research Laboratory (ORL) face database show that the new statistically uncorrelated maximum margin criterion (SUMMC) feature extraction method is better in terms of recognition rate and stability. In addition, the relations between the maximum margin criterion and the Fisher criterion for feature extraction are revealed.
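For orientation, a plain MMC projection can be obtained from the leading eigenvectors of the difference between the between-class and within-class scatter matrices; the sketch below shows only this baseline criterion, not the paper's statistically uncorrelated or orthogonal variants.

```python
# Plain maximum-margin-criterion (MMC) projection sketch: directions are the
# leading eigenvectors of (Sb - Sw). This is the baseline criterion only, not
# the paper's statistically uncorrelated or orthogonal variants.
import numpy as np

def mmc_projection(X, y, n_components):
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    Sb = np.zeros((d, d))
    Sw = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sb += len(Xc) * np.outer(mc - mean_all, mc - mean_all)
        Sw += (Xc - mc).T @ (Xc - mc)
    eigvals, eigvecs = np.linalg.eigh(Sb - Sw)         # symmetric matrix
    order = np.argsort(eigvals)[::-1][:n_components]   # largest eigenvalues
    return eigvecs[:, order]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 20)), rng.normal(1.5, 1, (50, 20))])
y = np.repeat([0, 1], 50)
W = mmc_projection(X, y, n_components=2)
print((X @ W).shape)   # projected features: (100, 2)
```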
Statistical physics and economic fluctuations: do outliers exist?
NASA Astrophysics Data System (ADS)
Stanley, H. Eugene
2003-02-01
We present an overview of recent research applying ideas of statistical physics to try to better understand puzzles regarding economic fluctuations. One of these puzzles is how to describe outliers, phenomena that lie outside of patterns of statistical regularity. We review evidence consistent with the possibility that such outliers may not exist. This possibility is supported by recent analysis by Plerou et al. of a database containing the bid, ask, and sale price of each trade of every stock. Further, the data support the picture of economic fluctuations, due to Plerou et al., in which a financial market alternates between being in an “equilibrium phase” where market behavior is split roughly equally between buying and selling, and an “out-of-equilibrium phase” where the market is mainly either buying or selling.
NIH funding in Radiation Oncology – A snapshot
Steinberg, Michael; McBride, William H.; Vlashi, Erina; Pajonk, Frank
2013-01-01
Currently, pay lines for NIH grants are at a historical low. In this climate of fierce competition, knowledge about the funding situation in a small field like Radiation Oncology becomes very important for career planning and recruitment of faculty. Unfortunately, this data cannot be easily extracted from the NIH's database because it does not discriminate between Radiology and Radiation Oncology Departments. At the start of fiscal year 2013, we extracted records for 952 individual grants, which were active at the time of analysis, from the NIH database. Proposals originating from Radiation Oncology Departments were identified manually. Descriptive statistics were generated using the JMP statistical software package. Our analysis identified 197 grants in Radiation Oncology. These proposals came from 134 individual investigators in 43 academic institutions. The majority of the grants (118) were awarded to PIs at the Full Professor level and 122 PIs held a PhD degree. In 79% of the grants the research topic fell into the field of Biology, in 13% into the field of Medical Physics. Only 7.6% of the proposals were clinical investigations. Our data suggests that the field of Radiation Oncology is underfunded by the NIH, and that the current level of support does not match the relevance of Radiation Oncology for cancer patients or the potential of its academic work force. PMID:23523324
Ahlberg, Ernst; Amberg, Alexander; Beilke, Lisa D; Bower, David; Cross, Kevin P; Custer, Laura; Ford, Kevin A; Van Gompel, Jacky; Harvey, James; Honma, Masamitsu; Jolly, Robert; Joossens, Elisabeth; Kemper, Raymond A; Kenyon, Michelle; Kruhlak, Naomi; Kuhnke, Lara; Leavitt, Penny; Naven, Russell; Neilan, Claire; Quigley, Donald P; Shuey, Dana; Spirkl, Hans-Peter; Stavitskaya, Lidiya; Teasdale, Andrew; White, Angela; Wichard, Joerg; Zwickl, Craig; Myatt, Glenn J
2016-06-01
Statistical-based and expert rule-based models built using public domain mutagenicity knowledge and data are routinely used for computational (Q)SAR assessments of pharmaceutical impurities in line with the approach recommended in the ICH M7 guideline. Knowledge from proprietary corporate mutagenicity databases could be used to increase the predictive performance for selected chemical classes as well as expand the applicability domain of these (Q)SAR models. This paper outlines a mechanism for sharing knowledge without the release of proprietary data. Primary aromatic amine mutagenicity was selected as a case study because this chemical class is often encountered in pharmaceutical impurity analysis and mutagenicity of aromatic amines is currently difficult to predict. As part of this analysis, a series of aromatic amine substructures were defined and the number of mutagenic and non-mutagenic examples for each chemical substructure calculated across a series of public and proprietary mutagenicity databases. This information was pooled across all sources to identify structural classes that activate or deactivate aromatic amine mutagenicity. This structure activity knowledge, in combination with newly released primary aromatic amine data, was incorporated into Leadscope's expert rule-based and statistical-based (Q)SAR models where increased predictive performance was demonstrated. Copyright © 2016 Elsevier Inc. All rights reserved.
Optoelectronics-related competence building in Japanese and Western firms
NASA Astrophysics Data System (ADS)
Miyazaki, Kumiko
1992-05-01
In this paper, an analysis is made of how different firms in Japan and the West have developed competence related to optoelectronics on the basis of their previous experience and corporate strategies. The sample consists of a set of seven Japanese and four Western firms in the industrial, consumer electronics and materials sectors. Optoelectronics is divided into subfields including optical communications systems, optical fibers, optoelectronic key components, liquid crystal displays, optical disks, and others. The relative strengths and weaknesses of companies in the various subfields are determined using the INSPEC database, from 1976 to 1989. Parallel data are analyzed using OTAF U.S. patent statistics and the two sets of data are compared. The statistical analysis from the database is summarized for firms in each subfield in the form of an intra-firm technology index (IFTI), a new technique introduced to assess the revealed technology advantage of firms. The quantitative evaluation is complemented by results from intensive interviews with the management and scientists of the firms involved. The findings show that there is a marked variation in the way firms' technological trajectories have evolved giving rise to strength in some and weakness in other subfields for the different companies, which are related to their accumulated core competencies, previous core business activities, organizational, marketing, and competitive factors.
Kostopoulos, Spiros A; Asvestas, Pantelis A; Kalatzis, Ioannis K; Sakellaropoulos, George C; Sakkis, Theofilos H; Cavouras, Dionisis A; Glotsos, Dimitris T
2017-09-01
The aim of this study was to propose features that evaluate pictorial differences between melanocytic nevus (mole) and melanoma lesions by computer-based analysis of plain photography images and to design a cross-platform, tunable, decision support system to discriminate with high accuracy moles from melanomas in different publicly available image databases. Digital plain photography images of verified mole and melanoma lesions were downloaded from (i) Edinburgh University Hospital, UK, (Dermofit, 330 moles/70 melanomas, under signed agreement), from 5 different centers (Multicenter, 63 moles/25 melanomas, publicly available), and from the Groningen University, Netherlands (Groningen, 100 moles/70 melanomas, publicly available). Images were processed for outlining the lesion-border and isolating the lesion from the surrounding background. Fourteen features were generated from each lesion evaluating texture (4), structure (5), shape (4) and color (1). Features were subjected to statistical analysis for determining differences in pictorial properties between moles and melanomas. The Probabilistic Neural Network (PNN) classifier, the exhaustive search features selection, the leave-one-out (LOO), and the external cross-validation (ECV) methods were used to design the PR-system for discriminating between moles and melanomas. Statistical analysis revealed that melanomas as compared to moles were of lower intensity, of less homogenous surface, had more dark pixels with intensities spanning larger spectra of gray-values, contained more objects of different sizes and gray-levels, had more asymmetrical shapes and irregular outlines, had abrupt intensity transitions from lesion to background tissue, and had more distinct colors. The PR-system designed by the Dermofit images scored on the Dermofit images, using the ECV, 94.1%, 82.9%, 96.5% for overall accuracy, sensitivity, specificity, on the Multicenter Images 92.0%, 88%, 93.7% and on the Groningen Images 76.2%, 73.9%, 77.8% respectively. The PR-system as designed by the Dermofit image database could be fine-tuned to classify with good accuracy plain photography moles/melanomas images of other databases employing different image capturing equipment and protocols. Copyright © 2017 Elsevier B.V. All rights reserved.
Knowledge Discovery and Data Mining in Iran's Climatic Researches
NASA Astrophysics Data System (ADS)
Karimi, Mostafa
2013-04-01
Advances in measurement technology and data collection mean that databases keep growing larger, and large databases require powerful tools for data analysis. The iterative process of acquiring knowledge from information obtained through data processing takes various forms in all scientific fields; however, when data volumes are large, traditional methods cannot cope with many of the problems. In recent years the use of databases has expanded in various scientific fields, especially of atmospheric databases in climatology. In addition, the growing amount of data generated by climate models is a challenge for analysis aimed at extracting hidden patterns and knowledge. The approach taken to this problem in recent years uses the process of knowledge discovery in databases (KDD) and data mining techniques, drawing on concepts from machine learning, artificial intelligence and expert systems. Data mining is an analytical process for mining massive volumes of data; its ultimate goal is access to information and, finally, knowledge. Climatology is a science that uses varied data in massive volumes, and the goal of climate data mining is to obtain information from varied and massive atmospheric and non-atmospheric data. Knowledge discovery performs these activities in a logical, predetermined and almost automatic process. The goal of this research is to study the use of knowledge discovery and data mining techniques in Iranian climate research. To achieve this goal, a content (descriptive) analysis was carried out and studies were classified by method and issue. The results show that in Iranian climatic research clustering, k-means and Ward's method are applied most often, and that precipitation and atmospheric circulation patterns are the issues most frequently addressed. Although several studies on geographical and climatic issues have used statistical techniques such as clustering and pattern extraction, given the nature of statistics and data mining it cannot be said that domestic climate studies truly employ data mining and knowledge discovery techniques. It is nevertheless necessary to use the KDD approach and data mining techniques in climatic studies, in particular for interpreting climate modelling results.
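The two clustering techniques the review finds most common, k-means and Ward's hierarchical method, can be compared on the same data with a few lines of Python; the station matrix below is random stand-in data, not Iranian climate records.

```python
# Compact comparison of the two clustering methods the review finds most
# common (k-means and Ward's hierarchical method), on stand-in station data.
import numpy as np
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(3)
stations = rng.normal(size=(60, 12))   # e.g. 60 stations x 12 monthly precipitation means

km_labels = KMeans(n_clusters=4, n_init=10, random_state=3).fit_predict(stations)

Z = linkage(stations, method="ward")
ward_labels = fcluster(Z, t=4, criterion="maxclust")

# Cross-tabulate the two partitions to see how well they agree
agreement = np.zeros((4, 4), dtype=int)
for a, b in zip(km_labels, ward_labels - 1):
    agreement[a, b] += 1
print(agreement)
```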
Damiani, Lucas Petri; Berwanger, Otavio; Paisani, Denise; Laranjeira, Ligia Nasi; Suzumura, Erica Aranha; Amato, Marcelo Britto Passos; Carvalho, Carlos Roberto Ribeiro; Cavalcanti, Alexandre Biasi
2017-01-01
Background: The Alveolar Recruitment for Acute Respiratory Distress Syndrome Trial (ART) is an international multicenter randomized pragmatic controlled trial with allocation concealment involving 120 intensive care units in Brazil, Argentina, Colombia, Italy, Poland, Portugal, Malaysia, Spain, and Uruguay. The primary objective of ART is to determine whether maximum stepwise alveolar recruitment associated with PEEP titration, adjusted according to the static compliance of the respiratory system (ART strategy), is able to increase 28-day survival in patients with acute respiratory distress syndrome compared to conventional treatment (ARDSNet strategy). Objective: To describe the data management process and statistical analysis plan. Methods: The statistical analysis plan was designed by the trial executive committee and reviewed and approved by the trial steering committee. We provide an overview of the trial design with a special focus on describing the primary (28-day survival) and secondary outcomes. We describe our data management process, data monitoring committee, interim analyses, and sample size calculation. We describe our planned statistical analyses for primary and secondary outcomes as well as pre-specified subgroup analyses. We also provide details for presenting results, including mock tables for baseline characteristics, adherence to the protocol and effect on clinical outcomes. Conclusion: According to best trial practice, we report our statistical analysis plan and data management plan prior to locking the database and beginning analyses. We anticipate that this document will prevent analysis bias and enhance the utility of the reported results. Trial registration: ClinicalTrials.gov number, NCT01374022. PMID:28977255
An affinity-structure database of helix-turn-helix: DNA complexes with a universal coordinate system
AlQuraishi, Mohammed; Tang, Shengdong; Xia, Xide
2015-11-19
Molecular interactions between proteins and DNA molecules underlie many cellular processes, including transcriptional regulation, chromosome replication, and nucleosome positioning. Computational analyses of protein-DNA interactions rely on experimental data characterizing known protein-DNA interactions structurally and biochemically. While many databases exist that contain either structural or biochemical data, few integrate these two data sources in a unified fashion. Such integration is becoming increasingly critical with the rapid growth of structural and biochemical data, and the emergence of algorithms that rely on the synthesis of multiple data types to derive computational models of molecular interactions. We have developed an integrated affinity-structure database in which the experimental and quantitative DNA binding affinities of helix-turn-helix proteins are mapped onto the crystal structures of the corresponding protein-DNA complexes. This database provides access to: (i) protein-DNA structures, (ii) quantitative summaries of protein-DNA binding affinities using position weight matrices, and (iii) raw experimental data of protein-DNA binding instances. Critically, this database establishes a correspondence between experimental structural data and quantitative binding affinity data at the single basepair level. Furthermore, we present a novel alignment algorithm that structurally aligns the protein-DNA complexes in the database and creates a unified residue-level coordinate system for comparing the physico-chemical environments at the interface between complexes. Using this unified coordinate system, we compute the statistics of atomic interactions at the protein-DNA interface of helix-turn-helix proteins. We provide an interactive website for visualization, querying, and analyzing this database, and a downloadable version to facilitate programmatic analysis. Lastly, this database will facilitate the analysis of protein-DNA interactions and the development of programmatic computational methods that capitalize on integration of structural and biochemical datasets. The database can be accessed at http://ProteinDNA.hms.harvard.edu.
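A minimal sketch of building a position weight matrix from aligned binding sites, for readers unfamiliar with the representation; the sequences and pseudocount are toy choices, and the database itself derives its PWMs from quantitative binding-affinity measurements.

```python
# Minimal position weight matrix (PWM) sketch from aligned binding sites.
# Toy sequences and a simple pseudocount; the database itself derives PWMs
# from quantitative binding-affinity measurements.
import numpy as np

sites = ["TTGACA", "TTGACT", "TTTACA", "ATGACA"]
bases = "ACGT"
L = len(sites[0])

counts = np.ones((4, L))                # +1 pseudocount per base and position
for s in sites:
    for j, b in enumerate(s):
        counts[bases.index(b), j] += 1

freqs = counts / counts.sum(axis=0)
pwm = np.log2(freqs / 0.25)             # log-odds against a uniform background

def score(seq):
    return sum(pwm[bases.index(b), j] for j, b in enumerate(seq))

print(np.round(pwm, 2))
print("score(TTGACA) =", round(score("TTGACA"), 2))
```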
Historical analysis of US pipeline accidents triggered by natural hazards
NASA Astrophysics Data System (ADS)
Girgin, Serkan; Krausmann, Elisabeth
2015-04-01
Natural hazards, such as earthquakes, floods, landslides, or lightning, can initiate accidents in oil and gas pipelines with potentially major consequences on the population or the environment due to toxic releases, fires and explosions. Accidents of this type are also referred to as Natech events. Many major accidents highlight the risk associated with natural-hazard impact on pipelines transporting dangerous substances. For instance, in the USA in 1994, flooding of the San Jacinto River caused the rupture of 8 and the undermining of 29 pipelines by the floodwaters. About 5.5 million litres of petroleum and related products were spilled into the river and ignited. As a result, 547 people were injured and significant environmental damage occurred. Post-incident analysis is a valuable tool for better understanding the causes, dynamics and impacts of pipeline Natech accidents in support of future accident prevention and mitigation. Therefore, data on onshore hazardous-liquid pipeline accidents collected by the US Pipeline and Hazardous Materials Safety Administration (PHMSA) was analysed. For this purpose, a database-driven incident data analysis system was developed to aid the rapid review and categorization of PHMSA incident reports. Using an automated data-mining process followed by a peer review of the incident records and supported by natural hazard databases and external information sources, the pipeline Natechs were identified. As a by-product of the data-collection process, the database now includes over 800,000 incidents from all causes in industrial and transportation activities, which are automatically classified in the same way as the PHMSA record. This presentation describes the data collection and reviewing steps conducted during the study, provides information on the developed database and data analysis tools, and reports the findings of a statistical analysis of the identified hazardous liquid pipeline incidents in terms of accident dynamics and consequences.
Su, Chang; Peng, Cuiying; Agbodza, Ena; Bai, Harrison X; Huang, Yuqian; Karakousis, Giorgos; Zhang, Paul J; Zhang, Zishu
2018-03-01
The utilization and impact of the studies published using the National Cancer Database (NCDB) are currently unclear. In this study, we aim to characterize the published studies, and identify relatively unexplored areas for future investigations. A literature search was performed using PubMed in January 2017 to identify all papers published using NCDB data. Characteristics of the publications were extracted. Citation frequencies were obtained through the Web of Science. Three hundred two articles written by 230 first authors met the inclusion criteria. The number of publications has grown exponentially since 2013, with 108 articles published in 2016. Articles were published in 86 journals. The majority of the published papers focused on digestive system cancer, while bone and joints, eye and orbit, myeloma, mesothelioma, and Kaposi sarcoma were never studied. Thirteen institutions in the United States were associated with more than 5 publications. The papers have been cited a total of 9858 times since the publication of the first paper in 1992. Frequently appearing keywords congregated into 3 clusters: "demographics," "treatments and survival," and "statistical analysis method." Even though the main focuses of the articles captured an extremely wide range, they can be classified into 2 main categories: survival analysis and characterization. Other focuses include database(s) analysis and/or comparison, and hospital reporting. The surging interest in the use of NCDB is accompanied by unequal utilization of resources by individuals and institutions. Certain areas were relatively understudied and should be further explored.
NASA Technical Reports Server (NTRS)
Fomenkova, M. N.
1997-01-01
The computer-intensive project consisted of the analysis and synthesis of existing data on composition of comet Halley dust particles. The main objective was to obtain a complete inventory of sulfur containing compounds in the comet Halley dust by building upon the existing classification of organic and inorganic compounds and applying a variety of statistical techniques for cluster and cross-correlational analyses. A student hired for this project wrote and tested the software to perform cluster analysis. The following tasks were carried out: (1) selecting the data from existing database for the proposed project; (2) finding access to a standard library of statistical routines for cluster analysis; (3) reformatting the data as necessary for input into the library routines; (4) performing cluster analysis and constructing hierarchical cluster trees using three methods to define the proximity of clusters; (5) presenting the output results in different formats to facilitate the interpretation of the obtained cluster trees; (6) selecting groups of data points common for all three trees as stable clusters. We have also considered the chemistry of sulfur in inorganic compounds.
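The clustering workflow sketched in tasks (2) through (6) can be illustrated roughly as follows, using SciPy's standard hierarchical clustering with three linkage definitions and keeping groupings that agree across all three trees; the random composition matrix and the distance threshold are assumptions for illustration, not the Halley dust data.

    # Sketch of hierarchical clustering with three linkage definitions and a
    # comparison of the resulting cluster assignments, in the spirit of the
    # workflow above. The data matrix and distance threshold are invented.
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    rng = np.random.default_rng(0)
    data = rng.random((120, 8))               # stand-in for particle compositions

    labels = {}
    for method in ("single", "complete", "average"):
        tree = linkage(data, method=method)   # hierarchical cluster tree
        labels[method] = fcluster(tree, t=1.2, criterion="distance")

    # pairs of particles that share a cluster under all three linkage methods
    same = [np.equal.outer(lab, lab) for lab in labels.values()]
    stable_pairs = np.logical_and.reduce(same)
    print(stable_pairs.sum())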
Data harmonization and federated analysis of population-based studies: the BioSHaRE project
2013-01-01
Background Individual-level data pooling of large population-based studies across research centres in international research projects faces many hurdles. The BioSHaRE (Biobank Standardisation and Harmonisation for Research Excellence in the European Union) project aims to address these issues by building a collaborative group of investigators and developing tools for data harmonization, database integration and federated data analyses. Methods Eight population-based studies in six European countries were recruited to participate in the BioSHaRE project. Through workshops, teleconferences and electronic communications, participating investigators identified a set of 96 variables targeted for harmonization to answer research questions of interest. Using each study’s questionnaires, standard operating procedures, and data dictionaries, harmonization potential was assessed. Whenever harmonization was deemed possible, processing algorithms were developed and implemented in an open-source software infrastructure to transform study-specific data into the target (i.e. harmonized) format. Harmonized datasets located on servers in each research centre across Europe were interconnected through a federated database system to perform statistical analysis. Results Retrospective harmonization led to the generation of common format variables for 73% of matches considered (96 targeted variables across 8 studies). Authenticated investigators can now perform complex statistical analyses of harmonized datasets stored on distributed servers without actually sharing individual-level data using the DataSHIELD method. Conclusion New Internet-based networking technologies and database management systems are providing the means to support collaborative, multi-center research in an efficient and secure manner. The results from this pilot project show that, given a strong collaborative relationship between participating studies, it is possible to seamlessly co-analyse internationally harmonized research databases while allowing each study to retain full control over individual-level data. We encourage additional collaborative research networks in epidemiology, public health, and the social sciences to make use of the open source tools presented herein. PMID:24257327
Sparse approximation of currents for statistics on curves and surfaces.
Durrleman, Stanley; Pennec, Xavier; Trouvé, Alain; Ayache, Nicholas
2008-01-01
Computing, processing, and visualizing statistics on shapes like curves or surfaces is a real challenge with many applications ranging from medical image analysis to computational geometry. Modelling such geometrical primitives with currents avoids feature-based approaches as well as point-correspondence methods. This framework has proved powerful for registering brain surfaces or measuring geometrical invariants. However, while state-of-the-art methods perform pairwise registrations efficiently, new numerical schemes are required to process groupwise statistics because complexity increases as the size of the database grows. Statistics such as the mean and principal modes of a set of shapes often have a heavy and highly redundant representation. We therefore propose to find an adapted basis on which the mean and principal modes have a sparse decomposition. Besides the computational improvement, this sparse representation offers a way to visualize and interpret statistics on currents. Experiments show the relevance of the approach on 34 sets of 70 sulcal lines and on 50 sets of 10 meshes of deep brain structures.
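As a generic illustration of sparse decomposition (not the currents-based algorithm of the paper), the sketch below recovers a sparse representation of a signal on an overcomplete dictionary with orthogonal matching pursuit; the dictionary, signal, and sparsity level are invented.

    # Generic illustration of sparse approximation: represent a "mean" signal
    # with only a few atoms of an overcomplete dictionary using orthogonal
    # matching pursuit. Dictionary, signal, and sparsity level are invented;
    # this is not the currents framework itself.
    import numpy as np
    from sklearn.linear_model import OrthogonalMatchingPursuit

    rng = np.random.default_rng(1)
    dictionary = rng.normal(size=(300, 1000))     # 300 samples x 1000 atoms
    true_coef = np.zeros(1000)
    true_coef[[10, 250, 700]] = [2.0, -1.5, 0.8]  # truly sparse signal
    signal = dictionary @ true_coef + 0.01 * rng.normal(size=300)

    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=3).fit(dictionary, signal)
    print(np.flatnonzero(omp.coef_))              # recovered atom indices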
KMeyeDB: a graphical database of mutations in genes that cause eye diseases.
Kawamura, Takashi; Ohtsubo, Masafumi; Mitsuyama, Susumu; Ohno-Nakamura, Saho; Shimizu, Nobuyoshi; Minoshima, Shinsei
2010-06-01
KMeyeDB (http://mutview.dmb.med.keio.ac.jp/) is a database of human gene mutations that cause eye diseases. We have substantially enriched the amount of data in the database, which now contains information about the mutations of 167 human genes causing eye-related diseases including retinitis pigmentosa, cone-rod dystrophy, night blindness, Oguchi disease, Stargardt disease, macular degeneration, Leber congenital amaurosis, corneal dystrophy, cataract, glaucoma, retinoblastoma, Bardet-Biedl syndrome, and Usher syndrome. KMeyeDB is operated using the database software MutationView, which deals with various characters of mutations, gene structure, protein functional domains, and polymerase chain reaction (PCR) primers, as well as clinical data for each case. Users can access the database using an ordinary Internet browser with smooth user-interface, without user registration. The results are displayed on the graphical windows together with statistical calculations. All mutations and associated data have been collected from published articles. Careful data analysis with KMeyeDB revealed many interesting features regarding the mutations in 167 genes that cause 326 different types of eye diseases. Some genes are involved in multiple types of eye diseases, whereas several eye diseases are caused by different mutations in one gene.
Hazards of Extreme Weather: Flood Fatalities in Texas
NASA Astrophysics Data System (ADS)
Sharif, H. O.; Jackson, T.; Bin-Shafique, S.
2009-12-01
The Federal Emergency Management Agency (FEMA) considers flooding “America’s Number One Natural Hazard”. Despite flood management efforts in many communities, U.S. flood damages remain high, due, in large part, to increasing population and property development in flood-prone areas. Floods are the leading cause of fatalities related to natural disasters in Texas. Texas leads the nation in flash flood fatalities, with about three times more fatalities (840) than the next-ranked state, Pennsylvania (265). This study examined flood fatalities that occurred in Texas between 1960 and 2008. Flood fatality statistics were extracted from three sources: flood fatality databases from the National Climatic Data Center, the Spatial Hazard Event and Loss Database for the United States, and the Texas Department of State Health Services. The data collected for flood fatalities include the date, time, gender, age, location, and weather conditions. Inconsistencies among the three databases were identified and discussed. Analysis reveals that most fatalities result from driving into flood water (about 65%). Spatial analysis indicates that more fatalities occurred in counties containing major urban centers. Hydrologic analysis of a flood event that resulted in five fatalities was performed. A hydrologic model was able to simulate the water level at a location where a vehicle was swept away by flood water, resulting in the death of the driver.
Peto, Maximus V; De la Guardia, Carlos; Winslow, Ksenia; Ho, Andrew; Fortney, Kristen; Morgen, Eric
2017-08-31
Biomarkers of all-cause mortality are of tremendous clinical and research interest. Because of the long potential duration of prospective human lifespan studies, such biomarkers can play a key role in quantifying human aging and quickly evaluating any potential therapies. Decades of research into mortality biomarkers have resulted in numerous associations documented across hundreds of publications. Here, we present MortalityPredictors.org, a manually-curated, publicly accessible database, housing published, statistically-significant relationships between biomarkers and all-cause mortality in population-based or generally healthy samples. To gather the information for this database, we searched PubMed for appropriate research papers and then manually curated relevant data from each paper. We manually curated 1,576 biomarker associations, involving 471 distinct biomarkers. Biomarkers ranged in type from hematologic (red blood cell distribution width) to molecular (DNA methylation changes) to physical (grip strength). Via the web interface, the resulting data can be easily browsed, searched, and downloaded for further analysis. MortalityPredictors.org provides comprehensive results on published biomarkers of human all-cause mortality that can be used to compare biomarkers, facilitate meta-analysis, assist with the experimental design of aging studies, and serve as a central resource for analysis. We hope that it will facilitate future research into human mortality and aging.
Ho, Lap; Cheng, Haoxiang; Wang, Jun; Simon, James E; Wu, Qingli; Zhao, Danyue; Carry, Eileen; Ferruzzi, Mario G; Faith, Jeremiah; Valcarcel, Breanna; Hao, Ke; Pasinetti, Giulio M
2018-03-05
The development of a given botanical preparation for eventual clinical application requires extensive, detailed characterizations of the chemical composition, as well as the biological availability, biological activity, and safety profiles of the botanical. These issues are typically addressed using diverse experimental protocols and model systems. Based on this consideration, in this study we established a comprehensive database and analysis framework for the collection, collation, and integrative analysis of diverse, multiscale data sets. Using this framework, we conducted an integrative analysis of heterogeneous data from in vivo and in vitro investigation of a complex bioactive dietary polyphenol-rich preparation (BDPP) and built an integrated network linking data sets generated from this multitude of diverse experimental paradigms. We established a comprehensive database and analysis framework as well as a systematic and logical means to catalogue and collate the diverse array of information gathered, which is securely stored and added to in a standardized manner to enable fast query. We demonstrated the utility of the database in (1) a statistical ranking scheme to prioritize response to treatments and (2) in-depth reconstruction of functionality studies. By examination of these data sets, the system allows analytical querying of heterogeneous data and access to information related to interactions, mechanisms of action, functions, etc., which ultimately provides a global overview of complex biological responses. Collectively, we present an integrative analysis framework that leads to novel insights on the biological activities of a complex botanical such as BDPP that is based on data-driven characterizations of interactions between BDPP-derived phenolic metabolites and their mechanisms of action, as well as synergism and/or potential cancellation of biological functions. Our integrative analytical approach provides novel means for a systematic integrative analysis of heterogeneous data types in the development of complex botanicals such as polyphenols for eventual clinical and translational applications.
Qu, Shu-Gen; Gao, Jin; Tang, Bo; Yu, Bo; Shen, Yue-Ping; Tu, Yu
2018-05-01
Low-dose ionizing radiation (LDIR) may increase mortality from solid cancers in nuclear industry workers, but only a few individual cohort studies exist, and the available reports have low statistical power. The aim of the present study was to focus on solid cancer mortality risk from LDIR in the nuclear industry using standardized mortality ratios (SMRs) and 95% confidence intervals. A systematic literature search through the PubMed and Embase databases identified 27 studies relevant to this meta-analysis. There was statistical significance for total, solid and lung cancers, with meta-SMR values of 0.88, 0.80, and 0.89, respectively. There was evidence of stochastic effects from IR, but more definitive conclusions require additional analyses using standardized protocols to determine whether LDIR increases the risk of solid cancer-related mortality.
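A minimal sketch of how standardized mortality ratios can be pooled across cohorts is shown below, using a fixed-effect inverse-variance combination on the log scale; the SMRs, death counts, and variance approximation are illustrative assumptions, not the data of this meta-analysis.

    # Illustrative fixed-effect pooling of standardized mortality ratios (SMRs)
    # on the log scale, with variances approximated as 1/observed deaths.
    # Numbers are invented and do not reproduce the meta-analysis above.
    import numpy as np

    smr = np.array([0.85, 0.92, 0.78, 0.95])        # cohort-specific SMRs
    observed = np.array([120, 300, 95, 210])        # observed deaths per cohort

    log_smr = np.log(smr)
    var = 1.0 / observed                            # approximate variance of log(SMR)
    weights = 1.0 / var

    pooled = np.exp(np.sum(weights * log_smr) / np.sum(weights))
    se = np.sqrt(1.0 / np.sum(weights))
    ci = np.exp(np.log(pooled) + np.array([-1.96, 1.96]) * se)
    print(f"meta-SMR {pooled:.2f}, 95% CI {ci[0]:.2f}-{ci[1]:.2f}")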
Sun, Jian; Zhang, Lei; Cui, Jing; Li, Shanshan; Lu, Hongting; Zhang, Yong; Li, Haiming; Sun, Jianping; Baloch, Zulqarnain
2018-05-10
Previous studies have shown beneficial effects of dietary approaches for iron deficiency anemia (IDA) control. This study was designed to investigate the effect of dietary intervention on children with iron deficiency anemia. We performed a systematic review and meta-analysis of the effect of published dietary interventions on IDA treatment. The CBM, CNKI, Wanfang, EMBASE, VIP, PubMed and Web of Science databases were searched to identify studies published between January 1980 and December 2016. Statistical analysis was performed with RevMan 5.2 software. Initially, we retrieved 373 studies; 6 studies with a total of 676 individuals were then included in the analysis according to the inclusion and exclusion criteria for meta-analysis. The overall pooled estimate of the odds ratio (OR) for the dietary intervention in children with iron deficiency anemia was 6.54 (95% CI: 3.48-12.31, Z = 5.82, p<0.001), and the funnel plot was symmetric. Our meta-analysis suggested that dietary interventions are effective in improving iron deficiency in children with IDA and should be considered in the overall strategy of IDA management.
Martinez-Murcia, Francisco Jesús; Lai, Meng-Chuan; Górriz, Juan Manuel; Ramírez, Javier; Young, Adam M H; Deoni, Sean C L; Ecker, Christine; Lombardo, Michael V; Baron-Cohen, Simon; Murphy, Declan G M; Bullmore, Edward T; Suckling, John
2017-03-01
Neuroimaging studies have reported structural and physiological differences that could help understand the causes and development of Autism Spectrum Disorder (ASD). Many of them rely on multisite designs, with the recruitment of larger samples increasing statistical power. However, recent large-scale studies have put some findings into question, considering the results to be strongly dependent on the database used, and demonstrating the substantial heterogeneity within this clinically defined category. One major source of variance may be the acquisition of the data in multiple centres. In this work we analysed the differences found in the multisite, multi-modal neuroimaging database from the UK Medical Research Council Autism Imaging Multicentre Study (MRC AIMS) in terms of both diagnosis and acquisition sites. Since the dissimilarities between sites were higher than between diagnostic groups, we developed a technique called Significance Weighted Principal Component Analysis (SWPCA) to reduce the undesired intensity variance due to acquisition site and to increase the statistical power in detecting group differences. After eliminating site-related variance, statistically significant group differences were found, including Broca's area and the temporo-parietal junction. However, discriminative power was not sufficient to classify diagnostic groups, yielding accuracies close to random. Our work supports recent claims that ASD is a highly heterogeneous condition that is difficult to globally characterize by neuroimaging, and therefore different (and more homogeneous) subgroups should be defined to obtain a deeper understanding of ASD. Hum Brain Mapp 38:1208-1223, 2017. © 2016 Wiley Periodicals, Inc.
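A loose, simplified stand-in for this strategy (not the authors' SWPCA) is sketched below: score each principal component by how strongly it separates acquisition sites and reconstruct the data without the most site-related components; the data, site labels, and number of removed components are assumptions.

    # Simplified sketch (not the authors' SWPCA): remove the principal
    # components whose scores separate acquisition sites most strongly,
    # as judged by a one-way ANOVA F-statistic, then reconstruct the data.
    import numpy as np
    from scipy import stats
    from sklearn.decomposition import PCA

    def remove_site_components(X, site_labels, n_remove=2):
        pca = PCA()
        scores = pca.fit_transform(X)                 # samples x components
        f_vals = []
        for c in range(scores.shape[1]):
            groups = [scores[site_labels == s, c] for s in np.unique(site_labels)]
            f_vals.append(stats.f_oneway(*groups).statistic)
        drop = np.argsort(f_vals)[-n_remove:]         # most site-related components
        scores[:, drop] = 0.0
        return pca.inverse_transform(scores)          # data with site variance reduced

    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 10))
    sites = np.repeat([0, 1], 20)
    X[sites == 1] += 0.8                              # hypothetical site offset
    X_clean = remove_site_components(X, sites)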
NASA Astrophysics Data System (ADS)
Kadhem, Hasan; Amagasa, Toshiyuki; Kitagawa, Hiroyuki
Encryption can provide strong security for sensitive data against inside and outside attacks. This is especially true in the “Database as Service” model, where confidentiality and privacy are important issues for the client. In fact, existing encryption approaches are vulnerable to a statistical attack because each value is encrypted to another fixed value. This paper presents a novel database encryption scheme called MV-OPES (Multivalued — Order Preserving Encryption Scheme), which allows privacy-preserving queries over encrypted databases with an improved security level. Our idea is to encrypt a value to multiple different values to prevent statistical attacks. At the same time, MV-OPES preserves the order of the integer values to allow comparison operations to be directly applied on encrypted data. Using calculated distance (range), we propose a novel method that allows a join query between relations based on inequality over encrypted values. We also present techniques to offload query execution load to a database server as much as possible, thereby making better use of server resources in a database outsourcing environment. Our scheme can easily be integrated with current database systems as it is designed to work with existing indexing structures. It is robust against statistical attack and the estimation of true values. MV-OPES experiments show that security for sensitive data can be achieved with reasonable overhead, establishing the practicability of the scheme.
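The core intuition of a multivalued, order-preserving scheme can be conveyed with the toy sketch below, which is emphatically not the MV-OPES construction: each integer plaintext owns a disjoint ciphertext bucket and every encryption draws a fresh random value from that bucket, so equal plaintexts yield different ciphertexts while order comparisons on ciphertexts remain valid; the key handling and bucket width are placeholders.

    # Toy illustration of the general idea behind order-preserving, multivalued
    # encryption (not the actual MV-OPES construction): each integer plaintext
    # owns a disjoint ciphertext bucket, and every encryption draws a fresh
    # random value from that bucket, so equal plaintexts map to different
    # ciphertexts while order comparisons on ciphertexts remain valid.
    import random

    BUCKET = 1000          # hypothetical bucket width per plaintext value

    def encrypt(value, key):
        low = key + value * BUCKET
        return random.randint(low, low + BUCKET - 1)

    def less_than(cipher_a, cipher_b):
        return cipher_a < cipher_b     # order is preserved across buckets

    key = 123456
    c1, c2 = encrypt(42, key), encrypt(42, key)
    c3 = encrypt(43, key)
    print(c1 != c2, less_than(c1, c3))   # True True (with high probability)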
ERIC Educational Resources Information Center
Gruner, Richard; Heron, Carol E.
1984-01-01
Examines usefulness of DIALOG as legal research tool through use of DIALOG's DIALINDEX database to identify those databases among almost 200 available that contain large numbers of records related to federal securities regulation. Eight databases selected for further study are detailed. Twenty-six footnotes, database statistics, and samples are…
Clauson, Kevin A; Polen, Hyla H; Peak, Amy S; Marsh, Wallace A; DiScala, Sandra L
2008-11-01
Clinical decision support tools (CDSTs) on personal digital assistants (PDAs) and online databases assist healthcare practitioners who make decisions about dietary supplements. To assess and compare the content of PDA dietary supplement databases and their online counterparts used as CDSTs. A total of 102 question-and-answer pairs were developed within 10 weighted categories of the most clinically relevant aspects of dietary supplement therapy. PDA versions of AltMedDex, Lexi-Natural, Natural Medicines Comprehensive Database, and Natural Standard and their online counterparts were assessed by scope (percent of correct answers present), completeness (3-point scale), ease of use, and a composite score integrating all 3 criteria. Descriptive statistics and inferential statistics, including a chi-square test, Scheffé's multiple comparison test, McNemar's test, and the Wilcoxon signed rank test, were used to analyze data. The scope scores for PDA databases were: Natural Medicines Comprehensive Database 84.3%, Natural Standard 58.8%, Lexi-Natural 50.0%, and AltMedDex 36.3%, with Natural Medicines Comprehensive Database statistically superior (p < 0.01). Completeness scores were: Natural Medicines Comprehensive Database 78.4%, Natural Standard 51.0%, Lexi-Natural 43.5%, and AltMedDex 29.7%. Lexi-Natural was superior in ease of use (p < 0.01). Composite scores for PDA databases were: Natural Medicines Comprehensive Database 79.3, Natural Standard 53.0, Lexi-Natural 48.0, and AltMedDex 32.5, with Natural Medicines Comprehensive Database superior (p < 0.01). There was no difference between the scope for PDA and online database pairs with Lexi-Natural (50.0% and 53.9%, respectively) or Natural Medicines Comprehensive Database (84.3% and 84.3%, respectively) (p > 0.05), whereas differences existed for AltMedDex (36.3% vs 74.5%, respectively) and Natural Standard (58.8% vs 80.4%, respectively) (p < 0.01). For composite scores, AltMedDex and Natural Standard online were better than their PDA counterparts (p < 0.01). Natural Medicines Comprehensive Database achieved significantly higher scope, completeness, and composite scores compared with other dietary supplement PDA CDSTs in this study. There was no difference between the PDA and online databases for Lexi-Natural and Natural Medicines Comprehensive Database, whereas online versions of AltMedDex and Natural Standard were significantly better than their PDA counterparts.
Constantinescu, Alexandra C; Wolters, Maria; Moore, Adam; MacPherson, Sarah E
2017-06-01
The International Affective Picture System (IAPS; Lang, Bradley, & Cuthbert, 2008) is a stimulus database that is frequently used to investigate various aspects of emotional processing. Despite its extensive use, selecting IAPS stimuli for a research project is not usually done according to an established strategy, but rather is tailored to individual studies. Here we propose a standard, replicable method for stimulus selection based on cluster analysis, which re-creates the group structure that is most likely to have produced the valence, arousal, and dominance norms associated with the IAPS images. Our method includes screening the database for outliers, identifying a suitable clustering solution, and then extracting the desired number of stimuli on the basis of their level of certainty of belonging to the cluster they were assigned to. Our method preserves statistical power in studies by maximizing the likelihood that the stimuli belong to the cluster structure fitted to them, and by filtering stimuli according to their certainty of cluster membership. In addition, although our cluster-based method is illustrated using the IAPS, it can be extended to other stimulus databases.
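A replicable selection strategy in this spirit might look roughly like the sketch below, which fits a Gaussian mixture to valence/arousal/dominance norms and keeps the stimuli most certain of their cluster membership; the random norms, the number of clusters, and the per-cluster quota are placeholders, not the IAPS data or the authors' exact pipeline.

    # Hedged sketch of the general strategy described above: fit a mixture model
    # to valence/arousal/dominance norms and keep the stimuli whose posterior
    # probability of cluster membership is highest. The data here are random
    # stand-ins, not IAPS norms, and the number of clusters is an assumption.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(1)
    norms = rng.uniform(1, 9, size=(200, 3))          # columns: valence, arousal, dominance

    gmm = GaussianMixture(n_components=4, random_state=0).fit(norms)
    labels = gmm.predict(norms)
    certainty = gmm.predict_proba(norms).max(axis=1)  # certainty of cluster membership

    # keep the 20 most certain stimuli from each cluster
    selected = [np.where(labels == k)[0][np.argsort(certainty[labels == k])[-20:]]
                for k in range(4)]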
PlantNATsDB: a comprehensive database of plant natural antisense transcripts.
Chen, Dijun; Yuan, Chunhui; Zhang, Jian; Zhang, Zhao; Bai, Lin; Meng, Yijun; Chen, Ling-Ling; Chen, Ming
2012-01-01
Natural antisense transcripts (NATs), as one type of regulatory RNAs, occur prevalently in plant genomes and play significant roles in physiological and pathological processes. Although their important biological functions have been reported widely, a comprehensive database has been lacking until now. Consequently, we constructed a plant NAT database (PlantNATsDB) involving approximately 2 million NAT pairs in 69 plant species. GO annotation and high-throughput small RNA sequencing data currently available were integrated to investigate the biological function of NATs. PlantNATsDB provides various user-friendly web interfaces to facilitate the presentation of NATs and an integrated, graphical network browser to display the complex networks formed by different NATs. Moreover, a 'Gene Set Analysis' module based on GO annotation was designed to identify statistically significantly overrepresented GO categories in a specific NAT network. PlantNATsDB is currently the most comprehensive resource of NATs in the plant kingdom, which can serve as a reference database to investigate the regulatory function of NATs. The PlantNATsDB is freely available at http://bis.zju.edu.cn/pnatdb/.
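The kind of overrepresentation test such a 'Gene Set Analysis' module performs can be sketched with a hypergeometric test asking whether a GO category is enriched among the genes of a NAT network relative to all annotated genes; the counts below are made-up illustrative numbers.

    # Minimal sketch of a GO overrepresentation test: a hypergeometric test
    # asking whether a GO category is enriched among the genes in a NAT
    # network, versus all annotated genes. Counts are invented for illustration.
    from scipy.stats import hypergeom

    population = 20000      # all annotated genes
    category = 300          # genes annotated to the GO category
    network = 150           # genes in the NAT network of interest
    overlap = 12            # network genes that carry the GO annotation

    # P(X >= overlap) under random sampling without replacement
    p_value = hypergeom.sf(overlap - 1, population, category, network)
    print(f"enrichment p-value: {p_value:.3g}")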
SDAR 1.0: A New Quantitative Toolkit for Analyzing Stratigraphic Data
NASA Astrophysics Data System (ADS)
Ortiz, John; Moreno, Carlos; Cardenas, Andres; Jaramillo, Carlos
2015-04-01
Since the foundation of stratigraphy, geoscientists have recognized that data obtained from stratigraphic columns (SC), two-dimensional schemes recording descriptions of both geological and paleontological features (e.g., thickness of rock packages, grain size, fossil and lithological components, and sedimentary structures), are key elements for establishing reliable hypotheses about the distribution in space and time of rock sequences, and about ancient sedimentary environmental and paleobiological dynamics. Despite the tremendous advances in the way geoscientists store, plot, and quantitatively analyze sedimentological and paleontological data (e.g., Macrostrat [http://www.macrostrat.org/], Paleobiology Database [http://www.paleodb.org/], respectively), there is still a lack of computational methodologies designed to quantitatively examine data from highly detailed SCs. Moreover, the stratigraphic information is frequently plotted "manually" using vector graphics editors (e.g., Corel Draw, Illustrator); however, this information, although stored in a digital format, cannot readily be used for any quantitative analysis. Therefore, any attempt to examine the stratigraphic data in an analytical fashion necessarily takes further steps. Given these issues, we have developed the software 'Stratigraphic Data Analysis in R' (SDAR), which stores in a database all sedimentological, stratigraphic, and paleontological information collected from a SC, allowing users to generate high-quality graphic plots (including one or multiple features stored in the database). SDAR also encompasses quantitative analyses helping users to quantify stratigraphic information (e.g., grain size, sorting and rounding, proportion of sand/shale). Finally, given that the SDAR analysis module has been written in the open-source, high-level R graphics/statistics language [R Development Core Team, 2014], it is already loaded with many of the crucial features required to accomplish basic and complex tasks of statistical analysis (i.e., the R language provides more than a hundred spatial libraries that allow users to explore various geostatistical and spatial analyses). Consequently, SDAR allows a deeper exploration of the stratigraphic data collected in the field and will allow the geoscientific community in the near future to develop complex analyses related to the distribution in space and time of rock sequences, such as lithofacies correlations, by a multivariate comparison between empirical SCs and quantitative lithofacies models established from modern sedimentary environments.
Scalable privacy-preserving data sharing methodology for genome-wide association studies.
Yu, Fei; Fienberg, Stephen E; Slavković, Aleksandra B; Uhler, Caroline
2014-08-01
The protection of privacy of individual-level information in genome-wide association study (GWAS) databases has been a major concern of researchers following the publication of "an attack" on GWAS data by Homer et al. (2008). Traditional statistical methods for confidentiality and privacy protection of statistical databases do not scale well to deal with GWAS data, especially in terms of guarantees regarding protection from linkage to external information. The more recent concept of differential privacy, introduced by the cryptographic community, is an approach that provides a rigorous definition of privacy with meaningful privacy guarantees in the presence of arbitrary external information, although the guarantees may come at a serious price in terms of data utility. Building on such notions, Uhler et al. (2013) proposed new methods to release aggregate GWAS data without compromising an individual's privacy. We extend the methods developed in Uhler et al. (2013) for releasing differentially-private χ²-statistics by allowing for an arbitrary number of cases and controls, and for releasing differentially-private allelic test statistics. We also provide a new interpretation by assuming the controls' data are known, which is a realistic assumption because some GWAS use publicly available data as controls. We assess the performance of the proposed methods through a risk-utility analysis on a real data set consisting of DNA samples collected by the Wellcome Trust Case Control Consortium and compare the methods with the differentially-private release mechanism proposed by Johnson and Shmatikov (2013). Copyright © 2014 Elsevier Inc. All rights reserved.
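The basic differential-privacy recipe underlying such releases can be sketched as adding Laplace noise calibrated to an assumed sensitivity and privacy budget; this is a generic illustration, not the specific mechanisms of Uhler et al. (2013) or of this paper, and the statistic, sensitivity bound, and epsilon below are placeholders.

    # Illustrative sketch (under assumptions, not the method of Uhler et al. or
    # of this paper): releasing a chi-squared test statistic with Laplace noise
    # calibrated to an assumed sensitivity, the basic recipe of differential privacy.
    import numpy as np

    def laplace_release(statistic, sensitivity, epsilon, rng=np.random.default_rng()):
        scale = sensitivity / epsilon
        return statistic + rng.laplace(0.0, scale)

    chi2_stat = 7.3          # hypothetical allelic test statistic for one SNP
    sensitivity = 8.0        # assumed bound on how much one individual can change it
    epsilon = 1.0            # privacy budget
    print(laplace_release(chi2_stat, sensitivity, epsilon))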
Fendrich, S; Pothmann, J
2010-10-01
The present database concerning the extent of neglect and abuse of children in Germany, and accordingly the endangerment of their health and well-being, has to be considered deficient. Nevertheless, the degree of danger is indicated by sporadic empirical research as well as by police crime statistics, health statistics and the official statistics on child and youth welfare. In contrast to the general public opinion, the analyses of the available data show a stagnation of the infanticide rate at a historically low level and even a decline in infanticide in recent years. Meanwhile, according to the statistics, sensitivity to the threats of neglect and abuse of children is increasing. Especially the clarification of the protection mandate in the Child and Youth Welfare Act (§ 8a SGB VIII) contributed to the raised interest and attention from child and youth welfare services. However, these contexts are insufficiently researched, which makes an improvement of the database indispensable. Therefore, a continuous registration and documentation of cases of child neglect and abuse is necessary. A promising option for attaining a meaningful database is the routine collection of data in the context of official statistics by the child and youth welfare departments.
Shih, Wei-Liang; Kao, Chung-Feng; Chuang, Li-Chung; Kuo, Po-Hsiu
2012-01-01
MicroRNAs (miRNAs) are known to be important post-transcriptional regulators that are involved in the etiology of complex psychiatric traits. The present study aimed to incorporate miRNA information into pathway analysis using a genome-wide association dataset to identify relevant biological pathways for bipolar disorder (BPD). We selected psychiatric- and neurological-associated miRNAs (N = 157) from the PhenomiR database. The miRNA target gene (miTG) predictions were obtained from microRNA.org. Canonical pathways (N = 4,051) were downloaded from the Molecular Signatures Database. We employed a novel weighting scheme for miTGs in pathway analysis using methods of gene set enrichment analysis and sum-statistic. Under four statistical scenarios, 38 significantly enriched pathways (P-value < 0.01 after multiple testing correction) were identified for the risk of developing BPD, including ion channel-associated pathways (e.g., gated channel activity, ion transmembrane transporter activity, and ion channel activity) and nervous system-related biological processes (e.g., nervous system development, cytoskeleton, and neuroactive ligand receptor interaction). Among them, 19 were identified only when the weighting scheme was applied. Many miRNA-targeted genes were functionally related to ion channels, collagen, and axonal growth and guidance, which have previously been suggested to be associated with BPD. Some of these genes are linked to the regulation of miRNA machinery in the literature. Our findings provide support for the potential involvement of miRNAs in the psychopathology of BPD. Further investigations to elucidate the functions and mechanisms of identified candidate pathways are needed. PMID:23264780
Song, Guo-Min; Tian, Xu; Liu, Xiao-Ling; Chen, Hui; Zhou, Jian-Guo; Bian, Wei; Chen, Wei-Qing
2017-06-06
This systematic review and meta-analysis aims to systematically assess the effects of concurrent chemo-radiotherapy (CRT) compared with radiotherapy (RT) alone for elderly Chinese patients with non-metastatic esophageal squamous cancer. We searched the PubMed, EMBASE, Cochrane Central Register of Controlled Trials (CENTRAL), China Biomedical Literature Database (CBM), and China National Knowledge Infrastructure (CNKI) databases. We retrieved randomized controlled trials on concurrent CRT with Gimeracil and Oteracil Potassium (S-1) compared with RT alone for elderly Chinese patients with non-metastatic esophageal squamous cancer performed until August 2016. Eight eligible studies involving 536 patients were subjected to meta-analysis. As a response rate measure, a relative risk (RR) of 1.37 [95% confidence intervals (CIs): 1.24, 1.53; P = 0.00], which reached statistical significance, was estimated when concurrent CRT with S-1 was performed compared with RT alone. Sensitivity analysis on response rate confirmed the robustness of the pooled result. The RR values of 1.44 (95% CIs: 1.22, 1.70; P = 0.00) and 1.77 (95% CIs: 1.26, 2.48; P = 0.00) estimated for the 1- and 2-year survival rate indices, respectively, were also statistically significant. The incidence of adverse events was similar in both groups. This review concluded that concurrent CRT with S-1 can improve the efficacy and prolong the survival period of elderly Chinese patients with non-metastatic esophageal squamous cancer and does not significantly increase the acute adverse effects compared with RT alone.
Dasari, Surendra; Chambers, Matthew C.; Martinez, Misti A.; Carpenter, Kristin L.; Ham, Amy-Joan L.; Vega-Montoto, Lorenzo J.; Tabb, David L.
2012-01-01
Spectral libraries have emerged as a viable alternative to protein sequence databases for peptide identification. These libraries contain previously detected peptide sequences and their corresponding tandem mass spectra (MS/MS). Search engines can then identify peptides by comparing experimental MS/MS scans to those in the library. Many of these algorithms employ the dot product score for measuring the quality of a spectrum-spectrum match (SSM). This scoring system does not offer a clear statistical interpretation and ignores fragment ion m/z discrepancies in the scoring. We developed a new spectral library search engine, Pepitome, which employs statistical systems for scoring SSMs. Pepitome outperformed the leading library search tool, SpectraST, when analyzing data sets acquired on three different mass spectrometry platforms. We characterized the reliability of spectral library searches by confirming shotgun proteomics identifications through RNA-Seq data. Applying spectral library and database searches on the same sample revealed their complementary nature. Pepitome identifications enabled the automation of quality analysis and quality control (QA/QC) for shotgun proteomics data acquisition pipelines. PMID:22217208
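For reference, the conventional dot-product score that Pepitome moves beyond can be sketched as the cosine similarity of binned peak-intensity vectors; the bin width and the example peak lists are illustrative assumptions.

    # Sketch of the conventional dot-product score for a spectrum-spectrum match:
    # bin the peaks, then take the cosine of the two intensity vectors.
    # Binning width and the example peak lists are illustrative assumptions.
    import numpy as np

    def binned_vector(peaks, bin_width=1.0, max_mz=2000.0):
        vec = np.zeros(int(max_mz / bin_width))
        for mz, intensity in peaks:
            vec[int(mz / bin_width)] += intensity
        return vec

    def dot_product_score(query_peaks, library_peaks):
        q = binned_vector(query_peaks)
        l = binned_vector(library_peaks)
        return float(q @ l / (np.linalg.norm(q) * np.linalg.norm(l) + 1e-12))

    query = [(175.1, 300.0), (262.2, 120.0), (404.3, 80.0)]
    library = [(175.1, 280.0), (262.1, 150.0), (500.2, 40.0)]
    print(round(dot_product_score(query, library), 3))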
Wood, Eric; Duran, Adam; Kelly, Kenneth
2016-09-27
In collaboration with the U.S. Environmental Protection Agency and the U.S. Department of Energy, the National Renewable Energy Laboratory has conducted a national analysis of road grade characteristics experienced by U.S. medium- and heavy-duty trucks on controlled access highways. These characteristics have been developed using TomTom's commercially available street map and road grade database. Using the TomTom national road grade database, national statistics on road grade and hill distances were generated for the U.S. network of controlled access highways. These statistical distributions were then weighted using data provided by the U.S. Environmental Protection Agency for activity of medium- and heavy-duty trucks on controlled access highways. Here, the national activity-weighted road grade and hill distance distributions were then used as targets for development of a handful of sample grade profiles potentially to be used in the U.S. Environmental Protection Agency's Greenhouse Gas Emissions Model certification tool as well as in dynamometer testing of medium- and heavy-duty vehicles and their powertrains.
Genetics and attribution issues that confront the microbial forensics field.
Budowle, Bruce
2004-12-02
The commission of an act of bioterrorism or biocrime is a real concern for law enforcement and society. Efforts are underway to develop a strong microbial forensic program to assist in identifying perpetrators of acts of bioterrorism and biocrimes, as well as serve as a deterrent for those who might commit such illicit acts. Genetic analyses of microbial organisms will likely be a powerful tool for attribution of criminal acts. There are some similarities to forensic human DNA analysis practices, such as: molecular biology technology, use of population databases, qualitative conclusions of test results, and the application of QA/QC practices. Differences include: database size and composition, statistical interpretation methods, and confidence/uncertainty in the outcome of an interpretation.
Effect of microstructure on the elasto-viscoplastic deformation of dual phase titanium structures
NASA Astrophysics Data System (ADS)
Ozturk, Tugce; Rollett, Anthony D.
2018-02-01
The present study is devoted to the creation of a process-structure-property database for dual phase titanium alloys, through a synthetic microstructure generation method and a mesh-free fast Fourier transform based micromechanical model that operates on a discretized image of the microstructure. A sensitivity analysis is performed as a precursor to determine the statistically representative volume element size for creating 3D synthetic microstructures based on additively manufactured Ti-6Al-4V characteristics, which are further modified to expand the database for features of interest, e.g., lath thickness. Sets of titanium hardening parameters are extracted from the literature, and the relative effect of the chosen microstructural features is quantified through comparisons of average and local field distributions.
GIS based solid waste management information system for Nagpur, India.
Vijay, Ritesh; Jain, Preeti; Sharma, N; Bhattacharyya, J K; Vaidya, A N; Sohony, R A
2013-01-01
Solid waste management is one of the major problems of today's world and needs to be addressed by proper utilization of technologies and design of effective, flexible and structured information system. Therefore, the objective of this paper was to design and develop a GIS based solid waste management information system as a decision making and planning tool for regularities and municipal authorities. The system integrates geo-spatial features of the city and database of existing solid waste management. GIS based information system facilitates modules of visualization, query interface, statistical analysis, report generation and database modification. It also provides modules like solid waste estimation, collection, transportation and disposal details. The information system is user-friendly, standalone and platform independent.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Portwood, J.T.
1995-12-31
This paper discusses a database of information collected and organized during the past eight years from 2,000 producing oil wells in the United States, all of which have been treated with special applications techniques developed to improve the effectiveness of MEOR technology. The database, believed to be the first of its kind, has been generated for the purpose of statistically evaluating the effectiveness and economics of the MEOR process in a wide variety of oil reservoir environments, and is a tool that can be used to improve the predictability of treatment response. The information in the database has also been evaluated to determine which, if any, reservoir characteristics are dominant factors in determining the applicability of MEOR.
Influence of Deployment on the Use of E-Cigarettes in the United States Army and Air Force
2018-03-22
the "Tobacco Use Among Service Members" survey sponsored by the Murtha Cancer Center and the Postgraduate Dental School of the Uniformed Services...the study period, and were willing to complete the survey . The survey was voluntary and anonymous; no personally identifiable information was...collected about participants. Statistical analysis of the data obtained from this survey database was performed using SAS. The independent variables were
Daniel Goodman’s empirical approach to Bayesian statistics
Gerrodette, Tim; Ward, Eric; Taylor, Rebecca L.; Schwarz, Lisa K.; Eguchi, Tomoharu; Wade, Paul; Himes Boor, Gina
2016-01-01
Bayesian statistics, in contrast to classical statistics, uses probability to represent uncertainty about the state of knowledge. Bayesian statistics has often been associated with the idea that knowledge is subjective and that a probability distribution represents a personal degree of belief. Dr. Daniel Goodman considered this viewpoint problematic for issues of public policy. He sought to ground his Bayesian approach in data, and advocated the construction of a prior as an empirical histogram of “similar” cases. In this way, the posterior distribution that results from a Bayesian analysis combined comparable previous data with case-specific current data, using Bayes’ formula. Goodman championed such a data-based approach, but he acknowledged that it was difficult in practice. If based on a true representation of our knowledge and uncertainty, Goodman argued that risk assessment and decision-making could be an exact science, despite the uncertainties. In his view, Bayesian statistics is a critical component of this science because a Bayesian analysis produces the probabilities of future outcomes. Indeed, Goodman maintained that the Bayesian machinery, following the rules of conditional probability, offered the best legitimate inference from available data. We give an example of an informative prior in a recent study of Steller sea lion spatial use patterns in Alaska.
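A worked toy example of the data-based prior Goodman advocated: an empirical histogram of rates from "similar" past cases serves as the prior, and Bayes' formula combines it with current binomial data on a probability grid; all numbers are invented for illustration.

    # Worked toy example of an empirical-histogram prior combined with current
    # binomial data on a probability grid via Bayes' formula. All numbers are
    # invented; this only illustrates the general recipe described above.
    import numpy as np

    grid = np.linspace(0.01, 0.99, 99)

    # prior: histogram of rates observed in comparable past cases
    past_rates = np.array([0.62, 0.70, 0.55, 0.66, 0.74, 0.60, 0.68, 0.71])
    hist, edges = np.histogram(past_rates, bins=10, range=(0, 1), density=True)
    prior = hist[np.digitize(grid, edges[1:-1])]
    prior = prior / prior.sum()

    # likelihood: current case-specific data, 14 successes out of 20
    k, n = 14, 20
    likelihood = grid**k * (1 - grid)**(n - k)

    posterior = prior * likelihood
    posterior /= posterior.sum()
    print(f"posterior mean: {np.sum(grid * posterior):.3f}")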
Feature maps driven no-reference image quality prediction of authentically distorted images
NASA Astrophysics Data System (ADS)
Ghadiyaram, Deepti; Bovik, Alan C.
2015-03-01
Current blind image quality prediction models rely on benchmark databases comprised of singly and synthetically distorted images, thereby learning image features that are only adequate to predict human perceived visual quality on such inauthentic distortions. However, real world images often contain complex mixtures of multiple distortions. Rather than a) discounting the effect of these mixtures of distortions on an image's perceptual quality and considering only the dominant distortion or b) using features that are only proven to be efficient for singly distorted images, we deeply study the natural scene statistics of authentically distorted images, in different color spaces and transform domains. We propose a feature-maps-driven statistical approach which avoids any latent assumptions about the type of distortion(s) contained in an image, and focuses instead on modeling the remarkable consistencies in the scene statistics of real world images in the absence of distortions. We design a deep belief network that takes model-based statistical image features derived from a very large database of authentically distorted images as input and discovers good feature representations by generalizing over different distortion types, mixtures, and severities, which are later used to learn a regressor for quality prediction. We demonstrate the remarkable competence of our features for improving automatic perceptual quality prediction on a benchmark database and on the newly designed LIVE Authentic Image Quality Challenge Database and show that our approach of combining robust statistical features and the deep belief network dramatically outperforms the state-of-the-art.
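One classic natural-scene-statistics computation of the kind such models build on is the mean-subtracted contrast-normalized (MSCN) transform, whose coefficient distribution is highly regular for undistorted images; the sketch below is a generic version under assumed filter parameters, not the authors' specific feature maps.

    # Hedged sketch of a classic natural-scene-statistics feature: mean-subtracted
    # contrast-normalized (MSCN) coefficients computed with Gaussian local mean
    # and local standard deviation. Generic illustration, not the authors' features.
    import numpy as np
    from scipy.ndimage import gaussian_filter

    def mscn_coefficients(image, sigma=7 / 6, c=1.0):
        image = image.astype(np.float64)
        mu = gaussian_filter(image, sigma)                     # local mean
        sigma_map = np.sqrt(np.abs(gaussian_filter(image**2, sigma) - mu**2))
        return (image - mu) / (sigma_map + c)                  # local normalization

    rng = np.random.default_rng(0)
    img = rng.integers(0, 256, size=(64, 64))                  # stand-in for a real image
    mscn = mscn_coefficients(img)
    print(mscn.mean(), mscn.std())                             # near-zero mean, roughly unit spread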
The landslide database for Germany: Closing the gap at national level
NASA Astrophysics Data System (ADS)
Damm, Bodo; Klose, Martin
2015-11-01
The Federal Republic of Germany has long been among the few European countries that lack a national landslide database. Systematic collection and inventory of landslide data still has a long research history in Germany, but one focussed on the development of databases with local or regional coverage. This has changed in recent years with the launch of a database initiative aimed at closing the data gap existing at national level. The present paper reports on this project that is based on a landslide database which evolved over the last 15 years to a database covering large parts of Germany. A strategy of systematic retrieval, extraction, and fusion of landslide data is at the heart of the methodology, providing the basis for a database with a broad potential of application. The database offers a data pool of more than 4,200 landslide data sets with over 13,000 single data files and dates back to the 12th century. All types of landslides are covered by the database, which stores not only core attributes, but also various complementary data, including data on landslide causes, impacts, and mitigation. The current database migration to PostgreSQL/PostGIS is focused on unlocking the full scientific potential of the database, while enabling data sharing and knowledge transfer via a web GIS platform. In this paper, the goals and the research strategy of the database project are highlighted at first, with a summary of best practices in database development providing perspective. Next, the focus is on key aspects of the methodology, which is followed by the results of three case studies in the German Central Uplands. The case study results exemplify database application in the analysis of landslide frequency and causes, impact statistics, and landslide susceptibility modeling. Using the example of these case studies, strengths and weaknesses of the database are discussed in detail. The paper concludes with a summary of the database project with regard to previous achievements and the strategic roadmap.
Statistical analysis of microgravity experiment performance using the degrees of success scale
NASA Technical Reports Server (NTRS)
Upshaw, Bernadette; Liou, Ying-Hsin Andrew; Morilak, Daniel P.
1994-01-01
This paper describes an approach to identify factors that significantly influence microgravity experiment performance. Investigators developed the 'degrees of success' scale to provide a numerical representation of success. A degree of success was assigned to 293 microgravity experiments. Experiment information including the degree of success rankings and factors for analysis was compiled into a database. Through an analysis of variance, nine significant factors in microgravity experiment performance were identified. The frequencies of these factors are presented along with the average degree of success at each level. A preliminary discussion of the relationship between the significant factors and the degree of success is presented.
Automated spectral and timing analysis of AGNs
NASA Astrophysics Data System (ADS)
Munz, F.; Karas, V.; Guainazzi, M.
2006-12-01
We have developed an autonomous script that helps the user to automate the XMM-Newton data analysis for the purposes of extensive statistical investigations. We test this approach by examining X-ray spectra of bright AGNs pre-selected from the public database. The event lists extracted in this process were studied further by constructing their energy-resolved Fourier power-spectrum density. This analysis combines energy distributions, light-curves, and their power-spectra, and it proves useful to assess the variability patterns present in the data. As another example, an automated search was based on the XSPEC package to reveal the emission features in the 2-8 keV range.
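A small sketch of the kind of power-spectrum density computation described above is given below for a single synthetic light curve; the time binning, normalization, and Poisson data are assumptions, whereas the real pipeline works per energy band on XMM-Newton event lists.

    # Sketch of a power-spectrum density for one synthetic light curve; the
    # real analysis is energy-resolved and built from XMM-Newton event lists.
    import numpy as np

    def power_spectrum(light_curve, dt):
        rate = light_curve - light_curve.mean()
        ft = np.fft.rfft(rate)
        freqs = np.fft.rfftfreq(rate.size, d=dt)
        power = (np.abs(ft) ** 2) * 2 * dt / rate.size      # one common normalization
        return freqs[1:], power[1:]                          # drop the zero-frequency bin

    rng = np.random.default_rng(2)
    dt = 10.0                                                # seconds per bin (assumed)
    counts = rng.poisson(5.0, size=1024).astype(float)       # synthetic light curve
    freqs, power = power_spectrum(counts, dt)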
DOE Office of Scientific and Technical Information (OSTI.GOV)
Webb-Robertson, Bobbie-Jo M.
Accurate identification of peptides is a current challenge in mass spectrometry (MS) based proteomics. The standard approach uses a search routine to compare tandem mass spectra to a database of peptides associated with the target organism. These database search routines yield multiple metrics associated with the quality of the mapping of the experimental spectrum to the theoretical spectrum of a peptide. The structure of these results makes separating correct from false identifications difficult and has created a false identification problem. Statistical confidence scores are an approach to battle this false positive problem that has led to significant improvements in peptide identification. We have shown that machine learning, specifically the support vector machine (SVM), is an effective approach to separating true peptide identifications from false ones. The SVM-based peptide statistical scoring method transforms a peptide into a vector representation based on database search metrics to train and validate the SVM. In practice, following the database search routine, a peptide is represented in its vector form and the SVM generates a single statistical score that is then used to classify its presence or absence in the sample.
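A hedged sketch of the described approach: represent each peptide identification by a vector of database-search metrics and train a support vector machine to separate correct from false identifications; the feature names, simulated training data, and model settings are placeholders, not the actual search-engine output or the trained scorer.

    # Hedged sketch: an SVM trained on vectors of database-search metrics to
    # separate correct from false peptide identifications. The feature columns
    # and simulated data are placeholders, not real search-engine output.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(3)
    # columns: e.g., primary score, delta score, fragment mass error (assumed)
    X_true = rng.normal([3.5, 0.4, 0.01], [0.8, 0.1, 0.005], size=(200, 3))
    X_false = rng.normal([2.0, 0.1, 0.03], [0.8, 0.1, 0.010], size=(200, 3))
    X = np.vstack([X_true, X_false])
    y = np.array([1] * 200 + [0] * 200)

    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
    model.fit(X, y)
    print(model.predict_proba(X[:3])[:, 1])   # score-like probability of being correct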
Enhancement and Validation of an Arab Surname Database
Schwartz, Kendra; Beebani, Ganj; Sedki, Mai; Tahhan, Mamon; Ruterbusch, Julie J.
2015-01-01
Objectives Arab Americans constitute a large, heterogeneous, and quickly growing subpopulation in the United States. Health statistics for this group are difficult to find because US governmental offices do not recognize Arab as separate from white. The development and validation of an Arab- and Chaldean-American name database will enhance research efforts in this population subgroup. Methods A previously validated name database was supplemented with newly identified names gathered primarily from vital statistic records and then evaluated using a multistep process. This process included 1) review by 4 Arabic- and Chaldean-speaking reviewers, 2) ethnicity assessment by social media searches, and 3) self-report of ancestry obtained from a telephone survey. Results Our Arab- and Chaldean-American name algorithm has a positive predictive value of 91% and a negative predictive value of 100%. Conclusions This enhanced name database and algorithm can be used to identify Arab Americans in health statistics data, such as cancer and hospital registries, where they are often coded as white, to determine the extent of health disparities in this population. PMID:24625771
Very large database of lipids: rationale and design.
Martin, Seth S; Blaha, Michael J; Toth, Peter P; Joshi, Parag H; McEvoy, John W; Ahmed, Haitham M; Elshazly, Mohamed B; Swiger, Kristopher J; Michos, Erin D; Kwiterovich, Peter O; Kulkarni, Krishnaji R; Chimera, Joseph; Cannon, Christopher P; Blumenthal, Roger S; Jones, Steven R
2013-11-01
Blood lipids have major cardiovascular and public health implications. Lipid-lowering drugs are prescribed based in part on categorization of patients into normal or abnormal lipid metabolism, yet relatively little emphasis has been placed on: (1) the accuracy of current lipid measures used in clinical practice, (2) the reliability of current categorizations of dyslipidemia states, and (3) the relationship of advanced lipid characterization to other cardiovascular disease biomarkers. To these ends, we developed the Very Large Database of Lipids (NCT01698489), an ongoing database protocol that harnesses deidentified data from the daily operations of a commercial lipid laboratory. The database includes individuals who were referred for clinical purposes for a Vertical Auto Profile (Atherotech Inc., Birmingham, AL), which directly measures cholesterol concentrations of low-density lipoprotein, very low-density lipoprotein, intermediate-density lipoprotein, high-density lipoprotein, their subclasses, and lipoprotein(a). Individual Very Large Database of Lipids studies, ranging from studies of measurement accuracy, to dyslipidemia categorization, to biomarker associations, to characterization of rare lipid disorders, are investigator-initiated and utilize peer-reviewed statistical analysis plans to address a priori hypotheses/aims. In the first database harvest (Very Large Database of Lipids 1.0) from 2009 to 2011, there were 1 340 614 adult and 10 294 pediatric patients; the adult sample had a median age of 59 years (interquartile range, 49-70 years) with even representation by sex. Lipid distributions closely matched those from the population-representative National Health and Nutrition Examination Survey. The second harvest of the database (Very Large Database of Lipids 2.0) is underway. Overall, the Very Large Database of Lipids database provides an opportunity for collaboration and new knowledge generation through careful examination of granular lipid data on a large scale. © 2013 Wiley Periodicals, Inc.
NASA Astrophysics Data System (ADS)
Heffernan, Julieanne; Biedermann, Eric; Mayes, Alexander; Livings, Richard; Jauriqui, Leanne; Goodlet, Brent; Aldrin, John C.; Mazdiyasni, Siamack
2018-04-01
Process Compensated Resonant Testing (PCRT) is a full-body nondestructive testing (NDT) method that measures the resonance frequencies of a part and correlates them to the part's material and/or damage state. PCRT testing is used in the automotive, aerospace, and power generation industries via automated PASS/FAIL inspections to distinguish parts with nominal process variation from those with the defect(s) of interest. Traditional PCRT tests are created through the statistical analysis of populations of "good" and "bad" parts. However, gathering a statistically significant number of parts can be costly and time-consuming, and the availability of defective parts may be limited. This work uses virtual databases of good and bad parts to create two targeted PCRT inspections for single crystal (SX) nickel-based superalloy turbine blades. Using finite element (FE) models, populations were modeled to include variations in geometric dimensions, material properties, crystallographic orientation, and creep damage. Model results were verified by comparing the frequency variation in the modeled populations with the measured frequency variations of several physical blade populations. Additionally, creep modeling results were verified through the experimental evaluation of coupon geometries. A virtual database of resonance spectra was created from the model data. The virtual database was used to create PCRT inspections to detect crystallographic defects and creep strain. Quantification of creep strain values using the PCRT inspection results was also demonstrated.
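The statistical sorting step can be illustrated with a toy example. The sketch below scores a candidate part's resonance frequencies against a population of modeled "good" parts using a Mahalanobis distance; the frequencies, number of tracked modes, and PASS/FAIL threshold are all hypothetical, and this is not the proprietary PCRT scoring used in the cited work.

import numpy as np

def mahalanobis_score(part_freqs, good_freqs):
    """Distance of one part's resonance frequencies from the 'good' population;
    larger scores suggest an outlier (potentially defective) part."""
    mean = good_freqs.mean(axis=0)
    cov = np.cov(good_freqs, rowvar=False)
    diff = part_freqs - mean
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

# Hypothetical virtual database: 500 modeled "good" blades, 4 tracked modes (kHz).
rng = np.random.default_rng(0)
good = rng.normal(loc=[12.1, 18.4, 25.0, 31.7],
                  scale=[0.05, 0.08, 0.10, 0.12], size=(500, 4))
candidate = np.array([12.1, 18.3, 25.4, 32.2])   # hypothetical measured part

score = mahalanobis_score(candidate, good)
print("PASS" if score < 4.0 else "FAIL", f"(score = {score:.2f})")  # 4.0 is an illustrative cutoff

In practice the virtual database supplies the "good" population from FE model runs spanning geometric, material, and orientation variation, so that a part with the defect of interest falls outside the modeled scatter.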
Huang, Yi-Wen; Roa, Juan C.; Goodfellow, Paul J.; Kizer, E. Lynette; Huang, Tim H. M.; Chen, Yidong
2013-01-01
Background DNA methylation of promoter CpG islands is associated with gene suppression, and its unique genome-wide profiles have been linked to tumor progression. Coupled with high-throughput sequencing technologies, researchers can now efficiently determine genome-wide methylation profiles in cancer cells. Also, experimental and computational technologies make it possible to find the functional relationship between cancer-specific methylation patterns and their clinicopathological parameters. Methodology/Principal Findings Cancer methylome system (CMS) is a web-based database application designed for the visualization, comparison and statistical analysis of human cancer-specific DNA methylation. Methylation intensities were obtained from MBDCap-sequencing, pre-processed and stored in the database. A total of 191 patient samples (169 tumor and 22 normal specimens) and 41 breast cancer cell-lines are deposited in the database, comprising about 6.6 billion uniquely mapped sequence reads. These represent some of the most comprehensive genome-wide epigenetic portraits of human breast cancer and endometrial cancer to date. Two views are proposed for users to better understand methylation structure at the genomic level or systemic methylation alteration at the gene level. In addition, a variety of annotation tracks are provided to cover genomic information. CMS includes important analytic functions for interpretation of methylation data, such as the detection of differentially methylated regions, statistical calculation of global methylation intensities, multiple gene sets of biologically significant categories, and interactivity with UCSC via custom-track data. We also present examples of discoveries utilizing the framework. Conclusions/Significance CMS provides visualization and analytic functions for cancer methylome datasets. A comprehensive collection of datasets, a variety of embedded analytic functions and extensive applications with biological and translational significance make this system powerful and unique in cancer methylation research. CMS is freely accessible at: http://cbbiweb.uthscsa.edu/KMethylomes/. PMID:23630576
Gu, Fei; Doderer, Mark S; Huang, Yi-Wen; Roa, Juan C; Goodfellow, Paul J; Kizer, E Lynette; Huang, Tim H M; Chen, Yidong
2013-01-01
DNA methylation of promoter CpG islands is associated with gene suppression, and its unique genome-wide profiles have been linked to tumor progression. Coupled with high-throughput sequencing technologies, researchers can now efficiently determine genome-wide methylation profiles in cancer cells. Also, experimental and computational technologies make it possible to find the functional relationship between cancer-specific methylation patterns and their clinicopathological parameters. Cancer methylome system (CMS) is a web-based database application designed for the visualization, comparison and statistical analysis of human cancer-specific DNA methylation. Methylation intensities were obtained from MBDCap-sequencing, pre-processed and stored in the database. A total of 191 patient samples (169 tumor and 22 normal specimens) and 41 breast cancer cell-lines are deposited in the database, comprising about 6.6 billion uniquely mapped sequence reads. These represent some of the most comprehensive genome-wide epigenetic portraits of human breast cancer and endometrial cancer to date. Two views are proposed for users to better understand methylation structure at the genomic level or systemic methylation alteration at the gene level. In addition, a variety of annotation tracks are provided to cover genomic information. CMS includes important analytic functions for interpretation of methylation data, such as the detection of differentially methylated regions, statistical calculation of global methylation intensities, multiple gene sets of biologically significant categories, and interactivity with UCSC via custom-track data. We also present examples of discoveries utilizing the framework. CMS provides visualization and analytic functions for cancer methylome datasets. A comprehensive collection of datasets, a variety of embedded analytic functions and extensive applications with biological and translational significance make this system powerful and unique in cancer methylation research. CMS is freely accessible at: http://cbbiweb.uthscsa.edu/KMethylomes/.
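As a rough illustration of the kind of differentially-methylated-region detection CMS exposes, the sketch below compares binned methylation intensities between tumor and normal samples with a per-bin t-test and Bonferroni correction; the bin size, normalization, and test choice are assumptions made only for illustration, not CMS's actual pipeline.

import numpy as np
from scipy import stats

def flag_dmrs(tumor, normal, alpha=0.05):
    """tumor, normal: arrays of shape (samples, bins) holding normalized
    methylation intensities per genomic bin. Returns indices of bins whose
    tumor-vs-normal difference is significant after Bonferroni correction."""
    n_bins = tumor.shape[1]
    pvals = np.array([stats.ttest_ind(tumor[:, b], normal[:, b]).pvalue
                      for b in range(n_bins)])
    return np.flatnonzero(pvals < alpha / n_bins)

# Tiny synthetic example: 10 tumor and 5 normal samples over 100 bins,
# with bins 40-44 hypermethylated in the tumors.
rng = np.random.default_rng(1)
normal = rng.normal(1.0, 0.2, size=(5, 100))
tumor = rng.normal(1.0, 0.2, size=(10, 100))
tumor[:, 40:45] += 0.8

print(flag_dmrs(tumor, normal))   # expected to report bins near 40-44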
FBIS: A regional DNA barcode archival & analysis system for Indian fishes.
Nagpure, Naresh Sahebrao; Rashid, Iliyas; Pathak, Ajey Kumar; Singh, Mahender; Singh, Shri Prakash; Sarkar, Uttam Kumar
2012-01-01
DNA barcoding is a new tool for taxon recognition and classification of biological organisms based on the sequence of a fragment of the mitochondrial gene cytochrome c oxidase I (COI). In view of the growing importance of fish DNA barcoding for species identification, molecular taxonomy and fish diversity conservation, we developed a Fish Barcode Information System (FBIS) for Indian fishes, which will serve as a regional DNA barcode archival and analysis system. The database presently contains 2334 sequence records of the COI gene for 472 aquatic species belonging to 39 orders and 136 families, collected from available published data sources. Additionally, it contains information on phenotype, distribution and IUCN Red List status of fishes. The web version of FBIS was designed using MySQL, Perl and PHP under the Linux operating platform to (a) store and manage the acquired data, (b) analyze and explore DNA barcode records, and (c) identify species and estimate genetic divergence. FBIS has also been integrated with appropriate tools for retrieving and viewing information about the database statistics and taxonomy. It is expected that FBIS will be useful as a potent information system in fish molecular taxonomy, phylogeny and genomics. The database is available for free at http://mail.nbfgr.res.in/fbis/
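The abstract does not spell out how FBIS estimates genetic divergence; a common choice for COI barcodes is the Kimura two-parameter (K2P) distance, sketched below for an aligned sequence pair. Gap and ambiguity handling is deliberately simplified, and the example sequences are toy data, not real COI records.

import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned, equal-length sequences."""
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]          # skip gaps/ambiguous bases
    n = len(pairs)
    transitions = sum(1 for a, b in pairs if a != b and
                      ({a, b} <= PURINES or {a, b} <= PYRIMIDINES))
    transversions = sum(1 for a, b in pairs if a != b) - transitions
    P, Q = transitions / n, transversions / n
    return -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))

print(k2p_distance("ACGTTGCAACGT", "ACGCTGCAACAT"))   # toy aligned fragments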
Impact of the mass media on calls to the CDC National AIDS Hotline.
Fan, D P
1996-06-01
This paper considers new computer methodologies for assessing the impact of different types of public health information. The example used public service announcements (PSAs) and mass media news to predict the volume of attempts to call the CDC National AIDS Hotline from December 1992 through to the end of 1993. The analysis relied solely on data from electronic databases. Newspaper stories and television news transcripts were obtained from the NEXIS electronic database and were scored by machine for AIDS coverage. The PSA database was generated by computer monitoring of advertising distributed by the Centers for Disease Control and Prevention (CDC) and by others. The volume of call attempts was collected automatically by the public branch exchange (PBX) of the Hotline telephone system. The call attempts, the PSAs and the news story data were related to each other using both a standard time series method and the statistical model of ideodynamics. The analysis indicated that the only significant explanatory variable for the call attempts was PSAs produced by the CDC. One possible explanation was that these commercials all included the Hotline telephone number while the other information sources did not.
Diagnostic value of 3D time-of-flight MRA in trigeminal neuralgia.
Cai, Jing; Xin, Zhen-Xue; Zhang, Yu-Qiang; Sun, Jie; Lu, Ji-Liang; Xie, Feng
2015-08-01
The aim of this meta-analysis was to evaluate the diagnostic value of 3D time-of-flight magnetic resonance angiography (3D-TOF-MRA) in trigeminal neuralgia (TN). Relevant studies were identified by computerized database searches supplemented by manual search strategies. The studies were included in accordance with stringent inclusion and exclusion criteria. Following a multistep screening process, high quality studies related to the diagnostic value of 3D-TOF-MRA in TN were selected for meta-analysis. Statistical analyses were conducted using Statistical Analysis Software (version 8.2; SAS Institute, Cary, NC, USA) and Meta Disc (version 1.4; Unit of Clinical Biostatistics, Ramon y Cajal Hospital, Madrid, Spain). For the present meta-analysis, we initially retrieved 95 studies from database searches. A total of 13 studies were eventually enrolled containing a combined total of 1084 TN patients. The meta-analysis results demonstrated that the sensitivity and specificity of the diagnostic value of 3D-TOF-MRA in TN were 95% (95% confidence interval [CI] 0.93-0.96) and 77% (95% CI 0.66-0.86), respectively. The pooled positive likelihood ratio and negative likelihood ratio were 2.72 (95% CI 1.81-4.09) and 0.08 (95% CI 0.06-0.12), respectively. The pooled diagnostic odds ratio of 3D-TOF-MRA in TN was 52.92 (95% CI 26.39-106.11), and the corresponding area under the curve in the summary receiver operating characteristic curve based on the 3D-TOF-MRA diagnostic image of observers was 0.9695 (standard error 0.0165). Our results suggest that 3D-TOF-MRA has excellent sensitivity and specificity as a diagnostic tool for TN, and that it can accurately identify neurovascular compression in TN patients. Copyright © 2015 Elsevier Ltd. All rights reserved.
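For orientation, the per-study diagnostic quantities pooled above are defined as follows; note that the pooled likelihood ratios are combined across studies, so they need not equal the ratio formed from the pooled sensitivity and specificity.

\[ LR^{+} = \frac{\text{sensitivity}}{1 - \text{specificity}}, \qquad LR^{-} = \frac{1 - \text{sensitivity}}{\text{specificity}}, \qquad \mathrm{DOR} = \frac{LR^{+}}{LR^{-}} \]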
1988-12-19
Statistics [CEI Database 4 Nov]; Construction Bank Checks on Investment Loans [XINHUA]; Gold Output Rising 10 Percent Annually [CHINA DAILY 8...]; Industrial Output for September [CEI Database 11 Nov]; Energy Industry Grows Steadily in 1988 [CEI Database 11 Nov]; Government Plans To Boost...Plastics Industry [XINHUA]; Chongqing's Industrial Output Increases [XINHUA]; Haikou Boosts Power Industry [CEI Database 27 Oct]; Jilin
HTAPP: High-Throughput Autonomous Proteomic Pipeline
Yu, Kebing; Salomon, Arthur R.
2011-01-01
Recent advances in the speed and sensitivity of mass spectrometers and in analytical methods, the exponential acceleration of computer processing speeds, and the availability of genomic databases from an array of species and protein information databases have led to a deluge of proteomic data. The development of a lab-based automated proteomic software platform for the automated collection, processing, storage, and visualization of expansive proteomic datasets is critically important. The high-throughput autonomous proteomic pipeline (HTAPP) described here is designed from the ground up to provide critically important flexibility for diverse proteomic workflows and to streamline the total analysis of a complex proteomic sample. This tool comprises software that controls the acquisition of mass spectral data along with automation of post-acquisition tasks such as peptide quantification, clustered MS/MS spectral database searching, statistical validation, and data exploration within a user-configurable lab-based relational database. The software design of HTAPP focuses on accommodating diverse workflows and providing missing software functionality to a wide range of proteomic researchers to accelerate the extraction of biological meaning from immense proteomic data sets. Although individual software modules in our integrated technology platform may have some similarities to existing tools, the true novelty of the approach described here is in the synergistic and flexible combination of these tools to provide an integrated and efficient analysis of proteomic samples. PMID:20336676
NASA Astrophysics Data System (ADS)
Dondeynaz, C.; Carmona Moreno, C.; Céspedes Lorente, J. J.
2012-01-01
The "Integrated Water Resources Management" principle was formally laid down at the International Conference on Water and Sustainable development in Dublin 1992. One of the main results of this conference is that improving Water and Sanitation Services (WSS), being a complex and interdisciplinary issue, passes through collaboration and coordination of different sectors (environment, health, economic activities, governance, and international cooperation). These sectors influence or are influenced by the access to WSS. The understanding of these interrelations appears as crucial for decision makers in the water sector. In this framework, the Joint Research Centre (JRC) of the European Commission (EC) has developed a new database (WatSan4Dev database) containing 45 indicators (called variables in this paper) from environmental, socio-economic, governance and financial aid flows data in developing countries. This paper describes the development of the WatSan4Dev dataset, the statistical processes needed to improve the data quality; and, finally, the analysis to verify the database coherence is presented. At the light of the first analysis, WatSan4Dev Dataset shows the coherency among the different variables that are confirmed by the direct field experience and/or the scientific literature in the domain. Preliminary analysis of the relationships indicates that the informal urbanisation development is an important factor influencing negatively the percentage of the population having access to WSS. Health, and in particular children health, benefits from the improvement of WSS. Efficient environmental governance is also an important factor for providing improved water supply services. The database would be at the base of posterior analyses to better understand the interrelationships between the different indicators associated in the water sector in developing countries. A data model using the different indicators will be realised on the next phase of this research work.
Separation and confirmation of showers
NASA Astrophysics Data System (ADS)
Neslušan, L.; Hajduková, M.
2017-02-01
Aims: Using IAU MDC photographic, IAU MDC CAMS video, SonotaCo video, and EDMOND video databases, we aim to separate all provable annual meteor showers from each of these databases. We intend to reveal the problems inherent in this procedure and answer the question whether the databases are complete and the methods of separation used are reliable. We aim to evaluate the statistical significance of each separated shower. In this respect, we intend to give a list of reliably separated showers rather than a list of the maximum possible number of showers. Methods: To separate the showers, we simultaneously used two methods. The use of two methods enables us to compare their results, and this can indicate the reliability of the methods. To evaluate the statistical significance, we suggest a new method based on the ideas of the break-point method. Results: We give a compilation of the showers from all four databases using both methods. Using the first (second) method, we separated 107 (133) showers, which are in at least one of the databases used. These relatively low numbers are a consequence of discarding any candidate shower with a poor statistical significance. Most of the separated showers were identified as meteor showers from the IAU MDC list of all showers. Many of them were identified as several of the showers in the list. This proves that many showers have been named multiple times with different names. Conclusions: At present, a prevailing share of existing annual showers can be found in the data and confirmed when we use a combination of results from large databases. However, to gain a complete list of showers, we need more-complete meteor databases than the most extensive databases currently are. We also still need a more sophisticated method to separate showers and evaluate their statistical significance. Tables A.1 and A.2 are also available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/598/A40
García-Pérez, M A
2001-11-01
This paper presents an analysis of research published in the decade 1989-1998 by Spanish faculty members in the areas of statistical methods, research methodology, and psychometric theory. Database search and direct correspondence with faculty members in Departments of Methodology across Spain rendered a list of 193 papers published in these broad areas by 82 faculty members. These and other faculty members had actually published 931 papers over the decade of analysis, but 738 of them addressed topics not appropriate for description in this report. Classification and analysis of these 193 papers revealed topics that have attracted the most interest (psychophysics, item response theory, analysis of variance, sequential analysis, and meta-analysis) as well as other topics that have received less attention (scaling, factor analysis, time series, and structural models). A significant number of papers also dealt with various methodological issues (software, algorithms, instrumentation, and techniques). A substantial part of this report is devoted to describing the issues addressed across these 193 papers--most of which are written in the Spanish language and published in Spanish journals--and some representative references are given.
NiCd cell reliability in the mission environment
NASA Technical Reports Server (NTRS)
Denson, William K.; Klein, Glenn C.
1993-01-01
This paper summarizes an effort by Gates Aerospace Batteries (GAB) and the Reliability Analysis Center (RAC) to analyze survivability data for both General Electric and GAB NiCd cells utilized in various spacecraft. For simplicity's sake, all mission environments are described as either low Earth orbit (LEO) or geosynchronous Earth orbit (GEO). 'Extreme value' statistical methods are applied to this database because the numerous missions have long durations but relatively few failures. Every attempt was made to include all known instances of cell-induced failures of the battery and to exclude battery-induced failures of the cell. While this distinction may be somewhat limited by the availability of in-flight data, we have accepted the learned opinion of the specific customer contacts to ensure the integrity of the common databases. This paper advances the preliminary analysis reported at the 1991 NASA Battery Workshop. That prior analysis was concerned with an estimated 278 million cell-hours of operation encompassing 183 satellites and cited 'no reported failures to date.' The present analysis reports on 428 million cell-hours of operation encompassing 212 satellites, including seven 'cell-induced failures.'
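The paper's analysis relies on extreme-value methods; purely as a back-of-the-envelope cross-check (not the method used in the paper), a simple Poisson treatment of 7 failures in 428 million cell-hours gives a point estimate and an exact 95% confidence interval:

from scipy.stats import chi2

failures, cell_hours = 7, 428e6

rate = failures / cell_hours                                   # point estimate
lower = chi2.ppf(0.025, 2 * failures) / (2 * cell_hours)       # exact Poisson 95% CI
upper = chi2.ppf(0.975, 2 * (failures + 1)) / (2 * cell_hours)

print(f"{rate:.2e} failures per cell-hour (95% CI {lower:.2e} to {upper:.2e})")

This works out to roughly 1.6e-8 failures per cell-hour; extreme-value methods are preferred in the paper precisely because so few failures are observed over very long mission durations.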
Lu, Haili; Tang, Haifang; Zhou, Tian; Kang, Na
2018-03-01
At present, many scholars have studied the periodontal health status of patients undergoing orthodontic treatment with fixed appliances and Invisalign, but the results are inconsistent. Therefore, we conducted this meta-analysis to provide a reference for clinical treatment. Major databases, including the Cochrane Library, EMBASE, PubMed, Medline, the Chinese Biomedical Literature Database, CNKI, and Wan Fang Data, were searched for related articles from database inception to October 2017. We also searched the references of the related literature manually to identify additional eligible studies. Two researchers screened the literature according to the inclusion and exclusion criteria. Stata 12.0 software was used for data analysis, and results are estimated as odds ratios (OR) with 95% confidence intervals (CI). Finally, 7 articles, including 368 patients, were included in our meta-analysis. The meta-analysis results showed no statistically significant difference in gingival index (GI) or sulcus probing depth (SPD) status between the Invisalign group and the control group at 1, 3, and 6 months (all P > .05). Compared with the control group, the Invisalign group presented a lower plaque index (PLI) and sulcus bleeding index (SBI) status at 1 month (OR = -0.53, 95% CI: -0.89 to -0.18; OR = -0.44, 95% CI: -0.70 to -0.19, respectively), 3 months (OR = -0.69, 95% CI: -1.12 to -0.27; OR = -0.49, 95% CI: -0.93 to -0.05, respectively), and 6 months (OR = -0.91, 95% CI: -1.47 to -0.35; OR = -0.40, 95% CI: -0.63 to -0.07, respectively). Subgroup analysis showed that SPD status was lower in the Invisalign group at 6 months when teeth were measured using the Ramfjord index (OR = -0.74, 95% CI: -1.35 to -0.12). However, there was no statistically significant difference between the 2 groups when other measurement methods were used (OR = 0.12, 95% CI: -0.26 to 0.17). Our meta-analysis suggests that, compared with traditional fixed appliances, patients treated with Invisalign have better periodontal health. However, more studies are needed to confirm this conclusion in the future.
Development of new on-line statistical program for the Korean Society for Radiation Oncology
Song, Si Yeol; Ahn, Seung Do; Chung, Weon Kuu; Choi, Eun Kyung; Cho, Kwan Ho
2015-01-01
Purpose To develop a new on-line statistical program for the Korean Society for Radiation Oncology (KOSRO) to collect and extract medical data in radiation oncology more efficiently. Materials and Methods The statistical program is web-based. The directory was placed in a sub-folder of the homepage of KOSRO and its web address is http://www.kosro.or.kr/asda. The server operating system is Linux and the web server is the Apache HTTP server. MySQL is adopted as the database (DB) server and PHP is the dedicated scripting language. Each ID and password are controlled independently, and all screen pages for data input or analysis are designed to be user-friendly. Scroll-down menus are used extensively for user convenience and for consistency of data analysis. Results Year of data is one of the top categories, and the main topics include human resources, equipment, clinical statistics, specialized treatment and research achievement. Each topic or category has several subcategorized topics. A real-time on-line report of the analysis is produced immediately after each data entry, and the administrator is able to monitor the status of data input of each hospital. Backups of data as spreadsheets can be accessed by the administrator and used for academic work by any members of the KOSRO. Conclusion The new on-line statistical program was developed to collect data from departments of radiation oncology nationwide. An intuitive screen and consistent input structure are expected to promote data entry by member hospitals, and annual statistics should be a cornerstone of advances in radiation oncology. PMID:26157684
Development of new on-line statistical program for the Korean Society for Radiation Oncology.
Song, Si Yeol; Ahn, Seung Do; Chung, Weon Kuu; Shin, Kyung Hwan; Choi, Eun Kyung; Cho, Kwan Ho
2015-06-01
To develop a new on-line statistical program for the Korean Society for Radiation Oncology (KOSRO) to collect and extract medical data in radiation oncology more efficiently. The statistical program is web-based. The directory was placed in a sub-folder of the homepage of KOSRO and its web address is http://www.kosro.or.kr/asda. The server operating system is Linux and the web server is the Apache HTTP server. MySQL is adopted as the database (DB) server and PHP is the dedicated scripting language. Each ID and password are controlled independently, and all screen pages for data input or analysis are designed to be user-friendly. Scroll-down menus are used extensively for user convenience and for consistency of data analysis. Year of data is one of the top categories, and the main topics include human resources, equipment, clinical statistics, specialized treatment and research achievement. Each topic or category has several subcategorized topics. A real-time on-line report of the analysis is produced immediately after each data entry, and the administrator is able to monitor the status of data input of each hospital. Backups of data as spreadsheets can be accessed by the administrator and used for academic work by any members of the KOSRO. The new on-line statistical program was developed to collect data from departments of radiation oncology nationwide. An intuitive screen and consistent input structure are expected to promote data entry by member hospitals, and annual statistics should be a cornerstone of advances in radiation oncology.
Austvoll-Dahlgren, Astrid; Guttersrud, Øystein; Nsangi, Allen; Semakula, Daniel; Oxman, Andrew D
2017-01-01
Background The Claim Evaluation Tools database contains multiple-choice items for measuring people's ability to apply the key concepts they need to know to be able to assess treatment claims. We assessed items from the database using Rasch analysis to develop an outcome measure to be used in two randomised trials in Uganda. Rasch analysis is a form of psychometric testing relying on Item Response Theory. It is a dynamic way of developing outcome measures that are valid and reliable. Objectives To assess the validity, reliability and responsiveness of 88 items addressing 22 key concepts using Rasch analysis. Participants We administered four sets of multiple-choice items in English to 1114 people in Uganda and Norway, of whom 685 were children and 429 were adults (including 171 health professionals). We scored all items dichotomously. We explored summary and individual fit statistics using the RUMM2030 analysis package. We used SPSS to perform distractor analysis. Results Most items conformed well to the Rasch model, but some items needed revision. Overall, the four item sets had satisfactory reliability. We did not identify significant response dependence between any pairs of items and, overall, the magnitude of multidimensionality in the data was acceptable. The items had a high level of difficulty. Conclusion Most of the items conformed well to the Rasch model's expectations. Following revision of some items, we concluded that most of the items were suitable for use in an outcome measure for evaluating the ability of children or adults to assess treatment claims. PMID:28550019
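For reference, the dichotomous Rasch model that underlies this kind of analysis (the study used the RUMM2030 implementation; the formula below is the textbook form, not output from the study) gives the probability that person n with ability \(\theta_n\) answers item i with difficulty \(b_i\) correctly as

\[ P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{e^{\theta_n - b_i}}{1 + e^{\theta_n - b_i}} . \]

Item fit statistics then compare observed response patterns against this expectation.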
Lee, Ellen E; Della Selva, Megan P; Liu, Anson; Himelhoch, Seth
2015-01-01
Given the significant disability, morbidity and mortality associated with depression, the promising recent trials of ketamine highlight a novel intervention. A meta-analysis was conducted to assess the efficacy of ketamine in comparison with placebo for the reduction of depressive symptoms in patients who meet criteria for a major depressive episode. Two electronic databases were searched in September 2013 for English-language studies that were randomized placebo-controlled trials of ketamine treatment for patients with major depressive disorder or bipolar depression and utilized a standardized rating scale. Studies including participants receiving electroconvulsive therapy and adolescent/child participants were excluded. Five studies were included in the quantitative meta-analysis. The quantitative meta-analysis showed that ketamine significantly reduced depressive symptoms. The overall effect size at day 1 was large and statistically significant with an overall standardized mean difference of 1.01 (95% confidence interval 0.69-1.34) (P<.001), with the effects sustained at 7 days postinfusion. The heterogeneity of the studies was low and not statistically significant, and the funnel plot showed no publication bias. The large and statistically significant effect of ketamine on depressive symptoms supports a promising, new and effective pharmacotherapy with rapid onset, high efficacy and good tolerability. Copyright © 2015. Published by Elsevier Inc.
Chughtai, Morad; Gwam, Chukwuweike U; Khlopas, Anton; Newman, Jared M; Curtis, Gannon L; Torres, Pedro A; Khan, Rafay; Mont, Michael A
2017-07-25
Pneumonia is the third most common postoperative complication. However, its epidemiology varies widely and is often difficult to assess. For a better understanding, we utilized two national databases to determine the incidence of postoperative pneumonia after various surgical procedures. Specifically, we used the American College of Surgeons National Surgical Quality Improvement Program (ACS NSQIP) and the Nationwide Inpatient Sample (NIS) to determine the incidence and yearly trends of postoperative pneumonia following orthopaedic, urologic, otorhinolaryngologic, cardiothoracic, neurosurgical, and general surgeries. The NIS and NSQIP databases from 2009-2013 were utilized. The Clinical Classification Software (CCS) for International Classification of Diseases, 9th edition (ICD-9) codes provided by the NIS database was used to identify all surgical subspecialty procedures. The incidence of postoperative pneumonia was identified as the total number of cases under each identifying CCS code that also had ICD-9 codes for postoperative pneumonia. In the NSQIP database, the surgical subspecialties were selected using the following identifying string variables provided by NSQIP: 1) "Orthopedics", 2) "Otolaryngology (ENT)", 3) "Urology", 4) "Neurosurgery", 5) "General Surgery", and 6) "Cardiac Surgery" and "Thoracic Surgery". Cardiac and thoracic surgery were merged to create the variable "Cardiothoracic Surgery". Postoperative pneumonia cases were extracted utilizing the available NSQIP nominal variables. All variables were used to isolate the incidence of postoperative pneumonia stratified by surgical specialty. A subsequent trend analysis was conducted to assess the associations between operative year and incidence of postoperative pneumonia. For all NIS surgeries, the incidence of postoperative pneumonia was 0.97% between 2009 and 2013. The incidence was highest among patients who underwent cardiothoracic surgery (3.3%) and urologic surgery (1.73%). Patients who underwent general surgery, neurosurgery, spine surgery, orthopaedic surgery, and ENT surgery had a postoperative pneumonia incidence of 1.1%, 0.6%, 0.5%, 0.5%, and 0.4%, respectively. Overall trend analysis demonstrated a statistically significant decrease in postoperative pneumonia incidence (p <0.001), a decrease that was paralleled within each specialty. In NSQIP, the incidence of postoperative pneumonia for all surgeries that occurred between 2009 and 2013 was 1.3%. The incidences of postoperative pneumonia were highest among patients who underwent cardiothoracic surgery (5.3%), general surgery (1.4%), and neurosurgery (1.4%). The incidences of postoperative pneumonia in patients who underwent ENT surgery, orthopedic surgery, and urologic surgery were 0.7% each. Overall trend analysis demonstrated a statistically significant increase in postoperative pneumonia incidence for patients undergoing cardiothoracic surgery (p <0.001). There were no notable trends for the other surgical subspecialties. The incidence of postoperative pneumonia differs between the two national databases. Furthermore, the incidences differed among the various surgical subspecialties; however, cardiothoracic surgery had the highest incidence in both databases and appeared to show an increasing trend in incidence. Standardizing and implementing accurate coding methodologies for this complication are needed for a more accurate assessment of this burdensome complication.
Future studies should assess interventions, such as oral cleansing and suctioning, incentive spirometry, as well as designated institution-based pneumonia prevention programs and protocols to help prevent and mitigate the occurrence of this complication.
Ogawa, Takaya; Iyoki, Kenta; Fukushima, Tomohiro; Kajikawa, Yuya
2017-12-14
The field of porous materials is expanding rapidly, and researchers must read tremendous numbers of papers to obtain a "bird's eye" view of a given research area. However, it is difficult for researchers to obtain an objective, statistics-based overview that is independent of the subjective knowledge tied to individual research interests. Here, citation network analysis was applied for a comparative analysis of the research areas for zeolites and metal-organic frameworks as examples of porous materials. The statistical and objective data contributed to the analysis of: (1) the computational screening of research areas; (2) classification of the research stage of a given domain; (3) "well-cited" research areas; and (4) research area preferences of specific countries. Moreover, we proposed a methodology to assist researchers in gaining potential research ideas by reviewing related research areas, which is based on the detection of ideas that are unfocused in one area but focused in another by a bibliometric approach.
Ogawa, Takaya; Fukushima, Tomohiro; Kajikawa, Yuya
2017-01-01
The field of porous materials is expanding rapidly, and researchers must read tremendous numbers of papers to obtain a “bird’s eye” view of a given research area. However, it is difficult for researchers to obtain an objective, statistics-based overview that is independent of the subjective knowledge tied to individual research interests. Here, citation network analysis was applied for a comparative analysis of the research areas for zeolites and metal-organic frameworks as examples of porous materials. The statistical and objective data contributed to the analysis of: (1) the computational screening of research areas; (2) classification of the research stage of a given domain; (3) “well-cited” research areas; and (4) research area preferences of specific countries. Moreover, we proposed a methodology to assist researchers in gaining potential research ideas by reviewing related research areas, which is based on the detection of ideas that are unfocused in one area but focused in another by a bibliometric approach. PMID:29240708
URS DataBase: universe of RNA structures and their motifs.
Baulin, Eugene; Yacovlev, Victor; Khachko, Denis; Spirin, Sergei; Roytberg, Mikhail
2016-01-01
The Universe of RNA Structures DataBase (URSDB) stores information obtained from all RNA-containing PDB entries (2935 entries in October 2015). The content of the database is updated regularly. The database consists of 51 tables containing indexed data on various elements of the RNA structures. The database provides a web interface allowing the user to select a subset of structures with desired features and to obtain various statistical data for a selected subset of structures or for all structures. In particular, one can easily obtain statistics on geometric parameters of base pairs, on structural motifs (stems, loops, etc.) or on different types of pseudoknots. The user can also view and get information on an individual structure or its selected parts, e.g. RNA-protein hydrogen bonds. URSDB employs a new, original definition of loops in RNA structures. That definition fits both pseudoknot-free and pseudoknotted secondary structures and coincides with the classical definition in the case of pseudoknot-free structures. To our knowledge, URSDB is the first database supporting searches based on topological classification of pseudoknots and on extended loop classification. Database URL: http://server3.lpm.org.ru/urs/. © The Author(s) 2016. Published by Oxford University Press.
URS DataBase: universe of RNA structures and their motifs
Baulin, Eugene; Yacovlev, Victor; Khachko, Denis; Spirin, Sergei; Roytberg, Mikhail
2016-01-01
The Universe of RNA Structures DataBase (URSDB) stores information obtained from all RNA-containing PDB entries (2935 entries in October 2015). The content of the database is updated regularly. The database consists of 51 tables containing indexed data on various elements of the RNA structures. The database provides a web interface allowing the user to select a subset of structures with desired features and to obtain various statistical data for a selected subset of structures or for all structures. In particular, one can easily obtain statistics on geometric parameters of base pairs, on structural motifs (stems, loops, etc.) or on different types of pseudoknots. The user can also view and get information on an individual structure or its selected parts, e.g. RNA–protein hydrogen bonds. URSDB employs a new, original definition of loops in RNA structures. That definition fits both pseudoknot-free and pseudoknotted secondary structures and coincides with the classical definition in the case of pseudoknot-free structures. To our knowledge, URSDB is the first database supporting searches based on topological classification of pseudoknots and on extended loop classification. Database URL: http://server3.lpm.org.ru/urs/ PMID:27242032
Analysis of Runway Incursion Data
NASA Technical Reports Server (NTRS)
Green, Lawrence L.
2013-01-01
A statistical analysis of runway incursion (RI) events was conducted to ascertain their relevance to the top ten challenges of the National Aeronautics and Space Administration Aviation Safety Program (AvSP). The RI database was found to contain data that may be relevant to several of the AvSP top ten challenges. When combined with other FAA data documenting air traffic volume from calendar year 2000 through 2011, the structure of a predictive model emerges that can be used to forecast the frequency of RI events at various airports, for various classes of aircraft, and under various environmental conditions.
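The abstract does not state the model form; one plausible realization of such a forecast (offered only as an illustrative sketch, with invented data) is a Poisson regression of yearly RI counts on traffic volume:

import numpy as np
import statsmodels.api as sm

# Hypothetical yearly data: operations (millions) and runway incursion counts.
ops = np.array([60.0, 58.0, 56.0, 55.0, 53.0, 52.0, 50.0, 49.0])
incursions = np.array([310, 295, 288, 270, 260, 255, 240, 232])

X = sm.add_constant(ops)                      # intercept + traffic volume
model = sm.GLM(incursions, X, family=sm.families.Poisson()).fit()
print(model.params)
print("Predicted RI count at 50M ops:", model.predict(np.array([[1.0, 50.0]])))

Additional covariates (airport, aircraft class, weather) would enter the design matrix in the same way.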
Bumm, Klaus; Zheng, Mingzhong; Bailey, Clyde; Zhan, Fenghuang; Chiriva-Internati, M; Eddlemon, Paul; Terry, Julian; Barlogie, Bart; Shaughnessy, John D
2002-02-01
Clinical GeneOrganizer (CGO) is novel Windows-based archiving, organization and data-mining software for the integration of gene expression profiling into clinical medicine. The program implements various user-friendly tools and extracts data for further statistical analysis. This software was written for Affymetrix GeneChip *.txt files, but can also be used for any other microarray-derived data. The MS-SQL server version acts as a data mart and links microarray data with clinical parameters of any other existing database, and therefore represents a valuable tool for combining gene expression analysis and clinical disease characteristics.
Information categorization approach to literary authorship disputes
NASA Astrophysics Data System (ADS)
Yang, Albert C.-C.; Peng, C.-K.; Yien, H.-W.; Goldberger, Ary L.
2003-11-01
Scientific analysis of the linguistic styles of different authors has generated considerable interest. We present a generic approach to measuring the similarity of two symbolic sequences that requires minimal background knowledge about a given human language. Our analysis is based on word rank order-frequency statistics and phylogenetic tree construction. We demonstrate the applicability of this method to historic authorship questions related to the classic Chinese novel “The Dream of the Red Chamber,” to the plays of William Shakespeare, and to the Federalist papers. This method may also provide a simple approach to other large databases based on their information content.
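One simple realization of word rank order-frequency comparison (a sketch of the flavor of the approach, not the authors' exact metric or their phylogenetic tree construction) is to rank the most frequent words in each text, measure rank disagreement over the shared vocabulary, and cluster the resulting distance matrix:

from collections import Counter
from itertools import combinations

import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

def word_ranks(text, top_n=50):
    counts = Counter(text.lower().split())
    return {w: r for r, (w, _) in enumerate(counts.most_common(top_n))}

def rank_distance(ranks_a, ranks_b, top_n=50):
    """Mean absolute rank difference over the combined vocabulary;
    words absent from one text are penalized with the maximum rank."""
    vocab = set(ranks_a) | set(ranks_b)
    diffs = [abs(ranks_a.get(w, top_n) - ranks_b.get(w, top_n)) for w in vocab]
    return float(np.mean(diffs))

# Toy corpora standing in for chapters or plays.
texts = {
    "author1_a": "the garden was quiet and the moon was bright over the garden wall",
    "author1_b": "the moon was bright and the garden wall was quiet in the night",
    "author2_a": "profits rose sharply as markets rallied and investors cheered the results",
}
names = list(texts)
ranks = {n: word_ranks(t) for n, t in texts.items()}
dist = np.zeros((len(names), len(names)))
for i, j in combinations(range(len(names)), 2):
    dist[i, j] = dist[j, i] = rank_distance(ranks[names[i]], ranks[names[j]])

print(names)
print(linkage(squareform(dist), method="average"))   # the same-author pair should merge first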
Workflow based framework for life science informatics.
Tiwari, Abhishek; Sekhar, Arvind K T
2007-10-01
Workflow technology is a generic mechanism to integrate diverse types of available resources (databases, servers, software applications and different services) and facilitates knowledge exchange within traditionally divergent fields such as molecular biology, clinical research, computational science, physics, chemistry and statistics. Researchers can easily incorporate and access diverse, distributed tools and data to develop their own research protocols for scientific analysis. Application of workflow technology has been reported in areas such as drug discovery, genomics, large-scale gene expression analysis, proteomics, and systems biology. In this article, we discuss the existing workflow systems and the trends in applications of workflow-based systems.
Publication trend, resource utilization, and impact of the US National Cancer Database
Su, Chang; Peng, Cuiying; Agbodza, Ena; Bai, Harrison X.; Huang, Yuqian; Karakousis, Giorgos; Zhang, Paul J.; Zhang, Zishu
2018-01-01
Abstract Background: The utilization and impact of the studies published using the National Cancer Database (NCDB) are currently unclear. In this study, we aim to characterize the published studies and identify relatively unexplored areas for future investigation. Methods: A literature search was performed using PubMed in January 2017 to identify all papers published using NCDB data. Characteristics of the publications were extracted. Citation frequencies were obtained through the Web of Science. Results: A total of 302 articles written by 230 first authors met the inclusion criteria. The number of publications has grown exponentially since 2013, with 108 articles published in 2016. Articles were published in 86 journals. The majority of the published papers focused on digestive system cancer, while bone and joints, eye and orbit, myeloma, mesothelioma, and Kaposi sarcoma were never studied. Thirteen institutions in the United States were associated with more than 5 publications. The papers have been cited a total of 9858 times since the publication of the first paper in 1992. Frequently appearing keywords congregated into 3 clusters: "demographics," "treatments and survival," and "statistical analysis method." Even though the main focuses of the articles covered an extremely wide range, they can be classified into 2 main categories: survival analysis and characterization. Other focuses include database(s) analysis and/or comparison, and hospital reporting. Conclusion: The surging interest in the use of the NCDB is accompanied by unequal utilization of resources by individuals and institutions. Certain areas are relatively understudied and should be further explored. PMID:29489679
Development of a conceptual integrated traffic safety problem identification database
DOT National Transportation Integrated Search
1999-12-01
The project conceptualized a traffic safety risk management information system and statistical database for improved problem-driver identification, countermeasure development, and resource allocation. The California Department of Motor Vehicles Drive...
Mummadi, Srinivas; Kumbam, Anusha; Hahn, Peter Y.
2015-01-01
Background: Malignant pleural effusion (MPE) is common with advanced malignancy. Palliative care with minimal adverse events is the cornerstone of management. Although talc pleurodesis plays an important role in treatment, the best modality of talc application remains controversial. Objective: To compare rates of successful pleurodesis and rates of respiratory and non-respiratory complications between thoracoscopic talc insufflation/poudrage (TTI) and talc slurry (TS). Data sources and study selection: MEDLINE (PubMed, OVID), EBM Reviews (Cochrane Database of Systematic Reviews, ACP Journal Club, DARE, Cochrane Central Register of Controlled Trials, Cochrane Methodology Register, Health Technology Assessment and NHS Economic Evaluation Database), EMBASE and Scopus. Randomized controlled trials published between 01/01/1980 and 10/01/2014 that compared the two strategies were selected. Results: Twenty-eight potential studies were identified, of which 24 were excluded, leaving four studies. No statistically significant difference in the probability of successful pleurodesis was observed between the TS and TTI groups (RR 1.06; 95% CI 0.99-1.14; Q statistic, 4.84). There was a higher risk of post-procedural respiratory complications in the TTI group compared to the TS group (RR 1.91, 95% CI 1.24-2.93, Q statistic 3.15). No statistically significant difference in the incidence of non-respiratory complications between the TTI group and the TS group was observed (RR 0.88, 95% CI 0.72-1.07, Q statistic 4.61). Conclusions: There is no difference in success rates of pleurodesis, based on patient-centered outcomes, between talc poudrage and talc slurry treatments. Respiratory complications are more common with talc poudrage via thoracoscopy. PMID:25878773
Fatal falls in the US construction industry, 1990 to 1999.
Derr, J; Forst, L; Chen, H Y; Conroy, L
2001-10-01
The Occupational Safety and Health Administration's (OSHA's) Integrated Management Information System (IMIS) database allows for the detailed analysis of risk factors surrounding fatal occupational events. This study used IMIS data to (1) perform a risk factor analysis of fatal construction falls, and (2) assess the impact of the February 1995 29 CFR Part 1926 Subpart M OSHA fall protection regulations for construction by calculating trends in fatal fall rates. In addition, IMIS data on fatal construction falls were compared with data from other occupational fatality surveillance systems. For falls in construction, the study identified several demographic factors that may indicate increased risk. A statistically significant downward trend in fatal falls was evident in all construction and within several construction categories during the decade. Although the study failed to show a statistically significant intervention effect from the new OSHA regulations, it may have lacked the power to do so.
NASA Technical Reports Server (NTRS)
Decker, Ryan K.; Barbre, Robert E., Jr.
2014-01-01
Space launch vehicles incorporate upper-level wind profiles to determine wind effects on the vehicle and to support the commit-to-launch decision. These assessments incorporate wind profiles measured hours prior to launch and may not represent the actual wind the vehicle will fly through. Uncertainty in the upper-level winds over the time period between the assessment and launch can be mitigated by a statistical analysis of wind change over time periods of interest using historical data from the launch range. Five sets of temporal wind pairs at various time separations (0.75, 1.5, 2, 3 and 4 hours) at the Eastern Range, Western Range and Wallops Flight Facility were developed for use in upper-level wind assessments. Database development procedures as well as statistical analysis of temporal wind variability at each launch range will be presented.
Qu, Shu-Gen; Gao, Jin; Tang, Bo; Yu, Bo; Shen, Yue-Ping; Tu, Yu
2018-01-01
Low-dose ionizing radiation (LDIR) may increase solid cancer mortality in nuclear industry workers, but only a few individual cohort studies exist, and the available reports have low statistical power. The aim of the present study was to assess solid cancer mortality risk from LDIR in the nuclear industry using standardized mortality ratios (SMRs) and 95% confidence intervals. A systematic literature search through the PubMed and Embase databases identified 27 studies relevant to this meta-analysis. There was statistical significance for total, solid and lung cancers, with meta-SMR values of 0.88, 0.80, and 0.89, respectively. There was evidence of stochastic effects of IR, but more definitive conclusions require additional analyses using standardized protocols to determine whether LDIR increases the risk of solid cancer-related mortality. PMID:29725540
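For orientation, the standardized mortality ratio pooled here is, for each cohort, the ratio of observed to expected deaths; a common log-scale approximation for its confidence interval (not necessarily the pooling model the authors used) is

\[ \mathrm{SMR} = \frac{O}{E}, \qquad 95\%\ \mathrm{CI} \approx \mathrm{SMR} \times \exp\!\left(\pm \frac{1.96}{\sqrt{O}}\right), \]

so a meta-SMR below 1, as reported above, indicates fewer deaths than expected relative to the reference population.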
A Meta-Analysis of Hypnotherapeutic Techniques in the Treatment of PTSD Symptoms.
O'Toole, Siobhan K; Solomon, Shelby L; Bergdahl, Stephen A
2016-02-01
The efficacy of hypnotherapeutic techniques as treatment for symptoms of posttraumatic stress disorder (PTSD) was explored through meta-analytic methods. Studies were selected through a search of 29 databases. Altogether, 81 studies discussing hypnotherapy and PTSD were reviewed for inclusion criteria. The outcomes of 6 studies representing 391 participants were analyzed using meta-analysis. Evaluation of effect sizes related to avoidance and intrusion, in addition to overall PTSD symptoms after hypnotherapy treatment, revealed that all studies showed that hypnotherapy had a positive effect on PTSD symptoms. The overall Cohen's d was large (-1.18) and statistically significant (p < .001). Effect sizes varied based on study quality; however, they were large and statistically significant. Using the classic fail-safe N to assess for publication bias, it was determined it would take 290 nonsignificant studies to nullify these findings. Copyright © 2016 International Society for Traumatic Stress Studies.
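The classic fail-safe N referred to above is conventionally computed (Rosenthal's formulation, restated here for reference) from the summed z-scores of the k included studies, with \(z_\alpha = 1.645\) for a one-tailed p < .05:

\[ N_{\mathrm{fs}} = \frac{\left(\sum_{i=1}^{k} z_i\right)^{2}}{z_\alpha^{2}} - k , \]

i.e., the number of unpublished null studies that would be needed to bring the combined result down to non-significance.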
Integration of NASA/GSFC and USGS Rock Magnetic Databases.
NASA Astrophysics Data System (ADS)
Nazarova, K. A.; Glen, J. M.
2004-05-01
A global Magnetic Petrology Database (MPDB) was developed and continues to be updated at NASA/Goddard Space Flight Center. The purpose of this database is to provide the geomagnetic community with a comprehensive and user-friendly method of accessing magnetic petrology data via the Internet for a more realistic interpretation of satellite (as well as aeromagnetic and ground) lithospheric magnetic anomalies. The MPDB contains data on rocks from localities around the world (about 19,000 samples), including the Ukrainian and Baltic Shields, Kamchatka, Iceland, the Ural Mountains, etc. The MPDB is designed, managed and presented on the web as a research-oriented database. Several database applications have been specifically developed for data manipulation and analysis of the MPDB. The geophysics unit at the USGS in Menlo Park has over 17,000 rock-property records, largely from sites within the western U.S. This database contains rock-density and rock-magnetic parameters collected for use in gravity and magnetic field modeling, and paleomagnetic studies. Most of these data were taken from surface outcrops and together they span a broad range of rock types. Measurements were made either in-situ at the outcrop, or in the laboratory on hand samples and paleomagnetic cores acquired in the field. The USGS and NASA/GSFC data will be integrated as part of an effort to provide public access to a single, uniformly maintained database. Due to the large number of data and the very large area sampled, the database can yield rock-property statistics on a broad range of rock types; it is thus applicable to study areas beyond the geographic scope of the database. The intent of this effort is to provide incentive for others to further contribute to the database, and a tool with which the geophysical community can entertain studies formerly precluded.
Seabed mapping and characterization of sediment variability using the usSEABED data base
Goff, J.A.; Jenkins, C.J.; Williams, S. Jeffress
2008-01-01
We present a methodology for statistical analysis of randomly located marine sediment point data, and apply it to the US continental shelf portions of usSEABED mean grain size records. The usSEABED database, like many modern, large environmental datasets, is heterogeneous and interdisciplinary. We statistically test the database as a source of mean grain size data, and from it provide a first examination of regional seafloor sediment variability across the entire US continental shelf. Data derived from laboratory analyses ("extracted") and from word-based descriptions ("parsed") are treated separately, and they are compared statistically and deterministically. Data records are selected for spatial analysis by their location within sample regions: polygonal areas defined in ArcGIS chosen by geography, water depth, and data sufficiency. We derive isotropic, binned semivariograms from the data, and invert these for estimates of noise variance, field variance, and decorrelation distance. The highly erratic nature of the semivariograms is a result both of the random locations of the data and of the high level of data uncertainty (noise). This decorrelates the data covariance matrix for the inversion, and largely prevents robust estimation of the fractal dimension. Our comparison of the extracted and parsed mean grain size data demonstrates important differences between the two. In particular, extracted measurements generally produce finer mean grain sizes, lower noise variance, and lower field variance than parsed values. Such relationships can be used to derive a regionally dependent conversion factor between the two. Our analysis of sample regions on the US continental shelf revealed considerable geographic variability in the estimated statistical parameters of field variance and decorrelation distance. Some regional relationships are evident, and overall there is a tendency for field variance to be higher where the average mean grain size is finer grained. Surprisingly, parsed and extracted noise magnitudes correlate with each other, which may indicate that some portion of the data variability that we identify as "noise" is caused by real grain size variability at very short scales. Our analyses demonstrate that by applying a bias-correction proxy, usSEABED data can be used to generate reliable interpolated maps of regional mean grain size and sediment character.
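A rough sketch of the binned-semivariogram estimation and inversion described above is given below, with an exponential model standing in for whatever covariance model the authors actually fit; the coordinates and "grain size" values are synthetic.

import numpy as np
from scipy.optimize import curve_fit
from scipy.spatial.distance import pdist

def empirical_semivariogram(xy, z, bin_edges):
    """Isotropic binned semivariogram of scattered values z at coordinates xy."""
    d = pdist(xy)
    g = pdist(z[:, None], metric="sqeuclidean") / 2.0   # 0.5 * (z_i - z_j)^2
    centers, gamma = [], []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        mask = (d >= lo) & (d < hi)
        if mask.any():
            centers.append(0.5 * (lo + hi))
            gamma.append(g[mask].mean())
    return np.array(centers), np.array(gamma)

def exp_model(h, nugget, sill, corr_len):
    return nugget + sill * (1.0 - np.exp(-h / corr_len))

# Synthetic "mean grain size" samples at random locations.
rng = np.random.default_rng(2)
xy = rng.uniform(0, 100, size=(400, 2))
z = np.sin(xy[:, 0] / 15.0) + rng.normal(0, 0.3, 400)   # smooth field plus noise

h, gamma = empirical_semivariogram(xy, z, np.linspace(0, 40, 9))
(nugget, sill, corr_len), _ = curve_fit(exp_model, h, gamma,
                                        p0=[0.1, 0.5, 10.0], bounds=(0, np.inf))
print(f"noise variance ~{nugget:.2f}, field variance ~{sill:.2f}, "
      f"decorrelation distance ~{corr_len:.1f}")

The fitted nugget plays the role of the noise variance, the partial sill the field variance, and the model range the decorrelation distance discussed in the text.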
PASSALI, D.; CARUSO, G.; ARIGLIANO, L.C.; PASSALI, F.M.; BELLUSSI, L.
2012-01-01
SUMMARY Obstructive sleep apnoea syndrome (OSAS) results from upper airway collapse during sleep. It represents an increasingly recognized pathology associated with many diseases. Herein, we describe a database for patients with OSAS. This has different goals: to facilitate good uniformity in clinical assessment, to allow the use of the application even by non-ENT specialists, to evaluate the results of medical and/or surgical treatments and to enable a statistical meta-analysis derived from the data collected in many OSAS medical centres. PMID:23093815
A phenomenological biological dose model for proton therapy based on linear energy transfer spectra.
Rørvik, Eivind; Thörnqvist, Sara; Stokkevåg, Camilla H; Dahle, Tordis J; Fjaera, Lars Fredrik; Ytre-Hauge, Kristian S
2017-06-01
The relative biological effectiveness (RBE) of protons varies with the radiation quality, quantified by the linear energy transfer (LET). Most phenomenological models employ a linear dependency on the dose-averaged LET (LETd) to calculate the biological dose. However, several experiments have indicated a possible non-linear trend. Our aim was to investigate whether biological dose models including non-linear LET dependencies should be considered, by introducing an LET spectrum-based dose model. The RBE-LET relationship was investigated by fitting polynomials of 1st to 5th degree to a database of 85 data points from aerobic in vitro experiments. We included both unweighted and weighted regression, the latter taking into account experimental uncertainties. Statistical testing was performed to decide whether higher-degree polynomials provided better fits to the data than lower degrees. The newly developed models were compared to three published LETd-based models for a simulated spread-out Bragg peak (SOBP) scenario. The statistical analysis of the weighted regression favored a non-linear RBE-LET relationship, with the quartic polynomial found to best represent the experimental data (P = 0.010). The results of the unweighted regression analysis were on the borderline of statistical significance for non-linear functions (P = 0.053), and with the current database a linear dependency could not be rejected. For the SOBP scenario, the weighted non-linear model estimated a similar mean RBE value (1.14) compared to the three established models (1.13-1.17). The unweighted model calculated a considerably higher RBE value (1.22). The analysis indicated that non-linear models could give a better representation of the RBE-LET relationship. However, this is not decisive, as inclusion of the experimental uncertainties in the regression analysis had a significant impact on the determination and ranking of the models. As differences between the models were observed for the SOBP scenario, both non-linear LET spectrum-based and linear LETd-based models should be further evaluated in clinically realistic scenarios. © 2017 American Association of Physicists in Medicine.
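A minimal sketch of the degree-selection step described above: fit weighted polynomials of increasing degree to (LET, RBE) data and test whether the added terms are justified with a nested-model F-test. The data and uncertainties below are synthetic placeholders, not the 85-point experimental database used in the paper.

import numpy as np
from scipy.stats import f as f_dist

def weighted_poly_rss(x, y, sigma, degree):
    """Weighted least-squares polynomial fit; returns the weighted RSS (chi-square)."""
    coeffs = np.polyfit(x, y, degree, w=1.0 / sigma)    # numpy expects weights ~ 1/sigma
    resid = (y - np.polyval(coeffs, x)) / sigma
    return float(np.sum(resid ** 2))

# Synthetic placeholder data: LET (keV/um), RBE, and experimental uncertainty.
rng = np.random.default_rng(3)
let = np.linspace(1, 20, 40)
rbe = 1.0 + 0.03 * let + 0.002 * let ** 2 + rng.normal(0, 0.05, let.size)
sigma = np.full(let.size, 0.05)

for low, high in [(1, 2), (2, 3), (3, 4), (4, 5)]:
    rss_low = weighted_poly_rss(let, rbe, sigma, low)
    rss_high = weighted_poly_rss(let, rbe, sigma, high)
    df1, df2 = high - low, let.size - (high + 1)
    F = ((rss_low - rss_high) / df1) / (rss_high / df2)
    print(f"degree {low} vs {high}: F = {F:.2f}, p = {f_dist.sf(F, df1, df2):.3f}")

A higher-degree polynomial is retained only when the F-test indicates a significantly better fit, which mirrors the statistical testing described in the abstract.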
SPSmart: adapting population based SNP genotype databases for fast and comprehensive web access.
Amigo, Jorge; Salas, Antonio; Phillips, Christopher; Carracedo, Angel
2008-10-10
In the last five years large online resources of human variability have appeared, notably HapMap, Perlegen and the CEPH foundation. These databases of genotypes with population information act as catalogues of human diversity, and are widely used as reference sources for population genetics studies. Although many useful conclusions may be extracted by querying databases individually, the lack of flexibility for combining data from within and between each database does not allow the calculation of key population variability statistics. We have developed a novel tool for accessing and combining large-scale genomic databases of single nucleotide polymorphisms (SNPs) in widespread use in human population genetics: SPSmart (SNPs for Population Studies). A fast pipeline creates and maintains a data mart from the most commonly accessed databases of genotypes containing population information: data is mined, summarized into the standard statistical reference indices, and stored into a relational database that currently handles as many as 4 × 10^9 genotypes and that can be easily extended to new database initiatives. We have also built a web interface to the data mart that allows the browsing of underlying data indexed by population and the combining of populations, allowing intuitive and straightforward comparison of population groups. All the information served is optimized for web display, and most of the computations are already pre-processed in the data mart to speed up the data browsing and any computational treatment requested. In practice, SPSmart allows populations to be combined into user-defined groups, while multiple databases can be accessed and compared in a few simple steps from a single query. It performs the queries rapidly and gives straightforward graphical summaries of SNP population variability through visual inspection of allele frequencies outlined in standard pie-chart format. In addition, full numerical description of the data is output in statistical results panels that include common population genetics metrics such as heterozygosity, Fst and In.
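The kind of per-SNP summary a data mart like this pre-computes can be sketched in a few lines. The Python below uses textbook definitions of allele frequency, expected heterozygosity, and a simple two-population Fst; it is an assumption-laden illustration of the statistics named above, not SPSmart's internal pipeline, and the genotype counts are placeholders.

```python
import numpy as np

def allele_freq(genotypes):
    """Reference-allele frequency from diploid genotype counts (0/1/2 copies per person)."""
    g = np.asarray(genotypes)
    return g.sum() / (2 * len(g))

def expected_heterozygosity(p):
    """Expected heterozygosity for a biallelic SNP."""
    return 2.0 * p * (1.0 - p)

def fst_two_pops(p1, n1, p2, n2):
    """Simple Wright-style Fst = (Ht - Hs) / Ht for two populations,
    weighting by sample size."""
    p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)
    ht = expected_heterozygosity(p_bar)
    hs = (n1 * expected_heterozygosity(p1) + n2 * expected_heterozygosity(p2)) / (n1 + n2)
    return 0.0 if ht == 0 else (ht - hs) / ht

# illustrative comparison of one SNP across two population groups (placeholder data)
pop_a = [0, 1, 1, 2, 0, 1]     # copies of the reference allele per individual
pop_b = [2, 2, 1, 2, 1, 2]
pa, pb = allele_freq(pop_a), allele_freq(pop_b)
print(pa, pb, fst_two_pops(pa, len(pop_a), pb, len(pop_b)))
```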
Estimated intakes and sources of total and added sugars in the Canadian diet.
Brisbois, Tristin D; Marsden, Sandra L; Anderson, G Harvey; Sievenpiper, John L
2014-05-08
National food supply data and dietary surveys are essential to estimate nutrient intakes and monitor trends, yet there are few published studies estimating added sugars consumption. The purpose of this report was to estimate and trend added sugars intakes and their contribution to total energy intake among Canadians by, first, using Canadian Community Health Survey (CCHS) nutrition survey data of intakes of sugars in foods and beverages, and second, using Statistics Canada availability data and adjusting these for wastage to estimate intakes. Added sugars intakes were estimated from CCHS data by categorizing the sugars content of food groups as either added or naturally occurring. Added sugars accounted for approximately half of total sugars consumed. Annual availability data were obtained from Statistics Canada CANSIM database. Estimates for added sugars were obtained by summing the availability of "sugars and syrups" with availability of "soft drinks" (proxy for high fructose corn syrup) and adjusting for waste. Analysis of both survey and availability data suggests that added sugars average 11%-13% of total energy intake. Availability data indicate that added sugars intakes have been stable or modestly declining as a percent of total energy over the past three decades. Although these are best estimates based on available data, this analysis may encourage the development of better databases to help inform public policy recommendations.
National Institutes of Health funding in radiation oncology: a snapshot.
Steinberg, Michael; McBride, William H; Vlashi, Erina; Pajonk, Frank
2013-06-01
Currently, pay lines for National Institutes of Health (NIH) grants are at a historical low. In this climate of fierce competition, knowledge about the funding situation in a small field like radiation oncology becomes very important for career planning and recruitment of faculty. Unfortunately, these data cannot be easily extracted from the NIH's database because it does not discriminate between radiology and radiation oncology departments. At the start of fiscal year 2013 we extracted records for 952 individual grants, which were active at the time of analysis from the NIH database. Proposals originating from radiation oncology departments were identified manually. Descriptive statistics were generated using the JMP statistical software package. Our analysis identified 197 grants in radiation oncology. These proposals came from 134 individual investigators in 43 academic institutions. The majority of the grants (118) were awarded to principal investigators at the full professor level, and 122 principal investigators held a PhD degree. In 79% of the grants, the research topic fell into the field of biology, 13% in the field of medical physics. Only 7.6% of the proposals were clinical investigations. Our data suggest that the field of radiation oncology is underfunded by the NIH and that the current level of support does not match the relevance of radiation oncology for cancer patients or the potential of its academic work force. Copyright © 2013 Elsevier Inc. All rights reserved.
Vision-based gait impairment analysis for aided diagnosis.
Ortells, Javier; Herrero-Ezquerro, María Trinidad; Mollineda, Ramón A
2018-02-12
Gait is a firsthand reflection of health condition. This belief has inspired recent research efforts to automate the analysis of pathological gait, in order to assist physicians in decision-making. However, most of these efforts rely on gait descriptions which are difficult to understand by humans, or on sensing technologies hardly available in ambulatory services. This paper proposes a number of semantic and normalized gait features computed from a single video acquired by a low-cost sensor. Far from being conventional spatio-temporal descriptors, features are aimed at quantifying gait impairment, such as gait asymmetry from several perspectives or falling risk. They were designed to be invariant to frame rate and image size, allowing cross-platform comparisons. Experiments were formulated in terms of two databases. A well-known general-purpose gait dataset is used to establish normal references for features, while a new database, introduced in this work, provides samples under eight different walking styles: one normal and seven impaired patterns. A number of statistical studies were carried out to prove the sensitivity of features at measuring the expected pathologies, providing enough evidence about their accuracy. Graphical abstract: at the top, a robust, semantic and easy-to-interpret feature set to describe impaired gait patterns; at the bottom, a new dataset consisting of video-recordings of a number of volunteers simulating different patterns of pathological gait, where features were statistically assessed.
Contextualization of drug-mediator relations using evidence networks.
Tran, Hai Joey; Speyer, Gil; Kiefer, Jeff; Kim, Seungchan
2017-05-31
Genomic analysis of drug response can provide unique insights into therapies that can be used to match the "right drug to the right patient." However, the process of discovering such therapeutic insights using genomic data is not straightforward and represents an area of active investigation. EDDY (Evaluation of Differential DependencY), a statistical test to detect differential statistical dependencies, is one method that leverages genomic data to identify differential genetic dependencies. EDDY has been used in conjunction with the Cancer Therapeutics Response Portal (CTRP), a dataset with drug-response measurements for more than 400 small molecules, and RNAseq data of cell lines in the Cancer Cell Line Encyclopedia (CCLE) to find potential drug-mediator pairs. Mediators were identified as genes that showed significant change in genetic statistical dependencies within annotated pathways between drug sensitive and drug non-sensitive cell lines, and the results are presented as a public web-portal (EDDY-CTRP). However, the interpretability of drug-mediator pairs currently hinders further exploration of these potentially valuable results. In this study, we address this challenge by constructing evidence networks built with protein and drug interactions from the STITCH and STRING interaction databases. STITCH and STRING are sister databases that catalog known and predicted drug-protein interactions and protein-protein interactions, respectively. Using these two databases, we have developed a method to construct evidence networks to "explain" the relation between a drug and a mediator. We applied this approach to drug-mediator relations discovered in EDDY-CTRP analysis and identified evidence networks for ~70% of drug-mediator pairs where most mediators were not known direct targets for the drug. Constructed evidence networks enable researchers to contextualize the drug-mediator pair with current research and knowledge. Using evidence networks, we were able to improve the interpretability of the EDDY-CTRP results by linking the drugs and mediators with genes associated with both the drug and the mediator. We anticipate that these evidence networks will help inform EDDY-CTRP results and enhance the generation of important insights into drug sensitivity that will lead to improved precision medicine applications.
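One way to picture the evidence-network idea above is as path finding in a combined interaction graph. The sketch below, assuming the networkx package, links a drug to a candidate mediator through drug-protein and protein-protein edges; every node and edge shown is a placeholder, not an actual STITCH or STRING record.

```python
import networkx as nx

# toy combined interaction graph: one drug-protein edge (STITCH-like) plus
# protein-protein edges (STRING-like); all names are placeholders
g = nx.Graph()
g.add_edges_from([
    ("drugX", "TP53"),          # drug-protein interaction
    ("TP53", "MDM2"),           # protein-protein interactions
    ("MDM2", "CDKN1A"),
    ("CDKN1A", "mediatorY"),
])

def evidence_paths(graph, drug, mediator, cutoff=4):
    """All simple paths up to a length cutoff linking a drug to a candidate mediator."""
    return list(nx.all_simple_paths(graph, source=drug, target=mediator, cutoff=cutoff))

print(evidence_paths(g, "drugX", "mediatorY"))
```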
Genetic polymorphisms of pharmacogenomic VIP variants in the Yi population from China.
Yan, Mengdan; Li, Dianzhen; Zhao, Guige; Li, Jing; Niu, Fanglin; Li, Bin; Chen, Peng; Jin, Tianbo
2018-03-30
Drug response and target therapeutic dosage are different among individuals. The variability is largely genetically determined. With the development of pharmacogenetics and pharmacogenomics, widespread research has provided a wealth of information on drug-related genetic polymorphisms, and the very important pharmacogenetic (VIP) variants have been identified for the major populations around the world, whereas less is known regarding minorities in China, including the Yi ethnic group. Our research aims to screen potential pharmacogenomic variants in the Yi population and provide a theoretical basis for future medication guidance. In the present study, 80 VIP variants (selected from the PharmGKB database) were genotyped in 100 unrelated and healthy Yi adults recruited for our research. Through statistical analysis, we compared the Yi with 11 other populations listed in the HapMap database to detect significant SNPs. Two specific SNPs were subsequently examined for their global allele distribution, using frequencies downloaded from the ALlele FREquency Database. Moreover, F-statistics (Fst), genetic structure and phylogenetic tree analyses were conducted for determination of genetic similarity between the 12 ethnic groups. Using χ2 tests, rs1128503 (ABCB1), rs7294 (VKORC1), rs9934438 (VKORC1), rs1540339 (VDR) and rs689466 (PTGS2) were identified as the significantly different loci for further analysis. The global allele distribution revealed that the "A" alleles of rs1540339 and rs9934438 were more frequent in Yi people, consistent with most populations in East Asia. F-statistics (Fst), genetic structure and phylogenetic tree analyses demonstrated that the Yi and CHD shared the closest genetic backgrounds. Among the domestic ethnic populations in China, the Yi were considered similar to the Han people from Shaanxi province. Our results demonstrated significant differences at several polymorphic SNPs and supplement the pharmacogenomic information for the Yi population, which could provide new strategies for optimizing clinical medication in accordance with the genetic determinants of drug toxicity and efficacy. Copyright © 2018 Elsevier B.V. All rights reserved.
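The χ2 comparison of allele counts between two populations described above can be reproduced in a few lines with SciPy. The counts in this sketch are placeholders chosen only to show the shape of the contingency table, not data from the Yi or HapMap cohorts.

```python
import numpy as np
from scipy.stats import chi2_contingency

# 2x2 table of allele counts for one SNP (placeholder numbers):
# rows = populations (e.g. Yi vs. one HapMap population), columns = alleles (A, G)
table = np.array([[130,  70],
                  [ 95, 105]])

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")   # SNPs with p < 0.05 flagged for follow-up
```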
NASA Technical Reports Server (NTRS)
Xiang, Xuwu; Smith, Eric A.; Tripoli, Gregory J.
1992-01-01
A hybrid statistical-physical retrieval scheme is explored which combines a statistical approach with an approach based on the development of cloud-radiation models designed to simulate precipitating atmospheres. The algorithm employs the detailed microphysical information from a cloud model as input to a radiative transfer model which generates a cloud-radiation model database. Statistical procedures are then invoked to objectively generate an initial guess composite profile data set from the database. The retrieval algorithm has been tested for a tropical typhoon case using Special Sensor Microwave/Imager (SSM/I) data and has shown satisfactory results.
A Database of Woody Vegetation Responses to Elevated Atmospheric CO2 (NDP-072)
Curtis, Peter S [The Ohio State Univ., Columbus, OH (United States); Cushman, Robert M [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Brenkert, Antoinette L [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
1999-01-01
To perform a statistically rigorous meta-analysis of research results on the response by woody vegetation to increased atmospheric CO2 levels, a multiparameter database of responses was compiled. Eighty-four independent CO2-enrichment studies, covering 65 species and 35 response parameters, met the necessary criteria for inclusion in the database: reporting mean response, sample size, and variance of the response (either as standard deviation or standard error). Data were retrieved from the published literature and unpublished reports. This numeric data package contains a 29-field data set of CO2-exposure experiment responses by woody plants (as both a flat ASCII file and a spreadsheet file), files listing the references to the CO2-exposure experiments and specific comments relevant to the data in the data set, and this documentation file (which includes SAS and Fortran codes to read the ASCII data file; SAS is a registered trademark of the SAS Institute, Inc., Cary, North Carolina 27511).
Kraeutler, Matthew J; Carver, Trevor J; Belk, John W; McCarty, Eric C
2018-06-01
Kraeutler, MJ, Carver, TJ, Belk, JW, and McCarty, EC. What is the value of a National Football League draft pick? An analysis based on changes made in the collective bargaining agreement. J Strength Cond Res 32(6): 1656-1661, 2018. The purpose of this study was to analyze and compare the value of players drafted in early rounds of the National Football League (NFL) Draft since the new collective bargaining agreement began in 2011. The NFL's player statistics database and database of player contract details were searched for players drafted in the first 3 rounds of the 2011 to 2013 NFL Drafts. Performance outcomes specific to each position were divided by each player's salary to calculate a value statistic. Various demographics, NFL Combine results, and total number of games missed because of injury were also recorded for each player. These statistics were compared within each position between players selected in the first round of the NFL Draft (group A) vs. those drafted in the second or third round (group B). A total of 147 players were included (group A 35, group B 112). Overall, players in group A were significantly taller (p ≤ 0.01) and heavier (p = 0.037) than players in group B. Group B demonstrated significantly greater value statistics than group A for quarterbacks (p = 0.028), wide receivers (p ≤ 0.001), defensive tackles (p = 0.019), and cornerbacks (p ≤ 0.001). No significant differences were found between groups with regard to number of games missed because of injury. Players drafted in the second or third rounds of the NFL Draft often carry more value than those drafted in the first round. NFL teams may wish to more frequently trade down in the Draft rather than trading up.
Dziuba, Bartłomiej; Dziuba, Marta
2014-08-20
New peptides with potential antimicrobial activity, encrypted in milk protein sequences, were searched for with the use of bioinformatic tools. The major milk proteins were hydrolyzed in silico by 28 enzymes. The obtained peptides were characterized by the following parameters: molecular weight, isoelectric point, composition and number of amino acid residues, net charge at pH 7.0, aliphatic index, instability index, Boman index, and GRAVY index, and compared with those calculated for known 416 antimicrobial peptides including 59 antimicrobial peptides (AMPs) from milk proteins listed in the BIOPEP database. A simple analysis of physico-chemical properties and the values of biological activity indicators were insufficient to select potentially antimicrobial peptides released in silico from milk proteins by proteolytic enzymes. The final selection was made based on the results of multidimensional statistical analysis such as support vector machines (SVM), random forest (RF), artificial neural networks (ANN) and discriminant analysis (DA) available in the Collection of Anti-Microbial Peptides (CAMP database). Eleven new peptides with potential antimicrobial activity were selected from all peptides released during in silico proteolysis of milk proteins.
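Several of the descriptors listed above can be computed directly from a peptide sequence. The sketch below assumes Biopython's ProtParam module for molecular weight, isoelectric point, instability index, and GRAVY, and computes the aliphatic index by hand from Ikai's formula (which ProtParam does not provide); the peptide sequence is a placeholder, not a fragment from the study's in silico digests.

```python
from Bio.SeqUtils.ProtParam import ProteinAnalysis

def describe_peptide(seq):
    """A few of the physico-chemical descriptors used to screen candidate peptides."""
    pa = ProteinAnalysis(seq)
    pct = pa.get_amino_acids_percent()           # mole fraction per residue type
    aliphatic_index = 100 * (pct["A"] + 2.9 * pct["V"] + 3.9 * (pct["I"] + pct["L"]))
    return {
        "length": len(seq),
        "molecular_weight": pa.molecular_weight(),
        "isoelectric_point": pa.isoelectric_point(),
        "instability_index": pa.instability_index(),
        "gravy": pa.gravy(),
        "aliphatic_index": aliphatic_index,      # Ikai (1980) formula
    }

# placeholder peptide, standing in for one fragment released by in silico digestion
print(describe_peptide("LLFNDLQGLTK"))
```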
Discovering Knowledge from AIS Database for Application in VTS
NASA Astrophysics Data System (ADS)
Tsou, Ming-Cheng
The widespread use of the Automatic Identification System (AIS) has had a significant impact on maritime technology. AIS enables the Vessel Traffic Service (VTS) not only to offer commonly known functions such as identification, tracking and monitoring of vessels, but also to provide rich real-time information that is useful for marine traffic investigation, statistical analysis and theoretical research. However, due to the rapid accumulation of AIS observation data, the VTS platform is often unable quickly and effectively to absorb and analyze it. Traditional observation and analysis methods are becoming less suitable for the modern AIS generation of VTS. In view of this, we applied the same data mining technique used for business intelligence discovery (in Customer Relation Management (CRM) business marketing) to the analysis of AIS observation data. This recasts the marine traffic problem as a business-marketing problem and integrates technologies such as Geographic Information Systems (GIS), database management systems, data warehousing and data mining to facilitate the discovery of hidden and valuable information in a huge amount of observation data. Consequently, this provides the marine traffic managers with a useful strategic planning resource.
Insights into vehicle trajectories at the handling limits: analysing open data from race car drivers
NASA Astrophysics Data System (ADS)
Kegelman, John C.; Harbott, Lene K.; Gerdes, J. Christian
2017-02-01
Race car drivers can offer insights into vehicle control during extreme manoeuvres; however, little data from race teams is publicly available for analysis. The Revs Program at Stanford has built a collection of vehicle dynamics data acquired from vintage race cars during live racing events with the intent of making this database publicly available for future analysis. This paper discusses the data acquisition, post-processing, and storage methods used to generate the database. An analysis of available data quantifies the repeatability of professional race car driver performance by examining the statistical dispersion of their driven paths. Certain map features, such as sections with high path curvature, consistently corresponded to local minima in path dispersion, quantifying the qualitative concept that drivers anchor their racing lines at specific locations around the track. A case study explores how two professional drivers employ distinct driving styles to achieve similar lap times, supporting the idea that driving at the limits allows a family of solutions in terms of paths and speed that can be adapted based on specific spatial, temporal, or other constraints and objectives.
Effects of the water level on the flow topology over the Bolund island
NASA Astrophysics Data System (ADS)
Cuerva-Tejero, A.; Yeow, T. S.; Gallego-Castillo, C.; Lopez-Garcia, O.
2014-06-01
We have analyzed the influence of the actual height of Bolund island above water level on different full-scale statistics of the velocity field over the peninsula. Our analysis is focused on the database of 10-minute statistics provided by Risø-DTU for the Bolund Blind Experiment. We have considered 10-minute periods with near-neutral atmospheric conditions, mean wind speed values in the interval [5,20] m/s, and westerly wind directions. As expected, statistics such as speed-up, normalized increase of turbulent kinetic energy and probability of recirculating flow show a large dependence on the emerged height of the island for the locations close to the escarpment. For the published ensemble mean values of speed-up and normalized increase of turbulent kinetic energy in these locations, we propose that some amount of uncertainty could be explained as a deterministic dependence of the flow field statistics upon the actual height of the Bolund island above the sea level.
Bahlmann, Claus; Burkhardt, Hans
2004-03-01
In this paper, we give a comprehensive description of our writer-independent online handwriting recognition system frog on hand. The focus of this work concerns the presentation of the classification/training approach, which we call cluster generative statistical dynamic time warping (CSDTW). CSDTW is a general, scalable, HMM-based method for variable-sized, sequential data that holistically combines cluster analysis and statistical sequence modeling. It can handle general classification problems that rely on this sequential type of data, e.g., speech recognition, genome processing, robotics, etc. Contrary to previous attempts, clustering and statistical sequence modeling are embedded in a single feature space and use a closely related distance measure. We show character recognition experiments of frog on hand using CSDTW on the UNIPEN online handwriting database. The recognition accuracy is significantly higher than reported results of other handwriting recognition systems. Finally, we describe the real-time implementation of frog on hand on a Linux Compaq iPAQ embedded device.
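At the core of DTW-based sequence classification is the plain dynamic time warping distance, sketched below in NumPy. This is only the textbook alignment recurrence on toy pen-trajectory-like sequences; it omits the clustering and statistical (HMM-like) modeling that distinguish CSDTW.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping distance between two feature sequences
    (rows = time steps, columns = features)."""
    a, b = np.atleast_2d(a), np.atleast_2d(b)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])          # local distance
            cost[i, j] = d + min(cost[i - 1, j],             # insertion
                                 cost[i, j - 1],             # deletion
                                 cost[i - 1, j - 1])         # match
    return cost[n, m]

# two toy (x, y) trajectories of unequal length
s1 = np.array([[0, 0], [1, 1], [2, 2], [3, 2]])
s2 = np.array([[0, 0], [1, 1], [1, 2], [2, 2], [3, 2]])
print(dtw_distance(s1, s2))
```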
A statistical method for measuring activation of gene regulatory networks.
Esteves, Gustavo H; Reis, Luiz F L
2018-06-13
Gene expression data analysis is of great importance for modern molecular biology, given our ability to measure the expression profiles of thousands of genes and enabling studies rooted in systems biology. In this work, we propose a simple statistical model for measuring the activation of gene regulatory networks, instead of the traditional gene co-expression networks. We present the mathematical construction of a statistical procedure for testing hypotheses regarding gene regulatory network activation. The real probability distribution for the test statistic is evaluated by a permutation-based study. To illustrate the functionality of the proposed methodology, we also present a simple example based on a small hypothetical network and the activation measurement of two KEGG networks, both based on gene expression data collected from gastric and esophageal samples. The two KEGG networks were also analyzed using a public database available through NCBI-GEO, presented as Supplementary Material. This method was implemented in an R package that is available at the BioConductor project website under the name maigesPack.
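The permutation-based evaluation mentioned above can be sketched generically. In the Python below, the activation statistic is a deliberately simple placeholder (mean expression difference of the network's genes between two groups), not the maigesPack statistic; the expression matrix and labels are synthetic.

```python
import numpy as np

def permutation_pvalue(expr, labels, network_genes, n_perm=1000, seed=0):
    """Permutation p-value for a simple activation statistic of a gene set.
    expr: genes x samples matrix; labels: 0/1 group assignment per sample."""
    rng = np.random.default_rng(seed)
    sub = expr[network_genes]                          # rows for the network's genes
    def stat(lab):
        return sub[:, lab == 1].mean() - sub[:, lab == 0].mean()
    observed = stat(labels)
    null = np.array([stat(rng.permutation(labels)) for _ in range(n_perm)])
    return (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_perm + 1)

# toy data: 50 genes x 20 samples, first 8 genes form the "network" of interest
rng = np.random.default_rng(1)
expr = rng.normal(size=(50, 20))
labels = np.array([0] * 10 + [1] * 10)
print(permutation_pvalue(expr, labels, network_genes=np.arange(8)))
```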
NASA Astrophysics Data System (ADS)
Yan, Rui; Parrot, Michel; Pinçon, Jean-Louis
2017-12-01
In this paper, we present the result of a statistical study performed on the ionospheric ion density variations above areas of seismic activity. The ion density was observed by the low altitude satellite DEMETER between 2004 and 2010. In the statistical analysis a superposed epoch method is used where the observed ionospheric ion density close to the epicenters both in space and in time is compared to background values recorded at the same location and in the same conditions. Data associated with aftershocks have been carefully removed from the database to prevent spurious effects on the statistics. It is shown that, during nighttime, anomalous ionospheric perturbations related to earthquakes with magnitudes larger than 5 are evidenced. At the time of these perturbations the background ion fluctuation departs from a normal distribution. They occur up to 200 km from the epicenters and mainly 5 days before the earthquakes. As expected, an ion density perturbation occurring just after the earthquakes and close to the epicenters is also evidenced.
Thomas, Paul D; Kejariwal, Anish; Campbell, Michael J; Mi, Huaiyu; Diemer, Karen; Guo, Nan; Ladunga, Istvan; Ulitsky-Lazareva, Betty; Muruganujan, Anushya; Rabkin, Steven; Vandergriff, Jody A; Doremieux, Olivier
2003-01-01
The PANTHER database was designed for high-throughput analysis of protein sequences. One of the key features is a simplified ontology of protein function, which allows browsing of the database by biological functions. Biologist curators have associated the ontology terms with groups of protein sequences rather than individual sequences. Statistical models (Hidden Markov Models, or HMMs) are built from each of these groups. The advantage of this approach is that new sequences can be automatically classified as they become available. To ensure accurate functional classification, HMMs are constructed not only for families, but also for functionally distinct subfamilies. Multiple sequence alignments and phylogenetic trees, including curator-assigned information, are available for each family. The current version of the PANTHER database includes training sequences from all organisms in the GenBank non-redundant protein database, and the HMMs have been used to classify gene products across the entire genomes of human and Drosophila melanogaster. The ontology terms and protein families and subfamilies, as well as Drosophila gene classifications, can be browsed and searched for free. Due to outstanding contractual obligations, access to human gene classifications and to protein family trees and multiple sequence alignments will temporarily require a nominal registration fee. PANTHER is publicly available on the web at http://panther.celera.com.
NASA Astrophysics Data System (ADS)
Paprotny, Dominik; Morales-Nápoles, Oswaldo; Jonkman, Sebastiaan N.
2018-03-01
The influence of social and economic change on the consequences of natural hazards has been a matter of much interest recently. However, there is a lack of comprehensive, high-resolution data on historical changes in land use, population, or assets available to study this topic. Here, we present the Historical Analysis of Natural Hazards in Europe (HANZE) database, which contains two parts: (1) HANZE-Exposure with maps for 37 countries and territories from 1870 to 2020 in 100 m resolution and (2) HANZE-Events, a compilation of past disasters with information on dates, locations, and losses, currently limited to floods only. The database was constructed using high-resolution maps of present land use and population, a large compilation of historical statistics, and relatively simple disaggregation techniques and rule-based land use reallocation schemes. Data encompassed in HANZE allow one to "normalize" information on losses due to natural hazards by taking into account inflation as well as changes in population, production, and wealth. This database of past events currently contains 1564 records (1870-2016) of flash, river, coastal, and compound floods. The HANZE database is freely available at https://data.4tu.nl/repository/collection:HANZE.
Integrating forensic information in a crime intelligence database.
Rossy, Quentin; Ioset, Sylvain; Dessimoz, Damien; Ribaux, Olivier
2013-07-10
Since 2008, intelligence units of six states of the western part of Switzerland have been sharing a common database for the analysis of high volume crimes. On a daily basis, events reported to the police are analysed, filtered and classified to detect crime repetitions and interpret the crime environment. Several forensic outcomes are integrated in the system, such as matches of traces with persons, and links between scenes detected by the comparison of forensic case data. Systematic procedures have been established to integrate links assumed mainly through DNA profiles, shoemark patterns and images. A statistical outlook on a retrospective dataset of series from 2009 to 2011 in the database shows, for instance, the number of repetitions detected or confirmed and augmented by forensic case data. The time needed to obtain forensic intelligence, depending on the type of marks treated, is seen as a critical issue. Furthermore, the underlying integration process of forensic intelligence into the crime intelligence database raised several difficulties regarding the acquisition of data and the models used in the forensic databases. The solutions found and the operational procedures adopted are described and discussed. This process forms the basis for further research aimed at developing forensic intelligence models. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
An Update on Electronic Information Sources.
ERIC Educational Resources Information Center
Ackerman, Katherine
1987-01-01
This review of new developments and products in online services discusses trends in travel related services; full text databases; statistical source databases; an emphasis on regional and international business news; and user friendly systems. (Author/CLB)
2016-01-01
The aims of this study were to develop strategies and algorithms for calculating food commodity intake suitable for exposure assessment of residual chemicals by using the food intake database of the Korea National Health and Nutrition Examination Survey (KNHANES). In this study, apples and their processed food products were chosen as a model food for accurate calculation of food commodity intakes through the recently developed Korea food commodity intake calculation (KFCIC) software. The average daily intakes of total apples in Korea Health Statistics were 29.60 g in 2008, 32.40 g in 2009, 34.30 g in 2010, 28.10 g in 2011, and 24.60 g in 2012. The average daily intake of apples calculated by the KFCIC software was 2.65 g higher than that reported in Korea Health Statistics. The food intake data in Korea Health Statistics may reflect the intake of apples from mixed and processed foods less fully than the KFCIC software does. These results can affect the outcome of risk assessment for residual chemicals in foods. Therefore, the accurate estimation of the average daily intake of food commodities is very important, and more data for food intakes and recipes have to be applied to improve the quality of data. Nevertheless, this study can contribute to the predictive estimation of exposure to possible residual chemicals and subsequent analysis for their potential risks. PMID:27152299
Zhu, Jie; Chen, Hao; Song, Zhixiu; Wang, Xudong; Sun, Zhenshuang
2018-01-01
This article aims to assess the effects of ginger (Zingiber officinale Roscoe) on type 2 diabetes mellitus (T2DM) and/or components of the metabolic syndrome (MetS). Electronic literature was searched in PubMed, Embase, the Cochrane Library, Chinese Biomedical Database, China National Knowledge Infrastructure, and Wanfang Database from inception of the database to May 19, 2017, and supplemented by browsing reference lists of potentially eligible articles. Randomized controlled trials on research subjects were included. Data were extracted as a mean difference (MD) and 95% confidence interval (CI). Subgroup analysis of fasting blood glucose (FBG) was performed. 10 studies met the inclusion criteria with a total of 490 individuals. Ginger showed a significant beneficial effect in glucose control and insulin sensitivity. The pooled weighted MD of glycosylated hemoglobin (HbA1c) was -1.00, (95% CI: -1.56, -0.44; P < 0.001). Subgroup analysis revealed that ginger obviously reduced FBG in T2DM patients (-21.24; 95% CI: -33.21, -9.26; P < 0.001). Meanwhile, the significant effects of improvement of lipid profile were observed. Most analyses were not statistically heterogeneous. Based on the negligible side effects and obvious ameliorative effects on glucose control, insulin sensitivity, and lipid profile, ginger may be a promising adjuvant therapy for T2DM and MetS.
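The pooling step described above (weighted mean differences with heterogeneity assessed by Q and I²) is the kind of computation RevMan performs; a hedged fixed-effect, inverse-variance sketch in Python is shown below. The per-study mean differences and standard errors are placeholders, not the trial data from this meta-analysis.

```python
import numpy as np

def pool_mean_differences(md, se):
    """Fixed-effect inverse-variance pooling with Cochran's Q and I² heterogeneity."""
    md, se = np.asarray(md, float), np.asarray(se, float)
    w = 1.0 / se**2
    pooled = np.sum(w * md) / np.sum(w)
    pooled_se = np.sqrt(1.0 / np.sum(w))
    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    q = np.sum(w * (md - pooled) ** 2)                        # Cochran's Q
    i2 = max(0.0, (q - (len(md) - 1)) / q) * 100 if q > 0 else 0.0
    return pooled, ci, q, i2

# placeholder per-study mean differences in HbA1c (%) and their standard errors
md = [-0.8, -1.2, -0.6, -1.5]
se = [0.30, 0.40, 0.25, 0.50]
print(pool_mean_differences(md, se))
```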
Brand, S; Otte, D; Stübig, T; Petri, M; Ettinger, M; Mueller, C W; Krettek, C; Haasper, C; Probst, C
2013-12-01
Patients in motor vehicle crashes (MVCs) who suffer burns are challenging for the rescue team and the admitting hospital. These patients often face worse outcomes than crash patients with trauma only. Our analysis of the German In-depth Accident Study (GIDAS) database examines the detailed crash mechanisms to identify potential prevention measures. We analyzed the 2011 GIDAS database comprising 14,072 MVC patients and compared individuals with (Burns) and without (NoBurns) burns. Only complete data sets were included. Patients whose burns obviously resulted from air bag deployment alone were not included in the Burns group. Data acquisition by an on-call team of medical and technical researchers starts at the crash scene immediately after the crash and comprises technical data as well as medical information until discharge from the hospital. Statistical analysis was performed with the Mann-Whitney U test. Level of significance was p < 0.05. 14,072 MVC patients with complete data sets were included in the analysis. 99 individuals suffered burns (0.7%; group "Burns"). Demographic data and injury severity showed no statistically significant difference between the two groups of Burns and NoBurns. Injury severity was measured using the Injury Severity Score (ISS). Direct frontal impact (Burns: 48.5% vs. NoBurns: 33%; p < 0.05) and high-energy impacts as represented by delta-v (m/s) (Burns: 33.5 ± 21.4 vs. NoBurns: 25.2 ± 15.9; p < 0.05) were significantly different between groups as was mortality (Burns: 12.5% vs. NoBurns: 2.1%; p < 0.05). Type of patients' motor vehicles and type of crash opponent showed no differences. Our results show that frontal and high-energy impacts are associated with a higher frequency of burns. This may help automobile manufacturers improve burn safety by preventing flames from spreading from the engine compartment to the passenger compartment. Communities may impose speed limits in local crash hot spots. Copyright © 2013 Elsevier Ltd and ISBI. All rights reserved.
Jakovljevic, Aleksandar; Andric, Miroslav
2014-01-01
During the last decade, a hypothesis has been established that human cytomegalovirus (HCMV) and Epstein-Barr virus (EBV) may be implicated in the pathogenesis of apical periodontitis. The aim of this review was to analyze the available evidence that indicates that HCMV and EBV can actually contribute to the pathogenesis of periapical lesions and to answer the following focused question: is there a relationship between HCMV and EBV DNA and/or RNA detection and the clinical features of human periapical lesions? The literature search covered MEDLINE, Science Citation Index Expanded (SCIexpanded), Scopus, and The Cochrane Library database. Quantitative statistical analysis was performed on the pooled data of HCMV and EBV messenger RNA transcripts in tissues of symptomatic and asymptomatic periapical lesions. The electronic database search yielded 48 hits from PubMed, 197 hits from Scopus, 40 hits from Web of Science, and 1 from the Cochrane Library. Seventeen cross-sectional studies have been included in the final review. The pooled results from quantitative systematic method analysis showed no statistically significant relationship between the presence of HCMV and EBV messenger RNA transcripts (P = .083 and P = .306, respectively) and the clinical features of apical periodontitis. The findings of HCMV and EBV transcripts in apical periodontitis were controversial among the included studies. Herpesviruses were common in symptomatic and large-size periapical lesions, but such results failed to reach statistical significance. Further studies, including those based on an experimental animal model, should provide more data on herpesviruses as a factor in the pathogenesis of periapical inflammation. Copyright © 2014 American Association of Endodontists. Published by Elsevier Inc. All rights reserved.
Jiang, Nan; Zhao, Gui-Qiu; Lin, Jing; Hu, Li-Ting; Che, Cheng-Ye; Wang, Qian; Xu, Qiang; Li, Cui; Zhang, Jie
2018-01-01
To conduct a systematic review and quantitative Meta-analysis of the efficacy and safety of combined surgery for the eyes with coexisting cataract and open angle glaucoma. We performed a systematic search of the related literature in the Cochrane Library, PubMed, EMBASE, Web of Science databases, CNKI, CBM and Wan Fang databases, with no limitations on language or publication date. The primary efficacy estimate was identified by weighted mean difference of the percentage of intraocular pressure reduction (IOPR%) from baseline to end-point, the percentage of number of glaucoma medications reduction from pre- to post-operation, and the secondary efficacy evaluations were performed by odds ratio (OR) and 95% confidence interval (CI) for complete and qualified success rate. Besides, ORs were applied to assess the tolerability of adverse incidents. Meta-analyses of fixed or random effect models were performed using RevMan software 5.2 to gather the consequences. Heterogeneity was evaluated by Chi 2 test and the I 2 measure. Ten studies enrolling 3108 patients were included. The combined consequences indicated that both glaucoma and combined cataract and glaucoma surgery significantly decreased IOP. For deep sclerectomy vs deep sclerectomy plus phacoemulsification and canaloplasty vs phaco-canaloplasty, the differences in IOPR% were not all statistically significant while trabeculotomy was detected to gain a quantitatively greater IOPR% compared with trabeculotomy plus phacoemulsification. Furthermore, there was no statistical significance in the complete and qualified success rate, and the rates of adverse incidents for trabeculotomy vs trabeculotomy plus phacoemulsification. Compared with trabeculotomy plus phacoemulsification, trabeculectomy alone is more effective in lowering IOP and the number of glaucoma medications, while the two surgeries can not demonstrate statistical differences in the complete success rate, qualified success rate, or incidence of adverse incidents.
Orchard, John W; Driscoll, Tim; Seward, Hugh; Orchard, Jessica J
2012-05-01
To study risk factors for hamstring injury in the Australian Football League (AFL), in particular the effect of recent changes in match participation (increased use of the interchange bench) on hamstring injury. Analysis of hamstring match injury statistics extracted from an injury database combined with match participation statistics extracted from a player statistics database. 56,320 player matches in the AFL over the period 2003-2010 were analyzed, in which 416 hamstring injuries occurred. In a Generalized Estimating Equation (GEE) analysis accounting for clustering of different teams, significant predictors of hamstring injuries were recent hamstring injury (RR 4.16, 95% CI 3.19-5.43), past history of ACL reconstruction (RR 1.69, 95% CI 1.09-2.60), past history of calf injury (RR 1.58, 95% CI 1.37-1.82), opposition team making 60 or more interchanges during the game (RR 1.38, 95% CI 1.12-1.68) and player having made 7 or more interchanges off the field in the last 3 weeks (protective RR 0.74, 95% CI 0.59-0.93). These findings suggest that regular interchanges protect individual players against hamstring injuries, but increase the risk of hamstring injury for opposition players. These findings can be explained by a model in which both fatigue and average match running speed are risk factors for hamstring injury. A player who returns to the ground after a rest on the interchange bench may himself have some short-term protection against hamstring injury because of the reduced fatigue, but his rested state may contribute to increased average running speed for his direct opponent, increasing the risk of injury for players on the opposition team. Copyright © 2011 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.
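A GEE analysis of the kind summarized above, with clustering on team and risk ratios as the effect measure, can be sketched with statsmodels as below. The modified-Poisson (log-link) formulation is one common way to obtain risk ratios for binary outcomes; the data frame, variables, and coefficients are placeholders, not the AFL injury database.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# placeholder player-match records: outcome = hamstring injury (0/1),
# predictors mirror those described above, clustered by team
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "injury": rng.binomial(1, 0.01, n),
    "recent_hamstring": rng.binomial(1, 0.05, n),
    "opp_interchanges_60plus": rng.binomial(1, 0.4, n),
    "team": rng.integers(0, 16, n),
})

X = sm.add_constant(df[["recent_hamstring", "opp_interchanges_60plus"]])
model = sm.GEE(df["injury"], X, groups=df["team"],
               family=sm.families.Poisson(),          # log link -> exp(coef) ~ risk ratio
               cov_struct=sm.cov_struct.Exchangeable())
result = model.fit()
print(np.exp(result.params))                          # estimated rate/risk ratios
```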
GARNET--gene set analysis with exploration of annotation relations.
Rho, Kyoohyoung; Kim, Bumjin; Jang, Youngjun; Lee, Sanghyun; Bae, Taejeong; Seo, Jihae; Seo, Chaehwa; Lee, Jihyun; Kang, Hyunjung; Yu, Ungsik; Kim, Sunghoon; Lee, Sanghyuk; Kim, Wan Kyu
2011-02-15
Gene set analysis is a powerful method of deducing biological meaning for an a priori defined set of genes. Numerous tools have been developed to test statistical enrichment or depletion in specific pathways or gene ontology (GO) terms. Major difficulties towards biological interpretation are integrating diverse types of annotation categories and exploring the relationships between annotation terms of similar information. GARNET (Gene Annotation Relationship NEtwork Tools) is an integrative platform for gene set analysis with many novel features. It includes tools for retrieval of genes from annotation database, statistical analysis & visualization of annotation relationships, and managing gene sets. In an effort to allow access to a full spectrum of amassed biological knowledge, we have integrated a variety of annotation data that include the GO, domain, disease, drug, chromosomal location, and custom-defined annotations. Diverse types of molecular networks (pathways, transcription and microRNA regulations, protein-protein interaction) are also included. The pair-wise relationship between annotation gene sets was calculated using kappa statistics. GARNET consists of three modules--gene set manager, gene set analysis and gene set retrieval, which are tightly integrated to provide virtually automatic analysis for gene sets. A dedicated viewer for annotation network has been developed to facilitate exploration of the related annotations. GARNET (gene annotation relationship network tools) is an integrative platform for diverse types of gene set analysis, where complex relationships among gene annotations can be easily explored with an intuitive network visualization tool (http://garnet.isysbio.org/ or http://ercsb.ewha.ac.kr/garnet/).
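The pairwise kappa statistic between annotation gene sets mentioned above amounts to Cohen's kappa on two binary membership vectors over a common gene universe. A hedged sketch using scikit-learn follows; the terms, gene universe, and overlap are placeholders, not GARNET's data.

```python
from sklearn.metrics import cohen_kappa_score

def term_kappa(genes_a, genes_b, universe):
    """Cohen's kappa between two annotation terms over a shared gene universe."""
    a = [int(g in genes_a) for g in universe]
    b = [int(g in genes_b) for g in universe]
    return cohen_kappa_score(a, b)

# placeholder annotation terms and gene universe
universe = [f"gene{i}" for i in range(100)]
go_term = set(universe[0:20])
pathway = set(universe[10:28])
print(term_kappa(go_term, pathway, universe))
```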
Fossil-Fuel CO2 Emissions Database and Exploration System
NASA Astrophysics Data System (ADS)
Krassovski, M.; Boden, T.; Andres, R. J.; Blasing, T. J.
2012-12-01
The Carbon Dioxide Information Analysis Center (CDIAC) at Oak Ridge National Laboratory (ORNL) quantifies the release of carbon from fossil-fuel use and cement production at global, regional, and national spatial scales. The CDIAC emission time series estimates are based largely on annual energy statistics published at the national level by the United Nations (UN). CDIAC has developed a relational database to house collected data and information and a web-based interface to help users worldwide identify, explore and download desired emission data. The available information is divided into two major groups: time series and gridded data. The time series data are offered at global, regional, and national scales. Publications containing historical energy statistics make it possible to estimate fossil fuel CO2 emissions back to 1751. Etemad et al. (1991) published a summary compilation that tabulates coal, brown coal, peat, and crude oil production by nation and year. Footnotes in the Etemad et al. (1991) publication extend the energy statistics time series back to 1751. Summary compilations of fossil fuel trade were published by Mitchell (1983, 1992, 1993, 1995). Mitchell's work tabulates solid and liquid fuel imports and exports by nation and year. These pre-1950 production and trade data were digitized and CO2 emission calculations were made following the procedures discussed in Marland and Rotty (1984) and Boden et al. (1995). The gridded data comprise annual and monthly estimates. The annual data present a time series recording 1° latitude by 1° longitude CO2 emissions in units of million metric tons of carbon per year from anthropogenic sources for 1751-2008. The monthly fossil-fuel CO2 emission estimates for 1950-2008 provided in this database are derived from time series of global, regional, and national fossil-fuel CO2 emissions (Boden et al. 2011), the references therein, and the methodology described in Andres et al. (2011). The data accessible here take these tabular, national, mass-emissions data and distribute them spatially on a one degree latitude by one degree longitude grid. The within-country spatial distribution is achieved through a fixed population distribution as reported in Andres et al. (1996). This presentation introduces the newly built database and web interface and reflects the present state and functionality of the Fossil-Fuel CO2 Emissions Database and Exploration System as well as future plans for expansion.
Semantic memory: a feature-based analysis and new norms for Italian.
Montefinese, Maria; Ambrosini, Ettore; Fairfield, Beth; Mammarella, Nicola
2013-06-01
Semantic norms for properties produced by native speakers are valuable tools for researchers interested in the structure of semantic memory and in category-specific semantic deficits in individuals following brain damage. The aims of this study were threefold. First, we sought to extend existing semantic norms by adopting an empirical approach to category (Exp. 1) and concept (Exp. 2) selection, in order to obtain a more representative set of semantic memory features. Second, we extensively outlined a new set of semantic production norms collected from Italian native speakers for 120 artifactual and natural basic-level concepts, using numerous measures and statistics following a feature-listing task (Exp. 3b). Finally, we aimed to create a new publicly accessible database, since only a few existing databases are publicly available online.
Nomura, Kaori; Takahashi, Kunihiko; Hinomura, Yasushi; Kawaguchi, Genta; Matsushita, Yasuyuki; Marui, Hiroko; Anzai, Tatsuhiko; Hashiguchi, Masayuki; Mochizuki, Mayumi
2015-01-01
Background: The use of a statistical approach to analyze cumulative adverse event (AE) reports has been encouraged by regulatory authorities. However, data variations affect statistical analyses (eg, signal detection). Further, differences in regulations, social issues, and health care systems can cause variations in AE data. The present study examined similarities and differences between two publicly available databases, ie, the Japanese Adverse Drug Event Report (JADER) database and the US Food and Drug Administration Adverse Event Reporting System (FAERS), and how they affect signal detection. Methods: Two AE data sources from 2010 were examined, ie, JADER cases (JP) and Japanese cases extracted from the FAERS (FAERS-JP). Three methods for signals of disproportionate reporting, ie, the reporting odds ratio, Bayesian confidence propagation neural network, and Gamma Poisson Shrinker (GPS), were used on drug-event combinations for three substances frequently recorded in both systems. Results: The two databases showed similar elements of AE reports, but no option was provided for a shareable case identifier. The average number of AEs per case was 1.6±1.3 (maximum 37) in the JP and 3.3±3.5 (maximum 62) in the FAERS-JP. Between 5% and 57% of all AEs were signaled by three quantitative methods for etanercept, infliximab, and paroxetine. Signals identified by GPS for the JP and FAERS-JP, as referenced by Japanese labeling, showed higher positive sensitivity than was expected. Conclusion: The FAERS-JP was different from the JADER. Signals derived from both datasets identified different results, but shared certain signals. Discrepancies in type of AEs, drugs reported, and average number of AEs per case were potential contributing factors. This study will help those concerned with pharmacovigilance better understand the use and pitfalls of using spontaneous AE data. PMID:26109846
The 2002 RPA Plot Summary database users manual
Patrick D. Miles; John S. Vissage; W. Brad Smith
2004-01-01
Describes the structure of the RPA 2002 Plot Summary database and provides information on generating estimates of forest statistics from these data. The RPA 2002 Plot Summary database provides a consistent framework for storing forest inventory data across all ownerships across the entire United States. The data represents the best available data as of October 2001....
MDB: the Metalloprotein Database and Browser at The Scripps Research Institute
Castagnetto, Jesus M.; Hennessy, Sean W.; Roberts, Victoria A.; Getzoff, Elizabeth D.; Tainer, John A.; Pique, Michael E.
2002-01-01
The Metalloprotein Database and Browser (MDB; http://metallo.scripps.edu) at The Scripps Research Institute is a web-accessible resource for metalloprotein research. It offers the scientific community quantitative information on geometrical parameters of metal-binding sites in protein structures available from the Protein Data Bank (PDB). The MDB also offers analytical tools for the examination of trends or patterns in the indexed metal-binding sites. A user can perform interactive searches, metal-site structure visualization (via a Java applet), and analysis of the quantitative data by accessing the MDB through a web browser without requiring an external application or platform-dependent plugin. The MDB also has a non-interactive interface with which other web sites and network-aware applications can seamlessly incorporate data or statistical analysis results from metal-binding sites. The information contained in the MDB is periodically updated with automated algorithms that find and index metal sites from new protein structures released by the PDB. PMID:11752342
Gandhi, Pranav K; Gentry, William M; Bottorff, Michael B
2012-10-01
To investigate reports of thrombotic events associated with the use of C1 esterase inhibitor products in patients with hereditary angioedema in the United States. Retrospective data mining analysis. The United States Food and Drug Administration (FDA) adverse event reporting system (AERS) database. Case reports of C1 esterase inhibitor products, thrombotic events, and C1 esterase inhibitor product-associated thrombotic events (i.e., combination cases) were extracted from the AERS database, using the time frames of each respective product's FDA approval date through the second quarter of 2011. Bayesian statistical methodology within the neural network architecture was implemented to identify potential signals of a drug-associated adverse event. A potential signal is generated when the lower limit of the 95% 2-sided confidence interval of the information component, denoted by IC₀₂₅ , is greater than zero. This suggests that the particular drug-associated adverse event was reported to the database more often than statistically expected from reports available in the database. Ten combination cases of thrombotic events associated with the use of one C1 esterase inhibitor product (Cinryze) were identified in patients with hereditary angioedema. A potential signal demonstrated by an IC₀₂₅ value greater than zero (IC₀₂₅ = 2.91) was generated for these combination cases. The extracted cases from the AERS indicate continuing reports of thrombotic events associated with the use of one C1 esterase inhibitor product among patients with hereditary angioedema. The AERS is incapable of establishing a causal link and detecting the true frequency of an adverse event associated with a drug; however, potential signals of C1 esterase inhibitor product-associated thrombotic events among patients with hereditary angioedema were identified in the extracted combination cases. © 2012 Pharmacotherapy Publications, Inc.
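The disproportionality idea behind the IC₀₂₅ criterion above can be illustrated with a simplified information component: the log2 ratio of observed to expected reports for a drug-event pair. The sketch below uses a crude normal-approximation lower bound in place of the Bayesian credible interval of the published BCPNN method, and all counts are placeholders; it is an assumption-laden illustration, not the method used in the study.

```python
import numpy as np

def information_component(n11, n_drug, n_event, n_total):
    """Simplified information component for one drug-event pair.
    n11: reports with both drug and event; n_drug: reports with the drug;
    n_event: reports with the event; n_total: all reports in the database."""
    expected = n_drug * n_event / n_total
    ic = np.log2((n11 + 0.5) / (expected + 0.5))       # 0.5 added for numerical stability
    # crude lower bound of a 95% interval (normal approximation on the log2 scale);
    # the actual BCPNN IC025 is the lower limit of a Bayesian credible interval
    se = np.sqrt(1.0 / (n11 + 0.5)) / np.log(2)
    return ic, ic - 1.96 * se

# placeholder counts for one drug-event combination
ic, ic_lower = information_component(n11=10, n_drug=200, n_event=500, n_total=100_000)
print(ic, ic_lower)        # a potential signal when the lower bound exceeds zero
```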
Rojas, Jorge A; Bernal, Jaime E; García, Mary A; Zarante, Ignacio; Ramírez, Natalia; Bernal, Constanza; Gelvez, Nancy; Tamayo, Marta L
2014-10-01
The aim of this study was to investigate the characteristics and performance of transient evoked oto-acoustic emission (TEOAE) hearing screening in newborns in Colombia, and analyze all possible variables and factors affecting the results. An observational, descriptive and retrospective study with bivariate analysis was performed. The study population consisted of 56,822 newborns evaluated at the private institution, PREGEN. TEOAE testing was carried out as a pediatric hearing screening test from December 2003 to March 2012. The database from PREGEN was revised, and the protocol for evaluation included the same screening test performed twice. Demographic characteristics were recorded and the newborn's background was evaluated. Basic statistics of the qualitative and quantitative variables, and statistical analysis were obtained using the chi-square test. Of the 56,822 records examined, 0.28% were classed as abnormal, which corresponded to a prevalence of 1 in 350. In the screened newborns, 0.08% had a major abnormality or other clinical condition diagnosed, and 0.29% reported a family history of hearing loss. A prevalence of 6.7 in 10,000 was obtained for microtia, which is similar to the 6.4 in 10,000 previously reported in Colombia (database of the Latin-American Collaborative Study of Congenital Malformations - ECLAMC). Statistical analysis demonstrated an association between presenting with a major anomaly and a higher frequency of abnormal results on both TEOAE tests. Newborns in Colombia do not currently undergo screening for the early detection of hearing impairment. The results from this study suggest TEOAE screening tests, when performed twice, are able to detect hearing abnormalities in newborns. This highlights the need to improve the long-term evaluation and monitoring of patients in Colombia through diagnostic tests, and to provide tests that are both sensitive and specific. Furthermore, the use of TEOAE screening is justified by the favorable cost: benefit ratio demonstrated in many countries worldwide. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
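The association reported above between a major anomaly and abnormal results on both TEOAE tests was tested with a chi-square statistic. A minimal sketch of such a test is shown below; the 2x2 counts are hypothetical, not the study's data.

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x2 table: rows = major anomaly present / absent,
# columns = abnormal on both TEOAE tests / normal on at least one test.
table = [[12, 34],        # newborns with a major anomaly
         [148, 56_628]]   # newborns without a major anomaly

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.3g}")
```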
Ambler, Graeme K; Gohel, Manjit S; Mitchell, David C; Loftus, Ian M; Boyle, Jonathan R
2015-01-01
Accurate adjustment of surgical outcome data for risk is vital in an era of surgeon-level reporting. Current risk prediction models for abdominal aortic aneurysm (AAA) repair are suboptimal. We aimed to develop a reliable risk model for in-hospital mortality after intervention for AAA, using rigorous contemporary statistical techniques to handle missing data. Using data collected during a 15-month period in the United Kingdom National Vascular Database, we applied multiple imputation methodology together with stepwise model selection to generate preoperative and perioperative models of in-hospital mortality after AAA repair, using two thirds of the available data. Model performance was then assessed on the remaining third of the data by receiver operating characteristic curve analysis and compared with existing risk prediction models. Model calibration was assessed by Hosmer-Lemeshow analysis. A total of 8088 AAA repair operations were recorded in the National Vascular Database during the study period, of which 5870 (72.6%) were elective procedures. Both preoperative and perioperative models showed excellent discrimination, with areas under the receiver operating characteristic curve of .89 and .92, respectively. This was significantly better than any of the existing models (area under the receiver operating characteristic curve for best comparator model, .84 and .88; P < .001 and P = .001, respectively). Discrimination remained excellent when only elective procedures were considered. There was no evidence of miscalibration by Hosmer-Lemeshow analysis. We have developed accurate models to assess risk of in-hospital mortality after AAA repair. These models were carefully developed with rigorous statistical methodology and significantly outperform existing methods for both elective cases and overall AAA mortality. These models will be invaluable for both preoperative patient counseling and accurate risk adjustment of published outcome data. Copyright © 2015 Society for Vascular Surgery. Published by Elsevier Inc. All rights reserved.
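The validation step described above (develop on two thirds of the data, assess discrimination on the held-out third by ROC analysis) can be sketched as follows. This is an illustration with simulated covariates and outcomes only; the study's multiple imputation and stepwise model selection are not reproduced here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical data: X = preoperative covariates, y = in-hospital death (1) or survival (0).
rng = np.random.default_rng(0)
X = rng.normal(size=(8088, 6))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=8088) > 1.5).astype(int)

# Two thirds for model development, one third held out for assessment.
X_dev, X_test, y_dev, y_test = train_test_split(X, y, test_size=1/3, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_dev, y_dev)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Held-out area under the ROC curve: {auc:.2f}")
```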
An application of statistics to comparative metagenomics
Rodriguez-Brito, Beltran; Rohwer, Forest; Edwards, Robert A
2006-01-01
Background Metagenomics, sequence analyses of genomic DNA isolated directly from the environments, can be used to identify organisms and model community dynamics of a particular ecosystem. Metagenomics also has the potential to identify significantly different metabolic potential in different environments. Results Here we use a statistical method to compare curated subsystems, to predict the physiology, metabolism, and ecology from metagenomes. This approach can be used to identify those subsystems that are significantly different between metagenome sequences. Subsystems that were overrepresented in the Sargasso Sea and Acid Mine Drainage metagenome when compared to non-redundant databases were identified. Conclusion The methodology described herein applies statistics to the comparisons of metabolic potential in metagenomes. This analysis reveals those subsystems that are more, or less, represented in the different environments that are compared. These differences in metabolic potential lead to several testable hypotheses about physiology and metabolism of microbes from these ecosystems. PMID:16549025
An application of statistics to comparative metagenomics.
Rodriguez-Brito, Beltran; Rohwer, Forest; Edwards, Robert A
2006-03-20
Metagenomics, sequence analyses of genomic DNA isolated directly from the environments, can be used to identify organisms and model community dynamics of a particular ecosystem. Metagenomics also has the potential to identify significantly different metabolic potential in different environments. Here we use a statistical method to compare curated subsystems, to predict the physiology, metabolism, and ecology from metagenomes. This approach can be used to identify those subsystems that are significantly different between metagenome sequences. Subsystems that were overrepresented in the Sargasso Sea and Acid Mine Drainage metagenome when compared to non-redundant databases were identified. The methodology described herein applies statistics to the comparisons of metabolic potential in metagenomes. This analysis reveals those subsystems that are more, or less, represented in the different environments that are compared. These differences in metabolic potential lead to several testable hypotheses about physiology and metabolism of microbes from these ecosystems.
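The subsystem comparison described above amounts to testing whether a subsystem's share of assigned sequences differs between two metagenomes. A minimal sketch using Fisher's exact test on hypothetical counts is shown below; the paper's exact statistic and multiple-testing handling may differ.

```python
from scipy.stats import fisher_exact

# Hypothetical counts of sequences assigned to one subsystem vs. all other subsystems
# in two metagenomes (e.g. Sargasso Sea vs. Acid Mine Drainage).
subsystem_a, other_a = 220, 9_780
subsystem_b, other_b = 95, 11_905

odds_ratio, p = fisher_exact([[subsystem_a, other_a], [subsystem_b, other_b]])
print(f"odds ratio = {odds_ratio:.2f}, p = {p:.2g}")
```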
A computational visual saliency model based on statistics and machine learning.
Lin, Ru-Je; Lin, Wei-Song
2014-08-01
Identifying the type of stimuli that attracts human visual attention has been an appealing topic for scientists for many years. In particular, marking the salient regions in images is useful for both psychologists and many computer vision applications. In this paper, we propose a computational approach for producing saliency maps using statistics and machine learning methods. Based on four assumptions, three properties (Feature-Prior, Position-Prior, and Feature-Distribution) can be derived and combined by a simple intersection operation to obtain a saliency map. These properties are implemented by a similarity computation, support vector regression (SVR) technique, statistical analysis of training samples, and information theory using low-level features. This technique is able to learn the preferences of human visual behavior while simultaneously considering feature uniqueness. Experimental results show that our approach performs better in predicting human visual attention regions than 12 other models in two test databases. © 2014 ARVO.
Weckwerth, Wolfram; Wienkoop, Stefanie; Hoehenwarter, Wolfgang; Egelhofer, Volker; Sun, Xiaoliang
2014-01-01
Genome sequencing and systems biology are revolutionizing life sciences. Proteomics emerged as a fundamental technique of this novel research area as it is the basis for gene function analysis and modeling of dynamic protein networks. Here a complete proteomics platform suited for functional genomics and systems biology is presented. The strategy includes MAPA (mass accuracy precursor alignment; http://www.univie.ac.at/mosys/software.html ) as a rapid exploratory analysis step; MASS WESTERN for targeted proteomics; COVAIN ( http://www.univie.ac.at/mosys/software.html ) for multivariate statistical analysis, data integration, and data mining; and PROMEX ( http://www.univie.ac.at/mosys/databases.html ) as a database module for proteogenomics and proteotypic peptides for targeted analysis. Moreover, the presented platform can also be utilized to integrate metabolomics and transcriptomics data for the analysis of metabolite-protein-transcript correlations and time course analysis using COVAIN. Examples for the integration of MAPA and MASS WESTERN data, proteogenomic and metabolic modeling approaches for functional genomics, phosphoproteomics by integration of MOAC (metal-oxide affinity chromatography) with MAPA, and the integration of metabolomics, transcriptomics, proteomics, and physiological data using this platform are presented. All software and step-by-step tutorials for data processing and data mining can be downloaded from http://www.univie.ac.at/mosys/software.html.
Epidemiology of brucellosis in Iran: A comprehensive systematic review and meta-analysis study.
Mirnejad, Reza; Jazi, Faramarz Masjedian; Mostafaei, Shayan; Sedighi, Mansour
2017-08-01
Brucellosis is still one of the most challenging issues for health and the economy in many developing countries such as Iran. Considering the high prevalence of brucellosis, the aim of the current study was to systematically review published data about the annual incidence rate of this infection from different parts of Iran and provide an overall relative frequency (RF) for Iran using meta-analysis. We searched several databases including PubMed, ISI Web of Science, Scopus, Google Scholar, IranMedex and the Iranian Scientific Information Database (SID) using the following keywords: "Brucella", "Brucellosis", "Malta fever", "Mediterranean fever", "undulant fever", "zoonosis" and "Iran" in Title/Abstract/Keywords fields. Articles/abstracts that used clinical specimens and reported the incidence of brucellosis were included in this review. Quality of studies was assessed using the STROBE and PRISMA forms. All statistical analyses were performed using STATA 11.0 (STATA Corp, College Station, TX) and P-values under 0.05 were considered statistically significant. Out of the 8326 results, we found 34 articles suitable, according to the inclusion and exclusion criteria, for this systematic review and meta-analysis. The pooled incidence of brucellosis was estimated at 0.001% (95% confidence interval (CI) = 0.0005-0.0015%) annually. The relative frequency of brucellosis in different studies varied from 7.0/100,000 to 276.41/100,000, in the Qom and Kermanshah provinces respectively. This systematic review and meta-analysis showed that the highest incidences of brucellosis occur in the west and northwest regions of Iran. Overall, the incidence of the disease in Iran is in the high range. Copyright © 2017 Elsevier Ltd. All rights reserved.
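Pooling study-level incidences as above is commonly done with a random-effects model. The sketch below is a generic DerSimonian-Laird implementation on hypothetical study proportions and sample sizes; the published analysis was run in STATA and its exact weighting and transformation of proportions may differ.

```python
import numpy as np

def dersimonian_laird(effects, variances):
    """Random-effects (DerSimonian-Laird) pooling of study-level effects."""
    effects, variances = np.asarray(effects, float), np.asarray(variances, float)
    w = 1.0 / variances                               # fixed-effect weights
    fixed = np.sum(w * effects) / np.sum(w)
    q = np.sum(w * (effects - fixed) ** 2)            # Cochran's Q
    k = len(effects)
    tau2 = max(0.0, (q - (k - 1)) / (np.sum(w) - np.sum(w ** 2) / np.sum(w)))
    w_star = 1.0 / (variances + tau2)                 # random-effects weights
    pooled = np.sum(w_star * effects) / np.sum(w_star)
    se = np.sqrt(1.0 / np.sum(w_star))
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se

# Hypothetical study-level incidence proportions and sample sizes.
props = [0.00007, 0.00276, 0.0009, 0.0012]
sizes = [50_000, 20_000, 35_000, 12_000]
variances = [p * (1 - p) / n for p, n in zip(props, sizes)]
pooled, lo, hi = dersimonian_laird(props, variances)
print(f"pooled incidence = {pooled:.5f} (95% CI {lo:.5f} to {hi:.5f})")
```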
Use of Multiscale Entropy to Facilitate Artifact Detection in Electroencephalographic Signals
Mariani, Sara; Borges, Ana F. T.; Henriques, Teresa; Goldberger, Ary L.; Costa, Madalena D.
2016-01-01
Electroencephalographic (EEG) signals present a myriad of challenges to analysis, beginning with the detection of artifacts. Prior approaches to noise detection have utilized multiple techniques, including visual methods, independent component analysis and wavelets. However, no single method is broadly accepted, inviting alternative ways to address this problem. Here, we introduce a novel approach based on a statistical physics method, multiscale entropy (MSE) analysis, which quantifies the complexity of a signal. We postulate that noise-corrupted EEG signals have lower information content, and therefore reduced complexity, compared with their noise-free counterparts. We test the new method on an open-access database of EEG signals with and without added artifacts due to electrode motion. PMID:26738116
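As an illustration of the MSE computation referred to above, the sketch below coarse-grains a signal by non-overlapping averaging at each scale and computes sample entropy on each coarse-grained series. The parameter choices (m = 2, tolerance r = 0.15 times the standard deviation of the original series) follow common practice and are assumptions here; the authors' implementation details may differ.

```python
import numpy as np

def sample_entropy(x, m=2, tol=0.2):
    """SampEn(m, r): -ln(A/B) using Chebyshev distance and absolute tolerance `tol`."""
    x = np.asarray(x, float)
    n = len(x)

    def match_count(length):
        # N - m templates of the given length, per the usual SampEn convention.
        templates = np.array([x[i:i + length] for i in range(n - m)])
        count = 0
        for i in range(len(templates) - 1):
            dist = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += np.count_nonzero(dist <= tol)
        return count

    b = match_count(m)       # matching template pairs of length m
    a = match_count(m + 1)   # matching template pairs of length m + 1
    return np.inf if a == 0 or b == 0 else -np.log(a / b)

def multiscale_entropy(x, max_scale=10, m=2, r=0.15):
    """Coarse-grain by non-overlapping means at each scale, then compute SampEn."""
    x = np.asarray(x, float)
    tol = r * x.std()  # tolerance fixed from the original series
    curve = []
    for scale in range(1, max_scale + 1):
        n = (len(x) // scale) * scale
        coarse = x[:n].reshape(-1, scale).mean(axis=1)
        curve.append(sample_entropy(coarse, m, tol))
    return curve

# Example: MSE curve for a simulated white-noise signal (values illustrative only).
print(multiscale_entropy(np.random.default_rng(0).normal(size=2000)))
```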
Promise and Limitations of Big Data Research in Plastic Surgery.
Zhu, Victor Zhang; Tuggle, Charles Thompson; Au, Alexander Francis
2016-04-01
The use of "Big Data" in plastic surgery outcomes research has increased dramatically in the last 5 years. This article addresses some of the benefits and limitations of such research. This is a narrative review of large database studies in plastic surgery. There are several benefits to database research as compared with traditional forms of research, such as randomized controlled studies and cohort studies. These include the ease in patient recruitment, reduction in selection bias, and increased generalizability. As such, the types of outcomes research that are particularly suited for database studies include determination of geographic variations in practice, volume outcome analysis, evaluation of how sociodemographic factors affect access to health care, and trend analyses over time. The limitations of database research include data which are limited only to what was captured in the database, high power which can cause clinically insignificant differences to achieve statistical significance, and fishing which can lead to increased type I errors. The National Surgical Quality Improvement Project is an important general surgery database that may be useful for plastic surgeons because it is validated and has a large number of patients after over a decade of collecting data. The Tracking Operations and Outcomes for Plastic Surgeons Program is a newer database specific to plastic surgery. Databases are a powerful tool for plastic surgery outcomes research. It is critically important to understand their benefits and limitations when designing research projects or interpreting studies whose data have been drawn from them. For plastic surgeons, National Surgical Quality Improvement Project has a greater number of publications, but Tracking Operations and Outcomes for Plastic Surgeons Program is the most applicable database for plastic surgery research.
Horsch, Alexander; Hapfelmeier, Alexander; Elter, Matthias
2011-11-01
Breast cancer is globally a major threat to women's health. Screening and adequate follow-up can significantly reduce mortality from breast cancer. Human second reading of screening mammograms can increase breast cancer detection rates, whereas this has not been proven for current computer-aided detection systems acting as a "second reader". Critical factors include the detection accuracy of the systems and the screening experience and training of the radiologist with the system. When assessing the performance of systems and system components, the choice of evaluation methods is particularly critical. Core assets herein are reference image databases and statistical methods. We have analyzed characteristics and usage of the currently largest publicly available mammography database, the Digital Database for Screening Mammography (DDSM) from the University of South Florida, in literature indexed in Medline, IEEE Xplore, SpringerLink, and SPIE, with respect to type of computer-aided diagnosis (CAD) (detection, CADe, or diagnostics, CADx), selection of database subsets, choice of evaluation method, and quality of descriptions. 59 publications presenting 106 evaluation studies met our selection criteria. In 54 studies (50.9%), the selection of test items (cases, images, regions of interest) extracted from the DDSM was not reproducible. Only 2 CADx studies, and no CADe studies, used the entire DDSM. The number of test items varies from 100 to 6000. Different statistical evaluation methods are chosen. Most common are train/test (34.9% of the studies), leave-one-out (23.6%), and N-fold cross-validation (18.9%). Database-related terminology tends to be imprecise or ambiguous, especially regarding the term "case". Overall, both the use of the DDSM as a data source for evaluation of mammography CAD systems and the application of statistical evaluation methods were found to be highly diverse. Results reported from different studies are therefore hardly comparable. Drawbacks of the DDSM (e.g., varying quality of lesion annotations) may contribute to this, but a larger source of bias appears to be authors' own decisions on study design. For future evaluation studies, we derive a set of 13 recommendations concerning the construction and usage of a test database, as well as the application of statistical evaluation methods.
Dental research in Spain. A bibliometric analysis on subjects, authors and institutions (1993-2012).
Bueno-Aguilera, F; Jiménez-Contreras, E; Lucena-Martín, C; Pulgar-Encinas, R
2016-03-01
Bibliometrics is defined as the use of statistical methods in the analysis of a body of literature to reveal the historical development of subject fields and patterns of authorship, publication, and use. Our objective was to characterize Spanish scientific output in Dentistry through analysis of the Web of Science database over a 20-year period. By means of a bibliometric study, documents were statistically analyzed using indicators that reflect quantitative and qualitative aspects of the production. Specifically, the time course of scientific production within the time span was analysed, as were the journals in which the articles were published and the Journal Citation Reports (JCR) categories to which they belong, thematic areas, authorship, and finally the authors and institutions with the highest production in Spain. Using a specific search strategy previously described in the scientific literature, we recovered all citable documents about Dentistry signed by Spanish researchers and included in the WoS database between 1993 and 2012. A total of 3006 documents fulfilled the search criteria, of which 2449 (81.5%) were published in journals within the category Dentistry, Oral Surgery and Medicine and 557 (18.5%) within other JCR categories. Over the four quinquennia studied, production increased quantitatively (8.6-fold) and qualitatively. Finally, the universities of Granada and Complutense of Madrid were the institutions with the highest production and the most prolific authors. Spanish dental production increased sharply in the last two decades, reaching quantitative and qualitative levels similar to those of other medical specialties in the country.
Appraisal of levels and patterns of occupational exposure to 1,3-butadiene.
Scarselli, Alberto; Corfiati, Marisa; Di Marzi, Davide; Iavicoli, Sergio
2017-09-01
Objectives 1,3-Butadiene is classified as carcinogenic to humans by inhalation, and an association with leukemia has been observed in several epidemiological studies. The aim of this study was to evaluate data about occupational exposure levels to 1,3-butadiene in the Italian workforce. Methods Airborne concentrations of 1,3-butadiene were extracted from the Italian database on occupational exposure to carcinogens for the period 1996-2015. Descriptive statistics were calculated for exposure-related variables. An analysis using a linear mixed model was performed to determine factors influencing the exposure level. The probability of exceeding the exposure limit was predicted using a mixed-effects logistic model. Concurrent exposures to other occupational carcinogens were investigated using two-step cluster analysis. Results The total number of exposure measurements selected was 23 885, with an overall arithmetic mean of 0.12 mg/m³. The economic sector with the highest number of measurements was manufacturing of chemicals (18 744). The variables most predictive of the exposure level were the occupational group and its interaction with the measurement year. The highest likelihood of exceeding the exposure limit was found in the manufacture of coke and refined petroleum products. Concurrent exposures were frequently detected, mainly with benzene, acrylonitrile and ethylene dichloride, and three main clusters were identified. Conclusions Exposure to 1,3-butadiene occurs in a wide variety of activity sectors and occupational groups. The use of several statistical analysis methods applied to occupational exposure databases can help to identify exposure situations that put workers' health at high risk and better target preventive interventions and research projects.
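A linear mixed model of the kind mentioned above (fixed effects for occupational group and year, a random intercept for a grouping unit) can be sketched with statsmodels. The data frame, column names and grouping variable below are hypothetical; the published model's structure and covariates may differ.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical measurement-level data: log-exposure with fixed effects for
# occupational group and year, and a random intercept for the sampled firm.
rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "log_exposure": rng.normal(-2.0, 1.0, n),
    "occupation": rng.choice(["operator", "maintenance", "lab"], n),
    "year": rng.integers(1996, 2016, n),
    "firm": rng.integers(1, 40, n),
})

model = smf.mixedlm("log_exposure ~ C(occupation) * year", df, groups=df["firm"])
result = model.fit()
print(result.summary())
```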
Cao, Yuezhou; Chen, Weixian; Qian, Yun; Zeng, Yanying; Liu, Wenhua
2014-12-01
The guanosine insertion/deletion polymorphism (4G/5G) of plasminogen activator inhibitor-1 (PAI-1) gene has been suggested as a risk factor for ischemic stroke (IS), but direct evidence from genetic association studies remains inconclusive even in Chinese population. Therefore, we performed a meta-analysis to evaluate this association. All of the relevant studies were identified from PubMed, Embase, Chinese National Knowledge Infrastructure database and Chinese Wanfang database up to September 2013. Statistical analyses were conducted with Revman 5.2 and STATA 12.0 software. Odds ratio (OR) with 95% confidence interval (CI) values were applied to evaluate the strength of the association. Heterogeneity was evaluated by Q-test and the I² statistic. The Begg's test and Egger's test were used to assess the publication bias. A significant association and a borderline association between the PAI-1 4G/5G polymorphism and IS were found under the recessive model (OR = 1.639, 95% CI = 1.136-2.364) and allelic model (OR = 1.256, 95% CI = 1.000-1.578), respectively. However, no significant association was observed under homogeneous comparison model (OR = 1.428, 95% CI = 0.914-2.233), heterogeneous comparison model (OR = 0.856, 95% CI = 0.689-1.063) and dominant model (OR = 1.036, 95% CI = 0.846-1.270). This meta-analysis suggested that 4G4G genotype of PAI-1 4G/5G polymorphism might be a risk factor for IS in the Chinese population.
High precision mass measurements for wine metabolomics
Roullier-Gall, Chloé; Witting, Michael; Gougeon, Régis D.; Schmitt-Kopplin, Philippe
2014-01-01
An overview of the critical steps for the non-targeted Ultra-High Performance Liquid Chromatography coupled with Quadrupole Time-of-Flight Mass Spectrometry (UPLC-Q-ToF-MS) analysis of wine chemistry is given, ranging from the study design, data preprocessing and statistical analyses, to markers identification. UPLC-Q-ToF-MS data was enhanced by the alignment of exact mass data from FTICR-MS, and marker peaks were identified using UPLC-Q-ToF-MS2. In combination with multivariate statistical tools and the annotation of peaks with metabolites from relevant databases, this analytical process provides a fine description of the chemical complexity of wines, as exemplified in the case of red (Pinot noir) and white (Chardonnay) wines from various geographic origins in Burgundy. PMID:25431760
High precision mass measurements for wine metabolomics
NASA Astrophysics Data System (ADS)
Roullier-Gall, Chloé; Witting, Michael; Gougeon, Régis; Schmitt-Kopplin, Philippe
2014-11-01
An overview of the critical steps for the non-targeted Ultra-High Performance Liquid Chromatography coupled with Quadrupole Time-of-Flight Mass Spectrometry (UPLC-Q-ToF-MS) analysis of wine chemistry is given, ranging from the study design, data preprocessing and statistical analyses, to markers identification. UPLC-Q-ToF-MS data was enhanced by the alignment of exact mass data from FTICR-MS, and marker peaks were identified using UPLC-Q-ToF-MS². In combination with multivariate statistical tools and the annotation of peaks with metabolites from relevant databases, this analytical process provides a fine description of the chemical complexity of wines, as exemplified in the case of red (Pinot noir) and white (Chardonnay) wines from various geographic origins in Burgundy.
Automatic identification of bullet signatures based on consecutive matching striae (CMS) criteria.
Chu, Wei; Thompson, Robert M; Song, John; Vorburger, Theodore V
2013-09-10
The consecutive matching striae (CMS) numeric criteria for firearm and toolmark identifications have been widely accepted by forensic examiners, although there have been questions concerning their observer subjectivity and limited statistical support. In this paper, based on signal processing and extraction, a model for the automatic and objective counting of CMS is proposed. The position and shape information of the striae on the bullet land is represented by a feature profile, which is used for determining the CMS number automatically. Rapid counting of the CMS number provides a basis for ballistics correlations against large databases and for further statistical and probability analysis. Experimental results in this report, using bullets fired from ten consecutively manufactured barrels, support the developed model. Published by Elsevier Ireland Ltd.
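Once two land impressions have been reduced to aligned striae positions, the CMS count itself is a run-length computation. The toy sketch below shows only that final counting step with hypothetical positions and tolerance; the paper's model works on full feature profiles derived by signal processing.

```python
def max_consecutive_matches(ref_peaks, test_peaks, tol=2.0):
    """Longest run of consecutive reference striae that each have a matching
    test stria within `tol` (same units as the peak positions)."""
    best = run = 0
    for position in sorted(ref_peaks):
        if any(abs(position - q) <= tol for q in test_peaks):
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best

# Hypothetical striae positions (micrometres along the land impression).
reference = [12, 30, 47, 65, 83, 120, 141]
questioned = [13, 29, 48, 66, 85, 160]
print(max_consecutive_matches(reference, questioned))  # -> 5 consecutive matching striae
```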
Wu, Johnny C; Gardner, David P; Ozer, Stuart; Gutell, Robin R; Ren, Pengyu
2009-08-28
The accurate prediction of the secondary and tertiary structure of an RNA with different folding algorithms is dependent on several factors, including the energy functions. However, an RNA higher-order structure cannot be predicted accurately from its sequence based on a limited set of energy parameters. The inter- and intramolecular forces between this RNA and other small molecules and macromolecules, in addition to other factors in the cell such as pH, ionic strength, and temperature, influence the complex dynamics associated with the transition of a single-stranded RNA to its secondary and tertiary structure. Since all of the factors that affect the formation of an RNA's 3D structure cannot be determined experimentally, statistically derived potential energy has been used in the prediction of protein structure. In the current work, we evaluate the statistical free energy of various secondary structure motifs, including base-pair stacks, hairpin loops, and internal loops, using their statistical frequency obtained from the comparative analysis of more than 50,000 RNA sequences stored in the RNA Comparative Analysis Database (rCAD) at the Comparative RNA Web (CRW) Site. Statistical energy was computed from the structural statistics for several datasets. While the statistical energy for a base-pair stack correlates with experimentally derived free energy values, suggesting a Boltzmann-like distribution, variation is observed between different molecules and their location on the phylogenetic tree of life. Our statistical energy values calculated for several structural elements were utilized in the Mfold RNA-folding algorithm. The combined statistical energy values for base-pair stacks, hairpins and internal loop flanks result in a significant improvement in the accuracy of secondary structure prediction; the hairpin flanks contribute the most.
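The Boltzmann-like relation mentioned above is the basis of knowledge-based ("statistical") energies: a motif's energy is taken proportional to the negative log of its observed frequency relative to a reference. The sketch below illustrates that inverse-Boltzmann conversion; the temperature convention, reference state and counts are assumptions for illustration, not the study's actual derivation from rCAD statistics.

```python
import math

RT = 0.616  # kcal/mol at roughly 310 K; an assumed convention for illustration

def statistical_energy(observed_counts, reference_counts):
    """Inverse-Boltzmann statistical energy: E = -RT * ln(f_obs / f_ref)."""
    total_obs = sum(observed_counts.values())
    total_ref = sum(reference_counts.values())
    energies = {}
    for motif, count in observed_counts.items():
        f_obs = count / total_obs
        f_ref = reference_counts.get(motif, 1) / total_ref
        energies[motif] = -RT * math.log(f_obs / f_ref)
    return energies

# Hypothetical counts of base-pair stacks from a comparative alignment,
# scored against a uniform reference state.
observed = {"GC/GC": 52_000, "AU/GC": 31_000, "GU/AU": 4_500}
reference = {"GC/GC": 1, "AU/GC": 1, "GU/AU": 1}
print(statistical_energy(observed, reference))
```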
Hand-held computer operating system program for collection of resident experience data.
Malan, T K; Haffner, W H; Armstrong, A Y; Satin, A J
2000-11-01
To describe a system for recording resident experience involving hand-held computers with the Palm Operating System (3 Com, Inc., Santa Clara, CA). Hand-held personal computers (PCs) are popular, easy to use, inexpensive, portable, and can share data among other operating systems. Residents in our program carry individual hand-held database computers to record Residency Review Committee (RRC) reportable patient encounters. Each resident's data is transferred to a single central relational database compatible with Microsoft Access (Microsoft Corporation, Redmond, WA). Patient data entry and subsequent transfer to a central database is accomplished with commercially available software that requires minimal computer expertise to implement and maintain. The central database can then be used for statistical analysis or to create required RRC resident experience reports. As a result, the data collection and transfer process takes less time for residents and program director alike, than paper-based or central computer-based systems. The system of collecting resident encounter data using hand-held computers with the Palm Operating System is easy to use, relatively inexpensive, accurate, and secure. The user-friendly system provides prompt, complete, and accurate data, enhancing the education of residents while facilitating the job of the program director.
MPD: a pathogen genome and metagenome database
Zhang, Tingting; Miao, Jiaojiao; Han, Na; Qiang, Yujun; Zhang, Wen
2018-01-01
Abstract Advances in high-throughput sequencing have led to unprecedented growth in the amount of available genome sequencing data, especially for bacterial genomes, which has been accompanied by a challenge for the storage and management of such huge datasets. To facilitate bacterial research and related studies, we have developed the Mypathogen database (MPD), which provides access to users for searching, downloading, storing and sharing bacterial genomics data. The MPD represents the first pathogenic database for microbial genomes and metagenomes, and currently covers pathogenic microbial genomes (6604 genera, 11 071 species, 41 906 strains) and metagenomic data from host, air, water and other sources (28 816 samples). The MPD also functions as a management system for statistical and storage data that can be used by different organizations, thereby facilitating data sharing among different organizations and research groups. A user-friendly local client tool is provided to maintain the steady transmission of big sequencing data. The MPD is a useful tool for analysis and management in genomic research, especially for clinical Centers for Disease Control and epidemiological studies, and is expected to contribute to advancing knowledge on pathogenic bacteria genomes and metagenomes. Database URL: http://data.mypathogen.org PMID:29917040
A Dynamic Human Health Risk Assessment System
Prasad, Umesh; Singh, Gurmit; Pant, A. B.
2012-01-01
An online human health risk assessment system (OHHRAS) has been designed and developed in the form of a prototype database-driven system and made available for the population of India through a website – www.healthriskindia.in. OHHRAS provides three utilities: health survey, health status, and bio-calculators. The first utility, health survey, operates on a dynamically developed database and returns the desired output to the user on the basis of the input criteria entered into the system; the second utility, health status, generates output from a dynamic questionnaire and the selected answers, producing health status reports based on multiple matches defined on the advice of medical experts; and the third utility, bio-calculators, serves scientists and researchers as an online statistical analysis tool that improves accuracy and saves the user's time. The whole system and database-driven website were designed and developed using software including PHP, MySQL, Dreamweaver, and C++, and made publicly available through the website (www.healthriskindia.in), which is useful for researchers, academics, students, and the general public. PMID:22778520
Vasconcelos, Hemerson Bruno da Silva; Woods, David John
2017-01-01
This study aimed to identify the knowledge, skills and attitudes of Brazilian hospital pharmacists in the use of information technology and electronic tools to support clinical practice. Methods: A questionnaire was sent by email to clinical pharmacists working in public and private hospitals in Brazil. The instrument was validated using the method of Polit and Beck to determine the content validity index. Data (n = 348) were analyzed using descriptive statistics, Pearson's Chi-square test and Gamma correlation tests. Results: Pharmacists had 1–4 electronic devices for personal use, mainly smartphones (84.8%; n = 295) and laptops (81.6%; n = 284). At work, pharmacists had access to a computer (89.4%; n = 311), mostly connected to the internet (83.9%; n = 292). They felt competent (very capable/capable) searching for a web page/web site on a specific subject (100%; n = 348), downloading files (99.7%; n = 347), using spreadsheets (90.2%; n = 314), searching using MeSH terms in PubMed (97.4%; n = 339) and searching generally for articles in bibliographic databases (such as Medline/PubMed: 93.4%; n = 325). Pharmacists did not feel competent in using statistical analysis software (somewhat capable/incapable: 78.4%; n = 273). Most pharmacists reported that they had not received formal education to perform most of these actions, except searching using MeSH terms. Access to bibliographic databases was available in Brazilian hospitals; however, most pharmacists (78.7%; n = 274) reported daily use of a non-specific search engine such as Google. This result may reflect the lack of formal knowledge and training in the use of bibliographic databases and difficulty with the English language. The need to expand knowledge about information search tools was recognized by most pharmacists in clinical practice in Brazil, especially those with less time dedicated exclusively to clinical activity (Chi-square, p = 0.006). Conclusion: These results will assist in defining minimal competencies for the training of pharmacists in the field of information technology to support clinical practice. Knowledge and skill gaps are evident in the use of bibliographic databases, spreadsheets and statistical tools. PMID:29272292
Néri, Eugenie Desirèe Rabelo; Meira, Assuero Silva; Vasconcelos, Hemerson Bruno da Silva; Woods, David John; Fonteles, Marta Maria de França
2017-01-01
This study aimed to identify the knowledge, skills and attitudes of Brazilian hospital pharmacists in the use of information technology and electronic tools to support clinical practice. A questionnaire was sent by email to clinical pharmacists working in public and private hospitals in Brazil. The instrument was validated using the method of Polit and Beck to determine the content validity index. Data (n = 348) were analyzed using descriptive statistics, Pearson's Chi-square test and Gamma correlation tests. Pharmacists had 1-4 electronic devices for personal use, mainly smartphones (84.8%; n = 295) and laptops (81.6%; n = 284). At work, pharmacists had access to a computer (89.4%; n = 311), mostly connected to the internet (83.9%; n = 292). They felt competent (very capable/capable) searching for a web page/web site on a specific subject (100%; n = 348), downloading files (99.7%; n = 347), using spreadsheets (90.2%; n = 314), searching using MeSH terms in PubMed (97.4%; n = 339) and searching generally for articles in bibliographic databases (such as Medline/PubMed: 93.4%; n = 325). Pharmacists did not feel competent in using statistical analysis software (somewhat capable/incapable: 78.4%; n = 273). Most pharmacists reported that they had not received formal education to perform most of these actions, except searching using MeSH terms. Access to bibliographic databases was available in Brazilian hospitals; however, most pharmacists (78.7%; n = 274) reported daily use of a non-specific search engine such as Google. This result may reflect the lack of formal knowledge and training in the use of bibliographic databases and difficulty with the English language. The need to expand knowledge about information search tools was recognized by most pharmacists in clinical practice in Brazil, especially those with less time dedicated exclusively to clinical activity (Chi-square, p = 0.006). These results will assist in defining minimal competencies for the training of pharmacists in the field of information technology to support clinical practice. Knowledge and skill gaps are evident in the use of bibliographic databases, spreadsheets and statistical tools.
Buell, Gary R.; Gurley, Laura N.; Calhoun, Daniel L.; Hunt, Alexandria M.
2017-06-12
This report serves as metadata and a user guide for five out of six hydrologic and landscape databases developed by the U.S. Geological Survey, in cooperation with the U.S. Fish and Wildlife Service, to describe data-collection, data-reduction, and data-analysis methods used to construct the databases and provides statistical and graphical descriptions of the databases. Six hydrologic and landscape databases were developed: (1) the Cache River and White River National Wildlife Refuges (NWRs) and contributing watersheds in Arkansas, Missouri, and Oklahoma, (2) the Cahaba River NWR and contributing watersheds in Alabama, (3) the Caloosahatchee and J.N. “Ding” Darling NWRs and contributing watersheds in Florida, (4) the Clarks River NWR and contributing watersheds in Kentucky, Tennessee, and Mississippi, (5) the Lower Suwannee NWR and contributing watersheds in Georgia and Florida, and (6) the Okefenokee NWR and contributing watersheds in Georgia and Florida. Each database is composed of a set of ASCII files, Microsoft Access files, and Microsoft Excel files. The databases were developed as an assessment and evaluation tool for use in examining NWR-specific hydrologic patterns and trends as related to water availability and water quality for NWR ecosystems, habitats, and target species. The databases include hydrologic time-series data, summary statistics on landscape and hydrologic time-series data, and hydroecological metrics that can be used to assess NWR hydrologic conditions and the availability of aquatic and riparian habitat. Landscape data that describe the NWR physiographic setting and the locations of hydrologic data-collection stations were compiled and mapped. Categories of landscape data include land cover, soil hydrologic characteristics, physiographic features, geographic and hydrographic boundaries, hydrographic features, and regional runoff estimates. The geographic extent of each database covers an area within which human activities, climatic variation, and hydrologic processes can potentially affect the hydrologic regime of the NWRs and adjacent areas. The hydrologic and landscape database for the Cache and White River NWRs and contributing watersheds in Arkansas, Missouri, and Oklahoma has been described and documented in detail (Buell and others, 2012). This report serves as a companion to the Buell and others (2012) report to describe and document the five subsequent hydrologic and landscape databases that were developed: Chapter A—the Cahaba River NWR and contributing watersheds in Alabama, Chapter B—the Caloosahatchee and J.N. “Ding” Darling NWRs and contributing watersheds in Florida, Chapter C—the Clarks River NWR and contributing watersheds in Kentucky, Tennessee, and Mississippi, Chapter D—the Lower Suwannee NWR and contributing watersheds in Georgia and Florida, and Chapter E—the Okefenokee NWR and contributing watersheds in Georgia and Florida.
Mining Claim Activity on Federal Land for the Period 1976 through 2003
Causey, J. Douglas
2005-01-01
Previous reports on mining claim records provided information and statistics (number of claims) using data from the U.S. Bureau of Land Management's (BLM) Mining Claim Recordation System. Since that time, BLM converted their mining claim data to the Legacy Rehost 2000 system (LR2000). This report describes a process to extract similar statistical data about mining claims from LR2000 data using different software and procedures than were used in the earlier work. A major difference between this process and the previous work is that every section that has a mining claim record is assigned a value. This is done by proportioning a claim between each section in which it is recorded. Also, the mining claim data in this report include all BLM records, not just the western states. LR2000 mining claim database tables for the United States were provided by BLM in text format and imported into a Microsoft Access 2000 database in January 2004. Data from two tables in the BLM LR2000 database were summarized through a series of database queries to determine a number that represents active mining claims in each Public Land Survey (PLS) section for each of the years from 1976 to 2002. For most of the area, spatial databases are also provided. The spatial databases are only configured to work with the statistics provided in the non-spatial data files. They are suitable for geographic information system (GIS)-based regional assessments at a scale of 1:100,000 or smaller (for example, 1:250,000).
A generic method for improving the spatial interoperability of medical and ecological databases.
Ghenassia, A; Beuscart, J B; Ficheur, G; Occelli, F; Babykina, E; Chazard, E; Genin, M
2017-10-03
The availability of big data in healthcare and the intensive development of data reuse and georeferencing have opened up perspectives for health spatial analysis. However, fine-scale spatial studies of ecological and medical databases are limited by the change of support problem and thus a lack of spatial unit interoperability. The use of spatial disaggregation methods to solve this problem introduces errors into the spatial estimations. Here, we present a generic, two-step method for merging medical and ecological databases that avoids the use of spatial disaggregation methods, while maximizing the spatial resolution. Firstly, a mapping table is created after one or more transition matrices have been defined. The latter link the spatial units of the original databases to the spatial units of the final database. Secondly, the mapping table is validated by (1) comparing the covariates contained in the two original databases, and (2) checking the spatial validity with a spatial continuity criterion and a spatial resolution index. We used our novel method to merge a medical database (the French national diagnosis-related group database, containing 5644 spatial units) with an ecological database (produced by the French National Institute of Statistics and Economic Studies, and containing 36,594 spatial units). The mapping table yielded 5632 final spatial units. The mapping table's validity was evaluated by comparing the number of births in the medical database and the ecological databases in each final spatial unit. The median [interquartile range] relative difference was 2.3% [0; 5.7]. The spatial continuity criterion was low (2.4%), and the spatial resolution index was greater than for most French administrative areas. Our innovative approach improves interoperability between medical and ecological databases and facilitates fine-scale spatial analyses. We have shown that disaggregation models and large aggregation techniques are not necessarily the best ways to tackle the change of support problem.
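Conceptually, once the mapping table linking original spatial units to final units exists, merging the two sources is a join-and-aggregate operation. The pandas sketch below illustrates that step only, with hypothetical unit identifiers and variables; it does not reproduce the paper's transition-matrix construction or validation criteria.

```python
import pandas as pd

# Hypothetical mapping table linking original spatial units to final units.
mapping = pd.DataFrame({
    "medical_unit": ["M1", "M2", "M3"],
    "ecological_unit": ["E1", "E1", "E2"],
    "final_unit": ["F1", "F1", "F2"],
})

medical = pd.DataFrame({"medical_unit": ["M1", "M2", "M3"], "births": [120, 85, 240]})
ecological = pd.DataFrame({"ecological_unit": ["E1", "E2"], "deprivation_index": [0.42, 0.17]})

# Aggregate each source onto the final spatial units, then join them.
med_final = medical.merge(mapping, on="medical_unit").groupby("final_unit")["births"].sum()
eco_final = (ecological
             .merge(mapping.drop_duplicates(["ecological_unit", "final_unit"]), on="ecological_unit")
             .groupby("final_unit")["deprivation_index"].mean())
merged = pd.concat([med_final, eco_final], axis=1)
print(merged)
```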
1981-08-01
[Table-of-contents and text fragments] 5.5.2 Attached Execution of Transactions; 5.5.3 The Choice of Transaction Execution for Access Control; the basic access control mechanism for statistical security and value-dependent security; Section 5.5 describes the process of request execution with access control for insert and non-insert requests in MDBS (see Chapter 4).
1993-03-30
[Text fragments] The Navy was highest in all three measures, followed by the Air Force, with the Army lowest. No branch accounted for a large proportion of the ... The largest proportion of workload and costs was in the category 'Outside Catchment Area'. The total government pay for the outside catchment area category ... services. MACDILL REG HOSP, MACDILL AFB had the highest total government pay among the Air Force's billable MTFs, accounting for 4.13 ...
An Integrated Nursing Management Information System: From Concept to Reality
Pinkley, Connie L.; Sommer, Patricia K.
1988-01-01
This paper addresses the transition from the conceptualization of a Nursing Management Information System (NMIS), integrated and interdependent with the Hospital Information System (HIS), to its realization. Concepts of input, throughput, and output are presented to illustrate developmental strategies used to achieve nursing information products. Essential processing capabilities include: 1) ability to interact with multiple data sources; 2) database management, statistical, and graphics software packages; 3) online, batch and reporting; and 4) interactive data analysis. Challenges encountered in system construction are examined.
Advanced Land Imager Assessment System
NASA Technical Reports Server (NTRS)
Chander, Gyanesh; Choate, Mike; Christopherson, Jon; Hollaren, Doug; Morfitt, Ron; Nelson, Jim; Nelson, Shar; Storey, James; Helder, Dennis; Ruggles, Tim;
2008-01-01
The Advanced Land Imager Assessment System (ALIAS) supports radiometric and geometric image processing for the Advanced Land Imager (ALI) instrument onboard NASA's Earth Observing-1 (EO-1) satellite. ALIAS consists of two processing subsystems for radiometric and geometric processing of the ALI's multispectral imagery. The radiometric processing subsystem characterizes and corrects, where possible, radiometric qualities including coherent, impulse, and random noise; signal-to-noise ratios (SNRs); detector operability; gain; bias; saturation levels; striping and banding; and the stability of detector performance. The geometric processing subsystem and analysis capabilities support sensor alignment calibrations, sensor chip assembly (SCA)-to-SCA alignments and band-to-band alignment, and perform geodetic accuracy assessments, modulation transfer function (MTF) characterizations, and image-to-image characterizations. ALIAS also characterizes and corrects band-to-band registration, and performs systematic precision and terrain correction of ALI images. This system can geometrically correct, and automatically mosaic, the SCA image strips into a seamless, map-projected image. This system provides a large database, which enables bulk trending for all ALI image data and significant instrument telemetry. Bulk trending consists of two functions: Housekeeping Processing and Bulk Radiometric Processing. The Housekeeping function pulls telemetry and temperature information from the instrument housekeeping files and writes this information to a database for trending. The Bulk Radiometric Processing function writes statistical information from the dark data acquired before and after the Earth imagery, and from the lamp data, to the database for trending. This allows for multi-scene statistical analyses.
Numerical Model Sensitivity to Heterogeneous Satellite Derived Vegetation Roughness
NASA Technical Reports Server (NTRS)
Jasinski, Michael; Eastman, Joseph; Borak, Jordan
2011-01-01
The sensitivity of a mesoscale weather prediction model to a 1 km satellite-based vegetation roughness initialization is investigated for a domain within the south central United States. Three different roughness databases are employed: i) a control or standard lookup-table roughness that is a function only of land cover type, ii) a spatially heterogeneous roughness database, specific to the domain, that was previously derived using a physically based procedure and Moderate Resolution Imaging Spectroradiometer (MODIS) imagery, and iii) a MODIS climatologic roughness database that, like (i), is a function only of land cover type but possesses domain-specific mean values from (ii). The model used is the Weather Research and Forecasting (WRF) model coupled to the Community Land Model within the Land Information System (LIS). For each simulation, a statistical comparison is made between modeled results and ground observations within a domain including Oklahoma, eastern Arkansas, and northwest Louisiana during a 4-day period within IHOP 2002. The sensitivity analysis compares the impact of the three roughness initializations on time-series temperature, precipitation probability of detection (POD), average wind speed, boundary layer height, and turbulent kinetic energy (TKE). Overall, the results indicate that, for the current investigation, replacement of the standard lookup-table values with the satellite-derived values statistically improves model performance for most observed variables. Such natural roughness heterogeneity enhances the surface wind speed, PBL height, and TKE production by up to 10 percent, with a lesser effect over grassland and a greater effect over mixed land cover domains.
Treatment trends in adolescent clavicle fractures.
Yang, Scott; Werner, Brian C; Gwathmey, Frank W
2015-01-01
Controversy continues with regard to decision making for operative treatment of adolescent clavicle fractures, while the literature continues to support operative treatment for select middle-third fractures in adults. The purpose of our study was to evaluate recent trends in nonoperative and operative management of adolescent clavicle fractures in the United States. Data were derived from a publicly available database of patient records, the PearlDiver Patient Records Database. The database was queried for ICD-9 810.02 (closed fracture of shaft of clavicle), with the age restriction of either 10 to 14 or 15 to 19 years old, along with CPT-23500 (closed treatment of clavicular fracture) and CPT-23515 (open treatment of clavicular fracture) from 2007 to 2011. The χ² analysis was used to determine statistical significance with regard to procedural volumes, sex, and region. The Student t test was used to compare average charges between groups. A significant increase in the number of adolescent clavicle fractures managed operatively (CPT-23510, ages 10 to 19 y) from 309 in 2007 to 530 in 2011 was observed (P<0.0001). There was a significantly greater increase in operative management of clavicle fractures in the age 15 to 19 subgroup compared with the age 10 to 14 subgroup (P<0.0001). In the operative group, there was a trend toward a higher number of males being managed with operative intervention. The overall average monetary charge for both nonoperatively and operatively managed adolescent clavicle fractures increased significantly over the study period. A statistically significant increase in the normalized incidence of operatively managed adolescent clavicle fractures was noted in the midwest, south, and west regions, with the greatest increase in the west region, where the incidence increased over 2-fold (P<0.0001). Adolescent clavicle fractures appear to be increasingly treated with open reduction and internal fixation, especially in the 15 to 19 age group. Nevertheless, there remains a lack of high-level studies comparing outcomes of operative and conservative treatment specifically for the adolescent population to justify this recent trend. Level IV-retrospective database analysis.
Zaki, Rafdzah; Bulgiba, Awang; Nordin, Noorhaire; Azina Ismail, Noor
2013-06-01
Reliability measures precision, or the extent to which test results can be replicated. This is the first systematic review to identify the statistical methods used to measure the reliability of equipment measuring continuous variables. This study also aims to highlight inappropriate statistical methods used in reliability analysis and their implications for medical practice. In 2010, five electronic databases were searched for reliability studies published between 2007 and 2009. A total of 5,795 titles were initially identified. Only 282 titles were potentially related, and finally 42 fitted the inclusion criteria. The Intra-class Correlation Coefficient (ICC) was the most popular method, used in 25 (60%) studies, followed by comparison of means (8, or 19%). Of the 25 studies using the ICC, only 7 (28%) reported the confidence intervals and the type of ICC used. Most studies (71%) also tested the agreement of instruments. This study finds that the Intra-class Correlation Coefficient is the most popular method used to assess the reliability of medical instruments measuring continuous outcomes. There are also inappropriate applications and interpretations of statistical methods in some studies. It is important for medical researchers to be aware of this issue and to be able to correctly perform analysis in reliability studies.
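As an illustration of the ICC discussed above, the sketch below computes the one-way random-effects ICC(1,1) from a subjects-by-measurements matrix of hypothetical repeated readings. The appropriate ICC form (and reporting of its confidence interval and type) depends on the study design, which is one of the reporting issues the review highlights.

```python
import numpy as np

def icc_one_way(ratings):
    """One-way random-effects ICC(1,1) from an (subjects x measurements) array."""
    ratings = np.asarray(ratings, float)
    n, k = ratings.shape
    grand_mean = ratings.mean()
    subject_means = ratings.mean(axis=1)
    ms_between = k * np.sum((subject_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((ratings - subject_means[:, None]) ** 2) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical repeated blood-pressure readings (two measurements per subject).
readings = np.array([[120, 122], [135, 131], [118, 119], [142, 145], [128, 126]])
print(f"ICC(1,1) = {icc_one_way(readings):.2f}")
```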
NASA Astrophysics Data System (ADS)
Rougier, Jonty; Cashman, Kathy; Sparks, Stephen
2016-04-01
We have analysed the Large Magnitude Explosive Volcanic Eruptions database (LaMEVE) for volcanoes that classify as stratovolcanoes. A non-parametric statistical approach is used to assess the global recording rate for large (M4+) eruptions. The approach imposes minimal structure on the shape of the recording rate through time. We find that the recording rates decline rapidly going backwards in time. Prior to 1600 they are below 50%, and prior to 1100 they are below 20%. Even in the recent past, e.g. the 1800s, they are likely to be appreciably less than 100%. The assessment for very large (M5+) eruptions is more uncertain, due to the scarcity of events. Having taken under-recording into account, the large-eruption rates of stratovolcanoes are modelled exchangeably, in order to derive an informative prior distribution as an input into a subsequent volcano-by-volcano hazard assessment. The statistical model implies that volcano-by-volcano predictions can be grouped by the number of recorded large eruptions. Further, it is possible to combine all volcanoes together into a global large-eruption prediction, with an M4+ rate computed from the LaMEVE database of 0.57/yr.
Kwon, Taejoon; Choi, Hyungwon; Vogel, Christine; Nesvizhskii, Alexey I; Marcotte, Edward M
2011-07-01
Shotgun proteomics using mass spectrometry is a powerful method for protein identification but suffers limited sensitivity in complex samples. Integrating peptide identifications from multiple database search engines is a promising strategy to increase the number of peptide identifications and reduce the volume of unassigned tandem mass spectra. Existing methods pool statistical significance scores such as p-values or posterior probabilities of peptide-spectrum matches (PSMs) from multiple search engines after high scoring peptides have been assigned to spectra, but these methods lack reliable control of identification error rates as data are integrated from different search engines. We developed a statistically coherent method for integrative analysis, termed MSblender. MSblender converts raw search scores from search engines into a probability score for every possible PSM and properly accounts for the correlation between search scores. The method reliably estimates false discovery rates and identifies more PSMs than any single search engine at the same false discovery rate. Increased identifications increment spectral counts for most proteins and allow quantification of proteins that would not have been quantified by individual search engines. We also demonstrate that enhanced quantification contributes to improve sensitivity in differential expression analyses.
Kwon, Taejoon; Choi, Hyungwon; Vogel, Christine; Nesvizhskii, Alexey I.; Marcotte, Edward M.
2011-01-01
Shotgun proteomics using mass spectrometry is a powerful method for protein identification but suffers limited sensitivity in complex samples. Integrating peptide identifications from multiple database search engines is a promising strategy to increase the number of peptide identifications and reduce the volume of unassigned tandem mass spectra. Existing methods pool statistical significance scores such as p-values or posterior probabilities of peptide-spectrum matches (PSMs) from multiple search engines after high scoring peptides have been assigned to spectra, but these methods lack reliable control of identification error rates as data are integrated from different search engines. We developed a statistically coherent method for integrative analysis, termed MSblender. MSblender converts raw search scores from search engines into a probability score for all possible PSMs and properly accounts for the correlation between search scores. The method reliably estimates false discovery rates and identifies more PSMs than any single search engine at the same false discovery rate. Increased identifications increment spectral counts for all detected proteins and allow quantification of proteins that would not have been quantified by individual search engines. We also demonstrate that enhanced quantification contributes to improve sensitivity in differential expression analyses. PMID:21488652
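The false discovery rate estimation mentioned above is commonly derived from PSM posterior probabilities: the estimated FDR among accepted PSMs is the mean of their complementary probabilities. The sketch below shows that standard device with hypothetical posteriors; it is not MSblender's actual code, and MSblender's score-integration model is not reproduced here.

```python
import numpy as np

def fdr_at_threshold(posteriors, threshold):
    """Estimated FDR among PSMs whose posterior probability >= threshold."""
    accepted = np.asarray([p for p in posteriors if p >= threshold])
    if accepted.size == 0:
        return 0.0
    return float(np.mean(1.0 - accepted))  # expected fraction of incorrect PSMs

# Hypothetical integrated posterior probabilities for a set of PSMs.
posteriors = [0.99, 0.97, 0.95, 0.91, 0.88, 0.62, 0.41, 0.15]
for t in (0.95, 0.90, 0.60):
    n_accepted = sum(p >= t for p in posteriors)
    print(f"threshold {t:.2f}: {n_accepted} PSMs, estimated FDR = {fdr_at_threshold(posteriors, t):.3f}")
```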
Cervical lacerations in planned versus labor cerclage removal: a systematic review.
Simonazzi, Giuliana; Curti, Alessandra; Bisulli, Maria; Seravalli, Viola; Saccone, Gabriele; Berghella, Vincenzo
2015-10-01
The aim of this study was to evaluate the incidence of cervical lacerations with cerclage removal planned before labor compared to removal after the onset of labor, by means of a systematic review of published studies. Searches were performed in electronic databases from inception of each database to November 2014. We identified all studies reporting the rate of cervical lacerations and the timing of cerclage removal (either before or after the onset of labor). The primary outcome was the incidence of spontaneous and clinically significant intrapartum cervical lacerations (i.e. lacerations requiring suturing). Six studies that met the inclusion criteria were included in the analysis. The overall incidence of cervical lacerations was 8.9% (32/359). There were 23/280 (6.4%) cervical lacerations in the planned removal group, and 9/79 (11.4%) in the removal after labor group (odds ratio 0.70, 95% confidence interval 0.31-1.57). In summary, planned removal of cerclage before labor was not shown to be associated with a statistically significant reduction in the incidence of cervical lacerations. However, since our data probably did not reach statistical significance because of a type II error, further studies are needed. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
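The reported odds ratio and confidence interval can be reproduced from the counts given in the abstract with a standard log-odds (Woolf) interval; the sketch below is purely illustrative and is not the authors' analysis code.

```python
import math

# Counts from the abstract: lacerations / total in each group.
a, b = 23, 280 - 23   # planned removal: events, non-events
c, d = 9, 79 - 9      # removal after labor: events, non-events

or_ = (a / b) / (c / d)
se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)
lo = math.exp(math.log(or_) - 1.96 * se_log_or)
hi = math.exp(math.log(or_) + 1.96 * se_log_or)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}")  # ~0.70, 0.31-1.57
```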
Ronza, A; Vílchez, J A; Casal, J
2007-07-19
Risk assessment of hazardous material spill scenarios, and quantitative risk assessment in particular, makes use of event trees to account for the possible outcomes of hazardous releases. Using event trees entails the definition of probabilities of occurrence for events such as spill ignition and blast formation. This study comprises an extensive analysis of ignition and explosion probability data proposed in previous work. Subsequently, the results of a survey of two vast US federal spill databases (HMIRS, by the Department of Transportation, and MINMOD, by the US Coast Guard) are reported and commented on. Some tens of thousands of records of hydrocarbon spills were analysed. The general pattern of statistical ignition and explosion probabilities as a function of the amount and the substance spilled is discussed. Equations based on the statistical data are proposed that predict the ignition probability of hydrocarbon spills as a function of the amount and the substance spilled. Explosion probabilities are put forth as well. Two sets of probability data are proposed: it is suggested that figures deduced from HMIRS be used in land transportation risk assessment, and MINMOD results in maritime scenario assessment. Results are discussed and compared with the previous technical literature.
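A minimal sketch of fitting an ignition-probability curve of the general kind described above, assuming a logistic dependence on the logarithm of the spill amount. The data points and the functional form are illustrative assumptions, not the equations or HMIRS/MINMOD statistics reported in the paper.

```python
import numpy as np
from scipy.optimize import curve_fit

def p_ignition(mass_kg, a, b):
    """Logistic ignition probability as a function of log spill mass (illustrative form)."""
    x = np.log10(mass_kg)
    return 1.0 / (1.0 + np.exp(-(a + b * x)))

# Hypothetical binned observations: (spill mass in kg, observed ignition fraction)
mass = np.array([10, 100, 1_000, 10_000, 100_000], dtype=float)
frac = np.array([0.01, 0.03, 0.08, 0.20, 0.40])

(a_hat, b_hat), _ = curve_fit(p_ignition, mass, frac, p0=(-4.0, 0.8))
print(f"P(ignition | 5 t spill) ~ {p_ignition(5_000, a_hat, b_hat):.2f}")
```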
Smith, W Brad; Cuenca Lara, Rubí Angélica; Delgado Caballero, Carina Edith; Godínez Valdivia, Carlos Isaías; Kapron, Joseph S; Leyva Reyes, Juan Carlos; Meneses Tovar, Carmen Lourdes; Miles, Patrick D; Oswalt, Sonja N; Ramírez Salgado, Mayra; Song, Xilong Alex; Stinson, Graham; Villela Gaytán, Sergio Armando
2018-05-21
Forests cannot be managed sustainably without reliable data to inform decisions. National Forest Inventories (NFI) tend to report national statistics, with sub-national stratification based on domestic ecological classification systems. It is becoming increasingly important to be able to report statistics on ecosystems that span international borders, as global change and globalization expand stakeholders' spheres of concern. The state of a transnational ecosystem can only be properly assessed by examining the entire ecosystem. In global forest resource assessments, it may be useful to break national statistics down by ecosystem, especially for large countries. The Inventory and Monitoring Working Group (IMWG) of the North American Forest Commission (NAFC) has begun developing a harmonized North American Forest Database (NAFD) for managing forest inventory data, enabling consistent, continental-scale forest assessment supporting ecosystem-level reporting and relational queries. The first iteration of the database contains data describing 1.9 billion ha, including 677.5 million ha of forest. Data harmonization is made challenging by the existence of definitions and methodologies tailored to suit national circumstances, emerging from each country's professional forestry development. This paper reports the methods used to synchronize three national forest inventories, starting with a small suite of variables and attributes.
Austvoll-Dahlgren, Astrid; Guttersrud, Øystein; Nsangi, Allen; Semakula, Daniel; Oxman, Andrew D
2017-05-25
The Claim Evaluation Tools database contains multiple-choice items for measuring people's ability to apply the key concepts they need to know to be able to assess treatment claims. We assessed items from the database using Rasch analysis to develop an outcome measure to be used in two randomised trials in Uganda. Rasch analysis is a form of psychometric testing relying on Item Response Theory. It is a dynamic way of developing outcome measures that are valid and reliable. Our objective was to assess the validity, reliability and responsiveness of 88 items addressing 22 key concepts using Rasch analysis. We administered four sets of multiple-choice items in English to 1114 people in Uganda and Norway, of whom 685 were children and 429 were adults (including 171 health professionals). We scored all items dichotomously. We explored summary and individual fit statistics using the RUMM2030 analysis package. We used SPSS to perform distractor analysis. Most items conformed well to the Rasch model, but some items needed revision. Overall, the four item sets had satisfactory reliability. We did not identify significant response dependence between any pairs of items and, overall, the magnitude of multidimensionality in the data was acceptable. The items had a high level of difficulty. Most of the items conformed well to the Rasch model's expectations. Following revision of some items, we concluded that most of the items were suitable for use in an outcome measure for evaluating the ability of children or adults to assess treatment claims. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
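For readers unfamiliar with the model, the dichotomous Rasch model underlying the analysis relates the probability of a correct response to the difference between person ability and item difficulty, as in this minimal sketch (the numbers are illustrative, not estimates from the RUMM2030 analysis):

```python
import math

def rasch_p_correct(theta, difficulty):
    """Dichotomous Rasch model: P(correct) depends only on ability minus difficulty."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

# A person of average ability facing an easy and a hard item:
print(rasch_p_correct(theta=0.0, difficulty=-1.0))  # ~0.73
print(rasch_p_correct(theta=0.0, difficulty=+2.0))  # ~0.12 (hard items lower the success probability)
```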
FBIS: A regional DNA barcode archival & analysis system for Indian fishes
Nagpure, Naresh Sahebrao; Rashid, Iliyas; Pathak, Ajey Kumar; Singh, Mahender; Singh, Shri Prakash; Sarkar, Uttam Kumar
2012-01-01
DNA barcode is a new tool for taxon recognition and classification of biological organisms based on the sequence of a fragment of the mitochondrial gene cytochrome c oxidase I (COI). In view of the growing importance of fish DNA barcoding for species identification, molecular taxonomy and fish diversity conservation, we developed a Fish Barcode Information System (FBIS) for Indian fishes, which will serve as a regional DNA barcode archival and analysis system. The database presently contains 2334 sequence records of the COI gene for 472 aquatic species belonging to 39 orders and 136 families, collected from available published data sources. Additionally, it contains information on phenotype, distribution and IUCN Red List status of fishes. The web version of FBIS was designed using MySQL, Perl and PHP under the Linux operating platform to (a) store and manage data acquisition, (b) analyze and explore DNA barcode records, and (c) identify species and estimate genetic divergence. FBIS has also been integrated with appropriate tools for retrieving and viewing information about the database statistics and taxonomy. It is expected that FBIS would be useful as a potent information system in fish molecular taxonomy, phylogeny and genomics. Availability: The database is freely available at http://mail.nbfgr.res.in/fbis/ PMID:22715304
Determinants of Post-fire Water Quality in the Western United States
NASA Astrophysics Data System (ADS)
Rust, A.; Saxe, S.; Dolan, F.; Hogue, T. S.; McCray, J. E.
2015-12-01
Large wildfires are becoming increasingly common in the Western United States. Wildfires that consume greater than twenty percent of the watershed impact river water quality. The surface waters of the arid West are limited and in demand by the aquatic ecosystems, irrigated agriculture, and the region's growing human population. A range of studies, typically focused on individual fires, have observed mobilization of contaminants, nutrients (including nitrates), and sediments into receiving streams. Post-fire metal concentrations have also been observed to increase when fires were located in streams close to urban centers. The objective of this work was to assemble an extensive historical water quality database through data mining from federal, state and local agencies into a fire-database. Data from previous studies on individual fires by the co-authors was also included. The fire-database includes observations of water quality, discharge, geospatial and land characteristics from over 200 fire-impacted watersheds in the western U.S. since 1985. Water quality data from burn impacted watersheds was examined for trends in water quality response using statistical analysis. Watersheds where there was no change in water quality after fire were also examined to determine characteristics of the watershed that make it more resilient to fire. The ultimate goal is to evaluate trends in post-fire water quality response and identify key drivers of resiliency and post-fire response. The fire-database will eventually be publicly available.
Rafiei, Atefeh; Sleno, Lekha
2015-01-15
Data analysis is a key step in mass spectrometry based untargeted metabolomics, starting with the generation of generic peak lists from raw liquid chromatography/mass spectrometry (LC/MS) data. Due to the use of various algorithms by different workflows, the results of different peak-picking strategies often differ widely. Raw LC/HRMS data from two types of biological samples (bile and urine), as well as a standard mixture of 84 metabolites, were processed with four peak-picking software packages: Peakview®, Markerview™, MetabolitePilot™ and XCMS Online. The overlaps between the results of each peak-generating method were then investigated. To gauge the relevance of peak lists, a database search using the METLIN online database was performed to determine which features had accurate masses matching known metabolites, followed by a secondary filtering based on MS/MS spectral matching. In this study, only a small proportion of all peaks (less than 10%) were common to all four software programs. Comparison of database searching results showed that peaks found uniquely by one workflow have less chance of being found in the METLIN metabolomics database and are even less likely to be confirmed by MS/MS. It was shown that the performance of peak-generating workflows has a direct impact on untargeted metabolomics results. As peaks found in more than one peak-detection workflow have a higher potential to be identified by accurate mass as well as MS/MS spectrum matching, it is suggested to use the overlap of different peak-picking workflows as preliminary peak lists for more rugged statistical analysis in global metabolomics investigations. Copyright © 2014 John Wiley & Sons, Ltd.
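A minimal sketch of how peak lists from different workflows can be overlapped by matching features on m/z and retention time within tolerances; the tolerances and feature values are assumptions for illustration, not the study's settings.

```python
def match(feature_a, feature_b, mz_ppm=10.0, rt_tol_s=30.0):
    """True if two (m/z, RT) features agree within ppm and retention-time tolerances."""
    mz_a, rt_a = feature_a
    mz_b, rt_b = feature_b
    ppm = abs(mz_a - mz_b) / mz_a * 1e6
    return ppm <= mz_ppm and abs(rt_a - rt_b) <= rt_tol_s

def overlap(list_a, list_b):
    """Features of list_a that have a match in list_b."""
    return [f for f in list_a if any(match(f, g) for g in list_b)]

workflow_1 = [(301.1410, 245.0), (445.2101, 512.0)]   # (m/z, RT in seconds)
workflow_2 = [(301.1412, 251.0), (512.3000, 700.0)]
print(overlap(workflow_1, workflow_2))  # only the shared feature survives
```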
Meta-analysis on the effectiveness of team-based learning on medical education in China.
Chen, Minjian; Ni, Chunhui; Hu, Yanhui; Wang, Meilin; Liu, Lu; Ji, Xiaoming; Chu, Haiyan; Wu, Wei; Lu, Chuncheng; Wang, Shouyu; Wang, Shoulin; Zhao, Liping; Li, Zhong; Zhu, Huijuan; Wang, Jianming; Xia, Yankai; Wang, Xinru
2018-04-10
Team-based learning (TBL) has been adopted as a new medical pedagogical approach in China. However, there are no studies or reviews summarizing the effectiveness of TBL on medical education. This study aims to obtain an overall estimation of the effectiveness of TBL on outcomes of theoretical teaching in medical education in China. We retrieved studies from database inception through December 2015. The Chinese National Knowledge Infrastructure, Chinese Biomedical Literature Database, Chinese Wanfang Database, Chinese Scientific Journal Database, PubMed, EMBASE and the Cochrane Database were searched. The quality of included studies was assessed with the Newcastle-Ottawa scale. The standardized mean difference (SMD) was applied for the estimation of pooled effects. Heterogeneity was assessed with the I² statistic and further explored by meta-regression analysis. A total of 13 articles including 1545 participants were eventually entered into the meta-analysis. The quality scores of these studies ranged from 6 to 10. Altogether, TBL significantly increased students' theoretical examination scores when compared with lecture-based learning (LBL) (SMD = 2.46, 95% CI: 1.53-3.40). Additionally, TBL significantly increased students' learning attitude (SMD = 3.23, 95% CI: 2.27-4.20) and learning skill (SMD = 2.70, 95% CI: 1.33-4.07). The meta-regression results showed that randomization, education classification and gender diversity were the factors that caused heterogeneity. TBL in theoretical teaching of medical education seems to be more effective than LBL in improving the knowledge, attitude and skill of students in China, providing evidence for the implementation of TBL in medical education in China. Medical schools should implement TBL with consideration of practical teaching situations, such as students' education level.
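As an illustration of the pooling summarized above, the sketch below computes a DerSimonian-Laird random-effects pooled SMD together with the I² heterogeneity statistic. The per-study SMDs and variances are hypothetical, not the 13 included studies.

```python
import math

smd = [1.8, 2.5, 3.1, 2.0, 2.9]          # hypothetical study SMDs
var = [0.20, 0.15, 0.30, 0.25, 0.18]     # hypothetical within-study variances

# Fixed-effect weights and pooled estimate (needed for the Q statistic).
w_fixed = [1 / v for v in var]
smd_fixed = sum(w * d for w, d in zip(w_fixed, smd)) / sum(w_fixed)

# Heterogeneity: Q statistic, I^2, and between-study variance tau^2 (DerSimonian-Laird).
q = sum(w * (d - smd_fixed) ** 2 for w, d in zip(w_fixed, smd))
df = len(smd) - 1
i2 = max(0.0, (q - df) / q) * 100
c = sum(w_fixed) - sum(w ** 2 for w in w_fixed) / sum(w_fixed)
tau2 = max(0.0, (q - df) / c)

# Random-effects weights and pooled estimate with 95% CI.
w_rand = [1 / (v + tau2) for v in var]
smd_rand = sum(w * d for w, d in zip(w_rand, smd)) / sum(w_rand)
se = math.sqrt(1 / sum(w_rand))
print(f"Pooled SMD = {smd_rand:.2f} (95% CI {smd_rand - 1.96*se:.2f} to "
      f"{smd_rand + 1.96*se:.2f}), I^2 = {i2:.0f}%")
```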
A spatial database of wildfires in the United States, 1992-2011
NASA Astrophysics Data System (ADS)
Short, K. C.
2013-07-01
The statistical analysis of wildfire activity is a critical component of national wildfire planning, operations, and research in the United States (US). However, there are multiple federal, state, and local entities with wildfire protection and reporting responsibilities in the US, and no single, unified system of wildfire record-keeping exists. To conduct even the most rudimentary interagency analyses of wildfire numbers and area burned from the authoritative systems of record, one must harvest records from dozens of disparate databases with inconsistent information content. The onus is then on the user to check for and purge redundant records of the same fire (i.e. multijurisdictional incidents with responses reported by several agencies or departments) after pooling data from different sources. Here we describe our efforts to acquire, standardize, error-check, compile, scrub, and evaluate the completeness of US federal, state, and local wildfire records from 1992-2011 for the national, interagency Fire Program Analysis (FPA) application. The resulting FPA Fire-occurrence Database (FPA FOD) includes nearly 1.6 million records from the 20 yr period, with values for at least the following core data elements: location at least as precise as a Public Land Survey System section (2.6 km2 grid), discovery date, and final fire size. The FPA FOD is publicly available from the Research Data Archive of the US Department of Agriculture, Forest Service (doi:10.2737/RDS-2013-0009). While necessarily incomplete in some aspects, the database is intended to facilitate fairly high-resolution geospatial analysis of US wildfire activity over the past two decades, based on available information from the authoritative systems of record.
A spatial database of wildfires in the United States, 1992-2011
NASA Astrophysics Data System (ADS)
Short, K. C.
2014-01-01
The statistical analysis of wildfire activity is a critical component of national wildfire planning, operations, and research in the United States (US). However, there are multiple federal, state, and local entities with wildfire protection and reporting responsibilities in the US, and no single, unified system of wildfire record keeping exists. To conduct even the most rudimentary interagency analyses of wildfire numbers and area burned from the authoritative systems of record, one must harvest records from dozens of disparate databases with inconsistent information content. The onus is then on the user to check for and purge redundant records of the same fire (i.e., multijurisdictional incidents with responses reported by several agencies or departments) after pooling data from different sources. Here we describe our efforts to acquire, standardize, error-check, compile, scrub, and evaluate the completeness of US federal, state, and local wildfire records from 1992-2011 for the national, interagency Fire Program Analysis (FPA) application. The resulting FPA Fire-Occurrence Database (FPA FOD) includes nearly 1.6 million records from the 20 yr period, with values for at least the following core data elements: location, at least as precise as a Public Land Survey System section (2.6 km2 grid), discovery date, and final fire size. The FPA FOD is publicly available from the Research Data Archive of the US Department of Agriculture, Forest Service (doi:10.2737/RDS-2013-0009). While necessarily incomplete in some aspects, the database is intended to facilitate fairly high-resolution geospatial analysis of US wildfire activity over the past two decades, based on available information from the authoritative systems of record.
Analysis of recreational closed-circuit rebreather deaths 1998-2010.
Fock, Andrew W
2013-06-01
Since the introduction of recreational closed-circuit rebreathers (CCRs) in 1998, there have been many recorded deaths. Rebreather deaths have been quoted to be as high as 1 in 100 users. Rebreather fatalities between 1998 and 2010 were extracted from the Deeplife rebreather mortality database, and inaccuracies were corrected where known. Rebreather absolute numbers were derived from industry discussions and training agency statistics. Relative numbers and brands were extracted from the Rebreather World website database and a Dutch rebreather survey. Mortality was compared with data from other databases. A fault-tree analysis of rebreathers was compared to that of open-circuit scuba of various configurations. Finally, a risk analysis was applied to the mortality database. The 181 recorded recreational rebreather deaths occurred at about 10 times the rate of deaths amongst open-circuit recreational scuba divers. No particular brand or type of rebreather was over-represented. Closed-circuit rebreathers have a 25-fold increased risk of component failure compared to a manifolded twin-cylinder open-circuit system. This risk can be offset by carrying a redundant 'bailout' system. Two-thirds of fatal dives were associated with a high-risk dive or high-risk behaviour. There are multiple points in the human-machine interface (HMI) during the use of rebreathers that can result in errors that may lead to a fatality. While rebreathers have an intrinsically higher risk of mechanical failure as a result of their complexity, this can be offset by good design incorporating redundancy and by carrying adequate 'bailout' or alternative gas sources for decompression in the event of a failure. Designs that minimize the chances of HMI errors and training that highlights this area may help to minimize fatalities.
Using Statistical Process Control to Make Data-Based Clinical Decisions.
ERIC Educational Resources Information Center
Pfadt, Al; Wheeler, Donald J.
1995-01-01
Statistical process control (SPC), which employs simple statistical tools and problem-solving techniques such as histograms, control charts, flow charts, and Pareto charts to implement continual product improvement procedures, can be incorporated into human service organizations. Examples illustrate use of SPC procedures to analyze behavioral data…
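A minimal sketch of the individuals (XmR) control-chart limits typical of SPC, using the common mean ± 2.66 × average moving range rule; the behavioral counts are invented for illustration.

```python
# Individuals (XmR) control chart: limits = mean +/- 2.66 * average moving range.
data = [12, 14, 11, 15, 13, 30, 12, 14, 13, 12]  # e.g., daily counts of a target behavior

mean = sum(data) / len(data)
moving_ranges = [abs(b - a) for a, b in zip(data, data[1:])]
mr_bar = sum(moving_ranges) / len(moving_ranges)

ucl = mean + 2.66 * mr_bar
lcl = max(0.0, mean - 2.66 * mr_bar)
signals = [x for x in data if x > ucl or x < lcl]
print(f"UCL={ucl:.1f}, LCL={lcl:.1f}, out-of-control points: {signals}")
```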
Low energy peripheral scaling in nucleon-nucleon scattering and uncertainty quantification
NASA Astrophysics Data System (ADS)
Ruiz Simo, I.; Amaro, J. E.; Ruiz Arriola, E.; Navarro Pérez, R.
2018-03-01
We analyze the peripheral structure of the nucleon-nucleon interaction for LAB energies below 350 MeV. To this end we transform the scattering matrix into the impact parameter representation by analyzing the scaled phase shifts (L + 1/2)δ_JLS(p) and the scaled mixing parameters (L + 1/2)ε_JLS(p) in terms of the impact parameter b = (L + 1/2)/p. According to the eikonal approximation, at large angular momentum L these functions should become a universal function of b, independent of L. This allows us to discuss in a rather transparent way the role of statistical and systematic uncertainties in the different long-range components of the two-body potential. Implications for peripheral waves, obtained either from chiral perturbation theory interactions to fifth order (N5LO) or from the large body of NN data considered in the SAID partial wave analysis, are also drawn by comparing them with other phenomenological high-quality interactions constructed to fit scattering data. We find that both N5LO and SAID peripheral waves disagree by more than 5σ with the Granada-2013 statistical analysis, by more than 2σ with the 6 statistically equivalent potentials fitting the Granada-2013 database, and by about 1σ with the historical set of 13 high-quality potentials developed since the 1993 Nijmegen analysis.
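Restating the eikonal scaling invoked above in compact form (a paraphrase of the abstract's notation; the profile labels f_S and g_S are introduced here purely for illustration):

```latex
b = \frac{L + \tfrac{1}{2}}{p}, \qquad
\left(L + \tfrac{1}{2}\right)\delta_{JLS}(p) \;\xrightarrow{\,L \gg 1\,}\; f_{S}(b), \qquad
\left(L + \tfrac{1}{2}\right)\epsilon_{JLS}(p) \;\xrightarrow{\,L \gg 1\,}\; g_{S}(b)
```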
Woo, Jason R; Shikanov, Sergey; Zorn, Kevin C; Shalhav, Arieh L; Zagaja, Gregory P
2009-12-01
Posterior rhabdosphincter (PR) reconstruction during robot-assisted radical prostatectomy (RARP) was introduced in an attempt to improve postoperative continence. In the present study, we evaluate time to achieve continence in patients undergoing RARP with and without PR reconstruction. A prospective RARP database was searched for the most recent cases performed with PR reconstruction (group 1, n = 69) or with the standard technique (group 2, n = 63). We performed the analysis applying two definitions of continence: 0 pads per day or 0-1 security pad per day. Patients were evaluated by telephone interview. Statistical analysis was carried out using the Kaplan-Meier method and the log-rank test. With PR reconstruction, continence was improved when defined as 0-1 security pad per day (median time of 90 vs 150 days; P = 0.01). This difference did not achieve statistical significance when continence was defined as 0 pads per day (P = 0.12). A statistically significant improvement in continence rate and time to achieve continence is seen in patients undergoing PR reconstruction during RARP, with continence defined as 0-1 security/safety pad per day. A larger, prospective and randomized study is needed to better understand the impact of this technique on postoperative continence.
Statistical analysis of magnetically soft particles in magnetorheological elastomers
NASA Astrophysics Data System (ADS)
Gundermann, T.; Cremer, P.; Löwen, H.; Menzel, A. M.; Odenbach, S.
2017-04-01
The physical properties of magnetorheological elastomers (MRE) are a complex issue and can be influenced and controlled in many ways, e.g. by applying a magnetic field, by external mechanical stimuli, or by an electric potential. In general, the response of MRE materials to these stimuli depends crucially on the distribution of the magnetic particles inside the elastomer. Specific knowledge of the interactions between particles or particle clusters is of high relevance for understanding the macroscopic rheological properties and provides an important input for theoretical calculations. In order to gain better insight into the correlation between the macroscopic effects and the microstructure, and to generate a database for theoretical analysis, x-ray micro-computed tomography (X-μCT) investigations were carried out as a basis for a statistical analysis of the particle configurations. Different MREs with quantities of 2-15 wt% (0.27-2.3 vol%) of iron powder and different allocations of the particles inside the matrix were prepared. The X-μCT results were edited with image processing software with regard to the geometrical properties of the particles, with and without the influence of an external magnetic field. Pair correlation functions for the positions of the particles inside the elastomer were calculated to statistically characterize the distributions of the particles in the samples.
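A simplified sketch of the radial pair correlation function g(r) used above to characterize particle configurations, computed here for random positions in a periodic box; normalization details and the actual X-μCT data handling are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
box = 100.0                                   # box edge length (arbitrary units)
pos = rng.uniform(0, box, size=(500, 3))      # synthetic particle positions

def pair_correlation(positions, box_length, dr=1.0, r_max=20.0):
    """Radial pair correlation g(r) with minimum-image periodic boundaries."""
    n = len(positions)
    density = n / box_length**3
    edges = np.arange(0.0, r_max + dr, dr)
    counts = np.zeros(len(edges) - 1)
    for i in range(n):
        d = positions - positions[i]
        d -= box_length * np.round(d / box_length)   # minimum-image convention
        r = np.linalg.norm(d, axis=1)
        r = r[(r > 0) & (r < r_max)]
        counts += np.histogram(r, bins=edges)[0]
    shell_vol = 4.0 / 3.0 * np.pi * (edges[1:]**3 - edges[:-1]**3)
    return edges[:-1] + dr / 2, counts / (n * density * shell_vol)

r, g = pair_correlation(pos, box)
print(g[:5])   # ~1 everywhere for an uncorrelated (ideal-gas-like) arrangement
```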
Solomon, Patricia J; Kasza, Jessica; Moran, John L
2014-04-22
The Australian and New Zealand Intensive Care Society (ANZICS) Adult Patient Database (APD) collects voluntary data on patient admissions to Australian and New Zealand intensive care units (ICUs). This paper presents an in-depth statistical analysis of risk-adjusted mortality of ICU admissions from 2000 to 2010 for the purpose of identifying ICUs with unusual performance. A cohort of 523,462 patients from 144 ICUs was analysed. For each ICU, the natural logarithm of the standardised mortality ratio (log-SMR) was estimated from a risk-adjusted, three-level hierarchical model. This is the first time a three-level model has been fitted to such a large ICU database anywhere. The analysis was conducted in three stages which included the estimation of a null distribution to describe usual ICU performance. Log-SMRs with appropriate estimates of standard errors are presented in a funnel plot using 5% false discovery rate thresholds. False coverage-statement rate confidence intervals are also presented. The observed numbers of deaths for ICUs identified as unusual are compared to the predicted true worst numbers of deaths under the model for usual ICU performance. Seven ICUs were identified as performing unusually over the period 2000 to 2010, in particular, demonstrating high risk-adjusted mortality compared to the majority of ICUs. Four of the seven were ICUs in private hospitals. Our three-stage approach to the analysis detected outlying ICUs which were not identified in a conventional (single) risk-adjusted model for mortality using SMRs to compare ICUs. We also observed a significant linear decline in mortality over the decade. Distinct yearly and weekly respiratory seasonal effects were observed across regions of Australia and New Zealand for the first time. The statistical approach proposed in this paper is intended to be used for the review of observed ICU and hospital mortality. Two important messages from our study are firstly, that comprehensive risk-adjustment is essential in modelling patient mortality for comparing performance, and secondly, that the appropriate statistical analysis is complicated.
Identifying unusual performance in Australian and New Zealand intensive care units from 2000 to 2010
2014-01-01
Background The Australian and New Zealand Intensive Care Society (ANZICS) Adult Patient Database (APD) collects voluntary data on patient admissions to Australian and New Zealand intensive care units (ICUs). This paper presents an in-depth statistical analysis of risk-adjusted mortality of ICU admissions from 2000 to 2010 for the purpose of identifying ICUs with unusual performance. Methods A cohort of 523,462 patients from 144 ICUs was analysed. For each ICU, the natural logarithm of the standardised mortality ratio (log-SMR) was estimated from a risk-adjusted, three-level hierarchical model. This is the first time a three-level model has been fitted to such a large ICU database anywhere. The analysis was conducted in three stages which included the estimation of a null distribution to describe usual ICU performance. Log-SMRs with appropriate estimates of standard errors are presented in a funnel plot using 5% false discovery rate thresholds. False coverage-statement rate confidence intervals are also presented. The observed numbers of deaths for ICUs identified as unusual are compared to the predicted true worst numbers of deaths under the model for usual ICU performance. Results Seven ICUs were identified as performing unusually over the period 2000 to 2010, in particular, demonstrating high risk-adjusted mortality compared to the majority of ICUs. Four of the seven were ICUs in private hospitals. Our three-stage approach to the analysis detected outlying ICUs which were not identified in a conventional (single) risk-adjusted model for mortality using SMRs to compare ICUs. We also observed a significant linear decline in mortality over the decade. Distinct yearly and weekly respiratory seasonal effects were observed across regions of Australia and New Zealand for the first time. Conclusions The statistical approach proposed in this paper is intended to be used for the review of observed ICU and hospital mortality. Two important messages from our study are firstly, that comprehensive risk-adjustment is essential in modelling patient mortality for comparing performance, and secondly, that the appropriate statistical analysis is complicated. PMID:24755369
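As a simplified illustration of the funnel-plot ingredients described above, the sketch below computes each unit's log standardized mortality ratio and an approximate Poisson-based control limit. It is not the paper's three-level hierarchical model or its false-discovery-rate thresholds; the counts are hypothetical.

```python
import math

icus = [
    # (name, observed deaths, expected deaths from a risk-adjustment model) -- hypothetical
    ("ICU-A", 95, 100.0),
    ("ICU-B", 140, 100.0),
    ("ICU-C", 60, 50.0),
]

z_crit = 1.96  # simple illustrative limits; the paper uses 5% false-discovery-rate thresholds
for name, obs, exp in icus:
    log_smr = math.log(obs / exp)
    se = 1.0 / math.sqrt(obs)          # approximate SE of the log-SMR under a Poisson assumption
    flag = "unusual" if abs(log_smr) > z_crit * se else "usual"
    print(f"{name}: SMR={obs/exp:.2f}, log-SMR={log_smr:+.2f} ± {z_crit*se:.2f} -> {flag}")
```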
Determining Faculty Staffing Using Lotus 1-2-3.
ERIC Educational Resources Information Center
Ebner, Stanley G.
1987-01-01
Discusses how to manipulate a database to create a spreadsheet which can be used to help decide which teaching areas are understaffed and by how much. Focuses on the use of the Lotus 1-2-3 database statistical functions. (TW)
Scholl, Joep H G; van Hunsel, Florence P A M; Hak, Eelko; van Puijenbroek, Eugène P
2018-02-01
The statistical screening of pharmacovigilance databases containing spontaneously reported adverse drug reactions (ADRs) is mainly based on disproportionality analysis. The aim of this study was to improve the efficiency of full database screening using a prediction model-based approach. A logistic regression-based prediction model containing 5 candidate predictors was developed and internally validated using the Summary of Product Characteristics as the gold standard for the outcome. All drug-ADR associations, with the exception of those related to vaccines, with a minimum of 3 reports formed the training data for the model. Performance was based on the area under the receiver operating characteristic curve (AUC). Results were compared with the current method of database screening based on the number of previously analyzed associations. A total of 25 026 unique drug-ADR associations formed the training data for the model. The final model contained all 5 candidate predictors (number of reports, disproportionality, reports from healthcare professionals, reports from marketing authorization holders, Naranjo score). The AUC for the full model was 0.740 (95% CI; 0.734-0.747). The internal validity was good based on the calibration curve and bootstrapping analysis (AUC after bootstrapping = 0.739). Compared with the old method, the AUC increased from 0.649 to 0.740, and the proportion of potential signals increased by approximately 50% (from 12.3% to 19.4%). A prediction model-based approach can be a useful tool to create priority-based listings for signal detection in databases consisting of spontaneous ADRs. © 2017 The Authors. Pharmacoepidemiology & Drug Safety Published by John Wiley & Sons Ltd.
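A minimal sketch of the general approach: a logistic regression over report-level predictors, evaluated with the area under the ROC curve. The predictors mirror those named in the abstract, but the data are synthetic and the model is not the authors' fitted model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 2000
X = np.column_stack([
    rng.poisson(5, n),        # number of reports
    rng.normal(1.0, 0.5, n),  # disproportionality measure
    rng.integers(0, 2, n),    # any report from a healthcare professional
    rng.integers(0, 2, n),    # any report from a marketing authorization holder
    rng.integers(0, 10, n),   # Naranjo-style causality score
])
# Synthetic outcome loosely related to the predictors (purely for illustration).
logit = -3 + 0.2 * X[:, 0] + 1.0 * X[:, 1] + 0.5 * X[:, 2] + 0.2 * X[:, 4]
y = rng.random(n) < 1 / (1 + np.exp(-logit))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("AUC:", round(roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]), 3))
```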
Dziadkowiec, Oliwier; Callahan, Tiffany; Ozkaynak, Mustafa; Reeder, Blaine; Welton, John
2016-01-01
Objectives: We examine the following: (1) the appropriateness of using a data quality (DQ) framework developed for relational databases as a data-cleaning tool for a data set extracted from two EPIC databases, and (2) the differences in statistical parameter estimates between a data set cleaned with the DQ framework and a data set not cleaned with the DQ framework. Background: The use of data contained within electronic health records (EHRs) has the potential to open doors for a new wave of innovative research. Without adequate preparation of such large data sets for analysis, the results might be erroneous, which might affect clinical decision-making or the results of Comparative Effectiveness Research studies. Methods: Two emergency department (ED) data sets extracted from EPIC databases (adult ED and children's ED) were used as examples for examining the five concepts of DQ based on a DQ assessment framework designed for EHR databases. The first data set contained 70,061 visits; the second data set contained 2,815,550 visits. SPSS syntax examples as well as step-by-step instructions on how to apply the five key DQ concepts to these EHR database extracts are provided. Conclusions: SPSS syntax to address each of the DQ concepts proposed by Kahn et al. (2012) was developed. The data set cleaned using Kahn's framework yielded more accurate results than the data set cleaned without this framework. Future plans involve creating functions in the R language for cleaning data extracted from the EHR, as well as an R package that combines DQ checks with missing data analysis functions. PMID:27429992
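A small pandas sketch of the kinds of data-quality checks described above (completeness, plausibility and uniqueness); column names and thresholds are hypothetical, and the original work used SPSS syntax rather than pandas.

```python
import pandas as pd

# Toy ED extract with deliberately dirty rows for illustration.
df = pd.DataFrame({
    "visit_id": [1, 2, 2, 4],
    "age": [34, 250, 7, None],          # 250 is implausible, None is missing
    "arrival_time": pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-02", "2015-01-03"]),
})

report = {
    "completeness_age": df["age"].notna().mean(),        # share of non-missing ages
    "plausible_age": df["age"].between(0, 120).mean(),    # share within a plausible range
    "duplicate_visit_ids": int(df["visit_id"].duplicated().sum()),
}
print(report)
```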
Trevarton, Alexander J.; Mann, Michael B.; Knapp, Christoph; Araki, Hiromitsu; Wren, Jonathan D.; Stones-Havas, Steven; Black, Michael A.; Print, Cristin G.
2013-01-01
Despite on-going research, metastatic melanoma survival rates remain low and treatment options are limited. Researchers can now access a rapidly growing amount of molecular and clinical information about melanoma. This information is becoming difficult to assemble and interpret due to its dispersed nature, yet as it grows it becomes increasingly valuable for understanding melanoma. Integration of this information into a comprehensive resource to aid rational experimental design and patient stratification is needed. As an initial step in this direction, we have assembled a web-accessible melanoma database, MelanomaDB, which incorporates clinical and molecular data from publicly available sources and which will be regularly updated as new information becomes available. This database allows complex links to be drawn between many different aspects of melanoma biology: genetic changes (e.g., mutations) in individual melanomas revealed by DNA sequencing, associations between gene expression and patient survival, data concerning drug targets, biomarkers, druggability, and clinical trials, as well as our own statistical analysis of relationships between molecular pathways and clinical parameters that have been produced using these data sets. The database is freely available at http://genesetdb.auckland.ac.nz/melanomadb/about.html. A subset of the information in the database can also be accessed through a freely available web application in the Illumina genomic cloud computing platform BaseSpace at http://www.biomatters.com/apps/melanoma-profiler-for-research. The MelanomaDB database illustrates dysregulation of specific signaling pathways across 310 exome-sequenced melanomas and in individual tumors and identifies the distribution of somatic variants in melanoma. We suggest that MelanomaDB can provide a context in which to interpret the tumor molecular profiles of individual melanoma patients relative to biological information and available drug therapies. PMID:23875173
The Astrobiology Habitable Environments Database (AHED)
NASA Astrophysics Data System (ADS)
Lafuente, B.; Stone, N.; Downs, R. T.; Blake, D. F.; Bristow, T.; Fonda, M.; Pires, A.
2015-12-01
The Astrobiology Habitable Environments Database (AHED) is a central, high-quality, long-term searchable repository for archiving and collaborative sharing of astrobiologically relevant data, including morphological, textural and contextual images, as well as chemical, biochemical, isotopic, sequencing, and mineralogical information. The aim of AHED is to foster long-term innovative research by supporting integration and analysis of diverse datasets in order to: 1) help understand and interpret planetary geology; 2) identify and characterize habitable environments and pre-biotic/biotic processes; 3) interpret returned data from present and past missions; 4) provide a citable database of NASA-funded published and unpublished data (after an agreed-upon embargo period). AHED uses the online open-source software "The Open Data Repository's Data Publisher" (ODR - http://www.opendatarepository.org) [1], which provides a user-friendly interface that research teams or individual scientists can use to design, populate and manage their own database according to the characteristics of their data and the need to share data with collaborators or the broader scientific community. This platform can also be used as a laboratory notebook. The database will have the capability to import and export in a variety of standard formats. Advanced graphics will be implemented, including 3D graphing, multi-axis graphs, error bars, and similar scientific data functions, together with advanced online tools for data analysis (e.g., the statistical package R). A permissions system will be put in place so that as data are being actively collected and interpreted, they will remain proprietary. A citation system will allow research data to be used and appropriately referenced by other researchers after the data are made public. This project is supported by the Science-Enabling Research Activity (SERA) and NASA NNX11AP82A, Mars Science Laboratory Investigations. [1] Nate et al. (2015) AGU, submitted.
Implementation of a Big Data Accessing and Processing Platform for Medical Records in Cloud.
Yang, Chao-Tung; Liu, Jung-Chun; Chen, Shuo-Tsung; Lu, Hsin-Wen
2017-08-18
Big Data analysis has become a key factor of being innovative and competitive. Along with population growth worldwide and the aging trend of the population in developed countries, the rate of national medical care usage has been increasing. Because individual medical data are usually scattered across different institutions and their data formats vary, integrating these continually growing data is challenging. For these data platforms to have scalable load capacity, they must be built on a sound platform architecture. Several issues must be considered in order to use cloud computing to quickly integrate big medical data into a database for easy analysis, searching, and filtering of big data to obtain valuable information. This work builds a cloud storage system with HBase on Hadoop for storing and analyzing big data from medical records and improves the performance of importing data into the database. The data of medical records are stored in the HBase database platform for big data analysis. This system performs distributed computing on medical record data processing through Hadoop MapReduce programming, and provides functions including keyword search, data filtering, and basic statistics for the HBase database. This system uses Put with a single-threaded method and the CompleteBulkload mechanism to import medical data. From the experimental results, we find that when the file size is less than 300 MB, the Put with single-threaded method is used, and when the file size is larger than 300 MB, the CompleteBulkload mechanism is used to improve the performance of data import into the database. This system provides a web interface that allows users to search data, filter out meaningful information through the web, and analyze and convert data into suitable forms that will be helpful for medical staff and institutions.
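A rough sketch of the import dispatch described above: stream rows with Put for small files and hand larger ones to a bulk-load job. The happybase client and the `hbase completebulkload` command are assumptions about the deployment, and the table and column names are hypothetical.

```python
import os
import subprocess
import happybase  # assumed Thrift-based HBase client; the paper does not specify one

BULK_THRESHOLD = 300 * 1024 * 1024  # 300 MB, the crossover point reported above

def import_medical_records(csv_path, hbase_host="localhost"):
    if os.path.getsize(csv_path) < BULK_THRESHOLD:
        # Small files: stream rows into HBase with batched Put operations.
        conn = happybase.Connection(hbase_host)
        table = conn.table("medical_records")                      # hypothetical table name
        with table.batch(batch_size=1000) as batch, open(csv_path) as fh:
            for line in fh:
                record_id, diagnosis, cost = line.rstrip("\n").split(",")
                batch.put(record_id.encode(), {
                    b"info:diagnosis": diagnosis.encode(),
                    b"info:cost": cost.encode(),
                })
    else:
        # Large files: pre-generated HFiles are handed to the HBase bulk loader.
        subprocess.run(
            ["hbase", "completebulkload", "/staging/medical_hfiles", "medical_records"],
            check=True,
        )
```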
Atlantic Hurricane Activity: 1851-1900
NASA Astrophysics Data System (ADS)
Landsea, C. W.
2001-12-01
This presentation reports on the second year's work of a three-year project to re-analyze the North Atlantic hurricane database (or HURDAT). The original database of six-hourly positions and intensities was put together in the 1960s in support of the Apollo space program to help provide statistical track forecast guidance. In the intervening years, this database - which is now freely and easily accessible on the Internet from the National Hurricane Center's (NHC's) webpage - has been utilized for a wide variety of uses: climatic change studies, seasonal forecasting, risk assessment for county emergency managers, analysis of potential losses for insurance and business interests, intensity forecasting techniques and verification of official and various model predictions of track and intensity. Unfortunately, HURDAT was not designed with all of these uses in mind when it was first put together, and not all of them may be appropriate given its original motivation. One problem with HURDAT is that there are numerous systematic as well as some random errors in the database which need correction. Additionally, analysis techniques have changed over the years at NHC as our understanding of tropical cyclones has developed, leading to biases in the historical database that have not been addressed. Another difficulty in applying the hurricane database to studies concerned with landfalling events is the lack of exact location, time and intensity at hurricane landfall. Finally, recent efforts into uncovering undocumented historical hurricanes in the late 1800s and early 1900s led by Jose Fernandez-Partagas have greatly increased our knowledge of these past events, which are not yet incorporated into the HURDAT database. Because of all of these issues, a re-analysis of the Atlantic hurricane database is being attempted that will be completed in three years. As part of the re-analyses, three files will be made available: (1) the revised Atlantic HURDAT (with six-hourly intensities and positions); (2) a HURDAT meta-file, a text file with detailed information about each suggested change proposed in the revised HURDAT; and (3) a "center fix" file, composed of actual observations of tropical cyclone positions and intensity estimates from the following platforms: aircraft, satellite, radar, and synoptic. All changes made to HURDAT will be approved by an NHC committee, as this database is officially maintained by them. At the conference, results will be shown, including a revised climatology of U.S. hurricane strikes back to 1851. http://www.aoml.noaa.gov/hrd/hurdat/index.html
Integrated database for rapid mass movements in Norway
NASA Astrophysics Data System (ADS)
Jaedicke, C.; Lied, K.; Kronholm, K.
2009-03-01
Rapid gravitational slope mass movements include all kinds of short-term relocation of geological material, snow or ice. Traditionally, information about such events is collected separately in different databases covering selected geographical regions and types of movement. In Norway the terrain is susceptible to all types of rapid gravitational slope mass movements, ranging from single rocks hitting roads and houses to large snow avalanches and rock slides where entire mountainsides collapse into fjords, creating flood waves and endangering large areas. In addition, quick clay slides occur in desalinated marine sediments in South Eastern and Mid Norway. For the authorities and inhabitants of endangered areas, the type of threat is of minor importance and mitigation measures have to consider several types of rapid mass movements simultaneously. An integrated national database for all types of rapid mass movements built around individual events has been established. Only three data entries are mandatory: time, location and type of movement. The remaining optional parameters enable recording of detailed information about the terrain, materials involved and damages caused. Pictures, movies and other documentation can be uploaded into the database. A web-based graphical user interface has been developed allowing new events to be entered, as well as editing and querying for all events. An integration of the database into a GIS system is currently under development. Datasets from various national sources like the road authorities and the Geological Survey of Norway were imported into the database. Today, the database contains 33 000 rapid mass movement events from the last five hundred years covering the entire country. A first analysis of the data shows that the most frequent types of recorded rapid mass movement are rock slides and snow avalanches, followed by debris slides in third place. Most events are recorded in the steep fjord terrain of the Norwegian west coast, but major events are recorded all over the country. Snow avalanches account for most fatalities, while large rock slides causing flood waves and huge quick clay slides are the most damaging individual events in terms of damage to infrastructure and property and for causing multiple fatalities. The quality of the data is strongly influenced by the personal engagement of local observers and varying observation routines. This database is a unique source for statistical analysis, including risk analysis and the relation between rapid mass movements and climate. The database of rapid mass movement events will also facilitate validation of national hazard and risk maps.
A VBA Desktop Database for Proposal Processing at National Optical Astronomy Observatories
NASA Astrophysics Data System (ADS)
Brown, Christa L.
National Optical Astronomy Observatories (NOAO) has developed a relational Microsoft Windows desktop database using Microsoft Access and the Microsoft Office programming language, Visual Basic for Applications (VBA). The database is used to track data relating to observing proposals from original receipt through the review process, scheduling, observing, and final statistical reporting. The database has automated proposal processing and distribution of information. It allows NOAO to collect and archive data so as to query and analyze information about our science programs in new ways.
Hu, Xiangdong; Liu, Yujiang; Qian, Linxue
2017-10-01
Real-time elastography (RTE) and shear wave elastography (SWE) are noninvasive and easily available imaging techniques that measure tissue strain, and it has been reported that the sensitivity and specificity of elastography in differentiating between benign and malignant thyroid nodules are better than those of conventional technologies. Relevant articles were searched in multiple databases; the comparison of the elasticity index (EI) was conducted with Review Manager 5.0. Forest plots of sensitivity and specificity and SROC curves of RTE and SWE were produced with STATA 10.0 software. In addition, sensitivity analysis and bias analysis of the studies were conducted to examine the quality of the articles; to estimate possible publication bias, a funnel plot was used and the Egger test was conducted. Finally, 22 articles that satisfied the inclusion criteria were included in this study. After ineligible cases were eliminated, there were 2106 benign and 613 malignant nodules. The meta-analysis suggested that the difference in EI between benign and malignant nodules was statistically significant (SMD = 2.11, 95% CI [1.67, 2.55], P < .00001). The overall sensitivities of RTE and SWE were roughly comparable, whereas the difference in specificities between these 2 methods was statistically significant. In addition, a statistically significant difference in AUC between RTE and SWE was observed (P < .01). The specificity of RTE was statistically higher than that of SWE, which suggests that, compared with SWE, RTE may be more accurate in differentiating benign and malignant thyroid nodules.
The Development of Vocational Vehicle Drive Cycles and Segmentation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Duran, Adam W.; Phillips, Caleb T.; Konan, Arnaud M.
Under a collaborative interagency agreement between the U.S. Environmental Protection Agency and the U.S. Department of Energy (DOE), the National Renewable Energy Laboratory (NREL) performed a series of in-depth analyses to characterize the on-road driving behavior, including distributions of vehicle speed, idle time, accelerations and decelerations, and other driving metrics, of medium- and heavy-duty vocational vehicles operating within the United States. As part of this effort, NREL researchers segmented U.S. medium- and heavy-duty vocational vehicle driving characteristics into three distinct operating groups or clusters using real-world drive cycle data collected at 1 Hz and stored in NREL's Fleet DNA database. The Fleet DNA database contains millions of miles of historical real-world drive cycle data captured from medium- and heavy-duty vehicles operating across the United States. The data encompass data from existing DOE activities as well as contributions from valued industry stakeholder participants. For this project, data captured from 913 unique vehicles comprising 16,250 days of operation were drawn from the Fleet DNA database and examined. The Fleet DNA data used as a source for this analysis have been collected from a total of 30 unique fleets/data providers operating across 22 unique geographic locations spread across the United States. This includes locations with topography ranging from the foothills of Denver, Colorado, to the flats of Miami, Florida. The range of fleets, geographic locations, and total number of vehicles analyzed ensures results that include the influence of these factors. While no analysis will be perfect without unlimited resources and data, it is the researchers' understanding that the Fleet DNA database is the largest and most thorough publicly accessible vocational vehicle usage database currently in operation. This report includes an introduction to the Fleet DNA database and the data contained within, a presentation of the results of the statistical analysis performed by NREL, a review of the logistic model developed to predict cluster membership, and a discussion and detailed summary of the development of the vocational drive cycle weights and representative transient drive cycles for testing and simulation. Additional discussion of known limitations and potential future work is also included in the report content.
NASA Technical Reports Server (NTRS)
Decker, Ryan K.; Burns, Lee; Merry, Carl; Harrington, Brian
2008-01-01
Atmospheric parameters are essential in assessing the flight performance of aerospace vehicles. The effects of the Earth's atmosphere on aerospace vehicles influence various aspects of the vehicle during ascent, ranging from its flight trajectory to the structural dynamics and aerodynamic heating on the vehicle. Atmospheric databases characterizing the wind and thermodynamic environments, known as Range Reference Atmospheres (RRA), have been developed at space launch ranges by a governmental interagency working group for use by aerospace vehicle programs. The National Aeronautics and Space Administration's (NASA) Space Shuttle Program (SSP), which launches from Kennedy Space Center, utilizes atmospheric statistics derived from the Cape Canaveral Air Force Station Range Reference Atmosphere (CCAFS RRA) database to evaluate environmental constraints on various aspects of the vehicle during ascent.